3 Population Proportion

3.1 Concept

Cochran’s sample size calculation is a design tool for determining the appropriate sample size to estimate a population proportion. As such, it does not directly test a hypothesis like a statistical test (e.g., t-test or ANOVA). However, the underlying null and alternative hypotheses behind Cochran’s sample size calculation are implicitly related to ensuring a certain level of precision (margin of error) for estimating proportions.

Below are the conceptual null and alternative hypotheses for both infinite population and finite population adjustments.

1. Hypotheses for Infinite Population Proportion Sample Size Calculation

Goal: Determine the minimum sample size needed to estimate a proportion ( $p$ ) within a specified margin of error (e) and confidence level (Z) for a large population (assumed infinite).
Null Hypothesis ( $H_{0}$ ): The estimated proportion ( $\hat{p}$ ) is not within the desired margin of error ( $e$ ) of the true population proportion ( $p$ ) with the specified confidence level.
Alternative Hypothesis ( $H_{1}$ ): The estimated proportion ( $\hat{p}$ ) is within the desired margin of error ( $e$ ) of the true population proportion ( $p$ ) with the specified confidence level.

2. Hypotheses for Finite Population Adjustment

When calculating the sample size for a finite population size ( $N$ ), we adjust the required sample size to account for the fact that the population is not infinitely large. This adjustment ensures efficiency (i.e., fewer participants required) without sacrificing precision.

Null Hypothesis ( $H_{0}$ ): The estimated proportion ( $\hat{p}$ ) is not within the desired margin of error ( $e$ ) when adjusting for finite population size ( $N$ ).
Alternative Hypothesis ( $H_{1}$ ): The estimated proportion ( $\hat{p}$ ) is within the desired margin of error ( $e$ ) even with the adjustment for finite population size ( $N$ ).

3.2 Cochran’s Sample Size for Infinite Proportion

To estimate the sample size for an infinite population, we use Cochran’s sample size formula for proportions:

$n_{0} = \frac{Z^{2} \cdot p \cdot (1 - p)}{e^{2}}$

Where:

$n_{0}$ : Initial sample size (assuming an infinite population)
$Z$ : Z-value
$p$ : Expected proportion of the population with the attribute (use 0.5 for maximum variability)
$e$ : Margin of error ( $e$ = 0.05)

The following R function implements Cochran’s formula:

cochran_sample_size <- function(p, e, conf_level = 0.95) {
   
  Z <- qnorm(1 - (1 - conf_level) / 2) # Z-value for confidence level
  n0 <- (Z^2 * p * (1 - p)) / (e^2)    # Cochran's formula
  ceiling(n0)
  
}

3.2.1 Example

Let’s say you want to calculate the required sample size to estimate a population proportion with:

95% confidence level
Estimated proportion (p) = 0.5 (maximum variability)
Margin of error (e) = 0.05

n0 <- cochran_sample_size(
  p = 0.5,           # Expected proportion (for maximum variability)
  e = 0.05,          # Margin of error
  conf_level = 0.95  # 95% confidence level
  )

n0

[1] 385

3.3 Finite Population Correction

Since our study population is finite, we adjust the initial sample size using the finite population correction formula:

$n = \frac{n_{0}}{1 + \frac{n_{0} - 1}{N}}$

Where:

$n$ : Adjusted sample size
$N$ : Total population size (total number of residents)

R function to apply this correction:

finite_pop_adj <- function(n0, N) {
  n <- n0 / (1 + (n0 - 1) / N)
  ceiling(n) 
}

3.3.1 Example

For a population of 1000, the required sample size is:

finite_pop_adj(n0 = n0, N = 1000)

[1] 279

This approach helps you compute the sample size needed to accurately estimate a population proportion with a given confidence and margin of error.