4  Continuous Outcome

library(pwr)

4.1 Comparing 2 groups

4.1.1 Overview

pwr.t.test() function

  • One-sample t test (type = "one.sample")
  • Two-sample t test (type = "two.sample")
  • Paired t test (type = "paired")

Cohen’s d is used as the effect size

  • Very small (d = 0.01)
  • Small (d = 0.2)
  • Medium (d = 0.5)
  • Large (d = 0.8)
  • Very large (d = 1.2)
  • Huge (d = 2)

In the examples below, I will use these settings as default values:

  • Medium effect size
  • A two-tailed test
  • A significance of 0.05 and a power of 80%

4.1.2 One-sample t-test

Cohen’s d:

\[ d = \frac{\mu_1 - \mu_0}{SD} \]

Where:

  • \(\mu_0\) = mean under \(H_0\)
  • \(\mu_1\) = mean under \(H_1\)
  • \(SD\) = SD under \(H_0\)
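As a small illustration of the formula above, here is a sketch that computes \(d\) from planning values. The means and SD are hypothetical numbers chosen only so that the result lands on a medium effect; they are not taken from any study.

```r
# Hypothetical planning values (assumptions for illustration only)
mu0 <- 0     # mean under H0
mu1 <- -5    # mean under H1
sd0 <- 10    # SD under H0

# Cohen's d for a one-sample t test
d <- (mu1 - mu0) / sd0
d

# For a two-sided test only the magnitude of d matters
abs(d)
```

With these assumed values, \(|d| = 5 / 10 = 0.5\).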

4.1.2.1 Ex 1: New dietary supplement

Does the introduction of a new dietary supplement reduce systolic blood pressure in patients with stage 1 hypertension more effectively than the currently recommended lifestyle modifications alone?

The primary outcome is the mean change in systolic blood pressure (mmHg) after a 12-week supplementation period.

Let’s assume a medium effect size (\(d = 0.5\))

cohen.ES(test = "t", size = "medium")

     Conventional effect size from Cohen (1982) 

           test = t
           size = medium
    effect.size = 0.5

pwr.t.test(d = 0.5, 
           sig.level = 0.05, power = 0.8, 
           type = "one.sample", alternative = "two.sided")

     One-sample t test power calculation 

              n = 33.36713
              d = 0.5
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

  • N = 34 (n = 33.37 rounded up)
  • With a dropout rate of 20%, 34 / 0.8 = 42.5, so a total of 43 participants are required
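The dropout adjustment can be sketched as a tiny helper. `adjust_for_dropout()` is a hypothetical function written for this note, not part of pwr; it rounds the calculated n up first and then divides by the expected retention.

```r
# Hypothetical helper (not part of pwr): inflate a required sample size
# so that the target n remains after the expected dropout.
adjust_for_dropout <- function(n_required, dropout_pct) {
  n_start <- ceiling(n_required)                 # round the calculated n up first
  ceiling(n_start * 100 / (100 - dropout_pct))   # divide by the retention rate
}

n_calc  <- 33.36713                              # n reported by pwr.t.test() above
n_total <- adjust_for_dropout(n_calc, dropout_pct = 20)
n_total                                          # 43
```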

For a non-parametric test: adding 15% to the dropout-adjusted total gives 43 × 1.15 = 49.45, i.e. a total of 50.

4.1.2.2 Ex 2: New DM Drug

Let’s propose a study of a new drug to reduce hemoglobin A1c in type 2 diabetes over a 1 year study period. You estimate that your recruited participants will have a mean baseline A1c of 9.0, which will be unchanged by your placebo, but reduced (on average) to 7.0 by the study drug.

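No SD estimate is available, so let’s say the minimum and maximum of HbA1c are 5.0 and 17.0 and approximate the SD as range / 4. Because the drug arm is compared with a placebo arm, this example uses type = "two.sample".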

sd_approx <- (17 - 5)/4
d1 <- (9 - 7) / sd_approx # delta / sd

pwr.t.test(
  n = NULL,
  sig.level = 0.05,
  type = "two.sample",
  alternative = "two.sided",
  power = 0.80,
  d = d1
)

     Two-sample t test power calculation 

              n = 36.30569
              d = 0.6666667
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: n is number in *each* group

N = 37 in each group. Assuming a 20% dropout rate in each arm, 37 × 5/4 = 46.25, so 47 subjects per arm (94 in total) would be required.
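As a cross-check, the same calculation can be sketched with base R's `stats::power.t.test()`, which takes the raw difference and SD instead of Cohen's d. This is a different function from the `pwr.t.test()` used in the text, so small differences in the decimals are to be expected.

```r
# Planning values from the example above
sd_approx <- (17 - 5) / 4        # range / 4 rule of thumb
delta     <- 9 - 7               # expected difference in mean HbA1c

# Base R cross-check (stats::power.t.test, not pwr.t.test)
res <- power.t.test(delta = delta, sd = sd_approx,
                    sig.level = 0.05, power = 0.80,
                    type = "two.sample", alternative = "two.sided")

n_per_arm  <- ceiling(res$n)                   # round up to whole subjects
n_enrolled <- ceiling(n_per_arm * 100 / 80)    # allow for 20% dropout per arm
c(per_arm = n_per_arm, enrolled_per_arm = n_enrolled, total = 2 * n_enrolled)
```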

If the study can enroll only 50 participants (25 per arm), what would the power be? (Here d is rounded down to 0.66.)

pwr.t.test(
  n = 25, # note that n is per arm
  sig.level = 0.05,
  type = "two.sample",
  alternative = "two.sided",
  power = NULL, # ?
  d = 0.66
)

     Two-sample t test power calculation 

              n = 25
              d = 0.66
      sig.level = 0.05
          power = 0.6280322
    alternative = two.sided

NOTE: n is number in *each* group
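To see where this power figure comes from, here is a minimal sketch of the two-sided, two-sample calculation using the noncentral t distribution in base R. `two_sample_power()` is a hypothetical helper written for this note; the per-arm sizes other than 25 are arbitrary values chosen to show how power changes with n.

```r
# Power of a two-sided, two-sample t test via the noncentral t distribution.
# n is the size of each arm; d is Cohen's d.
two_sample_power <- function(n, d, sig.level = 0.05) {
  df    <- 2 * (n - 1)                 # degrees of freedom
  ncp   <- d * sqrt(n / 2)             # noncentrality parameter
  tcrit <- qt(sig.level / 2, df, lower.tail = FALSE)
  # Probability of landing in either rejection region under H1
  pt(tcrit, df, ncp = ncp, lower.tail = FALSE) +
    pt(-tcrit, df, ncp = ncp, lower.tail = TRUE)
}

n_per_arm <- c(25, 30, 37, 50)         # arbitrary per-arm sizes to compare
round(sapply(n_per_arm, two_sample_power, d = 0.66), 3)
```

The first value (n = 25) should agree with the power of about 0.628 reported above, and power rises as the per-arm n grows.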

4.1.3 Two-sample t-test

Cohen’s d for Welch:

\[ d = \frac{ \mu_1 - \mu_2 }{SD_{pool}} \]

Where

\[ SD_{pool} = \sqrt{ (SD_1^2 + SD_2^2)/2 } \]
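A short sketch of these two formulas follows. The group means and SDs are hypothetical values for illustration only.

```r
# Hypothetical group summaries (assumptions for illustration only)
mean1 <- 130; sd1 <- 12
mean2 <- 124; sd2 <- 16

# Pooled SD as the root of the average variance, then Cohen's d
sd_pool <- sqrt((sd1^2 + sd2^2) / 2)   # sqrt((144 + 256) / 2) = sqrt(200)
d <- (mean1 - mean2) / sd_pool
round(c(sd_pool = sd_pool, d = d), 3)
```

The resulting d can then be passed to `pwr.t.test()` in place of the conventional 0.5.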

pwr.t.test(d = 0.5, sig.level = 0.05, power = 0.8, 
           type = "two.sample",
           alternative = "two.sided")

     Two-sample t test power calculation 

              n = 63.76561
              d = 0.5
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: n is number in *each* group

Assuming a significance level of 0.05 and a power of 80% in a two-tailed test, when the effect size d = 0.5:

  • N = 64 x 2 = 128 (64 per group)

  • With a dropout rate of 20%, 64 / 0.8 = 80 per group, Total = 160

Mann-Whitney U test:

  • For a non-parametric test, adding an additional 15% to each group gives 80 × 1.15 = 92 per group, a total of 184 people.

4.1.4 Paired t-test

pwr.t.test(d = 0.5, sig.level = 0.05, power = 0.8, 
           type = "paired",
           alternative = "two.sided")

     Paired t test power calculation 

              n = 33.36713
              d = 0.5
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: n is number of *pairs*

Assuming a significance level of 0.05 and a power of 80% in a two-tailed test, the minimum number of pairs required to detect an effect size of d = 0.5 is 34.

Considering the dropout rate of 20%, a total of 43 pairs are required.
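The text does not say how d is obtained for a paired design. Since a paired t test is a one-sample test on the within-pair differences, one common choice is the mean difference divided by the SD of the differences. The sketch below follows that convention; all numbers, including the correlation `r`, are hypothetical.

```r
# Hypothetical before/after summaries (assumptions for illustration only)
mean_diff <- 4      # mean of the within-pair differences
sd_before <- 10
sd_after  <- 10
r         <- 0.6    # assumed correlation between the paired measurements

# SD of the differences, from the variance of a difference of correlated variables
sd_diff  <- sqrt(sd_before^2 + sd_after^2 - 2 * r * sd_before * sd_after)
d_paired <- mean_diff / sd_diff
round(c(sd_diff = sd_diff, d = d_paired), 3)
```

A higher assumed correlation shrinks the SD of the differences and therefore increases d for the same mean difference.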

Paired Wilcoxon test

  • For a non-parametric test, adding an additional 15% gives 43 × 1.15 = 49.45, so a total of 50 pairs are required

4.2 Comparing ≥3 groups

4.2.1 ANOVA (Parametric)

Studies that compare averages of three or more groups.

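  • k: number of comparison groups
  • f: effect size (Cohen’s \(f\))

\[ f = \sqrt{ \frac{ \sum_{i=1}^{k} p_i \times (\mu_i - \mu)^2 }{\sigma^2} } \]

Where:

  • \(p_i = n_i / N\) = proportion of observations in group \(i\)
  • \(\mu_i\) = mean of group \(i\)
  • \(\mu\) = grand mean
  • \(\sigma^2\) = error variance within groups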
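Here is a minimal sketch of this formula for three equally sized groups. The group means and the common SD are hypothetical values for illustration only.

```r
# Hypothetical group means and common SD (assumptions for illustration only)
mu    <- c(50, 55, 60)          # group means
p     <- rep(1 / 3, 3)          # equal group sizes, so each p_i = 1/3
sigma <- 20                     # common within-group SD

grand_mean <- sum(p * mu)       # weighted grand mean
f <- sqrt(sum(p * (mu - grand_mean)^2) / sigma^2)
round(f, 3)
```

With these assumed values f comes out at roughly 0.2, between the conventional small and medium sizes.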

Effect Size (f-values)

  • Small = 0.1
  • Medium = 0.25
  • Large = 0.4

cohen.ES(test = "anov", size = "medium")

     Conventional effect size from Cohen (1982) 

           test = anov
           size = medium
    effect.size = 0.25

pwr.anova.test(k = 3, f = 0.25, sig.level = 0.05, power = 0.8)

     Balanced one-way analysis of variance power calculation 

              k = 3
              n = 52.3966
              f = 0.25
      sig.level = 0.05
          power = 0.8

NOTE: n is number in each group

Assuming a significance level of 0.05 and a power of 80%, with three comparison groups and an effect size of f = 0.25, the calculated number of subjects is 53 in each group (159 in total). The ANOVA F test has no one- or two-tailed option.

Considering a dropout rate of 20%, 53 / 0.8 = 66.25, so rounding up gives 67 per group and a total of 201 participants.
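To check that 53 per group is the smallest n reaching 80% power, here is a sketch of the balanced one-way ANOVA power calculation using the noncentral F distribution in base R. `anova_power()` is a hypothetical helper written for this note; it uses the noncentrality parameter \(\lambda = k \times n \times f^2\), which to my understanding is the standard choice for Cohen's f.

```r
# Power of a balanced one-way ANOVA via the noncentral F distribution.
# k = number of groups, n = per-group size, f = Cohen's f.
anova_power <- function(k, n, f, sig.level = 0.05) {
  df1    <- k - 1                             # between-group df
  df2    <- k * (n - 1)                       # within-group df
  lambda <- k * n * f^2                       # noncentrality parameter
  fcrit  <- qf(sig.level, df1, df2, lower.tail = FALSE)
  pf(fcrit, df1, df2, ncp = lambda, lower.tail = FALSE)
}

anova_power(k = 3, n = 52, f = 0.25)   # just under 0.80
anova_power(k = 3, n = 53, f = 0.25)   # just over 0.80
```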

4.2.2 Kruskal-Wallis test (Non-parametric)

For a non-parametric test, add an additional 15% to each group.
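As a rough sketch of this rule, the helper below adds a percentage allowance and rounds up. `add_allowance()` is a hypothetical function written for this note. I apply it here to the 53 per group from the ANOVA calculation, before any dropout allowance; that ordering is an assumption.

```r
# Hypothetical helper: add a percentage allowance and round up
add_allowance <- function(n, extra_pct) ceiling(n * (100 + extra_pct) / 100)

n_anova <- 53                            # per group, from the ANOVA calculation above
n_kw    <- add_allowance(n_anova, extra_pct = 15)
c(per_group = n_kw, total = 3 * n_kw)    # 61 per group, 183 in total
```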