Sample size: comparing two means
Compare two means (two-sample t-test)
Use this tool when your outcome is continuous (e.g. blood pressure, BMI, length of stay) and you want to compare two independent groups. Defaults are tuned for a typical MMed-level study.
The expected average outcome in your reference group.
How to justify this number
Sources, in order of strength:
- A recent local study in a similar setting.
- A systematic review or meta-analysis.
- A pilot of 30–50 of your own records.
In your protocol: "Based on Smith et al. (2024), we assume a mean systolic BP of approximately 120 mmHg in the reference group."
The expected mean in the comparison group.
How to justify this number
Two valid framings:
- Expected mean based on prior trials.
- Minimum clinically important difference (MCID) — the smallest change that would change practice. Often more defensible.
Cite an MCID from clinical guidelines (e.g. 5 mmHg for systolic BP per SEMDSA hypertension guideline 2018).
Spread of the outcome — assumed roughly equal in both groups.
How to justify this number
The hardest input to pin down. Best sources:
- SD reported in the same paper you used for the means.
- Width of a 95 % CI: SD ≈ √n × (CI width) / 3.92.
- Range / 4 if only min–max is reported (rough).
- Pilot data (n ≥ 30) from your own records.
If unsure, run a sensitivity analysis at ±25 % around your best guess and report the resulting range of sample sizes.
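The two back-calculation rules above can be sketched in Python as follows. This is a minimal illustration, not the tool's own code: the function names sd_from_ci and sd_from_range are hypothetical, and the CI rule assumes the interval is a 95 % CI for a single-group mean based on n observations.

```python
from math import sqrt


def sd_from_ci(lower, upper, n):
    """Approximate SD from a 95 % CI for one group's mean.

    Assumes the CI was built as mean ± 1.96 × SD / √n,
    so the full width equals 3.92 × SD / √n.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return sqrt(n) * (upper - lower) / 3.92


def sd_from_range(minimum, maximum):
    """Rough SD from the reported minimum and maximum (range / 4)."""
    return (maximum - minimum) / 4


# Illustrative numbers only (not taken from any study):
# a 95 % CI of 117.06 to 122.94 mmHg from n = 100 patients.
print(round(sd_from_ci(117.06, 122.94, 100), 2))
print(sd_from_range(90, 150))
```

The range rule is the cruder of the two, so treat its result as a starting point for the sensitivity analysis rather than a final value.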
Acceptable false-positive rate. Two-sided test.
How to justify this number
α = 0.05 is the conventional default. Cite: ICH E9 (1998) Statistical Principles for Clinical Trials.
Probability of detecting the effect if it really exists.
Allocation ratio k = n₂ / n₁. Equal groups = 1. Set to 2 for a 2:1 design (e.g. one control to two cases).
What does this calculation actually do?
For a two-sample t-test with allocation ratio k, the required sample size in Group 1 is approximately:
n₁ = (1 + 1/k) · σ² · (z₁₋α/₂ + z₁₋β)² / (μ₁ − μ₂)²
n₂ = k · n₁
Cohen's effect size d = (μ₁ − μ₂) / σ. We add 1 to each group as a small-sample correction (Snedecor & Cochran 1989); this closely matches power.t.test() in base R for n ≥ 20.
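The formula can be sketched in Python using only the standard library. This is a hedged illustration rather than the tool's actual implementation: the function name two_mean_sample_size is hypothetical, and the order of operations (round each group up, then add 1) is an assumption, since the text does not say whether the correction is applied before or after rounding.

```python
from math import ceil
from statistics import NormalDist


def two_mean_sample_size(mu1, mu2, sd, alpha=0.05, power=0.80, k=1.0):
    """Return (n1, n2) for a two-sided comparison of two means.

    k is the allocation ratio n2 / n1 (1 = equal groups).
    """
    if mu1 == mu2:
        raise ValueError("mu1 and mu2 must differ")
    if sd <= 0 or k <= 0:
        raise ValueError("sd and k must be positive")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # z for two-sided alpha
    z_beta = NormalDist().inv_cdf(power)           # z for power = 1 - beta

    # Normal-approximation formula for Group 1, before any correction.
    n1_raw = (1 + 1 / k) * sd**2 * (z_alpha + z_beta) ** 2 / (mu1 - mu2) ** 2

    # Assumption: round up first, then add 1 per group as the
    # small-sample correction described in the text.
    n1 = ceil(n1_raw) + 1
    n2 = ceil(k * n1_raw) + 1
    return n1, n2


# Illustrative inputs: means 120 vs 115 mmHg, SD 15 mmHg.
print(two_mean_sample_size(120, 115, 15))

# Sensitivity of the result to the SD, at -25 %, 0 % and +25 %.
for factor in (0.75, 1.00, 1.25):
    print(factor, two_mean_sample_size(120, 115, 15 * factor))
```

With these illustrative inputs (α = 0.05, 80 % power, equal groups) the sketch returns 143 per group. The loop shows how strongly the answer depends on the assumed SD, because σ enters the formula squared.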
Assumptions: independent observations, approximately normal outcomes (or n ≥ 30 per group via the central limit theorem), roughly equal variances. Inflate by your expected drop-out rate before recruiting.
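The drop-out adjustment can be sketched as below. The text does not specify the exact rule, so this assumes the common convention of dividing by (1 − drop-out rate), which ensures the required number remains after losses. The function name inflate_for_dropout is hypothetical.

```python
from math import ceil


def inflate_for_dropout(n, dropout_rate):
    """Number to recruit so that about n participants remain after drop-out.

    Assumption: recruit n / (1 - dropout_rate), rounded up.
    """
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be in [0, 1)")
    return ceil(n / (1 - dropout_rate))


# Illustrative: 143 needed per group, 10 % expected drop-out.
print(inflate_for_dropout(143, 0.10))
```

Multiplying by (1 + drop-out rate) instead gives a slightly smaller number and can leave the study short, which is why the division form is used in this sketch.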
References: Snedecor GW, Cochran WG. Statistical Methods, 8th ed. 1989. · Cohen J. Statistical Power Analysis for the Behavioral Sciences, 2nd ed. 1988. · Lwanga SK, Lemeshow S. Sample Size Determination in Health Studies: A Practical Manual. WHO, 1991.