Sample size: comparing two proportions
Compare two proportions (or rates)
Use this tool when your outcome is binary (e.g. died / survived, cured / not cured, complication / no complication) and you want to compare two independent groups. Defaults are tuned for a typical MMed-level study.
Baseline (control-group) proportion p₁, entered as a proportion between 0 and 1 (e.g. 0.30 = 30 %).
How to justify this number
This is what you expect to see in your control or standard-care group, before running the study. Sources, in order of strength:
- A recent local study in a similar South African setting.
- A systematic review or meta-analysis of the outcome.
- A registry / audit (NICD, Stats SA, hospital M&M data).
- A pilot chart review of 30–50 of your own records.
Search PubMed (last 5 years) for: "<outcome>" "<setting>" prevalence OR incidence South Africa
In your protocol: "Based on Smith et al. (2022), the 30-day mortality at our institution for this cohort was approximately 30 %."
Comparison-group proportion p₂, entered as a proportion between 0 and 1.
How to justify this number
Two valid framings:
- Expected effect — what you predict the comparison group will show, based on prior trials.
- Minimum clinically important difference (MCID) — the smallest change that would justify acting on the result. Often more defensible.
Cite an MCID from clinical guidelines or expert consensus where possible (e.g. SA Society of Anaesthesiologists guideline 2019).
Significance level α: the acceptable false-positive rate for a two-sided test.
How to justify this number
α = 0.05 is the conventional default in clinical research. Deviate only with a stated reason: stricter (0.01) for confirmatory or multiple-testing studies; more lenient (0.10) only for exploratory work, and flag it.
Cite: ICH E9 (1998) Statistical Principles for Clinical Trials.
Power (1 − β): the probability of detecting the effect if it really exists.
How to justify this number
Convention is 0.80. Increase to 0.90 if the study is expensive, hard to repeat, or missing the effect would have major clinical consequences.
Cite: Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. 1988.
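Both α and power enter the sample-size calculation as standard normal quantiles, so stricter choices translate directly into larger z-values and a larger required n. The sketch below is illustrative only: it uses Python's standard library, and the variable names are hypothetical rather than taken from this tool.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, sd 1

# Two-sided test: the critical value is the (1 - alpha/2) quantile.
z_alpha = {a: std_normal.inv_cdf(1 - a / 2) for a in (0.01, 0.05, 0.10)}

# Power = 1 - beta, so z_{1-beta} is simply the quantile at the power value.
z_power = {p: std_normal.inv_cdf(p) for p in (0.80, 0.90)}

for a, z in z_alpha.items():
    print(f"alpha = {a:.2f}  ->  z = {z:.3f}")
for p, z in z_power.items():
    print(f"power = {p:.2f}  ->  z = {z:.3f}")
```

Moving from α = 0.05 to 0.01, or from 80 % to 90 % power, raises the corresponding z-value, and the required sample size grows with the square of their weighted sum.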
Tick this if participants are grouped (clinics, classrooms, households). People in the same cluster are more similar than random individuals, which inflates the required sample size.
Intracluster correlation coefficient (ICC, ρ): how similar people in the same cluster are (0–1).
How to justify this number
ICC values are notoriously hard to pin down. Cite a published compendium and pick a conservative value (i.e. higher rather than lower) for the planning stage:
- Adams G et al. J Clin Epidemiol 2004 — primary care.
- Murray DM et al. Am J Public Health 2004 — community-based trials.
- Campbell MK et al. Methods in Cluster Randomised Trials, 2005.
If no published ICC exists, justify a conservative estimate (e.g. 0.05 for clinic-level outcomes) and report sensitivity at ρ = 0.02, 0.05, 0.10.
Cluster size m: people per cluster (e.g. 20 patients per clinic).
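The sensitivity check suggested for the ICC can be sketched in a few lines. This is a minimal illustration, assuming the design effect 1 + (m − 1) · ρ used by this tool, a cluster size of 20, and an unclustered requirement of 93 per group (the figure from the worked example). Rounding up after multiplying is an assumption about how the result is reported.

```python
from math import ceil

n_individual = 93  # per-group n before clustering (worked-example value)
m = 20             # people per cluster (illustrative, e.g. 20 patients per clinic)

results = {}
for rho in (0.02, 0.05, 0.10):
    deff = 1 + (m - 1) * rho                   # design effect
    results[rho] = ceil(n_individual * deff)   # inflate, then round up
    print(f"rho = {rho:.2f}  DEff = {deff:.2f}  n per group = {results[rho]}")
```

With these inputs the per-group requirement ranges from 129 (ρ = 0.02) to 270 (ρ = 0.10), which is why the choice of ICC deserves an explicit justification.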
What does this calculation actually do?
For comparing two independent proportions with a two-sided z-test, the required sample size per group is approximately:
n = [ z₁₋α/₂ · √(2 · p̄ · (1 − p̄)) + z₁₋β · √(p₁(1 − p₁) + p₂(1 − p₂)) ]² / (p₂ − p₁)²
where p̄ = (p₁ + p₂) / 2. When clustering is on, the required sample is inflated by the design effect:
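DEff = 1 + (m − 1) · ρ
where m is the number of people per cluster and ρ is the intracluster correlation coefficient (ICC).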
Assumptions: equal group sizes; independent observations within groups (or constant ICC across clusters); no interim analyses; complete follow-up. To allow for drop-out, divide the calculated sample size by (1 − expected drop-out rate) before recruiting.
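The formula and the design effect can be reproduced with the Python standard library. The sketch below is not this tool's source code; the function names and the choice to round up after applying the design effect are assumptions made for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate per-group n (unrounded) for a two-sided z-test
    comparing two independent proportions with equal group sizes."""
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # z_{1-alpha/2}
    z_beta = NormalDist().inv_cdf(power)           # z_{1-beta}
    p_bar = (p1 + p2) / 2                          # pooled proportion
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return numerator / (p2 - p1) ** 2


def design_effect(m, rho):
    """Inflation factor for clusters of size m with ICC rho."""
    return 1 + (m - 1) * rho


n_raw = n_per_group(0.30, 0.50)
n_individual = ceil(n_raw)                                 # no clustering
n_clustered = ceil(n_raw * design_effect(m=20, rho=0.05))  # 20 per cluster, ICC 0.05
print(n_individual, n_clustered)  # 93 182
```

The clustered figure uses an illustrative cluster size of 20 and ICC of 0.05; substitute the values justified in your own protocol.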
References: Fleiss JL, Tytun A, Ury HK. Biometrics 1980;36:343–6. · Cohen J. Statistical Power Analysis 1988. · Lwanga SK, Lemeshow S. Sample Size Determination in Health Studies, WHO 1991.
Worked MMed example
Setting: A registrar audits 30-day mortality in two ICUs in a tertiary hospital. ICU A uses a standard sepsis bundle; ICU B uses an enhanced bundle introduced in 2023.
From a 2024 single-centre cohort (Mokoena et al.), 30-day mortality on the standard bundle was about 30 %. The registrar wants to detect an absolute difference of 20 percentage points, i.e. between p₁ = 0.30 and p₂ = 0.50, with two-sided α = 0.05 and 80 % power.
The calculator returns roughly 93 patients per ICU (186 total), comfortably feasible in a 12-month chart-review window. After inflating for an expected 10 % missing-record rate (186 / 0.90 ≈ 206.7, rounded up), the registrar plans for 207 charts.
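The missing-record adjustment in this example is plain arithmetic and can be checked as follows. The variable names are hypothetical, and the per-ICU line is an added illustration of what happens if each group is inflated separately rather than the total.

```python
from math import ceil

n_per_icu = 93       # per-group requirement from the calculator
missing_rate = 0.10  # expected proportion of unusable records

total = 2 * n_per_icu
total_inflated = ceil(total / (1 - missing_rate))        # inflate the total
per_icu_inflated = ceil(n_per_icu / (1 - missing_rate))  # inflate each group

print(total, total_inflated, per_icu_inflated)  # 186 207 104
```

Inflating each ICU separately gives 104 charts per unit (208 in total), one more than the 207 obtained by inflating the combined total, because of rounding.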