Sample size: logistic regression

sample size
logistic regression
modelling
Sample-size calculator for logistic regression: the events-per-variable rule plus the Hsieh (1989) formula, with plain-language inputs and a copy-ready methods paragraph.
Published May 8, 2026

Sample size for logistic regression

Use this tool when you're modelling a binary outcome (yes/no, died/survived) with one or more predictors. We give you both the events-per-variable rule of thumb (Peduzzi 1996) and the Hsieh (1989) formula for a single primary predictor.

Number of coefficients (k): count every coefficient your final model will estimate, including one dummy variable for each level of a categorical variable beyond the reference level.

How to justify this number

List the variables you genuinely plan to include in the final multivariable model, not every variable in your dataset. Pre-specify this list in your protocol; adding variables after the fact means the sample was sized for a smaller model than the one you end up fitting.
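As a minimal sketch of the counting rule, the hypothetical helper `count_coefficients` below adds one coefficient per continuous or binary predictor and L − 1 per categorical predictor with L levels. It assumes, as is usual for EPV, that the intercept is not counted. Interaction and spline terms are not handled; count their coefficients separately.

```python
def count_coefficients(n_continuous: int, categorical_levels: list[int]) -> int:
    """Number of regression coefficients k, excluding the intercept.

    Each continuous (or binary) predictor contributes one coefficient;
    a categorical predictor with L levels contributes L - 1 dummy
    variables, because one level serves as the reference.
    """
    if any(levels < 2 for levels in categorical_levels):
        raise ValueError("a categorical variable needs at least 2 levels")
    return n_continuous + sum(levels - 1 for levels in categorical_levels)


# Hypothetical model: age and BMI (continuous), sex (2 levels), smoking status (3 levels)
k = count_coefficients(n_continuous=2, categorical_levels=[2, 3])
print(k)  # 5
```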

Expected event rate (p): the overall event rate in your study population (e.g. 30-day mortality, complication rate).

How to justify this number

Use a recent local cohort, registry, or systematic review. The sample size is most sensitive to this number: the EPV estimate divides by the event rate, so halving the rate doubles the required sample, and a rare outcome (< 10 %) inflates it dramatically.

Events per variable (EPV): the convention is 10. Increase to 20 for sparse data or unstable models.

How to justify this number

Cite: Peduzzi P et al. J Clin Epidemiol 1996;49:1373–9. Use 20 for sparse outcomes or when calibration matters more than discrimination (van Smeden et al. 2019).


Hsieh formula (single primary predictor)

If you have one primary predictor of interest and want a power-based estimate rather than a rule of thumb, fill in the next two fields.

Odds ratio (OR): for a continuous predictor, the OR per 1-SD change; for a binary predictor, the OR for exposed vs unexposed.

How to justify this number

An OR of 1.3–1.5 is a small-to-modest effect; 2.0 or more is large. Use the smallest OR that would still matter for your research question, not the largest you might hope to find.
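For a continuous predictor the calculator asks for the OR per 1-SD change, but published estimates are often reported per unit (per year, per mmHg). Because logistic regression is linear on the log-odds scale, ln(OR per SD) = SD × ln(OR per unit). A sketch of that rescaling; the helper name and the example numbers are hypothetical:

```python
import math


def or_per_sd(or_per_unit: float, sd: float) -> float:
    """Rescale an odds ratio from a 1-unit change to a 1-SD change.

    The log-odds effect is linear in the predictor, so
    ln(OR_SD) = SD * ln(OR_unit), i.e. OR_SD = OR_unit ** SD.
    """
    if or_per_unit <= 0 or sd <= 0:
        raise ValueError("odds ratio and SD must be positive")
    return math.exp(sd * math.log(or_per_unit))


# Hypothetical: OR 1.03 per year of age, SD of age in the cohort = 12 years
print(round(or_per_sd(1.03, 12), 2))  # 1.43
```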

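R² with the other covariates: if your primary predictor is correlated with the adjustment variables, enter the R² from regressing the primary predictor on them. 0 = independent; 0.3 = mild collinearity (the Hsieh estimate grows by a factor of about 1.4); 0.5+ = strong (a factor of 2 or more).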

What does this calculation actually do?

Events-per-variable rule (Peduzzi 1996): the minimum sample size is the smallest n that yields at least EPV events per coefficient, where k is the number of coefficients and p the expected event rate:

n_EPV = (k · EPV) / p
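The EPV rule translates directly into code. A minimal sketch (the function name `n_epv` is illustrative, not part of the tool); the two calls also show how strongly the result depends on the event rate p:

```python
import math


def n_epv(k: int, epv: float, p: float) -> int:
    """Minimum total sample size under the events-per-variable rule.

    Solves n * p >= k * EPV for n and rounds up to a whole participant.
    k   -- number of coefficients in the final model
    epv -- required events per coefficient (conventionally 10)
    p   -- expected overall event rate, 0 < p < 1
    """
    if not 0 < p < 1:
        raise ValueError("event rate p must lie strictly between 0 and 1")
    # The small tolerance stops float noise (e.g. 100.00000000000001)
    # from bumping an exact result up by one participant.
    return math.ceil(k * epv / p - 1e-9)


print(n_epv(k=5, epv=10, p=0.25))  # 200
print(n_epv(k=5, epv=10, p=0.05))  # 1000
```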

Hsieh formula for a continuous primary predictor (Hsieh 1989, adjusted for collinearity by VIF = 1/(1−R²)):

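n = (z₁₋α/₂ + z₁₋β)² / (p · (1 − p) · (ln OR)²) · 1/(1 − R²)

where z₁₋α/₂ and z₁₋β are standard normal quantiles for a two-sided significance level α and power 1 − β, and OR is the odds ratio per 1-SD change in the predictor.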
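A sketch of the continuous-predictor formula, using `statistics.NormalDist` from the Python standard library for the normal quantiles. The defaults α = 0.05 (two-sided) and 80 % power are assumptions for illustration; this section does not state which values the calculator uses.

```python
import math
from statistics import NormalDist


def n_hsieh_continuous(odds_ratio: float, p: float, r2: float = 0.0,
                       alpha: float = 0.05, power: float = 0.80) -> int:
    """Sample size for a continuous primary predictor, inflated by VIF.

    odds_ratio -- OR per 1-SD change in the predictor
    p          -- overall event rate
    r2         -- R^2 of the predictor on the other covariates
    """
    if odds_ratio <= 0 or odds_ratio == 1:
        raise ValueError("odds ratio must be positive and different from 1")
    if not (0 < p < 1 and 0 <= r2 < 1):
        raise ValueError("need 0 < p < 1 and 0 <= r2 < 1")
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # two-sided significance level
    z_beta = z(power)
    n = (z_alpha + z_beta) ** 2 / (p * (1 - p) * math.log(odds_ratio) ** 2)
    vif = 1 / (1 - r2)  # collinearity inflation
    return math.ceil(n * vif)


print(n_hsieh_continuous(odds_ratio=1.5, p=0.2))          # 299
print(n_hsieh_continuous(odds_ratio=1.5, p=0.2, r2=0.3))  # 427
```

Because the formula uses (ln OR)², an OR of 1/1.5 needs the same sample as an OR of 1.5.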

Hsieh formula for a binary primary predictor (Hsieh, Bloch & Larsen 1998):

n = (z₁₋α/₂ √(p̄(1−p̄)/B) + z₁₋β √(p₁(1−p₁) + p₂(1−p₂)·(1−B)/B))² / ((p₁ − p₂)² · (1 − B))

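where B is the proportion exposed, p₁ and p₂ are the event rates in the unexposed and exposed groups (derived from the overall event rate and the OR), and p̄ = (1 − B)p₁ + B·p₂ is the overall event rate (p above). We report the larger of the EPV estimate and the Hsieh estimate.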
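This section does not say exactly how p₁ and p₂ are derived. The sketch below assumes they satisfy p̄ = (1 − B)p₁ + B·p₂ with odds(p₂) = OR · odds(p₁), and solves for p₁ by bisection; it then applies the binary formula and returns the larger of the EPV and Hsieh estimates. The function names, the α = 0.05 and 80 % power defaults, and the bisection approach are illustrative assumptions, not the calculator's documented implementation.

```python
import math
from statistics import NormalDist


def group_rates(p_bar: float, odds_ratio: float, b: float) -> tuple[float, float]:
    """Return (p1, p2): event rates in the unexposed and exposed groups.

    Assumption: p_bar = (1 - b) * p1 + b * p2 and odds(p2) = OR * odds(p1).
    The mixture rate increases monotonically with p1, so bisection on
    [0, 1] finds the unique solution.
    """
    def p2_of(p1: float) -> float:
        # Multiply the odds of p1 by the OR and convert back to a rate.
        return odds_ratio * p1 / (1 - p1 + odds_ratio * p1)

    lo, hi = 0.0, 1.0
    for _ in range(100):  # far more halvings than double precision needs
        mid = (lo + hi) / 2
        if (1 - b) * mid + b * p2_of(mid) < p_bar:
            lo = mid
        else:
            hi = mid
    p1 = (lo + hi) / 2
    return p1, p2_of(p1)


def n_hsieh_binary(p_bar: float, odds_ratio: float, b: float,
                   alpha: float = 0.05, power: float = 0.80) -> int:
    """Hsieh, Bloch & Larsen (1998) sample size for a binary predictor."""
    if odds_ratio <= 0 or odds_ratio == 1:
        raise ValueError("odds ratio must be positive and different from 1")
    if not (0 < p_bar < 1 and 0 < b < 1):
        raise ValueError("p_bar and b must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf
    z_alpha, z_beta = z(1 - alpha / 2), z(power)
    p1, p2 = group_rates(p_bar, odds_ratio, b)
    numerator = (z_alpha * math.sqrt(p_bar * (1 - p_bar) / b)
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2) * (1 - b) / b)) ** 2
    return math.ceil(numerator / ((p1 - p2) ** 2 * (1 - b)))


def n_required(k: int, epv: float, p_bar: float, odds_ratio: float, b: float) -> int:
    """Report the larger of the EPV estimate and the Hsieh estimate."""
    n_epv = math.ceil(k * epv / p_bar - 1e-9)  # tolerance absorbs float noise
    return max(n_epv, n_hsieh_binary(p_bar, odds_ratio, b))


# Hypothetical inputs: 5 coefficients, EPV 10, 20 % event rate, OR 2, half exposed
print(n_required(k=5, epv=10, p_bar=0.2, odds_ratio=2.0, b=0.5))
```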

References: Peduzzi P et al. J Clin Epidemiol 1996;49:1373–9. · Hsieh FY. Statist Med 1989;8:795–802. · Hsieh FY, Bloch DA, Larsen MD. Statist Med 1998;17:1623–34.
