Sample size: logistic regression

Sample-size calculator for logistic regression — events-per-variable rule plus the Hsieh (1989) formula. Plain-language inputs and a copy-ready methods paragraph.

Published: May 8, 2026

Sample size for logistic regression

Use this tool when you're modelling a binary outcome (yes/no, died/survived) with one or more predictors. We give you both the events-per-variable rule of thumb (Peduzzi 1996) and the Hsieh (1989) formula for a single primary predictor.

Number of candidate predictors k

Count every coefficient your final model will estimate. A continuous or binary predictor contributes one coefficient; a categorical variable with m levels contributes m − 1 (one dummy variable per level beyond the reference).

How to justify this number

List the variables you genuinely plan to include in the final multivariable model — not every variable in your dataset. Pre-specify this in your protocol; post-hoc inflation defeats the purpose.
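As a rough illustration of the counting rule, the sketch below tallies k from a pre-specified variable list. The function name `count_coefficients`, the example variables, and the choice not to count the intercept are assumptions made for this example, not part of the calculator.

```python
def count_coefficients(predictors):
    """Count model coefficients, excluding the intercept (an assumption).

    `predictors` maps a variable name to its number of levels for a
    categorical variable, or to None for a continuous predictor.
    """
    k = 0
    for name, levels in predictors.items():
        if levels is None:
            k += 1            # continuous predictor: one slope
        elif levels >= 2:
            k += levels - 1   # one dummy per level beyond the reference
        else:
            raise ValueError(f"{name}: a categorical variable needs at least 2 levels")
    return k


# Hypothetical pre-specified model: age (continuous), sex (2 levels),
# ASA class (4 levels), smoking status (3 levels).
planned = {"age": None, "sex": 2, "asa_class": 4, "smoking_status": 3}
k = count_coefficients(planned)  # 1 + 1 + 3 + 2 = 7
print(k)
```

Note that the four variables here cost seven coefficients, which is the number that should go into the k field.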

Probability of the outcome p

Overall event rate in your study population (e.g. 30-day mortality, complication rate).

How to justify this number

Use a recent local cohort, registry, or systematic review. The sample size is most sensitive to this number: the EPV estimate scales with 1/p, so a rare outcome (< 10%) inflates the required sample sharply, and halving p doubles the required n.

Events per variable (EPV)

Convention is 10. Increase to 20 for sparse data or unstable models.

How to justify this number

Cite: Peduzzi P et al. J Clin Epidemiol 1996;49:1373–9. Use 20 for sparse outcomes or when calibration matters more than discrimination (van Smeden et al. 2019).

Hsieh formula (single primary predictor)

If you have one primary predictor of interest and want a more rigorous estimate, fill in the next two fields.

Type of primary predictor

Expected odds ratio per 1-unit change OR

For a continuous predictor, enter the OR per 1-SD increase (that is, treat one unit as one standard deviation). For a binary predictor, enter the OR for exposed vs unexposed.

How to justify this number

An OR of 1.3–1.5 is a small-to-modest effect; 2.0+ is large. Take the smallest OR your study still needs to detect, not the largest you might find.

R² of primary predictor with other covariates

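If your primary predictor is correlated with the adjustment variables, enter the R² from regressing it on them. 0 = independent; 0.3 = mild collinearity; 0.5+ = strong. The Hsieh estimate is multiplied by 1/(1 − R²), so R² = 0.5 doubles it.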

Significance level α

Power 1 − β

What does this calculation actually do?

Events-per-variable rule (Peduzzi 1996): the minimum sample size is the smallest n expected to yield at least EPV events per estimated coefficient, that is, k · EPV events in total:

n_EPV = (k · EPV) / p
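The EPV formula can be sketched in a few lines of Python. The function name `n_epv` and the rounding-up to a whole participant are assumptions for this example; the calculator's own rounding is not documented here.

```python
import math


def n_epv(k, p, epv=10):
    """Smallest n expected to yield k * epv events at overall event rate p."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    # round() guards against floating-point noise before taking the ceiling
    return math.ceil(round(k * epv / p, 9))


# Same model (k = 5, EPV = 10) at three event rates
for p in (0.05, 0.25, 0.50):
    print(p, n_epv(k=5, p=p))
# prints 1000, 200 and 100 participants respectively
```

The loop shows how strongly the event rate drives the result: with k and EPV held fixed, a 5% event rate needs five times the sample of a 25% event rate.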

Hsieh formula for a continuous primary predictor (Hsieh 1989, adjusted for collinearity by VIF = 1/(1−R²)):

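n = [(z₁₋α/₂ + z₁₋β)² / (p · (1 − p) · (ln OR)²)] · 1/(1 − R²)

where p is the overall event rate, OR is the odds ratio per 1-SD increase in the predictor, and R² is the squared multiple correlation of the primary predictor with the other covariates.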
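A minimal sketch of the continuous-predictor formula, using only the Python standard library for the normal quantiles. The function name `n_hsieh_continuous`, the defaults (α = 0.05, power = 0.80), and the rounding-up are assumptions for illustration rather than the calculator's actual implementation.

```python
import math
from statistics import NormalDist


def n_hsieh_continuous(p, odds_ratio, r2=0.0, alpha=0.05, power=0.80):
    """Sample size for one continuous predictor, with OR per 1-SD increase."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if odds_ratio <= 0 or odds_ratio == 1:
        raise ValueError("odds_ratio must be positive and different from 1")
    if not 0 <= r2 < 1:
        raise ValueError("r2 must be in [0, 1)")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for power 1 - beta
    n_unadjusted = (z_alpha + z_beta) ** 2 / (p * (1 - p) * math.log(odds_ratio) ** 2)
    # Inflate by the variance inflation factor 1 / (1 - R^2)
    return math.ceil(n_unadjusted / (1 - r2))


print(n_hsieh_continuous(p=0.30, odds_ratio=1.5))          # no collinearity
print(n_hsieh_continuous(p=0.30, odds_ratio=1.5, r2=0.3))  # mild collinearity
```

Because the result depends on (ln OR)², an OR of 1.5 and an OR of 1/1.5 give the same sample size, so the direction of the effect does not matter for planning.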

Hsieh formula for a binary primary predictor (Hsieh, Bloch & Larsen 1998):

n = (z₁₋α/₂ √(p̄(1−p̄)/B) + z₁₋β √(p₁(1−p₁) + p₂(1−p₂)·(1−B)/B))² / ((p₁ − p₂)² · (1 − B))

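where B is the proportion exposed, p₁ and p₂ are the event rates in the unexposed and exposed groups, and p̄ = (1 − B) · p₁ + B · p₂ is the overall event rate. p₁ and p₂ are derived from the overall event rate and the OR. We report the larger of the EPV estimate and the Hsieh estimate.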
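The binary-predictor formula can be sketched as follows. How the calculator derives p₁ and p₂ is not spelled out, so this example assumes they solve p̄ = (1 − B) · p₁ + B · p₂ with the odds in the exposed group equal to OR times the odds in the unexposed group, and finds the solution by bisection. The names `group_rates` and `n_hsieh_binary`, and the example inputs, are hypothetical.

```python
import math
from statistics import NormalDist


def group_rates(p_bar, odds_ratio, b):
    """Find (p1, p2) with (1-b)*p1 + b*p2 = p_bar and odds(p2) = OR * odds(p1)."""
    lo, hi = 0.0, 1.0
    for _ in range(200):  # the overall rate rises with p1, so bisection works
        p1 = (lo + hi) / 2
        p2 = odds_ratio * p1 / (1 - p1 + odds_ratio * p1)
        if (1 - b) * p1 + b * p2 < p_bar:
            lo = p1
        else:
            hi = p1
    p1 = (lo + hi) / 2
    p2 = odds_ratio * p1 / (1 - p1 + odds_ratio * p1)
    return p1, p2


def n_hsieh_binary(p_bar, odds_ratio, b, alpha=0.05, power=0.80):
    """Total sample size for one binary predictor with proportion b exposed."""
    p1, p2 = group_rates(p_bar, odds_ratio, b)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    term_null = z_alpha * math.sqrt(p_bar * (1 - p_bar) / b)
    term_alt = z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2) * (1 - b) / b)
    return math.ceil((term_null + term_alt) ** 2 / ((p1 - p2) ** 2 * (1 - b)))


# Hypothetical inputs: 30% overall event rate, OR = 2, half the sample exposed
n_binary = n_hsieh_binary(p_bar=0.30, odds_ratio=2.0, b=0.5)

# EPV estimate for k = 5 coefficients at EPV = 10, then report the larger
n_epv_estimate = math.ceil(round(5 * 10 / 0.30, 9))
required_n = max(n_epv_estimate, n_binary)
print(n_binary, n_epv_estimate, required_n)
```

With these inputs the Hsieh estimate exceeds the EPV estimate, so it is the one that would be reported; with more predictors or a rarer outcome the EPV estimate can dominate instead.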

References: Peduzzi P et al. J Clin Epidemiol 1996;49:1373–9. · Hsieh FY. Statist Med 1989;8:795–802. · Hsieh FY, Bloch DA, Larsen MD. Statist Med 1998;17:1623–34.