Standard Deviation Calculator — Population (σ) and Sample (s) Modes

Paste a dataset (CSV, whitespace, or one per line) and get the population σ or sample s standard deviation, variance, mean, range, and coefficient of variation. NIST-style math; works for up to 10,000 values.

Reviewed by CalcBold Editorial · Sources: NIST SEMATECH e-Handbook §1.3.6.5 + standard statistical methodology · Last verified May 15, 2026

What is Standard Deviation?

Standard deviation (σ for a full population, s for a sample) is the single most widely used measure of how spread out a dataset is. The mean tells you where the center of the data sits; the standard deviation tells you how far the typical observation drifts from that center. Two datasets can share an identical mean and still feel completely different in practice — one tight and predictable, one wide and noisy — and standard deviation is the number that captures that difference in a single value, expressed in the same units as the original data.

Mechanically, it is the square root of the average squared deviation from the mean. Squaring the deviations before averaging is what makes standard deviation behave the way it does: it forces every contribution to be positive (so positive and negative deviations do not cancel), and it amplifies large deviations more than small ones, which is why a single outlier can move the result so much. The final square root brings the answer back into the original units — if your data is in dollars, σ is in dollars; if your data is in seconds, σ is in seconds.

Almost every downstream statistical procedure — confidence intervals, hypothesis testing, z-scores, control charts, regression error bars, Sharpe ratios — takes standard deviation as an input. Getting it right (and choosing between the population and sample variants correctly) is the foundation for everything else.

The Formula — Population vs Sample

The two formulas look almost identical. The only difference is the divisor.

Population standard deviation (you have measured every member of the group) divides by N:

σ = √( Σ(xᵢ − μ)² / N )

Sample standard deviation (your data is a sample drawn from a larger population) divides by N − 1:

s = √( Σ(xᵢ − x̄)² / (N − 1) )

Where μ is the population mean, x̄ is the sample mean, and xᵢ ranges over every observation. Variance is the same calculation without the final square root — variance is σ² (population) or s² (sample). If somebody hands you a variance and asks for standard deviation, just take the square root.
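As a quick illustration of the two divisors, here is a minimal sketch using Python's standard-library statistics module. The dataset is the same five values used in the worked example later on this page; the variable names are chosen here purely for illustration.

```python
import statistics

# Illustrative data: the five values from the worked example on this page.
data = [4.0, 8.0, 6.0, 5.0, 3.0]

# Population versions divide the sum of squared deviations by N.
pop_var = statistics.pvariance(data)
pop_sd = statistics.pstdev(data)

# Sample versions divide by N - 1.
samp_var = statistics.variance(data)
samp_sd = statistics.stdev(data)

print(f"population: variance={pop_var:.4f}  sd={pop_sd:.4f}")
print(f"sample:     variance={samp_var:.4f}  sd={samp_sd:.4f}")
```

In each pair, the standard deviation is the square root of the variance, so squaring either SD recovers the matching variance.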

Why N − 1? Bessel’s Correction Explained

The N − 1 in the sample formula is called Bessel’s correction, and it makes the sample variance an unbiased estimator of the population variance. The intuition: when you compute the sample mean x̄ from the data, you have already used the data to pin one number. That costs you one degree of freedom: the deviations xᵢ − x̄ are no longer fully independent (their sum is constrained to be exactly zero). Dividing by N − 1 instead of N corrects for the slight under-estimation that this constraint produces.

For tiny samples the correction is huge: at N=2, dividing by N − 1 = 1 instead of N = 2 doubles the variance and increases the standard deviation by √2 ≈ 41%. At N=10 the gap is about 5.4%. At N=100 it is 0.5%. At N=1,000 it is effectively zero. This is why sample-vs-population matters intensely for small datasets and is irrelevant for large ones — but you should still pick the right one because it signals what kind of inference you are doing.
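The percentages above follow from one ratio: on the same dataset both formulas share the same sum of squared deviations, so s / σ = √(N / (N − 1)). A small sketch, with a hypothetical helper name bessel_gap, tabulates that gap:

```python
import math

def bessel_gap(n: int) -> float:
    """Fraction by which sample SD exceeds population SD on the same n values."""
    if n < 2:
        raise ValueError("sample SD needs at least 2 values")
    return math.sqrt(n / (n - 1)) - 1.0

for n in (2, 10, 100, 1000):
    print(f"N={n:>4}: sample SD is {bessel_gap(n):.2%} larger")
```

The gap depends only on N, not on the data values themselves.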

Worked Example — Dataset [4, 8, 6, 5, 3]

A tiny dataset is the clearest way to see the formula at work. Five values: 4, 8, 6, 5, 3.

Step 1 — Compute the mean.

x̄ = (4 + 8 + 6 + 5 + 3) / 5 = 26 / 5 = 5.2

Step 2 — Compute each deviation from the mean.

4 − 5.2 = −1.2
8 − 5.2 = +2.8
6 − 5.2 = +0.8
5 − 5.2 = −0.2
3 − 5.2 = −2.2

Notice that the deviations sum to exactly zero — that is always true by construction of the mean, and it is precisely why we need to square them before summing.

Step 3 — Square each deviation.

(−1.2)² = 1.44
(+2.8)² = 7.84
(+0.8)² = 0.64
(−0.2)² = 0.04
(−2.2)² = 4.84

Step 4 — Sum the squared deviations.

Σ = 1.44 + 7.84 + 0.64 + 0.04 + 4.84 = 14.80

Step 5 — Divide, then square-root.

Population mode (divide by N=5): σ² = 14.80 / 5 = 2.96, so σ = √2.96 ≈ 1.720.

Sample mode (divide by N−1=4): s² = 14.80 / 4 = 3.70, so s = √3.70 ≈ 1.924.

Sample s is about 12% larger than population σ here — exactly the small-N gap predicted by Bessel’s correction. If you paste [4, 8, 6, 5, 3] into the calculator above and toggle between modes, you will see these two numbers reproduced to the last decimal place.
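The five steps can also be traced in a few lines of plain Python. This is only a sketch of the arithmetic above, not the calculator's own code:

```python
import math

data = [4, 8, 6, 5, 3]
n = len(data)

mean = sum(data) / n                        # Step 1: the mean
deviations = [x - mean for x in data]       # Step 2: deviation of each value
squared = [d ** 2 for d in deviations]      # Step 3: square each deviation
sum_sq = sum(squared)                       # Step 4: sum of squared deviations

pop_sd = math.sqrt(sum_sq / n)              # Step 5, population mode (divide by N)
samp_sd = math.sqrt(sum_sq / (n - 1))       # Step 5, sample mode (divide by N - 1)

print(f"mean={mean}  sum of squares={sum_sq:.2f}")
print(f"population sd={pop_sd:.3f}  sample sd={samp_sd:.3f}")
```

Because floating-point arithmetic is not exact, intermediate values such as the sum of squares may differ from the hand calculation in the last few decimal places.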

Variance, Standard Deviation, and Standard Error

Three closely related quantities get confused constantly. They are not the same thing.

Variance (σ² or s²): the average squared deviation. It is what falls out of the formula before the final square root. Same information as standard deviation, but in squared units (dollars-squared, seconds-squared), which makes it harder to interpret intuitively but mathematically convenient for additive properties.
Standard deviation (σ or s): the square root of variance, back in the original units. This is the spread metric you report to a human reader.
Standard error (SE): the standard deviation of the sample mean, not of the data itself. SE = σ / √n. It shrinks as n grows. SE is what you use to build confidence intervals around a mean estimate; SD is what you use to describe the spread of the underlying data. Confusing them is one of the most common mistakes in applied statistics.
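To make the SD versus SE distinction concrete, here is a minimal sketch. It assumes σ is unknown and substitutes the sample SD s, which is the usual practice; the function name standard_error is hypothetical.

```python
import math
import statistics

def standard_error(values):
    """Standard error of the mean, estimated as s / sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least 2 values to estimate the standard error")
    return statistics.stdev(values) / math.sqrt(n)

data = [4.0, 8.0, 6.0, 5.0, 3.0]   # illustrative values
sd = statistics.stdev(data)        # spread of the individual observations
se = standard_error(data)          # uncertainty about the mean

print(f"SD={sd:.3f}  SE={se:.3f}")
```

SD stays roughly stable as more data arrives, while SE keeps shrinking in proportion to 1/√n.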

The Empirical Rule — 68 / 95 / 99.7

For data that follows a normal (bell-curve) distribution, standard deviation has a very specific geometric meaning:

About 68% of values fall within ±1 σ of the mean.
About 95% of values fall within ±2 σ of the mean.
About 99.7% of values fall within ±3 σ of the mean.

This is the foundation of every textbook control chart, six-sigma quality framework, and rough rule-of-thumb confidence interval. It is also the basis of the z-score: a z of 2 means “2 standard deviations from the mean,” which under normality means the value is in the outer 5% of the distribution.

Crucially, the empirical rule only applies to roughly normal distributions. Heavy-tailed distributions (financial returns, internet traffic, word frequencies) routinely violate it: the “99.7%-within-3σ” promise can be off by orders of magnitude when the data is fat-tailed. Always plot a histogram before trusting the empirical rule.
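The 68 / 95 / 99.7 figures are rounded values of the normal distribution's coverage, which can be written with the error function as P(|Z| < k) = erf(k / √2). A short sketch, with a hypothetical helper name normal_coverage:

```python
import math

def normal_coverage(k: float) -> float:
    """Probability that a normal value lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within ±{k}σ: {normal_coverage(k):.2%}")
```

These probabilities hold only under exact normality; for other distributions the coverage has to be checked against the data.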

Common Mistakes & Edge Cases

Using N when you should use N − 1. The single most common error in introductory statistics. Spreadsheets have separate functions (Excel: STDEV.P vs STDEV.S) and beginners pick the wrong one. If you are estimating spread from a sample of a larger population, you need N − 1.
Reporting SD as if it were SE. “Mean = 50, SD = 10” describes individual variability. “Mean = 50, SE = 2” describes uncertainty about the mean. Confusing the two undersells (or oversells) your precision by a factor of √n.
Treating outliers naively. One extreme value can dominate the calculation because deviations are squared. A dataset of [1, 2, 3, 4, 5, 100] has a sample SD of about 40, almost entirely driven by the single value 100. Decide before computing whether outliers are real signal or measurement error, and report results both with and without them when in doubt.
Forgetting that SD is on the original scale. If your data is in thousands of dollars, an SD of “5” means $5,000. Always carry the units; standard deviation is not dimensionless.
Comparing SDs across different units. Comparing “wage spread in USD” to “wage spread in JPY” via raw SD is meaningless. For cross-scale comparisons use the coefficient of variation: CV = σ / |μ|, a unit-free ratio.
Computing SD on data that should be log-transformed first. Right-skewed data (incomes, response times, file sizes) often becomes approximately normal under a log transform. Computing SD on the raw scale produces a number dominated by the upper tail; computing it on the log scale is frequently the right move.
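The outlier example in the list above is easy to check. The sketch below computes the sample SD of [1, 2, 3, 4, 5, 100] with and without the extreme value, using the standard-library statistics module:

```python
import statistics

with_outlier = [1, 2, 3, 4, 5, 100]
without_outlier = [x for x in with_outlier if x != 100]

sd_with = statistics.stdev(with_outlier)        # sample SD, N - 1 divisor
sd_without = statistics.stdev(without_outlier)

print(f"with 100:    s={sd_with:.2f}")
print(f"without 100: s={sd_without:.2f}")
```

Reporting both numbers side by side, as suggested above, makes the outlier's influence visible to the reader.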

How to Interpret Your Result

A standard deviation by itself is just a number — meaning comes from comparing it to something. Three useful framings:

Relative to the mean (CV). Coefficient of variation = σ ÷ |μ|. CV under 10% is a tight distribution (sensor measurements, manufactured part dimensions). 10–30% is moderate. Over 30% is wide (consumer spending, marketing response rates). CV is undefined when the mean is zero or near zero.
Relative to a benchmark. If you know the historical SD of a process is 1.5 and your current sample shows 4.2, something has changed. SPC (statistical process control) is built entirely on this comparison: out-of-control signals trigger when current SD drifts meaningfully from the historical baseline.
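Relative to a decision threshold. Six Sigma quality targets about 3.4 defects per million opportunities, a figure that conventionally assumes a 1.5σ long-term drift in the process mean; a perfectly centered process with limits at ±6σ would produce only about 2 defects per billion. The decision-relevant question is rarely “what is σ?” but “is the data tight enough that the worst-case observation stays inside our tolerance?”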
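The CV framing above can be sketched in code. The function names, the wage figures, and the exchange rate of 150 are all hypothetical, chosen only to show that CV does not change when the unit changes. Population SD is used to match the σ ÷ |μ| definition.

```python
import statistics

def coefficient_of_variation(values):
    """CV = population SD divided by |mean|; unit-free."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.pstdev(values) / abs(mean)

def describe_cv(cv):
    """Label a CV using the rough bands quoted above (10% and 30%)."""
    if cv < 0.10:
        return "tight"
    if cv <= 0.30:
        return "moderate"
    return "wide"

# Hypothetical wages in USD, then the same wages at an assumed rate of 150 JPY per USD.
usd = [40_000, 52_000, 47_000, 61_000]
jpy = [x * 150 for x in usd]

cv_usd = coefficient_of_variation(usd)
cv_jpy = coefficient_of_variation(jpy)
print(f"CV (USD)={cv_usd:.2%}  CV (JPY)={cv_jpy:.2%}  -> {describe_cv(cv_usd)}")
```

The raw SDs of the two lists differ by a factor of 150, while the two CVs agree, which is why CV is the right tool for cross-scale comparisons.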

When SD Is the Wrong Tool

Standard deviation assumes the spread metric you want is symmetric and weighted by squared distance from the mean. Three situations where it misleads:

Heavily skewed distributions. For right-skewed data (incomes, city sizes, web session lengths), the median + interquartile range tells a truer story than mean + SD. Report both.
Datasets with influential outliers. Use the median absolute deviation (MAD) or the interquartile range (IQR, 25th to 75th percentile) as robust alternatives. The IQR is essentially immune to the most extreme 25% of the data on each tail.
Multimodal distributions. A dataset with two clusters (say, weights of children and adults pooled together) has a meaningless mean and an inflated SD that describes the gap between modes rather than the spread within each. Identify the modes first; report SD per cluster.
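The robust alternatives named above can be computed with the standard library. This is a sketch under one assumption worth flagging: quartile conventions differ between tools, and the "inclusive" method chosen here is just one of them, so other software may report a slightly different IQR.

```python
import statistics

def iqr(values):
    """Interquartile range: 75th percentile minus 25th percentile."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

def mad(values):
    """Median absolute deviation from the median (unscaled)."""
    med = statistics.median(values)
    return statistics.median([abs(x - med) for x in values])

data = [1, 2, 3, 4, 5, 100]   # the outlier example from earlier on this page

print(f"sample SD={statistics.stdev(data):.2f}")
print(f"IQR={iqr(data)}  MAD={mad(data)}")
```

On this dataset the SD is pulled up to roughly 40 by the single value 100, while the IQR and MAD stay on the scale of the other five values.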

Related Calculators

Once you have a standard deviation, the next step is usually one of these tools.

Z-Score Calculator: standardize a single observation against the mean and SD you just computed.
P-Value Calculator: convert a test statistic (built from σ or s) into a probability for hypothesis testing.
Average Calculator: if you just need the mean of a dataset without the spread metrics.
Percentage Calculator: quick percent-of-total and percent-change math.
Ratio Calculator: for comparing two quantities directly rather than describing spread of one.

Frequently Asked Questions

The most common questions we get about this calculator. Each answer is kept short so you can scan.

What's the difference between population and sample standard deviation?
Population standard deviation (σ) divides the sum of squared deviations by N and applies when your dataset IS the entire population (e.g., all 50 US states, all members of a finite group). Sample standard deviation (s) divides by (N − 1) and applies when your dataset is a SAMPLE from a larger population (most real-world scenarios). The (N − 1) correction (Bessel's correction) makes the sample variance an unbiased estimator of the population variance; the sample SD itself remains slightly biased low.
What's the formula for standard deviation?
Population: σ = sqrt(Σ(xᵢ − μ)² / N). Sample: s = sqrt(Σ(xᵢ − x̄)² / (N − 1)). Both compute the same kind of spread; the divisor is the only difference. Computed on the same dataset, the ratio s/σ is sqrt(N / (N − 1)), so by N = 30 the two values differ by under 2%; for small samples, sample SD is meaningfully larger.
When should I use sample vs population mode?
Sample (s) is the right default for almost all real-world data — scientific experiments, surveys, business metrics, customer measurements. You're trying to ESTIMATE the spread of an underlying population from a finite sample. Use Population (σ) only when you've truly measured every member of the group (rare in practice).
What's variance and why do we square the deviations?
Variance = average squared deviation from the mean. We square the deviations because (a) it makes them positive (otherwise positive and negative deviations would cancel out), (b) it amplifies large deviations more than small ones (penalizing outliers), and (c) it makes the math differentiable for downstream use (least-squares regression, etc.). Standard deviation is the square root of variance — its units match the original data.
What is the coefficient of variation (CV)?
CV = σ ÷ |μ| × 100% — a unit-free measure of relative spread. Useful for comparing variability across datasets with different scales (e.g., comparing wage spreads across countries with different currencies). CV < 10% = tight distribution; 10-30% = moderate; > 30% = wide spread. CV is undefined when the mean is zero.
How does standard deviation relate to the normal distribution?
For normally-distributed data, the 68-95-99.7 rule applies: ~68% of values fall within 1 σ of the mean, ~95% within 2σ, ~99.7% within 3σ. This is the foundation for confidence intervals and hypothesis testing. The rule breaks down for non-normal distributions (skewed, bimodal, heavy-tailed) — always check the distribution shape before applying it.
What about robust alternatives to standard deviation?
SD is sensitive to outliers — one extreme value can dominate the calculation. Robust alternatives include the Interquartile Range (IQR, from 25th to 75th percentile) and the Median Absolute Deviation (MAD). For datasets with outliers or non-normal distributions, report SD + IQR + range together. The calculator surfaces min, max, and range to help spot outlier influence.
Can I paste data with extra commas or text?
Yes — the parser silently drops anything that isn't a valid number. You can paste CSV with headers, comma-separated values, whitespace-separated values, or one number per line. The calculator counts only the numeric tokens.
Why is my sample SD bigger than my population SD?
Mathematical fact: dividing by (N − 1) instead of N produces a larger result. Sample SD exceeds population SD by a factor of sqrt(N / (N − 1)): it is 41% larger at N=2, 5.4% larger at N=10, and 0.5% larger at N=100. The correction matters for small samples and becomes negligible for large ones. When in doubt, use sample SD, the standard choice for inference.
What's the maximum dataset size this calculator handles?
10,000 values. Beyond that, paste into a tool like R, Python (pandas/numpy), or Excel — they're optimized for very large datasets. For most analytical work (lab data, survey responses, sales figures), 10,000 is more than sufficient.
How accurate is the calculation?
All arithmetic is done in IEEE 754 double precision (about 15 to 17 significant decimal digits). For datasets with numbers in similar magnitudes, the result is exact to ~10 decimal places. For datasets whose values are very large relative to their spread, or that mix very large and very small numbers, catastrophic cancellation can reduce accuracy. Professional statistical packages (R, Python numpy.std) guard against this with numerically stable methods, such as computing deviations from the mean in a separate pass and using pairwise summation.
What if my data has only one value?
Population SD = 0 (no spread). Sample SD requires at least 2 values (you can't estimate spread from one observation). The calculator returns an error for sample mode with N=1. For single observations, the question 'what's the spread' is statistically meaningless — you need at least 2 points to talk about variability.
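The accuracy answer above mentions catastrophic cancellation. The sketch below contrasts a naive one-pass "sum of squares" formula with Welford's online algorithm, a well-known stable alternative. Welford's method is shown here only as an illustration; nothing on this page says the calculator or any particular package uses it, and both function names are hypothetical.

```python
def naive_sample_variance(values):
    """One-pass formula (sum of x^2 minus n * mean^2); prone to cancellation."""
    n = len(values)
    total = sum(values)
    total_sq = sum(x * x for x in values)
    return (total_sq - total * total / n) / (n - 1)

def welford_sample_variance(values):
    """Welford's online algorithm: updates a running mean and sum of squares."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    if n < 2:
        raise ValueError("sample variance needs at least 2 values")
    return m2 / (n - 1)

# Values with a large offset and a small spread; the true sample variance is 30.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]

print("naive:  ", naive_sample_variance(data))
print("welford:", welford_sample_variance(data))
```

With data like this, the naive formula subtracts two nearly equal numbers around 4 × 10¹⁸, beyond the range where doubles can represent every integer, so its result can drift away from 30. The Welford version works with small differences throughout and stays accurate.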