Variance & Standard Deviation

The mean tells you where a dataset is centered. Variance and standard deviation tell you how tightly — or how loosely — the values cluster around that center. They are the language statisticians use whenever they need to talk about spread.

What you'll leave with

  • Why a measure of spread is just as essential as a measure of center.
  • The formula for variance — and why we square the differences instead of taking absolute values.
  • How standard deviation recovers the original units of the data.
  • The difference between dividing by $n$ (population) and $n - 1$ (sample), and what Bessel's correction is correcting.
  • The 68-95-99.7 rule: a single rule of thumb that supplies much of the practical intuition for bell-shaped data.

1. Spread, not center

Imagine two basketball players, Alice and Bob. Both average 20 points per game over a season. On paper they look identical — but watch them play and the truth is obvious. Alice scores 19, 21, 20, 20, 20. Bob scores 5, 35, 10, 40, 10. Same mean, completely different stories.

The mean throws away a huge amount of information. To distinguish Alice from Bob, you need a number that captures how far the data wanders from the center. That number is what variance and standard deviation are built to measure.

Same center, different spreads

Two datasets can share a mean and still describe completely different realities. Spread is what separates a reliable, predictable process from a wildly variable one. Without it, "the average customer waits 5 minutes" tells you almost nothing useful.

The naive idea is to average the deviations of each point from the mean. But there's a problem: the positive and negative deviations always cancel exactly, as a direct consequence of how the mean is defined. Their sum is zero, every time:

$$ \sum_{i=1}^{n} (x_i - \bar{x}) = 0 $$

So we have to do something to the deviations before summing them. There are two natural choices — take absolute values, or square them. Squaring wins for reasons we'll see in a moment.
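The cancellation is easy to check numerically. The sketch below uses Alice's and Bob's scores from the example above; the helper name `deviations` is just an illustrative choice. With non-integer data, floating-point rounding can leave a tiny nonzero residue instead of an exact zero.

```python
# Points per game for the two players in the example above.
alice = [19, 21, 20, 20, 20]
bob = [5, 35, 10, 40, 10]

def deviations(values):
    """Return each value's signed deviation from the mean of the list."""
    mean = sum(values) / len(values)
    return [x - mean for x in values]

for name, scores in (("Alice", alice), ("Bob", bob)):
    devs = deviations(scores)
    # The signed deviations cancel, so their sum says nothing about spread.
    print(name, devs, "sum =", sum(devs))
```

Both sums come out to zero even though Bob's individual deviations are far larger than Alice's, which is why the signs have to be removed before averaging.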

2. Variance

Variance (population)

The mean of the squared deviations from the mean. For a population of $n$ values $x_1, x_2, \ldots, x_n$ with mean $\bar{x}$:

$$ \sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 $$

The recipe is four steps, and you should be able to do it in your sleep:

  1. Compute the mean $\bar{x}$.
  2. For each value, find its deviation from the mean: $x_i - \bar{x}$.
  3. Square each deviation.
  4. Average the squared deviations.
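The four steps translate almost line for line into code. This is a minimal sketch rather than a production routine; the function name `population_variance` is an assumption for illustration, and the dataset is the one used in the worked examples later on.

```python
def population_variance(values):
    """Population variance: the mean of the squared deviations from the mean."""
    n = len(values)
    mean = sum(values) / n                     # step 1: compute the mean
    deviations = [x - mean for x in values]    # step 2: deviation of each value
    squared = [d ** 2 for d in deviations]     # step 3: square each deviation
    return sum(squared) / n                    # step 4: average the squares

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data))  # 4.0
```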

Why squared, not absolute?

Both $|x_i - \bar{x}|$ and $(x_i - \bar{x})^2$ get rid of the sign problem. So why does every introductory textbook reach for the square? Three reasons, in increasing order of importance:

  • Squaring is smooth. The function $f(d) = d^2$ is differentiable everywhere; $|d|$ has a kink at zero. Calculus on smooth things is enormously easier, and a lot of statistics is calculus on variance.
  • Squaring punishes outliers more. A value twice as far from the mean contributes four times as much to the variance. This makes variance very sensitive to extreme observations — sometimes a feature, sometimes a bug.
  • Squared distances add cleanly. Variance of a sum of independent random variables equals the sum of their variances. The analogous statement for absolute deviation is simply false. This single property is what makes variance the right tool for most theoretical work.

Mean absolute deviation exists too

The "average absolute deviation," $\tfrac{1}{n}\sum|x_i - \bar{x}|$, is a perfectly valid measure of spread — sometimes more robust to outliers than variance. It just doesn't behave as nicely under the algebraic operations that statisticians need to do constantly, so it isn't the default.
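One way to see the "adds cleanly" property is to enumerate a small case exactly. The sketch below assumes two independent fair six-sided dice (an illustrative choice, not something from the text) and uses exact fractions, so no rounding is involved. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

die = [1, 2, 3, 4, 5, 6]  # one fair die; every face equally likely

def mean(values):
    return Fraction(sum(values), len(values))

def variance(values):
    """Population variance over equally likely outcomes."""
    m = mean(values)
    return mean([(x - m) ** 2 for x in values])

def mean_abs_dev(values):
    """Mean absolute deviation over equally likely outcomes."""
    m = mean(values)
    return mean([abs(x - m) for x in values])

# All 36 equally likely outcomes for the sum of two independent dice.
sums = [a + b for a, b in product(die, die)]

print(variance(die), variance(sums))          # 35/12 and 35/6: variances add
print(mean_abs_dev(die), mean_abs_dev(sums))  # 3/2 and 35/18: MADs do not add
```

The variance of the sum is exactly twice the variance of one die, while the mean absolute deviation of the sum (35/18) falls well short of twice 3/2.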

3. Standard deviation

Variance has one annoying property: its units are the square of the data's units. If your numbers are heights in centimeters, the variance is in cm² — a quantity nobody has any intuition for. The fix is just to take the square root.

Standard deviation

The square root of the variance. For a population:

$$ \sigma = \sqrt{\sigma^2} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2} $$

Standard deviation is in the same units as the original data, which makes it the right number to quote when communicating with humans.

You should think of variance as the right thing to work with algebraically, and standard deviation as the right thing to report. They carry exactly the same information — knowing one tells you the other — but they live in different units, and that matters when you have to interpret a number.

If you want to do math, use variance. If you want to talk to a person, use standard deviation.

4. Sample versus population

So far we've assumed you know every value in the dataset — that the $n$ numbers are the whole population. In practice you almost never do; you have a sample, and you're using it to estimate properties of an unseen population.

When estimating from a sample, the formula changes slightly. The sample variance divides by $n - 1$ instead of $n$:

$$ s^2 = \frac{1}{n - 1}\sum_{i=1}^{n}(x_i - \bar{x})^2 $$

This adjustment is called Bessel's correction. The reason is subtle but worth understanding once.

When you compute the sample mean $\bar{x}$ from the data and then measure deviations from $\bar{x}$, the squared deviations add up to no more than they would around the true (unknown) population mean, and usually to less. The sample mean is, by construction, the value that minimizes the sum of squared deviations from the sample. So using $\bar{x}$ instead of the true mean understates the spread.

Formally, you've spent one "degree of freedom" estimating the mean. You only have $n - 1$ independent pieces of information left for estimating the variance. Dividing by $n - 1$ exactly cancels the downward bias and produces an estimator whose expected value equals the true population variance.
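The bias can be verified exactly on a toy case. The sketch below assumes a hypothetical population (the six faces of a fair die) and enumerates every possible sample of size 2 drawn with replacement, then averages both versions of the variance estimate. All names are illustrative.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4, 5, 6]  # hypothetical population
n = 2                            # sample size, drawn with replacement

def mean(values):
    return Fraction(sum(values), len(values))

def sum_sq_dev(values):
    """Sum of squared deviations from the values' own mean."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values)

true_variance = sum_sq_dev(population) / len(population)

# Every sample of size n is equally likely, so a plain average over all of
# them is the expected value of each estimator.
samples = list(product(population, repeat=n))
avg_div_n = mean([sum_sq_dev(s) / n for s in samples])
avg_div_n_minus_1 = mean([sum_sq_dev(s) / (n - 1) for s in samples])

print(true_variance)       # 35/12
print(avg_div_n)           # 35/24: too small by a factor of (n - 1) / n
print(avg_div_n_minus_1)   # 35/12: matches the true variance
```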

Which formula to use?

Almost always: divide by $n - 1$. Calculators and spreadsheets default to it for a reason. Only divide by $n$ when you genuinely have the entire population — every employee, every patient, every member of the set — which in practice is rare.

| Quantity | Symbol | Divisor or definition | When |
| --- | --- | --- | --- |
| Population variance | $\sigma^2$ | $n$ | You have every value in the population |
| Sample variance | $s^2$ | $n - 1$ | You have a sample and want to estimate $\sigma^2$ |
| Population SD | $\sigma$ | $\sqrt{\sigma^2}$ | |
| Sample SD | $s$ | $\sqrt{s^2}$ | |
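In code the two conventions usually live in separate functions. As one example, Python's standard-library `statistics` module provides `pvariance` and `pstdev` (divide by $n$) alongside `variance` and `stdev` (divide by $n - 1$). The dataset below is the one from the worked examples.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = statistics.pvariance(data)   # divides by n
samp_var = statistics.variance(data)   # divides by n - 1
pop_sd = statistics.pstdev(data)       # square root of pop_var
samp_sd = statistics.stdev(data)       # square root of samp_var

print(pop_var, pop_sd)                       # 4 and 2.0
print(round(samp_var, 3), round(samp_sd, 3)) # 4.571 and 2.138
```

Other libraries pick different defaults, so it is worth checking which divisor a given function uses before quoting its output.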

5. The empirical rule (68-95-99.7)

For data that follows a roughly bell-shaped (normal) distribution, the standard deviation comes with a remarkable interpretation. Almost the entire dataset lives within just a few standard deviations of the mean:

  • About 68% of values fall within $\pm 1\sigma$ of the mean.
  • About 95% fall within $\pm 2\sigma$.
  • About 99.7% fall within $\pm 3\sigma$.

This is the single most useful rule of thumb in elementary statistics. Given a mean and a standard deviation, you can sketch the shape of an entire bell-curve dataset on a napkin.
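The three percentages are rounded values of a formula: for a normal distribution, the probability of landing within $k$ standard deviations of the mean is $\operatorname{erf}(k/\sqrt{2})$. A short sketch (the function name is an illustrative choice):

```python
import math

def fraction_within(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within ±{k}σ: {fraction_within(k):.4f}")
# within ±1σ: 0.6827
# within ±2σ: 0.9545
# within ±3σ: 0.9973
```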

[Figure: normal distribution curve with bands illustrating the 68-95-99.7 rule. ±1σ ≈ 68%, ±2σ ≈ 95%, ±3σ ≈ 99.7%; each band between 1σ and 2σ holds about 13.5%, and each band between 2σ and 3σ about 2.35%.]

An IQ test, for example, is calibrated so the mean is 100 and the standard deviation is 15. The empirical rule then tells you immediately: roughly two-thirds of people score between 85 and 115, about 19 in 20 score between 70 and 130, and a score outside 55–145 happens to about three people in a thousand.

It's only for normal distributions

The 68-95-99.7 numbers are properties of the bell curve specifically. For income data, response times, or anything else with a heavy tail, the percentages can be wildly different — sometimes most of the data sits within one standard deviation, sometimes far less. Don't apply the rule until you've checked the shape.

6. Chebyshev's inequality

The empirical rule is sharp but only works for normal data. When you have no idea what shape your distribution takes, there's a weaker but universal result that still gives you a guarantee: Chebyshev's inequality.

Chebyshev's inequality

For any distribution with finite mean $\mu$ and finite standard deviation $\sigma$, and for any $k > 0$ (the bound only says something useful when $k > 1$):

$$ P\bigl(|X - \mu| \geq k\sigma\bigr) \leq \frac{1}{k^2} $$

Equivalently, at least $1 - 1/k^2$ of the data lies within $k$ standard deviations of the mean.

Plug in $k = 2$: at least $1 - 1/4 = 75\%$ of any dataset sits within $\pm 2\sigma$. Plug in $k = 3$: at least $1 - 1/9 \approx 88.9\%$ sits within $\pm 3\sigma$. The bounds are loose — for normal data the true numbers (95%, 99.7%) are much higher — but they hold for every distribution, no matter how exotic.
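To see the guarantee hold on data that is nowhere near bell-shaped, the sketch below compares the Chebyshev bound with the observed fraction for a small, made-up, heavily skewed dataset. The data and function names are assumptions for illustration only.

```python
import statistics

def fraction_within(values, k):
    """Fraction of values strictly closer than k population SDs to the mean."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    inside = [x for x in values if abs(x - mu) < k * sigma]
    return len(inside) / len(values)

def chebyshev_lower_bound(k):
    """Minimum fraction guaranteed within k standard deviations."""
    return 1 - 1 / k ** 2

skewed = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # made-up data with one huge value

for k in (1.5, 2, 3):
    observed = fraction_within(skewed, k)
    bound = chebyshev_lower_bound(k)
    print(f"k={k}: observed {observed:.2f}, guaranteed at least {bound:.3f}")
```

The observed fraction never drops below the bound, though it usually sits well above it, which is what "loose but universal" means in practice.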

When to use which

Use the empirical rule when you know (or can credibly assume) the data is roughly bell-shaped. Use Chebyshev when you don't know, or when the data is clearly skewed or heavy-tailed. Chebyshev gives a guarantee; the empirical rule gives a tight approximation.

7. Z-scores

Once you know a distribution's mean and standard deviation, you can re-express every value in a universal currency: how many standard deviations is this above or below the mean? That number is the z-score.

Z-score (standardized value)

$$ z = \frac{x - \mu}{\sigma} $$

A z-score of $+1.5$ means "1.5 standard deviations above the mean." A z-score of $-0.5$ means "half a standard deviation below the mean."

Z-scores let you compare apples and oranges. A score of $85$ on a test with mean $70$ and SD $10$ has $z = 1.5$. A height of $190$ cm in a population with mean $175$ cm and SD $7$ cm has $z \approx 2.14$. The height is more "extreme" relative to its distribution than the test score is to its — and the z-score makes that visible without any further work.
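The two comparisons above take one line each in code. A minimal sketch, with `z_score` as an illustrative helper name:

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

test_z = z_score(85, mean=70, sd=10)
height_z = z_score(190, mean=175, sd=7)

print(test_z)              # 1.5
print(round(height_z, 2))  # 2.14
```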

Combined with the empirical rule, z-scores give an instant sanity check. A z of $\pm 1$ is unremarkable. A z of $\pm 2$ is rare-ish. A z of $\pm 3$ or beyond is genuinely unusual for normal data: about one value in 370 lands that far out, counting both tails together.

8. Other measures of spread: range and IQR

Standard deviation isn't the only way to summarize spread. Two simpler alternatives sit at opposite ends of the trade-off curve.

Range

The range is just the maximum minus the minimum:

$$ \text{range} = \max(x) - \min(x) $$

It's trivial to compute and easy to explain. It's also brittle: a single outlier can blow it up arbitrarily, and it ignores everything between the two extremes. Use it for a quick glance, not for serious analysis.

Interquartile range (IQR)

The interquartile range is the spread of the middle 50% of the data:

$$ \text{IQR} = Q_3 - Q_1 $$

where $Q_1$ and $Q_3$ are the 25th and 75th percentiles. By construction, the IQR ignores the most extreme quarter of values on each side, which makes it robust to outliers: a single absurd value can barely move it, if at all.

| Measure | Formula | Strength | Weakness |
| --- | --- | --- | --- |
| Range | $\max - \min$ | Trivial to compute | Wrecked by a single outlier |
| IQR | $Q_3 - Q_1$ | Robust to outliers | Throws away half the data |
| Standard deviation | $\sqrt{\tfrac{1}{n}\sum(x_i - \bar{x})^2}$ | Uses every point; algebra-friendly | Sensitive to outliers |

Reach for the IQR when your data is skewed or contaminated. Reach for the standard deviation when the distribution is roughly symmetric and you want to do any downstream math.
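The trade-offs show up clearly when one value in a dataset is corrupted. The sketch below uses Python's `statistics.quantiles` with `method="inclusive"`; quartile conventions differ between tools, so other software may report slightly different quartiles for the same data. The contaminated dataset is a made-up example of a data-entry error.

```python
import statistics

def value_range(values):
    return max(values) - min(values)

def iqr(values):
    """Interquartile range using the 'inclusive' quartile convention."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

clean = [2, 4, 4, 4, 5, 5, 7, 9]
contaminated = [2, 4, 4, 4, 5, 5, 7, 900]  # hypothetical typo: 9 entered as 900

for label, data in (("clean", clean), ("contaminated", contaminated)):
    print(label, value_range(data), iqr(data), round(statistics.pstdev(data), 2))
```

The range jumps from 7 to 898 and the standard deviation grows enormously, while the IQR stays at 1.5 in both cases.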

9. Playground: see the spread

Drag the sliders to change the mean $\mu$ and the standard deviation $\sigma$. Watch the bell curve shift and stretch — and notice that the $\pm 1\sigma$ band always traps about 68% of the area, no matter how wide or narrow you make the curve.

[Interactive plot: a normal curve with sliders for μ and σ. At the default setting μ = 0.0, σ = 1.00 (σ² = 1.00), 68% of values fall in [μ−σ, μ+σ] = [−1.00, 1.00] and 95% fall in [μ−2σ, μ+2σ] = [−2.00, 2.00].]
Try it

Pin $\sigma = 1$ and slide $\mu$ left and right — the curve translates but keeps its shape. Now pin $\mu = 0$ and slide $\sigma$ — the curve flattens or pinches without moving its center. Variance ($\sigma^2$) grows quadratically: doubling $\sigma$ quadruples $\sigma^2$. That's the single most important fact about how spread scales.
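The same two behaviors can be checked on raw data rather than on a curve: adding a constant moves the center without touching the spread, and multiplying by 2 doubles the standard deviation and quadruples the variance. A small sketch using the dataset from the worked examples:

```python
import statistics

base = [2, 4, 4, 4, 5, 5, 7, 9]
shifted = [x + 10 for x in base]     # slide the center, keep the shape
stretched = [2 * x for x in base]    # double every distance from zero

for label, data in (("base", base), ("shifted", shifted), ("stretched", stretched)):
    print(label, statistics.fmean(data), statistics.pstdev(data), statistics.pvariance(data))
```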

10. Common pitfalls

Variance is in squared units

If your data is in dollars, the variance is in dollars-squared — a unit no one has intuition for. This is the entire reason standard deviation exists. When reporting spread to a reader, always report the standard deviation, not the variance.

$n$ versus $n - 1$ matters more than people think

For $n = 10$, dividing by 9 instead of 10 increases the variance by about 11% and the standard deviation by about 5%. For $n = 5$, dividing by 4 instead of 5 inflates the variance by 25% and the SD by about 12%. The gap in the SD shrinks roughly like $1/(2n)$, so it becomes negligible only when $n$ is large; for small samples, the choice of divisor visibly changes the answer.
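The size of the effect depends only on $n$: the sample SD equals the population-formula SD multiplied by $\sqrt{n/(n-1)}$. A quick sketch of that ratio (the function name is an illustrative choice):

```python
import math

def sd_inflation(n):
    """Ratio of the divide-by-(n - 1) SD to the divide-by-n SD for the same data."""
    return math.sqrt(n / (n - 1))

for n in (5, 10, 100, 1000):
    extra = (sd_inflation(n) - 1) * 100
    print(f"n={n}: SD is {extra:.2f}% larger with n - 1")
```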

68-95-99.7 only applies to normal data

It is tempting to use the rule as though it were a universal law. It isn't. For a uniform distribution on $[0, 1]$, 100% of the data sits within $\sqrt{3} \approx 1.73$ standard deviations of the mean. For a heavy-tailed distribution, more than 5% of values can sit beyond $\pm 2\sigma$. Always check the shape first.

Outliers wreck variance

Because deviations are squared, a single point ten standard deviations from the mean contributes 100 times as much to the variance as a typical point one standard deviation away. One bad data point can double the standard deviation of an otherwise clean dataset. If your data has extreme values, consider robust measures of spread like the interquartile range.
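A quick check of how much damage one point can do, using the dataset from the worked examples plus a single made-up extreme value:

```python
import statistics

clean = [2, 4, 4, 4, 5, 5, 7, 9]
with_outlier = clean + [25]   # one hypothetical extreme observation

sd_clean = statistics.pstdev(clean)
sd_outlier = statistics.pstdev(with_outlier)

print(sd_clean)                        # 2.0
print(round(sd_outlier, 2))            # 6.56
print(round(sd_outlier / sd_clean, 1)) # ratio of the two
```

Here one extra point out of nine more than triples the standard deviation.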

11. Worked examples

Work through each one before reading the solution. The goal is to internalize the four-step recipe (compute the mean, find deviations, square them, average them) and to see how the resulting numbers actually behave.

Example 1 · Variance of $\{2, 4, 4, 4, 5, 5, 7, 9\}$ (population)

Step 1. Compute the mean:

$$ \bar{x} = \frac{2 + 4 + 4 + 4 + 5 + 5 + 7 + 9}{8} = \frac{40}{8} = 5 $$

Step 2. Deviations from the mean: $-3, -1, -1, -1, 0, 0, 2, 4$.

Step 3. Square each: $9, 1, 1, 1, 0, 0, 4, 16$.

Step 4. Average the squared deviations:

$$ \sigma^2 = \frac{9 + 1 + 1 + 1 + 0 + 0 + 4 + 16}{8} = \frac{32}{8} = 4 $$

So the population variance is $4$.

Example 2 · Standard deviation of the same dataset

From Example 1, $\sigma^2 = 4$. Standard deviation is just the square root:

$$ \sigma = \sqrt{4} = 2 $$

Notice that the original data had values ranging from 2 to 9, with mean 5. A standard deviation of 2 is exactly the right order of magnitude — most points are within 2 units of the mean, which lines up with what you see.

Example 3 · Why squaring beats absolute value

Take a tiny dataset: $\{-2, 2\}$ with mean $0$. The mean absolute deviation is

$$ \text{MAD} = \frac{|-2| + |2|}{2} = 2 $$

That feels right — each point is exactly 2 units from the mean. Now compute the standard deviation:

$$ \sigma = \sqrt{\frac{(-2)^2 + 2^2}{2}} = \sqrt{\frac{8}{2}} = 2 $$

For symmetric two-point sets, the two measures agree. But add a third point at $0$ to get $\{-2, 0, 2\}$:

$$ \text{MAD} = \tfrac{4}{3} \approx 1.33, \qquad \sigma = \sqrt{\tfrac{8}{3}} \approx 1.63 $$

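Standard deviation is larger because squaring weights the far points more heavily. That extra weight on distant points is the price of squaring; what squaring buys in return is the clean algebra (smoothness, and additivity for independent variables) that absolute values lack. It is also the standard deviation, not the MAD, that the empirical rule is stated in.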
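Both measures for the two small datasets can be reproduced in a few lines. The helper `mean_abs_dev` is an illustrative name; the standard deviation comes from the standard library.

```python
import statistics

def mean_abs_dev(values):
    """Average absolute distance of the values from their mean."""
    mu = statistics.fmean(values)
    return statistics.fmean([abs(x - mu) for x in values])

for data in ([-2, 2], [-2, 0, 2]):
    mad = mean_abs_dev(data)
    sd = statistics.pstdev(data)
    print(data, round(mad, 2), round(sd, 2))
# [-2, 2] 2.0 2.0
# [-2, 0, 2] 1.33 1.63
```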

Example 4 · Applying the 68-95 rule to test scores

A class of students takes a standardized test. The scores are approximately normally distributed with mean $\mu = 500$ and standard deviation $\sigma = 100$. What fraction of students scored between $400$ and $700$?

Step 1. Convert the bounds into standard-deviation units. $400$ is $-1\sigma$ from the mean; $700$ is $+2\sigma$.

Step 2. Use the symmetric percentages from the empirical rule. The interval $[\mu - 1\sigma, \mu + 1\sigma]$ captures $68\%$, so each half (the left or the right) holds $34\%$. The interval $[\mu, \mu + 2\sigma]$ captures half of $95\%$, which is $47.5\%$.

Step 3. Add the two halves:

$$ 34\% + 47.5\% = 81.5\% $$

So about $81.5\%$ of students scored between $400$ and $700$.
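The empirical-rule answer can be compared with the exact normal probability. The sketch below uses `statistics.NormalDist` from Python's standard library and assumes the scores are exactly normal with the stated mean and standard deviation.

```python
from statistics import NormalDist

scores = NormalDist(mu=500, sigma=100)

# Probability of a score between 400 and 700 under an exactly normal model.
exact = scores.cdf(700) - scores.cdf(400)
print(round(exact, 4))  # 0.8186
```

The exact figure of about 81.9% differs from the rule-of-thumb 81.5% only because 68% and 95% are themselves rounded.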

Example 5 · Same mean, different spread

Two five-game seasons for Alice and Bob, both averaging $20$ points per game.

Alice: $\{19, 21, 20, 20, 20\}$, mean $20$.

$$ \sigma^2_A = \frac{(-1)^2 + 1^2 + 0 + 0 + 0}{5} = \frac{2}{5} = 0.4 \quad\Longrightarrow\quad \sigma_A \approx 0.63 $$

Bob: $\{5, 35, 10, 40, 10\}$, mean $20$.

$$ \sigma^2_B = \frac{225 + 225 + 100 + 400 + 100}{5} = \frac{1050}{5} = 210 \quad\Longrightarrow\quad \sigma_B \approx 14.49 $$

Same mean. Standard deviations differ by a factor of more than 20. Alice is a reliable scorer; Bob is a coin flip. The mean alone could never have told you that — but the standard deviation makes the difference unmistakable.
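The same comparison in code, treating each five-game season as the whole population (so dividing by $n$, as in the calculation above):

```python
import statistics

alice = [19, 21, 20, 20, 20]
bob = [5, 35, 10, 40, 10]

for name, scores in (("Alice", alice), ("Bob", bob)):
    mean = statistics.fmean(scores)
    var = statistics.pvariance(scores)
    sd = statistics.pstdev(scores)
    print(name, mean, var, round(sd, 2))
# Alice 20.0 0.4 0.63
# Bob 20.0 210 14.49
```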
