1. Three notions of "typical"
Suppose you have a list of numbers — test scores, salaries, house prices, the number of pets in each household on your street. You want to communicate one fact: what's the typical value? There isn't a single answer, because "typical" isn't one idea. It's at least three.
- Mean — the balance point. If you piled the data on a seesaw, this is where it would balance.
- Median — the middle of the pack. Half the data sits below it, half above.
- Mode — the most common value. The peak of the histogram.
On a perfectly symmetric, single-peaked dataset they coincide. On almost anything real — incomes, response times, file sizes, exam scores with a long tail — they separate, and the gap between them tells you something about the shape of the data. Knowing which one to use, and which one someone else has used on you, is half of basic statistical literacy.
These three numbers all belong to a family called measures of central tendency. Each one is a different rule for collapsing a list into a single representative value.
2. The mean (arithmetic average)
The sum of all values divided by the count of values. For a dataset $x_1, x_2, \ldots, x_n$:
$$ \bar{x} \;=\; \frac{1}{n} \sum_{i=1}^{n} x_i \;=\; \frac{x_1 + x_2 + \cdots + x_n}{n} $$
The mean is what most people mean when they say "average." It's the value that, if every data point were replaced by it, would leave the total unchanged. It's also the balance point of the data: place one unit of mass at each $x_i$ on the number line, and the mean is where the line would balance on a fulcrum.
It has two things going for it: it's algebraically friendly (you can compute it in a single pass, you can combine means of subgroups, it has tidy properties in proofs), and it uses every data point. Nothing is thrown away.
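Two of those algebraic conveniences can be sketched in a few lines of Python. The function names `running_mean` and `combine_means` are made up for this illustration, and the split of the salary list into two groups is arbitrary.

```python
def running_mean(values):
    """Single-pass mean: update the estimate as each value arrives."""
    mean = 0.0
    count = 0
    for x in values:
        count += 1
        # Move the current estimate toward x by 1/count of the gap.
        mean += (x - mean) / count
    return mean


def combine_means(mean_a, n_a, mean_b, n_b):
    """Mean of two groups taken together, from their means and sizes only."""
    return (mean_a * n_a + mean_b * n_b) / (n_a + n_b)


# Arbitrary split of the salary list 30, 32, 35, 38, 40 into two subgroups.
group_a = [30, 32, 35]
group_b = [38, 40]

mean_a = running_mean(group_a)
mean_b = running_mean(group_b)
overall = combine_means(mean_a, len(group_a), mean_b, len(group_b))
print(overall)  # approximately 35, the mean of all five values
```

Note that combining subgroup means needs the group sizes as well as the means; the two means alone are not enough.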
That same property, using every data point, is also its weakness. A single extreme value can drag the mean a long way. The salaries
$$ 30,\; 32,\; 35,\; 38,\; 40 \quad (\text{thousand dollars}) $$
have mean $35$. Replace one person with a CEO earning $1{,}000$:
$$ 30,\; 32,\; 35,\; 38,\; 1000 \quad\Longrightarrow\quad \bar{x} = 227 $$
Not one of the five people earns anything close to $227$k. The mean is mathematically correct and descriptively misleading. This is the central thing to remember about the mean: one outlier can move it as much as you want.
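To make "as much as you want" concrete, here is a small sketch that works backwards: given the four ordinary salaries, it computes the single extra value needed to hit any target mean. The helper name `outlier_for_target_mean` and the target of 500 are assumptions chosen for illustration.

```python
from statistics import mean


def outlier_for_target_mean(others, target):
    """Value that, added to `others`, makes the mean equal `target`."""
    n = len(others) + 1
    # The total must be target * n, so the new value supplies the shortfall.
    return target * n - sum(others)


others = [30, 32, 35, 38]  # the four non-CEO salaries, in thousands

ceo = outlier_for_target_mean(others, 227)
print(ceo, mean(others + [ceo]))

# Any other target works the same way; 500 is an arbitrary choice.
bigger = outlier_for_target_mean(others, 500)
print(bigger, mean(others + [bigger]))
```

Four of the five values never change, yet the mean lands wherever the fifth value sends it.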
There are other "means" in statistics — geometric, harmonic, weighted, trimmed. Unless someone tells you otherwise, "the mean" means the arithmetic mean defined above. The geometric mean shows up when averaging ratios or growth rates; the harmonic mean when averaging rates like speed.
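As a rough illustration of the last two cases, Python's standard `statistics` module provides `geometric_mean` (Python 3.8+) and `harmonic_mean`. The growth factors and speeds below are invented for the example.

```python
from statistics import geometric_mean, harmonic_mean

# Hypothetical yearly growth factors: +10% one year, -10% the next.
growth_factors = [1.10, 0.90]
typical_growth = geometric_mean(growth_factors)
# Two years at `typical_growth` compound to 1.10 * 0.90 = 0.99,
# so the typical yearly factor is slightly below 1, not exactly 1.
print(typical_growth)

# Hypothetical trip: the same distance driven once at 40 km/h and once at 60 km/h.
speeds = [40, 60]
average_speed = harmonic_mean(speeds)
# More time is spent on the slow leg, so the result is below the arithmetic mean of 50.
print(average_speed)
```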
3. The median
Sort the values. The median is the one in the middle. If the count $n$ is odd, it's the single middle value. If $n$ is even, it's the average of the two middle values.
The recipe in steps:
- Sort the values from smallest to largest.
- If $n$ is odd, the median is the value at position $(n+1)/2$.
- If $n$ is even, the median is the average of the values at positions $n/2$ and $n/2 + 1$.
For the salary list $30, 32, 35, 38, 40$ the median is the third value, $35$. Replace one person with the CEO at $1{,}000$:
$$ 30,\; 32,\; 35,\; 38,\; 1000 $$
The median is still $35$. The CEO's exact salary doesn't matter; only their rank does. Push them to a million or a billion and the median doesn't move. This property has a name: the median is robust to outliers.
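The sort-and-pick recipe translates almost directly into code. This is a minimal sketch; `median_by_recipe` is a name invented here, and in practice `statistics.median` from the standard library does the same job.

```python
def median_by_recipe(values):
    """Median via sorting, following the odd/even rule."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: 1-indexed position (n + 1) / 2 is 0-indexed position n // 2.
        return ordered[mid]
    # Even count: average the values at 1-indexed positions n/2 and n/2 + 1.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_by_recipe([30, 32, 35, 38, 40]))
print(median_by_recipe([30, 32, 35, 38, 1000]))
print(median_by_recipe([30, 32, 35, 38, 10**9]))
```

All three calls give the same answer, because the largest value only ever occupies the last rank.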
Geometrically, the median splits the dataset into two equal-sized halves. At least half of the values are at or below it, and at least half are at or above. That's the precise sense in which it's "the middle."
On right-skewed data such as incomes, house prices, and response times, the long tail of large values pulls the mean to the right, while the median (which only cares about rank) stays anchored near the bulk of the data and the mode sits at the peak.
4. The mode
The value (or values) that occur most often in the dataset.
The mode is the simplest of the three to state and the trickiest to use well. For the list
$$ 2,\; 3,\; 3,\; 5,\; 7,\; 7,\; 7,\; 9 $$
the mode is $7$: it appears three times, more than any other value. For
$$ 2,\; 3,\; 3,\; 5,\; 7,\; 7,\; 9 $$
there are two modes, $3$ and $7$, each appearing twice. A dataset with two modes is called bimodal; more than two, multimodal. A dataset where every value occurs once has no useful mode at all.
The mode is the only one of the three that works on data that isn't numeric. Ask "what's the most common eye colour in this room?" and only the mode can answer — there's no meaningful way to add or rank eye colours, but you can count them. This is why the mode is the natural summary for categorical data: favourite ice cream flavour, browser used, blood type, vote choice.
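Because the mode only needs counting, one small function can handle both numbers and categories. The sketch below uses `collections.Counter`; the function name `modes` and the eye-colour sample are made up for illustration.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest count, in sorted order."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


print(modes([2, 3, 3, 5, 7, 7, 7, 9]))  # one mode
print(modes([2, 3, 3, 5, 7, 7, 9]))     # two modes: bimodal

# Hypothetical categorical data: counting works even though adding does not.
eye_colours = ["brown", "blue", "brown", "green", "brown", "blue"]
print(modes(eye_colours))
```

Returning a list rather than a single value keeps ties visible instead of silently picking one.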
If your data is something like exact human heights to the millimetre, every measured value is probably unique — so technically every value is a mode, which is useless. For continuous data, people usually mean the mode of the density (the peak of a histogram or smoothed curve), which depends on how you bin or smooth. It's a different beast from the discrete mode.
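The dependence on binning is easy to see with a toy example. The heights below are invented, and `binned_mode` is a hypothetical helper that returns the edges of the fullest bin, breaking ties in favour of the lowest bin (itself an arbitrary choice).

```python
from collections import Counter


def binned_mode(values, bin_width):
    """Return (low, high) edges of the most populated bin of the given width."""
    bins = Counter(int(v // bin_width) for v in values)
    # Highest count wins; on a tie, prefer the lowest bin index.
    index, _ = max(bins.items(), key=lambda item: (item[1], -item[0]))
    low = index * bin_width
    return (low, low + bin_width)


# Made-up heights in millimetres; every value is unique.
heights_mm = [1612, 1648, 1655, 1671, 1689, 1694, 1703, 1718, 1752, 1790]

print(max(Counter(heights_mm).values()))  # highest raw count: no value repeats
for width in (20, 50, 100):
    print(width, binned_mode(heights_mm, width))
```

The same ten numbers yield a different "mode" for each bin width, which is why a continuous mode should always be reported together with how it was binned or smoothed.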
5. The weighted mean
Not every value in a dataset deserves equal say. A course grade made of 40% final exam, 30% midterm, 20% homework, and 10% participation isn't the simple average of those four numbers — it's a weighted mean, where each value gets multiplied by its importance before summing.
For values $x_1, \ldots, x_n$ with non-negative weights $w_1, \ldots, w_n$:
$$ \bar{x}_w \;=\; \frac{\sum_{i=1}^n w_i \, x_i}{\sum_{i=1}^n w_i} $$
When every $w_i$ is equal, this collapses back to the ordinary arithmetic mean.
Example. A student scores $90$ on the exam (weight $0.4$), $80$ on the midterm (weight $0.3$), $70$ on homework (weight $0.2$), and $60$ on participation (weight $0.1$). The weighted mean is
$$ \bar{x}_w \;=\; \frac{0.4(90) + 0.3(80) + 0.2(70) + 0.1(60)}{0.4 + 0.3 + 0.2 + 0.1} \;=\; \frac{36 + 24 + 14 + 6}{1} \;=\; 80 $$
The simple unweighted average of $90, 80, 70, 60$ would be $75$. The weights matter: they shift the mean toward the values that count for more.
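The same calculation as a minimal Python sketch. The function name `weighted_mean` and its input checks are choices made for this example, not a standard API.

```python
def weighted_mean(values, weights):
    """Sum of weight * value, divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("at least one weight must be positive")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


scores = [90, 80, 70, 60]        # exam, midterm, homework, participation
weights = [0.4, 0.3, 0.2, 0.1]

course_grade = weighted_mean(scores, weights)
unweighted = weighted_mean(scores, [1, 1, 1, 1])  # equal weights: ordinary mean
print(course_grade, unweighted)
```

Because the formula divides by the total weight, only the relative sizes of the weights matter: `[40, 30, 20, 10]` gives the same grade as `[0.4, 0.3, 0.2, 0.1]`. The first result may differ from 80 in the last decimal places because of floating-point rounding.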
Weighted means appear in grade calculations, GPA, batting averages combined across seasons, opinion polls that combine subgroups of different sizes, and any "average across averages" where the underlying groups had different counts. Whenever you find yourself averaging averages, ask whether you should be weighting.
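A short sketch of the averaging-averages trap, using two invented class sections of different sizes. The section data and variable names are hypothetical.

```python
# Hypothetical per-section summaries: group size and group mean.
sections = [
    {"name": "A", "students": 10, "mean_score": 90},
    {"name": "B", "students": 30, "mean_score": 70},
]

# Naive: treat each section mean as one data point, ignoring group size.
naive = sum(s["mean_score"] for s in sections) / len(sections)

# Weighted: each section mean counts once per student it represents.
total_students = sum(s["students"] for s in sections)
weighted = sum(s["students"] * s["mean_score"] for s in sections) / total_students

print(naive, weighted)
```

The weighted figure is what you would get by pooling all 40 students' scores; the naive figure overstates it because the small, high-scoring section gets the same say as the large one.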
6. Skewness: what the three numbers say together
The relationship between mean, median, and mode is itself a measurement — a quick read on the shape of the distribution. You don't need a histogram to detect skew; you just need to compare two or three numbers.
| Relationship | Shape | Example |
|---|---|---|
| mean $\approx$ median $\approx$ mode | Symmetric, single-peaked | Heights of adults, IQ scores |
| mean $>$ median $>$ mode | Right-skewed (positive skew) — long tail to the right | Incomes, house prices, response times |
| mean $<$ median $<$ mode | Left-skewed (negative skew) — long tail to the left | Age at death in a developed country, exam scores when most pass easily |
The intuition: the mean follows the tail because every extreme value drags it. The median, anchored by rank, barely moves. The mode sits at the peak. So the longer the tail, the further the mean is pulled away from the mode in the direction of the tail, with the median sitting between them.
The skew is named for the direction of the tail, not the bulk. "Right-skewed" means a long right tail (with most of the data on the left). The mean ends up on the tail side.
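The comparison can be sketched as a small helper. The name `skew_direction` and the 5% tolerance (the mean-median gap as a fraction of the data range) are arbitrary assumptions for this example; there is no standard cut-off for "approximately equal," and the left-skewed sample is invented.

```python
from statistics import mean, median


def skew_direction(values, tolerance=0.05):
    """Rough skew label from the sign and size of mean minus median."""
    spread = max(values) - min(values)
    if spread == 0:
        return "roughly symmetric"
    gap = mean(values) - median(values)
    # Treat small gaps (relative to the range) as no clear skew.
    if abs(gap) / spread <= tolerance:
        return "roughly symmetric"
    # The mean sits on the tail side of the median.
    return "right-skewed" if gap > 0 else "left-skewed"


print(skew_direction([30, 32, 35, 38, 40]))
print(skew_direction([30, 32, 35, 38, 1000]))
print(skew_direction([1, 90, 92, 95, 98]))  # made-up data with one low outlier
```

This is a rule-of-thumb check, not a formal skewness statistic, and it can be fooled by small or multi-peaked datasets.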
Bimodal distributions
If a distribution has two peaks, neither the mean nor the median tells you much on its own — they'll likely fall in the valley between the two modes, where almost no data lives. A dataset of adult shoe sizes mixed across men and women, or wait times for an elevator at two distinct shift changes, will look bimodal. The honest summary in those cases is "two modes, near $X$ and $Y$" — and ideally, splitting the dataset into the two subpopulations first and reporting each separately.
A bimodal dataset reported with one mean and one median erases the very structure that makes it interesting. If "the average commute is 35 minutes" hides two distinct populations — one with 15-minute walks and one with 55-minute drives — the average describes nobody. Plot the data before summarizing it.
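The commute example can be reproduced with invented numbers. The two groups and their minute values below are hypothetical, chosen so that the overall mean comes out at 35.

```python
from statistics import mean, median

# Made-up commute times in minutes for two subpopulations.
commutes = {
    "walk": [13, 14, 15, 15, 16, 17],
    "drive": [53, 54, 55, 55, 56, 57],
}

everyone = commutes["walk"] + commutes["drive"]
overall_mean = mean(everyone)
overall_median = median(everyone)

# How many people are within 10 minutes of the "average" commute?
near_average = [m for m in everyone if abs(m - overall_mean) <= 10]

# Summarizing each subpopulation separately recovers the structure.
per_group = {group: mean(minutes) for group, minutes in commutes.items()}

print(overall_mean, overall_median, len(near_average), per_group)
```

Both the mean and the median land in the empty valley between the groups, and nobody's commute is within ten minutes of either; the per-group means describe real people.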
7. When each is the right choice
The choice isn't aesthetic — it depends on the shape of the data and the question you want answered.
| Data looks like | Reach for | Why |
|---|---|---|
| Roughly symmetric, no extreme values (e.g. heights, exam scores) | Mean | Uses every data point, has nice algebra, and on symmetric data it agrees with the median anyway. |
| Skewed, or with outliers (incomes, house prices, response times) | Median | Outliers can't drag it. "Typical household income" figures for a country are usually reported as medians for this reason. |
| Categorical, or the question is literally "most common?" | Mode | It's the only one defined for non-numeric data and the only one that directly answers "most common." |
| Tiny dataset or you're not sure | Report all three | The gap between mean and median is itself informative — it tells you how skewed the data is. |
Mean > median suggests the tail is on the right (high outliers pulling the mean up). Median > mean suggests the tail is on the left. Mean ≈ median suggests the distribution is roughly symmetric. As a first check you can read skew off two numbers without a plot, though the comparison is a rule of thumb and says nothing about features such as multiple peaks.
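When in doubt, reporting all three takes only a few lines with Python's standard `statistics` module (`multimode` requires Python 3.8+). The function name `summarize` and the shape of its return value are choices made for this sketch.

```python
from statistics import mean, median, multimode


def summarize(values):
    """Report mean, median, every mode, and the mean-median gap."""
    m = mean(values)
    med = median(values)
    return {
        "mean": m,
        "median": med,
        "modes": multimode(values),      # all values tied for most frequent
        "mean_minus_median": m - med,    # positive hints at a right tail
    }


summary = summarize([2, 3, 3, 5, 7, 7, 7, 9])
print(summary)
```

For this list the mean is below the median, which hints at a tail on the left, and there is a single mode.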