Mean, Median & Mode

What you'll leave with

Precise definitions of mean, median, and mode — and what each one actually measures.
Why the mean is sensitive to outliers and the median isn't, with a picture for the intuition.
A decision rule for which "average" to reach for given the shape and type of your data.
The classic traps: mean of integer counts, mode of continuous data, and the word "average" itself.

1. Three notions of "typical"

Suppose you have a list of numbers — test scores, salaries, house prices, the number of pets in each household on your street. You want to communicate one fact: what's the typical value? There isn't a single answer, because "typical" isn't one idea. It's at least three.

Mean — the balance point. If you piled the data on a seesaw, this is where it would balance.
Median — the middle of the pack. Half the data sits below it, half above.
Mode — the most common value. The peak of the histogram.

On a perfectly symmetric, single-peaked dataset they coincide. On almost anything real — incomes, response times, file sizes, exam scores with a long tail — they separate, and the gap between them tells you something about the shape of the data. Knowing which one to use, and which one someone else has used on you, is half of basic statistical literacy.

These three numbers all belong to a family called measures of central tendency. Each one is a different rule for collapsing a list into a single representative value.

2. The mean (arithmetic average)

Arithmetic mean

The sum of all values divided by the count of values. For a dataset $x_1, x_2, \ldots, x_n$:

$$ \bar{x} \;=\; \frac{1}{n} \sum_{i=1}^{n} x_i \;=\; \frac{x_1 + x_2 + \cdots + x_n}{n} $$

The mean is what most people mean when they say "average." It's the value that, if every data point were replaced by it, would leave the total unchanged. It's the balance point of the data: imagine placing one unit of mass at each $x_i$ on the number line — the mean is the location where the line would balance on a fulcrum.
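Both descriptions follow from the definition in one line each. As a short sketch: multiplying the definition by $n$ gives the "total unchanged" property, and rearranging it shows that the deviations from the mean cancel, which is the balance condition.

$$ n\,\bar{x} \;=\; \sum_{i=1}^{n} x_i \qquad\Longrightarrow\qquad \sum_{i=1}^{n} (x_i - \bar{x}) \;=\; \sum_{i=1}^{n} x_i \;-\; n\,\bar{x} \;=\; 0 $$

For instance, the values $4, 7, 2, 9, 8$ have mean $6$, and their deviations $-2, 1, -4, 3, 2$ sum to $0$.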

It has two things going for it: it's algebraically friendly (you can compute it in a single pass, you can combine means of subgroups, it has tidy properties in proofs), and it uses every data point. Nothing is thrown away.

That same property — using every data point — is also its weakness. A single extreme value can drag the mean a long way. The salaries

$$ 30,\; 32,\; 35,\; 38,\; 40 \quad (\text{thousand dollars}) $$

have mean $35$. Replace one person with a CEO earning $1{,}000$:

$$ 30,\; 32,\; 35,\; 38,\; 1000 \quad\Longrightarrow\quad \bar{x} = 227 $$

Not one of the five people earns anything close to $227$k. The mean is mathematically correct and descriptively misleading. This is the central thing to remember about the mean: one outlier can move it as much as you want.
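The salary example can be checked in a few lines. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative, not part of the original text.

```python
from statistics import mean

# Salaries in thousands of dollars, copied from the example above.
salaries = [30, 32, 35, 38, 40]
with_ceo = [30, 32, 35, 38, 1000]   # one person replaced by the CEO

print(mean(salaries))   # 35
print(mean(with_ceo))   # 227

# Distance from the new mean to the nearest actual salary.
closest_gap = min(abs(s - mean(with_ceo)) for s in with_ceo)
print(closest_gap)      # 189
```

The nearest real salary is still 189 thousand dollars away from the mean, which is the sense in which the mean describes nobody in the list.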

Aside

There are other "means" in statistics: geometric, harmonic, weighted, trimmed. Unless someone tells you otherwise, "the mean" means the arithmetic mean defined above. The geometric mean shows up when averaging ratios or growth rates; the harmonic mean shows up when averaging rates, such as speeds over equal distances.

3. The median

Median

Sort the values. The median is the one in the middle. If the count $n$ is odd, it's the single middle value. If $n$ is even, it's the average of the two middle values.

The recipe in steps:

Sort the values from smallest to largest.
If $n$ is odd, the median is the value at position $(n+1)/2$.
If $n$ is even, the median is the average of the values at positions $n/2$ and $n/2 + 1$.

For the salary list $30, 32, 35, 38, 40$ the median is the third value, $35$. Replace one person with the CEO at $1{,}000$:

$$ 30,\; 32,\; 35,\; 38,\; 1000 $$

The median is still $35$. The CEO's exact salary doesn't matter — only their rank matters. Push them to a million or a billion and the median doesn't move. This property has a name: the median is robust to outliers.

Geometrically, the median splits the sorted dataset into two equal-sized halves. At least half of the values are at or below it, and at least half are at or above. That's the precise sense in which it's "the middle."
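The sort-and-pick-the-middle recipe translates almost directly into code. The function below is a sketch under that recipe; the name `median_by_recipe` is made up for this illustration (Python's standard library already offers `statistics.median`).

```python
def median_by_recipe(values):
    """Median via the recipe: sort, then take the middle value(s)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: 0-based index n // 2 is 1-based position (n + 1) / 2.
        return ordered[mid]
    # Even count: average the values at 1-based positions n/2 and n/2 + 1.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_by_recipe([30, 32, 35, 38, 40]))       # 35
print(median_by_recipe([30, 32, 35, 38, 1000]))     # 35
print(median_by_recipe([30, 32, 35, 38, 10**9]))    # 35
```

Only the rank of the largest value enters the calculation, so making it arbitrarily large leaves the result untouched.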

Figure. Right-skewed data: incomes, house prices, response times. The long tail of large values pulls the mean to the right, but the median (which only cares about rank) stays anchored near the bulk of the data. The mode sits at the peak.

4. The mode

Mode

The value (or values) that occur most often in the dataset.

The mode is the simplest of the three to state and the trickiest to use well. For the list

$$ 2,\; 3,\; 3,\; 5,\; 7,\; 7,\; 7,\; 9 $$

the mode is $7$ — it appears three times, more than any other value. For

$$ 2,\; 3,\; 3,\; 5,\; 7,\; 7,\; 9 $$

there are two modes, $3$ and $7$, each appearing twice. A dataset with two modes is called bimodal; more than two, multimodal. A dataset where every value occurs once has no useful mode at all.
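Finding the mode is a counting exercise, so a tally is all that is needed. The helper below is a sketch; the name `modes` is hypothetical, and it returns every value tied for the top count so that bimodal and "no useful mode" cases are visible rather than hidden.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest count, in first-seen order."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


print(modes([2, 3, 3, 5, 7, 7, 7, 9]))   # [7]
print(modes([2, 3, 3, 5, 7, 7, 9]))      # [3, 7]
print(modes([1, 2, 3]))                  # [1, 2, 3]
```

When the result is as long as the input, every value occurred once and the mode carries no information. Because the function only counts, it would work unchanged on non-numeric labels.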

The mode is the only one of the three that works on data that isn't numeric. Ask "what's the most common eye colour in this room?" and only the mode can answer — there's no meaningful way to add or rank eye colours, but you can count them. This is why the mode is the natural summary for categorical data: favourite ice cream flavour, browser used, blood type, vote choice.

Continuous data is ill-suited to the mode

If your data is something like exact human heights to the millimetre, every measured value is probably unique — so technically every value is a mode, which is useless. For continuous data, people usually mean the mode of the density (the peak of a histogram or smoothed curve), which depends on how you bin or smooth. It's a different beast from the discrete mode.
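To see how much the binned mode depends on the bins, here is a small sketch. The heights are invented for illustration (in millimetres, all distinct), and `modal_bin` is a hypothetical helper that uses bins starting at multiples of the chosen width.

```python
from collections import Counter

# Made-up heights in millimetres; every value is unique.
heights_mm = [1701, 1706, 1712, 1718, 1722, 1725, 1729]


def modal_bin(values, width):
    """Return (start, end) of the fullest bin, with bins [k*width, (k+1)*width)."""
    counts = Counter((v // width) * width for v in values)
    start, _ = max(counts.items(), key=lambda item: item[1])
    return start, start + width


print(modal_bin(heights_mm, 10))   # (1720, 1730)
print(modal_bin(heights_mm, 20))   # (1700, 1720)
```

The same seven measurements peak in the 1720s with 10 mm bins and in the 1700 to 1720 range with 20 mm bins, so the "mode" reported is partly a property of the binning.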

5. The weighted mean

Not every value in a dataset deserves equal say. A course grade made of 40% final exam, 30% midterm, 20% homework, and 10% participation isn't the simple average of those four numbers — it's a weighted mean, where each value gets multiplied by its importance before summing.

Weighted mean

For values $x_1, \ldots, x_n$ with non-negative weights $w_1, \ldots, w_n$:

$$ \bar{x}_w \;=\; \frac{\sum_{i=1}^n w_i \, x_i}{\sum_{i=1}^n w_i} $$

When every $w_i$ is equal, this collapses back to the ordinary arithmetic mean.

Example. A student scores $90$ on the exam (weight $0.4$), $80$ on the midterm (weight $0.3$), $70$ on homework (weight $0.2$), and $60$ on participation (weight $0.1$). The weighted mean is

$$ \bar{x}_w \;=\; \frac{0.4(90) + 0.3(80) + 0.2(70) + 0.1(60)}{0.4 + 0.3 + 0.2 + 0.1} \;=\; \frac{36 + 24 + 14 + 6}{1} \;=\; 80 $$

The simple unweighted average of $90, 80, 70, 60$ would be $75$. The weights matter — they shift the mean toward the values that count for more.
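The formula can be sketched as a short function. The name `weighted_mean` is illustrative, and the weights are written as whole percentages (40, 30, 20, 10) rather than decimals so the arithmetic stays exact in floating point; only the relative sizes of the weights matter because the formula divides by their sum.

```python
def weighted_mean(values, weights):
    """Sum of weight * value, divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


scores = [90, 80, 70, 60]          # exam, midterm, homework, participation
weights = [40, 30, 20, 10]         # percentages from the example above

print(weighted_mean(scores, weights))        # 80.0
print(weighted_mean(scores, [1, 1, 1, 1]))   # 75.0
```

With equal weights the function returns the ordinary mean, matching the remark that the weighted mean collapses to the arithmetic mean in that case.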

When weights show up naturally

Grade calculations, GPA, batting averages combined across seasons, opinion polls combining subgroups of different sizes, and any "average across averages" where the underlying groups had different counts. Whenever you find yourself averaging averages, ask whether you should be weighting.
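A quick sketch of the "average of averages" trap, with invented numbers: two classes whose means are 70 and 90, but whose sizes are 10 and 30 students.

```python
# Hypothetical class means and class sizes, chosen only for illustration.
class_means = [70, 90]
class_sizes = [10, 30]

# Naive: treat both classes as equally important.
naive = sum(class_means) / len(class_means)

# Pooled: weight each class mean by its number of students.
pooled = sum(m * n for m, n in zip(class_means, class_sizes)) / sum(class_sizes)

print(naive)    # 80.0
print(pooled)   # 85.0
```

The pooled figure is the mean you would get from the 40 individual scores; the naive figure overstates the influence of the smaller class.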

6. Skewness: what the three numbers say together

The relationship between mean, median, and mode is itself a measurement: a quick first read on the shape of the distribution. Comparing two or three numbers often reveals skew before you draw a histogram, although the pattern is a rule of thumb and a plot remains the reliable check.

Relationship	Shape	Example
mean $\approx$ median $\approx$ mode	Symmetric, single-peaked	Heights of adults, IQ scores
mean $>$ median $>$ mode	Right-skewed (positive skew) — long tail to the right	Incomes, house prices, response times
mean $<$ median $<$ mode	Left-skewed (negative skew) — long tail to the left	Age at death in a developed country, exam scores when most pass easily

The intuition: the mean follows the tail because every extreme value drags it. The median, anchored by rank, barely moves. The mode sits at the peak. So the longer the tail, the further the mean is pulled away from the mode in the direction of the tail, with the median sitting between them.

Mnemonic

The skew is named for the direction of the tail, not the bulk. "Right-skewed" means a long right tail (with most of the data on the left). The mean ends up on the tail side.

Bimodal distributions

If a distribution has two peaks, neither the mean nor the median tells you much on its own — they'll likely fall in the valley between the two modes, where almost no data lives. A dataset of adult shoe sizes mixed across men and women, or wait times for an elevator at two distinct shift changes, will look bimodal. The honest summary in those cases is "two modes, near $X$ and $Y$" — and ideally, splitting the dataset into the two subpopulations first and reporting each separately.

When a single "average" misleads

A bimodal dataset reported with one mean and one median erases the very structure that makes it interesting. If "the average commute is 35 minutes" hides two distinct populations — one with 15-minute walks and one with 55-minute drives — the average describes nobody. Plot the data before summarizing it.
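The commute scenario can be made concrete with invented numbers: five walkers clustered around 15 minutes and five drivers clustered around 55. This is only a sketch of the effect, not real data.

```python
from statistics import mean, median

# Hypothetical commute times in minutes for two distinct groups.
walkers = [13, 14, 15, 16, 17]
drivers = [53, 54, 55, 56, 57]
commutes = walkers + drivers

print(mean(commutes))     # 35
print(median(commutes))   # 35.0

# How close does any actual commuter come to the "average" of 35?
nearest = min(abs(c - 35) for c in commutes)
print(nearest)            # 18
```

Both summaries land on 35 minutes, yet nobody in the data is within 18 minutes of it. Reporting `mean(walkers)` and `mean(drivers)` separately (15 and 55) describes the two groups far better.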

7. When each is the right choice

The choice isn't aesthetic — it depends on the shape of the data and the question you want answered.

Data looks like	Reach for	Why
Roughly symmetric, no extreme values (e.g. heights, exam scores)	Mean	Uses every data point, has nice algebra, and on symmetric data it agrees with the median anyway.
Skewed, or with outliers (incomes, house prices, response times)	Median	Outliers can't drag it. The "typical household income" in a country is usually reported as a median for this reason.
Categorical, or the question is literally "most common?"	Mode	It's the only one defined for non-numeric data and the only one that directly answers "most common."
Tiny dataset or you're not sure	Report all three	The gap between mean and median is itself informative — it tells you how skewed the data is.

Heuristic

Mean > median usually means the tail is on the right (high outliers pulling the mean up). Median > mean usually means the tail is on the left. Mean ≈ median suggests the distribution is roughly symmetric. Two numbers give a quick first diagnosis of skew; confirm it with a plot when the answer matters.
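The heuristic can be written as a tiny function. This is a rough sketch, not a formal skewness test: the name `skew_hint` and the tolerance `rel_tol` (how close mean and median must be, relative to the data's range, to count as "roughly equal") are assumptions made for this illustration.

```python
from statistics import mean, median


def skew_hint(values, rel_tol=0.05):
    """Guess the skew direction by comparing the mean with the median."""
    m, med = mean(values), median(values)
    spread = max(values) - min(values)
    # Treat the two as equal when they differ by a small share of the range.
    if spread == 0 or abs(m - med) <= rel_tol * spread:
        return "roughly symmetric"
    return "right-skewed" if m > med else "left-skewed"


print(skew_hint([32, 35, 38, 40, 42, 45, 48, 50, 320]))   # right-skewed
print(skew_hint([1, 50, 52, 54, 56, 58]))                 # left-skewed
print(skew_hint([2, 4, 6, 8, 10]))                        # roughly symmetric
```

Like the heuristic itself, the function only looks at two numbers, so it cannot detect structure such as two peaks.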

9. Common pitfalls

"Average" is ambiguous

In everyday English, "average" almost always means the arithmetic mean — but not always. News reporting often uses "average income" for the median. Whenever someone hands you an "average," your first question should be which one? The answer can change the story entirely.

A single outlier moves the mean as far as it likes

One value typo'd as $1{,}000{,}000$ instead of $100$ will barely move the median or the mode of a dataset of size $50$, but if the other values are all near $100$ it will multiply the mean by a factor of about $200$. Always sanity-check the mean against the median before publishing one.
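Here is a sketch of the typo scenario, assuming (purely for illustration) a dataset of 50 readings that all sit close to 100.

```python
from statistics import mean, median, mode

# Hypothetical clean data: 50 values near 100.
clean = [99, 100, 100, 100, 101] * 10

# Same data with one 100 mistyped as 1,000,000.
typo = list(clean)
typo[1] = 1_000_000

print(mean(clean), median(clean), mode(clean))   # 100 100.0 100
print(mean(typo), median(typo), mode(typo))      # 20098 100.0 100
print(mean(typo) / mean(clean))                  # 200.98
```

The median and mode are unchanged while the mean grows roughly 200-fold, which is why comparing the mean with the median is a cheap check for data-entry errors.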

The mean of integers needn't be an integer

"The average household has 1.7 children" is meaningful even though no household has 1.7 children. The mean is a summary, not a prediction about any individual. If you want the "typical" household, reach for the mode or the median, not the mean.

Mode of continuous data is ill-defined

If your data is real-valued and finely measured, every value is probably unique, so every value is technically a mode. When people quote "the mode" of a continuous distribution, they almost always mean the peak of a histogram — which depends entirely on how the bins were chosen.

10. Worked examples

Try each one yourself before reading the solution. The mechanics are easy; the point is to check that your steps match the canonical recipe.

Example 1 · Mean of a list

Find the mean of $4,\; 7,\; 2,\; 9,\; 8$.

Step 1. Sum the values:

$$ 4 + 7 + 2 + 9 + 8 = 30 $$

Step 2. Divide by the count $n = 5$:

$$ \bar{x} = \frac{30}{5} = 6 $$

The mean is $6$.

Example 2 · Median with an odd count

Find the median of $11,\; 4,\; 8,\; 2,\; 6$.

Step 1. Sort the values:

$$ 2,\; 4,\; 6,\; 8,\; 11 $$

Step 2. The count is $n = 5$ (odd), so the median is the value at position $(5+1)/2 = 3$:

$$ \text{median} = 6 $$

Example 3 · Median with an even count

Find the median of $11,\; 4,\; 8,\; 2,\; 6,\; 9$.

Step 1. Sort the values:

$$ 2,\; 4,\; 6,\; 8,\; 9,\; 11 $$

Step 2. The count is $n = 6$ (even). Take the average of the two middle values — positions $3$ and $4$, which are $6$ and $8$:

$$ \text{median} = \frac{6 + 8}{2} = 7 $$

Example 4 · Mode

Find the mode of $3,\; 1,\; 4,\; 1,\; 5,\; 9,\; 2,\; 6,\; 5,\; 3,\; 5$.

Step 1. Tally the occurrences:

Value	Count
1	2
2	1
3	2
4	1
5	3
6	1
9	1

Step 2. The largest count is $3$, belonging to $5$:

$$ \text{mode} = 5 $$
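The four results above can be cross-checked with Python's standard `statistics` module. This is only a verification sketch; `multimode` requires Python 3.8 or later and returns a list of all values tied for the highest count.

```python
from statistics import mean, median, multimode

print(mean([4, 7, 2, 9, 8]))                             # 6
print(median([11, 4, 8, 2, 6]))                          # 6
print(median([11, 4, 8, 2, 6, 9]))                       # 7.0
print(multimode([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]))      # [5]
```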

Example 5 · Comparing all three on skewed data

A small company has nine employees with annual salaries (in thousands of dollars):

$$ 32,\; 35,\; 38,\; 40,\; 42,\; 45,\; 48,\; 50,\; 320 $$

The last salary belongs to the founder. Compute the mean, median, and mode, and decide which best represents "typical pay."

Mean. Sum is $32 + 35 + 38 + 40 + 42 + 45 + 48 + 50 + 320 = 650$. Divide by $9$:

$$ \bar{x} = \frac{650}{9} \approx 72.2 $$

Median. Already sorted; $n = 9$, so the middle value is at position $5$:

$$ \text{median} = 42 $$

Mode. Every value occurs exactly once, so there is no useful mode.

Interpretation. The mean of $72.2$k makes the company look generously paid, but only one person, the founder, earns that much or more; the other eight all earn $50$k or less. The median of $42$k is close to what a typical employee actually earns. For "typical pay," the median is the honest summary; the mean is technically correct but rhetorically misleading.
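The same comparison as a short script, again a sketch using the standard `statistics` module with illustrative variable names:

```python
from statistics import mean, median, multimode

# Annual salaries in thousands of dollars, from Example 5.
salaries = [32, 35, 38, 40, 42, 45, 48, 50, 320]

avg = mean(salaries)
print(round(avg, 1))               # 72.2
print(median(salaries))            # 42
print(len(multimode(salaries)))    # 9, every value ties, so no useful mode

# How many employees earn less than the mean?
below_mean = sum(1 for s in salaries if s < avg)
print(below_mean)                  # 8
```

Eight of the nine employees earn less than the mean, which is the numerical version of the interpretation above.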

Sources & further reading

The content above is synthesized from established statistics references. If anything reads ambiguously here, the primary sources are the ground truth — and the "going deeper" links are where to turn when this page has served its purpose.

Descriptive Statistics · Textbook · OpenStax, Statistics, Ch. 2

Peer-reviewed, openly licensed introductory statistics chapter. Covers measures of central tendency rigorously, with worked examples and exercises.

Summarizing quantitative data · Tutorial · Khan Academy, Statistics & Probability

Short video lessons and practice problems on mean, median, mode, and the effect of outliers. Best if you want to drill the mechanics until they're automatic.

Mean · Reference · Wolfram MathWorld

Formal mathematical reference: short, dense, precise. Use this when you want the definition stated in the language professional mathematicians actually use, plus pointers to the other means (geometric, harmonic, etc.).

Median · Encyclopedia · Wikipedia

Broad overview of the median, including its robustness properties, the relationship to quantiles, and how it generalizes to higher dimensions. Useful for placing this topic in the wider statistical landscape.