1. The arithmetic mean
For a list of numbers $x_1, x_2, \ldots, x_n$, the arithmetic mean is their sum divided by their count:
$$ \bar{x} \;=\; \frac{x_1 + x_2 + \cdots + x_n}{n} \;=\; \frac{1}{n}\sum_{i=1}^{n} x_i. $$
The mean is the "balance point" of the data: if you imagine the numbers as weights placed along a ruler at their values, the mean is where you'd put the fulcrum for the ruler to balance. That mental picture explains a lot of behaviour: pull one value far to the right and the balance point drifts that way; add a value at exactly $\bar{x}$ and the balance point does not move at all.
The identity that does the work
Rearrange the definition and you get the single most useful identity for solving average problems:
$$ \sum_{i=1}^{n} x_i \;=\; n \cdot \bar{x}. $$
Almost every "find the missing value" or "what score do I need on the final" problem reduces to this. Don't think in averages; think in sums, then divide at the end.
If five test scores average 80, their sum is $5 \times 80 = 400$. If four of them are 72, 85, 91, and 68 (summing to 316), the fifth must be $400 - 316 = 84$. You never need to "average" again — you only need the total.
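The "think in sums" approach can be sketched in a few lines of Python. The function name `missing_value` is an illustrative choice, not something from the text; it simply rearranges the identity above.

```python
def missing_value(known, target_mean):
    """Value that brings the list to target_mean once it is appended."""
    n = len(known) + 1                  # count after adding the missing item
    required_total = n * target_mean    # sum = n * mean
    return required_total - sum(known)

fifth_score = missing_value([72, 85, 91, 68], 80)
print(fifth_score)  # 84
```

The same function answers "what do I need on the final" questions: pass the scores so far and the average you are aiming for.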
Two derived rules
Addition rule. Append a value $y$ to a list of $n$ items with mean $\bar{x}$:
$$ \bar{x}_{\text{new}} \;=\; \frac{n \bar{x} + y}{n + 1}. $$
If $y > \bar{x}$, the mean rises; if $y < \bar{x}$, it falls; if $y = \bar{x}$, it doesn't budge.
Replacement rule. Swap one value $y_{\text{out}}$ for another $y_{\text{in}}$ (keeping $n$ fixed):
$$ \bar{x}_{\text{new}} \;=\; \bar{x}_{\text{old}} + \frac{y_{\text{in}} - y_{\text{out}}}{n}. $$
The change in the mean is the change in the sum, spread evenly across all $n$ items. A swap that lifts the sum by 30 in a class of 10 lifts the average by 3.
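Both rules are easy to check numerically. The sketch below reuses the five test scores from earlier; the function names and the appended or swapped values (80, 92, 98) are illustrative assumptions.

```python
def mean_after_append(n, mean, y):
    """Addition rule: mean of the list once y is appended."""
    return (n * mean + y) / (n + 1)

def mean_after_swap(n, mean, y_out, y_in):
    """Replacement rule: mean after swapping y_out for y_in (n unchanged)."""
    return mean + (y_in - y_out) / n

scores = [72, 85, 91, 68, 84]
n = len(scores)
mean = sum(scores) / n                      # 80.0

print(mean_after_append(n, mean, 80))       # 80.0: a value at the mean changes nothing
print(mean_after_append(n, mean, 92))       # 82.0: a value above the mean lifts it
print(mean_after_swap(n, mean, 68, 98))     # 86.0: sum up by 30, spread over 5 items
```

Neither function needs the list itself, only its length and current mean, which is the practical payoff of the two rules.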
2. Weighted averages
Plain averages assume every item counts the same. The real world rarely does. A final exam matters more than a homework set. A 4-credit course should pull harder on a GPA than a 1-credit one. A class of 30 should outweigh a class of 10 in any "combined" measure. The fix is to attach a weight $w_i$ to each value:
$$ \bar{x}_w \;=\; \frac{\sum w_i x_i}{\sum w_i}. $$
The denominator is the total weight, not the count. When every $w_i = 1$, this collapses back to the ordinary mean: the plain mean is just a weighted mean where every weight happens to equal 1.
Grade-style example
Suppose a course grade is 40% exam, 30% project, 30% homework. A student scores 90, 80, and 70 respectively. Their grade is
$$ \bar{x}_w \;=\; \frac{0.40(90) + 0.30(80) + 0.30(70)}{0.40 + 0.30 + 0.30} \;=\; \frac{36 + 24 + 21}{1} \;=\; 81. $$
Weights stated as percentages of the whole already sum to 1, so the denominator equals 1 and drops out of the calculation, but it is still there in principle. Forgetting it is one of the most common sources of subtle errors, for instance when only some components have been graded and the weights in play no longer sum to 1.
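A minimal sketch of the weighted mean that always divides by the total weight, so it works whether or not the weights sum to 1. The function name and the zero-weight check are illustrative assumptions.

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the total weight (not the count)."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

# Weights as fractions of the whole (they sum to 1).
print(round(weighted_mean([90, 80, 70], [0.40, 0.30, 0.30]), 2))  # 81.0

# The same weights written as 40 / 30 / 30: the denominator does the normalising.
print(weighted_mean([90, 80, 70], [40, 30, 30]))                  # 81.0

# Equal weights reduce to the plain mean.
print(weighted_mean([90, 80, 70], [1, 1, 1]))                     # 80.0
```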
Combining two groups
Two groups of sizes $n_1$ and $n_2$ with means $\bar{x}_1$ and $\bar{x}_2$ have a combined mean
$$ \bar{x}_{\text{combined}} \;=\; \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}. $$
A class of 20 averaging 70 combined with a class of 30 averaging 80 gives
$$ \frac{20(70) + 30(80)}{50} \;=\; \frac{1400 + 2400}{50} \;=\; 76, $$
not the naive $(70 + 80)/2 = 75$. The larger group pulls the average closer to its own mean.
If the groups have different sizes, the simple average of their means is wrong. You can only get away with $(\bar{x}_1 + \bar{x}_2)/2$ when $n_1 = n_2$. Always weight by size unless you have checked that the groups are the same size.
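The combined-mean formula extends to any number of groups. The sketch below takes a list of (size, mean) pairs; the function name is an illustrative choice, and the second call uses made-up equal class sizes to show when the naive average happens to work.

```python
def combined_mean(groups):
    """groups: list of (size, mean) pairs; returns the size-weighted mean."""
    total = sum(n * m for n, m in groups)   # rebuild each group's sum
    count = sum(n for n, _ in groups)
    return total / count

print(combined_mean([(20, 70), (30, 80)]))  # 76.0: the larger class pulls harder
print(combined_mean([(25, 70), (25, 80)]))  # 75.0: equal sizes match the naive average
```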
3. Harmonic and geometric means
"Average" is a category, not a single formula. Three classical members of the family show up often enough to deserve names. For positive numbers $x_1, \ldots, x_n$:
| Name | Formula | Built for |
|---|---|---|
| Arithmetic mean (AM) | $\bar{x} = \dfrac{1}{n}\sum x_i$ | Additive quantities — scores, prices, counts |
| Geometric mean (GM) | $\sqrt[n]{x_1 x_2 \cdots x_n}$ | Multiplicative quantities — growth rates, ratios |
| Harmonic mean (HM) | $\dfrac{n}{\sum (1/x_i)}$ | Rates where the numerator quantity is held fixed, such as speed over a fixed distance |
The three are linked by the AM–GM–HM inequality: for any positive numbers, $\text{AM} \geq \text{GM} \geq \text{HM}$, with equality only when every $x_i$ is the same. The means collapse onto each other when the data is uniform, and spread apart as the data spreads.
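The inequality can be checked with Python's standard `statistics` module, which provides `mean`, `geometric_mean` (Python 3.8 or later), and `harmonic_mean`. The sample data below is made up for illustration.

```python
import math
import statistics

def three_means(data):
    """Return (AM, GM, HM) for a list of positive numbers."""
    return (
        statistics.mean(data),
        statistics.geometric_mean(data),
        statistics.harmonic_mean(data),
    )

am, gm, hm = three_means([2, 4, 8])
print(am >= gm >= hm)   # True: the means spread apart for spread-out data

# For uniform data the three means coincide, up to floating-point rounding.
am_u, gm_u, hm_u = three_means([5, 5, 5])
print(math.isclose(am_u, gm_u) and math.isclose(gm_u, hm_u))   # True
```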
The average-speed trap
You drive 60 km to a town at 40 km/h, then 60 km home at 60 km/h. What's your average speed for the round trip?
The intuition that screams "50 km/h" is wrong. Average speed is total distance over total time, never the simple average of speeds:
$$ \text{time out} = \frac{60}{40} = 1.5 \text{ h}, \quad \text{time back} = \frac{60}{60} = 1 \text{ h}. $$
$$ \text{avg speed} = \frac{\text{total distance}}{\text{total time}} = \frac{120}{2.5} = 48 \text{ km/h}. $$
Why 48 and not 50? Because the slower leg takes longer. You spend more of the trip's time at 40 km/h than at 60 km/h, so the average sags toward the slower speed. The formula for two equal-distance legs is the harmonic mean:
$$ v_{\text{avg}} \;=\; \frac{2 v_1 v_2}{v_1 + v_2} \;=\; \frac{2 \cdot 40 \cdot 60}{40 + 60} \;=\; \frac{4800}{100} \;=\; 48. $$
For equal distances at different speeds, average speed is the harmonic mean. For equal times at different speeds, it really is the arithmetic mean. Which kind of "equal" you're holding fixed decides which mean applies, and getting it backward is the canonical mistake.
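Computing average speed from first principles (total distance over total time) sidesteps the question of which mean to use. In this sketch the function name and the (distance, speed) representation are illustrative assumptions; the equal-time case uses one hour at each speed.

```python
import math
import statistics

def average_speed(legs):
    """legs: list of (distance_km, speed_kmh) pairs."""
    total_distance = sum(d for d, _ in legs)
    total_time = sum(d / v for d, v in legs)   # hours spent on each leg
    return total_distance / total_time

# Equal distances: 60 km at 40 km/h, then 60 km at 60 km/h.
round_trip = average_speed([(60, 40), (60, 60)])
print(round_trip)                                                    # 48.0
print(math.isclose(round_trip, statistics.harmonic_mean([40, 60])))  # True

# Equal times: one hour at each speed covers 40 km and 60 km.
print(average_speed([(40, 40), (60, 60)]))                           # 50.0
```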
When the geometric mean is the right choice
An investment grows 50% in year one and loses 50% in year two. What's the "average" annual return?
Arithmetic says $(50 + (-50))/2 = 0\%$ — but a $\$100$ stake becomes $\$150$, then $\$75$. You're down 25%, not flat. The correct average is the geometric mean of the growth factors:
$$ \sqrt{1.5 \cdot 0.5} \;=\; \sqrt{0.75} \;\approx\; 0.866. $$
So the average annual factor is about 0.866, a roughly $13.4\%$ annual loss, which compounded over two years gives the actual 25% drop. Multiplicative processes call for multiplicative means.
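The same calculation as a short sketch: convert each return to a growth factor, multiply, and take the $n$-th root. The function name is illustrative, and `math.prod` requires Python 3.8 or later.

```python
import math

def average_growth_factor(returns):
    """Geometric mean of the growth factors (1 + r) for a list of returns r."""
    factors = [1 + r for r in returns]
    return math.prod(factors) ** (1 / len(factors))

g = average_growth_factor([0.50, -0.50])
print(round(g, 3))              # 0.866
print(round(100 * g ** 2, 2))   # 75.0: compounding the average factor recovers the real outcome
```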
4. Mixtures as weighted averages
A mixture problem combines two (or more) components with different characteristic values — prices, concentrations, alcohol percentages — and asks about the result. The mixture's characteristic is simply the weighted average of the components, weighted by quantity:
$$ m \;=\; \frac{w_1 a + w_2 b}{w_1 + w_2}, $$
where $a, b$ are the components' values and $w_1, w_2$ are how much of each you used. That's the whole secret. Everything that follows is bookkeeping.
A worked blend
You mix 2 L of orange juice (10% sugar) with 1 L of soda (15% sugar). The blend's sugar content is
$$ \frac{2 \cdot 0.10 + 1 \cdot 0.15}{2 + 1} \;=\; \frac{0.20 + 0.15}{3} \;=\; \frac{0.35}{3} \;\approx\; 11.67\%. $$
The answer sits between 10% and 15%, biased toward the juice's value because there's twice as much juice. That bias is the whole point: more of something pulls the average toward that something.
Most mixture problems boil down to two conservation statements: total quantity (volume, mass, count) and total characteristic content (grams of sugar, dollars of value, litres of pure alcohol). Write both, and the unknowns fall out.
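The two conservation statements translate directly into code. This is a sketch under stated assumptions: the function names are illustrative, and the 12% target in the second part is a made-up follow-up question, not one from the text.

```python
def blend_value(components):
    """components: list of (quantity, value) pairs; returns the blend's value."""
    total_quantity = sum(q for q, _ in components)       # conservation of quantity
    total_content = sum(q * v for q, v in components)    # conservation of content
    return total_content / total_quantity

def amount_to_add(base_qty, base_value, added_value, target_value):
    """Solve base_qty*base_value + x*added_value = (base_qty + x)*target_value for x."""
    return base_qty * (target_value - base_value) / (added_value - target_value)

# The worked blend: 2 L of 10% juice with 1 L of 15% soda.
print(round(100 * blend_value([(2, 0.10), (1, 0.15)]), 2))   # 11.67

# Hypothetical follow-up: litres of 15% soda that bring 2 L of 10% juice to 12%.
x = amount_to_add(2, 0.10, 0.15, 0.12)
print(round(x, 3))                                           # 1.333
```

Feeding the solved amount back into `blend_value` is a quick sanity check that the unknown really does hit the target.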
5. Alligation: the cross method
When you only need the ratio of two components, not their absolute amounts, there's a shortcut that has been taught in commerce schools since the era of the medieval Islamic world. It's called alligation, and it reduces the algebra to a picture.
To mix components valued $a$ (cheaper) and $b$ (dearer) into a blend valued $m$ (with $a < m < b$):
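$$ \frac{\text{quantity of } a}{\text{quantity of } b} \;=\; \frac{b - m}{m - a}. $$
Visually, write $a$ and $b$ at the top, $m$ in the middle, and "cross" them: each component's share of the mix is its opposite's distance from the target.
$$ \begin{array}{ccc} a & & b \\ & m & \\ b - m & & m - a \end{array} $$
Reading down each column, the entry under $a$ is its share ($b - m$) and the entry under $b$ is its share ($m - a$).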
Where the formula comes from
It's just the weighted-average equation, solved for the ratio. If quantities $w_1$ of $a$ and $w_2$ of $b$ produce a mixture worth $m$, then
$$ \frac{w_1 a + w_2 b}{w_1 + w_2} = m. $$
Multiply out: $w_1 a + w_2 b = m(w_1 + w_2)$. Group the $w$'s: $w_1(a - m) = w_2(m - b)$, or equivalently $w_1(m - a) = w_2(b - m)$. Therefore
$$ \frac{w_1}{w_2} \;=\; \frac{b - m}{m - a}. $$
The cross diagram is the formula made visual: each component's share is the distance from the target to the other component, read off along the diagonal.
The coffee blend
Arabica costs $\$8$/kg and Robusta costs $\$14$/kg. To blend a $\$10$/kg coffee:
$$ \frac{\text{Arabica}}{\text{Robusta}} \;=\; \frac{14 - 10}{10 - 8} \;=\; \frac{4}{2} \;=\; 2 : 1. $$
Two parts Arabica to one part Robusta. Check: 2 kg at $\$8$ plus 1 kg at $\$14$ is $\$30$ total over 3 kg, which is $\$10$/kg exactly. The target is closer to the cheaper component, so the cheaper component dominates the blend, a pattern worth memorising.
The cross differences give the ratio of quantities; don't assume "more of the cheap one" by reflex. The mnemonic: the closer a component's value is to the target, the more of it you need. The smaller the distance, the bigger the share: they are inversely related.
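The cross method is a one-line function once the ordering $a < m < b$ is checked. This sketch uses `fractions.Fraction` to keep the ratio exact, which assumes integer (or otherwise rational) inputs; the function name is illustrative.

```python
from fractions import Fraction

def alligation_ratio(a, b, m):
    """Ratio (quantity of a) : (quantity of b) needed to hit target value m."""
    if not (a < m < b):
        raise ValueError("target must lie strictly between the two values")
    return Fraction(b - m, m - a)

ratio = alligation_ratio(8, 14, 10)
print(ratio)                                   # 2
parts_a, parts_b = ratio.numerator, ratio.denominator

# Check the blend price using those parts.
price = (parts_a * 8 + parts_b * 14) / (parts_a + parts_b)
print(price)                                   # 10.0
```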
6. Successive dilution
A vessel holds $V$ litres of pure liquid. You remove $k$ litres of the mixture and replace them with water. Then you do it again. And again. After $n$ rounds, how much of the original liquid is left?
Each operation keeps the same fraction $(1 - k/V)$ of whatever original liquid was present, because you're removing a uniform mixture and replacing the missing volume with water (which contains none of the original). So the original-liquid amount after $n$ rounds is
$$ V \left(1 - \frac{k}{V}\right)^n. $$
A 100 L wine cask diluted by removing 10 L and refilling with water three times retains $100 \cdot 0.9^3 = 100 \cdot 0.729 = 72.9$ L of wine. Not 70 L: that would be the answer if dilution were additive, but it is multiplicative. This is exactly the geometric-decay shape that shows up in radioactive half-lives, drug clearance, and compound interest with negative rates.
Successive dilution is multiplicative, not additive. Removing 10% three times leaves $0.9^3 = 72.9\%$, not $100\% - 30\% = 70\%$. The error grows with more rounds: after ten 10% removals you have about $34.9\%$ left, not the (nonsensical) "100% − 100% = 0%".
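A short sketch comparing the closed-form expression with a round-by-round simulation of the remove-and-refill process. Both function names are illustrative, and the simulation assumes the vessel is perfectly mixed before each removal.

```python
def remaining_after_dilution(volume, removed, rounds):
    """Closed form: V * (1 - k/V)^n."""
    return volume * (1 - removed / volume) ** rounds

def simulate_dilution(volume, removed, rounds):
    """Track the original liquid through each remove-and-refill round."""
    original = volume
    for _ in range(rounds):
        # The removed mixture carries off the current fraction of original liquid.
        original -= removed * (original / volume)
    return original

print(round(remaining_after_dilution(100, 10, 3), 1))    # 72.9
print(round(simulate_dilution(100, 10, 3), 1))           # 72.9
print(round(remaining_after_dilution(100, 10, 10), 1))   # 34.9
```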