1. What probability measures
A probability is just a number between $0$ and $1$ attached to an event. The number says how likely the event is, nothing more and nothing less. The two endpoints and the midpoint are the easy cases:
- $P(E) = 0$ — the event $E$ is impossible. It cannot happen.
- $P(E) = 1$ — the event $E$ is certain. It will happen.
- $P(E) = 0.5$ — the event $E$ is equally likely to happen or not.
Everything in between is a smooth interpolation. $P(E) = 0.9$ means "almost certain to happen, but not guaranteed." $P(E) = 0.01$ means "very unlikely, but possible." The scale is the same whether you're describing coin flips, weather forecasts, or the chance a medical test is right.
You could try to describe likelihood in words — "probably," "very likely," "almost never" — but words drift between people. A number forces you to commit to a specific level of belief and lets you combine it with other numbers using arithmetic. That's the whole reason probability exists as a subject.
Probabilities also have to be consistent. Two mutually exclusive events that together cover every possibility, like "the coin lands heads" and "the coin lands tails," must have probabilities that add to $1$. We'll formalize that in the sections that follow.
2. Sample space and events
Before you can ask "how likely?", you have to be clear about what can happen. That's the job of the sample space.
The sample space $S$ is the set of all possible outcomes of a random experiment. Every individual outcome is a single element of $S$, and the outcomes are mutually exclusive: exactly one of them happens each time you run the experiment.
For a single die, the sample space is
$$ S = \{1, 2, 3, 4, 5, 6\}. $$
For a coin flip, $S = \{\text{H}, \text{T}\}$. For two coin flips, $S = \{\text{HH}, \text{HT}, \text{TH}, \text{TT}\}$. The sample space depends on the experiment, and writing it down explicitly is often the first move that turns a fuzzy question into a solvable one.
An event is any subset of the sample space: a collection of outcomes you've grouped together because you care about them as a unit.
For the die, "rolling an even number" is the event $E = \{2, 4, 6\}$, three of the six outcomes bundled together. "Rolling a six" is the event $E = \{6\}$, a single outcome. "Rolling a number that is at least one" is the event $E = S$, every outcome. Events can be tiny or as large as the whole sample space.
This vocabulary may sound formal, but it's exactly what lets you say the same thing in symbols that you were already saying in words. The set-theoretic view is what makes the rest of probability go through cleanly.
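The set-theoretic vocabulary maps directly onto set operations in code. The following is a minimal sketch in Python; the variable names (`S`, `even`, `six`, `at_least_one`) are illustrative choices, not notation from the text.

```python
# Sample space for one roll of a six-sided die.
S = {1, 2, 3, 4, 5, 6}

# Events are subsets of the sample space.
even = {s for s in S if s % 2 == 0}       # "rolling an even number"
six = {6}                                 # "rolling a six"
at_least_one = {s for s in S if s >= 1}   # "rolling a number that is at least one"

# Every event must be a subset of S.
assert even <= S and six <= S and at_least_one <= S

print(sorted(even))        # [2, 4, 6]
print(at_least_one == S)   # True
```

Writing events as set comprehensions over `S` guarantees they are subsets of the sample space by construction.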
3. The classical formula
When the sample space is finite and every outcome is equally likely, probability becomes counting. Specifically:
$$ P(E) = \frac{|E|}{|S|} = \frac{\text{number of favorable outcomes}}{\text{total number of outcomes}}. $$
"Rolling an even number" on a fair die: $|E| = 3$ favorable outcomes (namely $2, 4, 6$), $|S| = 6$ total outcomes, so $P(E) = 3/6 = 1/2$. "Drawing the ace of spades" from a shuffled standard deck: $1/52$. "Getting heads twice in a row" from two fair coins: $1/4$, because there are four equally likely outcomes ($\text{HH}, \text{HT}, \text{TH}, \text{TT}$) and only one is favorable.
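The counting formula can be sketched in a few lines of Python. The helper name `classical_probability` is hypothetical, and the sketch assumes every outcome in the sample space is equally likely; `fractions.Fraction` keeps the results exact.

```python
from fractions import Fraction
from itertools import product

def classical_probability(event, sample_space):
    """P(E) = |E| / |S|, valid only when every outcome is equally likely."""
    event = set(event)
    sample_space = set(sample_space)
    if not event <= sample_space:
        raise ValueError("event must be a subset of the sample space")
    return Fraction(len(event), len(sample_space))

# One fair die: "rolling an even number".
die = range(1, 7)
p_even = classical_probability({2, 4, 6}, die)

# Two fair coins: "heads twice in a row".
two_flips = {"".join(flips) for flips in product("HT", repeat=2)}
p_two_heads = classical_probability({"HH"}, two_flips)

print(p_even)       # 1/2
print(p_two_heads)  # 1/4
```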
The formula only works when the outcomes really are equally likely. A loaded die, an unfair coin, or a biased lottery breaks the assumption — and silently substituting the formula anyway is one of the most common ways probability problems go wrong. When outcomes aren't equiprobable, you need a more general approach (assign probabilities to each outcome directly and sum them over the event).
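To illustrate the more general approach, here is a sketch with a made-up loaded die. The weights are purely hypothetical (a six half the time, the other faces sharing the rest equally), and `probability` is an illustrative helper name.

```python
from fractions import Fraction

# Hypothetical loaded die: a six comes up half the time,
# the other five faces share the remaining half equally.
weights = {face: Fraction(1, 10) for face in range(1, 6)}
weights[6] = Fraction(1, 2)

# The per-outcome probabilities must add up to 1.
assert sum(weights.values()) == 1

def probability(event, weights):
    """Sum the probabilities of the individual outcomes in the event."""
    return sum((weights[outcome] for outcome in event), Fraction(0))

p_even_loaded = probability({2, 4, 6}, weights)
print(p_even_loaded)  # 7/10
```

Applying the counting formula here would give $3/6 = 1/2$, which is wrong for this die: the event "even" contains the heavily weighted six.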
Two consequences fall straight out of the definition and are worth naming because you'll use them constantly:
- $0 \le P(E) \le 1$ for any event — the count of favorable outcomes is at least zero and at most the total.
- $P(S) = 1$ — the event "something in the sample space happens" is certain, because something always does.
4. Complement, union, intersection
Three rules let you build the probability of complicated events out of simpler ones. Each comes straight from set theory — events are sets, after all — but each is worth memorizing because real problems almost always involve combining events.
Complement
The complement of $A$, written $A^c$ or "not $A$," is everything in the sample space that isn't in $A$. Since $A$ and $A^c$ together cover the whole sample space exactly once,
$$ P(A^c) = 1 - P(A). $$
This is unreasonably useful. Whenever an event is hard to count directly but its complement is easy, flip the problem over. The probability of "at least one heads in five fair coin flips" is annoying to compute directly (lots of cases) but trivial via the complement: $1 - P(\text{all tails}) = 1 - (1/2)^5 = 31/32$.
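The five-flip example can be checked by brute force. This sketch assumes a fair coin, so all $32$ sequences are equally likely, and it compares the direct count with the complement shortcut.

```python
from fractions import Fraction
from itertools import product

# All 2**5 = 32 sequences of five flips, equally likely for a fair coin.
flips = list(product("HT", repeat=5))

# Direct route: count the sequences with at least one head.
direct = Fraction(sum(1 for seq in flips if "H" in seq), len(flips))

# Complement route: only one sequence (all tails) has no heads.
all_tails = Fraction(sum(1 for seq in flips if "H" not in seq), len(flips))
via_complement = 1 - all_tails

print(direct, via_complement)  # 31/32 31/32
```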
Union (inclusion–exclusion)
The union $A \cup B$ is the event "$A$ or $B$ or both." Its probability is
$$ P(A \cup B) = P(A) + P(B) - P(A \cap B). $$
The intersection $A \cap B$ (the event "both $A$ and $B$") gets subtracted because it was counted once in $P(A)$ and once again in $P(B)$. Without the correction you'd be double-counting the overlap.
Mutually exclusive events
Two events are mutually exclusive (or disjoint) when they can't both happen — their intersection is empty, $A \cap B = \emptyset$. In that case the correction term vanishes and the union rule simplifies to
$$ P(A \cup B) = P(A) + P(B) \quad\text{when } A \cap B = \emptyset. $$
"Rolling a 1" and "rolling a 6" on the same die are mutually exclusive: there's no way to roll both, so the probability of either is just $1/6 + 1/6 = 1/3$. "Rolling an even number" and "rolling a 6" are not mutually exclusive (the 6 is in both), so you'd need the full inclusion–exclusion formula.
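Both die examples can be verified with Python's set operators, where `|` is union and `&` is intersection. This is a small sketch for a fair die; `P` is an illustrative helper, not a library function.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}

def P(event):
    """Classical probability on a fair die."""
    return Fraction(len(event), len(S))

even, six, one = {2, 4, 6}, {6}, {1}

# Overlapping events: the full inclusion-exclusion formula is needed.
union_overlap = P(even | six)
formula_overlap = P(even) + P(six) - P(even & six)
print(union_overlap, formula_overlap)   # 1/2 1/2

# Disjoint events: the intersection is empty, so the plain sum works.
union_disjoint = P(one | six)
formula_disjoint = P(one) + P(six)
print(union_disjoint, formula_disjoint) # 1/3 1/3
```

Adding $P(\text{even}) + P(\text{six})$ without the correction would give $2/3$, overstating the true value of $1/2$.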
5. Independent events
Independence is the most slippery idea in this chapter, and the one most worth getting right. Two events are independent if knowing whether one happened tells you nothing about whether the other did.
Events $A$ and $B$ are independent if and only if
$$ P(A \cap B) = P(A) \cdot P(B). $$
This is both the test for independence and the shortcut you get once you've established it. Coin flips are the canonical example: each flip has probability $1/2$ of landing heads, and the result of one flip doesn't influence the next, so the probability of two heads in a row is $\tfrac{1}{2} \cdot \tfrac{1}{2} = \tfrac{1}{4}$.
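The product test can be run directly on the two-flip sample space. This is a sketch assuming a fair coin; the event names `first_heads` and `second_heads` are illustrative.

```python
from fractions import Fraction
from itertools import product

# Four equally likely outcomes: (H,H), (H,T), (T,H), (T,T).
S = set(product("HT", repeat=2))

def P(event):
    return Fraction(len(event), len(S))

first_heads = {s for s in S if s[0] == "H"}
second_heads = {s for s in S if s[1] == "H"}
both_heads = first_heads & second_heads

# Independence test: does P(A and B) equal P(A) * P(B)?
independent = P(both_heads) == P(first_heads) * P(second_heads)
print(P(both_heads), independent)  # 1/4 True
```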
Drawing two cards without replacement from a deck, on the other hand, does not give independent draws: the first card you draw is gone, so the second draw is from a 51-card deck whose composition depends on what you took. You'd need conditional probability (next topic) to handle that case.
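The same product test shows the card draws failing. The sketch below enumerates every ordered pair of distinct cards from a standard 52-card deck and uses "first card is an ace" and "second card is an ace" as the two events. That choice of events, and the card encoding, are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import permutations, product

# Standard deck: 13 ranks x 4 suits, encoded as two-character strings.
ranks = "A23456789TJQK"
suits = "SHDC"
deck = [rank + suit for rank, suit in product(ranks, suits)]

# Every ordered pair of distinct cards: 52 * 51 = 2652 equally likely draws.
draws = list(permutations(deck, 2))

def P(predicate):
    """Fraction of two-card draws that satisfy the predicate."""
    return Fraction(sum(1 for d in draws if predicate(d)), len(draws))

p_first_ace = P(lambda d: d[0][0] == "A")
p_second_ace = P(lambda d: d[1][0] == "A")
p_both_aces = P(lambda d: d[0][0] == "A" and d[1][0] == "A")

print(p_first_ace, p_second_ace)   # 1/13 1/13
print(p_both_aces)                 # 1/221
print(p_first_ace * p_second_ace)  # 1/169
```

Since $1/221 \ne 1/169$, the product rule fails and the two events are not independent.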
The multiplication rule $P(A \cap B) = P(A) P(B)$ only holds when $A$ and $B$ are independent. Reaching for it reflexively, even when the events influence each other, is one of the most common mistakes in introductory probability. When in doubt, ask: "does knowing $A$ happened change my belief about $B$?" If yes, they're not independent and you need a different tool.
6. Frequentist vs. Bayesian, briefly
What does a probability actually mean? There are two dominant answers, and they disagree at the philosophical level even while they agree on most of the arithmetic.
| View | What $P(E)$ means | Comfortable with |
|---|---|---|
| Frequentist | The long-run fraction of times $E$ occurs when you repeat the experiment many times. | Coin flips, manufacturing defect rates, anything you can rerun. |
| Bayesian | A degree of belief about whether $E$ is true, given everything you currently know. | One-off events ("will it rain tomorrow?"), parameters of a model, beliefs that update with evidence. |
A frequentist will happily say "the probability this coin lands heads is $0.5$" because you can flip the coin a million times and watch the fraction settle. They'll be uneasy with "the probability the defendant is guilty is $0.7$" because the trial only runs once — there's no long-run frequency to point to.
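The "watch the fraction settle" idea is easy to simulate. This is a rough sketch using Python's pseudorandom generator as a stand-in for a fair coin; the seed, the number of flips, and the checkpoints are arbitrary choices.

```python
import random

rng = random.Random(0)   # fixed seed so the run is repeatable
n_flips = 100_000
checkpoints = {10, 100, 1_000, 10_000, 100_000}

heads = 0
for i in range(1, n_flips + 1):
    # rng.random() is uniform on [0, 1), so this is heads with probability 0.5.
    heads += rng.random() < 0.5
    if i in checkpoints:
        print(f"{i:>7} flips: fraction of heads = {heads / i:.4f}")

final_fraction = heads / n_flips
```

The early fractions typically wander, while the later ones stay close to $0.5$. That settling behavior is what the frequentist reading points to.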
A Bayesian is fine with both. To them, probability is a quantitative language for uncertainty, applicable wherever you have incomplete information. When evidence arrives, beliefs update via Bayes' theorem — which is exactly the next topic in this chapter.
Don't get too philosophical too early. The rules on this page — sample spaces, the classical formula, complement, union, independence — are the same under both interpretations. Pick the framing that makes the problem in front of you easier, and notice when you're switching between them.