
Probability Basics

Probability is the mathematics of uncertainty — a way to attach a number to "how likely?" so that vague hunches turn into something you can compute with. Almost everything that follows in statistics rests on the handful of ideas on this page.

What you'll leave with

  • What a probability is — a number on the interval $[0, 1]$ — and what the endpoints mean.
  • The sample-space-and-event framing that makes every probability question precise.
  • The classical formula $P(E) = |E| / |S|$ and exactly when you can use it.
  • Three rules — complement, union, and the independence test — that handle most introductory problems.
  • The frequentist vs. Bayesian distinction at a level you can hold in your head.

1. What probability measures

A probability is a number between $0$ and $1$ attached to an event. The number says how likely the event is, nothing more and nothing less. Three reference points anchor the scale, the two endpoints and the midpoint:

  • $P(E) = 0$ — the event $E$ is impossible. It cannot happen.
  • $P(E) = 1$ — the event $E$ is certain. It will happen.
  • $P(E) = 0.5$ — the event $E$ is equally likely to happen or not.

Everything in between is a smooth interpolation. $P(E) = 0.9$ means "almost certain to happen, but not guaranteed." $P(E) = 0.01$ means "very unlikely, but possible." The scale is the same whether you're describing coin flips, weather forecasts, or the chance a medical test is right.

Why a number?

You could try to describe likelihood in words — "probably," "very likely," "almost never" — but words drift between people. A number forces you to commit to a specific level of belief and lets you combine it with other numbers using arithmetic. That's the whole reason probability exists as a subject.

Probabilities also have to be consistent. Two events that together cover every possibility — like "the coin lands heads" and "the coin lands tails" — must have probabilities that add to $1$. We'll formalize that in the next section.

2. Sample space and events

Before you can ask "how likely?", you have to be clear about what can happen. That's the job of the sample space.

Sample space, $S$

The set of all possible outcomes of a random experiment. Every individual outcome is a single element of $S$, and the outcomes are mutually exclusive — exactly one of them happens each time you run the experiment.

For a single die, the sample space is

$$ S = \{1, 2, 3, 4, 5, 6\}. $$

For a coin flip, $S = \{\text{H}, \text{T}\}$. For two coin flips, $S = \{\text{HH}, \text{HT}, \text{TH}, \text{TT}\}$. The sample space depends on the experiment, and writing it down explicitly is often the first move that turns a fuzzy question into a solvable one.

Event, $E$

Any subset of the sample space. An event is a collection of outcomes you've grouped together because you care about them as a unit.

For the die, "rolling an even number" is the event $E = \{2, 4, 6\}$: three of the six outcomes bundled together. "Rolling a six" is the event $E = \{6\}$, a single outcome. "Rolling a number that is at least 1" is the event $E = S$, which contains every outcome. Events can be tiny or as large as the whole sample space.

This vocabulary may sound formal, but it's exactly what lets you say the same thing in symbols that you were already saying in words. The set-theoretic view is what makes the rest of probability go through cleanly.
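The set-theoretic framing maps almost directly onto code. The sketch below is one possible illustration, assuming Python's built-in `set` and `itertools.product` as stand-ins for the mathematical objects; the variable names are made up for this example.

```python
from itertools import product

# Sample space for one roll of a die: each outcome is one element.
die = {1, 2, 3, 4, 5, 6}

# Sample space for two coin flips: every ordered pair of H/T.
two_flips = {"".join(pair) for pair in product("HT", repeat=2)}

# Events are subsets of the sample space.
even = {outcome for outcome in die if outcome % 2 == 0}
six = {6}
at_least_one = {outcome for outcome in die if outcome >= 1}

print(sorted(two_flips))        # ['HH', 'HT', 'TH', 'TT']
print(even <= die, six <= die)  # True True  (both are subsets of S)
print(at_least_one == die)      # True       (this event is all of S)
```

The `<=` operator on sets is the subset test, so "an event is any subset of the sample space" becomes something you can check mechanically.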

3. The classical formula

When the sample space is finite and every outcome is equally likely, probability becomes counting. Specifically:

$$ P(E) = \frac{|E|}{|S|} = \frac{\text{number of favorable outcomes}}{\text{total number of outcomes}}. $$

"Rolling an even number" on a fair die: $|E| = 3$ favorable outcomes (namely $2, 4, 6$), $|S| = 6$ total outcomes, so $P(E) = 3/6 = 1/2$. "Drawing the ace of spades" from a shuffled standard deck: $1/52$. "Getting heads twice in a row" from two fair coins: $1/4$ — there are four equally likely outcomes ($\text{HH}, \text{HT}, \text{TH}, \text{TT}$) and only one is favorable.

Read the fine print

The formula only works when the outcomes really are equally likely. A loaded die, an unfair coin, or a biased lottery breaks the assumption — and silently substituting the formula anyway is one of the most common ways probability problems go wrong. When outcomes aren't equiprobable, you need a more general approach (assign probabilities to each outcome directly and sum them over the event).

Two consequences fall straight out of the definition and are worth naming because you'll use them constantly:

  • $0 \le P(E) \le 1$ for any event — the count of favorable outcomes is at least zero and at most the total.
  • $P(S) = 1$ — the event "something in the sample space happens" is certain, because something always does.
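Both the classical formula and the more general per-outcome approach from the fine print can be sketched in a few lines. This is a minimal illustration rather than a reference implementation: the function names are invented here, and the loaded die's probabilities are an assumption chosen only to show how the two approaches diverge.

```python
from fractions import Fraction
from itertools import product


def classical_probability(event, sample_space):
    """Return |E| / |S|; valid only if all outcomes are equally likely."""
    event, sample_space = set(event), set(sample_space)
    if not event <= sample_space:
        raise ValueError("event must be a subset of the sample space")
    return Fraction(len(event), len(sample_space))


def weighted_probability(event, weights):
    """Sum per-outcome probabilities over the event (general finite case)."""
    return sum((weights[outcome] for outcome in event), Fraction(0))


die = set(range(1, 7))
deck = set(product("A23456789TJQK", "SHDC"))  # 52 (rank, suit) pairs
two_flips = {"HH", "HT", "TH", "TT"}

p_even = classical_probability({2, 4, 6}, die)
p_ace_of_spades = classical_probability({("A", "S")}, deck)
p_two_heads = classical_probability({"HH"}, two_flips)
print(p_even, p_ace_of_spades, p_two_heads)  # 1/2 1/52 1/4

# Hypothetical loaded die (an assumption for illustration): 6 has
# probability 1/2 and the other five faces share the remaining 1/2 equally.
loaded = {face: Fraction(1, 10) for face in range(1, 6)}
loaded[6] = Fraction(1, 2)
assert sum(loaded.values()) == 1  # P(S) = 1 still holds

p_even_loaded = weighted_probability({2, 4, 6}, loaded)
print(p_even_loaded)  # 7/10, not the 1/2 that naive counting would give
```

Using `Fraction` keeps the results exact, which makes them easy to compare against the hand calculations above.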

4. Complement, union, intersection

Three rules let you build the probability of complicated events out of simpler ones. Each comes straight from set theory — events are sets, after all — but each is worth memorizing because real problems almost always involve combining events.

Complement

The complement of $A$, written $A^c$ or "not $A$," is everything in the sample space that isn't in $A$. Since $A$ and $A^c$ together cover the whole sample space exactly once,

$$ P(A^c) = 1 - P(A). $$

This is unreasonably useful. Whenever an event is hard to count directly but its complement is easy, flip the problem over. The probability of "at least one heads in five flips" is annoying to compute directly (lots of cases) but trivial via the complement: $1 - P(\text{all tails}) = 1 - (1/2)^5 = 31/32$.
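A brute-force count confirms the shortcut. The snippet below is a small check, assuming fair and independent flips so that all $2^5 = 32$ sequences are equally likely; it enumerates every sequence and compares the direct count with the complement calculation.

```python
from fractions import Fraction
from itertools import product

# All 32 equally likely sequences of five flips, e.g. ('H', 'T', 'T', 'H', 'T').
sequences = list(product("HT", repeat=5))

# Direct route: count the sequences containing at least one head.
direct = Fraction(sum(1 for seq in sequences if "H" in seq), len(sequences))

# Complement route: 1 - P(all tails).
via_complement = 1 - Fraction(1, 2) ** 5

print(direct, via_complement)  # 31/32 31/32
```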

Union (inclusion–exclusion)

The union $A \cup B$ is the event "$A$ or $B$ or both." Its probability is

$$ P(A \cup B) = P(A) + P(B) - P(A \cap B). $$

The intersection $A \cap B$ — the event "both $A$ and $B$" — gets subtracted because it was counted once in $P(A)$ and once again in $P(B)$. Without the correction you'd be double-counting the overlap.

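[Figure: Venn diagram of events $A$ and $B$ inside the sample space $S$, with the overlap $A \cap B$ marked. Caption: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.]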
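The union rule can be verified by counting on a single fair die. The events below ("even" and "at least 4") are hypothetical choices picked because they overlap; the helper `p` is just the classical formula and assumes equally likely outcomes.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}


def p(event):
    """Classical probability on a fair die: |E| / |S|."""
    return Fraction(len(event), len(S))


even = {2, 4, 6}
at_least_four = {4, 5, 6}

# Left side: count the union directly. Right side: inclusion-exclusion.
lhs = p(even | at_least_four)
rhs = p(even) + p(at_least_four) - p(even & at_least_four)
print(lhs, rhs)  # 2/3 2/3

# Without the correction term, the overlap {4, 6} is counted twice.
naive = p(even) + p(at_least_four)
print(naive)  # 1
```

Here the naive sum gives $1$, which would wrongly claim the union is certain even though rolling a 1 or a 3 falls outside both events.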

Mutually exclusive events

Two events are mutually exclusive (or disjoint) when they can't both happen — their intersection is empty, $A \cap B = \emptyset$. In that case the correction term vanishes and the union rule simplifies to

$$ P(A \cup B) = P(A) + P(B) \quad\text{when } A \cap B = \emptyset. $$

"Rolling a 1" and "rolling a 6" on the same die are mutually exclusive — there's no way to roll both, so the probability of either is just $1/6 + 1/6 = 1/3$. "Rolling an even number" and "rolling a 6" are not mutually exclusive (the 6 is in both), so you'd need the full inclusion–exclusion formula.

5. Independent events

Independence is the most slippery idea in this chapter, and the one most worth getting right. Two events are independent if knowing whether one happened tells you nothing about whether the other did.

Independence

Events $A$ and $B$ are independent if and only if

$$ P(A \cap B) = P(A) \cdot P(B). $$

This is both the test for independence and the shortcut you get once you've established it. Coin flips are the canonical example: each flip has probability $1/2$ of landing heads, and the result of one flip doesn't influence the next, so the probability of two heads in a row is $\tfrac{1}{2} \cdot \tfrac{1}{2} = \tfrac{1}{4}$.

Drawing two cards without replacement from a deck, on the other hand, is not independent — the first card you draw is gone, so the second draw is from a 51-card deck whose composition depends on what you took. You'd need conditional probability (next topic) to handle that case.

Pitfall

The multiplication rule $P(A \cap B) = P(A) P(B)$ only holds when $A$ and $B$ are independent. Reaching for it reflexively, even when the events influence each other, is one of the most common mistakes in introductory probability. When in doubt, ask: "does knowing $A$ happened change my belief about $B$?" If yes, they're not independent and you need a different tool.
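The product test can be run mechanically on any finite sample space with equally likely outcomes. The sketch below is one way to do that; the helper names (`prob`, `is_independent`) and the specific events are assumptions made for illustration. It checks a coin-and-die pair, then two cards drawn without replacement, where the events are "first card is an ace" and "second card is an ace."

```python
from fractions import Fraction
from itertools import permutations, product


def prob(event, space):
    """Classical probability of a predicate over equally likely outcomes."""
    return Fraction(sum(1 for outcome in space if event(outcome)), len(space))


def is_independent(a, b, space):
    """Apply the test P(A and B) == P(A) * P(B) exactly."""
    p_both = prob(lambda outcome: a(outcome) and b(outcome), space)
    return p_both == prob(a, space) * prob(b, space)


# Experiment 1: flip a coin and roll a die (12 equally likely pairs).
coin_die = list(product("HT", range(1, 7)))


def heads(outcome):
    return outcome[0] == "H"


def shows_six(outcome):
    return outcome[1] == 6


# Experiment 2: two cards without replacement (52 * 51 ordered pairs).
deck = list(product("A23456789TJQK", "SHDC"))
draws = list(permutations(deck, 2))


def first_ace(outcome):
    return outcome[0][0] == "A"


def second_ace(outcome):
    return outcome[1][0] == "A"


print(is_independent(heads, shows_six, coin_die))   # True
print(is_independent(first_ace, second_ace, draws))  # False
```

For the cards, $P(\text{both aces}) = 1/221$ while $P(\text{first ace}) \cdot P(\text{second ace}) = 1/169$, so the product rule would overstate the answer.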

6. Frequentist vs. Bayesian, briefly

What does a probability actually mean? There are two dominant answers, and they disagree at the philosophical level even while they agree on most of the arithmetic.

| View | What $P(E)$ means | Comfortable with |
| --- | --- | --- |
| Frequentist | The long-run fraction of times $E$ occurs when you repeat the experiment many times. | Coin flips, manufacturing defect rates, anything you can rerun. |
| Bayesian | A degree of belief about whether $E$ is true, given everything you currently know. | One-off events ("will it rain tomorrow?"), parameters of a model, beliefs that update with evidence. |

A frequentist will happily say "the probability this coin lands heads is $0.5$" because you can flip the coin a million times and watch the fraction settle. They'll be uneasy with "the probability the defendant is guilty is $0.7$" because the trial only runs once — there's no long-run frequency to point to.

A Bayesian is fine with both. To them, probability is a quantitative language for uncertainty, applicable wherever you have incomplete information. When evidence arrives, beliefs update via Bayes' theorem — which is exactly the next topic in this chapter.

In practice

Don't get too philosophical too early. The rules on this page — sample spaces, the classical formula, complement, union, independence — are the same under both interpretations. Pick the framing that makes the problem in front of you easier, and notice when you're switching between them.

7. Playground: frequency converges to probability

Pick an experiment, then run it. One trial is just a single outcome, but as you keep going, the observed proportion of successes settles toward the theoretical probability. That convergence is the Law of Large Numbers, and it's why frequentist probability works at all: the more trials you stack up, the smaller the gap between "what actually happened" and "what was supposed to happen" tends to be. The gap need not shrink on every individual trial, only in the long run.

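[Interactive plot: observed proportion (vertical axis, 0.0 to 1.0) against trial number (horizontal axis), with the theoretical probability drawn as a reference line at $P = 0.500$. Counters report trials, successes, observed proportion, theoretical probability, and absolute error.]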

Capped at 50,000 trials per session to keep the browser responsive. Reset clears the run.

What to notice

The first few flips can sit far from the dashed line — small samples are noisy. Push to a few hundred trials and the orange line tightens onto the green. Switch experiments and re-run: a $P = 0.3$ biased coin converges to a lower line, but it converges just the same. The Law of Large Numbers doesn't care what the target probability is, only that it exists.
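The same experiment can be reproduced offline. The sketch below is a minimal simulation, not the code behind the playground: it assumes Python's `random.Random` with a fixed seed (so runs are repeatable), and the function name `running_proportion` is made up here. It prints the observed proportion and absolute error at a few checkpoints for $P = 0.5$ and $P = 0.3$.

```python
import random


def running_proportion(p, trials, seed=0):
    """Simulate Bernoulli(p) trials; return the observed proportion after each one."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    successes = 0
    proportions = []
    for n in range(1, trials + 1):
        if rng.random() < p:   # success with probability p
            successes += 1
        proportions.append(successes / n)
    return proportions


for p in (0.5, 0.3):
    props = running_proportion(p, 50_000)
    for n in (10, 100, 1_000, 50_000):
        observed = props[n - 1]
        print(f"p={p}  n={n:>6}  observed={observed:.3f}  |error|={abs(observed - p):.3f}")
```

Changing the seed changes the early, noisy part of each run; the late checkpoints should stay close to the target probability.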

8. Common pitfalls

Assuming outcomes are equally likely when they aren't

The classical formula $P(E) = |E|/|S|$ is only valid when every outcome in $S$ has the same probability. A spinner with one big section and one tiny section has two "outcomes" but they aren't equiprobable — counting them naively will give you nonsense. Before applying the formula, ask: are these outcomes genuinely symmetric?

Confusing "independent" with "mutually exclusive"

These sound similar and mean nearly opposite things. Mutually exclusive events can't both happen — $P(A \cap B) = 0$. Independent events can both happen, but knowing one doesn't shift the probability of the other — $P(A \cap B) = P(A) P(B)$. In fact, two events with nonzero probability that are mutually exclusive are never independent: knowing $A$ happened tells you for certain that $B$ didn't.
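Spelled out as a short derivation, under the stated assumption that both events have nonzero probability:

$$ A \cap B = \emptyset \;\Rightarrow\; P(A \cap B) = 0, \qquad P(A) > 0 \text{ and } P(B) > 0 \;\Rightarrow\; P(A)\,P(B) > 0. $$

So $P(A \cap B) \ne P(A)\,P(B)$, and the independence test fails. For instance, with $A = \{1\}$ and $B = \{6\}$ on a fair die, $P(A \cap B) = 0$ while $P(A)\,P(B) = \tfrac{1}{6} \cdot \tfrac{1}{6} = \tfrac{1}{36}$.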

Forgetting to subtract the overlap

When computing $P(A \cup B)$, beginners often write $P(A) + P(B)$ and stop there. That's only right when the events are mutually exclusive. If $A$ and $B$ can both occur, you've double-counted the overlap and the answer will be too big — sometimes hilariously so (greater than $1$).
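A concrete instance on a fair die, with events chosen here purely for illustration: let $A = \{1, 2, 3, 4\}$ and $B = \{3, 4, 5, 6\}$, so $A \cap B = \{3, 4\}$. The naive sum overshoots:

$$ P(A) + P(B) = \tfrac{4}{6} + \tfrac{4}{6} = \tfrac{4}{3} > 1. $$

Subtracting the overlap restores a valid probability:

$$ P(A \cup B) = \tfrac{4}{6} + \tfrac{4}{6} - \tfrac{2}{6} = 1. $$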

The gambler's fallacy

"The coin has landed heads five times in a row, so tails is due." No, it isn't. If the flips are genuinely independent, the coin has no memory — the probability of heads on the next flip is still $1/2$, no matter what the last hundred flips were. The long-run average does settle toward $1/2$, but not because individual flips correct for past results; rather because past results get drowned out by the sheer volume of future flips. Confusing "the long-run average is $1/2$" with "the next flip is biased toward whatever has been rarer" is the gambler's fallacy, and it costs people money every day.

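The "no memory" claim about independent coin flips can be checked by simulation. The sketch below assumes a fair coin modeled with Python's seeded `random.Random`; the function name and the streak length of five are choices made for this example. It looks only at flips that immediately follow five heads in a row and measures how often those flips come up heads.

```python
import random


def heads_rate_after_streak(flips, streak=5):
    """Heads frequency among flips that directly follow `streak` heads in a row."""
    followers = [
        flips[i]
        for i in range(streak, len(flips))
        if all(flips[i - streak:i])  # the previous `streak` flips were all heads
    ]
    return sum(followers) / len(followers), len(followers)


rng = random.Random(42)
flips = [rng.random() < 0.5 for _ in range(200_000)]  # True means heads

rate, count = heads_rate_after_streak(flips)
print(f"{count} flips followed five straight heads; heads rate among them: {rate:.3f}")
```

If tails were really "due," the reported rate would sit well below one half; for independent flips it should hover near $0.5$.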
9. Worked examples

Try each one before opening the solution. The arithmetic is the easy part — the skill being built is identifying which rule applies.

Example 1 · Probability of rolling a 6 on a fair die

Set up. Sample space $S = \{1, 2, 3, 4, 5, 6\}$, with $|S| = 6$. The event is $E = \{6\}$, with $|E| = 1$.

Apply. All outcomes are equally likely on a fair die, so the classical formula applies:

$$ P(E) = \frac{|E|}{|S|} = \frac{1}{6} \approx 0.167. $$

Example 2 · Probability of not rolling a 6

Set up. The event "not a 6" is the complement of "a 6." We just computed $P(\{6\}) = 1/6$.

Apply. Use the complement rule:

$$ P(\text{not } 6) = 1 - P(6) = 1 - \tfrac{1}{6} = \tfrac{5}{6}. $$

Sanity check. Five outcomes ($1, 2, 3, 4, 5$) are favorable out of six total, so $5/6$ directly. Both routes agree.

Example 3 · Probability of rolling a 6 or a 4

Set up. Let $A = \{6\}$ and $B = \{4\}$. The two events are mutually exclusive — a single die roll can't be both 4 and 6 — so $A \cap B = \emptyset$ and $P(A \cap B) = 0$.

Apply. The simplified union rule for disjoint events:

$$ P(A \cup B) = P(A) + P(B) = \tfrac{1}{6} + \tfrac{1}{6} = \tfrac{2}{6} = \tfrac{1}{3}. $$

Example 4 · Probability of two independent events both occurring

Set up. Flip a fair coin and roll a fair die. Let $A = $ "coin lands heads" and $B = $ "die shows 6." These two physical processes don't influence each other, so $A$ and $B$ are independent. $P(A) = 1/2$, $P(B) = 1/6$.

Apply. The independence multiplication rule:

$$ P(A \cap B) = P(A) \cdot P(B) = \tfrac{1}{2} \cdot \tfrac{1}{6} = \tfrac{1}{12}. $$

Sanity check. The combined sample space has $2 \times 6 = 12$ equally likely outcomes, and exactly one of them (heads-and-6) is favorable: $1/12$ again.

Example 5 · Probability that at least one of two independent events occurs

Set up. Same setup as Example 4: $P(A) = 1/2$ and $P(B) = 1/6$, independent. We want $P(A \cup B)$ — the probability that at least one of the two events happens.

Route 1: complement. "At least one occurs" is the complement of "neither occurs." For independent events, "neither" also multiplies:

$$ P(\text{neither}) = P(A^c) \cdot P(B^c) = \tfrac{1}{2} \cdot \tfrac{5}{6} = \tfrac{5}{12}. $$

$$ P(A \cup B) = 1 - \tfrac{5}{12} = \tfrac{7}{12}. $$

Route 2: inclusion–exclusion.

$$ P(A \cup B) = P(A) + P(B) - P(A \cap B) = \tfrac{1}{2} + \tfrac{1}{6} - \tfrac{1}{12} = \tfrac{6}{12} + \tfrac{2}{12} - \tfrac{1}{12} = \tfrac{7}{12}. $$

Both routes agree. Notice how much cleaner the complement approach is — that's the pattern. Whenever "at least one" appears in the question, try the complement first.
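Examples 4 and 5 can also be confirmed by listing the combined sample space. The sketch below is a brute-force check, assuming a fair coin and a fair die so that all 12 (coin, die) pairs are equally likely; the helper `p` is an illustrative name.

```python
from fractions import Fraction
from itertools import product

# Combined sample space: 2 coin faces x 6 die faces = 12 outcomes.
space = list(product("HT", range(1, 7)))


def p(event):
    """Classical probability of a predicate over the combined sample space."""
    return Fraction(sum(1 for outcome in space if event(outcome)), len(space))


p_both = p(lambda o: o[0] == "H" and o[1] == 6)     # Example 4
p_either = p(lambda o: o[0] == "H" or o[1] == 6)    # Example 5
p_neither = p(lambda o: o[0] != "H" and o[1] != 6)  # complement of Example 5

print(p_both, p_either, p_neither)  # 1/12 7/12 5/12
```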

