Distributions

What you'll leave with

What a distribution is, in both the discrete (table) and continuous (density) cases.
The five foundational discrete distributions (Bernoulli, binomial, geometric, Poisson, and discrete uniform) and what real situations they model.
The three foundational continuous distributions: uniform, exponential, normal — and the parameters that shape each.
Why the normal distribution shows up almost everywhere — and why the value of a density at a single point is not a probability.
The relationship between the PDF (the curve) and the CDF (the running total).

1. What a distribution is

Probability distribution

A description of the probabilities for all possible values a random variable $X$ can take. It assigns total probability $1$ across the set of outcomes — never more, never less.

The shape of that description depends on what kind of values $X$ can take.

Discrete: a table

If $X$ takes values from a countable set — like $\{0, 1, 2, \dots\}$ — its distribution is a probability mass function (PMF): a rule $p(x) = P(X = x)$ that gives the probability of each specific value. The list of all those probabilities sums to $1$:

$$ \sum_{x} p(x) = 1 $$

You can write it out as a table. For a fair six-sided die:

$x$	$1$	$2$	$3$	$4$	$5$	$6$
$P(X = x)$	$\tfrac{1}{6}$	$\tfrac{1}{6}$	$\tfrac{1}{6}$	$\tfrac{1}{6}$	$\tfrac{1}{6}$	$\tfrac{1}{6}$

That table is the distribution.
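As a minimal sketch, the same table can be held in a Python dictionary. The names `die_pmf` and `p_even` are illustrative choices, not part of any library, and exact fractions are used so the total comes out to exactly $1$ rather than approximately.

```python
from fractions import Fraction

# PMF of a fair six-sided die: each face maps to its probability.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# A valid PMF must put total probability exactly 1 on its outcomes.
total = sum(die_pmf.values())

# The probability of an event is the sum of the masses of its outcomes.
p_even = sum(p for face, p in die_pmf.items() if face % 2 == 0)

print(total)   # 1
print(p_even)  # 1/2
```

Any discrete distribution with finitely many outcomes can be stored this way; only the probabilities in the dictionary change.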

Continuous: a curve

If $X$ takes values from a continuum — like all real numbers, or all positive times — there are infinitely many outcomes, and assigning each a positive probability would make them sum to infinity. So we describe a continuous random variable by a probability density function (PDF) $f(x)$, with the rule that probability comes from area under the curve:

$$ P(a \leq X \leq b) = \int_{a}^{b} f(x)\,dx, \qquad \int_{-\infty}^{\infty} f(x)\,dx = 1 $$

The density at a single point isn't a probability — it's a rate of probability per unit of $x$. The probability of any exact value is zero. (We'll harp on this in the pitfalls.)

2. Discrete: Bernoulli, binomial, geometric, Poisson, uniform

A handful of named families cover an enormous share of everyday discrete situations. Bernoulli is a single trial; binomial counts successes in a fixed number of trials; geometric counts trials until the first success; Poisson counts rare events in a window; discrete uniform spreads probability equally across a finite set.

Bernoulli — a single yes/no trial

A Bernoulli random variable models one trial with two outcomes: success (call it $1$) with probability $p$, failure ($0$) with probability $1 - p$.

$$ P(X = 1) = p, \qquad P(X = 0) = 1 - p $$

Mean $\mu = p$, variance $\sigma^2 = p(1-p)$. A single coin flip with $p = 0.5$ is the canonical example, but it works for any binary trial: a click or no click, a defective part or not, a hit or a miss.
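One way to see the mean and variance formulas at work is to simulate many trials. The sketch below assumes $p = 0.3$ and a fixed seed (both arbitrary choices for illustration); `simulate_bernoulli` is a hypothetical helper name.

```python
import random

def simulate_bernoulli(p, trials, seed=0):
    """Return a list of 0/1 outcomes; the seed is fixed only for repeatability."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(trials)]

p = 0.3
outcomes = simulate_bernoulli(p, trials=100_000)

# Empirical mean and variance of the simulated trials.
sample_mean = sum(outcomes) / len(outcomes)
sample_var = sum((x - sample_mean) ** 2 for x in outcomes) / len(outcomes)

print(sample_mean, "vs p =", p)
print(sample_var, "vs p(1-p) =", p * (1 - p))
```

The simulated values will not match the formulas exactly, but with this many trials they should agree to about two decimal places.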

Binomial — counting successes in $n$ Bernoulli trials

Run $n$ independent Bernoulli trials, each with probability $p$ of success, and count the number $k$ of successes. That count is binomial:

$$ P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}, \qquad k = 0, 1, \dots, n $$

The binomial coefficient $\binom{n}{k}$ counts the number of ways to choose which $k$ of the $n$ trials are the successes; the $p^k (1-p)^{n-k}$ is the probability of any one specific arrangement. Mean $\mu = np$, variance $\sigma^2 = np(1-p)$.

Figure: PMF of the binomial for $n = 10$, $p = 0.4$. Bars stand at each integer $k$ and their heights add to $1$.
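The binomial PMF for $n = 10$, $p = 0.4$ can also be tabulated directly from the formula. This is a rough sketch using only the standard library; `binomial_pmf` is an illustrative name, and the text bars are just a stand-in for a plot.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.4
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Text "bar chart": one '#' per percentage point of probability.
for k, prob in enumerate(pmf):
    print(f"{k:2d} {prob:.4f} {'#' * round(prob * 100)}")

total = sum(pmf)                                     # 1 up to rounding error
mean = sum(k * prob for k, prob in enumerate(pmf))   # matches n * p = 4
```

The tallest bar sits at $k = 4$, which is also the mean $np$ here.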

Geometric — trials until the first success

Repeat independent Bernoulli($p$) trials and ask: which trial is the first success? Let $X$ count the number of trials needed (so $X \in \{1, 2, 3, \dots\}$). Then $X$ is geometric:

$$ P(X = k) = (1 - p)^{k - 1} \, p, \qquad k = 1, 2, 3, \dots $$

The story is direct: $k - 1$ failures in a row, each with probability $1 - p$, followed by a success with probability $p$. Mean $\mu = 1/p$, variance $\sigma^2 = (1-p)/p^2$. The geometric is the discrete cousin of the exponential and shares its memoryless property: given you've already failed $j$ times, the number of additional trials needed is still geometric with the same $p$.
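The memoryless property can be checked numerically from the PMF. The values $p = 0.2$, $j = 3$, $k = 5$ below are arbitrary, and the helper names are illustrative.

```python
from math import isclose

def geometric_pmf(k, p):
    """P(X = k): the first success lands on trial k (k = 1, 2, 3, ...)."""
    return (1 - p) ** (k - 1) * p

def geometric_tail(k, p):
    """P(X > k): the first k trials all fail."""
    return (1 - p) ** k

p, j, k = 0.2, 3, 5

# P(X = j + k | X > j) = P(X = j + k) / P(X > j)
conditional = geometric_pmf(j + k, p) / geometric_tail(j, p)

# Memorylessness says this equals P(X = k) for a fresh start.
fresh = geometric_pmf(k, p)
print(isclose(conditional, fresh))  # True

# Truncated sum for the mean; the neglected tail is vanishingly small.
mean_estimate = sum(t * geometric_pmf(t, p) for t in range(1, 1000))
print(mean_estimate, "vs 1/p =", 1 / p)
```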

Poisson — counting rare events in a fixed interval

Sometimes you don't have a fixed $n$ — you have a window of time or space (an hour at a call center, a square meter of fabric, a kilometer of road) and you count how many events fall inside it. If events arrive independently at a constant average rate $\lambda$ per window, the count $X$ is Poisson:

$$ P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \dots $$

Mean $\mu = \lambda$, variance $\sigma^2 = \lambda$ — the same number for both, which is itself a clue when you suspect a Poisson in real data. The Poisson is also the limit of the binomial when $n \to \infty$, $p \to 0$, and $np \to \lambda$: many opportunities, each tiny, with a fixed average count.
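The binomial-to-Poisson limit is easy to watch numerically. This sketch assumes $\lambda = 3$ and looks at $k = 2$ (arbitrary choices), holding $np = \lambda$ fixed while $n$ grows.

```python
from math import comb, exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * exp(-lam) / factorial(k)

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

lam, k = 3.0, 2
target = poisson_pmf(k, lam)

# Keep n * p = lam fixed while n grows; the gap to the Poisson value shrinks.
gaps = []
for n in (10, 100, 1000, 10000):
    gap = abs(binomial_pmf(k, n, lam / n) - target)
    gaps.append(gap)
    print(n, gap)
```

Each tenfold increase in $n$ cuts the gap by roughly a factor of ten.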

Discrete uniform — every outcome equally likely

If $X$ takes one of $n$ equally likely values $\{x_1, x_2, \dots, x_n\}$, it is discrete uniform:

$$ P(X = x_i) = \frac{1}{n}, \qquad i = 1, 2, \dots, n $$

A fair die ($n = 6$, values $\{1, \dots, 6\}$) is the canonical example. For consecutive integers $\{1, 2, \dots, n\}$: mean $\mu = (n+1)/2$, variance $\sigma^2 = (n^2 - 1)/12$.
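The closed-form mean and variance can be verified straight from the definition. The sketch below uses exact fractions; `discrete_uniform_moments` is a hypothetical helper name.

```python
from fractions import Fraction

def discrete_uniform_moments(n):
    """Exact mean and variance of a uniform distribution on {1, ..., n}."""
    p = Fraction(1, n)
    mean = sum(x * p for x in range(1, n + 1))
    variance = sum((x - mean) ** 2 * p for x in range(1, n + 1))
    return mean, variance

mean, variance = discrete_uniform_moments(6)
print(mean, variance)  # 7/2 35/12

# Compare against the formulas (n + 1) / 2 and (n^2 - 1) / 12.
print(mean == Fraction(6 + 1, 2), variance == Fraction(6**2 - 1, 12))  # True True
```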

When to reach for which

A single yes/no trial → Bernoulli. A fixed number $n$ of repeated yes/no trials, count successes → binomial. Repeat trials until the first success, count trials → geometric. Counts of rare events in a continuous window with no fixed $n$ → Poisson. Every outcome in a finite set equally likely → discrete uniform.

3. Continuous: uniform, exponential, normal

For continuous random variables we trade tables for densities. Three families do most of the work.

Uniform — flat density on an interval

If every value in $[a, b]$ is equally likely and nothing outside the interval is possible, $X$ is uniform on $[a, b]$:

$$ f(x) = \begin{cases} \dfrac{1}{b - a} & a \leq x \leq b \\[4pt] 0 & \text{otherwise} \end{cases} $$

The constant height $1/(b-a)$ is exactly what's needed for the total area to be $1$. Mean $\mu = (a+b)/2$, variance $\sigma^2 = (b-a)^2/12$.
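A quick numerical check makes the "height times width" picture concrete. The sketch assumes the interval $[0, 0.5]$, chosen so the height exceeds $1$; `area` is a simple midpoint Riemann sum written for illustration, not a library routine.

```python
def uniform_pdf(x, a, b):
    """Density of Uniform(a, b): constant on [a, b], zero elsewhere."""
    return 1 / (b - a) if a <= x <= b else 0.0

def area(f, lo, hi, steps=10_000):
    """Midpoint Riemann sum of f over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width

a, b = 0.0, 0.5
height = uniform_pdf(0.25, a, b)                        # 2.0, larger than 1
total = area(lambda x: uniform_pdf(x, a, b), a, b)      # about 1
part = area(lambda x: uniform_pdf(x, a, b), 0.1, 0.2)   # about 0.2

print(height, total, part)
```

The density is $2$ everywhere on the interval, yet the total area is still $1$: the height is a rate, and probability only appears once you multiply by a width.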

Exponential — waiting time for a Poisson event

If events occur as a Poisson process with rate $\lambda$, the waiting time $X$ until the next event is exponential:

$$ f(x) = \lambda e^{-\lambda x}, \qquad x \geq 0 $$

Mean $\mu = 1/\lambda$, variance $\sigma^2 = 1/\lambda^2$. The exponential is "memoryless": knowing you've already waited 10 minutes tells you nothing about how much longer you'll wait. That's not intuitive — but it's the exact property that makes it model independent Poisson arrivals.
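Memorylessness is easier to believe after seeing it in simulated data. This sketch assumes a rate of $\lambda = 0.1$ per minute (so a mean wait of 10 minutes) and a fixed seed; both are arbitrary choices for illustration.

```python
import random
from math import exp

lam = 0.1                # assumed rate: on average one event every 10 minutes
rng = random.Random(1)   # fixed seed only so the run is repeatable
waits = [rng.expovariate(lam) for _ in range(200_000)]

# Among waits that already exceeded 10 minutes, how often do they last 5 more?
long_waits = [w for w in waits if w > 10]
conditional = sum(w > 15 for w in long_waits) / len(long_waits)

# Memorylessness predicts the same value as P(X > 5) for a fresh wait.
fresh = exp(-lam * 5)
sample_mean = sum(waits) / len(waits)

print(round(conditional, 3), round(fresh, 3))
print(round(sample_mean, 2), "vs 1/lambda =", 1 / lam)
```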

Normal (Gaussian) — the bell curve

The normal distribution with mean $\mu$ and standard deviation $\sigma$ has density

$$ f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right) $$

Symmetric about $\mu$, with $\sigma$ controlling the width. The standard normal is the special case $\mu = 0, \sigma = 1$ — denoted $Z$ and tabulated in every statistics textbook because every other normal can be rescaled to it via $Z = (X - \mu)/\sigma$.

The 68–95–99.7 rule is worth memorizing: roughly $68\%$ of the probability sits within $\pm 1\sigma$ of the mean, $95\%$ within $\pm 2\sigma$, and $99.7\%$ within $\pm 3\sigma$.
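The three percentages can be reproduced with the standard library's `statistics.NormalDist`. The helper name `prob_within` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def prob_within(k):
    """P(-k <= Z <= k) for a standard normal Z."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

Because any normal rescales to $Z$, the same numbers hold for $\pm k\sigma$ around $\mu$ in every normal distribution.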

4. The normal distribution — why it's everywhere

The normal isn't just one distribution among many — it's the shape that quietly takes over whenever you add up enough small independent random influences. Heights, measurement errors, sample means, sums of dice rolls — the more independent random nudges contribute, the closer the result lies to a bell curve.

This isn't folklore. It's the Central Limit Theorem (CLT): if $X_1, X_2, \dots, X_n$ are independent, identically distributed random variables (from any reasonable distribution) with mean $\mu$ and finite variance $\sigma^2$, then as $n \to \infty$ the standardized sample mean

$$ Z_n = \frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} $$

converges in distribution to the standard normal $N(0, 1)$. The underlying distribution can be wildly non-normal — uniform, skewed, even discrete — and the average still pulls toward the bell. That's why the normal is overrepresented in measurements of natural quantities: those quantities are sums.
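A small simulation shows the pull toward the bell. The sketch assumes draws from Uniform$(0, 1)$ (mean $1/2$, variance $1/12$), a sample size of $n = 30$, and a fixed seed; all three are arbitrary choices for illustration.

```python
import random
from math import sqrt

def standardized_mean(rng, n):
    """Z_n for n draws from Uniform(0, 1), which has mean 1/2 and variance 1/12."""
    mu, sigma = 0.5, sqrt(1 / 12)
    sample_mean = sum(rng.random() for _ in range(n)) / n
    return (sample_mean - mu) / (sigma / sqrt(n))

rng = random.Random(0)  # fixed seed only so the run is repeatable
z_values = [standardized_mean(rng, n=30) for _ in range(20_000)]

# If Z_n is close to N(0, 1), about 68% of values land within 1 of zero
# and about 95% within 2.
within_one = sum(abs(z) <= 1 for z in z_values) / len(z_values)
within_two = sum(abs(z) <= 2 for z in z_values) / len(z_values)

print(round(within_one, 3), round(within_two, 3))
```

The underlying density is flat, not bell-shaped, yet the standardized averages behave very much like a standard normal.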

Next up

The CLT is important enough that the next topic is devoted entirely to it. This page just plants the flag — the explanation belongs there.

Normal isn't universal

Plenty of natural data isn't normal: incomes, file sizes, city populations, and earthquake magnitudes are famously heavy-tailed (often power-law or log-normal). Always reaching for the normal as a default is a beginner's mistake; reach for it when the data-generating process is "many small independent contributions added together."

5. PDF vs CDF

For any random variable, there are two equivalent ways to write down its distribution: the PDF (or PMF, in the discrete case) and the CDF.

Cumulative distribution function (CDF)

The function $F(x) = P(X \leq x)$ — the probability that $X$ is at most $x$. It's the running total of probability up to $x$.

The CDF accumulates the PDF:

$$ F(x) = \int_{-\infty}^{x} f(t)\,dt \quad \text{(continuous)}, \qquad F(x) = \sum_{t \leq x} p(t) \quad \text{(discrete)} $$

So $F$ is just the running area (or running sum) of the PDF. Conversely, in the continuous case you recover the PDF by differentiating: $f(x) = F'(x)$.

Three useful facts:

$F$ is non-decreasing — probability never goes negative, so the running total can only grow.
$F(-\infty) = 0$ and $F(\infty) = 1$ — start with no probability, end with all of it.
$P(a < X \leq b) = F(b) - F(a)$ — interval probabilities are differences of CDF values. This is how statistical tables let you compute probabilities without doing the integral yourself.
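For a discrete variable, the three facts above can be checked on the fair die from earlier. This is a minimal sketch with illustrative names; the CDF is built as a running sum of the PMF.

```python
from fractions import Fraction
from itertools import accumulate

faces = range(1, 7)
pmf = [Fraction(1, 6)] * 6

# Running total of the PMF gives F(x) at x = 1, ..., 6.
cdf = dict(zip(faces, accumulate(pmf)))

# P(2 < X <= 5) computed two ways: CDF difference and direct PMF sum.
via_cdf = cdf[5] - cdf[2]
via_pmf = sum(p for x, p in zip(faces, pmf) if 2 < x <= 5)

print(cdf[6])            # 1
print(via_cdf, via_pmf)  # 1/2 1/2
```

Note the strict inequality on the left: $F(5) - F(2)$ excludes the outcome $2$ itself, which matters for discrete variables.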

	PDF / PMF	CDF
What it returns	Density (or mass) at a point	Probability of being $\leq$ that point
Continuous shape	Curve (bell, flat, decaying, …)	S-shaped — flat, rises, flat again
Discrete shape	Bars at integer values	Step function
Probability of $[a, b]$	$\int_a^b f(x)\,dx$ (area)	$F(b) - F(a)$ (subtraction)

7. Common pitfalls

Density at a point is not a probability

For a continuous distribution, $f(x)$ can be larger than $1$. For example, a uniform density on $[0, 0.5]$ has height $2$ everywhere on that interval. The density at a point has units of "probability per unit of $x$"; it is not $P(X = x)$. The probability of any exact value is zero; only intervals have positive probability, and you get them by computing area.

"Less than" vs "less than or equal to" in the continuous case

Because $P(X = x) = 0$ for continuous $X$, we have $P(X < x) = P(X \leq x)$. The strict vs non-strict inequality makes no difference — only for discrete variables does it matter.

The normal is not the only "natural" distribution

Reaching for the normal as a default for any real-valued data is a mistake. Waiting times skew right (exponential or gamma), proportions are bounded in $[0,1]$ (beta), counts of rare events are Poisson, and income/wealth/city-size data are typically heavy-tailed. Match the distribution to the generating process, not to convenience.

Poisson approximates binomial only when events are rare

The Poisson is the limit of the binomial as $n \to \infty$, $p \to 0$, with $np = \lambda$ fixed. A rule of thumb: the approximation is good when $n \geq 20$ and $p \leq 0.05$, or whenever $n \geq 100$ and $np \leq 10$. Use it for $n = 10, p = 0.5$ and you'll get a noticeably wrong answer.
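To see how much the regime matters, compare the largest pointwise gap between the two PMFs in a bad case and a good case. The sketch below uses $n = 10, p = 0.5$ against $n = 100, p = 0.05$ (the second pair is an assumed example that satisfies the rule of thumb); `max_pmf_error` is a hypothetical helper.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * exp(-lam) / factorial(k)

def max_pmf_error(n, p):
    """Largest |binomial - Poisson| over k = 0..n, using lam = n * p."""
    lam = n * p
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
               for k in range(n + 1))

bad = max_pmf_error(10, 0.5)     # p is far from small
good = max_pmf_error(100, 0.05)  # many trials, rare successes

print(round(bad, 4), round(good, 4))
```

Both cases have the same $\lambda = 5$, so the difference in accuracy comes entirely from how rare each individual success is.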

8. Worked examples

Try each one yourself before reading the solution. The point is to recognize which distribution applies and then plug into its formula, not to compute the number from first principles.

Example 1 · Binomial: $4$ heads in $10$ flips of a fair coin

Identify. Fixed $n = 10$ independent trials, each with $p = 0.5$. Binomial.

Apply the formula.

$$ P(X = 4) = \binom{10}{4} (0.5)^4 (0.5)^6 = 210 \cdot (0.5)^{10} = \frac{210}{1024} \approx 0.205 $$

So a fair coin gives exactly four heads in ten flips about $20.5\%$ of the time — less than half the time, even though $4$ is close to the mean of $5$.

Example 2 · Poisson approximation: defective parts

A factory makes parts that are defective with probability $p = 0.01$, independently. In a batch of $n = 200$, what's the probability of at most $2$ defects?

Identify. Large $n$, small $p$, $np = 2$. Poisson with $\lambda = 2$ is an excellent approximation.

Apply.

$$ P(X \leq 2) = \sum_{k=0}^{2} \frac{2^k e^{-2}}{k!} = e^{-2}\!\left(1 + 2 + 2\right) = 5 e^{-2} \approx 0.677 $$

About $67.7\%$. The exact binomial calculation gives $0.677$ to three decimals — the approximation is essentially indistinguishable here.
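The comparison between the exact binomial and the Poisson approximation takes only a few lines to check. Variable names here are illustrative.

```python
from math import comb, exp, factorial

n, p, max_defects = 200, 0.01, 2
lam = n * p  # Poisson rate matched to the binomial mean

# Exact binomial probability of at most 2 defects.
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k)
            for k in range(max_defects + 1))

# Poisson approximation of the same probability.
approx = sum(lam**k * exp(-lam) / factorial(k)
             for k in range(max_defects + 1))

print(round(exact, 3), round(approx, 3))  # 0.677 0.677
print(abs(exact - approx))
```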

Example 3 · Reading a normal table

A test score $X$ is normally distributed with mean $\mu = 70$ and standard deviation $\sigma = 8$. What fraction of scores fall between $66$ and $82$?

Step 1 — standardize. Convert to $z$-scores:

$$ z_1 = \frac{66 - 70}{8} = -0.5, \qquad z_2 = \frac{82 - 70}{8} = 1.5 $$

Step 2 — look up CDF values. From a standard normal table, $\Phi(1.5) \approx 0.9332$ and $\Phi(-0.5) \approx 0.3085$.

Step 3 — subtract.

$$ P(66 \leq X \leq 82) = \Phi(1.5) - \Phi(-0.5) \approx 0.9332 - 0.3085 = 0.6247 $$

About $62.5\%$ of scores fall in that range.
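If no table is at hand, the same lookup can be sketched with `statistics.NormalDist` from the standard library, either on the original scale or after standardizing.

```python
from statistics import NormalDist

scores = NormalDist(mu=70, sigma=8)

# Directly on the score scale: F(82) - F(66).
fraction = scores.cdf(82) - scores.cdf(66)

# Same thing via z-scores and the standard normal CDF.
z = NormalDist()  # mean 0, standard deviation 1
fraction_z = z.cdf((82 - 70) / 8) - z.cdf((66 - 70) / 8)

print(round(fraction, 4), round(fraction_z, 4))  # 0.6247 0.6247
```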

Example 4 · Why the normal density integrates to $1$ (conceptual)

We need $\int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/(2\sigma^2)}\, dx = 1$. The clever trick is to compute the square of the standard-normal integral and convert to polar coordinates.

Let $I = \int_{-\infty}^{\infty} e^{-x^2/2}\, dx$. Then

$$ I^2 = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} e^{-(x^2 + y^2)/2}\, dx\, dy $$

Switch to polar: $x^2 + y^2 = r^2$ and $dx\, dy = r\, dr\, d\theta$.

$$ I^2 = \int_0^{2\pi}\!\int_0^{\infty} e^{-r^2/2}\, r\, dr\, d\theta = 2\pi \cdot 1 = 2\pi $$

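So $I = \sqrt{2\pi}$. Substituting $u = (x - \mu)/\sigma$, so that $dx = \sigma\,du$, turns the general integral into $\sigma I = \sigma\sqrt{2\pi}$, which means the normalization constant $\tfrac{1}{\sigma\sqrt{2\pi}}$ is exactly what's needed for the total area to come out to $1$. The $\sqrt{2\pi}$ in the formula isn't decorative; it's there to pay for the area.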
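The value $I = \sqrt{2\pi}$ can be sanity-checked numerically. This sketch uses a plain midpoint rule on $[-10, 10]$ (an assumed cutoff; the tails beyond it are negligible), not a library integrator.

```python
from math import exp, pi, sqrt

def gaussian_integral(half_width=10.0, steps=200_000):
    """Midpoint-rule estimate of the integral of exp(-x^2 / 2) over [-half_width, half_width]."""
    width = 2 * half_width / steps
    total = 0.0
    for i in range(steps):
        x = -half_width + (i + 0.5) * width  # midpoint of the i-th slice
        total += exp(-x * x / 2)
    return total * width

estimate = gaussian_integral()
print(estimate, sqrt(2 * pi))
```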

Example 5 · PDF vs CDF for the exponential

Take an exponential with rate $\lambda = 1$. Its PDF and CDF are

$$ f(x) = e^{-x}, \qquad F(x) = 1 - e^{-x}, \quad x \geq 0 $$

Notice that $f(0) = 1$ — the density at zero is $1$, but $P(X = 0) = 0$, and $F(0) = 0$. The density value is not a probability.

The probability that $X$ lies in $[1, 2]$ is

$$ P(1 \leq X \leq 2) = F(2) - F(1) = (1 - e^{-2}) - (1 - e^{-1}) = e^{-1} - e^{-2} \approx 0.233 $$

You could also have gotten this by integrating the PDF directly: $\int_1^2 e^{-x}\,dx = -e^{-2} + e^{-1}$. Same answer, two routes — that's the whole point of having both PDF and CDF available.
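Both routes, plus the link $f(x) = F'(x)$, can be checked in a few lines. The step size `h` is an arbitrary small number chosen for the finite-difference estimate.

```python
from math import exp

def pdf(x):
    """Exponential density with rate 1."""
    return exp(-x) if x >= 0 else 0.0

def cdf(x):
    """Exponential CDF with rate 1."""
    return 1 - exp(-x) if x >= 0 else 0.0

# Interval probability by subtracting CDF values.
via_cdf = cdf(2) - cdf(1)

# A central difference approximates F'(1), which should match f(1).
h = 1e-6
slope_at_1 = (cdf(1 + h) - cdf(1 - h)) / (2 * h)

print(round(via_cdf, 3))    # 0.233
print(slope_at_1, pdf(1))
```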

