Distributions

A distribution is the complete probabilistic fingerprint of a random variable — it tells you which values are likely, which are rare, and how the total probability of 1 is spread across every possible outcome. A handful of named distributions show up so often in the wild that learning their shapes is the fastest way to recognize what you're looking at.

What you'll leave with

  • What a distribution is, in both the discrete (table) and continuous (density) cases.
  • The foundational discrete distributions: Bernoulli, binomial, geometric, Poisson, and discrete uniform, along with the real situations each one models.
  • The three foundational continuous distributions: uniform, exponential, normal — and the parameters that shape each.
  • Why the normal distribution shows up almost everywhere — and why the value of a density at a single point is not a probability.
  • The relationship between the PDF (the curve) and the CDF (the running total).

1. What a distribution is

Probability distribution

A description of the probabilities for all possible values a random variable $X$ can take. It assigns total probability $1$ across the set of outcomes — never more, never less.

The shape of that description depends on what kind of values $X$ can take.

Discrete: a table

If $X$ takes values from a countable set — like $\{0, 1, 2, \dots\}$ — its distribution is a probability mass function (PMF): a rule $p(x) = P(X = x)$ that gives the probability of each specific value. The list of all those probabilities sums to $1$:

$$ \sum_{x} p(x) = 1 $$

You can write it out as a table. For a fair six-sided die:

| $x$ | $1$ | $2$ | $3$ | $4$ | $5$ | $6$ |
|---|---|---|---|---|---|---|
| $P(X = x)$ | $\tfrac{1}{6}$ | $\tfrac{1}{6}$ | $\tfrac{1}{6}$ | $\tfrac{1}{6}$ | $\tfrac{1}{6}$ | $\tfrac{1}{6}$ |

That table is the distribution.
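As a minimal sketch, the same table can be held in code as a mapping from value to probability. The names `die_pmf` and `p_at_least_5` are illustrative choices, not part of the original text.

```python
from fractions import Fraction

# PMF of a fair six-sided die, stored as a value -> probability table.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# The probabilities must add up to exactly 1.
total = sum(die_pmf.values())

# An event's probability is the sum of the masses of its outcomes,
# e.g. "roll at least a 5" covers the faces 5 and 6.
p_at_least_5 = sum(p for face, p in die_pmf.items() if face >= 5)

print(total, p_at_least_5)  # 1 1/3
```

Exact fractions are used here only so the sum comes out to exactly 1 rather than 0.9999999999999999.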

Continuous: a curve

If $X$ takes values from a continuum — like all real numbers, or all positive times — there are infinitely many outcomes, and assigning each a positive probability would make them sum to infinity. So we describe a continuous random variable by a probability density function (PDF) $f(x)$, with the rule that probability comes from area under the curve:

$$ P(a \leq X \leq b) = \int_{a}^{b} f(x)\,dx, \qquad \int_{-\infty}^{\infty} f(x)\,dx = 1 $$

The density at a single point isn't a probability — it's a rate of probability per unit of $x$. The probability of any exact value is zero. (We'll harp on this in the pitfalls.)
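A small worked case may make this concrete. Take a flat density of height $2$ on the interval $[0, \tfrac{1}{2}]$ (an illustrative choice, not one of the named families yet):

$$ f(x) = 2 \ \text{ for } 0 \leq x \leq \tfrac{1}{2}, \qquad \int_{0}^{1/2} 2\,dx = 1, \qquad P(0.1 \leq X \leq 0.2) = \int_{0.1}^{0.2} 2\,dx = 0.2 $$

The density is $2$ everywhere on the interval, which could not be a probability, yet every area under it is a valid probability between $0$ and $1$.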

2. Discrete: Bernoulli, binomial, geometric, Poisson, uniform

A handful of named families cover an enormous share of everyday discrete situations. Bernoulli is a single trial; binomial counts successes in a fixed number of trials; geometric counts trials until the first success; Poisson counts rare events in a window; discrete uniform spreads probability equally across a finite set.

Bernoulli — a single yes/no trial

A Bernoulli random variable models one trial with two outcomes: success (call it $1$) with probability $p$, failure ($0$) with probability $1 - p$.

$$ P(X = 1) = p, \qquad P(X = 0) = 1 - p $$

Mean $\mu = p$, variance $\sigma^2 = p(1-p)$. A single coin flip with $p = 0.5$ is the canonical example, but it works for any binary trial: a click or no click, a defective part or not, a hit or a miss.
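Both moments follow directly from the two-row table. Since $X$ only takes the values $0$ and $1$, $X^2 = X$, so:

$$ \mu = E[X] = 1 \cdot p + 0 \cdot (1 - p) = p, \qquad \sigma^2 = E[X^2] - \mu^2 = p - p^2 = p(1 - p) $$

The variance is largest at $p = 0.5$, where it equals $0.25$, and shrinks to $0$ as $p$ approaches $0$ or $1$.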

Binomial — counting successes in $n$ Bernoulli trials

Run $n$ independent Bernoulli trials, each with probability $p$ of success, and count the number $k$ of successes. That count is binomial:

$$ P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}, \qquad k = 0, 1, \dots, n $$

The binomial coefficient $\binom{n}{k}$ counts the number of ways to choose which $k$ of the $n$ trials are the successes; the $p^k (1-p)^{n-k}$ is the probability of any one specific arrangement. Mean $\mu = np$, variance $\sigma^2 = np(1-p)$.

Sketch of the PMF for $n = 10, p = 0.4$ — bars stand at each integer $k$ and their heights add to $1$:

[Figure: Binomial PMF, n = 10, p = 0.4. Horizontal axis: k from 0 to 10; vertical axis: P(X = k). Mean = np = 4; most likely k = 4.]
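The PMF for $n = 10$, $p = 0.4$ can be tabulated with a few lines of Python. This is a sketch using only the standard library; the helper name `binomial_pmf` is an assumption made for illustration.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.4
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

total = sum(pmf)                                  # 1 up to rounding
mean = sum(k * pk for k, pk in enumerate(pmf))    # n * p = 4 up to rounding
mode = max(range(n + 1), key=lambda k: pmf[k])    # most likely count

# Text version of the bar chart: one '#' per percentage point.
for k, pk in enumerate(pmf):
    print(f"{k:2d}  {pk:.4f}  {'#' * round(pk * 100)}")
```

The tallest bar lands at $k = 4$ with height of about $0.2508$, matching the mean $np = 4$.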

Geometric — trials until the first success

Repeat independent Bernoulli($p$) trials and ask: which trial is the first success? Let $X$ count the number of trials needed (so $X \in \{1, 2, 3, \dots\}$). Then $X$ is geometric:

$$ P(X = k) = (1 - p)^{k - 1} \, p, \qquad k = 1, 2, 3, \dots $$

The story is direct: $k - 1$ failures in a row, each with probability $1 - p$, followed by a success with probability $p$. Mean $\mu = 1/p$, variance $\sigma^2 = (1-p)/p^2$. The geometric is the discrete cousin of the exponential and shares its memoryless property: given you've already failed $j$ times, the number of additional trials needed is still geometric with the same $p$.
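The mean and the memoryless property can both be checked numerically. The sketch below assumes $p = 0.2$ and truncates the infinite sum at $k = 500$, where the remaining terms are negligible; the function names are illustrative.

```python
def geometric_pmf(k: int, p: float) -> float:
    """P(X = k): k - 1 failures followed by one success."""
    return (1 - p) ** (k - 1) * p

def geometric_tail(k: int, p: float) -> float:
    """P(X > k): the first k trials all fail."""
    return (1 - p) ** k

p = 0.2

# Truncated sum for the mean; it should be very close to 1 / p = 5.
mean = sum(k * geometric_pmf(k, p) for k in range(1, 501))

# Memorylessness: P(X > j + m | X > j) equals P(X > m).
j, m = 3, 4
conditional = geometric_tail(j + m, p) / geometric_tail(j, p)
unconditional = geometric_tail(m, p)

print(mean, conditional, unconditional)
```

The conditional and unconditional tails agree because the ratio $(1-p)^{j+m} / (1-p)^{j}$ cancels to $(1-p)^{m}$ regardless of $j$.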

Poisson — counting rare events in a fixed interval

Sometimes you don't have a fixed $n$ — you have a window of time or space (an hour at a call center, a square meter of fabric, a kilometer of road) and you count how many events fall inside it. If events arrive independently at a constant average rate $\lambda$ per window, the count $X$ is Poisson:

$$ P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \dots $$

Mean $\mu = \lambda$, variance $\sigma^2 = \lambda$ — the same number for both, which is itself a clue when you suspect a Poisson in real data. The Poisson is also the limit of the binomial when $n \to \infty$, $p \to 0$, and $np \to \lambda$: many opportunities, each tiny, with a fixed average count.
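The "mean equals variance" signature is easy to confirm from the PMF itself. This sketch assumes $\lambda = 3$ and sums the first 60 terms, which captures essentially all of the probability for that rate.

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * exp(-lam) / factorial(k)

lam = 3.0
ks = range(60)  # terms beyond k = 59 are negligible for lam = 3
pmf = [poisson_pmf(k, lam) for k in ks]

total = sum(pmf)
mean = sum(k * pk for k, pk in zip(ks, pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in zip(ks, pmf))

print(total, mean, variance)
```

With real count data, a sample variance far above the sample mean is a common hint that a plain Poisson model is too simple.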

Discrete uniform — every outcome equally likely

If $X$ takes one of $n$ equally likely values $\{x_1, x_2, \dots, x_n\}$, it is discrete uniform:

$$ P(X = x_i) = \frac{1}{n}, \qquad i = 1, 2, \dots, n $$

A fair die ($n = 6$, values $\{1, \dots, 6\}$) is the canonical example. For consecutive integers $\{1, 2, \dots, n\}$: mean $\mu = (n+1)/2$, variance $\sigma^2 = (n^2 - 1)/12$.
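The closed forms for consecutive integers can be compared against a direct computation from the PMF. A minimal sketch, with `uniform_moments` as a hypothetical helper name:

```python
from fractions import Fraction

def uniform_moments(n: int):
    """Mean and variance of a discrete uniform on {1, ..., n}, from the PMF."""
    p = Fraction(1, n)
    mean = sum(x * p for x in range(1, n + 1))
    variance = sum((x - mean) ** 2 * p for x in range(1, n + 1))
    return mean, variance

mean, variance = uniform_moments(6)
print(mean, variance)  # 7/2 35/12
```

For the die, $(n+1)/2 = 7/2$ and $(n^2 - 1)/12 = 35/12$, which is what the direct sums return.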

When to reach for which

A single yes/no trial → Bernoulli. A fixed number $n$ of repeated yes/no trials, count successes → binomial. Repeat trials until the first success, count trials → geometric. Counts of rare events in a continuous window with no fixed $n$ → Poisson. Every outcome in a finite set equally likely → discrete uniform.

3. Continuous: uniform, exponential, normal

For continuous random variables we trade tables for densities. Three families do most of the work.

Uniform — flat density on an interval

If every value in $[a, b]$ is equally likely and nothing outside the interval is possible, $X$ is uniform on $[a, b]$:

$$ f(x) = \begin{cases} \dfrac{1}{b - a} & a \leq x \leq b \\[4pt] 0 & \text{otherwise} \end{cases} $$

The constant height $1/(b-a)$ is exactly what's needed for the total area to be $1$. Mean $\mu = (a+b)/2$, variance $\sigma^2 = (b-a)^2/12$.
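Because the density is flat, both the normalization and any interval probability reduce to rectangle areas:

$$ \int_{a}^{b} \frac{1}{b - a}\,dx = \frac{b - a}{b - a} = 1, \qquad P(c \leq X \leq d) = \frac{d - c}{b - a} \quad \text{for } a \leq c \leq d \leq b $$

For example, if $X$ is uniform on $[0, 10]$, then $P(2 \leq X \leq 5) = 3/10 = 0.3$.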

Exponential — waiting time for a Poisson event

If events occur as a Poisson process with rate $\lambda$, the waiting time $X$ until the next event is exponential:

$$ f(x) = \lambda e^{-\lambda x}, \qquad x \geq 0 $$

Mean $\mu = 1/\lambda$, variance $\sigma^2 = 1/\lambda^2$. The exponential is "memoryless": knowing you've already waited 10 minutes tells you nothing about how much longer you'll wait. That's not intuitive — but it's the exact property that makes it model independent Poisson arrivals.
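The memoryless property is a one-line calculation once the tail probability is known. Integrating the density gives $P(X > x) = \int_x^{\infty} \lambda e^{-\lambda u}\,du = e^{-\lambda x}$, so for any $s, t \geq 0$:

$$ P(X > s + t \mid X > s) = \frac{P(X > s + t)}{P(X > s)} = \frac{e^{-\lambda (s + t)}}{e^{-\lambda s}} = e^{-\lambda t} = P(X > t) $$

The time already waited, $s$, cancels out completely.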

Normal (Gaussian) — the bell curve

The normal distribution with mean $\mu$ and standard deviation $\sigma$ has density

$$ f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right) $$

Symmetric about $\mu$, with $\sigma$ controlling the width. The standard normal is the special case $\mu = 0, \sigma = 1$ — denoted $Z$ and tabulated in every statistics textbook because every other normal can be rescaled to it via $Z = (X - \mu)/\sigma$.

The 68–95–99.7 rule is worth memorizing: roughly $68\%$ of the probability sits within $\pm 1\sigma$ of the mean, $95\%$ within $\pm 2\sigma$, and $99.7\%$ within $\pm 3\sigma$.

[Figure: Standard normal density, μ = 0, σ = 1. Horizontal axis: z from −3σ to +3σ; vertical axis: f(z). The 68%, 95%, and 99.7% regions are marked.]
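The three percentages can be reproduced without a table, using the identity $\Phi(z) = \tfrac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$ and the error function from Python's `math` module. The helper name `std_normal_cdf` is an illustrative choice.

```python
from math import erf, sqrt

def std_normal_cdf(z: float) -> float:
    """Phi(z) for the standard normal, written via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Probability of landing within k standard deviations of the mean.
within = {k: std_normal_cdf(k) - std_normal_cdf(-k) for k in (1, 2, 3)}

for k, prob in within.items():
    print(f"within ±{k}σ: {prob:.4f}")
```

The printed values are about 0.6827, 0.9545, and 0.9973, which round to the 68, 95, and 99.7 of the rule.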

4. The normal distribution — why it's everywhere

The normal isn't just one distribution among many — it's the shape that quietly takes over whenever you add up enough small independent random influences. Heights, measurement errors, sample means, sums of dice rolls — the more independent random nudges contribute, the closer the result lies to a bell curve.

This isn't folklore. It's the Central Limit Theorem (CLT): if $X_1, X_2, \dots, X_n$ are independent, identically distributed random variables (from any distribution with mean $\mu$ and finite variance $\sigma^2 > 0$), then as $n \to \infty$ the standardized sample mean

$$ Z_n = \frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} $$

converges in distribution to the standard normal $N(0, 1)$. The underlying distribution can be wildly non-normal (uniform, skewed, even discrete) and the standardized average still pulls toward the bell. That's why the normal is so common in measurements of natural quantities: many of those quantities are, in effect, sums of many small contributions.
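A quick simulation can illustrate the claim. The sketch below averages $n = 30$ draws from a flat Uniform(0, 1) distribution, standardizes each average with the formula for $Z_n$, and checks how often the result lands within one and two units of zero. The sample size, trial count, and seed are arbitrary choices for the illustration.

```python
import random
from math import sqrt

random.seed(0)  # fixed seed so the run is repeatable

n = 30            # draws averaged per sample mean
trials = 20_000   # number of sample means simulated

# Uniform(0, 1) has mean 1/2 and variance 1/12.
mu, sigma = 0.5, sqrt(1 / 12)

def standardized_mean() -> float:
    """One draw of Z_n = (sample mean - mu) / (sigma / sqrt(n))."""
    xbar = sum(random.random() for _ in range(n)) / n
    return (xbar - mu) / (sigma / sqrt(n))

zs = [standardized_mean() for _ in range(trials)]
frac_within_1 = sum(abs(z) <= 1 for z in zs) / trials
frac_within_2 = sum(abs(z) <= 2 for z in zs) / trials

print(frac_within_1, frac_within_2)
```

If the standardized means behave like a standard normal, the two fractions should come out near 0.68 and 0.95, even though each individual draw is flat rather than bell-shaped.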

Next up

The CLT is important enough that the next topic is devoted entirely to it. This page just plants the flag — the explanation belongs there.

Normal isn't universal

Plenty of natural data isn't normal: incomes, file sizes, city populations, and earthquake energies are famously heavy-tailed (often power-law or log-normal). Always reaching for the normal as a default is a beginner's mistake; reach for it when the data-generating process is "many small independent contributions added together."

5. PDF vs CDF

For any random variable, there are two equivalent ways to write down its distribution: the PDF (or PMF, in the discrete case) and the CDF.

Cumulative distribution function (CDF)

The function $F(x) = P(X \leq x)$ — the probability that $X$ is at most $x$. It's the running total of probability up to $x$.

The CDF accumulates the PDF:

$$ F(x) = \int_{-\infty}^{x} f(t)\,dt \quad \text{(continuous)}, \qquad F(x) = \sum_{t \leq x} p(t) \quad \text{(discrete)} $$

So $F$ is just the running area (or running sum) of the PDF. Conversely, in the continuous case you recover the PDF by differentiating: $f(x) = F'(x)$.

Three useful facts:

  • $F$ is non-decreasing — probability never goes negative, so the running total can only grow.
  • $F(-\infty) = 0$ and $F(\infty) = 1$ — start with no probability, end with all of it.
  • $P(a < X \leq b) = F(b) - F(a)$ — interval probabilities are differences of CDF values. This is how statistical tables let you compute probabilities without doing the integral yourself.

| | PDF / PMF | CDF |
|---|---|---|
| What it returns | Density (or mass) at a point | Probability of being $\leq$ that point |
| Continuous shape | Curve (bell, flat, decaying, …) | S-shaped: flat, rises, flat again |
| Discrete shape | Bars at integer values | Step function |
| Probability of $[a, b]$ | $\int_a^b f(x)\,dx$ (area) | $F(b) - F(a)$ (subtraction) |
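In the discrete case the "running total" is literally a cumulative sum. A minimal sketch for the fair die, using `itertools.accumulate`; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import accumulate

faces = range(1, 7)
pmf = [Fraction(1, 6)] * 6

# The CDF is the running sum of the PMF: F(x) = P(X <= x).
cdf = dict(zip(faces, accumulate(pmf)))

# Interval probability as a difference of CDF values:
# P(2 < X <= 5) = F(5) - F(2), which covers the faces 3, 4, 5.
p_interval = cdf[5] - cdf[2]

print(cdf[6], p_interval)  # 1 1/2
```

Note the strict inequality on the left: $F(5) - F(2)$ excludes the face $2$, which matters for a discrete variable.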

6. Playground: tune the distribution

Pick a family, drag the parameters, and watch the shape shift. The point is to internalize what each parameter does — how $\sigma$ widens the bell, how $p$ tilts the binomial, how $\lambda$ stretches the exponential's tail. The mean, variance, and standard deviation update in real time.

Try this

On the normal, pin $\mu = 0$ and slide $\sigma$ from $0.5$ to $3$ — notice that as the curve widens, the peak drops. Total area must stay at $1$, so any spreading has to be paid for by a shorter peak. On the binomial, set $p = 0.5$ and watch the bars form a near-symmetric mound that gets closer to a bell as $n$ grows — the central limit theorem, peeking out of a discrete distribution.
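The widening-versus-peak trade-off can also be read off the density formula: at $x = \mu$ the exponential factor is $1$, so the peak height is $1/(\sigma\sqrt{2\pi})$. A short sketch, with `normal_peak` as a hypothetical helper name:

```python
from math import pi, sqrt

def normal_peak(sigma: float) -> float:
    """Height of the normal density at its mean, 1 / (sigma * sqrt(2 * pi))."""
    return 1.0 / (sigma * sqrt(2.0 * pi))

for sigma in (0.5, 1.0, 2.0, 3.0):
    print(f"sigma = {sigma:.1f}  peak = {normal_peak(sigma):.4f}")
```

Doubling $\sigma$ halves the peak, which is exactly the price of keeping the total area at $1$.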

7. Common pitfalls

Density at a point is not a probability

For a continuous distribution, $f(x)$ can be larger than $1$. The density at a point has units of "probability per unit of $x$" and is not $P(X = x)$. The probability of any exact value is zero; only intervals have positive probability, and you get them by computing area.

"Less than" vs "less than or equal to" in the continuous case

Because $P(X = x) = 0$ for continuous $X$, we have $P(X < x) = P(X \leq x)$. The strict vs non-strict inequality makes no difference — only for discrete variables does it matter.

The normal is not the only "natural" distribution

Reaching for the normal as a default for any real-valued data is a mistake. Waiting times skew right (exponential or gamma), proportions are bounded in $[0,1]$ (beta), counts of rare events are Poisson, and income/wealth/city-size data are typically heavy-tailed. Match the distribution to the generating process, not to convenience.

Poisson approximates binomial only when events are rare

The Poisson is the limit of the binomial as $n \to \infty$, $p \to 0$, with $np = \lambda$ fixed. A rule of thumb: the approximation is good when $n \geq 20$ and $p \leq 0.05$, or whenever $n \geq 100$ and $np \leq 10$. Use it for $n = 10, p = 0.5$ and you'll get a noticeably wrong answer.
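The difference between a good and a bad use of the approximation shows up clearly when the two PMFs are compared point by point. In this sketch the helper `max_gap` is a hypothetical name, and the Poisson PMF is computed on the log scale with `math.lgamma` so that large $k$ does not overflow $k!$.

```python
from math import comb, exp, lgamma, log

def binomial_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    # log of lam^k * e^(-lam) / k!, using lgamma(k + 1) = log(k!)
    return exp(k * log(lam) - lam - lgamma(k + 1))

def max_gap(n: int, p: float) -> float:
    """Largest pointwise gap between Binomial(n, p) and Poisson(np), k = 0..n."""
    lam = n * p
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
               for k in range(n + 1))

rare = max_gap(200, 0.01)   # many trials, rare successes
common = max_gap(10, 0.5)   # few trials, common successes

print(f"n=200, p=0.01: max gap {rare:.4f}")
print(f"n=10,  p=0.5 : max gap {common:.4f}")
```

In the rare-event case the two PMFs differ by roughly a thousandth at worst; with $n = 10$, $p = 0.5$ the gap at $k = 5$ is about $0.07$, which is large relative to the probabilities themselves.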

8. Worked examples

Try each one yourself before opening the solution. The point is to recognize which distribution applies and then plug into its formula — not to compute the number from first principles.

Example 1 · Binomial: $4$ heads in $10$ flips of a fair coin

Identify. Fixed $n = 10$ independent trials, each with $p = 0.5$. Binomial.

Apply the formula.

$$ P(X = 4) = \binom{10}{4} (0.5)^4 (0.5)^6 = 210 \cdot (0.5)^{10} = \frac{210}{1024} \approx 0.205 $$

So a fair coin gives exactly four heads in ten flips about $20.5\%$ of the time: roughly one run in five, even though $4$ is right next to the mean of $5$.

Example 2 · Poisson approximation: defective parts

A factory makes parts that are defective with probability $p = 0.01$, independently. In a batch of $n = 200$, what's the probability of at most $2$ defects?

Identify. Large $n$, small $p$, $np = 2$. Poisson with $\lambda = 2$ is an excellent approximation.

Apply.

$$ P(X \leq 2) = \sum_{k=0}^{2} \frac{2^k e^{-2}}{k!} = e^{-2}\!\left(1 + 2 + 2\right) = 5 e^{-2} \approx 0.677 $$

About $67.7\%$. The exact binomial calculation gives $0.677$ to three decimals — the approximation is essentially indistinguishable here.

Example 3 · Reading a normal table

A test score $X$ is normally distributed with mean $\mu = 70$ and standard deviation $\sigma = 8$. What fraction of scores fall between $66$ and $82$?

Step 1 — standardize. Convert to $z$-scores:

$$ z_1 = \frac{66 - 70}{8} = -0.5, \qquad z_2 = \frac{82 - 70}{8} = 1.5 $$

Step 2 — look up CDF values. From a standard normal table, $\Phi(1.5) \approx 0.9332$ and $\Phi(-0.5) \approx 0.3085$.

Step 3 — subtract.

$$ P(66 \leq X \leq 82) = \Phi(1.5) - \Phi(-0.5) \approx 0.9332 - 0.3085 = 0.6247 $$

About $62.5\%$ of scores fall in that range.
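The table lookup can be cross-checked with `statistics.NormalDist` from Python's standard library (available in Python 3.8 and later), which takes the mean and standard deviation directly so no manual standardizing is needed. The variable names are illustrative.

```python
from statistics import NormalDist

scores = NormalDist(mu=70, sigma=8)

# P(66 <= X <= 82) as a difference of CDF values.
fraction = scores.cdf(82) - scores.cdf(66)

print(round(fraction, 4))  # 0.6247
```

This agrees with the three-step table calculation above.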

Example 4 · Why the normal density integrates to $1$ (conceptual)

We need $\int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/(2\sigma^2)}\, dx = 1$. The clever trick is to compute the square of the standard-normal integral and convert to polar coordinates.

Let $I = \int_{-\infty}^{\infty} e^{-x^2/2}\, dx$. Then

$$ I^2 = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} e^{-(x^2 + y^2)/2}\, dx\, dy $$

Switch to polar: $x^2 + y^2 = r^2$ and $dx\, dy = r\, dr\, d\theta$.

$$ I^2 = \int_0^{2\pi}\!\int_0^{\infty} e^{-r^2/2}\, r\, dr\, d\theta = 2\pi \cdot 1 = 2\pi $$

So $I = \sqrt{2\pi}$, which means the normalization constant $\tfrac{1}{\sigma\sqrt{2\pi}}$ is exactly what's needed for the total area to come out to $1$. The $\sqrt{2\pi}$ in the formula isn't decorative — it's there to pay for the area.
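Two steps are left implicit in the argument. The inner radial integral equals $1$ by the substitution $u = r^2/2$, $du = r\,dr$:

$$ \int_0^{\infty} e^{-r^2/2}\, r\, dr = \int_0^{\infty} e^{-u}\, du = 1 $$

And the general normal reduces to the standard one by the substitution $z = (x - \mu)/\sigma$, $dx = \sigma\, dz$:

$$ \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}\, dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-z^2/2}\, dz = \frac{I}{\sqrt{2\pi}} = 1 $$

The factor $\sigma$ from $dx$ cancels the $\sigma$ in the constant, which is why the normalization works for every $\mu$ and $\sigma > 0$.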

Example 5 · PDF vs CDF for the exponential

Take an exponential with rate $\lambda = 1$. Its PDF and CDF are

$$ f(x) = e^{-x}, \qquad F(x) = 1 - e^{-x}, \quad x \geq 0 $$

Notice that $f(0) = 1$ — the density at zero is $1$, but $P(X = 0) = 0$, and $F(0) = 0$. The density value is not a probability.

The probability that $X$ lies in $[1, 2]$ is

$$ P(1 \leq X \leq 2) = F(2) - F(1) = (1 - e^{-2}) - (1 - e^{-1}) = e^{-1} - e^{-2} \approx 0.233 $$

You could also have gotten this by integrating the PDF directly: $\int_1^2 e^{-x}\,dx = -e^{-2} + e^{-1}$. Same answer, two routes — that's the whole point of having both PDF and CDF available.
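Both routes can be checked numerically. The sketch below integrates the PDF over $[1, 2]$ with a simple midpoint rule and compares the result to the CDF difference; `midpoint_integral` is a hypothetical helper written for this illustration, not a library function.

```python
from math import exp

def pdf(x: float) -> float:
    """Exponential density with rate 1."""
    return exp(-x)

def cdf(x: float) -> float:
    """Exponential CDF with rate 1, for x >= 0."""
    return 1.0 - exp(-x)

def midpoint_integral(f, a: float, b: float, steps: int = 10_000) -> float:
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

area = midpoint_integral(pdf, 1.0, 2.0)   # route 1: area under the PDF
difference = cdf(2.0) - cdf(1.0)          # route 2: subtract CDF values

print(area, difference)
```

The two numbers agree to many decimal places; the CDF route is exact, while the integration route is what you fall back on when no closed-form CDF is available.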
