Partial Derivatives — Calculus

What you'll leave with

Why $f(x, y)$ has two first derivatives, not one — and what each one geometrically is.
How to compute $\partial f/\partial x$ and $\partial f/\partial y$ by treating the other variable as a constant.
Higher-order partials, and Clairaut's quiet miracle: $f_{xy} = f_{yx}$.
The gradient $\nabla f$: direction of steepest ascent, perpendicular to level curves, with magnitude equal to the max rate of change.
Directional derivatives as a dot product, and the tangent-plane equation built from the partials.
A first look at critical points and the second-derivative test in 2D.

1. From curves to surfaces

In single-variable calculus, $f(x)$ takes a number and returns a number. Its graph is a curve in the plane, and at any point on that curve there's a single tangent line whose slope is $f'(x)$. One input, one slope, one derivative.

Now consider $f(x, y)$ — say, the altitude of a landscape, the temperature in a room, or the value of a financial option as a function of price and time. Two inputs, one output. The graph is no longer a curve but a surface floating over the $xy$-plane, with height $z = f(x, y)$ at each ground point $(x, y)$.

Standing on that surface, you can ask: how steeply does it climb? But "how steeply" no longer has a single answer. The slope you measure depends on the direction you walk. East might rise sharply; north might fall away. The one number $f'$ has to split into a richer object that captures slope in every direction at once. The pieces of that object are the partial derivatives.

A curve has one tangent line at each point; a surface has a whole tangent plane, built from the partial derivatives.

2. The partial derivative

The trick is to make the multivariable problem look like a one-variable problem. Fix $y$ at some value $y_0$. The function $g(x) = f(x, y_0)$ now depends only on $x$: it's a slice through the surface along the line $y = y_0$. You already know how to take its derivative. The value of that derivative at $x = x_0$, namely $g'(x_0)$, is the partial derivative of $f$ with respect to $x$ at $(x_0, y_0)$.

Partial derivatives of $f(x, y)$

The partial derivative with respect to $x$ at $(a, b)$ is

$$ \frac{\partial f}{\partial x}(a, b) \;=\; \lim_{h \to 0} \frac{f(a + h,\, b) - f(a, b)}{h}, $$

and with respect to $y$,

$$ \frac{\partial f}{\partial y}(a, b) \;=\; \lim_{h \to 0} \frac{f(a,\, b + h) - f(a, b)}{h}. $$

In each, only one input moves; the other is held fixed.
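A minimal numerical sketch of these two limits, assuming a sample function $f(x, y) = x^2 y$ chosen purely for illustration (the helper names `difference_quotient_x` and `difference_quotient_y` are likewise hypothetical). It evaluates each difference quotient for shrinking $h$ while the other input stays fixed:

```python
def f(x, y):
    # Sample function, assumed here for illustration only.
    return x**2 * y

def difference_quotient_x(f, a, b, h):
    # The quotient inside the limit defining the x-partial: only x moves.
    return (f(a + h, b) - f(a, b)) / h

def difference_quotient_y(f, a, b, h):
    # The quotient inside the limit defining the y-partial: only y moves.
    return (f(a, b + h) - f(a, b)) / h

a, b = 1.0, 3.0
for h in (1e-1, 1e-2, 1e-3, 1e-4):
    print(h, difference_quotient_x(f, a, b, h), difference_quotient_y(f, a, b, h))
```

For this particular $f$ the $x$-quotient works out to $6 + 3h$, so the printed values approach $f_x(1, 3) = 6$ as $h$ shrinks, while the $y$-quotient stays at $f_y(1, 3) = 1$ up to rounding.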

The notation has many forms — pick the one that fits the situation:

Form	Reads as	Common in
$\partial f/\partial x$	"del f del x"	physics, calculus textbooks
$f_x$	"f sub x"	compact written work
$\partial_x f$	"del x of f"	differential geometry, PDEs
$D_x f$	"D x of f"	operator-style writing

The geometry: slope of a cross-section

$\partial f/\partial x$ at $(a, b)$ is the slope of the curve you get by slicing the surface with the vertical plane $y = b$. Walk along the surface in the $+x$ direction with your $y$-coordinate locked: the partial tells you how fast your altitude $z$ rises.

$\partial f/\partial y$ is the mirror image — slice with the plane $x = a$ and walk in the $+y$ direction. Two slices, two slopes, two partials.

Reading the symbol

The curly $\partial$ exists for one reason: to warn you that other variables are still floating around. A straight $d$, as in $df/dx$, says "this function depends only on $x$." A curly $\partial$ says "there are more inputs; I'm freezing them."

3. Computing partials in practice

The mechanical recipe is shorter than the geometry suggests:

To compute $\partial f/\partial x$, treat $y$ as a constant and differentiate as usual.

Everything you learned in single-variable calculus carries over verbatim. The power rule, product rule, chain rule, trig and exponential derivatives — all of them apply with the other variables sitting quietly as parameters.

A first example

Take $f(x, y) = x^2 y + y^3$. To differentiate with respect to $x$, freeze $y$: it's just a number multiplying $x^2$, and the $y^3$ term has no $x$ in it at all, so it contributes nothing.

$$ \frac{\partial f}{\partial x} = 2xy, \qquad \frac{\partial f}{\partial y} = x^2 + 3y^2. $$

For $\partial f/\partial y$, the $x^2 y$ term differentiates to $x^2$ (with $x^2$ as a constant coefficient), and $y^3$ goes to $3y^2$ as usual.

A trickier one

$f(x, y) = e^{xy} \sin(x)$. Differentiating with respect to $x$ requires the product rule and the chain rule — both still work:

$$ \frac{\partial f}{\partial x} = y\,e^{xy}\sin(x) + e^{xy}\cos(x). $$

With respect to $y$, only the exponential carries $y$, so:

$$ \frac{\partial f}{\partial y} = x\,e^{xy}\sin(x). $$
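One way to sanity-check hand-computed partials like these is to compare them against a symmetric difference quotient. The sketch below does that at an arbitrarily chosen test point; the helper `central_diff` and the point $(0.7, -0.4)$ are assumptions made for illustration.

```python
import math

def f(x, y):
    return math.exp(x * y) * math.sin(x)

def fx_exact(x, y):
    # Product rule plus chain rule, as derived above.
    return y * math.exp(x * y) * math.sin(x) + math.exp(x * y) * math.cos(x)

def fy_exact(x, y):
    # Only the exponential depends on y.
    return x * math.exp(x * y) * math.sin(x)

def central_diff(g, t, h=1e-5):
    # Symmetric difference quotient for a one-variable function g.
    return (g(t + h) - g(t - h)) / (2 * h)

x0, y0 = 0.7, -0.4  # arbitrary test point
fx_num = central_diff(lambda x: f(x, y0), x0)  # y frozen at y0
fy_num = central_diff(lambda y: f(x0, y), y0)  # x frozen at x0

print(fx_num, fx_exact(x0, y0))
print(fy_num, fy_exact(x0, y0))
```

Each pair of printed numbers should agree to many decimal places; a visible mismatch usually points to a dropped product-rule or chain-rule term.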

A small mental check

After computing a partial, glance at it and ask: "does this still depend on the variable I differentiated?" Usually yes — partials are themselves functions of all the original inputs, so $f_x$ generally depends on both $x$ and $y$. Only when the function is linear in $x$ does $f_x$ lose its $x$.

4. Higher-order partials & Clairaut's theorem

Once $f_x$ is itself a function of $x$ and $y$, you can differentiate it again. That gives the four second partial derivatives:

$$ f_{xx} = \frac{\partial^2 f}{\partial x^2},\qquad f_{yy} = \frac{\partial^2 f}{\partial y^2},\qquad f_{xy} = \frac{\partial^2 f}{\partial y\,\partial x},\qquad f_{yx} = \frac{\partial^2 f}{\partial x\,\partial y}. $$

The straight ones, $f_{xx}$ and $f_{yy}$, measure curvature along each axis. The mixed partials $f_{xy}$ and $f_{yx}$ measure how a slope in one direction is changing as you move in the other.

Notation order

The order of subscripts and the order in $\partial^2/\partial y\,\partial x$ run opposite to one another. $f_{xy}$ means "differentiate by $x$ first, then by $y$": read subscripts left to right. The Leibniz form $\partial^2 f / \partial y \,\partial x$ means the same thing: read the denominator right to left, like applying operators inside-out. These are the conventions used on this page and in most calculus texts, though some authors reverse the subscript order; just remember which one you're reading.

Try it on $f(x, y) = x^2 y + y^3$ from above. We had $f_x = 2xy$ and $f_y = x^2 + 3y^2$. Differentiating once more:

$$ f_{xx} = 2y,\qquad f_{yy} = 6y,\qquad f_{xy} = 2x,\qquad f_{yx} = 2x. $$

The two mixed partials came out equal. That isn't an accident.
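The same equality can be observed numerically. The sketch below nests central differences in the two possible orders for $f(x, y) = x^2 y + y^3$; the helpers `d_dx` and `d_dy` and the test point are hypothetical choices for illustration, not a recommended way to compute second derivatives in general.

```python
def f(x, y):
    return x**2 * y + y**3

def d_dx(g, h=1e-3):
    # Returns a function approximating the x-partial of g by central differences.
    return lambda x, y: (g(x + h, y) - g(x - h, y)) / (2 * h)

def d_dy(g, h=1e-3):
    # Returns a function approximating the y-partial of g by central differences.
    return lambda x, y: (g(x, y + h) - g(x, y - h)) / (2 * h)

f_xy = d_dy(d_dx(f))  # differentiate in x first, then in y
f_yx = d_dx(d_dy(f))  # differentiate in y first, then in x

x0, y0 = 1.5, -2.0
print(f_xy(x0, y0), f_yx(x0, y0), 2 * x0)
```

Both nested approximations should land close to the analytic value $2x = 3$ at this point.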

Clairaut's theorem (equality of mixed partials)

If $f_x$, $f_y$, $f_{xy}$, and $f_{yx}$ all exist and are continuous on an open region containing $(a, b)$, then

$$ f_{xy}(a, b) \;=\; f_{yx}(a, b). $$

Order of differentiation doesn't matter — at least, not for the well-behaved functions you'll meet in practice.

This is more than a labor-saver. It says that the second-order behavior of $f$ at a point is captured by a symmetric $2 \times 2$ array, the Hessian, which we'll meet in §10. That symmetry is the doorway into the whole theory of multivariable optimization.

Pitfall — Clairaut needs continuity

Equality of mixed partials is a theorem, not a definition. For pathological functions whose mixed partials exist but aren't continuous, the two orders can disagree. The classic counterexample is $f(x, y) = xy(x^2 - y^2)/(x^2 + y^2)$ for $(x, y) \neq (0, 0)$, with $f(0, 0) = 0$. At the origin, $f_{xy}(0,0) = -1$ but $f_{yx}(0,0) = 1$.
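A sketch of where those two values come from, assuming the usual convention that the function is extended by $f(0, 0) = 0$. Applying the limit definition along each axis gives

$$ f_x(0, y) = \lim_{h \to 0} \frac{f(h, y) - f(0, y)}{h} = \lim_{h \to 0} \frac{y\,(h^2 - y^2)}{h^2 + y^2} = -y, \qquad f_y(x, 0) = \lim_{h \to 0} \frac{f(x, h) - f(x, 0)}{h} = \lim_{h \to 0} \frac{x\,(x^2 - h^2)}{x^2 + h^2} = x. $$

Differentiating $f_x(0, y) = -y$ with respect to $y$ gives $f_{xy}(0, 0) = -1$, while differentiating $f_y(x, 0) = x$ with respect to $x$ gives $f_{yx}(0, 0) = 1$.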

5. The gradient $\nabla f$

Stack the partials into a vector and you get the central object of multivariable calculus:

$$ \nabla f(x, y) \;=\; \left\langle \frac{\partial f}{\partial x},\ \frac{\partial f}{\partial y} \right\rangle \;=\; \langle f_x, f_y \rangle. $$

(For three or more variables, just stack more components.) The symbol $\nabla$ is read "del" or "nabla." It is drawn as a triangle pointing down, and it acts as an operator applied to $f$.

The gradient is a vector field on the input space: at every point $(x, y)$ in the plane, it gives you a vector. That vector encodes three geometric facts about the surface above:

Direction: $\nabla f$ points in the direction in which $f$ increases fastest. If you're standing on the surface and want to climb most steeply, look down at your shadow on the $xy$-plane and walk in the direction that the gradient is pointing.
Magnitude: $|\nabla f|$ is exactly the rate of that fastest climb — units of "output per unit horizontal distance."
Perpendicularity: $\nabla f$ is perpendicular to the level curve of $f$ through that point. Level curves are the contour lines of a topographic map; the gradient cuts straight across them, never along them.

Why perpendicular to level curves?

Walk along a level curve — by definition, $f$ doesn't change, so the directional rate of change in that direction is zero. The gradient gives the maximum rate of change. The direction of zero change and the direction of maximum change have to be perpendicular, the way "east" and "north" are perpendicular on a compass. (We'll see this fall out of the dot-product formula for directional derivatives in a moment.)

Left: the surface with a steepest-ascent direction at one point. Right: the same function as a contour map. The gradient field always points across the level curves, toward higher values.

Magnitude: how steep is "steepest"?

For $f(x, y) = x^2 + y^2$ — a parabolic bowl — we have $\nabla f = \langle 2x, 2y \rangle$. At the point $(1, 2)$, the gradient is $\langle 2, 4 \rangle$, with magnitude

$$ |\nabla f(1, 2)| = \sqrt{2^2 + 4^2} = \sqrt{20} = 2\sqrt{5}. $$

So at $(1, 2)$, the steepest direction of climb gives a slope of $2\sqrt{5} \approx 4.47$. Take a small step of length $h$ in that direction (measured along your shadow on the $xy$-plane) and your altitude rises by about $4.47\,h$. In any other direction, the rate of rise is smaller.
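A minimal sketch that checks this slope numerically. It walks from $(1, 2)$ along the gradient direction and prints the average slope over steps of decreasing length; the variable names are arbitrary choices for illustration.

```python
import math

def f(x, y):
    return x**2 + y**2

def grad_f(x, y):
    # Analytic gradient of the bowl: <2x, 2y>.
    return (2 * x, 2 * y)

px, py = 1.0, 2.0
gx, gy = grad_f(px, py)
slope = math.hypot(gx, gy)        # |grad f| = sqrt(20)
ux, uy = gx / slope, gy / slope   # unit vector along the gradient

for step in (1.0, 0.1, 0.01):
    rise = f(px + step * ux, py + step * uy) - f(px, py)
    print(step, rise / step)      # average slope over a step of this length
print(slope)
```

For this bowl the average slope over a step of length $s$ works out to $|\nabla f| + s$, so the printed values approach $2\sqrt{5}$ as the step shrinks. The gradient magnitude is an instantaneous rate; over a long step the surface's curvature adds to the rise.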

6. Directional derivatives

The two partials measure rates of change along the $x$- and $y$-axes. But you can walk in any direction. What about northeast? What about south-southwest? The directional derivative answers this: given a unit vector $\mathbf{u}$ in the input plane, how fast does $f$ change as you step in the direction of $\mathbf{u}$?

Directional derivative

For a unit vector $\mathbf{u}$ in the input plane, the directional derivative of $f$ at $(a, b)$ in the direction $\mathbf{u}$ is

$$ D_{\mathbf{u}} f(a, b) \;=\; \nabla f(a, b) \cdot \mathbf{u}. $$

It's a single number: the rate of change of $f$ per unit step in the direction $\mathbf{u}$.

The formula is just a dot product. That's a small fact with a lot of consequences.

Recall that $\mathbf{a} \cdot \mathbf{b} = |\mathbf{a}|\,|\mathbf{b}|\cos\theta$, where $\theta$ is the angle between the two vectors. Since $\mathbf{u}$ is a unit vector,

$$ D_{\mathbf{u}} f = |\nabla f| \cos\theta. $$

Now read that off:

$\theta = 0$ (walk along $\nabla f$): $\cos\theta = 1$, and $D_{\mathbf{u}} f = |\nabla f|$ — the maximum. Steepest ascent.
$\theta = \pi$ (walk opposite to $\nabla f$): $\cos\theta = -1$, and $D_{\mathbf{u}} f = -|\nabla f|$ — steepest descent.
$\theta = \pi/2$ (walk perpendicular to $\nabla f$): $\cos\theta = 0$, and $D_{\mathbf{u}} f = 0$ — no change. You're walking along a level curve.

The three gradient properties from §5 — direction of steepest ascent, magnitude is the max rate, perpendicular to level curves — are all encoded in this one formula.
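The three cases can be observed numerically without using the gradient formula at all. The sketch below, an illustration under the assumption $f(x, y) = x^2 + y^2$ at $(1, 2)$, estimates the rate of change along 360 equally spaced directions by central differences and compares the best one with the gradient's direction; `directional_rate` is a hypothetical helper.

```python
import math

def f(x, y):
    return x**2 + y**2

def directional_rate(f, a, b, angle, h=1e-6):
    # Central-difference rate of change of f at (a, b) along the unit
    # vector <cos(angle), sin(angle)>.
    ux, uy = math.cos(angle), math.sin(angle)
    return (f(a + h * ux, b + h * uy) - f(a - h * ux, b - h * uy)) / (2 * h)

a, b = 1.0, 2.0
angles = [math.radians(k) for k in range(360)]   # one sample per degree
best = max(angles, key=lambda t: directional_rate(f, a, b, t))
grad_angle = math.atan2(2 * b, 2 * a)            # direction of <2a, 2b>

print(math.degrees(best), math.degrees(grad_angle))
print(directional_rate(f, a, b, best), math.hypot(2 * a, 2 * b))
print(directional_rate(f, a, b, grad_angle + math.pi / 2))  # along the level curve
```

The best sampled direction should sit within a degree of the gradient's direction, its rate should be close to $|\nabla f|$, and the rate a quarter-turn away should be close to zero.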

Always use a unit vector

The formula $D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u}$ assumes $|\mathbf{u}| = 1$. If you're given a direction vector $\mathbf{v}$ that isn't unit-length, normalize first: $\mathbf{u} = \mathbf{v}/|\mathbf{v}|$. Skipping this step rescales your answer by $|\mathbf{v}|$ and is one of the most common errors in homework.

A concrete computation

Take $f(x, y) = x^2 + y^2$, and find the rate of change at $(1, 2)$ in the direction of $\mathbf{v} = \langle 3, 4 \rangle$.

First, $|\mathbf{v}| = \sqrt{9 + 16} = 5$, so the unit vector is $\mathbf{u} = \langle 3/5, 4/5 \rangle$. The gradient at $(1, 2)$ is $\langle 2, 4 \rangle$. Therefore

$$ D_{\mathbf{u}} f(1, 2) = \langle 2, 4 \rangle \cdot \langle 3/5, 4/5 \rangle = \tfrac{6}{5} + \tfrac{16}{5} = \tfrac{22}{5} = 4.4. $$

Compare this to $|\nabla f(1, 2)| = 2\sqrt{5} \approx 4.47$. The direction $\langle 3, 4 \rangle$ is almost — but not quite — aligned with the gradient $\langle 2, 4 \rangle$, so the directional rate is almost the maximum, but slightly less.
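The computation above, including the normalization step, can be sketched as a small function. The name `directional_derivative` and its tuple-based signature are assumptions for illustration.

```python
import math

def directional_derivative(grad, v):
    # Normalize v to a unit vector, then dot it with the gradient.
    length = math.hypot(v[0], v[1])
    if length == 0:
        raise ValueError("direction vector must be nonzero")
    u = (v[0] / length, v[1] / length)
    return grad[0] * u[0] + grad[1] * u[1]

grad = (2.0, 4.0)   # gradient of x^2 + y^2 at (1, 2)
v = (3.0, 4.0)      # direction from the example; not unit length

print(directional_derivative(grad, v))   # normalized: the value 22/5
print(grad[0] * v[0] + grad[1] * v[1])   # unnormalized dot product: 22, too large by |v| = 5
```

Putting the normalization inside the function means a caller can pass any nonzero direction vector without rescaling the result by accident.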

7. The tangent plane

In one variable, the tangent line at $x = a$ is $y = f(a) + f'(a)(x - a)$: the best linear approximation to $f$ near $a$. In two variables, the analog is a tangent plane, and it's built from the two partials in the obvious way.

Tangent plane to $z = f(x, y)$ at $(a, b)$

$$ z \;=\; f(a, b) \;+\; f_x(a, b)\,(x - a) \;+\; f_y(a, b)\,(y - b). $$

It is the best linear approximation to $f$ near $(a, b)$, and it touches the surface at the point $(a, b, f(a, b))$. Tangency is a local statement: the plane may still meet the surface at other points.

Read the formula structurally. The constant term $f(a, b)$ pins the plane to the surface at the point. The next term contributes $f_x(a, b)$ units of rise per unit step in $x$ — that's the partial in the $x$-direction. The last term contributes $f_y(a, b)$ units of rise per unit step in $y$. The two partials are exactly the slopes the plane needs in the two axis directions; together they fix the plane uniquely.

For $f(x, y) = x^2 + y^2$ at $(1, 2)$: $f(1, 2) = 5$, $f_x = 2$, $f_y = 4$, so the tangent plane is

$$ z = 5 + 2(x - 1) + 4(y - 2) = 2x + 4y - 5. $$

This plane passes through $(1, 2, 5)$ and rises at slope $2$ in the $x$-direction and slope $4$ in the $y$-direction, exactly matching the bowl's local geometry there.

Linearization

The same equation, viewed as a function approximation, is called the linearization of $f$ at $(a, b)$:

$$ L(x, y) = f(a, b) + f_x(a, b)\,(x - a) + f_y(a, b)\,(y - b), $$

and the claim is $f(x, y) \approx L(x, y)$ when $(x, y)$ is near $(a, b)$. It's the multivariable version of "near $a$, $f$ looks like its tangent line." It's what makes calculus useful: messy nonlinear functions can be replaced, locally, by their planes.
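To see how good the approximation is, the sketch below compares $f(x, y) = x^2 + y^2$ with its linearization at $(1, 2)$ at a few nearby points. The offsets are arbitrary choices for illustration.

```python
def f(x, y):
    return x**2 + y**2

def L(x, y):
    # Linearization of f at (1, 2): f = 5, f_x = 2, f_y = 4 there.
    return 5 + 2 * (x - 1) + 4 * (y - 2)

for d in (0.5, 0.1, 0.01):
    x, y = 1 + d, 2 + d
    print(d, f(x, y), L(x, y), f(x, y) - L(x, y))
```

For this function the gap $f - L$ equals $(x - 1)^2 + (y - 2)^2$, so it shrinks quadratically: moving ten times closer to $(1, 2)$ makes the error a hundred times smaller.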

Differentiability, briefly

For the tangent plane to genuinely approximate $f$ near $(a, b)$, $f$ must be differentiable there — a slightly stronger condition than "both partials exist." A sufficient condition you can usually verify by inspection: $f_x$ and $f_y$ exist and are continuous on a neighborhood of $(a, b)$. Almost every elementary function you meet satisfies this.

8. The multivariable chain rule

In single-variable calculus, the chain rule handles compositions: if $y = f(g(t))$, then $dy/dt = f'(g(t))\cdot g'(t)$. In multiple variables, the same idea generalizes — but now we need to account for several routes by which $t$ can affect the output.

Chain rule for $z = f(x(t), y(t))$

If $f$ is differentiable and $x, y$ are differentiable functions of $t$, then

$$ \frac{dz}{dt} \;=\; \frac{\partial f}{\partial x}\,\frac{dx}{dt} \;+\; \frac{\partial f}{\partial y}\,\frac{dy}{dt}. $$

Each input contributes its own term: how sensitive $f$ is to that input ($\partial f/\partial x$), times how fast that input is changing ($dx/dt$). Then sum.

The pattern is "sum over paths." If $t$ feeds into $x$ and into $y$, and both $x$ and $y$ feed into $f$, there are two paths from $t$ to $f$ — and you add a term for each.

For example, if $f(x, y) = x^2 + y^2$ with $x(t) = \cos t$, $y(t) = \sin t$, then

$$ \frac{df}{dt} = 2x\,(-\sin t) + 2y\,(\cos t) = -2\cos t \sin t + 2\sin t \cos t = 0. $$

Zero — because $(\cos t, \sin t)$ traces the unit circle, a level curve of $f = x^2 + y^2$. The chain rule recovers the obvious geometric fact.
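For a path where the answer is not zero, the chain rule can be checked against a direct numerical derivative of the composite. The sketch below assumes a hypothetical path $x(t) = t^2$, $y(t) = 3t$, which is not from the text above.

```python
def f(x, y):
    return x**2 + y**2

def x_of(t):
    return t**2      # hypothetical path component

def y_of(t):
    return 3 * t     # hypothetical path component

def chain_rule_rate(t):
    # f_x * dx/dt + f_y * dy/dt, with f_x = 2x, f_y = 2y, dx/dt = 2t, dy/dt = 3.
    x, y = x_of(t), y_of(t)
    return 2 * x * (2 * t) + 2 * y * 3

def numeric_rate(t, h=1e-6):
    # Central difference of the composite z(t) = f(x(t), y(t)).
    z = lambda s: f(x_of(s), y_of(s))
    return (z(t + h) - z(t - h)) / (2 * h)

t0 = 1.5
print(chain_rule_rate(t0), numeric_rate(t0))
```

Both numbers should be close to $4t^3 + 18t = 40.5$ at $t = 1.5$, which is what you get by substituting first and differentiating $z(t) = t^4 + 9t^2$ directly.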

Gradient form

The chain rule above is just the dot product $df/dt = \nabla f \cdot \mathbf{r}'(t)$, where $\mathbf{r}(t) = \langle x(t), y(t) \rangle$ is the path. Directional derivatives are the special case where $\mathbf{r}'(t)$ is a constant unit vector.

When $z$ depends on $x, y$, which themselves depend on two variables $s, t$, you get one chain-rule formula per independent variable:

$$ \frac{\partial z}{\partial s} = \frac{\partial f}{\partial x}\,\frac{\partial x}{\partial s} + \frac{\partial f}{\partial y}\,\frac{\partial y}{\partial s}, \qquad \frac{\partial z}{\partial t} = \frac{\partial f}{\partial x}\,\frac{\partial x}{\partial t} + \frac{\partial f}{\partial y}\,\frac{\partial y}{\partial t}. $$

9. Implicit differentiation

Sometimes a relationship between variables is given by an equation $F(x, y) = 0$ that you can't (or don't want to) solve for $y$ explicitly. The chain rule lets you compute $dy/dx$ anyway.

Differentiate both sides with respect to $x$, treating $y$ as a function of $x$:

$$ \frac{\partial F}{\partial x} + \frac{\partial F}{\partial y}\,\frac{dy}{dx} = 0. $$

Solving for $dy/dx$:

Implicit derivative formula

$$ \frac{dy}{dx} \;=\; -\,\frac{F_x}{F_y}, \qquad \text{provided } F_y \neq 0. $$

It works as long as the partial in the denominator is nonzero at the point of interest.

For example, the unit circle $F(x, y) = x^2 + y^2 - 1 = 0$ gives $F_x = 2x$, $F_y = 2y$, so $dy/dx = -x/y$ — matching what you'd get by differentiating $y = \pm\sqrt{1 - x^2}$ directly, but with no case-splitting.
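A small numerical check of the circle example, assuming the sample point $(0.6, 0.8)$ on the upper half of the circle; the helper name `implicit_slope` is hypothetical.

```python
import math

def implicit_slope(Fx, Fy, x, y):
    # dy/dx = -F_x / F_y, valid only where F_y is nonzero.
    fy = Fy(x, y)
    if fy == 0:
        raise ValueError("F_y vanishes here; the formula does not apply")
    return -Fx(x, y) / fy

Fx = lambda x, y: 2 * x   # partials of F(x, y) = x^2 + y^2 - 1
Fy = lambda x, y: 2 * y

x0, y0 = 0.6, 0.8                       # a point on the upper half of the unit circle
upper = lambda x: math.sqrt(1 - x**2)   # explicit branch through (x0, y0)
h = 1e-6
explicit = (upper(x0 + h) - upper(x0 - h)) / (2 * h)

print(implicit_slope(Fx, Fy, x0, y0), explicit)
```

Both values should be close to $-x/y = -0.75$. The guard on $F_y$ matters at $(\pm 1, 0)$, where the circle's tangent is vertical.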

In three variables, $F(x, y, z) = 0$ defines $z$ implicitly as a function of $x, y$, and the partials follow the same pattern:

$$ \frac{\partial z}{\partial x} = -\,\frac{F_x}{F_z}, \qquad \frac{\partial z}{\partial y} = -\,\frac{F_y}{F_z}. $$
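As an illustration (an example chosen here, not taken from the text above), consider the unit sphere $F(x, y, z) = x^2 + y^2 + z^2 - 1 = 0$. Then $F_x = 2x$, $F_y = 2y$, $F_z = 2z$, so

$$ \frac{\partial z}{\partial x} = -\,\frac{x}{z}, \qquad \frac{\partial z}{\partial y} = -\,\frac{y}{z}, \qquad \text{provided } z \neq 0. $$

On the upper hemisphere $z = \sqrt{1 - x^2 - y^2}$, differentiating directly gives $\partial z/\partial x = -x/\sqrt{1 - x^2 - y^2} = -x/z$, in agreement with the formula.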

10. Critical points and the second-derivative test

In one variable, finding extrema starts with setting $f'(x) = 0$. In two variables, the analog is setting both partials to zero at once — equivalently, $\nabla f = \mathbf{0}$.

A point where $\nabla f(\mathbf{c}) = \mathbf{0}$ is called a critical point. At a critical point, the tangent plane is horizontal: the surface has flattened out, momentarily, in every direction.

But flat in every direction doesn't pin down what's happening. There are four cases:

Local minimum

Surface curves up in every direction.
Example: $f(x, y) = x^2 + y^2$ at $(0, 0)$ — a bowl.

Local maximum

Surface curves down in every direction.
Example: $f(x, y) = -x^2 - y^2$ at $(0, 0)$ — an upside-down bowl.

Saddle point

Up in some directions, down in others.
Example: $f(x, y) = x^2 - y^2$ at $(0, 0)$, shaped like a Pringles chip.

Degenerate

The second derivatives can't tell a minimum, a maximum, and a saddle apart.
Need higher-order analysis (or other tools).

The saddle point is the genuinely new case. It has no direct analog in one variable, where a critical point is a min, a max, or a flat inflection point (as for $x^3$ at $0$). To tell which case you're in, you look at the second partials.

The Hessian and the discriminant

At a critical point, package the second partials into the Hessian matrix:

$$ H = \begin{pmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{pmatrix}. $$

By Clairaut, $H$ is symmetric. Its determinant is called the discriminant:

$$ D \;=\; \det H \;=\; f_{xx}\,f_{yy} \;-\; (f_{xy})^2. $$

Second-derivative test (2D)

At a critical point $\mathbf{c}$:

$D > 0$ and $f_{xx} > 0$: local minimum.
$D > 0$ and $f_{xx} < 0$: local maximum.
$D < 0$: saddle point.
$D = 0$: inconclusive — the test fails; use other methods.

The test is a statement about the eigenvalues of $H$ in disguise. $D > 0$ with $f_{xx} > 0$ means both eigenvalues are positive (curving up in every direction). $D > 0$ with $f_{xx} < 0$ means both are negative (curving down). $D < 0$ means one positive and one negative: a saddle.
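A brief sketch of why the signs work out this way. For a symmetric $2 \times 2$ matrix with eigenvalues $\lambda_1, \lambda_2$,

$$ \lambda_1 \lambda_2 = \det H = D, \qquad \lambda_1 + \lambda_2 = \operatorname{tr} H = f_{xx} + f_{yy}. $$

If $D < 0$, the product is negative, so the eigenvalues have opposite signs. If $D > 0$, they share a sign, and $f_{xx}\,f_{yy} > (f_{xy})^2 \ge 0$ forces $f_{xx}$ and $f_{yy}$ to share a sign too, so the sum $\lambda_1 + \lambda_2$ has the sign of $f_{xx}$. That is why checking $f_{xx}$ alone is enough once $D > 0$.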

A worked classification

Take $f(x, y) = x^2 + y^2 - 2x + 4y$. Then $\nabla f = \langle 2x - 2,\ 2y + 4 \rangle$, which is zero only at $(1, -2)$. The second partials are $f_{xx} = 2$, $f_{yy} = 2$, $f_{xy} = 0$. So $D = 2\cdot 2 - 0^2 = 4 > 0$ and $f_{xx} > 0$: local (and in fact global) minimum, with value $f(1, -2) = -5$.

Compare $f(x, y) = x^2 - y^2$: $\nabla f = \langle 2x, -2y \rangle$, zero at $(0, 0)$. Then $f_{xx} = 2$, $f_{yy} = -2$, $f_{xy} = 0$, so $D = -4 < 0$: saddle.
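The test is mechanical enough to sketch as a function. The name `classify_critical_point` is hypothetical, and the function assumes you pass in the second partials already evaluated at a point where the gradient is zero.

```python
def classify_critical_point(fxx, fyy, fxy):
    # Second-derivative test, given the second partials at a critical point.
    D = fxx * fyy - fxy**2
    if D > 0:
        # D > 0 rules out fxx == 0, so the sign of fxx decides.
        return "local minimum" if fxx > 0 else "local maximum"
    if D < 0:
        return "saddle point"
    return "inconclusive"

print(classify_critical_point(2, 2, 0))    # x^2 + y^2 - 2x + 4y at (1, -2)
print(classify_critical_point(2, -2, 0))   # x^2 - y^2 at (0, 0)
print(classify_critical_point(0, 0, 0))    # e.g. x^4 + y^4 at (0, 0), where D = 0
```

With floating-point inputs, an exact comparison against zero is fragile; in that setting a small tolerance around $D = 0$ would be a reasonable adjustment.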

The full picture lives elsewhere

This is a glimpse. Constrained optimization, Lagrange multipliers, global extrema on bounded regions, gradient descent — all build on the gradient and Hessian we just introduced. They belong to a topic of their own (Multivariable Optimization), which we'll get to. The point here is that the same two derivatives that built the tangent plane also classify extrema. Nothing new needs to be invented; you already have the tools.

11. Common pitfalls

Forgetting to hold the other variable constant

When computing $\partial f/\partial x$, every $y$ in sight is a constant. It's tempting, midway through a long expression, to start treating $y$ as a variable — especially if the chain rule is involved. Keep reminding yourself: "$y$ is locked."

Mixing up subscript order

$f_{xy}$ means "differentiate by $x$ first, then by $y$." The Leibniz form $\partial^2 f/\partial y\, \partial x$ means the same thing, but reads right-to-left. Clairaut's theorem makes them equal for well-behaved $f$, so the order rarely matters in practice — but it does matter when you're computing them and want to know what you've just computed.

Treating a direction vector as if it were unit-length

$D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u}$ requires $|\mathbf{u}| = 1$. If you dot the gradient with an unnormalized direction vector $\mathbf{v}$, your "rate" is off by a factor of $|\mathbf{v}|$. Always check the length of your direction vector before dotting it with the gradient.

Existence of partials ≠ differentiability

You can construct a function whose two partials both exist at $(0, 0)$ but which isn't continuous there, let alone differentiable, so it has no meaningful tangent plane at that point. The standard example is $f(x, y) = xy/(x^2 + y^2)$ for $(x, y) \neq (0, 0)$ with $f(0, 0) = 0$. Both partials at the origin are zero, but the function fails to have a limit at the origin. Existence of the partials is a necessary condition for differentiability, not a sufficient one.
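A short check of both claims for this example. For $t \neq 0$,

$$ f(t, 0) = 0, \qquad f(0, t) = 0, \qquad f(t, t) = \frac{t^2}{2t^2} = \frac{1}{2}. $$

Along the two axes $f$ is identically zero, which is why both partials at the origin vanish. Along the diagonal $y = x$ it is constantly $1/2$, so the values near the origin do not settle on a single limit.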

$\nabla f$ vanishing ≠ extremum

A zero gradient is a necessary condition for a local extremum at an interior point — but it admits saddles too. Always classify your critical points; don't assume "gradient is zero" means "I found a min."

12. Worked examples

Example 1 · Find both partials of $f(x, y) = x^3 \cos(y) + y\,e^{x}$.

$\partial f/\partial x$. Treat $y$ as a constant. The $\cos(y)$ is a constant multiplier, $y$ is a constant in the second term:

$$ f_x = 3x^2 \cos(y) + y\,e^{x}. $$

$\partial f/\partial y$. Treat $x$ as a constant. Now $x^3$ and $e^x$ are constants:

$$ f_y = -x^3 \sin(y) + e^{x}. $$

Example 2 · Verify Clairaut for $f(x, y) = x^2 e^{xy}$.

First partials. The product rule on $x^2 \cdot e^{xy}$ gives $f_x$; for $f_y$, the factor $x^2$ is a constant and only the chain rule is needed:

$$ f_x = 2x\,e^{xy} + x^2 \cdot y\,e^{xy} = (2x + x^2 y)\,e^{xy}, $$

$$ f_y = x^2 \cdot x\,e^{xy} = x^3\,e^{xy}. $$

$f_{xy}$. Differentiate $f_x$ with respect to $y$:

$$ f_{xy} = \frac{\partial}{\partial y}\!\left[(2x + x^2 y)\,e^{xy}\right] = x^2 e^{xy} + (2x + x^2 y)\cdot x\,e^{xy} = (3x^2 + x^3 y)\,e^{xy}. $$

$f_{yx}$. Differentiate $f_y$ with respect to $x$:

$$ f_{yx} = \frac{\partial}{\partial x}\!\left[x^3\,e^{xy}\right] = 3x^2\,e^{xy} + x^3 \cdot y\,e^{xy} = (3x^2 + x^3 y)\,e^{xy}. $$

Equal, as Clairaut guarantees.

Example 3 · Gradient and direction of steepest ascent of $f(x, y) = x^2 y$ at $(2, 3)$.

$f_x = 2xy$, $f_y = x^2$. At $(2, 3)$: $f_x = 12$, $f_y = 4$.

$$ \nabla f(2, 3) = \langle 12, 4 \rangle. $$

Magnitude: $|\nabla f| = \sqrt{144 + 16} = \sqrt{160} = 4\sqrt{10}$.

Direction of steepest ascent: $\mathbf{u} = \nabla f/|\nabla f| = \langle 12, 4 \rangle/(4\sqrt{10}) = \langle 3, 1\rangle/\sqrt{10}$.

The maximum rate of increase at $(2, 3)$ is $4\sqrt{10} \approx 12.65$, achieved by stepping in the direction $\langle 3, 1\rangle/\sqrt{10}$.

Example 4 · Directional derivative of $f(x, y) = \ln(x^2 + y^2)$ at $(1, 1)$ in the direction $\mathbf{v} = \langle 1, -1 \rangle$.

$f_x = 2x/(x^2 + y^2)$, $f_y = 2y/(x^2 + y^2)$. At $(1, 1)$: $f_x = 1$, $f_y = 1$, so $\nabla f(1, 1) = \langle 1, 1 \rangle$.

Normalize $\mathbf{v}$: $|\mathbf{v}| = \sqrt{2}$, so $\mathbf{u} = \langle 1, -1 \rangle/\sqrt{2}$.

$$ D_{\mathbf{u}} f(1, 1) = \langle 1, 1 \rangle \cdot \langle 1, -1\rangle/\sqrt{2} = (1 - 1)/\sqrt{2} = 0. $$

Zero — meaning $\mathbf{u}$ is perpendicular to $\nabla f$, so $\mathbf{u}$ runs along a level curve of $f$ at the point.

Example 5 · Tangent plane to $z = \sin(xy)$ at $(\pi/2, 1)$.

$f(\pi/2, 1) = \sin(\pi/2) = 1$.

$f_x = y\cos(xy)$, $f_y = x\cos(xy)$. At $(\pi/2, 1)$: $\cos(\pi/2) = 0$, so $f_x = 0$ and $f_y = 0$.

Tangent plane: $z = 1 + 0\cdot(x - \pi/2) + 0\cdot(y - 1) = 1$. A horizontal plane — $(\pi/2, 1)$ is a critical point (in fact, a maximum, since $\sin \le 1$).

Example 6 · Classify the critical points of $f(x, y) = x^3 - 3xy + y^3$.

Critical points. $\nabla f = \langle 3x^2 - 3y,\ -3x + 3y^2 \rangle$. Setting both to zero: $y = x^2$ and $x = y^2$. Substituting, $x = (x^2)^2 = x^4$, so $x(x^3 - 1) = 0$, giving $x = 0$ or $x = 1$. The critical points are $(0, 0)$ and $(1, 1)$.

Hessian. $f_{xx} = 6x$, $f_{yy} = 6y$, $f_{xy} = -3$, so $D = 36xy - 9$.

At $(0, 0)$: $D = -9 < 0$ → saddle point.

At $(1, 1)$: $D = 36 - 9 = 27 > 0$ and $f_{xx} = 6 > 0$ → local minimum, with value $f(1, 1) = 1 - 3 + 1 = -1$.
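As an independent check on these classifications, one can sample $f$ on a small circle around each critical point and look at the sign of $f - f(\mathbf{c})$. This is only a sketch: the helper `signs_around`, the radius, and the sample count are arbitrary assumptions, and sampling can suggest a classification but not prove it.

```python
import math

def f(x, y):
    return x**3 - 3 * x * y + y**3

def signs_around(cx, cy, radius=0.05, samples=72):
    # Smallest and largest value of f - f(c) on a small circle around c.
    base = f(cx, cy)
    diffs = [
        f(cx + radius * math.cos(2 * math.pi * k / samples),
          cy + radius * math.sin(2 * math.pi * k / samples)) - base
        for k in range(samples)
    ]
    return min(diffs), max(diffs)

print(signs_around(0.0, 0.0))   # one negative, one positive: consistent with a saddle
print(signs_around(1.0, 1.0))   # both positive: consistent with a local minimum
```

Around $(0, 0)$ the differences take both signs, while around $(1, 1)$ every sampled point sits above $f(1, 1) = -1$.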

Sources & further reading

The treatment above is synthesized from standard multivariable calculus references. When you want more depth — proofs, more examples, the multivariable chain rule and beyond — these are the places to go.

Partial Derivatives (textbook). OpenStax, Calculus Volume 3, §4.3.

Peer-reviewed, openly licensed textbook chapter. Covers partials, higher-order partials, Clairaut's theorem, and tangent planes rigorously with worked examples.

Directional Derivatives and the Gradient (textbook). OpenStax, Calculus Volume 3, §4.6.

Companion chapter on the gradient, directional derivatives, steepest ascent, and the geometric meaning of perpendicularity to level curves.

Partial Derivatives (tutorial). Paul's Online Math Notes, Lamar University.

Concise, worked-example-heavy notes with the canonical step-by-step computational style. Excellent if you want to drill the mechanics of computing partials, gradients, and directional derivatives.

Derivatives of Multivariable Functions (course). Khan Academy, Multivariable Calculus.

Video lessons with practice problems, including Grant Sanderson's visual treatment of the gradient. Best for building intuition through animation before working through formal derivations.

Gradient (reference). Wolfram MathWorld.

Formal mathematical reference. Short, dense, precise. Use this when you want the definition stated in the language professional mathematicians actually use, including coordinate-free formulations.

Partial derivative (encyclopedia). Wikipedia.

Broad overview that covers notation conventions, higher-dimensional analogs, and historical context. Useful for placing partial derivatives in the wider mathematical landscape, including PDEs and differential geometry.