Matrix Calculus Rules
Worked examples from simple to complex, with a finite-difference check at every step.
Example 1: A Simple Product
Let's start with something easy. Consider $f(x) = x_1 x_2 x_3$ where $x \in \mathbb{R}^3$.
The partial derivatives are straightforward:
$$\frac{\partial f}{\partial x_1} = x_2 x_3, \quad \frac{\partial f}{\partial x_2} = x_1 x_3, \quad \frac{\partial f}{\partial x_3} = x_1 x_2$$
So the gradient is:
$$\nabla f(x) = \begin{bmatrix} x_2 x_3 \\ x_1 x_3 \\ x_1 x_2 \end{bmatrix}$$
Notice the pattern: each partial derivative is the product of all variables except the one you're differentiating with respect to.
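A finite-difference check of this gradient can be sketched in a few lines of NumPy. The helper name `fd_grad`, the step size `h`, and the test point `x0` are illustrative choices, not part of the original material.

```python
import numpy as np

def f(x):
    # f(x) = x1 * x2 * x3
    return np.prod(x)

def grad_f(x):
    # Analytic gradient: each entry is the product of the other two variables.
    return np.array([x[1] * x[2], x[0] * x[2], x[0] * x[1]])

def fd_grad(func, x, h=1e-6):
    # Central differences, perturbing one coordinate at a time.
    g = np.zeros(x.size)
    for i in range(x.size):
        e = np.zeros(x.size)
        e[i] = h
        g[i] = (func(x + e) - func(x - e)) / (2 * h)
    return g

x0 = np.array([2.0, -1.0, 3.0])  # arbitrary test point (assumption)
analytic = grad_f(x0)
numeric = fd_grad(f, x0)
max_err = np.max(np.abs(analytic - numeric))
print(analytic, numeric, max_err)
```

Because $f$ is linear in each variable separately, the central difference has no truncation error here, so any remaining discrepancy is floating-point rounding.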
Example 2: Composition with a Power
Now let $f(x) = (x_1 x_2 x_3)^3$. Let $p = x_1 x_2 x_3$, so $f = p^3$.
Step 1: Outer derivative
$\frac{df}{dp} = 3p^2 = 3(x_1 x_2 x_3)^2$
Step 2: Inner derivative
$\frac{\partial p}{\partial x_i} = \prod_{j \neq i} x_j$, the product of all $x_j$ except $x_i$ (same as Example 1).
Step 3: Chain rule
$\frac{\partial f}{\partial x_i} = \frac{df}{dp} \cdot \frac{\partial p}{\partial x_i} = 3(x_1 x_2 x_3)^2 \prod_{j \neq i} x_j$. When $x_i \neq 0$, the inner factor equals $\frac{x_1 x_2 x_3}{x_i}$, so the same partial can be written as $\frac{3(x_1 x_2 x_3)^3}{x_i}$.
Result
$$\nabla f(x) = 3(x_1 x_2 x_3)^2 \begin{bmatrix} x_2 x_3 \\ x_1 x_3 \\ x_1 x_2 \end{bmatrix}$$
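The result can be checked numerically in the same spirit. The sketch below compares the product form of the gradient with central differences, and also with the division form $3p^3 / x_i$, which is only usable when every $x_i$ is nonzero. The helper `fd_grad` and the test point are assumptions made for illustration.

```python
import numpy as np

def f(x):
    # f(x) = (x1 * x2 * x3)^3
    return np.prod(x) ** 3

def grad_f(x):
    # Chain rule: 3 p^2 times the gradient of p = x1 * x2 * x3.
    p = np.prod(x)
    grad_p = np.array([x[1] * x[2], x[0] * x[2], x[0] * x[1]])
    return 3 * p ** 2 * grad_p

def grad_f_division_form(x):
    # Equivalent form 3 p^3 / x_i; valid only when no x_i is zero.
    return 3 * np.prod(x) ** 3 / x

def fd_grad(func, x, h=1e-6):
    # Central differences, perturbing one coordinate at a time.
    g = np.zeros(x.size)
    for i in range(x.size):
        e = np.zeros(x.size)
        e[i] = h
        g[i] = (func(x + e) - func(x - e)) / (2 * h)
    return g

x0 = np.array([2.0, -1.0, 3.0])  # arbitrary test point with no zero entries
analytic = grad_f(x0)
numeric = fd_grad(f, x0)
max_err = np.max(np.abs(analytic - numeric))
print(analytic, grad_f_division_form(x0), max_err)
```

The product form is the safer one to implement, since it stays well defined when some $x_i = 0$.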
The Essential Rules
Most gradient computations boil down to a handful of rules applied repeatedly.
Rule 1: Linearity
$$\nabla_x [\alpha f(x) + \beta g(x)] = \alpha \nabla f(x) + \beta \nabla g(x)$$
Rule 2: Chain Rule (scalar composition)
If $f(x) = \phi(g(x))$ where $g: \mathbb{R}^n \to \mathbb{R}$ and $\phi: \mathbb{R} \to \mathbb{R}$, then:
$$\nabla f(x) = \phi'(g(x)) \cdot \nabla g(x)$$
Dimensions: scalar $\times$ $(n \times 1)$ $=$ $n \times 1$.
Rule 3: Chain Rule (vector composition)
If $f(x) = \phi(g(x))$ where $g: \mathbb{R}^n \to \mathbb{R}^m$ and $\phi: \mathbb{R}^m \to \mathbb{R}$, then:
$$\nabla_x f = J_g(x)^T \nabla_g \phi$$
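Here $\nabla_g \phi$ is the gradient of $\phi$ with respect to its argument, evaluated at $g(x)$, and the Jacobian $J_g$ is $m \times n$. Dimensions: $(m \times n)^T \cdot (m \times 1) = (n \times m)(m \times 1) = n \times 1$.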
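To make the shapes in Rule 3 concrete, here is a minimal sketch with $n = 3$ and $m = 2$. The particular $g$ and $\phi$ are made-up examples chosen so that the Jacobian is easy to write by hand; they do not come from the text above.

```python
import numpy as np

def g(x):
    # Inner map g: R^3 -> R^2 (illustrative choice).
    return np.array([x[0] * x[1], x[1] + x[2] ** 2])

def jac_g(x):
    # Jacobian of g, shape (m, n) = (2, 3); row i holds the partials of g_i.
    return np.array([[x[1], x[0], 0.0],
                     [0.0, 1.0, 2.0 * x[2]]])

def phi(y):
    # Outer map phi: R^2 -> R (illustrative choice).
    return y[0] * np.exp(y[1])

def grad_phi(y):
    # Gradient of phi with respect to its argument y, shape (2,).
    return np.array([np.exp(y[1]), y[0] * np.exp(y[1])])

def f(x):
    return phi(g(x))

def grad_f(x):
    # Rule 3: J_g(x)^T times grad phi evaluated at g(x) -> shape (3,).
    return jac_g(x).T @ grad_phi(g(x))

def fd_grad(func, x, h=1e-6):
    # Central differences, perturbing one coordinate at a time.
    out = np.zeros(x.size)
    for i in range(x.size):
        e = np.zeros(x.size)
        e[i] = h
        out[i] = (func(x + e) - func(x - e)) / (2 * h)
    return out

x0 = np.array([0.5, -1.0, 0.3])  # arbitrary test point (assumption)
analytic = grad_f(x0)
max_err = np.max(np.abs(analytic - fd_grad(f, x0)))
print(jac_g(x0).shape, analytic.shape, max_err)
```

Transposing the Jacobian is what turns the length-$m$ gradient of $\phi$ into a length-$n$ gradient with respect to $x$.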
Common Building Blocks
$\nabla_x (a^T x) = a$
$\nabla_x (x^T x) = 2x$
$\nabla_x (x^T A x) = (A + A^T)x$
$\nabla_x \|Ax - b\|_2^2 = 2A^T(Ax-b)$ (Euclidean norm)
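Each of these four identities can be spot-checked against central differences. In the sketch below the dimensions, the random seed, and the names `fd_grad`, `checks`, and `B` are assumptions for illustration. The matrix `A` is deliberately non-symmetric so that the $(A + A^T)x$ form is actually exercised, and `B` plays the role of the rectangular matrix in the least-squares identity.

```python
import numpy as np

def fd_grad(func, x, h=1e-6):
    # Central differences, perturbing one coordinate at a time.
    out = np.zeros(x.size)
    for i in range(x.size):
        e = np.zeros(x.size)
        e[i] = h
        out[i] = (func(x + e) - func(x - e)) / (2 * h)
    return out

rng = np.random.default_rng(0)   # fixed seed so the check is repeatable
n, m = 4, 5
a = rng.normal(size=n)
A = rng.normal(size=(n, n))      # square, not symmetric
B = rng.normal(size=(m, n))      # rectangular matrix for ||Bx - b||^2
b = rng.normal(size=m)
x0 = rng.normal(size=n)

# Each entry pairs a scalar function with its claimed gradient at x0.
checks = {
    "a^T x": (lambda x: a @ x, a),
    "x^T x": (lambda x: x @ x, 2 * x0),
    "x^T A x": (lambda x: x @ A @ x, (A + A.T) @ x0),
    "||Bx - b||^2": (lambda x: np.sum((B @ x - b) ** 2), 2 * B.T @ (B @ x0 - b)),
}

errors = {name: np.max(np.abs(fd_grad(func, x0) - grad))
          for name, (func, grad) in checks.items()}
for name, err in errors.items():
    print(f"{name:>14s}: max abs error = {err:.2e}")
```

Replacing `(A + A.T) @ x0` with `2 * A @ x0` makes the third check fail for this non-symmetric `A`, which is a quick way to see why the symmetric shortcut needs its assumption.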
Detailed Example: $f(x) = e^{\ln(x)^T A \ln(x)}$
Here $\ln(x)$ is applied elementwise, $A$ is a fixed $n \times n$ matrix, and $x > 0$ componentwise. Let's build this up layer by layer.
Step 1: Define intermediate variables
Let $u = \ln(x)$ (elementwise), $\;q = u^T A u$ (quadratic form), $\;f = e^q$.
Step 2: Outermost derivative
$\frac{df}{dq} = e^q$
Step 3: Quadratic form derivative
$\frac{\partial q}{\partial u} = (A + A^T) u$
If $A$ is symmetric, this simplifies to $2Au$.
Step 4: Elementwise log derivative
$\frac{\partial u}{\partial x} = \text{diag}(1/x_1, \ldots, 1/x_n)$
This is a diagonal Jacobian since each $u_i = \ln(x_i)$ depends only on $x_i$.
Step 5: Chain rule assembly
$$\nabla_x f = \left(\frac{\partial u}{\partial x}\right)^T \frac{\partial q}{\partial u} \, \frac{df}{dq} = \text{diag}(1/x) \cdot (A + A^T)\ln(x) \cdot e^{\ln(x)^T A \ln(x)}$$
Step 6: Simplify
$$\nabla_x f(x) = \frac{e^{\ln(x)^T A \ln(x)}}{x} \odot \left[(A + A^T)\ln(x)\right]$$
where $\frac{1}{x}$ and $\odot$ are elementwise. Component $i$: $\;\frac{e^q}{x_i} [(A + A^T)\ln(x)]_i$.
Step 7: Dimension check
$\text{diag}(1/x)$: $n \times n$. $(A+A^T)\ln(x)$: $n \times 1$. $e^q$: scalar. Product: $n \times 1$. ✔
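The assembled formula can be verified numerically as well. The following sketch implements both the elementwise form from Step 6 and the explicit $\text{diag}(1/x)$ form from Step 5, then compares them with central differences. The specific matrix `A` and test point `x0` are arbitrary assumptions, kept small so that $e^q$ stays well scaled.

```python
import numpy as np

# Fixed, non-symmetric 3x3 matrix (illustrative values).
A = np.array([[0.2, -0.1, 0.0],
              [0.3, 0.1, 0.05],
              [-0.2, 0.0, 0.15]])

def f(x):
    # f(x) = exp(ln(x)^T A ln(x)), defined for x > 0 componentwise.
    u = np.log(x)
    return np.exp(u @ A @ u)

def grad_f(x):
    # Step 6: (e^q / x) elementwise-times (A + A^T) ln(x).
    u = np.log(x)
    return np.exp(u @ A @ u) / x * ((A + A.T) @ u)

def grad_f_diag(x):
    # Step 5: the same gradient with the diagonal Jacobian written out.
    u = np.log(x)
    return np.diag(1.0 / x) @ ((A + A.T) @ u) * np.exp(u @ A @ u)

def fd_grad(func, x, h=1e-6):
    # Central differences; h is small enough that x - h stays positive here.
    out = np.zeros(x.size)
    for i in range(x.size):
        e = np.zeros(x.size)
        e[i] = h
        out[i] = (func(x + e) - func(x - e)) / (2 * h)
    return out

x0 = np.array([0.5, 1.5, 2.0])  # strictly positive test point
analytic = grad_f(x0)
max_err = np.max(np.abs(analytic - fd_grad(f, x0)))
print(analytic, max_err)
```

In practice the elementwise form is preferable, since it avoids building an $n \times n$ diagonal matrix just to scale a vector.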
Derivative Workshop
Pick a function from the presets, or type your own expression. See the gradient (analytic for presets, finite-difference for custom), step-by-step derivation, and a numerical check.