Matrix Calculus Rules

Worked examples from simple to complex, with finite difference checks every step of the way.

Example 1: A Simple Product

Let's start with something easy. Consider $f(x) = x_1 x_2 x_3$ where $x \in \mathbb{R}^3$.

The partial derivatives are straightforward:

$$\frac{\partial f}{\partial x_1} = x_2 x_3, \quad \frac{\partial f}{\partial x_2} = x_1 x_3, \quad \frac{\partial f}{\partial x_3} = x_1 x_2$$

So the gradient is:

$$\nabla f(x) = \begin{bmatrix} x_2 x_3 \\ x_1 x_3 \\ x_1 x_2 \end{bmatrix}$$
Notice the pattern: each partial derivative is the product of all variables except the one you're differentiating with respect to.

Example 2: Composition with a Power

Now let $f(x) = (x_1 x_2 x_3)^3$. Let $p = x_1 x_2 x_3$, so $f = p^3$.

Step 1: Outer derivative
$\frac{df}{dp} = 3p^2 = 3(x_1 x_2 x_3)^2$
Step 2: Inner derivative (chain rule)
$\frac{\partial p}{\partial x_i}$ is the product of all $x_j$ except $x_i$ (same as Example 1)
Step 3: Chain rule
$\frac{\partial f}{\partial x_i} = \frac{df}{dp} \cdot \frac{\partial p}{\partial x_i} = 3(x_1 x_2 x_3)^2 \cdot \frac{x_1 x_2 x_3}{x_i} = \frac{3(x_1 x_2 x_3)^3}{x_i}$ (the division-by-$x_i$ shorthand assumes $x_i \neq 0$; the product form in the result below holds everywhere)
Result
$$\nabla f(x) = 3(x_1 x_2 x_3)^2 \begin{bmatrix} x_2 x_3 \\ x_1 x_3 \\ x_1 x_2 \end{bmatrix}$$
FD Check: Examples 1 & 2
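A finite-difference check for both examples can be sketched in NumPy with a central-difference loop (the test point `x` is an arbitrary choice):

```python
import numpy as np

def fd_grad(f, x, h=1e-6):
    """Central-difference approximation of the gradient of a scalar function f at x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

x = np.array([1.5, -2.0, 0.7])

# Example 1: f(x) = x1*x2*x3, gradient = (x2*x3, x1*x3, x1*x2)
f1 = lambda x: x[0] * x[1] * x[2]
g1 = np.array([x[1] * x[2], x[0] * x[2], x[0] * x[1]])
assert np.allclose(fd_grad(f1, x), g1, atol=1e-6)

# Example 2: f(x) = (x1*x2*x3)^3, gradient = 3*p^2 * gradient of p
f2 = lambda x: (x[0] * x[1] * x[2]) ** 3
g2 = 3 * (x[0] * x[1] * x[2]) ** 2 * g1
assert np.allclose(fd_grad(f2, x), g2, atol=1e-5)
```

Central differences have $O(h^2)$ truncation error, so with $h = 10^{-6}$ the agreement should be to roughly 8–10 significant digits.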

The Essential Rules

Most gradient computations boil down to a handful of rules applied repeatedly.

Rule 1: Linearity
$$\nabla_x [\alpha f(x) + \beta g(x)] = \alpha \nabla f(x) + \beta \nabla g(x)$$
Rule 2: Chain Rule (scalar composition)
If $f(x) = \phi(g(x))$ where $g: \mathbb{R}^n \to \mathbb{R}$ and $\phi: \mathbb{R} \to \mathbb{R}$, then: $$\nabla f(x) = \phi'(g(x)) \cdot \nabla g(x)$$
Dimensions: scalar $\times$ $(n \times 1)$ $=$ $(n \times 1)$.
Rule 3: Chain Rule (vector composition)
If $f(x) = \phi(g(x))$ where $g: \mathbb{R}^n \to \mathbb{R}^m$ and $\phi: \mathbb{R}^m \to \mathbb{R}$, then: $$\nabla_x f = J_g(x)^T \nabla_g \phi$$
Dimensions: $(m \times n)^T \cdot (m \times 1) = (n \times m)(m \times 1) = n \times 1$. The Jacobian $J_g$ is $m \times n$.
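Rule 3 can be illustrated numerically. The choices below are arbitrary: $g(x) = Ax$ (so $J_g = A$) and $\phi(u) = u^T u$, giving $f(x) = \|Ax\|^2$ with the closed-form gradient $2A^T A x$ to compare against:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))   # g(x) = Ax, so J_g = A is 4x3 (m=4, n=3)
x = rng.standard_normal(3)

# phi(u) = u^T u, so grad_u phi = 2u (an m-vector)
u = A @ x
grad_phi = 2 * u

# Rule 3: grad_x f = J_g^T @ grad_phi, an (n x m)(m x 1) = n x 1 vector
grad_x = A.T @ grad_phi

# Closed form for f(x) = ||Ax||^2 is 2 A^T A x
assert np.allclose(grad_x, 2 * A.T @ A @ x)
assert grad_x.shape == (3,)
```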
Common Building Blocks
$\nabla_x (a^T x) = a$
$\nabla_x (x^T x) = 2x$
$\nabla_x (x^T A x) = (A + A^T)x$
$\nabla_x \|Ax - b\|^2 = 2A^T(Ax-b)$
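All four building blocks can be verified against central differences in one pass (a sketch; the dimension $n$ and the random test data are arbitrary):

```python
import numpy as np

def fd_grad(f, x, h=1e-6):
    """Central-difference approximation of the gradient of a scalar function f at x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

rng = np.random.default_rng(1)
n = 4
a = rng.standard_normal(n)
b = rng.standard_normal(n)
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

assert np.allclose(fd_grad(lambda x: a @ x, x), a, atol=1e-6)
assert np.allclose(fd_grad(lambda x: x @ x, x), 2 * x, atol=1e-6)
assert np.allclose(fd_grad(lambda x: x @ A @ x, x), (A + A.T) @ x, atol=1e-5)
assert np.allclose(fd_grad(lambda x: np.sum((A @ x - b) ** 2), x),
                   2 * A.T @ (A @ x - b), atol=1e-5)
```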

Detailed Example: $f(x) = e^{\ln(x)^T A \ln(x)}$

Here $\ln(x)$ is applied elementwise, $A$ is a fixed $n \times n$ matrix, and $x > 0$ componentwise. Let's build this up layer by layer.

Step 1: Define intermediate variables
Let $u = \ln(x)$ (elementwise), $\;q = u^T A u$ (quadratic form), $\;f = e^q$.
Step 2: Outermost derivative
$\frac{df}{dq} = e^q$
Step 3: Quadratic form derivative
$\frac{\partial q}{\partial u} = (A + A^T) u$
If $A$ is symmetric, this simplifies to $2Au$.
Step 4: Elementwise log derivative
$\frac{\partial u}{\partial x} = \text{diag}(1/x_1, \ldots, 1/x_n)$
This is a diagonal Jacobian since each $u_i = \ln(x_i)$ depends only on $x_i$.
Step 5: Chain rule assembly
$$\nabla_x f = \frac{\partial u}{\partial x}^T \frac{\partial q}{\partial u} \frac{df}{dq} = \text{diag}(1/x) \cdot (A + A^T)\ln(x) \cdot e^{\ln(x)^T A \ln(x)}$$
Step 6: Simplify
$$\nabla_x f(x) = \frac{e^{\ln(x)^T A \ln(x)}}{x} \odot \left[(A + A^T)\ln(x)\right]$$ where $\frac{1}{x}$ and $\odot$ are elementwise. Component $i$: $\;\frac{e^q}{x_i} [(A + A^T)\ln(x)]_i$.
Step 7: Dimension check
$\text{diag}(1/x)$: $n \times n$. $(A+A^T)\ln(x)$: $n \times 1$. $e^q$: scalar. Product: $n \times 1$. ✔
FD Check: $f(x) = e^{\ln(x)^T A \ln(x)}$
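The Step 6 formula can be checked numerically. The sketch below draws a random $A$ and a random $x > 0$ (both arbitrary test data) and compares the analytic gradient against central differences:

```python
import numpy as np

def fd_grad(f, x, h=1e-6):
    """Central-difference approximation of the gradient of a scalar function f at x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

rng = np.random.default_rng(2)
n = 3
A = rng.standard_normal((n, n))
x = rng.uniform(0.5, 2.0, n)      # x > 0 componentwise so ln(x) is defined

f = lambda x: np.exp(np.log(x) @ A @ np.log(x))

# Step 6: grad f = (e^q / x) ⊙ (A + A^T) ln(x), elementwise division by x
u = np.log(x)
analytic = np.exp(u @ A @ u) / x * ((A + A.T) @ u)

assert np.allclose(fd_grad(f, x), analytic, rtol=1e-4, atol=1e-6)
```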

Matrix Derivatives: $f(X) = \ln(\det(X))$

Now consider a scalar function of a matrix. For a positive definite $X$:

$$f(X) = \ln(\det(X))$$
Step 1: Recall the identity
$\det(X + \epsilon E_{ij}) = \det(X)(1 + \epsilon [X^{-1}]_{ji} + O(\epsilon^2))$
where $E_{ij}$ is the matrix with 1 in position $(i,j)$ and 0 elsewhere.
Step 2: Apply log
$\ln\det(X + \epsilon E_{ij}) = \ln\det(X) + \ln(1 + \epsilon [X^{-1}]_{ji} + \cdots) = \ln\det(X) + \epsilon [X^{-1}]_{ji} + O(\epsilon^2)$
Step 3: Read off the derivative
$\frac{\partial f}{\partial X_{ij}} = [X^{-1}]_{ji} = [X^{-T}]_{ij}$
Result
$$\frac{\partial \ln\det(X)}{\partial X} = X^{-T}$$ For symmetric $X$, this simplifies to $X^{-1}$.
FD Check: $\ln(\det(X))$
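The matrix result can be checked entry by entry: perturb each $X_{ij}$ in turn and compare against $X^{-T}$. A minimal sketch (the positive definite test matrix is constructed as $BB^T + nI$, an arbitrary choice):

```python
import numpy as np

def fd_grad_matrix(f, X, h=1e-6):
    """Central-difference gradient of a scalar function of a matrix, entry by entry."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = h
            G[i, j] = (f(X + E) - f(X - E)) / (2 * h)
    return G

rng = np.random.default_rng(3)
n = 3
B = rng.standard_normal((n, n))
X = B @ B.T + n * np.eye(n)       # positive definite, so det(X) > 0

f = lambda X: np.log(np.linalg.det(X))
analytic = np.linalg.inv(X).T     # X^{-T}; equals X^{-1} here since X is symmetric

assert np.allclose(fd_grad_matrix(f, X), analytic, atol=1e-6)
```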

Derivative Workshop

Pick a function from the presets, or type your own expression. See the gradient (analytic for presets, finite-difference for custom), step-by-step derivation, and a numerical check.
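The finite-difference check behind such a workshop can be sketched as a utility that compares an analytic gradient against central differences and reports the worst relative error (a minimal sketch, not the widget's actual implementation; the `check_gradient` name and the $x^T x$ usage example are illustrative):

```python
import numpy as np

def check_gradient(f, grad, x, h=1e-6):
    """Compare an analytic gradient against central differences.

    Returns the maximum relative error over components."""
    fd = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        fd[i] = (f(x + e) - f(x - e)) / (2 * h)
    g = grad(x)
    return np.max(np.abs(fd - g) / (np.abs(g) + 1e-12))

# Usage: check f(x) = x^T x against its gradient 2x
x = np.array([1.0, -2.0, 3.0])
err = check_gradient(lambda x: x @ x, lambda x: 2 * x, x)
assert err < 1e-6
```

A relative error around $10^{-7}$ or better is a pass; errors near $10^{-2}$ or worse usually mean the analytic gradient is wrong, not the finite differences.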


Cheat Sheet

| Function $f(x)$ | Gradient $\nabla f(x)$ |
|---|---|
| $a^T x$ | $a$ |
| $x^T x$ | $2x$ |
| $x^T A x$ ($A$ symmetric) | $2Ax$ |
| $x^T A x$ ($A$ general) | $(A + A^T)x$ |
| $\lVert Ax - b \rVert^2$ | $2A^T(Ax - b)$ |
| $\phi(g(x))$ | $\phi'(g(x)) \nabla g(x)$ |
| $\ln\det(X)$ | $X^{-T}$ |

Next: Automatic Differentiation →