Part 2: Matrix Calculus Rules | Matrix Calculus & AD

Example 1: A Simple Product

Let's start with something easy. Consider $f(x) = x_1 x_2 x_3$ where $x \in \mathbb{R}^3$.

The partial derivatives are straightforward:

$$\frac{\partial f}{\partial x_1} = x_2 x_3, \quad \frac{\partial f}{\partial x_2} = x_1 x_3, \quad \frac{\partial f}{\partial x_3} = x_1 x_2$$

So the gradient is:

$$\nabla f(x) = \begin{bmatrix} x_2 x_3 \\ x_1 x_3 \\ x_1 x_2 \end{bmatrix}$$

Notice the pattern: each partial derivative is the product of all variables except the one you're differentiating with respect to.
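The pattern can be checked numerically. The sketch below uses plain Python with no external libraries and compares the analytic gradient against a central-difference approximation. The helper `numerical_gradient`, the step size `h`, and the test point `x0` are illustrative choices, not part of the derivation above.

```python
def f(x):
    """f(x) = x1 * x2 * x3."""
    return x[0] * x[1] * x[2]


def grad_f(x):
    """Analytic gradient: each entry is the product of the other two variables."""
    return [x[1] * x[2], x[0] * x[2], x[0] * x[1]]


def numerical_gradient(func, x, h=1e-6):
    """Central-difference approximation of the gradient of func at x."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((func(x_plus) - func(x_minus)) / (2 * h))
    return grad


x0 = [2.0, 3.0, 5.0]
analytic = grad_f(x0)               # [3*5, 2*5, 2*3] = [15.0, 10.0, 6.0]
numeric = numerical_gradient(f, x0)
max_error = max(abs(a - n) for a, n in zip(analytic, numeric))
print(analytic, max_error)
```

Because $f$ is linear in each variable separately, the central difference agrees with the analytic gradient up to floating-point rounding.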

Example 2: Composition with a Power

Now let $f(x) = (x_1 x_2 x_3)^3$. Let $p = x_1 x_2 x_3$, so $f = p^3$.

Step 1: Outer derivative

$\frac{df}{dp} = 3p^2 = 3(x_1 x_2 x_3)^2$

Step 2: Inner derivative

$\frac{\partial p}{\partial x_i} = \prod_{j \neq i} x_j$, the product of all $x_j$ except $x_i$ (same as Example 1)

Step 3: Chain rule

$\frac{\partial f}{\partial x_i} = \frac{df}{dp} \cdot \frac{\partial p}{\partial x_i} = 3(x_1 x_2 x_3)^2 \prod_{j \neq i} x_j$

When $x_i \neq 0$, this can also be written as $\frac{3(x_1 x_2 x_3)^3}{x_i}$.

Result

$$\nabla f(x) = 3(x_1 x_2 x_3)^2 \begin{bmatrix} x_2 x_3 \\ x_1 x_3 \\ x_1 x_2 \end{bmatrix}$$
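The three steps map directly onto code. The following is a minimal sketch in plain Python: the function names (`p`, `grad_p`, `grad_f`) and the test point `x0` are chosen here for illustration only. It builds the gradient as outer derivative times inner gradient and compares it with a central-difference estimate.

```python
def p(x):
    """Inner function p(x) = x1 * x2 * x3."""
    return x[0] * x[1] * x[2]


def grad_p(x):
    """Inner gradient from Example 1."""
    return [x[1] * x[2], x[0] * x[2], x[0] * x[1]]


def f(x):
    """Outer function applied to p: f = p^3."""
    return p(x) ** 3


def grad_f(x):
    """Chain rule: (df/dp) * grad p, with df/dp = 3 p^2."""
    outer = 3 * p(x) ** 2
    return [outer * g for g in grad_p(x)]


def numerical_gradient(func, x, h=1e-6):
    """Central-difference approximation of the gradient of func at x."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((func(x_plus) - func(x_minus)) / (2 * h))
    return grad


x0 = [1.0, 2.0, 0.5]                # p(x0) = 1.0
analytic = grad_f(x0)               # 3 * 1^2 * [1.0, 0.5, 2.0] = [3.0, 1.5, 6.0]
numeric = numerical_gradient(f, x0)
max_error = max(abs(a - n) for a, n in zip(analytic, numeric))
print(analytic, max_error)
```

Writing the inner gradient as a product rather than as $p / x_i$ keeps the code valid even when some $x_i$ is zero.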

The Essential Rules

Most gradient computations boil down to a handful of rules applied repeatedly.

Rule 1: Linearity

$$\nabla_x [\alpha f(x) + \beta g(x)] = \alpha \nabla f(x) + \beta \nabla g(x)$$

Rule 2: Chain Rule (scalar composition)

If $f(x) = \phi(g(x))$ where $g: \mathbb{R}^n \to \mathbb{R}$ and $\phi: \mathbb{R} \to \mathbb{R}$, then: $$\nabla f(x) = \phi'(g(x)) \cdot \nabla g(x)$$

Dimensions: $(\text{scalar}) \cdot (n \times 1) = n \times 1$.
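As a quick worked instance of Rule 2 (this particular choice of $g$ and $\phi$ is only an illustration), take $g(x) = a^T x$ and $\phi(t) = e^t$, so that $f(x) = e^{a^T x}$. Since $\partial (a^T x) / \partial x_i = a_i$, we have $\nabla g(x) = a$, and $\phi'(t) = e^t$, which gives

$$\nabla f(x) = e^{a^T x} \, a$$

This is a scalar times an $n \times 1$ vector, as the dimension count requires.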

Rule 3: Chain Rule (vector composition)

If $f(x) = \phi(g(x))$ where $g: \mathbb{R}^n \to \mathbb{R}^m$ and $\phi: \mathbb{R}^m \to \mathbb{R}$, then: $$\nabla_x f = J_g(x)^T \nabla_g \phi$$

Dimensions: the Jacobian $J_g$ is $m \times n$ and $\nabla_g \phi$ (the gradient of $\phi$ evaluated at $g(x)$) is $m \times 1$, so $(m \times n)^T (m \times 1) = (n \times m)(m \times 1) = n \times 1$.
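To make Rule 3 concrete, here is a small sketch with $n = 2$ and $m = 3$. The functions `g` and `phi` below are made up for this example and do not appear elsewhere in the text. The gradient is assembled as $J_g^T \nabla_g \phi$ using plain Python lists.

```python
def g(x):
    """Hypothetical inner map g: R^2 -> R^3."""
    return [x[0] * x[1], x[0] + x[1], x[0] ** 2]


def jacobian_g(x):
    """Jacobian of g, shape 3 x 2: row k holds the partials of g_k."""
    return [
        [x[1], x[0]],
        [1.0, 1.0],
        [2 * x[0], 0.0],
    ]


def phi(y):
    """Hypothetical outer function phi: R^3 -> R."""
    return y[0] + y[1] * y[2]


def grad_phi(y):
    """Gradient of phi with respect to y, length 3."""
    return [1.0, y[2], y[1]]


def jacobian_transpose_times(J, v):
    """Compute J^T v for an m x n matrix J and a length-m vector v."""
    m, n = len(J), len(J[0])
    return [sum(J[k][i] * v[k] for k in range(m)) for i in range(n)]


x0 = [1.0, 2.0]
y0 = g(x0)                                    # [2.0, 3.0, 1.0]
grad = jacobian_transpose_times(jacobian_g(x0), grad_phi(y0))
print(grad)                                   # [9.0, 2.0]
```

Expanding by hand gives $f(x) = x_1 x_2 + x_1^3 + x_1^2 x_2$, whose partial derivatives at $(1, 2)$ are $x_2 + 3x_1^2 + 2x_1 x_2 = 9$ and $x_1 + x_1^2 = 2$, matching the result.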

Common Building Blocks

$\nabla_x (a^T x) = a$

$\nabla_x (x^T x) = 2x$

$\nabla_x (x^T A x) = (A + A^T)x$

$\nabla_x \|Ax - b\|^2 = 2A^T(Ax-b)$
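Two of these identities are easy to get wrong when $A$ is not symmetric, so a numerical check is useful. The sketch below is plain Python with small helper functions; the matrix `A`, vector `b`, and point `x0` are arbitrary values chosen for illustration. It evaluates $(A + A^T)x$ and $2A^T(Ax - b)$ and compares them with central differences.

```python
A = [[1.0, 2.0], [3.0, 4.0]]    # deliberately non-symmetric
b = [0.0, 1.0]


def matvec(M, v):
    """Matrix-vector product M v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def quad_form(x):
    """x^T A x."""
    return sum(x_i * y_i for x_i, y_i in zip(x, matvec(A, x)))


def least_squares(x):
    """||A x - b||^2."""
    r = [y_i - b_i for y_i, b_i in zip(matvec(A, x), b)]
    return sum(r_i * r_i for r_i in r)


def grad_quad_form(x):
    """(A + A^T) x."""
    n = len(A)
    A_plus_At = [[A[i][j] + A[j][i] for j in range(n)] for i in range(n)]
    return matvec(A_plus_At, x)


def grad_least_squares(x):
    """2 A^T (A x - b)."""
    r = [y_i - b_i for y_i, b_i in zip(matvec(A, x), b)]
    return [2 * g_i for g_i in matvec(transpose(A), r)]


def numerical_gradient(func, x, h=1e-6):
    """Central-difference approximation of the gradient of func at x."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((func(x_plus) - func(x_minus)) / (2 * h))
    return grad


x0 = [1.0, -1.0]
g_quad = grad_quad_form(x0)         # [-3.0, -3.0]
g_ls = grad_least_squares(x0)       # [-14.0, -20.0]
err_quad = max(abs(a - n) for a, n in zip(g_quad, numerical_gradient(quad_form, x0)))
err_ls = max(abs(a - n) for a, n in zip(g_ls, numerical_gradient(least_squares, x0)))
print(g_quad, g_ls, err_quad, err_ls)
```

With this `A`, the shortcut $2Ax$ would give $[-2, -2]$ instead of $[-3, -3]$, which is why the symmetric form should only be used when $A = A^T$.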

Detailed Example: $f(x) = e^{\ln(x)^T A \ln(x)}$

Here $\ln(x)$ is applied elementwise, $A$ is a fixed $n \times n$ matrix, and $x > 0$ componentwise. Let's build this up layer by layer.

Step 1: Define intermediate variables

Let $u = \ln(x)$ (elementwise), $\;q = u^T A u$ (quadratic form), $\;f = e^q$.

Step 2: Outermost derivative

$\frac{df}{dq} = e^q$

Step 3: Quadratic form derivative

$\frac{\partial q}{\partial u} = (A + A^T) u$

If $A$ is symmetric, this simplifies to $2Au$.

Step 4: Elementwise log derivative

$\frac{\partial u}{\partial x} = \text{diag}(1/x_1, \ldots, 1/x_n)$

This is a diagonal Jacobian since each $u_i = \ln(x_i)$ depends only on $x_i$.

Step 5: Chain rule assembly

$$\nabla_x f = \left(\frac{\partial u}{\partial x}\right)^T \frac{\partial q}{\partial u} \, \frac{df}{dq} = \text{diag}(1/x) \cdot (A + A^T)\ln(x) \cdot e^{\ln(x)^T A \ln(x)}$$

Step 6: Simplify

$$\nabla_x f(x) = \frac{e^{\ln(x)^T A \ln(x)}}{x} \odot \left[(A + A^T)\ln(x)\right]$$ where $\frac{1}{x}$ and $\odot$ are elementwise. Component $i$: $\;\frac{e^q}{x_i} [(A + A^T)\ln(x)]_i$.

Step 7: Dimension check

$\text{diag}(1/x)$: $n \times n$. $(A+A^T)\ln(x)$: $n \times 1$. $e^q$: scalar. Product: $n \times 1$. ✔
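The layered derivation can be sketched in code by following the same intermediate variables $u$, $q$, and $f$. This is a minimal illustration in plain Python: the matrix `A` and the point `x0` are arbitrary assumptions (with `x0 > 0` componentwise), and the result is compared against a central-difference estimate rather than a known reference value.

```python
import math

A = [[0.5, 0.2], [-0.1, 0.3]]   # arbitrary, non-symmetric


def quadratic(u):
    """q = u^T A u."""
    n = len(u)
    return sum(u[i] * A[i][j] * u[j] for i in range(n) for j in range(n))


def f(x):
    """f(x) = exp(ln(x)^T A ln(x)), with ln applied elementwise."""
    u = [math.log(x_i) for x_i in x]
    return math.exp(quadratic(u))


def grad_f(x):
    """Gradient from Step 6: component i is (e^q / x_i) * [(A + A^T) ln(x)]_i."""
    n = len(x)
    u = [math.log(x_i) for x_i in x]                  # Step 1
    df_dq = math.exp(quadratic(u))                    # Step 2
    dq_du = [sum((A[i][j] + A[j][i]) * u[j] for j in range(n))
             for i in range(n)]                       # Step 3
    # Step 4 and 5: the diagonal Jacobian diag(1/x) scales entry i by 1/x_i.
    return [df_dq * dq_du[i] / x[i] for i in range(n)]


def numerical_gradient(func, x, h=1e-6):
    """Central-difference approximation of the gradient of func at x."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((func(x_plus) - func(x_minus)) / (2 * h))
    return grad


x0 = [1.5, 2.0]
analytic = grad_f(x0)
numeric = numerical_gradient(f, x0)
max_error = max(abs(a - n) for a, n in zip(analytic, numeric))
print(analytic, numeric, max_error)
```

Multiplying by a diagonal matrix is implemented as an elementwise division by `x[i]`, which is the same simplification made in Step 6.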

Matrix Derivatives: $f(X) = \ln(\det(X))$

Now consider a scalar function of a matrix. For a positive definite $X$ (so that $\det(X) > 0$ and the logarithm is defined):

$$f(X) = \ln(\det(X))$$

Step 1: Recall the identity

$\det(X + \epsilon E_{ij}) = \det(X)(1 + \epsilon [X^{-1}]_{ji} + O(\epsilon^2))$

where $E_{ij}$ is the matrix with 1 in position $(i,j)$ and 0 elsewhere.

Step 2: Apply log

$\ln\det(X + \epsilon E_{ij}) = \ln\det(X) + \ln(1 + \epsilon [X^{-1}]_{ji} + \cdots) = \ln\det(X) + \epsilon [X^{-1}]_{ji} + O(\epsilon^2)$

Step 3: Read off the derivative

$\frac{\partial f}{\partial X_{ij}} = [X^{-1}]_{ji} = [X^{-T}]_{ij}$

Result

$$\frac{\partial \ln\det(X)}{\partial X} = X^{-T}$$ For symmetric $X$, this simplifies to $X^{-1}$.
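The transpose in this result is easy to overlook, so the sketch below checks it entry by entry on a $2 \times 2$ matrix. As an assumption made for illustration, the test matrix `X` is non-symmetric with positive determinant rather than positive definite, so that $X^{-T}$ and $X^{-1}$ actually differ. The closed-form $2 \times 2$ inverse keeps the example free of external libraries.

```python
import math


def det2(X):
    """Determinant of a 2 x 2 matrix."""
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]


def logdet(X):
    return math.log(det2(X))


def inverse_transpose2(X):
    """X^{-T} for a 2 x 2 matrix, from the closed-form inverse."""
    d = det2(X)
    # X^{-1} = (1/d) [[X11, -X01], [-X10, X00]]; transposing swaps the off-diagonals.
    return [[X[1][1] / d, -X[1][0] / d],
            [-X[0][1] / d, X[0][0] / d]]


def numerical_matrix_gradient(func, X, h=1e-6):
    """Central difference of func with respect to each entry X[i][j]."""
    grad = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            X_plus = [row[:] for row in X]
            X_minus = [row[:] for row in X]
            X_plus[i][j] += h
            X_minus[i][j] -= h
            grad[i][j] = (func(X_plus) - func(X_minus)) / (2 * h)
    return grad


X = [[2.0, 1.0], [0.5, 3.0]]        # det = 5.5 > 0, not symmetric
analytic = inverse_transpose2(X)
numeric = numerical_matrix_gradient(logdet, X)
max_error = max(abs(analytic[i][j] - numeric[i][j])
                for i in range(2) for j in range(2))
print(analytic, max_error)
```

For this `X`, the entry $\partial f / \partial X_{12}$ equals $-X_{21} / \det(X) = -0.5 / 5.5$, which is the $(1,2)$ entry of $X^{-T}$ and not of $X^{-1}$.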

Cheat Sheet

Function $f(x)$	Gradient $\nabla f(x)$
$a^T x$	$a$
$x^T x$	$2x$
$x^T A x$ ($A$ symmetric)	$2Ax$
$x^T A x$ ($A$ general)	$(A + A^T)x$
$\|Ax - b\|^2$	$2A^T(Ax - b)$
$\phi(g(x))$	$\phi'(g(x)) \nabla g(x)$
$\ln\det(X)$	$X^{-T}$