Automatic Differentiation¶

The automatic differentiation packages in Julia are quite sophisticated. They implement most of the standard chain rules from these documents by manipulating the syntax of your Julia functions.

This means you should keep your Julia function syntax as clean and simple as possible.

The package you want is Zygote, and the functions are gradient, jacobian, and hessian.
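As a quick preview, here is a minimal sketch of the three entry points on a toy function (my addition; it assumes Zygote is already installed, which the next cell does):

using Zygote
g(x) = sum(x.^2)                  # scalar-valued function of a vector
gradient(g, [1.0, 2.0])           # -> ([2.0, 4.0],)
jacobian(x -> x.^2, [1.0, 2.0])   # -> ([2.0 0.0; 0.0 4.0],) for a vector-valued function
hessian(g, [1.0, 2.0])            # -> [2.0 0.0; 0.0 2.0]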

In [1]:
#using Pkg; Pkg.add("Zygote")
    Updating registry at `~/.julia/registries/General`
    Updating git-repo `https://github.com/JuliaRegistries/General.git`
   Resolving package versions...
   Installed ZygoteRules ─── v0.2.3
   Installed LLVMExtra_jll ─ v0.0.18+0
   Installed IRTools ─────── v0.4.9
   Installed GPUArrays ───── v8.6.2
   Installed Zygote ──────── v0.6.55
   Installed ChainRules ──── v1.48.0
   Installed LLVM ────────── v4.17.1
    Updating `~/.julia/environments/v1.8/Project.toml`
⌃ [e88e6eb3] + Zygote v0.6.55
    Updating `~/.julia/environments/v1.8/Manifest.toml`
  [082447d4] + ChainRules v1.48.0
⌃ [0c68f7d7] + GPUArrays v8.6.2
  [7869d1d1] + IRTools v0.4.9
⌅ [929cbde3] + LLVM v4.17.1
⌃ [e88e6eb3] + Zygote v0.6.55
  [700de1a5] + ZygoteRules v0.2.3
⌅ [dad2f222] + LLVMExtra_jll v0.0.18+0
        Info Packages marked with ⌃ and ⌅ have new versions available, but those with ⌅ are restricted by compatibility constraints from upgrading. To see why use `status --outdated -m`
Precompiling project...
  ✓ LLVMExtra_jll
  ✓ ZygoteRules
  ✓ IRTools
  ✓ LLVM
  ✓ ChainRules
  ✓ GPUArrays
  ✓ Zygote
  7 dependencies successfully precompiled in 8 seconds. 411 already precompiled. 5 skipped during auto due to previous errors.

ChatGPT wrote the following code for me in response to a prompt about using Julia to compute a gradient automatically.

In [92]:
using Zygote

function f(alpha, sigma)
    sum(exp.(-(alpha.^2) ./ sigma))
end

# Compute the gradient of f with respect to alpha and sigma
gradient_f(alpha, sigma) = gradient(f, alpha, sigma)

# Test the gradient_f function
alpha_test = 1.0
sigma_test = 2.0

grad_alpha, grad_sigma = gradient_f(alpha_test, sigma_test)

println("Gradient with respect to alpha: ", grad_alpha)
println("Gradient with respect to sigma: ", grad_sigma)
Gradient with respect to alpha: -0.6065306597126334
Gradient with respect to sigma: 0.15163266492815836
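As a sanity check (my addition), the partial derivatives of exp(-alpha^2/sigma) are -2*alpha/sigma * exp(-alpha^2/sigma) with respect to alpha and alpha^2/sigma^2 * exp(-alpha^2/sigma) with respect to sigma; evaluating them at the test point reproduces the numbers above:

# analytic partials of exp(-alpha^2/sigma) at alpha = 1.0, sigma = 2.0
(-2*alpha_test/sigma_test*exp(-alpha_test^2/sigma_test),
  alpha_test^2/sigma_test^2*exp(-alpha_test^2/sigma_test))
# ≈ (-0.60653, 0.15163), matching the Zygote output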
In [93]:
#alpha_test = [1.0,2.0]
#sigma_test = 2.0
using LinearAlgebra
"""
    finite_difference_gradient(f, x)

"""
function finite_difference_gradient(f, x; h=nothing)
    fx = f(x) # computed function 
    
    if h === nothing 
        h = sqrt(eps(fx))
    end 
    xi = copy(x)
    gxd = copy(x) 
    for i=1:length(x)
        xi[i] += h
        gxd[i] = (f(xi) - fx)/h
        xi[i] = x[i] # reset
    end
    return gxd 
end 
"""
    gradient_check(f, x, g)

Check that g seems to express the gradient of f by comparing
values computed with finite differences. This is based on the 
gradientcheck function in the Poblano Matlab toolbox.
"""
function gradient_check(f, x, g)
    gx = g(x) # putative gradient
    
    gxd = finite_difference_gradient(f, x)
        
    absdiff = abs.(gxd .- gx)
    relabsdiff = abs.(gxd .- gx) ./ max.(abs.(gx), 1.0)
    
    return (g=gx, gfd=gxd, maxdiff=maximum(absdiff), 
            maxreldiff = maximum(relabsdiff),            
            normdiff=norm(gxd - gx))
end
gradient_check(x -> x'*x, randn(10), x -> 2*x).maxdiff
Out[93]:
4.779139217703232e-8
In [107]:
sigmas = rand(10)
gradient_check(x -> f(x, sigmas), randn(10), 
               x -> gradient_f(x, sigmas)[1]).maxdiff
Out[107]:
6.129271010735238e-7
In [110]:
using LinearAlgebra
function logdet(X)   # note: this defines our own logdet (LinearAlgebra exports one too)
    log(det(X))
end
gradlogdet(X) = gradient(logdet, X)
gradlogdet(triu(rand(10,10)))
Out[110]:
([1.124998654396587 0.0 … 0.0 0.0; -0.5326680961029698 1.36256303520933 … 0.0 0.0; … ; 2.568782278236005 6.726872059783083 … 4.9216580080856795 0.0; 3.3035474626954975 8.633931237814297 … -1.0912479621173081 1.2954401924752859],)
In [123]:
gradient_check(x->logdet(x), triu(rand(10,10)), 
               x->gradlogdet(x)[1]).maxdiff
Out[123]:
0.0007923966184364417
In [124]:
using StableRNGs
X = triu(rand(StableRNG(3), 10,10))
gradlogdet(X)[1]
Out[124]:
10×10 Matrix{Float64}:
     2.73632     0.0          0.0         …   0.0        0.0       0.0
   -13.8578     25.0635       0.0             0.0        0.0       0.0
    68.0617   -155.79         9.86121         0.0        0.0       0.0
     6.44452   -40.3972      -3.2654          0.0        0.0       0.0
   -13.8235     45.325       -1.71273         0.0        0.0       0.0
  -135.885     313.995      -13.7         …   0.0        0.0       0.0
    69.0426   -149.992        4.37892         0.0        0.0       0.0
    -3.49697    -0.778595    -0.00974034      1.20345    0.0       0.0
   395.881    -751.759       48.3518         -6.06897    6.34153   0.0
 -4568.44     8938.47      -563.165          54.0732   -68.5513   11.2178
The gradient of log det X is known in closed form: it is the transpose of the inverse matrix, so inv(X)' should match the Zygote result above.

In [125]:
inv(X)'
Out[125]:
10×10 adjoint(::Matrix{Float64}) with eltype Float64:
     2.73632     0.0          0.0         …   0.0        0.0       0.0
   -13.8578     25.0635       0.0             0.0        0.0       0.0
    68.0617   -155.79         9.86121         0.0        0.0       0.0
     6.44452   -40.3972      -3.2654          0.0        0.0       0.0
   -13.8235     45.325       -1.71273         0.0        0.0       0.0
  -135.885     313.995      -13.7         …   0.0        0.0       0.0
    69.0426   -149.992        4.37892         0.0        0.0       0.0
    -3.49697    -0.778595    -0.00974034      1.20345    0.0       0.0
   395.881    -751.759       48.3518         -6.06897    6.34153   0.0
 -4568.44     8938.47      -563.165          54.0732   -68.5513   11.2178
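A quick numeric check (my addition) that the two matrices agree:

maximum(abs.(gradlogdet(X)[1] - inv(X)'))   # should be tiny relative to the entries above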
In [127]:
gradient_check(x->logdet(x), X, 
               x->gradlogdet(x)[1]).maxdiff
Out[127]:
1.683266373760489
That absolute difference looks large, but the gradient entries themselves are enormous here, so the relative difference is the more meaningful check.

In [129]:
gradient_check(x->logdet(x), X, 
               x->gradlogdet(x)[1]).maxreldiff
Out[129]:
0.00018831696676604022
These are the denominators gradient_check uses for the relative difference: the gradient magnitudes, floored at 1.

In [131]:
max.(abs.(inv(X)'), 1.0)
Out[131]:
10×10 Matrix{Float64}:
    2.73632     1.0       1.0        1.0      …   1.0       1.0       1.0
   13.8578     25.0635    1.0        1.0          1.0       1.0       1.0
   68.0617    155.79      9.86121    1.0          1.0       1.0       1.0
    6.44452    40.3972    3.2654     8.45944      1.0       1.0       1.0
   13.8235     45.325     1.71273    3.14934      1.0       1.0       1.0
  135.885     313.995    13.7        5.15814  …   1.0       1.0       1.0
   69.0426    149.992     4.37892    2.35577      1.0       1.0       1.0
    3.49697     1.0       1.0        1.3294       1.20345   1.0       1.0
  395.881     751.759    48.3518    30.1474       6.06897   6.34153   1.0
 4568.44     8938.47    563.165    291.629       54.0732   68.5513   11.2178
In [8]:
using LinearAlgebra, SparseArrays
function pagerank(a,P,v)
    (I-a*P)\((1-a)*v)   # solve (I - a*P) x = (1 - a) v
end
n = 5
P = Matrix(sparse(1:n, mod.(1:n, n) .+ 1, 1.0, n, n)) # make a cycle matrix
P[1,2] = 0.5
P[1,3] = 0.5
P = P'
prgrad(a) = gradient(pagerank, a, P, ones(n)/5)
Out[8]:
prgrad (generic function with 1 method)
In [9]:
gradient_check(x->pagerank(x,P,ones(n)/5), 0.85, 
               x->prgrad(x)[1]).maxdiff
Output is an array, so the gradient is not defined. Perhaps you wanted jacobian.

Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] sensitivity(y::Vector{Float64})
   @ Zygote ~/.julia/packages/Zygote/HdT4O/src/compiler/interface.jl:119
 [3] gradient(::Function, ::Float64, ::Vararg{Any})
   @ Zygote ~/.julia/packages/Zygote/HdT4O/src/compiler/interface.jl:154
 [4] prgrad(a::Float64)
   @ Main ./In[8]:10
 [5] #14
   @ ./In[9]:2 [inlined]
 [6] gradient_check(f::var"#13#15", x::Float64, g::var"#14#16")
   @ Main ./In[2]:13
 [7] top-level scope
   @ In[9]:1
As the error message suggests, pagerank returns a vector, so its derivative with respect to a is a vector and we need jacobian rather than gradient.

In [10]:
using LinearAlgebra, SparseArrays
function pagerank(a,P,v)
    (I-a*P)\((1-a)*v)
end
n = 5
P = Matrix(sparse(1:n, mod.(1:n, n) .+ 1, 1.0, n, n)) # make a cycle matrix
P[1,2] = 0.5
P[1,3] = 0.5
P = P'
prgrad(a) = jacobian(pagerank, a, P, ones(n)/5)
Out[10]:
prgrad (generic function with 1 method)
In [58]:
[(pagerank(0.75+sqrt(eps(1.0)),P,ones(n)/5) - 
    pagerank(0.75,P,ones(n)/5))/sqrt(eps(1.0))  prgrad(0.75)[1]]
Out[58]:
5×2 Matrix{Float64}:
  0.0393128   0.0393128
 -0.0797884  -0.0797884
 -0.0105278  -0.0105278
  0.0180326   0.0180326
  0.0329708   0.0329708
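We can also derive this derivative by hand (my derivation, not in the original): differentiating (I - a*P)*x(a) = (1-a)*v with respect to a gives (I - a*P)*x'(a) = P*x(a) - v, so:

xpr = pagerank(0.75, P, ones(n)/5)
(I - 0.75*P) \ (P*xpr - ones(n)/5)   # should match both columns above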
Zygote's hessian expects a function of a single vector argument, so f2 below packs alpha and sigma into one array. The call that follows still hands the two-argument f to hessian, however, which is what triggers the MethodError.

In [30]:
function f2(x) 
    n = length(x) ÷ 2
    alpha = x[1:n]
    sigma = x[n+1:end]
    f(alpha, sigma)
end 
hessian_f(alpha,sigma) = hessian(f, [alpha, sigma])
hessian_f(alpha_test, sigma_test)
MethodError: no method matching f(::Vector{ForwardDiff.Dual{Nothing, Float64, 2}})
Closest candidates are:
  f(::Any, ::Any) at In[5]:3

Stacktrace:
  [1] macro expansion
    @ ~/.julia/packages/Zygote/g2w9o/src/compiler/interface2.jl:0 [inlined]
  [2] _pullback(ctx::Zygote.Context{false}, f::typeof(f), args::Vector{ForwardDiff.Dual{Nothing, Float64, 2}})
    @ Zygote ~/.julia/packages/Zygote/g2w9o/src/compiler/interface2.jl:9
  [3] pullback(f::Function, cx::Zygote.Context{false}, args::Vector{ForwardDiff.Dual{Nothing, Float64, 2}})
    @ Zygote ~/.julia/packages/Zygote/g2w9o/src/compiler/interface.jl:44
  [4] pullback
    @ ~/.julia/packages/Zygote/g2w9o/src/compiler/interface.jl:42 [inlined]
  [5] gradient(f::Function, args::Vector{ForwardDiff.Dual{Nothing, Float64, 2}})
    @ Zygote ~/.julia/packages/Zygote/g2w9o/src/compiler/interface.jl:96
  [6] (::Zygote.var"#114#115"{typeof(f)})(x::Vector{ForwardDiff.Dual{Nothing, Float64, 2}})
    @ Zygote ~/.julia/packages/Zygote/g2w9o/src/lib/grad.jl:64
  [7] forward_jacobian(f::Zygote.var"#114#115"{typeof(f)}, x::Vector{Float64}, #unused#::Val{2})
    @ Zygote ~/.julia/packages/Zygote/g2w9o/src/lib/forward.jl:29
  [8] forward_jacobian(f::Function, x::Vector{Float64}; chunk_threshold::Int64)
    @ Zygote ~/.julia/packages/Zygote/g2w9o/src/lib/forward.jl:44
  [9] forward_jacobian
    @ ~/.julia/packages/Zygote/g2w9o/src/lib/forward.jl:42 [inlined]
 [10] hessian_dual
    @ ~/.julia/packages/Zygote/g2w9o/src/lib/grad.jl:64 [inlined]
 [11] hessian
    @ ~/.julia/packages/Zygote/g2w9o/src/lib/grad.jl:62 [inlined]
 [12] hessian_f(alpha::Float64, sigma::Float64)
    @ Main ./In[30]:7
 [13] top-level scope
    @ In[30]:8
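Presumably the call was meant to use the wrapper f2 rather than f itself; a minimal sketch of that fix (my guess at the intent, not run in the original notebook):

hessian_f2(alpha, sigma) = hessian(f2, [alpha, sigma])   # pack both scalars into one vector
hessian_f2(alpha_test, sigma_test)                       # 2×2 matrix of second partial derivatives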
In [31]:
function simplef(alphas)
    sum(exp.(-(alphas.^2) ./ sigmas))   # sigmas is the global vector set on the next line
end
sigmas = rand(10)
hessian_f(alphas) = hessian(simplef, alphas)
alphas = randn(10)
hessian_f(alphas)
Out[31]:
10×10 Matrix{Float64}:
  1.7684  -0.0     0.0      -0.0      …  -0.0      -0.0       0.0
 -0.0     -6.4505  0.0      -0.0         -0.0      -0.0       0.0
 -0.0     -0.0     2.18349  -0.0         -0.0      -0.0       0.0
 -0.0     -0.0     0.0       0.62219     -0.0      -0.0       0.0
 -0.0     -0.0     0.0      -0.0         -0.0      -0.0       0.0
 -0.0     -0.0     0.0      -0.0      …  -0.0      -0.0       0.0
 -0.0     -0.0     0.0      -0.0         -0.0      -0.0       0.0
 -0.0     -0.0     0.0      -0.0         -2.40502  -0.0       0.0
 -0.0     -0.0     0.0      -0.0         -0.0       0.102418  0.0
 -0.0     -0.0     0.0      -0.0         -0.0      -0.0       1.94047
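Each term of simplef depends on a single entry of alphas, so the Hessian should be diagonal; we can confirm that (my addition):

isdiag(hessian_f(alphas))   # expected: true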