In this workbook, we'll see a few ideas for doing derivative computations. You'll see more on the homework.
The first type of derivative computation is a source code transform: take in a program and output a new program that computes the derivatives. This is one approach to automatic differentiation. We won't talk too much about this idea, but it's used frequently in optimization. Purdue Prof. Alex Pothen is currently doing research on these ideas.
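As a toy illustration of the source-transform idea (hypothetical code, not the output of any real package): the transform reads the program for a function and emits a new program for its derivative.
# input program
f(x) = x^2 + 3x
# a source transform would emit something like this program for the derivative
df(x) = 2x + 3
df(2.0)   # 7.0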
Julia has a package to do this (actually, a few).
using Pkg
Pkg.add("ReverseDiff")
using ReverseDiff
using Plots
using Interact
using ReverseDiff: gradient
# the Rosenbrock function, written so it broadcasts over (1-element) array inputs
myf(x,y) = (1.0.-x).^2 + 100(y.-x.^2).^2
gradient(myf, ([3.0],[5.0]))
myf2(x,y) = sin.(x).*(1.0.-x).^2 + 100(y.-x.^2).^2
gradient(myf2, ([3.0],[5.0]))
sin.([5.0])
@show gradient(myf, ([-1.0],[0.0]))
@show gradient(myf, ([1.0],[0.0]))
@show gradient(myf, ([0.0],[-1.0]))
@show gradient(myf, ([0.0],[1.0]))
;
using ReverseDiff: hessian
myf(x,y) = (1.0.-x).^2 + 100(y.-x.^2).^2
hessian(myf, ([1.0],[1.0]))
myf1(x) = (1.0.-x[1]).^2 + 100(x[2].-x[1].^2).^2
hessian(myf1, ([1.0,1.0]))
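At (1,1), the minimizer of the Rosenbrock function, the gradient is zero and the Hessian should come out as the matrix [802 -400; -400 200], which makes a handy sanity check for the call above.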
# There is a more efficient way to use these same ideas too. It involves a little more setup.
using ReverseDiff: GradientTape, compile
# the tape only needs example inputs with the same shapes and element types as the real ones
const myf_tape = GradientTape(myf, (rand(1), rand(1)))
# compile `myf_tape` into a more optimized representation
const compiled_myf_tape = compile(myf_tape)
using ReverseDiff: gradient!
gradient!(([1.0],[2.0]), compiled_myf_tape, ([1.0],[1.0]))
#=
more generally, this gives you a way to _allocate_ the output ahead of time. This is important for performance because you don't want to be creating and deleting memory all the time. (Allocating and freeing memory is a _fast_ operation, but it is still slower than pure arithmetic.)
=#
output = (randn(1),randn(1))
input = ([0.0],[1.0])
gradient!(output, compiled_myf_tape, input)
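To see why preallocation pays off, here is a rough, hypothetical timing comparison (exact numbers depend on your machine; run each line twice so compilation isn't counted).
@time for _ in 1:10^4; gradient!(output, compiled_myf_tape, input); end  # reuses `output`, few allocations
@time for _ in 1:10^4; gradient(myf, input); end                         # allocates a fresh result every call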
# We can 'reuse' output.
output = (randn(1),randn(1))
input = ([0.0],[1.0])
map( x->gradient!(output, compiled_myf_tape, x), [([0.0],[1.0]),([0.0],[-1.0])])
# What goes wrong here? That isn't right! It's our 'reuse' of output: `map` returns two references to the same `output` tuple, so both entries show the last gradient computed.
map( x-> println(gradient!(output, compiled_myf_tape, x)), [([0.0],[1.0]),([0.0],[-1.0])])
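One way around the aliasing (a sketch, not from the original notebook): copy the result out of `output` before `map` collects it, or give each call its own freshly allocated output.
map( x-> deepcopy(gradient!(output, compiled_myf_tape, x)), [([0.0],[1.0]),([0.0],[-1.0])])
# or allocate a new output tuple for every input
map( x-> gradient!((zeros(1),zeros(1)), compiled_myf_tape, x), [([0.0],[1.0]),([0.0],[-1.0])])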
# But these compiled tapes are specific to the input sizes they were recorded with!
using ReverseDiff: gradient!
gradient!(([0.0,0.0],[0.0,0.0]), compiled_myf_tape, ([1.0,2.0],[1.0,3.0]))
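The fix is to record and compile a tape at the size you actually need. Here is a sketch using the single-argument `myf1` from above, which takes a length-2 vector and returns a scalar.
myf1_tape = GradientTape(myf1, rand(2))
compiled_myf1_tape = compile(myf1_tape)
gradient!([0.0, 0.0], compiled_myf1_tape, [1.0, 2.0])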
Pkg.add("XGrad")
using XGrad
xdiff( :(exp(x^2)*(sin(100x)^2)) , x=1.)
You should be able to directly evaluate these functions, but I couldn't get that working before class.
# This should work, but as of 2016-02-23, it doesn't :(
# rosenbrock(x) = (1 - x[1])^2 + 100(x[2] - x[1]^2)^2
# rdiff( rosenbrock , x=ones(2)) # 'x=ones(2)' indicates the type and size of x to rdiff
rosenbrock(x,y) = (1 - x)^2 + 100(y - x^2)^2
drosen = xdiff(rosenbrock; x=1.,y=1.)
We can also use a finite difference approximation. The idea is that the derivative is formally $$ f'(x) = \lim_{h\to 0} \frac{f(x+h) - f(x)}{h} $$ and so we should be able to get a good approximation by using $$ f'(x) \approx \frac{f(x+h) - f(x)}{h} $$ with $h$ very small.
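For example, for $f(x) = e^x$ at $x = 1$ we have $f'(1) = e \approx 2.71828$, and the forward difference with $h = 10^{-4}$ is already close:
(exp(1 + 1e-4) - exp(1)) / 1e-4   # ≈ 2.71842, versus exp(1) ≈ 2.71828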
using Plots
using LinearAlgebra
h = 0.01
expderiv(x) = (exp.(x+h)-exp.(x))./h
xx = collect(range(-2,stop=2,length=1000))
plot(xx,exp.(xx))
plot!(xx,expderiv.(xx))
title!(string(norm(exp.(xx) - expderiv.(xx),Inf)))
using Blink,Interact
ui = @manipulate for h=collect(10 .^ range(-16,stop=-1,length=16))
expderiv(x) = (exp.(x+h)-exp.(x))./h
xx = collect(range(-2,stop=2,length=1000))
plot(xx,exp.(xx))
plot!(xx,expderiv.(xx))
title!(string(norm(exp.(xx) - expderiv.(xx),Inf)))
end
w = Window()
body!(w,ui)
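If Blink/Interact isn't available, a plain loop over h shows the same behavior (a sketch): the error first shrinks as h decreases, then grows again once floating-point cancellation in f(x+h) - f(x) takes over.
xx = range(-2,stop=2,length=1000)
for h in 10.0 .^ (-16:-1)
    err = norm(exp.(xx) .- (exp.(xx .+ h) .- exp.(xx)) ./ h, Inf)
    println("h = ", h, "   max error = ", err)
end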