This is a PyTorch tutorial for Purdue CS57300

You will need to install PyTorch.
On the Scholar cluster, just run
module load anaconda/5.0.0-py36
and then
conda create -n torch --clone="/apps/rhel6/Anaconda/5.0.0-py36"
followed by
source activate torch
and finally
conda install pytorch torchvision cuda80 -c soumith

Whenever you want to use PyTorch again, just run
source activate torch
before starting python3. Note that this environment uses Python 3.6.

In [1]:
# Import PyTorch
import torch

# Import numpy
import numpy as np

# Optimizer module
import torch.optim as optim

Construct Matrices

Say you want to construct an uninitialized $5\times 3$ matrix:

In [2]:
x = torch.Tensor(5, 3)
print(x)
 0.0000e+00  3.6893e+19  0.0000e+00
 3.6893e+19  7.0065e-45  0.0000e+00
 0.0000e+00  0.0000e+00  0.0000e+00
 0.0000e+00  0.0000e+00  0.0000e+00
 0.0000e+00  0.0000e+00  0.0000e+00
[torch.FloatTensor of size 5x3]
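
An uninitialized tensor holds whatever values happened to be in memory. To start from known values, torch.zeros and torch.ones take the same shape arguments (a quick sketch, not run here):

In [ ]:
# Tensors of the same 5x3 shape filled with known values
print(torch.zeros(5, 3))
print(torch.ones(5, 3))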

Construct a randomly (uniformly) initialized matrix

In [3]:
x = torch.rand(5, 3)
print(x)
 0.5717  0.2725  0.5575
 0.4839  0.5469  0.6992
 0.8449  0.3994  0.7515
 0.9654  0.4085  0.2314
 0.2241  0.2777  0.4896
[torch.FloatTensor of size 5x3]

Get the matrix size

In [4]:
print(x.size())
torch.Size([5, 3])
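
x.size() returns a torch.Size, which behaves like a Python tuple, so you can unpack it directly (a small sketch):

In [ ]:
# torch.Size supports tuple operations such as unpacking
rows, cols = x.size()
print(rows, cols)   # 5 3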

Matrix Operations

Matrix sum

In [5]:
y = torch.rand(5, 3)
print(x + y)
 1.2668  0.9799  0.6360
 1.2243  1.2588  1.1249
 1.2597  0.5122  1.4981
 1.6725  0.6720  0.5507
 0.7328  0.5926  1.0335
[torch.FloatTensor of size 5x3]

PyTorch indexing and slicing work very much like numpy. For example, select the second column of x:

In [6]:
print(x[:, 1])
 0.2725
 0.5469
 0.3994
 0.4085
 0.2777
[torch.FloatTensor of size 5]
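
Other numpy-style operations work the same way, for example element-wise products and matrix multiplication (a quick sketch reusing the x and y defined above):

In [ ]:
# Element-wise product, just like numpy's *
print(x * y)

# Matrix multiplication: (5x3) mm (3x5) -> 5x5
print(torch.mm(x, y.t()))

# In-place operations are suffixed with an underscore; use a copy here
y2 = y.clone()
y2.add_(x)   # y2 = y2 + x, done in place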

Convert a PyTorch matrix to a numpy array

In [7]:
x_numpy = x.numpy()
print(x_numpy)
[[ 0.5717144   0.27254805  0.55749762]
 [ 0.48388916  0.54686373  0.69917142]
 [ 0.84490418  0.39944157  0.75153291]
 [ 0.96536344  0.40845931  0.23141801]
 [ 0.22409014  0.27771181  0.48964411]]

Convert a numpy array to a PyTorch tensor

In [8]:
a = np.ones(5)
b = torch.from_numpy(a)
print(b)
 1
 1
 1
 1
 1
[torch.DoubleTensor of size 5]
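
Note that torch.from_numpy (and .numpy()) share the underlying memory with the numpy array, so changing one side changes the other. A quick sketch using the a and b from above:

In [ ]:
# The numpy array and the torch tensor share the same memory
np.add(a, 1, out=a)   # modify the numpy array in place
print(b)              # the torch tensor now contains all 2s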

Using GPUs

Most PyTorch matrix operations can also be performed on GPUs.

In [9]:
if torch.cuda.is_available():
    x = x.cuda()
    y = y.cuda()
    z = x + y
    z_cpu_matrix = z.cpu()
    z_numpy_matrix = z_cpu_matrix.numpy()
    print(z)
else:
    print("No GPUs available")
No GPUs available
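
You can also allocate tensors directly on the GPU; on this (older) PyTorch version that is done with the torch.cuda tensor types. A small sketch that only runs when CUDA is available:

In [ ]:
if torch.cuda.is_available():
    # Allocate directly on the GPU instead of copying with .cuda()
    g = torch.cuda.FloatTensor(5, 3).fill_(1.0)
    print(g + g)
else:
    print("No GPUs available")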

PyTorch neural networks

Using PyTorch we could build a neural network by hand, the same way we would with numpy, and by calling .cuda() we could run every operation on the GPU.

However, PyTorch offers higher-level packages (torch.nn, torch.autograd, torch.optim) that make our lives much easier.

In [10]:
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


net = Net()
print(net)
Net (
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear (400 -> 120)
  (fc2): Linear (120 -> 84)
  (fc3): Linear (84 -> 10)
)
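
The learnable parameters of the model are returned by net.parameters(); each entry is a weight or bias tensor from one of the layers above. A quick sketch:

In [ ]:
params = list(net.parameters())
print(len(params))       # 10: a weight and a bias for each of the 5 layers
print(params[0].size())  # conv1's weights: torch.Size([6, 1, 5, 5])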

Do a first forward pass. The input is a random $1\times 1\times 32\times 32$ tensor (batch size, channels, height, width).

In [11]:
input = Variable(torch.randn(1, 1, 32, 32))
out = net(input)
print(out)
Variable containing:
-0.1482  0.0038  0.0452  0.0244  0.0266 -0.0163 -0.0564  0.0259 -0.0398  0.0440
[torch.FloatTensor of size 1x10]
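
Before defining a proper loss, you can already backpropagate from the network output with an arbitrary gradient, as the official PyTorch tutorial does; remember to zero the gradient buffers first. A small sketch, using a fresh forward pass so it does not interfere with the out used below:

In [ ]:
net.zero_grad()                       # zero the gradient buffers of all parameters
tmp_out = net(input)                  # a fresh forward pass
tmp_out.backward(torch.randn(1, 10))  # backprop from the output with a random gradient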

Loss Function

A loss function takes an (output, target) pair of inputs and computes a value that estimates how far the output is from the target.

There are several different loss functions under the nn package. A simple one is nn.MSELoss, which computes the mean-squared error between the output and the target.

In [12]:
target = Variable(torch.arange(1, 11))  # a dummy target, for example
criterion = nn.MSELoss()

loss = criterion(out, target)
print(loss)
Variable containing:
 38.4990
[torch.FloatTensor of size 1]
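
nn.MSELoss is only one option; for classification you would more commonly use nn.CrossEntropyLoss, which takes raw class scores and an integer class label. A small sketch (the clf_* names are just illustrative):

In [ ]:
# Cross-entropy loss: scores of shape (N, C) and a LongTensor of class indices of shape (N,)
clf_criterion = nn.CrossEntropyLoss()
clf_target = Variable(torch.LongTensor([3]))   # dummy class label for the single example
clf_loss = clf_criterion(out, clf_target)
print(clf_loss)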

Very important: we need to zero the gradient buffers before computing new gradients, because PyTorch accumulates gradients in the .grad buffers rather than overwriting them.

In [13]:
net.zero_grad()     # zeroes the gradient buffers of all parameters

print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)
conv1.bias.grad before backward
None
conv1.bias.grad after backward
Variable containing:
-0.0039
 0.0157
-0.1019
-0.0177
 0.0403
-0.0325
[torch.FloatTensor of size 6]

Now we have the gradient for each parameter of each layer. Let's update the model by stepping in the opposite direction of the gradient (because we want to minimize the loss):

In [14]:
learning_rate = 0.01

print("Before")
print(net.conv1.bias)
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)

print("After")
print(net.conv1.bias)
Before
Parameter containing:
-0.1321
 0.1946
-0.0708
 0.0138
-0.0841
-0.0981
[torch.FloatTensor of size 6]

After
Parameter containing:
-0.1320
 0.1945
-0.0698
 0.0139
-0.0845
-0.0978
[torch.FloatTensor of size 6]

With the procedure above we are doing every update by hand...
PyTorch also offers optimizers (in torch.optim) that automate these parameter updates.

Let's perform one step with an SGD optimizer.

In [15]:
print("Before")
print(net.conv1.bias)

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()    # Does the update

print("After")
print(net.conv1.bias)
Before
Parameter containing:
-0.1320
 0.1945
-0.0698
 0.0139
-0.0845
-0.0978
[torch.FloatTensor of size 6]

After
Parameter containing:
-0.1316
 0.1944
-0.0688
 0.0144
-0.0844
-0.0970
[torch.FloatTensor of size 6]
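
Putting the pieces together, training is just these steps repeated over the data. A minimal sketch, assuming some iterable training_data of (input, target) tensor pairs, which is not defined in this notebook:

In [ ]:
# Sketch of a full training loop; training_data is a placeholder, not defined here
optimizer = optim.SGD(net.parameters(), lr=0.01)
for epoch in range(10):
    for data, labels in training_data:
        data, labels = Variable(data), Variable(labels)
        optimizer.zero_grad()             # zero the gradient buffers
        output = net(data)                # forward pass
        loss = criterion(output, labels)  # compute the loss
        loss.backward()                   # backpropagate
        optimizer.step()                  # update the parameters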
