PyTorch tutorial | Deep learning with PyTorch


Introduction

Occasionally, a Python library is developed that has the potential to change the landscape in the field of deep learning. PyTorch is one of those libraries.

Over the last few weeks, I've been dabbling in PyTorch a bit, and I was impressed by how easy it is to understand. Among the various deep learning frameworks I have used to date, PyTorch has been the most flexible and effortless of all.


In this article, we will explore PyTorch with a more practical approach, covering the basics along with a case study. We will also compare a neural network built from scratch in both numpy and PyTorch to see how similar the two implementations are.

Let's move on!

Note: this article assumes you have a basic understanding of deep learning. If you want to catch up on deep learning, read this article first.

What's more, if you want a more detailed explanation of PyTorch from scratch, including how tensors work and how you can perform math and matrix operations with PyTorch, I highly recommend checking out A Beginner's Guide to PyTorch and How It Works From Scratch.

Table of Contents

  • An overview of PyTorch
  • Dive into the technicalities
  • Building a neural network in Numpy vs. PyTorch
  • Comparison with other deep learning libraries
  • Case study: solving an image recognition problem with PyTorch

If you prefer to approach the following concepts in a structured format, you can enroll in this free PyTorch course and follow it chapter by chapter.

An overview of PyTorch

The creators of PyTorch say they have a philosophy: they want to be imperative. This means that we run our computation immediately. This fits perfectly into the Python programming methodology, as we don't have to wait for all the code to be written before we know whether it works or not. We can easily execute part of the code and inspect it in real time. For someone who debugs neural networks, this is a blessing!

PyTorch is a Python-based library created to provide flexibility as a deep learning development platform. The PyTorch workflow is as close as it gets to that of Python's scientific computing library: numpy.

Now you might ask, why would we use PyTorch to build deep learning models? I can list three things that help answer that:

  • Easy to use API – It is as simple as Python can be.
  • Python support – As mentioned earlier, PyTorch integrates seamlessly with the Python data science stack. It's so similar to numpy that you might not even notice the difference.
  • Dynamic computation graphs – Instead of predefined graphs with specific functionality, PyTorch gives us a framework to build computational graphs as we go, and even change them during runtime. This is valuable in situations where we don't know how much memory we will need to create a neural network; see the sketch right after this list.
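
To make "dynamic" concrete, here is a minimal sketch (a toy example of my own, not from the case study below): the depth of the computation depends on a value known only at runtime, so the graph can differ from one run to the next.

import random
import torch

h = torch.randn(1, 10)         # a toy activation
weight = torch.randn(10, 10)   # a toy weight matrix

# the number of operations in the graph is decided at runtime
n_steps = random.randint(1, 5)
for _ in range(n_steps):
    h = torch.mm(h, weight).clamp(min=0)

print(n_steps, h.size())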

Some other advantages of using PyTorch are its multi-GPU support, custom data loaders and simplified preprocessing.

Since its release in early 2017, many researchers have adopted it as a go-to library thanks to the ease of building novel and even extremely complex graphs. That said, PyTorch still has some way to go before it is adopted by the majority of data science practitioners, since it is new and still "under construction".

Dive into the technicalities

Before we dive into the details, let's see the PyTorch workflow.

PyTorch uses an imperative / eager paradigm. That is, each line of code required to build a graph defines a component of that graph. We can independently perform computations on these components, even before the full graph is built. This is called a "define-by-run" approach.


Source: http://pytorch.org/about/

Installing PyTorch is pretty easy. You can follow the steps mentioned in the official documents and run the command according to your system specifications. For instance, this was the command I used based on the options I chose:


conda install pytorch torchvision cuda91 -c pytorch

The main elements that we must know when starting with PyTorch are:

  • PyTorch tensors
  • Mathematical operations
  • Autograd module
  • Optim module, and
  • nn module

Below, we will take a look at each of them in some detail.

PyTorch tensors

Tensors are nothing but multidimensional arrays. Tensors in PyTorch are similar to numpy's ndarrays, with the addition that tensors can also be used on a GPU. PyTorch supports various types of tensors. If you are familiar with other deep learning frameworks, you must have come across tensors in TensorFlow as well. In fact, you could also implement the following tasks in TensorFlow and do your own comparison of PyTorch and TensorFlow.

You can define a simple one-dimensional array as shown below:

# import pytorch
import torch

# define a tensor
torch.FloatTensor([2])
 2
[torch.FloatTensor of size 1]
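
As noted above, tensors can also live on a GPU. A minimal sketch, assuming a CUDA-capable GPU and a CUDA-enabled PyTorch build:

# move a tensor to the GPU if one is available
x = torch.FloatTensor([2])
if torch.cuda.is_available():
    x = x.cuda()   # same values, now stored on the GPU
print(x)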

Mathematical operations

As with numpy, it is very important that a scientific computing library has efficient implementations of mathematical functions. PyTorch provides a similar interface, with more than 200 mathematical operations you can use.

Below is an example of a simple addition operation in PyTorch:

a = torch.FloatTensor([2])
b = torch.FloatTensor([3])

a + b
 5
[torch.FloatTensor of size 1]

Doesn't this feel like a quintessentially Python approach? We can also perform various matrix operations on the PyTorch tensors we define. For instance, we will transpose a two-dimensional matrix:

matrix = torch.randn(3, 3)
matrix

0.7162 1.0152 1.1525
-0.3503 -0.9452 -1.0861
-0.1093 -0.0927 -0.0476
[torch.FloatTensor of size 3x3]

matrix.t()

0.7162 -0.3503 -0.1093
 1.0152 -0.9452 -0.0927
 1.1525 -1.0861 -0.0476
[torch.FloatTensor of size 3x3]

Autograd module

PyTorch uses a technique called automatic differentiation. That is, we have a recorder that records the operations we have performed and then replays them backwards to compute our gradients. This technique is especially powerful when building neural networks, as it saves time on every epoch by calculating the differentiation of the parameters at the forward pass itself.


Source: http://pytorch.org/about/

from torch.autograd import Variable

x = Variable(train_x)
y = Variable(train_y, requires_grad=False)
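
The snippet above only wraps our training data in Variables. To see the recorder at work, here is a minimal sketch (a toy example of my own): we build a tiny computation on a Variable that requires gradients, call backward(), and read off the accumulated gradient.

import torch
from torch.autograd import Variable

# w is a parameter we want gradients for
w = Variable(torch.FloatTensor([3]), requires_grad=True)

# a tiny computation: loss = (2 * w) ** 2
loss = (2 * w) ** 2

# replay the recorded operations backwards: d(loss)/dw = 8 * w = 24
loss.backward()
print(w.grad)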

Optim module

torch.optim is a module that implements various optimization algorithms used for building neural networks. Most of the commonly used methods are already supported, so we don't have to build them from scratch (unless you want to!).

Below is the code to use an Adam optimizer:

optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
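
The optimizer is then used inside the training loop. A minimal sketch of one training step, assuming a model, a loss_fn and a batch x, y have already been defined:

# one training step with the optimizer (sketch)
optimizer.zero_grad()          # clear gradients left over from the previous step
loss = loss_fn(model(x), y)    # forward pass and loss computation
loss.backward()                # backpropagate to compute gradients
optimizer.step()               # update the model parameters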

nn module

PyTorch autograd makes it easy to define computational graphs and take gradients, but raw autograd may be too low-level for defining complex neural networks. This is where the nn module can help.

The nn package defines a set of modules, which we can think of as neural network layers that produce output from input and may have some trainable weights.

You can consider the nn module as the Keras of PyTorch!

import torch

# define model
model = torch.nn.Sequential(
 torch.nn.Linear(input_num_units, hidden_num_units),
 torch.nn.ReLU(),
 torch.nn.Linear(hidden_num_units, output_num_units),
)
loss_fn = torch.nn.CrossEntropyLoss()
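
To see such a model in action, here is a self-contained sketch (sizes chosen purely for illustration): we push a dummy batch of flattened 28 x 28 images through a small Sequential model and compute the cross-entropy loss.

import torch
from torch.autograd import Variable

# a small model with illustrative sizes
model = torch.nn.Sequential(
    torch.nn.Linear(784, 50),
    torch.nn.ReLU(),
    torch.nn.Linear(50, 10),
)
loss_fn = torch.nn.CrossEntropyLoss()

# dummy batch of 8 flattened 28x28 images with made-up labels
x = Variable(torch.randn(8, 784))
y = Variable(torch.LongTensor([0, 1, 2, 3, 4, 5, 6, 7]))

loss = loss_fn(model(x), y)
print(loss)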

Now that you know the basic components of PyTorch, you can easily build your own neural network from scratch. Read on if you want to see how!

Building a neural network in Numpy vs. PyTorch

I have mentioned above that PyTorch and numpy are very similar. Let's see why. In this section, we will see an implementation of a simple neural network to solve a binary classification problem (you can read this article for a detailed explanation).

## Neural network in numpy

import numpy as np

#Input array
X=np.array([[1,0,1,0],[1,0,1,1],[0,1,0,1]])

#Output
y=np.array([[1],[1],[0]])

#Sigmoid Function
def sigmoid (x):
 return 1/(1 + np.exp(-x))

#Derivative of Sigmoid Function
def derivatives_sigmoid(x):
 return x * (1 - x)

#Variable initialization
epoch=5000 #Setting training iterations
lr=0.1 #Setting learning rate
inputlayer_neurons = X.shape[1] #number of features in data set
hiddenlayer_neurons = 3 #number of hidden layers neurons
output_neurons = 1 #number of neurons at output layer

#weight and bias initialization
wh=np.random.uniform(size=(inputlayer_neurons,hiddenlayer_neurons))
bh=np.random.uniform(size=(1,hiddenlayer_neurons))
wout=np.random.uniform(size=(hiddenlayer_neurons,output_neurons))
bout=np.random.uniform(size=(1,output_neurons))

for i in range(epoch):
  #Forward Propogation
  hidden_layer_input1=np.dot(X,wh)
  hidden_layer_input=hidden_layer_input1 + bh
  hiddenlayer_activations = sigmoid(hidden_layer_input)
  output_layer_input1=np.dot(hiddenlayer_activations,wout)
  output_layer_input= output_layer_input1+ bout
  output = sigmoid(output_layer_input)

  #Backpropagation
  E = y-output
  slope_output_layer = derivatives_sigmoid(output)
  slope_hidden_layer = derivatives_sigmoid(hiddenlayer_activations)
  d_output = E * slope_output_layer
  Error_at_hidden_layer = d_output.dot(wout.T)
  d_hiddenlayer = Error_at_hidden_layer * slope_hidden_layer
  wout += hiddenlayer_activations.T.dot(d_output) *lr
  bout += np.sum(d_output, axis=0,keepdims=True) *lr
  wh += X.T.dot(d_hiddenlayer) *lr
  bh += np.sum(d_hiddenlayer, axis=0,keepdims=True) *lr

print('actual :\n', y, '\n')
print('predicted :\n', output)

Now, try to spot the differences in a super simple implementation of the same network in PyTorch below.

## neural network in pytorch
import torch

#Input array
X = torch.Tensor([[1,0,1,0],[1,0,1,1],[0,1,0,1]])

#Output
y = torch.Tensor([[1],[1],[0]])

#Sigmoid Function
def sigmoid (x):
  return 1/(1 + torch.exp(-x))

#Derivative of Sigmoid Function
def derivatives_sigmoid(x):
  return x * (1 - x)

#Variable initialization
epoch=5000 #Setting training iterations
lr=0.1 #Setting learning rate
inputlayer_neurons = X.shape[1] #number of features in data set
hiddenlayer_neurons = 3 #number of hidden layers neurons
output_neurons = 1 #number of neurons at output layer

#weight and bias initialization
wh=torch.randn(inputlayer_neurons, hiddenlayer_neurons).type(torch.FloatTensor)
bh=torch.randn(1, hiddenlayer_neurons).type(torch.FloatTensor)
wout=torch.randn(hiddenlayer_neurons, output_neurons)
bout=torch.randn(1, output_neurons)

for i in range(epoch):

  #Forward Propogation
  hidden_layer_input1 = torch.mm(X, wh)
  hidden_layer_input = hidden_layer_input1 + bh
  hidden_layer_activations = sigmoid(hidden_layer_input)
 
  output_layer_input1 = torch.mm(hidden_layer_activations, wout)
  output_layer_input = output_layer_input1 + bout
  output = sigmoid(output_layer_input)

  #Backpropagation
  E = y-output
  slope_output_layer = derivatives_sigmoid(output)
  slope_hidden_layer = derivatives_sigmoid(hidden_layer_activations)
  d_output = E * slope_output_layer
  Error_at_hidden_layer = torch.mm(d_output, wout.t())
  d_hiddenlayer = Error_at_hidden_layer * slope_hidden_layer
  wout += torch.mm(hidden_layer_activations.t(), d_output) *lr
  bout += d_output.sum() *lr
  wh += torch.mm(X.t(), d_hiddenlayer) *lr
  bh += d_hiddenlayer.sum() *lr
 
print('actual :\n', y, '\n')
print('predicted :\n', output)

Comparison with other deep learning libraries

In one benchmarking script, PyTorch was shown to outperform all the other major deep learning libraries at training a long short-term memory (LSTM) network, posting the lowest median time per epoch.

The APIs for data loading are well designed in PyTorch. The interfaces are specified for a dataset, a sampler and a data loader.
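
As a rough sketch of how those three pieces fit together (using the built-in TensorDataset and DataLoader with toy data of my own):

import torch
from torch.utils.data import TensorDataset, DataLoader

# toy data: 100 examples with 784 features each, all labelled 0
features = torch.randn(100, 784)
labels = torch.LongTensor(100).zero_()

dataset = TensorDataset(features, labels)                   # the dataset
loader = DataLoader(dataset, batch_size=32, shuffle=True)   # sampler + data loader

for batch_x, batch_y in loader:
    print(batch_x.size(), batch_y.size())
    break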

On comparing the tools for data loading in TensorFlow (readers, queues, etc.), I found PyTorch's data loading modules quite easy to use. Also, PyTorch is seamless when trying to build a neural network, so we don't have to rely on third-party high-level libraries like Keras.

On the other hand, I still wouldn't recommend using PyTorch for deployment. PyTorch is yet to evolve on that front. As the PyTorch developers have put it: "What we are seeing is that users first create a PyTorch model. When they are ready to deploy their model into production, they just convert it into a Caffe2 model, then ship it to either a mobile platform or another."

Case study: Solving an image recognition problem with PyTorch

To get familiar with PyTorch, we will solve DataPeaker's deep learning practice problem: Identify the digits. Let's take a look at our problem statement:

Our problem is an image recognition problem: identify the digit in a given 28 x 28 image. We have a subset of images for training and the rest for testing our model.

First, download the train and test files. The dataset contains a compressed file of all the images, and both train.csv and test.csv contain the names of the corresponding train and test images. No additional features are provided in the datasets; the raw images are simply provided in '.png' format.

Let's start:

STEP 0: Getting ready

a) Import all necessary libraries

# import modules
%pylab inline
import os
import numpy as np
import pandas as pd
from scipy.misc import imread
from sklearn.metrics import accuracy_score

b) Let's set a seed value, so that we can control the randomness of our models

# To stop potential randomness
seed = 128
rng = np.random.RandomState(seed)
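
Since we will also be initializing PyTorch weights later on, it may help to seed torch as well (this line is my own addition, not part of the original snippet):

# make PyTorch's random weight initialization reproducible too
import torch
torch.manual_seed(seed)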

c) The first step is to set directory paths, for safekeeping!

root_dir = os.path.abspath('.')
data_dir = os.path.join(root_dir, 'data')

# check for existence
os.path.exists(root_dir), os.path.exists(data_dir)

STEP 1: Data loading and preprocessing

a) Now let's read our datasets. These are in .csv format and contain the filename along with the corresponding labels.

# load dataset
train = pd.read_csv(os.path.join(data_dir, 'Train', 'train.csv'))
test = pd.read_csv(os.path.join(data_dir, 'Test.csv'))

sample_submission = pd.read_csv(os.path.join(data_dir, 'Sample_Submission.csv'))

train.head()

  filename  label
0    0.png      4
1    1.png      9
2    2.png      1
3    3.png      7
4    4.png      3

b) Let's see what our data looks like! We read our image and display it.

# print an image
img_name = rng.choice(train.filename)
filepath = os.path.join(data_dir, 'Train', 'Images', 'train', img_name)

img = imread(filepath, flatten=True)

pylab.imshow(img, cmap='gray')
pylab.axis('off')
pylab.show()


c) For easier data manipulation, let's store all our images as numpy arrays

# load images to create train and test set
temp = []
for img_name in train.filename:
  image_path = os.path.join(data_dir, 'Train', 'Images', 'train', img_name)
  img = imread(image_path, flatten=True)
  img = img.astype('float32')
  temp.append(img)
 
train_x = np.stack(temp)

train_x /= 255.0
train_x = train_x.reshape(-1, 784).astype('float32')

temp = []
for img_name in test.filename:
  image_path = os.path.join(data_dir, 'Train', 'Images', 'test', img_name)
  img = imread(image_path, flatten=True)
  img = img.astype('float32')
  temp.append(img)
 
test_x = np.stack(temp)

test_x /= 255.0
test_x = test_x.reshape(-1, 784).astype('float32')

train_y = train.label.values

d) As this is a typical ML problem, to test the proper functioning of our model we create a validation set. Let's take a 70:30 split for the train set vs. the validation set

# create validation set
split_size = int(train_x.shape[0]*0.7)

train_x, val_x = train_x[:split_size], train_x[split_size:]
train_y, val_y = train_y[:split_size], train_y[split_size:]

STEP 2: Model building

a) Now comes the main part! Let's define our neural network architecture. We define a neural network with three layers: input, hidden and output. The number of input and output neurons is fixed, since the input is our 28 x 28 image and the output is a 10 x 1 vector representing the class. We take 500 neurons in the hidden layer. Here, we use Adam as our optimization algorithm, which is an efficient variant of the gradient descent algorithm.

import torch
from torch.autograd import Variable
# number of neurons in each layer
input_num_units = 28*28
hidden_num_units = 500
output_num_units = 10

# set remaining variables
epochs = 5
batch_size = 128
learning_rate = 0.001

b) Time to train our model

# define model
model = torch.nn.Sequential(
  torch.nn.Linear(input_num_units, hidden_num_units),
  torch.nn.ReLU(),
  torch.nn.Linear(hidden_num_units, output_num_units),
)
loss_fn = torch.nn.CrossEntropyLoss()

# define optimization algorithm
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
## helper functions
# preprocess a batch of dataset
def preproc(unclean_batch_x):
  """Convert values to range 0-1"""
  temp_batch = unclean_batch_x / unclean_batch_x.max()
 
  return temp_batch

# create a batch
def batch_creator(batch_size):
  dataset_name="train"
  dataset_length = train_x.shape[0]
  
  batch_mask = rng.choice(dataset_length, batch_size)
  
  batch_x = eval(dataset_name + '_x')[batch_mask]
  batch_x = preproc(batch_x)
  
  if dataset_name == 'train':
    batch_y = eval(dataset_name).ix[batch_mask, 'label'].values
  
  return batch_x, batch_y
# train network
total_batch = int(train.shape[0]/batch_size)

for epoch in range(epochs):
  avg_cost = 0
  for i in range(total_batch):
    # create batch
    batch_x, batch_y = batch_creator(batch_size)

    # pass that batch for training
    x, y = Variable(torch.from_numpy(batch_x)), Variable(torch.from_numpy(batch_y), requires_grad=False)
    pred = model(x)

    # get loss
    loss = loss_fn(pred, y)

    # perform backpropagation
    optimizer.zero_grad()   # reset gradients accumulated from the previous step
    loss.backward()
    optimizer.step()
    avg_cost += loss.data[0]/total_batch

  print(epoch, avg_cost)
# get training accuracy
x, y = Variable(torch.from_numpy(preproc(train_x))), Variable(torch.from_numpy(train_y), requires_grad=False)
pred = model(x)

final_pred = np.argmax(pred.data.numpy(), axis=1)

accuracy_score(train_y, final_pred)
# get validation accuracy
x, y = Variable(torch.from_numpy(preproc(val_x))), Variable(torch.from_numpy(val_y), requires_grad=False)
pred = model(x)
final_pred = np.argmax(pred.data.numpy(), axis=1)

accuracy_score(val_y, final_pred)

The training score turns out to be:

0.8779008746355685

while the validation score is:

0.867482993197279

This is quite an impressive score, especially when we have trained a very simple neural network for only five epochs!
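
To close the loop, here is a sketch of generating predictions for the test images and filling in the submission file we loaded earlier. The submission column names and the output filename are assumptions on my part, so adapt them to the actual sample submission format.

# get predictions for the test set (sketch; column names and output path are assumed)
x = Variable(torch.from_numpy(preproc(test_x)))
pred = model(x)
test_pred = np.argmax(pred.data.numpy(), axis=1)

sample_submission.filename = test.filename
sample_submission.label = test_pred
sample_submission.to_csv(os.path.join(data_dir, 'sub01.csv'), index=False)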

Final notes

I hope this article has given you a sense of how the PyTorch framework can change the perspective of building deep learning models. In this article, we have just scratched the surface. To dig deeper, you can read the documentation and tutorials on PyTorch's own official page.

In the next articles, I will apply PyTorch to audio analysis, and we will try to build deep learning models for speech processing. Stay tuned!

Have you used PyTorch to build an application or in any of your data science projects? Let me know in the comments below.
