What is ResNet? Build ResNet from Scratch with Python

This article was published as part of the Data Science Blogathon

Introduction

There have been a number of advancements in the field of deep learning and computer vision, especially with the introduction of very deep convolutional neural networks. These models helped achieve state-of-the-art results on problems such as image recognition and image classification.

Over the years, deep learning architectures got deeper and deeper (adding more layers) to solve increasingly complex tasks, which also improved the performance and robustness of classification and recognition models.

But when we keep adding more layers to a neural network, it becomes much more difficult to train, and the accuracy of the model starts to saturate and then degrades. ResNet was introduced to solve this problem.

What is ResNet?

Residual Network (ResNet) is one of the best-known deep learning models. It was introduced by Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun in their 2015 paper, “Deep residual learning for image recognition” [1]. The ResNet model is one of the most popular and successful deep learning models so far.

Residual blocks

Residual blocks alleviate the problem of training very deep networks, and the ResNet model is built from these blocks.

[Figure: a residual block]
Source: ‘Deep residual learning for image recognition’ paper

In the figure above, the first thing we notice is a direct connection that skips some layers of the model. This connection is called a “skip connection”, and it is the heart of the residual block. Because of this skip connection, the output of the block changes. Without the skip connection, the input x is multiplied by the layer weights, and a bias term is added.

Then comes the activation function, f(), and we get the output H(x):

H(x) = f(wx + b), or H(x) = f(x)

With the skip connection, the output H(x) changes to

H(x) = f(x) + x
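
To make the formula concrete, here is a minimal sketch in plain Python, without any deep learning framework. The names `residual_block`, `zero_branch` and `scale_branch` are hypothetical and used only for illustration. It shows that when the branch f contributes nothing, the block simply passes its input through.

```python
def residual_block(f, x):
    """Return H(x) = f(x) + x, element-wise, for a list of numbers x."""
    fx = f(x)
    if len(fx) != len(x):
        raise ValueError("f(x) and x must have the same length to be added")
    return [a + b for a, b in zip(fx, x)]


def zero_branch(x):
    # A branch whose weights are all zero: it contributes nothing.
    return [0.0 for _ in x]


def scale_branch(x):
    # A toy "layer" that scales its input by 0.5.
    return [0.5 * v for v in x]


x = [1.0, -2.0, 3.0]
identity_out = residual_block(zero_branch, x)  # the input passes through unchanged
scaled_out = residual_block(scale_branch, x)   # 0.5 * x + x
print(identity_out, scaled_out)
```

The length check mirrors the practical constraint that f(x) and x must have matching dimensions before they can be added.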

But the dimensions of the input may differ from those of the output, which can happen with a convolutional layer or a pooling layer. This problem can be handled with either of these two approaches:

· The skip connection is padded with zeros to increase its dimensions.

· A 1 × 1 convolutional layer is added to the skip connection to match the dimensions. In that case, the output is:

H(x) = f(x) + w1.x

Here an extra parameter w1 is added, whereas the first approach adds no extra parameters.
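
The two approaches can be sketched on a single pixel's channel vector in plain Python. The helper names and the values of `w1` below are illustrative assumptions, not learned weights. Applied per pixel across channels, the matrix product in the second helper is what a 1 × 1 convolution computes.

```python
def zero_pad_shortcut(x, out_dim):
    """Approach 1: append zeros so x has out_dim entries (no new parameters)."""
    return x + [0.0] * (out_dim - len(x))


def projection_shortcut(x, w1):
    """Approach 2: multiply x by a weight matrix w1 (out_dim rows, len(x) columns)."""
    return [sum(w * v for w, v in zip(row, x)) for row in w1]


def add(a, b):
    return [p + q for p, q in zip(a, b)]


x = [1.0, 2.0]             # input with 2 channels
fx = [0.5, 0.5, 0.5, 0.5]  # branch output f(x) with 4 channels

# Approach 1: H(x) = f(x) + pad(x)
h_pad = add(fx, zero_pad_shortcut(x, len(fx)))

# Approach 2: H(x) = f(x) + w1.x, with an illustrative (not learned) w1
w1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
h_proj = add(fx, projection_shortcut(x, w1))
print(h_pad, h_proj)
```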

This skip connection technique in ResNet addresses the vanishing gradient problem in deep CNNs by giving the gradient an alternate shortcut path to flow through. In addition, if a layer hurts the performance of the architecture, regularization can effectively skip it, because the skip connection lets the block fall back to passing its input through.
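
A toy calculation can illustrate the gradient argument. It assumes a stack of scalar layers that all have the same small local derivative, which is a simplification for illustration and not how a real network behaves. Since the derivative of f(x) + x with respect to x is f'(x) + 1, each residual layer contributes a factor close to 1 instead of a factor close to 0. The function name is hypothetical.

```python
def gradient_through_stack(local_derivative, depth, use_skip):
    """Product of per-layer derivatives for a stack of scalar layers.

    Plain layer:    d/dx f(x)       = local_derivative
    Residual layer: d/dx (f(x) + x) = local_derivative + 1
    """
    factor = local_derivative + 1.0 if use_skip else local_derivative
    return factor ** depth


plain = gradient_through_stack(0.01, depth=20, use_skip=False)
residual = gradient_through_stack(0.01, depth=20, use_skip=True)
print(f"plain: {plain:.3e}, residual: {residual:.3e}")
```

In this toy setting the plain stack's gradient shrinks to practically zero after 20 layers, while the residual stack's gradient stays of order 1.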

ResNet architecture

The architecture starts from a 34-layer plain network inspired by VGG-19, to which shortcut (skip) connections are added. These skip connections, or residual blocks, then turn the architecture into a residual network, as shown in the figure below.

[Figure: ResNet architecture]

Source: ‘Deep residual learning for image recognition’ paper

Using ResNet with Keras:

Keras is an open-source deep learning library that runs on top of TensorFlow. Keras Applications provides the following versions of ResNet:

– ResNet50

– ResNet50V2

– ResNet101

– ResNet101V2

– ResNet152

– ResNet152V2

Let's build ResNet from scratch:

[Figure: ResNet layer configurations]

Source: ‘Deep residual learning for image recognition’ paper

Let's keep the image above for reference and start building the network.

The ResNet architecture uses the same CNN block multiple times, so let's create a class for the CNN block, which takes the input channels and the intermediate channels. Each conv layer is followed by a BatchNorm2d layer.

import torch
import torch.nn as nn


# Bottleneck block: 1x1 conv -> 3x3 conv -> 1x1 conv, each followed by BatchNorm
class block(nn.Module):
    def __init__(
        self, in_channels, intermediate_channels, identity_downsample=None, stride=1
    ):
        super(block, self).__init__()
        self.expansion = 4
        self.conv1 = nn.Conv2d(
            in_channels, intermediate_channels, kernel_size=1, stride=1, padding=0, bias=False
        )
        self.bn1 = nn.BatchNorm2d(intermediate_channels)
        self.conv2 = nn.Conv2d(
            intermediate_channels,
            intermediate_channels,
            kernel_size=3,
            stride=stride,
            padding=1,
            bias=False
        )
        self.bn2 = nn.BatchNorm2d(intermediate_channels)
        self.conv3 = nn.Conv2d(
            intermediate_channels,
            intermediate_channels * self.expansion,
            kernel_size=1,
            stride=1,
            padding=0,
            bias=False
        )
        self.bn3 = nn.BatchNorm2d(intermediate_channels * self.expansion)
        self.relu = nn.ReLU()
        self.identity_downsample = identity_downsample
        self.stride = stride

    def forward(self, x):
        identity = x.clone()

        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.relu(x)
        x = self.conv3(x)
        x = self.bn3(x)

        if self.identity_downsample is not None:
            identity = self.identity_downsample(identity)

        x += identity
        x = self.relu(x)
        return x
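
The block first reduces the channels with a 1 × 1 conv, applies the 3 × 3 conv on the reduced channels, and then expands them by a factor of 4. A rough sketch of the weight counts shows why this keeps the block cheap. The helper names are hypothetical, BatchNorm parameters are left out, and the comparison against two 3 × 3 convs at 256 channels is an assumption chosen for illustration.

```python
def conv_weights(in_channels, out_channels, kernel_size):
    """Number of weights in a bias-free 2D convolution."""
    return in_channels * out_channels * kernel_size * kernel_size


def bottleneck_weights(in_channels, intermediate_channels, expansion=4):
    """Conv weights in a 1x1 -> 3x3 -> 1x1 block (BatchNorm parameters excluded)."""
    out_channels = intermediate_channels * expansion
    return (
        conv_weights(in_channels, intermediate_channels, 1)
        + conv_weights(intermediate_channels, intermediate_channels, 3)
        + conv_weights(intermediate_channels, out_channels, 1)
    )


bottleneck = bottleneck_weights(in_channels=256, intermediate_channels=64)
two_3x3 = 2 * conv_weights(256, 256, 3)  # hypothetical comparison block
print(bottleneck, two_3x3)
```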

Next, create a ResNet class that takes the block class, the list of layers, the number of image channels and the number of classes as input.

In the following code, the ‘_make_layer’ function creates the ResNet layers. It takes the block, the number of residual blocks, the intermediate channels and the stride as input.

class ResNet(nn.Module):
    def __init__(self, block, layers, image_channels, num_classes):
        super(ResNet, self).__init__()
        self.in_channels = 64
        self.conv1 = nn.Conv2d(image_channels, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU()
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        # Essentially the entire ResNet architecture is in these 4 lines below
        self.layer1 = self._make_layer(
            block, layers[0], intermediate_channels=64, stride=1
        )
        self.layer2 = self._make_layer(
            block, layers[1], intermediate_channels=128, stride=2
        )
        self.layer3 = self._make_layer(
            block, layers[2], intermediate_channels=256, stride=2
        )
        self.layer4 = self._make_layer(
            block, layers[3], intermediate_channels=512, stride=2
        )

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * 4, num_classes)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = x.reshape(x.shape[0], -1)
        x = self.fc(x)

        return x

    def _make_layer(self, block, num_residual_blocks, intermediate_channels, stride):
        identity_downsample = None
        layers = []

        # If we halve the input space, e.g. 56x56 -> 28x28 (stride=2), or the number
        # of channels changes, we need to adapt the identity (skip connection) so that
        # it can be added to the output of the layers ahead
        if stride != 1 or self.in_channels != intermediate_channels * 4:
            identity_downsample = nn.Sequential(
                nn.Conv2d(
                    self.in_channels,
                    intermediate_channels * 4,
                    kernel_size=1,
                    stride=stride,
                    bias=False
                ),
                nn.BatchNorm2d(intermediate_channels * 4),
            )

        layers.append(
            block(self.in_channels, intermediate_channels, identity_downsample, stride)
        )

        # The expansion size is always 4 for ResNet 50, 101, 152
        self.in_channels = intermediate_channels * 4

        # For example, for the first ResNet layer: 256 will be mapped to 64 as the
        # intermediate layer, then finally back to 256. Hence no identity downsample
        # is needed, since stride = 1 and the number of channels is the same.
        for i in range(num_residual_blocks - 1):
            layers.append(block(self.in_channels, intermediate_channels))

        return nn.Sequential(*layers)
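
The channel bookkeeping in ‘_make_layer’ can be followed without building any modules. The sketch below uses a hypothetical helper, `trace_stages`, that repeats the same condition for the four stages and records, for the first block of each stage, whether the skip connection needs the 1 × 1 downsample path.

```python
def trace_stages(layers, expansion=4):
    """Mirror the channel bookkeeping of _make_layer without building any modules."""
    in_channels = 64  # channels after the initial 7x7 conv and max pool
    rows = []
    for num_blocks, intermediate, stride in zip(layers, (64, 128, 256, 512), (1, 2, 2, 2)):
        out_channels = intermediate * expansion
        # Same condition as in _make_layer: the shortcut must be adapted when
        # the spatial size or the channel count changes.
        needs_downsample = stride != 1 or in_channels != out_channels
        rows.append((num_blocks, in_channels, out_channels, needs_downsample))
        in_channels = out_channels
    return rows


stages = trace_stages([3, 4, 6, 3])
for row in stages:
    print(row)
```

For the ResNet50 configuration, the first block of every stage needs the downsample path: in the first stage because the channels grow from 64 to 256, and in the later stages because the stride is 2 and the channels change as well.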

Then define the different versions of ResNet.

For ResNet50, the sequence of layers is [3, 4, 6, 3].

For ResNet101, the sequence of layers is [3, 4, 23, 3].

For ResNet152, the sequence of layers is [3, 8, 36, 3]. (Refer to the ‘Deep residual learning for image recognition’ paper.)

def ResNet50(img_channel=3, num_classes=1000):
    return ResNet(block, [3, 4, 6, 3], img_channel, num_classes)


def ResNet101(img_channel=3, num_classes=1000):
    return ResNet(block, [3, 4, 23, 3], img_channel, num_classes)


def ResNet152(img_channel=3, num_classes=1000):
    return ResNet(block, [3, 8, 36, 3], img_channel, num_classes)
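
The numbers in the model names can be recovered from these sequences. As a quick check, assuming the usual convention of counting the first conv layer, the three conv layers in every block and the final fully connected layer (the 1 × 1 convs on the skip connections are not counted), a hypothetical helper gives:

```python
def resnet_depth(layers, convs_per_block=3):
    """Count weighted layers: the first conv, every conv in the blocks, and the final fc."""
    return 1 + convs_per_block * sum(layers) + 1


configs = {
    "ResNet50": [3, 4, 6, 3],
    "ResNet101": [3, 4, 23, 3],
    "ResNet152": [3, 8, 36, 3],
}
depths = {name: resnet_depth(cfg) for name, cfg in configs.items()}
print(depths)
```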

Finally, write a small test to check that the model works.

def test():
    device = "cuda" if torch.cuda.is_available() else "cpu"
    net = ResNet101(img_channel=3, num_classes=1000).to(device)
    # Create the input on the same device as the model
    y = net(torch.randn(4, 3, 224, 224, device=device))
    print(y.size())


test()

For the test case above, the output should be:

torch.Size([4, 1000])
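
The shape of the test output can also be traced by hand. The sketch below applies the standard output-size formula for conv and pooling layers to a 224 × 224 input, following the kernel sizes, strides and paddings used in the code. The helper name `conv_output_size` is hypothetical.

```python
def conv_output_size(size, kernel_size, stride, padding):
    """Output size along one spatial dimension of a conv or pooling layer."""
    return (size + 2 * padding - kernel_size) // stride + 1


size = 224
size = conv_output_size(size, kernel_size=7, stride=2, padding=3)  # conv1
after_conv1 = size
size = conv_output_size(size, kernel_size=3, stride=2, padding=1)  # maxpool
after_maxpool = size

stage_sizes = []
for stride in (1, 2, 2, 2):  # layer1..layer4; only the first block of a stage strides
    size = conv_output_size(size, kernel_size=3, stride=stride, padding=1)
    stage_sizes.append(size)

# AdaptiveAvgPool2d((1, 1)) then reduces each of the 512 * 4 channels to one value.
fc_inputs = 512 * 4
print(after_conv1, after_maxpool, stage_sizes, fc_inputs)
```

The final feature map is 7 × 7 with 2048 channels. After average pooling and flattening, each image becomes a vector of 2048 values, which the fully connected layer maps to 1000 class scores, so a batch of 4 images gives an output with 4 rows and 1000 columns.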

The full code can be accessed here:

https://github.com/BakingBrains/Deep_Learning_models_implementation_from-scratch_using_pytorch_/blob/main/ResNet_.py

[1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun: Deep Residual Learning for Image Recognition, December 2015, arXiv: https://arxiv.org/abs/1512.03385

Thanks.

Your suggestions and questions are welcome in the comments section. Thanks for reading my article!
