This article was published as part of the Data Science Blogathon
Introduction
There have been a number of advancements in the field of deep learning and computer vision. Especially with the introduction of very deep convolutional neural networks, These models helped achieve state-of-the-art results on problems such as image recognition and image classification.
Then, over the years, deep learning architectures got deeper and deeper (adding more layers) to solve increasingly complex tasks, which also helped to improve the performance of classification and recognition tasks and also to make them robust.
But when we keep adding more layers to the neural network, it becomes much more difficult to train and the accuracy of the model starts to saturate and then also degrades. Here comes the ResNet to rescue us from that scenario and help solve this problem.
What is ResNet?
Residual Network (ResNet) is one of the famous deep learning models introduced by Shaoqing Ren, Kaiming He, Jian Sun and Xiangyu Zhang in their article. The document was named “Deep residual learning for image recognition”. [1] in 2015. The ResNet model is one of the most popular and successful deep learning models so far.
Residual blocks
The problem of training very deep networks has been alleviated with the introduction of these residual blocks and the ResNet model is made up of these blocks.
The problem of training very deep networks has been alleviated with the introduction of these residual blocks and the ResNet model is made up of these blocks.
In the figure above, the first thing we can notice is that there is a direct connection that omits some layers of the model. This connection is called “jump connection” and it's the heart of the residual blocks. The output is not the same due to this jump connection. Without the jump connection, the 'X entry is multiplied by the layer weights followed by adding a skew term.
Then comes the activation function, f () and we get the output as H (x).
H (x) = f (wx + b) o H (x) = f (x)
Now, with the introduction of a new jump connection technique, the output is H (x) it changes to
H (x) = f (x) + x
But the dimension of the inlet may vary from that of the outlet, what could happen to a convolutional layer or grouped layers. Therefore, this problem can be handled with these two approaches:
· Zero is padded with the jump connection to increase its dimensions.
· Convolutional layers are added 1 × 1 at the entrance to match the dimensions. In that case, the output is:
H (x) = f (x) + w1.x
Here an extra parameter w1 is added while no extra parameter is added when using the first approach.
This bypassing technique in ResNet solves the problem of gradient disappearance in deep CNNs by allowing an alternate shortcut path for the gradient to flow.. What's more, bypass connection helps if any layer hurts architecture performance, then it will be skipped by regularization.
ResNet architecture
There is a simple network of 34 layers in architecture that is inspired by VGG-19 in which direct access connection or hop connections are added. These hop connections or residual blocks then convert the architecture into the residual network as shown in the figure below.
Source: ‘Deep residual learning for image recognition‘ paper
Using ResNet with Keras:
Keras is an open source deep learning library capable of running on top of TensorFlow. Keras Applications provides the following versions of ResNet.
– ResNet50
– ResNet50V2
– ResNet101
– ResNet101V2
– ResNet152
– ResNet152V2
Let's build ResNet from scratch:
Source: ‘Deep residual learning for image recognition‘ paper
Let's keep the image above for reference and start building the network..
ResNet architecture uses CNN blocks multiple times, so let's create a class for the CNN block, which takes input channels and output channels. There is a batchnorm2d after each layer of conv.
import torch import torch.nn as nn
class block(nn.Module): def __init__( self, in_channels, intermediate_channels, identity_downsample=None, stride=1 ): super(block, self).__init__() self.expansion = 4 self.conv1 = nn.Conv2d( in_channels, intermediate_channels, kernel_size=1, stride=1, padding=0, bias=False ) self.bn1 = nn.BatchNorm2d(intermediate_channels) self.conv2 = nn.Conv2d( intermediate_channels, intermediate_channels, kernel_size=3, stride=stride, padding=1, bias=False ) self.bn2 = nn.BatchNorm2d(intermediate_channels) self.conv3 = nn.Conv2d( intermediate_channels, intermediate_channels * self.expansion, kernel_size=1, stride=1, padding=0, bias=False ) self.bn3 = nn.BatchNorm2d(intermediate_channels * self.expansion) self.relu = nn.ReLU() self.identity_downsample = identity_downsample self.stride = stride def forward(self, x): identity = x.clone() x = self.conv1(x) x = self.bn1(x) x = self.relu(x) x = self.conv2(x) x = self.bn2(x) x = self.relu(x) x = self.conv3(x) x = self.bn3(x) if self.identity_downsample is not None: identity = self.identity_downsample(identity) x += identity x = self.relu(x) return x
Later, create a ResNet class that takes input from multiple blocks, covers, image channels and the number of classes.
In the following code, the ‘_make_layer function’
create the ResNet layers, which takes the input of blocks, the number of residuals
blocks, output channel and strides.
class ResNet(nn.Module): def __init__(self, block, layers, image_channels, num_classes): super(ResNet, self).__init__() self.in_channels = 64 self.conv1 = nn.Conv2d(image_channels, 64, kernel_size=7, stride=2, padding=3, bias=False) self.bn1 = nn.BatchNorm2d(64) self.relu = nn.ReLU() self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
# Essentially the entire ResNet architecture are in these 4 lines below self.layer1 = self._make_layer( block, layers[0], intermediate_channels=64, stride=1 ) self.layer2 = self._make_layer( block, layers[1], intermediate_channels=128, stride=2 ) self.layer3 = self._make_layer( block, layers[2], intermediate_channels=256, stride=2 ) self.layer4 = self._make_layer( block, layers[3], intermediate_channels=512, stride=2 ) self.avgpool = nn.AdaptiveAvgPool2d((1, 1)) self.fc = nn.Linear(512 * 4, num_classes) def forward(self, x): x = self.conv1(x) x = self.bn1(x) x = self.relu(x) x = self.maxpool(x) x = self.layer1(x) x = self.layer2(x) x = self.layer3(x) x = self.layer4(x) x = self.avgpool(x) x = x.reshape(x.shape[0], -1) x = self.fc(x) return x def _make_layer(self, block, num_residual_blocks, intermediate_channels, stride): identity_downsample = None layers = [] # Either if we half the input space for ex, 56x56 -> 28x28 (stride=2), or channels changes # we need to adapt the Identity (skip connection) so it will be able to be added # to the layer that's ahead if stride != 1 or self.in_channels != intermediate_channels * 4: identity_downsample = nn.Sequential( nn.Conv2d( self.in_channels, intermediate_channels * 4, kernel_size=1, stride=stride, bias=False ), nn.BatchNorm2d(intermediate_channels * 4), ) layers.append( block(self.in_channels, intermediate_channels, identity_downsample, stride) ) # The expansion size is always 4 for ResNet 50,101,152 self.in_channels = intermediate_channels * 4 # For example for first resnet layer: 256 will be mapped to 64 as intermediate layer, # then finally back to 256. Hence no identity downsample is needed, since stride = 1, # and also same amount of channels. for i in range(num_residual_blocks - 1): layers.append(block(self.in_channels, intermediate_channels))
return nn.Sequential (* covers)
Then define different versions of ResNet
– For ResNet50, the sequence of layers is [3, 4, 6, 3].
– For ResNet101, the sequence of layers is [3, 4, 23, 3].
– For ResNet152, the sequence of layers is [3, 8, 36, 3]. (Ask the ‘Deep residual learning for image recognition‘ paper)
def ResNet50(img_channel=3, num_classes=1000): return ResNet(block, [3, 4, 6, 3], img_channel, num_classes)
def ResNet101(img_channel=3, num_classes=1000): return ResNet(block, [3, 4, 23, 3], img_channel, num_classes) def ResNet152(img_channel=3, num_classes=1000): return ResNet(block, [3, 8, 36, 3], img_channel, num_classes)
Later, write a little test code to check if the model is working fine.
def test(): net = ResNet101(img_channel=3, num_classes=1000) device = "miracles" if torch.cuda.is_available() else "cpu" y = net(torch.randn(4, 3, 224, 224)).to(device) print(y.size())
test()
For the test case above, the output must be:
The full code can be accessed here:
https://github.com/BakingBrains/Deep_Learning_models_implementation_from-scratch_using_pytorch_/blob/main/ResNet_.py
[1]. Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun: Residual deep learning for image recognition, December of 2015, DOI: https://arxiv.org/abs/1512.03385
Thanks.
Your suggestions and doubts are welcome here in the comments section. Thanks for reading my article!