Convolutional neural network architecture | CNN architecture

This article was published as part of the Data Science Blogathon.

Introduction

Are you working on an image recognition or object detection project but don't have the basics needed to build an architecture?

In this article, we will see what convolutional neural network architectures are, starting from the basics, and take a basic architecture as a case study to apply what we learn. The only prerequisite is that you know how convolution works. Don't worry, it's very simple!

Let's take a simple convolutional neural network:

[Figure: a simple convolutional neural network]

We will go through it layer by layer to understand this CNN in detail.

First, there are some things to learn from layer 1: strides and padding. We will look at each of them shortly, with examples.

Suppose we have an input matrix of size 5 × 5 and a filter matrix of size 3 × 3. For those who don't know, a filter is a set of weights in a matrix that is applied to an image or a matrix to extract the required features. Look up convolution if this is your first time.

Note: We always take the sum or average of all the values while doing a convolution.

A filter can be of any depth. If a filter has a depth d, it goes through all d layers of the input and convolves, that is, it sums all the (weights × inputs) across the d layers.

[Figure: a 3 × 3 filter convolved over a 5 × 5 input]

Here the input is of size 5 × 5. After applying a 3 × 3 kernel (filter), we get an output feature map of size 3 × 3, so let's try to formulate this:

Output height = H - F + 1, where H is the input height and F is the filter height (here 5 - 3 + 1 = 3)

This gives the output height, and the same formula gives the output width.
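To make the output-size formula concrete, here is a minimal sketch in plain Python. The function name `convolve2d_valid` and the sample values are made up for illustration, and, as is usual in deep learning, the filter is applied without flipping it.

```python
def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    h, w = len(image), len(image[0])
    f_h, f_w = len(kernel), len(kernel[0])
    out_h, out_w = h - f_h + 1, w - f_w + 1  # output size: H - F + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Sum of (weight x input) over the window whose top-left corner is (i, j).
            total = sum(
                kernel[a][b] * image[i + a][j + b]
                for a in range(f_h)
                for b in range(f_w)
            )
            row.append(total)
        output.append(row)
    return output


# Hypothetical 5 x 5 input holding the values 0..24, and a 3 x 3 filter of ones.
image = [[5 * r + c for c in range(5)] for r in range(5)]
kernel = [[1] * 3 for _ in range(3)]

feature_map = convolve2d_valid(image, kernel)
print(len(feature_map), len(feature_map[0]))  # 3 3
print(feature_map[0][0])                      # 54
```

The 5 × 5 input shrinks to a 3 × 3 feature map, matching 5 - 3 + 1 = 3.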

Padding

When we apply a convolution, the output does not have the same dimensions as the input, and we lose data at the edges. So we add a border of zeros around the input and recalculate the convolution so that it covers all the input values.

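[Figure: a 5 × 5 input padded with a border of zeros]

Let's try to formulate this:

Output height = H + 2 - F + 1, where H is the input height and F is the filter height (here 5 + 2 - 3 + 1 = 5)

Here the 2 accounts for the two extra rows of zeros added to the height, one at the top and one at the bottom. The same formula applies to the width, with two extra columns of zeros. With a border that is P zeros thick on each side, the 2 becomes 2P.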
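As a rough illustration of padding, the sketch below adds a border of zeros to a small input and computes the resulting output size. The helper names `zero_pad` and `padded_output_size` are hypothetical, and a stride of 1 is assumed.

```python
def zero_pad(image, p=1):
    """Return a copy of `image` with a border of `p` zeros on every side."""
    width = len(image[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]          # top border
    for row in image:
        padded.append([0] * p + list(row) + [0] * p)  # left and right borders
    padded.extend([0] * width for _ in range(p))      # bottom border
    return padded


def padded_output_size(h, f, p):
    """Output height (or width) for stride 1: H + 2P - F + 1."""
    return h + 2 * p - f + 1


image = [[1] * 5 for _ in range(5)]  # hypothetical 5 x 5 input of ones
padded = zero_pad(image, p=1)

print(len(padded), len(padded[0]))   # 7 7
print(padded[1])                     # [0, 1, 1, 1, 1, 1, 0]
print(padded_output_size(5, 3, 1))   # 5
```

With one ring of zeros, a 3 × 3 filter gives back a 5 × 5 output, the same size as the input.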

Strides

Sometimes we do not want to capture all the data or information available, so we skip some neighboring cells. Let us visualize it:

[Figure: a 3 × 3 filter moving over a 5 × 5 input with a stride of 2]

Here the input matrix or image has dimensions 5 × 5, with a 3 × 3 filter and a stride of 2, so the filter moves two columns at a time as it convolves. Let's formulate this:

Output height = (H - F + 1) / 2, where H is the input height and F is the filter height (here (5 - 3 + 1) / 2 = 1.5, which is rounded up to 2)

If the result is not a whole number, take ceil() of it, that is, round up to the next integer.

Here H refers to the height, so this formula gives the output height, and the same applies to the output width. The 2 is the stride value, so you can replace it with S in the formula.
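The stride formula can be checked with a few lines of Python. This is only a sketch: the names `conv_output_size` and `window_starts` are made up here, and the rounded-up form used below gives the same result as the floor((H + 2P - F) / S) + 1 form that is often quoted elsewhere.

```python
import math


def conv_output_size(h, f, s=1, p=0):
    """Output height (or width): ceil((H + 2P - F + 1) / S)."""
    return math.ceil((h + 2 * p - f + 1) / s)


def window_starts(h, f, s=1):
    """Row (or column) indices where the top-left corner of the filter can sit."""
    return list(range(0, h - f + 1, s))


print(conv_output_size(5, 3, s=2))   # 2
print(window_starts(5, 3, s=2))      # [0, 2]
print(conv_output_size(6, 3, s=2))   # 2
print(conv_output_size(5, 3, p=1))   # 5
```

Counting the valid starting positions of the filter gives the same number as the formula, which is a quick way to convince yourself that the rounding is right.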

Pooling

In general terms, pooling means summarizing a small part. Here we take a small part of the input and either take its average value, called average pooling, or take its maximum value, called max pooling. So when we pool an image, we do not keep all the values; we keep one summarized value for each group of values.

[Figure: example of max pooling]

This is an example of max pooling: moving with a stride of two, we take the maximum value present in each window of the matrix.
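Here is a small sketch of both kinds of pooling in plain Python. The function `pool2d` and the 4 × 4 sample matrix are invented for illustration, and a 2 × 2 window with a stride of 2 is assumed.

```python
def pool2d(image, size=2, stride=2, mode="max"):
    """Summarize each `size` x `size` window by its maximum or its average."""
    out_h = (len(image) - size) // stride + 1
    out_w = (len(image[0]) - size) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Collect the values under the window whose top-left corner
            # is at (i * stride, j * stride).
            window = [
                image[i * stride + a][j * stride + b]
                for a in range(size)
                for b in range(size)
            ]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        output.append(row)
    return output


# Hypothetical 4 x 4 input.
image = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 0],
    [1, 8, 3, 4],
]

print(pool2d(image, mode="max"))      # [[6, 4], [8, 9]]
print(pool2d(image, mode="average"))  # [[3.75, 2.25], [4.5, 4.0]]
```

Each 2 × 2 block of the input collapses to a single value, so the 4 × 4 input becomes 2 × 2.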

Activation function

An activation function is a node placed at the end of, or in between, the layers of a neural network. It helps decide whether a neuron will fire or not. There are several types of activation functions, but in this post my focus will be on the rectified linear unit (ReLU).

[Figure: the ReLU activation function]

Don't let your jaw drop, this is not so complex. The function simply returns 0 if its input is negative and otherwise returns the same value you gave it. It does nothing more than eliminate negative outputs and keep the values between 0 and +infinity.
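In code, ReLU is a one-liner. The sketch below uses made-up sample values.

```python
def relu(x):
    """Return 0 for negative inputs and the input itself otherwise."""
    return max(0.0, x)


values = [-3.0, -0.5, 0.0, 2.0, 7.5]  # hypothetical neuron outputs
print([relu(v) for v in values])      # [0.0, 0.0, 0.0, 2.0, 7.5]
```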

Now that we have learned all the necessary basics, let's study a basic neural network called LeNet.

LeNet-5

Before we begin, let's look at the architectures designed to date. These models were tested on the ImageNet data, where we have over a million images and 1000 classes to predict.

[Figure: CNN architectures designed to date]

LeNet-5 is a very basic architecture, so it is a good place to start before moving on to more advanced architectures.

[Figure: the LeNet-5 architecture]

What are the inputs and outputs (Layer 0 and Layer N):

Here we are predicting digits from the given input image. Note that the image has a height of 32 pixels, a width of 32 pixels, and a depth of 1, so we can assume it is a grayscale (black and white) image. The output is a softmax over 10 values. Softmax gives a probability for each of the 10 digits, and we take the digit with the highest probability as the output.
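To show what the softmax output step does, here is a minimal sketch. The ten raw scores are invented, and subtracting the maximum before exponentiating is a common stability trick rather than something described in this article.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw outputs of the 10 output neurons, one per digit 0-9.
scores = [0.1, 0.3, 2.5, 0.0, -1.0, 0.2, 0.4, 4.0, 0.1, -0.5]
probs = softmax(scores)

predicted_digit = probs.index(max(probs))  # digit with the highest probability
print(predicted_digit)                     # 7
```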

Convolution 1 (Layer 1):

[Figure: Convolution 1 in LeNet-5]

Here we take the input and convolve it with filters of size 5 x 5, producing an output of size 28 x 28 (check the formula above for calculating the output dimensions: 32 - 5 + 1 = 28). We use 6 filters of this type, so the depth of conv1 is 6 and its dimensions are 28 x 28 x 6. We now pass this to the pooling layer.

Pooling 1 (Layer 2):

[Figure: Pooling 1 in LeNet-5]

Here we take the 28 x 28 x 6 as input and apply average pooling with a 2 x 2 window and a stride of 2. That is, we place a 2 x 2 window on the input, take the average of those four pixels, and move 2 columns at a time, which gives 14 x 14 x 6 as the output. We compute the pooling for each layer of the input separately, so the output depth is still 6.

Convolution 2 (Layer 3):

[Figure: Convolution 2 in LeNet-5]

Here we take the 14 x 14 x 6, that is, the output of the previous layer, and convolve it with a filter of size 5 x 5, with a stride of 1 (no jumps) and zero padding (no padding), so we get an output of 10 x 10. We take 16 filters of this type, each of depth 6, and convolve, obtaining an output of 10 x 10 x 16.

Pooling 2 (Layer 4):

[Figure: Pooling 2 in LeNet-5]

Here we take the output of the previous layer and perform average pooling with a 2 x 2 window and a stride of 2 (moving two columns at a time). We apply this window to each of the 16 layers of the 10 x 10 x 16 input, so each 10 x 10 layer gives a 5 x 5 output, and we get 5 x 5 x 16.
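The dimensions of the four layers above can be traced with a short script. This is only a shape calculation, not a working network, and the helper names `conv_shape` and `pool_shape` are made up for this sketch.

```python
def conv_shape(h, w, f, n_filters, s=1, p=0):
    """Shape after convolving with `n_filters` filters of size f x f."""
    return ((h + 2 * p - f) // s + 1, (w + 2 * p - f) // s + 1, n_filters)


def pool_shape(h, w, d, size=2, s=2):
    """Shape after pooling; depth is unchanged because each layer is pooled separately."""
    return ((h - size) // s + 1, (w - size) // s + 1, d)


conv1 = conv_shape(32, 32, f=5, n_filters=6)               # Layer 1
pool1 = pool_shape(*conv1)                                 # Layer 2
conv2 = conv_shape(pool1[0], pool1[1], f=5, n_filters=16)  # Layer 3
pool2 = pool_shape(*conv2)                                 # Layer 4

print(conv1)  # (28, 28, 6)
print(pool1)  # (14, 14, 6)
print(conv2)  # (10, 10, 16)
print(pool2)  # (5, 5, 16)
print(pool2[0] * pool2[1] * pool2[2])  # 400 values after flattening
```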

Layer (N-2) and Layer (N-1):

[Figure: fully connected layers of LeNet-5]

Finally, we flatten all the 5 x 5 x 16 values into a single layer of size 400 and feed them into a feedforward neural network of 120 neurons, which has a weight matrix of size [400,120]. This is followed by a hidden layer of 84 neurons, connected to the 120 neurons by a weight matrix of size [120,84], and these 84 neurons are in turn connected to 10 output neurons.
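To see how the weight matrices line up, here is a rough sketch that pushes a flattened vector of 400 values through the three fully connected layers. The weights are random, the biases are an assumption (the text only mentions weight matrices), and activation functions between the layers are left out to keep the focus on shapes.

```python
import random


def dense(inputs, weights, biases):
    """Fully connected layer: weights[i][j] links input i to output neuron j."""
    return [
        sum(x * row[j] for x, row in zip(inputs, weights)) + biases[j]
        for j in range(len(biases))
    ]


def init_layer(n_in, n_out):
    """Random weights of shape [n_in, n_out] and zero biases (illustrative only)."""
    weights = [[random.uniform(-0.1, 0.1) for _ in range(n_out)] for _ in range(n_in)]
    return weights, [0.0] * n_out


random.seed(0)
layer_sizes = [400, 120, 84, 10]
layers = [init_layer(n_in, n_out) for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

activations = [random.random() for _ in range(400)]  # stand-in for the flattened 5 x 5 x 16
shapes = []
for weights, biases in layers:
    activations = dense(activations, weights, biases)
    shapes.append((len(weights), len(weights[0])))

total_weights = sum(n_in * n_out for n_in, n_out in shapes)

print(shapes)            # [(400, 120), (120, 84), (84, 10)]
print(len(activations))  # 10
print(total_weights)     # 58920
```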

[Figure: output layer of LeNet-5]

The output neurons give the final predicted digit after softmax is applied.

How does a convolutional neural network really work?

It works through weight sharing and sparse connectivity.

[Figure: weight sharing and sparse connectivity in a convolution]

As you can see, the convolution has some weights, and these weights are shared by all the input neurons. Each input does not have its own separate weight, which is why they are called shared weights. Also, not all input neurons are connected to a given output neuron, only the ones covered by the filter, which is known as sparse connectivity. Otherwise a CNN is no different from a feedforward neural network. These two properties are what make it special!
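A quick count shows how much these two properties save. The sketch below compares the first LeNet-5 convolution with a hypothetical fully connected layer that maps the same 32 x 32 x 1 input to the same 28 x 28 x 6 output; biases are ignored and the function names are made up.

```python
def conv_weights(f, depth_in, n_filters):
    """Each filter has f x f x depth_in weights, shared across all positions."""
    return f * f * depth_in * n_filters


def dense_weights(n_in, n_out):
    """Every input is connected to every output with its own weight."""
    return n_in * n_out


shared = conv_weights(f=5, depth_in=1, n_filters=6)
fully_connected = dense_weights(32 * 32 * 1, 28 * 28 * 6)
inputs_per_output = 5 * 5 * 1  # sparse connectivity: one output sees only one window

print(shared)             # 150
print(fully_connected)    # 4816896
print(inputs_per_output)  # 25 (instead of all 1024 inputs)
```

The convolution needs 150 weights where the dense layer would need almost five million, and each output value depends on only 25 of the 1024 input pixels.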

Points to note

1. After each convolution, the output is sent to an activation function to get better features and to keep the values non-negative, for instance ReLU.

2. Sparse connectivity and weight sharing are the main reasons a convolutional neural network works.

3. The number of filters in each layer, the padding, the stride, and the filter dimensions are chosen by running a series of experiments. Don't worry about that for now and focus on building the foundation. One day you will run those experiments yourself and build a more productive architecture!
