CNN for deep learning | Convolutional neural networks

Introduction

In the last few decades, deep learning has proven to be a very powerful tool because of its ability to handle large amounts of data. Interest in using hidden layers has surpassed traditional techniques, especially in pattern recognition. One of the most popular deep neural networks is the convolutional neural network (CNN).

[Image: a convolutional neural network identifying the image of a bird]

Since the 1950s, the early days of AI, researchers have struggled to create a system that can understand visual data. In the following years, this field became known as computer vision. In 2012, computer vision took a quantum leap when a group of researchers from the University of Toronto developed an AI model that outperformed the best image recognition algorithms by a wide margin.

The AI system, which became known as AlexNet (named after its main creator, Alex Krizhevsky), won the 2012 ImageNet computer vision contest with an astonishing accuracy of 85 percent. The runner-up scored a modest 74 percent on the test.

At the heart of AlexNet were convolutional neural networks, a special type of neural network that roughly mimics human vision. Over the years, CNNs have become a very important part of many computer vision applications and, thus, a part of any online computer vision course. So let's take a look at how CNNs work.

CNN Background

CNNs were first developed and used in the 1980s. The most a CNN could do at that time was recognize handwritten digits. They were mostly used in the postal sector to read zip codes, pin codes, etc. The important thing to remember about any deep learning model is that it requires a large amount of data to train and also a lot of computing resources. This was a major drawback for CNNs at that time, so they stayed limited to the postal sector and failed to enter the wider world of machine learning.

In 2012, Alex Krizhevsky realized that it was time to bring back the branch of deep learning that uses multilayered neural networks. The availability of large datasets, more specifically the ImageNet dataset with millions of labeled images, and an abundance of computing resources enabled researchers to revive CNNs.

What exactly is a CNN?

In deep learning, a convolutional neural network (CNN / ConvNet) is a class of deep neural networks, most commonly applied to analyzing visual imagery. Now, when we think of a neural network, we think of matrix multiplications, but that is not the case with a ConvNet. It uses a special technique called convolution. In mathematics, convolution is an operation on two functions that produces a third function expressing how the shape of one is modified by the other.
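To make that definition a bit more concrete, here is a minimal sketch of the discrete version of convolution applied to two short sequences. The function name `convolve_1d` and the sample values are assumptions made for illustration; NumPy's `np.convolve` computes the same thing.

```python
import numpy as np

def convolve_1d(f, g):
    """Full discrete convolution: (f * g)[n] = sum over k of f[k] * g[n - k]."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    out = np.zeros(len(f) + len(g) - 1)
    for k, f_k in enumerate(f):
        # Each sample of f adds a shifted, scaled copy of g to the result.
        out[k:k + len(g)] += f_k * g
    return out

signal = [1, 2, 3]      # illustrative values
kernel = [0, 1, 0.5]    # illustrative values
result = convolve_1d(signal, kernel)
print(result)
```

With these inputs the result is `[0, 1, 2.5, 4, 1.5]`, which matches `np.convolve(signal, kernel)`.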

But we don't really need to dig into the math to understand what a CNN is or how it works.

The bottom line is that the role of a ConvNet is to reduce images to a form that is easier to process, without losing the features that are critical for getting a good prediction.

How does it work?

Before looking at how a CNN works, let's cover the basics, such as what an image is and how it is represented. An RGB image is nothing more than an array of pixel values that has three planes, whereas a grayscale image is the same but has only one plane.
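As a quick illustration of that representation, here is a small sketch using NumPy arrays. The 4 × 4 size and the random pixel values are placeholders chosen for this example, not data from the article.

```python
import numpy as np

# Hypothetical 4 x 4 images filled with placeholder pixel values (0-255).
height, width = 4, 4
rng = np.random.default_rng(seed=0)

# Grayscale: a single plane of intensities.
gray_image = rng.integers(0, 256, size=(height, width), dtype=np.uint8)

# RGB: three planes (red, green, blue) stacked along the last axis.
rgb_image = rng.integers(0, 256, size=(height, width, 3), dtype=np.uint8)

# A single plane of the RGB image has the same shape as a grayscale image.
red_plane = rgb_image[:, :, 0]

print(gray_image.shape)  # (4, 4)
print(rgb_image.shape)   # (4, 4, 3)
print(red_plane.shape)   # (4, 4)
```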

To keep things simple, let's stick with grayscale images as we try to understand how CNNs work.

In a convolution, we take a filter / kernel (a 3 × 3 matrix) and apply it to the input image to get the convolved feature. This convolved feature is passed on to the next layer.
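The filter step described above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming stride 1 and no padding; like most deep learning libraries, it slides the kernel without flipping it (strictly speaking, a cross-correlation). The function name `convolve2d_valid` and the sample image and kernel values are made up for this example.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and sum the products."""
    image = np.asarray(image, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    k_h, k_w = kernel.shape
    out_h = image.shape[0] - k_h + 1
    out_w = image.shape[1] - k_w + 1
    feature = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # The part of the image currently covered by the kernel.
            patch = image[i:i + k_h, j:j + k_w]
            feature[i, j] = np.sum(patch * kernel)
    return feature

# Illustrative 5 x 5 grayscale image and 3 x 3 kernel.
image = np.array([
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
])
kernel = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
])
convolved_feature = convolve2d_valid(image, kernel)
print(convolved_feature)
```

For this input the 5 × 5 image shrinks to a 3 × 3 convolved feature: `[[4, 3, 4], [2, 4, 3], [2, 3, 4]]`.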

In the case of an RGB color image, the kernel has one plane per channel; each plane is applied to its own channel and the results are added together to give the convolved feature.
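For a color image, a common approach is to give the kernel one plane per channel and sum the per-channel products into a single value per position. Here is a minimal sketch of that idea, assuming stride 1, no padding, and no bias term; the function name `convolve_rgb_valid` and the all-ones image and kernel are placeholders for illustration.

```python
import numpy as np

def convolve_rgb_valid(image, kernel):
    """image: (H, W, 3); kernel: (k_h, k_w, 3). Returns a single 2D feature map."""
    image = np.asarray(image, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    k_h, k_w, _ = kernel.shape
    out_h = image.shape[0] - k_h + 1
    out_w = image.shape[1] - k_w + 1
    feature = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + k_h, j:j + k_w, :]
            # Multiply every channel by its kernel plane, then sum over all three.
            feature[i, j] = np.sum(patch * kernel)
    return feature

rgb_image = np.ones((5, 5, 3))  # placeholder image: every pixel value is 1
kernel = np.ones((3, 3, 3))     # placeholder kernel: every weight is 1
feature_map = convolve_rgb_valid(rgb_image, kernel)

print(feature_map.shape)  # (3, 3)
print(feature_map[0, 0])  # 27.0
```

Notice that three input planes still produce one output plane: each output value here is the sum of 3 × 3 × 3 = 27 products.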

Convolutional neural networks are composed of multiple layers of artificial neurons. Artificial neurons, a rough imitation of their biological counterparts, are mathematical functions that calculate the weighted sum of multiple inputs and output an activation value. When you feed an image into a ConvNet, each layer generates several activation maps that are passed on to the next layer.

The first layer usually extracts basic features such as horizontal or diagonal edges. This output is passed on to the next layer, which detects more complex features such as corners or combinations of edges. As we move deeper into the network, it can identify even more complex features such as objects, faces, etc.
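A single artificial neuron of this kind can be sketched as follows. The bias term and the choice of ReLU as the activation are assumptions for this example (the article does not name a specific activation), and the input and weight values are made up.

```python
import numpy as np

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through ReLU."""
    weighted_sum = np.dot(inputs, weights) + bias
    # ReLU (an assumed activation): negative sums become 0.
    return max(0.0, float(weighted_sum))

inputs = np.array([0.5, -1.0, 2.0])   # illustrative inputs
weights = np.array([0.4, 0.3, 0.1])   # illustrative weights
bias = 0.1
activation = neuron(inputs, weights, bias)
print(activation)
```

Here the weighted sum is 0.2 - 0.3 + 0.2 + 0.1 = 0.2, so the neuron outputs roughly 0.2; a negative sum would be clipped to 0.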

[Image: neural network layers visualization]

Based on the activation map of the final convolution layer, the classification layer outputs a set of confidence scores (values between 0 and 1) that specify how likely the image is to belong to a “class”. For instance, if you have a ConvNet that detects cats, dogs, and horses, the output of the final layer is the probability that the input image contains each of those animals.
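One common way to get confidence scores like these is the softmax function, which is an assumption here since the article does not say which function the classification layer uses. A minimal sketch, with hypothetical raw scores for the three classes:

```python
import numpy as np

def softmax(scores):
    """Turn raw class scores into values between 0 and 1 that sum to 1."""
    scores = np.asarray(scores, dtype=float)
    shifted = scores - np.max(scores)  # subtracting the max avoids overflow in exp
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores)

classes = ["cat", "dog", "horse"]
raw_scores = [2.0, 1.0, 0.1]  # hypothetical outputs of the last layer
confidences = softmax(raw_scores)
predicted = classes[int(np.argmax(confidences))]

for name, score in zip(classes, confidences):
    print(f"{name}: {score:.3f}")
print("predicted:", predicted)
```

The class with the highest raw score (here "cat") also gets the highest confidence, and the three confidences add up to 1.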

What is a pooling layer?

Similar to the convolutional layer, the pooling layer is responsible for reducing the spatial size of the convolved feature. This decreases the computational power required to process the data by reducing its dimensions. There are two types of pooling: average pooling and max pooling. I have only had experience with max pooling so far and have not faced any difficulties.

In max pooling, we find the maximum pixel value in the part of the image covered by the kernel. Max pooling also works as a noise suppressant. It discards noisy activations altogether and performs denoising along with dimensionality reduction.

Average pooling, on the other hand, returns the average of all the values in the part of the image covered by the kernel. Average pooling simply performs dimensionality reduction as its noise-suppression mechanism. Hence, we can say that max pooling performs much better than average pooling at suppressing noise.
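Both kinds of pooling can be sketched with one small helper. This is a minimal illustration, assuming non-overlapping 2 × 2 windows (stride equal to the window size); the function name `pool2d` and the sample feature values are made up for this example.

```python
import numpy as np

def pool2d(feature, size=2, mode="max"):
    """Pool non-overlapping size x size windows (stride equal to the window size)."""
    feature = np.asarray(feature, dtype=float)
    out_h = feature.shape[0] // size
    out_w = feature.shape[1] // size
    # Drop rows/columns that do not fill a whole window, then split into blocks.
    trimmed = feature[:out_h * size, :out_w * size]
    blocks = trimmed.reshape(out_h, size, out_w, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    if mode == "average":
        return blocks.mean(axis=(1, 3))
    raise ValueError(f"unknown mode: {mode!r}")

# Illustrative 4 x 4 convolved feature.
feature = np.array([
    [1, 3, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
])
max_pooled = pool2d(feature, mode="max")
avg_pooled = pool2d(feature, mode="average")
print(max_pooled)
print(avg_pooled)
```

For this input, max pooling gives `[[6, 8], [3, 4]]` and average pooling gives `[[3.75, 5.25], [2, 2]]`. Either way, the 4 × 4 feature is reduced to 2 × 2.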

Limitations

Despite their power and resource complexity, CNNs provide in-depth results. At the root of it all, a CNN is simply recognizing patterns and details that are so tiny and inconspicuous that they go unnoticed by the human eye. But when it comes to understanding the content of an image, it fails.

Let's take a look at this example. When we pass the image below to a CNN, it detects a person of around 30 and a child of probably around 10. But when we look at the same picture, we start thinking of multiple different scenarios. Maybe it's a father and son day out, a picnic, or maybe they're camping. Maybe it's a school ground and the boy scored a goal, and his dad is so happy that he picks him up.

[Image: a father and son having a good time in a park]

These limitations are more than evident when it comes to practical applications. For instance, CNNs are widely used to moderate content on social media. But despite the vast repositories of images and videos they were trained on, they still cannot completely block and remove inappropriate content. In one case, Facebook flagged a 30,000-year-old statue as nudity.

Several studies have shown that CNNs trained on ImageNet and other popular datasets fail to detect objects when they are seen under different lighting conditions and from new angles.

Does this mean that CNNs are useless? No. Despite the limits of convolutional neural networks, there is no denying that they have caused a revolution in artificial intelligence. Today, CNNs are used in many computer vision applications such as facial recognition, image search and editing, augmented reality, and more. As advances in convolutional neural networks show, our achievements are remarkable and useful, but we're still a long way from replicating the key components of human intelligence.

Thank you for reading! If you enjoyed this article, please share it to help others find it! Feel free to leave a comment 💬 below. You can connect with me on GitHub and LinkedIn.

Do you have feedback? Let's be friends on Twitter.

All the best and happy coding! 😀

The media shown in this article is not the property of Analytics Vidhya and is used at the author's discretion.
