Puja Govind
Technical Content Writer
This website showcases the technical writing work I've done across various engineering fields. If interested in hiring me, please drop a mail to ambalgekarpuja@gmail.com.

What are Convolutional Neural Networks?

June 22, 2023
Introduction to CNN

Yann LeCun, the founding director of Facebook’s AI Research group, is widely regarded as a pioneer of convolutional neural networks. In the late 1980s he built LeNet, one of the first convolutional neural networks. LeNet was used for character recognition tasks like reading zip codes and digits.

Have you ever wondered how facial recognition works on social media, how object detection helps in building self-driving cars, or how disease detection is done using visual imagery in healthcare? It’s all possible thanks to convolutional neural networks (CNNs). Here’s an example that illustrates how they work:

Imagine there’s an image of a bird, and you want to identify whether it’s really a bird or some other object. The first thing you do is feed the pixels of the image, in the form of arrays, to the input layer of the neural network (a multi-layer network used for classification). The hidden layers then carry out feature extraction by performing different calculations and manipulations.
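To make "pixels in the form of arrays" a little more concrete, here is a minimal sketch using NumPy. The 4x4 size, the pixel values, and the (batch, height, width, channels) layout are assumptions chosen for illustration; real images are far larger, and frameworks differ on axis order.

```python
import numpy as np

# A tiny hand-made 4x4 grayscale "image": each entry is a pixel intensity (0-255).
image = np.array([
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
], dtype=np.uint8)

# Scale intensities to the range [0, 1], a common preparation step.
x = image.astype(np.float32) / 255.0

# Add a batch axis in front and a channel axis at the end (assumed layout).
batch = x[np.newaxis, :, :, np.newaxis]

print(batch.shape)  # (1, 4, 4, 1)
```

This array is what the input layer would receive; everything after it operates on numbers like these rather than on a picture as such.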

There are multiple hidden layers, such as the convolution layer, the ReLU layer, and the pooling layer, that perform feature extraction from the image. Finally, there’s a fully connected layer that identifies the object in the image.
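Of the layers listed above, the ReLU layer is the simplest to show. ReLU (rectified linear unit) replaces every negative value with zero and leaves positive values alone. The sketch below assumes NumPy, and the input values are made up for illustration.

```python
import numpy as np

def relu(feature_map: np.ndarray) -> np.ndarray:
    """Replace negative values with 0 and keep positive values unchanged."""
    return np.maximum(feature_map, 0)

# Hypothetical output of a convolution layer.
feature_map = np.array([[-3.0, 1.5],
                        [0.0, -0.5]])

activated = relu(feature_map)
print(activated)
```

Only the 1.5 survives here; the negative entries become 0, so later layers see only the positive responses.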

What is a Convolutional Neural Network?

A convolutional neural network is a feed-forward neural network that is generally used to analyze visual images by processing data with a grid-like topology. It’s also known as a ConvNet. A convolutional neural network is used to detect and classify objects in an image.

CNNs are deep learning networks designed for processing structured arrays of data such as images. They are widely used in computer vision and have become the state of the art for many visual applications such as image classification. They have also found success in natural language processing, for example in text classification.

What does a convolutional neural network (CNN) do differently?

A convolutional neural network is a specific kind of neural network with multiple layers. It processes data that has a grid-like arrangement and then extracts important features. One huge advantage of using CNNs is that you don't need to do a lot of pre-processing on images.

With most algorithms that handle image processing, the filters are typically created by an engineer based on heuristics. CNNs can learn which characteristics in the filters are the most important. That saves a lot of time and trial-and-error work, and because the same small filter is reused across the whole image, we don't need as many parameters.

It doesn't seem like huge savings until you are working with high-resolution images that have millions of pixels. The main purpose of a convolutional neural network is to get data into forms that are easier to process without losing the features that are important for figuring out what the data represents. This also makes CNNs great candidates for handling huge datasets.
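A bit of arithmetic shows how large the savings can be. The numbers below are assumptions picked for illustration: a 1000 x 1000 RGB image, a fully connected layer with 100 units, and a convolution layer with 32 filters of size 3 x 3.

```python
# Assumed input: a 1000 x 1000 RGB image (3 colour channels).
height, width, channels = 1000, 1000, 3

# Fully connected layer: every input value connects to each of the 100 units.
dense_units = 100
dense_params = height * width * channels * dense_units + dense_units  # weights + biases

# Convolution layer: 32 filters of size 3 x 3, reused at every image position.
num_filters, kernel_size = 32, 3
conv_params = kernel_size * kernel_size * channels * num_filters + num_filters

print(f"dense layer: {dense_params:,} parameters")  # 300,000,100
print(f"conv layer:  {conv_params:,} parameters")   # 896
```

The convolution layer's parameter count does not depend on the image size at all, which is why the gap widens as resolution grows.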

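A big difference between a CNN and a regular neural network is that a CNN uses convolutions to handle the math behind the scenes. Convolution is used instead of general matrix multiplication in at least one layer of the CNN. Mathematically, a convolution takes two functions and returns a third. In a CNN the two inputs are the image and a small filter, and the result is a map of how strongly the filter responds at each position.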
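Here is a minimal sketch of that operation on a toy image, assuming NumPy. The filter values are hand-picked for illustration (a CNN would learn them), and like most deep learning libraries the sketch does not flip the filter, so strictly speaking it computes a cross-correlation.

```python
import numpy as np

def conv2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide `kernel` over `image` (no padding, stride 1) and return the result."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)  # weighted sum of the patch
    return out

# Toy image: dark on the left, bright on the right.
image = np.array([
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
], dtype=float)

# Hand-made filter that responds where brightness increases from left to right.
kernel = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
], dtype=float)

feature_map = conv2d(image, kernel)
print(feature_map)
# [[0. 3. 3. 0.]
#  [0. 3. 3. 0.]]
```

The output is zero over the flat regions and peaks where the dark and bright halves meet, which is the sense in which a filter "detects" a feature.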

A CNN works by applying filters to your input data. What makes CNNs so special is that they tune the filter values as training happens: each training step adjusts the filters slightly so that the network's predictions improve. This works even when you have huge data sets, like with images.
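The sketch below illustrates this idea on a toy problem, assuming NumPy. It is not how a full CNN is trained in practice (there is only one filter, one image, and plain gradient descent on a mean squared error), and the image, the "unknown" filter, the learning rate, and the step count are all made-up choices. It only shows that a filter can start out uninformative and be tuned from data.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid cross-correlation (no padding, stride 1), as used in CNN layers."""
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for a in range(kh):
        for b in range(kw):
            out += kernel[a, b] * image[a:a + out_h, b:b + out_w]
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((12, 12))            # toy input image
true_kernel = np.array([[-1.0, 0.0, 1.0]] * 3)   # filter we pretend not to know
target = conv2d(image, true_kernel)              # desired output for this toy task

kernel = np.zeros((3, 3))                        # start with an uninformative filter
learning_rate = 0.1
out_h, out_w = target.shape

for step in range(500):
    error = conv2d(image, kernel) - target       # prediction error at each position
    # Gradient of the mean squared error with respect to each filter weight.
    grad = np.zeros_like(kernel)
    for a in range(3):
        for b in range(3):
            grad[a, b] = 2.0 * np.mean(error * image[a:a + out_h, b:b + out_w])
    kernel -= learning_rate * grad               # gradient descent update

print(np.round(kernel, 3))
```

After training, the learned filter should closely match the hand-made one, even though nobody wrote its values in by hand.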

Since the filters can be updated to train the CNN better, this removes the need for hand-created filters. That gives us more flexibility in the number of filters we can apply to a data set and the relevance of those filters. Using this algorithm, we can work on more sophisticated problems like face recognition.

A lack of data is one of the things that keeps CNNs from being applied to many problems. While networks can be trained with relatively few data points (roughly 10,000 or more), the more data there is available, the better tuned the CNN will be.

Just keep in mind that these data points have to be clean and labelled in order for a CNN to be able to use them. Collecting and labelling that much data is what makes CNNs so expensive to work with.

How do Convolutional Neural Networks work?

Convolutional neural networks are loosely inspired by neuroscience findings about how the visual cortex processes images. They are made of layers of artificial neurons called nodes. These nodes are functions that calculate the weighted sum of their inputs, and together the nodes of a layer produce an activation map. This is the convolution part of the neural network.

Each node in a layer is defined by its weight values. When you give a layer some data, like an image, it takes the pixel values and picks out some of the visual features.
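As a small illustration of a single node, the sketch below computes a weighted sum of three inputs plus a bias, assuming NumPy. The input values, weights, and bias are hypothetical; in a trained network the weights and bias are learned, and an activation such as ReLU is usually applied to the result.

```python
import numpy as np

def node_output(inputs: np.ndarray, weights: np.ndarray, bias: float) -> float:
    """Return the weighted sum of the inputs plus a bias term."""
    return float(np.dot(weights, inputs) + bias)

inputs = np.array([0.5, 0.2, 0.9])    # e.g. three pixel intensities
weights = np.array([1.0, -2.0, 0.5])  # hypothetical weights that define the node
bias = 0.1

result = node_output(inputs, weights, bias)
print(result)  # approximately 0.65
```

Changing the weights changes which input patterns the node responds to, which is what "defined by its weight values" means in practice.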

When you're working with data in a CNN, each layer returns activation maps. These maps point out important features in the data set. If you give the CNN an image, it'll point out features based on pixel values, like colours, and give you an activation map.

Usually, with images, a CNN will initially find the edges in the picture. This rough outline of the image gets passed to the next layer, which starts detecting things like corners and colour groups. That richer description is passed on in turn, and the cycle continues until a prediction is made.

Between convolution layers, a pooling step shrinks each activation map. The most common variant is max pooling, which keeps only the largest value in each small window of the map, so only the most relevant features from the layer are returned. This is what gets passed to each successive layer until you get to the final layer.
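Max pooling can be sketched in a few lines, assuming NumPy. The 2x2 window with no overlap is a common choice but an assumption here, and the activation values are made up.

```python
import numpy as np

def max_pool_2x2(feature_map: np.ndarray) -> np.ndarray:
    """Keep the largest value in each non-overlapping 2x2 window."""
    h, w = feature_map.shape
    assert h % 2 == 0 and w % 2 == 0, "this sketch assumes even height and width"
    # View the map as (h/2, 2, w/2, 2) blocks, then take the max inside each block.
    return feature_map.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Hypothetical activation map from a previous layer.
feature_map = np.array([
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
])

pooled = max_pool_2x2(feature_map)
print(pooled)
# [[4 2]
#  [2 8]]
```

The 4x4 map becomes a 2x2 map that keeps only the strongest response from each region, so later layers have less data to process.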

The last layer of a CNN is the classification layer which determines the predicted value based on the activation map. If you pass a handwriting sample to a CNN, the classification layer will tell you what letter is in the image. This is what autonomous vehicles use to determine whether an object is another car, a person, or some other obstacle.
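One common way to build such a classification layer, though the text above does not name a specific method, is to turn the network's raw scores into probabilities with a softmax and pick the most likely label. The sketch below assumes NumPy; the labels and scores are hypothetical.

```python
import numpy as np

def softmax(scores: np.ndarray) -> np.ndarray:
    """Convert raw scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

labels = ["car", "person", "other obstacle"]
scores = np.array([1.2, 3.1, 0.3])     # hypothetical raw outputs of the last layer

probabilities = softmax(scores)
prediction = labels[int(np.argmax(probabilities))]
print(prediction)  # person
```

The probabilities also give a rough sense of how confident the network is, which matters when a wrong answer is costly.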

Training a CNN is similar to training many other machine learning algorithms. You'll start with some training data that is separate from your test data and you'll tune your weights based on the accuracy of the predicted values. Just be careful that you don't overfit your model.
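Keeping training and test data separate can be as simple as shuffling example indices and slicing them. The sketch below assumes NumPy; the helper name, the 80/20 split, and the dataset size of 1,000 are illustrative assumptions.

```python
import numpy as np

def train_test_split(num_examples: int, test_fraction: float = 0.2, seed: int = 0):
    """Return shuffled, non-overlapping index arrays for training and testing."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(num_examples)
    num_test = int(round(num_examples * test_fraction))
    return indices[num_test:], indices[:num_test]

train_idx, test_idx = train_test_split(num_examples=1000)
print(len(train_idx), len(test_idx))  # 800 200
```

If accuracy on the training examples keeps climbing while accuracy on the held-out test examples stalls or drops, that gap is the usual sign of overfitting.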

Applications of Convolutional Neural Networks

Convolutional neural networks are most widely known for image analysis but they have also been adapted for several applications in other areas of machine learning, such as natural language processing.

Convolutional Neural Networks for Self-Driving Cars

Several companies, such as Tesla and Uber, are using convolutional neural networks as the computer vision component of self-driving cars.

A self-driving car’s computer vision system must be capable of localization, obstacle avoidance, and path planning. 

Let us consider the case of pedestrian detection. A pedestrian is a kind of obstacle which moves. A convolutional neural network must be able to identify the location of the pedestrian and extrapolate their current motion in order to calculate if a collision is imminent.

A convolutional neural network for object detection is slightly more complex than a classification model, in that it must not only classify an object but also return the four coordinates of its bounding box.
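To picture what such a model returns, here is a hypothetical container for one detection. The class name, field names, corner-based box format, and example numbers are all assumptions for illustration; real detection libraries use a variety of formats.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """One detected object: a class label, a confidence score, and a bounding box."""
    label: str
    score: float
    x_min: float  # left edge of the box, in pixels
    y_min: float  # top edge
    x_max: float  # right edge
    y_max: float  # bottom edge

    def center(self) -> tuple:
        """Centre point of the box, useful for tracking position over time."""
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)

    def area(self) -> float:
        """Area of the box in square pixels."""
        return (self.x_max - self.x_min) * (self.y_max - self.y_min)

pedestrian = Detection("pedestrian", 0.93, x_min=120, y_min=80, x_max=160, y_max=200)
print(pedestrian.center())  # (140.0, 140.0)
print(pedestrian.area())    # 4800
```

A classification model would return only the label and score; the four extra numbers are what make detection the harder task.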

Furthermore, the convolutional neural network designer must avoid unnecessary false alarms for irrelevant objects, such as litter, but also take into account the high cost of miscategorizing a true pedestrian and causing a fatal accident.
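One simple way to reason about this trade-off is to attach a cost to each kind of mistake and compare confidence thresholds. Every number below is hypothetical: the error counts per threshold are invented, and the 100-to-1 cost ratio is only an assumption expressing that a missed pedestrian is far worse than a false alarm.

```python
# Hypothetical validation results: for each confidence threshold, the number of
# false alarms (e.g. litter flagged as a pedestrian) and of missed pedestrians.
results = {
    0.3: {"false_alarms": 40, "missed": 1},
    0.5: {"false_alarms": 15, "missed": 4},
    0.7: {"false_alarms": 5, "missed": 12},
}

# Assumed relative costs of each kind of mistake.
COST_FALSE_ALARM = 1
COST_MISSED = 100

def total_cost(counts: dict) -> int:
    """Weighted sum of both kinds of error for one threshold."""
    return counts["false_alarms"] * COST_FALSE_ALARM + counts["missed"] * COST_MISSED

best_threshold = min(results, key=lambda t: total_cost(results[t]))
print(best_threshold)  # 0.3
```

With these made-up numbers the lowest threshold wins: accepting more false alarms is worth it when each miss is weighted so heavily.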

A major challenge for this kind of use is collecting labelled training data. Google’s reCAPTCHA system is used for authentication on websites, where a user is asked to categorise images as fire hydrants, traffic lights, cars, and so on. This is also a useful way to collect labelled training images for purposes such as self-driving cars and Google Street View.

Convolutional Neural Networks for Drug Discovery

The first stage of a drug development program is drug discovery, where a pharmaceutical company identifies candidate compounds which are more likely to interact with the body in a certain way. Testing candidate molecules in preclinical or clinical trials is expensive, so it is advantageous to be able to screen molecules as early as possible.

Proteins which play an important role in disease are known as ‘targets’. There are targets that can cause inflammation or help tumours grow. The goal of drug discovery is to identify molecules that will interact with the target for a particular disease. The drug molecule must have the appropriate shape to interact with the target and bind to it, like a key fitting in a lock.

The San Francisco-based startup Atomwise developed an algorithm called AtomNet, based on a convolutional neural network, which was able to analyse and predict interactions between molecules. Without being taught the rules of chemistry, AtomNet was able to learn essential organic chemical interactions.

Atomwise was able to use AtomNet to identify lead candidates for drug research programs. AtomNet successfully identified a candidate treatment for the Ebola virus: a molecule that had not previously been known to have any antiviral activity. The molecule later went on to preclinical trials.

Conclusion

Convolutional neural networks are multi-layer neural networks that are really good at getting the features out of data. They work well with images and they don't need a lot of pre-processing.

By using convolutions and pooling to reduce an image to its basic features, a CNN can classify images accurately.

Because the same filters are reused across the whole image, CNN models have fewer parameters than fully connected networks of comparable size, which makes them easier to train. You won't need a huge number of hidden layers because the convolutions handle much of the feature discovery for you.

One of the cool things about CNNs is the number of complex problems they can be applied to. From self-driving cars to detecting diabetes, CNNs can process this kind of data and provide accurate predictions.