Facebook AI Open-Sources the DEtection TRansformer (DETR)

Contents

Introduction

Occasionally, a library or machine learning framework comes along that changes the landscape of the field. Recently, Facebook open-sourced such a framework: DETR, or DEtection TRansformer.


In this post, we will quickly understand the concept of object detection and then we will dive directly into DETR and what it brings.

Object detection at a glance

In Computer Vision, object detection is a task where we want our model to distinguish foreground objects from the background and predict the locations and categories of the objects present in the image. Current deep learning approaches treat object detection as a classification problem, a regression problem, or both.

As an example, in the R-CNN algorithm, several regions of interest are first identified in the input image. These regions are then classified as objects or background and, finally, a regression model generates the bounding boxes for the identified objects.

The YOLO (You Only Look Once) framework, on the other hand, handles object detection differently. It takes in the entire image in a single pass and predicts the bounding box coordinates and class probabilities for those boxes.

For more information on object detection, see these posts:

Introducing Facebook AI's DEtection TRansformer (DETR)

As you saw in the previous section, current deep learning algorithms perform object detection in multiple steps. They also suffer from the problem of near-duplicate predictions, in other words, false positives. To simplify this pipeline, Facebook AI researchers have devised DETR, an innovative and efficient approach to the object detection problem.

The original paper is here, the open-source code is here, and you can consult the Colab notebook here.

detection transformer

Source: https://arxiv.org/pdf/2005.12872.pdf

This new model is quite simple, and you don't need to install any extra library to use it. DETR treats object detection as a direct set prediction problem, with the help of a transformer-based encoder-decoder architecture. By set, I mean the set of bounding boxes. Transformers are the new generation of deep learning models that have performed outstandingly in the domain of NLP.
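The "direct set prediction" idea relies on matching the model's fixed set of predictions one-to-one against the ground-truth objects with a bipartite (Hungarian) matching. The sketch below is only a rough illustration of that matching step, not DETR's actual loss (which combines class-probability and box terms): it uses a simple L1 cost between made-up predicted and ground-truth boxes, and SciPy's Hungarian-algorithm implementation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical predicted boxes (normalized cx, cy, w, h) and ground truths.
pred_boxes = np.array([[0.5, 0.5, 0.2, 0.2],
                       [0.1, 0.1, 0.1, 0.1],
                       [0.8, 0.8, 0.3, 0.3]])
gt_boxes = np.array([[0.82, 0.78, 0.3, 0.3],
                     [0.5, 0.5, 0.25, 0.2]])

# Pairwise L1 cost between every prediction and every ground truth: shape (3, 2).
cost = np.abs(pred_boxes[:, None, :] - gt_boxes[None, :, :]).sum(-1)

# Hungarian matching: each ground truth gets exactly one prediction;
# unmatched predictions would be trained toward a "no object" class.
pred_idx, gt_idx = linear_sum_assignment(cost)
matches = [(int(i), int(j)) for i, j in zip(pred_idx, gt_idx)]
print(matches)  # prediction 0 matches ground truth 1, prediction 2 matches ground truth 0
```

Because the matching is one-to-one, two predictions can never be "paid" for the same object, which is how DETR avoids the duplicate detections that other pipelines suppress with post-processing.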

The authors of the paper evaluated DETR on one of the most popular object detection datasets, COCO, against a very competitive Faster R-CNN baseline.

In the results, DETR achieved comparable performance. More precisely, DETR demonstrates significantly better performance on large objects. However, it didn't work as well on small objects. I'm sure the researchers will figure that out very soon.

DETR architecture

The general architecture of DETR is quite simple to understand. It contains three main components:

  • a CNN backbone
  • an encoder-decoder transformer
  • a simple feedforward network

object detection transformer

Source: https://arxiv.org/pdf/2005.12872.pdf

Here, the CNN backbone generates a feature map from the input image. The output of the CNN backbone is then flattened into a one-dimensional feature sequence and passed to the transformer encoder as input. The output of this encoder is N fixed-length embeddings (vectors), where N is the number of objects the model assumes are in the image.
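The flattening step above can be pictured with plain array operations. This sketch (the shapes are illustrative, not DETR's real dimensions) turns a CNN feature map of shape (channels, height, width) into the (sequence_length, channels) token sequence a transformer encoder expects:

```python
import numpy as np

# Hypothetical backbone output: 256 channels on a 7x7 spatial grid.
feature_map = np.random.rand(256, 7, 7)

# Collapse the spatial grid into a sequence: (C, H, W) -> (H*W, C).
c, h, w = feature_map.shape
tokens = feature_map.reshape(c, h * w).T

print(tokens.shape)  # (49, 256): 49 spatial positions, each a 256-dim token
```

Each row is one spatial position of the feature map, so the encoder's attention can relate any two locations in the image directly, regardless of distance.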

The transformer decoder then decodes these embeddings into bounding box coordinates with the help of self-attention and encoder-decoder attention mechanisms.

Finally, feedforward neural networks predict the normalized center coordinates, height, and width of the bounding boxes, and a linear layer predicts the class label using a softmax function.
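A toy version of those two heads might look like the following. The weights and sizes here are made up for illustration (DETR's real box head is a small multi-layer network): a sigmoid keeps the box outputs in [0, 1] as normalized coordinates, and a softmax turns the class scores into probabilities, with one extra "no object" class for unmatched predictions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
embedding = rng.normal(size=256)           # one decoder output embedding
w_box = rng.normal(size=(4, 256)) * 0.1    # box head: 4 outputs (cx, cy, w, h)
w_cls = rng.normal(size=(92, 256)) * 0.1   # class head: 91 COCO classes + "no object"

box = sigmoid(w_box @ embedding)    # normalized center, width, height in [0, 1]
probs = softmax(w_cls @ embedding)  # class probabilities summing to 1

print(box.shape, probs.shape)
```

Each of the N decoder embeddings goes through the same heads independently, producing the final set of (class, box) predictions in one shot.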

Final thoughts

This is a truly exciting framework for all deep learning and computer vision enthusiasts. A big thank you to Facebook for sharing its approach with the community.

Time to buckle up and use this for our next deep learning project!!
