Introduction to object tracking using OpenCV


This article was published as part of the Data Science Blogathon

Introduction

OpenCV is a great tool for working with images and videos. Whether you want to give your photos a black-and-white look from the 90s or perform complex mathematical operations, OpenCV is always ready to serve. If you are into computer vision, knowledge of OpenCV is essential. The library includes more than 2500 optimized algorithms that can be used to perform a wide variety of tasks. It is used by many industry giants such as Google, Microsoft and IBM, and is widely used in research groups. The library supports multiple languages, including Java, C++ and Python.

This article will show you how to accomplish the complex task of object tracking using some of the basic functions of OpenCV.

Consider the example of a soccer game. You have a live broadcast of the match and your task is to track the position of the ball at all times. The task seems simple to an average human, but it is too complex for even the smartest machine. As you may know, computers only understand numbers. They do not understand what an image is, only the pixel values associated with it. Two images that appear exactly the same to the human eye may not be the same for your computer, since even a small change in a single pixel makes a difference. Object tracking is therefore considered one of the most complex tasks in computer vision. Although complex, it is not unattainable.

Object tracking can be done using machine learning as well as deep learning based approaches. The deep learning approach gives better results on complex tasks and is quite widespread, but it requires a lot of training data. ML-based approaches, on the other hand, are fairly straightforward but less widespread. For this article, we are using an ML-based approach in conjunction with various computer vision techniques that we will discuss later in this article.

The technique is widely used in surveillance, security, traffic monitoring, robot vision, video communication and much more. Object tracking also has multiple use cases, such as crowd counting, autonomous vehicles, face detection, etc. Can you think of some more examples where you could use object tracking in your daily life?

Because of its many real-life applications, constant research is being done in this field to achieve higher accuracy and make models more robust.

For this article, we will use this video. As you will see, there is a red ball that moves through a maze, and our task is to detect the location of the ball and find its centroid. There is also a lot of noise in the background (sorry, people) to make the task a little more challenging.

(Screenshot from the video)

1.

First, we import the necessary libraries.

import numpy as np
import cv2

2.

We will define a function that resizes the images to fit our screen in case they are too large. This step is completely optional and you can skip it.

def resize(img):
    return cv2.resize(img,(512,512)) # args: input image, (output_width, output_height)
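
Note that forcing a square 512x512 output distorts the aspect ratio. If you would rather preserve it, a minimal sketch of an alternative helper (the name resize_keep_aspect and the max_side parameter are my own, not from the original article) could look like this:

def resize_keep_aspect(img,max_side=512): # hypothetical helper, not part of the original article
    h,w=img.shape[:2]
    scale=max_side/max(h,w) # scale so the longer side fits within max_side
    return cv2.resize(img,(int(w*scale),int(h*scale)))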

3.

As you may know, videos are made of frames. Frames are the individual still images that together make up the moving picture. The next step is to read those frames using the VideoCapture() function in OpenCV, and by using a while loop we can see the images in motion. You can adjust the video speed using cv2.waitKey(x), which pauses the screen for x milliseconds.

cap=cv2.VideoCapture(vid_file_path)
ret,frame=cap.read()

while ret:
    cv2.imshow("frame",resize(frame))
    key=cv2.waitKey(1)
    if key==ord('q'):
        break
    ret,frame=cap.read() # ret is False once the video ends
cv2.waitKey(0)
cap.release()
cv2.destroyAllWindows()
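
A side note: cv2.waitKey(1) plays the video as fast as your machine can process frames. If you want playback closer to the source frame rate, one option (a sketch, assuming the video file reports its FPS correctly) is to derive the delay from the capture itself:

fps=cap.get(cv2.CAP_PROP_FPS) # frames per second reported by the video file
delay=int(1000/fps) if fps>0 else 1 # milliseconds per frame, falling back to 1
key=cv2.waitKey(delay) # use this in place of cv2.waitKey(1) inside the loop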

4.

OpenCV reads images in BGR format, so we will convert the color space from BGR to HSV. Why HSV and not BGR or any other format?

We use the HSV color format because it separates color information (hue) from lighting intensity, making it less sensitive to minor changes in external lighting. It therefore gives more accurate masks and, thus, better results.

After converting the color space, all we have to do is filter out everything but the red color and create a mask frame.

In HSV format, the shade of red we are tracking falls in the range [0,230,170] to [255,255,220].
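
If you ever need to tune these bounds for a different target color, a quick sketch (the sample BGR value below is just an illustration) is to convert a known BGR color to HSV and center the bounds around the result:

red_bgr=np.uint8([[[0,0,255]]]) # a pure red pixel in BGR
print(cv2.cvtColor(red_bgr,cv2.COLOR_BGR2HSV)) # its HSV value; build l_b and u_b around this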

cap=cv2.VideoCapture(vid_file_path)

ret,frame=cap.read()
l_b=np.array([0,230,170]) # lower hsv bound for red
u_b=np.array([255,255,220]) # upper hsv bound for red

while ret:
    hsv=cv2.cvtColor(frame,cv2.COLOR_BGR2HSV)
    mask=cv2.inRange(hsv,l_b,u_b) # pixels inside the bounds become white, the rest black

    cv2.imshow("frame",resize(frame))
    cv2.imshow("mask",mask)

    key=cv2.waitKey(1)
    if key==ord('q'):
        break
    ret,frame=cap.read() # ret is False once the video ends
cv2.waitKey(0)
cap.release()
cv2.destroyAllWindows()

(Screenshot: the masked frame; image re-sized)

5.

So far, we have created the masked image of the frame and filtered out most of the noise. The next step is to get the boundaries of the ball. For this we will use the concept of contour detection. Contours are nothing more than boundaries that surround our ball. Fortunately, we do not have to find those boundaries on our own, since OpenCV provides a findContours() function that we can use for our purpose. It takes a masked image and returns an array of contours. For more information on contours, you can refer to the OpenCV documentation. Ideally, in our case there should be only one contour, since we only have one ball, but because some people wore red caps, we will get more than one. Can you think of anything to further reduce this noise?

To handle this problem we will use another OpenCV function, cv2.contourArea(). We know that in the masked image the ball has the largest area, and so does its contour. Therefore, we will pick the contour with the largest area.
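
As a side note, the manual loop used in the code below can be replaced by a one-liner that does the same thing; this is just a more idiomatic equivalent:

max_contour=max(contours,key=cv2.contourArea) # contour with the largest area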

Once we have the contour of the ball, we could draw it directly using the cv2.drawContours() function. But for detection tasks, what we generally do is draw a bounding rectangle to show that the object has been detected. To do that, we will use the cv2.boundingRect() function. It returns the coordinates of the rectangle, and then the cv2.rectangle() function draws the rectangle for us.
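
For reference, drawing the contour itself instead of a rectangle would look something like this (color and thickness are arbitrary choices):

cv2.drawContours(frame,[max_contour],-1,(0,255,0),2) # index -1 draws every contour in the list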

cap=cv2.VideoCapture(vid_file_path)

ret,frame=cap.read()
l_b=np.array([0,230,170]) # lower hsv bound for red
u_b=np.array([255,255,220]) # upper hsv bound for red

while ret:
    hsv=cv2.cvtColor(frame,cv2.COLOR_BGR2HSV)
    mask=cv2.inRange(hsv,l_b,u_b)

    contours,_=cv2.findContours(mask,cv2.RETR_TREE,cv2.CHAIN_APPROX_SIMPLE)

    max_contour=contours[0]
    for contour in contours: # keep the contour with the largest area
        if cv2.contourArea(contour)>cv2.contourArea(max_contour):
            max_contour=contour

    approx=cv2.approxPolyDP(max_contour,0.01*cv2.arcLength(max_contour,True),True)
    x,y,w,h=cv2.boundingRect(approx)
    cv2.rectangle(frame,(x,y),(x+w,y+h),(0,255,0),4)

    cv2.imshow("frame",resize(frame))
    cv2.imshow("mask",mask)

    key=cv2.waitKey(1)
    if key==ord('q'):
        break
    ret,frame=cap.read() # ret is False once the video ends
cv2.waitKey(0)
cap.release()
cv2.destroyAllWindows()

(Screenshot: the detected ball inside its bounding rectangle; image re-sized)

6.

Additionally, we can detect the centroid of the ball at the same time. For that, we will use cv2.moments. cv2.moments calculates the weighted average sum of pixel intensities within the contour and thus gives more useful information about the blob, such as its radius, centroid, etc. The centroid is computed as cx = m10/m00 and cy = m01/m00. Make sure to convert the image to binary format before using the function. You can learn more about moments in the OpenCV documentation.
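
One practical caveat the original code does not cover: M['m00'] is the contour area, and if it is ever zero the centroid division fails. A hedged version guards against that case:

M=cv2.moments(max_contour) # spatial moments of the largest contour
if M['m00']!=0: # guard against division by zero on degenerate contours
    cx=int(M['m10']/M['m00'])
    cy=int(M['m01']/M['m00'])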

cap=cv2.VideoCapture(vid_file_path)

ret,frame=cap.read()
l_b=np.array([0,230,170]) # lower hsv bound for red
u_b=np.array([255,255,220]) # upper hsv bound for red

while ret:
    hsv=cv2.cvtColor(frame,cv2.COLOR_BGR2HSV)
    mask=cv2.inRange(hsv,l_b,u_b)

    contours,_=cv2.findContours(mask,cv2.RETR_TREE,cv2.CHAIN_APPROX_SIMPLE)

    max_contour=contours[0]
    for contour in contours: # keep the contour with the largest area
        if cv2.contourArea(contour)>cv2.contourArea(max_contour):
            max_contour=contour

    approx=cv2.approxPolyDP(max_contour,0.01*cv2.arcLength(max_contour,True),True)
    x,y,w,h=cv2.boundingRect(approx)
    cv2.rectangle(frame,(x,y),(x+w,y+h),(0,255,0),4)

    M=cv2.moments(max_contour)
    cx=int(M['m10']/M['m00']) # x coordinate of the centroid
    cy=int(M['m01']/M['m00']) # y coordinate of the centroid
    cv2.circle(frame,(cx,cy),3,(255,0,0),-1) # mark the centroid

    cv2.imshow("frame",resize(frame))
    cv2.imshow("mask",mask)

    key=cv2.waitKey(1)
    if key==ord('q'):
        break
    ret,frame=cap.read() # ret is False once the video ends
cv2.waitKey(0)
cap.release()
cv2.destroyAllWindows()

(Screenshot: the detected ball with its bounding rectangle and centroid; image re-sized)

Where to go from here

In this article, we used per-frame object detection for the object tracking task. Although useful, this may not work well in all cases. While reading the article, several questions may have come to mind. What if there is more than one object in the video? What if the mask images do not help detect the object? What if the object constantly moves in and out of the frame? What if there is no object at all?
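
One direction worth exploring is OpenCV's built-in tracking API, which follows an object between frames instead of re-detecting it in every frame. Below is a minimal sketch; it assumes the opencv-contrib-python package is installed, and depending on your OpenCV version the constructor may live under cv2 or cv2.legacy:

import cv2

cap=cv2.VideoCapture(vid_file_path)
ret,frame=cap.read()

bbox=cv2.selectROI("frame",frame) # draw an initial box around the object by hand
tracker=cv2.TrackerCSRT_create() # may be cv2.legacy.TrackerCSRT_create() in some builds
tracker.init(frame,bbox)

while ret:
    ok,bbox=tracker.update(frame) # success flag and the updated bounding box
    if ok:
        x,y,w,h=[int(v) for v in bbox]
        cv2.rectangle(frame,(x,y),(x+w,y+h),(0,255,0),2)
    cv2.imshow("frame",frame)
    if cv2.waitKey(1)==ord('q'):
        break
    ret,frame=cap.read()
cap.release()
cv2.destroyAllWindows()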

The only way to find out is to try them on your own. You can always tweak the inputs and make the task a little more challenging until the fun stops.

The media shown in this article is not the property of DataPeaker and is used at the author's discretion.
