Object Tracking

Object tracking has long been a challenging problem in computer vision, and new techniques are continually being invented. In this post we build an end-to-end object tracker using two simple techniques to track an object in a video stream. The outcome of the project looks like the demo shown below.



The goal of this post is to develop two object tracking techniques, namely:

  1. Camshift tracking
  2. Correlation tracking

To build an object tracker we first need to select which object to track, so we start by writing code to select the portion of the image (the object) we want to follow.

Project Structure

Object Tracker
├── selectors/
│   ├── box_selector.py
│   ├── __init__.py
├── trackers/
│   ├── camshift.py
│   ├── correlation.py
│   ├── __init__.py
└── track.py
  • selectors/box_selector.py: contains the BoxSelector class, which helps select a region of an image.
  • trackers/camshift.py: contains the CamshiftTracker class, which tracks using the Camshift algorithm.
  • trackers/correlation.py: contains the CorrelationTracker class, which tracks using dlib's correlation tracking technique.
  • track.py: the driver script used to track objects.

The project can be divided into two checkpoints: selecting the region we want to track, and implementing a tracking technique to follow the captured region in a video or live stream.

Selecting regions

Though selecting a region is an easy task that can be achieved with simple OpenCV mouse callbacks, I’ve included the script for completeness. Here we implement a class that can select a box region in an image once initialized. Let’s open up box_selector.py in the selectors package and fill in the code.

import numpy as np
import cv2

class BoxSelector(object):
    def __init__(self, image, window_name, color=(0,0,255)):
        #store image and an original copy
        self.image = image
        self.orig = image.copy()

        #capture start and end point co-ordinates
        self.start = None
        self.end = None

        #flag to indicate tracking
        self.track = False
        self.color = color
        self.window_name = window_name

        #hook callback to the named window
        cv2.namedWindow(self.window_name)
        cv2.setMouseCallback(self.window_name, self.mouseCallBack)
We start off by importing the necessary packages. Then we create a class named BoxSelector whose constructor takes an image and the window_name of the window we want to hook our event callbacks to. A copy of the original image is preserved, and we initialize self.start and self.end as the start and end coordinates of the selection. self.track is a flag that indicates a selection is in progress. Finally we create the window with cv2.namedWindow and hook a mouse callback (self.mouseCallBack) to it with cv2.setMouseCallback, which we define in a moment.

    def mouseCallBack(self, event, x, y, flags, params):
        #start tracking if left button is pressed down
        if event==cv2.EVENT_LBUTTONDOWN:
            self.start = (x,y)
            self.track = True

        #capture/end tracking on mouse move or left button release
        elif self.track and (event==cv2.EVENT_MOUSEMOVE or event==cv2.EVENT_LBUTTONUP):
            self.end = (x,y)
            if not self.start==self.end:
                self.image = self.orig.copy()
                #draw rectangle on the image
                cv2.rectangle(self.image, self.start, self.end, self.color, 2)
                if event==cv2.EVENT_LBUTTONUP:
                    self.track = False
            else:
                #in case of accidental click, reset tracking
                self.image = self.orig.copy()
                self.start = None
                self.track = False

        #show the image so the selection is visible
        cv2.imshow(self.window_name, self.image)

Now we define the mouseCallBack method used as a hook for the mouse event callbacks. It does the following:

  • If the left mouse button is pressed down, it starts a selection, setting self.track to True and storing the starting coordinates in self.start.
  • It captures mouse moves and the left-button release only while the self.track flag is enabled, storing the end coordinates in self.end.
  • It also checks for accidental clicks by comparing the start and end positions; if they are the same, the selection is reset and self.track is set to False.
  • Finally, it shows the image window so the selection is visible.
    @property
    def roiPts(self):
        if self.start and self.end:
            pts = np.array([self.start,self.end])
            s = np.sum(pts,axis=1)
            (x,y) = pts[np.argmin(s)]
            (xb,yb) = pts[np.argmax(s)]
            return [(x,y),(xb,yb)]
        return []

Finally we define a property named roiPts which returns the start and end coordinates of the selection. It ensures the first set of coordinates is always the top-left corner and the second the bottom-right, using numpy.argmin and numpy.argmax over the sums of the points along axis 1.
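As a quick standalone illustration of this ordering trick (a sketch, not part of the project code), note that it returns the corners as (top-left, bottom-right) regardless of which direction the box was dragged:

```python
import numpy as np

def order_pts(start, end):
    #sort the two corners by their coordinate sums: the smallest sum
    #is the top-left corner, the largest the bottom-right
    pts = np.array([start, end])
    s = pts.sum(axis=1)
    return [pts[np.argmin(s)].tolist(), pts[np.argmax(s)].tolist()]

#dragging from bottom-right up to top-left still yields the same ordering
print(order_pts((120, 80), (40, 30)))  # → [[40, 30], [120, 80]]
```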

Tracking objects

Now that we have our selector ready, all we need to do is define a tracker that uses the selected region. For that we use two popular tracking techniques, namely:

  1. Camshift technique
  2. Correlation technique

Camshift tracking

If you do not know what exactly Camshift is, I suggest you go through this link before continuing. Generally the Camshift technique is implemented in the following way:

  • Convert the image from BGR to HSV for better tracking.
  • Extract the patch from the image to track.
  • Construct a histogram for the extracted patch.
  • BackProject the calculated histogram to the image to get the location of the patch/object in the current frame.
  • Finally we get the position by using Camshift algorithm.

So let’s open up camshift.py in the trackers package and start coding.

import numpy as np
import cv2

class CamshiftTracker(object):
    def __init__(self,roiPts):
        #set initial points
        self.roiPts = np.array(roiPts)
        self.hist = None
        self.roiBox = None

        #setup termination criteria: either 10 iterations or move by at least 1 pt
        self.termCrit = (cv2.TERM_CRITERIA_EPS|cv2.TERM_CRITERIA_COUNT,10,1)

We create a class named CamshiftTracker whose constructor takes the initial coordinates of the object we want to track, and then we initialize:

  • roiBox – the object’s location in the previous (initially the selected) frame, of the form (x,y,w,h).
  • hist – the histogram of the patch extracted from the object selection.

Finally we set the termination criterion, which tells the Camshift algorithm to stop after 10 iterations or once the search window moves by less than 1 point.

    def orderPoints(self):
        assert len(self.roiPts)==2
        return self.roiPts[np.argsort(self.roiPts.sum(axis=1))]

Then we define the orderPoints method, which orders the input points so the top-left corner comes first. The code itself is pretty self-explanatory.

    def track(self,image):
        try:
            self.image = image
            hsv = cv2.cvtColor(self.image,cv2.COLOR_BGR2HSV)

            #mask for better tracking
            mask = cv2.inRange(hsv,np.array((0.,60.,32.)),np.array((180.,255.,255.)))

            #order the points and gather top-left and bottom-right points
            pts = self.orderPoints().tolist()
            tl,br = pts

            #if tracking is not yet started, initialize the roiBox and hist
            if self.roiBox is None:
                self.roiBox = (tl[0],tl[1],br[0]-tl[0],br[1]-tl[1])

            if self.hist is None:
                hsv_roi = hsv[tl[1]:br[1],tl[0]:br[0]]
                mask_roi = mask[tl[1]:br[1],tl[0]:br[0]]
                self.hist = cv2.calcHist([hsv_roi],[0],mask_roi,[16],[0,180])
                self.hist = cv2.normalize(self.hist,self.hist,0,255,cv2.NORM_MINMAX)
                self.hist = self.hist.reshape(-1)

            #backproject the histogram onto the current frame
            prob = cv2.calcBackProject([hsv],[0],self.hist,[0,180],1)
            prob &= mask

            #get location of the object in the new frame
            trackBox,self.roiBox = cv2.CamShift(prob, self.roiBox, self.termCrit)
            return self.roiBox

        except Exception:
            return None

And now comes the actual track method, which tracks the object given the current frame of the video stream. We first convert the image to HSV, as it provides better insight into the contents of the image. Then we build a mask from the HSV image to filter out pixels with low saturation or value. Then we order our points and extract the top-left and bottom-right corners.

We set the roiBox to the initial provided co-ordinates to the constructor of the class if the tracking is not yet started, meaning this is the very first frame to track. So we setup the roiBox to be of the form (x,y,w,h) from the top-left and bottom-right coordinates. We also extract the histogram for the selected region with its mask and normalize it and finally flatten it.

Now we are ready for the actual tracking procedure. We calculate the backprojection of the extracted histogram onto the current frame (the HSV-converted frame). Finally we fire up the Camshift algorithm with the backprojected histogram, the previous window, and the termination criterion to get the estimated position of the selected patch/object in the current frame.

Correlation tracking

Though Camshift is pretty awesome for object tracking, correlation tracking tends to be more stable. Correlation tracking is also a bit tedious to implement by hand; fortunately, dlib provides an API for it. Therefore all we need to do here is code a wrapper that looks just like the previous CamshiftTracker class. Let’s open correlation.py in the trackers package and start coding.

import dlib
import numpy as np

class CorrelationTracker(object):

    def __init__(self,roiPts):
        #set initial points
        self.roiPts = np.array(roiPts)
        self.tracker = None

    def orderPoints(self):
        assert len(self.roiPts)==2
        return self.roiPts[np.argsort(self.roiPts.sum(axis=1))]

We start by creating the CorrelationTracker class, whose constructor takes the initial coordinates of the object to track. The orderPoints method is the same as explained before.

    def track(self,image):
        #create a new tracker and start tracking on the first frame
        if self.tracker is None:
            self.tracker = dlib.correlation_tracker()
            pts = self.orderPoints().tolist()
            tl, br = pts
            (x,y,w,h) = (tl[0],tl[1],br[0]-tl[0],br[1]-tl[1])
            self.tracker.start_track(image, dlib.rectangle(x,y,x+w,y+h))

        #update the tracker with the current frame and get the estimated position
        self.tracker.update(image)
        pts = self.tracker.get_position()
        (x,y,xb,yb) = (pts.left(),pts.top(),pts.right(),pts.bottom())

        #return the points of the form (x,y,w,h)
        return np.int0((x,y,(xb-x),(yb-y)))

We create a tracker object using dlib.correlation_tracker and initialize tracking with tracker.start_track, providing the image and a dlib.rectangle built from the selection coordinates. On every subsequent frame we update the tracker with the current frame and return the estimated position of the object in the form (x,y,w,h).
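One small detail worth noting: dlib’s get_position() returns floating-point coordinates, so they need rounding before they can be used as pixel indices. A minimal helper (hypothetical, not part of the project code) might look like this:

```python
def to_xywh(left, top, right, bottom):
    #convert dlib's float (left, top, right, bottom) box into an
    #integer (x, y, w, h) tuple, as the rest of the code expects
    return tuple(int(round(v)) for v in (left, top, right - left, bottom - top))

print(to_xywh(10.4, 20.6, 110.4, 80.6))  # → (10, 21, 100, 60)
```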

Let’s go

Finally we have created our selectors and trackers packages, and we are good to go to build a driver script that uses them to track the objects of our interest. Let’s open track.py and start coding.

import cv2
import time
from selectors import BoxSelector
from trackers import CamshiftTracker,CorrelationTracker
import argparse

ap = argparse.ArgumentParser()
ap.add_argument("-v","--video",help="(optional) video file...")
ap.add_argument("-t","--tracker",default="camshift",help="tracker to use (camshift/correlation)")
args = vars(ap.parse_args())

We start off by importing the necessary packages and parsing our command-line arguments: an optional video file and the tracker to use.

#initialize the capture
if args.get("video",None):
    cap = cv2.VideoCapture(args["video"])
else:
    cap = cv2.VideoCapture(0)

trackers = {"camshift":CamshiftTracker,"correlation":CorrelationTracker}
objTracker = None

If the video argument is not provided, the live webcam stream is captured. We then initialize the trackers dictionary and set objTracker to None.

while True:
    ret,frame = cap.read()
    if not ret:
        break

    image = frame
    bs = BoxSelector(image, "Stream")
    cv2.imshow("Stream", image)
    key = cv2.waitKey(1) & 0xFF

    if key==ord("p"):
        #pause the stream and select the region to track
        key = cv2.waitKey(0) & 0xFF
        bs_pts = bs.roiPts

        #press "p" again to start tracking the selected region
        if key==ord("p") and bs_pts:
            objTracker = trackers[args["tracker"].strip().lower()](bs_pts)

    elif key==ord("q"):
        break

    if objTracker is not None:
        trackPts = objTracker.track(image)
        if trackPts is not None:
            (x,y,w,h) = trackPts
            #draw a rectangle around the estimated position
            cv2.rectangle(image,(x,y),(x+w,y+h),(0,255,0),2)
            cv2.imshow("Stream", image)

cap.release()
cv2.destroyAllWindows()

We loop through the frames and break out of the loop if no frame is read. We then initialize our BoxSelector to capture the region of interest with a mouse selection. The keypress p pauses the video so the region to track can be selected; if p is pressed again, the object tracker given by the --tracker command-line argument is initialized and tracking starts.

From then on, a rectangle is drawn over the estimated position of the selected patch/object at each frame. And with that we have built an object tracker from scratch that works on either a live stream or an optional video file, with a choice of the camshift or correlation trackers. You can get the code for this project from my github.
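Assuming the project layout shown earlier, the driver script can be invoked like this (the video path below is just a placeholder):

```shell
# track an object from the live webcam stream with the default Camshift tracker
python track.py

# track an object in a video file using dlib's correlation tracker
python track.py --video path/to/video.mp4 --tracker correlation
```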