Object tracking has been a challenging problem in the field of computer vision, and new techniques are being invented all the time. In this post we build an end-to-end object tracker using two simple techniques to track an object in a video stream. The outcome of the project might look as shown below.
TASK
The goal of this post is to develop two object tracking techniques, namely Camshift tracking and Correlation tracking.
To build an object tracker we first need to select which object to track, so we also need to write code to select the portion of the image (the object) to track.
Project Structure
```
Object Tracker
├── selectors/
│   ├── box_selector.py
│   └── __init__.py
├── trackers/
│   ├── camshift.py
│   ├── correlation.py
│   └── __init__.py
└── track.py
```
- `selectors/box_selector.py`: contains the `BoxSelector` class, which helps select a region of an image.
- `trackers/camshift.py`: contains the `CamshiftTracker` class, used for tracking with the Camshift algorithm.
- `trackers/correlation.py`: contains the `CorrelationTracker` class, used for tracking with the correlation tracking technique from `dlib`.
- `track.py`: the actual driver script used to track objects.
The project can be divided into two checkpoints: selecting the region we want to track, and implementing a tracking technique that follows the captured region through a video or live stream.
Selecting regions
Though selecting a region is an easy task that can be achieved using simple OpenCV mouse callbacks, I’ve included the script for completeness. Here we implement a class that is capable of selecting a box region in the image. Let’s open up `box_selector.py` in the `selectors` package and fill in the code.
```python
import numpy as np
import cv2


class BoxSelector(object):
    def __init__(self, image, window_name, color=(0, 0, 255)):
        # store image and an original copy
        self.image = image
        self.orig = image.copy()

        # capture start and end point co-ordinates
        self.start = None
        self.end = None

        # flag to indicate tracking
        self.track = False
        self.color = color
        self.window_name = window_name

        # hook callback to the named window
        cv2.namedWindow(self.window_name)
        cv2.setMouseCallback(self.window_name, self.mouseCallBack)
```
We start off by importing the necessary packages. Then we create a class named `BoxSelector` whose constructor takes an `image` and the `window_name` we want to hook our event callbacks to. A copy of the original image is preserved, and we initialize `self.start` and `self.end` as the start and end co-ordinates of the selection. `self.track` is a flag that indicates a selection is in progress. Finally we create the window with `cv2.namedWindow` and hook a mouse callback (`self.mouseCallBack`) to it, which we define in a moment.
```python
def mouseCallBack(self, event, x, y, flags, params):
    # start tracking if the left button is clicked down
    if event == cv2.EVENT_LBUTTONDOWN:
        self.start = (x, y)
        self.track = True

    # capture/end tracking on mouse-move or left-button release
    elif self.track and (event == cv2.EVENT_MOUSEMOVE or event == cv2.EVENT_LBUTTONUP):
        self.end = (x, y)
        if not self.start == self.end:
            self.image = self.orig.copy()
            # draw rectangle on the image
            cv2.rectangle(self.image, self.start, self.end, self.color, 2)
            if event == cv2.EVENT_LBUTTONUP:
                self.track = False
        # in case of an accidental click, reset tracking
        else:
            self.image = self.orig.copy()
            self.start = None
            self.track = False

    cv2.imshow(self.window_name, self.image)
```
Now we define the `mouseCallBack` method, used as the hook for the mouse event callbacks. It does the following things:
- If the mouse left button is clicked down, it starts a selection: `self.track` is set to `True` and the starting coordinates are stored in `self.start`.
- Mouse moves and the left-button release are only captured while the `self.track` flag is enabled, and the end coordinates are stored in `self.end`.
- It also checks for accidental clicks by comparing the `start` and `end` positions; if they are the same, the `self.track` flag is reset to `False`.
- It finally shows the image window to capture the region.
```python
@property
def roiPts(self):
    if self.start and self.end:
        pts = np.array([self.start, self.end])
        s = np.sum(pts, axis=1)
        (x, y) = pts[np.argmin(s)]
        (xb, yb) = pts[np.argmax(s)]
        return [(x, y), (xb, yb)]
    else:
        return []
```
Finally we define a property named `roiPts` which returns the start and end coordinates of the selection. It makes sure the first pair of co-ordinates is always the top-left and the second the bottom-right, using simple `numpy.argmin` and `numpy.argmax` calls over the sums of the points along axis 1.
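To see why this ordering trick works, here is a tiny standalone sketch with a hypothetical pair of points, selected by dragging from bottom-right to top-left:

```python
import numpy as np

# hypothetical selection dragged from bottom-right to top-left
pts = np.array([(200, 160), (120, 80)])

s = np.sum(pts, axis=1)   # [360, 200] -- x + y for each point
tl = pts[np.argmin(s)]    # [120,  80] -> smallest sum is the top-left
br = pts[np.argmax(s)]    # [200, 160] -> largest sum is the bottom-right
```

Whichever direction the box is dragged, the point with the smallest coordinate sum always ends up as the top-left corner.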
Tracking objects
Now that we have our selector ready, all we need to do is define a tracker that uses the selected region. For that we use two popular tracking techniques, namely Camshift tracking and Correlation tracking.
Camshift tracking
If you do not know what exactly Camshift is, I suggest going through this link before continuing. The Camshift technique is generally implemented in the following way (a minimal sketch of these steps follows the list).
- Convert the image from BGR (OpenCV's default ordering) to HSV for better tracking.
- Extract the patch of the image to track.
- Construct a histogram for the extracted patch.
- Back-project the calculated histogram onto the current frame to get a probability map of where the patch/object is.
- Finally, feed the probability map to the Camshift algorithm to get the position.
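Here is a minimal sketch of those five steps on a single frame, assuming a webcam is attached and using a made-up box in place of a real selection:

```python
import cv2

# assumptions: a webcam frame and a hand-picked box (hypothetical values)
cap = cv2.VideoCapture(0)
ret, frame = cap.read()
(x, y, w, h) = (120, 80, 80, 80)

hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)                # 1. convert to HSV
roi = hsv[y:y + h, x:x + w]                                 # 2. extract the patch
hist = cv2.calcHist([roi], [0], None, [16], [0, 180])       # 3. hue histogram
hist = cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
prob = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)   # 4. back-project
term = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
track_box, window = cv2.CamShift(prob, (x, y, w, h), term)  # 5. run Camshift
```

The class we build below wraps exactly this flow, computing the histogram once on the first frame and re-running the conversion, back-projection, and Camshift steps on every subsequent frame.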
So let’s open up `camshift.py` in the `trackers` package and start coding.
```python
import numpy as np
import cv2


class CamshiftTracker(object):
    def __init__(self, roiPts):
        # set initial points
        self.roiPts = np.array(roiPts)
        self.hist = None
        self.roiBox = None

        # termination criteria: either 10 iterations or movement by at least 1 pt
        self.termCrit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
```
We create a class named `CamshiftTracker` whose constructor takes the initial coordinates of the object we want to track, and then we initialize:

- `roiBox` – the object's location in the previous (initially the selected) frame, of the form `(x, y, w, h)`.
- `hist` – the histogram of the patch extracted from the object selection.

And finally we set the termination criterion, which is essential for the Camshift algorithm: it should stop after 10 iterations, or once the search window moves by less than 1 pt.
```python
def orderPoints(self):
    assert len(self.roiPts) == 2
    return self.roiPts[np.argsort(self.roiPts.sum(axis=1))]
```
Then we define the `orderPoints` method, which orders the input points by the sum of their coordinates (top-left first, bottom-right last). The code itself is pretty self-explanatory.
```python
def track(self, image):
    self.image = image
    hsv = cv2.cvtColor(self.image, cv2.COLOR_BGR2HSV)

    # mask for better tracking
    mask = cv2.inRange(hsv, np.array((0., 60., 32.)), np.array((180., 255., 255.)))

    # order the points and gather top-left and bottom-right points
    pts = self.orderPoints().tolist()
    tl, br = pts

    # if tracking is not yet started, initialize the roiBox and hist
    if self.roiBox is None:
        self.roiBox = (tl[0], tl[1], br[0] - tl[0], br[1] - tl[1])
    if self.hist is None:
        hsv_roi = hsv[tl[1]:br[1], tl[0]:br[0]]
        mask_roi = mask[tl[1]:br[1], tl[0]:br[0]]
        self.hist = cv2.calcHist([hsv_roi], [0], mask_roi, [16], [0, 180])
        self.hist = cv2.normalize(self.hist, self.hist, 0, 255, cv2.NORM_MINMAX)
        self.hist = self.hist.reshape(-1)

    try:
        # back-project the histogram onto the current frame
        prob = cv2.calcBackProject([hsv], [0], self.hist, [0, 180], 1)
        prob &= mask

        # get the location of the object in the new frame
        trackBox, self.roiBox = cv2.CamShift(prob, self.roiBox, self.termCrit)
        return self.roiBox
    except Exception:
        return None
```
And now comes the actual `track` method, which tracks the object given the current frame of the video stream. We convert the image to HSV, since hue gives a more stable description of the object's color than raw BGR values. Then we build a mask over the HSV image with `cv2.inRange` to discard pixels with very low saturation or value, whose hue is unreliable. Then we order our points and extract the top-left and bottom-right corners.
If tracking has not yet started, meaning this is the very first frame, we set `roiBox` from the coordinates provided to the constructor, converting the top-left and bottom-right corners to the form `(x, y, w, h)`. We also compute the histogram of the selected region under its mask, normalize it, and finally flatten it.
Now we are ready for the actual tracking procedure. Here we calculate the back-projection of the extracted histogram onto the current (HSV-converted) frame. Finally we fire up the Camshift algorithm with the back-projected probability map, the previous window, and the termination criterion to get the estimated position of the selected patch/object in the current frame.
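As a rough usage sketch, the class can be driven like this (the selection coordinates here are made up; normally they come from `BoxSelector`):

```python
import cv2
from trackers import CamshiftTracker

cap = cv2.VideoCapture(0)
tracker = CamshiftTracker([(120, 80), (200, 160)])  # hypothetical selection

while True:
    ret, frame = cap.read()
    if not ret:
        break

    box = tracker.track(frame)  # (x, y, w, h), or None on failure
    if box is not None:
        (x, y, w, h) = box
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 0, 255), 2)

    cv2.imshow("Camshift", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
```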
Correlation tracking
Though Camshift is pretty awesome for object tracking, correlation tracking tends to be more stable. It is, however, a bit tedious to implement by hand. Fortunately, `dlib` provides an API for correlation tracking, so all we need to do here is code a wrapper with the same interface as the previous `CamshiftTracker` class. Let’s open `correlation.py` in the `trackers` package and start coding.
```python
import dlib
import numpy as np


class CorrelationTracker(object):
    def __init__(self, roiPts):
        # set initial points
        self.roiPts = np.array(roiPts)
        self.tracker = None

    def orderPoints(self):
        assert len(self.roiPts) == 2
        return self.roiPts[np.argsort(self.roiPts.sum(axis=1))]
```
We start by creating the `CorrelationTracker` class, whose constructor takes the initial coordinates of the object to track. The `orderPoints` method is the same as explained before.
```python
def track(self, image):
    # create a new tracker on the first frame
    if self.tracker is None:
        self.tracker = dlib.correlation_tracker()
        pts = self.orderPoints().tolist()
        tl, br = pts
        (x, y, w, h) = (tl[0], tl[1], br[0] - tl[0], br[1] - tl[1])
        roi_pts = [x, y, x + w, y + h]
        self.tracker.start_track(image, dlib.rectangle(*roi_pts))

    # update the tracker with the current frame and get the estimated position
    self.tracker.update(image)
    pts = self.tracker.get_position()
    (x, y, xb, yb) = (pts.left(), pts.top(), pts.right(), pts.bottom())

    # return the points in the form (x, y, w, h)
    return np.int0((x, y, (xb - x), (yb - y)))
```
We create a tracker object using `dlib.correlation_tracker` and initialize the tracking with `tracker.start_track`, providing the image and a `dlib.rectangle` built from the selection. On every subsequent frame we update the tracker and return the estimated position of the object in the form `(x, y, w, h)`.
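For reference, the underlying `dlib` calls in the wrapper boil down to the following sketch (the box coordinates here are hypothetical, and the frames come from a webcam):

```python
import cv2
import dlib

cap = cv2.VideoCapture(0)
ret, first_frame = cap.read()

tracker = dlib.correlation_tracker()
# left=120, top=80, right=200, bottom=160 -- a hypothetical selection
tracker.start_track(first_frame, dlib.rectangle(120, 80, 200, 160))

# on each subsequent frame
ret, next_frame = cap.read()
tracker.update(next_frame)
pos = tracker.get_position()  # a drectangle with float coordinates
print(pos.left(), pos.top(), pos.right(), pos.bottom())
```

Note that `get_position` returns float coordinates, which is why the wrapper casts them to integers before returning.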
Let’s go
We have now created our `selectors` and `trackers` packages, so we are good to go to build a driver script that uses them to track the objects of our interest. Let’s open `track.py` and start coding.
```python
import cv2
import time
from selectors import BoxSelector
from trackers import CamshiftTracker, CorrelationTracker
import argparse

ap = argparse.ArgumentParser()
ap.add_argument("-v", "--video", help="(optional) video file...")
ap.add_argument("-t", "--tracker", default="camshift",
                help="tracker to use (camshift/correlation)")
args = vars(ap.parse_args())
```
We start off by importing the necessary packages and parsing our command-line arguments: an optional video file and the tracker to use.
```python
# initialize the capture
if args.get("video", None):
    cap = cv2.VideoCapture(args["video"])
else:
    cap = cv2.VideoCapture(0)

trackers = {"camshift": CamshiftTracker, "correlation": CorrelationTracker}
objTracker = None
```
If the video argument is not provided, the live webcam stream is captured. We then initialize the `trackers` dictionary, mapping each tracker name to its class, and set `objTracker` to `None`.
```python
while True:
    ret, frame = cap.read()
    time.sleep(0.025)

    if not ret:
        break

    image = frame
    bs = BoxSelector(image, "Stream")
    cv2.imshow("Stream", image)
    key = cv2.waitKey(1) & 0xFF

    if key == ord("p"):
        # pause the stream; select a region and press "p" again to track it
        key = cv2.waitKey(0) & 0xFF
        bs_pts = bs.roiPts
        if key == ord("p") and bs_pts:
            objTracker = trackers[args["tracker"].strip().lower()](bs_pts)
    elif key == ord("q"):
        break

    if objTracker is not None:
        trackPts = objTracker.track(image)
        # the tracker may return None if tracking fails on this frame
        if trackPts is not None:
            (x, y, w, h) = trackPts
            cv2.rectangle(image, (x, y), (x + w, y + h), (0, 0, 255), 2)
        cv2.imshow("Detected", image)
```
We loop through the frames, breaking out of the loop when no frame can be read. We then initialize our `BoxSelector` to capture the region of interest with a mouse selection. Pressing `p` pauses the video so we can select the region to track; pressing `p` again initializes the object tracker named by the `--tracker` command-line argument and starts tracking.
From then on, it draws a rectangle over the estimated position of the selected patch/object on each frame. And with that, we have built an object tracker from scratch that works on either a live stream or a supplied video, with either the Camshift or the correlation tracker. You can get the code for this project from my GitHub.
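As a quick usage reference: running `python track.py` opens the webcam with the default Camshift tracker, while something like `python track.py --video demo.mp4 --tracker correlation` (with `demo.mp4` being whatever video file you have at hand) runs the correlation tracker on a video file. In either case, press `p` to pause the stream, drag a box around the object, press `p` again to start tracking, and `q` to quit.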