Create your own Object Detector

Creating a custom object detector used to be a challenge, but not anymore. There are many approaches to object detection; of these, Haar cascades and HOG+SVM are especially popular and well known for their performance. Though Haar cascades, introduced by Viola and Jones, achieve decent accuracy, HOG+SVM has proved to outperform them. In this post we are going to build an object detector using the HOG+SVM model. The output from our detector looks similar to what is shown below.


The detector we build here works for any object it is trained on, but for the purposes of this post let’s stick to the example of detecting clocks in images. Switching to another object is just a matter of annotating that object in the images we want to detect, as we will see in a moment.


Our goal is to build a custom end-to-end object detector. Since we need to detect objects of a particular type (here, clocks), we will train our detector on examples of the object we want to detect, and for that we need to annotate the objects in the images. Breaking down the steps to build an object detector at a very high level:

  • Collect training images.
  • Annotate object locations in the training images.
  • Train the Object Detector with the object regions.
  • Save and test the trained detector.

Project structure

Object Detector
├── selectors/

  • selectors/ – contains the BoxSelector class, which helps us annotate (select) the object regions.
  • – a script that lets us annotate each image using the selector and save the annotations to disk.
  • – contains the ObjectDetector class used for training and detecting objects.
  • the training script – used for training the object detector.
  • the test script – the driver script that detects objects in an image.

Collect training images

Since we want our detector to work for any object we train it on, creating a different detector is just a matter of changing the images and annotations; here we will use clock images as our example. I’ve collected some images containing clocks from the internet (the copyright of the images belongs to their owners). The training images are shown below.


Annotate object locations

Now that we have our training images ready, we need to annotate the coordinates of the clocks in those images. We will adopt the BoxSelector class from the previous post. Let’s build a script ( that helps us annotate the object regions using the BoxSelector class from the selectors package and save the annotations to disk.

import numpy as np
import cv2
import argparse
from imutils.paths import list_images
from selectors import BoxSelector

#parse arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d","--dataset",required=True,help="path to images dataset...")
ap.add_argument("-a","--annotations",required=True,help="path to save annotations...")
ap.add_argument("-i","--images",required=True,help="path to save images")
args = vars(ap.parse_args())

We start off by importing the necessary packages and parsing the arguments.

  • --dataset – Path to training images dataset.
  • --annotations – Path to save the annotations to disk.
  • --images – Path to save the image paths to disk (to make consistent annotations).
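As a quick sanity check, the parsed arguments behave like a plain dictionary. Here is a minimal, self-contained illustration of the same parser; the paths passed in are made up for the example:

```python
import argparse

# Rebuild the same parser as in the annotation script
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True, help="path to images dataset...")
ap.add_argument("-a", "--annotations", required=True, help="path to save annotations...")
ap.add_argument("-i", "--images", required=True, help="path to save images")

# Simulate a command-line invocation (these paths are hypothetical)
args = vars(ap.parse_args(["-d", "dataset/", "-a", "annots.npy", "-i", "paths.npy"]))
print(args["dataset"])  # dataset/
```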
#annotations and image paths
annotations = []
imPaths = []

#loop through each image and collect annotations
for imagePath in list_images(args["dataset"]):

    #load image and create a BoxSelector instance
    image = cv2.imread(imagePath)
    bs = BoxSelector(image,"Image")

    #order the points suitable for the Object detector
    pt1,pt2 = bs.roiPts
    (x,y,xb,yb) = [pt1[0],pt1[1],pt2[0],pt2[1]]

    #append the annotation and the corresponding image path
    annotations.append([x,y,xb,yb])
    imPaths.append(imagePath)

We create two empty lists to hold the annotations and image paths. We save the image paths alongside the annotations so that the annotation for an image can always be retrieved by index, which prevents retrieving incorrect annotations for an image. We then loop over each image and create a BoxSelector instance that lets us select the region with the mouse; we collect the object location from the selection and append the annotation and image path to annotations and imPaths respectively.

#save annotations and image paths to disk
annotations = np.array(annotations)
imPaths = np.array(imPaths,dtype="unicode")["annotations"],annotations)["images"],imPaths)

Finally we convert the annotations and imPaths to numpy arrays and save them to disk.

Create an Object Detector

If you do not know what HOG (Histogram of Oriented Gradients) is, I recommend you go through this link, and for SVM (Support Vector Machines) go through this link, then come back. Creating a HOG+SVM object detector from scratch is a difficult and tedious process. Fortunately, we have the dlib package, which provides an API for creating such object detectors. So here we create an abstraction to use the object detector from dlib with ease. The actual functioning of HOG+SVM can be broken down into the following steps.


Training:

  • Create a HOG descriptor with certain pixels_per_cell, cells_per_block and orientations.
  • Extract HOG features using the descriptor from each (annotated) object region.
  • Create and train a linear SVM model on the extracted HOG features.

Detection:

  • Estimate the average window size.
  • Scale the images down or up over several levels, up to a certain termination, to build an image pyramid.
  • Slide the window across each image in the image pyramid.
  • Extract HOG features at each window location.
  • Evaluate the trained SVM model on the current HOG features; if its score exceeds a certain threshold, the window contains the object, otherwise not.

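dlib performs the image pyramid and sliding-window scan internally, so neither shows up explicitly in our code. Purely for intuition, here is a minimal NumPy sketch of the two ideas; the scale factor, step size and window size are arbitrary illustrative choices, and the nearest-neighbour resize stands in for a proper image resize:

```python
import numpy as np

def pyramid(image, scale=2.0, min_size=(32, 32)):
    """Yield progressively downscaled copies of the image."""
    while image.shape[0] >= min_size[0] and image.shape[1] >= min_size[1]:
        yield image
        h, w = int(image.shape[0] / scale), int(image.shape[1] / scale)
        if h < min_size[0] or w < min_size[1]:
            break
        # crude nearest-neighbour downscale, purely for illustration
        rows = (np.arange(h) * scale).astype(int)
        cols = (np.arange(w) * scale).astype(int)
        image = image[rows][:, cols]

def sliding_windows(image, step=16, window=(32, 32)):
    """Yield (x, y, patch) for each window position inside the image."""
    for y in range(0, image.shape[0] - window[0] + 1, step):
        for x in range(0, image.shape[1] - window[1] + 1, step):
            yield x, y, image[y:y + window[0], x:x + window[1]]

image = np.zeros((128, 128))
levels = list(pyramid(image))
total = sum(1 for level in pyramid(image) for _ in sliding_windows(level))
print(len(levels), total)  # 3 59
```

In the real detector, HOG features are extracted at each of these window positions and scored by the linear SVM.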
We won’t implement the HOG+SVM model from scratch; instead we will use the dlib package as stated before. Let’s open and start coding.

import dlib
import cv2

class ObjectDetector(object):
    def __init__(self,options=None,loadPath=None):
        #create detector options
        self.options = options
        if self.options is None:
            self.options = dlib.simple_object_detector_training_options()

        #load the trained detector (for testing)
        if loadPath is not None:
            self._detector = dlib.simple_object_detector(loadPath)

We import the necessary packages and create an ObjectDetector class whose constructor takes two keyword arguments:

  • options – object detector options for controlling HOG and SVM hyperparameters.
  • loadPath – to load the trained detector from disk.

We create default options for training a simple object detector using dlib.simple_object_detector_training_options() if no options are provided explicitly. These options consist of several hyperparameters, such as window_size and num_threads, which help us create and tune the object detector. We also load a trained detector from disk when we are in the testing phase.

    def _prepare_annotations(self,annotations):
        annots = []
        for (x,y,xb,yb) in annotations:
            annots.append([dlib.rectangle(left=int(x),top=int(y),right=int(xb),bottom=int(yb))])
        return annots

    def _prepare_images(self,imagePaths):
        images = []
        for imPath in imagePaths:
            image = cv2.imread(imPath)
            image = cv2.cvtColor(image,cv2.COLOR_BGR2RGB)
            images.append(image)
        return images

We then define two methods, _prepare_annotations and _prepare_images, which preprocess the given annotations into the form accepted by the dlib detector, and load the images from the imagePaths, converting them to RGB since cv2 reads images as BGR while dlib expects RGB.
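The BGR-to-RGB conversion is simply a reversal of the channel axis; here is a tiny NumPy check of that equivalence (no OpenCV required), using a made-up two-pixel image:

```python
import numpy as np

# A 1x2 "image" with distinct channel values, stored BGR as cv2.imread would return it
bgr = np.array([[[255, 0, 0],     # pure blue in BGR order
                 [0, 0, 255]]],   # pure red in BGR order
               dtype=np.uint8)

# cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB) amounts to flipping the last axis
rgb = bgr[..., ::-1]
print(rgb[0, 0].tolist(), rgb[0, 1].tolist())  # [0, 0, 255] [255, 0, 0]
```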

    def fit(self, imagePaths, annotations, visualize=False, savePath=None):
        annotations = self._prepare_annotations(annotations)
        images = self._prepare_images(imagePaths)
        self._detector = dlib.train_simple_object_detector(images, annotations, self.options)

        #visualize HOG
        if visualize:
            win = dlib.image_window()
            win.set_image(self._detector)
            dlib.hit_enter_to_continue()

        #save detector to disk
        if savePath is not None:

        return self

We then create our fit method, which takes the following arguments:

  • imagePaths – a numpy array of type unicode containing paths to images.
  • annotations – a numpy array consisting of annotations for corresponding images in the imagePaths.
  • visualize – (default=False) a flag indicating whether or not to visualize the trained HOG features.
  • savePath – (default=None) path to save the trained detector. If None, no detector will be saved.

We first prepare the annotations and images using the methods _prepare_annotations and _prepare_images defined above. Then we train the detector by calling dlib.train_simple_object_detector with the images, annotations and options. Finally, we handle the visualization of the learned HOG template and save the trained detector to disk.

    def predict(self,image):
        boxes = self._detector(image)
        preds = []
        for box in boxes:
            (x,y,xb,yb) = [box.left(),,box.right(),box.bottom()]
            preds.append((x,y,xb,yb))
        return preds

    def detect(self,image,annotate=None):
        image = cv2.cvtColor(image,cv2.COLOR_BGR2RGB)
        preds = self.predict(image)
        image = cv2.cvtColor(image,cv2.COLOR_RGB2BGR)
        for (x,y,xb,yb) in preds:
            #draw and annotate on image
            cv2.rectangle(image,(x,y),(xb,yb),(0,0,255),2)
            if annotate is not None and type(annotate)==str:
                cv2.putText(image,annotate,(x+5,y-5),cv2.FONT_HERSHEY_SIMPLEX,1.0,(128,255,0),2)
        cv2.imshow("Detected",image)
        cv2.waitKey(0)
Now that we have our fit method defined, we proceed to define the predict method, which takes an image and outputs a list of bounding boxes for the detected objects. Finally, we define the detect method, which takes an image, converts it to RGB, predicts the bounding boxes, and then draws a rectangle and annotates text above each detected location using the keyword argument annotate.

We are now ready to train our detector. We create a training script and fill it with the following code; the code itself is self-explanatory.

from detector import ObjectDetector
import numpy as np
import argparse

ap = argparse.ArgumentParser()
ap.add_argument("-a","--annotations",required=True,help="path to saved annotations...")
ap.add_argument("-i","--images",required=True,help="path to saved image paths...")
ap.add_argument("-d","--detector",default=None,help="path to save the trained detector...")
args = vars(ap.parse_args())

print("[INFO] loading annotations and images")
annots = np.load(args["annotations"])
imagePaths = np.load(args["images"])

detector = ObjectDetector()
print("[INFO] creating & saving object detector"), annots, visualize=True, savePath=args["detector"])

Finally, we create another script used for testing our trained object detector on an image.

from detector import ObjectDetector
import numpy as np
import cv2
import argparse

ap = argparse.ArgumentParser()
ap.add_argument("-d","--detector",required=True,help="path to trained detector to load...")
ap.add_argument("-i","--image",required=True,help="path to an image for object detection...")
ap.add_argument("-a","--annotate",default=None,help="text to annotate...")
args = vars(ap.parse_args())

detector = ObjectDetector(loadPath=args["detector"])

imagePath = args["image"]
image = cv2.imread(imagePath)
detector.detect(image,annotate=args["annotate"])

Let’s run our scripts. We first run and select the object regions for each image.


Now that we have the annotation and image arrays, we are ready to train our object detector using the training script.


We have trained our detector, and we can see the learned HOG template visualized. This template is good enough to carry out our detection phase. We then run our test script, giving it an input image, and let it detect objects in the image.


Finally, we have created our own object detector, capable of detecting any object it is trained on. The code for this post can be downloaded from my GitHub.

Thank you, Have a nice day…

  • Cristhian Malakan

    Great tutorial, do you have the images? Did you only use the ones depicted in the post? (10 clocks)

    • Saideep

Yeah, I have! By the way, I just downloaded them from Google.

  • Rahul Vijay Soans

    Great work. I have error in gather_annotations
    from selectors import BoxSelector
    ImportError: No module named selectors

    Which module i am missing

  • Norman Siboro

    good tutorial.
    I have error in gather_annotations
    from box_selector import BoxSelector
    ImportError: No module named ‘box_selector’
    did I miss something?
    I already copied the file from your github repository

    • Saideep

      Make sure you follow the same package structures.

      If you are using python3 make sure of relative imports! See the link below for more details.

      • Norman Siboro

        thanks for the help. I have solved that issue
        but I have another error

        self.orig = image.copy()
        AttributeError: ‘NoneType’ object has no attribute ‘copy’

        • Saideep

          It occurs when the path to image is not valid. Hence returns None

          • Norman Siboro

            does the size of image matter?

            I tried to debug it, and I found that my last two images had different size which is 100×100 but my other images are 500×400 and also for the last two images, object that I want to train almost cover the whole image

          • Norman Siboro

            is there any minimum amount of pictures to train?

          • Saideep

            It depends on the structural variance of the object. Since, here in this blog post, clocks were used which almost all the clocks are identical, so a few images are good to go. If you want to train for complex ones, example faces, you need more images in variety of poses.

          • Norman Siboro

            OK, thank you Saideep
            your suggestions and ideas help me so much
            Hope you keep sharing your ideas

          • Saideep

            It doesn’t matter. Even if your object of interest occupies whole image, try to draw a rectangle over it to annotate. You should not leave that empty.

        • Domenico Buzzerio

          How do you resolve this error? I’ve tried but i’m newbie about Python and I dont’understand very well the relative import

          • Laura

            Did you solve it? Im having the same error

        • Keshika Tank

          I have the same error and i am pretty sure the path to my dataset is correct. Can you share any other mistake which can lead to this error so that i can fix it.

  • Haaris

    Im getting this error while implementing this

    AttributeError: ‘numpy.ndarray’ object has no attribute ‘append’

    it terminates after the 2nd image

    • Saideep

Actually, it happens when you convert the annotations to a numpy array inside the loop. I’ve converted only after iterating through all the images. Please note that line 34 is not in the for loop.

  • Ayush Saxena

    I have a doubt … “How to give the label “Clock” to the box that we are assigning to image?” Please help even if you find this a stupid query.

  • Laura

    Is there a way to detect more than one object in an image? such as two clocks.

  • Ayush Singhal

    it is taking a lot of time for detecting object .. it takes about 10sec on 1 image to detect an object. Where am I going wrong? How to speed up?

  • Fish Bear

    Thank for you post, can you share you datasets for learning?

  • Ildefonso Mauriaca

    What number of images did you use? I went through the same process but experimented with detecting human ears and it didn’t work very well. Are the clocks a particularly easy example or did you use a lot of training data?

  • Ahmed Sadman

    How do I use this to annotate multiple objects in a Single image? Suppose that I need to detect some other object in the image alongside Clock

  • Padma Kannan

    Hi…for training images, can’t we have an image with two clocks? Is it like only one clock per image and drawing a bounding box around it?
    Is there any way to draw more than one box on an image while training?

    • Praveen Krishna

      Good question. This is multi object detection problem. Even i have same issue. Did you solve this ?

  • Anirudh Katti

    where is the clock_detector.svm?

    • Praveen Krishna

      download the files from github

    • Praveen Krishna

      that’s the custom file name you give to save in file path

  • clawdis

    is there a way to train for 2 or 3 objects such as: clock , bottle or glass ?

    • Praveen Krishna

      Good question. This is multi object detection problem. Even i have same issue. Did you solve this ?

  • Anirudha Patil

    I have created a custom detector for bottles. However, it isn’t detecting anything in the test images containing bottle. Could you tell me how many images have you used for training? Great tutorial !!!!!!

    • Ehtisham Raza

      Brother i need your help kindly contact me on my email

    • Praveen Krishna

      how many images did you train ?

  • Mohammed Abu-Jamous

    did you trained this detector on negative images (non-clock images) ??

    • Praveen Krishna

      not necessary. It is like object detection not classification task

  • Darrell Chin On

    Hi Saideep, I think you did a great job explaining the code. Thank you for that. However, I tried your code including the BoxSelector from the “tracker” project. I can’t resolve the

    AttributeError: module 'selectors' has no attribute 'SelectSelector'

    Can you please point me out as to where to look to solve this? I am running your code in Py3 and I think I have all the dependencies down.
    Thank you, Darrell.

    • Praveen Krishna

It’s referring to site-packages instead of the custom script. Add ‘import sys’ and sys.path.insert(1,#pwd(present working directory path))

  • Keshika Tank

    File “”, line 18, in,annots,visualize=True,savePath=args[“detector”])
    File “/Users/keshikatank/Desktop/Dure/project/Object-Detector-master/”, line 37, in fit
    win = dlib.image_window()
    AttributeError: module ‘dlib’ has no attribute ‘image_window’

    what is causing this error ? Can someone help me resolve it?

  • Praveen Krishna

    Hello Saideep,

    That’s an amazing post. Can we do multi-object recognition using HoG-SVM? What do you think?
    I agree we have pre-trained model now but it is black-box based implementation and will have lot of false-positives

  • It is discussed in post to apply Image Pyramid and Sliding Windows, but I do not see anywhere the code related to it. Anything I am missing?

  • Anirudh Bhat

    How do we modify this code to work for humans?