Single Shot Detectors (SSDs)

SECTION 1

Introduction, Research & Terminology
 
 

Introduction

Object detection tells us both what is in an image and where each object is located. The object detection methods this section focuses on are deep learning-based Single Shot Detectors and MobileNets. These methods can be combined for fast, real-time object detection on resource-constrained devices, like the Raspberry Pi.

In deep learning-based object detection there are three popular methods:

 

Faster R-CNNs are likely the most “heard of” method for object detection using deep learning; however, even with the “faster” implementation of R-CNNs (where the “R” stands for “Region Proposal”), the algorithm can be quite slow, on the order of 7 FPS.

If we are looking for pure speed, then we tend to use YOLO, as this algorithm is much faster, capable of processing 40-90 FPS on a Titan X GPU. The super-fast variant of YOLO can even reach 155 FPS.

The problem with YOLO is that its accuracy leaves much to be desired.

SSDs, originally developed by Google, are a balance between the two. The algorithm is more straightforward (and I would argue better explained in the original seminal paper) than Faster R-CNNs.

We can also enjoy a much faster FPS throughput than Girshick et al. at 22-46 FPS depending on which variant of the network we use. SSDs also tend to be more accurate than YOLO. To learn more about SSDs, please refer to Liu et al.

Single Shot Detectors (SSDs) and MobileNets

 

SSD: Single Shot MultiBox Detector

 
At the core of this project is a “method for detecting objects in images using a single deep neural network”. This method, which Liu et al. named SSD, “discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location”.
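
As a rough illustration of what “default boxes over different aspect ratios and scales” means, the following sketch (a simplified, hypothetical helper, not the authors’ code) generates the (cx, cy, w, h) default boxes for a single feature map location:

    import numpy as np

    def default_boxes(cx, cy, scale, aspect_ratios=(1.0, 2.0, 0.5)):
        # Width grows and height shrinks with the square root of the aspect
        # ratio, so every box keeps roughly the same area (scale squared).
        boxes = []
        for ar in aspect_ratios:
            w = scale * np.sqrt(ar)
            h = scale / np.sqrt(ar)
            boxes.append((cx, cy, w, h))
        return np.array(boxes)

    # Default boxes centred on one feature map cell, at two different scales
    print(default_boxes(0.5, 0.5, scale=0.2))
    print(default_boxes(0.5, 0.5, scale=0.4))

For every such default box, the network predicts both class scores and offsets that refine the box.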

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

This project uses what Howard et al. (2017) refer to as a “class of efficient models called MobileNets for mobile and embedded vision applications. MobileNets are based on a streamlined architecture that uses depth-wise separable convolutions to build light weight deep neural networks.”
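
The efficiency gain from depth-wise separable convolutions is easy to see by counting parameters. The sketch below (illustrative channel counts only, not tied to any particular MobileNet layer) compares a standard 3x3 convolution with a depth-wise 3x3 convolution followed by a 1x1 point-wise convolution:

    def standard_conv_params(k, c_in, c_out):
        # A standard convolution mixes space and channels in one step.
        return k * k * c_in * c_out

    def depthwise_separable_params(k, c_in, c_out):
        # Depth-wise (one k x k filter per input channel) plus point-wise (1x1).
        return k * k * c_in + c_in * c_out

    k, c_in, c_out = 3, 32, 64
    print(standard_conv_params(k, c_in, c_out))        # 18432
    print(depthwise_separable_params(k, c_in, c_out))  # 2336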

SECTION 2

Our Project 

The aim is to use OpenCV’s dnn module to load a pre-trained object detection network. This allows us to pass input images through the network and obtain the output bounding box (x, y)-coordinates of each object in the image. Finally, we’ll look at the results of applying the MobileNet Single Shot Detector to example input images.
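
A minimal sketch of that workflow, assuming the Caffe prototxt and caffemodel files described in Section 3 are available locally (the file and image names below are placeholders):

    import cv2

    # Load the serialized Caffe model with OpenCV's dnn module
    net = cv2.dnn.readNetFromCaffe("MobileNetSSD_deploy.prototxt",
                                   "MobileNetSSD_deploy.caffemodel")

    # MobileNet SSD expects 300x300 inputs; the scale factor and mean value
    # here are the normalization commonly used with this model.
    image = cv2.imread("example.jpg")
    blob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)),
                                 0.007843, (300, 300), 127.5)
    net.setInput(blob)
    detections = net.forward()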


How to Apply Object Detection Using Deep Learning and OpenCV

In order to obtain the bounding box (x, y)-coordinates for an object in an image, image classification alone is not enough; we instead need to apply object detection.

Object detection not only tells us what is in an image but also where the object is.

Object Detection with Deep Learning and OpenCV

We’ll begin our discussion of object detection using deep learning by analyzing Single Shot Detectors and MobileNets.

These architectures can be combined for fast, real-time object detection on resource-constrained devices; in this project we will use the Raspberry Pi.

This paper will then focus on how to use OpenCV’s dnn module to load a pre-trained object detection network.

This enables the project to pass input images through the network and obtain the output bounding box (x, y)-coordinates of each object in the image.

Finally, we’ll look at the results of applying the MobileNet Single Shot Detector to example input images.
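
Continuing the loading sketch above (so image and detections are already defined), the output can be parsed along these lines; the 0.2 confidence threshold here is an arbitrary choice:

    # detections has shape (1, 1, N, 7); each row holds
    # [batch_id, class_id, confidence, x1, y1, x2, y2],
    # with box coordinates normalized to [0, 1].
    (h, w) = image.shape[:2]
    for i in range(detections.shape[2]):
        confidence = detections[0, 0, i, 2]
        if confidence > 0.2:
            box = detections[0, 0, i, 3:7] * [w, h, w, h]
            (x1, y1, x2, y2) = box.astype("int")
            cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)

    cv2.imshow("Output", image)
    cv2.waitKey(0)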

 

The next steps are to extend the script to work with real-time video streams.
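
A sketch of how that extension might look, feeding frames from OpenCV's VideoCapture through the same network (detection parsing elided, and the camera index is an assumption about the Pi's setup):

    import cv2

    net = cv2.dnn.readNetFromCaffe("MobileNetSSD_deploy.prototxt",
                                   "MobileNetSSD_deploy.caffemodel")

    cap = cv2.VideoCapture(0)  # default camera; the Pi camera may need a V4L2 driver
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)),
                                     0.007843, (300, 300), 127.5)
        net.setInput(blob)
        detections = net.forward()
        # ... parse detections and draw boxes as in the single-image example ...
        cv2.imshow("Frame", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

    cap.release()
    cv2.destroyAllWindows()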

SECTION 3

The Data

This project combines the MobileNet architecture and the Single Shot Detector (SSD) framework, providing a fast, efficient deep learning-based method for object detection.

The model is a Caffe version of the original TensorFlow implementation by Howard et al. and was trained by chuanqi305 (see GitHub).

The MobileNet SSD was first trained on the COCO dataset (Common Objects in Context) and was then fine-tuned on PASCAL VOC reaching 72.7% mAP (mean average precision).

It detects 20 object classes (+1 for the background class), including airplanes, bicycles, birds, boats, bottles, buses, cars, cats, chairs, cows, dining tables, dogs, horses, motorbikes, people, potted plants, sheep, sofas, trains, and TV monitors.
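
For reference, those classes are typically encoded in the script as a list whose indices match the class IDs the network outputs; the label strings below follow the PASCAL VOC naming:

    CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
               "bottle", "bus", "car", "cat", "chair", "cow",
               "diningtable", "dog", "horse", "motorbike", "person",
               "pottedplant", "sheep", "sofa", "train", "tvmonitor"]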

SECTION 4

Hardware
  • Raspberry Pi
  • Camera

Pros

  • Low cost
  • Low power
  • Linux

Software

The following libraries are required (a minimal setup sketch follows the list):

  • numpy
  • argparse
  • cv2
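
A minimal sketch of the script's setup using these libraries (the argument names are illustrative, not fixed by the project):

    import argparse
    import numpy as np
    import cv2

    ap = argparse.ArgumentParser()
    ap.add_argument("--prototxt", required=True, help="path to Caffe deploy prototxt")
    ap.add_argument("--model", required=True, help="path to pre-trained Caffe model")
    ap.add_argument("--image", required=True, help="path to input image")
    ap.add_argument("--confidence", type=float, default=0.2,
                    help="minimum detection confidence to keep")
    args = vars(ap.parse_args())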
 

SECTION 5

Example
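
A hypothetical invocation of the detection script sketched above (file names are placeholders) would look like:

    python detect.py --prototxt MobileNetSSD_deploy.prototxt \
        --model MobileNetSSD_deploy.caffemodel --image example.jpg

The output image would show each detected object with its bounding box, predicted class label, and confidence.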

 

SECTION 6

Conclusion and Next Steps