Deep learning-based object detection architectures
Please click on one of the pictures for SSD or YOLO to see the results of two popular deep learning-based object detectors: SSD and YOLO.
There are three prominent deep learning-based object detection architectures:
- R-CNN and its variants, including the original R-CNN, Fast R-CNN, and Faster R-CNN
- Single Shot Detectors (SSDs)
- YOLO
R-CNNs are an example of two-stage deep learning-based object detectors.
- In the first R-CNN publication, “Rich feature hierarchies for accurate object detection and semantic segmentation” (2013), Girshick et al. proposed an object detector that required an algorithm such as Selective Search (or equivalent) to propose candidate bounding boxes that could contain objects.
- These regions were then passed into a CNN for classification, leading to one of the first deep learning-based object detectors.
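The two stages described above can be sketched in a few lines of plain Python. This is only an illustration under loud assumptions: `propose_regions` is a hypothetical sliding-window stand-in for Selective Search, and `toy_classifier` is a hypothetical mean-brightness stand-in for the CNN. Neither comes from the R-CNN paper; the point is only the shape of the pipeline (propose, crop, classify).

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]      # (x1, y1, x2, y2), x2/y2 exclusive
Image = Sequence[Sequence[float]]    # grayscale image as rows of pixel values
Detection = Tuple[Box, str, float]   # (box, label, score)


def propose_regions(image: Image, size: int, stride: int) -> List[Box]:
    """Stage 1. Hypothetical stand-in for Selective Search: square sliding windows."""
    height, width = len(image), len(image[0])
    return [
        (x, y, x + size, y + size)
        for y in range(0, height - size + 1, stride)
        for x in range(0, width - size + 1, stride)
    ]


def crop(image: Image, box: Box) -> List[List[float]]:
    """Cut the candidate region out of the image."""
    x1, y1, x2, y2 = box
    return [list(row[x1:x2]) for row in image[y1:y2]]


def toy_classifier(patch: Image) -> Tuple[str, float]:
    """Stage 2. Hypothetical stand-in for the CNN: label by mean brightness."""
    pixels = [p for row in patch for p in row]
    score = sum(pixels) / len(pixels)
    return ("object" if score >= 0.5 else "background", score)


def two_stage_detect(
    image: Image,
    classify: Callable[[Image], Tuple[str, float]] = toy_classifier,
    size: int = 2,
    stride: int = 2,
) -> List[Detection]:
    """Run the classifier once per proposed region and keep non-background hits."""
    detections: List[Detection] = []
    for box in propose_regions(image, size, stride):
        label, score = classify(crop(image, box))
        if label != "background":
            detections.append((box, label, score))
    return detections


# A 4x4 image with one bright 2x2 patch in the top-right corner.
image = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
detections = two_stage_detect(image)
```

Note that the classifier runs once for every proposed region, so the cost grows with the number of proposals.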
The problem with the standard R-CNN method was that it was slow and not a complete end-to-end object detector.
Girshick published a second paper in 2015, entitled “Fast R-CNN”. The Fast R-CNN algorithm made considerable improvements to the original R-CNN, namely increasing accuracy and reducing the time it took to perform a forward pass. However, the model still relied on an external region proposal algorithm.
It wasn’t until the follow-up 2015 paper by Ren, He, Girshick, and Sun, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks”, that R-CNNs became a true end-to-end deep learning object detector. It removed the Selective Search requirement and instead relied on a Region Proposal Network (RPN) that (1) is fully convolutional and (2) predicts the object bounding boxes and “objectness” scores (i.e., a score quantifying how likely it is that a region of an image contains an object). The outputs of the RPN are then passed into the R-CNN component for final classification and labeling.
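To make the role of the objectness score concrete, here is a minimal sketch of how scored proposals could be ranked before being handed to the second stage. It assumes the RPN emits one raw logit per candidate box and that a sigmoid turns it into a score in [0, 1]. The function name `select_proposals` and the parameters `top_k` and `min_score` are hypothetical, and a real RPN does considerably more than this.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def sigmoid(x: float) -> float:
    """Map a raw logit to a score between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def select_proposals(
    boxes: Sequence[Box],
    logits: Sequence[float],
    top_k: int,
    min_score: float = 0.5,
) -> List[Tuple[Box, float]]:
    """Keep the top_k boxes whose objectness score is at least min_score.

    Hypothetical helper: assumes one objectness logit per candidate box.
    """
    scored = [(box, sigmoid(logit)) for box, logit in zip(boxes, logits)]
    kept = [(box, score) for box, score in scored if score >= min_score]
    kept.sort(key=lambda item: item[1], reverse=True)  # highest objectness first
    return kept[:top_k]


# Four candidate boxes with made-up objectness logits.
boxes = [
    (0.0, 0.0, 10.0, 10.0),
    (5.0, 5.0, 20.0, 20.0),
    (8.0, 2.0, 16.0, 12.0),
    (1.0, 1.0, 9.0, 9.0),
]
logits = [2.0, -1.0, 0.5, 3.0]
proposals = select_proposals(boxes, logits, top_k=2)
```

Only the boxes that survive this ranking would go on to the classification stage, which is how the RPN replaces the external proposal algorithm.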
While R-CNNs tend to be very accurate, the biggest problem with the R-CNN family of networks is their speed: they are incredibly slow, obtaining only 5 FPS on a GPU. As we will be using low-power devices such as Raspberry Pis, we will not use this method. Instead, we have focused on the other two methods: Single Shot Detectors (SSDs) and YOLO.
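To put the 5 FPS figure in perspective, a frame rate converts directly into a per-frame time budget: 5 FPS means each frame takes 1000 / 5 = 200 ms. The sketch below shows that conversion along with a simple way to time any detector on your own hardware. The helper names `frame_time_ms` and `measure_fps` are hypothetical, and `detect` stands for whatever detection function you want to benchmark.

```python
import time
from typing import Callable, Iterable


def frame_time_ms(fps: float) -> float:
    """Convert frames per second into milliseconds spent per frame."""
    return 1000.0 / fps


def measure_fps(detect: Callable[[object], object], frames: Iterable[object]) -> float:
    """Time a detector over a sequence of frames and return the average FPS.

    Hypothetical helper: `detect` is any callable that processes one frame.
    """
    start = time.perf_counter()
    count = 0
    for frame in frames:
        detect(frame)
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")


# 5 FPS leaves 200 ms per frame.
budget = frame_time_ms(5)
```

Running `measure_fps` with the same frames on a desktop GPU and on a Raspberry Pi is a quick way to check whether a given detector is fast enough for the target device.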