Can we do object detection in a smart way by only looking at some of the windows? The answer is yes. There are two approaches to find this subset of windows, which lead to two different categories of object detection algorithms.
- The first algorithm category is to do region proposal first. This means regions highly likely to contain an object are selected either with traditional computer vision techniques (like selective search), or by using a deep learning-based region proposal network (RPN). Once you have gathered the small set of candidate windows, you can formulate a set number of regression models and classification models to solve the object detection problem. This category includes algorithms like Faster R-CNN[1], R_FCN[2] and FPN-FRCN[3]. Algorithms in this category are usually called two-stage methods. They are generally more accurate, but slower than the single-stage method we introduce below.
- The second algorithm category only looks for objects at fixed locations with fixed sizes. These locations and sizes are strategically selected so that most scenarios are covered. These algorithms usually separate the original images into fixed size grid regions. For each region, these algorithms try to predict a fixed number of objects of certain, pre-determined shapes and sizes. Algorithms belonging to this category are called single-stage methods. Examples of such methods include YOLO[4], SSD[5] and RetinaNet[6]. Algorithms in this category usually run faster but are less accurate. This type of algorithm is often utilized for applications requiring real-time detection.
We’ll discuss two common object detection methods below in more detail.
This is a high-level explanation with no code, but it does a good job of describing at that level what is going on.