2022. 10. 1. 21:56ㆍReview/- 2D Object Detection
Motivation
R-CNN-style detectors (such as Faster R-CNN) use at least two steps
Step-1: generate potential bounding boxes
Step-2: run a classifier on the proposed boxes
These complex pipelines are slow and hard to optimize
because each component must be trained separately
Main Idea
A single model predicts multiple bounding boxes and their class probabilities in one evaluation
1. Unified Detection
YOLO uses features from the entire image and predicts all bounding boxes and classes simultaneously
1. Divide the input image into an S x S grid.
2. If the center of an object falls into a grid cell, that grid cell is responsible for detecting the object.
3. Each grid cell predicts B bounding boxes and a confidence score for each box.
- The confidence score reflects how confident the model is that the box contains an object.
4. Each bounding box consists of 5 predictions: x, y, w, h, confidence.
5. Each grid cell also predicts C conditional class probabilities (= Pr(Class|Object)).
- The predictions form an S x S x (B * 5 + C) tensor.
6. Select the final bounding boxes using NMS.
Symbol | Meaning |
---|---|
S | Number of grid cells along the image width and height |
B | Number of bounding boxes per grid cell |
Pr() | Probability (0~1) |
IoU | Intersection over Union (more: https://find-knowledge.tistory.com/2) |
Confidence | Pr(Object) * IoU |
NMS | Non-max suppression (more: https://find-knowledge.tistory.com/2) |
(x, y) | Center coordinates of the bounding box (relative to the grid cell) |
(w, h) | Width and height of the bounding box (relative to the whole image) |
In the paper: S=7, B=2, C=20, dataset=Pascal VOC
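As a quick check of the S x S x (B * 5 + C) formula with the paper's values, the arithmetic can be sketched in Python. The helper `split_cell` and its ordering of values (boxes first, then class probabilities) are assumptions of this sketch, not something the post or the paper fixes.

```python
# Values used in the paper for Pascal VOC.
S, B, C = 7, 2, 20

values_per_cell = B * 5 + C            # B boxes * (x, y, w, h, confidence) + C class probabilities
total_values = S * S * values_per_cell  # size of the whole prediction tensor
total_boxes = S * S * B                 # number of boxes predicted per image


def split_cell(cell_vector, num_boxes=B, num_classes=C):
    """Split one cell's prediction vector into boxes and class probabilities.

    Assumed layout (hypothetical): all boxes first, then the class probabilities.
    """
    if len(cell_vector) != num_boxes * 5 + num_classes:
        raise ValueError("unexpected cell vector length")
    boxes = [tuple(cell_vector[k * 5:(k + 1) * 5]) for k in range(num_boxes)]
    class_probs = list(cell_vector[num_boxes * 5:])
    return boxes, class_probs


print(values_per_cell, total_values, total_boxes)  # 30 1470 98
```

Note that the class probabilities are shared by all B boxes of a cell, which is why C is added once rather than multiplied by B.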
2. Network Design
The network has 24 convolutional layers followed by 2 fully connected layers. (The fast version uses 9 conv layers.)
The architecture is inspired by GoogLeNet, but instead of inception modules it uses 1 x 1 reduction layers followed by 3 x 3 conv layers.
The final output of the network is a 7 x 7 x 30 tensor.
2-1. Pretraining
Pretraining uses the first 20 convolutional layers followed by an average-pooling layer and a fully connected layer.
This network is trained on the ImageNet dataset for classification.
2-2. Training
After pretraining, remove the last 2 layers and add 4 conv layers and 2 fc layers with randomly initialized weights.
Increase the input resolution of the network from 224x224 to 448x448.
The final layer predicts both class probabilities and bounding box coordinates.
The width and height of each bounding box are normalized by the image width and height. (0~1)
Parametrize the bounding box x and y coordinates to be offsets of a particular grid cell location. (0~1)
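The normalization and the grid-cell offsets can be sketched as follows. This is a minimal illustration under the assumption that boxes arrive in pixel units as (center x, center y, width, height); the function names `encode_box` and `decode_box` are hypothetical.

```python
def encode_box(cx, cy, bw, bh, img_w, img_h, S=7):
    """Convert a pixel-space box to YOLO-style targets.

    Returns (row, col, x, y, w, h): x, y are offsets inside the responsible
    grid cell and w, h are fractions of the image size, all in [0, 1].
    """
    # Position of the box center measured in grid-cell units.
    gx = cx * S / img_w
    gy = cy * S / img_h
    # The responsible cell contains the center (clamped for centers on the far edge).
    col = min(int(gx), S - 1)
    row = min(int(gy), S - 1)
    return row, col, gx - col, gy - row, bw / img_w, bh / img_h


def decode_box(row, col, x, y, w, h, img_w, img_h, S=7):
    """Inverse of encode_box: recover the pixel-space (cx, cy, bw, bh)."""
    cx = (col + x) / S * img_w
    cy = (row + y) / S * img_h
    return cx, cy, w * img_w, h * img_h


# A 112 x 56 box centered at (224, 100) in a 448 x 448 image.
print(encode_box(224, 100, 112, 56, 448, 448))
```

In this example the center lands in row 1, column 3 of the 7 x 7 grid, halfway across the cell horizontally.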
YOLO uses a linear activation function for the final layer; all other layers use leaky ReLU
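For reference, leaky ReLU can be written in a few lines. The slope of 0.1 for negative inputs is the value given in the YOLO paper; the function name and signature are this sketch's own.

```python
def leaky_relu(x, negative_slope=0.1):
    """Leaky ReLU: identity for positive inputs, a small slope otherwise."""
    return x if x > 0 else negative_slope * x


print(leaky_relu(2.0), leaky_relu(-2.0))
```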
2-3. Loss
YOLO optimizes SSE (sum-squared error) because it is easy to optimize
But SSE does not align perfectly with the goal of maximizing average precision, for two reasons.
1. It weights localization error equally with classification error, which may not be ideal.
2. Many grid cells do not contain any object, which pushes their confidence scores toward zero
Because most grid cells are trained toward confidence score = 0, they can overpower the gradient from cells that do contain objects and make training unstable
To remedy this, YOLO increases the loss from bounding box coordinates and decreases the confidence loss for boxes that contain no object.
YOLO sets the weights $λ_{coord}$=5, $λ_{noobj}$=0.5.
SSE has another problem: it weights errors in large boxes and small boxes equally.
The error metric should reflect that a small deviation matters more in a small box than in a large box.
To partially address this, YOLO predicts the square root of the bounding box width and height instead of the width and height directly
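A small worked example may make the square-root trick concrete. It compares the same absolute width error (0.05 in normalized units) on a small box and on a large box; the specific widths are made-up illustration values.

```python
import math


def sq_err(pred, target):
    """Plain squared error on the raw width."""
    return (pred - target) ** 2


def sqrt_sq_err(pred, target):
    """Squared error on the square roots, as used for w and h in the YOLO loss."""
    return (math.sqrt(pred) - math.sqrt(target)) ** 2


small = (0.15, 0.10)  # predicted width, true width of a small box
large = (0.85, 0.80)  # same 0.05 deviation on a large box

print("raw  :", sq_err(*small), sq_err(*large))            # essentially identical
print("sqrt :", sqrt_sq_err(*small), sqrt_sq_err(*large))  # small box penalized more
```

With raw widths the two errors are the same, while with square roots the small box's error is about 6 to 7 times larger, which is the intended effect.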
YOLO predicts multiple bounding boxes per grid cell. At training time YOLO wants only one bounding box predictor to be responsible for each object, so it assigns the predictor whose current prediction has the highest IoU with the ground truth.
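The assignment of the responsible predictor can be sketched as an argmax over IoU. This assumes boxes are given as (cx, cy, w, h) in one shared coordinate system; `iou_xywh` and `responsible_predictor` are hypothetical helper names.

```python
def iou_xywh(a, b):
    """IoU of two boxes given as (cx, cy, w, h)."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def responsible_predictor(pred_boxes, gt_box):
    """Return (index, IoU) of the predicted box that best overlaps the ground truth."""
    ious = [iou_xywh(p, gt_box) for p in pred_boxes]
    best = max(range(len(ious)), key=ious.__getitem__)
    return best, ious[best]


gt = (0.5, 0.5, 0.4, 0.4)
preds = [(0.5, 0.5, 0.2, 0.8), (0.52, 0.5, 0.4, 0.4)]  # the B=2 predictions of one cell
print(responsible_predictor(preds, gt))
```

Only the selected predictor contributes to the coordinate and object-confidence terms of the loss for that object; the other predictor in the cell is treated as containing no object.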
Symbol | Meaning |
---|---|
$1_{i}^{obj}$ | Indicator: 1 if an object exists in the ith cell, otherwise 0 |
$1_{ij}^{obj}$ | Indicator: 1 if the jth bounding box in the ith cell is responsible for the object, otherwise 0 |
$λ_{coord}$ | Constant that balances the coordinate loss against the classification loss (=5) |
$λ_{noobj}$ | Constant that balances boxes with an object against boxes without an object (=0.5) |
① coordinate loss of the jth bounding box in the ith cell (object exists)
② size loss of the jth bounding box in the ith cell (object exists)
③ confidence score loss (object exists, Ci=1)
④ confidence score loss (object does not exist, Ci=0)
⑤ conditional class probability loss (object exists; correct class c: Pi(c)=1, otherwise: Pi(c)=0)
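For reference, the loss that terms ① to ⑤ describe can be written out as below, one line per term in the same order. It is reconstructed from the paper's formulation. Two points are assumptions of this write-up: hatted symbols are read as network predictions and unhatted ones as targets, and $1_{ij}^{noobj}$ (not listed in the table above) denotes the complement of $1_{ij}^{obj}$.

$$
\begin{aligned}
L ={} & \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} 1_{ij}^{obj} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
& + \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} 1_{ij}^{obj} \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
& + \sum_{i=0}^{S^2} \sum_{j=0}^{B} 1_{ij}^{obj} \left( C_i - \hat{C}_i \right)^2 \\
& + \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} 1_{ij}^{noobj} \left( C_i - \hat{C}_i \right)^2 \\
& + \sum_{i=0}^{S^2} 1_{i}^{obj} \sum_{c \in classes} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
$$

Terms ① and ② carry the weight $\lambda_{coord}$, term ④ carries $\lambda_{noobj}$, and terms ③ and ⑤ are unweighted.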
2-4. Hyper parameters
Hyperparameter | Value |
---|---|
batch_size | 64 |
momentum | 0.9 (weight decay of 5e-4) |
epochs | 135 |
dropout ratio | 0.5 |
learning_rate | 1e-3 ~ 1e-2 (raised over the first epochs), then 1e-2 (for 75 epochs), 1e-3 (for 30 epochs), 1e-4 (for last 30 epochs) |
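The learning-rate schedule can be sketched as a step function of the epoch index. The length of the initial ramp is not stated here, so `warmup_epochs=5` and the linear shape of the ramp are hypothetical choices, and counting the ramp inside the first 75 epochs is likewise an assumption made so that the total stays at 135.

```python
def learning_rate(epoch, warmup_epochs=5):
    """Learning rate for a 0-based epoch index (135 epochs in total).

    warmup_epochs is a hypothetical value chosen for illustration.
    """
    if epoch < warmup_epochs:
        # Assumed linear ramp from 1e-3 towards 1e-2.
        return 1e-3 + (1e-2 - 1e-3) * epoch / warmup_epochs
    if epoch < 75:
        return 1e-2   # 1e-2 for 75 epochs (ramp included, by assumption)
    if epoch < 105:
        return 1e-3   # 1e-3 for 30 epochs
    return 1e-4       # 1e-4 for the last 30 epochs


for e in (0, 10, 75, 105, 134):
    print(e, learning_rate(e))
```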
Data augmentation: random scaling and translations of up to 20% of the original image size
Activation function: linear activation for the final layer, leaky ReLU for all other layers
2-5. Inference
Just like in training, predicting detections for a test image requires only one network evaluation.
On Pascal VOC the network predicts 98 (7*7*2) bounding boxes per image and class probabilities for each box.
So YOLO is extremely fast at test time.
Some large objects, or objects near the border of multiple cells, can be well localized by multiple cells.
NMS can be used to fix these multiple detections (it adds 2~3% mAP in YOLO)
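A minimal sketch of greedy non-max suppression for a single class is shown below. It assumes boxes in (x1, y1, x2, y2) corner format; the IoU threshold of 0.5 and the function names are illustrative assumptions, not values taken from the paper.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) corners."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it too much, repeat.

    Returns the indices of the kept boxes, ordered by descending score.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.68, so it is suppressed, while the distant third box survives. In a multi-class setting this would typically be run once per class.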
3. Limitations of YOLO
1. YOLO imposes strong spatial constraints, since each grid cell predicts only two boxes and can have only one class.
Because of this spatial constraint, YOLO struggles with small objects that appear close together in groups.
2. YOLO learns to predict bounding boxes from data.
So it struggles to generalize to objects in new or unusual aspect ratios or configurations.
3. The loss function treats errors the same in small bounding boxes and large bounding boxes.
A small error in a small box has a much greater effect on IoU than the same error in a large box.
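The third limitation can be illustrated with a simple case: a predicted box that has the correct size but is shifted horizontally by a few pixels. For a box of width w shifted by d < w, the overlap is (w - d) * h and the union is (w + d) * h, so IoU = (w - d) / (w + d). The pixel values below are made up for illustration.

```python
def iou_after_shift(width, shift):
    """IoU between a box and a same-sized copy shifted horizontally by `shift`.

    The height cancels out, so only the width and the shift matter.
    """
    overlap = max(0.0, width - shift)
    union = 2 * width - overlap
    return overlap / union


print(iou_after_shift(20, 5))   # small box, 5 px error
print(iou_after_shift(200, 5))  # large box, same 5 px error
```

The same 5 px error drops the IoU to 0.6 for the 20 px wide box, but only to about 0.95 for the 200 px wide box.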
Experiments
Error type | Criterion |
---|---|
Correct | correct class and IoU > 0.5 |
Localization | correct class, 0.1 < IoU < 0.5 |
Similar | class is similar, IoU > 0.1 |
Other | class is wrong, IoU > 0.1 |
Background | IoU < 0.1 for any object |
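The error categories above can be expressed as a small decision function. This is a simplified sketch: it takes the IoU with the best-matching ground-truth object and a precomputed class relation (`"correct"`, `"similar"`, or `"wrong"`), both of which are assumptions about how the inputs are prepared. The table uses strict inequalities, so how the sketch treats an IoU of exactly 0.1 or 0.5 is an arbitrary choice.

```python
def error_type(iou, class_match):
    """Classify one detection into the error categories of the table.

    iou: IoU with the best-matching ground-truth object.
    class_match: "correct", "similar", or "wrong" (hypothetical encoding).
    """
    if iou < 0.1:
        return "Background"
    if class_match == "correct":
        return "Correct" if iou > 0.5 else "Localization"
    if class_match == "similar":
        return "Similar"
    return "Other"


print(error_type(0.7, "correct"), error_type(0.3, "correct"), error_type(0.05, "wrong"))
```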
Reference
[IMG-ALL]: https://arxiv.org/pdf/1506.02640.pdf
[STUDY]: https://docs.google.com/presentation/d/1aeRvtKG21KHdD5lg6Hgyhx5rPq_ZOsGjG5rJ1HP7BbA/pub?start=false&loop=false&delayms=3000&slide=id.p, https://www.youtube.com/c/Deeplearningai
[Implementation]: https://github.com/kongbuhaja/YOLO_v1