[Reference] Active Object Localization with Deep Reinforcement Learning
Important progress for improving the accuracy of object detectors has been recently possible with Convolutional Neural Networks (CNNs), which leverage big visual data and deep learning for image categorization. A successful model is the R-CNN detector proposed by Girshick et al.
RCNN = Region Proposal + CNN
1. Selective search: first, extract a large number of boxes (about 2,000) from regions that are likely to contain an object.
Generating category-independent region proposals
: Selective search is used as the region proposal method here.
R-CNN uses the superpixel-based selective search algorithm.
2. Next, resize each box to the required size (227x227).
Warping
3. The resized boxes can then be fed into the CNN, which extracts features from them.
Extract a fixed-length feature vector from a pre-trained CNN
: AlexNet is used as the CNN model here.
AlexNet is used to extract features; 4,096 features are extracted.
4. The extracted features are then classified with SVMs.
Class-specific linear SVMs
: If there are 20 classes to detect, classification runs over 21 classes, with one extra 'background' class for regions that are not of interest.
5. bounding box regression
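Finding the bounding box amounts to predicting the box's coordinate values, so this step is called 'bounding box regression'.
Bounding box regression is still needed even though selective search has already produced some 2,000 candidate bounding boxes, because selective search only points out 'plausible' locations and its accuracy is therefore low.
So the next step is to work out how each proposed bounding box should be shifted to better match the actual bounding box (ground truth).
To do this, four quantities (the box center x, the box center y, the box width, and the box height) are learned as well: a regression toward the ground truth is set up, and bounding box regression finds the weights that minimize the loss function.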
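The regression in step 5 can be sketched as follows. This is only a minimal illustration, assuming the target parameterization from the R-CNN paper (center shifts scaled by the proposal size, log-scale width and height changes); the function names and the example boxes are hypothetical, and the learned regressor itself is left out.

```python
import math

def regression_targets(proposal, ground_truth):
    """Return (tx, ty, tw, th), the offsets that map a proposal onto the ground truth.

    Boxes are (center_x, center_y, width, height).
    """
    px, py, pw, ph = proposal
    gx, gy, gw, gh = ground_truth
    tx = (gx - px) / pw        # horizontal center shift, relative to proposal width
    ty = (gy - py) / ph        # vertical center shift, relative to proposal height
    tw = math.log(gw / pw)     # width change on a log scale
    th = math.log(gh / ph)     # height change on a log scale
    return tx, ty, tw, th

def apply_offsets(proposal, offsets):
    """Invert regression_targets: move and rescale a proposal by the given offsets."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = offsets
    return (px + pw * tx, py + ph * ty, pw * math.exp(tw), ph * math.exp(th))

# Hypothetical boxes, for illustration only.
proposal = (100.0, 120.0, 80.0, 60.0)
ground_truth = (110.0, 115.0, 100.0, 50.0)

targets = regression_targets(proposal, ground_truth)
refined = apply_offsets(proposal, targets)
print(targets)
print(refined)
```

During training, the regressor learns to predict these four offsets from the CNN features of the proposal; at test time the predicted offsets are applied to the proposal in the same way as `apply_offsets`.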
Active Object Localization
In this paper, the authors propose a class-specific active detection model that learns to localize target objects known by the system (Active Object Localization).
The proposed model follows a top-down search strategy, which starts by analyzing the whole scene and then proceeds to narrow down the correct location of objects.
This method differs from the R-CNN bounding box regression approach in that it does not localize objects with a single, structured prediction.
Instead, it uses a dynamic attention-action strategy that requires paying attention to the contents of the current region and transforming the box so that the target object is progressively more focused.
In short, Deep Q-Learning is applied to the object detection task.
Deep Q-Learning
To stimulate the attention of the proposed agent, the authors use a reward function proportional to how well the current box covers the target object. The reward function is incorporated in a reinforcement learning setting to learn a localization policy, based on the Deep Q-Network (DQN) algorithm.
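As a rough sketch of what "based on DQN" means in practice: the network is trained toward the standard Q-learning target r + γ max Q(s', a'), and actions are usually picked ε-greedily during training. The function names below are hypothetical, γ = 0.9 is just an illustrative value, and the network that produces the Q-values is left out.

```python
import random

def td_target(reward, next_q_values, gamma=0.9, terminal=False):
    """Q-learning target for one transition (s, a, r, s').

    next_q_values holds the predicted Q(s', a') for every action a'.
    gamma is an assumed discount factor.
    """
    if terminal:
        # No future value after the episode has ended.
        return reward
    return reward + gamma * max(next_q_values)

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, otherwise the best one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Hypothetical Q-values for three actions in the next state.
next_q = [0.5, 2.0, -1.0]
print(td_target(1.0, next_q))
print(epsilon_greedy(next_q, epsilon=0.0))
```

With `epsilon=0.0` the choice is purely greedy, which matches the test-time behaviour of simply following the learned policy.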
- Environment: single image
- State: a tuple; a representation with feature information of the currently visible region and past actions.
- Action: move the bounding box up, down, left, or right; make it bigger or smaller; Fatter; Taller; terminate
- Moves and size changes are made in proportion to the current box size (α = 0.2)
- Reward: positive and negative rewards for each decision made during the training phase.
- When action a moves the agent from state s to s', the reward is positive (+1) if the IoU improves from state s to state s', and negative (−1) otherwise.
- r ∈ {−1, +1}
- During testing, the agent does not receive rewards and does not update the model either; it just follows the learned policy.
- Goal: landing a tight box in a target object that can be observed in the environment.
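The action and reward definitions above can be sketched roughly like this. Boxes are (x1, y1, x2, y2) in image coordinates with y growing downward. How each resize action splits the change between the two sides, and the reading of "fatter" as shrinking the height and "taller" as shrinking the width, are my assumptions. The terminate action is left out because it does not move the box.

```python
ALPHA = 0.2  # step size relative to the current box size

def transform(box, action, alpha=ALPHA):
    """Apply one box-changing action and return the new (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    dw = alpha * (x2 - x1)   # horizontal step, proportional to box width
    dh = alpha * (y2 - y1)   # vertical step, proportional to box height
    if action == "right":
        x1, x2 = x1 + dw, x2 + dw
    elif action == "left":
        x1, x2 = x1 - dw, x2 - dw
    elif action == "up":
        y1, y2 = y1 - dh, y2 - dh
    elif action == "down":
        y1, y2 = y1 + dh, y2 + dh
    elif action == "bigger":
        x1, y1, x2, y2 = x1 - dw / 2, y1 - dh / 2, x2 + dw / 2, y2 + dh / 2
    elif action == "smaller":
        x1, y1, x2, y2 = x1 + dw / 2, y1 + dh / 2, x2 - dw / 2, y2 - dh / 2
    elif action == "fatter":     # assumed: reduce the height
        y1, y2 = y1 + dh / 2, y2 - dh / 2
    elif action == "taller":     # assumed: reduce the width
        x1, x2 = x1 + dw / 2, x2 - dw / 2
    else:
        raise ValueError(f"unknown action: {action}")
    return (x1, y1, x2, y2)

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def reward(old_box, new_box, target):
    """+1 if the move improved IoU with the target box, otherwise -1."""
    return 1 if iou(new_box, target) > iou(old_box, target) else -1

# Hypothetical example: the target sits 20 px to the right of the current box.
box = (0.0, 0.0, 100.0, 100.0)
target = (20.0, 0.0, 120.0, 100.0)
print(reward(box, transform(box, "right"), target))
print(reward(box, transform(box, "left"), target))
```

In this example moving right lands the box on the target, so the reward is positive, while moving left reduces the overlap and is penalized.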
Network Architecture
- Input image size: 224 x 224
- Feature extraction: pre-trained CNN
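A minimal sketch of how the state (region features plus past actions) could be assembled as input to the Q-network. The 9 actions follow the action list above (8 box changes plus terminate); the history length of 10 and the 4,096-dimensional feature vector are assumptions for illustration, and the CNN features are replaced by a dummy vector here. All names are hypothetical.

```python
from collections import deque

NUM_ACTIONS = 9      # 8 box transformations + terminate
HISTORY_LEN = 10     # assumed number of past actions kept in the state
FEATURE_DIM = 4096   # assumed size of the pre-trained CNN feature vector

def one_hot(index, size):
    """Return a list of zeros with a single 1.0 at the given index."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

class ActionHistory:
    """Fixed-length record of the most recent actions, one-hot encoded."""

    def __init__(self, length=HISTORY_LEN, num_actions=NUM_ACTIONS):
        self.num_actions = num_actions
        # Start with all-zero slots, meaning "no action taken yet".
        self.slots = deque([[0.0] * num_actions for _ in range(length)],
                           maxlen=length)

    def push(self, action_index):
        # Appending drops the oldest slot automatically (maxlen).
        self.slots.append(one_hot(action_index, self.num_actions))

    def vector(self):
        return [value for slot in self.slots for value in slot]

def build_state(region_features, history):
    """Concatenate the region features with the flattened action history."""
    return list(region_features) + history.vector()

history = ActionHistory()
history.push(2)                       # hypothetical action index
dummy_features = [0.0] * FEATURE_DIM  # stand-in for real CNN features
state = build_state(dummy_features, history)
print(len(state))
```

Under these assumptions the state has 4,096 + 10 × 9 entries, and the Q-network would output one value per action.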
reference
www.slideshare.net/ssuser06e0c5/q-learning-cnn-object-localization
Active Object Localization with Deep Reinforcement Learning(https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Caicedo_Active_Object_Localization_ICCV_2015_paper.pdf)
https://github.com/otoofim/ObjLocalisation
https://github.com/ambirpatel/Object-Localization-using-Deep-Reinforcement-Learning
fairyonice.github.io/Object_detection_with_PASCAL_VOC2012_selective_search.html