[Reference] Active Object Localization with Deep Reinforcement Learning


HJChung 2020. 12. 2. 07:15

https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Caicedo_Active_Object_Localization_ICCV_2015_paper.pdf

Important progress for improving the accuracy of object detectors has been recently possible with Convolutional Neural Networks (CNNs), which leverage big visual data and deep learning for image categorization. A successful model is the R-CNN detector proposed by Girshick et al.

R-CNN = Region Proposal + CNN

1. selective search: first, extract a large number of candidate boxes (about 2,000) from regions that are likely to contain an object.

Generating category-independent region proposals
: Selective search is used as the region proposal method.
R-CNN uses a superpixel-based selective search algorithm.

2. Next, resize each proposed region to the required input size (227x227).

Warping

3. The warped regions can then be fed into the CNN, which extracts features.

Extract a fixed-length feature vector from a pre-trained CNN
: AlexNet is used as the CNN model.
AlexNet is used to extract features; 4,096 features are extracted for each region.

4. The extracted features are then classified with SVMs.

Class-specific linear SVMs
: If there are 20 classes to detect, classification is performed over 21 classes, with one extra 'background' class for regions that are not of interest.

5. bounding box regression

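Finding a bounding box amounts to predicting the coordinate values of the box, so this step is called 'bounding box regression'.

Bounding box regression is still needed even though selective search has already produced about 2,000 candidate boxes, because selective search only points at places where an object is likely to be, so its localization accuracy is low.
A further step is therefore needed to work out how each proposed box should be shifted so that it better matches the actual bounding box (ground truth).
Four quantities, the box center x, the box center y, the box width, and the box height, are additionally learned: a regression toward the ground truth is set up on them,
and bounding box regression finds the weights that minimize the loss function.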
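
As a rough illustration of what the regression is asked to predict, here is a minimal sketch of the four targets for one proposal. It assumes the parameterization from the R-CNN paper (center offsets scaled by the proposal size, log-ratios for width and height); the function names `regression_targets` and `apply_offsets` and the example boxes are made up for this post.

```python
import math

def regression_targets(proposal, ground_truth):
    """Targets the regressor should predict for one proposal.

    Both boxes are (center_x, center_y, width, height).
    Assumed parameterization: scaled center offsets and log size ratios.
    """
    px, py, pw, ph = proposal
    gx, gy, gw, gh = ground_truth
    tx = (gx - px) / pw          # horizontal shift, in units of proposal width
    ty = (gy - py) / ph          # vertical shift, in units of proposal height
    tw = math.log(gw / pw)       # log of the width ratio
    th = math.log(gh / ph)       # log of the height ratio
    return tx, ty, tw, th

def apply_offsets(proposal, offsets):
    """Invert regression_targets: move and rescale the proposal."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = offsets
    return (px + pw * tx, py + ph * ty, pw * math.exp(tw), ph * math.exp(th))

# Hypothetical proposal from selective search and its ground-truth box.
proposal = (100.0, 100.0, 50.0, 80.0)
ground_truth = (110.0, 96.0, 60.0, 80.0)

targets = regression_targets(proposal, ground_truth)
refined = apply_offsets(proposal, targets)
print(targets)
print(refined)
```

Applying the exact targets to the proposal recovers the ground-truth box; in practice the regressor only approximates these targets from the CNN features.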

Active Object Localization

In this paper, the authors propose a class-specific active detection model that learns to localize target objects known by the system (Active Object Localization).

The proposed model follows a top-down search strategy, which starts by analyzing the whole scene and then proceeds to narrow down the correct location of objects.

Unlike bounding box regression algorithms such as the one used in R-CNN, this approach does not localize objects following a single, structured prediction method.

We propose a dynamic attention-action strategy that requires to pay attention to the contents of the current region, and to transform the box in such a way that the target object is progressively more focused.

In short, Deep Q-Learning is applied to the object detection task.

Deep Q-Learning

To stimulate the attention of the proposed agent, we use a reward function proportional to how well the current box covers the target object. We incorporate the reward function in a reinforcement learning setting to learn a localization policy, based on the DeepQNetwork algorithm.

Environment: single image
State: a tuple made up of a feature representation of the currently visible region and a history of past actions.

Source: https://www.slideshare.net/ssuser06e0c5/q-learning-cnn-object-localization
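
To make the state tuple more concrete, here is a minimal sketch that concatenates region features with a one-hot action history. The sizes are assumptions rather than something stated in this post: a 4,096-dimensional feature vector (as for AlexNet above), 9 actions (8 box transformations plus the terminate action), and a history of the last 10 actions. `encode_history` and `build_state` are hypothetical names.

```python
NUM_ACTIONS = 9       # 8 box transformations + 1 terminate action (assumed)
HISTORY_LENGTH = 10   # how many past actions are remembered (assumed)
FEATURE_SIZE = 4096   # size of the CNN feature vector (assumed)

def encode_history(past_actions):
    """One-hot encode the most recent actions; unused slots stay zero."""
    recent = past_actions[-HISTORY_LENGTH:]
    vector = [0.0] * (NUM_ACTIONS * HISTORY_LENGTH)
    for slot, action in enumerate(recent):
        vector[slot * NUM_ACTIONS + action] = 1.0
    return vector

def build_state(region_features, past_actions):
    """State = features of the visible region + encoded action history."""
    return list(region_features) + encode_history(past_actions)

# Placeholder features; a real agent would take them from the pre-trained CNN.
features = [0.0] * FEATURE_SIZE
state = build_state(features, [0, 4, 8])
print(len(state))
```

With these assumed sizes the state has 4096 + 90 entries, and the history part tells the agent which moves it has already tried on the current image.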

Action: move the bounding box up, down, left, or right; enlarge or shrink it; Fatter; Taller; terminate
- Moves and size changes are made in proportion to the current box size (α = 0.2)

A box is represented by the coordinates in pixels of its two corners: b = [x1, y1, x2, y2]. Any of the transformation actions make a discrete change to the box by a factor relative to its current size. We set α = 0.2 in all our experiments, since this value gives a good trade-off between speed and localization accuracy.
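
The box transformations can be sketched roughly as below, using b = [x1, y1, x2, y2] and α = 0.2 from the text. Several details are my own assumptions: the action names, image coordinates with y growing downward, scale changes split evenly between both sides of the box, "fatter" meaning a reduced height and "taller" meaning a reduced width, and no clipping to the image border. `apply_action` is a hypothetical helper.

```python
ALPHA = 0.2  # step size relative to the current box size, as in the text

def apply_action(box, action, alpha=ALPHA):
    """Return the box [x1, y1, x2, y2] after applying one action."""
    x1, y1, x2, y2 = box
    dw = alpha * (x2 - x1)  # horizontal step, proportional to current width
    dh = alpha * (y2 - y1)  # vertical step, proportional to current height
    if action == "right":
        x1, x2 = x1 + dw, x2 + dw
    elif action == "left":
        x1, x2 = x1 - dw, x2 - dw
    elif action == "up":        # y grows downward in image coordinates
        y1, y2 = y1 - dh, y2 - dh
    elif action == "down":
        y1, y2 = y1 + dh, y2 + dh
    elif action == "bigger":
        x1, y1, x2, y2 = x1 - dw / 2, y1 - dh / 2, x2 + dw / 2, y2 + dh / 2
    elif action == "smaller":
        x1, y1, x2, y2 = x1 + dw / 2, y1 + dh / 2, x2 - dw / 2, y2 - dh / 2
    elif action == "fatter":    # assumed: flatten the box by reducing height
        y1, y2 = y1 + dh / 2, y2 - dh / 2
    elif action == "taller":    # assumed: narrow the box by reducing width
        x1, x2 = x1 + dw / 2, x2 - dw / 2
    elif action != "trigger":   # trigger ends the search, box is unchanged
        raise ValueError(f"unknown action: {action}")
    return [x1, y1, x2, y2]

box = [100.0, 100.0, 200.0, 300.0]
moved = apply_action(box, "right")
grown = apply_action(box, "bigger")
print(moved, grown)
```

Because every step is a fraction of the current box size, the moves get finer as the box shrinks around the object.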

Reward: positive and negative rewards for each decision made during the training phase.
- When action a is taken to move from state s to s', the reward is positive (+1) if the IoU improves from state s to state s', and negative (-1) otherwise.
- r ∈ {−1, +1}

The reward function R_a(s, s′) is granted to the agent when it chooses the action a to move from state s to s′. Each state s has an associated box b that contains the attended region.
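
The movement reward described above can be sketched as follows: compute the IoU of the box with the target before and after the action, and return +1 if it improved and -1 otherwise. This covers only the r ∈ {−1, +1} case from the text; the function names `iou` and `move_reward` and the example boxes are made up for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as [x1, y1, x2, y2]."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def move_reward(old_box, new_box, target_box):
    """+1 if the action improved the IoU with the target, -1 otherwise."""
    improved = iou(new_box, target_box) > iou(old_box, target_box)
    return 1 if improved else -1

# Hypothetical target and two boxes visited by the agent.
target = [0.0, 0.0, 10.0, 10.0]
before = [5.0, 5.0, 15.0, 15.0]
after = [2.0, 2.0, 12.0, 12.0]
print(move_reward(before, after, target))
```

Since the ground-truth box is needed to compute the IoU, this reward is only available during training.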

During testing, the agent does not receive rewards and does not update the model either; it just follows the learned policy.
Goal: landing a tight box in a target object that can be observed in the environment.

Network Architecture

Input image size: 224 x 224
Feature extraction: pre-trained CNN

References

https://www.slideshare.net/ssuser06e0c5/q-learning-cnn-object-localization
Active Object Localization with Deep Reinforcement Learning (https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Caicedo_Active_Object_Localization_ICCV_2015_paper.pdf)
https://github.com/otoofim/ObjLocalisation
https://github.com/ambirpatel/Object-Localization-using-Deep-Reinforcement-Learning
https://fairyonice.github.io/Object_detection_with_PASCAL_VOC2012_selective_search.html
