Objects as Points

Introduction

CenterNet → regresses object properties such as size, dimension, 3D extent, orientation, and pose directly from the center location.

It works by feeding the input image into a fully convolutional network that generates a heatmap; peaks in the heatmap correspond to object centers.
The image features at each peak predict the height and width of the bounding box.
Inference runs without NMS (non-maximum suppression) as a post-processing step.
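
The peak-based decoding described above can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names, the `threshold` value, and the layout of `size_map` (one `(width, height)` pair per heatmap cell) are assumptions made for illustration. A cell counts as a peak when it is at least as large as its 8 neighbours, which is why no separate NMS pass is needed here.

```python
import numpy as np

def local_peaks(heatmap, threshold=0.3):
    """Return (row, col, score) for cells that are >= all 8 neighbours.

    `heatmap` is a single-class 2D array; `threshold` is a hypothetical cutoff.
    """
    h, w = heatmap.shape
    # Pad with -inf so border cells are compared only against real neighbours.
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    # Stack the 9 shifted views that make up each 3x3 neighbourhood.
    neigh = np.stack([padded[dy:dy + h, dx:dx + w]
                      for dy in range(3) for dx in range(3)])
    is_peak = (heatmap >= neigh.max(axis=0)) & (heatmap >= threshold)
    rows, cols = np.nonzero(is_peak)
    return [(int(r), int(c), float(heatmap[r, c])) for r, c in zip(rows, cols)]

def decode_boxes(peaks, size_map):
    """Turn peaks into (x1, y1, x2, y2) boxes at heatmap resolution.

    `size_map[row, col]` is assumed to hold the predicted (width, height).
    """
    boxes = []
    for r, c, _score in peaks:
        bw, bh = size_map[r, c]
        boxes.append((c - bw / 2, r - bh / 2, c + bw / 2, r + bh / 2))
    return boxes

heatmap = np.zeros((5, 5))
heatmap[1, 1] = 0.9   # a clear peak
heatmap[1, 2] = 0.5   # suppressed: its neighbour at (1, 1) is larger
heatmap[3, 3] = 0.6   # a second, separate peak
size_map = np.zeros((5, 5, 2))
size_map[1, 1] = (2, 4)
size_map[3, 3] = (2, 2)

peaks = local_peaks(heatmap)
boxes = decode_boxes(peaks, size_map)
```

Each surviving peak yields exactly one box, read off from the size prediction at that same location.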

CenterNet can be extended to tasks such as human pose estimation and 3D object detection by adding outputs at each center point.

For pose estimation, the 2D joint locations are treated as offsets from the center and regressed directly at the center point location.
For 3D object detection, the object's absolute depth, 3D bounding box dimensions, and object orientation are regressed.
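
For the pose estimation case, treating joints as offsets from the center means each joint position is recovered by adding its regressed offset to the center point. The sketch below shows only that arithmetic; `joints_from_center` and its argument layout are hypothetical names chosen for illustration, not part of any released code.

```python
import numpy as np

def joints_from_center(center_xy, joint_offsets):
    """Recover 2D joint positions from a center point and per-joint offsets.

    center_xy:     (x, y) of the detected center.
    joint_offsets: k pairs (dx, dy), one per joint, regressed at the center.
    """
    center = np.asarray(center_xy, dtype=float)        # shape (2,)
    offsets = np.asarray(joint_offsets, dtype=float)   # shape (k, 2)
    return center[None, :] + offsets                   # shape (k, 2)

joints = joints_from_center((10, 20), [[1, -2], [-3, 4]])
```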

Image:

$$ I\in \mathbb{R}^{W\times H\times 3} $$

The goal of the model is to infer a keypoint heatmap.

Heatmap:

$$ \hat{Y}\in [0, 1]^{\frac{W}{R}\times\frac{H}{R}\times C} $$
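
To make the heatmap dimensions concrete, the sketch below computes the output shape and maps an image coordinate to its heatmap cell. It assumes that R is the output stride and C is the number of keypoint types (classes); the example values (a 512×512 input, R = 4, C = 80) and the function names are illustrative only.

```python
def heatmap_shape(width, height, stride, num_classes):
    """Shape (W/R, H/R, C) of the heatmap for a W x H input image."""
    if width % stride or height % stride:
        raise ValueError("width and height must be divisible by the stride")
    return (width // stride, height // stride, num_classes)

def to_heatmap_cell(x, y, stride):
    """Map an image-space point to its low-resolution heatmap cell (floor of p / R)."""
    return (int(x // stride), int(y // stride))

shape = heatmap_shape(512, 512, stride=4, num_classes=80)
cell = to_heatmap_cell(101, 57, stride=4)
```

Each value in the heatmap lies in [0, 1] and acts as a per-class score for "a keypoint is centered in this cell".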