(Full PyTorch Code Reference: https://github.com/autonomousvision/transfuser)
Image-only models and LiDAR-only models (existing models that use only one type of input) → poor performance in adversarial scenarios (driving environments with many interacting factors)
Problem of the image-only model: it drives without accounting for cars coming from the left → crash
Problem of the LiDAR-only model: it drives without detecting the traffic light ahead (LiDAR captures no color or texture) → signal violation
→ Solution: use the two types of data together:
Multi-View 3D Object Detection Network for Autonomous Driving (MV3D, CVPR 2017)
But a problem remains: each modality's feature map is extracted independently, so the model cannot attend to the full scene context during feature extraction (a limitation of the model structure, not of the data) - ex. difficulty in complex situations such as downtown driving, where the ego-vehicle must reason about the relationship between traffic and the traffic light
TransFuser: a model that attends to the whole scene by using Transformers while extracting features from single-view image and LiDAR input data (minimal sketch below)
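A minimal PyTorch sketch of this fusion idea, assuming 8x8 intermediate feature maps with 64 channels from each branch (the shapes, layer counts, and the `FusionBlock` name are illustrative, not the repository's implementation; the actual model also adds positional and velocity embeddings and fuses at several resolutions):

```python
import torch
import torch.nn as nn

class FusionBlock(nn.Module):
    """Sketch: fuse image and LiDAR-BEV feature maps with self-attention."""

    def __init__(self, channels=64, heads=4, layers=1):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=channels, nhead=heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, img_feat, lidar_feat):
        # img_feat, lidar_feat: (B, C, H, W) intermediate feature maps
        B, C, H, W = img_feat.shape
        # Flatten each map into a sequence of H*W tokens of dimension C
        img_tokens = img_feat.flatten(2).transpose(1, 2)      # (B, H*W, C)
        lidar_tokens = lidar_feat.flatten(2).transpose(1, 2)  # (B, H*W, C)
        # Concatenate so self-attention spans BOTH modalities at once:
        # every image token can attend to every LiDAR token and vice versa
        tokens = torch.cat([img_tokens, lidar_tokens], dim=1)  # (B, 2*H*W, C)
        fused = self.transformer(tokens)
        # Split back per modality and restore the spatial layout
        img_out, lidar_out = fused.split(H * W, dim=1)
        img_out = img_out.transpose(1, 2).reshape(B, C, H, W)
        lidar_out = lidar_out.transpose(1, 2).reshape(B, C, H, W)
        return img_out, lidar_out

block = FusionBlock()
img_f, lidar_f = block(torch.randn(2, 64, 8, 8), torch.randn(2, 64, 8, 8))
```

The key contrast with MV3D-style fusion is the `torch.cat` before attention: instead of each branch summarizing its own input first, both token sets interact during feature extraction.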
Task
point-to-point navigation (completing the route along waypoints to the goal location without accidents)
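For context, TransFuser realizes this task by predicting a short sequence of future waypoints in the ego-vehicle frame, trained with an L1 loss against expert waypoints; at test time, PID controllers track the predicted waypoints. A hedged sketch of that loss - the shapes and the `waypoint_l1_loss` name are illustrative, not the repository's exact code:

```python
import torch
import torch.nn.functional as F

def waypoint_l1_loss(pred_wp, gt_wp):
    # pred_wp, gt_wp: (B, T, 2) future (x, y) waypoints in the
    # ego-vehicle coordinate frame (illustrative shapes)
    return F.l1_loss(pred_wp, gt_wp)

pred = torch.randn(4, 4, 2)  # batch of 4 samples, 4 future waypoints each
gt = torch.randn(4, 4, 2)
print(waypoint_l1_loss(pred, gt))
```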