(Full PyTorch Code Reference: https://daebaq27.tistory.com/112)

Transformer (Attention) Concept

Attention is All You Need (https://frost-crate-a82.notion.site/Attention-is-All-You-Need-e5b8aca9d98c4056a75b8301256cd47e?pvs=4)
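As a quick reminder of the core operation from that paper, scaled dot-product attention computes softmax(QK^T / sqrt(d_k)) V. The snippet below is a minimal single-head sketch in plain Python, not the code from the linked reference; the function names `softmax` and `scaled_dot_product_attention` are illustrative, and masking, batching, and multi-head projections are left out.

```python
import math

def softmax(row):
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    # Q: list of query vectors, K: list of key vectors, V: list of value vectors.
    # K and V must have the same number of rows; Q and K the same vector length.
    d_k = len(K[0])
    outputs = []
    for q in Q:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, V))
                        for j in range(len(V[0]))])
    return outputs

# Two identical keys receive equal weight, so the output is the mean of the values.
out = scaled_dot_product_attention([[1.0, 0.0]],
                                   [[1.0, 0.0], [1.0, 0.0]],
                                   [[2.0, 4.0], [4.0, 8.0]])
```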

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

ViT & CLIP-5.jpg

ViT & CLIP-6.jpg
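The "16x16 words" in the paper title refers to splitting the image into fixed-size patches that are then treated like tokens. The sketch below only works out the resulting sequence shape; the helper name `vit_sequence_shape` is hypothetical, and the 224x224 RGB input is an assumed example resolution rather than something taken from these notes.

```python
def vit_sequence_shape(image_size, patch_size, channels=3, class_token=True):
    """Return (num_patches, patch_dim, seq_len) for a square image.

    Illustrative helper, not part of any library.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size               # patches along one side
    num_patches = per_side * per_side                 # total patches (tokens)
    patch_dim = patch_size * patch_size * channels    # flattened patch length
    seq_len = num_patches + (1 if class_token else 0) # optional class token
    return num_patches, patch_dim, seq_len

# Assumed example: 224x224 RGB image, 16x16 patches.
print(vit_sequence_shape(224, 16))  # (196, 768, 197)
```

Each flattened patch is then linearly projected to the model dimension and combined with a position embedding before entering the Transformer encoder.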

Pros of ViT

Cons of ViT