Attention Is All You Need

Encoding: Transform input sentences to vector representations (Embedding된 벡터들을 Sentence Matrix로 변환하는 과정 (주로 Bi-LSTM 이용))
Decoding: Generate output sentences based on vector representations (generated by encoder)
Embedding: Process of mapping text elements such as words and letters into a high-dimensional vector space (토크나이징된 단어 토큰들을 벡터로 변환하는 과정)

Tokenizing → Word Tokens

Embedding → Each Dense Vector (Original Data?)

Encoding → Sentence Matrix

Attention (between Sentence Matrix & Context Vector (or Sentence Matrix)) → Single Vector

Prediction

ViT & CLIP-1.jpg

ViT & CLIP-2.jpg

ViT & CLIP-3.jpg

ViT & CLIP-4.jpg