This blog post is inspired by the paper titled AN IMAGE IS WORTH 16X16 WORDS: TRANSFORMERS FOR IMAGE RECOGNITION AT SCALE from google’s research team. The paper proposes using a pure Transformer applied directly to image patches for image classification tasks. The Vision Transformer ...