A fAIry tale of the Inductive Bias
<p>As we have seen in recent years, deep learning has grown rapidly both in adoption and in the number of models. What paved the way for this success is perhaps <a href="https://en.wikipedia.org/wiki/Transfer_learning" rel="noopener ugc nofollow" target="_blank">transfer learning</a> itself: the idea that a model can be trained on a large amount of data and then reused for a myriad of specific tasks.</p>
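<p>To make the idea concrete, here is a minimal sketch (my own illustration, not code from any particular paper) of the usual transfer-learning recipe in PyTorch: a backbone pretrained on ImageNet is frozen, and only a new task-specific head is trained. The ten-class head is a hypothetical downstream task.</p>
<pre>
# Sketch of transfer learning: reuse pretrained features, retrain the head.
import torch
import torch.nn as nn
from torchvision import models

# Backbone pretrained on a large dataset (ImageNet)
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

for p in model.parameters():
    p.requires_grad = False  # freeze the pretrained backbone

num_classes = 10  # hypothetical downstream task
model.fc = nn.Linear(model.fc.in_features, num_classes)  # new trainable head

# Only the new head's parameters are updated during fine-tuning
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
</pre>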
<p>In recent years, a paradigm has emerged: the <a href="https://en.wikipedia.org/wiki/Transformer_(machine_learning_model)" rel="noopener ugc nofollow" target="_blank">transformer</a> (or a model based on it) is used for NLP applications, while for images, <a href="https://en.wikipedia.org/wiki/Vision_transformer" rel="noopener ugc nofollow" target="_blank">vision transformers</a> or <a href="https://en.wikipedia.org/wiki/Convolutional_neural_network" rel="noopener ugc nofollow" target="_blank">convolutional networks</a> are used instead.</p>
<p>On the other hand, while we have plenty of work showing in practice that these models work well, the theoretical understanding of why has lagged behind. This is because these models are very large and general-purpose, which makes them difficult to experiment with. The fact that <a href="https://en.wikipedia.org/wiki/Vision_transformer" rel="noopener ugc nofollow" target="_blank">Vision Transformers</a> outperform convolutional neural networks <a href="https://towardsdatascience.com/metas-hiera-reduce-complexity-to-increase-accuracy-30f7a147ad0b" rel="noopener" target="_blank">despite having a theoretically weaker inductive bias for vision</a> shows that there is a theoretical gap to be filled.</p>
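<p>What "inductive bias" means here can be shown in a few lines of NumPy (an illustrative toy sketch, not tied to any particular model): a convolution is translation-equivariant by construction, so shifting the image shifts the feature map identically. A transformer's attention builds in no such constraint and must learn it from data.</p>
<pre>
# Translation equivariance of convolution: an inductive bias baked into the
# architecture rather than learned from data.
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))   # toy single-channel "image"
kernel = rng.standard_normal((3, 3))  # toy 3x3 convolutional filter

def conv2d_valid(x):
    # Stride-1, no-padding 2D cross-correlation (what conv layers compute)
    patches = sliding_window_view(x, (3, 3))          # shape (6, 6, 3, 3)
    return np.einsum("ijkl,kl->ij", patches, kernel)  # (6, 6) feature map

shifted = np.roll(image, 1, axis=1)  # translate the image one pixel right

# Convolving the shifted image equals shifting the convolved image
# (away from the wrapped boundary column).
lhs = conv2d_valid(shifted)[:, 1:]
rhs = conv2d_valid(image)[:, :-1]
print(np.allclose(lhs, rhs))  # True: the bias is built into the architecture
</pre>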
<p>This article focuses on:</p>
<p><a href="https://medium.com/towards-data-science/a-fairy-tale-of-the-inductive-bias-d418fc61726c"><strong>Read More</strong></a></p>