Word2Vec, GloVe, and FastText, Explained

Computers don't understand words the way we do; they prefer to work with numbers. So, to help computers understand words and their meanings, we use something called embeddings, which represent words as numeric vectors.

The cool thing about these embeddings is that if we learn them properly, words with similar meanings end up with similar vectors, meaning their values lie close to each other in the vector space. This lets computers grasp the connections and similarities between words from their numeric representations alone.

One prominent method for learning word embeddings is Word2Vec. In this article, we will delve into the intricacies of Word2Vec and explore its various architectures and variants.

Word2Vec

Figure 1: Word2Vec architectures (Source)

In the early days, sentences were represented with n-gram vectors, which aimed to capture the essence of a sentence by considering sequences of words. However, these vectors had some limitations: they were often large and sparse, which made them computationally expensive to create. This led to a problem known as the curse of dimensionality. Essentially, in such high-dimensional spaces the vectors representing words were so far apart that it became difficult to determine which words were truly similar.

Website: https://towardsdatascience.com/word2vec-glove-and-fasttext-explained-215a5cd4c06f
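To make the idea of "similar words get nearby vectors" concrete, here is a minimal sketch using the gensim library (not mentioned in the article; the toy corpus, hyperparameters, and word choices are illustrative assumptions). It trains a small skip-gram Word2Vec model and compares words by cosine similarity. On a corpus this tiny the similarities are noisy; real models are trained on very large corpora.

```python
# A minimal sketch (assumptions: gensim 4.x, a toy corpus) of training
# Word2Vec and inspecting the learned dense word vectors.
from gensim.models import Word2Vec

# Tiny toy corpus: each sentence is a list of lowercase tokens.
corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["a", "cat", "chased", "a", "mouse"],
    ["a", "dog", "chased", "a", "ball"],
    ["the", "king", "ruled", "the", "kingdom"],
    ["the", "queen", "ruled", "the", "kingdom"],
]

# sg=1 selects the skip-gram architecture; vector_size is the embedding
# dimension -- small and dense, unlike a sparse n-gram vector whose length
# grows with the vocabulary.
model = Word2Vec(
    corpus,
    vector_size=50,
    window=2,
    min_count=1,
    sg=1,
    epochs=200,
    seed=42,
)

print(model.wv["cat"].shape)                  # (50,): dense vector for "cat"
print(model.wv.similarity("cat", "dog"))      # cosine similarity of two words
print(model.wv.most_similar("king", topn=3))  # nearest neighbours in the space
```

Note the contrast with the n-gram representation discussed above: here every word lives in a fixed 50-dimensional space regardless of vocabulary size, which is what sidesteps the sparsity and curse-of-dimensionality issues.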