Transformers Well Explained: Masking

Masking is simply the act of hiding a word and asking the model to predict it, as in "Being strong is <mask> what matters". It is a different training task that, in theory, forces the model to produce embeddings that carry more contextual semantics.

When we ask a model to predict the third word given the previous two, we call this a one-directional language model: it predicts each word from the context on its left only. When we train with masking, however, the model makes its prediction from the context on both the left and the right. We call this a bidirectional language model.

Read the full article: https://pub.towardsai.net/transformers-well-explained-masking-b7f0e671117c
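To make the masking step concrete, here is a minimal, self-contained Python sketch of how training examples for masked language modeling might be prepared: some tokens are randomly hidden behind a `<mask>` placeholder, and the original words become the prediction targets. The whitespace tokenization, the `mask_tokens` helper, and the 15% masking rate are illustrative assumptions, not the exact recipe from the article.

```python
import random

MASK_TOKEN = "<mask>"
MASK_PROB = 0.15  # fraction of tokens to hide (assumed rate, as used by BERT-style models)

def mask_tokens(tokens, mask_prob=MASK_PROB, seed=None):
    """Randomly replace tokens with <mask> and record the originals as labels."""
    rng = random.Random(seed)
    masked = list(tokens)
    labels = {}  # position -> original token the model should learn to predict
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked[i] = MASK_TOKEN
            labels[i] = tok
    return masked, labels

# Illustrative sentence, split on whitespace for simplicity.
sentence = "Being strong is what matters most".split()
masked, labels = mask_tokens(sentence, seed=3)
print(" ".join(masked))  # which words get hidden depends on the random draw
print(labels)            # the model sees context on BOTH sides of each masked position
```

Because the mask can fall anywhere in the sentence, the model has to use the words both before and after the gap to fill it in, which is exactly what makes the resulting language model bidirectional rather than left-to-right.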