While Computer Vision has made impressive progress on self-supervised learning only in the last few years, self-supervised learning has been a first-class citizen in NLP research for quite a while. Language models have existed since the 90’s, even before the phrase “self-supervised learning” was coined. The Word2Vec paper from 2013 popularized this paradigm, and the field has rapidly progressed by applying these self-supervised methods to many problems. At the core of these methods lies a framing called the “pretext task”, which lets us use the data itself to generate labels and apply supervised methods to solve unsupervised problems. Pretext tasks are also referred to as “auxiliary tasks” or “pre-training tasks”. The representations learned by performing a pretext task can then be used as a starting point for our downstream supervised tasks.
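To make the “labels from the data itself” idea concrete, here is a minimal, illustrative Python sketch of one such pretext task, center word prediction (the first task in the list below). The corpus, window size, and function name are my own examples for illustration, not from the article.

```python
# A sketch of how a pretext task turns unlabeled text into labeled training
# pairs. Here: center word prediction (the CBOW-style task popularized by
# Word2Vec), using a context window of 1 word on each side.
# Corpus and window size are illustrative assumptions.

def center_word_pairs(sentence, window=1):
    """Yield (context_words, center_word) pairs from a raw sentence."""
    tokens = sentence.lower().split()
    for i in range(window, len(tokens) - window):
        context = tokens[i - window:i] + tokens[i + 1:i + 1 + window]
        # The "label" (the center word) comes from the data itself.
        yield context, tokens[i]

corpus = ["The cat sat on the mat", "Dogs chase cats"]
for sentence in corpus:
    for context, center in center_word_pairs(sentence):
        print(context, "->", center)
# e.g. ['the', 'sat'] -> cat
# These pairs can now be fed to an ordinary supervised model.
```

Every task below follows the same recipe: derive the inputs and targets automatically from raw text, then train with standard supervised objectives.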
This post covers the following pretext tasks:

1. Center Word Prediction
2. Neighbor Word Prediction
3. Neighbor Sentence Prediction
4. Auto-regressive Language Modeling
5. Masked Language Modeling
6. Next Sentence Prediction
7. Sentence Order Prediction
8. Sentence Permutation
9. Document Rotation
10. Emoji Prediction

References

Ryan Kiros, et al. “Skip-Thought Vectors”
Tomas Mikolov, et al. “Efficient Estimation of Word Representations in Vector Space”
Alec Radford, et al. “Improving Language Understanding by Generative Pre-Training”
Jacob Devlin, et al. “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”
Yinhan Liu, et al. “RoBERTa: A Robustly Optimized BERT Pretraining Approach”
Zhenzhong Lan, et al. “ALBERT: A Lite BERT for Self-supervised Learning of Language Representations”
Mike Lewis, et al. “BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension”
Bjarke Felbo, et al. “Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm”

Link: https://amitness.com/2020/05/self-supervised-learning-nlp/