While Computer Vision has made amazing progress on self-supervised learning only in the last few years, self-supervised learning has been a first-class citizen in NLP research for quite a while. Language Models have existed since the 90s, even before the phrase “self-supervised learning” was coined. The Word2Vec paper from 2013 popularized this paradigm, and the field has since progressed rapidly by applying these self-supervised methods to many problems.
At the core of these self-supervised methods lies a framing called the “pretext task”, which lets us use the data itself to generate labels and apply supervised methods to solve unsupervised problems. These are also referred to as “auxiliary tasks” or “pre-training tasks”. The representations learned by solving such a task can then be used as a starting point for our downstream supervised tasks.
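To make the framing concrete, here is a minimal, hypothetical sketch (plain Python, all names illustrative and not tied to any library) of how unlabeled sentences can be turned into supervised (input, label) pairs for a masked word prediction pretext task:

```python
# A minimal sketch of the "pretext task" idea: labels come from the text itself.
# We build (input, label) pairs for masked word prediction from raw sentences.
import random

MASK = "[MASK]"

def make_masked_example(sentence, seed=0):
    """Turn one unlabeled sentence into a supervised (input, label) pair
    by hiding a random word and using that word as the label."""
    rng = random.Random(seed)
    tokens = sentence.split()
    idx = rng.randrange(len(tokens))   # pick a random position to hide
    label = tokens[idx]                # the hidden word becomes the label
    masked = tokens.copy()
    masked[idx] = MASK
    return " ".join(masked), label

corpus = [
    "self supervised learning creates labels from the data itself",
    "language models predict missing words in a sentence",
]

for sent in corpus:
    x, y = make_masked_example(sent)
    print(f"input: {x!r} -> label: {y!r}")
```

No human annotation is needed at any step; the supervision signal is manufactured from the raw corpus. The pretext tasks covered below all follow this same pattern, differing only in what part of the text is hidden or perturbed: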
1. Center Word Prediction
2. Neighbor Word Prediction
3. Neighbor Sentence Prediction
4. Auto-regressive Language Modeling
5. Masked Language Modeling
6. Next Sentence Prediction
7. Sentence Order Prediction
8. Sentence Permutation
9. Document Rotation
10. Emoji Prediction
References
Ryan Kiros, et al. “Skip-Thought Vectors”
Tomas Mikolov, et al. “Efficient Estimation of Word Representations in Vector Space”
Alec Radford, et al. “Improving Language Understanding by Generative Pre-Training”
Jacob Devlin, et al. “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”
Yinhan Liu, et al. “RoBERTa: A Robustly Optimized BERT Pretraining Approach”
Zhenzhong Lan, et al. “ALBERT: A Lite BERT for Self-supervised Learning of Language Representations”
Mike Lewis, et al. “BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension”
Bjarke Felbo, et al. “Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm”
Original article: https://amitness.com/2020/05/self-supervised-learning-nlp/