I’ve recently had to learn a lot about natural language processing (NLP), specifically Transformer-based models.
Like my previous blog post on deep autoregressive models, this post is a write-up of my reading and research: I assume basic familiarity with deep learning and aim to highlight general trends in deep NLP rather than comment on individual architectures or systems.
As a disclaimer, this post is by no means exhaustive and is biased towards Transformer-based models, which seem to be the dominant breed of NLP systems (at least at the time of writing).