• Models based on feed-forward networks, which view text as a bag of words (Section 2.1).
• Models based on RNNs, which view text as a sequence of words, and are intended to capture word
dependencies and text structures (Section 2.2).
• CNN-based models, which are trained to recognize patterns in text, such as key phrases, for classification (Section 2.3).
• Capsule networks, which address the information loss problem suffered by the pooling operations of CNNs, and recently have been applied to text classification (Section 2.4).
• The attention mechanism, which is effective at identifying correlated words in text, and has become a useful
tool in developing deep learning models (Section 2.5).
• Memory-augmented networks, which combine neural networks with a form of external memory that
the models can read from and write to (Section 2.6).
• Transformers, which allow for much more parallelization than RNNs, making it possible to efficiently
(pre-)train very large language models using GPU clusters (Section 2.7).
• Graph neural networks, which are designed to capture internal graph structures of natural language, such
as syntactic and semantic parse trees (Section 2.8).
• Siamese neural networks, which are designed for text matching, a special case of text classification (Section 2.9).
• Hybrid models, which combine attention, RNNs, CNNs, etc. to capture local and global features of sentences and documents (Section 2.10).
• Finally, in Section 2.11, we review modeling technologies that go beyond supervised learning, including
unsupervised learning using autoencoders and adversarial training, and reinforcement learning.
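To make one of the building blocks above concrete, the following is a minimal NumPy sketch of scaled dot-product attention, the attention variant at the core of Transformers. The function name, toy input shapes, and random seed are our own illustrative choices, not drawn from any specific model in this survey.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal scaled dot-product attention.

    Q: (n_q, d) queries; K: (n_k, d) keys; V: (n_k, d_v) values.
    Returns (n_q, d_v): each query's output is a weighted sum of the
    values, with weights given by a softmax over query-key similarities.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # (n_q, n_k) similarities
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V                               # attend to values

# Toy self-attention: 3 "word" vectors of dimension 4 attend over themselves.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(X, X, X)
print(out.shape)  # (3, 4)
```

In self-attention, as used here, queries, keys, and values all come from the same token representations, which is how a model can identify correlated words within a single piece of text.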