Students are often tasked with reading a document and producing a summary (for example, a book report) to demonstrate both reading comprehension and writing ability. Thisis one of the most challenging tasks in natural language processing, involving understanding of long passages, information compression, and language generation. The dominant paradigm for training machine learning models to do this is(seq2seq) learning, where a neural network learns to map input sequences to output sequences. While these seq2seq models were initially developed using,encoder-decoder models have recently become favored as they are more effective at modeling the dependencies present in the long sequences encountered in summarization.
Transformer models combined with self-supervised pre-training (e.g.,,,,,,,) have shown to be a powerful framework for producing general language learning, achieving state-of-the-art performance when fine-tuned on a wide array of language tasks. In prior work, the self-supervised objectives used in pre-training have been somewhat agnostic to the down-stream application in favor of generality; we wondered whether better performance could be achieved if the self-supervised objective more closely mirrored the final task.
In “” (to appear at the), we designed a pre-training self-supervised objective (calledgap-sentence generation) for Transformer encoder-decoder models to improve fine-tuning performance on abstractive summarization, achieving state-of-the-art results on 12 diverse summarization datasets. Supplementary to the paper, we are also releasing the training code and model checkpoints on.