Datasets
We publish the following datasets:
See the datasets notebook for an example of how to load the datasets provided below. The extraction notebook shows how to use axcell to extract text and tables from papers.
Evaluation
See the evaluation notebook for a full example of how to evaluate AxCell on the PWCLeaderboards dataset.
Training
pre-training the language model on the ArxivPapers dataset
training the table type classifier and table segmentation model on the SegmentedResults dataset
Pre-trained Models
You can download the pre-trained models here:
axcell: an archive containing the taxonomy, abbreviations, table type classifier, and table segmentation model. See the results-extraction notebook for an example of how to load and run the models.
language model: a ULMFiT language model pre-trained on the ArxivPapers dataset.
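Before the notebooks can load a downloaded archive, it has to be unpacked somewhere on disk. The following is a minimal sketch of that step using only the Python standard library. It does not use the axcell API, and the function name `unpack_models` and the file and directory names in the usage comment are hypothetical, so adjust them to match the files you actually downloaded.

```python
import shutil
from pathlib import Path


def unpack_models(archive_path, target_dir):
    """Unpack a downloaded archive and list the files it contained.

    Hypothetical helper, not part of axcell. The archive format is
    inferred from the file extension (for example .zip, .tar.gz, .tar.xz).
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    shutil.unpack_archive(str(archive_path), str(target))
    # Return paths relative to the target directory, in a stable order.
    return sorted(
        p.relative_to(target).as_posix()
        for p in target.rglob("*")
        if p.is_file()
    )


# Example usage (both paths are assumptions, not names taken from this README):
# files = unpack_models("downloads/models.tar.xz", "models")
# print(files)
```

The returned list makes it easy to check that the expected pieces are present before pointing the results-extraction notebook at the unpacked directory.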