nlp is a lightweight and extensible library to easily share and access datasets and evaluation metrics for Natural Language Processing (NLP).
Besides making it easy to share and access datasets and metrics, nlp has several other notable features:
Built-in interoperability with NumPy, pandas, PyTorch and TensorFlow 2
Lightweight and fast with a transparent and pythonic API
Thrives on large datasets: nlp frees the user from RAM limitations because all datasets are memory-mapped on disk by default.
Smart caching: never wait for your data to be processed more than once
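
To illustrate the memory-mapping idea from the feature list above, here is a minimal sketch that uses only the Python standard library. It is not how nlp is implemented internally. The file layout (fixed-size 64-bit integer records) and the helper names `write_records`, `read_record` and `demo` are assumptions made up for this example. It shows only the general principle: a single record can be read from a file on disk without loading the whole file into RAM.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical on-disk layout for this sketch: one little-endian
# 64-bit signed integer per record.
RECORD = struct.Struct("<q")


def write_records(path, values):
    """Write integers to disk as fixed-size binary records."""
    with open(path, "wb") as f:
        for value in values:
            f.write(RECORD.pack(value))


def read_record(path, index):
    """Read one record through a memory map instead of reading the whole file."""
    with open(path, "rb") as f:
        # Length 0 maps the entire file; the OS pages data in on demand.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            (value,) = RECORD.unpack_from(mm, index * RECORD.size)
            return value


def demo():
    """Write 1000 records to a temporary file and read back the first and last."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    os.close(fd)
    try:
        write_records(path, range(1000))
        return read_record(path, 0), read_record(path, 999)
    finally:
        os.remove(path)
```

Calling `demo()` returns the first and last records. Only the pages that are actually touched are brought into memory, which is the property that lets memory-mapped datasets exceed available RAM.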
nlp currently provides access to ~100 NLP datasets and ~10 evaluation metrics and is designed to let the community easily add and share new datasets and evaluation metrics.
nlp originated from a fork of the awesome TensorFlow Datasets, and the HuggingFace team wants to deeply thank the TensorFlow Datasets team for building this amazing library. More details on the differences between nlp and tfds can be found in the section Main differences between nlp and tfds.