NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte scale datasets used to train deep learning based recommender systems. It provides a high level abstraction to simplify code and accelerates computation on the GPU using the RAPIDS cuDF library.
Recommender systems require massive datasets to train, particularly for deep learning based solutions. The transformation of these datasets after ETL in order to prepare them for model training is particularly challenging. Often the time taken to do steps such as feature engineering, categorical encoding and normalization of continuous variables exceeds the time it takes to train a model.
NVTabular is designed to support Data Scientists and ML Engineers trying to train (deep learning) recommender systems or other tabular data problems by allowing them to:
Prepare datasets quickly and easily in order to experiment more and train more models.
Work with datasets that exceed GPU and CPU memory without having to worry about scale.
Focus on what to do with the data, and not how to do it, using our abstraction at the operation level.
It is also meant to help ML/Ops Engineers deploying models into production by providing:
Faster dataset transformation, allowing for production models to be trained more frequently and kept up to date helping improve responsiveness and model performance.
Integration with model serving frameworks like NVIDIA’s Triton Inference Server to make model deployment easy.
Statistical monitoring of the dataset for distributional shift and outlier detection during production training or inference.
The library is designed to be interoperable with both PyTorch and Tensorflow using batch data-loaders that we have developed as extensions of native framework code. NVTabular provides the option to shuffle data during preprocessing, allowing the data-loader to load large contiguous chunks from files rather than individual elements. This allows us to do per epoch shuffles orders of magnitude faster than a full shuffle of the dataset. We have benchmarked our data-loader at 100x the baseline item by item PyTorch dataloader and 3x the Tensorflow batch data-loader, with several optimizations yet to come in that stack.
Extending beyond model training, we plan to provide integration with model serving frameworks like NVIDIA’s Triton Inference Server, creating a clear path to production inference for these models and allowing the feature engineering and preprocessing steps performed on the data during training to be easily and automatically applied to incoming data during inference.
Our goal is faster iteration on massive tabular datasets, both for experimentation during training, and also for production model responsiveness.