PyTorch Large Model Support
PyTorch Large Model Support (LMS) is a feature of the PyTorch build provided by IBM Watson Machine Learning Community Edition (WML CE). It makes it possible to train deep learning models that would otherwise exhaust GPU memory and abort with "out-of-memory" errors. LMS handles this oversubscription of GPU memory by temporarily swapping tensors out to host memory while they are not needed.
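To make the swapping idea concrete, here is a toy simulation in plain Python. It is not the WML CE LMS implementation. It assumes a tensor can be modeled by just a name and a size in bytes, uses a fixed byte budget to stand in for GPU memory, and evicts the least recently used tensor to host memory when a new one does not fit. The LRU eviction policy is an assumption chosen for illustration, and the names `ToySwapManager`, `allocate` and `access` are hypothetical.

```python
from collections import OrderedDict


class ToySwapManager:
    """Toy model of swapping tensors between a fixed-size 'GPU' pool and host memory.

    Assumption: tensors are represented only by a unique name and a size in bytes.
    This is a conceptual sketch, not how WML CE LMS is implemented.
    """

    def __init__(self, device_capacity: int):
        self.capacity = device_capacity
        self.on_device = OrderedDict()  # name -> size in bytes, least recently used first
        self.on_host = {}               # name -> size in bytes for swapped-out tensors
        self.swap_outs = 0
        self.swap_ins = 0

    def used(self) -> int:
        return sum(self.on_device.values())

    def _make_room(self, needed: int) -> None:
        # Swap least recently used tensors out to host until `needed` bytes fit.
        # Terminates because callers guarantee needed <= capacity.
        while self.used() + needed > self.capacity:
            name, size = self.on_device.popitem(last=False)
            self.on_host[name] = size
            self.swap_outs += 1

    def allocate(self, name: str, size: int) -> None:
        """Place a new tensor on the device, swapping others out if necessary."""
        if size > self.capacity:
            raise MemoryError(f"{name} ({size} bytes) exceeds device capacity")
        self._make_room(size)
        self.on_device[name] = size

    def access(self, name: str) -> None:
        """Use a tensor; swap it back in from host memory if it was evicted."""
        if name in self.on_device:
            self.on_device.move_to_end(name)  # mark as most recently used
            return
        size = self.on_host.pop(name)
        self._make_room(size)
        self.on_device[name] = size
        self.swap_ins += 1


if __name__ == "__main__":
    mgr = ToySwapManager(device_capacity=100)
    for name in ("a", "b", "c"):
        mgr.allocate(name, 40)  # "c" does not fit, so "a" is swapped out
    mgr.access("a")             # "a" comes back in, and "b" is swapped out
    print(list(mgr.on_device), mgr.on_host, mgr.swap_outs, mgr.swap_ins)
```

In a real system every swap-out and swap-in is a copy between host and GPU memory, so the ability to fit a larger model is paid for in transfer time.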
One or more elements of a deep learning model can lead to GPU memory exhaustion.
These include:
Model depth and complexity
Base data size (for example, high-resolution images)
Batch size
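Batch size and data size affect memory multiplicatively. A rough sketch, assuming a dense float32 activation tensor in NCHW layout (4 bytes per element) and a hypothetical 64-channel feature map; a real model keeps many such tensors alive at once (more of them as depth grows), along with weights, gradients and optimizer state:

```python
# Rough estimate of the memory held by a single float32 activation tensor.
# Assumption: dense NCHW layout, 4 bytes per element; the 64-channel feature
# map below is a hypothetical example, not taken from any specific model.

BYTES_PER_FLOAT32 = 4


def tensor_bytes(batch: int, channels: int, height: int, width: int,
                 bytes_per_elem: int = BYTES_PER_FLOAT32) -> int:
    """Size in bytes of a dense batch x channels x height x width tensor."""
    return batch * channels * height * width * bytes_per_elem


def mib(n_bytes: int) -> float:
    """Convert bytes to mebibytes."""
    return n_bytes / (1024 ** 2)


if __name__ == "__main__":
    # Same feature map, varying batch size and input resolution.
    for batch, side in [(8, 256), (32, 256), (32, 1024)]:
        size = tensor_bytes(batch, 64, side, side)
        print(f"batch={batch:>3} res={side}x{side}: {mib(size):8.1f} MiB")
```

For this one tensor, going from a batch of 8 to 32 takes the footprint from 128 MiB to 512 MiB, and raising the resolution from 256x256 to 1024x1024 multiplies it by another 16, to 8192 MiB.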
Traditionally, the solution to this problem has been to modify the model until it fits in GPU memory. This approach, however, can hurt accuracy, especially if the concessions involve reducing data fidelity or model complexity.
With LMS, deep learning models can scale well beyond what fits in GPU memory alone, which can ultimately allow larger models, higher-fidelity data or larger batches that generate more accurate results.
In short, to deal with GPU out-of-memory (OOM) problems, data in GPU memory can be temporarily swapped out to host memory.