State-of-the-art machine learning models are often bulky, which makes them inefficient for deployment in resource-constrained environments like mobile phones, Raspberry Pis, and microcontrollers. You might think you can get around this problem by hosting your model in the cloud and serving results through an API. But consider constrained environments where internet bandwidth is not always high, or where data must not leave a particular device. We need a set of tools that make the transition to on-device machine learning seamless.

In this report, I will show you how TensorFlow Lite (TF Lite) can really shine in situations like these. We'll cover model optimization strategies and quantization techniques supported by TensorFlow.

Check out the code on GitHub →

Thanks to Arun, Khanh, and Pulkit (Google) for sharing incredibly useful tips for this report.
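To give a taste of what's ahead, here is a minimal sketch of the kind of workflow the report covers: converting a Keras model to TF Lite with the default post-training optimization (dynamic-range quantization of the weights). The tiny model and its layer sizes are placeholders, not a model from this report:

```python
import tensorflow as tf

# A toy Keras model standing in for a real, trained network.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])

# Convert to TF Lite, enabling the default optimizations
# (dynamic-range quantization of the weights).
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# The result is a serialized FlatBuffer, ready to write to disk
# and ship to a phone or microcontroller.
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```

The quantized `.tflite` file is typically several times smaller than the original model, which is exactly the kind of trade-off explored in the sections that follow.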