The curse of racing for high performance

While deep learning models keep getting better in terms of performance, they also tend to get bigger and more expensive to compute. Until recently, it could seem that state-of-the-art results were achieved with the good ol' “Stack more layers!” recipe. Indeed, if you look at the history of state-of-the-art models on ImageNet, you will notice that each year, the new best results were achieved with a deeper network. It seems we are obsessed with getting the best results possible, leading to models with hundreds of millions of parameters! But what's the point of having a top-tier performing network if we cannot use it?