Motivations
One of the major challenges when training a (deep) machine learning model is co-adaptation: neurons become highly dependent on one another, influencing each other strongly rather than learning independently from their inputs. It is also common for some neurons to carry far more predictive power than others. In other words, the output becomes excessively dependent on a handful of neurons.
Both effects must be avoided, and the weights must be distributed more evenly to prevent overfitting. Co-adaptation and the excessive predictive capacity of individual neurons can be controlled with various regularization methods, one of the most widely used being Dropout. Yet the full range of dropout methods is rarely exploited.
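As a quick preview before the deep dive, here is a minimal sketch of what the standard (inverted) Dropout does during training: each activation is zeroed independently with some probability, and the survivors are rescaled so the expected activation is unchanged. This is an illustrative sketch, not the implementation from any particular library; the function name and the `p_drop` parameter are my own choices.

```python
import numpy as np

def dropout(x, p_drop=0.5, training=True):
    """Sketch of standard (inverted) dropout.

    During training, each activation is zeroed independently with
    probability p_drop; the survivors are scaled by 1 / (1 - p_drop)
    so the layer's expected output stays the same at inference time.
    """
    if not training or p_drop == 0.0:
        return x  # at inference, the layer is the identity
    mask = np.random.rand(*x.shape) >= p_drop  # Bernoulli keep mask
    return x * mask / (1.0 - p_drop)

# Example: on average, half the activations are dropped
activations = np.ones((2, 4))
print(dropout(activations, p_drop=0.5))
```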
Depending on whether the model is a DNN, a CNN or an RNN, different dropout methods can be applied. In practice, we almost always use just one. I think that’s a terrible pitfall. So in this article, we will dive mathematically and visually into the world of dropout methods to understand:
the Standard Dropout method
variants of the Standard Dropout
dropout methods applied to CNNs
dropout methods applied to RNNs
other dropout applications (Monte Carlo and compression)
(Sorry I couldn’t stop, so it’s a little more than 12 methods… 😄)