For the last decade, advances in machine learning have come from two things: improved compute power and better algorithms. These two areas have become somewhat siloed in most people’s thinking: we tend to imagine that there are people who build hardware, and people who make algorithms, and that there isn’t much overlap between the two.
But this picture is wrong. Hardware constraints can and do inform algorithm design, and algorithms can be used to optimize hardware. Increasingly, compute and modelling are being optimized together, by people with expertise in both areas.
My guest today is one of the world’s leading experts on hardware/software integration for machine learning applications. Max Welling is a former physicist and currently VP of Technologies at Qualcomm, a world-leading chip manufacturer; he’s also a machine learning researcher with affiliations at UC Irvine, CIFAR and the University of Amsterdam. Max has many insights to share about the current state of machine learning research, as well as the future direction of the field, and here are some of my favourite take-homes from the conversation:
Computations cost energy and drain phone batteries quickly, so machine learning engineers and chipmakers need to come up with clever ways to reduce the computational cost of running deep learning algorithms. One way to achieve this is to compress neural networks by pruning: identifying neurons or weights that can be removed with minimal consequences for performance. Another is quantization: reducing the number of bits used to represent each network parameter (sometimes all the way down to a single bit!). These strategies tend to be used together, and they’re related in some fairly profound ways.
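To make the two ideas concrete, here is a minimal numpy sketch of magnitude-based pruning and uniform quantization. The function names, the thresholding scheme, and the bit widths are illustrative assumptions for this post, not Qualcomm’s actual toolchain:

```python
import numpy as np

def prune_by_magnitude(weights, sparsity=0.5):
    """Zero out the given fraction of weights with the smallest magnitudes."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

def quantize_uniform(weights, num_bits=8):
    """Snap each weight to one of 2**num_bits evenly spaced levels."""
    w_min, w_max = weights.min(), weights.max()
    levels = 2 ** num_bits - 1
    scale = (w_max - w_min) / levels if w_max > w_min else 1.0
    q = np.round((weights - w_min) / scale)  # integer grid indices
    return q * scale + w_min                 # back to float values

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
w_compressed = quantize_uniform(prune_by_magnitude(w, sparsity=0.5), num_bits=4)
```

In a real deployment the integer grid indices themselves would be stored and executed on, which is where the memory and energy savings come from; converting back to floats here is just for readability.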
Currently, machine learning models are trained on very specific problems (like classifying images into a few hundred categories, or translating from one language to another), and they immediately fail if they’re applied even slightly outside of the domain they were trained for. A computer vision model trained to recognize facial expressions on a dataset featuring people with darker skin will underperform when tested on a different dataset featuring people with lighter skin, for example. Life experience teaches humans that skin tone shouldn’t affect interpretations of facial features, yet this minor difference is enough to throw off even cutting-edge algorithms today.
So the real challenge is generalizability — something that humans still do much better than machines. But how can we train machine learning algorithms to generalize? Max believes that the answer has to do with the way humans learn: unlike machines, our brains seem to focus on learning physical principles, like “when I take one thing and throw it at another thing, those things bounce off each other.” This reasoning is largely independent of what those two things are. By contrast, machines tend to learn in the other direction, reasoning not in terms of universal patterns or laws, but rather in terms of patterns that hold only for a very particular problem class.
For that reason, Max feels that the most promising future areas of progress in machine learning will concentrate on learning logical and physical laws, rather than specific applications of those laws or principles.