A preprint paper published by researchers at DeepMind and the CISPA Helmholtz Center for Information Security describes an AI system capable of reverse-engineering the black-box functions of programs written in the educational programming language Karel. The researchers claim that, given access only to an application’s inputs and outputs (I/Os), the system, dubbed IReEn, can iteratively improve a copy of the target application until it becomes functionally equivalent to the original.
Reverse-engineering might carry a nefarious connotation in some circles, but it has legitimate applications. For instance, it can help recover software whose source code was lost, or aid in the detection and neutralization of malware. Although several machine learning-driven reverse-engineering techniques have been proposed, most can’t recover functional and human-interpretable forms of programs. IReEn, the researchers say, can.
IReEn obtains a set of I/Os by querying the target program’s functions with random inputs drawn from a distribution. A module called a neural program synthesizer, conditioned on the obtained I/Os, outputs candidate clone programs, and a scoring system rates each clone by how closely it matches the original. If the best candidate doesn’t cover all of the I/Os, the system selects a subset of the I/Os that candidate failed to cover and conditions the program synthesizer on that subset in the next iteration.
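The loop described above can be illustrated with a small Python sketch. This is not the researchers’ implementation: it assumes a made-up integer language in place of Karel, and it swaps the neural program synthesizer for a brute-force enumerator that only sees a few I/Os at a time. Every name here (`PRIMITIVES`, `toy_synthesizer`, `reverse_engineer`, `cond_size`, `top_k`) is hypothetical and chosen for illustration.

```python
import itertools
import random

# Hypothetical toy language (an assumption, not Karel): a program is a
# tuple of primitive names applied to an integer from left to right.
PRIMITIVES = {
    "inc": lambda x: x + 1,
    "dec": lambda x: x - 1,
    "double": lambda x: x * 2,
    "square": lambda x: x * x,
    "neg": lambda x: -x,
}


def run(program, x):
    """Execute a toy program on one integer input."""
    for op in program:
        x = PRIMITIVES[op](x)
    return x


def query_black_box(target, n_queries, rng):
    """Collect I/O pairs by calling the target on random inputs."""
    inputs = [rng.randint(-10, 10) for _ in range(n_queries)]
    return [(x, target(x)) for x in inputs]


def score(program, ios):
    """Fraction of the I/O pairs that the program reproduces."""
    return sum(run(program, x) == y for x, y in ios) / len(ios)


def toy_synthesizer(cond_ios, max_len=2, top_k=1):
    """Stand-in for the neural synthesizer (assumption): enumerate short
    programs and return the top_k that best fit the conditioning I/Os."""
    names = list(PRIMITIVES)
    space = [p for n in range(1, max_len + 1)
             for p in itertools.product(names, repeat=n)]
    # sorted() is stable, so tied programs keep their enumeration order.
    ranked = sorted(space, key=lambda p: score(p, cond_ios), reverse=True)
    return ranked[:top_k]


def reverse_engineer(all_ios, cond_size=1, max_iters=10):
    """Refine a clone until it covers every observed I/O pair."""
    cond_ios = all_ios[:cond_size]
    best, best_score, iteration = None, -1.0, 0
    for iteration in range(1, max_iters + 1):
        for candidate in toy_synthesizer(cond_ios):
            s = score(candidate, all_ios)  # rate against all I/Os
            if s > best_score:
                best, best_score = candidate, s
        if best_score == 1.0:
            break
        # Condition the next round on I/Os the best candidate gets wrong.
        uncovered = [(x, y) for x, y in all_ios if run(best, x) != y]
        cond_ios = uncovered[:cond_size]
    return best, best_score, iteration
```

For example, given I/Os from the function 2(x + 1) at the inputs 0, 1, 2, 3, -1 and -2, the sketch first proposes `("inc", "inc")`, which reproduces only one pair, then recovers `("inc", "double")` on the second iteration after being conditioned on an uncovered pair. Taking the first uncovered pairs rather than sampling them is a simplification made here to keep the run deterministic.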