An Initial Investigation of Neural Decompilation for WebAssembly

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Abstract: WebAssembly is a new standard of the World Wide Web that is used as a compilation target and which is meant to enable high-performance applications. As it becomes more popular, the need for corresponding decompilers increases, for security reasons for instance. However, building an accurate decompiler capable of restoring the original source code is a complicated task. Recently, Neural Machine Translation (NMT) has been proposed as an alternative to traditional decompilers which involve a lot of manual and laborious work. We investigate the viability of Neural Machine Translation for decompiling WebAssembly binaries to C source code. The state-of-the-art transformer and LSTM sequence-to-sequence (Seq2Seq) models with attention are experimented with. We build a custom randomly-generated dataset ofWebAssembly to C pairs of source code and use different metrics to quantitatively evaluate the performance of the models. Our implementation consists of several processing steps that have the WebAssembly input and the C output as endpoints. The results show that the transformer outperforms the LSTM based neural model. Besides, while the model restores the syntax and control-flow structure with up to 95% of accuracy, it is incapable of recovering the data-flow. The different benchmarks on which we run our evaluation indicate a drop of decompilation accuracy as the cyclomatic complexity and the nesting of the programs increase. Nevertheless, our approach has a lot of potential, encouraging its usage in future works. 

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)