Investigation of 8-bit Floating-Point Formats for Machine Learning

University essay from Linköpings universitet/Datorteknik

Abstract: Applying machine learning to various applications has gained significant momentum in recent years. However, the increasing complexity of networks introduces challenges such as a larger memory footprint and decreased throughput. This thesis aims to address these challenges by exploring the use of 8-bit floating-point numbers for machine learning. The numerical accuracy was evaluated empirically by implementing software models of the arithmetic and running experiments on a neural network provided by MediaTek. While the initial findings revealed poor accuracy when performing computations solely with 8-bit floating-point arithmetic, a significant improvement could be achieved by using a higher-precision accumulator register. The hardware cost was evaluated using a synthesis tool by measuring the increase in silicon area and the impact on clock frequency after four new vector instructions had been implemented. A large increase in area was measured for the functional blocks, but the hardware costs for interconnect and instruction decoding were negligible. Only a marginal decrease in system clock frequency was observed. Ideas that could likely improve the accuracy of inference calculations and reduce the hardware cost are proposed in the future work section.
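
The following is a minimal sketch, not the thesis's actual implementation, of the idea described above: rounding operands to an 8-bit floating-point format while keeping the running sum in a wider accumulator. The E4M3-like exponent/mantissa split, the function names, and the saturating rounding rule are assumptions made for illustration only.

```python
import numpy as np

def quantize_fp8(x, exp_bits=4, man_bits=3):
    """Round values to a simplified sign/exponent/mantissa 8-bit format
    (assumed E4M3-like split, no subnormals, saturating at the max value)."""
    x = np.asarray(x, dtype=np.float64)
    bias = 2 ** (exp_bits - 1) - 1
    max_exp = 2 ** exp_bits - 1 - bias            # largest unbiased exponent
    sign = np.sign(x)
    mag = np.abs(x)
    # Integer exponent of each magnitude, clipped to the representable range.
    exp = np.floor(np.log2(np.where(mag > 0, mag, 1.0)))
    exp = np.clip(exp, -bias, max_exp)
    step = 2.0 ** (exp - man_bits)                # spacing of representable values
    q = np.round(mag / step) * step
    max_val = (2 - 2.0 ** -man_bits) * 2.0 ** max_exp
    q = np.clip(q, 0.0, max_val)                  # saturate instead of overflowing
    return sign * np.where(mag > 0, q, 0.0)

def dot_fp8_accum(a, b, accum_dtype=np.float32):
    """Dot product with FP8-rounded operands and a wider accumulator.
    Using accum_dtype=np.float32 mimics a higher-precision accumulator
    register; re-quantizing each partial sum to FP8 instead would show
    the accuracy loss of purely 8-bit arithmetic."""
    a8 = quantize_fp8(a)
    b8 = quantize_fp8(b)
    acc = accum_dtype(0.0)
    for x, y in zip(a8, b8):
        acc = accum_dtype(acc + accum_dtype(x) * accum_dtype(y))
    return acc

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(size=256)
    b = rng.normal(size=256)
    print("float64 reference :", float(np.dot(a, b)))
    print("fp8 ops, fp32 acc :", float(dot_fp8_accum(a, b)))
```

Comparing the two printed values gives a rough sense of how much error the 8-bit operand rounding introduces when the accumulation itself is kept in higher precision.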
