Parallel Instruction Decoding for DSP Controllers with Decoupled Execution Units
Abstract: Applications run on embedded processors are constantly evolving. They are for the most part growing more complex and the processors have to increase their performance to keep up. In this thesis, an embedded DSP SIMT processor with decoupled execution units is under investigation. A SIMT processor exploits the parallelism gained from issuing instructions to functional units or to decoupled execution units. In its basic form only a single instruction is issued per cycle. If the control of the decoupled execution units become too fine-grained or if the control burden of the master core becomes sufficiently high, the fetching and decoding of instructions can become a bottleneck of the system. This thesis investigates how to parallelize the instruction fetch, decode and issue process. Traditional parallel fetch and decode methods in superscalar and VLIW architectures are investigated. Benefits and drawbacks of the two are presented and discussed. One superscalar design and one VLIW design are implemented in RTL, and their costs and performances are compared using a benchmark program and synthesis. It is found that both the superscalar and the VLIW designs outperform a baseline scalar processor as expected, with the VLIW design performing slightly better than the superscalar design. The VLIW design is found to be able to achieve a higher clock frequency, with an area comparable to the area of the superscalar design. This thesis also investigates how instructions can be encoded to lower the decode complexity and increase the speed of issue to decoupled execution units. A number of possible encodings are proposed and discussed. Simulations show that the encodings have a possibility to considerably lower the time spent issuing to decoupled execution units.
AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)