AXI-PACK : Near-memory Bus Packing for Bandwidth-Efficient Irregular Workloads

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Abstract: General propose processor (GPP) are demanded high performance in dataintensive applications, such as deep learning, high performance computation (HPC), where algorithm kernels like GEMM (general matrix-matrix multiply) and SPMV (sparse matrix-vector multiply) kernels are intensively used. The performance of these data-intensive applications are bounded with memory bandwidth, which is limited by computing & memory access coupling and memory wall effect. Recent works proposed streaming ISA extensions to maximum memory bandwidth, which decouple computation and memory access, prefetching data by memory access pattern, hiding architecture latency. However, the performance of irregular memory access still suffers from low bus utilization when transferring narrow stream elements on wide memory buses. To solve this problem, the project proposes a new on-chip bus protocol - AXI-PACK, extended from Advance eXtensible Interface4 (AXI4) on-chip protocol, which enables high bandwidth end-to-end irregular memory streaming. Next, an on-chip multi-banked SRAM memory system is designed for supporting AXI-PACK, and AXI-PACK is evaluated under an open-source RISC-V vector processor system. AXI-PACK demonstrates high bus utilization and bandwidth in irregular access, which helps speedup GEMM(element size = 32bits) kernel 6.1 times and SpMV(element size = 32bits) kernel 3.0 times under bus data width of 256 bits, comparing to standard AXI4 bus.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)