Essays about: "Parallel Computing Framework"

Showing results 1-5 of 42 essays containing the words Parallel Computing Framework.

  1. Low-power Implementation of Neural Network Extension for RISC-V CPU

    University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

    Author : Dario Lo Presti Costantino; [2023]
    Keywords : Artificial intelligence; Deep learning; Neural networks; Edge computing; Convolutional neural networks; Low-power electronics; RISC-V; AI accelerators; Parallel processing; Artificiell intelligens; Deep learning; Neurala nätverk; Edge computing; konvolutionella neurala nätverk; Lågeffektelektronik; RISC-V; AI-acceleratorer; Parallell bearbetning;

    Abstract : Deep Learning and Neural Networks have been studied and developed for many years, but there is still a great need for research in this field, because industry needs are changing rapidly. The new challenge in this field is called edge inference: the deployment of Deep Learning on small, simple and cheap devices, such as low-power microcontrollers.

  2. Evaluation of FPGA-based High Performance Computing Platforms

    University essay from Linköpings universitet/Datorteknik

    Author : Martin Frick-Lundgren; [2023]
    Keywords : FPGA; High performance computing; BUDE; GEMM; CPU; GPU;

    Abstract : High performance computing is a topic that has risen to the top in the era of digitalization, AI and automation. Therefore, the search for more cost- and time-effective ways to implement HPC work is a subject of extensive research. One part of this is to have hardware that is capable of improving on these criteria.

  3. Register Caching for Energy Efficient GPGPU Tensor Core Computing

    University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

    Author : Qiran Qian; [2023]
    Keywords : Computer Architecture; GPGPU; Tensor Core; GEMM; Energy Efficiency; Register File; Cache; Instruction Scheduling; Datorarkitektur; GPGPU; Tensor Core; GEMM; energieffektivitet; registerfil; cache; instruktionsschemaläggning;

    Abstract : The General-Purpose GPU (GPGPU) has emerged as the predominant computing device for extensive parallel workloads in the fields of Artificial Intelligence (AI) and Scientific Computing, primarily owing to its adoption of the Single Instruction Multiple Thread architecture, which not only provides a wealth of thread context but also effectively hides the latencies exposed in single-thread execution. As computational demands have evolved, modern GPGPUs have incorporated specialized matrix engines, e.

  4. Implementation of Bolt Detection and Visual-Inertial Localization Algorithm for Tightening Tool on SoC FPGA

    University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

    Author : Muhammad Ihsan Al Hafiz; [2023]
    Keywords : Bolt detection; Visual-Inertial localization; System-on-Chip (SoC); Field-Programmable Gate Array (FPGA); Machine learning; Perspective-n-Points; Error-State Extended Kalman Filter (ESEKF); High-Level Synthesis (HLS); YOLO; Tightening tool; Bultdetektering; visuell-tröghetslokalisering; System-on-Chip (SoC); Field-Programmable Gate Array (FPGA); Machine Learning; Perspective-n-Points; Error-State Extended Kalman Filter (ESEKF); High-Level Synthesis (HLS); YOLO; åtdragningsverktyg;

    Abstract : With the emergence of Industry 4.0, there is a pronounced emphasis on the necessity for enhanced flexibility in assembly processes. In the domain of bolt-tightening, this transition is evident. Tools are now required to navigate a variety of bolts and unpredictable tightening methodologies.

  5. Modernizing and Evaluating the Autotuning Framework of SkePU 3

    University essay from Linköpings universitet/Institutionen för datavetenskap

    Author : Basel Nsralla; [2022]
    Keywords : SkePU; Autotuning; Parallel Computing; Multicore; OpenCL; OpenMP;

    Abstract : Autotuning is a method that enables a program to automatically choose the parameters that best optimize it for a certain goal, e.g. speed or cost.