Hardware-based Acceleration of Network-on-Chip Simulation using FPGAs

M, Prabhu Prasad B.

Please use this identifier to cite or link to this item: https://idr.l2.nitk.ac.in/jspui/handle/123456789/17038

Title:	Hardware-based Acceleration of Network-on-Chip Simulation using FPGAs
Authors:	M, Prabhu Prasad B.
Supervisors:	Talawar, Basavaraj.
Keywords:	Department of Computer Science & Engineering;Network-on-Chip;NoC;FPGA;Simulation acceleration;Performance analysis;DSP48E1;BRAM
Issue Date:	2021
Publisher:	National Institute of Technology Karnataka, Surathkal
Abstract:	Replacing conventional bus-based architectures, the Network-on-Chip (NoC) has become a tangible on-chip communication framework in many-core processors, Chip Multi-Processors (CMPs), and Multi-Processor Systems-on-Chip (MPSoCs). NoCs have also become an integral part of heterogeneous systems with application-specific accelerators for workloads such as databases, graph processing, and deep neural networks. In these systems, the NoC is responsible for interconnecting the various components. More cores are being incorporated into state-of-the-art homogeneous and heterogeneous multi-core processors to achieve higher performance and better power efficiency. Likewise, the number of components integrated on heterogeneous systems, such as processing cores, input/output peripherals, and memory components, is increasing to achieve high performance in the target applications. As the number of interconnected components increases, the performance of the target application becomes highly dependent on the performance of the NoC. Hence, there is a need to model and evaluate large NoC designs quickly and accurately, as thousands of cores are targeted in near-future multi-core architectures owing to advances in CMOS technology. NoC modeling helps in understanding the impact of various design parameters on the overall system and its performance characteristics. A crucial hurdle in the design and evaluation of large-scale NoCs is the lack of rapid modeling methodologies that deliver a high level of accuracy. Analytical models compromise accuracy to produce results in a short time. Hence, to perform design space exploration of NoCs, designers frequently employ software simulators, which provide better accuracy than analytical modeling. However, when a large-scale NoC with a huge number of nodes is simulated, software simulators tend to become too slow.
To address the issue of simulation speed, this thesis proposes YaNoC, a fully parameterized Field Programmable Gate Array (FPGA)-based NoC simulation acceleration framework. YaNoC supports the design space exploration of various NoC topologies considering a rich set of router micro-architectural parameters. To simulate larger topologies, the hard blocks of the FPGA, namely Block RAMs (BRAMs) and DSP blocks, are employed to map the NoC router's FIFO buffers and crossbar, respectively. Further, a lightweight NoC router architecture is proposed to reduce area utilization and improve network performance. The initial work of the thesis employs profiling to analyze the performance of the Booksim2.0 NoC software simulator under various design decision parameters and memory configurations. To observe the effect of cache configurations, the NoC topologies of Booksim2.0 are simulated under various cache design parameters such as cache size, block size, and associativity. The hotspots of the Booksim2.0 simulator are identified, and software optimizations are employed to improve its performance. To reduce the execution time of Booksim2.0, optimization methodologies such as vectorization and thread parallelization are employed. The OpenMP programming model is used for parallelizing and vectorizing the source code of Booksim2.0. Due to the high synchronization cost, the gain achieved in simulation speed is not significant. Higher simulation speed can be achieved by sacrificing simulation accuracy to mitigate the complexity of synchronization. FPGA-based simulators are becoming a promising approach for enhancing the speed of simulations.
An FPGA-based NoC simulation acceleration framework called YaNoC, supporting design space exploration of standard and custom NoC topologies considering a full set of NoC router micro-architectural parameters, has been proposed. YaNoC also supports designing custom routing algorithms and various traffic patterns. The results obtained show that YaNoC consumes fewer hardware resources and is faster than other FPGA-based NoC simulation acceleration platforms. Most state-of-the-art FPGA-based simulators use only soft logic for modeling NoCs, leaving the hard blocks unutilized. The FPGA soft logic resources become a limiting factor when simulating a large NoC topology. Multiple FPGAs with off-chip memory can be employed to overcome this limitation of FPGA resources. However, such approaches make the entire system more complex and slower, reducing its performance. Instead of using a multi-FPGA setup to simulate larger topologies, the hard blocks of an FPGA have been utilized efficiently to map the NoC router components. The functionality of the NoC router’s buffer and crossbar switch is embedded in the BRAMs and the wide multiplexers of the DSP48E1 slices, respectively. Compared to other state-of-the-art works, embedding the functionality of the buffers and crossbar in the hard blocks of the FPGA yields a substantial decrease in the Configurable Logic Block (CLB) utilization of NoC topologies. A lightweight, high-performance NoC architecture is suitable for designing heterogeneous systems, as it reduces area and improves overall system performance. A low-latency router with a look-ahead bypass called LBNoC has been proposed. The LBNoC router design employs techniques such as single-cycle router pipeline bypass, an adaptive routing module, parallel virtual channel and switch allocation, and a flow control mechanism that combines virtual cut-through and wormhole switching.
The input buffer modules of the NoC router are mapped onto the FPGA BRAM hard blocks to utilize resources efficiently.
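The buffer mapping described in the abstract can be pictured with a small software model. The following is only an illustrative sketch in Python, not the thesis's hardware design: the class name `BramFifo`, the `depth` parameter, and the method names are hypothetical, and the fixed-size list merely stands in for a region of block RAM addressed by read and write pointers.

```python
class BramFifo:
    """Hypothetical software model of a router input FIFO held in a fixed-depth memory."""

    def __init__(self, depth):
        if depth <= 0:
            raise ValueError("depth must be positive")
        self.depth = depth
        self.mem = [None] * depth  # stands in for a block RAM region (assumption)
        self.rd_ptr = 0            # address of the next flit to read
        self.wr_ptr = 0            # address of the next free slot
        self.count = 0             # number of flits currently stored

    def is_full(self):
        return self.count == self.depth

    def is_empty(self):
        return self.count == 0

    def push(self, flit):
        """Store a flit; return False (and drop nothing) when the buffer is full."""
        if self.is_full():
            return False
        self.mem[self.wr_ptr] = flit
        self.wr_ptr = (self.wr_ptr + 1) % self.depth  # wrap around like an address counter
        self.count += 1
        return True

    def pop(self):
        """Return the oldest flit, or None when the buffer is empty."""
        if self.is_empty():
            return None
        flit = self.mem[self.rd_ptr]
        self.rd_ptr = (self.rd_ptr + 1) % self.depth
        self.count -= 1
        return flit
```

In this sketch the storage is a single addressable array with two pointers, which is the kind of structure that fits a memory block rather than individual registers; the actual BRAM mapping and port configuration used in the thesis are not specified in this record.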
URI:	http://idr.nitk.ac.in/jspui/handle/123456789/17038
Appears in Collections:	1. Ph.D Theses

Files in This Item:

File	Description	Size	Format
Thesis-PrabhuPrasadBM-155113CS15F10.pdf		2.6 MB	Adobe PDF	View/Open
