Rohan Rajesh Kalbag

IITB-RISC-2022

The IITB-RISC-22 is a 16-bit computer system with a Turing-complete ISA of 17 instructions. It features 8 general-purpose registers (R0 to R7), with R7 doubling as the program counter, along with a carry flag and a zero flag. The datapath includes two 16-bit Arithmetic Logic Units (ALUs) and a 16-bit priority encoder that produces a 16-bit output and a 3-bit register address. Two sign extenders, SE6 and SE9, take 6-bit and 9-bit inputs, respectively, and yield 16-bit outputs. Two left shifters, Lshifter7 and Lshifter1, shift their inputs left by 7 and 1 bit(s), respectively, appending zeros on the right to produce 16-bit outputs. The architecture also includes four temporary registers, TA, TB, TC, and TD, where TA, TB, and TC are 16-bit and TD is 3-bit. The design is completed by a 128-byte (64-word addressable) random access memory.
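
As a rough illustration (not the actual VHDL), the following Python sketch models the behaviour of the sign extenders and left shifters described above; the function names and the example immediate are illustrative only.

```python
# Illustrative Python reference models for the IITB-RISC-22 datapath helpers.
# These are behavioural sketches, not the hardware implementation.

MASK16 = 0xFFFF  # all results are truncated to 16 bits

def sign_extend(value: int, bits: int) -> int:
    """Sign-extend a `bits`-wide value to 16 bits (SE6 uses bits=6, SE9 uses bits=9)."""
    sign_bit = 1 << (bits - 1)
    return ((value & (sign_bit - 1)) - (value & sign_bit)) & MASK16

def left_shift(value: int, amount: int) -> int:
    """Shift left, filling with zeros on the right (Lshifter7: amount=7, Lshifter1: amount=1)."""
    return (value << amount) & MASK16

# Example: a 9-bit immediate 0x1FF (-1) sign-extended to 16 bits, then shifted left by 7.
imm = sign_extend(0x1FF, 9)                # 0xffff
print(hex(imm), hex(left_shift(imm, 7)))   # 0xffff 0xff80
```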

A six-stage pipelined version of the processor was then developed. The entire instruction set was tested by loading programs into memory, and the resulting waveforms were verified on a Xilinx Spartan 6 FPGA. To improve performance further, the design was extended into a two-way-fetch out-of-order superscalar architecture, raising the instructions per cycle (IPC). This enhancement introduced a re-order buffer, a reservation station, and Tomasulo’s register renaming algorithm, streamlining execution and improving overall efficiency.
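
To illustrate the register-renaming idea only (this is not the project's VHDL), the Python sketch below maps architectural registers to re-order buffer tags so that dependent instructions read the most recent producer rather than a stale register value; all names are hypothetical.

```python
# Toy register-renaming sketch in the spirit of Tomasulo's algorithm.
# Destinations are renamed to ROB tags so later instructions depend on the
# producing instruction, removing WAW/WAR hazards on R0..R7.

class Renamer:
    def __init__(self, num_regs=8):
        self.alias = {f"R{i}": None for i in range(num_regs)}  # reg -> ROB tag (None = value in reg file)
        self.next_tag = 0

    def rename(self, dest, src1, src2):
        # Sources read either the register file or the tag of the latest producer.
        operands = [self.alias[s] or s for s in (src1, src2)]
        # The destination gets a fresh ROB tag; later readers will see this tag.
        tag = f"ROB{self.next_tag}"
        self.next_tag += 1
        self.alias[dest] = tag
        return tag, operands

r = Renamer()
print(r.rename("R1", "R2", "R3"))  # ('ROB0', ['R2', 'R3'])
print(r.rename("R4", "R1", "R3"))  # ('ROB1', ['ROB0', 'R3'])   R1 now comes from ROB0
print(r.rename("R1", "R1", "R4"))  # ('ROB2', ['ROB0', 'ROB1']) name dependences removed
```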

Implementation of Stop-and-Wait Algorithm

This project involved the development of a UDP-based Stop-and-Wait algorithm for reliable data transfer. The sender created UDP sockets for communication with the receiver and managed packet transmission and retransmission based on acknowledgments from the receiver, ensuring data integrity. A timeout mechanism built on the select() function allowed the sender to wait a bounded time for a response and retransmit when a packet or its acknowledgment was lost. On the receiver side, the system continuously listened for incoming packets and included a probabilistic packet-drop simulation, in which packets were randomly dropped with a user-defined probability to examine the protocol's robustness under data loss. The receiver tracked the sequence numbers of incoming packets, acknowledging correctly received packets and requesting retransmissions when necessary. The project explored the details of network communication, demonstrating a reliable communication protocol with timeout handling and packet loss simulation.
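
A minimal Python sketch of the sender's retransmission loop is given below. It assumes a single-byte sequence number prepended to each payload and an acknowledgment that echoes it back; the address, port, timeout, and packet format are illustrative, not taken from the original code.

```python
import socket, select

RECEIVER = ("127.0.0.1", 9000)   # illustrative receiver address/port
TIMEOUT_S = 0.5                  # retransmission timeout in seconds

def send_reliably(sock, data: bytes, seq: int) -> None:
    """Send one packet with a 1-byte sequence number and wait for a matching ACK."""
    packet = bytes([seq]) + data
    while True:
        sock.sendto(packet, RECEIVER)
        # select() waits until the socket is readable or the timeout expires.
        readable, _, _ = select.select([sock], [], [], TIMEOUT_S)
        if not readable:
            continue                      # timeout: retransmit the same packet
        ack, _ = sock.recvfrom(2048)
        if ack and ack[0] == seq:
            return                        # correct ACK received, move on

if __name__ == "__main__":
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for i, chunk in enumerate([b"hello", b"world"]):
        send_reliably(sock, chunk, i % 2)  # alternating 0/1 sequence numbers
```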

Implementation, Visualisation and Analysis of Circuit Partitioning Algorithms

Recognizing that VLSI design schematics often exceed the capacity of a single FPGA due to its finite number of programmable logic elements, this project adopted a graph-based approach to represent interconnections between circuit elements. Logic gates, LUTs, FFs, and other design entities were modeled as graph nodes, while interconnections were represented as edges; weighted graphs were used where multiple parallel interconnects exist. This formulation allows CAD tools to automate the division of a design among multiple FPGAs by casting it as a graph partitioning problem. Three well-known partitioning algorithms, Kernighan-Lin, Clustering-Based, and Hagen-Kahng-EIG, were implemented in Python, with metrics identified to assess their performance and to prioritize the minimization of inter-FPGA interconnects. By co-optimizing across these metrics, advanced cost functions were developed to obtain good circuit partitioning solutions within the realm of VLSI CAD.
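
The project's own implementations are not reproduced here; the sketch below, assuming a netlist already mapped onto a networkx graph, shows the general formulation using the library's built-in Kernighan-Lin bisection and a cut-size metric. The graph and weights are made up for illustration.

```python
import networkx as nx
from networkx.algorithms.community import kernighan_lin_bisection

# Toy netlist graph: nodes are gates/LUTs/FFs, edge weights count parallel interconnects.
G = nx.Graph()
G.add_weighted_edges_from([
    ("g0", "g1", 2), ("g1", "g2", 1), ("g2", "g3", 3),
    ("g3", "g4", 1), ("g4", "g5", 2), ("g0", "g5", 1),
])

# Bisect the netlist into two FPGA-sized partitions.
part_a, part_b = kernighan_lin_bisection(G, weight="weight", seed=0)

# Cost metric: total weight of edges crossing the cut (inter-FPGA interconnects).
cut_size = sum(d["weight"] for u, v, d in G.edges(data=True)
               if (u in part_a) != (v in part_a))
print(part_a, part_b, "cut =", cut_size)
```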

Optimised Multiply Accumulate Circuit using Dadda Multiplier Architecture

This project implements an optimized multiply-accumulate (MAC) circuit in VHDL, using a Dadda multiplier architecture and a 16-bit Brent-Kung adder. The circuit multiplies two 8-bit operands and adds a 16-bit number to the product. The hardware descriptions were simulated and tested using GHDL, which produced comprehensive test reports. The submission includes all necessary files, such as test scripts and waveform analysis tools. RTL synthesis for FPGA and detailed results were obtained using Intel Quartus.
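
A small Python golden model, assuming the MAC simply computes a*b + c with the bit widths stated above, can be used to generate expected outputs for GHDL testbenches; it is an illustrative sketch, not the project's actual verification script.

```python
# Illustrative golden model: product of two 8-bit operands plus a 16-bit addend.
# The hardware's output width is not restated here, so the full result is kept;
# mask with & 0xFFFF if the circuit truncates to 16 bits (an assumption).
import random

def mac(a: int, b: int, c: int) -> int:
    assert 0 <= a < 2**8 and 0 <= b < 2**8 and 0 <= c < 2**16
    return a * b + c   # result is at most 17 bits wide

# Generate a few random test vectors, e.g. to compare against GHDL waveforms.
for _ in range(5):
    a, b, c = random.randrange(256), random.randrange(256), random.randrange(65536)
    print(f"a={a:3d} b={b:3d} c={c:5d} -> {mac(a, b, c):6d}")
```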

Asymptotic Analysis of Autoencoder Architectures for Image Colorization and Noise Reduction

This project applied deep learning to image colorization and noise reduction. Its centerpiece was a Convolutional Neural Network (CNN)-based autoencoder trained on grayscale CIFAR-10 images, which achieved a Root Mean Square Error (RMSE) of 0.052 when generating colorized versions. The performance of autoencoders was then compared with Principal Component Analysis (PCA) for Gaussian and salt-and-pepper noise reduction, using MNIST images as training data. This comparison provided insight into the strengths and limitations of the two methods and their suitability for different noise reduction scenarios. Finally, the project explored the data specificity of autoencoders by running the same model on different image classes, illustrating how autoencoders adapt to diverse data sets.
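
The project's exact model is not reproduced here; the following Keras sketch, with assumed layer sizes, shows the general shape of a convolutional autoencoder that maps a 32x32 grayscale CIFAR-10 image to a 3-channel colorized output.

```python
# Illustrative convolutional autoencoder for colorization (layer sizes are assumptions,
# not the project's architecture): 32x32x1 grayscale in, 32x32x3 RGB out.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_colorization_autoencoder():
    inp = layers.Input(shape=(32, 32, 1))
    # Encoder: downsample while increasing channel depth.
    x = layers.Conv2D(32, 3, activation="relu", padding="same")(inp)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(64, 3, activation="relu", padding="same")(x)
    x = layers.MaxPooling2D(2)(x)
    # Decoder: upsample back to the input resolution.
    x = layers.Conv2D(64, 3, activation="relu", padding="same")(x)
    x = layers.UpSampling2D(2)(x)
    x = layers.Conv2D(32, 3, activation="relu", padding="same")(x)
    x = layers.UpSampling2D(2)(x)
    out = layers.Conv2D(3, 3, activation="sigmoid", padding="same")(x)  # RGB in [0, 1]
    model = models.Model(inp, out)
    model.compile(optimizer="adam", loss="mse")  # RMSE is the square root of this loss
    return model

model = build_colorization_autoencoder()
model.summary()
# model.fit(gray_images, color_images, epochs=20, batch_size=128)  # assuming data scaled to [0, 1]
```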