Rohan Rajesh Kalbag

rCloud - rohan's cloud

rCloud, initially envisioned as a hobby project, is a secure file storage and sharing platform that gives users a seamless and reliable way to store, share, and access their files from anywhere. I currently use it to expose my homelab server as cloud storage to any device connected to its local WiFi network. The backend provides Flask REST API endpoints for interacting with the server’s filesystem, and a React Native client consumes them. The frontend and backend are deployed together using Nginx.
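
One concern any endpoint that touches the server's filesystem has to handle is a client-supplied path that points outside the storage directory. The sketch below shows one common way to guard against that. It is an illustration only: the function name `resolve_under_root` and its parameters are hypothetical and are not taken from the rCloud code.

```python
from pathlib import Path


def resolve_under_root(root: str, user_path: str) -> Path:
    """Map a client-supplied path to a location inside `root`.

    Hypothetical helper: raises ValueError if the resolved path would
    escape the storage root (for example via ".." or an absolute path).
    """
    root_path = Path(root).resolve()
    candidate = (root_path / user_path).resolve()
    try:
        # relative_to() raises ValueError when candidate is not under root_path.
        candidate.relative_to(root_path)
    except ValueError:
        raise ValueError(f"path escapes storage root: {user_path!r}") from None
    return candidate
```

An endpoint handler could call this helper before opening or listing anything, and return an error response when it raises.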

tinyVTA - A General Purpose FPGA-based Deep Neural Network Accelerator

tinyVTA is a high-performance FPGA-based tensor accelerator designed for deep neural network inference. It features optimized block matrix-multiply-accumulate (MMAC) and activation function (ACTIV) kernels, implemented using Vitis HLS and invoked through a custom instruction set architecture (ISA). The project includes a custom Pythonic compiler (tinyTVM) that translates fully connected neural network models written in PyTorch or TensorFlow into tinyVTA executable instructions. It uses the PYNQ API for memory management via PynqBuffer objects and for PS/PL interfacing over AXI4. In addition, APIs built with Tornado let any client compile its DNN models, remotely program a supported FPGA connected to a server, and perform ML inference through remote procedure calls. The accelerator was evaluated on the Xilinx UltraScale+ ZCU104 FPGA with a fully connected neural network for MNIST digit recognition. Extensive verification against a software testbench, together with hardware validation, confirmed that hardware and software outputs are consistent.
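
To make the MMAC and ACTIV kernels concrete, here is a plain-Python sketch of a blocked matrix multiply-accumulate followed by an activation. It is a software model under stated assumptions, not the Vitis HLS kernel: the function names, the tile size, and the choice of ReLU as the activation are all illustrative.

```python
def block_mmac(a, b, acc, block=2):
    """Accumulate a @ b into acc, one (block x block) tile at a time.

    a is n x k, b is k x m, acc is n x m (lists of lists). The tile size
    `block` is an assumed parameter, not tinyVTA's actual block size.
    """
    n, k, m = len(a), len(b), len(b[0])
    for i0 in range(0, n, block):
        for j0 in range(0, m, block):
            for k0 in range(0, k, block):
                # One tile-level operation: acc_tile += a_tile @ b_tile
                for i in range(i0, min(i0 + block, n)):
                    for j in range(j0, min(j0 + block, m)):
                        partial = 0
                        for kk in range(k0, min(k0 + block, k)):
                            partial += a[i][kk] * b[kk][j]
                        acc[i][j] += partial
    return acc


def relu(x):
    """Element-wise ReLU, standing in for an activation kernel."""
    return [[max(0, v) for v in row] for row in x]
```

A fully connected layer then amounts to loading the bias into `acc`, calling `block_mmac` with the inputs and weights, and applying the activation to the result.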

Pythonic Chat Application Implementing Secure End-to-End Encrypted Messaging using Signal Protocol

This project presents a native implementation of the Signal Protocol for secure end-to-end messaging, using the cryptography library in Python. It includes a secure messaging application with a simple client-side GUI built with PySide6 and a WebSocket-based messaging system built on socketio. The server adheres to the Signal Protocol Specification: it stores only credentials and public keys, receives only ciphertext, and supports multiple concurrent two-way conversations. It uses TinyDB to store client information and acts as a communication conduit between clients without retaining any messages. The client application lets users authenticate, select chat partners, and view chat history, which is persisted locally. The Signal Protocol implementation combines the Extended Triple Diffie-Hellman (X3DH) key agreement protocol with the Double Ratchet Algorithm. X3DH establishes a shared secret key between two parties from their public keys, providing forward secrecy and cryptographic deniability. The Double Ratchet Algorithm then keeps both parties' keys synchronized and continually renewed as the conversation continues. The project demonstrates the practical application of secure messaging protocols and provides a foundation for further development in secure communication systems.
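
The symmetric-key half of the Double Ratchet can be sketched in a few lines. The example below follows the construction the Signal specification recommends for the chain-key KDF (HMAC keyed with the chain key, with distinct constant inputs for the message key and the next chain key). It uses only the standard library rather than the cryptography package, omits the Diffie-Hellman ratchet and X3DH entirely, and the function name `kdf_chain_step` is hypothetical.

```python
import hashlib
import hmac
from typing import Tuple


def kdf_chain_step(chain_key: bytes) -> Tuple[bytes, bytes]:
    """Advance a symmetric ratchet chain by one step.

    Returns (next_chain_key, message_key). Each message key is used once,
    and the old chain key is discarded after the step.
    """
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return next_chain_key, message_key
```

Two parties that start from the same chain key and step it once per message derive the same sequence of message keys, while a compromised message key reveals neither the chain key nor any other message key.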

Deep Recurrent Q-Learning for Partially Observable Markov Decision Processes

This project introduces an implementation of Deep Recurrent Q-Learning (DRQL) tailored for Partially Observable Markov Decision Processes (POMDPs). Our approach incorporates Transfer Learning for feature extraction, uses a customized LSTM for temporal recurrence, and introduces a domain-informed reward function that converges faster than the vanilla implementation outlined in the original paper. The performance evaluation centers on two adaptive Atari 2600 games, Assault-v5 and Bowling, where game difficulty scales with player proficiency. We compare the convergence of our optimized reward function against the vanilla version under the StepLR and CosineAnnealingLR learning rate schedulers, and support the comparison with theoretical explanations. Additionally, we propose an efficient windowed episodic memory implementation that optimizes GPU memory utilization through bootstrapped sequential updates.
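
As a rough illustration of what a windowed episodic memory can look like, the sketch below stores whole episodes and hands out fixed-length windows of consecutive transitions, so a recurrent network only ever unrolls over `window` steps. This is one plausible reading of the idea, not the project's implementation: the class name, its parameters, and the uniformly random choice of window start are assumptions.

```python
import random
from collections import deque


class WindowedEpisodicMemory:
    """Replay memory that returns fixed-length windows of consecutive steps."""

    def __init__(self, capacity, window):
        self.episodes = deque(maxlen=capacity)  # oldest episodes are evicted
        self.window = window

    def add_episode(self, transitions):
        # Episodes shorter than one window cannot be sampled, so skip them.
        if len(transitions) >= self.window:
            self.episodes.append(list(transitions))

    def sample(self, batch_size, rng=random):
        if not self.episodes:
            raise ValueError("memory is empty")
        batch = []
        for _ in range(batch_size):
            episode = rng.choice(self.episodes)
            start = rng.randrange(len(episode) - self.window + 1)
            batch.append(episode[start:start + self.window])
        return batch
```

Because every sampled sequence has the same length, a batch can be stacked into a single tensor of shape (batch, window, ...) without padding, which keeps the per-update memory footprint bounded.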

Implementing Deep Learning Architectures for Advanced Machine Learning using PyTorch

This repository implements deep neural network architectures in PyTorch for advanced machine learning applications and contains two major projects. The first is a Long Short-Term Memory (LSTM) based algorithmic stock trader built on the S&P 500 stock market tickers dataset. The implementation models time series with an LSTM, experiments with techniques such as normalization and feature engineering, and assesses the trading module’s profitability under conditions such as bid-ask spread and commissions. The second project covers facial similarity metric learning and face generation using Deep Convolutional Generative Adversarial Networks (DCGAN) with the Labeled Faces in the Wild dataset. It employs a transfer-learned, ResNet-based Siamese network for similarity metric learning, along with experiments on regularization, learning rate scheduling, dropout, and optimizer variations. The DCGAN is trained to generate new faces and then modified into a Conditional GAN that generates unseen images based on a given input image from the Siamese network.

Compiler Infrastructure and Optimization Implementations in Java

This repository presents implementations of compiler infrastructure and optimizations for MiniJava, a Java subset, using JavaCC and JTB. The projects are a type checker, a function inliner based on Rapid Type Analysis (RTA), a register allocator that uses Kempe’s graph coloring heuristic, and a for-loop parallelizer that employs the GCD test. Each implementation is accompanied by a detailed problem specification. The Type Checker type-checks a Java-like object-oriented language from a provided grammar file and reports errors in detail. The Function Inliner determines inlineability based on RTA and transforms method calls. Register Allocation spills variables to memory using liveness analysis results. Loop Parallelization uses the GCD test to identify parallelizable for-loops in methods, marked with the /* @Parallel */ decorator.
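
The GCD test itself is compact enough to show directly. For two array accesses with affine subscripts, say A[a*i + b] and A[c*j + d], the equation a*i + b = c*j + d has an integer solution only if gcd(a, c) divides (d - b). The sketch below is in Python rather than Java, and the function name and parameter layout are hypothetical.

```python
from math import gcd


def gcd_test_may_depend(a: int, b: int, c: int, d: int) -> bool:
    """Return True if A[a*i + b] and A[c*j + d] may touch the same element.

    False proves the accesses are independent. True only means a dependence
    cannot be ruled out, because the test ignores loop bounds.
    """
    g = gcd(a, c)
    if g == 0:
        # Both subscripts are constants; they collide only if they are equal.
        return b == d
    return (d - b) % g == 0
```

For example, a loop that writes A[2*i] and reads A[2*i + 1] passes the independence check because gcd(2, 2) = 2 does not divide 1, so its iterations can run in parallel. A loop that writes A[2*i] and reads A[4*i + 2] does not, because 2 divides 2.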

High-Level Synthesis using Algorithmic Assembly

This project explores High-Level Synthesis using Algorithmic Assembly (AA), an Intermediate Representation (IR) for the AHIRv2 C-to-VHDL compiler developed at IIT Bombay. The first part covers the design of a Shift-and-Add Multiplier and a Shift-and-Subtract Divider circuit in Algorithmic Assembly. The second part focuses on hardware acceleration of matrix multiplication using loop optimizations and parallelism in Algorithmic Assembly. Together, they provide practical insight into how algorithms are translated to hardware.
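
For readers unfamiliar with the two arithmetic circuits, the following Python model shows the algorithms they implement, one loop iteration per clock step. It is a behavioral sketch for unsigned operands, not Algorithmic Assembly code, and the function names and the `width` parameter are illustrative.

```python
def shift_add_multiply(a: int, b: int, width: int = 8) -> int:
    """Multiply two unsigned `width`-bit integers by shifting and adding."""
    product = 0
    for _ in range(width):
        if b & 1:          # lowest multiplier bit set: add the shifted multiplicand
            product += a
        a <<= 1            # multiplicand moves one bit position up
        b >>= 1            # next multiplier bit moves into position 0
    return product


def shift_subtract_divide(dividend: int, divisor: int, width: int = 8):
    """Restoring division on unsigned `width`-bit integers.

    Returns (quotient, remainder).
    """
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    quotient, remainder = 0, 0
    for i in range(width - 1, -1, -1):
        # Shift the next dividend bit (most significant first) into the remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder
```

Both loops run for a fixed `width` iterations regardless of the operand values, which is what makes them straightforward to map onto a sequential datapath.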

Design of a Secure Python Web Application Implementing Chinese Wall Model of Access Control and CSRF Protection

The application’s backend is built with Flask and implements user authentication, database management, and secure file access. The login system uses CSRF tokens to prevent cross-site request forgery attacks, and credential and session handling follow established best practices to protect confidentiality and integrity. The Chinese Wall Model provides temporal access control of documents within the application. The database structure manages users, companies, and files, and enforces strict access restrictions based on user roles and conflict-of-interest criteria.
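
The core rule of the Chinese Wall Model fits in one function: a user may open a company's documents unless they have already accessed a different company in the same conflict-of-interest class. The sketch below shows that check in isolation. The function name, the data shapes, and the example company names are hypothetical and do not reflect the application's actual database schema.

```python
def can_access(history, company, conflict_classes):
    """Decide whether a user may access `company` under the Chinese Wall rule.

    history: set of companies the user has already accessed.
    conflict_classes: iterable of sets, each a conflict-of-interest class.
    """
    for coi_class in conflict_classes:
        if company in coi_class:
            # A previously accessed competitor in the same class blocks access.
            if any(other in history for other in coi_class if other != company):
                return False
    return True
```

The access control is temporal because the answer depends on the user's history: the caller adds `company` to `history` after each granted access, and that first choice within a class closes off the class's other members from then on.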

Formal Verification and Test Generation for Ripple Carry Adder Implementation

This project conducts the formal verification of a given Ripple Carry Adder (RCA) implementation. Binary Decision Diagrams (BDDs), built with bddlib, represent the provided RTL description of the adder circuit. BDD operations, including Image and Pre-Image, were implemented natively in C++ to rigorously prove critical properties such as Goldberg’s Conjecture, establishing the correctness of the adder’s design implementation. SAT solvers were also used to construct the smallest spanning test-vector set for post-fabrication physical design testing.
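
For reference, the circuit being verified can be modeled at the bit level in a few lines. The Python sketch below chains full adders and can be checked against integer addition by exhaustive simulation. That brute-force check is shown only for contrast: it is feasible for small widths, whereas the BDD and SAT techniques described above reason about the circuit symbolically. The function names are illustrative.

```python
def full_adder(a: int, b: int, cin: int):
    """One-bit full adder. Returns (sum_bit, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(x: int, y: int, width: int):
    """Add two unsigned `width`-bit integers with a chain of full adders.

    Returns (sum_bits_as_int, final_carry).
    """
    carry = 0
    result = 0
    for i in range(width):
        # The carry out of bit i ripples into bit i + 1.
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry
```

For a 4-bit adder, checking all 256 input pairs against `x + y` is immediate, but the input space doubles with every added input bit, which is the motivation for symbolic methods.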

Model Based Embedded System Design

This project developed an embedded system for autonomous valet parking. A finite state automaton was designed with a model-based approach in the Heptagon/BZR modeling language to synthesize a highly optimized, easily verifiable reactive kernel. At the hardware level, we engineered sensor and actuator interfacing drivers for the ATmega328P microcontroller, coupled with real-time operating system (RTOS) features such as scheduling, interrupt handling, and memory management, which improved system responsiveness and performance. We also designed and fine-tuned native algorithms for obstacle wall-hugging, PID line following, track color inversion, and parking space identification, and integrated them into the embedded system.
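
As an illustration of the PID line-following component, here is a minimal discrete PID controller. It is written in Python for readability rather than as microcontroller firmware, and the class name, gains, and time step are placeholders, not the tuned values used on the robot.

```python
class PID:
    """Minimal discrete PID controller."""

    def __init__(self, kp: float, ki: float, kd: float):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None  # no derivative term on the first sample

    def update(self, error: float, dt: float) -> float:
        """Return the control output for the current error sample."""
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

In a line follower, `error` would typically be the line's offset from the sensor array's center, and the output would be applied as a speed difference between the left and right motors.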