Rohan Rajesh Kalbag

Deep Recurrent Q-Learning for Partially Observable Markov Decision Processes

This project presents a custom implementation of Deep Recurrent Q-Learning (DRQL) for Partially Observable Markov Decision Processes (POMDPs). Our approach uses transfer learning for feature extraction, a customized LSTM for temporal recurrence, and a domain-informed reward function that accelerates convergence relative to the vanilla formulation in the original paper. Performance is evaluated on two adaptive Atari 2600 games, Assault-v5 and Bowling, in which difficulty scales with player proficiency. We compare the convergence of the domain-informed reward function against the vanilla version under both the StepLR and CosineAnnealingLR learning rate schedulers, and provide theoretical explanations for the observed behaviour. We also propose an efficient windowed episodic memory that reduces GPU memory usage through bootstrapped sequential updates.
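
A minimal sketch of the network design described above, combining a pretrained CNN feature extractor (transfer learning) with an LSTM for temporal recurrence and a Q-value head. The ResNet-18 backbone, hidden size, and sequence handling are illustrative assumptions, not the exact architecture used in the project.

```python
import torch
import torch.nn as nn
from torchvision import models


class DRQN(nn.Module):
    """Sketch: transfer-learned per-frame features + LSTM + Q-value head."""

    def __init__(self, n_actions: int, lstm_hidden: int = 512):
        super().__init__()
        # Frozen pretrained CNN acts as the per-frame feature extractor.
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        self.features = nn.Sequential(*list(backbone.children())[:-1])  # drop FC head
        for p in self.features.parameters():
            p.requires_grad = False
        # LSTM integrates features over time to compensate for partial observability.
        self.lstm = nn.LSTM(input_size=512, hidden_size=lstm_hidden, batch_first=True)
        self.q_head = nn.Linear(lstm_hidden, n_actions)

    def forward(self, frames, hidden=None):
        # frames: (batch, seq_len, 3, H, W), already resized/normalized for the backbone
        b, t = frames.shape[:2]
        x = self.features(frames.flatten(0, 1)).flatten(1)   # (b*t, 512)
        x, hidden = self.lstm(x.view(b, t, -1), hidden)      # (b, t, lstm_hidden)
        return self.q_head(x), hidden                        # Q-values per timestep
```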
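
The two learning rate schedules compared in the study can be set up as below; the optimizer, initial rate, step size, decay factor, and horizon are placeholder values rather than the experimental settings.

```python
import torch

# Dummy parameters stand in for the DRQN weights in this sketch.
optim_step = torch.optim.Adam([torch.nn.Parameter(torch.zeros(1))], lr=1e-3)
optim_cos = torch.optim.Adam([torch.nn.Parameter(torch.zeros(1))], lr=1e-3)

# StepLR decays the rate by a constant factor at fixed intervals.
step_sched = torch.optim.lr_scheduler.StepLR(optim_step, step_size=100, gamma=0.5)
# CosineAnnealingLR decays it smoothly toward a minimum over T_max steps.
cos_sched = torch.optim.lr_scheduler.CosineAnnealingLR(optim_cos, T_max=500)

for epoch in range(500):
    # ... one training epoch would run here ...
    step_sched.step()
    cos_sched.step()
```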
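
The windowed episodic memory idea can be sketched as follows: only fixed-length windows of consecutive transitions are stored and sampled, so each sampled window can be replayed sequentially to bootstrap the LSTM hidden state before the Q-update while keeping GPU memory bounded. Capacity, window length, and transition layout here are assumptions for illustration.

```python
import random
from collections import deque


class WindowedEpisodicMemory:
    """Sketch: replay buffer of fixed-length windows of consecutive transitions."""

    def __init__(self, capacity: int = 10_000, window: int = 8):
        self.window = window
        self.buffer = deque(maxlen=capacity)  # stores fixed-length windows
        self._current = []                    # transitions of the ongoing episode

    def push(self, obs, action, reward, next_obs, done):
        self._current.append((obs, action, reward, next_obs, done))
        # Slide a window over the episode so consecutive transitions stay together.
        if len(self._current) >= self.window:
            self.buffer.append(tuple(self._current[-self.window:]))
        if done:
            self._current = []

    def sample(self, batch_size: int):
        # Each sampled item is a contiguous window, so the recurrent hidden state
        # can be rebuilt by replaying the window sequentially before the Q-update.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```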