Jubayer Ibn Hamid

I am a first-year PhD student at Stanford University. I work in artificial intelligence, with a focus on algorithms for sequential decision-making and reinforcement learning.

I am affiliated with the Stanford Artificial Intelligence Laboratory (SAIL), where I am advised by Dorsa Sadigh.

Previously, I studied mathematical physics as an undergraduate at Stanford University. Outside of AI, I am interested in pure mathematics, especially abstract algebra and neighbouring fields.

Research

I work on reinforcement learning and sequential decision-making with the goal of developing AI that can solve extremely complex problems in new ways. My current interests span several areas, including exploration-exploitation strategies, training-time and inference-time search, and stabilizing deep RL over long horizons.

Selected Papers

* denotes co-lead authors.

Training Exploratory Reasoning LLMs via Set RL. Ifdita Hasan Orney*, Jubayer Ibn Hamid*, Shreya Ramanujam, Shirley Wu, Hengyuan Hu, Noah Goodman, Dorsa Sadigh, Chelsea Finn. In submission.
Polychromic Objectives for Reinforcement Learning. Jubayer Ibn Hamid*, Ifdita Hasan Orney*, Ellen Xu, Chelsea Finn, Dorsa Sadigh. International Conference on Learning Representations (ICLR), 2026.
Bidirectional Decoding: Improving Action Chunking via Guided Test-Time Sampling. Yuejiang Liu*, Jubayer Ibn Hamid*, Annie Xie, Yoonho Lee, Max Du, Chelsea Finn. International Conference on Learning Representations (ICLR), 2025.

The full list of publications can be found here: All Papers.

Notes

Introductory notes on various topics I find interesting.

Category Theory and Algebraic Geometry. (in progress)
Abstract Algebra. (in progress)
Whitney's Embedding Theorem and Immersion Theorem.
Deep Reinforcement Learning.
Trust Region Optimization Methods.

Teaching

CS 224R - Deep Reinforcement Learning. Head CA. Spring, 2025.

CS 229 - Machine Learning. CA. Winter, 2025.