Aarav Wattal

Stanford University

B.S./M.S. in Electrical Engineering and Computer Science

I build the hardware layer of intelligent systems.

My work is in the systems layer underneath modern AI: memory, interconnect, verification, bare-metal inference, and agents that can reason about hardware directly.

I’m currently at NVIDIA working on high-bandwidth chip-to-chip interconnect for GPU systems. Before that, I developed DV infrastructure for GPU memory subsystems at AMD, built ML-powered chip verification tooling, worked on disaggregated memory and compute systems at Majestic Labs, and published research on RRAM device modeling.

Now I’m building toward AI systems with more efficient memory hierarchies, tighter hardware-software integration, and agents that can engineer directly against silicon. Long term, I’m interested in the broader stack around intelligent machines: AI hardware, ML systems, robotics, and AI for chip design.

When I’m not working, you can catch me grinding Tetris, learning Chinese, bricking wide-open 3’s, sidequesting, or gambling on semis.

Selected Work

Baymax: Built an AI-powered robotic arm for elderly assistance using computer-vision perception and custom inverse kinematics. Winner of the TreeHacks 2024 First Place Grand Prize ($10K).
Lifetime-Aware Heterogeneous Memory Fabrics for AI Accelerators: Built a PyTorch/ONNX-to-ZigZag workload characterization and heterogeneous memory-network synthesis pipeline for differentiated weight/activation flows across SRAM and emerging-memory fabrics.
Sentinel: Built an agentic firmware engineering system that reads datasheets, generates bare-metal firmware, and validates execution on physical hardware with FPGA-in-the-loop. Winner of the TreeHacks 2026 Best Hardware Prize.
Biimo: Built a $1M-backed AI fitness startup for elderly users in Japan, developing a television-native workout companion with Raspberry Pi camera input, pose estimation, repetition counting, and a remote-control-first React interface.
RRAM Research: Published first-author research on statistical modeling and optimization of resistive RAM devices at IEEE SISPAD, presented in Kobe, Japan.
Bare-Metal Distributed LLM Inference: Ran a 15M-parameter language model directly on bare-metal Raspberry Pi hardware, implementing the stack from GPIO and UART through multicore execution, caching, and virtual memory.
FPGA AI-Powered Music Synthesizer: Built a Verilog music engine for note playback, sampled audio output, and real-time VGA waveform display; implemented a pipelined CNN for chord-progression-based emotion classification; and designed chord, harmonic, and amplitude-decay control logic.

Reach me at awattal[at]stanford[dot]edu or on X @aaravwattal :)