Bhavana Mehta

Ph.D. Candidate · University of Pennsylvania

Distributed Systems · Adaptive Databases · ML for Infrastructure

About

I am a Ph.D. candidate in Computer & Information Science at the University of Pennsylvania, advised by Prof. Boon Thau Loo and co-advised by Prof. Mohammad Javad Amiri. I also collaborate closely with Prof. Ryan Marcus on adaptive data management and distributed learning systems.

My research focuses on high-performance distributed databases, intelligent blockchains, and machine learning for systems optimization. I develop algorithms and systems that make data-intensive applications more efficient, adaptive, and reliable—combining rigorous quantitative analysis with practical engineering solutions.

📢 Currently on the job market! I'm seeking full-time industry positions starting in 2025. Please reach out at bhavanam@upenn.edu

Research Interests

My research interests span distributed systems, machine learning for systems, and high-performance computing.

Key Projects

Marlin: Byzantine-Resilient Sharding System

Built Byzantine-resilient sharding system with hypergraph partitioning and RL-based resharding agent in Python/Rust; handles 16K TPS under adversarial conditions with ACID guarantees.

Technologies: Python, Rust, Reinforcement Learning, Byzantine Fault Tolerance

AdaChain: Adaptive Blockchain Orchestrator

Created adaptive blockchain orchestrator using multi-armed bandit algorithms for dynamic pipeline selection; achieved 22% throughput increase and 40% latency reduction.

Technologies: Multi-armed Bandits, Blockchain, Performance Optimization

Capybara: DPDK-based TCP Stack

Collaborated with Microsoft Research on DPDK-based TCP stack for 4 µs live-connection migration; wrote C++ and Python modules for seamless data transfer.

Technologies: DPDK, C++, Python, TCP Migration, Microsoft Research

Distributed Training Infrastructure

Built distributed training infrastructure from scratch in Rust and Python, working with a team of 5 researchers to handle GPU cluster orchestration across heterogeneous hardware—achieved 30% throughput improvements.

Technologies: Rust, Python, GPU Clusters, Infrastructure

Selected Publications

Full publication list: Google Scholar

Experience

Graduate Researcher

Sept 2019 - Present

University of Pennsylvania, Distributed Systems Lab

  • Built distributed training infrastructure from scratch in Rust and Python, working with a team of 5 researchers to handle GPU cluster orchestration across heterogeneous hardware—achieved 30% throughput improvements
  • Designed microservices control plane for job scheduling and data staging; scaled the system to 128 nodes with sub-millisecond heartbeat monitoring
  • Set up monitoring stack with Prometheus, Grafana, and OpenTelemetry to track cluster health and job performance; reduced time to detect failures by about 40%
  • Implemented automated fault-injection testing for training pipelines under node failures and network partitions; raised system availability to 99.99%
  • Created infrastructure-as-code using Terraform and Kubernetes to provision HPC clusters and cloud resources; reduced cluster setup time from days to 2 hours

Design Engineer

Jan 2018 - Jun 2019

Bluespec Inc., Boston, MA

  • Built configuration management system in Go for 500+ FPGA build variants with containerized jobs and task queues; improved build throughput by 25%
  • Automated CI/CD pipelines using Jenkins, Ansible, and Docker; reduced manual work in nightly builds by 80%
  • Worked with hardware team on kernel-bypass networking using DPDK for shuffle operations; cut interconnect overhead by 50 µs

Research Intern

May 2017 - Jul 2017

RISE Lab, Indian Institute of Technology Madras

  • Co-designed FPGA pipeline for 64-bit arithmetic achieving 281 ns latency; built simulation framework for validation

Technical Skills

Languages

Rust, Go, C++, Python, Bash, SQL, Terraform

Distributed Systems

Microservices, gRPC, Kafka, Sharding, Consensus Protocols, Load Balancing

Infrastructure & Cloud

Kubernetes, Docker, Terraform, AWS, GCP, Linux, DPDK

Observability & DevOps

Prometheus, Grafana, OpenTelemetry, Jenkins, CI/CD, Chaos Engineering

Databases

PostgreSQL, Cassandra, Redis, ScyllaDB, Query Optimization