cyber professional
Infrastructure and security for AI and LLM research
Roboticist (MLOps, DevOps, and Build) at RAI Institute
Career: 7
Roboticist (MLOps, DevOps, and Build)
RAI Institute
Cambridge, MA
Mar 2024 - Present
• Delivered platform engineering services at a hardware+software neolab with 10+ independent research teams, including sourcing GPU compute, supporting multi-arch and CUDA builds, writing microservices in Python, and owning observability and security.
• Obtained executive and legal approval to streamline AI agent adoption, add new models and MCP servers, and clarify guidelines to safely scale to 50% of code org-wide written by AI.
• Authored migration strategy for a C++/Python/ROS2 robotics monorepo to achieve 10-100x faster builds and a 90% reduction in build failures. Embedded with newly created Build team for 6 months to recruit key hires, develop a Bazel MVP, and secure researcher adoption.
• Provisioned and operated Kubernetes AI training clusters (35,000 CPUs and 500 GPUs across cloud providers and on-prem). Provided on-call support for researcher workloads such as distributed training in Ray, Metaflow pipelines, and self-hosted GitHub Actions CI runners.
• Owned high-urgency DevOps projects including cross-company reinforcement learning collaboration, security incident response, and EU data privacy compliance.
• Earned a HackerOne bounty for discovering an access control bypass in GitHub Actions.
Founding Engineer (DevOps, Security, and Infrastructure)
Tome
Mar 2022 - Nov 2023
• Built backend infrastructure for automated analysis of venture capital investment contracts, including job scheduling, sandboxing, data storage, and failure handling.
• Established continuous delivery and observability standards across all TensorFlow/Keras inference serving, Python API services, and data pipelines.
• Obtained SOC 2 for a seed-stage LLM legaltech startup with a from-scratch compliance program including technical controls, penetration testing, and evidence collection.
• Closed "early adopter" deals with law firm security and risk teams as technical sales lead, enabling private model training on proprietary contract data.
Principal Infrastructure Engineer, Fidelity Labs
Fidelity Investments
Boston, MA
Jun 2020 - Jan 2022
• Architected AWS SageMaker research environments, Snowflake data warehouse with PII deidentification, and Elasticsearch cluster with integrated sales conversion ML rankings.
• Secured enterprise-wide approval for Terraform as an infrastructure-as-code tool and coached other teams on IaC adoption and best practices.
• Increased production deploy frequency 10x with containerized CI/CD on Jenkins/Kubernetes.
Principal Infrastructure Engineer
Catchlight
Jun 2020 - Jan 2022
Senior Site Reliability Engineer
Quantopian Inc. (acquired by Robinhood)
Greater Boston
Jan 2019 - Apr 2020
• Eliminated 90% of overnight PagerDuty incidents by migrating hedge fund trading from cron scheduling to Apache Airflow running on Kubernetes pods.
• Refactored trading simulator from Python multi-tenant monolith to run as per-algorithm stateless containers on Kubernetes with isolated blast radius and 50% reduction in cost.
• Designed on-demand cloud developer environments using Kubernetes + Helm + Buildkite.
Site Reliability Engineer
Quantopian
Greater Boston Area
Mar 2017 - Jan 2019
Site Reliability Engineer
Harvard University
Greater Boston Area
May 2015 - Mar 2017
• Wrote a Python framework for high-reliability, high-concurrency statistical processing of MRI brain scan data on HPC hardware (SLURM).
• Architected an expanded data pipeline system to cope with the new demands of researchers collecting higher-density data such as phone logs and GPS coordinates.
• Replaced an existing cron job system with improved scheduling options, including on-demand and backfill jobs.