cyber professional

Infrastructure and security for AI and LLM research

Roboticist (MLOps, DevOps, and Build) at RAI Institute

Boston, Massachusetts, United States🇺🇸
Open to: Full-time roles, Mentoring others, Board positions, Collaborations
Preferred workplace: Remote (same country), Remote (worldwide), Hybrid, Remote (same region)

I’ve been a founding engineer, as well as infrastructure and security lead, for multiple AI and ML startups. I work directly with ML teams, AI researchers, and academic scientists to accelerate their development and get their work deployed to users.

My next role will focus on strategic technical work at a fast-moving and engineering-centric company. I’m particularly interested in working on developer velocity in distributed systems, AI agent governance and security, and scientific Python builds and tooling.

Activity

Active on: Bluesky network

Career: 7

Roboticist (MLOps, DevOps, and Build)

RAI Institute

Cambridge, MA

Mar 2024 - Present

• Delivered platform engineering services at a hardware+software neolab with 10+ independent research teams, including sourcing GPU compute, supporting multi-arch and CUDA builds, writing Python microservices, and owning observability and security.
• Obtained executive and legal approval to streamline AI agent adoption, add new models and MCP servers, and clarify guidelines for safely scaling AI-written code to 50% of all code org-wide.
• Authored a migration strategy for a C++/Python/ROS2 robotics monorepo to achieve 10-100x faster builds and a 90% reduction in build failures. Embedded with the newly created Build team for 6 months to recruit key hires, develop a Bazel MVP, and secure researcher adoption.
• Provisioned and operated Kubernetes AI training clusters (35,000 CPUs and 500 GPUs across cloud providers and on-prem). Provided on-call support for researcher workloads such as distributed training on Ray, Metaflow pipelines, and self-hosted GitHub Actions CI runners.
• Owned high-urgency DevOps projects, including a cross-company reinforcement learning collaboration, security incident response, and EU data privacy compliance.
• Earned a HackerOne bounty for discovering an access control bypass in GitHub Actions.

Founding Engineer (DevOps, Security, and Infrastructure)

Tome

Mar 2022 - Nov 2023

• Built backend infrastructure for automated analysis of venture capital investment contracts, including job scheduling, sandboxing, data storage, and failure handling.
• Established continuous delivery and observability standards across all TensorFlow/Keras inference serving, Python API services, and data pipelines.
• Achieved SOC 2 compliance for a seed-stage LLM legaltech startup by building a compliance program from scratch, including technical controls, penetration testing, and evidence collection.
• Closed "early adopter" deals with law firm security and risk teams as technical sales lead, enabling private model training on proprietary contract data.

Principal Infrastructure Engineer, Fidelity Labs

Fidelity Investments

Boston, MA

Jun 2020 - Jan 2022

• Architected AWS SageMaker research environments, a Snowflake data warehouse with PII de-identification, and an Elasticsearch cluster with integrated sales-conversion ML rankings.
• Secured enterprise-wide approval for Terraform as an infrastructure-as-code tool and coached other teams on IaC adoption and best practices.
• Increased production deploy frequency 10x with containerized CI/CD on Jenkins/Kubernetes.

Principal Infrastructure Engineer

Catchlight

Jun 2020 - Jan 2022

Senior Site Reliability Engineer

Quantopian Inc. (acquired by Robinhood)

Greater Boston

Jan 2019 - Apr 2020

• Eliminated 90% of overnight PagerDuty incidents by migrating hedge fund trading jobs from cron scheduling to Apache Airflow running on Kubernetes pods.
• Refactored the trading simulator from a multi-tenant Python monolith into per-algorithm stateless containers on Kubernetes, isolating blast radius and cutting cost by 50%.
• Designed on-demand cloud developer environments using Kubernetes, Helm, and Buildkite.

Site Reliability Engineer

Quantopian

Greater Boston Area

Mar 2017 - Jan 2019

Site Reliability Engineer

Harvard University

Greater Boston Area

May 2015 - Mar 2017

• Wrote a Python framework for high-reliability, high-concurrency statistical processing of MRI brain scan data on HPC hardware (SLURM).
• Architected an expanded data pipeline system to handle new demands from researchers collecting higher-density data such as phone logs and GPS coordinates.
• Replaced an existing cron job system with improved scheduling options, including on-demand and backfill jobs.

Languages: 3

English (Native or bilingual)
Spanish (Limited working)
Chinese (Elementary)

Skills: 50

Technical

Amazon Web Services (AWS), Bash, Docker, Elasticsearch, Git, Linux, Machine Learning, Python (Programming Language), SQL, Terraform

Business

Writing

Community

Open Source, Open Source Software

Other

Amazon Athena, Amazon EC2, Amazon ECS, Amazon RDS, Amazon Relational Database Service (RDS), Amazon S3, Ansible, Apache Airflow, Artificial Intelligence (AI), Bazel, CI, Cloud Computing, Conference Organization, Conference Speaking, Continuous Delivery, Continuous Integration, Continuous Integration and Continuous Delivery (CI/CD), Data Engineering, Datadog, DevOps, Distributed Systems, Engineering Management, High Performance Computing (HPC), Infrastructure, Infrastructure as code (IaC), Jenkins, Kubernetes, Large Language Models (LLM), MLOps, Open Source Development, Programming, Python, Scalability, Security, Site Reliability Engineering, Software Development, Technical Writing