End-to-End ML Platform & Data Engineer · AI Creator · Synthesia
End-to-end ML Platform & Data Engineer with 10+ years building scalable, governed data systems across AWS and GCP. Creator of DataPains — practical content on ML infrastructure, data engineering, and the tools that matter in production.
Watch DataPains ↗
My work covers Lakehouse architecture, DataOps pipelines, and production-grade ML infrastructure. I am currently Tech Lead at Synthesia, a global AI video platform, where I serve R&D researchers across multiple countries.
DataPains is where I share what I've learned — practical, no-fluff content on data engineering, AI tooling, and the platforms that actually matter in production. Conference speaker at Big Data London and DataNova 2023. Featured on the Data Team Success podcast.
MEDIUM
A look at whether F3's table format improvements actually address the deeper challenge of unifying vector and structured data in modern AI stacks.
Read on Medium →
MEDIUM
How to think about cold, inactive data in a Lakehouse — lifecycle policies, tiering strategies, and the cost implications of keeping everything hot.
Read on Medium →
MEDIUM
A mental model for structuring data transformations and semantic layers — where dbt fits, where it doesn't, and how to draw the right boundaries.
Read on Medium →
Driving ML Platform strategy for a global R&D organisation spanning multiple countries. Implemented Lakehouse lifecycle governance that significantly reduced infrastructure costs. Leading data infrastructure across AWS, architecting scalable pipelines and governance frameworks.
50% reduction in compute and storage costs. 98% query latency reduction via Trino-based Lakehouse architecture on GCP. Built end-to-end DataOps platform with dbt, Airflow, and Terraform.
Built real-time sports data pipelines processing millions of events. PySpark, Kafka, and AWS-native services powering live sports analytics products.
Live conference talk on Lakehouse architecture and DataOps in production, given at one of Europe's largest data engineering events.
Watch on YouTube →
Building a data analytics platform with a Lakehouse at 7bridges, featured by Starburst as a DataNova success story.
Read the story →
In conversation with Ross Webb on building high-performing data teams, ML Platform strategy, DataOps culture, and the realities of end-to-end data engineering at scale.
Listen to the episode →