Simon Thelin

End-to-End ML Platform & Data Engineer  ·  AI Creator  ·  Synthesia

End-to-end ML Platform & Data Engineer with 10+ years building scalable, governed data systems across AWS and GCP. Creator of DataPains — practical content on ML infrastructure, data engineering, and the tools that matter in production.

Watch DataPains ↗
10+ years
Industry experience
Infra costs cut
Reducing infrastructure costs across Data & AI projects
98%
Query latency reduction
Global R&D
Serving researchers across countries · Synthesia
About

End-to-end ML Platform & Data Engineer.
AI creator.
DataOps fundamentalist.

10+ years building scalable, governed data systems across AWS and GCP — Lakehouse architecture, DataOps pipelines, and production-grade ML infrastructure. Currently Tech Lead at Synthesia, serving R&D researchers across multiple countries as part of a global AI video platform.

DataPains is where I share what I've learned — practical, no-fluff content on data engineering, AI tooling, and the platforms that actually matter in production. Conference speaker at Big Data London and DataNova 2023. Featured on the Data Team Success podcast.

Python · Terraform · AWS · GCP · dbt · Airbyte · Trino · PySpark · Kafka · K8s · Delta Lake · Airflow · ArgoCD · Docker · LLMs · AI Video
Content

DataPains on YouTube

@DataPains

ML PLATFORM · DATA ENGINEERING · AI · TOOLS THAT MATTER IN PRODUCTION

View all videos ↗
Blog

Writing

All posts on Medium →
Experience

Career

Jul 2024 — Present
Tech Lead — ML Platform & Data Engineering
Synthesia · London

Driving ML Platform strategy for a global R&D organisation spanning multiple countries. Implemented Lakehouse lifecycle governance that significantly reduced infrastructure costs. Leading data infrastructure across AWS, architecting scalable pipelines and governance frameworks.

May 2022 — Jul 2024
Lead Data Engineer
7bridges · London

Cut compute and storage costs by 50%. Reduced query latency by 98% via a Trino-based Lakehouse architecture on GCP. Built an end-to-end DataOps platform with dbt, Airflow, and Terraform.

Aug 2020 — Dec 2021
Data Engineer
IMG Arena · London

Built real-time sports data pipelines processing millions of events, using PySpark, Kafka, and AWS-native services to power live sports analytics products.

Speaking & Press

On stage & on the record

Contact

Let's connect