Safoine El Khabich

About

Production AI systems, infra, and the engineering around them

I am a Senior LLM/MLOps engineer and ZenML core maintainer focused on Kubernetes-native, cloud-agnostic AI systems. Lately, much of my work has centered on AI agents, orchestration reliability, and the infrastructure and cloud patterns that make them production-ready.

What I work on

I design and ship the path from experiment to production for ML and GenAI systems: pipelines, CI/CD, deployment, observability, and the platform pieces that keep teams moving without losing reproducibility.

I have a strong background in MLOps and cloud/Kubernetes infrastructure (AWS, Azure, GCP). More recently, I have been spending much of my time on AI agent workflows, prompt-first LLM systems, and the operational guardrails they need to run reliably.

A lot of this comes from open-source product work (ZenML), client delivery, and workshop environments where systems need to be both technically sound and teachable. Expect implementation notes, deployment tradeoffs, and patterns you can reuse.

Areas of interest

  • AI agents, orchestration patterns, and tool-calling reliability
  • LLMOps and MLOps workflows: evaluation, tracing, deployment, and rollback
  • Kubernetes-native serving and cloud infrastructure across AWS, Azure, and GCP
  • Developer/platform experience for ML and AI teams shipping production systems

How I write

I prefer concise, implementation-first writing with explicit tradeoffs, failure modes, and operational context. The goal is to make the thinking reusable for engineers building real systems, not just demos.