Thursday, August 21, 2025

Serverless AI: Modal and Scaling AI Inference

Like AWS Lambda, but optimized for AI/GPU workloads and Python, and not limited to a single instance.

Runs optimized containers without Docker, with a fast "cold start".

Used in production, e.g. by Suno (AI-generated songs) and Substack.

Interesting!

Modal and Scaling AI Inference with Erik Bernhardsson - Software Engineering Daily podcast

Modal is a serverless compute platform that’s specifically focused on AI workloads. The company’s goal is to enable AI teams to quickly spin up GPU-enabled containers, and rapidly iterate and autoscale.

It was founded by Erik Bernhardsson, who was previously at Spotify for 7 years, where he built the music recommendation system and the popular Luigi workflow scheduler.

...this episode ... talk about the motivation for founding his company, the market gap in ML and AI tooling, optimizing container cold start, Modal’s interface design, and more.
