Liszt AI: Programmable Serving for Agentic AI
We are building the serving stack for agentic AI. Today's inference engines were built for chat:
prompt in, tokens out. Agents are different. They branch, call tools, retry, verify,
search, plan, and reuse context across long-running workflows.
Forcing that work through a chat API makes inference slower, more expensive, and harder to optimize.
Liszt AI fixes this by making LLM serving programmable.
What we are building
- Pie: a programmable serving system for agentic workloads.
- Inferlets: server-side agent logic that runs inside the serving system (sketched after this list).
- Workflow-aware control over generation, KV cache, I/O, and forward passes.
- Higher performance, lower latency, lower cost, and better answers.
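To make "inferlet" concrete, here is a minimal sketch of what server-side agent logic could look like. The API below (`pie.inferlet`, `Context`, `fill`, `fork`, `generate`, `score`) is illustrative, not the actual Pie interface; it shows the shape of the idea, not the implementation.

```python
# Hypothetical inferlet: agent logic running inside the serving system,
# next to the KV cache, rather than behind a stateless chat API.
# All names here (pie, Context, fill, fork, generate, score) are
# illustrative placeholders, not the real Pie API.
import pie

@pie.inferlet
def best_of_n(ctx: pie.Context, question: str, n: int = 4) -> str:
    ctx.fill(f"Question: {question}\nAnswer:")   # one shared prefill
    # Branch the KV cache n ways without re-sending the prompt.
    branches = [ctx.fork() for _ in range(n)]
    candidates = [b.generate(max_tokens=256, temperature=0.8) for b in branches]
    # Reuse the same context to rank candidates and keep the best one.
    scores = [ctx.fork().score(c) for c in candidates]
    return max(zip(scores, candidates), key=lambda p: p[0])[1]
```

Because branching, generation, and scoring all run server-side, the prompt is prefilled once and the KV cache is shared across branches; the same pattern through a chat API costs n round trips and n full prefills.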
Team
- Lin Zhong, Yale CS professor, ACM/IEEE Fellow, expert in low-latency, high-throughput systems.
- Seung-seob Lee, Yale CS research scientist.
- In Gim, Yale CS Ph.D. student, MLCommons ML and Systems Rising Star.
- We are co-inventors of Prompt Cache and Programmable Serving.
Vision
- Pie becomes the de facto platform for developing and distributing agentic applications.
- The Pie API becomes the de facto API for building performant, efficient agentic applications.
Why now
- The AI stack is moving from chatbots to agents.
- Agents know the workflow, but today’s serving engines control the compute.
- That mismatch creates repeated prefills, extra round trips, blind KV-cache management, and fragile orchestration (see the sketch after this list).
- Pie moves workflow knowledge into the serving layer.
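To see the mismatch concretely, consider a typical client-side agent loop over a stateless chat API. The sketch below is illustrative (generic names, not a specific SDK): every step re-sends the growing transcript, so the server re-prefills tokens it has already processed, and every tool call adds a client-server round trip.

```python
def agent_loop(client, task: str, run_tool, max_steps: int = 8) -> str:
    """Illustrative client-side agent loop over a stateless chat API."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        # The full transcript goes over the wire every step, and the
        # server re-prefills context it already processed last step.
        reply = client.chat(messages=messages)
        if reply.get("tool_call") is None:
            return reply["content"]
        result = run_tool(reply["tool_call"])   # extra round trip per tool call
        messages += [reply, {"role": "tool", "content": result}]
    return messages[-1]["content"]
```

Pie moves this loop into the serving layer, where the KV cache stays live across steps and tool calls, so nothing is re-prefilled and no transcript is shipped back and forth.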
Product wedge
- Start as a drop-in serving stack for agentic workloads.
- Users do not need to learn the Pie API on day one.
- They send traces or describe workflows; Liszt recommends serving profiles and built-in inferlets.
- Power users can later write inferlets to serve their workflows with maximum performance and efficiency.
Initial users
- Teams running agents on open models.
- vLLM and SGLang users hitting performance/efficiency limits.
- Local inference users.
- AI infra engineers focused on cost, latency, and quality.
Why we will win
- We invented programmable model serving.
- Pie demonstrated a more than 3× efficiency improvement in our research results.
- Pie lets agents implement, deploy, and own inference optimizations without relying on inference-engine developers to ship them.
- Gains from Pie are orthogonal to those from lower layers (inference engine, kernels, quantization).
- Pie sits closer to agentic applications than existing inference engines do.
Raise
- We are raising a pre-seed round.
- Use of funds: productize Pie, ship built-in inferlets, grow the open-source community, support design partners, and hire systems/product talent.
- We want investors who bring AI-infrastructure expertise and can help with customers, hiring, and open-source growth.