Liszt AI: Programmable Serving for Agentic AI
We are building the serving stack for agentic AI. Today's inference engines were built for chat:
prompt in, tokens out. Agents are different. They branch, call tools, retry, verify,
search, plan, and reuse context across long-running workflows.
Forcing that work through a chat API makes inference slower, more expensive, and harder to optimize.
Liszt AI fixes this by making LLM serving programmable.
What we are building
- Pie: a programmable serving system for agentic workloads.
- Inferlets: server-side agent logic that runs inside the serving system (sketched after this list).
- Workflow-aware control over generation, KV cache, I/O, and forward passes.
- Higher performance, lower latency, lower cost, and better answers.
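To make "inferlet" concrete, here is a minimal sketch of what server-side agent logic could look like. The API below (`pie.inferlet`, `Context`, `fill`, `fork`, `generate`, `score`) is illustrative, not the actual Pie interface; it shows the shape of the idea, not the implementation.

```python
# Hypothetical inferlet: agent logic running inside the serving system,
# next to the KV cache, rather than behind a stateless chat API.
# All names here (pie, Context, fill, fork, generate, score) are
# illustrative placeholders, not the real Pie API.
import pie

@pie.inferlet
def best_of_n(ctx: pie.Context, question: str, n: int = 4) -> str:
    ctx.fill(f"Question: {question}\nAnswer:")   # one shared prefill
    # Branch the KV cache n ways without re-sending the prompt.
    branches = [ctx.fork() for _ in range(n)]
    candidates = [b.generate(max_tokens=256, temperature=0.8) for b in branches]
    # Reuse the same context to rank candidates and keep the best one.
    scores = [ctx.fork().score(c) for c in candidates]
    return max(zip(scores, candidates), key=lambda p: p[0])[1]
```

Because branching, generation, and scoring all run server-side, the prompt is prefilled once and the KV cache is shared across branches; the same pattern through a chat API costs n round trips and n full prefills.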
Team
- Lin Zhong, Yale CS professor, ACM/IEEE Fellow, expert in low-latency, high-throughput systems.
- Seung-seob Lee, Yale CS research scientist.
- In Gim, Yale CS Ph.D. student, MLCommons ML and Systems Rising Star.
- We are co-inventors of Prompt Cache and Programmable Serving.
Vision
- Pie becomes the de facto platform for developing and distributing agentic applications.
- The Pie API becomes the de facto API for building performant, efficient agentic applications.
Why now
- The AI stack is moving from chatbots to agents.
- Agents know the workflow, but today’s serving engines control the compute.
- That mismatch creates repeated prefills, extra round trips, blind KV-cache management, and fragile orchestration (see the sketch after this list).
- Pie moves workflow knowledge into the serving layer.
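To see the mismatch concretely, consider a typical client-side agent loop over a stateless chat API. The sketch below is illustrative (generic names, not a specific SDK): every step re-sends the growing transcript, so the server re-prefills tokens it has already processed, and every tool call adds a client-server round trip.

```python
def agent_loop(client, task: str, run_tool, max_steps: int = 8) -> str:
    """Illustrative client-side agent loop over a stateless chat API."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        # The full transcript goes over the wire every step, and the
        # server re-prefills context it already processed last step.
        reply = client.chat(messages=messages)
        if reply.get("tool_call") is None:
            return reply["content"]
        result = run_tool(reply["tool_call"])   # extra round trip per tool call
        messages += [reply, {"role": "tool", "content": result}]
    return messages[-1]["content"]
```

Pie moves this loop into the serving layer, where the KV cache stays live across steps and tool calls, so nothing is re-prefilled and no transcript is shipped back and forth.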
Product wedge
- Start as a drop-in serving stack for agentic workloads.
- Users do not need to learn the Pie API on day one.
- They send traces or describe workflows; Liszt recommends serving profiles and built-in inferlets.
- Power users can later write inferlets to serve their workflows with maximum performance and efficiency.
Initial users
- Teams running agents on open models.
- vLLM and SGLang users hitting performance/efficiency limits.
- Local inference users.
- AI infra engineers focused on cost, latency, and quality.
Why we will win
- We invented programmable model serving.
- Pie demonstrated a more than 3× efficiency improvement in our research results.
- Pie lets agents implement, deploy, and own inference optimizations without relying on inference-engine developers to ship them.
- Gains from Pie are orthogonal to those from lower layers (inference engine, kernels, quantization).
- Pie sits closer to agentic applications than existing inference engines do.
Raise
- We are raising a pre-seed round.
- Use of funds: productize Pie, ship built-in inferlets, grow the open-source community, support design partners, and hire systems/product talent.
- We want investors who bring AI-infrastructure expertise and can help with customers, hiring, and open-source growth.