This talk introduces a personal Open Source project that has been receiving good reviews, and shares the technical learnings with the community.
When I deployed my AI agents to production, they started facing failures: some predictable (e.g. hitting an LLM provider's rate limits), some unpredictable (Anthropic's overload errors, network issues, CPU/memory spikes crashing the server, etc.). Some of these issues were easy to deal with, e.g. via a simple exponential backoff and retry strategy, but that alone was not robust enough for production. I could have put a rate-limiting gateway in front of my app server, but it would not have had enough user/app context or control to recover from these failures, and it would have left gaps for the unpredictable errors. It would also have been an extra chore and expense to manage. So for the multiple agentic apps I was building, the LLM calls had to be more resilient, and the solution for most of these failures had to live in the app itself.
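To make the "simple exponential backoff and retry" idea concrete, here is a minimal, hedged sketch of that pattern. The function name, option names, and defaults are illustrative placeholders, not ResilientLLM's actual implementation:

```javascript
// Minimal exponential backoff with jitter (illustrative sketch, not the
// library's actual code). `fn` is any async call that may throw a
// retryable error, e.g. an HTTP 429 from an LLM provider.
async function withBackoff(fn, { retries = 5, baseMs = 500, maxMs = 30000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries) throw err; // retry budget exhausted: surface the error
      // Exponential delay capped at maxMs, with random jitter to avoid
      // synchronized retry storms across many clients.
      const delay = Math.min(baseMs * 2 ** attempt, maxMs) * (0.5 + Math.random() * 0.5);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

As the proposal notes, this alone is not enough: it handles transient rate limits, but does nothing for sustained overloads or crashes, which is where the other patterns come in.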
Existing libraries such as the Vercel AI SDK added even more unpredictability, e.g. AI_UnsupportedModelVersionError. I found myself writing the same resilience code over and over across my LLM apps, which motivated me to write this Open Source library: ResilientLLM - a resilient, unified LLM interface featuring a circuit breaker, token bucket rate limiting, caching, and adaptive retries with dynamic backoff support.
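Of the features listed, the circuit breaker is the least familiar to many app developers, so here is a hedged sketch of the general pattern. The class, thresholds, and error message are hypothetical and illustrative, not ResilientLLM's implementation:

```javascript
// Circuit breaker sketch (illustrative, not ResilientLLM's code).
// After `failureThreshold` consecutive failures the circuit "opens" and
// fails fast without hitting the provider; once `cooldownMs` has elapsed
// it half-opens and lets one trial call through.
class CircuitBreaker {
  constructor({ failureThreshold = 3, cooldownMs = 10000 } = {}) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.failures = 0;
    this.openedAt = null; // timestamp when the circuit opened, or null if closed
  }

  async call(fn) {
    if (this.openedAt !== null) {
      if (Date.now() - this.openedAt < this.cooldownMs) {
        throw new Error('circuit open: failing fast'); // spare the overloaded provider
      }
      // Cooldown elapsed: half-open, allow one trial request through.
    }
    try {
      const result = await fn();
      this.failures = 0; // success closes the circuit again
      this.openedAt = null;
      return result;
    } catch (err) {
      this.failures++;
      if (this.failures >= this.failureThreshold) this.openedAt = Date.now();
      throw err;
    }
  }
}
```

Failing fast like this is what keeps an overloaded provider (e.g. Anthropic's overload error) from dragging the whole app down with queued, doomed requests.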
In simpler words, it is a class that unifies API calls to any LLM behind a common interface; usage looks like the following: unifiedAndResilientLLM.chat(conversationHistory, llmAndResilienceOptions). It frees my time from worrying about the critical failures that LLM apps and AI agents face in production.
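For illustration, a call through such a unified interface might look like the sketch below. The option names are assumptions for the sake of the example, and `unifiedAndResilientLLM` is mocked so the snippet is self-contained; it is not the library's real API surface:

```javascript
// Hypothetical usage sketch. `unifiedAndResilientLLM` stands in for a
// ResilientLLM instance; here it is mocked so the snippet runs on its own.
const unifiedAndResilientLLM = {
  // In a real resilient client this would dispatch to the configured
  // provider with retries, rate limiting, and circuit breaking applied.
  async chat(history, options) {
    return { role: 'assistant', content: `(${options.provider}) echo: ${history.at(-1).content}` };
  },
};

const conversationHistory = [{ role: 'user', content: 'Hello' }];
// Option names below (provider, maxRetries, timeoutMs) are assumed, not documented.
const llmAndResilienceOptions = { provider: 'anthropic', maxRetries: 3, timeoutMs: 30000 };

const reply = await unifiedAndResilientLLM.chat(conversationHistory, llmAndResilienceOptions);
```

The point of the shape is that the caller supplies conversation state plus one options object, and never deals with provider-specific failure modes directly.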
This minimalist Open Source library aims to solve the same challenges for you by providing a resilient layer that intelligently manages failures and rate limits, enabling developers to integrate LLMs confidently and effortlessly at scale.
Understand the common culprits behind LLM app failures in production, e.g. unstable networks, rate limits, unpredictable provider overloads
Your first steps towards resilience: circuit breakers, token bucket rate limiting, graceful timeouts and failure handling, adaptive retries
Practical, effective implementation of resilience patterns without shifting focus away from AI agent development
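Of the patterns listed above, token bucket rate limiting can be sketched compactly. This is an illustrative, hedged example of the general technique, not ResilientLLM's implementation; the class and parameter names are hypothetical:

```javascript
// Token bucket rate limiter sketch (illustrative, not the library's code).
// Each request consumes tokens; the bucket refills at a fixed rate, so
// short bursts are allowed while the long-run request rate stays bounded,
// keeping the app under the LLM provider's rate limits.
class TokenBucket {
  constructor(capacity, refillPerSec) {
    this.capacity = capacity;         // maximum burst size
    this.tokens = capacity;           // start with a full bucket
    this.refillPerSec = refillPerSec; // steady-state refill rate
    this.lastRefill = Date.now();
  }

  refill() {
    const now = Date.now();
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = now;
  }

  // Returns true if the request may proceed now, false if the caller
  // should wait or back off instead of hitting the provider.
  tryConsume(cost = 1) {
    this.refill();
    if (this.tokens >= cost) {
      this.tokens -= cost;
      return true;
    }
    return false;
  }
}
```

Because the app itself holds the bucket, it has the user/app context that an external rate-limiting gateway lacks, e.g. it can charge a larger `cost` for requests it knows will consume more provider tokens.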