This talk introduces a personal Open Source project that has been receiving good reviews, and shares the technical learnings with the community.
When I deployed my AI agents to production, they started facing failures: some predictable (e.g. hitting an LLM provider's rate limits), some unpredictable (Anthropic's overload errors, network issues, CPU/memory spikes crashing the server, etc.). Some of these issues were easy to deal with, e.g. via a simple exponential backoff and retry strategy, but that alone was not robust enough for production. I could have put a rate-limiting gateway in front of my app server, but it would not have had enough user/app context or control to recover from these failures, and it would have left gaps for the unpredictable errors. It would also have been an extra chore and expense to manage. So for the multiple agentic apps I was building, the LLM calls had to be more resilient, and the solution for most of these failures had to live in the app itself.
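To make the "simple exponential backoff and retry" idea concrete, here is a minimal, hedged sketch of that pattern. The function name, option names, and defaults are illustrative placeholders, not ResilientLLM's actual implementation:

```javascript
// Minimal exponential backoff with jitter (illustrative sketch, not the
// library's actual code). `fn` is any async call that may throw a
// retryable error, e.g. an HTTP 429 from an LLM provider.
async function withBackoff(fn, { retries = 5, baseMs = 500, maxMs = 30000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries) throw err; // retry budget exhausted: surface the error
      // Exponential delay capped at maxMs, with random jitter to avoid
      // synchronized retry storms across many clients.
      const delay = Math.min(baseMs * 2 ** attempt, maxMs) * (0.5 + Math.random() * 0.5);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

As the proposal notes, this alone is not enough: it handles transient rate limits, but does nothing for sustained overloads or crashes, which is where the other patterns come in.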
Existing libraries such as the Vercel AI SDK added even more unpredictability, e.g. AI_UnsupportedModelVersionError. I found myself writing the same resilience code over and over across my LLM apps, which motivated me to write this Open Source library: ResilientLLM - a resilient, unified LLM interface featuring a circuit breaker, token bucket rate limiting, caching, and adaptive retries with dynamic backoff support.
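Of the features listed, the circuit breaker is the least familiar to many app developers, so here is a hedged sketch of the general pattern. The class, thresholds, and error message are hypothetical and illustrative, not ResilientLLM's implementation:

```javascript
// Circuit breaker sketch (illustrative, not ResilientLLM's code).
// After `failureThreshold` consecutive failures the circuit "opens" and
// fails fast without hitting the provider; once `cooldownMs` has elapsed
// it half-opens and lets one trial call through.
class CircuitBreaker {
  constructor({ failureThreshold = 3, cooldownMs = 10000 } = {}) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.failures = 0;
    this.openedAt = null; // timestamp when the circuit opened, or null if closed
  }

  async call(fn) {
    if (this.openedAt !== null) {
      if (Date.now() - this.openedAt < this.cooldownMs) {
        throw new Error('circuit open: failing fast'); // spare the overloaded provider
      }
      // Cooldown elapsed: half-open, allow one trial request through.
    }
    try {
      const result = await fn();
      this.failures = 0; // success closes the circuit again
      this.openedAt = null;
      return result;
    } catch (err) {
      this.failures++;
      if (this.failures >= this.failureThreshold) this.openedAt = Date.now();
      throw err;
    }
  }
}
```

Failing fast like this is what keeps an overloaded provider (e.g. Anthropic's overload error) from dragging the whole app down with queued, doomed requests.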
In simpler words, it is a class that unifies API calls to any LLM behind a common interface; usage looks like the following: unifiedAndResilientLLM.chat(conversationHistory, llmAndResilienceOptions). It frees my time from worrying about the critical failures that LLM apps and AI agents face in production.
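For illustration, a call through such a unified interface might look like the sketch below. The option names are assumptions for the sake of the example, and `unifiedAndResilientLLM` is mocked so the snippet is self-contained; it is not the library's real API surface:

```javascript
// Hypothetical usage sketch. `unifiedAndResilientLLM` stands in for a
// ResilientLLM instance; here it is mocked so the snippet runs on its own.
const unifiedAndResilientLLM = {
  // In a real resilient client this would dispatch to the configured
  // provider with retries, rate limiting, and circuit breaking applied.
  async chat(history, options) {
    return { role: 'assistant', content: `(${options.provider}) echo: ${history.at(-1).content}` };
  },
};

const conversationHistory = [{ role: 'user', content: 'Hello' }];
// Option names below (provider, maxRetries, timeoutMs) are assumed, not documented.
const llmAndResilienceOptions = { provider: 'anthropic', maxRetries: 3, timeoutMs: 30000 };

const reply = await unifiedAndResilientLLM.chat(conversationHistory, llmAndResilienceOptions);
```

The point of the shape is that the caller supplies conversation state plus one options object, and never deals with provider-specific failure modes directly.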
This minimalist Open Source library aims to solve the same challenges for you by providing a resilient layer that intelligently manages failures and rate limits, enabling developers to integrate LLMs confidently and effortlessly at scale.
Understand the common culprits behind LLM app failures in production, e.g. unstable networks, rate limits, unpredictable provider overloads
Your first steps towards resilience: circuit breakers, token bucket rate limiting, graceful timeouts and failure handling, adaptive retries
Practical, effective implementation of resilience patterns without shifting focus away from AI agent development
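Of the patterns listed above, token bucket rate limiting can be sketched compactly. This is an illustrative, hedged example of the general technique, not ResilientLLM's implementation; the class and parameter names are hypothetical:

```javascript
// Token bucket rate limiter sketch (illustrative, not the library's code).
// Each request consumes tokens; the bucket refills at a fixed rate, so
// short bursts are allowed while the long-run request rate stays bounded,
// keeping the app under the LLM provider's rate limits.
class TokenBucket {
  constructor(capacity, refillPerSec) {
    this.capacity = capacity;         // maximum burst size
    this.tokens = capacity;           // start with a full bucket
    this.refillPerSec = refillPerSec; // steady-state refill rate
    this.lastRefill = Date.now();
  }

  refill() {
    const now = Date.now();
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = now;
  }

  // Returns true if the request may proceed now, false if the caller
  // should wait or back off instead of hitting the provider.
  tryConsume(cost = 1) {
    this.refill();
    if (this.tokens >= cost) {
      this.tokens -= cost;
      return true;
    }
    return false;
  }
}
```

Because the app itself holds the bucket, it has the user/app context that an external rate-limiting gateway lacks, e.g. it can charge a larger `cost` for requests it knows will consume more provider tokens.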