I have built a fairly simple pipeline that extracts text from PDFs, chunks the data, creates embeddings, and then feeds them into an AIP Agent.
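For reference, my chunking step is roughly the following (a minimal sketch; the chunk size and overlap values here are illustrative, not the exact ones in my pipeline):

```python
# Fixed-size chunking with overlap so sentences split across a boundary
# still appear whole in at least one chunk.
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, re-covering the overlap
    return chunks
```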
My issue is that I receive a ‘Rate limit exceeded’ error almost immediately when I start using the agent, and this happens regardless of which model I choose. Even using 10 retries with exponential backoff fails.
The maximum number of LLM call retries, due to rate limiting, was exceeded.
countAttempts: Optional[10]
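The retry wrapper I'm using looks roughly like this (a sketch with illustrative names; the real call goes to the AIP Agent API, and `RuntimeError` stands in for the SDK's rate-limit exception):

```python
import random
import time

def call_with_backoff(call, max_attempts: int = 10, base_delay: float = 1.0):
    """Retry `call` on rate-limit errors with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RuntimeError:  # stand-in for the actual rate-limit exception
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            # wait 1x, 2x, 4x, ... the base delay, plus a little jitter
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```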
As I’m on the free developer plan, are the rate limits super low? Or have I built the solution poorly?
This is likely a combination of the context being passed into the LLM being too large and the developer plan rate limits being too low.
To check the former, you could look at the View raw option in Read instructions > Prompt after you send the message to see exactly what text is being passed to the LLM. From there, you can trim any context that is not needed so that each request stays well under the token limit.
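As a quick sanity check before sending, you can estimate the prompt size with a character-based heuristic (the ~4 characters-per-token ratio is a common rule of thumb for English text, not the real tokenizer for any particular model):

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English prose.
    return max(1, len(text) // 4)

def fits_limit(prompt: str, token_limit: int) -> bool:
    # True if the estimated token count is within the given limit.
    return estimate_tokens(prompt) <= token_limit
```

For example, a 400,000-character prompt is on the order of 100,000 tokens, which can exhaust a per-minute token allowance in a single request.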
For the latter, are you seeing any message in the AIP Agent UI indicating which limit you are hitting and how many tokens you are allotted? If not, could you go to the Resource Management application, open the AIP usage & limits tab, and check the rate limit you are given there? If that is not sufficient, you may need to move to a higher subscription tier.
Looking at the raw input, it looks like I'm passing the full contents of the files, which would at least explain part of the issue.
I can't quite get my head around the best practices for extracting text, chunking/embedding, and then letting the AIP Agent access the content without handing the agent all of the content on every request.
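For what it's worth, the pattern I think I should be aiming for is something like the following (a sketch; `embed` stands in for whatever embedding model the pipeline uses): embed the question, score it against the pre-computed chunk embeddings, and pass only the top-k chunks to the agent instead of the full files.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k_chunks(query_vec, chunks, chunk_vecs, k=3):
    # Rank chunks by similarity to the query and keep only the best k.
    scored = sorted(zip(chunks, chunk_vecs),
                    key=lambda cv: cosine(query_vec, cv[1]),
                    reverse=True)
    return [chunk for chunk, _ in scored[:k]]
```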