We have a latency-sensitive workflow, and an AIP Logic function is currently the latency bottleneck.
Current latency
We have LLM blocks producing single-completion string outputs, and action blocks that create objects based on those string outputs.
Each action block currently takes 5-6 seconds.
There are 3 action blocks in the AIP Logic function, and total execution time ends up at 45-60 seconds.
Workflow
We really want to bring this latency down to 10-20 seconds if possible.
The AIP Logic function is executed in real time in Workshop by the end user. We cannot pre-run the functions due to workflow constraints: the AIP Logic function input is a combination of N products, there are millions of products, and there are far too many potential combinations to pre-run and store.
Curious if anyone has faced similar challenges or has any suggestions on how we can achieve 10-20 second latency.
Hey - I think you are on the right path with using single completion and understanding the blocks. It’d be worthwhile to understand which part of the Logic is running slowly.
A few leading questions:
How long does each block take? Can anything run in parallel?
Are you having the LLM call the actions or are you calling the action with an action block?
When you run your actions in a code repo, do they take a long time? If so, can you write more efficient code? Can you call the actions in parallel?
How long does each block take?
Action blocks are the biggest bottleneck, with 5-6 sec latency each. All other blocks run in less than 0.5 secs.
Can anything run in parallel?
This is a great suggestion. I can explore breaking it up into multiple Logic functions and parallelizing.
Are you having the LLM call the actions or are you calling the action with an action block?
LLM blocks output strings on single completion. Action blocks execute actions
When you run your action in code repo, do they take a long time? If so, can you write more efficient code?
The actions are not function-backed. Each action creates an object based on the given inputs, with no other side effects. I don’t see an opportunity for optimization here.
One option would be to write a function-backed action that applies all three edits at the same time (this might be more efficient). Are you just modifying three objects or are you applying a lot of edits?
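As a rough sketch of the batching idea (using a hypothetical stand-in `EditBatch` class rather than the real Foundry Functions ontology-edit API, and an invented `ProductSummary` object type), the point is that one function-backed action can queue all three creates and submit them together, paying the per-action overhead once instead of three times:

```typescript
// Hypothetical stand-in for an ontology edit batch; in a real function-backed
// action, the platform's edit API would collect and submit these edits.
interface ObjectEdit {
  type: string;
  properties: Record<string, unknown>;
}

class EditBatch {
  private edits: ObjectEdit[] = [];

  // Queue a create; nothing is submitted yet.
  create(type: string, properties: Record<string, unknown>): void {
    this.edits.push({ type, properties });
  }

  // All queued creates go out as a single action (one round trip).
  commit(): ObjectEdit[] {
    return this.edits;
  }
}

// One function-backed action replacing three separate action blocks.
function createAllObjects(a: string, b: string, c: string): ObjectEdit[] {
  const batch = new EditBatch();
  batch.create("ProductSummary", { name: a }); // hypothetical object type
  batch.create("ProductSummary", { name: b });
  batch.create("ProductSummary", { name: c });
  return batch.commit(); // single submission instead of three
}
```

Whether this actually helps depends on whether the 5-6 seconds is per-action fixed overhead (which batching amortizes) or per-edit work (which it would not).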
One way of running things in parallel that I have used is to import a few logics into a code repo as queries and calling them all inside of a Promise.all(). I am not sure how much this will help given the bottleneck appears to be the action.
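A minimal sketch of that pattern, with placeholder async functions standing in for the imported Logic queries (the names and delays are invented for illustration):

```typescript
// Placeholder for a Logic function imported into a code repo as a query;
// the delay stands in for the observed per-action latency.
function runLogicQuery(name: string, delayMs: number): Promise<string> {
  return new Promise(resolve =>
    setTimeout(() => resolve(`${name} complete`), delayMs)
  );
}

// Sequential: total latency is the sum of the three calls.
async function runSequentially(): Promise<string[]> {
  const results: string[] = [];
  for (const name of ["createA", "createB", "createC"]) {
    results.push(await runLogicQuery(name, 50));
  }
  return results;
}

// Parallel: Promise.all starts all three calls at once, so total latency
// is roughly the slowest single call rather than the sum.
async function runInParallel(): Promise<string[]> {
  return Promise.all([
    runLogicQuery("createA", 50),
    runLogicQuery("createB", 50),
    runLogicQuery("createC", 50),
  ]);
}
```

With three independent 5-6s actions, this would put the floor near 5-6 seconds total rather than 15-18, assuming the three object creates do not depend on each other.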
Another avenue to explore is if you can change to use a faster model for the LLM blocks. If you set up some Logic Evaluations (more guidance here and here) you can quantify precisely the change in behavior using different models and evaluate the tradeoffs between cost, speed, and accuracy for each block of your Logic.
Also, I’ll second @bkaplan’s general guidance to “factor out” anything that can be done deterministically from the LLM blocks into either TypeScript Functions or other Logic blocks, as they will operate much more quickly than having the LLM block determine and use a tool.