I have been working with the compute modules to expose LLM’s in bedrock via bedrock SDK
Here is my current setup
Compute module with a function name InvokeLLM that takes input messages calls bedrock and responds
Webhook that calls the compute module
Typsecriipt function that calls the webhook. I had to do this because as per a previous post, I cannt call a compute function directly from a tyescript function
the function published Via typescript now shows up as a model to choose from list of registered models in AIP logic
My question is, if multiple users use the same model at the same time, will the compute module process these requests sequentially ?
I guess I can control the RPM at the webhook level, if the webhook is configured to concurency limit 10, I would assume 10 of these requests will hit the compute module
Will compute module process these request sequentially ?
If so how to make the compute module process more than 1 request at a time
Based on the Documentation, I understand that compute module manages concurrency when using the SDK, This concurrency is set when configuring the container and the replica’s will scale to meet the concurrency.
So I need to make sure that I have enough scaling in the webhook and compute setting.
Although compute modules are not tied to a specific Ontology, you must select one for the import process. It is recommended to select an Ontology that is related to the compute module function you want to import.
Search for your compute module function’s API name.
The example below shows how to import and use a compute module function:
import { Function } from "@foundry/functions-api";
// API Name: com.mycustomnamespace.computemodules.Add
import { add } from "@mycustomnamespace/computemodules";
export class MyFunctions {
@Function()
public async myFunction(): Promise<string> {
return await add({ x: 50, y: 50 });
}
}
Important considerations
Project location: Ensure the compute module is in the same Project as your TypeScript code for live preview to work correctly.
Type consistency: TypeScript enforces strict type checking. Ensure the declared return type matches the actual return type of your compute module function. For example, if you declare a string return type, your registered compute module function must return a string, not a struct type.
Asynchronous operations: Compute module functions are typically asynchronous. Use async/await syntax for proper handling.
Your replicas have a global configurable “concurrency limit”. We are aware of how many jobs each replica it’s currently processing. If we can schedule a job in any of the running replicas we will do so. If not we put that job in a queue that we start dispatching the minute there are replicas available.
On a side note, we scale up and down based on the current load both in progress + queue