AIP Orchestrators [1] let you set request and token rate limits.
Here’s an example from the docs:
from transforms.api import configure, transform
from palantir_models.transforms import OpenAiGptChatLanguageModelInput

RATE_LIMIT_PER_MIN = 100
TOKEN_LIMIT_PER_MIN = 50000

@configure(["NUM_EXECUTORS_2"])
@transform(
    ...
    chat_model=OpenAiGptChatLanguageModelInput(
        "ri.language-model-service..language-model.gpt-35_azure"
    ),
)
def compute(output, questions, chat_model, ctx):
    ...
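To make the question concrete, here is a rough sketch of how I understand the two limits to interact. It is my own arithmetic, not something from the Palantir docs: the helper name `effective_requests_per_min` and the average tokens-per-request figures are hypothetical, and it assumes every request uses roughly the same number of tokens.

```python
def effective_requests_per_min(
    rate_limit_per_min: int,
    token_limit_per_min: int,
    avg_tokens_per_request: int,
) -> int:
    """Requests per minute that stay under both the request cap and the token cap."""
    if avg_tokens_per_request <= 0:
        raise ValueError("avg_tokens_per_request must be positive")
    # How many average-sized requests fit inside the token budget.
    token_bound = token_limit_per_min // avg_tokens_per_request
    # Whichever cap is hit first decides the real throughput.
    return min(rate_limit_per_min, token_bound)


RATE_LIMIT_PER_MIN = 100
TOKEN_LIMIT_PER_MIN = 50000

# Hypothetical workload sizes; the real prompt and response sizes are not known here.
print(effective_requests_per_min(RATE_LIMIT_PER_MIN, TOKEN_LIMIT_PER_MIN, 800))
print(effective_requests_per_min(RATE_LIMIT_PER_MIN, TOKEN_LIMIT_PER_MIN, 400))
```

With these example numbers the token cap is the binding limit for larger requests and the request cap for smaller ones. What I cannot work out from this is the ceiling that the platform itself enforces for each model.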
How do we figure out what the maximum permissible rate limits are?
For example, for our company’s account with OpenAI we can check which “tier” our account is on, and then refer to this table [2] to see rate limits.
Other than trial-and-error, how can I figure out what the rate limits are for various models in the context of AIP Orchestrators? I don’t see specifics in the Palantir docs.
[1] https://www.palantir.com/docs/foundry/transforms-python/aip-orchestrators/#aip-orchestrators
[2] https://platform.openai.com/docs/guides/rate-limits/usage-tiers?context=tier-four