How do we find what the max rate limits are for Palantir-provided language models?

taylor · June 12, 2024, 11:41pm

AIP Orchestrators [1] let you set request and token rate limits.

Here’s an example from the docs:

from transforms.api import configure, transform
from palantir_models.transforms import OpenAiGptChatLanguageModelInput

RATE_LIMIT_PER_MIN = 100
TOKEN_LIMIT_PER_MIN = 50000


@configure(["NUM_EXECUTORS_2"])
@transform(
    ...
    chat_model=OpenAiGptChatLanguageModelInput(
        "ri.language-model-service..language-model.gpt-35_azure"
    ),
)
def compute(output, questions, chat_model, ctx):
    ...

How do we figure out what the max permissible rate limits are?

For example, for our company’s account with OpenAI we can check which “tier” our account is on, and then refer to this table [2] to see rate limits.

Other than trial-and-error, how can I figure out what the rate limits are for various models in the context of AIP Orchestrators? I don’t see specifics in the Palantir docs.

[1] https://www.palantir.com/docs/foundry/transforms-python/aip-orchestrators/#aip-orchestrators
[2] https://platform.openai.com/docs/guides/rate-limits/usage-tiers?context=tier-four

taylor · June 13, 2024, 1:53am

It turns out that if you log the errors from model calls made through AIP Orchestrators, the message for a rate limit error includes the max rate limit value.
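
For anyone who wants to capture that message, here is a rough sketch of the logging pattern in plain Python. It is not the AIP Orchestrators API: `call_and_log` and `fake_model_call` are hypothetical names, and the error text is a placeholder, since the real message format isn't shown in this thread.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("rate_limit_probe")


def call_and_log(fn, *args, **kwargs):
    """Call fn and return (result, None); on failure, log the full
    error text and return (None, error_text) instead of raising."""
    try:
        return fn(*args, **kwargs), None
    except Exception as exc:
        # Keep the whole message: this is where the limit is reported.
        message = f"{type(exc).__name__}: {exc}"
        logger.error("Model call failed: %s", message)
        return None, message


def fake_model_call(prompt):
    # Hypothetical stand-in for a real model call. The text below is a
    # placeholder, not the actual error message.
    raise RuntimeError("rate limit exceeded (placeholder message)")


result, error = call_and_log(fake_model_call, "hello")
```

In a real transform you would wrap whatever call reaches the model, then read the logged message in the build logs to find the reported limit.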