LLMs in the Model Catalog do not expose all provider options

I’ve noticed that options covered in the providers’ public documentation, such as structured outputs, are not supported for the models found in the Model Catalog. For example https://ai.google.dev/gemini-api/docs/structured-output#javascript. But the params defined for the Gemini family of models in AIP are only:

type Parameters = {
    "stopSequences"?: Array<string> | undefined;
    "temperature"?: FunctionsApi.Double | undefined;
    "maxTokens"?: FunctionsApi.Integer | undefined;
    "topP"?: FunctionsApi.Double | undefined;
};

Ideally, we could use the SDK from Google, OpenAI, or a similar package like Vercel’s AI offering. Is there a reason we are forced to use the proxy API in AIP? Is there a workaround?

You can either register the model in the Bring Your Own Model flavour, or you can call it as an external system from your functions to get the full suite of functionality.

The issue with the one-size-fits-all approach of AIP is that what one doesn’t support, all can’t support, so some API/model specific functionalities must be sacrificed on the altar of broad support.

This shouldn’t stop you however.

Have you tried defining this schema as the output of e.g. an AIP Logic function? Not fully sure what you are looking for in the Gemini API that this output format can’t deliver, so feel free to elaborate if there’s more to your use case than a structured output.


I think you are missing the point. The advantage of using built-in AIP models is not the abstraction offered by the model catalog. It’s the fact that these are private endpoints provisioned by the hyperscaler in partnership with Palantir. They tend to offer better rate limits and better security than anything I could provision on my own. I’m totally familiar with bring your own model, etc. In the early days of AIP, Palantir provided us with API keys and the URLs of these private endpoints. I actually used to make raw API requests in my transforms to these endpoints. Then this concept of abstraction was introduced.

I can see how this might be valuable in providing an interface that allows features in modules like AIP Logic to interoperate with different models. That’s not a problem that most developers building AI systems close to the foundation layer have. For example, we made a focused decision to use the Gemini family of models for planning due to their cost and performance characteristics. We use OpenAI for coding as they tend to make fewer mistakes. As such, we are choosing to couple these models to the solution tightly.

A simple example of how the current abstraction in Foundry makes this difficult is the inability to use structured outputs that pass a schema. Both Gemini and OpenAI support structured outputs, but within Foundry, I don’t see a structured outputs option in the type. This feature is critical for driving error rates in generated output to zero. Below is a sample request that uses structured outputs and OpenAI with gpt-5-mini:

const response = await fetch('https://api.openai.com/v1/responses', {
    method: 'POST',
    headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${process.env.OPEN_AI_KEY}`,
    },
    body: JSON.stringify({
        model: 'gpt-5-mini',
        input: [
            { role: 'system', content: [{ type: 'input_text', text: system }] },
            { role: 'user', content: [{ type: 'input_text', text: user }] },
        ],
        reasoning: { effort: 'low' },
        // Optional: keep or remove web_search; it isn't needed if you fully inline the spec + code
        tools: [
            {
                type: 'web_search',
                user_location: { type: 'approximate', country: 'US' },
            },
        ],
        text: {
            format: {
                type: 'json_schema',
                name: 'EditPlanV0',
                schema,
                strict: true,
            },
            verbosity: 'low',
        },
        store: true,
    }),
});

Additionally, tools like web search don’t appear to be supported, and the responses from AIP may lack details I need: for example, are input/output token counts included in the responses? Ideally, developers could have direct access to the model endpoint and could use native tooling from OpenAI, Google, or Vercel, or make a raw fetch request.
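For reference, when calling OpenAI’s Responses API directly, per-call token counts come back in the reply’s `usage` object (`input_tokens`, `output_tokens`, `total_tokens` in the public API). A minimal sketch of pulling them out of an abbreviated, made-up payload:

```python
import json

# Abbreviated example of a Responses API reply; the field names follow
# OpenAI's public Responses API, but the payload contents are invented
# purely for illustration.
raw = json.dumps({
    "id": "resp_123",
    "model": "gpt-5-mini",
    "usage": {"input_tokens": 250, "output_tokens": 120, "total_tokens": 370},
})

payload = json.loads(raw)
usage = payload.get("usage", {})
input_tokens = usage.get("input_tokens")
output_tokens = usage.get("output_tokens")
print(input_tokens, output_tokens)  # 250 120
```

This is exactly the kind of accounting detail that gets lost when the response is flattened into a plain completion string.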


It looks like this issue should be solved with the API Proxy pictured below, but I can’t find any documentation on it. Can you please provide a link to the docs? Also, is TypeScript supported?

Full video:
https://youtu.be/vEU_UgsQZAA?si=t0nTfO_BbaCX_RMt


Can someone from Palantir respond here? I have another customer project where the need for structured outputs outweighs the usefulness of Foundry model endpoints hidden behind the proprietary chat completions API. It looks like there is already a solution, but I can’t find any documentation (see the DevCon 4 video I linked previously). If there is a way to support structured outputs using the existing API, please let me know. Please help.

There is a way of doing this with the OpenAI models:


from language_model_service_api.languagemodelservice_llm_api import LlmService
from language_model_service_api.languagemodelservice_api import ChatMessage, ChatMessageRole
from language_model_service_api.languagemodelservice_api_completion_v3 import (
    Attribution,
    CompletionRequestV3,
    CreateCompletionRequest,
    GptChatCompletionRequest,
    GptResponseFormat,
    GptResponseFormatType,
)

You can then do something like this:

schema = {
    "type": "object",
    "properties": {
        "answer": {"type": "number"},
        "steps": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["answer", "steps"],
    "additionalProperties": False,
}

req = CreateCompletionRequest(
    attribution=Attribution(rid="ri.compass.main.folder.<your-project-rid>"),  # pick one valid attribution member
    request=CompletionRequestV3(
        gpt_chat=GptChatCompletionRequest(
            messages=[
                ChatMessage(role=ChatMessageRole.SYSTEM, content="You are a precise math assistant."),
                ChatMessage(role=ChatMessageRole.USER, content="Compute (12 * 7) - 5."),
            ],
            temperature=0.0,
            response_format=GptResponseFormat(
                type=GptResponseFormatType.JSON_SCHEMA,
                json_schema=schema,
            ),
        )
    ),
)

This functionality is also live in AIP Logic for the OpenAI models.
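Whichever route returns the JSON, it is worth asserting locally that the reply actually conforms to the schema before using it downstream. Below is a stdlib-only sanity check for the example schema above; this is a hand-rolled sketch, not part of the language model service API:

```python
import json

def check_reply(raw: str) -> dict:
    """Parse a model reply and enforce the example schema by hand:
    exactly the required keys (additionalProperties is false), with the
    expected types for `answer` and `steps`."""
    data = json.loads(raw)
    if set(data) != {"answer", "steps"}:
        raise ValueError(f"unexpected keys: {sorted(data)}")
    if not isinstance(data["answer"], (int, float)):
        raise ValueError("answer must be a number")
    if not (isinstance(data["steps"], list)
            and all(isinstance(s, str) for s in data["steps"])):
        raise ValueError("steps must be an array of strings")
    return data

result = check_reply('{"answer": 79, "steps": ["12 * 7 = 84", "84 - 5 = 79"]}')
print(result["answer"])  # 79
```

With `strict` structured outputs the model should never produce a non-conforming reply, but a cheap check like this turns any regression into a loud failure instead of silent bad data.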

Yes, I have seen they started adding support for structured outputs. But it doesn’t really address the underlying issue: being able to use the open-source SDKs provided by the labs. It looks like Palantir has a solution to this, but is not providing the docs for some reason.

What I am looking for from Palantir is documentation on how to initialize clients. The referenced video includes some auto-magic, like loading certificate files from environment variables, and there’s no documentation on how this works. Are these global environment variables populated in all code repos by default? IDK. Also, how can I do this from TypeScript functions? Below is an example, adapted from the video, of how I would initialize an OpenAI client using their SDK (something I would very much like to do to avoid the vendor-locked, proprietary Model Catalog API that doesn’t support the full model API).

import os
from typing import Optional, Dict

import httpx
from openai import OpenAI


def create_openai_client(
    base_url: Optional[str] = None,
    headers: Optional[Dict[str, str]] = None,
    timeout_s: float = 60.0,
) -> OpenAI:
    """
    Create an OpenAI client that works in enterprise/proxied environments
    (e.g., Foundry model gateways) by:
      - allowing a custom base_url (proxy endpoint)
      - injecting custom headers (e.g., Bearer token)
      - honoring REQUESTS_CA_BUNDLE for TLS verification
    """

    # Some environments authenticate at the gateway/proxy (via headers),
    # but the OpenAI SDK still expects an API key value to exist.
    if "OPENAI_API_KEY" not in os.environ:
        os.environ["OPENAI_API_KEY"] = "dummy-api-key"

    # If your org provides a custom CA bundle (common with TLS-intercept proxies),
    # use it for TLS verification. Otherwise, default to system/certifi trust.
    ca_bundle = os.environ.get("REQUESTS_CA_BUNDLE")
    verify = ca_bundle if ca_bundle else True

    # Build an httpx client so we control TLS + headers.
    # NOTE: You can omit headers here and instead pass default_headers to OpenAI;
    # doing both is fine, but avoid duplicating conflicting header keys.
    http_client = httpx.Client(
        verify=verify,
        timeout=timeout_s,
        headers=headers or None,
    )

    # base_url is useful when routing via a proxy, e.g.:
    #   https://{hostname}/api/v2/llm/proxy/openai/v1
    # If you're calling OpenAI directly, you can omit base_url.
    return OpenAI(
        base_url=base_url,
        http_client=http_client,
        default_headers=headers or None,
    )

If anyone out there is looking to understand how to use structured outputs with functions v1, here is a working example:

import { Function } from "@foundry/functions-api";
// Uncomment the import statement below to start importing object types
// import { Objects, ExampleDataAircraft } from "@foundry/ontology-api";
import { GPT_5_mini, GPT_5_nano, GPT_5_codex, GPT_5_1 } from "@foundry/models-api/language-models";

export class MyFunctions {

    @Function()
    public async chatCompletions(model: string, user: string, system: string, jsonSchema?: string): Promise<string> {
        let response: string | undefined;
        let completion;

        // The proxy expects the schema as a JSON *string* wrapping a { name, schema } envelope.
        let responseFormat = {};
        if (jsonSchema) {
            const parsed = JSON.parse(jsonSchema);
            responseFormat = {
                responseFormat: {
                    jsonSchema: JSON.stringify({
                        name: parsed.name,
                        schema: parsed,
                    }),
                    type: 'json_schema',
                },
            };
        }

        switch (model) {
            case 'gpt5.1':
                completion = await GPT_5_1.createChatCompletion({
                    params: { ...responseFormat },
                    messages: [
                        { role: "SYSTEM", contents: [{ text: system }] },
                        { role: "USER", contents: [{ text: user }] },
                    ],
                });
                response = completion.choices[0].message.content;
                break;
            case 'gpt5-codex':
                // The codex model uses the Responses-style input shape and does not take responseFormat.
                completion = await GPT_5_codex.createChatCompletion({
                    input: [
                        { inputMessage: { role: "system", content: { text: system } } },
                        { inputMessage: { role: "user", content: { text: user } } },
                    ],
                });
                response = completion.output[0].outputMessage?.content[0].text?.text;
                break;
            case 'gpt5-mini':
                completion = await GPT_5_mini.createChatCompletion({
                    params: { ...responseFormat },
                    messages: [
                        { role: "SYSTEM", contents: [{ text: system }] },
                        { role: "USER", contents: [{ text: user }] },
                    ],
                });
                response = completion.choices[0].message.content;
                break;
            case 'gpt5-nano':
                completion = await GPT_5_nano.createChatCompletion({
                    params: { ...responseFormat },
                    messages: [
                        { role: "SYSTEM", contents: [{ text: system }] },
                        { role: "USER", contents: [{ text: user }] },
                    ],
                });
                response = completion.choices[0].message.content;
                break;
        }
        return response || 'undefined';
    }
}

If you would like to use the models directly via an LLM Proxy, below is also a working example for OpenAI models using the OpenAI SDK. This is ultimately the solution I was looking for, but due to the limited model support I am falling back to the functions v1 approach above for any models the proxies don’t cover. Please note that I have only been able to get this working with some of the GPT-4 series models; I cannot find documentation on which models are supported behind these proxies. One other note: this requires personal access tokens, as the OSDK does not support auth for LLM proxies.

import {
  SupportedFoundryClients,
  type OpenAIService,
} from '@codestrap/developer-foundations-types';
import OpenAI from 'openai';
import { foundryClientFactory } from '../factory/foundryClientFactory';
import type { ChatCompletionCreateParamsStreaming } from 'openai/resources/chat';
import type { RequestOptions } from 'openai/core';
import type { ResponseCreateParamsStreaming } from 'openai/resources/responses/responses';

// Add type definitions for the OpenAI response here, or in a separate file and import them, to ensure type safety when working with the API response data.
export function makeOpenAIService(): OpenAIService {
  const { getToken, url, ontologyRid } = foundryClientFactory(
    process.env.FOUNDRY_CLIENT_TYPE || SupportedFoundryClients.PRIVATE,
    undefined,
  );

  return {
    // TODO code out all methods using OSDK API calls
    completions: async (
      body: ChatCompletionCreateParamsStreaming,
      options?: RequestOptions,
    ) => {
      // The proxy is authenticated with a personal access token (FOUNDRY_TOKEN);
      // the OSDK token from getToken() is not accepted here.
      const client = new OpenAI({
        baseURL: `${url}/api/v2/llm/proxy/openai/v1`,
        apiKey: process.env.FOUNDRY_TOKEN,
      });

      const stream = await client.chat.completions.create(body, options);

      let text = '';
      for await (const chatCompletionChunk of stream) {
        text += chatCompletionChunk.choices[0]?.delta?.content || '';
      }
      return text;
    },
    responses: async (
      body: ResponseCreateParamsStreaming,
      options?: RequestOptions,
    ) => {
      const client = new OpenAI({
        baseURL: `${url}/api/v2/llm/proxy/openai/v1`,
        apiKey: process.env.FOUNDRY_TOKEN,
      });

      // Responses API streaming emits semantic events (delta, completed, error, etc.)
      const stream = await client.responses.create(
        { ...body, stream: true },
        options,
      );

      let text = '';

      for await (const event of stream) {
        if (event.type === 'error') {
          throw new Error(`OpenAI API error: ${event.code} - ${event.message}`);
        }

        if (event.type === 'response.output_text.delta') {
          text += event.delta ?? '';
        }
      }

      return text;
    },
  };
}

Notes for posterity regarding the difference between the Foundry proxy and OpenAI’s API related to JSON Schema:
OpenAI direct (Responses API): you pass the schema as a real JSON object in the request. Nothing is stringified. You send name and schema as separate fields (and you can also set strict: true). Example shape:
text.format = { type: 'json_schema', name: 'EditPlan', schema: <object>, strict: true }.

Foundry proxy SDK: you don’t pass the schema object directly. You pass a string field (responseFormat.jsonSchema) that must itself be a JSON string encoding an envelope:
{"name":"EditPlan","schema":{ ...actual JSON Schema... }}.
If you put schema keywords like required at the wrong level (i.e., not under schema), the backend complains (that’s the response_format.json_schema.required error you hit).

Key gotcha you already ran into: inside that envelope, your actual JSON Schema must live under the schema key. $schema belongs inside the schema object (schema.$schema), not as a sibling of name.

My guess is this is done because Foundry requires Java bindings, which aren’t going to handle the variable shape of arbitrary JSON Schemas, so they just treat the schema as an opaque blob.
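To make the two shapes concrete, here is a small sketch (in Python, using a made-up EditPlan schema) that builds both request fragments side by side; the only structural difference is that the proxy variant JSON-stringifies a {name, schema} envelope:

```python
import json

# A made-up JSON Schema purely for illustration.
schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "properties": {"edits": {"type": "array", "items": {"type": "string"}}},
    "required": ["edits"],
    "additionalProperties": False,
}

# OpenAI direct (Responses API): the schema travels as a plain object,
# with name/schema/strict as separate sibling fields.
openai_direct = {
    "text": {
        "format": {"type": "json_schema", "name": "EditPlan", "schema": schema, "strict": True},
    },
}

# Foundry proxy: the {name, schema} envelope is serialized into one string.
foundry_proxy = {
    "responseFormat": {
        "type": "json_schema",
        "jsonSchema": json.dumps({"name": "EditPlan", "schema": schema}),
    },
}

# The envelope round-trips: `required` and `$schema` live *inside* the
# `schema` key, never as siblings of `name`.
envelope = json.loads(foundry_proxy["responseFormat"]["jsonSchema"])
print(sorted(envelope))  # ['name', 'schema']
```

Getting the nesting wrong in the envelope is exactly what produces the response_format.json_schema.required error mentioned above.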