How to create Vision-LLM API that directly accepts image data

Hi there, I’m trying to create a vision-LLM workflow that I can use from outside Foundry by calling an API and supplying the image as input.

More specifically, it would be nice if we could pass in the image data directly, without first uploading it to Foundry via attachments or media sets.

Is there an easy way to build something like this? The approach I was thinking of was creating an AIP Logic function or a TypeScript/Python Function and exposing it via the OSDK.

Yes, you basically just need to send a multimodal message containing the image to the model.

You can create a Function that sends a multimodal completion request (the message object plus any parameters) to the model, with the base64-encoded image included in the message content.
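A minimal sketch of the idea in Python, assuming an OpenAI-style multimodal message shape. The `build_vision_message` helper and the exact field names here are hypothetical; Foundry's model classes expect their own request/message types, so you would adapt this structure to whatever the platform's completion API actually takes. The point is just that the image travels as a base64 string inside the message content rather than as an uploaded attachment:

```python
import base64


def build_vision_message(image_bytes: bytes, prompt: str) -> dict:
    """Package raw image bytes and a text prompt as one multimodal message.

    Hypothetical message shape for illustration; adapt the field names to
    the actual request types of the model API you are calling.
    """
    # Base64-encode the raw bytes so the image can travel in a JSON payload.
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image",
                "image": {"mediaType": "image/png", "data": b64},
            },
        ],
    }


# Example: the caller supplies the image bytes directly (e.g. from the
# API request body), so nothing needs to be uploaded to Foundry first.
message = build_vision_message(b"\x89PNG...", "Describe this image.")
```

Your exposed Function would then forward this message (plus model parameters like temperature) to the vision model and return the completion text to the external caller.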

Let me know if this answers it or you need more help.