Hi there, I’m trying to create a vision-LLM workflow that I can use from outside Foundry by calling an API and supplying the image as input.
More specifically, it would be nice if we could pass the image data in directly, without uploading it to Foundry via attachments or media sets.
Is there an easy way to build something like this? The approach I was considering is creating an AIP Logic function or a TypeScript/Python Function and exposing it via OSDK.
Yes, you basically just need to send a multimodal message containing the image to the model.
You can create a Function that builds a multimodal completion request (a message object plus any parameters) and sends it to the model service, with the base64-encoded image embedded in the message.
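To make the shape of that request concrete, here is a minimal Python sketch of building a multimodal message with a base64-encoded image. It assumes an OpenAI-style chat payload (`role`/`content` parts with `text` and `image_url` entries); the exact field names may differ depending on the model service your Function calls, so treat this as illustrative rather than the definitive Foundry API.

```python
import base64


def build_vision_request(image_bytes: bytes, prompt: str) -> dict:
    """Build a multimodal chat-completion payload from raw image bytes.

    Assumes an OpenAI-style message format; adapt the field names
    to whatever model service your Function actually calls.
    """
    # Encode the raw image bytes as base64 so they can travel inside JSON.
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    # One text part with the instruction...
                    {"type": "text", "text": prompt},
                    # ...and one image part carrying the base64 data URI.
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{b64}"},
                    },
                ],
            }
        ]
    }
```

A caller outside Foundry would then only need to base64-encode its image and pass the string as a Function input; the Function assembles the payload above and forwards it to the vision model.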
Let me know if this answers it or you need more help.
First-class base64 media support in Logic has been on our radar, and we are actively tracking it. Ideally, in the near future you would be able to pass a base64 media string into a Logic function, transform it directly, and send it to a vision model.
For now, as noted above, using Functions is probably the best approach.