Hi there, I’m trying to create a vision-LLM workflow that I can use from outside Foundry by calling an API and supplying the image as input.
More specifically, it would be nice if we could pass the image data in directly, without uploading it to Foundry via attachments or media sets.
Is there an easy way to build something like this? The approach I was considering is creating an AIP Logic function or a TypeScript/Python Function and exposing it via OSDK.
Yes, you basically just need to send a multimodal message containing the image to the model.
You can create a Function that builds a multimodal completion request (a message object plus any parameters) and sends it to the model service, with the base64-encoded image embedded in the message.
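To make the shape of that request concrete, here is a minimal Python sketch of building a multimodal message with a base64-encoded image. It assumes an OpenAI-style chat payload (`role`/`content` parts with `text` and `image_url` entries); the exact field names may differ depending on the model service your Function calls, so treat this as illustrative rather than the definitive Foundry API.

```python
import base64


def build_vision_request(image_bytes: bytes, prompt: str) -> dict:
    """Build a multimodal chat-completion payload from raw image bytes.

    Assumes an OpenAI-style message format; adapt the field names
    to whatever model service your Function actually calls.
    """
    # Encode the raw image bytes as base64 so they can travel inside JSON.
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    # One text part with the instruction...
                    {"type": "text", "text": prompt},
                    # ...and one image part carrying the base64 data URI.
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{b64}"},
                    },
                ],
            }
        ]
    }
```

A caller outside Foundry would then only need to base64-encode its image and pass the string as a Function input; the Function assembles the payload above and forwards it to the vision model.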
Let me know if this answers it or you need more help.
First-class base64 media support in Logic has been on our radar, and we are actively tracking it. Ideally, in the near future you would be able to pass a base64 media string into a Logic function, transform it directly, and send it to a vision model.
For now, as noted above, using Functions is probably the best approach.