Hello,
I have been attempting to use Foundry as the backend for a web app that ingests and indexes documents, then serves a semantic search API to an LLM chat frontend (a React app hosted outside of the platform). To that end, I have two components, implemented as follows:
First, a document indexing pipeline. This takes documents uploaded to a mediaset, chunks them, and adds them to a dataset with a vector index.
Second, a set of functions (previously AIP Logic functions but now code-repository functions) which the frontend may call to search and return results from that indexed set.
I would like to minimize my reliance on custom code and use as much built-in Foundry functionality as possible, so I have a few questions about where I might simplify:
First, our document indexing pipeline reads from a mediaset that we populate through the mediaset put API endpoints. I would like to update the output dataset incrementally, but there doesn’t seem to be any way to do that without writing a custom transform function. Is that the case, or am I missing something?
Second, I want callers to be able to restrict which documents my custom functions search. The caller can pass in a list of RIDs, but I’ve found that AIP Logic has no way to filter a nearest-neighbor search by such a parameter, which forced me to use a Code Repository instead. Is there any way to avoid this?
Third, is there a way to run SQL queries against a dataset via the SDK or an API call?
Fourth, is there any built-in support for lexical/keyword search over dataset columns in the SDK or through the APIs?
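To make the second question concrete, here is a minimal plain-Python sketch of the behavior I am after: restrict the candidate chunks to an allowed list of RIDs, then rank only those by similarity. It does not use any Foundry API. The field names `document_rid` and `embedding`, the function names, and the sample RID strings are all hypothetical and only there for illustration.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filtered_nearest_neighbors(
    query: Sequence[float],
    chunks: List[Dict],
    allowed_rids: Sequence[str],
    k: int = 3,
) -> List[Dict]:
    """Return the k chunks most similar to `query`, considering only
    chunks whose (hypothetical) `document_rid` field is in `allowed_rids`."""
    allowed = set(allowed_rids)
    # Pre-filter: drop chunks from documents the caller did not ask for.
    candidates = [c for c in chunks if c["document_rid"] in allowed]
    # Rank the remaining chunks by similarity to the query, best first.
    ranked = sorted(
        candidates,
        key=lambda c: cosine_similarity(query, c["embedding"]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical sample data standing in for the indexed dataset.
chunks = [
    {"document_rid": "rid.doc-a", "text": "alpha", "embedding": [1.0, 0.0]},
    {"document_rid": "rid.doc-b", "text": "beta", "embedding": [0.9, 0.1]},
    {"document_rid": "rid.doc-c", "text": "gamma", "embedding": [0.0, 1.0]},
]

results = filtered_nearest_neighbors([1.0, 0.0], chunks, ["rid.doc-b", "rid.doc-c"], k=2)
```

In this sketch the chunk from `rid.doc-a` is never returned, even though it is the closest match to the query, because its RID is not in the allowed list. That pre-filtering step is the part I could not express in AIP Logic.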
Thank you for any help you can offer.