Hello,
I’m attempting to use an ontology action to create a new instance of an object type that has a vector embedding property. The objects are chunks of an uploaded, text-extracted PDF; the aim is to let users of a document intelligence AI system upload and use their own files without any developer intervention. I’ve computed an embedding with the appropriate model (text-embedding-3-small), but I’m running into trouble actually creating the new object: array parameters passed to ontology actions appear to be capped at 1000 elements, and the embeddings in question are 1536-dimensional. Is there any way around this limitation? Have I misidentified the problem? Any suggestions on alternative approaches?
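For concreteness, here’s a minimal sketch of the failing step. The embeddings call is the real OpenAI API; the OSDK client and the `create_document_chunk` action name are placeholders for our actual ontology, not anything documented:

```python
from openai import OpenAI

openai_client = OpenAI()


def embed_chunk(text: str) -> list[float]:
    # text-embedding-3-small produces 1536-dimensional vectors
    response = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=text,
    )
    return response.data[0].embedding


chunk_text = "..."  # one extracted chunk of the uploaded PDF
embedding = embed_chunk(chunk_text)
assert len(embedding) == 1536

client = ...  # placeholder for our generated Ontology SDK client

# This is where it falls over: the array-typed action parameter
# appears to reject anything longer than 1000 elements.
client.ontology.actions.create_document_chunk(
    text=chunk_text,
    embedding=embedding,
)
```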
The alternative that jumps out at me would be to incrementally run a pipeline off a media set, but that has its own share of problems:

- the uploading app has limited visibility into completion status;
- Pipeline Builder has no support for incremental media set builds;
- converting a media set to a dataset to be incrementally built from does not seem to work (it throws errors about media references without an associated media set); and
- while it might be possible to rewrite our entire existing batch ingestion pipeline as a series of Code Repositories transforms, I would much rather avoid that, in large part because of the lack of embedding or LLM support in the Python SDK (a rough sketch of what that would look like is below).
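For reference, the Code Repositories rewrite would presumably end up looking something like this. Dataset paths and column names are placeholders, and the body of the `embed` UDF is exactly the gap: without embedding support in the Python SDK, it would have to call out to some external service per row:

```python
from transforms.api import transform_df, incremental, Input, Output
from pyspark.sql import functions as F
from pyspark.sql.types import ArrayType, FloatType


@F.udf(returnType=ArrayType(FloatType()))
def embed(text):
    # The missing piece: with no embedding/LLM support in the Python
    # SDK, each row would need a call to an external embedding service.
    ...


@incremental()
@transform_df(
    Output("/Project/datasets/chunk_embeddings"),  # placeholder path
    chunks=Input("/Project/datasets/extracted_chunks"),  # placeholder path
)
def compute_embeddings(chunks):
    # Incrementally append an embedding column to newly extracted chunks.
    return chunks.withColumn("embedding", embed(F.col("text")))
```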
Any help would be greatly appreciated.