How to create an embedding pipeline with streamed data?

I’m trying to build a pipeline that can receive raw text from a Python server, chunk/embed the text, and then store it in an embedding ontology. Essentially, I’ve followed the Semantic Search tutorial (https://www.youtube.com/watch?v=IWGFU7Jrgek), and now I’d like to change the data source from a fixed set of PDFs to text provided via API.

I’ve created an empty manual table (the schema is all string columns: title, url, raw_content), and my next step is to take the dataset’s RID and make POST requests to it from my server. However, I can’t find the RID. Is my approach valid, or have I misunderstood how to send and receive data programmatically? I’ve attached a screenshot of my fairly simple pipeline so far:

Thanks!

Hey, thanks for posting! You’ll want to create a streaming dataset in the same folder where you built the Pipeline Builder pipeline above, and use that as the input in place of the manual table.
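
Once the streaming dataset exists, you can grab its RID (it shows up in the dataset’s URL when you open it in Foundry) and push rows to it from your Python server. Here’s a minimal sketch, assuming the stream-proxy “push records to a stream” endpoint; the host, token, RID, and branch are placeholders, so double-check the exact path against your enrollment’s API documentation:

```python
# Minimal sketch: push one row into a Foundry streaming dataset.
# Assumptions (verify against your enrollment's "push records to a stream" docs):
#   - the stream-proxy /jsonRecord endpoint shown below is available to you
#   - HOST, TOKEN, STREAM_RID, and BRANCH are placeholders you replace
import os
import requests

HOST = os.environ["FOUNDRY_HOST"]          # e.g. "yourstack.palantirfoundry.com"
TOKEN = os.environ["FOUNDRY_TOKEN"]        # token with write access to the stream
STREAM_RID = "ri.foundry.main.dataset.xxxx"  # the streaming dataset's RID (placeholder)
BRANCH = "master"


def push_record(title: str, url: str, raw_content: str) -> None:
    """Send one record matching the stream schema (title, url, raw_content)."""
    resp = requests.post(
        f"https://{HOST}/stream-proxy/api/streams/{STREAM_RID}/branches/{BRANCH}/jsonRecord",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"title": title, "url": url, "raw_content": raw_content},
        timeout=30,
    )
    resp.raise_for_status()


push_record("Example page", "https://example.com/page", "Raw text to chunk and embed downstream.")
```

Your pipeline then reads from the streaming dataset as its input, and the chunking/embedding steps from the Semantic Search tutorial stay the same downstream.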
