I have a use case where I want to write back to 500k objects with AI. I want to loop through each object and calculate a score for it, then order the objects by this calculated score.
We use a mix of strict calculations and subjective assessments to determine the scores. Some factors are based on clear numeric ranges, while others rely on keywords and context for a more nuanced judgment. For the keyword-based factors, we have the AI read the descriptions and score them accordingly. Roughly, the scoring logic looks like the sketch below.
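To make it concrete, here is a minimal sketch of the kind of scoring I mean. The field names, ranges, and weights are made up for illustration; the LLM part is stubbed out because that's the piece I'm trying to figure out how to run at scale:

```python
def numeric_score(revenue: float, headcount: int) -> float:
    """Strict calculation: score derived from clear numeric ranges (illustrative thresholds)."""
    score = 0.0
    if revenue > 1_000_000:
        score += 40
    elif revenue > 100_000:
        score += 20
    if headcount > 50:
        score += 10
    return score


def llm_score(description: str) -> float:
    """Subjective assessment: in practice this would be an LLM call (e.g. via an AIP Logic
    block or a Use LLM node) that reads the description and returns a score based on
    keywords and context. Stubbed here as a placeholder."""
    raise NotImplementedError


def total_score(row: dict) -> float:
    # Combine the deterministic and AI-driven parts into one score used for ordering.
    return numeric_score(row["revenue"], row["headcount"]) + llm_score(row["description"])
```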
I was wondering whether this is possible within AIP Logic functions / Pipeline Builder, and if so, what the most efficient solution would be. So far I have tried running an AIP Logic function from a code repo (the maximum object input is 10k), and I have tried Pipeline Builder, but the deployment fails with an error relating to Spark resource allocation.
Chiming in from the Pipeline Builder side:
What we’ve seen some people do with large amounts of data is turn on the “Skip recomputing rows” option in the Use LLM node, and then run the Pipeline Builder logic on smaller chunks of their data. E.g. you could start with 20k rows and then move on to the next 20k; since the first 20k is already cached by the time you run the next 20k, the build is less likely to fail or hit rate limits.
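If it helps to picture the chunking idea outside of the Pipeline Builder UI, here is a rough sketch of how you could split 500k rows into stable batches in a code repo transform. The column name (`object_id`) and bucket count are my own assumptions, not anything Pipeline Builder exposes:

```python
from pyspark.sql import functions as F

NUM_BUCKETS = 25  # ~20k rows per bucket for 500k objects


def add_bucket(df):
    # crc32 of the primary key gives a deterministic bucket, so the same rows land in
    # the same chunk on every run (which matters if you rely on cached/skipped rows).
    return df.withColumn("bucket", F.crc32(F.col("object_id").cast("string")) % NUM_BUCKETS)


def select_batch(df, batch: int):
    # Run the LLM scoring on one bucket at a time, e.g. batch 0, then 1, then 2, ...
    return df.filter(F.col("bucket") == batch)
```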
I think you could also do something with Logic and Automate, where over time you run your Logic function on smaller subsets of the 500k.