How to prevent a failed LLM call in a subprocess from crashing the whole AIP pipeline?

I’m working on a document processing pipeline built with AIP (AI Platform), where I extract different types of information from a document using multiple LLM-based extractor subprocesses.

Here’s the basic structure:

  • There is a main AIP function that calls several extractor AIP functions.
  • Each extractor function includes an LLM call.
  • If any one of the LLM calls throws a serviceException, the whole main AIP function fails, and I receive no output at all — not even partial results.

My question is:

Is there a way to handle such failures gracefully, so that the main AIP function can continue running even if some of the extractors fail (something like a try-catch per extractor call), and I can still collect the results from the successful ones?

My goal is to make the process fault-tolerant: one failed extractor shouldn’t prevent the others from completing and returning useful data.

Has anyone implemented a similar fault-tolerant pattern in AIP? I’d love to see examples or hear how you approached it.

Any suggestions, workarounds, or pointers to best practices would be much appreciated — thanks in advance!


I have a similar workflow. While Logic is still good for orchestrating everything, failure handling is sub-optimal (Agents are even worse, but that’s a separate topic), and I had to implement a few things in code to solve it.

It basically works like this:

  • Logic is used to orchestrate everything.
  • Every call is wrapped in conditionals, based on either an additional output boolean or an evaluation of the return.
  • These allow the logic to continue, even if a subprocess has failed.

Would that work in your case? Make sure that you wrap dependencies, so you don’t spend compute on parts that weren’t properly processed if a previous step crashed.
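
To make the wrapping idea concrete, here is a minimal sketch in plain Python rather than AIP Logic itself. Everything in it is hypothetical (`ExtractResult`, `safe_call`, `run_all` and the two toy extractors); it only illustrates the pattern of turning an exception into an "ok" boolean so the caller can branch on it and still keep the partial results.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class ExtractResult:
    ok: bool                       # the "additional output boolean"
    value: Optional[Any] = None    # extractor output when ok is True
    error: Optional[str] = None    # short error description when ok is False


def safe_call(extractor: Callable[[str], Any], document: str) -> ExtractResult:
    """Run one extractor and convert any exception into a failed result."""
    try:
        return ExtractResult(ok=True, value=extractor(document))
    except Exception as exc:  # broad on purpose: one failure must not stop the rest
        return ExtractResult(ok=False, error=f"{type(exc).__name__}: {exc}")


def run_all(
    document: str, extractors: Dict[str, Callable[[str], Any]]
) -> Dict[str, ExtractResult]:
    """Call every extractor independently and keep each outcome."""
    return {name: safe_call(fn, document) for name, fn in extractors.items()}


# Hypothetical extractors standing in for the LLM-backed functions.
def extract_title(doc: str) -> str:
    return doc.splitlines()[0]


def extract_dates(doc: str) -> list:
    raise RuntimeError("simulated LLM service failure")


results = run_all(
    "Invoice 42\nDue: 2024-01-31",
    {"title": extract_title, "dates": extract_dates},
)
partial = {name: r.value for name, r in results.items() if r.ok}
failed = {name: r.error for name, r in results.items() if not r.ok}
```

Here `partial` holds the output of the extractor that succeeded and `failed` holds the error text of the one that did not, so the main function can return both instead of failing as a whole.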

Thank you for the response! Really appreciate it!
Could you help with a slightly more detailed explanation of how to wrap a function call using a conditional?

(I’ve already tried to handle the error in the next function, which works with the extracted result: I check whether the result of extract is not null.)

Sure, but I think you’ve basically done it already:

  1. Function A runs; its extract is either a value or null.
  2. If the extract of A ≠ null, Function B runs, and the extract of B is either a value or null.
  3. If the extract of B ≠ null, Function C runs…
  4. … etc…

You can use these conditionals to provide details for debugging too, which could be useful in finding out where the issue occurs.
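
As a rough illustration of the chain above, again in plain Python and not AIP Logic, the sketch below runs dependent steps in order, stops at the first null, and records which step that was. The names `run_chain`, `function_a`, `function_b` and `function_c` are made up for the example.

```python
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, Callable[[str], Optional[str]]]


def run_chain(initial: str, steps: List[Step]) -> Tuple[Optional[str], List[str]]:
    """Run dependent steps in order; stop at the first one that returns None."""
    log: List[str] = []
    current = initial
    for name, step in steps:
        result = step(current)
        if result is None:
            # Record where the chain broke and skip everything that depends on it.
            log.append(f"{name}: returned null, skipping remaining steps")
            return None, log
        log.append(f"{name}: ok")
        current = result
    return current, log


calls: List[str] = []  # tracks which hypothetical functions actually ran


def function_a(text: str) -> Optional[str]:
    calls.append("A")
    return text.strip()


def function_b(text: str) -> Optional[str]:
    calls.append("B")
    return None  # simulates a failed extraction


def function_c(text: str) -> Optional[str]:
    calls.append("C")
    return text.upper()


output, debug_log = run_chain(
    "  some document  ",
    [("A", function_a), ("B", function_b), ("C", function_c)],
)
```

In this run Function C is never called, so no compute is spent on it, and `debug_log` shows that Function B was the step that returned null.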

Let me know if this helped!