Multi-Agent Orchestration Timeout Issues and Best Practices

Description:

Use Case

I need to orchestrate multiple AI agents that work sequentially, where each agent builds upon the previous agent’s work:

  • Agent 1: Takes input, processes it, creates JSON output

  • Agent 2: Gets output from Agent 1, uses additional tools for enrichment

  • Agents 3, 4, 5: Use inputs from prior stages to create their outputs

  • Agent 6: Collects all inputs from prior agents and creates final output
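
For illustration, a minimal TypeScript sketch of the intended data flow; `callAgent` and `runPipeline` below are hypothetical placeholders for the real agent invocations (each of which takes ~2 minutes), not a Foundry or AIP API:

```typescript
// Hypothetical stand-in for a single agent invocation (~2 minutes each in practice).
type AgentOutput = Record<string, unknown>;

async function callAgent(name: string, inputs: AgentOutput[]): Promise<AgentOutput> {
  // Placeholder: the real implementation would invoke one AIP agent / Logic function.
  return { agent: name, inputs };
}

export async function runPipeline(rawInput: string): Promise<AgentOutput> {
  const a1 = await callAgent("agent1", [{ rawInput }]);       // parse input, emit JSON
  const a2 = await callAgent("agent2", [a1]);                 // enrich with additional tools
  const a3 = await callAgent("agent3", [a1, a2]);             // agents 3-5 build on prior stages
  const a4 = await callAgent("agent4", [a1, a2, a3]);
  const a5 = await callAgent("agent5", [a1, a2, a3, a4]);
  return callAgent("agent6", [a1, a2, a3, a4, a5]);           // collect everything, produce final output
}
```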

Why This Approach?

I separated the work into multiple agents instead of using a single prompt because:

  1. Model accuracy: A single prompt handling all tasks caused excessive tool calling and back-and-forth, leading to many mistakes

  2. Timeout issues: AIP Logic’s default 5-minute timeout caused the function to time out 90% of the time

Current Problem

Approach 1: AIP Logic Orchestrating Agent Functions

Using a single AIP Logic function that calls all agents as functions via tool calls still hits the 5-minute timeout limit, since each agent takes ~2 minutes to execute.

Approach 2: TypeScript v2 Function Orchestration

I attempted to use a TypeScript v2 function to orchestrate all agents:

  • Live Preview: The function successfully ran for 15 minutes in live preview (possibly because it was running in Code Workspaces?)

  • Configuration: I was able to set the timeout to 1200 seconds (20 minutes) and save it in the function configuration page

  • Production Issue: When wired to an action and executed, I get:

  Unknown Server Error
  Error: [Default] Internal
  Error ID: e61c764a-793f-4d58-8e5d-fac9e0d98ac4
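
This does not get around the platform limit, but one defensive pattern inside the orchestrator is to give each agent call its own deadline, so a stuck call surfaces a clear error instead of silently consuming the whole budget. A generic TypeScript sketch using plain `Promise.race` (no platform-specific API; the 3-minute per-agent budget is only an assumed example):

```typescript
// Generic per-call deadline guard; `work` would be one agent invocation promise.
async function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} exceeded ${ms} ms`)), ms);
  });
  try {
    return await Promise.race([work, deadline]);
  } finally {
    clearTimeout(timer);
  }
}

// Usage (with the hypothetical callAgent from the pipeline sketch above):
// const a1 = await withTimeout(callAgent("agent1", [{ rawInput }]), 3 * 60 * 1000, "agent1");
```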

Documentation Discrepancy

The documentation states that preview mode only allows 280 seconds, but I observed 15-minute executions in Code Workspaces.

Questions

  1. What is the actual maximum timeout for TypeScript v2 functions in production (non-preview) mode?

  2. What is the recommended architecture for multi-agent orchestration workflows that require >5 minutes of execution time?

  3. Is there a workaround for the timeout limitations when orchestrating multiple AI agents?

  4. Are there alternative patterns (e.g., async workflows, webhooks, scheduled jobs) that would better support this use case?

Environment

  • Function Type: TypeScript v2

  • Execution Context: Ontology Action

  • Total Expected Runtime: ~12-15 minutes (6 agents × 2 minutes each)

I'm not sure if this is the right way to do it, but I would like to see if there is a better way.

As a workaround, I added the AIP Logic functions to an OSDK, created a conda SDK, created a data connection for the Foundry endpoint, then created a Python transform repository and added the sources and the SDK to the repo.

Then I wrote a Python transform that reads the ontology objects, passes them as input to the functions via the OSDK, gets the output, and writes it to a dataset.

The next step would be to wire the dataset back to an ontology object so I can use the inference in the ontology layer.

I guess another idea would be to build a UDF so we can use the AIP Logic function in Pipeline Builder.

Let me know if anyone can think of a better option.

I’d recommend creating an orchestration object that has a state and can store the metadata. You can then create automations in Automate to orchestrate calling each agent. Each agent will still have a 5-minute timeout, but the overall process can then run for much longer.

So the orchestration object would store the output from all agents as properties.

Each agent would be an independent action, and the output of each agent would be an ontology edit to a property on the orchestration object. Create 6 actions and then create 6 effects.
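
As a rough sketch of what that could look like (the property names, states, and edit helper below are hypothetical, not generated Foundry code):

```typescript
// Hypothetical shape of the orchestration object; in Foundry these would be ontology properties.
interface AgentOrchestration {
  id: string;
  state:
    | "PENDING"
    | "AGENT_1_DONE"
    | "AGENT_2_DONE"
    | "AGENT_3_DONE"
    | "AGENT_4_DONE"
    | "AGENT_5_DONE"
    | "COMPLETE";
  agent1Output?: string;  // JSON written by the first action
  agent2Output?: string;
  agent3Output?: string;
  agent4Output?: string;
  agent5Output?: string;
  finalOutput?: string;   // written by agent 6
}

// Each of the 6 actions applies one edit like this; an automation watching `state`
// then triggers the next action, so every step stays within its own 5-minute limit.
function applyAgent2Result(obj: AgentOrchestration, output: string): AgentOrchestration {
  return { ...obj, agent2Output: output, state: "AGENT_2_DONE" };
}
```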

I guess the advantage of this approach is that the output of each agent is immediately written to the ontology.

But I could also do this from a TypeScript function that edits the orchestration object, create an action from the function, and configure it as an effect. The thing that's stopping me from doing this is the hard function timeout limit of 5 minutes.

Why would the platform decide that 5 minutes is more than enough? This reminds me of how AWS Lambda used to have the same 5-minute limit and later raised it to 15 minutes :slight_smile:

I think the orchestration object is a better fit for an event-driven approach, since we can target execution at objects that change.