How does a sync behave during a network outage?

In our Foundry environment, Foundry connects an agent server (VPC) to an on-premises internal source system via SFTP.

Sometimes the connection between them is lost. When that happens, the server usually remounts EFS immediately and automatically. As a result, the total disconnection time is less than a minute (the same as the monitoring interval of the agent server).

So far, a disconnection has never overlapped with a scheduled sync. However, what will happen if they occur at the same time? Could you let me know whether a sync will succeed in the following two cases:

  1. A sync starts during a disconnection.
  2. A disconnection happens during a sync.

We are investigating the problem and have found that the metrics and logs shown in the Foundry UI say nothing about it. No disconnection can be observed on the application side.

The state of the connection between the SFTP server and your agent VM doesn’t affect whether a sync will start. If the connection is down, the sync will still start and then fail immediately.

Specific semantics on what happens to a sync during network degradation at runtime will vary depending on duration and source type, but generally anything longer than a short blip may cause a sync to fail.

There are some options you can implement to mitigate this.

1. Configure retries on sync schedules, with a time delay
You can configure any Foundry schedule to retry on failure after a cool-down period. Something like “retry 2 times with 10 minute delays on failure” can provide resilience to short network problems. The option in the schedule configuration view is named “Attempt failed jobs multiple times”.
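
To illustrate what “retry 2 times with 10 minute delays on failure” means in practice, here is a minimal Python sketch of the retry semantics. It is not Foundry code and does not call any Foundry API (in Foundry you only set the schedule option). The names `run_with_retries`, `job`, and `sleep` are hypothetical, and treating a network failure as a `ConnectionError` is an assumption made for the example.

```python
import time
from typing import Callable


def run_with_retries(
    job: Callable[[], None],
    retries: int = 2,
    delay_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run `job`; after a failure, wait and try again up to `retries` more times.

    Returns the number of attempts used. Re-raises the last error if
    every attempt fails. (Illustrative only, not a Foundry API.)
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            job()
            return attempts
        except ConnectionError:
            if attempts > retries:
                # Initial attempt plus all retries failed: give up.
                raise
            # Cool-down period before the next attempt.
            sleep(delay_seconds)
```

With the defaults above, a job is attempted at most 3 times (1 initial run plus 2 retries), so an outage shorter than the 10 minute delay would typically be over by the time the first retry runs.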

2. Set up multiple agent runtimes on different network paths.
A Source can have multiple agent runtimes assigned. For production-critical workflows, you can improve resilience by deploying 2 or more Agents in separate VMs and assigning all of them to the Source. The more isolated these VMs are from each other, the better (different networks, different physical hosts, etc.).

3. Use a file limit to make syncs shorter
You can use a “Limit number of files” filter on your SFTP source to limit the number of files uploaded during a single sync. If a sync fails due to networking issues during execution, no files are committed. So if you have processed 3000 of 4000 files when the network fails, all progress is lost. Configuring a file limit of 1000 files and running the sync more frequently would instead preserve partial progress. Frequent, small syncs are more resilient to outages.
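
The arithmetic behind this can be sketched in a few lines of Python. This is a simplified model, not Foundry behavior in detail: it assumes each sync commits only when it completes, that back-to-back small syncs pick up the remaining files, and it ignores the gaps between scheduled runs. The function name `committed_after_outage` and its parameters are hypothetical.

```python
from typing import Optional


def committed_after_outage(
    total_files: int,
    files_before_outage: int,
    file_limit: Optional[int] = None,
) -> int:
    """Return how many files are committed when an outage interrupts syncing.

    `files_before_outage` is the number of files processed before the
    network fails. With no `file_limit`, one sync covers every file and
    commits nothing unless it finishes.
    """
    if file_limit is not None and file_limit <= 0:
        raise ValueError("file_limit must be positive")
    processed = min(files_before_outage, total_files)
    if processed == total_files:
        # Everything finished before the outage.
        return total_files
    if file_limit is None:
        # A single all-or-nothing sync was interrupted: nothing is committed.
        return 0
    # Only fully completed small syncs have committed their files.
    return (processed // file_limit) * file_limit


print(committed_after_outage(4000, 3000))        # one big sync
print(committed_after_outage(4000, 3000, 1000))  # limit of 1000 files
```

Under these assumptions, the single large sync commits 0 files, while the 1000-file limit leaves 3000 files committed, so only the remaining 1000 need to be retried.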