Which is better, Agents or Direct Connection? How can I scale vertically, horizontally, etc.?

VincentF · September 17, 2024, 4:07pm

I’m setting up a connection to an external system. I want to load data from this system.
I have 2 main options:

Agents based connection
Direct connection

Which one should I use? When? What are the pros and cons of each?
How do they scale: vertically, horizontally, dynamically?

VincentF · September 17, 2024, 4:34pm

Note: what follows is a simplified overview of the available options. The docs are more precise and exhaustive, but this should be a good start.

First, what is the difference?

Direct connection is a container spun up on demand to ingest data whenever you need that data to be ingested. It requires egresses to be set up (= authorization to “reach out” to given systems). It can only reach systems that have an internet-accessible IP or URL, as the URL/IP is resolved from Foundry directly.
https://www.palantir.com/docs/foundry/data-connection/set-up-direct-connection/
An Agent proxy behaves like an inverting network proxy, forwarding network traffic originating in Foundry into the network where the agent is deployed, and relaying traffic back to Foundry. This is useful for some specific types of sources.
An Agent worker (historically just called an Agent) is a small piece of software that runs on a host machine (most often Linux) with network access to both the source system you want to pull data from and your Foundry instance.
https://www.palantir.com/docs/foundry/data-connection/set-up-agent/

Different connection methods support different capabilities:
https://www.palantir.com/docs/foundry/data-connection/core-concepts/#capabilities

Other useful vocabulary:

A Source = the description in Foundry of an external system it can connect to, including credentials, URLs, and any other information needed to connect (e.g. My Postgres Database).
A Sync = the definition of an ingestion job: which data to load and how to load it (e.g. SELECT * from MY_TABLE). Note: the docs give a more precise definition, covering batch syncs, stream syncs, media syncs, etc.: https://www.palantir.com/docs/foundry/data-connection/core-concepts/

What should I use?

Use Direct Connection when you can. There is less to set up, it is simpler to maintain (no Linux box to manage), and it scales better under increased load.
Otherwise, use Agent proxy if you can. Today it is only usable for a subset of sources: REST sources, virtual tables for some source systems, … but it will expand in the future.
Use an Agent worker (Agent) if you need to reach on-prem systems or systems that are not internet-accessible (no public IP or URL), etc., and Agent proxy is not a viable option.
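The rule of thumb above can be written as a small decision function. This is a hypothetical sketch: the function and enum names are illustrative, not part of any Foundry API, and it ignores the per-method capability differences linked earlier, which you should still check for your source type.

```python
from enum import Enum


class ConnectionMethod(Enum):
    DIRECT_CONNECTION = "Direct Connection"
    AGENT_PROXY = "Agent proxy"
    AGENT_WORKER = "Agent worker"


def choose_connection(internet_accessible: bool, agent_proxy_supported: bool) -> ConnectionMethod:
    """Hypothetical helper: Direct Connection first, then Agent proxy, then Agent worker."""
    # Direct Connection resolves the URL/IP from Foundry, so the source must be internet-accessible.
    if internet_accessible:
        return ConnectionMethod.DIRECT_CONNECTION
    # Agent proxy only covers a subset of source types (e.g. REST sources, some virtual tables).
    if agent_proxy_supported:
        return ConnectionMethod.AGENT_PROXY
    # Fallback: an Agent worker on a host that can reach both the source and Foundry.
    return ConnectionMethod.AGENT_WORKER


# An on-prem database with no public IP, assumed not to be supported by Agent proxy:
print(choose_connection(internet_accessible=False, agent_proxy_supported=False).value)  # Agent worker
```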

Note: the explanations below apply to the Agent worker runtime, but most of them also hold for Agent proxy.

What is more secure?

Both solutions are secure, but in both cases security depends on how they are used, configured, and operated.

Direct connection requires an egress to be set up (= the network permission to hit a particular URL/IP on a given port), which must be approved by platform admins (this requires infosec-related permissions, granted in Control Panel). So a standard user of the platform shouldn’t be able to create an egress by mistake, without additional approval and review.
The “security” of an egress depends on the domains it allows and on who is authorized to import or use it. If you have an egress to a public website, anyone with sufficient permissions to use that egress will be able to pull data from, or push data to, that website…

Agents are not subject to egress policies: anything the agent can reach on the network can be accessed. This means that if the network firewalls allow access to a public website, you can reach that website from the agent, and hence from Data Connection.
In that sense, how tightly the reachable URLs/IPs are restricted depends on your network boundaries.
You can, however, configure an allowlist on the agent to ensure it can only reach specific IPs.
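To make the allowlist idea concrete, here is a minimal sketch of what such a check does, using Python's standard `ipaddress` module. The `ALLOWED_NETWORKS` list and `is_allowed` function are hypothetical: the real allowlist is configured on the agent through Foundry, and its actual format is not shown here.

```python
import ipaddress

# Hypothetical allowlist: the only networks this agent may reach.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("10.20.0.0/16"),     # e.g. an on-prem database subnet
    ipaddress.ip_network("192.168.5.10/32"),  # e.g. a single file server
]


def is_allowed(destination: str) -> bool:
    """Return True if the destination IP falls inside any allowlisted network."""
    address = ipaddress.ip_address(destination)
    return any(address in network for network in ALLOWED_NETWORKS)


print(is_allowed("10.20.3.4"))      # True: inside 10.20.0.0/16
print(is_allowed("93.184.216.34"))  # False: a public IP that is not on the list
```

Without such a list, the effective "allowlist" is whatever the host's firewalls permit.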

How does Direct Connection scale?
Direct Connection ingests are (simplifying) containers spun up on demand to ingest data whenever you need that data to be ingested.
Hence it scales horizontally and dynamically.

Essentially, you can think of it like builds: you don’t have to think about scaling, more capacity is allocated when needed, and you won’t wait in a queue (unlike with agents, which have a fixed, configured parallelism).

How do Agents scale?

Agents can be scaled vertically, depending on the hardware they run on.
If the agent’s host machine is large enough, or if the size/frequency of the syncs running on the agent allows it, you can tweak the agent’s configuration to increase the number of syncs it can run at the same time, and the number of files it uploads to Foundry in parallel.
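The two knobs above (concurrent syncs per agent, parallel file uploads per sync) can be pictured as two nested bounds on concurrency. A minimal sketch, assuming hypothetical names `MAX_CONCURRENT_SYNCS` and `MAX_PARALLEL_FILES_PER_SYNC` (these are not the agent's actual configuration keys) and a short sleep standing in for each upload:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical settings standing in for the agent's real configuration
MAX_CONCURRENT_SYNCS = 2         # syncs the agent runs at the same time
MAX_PARALLEL_FILES_PER_SYNC = 3  # files one sync uploads in parallel

sync_slots = threading.BoundedSemaphore(MAX_CONCURRENT_SYNCS)
stats_lock = threading.Lock()
running_syncs = 0
peak_syncs = 0


def upload_file(name: str) -> str:
    time.sleep(0.01)  # stand-in for pushing one file to Foundry
    return name


def run_sync(files: list[str]) -> int:
    """Wait for a free sync slot, then upload this sync's files with bounded parallelism."""
    global running_syncs, peak_syncs
    with sync_slots:  # blocks while MAX_CONCURRENT_SYNCS syncs are already running
        with stats_lock:
            running_syncs += 1
            peak_syncs = max(peak_syncs, running_syncs)
        try:
            with ThreadPoolExecutor(max_workers=MAX_PARALLEL_FILES_PER_SYNC) as pool:
                uploaded = list(pool.map(upload_file, files))
        finally:
            with stats_lock:
                running_syncs -= 1
    return len(uploaded)


# Launch 5 syncs of 4 files each at once; at most 2 run at any moment, the rest wait.
syncs = [[f"sync{i}_file{j}.csv" for j in range(4)] for i in range(5)]
with ThreadPoolExecutor(max_workers=len(syncs)) as launcher:
    results = list(launcher.map(run_sync, syncs))
print(results, peak_syncs)
```

Raising either bound uses more CPU, memory, and network on the host, and puts more load on the source, which is why the hardware matters.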

A pragmatic way to find out how far you can scale this is to test it! Increase the max parallelism, monitor for failures, and repeat.
e.g. if you have many small syncs, you can increase the number of syncs running in parallel.
e.g. if you have a few very large syncs, you can increase the number of files uploaded in parallel per sync to speed them up.

Note: each of these carries risks (e.g. loading more files in parallel per sync can overload, or even take down, the source system, depending on the system).
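The “increase, monitor, repeat” approach is essentially a linear ramp-up search. A sketch under stated assumptions: `run_trial` is a hypothetical callback that runs a representative batch of syncs at a given parallelism and returns the fraction that failed, and `simulated_trial` fakes a source that starts failing above 6 concurrent syncs.

```python
from typing import Callable


def find_max_parallelism(
    run_trial: Callable[[int], float],
    start: int = 1,
    step: int = 1,
    limit: int = 64,
    max_failure_rate: float = 0.0,
) -> int:
    """Raise parallelism step by step; return the last level with an acceptable failure rate (0 if none)."""
    best = 0
    level = start
    while level <= limit:
        failure_rate = run_trial(level)  # run a batch of syncs at this level, measure failures
        if failure_rate > max_failure_rate:
            break  # back off: this level overloaded the agent host or the source
        best = level
        level += step
    return best


def simulated_trial(parallelism: int) -> float:
    """Hypothetical stand-in: the source rejects work above 6 concurrent syncs."""
    capacity = 6
    if parallelism <= capacity:
        return 0.0
    return (parallelism - capacity) / parallelism


print(find_max_parallelism(simulated_trial))  # 6
```

In practice each trial means changing the agent configuration and watching real syncs, so small steps keep you close to the last known-good value when you back off.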

Agents can also be scaled horizontally, but manually.

You can allocate multiple agents to the same source(s), or even to different sources. Foundry takes care of distributing syncs among all the available agents mapped to the source you are launching a job for. So you can scale agents “horizontally” as well.

You can build a high-availability (HA) setup by mapping multiple agents to the same source, so that syncs on that source are executed on any available agent mapped to it.
e.g. you have 2 agents and one is down because your IT team needs to perform maintenance on the Linux box hosting it: your syncs will all be routed to the second agent, which is still online.
e.g. you have 2 agents mapped to 1 source and you launch 10 syncs on this source: most likely about 5 will land and be executed on one agent, and 5 on the other.

The distribution of jobs across agents is described in the docs:
https://www.palantir.com/docs/foundry/questions-answers/data-connection#how-do-multiple-agents-on-a-single-sync-handle-division-of-labor-is-there-any-parallelism-between-the-agents
Syncs are scheduled on available healthy agents either randomly or based on whichever has fewer syncs in queue [...].
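The scheduling rule quoted above can be sketched as follows. This implements only the “fewer syncs in queue” variant, with random tie-breaking; the `Agent` class and the functions are hypothetical. Unhealthy agents are simply skipped, which is what makes the HA setup work.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    healthy: bool = True
    queued: int = 0


def pick_agent(agents: list[Agent], rng: random.Random) -> Agent:
    """Choose the healthy agent with the fewest queued syncs, breaking ties at random."""
    healthy = [a for a in agents if a.healthy]
    if not healthy:
        raise RuntimeError("no healthy agent is mapped to this source")
    fewest = min(a.queued for a in healthy)
    return rng.choice([a for a in healthy if a.queued == fewest])


def schedule(agents: list[Agent], n_syncs: int, seed: int = 0) -> Counter:
    """Place n_syncs syncs; none complete in this sketch, so queues only grow."""
    rng = random.Random(seed)
    placed = Counter()
    for _ in range(n_syncs):
        agent = pick_agent(agents, rng)
        agent.queued += 1
        placed[agent.name] += 1
    return placed


# 2 healthy agents, 10 syncs: the least-queued rule splits them 5/5.
print(schedule([Agent("agent-a"), Agent("agent-b")], 10))
# agent-b is down for maintenance: every sync goes to agent-a.
print(schedule([Agent("agent-a"), Agent("agent-b", healthy=False)], 10))  # Counter({'agent-a': 10})
```

With purely random placement instead, the 10 syncs would only split roughly 5/5, which matches the “most likely” in the example above.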

Unlike Direct Connection, Agents can’t be scaled horizontally dynamically.

It is possible to fill up agent queues: if N jobs are already being processed, filling the queues and parallelism of the available agent(s), your new job has to wait until the queue is processed.
This won’t happen with Direct Connection.
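The queueing difference can be quantified with a small simulation. A sketch under simplifying assumptions: all jobs are submitted at once with known durations, Direct Connection is modeled as unlimited parallelism with zero container start-up time, and the agent as a fixed number of sync slots. The `finish_times` function is hypothetical.

```python
import heapq
from typing import Optional


def finish_times(durations: list[float], parallelism: Optional[int]) -> list[float]:
    """Finish time of each job (all submitted at t=0, dispatched in order).

    parallelism=None models on-demand containers; an int models fixed agent sync slots.
    """
    if parallelism is None:
        # On-demand model: every job gets its own container and starts at t=0
        return [float(d) for d in durations]
    slots = [0.0] * parallelism  # time at which each sync slot becomes free
    heapq.heapify(slots)
    finishes = []
    for duration in durations:
        start = heapq.heappop(slots)  # the job waits for the earliest free slot
        end = start + duration
        finishes.append(end)
        heapq.heappush(slots, end)
    return finishes


jobs = [10.0] * 6  # six 10-minute syncs submitted together
print(finish_times(jobs, parallelism=4))     # [10.0, 10.0, 10.0, 10.0, 20.0, 20.0]
print(finish_times(jobs, parallelism=None))  # [10.0, 10.0, 10.0, 10.0, 10.0, 10.0]
```

With an agent limited to 4 parallel syncs, the last 2 syncs finish at 20 minutes instead of 10, purely because they waited for a slot to free up.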