Breaking Cyclic Dependencies in Foundry Code Repository Transforms

Problem Description

I’m encountering persistent cyclic dependency errors in my Python transforms. My workflow involves:

  1. A materialization dataset (mat-rid) that contains study details

  2. A transform that needs to:

    • Process some basic study details

    • Merge them with data from the materialization

    • Output a dataset that eventually feeds back into the ontology that creates the materialization

This creates a cyclic dependency: transform → output → ontology → materialization → transform input
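The loop above can be made concrete with a small, framework-free sketch: build orchestration must reject any dependency graph containing a cycle, which a plain depth-first search over the edges described above detects. (The node names mirror the chain above; this is an illustration of the graph problem, not Foundry's actual implementation.)

```python
def has_cycle(edges):
    """Return True if the directed graph given as (src, dst) edges has a cycle."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
    visiting, done = set(), set()

    def dfs(node):
        if node in done:
            return False
        if node in visiting:  # back edge: we re-entered a node on the current path
            return True
        visiting.add(node)
        if any(dfs(nxt) for nxt in graph.get(node, [])):
            return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(dfs(node) for node in list(graph))

edges = [
    ("transform", "output"),
    ("output", "ontology"),
    ("ontology", "materialization"),
    ("materialization", "transform"),  # the edge that closes the loop
]
```

Dropping any one edge (for example, the materialization → transform input) makes the graph acyclic, which is exactly what the workarounds below attempt.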

What I’ve Tried

  1. Creating a snapshot transform in a separate Python file that copies the materialization to a new dataset, then using that snapshot in my main transform

  2. Splitting my transform into two separate transforms (one for processing, one for merging with materialization)

  3. Creating a separate pipeline that syncs the materialization to an intermediate dataset

Despite these approaches, I’m still encountering cyclic dependency errors.

My Transform Structure

My main transform looks like this:

from transforms.api import transform, Input, Output

@transform(
    output=Output("output-rid"),
    basic_study_details=Input("input-rid"),
    db=Input("db-rid"),
    basic_study_details_mat=Input("mat-rid"),
)
def process_datasets(ctx, basic_study_details, db, output, basic_study_details_mat):
    # Process data and merge with materialization
    # …

Original Problem:

I arrived at this approach of merging the old data (from the ontology materialization) with newly processed data because we want to move away from Palantir's @incremental approach. We release frequently (new features/updates), which causes snapshot builds instead of incremental ones, re-executing already-processed data, which we strictly want to avoid.
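The intended merge semantics can be sketched without any Foundry dependency: previously materialized rows are combined with freshly processed rows, and on a key collision the fresh row wins, so already-processed data is never recomputed. The key and field names here are hypothetical stand-ins, not the author's actual schema.

```python
def merge_rows(materialized, fresh, key="study_id"):
    """Union two row lists, preferring rows from `fresh` when keys collide."""
    merged = {row[key]: row for row in materialized}
    merged.update({row[key]: row for row in fresh})  # fresh overwrites old on collision
    return list(merged.values())

old = [{"study_id": 1, "status": "done"}, {"study_id": 2, "status": "done"}]
new = [{"study_id": 2, "status": "updated"}, {"study_id": 3, "status": "new"}]
result = merge_rows(old, new)
```

In a real transform this union would be done on Spark DataFrames, but the collision rule is the same.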

Questions

  1. What’s the most effective way to break this cyclic dependency while still ensuring my transform has access to the latest materialization data?

  2. Are there specific patterns or best practices for working with materializations in transforms that I should follow?

  3. Is there a way to configure Foundry to ignore certain dependencies for cycle detection?

Any insights or examples from similar situations would be greatly appreciated!

“What’s the most effective way to break this cyclic dependency while still ensuring my transform has access to the latest materialization data?”

Using action logs could work here.

The other question: does this have to be done in the Python transform, or could all the logic move north of the Ontology?

It would be great if it were possible to move this into the Ontology.

(The original problem: I arrived at this approach of merging the old data (from the ontology materialization) with newly processed data because we want to move away from Palantir's @incremental approach. We release frequently (new features/updates), which causes snapshot builds instead of incremental ones, re-executing already-processed data, which we strictly want to avoid.)

Hi @jmh,

  • Regarding your reply, I could not understand how action logs can help me here; could you please elaborate?

“What’s the most effective way to break this cyclic dependency while still ensuring my transform has access to the latest materialization data?”

Using action logs could work here.

  • Regarding your other question, I am open to any of the possible solutions that can solve my original problem statement.

The other question: does this have to be done in the Python transform, or could all the logic move north of the Ontology?

The original problem: I arrived at this approach of merging the old data (from the ontology materialization) with newly processed data because we want to move away from Palantir's @incremental approach. We release frequently (new features/updates), which causes snapshot builds instead of incremental ones, re-executing already-processed data, which we strictly want to avoid.

Hey!

If this flow truly has to exist, then my only recommendation is a mirrored or second object type (OT) in your Ontology plus Automations.

I have a similar flow that I did not entirely do in transforms or north of the ontology because the processing is too complex and needs to happen in Pipeline Builder. Specifically, I am incrementally processing rows of data, and each time I do, I get a ‘cursor’ to tell me where I left off, but that ‘cursor’ has to be parsed from some pretty complex code that I wanted to handle in Builder.

So, to get around the dependency problem: when the cursor information is parsed and reaches the Ontology, I trigger an automation (its condition: any change to that OT) that edits the value on another object type, which was feeding into my transforms in the first place.

Automate does have warnings about cycling that you can turn on, which will help you avoid creating an infinite loop, but this got me around the dependency.
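The cursor pattern described above can be sketched framework-free (the `seq`/`value` fields and the uppercasing are hypothetical stand-ins for the real parsing): each run handles only the rows past the stored cursor, then returns an advanced cursor for the automation to write back, so the next run resumes where this one stopped.

```python
def process_after_cursor(rows, cursor, batch_size=100):
    """Process up to batch_size rows with seq > cursor; return (results, new_cursor)."""
    pending = sorted((r for r in rows if r["seq"] > cursor), key=lambda r: r["seq"])
    batch = pending[:batch_size]
    results = [r["value"].upper() for r in batch]  # stand-in for the real processing
    new_cursor = batch[-1]["seq"] if batch else cursor  # unchanged if nothing pending
    return results, new_cursor
```

Because the cursor lives on a separate object type edited by the automation rather than on a dataset written by the transform itself, the build graph stays acyclic.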

Happy cycling!


The original problem: I arrived at this approach of merging the old data (from the ontology materialization) with newly processed data because we want to move away from Palantir's @incremental approach. We release frequently (new features/updates), which causes snapshot builds instead of incremental ones, re-executing already-processed data, which we strictly want to avoid.

Perhaps you've already decided to abandon the @incremental approach, but do you know that transaction limits can be set on incremental transforms? If your transform input forces you to snapshot frequently and risk OOM errors, this could help: see the Palantir docs under Python (Spark) • Incremental transforms • Limit batch size of incremental inputs.
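Conceptually, such a limit caps how much unread input a single incremental build consumes; the actual Foundry configuration is in the linked docs, but the effect is roughly this (framework-free sketch with hypothetical names):

```python
def chunk_transactions(pending, max_per_build=3):
    """Split pending input transactions into capped per-build batches.

    Each sublist represents the most a single incremental build would read,
    keeping memory bounded instead of ingesting the whole backlog at once.
    """
    return [pending[i:i + max_per_build] for i in range(0, len(pending), max_per_build)]
```

Seven pending transactions with a cap of three would then be consumed over three builds rather than one oversized snapshot-like read.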