Custom functions in foundry-rules

Hello Community team,

Does anyone know how to use custom functions in Foundry Rules? Where should we create the function to be used? What languages are supported, what are the limits, etc.?

This is a very interesting feature, but we haven’t found any documentation about it anywhere.


Hey there!

Are you referring to a custom transform for Foundry Rules? You can find some documentation about it here.

If you follow these steps, you can deploy your own transform (in Java) that interacts with the Foundry Rules backend.
Please note that once you are using a self-managed transform, there will be manual maintenance effort: it will not update automatically with any workflow changes you make within the Workflow Configuration after creating your custom transform.

Regarding limitations: The initial state of the self-managed transform should be as “capable” as the managed transform - there should be no special limitations.

What are you trying to achieve with the custom transform?
I have recently also posted about selective rule evaluation within Foundry Rules via a custom transform here.

Hope that helps!

Alex

Hello Alex,

Thanks for your feedback and the links. Indeed, we have a transform that runs and interacts with the Foundry Rules logic we have implemented. Here I’m referring to the custom function feature (not sure if it is the same thing as a custom transform, though), which we were able to enable as an optional feature via the settings documented here: Foundry Rules • Settings & customization • Enable optional features • Palantir (adding the doc link as I get an error when trying to upload a picture of the feature; the feature seems to be new and is not covered in the doc link above).

Our use case is simply an optimization problem.

In our rule logic we have a very large dataset to which we add a new column using the expression CAST(date_trunc('month', max("date_column")) AS DATE) to calculate the max of the date column. However, this expression triggers a window function in the query plan with a huge shuffle, which causes the build to fail.
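For reference, this is roughly the PySpark equivalent of what that expression seems to compile to (a sketch under that assumption; `add_max_date_window` is just an illustrative name):

```python
from pyspark.sql import DataFrame, Window, functions as F

def add_max_date_window(df: DataFrame) -> DataFrame:
    # A global (unpartitioned) window pulls every row into a single
    # partition, which is the huge shuffle we see in the query plan.
    return df.withColumn(
        "max_date",
        F.date_trunc("month", F.max("date_column").over(Window.partitionBy()))
        .cast("date"),
    )
```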

We then tried an aggregation to calculate the max_date, which first outputs a one-row dataset that we then join back to the same dataset to create the max_date column. The aggregation itself is ultra-fast, but the subsequent join once again triggers a huge shuffle in the query plan and the build fails.

While searching for options, we discovered the “custom function” feature in the settings. The idea for us is to create a function that takes the dataset as input, calculates the max via an aggregation function, and adds the column via a simple df.withColumn("max_date", F.lit(max_date)), thus avoiding all those shuffles. But we cannot find any documentation about it.
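In plain PySpark, what we have in mind for the custom function is roughly this sketch (the function name is just illustrative):

```python
from pyspark.sql import DataFrame, functions as F

def add_max_date_literal(df: DataFrame) -> DataFrame:
    # Aggregate once and pull the single value back to the driver...
    max_date = df.agg(
        F.date_trunc("month", F.max("date_column")).cast("date").alias("max_date")
    ).collect()[0]["max_date"]
    # ...then attach it as a plain literal column: no window function,
    # no join, and no shuffle beyond the aggregation itself.
    return df.withColumn("max_date", F.lit(max_date))
```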

Hope my description is clear enough for you to help us,
Best,

Ah, I see! I am not a maintainer of Foundry Rules, but I believe this feature is actually a bit older and never matured out of the experimental state, so its usage is not encouraged.

Is the aggregation dependent on other logic within your rule? If not, maybe you can aggregate the max dates and add them as a column to the dataset before running the Foundry Rules transform.

Thanks for your feedback, it’s duly noted.

This is indeed an option, but we didn’t want it initially: the dataset is incremental, and snapshotting such a large dataset each time to add the column isn’t great for us and could lead to OOM issues.

Do you know if it is possible to retrieve a value as a parameter in Foundry Rules?
The idea is to calculate the max in a separate dataset, retrieve it as a variable in Foundry Rules, and just add the value to a new column in an expression.
This would save us the snapshot step.

Do you know if it is possible to retrieve a value as a parameter in Foundry Rules?

As far as I know, there is unfortunately no first-class way to add it as a pure parameter. You could try, though, to introduce your aggregation as a dataset input to the Foundry Rules workflow; then you could join it to all rows.

You might run into the same OOM issue as before. But I could also imagine that, with the aggregation done upstream, Spark might optimize for a broadcast join and the transform could succeed. (In addition, you could still try tweaking the Spark profile.)

If you consider this option, please note:

  • There is no wide union in the Rule editor, so one way to add the parameter is to join it with your other dataset within the relevant rules.
  • You can do this by adding a join_key, which will be the same for all rows. In the screenshots, I add the join_key via an expression card; see the sketch below.
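A minimal PySpark sketch of the idea, assuming a plain transform outside the Rule editor (function and dataset names are illustrative, not Foundry Rules API):

```python
from pyspark.sql import DataFrame, functions as F

def join_max_date(df: DataFrame, max_date_df: DataFrame) -> DataFrame:
    # max_date_df is the one-row aggregation, introduced as a second
    # input to the workflow. A constant join_key on both sides lets us
    # join the single row to every row of the large dataset.
    params = max_date_df.withColumn("join_key", F.lit(1))
    return (
        df.withColumn("join_key", F.lit(1))
        # Broadcasting the one-row side nudges Spark towards a broadcast
        # join, so the large dataset should not need to be shuffled.
        .join(F.broadcast(params), on="join_key", how="left")
        .drop("join_key")
    )
```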