Creating link types between Pipeline Builder owned object types and object types not owned by Pipeline Builder

Bzisch · November 21, 2024, 2:16pm

Hi,

Is it possible to create a link type between an object type owned by a Pipeline Builder pipeline and an object type that the user has access to but that is not owned by the same pipeline (e.g. one created in another PB pipeline or through OMA)?

helenq · November 21, 2024, 2:25pm

In the OMA app, you can create link types between object types owned by a Pipeline Builder pipeline and object types owned by OMA, but these links won’t appear in your Pipeline Builder graph.

Currently, link types defined in Pipeline Builder can only use object types created within the same pipeline.

https://www.palantir.com/docs/foundry/pipeline-builder/outputs-add-ontology-output/#add-a-link-type-output

d550f1cdf33e13c57378 · November 21, 2024, 6:03pm

In our organization, we prefer Pipeline Builder over code-based pipelines despite its limitations (e.g., no IDE tooling for refactoring, difficulty copying and adapting logic, and transforms that are easier to write in code). The main reason is that it lets each developer update data generation, exposure as object types, and linking of object types in one place. This integration of the ontology with datasets in Pipeline Builder is crucial for us.

However, as our pipelines grow, we need to split them into smaller parts, and the limitation described above prevents us from doing this effectively. When we split a pipeline, we need to reference object types created by another pipeline, but we can’t link to them, which makes scaling difficult. This forces us to use Pipeline Builder for datasets and manage the ontology in Ontology Manager, which:

Requires two steps for any object change instead of one.
Defeats the purpose of the UI-based pipeline builder for our team.

Without this feature, we might as well use code/Databricks pipelines for datasets and rely on CI/CD and testing, which would be easier than working in the Pipeline Builder UI. As Palantir moves towards the Ontology branching ecosystem, this limitation hinders that vision.

Could you please confirm if this feature will be available? If not, what workaround can we use?

Joel · November 21, 2024, 8:21pm

@d550f1cdf33e13c57378,

Out of curiosity, why do you need to split your pipeline into multiple Pipeline Builder (PB) pipelines? One strategy that has helped me keep a PB pipeline clean and maintainable as it scales is to treat PB colors like .py files in a Code Repo, with the following properties:

Colors can share inputs (hidden in screenshot below).
The outputs from one color can be imported as inputs to other colors (see “Dataset 1” below).
Intermediate transforms are not shared between colors, just as DataFrames in one PySpark file are typically not accessible from other .py files.

Side note for Palantir: a redacted view option for sharing images like the one below would be nice.

Redacted example with four colors that are independent except for their inputs (i.e. like four .py files in a repo). As the legend shows, the shared inputs group is hidden.
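
For anyone who wants to check this convention against their own graph, here is a minimal sketch in plain Python. It does not use any Pipeline Builder or Foundry API; the `Node` class, the `kind` labels ("input", "transform", "output"), and the example node names other than “Dataset 1” are all hypothetical and only model the three properties listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    color: str  # hypothetical color-group label, e.g. "blue" or "shared"
    kind: str   # hypothetical: "input", "transform", or "output"


def cross_color_violations(nodes, edges):
    """Return edges that leave an intermediate transform and enter another color.

    Shared inputs and dataset outputs may feed any color; only intermediate
    transforms are expected to stay inside their own color.
    """
    by_name = {n.name: n for n in nodes}
    violations = []
    for src_name, dst_name in edges:
        src, dst = by_name[src_name], by_name[dst_name]
        if src.color != dst.color and src.kind == "transform":
            violations.append((src_name, dst_name))
    return violations


# Illustrative graph: two colors sharing one input, with "Dataset 1"
# (an output of the blue color) imported by the green color.
nodes = [
    Node("raw_orders", "shared", "input"),
    Node("clean_orders", "blue", "transform"),
    Node("Dataset 1", "blue", "output"),
    Node("enrich", "green", "transform"),
    Node("Dataset 2", "green", "output"),
]
edges = [
    ("raw_orders", "clean_orders"),
    ("raw_orders", "enrich"),
    ("clean_orders", "Dataset 1"),
    ("Dataset 1", "enrich"),
    ("enrich", "Dataset 2"),
]

# Adding a direct transform-to-transform edge across colors breaks the convention.
bad_edges = edges + [("clean_orders", "enrich")]
```

Calling `cross_color_violations(nodes, edges)` on the first graph finds nothing to report, while `bad_edges` flags the direct `clean_orders` to `enrich` edge, which is the kind of cross-color dependency the convention is meant to avoid.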