Hi,
I am trying to find the best way to set up a many-to-many link between two objects. According to https://www.palantir.com/docs/foundry/learning-data-dataeng-08/21/, I need a join table that contains the primary keys of both objects. What is the best way to build this backing dataset in PySpark? Should I use the backing datasets of the two objects as inputs and just join two DataFrames that each contain only the primary key? Some advice on this would be great!
Hi @sirnyls ,
The answer may depend on the origin of your data and the nature of the links. Commonly, you already have such a join table in your external system, which holds the truth for links between different object types. In this case, you should ingest it into Foundry and configure it to back the new many-to-many link type.
Your proposal will end up linking objects that are not related, which most likely isn’t what you are after. Two DataFrames that each contain only their own primary key share no column to join on, so the join pairs every object on one side with every object on the other. That is an artificial, over-linked view of the world, whereas as a user you usually want to capture only the relevant links from the real world. Sticking with the Palantir linked example, not every passenger should be connected to every flight alert. Only passengers flying on the affected flight should be connected to the alert.
Furthermore, Foundry allows you to manage (create, delete, etc.) these links directly within the Ontology using Actions. If you don’t yet have a backing dataset representing a join table, you can start with an empty dataset (it can originate from PySpark transforms or Pipeline Builder), which you will populate going forward within Foundry. In this case, the many-to-many link type within Foundry will act as the source of truth for the object links.
Thanks for clarifying. A follow-up question on this: I tried to use a restricted view as the linking table between two objects, but I could not select it. If both objects are based on restricted views and the linking table is just a normal view, which data is then accessible to the user?