Quickly find resources (datasets) from one specified project which have downstream outputs in another specified project

As the title suggests, I am interested in quickly finding the datasets which are “on the border” of two specified projects. Ideally, I would like to see the result in Data Lineage. I only want to find those resources which are in Project_A AND are used as inputs to produce outputs in Project_B.
I used to do this in Data Lineage: putting all of Project_A’s datasets onto the graph, adding 1 downstream step, “coloring” by project, and removing resources from all the projects except the 2 I’m interested in (Project_A + Project_B). But this is no longer feasible, because the first step is not possible: the graph truncates some datasets, as Project_A contains more than 500 datasets. So I may miss some connections between the projects if I continue to use this method.

Hi,

There are a couple of common approaches for this:

  • You’ve already identified the quickest one, which is to use data lineage manually. As you mention, this doesn’t scale well to a large number of datasets.
  • There is a feature called ‘Project Catalog’ which is enabled for some Foundry instances. It includes a view of downstream projects, but it also only works for the first 500 datasets in the project. You’ll have to talk to your Palantir representative about enabling this feature, as it is not generally available.
  • Most large Foundry instances will have a metadata dataset containing information on each build run, including its inputs and outputs. This is something custom that would be set up by your Palantir team; if you’re on a large instance (>1000 users, maybe more than a few years old), it’s likely that this exists.
  • You could try using the data lineage APIs directly. These are not ‘supported’ for external use, so you would have to reverse-engineer them, and they are liable to change (although data lineage is generally very stable at this point).
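
If a build-metadata dataset like the one in the third option does exist on your instance, the filtering itself is simple. Here is a rough sketch in plain Python. The column names (`input_dataset`, `input_project`, `output_dataset`, `output_project`) and the sample rows are made up for illustration; the real schema depends entirely on how your Palantir team set the dataset up, so treat this as an outline rather than something that will run against your data as-is.

```python
from collections import defaultdict


def border_datasets(edges, source_project, target_project):
    """Map each dataset in source_project to its direct outputs in target_project.

    `edges` is an iterable of dicts, one per (input, output) pair of a build.
    The keys used here are hypothetical; adapt them to the real schema.
    """
    found = defaultdict(set)
    for edge in edges:
        if (edge["input_project"] == source_project
                and edge["output_project"] == target_project):
            found[edge["input_dataset"]].add(edge["output_dataset"])
    # Sort the outputs so the result is stable across runs.
    return {dataset: sorted(outputs) for dataset, outputs in found.items()}


# Made-up sample rows standing in for the build-metadata dataset.
edges = [
    {"input_dataset": "a/raw", "input_project": "Project_A",
     "output_dataset": "a/clean", "output_project": "Project_A"},
    {"input_dataset": "a/clean", "input_project": "Project_A",
     "output_dataset": "b/report", "output_project": "Project_B"},
    {"input_dataset": "a/clean", "input_project": "Project_A",
     "output_dataset": "b/summary", "output_project": "Project_B"},
    {"input_dataset": "c/other", "input_project": "Project_C",
     "output_dataset": "b/report", "output_project": "Project_B"},
]

border = border_datasets(edges, "Project_A", "Project_B")
```

The keys of `border` are the “border” datasets in Project_A. Because this works on the full list of build edges rather than on the graph view, the 500-dataset truncation does not apply, and you could then open just those datasets in Data Lineage.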

Hopefully this helps!

Thank you for the truly valuable list of options.

However, I cannot accept it as the answer for now, because all the options are either not “quick” or do not scale well.