Project Lineage

I would like to plot Projects as nodes on the lineage graph. This would help a lot with platform maintenance when enforcing a 1:1 mapping between Projects and Marketplace Products. Waiting until a Product is packaged to see the Product dependencies in DevOps is too late.

Right now, I search for a Project in lineage and select ‘add all datasets in this folder’ to plot all of the Project’s dataset nodes, then collapse them into a group. I repeat this for each Project. It works, but it is time-consuming and won’t capture newly added datasets.

It would be really great if I could add a Project node that would capture all the relationships of the datasets within it. The overall goal here is to make sure that we maintain one-way project dependencies so that when we package and release Products, the order of operations for upgrades and deployments is clear.
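To make the idea concrete, here is a rough sketch of the roll-up I have in mind, written in plain Python. It does not use any Foundry or Lineage API. It assumes I already have a list of dataset-to-dataset edges and a dataset-to-Project mapping from somewhere, and all dataset and Project names below are made up for illustration.

```python
from graphlib import TopologicalSorter

def project_edges(dataset_edges, dataset_project):
    """Collapse dataset-level lineage edges into Project-level edges."""
    edges = set()
    for upstream, downstream in dataset_edges:
        src = dataset_project[upstream]
        dst = dataset_project[downstream]
        if src != dst:  # edges inside a single Project are not dependencies
            edges.add((src, dst))
    return edges

def install_order(edges):
    """Return Projects ordered so that every upstream Project comes first.

    Raises graphlib.CycleError if the Project dependencies are not one-way.
    """
    sorter = TopologicalSorter()
    for src, dst in edges:
        sorter.add(dst, src)  # dst depends on src
    return list(sorter.static_order())

# Hypothetical mapping of datasets to the Projects that contain them.
dataset_project = {
    "hr_raw": "Employee Datasource A",
    "payroll_raw": "Employee Datasource B",
    "employees_clean": "Employee Transforms",
    "timecards_raw": "Timecard Datasource",
    "timecards_enriched": "Timecard Transforms",
}

# Hypothetical dataset-level lineage: (upstream, downstream).
dataset_edges = [
    ("hr_raw", "employees_clean"),
    ("payroll_raw", "employees_clean"),
    ("employees_clean", "timecards_enriched"),
    ("timecards_raw", "timecards_enriched"),
]

order = install_order(project_edges(dataset_edges, dataset_project))
print(order)
```

The printed list is one valid install or upgrade order for the Products. Re-running it after new datasets are added would pick up new cross-Project edges automatically, which is the part I cannot get from manually grouped nodes today.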

Thank you!

Hi, thank you for your message and for sharing this request.

Unfortunately this isn’t something we’d be able to add to Data Lineage. We generally hold projects as a layer of metadata on top of the nodes (which you can color by) and not something that can define a varying number of nodes underneath.

However - could you describe your desired workflow a bit more? Is the idea that you’d go from “Project X” to creating a new DevOps product and/or updating an existing DevOps product?

I’m wondering if there is a more directly scoped DevOps feature request here for spinning up and updating products.

Thanks!

Yes, that is what we are trying to achieve. In an ideal, future state, each Project would become a Product. Our general architecture for Projects is Datasource → Transform → Ontology → Workflow. For single use-case initiatives, it is not difficult to know the packaging strategy or order of operations for Product installation and upgrades.

However, as we grow the ontology and set of use-cases, we have started to introduce inter-Project (and therefore inter-Product) dependencies - mainly at the Transform Project layer. As an example, we have two main Datasource Projects for Employees that feed into one Transform Project, which combines and refines that data, and then one Ontology Project for employee-related objects. Outputs from the Transform Project can be used in other Transform Projects (like ones for timecards, project management, tasks… anything that needs additional employee information). For instance, if outputs from Employee Transforms feed into Timecard Transforms (and each is its own Product), then Timecard Transform outputs should not feed back into Employee Transforms - or anything downstream of Employee Transforms.

This would cause problems when installing or upgrading packages because there would be a ‘circular’ dependency of inputs to the Products.
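The check I am describing could look something like the sketch below. It is plain Python with a depth-first search, not an existing Lineage or DevOps feature, and it assumes the Project-level edges have already been worked out. The Project names are the ones from my example above.

```python
def find_cycle(edges):
    """Return one circular chain of Projects as a list, or None if there is none.

    edges is an iterable of (upstream_project, downstream_project) pairs.
    """
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])

    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {node: UNSEEN for node in graph}
    path = []  # Projects on the current search path

    def visit(node):
        state[node] = IN_PROGRESS
        path.append(node)
        for nxt in graph[node]:
            if state[nxt] == IN_PROGRESS:
                # nxt is already on the path, so we have looped back to it
                return path[path.index(nxt):] + [nxt]
            if state[nxt] == UNSEEN:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        state[node] = DONE
        return None

    for node in graph:
        if state[node] == UNSEEN:
            found = visit(node)
            if found:
                return found
    return None

# One-way dependency: fine to package.
ok_edges = [("Employee Transforms", "Timecard Transforms")]

# Timecard outputs feeding back into Employee Transforms: circular.
bad_edges = ok_edges + [("Timecard Transforms", "Employee Transforms")]

print(find_cycle(ok_edges))
print(find_cycle(bad_edges))
```

Returning the chain of Projects, rather than just a yes/no, matters here because the fix is usually to move or duplicate one dataset, and I need to know which Projects are involved to find it.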

The reason it would be nice to have this in lineage, outside of DevOps, is that these issues could be caught before attempting to package or upgrade a chain of dependent Products. Even a quicker method to plot and group all nodes in a list of 10 or so Projects would be nice. Additionally, if I could ‘refresh’ that same lineage to check for new nodes, it would save a lot of time.

I have been thinking about this more. Another idea could be to add a “Marketplace Validation” toggle similar to those that have been added in Code Repositories and Workshop.

This would flag the project as being a target for packaging by Marketplace and perform some basic dependency checks.