It seems focused on controlling which datasets are ‘created’ with a marking within the Project, but what about a new marking being added to an existing dataset’s upstream?
Suppose I have
Project A:
Dataset A [Marking 1]
Project B: Constraints allow Marking 1
Dataset B; uses Dataset A as an input. [Marking 1]
Then Dataset A gets a new marking, ‘Marking 2’, applied. Marking 2 is not allowed in Project B, because it is not whitelisted in the constraints. What happens?
Options I can think of:
A. Dataset B still gets the new Marking 2, as it was already ‘created’
B. Dataset B still exists, but magically without Marking 2, as it’s not allowed in the Project B folder
C. Dataset B fails to build the next time it’s run, as Marking 2 is not allowed, so only the last transaction is still visible (even though I’m not sure if builds are required to propagate markings).
Unfortunately I’m not in a position to create markings and apply them to test.
The answer here is C, with one caveat: Dataset B will have both markings, since they propagate from upstream, but you will no longer be able to build Dataset B because of the project marking constraints. Hope this helps!
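To make that behaviour concrete, here is a toy model of the scenario from the question. It is only a sketch of what is described in this thread, not how Foundry is implemented: the `Dataset` class, `effective_markings` and `can_build` are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical stand-in for a dataset; not a Foundry class."""
    name: str
    own_markings: set = field(default_factory=set)  # markings applied directly
    inputs: list = field(default_factory=list)      # upstream Dataset objects

    def effective_markings(self) -> set:
        """Directly applied markings plus everything inherited from upstream."""
        result = set(self.own_markings)
        for upstream in self.inputs:
            result |= upstream.effective_markings()
        return result


def can_build(dataset: Dataset, allowed_in_project: set) -> bool:
    """Toy project constraint: every effective marking must be allowed."""
    return dataset.effective_markings() <= allowed_in_project


dataset_a = Dataset("Dataset A", own_markings={"Marking 1"})
dataset_b = Dataset("Dataset B", inputs=[dataset_a])
project_b_allowed = {"Marking 1"}

before = can_build(dataset_b, project_b_allowed)

# Someone applies Marking 2 upstream, in Project A.
dataset_a.own_markings.add("Marking 2")

after = can_build(dataset_b, project_b_allowed)
markings_on_b = dataset_b.effective_markings()
print(before, after, sorted(markings_on_b))
```

In this model the marking reaches Dataset B without any build happening, and only the build check changes its answer, which matches the "both markings, but no more builds" outcome above.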
If I don’t have permission to see Marking 2, does that mean the marking will still propagate to Dataset B, so that I lose sight of Dataset B in addition to the build failing?
Basically I want to protect my users from unexpected Markings that are accidentally propagated from upstream locking them out. I’d be happy with having a failed build and stale data, which would then appear in our monitoring, but I don’t want access to suddenly disappear without knowing why.
I’m not sure I fully understand the question, but as far as I know, markings propagate regardless of the permissions of the users, and you won’t be able to build a dataset if it has an input that you can’t see. So in your case, you won’t be able to build Dataset B at all if you can’t see Marking 2 [which is on A].
Basically I want to protect my users from unexpected Markings that are accidentally propagated from upstream locking them out.
I am not sure if this would be possible, by design, since what if you genuinely wanted to introduce a new marking and lock out users who don’t have permission anyways?
@Hashir I was wondering if there’s a way to temporarily prevent the propagation of a marking, e.g. linking it to builds or transactions, in case it’s unexpectedly propagating. At least in such a way that a dev/support team can still see enough of the pipeline to figure out what happened.
I guess the core issue is that upstream, near a data ingestion, someone applies a marking that suddenly removes access from many more people than expected, because they can’t see what the full impact downstream will be. When that happens though, we often can’t see enough to debug what changed to contact the right person.
I was wondering if there’s a way to temporarily prevent the propagation of a marking, e.g. linking it to builds or transactions, in case it’s unexpectedly propagating.
Markings are already linked to transactions, iirc, and when a dataset acquires a marking, the marking is applied to all of its transactions. This is, of course, not good news if you wanted users to still see stale data of a dataset they’ve been locked out of.
All of which suggests there isn’t an easy way to prevent markings propagating in the way you’d like [of course, you can manually add stop_propagating statements, but I don’t think you’d be able to leverage that here without defeating the point of having markings in the first place].
Which is all to say, it doesn’t sound like there is an easy way around this. The best I can think of is to use Data Lineage more rigorously at the Ingestion-level to try and see the downstream impact of applying markings [although I’d understand if those working at the ingestion level may not have the full context on the impact of changes]. Alternatively, you could ping the Foundry API on a regular basis [e.g. daily] to see which new markings have been acquired, and simply log these all somewhere, just so the dev team have a place to start when someone mysteriously loses access - but this is entering very bodge-y territory!
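A rough sketch of that logging idea, for what it’s worth. It deliberately leaves out the part that talks to Foundry, since I don’t want to guess at endpoint names: it assumes you already have some way of producing a snapshot shaped like `{dataset name: [marking names]}` each day. The snapshot shape and the function names `new_markings` and `log_line` are my own assumptions for illustration.

```python
import json
from datetime import datetime, timezone


def new_markings(previous: dict, current: dict) -> dict:
    """Return {dataset: sorted markings present now but absent from the previous snapshot}."""
    changes = {}
    for dataset, markings in current.items():
        added = set(markings) - set(previous.get(dataset, []))
        if added:
            changes[dataset] = sorted(added)
    return changes


def log_line(changes: dict, when: datetime) -> str:
    """Serialise one check as a single JSON line, suitable for appending to a log."""
    return json.dumps(
        {"checked_at": when.isoformat(), "new_markings": changes},
        sort_keys=True,
    )


# Hypothetical snapshots from two consecutive daily runs.
yesterday = {"Dataset A": ["Marking 1"], "Dataset B": ["Marking 1"]}
today = {
    "Dataset A": ["Marking 1", "Marking 2"],
    "Dataset B": ["Marking 1", "Marking 2"],
}

changes = new_markings(yesterday, today)
print(log_line(changes, datetime.now(timezone.utc)))
```

The catch is that whatever account takes the snapshot needs enough access to keep seeing the datasets after the new marking lands, otherwise the log goes blind at exactly the wrong moment.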
Apologies if that’s not as helpful as you were perhaps hoping! Might be a shout to keep this thread open in case someone has a very creative idea about how to meet your exact use-case.
Thanks for that suggestion at the end, and that might be the core of the issue. I don’t necessarily want to stop a marking from propagating - that defeats the purpose, and then I could use stop_propagating (even if only with Markings that I know already exist).
I guess what I really want is a way of seeing what changed, and getting an alert if a marking suddenly appears in an unexpected way. Ideal would be a Data Expectation that checks incoming markings, and fails the build/triggers a warning if an unexpected marking is present.
The issue at the moment is if a new marking appears, and locks you out - there’s no warning mechanism. Suddenly you and your customers can’t see anything, and you can’t figure out what changed, and who might be responsible/can help.
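To illustrate the kind of check I mean, here is a minimal sketch in plain Python. As far as I know nothing like this exists as a built-in Data Expectation, so everything here is hypothetical: `check_incoming_markings`, `UnexpectedMarkingError`, and the assumption that the markings on each input are available as a dict of `{input name: [marking names]}`.

```python
class UnexpectedMarkingError(Exception):
    """Raised when an input carries a marking outside the expected set."""


def check_incoming_markings(markings_by_input: dict, expected: set) -> None:
    """Fail loudly, naming which input brought which unexpected marking."""
    unexpected = {}
    for input_name, markings in markings_by_input.items():
        extra = set(markings) - expected
        if extra:
            unexpected[input_name] = sorted(extra)

    if unexpected:
        details = "; ".join(
            f"{name}: {', '.join(extra)}"
            for name, extra in sorted(unexpected.items())
        )
        raise UnexpectedMarkingError(f"Unexpected markings on inputs -> {details}")


# Passes silently: only the expected marking is present.
check_incoming_markings({"Dataset A": ["Marking 1"]}, {"Marking 1"})

# Fails with a message that points at the responsible input.
try:
    check_incoming_markings(
        {"Dataset A": ["Marking 1", "Marking 2"]}, {"Marking 1"}
    )
except UnexpectedMarkingError as exc:
    print(exc)
```

The useful part is the error message: it names the input and the marking, which is the starting point we are currently missing when access disappears.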