Data Health: “Exploded Array Uniqueness” Check (primary-key-like rule across all array elements)

6793eeec65fff51d8d93 · February 17, 2026, 9:47am

Hello,

This is a feature request for the Data Health checks available on a Dataset. I would like a new check that enforces global uniqueness of the values inside an array-typed column across all rows. Conceptually, it is similar to a primary key constraint, but applied to each element of the array after exploding it, and validated across the entire dataset/object type (not just within a single row).

Simple example

Row 1: tags = [A, B]
Row 2: tags = [C, D]
Row 3: tags = [B, E] ← violation, because “B” already appears in Row 1

Current problem
Today we can only guarantee uniqueness at the primary key level. We do not have a built-in Data Health rule that asserts “no element contained in this array column appears in any other row’s array.” To monitor this, we must introduce an intermediate build (explode + deduplicate + check), which adds latency, cost, and operational complexity just to validate uniqueness.
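
To make the requested rule concrete, here is a minimal sketch of the logic in plain Python. It is only an illustration of the semantics, not an existing Data Health API: the function name `find_cross_row_duplicates` and the `(row_id, row_dict)` input shape are hypothetical. I am assuming that a value repeated within the same row is not a violation, only a value shared between different rows.

```python
from collections import defaultdict


def find_cross_row_duplicates(rows, column):
    """Return {value: [row_ids]} for values that appear in more than one row.

    `rows` is an iterable of (row_id, row_dict) pairs (hypothetical shape).
    """
    seen = defaultdict(list)
    for row_id, row in rows:
        # Deduplicate within the row first, so a repeat inside a single
        # array is not reported; a missing or null array counts as empty.
        for value in set(row.get(column) or []):
            seen[value].append(row_id)
    # Keep only the values that were seen in at least two different rows.
    return {value: ids for value, ids in seen.items() if len(ids) > 1}


rows = [
    (1, {"tags": ["A", "B"]}),
    (2, {"tags": ["C", "D"]}),
    (3, {"tags": ["B", "E"]}),
]
violations = find_cross_row_duplicates(rows, "tags")
print(violations)  # {'B': [1, 3]}
```

The check would pass when the result is empty and fail otherwise, ideally reporting the offending values and rows as above.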

Thanks for reviewing this feature request.
Regards,