Mapping a numeric column to a defined range in Pipeline Builder

A situation I often encounter when building a metric and alerting pipeline is creating a calculated column that is semantically “meaningless” but captures some combined measure of “badness”, and then mapping the results into a 0 → 1 range. I find this makes the column easier to understand as a pure distribution and allows reusable logic for bucketing the range, assigning ratings, etc.

As a trivial example, consider a case I’m working on where I have one row per page of documentation, with columns for “page_views_prev_30_days” and “days_since_last_update”. To get a metric for which docs to prioritize updating, a naive approach is simply to multiply these columns together. The result has no semantic meaning, but its distribution highlights docs that have both a lot of views and a long gap since their last update.
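To make the example concrete, here is a rough Python sketch of that naive score. It is not Pipeline Builder syntax; the sample rows are made-up values and the output column name `staleness_score` is a hypothetical label chosen only for illustration.

```python
# Made-up sample rows: one row per documentation page.
pages = [
    {"page": "getting-started", "page_views_prev_30_days": 1200, "days_since_last_update": 400},
    {"page": "api-reference", "page_views_prev_30_days": 50, "days_since_last_update": 700},
    {"page": "release-notes", "page_views_prev_30_days": 900, "days_since_last_update": 10},
]

# Naive combined "badness" score: views multiplied by staleness.
# "staleness_score" is a hypothetical column name, not one from the real dataset.
for row in pages:
    row["staleness_score"] = (
        row["page_views_prev_30_days"] * row["days_since_last_update"]
    )

# Pages ordered from most to least urgent under this score.
ranked = sorted(pages, key=lambda r: r["staleness_score"], reverse=True)
```

Only the page that is both heavily viewed and long out of date ends up at the top; a page that is high on just one of the two columns scores much lower.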

Ideally there would be a one-shot board in Pipeline Builder where I could give it a numeric column, specify the min and max for the new range, and have it automatically apply min-max scaling to normalize the column.
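For reference, the behaviour I have in mind is standard min-max scaling: subtract the column minimum, divide by the column's range, then stretch and shift into the target range. Below is a minimal Python sketch of that logic, not a Pipeline Builder expression. The function name `min_max_scale` and its parameters are hypothetical, and the handling of nulls and of a constant column is my own assumption about what a sensible default would be.

```python
from typing import List, Optional, Sequence


def min_max_scale(
    values: Sequence[Optional[float]],
    new_min: float = 0.0,
    new_max: float = 1.0,
) -> List[Optional[float]]:
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max."""
    present = [v for v in values if v is not None]
    if not present:
        # Nothing to scale: keep every entry null.
        return [None for _ in values]

    lo, hi = min(present), max(present)
    span = hi - lo
    if span == 0:
        # Assumption: a constant column maps to new_min instead of dividing by zero.
        return [None if v is None else new_min for v in values]

    return [
        None if v is None else new_min + (v - lo) / span * (new_max - new_min)
        for v in values
    ]


scaled = min_max_scale([10, 20, 30])            # default 0 to 1 range
rated = min_max_scale([10, None, 30], 1, 5)     # nulls pass through unchanged
```

Note that this needs the column-wide min and max before any row can be scaled, so in a pipeline it is an aggregate step followed by a row-wise expression, which is why a single dedicated board would be convenient.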

It’s not too annoying to do with a reusable custom expression, but I was wondering if there’s something I’m missing here? Or maybe some smarter way to treat this class of transformation?
