Hey @Dee
Worked on roughly the same problem and quickly put together a snippet of it for public use, in case it’s useful: https://github.com/sibyl-advisory/foundry-helper
The autonomous flow I’m aiming for breaks into four steps:
1. Compass crawler (not in the repo but easy to build). Walk Compass to enumerate every Pipeline Builder pipeline in scope — by folder, owner, tag, whatever filter you need. This is the discovery layer that feeds steps 2–3 for a fleet-wide migration.
2. Pull the recipe for each pipeline. Hit GET {stack}/eddie/api/pipelines-v2/{pipeline_rid}/all-information — that returns the full pipeline snapshot (transforms, targets, clusters, schemas, expressions, etc.).
This is an internal API — the /eddie/ namespace isn’t part of the public Foundry API surface, so field shapes and endpoints can change between platform releases without notice. Fine for one-shot migrations and tooling, but not fore more - not public API for this AFAIK.
3. Convert the recipe to a transform. Send the JSON to an LLM with a system prompt that defines the Pipeline Builder vocabulary (drop, applyExpression, filter, join, abs, naturalRandom, …) and the target idiom (PySpark, Polars, whatever you want).
Steps 2 and 3 can run either locally on your laptop or as a hosted job in Foundry — you just need a user-scoped token with read access to the pipelines.
The repo above is a thin reference for steps 2+3 today: it calls /eddie/api/pipelines-v2/…/all-information, reshapes the response into the “builder-copy-v2” clipboard format the UI uses, then (optionally) sends that to Claude to emit a transforms-python module.
Hackier alternative (don’t think it’s necessary, but it exists): Pipeline Builder has a native “convert to Java” path in the UI. You could pipe Builder → generated Java → tests. I haven’t gone down that road since most teams want PySpark/Polars, not Java, but it’s there if the LLM step feels too brittle for you.
The fun part is testing. Generating the code is the easy half but proving the new transform produces the same output as the Builder pipeline on a representative input is where this gets real. No suggestion on this side - I’d frame it as a governance approach: notify all the owners of the Pipeline Builders that a PR / repo is ready, have them own the output and the migration (could do this identification via Audit Log v3)
Side note on the target: I’d push hard toward Polars over PySpark wherever the data fits comfortably on one node. A lot of Builder pipelines are sub-billion-row, and Polars-in-a-single-node transform skips the Spark overhead. Meaningfully faster and cheaper than PySpark for that workload class.
Happy to go deeper on any of this — drop a Q.