Converting Pipeline Builder Logic to PySpark/Polars Transforms - Feature or Workaround?

Hi Community,

I’m exploring ways to convert existing Pipeline Builder pipelines into Code Repository transforms (PySpark or Polars), either automatically or semi-automatically.

What I’m Looking For:

  • Is there an official API or tool that exposes Pipeline Builder pipeline metadata/logic in a structured way (e.g., JSON config)?
  • Has anyone built a custom Foundry app (e.g. Workshop + Python transforms) that automates this kind of conversion?

Why This Matters:
Many teams start with Pipeline Builder for quick prototyping but eventually need to move to Code Repositories for more complex transformation logic and access to Python libraries not available in Pipeline Builder.

Feature Request (if no solution exists):
If this isn’t currently possible, I think this would be a highly valuable feature - a native “Convert to Code Repository” option that:

  • Translates Pipeline Builder steps to PySpark or Polars transforms
  • Preserves dataset inputs/outputs and schema mappings
  • Flags any steps that can’t be auto-converted for manual review

Has anyone tackled this before? Would love to hear any thoughts, workarounds, or +1s on the feature request!

Thanks!

pipeline-builder code-repositories

Hey @Dee

Worked on roughly the same problem and quickly put together a snippet of it for public use, in case it’s useful: https://github.com/sibyl-advisory/foundry-helper

The autonomous flow I’m aiming for breaks into four steps:

1. Compass crawler (not in the repo but easy to build). Walk Compass to enumerate every Pipeline Builder pipeline in scope — by folder, owner, tag, whatever filter you need. This is the discovery layer that feeds steps 2–3 for a fleet-wide migration.

2. Pull the recipe for each pipeline. Hit GET {stack}/eddie/api/pipelines-v2/{pipeline_rid}/all-information — that returns the full pipeline snapshot (transforms, targets, clusters, schemas, expressions, etc.). :warning: This is an internal API — the /eddie/ namespace isn’t part of the public Foundry API surface, so field shapes and endpoints can change between platform releases without notice. Fine for one-shot migrations and tooling, but not fore more - not public API for this AFAIK.

3. Convert the recipe to a transform. Send the JSON to an LLM with a system prompt that defines the Pipeline Builder vocabulary (drop, applyExpression, filter, join, abs, naturalRandom, …) and the target idiom (PySpark, Polars, whatever you want).

Steps 2 and 3 can run either locally on your laptop or as a hosted job in Foundry — you just need a user-scoped token with read access to the pipelines.

The repo above is a thin reference for steps 2+3 today: it calls /eddie/api/pipelines-v2/…/all-information, reshapes the response into the “builder-copy-v2” clipboard format the UI uses, then (optionally) sends that to Claude to emit a transforms-python module.

Hackier alternative (don’t think it’s necessary, but it exists): Pipeline Builder has a native “convert to Java” path in the UI. You could pipe Builder → generated Java → tests. I haven’t gone down that road since most teams want PySpark/Polars, not Java, but it’s there if the LLM step feels too brittle for you.

The fun part is testing. Generating the code is the easy half but proving the new transform produces the same output as the Builder pipeline on a representative input is where this gets real. No suggestion on this side - I’d frame it as a governance approach: notify all the owners of the Pipeline Builders that a PR / repo is ready, have them own the output and the migration (could do this identification via Audit Log v3)

Side note on the target: I’d push hard toward Polars over PySpark wherever the data fits comfortably on one node. A lot of Builder pipelines are sub-billion-row, and Polars-in-a-single-node transform skips the Spark overhead. Meaningfully faster and cheaper than PySpark for that workload class.

Happy to go deeper on any of this — drop a Q.

To make it easier, provide the Pipeline Builder URL to AI FDE and ask it to convert it into a Python transform. The Pipeline Builder uses a JSON-like syntax called DSL, which AI FDE can recognize and convert into a Python transform.