Dataset branch copy to another branch

tcr · April 29, 2025, 2:56pm

Hi,

Is it possible to copy data from one branch of a dataset to another branch of the same dataset? If not, is it possible to rename a branch of the dataset?

I have a large dataset and don’t want to re-run all the transforms to regenerate the data.

Thanks

yix · April 29, 2025, 3:21pm

Why do you need to copy it? You can use fallback branches such that you’re pointing a downstream transformation to an input that’s on a branch.

Fallback branch docs

tcr · April 29, 2025, 7:48pm

Thank you. That could be an option as well, but I would like to have the data in the master branch of the dataset without building it. Also, what happens when the retention policy kicks in and the branch that generated the dataset has been archived?

yix · April 29, 2025, 7:56pm

It’s not possible to copy the data from one branch to another; you would need to rebuild with the new code on the master branch.

My guess is that with fallback branches, it will just fall back to the master branch eventually if the dataset doesn’t exist on a branch.

sandpiper · April 29, 2025, 9:24pm

There is a private API that you can use to change a branch to “point” at another transaction, which can be a transaction on a different branch. Because it’s a private API, it’s unfortunately a “use at your own risk” type of situation, but for your reference, the details are as follows:

URL: https://your-foundry-domain/foundry-catalog/api/catalog/datasets/{datasetRid}/branchesUpdate2/{branchId}
Method: POST
Body: The transaction RID that you want to point the branch at, as a JSON string

(In your case, since you want to change master to point at a transaction on another branch, you would specify “master” for the branch ID in the URL)
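For anyone who wants to script this, here is a rough Python sketch of the call described above. It is only an illustration under assumptions, not a confirmed client: the function names, the bearer-token `Authorization` header, the `Content-Type` header, and the placeholder RIDs are my own guesses and are not part of the details given here. Only the URL shape, the POST method, and the JSON-string body come from the post.

```python
import json
import urllib.request


def build_branch_update_request(domain, dataset_rid, branch_id, transaction_rid, token):
    """Build (but do not send) the POST that repoints a branch at a transaction.

    The helper name and the auth scheme are hypothetical; check how your
    Foundry instance expects requests to be authenticated.
    """
    url = (
        f"https://{domain}/foundry-catalog/api/catalog/datasets/"
        f"{dataset_rid}/branchesUpdate2/{branch_id}"
    )
    # The body is the transaction RID encoded as a JSON string, quotes included.
    body = json.dumps(transaction_rid).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",   # assumption
            "Authorization": f"Bearer {token}",   # assumption
        },
    )


def send(request):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Placeholder values only; substitute your own domain, RIDs, and token.
    request = build_branch_update_request(
        domain="your-foundry-domain",
        dataset_rid="<dataset RID>",
        branch_id="master",
        transaction_rid="<transaction RID on the other branch>",
        token="<token>",
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # send(request)  # uncomment only after testing on a small dataset
```

Building the request separately from sending it lets you inspect the exact URL and body before anything is changed on the dataset.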

I recommend verifying that this works as you expect with a smaller version of the same dataset that is consumed in the same way downstream. Off the top of my head, known issues include the fact that updating a branch in this way does not trigger Ontology syncs (if the dataset backs an object type) and can result in the branch not having the correct hive-style partitioning metadata.

The typical use-case for this API is when you want to test changes in incremental logic on a branch, but the upstream data hasn’t updated recently, so you update your dataset’s development branch to point to an earlier transaction on the master branch and then run a build.