Custom filename in export

Is there a way to write a custom filename for a dataset export? We’re pushing CSVs to a data lake, but they all have unreadable names like spark/part-00000-e4f4c451-14af-4ca1-b6ec-6f46badd8b51-c000.csv

You can use this solution to rename files in the transform preceding the export.

https://community.palantir.com/t/rename-files-of-my-output-in-place-in-transforms/1319/2
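If the preceding transform can hand its rows to plain Python, one way to choose the name at write time is to write the CSV yourself through a file-opening function. This is a rough sketch, not necessarily what the linked answer does. `write_named_csv` and its `open_fn` parameter are hypothetical names. In a Foundry transform you would pass something like `output.filesystem().open`, while the built-in `open` works for trying it locally.

```python
import csv
from typing import IO, Callable, Iterable, Sequence


def write_named_csv(
    open_fn: Callable[[str, str], IO[str]],
    filename: str,
    header: Sequence[str],
    rows: Iterable[Sequence[object]],
) -> None:
    """Write `header` plus `rows` as a single CSV file named `filename`.

    Hypothetical helper: `open_fn(path, mode)` can be the built-in `open`
    or any filesystem object's `open` method with the same call shape.
    """
    with open_fn(filename, "w") as f:
        # csv.writer defaults to "\r\n" line endings; use a plain "\n" instead.
        writer = csv.writer(f, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)
```

For example, `write_named_csv(open, "file1.csv", ["id", "name"], [[1, "a"]])` writes the lines `id,name` and `1,a` to `file1.csv` in the current directory.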


There’s no way to change the filename in the export itself; you need to change the actual file names in Foundry. What worked for me was:

  1. In my Pipeline Builder transform, set it to write outputs as CSV. Under the hood, this may create multiple CSV files, even for a single dataset.
  2. Use the transform below to concatenate those CSVs and output to a new dataset, which contains a single CSV with the filename I wanted. Note that the code below applies to multiple datasets, each of which has a different resulting filename.
```python
from transforms.api import Input, Output, lightweight, transform


def create_transform(dataset_rid, filename):
    @lightweight
    @transform(
        dataset_input=Input(dataset_rid),
        # Each generated transform needs its own output dataset (two transforms
        # can't write to the same one), so derive a distinct placeholder path.
        output=Output(f"/path/to/output/{filename.rsplit('.', 1)[0]}"),
    )
    def rename_files(dataset_input, output):
        header_written = False
        with output.filesystem().open(filename, "w") as f_out:
            # Sort the part files so rows land in a stable order between builds.
            part_files = sorted(dataset_input.filesystem().ls(), key=lambda f: f.path)
            for file in part_files:
                with dataset_input.filesystem().open(file.path) as f_in:
                    lines = f_in.readlines()
                if not lines:
                    continue  # Skip empty part files.
                # Keep the header row from the first non-empty file only.
                f_out.write("".join(lines[1:] if header_written else lines))
                header_written = True

    return rename_files


def create_transforms(dataset_to_filename):
    # One transform per (input dataset, target filename) pair.
    return [
        create_transform(dataset_rid, filename)
        for dataset_rid, filename in dataset_to_filename.items()
    ]


dataset_to_filename = {
    "dataset_rid_1": "file1.csv",
    "dataset_rid_2": "file2.csv",
    "dataset_rid_3": "file3.csv",
    "dataset_rid_4": "file4.csv",
}

TRANSFORMS = create_transforms(dataset_to_filename)
```
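Because the merge logic doesn't depend on Foundry, it can be sanity-checked locally before it goes into a transform. This sketch applies the same rule to ordinary files on disk: keep the header from the first non-empty part and drop it from the rest. `concat_csv_parts` is a hypothetical name, and sorting the part paths is an added assumption that only makes the row order deterministic.

```python
from typing import Iterable


def concat_csv_parts(part_paths: Iterable[str], out_path: str) -> None:
    """Merge CSV part files that share a header into one file at `out_path`.

    The header is kept from the first non-empty part and dropped from the
    rest. Parts are read in sorted path order.
    """
    header_written = False
    # newline="" keeps each part's original line endings untouched.
    with open(out_path, "w", newline="") as f_out:
        for path in sorted(part_paths):
            with open(path, newline="") as f_in:
                lines = f_in.readlines()
            if not lines:
                continue  # Empty part file: nothing to copy.
            f_out.write("".join(lines[1:] if header_written else lines))
            header_written = True
```

For example, `concat_csv_parts(glob.glob("spark/part-*.csv"), "file1.csv")` would merge Spark-style part files from a local `spark/` directory into `file1.csv`.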

@nickornk’s answer is the right way to change the filename “in place” (that is, without a separate transform), but because my datasets were small, a separate concatenation transform was easier for me.

Legacy export tasks supported custom filenames, but that feature was not carried over to Exports 😩