Is there a way to write a custom filename for a dataset export? We’re pushing CSVs to a data lake but they all have unreadable names like spark/part-00000-e4f4c451-14af-4ca1-b6ec-6f46badd8b51-c000.csv
You can use this solution to rename files in the transform preceding the export.
https://community.palantir.com/t/rename-files-of-my-output-in-place-in-transforms/1319/2
There’s no way to change the filename in an export; you need to change the actual file names in Foundry. What worked for me was:
- In my Pipeline Builder transform, set the output to be written as CSV. Under the hood this may create multiple CSV files, even for a single dataset.
- Use the transform below to concatenate those CSVs into a new dataset that contains a single CSV with the filename I wanted. Note that the code below handles multiple datasets, each with a different resulting filename.
from transforms.api import Input, Output, lightweight, transform


def create_transforms(dataset_to_filename):
    results = []
    for dataset_rid, filename in dataset_to_filename.items():

        # Factory function so each transform binds its own dataset_rid and filename.
        def create_transform(dataset_rid, filename):
            @lightweight
            @transform(
                dataset_input=Input(dataset_rid),
                # Placeholder path: each generated transform needs its own output dataset.
                output=Output(f"/path/to/output/dataset"),
            )
            def rename_files(dataset_input, output):
                with output.filesystem().open(filename, "w") as f_out:
                    first_file = True
                    for file in dataset_input.filesystem().ls():
                        with dataset_input.filesystem().open(file.path) as f_in:
                            if first_file:
                                # Copy the first file whole, header included.
                                f_out.write(f_in.read())
                                first_file = False
                            else:
                                # Skip the header row of every subsequent file.
                                lines = f_in.readlines()
                                if lines:
                                    f_out.write("".join(lines[1:]))

            return rename_files

        results.append(create_transform(dataset_rid, filename))
    return results


# Maps each input dataset RID to the filename its single output CSV should have.
dataset_to_filename = {
    "dataset_rid_1": "file1.csv",
    "dataset_rid_2": "file2.csv",
    "dataset_rid_3": "file3.csv",
    "dataset_rid_4": "file4.csv",
}

TRANSFORMS = create_transforms(dataset_to_filename)
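If you want to check the header-skipping logic outside Foundry first, here is a minimal sketch that does the same concatenation on local files using only the standard library. The function name `concat_csv_parts` and the demo file names are made up for illustration, and it assumes every part file starts with the same single-line header.

```python
import tempfile
from pathlib import Path


def concat_csv_parts(part_paths, out_path):
    """Merge CSV part files into one file, keeping only the first header row."""
    wrote_header = False
    with open(out_path, "w", newline="") as f_out:
        for part in sorted(part_paths):
            with open(part, newline="") as f_in:
                lines = f_in.readlines()
            if not lines:
                continue  # skip empty part files
            # Keep the header from the first non-empty part only.
            body = lines if not wrote_header else lines[1:]
            wrote_header = True
            for line in body:
                # Guard against a part file missing its trailing newline.
                f_out.write(line if line.endswith("\n") else line + "\n")
    return out_path


# Demo with two hypothetical Spark-style part files in a temp directory.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "part-00000.csv").write_text("id,name\n1,a\n2,b\n")
(demo_dir / "part-00001.csv").write_text("id,name\n3,c")
merged = concat_csv_parts(demo_dir.glob("part-*.csv"), demo_dir / "file1.csv")
print(merged.read_text())
```

Unlike the transform above, this version also skips empty part files when deciding which header to keep, and it adds a newline when a part file ends without one, so rows from adjacent parts cannot run together. It is line-based, so it is only suitable when the header row itself contains no embedded newlines.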
@nickornk’s answer is the right way to change the filename “in place” (meaning without a separate transform), but because my datasets were small, this approach was easier.
Legacy export tasks supported this, but the feature was not carried over to Exports.