We have a requirement to zip multiple Notepad files and export them to another source.
I have a way of doing this for a dataset input in a code repository, as shown below:
import zipfile

from transforms.api import transform, Input, Output


@transform(
    my_output=Output("output_path"),
    source_df=Input("dataset_input")
)
def compute(ctx, my_output, source_df):
    # List every file in the input dataset
    files = source_df.filesystem().files().collect()
    # Write a single zip archive into the output dataset
    with my_output.filesystem().open("foundry_code_examples.zip", 'wb') as write_zip:
        with zipfile.ZipFile(write_zip.name, 'w') as zip_file:
            for file_row in files:
                # Add each input file under its original dataset path
                with source_df.filesystem().open(file_row["path"], 'rb') as markdown_file:
                    zip_file.write(markdown_file.name, arcname=file_row["path"])
    return
However, this approach doesn’t work when we pass a Notepad RID as the input; I get this error:
Code references a non-dataset resource
A non-dataset resource is referenced as the input or output of a transform. With a few specific exceptions, it is not possible to reference non-dataset resources (folders, etc.) as transform inputs or outputs.
The referenced non-dataset resources are as below. You can identify which files are referencing them using the code repository search functionality.
ri.notepad.main.notepad-template.*
I would really appreciate any response on whether it’s possible to do this in Palantir. Also, feel free to suggest alternative approaches (if any) to zip multiple Notepad files and export them to another source from Palantir.
with my_output.filesystem().open("export.zip", 'wb') as write_zip:
    with zipfile.ZipFile(write_zip.name, 'w') as zip_file:
        for file_row in files:
            with source_df.filesystem().open(file_row["path"], 'rb') as input_file:
                zip_file.write(input_file.name, arcname=file_row["path"])
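To check the zipping logic outside Foundry, the same loop can be sketched with only the Python standard library. This is a minimal sketch, not the Foundry API: `zip_directory` is a hypothetical helper, and a local directory is assumed as a stand-in for the dataset filesystem.

```python
import zipfile
from pathlib import Path


def zip_directory(source_dir, zip_path):
    """Zip every file under source_dir, keeping paths relative to it.

    Hypothetical helper for local testing only. Keep zip_path outside
    source_dir so an earlier archive is not zipped into the new one.
    """
    source_dir = Path(source_dir)
    # Collect the files first, before the archive itself is created
    files = sorted(p for p in source_dir.rglob("*") if p.is_file())
    written = []
    with zipfile.ZipFile(zip_path, "w") as zip_file:
        for file_path in files:
            # arcname plays the role of file_row["path"] in the transform
            arcname = file_path.relative_to(source_dir).as_posix()
            zip_file.write(file_path, arcname=arcname)
            written.append(arcname)
    return written
```

Calling `zip_directory("some_folder", "export.zip")` returns the archive member names in the order they were written, which makes it easy to compare against the expected file list.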
Export Configuration
Only modified files since last export will be processed by default
Files in destination will be overwritten unless configured otherwise
Create dedicated sub-folders for exported data
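The "only modified files" default can be illustrated with a small standard-library sketch. It shows the idea only and assumes file modification times on a local directory; how Foundry itself tracks what changed since the last export is not described in this thread. `files_modified_since` and `last_export_ts` are hypothetical names.

```python
from pathlib import Path


def files_modified_since(source_dir, last_export_ts):
    """Return files under source_dir modified after last_export_ts.

    last_export_ts is a POSIX timestamp (seconds since the epoch).
    Hypothetical helper: it compares local mtimes, nothing Foundry-specific.
    """
    source_dir = Path(source_dir)
    return sorted(
        p for p in source_dir.rglob("*")
        if p.is_file() and p.stat().st_mtime > last_export_ts
    )
```

Passing `0` as `last_export_ts` selects every file, which corresponds to a first, full export.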
Best Practices
Filter and prepare data before export
Ensure data meets export control rules
Filter to necessary data only
Optimize file sizes
Configure Output Format
Coalesce partitions if needed
Set appropriate compression levels
Choose suitable file format for destination
Monitor and Validate
Check export logs
Verify file integrity
Monitor storage usage
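For zip archives specifically, the compression-level and file-integrity points above can be sketched with Python's `zipfile` module. This is a minimal sketch under stated assumptions: `write_compressed_zip` and `verify_zip` are hypothetical helper names, and the members are passed as an in-memory mapping rather than read from a dataset.

```python
import zipfile


def write_compressed_zip(zip_path, members, level=6):
    """Write {arcname: bytes} members into a DEFLATE-compressed zip.

    level is the zlib compression level (0-9); higher is smaller but slower.
    """
    with zipfile.ZipFile(
        zip_path, "w", compression=zipfile.ZIP_DEFLATED, compresslevel=level
    ) as zf:
        for arcname, data in members.items():
            zf.writestr(arcname, data)


def verify_zip(zip_path):
    """Return True if every member of the archive passes its CRC check."""
    with zipfile.ZipFile(zip_path) as zf:
        # testzip() returns the name of the first corrupt member, or None
        return zf.testzip() is None
```

Running `verify_zip` on the archive right after writing it, and again after it reaches the destination, is one simple way to confirm the export was not corrupted in transit.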
If you need specific guidance on particular file types or export destinations, please provide more details.
Thanks for your response. As you can see, I am already using your batch processing approach.
However, when we take a Notepad RID as an input, it doesn’t work; we get the error mentioned above.
We do have such APIs for datasets, mediasets and objects. However, I couldn’t find any API that gets the content from a notepad. Please help me with the endpoint, if you know of any. Thanks!!