Zip multiple notepad files within Palantir

We have a requirement to zip multiple Notepad files and export them to another source.
I have a way of doing this for a dataset input in a code repository, as shown below:

```python
from transforms.api import transform, Input, Output
import zipfile

@transform(
    my_output=Output("output_path"),
    source_df=Input("dataset_input"),
)
def compute(ctx, my_output, source_df):
    # List every file in the input dataset
    files = source_df.filesystem().files().collect()

    # Create the zip in the output dataset, then add each input file to it
    with my_output.filesystem().open("foundry_code_examples.zip", 'wb') as write_zip:
        with zipfile.ZipFile(write_zip.name, 'w') as zip_file:
            for file_row in files:
                with source_df.filesystem().open(file_row["path"], 'rb') as markdown_file:
                    zip_file.write(markdown_file.name, arcname=file_row["path"])
```

However, this approach doesn’t work when we pass a Notepad RID as the input; I get this error:

Code references a non-dataset resource

A non-dataset resource is referenced as the input or output of a transform. With a few specific exceptions, it is not possible to reference non-dataset resources (folders, etc.) as transform inputs or outputs.
The referenced non-dataset resources are as below. You can identify which files are referencing them using the code repository search functionality.
ri.notepad.main.notepad-template.*

I would really appreciate any response on whether this is possible in Palantir. Also, feel free to suggest alternative approaches (if any) to zip multiple Notepad files and export them to another source from Palantir.


AIP ASSIST:
Automating ZIP and Export Processes in Foundry
Here’s how to automate zipping and exporting files in Foundry:

  1. Using Pipeline Builder

Basic Setup

  1. Create a new pipeline
  2. Add source files/datasets
  3. Configure compression transforms
  4. Set up export destination

Automation Steps:

```python
from transforms.api import transform, Input, Output

# Example transform configuration
@transform(
    output=Output("/path/to/output"),
    source=Input("/path/to/source"),
)
def compress_and_export(output, source):
    # Compression logic here
    output.write_dataframe(source.dataframe(), output_format="csv")
```

  2. Batch Processing

```python
from transforms.api import transform, Input, Output
import zipfile

@transform(
    my_output=Output(""),
    source_df=Input(""),
)
def batch_zip_export(ctx, my_output, source_df):
    files = source_df.filesystem().files().collect()

    with my_output.filesystem().open("export.zip", 'wb') as write_zip:
        with zipfile.ZipFile(write_zip.name, 'w') as zip_file:
            for file_row in files:
                with source_df.filesystem().open(file_row["path"], 'rb') as input_file:
                    zip_file.write(input_file.name, arcname=file_row["path"])
```
  3. Export Configuration
  • Only files modified since the last export are processed by default (see the incremental sketch below)
  • Files in the destination will be overwritten unless configured otherwise
  • Create dedicated sub-folders for exported data
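A minimal sketch of that "only modified files" behaviour, assuming an incremental transform with the default 'added' read mode; the dataset paths are placeholders:

```python
from transforms.api import transform, incremental, Input, Output

@incremental()
@transform(
    my_output=Output("/path/to/export_staging"),  # placeholder path
    source_df=Input("/path/to/source"),           # placeholder path
)
def export_new_files_only(my_output, source_df):
    # In incremental mode, files() lists only the files added since the
    # last successful build, not the full dataset.
    for file_row in source_df.filesystem().files().collect():
        with source_df.filesystem().open(file_row["path"], 'rb') as src:
            with my_output.filesystem().open(file_row["path"], 'wb') as dst:
                dst.write(src.read())
```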

Best Practices

  1. Filter and prepare data before export
  • Ensure data meets export control rules
  • Filter to necessary data only
  • Optimize file sizes
  2. Configure Output Format (see the sketch after this list)
  • Coalesce partitions if needed
  • Set appropriate compression levels
  • Choose suitable file format for destination
  3. Monitor and Validate
  • Check export logs
  • Verify file integrity
  • Monitor storage usage
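To illustrate the "Configure Output Format" point, here is a minimal sketch that coalesces to a single partition and writes compressed CSV; the paths and the export_ok filter column are assumptions:

```python
from transforms.api import transform, Input, Output

@transform(
    out=Output("/path/to/export_ready"),  # placeholder path
    src=Input("/path/to/source"),         # placeholder path
)
def prepare_for_export(out, src):
    # Keep only rows cleared for export (hypothetical flag column)
    df = src.dataframe().filter("export_ok = true")
    # Coalesce to one partition so the destination receives a single file,
    # and gzip-compress the CSV output
    out.write_dataframe(
        df.coalesce(1),
        output_format="csv",
        options={"compression": "gzip"},
    )
```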

If you need specific guidance on particular file types or export destinations, please provide more details.

Hello Maverick,

Thanks for your response. As you can see, I am already using your batch processing approach.
However, when we take a Notepad RID as an input, it doesn’t work, as we get the error mentioned above.

I want to know an approach that works for Notepad files.

Transforms and Pipeline Builder operate on datasets or media sets as inputs, not on Notepad documents.

Did you check whether there is an API endpoint you could use to get the string content of a Notepad?
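For illustration only, this is the general shape such a call would take from an external client; the Notepad endpoint path here is purely hypothetical, as no such public API is confirmed in this thread:

```python
import requests

FOUNDRY_HOST = "https://your-stack.palantirfoundry.com"  # assumption
TOKEN = "<bearer-token>"

def get_notepad_content(notepad_rid: str) -> str:
    # Hypothetical endpoint -- shown only to illustrate the call pattern
    resp = requests.get(
        f"{FOUNDRY_HOST}/api/hypothetical/notepads/{notepad_rid}/content",
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.text
```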


We do have such APIs for datasets, media sets, and objects. However, I couldn’t find any API that gets the content of a Notepad. Please help me with the endpoint if you know of one. Thanks!!
