How to Unzip Datasets Without a Schema

Hi everyone,
I have a dataset without a schema and I’m trying to unzip it for use in Foundry. Does anyone have suggestions or methods for extracting and properly organizing the data without a schema?

Thanks in advance!
Jesica

Hello Jesica,

Have you been able to try out the example in this part of the documentation: https://www.palantir.com/docs/foundry/code-examples/raw-file-parsing-transforms#unzipping-and-extracting-files-in-dataset ?

Is there anything there that does not match your use case?

A dataset is a collection of files. A schema is what gives a dataset structure, such as rows and columns. You can, however, work with datasets that have no schema.

Add the zipped files to a dataset in Foundry (this dataset will have no schema), then use a transform to unzip the files and save the results as a new dataset. The new dataset can also have no schema.
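
To make the unzip step concrete, here is a minimal sketch of the extraction logic on its own, using only the Python standard library and an in-memory archive. It does not touch any Foundry API; the helper name extract_members and the sample file names are made up for illustration. Inside a transform you would read the archive from the input dataset's filesystem and write each member to the output dataset's filesystem instead.

```python
import io
import zipfile


def extract_members(zip_bytes):
    """Return {member name: contents} for every regular file in a zip archive.

    Hypothetical helper for illustration; not part of any Foundry API.
    """
    extracted = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            # Directory entries carry no data, so skip them.
            if info.is_dir():
                continue
            extracted[info.filename] = archive.read(info)
    return extracted


# Build a small archive in memory to stand in for an uploaded zip file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("data/a.csv", "id,name\n1,x\n")
    archive.writestr("notes.txt", "hello")

members = extract_members(buffer.getvalue())
print(sorted(members))
```

The extracted files are just raw bytes keyed by path, which is why neither the input nor the output needs a schema.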

Hi @jesicaamigo,

To keep it short and sweet, create a code repository and try the following code:

Imports

import shutil
import tempfile
import zipfile

from transforms.api import Input, Output, transform

Transform

@transform(
    input_ds_zip=Input("rid"),
    output_file=Output("rid"),
)
def compute(ctx, input_ds_zip, output_file):
    def process_file(file_status):
        # Copy the zip to a local temporary file so zipfile can seek within it.
        with input_ds_zip.filesystem().open(file_status.path, "rb") as f:
            with tempfile.NamedTemporaryFile() as tmp:
                shutil.copyfileobj(f, tmp)
                tmp.flush()
                with zipfile.ZipFile(tmp) as archive:
                    for filename in archive.namelist():
                        # Skip directory entries; only regular files are written out.
                        if filename.endswith("/"):
                            continue
                        with archive.open(filename) as extracted_file:
                            with output_file.filesystem().open(filename, "wb") as out_f:
                                shutil.copyfileobj(extracted_file, out_f)

    files_df = input_ds_zip.filesystem().files(glob="**/*.zip")
    files_df.rdd.foreach(process_file)
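
One thing to watch for: the transform writes each member under its own name, so two archives that both contain, say, data.csv would target the same output path. A possible workaround, sketched here as an assumption rather than anything from the documentation, is to prefix each member with the name of the archive it came from. The helper output_path and the example paths are hypothetical.

```python
import posixpath


def output_path(archive_path, member_name):
    """Prefix a zip member with its archive's base name (without extension).

    Hypothetical helper for illustration; not part of any Foundry API.
    """
    stem = posixpath.splitext(posixpath.basename(archive_path))[0]
    return posixpath.join(stem, member_name)


path_one = output_path("raw/2024/batch_1.zip", "data.csv")
path_two = output_path("raw/2024/batch_2.zip", "data.csv")
print(path_one)
print(path_two)
```

In the transform, this would mean opening the output file with output_path(file_status.path, filename) instead of filename. Note that this sketch does not separate archives that share a base name but live in different folders.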

Please let me know if this is what you’re looking for.

Hi @arukavina,

I tried the code you shared, and it worked! I was able to unzip the files successfully.

Thank you so much for your help!
