Hi everyone,
I have a dataset without a schema and I’m trying to unzip it for use in Foundry. Does anyone have suggestions or methods for extracting and properly organizing the data without a schema?
Thanks in advance!
Jesica
Hello Jesica,
Have you been able to try out the example in this part of the documentation: https://www.palantir.com/docs/foundry/code-examples/raw-file-parsing-transforms#unzipping-and-extracting-files-in-dataset ?
Is there anything there that does not match your use case?
A dataset is a collection of files. A schema is what gives a dataset structure, such as rows and columns, but you can also work with datasets that have no schema.
You’ll need to add the zipped files to a dataset in Foundry (the dataset will have no schema), then use a transform to unzip the files and save the result as a new dataset. The new dataset can also have no schema.
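To make the "collection of files" idea concrete, here is a minimal sketch of the unzip step on its own, using only the Python standard library and no Foundry APIs. The function names `unzip_to_mapping` and `make_demo_zip` are made up for illustration, and the demo archive is built in memory, so this only shows the extraction logic, not how a transform reads or writes dataset files.

```python
import io
import zipfile
from typing import Dict


def unzip_to_mapping(zip_bytes: bytes) -> Dict[str, bytes]:
    """Return {member name: file contents} for every file in the archive."""
    extracted = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no data
            extracted[info.filename] = archive.read(info)
    return extracted


def make_demo_zip() -> bytes:
    """Build a small in-memory archive so the sketch can run anywhere."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("data/a.csv", "id,name\n1,foo\n")
        archive.writestr("notes.txt", "hello")
    return buffer.getvalue()


if __name__ == "__main__":
    files = unzip_to_mapping(make_demo_zip())
    for name, content in files.items():
        print(name, len(content), "bytes")
```

Each extracted entry is just a name plus raw bytes, which is all a schemaless dataset needs; a schema only comes into play later if you parse those files into rows and columns.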
Hi @jesicaamigo,
To keep it short and sweet: create a code repository and try this code:
import shutil
import tempfile
import zipfile

from transforms.api import Input, Output, transform


@transform(
    input_ds_zip=Input("rid"),
    output_file=Output("rid"),
)
def compute(ctx, input_ds_zip, output_file):
    def process_file(file_status):
        # Copy each zip to a local temp file so zipfile can seek within it.
        with input_ds_zip.filesystem().open(file_status.path, "rb") as f:
            with tempfile.NamedTemporaryFile() as tmp:
                shutil.copyfileobj(f, tmp)
                tmp.flush()
                with zipfile.ZipFile(tmp) as archive:
                    # Write every archive member to the output dataset.
                    for filename in archive.namelist():
                        with archive.open(filename) as extracted_file:
                            with output_file.filesystem().open(filename, "wb") as out_f:
                                shutil.copyfileobj(extracted_file, out_f)

    # List the zip files in the input dataset and process them on the Spark executors.
    files_df = input_ds_zip.filesystem().files(glob="**/*.zip")
    files_df.rdd.foreach(process_file)
Please let me know if this is what you’re looking for.
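One thing that may be worth checking, depending on your archives: the code above uses each archive member name directly as the output path. Zip files can contain directory entries and, occasionally, names that are absolute or that climb out of the root with `..`. Below is a rough sketch of a filter you could apply before writing. The helper name `safe_member_names` is hypothetical, and whether the output filesystem already rejects such paths is not something I have verified, so treat this as an optional precaution rather than a required step.

```python
import io
import posixpath
import zipfile
from typing import List


def safe_member_names(archive: zipfile.ZipFile) -> List[str]:
    """List file members whose names stay inside the output root."""
    names = []
    for info in archive.infolist():
        if info.is_dir():
            continue  # skip directory entries, they hold no data
        normalized = posixpath.normpath(info.filename)
        escapes_root = normalized == ".." or normalized.startswith("../")
        if normalized.startswith("/") or escapes_root:
            continue  # skip absolute paths and paths that leave the root
        names.append(info.filename)
    return names


if __name__ == "__main__":
    # Build a demo archive with one ordinary member and one suspicious one.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as demo:
        demo.writestr("ok/a.txt", "x")
        demo.writestr("../evil.txt", "x")
    with zipfile.ZipFile(buffer) as demo:
        print(safe_member_names(demo))
```

In the transform, this would mean looping over `safe_member_names(archive)` instead of `archive.namelist()`.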
Hi @arukavina,
I tried the code you shared, and it worked! I was able to unzip the files successfully.
Thank you so much for your help!