Hi everyone,
I’m facing an issue with downloading images in my Jupyter Code Workspace. According to the Foundry documentation, the following approach should work for downloading files:
from foundry.transforms import Dataset
# Download all files in the dataset
downloaded_files = Dataset.get("my_alias").files().download()
local_file = downloaded_files["file.pdf"]
However, I’m struggling to adapt this to images. When I run the same steps against my dataset, the result contains only Parquet files instead of the expected images:
{
'spark/part-00000-dcb7197b-d051-4dea-90ed-c56fbfa64726-c000.snappy.parquet': '/foundry/0a697b6f969d726c/ri.foundry.main.dataset.eb2d6994-f832-429b-a23b-47f82d37d3b5/ri.foundry.main.transaction.00000003-13ab-71e1-aa6f-cb06d9026800/spark/part-00000-dcb7197b-d051-4dea-90ed-c56fbfa64726-c000.snappy.parquet',
'spark/part-00001-dcb7197b-d051-4dea-90ed-c56fbfa64726-c000.snappy.parquet': '/foundry/0a697b6f969d726c/ri.foundry.main.dataset.eb2d6994-f832-429b-a23b-47f82d37d3b5/ri.foundry.main.transaction.00000003-13ab-71e1-aa6f-cb06d9026800/spark/part-00001-dcb7197b-d051-4dea-90ed-c56fbfa64726-c000.snappy.parquet'
}
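As a quick sanity check, here is a minimal sketch of how I tallied what came back. It assumes only that `download()` returns a plain mapping of file name to local path, as in the output above. The helper names (`summarize_downloads`, `image_paths`), the list of image extensions, and the shortened `/tmp/...` paths are my own illustrations, not part of the Foundry API.

```python
from collections import Counter
from pathlib import PurePosixPath

# Assumed set of extensions to treat as images; adjust as needed.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".bmp", ".tif", ".tiff", ".webp"}


def summarize_downloads(downloaded_files):
    """Count the downloaded files by their final (lowercased) extension."""
    return Counter(PurePosixPath(name).suffix.lower() for name in downloaded_files)


def image_paths(downloaded_files):
    """Keep only the entries whose file name ends in a known image extension."""
    return {
        name: path
        for name, path in downloaded_files.items()
        if PurePosixPath(name).suffix.lower() in IMAGE_SUFFIXES
    }


# File names copied from my output above; the local paths are shortened placeholders.
example = {
    "spark/part-00000-dcb7197b-d051-4dea-90ed-c56fbfa64726-c000.snappy.parquet": "/tmp/part-00000.parquet",
    "spark/part-00001-dcb7197b-d051-4dea-90ed-c56fbfa64726-c000.snappy.parquet": "/tmp/part-00001.parquet",
}

print(summarize_downloads(example))
print(image_paths(example))
```

Every entry ends in `.parquet` and none matches an image extension. Together with the `spark/part-...` naming, that makes me suspect the alias points at tabular output written by Spark rather than at the raw image files, but I have not confirmed this.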
My dataset contains about 30,000 images that look like this:
Does anyone have experience with downloading images from a dataset in Foundry? Is there a specific approach for handling non-tabular data such as images?
Thanks in advance!