I’m looking to create an E2E pipeline with icd_10_codes.csv, provided by Palantir’s “buildingchallenge”. When I try to upload the file to my workspace, the upload fails. I assume it’s because of the file’s size (2.56 GB). Any recommendations on how to work around this issue?
Try “Upload to a new unstructured dataset” instead of “Upload as a structured dataset.” With the current behavior of the Foundry front-end, an upload to an unstructured dataset is performed in 5 MB chunks, each sent as a separate network request, so it is very resilient to network instability and timeouts. Uploading to a structured dataset happens in a single network request, which can be tricky for large files, depending on your network setup.
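To make the resilience argument concrete, here is a minimal sketch of the general chunked-upload pattern in Python. The endpoint URL, the use of a `Content-Range` header, and the `upload_in_chunks` helper are all illustrative assumptions, not Foundry’s actual upload API; the point is just why per-chunk requests tolerate flaky networks:

```python
import os
import requests

CHUNK_SIZE = 5 * 1024 * 1024  # 5 MB, matching the chunk size described above


def upload_in_chunks(path: str, url: str, token: str) -> None:
    """Illustrative only: streams a large file as independent 5 MB requests
    against a hypothetical endpoint (NOT Foundry's real upload API)."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        offset = 0
        while offset < size:
            chunk = f.read(CHUNK_SIZE)
            # Each chunk is its own request, so a dropped connection only
            # forces a retry of one 5 MB piece, not the whole 2.56 GB file.
            resp = requests.post(
                url,
                headers={
                    "Authorization": f"Bearer {token}",
                    "Content-Range": f"bytes {offset}-{offset + len(chunk) - 1}/{size}",
                },
                data=chunk,
            )
            resp.raise_for_status()
            offset += len(chunk)
```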
To sanity-check that an “Upload to a new unstructured dataset” upload is making progress, you can watch the chunked requests in the Network tab of Chrome DevTools, or the equivalent functionality in whichever web browser you are using.
You can always apply a schema to an unstructured dataset (i.e., turn it into a structured dataset) from the dataset details page after the upload completes.
If uploading to an unstructured dataset doesn’t work for you either (perhaps due to memory constraints on your device), another option is to compress the file locally, upload the compressed file, and decompress and parse it in a pipeline inside Foundry. (If you compress with gzip, Spark can actually read it natively without an explicit decompression step, but because gzip is not a splittable format, all of the data will end up in a single Spark partition, which isn’t always ideal.)
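To illustrate the gzip route, here is a minimal PySpark transform sketch. The dataset paths are placeholders, and handing Spark the raw file location via `filesystem().hadoop_path` is an assumption about how your Transforms repository is set up; adapt it to your own conventions:

```python
from transforms.api import transform, Input, Output


@transform(
    raw=Input("/Challenge/raw/icd_10_codes_upload"),  # placeholder: unstructured dataset holding the .csv.gz
    out=Output("/Challenge/clean/icd_10_codes"),      # placeholder: structured output dataset
)
def parse_icd_codes(ctx, raw, out):
    # Spark decompresses gzip transparently, but a gzipped file is not
    # splittable, so the read initially lands in a single partition.
    df = (
        ctx.spark_session.read
        .option("header", "true")
        .csv(raw.filesystem().hadoop_path + "/icd_10_codes.csv.gz")
    )
    # Repartition so downstream work is spread across executors instead
    # of running on one oversized partition. 64 is an arbitrary example.
    out.write_dataframe(df.repartition(64))
```

The `repartition` call is the workaround for the single-partition caveat mentioned above; pick a count that suits your cluster and data size.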