Lightweight transform fails to load dataset with `FoundryDataSidecar:FileDownloadFailure`

I uploaded a small CSV file as a dataset and am now trying to clean it with Polars, but for some reason it fails to load with a `FoundryDataSidecar:FileDownloadFailure` error. Has anyone else seen this, or know why?

Even a basic version of the transform fails:

```python
from transforms.api import transform, Input, Output


@transform.lightweight(
    outfile=Output("ri.foundry.main.dataset.<snip>"),
    source_df=Input("ri.foundry.main.dataset.<snip>"),
)
def compute(outfile, source_df):
    pf = source_df.polars(lazy=False)
    outfile.write_table(pf)
```

I have checked that the input dataset has a schema, and Preview works. I can also use the input dataset in a Pipeline Builder transform to create an output dataset (saved as Parquet), but even using that transformed dataset results in the same error.

This repository is project-scope exempted, because we use it to make API calls, but that hasn't been an issue for this use case before.

The full error message is:

Job failed with status 1:
Traceback (most recent call last):
  File "/foundry/python_environment/lib/python3.12/site-packages/conjure_python_client/_http/requests_client.py", line 119, in _request
    _response.raise_for_status()
  File "/foundry/python_environment/lib/python3.12/site-packages/requests/models.py", line 1026, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: https://localhost:8188/foundry-data-sidecar/api/datasets/source_df/downloadTableAsFilesV2

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/foundry/python_environment/lib/python3.12/site-packages/transforms/api/_lightweight/_bootstrap.py", line 39, in <module>
    main()
  File "/foundry/python_environment/lib/python3.12/site-packages/transforms/api/_lightweight/_bootstrap.py", line 35, in main
    transform.compute()
  File "/foundry/python_environment/lib/python3.12/site-packages/transforms/api/_lightweight/_transform.py", line 193, in compute
    self._compute()
  File "/foundry/python_environment/lib/python3.12/site-packages/transforms/api/_lightweight/_transform.py", line 282, in _compute
    self._user_code(**kwargs)
  File "/foundry/user_code/map_actions/datasets/input_datasets/read_csv.py", line 11, in compute
    pf = (source_df.polars(lazy=False)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/foundry/python_environment/lib/python3.12/site-packages/transforms/api/_lightweight/_param.py", line 94, in polars
    return self._read_table(format="lazy-polars") if lazy else self._read_table(format="polars")
                                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/foundry/python_environment/lib/python3.12/site-packages/transforms/api/_lightweight/_param.py", line 97, in _read_table
    return self.dataset.read_table(*args, **kwarg)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/foundry/python_environment/lib/python3.12/site-packages/foundry/transforms/_dataset.py", line 180, in read_table
    return self._read_parquet_table(format, mode, force_dataset_download, schema)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/foundry/python_environment/lib/python3.12/site-packages/foundry/transforms/_dataset.py", line 239, in _read_parquet_table
    self._schema = foundry_file_source.schema if foundry_file_source.schema else user_specified_schema
                                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/foundry/python_environment/lib/python3.12/site-packages/foundry/transforms/_tabular_dataset/_foundry_file_source.py", line 152, in schema
    self._initialize_if_not_downloaded()
  File "/foundry/python_environment/lib/python3.12/site-packages/foundry/transforms/_tabular_dataset/_foundry_file_source.py", line 181, in _initialize_if_not_downloaded
    self._local_disk_file_source = self._download_files()
                                   ^^^^^^^^^^^^^^^^^^^^^^
  File "/foundry/python_environment/lib/python3.12/site-packages/foundry/transforms/_tabular_dataset/_foundry_file_source.py", line 186, in _download_files
    downloaded_table: DownloadTableAsFilesResponse = self._client.download_table_as_files_v2(
                                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/foundry/python_environment/lib/python3.12/site-packages/foundry/transforms/_clients.py", line 123, in download_table_as_files_v2
    return sidecar_client.download_table_as_files_v2(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/foundry/python_environment/lib/python3.12/site-packages/foundry_data_sidecar_api/_impl.py", line 1053, in download_table_as_files_v2
    _response: Response = self._request(
                          ^^^^^^^^^^^^^^
  File "/foundry/python_environment/lib/python3.12/site-packages/conjure_python_client/_http/requests_client.py", line 122, in _request
    raise ConjureHTTPError(e) from e
conjure_python_client._http.requests_client.ConjureHTTPError: 500 Server Error: Internal Server Error for url: https://localhost:8188/foundry-data-sidecar/api/datasets/source_df/downloadTableAsFilesV2. ErrorCode: 'INTERNAL'. ErrorName: 'FoundryDataSidecar:FileDownloadFailure'. ErrorInstanceId: 'cae9b80f-7f15-41d6-98b2-172ce700fa6d'. TraceId: '995de98e8663618c'. Parameters: {'paths': '[<snip>.csv]'}
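Incidentally, when reporting one of these, the `ErrorInstanceId` and `TraceId` are what Support needs to trace the failure on their side. A throwaway stdlib helper (my own, not part of any Foundry SDK) to pull the quoted fields out of a Conjure error message:

```python
import re


def parse_conjure_error(message: str) -> dict:
    """Extract the quoted ErrorCode / ErrorName / ErrorInstanceId / TraceId
    fields from a Conjure HTTP error message, skipping any that are absent."""
    fields = ("ErrorCode", "ErrorName", "ErrorInstanceId", "TraceId")
    return {
        field: match.group(1)
        for field in fields
        if (match := re.search(rf"{field}: '([^']*)'", message))
    }
```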

If I process the same input dataset with a Spark transform, it builds without issue; e.g. the below works:

```python
from transforms.api import transform, Input, Output


@transform(
    outfile=Output("ri.foundry.main.dataset.<snip>"),
    source_df=Input("ri.foundry.main.dataset.<snip>"),
)
def compute(outfile, source_df):
    pf = source_df.dataframe()
    outfile.write_dataframe(pf)
```


Try `@transform.using` instead of `@transform.lightweight`:

https://www.palantir.com/docs/foundry/transforms-python/transforms-versions/#updated-syntax-for-lightweight


@rfisk I just updated to use the new syntax, but sadly it hasn't made a difference; I get the same error, even with the transform rewritten as:

```python
from transforms.api import transform, Input, Output


@transform.using(
    outfile=Output("ri.foundry.main.dataset.<snip>"),
    source_df=Input("ri.foundry.main.dataset.<snip>"),
)
def compute(outfile, source_df):
    pf = source_df.polars(lazy=False)
    outfile.write_table(pf)
```

I've seen this issue come and go sporadically over the past couple of months, usually referencing the same endpoint:
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: https://localhost:8188/foundry-data-sidecar/api/datasets/source_df/downloadTableAsFilesV2

Often it resolves on re-running the build. Strange, and retrying obviously doesn't scale as a solution.
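Until there's a proper fix, a blunt stopgap is to retry the read with backoff inside the transform. A minimal sketch under my own naming (`retry_read` is not a Foundry API; in practice you'd pass `lambda: source_df.polars(lazy=False)` and catch `requests.exceptions.HTTPError` rather than a bare `Exception`):

```python
import time


def retry_read(read, attempts=3, base_delay=2.0):
    """Call `read` up to `attempts` times, sleeping with exponential
    backoff between tries; re-raise the last error if all attempts fail."""
    last_exc = None
    for attempt in range(attempts):
        try:
            return read()
        except Exception as exc:  # narrow to the sidecar's HTTPError in real use
            last_exc = exc
            time.sleep(base_delay * 2 ** attempt)
    raise last_exc

# Inside the transform body, roughly:
#   pf = retry_read(lambda: source_df.polars(lazy=False))
```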


@coffee-operator The other error I've encountered intermittently in VS Code-based previews is:

RuntimeError: Failed to list files in S3 proxy for https://waypoint-envoy.rubix-system.svc.cluster.local:8443/compute/423ab7/production/s3-proxy/io/s3/ri-foundry-main-dataset-<snip> with status code 403: <?xml version='1.0' encoding='UTF-8'?><Error><Code>AccessDenied</Code><Message>Unable to access bucket (e05ec541-<snip>)</Message><Resource>/ri-foundry-main-dataset-<snip></Resource><RequestId>439628fe0e3adc93</RequestId><HostId>2248b1ea6ff8f023</HostId></Error>
2025-10-01 13:04:08.009 [info] Executing preview failed    

But trying it again often clears it.

Concerning the `FoundryDataSidecar:FileDownloadFailure` error, Support confirmed it was caused by the repo's Project Scope Exemption:

Project Scope Exemption is a fairly old paradigm and configuration mode that marks a repository as "insecure", which allows a few things: a) running jobs with the user token (as opposed to project-scoped tokens), and b) making API calls from within jobs with the user token.
That's why Lightweight (a much newer feature) does not support it.

So if you’re doing that, you have to stick to Spark.
