I uploaded a small CSV file as a dataset and am now trying to clean it with Polars, but the transform fails to load it and I get a FoundryDataSidecar:FileDownloadFailure. Has anyone else seen this, or does anyone know why?
Even a basic version of the transform fails:
from transforms.api import transform, Input, Output


@transform.lightweight(
    outfile=Output("ri.foundry.main.dataset.<snip>"),
    source_df=Input("ri.foundry.main.dataset.<snip>"),
)
def compute(outfile, source_df):
    pf = (source_df.polars(lazy=False)
    )
    outfile.write_table(pf)
I have checked that the input dataset has a schema, and Preview works. I can also use the input dataset in a Pipeline Builder transform and create an output dataset from it (saved as Parquet), but using that output dataset as the input to this transform results in the same error.
This repository is project-scope exempted because we use it to make API calls, but that has not been an issue for this use case before.
The full error message is:
Job failed with status 1:
Traceback (most recent call last):
File "/foundry/python_environment/lib/python3.12/site-packages/conjure_python_client/_http/requests_client.py", line 119, in _request
_response.raise_for_status()
File "/foundry/python_environment/lib/python3.12/site-packages/requests/models.py", line 1026, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: https://localhost:8188/foundry-data-sidecar/api/datasets/source_df/downloadTableAsFilesV2
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/foundry/python_environment/lib/python3.12/site-packages/transforms/api/_lightweight/_bootstrap.py", line 39, in <module>
main()
File "/foundry/python_environment/lib/python3.12/site-packages/transforms/api/_lightweight/_bootstrap.py", line 35, in main
transform.compute()
File "/foundry/python_environment/lib/python3.12/site-packages/transforms/api/_lightweight/_transform.py", line 193, in compute
self._compute()
File "/foundry/python_environment/lib/python3.12/site-packages/transforms/api/_lightweight/_transform.py", line 282, in _compute
self._user_code(**kwargs)
File "/foundry/user_code/map_actions/datasets/input_datasets/read_csv.py", line 11, in compute
pf = (source_df.polars(lazy=False)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/foundry/python_environment/lib/python3.12/site-packages/transforms/api/_lightweight/_param.py", line 94, in polars
return self._read_table(format="lazy-polars") if lazy else self._read_table(format="polars")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/foundry/python_environment/lib/python3.12/site-packages/transforms/api/_lightweight/_param.py", line 97, in _read_table
return self.dataset.read_table(*args, **kwarg)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/foundry/python_environment/lib/python3.12/site-packages/foundry/transforms/_dataset.py", line 180, in read_table
return self._read_parquet_table(format, mode, force_dataset_download, schema)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/foundry/python_environment/lib/python3.12/site-packages/foundry/transforms/_dataset.py", line 239, in _read_parquet_table
self._schema = foundry_file_source.schema if foundry_file_source.schema else user_specified_schema
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/foundry/python_environment/lib/python3.12/site-packages/foundry/transforms/_tabular_dataset/_foundry_file_source.py", line 152, in schema
self._initialize_if_not_downloaded()
File "/foundry/python_environment/lib/python3.12/site-packages/foundry/transforms/_tabular_dataset/_foundry_file_source.py", line 181, in _initialize_if_not_downloaded
self._local_disk_file_source = self._download_files()
^^^^^^^^^^^^^^^^^^^^^^
File "/foundry/python_environment/lib/python3.12/site-packages/foundry/transforms/_tabular_dataset/_foundry_file_source.py", line 186, in _download_files
downloaded_table: DownloadTableAsFilesResponse = self._client.download_table_as_files_v2(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/foundry/python_environment/lib/python3.12/site-packages/foundry/transforms/_clients.py", line 123, in download_table_as_files_v2
return sidecar_client.download_table_as_files_v2(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/foundry/python_environment/lib/python3.12/site-packages/foundry_data_sidecar_api/_impl.py", line 1053, in download_table_as_files_v2
_response: Response = self._request(
^^^^^^^^^^^^^^
File "/foundry/python_environment/lib/python3.12/site-packages/conjure_python_client/_http/requests_client.py", line 122, in _request
raise ConjureHTTPError(e) from e
conjure_python_client._http.requests_client.ConjureHTTPError: 500 Server Error: Internal Server Error for url: https://localhost:8188/foundry-data-sidecar/api/datasets/source_df/downloadTableAsFilesV2. ErrorCode: 'INTERNAL'. ErrorName: 'FoundryDataSidecar:FileDownloadFailure'. ErrorInstanceId: 'cae9b80f-7f15-41d6-98b2-172ce700fa6d'. TraceId: '995de98e8663618c'. Parameters: {'paths': '[<snip>.csv]'}
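In case it helps anyone looking into this, the identifiers in that last line can be pulled out with a few lines of plain Python. This is only a sketch: `parse_conjure_error` is a name I made up, it is not part of any Foundry library, and it assumes the message keeps the `Key: 'value'` layout shown above.

```python
import re

# Hypothetical helper, not part of any Foundry library: it only parses the
# text of the error message quoted in the traceback above.
_FIELD = re.compile(r"(ErrorCode|ErrorName|ErrorInstanceId|TraceId): '([^']*)'")


def parse_conjure_error(message: str) -> dict:
    """Return the quoted identifier fields found in a ConjureHTTPError message."""
    return dict(_FIELD.findall(message))


# The final line of the traceback, copied verbatim (dataset path still snipped).
message = (
    "500 Server Error: Internal Server Error for url: "
    "https://localhost:8188/foundry-data-sidecar/api/datasets/source_df/downloadTableAsFilesV2. "
    "ErrorCode: 'INTERNAL'. "
    "ErrorName: 'FoundryDataSidecar:FileDownloadFailure'. "
    "ErrorInstanceId: 'cae9b80f-7f15-41d6-98b2-172ce700fa6d'. "
    "TraceId: '995de98e8663618c'. "
    "Parameters: {'paths': '[<snip>.csv]'}"
)

fields = parse_conjure_error(message)
for name, value in fields.items():
    print(f"{name}: {value}")
```

The ErrorInstanceId and TraceId are probably the most useful values to quote if this ends up in a support ticket.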
If I process the same input dataset with a Spark transform instead, it builds without issue. For example, the following works:
from transforms.api import transform, Input, Output


@transform(
    outfile=Output("ri.foundry.main.dataset.<snip>"),
    source_df=Input("ri.foundry.main.dataset.<snip>"),
)
def compute(outfile, source_df):
    pf = (source_df.dataframe()
    )
    outfile.write_dataframe(pf)