Read schema when reading CSV in Code Workspaces

192c7163b888db45ab2e · December 9, 2025, 6:37pm

Thanks for the clarification!
I think the problem is that your code downloads the raw CSV files and then reads them via the Spark API. The raw files do not carry the dataset's metadata, so you lose the schema.
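
To illustrate why the types go missing, here is a minimal sketch in plain Python (not Spark, and not Foundry-specific). The sample data and the `schema` dict are made up for the example; the dict only stands in for the metadata that is stored alongside the dataset.

```python
import csv
import io

# Hypothetical sample data standing in for one raw CSV file of the dataset.
raw_csv = "A,B\n1,x\n2,y\n"

# Parsing the raw text: every value comes back as a string, because the
# CSV format itself carries no type information.
rows = list(csv.DictReader(io.StringIO(raw_csv)))
raw_types = {name: type(value).__name__ for name, value in rows[0].items()}

# Hypothetical schema, standing in for the metadata kept with the dataset.
schema = {"A": int, "B": str}

# Applying the schema explicitly is what restores the intended types.
typed_rows = [
    {name: schema[name](value) for name, value in row.items()}
    for row in rows
]
typed_types = {name: type(value).__name__ for name, value in typed_rows[0].items()}

print(raw_types)    # {'A': 'str', 'B': 'str'}
print(typed_types)  # {'A': 'int', 'B': 'str'}
```

The same idea applies when reading through the dataset rather than the raw files: the types come from the schema, not from the CSV text.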

I would suggest reading the Foundry dataset as a pandas DataFrame and then creating the Spark DataFrame from that pandas DataFrame.
Something like:

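from foundry.transforms import Dataset
from pyspark.sql import SparkSession

# Read the Foundry dataset (not the raw files) as a pandas DataFrame,
# so the column types come from the dataset rather than the CSV text.
csv_files_with_schema = Dataset.get("csv_files_with_schema").read_table(format="pandas")
# Reuse the active Spark session, or create one if none exists.
spark = SparkSession.builder.getOrCreate()
# Spark derives the column types from the pandas dtypes.
spark_df = spark.createDataFrame(csv_files_with_schema)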

I tested it on my side and it preserved the types:
spark_df.select('A').dtypes -> [('A', 'bigint')]

Best!