I’m running into ColumnTypesNotSupported when trying to call readTable (format=CSV) on datasets that have Array columns.
How can I read a table that has these data types? And how can I get a CSV out of it?
To read this data type, you will have to use the Arrow format by calling readTable with ?format=ARROW (as mentioned in the docs).
The Arrow format can handle this data type. You can then convert the Arrow file into a CSV using Python. Here is an example that calls the endpoint, fetches an Arrow file, and turns it into a CSV:
```python
import os

import requests
import pyarrow as pa
import pyarrow.csv as csv


def download_arrow_and_save_as_csv(token, hostname, dataset_rid, output_file):
    # Define the endpoint URL
    url = f"https://{hostname}/api/v1/datasets/{dataset_rid}/readTable?format=ARROW"
    # Set up the headers with the authorization token
    headers = {
        "Authorization": f"Bearer {token}"
    }
    # Make the request
    response = requests.get(url, headers=headers)
    # Ensure the response is successful
    if response.status_code != 200:
        raise Exception(f"Failed to download the Arrow stream: {response.status_code} - {response.text}")
    # Load the Arrow IPC stream (held in memory as bytes) into a table
    table = pa.ipc.open_stream(response.content).read_all()
    # Save the table as a CSV file
    csv.write_csv(table, output_file)


# Usage:
token = os.environ["TOKEN"]
hostname = os.environ["HOSTNAME"]
dataset_rid = "your_dataset_rid"
output_file = "output.csv"
download_arrow_and_save_as_csv(token, hostname, dataset_rid, output_file)
```