ColumnTypesNotSupported when using readTable

jkane · September 3, 2024, 10:25am

I’m running into ColumnTypesNotSupported when trying to call readTable (format=CSV) on datasets that have Array columns.

How can I read a table that has these data types? And how can I get a CSV out of it?

jkane · September 3, 2024, 10:27am

To read this data type you will have to use the Arrow format, by calling readTable with ?format=ARROW (as mentioned in the docs).

The Arrow format can handle this data type. You can then convert the Arrow stream into a CSV using Python. Here is an example that calls the endpoint, fetches the Arrow data and turns it into a CSV:

import os
import requests
import pyarrow as pa
import pyarrow.csv as csv

def download_arrow_and_save_as_csv(token, hostname, dataset_rid, output_file):
    # Define the endpoint URL
    url = f"https://{hostname}/api/v1/datasets/{dataset_rid}/readTable?format=ARROW"

    # Set up the headers with the authorization token
    headers = {
        "Authorization": f"Bearer {token}"
    }

    # Make the request
    response = requests.get(url, headers=headers)

    # Ensure the response is successful
    if response.status_code != 200:
        raise Exception(f"Failed to download the Arrow stream: {response.status_code} - {response.text}")

    # Load the Arrow stream into a table
    table = pa.ipc.open_stream(response.content).read_all()

    # Save the table as a CSV file
    csv.write_csv(table, output_file)

# Usage:
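# Note: some shells already set HOSTNAME to the local machine name,
# so make sure it holds your API host here.
token = os.environ["TOKEN"]
hostname = os.environ["HOSTNAME"]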
dataset_rid = "your_dataset_rid"
output_file = "output.csv"

download_arrow_and_save_as_csv(token, hostname, dataset_rid, output_file)