How to query Foundry datasets locally using SQL?

What’s the easiest way to query Foundry datasets locally, ideally using SQL?

I’m trying to use JDBC but it doesn’t seem to be as easy as docs suggest. What I did is try to run a simple python script to query the data:

  1. As per docs, I downloaded the JDBC driver and put it in my local folder
  2. I created a new python env and installed JayDeBeApi
  3. I ran the attached scripts both with JayDeBeApi and with JPype directly
  4. In both cases I’m getting an error of the sort:
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGBUS (0xa) at pc=0x0000000106934288, pid=21605, tid=259
#
# JRE version:  (11.0.18+10) (build )
# Java VM: OpenJDK 64-Bit Server VM (11.0.18+10-LTS, mixed mode, tiered, compressed oops, g1 gc, bsd-aarch64)
# Problematic frame:
# V  [libjvm.dylib+0x300288]  CodeHeap::allocate(unsigned long)+0x1c8
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /<local_dir>/hs_err_pid21605.log
#
#
[1]    21605 abort      python jdbc_test.py

I’m running it locally on my macos 14.7. Java info:

$ java -version
OpenJDK Runtime Environment Zulu11.62+17-CA (build 11.0.18+10-LTS)
OpenJDK 64-Bit Server VM Zulu11.62+17-CA (build 11.0.18+10-LTS, mixed mode)

$ echo $JAVA_HOME
/Library/Java/JavaVirtualMachines/zulu-11.jdk/Contents/Home

Simple python script that I tried - using JayDeBeApi:

import jaydebeapi

jdbc_driver_path = '<PATH_TO_DRIVER>/foundry-sql-jdbc-driver-3.27.0-withdep.jar'
jdbc_url = 'jdbc:foundrysql://<HOSTNAME>?Password=<TOKEN>'
query = 'SELECT * FROM "<PATH_TO_DATASET>" LIMIT 10'

# Establish a connection
conn = jaydebeapi.connect(
    'com.palantir.foundry.sql.jdbc.FoundryJdbcDriver',
    jdbc_url,
    [],
    jdbc_driver_path
)

# Create a cursor object
cursor = conn.cursor()

# Execute a query
cursor.execute(query)

# Fetch all results
results = cursor.fetchall()

# Print the results
for row in results:
    print(row)

# Close the cursor and connection
cursor.close()
conn.close()

Anothe python script that I tried that uses jpype directly:

import jpype
import jpype.imports
from jpype.types import *

jdbc_driver_path = '<PATH_TO_DRIVER>/foundry-sql-jdbc-driver-3.27.0-withdep.jar'
jdbc_url = 'jdbc:foundrysql://<HOSTNAME>?Password=<TOKEN>'
query = 'SELECT * FROM "<PATH_TO_DATASET>" LIMIT 10'

# Start the JVM
jpype.startJVM(jpype.getDefaultJVMPath(), f"-Djava.class.path={jdbc_driver_path}")

# Import the Java classes
from java.sql import DriverManager

# Establish a connection
conn = DriverManager.getConnection(jdbc_url)

# Create a statement object
stmt = conn.createStatement()

# Execute a query
rs = stmt.executeQuery(query)

# Process the result set
while rs.next():
    # Assuming the table has columns 'id' and 'name'
    id = rs.getInt("id")
    name = rs.getString("name")
    print(f"ID: {id}, Name: {name}")

# Close the result set, statement, and connection
rs.close()
stmt.close()
conn.close()

# Shutdown the JVM
jpype.shutdownJVM()

Am I making some obvious mistake when setting things up? And is JDBC really the way to go or are there easier ways?

Are you aware of Foundry DevTools? You can use it to call the same API the JDBC Driver is calling and retrieve the result as pandas or spark dataframe, or as pyarrow table.

https://emdgroup.github.io/foundry-dev-tools/index.html

On the first page of the docs is a minimal example for SQL Queries as well.

Thanks @nicornk for referencing foundry-dev-tools! I was aware of it, but similarly had troubles setting it up.

I installed foundry-dev-tools in my venv, but then first problem I encountered is that I don’t have the fdt config command:

$ fdt --help
Usage: fdt [OPTIONS] COMMAND [ARGS]...

  Foundry DevTools CLI.

Options:
  --help  Show this message and exit.

Commands:
  build  Command to start a build and tail the logs.
  info   Prints useful information about the Foundry DevTools installation.
  s3     CLI for the s3 compatible foundry datasets API.

I added then a new config manually, and this is the info I see:

Foundry DevTools Information
├── Python
│   └── ╭─────────────────────────────────────────────────╮
│       │ Python v3.9.6 (default, Aug  9 2024, 14:24:13)  │
│       │ [Clang 16.0.0 (clang-1600.0.26.3)], CPython     │
│       │ Foundry DevTools v1.5.0                         │
│       │ not using conda x                               │
│       ╰─────────────────────────────────────────────────╯
├── Spark x
│   └── ╭─────────────────────────╮
│       │ PySpark not installed x │
│       │ Spark not installed x   │
│       ╰─────────────────────────╯
├──  Java ✔
│   └── ╭───────────────────────────────────────────────╮
│       │ Java Runtime Name:OpenJDK Runtime Environment │
│       │ Java Runtime Version: 11.0.18+10-LTS          │
│       ╰───────────────────────────────────────────────╯
  1 [credentials]
├──  System Information
│   └── ╭─────────────────┬──────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│       │ OS              │ Darwin                                                                                                       │
│       │ OS release      │ 23.6.0                                                                                                       │
│       │ OS version      │ Darwin Kernel Version 23.6.0: Wed Jul 31 20:49:39 PDT 2024; root:xnu-10063.141.1.700.5~1/RELEASE_ARM64_T6000 │
│       │ Instruction set │ arm64                                                                                                        │
│       ╰─────────────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
├──  Dependencies
│   ├── core ✔
│   │   ├── pyarrow v17.0.0 installed ✔
│   │   ├── pandas v2.2.3 installed ✔
│   │   ├── requests v2.32.3 installed ✔
│   │   ├── fs v2.4.16 installed ✔
│   │   ├── backoff v2.2.1 installed ✔
│   │   └── palantir-oauth-client v1.6.1 installed ✔
│   ├── cli ✔
│   │   ├── click v8.1.7 installed ✔
│   │   ├── inquirer v3.4.0 installed ✔
  1 [credentials]
│   │   ├── websockets v13.1 installed ✔
│   │   ├── rich v13.8.1 installed ✔
│   │   └── packaging v24.1 installed ✔
│   ├── integration x
│   │   ├── dask not installed x
│   │   └── fastparquet not installed x
│   ├── s3 x
│   │   └── aiobotocore not installed x
│   ├── testing x
│   │   ├── pytest not installed x
│   │   ├── pytest-mock not installed x
│   │   ├── pytest-spark not installed x
│   │   ├── freezegun not installed x
│   │   ├── pytest-asyncio not installed x
│   │   ├── requests-mock not installed x
│   │   ├── fsspec not installed x
│   │   ├── timeflake not installed x
│   │   ├── polars not installed x
│   │   └── streamlit not installed x
│   └── transforms x
│       └── pyspark not installed x
└── Configuration
    ├── Configuration file: ~/.foundry-dev-tools/config (exists: ✔)
    ├── Project configuration file: x
    └── ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
        ┃ Config Name                            ┃ Value                                        ┃
        ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
        │ jwt                                    │ Is not set.                                  │
        │ client_id                              │ Is not set.                                  │
        │ client_secret                          │ Is not set.                                  │
        │ grant_type                             │ authorization_code                           │
        │ scopes                                 │ None                                         │
        │ foundry_url                            │ Is not set.                                  │
        │ cache_dir                              │ ~/.foundry-dev-tools/cache │
        │ transforms_output_folder               │ None                                         │
        │ transforms_sql_sample_select_random    │ x                                            │
        │ transforms_force_full_dataset_download │ x                                            │
        │ enable_runtime_token_providers         │ ✔                                            │
        │ transforms_freeze_cache                │ x                                            │
        │ transforms_sql_sample_row_limit        │ 5000                                         │
        │ transforms_sql_dataset_size_threshold  │ 500                                          │
        └────────────────────────────────────────┴──────────────────────────────────────────────┘

My config at ~/.foundry-dev-tools/config is:

[credentials]
domain = "<DOMAIN>"
jwt = "<SECRET>"

I ran a simple example from the docs page:

from foundry_dev_tools import FoundryContext

# the context, that contains your credentials and configuration
ctx = FoundryContext()

df = ctx.foundry_sql_server.query_foundry_sql("SELECT * FROM `/Global/Foundry Training and Resources/Example Data/Aviation Ontology/airlines`", branch='master')

print(df.shape)

And I get an error:

ImportError: cannot import name 'FoundryContext' from 'foundry_dev_tools'

Not sure if I failed at installing something or setting some config? Ideally would like to find the most minimal case to run that example code.

hi @nsns,

from the output of fdt info I see that you are still on v1.5.0. The FoundryContext only became available with v2 of fdt.

Can you upgrade your fdt version in your venv with

pip install foundry-dev-tool --upgrade

Afterwards, you need to rename your ~/.foundry-dev-tools/config to ~/.foundry-dev-tools/config.toml and a minimum content looks like this:

[credentials]

domain="stack.palantirfoundry.com"
jwt="eyJwb..."

Afterwards, fdt config should output that you are authenticated:

Successfully authenticated as '...'

In case you don’t want a config file and just python (don’t store the token in git!):

from foundry_dev_tools import FoundryContext, JWTTokenProvider

ctx = FoundryContext(
    token_provider=JWTTokenProvider(host="stack.palantirfoundry.com", jwt="eyJwbG...")
)
df = ctx.foundry_sql_server.query_foundry_sql(
    "SELECT * FROM `/Global/Foundry Training and Resources/Example Data/Aviation Ontology/airlines`",
    branch="master",
)
print(df.shape)

Hope that helps, otherwise let me know where you are stuck.

Hi @nicornk, thanks for the response, I managed to find the problem - it was python3.9!

v2.x.x of foundry-dev-tools is supported by only python3.10 and above, so that’s the reason why it wasn’t installed even when I ran pip install foundry-dev-tools. I used now python3.12 and all was smooth.

Thanks again, and great effort developing this library! :muscle:

1 Like