Using Apache Sedona in Code Repositories

We are working on geospatial data pipelines and trying to get Apache Sedona set up within Code Repositories. However, we’re having trouble actually using Sedona in our transforms. I’ve checked out this doc, which introduces the @geospatial decorator (https://www.palantir.com/docs/foundry/geospatial/vector_data_in_transforms), but it looks like that decorator is being deprecated. Additionally, I tried to locate the geospatial-tools library within the Foundry code repository, but it doesn’t show up when I search, so I’m unable to use the @geospatial decorator at all.

Would appreciate any advice on methods/best practices for using Sedona within transforms. I was able to apply the “GEOSPARK” Spark profile, but it appears to only configure the Spark cluster without actually making Sedona’s functions available in the transform.

It seems another user had this issue a few years ago: https://stackoverflow.com/questions/73254417/can-not-add-geospatial-tools-dependency-to-palantir-foundry-code-repository

Ideally, I’d like to interact directly with Sedona at the code level rather than in Pipeline Builder. We have some Sedona logic that we’ve verified runs on a local Spark cluster, but moving it into Foundry has proven to be quite a challenge.

Thanks so much in advance for the help.

Sorry to hear that you’ve been having trouble getting this set up! Here’s a brief guide that I can confirm works now on a freshly bootstrapped transforms-python repository.

  1. Add the following section at the bottom of the transforms-python/build.gradle file. Note that the exact versions of these libraries available in your environment may differ (you can search in the Libraries panel of a transforms-java repository to see what is available). Also note that you may need to manually add external-jar to your repository’s backing artifact repositories (in my case, it was added automatically after an initial check run failed).
dependencies {
    condaJars "org.apache.sedona:sedona-spark-shaded-3.4_2.12:1.6.0"
    condaJars "org.datasyslab:geotools-wrapper:1.6.0-28.2"
}
  2. Add the apache-sedona conda dependency.
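For reference, adding the dependency through the Libraries panel should produce an entry like the following in transforms-python/conda_recipe/meta.yml. The exact pin is an assumption on my part; choose a version matching the jars above:

```yaml
requirements:
  run:
    - apache-sedona==1.6.0
```
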

  3. Verify that things work with the following sample code. It seems that SedonaRegistrator doesn’t work properly in Code Repositories preview, but it worked fine for me in VS Code preview and during an actual build.

from pyspark.sql import functions as F
from sedona.register import SedonaRegistrator
from sedona.sql import st_constructors as stc
from transforms.api import Output, transform_df


@transform_df(
    Output("..."),
)
def compute(ctx):
    # Register Sedona's SQL functions and types on the transform's Spark session.
    SedonaRegistrator.registerAll(ctx.spark_session)
    df = ctx.spark_session.sql("SELECT array(0.0, 1.0, 2.0) AS values")

    # The array's min/max become the point's x and y coordinates.
    min_value = F.array_min("values")
    max_value = F.array_max("values")

    return df.select(stc.ST_Point(min_value, max_value).alias("point"))