Parsing Shape File in Pipeline Builder yields empty output

jcrane · February 3, 2025, 3:09pm

I am trying to parse some shapefiles with “Extract Rows from Shapefile” in Pipeline Builder, and I have confirmed the correct EPSG code for the source coordinates. Since there is no error message and I just see 0 rows, is there any other way to troubleshoot this? Are there logs that would indicate malformed shapefiles or other errors?

I am not aware of any row/size limits (and these aren’t particularly big shapefiles). The data was resent after some back and forth, exported from a user’s GIS data. Could Pipeline Builder be missing start and end markers for this geo data? Otherwise, are there other tools to troubleshoot with?

arukavina · February 3, 2025, 10:20pm

Something like this happened to me recently. Could you please confirm all the required files are available?

In my case the problem was in the schema definition, so I parsed everything as a string.

If you’re processing GeoJSON, the issue could be the use of single quotes. Changing all single quotes to double quotes and running the result through a validator should give you valid output.
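
A minimal sketch of that quote fix, assuming the text is really a Python-style dict literal (single-quoted strings, no `null`/`true`/`false`). The helper name `normalize_quotes` is hypothetical, not a Foundry function. It parses with `ast.literal_eval` instead of a blind find-and-replace, so apostrophes inside values are not corrupted:

```python
import ast
import json


def normalize_quotes(text: str) -> str:
    """Return valid JSON for text that may use single-quoted strings."""
    try:
        # Already valid JSON: just round-trip it.
        return json.dumps(json.loads(text))
    except json.JSONDecodeError:
        # Assumption: the text is a Python-style literal using single quotes.
        return json.dumps(ast.literal_eval(text))


raw = "{'type': 'Point', 'coordinates': [30.5, 50.4]}"
fixed = normalize_quotes(raw)
parsed = json.loads(fixed)  # now parses as regular JSON
```

If the input mixes single quotes with JSON keywords such as `null`, `ast.literal_eval` will raise, which is at least a clear signal that the file needs manual repair.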

Hope it helps, let me know.

jcrane · February 3, 2025, 10:49pm

Thanks - I didn’t receive .cpg files, and didn’t ask for them since the prompt states all files will be ignored except .shp, .shx, and .dbf. In total, I have shx/shp/dbf/prj files. Do I need to request other file types?

arukavina · February 3, 2025, 10:51pm

No. Ironically, that is the optional one: CPG files are optional plain text files that specify the code page used to create a shapefile.

Are all your files named the same? Do you see all of them under a raw (without schema) dataset in Foundry?
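
A quick way to check the naming, sketched under the assumption that you can get the file paths of the raw dataset as plain strings (how you list them is up to you). The helper `missing_sidecars` is a hypothetical name; it groups files by base name and reports any group that lacks one of .shp, .shx, or .dbf:

```python
from collections import defaultdict
from pathlib import PurePosixPath

REQUIRED = {".shp", ".shx", ".dbf"}


def missing_sidecars(paths):
    """Map each shapefile base name to the required extensions it lacks."""
    by_stem = defaultdict(set)
    for p in paths:
        path = PurePosixPath(p)
        # Base names are compared case-sensitively, so "Parcels" != "parcels".
        by_stem[str(path.with_suffix(""))].add(path.suffix.lower())
    return {
        stem: sorted(REQUIRED - exts)
        for stem, exts in by_stem.items()
        if REQUIRED - exts
    }


# Example file names are made up for illustration.
files = ["parcels.shp", "parcels.shx", "parcels.dbf", "parcels.prj", "Parcels_v2.shp"]
report = missing_sidecars(files)
```

An empty `report` means every base name has all three required files; anything else points at a renamed or missing sidecar.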

jcrane · February 3, 2025, 11:17pm

Yes, they are all named the same, and they are in an unstructured dataset (and they appear as you show in your screenshot above). I am extracting as a string; the shx/shp/dbf files show up as hexadecimal rows.

I am comparing this to another instance where I successfully used “Extract Nodes from Shapefile” on a collection of files (hexadecimal) in an unstructured dataset. The biggest difference I can see is the file size of the shp/shx files: the files I want to parse now are much larger. Is it possible this Pipeline Builder node would overlook the start/end placeholders and not be able to properly output the geo data as rows?

arukavina · February 5, 2025, 2:16am

Ok, I’m running out of ideas.

Could you confirm the following:

Can you open the shapefiles with any other GIS (QGIS, ArcGIS, etc.)?
Does your shapefile contain any of the following geometry types: point, polyline, polygon, and multipoint?

As explained in the docs, the Pipeline Builder shapefile parser only supports point, polyline, polygon, and multipoint geometry types.
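
If you don't have a GIS at hand, the geometry type can be read straight from the 100-byte .shp header. This is only a sketch: the type codes and byte offsets come from the ESRI shapefile specification, but the function name `inspect_shp_header` is made up, and treating the Z/M variants as unsupported is my assumption, not something the docs quote above states. It also compares the length declared in the header with the actual size, which would flag a truncated file:

```python
import struct

# Shape type codes from the ESRI Shapefile Technical Description.
SHAPE_TYPES = {
    0: "Null", 1: "Point", 3: "PolyLine", 5: "Polygon", 8: "MultiPoint",
    11: "PointZ", 13: "PolyLineZ", 15: "PolygonZ", 18: "MultiPointZ",
    21: "PointM", 23: "PolyLineM", 25: "PolygonM", 28: "MultiPointM",
    31: "MultiPatch",
}
# Assumption: only the four plain 2D types count as supported.
SUPPORTED = {"Point", "PolyLine", "Polygon", "MultiPoint"}


def inspect_shp_header(data: bytes) -> dict:
    """Decode the fixed 100-byte header of a .shp file."""
    if len(data) < 100:
        raise ValueError("a .shp file needs at least a 100-byte header")
    (file_code,) = struct.unpack(">i", data[0:4])       # big-endian, always 9994
    (length_words,) = struct.unpack(">i", data[24:28])  # big-endian, in 16-bit words
    version, shape_code = struct.unpack("<ii", data[28:36])  # little-endian
    if file_code != 9994:
        raise ValueError(f"unexpected file code {file_code}")
    name = SHAPE_TYPES.get(shape_code, f"unknown ({shape_code})")
    return {
        "version": version,
        "shape_type": name,
        "supported": name in SUPPORTED,
        "declared_bytes": length_words * 2,
        "actual_bytes": len(data),
        "length_mismatch": length_words * 2 != len(data),
    }


# Synthetic header for demonstration: an empty PolygonZ shapefile (100 bytes).
header = struct.pack(">i", 9994) + b"\x00" * 20 + struct.pack(">i", 50)
header += struct.pack("<ii", 1000, 15) + b"\x00" * 64
info = inspect_shp_header(header)
```

For a real file, pass the full contents, for example `inspect_shp_header(open("my_file.shp", "rb").read())` (the file name is a placeholder).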

Can you try a code repository transform?

from transforms.api import transform, Input, Output
from geospatial_tools import geospatial
from geospatial_tools.parsers import shapefile_to_dataframe

# Replace both dataset paths with your own.
@geospatial()
@transform(
    output=Output('/my/output/dataset'),
    raw=Input('/my/input/dataset'),  # raw dataset holding the shapefile files
)
def compute(raw, output):
    # Parse the raw shapefile files into a DataFrame and write it to the output.
    return output.write_dataframe(
        shapefile_to_dataframe(raw)
    )

All the pre-configuration steps are described here: Geospatial • Use vector data in transforms • Palantir

Let me know !

jcrane · February 7, 2025, 5:38pm

Thanks - using this code was a success, and it showed that there was also some metadata along with these geo shapes that was causing Pipeline Builder to show zero rows.

We are able to progress, thanks for sharing those docs.

arukavina · February 7, 2025, 6:14pm

I’m glad it worked.

Feel free to close this thread by marking the right solution so others can benefit from it as well.

All the best,
Andrei.-
