DateTime column in dataset turns to struct in Pipeline Builder, builds fail

Multiple points of weirdness.

  • I’ve uploaded a raw dataset from a CSV. When I view the dataset, I see a lastRefresh column with DateTime type. Values are rendered like 2024-08-27T18:22:14.800Z. However, when I build a pipeline off this raw dataset, as soon as I apply the first transform, lastRefresh appears as a Struct in the preview, with fields “timestamp” and “offset”. This happens even though the transform doesn’t touch the lastRefresh column.
  • When I ignore the type change and extract the datetime value using the “Get Struct Field” transform, my deployment fails with the error org.apache.spark.sql.AnalysisException: [INVALID_EXTRACT_BASE_FIELD_TYPE] Can't extract a value from "lastRefresh". Need a complex type [STRUCT, ARRAY, MAP] but got "STRUCT<timestamp: TIMESTAMP, offset: INT>". This error seems self-contradictory: it asks for a STRUCT and then complains that it got one.

Any idea what’s going on?

Hello! What do the values look like before you upload them to Foundry?

The CSV values are exactly as they appear in Foundry, for example 2024-08-27T18:22:14.800Z

Can you try “extract many struct fields”?

For future reference: we found a workaround, which is to cast the datetime column to a string immediately and then do whatever you need with the string inside Pipeline Builder. We’ll track the bug in a GH issue for a long-term fix :)
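
In case it helps anyone who uses the string workaround and later needs a real timestamp again outside Pipeline Builder, here is a minimal sketch in plain Python. It assumes the string keeps the same shape as the CSV values in this thread (for example 2024-08-27T18:22:14.800Z). The function name parse_last_refresh is hypothetical, and this is standard-library code, not a Pipeline Builder or Foundry API.

```python
from datetime import datetime, timezone

# Format of the values shown in this thread: ISO 8601 with milliseconds
# and a trailing "Z" (UTC). Assumption: every value follows this shape.
LAST_REFRESH_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def parse_last_refresh(value: str) -> datetime:
    """Parse a lastRefresh string into a timezone-aware datetime.

    Hypothetical helper for illustration only. strptime's %z directive
    accepts a literal "Z" as UTC on Python 3.7+.
    """
    return datetime.strptime(value, LAST_REFRESH_FORMAT)


# Example using the value quoted earlier in the thread.
parsed = parse_last_refresh("2024-08-27T18:22:14.800Z")
```

Values that lack the fractional seconds or the trailing Z would raise a ValueError with this format string, so it is worth checking a sample of the column first.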
