Multiple points of weirdness.
- I’ve uploaded a raw dataset from a CSV. When I view the dataset, I see a `lastRefresh` column with `DateTime` type. Values are rendered like `2024-08-27T18:22:14.800Z`. However, when I build a pipeline off this raw dataset, as soon as I apply the first transform, `lastRefresh` appears as a Struct in the preview with fields “timestamp” and “offset”. This happens even though the transform doesn’t touch the `lastRefresh` column.
- When I ignore the type change and extract the datetime value using the “Get Struct Field” transform, my deployment fails with the error `org.apache.spark.sql.AnalysisException: [INVALID_EXTRACT_BASE_FIELD_TYPE] Can't extract a value from "lastRefresh". Need a complex type [STRUCT, ARRAY, MAP] but got "STRUCT<timestamp: TIMESTAMP, offset: INT>"`. This error seems self-contradictory.
Any idea what’s going on?