Set parseUnescapedQuotes to true in CSV parser

thien · August 5, 2024, 11:40pm

I got the following error when trying to do a basic identity transform in Pipeline Builder:

Unescaped quote character '"' inside quoted value of CSV field. To allow unescaped quotes, set 'parseUnescapedQuotes' to 'true' in the CSV parser settings. Cannot parse CSV input

How do I actually set this setting?

helenq · August 6, 2024, 10:42am

I think you have to fix this on the dataset itself: go to Details > Schema, then manually edit the schema JSON to include parseUnescapedQuotes: true.

sandpiper · August 7, 2024, 7:23am

I believe that parseUnescapedQuotes is an option of the underlying Univocity library used in TextDataFrameReader, but it’s not actually exposed by TextDataFrameReader per the docs.

You should be able to define the behavior you want by using DataSourceDataFrameReader instead, which lets you pass options through to the Spark CSV reader, including any of the values it supports for the unescapedQuoteHandling option (see the relevant Spark CSV docs).
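
For reference, here is a rough sketch of what those options look like with the plain Spark CSV reader (Spark 3.1 or later). I'm not showing the DataSourceDataFrameReader call itself since I don't want to guess at its exact signature; csv_options and read_csv are helper names I made up for illustration, and spark and path are whatever SparkSession and file location you have available.

```python
# Values accepted by Spark's CSV option `unescapedQuoteHandling` (Spark 3.1+).
# Spark's default is STOP_AT_DELIMITER; RAISE_ERROR fails on the first bad value.
UNESCAPED_QUOTE_MODES = {
    "STOP_AT_CLOSING_QUOTE",  # keep the stray quote, read on until the closing quote
    "BACK_TO_DELIMITER",      # treat the value as if it had been unquoted
    "STOP_AT_DELIMITER",      # treat as unquoted, read until the next delimiter
    "SKIP_VALUE",             # drop the value and substitute the null value
    "RAISE_ERROR",            # throw a parsing exception
}


def csv_options(unescaped_quote_handling="STOP_AT_CLOSING_QUOTE", header=True):
    """Build the option dict to hand to a Spark CSV reader (hypothetical helper)."""
    mode = unescaped_quote_handling.upper()
    if mode not in UNESCAPED_QUOTE_MODES:
        raise ValueError(
            f"unknown unescapedQuoteHandling mode: {unescaped_quote_handling}"
        )
    return {
        "header": "true" if header else "false",
        "unescapedQuoteHandling": mode,
    }


def read_csv(spark, path, **kwargs):
    """Read CSV files at `path` with plain Spark; `spark` is a SparkSession."""
    return spark.read.options(**csv_options(**kwargs)).csv(path)
```

Which mode is right depends on how the stray quotes show up in your data, so it is worth checking a few of the offending rows after the read. The same option names and values should carry over to whichever reader ends up forwarding them to Spark.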