Is it possible to create a UDF for use in Pipeline Builder that takes PDFs from a dataset and outputs either PDFs in a media set or media set references? I saw the related thread linked below but am still wondering whether this is possible at all: https://community.palantir.com/t/how-to-convert-dataset-with-binary-column-into-a-mediaset/564
For context, I do almost all of my transforming in Pipeline Builder. I am revisiting a goal of extracting text from PDFs and displaying those PDFs from a media set in Workshop. I have a large backlog of older PDFs plus new ones arriving weekly, so my ultimate goal is to process new PDFs incrementally. I do not have edit access to the connection that pulls the PDFs into the dataset.