Hey, I’m trying to convert .eml files into PDF media references using a dynamic UDF. I don’t want to create an output dataset; I want to keep the .eml files so I can still use the transform “Extract rows from a dataset of email files”. Is it possible to build a dynamic UDF in a code repo and use it in Pipeline Builder?
What do you mean by you don’t want to create a dataset?
As for writing a UDF for use in Pipeline Builder, you can build a Python function that acts as a UDF there (docs), or you can write a Java UDF that can be used in Pipeline Builder.
Thanks for your response. Let me explain better: I am trying to build a Python UDF that, from within Pipeline Builder, takes an input dataset full of .eml files and outputs PDFs that can then be used as media references in my dataset (which also backs an ontology object). The problem I am running into is that there doesn’t seem to be a way to have a UDF that takes unstructured data (.eml files) as input and produces unstructured data (PDF files) as output. The workaround I found is a transform repo that defines the inputs and outputs, but I would like to handle everything dynamically within Pipeline Builder. Is my analysis or strategy wrong?
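To make the conversion step concrete, here is a minimal sketch of the parsing half only, assuming the function receives the raw bytes of one .eml file. It uses only Python's standard `email` package; `extract_email_fields` is a hypothetical helper name, not a Foundry API. Rendering the PDF and registering it as a media reference are left out, since those depend on platform APIs not covered in this thread.

```python
from email import message_from_bytes, policy


def extract_email_fields(raw: bytes) -> dict:
    """Parse raw .eml bytes into the fields a PDF renderer would need.

    Hypothetical helper for illustration; not part of any Foundry library.
    """
    # policy.default gives the modern EmailMessage API (get_body, get_content).
    msg = message_from_bytes(raw, policy=policy.default)

    # Prefer the plain-text part, falling back to HTML if that is all there is.
    body = msg.get_body(preferencelist=("plain", "html"))

    return {
        "subject": str(msg["subject"] or ""),
        "from": str(msg["from"] or ""),
        "to": str(msg["to"] or ""),
        "body": body.get_content() if body is not None else "",
    }
```

The returned dictionary could then be handed to whichever PDF library is available in the environment, while the original .eml files stay untouched in the input dataset.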
Based on your problem statement, you can use a dynamic UDF. However, please note that Pipeline Builder requires an output. Keep your original EML files safe. For converting EML files, I recommend the Advik EML to PDF Converter program. This utility converts EML files to PDF quickly while maintaining their integrity. The EML files remain unchanged, conversions can be done quickly and in bulk, and, most importantly, it’s easy to use. I have used it myself and got good results.