I have a dataset that is the sync output of an SFTP crawler. I only want to process certain files, and I have an auxiliary row-level dataset that contains the paths of the files I want to process.
Normally, an inner join on the file path would do the trick; however, calling dataframe.filesystem().files() and then joining throws an "object does not have an attribute _jdf" error.
How does one do this?