I’m ingesting PDF files from SharePoint into a dataset.
I see SharePoint is not supported there, so what’s the best way to convert a dataset to a media set, preferably without code?
How can I convert this dataset of files to a media set?
Hey, this isn’t currently supported: there is no no-code way to do this in Pipeline Builder.
I’ll let the media set folks comment on any workarounds they can think of.
Hi, there isn’t a no-code way to do this at the moment.
The workaround is to use the method in the Python SDK that takes the files in a dataset and uploads them to a media set.
I ran into this earlier today. My dataset had a few xlsx files that I had to exclude to prevent exceptions.
> from transforms.api import transform, Input
> from transforms.mediasets import MediaSetOutput
>
>
> @transform(
>     # Placeholder paths: replace with your own dataset and media set paths.
>     pdfs=Input('/input_dataset_path'),
>     output_pdfs=MediaSetOutput('/output_mediaset_path')
> )
> def convert_dataset_to_mediaset(ctx, pdfs, output_pdfs):
>
>     def process_file(fs):
>         # Skip xlsx files before opening them, so only PDFs are uploaded.
>         if 'xlsx' not in fs.path:
>             with pdfs.filesystem().open(fs.path, 'rb') as f:
>                 output_pdfs.put_media_item(f, fs.path)
>
>     # Upload every remaining file in the input dataset to the media set.
>     pdfs.filesystem().files().foreach(process_file)
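
One caveat on the snippet above: the check `'xlsx' not in fs.path` matches anywhere in the path, so it would also skip a PDF stored under a folder such as `xlsx_exports/`, and it would let through any other non-PDF file. Here is a minimal sketch of a stricter filter, assuming you only want files whose extension is `.pdf`. Note that `is_pdf` is a hypothetical helper written for illustration, not part of the transforms API.

```python
from pathlib import PurePosixPath


def is_pdf(path: str) -> bool:
    """Return True when the path's final extension is .pdf (case-insensitive).

    Hypothetical helper: it only inspects the file name, not the file contents.
    """
    return PurePosixPath(path).suffix.lower() == ".pdf"
```

If this fits your data, the condition in `process_file` could become `if is_pdf(fs.path):`. It only looks at the extension, so a mislabelled file would still reach the media set and might still raise.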