How can I convert a dataset of files to a mediaset easily?

VincentF · October 28, 2024, 6:50pm

I’m ingesting PDF files from SharePoint into a dataset.

I see SharePoint is not supported as a direct source for media sets, so what’s the best way to convert a dataset to a media set, preferably in a no-code way?

How can I convert this dataset of files to a mediaset?

helenq · October 28, 2024, 8:08pm

Hey, this isn’t currently supported: there is no no-code way to do this in Pipeline Builder.

I’ll let the media set folks comment on any workarounds they can think of.

lucyw · October 28, 2024, 8:41pm

Hi, there isn’t a no-code way to do this at the moment.

The workaround is to use the method in the Python SDK that takes the files in a dataset and uploads them to a media set.

joshuam · October 28, 2024, 9:16pm

I ran into this earlier today. My dataset had a few xlsx files that I had to exclude to prevent exceptions.

> from transforms.api import transform, Input
> from transforms.mediasets import MediaSetOutput
>
>
> @transform(
>     pdfs=Input('/input_dataset_path'),
>     output_pdfs=MediaSetOutput('/output_mediaset_path')
> )
> def convert_dataset_to_mediaset(ctx, pdfs, output_pdfs):
>
>     def process_file(fs):
>         # Skip the xlsx files before opening anything; uploading them raised exceptions.
>         if 'xlsx' in fs.path:
>             return
>         # Open the raw file in binary mode and upload it under its original path.
>         with pdfs.filesystem().open(fs.path, 'rb') as f:
>             output_pdfs.put_media_item(f, fs.path)
>
>     # files() returns a DataFrame of file metadata; foreach runs process_file on each row.
>     pdfs.filesystem().files().foreach(process_file)
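One caveat with the substring check above: `'xlsx' in fs.path` also skips any PDF whose folder or file name happens to contain "xlsx". A minimal sketch of a stricter filter, assuming the media set only accepts PDFs, is below. `should_upload` and `ALLOWED_EXTENSIONS` are hypothetical names for illustration, not part of any Foundry API.

```python
from pathlib import PurePosixPath

# Hypothetical helper (not part of any Foundry API): decide by file
# extension whether a dataset file should be uploaded to the media set.
ALLOWED_EXTENSIONS = {".pdf"}  # assumption: the media set only accepts PDFs


def should_upload(path: str) -> bool:
    """Return True when the file's final extension is in ALLOWED_EXTENSIONS."""
    # suffix is the last extension including the dot, e.g. ".pdf"; compare case-insensitively.
    return PurePosixPath(path).suffix.lower() in ALLOWED_EXTENSIONS


if __name__ == "__main__":
    for p in ["reports/q1.pdf", "reports/Q2.PDF", "data/budget.xlsx", "xlsx_exports/summary.pdf"]:
        print(p, should_upload(p))
```

Inside `process_file`, the check would then read `if not should_upload(fs.path): return`, which is an allow-list rather than a deny-list, so other unexpected file types get skipped too.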