Downscaling videos in mediasets

Hello,

We are working with MP4 videos stored in media sets and want to process them using multimodal LLMs. However, the video files can be quite large, so we want to downscale them (e.g. 4K → 720p) to speed up processing (network transfer, etc.).

Do media sets provide an API for downscaling videos natively, or do we need to manually invoke e.g. ffmpeg using Python transforms to accomplish this?

Hi, thank you for your interest in using media sets for this workflow!

Unfortunately, this isn't a workflow that we natively support. Using a library like ffmpeg to perform the conversion is likely the best approach.
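For the 4K to 720p case in the question, a minimal local sketch of such an ffmpeg call could look like the following. The helper names `build_downscale_command` and `downscale_file` are hypothetical, and the settings are illustrative rather than recommended. `scale=-2:720` asks ffmpeg for a height of 720 and a width that keeps the aspect ratio and is divisible by 2.

```python
import subprocess
from pathlib import Path


def build_downscale_command(src, dst, height=720):
    """Build an ffmpeg argument list (hypothetical helper, not a media set API)."""
    return [
        "ffmpeg",
        "-i", str(src),
        # -2 lets ffmpeg pick a width that keeps the aspect ratio and is even
        "-vf", f"scale=-2:{height}",
        # copy the audio stream unchanged instead of re-encoding it
        "-c:a", "copy",
        str(dst),
    ]


def downscale_file(src, dst, height=720):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_downscale_command(Path(src), Path(dst), height), check=True)
```

This only covers a file on local disk; reading from and writing to a media set still has to happen around it.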

Sharing my solution here for anyone finding this question.

Make sure to add ffmpeg to the dependencies, since the transform below calls the ffmpeg binary directly.
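One way to confirm that the dependency actually made it into the runtime environment is to look the binary up on PATH before invoking it. This is only a sketch; `require_binary` is a hypothetical helper name and is not part of the transforms API.

```python
import shutil


def require_binary(name="ffmpeg"):
    """Return the full path of `name`, or raise if it is not on PATH."""
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(
            f"{name!r} was not found on PATH; check the environment's dependencies"
        )
    return path
```

Calling `require_binary()` at the start of the processing function would fail fast with a clearer message than the `FileNotFoundError` that `subprocess.run` raises when the executable is missing.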

This transform will incrementally downscale input mp4 videos to 360p, distributing the work to executors.

from transforms.api import transform, Output, incremental
from transforms.mediasets import MediaSetInput, MediaSetOutput
from pyspark.sql import functions as F

import pathlib
import shutil
import subprocess
import tempfile


@incremental(v2_semantics=True)
@transform(
    video_files=MediaSetInput(
        "ri.mio.main.media-set.123456789"
    ),
    output_videos=MediaSetOutput(
        "ri.mio.main.media-set.987654321"
    ),
    output=Output("ri.foundry.main.dataset.abcdefg"),
)
def compute(ctx, video_files, output_videos, output):
    @F.udf  # default UDF return type is string, which fits the media item RID
    def downscale(path):
        with tempfile.TemporaryDirectory() as temp_dir:
            working_dir = pathlib.Path(temp_dir)
            input_path = working_dir / "input.mp4"
            output_path = working_dir / "output.mp4"

            # copy video from mediaset to local temp dir
            # (the parenthesized multi-item `with` below needs Python 3.10+)
            with (
                video_files.get_media_item_by_path(path) as video_file,
                open(input_path, "wb") as local_input_file,
            ):
                shutil.copyfileobj(video_file, local_input_file)

            # invoke ffmpeg to downscale to 360p
            command = [
                "ffmpeg",
                "-i",
                input_path,
                "-vf",
                "scale=w=640:h=360:force_original_aspect_ratio=decrease:force_divisible_by=2",
                "-preset",
                "fast",
                "-crf",
                "28",
                output_path,
            ]
            subprocess.run(command, check=True)

            # upload downscaled video to output mediaset
            with open(output_path, "rb") as downscaled_video:
                return output_videos.put_media_item(
                    downscaled_video, path
                ).media_item_rid

    # write an output dataset that maps original media items to downscaled media items
    output.write_dataframe(
        video_files.list_media_items_by_path(ctx)
        .repartition(16)  # divide input in 16 tasks for parallelism
        .withColumn("downscaled_media_item_rid", downscale("path"))
    )
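
To get a feel for what the scale filter in the transform above produces, here is a rough Python model of `force_original_aspect_ratio=decrease` combined with `force_divisible_by=2`, as I understand the ffmpeg documentation. The function names are hypothetical and the rounding is an approximation, so actual ffmpeg output may differ by a pixel or so.

```python
def rescale(a, b, c):
    """Integer a * b / c rounded to nearest, with halves rounded up (positive inputs)."""
    return (a * b + c // 2) // c


def fit_decrease(in_w, in_h, box_w=640, box_h=360, divisible_by=2):
    """Approximate output size when fitting an input inside a box_w x box_h box."""
    # Width implied by the box height at the input's aspect ratio, capped at box_w
    w = min(rescale(box_h, in_w, in_h), box_w)
    # Height implied by the box width at the input's aspect ratio, capped at box_h
    h = min(rescale(box_w, in_h, in_w), box_h)
    # Round each dimension down to a multiple of `divisible_by`
    w = w // divisible_by * divisible_by
    h = h // divisible_by * divisible_by
    return w, h
```

Under this model a 3840x2160 input maps to 640x360, while a portrait 1080x1920 input maps to 202x360, so the result is not always exactly 640 wide. An input smaller than the box would be scaled up to fit it, which is worth checking if the media set contains low-resolution videos.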