haavard
February 7, 2025, 12:41pm
1
Hello,
We are working with videos in mp4 format stored in media sets and want to process them using multimodal LLMs. However, the video files can be quite large, and we want to downscale them (e.g. 4k β 720p) quality to speed up the process (network transfer etc.).
Do media sets provide an API for downscaling videos natively, or do we need to manually invoke e.g. ffmpeg using Python transforms to accomplish this?
leon
February 7, 2025, 4:39pm
2
Hi, thank you for your interest in using media sets for this workflow!
Unfortunately, this isnβt a workflow that we natively support. Using a library like ffmpeg to perform the conversion is likely the best approach.
haavard
February 9, 2025, 12:44pm
3
Sharing my solution here for anyone finding this question.
Make sure to add ffmpeg
to dependencies:
This transform will incrementally downscale input mp4 videos to 360p, distributing the work to executors.
from transforms.api import transform, Output, incremental
from transforms.mediasets import MediaSetInput, MediaSetOutput
from pyspark.sql import functions as F
import pathlib
import shutil
import subprocess
import tempfile
@incremental(v2_semantics=True)
@transform(
video_files=MediaSetInput(
"ri.mio.main.media-set.123456789"
),
output_videos=MediaSetOutput(
"ri.mio.main.media-set.987654321"
),
output=Output("ri.foundry.main.dataset.abcdefg"),
)
def compute(ctx, video_files, output_videos, output):
@F.udf
def downscale(path):
with tempfile.TemporaryDirectory() as temp_dir:
working_dir = pathlib.Path(temp_dir)
input_path = working_dir / "input.mp4"
output_path = working_dir / "output.mp4"
# copy video from mediaset to local temp dir
with (
video_files.get_media_item_by_path(path) as video_file,
open(input_path, "wb") as local_input_file,
):
shutil.copyfileobj(video_file, local_input_file)
# invoke ffmpeg to downscale to 360p
command = [
"ffmpeg",
"-i",
input_path,
"-vf",
"scale=w=640:h=360:force_original_aspect_ratio=decrease:force_divisible_by=2",
"-preset",
"fast",
"-crf",
"28",
output_path,
]
subprocess.run(command, check=True)
# upload downscaled video to output mediaset
with open(output_path, "rb") as downscaled_video:
return output_videos.put_media_item(
downscaled_video, path
).media_item_rid
# write an output dataset that maps original media items to downscaled media items
output.write_dataframe(
video_files.list_media_items_by_path(ctx)
.repartition(16) # divide input in 16 tasks for parallelism
.withColumn("downscaled_media_item_rid", downscale("path"))
)
1 Like
system
Closed
February 23, 2025, 12:45pm
4
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.