From 7afcaa8219a6e507d006d3bc9dc8586a53076cb6 Mon Sep 17 00:00:00 2001
From: Samuel Teodoro <91779432+sateodoro@users.noreply.github.com>
Date: Fri, 20 Dec 2024 16:16:34 +0900
Subject: [PATCH] Update index.html

---
 index.html | 96 +++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 85 insertions(+), 11 deletions(-)

diff --git a/index.html b/index.html
index 45992aa..7519d82 100644
--- a/index.html
+++ b/index.html
@@ -247,8 +247,76 @@
+ We also present our new MIVE Dataset, specifically designed for multi-instance video editing tasks.
+ The MIVE Dataset features 200 diverse videos sourced from the VIPSeg dataset.
+ We generated and summarized the source captions using LLaVA and Llama 3, respectively.
+ We then manually inserted tags into the source captions to establish instance-to-mask correspondence.
+ Finally, we generated the target edit captions using Llama 3.
+ We show a sample input video with its source and target captions below.
+ The target instance captions are color-coded to match the colors of the masks.
+ | Source Caption: In a domestic setting, a person in a gray hoodie stands in front of washing machine A and washing machine B against a blue wall, with a blue recycling trash can to the left.<br>Source Video: | Target Caption: In a domestic setting, an alien stands in front of oven and yellow washing machine against a blue wall, with a blue recycling trash can to the left.<br>Masked Source Video: |
+ |---|---|
+ | (source video) | (masked source video) |
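To make the tag-based instance-to-mask correspondence concrete, here is a minimal Python sketch of how the sample above could be represented. The class and field names (`MIVESample`, `InstanceEdit`, `mask_id`), the numeric mask IDs, and the source/target phrase pairing are illustrative assumptions, not the released MIVE Dataset schema or tag syntax.

```python
# Minimal sketch (assumed, not the released MIVE format) of one dataset entry
# encoding instance-to-mask correspondence for the sample shown above.
from dataclasses import dataclass, field


@dataclass
class InstanceEdit:
    mask_id: int          # VIPSeg instance mask this pair refers to (assumed ID scheme)
    source_phrase: str    # phrase tagged in the source caption
    target_phrase: str    # corresponding phrase in the target edit caption


@dataclass
class MIVESample:
    video_id: str
    source_caption: str   # scene caption (LLaVA captions summarized by Llama 3)
    target_caption: str   # edit caption generated by Llama 3
    instances: list = field(default_factory=list)


# The sample above, expressed in this hypothetical structure.
# Phrase pairings are inferred from the order in which instances appear.
sample = MIVESample(
    video_id="vipseg_example",  # placeholder ID
    source_caption=(
        "In a domestic setting, a person in a gray hoodie stands in front of "
        "washing machine A and washing machine B against a blue wall, "
        "with a blue recycling trash can to the left."
    ),
    target_caption=(
        "In a domestic setting, an alien stands in front of oven and yellow "
        "washing machine against a blue wall, with a blue recycling trash can to the left."
    ),
    instances=[
        InstanceEdit(1, "a person in a gray hoodie", "an alien"),
        InstanceEdit(2, "washing machine A", "oven"),
        InstanceEdit(3, "washing machine B", "yellow washing machine"),
    ],
)

for edit in sample.instances:
    print(f"mask {edit.mask_id}: '{edit.source_phrase}' -> '{edit.target_phrase}'")
```

In the dataset itself, the correspondence tags are embedded directly in the source caption text; the flattened phrase-pair list above is just one convenient way such tags could be consumed downstream.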
@article{teodoro2024mive,
+ title={MIVE: New Design and Benchmark for Multi-Instance Video Editing},
+ author={Samuel Teodoro and Agus Gunawan and Soo Ye Kim and Jihyong Oh and Munchurl Kim},
+ journal={arXiv preprint arXiv:2412.12877},
+ year={2024}
+}
+