
MiraData
A large-scale, long-video dataset with structured captions
- Long video duration: Clips average 72 seconds, long enough to capture complete video content.
- Structured captions: Detailed descriptions written from several perspectives, averaging 318 words each.
- Dataset versions: Four versions of the dataset are published, each containing a different amount of data.
- Video collection and annotation: Channels are manually selected from platforms such as YouTube, and their videos are downloaded and split into clips.
- Caption generation: GPT-4V generates multi-dimensional captions to improve the accuracy of semantic understanding.
- Evaluation criteria: 17 evaluation metrics assess long-video generation from six perspectives.
- License: The dataset is provided for informational purposes only; copyright belongs to the original video owners.
Product Details
MiraData is a large-scale video dataset focused on long video clips, with an average duration of 72 seconds. Each clip comes with a structured caption averaging 318 words, which gives a richer description of the video content. Captions are generated with technologies such as GPT-4V, aiming for high accuracy and semantic coherence in video understanding and caption generation.
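To make the two headline statistics concrete, here is a minimal Python sketch of how average clip duration and average caption length could be computed over a set of records. The `ClipRecord` class, its field names, the perspective keys ("main", "camera"), and the toy values are all assumptions made for illustration; they are not MiraData's actual schema or data.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List


@dataclass
class ClipRecord:
    """Hypothetical record for one clip; field names are illustrative only."""
    clip_id: str
    duration_s: float
    # Maps a perspective name (e.g. "camera") to its caption text.
    captions: Dict[str, str] = field(default_factory=dict)

    def caption_word_count(self) -> int:
        # Count whitespace-separated words across all perspectives.
        return sum(len(text.split()) for text in self.captions.values())


def summarize(records: List[ClipRecord]) -> Dict[str, float]:
    """Return mean clip duration (seconds) and mean caption length (words)."""
    if not records:
        raise ValueError("records must not be empty")
    return {
        "mean_duration_s": mean(r.duration_s for r in records),
        "mean_caption_words": mean(r.caption_word_count() for r in records),
    }


# Toy records, invented for this example (not taken from the dataset).
records = [
    ClipRecord("a", 60.0, {"main": "a dog runs", "camera": "slow pan left"}),
    ClipRecord("b", 84.0, {"main": "two people talk"}),
]
summary = summarize(records)
print(summary)
```

Keeping each perspective as a separate entry in `captions` means the per-clip word count can be taken over all perspectives or over a single one without changing the record layout.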