
MAVIS
Mathematical Visual Instruction Tuning
- MAVIS-Caption: Contains 588K high-quality diagram-caption pairs covering geometry and functions.
- MAVIS-Instruct: Contains 834K visual math instruction-tuning problems, annotated with reasoning rationales and written in a text-lite form that minimizes redundant text in the questions.
- CLIP-Math: A vision encoder designed specifically to encode mathematical diagrams for MLLMs.
- MAVIS-7B: A math-specialized MLLM that achieves leading performance on the MathVerse benchmark, trained with a three-stage paradigm.
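One way to picture the three-stage paradigm is as a schedule that pairs each stage with a dataset and the modules it updates. The sketch below is illustrative only. The stage order, the dataset assigned to each stage, and which modules are trainable are assumptions made for this example, not confirmed details of the MAVIS training recipe. The names `Stage`, `TRAINING_SCHEDULE`, `frozen_modules`, and the module labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One step of a hypothetical MAVIS-style training schedule."""
    name: str
    dataset: str
    trainable: tuple[str, ...]  # modules updated in this stage (assumed, not official)


# Assumed schedule: stage contents are illustrative, not the published recipe.
TRAINING_SCHEDULE = [
    Stage("1: diagram visual encoding", "MAVIS-Caption", ("vision_encoder",)),
    Stage("2: diagram-language alignment", "MAVIS-Caption", ("projection",)),
    Stage("3: math instruction tuning", "MAVIS-Instruct", ("projection", "llm")),
]

ALL_MODULES = ("vision_encoder", "projection", "llm")


def frozen_modules(stage: Stage, all_modules: tuple[str, ...] = ALL_MODULES) -> tuple[str, ...]:
    """Return the modules that stay frozen during the given stage."""
    return tuple(m for m in all_modules if m not in stage.trainable)


if __name__ == "__main__":
    for stage in TRAINING_SCHEDULE:
        print(f"{stage.name}: train {stage.trainable} on {stage.dataset}, "
              f"freeze {frozen_modules(stage)}")
```

Keeping the schedule as plain data makes it easy to adjust the assumed freezing choices without touching any training loop.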
Product Details
MAVIS is a mathematical visual instruction tuning framework for multimodal large language models (MLLMs). It strengthens MLLMs' ability to solve visual math problems by improving three skills: visual encoding of mathematical diagrams, diagram-language alignment, and mathematical reasoning. The project includes two newly curated datasets, a math-specific vision encoder, and a math MLLM, which achieves leading performance on the MathVerse benchmark through a three-stage training paradigm.