
Mistral-Nemo-Base-2407
A Large Language Model with 12B Parameters
- Supports text generation across multiple languages and code
- Trained with a 128k context window for stronger long-text comprehension and generation
- Available in pre-trained (base) and instruction-tuned versions for different application requirements
- Released under the Apache 2.0 license for flexible use
- Model architecture: 40 layers, 5120 model dimension, 128 head dimension
- Performs strongly on multiple benchmarks, such as HellaSwag and Winogrande
- Usable with multiple frameworks, such as mistral_inference, transformers, and NeMo (see the example below)
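For example, loading the base model with the Hugging Face transformers library might look like the following minimal sketch. The mistralai/Mistral-Nemo-Base-2407 repository id and the generation settings are assumptions; adjust them for your environment.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository id assumed from the model name; adjust if the hub id differs.
model_id = "mistralai/Mistral-Nemo-Base-2407"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # load in the checkpoint's native precision
    device_map="auto",   # spread layers over available GPUs (requires accelerate)
)

prompt = "Mistral NeMo is a 12B-parameter model that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```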
Product Details
Mistral-Nemo-Base-2407 is a large pre-trained generative text model with 12B parameters, trained jointly by Mistral AI and NVIDIA. It was trained on multilingual and code data and significantly outperforms existing models of the same or smaller size. Its main features include: release under the Apache 2.0 license, availability in pre-trained (base) and instruction-tuned versions, training with a 128k context window, support for multiple languages and code data, and serving as a drop-in replacement for Mistral 7B. The architecture comprises 40 layers, a model dimension of 5120, a head dimension of 128, a hidden (feed-forward) dimension of 14,336, 32 attention heads, 8 KV heads (GQA), a vocabulary of approximately 128k, and rotary embeddings (theta = 1M). The model performs well on multiple benchmarks, such as HellaSwag, Winogrande, and OpenBookQA.
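As a rough sanity check on the 12B figure, the listed hyperparameters can be combined into a back-of-the-envelope parameter count. The sketch below assumes a standard Llama-style decoder block (grouped-query attention plus a SwiGLU feed-forward) with untied input/output embeddings, and ignores norm weights and biases; these structural assumptions are not stated above.

```python
# Back-of-the-envelope parameter count from the listed hyperparameters.
# Assumes a Llama-style decoder: GQA attention + SwiGLU MLP, untied
# input/output embeddings; norm weights and biases are ignored.
n_layers, d_model, head_dim = 40, 5120, 128
n_heads, n_kv_heads = 32, 8
d_ff = 14336           # hidden (feed-forward) dimension
vocab = 128_000        # "vocabulary of approximately 128k"

attn = d_model * (n_heads * head_dim)           # W_q
attn += 2 * d_model * (n_kv_heads * head_dim)   # W_k, W_v (shared across GQA groups)
attn += (n_heads * head_dim) * d_model          # W_o

mlp = 3 * d_model * d_ff                        # gate, up, and down projections

per_layer = attn + mlp
embeddings = 2 * vocab * d_model                # input embedding + output head

total = n_layers * per_layer + embeddings
print(f"~{total / 1e9:.1f}B parameters")        # ~12.2B
```

Under these assumptions the estimate lands near 12.2B, consistent with the advertised 12B parameter count.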