
Mistral-Nemo-Base-2407
A Large Language Model with 12B Parameters
- Supports text generation across multiple languages and code
- Trained with a 128k context window for stronger long-text comprehension and generation
- Available in pre-trained (base) and instruction-tuned versions for different application requirements
- Released under the Apache 2.0 license for flexible use
- Model architecture: 40 layers, 5120 model dimension, 128 head dimension
- Performs strongly on multiple benchmarks, such as HellaSwag and Winogrande
- Usable with multiple frameworks, such as mistral_inference, transformers, and NeMo (see the example below)
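For example, loading the base model with the Hugging Face transformers library might look like the following minimal sketch. The mistralai/Mistral-Nemo-Base-2407 repository id and the generation settings are assumptions; adjust them for your environment.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository id assumed from the model name; adjust if the hub id differs.
model_id = "mistralai/Mistral-Nemo-Base-2407"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # load in the checkpoint's native precision
    device_map="auto",   # spread layers over available GPUs (requires accelerate)
)

prompt = "Mistral NeMo is a 12B-parameter model that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```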
Product Details
Mistral-Nemo-Base-2407 is a large pre-trained generative text model with 12B parameters, trained jointly by Mistral AI and NVIDIA. It was trained on multilingual and code data and significantly outperforms existing models of the same or smaller size. Its main features include: release under the Apache 2.0 license, availability in pre-trained (base) and instruction-tuned versions, training with a 128k context window, support for multiple languages and code data, and serving as a drop-in replacement for Mistral 7B. The architecture comprises 40 layers, a model dimension of 5120, a head dimension of 128, a hidden (feed-forward) dimension of 14,336, 32 attention heads, 8 KV heads (GQA), a vocabulary of approximately 128k, and rotary embeddings (theta = 1M). The model performs well on multiple benchmarks, such as HellaSwag, Winogrande, and OpenBookQA.
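As a rough sanity check on the 12B figure, the listed hyperparameters can be combined into a back-of-the-envelope parameter count. The sketch below assumes a standard Llama-style decoder block (grouped-query attention plus a SwiGLU feed-forward) with untied input/output embeddings, and ignores norm weights and biases; these structural assumptions are not stated above.

```python
# Back-of-the-envelope parameter count from the listed hyperparameters.
# Assumes a Llama-style decoder: GQA attention + SwiGLU MLP, untied
# input/output embeddings; norm weights and biases are ignored.
n_layers, d_model, head_dim = 40, 5120, 128
n_heads, n_kv_heads = 32, 8
d_ff = 14336           # hidden (feed-forward) dimension
vocab = 128_000        # "vocabulary of approximately 128k"

attn = d_model * (n_heads * head_dim)           # W_q
attn += 2 * d_model * (n_kv_heads * head_dim)   # W_k, W_v (shared across GQA groups)
attn += (n_heads * head_dim) * d_model          # W_o

mlp = 3 * d_model * d_ff                        # gate, up, and down projections

per_layer = attn + mlp
embeddings = 2 * vocab * d_model                # input embedding + output head

total = n_layers * per_layer + embeddings
print(f"~{total / 1e9:.1f}B parameters")        # ~12.2B
```

Under these assumptions the estimate lands near 12.2B, consistent with the advertised 12B parameter count.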