
Mistral-Nemo-Instruct-2407
A large language model trained on multilingual and code data
- Trained on multilingual and code data, making it well suited to multilingual environments
- 128k context window, allowing it to process very long inputs
- Model architecture: 40 layers, a model dimension of 5120, a head dimension of 128, and a hidden (FFN) dimension of 14,336, providing strong text processing capabilities
- Strong results on benchmarks such as HellaSwag, Winogrande, and OpenBookQA
- Supported by three frameworks: mistral_inference, transformers, and NeMo (see the transformers sketch after this list)
- Can be run interactively via the mistral-chat CLI command
- Supports function calling, e.g. to fetch the current weather or other external information
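
As a rough illustration of the transformers route mentioned above, here is a minimal sketch that loads the instruct checkpoint and runs a single chat turn. It assumes the Hugging Face model id is `mistralai/Mistral-Nemo-Instruct-2407` and that an installed transformers release supports Mistral-NeMo; a low sampling temperature (around 0.3) is commonly recommended for this model.

```python
# Minimal sketch: chat-style generation via the transformers framework.
# Assumes the model id below and a transformers version with Mistral-NeMo support.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-Nemo-Instruct-2407"  # assumed Hugging Face id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # roughly 24 GB of weights in bf16
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Summarize the benefits of a 128k context window."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.3)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```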
Product Details
Mistral-Nemo-Instruct-2407 is a large language model (LLM) trained jointly by Mistral AI and NVIDIA; it is the instruct fine-tuned version of Mistral-Nemo-Base-2407. The model was trained on multilingual and code data and significantly outperforms existing models of similar or smaller size. Its main features include multilingual and code training data, a 128k context window, and the ability to serve as a drop-in replacement for Mistral 7B. The architecture comprises 40 layers, a model dimension of 5120, a head dimension of 128, a hidden (FFN) dimension of 14,336, 32 attention heads, 8 KV heads (grouped-query attention, GQA), a vocabulary of 2^17 (approximately 128k) tokens, and rotary embeddings (theta = 1M). The model performs well on a range of benchmarks, such as HellaSwag (0-shot), Winogrande (0-shot), and OpenBookQA (0-shot).
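
For quick reference, the architecture parameters listed above can be collected into a small configuration map. The field names below only loosely mirror Hugging Face's MistralConfig and are an illustration, not the authoritative config.json shipped with the model.

```python
# Architecture parameters as described above, collected for reference.
# Field names loosely follow Hugging Face's MistralConfig; the authoritative
# values live in the model repository's config.json.
MISTRAL_NEMO_ARCHITECTURE = {
    "num_hidden_layers": 40,
    "hidden_size": 5120,                 # model dimension
    "head_dim": 128,
    "intermediate_size": 14_336,         # hidden (FFN) dimension
    "num_attention_heads": 32,
    "num_key_value_heads": 8,            # grouped-query attention (GQA)
    "vocab_size": 2 ** 17,               # 131,072 tokens (~128k)
    "rope_theta": 1_000_000.0,           # rotary embeddings, theta = 1M
    "max_position_embeddings": 128_000,  # 128k context window (as stated above)
}
```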