
Mistral-Nemo-Instruct-2407
A large language model trained on multilingual and code data
- Trained on multilingual and code data, making it well suited to multilingual environments
- 128k context window, allowing it to process very long inputs
- Model architecture: 40 layers, a model dimension of 5120, a head dimension of 128, and a hidden (FFN) dimension of 14,336, providing strong text processing capabilities
- Strong results on benchmarks such as HellaSwag, Winogrande, and OpenBookQA
- Supported by three frameworks: mistral_inference, transformers, and NeMo (see the transformers sketch after this list)
- Can be run interactively via the mistral-chat CLI command
- Supports function calling, e.g. to fetch the current weather or other external information
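
As a rough illustration of the transformers route mentioned above, here is a minimal sketch that loads the instruct checkpoint and runs a single chat turn. It assumes the Hugging Face model id is `mistralai/Mistral-Nemo-Instruct-2407` and that an installed transformers release supports Mistral-NeMo; a low sampling temperature (around 0.3) is commonly recommended for this model.

```python
# Minimal sketch: chat-style generation via the transformers framework.
# Assumes the model id below and a transformers version with Mistral-NeMo support.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-Nemo-Instruct-2407"  # assumed Hugging Face id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # roughly 24 GB of weights in bf16
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Summarize the benefits of a 128k context window."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.3)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```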
Product Details
Mistral-Nemo-Instruct-2407 is a large language model (LLM) trained jointly by Mistral AI and NVIDIA; it is the instruct fine-tuned version of Mistral-Nemo-Base-2407. The model was trained on multilingual and code data and significantly outperforms existing models of similar or smaller size. Its main features include multilingual and code training data, a 128k context window, and the ability to serve as a drop-in replacement for Mistral 7B. The architecture comprises 40 layers, a model dimension of 5120, a head dimension of 128, a hidden (FFN) dimension of 14,336, 32 attention heads, 8 KV heads (grouped-query attention, GQA), a vocabulary of 2^17 (approximately 128k) tokens, and rotary embeddings (theta = 1M). The model performs well on a range of benchmarks, such as HellaSwag (0-shot), Winogrande (0-shot), and OpenBookQA (0-shot).
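
For quick reference, the architecture parameters listed above can be collected into a small configuration map. The field names below only loosely mirror Hugging Face's MistralConfig and are an illustration, not the authoritative config.json shipped with the model.

```python
# Architecture parameters as described above, collected for reference.
# Field names loosely follow Hugging Face's MistralConfig; the authoritative
# values live in the model repository's config.json.
MISTRAL_NEMO_ARCHITECTURE = {
    "num_hidden_layers": 40,
    "hidden_size": 5120,                 # model dimension
    "head_dim": 128,
    "intermediate_size": 14_336,         # hidden (FFN) dimension
    "num_attention_heads": 32,
    "num_key_value_heads": 8,            # grouped-query attention (GQA)
    "vocab_size": 2 ** 17,               # 131,072 tokens (~128k)
    "rope_theta": 1_000_000.0,           # rotary embeddings, theta = 1M
    "max_position_embeddings": 128_000,  # 128k context window (as stated above)
}
```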