
SenseVoice
A multilingual speech understanding model that provides high-precision speech recognition and emotion recognition.
- Automatic Speech Recognition (ASR): Supports high-precision speech recognition for over 50 languages.
- Spoken Language Identification (LID): Identifies and distinguishes between the languages spoken in the audio.
- Speech Emotion Recognition (SER): Outperforms the current best emotion recognition models on test data.
- Audio Event Detection (AED): Detects common human-computer interaction events such as background music, applause, and laughter.
- Efficient inference: The SenseVoice Small model processes 10 seconds of audio in just 70 milliseconds.
- Convenient fine-tuning support: Provides fine-tuning scripts and strategies so users can adapt the model to their business scenarios.
- Service deployment support: Handles multiple concurrent requests, offers clients in several programming languages, and integrates easily into different platforms.
Product Details
SenseVoice is a speech-based model that combines several speech understanding capabilities: automatic speech recognition (ASR), spoken language identification (LID), speech emotion recognition (SER), and audio event detection (AED). It focuses on high-precision multilingual speech recognition, speech emotion recognition, and audio event detection, supporting over 50 languages and surpassing the Whisper model in recognition performance. The model adopts a non-autoregressive end-to-end framework with extremely low inference latency, which makes it well suited to real-time speech processing.
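
To put the latency figure in context, the 70 milliseconds for 10 seconds of audio quoted for SenseVoice Small can be expressed as a real-time factor (processing time divided by audio duration). The short sketch below only does that arithmetic; the function name `real_time_factor` is an illustrative helper, not part of SenseVoice or any library, and the numbers are the ones stated above rather than a new measurement.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Figures quoted in the description: 70 ms to process 10 s of audio.
rtf = real_time_factor(0.070, 10.0)
speedup = 1.0 / rtf

print(f"RTF = {rtf:.3f}, about {speedup:.0f}x faster than real time")
# prints: RTF = 0.007, about 143x faster than real time
```

A real-time factor well below 1 means the model finishes long before the audio would finish playing, which is the property that matters for streaming or interactive use. Actual throughput will depend on hardware, batch size, and audio length.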