Qwen2-Audio

Alibaba Cloud launches a large-scale audio language model

Support free voice interaction without text input
Capable of providing audio and text instructions for audio analysis
Excellent performance in multiple standard benchmark tests, such as ASR, S2TT, SER, etc
Two model series are about to be released: Qwen2 Audio and Qwen2 Audio Chat
Overview of the architecture of the three-stage training process
Provide all evaluation scripts to reproduce the results

Product Details

Qwen2-Audio is a large-scale audio language model proposed by Alibaba Cloud, which can accept various audio signal inputs and perform audio analysis or direct text replies based on voice commands. This model supports two different audio interaction modes: voice chat and audio analysis. It performs well in 13 standard benchmark tests, including automatic speech recognition, speech to text translation, speech emotion recognition, and more.

Qwen2-Audio

Product Details

Related Projects

Udio v1.5

Ask the little universe

Speech to Note

SpeechGPT2