Qwen2-Audio

Qwen2-Audio

Alibaba Cloud launches a large-scale audio language model

  • Support free voice interaction without text input
  • Capable of providing audio and text instructions for audio analysis
  • Excellent performance in multiple standard benchmark tests, such as ASR, S2TT, SER, etc
  • Two model series are about to be released: Qwen2 Audio and Qwen2 Audio Chat
  • Overview of the architecture of the three-stage training process
  • Provide all evaluation scripts to reproduce the results

Product Details

Qwen2-Audio is a large-scale audio language model proposed by Alibaba Cloud, which can accept various audio signal inputs and perform audio analysis or direct text replies based on voice commands. This model supports two different audio interaction modes: voice chat and audio analysis. It performs well in 13 standard benchmark tests, including automatic speech recognition, speech to text translation, speech emotion recognition, and more.