CosyVoice

A large-scale multilingual speech generation model that provides full-stack inference, training, and deployment capabilities.

  • Supports speech generation in multiple languages, including but not limited to Chinese, English, Japanese, Cantonese, and Korean.
  • Provides zero-shot, cross-lingual, and instruction-guided (instruct) inference modes.
  • Supports SFT (supervised fine-tuned) models with built-in speakers, which can reproduce specific voice styles.
  • Provides complete training and inference scripts, making it easy for users to train and use the model.
  • Supports quick demos and hands-on trials through a web interface.
  • Supports deployment with Docker, making the model convenient to use across different environments.

Product Details

CosyVoice is a large-scale multilingual speech generation model. Beyond generating speech in multiple languages, it provides full-stack capabilities that cover inference, training, and deployment. It matters for speech synthesis because it can produce natural, fluent, lifelike speech across many language settings. CosyVoice is developed by the FunAudioLLM team and released under the Apache-2.0 license.