StreamVC

Real-time, low-latency voice conversion technology

  • Real-time, low-latency voice conversion
  • Preserves the content and prosody of the source speech
  • Matches the timbre of the target speaker's voice
  • Runs with low latency even on mobile platforms
  • Suited to real-time communication scenarios
  • Built on the architecture and training strategy of the SoundStream neural audio codec
  • Learns soft speech units causally
  • Supplies whitened fundamental frequency (f0) information to improve pitch stability
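Streaming models in the SoundStream family stay usable in real time because their convolutions are causal: the output at time t depends only on the current and past input. The sketch below is a generic illustration of that property, not StreamVC's code; the names causal_conv1d and StreamingCausalConv1d, the kernel values, and the chunk sizes are hypothetical. It checks that filtering a signal chunk by chunk, carrying only the last len(kernel) - 1 samples as state, reproduces the offline result, which is what allows a causal model to emit audio as input arrives instead of waiting for the whole utterance.

```python
from typing import List, Sequence


def causal_conv1d(x: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Offline causal 1-D filter: y[t] = sum_k kernel[k] * x[t - k].

    Inputs before t = 0 count as zeros (left padding only), so y[t] never
    depends on future samples x[t + 1], x[t + 2], ...
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            if t - k >= 0:
                acc += w * x[t - k]
        y.append(acc)
    return y


class StreamingCausalConv1d:
    """The same filter applied chunk by chunk (hypothetical helper).

    Only the last len(kernel) - 1 input samples are carried between chunks,
    so each chunk's output can be emitted as soon as the chunk arrives.
    """

    def __init__(self, kernel: Sequence[float]) -> None:
        self.kernel = list(kernel)
        self.history = [0.0] * (len(self.kernel) - 1)  # zero left padding

    def process(self, chunk: Sequence[float]) -> List[float]:
        buf = self.history + list(chunk)
        n_hist = len(self.history)
        out = []
        for i in range(len(chunk)):
            t = n_hist + i  # position of chunk[i] inside buf
            out.append(sum(w * buf[t - k] for k, w in enumerate(self.kernel)))
        if n_hist > 0:
            self.history = buf[-n_hist:]  # keep only what the next chunk needs
        return out


if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    kernel = [0.5, 0.25, 0.25]
    offline = causal_conv1d(signal, kernel)
    streamer = StreamingCausalConv1d(kernel)
    online = []
    for chunk in ([1.0, 2.0], [3.0], [4.0, 5.0, 6.0, 7.0]):
        online.extend(streamer.process(chunk))
    print(online == offline)  # True
```

The chunk boundaries are arbitrary here; with a causal filter they only affect when output becomes available, not what it is.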

Product Details

StreamVC is a real-time, low-latency voice conversion solution developed by Google. It matches the timbre of a target speaker's voice while preserving the content and prosody of the source speech. The technology is particularly suited to real-time communication scenarios such as phone calls and video conferencing, and to use cases such as voice anonymization. StreamVC adopts the architecture and training strategy of the SoundStream neural audio codec to achieve lightweight, high-quality speech synthesis. The work also demonstrates that soft speech units can be learned causally, and that supplying whitened fundamental frequency information improves pitch stability without leaking the source speaker's timbre.
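To make "whitened fundamental frequency" concrete, here is a minimal sketch assuming that whitening means z-scoring each voiced frame's f0 with the running mean and standard deviation of the voiced frames seen so far. This is an assumption for illustration, not StreamVC's published procedure; the class CausalF0Whitener, the helper whiten_track, the eps floor, and the convention that 0.0 marks an unvoiced frame are all hypothetical. Because a z-score is unchanged when the whole contour is multiplied by a positive constant (the running mean and standard deviation scale with it), a higher-pitched rendering of the same intonation yields the same whitened track: the absolute pitch level, a strong speaker cue, is removed while the shape of the contour survives.

```python
import math
from typing import List, Sequence


class CausalF0Whitener:
    """Hypothetical streaming f0 whitening (not StreamVC's published code).

    Each voiced frame is z-scored with the running mean and standard deviation
    of the voiced frames seen so far (Welford's online algorithm), so no future
    frames are needed. Frames with f0 <= 0 are treated as unvoiced: they map to
    0.0 and leave the statistics untouched.
    """

    def __init__(self, eps: float = 1e-6) -> None:
        self.n = 0          # voiced frames seen so far
        self.mean = 0.0     # running mean of voiced f0
        self.m2 = 0.0       # running sum of squared deviations from the mean
        self.eps = eps      # floor below which the spread is treated as zero

    def push(self, f0_hz: float) -> float:
        if f0_hz <= 0.0:  # unvoiced frame
            return 0.0
        self.n += 1
        delta = f0_hz - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (f0_hz - self.mean)
        std = math.sqrt(self.m2 / self.n)
        if std < self.eps:  # e.g. the first voiced frame: no spread yet
            return 0.0
        return (f0_hz - self.mean) / std


def whiten_track(f0_track: Sequence[float]) -> List[float]:
    """Whiten a whole f0 track frame by frame, as a streaming system would."""
    whitener = CausalF0Whitener()
    return [whitener.push(f) for f in f0_track]


if __name__ == "__main__":
    low = [0.0, 110.0, 120.0, 0.0, 115.0, 130.0, 125.0]  # Hz, 0.0 = unvoiced
    high = [2.0 * f for f in low]  # same intonation, one octave higher
    same = all(math.isclose(a, b, abs_tol=1e-12)
               for a, b in zip(whiten_track(low), whiten_track(high)))
    print(same)  # True
```

Running statistics keep the sketch causal, at the cost of noisy estimates during the first few voiced frames; a production system might warm up or smooth these statistics differently.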