
GenAU
Audio generation and automatic audio captioning models
- AutoCap: Uses audio metadata to improve caption quality, achieving a CIDEr score of 83.2.
- GenAu: Built on the FIT architecture, a scalable transformer architecture with 1.25 billion parameters for audio generation.
- Audio 1D-VAE: Produces latent sequences from mel-spectrogram representations.
- Q-Former module: Compresses audio representations into fewer tokens to improve captioning model efficiency.
- Cross-attention layers: Pass information between the input latents and the learnable latent tokens.
- Global attention layers: Let the latent tokens communicate with each other globally.
- Supports building large-scale audio-text datasets and training on them.
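
The interplay between the cross-attention and global attention layers listed above can be illustrated with a small sketch. This is not the GenAU implementation: it assumes identity projections (no learned weights), a single attention head, and toy token values, and the names `attend` and `latent_block` are hypothetical. It only shows the general pattern in which a few latent tokens read from many input tokens, exchange information among themselves, and write back to the inputs.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys_values):
    """Scaled dot-product attention with identity projections.

    Assumption: keys and values are the same vectors and no learned
    weight matrices are used; a real model would learn Q/K/V projections.
    """
    dim = len(queries[0])
    out = []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            for k in keys_values
        ]
        weights = softmax(scores)
        out.append(
            [sum(w * v[i] for w, v in zip(weights, keys_values)) for i in range(dim)]
        )
    return out


def add(xs, ys):
    """Element-wise residual addition of two token lists."""
    return [[a + b for a, b in zip(x, y)] for x, y in zip(xs, ys)]


def latent_block(input_tokens, latent_tokens):
    """One read / global-attention / write step (hypothetical helper)."""
    # Read: latent tokens query the input tokens (cross-attention).
    latents = add(latent_tokens, attend(latent_tokens, input_tokens))
    # Global attention: latent tokens attend to each other.
    latents = add(latents, attend(latents, latents))
    # Write: input tokens query the updated latents (cross-attention).
    inputs = add(input_tokens, attend(input_tokens, latents))
    return inputs, latents


# Toy data: 8 input tokens and 2 latent tokens, each of dimension 4.
input_tokens = [[0.1 * (i + j) for j in range(4)] for i in range(8)]
latent_tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]

new_inputs, new_latents = latent_block(input_tokens, latent_tokens)
print(len(new_inputs), len(new_latents))  # prints "8 2"
```

Because the global attention step runs over only the 2 latent tokens rather than all 8 inputs, its cost depends on the number of latents, which is the usual motivation for this kind of design.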
Product Details
GenAU is an audio generation model developed by Snap Research. It improves audio generation quality by combining AutoCap, an automatic audio captioning model, with GenAu, an audio generation architecture. It targets the difficulty of generating ambient sounds and sound effects, particularly where training data is scarce and caption quality is poor. GenAU generates high-quality audio and shows strong potential for audio synthesis.