Cloud computing company Alibaba Cloud has updated its Wan visual generation models with a new Wan2.6 series that allows users to appear in artificial intelligence (AI)-generated videos using their own face and voice.
The Wan2.6 series introduces a reference-to-video model called Wan2.6-R2V. It lets users upload a short reference video of a person, animal, or object, then use text prompts to generate new scenes featuring the same subject. The system keeps the appearance and voice from the reference while placing the subject into new video settings. Multiple subjects can also appear in the same scene.
Alibaba said the model can help simplify how short videos and scripted content are produced by reducing the need for repeated filming and voice recording. The company described Wan2.6-R2V as the first reference-to-video model of its kind released in China.
The update also improves four existing models in the Wan lineup: the text-to-video model, the image-to-video model, and two image generation models. The models now support multi-shot video creation, which allows several connected scenes to be generated with consistent visuals and sound.
Video outputs can run up to 15 seconds, giving creators more time to build short narratives. Audio and visuals are better aligned, and sound effects can be generated alongside video to make scenes feel more natural.
For image generation, the Wan2.6 series supports mixed text and image outputs. Users can control visual style more precisely and edit images, including portraits, using longer and more detailed prompts in Chinese or English.
Developers and creators can access the Wan2.6 models through Model Studio, Alibaba Cloud’s AI development platform, and through Wan’s official website. The models will also be added to the Qwen App, Alibaba’s main AI application.
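For developers, access through Model Studio typically goes via Alibaba Cloud's DashScope Python SDK, which already exposes earlier Wan video models through a VideoSynthesis interface. The sketch below is a minimal, hypothetical example of what a Wan2.6 call might look like under that assumption; the model identifier "wan2.6-t2v" is a placeholder, not a confirmed name, and the SDK expects an API key in the DASHSCOPE_API_KEY environment variable.

    # Minimal sketch: requesting a video from a Wan model via DashScope.
    # Assumes the DASHSCOPE_API_KEY environment variable is set.
    from http import HTTPStatus

    from dashscope import VideoSynthesis

    response = VideoSynthesis.call(
        model="wan2.6-t2v",  # placeholder; check Model Studio for actual Wan2.6 names
        prompt="A chef plates a dessert in a sunlit kitchen, shown in two connected shots",
    )

    if response.status_code == HTTPStatus.OK:
        print(response.output.video_url)  # URL of the generated clip
    else:
        print(f"Request failed: {response.code} - {response.message}")

Actual parameter names for Wan2.6 features such as reference videos or multi-shot control would need to be confirmed against Model Studio's documentation once published.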