Video Models

daVinci MagiHuman Text/Image-to-Video Generator with Audio Sync

Create videos with daVinci MagiHuman, a 15B-parameter open-source audio-video foundation model from Sand.ai and SII GAIR Lab. Generate synchronized video and audio from text or images, with leading lip-sync accuracy across 7 languages. Output goes up to 1080p, with clip durations of 5 to 10 seconds. The model uses a single-stream Transformer architecture and produces a 5-second 256p video in about 2 seconds on one H100.

daVinci MagiHuman Text to Video Gallery

Experience the cinematic power of daVinci MagiHuman text-to-video generation. Create stunning videos with synchronized audio from detailed text descriptions, featuring industry-leading lip sync across 7 languages.

Rainy Tokyo Night

A woman in a red coat walks through a neon-lit Tokyo alley on a rainy night with shimmering reflections.

Prompt

“Rainy night in a neon-lit Tokyo alley, a woman in a red coat walks slowly under an umbrella. Reflections shimmer on wet cobblestones. Handheld camera follows her from behind, bokeh street lights, cinematic color grade, moody atmosphere.”

daVinci MagiHuman Image to Video Gallery

Transform your static images into dynamic videos with daVinci MagiHuman. Experience seamless image-to-video conversion with realistic facial expressions, natural body motion, and synchronized lip-synced audio.

Podcast Host Speaking

daVinci MagiHuman YouTube Videos

Watch community demos and reviews showcasing daVinci MagiHuman's audio-video generation capabilities

Popular daVinci MagiHuman Posts on X

See what people are saying about daVinci MagiHuman on X (Twitter)

🪄 Introducing daVinci-MagiHuman: The Performance-Level Audio-Video Generative Foundation Model Proudly open-sourced and jointly developed by SII GAIR Lab & Sand.ai, it sets a new standard for multimodal AI. ⏳ 1/6

2:30 PM · Mar 23, 2026

daVinci-MagiHuman is a 15B single-stream Transformer, trained from scratch to generate synced video+audio with self-attention only—no cross-attention or multi-stream paths. It is open-source, supports 6 languages, beats Ovi/LTX, and runs on one H100.

2:03 AM · Mar 25, 2026

I have been testing open source daVinci-MagiHuman, a single-stream 15B Transformer trained from scratch that jointly generates video + audio. 5s 1080p video in 38s on a single H100, about 1 minute on newer gaming Nvidia GPUs By @SII_GAIR + @SandAI_HQ

1:23 PM · Mar 25, 2026

うみゆき@AI研究

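A new video generation model called daVinci-MagiHuman has been released openly. Word is that it is even better than LTX-2.3, and the audio generation in particular is said to be good. It is also multilingual, and Japanese speech is listed as supported. GAIR, which developed it, is apparently a research lab within the Shanghai Innovation Institute. reddit.com/r/StableDiffus…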

6:54 AM · Mar 25, 2026

チャエン | デジライズ CEO《重要AIニュースを毎日最速で発信⚡️》

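The open-source model "daVinci-MagiHuman", which generates video and audio simultaneously, has arrived. Top-class performance in the OSS space; supports 6 languages (Japanese, Chinese, English, Korean, German, French); speech recognition error rate of 14.6%. A rival to the closed Seedance 2.0. The demos look highly accurate. It reportedly generates a 5-second 1080p video in 38 seconds on an H100.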

9:51 PM · Mar 25, 2026

田中義弘 | taziku CEO / AI × Creative

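Can open-source video generation AI compete? daVinci-MagiHuman is a fully open-source model that generates video and audio simultaneously with a single-stream 15B Transformer. Win rates of 80.0% against Ovi 1.1 and 60.9% against LTX 2.3. Generates a 5-second 1080p video in 38.4 seconds on an H100. Japanese is supported too! Details in the 🧵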

11:04 AM · Mar 26, 2026

DaVinci-MagiHuman for ComfyUI. - 15B-param single-stream model runs in ~6GB VRAM via block-level swapping; - 8-step distillation; github.com/mjansrud/Comfy…

Wildminder

@wildmindai

daVinci-MagiHuman. We have another fast single-stream audio-video 15B foundation model by @SandAI_HQ > no separate pathways or cross-attention modules. > just raw self-attention doing all the heavy lifting. > wins 80% vs Ovi 1.1, 60% vs LTX 2.3; > native multilingual realistic

9:35 AM · Mar 27, 2026

What Is daVinci MagiHuman

Sand.ai's 15B open-source audio-video foundation model with best-in-class lip sync

· 01 15B Parameters
· 02 1080p Max Resolution
· 03 7 Supported Languages
· 04 2s Generation Time at 256p

daVinci MagiHuman is a 15-billion-parameter single-stream Transformer that jointly generates synchronized video and audio from text or images. It reports leading lip-sync accuracy, with a word error rate of 14.6% across 7 languages.

daVinci MagiHuman Features

Explore the capabilities that make daVinci MagiHuman well suited to audio-video generation

Feature 01 / 08
Joint Audio-Video Generation
Generate synchronized video and audio in a single pass with a single-stream Transformer architecture that uses self-attention only, with no separate audio pipeline.
Feature 02 / 08
Industry-Leading Lip Sync
A word error rate of 14.6% for lip sync (lower is better), well ahead of Ovi 1.1 (40.45%) and LTX 2.3 (19.23%) in speech-accuracy benchmarks.
Feature 03 / 08
Speech Support in 7 Languages
Generate videos synchronized with speech in English, Chinese (Mandarin and Cantonese), Japanese, Korean, German, and French, with natural pronunciation.
Feature 04 / 08
Ultra-Fast Generation
Produce a 5-second 256p video in about 2 seconds on a single H100 GPU. 8-step DMD-2 distillation removes the need for classifier-free guidance.
Feature 05 / 08
Dual Input Modes
Create videos from text prompts or animate still images. Both modes support configurable aspect ratios, resolutions, and durations of 5 to 10 seconds.
Feature 06 / 08
Super-Resolution up to 1080p
Generate videos at 256p, 540p, 720p, or 1080p through a latent-space super-resolution pipeline, with no extra VAE decode-encode overhead.
Feature 07 / 08
Open Source under Apache 2.0
Fully open source under the Apache 2.0 license, with the complete stack (base weights, distilled model, super-resolution model, and inference code) available for unrestricted commercial use.
Feature 08 / 08
Human-Centric Excellence
Specialized in generating digital humans with lifelike facial expressions, realistic body motion, and consistent character preservation across frames.

FAQ

Frequently Asked Questions

Common questions about daVinci MagiHuman audio-video generation

daVinci MagiHuman supports two main modes: Text-to-Video (generating videos with synchronized audio from prompts) and Image-to-Video (animating still images with optional audio). Both modes support configurable aspect ratios (16:9 landscape, 9:16 portrait), resolutions up to 1080p, and durations of 5 to 10 seconds.

daVinci MagiHuman supports synchronized speech generation in 7 languages: English, Chinese (Mandarin), Cantonese, Japanese, Korean, German, and French. The model reports a word error rate (WER) of 14.6% for lip sync, where lower is better, compared with 40.45% for Ovi 1.1 and 19.23% for LTX 2.3.
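
Word error rate is a standard speech metric: the minimum number of word substitutions, deletions, and insertions needed to turn a transcript into the reference text, divided by the number of reference words. The sketch below is a generic implementation for illustration only. The function name is hypothetical, and this page does not describe how the 14.6% figure was actually measured.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of three reference words.
    print(word_error_rate("the cat sat", "the dog sat"))
```

A score of 0.0 means the transcript matches the reference exactly, and the score can exceed 1.0 when the hypothesis contains many extra words.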

daVinci MagiHuman supports several resolutions: 256p (fastest), 540p (super-resolution), 720p, and 1080p (super-resolution). Duration can be set from 5 to 10 seconds in one-second increments. Both landscape (16:9) and portrait (9:16) aspect ratios are supported.
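
The options above can be checked before a job is submitted. The following is a minimal sketch: the allowed values come from the answer above, while the class name, field names, and defaults are hypothetical and do not correspond to any published daVinci MagiHuman API.

```python
from dataclasses import dataclass
from typing import List

# Allowed values as listed in the text; everything else here is illustrative.
SUPPORTED_RESOLUTIONS = ("256p", "540p", "720p", "1080p")
SUPPORTED_ASPECT_RATIOS = ("16:9", "9:16")
MIN_DURATION_S = 5
MAX_DURATION_S = 10


@dataclass(frozen=True)
class GenerationSettings:
    """Hypothetical container for the user-facing generation options."""
    resolution: str = "256p"
    aspect_ratio: str = "16:9"
    duration_s: int = 5

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the settings are valid."""
        problems = []
        if self.resolution not in SUPPORTED_RESOLUTIONS:
            problems.append(f"unsupported resolution: {self.resolution}")
        if self.aspect_ratio not in SUPPORTED_ASPECT_RATIOS:
            problems.append(f"unsupported aspect ratio: {self.aspect_ratio}")
        # Durations move in one-second steps, so only whole seconds are accepted.
        is_whole_seconds = isinstance(self.duration_s, int) and not isinstance(self.duration_s, bool)
        if not is_whole_seconds or not MIN_DURATION_S <= self.duration_s <= MAX_DURATION_S:
            problems.append(f"duration must be a whole number of seconds from 5 to 10: {self.duration_s}")
        return problems


if __name__ == "__main__":
    print(GenerationSettings(resolution="1080p", aspect_ratio="9:16", duration_s=8).validate())
    print(GenerationSettings(resolution="4K", duration_s=12).validate())
```

Returning a list of problems rather than raising on the first one lets a form report every invalid field at once.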

On a single NVIDIA H100 GPU, daVinci MagiHuman generates a 5-second 256p video in about 2 seconds. At higher resolutions, a 5-second video takes about 8 seconds at 540p and about 38.4 seconds at 1080p. This speed comes from 8-step DMD-2 distillation.
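
One way to read these timings is as a real-time factor: seconds of compute per second of output video. The sketch below only restates the figures quoted above; the variable and function names are hypothetical.

```python
# Generation times for a 5-second clip on one H100, as quoted in the text.
CLIP_SECONDS = 5.0
GENERATION_SECONDS = {"256p": 2.0, "540p": 8.0, "1080p": 38.4}


def real_time_factor(generation_s: float, clip_s: float = CLIP_SECONDS) -> float:
    """Seconds of compute per second of video; below 1.0 is faster than real time."""
    return generation_s / clip_s


if __name__ == "__main__":
    for resolution, seconds in GENERATION_SECONDS.items():
        print(f"{resolution}: {real_time_factor(seconds):.2f}x real time")
```

By this measure only the 256p setting runs faster than real time on the quoted hardware; 540p and 1080p take longer to generate than the clip takes to play.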

Yes, daVinci MagiHuman is fully open source under the Apache 2.0 license, released by Sand.ai and SII GAIR Lab. The full stack is available, including the base model weights, the distilled model, the super-resolution model, and the inference code, which permits unrestricted commercial use.

daVinci MagiHuman stands out for its single-stream Transformer architecture, which uses self-attention only (no cross-attention or multi-stream paths) and so generates audio and video jointly in one model. It reports the best lip-sync accuracy among the compared models (14.6% WER), supports 7 languages, and achieves an 80% win rate against Ovi 1.1 in human evaluation.
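
To make the single-stream idea concrete, the toy sketch below concatenates text, video, and audio tokens into one sequence and runs plain self-attention over it, so every token can attend to every modality without a cross-attention module. It is a heavily simplified illustration, not the model's code: it uses one head, identity query/key/value projections, and made-up 4-dimensional tokens, and the real token layout is not described on this page.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention with identity projections."""
    dim = len(tokens[0])
    outputs = []
    for query in tokens:
        # Every token scores itself against all tokens in the shared sequence.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in tokens]
        weights = softmax(scores)
        outputs.append([
            sum(w * value[d] for w, value in zip(weights, tokens))
            for d in range(dim)
        ])
    return outputs


# Made-up tokens: one text token, two video tokens, one audio token.
text_tokens = [[1.0, 0.0, 0.0, 0.0]]
video_tokens = [[0.0, 1.0, 0.0, 0.0], [0.0, 1.0, 1.0, 0.0]]
audio_tokens = [[0.0, 0.0, 0.0, 1.0]]

# Single stream: all modalities share one sequence and one attention operation.
stream = text_tokens + video_tokens + audio_tokens
mixed = self_attention(stream)

if __name__ == "__main__":
    # The audio token's output now carries information from text and video tokens.
    print([round(x, 3) for x in mixed[-1]])
```

A production model would add learned projections, multiple heads, positional information, and many stacked layers; the point here is only that one attention pass over one sequence is enough for the modalities to exchange information.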

How to Use daVinci MagiHuman Text to Video

Generate videos with synchronized audio from text descriptions

Write Your Prompt

Enter a detailed description of the video you want. Include the subject, the action, the spoken content, and the desired language for the best lip sync.
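
The four ingredients named above can be assembled into a prompt with a small helper. This is only a sketch: the function name and the sentence template are hypothetical, and the page does not specify any required prompt format.

```python
def build_prompt(subject: str, action: str, speech: str, language: str) -> str:
    """Join subject, action, spoken content, and language into one prompt string."""
    scene = f"{subject} {action}."
    dialogue = f'Speaking in {language}: "{speech}"'
    return f"{scene} {dialogue}"


if __name__ == "__main__":
    print(build_prompt(
        subject="A podcast host",
        action="leans toward the microphone and smiles",
        speech="Welcome back to the show.",
        language="English",
    ))
```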

How to Use daVinci MagiHuman Image to Video

Animate still images into videos with synchronized audio

Upload Your Image

Upload a reference image of the person or scene to animate. daVinci MagiHuman excels at human-centric content with realistic facial expressions.

Pricing

Choose the plan that suits you. No hidden fees, no surprises.

One-time purchases support crypto payment (BTC, USDT, ETH, 350+)

Monthly billing

Free (One-Time)

Try before you buy

0 USD, one time

32 credits

Up to 3 videos

Up to 32 images

Multi-model support

Text to video

Image to video

Video to video

Consistent character

AI animation generator

Templates and effects

AI video enhancers

Interactive community

Faster generation speed

No watermark

More camera movements

Private video visibility

Copy protection

Priority support

Popular

Pro (1 Month)

Upgrade your AI experience

29.99 USD per month

800 credits per month

Up to 80 videos per month

Up to 800 images per month

3 parallel tasks

Multi-model support

Text to video

Image to video

Video to video

Consistent character

AI animation generator

Templates and effects

AI video enhancers

Interactive community

Faster generation speed

No watermark

More camera movements

Private video visibility

Copy protection

Priority support

Lite (1 Month)

Start your AI journey

19.99 USD per month

300 credits per month

Up to 30 videos per month

Up to 300 images per month

3 parallel tasks

Multi-model support

Text to video

Image to video

Video to video

Consistent character

AI animation generator

Templates and effects

AI video enhancers

Interactive community

Faster generation speed

No watermark

More camera movements

Private video visibility

Copy protection

Priority support

View detailed pricing
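
The plan limits above are consistent with a flat cost of 10 credits per video and 1 credit per image (32 credits give 3 videos or 32 images, 800 credits give 80 videos or 800 images). The sketch below works through that arithmetic. The per-item credit costs are an inference from the listed limits, not stated rates, and actual costs may vary by model, resolution, or duration.

```python
# Prices and credit allowances as listed in the plans above.
PLANS = {
    "Free": {"price_usd": 0.0, "credits": 32},
    "Lite": {"price_usd": 19.99, "credits": 300},
    "Pro": {"price_usd": 29.99, "credits": 800},
}

# Assumption: inferred from the "up to N videos / N images" limits, not published rates.
CREDITS_PER_VIDEO = 10
CREDITS_PER_IMAGE = 1


def plan_capacity(credits: int) -> dict:
    """How many whole videos or images a credit balance covers."""
    return {
        "videos": credits // CREDITS_PER_VIDEO,
        "images": credits // CREDITS_PER_IMAGE,
    }


def usd_per_video(plan: dict) -> float:
    """Effective price of one video under the assumed credit cost."""
    return plan["price_usd"] / plan["credits"] * CREDITS_PER_VIDEO


if __name__ == "__main__":
    for name, plan in PLANS.items():
        capacity = plan_capacity(plan["credits"])
        print(f"{name}: {capacity['videos']} videos or {capacity['images']} images, "
              f"{usd_per_video(plan):.2f} USD per video")
```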