Magic Mirror: ID-Preserved Video Generation in Video Diffusion Transformers

Yuechen Zhang*¹, Yaoyang Liu*², Bin Xia¹, Bohao Peng¹, Zexin Yan⁴, Eric Lo¹, Jiaya Jia¹,²,³

¹ CUHK, ² HKUST, ³ SmartMore, ⁴ CMU

Teaser video

Hover over a video to see its corresponding text prompt

Reference

Dressed in a lab coat and safety goggles, the curious boy conducts a science experiment in a school laboratory. His excited smile and careful movements demonstrate his enthusiasm for learning as he measures chemicals for a reaction.

Video 1

The musical boy, seated at a grand piano, concentrates intently on sheet music during a recital. His nimble fingers and serious expression reveal his dedication to mastering the instrument as proud family members watch from the audience.

Video 2

Reference

In a cozy, warmly lit kitchen, a woman with curly hair and a floral apron carefully drizzles golden honey over freshly baked cinnamon rolls, their aroma filling the air. The rolls, perfectly spiraled and glistening with a light glaze, sit invitingly on a rustic wooden board. As she pours the honey, it cascades in slow motion, catching the light and creating a mesmerizing effect. Her hands, steady and graceful, add a touch of artistry to the scene. The kitchen, adorned with vintage utensils and potted herbs, enhances the comforting, homely atmosphere, making the moment feel both intimate and indulgent.

Video 1

A focused young woman student sits at a tidy desk in her cozy bedroom, surrounded by colorful stationery and a laptop displaying a virtual classroom. She wears a comfortable sweater, her hair neatly tied back, and her expression is one of concentration as she listens intently to her teacher. The room is softly lit, with a small plant and a motivational poster on the wall, creating an inviting learning environment. Occasionally, she takes notes in a vibrant notebook, her pen moving swiftly across the pages. Her eyes occasionally glance at the screen, reflecting her engagement and eagerness to learn in this digital setting.

Video 2

Reference

In a dimly lit theater, a man discreetly sits in the back row, wearing a dark hoodie and jeans, his face partially obscured by shadows. He holds a small camcorder, its lens glinting faintly in the flickering light from the screen. The theater is sparsely populated, with a few patrons scattered throughout, their attention absorbed by the movie. The man's posture is tense yet focused, as he carefully adjusts the camcorder, ensuring a steady capture of the film. The ambient glow from the screen casts a soft light on his determined expression, highlighting his intent to record the cinematic experience unfolding before him.

Video 1

A diligent man, clad in a wide-brimmed straw hat and a plaid shirt, kneels in a vast, sunlit field, surrounded by rows of lush green onion plants. The sun casts a warm glow, highlighting the earthy tones of the soil and the vibrant green of the onion tops. With skilled hands, the worker gently pulls an onion from the ground, its roots trailing soil, and places it into a woven basket nearby. The scene captures the essence of rural life, with distant rolling hills and a clear blue sky framing the background, emphasizing the harmony between nature and human labor.

Video 2

Reference

A bearded man with a thoughtful expression stands in a cozy, dimly lit room filled with vintage decor, wearing a plaid shirt and jeans. He carefully selects a vinyl record from a wooden shelf lined with albums, the warm glow of a nearby lamp casting soft shadows. As he gently places the record onto the turntable, his fingers move with precision and care, reflecting his appreciation for music. The room is filled with the soft crackle of the needle touching the vinyl, and he closes his eyes momentarily, savoring the nostalgic sound. The ambiance is intimate, with the gentle hum of the record player and the soft lighting creating a serene atmosphere.

Video 1

A focused man stands in the dugout, gripping his baseball bat with determination, wearing a classic white jersey with blue pinstripes and a matching cap. The sunlight casts dramatic shadows across his face, highlighting his intense gaze as he prepares for the game. His hands, wrapped in black batting gloves, firmly hold the bat, showcasing his readiness and anticipation. The background reveals the bustling stadium, with blurred fans and vibrant green field, creating an atmosphere of excitement and competition. As he adjusts his stance, the player's concentration and passion for the sport are palpable, embodying the spirit of baseball.

Video 2

Reference

An extreme close-up of the young woman reveals every detail of her face, her shoulder-length black hair falling softly around her delicate features. The beige sweater's collar frames her neck as she remains by the window. The warm light highlights her micro-expressions - the gentle movement of her eyelashes, the slight shifts in her jaw, while her dark eyes seem to process deep thoughts.

Video 1

A serene woman img with delicate features, wearing a flowing white blouse, sits at her kitchen counter bathed in morning light. Her face is clearly visible as she mindfully slices fresh fruit, arranging colorful pieces of mango, strawberries, and kiwi on a white ceramic plate. Soft sunlight streams through sheer curtains, highlighting her gentle expressions as she works. Her movements are precise and graceful as she creates an artistic breakfast composition, her long fingers placing each piece with intention.

Video 2

Reference

In a cozy kitchen, the versatile woman, now in a chef's jacket, expertly plates a gourmet dish. Her meticulous attention to detail and satisfied smile showcase her culinary skills as she prepares for an important dinner service.

Video 1

The creative woman, wearing a paint-splattered apron, stands before a large canvas in an art studio. Her focused expression and steady hand reveal her artistic passion as she adds intricate details to a colorful abstract painting.

Video 2

More Showcases

Nine additional examples, each with a reference image and two generated videos (Video 1 and Video 2).

Comparisons

Reference

DynamiCrafter

EasyAnimate-I2V

CogVideoX-I2V

ID-Animator

Magic Mirror

A seasoned male police officer, wearing a crisp navy uniform adorned with badges, stands beside his patrol car, holding a radio to his mouth. His expression is focused and serious, reflecting the gravity of his communication. The scene is set in an urban environment, with the city skyline visible in the background, and the flashing lights of the patrol car casting a rhythmic glow. As he speaks into the radio, his other hand rests on his utility belt, showcasing his readiness and professionalism. The ambient sounds of distant traffic and the occasional chirp of the radio punctuate the scene, emphasizing the officer's role in maintaining order.

Reference

DynamiCrafter

EasyAnimate-I2V

CogVideoX-I2V

ID-Animator

Magic Mirror

An elderly man, with a determined expression, stands in a sunlit gym, wearing a gray tank top and black shorts, his muscles taut as he grips a heavy kettlebell. The room is filled with natural light streaming through large windows, casting shadows on the polished wooden floor. His face shows concentration and strength, highlighting his commitment to fitness. As he lifts the kettlebell with steady hands, the camera captures the sweat glistening on his brow, emphasizing his effort and resilience. The background features neatly arranged gym equipment, adding to the atmosphere of dedication and perseverance.

Reference

DynamiCrafter

EasyAnimate-I2V

CogVideoX-I2V

ID-Animator

Magic Mirror

A bearded man in his thirties, wearing a plaid shirt and jeans, sits at a rustic wooden bar, surrounded by an array of beer taps and vintage brewery decor. He carefully lifts a frosty pint glass filled with amber beer, examining its color and clarity against the warm, ambient lighting. He takes a slow, appreciative sip, his eyes closing momentarily as he savors the complex flavors. The camera captures the subtle smile of satisfaction on his face, highlighting the rich foam on his upper lip. The background hum of soft chatter and clinking glasses adds to the cozy, inviting atmosphere of the pub.

Reference

DynamiCrafter

EasyAnimate-I2V

CogVideoX-I2V

ID-Animator

Magic Mirror

A serene woman, dressed in a cozy oversized sweater and jeans, kneels on a lush green meadow, gently petting a friendly golden retriever. The dog's tail wags enthusiastically, its fur gleaming in the soft sunlight. Her face lights up with a warm smile as her hand moves tenderly over the dog's head and back. In the background, a picturesque landscape of rolling hills and blooming wildflowers enhances the tranquil scene. The golden retriever, with its tongue lolling out and eyes full of affection, leans into her touch, creating a heartwarming moment of connection and joy.

Reference

DynamiCrafter

EasyAnimate-I2V

CogVideoX-I2V

ID-Animator

Magic Mirror

A focused man in a contemporary gym performs a leg exercise, clad in a fitted black tank top and gray athletic shorts. The scene conveys the intensity of his workout, with sweat glistening on his brow and muscles visibly engaged. Positioned on a sleek leg press machine, he pushes against the resistance with determination. The ambient lighting accentuates his form, while the background showcases neatly arranged weights and exercise equipment. His expression reflects concentration and resolve, embodying the dedication and effort of his fitness journey.

Reference

DynamiCrafter

EasyAnimate-I2V

CogVideoX-I2V

ID-Animator

Magic Mirror

A serene woman, dressed in a flowing white blouse and light blue jeans, stands at a rustic wooden table in a sunlit room filled with greenery. She carefully selects vibrant blooms from a wicker basket, including roses, lilies, and daisies, and begins arranging them in a crystal vase. Sunlight filters through the window, casting a warm glow on her focused expression. As she works, her hands move gracefully, adjusting stems and leaves to create a harmonious bouquet. The scene transitions to a close-up of her hands tying a delicate ribbon around the vase, completing the arrangement with a touch of elegance. The final shot captures her stepping back to admire her creation, a satisfied smile on her face, with the room's natural beauty enhancing the tranquil atmosphere.

Personalized Videos with Style-Specific Prompts

Hover over a video to see its text prompt and style

Reference

Art nouveau, organic curves, floral patterns style, a male police officer talking on the radio

Video

Reference

Ukiyo-e japanese woodblock print style, young woman activist posing with flag.

Video

Reference

Low poly 3D, geometric reduction, faceted style, a "woman" washing the dishes

Video

Reference

Low poly 3D, geometric reduction, faceted style, a male police officer talking on the radio.

Video

Reference

Pixel art, 8-bit graphics, retro gaming style, a woman is filling eyebrows

Video

Reference

Synthwave retro, 80s style, sunset colors style, an elderly woman is reading book.

Video

Reference

Electronic glitch art, neon, cyberpunk aesthetic style, a girl is looking around.

Video

Reference

Electronic glitch art, neon, cyberpunk aesthetic style, the elegant woman, now in a power suit...

Video

Multi-shot Video Generation with One Character

A serene woman with delicate features wearing a flowing white blouse:

(1) practices gentle yoga stretches... (2) sits at her kitchen counter bathed in morning light... (3) is working at her writing desk near a window... (4) starts painting her artwork... (5) is preparing for the lunch...

Reference

Shot 1

Shot 2

Shot 3

Shot 4

Shot 5

The young woman with black shoulder-length hair, wearing her beige knit sweater, captured in shots from different aspects:

Reference

Shot 1

Shot 2

Shot 3

Shot 4

Shot 5

A bearded man in a yellow T-shirt, working at a wooden table in his workshop:

Reference

Shot 1

Shot 2

Shot 3

Shot 4

Shot 5

Module-wise Ablations


ID Reference	w/o facial embedding	w/o adaptive condition	Full Model


ID Reference	image pre-training only	video fine-tuning only	Full Model

Limitations

Our method has two main limitations. First, it cannot handle multi-person videos and struggles with fine-grained features; for example, it sometimes fails to capture details such as eye color accurately (case 1). Second, because its basic generation capability is bound to the base model, it may produce artifacts when dealing with complex physical motions (case 2).

Reference ID	Our failure case	Reference ID	Our failure case

Contact Us

Feel free to contact Yuechen Zhang at zhangyc@link.cuhk.edu.hk with any questions or for cooperation and communication.

If you find this work useful, please consider citing:

@article{zhang2025magic,
  title={Magic Mirror: ID-Preserved Video Generation in Video Diffusion Transformers},
  author={Zhang, Yuechen and Liu, Yaoyang and Xia, Bin and Peng, Bohao and Yan, Zexin and Lo, Eric and Jia, Jiaya},
  journal={arXiv preprint arXiv:2501.03931},
  year={2025}
}

Thanks to UltraPixel, ControlNeXt, and ToonCrafter for providing the project page template!