Training-Free Efficient Video Generation via Dynamic Token Carving
More Showcases
Hover over a video to see its text prompt.
Jenga+HunyuanI2V / 338s
Jenga+HunyuanI2V / 338s
Jenga+HunyuanI2V / 338s
Jenga+HunyuanI2V / 338s
Jenga-3Stage / 157s
Jenga-3Stage / 157s
Jenga-3Stage / 157s
Jenga-3Stage / 157s
Jenga-3Stage / 157s
Jenga-Turbo / 225s
Jenga-Turbo / 225s
Jenga-Turbo / 225s
Jenga-Turbo / 225s
Jenga-Turbo / 225s
Jenga-Turbo / 225s
Jenga+AccVideo / 76s
Jenga+AccVideo / 76s
Jenga+AccVideo / 76s
Jenga+Wan2.1-1.3B / 24s
Jenga+Wan2.1-1.3B / 24s
Jenga+Wan2.1-1.3B / 24s
Comparisons
Simple guessing game: in each of the following three pairs, guess which video took less time to generate (hover over a video to see the result).
Animated scene features a close-up of a short fluffy monster kneeling beside a melting red candle. The art style is 3D and realistic, with a focus on lighting and texture. The mood of the painting is one of wonder and curiosity, as the monster gazes at the flame with wide eyes and open mouth. Its pose and expression convey a sense of innocence and playfulness, as if it is exploring the world around it for the first time. The use of warm colors and dramatic lighting further enhances the cozy atmosphere of the image.
Left Video
Right Video
3D animation of a small, round, fluffy creature with big, expressive eyes explores a vibrant, enchanted forest. The creature, a whimsical blend of a rabbit and a squirrel, has soft blue fur and a bushy, striped tail. It hops along a sparkling stream, its eyes wide with wonder. The forest is alive with magical elements: flowers that glow and change colors, trees with leaves in shades of purple and silver, and small floating lights that resemble fireflies. The creature stops to interact playfully with a group of tiny, fairy-like beings dancing around a mushroom ring. The creature looks up in awe at a large, glowing tree that seems to be the heart of the forest.
Left Video
Right Video
A close up view of a glass sphere that has a zen garden within it. There is a small dwarf in the sphere who is raking the zen garden and creating patterns in the sand.
Left Video
Right Video
More Compared Cases
The camera follows behind a white vintage SUV with a black roof rack as it speeds up a steep dirt road surrounded by pine trees on a steep mountain slope, dust kicks up from it's tires, the sunlight shines on the SUV as it speeds along the dirt road, casting a warm glow over the scene. The dirt road curves gently into the distance, with no other cars or vehicles in sight. The trees on either side of the road are redwoods, with patches of greenery scattered throughout. The car is seen from the rear following the curve with ease, making it seem as if it is on a rugged drive through the rugged terrain. The dirt road itself is surrounded by steep hills and mountains, with a clear blue sky above with wispy clouds.
HunyuanVideo / 1625s
TeaCache-fast / 708s
SVG / 908s
Jenga-Base / 347s
Jenga-Turbo / 225s
Jenga-3stage / 157s
A drone camera circles around a beautiful historic church built on a rocky outcropping along the Amalfi Coast, the view showcases historic and magnificent architectural details and tiered pathways and patios, waves are seen crashing against the rocks below as the view overlooks the horizon of the coastal waters and hilly landscapes of the Amalfi Coast Italy, several distant people are seen walking and enjoying vistas on patios of the dramatic ocean views, the warm glow of the afternoon sun creates a magical and romantic feeling to the scene, the view is stunning captured with beautiful photography.
HunyuanVideo / 1625s
TeaCache-fast / 708s
SVG / 908s
Jenga-Base / 347s
Jenga-Turbo / 225s
Jenga-3stage / 157s
A large orange octopus is seen resting on the bottom of the ocean floor, blending in with the sandy and rocky terrain. Its tentacles are spread out around its body, and its eyes are closed. The octopus is unaware of a king crab that is crawling towards it from behind a rock, its claws raised and ready to attack. The crab is brown and spiny, with long legs and antennae. The scene is captured from a wide angle, showing the vastness and depth of the ocean. The water is clear and blue, with rays of sunlight filtering through. The shot is sharp and crisp, with a high dynamic range. The octopus and the crab are in focus, while the background is slightly blurred, creating a depth of field effect.
HunyuanVideo / 1625s
TeaCache-fast / 708s
SVG / 908s
Jenga-Base / 347s
Jenga-Turbo / 225s
Jenga-3stage / 157s
A young man at his 20s is sitting on a piece of cloud in the sky, reading a book.
HunyuanVideo / 1625s
TeaCache-fast / 708s
SVG / 908s
Jenga-Base / 347s
Jenga-Turbo / 225s
Jenga-3stage / 157s
Photorealistic closeup video of two pirate ships battling each other as they sail inside a cup of coffee.
HunyuanVideo / 1625s
TeaCache-fast / 708s
SVG / 908s
Jenga-Base / 347s
Jenga-Turbo / 225s
Jenga-3stage / 157s
Historical footage of California during the gold rush.
HunyuanVideo / 1625s
TeaCache-fast / 708s
SVG / 908s
Jenga-Base / 347s
Jenga-Turbo / 225s
Jenga-3stage / 157s
A movie trailer featuring the adventures of the 30 year old space man wearing a red knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors.
HunyuanVideo / 1625s
TeaCache-fast / 708s
SVG / 908s
Jenga-Base / 347s
Jenga-Turbo / 225s
Jenga-3stage / 157s
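The generation times in the labels above translate directly into speedups over the HunyuanVideo baseline. The short sketch below does that arithmetic; the times are copied from the labels, while the dictionary and the `speedups` helper are illustrative names, not part of any released code.

```python
# Generation times in seconds, copied from the comparison labels above.
times_s = {
    "HunyuanVideo": 1625,
    "TeaCache-fast": 708,
    "SVG": 908,
    "Jenga-Base": 347,
    "Jenga-Turbo": 225,
    "Jenga-3stage": 157,
}


def speedups(times, baseline="HunyuanVideo"):
    """Return each method's speedup as baseline_time / method_time."""
    base = times[baseline]
    return {name: base / t for name, t in times.items()}


if __name__ == "__main__":
    # Print one row per method: name, time, and speedup over the baseline.
    for name, factor in speedups(times_s).items():
        print(f"{name:14s} {times_s[name]:5d}s  {factor:5.2f}x")
```

On these numbers, Jenga-Base, Jenga-Turbo, and Jenga-3stage come out at roughly 4.7x, 7.2x, and 10.4x faster than HunyuanVideo, respectively.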
Ablation Study
1. Effect of the text-attention amplification bias value on the field of view
negative bias
zero bias
low bias
mid bias
high bias
w/o bias: abnormal FOV (360P first stage)
original 1 stage (720P first stage)
bias with 2 stages (540P first stage)
bias with 3 stages (360P first stage)
w/o bias: abnormal FOV (360P first stage)
original 1 stage (720P first stage)
bias with 2 stages (540P first stage)
bias with 3 stages (360P first stage)
2. Effectiveness of the Adjacency Mask
w/o adjacency mask
with adjacency mask
w/o adjacency mask
with adjacency mask
Limitation Analysis
Hover over a video to see its text prompt.
Main failure case: latent misalignment when resizing
A. hand has wrong content
A. with enhanced prompt
B. boundary misalignment
B. with enhanced prompt
Alternative solution: use enhanced prompts, or generate content with complex scenes and textures
With enhanced prompts, we can eliminate the quality degradation in the generated video, even when starting from a much smaller initial resolution (360P).
dynamic scene
contents with detailed textures
static scene with enhanced prompt
complex scene with enhanced prompt
Contact Us
Feel free to contact Yuechen Zhang at zhangyc@link.cuhk.edu.hk with any questions, or for collaboration and discussion.
If you find this work useful, please consider citing:
@article{zhang2025trainingfreeefficientvideogeneration,
  title={Training-Free Efficient Video Generation via Dynamic Token Carving},
  author={Yuechen Zhang and Jinbo Xing and Bin Xia and Shaoteng Liu and Bohao Peng and Xin Tao and Pengfei Wan and Eric Lo and Jiaya Jia},
  journal={arXiv preprint arXiv:2505.16864},
  year={2025}
}