ZHANG Yuechen, Julian 🥥

Hi! I am ZHANG Yuechen, Julian, a fourth-year Ph.D. student at CUHK, advised by Prof. Jiaya Jia. I also work at SmartMore as a computer vision developer.

Before that, I received my B.Sc. degree from CUHK.

My primary research interest is in controllable AIGC & VLM. Additionally, I have worked on several projects involving image segmentation.

I will graduate in Fall 2025. Drop me an email if you are recruiting!

Email  /  Google Scholar  /  Github

Selected Research
Magic Mirror: ID-Preserved Video Generation in Video Diffusion Transformers
Yuechen Zhang*, Yaoyang Liu*, Bin Xia, Bohao Peng, Zexin Yan, Eric Lo, Jiaya Jia

Preprint, 2024
arXiv / Project Page / Demo / Code

Light passes through the Magic Mirror to create a personalized virtual world~ It generates identity-preserved videos from reference images using a conditional adaptive normalization module for faster convergence in video Diffusion Transformers.

ControlNeXt: Powerful and Efficient Control for Image and Video Generation
Bohao Peng, Jian Wang, Yuechen Zhang, Wenbo Li, Ming-Chang Yang, Jiaya Jia

Preprint, 2024
arXiv / Project Page / Demo / Code

This work proposes a lightweight control module for various base models (SD1.5, SDXL, SD3, SVD) and tasks (image and video generation with various conditions).

Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
Yanwei Li*, Yuechen Zhang*, Chengyao Wang*, Zhisheng Zhong, Yixin Chen, Ruihang Chu, Shaoteng Liu, Jiaya Jia

Preprint, 2024
arXiv / Project Page / Demo / Model / Data / Code

Mining the potential of open-source VLMs! Mini-Gemini is a novel framework with VLMs ranging from 2B to 34B for high-resolution image understanding. It has an impressive OCR capability and can generate high-quality images powered by its multi-modal reasoning ability.

Prompt Highlighter: Interactive Control for Multi-Modal LLMs
Yuechen Zhang, Shengju Qian, Bohao Peng, Shu Liu, Jiaya Jia

CVPR, 2024
arXiv / Project Page / Code

Control text generation by highlighting your prompt! Prompt Highlighter is a training-free inference pipeline that facilitates token-level user interactions for customized generation. Our method is compatible with both LLMs and VLMs.

Real-World Image Variation by Aligning Diffusion Inversion Chain
Yuechen Zhang, Jinbo Xing, Eric Lo, Jiaya Jia

NeurIPS (Spotlight), 2023
arXiv / Project Page / Code

Given an image as the prompt, we can generate its variations by aligning the diffusion inversion chain. The variations are diverse and controllable.

Video-P2P: Video Editing with Cross-attention Control
Shaoteng Liu, Yuechen Zhang, Wenbo Li, Zhe Lin, Jiaya Jia

CVPR, 2024
arXiv / Project Page / Code

Add a 'Lego' attribute to the child, and an edited video is generated, powered by a novel video inversion process and cross-attention control. We also find that a Decoupled-Guidance strategy is essential for video editing.

Ref-NPR: Reference-Based Non-Photorealistic Radiance Fields
Yuechen Zhang, Zexin He, Jinbo Xing, Xufeng Yao, Jiaya Jia

CVPR, 2023
arXiv / Project Page / Code

We present a controllable scene stylization method utilizing radiance fields to stylize a 3D scene, with a single stylized 2D view taken as reference.

CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior
Jinbo Xing, Menghan Xia, Yuechen Zhang, Xiaodong Cun, Jue Wang, Tien-Tsin Wong

CVPR, 2023
arXiv / Project Page / Code

A speech input generates an authentic 3D facial animation based on representations with a discrete motion prior.

R2Former: Probing Region Relationship in Semantic Segmentation Transformers
Yuechen Zhang, Tiancheng Shen*, Huaijia Lin, Lu Qi, Eric Lo, Jiaya Jia

Preprint, 2022

R2Former decomposes region relationships into intra-region and inter-region ones. With these modifications, mask classification performance improves on widely used semantic segmentation benchmarks.

High Quality Segmentation for Ultra High-resolution Images
Tiancheng Shen, Yuechen Zhang, Lu Qi, Jason Kuen, Xingyu Xie, Jianlong Wu, Zhe Lin, Jiaya Jia

CVPR, 2022
Paper / Code

We propose the Continuous Refinement Model (CRM) for ultra high-resolution mask refinement. CRM aligns features with the refinement target and aggregates them to reconstruct image details.

PCL: Proxy-based Contrastive Learning for Domain Generalization
Xufeng Yao, Yang Bai, Xinyun Zhang, Yuechen Zhang, Qi Sun, Ran Chen, Ruiyu Li, Bei Yu

CVPR, 2022
Paper / Code

We propose a novel proxy-based contrastive learning method, which replaces the original sample-to-sample relations with proxy-to-sample relations, significantly alleviating the positive alignment issue.

Other Projects

Flow-aware synthesis: A generic motion model for video frame interpolation
Paper

Computational Visual Media, 2021. Jinbo Xing, Wenbo Hu, Yuechen Zhang, Tien-Tsin Wong


ESTR4999: Few-Shot Glyph Style Transfer
Code

CUHK FYP. Supervised by Tien-Tsin Wong.


ESTR4998: Portrait Style Transfer
Code

CUHK FYP. Collaborated with Jinbo Xing, supervised by Tien-Tsin Wong.

Experiences
SmartMore Corporation Limited | Computer Vision Developer
Mentor: Shu Liu
Jan. 2020 - Present
The Chinese University of Hong Kong | Bachelor of Computer Science
Supervisor: Tien-Tsin Wong
Sep. 2016 - Jul. 2021
Nanyang Technological University | GEM Trailblazer Exchange Program
Jan. 2019 - May 2019

Selected Awards
  • CUHK ELITE Stream Scholarship, 2017, 2018

  • CUHK CSE Academic Outstanding Award, 2018, 2019, 2020

  • CUHK Faculty of Engineering, Dean’s List, 2017, 2018

  • CW Chu College Scholarships for Academic Excellence, 2018, 2019, 2020

Teaching
CSCI3280 | Introduction to Multimedia Systems | 2023 Spring
ESTR4998 | Final Year Thesis | 2024-2025
Reviewer
2024 | CVPR, NeurIPS, ECCV, AAAI
2023 | AAAI

Last updated: Nov 2022
Web page design credit to Jon Barron and Zexin He
Avatar style credit to Shaoteng Liu