About Me

I am the founding director of Celia Large Model Application Lab @HKRC, where I lead a research group of 30+ excellent researchers focusing on multi-modal models (covering both visual understanding and generation) and AI agents.
Before that, I was a senior researcher at SenseTime Group, where I worked on on-device multi-modal models, including vision language models (VLMs) and diffusion models (DMs).
I hold a PhD from MMLab, CUHK, supervised by Prof. Xiaogang Wang and Prof. Hongsheng Li.

News

[May, 2026] Three papers accepted by ICML 2026.
[Apr., 2026] Released Aura, a SOTA streaming video understanding framework.
[Apr., 2026] One paper accepted by ACL 2026.
[Mar., 2026] Released CoVe, a novel agentic post-training framework.
[Feb., 2026] Released Capybara, the first unified video generation & editing model.
[Feb., 2026] Three papers accepted by CVPR 2026.
[Jan., 2026] Two papers accepted by ICLR 2026.
[Oct., 2025] Check out our new work MathCanvas, the first unified multi-modal model that enables thinking while drawing auxiliary lines!
[Aug., 2025] One paper accepted by EMNLP 2025.
[July, 2025] Released GHPO, a novel reinforcement learning algorithm for LLM post-training.
[Apr., 2025] Released Pusa, a SOTA-level image-to-video model.
[Sep., 2024] Joined Huawei Hong Kong Research Center as TopMinds.
[July, 2022] Joined SenseTime as a senior researcher.
[June, 2022] Graduated from MMLab, CUHK.