I am an undergraduate student pursuing a B.Sc. in Computational Data Science at The Chinese University of Hong Kong, with an expected graduation in May 2027. My research interests focus on multimodal large language models (MLLMs), spatial reasoning, reinforcement learning in games, and video security in computer vision.

I am currently involved in several research projects spanning computer vision, natural language processing, and machine learning. My work covers token-level reinforcement learning algorithms, adversarial video purification, and benchmarking the spatial reasoning abilities of MLLMs.

I am passionate about advancing the field of artificial intelligence, particularly in areas where computer vision meets robust security and language understanding.

🔥 News

2026.01: 📄 Our paper “FMVP: Masked Flow Matching for Adversarial Video Purification” is now available on arXiv.
2025.11: 🚀 Started internship at Huawei 2012 Lab.
2025.02: 📄 Our paper “Human Cognitive Benchmarks Reveal Foundational Visual Gaps in MLLMs” is now available on arXiv.
2024.07: 🏆 Achieved Dean’s List recognition (Top 10%) for exceptional academic performance.
2024.05: 🏅 Meritorious Winner in the MCM Mathematical Modeling Contest (COMAP).

📝 Publications

arXiv 2026

FMVP: Masked Flow Matching for Adversarial Video Purification

Duoxun Tang, Xueyi Zhang, Chak Hin Wang, Xi Xiao, Dasen Dai, Xinhang Jiang, Wentao Shi, Rui Li, Qing Li

arXiv

Proposed a novel video purification framework integrating Conditional Flow Matching (CFM) with a masking strategy to physically disrupt adversarial patterns.
Designed a Frequency-Gated Loss (FGL) to suppress high-frequency adversarial noise while preserving low-frequency semantic fidelity.

arXiv 2025

Human Cognitive Benchmarks Reveal Foundational Visual Gaps in MLLMs

Jen-Tse Huang, Dasen Dai, Jen-Yuan Huang, Youliang Yuan, Xiaoyuan Liu, Wenxuan Wang, Wenxiang Jiao, Pinjia He, Zhaopeng Tu

arXiv

Built a benchmark suite for evaluating MLLMs via the Kit of Factor-Referenced Cognitive Tests, revealing foundational visual gaps in current models.
Developed a difficulty-controllable generator for the most challenging sub-tests that can produce an unlimited number of test cases at graded difficulty levels, mitigating overfitting to the benchmark and supporting its long-term validity.

🎖 Honors and Awards

2025 Dean’s List (Top 10%), Faculty of Engineering, CUHK
2024 Dr Shu-chia Yang GOAL Programme Memorial Scholarship, United College, CUHK
2024 The Alumni Association of United College of the CUHK Ltd Prize, United College, CUHK
2024 ELITE Stream Scholarship, Faculty of Engineering, CUHK
2024 Dean’s List (Top 10%), Faculty of Engineering, CUHK
2024 Talent Development Scholarship, HKSAR
2024 Meritorious Winner in the MCM Mathematical Modeling Contest, COMAP

📖 Education

2024.09 - 2027.05 (Expected), B.Sc. in Computational Data Science, The Chinese University of Hong Kong, New Territories, Hong Kong

🔬 Research Experience

2025.05 - Present, Token-Level RL Algorithm for Multimodal Large Language Models, Dr. Xufang Luo, Shanghai, China
- Developing a token-level RL algorithm to strengthen the spatial-reasoning capacity of MLLMs.
2025.06 - Present, Pushing the Boundaries of Game-RL Generalization, Prof. Xiangyu Yue, NT, Hong Kong
- Building an automated benchmark for classic puzzle games with dynamically generated visual puzzle descriptions.
2024.09 - 2025.06, Benchmarking Spatial Reasoning Abilities of MLLMs, Dr. Jen-Tse Huang, NT, Hong Kong
- Built a benchmark suite for evaluating MLLMs via the Kit of Factor-Referenced Cognitive Tests.

💻 Technical Skills