About Me

Ruihao Xia is a PhD student at East China University of Science and Technology (ECUST), specializing in computer vision and deep learning. His research centers on 3D AI Generation, Event-based Vision, Cross-Modality Domain Adaptation, and Semantic Segmentation. He received his B.S. in Mechanical Engineering, also from ECUST, where he earned excellent grades and developed a passion for computer vision.

Education

Visiting PhD Student in the School of Computing and Information Systems
Singapore Management University (SMU)
2025.01 - Present (2025.10)

PhD in Control Science and Engineering
East China University of Science and Technology (ECUST)
2021 - Present (2026.09)

B.S. in Mechanical Engineering
East China University of Science and Technology (ECUST)
2017 - 2021

Research Interests

  • 3D AI Generation
  • Event-based Vision
  • Cross-Modality Domain Adaptation
  • Semantic Segmentation

Honors and Awards

  • 2021 Shanghai Excellent Graduates
  • 2020 National Undergraduate Smart Car Competition 2nd Prize
  • 2019-2020 National Scholarship
  • 2019 Shanghai Undergraduate Creative Robot Competition 2nd Prize
  • 2018-2019 National Scholarship

Work Experience

Algorithm Research Intern
- Imaging Algorithm Research Department, Quality Enhancement Center
2024.05 - 2024.09
Conducted frontier research on image matting algorithms for mobile imaging. Addressed the limited generalization of interactive matting by proposing the COCO-Matting dataset and the SEMat framework; the related work has been submitted to IEEE TCSVT.
Algorithm Research Intern
- Central Research Institute, Advanced Computing and Storage Laboratory
2024.09 - 2024.12
Conducted frontier research on scene understanding algorithms based on event cameras. Addressed the underutilization of visual foundation models in event-based vision by proposing the TGVFM framework; the related work has been submitted to IEEE TCSVT.

Research & Publications (First Author)

Towards Scalable and Consistent 3D Editing

Ruihao Xia, Yang Tang*, Pan Zhou*

Under Review 2025

We introduce 3DEditVerse, the largest paired 3D editing benchmark, and propose 3DEditFormer, a mask-free transformer enabling precise, consistent, and scalable 3D edits.

Paper | Code | Project Page
3D Editing | 3D Generation
Unsupervised Modality Adaptation with Text-to-Image Diffusion Models for Semantic Segmentation

Ruihao Xia, Yu Liang, Peng-Tao Jiang, Hao Zhang, Bo Li*, Yang Tang*, Pan Zhou

Neural Information Processing Systems (NeurIPS) 2024

We propose MADM, a diffusion-based framework that leverages text-to-image pre-trained models with pseudo-label stabilization and latent label regression, achieving SoTA semantic segmentation adaptation across image, depth, infrared, and event modalities.
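
For readers outside the area, the general self-training idea behind pseudo-label stabilization can be illustrated with a minimal, generic PyTorch sketch: a teacher predicts on unlabeled target images, low-confidence pixels are masked out, and the student trains on the surviving pseudo-labels together with labeled source data. This is a standard UDA baseline shown for illustration only, not MADM itself; the student/teacher models, confidence threshold, and equal loss weighting are assumptions.

    # Generic confidence-thresholded self-training step for UDA segmentation
    # (illustrative sketch, not the MADM implementation).
    import torch
    import torch.nn.functional as F

    def self_training_loss(student, teacher, src_img, src_label, tgt_img,
                           conf_thresh=0.9, ignore_index=255):
        # Supervised loss on labeled source-domain images.
        src_logits = student(src_img)                        # (B, C, H, W)
        loss_src = F.cross_entropy(src_logits, src_label, ignore_index=ignore_index)

        # Pseudo-labels from the frozen teacher on unlabeled target images.
        with torch.no_grad():
            tgt_prob = torch.softmax(teacher(tgt_img), dim=1)
            conf, pseudo = tgt_prob.max(dim=1)               # per-pixel confidence and class
            pseudo[conf < conf_thresh] = ignore_index        # drop unreliable pixels

        # Unsupervised loss on confident target pixels only.
        loss_tgt = F.cross_entropy(student(tgt_img), pseudo, ignore_index=ignore_index)
        return loss_src + loss_tgt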

Paper | Code
Cross-Modality Domain Adaptation | Semantic Segmentation
CMDA: Cross-Modality Domain Adaptation for Nighttime Semantic Segmentation

Ruihao Xia, Chaoqiang Zhao, Meng Zheng, Ziyan Wu, Qiyu Sun, Yang Tang*

International Conference on Computer Vision (ICCV) 2023

We propose CMDA, a cross-modality domain adaptation framework that leverages both images and events with daytime labels, introducing the first image-event nighttime segmentation dataset for evaluation.

Paper | Code
Domain Adaptation | Semantic Segmentation | Event-based Vision
Towards Natural Image Matting in the Wild via Real-Scenario Prior

Ruihao Xia, Yu Liang, Peng-Tao Jiang*, Hao Zhang, Qianru Sun, Yang Tang*, Bo Li, Pan Zhou

Submitted to IEEE TCSVT 2025

We introduce COCO-Matting and SEMat, a dataset-method pair that leverages real-world human mattes and a feature/matte-aligned transformer-decoder design with trimap-based regularization.
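
As a rough illustration of what trimap-based regularization can mean in a matting objective, the minimal PyTorch sketch below supervises alpha in the unknown band of a trimap and pushes it toward 0/1 in the certain regions. It is a generic example under an assumed trimap convention (0 = background, 1 = foreground, other values = unknown), not the SEMat loss; the weighting is arbitrary.

    # Generic trimap-regularized L1 matting loss (illustrative sketch,
    # not the SEMat objective). Assumes trimap values: 0 = background,
    # 1 = foreground, anything else = unknown band.
    import torch

    def trimap_regularized_l1(alpha_pred, alpha_gt, trimap, reg_weight=0.5):
        fg = (trimap == 1.0).float()      # certain foreground
        bg = (trimap == 0.0).float()      # certain background
        unk = 1.0 - fg - bg               # unknown transition band
        eps = 1e-6

        # Standard L1 supervision restricted to the unknown band.
        loss_unk = (unk * (alpha_pred - alpha_gt).abs()).sum() / (unk.sum() + eps)
        # Regularization: alpha should be 1 on foreground, 0 on background.
        loss_reg = ((fg * (1.0 - alpha_pred).abs()).sum()
                    + (bg * alpha_pred.abs()).sum()) / (fg.sum() + bg.sum() + eps)
        return loss_unk + reg_weight * loss_reg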

Paper | Code
Image Matting | Image Segmentation
Temporal-Guided Visual Foundation Models for Event-Based Vision

Ruihao Xia, Junhong Cai, Luziwei Leng*, Ran Cheng, Yang Tang*, Pan Zhou

Submitted to IEEE TCSVT 2025

We present TGVFM, a temporal-guided framework that integrates pretrained Visual Foundation Models with novel spatiotemporal attention blocks, achieving SoTA gains in event-based semantic segmentation, depth estimation, and object detection.
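
One generic way to add temporal guidance on top of per-frame features from a frozen visual foundation model is to let each spatial location attend over a short temporal window, as in the minimal PyTorch sketch below. This is an illustrative module, not the TGVFM blocks; the (B, T, C, H, W) layout, pre-norm residual design, and head count are assumptions (C must be divisible by the number of heads).

    # Generic temporal self-attention over per-frame feature maps
    # (illustrative sketch, not the TGVFM spatiotemporal blocks).
    import torch
    import torch.nn as nn

    class TemporalAttention(nn.Module):
        def __init__(self, dim, num_heads=4):
            super().__init__()
            self.norm = nn.LayerNorm(dim)
            self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

        def forward(self, feats):
            # feats: (B, T, C, H, W) features for T consecutive event frames.
            b, t, c, h, w = feats.shape
            x = feats.permute(0, 3, 4, 1, 2).reshape(b * h * w, t, c)  # time tokens per location
            y = self.norm(x)
            y, _ = self.attn(y, y, y)        # attend across the temporal window
            x = x + y                        # residual connection
            return x.reshape(b, h, w, t, c).permute(0, 3, 4, 1, 2)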

Event-based Vision | Visual Foundation Models
Modality Translation and Fusion for Event-based Semantic Segmentation

Ruihao Xia, Chaoqiang Zhao, Qiyu Sun, Shuang Cao, Yang Tang*

IFAC Control Engineering Practice (CEP) 2023

We propose MTF, a modality translation and fusion framework that distills complementary cross-modality knowledge from image-based teachers to event-based networks, achieving SoTA semantic segmentation in low-light conditions.
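
A minimal sketch of cross-modality knowledge distillation makes the idea concrete: a frozen image-based teacher produces softened per-pixel class distributions that an event-based student matches with a KL term. The PyTorch code below is a textbook distillation loss shown for illustration, not the MTF objective; the temperature and reduction are assumptions.

    # Generic cross-modality distillation loss (illustrative sketch, not
    # the MTF objective): soften teacher and student segmentation logits
    # and match them with KL divergence, scaled by T^2 as is standard.
    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, temperature=2.0):
        t = temperature
        log_p_student = F.log_softmax(student_logits / t, dim=1)
        p_teacher = F.softmax(teacher_logits.detach() / t, dim=1)
        return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (t * t)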

Paper
Event-based Vision | Semantic Segmentation