Pengcheng Xi

I'm a senior research scientist at the National Research Council Canada (NRC) in Ottawa, Ontario, Canada, where I lead projects on AI for digital health, human motion, and robotics.

I'm an adjunct research professor at Carleton University (CU) and was previously an adjunct assistant professor at the University of Waterloo.

I did my PhD at CU, where I was advised by Rafik Goubran and Chang Shu.

Email  /  Scholar  /  X  /  LinkedIn


Research

My recent research focuses on the intersection of machine learning, robotics, computer vision, and human-centered systems. In the area of AI for digital health, I develop systems that leverage audio, video, physiological sensing, and multimodal data fusion to support safety, wellbeing, and early detection of health-related conditions.

I am also actively researching human-robot interaction, particularly in assistive contexts for older adults, with an emphasis on physical collaboration and human perception. A key direction in this space involves 3D human motion generation, where we explore AI methods for synthesizing realistic human movements across different environments—work that informs and complements our efforts in assistive robotics.

In the domain of AI for diet and food understanding, I investigate novel approaches that combine visual and linguistic cues with computer vision, unsupervised learning, and generative AI to model natural processes such as food degradation.

My earlier work includes contributions to 3D human shape reconstruction, cross-cultural anthropometric analysis, and deep learning methods for medical image analysis. Across these domains, my goal is to develop intelligent systems that enhance real-world usability, safety, and quality of life.

Updates

  • April 2025: Paper accepted to IEEE Transactions on Instrumentation and Measurement.
  • January 2025: Started a new project on generative motion models.
Publications

Learning to Assist: Studying Teleoperation Modalities for Scalable Imitation Learning in Humanoid Robots
Submitted, 2025

We present a comparative study on how different teleoperation modalities—markerless hand-tracking, VR controllers, and haptic devices—affect the quality of demonstrations and learned policies in imitation learning for humanoid robots. Focusing on assistive manipulation tasks relevant to aging populations, we show that VR and haptic inputs provide more consistent demonstrations than hand-tracking alone. Moreover, combining modalities improves policy robustness and accuracy. These findings emphasize the importance of modality selection in designing effective imitation learning pipelines for assistive robotics.

Coughprint: Distilled Cough Representations From Speech Foundation Model Embeddings
IEEE Transactions on Instrumentation and Measurement, vol. 74, Art. no. 2532210, pp. 1-10, 2025. doi: 10.1109/TIM.2025.3568985

This paper presents a lightweight, embedded cough analysis model for smart home systems, enabling accurate and privacy-preserving health monitoring. By distilling knowledge from a large speech foundation model, the student network achieves strong performance across multiple cough-related tasks while dramatically reducing model size and computation. The proposed approach generalizes well to unseen sound classes and eliminates the need for cloud-based audio processing.

Food Degradation Analysis Using Multimodal Fuzzy Clustering
CVPR MetaFood Workshop, 2025

The paper presents an approach to modeling food degradation using fuzzy clustering, capturing the gradual nature of decay without relying on labeled data. It integrates traditional visual features with semantic features from Vision-Language Models to enable both low-level and high-level interpretation.