I am a Senior Research Scientist at the National Research Council Canada (NRC) in Ottawa,
where I lead projects at the intersection of AI, digital health, human motion, and robotics.
My recent research spans machine learning, robotics, computer vision, and human-centered systems.
A key direction is human-autonomy teaming, where I study human trust in automation, situational awareness,
and command-and-control frameworks to support effective collaboration between people and intelligent systems.
I am actively researching human-robot interaction, particularly in assistive contexts for older adults,
with an emphasis on physical collaboration and human perception. Closely connected to this is my work on
3D human motion generation, where we develop AI methods for synthesizing realistic human movements across
environments—research that informs both motion understanding and interaction design for assistive robotics.
In the area of AI for digital health, I develop systems that leverage audio, video, physiological sensing,
and multimodal data fusion to support safety, wellbeing, and early detection of health-related conditions.
This includes research on diet and food understanding, where computer vision, unsupervised learning,
and generative AI combine visual and linguistic cues to model processes such as food degradation and to monitor home activities.
My earlier work includes contributions to 3D human modeling and cross-cultural anthropometric analysis.
This focus has since shifted toward studying human motion, both its generation and its role in interaction with robotic systems,
as a foundation for advancing assistive technologies and human-centered AI.
Updates
September 2025: Paper accepted to ARC workshop @ HUMANOIDS 2025.
September 2025: Paper accepted to Data workshop @ CoRL 2025.
September 2025: Paper accepted to SIGGRAPH Asia 2025.
July 2025: Invited to review for Croatian Science Foundation (HRZZ).
April 2025: Paper accepted to IEEE Transactions on Instrumentation and Measurement.
January 2025: Started new project on generative motion models.
Selected Publications
We introduce StableMotion, a diffusion-based approach for cleaning corrupted motion capture data without the need for paired clean datasets. By leveraging simple motion quality indicators, our method trains directly on raw, artifact-prone mocap data and can both detect and fix errors at test time. Applied to a 245-hour SoccerMocap dataset, StableMotion reduces motion pops by 68% and frozen frames by 81%, outperforming state-of-the-art methods while preserving motion content.
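To make the recipe concrete, below is a minimal, hypothetical PyTorch sketch of quality-conditioned diffusion cleanup: a denoiser trains directly on raw mocap by receiving a per-frame corruption flag as conditioning, and at test time resamples flagged frames under the "clean" condition while pinning unflagged frames to the input. The architecture, noise schedule, and all names here are illustrative assumptions rather than the paper's implementation, and the detection step is omitted (the corruption mask is taken as given).

```python
# Hypothetical sketch only: names, shapes, and the linear schedule are
# illustrative assumptions, not the StableMotion implementation.
import torch
import torch.nn as nn

T = 100                                        # diffusion steps (assumed)
betas = torch.linspace(1e-4, 0.02, T)          # linear noise schedule (assumed)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

class Denoiser(nn.Module):
    """Predicts the noise added to a motion frame, conditioned on a per-frame
    quality flag (1 = corrupted, 0 = clean) and the diffusion timestep."""
    def __init__(self, motion_dim=66, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(motion_dim + 2, hidden),   # +1 quality flag, +1 timestep
            nn.SiLU(),
            nn.Linear(hidden, hidden),
            nn.SiLU(),
            nn.Linear(hidden, motion_dim),
        )

    def forward(self, x, t, quality):
        t_emb = (t.float() / T).unsqueeze(-1)
        return self.net(torch.cat([x, quality.unsqueeze(-1), t_emb], dim=-1))

def train_step(model, opt, motion, quality):
    """One step on raw, artifact-prone mocap: the quality flag tells the model
    which frames are corrupted, so no paired clean dataset is required."""
    t = torch.randint(0, T, (motion.shape[0],))
    noise = torch.randn_like(motion)
    a_bar = alphas_bar[t].unsqueeze(-1)
    x_t = a_bar.sqrt() * motion + (1 - a_bar).sqrt() * noise
    loss = ((model(x_t, t, quality) - noise) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

@torch.no_grad()
def fix(model, motion, corrupted):
    """Test-time repair: resample flagged frames under the 'clean' condition
    while pinning unflagged frames to (noised versions of) the input."""
    x = torch.randn_like(motion)
    cond = torch.zeros(motion.shape[0])          # condition: quality = clean
    keep = ~corrupted.unsqueeze(-1)              # frames to preserve as-is
    for t in reversed(range(T)):
        tt = torch.full((motion.shape[0],), t)
        eps = model(x, tt, cond)
        x = (x - betas[t] / (1 - alphas_bar[t]).sqrt() * eps) \
            / (1.0 - betas[t]).sqrt()
        if t > 0:
            x = x + betas[t].sqrt() * torch.randn_like(x)
            noised = alphas_bar[t - 1].sqrt() * motion \
                + (1 - alphas_bar[t - 1]).sqrt() * torch.randn_like(motion)
            x = torch.where(keep, noised, x)
    return torch.where(keep, motion, x)
```

Pinning clean frames to a matching-noise-level copy of the input during sampling is a common inpainting trick for diffusion models; it keeps repaired frames consistent with the surrounding motion content.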
We investigate how different teleoperation interfaces shape the quality of robot learning data. In a study of 400 demonstrations on assistive tasks (wiping and lamp switching), we compare VR controllers and a haptic device, measuring operator workload with NASA-TLX and fine-tuning the Octo model under varying conditions. Our findings reveal clear modality-driven differences in data quality and downstream imitation learning performance, offering practical guidance for designing robot data collection strategies.
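For reference, workload in the study is scored with the standard NASA-TLX procedure: six subscales rated 0-100, weighted by pairwise-comparison tallies that sum to 15. A small helper illustrating that computation (the variable names and example numbers are mine, not the study's data):

```python
def nasa_tlx_weighted(ratings: dict, weights: dict) -> float:
    """Weighted NASA-TLX: ratings are 0-100 per subscale; weights are
    pairwise-comparison tallies (0-5 each) that sum to 15 across subscales."""
    scales = ("mental", "physical", "temporal",
              "performance", "effort", "frustration")
    assert sum(weights[s] for s in scales) == 15, "weights must total 15"
    return sum(ratings[s] * weights[s] for s in scales) / 15.0

# Example: an operator who found the task mentally demanding but easy physically.
ratings = {"mental": 75, "physical": 20, "temporal": 50,
           "performance": 30, "effort": 60, "frustration": 40}
weights = {"mental": 5, "physical": 1, "temporal": 3,
           "performance": 2, "effort": 3, "frustration": 1}
print(nasa_tlx_weighted(ratings, weights))  # -> 55.0
```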
We present a comparative study on how different teleoperation modalities—markerless hand-tracking, VR controllers, and haptic devices—affect the quality of demonstrations and learned policies in imitation learning. Focusing on assistive manipulation tasks relevant to aging populations, we show that VR and haptic inputs provide more consistent demonstrations than hand-tracking alone. Moreover, combining modalities improves policy robustness and accuracy.
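As a purely illustrative sketch of the data-mixing idea, assuming a simple per-demonstration record (all names here are mine, not the study's code):

```python
import random
from dataclasses import dataclass

@dataclass
class Demo:
    modality: str       # "hand_tracking", "vr_controller", or "haptic"
    observations: list  # per-step sensor features
    actions: list       # per-step robot commands

def build_training_mix(demos, include=("vr_controller", "haptic")):
    """Pool demonstrations from the selected modalities and shuffle them,
    so imitation-learning batches interleave input styles."""
    mix = [d for d in demos if d.modality in include]
    random.shuffle(mix)
    return mix
```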
This paper presents a lightweight, embedded cough analysis model for smart home systems, enabling accurate and privacy-preserving health monitoring. By distilling knowledge from a large speech foundation model, the student network achieves strong performance across multiple cough-related tasks while dramatically reducing model size and computation. The proposed approach generalizes well to unseen sound classes and eliminates the need for cloud-based audio processing.
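The compression step follows the general knowledge-distillation pattern; as a hedged illustration (not the paper's actual loss or models), a standard temperature-scaled distillation objective in PyTorch looks like this:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Blend soft teacher targets (temperature-scaled KL) with hard labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients match the hard-label term
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```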
The paper presents an approach to modeling food degradation using fuzzy clustering, capturing the gradual nature of decay without relying on labeled data. It integrates traditional visual features with semantic features from Vision-Language Models to enable both low-level and high-level interpretation.
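For intuition, fuzzy clustering assigns each sample a soft membership in every cluster, which suits gradual decay where stage boundaries are not crisp. A minimal fuzzy c-means sketch in NumPy (the feature source and cluster count are my own assumptions, not the paper's setup):

```python
import numpy as np

def fuzzy_cmeans(X, c=3, m=2.0, iters=100, seed=0):
    """X: (n, d) image features; returns cluster centers and a (c, n)
    membership matrix whose columns sum to 1 (soft decay stages)."""
    rng = np.random.default_rng(seed)
    u = rng.random((c, X.shape[0]))
    u /= u.sum(axis=0)                          # memberships sum to 1 per sample
    for _ in range(iters):
        um = u ** m                             # fuzzified memberships
        centers = um @ X / um.sum(axis=1, keepdims=True)
        d = np.linalg.norm(X[None] - centers[:, None], axis=2) + 1e-9
        u = d ** (-2.0 / (m - 1))               # standard FCM membership update
        u /= u.sum(axis=0)
    return centers, u
```

Reading a column of the membership matrix as a degradation profile (e.g., 0.6 "fresh", 0.3 "wilting", 0.1 "spoiled") is what lets the model express gradual decay without labeled stage boundaries.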