The 2nd Workshop on Nonverbal Cues for Human-Robot Cooperative Intelligence

Georgia World Congress Center in Atlanta, USA

23rd May

08:30 to 12:30

About the Workshop

This workshop is dedicated to discussing computational methods for sensing and recognition of nonverbal cues and internal states in the wild to realize cooperative intelligence between humans and intelligent systems. We gather researchers from different expertise, yet having the common goal, motivation, and resolve to explore and tackle this delicate issue considering the practicality of industrial applications. We are calling for papers to discuss novel methods to realize human-robot cooperative intelligence by sensing and understanding humans’ behavior, internal states, and to generate empathetic interactions.

Human internal state inference, e.g., cognitive, emotional, intention models.
Recognition of nonverbal cues, e.g., gaze and attention, body language, para-language.
Multi-modal sensing fusion for scene perception.
Nonverbal behavior generation for robots/agents, e.g., gaze salience, gesture.
Synchronization of nonverbal and verbal behavior
Learning algorithms, e.g., cross-embodiment and cross-context learning, imitation learning.
Generative and adversarial algorithms to enhance human-robot interaction, e.g., LLMs, diffusion models, VLMs.
Empathetic interaction between humans and intelligent systems.
Robust sensing of facial and body key points.
Social interaction dynamics modeling, e.g., harmony level, engagements.
Personalization of intelligent systems from nonverbal cues and trust evaluation.
Applications of cooperative intelligence in the wild.

Keywords: "Human: Face, gaze, body, pose, gesture, movement, attention, cognitivestate, emotion state, intention, empathy, Environment: Object"

Secondary subject: "Human-Robot cooperative intelligence", "Nonverbal cues recognition from audiovisual", "Human internal state inference from multi-modality", "Vision applications and systems", "Human-Object interaction and scene understanding"

Organizers

News updates

Sept 25th	Workshop webpage was launched.
Jan 20th	Submission can be made via Easychair.
May 24th	Call for paper for Advanced Robotics Special Issue on Nonverbal Cues for Human-Robot Cooperative Intelligence (submission deadline: 30 June 2025)

Call for Papers

Submission Guidelines

We invite authors to submit unpublished papers (2-4 pages excluding references) to our workshop, to be presented at a workshop session upon acceptance. Submissions will undergo a peer-review process by the workshop's program committee and accepted papers will be invited to present their works at the workshop (see presentation format).

We are pleased to announce that award will be given to the best paper accepted by this workshop.

Important Dates

~~Mar 10, 2025~~ → April 4, 2025
Workshop paper submission deadline
~~Mar 31, 2025~~ → April 25, 2025
Workshop paper reviews deadline
~~April 10, 2025~~ → April 25, 2025
Notification to authors
~~April 24, 2025~~ → May 9, 2025
Camera-ready deadline
~~May 23, 2025~~
Workshop day

Submission Instructions

Please use the IEEE conferences paper format to write your manuscript.

LaTex and MS Word template: https://www.ieee.org/conferences/publishing/templates.html

Please submit your paper electronically through the workshop's EasyChair submission system.

Link to EasyChair submission system: https://easychair.org/my/conference?conf=icra2025workshop

Presentation Format

Accepted papers should be presented in three-way presentation approach to foster active participation

Spotlight talks (5 mins talk, Q&A in the poster session)
In-person A0 posters for in-depth discussions
Short pre-recorded videos (about 2 minutes) to be uploaded on the workshop webpage

Publication Format

Authors are recommended to archive their papers and inform workshop organizers once this procedure is completed. Accepted papers which have been archived will be hosted on the workshop webpage.

Link to arXiv: https://info.arxiv.org/help/submit/index.html

As with the previous IROS2024 workshop, extensions of the papers presented at this ICRA2025 workshop will be invited to submit to a special issue journal to-be-announced at a later date.

Program

We plan a half-day event for 4 hours, including talks by four invited speakers. For participants who could not attend in person, we will disseminate the papers and pre-recorded videos on our workshop page, which also consists of a comment section for Q&A.

08:30 08:33 Welcome and opening remarks (3 mins)
08:33 09:13 Invited talk I by Tetsuya Ogata (40 mins including 2 mins Q&A)
09:13 10:03 Flash talks: Accepted workshop and invited papers (5 mins each)
10:03 10:28 Coffee break and poster session (25 mins)
10:28 11:08 Invited talk II by Marynel V´azquez(40 mins including 2 mins Q&A)
11:08 11:48 Invited talk III by Sehoon Ha(40 mins including 2 mins Q&A)
11:48 12:28 Invited talk IV by Tetsunari Inamura(40 mins including 2 mins Q&A)
12:28 12:30 Closing remarks and awards ceremony (2 mins)

Invited Speakers

We intend to have speakers from different ethnic backgrounds, countries, and career stages. Specifically, we confirmed the attendance of four speakers.

Invited Talk I

Learning Tasks and Interactions for AI Robots Assisting Human Daily Life

Tetsuya Ogata, Waseda University, Japan.

Link to website: https://ogata-lab.jp/member/ogata.html

Abstract

Generative AI based on foundation models has become a powerful tool for enabling robots to perform diverse and complex tasks. However, the large-scale nature of these models presents challenges for deployment on autonomous robots with limited computational resources. This presentation will discuss the importance of predictive inference (active inference) during task execution, particularly in the context of deploying generative AI on edge devices for robots. We will explore how this approach reduces reliance on large-scale training data and minimizes memory usage, making AI-powered assistance more feasible in real-world settings. Furthermore, we will introduce a developmental robotics perspective, focusing on constructing foundation models that do not rely on vast datasets but instead leverage continuous, adaptive learning.

Biography

Tetsuya Ogata received the B.S., M.S., and D.E. degrees in mechanical engineering fromWaseda University, Tokyo, Japan, in 1993, 1995, and 2000, respectively. He was a Research Associate with Waseda University from 1999 to 2001. From 2001 to 2003, he wasa Research Scientist with the RIKEN Brain Science Institute, Saitama, Japan. From 2003to 2012, he was an Associate Professor at the Graduate School of Informatics, Kyoto University, Kyoto, Japan. Since 2012, he has been a Professor with the Faculty of Science andEngineering, at Waseda University. From 2009 to 2015, he was a JST (Japan Science andTechnology Agency) PREST Researcher. Since 2017, he is a Joint-appointed Fellow withthe Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo. He served as director of the Robotics Society of Japan (RSJ)from 2014 to 2015 and of the Japanese Society of Artificial Intelligence (JSAI) from 2016to 2018. He is currently a member of the director board of the Japan Deep Learning Asso-ciation (JDLA) since 2017, and a director of the Institute of AI and Robotics, at WasedaUniversity since 2020. His current research interests include deep learning for robot motioncontrol, human–robot interaction, and dynamics of human–robot mutual adaptation.

Invited Talk II

Leveraging the Relational Structure of Interactions to Interpret and Generate Social Signals in HRI

Marynel V´azquez, Yale University, USA.

Link to website: https://cpsc.yale.edu/people/marynel-vazquez

Abstract

Many real-world applications require that robots engage appropriately in social interactions with small groups of people. For example, a robot that serves as a tour guide may approach a group of people in a museum to explain an artwork, or an educational robot may help a pair of people practice a language by encouraging them to talk to each other. In this talk, I will discuss lessons that we have learned over the last few years as we investigated ways to enable robots to reason about social signals in these small group interactions. In particular, I will discuss graph abstractions for representing social contexts, and data-driven models that leverage the relational structure of interaction data to interpret and generate social behavior in group human-robot interactions. Some of the work that I will be presenting is a collaboration with Sarah Gillet and Iolanda Leite at KTH.

Biography

Marynel Vázquez is an Assistant Professor in Yale’s Computer Science Department, where she leads the Interactive Machines Group. Her research focuses on advancing multi-party human-robot interaction, both by studying social group phenomena and developing perception and decision making algorithms that enable autonomous robot behavior. Marynel received her bachelor's degree in Computer Engineering from Universidad Simón Bolívar in 2008, and obtained her M.S. and Ph.D. in Robotics from Carnegie Mellon University in 2013 and 2017, respectively. Before joining Yale, she was a collaborator of Disney Research and a Post-Doctoral Scholar at the Stanford Vision & Learning Lab. Marynel received a 2024 AFOSR YIP Award, a 2022 NSF CAREER Award, two Amazon Research Awards, and a Google Research Scholar award. Her work has been recognized with best paper awards at HRI 2023 and RO-MAN 2022 as well as nominations for paper awards at HRI 2021, IROS 2018, and RO-MAN 2016.

Invited Talk III

Learning expressive motion controllers for robotic creatures

Sehoon Ha, Georgia Institute of Technology, USA.

Link to website: https://faculty.cc.gatech.edu/~sha9/

Abstract

Intelligent robot companions have the potential to improve the quality of human life significantly by changing how we live, work, and play. While recent advances in software and hardware opened a new horizon of robotics, state-of-the-art robots are yet far from being blended into our daily lives due to the lack of human-level scene understanding, motion control, safety, and rich interactions. I envision legged robots as intelligent machines beyond simple walking platforms, which can execute a variety of real-world motor tasks in human environments, such as home arrangements, last-mile delivery, and assistive tasks for disabled people. In this talk, I will discuss relevant multi-disciplinary research topics, particularly focusing on how we can extend deep reinforcement learning algorithms to learn more expressive motion controllers for complex robotic creatures.

Biography

Sehoon Ha is currently an assistant professor at the Georgia Institute of Technology. Before joining Georgia Tech, he was a research scientist at Google and Disney Research Pittsburgh. He received his Ph.D. degree in Computer Science from the Georgia Institute of Technology. His research interests lie at the intersection between computer graphics and robotics, including physics-based animation, deep reinforcement learning, and computational robot design. He is a recipient of the NSF CAREER Award. His work has been published at top-tier venues including ACM Transactions on Graphics, IEEE Transactions on Robotics, and International Journal of Robotics Research, nominated as the best conference paper (Top 3) in Robotics: Science and Systems, and featured in the popular media press such as IEEE Spectrum, MIT Technology Review, PBS News Hours, and Wired.

Invited Talk IV

Human-Centric Assistance: Enhancing Agency and Self-efficacy in Assistive Robot

Tetsunari Inamura, Tamagawa University, Japan.

Link to website: https://researchmap.jp/inamura/?lang=english

Abstract

The development of intelligent assistive robots for physical support and rehabilitation in nursing care is advancing rapidly. While traditional research has focused on optimizing physical assistance and improving motor performance, the integration of cognitive and psychological support remains underexplored. Conventionally, addressing users’ emotional and cognitive needs has been the role of physical therapists and healthcare professionals. However, next-generation assistive AI must seamlessly combine physical and mental support to empower users more holistically. This talk explores how assistive robots can enhance not only users’ motor performance but also their self-efficacy—the belief in their ability to perform tasks independently. By dynamically adjusting control parameters to influence the sense of agency and leveraging virtual reality to create tailored success experiences, assistive systems can foster greater confidence and autonomy. Additionally, I will discuss how virtual reality and digital twin technologies provide a scalable framework for optimizing human-robot collaboration, paving the way for more adaptive and psychologically aware assistive solutions.

Biography

Tetsunari Inamura received the B.E., M.E., and Ph.D. degrees from the University of Tokyo, Japan, in 1995, 1997, and 2000, respectively in the realization of a human-robot interaction system for personal robots. He was a Researcher of the CREST Program, LetterJapanese Science and Technology Cooperation, from 2000 to 2003, and then joined the Department of Mechano-Informatics, School of Information Science and Technology, University of Tokyo, as a Lecturer, from 2003 to 2006. He was an Associate Professor with the Principles of Informatics Research Division,National Institute of Informatics, and an Associate Professor with the Department of Informatics, School of Multidisciplinary Sciences, The Graduate University for Advanced Studies, SOKENDAI, Japan, from 2006 to 2023. He is a professor at Advanced Intelligence and Robotics Research Center, Brain Science Institute, Tamagawa University, Japan. His research interests are learning from human demonstration, symbol emergence on social robots, quality evaluation of human-robot interaction, human-robot interaction using virtual reality, affective computing for assistive robots, etc.

Flash talks

09:13 09:18

M3PT: A Transformer for Multimodal, Multi-Party Social Signal Prediction with Person-aware Blockwise Attention Yiming Tang; Abrar Anwar; Jesse Thomason

PDF video poster

09:18 09:23

Legged Robot Agility Guided by Human Gesture Taerim Yoon; Dongho Kang; Jin Cheng; Minsung Ahn; Stelian Coros and Sungjoon Choi

PDF video

09:23 09:28

A Multimodal Self-supervised AI Framework for Monitoring Challenge Behavior Risks in Children with ASD Zhenhao Zhao; Eunsun Chung; Kyoug-Mee Chung; Michelle Crawford; Chung Hyuk Park

PDF video poster

09:28 09:33

Empathetic engagement drives nonverbal interactions between humans and a small-scale robot Jude Fogarty; Ifeoma Nwogu; Ryan St. Pierre

PDF video poster

09:33 09:38

Towards Embedding Dynamic Personas in Interactive Robots: Masquerading Animated Social Kinematic (MASK) Jeongeun Park; Taemoon Jeong; Hyeonseong Kim; Taehyun Byun; Seungyoun Shin; Keunjun Choi; Jaewoon Kwon; Taeyoon Lee; Matthew Pan; Sungjoon Choi

PDF video poster

09:38 09:43

GazeHTA: End-to-end Gaze Target Detection with Head-Target Association Zhi-Yi Lin, Jouh Yeong Chew, Jan van Gemert, Xucong Zhang

PDF video poster

09:43 09:48

Implicit Behavioral Cues for Enhancing Trust and Comfort in Robot Social Navigation Yi Lian, J. Taery Kim, Sehoon Ha

PDF video poster

09:48 09:53

Context-aware collaborative pushing of heavy objects using skeleton-based intention prediction Gokhan Solak, Gustavo J. G. Lahr, Idil Ozdamar, Arash Ajoudani

PDF video poster

09:53 09:58

Let Me Help You! Neuro-Symbolic Short-Context Action Anticipation Sarthak Bhagat, Samuel Li, Joseph Campbell, Yaqi Xie, Katia Sycara, Simon Stepputtis

PDF video poster

09:58 10:03

COLLAGE: Collaborative Human-Agent Interaction Generation using Hierarchical Latent Diffusion and Language Models Divyanshu Daiya; Damon Conover; Aniket Bera

PDF video poster

Motivation and Background

Humans can perceive social cues and the interaction context of another human to infer the internal states including cognitive and emotional states, empathy, and intention. This unique ability to infer internal states leads to effective social interaction between humans desirable in many intelligent systems such as collaborative and social robots, and humanmachine interaction systems. However, it is challenging for machines to perceive human states under noisy real-world settings, which are usually measured by noninvasive sensors. Recent works investigating the potential solutions for the estimation of human states under controlled conditions using facial features with the off-the-shelf camera by leveraging deep learning methods. This workshop aims to bring interdisciplinary researchers across computer vision, artificial intelligence, robotics, and human-computer interaction together to share current research achievements and discuss future research directions for human behavior and state understanding, and their potential application, especially in the wild environment. Specifically, we are interested in cognition-aware computing by integrating environment contexts and multi-modal nonverbal social cues not limited to gaze interaction, body language and para language. More importantly, we extend multi-modal human behavior research to infer the internal states of humans. This is a challenging problem yet important to realize effective interaction between humans and intelligent systems.

It is desirable for intelligent systems like robots, virtual agents, human-machine interfaces to collaborate and interact seamlessly with humans in the era of Industry 5.0, where intelligent systems must work alongside humans to perform a variety of tasks anywhere at home, factories, offices, transit, etc. The underlying technologies to achieve efficient and intelligent collaboration between humans and ubiquitous intelligent systems can be realized by cooperative intelligence, spanning interdisciplinary studies between robotics, AI, human-robot and -computer interaction, computer vision, cognitive science, etc.

One of the main considerations to achieve cooperative intelligence between humans and intelligent systems is to enable everyone and everything to know each other well, like how humans can trust or infer the implicit internal states like intention, emotion, and cognitive states of each other. The importance of empathy to facilitate human-robot interaction has been highlighted in previous studies . However, it is difficult for intelligent systems to estimate the internal states of humans because they are dependent on the complex social dynamics and environment contexts. This requires intelligent systems to be capable of sensing the multi-modal inputs, reasoning the underlying abstract knowledge, and generating the corresponding responses to collaborate and interact with humans.

There are many studies on estimating internal states of humans through measurements of wearables and non-invasive sensors, but it would be difficult to implement these solutions in the wild because of the additional sensors to be worn by humans. One promising solution is to use audiovisual data like nonverbal behavior cues consisting of gaze interaction, facial expression, body language and paralanguage to infer the internal states of humans. Researchers in cognitive and social psychology have long advocated that these nonverbal behaviors are subconsciously generated by humans and reflect the internal states of humans under different contexts. Some salient examples are the studies on emotion recognition using facial and body language in controlled environment. It remains an open question for intelligent systems to sense and recognize nonverbal cues and reason the rich underlying internal states of humans in the wild and noisy environments.

Organizers

Jouh Yeong Chew

Honda Research Institute Japan

jouhyeong.chew@jp.honda-ri.com

Xucong Zhang

TU Delft

xucong.zhang@tudelft.nl

Iolanda Leite

KTH Royal Institute of Technology

iolanda@kth.se

Daisuke Kurabayashi

Tokyo Institute of Technology

kurabayashi.d.aa@m.titech.ac.jp

Eiichi Yoshida

Tokyo University of Science

eiichi.yoshida@rs.tus.ac.jp

Siyu Tang

ETH Zürich

siyu.tang@inf.ethz.ch

Andreas Bulling

University of Stuttgart

andreas.bulling@vis.uni-stuttgart.de

Sarah Gillet

KTH Royal Institute of Technology

sgillet@kth.se

The 2nd Workshop on Nonverbal Cues for Human-Robot Cooperative Intelligence

About the Workshop

Sponsors

Organizers

News updates

Call for Papers

Submission Guidelines

Important Dates

Submission Instructions

Presentation Format

Publication Format

Program

Invited Speakers

Invited Talk I

Learning Tasks and Interactions for AI Robots Assisting Human Daily Life

Invited Talk II

Leveraging the Relational Structure of Interactions to Interpret and Generate Social Signals in HRI

Invited Talk III

Learning expressive motion controllers for robotic creatures

Invited Talk IV

Human-Centric Assistance: Enhancing Agency and Self-efficacy in Assistive Robot

Flash talks

Motivation and Background

Organizers

Jouh Yeong Chew

Xucong Zhang

Iolanda Leite

Daisuke Kurabayashi

Eiichi Yoshida

Siyu Tang

Andreas Bulling

Sarah Gillet