SoundSpaces Challenge @ CVPR 2023 Embodied AI Workshop
Deadline: 03 June 2023 (23:59:59 AoE)
Overview
We are pleased to announce the third iteration of the SoundSpaces Challenge!
This year, we are hosting two challenges. The first is based on the audio-visual navigation task, where an agent must find a sound-making object in an unmapped 3D environment using visual and auditory perception. The second is based on the active audio-visual separation task, where an agent must separate the time-varying sound of a target audio source in an unmapped 3D environment, again using visual and auditory perception.
In AudioGoal navigation (AudioNav), an agent is spawned at a random starting position and orientation in an unseen environment. A sound-emitting object is also randomly spawned at a location in the same environment. The agent receives a one-second audio input in the form of a waveform at each time step and needs to navigate to the target location. No ground-truth map is available and the agent must only use its sensory input (audio and RGB-D) to navigate.
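To make the agent interface concrete, below is a minimal sketch of an AudioNav decision step. The observation keys ("rgb", "depth", "audiogoal"), the waveform shape, and the discrete action names are illustrative assumptions based on the task description above, not the exact SoundSpaces/Habitat API; see the challenge repo for the real interface.

```python
import numpy as np

# Assumed discrete action set (illustrative, not the official API).
ACTIONS = ["STOP", "MOVE_FORWARD", "TURN_LEFT", "TURN_RIGHT"]


class RandomAudioNavAgent:
    """Toy agent: moves randomly and issues STOP when the binaural
    audio energy suggests the sound source is very close."""

    def __init__(self, stop_energy_threshold=0.5):
        self.stop_energy_threshold = stop_energy_threshold

    def act(self, observation):
        # observation["audiogoal"]: assumed (2, 16000) one-second binaural waveform
        audio = observation["audiogoal"]
        energy = float(np.mean(audio ** 2))
        if energy > self.stop_energy_threshold:
            return "STOP"
        return str(np.random.choice(ACTIONS[1:]))


if __name__ == "__main__":
    # Fake observation with placeholder RGB-D frames and a binaural waveform.
    obs = {
        "rgb": np.zeros((128, 128, 3), dtype=np.uint8),
        "depth": np.zeros((128, 128, 1), dtype=np.float32),
        "audiogoal": 0.01 * np.random.randn(2, 16000).astype(np.float32),
    }
    agent = RandomAudioNavAgent()
    print(agent.act(obs))
```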
In Active Audio-Visual Separation (active AV separation), an agent is spawned at a random starting position and orientation in an unseen environment. Multiple sound-emitting objects, each emitting a time-varying sound, are also randomly spawned at locations in the same environment. At each time step the agent receives a one-second audio input in the form of a waveform, which is a mixture of the spatial sounds from all sources, and it must navigate so as to separate the audio of a target source, specified by a target class label, at every step of its motion. No ground-truth map is available and the agent must only use its sensory input (audio and RGB) to navigate. The current version of the challenge considers separation scenarios such as speech vs. speech and speech vs. music.
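For intuition, the sketch below shows the basic mechanics of time-frequency mask-based separation on a one-second mixture using NumPy/SciPy. The mask here is random purely for illustration; a real challenge entry would predict it (or the target waveform directly) from the audio-visual observations gathered along the agent's trajectory. The 16 kHz sample rate and STFT parameters are assumptions, not challenge specifications.

```python
import numpy as np
from scipy.signal import stft, istft

SR = 16000  # assumed sample rate for this illustration


def separate_with_mask(mixture, mask, nperseg=512):
    """Apply a time-frequency mask to the mixture spectrogram and resynthesize."""
    _, _, Z = stft(mixture, fs=SR, nperseg=nperseg)
    assert mask.shape == Z.shape
    _, estimate = istft(Z * mask, fs=SR, nperseg=nperseg)
    return estimate[: len(mixture)]


if __name__ == "__main__":
    mixture = np.random.randn(SR).astype(np.float32)  # 1 s of mixed audio
    _, _, Z = stft(mixture, fs=SR, nperseg=512)
    mask = np.random.rand(*Z.shape)                   # placeholder "predicted" mask
    target_estimate = separate_with_mask(mixture, mask)
    print(target_estimate.shape)
```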
We believe intelligent agents of the future should be able to process multi-modal (audio-visual) inputs to fully understand the space. We encourage teams to participate and help push the state of the art in this exciting area!
Challenge: link
Dates
14 Feb 2023 — SoundSpaces Challenge 2023 announced!
1 March 2023 — Leaderboard opens for submissions.
03 June 2023 (23:59:59 AoE) — Submission deadline for participants.
19 Jun 2023 (expected) — Winners' announcement at the Embodied AI Workshop, CVPR 2023.
Dataset Description
The challenge will be conducted on the SoundSpaces Dataset, which is based on AI Habitat, Matterport3D, and Replica. For this challenge, we use the Matterport3D dataset due to its diversity and scale of environments. This challenge focuses on evaluating agents' ability to generalize to unheard sounds and unseen environments. For AudioNav, the training and validation splits are the same as used in Unheard Sound experiments reported in the SoundSpaces paper. They can be downloaded from the SoundSpaces repo. For active AV separation, the training and validation splits are the same as used in Unheard Sound experiments reported in the Active AV Dynamic Separation paper.
Participation Guidelines
To participate, teams must register on EvalAI and create a team for the challenge (see this quickstart guide). The challenge page is available here: link.
It is not acceptable to create multiple accounts for a single team in order to bypass the per-team submission limits. The exception to this is if a group is working on multiple unrelated methods; in this case all sets of results can be submitted for evaluation. Results must be submitted to the evaluation server by the challenge deadline -- no exceptions will be made.
- Simulator and baselines: @facebookresearch/sound-spaces
- Challenge: @facebookresearch/soundspaces-challenge
The first repo provides the simulation environment along with implementations of two state-of-the-art models for audio-visual navigation. The second repo provides instructions for evaluating submissions locally and submitting them to the leaderboard.
Submission Method
Participate in the contest by registering on the EvalAI challenge page and creating a team. Like last year, to make participation easier, you can directly upload predicted trajectories to the leaderboard. Before pushing a submission to the leaderboard, participants should test it locally to make sure it works. For more details, please check out the instructions on our challenge repo @facebookresearch/soundspaces-challenge.
Winners' Announcement and Analysis
Winning teams of the SoundSpaces Challenge 2022 were announced at the Embodied AI Workshop. Presentation videos are included here.
2022 Challenge Winners
colab_buaa_sound_space team (Jinyu Chen, Chen Gao, Pengliang Ji, Yusheng Zhao, Wenguan Wang, Erli Meng and Si Liu) won the SoundSpaces Challenge 2022, while Freiburg Sound team (Abdelrahman Younes, Daniel Honerkamp, Tim Welschehold and Abhinav Valada) from the University of Freiburg was the runner-up.
The video below shows the methods used by the winning team and runner-up.
2021 Challenge Winners
Freiburg Sound team (Abdelrahman Younes, Daniel Honerkamp, Tim Welschehold and Abhinav Valada) from the University of Freiburg won the SoundSpaces Challenge 2021.
The video below includes our five-minute summary of the 2021 challenge and the winning team's five-minute presentation of their method.
Frequently Asked Questions (FAQ)
- As we get questions, we'll include them here.
- Their answers will appear here.
- I don't see my question here, what do I do?
- Email us, see contact organizers below.
- I have too many questions, can we schedule a 1-1 video call to help me understand?
- Sure, we'd love to help! Don't hesitate to reach out.
Acknowledgments
The SoundSpaces Challenge would not have been possible without the infrastructure and support of the EvalAI and Habitat teams.