SoundSpaces Challenge @ CVPR 2021 Embodied AI Workshop



We are pleased to announce the first SoundSpaces Challenge!

This year, we are hosting a challenge on the audio-visual navigation task, in which an agent must find a sound-making object in unmapped 3D environments using visual and auditory perception.

In AudioGoal navigation (AudioNav), an agent is spawned at a random starting position and orientation in an unseen environment. A sound-emitting object is also spawned at a random location in the same environment. At each time step, the agent receives one second of audio as a waveform and needs to navigate to the target location. No ground-truth map is available, and the agent must use only its sensory input (audio and RGB-D) to navigate.
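As a rough illustration of the setup above, the sketch below models what an agent perceives at one time step and maps it to an action. Everything named here is an assumption for illustration only: the Observation fields, the two-channel (left/right) audio layout, the action names, and the energy-based heuristic are not taken from the challenge code, and the heuristic is a toy, not one of the baseline models.

```python
from dataclasses import dataclass
from typing import Any, List

# Hypothetical action names, used only for this illustration.
ACTIONS = ("STOP", "MOVE_FORWARD", "TURN_LEFT", "TURN_RIGHT")


@dataclass
class Observation:
    """What the agent perceives at one time step (no map is included)."""
    audio_left: List[float]   # one second of waveform samples, left channel (assumed)
    audio_right: List[float]  # one second of waveform samples, right channel (assumed)
    rgb: Any = None           # color image placeholder
    depth: Any = None         # depth image placeholder


def energy(samples: List[float]) -> float:
    """Mean squared amplitude of a waveform; 0.0 for an empty list."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def choose_action(obs: Observation, tolerance: float = 0.1) -> str:
    """Toy heuristic: turn toward the louder channel, otherwise move forward."""
    left, right = energy(obs.audio_left), energy(obs.audio_right)
    total = left + right
    if total == 0.0:
        return "MOVE_FORWARD"  # nothing audible; keep exploring
    imbalance = (left - right) / total
    if imbalance > tolerance:
        return "TURN_LEFT"
    if imbalance < -tolerance:
        return "TURN_RIGHT"
    return "MOVE_FORWARD"
```

Note that this toy policy never issues STOP and ignores the RGB-D input entirely; a real agent would have to decide when it has reached the sound source and would combine both modalities.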

We believe intelligent agents of the future should be able to process multi-modal (audio-visual) inputs to fully understand the space. We encourage teams to participate and help push the state of the art in this exciting area!



17 Feb 2021 — SoundSpaces Challenge 2021 announced!
1 March 2021 — Leaderboard opens for submissions.
08 June 2021 (23:59:59 AoE) — Submission deadline for participants.
TBD (19/20/25) Jun 2021 — Winners' announcement at the Embodied AI Workshop, CVPR 2021.

Dataset Description

The challenge will be conducted on the SoundSpaces Dataset, which is based on AI Habitat, Matterport3D, and Replica. For this challenge, we use the Matterport3D dataset because of the diversity and scale of its environments. The challenge focuses on evaluating agents' ability to generalize to unheard sounds and unseen environments. The training and validation splits are the same as those used in the Unheard Sound experiments reported in the SoundSpaces paper, and they can be downloaded from the SoundSpaces repo. For the challenge test split, we will use new sounds that are not currently publicly available on the website.

Participation Guidelines

To participate, teams must register on EvalAI and create a team for the challenge (see this quickstart guide). The challenge page is available here:

The challenge has three phases:

Phase           # Episodes   Submissions       Results                  Leaderboard
minival         20           max 100 per day   immediate                none
test-standard   1000         max 10 per day    immediate                public (optional)
test-challenge  1000         max 5 per day     announced at CVPR 2021   private, announced at CVPR 2021

The minival phase is useful for sanity-checking submitted code without wasting submissions in the other phases. The test-standard phase uses the test-multiple-unheard split from the paper and serves as the benchmark. Test-challenge is a new test split whose sound and agent location configurations differ from those in test-standard. For the test-standard and test-challenge phases, results must be submitted on the full test set. By default, submissions to the test-standard phase are private, but they can be voluntarily released to the public leaderboard, with a limit of one public leaderboard entry per team. Submissions to the test-challenge phase are considered entries into the challenge. If a team makes multiple submissions to test-challenge, the approach with the highest test-standard accuracy will be used.

It is not acceptable to create multiple accounts for a single team in order to bypass these limits. The exception to this is if a group is working on multiple unrelated methods; in this case all sets of results can be submitted for evaluation. Results must be submitted to the evaluation server by the challenge deadline -- no exceptions will be made.
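The per-phase limits and the rule for choosing among multiple test-challenge submissions can be sketched as follows. This is a minimal illustration, not the evaluation server's logic: the episode counts and daily limits are copied from the phase table above, while the class names, function names, and the score field are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Phase:
    episodes: int
    max_submissions_per_day: int


# Values copied from the phase table above.
PHASES: Dict[str, Phase] = {
    "minival": Phase(episodes=20, max_submissions_per_day=100),
    "test-standard": Phase(episodes=1000, max_submissions_per_day=10),
    "test-challenge": Phase(episodes=1000, max_submissions_per_day=5),
}


def can_submit(phase: str, submitted_today: int) -> bool:
    """True if one more submission today stays within the phase's daily limit."""
    return submitted_today < PHASES[phase].max_submissions_per_day


@dataclass
class Entry:
    name: str                   # hypothetical label for a submitted approach
    test_standard_score: float  # the approach's accuracy on test-standard


def select_challenge_entry(entries: List[Entry]) -> Optional[Entry]:
    """Pick the entry with the highest test-standard score, or None if empty."""
    if not entries:
        return None
    return max(entries, key=lambda e: e.test_standard_score)
```

For example, a team that has already made 5 test-challenge submissions in a day would get False from can_submit("test-challenge", 5), and among several entries the one with the best test-standard score is the one that counts.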

Starter Code

The first repo provides implementations of two state-of-the-art models on audio-visual navigation as well as the simulation environment. The second repo provides instructions for participants to submit a trained model via docker image and evaluate it both locally and remotely.

Submission Method

Participate in the contest by registering on the EvalAI challenge page and creating a team. Participants upload docker containers with their agents, which are evaluated on an AWS GPU-enabled instance. Before pushing a submission for remote evaluation, participants should test the submission docker locally to make sure it works. For more details, please check out the instructions on our challenge repo @facebookresearch/soundspaces-challenge.

Winners' Announcement and Analysis

Winning teams of the SoundSpaces Challenge 2021 will be announced at the Embodied AI Workshop. Presentation videos will be included here.

The winning team will receive $5k in Amazon Web Services cloud credits. Thank you for the support, AWS!

Challenge 2021 Summary and Winners' Announcement

The Freiburg Sound team (Abdelrahman Younes, Daniel Honerkamp, Tim Welschehold and Abhinav Valada) from the University of Freiburg won the SoundSpaces Challenge 2021! Congratulations!

The video below consists of our 5-minute summary of this year's challenge and the winning team's 5-minute presentation of their method.

Frequently Asked Questions (FAQ)

  • As we get questions, we'll include them here?
  • Their answers will come here.

  • I don't see my question here, what do I do?
  • Email us, see contact organizers below.

  • I have too many questions, can we schedule a 1-1 video call to help me understand?
  • Sure, we'd love to help! Don't hesitate to reach out.


The SoundSpaces Challenge would not have been possible without the infrastructure and support of the EvalAI and Habitat teams.


Changan Chen
UT Austin, FAIR
Kristen Grauman
UT Austin, FAIR

Email — changanvr [at] gmail [dot] com and unnatjain [at] gmail [dot] com