About the SoundSpaces Platform


Moving around in the world is naturally a multisensory experience, but today's embodied agents are deaf, restricted solely to their visual perception of the environment. This project aims to fill this void by building agents capable of audio-visual navigation in complex, acoustically and visually realistic 3D environments.

SoundSpaces is a first-of-its-kind dataset of audio renderings based on geometric acoustics simulations for two sets of publicly available 3D environments: Matterport3D [1] and Replica [2]. SoundSpaces is compatible with AI Habitat and can render arbitrary sounds at any pair of source and receiver (agent) locations on a uniform grid of nodes. The room impulse responses (RIRs) enable a realistic audio experience of arbitrary sounds in these photorealistic environments.
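Concretely, rendering a sound at the agent's location amounts to convolving the anechoic source waveform with the binaural RIR for that source-receiver pair. A minimal sketch of this step (the `render_binaural` helper and the toy waveforms are illustrative, not part of the SoundSpaces API):

```python
import numpy as np
from scipy.signal import fftconvolve

def render_binaural(source, rir):
    """Convolve a mono source waveform with a 2-channel (binaural) RIR.

    source: 1-D float array, the anechoic sound.
    rir:    array of shape (2, L) holding the left/right impulse responses.
    Returns an array of shape (2, len(source) + L - 1): the spatialized audio.
    """
    return np.stack([fftconvolve(source, rir[ch]) for ch in (0, 1)])

# Toy example: a unit impulse convolved with a short RIR simply
# reproduces that RIR in each ear, padded to the full output length.
source = np.zeros(8)
source[0] = 1.0
rir = np.array([[0.5, 0.25, 0.125],
                [0.4, 0.2, 0.1]])
out = render_binaural(source, rir)  # shape (2, 10)
```

In practice one would load a measured RIR for the desired node pair and a longer source clip; FFT-based convolution keeps this efficient even for RIRs thousands of samples long.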

Click the GIF to view the demonstration video, and listen with headphones to hear the spatial sound properly!




SoundSpaces dataset stats

  • 85 scenes from Matterport3D
  • 18 scenes from Replica
  • 102 copyright-free sounds
  • 17.6M RIRs in total (16.7M for Matterport3D and 0.9M for Replica)


  News

    Feb 2021 — SoundSpaces Challenge at CVPR 2021 Embodied AI workshop announced!

    SoundSpaces: Audio-Visual Navigation in 3D Environments

    Changan Chen*, Unnat Jain*, Carl Schissler, Sebastia Vicenc Amengual Gari, Ziad Al-Halah, Vamsi Krishna Ithapu, Philip Robinson, Kristen Grauman
    ECCV 2020 [Bibtex] [PDF] [Code] [Project]

    Other related papers

    Semantic Audio-Visual Navigation

    Changan Chen, Ziad Al-Halah, Kristen Grauman
    CVPR 2021 [Bibtex] [PDF] [Project]


    Audio-Visual Floorplan Reconstruction

    Senthil Purushwalkam, Sebastia Vicenc Amengual Gari, Vamsi Krishna Ithapu, Carl Schissler, Philip Robinson, Abhinav Gupta, Kristen Grauman
    arXiv 2020 [Bibtex] [PDF] [Project]


    Learning to Set Waypoints for Audio-Visual Navigation

    Changan Chen, Sagnik Majumder, Ziad Al-Halah, Ruohan Gao, Santhosh K. Ramakrishnan, Kristen Grauman
    ICLR 2021 [Bibtex] [PDF] [Project]


    See, Hear, Explore: Curiosity via Audio-Visual Association

    Victoria Dean, Shubham Tulsiani, Abhinav Gupta
    NeurIPS 2020 [Bibtex] [PDF] [Project]


    VisualEchoes: Spatial Image Representation Learning through Echolocation

    Ruohan Gao, Changan Chen, Carl Schissler, Ziad Al-Halah, Kristen Grauman
    ECCV 2020 [Bibtex] [PDF] [Code] [Project]

    References

    [1] Matterport3D: Learning from RGB-D Data in Indoor Environments

    Angel Chang, Angela Dai, Thomas Funkhouser, Maciej Halber, Matthias Nießner, Manolis Savva, Shuran Song, Andy Zeng, Yinda Zhang
    3DV 2017


    [2] The Replica Dataset: A Digital Replica of Indoor Spaces

    Julian Straub et al.
    arXiv 2019

    Acknowledgements

    UT Austin is supported in part by DARPA Lifelong Learning Machines. We thank Alexander Schwing, Dhruv Batra, Erik Wijmans, Oleksandr Maksymets, Ruohan Gao, and Svetlana Lazebnik for valuable discussions and support with the AI-Habitat platform. We also thank Abhishek Das for sharing the website code for visualdialog.org.


    License




    SoundSpaces and this website are licensed under a Creative Commons Attribution 4.0 International License.
    Matterport3D-based task datasets and trained models are distributed under Matterport3D's own terms and the CC BY-NC-SA 3.0 US license.
    Replica-based task datasets, the code for generating them, and trained models are covered by the Replica license.