About the SoundSpaces Platform
Moving around in the world is naturally a multisensory experience, but today's embodied agents are deaf: they are restricted solely to their visual perception of the environment. This project aims to fill this void by building agents capable of audio-visual navigation in complex, acoustically and visually realistic 3D environments.
SoundSpaces is a first-of-its-kind dataset of audio renderings based on geometrical acoustic simulations
for two sets of publicly available 3D environments: Matterport3D [1] and Replica [2].
SoundSpaces is AI-Habitat-compatible and allows rendering arbitrary sounds at any pair of source and receiver
(agent) locations on a uniform grid of nodes. The room impulse responses (RIRs) enable a realistic audio
experience of arbitrary sounds in the photorealistic environments.
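Rendering "arbitrary sounds" from RIRs comes down to convolution: the sound heard at the receiver is the source waveform convolved with the RIR for that source/receiver pair. The sketch below illustrates this under stated assumptions. It assumes the RIR for a chosen pair of grid nodes has already been loaded as a two-channel (left/right) NumPy array of shape `(2, L)`, and that the source is a mono waveform at the same sample rate. The function name `render_binaural` is hypothetical; it is not part of the SoundSpaces or AI-Habitat API, and the step of loading RIRs from disk is not shown.

```python
import numpy as np


def render_binaural(source: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Render a mono source through a two-channel RIR (hypothetical helper).

    Assumes `source` has shape (N,) and `rir` has shape (2, L), both at the
    same sample rate. Returns an array of shape (2, N + L - 1): one full
    linear convolution per ear.
    """
    if source.ndim != 1:
        raise ValueError("source must be a 1-D mono waveform")
    if rir.ndim != 2 or rir.shape[0] != 2:
        raise ValueError("rir must have shape (2, L)")
    # Convolve the same source with the left and right impulse responses.
    return np.stack([np.convolve(source, rir[ch]) for ch in range(2)])


# Toy example: the left ear hears the direct sound immediately, while the
# right ear hears it two samples later at half the amplitude.
source = np.array([1.0, 0.5, 0.25])
rir = np.zeros((2, 4))
rir[0, 0] = 1.0
rir[1, 2] = 0.5
binaural = render_binaural(source, rir)
print(binaural.shape)  # (2, 6)
```

Because each channel is filtered separately, the interaural delay and level difference encoded in the RIR carry over directly into the rendered audio, which is what gives the spatial impression when listening with headphones.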
SoundSpaces dataset stats
SoundSpaces 2.0 features
SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning
SoundSpaces: Audio-Visual Navigation in 3D Environments
Other related papers
Chat2Map: Efficient Scene Mapping from Multi-Ego Conversations
Few-Shot Audio-Visual Learning of Environment Acoustics
Active Audio-Visual Separation of Dynamic Sound Sources
Learning Audio-Visual Dereverberation
Move2Hear: Active Audio-Visual Source Separation
Semantic Audio-Visual Navigation
Audio-Visual Floorplan Reconstruction
Learning to Set Waypoints for Audio-Visual Navigation
See, Hear, Explore: Curiosity via Audio-Visual Association
VisualEchoes: Spatial Image Representation Learning through Echolocation
[1] Matterport3D: Learning from RGB-D Data in Indoor Environments
[2] The Replica Dataset: A Digital Replica of Indoor Spaces
UT Austin is supported in part by DARPA Lifelong Learning Machines. We thank Alexander Schwing, Dhruv Batra, Erik Wijmans, Oleksandr Maksymets, Ruohan Gao, and Svetlana Lazebnik for valuable discussions and support with the AI-Habitat platform. We also thank Abhishek Das for sharing the website code for visualdialog.org.
SoundSpaces and this website are licensed under a Creative Commons Attribution 4.0 International License.
Matterport3D-based task datasets and trained models are distributed under Matterport3D's own terms and the CC BY-NC-SA 3.0 US License.
Replica-based task datasets, the code for generating such datasets, and trained models are distributed under the Replica license.