About the SoundSpaces Platform

Moving around in the world is naturally a multisensory experience, but today's embodied agents are effectively deaf, restricted solely to their visual perception of the environment. This project aims to fill this void by building agents capable of audio-visual navigation in complex, acoustically and visually realistic 3D environments.

SoundSpaces is a first-of-its-kind dataset of audio renderings based on geometrical acoustic simulations for two sets of publicly available 3D environments: Matterport3D [1] and Replica [2]. SoundSpaces is compatible with AI Habitat and can render arbitrary sounds at any pair of source and receiver (agent) locations on a uniform grid of nodes. The room impulse responses (RIRs) enable a realistic audio experience of arbitrary sounds within the photorealistic environments.
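The idea behind RIR-based rendering can be sketched in a few lines: convolving a dry (anechoic) source waveform with the two-channel RIR for a given source/receiver node pair yields the binaural audio the agent would hear at that location. The snippet below is an illustrative sketch only, not the SoundSpaces API; the waveform and RIR are synthetic stand-ins.

```python
import numpy as np
from scipy.signal import fftconvolve

sr = 16000                        # sample rate (Hz)
rng = np.random.default_rng(0)
source = rng.standard_normal(sr)  # 1 s of arbitrary dry mono audio

# Synthetic 2-channel (left/right ear) RIR: noise shaped by an
# exponential decay, standing in for a simulated room response.
t = np.arange(sr // 2) / sr
rir = np.stack([np.exp(-6.0 * t) * rng.standard_normal(t.size)
                for _ in range(2)])

# Convolving the dry sound with each ear's RIR produces the
# spatialized binaural audio for that source/receiver pair.
binaural = np.stack([fftconvolve(source, rir[ch]) for ch in range(2)])
print(binaural.shape)  # (2, len(source) + len(rir_channel) - 1)
```

In the actual dataset, the RIR would be loaded for a specific pair of grid nodes rather than synthesized, but the convolution step is the same.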


SoundSpaces dataset stats

  • 85 scenes from Matterport3D
  • 18 scenes from Replica
  • 102 copyright-free sounds
  • 17.6M RIRs in total (16.7M for Matterport3D, 0.9M for Replica)

SoundSpaces 2.0 features [New]

  • Real-time acoustic simulation
  • Continuous spatial and acoustic rendering
  • Configurable simulation parameters, microphones, and materials
  • Generalizes to arbitrary scene datasets and even in-the-wild scans

News

    June 2022 — We release SoundSpaces 2.0: a fast, continuous, configurable, and generalizable audio-visual simulation platform
    Feb 2022 — SoundSpaces Challenge at CVPR 2022 Embodied AI workshop announced!
    Feb 2021 — SoundSpaces Challenge at CVPR 2021 Embodied AI workshop announced!

    SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning

    Changan Chen, Carl Schissler, Sanchit Garg, Philip Kobernik, Alexander Clegg, Paul Calamia, Dhruv Batra, Philip W Robinson, Kristen Grauman
    arXiv 2022 [PDF] [Code] [Project]

    SoundSpaces: Audio-Visual Navigation in 3D Environments

    Changan Chen*, Unnat Jain*, Carl Schissler, Sebastia Vicenc Amengual Gari, Ziad Al-Halah, Vamsi Krishna Ithapu, Philip Robinson, Kristen Grauman
    ECCV 2020 [Bibtex] [PDF] [Code] [Project]

    Other related papers

    Chat2Map: Efficient Scene Mapping from Multi-Ego Conversations

    Sagnik Majumder, Hao Jiang, Pierre Moulon, Ethan Henderson, Paul Calamia, Kristen Grauman, Vamsi Ithapu
    arXiv 2023 [Bibtex] [PDF] [Project]

    Few-Shot Audio-Visual Learning of Environment Acoustics

    Sagnik Majumder, Changan Chen*, Ziad Al-Halah*, Kristen Grauman
    NeurIPS 2022 [Bibtex] [PDF] [Project]

    Active Audio-Visual Separation of Dynamic Sound Sources

    Sagnik Majumder, Ziad Al-Halah, Kristen Grauman
    ECCV 2022 [Bibtex] [PDF] [Project]

    Learning Audio-Visual Dereverberation

    Changan Chen, Wei Sun, David Harwath, Kristen Grauman
    arXiv 2021 [Bibtex] [PDF] [Project]

    Move2Hear: Active Audio-Visual Source Separation

    Sagnik Majumder, Ziad Al-Halah, Kristen Grauman
    ICCV 2021 [Bibtex] [PDF] [Project]

    Semantic Audio-Visual Navigation

    Changan Chen, Ziad Al-Halah, Kristen Grauman
    CVPR 2021 [Bibtex] [PDF] [Project]

    Audio-Visual Floorplan Reconstruction

    Senthil Purushwalkam, Sebastia Vicenc Amengual Gari, Vamsi Krishna Ithapu, Carl Schissler, Philip Robinson, Abhinav Gupta, Kristen Grauman
    arXiv 2020 [Bibtex] [PDF] [Project]

    Learning to Set Waypoints for Audio-Visual Navigation

    Changan Chen, Sagnik Majumder, Ziad Al-Halah, Ruohan Gao, Santhosh K. Ramakrishnan, Kristen Grauman
    ICLR 2021 [Bibtex] [PDF] [Project]

    See, Hear, Explore: Curiosity via Audio-Visual Association

    Victoria Dean, Shubham Tulsiani, Abhinav Gupta
    NeurIPS 2020 [Bibtex] [PDF] [Project]

    VisualEchoes: Spatial Image Representation Learning through Echolocation

    Ruohan Gao, Changan Chen, Carl Schissler, Ziad Al-Halah, Kristen Grauman
    ECCV 2020 [Bibtex] [PDF] [Code] [Project]


    [1] Matterport3D: Learning from RGB-D Data in Indoor Environments

    Angel Chang, Angela Dai, Thomas Funkhouser, Maciej Halber, Matthias Nießner, Manolis Savva, Shuran Song, Andy Zeng, Yinda Zhang
    3DV 2017

    [2] The Replica Dataset: A Digital Replica of Indoor Spaces

    Julian Straub et al.
    arXiv 2019


    UT Austin is supported in part by DARPA Lifelong Learning Machines. We thank Alexander Schwing, Dhruv Batra, Erik Wijmans, Oleksandr Maksymets, Ruohan Gao, and Svetlana Lazebnik for valuable discussions and support with the AI-Habitat platform. We also thank Abhishek Das for sharing the website code for visualdialog.org.


    SoundSpaces and this website are licensed under a Creative Commons Attribution 4.0 International License.
    Matterport3D-based task datasets and trained models are distributed under Matterport3D's own terms of use and the CC BY-NC-SA 3.0 US License.
    Replica-based task datasets, the code for generating such datasets, and trained models are covered by the Replica license.