About the SoundSpaces Platform


Moving around in the world is naturally a multisensory experience, but today's embodied agents are deaf, restricted solely to their visual perception of the environment. This project aims to fill this void by building agents capable of audio-visual navigation in complex, acoustically and visually realistic 3D environments.

SoundSpaces is a first-of-its-kind dataset of audio renderings based on geometric acoustics simulations for two sets of publicly available 3D environments: Matterport3D [1] and Replica [2]. SoundSpaces is compatible with AI Habitat and can render arbitrary sounds at any pair of source and receiver (agent) locations on a uniform grid of nodes. The room impulse responses (RIRs) enable a realistic audio experience of arbitrary sounds in these photorealistic environments.
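Concretely, rendering a sound at the agent's location amounts to convolving the anechoic source waveform with the binaural RIR for that source-receiver pair. A minimal sketch of this step (the `render_binaural` helper and the toy waveforms are illustrative, not part of the SoundSpaces API):

```python
import numpy as np
from scipy.signal import fftconvolve

def render_binaural(source, rir):
    """Convolve a mono source waveform with a 2-channel (binaural) RIR.

    source: 1-D float array, the anechoic sound.
    rir:    array of shape (2, L) holding the left/right impulse responses.
    Returns an array of shape (2, len(source) + L - 1): the spatialized audio.
    """
    return np.stack([fftconvolve(source, rir[ch]) for ch in (0, 1)])

# Toy example: a unit impulse convolved with a short RIR simply
# reproduces that RIR in each ear, padded to the full output length.
source = np.zeros(8)
source[0] = 1.0
rir = np.array([[0.5, 0.25, 0.125],
                [0.4, 0.2, 0.1]])
out = render_binaural(source, rir)  # shape (2, 10)
```

In practice one would load a measured RIR for the desired node pair and a longer source clip; FFT-based convolution keeps this efficient even for RIRs thousands of samples long.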

Click the GIF to view the demonstration video, and listen with headphones to hear the spatial sound properly!




SoundSpaces dataset stats

  • 85 scenes from Matterport3D
  • 18 scenes from Replica
  • 102 copyright-free sounds
  • 17.6M RIRs in total (16.7M for Matterport3D and 0.9M for Replica)


  News

    Feb 2021 — SoundSpaces Challenge at CVPR 2021 Embodied AI workshop announced!

    SoundSpaces: Audio-Visual Navigation in 3D Environments

    Changan Chen*, Unnat Jain*, Carl Schissler, Sebastia Vicenc Amengual Gari, Ziad Al-Halah, Vamsi Krishna Ithapu, Philip Robinson, Kristen Grauman
    ECCV 2020 [Bibtex] [PDF] [Code] [Project]

    Other related papers

    Semantic Audio-Visual Navigation

    Changan Chen, Ziad Al-Halah, Kristen Grauman
    CVPR 2021 [Bibtex] [PDF] [Project]


    Audio-Visual Floorplan Reconstruction

    Senthil Purushwalkam, Sebastia Vicenc Amengual Gari, Vamsi Krishna Ithapu, Carl Schissler, Philip Robinson, Abhinav Gupta, Kristen Grauman
    arXiv 2020 [Bibtex] [PDF] [Project]


    Learning to Set Waypoints for Audio-Visual Navigation

    Changan Chen, Sagnik Majumder, Ziad Al-Halah, Ruohan Gao, Santhosh K. Ramakrishnan, Kristen Grauman
    ICLR 2021 [Bibtex] [PDF] [Project]


    See, Hear, Explore: Curiosity via Audio-Visual Association

    Victoria Dean, Shubham Tulsiani, Abhinav Gupta
    NeurIPS 2020 [Bibtex] [PDF] [Project]


    VisualEchoes: Spatial Image Representation Learning through Echolocation

    Ruohan Gao, Changan Chen, Carl Schissler, Ziad Al-Halah, Kristen Grauman
    ECCV 2020 [Bibtex] [PDF] [Code] [Project]

    References

    [1] Matterport3D: Learning from RGB-D Data in Indoor Environments

    Angel Chang, Angela Dai, Thomas Funkhouser, Maciej Halber, Matthias Nießner, Manolis Savva, Shuran Song, Andy Zeng, Yinda Zhang
    3DV 2017


    [2] The Replica Dataset: A Digital Replica of Indoor Spaces

    Julian Straub et al.
    arXiv 2019

    Acknowledgements

    UT Austin is supported in part by DARPA Lifelong Learning Machines. We thank Alexander Schwing, Dhruv Batra, Erik Wijmans, Oleksandr Maksymets, Ruohan Gao, and Svetlana Lazebnik for valuable discussions and support with the AI-Habitat platform. We also thank Abhishek Das for sharing the website code for visualdialog.org.


    License




    SoundSpaces and this website are licensed under a Creative Commons Attribution 4.0 International License.
    Matterport3D-based task datasets and trained models are distributed under Matterport3D's own terms and the CC BY-NC-SA 3.0 US license.
    Replica-based task datasets, the code for generating them, and trained models are covered by the Replica license.