Berlin Noise Dataset
A collection of city-noise recordings of Berlin and other places.
The audio recordings were collected by Thomas Rosen and are stored at github.com/thomasrosen/berlin_noise.
The name is a wordplay on "Perlin noise" and the city name "Berlin", where the project started. This project is not affiliated with the city of Berlin.
Contact: [email protected]
About the dataset
The dataset contains recordings of city noise in Berlin and other places. I regularly update the repository with more recordings.
Number of recordings: 96
Total length of all recordings: 02:27:40.00 (8860 seconds)
Average length of a recording: 01:32.00 (about 92 seconds)
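These figures are consistent: 2 h 27 min 40 s is 8860 seconds, and 8860 / 96 ≈ 92.3 seconds per recording. Below is a minimal Python sketch of that arithmetic, useful for recomputing the summary after new recordings are added. `format_hms` and `summarize` are hypothetical helpers, not part of the repository, and the per-file durations would have to be read from the audio files yourself.

```python
def format_hms(total_seconds: float) -> str:
    """Format a duration in seconds as HH:MM:SS.hh (zero-padded hours)."""
    centis = round(total_seconds * 100)  # work in whole hundredths to avoid float drift
    hours, rest = divmod(centis, 360_000)
    minutes, rest = divmod(rest, 6_000)
    seconds, hundredths = divmod(rest, 100)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{hundredths:02d}"


def summarize(durations_s: list[float]) -> tuple[int, float, float]:
    """Return (count, total seconds, average seconds) for a list of durations."""
    count = len(durations_s)
    total = sum(durations_s)
    average = total / count if count else 0.0
    return count, total, average


# Check against the published totals: 96 recordings, 8860 seconds in total.
print(format_hms(8860))   # 02:27:40.00
print(round(8860 / 96))   # 92
```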
What's in the dataset?
Each recording is labeled with tags. The list below shows each tag followed by the number of recordings tagged with it. Only tags used by more than one recording are shown.
- outside 92
- people 49
- birds 31
- cars 31
- bikes 22
- busses 13
- trees 13
- music 11
- wind 11
- inside 10
- street 7
- forest 4
- bikes passing 3
- construction site 3
- people talking 3
- trains 3
- bells 2
- boots 2
- cars in the distance 2
- dog barking 2
- heavy traffic 2
- highway 2
- many cars driving by 2
- pedestrian-traffic-light 2
- people chatting 2
- rain 2
- some cars 2
- traffic 2
- train station announcements 2
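A tag overview like the one above can be produced with a `Counter`. This sketch assumes the metadata can be loaded as a list of records, each with a `tags` list; that record layout and the sample entries are hypothetical, not the repository's actual schema. Ties are sorted alphabetically, matching the order of the list above.

```python
from collections import Counter


def tag_overview(recordings: list[dict], min_count: int = 2) -> list[tuple[str, int]]:
    """Count how many recordings carry each tag; keep tags used at least `min_count` times.

    Assumes each recording is a dict with a "tags" list (hypothetical schema).
    Sorted by count (descending), then alphabetically.
    """
    counts = Counter()
    for rec in recordings:
        counts.update(set(rec.get("tags", [])))  # set(): count a tag once per recording
    kept = [(tag, n) for tag, n in counts.items() if n >= min_count]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Hypothetical sample records, for illustration only.
sample = [
    {"tags": ["outside", "birds", "trees"]},
    {"tags": ["outside", "cars", "birds"]},
    {"tags": ["inside", "music"]},
]
for tag, n in tag_overview(sample):
    print(f"- {tag} {n}")
```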
Map of the recordings
Legend: red = recording location, black = Berlin, grey = Potsdam.
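To draw recording locations on a map like this one, they could be exported as GeoJSON points. A minimal sketch, assuming each recording has hypothetical `id`, `lat`, and `lon` fields (the dataset's real field names may differ). GeoJSON (RFC 7946) orders coordinates as longitude first, then latitude, which is easy to get backwards.

```python
import json


def to_feature_collection(recordings: list[dict]) -> dict:
    """Build a GeoJSON FeatureCollection with one Point per recording.

    Assumes hypothetical "id", "lat" and "lon" keys (optional "tags").
    GeoJSON expects [longitude, latitude] order.
    """
    features = []
    for rec in recordings:
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [rec["lon"], rec["lat"]]},
            "properties": {"id": rec["id"], "tags": rec.get("tags", [])},
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical recording in central Berlin, for illustration only.
fc = to_feature_collection([{"id": "rec-001", "lat": 52.5219, "lon": 13.4132}])
print(json.dumps(fc, indent=2))
```

The resulting file can be opened in most GIS tools or web map libraries alongside the Berlin and Potsdam outlines.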
Recordings per hour of day
The graph shows the number of recordings per hour of the day.
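The per-hour counts behind such a graph can be computed from recording start times. This sketch assumes ISO 8601 timestamps in local time; the timestamp format and sample values are hypothetical, since the metadata schema is not described here.

```python
from collections import Counter
from datetime import datetime


def recordings_per_hour(timestamps: list[str]) -> list[int]:
    """Return a 24-element list: number of recordings started in each hour of the day.

    Assumes ISO 8601 start timestamps (hypothetical format). The hour is taken
    as written in the timestamp; no timezone conversion is applied.
    """
    hours = Counter(datetime.fromisoformat(ts).hour for ts in timestamps)
    return [hours.get(h, 0) for h in range(24)]


# Hypothetical start times, for illustration only.
sample = ["2023-05-01T08:15:00", "2023-05-02T08:40:00", "2023-05-03T18:05:00"]
per_hour = recordings_per_hour(sample)
for hour, n in enumerate(per_hour):
    if n:
        print(f"{hour:02d}:00  {'#' * n}")
```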
Use cases (ideas)
Here are some ideas for what you could do with this dataset. If you have more ideas, please let me know! I'll add the good ones to the list :)
If you have a project that uses the dataset, please let me know as well; I would love to see it!
- 🏷️ Tagging of background noise.
- Background noise generation for games/VR and maps. The tags and geolocation could be helpful in combination with OpenStreetMap features.
- 🖼️ Possibly image-to-soundscape generation from Google Street View-like images.
- 🎶 Use in music and other media.
- 🌦️ If there are enough recordings in different weather conditions, weather-dependent prediction and generation could also be possible. OpenWeatherMap could be helpful.
- 🐦 Distribution of bird species in Berlin. The species could be cross-checked against the eBird database.
- Noise pollution statistics (there is already a good map for Berlin).
- Possibly separating foreground and background noise. (Does the dataset lack the foreground-noise data this would need?)
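For the noise-pollution idea, a simple first measure is each clip's RMS level in dB relative to digital full scale, 20·log10(RMS) for samples normalized to [-1, 1]. This is a minimal sketch with synthetic input: the result is not a calibrated sound pressure level (dB SPL), so comparing recordings from different devices or gain settings would need calibration. Conventions also differ on whether a full-scale sine reads 0 dBFS or, as here, about -3 dB.

```python
import math


def rms_dbfs(samples: list[float]) -> float:
    """RMS level of normalized samples (range -1..1) in dB relative to full scale.

    Returns -inf for digital silence. This is not a calibrated dB SPL value.
    """
    if not samples:
        raise ValueError("no samples")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 10 * math.log10(mean_square)  # equals 20 * log10(rms)


# Synthetic full-scale 440 Hz sine, 1 s at 48 kHz: RMS = 1/sqrt(2), about -3.01 dB.
sine = [math.sin(2 * math.pi * 440 * n / 48_000) for n in range(48_000)]
print(f"{rms_dbfs(sine):.2f} dBFS")
```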
🤖 More ideas, generated with ChatGPT…
Here are some additional ideas for utilizing the Berlin Noise Dataset:
- 🥾 Soundscape-based urban exploration: Develop a mobile application that uses the audio recordings to create immersive soundscapes for users as they navigate different neighborhoods or landmarks in Berlin. This can enhance the exploration experience and provide a unique perspective on the city.
- Noise-based sentiment analysis: Explore the correlation between background noise characteristics and the sentiment or mood of specific locations. By analyzing the audio recordings alongside user-generated sentiment data (e.g., from social media), you can investigate how noise levels and types affect people's perceptions and emotions in different areas of Berlin.
- Noise-aware urban planning: Use the dataset to inform urban planning decisions, such as determining suitable locations for residential areas, parks, or schools based on noise levels and types. This can contribute to creating more livable and sustainable urban environments.
- 🎶 Soundscape composition for therapeutic purposes: Use the audio recordings as a resource for creating calming or soothing soundscapes for relaxation therapy, meditation apps, or wellness spaces. The diverse range of background noises can help create immersive and tranquil environments.
- Acoustic fingerprinting for location recognition: Develop a system that can recognize specific locations or neighborhoods in Berlin based on their unique acoustic fingerprints. By training a machine learning model on the audio recordings, you can enable location identification using sound data.
- Noise-based storytelling: Combine the audio recordings with narratives, anecdotes, or historical information about specific places in Berlin to create interactive audio tours or storytelling experiences. This can add a rich layer of context and engagement for tourists or locals exploring the city.
- 🤖 Soundscape-based machine learning applications: Use the dataset to train machine learning models for applications such as audio event detection, urban sound classification, or soundscape synthesis. These models can have practical uses in fields like smart city development, soundscape research, or soundscape generation for virtual environments.
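Most of the machine learning ideas above (event detection, classification, location fingerprinting) would typically start by cutting long recordings into fixed-length, possibly overlapping windows that inherit the recording's tags as weak labels. A minimal plain-Python sketch; `frame_signal` and `labelled_windows` are hypothetical helpers, and the 1 s window with 0.5 s hop are arbitrary example values, not something the dataset prescribes.

```python
def frame_signal(samples: list[float], sample_rate: int,
                 window_s: float = 1.0, hop_s: float = 0.5) -> list[list[float]]:
    """Split a signal into windows of `window_s` seconds, advancing by `hop_s` seconds.

    Trailing samples that do not fill a whole window are dropped.
    """
    window = int(round(window_s * sample_rate))
    hop = int(round(hop_s * sample_rate))
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    return [samples[start:start + window]
            for start in range(0, len(samples) - window + 1, hop)]


def labelled_windows(samples: list[float], sample_rate: int, tags: list[str],
                     window_s: float = 1.0, hop_s: float = 0.5) -> list[tuple[list[float], list[str]]]:
    """Pair every window with a copy of its source recording's tags (weak labels)."""
    return [(w, list(tags)) for w in frame_signal(samples, sample_rate, window_s, hop_s)]


# Toy example: 10 samples at 4 Hz (2.5 s) yield four 1 s windows with 0.5 s hop.
signal = [0.0] * 10
windows = frame_signal(signal, sample_rate=4)
print(len(windows), [len(w) for w in windows])
```

Because windows from the same recording are highly correlated, splitting train and test sets by recording rather than by window would likely give a more honest evaluation.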
License
This dataset (recordings, metadata and website) is licensed under the Creative Commons Attribution 4.0 International License.
The license does not cover the GeoJSON files for Potsdam and Berlin.
You can use the recordings for any purpose, as long as you give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
You can use the following text to give appropriate credit:
Berlin Noise by Thomas Rosen is licensed under CC BY 4.0
BibTeX
@misc{berlinnoise,
author = {Rosen, Thomas},
title = {Berlin Noise Dataset},
howpublished = {GitHub repository},
year = {2024},
url = {https://github.com/thomasrosen/berlin_noise},
note = {Recordings, metadata, and website licensed under Creative Commons Attribution 4.0 International License},
contact = {[email protected]}
}
Related Datasets
A few datasets that also contain environmental sound recordings:
- Urban sound datasets: "Two datasets and a taxonomy for urban sound research" (also on Kaggle)
- From TU Berlin: DNC (Dataset for Noise Classification): "The DNC dataset contains 4377 environmental background noise recordings labeled according to the type of noise."