Voice activity detection in eco-acoustic data enables privacy protection and is a proxy for human disturbance
Peer reviewed, Journal article
Published version
Permanent link
https://hdl.handle.net/11250/3033728
Publication date
2022
Collections
- Scientific publications [1437]
Original version
10.1111/2041-210X.14005
Abstract
1. Eco-acoustic monitoring is increasingly being used to map biodiversity across
large scales, yet little thought is given to the privacy concerns and potential
scientific value of inadvertently recorded human speech. Automated speech detection
is possible using voice activity detection (VAD) models, but it is not clear
how well these perform in diverse natural soundscapes. In this study we present
the first evaluation of VAD models for anonymization of eco-acoustic data
and demonstrate how speech detection frequency can be used as one potential
measure of human disturbance.
2. We first generated multiple synthetic datasets using different data preprocessing
techniques to train and validate deep neural network models. We evaluated
the performance of our custom models against existing state-of-the-art VAD
models using playback experiments with speech samples from a man, woman
and child. Finally, we collected long-term data from a Norwegian forest heavily
used for hiking to evaluate the ability of the models to detect human speech and
quantify a proxy for human disturbance in a real monitoring scenario.
3. In playback experiments, all models could detect human speech with high accuracy
at distances where the speech was intelligible (up to 10 m). We showed that
training models using location-specific soundscapes in the data preprocessing
step resulted in a slight improvement in model performance. Additionally, we
found that the number of speech detections correlated with peak traffic hours
(using bus timings), demonstrating how VAD can be used to derive a proxy for
human disturbance with fine temporal resolution.
4. Anonymizing audio data effectively using VAD models will allow eco-acoustic
monitoring to continue to deliver invaluable ecological insight at scale, while
minimizing the risk of data misuse. Furthermore, using speech detections as a
proxy for human disturbance opens new opportunities for eco-acoustic monitoring
to shed light on nuanced human–wildlife interactions.
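
The disturbance proxy described in point 3 can be illustrated with a minimal sketch: bin speech detection timestamps by hour of day and compare the hourly counts with hourly bus arrivals. This is not the authors' analysis. The function names `hourly_counts` and `pearson`, the choice of Pearson correlation, and all sample values are assumptions made for illustration only.

```python
from collections import Counter
from datetime import datetime
from math import sqrt


def hourly_counts(timestamps):
    """Return a 24-element list with the number of events in each hour of day."""
    counts = Counter(t.hour for t in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Made-up example data (NOT from the study): VAD detections and bus arrivals.
detections = [
    datetime(2022, 6, 1, 9, 5),
    datetime(2022, 6, 1, 9, 40),
    datetime(2022, 6, 1, 10, 15),
    datetime(2022, 6, 1, 17, 30),
]
bus_arrivals = [
    datetime(2022, 6, 1, 9, 0),
    datetime(2022, 6, 1, 9, 30),
    datetime(2022, 6, 1, 17, 0),
]

r = pearson(hourly_counts(detections), hourly_counts(bus_arrivals))
print(f"Hourly correlation between detections and bus arrivals: {r:.2f}")
```

In practice the timestamps would come from the VAD model output and a published bus timetable. Finer bins than one hour could be used where the timetable allows it.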