Voice activity detection in eco-acoustic data enables privacy protection and is a proxy for human disturbance

Cretois, Benjamin; Rosten, Carolyn; Sethi, Sarab Singh

Peer reviewed, Journal article

Published version

Open

Article (3.762 MB)

Permanent link

https://hdl.handle.net/11250/3033728

Publication date

2022

Abstract

1. Eco-acoustic monitoring is increasingly being used to map biodiversity across large scales, yet little thought is given to the privacy concerns and potential scientific value of inadvertently recorded human speech. Automated speech detection is possible using voice activity detection (VAD) models, but it is not clear how well these perform in diverse natural soundscapes. In this study we present the first evaluation of VAD models for anonymization of eco-acoustic data and demonstrate how speech detection frequency can be used as one potential measure of human disturbance.

2. We first generated multiple synthetic datasets using different data preprocessing techniques to train and validate deep neural network models. We evaluated the performance of our custom models against existing state-of-the-art VAD models using playback experiments with speech samples from a man, woman and child. Finally, we collected long-term data from a Norwegian forest heavily used for hiking to evaluate the ability of the models to detect human speech and quantify a proxy for human disturbance in a real monitoring scenario.

3. In playback experiments, all models could detect human speech with high accuracy at distances where the speech was intelligible (up to 10 m). We showed that training models using location-specific soundscapes in the data preprocessing step resulted in a slight improvement in model performance. Additionally, we found that the number of speech detections correlated with peak traffic hours (using bus timings), demonstrating how VAD can be used to derive a proxy for human disturbance with fine temporal resolution.

4. Anonymizing audio data effectively using VAD models will allow eco-acoustic monitoring to continue to deliver invaluable ecological insight at scale, while minimizing the risk of data misuse. Furthermore, using speech detections as a proxy for human disturbance opens new opportunities for eco-acoustic monitoring to shed light on nuanced human–wildlife interactions.
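
The comparison in point 3 between speech detections and bus timings can be illustrated with a minimal sketch. It is not the authors' analysis: the abstract does not say which statistic was used, so the Pearson correlation, the hourly binning, the function names `hourly_counts` and `pearson`, and all timestamps below are assumptions made purely for illustration.

```python
from collections import Counter
from datetime import datetime
from math import sqrt


def hourly_counts(timestamps):
    """Count events per hour of day (0-23) from ISO 8601 timestamp strings."""
    counts = Counter(datetime.fromisoformat(t).hour for t in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical data, invented for this sketch (not taken from the study):
# times at which a VAD model flagged speech, and scheduled bus arrivals.
speech_detections = [
    "2022-06-01T09:05:00", "2022-06-01T09:40:00", "2022-06-01T10:15:00",
    "2022-06-01T16:20:00", "2022-06-01T16:45:00", "2022-06-01T17:10:00",
]
bus_arrivals = [
    "2022-06-01T09:00:00", "2022-06-01T09:30:00", "2022-06-01T10:00:00",
    "2022-06-01T16:00:00", "2022-06-01T16:30:00", "2022-06-01T17:00:00",
]

speech_per_hour = hourly_counts(speech_detections)
buses_per_hour = hourly_counts(bus_arrivals)
r = pearson(speech_per_hour, buses_per_hour)
print(f"Correlation between hourly speech detections and bus arrivals: {r:.2f}")
```

Binning by hour of day is only one possible choice; a real monitoring dataset would span many days, and a finer or coarser bin width may suit it better.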

Journal

Methods in Ecology and Evolution

Copyright

Unless otherwise stated, this item is licensed under Attribution-NonCommercial 4.0 International