Spoken language recognition on Mozilla Common Voice — Audio Transformations. | by Sergey Vilov | Aug, 2023

This is the third article on spoken language recognition based on the Mozilla Common Voice dataset. In Part I, we discussed data selection and data preprocessing and in Part II we analysed performance of several neural network classifiers.

The final model achieved 92% accuracy and 97% pairwise accuracy. Since this model suffers from somewhat high variance, the accuracy could potentially be improved by adding more data. One very common way to get extra data is to synthesize it by performing various transformations on the available dataset.

In this article, we will consider 5 popular transformations for audio data augmentation: adding noise, changing speed, changing pitch, time masking, and cut & splice.

The tutorial notebook can be found here.

For illustration purposes, will use the sample common_voice_en_100040 from the Mozilla Common Voice (MCV) dataset. This is the sentence The burning fire had been extinguished.

import librosa as lr
import IPythonsignal, sr = lr.load('./transformed/common_voice_en_100040.wav', res_type='kaiser_fast') #load signal
IPython.display.Audio(signal, rate=sr)

Original sample common_voice_en_100040 from MCV.

Original signal waveform (image by the author)

Adding noise is the simplest audio augmentation. The amount of noise is characterised by the signal-to-noise ratio (SNR) — the ratio between maximal signal amplitude and standard deviation of noise. We will generate several noise levels, defined with SNR, and see how they change the signal.

SNRs = (5,10,100,1000) #Signal-to-noise ratio: max amplitude over noise stdnoisy_signal = {}
for snr in SNRs:
noise_std = max(abs(signal))/snr #get noise std
noise =  noise_std*np.random.randn(len(signal),) #generate noise with given std
noisy_signal[snr] = signal+noise
IPython.display.display(IPython.display.Audio(noisy_signal[5], rate=sr))
IPython.display.display(IPython.display.Audio(noisy_signal[1000], rate=sr))

Signals obtained by superimposing noise with SNR=5 and SNR=1000 on the original MCV sample common_voice_en_100040 (generated by the author).

Source link

This post originally appeared on TechToday.

In this article, we will consider 5 popular transformations for audio data augmentation: adding noise, changing speed, changing pitch, time masking, and cut & splice.

The tutorial notebook can be found here.

For illustration purposes, will use the sample common_voice_en_100040 from the Mozilla Common Voice (MCV) dataset. This is the sentence The burning fire had been extinguished.

import librosa as lr
import IPythonsignal, sr = lr.load('./transformed/common_voice_en_100040.wav', res_type='kaiser_fast') #load signal
IPython.display.Audio(signal, rate=sr)

Original sample common_voice_en_100040 from MCV.

SNRs = (5,10,100,1000) #Signal-to-noise ratio: max amplitude over noise stdnoisy_signal = {}
for snr in SNRs:
noise_std = max(abs(signal))/snr #get noise std
noise =  noise_std*np.random.randn(len(signal),) #generate noise with given std
noisy_signal[snr] = signal+noise
IPython.display.display(IPython.display.Audio(noisy_signal[5], rate=sr))
IPython.display.display(IPython.display.Audio(noisy_signal[1000], rate=sr))

Source link

This post originally appeared on TechToday.

Spoken language recognition on Mozilla Common Voice — Audio Transformations. | by Sergey Vilov | Aug, 2023

Spoken language recognition on Mozilla Common Voice — Audio Transformations. | by Sergey Vilov | Aug, 2023

Time masking

Cut & splice

Time masking

Cut & splice

Leave a Reply Cancel reply

Spoken language recognition on Mozilla Common Voice — Audio Transformations. | by Sergey Vilov | Aug, 2023

Spoken language recognition on Mozilla Common Voice — Audio Transformations. | by Sergey Vilov | Aug, 2023

Time masking

Cut & splice

Time masking

Cut & splice

Golden Pokies Casino Site Review

AyoBet Casino Online Game, Bonus, Aplikasi Seluler, dan Cara Login

As Estratégias Mais Eficazes para Maximizar Seus Ganhos em Cassinos Online

Leave a Reply Cancel reply