顯示具有 sound 標籤的文章。 顯示所有文章
顯示具有 sound 標籤的文章。 顯示所有文章

2022/03/15

Python: Short-Time Fourier Transform (STFT) and its inverse transform (ISTFT) using librosa

Below shows an example using STFT to transform speech into the frequency domain and ISTFT to transform the spectral data back to waveforms in the time domain. The Librosa package is used.

import librosa

import soundfile as sf

x, sr = librosa.load('1.wav', sr=None)


n_fft = 256

hop_length = 128

win_length = 256


X = librosa.stft(x, n_fft=n_fft, hop_length=hop_length, win_length=win_length)


y = librosa.istft(X, n_fft=n_fft, hop_length=hop_length, win_length=win_length)


sf.write('1_istft.wav', y, 16000)

Results for x and y:



References:

Matlab: Short-Time Fourier Transform (STFT) and its inverse transform (ISTFT) (StudyEECC)

librosa.stft

librosa.istft

2022/03/10

Matlab: Short-Time Fourier Transform (STFT) and its inverse transform (ISTFT)

For sound processing in realtime scenarios, it is not possible to wait for a complete sound file. In this case, short-time processing is important.

Here is an example using STFT to transform speech into the frequency domain and inverse transform the spectral data back to waveforms in the time domain:


[S,F,T] = stft(x, fs,'Window',win2,'OverlapLength',128,'FFTLength',256); 

[y,ti] = istft(S,fs,'Window',win2,'OverlapLength',128,'FFTLength',256);

References:

stft - Short-time Fourier transform (MathWorks)

istft - Inverse short-time Fourier transform (MathWorks)

2021/07/10

Audacity: Change the sample rate from 44.1 kHz to 16 kHz

To change the sample rate of a sound file is easy. Just use Audacity.

Select the sound tracks and then select Tracks/Resample.



Enter the new sample rate, e.g. 16000.


Press OK. The sample rate becomes 16000 Hz.

Related Information:

Audacity: Change audio speed and keep the pitch of the talker unchanged (StudyEECC)

2021/06/12

Audacity: Change audio speed and keep the pitch of the talker unchanged

If the speed of a sound file is increased, the pitch is also increased. This experiment can be easily made using Audacity.


Select Effect/Change Speed to change the speed of the sound file.


Increase the speed by 50% in this example.

The sound file is shorten to 2/3 of the original length.

By playing the new sound file, the pitch of the sound is increased. This phenomenon is known as the chipmunk effect. In the very early video clip for Chip 'n' Dale, "Private Pluto" video clip in 1943, the voices of the two chipmunk roles were produced by speeding up normal speech. (Wikipedia)

If we want to speech up without changing pitch to higher frequencies like chipmunks, simply choose Effect/Change Tempo instead.

Adjust the tempo by 50%.

Results:
The 1.5x tempo speech file sounds similar to the original file in pitch, but much faster.


Zoom in to show the oscillations in waveforms and frequencies in spectrograms:


Reference:

2020/10/06

Matlab: Spectrogram of white noise processed by a 500-Hz Butterworth lowpass filter

Matlab code:

clc;clear;close all;

 

Fs = 16000;  %Sampling frequency

Fco = 500;   %Cutoff frequency

order = 4;

[b, a] = butter(order, Fco/(Fs/2), 'low');

 

%Generate white noise

x=2*rand(16000,1)-1;

 

y = filter(b, a, x); %get time domain result

 

spectrogram(x,[],[],[],Fs,'yaxis');colormap hot;title('x');figure

spectrogram(y,[],[],[],Fs,'yaxis');colormap hot;title('y');

Result:


References:

Divide a speech utterance into 3 bands

Apply a LPF in the time and frequency using filter() and freqz()

2020/06/25

STOI Code Test for Matlab and Python

(這篇文章主要是在測試Matlab和Python兩個版本程式所計算出的STOI值是否有差異)
STOI (Short-time objective intelligibility) is a prediction metric useful to replace previous objective prediction measures.

I tried to check if the Matlab and Python versions of STOI codes work equally with the identical output scores.

The Matlab code is available from Cees Taal's web site.

The python version is available from Pariente Manuel's GitHub.

The same processed wav file and reference wav file are tested using Matlab and Python.

The Python version needs a few configurations.

Create minicoda environments
conda create -n py38_STOI python=3.8 pytorch
conda activate py38_STOI

Install sound packages
pip install SoundFile
conda install -c conda-forge librosa

Run the python file.
python STOI_test.py

where the STOI_test.py is modified from Pariente Manuel's GitHub:

import soundfile as sf
from pystoi import stoi #or use from pystoi.stoi import stoi

clean, fs = sf.read('my_clean.wav')
denoised, fs = sf.read('my_processed.wav')

# Clean and den should have the same length, and be 1D
d = stoi(clean, denoised, fs, extended=False)
print(d)

And the result is


0.8157936197843473


The result of Matlab version is

>> d = stoi(y_my_clean, y_my_processed, fs)
d =
    0.8158

The results are similar, but there is still a tiny difference when showing more decimal points using digits() and vpa() commands.

>> digits(16)
>> vpa(d)
ans =
0.8157974711433416

The difference between Matlab and Python versions is small than 0.00001 and hence can be ignored.

References:

Cees Taal's web site
Pariente Manuel's GitHub

2019/11/06

Matlab: Parseval's theorem

Parseval's theorem states that for N discrete points of signal,

Total Energy ∑ |x[n]|2 = 1/N × ∑ |X[k]|2

where X[k] is the kth point of the discrete Fourier transform of x[n].

Let's check this with Matlab:


load mtlb; %Load "Matlab" sound example

N = 128;
y = mtlb(1:N); %Read the 1st 128 points only

%Check Parseval's theorem

%y^2
y_2 = y.^2;

%Energy
energy = sum(y_2)

%Frequency domain
y_fft = fft(y); %Complex double

%|y_fft|^2

y_fft_2 = abs(y_fft).^2;

y_fft_2_check = y_fft.*conj(y_fft); %Should be identical to y_fft_2

energy_fft = sum(y_fft_2)/N

energy_fft_check = sum(y_fft_2_check)/N

Result:

Reference:

energy =

    2.5595


energy_fft =

    2.5595


energy_fft_check =

    2.5595

>> 

Parseval's theorem (Wikipedia)

2018/04/18

Signal-to-Noise Ratio (SNR)

Signal-to-Noise Ratio (SNR) 訊號雜訊比/訊雜比/訊噪比/信噪比

SNR = (signal power)/(noise power) = Psignal/Pnoise
SNRdB = 10log10(SNR) = 10log10(Psignal)-10log10(Pnoise) in decibels (dB)

Therefore, if Psignal > Pnoise,

SNR > 1 and SNRdB > 0 dB.

If SNRdB < 0, Psignal < Pnoise. In other words, when the SNRdB is negative, the power of the signal is weaker than the power of background noise.

Reference

Signal-to-noise ratio/訊號雜訊比 (Wikipedia)

2018/02/06

Matlab: Generate a pure tone with *.m file

To play a pure tone (sine wave) in Matlab, type the following code in a new file and save it as genTone.m:

function genTone(Fs, F, time)
t = 0:1/Fs:time;
y = sin(2*pi*F*t);
sound(y,Fs);
end

Now play sound in command

>> genTone(16000,100,0.5);
>> genTone(16000,1000,0.5);

The 3000Hz sound below will be wrong since it is greater than the 4000/2 = 2000 Hz Nyquist frequency.

>> genTone(4000,3000,0.5);

References:

how do i generate sound using MATLAB?
Chapter 2: Practical Audio Processing / tonegen.m (http://mcloughlin.eu/)

2018/01/26

Matlab: Divide a speech utterance into 3 bands


>> load mtlb %Load the 'Matlab' speech / dimention: 4001x1
>> sound(mtlb,Fs); %Play the original sound
>> len = length(mtlb); %Number of points
>> time = [1:len]/Fs; %x-axis time vector / dimention: 1x4001
>> subplot(5,1,1);
>> plot(time,mtlb);
>> title('Original Speech');

>> Fco1 = 1000/(Fs/2); %Cutoff freq. 1
>> Fco2 = 2000/(Fs/2); %Cutoff freq. 2

>> [Bhpf, Ahpf] = butter(4,Fco2,'high'); %design a HPF
>> [Bbpf, Abpf] = butter(4,[Fco1 Fco2],'bandpass'); %design a BPF
>> [Blpf, Alpf] = butter(4,Fco1,'low'); %design a LPF


>> y_hpf = filter(Bhpf, Ahpf, mtlb); %get time domain result

>> y_bpf = filter(Bbpf, Abpf, mtlb); %get time domain result
>> y_lpf = filter(Blpf, Alpf, mtlb); %get time domain result

>> subplot(5,1,2);
>> y_reconstruct = y_hpf + y_bpf + y_lpf;
>> plot(time,y_reconstruct);
>> title('Reconstructed Speech');
>> sound(y_reconstruct,Fs); %Play the reconstructed sound

>> subplot(5,1,3);
>> plot(time,y_hpf);
>> title('HPF Speech');
>> sound(y_hpf,Fs); %Play the HPF sound


>> subplot(5,1,4);
>> plot(time,y_bpf);
>> title('BPF Speech');
>> sound(y_bpf,Fs); %Play the BPF sound


>> subplot(5,1,5);
>> plot(time,y_lpf);
>> title('LPF Speech');
>> sound(y_lpf,Fs); %Play the LPF sound


Result:


2018/01/06

Matlab: Apply a LPF in the time and frequency using filter() and freqz()

The code below shows how to implement a 1000Hz Butterworth low pass filter using the filter() function in time domain and freqz() function in the complex frequency domain.

>> load mtlb %Load the 'Matlab' speech / dimention: 4001x1
>> sound(mtlb,Fs); %Play the original sound
>> len = length(mtlb); %Number of points
>> time = [1:len]/Fs; %x-axis time vector / dimention: 1x4001
>> subplot(3,1,1);
>> plot(time,mtlb);
>> title('Original Speech');

>> [b, a] = butter(3,1000/(Fs/2),'low'); %design a Butterworth filter


Time domain

>> y1 = filter(b, a, mtlb); %get time domain result directly % dimention: 4001x1
>> sound(y1,Fs); %Play the sound processed by filter()
>> subplot(3,1,2);
>> plot(time,y1);
>> title('filter()');

Frequency domain (Method 1)
>> mtlb_FFT = fft(mtlb); %transformed input signal / dimention: 4001x1
>> H = freqz(b, a, len, 'whole'); %filter transfer function / dimention: 4001x1
>> Y2_FFT = mtlb_FFT .* H; %filtered result in freq. domain = multiplication of transformed speech signal and the filter transfer function in the freq. domain. % dimention: 4001x1
>> y2 = ifft(Y2_FFT); %Inverse FFT to time domain % dimention: 4001x1
>> sound(y2, Fs);
>> subplot(3,1,3);
>> plot(time,y2);
>> title('freqz() and transform');

Frequency domain (Method 2)
>> mtlb_FFT = fft(mtlb); %transformed input signal / dimention: 4001x1
>> H = freqz(b, a, len); %filter transfer function / dimention: 4001x1
>> Y2_FFT = mtlb_FFT .* H; %filtered result in freq. domain = multiplication of transformed speech signal and the filter transfer function in the freq. domain. % dimention: 4001x1
>> y2 = ifft(Y2_FFT); %Inverse FFT to time domain % dimention: 4001x1
>> sound(abs(y2), Fs);
>> subplot(3,1,3);
>> plot(time,y2);
>> title('freqz() and transform');

Result:


References:

Matlab: Low pass filtered Chirp sound - Study EECC
Filter Applications (濾波器應用) - Audio Signal Processing and Recognition (音訊處理與辨識)
MATLAB: filter in the frequency domain using FFT/IFFT with an IIR filter - StackOverflow

2017/10/12

Fundamental Frequency, Harmonic and Overtone 基頻、諧波、泛音

Fundamental Frequency 基本頻率/基頻
Fundamental Tone 基音/根音
Harmonic 諧波/諧音
Harmonic series 諧波列/泛音列
Overtone 泛音

Standing Wave 駐波
Periodic Tone 週期音
Pure Tone 純音
Complex Tone 複合音/複音
Higher Harmonics 高次諧波 (f×2, f×3...)

Partial 分音 - 出現於complex tone中,不一定是F0的integer multiple的sine wave

Example of a harmonic series:

f = fundamental frequency / fundamental tone / 1st harmonic 第一諧音/波
f×2 = 1st overtone 第一泛音 / 2nd harmonic 第二諧音/波
f×3 = 2nd overtone 第二泛音 / 3rd harmonic 第三諧音/波

參考資料

Harmonic (Wikipedia)
駐波(standing waves)(小小整理網站)
Harmonics (諧波)、Overtones (泛音)、Partials (分音)
Fourier Analysis Terms 傅立葉分析相關名詞

2017/07/24

Matlab: Play Demo Sounds

It is very easy to play some demo sounds with Matlab commands using the sound function.

To play a female speaker saying 'Matlab' type:

>> load mtlb
>> sound(mtlb,Fs)

Note that the workspace contains mtlb and Fs, so sound(mtlb,Fs) is called.

You may clear the workspace with the the clear command:

>> clear

To play a snippet of Handel's Hallelujah Chorus, type:

>> load handel
>> sound(y,Fs)

There are also other sounds you may load:

load chirp
load gong
load laughter
load train

References:

Formant Estimation with LPC Coefficients
Record and Play Audio
Sound + FFT

2017/05/13

Cent and Semitone 音分和半音

cent 音分
semitone/half step/half tone 半音
whole tone 全音
accidental 變音記號

1 semitone = 100 cents

octave 八度音

一個八度音 = 12個半音 = 5個全音 + 2個半音
Twelve-tone equal temperament (十二平均律)
- divide an octave into 12 equal parts

Number of Semitones between Two Frequencies (兩個頻率間相差的半音數):

n = abs(log2(f1/f2)/log2(2(1/12))) = abs(12log2(f1/f2))

Reference

How to find the number of semitones between two frequencies?

2017/03/18

樂器的ADSR模型

A sound produced by a musical instrument can be explained using the ADSR model, which involves 4 stages:

attack
decay
sustain
release

2016/08/20

Bluetooth Audio Profiles 與聲音有關的藍牙規範/描述檔

For Audio Streaming (Single-directional)

A2DP (Advanced Audio Distribution Profile) - high quality audio streaming.

AVRCP (Audio/Video Remote Control Profile) - control A/V equipment. Used with A2DP or VDP (Video Distribution Profile).

GAVDP (Generic Audio/Video Distribution Profile) - basis for A2DP and VDP streaming.

For Mobile Phones (Bi-directional)

HSP (Headset Profile) - for headsets to communicate with mobile phones.

HFP (Hands-Free Profile) - for communication between hands-free devices and mobile phones in the car. With extended features to HSP.

More Information:

Differences between HSP & HFP

For Audio Streaming

CTP (Cordless Telephony Profile) - for cordless phones.

參考資料

藍牙規範(維基百科)
如何選購藍芽耳機