I've seen similar questions about this type of discrepancy between librosa, python_speech_features, and tensorflow.signal.
I am trying to make torchaudio and librosa compute MFCC features with the same arguments and underlying methods. This is part of a transition from librosa to torchaudio.
Given:
import numpy as np
import torch
from librosa.feature import mfcc
from torchaudio.transforms import MFCC
sample_rate = 22050
audio = np.ones((sample_rate,), dtype=np.float32)
librosa_mfcc = mfcc(y=audio, sr=sample_rate, n_mfcc=20, n_fft=2048, hop_length=512, power=2)
mfcc_module = MFCC(sample_rate=sample_rate, n_mfcc=20, melkwargs={"n_fft": 2048, "hop_length": 512, "power": 2})
torch_mfcc = mfcc_module(torch.tensor(audio))
The shapes of librosa_mfcc and torch_mfcc are both (20, 44), but the arrays themselves are different. For example, librosa_mfcc[0][0] is -487.6101, while torch_mfcc[0][0] is -302.7711.
I admit I am lacking domain knowledge here, but I am working through the librosa and torchaudio documentation and parameters to learn the different routes they take in the MFCC calculation and the meaning of each parameter. How do I make torch_mfcc have the same values as librosa_mfcc?