I'm trying to build a speaker recognition (not speech recognition, but speaker recognition) system in Python. I've extracted MFCC features from both the training audio files and the test audio file, and have fitted a GMM to each. I'm not sure how to compare the models to compute a similarity score, based on which I can program the system to validate the test audio. I've been struggling with this for 4 days. I'd be glad if someone could help.
Ubdus Samad
Tilak Sharma
- Please provide more details and show your efforts (if any). – Ubdus Samad Apr 22 '18 at 08:01
- I'm taking 3 audio files to train a speaker model (.gmm) and then taking one more audio clip (the test clip) to compare against the trained model and compute the similarity. – Tilak Sharma Apr 22 '18 at 08:06
- Possible duplicate of [Python Speaker Recognition](https://stackoverflow.com/questions/7309219/python-speaker-recognition) – Nikolay Shmyrev Apr 22 '18 at 15:06
1 Answer
From what I can understand of the question, you are describing an aspect of the cocktail party problem. I have found a whitepaper with a solution to your problem that uses a modified iterative Wiener filter and a multi-layer perceptron neural network to separate speakers into separate channels.

Interestingly, the cocktail party problem can be solved in one line of Octave: `[W,s,v]=svd((repmat(sum(x.*x,1),size(x,1),1).*x)*x');`

You can read more about it in this Stack Overflow post.
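As for the question actually asked (scoring a test clip against per-speaker GMMs), a minimal sketch of one common approach: fit one `GaussianMixture` per speaker, score the test clip's MFCC frames with each model's `score()` (mean per-frame log-likelihood), and accept the speaker with the highest score. This assumes scikit-learn is available and uses random arrays as stand-ins for real MFCC features, which you would normally get from a library such as `python_speech_features` or `librosa`:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Stand-ins for MFCC frame matrices (n_frames x n_coeffs).
# In a real system these come from your MFCC extraction step.
speaker_a_mfcc = rng.normal(0.0, 1.0, size=(500, 13))
speaker_b_mfcc = rng.normal(5.0, 1.0, size=(500, 13))

# One GMM per enrolled speaker.
models = {
    "A": GaussianMixture(n_components=4, covariance_type="diag",
                         random_state=0).fit(speaker_a_mfcc),
    "B": GaussianMixture(n_components=4, covariance_type="diag",
                         random_state=0).fit(speaker_b_mfcc),
}

# Test clip drawn from speaker A's distribution.
test_mfcc = rng.normal(0.0, 1.0, size=(200, 13))

# score() returns the average log-likelihood per frame; highest wins.
scores = {name: gmm.score(test_mfcc) for name, gmm in models.items()}
best = max(scores, key=scores.get)
print(best)  # -> A
```

For verification (accept/reject a claimed identity) rather than identification, you would instead compare the claimed speaker's score against a threshold, typically after normalizing against a universal background model.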
James Burgess