I want to classify file types based on their extensions in Python. Before writing it myself, I wanted to check whether there is any Python package that can be used for this purpose. By file type I mean classifying a file as, e.g., doc, ppt, pdf, tar, txt, iso, etc. Ideally it would take the file name as input and return its type. I am running on Linux.
- A file's extension has nothing to do with its type. – Burhan Khalid Sep 04 '12 at 06:48
- Take a look at this question: http://stackoverflow.com/questions/43580/how-to-find-the-mime-type-of-a-file-in-python . You can *guess* by extension using `mimetypes`, but something like `python-magic` (mentioned in the second answer) may be more reliable. – kenm Sep 04 '12 at 06:51
- Not *nothing* (you hope they're related), but they are definitely not the same thing. E.g., you can totally change the extension of a `.jpg` file to `.doc`, but the type is still JPEG. – Matthew Adams Sep 04 '12 at 06:53
- I just want to classify based on what the extension says; I'm not bothered about the actual content of the file. Any help now? – auny Sep 04 '12 at 06:57
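Since the goal is to classify purely by extension, kenm's `mimetypes` suggestion can be turned into a small helper. This is a minimal sketch, not an existing package API: the `classify` function and the short labels in `CATEGORY_BY_MIME` are hypothetical, and registering `application/x-iso9660-image` for `.iso` is an assumption, made because `.iso` may not be in Python's built-in table.

```python
import mimetypes

# Assumption: ".iso" may be missing from Python's built-in table, so register a
# commonly used MIME type for it. add_type() overrides any system-provided mapping.
mimetypes.add_type("application/x-iso9660-image", ".iso")

# Hypothetical short labels for the kinds of files mentioned in the question.
CATEGORY_BY_MIME = {
    "application/msword": "doc",
    "application/vnd.ms-powerpoint": "ppt",
    "application/pdf": "pdf",
    "application/x-tar": "tar",
    "text/plain": "txt",
    "application/x-iso9660-image": "iso",
}


def classify(filename):
    """Return a short label based only on the filename's extension.

    The file is never opened, so a renamed file is classified by its new name.
    """
    # guess_type() returns (type, encoding); for "x.tar.gz" that is
    # ("application/x-tar", "gzip"), so the compression layer is ignored here.
    mime_type, _encoding = mimetypes.guess_type(filename)
    if mime_type is None:
        return "unknown"
    # Types without a custom label fall back to the full MIME string.
    return CATEGORY_BY_MIME.get(mime_type, mime_type)


for name in ["report.pdf", "slides.PPT", "backup.tar.gz", "disk.iso", "data.xyzzy"]:
    print(name, "->", classify(name))
```

With these names the loop should report `pdf`, `ppt`, `tar`, `iso` and `unknown`; `guess_type` falls back to a lower-cased extension, so `.PPT` still matches. As Matthew Adams points out, this only reflects what the name claims, which is exactly what was asked for here.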
2 Answers
You should look into a document metadata parser. I have used Apache Tika, a Java library, in some of my projects. See the question "Python-based document metadata parser?" for how to use it from Python.
Pratik Mandrekar

On Linux you can use the `file` utility, which determines a file's type. You can use it from your scripts too:
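```python
import subprocess
# check_output returns the description; --brief drops the "yourfile: " prefix
print(subprocess.check_output(['file', '--brief', 'yourfile'], text=True).strip())
```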
Denis
- The `file` command uses the libmagic library; there is a `python-magic` module that provides a native interface to it and uses the same logic. – neutrinus Mar 13 '13 at 15:57
