I have begun using the Python library textract to parse text from PowerPoint files (.pptx), Word documents (.docx), and text files (.txt). I wrote a simple script to test it.
# Python textract test script
import textract
# Raw string so the backslashes in the Windows path are never read as escapes
text = textract.process(r"H:\My Documents\Test.docx")
print(text)
When I run it, either from the command line or in IDLE, I get a traceback whose last few lines are:
  File "C:...\textract\parsers\docx_parser.py", line 1, in <module>
    import docx2txt
ImportError: No module named docx2txt
I am using textract version 1.5.0, downloaded from https://pypi.python.org/pypi/textract. I don't understand why the package would not pull in its dependencies. Do I have to install docx2txt, and whatever it depends on in turn, by hand? Why doesn't the textract package contain everything I need?
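For reference, here is a minimal sketch of how I could check which of the modules in question are importable from the same interpreter that runs the test script. The helper name missing_modules is hypothetical (it is not part of textract), and the list of modules to check is only what the traceback mentions, not a full list of textract's dependencies.

```python
import importlib


def missing_modules(names):
    """Return the names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # The module is not installed for this interpreter
            # (or it failed to import one of its own dependencies).
            missing.append(name)
    return missing


if __name__ == "__main__":
    # docx2txt is the module named in the traceback; textract is the package under test.
    print(missing_modules(["textract", "docx2txt"]))
```

If docx2txt shows up in the printed list, the interpreter really cannot find it. An empty list would suggest the test script is being run by a different Python installation than this check.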