I am trying to write a spell-checking function that reads a text file containing a passage with several misspelt words. For example: "My favorite subjects are: Physcs, Maths, Chemistree and Biology - I find it necesary to use my iPad to make comprensive notes after class." There are three issues that I am trying to resolve:
- Currently, the program treats Maths as an incorrect word because of the comma immediately after it. I believe the best fix is to split the string from the text file like so: ['My', 'favorite', 'subjects', 'are', ':', ' ', 'Physcs', ' ', 'Maths', ','...etc]. How do I split the string into words and punctuation without importing any Python modules (e.g. string or regex (re))?
- I am currently checking each word in the text file against a list of accepted English words. Is there a better way to preprocess this list so that checking whether it contains a given word is faster, improving the runtime of the program?
- Several words such as 'eBook' and 'iPad' are exceptions to the rules used in the function is_valid_word below (i.e. the word must start with a capital with all the other letters being lowercase, or all characters in the word must be uppercase). Is there a way to check whether such a string is a valid word?
Any help would be greatly appreciated!
def get_words():
    # Load the dictionary file: one accepted word per line.
    with open("english.txt") as a:
        words = a.readlines()
    words = [word.strip() for word in words]
    return words

isWord = get_words()
def is_valid_word(st):
    # A word is valid if it is in the dictionary (ignoring case) and is
    # all lowercase, all uppercase, or capitalised ("maths", "MATHS", "Maths").
    if not isinstance(st, str) or not st:
        return False
    if st.lower() not in isWord:
        return False
    return st.islower() or st.isupper() or (st[0].isupper() and st[1:].islower())
def spell_check_file(file):
    incorrectWords = []  # Alternating line numbers and incorrectly spelled words.
    with open(file, 'r') as f:
        for line_no, line in enumerate(f):
            # split() only breaks on whitespace, so "Maths," keeps its comma.
            for word in line.split():
                if not is_valid_word(word):
                    incorrectWords.append(line_no)
                    incorrectWords.append(word)
    return incorrectWords

print(spell_check_file("passage.txt"))
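For the first point, one possible approach that needs no imports is to scan each line character by character, collecting runs of letters into words and emitting every other character as its own token. The following is only a minimal sketch: the names tokenize and words_only are hypothetical, and it assumes that a "word" is any unbroken run of alphabetic characters, so an apostrophe or hyphen would split a word in two.

```python
def tokenize(text):
    """Split text into runs of letters and single non-letter characters."""
    tokens = []
    current = ""  # letters of the word currently being built
    for ch in text:
        if ch.isalpha():
            current += ch
        else:
            # A non-letter ends the current word (if any) ...
            if current:
                tokens.append(current)
                current = ""
            # ... and is kept as a token of its own (space, comma, colon, ...).
            tokens.append(ch)
    if current:  # flush a word that runs to the end of the text
        tokens.append(current)
    return tokens


def words_only(tokens):
    """Keep only the alphabetic tokens, i.e. the candidates for spell checking."""
    return [t for t in tokens if t.isalpha()]
```

With this sketch, tokenize("are: Physcs, Maths") gives ['are', ':', ' ', 'Physcs', ',', ' ', 'Maths'], so the comma no longer sticks to the word. Passing words_only(tokenize(line)) to the checker instead of line.split() would then test only the words.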
 
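For the second and third points, a common approach is to store the dictionary in a set, where a membership test takes roughly constant time on average instead of scanning a whole list, and to keep a small, hand-maintained set of mixed-case exceptions that is checked before the casing rules. This is a sketch under assumptions: load_words, the dictionary and exceptions parameters, and the contents of the exceptions set are hypothetical, not part of the code above.

```python
def load_words(lines):
    """Build a set of lowercase words from an iterable of lines."""
    return {line.strip().lower() for line in lines if line.strip()}


def is_valid_word(word, dictionary, exceptions=frozenset()):
    """Check spelling and casing; `exceptions` holds exact mixed-case spellings."""
    if not isinstance(word, str) or not word:
        return False
    # Exceptions such as "iPad" must match exactly, including case.
    if word in exceptions:
        return True
    # Set lookup: average O(1) rather than O(n) for a list.
    if word.lower() not in dictionary:
        return False
    # Accept all lowercase, all uppercase, or capitalised words.
    return word.islower() or word.isupper() or (word[0].isupper() and word[1:].islower())


# Tiny in-memory stand-in for the dictionary file (illustrative data only).
dictionary = load_words(["maths\n", "biology\n", "ipad\n", "\n"])
exceptions = {"iPad", "eBook"}  # assumed list, maintained by hand

# With the real file, the set could be built once like this:
#     with open("english.txt") as f:
#         dictionary = load_words(f)
```

Here "Maths", "MATHS" and "iPad" are accepted, while "IPad" and "Physcs" are rejected. The exception set is checked first so that a word does not need to satisfy the casing rules if its exact spelling is known to be correct.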
    