Issue
I need to check whether each word of a string is spelled correctly by looking each word up in a MongoDB collection.
- The number of DB queries should be kept to a minimum.
- The first word of each sentence is always capitalized, but it may be stored in upper or lower case in the dictionary. So I need a case-sensitive match for every word, except for the first word of each sentence, which should be matched case-insensitively.
Sample string
This is a simple example. Example. This is another example.
Dictionary structure
Assume there is a dictionary collection like this:
{ word: 'this' },
{ word: 'is' },
{ word: 'a' },
{ word: 'example' },
{ word: 'Name' }
In my case, there are 100,000 words in this dictionary. Of course, names are stored with a capital first letter, verbs are stored in lower case, and so on.
Expected result
The words simple and another should be recognized as misspelled, as they do not exist in the DB.
An array of all existing words should in this case be ['This', 'is', 'a', 'example']. This is capitalized because it is the first word of a sentence; in the DB it is stored as lower case this.
My attempt so far (Updated)
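// Split into sentences at ., ? or ! followed by a space and a capital letter
const sentences = string.replace(/([.?!])\s*(?= [A-Z])/g, '$1|').split('|');
let search = [],    // one regex per word, used for the $in query
    words = [],     // the words as they appear in the string
    existing,
    missing;
sentences.forEach(sentence => {
    // Strip punctuation, then split the sentence into words
    const w = sentence.trim().replace(/[^a-zA-Z0-9äöüÄÖÜß ]/g, '').split(' ');
    w.forEach((word, index) => {
        // First word of a sentence: case insensitive; all other words: case sensitive
        const regex = new RegExp(['^', word, '$'].join(''), index === 0 ? 'i' : '');
        search.push(regex);
        words.push(word);
    });
});
// Single query for all words
existing = Dictionary.find({
    word: { $in: search }
}).map(obj => obj.word);
missing = _.difference(words, existing);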
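To check what this code returns without a database, the query can be simulated in memory. This is only a sketch: the `dictionary` array and the two `filter` calls are stand-ins for the collection query and for `_.difference`, and only the word Example is looked at, once at the start of a sentence and once in the middle.

```javascript
// In-memory stand-in for the collection (the sample dictionary from above).
const dictionary = ['this', 'is', 'a', 'example', 'Name'];

// "Example" as first word of a sentence, "example" in the middle of one.
const words = ['Example', 'example'];
const search = [/^Example$/i, /^example$/];

// Stand-in for Dictionary.find({ word: { $in: search } }).map(obj => obj.word)
const existing = dictionary.filter(entry => search.some(regex => regex.test(entry)));

// Stand-in for _.difference(words, existing)
const missing = words.filter(word => !existing.includes(word));

console.log(existing); // [ 'example' ]
console.log(missing);  // [ 'Example' ]
```

The regex finds the entry, but `existing` holds the spelling from the dictionary, not the spelling from the string, so the comparison afterwards is done on different casings.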
Problem
- The case-insensitive matches don't work properly: /^Example$/i will give me a result. But existing will then contain the original lowercase example, which means Example ends up in the missing array. So the case-insensitive search itself works as expected, but the result arrays have a mismatch. I don't know how to solve this.
- Is it possible to optimize the code? I'm using two forEach loops and a difference...
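One direction that might address both points is to query plain strings instead of regexes and to map the result back to the words as they were written. The following is a minimal sketch under assumptions, not something tested against the real collection. The names `tokenize`, `candidatesFor`, `checkSpelling` and `lookup` are hypothetical. It also assumes that a sentence-initial word is stored either exactly as written or with its first letter lowercased, which is narrower than a real `i` flag. As far as I know, case-insensitive regex queries generally cannot use an index efficiently, which is one reason to prefer plain strings with 100,000 entries.

```javascript
// Hypothetical helper: split the text into { word, first } tokens, where
// `first` marks the first word of a sentence. Splitting mirrors the attempt above.
function tokenize(text) {
  const tokens = [];
  text.replace(/([.?!])\s*(?= [A-Z])/g, '$1|').split('|').forEach(sentence => {
    sentence.trim()
      .replace(/[^a-zA-Z0-9äöüÄÖÜß ]/g, '')
      .split(' ')
      .filter(Boolean) // drop empty strings left by repeated spaces
      .forEach((word, index) => tokens.push({ word, first: index === 0 }));
  });
  return tokens;
}

// ASSUMPTION: a sentence-initial word is stored either as written or with a
// lowercased first letter. All other words must match exactly.
function candidatesFor({ word, first }) {
  if (!first) return [word];
  return [word, word.charAt(0).toLowerCase() + word.slice(1)];
}

// `lookup(candidates)` stands for the single DB query and must return the
// dictionary words found among `candidates`.
function checkSpelling(text, lookup) {
  const tokens = tokenize(text);
  const candidates = [...new Set(tokens.flatMap(candidatesFor))];
  const found = new Set(lookup(candidates)); // one query for all words
  const existing = [];
  const missing = [];
  tokens.forEach(token => {
    const ok = candidatesFor(token).some(candidate => found.has(candidate));
    const target = ok ? existing : missing;
    // Report the word as written in the text, each spelling only once.
    if (!target.includes(token.word)) target.push(token.word);
  });
  return { existing, missing };
}

// In-memory stand-in for the collection. With Meteor the lookup would presumably be
// candidates => Dictionary.find({ word: { $in: candidates } }).map(obj => obj.word)
const dictionary = ['this', 'is', 'a', 'example', 'Name'];
const lookup = candidates => dictionary.filter(entry => candidates.includes(entry));

const result = checkSpelling('This is a simple example. Example. This is another example.', lookup);
console.log(result.existing); // [ 'This', 'is', 'a', 'example', 'Example' ]
console.log(result.missing);  // [ 'simple', 'another' ]
```

Because the words are checked against the found entries one token at a time, no `difference` on mismatching casings is needed. Example shows up in `existing` here because it starts the second sentence of the sample string.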
 
    