I am using a function created by Vitim.us for counting all occurrences of a substring.
The function (linked above) goes like this:
/** Function that counts occurrences of a substring in a string;
 * @param {String} string               The string
 * @param {String} subString            The sub string to search for
 * @param {Boolean} [allowOverlapping]  Optional. (Default:false)
 *
 * @author Vitim.us https://gist.github.com/victornpb/7736865
 * @see Unit Test https://jsfiddle.net/Victornpb/5axuh96u/
 * @see https://stackoverflow.com/a/7924240/938822
 */
function occurrences(string, subString, allowOverlapping) {
    string += "";
    subString += "";
    if (subString.length <= 0) return (string.length + 1);
    var n = 0,
        pos = 0,
        step = allowOverlapping ? 1 : subString.length;
    while (true) {
        pos = string.indexOf(subString, pos);
        if (pos >= 0) {
            ++n;
            pos += step;
        } else break;
    }
    return n;
}

I have an index of words (each entry contains tags of stemmed words and the original content). To improve speed, I thought of first checking whether the word exists in the tags and then counting the occurrences only if required.
To check whether the word exists, I use
s.indexOf(word)
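Roughly, the lookup I have in mind looks like the sketch below. The entry shape (`tags`, `content`), the sample data, and the names `countMatches` and `countIfTagged` are made up for illustration. The count uses a simple non-overlapping `split` as a stand-in for the occurrences function above, only to keep the snippet self-contained.

```javascript
// Hypothetical index entry: `tags` holds the stemmed words,
// `content` holds the original text.
var entry = {
    tags: "run jump walk",
    content: "He runs, she runs, and they all run and jump."
};

// Stand-in for occurrences(): counts non-overlapping matches of `word`.
// Unlike occurrences(), it returns 0 for an empty search string.
function countMatches(text, word) {
    if (word.length === 0) return 0;
    return text.split(word).length - 1;
}

// Step 1: cheap existence check against the tags.
// Step 2: count in the content only when the tag check succeeds.
function countIfTagged(entry, word) {
    if (entry.tags.indexOf(word) === -1) return 0;
    return countMatches(entry.content, word);
}

console.log(countIfTagged(entry, "run"));  // 3
console.log(countIfTagged(entry, "swim")); // 0
```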
When I benchmarked a single indexOf call against the occurrences function, which calls indexOf multiple times, the occurrences function consistently took less time.
- How is this possible?
This is the exact code and string I used for benchmarking - code
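For anyone who does not want to open the link, the comparison has roughly this shape. This is a simplified sketch and not the exact benchmark: the sample text, the iteration count, and the helpers `countMatches` and `timeIt` are made up here, and it assumes an environment where `performance.now()` is available globally (browsers or a recent Node.js).

```javascript
// Compact version of the counting loop (non-overlapping matches only).
function countMatches(text, word) {
    if (word.length === 0) return 0;
    var n = 0,
        pos = text.indexOf(word);
    while (pos !== -1) {
        ++n;
        pos = text.indexOf(word, pos + word.length);
    }
    return n;
}

// Call `fn` `iterations` times and report the average time per call.
// The results are summed into `sink` so the calls are actually used.
function timeIt(fn, iterations) {
    var sink = 0;
    var start = performance.now();
    for (var i = 0; i < iterations; i++) {
        sink += fn();
    }
    var elapsed = performance.now() - start;
    return { msPerCall: elapsed / iterations, sink: sink };
}

// Made-up haystack in which the word appears 1000 times.
var text = "lorem ipsum dolor sit amet ".repeat(1000);
var word = "dolor";
var iterations = 2000;

var single = timeIt(function () { return text.indexOf(word); }, iterations);
var counted = timeIt(function () { return countMatches(text, word); }, iterations);

console.log("single indexOf: " + single.msPerCall + " ms/call");
console.log("countMatches:   " + counted.msPerCall + " ms/call");
```

Averaging over many iterations, instead of timing one call of each, is deliberate: a single measurement can be too short for the timer to resolve reliably.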
- This might be a separate question... If this is the case, what is the use of creating an index with stemmed words? I could count the occurrences directly from the content (which is the faster way).
