A few days ago I started learning C++. I'm writing a simple XHTML parser that doesn't need to handle nested tags. For testing I have been using the following data: http://pastebin.com/bbhJHBdQ (around 10k chars). I only need to parse the content between p, h2 and h3 tags. My goal is to parse each tag and its content into the following structure:
struct Node {
    short tag; // p = 1, h2 = 2, h3 = 3
    std::string data;
};
For example, <p> asdasd </p> will be parsed to tag = 1, data = "asdasd". I don't want to use third-party libraries, and I'm trying to optimize for speed.
Here is my code:
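// Classifies the tag whose name starts at ptr (the character right after '<'):
// 0 = closing tag, 1 = p, 2 = h2, 3 = h3, -1 = anything else.
short tagDetect(char * ptr){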
    if (*ptr == '/') {
        return 0;
    }
    if (*ptr == 'p') {
        return 1;
    }
    if (*(ptr + 1) == '2')
        return 2;
    if (*(ptr + 1) == '3')
        return 3;
    return -1;
}
struct Node {
    short tag;
    std::string data;
    Node(std::string input, short tagId) {
        tag = tagId;
        data = input;
    }
};
int _tmain(int argc, _TCHAR* argv[])
{
    std::string input = GetData(); // returns the pastebin content above
    std::vector<Node> elems;
    std::string::size_type pos = 0;
    char pattern = '<';
    int openPos = 0;            // position of the last opening tag name
    short tagID, lastTag = 0;   // 0 = no opening tag seen yet
    double  duration;
    clock_t start = clock();
    for (int i = 0; i < 20000; i++) {
        elems.clear();
        pos = 0;
        while ((pos = input.find(pattern, pos)) != std::string::npos) {
            pos++;
            tagID = tagDetect(&input[pos]);
            switch (tagID) {
            case 0:
                if (tagDetect(&input[pos + 1]) == lastTag && pos - openPos > 10) {
                    elems.push_back(Node(input.substr(openPos + (lastTag > 1 ? 3 : 2), pos - openPos - (lastTag > 1 ? 3 : 2) - 1), lastTag));
                }
                break;
            case 1:
            case 2:
            case 3:
                openPos = pos;
                lastTag = tagID;
                break;
            }
        }
    }
    duration = (double)(clock() - start) / CLOCKS_PER_SEC;
    printf("%2.1f seconds\n", duration);
}
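To make the offset arithmetic in the push_back line easier to follow, here is a small sketch that applies the same formula to a single element. The helper name sliceContent is hypothetical and is not part of my program; it also leaves out the `pos - openPos > 10` check.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: same arithmetic as the push_back line, for one element.
// openPos  = index of the tag name in the opening tag (just past '<')
// closePos = index of '/' in the closing tag (just past '<')
// tag      = 1 for p, 2 for h2, 3 for h3
std::string sliceContent(const std::string& input, std::size_t openPos,
                         std::size_t closePos, short tag) {
    // "p>" is 2 characters long, "h2>" and "h3>" are 3 characters long.
    const std::size_t skip = (tag > 1) ? 3 : 2;
    // The content starts right after the tag name and '>', and ends just
    // before the '<' of the closing tag, hence the extra -1.
    return input.substr(openPos + skip, closePos - openPos - skip - 1);
}
```

For the input "<p>asdasd</p>", openPos is 1 and closePos is 10, so the call copies 6 characters starting at index 3.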
The parsing runs in a loop of 20,000 iterations so that I can measure its performance. The input is about 10k characters.
I have noticed that the biggest bottleneck in my code is the substr call. As shown above, the code takes 5.8 seconds to run. If I cap the substr length at 10, the running time drops to 0.4 seconds, and if I replace the whole substr call with "" it finishes in 0.1 seconds.
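Since the time seems to scale with the number of characters copied, one direction that might help is to store only the position of each piece of content and copy it later, if at all. The following is a minimal sketch of that idea, not a measured result. It assumes a C++17 compiler (for std::string_view) and that the input string outlives every node; NodeRef, contentOf and copyOf are hypothetical names that do not appear in my program.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical alternative to Node: remembers where the content lives
// in the input instead of owning a copy of it.
struct NodeRef {
    short tag;           // p = 1, h2 = 2, h3 = 3
    std::size_t offset;  // index of the first content character in the input
    std::size_t length;  // number of content characters
};

// Returns a non-owning view of the content. The view is only valid while
// `input` is alive and unmodified.
std::string_view contentOf(const std::string& input, const NodeRef& node) {
    return std::string_view(input).substr(node.offset, node.length);
}

// Makes an owning copy, for the cases where one is really needed.
std::string copyOf(const std::string& input, const NodeRef& node) {
    return input.substr(node.offset, node.length);
}
```

In the loop, the push_back would then store a NodeRef built from lastTag, the start offset and the length, so no characters are copied during parsing.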
My questions are:
- How can I optimize the substr call, since it's the main bottleneck in my code?
- Are there any other optimizations I can make to my code?
I'm not sure whether this question is suitable for SO, but I'm pretty new to C++ and I don't know who else to ask whether my code is complete crap.
Full source code can be found here: http://pastebin.com/dhR5afuE