I have an HTML file called basic.html, and my task is to create a small Java program that uses regular expressions to output various strings. The program should display the line number of every occurrence of each of the strings below:
- div tag
- div class="menuItem" tag
- span tag
- class="emph"
- Any string beginning with < and ending with >, i.e. all tags.
- The contents of the body tag.
- The contents of all divs
- All divs that make menus
I must also use the Matcher start() and end() methods to display the index values of each match.
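As a rough illustration of the list above, each item could be written as its own pattern and kept in a map, so that one loop reports every match with start() and end(). This is only a sketch under assumptions: the class name `TagPatternsSketch`, the helper names, the sample HTML string, and the reading of "divs that make menus" as divs with class="menuItem" are all made up for the example and are not taken from basic.html.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagPatternsSketch {

    // Candidate patterns for the listed strings (assumptions, not the required answers).
    static Map<String, Pattern> patterns() {
        Map<String, Pattern> p = new LinkedHashMap<>();
        p.put("div tag", Pattern.compile("<div\\b[^>]*>"));
        p.put("div class=\"menuItem\" tag",
                Pattern.compile("<div\\b[^>]*\\bclass=\"menuItem\"[^>]*>"));
        p.put("span tag", Pattern.compile("<span\\b[^>]*>"));
        p.put("class=\"emph\"", Pattern.compile("class=\"emph\""));
        p.put("any tag", Pattern.compile("<[^>]+>"));
        // DOTALL lets '.' cross line breaks; the reluctant '.*?' stops at the first closing tag.
        p.put("body contents", Pattern.compile("<body\\b[^>]*>(.*?)</body>", Pattern.DOTALL));
        p.put("div contents", Pattern.compile("<div\\b[^>]*>(.*?)</div>", Pattern.DOTALL));
        // Assumption: "divs that make menus" means divs with class="menuItem".
        p.put("menu divs", Pattern.compile(
                "<div\\b[^>]*\\bclass=\"menuItem\"[^>]*>.*?</div>", Pattern.DOTALL));
        return p;
    }

    // Counts how many times the pattern is found in the input.
    static int countMatches(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the contents of basic.html.
        String html = "<html><body>\n"
                + "<div class=\"menuItem\">Home</div>\n"
                + "<div><span class=\"emph\">Hi</span></div>\n"
                + "</body></html>";
        for (Map.Entry<String, Pattern> e : patterns().entrySet()) {
            Matcher m = e.getValue().matcher(html);
            while (m.find()) {
                System.out.println(e.getKey() + ": \"" + m.group()
                        + "\" start=" + m.start() + " end=" + m.end());
            }
        }
    }
}
```

The reluctant `.*?` ends each "contents" match at the first closing tag, so nested divs would be cut short; regular expressions are not a general HTML parser, and this sketch only suits simple, flat markup.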
I have started my code as follows:
import java.io.IOException;
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class RegexHTML {
   public static void main(String[] args) throws IOException {
      // Input for matching the regex pattern
      String file_name = "basic.html";
      ReadFile file = new ReadFile(file_name);
      String[] aryLines = file.OpenFile();
      String asString = Arrays.toString(aryLines);
      // Regex to be matched
      String regexe = "<div>";
      int i;
      for ( i=0; i < aryLines.length; i++ ) {
         System.out.println( aryLines[ i ] ) ;
      }
      // Step 1: Allocate a Pattern object to compile a regex
      Pattern pattern = Pattern.compile(regexe);
      //Pattern pattern = Pattern.compile(regexe, Pattern.CASE_INSENSITIVE);  // case-insensitive matching
      // Step 2: Allocate a Matcher object from the compiled regex pattern,
      //         and provide the input to the Matcher
      Matcher matcher = pattern.matcher(asString);
      // Step 3: Perform the matching and process the matching result
      int count = 0;
      // Use method find()
      while (matcher.find()) {     // find the next match
         System.out.println("find() found the pattern \"" + matcher.group()
               + "\" starting at index " + matcher.start()
               + " and ending at index " + matcher.end());
         count++;
      }
      System.out.println("\nFound the pattern "+count+ " times.\n");
      // Use method matches()
      if (matcher.matches()) {
         System.out.println("matches() found the pattern \"" + matcher.group()
               + "\" starting at index " + matcher.start()
               + " and ending at index " + matcher.end());
      } else {
         System.out.println("matches() found nothing");
      }
      // Use method lookingAt()
      if (matcher.lookingAt()) {
         System.out.println("lookingAt() found the pattern \"" + matcher.group()
               + "\" starting at index " + matcher.start()
               + " and ending at index " + matcher.end());
      } else {
         System.out.println("lookingAt() found nothing");
      }
   }
}
My biggest problem is how to display all of those occurrences. My code so far only gives me the index values for the div tag, but I would like every string listed above to be reported in the output. My second problem is how to display the line on which each string occurs. I haven't researched this yet, as I'm focusing on the first question at the moment, but if you could give me a hint on where to get started with this one too, I would appreciate it.
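One possible starting point for the line-number part, offered as a sketch rather than a definitive solution: join the lines with "\n" instead of Arrays.toString (which adds brackets and ", " separators and so shifts the indexes), then derive the line of each match by counting the newlines before matcher.start(). The class name `LineNumberSketch`, the helpers `lineOf` and `report`, and the sample lines are hypothetical and stand in for whatever ReadFile returns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineNumberSketch {

    // 1-based line number of a character offset: count the '\n' characters before it.
    static int lineOf(String text, int offset) {
        int line = 1;
        for (int i = 0; i < offset; i++) {
            if (text.charAt(i) == '\n') {
                line++;
            }
        }
        return line;
    }

    // Builds one report row per match, with its line number and start/end indexes.
    static List<String> report(String label, Pattern pattern, String text) {
        List<String> rows = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            rows.add(label + " at line " + lineOf(text, m.start())
                    + ", start index " + m.start() + ", end index " + m.end());
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the String[] read from basic.html.
        String[] lines = {
            "<body>",
            "<div class=\"menuItem\">Home</div>",
            "<span class=\"emph\">Hi</span>",
            "</body>"
        };
        // Joining with '\n' keeps one newline per line, so offsets map back to lines.
        String text = String.join("\n", lines);

        for (String row : report("div tag", Pattern.compile("<div\\b[^>]*>"), text)) {
            System.out.println(row);
        }
        for (String row : report("span tag", Pattern.compile("<span\\b[^>]*>"), text)) {
            System.out.println(row);
        }
    }
}
```

Matching against the whole text rather than line by line also lets patterns such as the body contents span several lines, while the reported line is the one where the match starts.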