I'm new to web scraping, and I've been using Selenium for this particular project. In this example, I'm crawling through the listings on a website and they are structured as follows...
Listing 1:
<html>
     <div class="div_class">
          <i class="first_i_class" style="i_style"> ::before </i>
          First Category: 
          <span class="span_class">5</span>
          <br>
          <i class="second_i_class" style="i_style"> ::before </i>
          Second Category: 
          <span class="span_class">3</span>
          <br>
     </div>
</html>
As you can see, the values for the first and second categories sit in identical spans (same tag, same class), so finding all the spans and then applying a regex won't tell them apart. I need to get the text (5 and 3, in this example) based on the preceding text, in this case "First Category: " or "Second Category: ". Some listings, however, skip certain categories and look like this...
Listing 2:
<html>
     <div class="div_class">
          <i class="third_i_class" style="i_style"> ::before </i>
          Third Category: 
          <span class="span_class">7</span>
          <br>
     </div>
</html>
Because the categories change between listings, I don't think I can use something like:
cat_2_value = browser.find_element_by_xpath("/html/div/span[2][@class='span_class']")
because the XPath will also change. Is there a way to find the text in a given span based on either
- The preceding text (like "First Category: ") or
- The class of the preceding <i> (like "first_i_class")?
Any help or clarifying questions are much appreciated!
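For reference, here is a minimal sketch of what the two lookups could look like as relative XPath expressions, assuming the markup is exactly as in the listings above. The helper names (xpath_by_icon_class, xpath_by_label, get_category_value) are hypothetical and only for illustration. The first expression starts at the <i> with a given class and takes its nearest following <span> sibling; the second picks the <span> whose nearest preceding text node matches the label. Using find_elements (plural) returns an empty list when a listing skips a category, so nothing is raised in that case.

```python
from typing import Optional


def xpath_by_icon_class(icon_class: str) -> str:
    """XPath for the first span_class <span> that follows the <i> with this class."""
    return (
        f"//i[@class='{icon_class}']"
        "/following-sibling::span[@class='span_class'][1]"
    )


def xpath_by_label(label: str) -> str:
    """XPath for the span_class <span> whose nearest preceding text node is the label.

    Assumes the label contains no single quotes.
    """
    return (
        "//span[@class='span_class']"
        f"[normalize-space(preceding-sibling::text()[1])='{label.strip()}']"
    )


def get_category_value(browser, xpath: str) -> Optional[str]:
    """Return the text of the first match, or None if the listing lacks the category.

    `browser` is assumed to be a Selenium WebDriver; "xpath" is the string
    value of selenium.webdriver.common.by.By.XPATH.
    """
    matches = browser.find_elements("xpath", xpath)
    return matches[0].text if matches else None
```

With a live driver this might be called as get_category_value(browser, xpath_by_label("First Category: ")) or get_category_value(browser, xpath_by_icon_class("second_i_class")). On Listing 2 both calls would return None, since neither category is present.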
tag? I think that would include both the category and the value? But I'm not sure if there is an easier way. – DRo Jun 29 '20 at 10:04