I frequently need the list of CVEs published on a vendor's security bulletin page. Sometimes they're easy to copy out, but often they're mixed in with a lot of other text.
I haven't touched Python in a good while, so I thought this would be a great exercise in figuring out how to extract that info, especially since I keep finding myself doing it manually.
Here's my current code:
#!/usr/bin/env python3
# REQUIREMENTS
#   python3
#   BeautifulSoup 4 (pip3 install beautifulsoup4)
#   Python 3 certificates on macOS (run "/Applications/Python 3.x/Install Certificates.command") <-- this one took me forever to figure out!
import sys
if sys.version_info[0] < 3:
    raise Exception("Use Python 3:  python3 " + sys.argv[0])
from urllib.request import urlopen
from bs4 import BeautifulSoup
import re
#specify/get the url to scrape
#url ='https://chromereleases.googleblog.com/2020/02/stable-channel-update-for-desktop.html'
#url = 'https://source.android.com/security/bulletin/2020-02-01.html'
url = input("What is the URL?  ") or 'https://chromereleases.googleblog.com/2020/02/stable-channel-update-for-desktop.html'
print("Checking URL: " + url)
# CVE regular expression (raw string, so "\d" reaches the regex engine untouched)
cve_pattern = r'CVE-\d{4}-\d{4,7}'
# query the website and return the html
page = urlopen(url).read()
# parse the html returned using beautiful soup
soup = BeautifulSoup(page, 'html.parser')
count = 0
############################################################
# ANDROID === search for CVE references within <td> tags ===
# find all <td> tags
all_tds = soup.find_all("td")
#print(all_tds)
for td in all_tds:
    if "cve" in td.text.lower():
        print(td.text)
############################################################
# CHROME === search for CVE reference within <span> tags ===
# find all <span> tags
all_spans = soup.find_all("span")
for span in all_spans:
    # this code returns results in triplicate
    for i in re.finditer(cve_pattern, span.text):
        count += 1
        print(count, i.group())
    # this code works, but only returns the first match
#   match = re.search(cve_pattern,span.text)
#   if match:
#       print(match.group(0))
The Android URL works fine; the problem is the Chrome URL. That page has the CVE info inside <span> tags, and I'm trying to use regular expressions to pull it out.
Using the re.finditer approach, I end up with results in triplicate.
Using the re.search approach, it misses CVE-2019-19925, because two CVEs are listed on that same line and re.search only returns the first match.
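Both symptoms can be reproduced on a single string. The minimal sketch below uses made-up sample text (not copied from the Chrome page; CVE-2020-1111 is a placeholder). `re.search` stops at the first match, `re.findall` returns every non-overlapping match, and `dict.fromkeys` drops repeated IDs while keeping first-seen order. That last step would hide triplicates, though it does not remove their cause.

```python
import re

# Same pattern as in the script above, as a raw string
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,7}")

# Made-up sample text with two CVEs on one line (placeholder IDs)
line = "Fixed: CVE-2020-1111 and CVE-2019-19925 in the same entry"

# re.search returns only the first match
first = CVE_PATTERN.search(line).group(0)

# re.findall returns every non-overlapping match, in order
every = CVE_PATTERN.findall(line)

# Simulate each CVE being reported three times (e.g. once per enclosing tag);
# dict.fromkeys keeps one copy of each, in first-seen order (Python 3.7+)
repeated = every * 3
unique = list(dict.fromkeys(repeated))

print(first)   # CVE-2020-1111
print(every)   # ['CVE-2020-1111', 'CVE-2019-19925']
print(unique)  # ['CVE-2020-1111', 'CVE-2019-19925']
```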
Can you offer any advice on the best way to get this working?
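A likely cause of the triplicates is nested <span> elements. BeautifulSoup's `.text` on a tag includes the text of all its descendants, so a CVE inside a span nested three deep would be matched once for each enclosing span. This is an assumption about the Chrome page's markup, not something verified here. One way around it is to scan each text node exactly once instead of each element. With BeautifulSoup, that is roughly `re.findall(cve_pattern, soup.get_text())`. The sketch below swaps in the standard library's `html.parser.HTMLParser` so that it runs without third-party packages. `CveCollector` and `extract_cves` are hypothetical names chosen for illustration.

```python
import re
from html.parser import HTMLParser

CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,7}")


class CveCollector(HTMLParser):
    """Collect CVE IDs from the text nodes of an HTML document.

    handle_data() is called for runs of text, not once per enclosing
    element, so nested <span>s do not cause the same text to be scanned
    repeatedly. A CVE split across two tags would not be matched.
    """

    def __init__(self):
        super().__init__()
        self.found = {}  # dict used as an insertion-ordered set

    def handle_data(self, data):
        for cve in CVE_PATTERN.findall(data):
            self.found.setdefault(cve, None)


def extract_cves(html):
    """Return unique CVE IDs in the order they first appear in `html`."""
    parser = CveCollector()
    parser.feed(html)
    parser.close()
    return list(parser.found)


# Made-up markup: three nested spans plus a table cell (placeholder IDs)
sample = (
    "<div><span><span><span>CVE-2020-1111 and CVE-2019-19925</span></span></span>"
    "<td>CVE-2020-2222</td></div>"
)
print(extract_cves(sample))  # ['CVE-2020-1111', 'CVE-2019-19925', 'CVE-2020-2222']
```

Because it scans all text rather than specific tags, the same function would cover both the <td> layout (Android) and the <span> layout (Chrome). `urlopen(url).read()` returns bytes, so the page would need decoding first, e.g. `extract_cves(page.decode("utf-8", errors="replace"))`, assuming the page is UTF-8.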