I am writing a regex to grab the data between double quotes. The only issue I am running into is that the closing " is being captured. Regex:
  import re
  line = '<DT><A HREF="https://cheatsheetseries.owasp.org/cheatsheets/Clickjacking_Defense_Cheat_Sheet.html" ADD_DATE="1567455957">Clickjacking Defense · OWASP Cheat Sheet Series</A>'
  # The lookbehind skips HREF=", but the trailing " is still consumed as part of the match.
  capture_regex = re.compile(r'(?<=HREF=").*?"', re.IGNORECASE)
  m = capture_regex.search(line)
m.group() prints https://cheatsheetseries.owasp.org/cheatsheets/Clickjacking_Defense_Cheat_Sheet.html" (with the trailing quotation mark). How can I write the regex so that the match does not include the last quotation mark?
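Answered my own question. The pattern was already non-greedy (the ? after * makes .* stop at the first "), but the closing " was still part of the match. I replaced it with a lookahead, (?="), which checks that a quote follows without including it in the match:
  capture_regex = re.compile(r'(?<=HREF=").*?(?=")', re.IGNORECASE)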
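For anyone comparing approaches, here is a minimal sketch that runs the lookahead pattern next to a capturing-group alternative on the same line. The names lookaround_regex, group_regex, url_lookaround and url_group are only illustrative and are not from the original code. The capturing-group version is a common alternative, not necessarily a better one: it matches the quotes but returns only the text inside the parentheses.

```python
import re

line = '<DT><A HREF="https://cheatsheetseries.owasp.org/cheatsheets/Clickjacking_Defense_Cheat_Sheet.html" ADD_DATE="1567455957">Clickjacking Defense · OWASP Cheat Sheet Series</A>'

# Lookbehind + lookahead: both quotes are asserted but never part of the match.
lookaround_regex = re.compile(r'(?<=HREF=").*?(?=")', re.IGNORECASE)

# Alternative (illustrative): match the quotes, capture only what sits between them.
group_regex = re.compile(r'HREF="([^"]*)"', re.IGNORECASE)

url_lookaround = lookaround_regex.search(line).group()   # whole match
url_group = group_regex.search(line).group(1)            # first capture group

print(url_lookaround)
print(url_group)
```

Both calls should return the URL without the closing quotation mark. Note that search() returns None when nothing matches, so real code would want to check the result before calling group().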