While reading the book Web Scraping with Python, I was confused by this regular expression:
webpage_regex = re.compile('<a[^>]+href=["\'](.*?)["\']', re.IGNORECASE)
And a link usually looks like:
<a href="/view/Afghanistan-1">
My confusion is twofold:

First, since [^>] means "any character except >", why is it followed by a +? This + seems useless.

Second, in (.*?), since * already means "repeat 0 or more times", why does it need a ? after it to repeat the * again?
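
To make the two questions concrete, here is a small experiment. It is only a sketch using Python's standard re module: the sample HTML string and the variable names are made up for illustration and are not from the book. It compares the book's pattern with one variant that drops the + and one that drops the ?.

```python
import re

# Hypothetical sample input: the first link has an extra attribute before href.
html = ('<a class="nav" href="/view/Afghanistan-1">Afghanistan</a> '
        '<a href="/view/Albania-2">Albania</a>')

# The pattern exactly as given in the book.
original = re.compile('<a[^>]+href=["\'](.*?)["\']', re.IGNORECASE)

# Variant without the +: exactly one non-> character allowed before href.
no_plus = re.compile('<a[^>]href=["\'](.*?)["\']', re.IGNORECASE)

# Variant without the ?: the * is greedy and grabs as much as it can.
greedy = re.compile('<a[^>]+href=["\'](.*)["\']', re.IGNORECASE)

print(original.findall(html))
# ['/view/Afghanistan-1', '/view/Albania-2']

print(no_plus.findall(html))
# ['/view/Albania-2']

print(greedy.findall(html))
# ['/view/Afghanistan-1">Afghanistan</a> <a href="/view/Albania-2']
```

On this sample, the variant without the + misses the first link, because that tag has a class attribute between <a and href. The variant without the ? returns one long string that runs from the first URL to the last quote on the line, which suggests the ? changes how much the * matches rather than repeating it.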