How to gather links from "View More Campaigns" using Python 3?
I want to gather all 260604 links from this page: https://www.gofundme.com/mvc.php?route=category&term=sport
            Viewed 601 times
        
    3
            
            
- *always* use a generic [python] tag, if only to get more eyeballs on the question – juanpa.arrivillaga Nov 22 '17 at 19:57
2 Answers
2
            
            
        When clicking on the View More Campaigns button, the browser requests the following URL:
https://www.gofundme.com/mvc.php?route=category/loadMoreTiles&page=2&term=sport&country=GB&initialTerm=
This could be used to request further pages as follows:
from bs4 import BeautifulSoup
import requests

url = 'https://www.gofundme.com/mvc.php?route=category/loadMoreTiles&page={}&term=sport&country=GB&initialTerm='
page = 1
links = set()   # a set discards duplicate links
length = 0

while True:
    print("Page {}".format(page))
    gofundme = requests.get(url.format(page))
    soup = BeautifulSoup(gofundme.content, "html.parser")
    links.update(a['href'] for a in soup.find_all('a', href=True))

    # Stop when no new links are found
    if len(links) == length:
        break

    length = len(links)
    page += 1

for link in sorted(links):
    print(link)
This gives output starting like:
https://www.gofundme.com/100-round-kumite-rundraiser
https://www.gofundme.com/10k-challenge-for-disabled-sports
https://www.gofundme.com/1yeti0
https://www.gofundme.com/2-marathons-1-month
https://www.gofundme.com/23yq67t4
https://www.gofundme.com/2fwyuwvg
Some of the returned links are duplicates, so a set is used to discard them.
The script keeps requesting new pages until a page yields no new links, which appears to happen at around 18 pages.
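The stopping rule described above can be pulled out into a small function that is easy to try without hitting the site. This is only a sketch under some assumptions: `build_url`, `extract_links`, and `collect_links` are hypothetical helper names, `fetch_page` is any callable you supply that takes a URL and returns HTML text, `max_pages` is an assumed safety cap, and the standard library's `HTMLParser` is swapped in for BeautifulSoup so the example has no third-party dependencies.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

BASE_URL = "https://www.gofundme.com/mvc.php"


def build_url(page, term="sport", country="GB"):
    """Rebuild the loadMoreTiles URL for a given page number."""
    query = {
        "route": "category/loadMoreTiles",
        "page": page,
        "term": term,
        "country": country,
        "initialTerm": "",
    }
    # safe="/" keeps the slash in the route value unescaped, as in the observed URL.
    return BASE_URL + "?" + urlencode(query, safe="/")


class HrefParser(HTMLParser):
    """Collect the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html):
    """Return all non-empty hrefs found in the given HTML text."""
    parser = HrefParser()
    parser.feed(html)
    return parser.hrefs


def collect_links(fetch_page, max_pages=50):
    """Request pages 1, 2, ... until a page adds no new links.

    fetch_page is a hypothetical callable (URL -> HTML text);
    max_pages is an assumed safety cap, not a limit set by the site.
    """
    links = set()
    for page in range(1, max_pages + 1):
        before = len(links)
        links.update(extract_links(fetch_page(build_url(page))))
        if len(links) == before:
            break
    return links
```

With requests installed, something like `collect_links(lambda url: requests.get(url, timeout=10).text)` should behave much like the script above, with the cap guarding against an endless loop if the site keeps returning new links.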
 
    
    
        Martin Evans
        
1
            
            
From the question "retrieve links from web page using python and BeautifulSoup":
import httplib2
from bs4 import BeautifulSoup, SoupStrainer

http = httplib2.Http()
status, response = http.request('https://www.gofundme.com/mvc.php?route=category&term=sport')

# Parse only the <a> tags; bs4 names this argument parse_only
# (parseOnlyThese is the old BeautifulSoup 3 spelling).
for link in BeautifulSoup(response, 'html.parser', parse_only=SoupStrainer('a')):
    if link.has_attr('href'):
        print(link['href'])
 
    
    
        whackamadoodle3000
        
- This won't gather all the fundraising campaign links the OP wants, only the campaigns that are initially on the page. – hoefling Nov 22 '17 at 20:49
