Hello fellow coders :)
As part of my research project I need to scrape data from a website. The site detects bots, so I am trying to add proxies to a loop that I know works (it collects the brand URLs):
The working loop:
import requests
from bs4 import BeautifulSoup

brands_links = []
for country_link in country_links:
    r = requests.get(url + country_link, headers=headers)
    soup_b = BeautifulSoup(r.text, "lxml")
    # use a separate name for the outer loop so the <a> tags don't shadow the divs
    for div in soup_b.find_all("div", class_="designerlist cell small-6 large-4"):
        for link in div.find_all("a"):
            durl = link.get("href")
            brands_links.append(durl)
The loop using proxies:
import random  # proxies, headers and url are defined as before

brands_links = []
i = 0
while i < len(country_links):
    print(i)
    try:
        proxy_index = random.randint(0, len(proxies) - 1)
        proxy = {"http": proxies[proxy_index], "https": proxies[proxy_index]}
        r = requests.get(url + country_links[i], headers=headers, proxies=proxy, timeout=10)
        soup_b = BeautifulSoup(r.text, "lxml")
        for div in soup_b.find_all("div", class_="designerlist cell small-6 large-4"):
            for link in div.find_all("a"):
                durl = link.get("href")
                brands_links.append(durl)

        if durl is not None:
            print("scraping happening")
            i += 1
        else:
            continue

    except:
        print("proxy not working")
        proxies.remove(proxies[proxy_index])

    if i == len(country_links):
        break
    else:
        continue
Unfortunately it does not scrape all the links.
With the working loop, using only headers, I get a list of length 3788; with the proxy version I only get 2387.
By inspecting the data I can see that it skips some country links, hence the difference in length. I tried to force the loop to scrape every link with the "if" statement, but it does not seem to work.
Does anyone know what I am doing wrong, or have an idea that would make it scrape everything? Thanks in advance.
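To clarify what I am trying to achieve, here is a minimal, self-contained sketch of the retry logic I have in mind. The `fetch` callback stands in for my real `requests.get` + BeautifulSoup code, and `scrape_all` and its arguments are illustrative names, not from my actual script:

```python
import random

def scrape_all(items, proxies, fetch):
    """Collect links for every item, rotating through proxies.

    fetch(item, proxy) should return a list of links, or raise when the
    proxy is dead.  The index only advances after a successful scrape,
    so no item is silently skipped; a proxy that errors out or returns
    an empty page is dropped and the same item is retried.
    """
    results = []
    i = 0
    while i < len(items) and proxies:
        proxy = random.choice(proxies)
        links = None  # reset each pass so stale data can't advance the index
        try:
            links = fetch(items[i], proxy)
        except Exception:
            links = None
        if links:
            results.extend(links)
            i += 1  # advance only after a real scrape
        else:
            proxies.remove(proxy)  # dead or blocked proxy: retry same item
    return results
```

The important differences from my loop above are that `links` is reset on every pass, and the index only advances after a successful scrape, so a blocked or empty response can never cause a country to be skipped.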