I have written some code that helps me scrape websites. It has worked well on some sites, but I am currently running into an issue.
The collectData() function collects data from a site and appends it to dataList. From this dataList I then create a CSV file to export the data.
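For context, the export step looks roughly like this (a minimal sketch; 'output.csv' and the column name are just placeholders, not my real file):

import csv

def exportCsv(dataList):
    # Write each collected value as one row of a single-column CSV
    with open('output.csv', 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['kaufpreis'])  # header (placeholder name)
        writer.writerows([[value] for value in dataList])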
The issue I am having right now is that the function appends multiple whitespace and \n characters to my list. The output looks like this (the run of extra spaces is not fully shown here):
dataList = ['\n 2.500.000 ']
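To make this easier to reproduce, here is a minimal sketch that produces the same kind of value (the HTML snippet is made up; I do not know what the real pages contain):

from bs4 import BeautifulSoup

html = '<h2>\n          2.500.000 </h2>'
soup = BeautifulSoup(html, 'lxml')
# .text returns the tag's text exactly as it appears in the HTML,
# including any leading newline and surrounding spaces
print(repr(soup.find('h2').text))  # -> '\n          2.500.000 '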
Does anyone know what could cause this? As I mentioned, there are some websites where the function works fine.
Thank you!

My code:
from urllib.request import urlopen
from bs4 import BeautifulSoup


def scrape():
    dataList = []
    pageNr = range(0, 1)
    for page in pageNr:
        pageUrl = 'https://www.example.com/site:{}'.format(page)
        print(pageUrl)

        def getUrl(pageUrl):
            # Collect the href of every <a class="ellipsis"> on the page
            openUrl = urlopen(pageUrl)
            soup = BeautifulSoup(openUrl, 'lxml')
            links = soup.find_all('a', class_="ellipsis")
            linkList = []
            for link in links:
                linkNew = link.get('href')
                linkList.append(linkNew)
            return linkList

        anzList = getUrl(pageUrl)
        length = len(anzList)
        print(length)

        # Build absolute URLs for every listing
        anzLinks = []
        for i in range(length):
            anzLinks.append('https://www.example.com' + anzList[i])
        print(anzLinks)

        def collectData():
            for link in anzLinks:
                openAnz = urlopen(link)
                soup = BeautifulSoup(openAnz, 'lxml')
                try:
                    kaufpreisSuche = soup.find('h2')
                    kaufpreis = kaufpreisSuche.text
                    dataList.append(kaufpreis)
                    print(kaufpreis)
                except AttributeError:
                    # No <h2> found on the page
                    kaufpreis = None
                    dataList.append(kaufpreis)

        collectData()

    return dataList