Alright, so this is the code for a web scraper I've built. Right now it scrapes everything that I've selected with soup. But when I view the source of the page, the data includes <br> tags, which are line breaks.
When I scrape and save everything to the file, these tags are dropped, so all the data ends up on one line without the <br> tags. I want a <br> after each item written to the file, as follows:
Data<br>Data<br>Data<br>Data<br>
And not:
DataDataDataDataData
Is there any way to modify my current code to do this? I think it's the g = item.text.encode('utf-8') line that removes the <br>. I would be happy if I could keep the <br> in the output, because then I can just regex it.
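try:
    # soup is the BeautifulSoup object built earlier in the script
    t_data = soup.find_all("div", {"class": "blockrow restore"})
    for item in t_data:
        f = open('test.txt', 'w')
        g = item.text.encode('utf-8')  # .text returns only the text, so <br> tags are dropped
        f.write(g)
        f.close()
finally:
    pass  # rest of the script not shown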
Thanks.
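One possible way to keep the markers, sketched under assumptions: since .text returns only the text nodes, each <br> tag can be swapped for the literal string "<br>" before the text is read. The sample HTML and the helper names rows_with_br and save_rows below are hypothetical (the real page markup is not shown here), and the sketch is written for Python 3, so it may need adjusting for the actual page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page's HTML is not shown in the question.
SAMPLE_HTML = '<div class="blockrow restore">Data<br>Data<br>Data<br></div>'


def rows_with_br(html):
    """Return the text of each matching div with literal "<br>" markers kept."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.find_all("div", {"class": "blockrow restore"}):
        # Swap every <br> tag for the plain string "<br>" so get_text() keeps it.
        for br in item.find_all("br"):
            br.replace_with("<br>")
        rows.append(item.get_text())
    return rows


def save_rows(rows, path):
    """Write all rows to one file, opened once so earlier rows are not overwritten."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(row)
```

Calling save_rows(rows_with_br(SAMPLE_HTML), 'test.txt') would write the rows with their <br> markers intact. Opening the file once, outside the loop, also avoids the 'w' mode truncating the file on every iteration.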
tags within them? – Jon Winsley Nov 28 '16 at 19:39
Data
Data
Data
Data
Data
Data
Data
Data
Data
Data
Data
Data
Data
Data
Data
Data
Data
Data
Data
Data
Data
Data
The output becomes: DataDataDataDataDataDataDataDataData
instead of:
Data
Data
Data
Data
Data
– alexanderjoe Nov 28 '16 at 19:43
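If the goal is the one-value-per-line output described in the comment above rather than literal <br> markers, a minimal sketch could use get_text() with a newline separator. The sample HTML and the function name rows_one_per_line are assumptions made for illustration, not taken from the actual page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page.
SAMPLE_HTML = '<div class="blockrow restore">Data<br>Data<br>Data</div>'


def rows_one_per_line(html):
    """Return each matching div's text with its text pieces joined by newlines."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        # separator is placed between text nodes; strip=True trims each piece.
        item.get_text(separator="\n", strip=True)
        for item in soup.find_all("div", {"class": "blockrow restore"})
    ]
```

Note that the separator is inserted between every pair of text nodes, not only where a <br> sits, so text split by other inline tags would also end up on separate lines.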