
Sorry if the title is confusing, but I have researched this for two hours and have no idea how to phrase this question, so anyone feel free to edit this post.

I have a string variable that I created by web scraping. It contains special characters, and when I print it, it looks something like \ud83d\ude00\u0107\u00e7 \n hello (without the quotation marks). I want it to print the actual special characters, but I am not sure which encoding method to use. If I copy and paste the exact string into a string literal and print it, it works fine. The variable I created, however, shows the escape sequences as plain text instead of the special characters.

I have tried converting it to a string and using json.load, unicode-escape, UTF-8, and a bunch of others, but I am honestly not sure which method I should use.

import requests
from bs4 import BeautifulSoup

page = requests.get('https://www.example.com')
soup = str(BeautifulSoup(page.text, 'html.parser')).splitlines()

for line in soup:
    if 'hello' in line:
        print(line)  # prints the escapes as literal text: \ud83d\ude00\u0107\u00e7 \n hello

print('\ud83d\ude00\u0107\u00e7 \n hello')  # produces the wanted result

I would like the outcome to look like this:

😀ćç

hello

Brandalf
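A minimal offline reproduction may make the difference visible. It assumes the scraped line holds each escape sequence as ordinary characters (a backslash, a `u`, four hex digits), which matches the output described above. The variable names are hypothetical, and the emoji pair is left out so the printed output stays encodable.

```python
# Hypothetical stand-in for a scraped line: the r-prefix keeps every backslash
# as an ordinary character, like the text that comes back from the web page.
scraped = r"\u0107\u00e7 \n hello"

# The same sequence typed as a normal string literal: Python's parser turns
# each escape into the single character it names.
typed = "\u0107\u00e7 \n hello"

print(len(scraped), len(typed))  # 21 10
print(repr(scraped))             # backslashes show up doubled
print(repr(typed))               # the real characters show up
```

The lengths differ because `\u0107` in the scraped text is six characters, while in the typed literal it is the single character `ć`.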

2 Answers


Through another hour of trial and error, I figured out this was the answer:

line.encode('utf-8').decode('unicode-escape')

Brandalf
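A sketch that chains this answer with the UTF-16 round trip from the other answer, assuming the scraped line contains only ASCII escape text. The helper name `decode_scraped` and the sample string are hypothetical stand-ins, not part of the original code.

```python
def decode_scraped(text: str) -> str:
    r"""Turn backslash escapes such as \u0107 and \n into real characters.

    Hypothetical helper; the body chains the two answers on this page.
    """
    # Step 1 (this answer): interpret \uXXXX, \n, etc. as escape sequences.
    # Assumes the input is plain ASCII, as the scraped line is here.
    unescaped = text.encode('utf-8').decode('unicode-escape')
    # Step 2 (other answer): each \uXXXX is decoded on its own, so the pair
    # \ud83d\ude00 becomes two lone surrogates. Round-tripping through UTF-16
    # with 'surrogatepass' merges a valid pair into one code point (U+1F600).
    return unescaped.encode('utf-16', 'surrogatepass').decode('utf-16')


scraped = r"\ud83d\ude00\u0107\u00e7 \n hello"  # stand-in for a scraped line
result = decode_scraped(scraped)
print(result)
```

If a scraped line can also contain real non-ASCII characters (not just escapes), encoding it to UTF-8 in step 1 garbles them, because `unicode-escape` reads the bytes as Latin-1. A commonly used variant for that case is `text.encode('latin-1', 'backslashreplace').decode('unicode-escape')`. An unpaired surrogate would make the final UTF-16 decode raise `UnicodeDecodeError`.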

Given the string:

a = "\ud83d\ude00\u0107\u00e7 \n hello"

a.encode('utf-16', 'surrogatepass').decode('utf-16')

Output:

'😀ćç \n hello'
Rohit-Pandey
  • Thanks for the help, but it didn't work. I think the variable ends up in some kind of weird format when it gets web scraped. Remember, the actual string prints perfectly fine, but the variable I web scraped does not. – Brandalf Jun 17 '19 at 10:32
  • It actually produces the exact same result. Any other ideas? – Brandalf Jun 17 '19 at 11:09