The situation is: I'm scraping one website, the urls for the pages follow the pattern:
http://www.pageadress/somestuff/ID-HERE/
Nothing unusual. I have a lot of id's that i need to scrape and most of them work correctly. However, the page behaves in portal-like way. In browser, when you enter such address, you get redirected to:
http://www.pageadress/somestuff/ID-HERE-title_of_subpage
What might be problematic is that sometimes that title might contain non-ascii characters (approximately 0.01% of cases), therefore (i think that's the issue) i get the exception:
  File "/usr/lib/python3.4/urllib/request.py", line 161, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.4/urllib/request.py", line 469, in open
    response = meth(req, response)
  File "/usr/lib/python3.4/urllib/request.py", line 579, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.4/urllib/request.py", line 501, in error
    result = self._call_chain(*args)
  File "/usr/lib/python3.4/urllib/request.py", line 441, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.4/urllib/request.py", line 684, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib/python3.4/urllib/request.py", line 463, in open
    response = self._open(req, data)
  File "/usr/lib/python3.4/urllib/request.py", line 481, in _open
    '_open', req)
  File "/usr/lib/python3.4/urllib/request.py", line 441, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.4/urllib/request.py", line 1210, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "/usr/lib/python3.4/urllib/request.py", line 1182, in do_open
    h.request(req.get_method(), req.selector, req.data, headers)
  File "/usr/lib/python3.4/http/client.py", line 1088, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python3.4/http/client.py", line 1116, in _send_request
    self.putrequest(method, url, **skips)
  File "/usr/lib/python3.4/http/client.py", line 973, in putrequest
    self._output(request.encode('ascii'))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 38-39: ordinal not in range(128).
The bizarre thing is that no unicode characters in url i'm redirected to are actually on position 38-39, but there are on others.
The code being used:
import socket
import urllib.parse
import urllib.request
socket.setdefaulttimeout(30)
url = "https://www.bettingexpert.com/archive/tip/3207221"
headers = {'User-Agent': 'Mozilla/5.0'}
content = urllib.request.urlopen(urllib.request.Request(url, None, headers)).read().decode('utf-8')
Any method to get around it, preferably without using other libraries?
//Oh the glorious world of python, creating 1000s of problems i wouldn't even think were possible if i was writing in ruby.
