I am parsing HTML with Python (BeautifulSoup), and the page contains a date string: [ 24-Янв-17 07:24 ]. "Янв" is the Russian abbreviation for "Jan". I want to convert it into a datetime object.
import datetime
import locale

# Some BeautifulSoup parsing
timeData = data.find('div', {'id': 'time'}).text

# Switch LC_TIME so that %b should match Russian month names
locale.setlocale(locale.LC_TIME, 'ru_RU.UTF-8')
result = datetime.datetime.strptime(timeData, u'[ %d-%b-%y %H:%M ]')
The error is:
ValueError: time data '[ 24-\xd0\xaf\xd0\xbd\xd0\xb2-17 07:24 ]' does not match format '[ %d-%b-%y %H:%M ]'
type(timeData) returns unicode. Trying to decode timeData from UTF-8 raises UnicodeEncodeError. What's wrong?
chardet returns {'confidence': 0.87625, 'encoding': 'utf-8'}, and when I call datetime.datetime.strptime(timeData.encode('utf-8'), ...) I get the same error as above.
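One way to see what strptime is comparing against is to list the abbreviated month names that a given locale produces for %b. This is only a minimal diagnostic sketch: the helper name month_abbreviations is made up for illustration, and it assumes the requested locale may not be installed on the machine, in which case it returns None.

```python
import datetime
import locale


def month_abbreviations(locale_name):
    """Return the 12 '%b' month names for locale_name, or None if unavailable.

    Hypothetical helper for illustration; not part of any library.
    """
    saved = locale.setlocale(locale.LC_TIME)  # query the current setting
    try:
        locale.setlocale(locale.LC_TIME, locale_name)
    except locale.Error:
        return None  # the locale is not installed on this system
    try:
        # Format the first day of each month with %b under the new locale
        return [datetime.date(2001, m, 1).strftime('%b') for m in range(1, 13)]
    finally:
        locale.setlocale(locale.LC_TIME, saved)  # restore the previous setting
```

Calling repr() on month_abbreviations('ru_RU.UTF-8') and comparing the result with repr(timeData) shows whether the spelling, letter case, and string type of the month name actually agree.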
The original page uses windows-1251 encoding. I also tried encoding the string to cp1251:
print type(timeData)
print timeData
timeData = timeData.encode('cp1251')
print type(timeData)
print timeData
This prints:
<type 'unicode'>
[ 24-Янв-17 07:24 ]
<type 'str'>
[ 24-???-17 07:24 ]
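As a possible fallback that avoids the locale entirely, the month name can be translated to a number by hand before calling strptime with the locale-independent %m directive. This is a minimal sketch, not a diagnosis of the error above: the RU_MONTHS table and the parse_ru_timestamp name are made up for illustration, and the three-letter spellings in the table are assumed and would need checking against the real page.

```python
# -*- coding: utf-8 -*-
import datetime
import re

# Hypothetical lookup table; the abbreviations are assumed spellings and
# should be adjusted to whatever the page actually uses.
RU_MONTHS = {
    u'янв': 1, u'фев': 2, u'мар': 3, u'апр': 4, u'май': 5, u'июн': 6,
    u'июл': 7, u'авг': 8, u'сен': 9, u'окт': 10, u'ноя': 11, u'дек': 12,
}

# Matches strings shaped like "[ 24-Янв-17 07:24 ]"
PATTERN = re.compile(
    u'\\[\\s*(\\d{1,2})-([^-\\s]+)-(\\d{2})\\s+(\\d{1,2}):(\\d{2})\\s*\\]',
    re.UNICODE,
)


def parse_ru_timestamp(text):
    """Parse a unicode timestamp with a Russian month abbreviation."""
    match = PATTERN.search(text)
    if match is None:
        raise ValueError(u'unexpected timestamp format: %r' % (text,))
    day, month_name, year, hour, minute = match.groups()
    try:
        month = RU_MONTHS[month_name.lower()]
    except KeyError:
        raise ValueError(u'unknown month abbreviation: %r' % (month_name,))
    # Rebuild an all-numeric string so strptime needs no locale support
    numeric = u'%s-%02d-%s %s:%s' % (day, month, year, hour, minute)
    return datetime.datetime.strptime(numeric, '%d-%m-%y %H:%M')
```

With the unicode value from BeautifulSoup this would be called as parse_ru_timestamp(timeData), with no setlocale call involved.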