There are multiple parts of Python's functionality involved here: reading the source code and parsing the string literals, transcoding, and printing. Each has its own conventions.
Short answer:
- For the purpose of code parsing:
  - str (Py2) -- not applicable, raw bytes from the file are taken
  - unicode (Py2) / str (Py3) -- "source encoding", defaults are ascii (Py2) and utf-8 (Py3)
  - bytes (Py3) -- none, non-ASCII characters are prohibited in the literal
- For the purpose of transcoding:
  - both (Py2) -- sys.getdefaultencoding() (ascii almost always)
    - there are implicit conversions which often result in a UnicodeDecodeError/UnicodeEncodeError
  - both (Py3) -- none, must specify encoding explicitly when converting
- For the purpose of I/O:
  - unicode (Py2) -- <file>.encoding if set, otherwise sys.getdefaultencoding()
  - str (Py2) -- not applicable, raw bytes are written
  - str (Py3) -- <file>.encoding, always set and defaults to locale.getpreferredencoding()
  - bytes (Py3) -- none, printing produces its repr() instead
 
First of all, some terminology clarification so that you understand the rest correctly. Decoding is translation from bytes to characters (Unicode or otherwise), and encoding (as a process) is the reverse. See "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" on Joel on Software to get the distinction.
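To make the two directions concrete, here is a minimal sketch in Python 3 syntax. The sample string and the choice of cp1251 and utf-8 are only illustrative.

```python
# Characters (a Python 3 str) -> bytes is *encoding*.
text = "абвгд"
as_cp1251 = text.encode("cp1251")
as_utf8 = text.encode("utf-8")

# The same characters give different bytes under different encodings.
print(len(as_cp1251))  # one byte per character in cp1251
print(len(as_utf8))    # two bytes per character in utf-8

# Bytes -> characters is *decoding*; it round-trips only with the matching encoding.
assert as_cp1251.decode("cp1251") == text
assert as_utf8.decode("utf-8") == text
```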
Now...
Reading the source and parsing string literals
At the start of a source file, you can specify the file's "source encoding" (its exact effect is described later). If not specified, the default is ascii for Python 2 and utf-8 for Python 3. A UTF-8 BOM has the same effect as a utf-8 encoding declaration.
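One way to see which source encoding Python 3 would pick for a given file is the standard library's tokenize.detect_encoding(). The sketch below feeds it in-memory bytes instead of a real file; the helper name declared_encoding is made up for this illustration.

```python
import codecs
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the source encoding that the standard tokenizer detects."""
    encoding, _consumed_lines = tokenize.detect_encoding(
        io.BytesIO(source_bytes).readline
    )
    return encoding


# An explicit declaration on the first line.
print(declared_encoding(b"# Encoding: cp1251\nx = 1\n"))
# No declaration: the Python 3 default applies.
print(declared_encoding(b"x = 1\n"))
# A UTF-8 BOM and no declaration: detected as a utf-8 variant.
print(declared_encoding(codecs.BOM_UTF8 + b"x = 1\n"))
```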
Python 2
Python 2 reads the source as raw bytes. It only uses the "source encoding" to parse a Unicode literal when it sees one. (It's more complicated than that under the hood, but this is the net effect.)
> type t.py
# Encoding: cp1251
s = "абвгд"
us = u"абвгд"
print repr(s), repr(us)
> py -2 t.py
'\xe0\xe1\xe2\xe3\xe4' u'\u0430\u0431\u0432\u0433\u0434'
<change encoding declaration in the file to cp866, do not change the contents>
> py -2 t.py
'\xe0\xe1\xe2\xe3\xe4' u'\u0440\u0441\u0442\u0443\u0444'
<transcode the file to utf-8, update declaration or replace with BOM>
> py -2 t.py
'\xd0\xb0\xd0\xb1\xd0\xb2\xd0\xb3\xd0\xb4' u'\u0430\u0431\u0432\u0433\u0434'
So, regular strings will contain the exact bytes that are in the file. And Unicode strings will contain the result of decoding the file's bytes with the "source encoding".
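The three runs above can be replayed in Python 3 terms, where a Py2 str corresponds to bytes and a Py2 unicode corresponds to str. This is only a sketch of the net effect, not of how the Python 2 parser works internally; the helper name parse_literal_py2_style is hypothetical.

```python
def parse_literal_py2_style(file_bytes, source_encoding):
    """Mimic the net effect of Python 2 literal parsing.

    Returns (what a regular literal holds, what a Unicode literal holds).
    """
    regular = file_bytes                              # raw bytes, untouched
    unicode_value = file_bytes.decode(source_encoding)  # decoded with the source encoding
    return regular, unicode_value


cp1251_bytes = "абвгд".encode("cp1251")  # the bytes as saved in the cp1251 file

# Same bytes, declaration cp1251 vs. cp866: only the Unicode literal changes.
print(parse_literal_py2_style(cp1251_bytes, "cp1251"))
print(parse_literal_py2_style(cp1251_bytes, "cp866"))

# File transcoded to utf-8: the bytes change, the Unicode literal is the original text again.
utf8_bytes = "абвгд".encode("utf-8")
print(parse_literal_py2_style(utf8_bytes, "utf-8"))
```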
If the decoding fails, you will get a SyntaxError. The same happens if there is a non-ASCII character in the file when no encoding is specified. Finally, if the unicode_literals future import is used, any regular string literals (in that file only) are treated as Unicode literals when parsing, with everything that implies.
Python 3
Python 3 decodes the entire source file with the "source encoding" into a sequence of Unicode characters. Any parsing is done after that. (In particular, this makes it possible to have Unicode in identifiers.) Since all string literals are now Unicode, no additional transcoding is needed. In byte literals, non-ASCII characters are prohibited (such bytes must be specified with escape sequences), evading the issue altogether.
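These rules can be checked without creating files by handing source text to the built-in compile(). A small sketch follows; the helper name compiles and the sample snippets are illustrative only.

```python
def compiles(source):
    """Return True if Python 3 accepts the source text, False on SyntaxError."""
    try:
        compile(source, "<demo>", "exec")
        return True
    except SyntaxError:
        return False


print(compiles('b = b"абв"'))            # non-ASCII inside a bytes literal: rejected
print(compiles(r'b = b"\xe0\xe1\xe2"'))  # the same bytes via escape sequences: accepted
print(compiles('привет = "абв"'))        # Unicode identifier and str literal: accepted
```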
Transcoding
As per the clarification at the start:
- str (Py2) / bytes (Py3) -- bytes => can only be decoded (directly, that is; details follow)
- unicode (Py2) / str (Py3) -- characters => can only be encoded
Python 2
In both cases, if the encoding is not specified, sys.getdefaultencoding() is used. It is ascii (unless you uncomment a code chunk in site.py, or do some other hacks which are a recipe for disaster). So, for the purpose of transcoding, sys.getdefaultencoding() is the "string's default encoding".
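Now, here's a caveat: Python 2 also performs these conversions implicitly, with that default encoding, whenever a str is used where a unicode is needed or vice versa (for example, when the two are mixed in one expression). With non-ASCII data, this is what often results in an unexpected UnicodeDecodeError/UnicodeEncodeError.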
Python 3
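There's no "default encoding" for implicit conversions any more, because implicit conversion between str and bytes is now prohibited.
- bytes can only be decoded and str can only be encoded, and this must be requested explicitly. (If the encoding argument of encode()/decode() is omitted, it is always utf-8, independent of the environment.)
- converting bytes -> str without an encoding (incl. implicitly) produces its repr() instead (which is only useful for debug printing), evading the encoding issue entirely
- converting str -> bytes without an encoding is prohibited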
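A short Python 3 session-style sketch of these rules; the sample values are arbitrary.

```python
data = "абв".encode("cp1251")   # explicit str -> bytes
text = data.decode("cp1251")    # explicit bytes -> str

# Mixing the two types never triggers a hidden conversion; it simply fails.
try:
    "abc" + b"def"
except TypeError as exc:
    print("mixing str and bytes:", exc)

# str() on bytes without an encoding gives the repr, not decoded text.
print(str(b"abc"))           # b'abc'
print(str(b"abc", "ascii"))  # abc

# bytes() on a str without an encoding is refused.
try:
    bytes("abc")
except TypeError as exc:
    print("bytes('abc'):", exc)
```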
Printing
This matter is unrelated to a variable's value but related to what you would see on the screen when it's printed -- and whether you will get a UnicodeEncodeError when printing.
Python 2
- A unicode is encoded with <file>.encoding if set; otherwise, it's implicitly converted to str as per the above. (The final third of the UnicodeEncodeError SO questions falls into this category.)
  - For standard streams, the stream's encoding is guessed at startup from various environment-specific sources, and can be overridden with the PYTHONIOENCODING environment variable.
- str's bytes are sent to the OS stream as-is. What specific glyphs you will see on the screen depends on your terminal's encoding settings (if it's something like UTF-8, you may see nothing at all if you print a byte sequence that is invalid UTF-8).
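Why a UTF-8 terminal may show nothing useful for such output can be illustrated by doing the terminal's decoding step by hand. This is a sketch in Python 3 syntax, assuming the terminal expects UTF-8 and the program wrote cp1251 bytes; what a real terminal displays for invalid input varies.

```python
raw = "абвгд".encode("cp1251")  # the bytes a Py2 str from a cp1251 source would hold

# A strict UTF-8 decoder rejects these bytes outright.
try:
    raw.decode("utf-8")
    is_valid_utf8 = True
except UnicodeDecodeError:
    is_valid_utf8 = False
print(is_valid_utf8)  # False

# A lenient consumer substitutes U+FFFD replacement characters instead.
shown = raw.decode("utf-8", errors="replace")
print("\ufffd" in shown)  # True
```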
Python 3
The changes are:
- Files opened in text vs. binary mode now natively accept str or bytes, respectively, and outright refuse to process the wrong type. Text-mode files always have an encoding set, locale.getpreferredencoding(False) being the default.
- print for text streams still implicitly converts everything to str, which in the case of bytes prints its repr() as per the above, evading the encoding issue altogether
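Both points can be tried with in-memory streams from the io module, which behave like files opened in binary and text mode. A minimal sketch follows; the cp1251 encoding is chosen only for illustration.

```python
import io

# A text stream wraps a binary one and encodes str with its .encoding.
raw = io.BytesIO()
text_stream = io.TextIOWrapper(raw, encoding="cp1251")
text_stream.write("абв")
text_stream.flush()
print(text_stream.encoding, raw.getvalue())

# Each kind of stream refuses the other type.
try:
    text_stream.write(b"abc")
except TypeError as exc:
    print("text stream:", exc)
try:
    raw.write("abc")
except TypeError as exc:
    print("binary stream:", exc)

# print() converts bytes with str(), so the repr is what gets written.
captured = io.StringIO()
print(b"\xe0\xe1", file=captured)
print(captured.getvalue())
```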