Following a related question, I'd like to ask about the new character and string literal types in C++11. It seems that we now have four sorts of characters and five sorts of string literals. The character types:
char     a =  '\x30';         // character, no semantics
wchar_t  b = L'\xFFEF';       // wide character, no semantics
char16_t c = u'\u00F6';       // 16-bit, assumed UTF16?
char32_t d = U'\U0010FFFF';   // 32-bit, assumed UCS-4
And the string literals:
char     A[] =  "Hello\x0A";         // byte string, "narrow encoding"
wchar_t  B[] = L"Hell\xF6\x0A";      // wide string, impl-def'd encoding
char16_t C[] = u"Hell\u00F6";        // (1)
char32_t D[] = U"Hell\U000000F6\U0010FFFF"; // (2)
auto     E[] = u8"\u00F6\U0010FFFF"; // (3)
The question is this: Are the \x/\u/\U character references freely combinable with all string types? Are all the string types fixed-width, i.e. the arrays contain precisely as many elements as appear in the literal, or to \x/\u/\U references get expanded into a variable number of bytes? Do u"" and u8"" strings have encoding semantics, e.g. can I say char16_t x[] = u"\U0010FFFF", and the non-BMP codepoint gets encoded into a two-unit UTF16 sequence? And similarly for u8? In (1), can I write lone surrogates with \u? Finally, are any of the string functions encoding aware (i.e. they are character-aware and can detect invalid byte sequences)?
This is a bit of an open-ended question, but I'd like to get as complete a picture as possible of the new UTF-encoding and type facilities of the new C++11.
 
     
    