I would like to understand how regular std::string and std::map operations deal with Unicode code units should they be present in the string.
Sample code:
    #include <iostream>
    #include <string>
    #include <sys/types.h>
    using namespace std;
    int main()
    {
        std::basic_string<u_int16_t> ustr1(std::basic_string<u_int16_t>((u_int16_t*)"ยฤขฃ", 4));
        std::basic_string<u_int16_t> ustr2(std::basic_string<u_int16_t>((u_int16_t*)"abcd", 4));
        for (int i = 0; i < ustr1.length(); i++)
            cout << "Char: " << ustr1[i] << endl;
        for (int i = 0; i < ustr2.length(); i++)
            cout << "Char: " << ustr2[i] << endl;
        if (ustr1 == ustr2)
            cout << "Strings are equal" << endl;
        cout << "string length: " << ustr1.length() << "\t" << ustr2.length() << endl;
        return 0;
    }
The strings contain Thai and ASCII characters, and the intent behind using basic_string<u_int16_t> is to store characters that cannot fit in a single byte. The code was run on a Linux box whose locale is en_US.UTF-8. The output is:
    $ ./a.out
    Char: 47328
    Char: 57506
    Char: 42168
    Char: 47328
    Char: 25185
    Char: 25699
    Char: 17152
    Char: 24936
    string length: 4        4
A few questions:
- Do the character values in the output correspond to en_US.UTF-8 code points? If not, what are they?
- Would the std::string operators like ==, !=, < etc. be able to work with Unicode code points? If so, would it be a mere comparison of the code points in corresponding positions? Would std::map work along similar lines?
- Would changing the locale to UTF-16 result in the strings being stored as UTF-16 code points?
Thanks!
 
     
    