I have a string encoded in Windows-1256 that is displayed as ÓíÞÑÕäí áßí ¿.
The string should be displayed in Arabic if the operating system is configured to use that encoding.
Here is the HEX representation of the string:
My intention is to convert the text to UTF-8 manually, using lookup tables to decide which bytes need to be altered and which should be left as-is.
To do that, I need to iterate over every byte in the string and read its numeric value.
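For illustration, a minimal sketch of the lookup-table idea could look like the following. The names `encodeUtf8`, `kPartialTable` and `convertToUtf8` are hypothetical, and the table holds only three entries that I believe match the Windows-1256 mapping (0xBF, 0xD3, 0xED); a real table would need all 128 high bytes, checked against the published code page mapping. Bytes missing from the table are replaced with U+FFFD here, which is an assumption rather than a requirement.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Encode one Unicode code point as UTF-8 (code points up to U+FFFF only,
// which is enough for the Arabic block).
std::string encodeUtf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical, deliberately partial table: Windows-1256 byte -> code point.
// Verify every entry against the official mapping before relying on it.
const std::unordered_map<std::uint8_t, char32_t> kPartialTable = {
    {0xBF, 0x061F},  // Arabic question mark
    {0xD3, 0x0633},  // seen
    {0xED, 0x064A},  // yeh
};

// Convert byte by byte: ASCII passes through unchanged, high bytes go
// through the table, unknown bytes become U+FFFD (replacement character).
std::string convertToUtf8(const std::string& input) {
    std::string out;
    for (char c : input) {
        const auto byte = static_cast<std::uint8_t>(c);
        if (byte < 0x80) {
            out += c;
            continue;
        }
        const auto it = kPartialTable.find(byte);
        out += encodeUtf8(it != kPartialTable.end() ? it->second : 0xFFFD);
    }
    return out;
}
```

With this sketch, `convertToUtf8("\xD3\xED")` would yield the four bytes D8 B3 D9 8A. Note that it only works if the input really contains the single Windows-1256 bytes, for example when they are written as hex escapes or read from a file in binary mode.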
The string is printed to the output stream as ÓíÞÑÕäí áßí ¿, which is 13 visible characters. But when I iterate through the bytes, the loop runs 24 times, almost double that. Maybe the string is wrongly assumed to be UTF-8 or UTF-16.
How can I access the numerical value of each byte in the string?
#include <bitset>
#include <cstdint>
#include <iostream>
#include <string>
using std::string;
using std::cout;
using std::endl;
int main() {
    string myString = "ÓíÞÑÕäí áßí ¿";
    // text is written in Windows-1256 encoding
    cout << "string is : " << myString << endl;
    // outputs: string is : ÓíÞÑÕäí áßí ¿
    cout << "length : " << myString.size() << endl;
    // outputs : length : 24
    for (std::size_t i = 0; i < myString.size(); ++i)
    {
        uint8_t b1 = (uint8_t)myString.c_str()[i];
        unsigned char b2 = (unsigned char) myString.c_str()[i];
        unsigned int b3 = (unsigned int) myString.c_str()[i];
        int b4 = (int) myString.c_str()[i];
        cout << i << " - "
             << std::bitset<8>(myString.c_str()[i])
             << " : " << b1 // prints �
             << " : " << b2 // prints �
             << " : " << b3 // prints very large numbers, except for spaces (32)
             << " : " << b4 // negative values, except for the space (32)
             << endl;
    }
    return 0;
}
Output:
string is : ÓíÞÑÕäí áßí ¿
length : 24
0 - 11000011 : � : � : 4294967235 : -61
1 - 10010011 : � : � : 4294967187 : -109
2 - 11000011 : � : � : 4294967235 : -61
3 - 10101101 : � : � : 4294967213 : -83
4 - 11000011 : � : � : 4294967235 : -61
5 - 10011110 : � : � : 4294967198 : -98
6 - 11000011 : � : � : 4294967235 : -61
7 - 10010001 : � : � : 4294967185 : -111
8 - 11000011 : � : � : 4294967235 : -61
9 - 10010101 : � : � : 4294967189 : -107
10 - 11000011 : � : � : 4294967235 : -61
11 - 10100100 : � : � : 4294967204 : -92
12 - 11000011 : � : � : 4294967235 : -61
13 - 10101101 : � : � : 4294967213 : -83
14 - 00100000 : : : 32 : 32
15 - 11000011 : � : � : 4294967235 : -61
16 - 10100001 : � : � : 4294967201 : -95
17 - 11000011 : � : � : 4294967235 : -61
18 - 10011111 : � : � : 4294967199 : -97
19 - 11000011 : � : � : 4294967235 : -61
20 - 10101101 : � : � : 4294967213 : -83
21 - 00100000 : : : 32 : 32
22 - 11000010 : � : � : 4294967234 : -62
23 - 10111111 : � : � : 4294967231 : -65
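Two things stand out in this output. First, the byte pairs are valid UTF-8 sequences for the Latin glyphs (C3 93 is Ó, C2 BF is ¿), which suggests the literal was stored as UTF-8 in the source file rather than as Windows-1256 bytes. Second, the negative and very large numbers look like sign extension, assuming `char` is signed on this platform, and `uint8_t` and `unsigned char` are streamed as characters rather than numbers. A minimal sketch of one way to read each byte as a value between 0 and 255 follows; `byteValues` is a hypothetical helper name.

```cpp
#include <string>
#include <vector>

// Return each byte of the string as a number in the range 0..255.
std::vector<unsigned int> byteValues(const std::string& s) {
    std::vector<unsigned int> values;
    values.reserve(s.size());
    for (char c : s) {
        // Convert to unsigned char first, then widen. Casting a signed char
        // straight to int or unsigned int sign-extends it, which is where
        // values like -61 and 4294967235 come from.
        values.push_back(static_cast<unsigned char>(c));
    }
    return values;
}
```

For the first two bytes shown above, `byteValues("\xC3\x93")` gives 195 and 147. When printing a single byte with `cout`, the same idea applies: `static_cast<unsigned int>(static_cast<unsigned char>(c))` prints a number, while a bare `unsigned char` prints a character.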
