I'm trying to write a Java equivalent to PHP's ord():
public static int ord(char c) {
    return (int) c;
}
public static int ord(String s) {
    return s.length() > 0 ? ord(s.charAt(0)) : 0;
}
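For plain ASCII input both overloads match PHP; a quick sanity check, assuming the two methods above live in the same class (the main method is just for illustration):

public static void main(String[] args) {
    System.out.println(ord('a'));  // 97, same as PHP's ord('a')
    System.out.println(ord("A"));  // 65, same as PHP's ord('A')
    System.out.println(ord(""));   // 0, which I believe matches PHP's ord('')
}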
This seems to work well for characters with an ordinal value of up to 127, i.e. within ASCII. However, PHP returns 195 (and higher) for characters from the extended ASCII table or beyond. A comment by Mr. Llama on the answer to a related question explains this as follows:
To elaborate, the reason é showed ASCII 195 is because it's actually a two-byte character (UTF-8), the first byte of which is ASCII 195. – Mr. Llama
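A quick way to see what the comment describes is to dump the UTF-8 encoding of the character; a minimal sketch (the class name is just for illustration, and it assumes the source file is saved and compiled as UTF-8):

import java.nio.charset.StandardCharsets;

public class Utf8BytesDemo {
    public static void main(String[] args) {
        byte[] bytes = "é".getBytes(StandardCharsets.UTF_8);
        for (byte b : bytes) {
            System.out.println(b & 0xFF);  // prints 195, then 169
        }
    }
}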
I hence changed my ord(char c) method to mask out all but the least significant byte:
public static int ord(char c) {
    return (int) (c & 0xFF);
}
Still, the results differ. Two examples:
- ord('é') (U+00E9) gives 195 in PHP while my Java function yields 233
- ord('⸆') (U+2E06) gives 226 in PHP while my Java function yields 6
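As far as I can tell, a Java char holds a UTF-16 code unit, so masking with 0xFF keeps the low byte of the code point rather than the first byte of the UTF-8 encoding; the two values above can be reproduced directly from the code points:

public class CharMaskDemo {
    public static void main(String[] args) {
        char eAcute = '\u00E9';  // é
        char tack   = '\u2E06';  // ⸆
        // & 0xFF keeps the low byte of the UTF-16 code unit
        System.out.println(eAcute & 0xFF);  // 0xE9 -> 233
        System.out.println(tack & 0xFF);    // 0x06 -> 6
    }
}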
I managed to get the same behavior for the method that accepts a String by first turning the String into a byte array, explicitly using UTF-8 encoding:
public static int ord(String s) {
    return s.length() > 0 ? ord((char)s.getBytes(StandardCharsets.UTF_8)[0]) : 0;
}
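This seems to line up with PHP; a quick check, assuming java.nio.charset.StandardCharsets is imported and the masked ord(char) from above is the overload being called (otherwise the cast byte would come back as 65475 rather than 195):

public static void main(String[] args) {
    System.out.println(ord("é"));   // 195, same as PHP
    System.out.println(ord("⸆"));   // 226, same as PHP
}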
However, the method that accepts a char still behaves as before, and I have not yet found a solution for that. In addition, I don't understand why the change actually worked: Charset.defaultCharset() returns UTF-8 on my platform anyway. So...
- How can I make my function behave like PHP's?
- Why does the change to ord(String s) actually work?
Explanatory answers are much appreciated, as I want to grasp what's going on exactly.