In Java, String uses UTF-16 internally:
If you have a plain Java String, you do not need to do anything: your JDBC driver will transparently convert the Java String to whatever encoding the database uses when you insert it as a String in your insert statement.
And when you call ResultSet.getString(), it gives you back a Java String transparently.
If this is not the case, then something in the application is misconfigured and is inserting data that is not in the encoding it claims to be. Garbage In/Garbage Out.
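A minimal sketch of that round trip, assuming a hypothetical `notes` table with a text column `body` and a placeholder JDBC URL; the point is that only String ever appears, never byte[]:

```java
import java.sql.*;

public class JdbcStringExample {
    public static void main(String[] args) throws SQLException {
        // Hypothetical connection URL, credentials and table; adjust to your environment.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/testdb", "user", "secret")) {

            // The driver converts the Java String (UTF-16 internally) to the
            // database's encoding; no manual byte[] handling is needed.
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO notes (body) VALUES (?)")) {
                ps.setString(1, "Grüße, 世界");   // non-ASCII text round-trips fine
                ps.executeUpdate();
            }

            // Reading it back also yields a plain Java String.
            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery("SELECT body FROM notes")) {
                while (rs.next()) {
                    System.out.println(rs.getString("body"));
                }
            }
        }
    }
}
```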
When you need to worry about encoding/decoding:
You only have to worry about encoding and decoding when reading or writing textual data to/from files or sockets that only deal in byte[].
When working with byte[] that represents text, use new String(bytes, charset) and byte[] b = string.getBytes(charset) respectively, explicitly specifying the encoding the incoming data is in and the encoding the destination expects.
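A short sketch of both directions, using StandardCharsets and a hypothetical file path:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;

public class ExplicitCharsetExample {
    public static void main(String[] args) throws IOException {
        String text = "Grüße, 世界";

        // Encoding: String -> byte[] with an explicit Charset.
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);

        // Decoding: byte[] -> String with an explicit Charset.
        String roundTripped = new String(utf8Bytes, StandardCharsets.UTF_8);
        System.out.println(roundTripped.equals(text)); // true

        // The same rule applies at file boundaries: say which encoding the file uses.
        Path file = Paths.get("notes.txt"); // hypothetical path
        Files.write(file, utf8Bytes);
        String fromFile = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        System.out.println(fromFile);
    }
}
```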
Never rely on the default encoding:
Never use new String(byte[]) or string.getBytes() without a Charset: they use the platform default encoding, which is a crap shoot because it can vary in ways that are opaque to your code.
The subtle issue is that UTF-8, Windows-1252 and a couple of other encodings are supersets of ASCII and agree with each other in that range. So if you rely on the default encoding, everything might look like it is working fine, and then things blow up when you ingest/export some byte[] that contains characters outside the ASCII range.
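A small illustration of that failure mode, explicitly decoding UTF-8 bytes as Windows-1252 to stand in for a mismatched platform default:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultEncodingPitfall {
    public static void main(String[] args) {
        byte[] asciiUtf8    = "hello".getBytes(StandardCharsets.UTF_8);
        byte[] nonAsciiUtf8 = "héllo".getBytes(StandardCharsets.UTF_8);

        // Pretend the platform default turned out to be Windows-1252.
        Charset windows1252 = Charset.forName("windows-1252");

        // ASCII-only data decodes identically under both encodings,
        // so the bug stays hidden...
        System.out.println(new String(asciiUtf8, windows1252));    // "hello"

        // ...until non-ASCII data shows up and turns into mojibake.
        System.out.println(new String(nonAsciiUtf8, windows1252)); // "hÃ©llo"
    }
}
```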
In Summary:
- Never use byte[] to represent text unless some API requires you to.
- Never rely on the default encoding, even if you think you know what it is.
- Always specify the Charset when converting from byte[] or to byte[].
- Never conflate or confuse Charset encoding with URL/URI/HTML/XML escaping (see the sketch after this list).
- Unicode is not an encoding.
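To make the escaping point concrete, here is a small sketch contrasting the two operations; it assumes Java 10+ for the URLEncoder.encode(String, Charset) overload:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class EncodingVsEscaping {
    public static void main(String[] args) {
        String text = "héllo world";

        // Charset encoding: characters -> bytes. About how text is represented.
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length);             // 12 (é takes two bytes)

        // URL escaping: characters -> an ASCII-safe string. About making text
        // legal inside a URL, not about its byte representation.
        String escaped = URLEncoder.encode(text, StandardCharsets.UTF_8);
        System.out.println(escaped);                 // h%C3%A9llo+world
    }
}
```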