I found a useful function in another answer, and I'd like someone to explain what it does and whether it is reliable. I was using mb_detect_encoding(), but it gave the wrong result when reading an ISO 8859-1 file on Linux.
This function seems to work in all cases I tested.
Here is the question it comes from: "Get file encoding".
Here is the function:
function isUTF8($string){
    return preg_match('%(?:
    [\xC2-\xDF][\x80-\xBF]              # Non-overlong 2-byte
    |\xE0[\xA0-\xBF][\x80-\xBF]         # Excluding overlongs
    |[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # Straight 3-byte
    |\xED[\x80-\x9F][\x80-\xBF]         # Excluding surrogates
    |\xF0[\x90-\xBF][\x80-\xBF]{2}      # Planes 1-3
    |[\xF1-\xF3][\x80-\xBF]{3}          # Planes 4-15
    |\xF4[\x80-\x8F][\x80-\xBF]{2}      # Plane 16
    )+%xs', $string);
}
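As far as the pattern itself goes, each alternative matches one of the well-formed multibyte forms listed in the Unicode standard's table of well-formed UTF-8 byte sequences (lead byte plus the allowed range for each continuation byte, which is how overlongs, surrogates and code points above U+10FFFF are excluded). However, the pattern is not anchored and has no branch for ASCII, so preg_match() returns 1 as soon as one such sequence appears anywhere in the string. The probe below is a sketch for checking that reading; the sample strings are made up for illustration.

```php
<?php
// The function exactly as quoted in the question. The x modifier makes PCRE
// ignore the whitespace and # comments inside the pattern; s has no effect
// here because the pattern contains no ".".
function isUTF8($string){
    return preg_match('%(?:
    [\xC2-\xDF][\x80-\xBF]              # Non-overlong 2-byte
    |\xE0[\xA0-\xBF][\x80-\xBF]         # Excluding overlongs
    |[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # Straight 3-byte
    |\xED[\x80-\x9F][\x80-\xBF]         # Excluding surrogates
    |\xF0[\x90-\xBF][\x80-\xBF]{2}      # Planes 1-3
    |[\xF1-\xF3][\x80-\xBF]{3}          # Planes 4-15
    |\xF4[\x80-\x8F][\x80-\xBF]{2}      # Plane 16
    )+%xs', $string);
}

// Hypothetical probe strings, written as raw bytes.
$probes = [
    'empty string'          => "",
    'plain ASCII'           => "cafe",
    'UTF-8 e-acute'         => "caf\xC3\xA9",          // "café" in UTF-8
    'Latin-1 e-acute'       => "caf\xE9",              // "café" in ISO 8859-1
    'valid + stray byte'    => "caf\xC3\xA9 na\xEFve", // \xEF lacks its continuation bytes
    'Latin-1 A-tilde + (c)' => "\xC3\xA9",             // "Ã©" in ISO 8859-1, "é" in UTF-8
];

foreach ($probes as $label => $bytes) {
    // preg_match() returns 1 if the pattern matches anywhere, 0 otherwise.
    printf("%-22s isUTF8() = %d\n", $label, isUTF8($bytes));
}
```

With these inputs the function returns 0 for the empty string and for plain ASCII, even though both are valid UTF-8, and returns 1 for the string that mixes a valid sequence with a stray \xEF byte, which is not valid UTF-8. It also cannot tell UTF-8 "é" from the Latin-1 pair "Ã©", since both are the bytes C3 A9. It appears to work on typical ISO 8859-1 text because an isolated accented byte followed by ASCII never forms a well-formed multibyte sequence.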
Is this a reliable way of detecting UTF-8 strings? What exactly is it doing? Can it be made more robust?
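One possible way to make the check stricter is to anchor the pattern with \A and \z, add a branch for single ASCII bytes, and require the whole string to be a run of well-formed sequences, similar to the well-known W3C validation regex this pattern appears to be derived from. The sketch below assumes that any 7-bit byte is acceptable (the W3C form-validation version only allows printable ASCII plus tab, CR and LF), and the name isValidUTF8Strict is hypothetical. It cross-checks the result against preg_match('//u', ...), which relies on PCRE's built-in UTF-8 validation of the subject.

```php
<?php
// Hypothetical name. Anchored variant of the quoted pattern: \A and \z force
// the WHOLE string to be a run of well-formed sequences, and an extra branch
// accepts single ASCII bytes (assumption: any 7-bit byte is allowed).
function isValidUTF8Strict(string $string): bool
{
    $pattern = '%\A(?:
          [\x00-\x7F]                        # ASCII, 1 byte
        | [\xC2-\xDF][\x80-\xBF]             # Non-overlong 2-byte
        | \xE0[\xA0-\xBF][\x80-\xBF]         # 3-byte, excluding overlongs
        | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # Straight 3-byte
        | \xED[\x80-\x9F][\x80-\xBF]         # 3-byte, excluding surrogates
        | \xF0[\x90-\xBF][\x80-\xBF]{2}      # Planes 1-3
        | [\xF1-\xF3][\x80-\xBF]{3}          # Planes 4-15
        | \xF4[\x80-\x8F][\x80-\xBF]{2}      # Plane 16
    )*+\z%x';
    // *+ is possessive: the alternatives have disjoint lead bytes, so at most
    // one can match at each position and backtracking is never useful.
    return preg_match($pattern, $string) === 1;
}

// Hypothetical probe strings, written as raw bytes.
$probes = [
    'empty string'       => "",
    'plain ASCII'        => "cafe",
    'UTF-8 e-acute'      => "caf\xC3\xA9",
    'Latin-1 e-acute'    => "caf\xE9",
    'valid + stray byte' => "caf\xC3\xA9 na\xEFve",
    'overlong "/"'       => "\xC0\xAF",
    'surrogate U+D800'   => "\xED\xA0\x80",
    '4-byte U+1F600'     => "\xF0\x9F\x98\x80",
];

foreach ($probes as $label => $bytes) {
    $strict = isValidUTF8Strict($bytes);
    // With the u modifier, PCRE validates the subject first: preg_match()
    // returns false for invalid UTF-8 and 1 for valid input, because the
    // empty pattern always matches.
    $pcre = preg_match('//u', $bytes) === 1;
    printf("%-20s strict=%-5s pcre=%s\n", $label,
        $strict ? 'true' : 'false', $pcre ? 'true' : 'false');
}
```

For large files, preg_match('//u', $s) === 1 or, where the mbstring extension is available, mb_check_encoding($s, 'UTF-8') is likely simpler and faster than the long alternation. Keep in mind that any such test only answers "is this valid UTF-8?", not "which encoding was this written in?": ASCII-only text is valid in both UTF-8 and ISO 8859-1, and some Latin-1 byte pairs such as "Ã©" are also valid UTF-8.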