EDIT2: The issue was with how my Perl client was interpreting the output from PHP's json_encode which outputs Unicode code points by default. Putting the JSON Perl module in ascii mode (my $j = JSON->new()->ascii();) made things work as expected.
I'm interacting with an API written in PHP that returns JSON, using a client written in Perl which then submits a modified version of the JSON back to the same API. The API pulls values from a PostgreSQL database whose encoding is UTF8. What I'm running in to is that the API returns a different character encoding, even though the value PHP receives from the database is proper UTF-8.
I've managed to reproduce what I'm seeing with a couple lines of PHP (5.3.24):
<?php
$val = array("Millán");
print json_encode($val)."\n";
According to the PHP documentation, string literals are encoded ... in whatever fashion [they are] encoded in the script file.
Here is the hex dumped file encoding (UTF-8 lower case a-acute = c3 a1):
$ grep ill test.php | od -An -t x1c
  24  76  61  6c  20  3d  20  61  72  72  61  79  28  22  4d  69
   $   v   a   l       =       a   r   r   a   y   (   "   M   i
  6c  6c  c3  a1  6e  22  29  3b  0a
   l   l 303 241   n   "   )   ;  \n
And here is the output from PHP:
$ php -f test.php | od -An -t x1c
  5b  22  4d  69  6c  6c  5c  75  30  30  65  31  6e  22  5d  0a
   [   "   M   i   l   l   \   u   0   0   e   1   n   "   ]  \n
The UTF-8 lower case a-acute has been changed to a "Unicode" lower case a-acute by json_encode.
How can I keep PHP/json_encode from switching the encoding of this variable?
EDIT: What's interesting is that if I change the string literal to utf8_encode("Millán") then things work as expected. The utf8_encode docs say that function only supports ISO-8859-1 input, so I'm a bit confused about why that works.
 
     
     
    