I am trying to get Perl and the GNU/Linux sort(1) program to agree on how to sort Unicode strings. I'm running sort with LANG=en_US.UTF-8. In the Perl program I have tried the following methods:
- use Unicode::Collate with $Collator = Unicode::Collate->new();
- use Unicode::Collate::Locale with $Collator = Unicode::Collate->new(locale => $ENV{'LANG'});
- use locale
Each one of them failed with the following errors (from the Perl side):
- Input is not sorted: [----,] came after [($1]
- Input is not sorted: [...] came after [&]
- Input is not sorted: [($1] came after [1]
The only method that worked for me involved setting LC_ALL=C for sort and using 8-bit characters in Perl. However, this way Unicode strings are not ordered properly.
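For reference, the sort(1) side of that workaround can be sketched as below. This is only an illustration: the sample strings are the ones from the error messages above, and the variable names input and sorted are made up for the example. With LC_ALL=C, sort compares raw bytes, which is also what Perl's default string comparison does on undecoded byte strings when use locale is not in effect.

```shell
# Sample strings taken from the error messages above, deliberately out of order.
# (The variable names here are illustrative only.)
input=$(printf '%s\n' '1' '($1' '...' '&' '----,')

# LC_ALL=C makes sort(1) compare raw bytes, ignoring locale collation rules.
sorted=$(printf '%s\n' "$input" | LC_ALL=C sort)
printf '%s\n' "$sorted"

# sort -c exits non-zero if its input is not already in order,
# so this confirms the result is consistent with byte ordering.
printf '%s\n' "$sorted" | LC_ALL=C sort -c && echo "byte order confirmed"
```

Running the same input through sort under LANG=en_US.UTF-8 would typically give a different order, since that locale's collation rules treat punctuation differently from plain byte comparison.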