On 7 May 2017 08:58 -0600, from arnold(a)skeeve.com:
> I don't imagine it would be hard to re-write
> [uniq] to handle utf-8.
It does look like at least GNU coreutils 8.13 uniq is broken in that
regard, which frankly surprised me. That version isn't _that_ old.
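The message doesn't say exactly which part of uniq 8.13 misbehaves. Assume, for this sketch only, that the problem is a width-limited comparison (like `-w N`) counting bytes rather than characters. Under that assumption, a character-aware uniq needs little code: decode the input as UTF-8, then compare adjacent lines on whole code points. The function names `uniq` and `uniq_bytes` are hypothetical. This is not coreutils' implementation.

```python
from itertools import groupby
from typing import Iterable, List, Optional


def uniq(lines: Iterable[str], width: Optional[int] = None) -> List[str]:
    """Collapse runs of adjacent duplicate lines, keeping the first of each run.

    If `width` is given, only the first `width` characters are compared.
    These are Python str objects, so the width counts Unicode code points,
    not bytes: 'å' is one character even though UTF-8 encodes it in two bytes.
    """
    def key(line: str) -> str:
        return line if width is None else line[:width]

    # groupby only groups *adjacent* equal keys, matching uniq's semantics.
    return [next(group) for _, group in groupby(lines, key=key)]


def uniq_bytes(data: bytes, width: Optional[int] = None) -> bytes:
    """Hypothetical byte-level wrapper: decode UTF-8 explicitly, rather than
    relying on LC_* settings, then run uniq and re-encode."""
    lines = data.decode("utf-8").splitlines()
    return "".join(line + "\n" for line in uniq(lines, width)).encode("utf-8")
```

For example, 'å' is C3 A5 in UTF-8 and 'ä' is C3 A4, so their first bytes are identical. A byte-counting comparison with width 1 would merge the lines "åx" and "äx", whereas `uniq(["åx", "äx"], width=1)` keeps both.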
> Are your LC_* env variables set correctly?
I believe so (I don't recall seeing any other UTF-8-related weirdness
for a very long time), but I would want to verify the behavior on a
system that doesn't have a gazillion customizations accumulated over
years _and_ has the most recent version of coreutils before I file an
actual bug report. (It's not like Debian is ever bleeding edge, to
begin with.) I'll see if I can spin up a VM or three one of these days to
check.
--
Michael Kjörling •
https://michael.kjorling.se • michael(a)kjorling.se
“People who think they know everything really annoy
those of us who know we don’t.” (Bjarne Stroustrup)