On 7 May 2017 11:42 +1000, from noel.hunt(a)gmail.com (Noel Hunt):
I don't imagine it would be hard to re-write [uniq] to handle utf-8.
It does look like uniq from GNU coreutils 8.13, at least, is broken in that
regard, which frankly surprised me; that version isn't _that_ old.
$ uniq --version
uniq (GNU coreutils) 8.13
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Richard M. Stallman and David MacKenzie.
$ ( echo $'\u1234' ; echo $'\u2345' ; echo $'\u1234' )
ሴ
⍅
ሴ
$ ( echo $'\u1234' ; echo $'\u2345' ; echo $'\u1234' ) | uniq
ሴ
$
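For what it's worth, here is a minimal sketch of the kind of rewrite Noel
suggests, assuming all we want is the basic behaviour (drop a line when it
is identical to the one before it) and that "identical" means byte-for-byte
equal. Since the same character always has the same UTF-8 encoding, a pure
byte comparison can never mistake U+1234 for U+2345. The function names and
the file name uniq_bytes.py are hypothetical, and none of GNU uniq's options
(-c, -d, -i, -f and friends) are attempted.

```python
import sys
from typing import BinaryIO, Iterable, Iterator, Optional


def uniq_lines(lines: Iterable[bytes]) -> Iterator[bytes]:
    """Yield each line unless it equals the previous line byte for byte.

    The trailing newline is ignored for the comparison, so a final line
    without one still matches an identical line that has one. No locale
    collation is involved; distinct UTF-8 characters never compare equal.
    """
    previous: Optional[bytes] = None
    for line in lines:
        key = line[:-1] if line.endswith(b"\n") else line
        if key != previous:
            yield line
        previous = key


def main(infile: Optional[BinaryIO] = None,
         outfile: Optional[BinaryIO] = None) -> None:
    # Read and write raw bytes so no decoding or locale handling happens.
    src = infile if infile is not None else sys.stdin.buffer
    dst = outfile if outfile is not None else sys.stdout.buffer
    dst.writelines(uniq_lines(src))


if __name__ == "__main__":
    main()
```

Fed the same input as above, e.g.
( echo $'\u1234' ; echo $'\u2345' ; echo $'\u1234' ) | python3 uniq_bytes.py
it should print all three lines, since no two adjacent lines are equal.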
--
Michael Kjörling •
https://michael.kjorling.se • michael(a)kjorling.se
“People who think they know everything really annoy
those of us who know we don’t.” (Bjarne Stroustrup)