[TUHS] Character sets
random832 at fastmail.com
Mon Mar 28 11:20:32 AEST 2016
On Sun, Mar 27, 2016, at 19:30, John Cowan wrote:
> > > while (*c && *c++ != ' ');
> That particular piece of code still works if the encoding is UTF-8.
Sure it does, but replace that != ' ' with !isblank(*c), and it doesn't
work anymore, since isblank() looks at one byte at a time and so ignores
multibyte characters. Often you don't care, but you've got to remember
to set LC_ALL=C when running grep etc. on large data sets or it will be
much slower, since \w and \s care about multibyte characters (as does
case-insensitive matching, etc.).
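
To make the difference concrete, here is a rough sketch in C, not code from this thread. The names find_ascii_space, find_blank_utf8, decode_utf8 and is_blank_cp are made up for illustration, and the set of "blank" code points (space, tab, U+3000 IDEOGRAPHIC SPACE) is an assumption picked for the demo, not what any particular locale defines. The byte-wise scan is safe on UTF-8 because every byte of a multibyte sequence is 0x80 or above, so it can never be mistaken for an ASCII space. It also never notices a blank that is itself multibyte.

```c
#include <assert.h>
#include <stddef.h>

/* Byte-wise scan: offset of the first ASCII space, or the string length. */
size_t find_ascii_space(const char *s)
{
    assert(s != NULL);
    size_t i = 0;
    while (s[i] != '\0' && s[i] != ' ')
        i++;
    return i;
}

/* Decode one UTF-8 sequence starting at s into *cp and return the number
 * of bytes consumed. Malformed input yields U+FFFD and consumes one byte.
 * Minimal on purpose: overlong forms and surrogates are not rejected. */
static size_t decode_utf8(const char *s, unsigned long *cp)
{
    const unsigned char *u = (const unsigned char *)s;
    size_t len;

    if (u[0] < 0x80) { *cp = u[0]; return 1; }
    else if ((u[0] & 0xE0) == 0xC0) { len = 2; *cp = u[0] & 0x1F; }
    else if ((u[0] & 0xF0) == 0xE0) { len = 3; *cp = u[0] & 0x0F; }
    else if ((u[0] & 0xF8) == 0xF0) { len = 4; *cp = u[0] & 0x07; }
    else { *cp = 0xFFFD; return 1; }

    for (size_t k = 1; k < len; k++) {
        /* A NUL terminator fails this check, so we never read past it. */
        if ((u[k] & 0xC0) != 0x80) { *cp = 0xFFFD; return 1; }
        *cp = (*cp << 6) | (u[k] & 0x3F);
    }
    return len;
}

/* ASSUMPTION: a hand-picked set of blanks for the demo only. */
static int is_blank_cp(unsigned long cp)
{
    return cp == 0x20 || cp == 0x09 || cp == 0x3000;
}

/* Code-point-wise scan: byte offset of the first blank character,
 * or the string length if there is none. */
size_t find_blank_utf8(const char *s)
{
    assert(s != NULL);
    size_t i = 0;
    while (s[i] != '\0') {
        unsigned long cp;
        size_t n = decode_utf8(s + i, &cp);
        if (is_blank_cp(cp))
            return i;
        i += n;
    }
    return i;
}
```

Given "ab", then U+3000 encoded as E3 80 80, then "cd ef", the byte-wise scan stops at offset 7 (the ASCII space) while the code-point scan stops at offset 2 (the ideographic space). The extra decoding on every character is also roughly the kind of work a tool gets to skip under LC_ALL=C.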