So a project I'm working on has recently come to include a need to store UTF-8 Japanese kana
text in source files for readability, but then to process those source files through tools
that are only guaranteed to support single-byte code points, with something mapping the UTF-8
code points to single-byte code points in the destination execution environment. After a bit of
futzing, I've landed on the definition of iconv(1) provided by the Single UNIX
Specification to push this character mapping concern to the tip of my pipelines. It is
working well thus far and insulates the utilities down-pipe from needing multi-byte
support (I'm looking at you, Apple).
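For a sense of the shape of things, here is a minimal sketch of one of these pipelines (the
charmap and tool names are made up; if I'm reading the spec right, a -f or -t option-argument
containing a slash is treated as the pathname of a charmap file rather than as a codeset name):

    # hypothetical file/tool names; -f and -t point at charmap files per SUS
    iconv -f ./utf8-kana.cmap -t ./sbcs-kana.cmap kana_source.txt | some_tool > out

so everything down-pipe of iconv only ever sees one byte per character.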
I started thumbing through my old manuals and noted that iconv(1) is not a historic
utility; rather, SUS picked it up from HP-UX along the way.
Was there any older utility or set of practices for converting files between character
encodings besides the ASCII/EBCDIC stuff in dd(1)? As I understand it, iconv(1) is just
recognizing sequences of bytes, mapping each to a symbolic name via one charmap file, then
emitting the sequence of bytes assigned to that same symbolic name in a second charmap file.
This sounds like a simple filter operation that could be done in a few other ways.
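To make that concrete, the pair of charmaps I have in mind looks roughly like this (the
symbolic names and the single-byte values are my own inventions for illustration; the
multi-byte values are the UTF-8 encodings of katakana A and I):

    # utf8-kana.cmap -- multi-byte side (illustrative names)
    <code_set_name> UTF8-KANA
    <mb_cur_max>    3
    CHARMAP
    <A-KANA>        \xe3\x82\xa2
    <I-KANA>        \xe3\x82\xa4
    END CHARMAP

    # sbcs-kana.cmap -- single-byte side; same symbolic names, byte values picked by me
    <code_set_name> SBCS-KANA
    <mb_cur_max>    1
    CHARMAP
    <A-KANA>        \xb1
    <I-KANA>        \xb2
    END CHARMAP

Running iconv with the first as -f and the second as -t should then rewrite each three-byte
UTF-8 sequence into the corresponding single byte, purely by matching up the symbolic names.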
I'm curious whether any particular approach was relatively ubiquitous, or whether this was an
exercise largely left to the individual, with solutions accordingly wide and varied. My tool
chain doesn't need to work on historic UNIX, but it would be cool to understand how
to make it work on the least common denominator.
- Matt G.