Document Processing Requirements for a Ph.D Thesis

Warren Toomey, 18th June 1998

Introduction

This is an on-line version of a talk I gave to the current Ph.D students in the School of Computer Science at ADFA. I have left the bulleted points of the talk untouched, but I have added a section with hyperlinks to the tools mentioned in the talk.

If you have any questions about the presentation, or the tools I used, please email me at wkt@cs.adfa.edu.au.

What Document Processing Environment Did I Need?

• I could only tell once I had finished the whole thesis!
• Separate files, e.g. one per chapter, with the ability to produce a single document.
• Chapters, sections, subsections.
• Markup & layout ability, fonts, bold/italic, quotation, computer code, math equations.
• Prefaces, appendices.
• Tables of contents, diagrams, references, etc.
• Graphics: diagrams, graphs, artwork, photos.
• Internal references: chapter, section, subsection, page. Done automatically.
• Citation: of direct quotes, paper references. Flexible citation style. Production of bibliography and reference sections.
• A reference database: what details can be stored, ability to add more fields and use them.
• Document management: backups, versioning, visualisation of differences.
• Preview of output. Printing of selected pages.
• Spell checking.
• Tools you are comfortable with.

LaTeX

• LaTeX is a markup-driven document processing system: what you ask is what you get:

    \section*{LaTeX}
    \begin{itemize}
    \item LaTeX is markup-driven: what you {\bf ask} is what you get:

• Very old: around 1985.
• Files can be edited with any text editor. File format has not changed significantly: I can still edit & print my 1988 honours thesis.
• LaTeX does: markup, layout, fonts, bold/italic, internal references, prefaces & appendices, tables of contents etc., citations.
• Math formulae and quality of layout are its strong points. Very heavily used by math journals and by academics.
• Available on nearly any platform: Unix, Macs, Windows 95, Win 3.1, mainframes, Amigas etc.
• Downsides: batch oriented, so you must process the entire document to see the layout changes from any edit. Forces you to think of the document's structure, not its layout.
• Global layout changes are relatively simple to do.
• LaTeX also provides a sophisticated built-in programming language. You probably won't need it. However, it means the system is extensible.
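
The points above about structure, cross-references and splitting a thesis into files can be sketched in a small LaTeX2e example. This is a minimal sketch, not the layout of the original thesis: the file names intro.tex and results.tex, the labels and the text are all invented for illustration, and the filecontents blocks are there only so the example fits in one file (normally each chapter lives in its own hand-edited file).

```latex
% Hypothetical chapter files, written out by LaTeX so the sketch is
% self-contained. In a real thesis these are ordinary files you edit.
\begin{filecontents}{intro.tex}
\chapter{Introduction}
\label{ch:intro}
Markup describes structure: \textbf{bold}, \emph{italic}, \texttt{code}.
\section{Motivation}
\label{sec:motivation}
\begin{quote}
A quotation is just another environment.
\end{quote}
\end{filecontents}

\begin{filecontents}{results.tex}
\chapter{Results}
\label{ch:results}
Section~\ref{sec:motivation} on page~\pageref{sec:motivation} is
referenced automatically; no numbers are typed by hand.
\begin{equation}
  \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
  \label{eq:mean}
\end{equation}
Equation~\ref{eq:mean} is numbered and referenced the same way.
\end{filecontents}

\documentclass[a4paper,11pt]{report}
% Uncomment to process only one chapter while drafting:
% \includeonly{results}
\begin{document}
\tableofcontents
\include{intro}
\include{results}
\appendix
\chapter{Tools Used}
Appendices are lettered automatically after \verb|\appendix|.
\end{document}
```

LaTeX needs to be run twice before the table of contents and the cross-references settle. The commented-out \includeonly line eases the batch-oriented downside a little: only the named chapter is reprocessed, while numbering and references for the other chapters are taken from the previous run.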

Displaying LaTeX's Output

• LaTeX only processes the input files; it doesn't display them.
• LaTeX outputs documents in a special format known as DVI, or 'device independent' format.
• Several DVI viewers available. I use xdvi for X Windows.
• To print these files, I use dvips to produce PostScript files, which can then be printed on our laser printers.
• Other DVI display and print tools are available.

Graphics

• LaTeX's built-in drawing capability is terrible.
• However, there are 'standard' extensions to include external graphic files. I use the EPSF extensions. Both the xdvi and dvips tools understand these extensions.
• I use xfig as my tool for drawing diagrams. It produces EPSF files.
• To convert data to graphs, I use Gnuplot. It can produce EPSF files directly, but I normally convert to xfig format, so I can add/move labels, and then convert to EPSF.
• For screen-shots, artwork etc., xv can be used to convert from nearly any bitmap format to EPSF format.
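
Including an EPSF figure with the epsf extension might look like the sketch below. The file name network.eps, the caption and the label are hypothetical; the figure file would come from xfig, Gnuplot or xv as described above, and it must exist for the document to build.

```latex
\documentclass{report}
\usepackage{epsf}          % the EPSF extension (epsf.sty)
\begin{document}

\begin{figure}[htbp]
  % Scale the figure to 80% of the text width; epsf keeps the
  % aspect ratio and resets \epsfxsize after each figure.
  \epsfxsize=0.8\textwidth
  \centerline{\epsfbox{network.eps}}  % hypothetical file from xfig
  \caption{Network layout used in the experiments.}
  \label{fig:network}
\end{figure}

Figure~\ref{fig:network} is placed and numbered by LaTeX.

\end{document}
```

Because the figure is pulled in by name at processing time, redrawing it in xfig and re-exporting the EPSF file is enough to update the document; nothing in the LaTeX source has to change.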

References

• The reference database tool which comes with LaTeX is BibTeX.
• Again, based on text files, which you have to hand-edit.
• Citation style within LaTeX is completely malleable. I tweaked the 'scribe' format to suit my thesis. You can define new fields as well.
• Other standard citation formats available. Many journals give out LaTeX templates for the format they require.
• There are some BibTeX database tools which give you a GUI front-end, instead of manually editing the files.
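
A minimal sketch of how a citation flows from a BibTeX database into the document follows. The entry, its key example98 and the extra field mynotes are invented for illustration and do not refer to a real paper. The standard plain style ignores fields it does not know about, so an extra field only shows up in the output once a style file has been modified to use it. The filecontents block just keeps the example in one file; normally thesis.bib is a separate text file.

```latex
% Hypothetical database entry, written to thesis.bib for the sketch.
\begin{filecontents}{thesis.bib}
@article{example98,
  author  = {A. N. Author},
  title   = {An Example Paper},
  journal = {Journal of Examples},
  year    = {1998},
  volume  = {1},
  pages   = {1--10},
  mynotes = {Extra field: ignored by the standard styles}
}
\end{filecontents}

\documentclass{report}
\begin{document}
The result was first reported elsewhere~\cite{example98}.

% Swap the style name to change how every citation is formatted.
\bibliographystyle{plain}
\bibliography{thesis}
\end{document}
```

The usual sequence is to run latex, then bibtex, then latex twice more, so that the reference list is generated and the citation numbers are resolved.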

Document Management

• A crucial aspect of thesis production. Versioning later.
• You need backups! Use whatever tools you can use. Do it regularly!
• Home/work document migration: a pain to keep duplicates in sync.
• I have Unix at home & at work. I use rsync to synchronise trees of (any) files joined by a network. Rsync only sends differences where required, and also has built-in compression.
• With a 14.4K modem, I can usually rsync my work/home Ph.D area (roughly 40 Megs) in under 5 minutes, often faster.
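
A rough back-of-envelope check shows how much work the difference algorithm is doing. It assumes a raw line rate of 14,400 bit/s, takes 40 Megs as 40 × 10^6 bytes, and ignores protocol overhead and compression; these are simplifying assumptions, not measurements.

```latex
\[
  \mbox{data sent in 5 minutes} \le
  \frac{14\,400\ \mbox{bit/s} \times 300\ \mbox{s}}{8\ \mbox{bit/byte}}
  = 540\,000\ \mbox{bytes} \approx 0.54\ \mbox{MB}
\]
\[
  \mbox{time for a full copy} \approx
  \frac{40 \times 10^{6}\ \mbox{bytes} \times 8\ \mbox{bit/byte}}
       {14\,400\ \mbox{bit/s}}
  \approx 22\,000\ \mbox{s} \approx 6\ \mbox{hours}
\]
```

Under these assumptions a five-minute sync can carry only a percent or two of the tree's raw size over the line (somewhat more once compression is taken into account), against roughly six hours to copy everything.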

Document Version Control

• You need to be able to find out when you edited a chapter, why, and what the changes were.
• I use RCS for document versioning. When you check in a file, you can add a comment describing why you checked it in, and the file gets a new version number. Checked-in files are read-only.
• When you check out a file, it becomes writable.
• You can check-in or -out many files at the same time.
• You need to be able to print version numbers on drafts. I modified LaTeX's page style to do this for me.
• Every document I modified as part of my Ph.D went into RCS: thesis chapters, source code, log of activities etc.
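
The talk does not show how the page style was modified, so the following is only one possible approach, not the actual code used for the thesis. It relies on RCS expanding the $Revision$ keyword in the source file, and it assumes the file has been checked in at least once so that the keyword already carries a value (1.1 here is a placeholder). The macro name \RCS is chosen for this sketch.

```latex
\documentclass{article}

% RCS rewrites the "$Revision: ... $" keyword in the source file.
% This delimited macro splits "$Name: value $" and stores the value
% in \RCSName, so the line below defines \RCSRevision.
\def\RCS$#1: #2 ${\expandafter\def\csname RCS#1\endcsname{#2}}
\RCS$Revision: 1.1 $

% Redefine the "plain" page style so the footer carries the revision.
\makeatletter
\renewcommand{\ps@plain}{%
  \let\@mkboth\@gobbletwo
  \let\@oddhead\@empty
  \let\@evenhead\@empty
  \def\@oddfoot{\reset@font\footnotesize Draft, revision \RCSRevision
                \hfil\normalsize\thepage\hfil}%
  \let\@evenfoot\@oddfoot}
\makeatother
\pagestyle{plain}

\begin{document}
Draft text. The footer shows the RCS revision next to the page number.
\end{document}
```

The macro's own parameter pattern deliberately avoids spelling out a keyword, so RCS only rewrites the line that invokes it.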

Summary

• LaTeX and a number of other tools gave me the document processing environment I required to write my thesis.
• It wasn't as user-friendly as current word processors, but it had the ability to be moulded to my requirements. That was very important!
• Even better, all the tools are freely available.
• Finally, I expect to still be able to read and use my LaTeX documents in 10 years with few changes.

LaTeX

The current version of LaTeX is LaTeX2e, which differs from the LaTeX described in Leslie Lamport's book, published in 1985. A Nutshell book by O'Reilly and Associates, Making TeX Work, was more up to date, but is no longer being maintained by the author. I'd welcome any other hyperlinks to good, up-to-date LaTeX books.

On-line information about LaTeX and LaTeX2e, including documentation, can be found at the LaTeX Encyclopedia site.

LaTeX itself, and more styles, extensions & associated tools than you can poke a stick at, can be obtained from any of the Comprehensive TeX Archive Network sites, also known as CTAN.

If possible, you want to get a pre-compiled binary set of the LaTeX tools, to save you the trouble of building them. If you run FreeBSD or Linux, you can obtain pre-compiled binary packages. The same exists for Windows 95, but I don't have any hyperlinks at hand for them. The BibTeX bibliographic tools come with LaTeX, and you can get many reference styles from CTAN.

DVI Tools

The DVI tools xdvi and dvips can be obtained through CTAN. xdvi has its own home page. I haven't found one for dvips. Both are pretty easy to compile, and both are available as binary packages for FreeBSD and Linux. I used some home-grown tools written in Perl to separate colour pages from black & white pages in the PostScript output from dvips, so I could send them to different printers.

Graphic Tools

The main tools I used were xfig to draw figures, Gnuplot to do plotting, and xv to work with bit images. All three can produce EPSF files. I used an old LaTeX extension, epsf.sty, to include EPSF figures. There are newer extensions to work with EPSF files, but I haven't used them. Check on CTAN for more details.

Xfig doesn't have a web page, but it is software contributed to the X Windows system, and is available at ftp://ftp.x.org/. Gnuplot has its own home page, and so does xv. Again, binary packages are available for FreeBSD and Linux, and all three are easily built on most Unix platforms.

Document Management

Rsync is a great tool for synchronising entire trees of files between two Unix systems connected by a (possibly slow) Internet connection. You can find out more about rsync from its home page. There are many document revision systems: RCS, SCCS, CVS, and I hope there are some systems for Windows 95 (anybody got some hyperlinks?).

Here is the original RCS paper by Walter Tichy. Here are the basic commands:

    ci -u file              Check-in a new/existing file. Makes read-only, gives new version number.
    co -l file              Check-out file, makes writable.
    rlog file               Shows log of check-ins.
    rcsdiff file            On checked-out file, shows differences from last checked-in version.
    rcsdiff -rX -rY file    At any time, shows differences between versions X and Y.

Finally, you might be interested to know that this talk was written in LaTeX and translated to HTML using latex2html. Similarly, I have converted my Ph.D thesis to HTML and it is now on-line.

Warren Toomey
6/18/1998