Brief Introduction to troff

A Brief Introduction to Text-Formatting using troff

Paul E. Dunne

Abstract

This document gives a simple overview of the main components of the Documenter's WorkBench (DWB) system, which comprises a suite of utilities geared toward particular text-formatting activities. In this note we describe troff and its use with the -ms macros, and the table layout mechanism tbl. The document describes only enough of their capabilities to allow them to be used for basic text preparation: it is not intended as a substitute for the various user guides which fully document them.

1 Introduction

Documenter's WorkBench (DWB) is the standard text-formatting system run under the UNIX operating system. Although it consists of a number of different programs, the parts with which this guide is concerned are troff, the -ms macros and tbl.

All of these are accessible from the HP workstations.

The formatting system produces typesetter quality output (on the HP LaserJet). The system takes as input a character file which contains the document text interspersed with formatting directives. These directives determine how the text should be laid out on the printed page. In essence the different components of DWB are responsible for handling different types of directive: the -ms macros provide simple layout facilities e.g. starting a new paragraph, making a section heading etc; eqn deals with directives for typesetting mathematical expressions e.g. producing subscripts and superscripts; tbl controls the layout of tabular data. troff is the program which translates these directives into a form which can be used to drive the LaserJet printers.

In practice you should only rarely need to use facilities other than those catered for by the -ms macros. The first two sections of this document have been produced using only those directives described in Section 2 below.

The remainder of this guide is organised as follows. In the next section the -ms macros and some other simple formatting directives are described. These are sufficient to generate high-quality output containing only text, i.e. without mathematical expressions, tables, diagrams etc. Section 3 describes the basic features of tbl to a level where it can be employed to produce simple tables. Finally, Section 4 describes how the system can be invoked.

2 The -ms macros and some basic troff directives

As was stated above, a source file for troff consists of the text of the document being produced, interspersed with formatting directives.

Warning: Every troff source file MUST start with some -ms directive. You cannot start with text right away.

A directive arising from the -ms macros is indicated in the source file by placing it on a single line of the file: all such directives have the form .XY where X is an upper-case Roman letter and Y is (normally) an upper-case Roman letter but in some cases can be a single numeric digit. Thus

 here is some text which is going to be formatted eventually but I want a
 .XY
 special effect here so I have used a directive .XY to give it.

The full stop (`.') indicating a directive must be the first character occurring in the line.
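
As a minimal sketch of these rules (the wording of the text is invented for illustration), a complete source file might look as follows. It opens with an -ms directive, as the warning above requires, and each directive sits on a line of its own with the full stop as the first character. The directives .LP and .PP used here are described under Paragraph Layout below.

  .LP
  This is the opening paragraph of a short document. The lines of text
  may be broken anywhere in the source file.
  .PP
  This is a second paragraph, started by another directive on a line
  of its own.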

The most frequently used -ms directives are those dealing with different types of section heading and different paragraph layouts.

Paragraph Layout

There are three basic methods of indicating a new paragraph in the text. In all three cases a blank line is left between the last line of the preceding text and the opening line of the new paragraph. The first method indents the first line of the paragraph; the second leaves no indentation; the third indents the entire paragraph. The amount of indentation can be varied but you should not need to use this facility.

The first type of paragraph (of which this paragraph is an example) is produced by the directive .PP. Thus

  .PP
  The first type of paragraph (of which this paragraph is an example) is
  produced by the directive .PP. Thus
reproduces the paragraph describing this directive above.

Paragraphs of the second type are indicated by the directive .LP. Note that this is the type of paragraph which should open a new section or chapter. Thus

   .PP
   In the morning Marlowe and Van Norden leave to search for the false teeth.
   Marlowe is blubbering. He imagines they are his teeth.
   .LP
   It is my last dinner at the dramatist's home. They have rented a new piano,
   a concert grand. I meet Sylvester coming out of the florist's with a rubber
   plant in his arms. He asks me if I would carry it for him while he goes for
   the cigars. [1]

would produce

In the morning Marlowe and Van Norden leave to search for the false teeth. Marlowe is blubbering. He imagines they are his teeth.

It is my last dinner at the dramatist's home. They have rented a new piano, a concert grand. I meet Sylvester coming out of the florist's with a rubber plant in his arms. He asks me if I would carry it for him while he goes for the cigars. [1]

The third type can occur in two forms: unlabelled and labelled. The latter type is useful for typesetting lists of references. The unlabelled form is indicated by the directive .IP e.g.

  This is a quotation. Completely indented paragraphs are often used to set
  these off from the main body of the text.
  .IP
  Longago Beforetheworldsfair Beforeyouwereborn and they went to Mexico
  on a private car on the new international line and the men shot
  antelope off the back of the train and big rabbits jackasses they called
  them and once one night Longago Beforetheworldsfair Beforeyouwereborn
  one night Mother was so frightened on account of all the rifleshots
  but it was allright turned out to be nothing but a little shooting
  they'd only been shooting a greaser that was all [2]

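would generate

This is a quotation. Completely indented paragraphs are often used to set these off from the main body of the text.

     Longago Beforetheworldsfair Beforeyouwereborn and they went to Mexico
     on a private car on the new international line and the men shot
     antelope off the back of the train and big rabbits jackasses they called
     them and once one night Longago Beforetheworldsfair Beforeyouwereborn
     one night Mother was so frightened on account of all the rifleshots
     but it was allright turned out to be nothing but a little shooting
     they'd only been shooting a greaser that was all [2]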

The labelled form is indicated by adding the required paragraph label following the .IP directive. Thus

  .IP [1]
  Miller, H: Tropic of Cancer; Obelisk Press Paris (1934), (Granada, 1977, pp. 60-61)
  .IP [2]
  Dos Passos, J: U.S.A. (The 42nd Parallel, The Camera Eye (3)); Constable (1938)
  (Penguin, 1986, p.37)

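results in

[1]  Miller, H: Tropic of Cancer; Obelisk Press Paris (1934), (Granada,
     1977, pp. 60-61)

[2]  Dos Passos, J: U.S.A. (The 42nd Parallel, The Camera Eye (3));
     Constable (1938) (Penguin, 1986, p.37)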

Note that the paragraph labels ([1], [2]) are not indented.

Section Headings

Two types of section heading are possible: numbered headings, such as the ones used so far in this guide; and short headings. Numbered headings are invoked by typing the directive .NH followed by the heading itself on a single line which must then be followed by one of the three paragraph directives. The `Introduction' heading at the start of this document was produced by

  .NH
  Introduction
  .LP
The number attached to a section heading is incremented with each call of .NH: thus for the title of the second section of the guide it was sufficient to type

  .NH
  The -ms macros and some basic troff directives
  .LP

The directive .NH may be followed by a single numeric parameter; this has the effect of automatically numbering sub-section headings consistently with the numbering of the outer sections. Thus the directive .NH 1 is equivalent to .NH on its own; the two sub-headings used already in this section were produced by

  .NH 2
  Paragraph Layout
  .LP
  ..... text ...
  .NH 2
  Section Headings
  .LP

Similarly .NH 3 would produce (for this section) 2.2.1. as the number.
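
The numbering scheme can be sketched with a hypothetical outline. The heading texts are invented, and the lines beginning .\" are troff comments which print nothing; the numbers given in the comments are those that follow from the rules just described.

  .\" numbered 1.
  .NH
  Getting started
  .LP
  ..... text ...
  .\" numbered 1.1.
  .NH 2
  Installation
  .LP
  ..... text ...
  .\" numbered 1.2.
  .NH 2
  Configuration
  .LP
  ..... text ...
  .\" numbered 1.2.1.
  .NH 3
  Defaults
  .LP
  ..... text ...
  .\" numbered 2.
  .NH
  Further work
  .LP

Note that returning to .NH (or .NH 1) after a run of sub-headings starts the next outer section, so the sub-section numbers begin again from 1 beneath it.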

Short headings are produced in the same style as numbered headings but without an attached numeric label. These are invoked by the directive .SH followed by the text which again must be followed by a paragraph directive. e.g.

  .SH
  This is an example of a short heading
  .LP
will produce

This is an example of a short heading

Displays and Keeps

troff will format contiguous blocks of text into justified paragraphs unless there are explicit directions that this should not be done. There are a number of cases where one wishes some text to appear on the printed page exactly according to the layout in which it is typed, e.g. sections of programs or algorithms. To allow one to do this there is a mechanism called `display blocks'. The start of a display block is signaled by the directive .DS and the end by the directive .DE: all text occurring between .DS and .DE will appear laid out on the printed page in the same form as it is typed. Thus

  .DS
     That woman's days were spent
     In ignorant good-will,
     Her nights in argument
     Until her voice grew shrill.
     What voice more sweet than hers
     When, young and beautiful,
     She rode to harriers?
                               W.B. Yeats
                               Easter 1916
  .DE

will be printed as

   That woman's days were spent
   In ignorant good-will,
   Her nights in argument
   Until her voice grew shrill.
   What voice more sweet than hers
   When, young and beautiful,
   She rode to harriers?
                             W.B. Yeats
                             Easter 1916

One problem that may arise with displays is that the content may end up being split over two pages. `Keep blocks' are a means of forcing a particular portion of a document to appear on the same page. Of course their use is not confined to display blocks: they may be used to force any block of text onto one page. The start of a keep block is indicated by the directive .KS and the end by the directive .KE: all text between .KS and .KE is processed normally, but if the text is too long to fit on the current page then a gap will be left at the bottom of that page and the material will be printed at the top of the next page.
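
As a small sketch of a keep block (the text is invented for illustration), the following holds two short paragraphs together on one page:

  .KS
  .LP
  This short paragraph and the one after it are held together. If both
  do not fit on the current page, they start at the top of the next.
  .LP
  This is the second paragraph of the kept material.
  .KE

Everything between .KS and .KE is formatted exactly as it would be outside the keep; only its position relative to the page break changes.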

Warning: One frequent source of error is to forget to end a keep or display block with .KE or .DE as appropriate. The error detection mechanisms in troff are very limited and will not flag this; as a result the text following the .KS or .DS will not be printed. It is a good idea, after typing in your file, to check that all displays and keeps are properly terminated.

Some basic troff directives

The -ms macros are actually defined from sequences of lower level directives recognised by the troff processor. troff directives occur in two forms: those which are indicated by source lines of the form `.x' or `.xy' where `x' and `y' are lower case Roman letters; and those directives which can occur within a line of the document text (we shall refer to the former as commands and the latter as in-line directives).

A complete list of troff commands and in-line directives may be found in the full troff documentation. In this section we describe only the commands to throw a new page, perform line centering and provide blank spaces; and the in-line directives for changing the font being used.

troff Commands

A new page is obtained by typing the command

  .bp
on a line (exactly as with the -ms directives).

To centre a single line of text the command .ce is used, followed by the line to be centred. To centre a group of n lines (where n is a numeric value) the command `.ce n' is given, followed by the n lines of text. e.g.

  .ce
  This is a single centred line
  .LP
  .ce 3
  followed by
  a sequence of three (3)
  centred lines

produces

               This is a single centred line

                        followed by
                  a sequence of three (3)
                       centred lines

To leave an amount of blank space in the printed output the command

  .sp
is used. `.sp' on its own will throw a single blank line; `.sp n' (where n is a number) will throw n blank lines. It is also possible to request a literal amount of space (which is useful for leaving gaps in which to place diagrams). Thus

  .sp 3.25i
will leave a space of 3.25 inches (metric measurements in `cm' can also be used).

In-line directives

All in-line directives are signaled by a string of the form \z where `z' is some ASCII character symbol (not necessarily alphabetic). Only the directive for changing fonts is described.

A large number of different fonts are available but the three basic fonts, Roman (with which the bulk of this guide is produced), Italic, and Bold, ought to be sufficient for your purposes. A change in font is indicated by the string \fF where `F' is one of the three letters R (for roman); I (for italic); or B (for bold). Certain non-ASCII characters, such as Greek or other non-Roman letters, can be placed using any font by interpolating the appropriate two character name in the text: this takes the form \(xy where `x' and `y' are ASCII characters. A complete list of such special symbols may be found in the full documentation. One instance in the example below is the string \(ss which prints as ß in the italic font.

  .IP
  \fIAls Zarathustra drei\(ssig Jahre alt war, verlie\(ss er seine
  Heimat und den See seiner Heimat und ging in das Gebirge. Hier geno\(ss
  er seines Geistes und seiner Einsamkeit und wurde dessen zehn Jahre
  nicht m\*:ude.\fR
  .LP
  .tl '''\fBFriedrich Nietzsche\fR'
  .tl '''\fIAlso Sprach Zarathustra\fR'

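The command `.tl' prints a title in three parts, written in the form 'left'centre'right': the text between the first and second quotes appears on the extreme left; the text between the second and third quotes is centred; the text between the third and fourth quotes appears on the extreme right. In the example above the first two parts are empty, so each title is set at the extreme right.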

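This example would actually be printed as follows (the paragraph and the book title appear in italic and the author's name in bold):

     Als Zarathustra dreißig Jahre alt war, verließ er seine Heimat und
     den See seiner Heimat und ging in das Gebirge. Hier genoß er seines
     Geistes und seiner Einsamkeit und wurde dessen zehn Jahre nicht müde.

                                                   Friedrich Nietzsche
                                               Also Sprach Zarathustra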

Note that a font remains in effect until a direction to change it is given. Thus after printing something in italic or bold you must explicitly change back to roman font when returning to the remaining text.
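
A short sketch of this rule within running text (the sentence is invented for illustration): each change to italic or bold is paired with an explicit \fR which returns to roman.

  .LP
  Most of this sentence is roman, but \fIthese words\fR are italic
  and \fBthese words\fR are bold.

If either \fR were omitted, the words following it in the sentence would continue in italic or bold respectively.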

Page Layout Control

There are various parameters which control the appearance of text on the printed page: the size of the text (point size); the spacing between lines of text (vertical spacing); the line length; and the width of the left-hand margin (page offset).

When troff is run these parameters are set to default values held in named internal registers. These values can be altered by using the .nr command in troff. This takes two parameters: the name of the register to be changed and the new value of this register.
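
As a sketch only: this guide does not name the registers, so the names below are an assumption, namely the usual -ms registers PS (point size), VS (vertical spacing), PO (page offset) and LL (line length). Under that assumption, a point size of 12 on vertical spacing 16 with a page offset of 2 inches and a line length of 4 inches could be requested by

  .nr PS 12
  .nr VS 16
  .nr PO 2i
  .nr LL 4i
  .LP
  This paragraph is set with the new values.

In the -ms implementations we are aware of, new values of this kind are not applied immediately: most are picked up at the next paragraph directive, which is why .LP follows the .nr commands here, and a changed page offset may not apply until the next page. The local documentation should be checked on this point.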

Examples are the following texts: the first has been set using point size 12 on vertical spacing 16 with a page offset of 2 inches and line length of 4 inches; the second using point size 10 on vertical spacing 11 with a page offset of 3 inches and line length of 3 inches; the last using point size 8 on vertical spacing 10 with a page offset of 2.5 inches and line length of 3.5 inches.

Ah. You mean it's nothing else but a table. Well some people would envy your certainty, wouldn't they Joey? For instance, I've got a couple of friends of mine, we often sit round the Ritz bar having a few liqueurs, and they're always saying things like that, you know, things like: Take a table, take it. All right, I say, take it, take a table, but once you've taken it, what you going to do with it? Once you've got hold of it, where you going to take it?

Harold Pinter

The Homecoming, Act 2

Example One

   We were talking - about the love we all could
   share - when we find it
   To try our best to hold it there - with our love
   With our love - we could save the world - if
   they only knew.
   Try to realise it's all within yourself no-one else
   can make you change
   And to see you're really only very small,
   and life flows on within you and without you.

Example Two

   Mon destin fut digne d'envie,
   Et pour avoir un sort si beau
   Plus d'un aurait donné sa vie.
   Car sur ton sein j'ai mon tombeau,
   Et sur l'albâtre où je repose
   Un poète avec un baiser
   Ecrivit:  Ci-gît une rose
   Que tous les rois vont jalouser.

Théophile Gautier

Le spectre de la rose (from Les nuits d'été)

Example Three

3 Formatting Tables

Table processing is performed by the tbl preprocessor. Again only the basic features of this are described.

The start of a table is indicated by the directive .TS and its conclusion by the directive .TE - between these, in the source file, is placed the table specification. This consists of three parts: a description of the options to be used e.g. whether each item should be placed in a box, whether the table should be printed in the centre of the page, how different table entries in the same row are separated etc; a series of table formatting descriptions; and finally the table data itself.

The options list is terminated with a semicolon `;'; the format details with a full stop `.'. Thus

  .TS
  options;
  format.
  data
  .TE

Options

The four important options (others are possible) are: center; box; allbox; and tab(x).

The word `center' appearing in the options list will cause the table to be printed in the centre of the page (the default is for it to be left-adjusted); `box' will place a single box around the entire table; `allbox' will surround each table entry with a box. tab(x) is used to specify which ASCII character will be used (in the data section) to separate items in the table. The default is the TAB character; however, it is preferable, when checking for errors, to use instead a visible character such as `%'. e.g.

  .TS
  center allbox tab(%);
  etc
  .TE

indicates a table which is to be centred, with every item enclosed in a box and with % used to separate table entries in the data section.

Format

The four important format possibilities are: l, r, c and s.

The format description consists of one or more lines which describe the arrangement of columns in a table: each line must contain exactly the same number of formatting characters, although which precise characters are used may vary.

The entry l in a format line indicates that the corresponding table item is to be printed left justified; the entry r indicates that it should be right-justified; c that it should be centred. s indicates that this column is a continuation (and so will not be `boxed') of the previous one. e.g.

  .TS
  center allbox tab(%);
  l l s s r c
  c c c c c r
  c c c c c r
  l s s s c s.
  etc
  .TE
is the format of a table with (at least) 4 rows and exactly 6 columns: for the first row the first entry will be left justified; the second, third and fourth columns form a single `column' (so there will be just one entry in the data section for these three columns); the fifth place is right justified and the final item centred.

Data

The data items are entered in rows with each item in a row separated by the specified `tab' character and each row on a different line. The number of rows in the data part can exceed the number of rows in the format part: in this case the last rows will have the same format as the last format description. The number of data items in each row must be identical to the number of `printing' columns: i.e. in the example above the first row has 4 `printing' columns, the second and third have 6, the fourth (and subsequent rows) have 2. For example the table specification given by

  .TS
  center allbox tab(%);
  l l s s r c
  c c c c c r
  c c c c c r
  l s s s c s.
  First % Spanning Three Columns % Right % Centre
  A % B % C % D % E % F 
  a % b % c % def % ghi % jklmn
  New Format % Numeric
  For these remaining % 1
  Rows % 2
  .TE

would produce the printed table
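  +-------+---------+---------+---------+-------+--------+
  | First | Spanning Three Columns      | Right | Centre |
  +-------+---------+---------+---------+-------+--------+
  |   A   |    B    |    C    |    D    |   E   |      F |
  +-------+---------+---------+---------+-------+--------+
  |   a   |    b    |    c    |   def   |  ghi  |  jklmn |
  +-------+---------+---------+---------+-------+--------+
  | New Format                          |    Numeric     |
  +-------------------------------------+----------------+
  | For these remaining                 |       1        |
  +-------------------------------------+----------------+
  | Rows                                |       2        |
  +-------------------------------------+----------------+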
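
As a further sketch, with invented data, the following specification uses the `box' option in place of `allbox' and relies on the rule that the last format line applies to all remaining data rows. The first format line gives a centred title spanning both columns, so the first data row has a single item; every later row has two items, the first left justified and the second right justified.

  .TS
  center box tab(%);
  c s
  l r.
  Pages per chapter
  Introduction % 3
  Paragraphs and headings % 12
  Tables % 5
  .TE

With `box' a single frame is drawn around the whole table and the individual entries are not boxed. Data lines should not begin with a full stop, since such a line would be taken as a directive.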

4 Using the system

The document to be formatted should be produced as a standard ASCII file containing the text and directives. To have this printed on the LaserJet issue the command

  jetroff -ms <filename>

The system will respond by printing a message indicating on which printer (Lab A or Lab D) the file is being output. There is no need to indicate whether tbl or other preprocessors such as eqn are required: this will be done automatically.

Further Documentation