vulcanize - Convert LaTeX files to HTML

SYNOPSIS

vulcanize [ filename... ]

DESCRIPTION

vulcanize does its best to convert its input from LaTeX to HTML. If filenames are given on the command line, the input is the concatenation of the named files, in order; if no files are given, the input is the standard input. The resulting HTML is written to the standard output.
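The input convention is easy to get wrong when wrapping vulcanize in a script, so here is a minimal Python sketch of it. It only illustrates the rule stated above (named files concatenated in order, otherwise standard input); it is not vulcanize's own code, and the name `read_input` is made up for the example.

```python
import sys


def read_input(filenames, stdin=None):
    """Return the text to convert.

    Hypothetical helper: concatenates the named files in order, or
    falls back to standard input when no filenames are given.
    """
    if not filenames:
        stream = sys.stdin if stdin is None else stdin
        return stream.read()
    parts = []
    for name in filenames:
        with open(name, encoding="utf-8") as handle:
            parts.append(handle.read())
    return "".join(parts)
```

Calling `read_input(sys.argv[1:])` would then mirror the behavior described in the synopsis.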

BUGS AND CAVEATS

The most important thing to remember about vulcanize is that it does not work, and it will never work, because the problem it is trying to solve is completely intractable. However, it adequately solves a worthwhile subset of this intractable problem. For a program that doesn't work, it is remarkably successful. See the Raison d'Être section below.

vulcanize does not properly handle nested environments. For example,

          {\em italics {\bf bold face } more italics }

becomes

          <em>italics <b>bold face </em> more italics </b>
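For comparison, nesting comes out right if the translator remembers which closing tag belongs to each open brace. The following Python sketch does that with a stack. It is not how vulcanize works (that is the bug), it handles only `\em` and `\bf`, and the names `TAGS`, `TOKEN`, and `convert_groups` are invented for the illustration.

```python
import re

# Assumed mapping for this sketch only; vulcanize's own table is not shown here.
TAGS = {"em": "em", "bf": "b"}

# "{\em " or "{\bf " (not "{\emph"), a plain "{", or a "}".
TOKEN = re.compile(r"\{\\(em|bf)(?![A-Za-z])\s*|\{|\}")


def convert_groups(latex):
    """Translate {\\em ...} and {\\bf ...} groups, closing tags in nesting order."""
    out = []
    stack = []  # closing tag for each open brace ("" for a plain group)
    pos = 0
    for match in TOKEN.finditer(latex):
        out.append(latex[pos:match.start()])
        pos = match.end()
        if match.group(0) == "}":
            if stack:
                out.append(stack.pop())  # close the innermost open group
            # an unmatched "}" is silently dropped
        elif match.group(1):
            tag = TAGS[match.group(1)]
            out.append("<%s>" % tag)
            stack.append("</%s>" % tag)
        else:
            stack.append("")  # plain brace group: no HTML tag
    out.append(latex[pos:])
    return "".join(out)
```

With this approach the example above would produce `<em>italics <b>bold face </b> more italics </em>`.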

vulcanize doesn't convert ~ (tilde) properly. It should be converted to &nbsp;, the HTML entity for a non-breaking space. However, Mosaic for X doesn't understand &nbsp;, so vulcanize replaces ~ with a breakable space instead.
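The two behaviors can be put side by side in a few lines of Python. This is a sketch under assumptions, not vulcanize's code: the function name `convert_ties` and the `nbsp_supported` switch are hypothetical, and the lookbehind only guards against the `\~` accent command, not every way a tilde can be escaped.

```python
import re


def convert_ties(latex, nbsp_supported=True):
    """Replace LaTeX ties (~) with &nbsp;, or with a plain space as a fallback."""
    replacement = "&nbsp;" if nbsp_supported else " "
    # Leave "\~" (the tilde-accent command) alone; replace only bare ties.
    return re.sub(r"(?<!\\)~", replacement, latex)
```

Passing `nbsp_supported=False` corresponds to what vulcanize does today for the sake of clients that don't know the entity.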

vulcanize only converts \verb sequences when the verbatim text is delimited by plus signs.
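LaTeX itself lets nearly any non-letter character serve as the \verb delimiter. A more general matcher could capture whatever delimiter follows \verb and look for the same character again, as in this Python sketch. It is an illustration only, not vulcanize's implementation; `VERB` and `convert_verb` are invented names, and rendering the text with `<code>` is an assumption.

```python
import html
import re

# \verb, an optional "*", one delimiter character, the verbatim text,
# and then the same delimiter again (backreference \1).
VERB = re.compile(r"\\verb\*?([^A-Za-z\s*])(.*?)\1")


def convert_verb(latex):
    """Wrap each \\verb sequence in <code>, escaping &, < and >."""
    def to_code(match):
        return "<code>" + html.escape(match.group(2), quote=False) + "</code>"
    return VERB.sub(to_code, latex)
```

Because `.` does not match a newline here, a \verb sequence must open and close on one line, which is also what LaTeX requires.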

Like many Perl programs, vulcanize is an abominable morass of unreadable cryptica.

Raison d'Être

Let's consider a simple and common LaTeX construct: $x_2$, which should appear as x with a subscript 2. Now how should we translate this to HTML? HTML doesn't have a way to specify subscripts, and most WWW clients aren't capable of displaying subscripts.

We have two choices. We can ignore such constructs, in the hope that a smart human will come along later and decide what to do about them, or we can do what Nikos Drakos, author of LaTeX2HTML, does, which is:

1. Run LaTeX on the construct, yielding a .dvi file.
2. Run dvips on the .dvi file, yielding a PostScript file.
3. Run gs on the PostScript file, yielding a bitmap of the original formula.
4. Use the <IMG ...> tag to inline the bitmap into the HTML document.
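To give a feel for how much machinery this is, the first three steps can be written out as command lines. The Python sketch below only builds the commands; it does not run them. The exact options and the PNG output format are assumptions chosen for illustration, not the commands LaTeX2HTML actually issues, and `image_pipeline` is a made-up name.

```python
def image_pipeline(basename, dpi=72):
    """Return assumed command lines for rendering basename.tex to a bitmap.

    Nothing is executed here; each entry is an argument list that could
    be handed to subprocess.run() on a system with these tools installed.
    """
    return [
        # Step 1: LaTeX source to DVI
        ["latex", basename + ".tex"],
        # Step 2: DVI to PostScript
        ["dvips", "-o", basename + ".ps", basename + ".dvi"],
        # Step 3: PostScript to a bitmap at a fixed resolution
        ["gs", "-dBATCH", "-dNOPAUSE", "-sDEVICE=pnggray", "-r%d" % dpi,
         "-sOutputFile=" + basename + ".png", basename + ".ps"],
    ]
```

Note that the resolution has to be chosen in step 3, before anyone knows what device the reader will use.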

Nikos' method has a number of serious drawbacks. It is slow and it is a lot of work. Even after it's done, the result is not satisfactory:

The resulting bitmap is in a fixed font, which means that it is unlikely to match the font that the client is using to display the text portions of your document.
The bitmap will probably not be aligned properly with the rest of the text, because there are only three options for vertical alignment of inline bitmaps in HTML. This is how LaTeX2HTML translates the \LaTeX macro; none of the three possible alignments is correct:
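(The inlined bitmap is shown three times here, with tops aligned, middles aligned, and bottoms aligned; the images are not reproduced in this text version.)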
Since it is a bitmap, it has a resolution hardwired in. Let's say you generated a 72dpi bitmap. If the user is viewing your document on a 144dpi monitor, the bitmap will be half as wide and half as tall as it should be, and so cover only 25% of the intended area. If the user tries to print out your document on a 600dpi printer, the individual pixels in the bitmap will come out as big squares instead of little dots, and the bitmap will look choppy and jagged, even though the output device is of high quality.
The graphic cannot be displayed at all on text terminals. It will simply be omitted on these terminals.
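The resolution arithmetic in the list above is worth spelling out. A bitmap shown pixel for pixel on a denser device shrinks by the ratio of the two resolutions in each direction; scaled back up to its true size, each bitmap pixel becomes a square block of device pixels. A small Python sketch, with hypothetical function names:

```python
def displayed_scale(bitmap_dpi, device_dpi):
    """Linear and area scale of a bitmap shown pixel for pixel on a device."""
    linear = bitmap_dpi / device_dpi
    return linear, linear * linear


def device_pixels_per_bitmap_pixel(bitmap_dpi, device_dpi):
    """Side length, in device pixels, of one bitmap pixel scaled to true size."""
    return device_dpi / bitmap_dpi
```

For a 72dpi bitmap on a 144dpi monitor, `displayed_scale(72, 144)` gives a linear scale of 0.5 and an area scale of 0.25. On a 600dpi printer, `device_pixels_per_bitmap_pixel(72, 600)` is a little over 8, so each bitmap pixel prints as a block more than 8 dots on a side.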

This simple example demonstrates that good LaTeX to HTML translation is, in general, impossible.

Even barring typesetting quality problems like this, complete LaTeX to HTML translation is impossible. HTML documents are supposed to be displayable on a wide variety of output devices, and so HTML avoids assumptions about screen width and available fonts; this means that commands like \marginpar and \Big have no HTML equivalents. Tables can be typeset, but typically they will be displayed in fixed width fonts. \hrules and \vrules cannot appear at all. (<HR> is not the same as \hrule.) The bottom line is that TeX and LaTeX are complete typesetting and layout systems of immense power and flexibility, while HTML is a presentation-independent document structuring language with only a few simple structures; there is no good mapping between the two sets of capabilities.

In the face of these insurmountable difficulties, vulcanize sticks its head in the sand. It does the best it can at translating most common, simple LaTeX constructions, and it ignores the rest. For a large class of documents, this is enough. In cases where it is not, there are a number of approaches that may be more satisfactory. One is to make a DVI or PostScript version of your document available. These document formats contain formatting, layout, and font information, and can be printed or viewed on many output devices without any unnecessary loss of quality.

AUTHOR

Mark-Jason Dominus, University of Pennsylvania

EXPLANATION OF NAME

To vulcanize rubber is to improve its strength, resiliency, and freedom from stickiness and odor, by combining it with sulfur or other additives in the presence of heat and pressure.

The rubber molecule is many long parallel polymer chains; the chains can bunch up, and this is why rubber is springy and flexible. Vulcanization cross-links these chains.

M-J. Dominus, mjd@saul.cis.upenn.edu