PDF vs HTML
Original version e-mailed to Electronic Publication Implementation
Committee (EPIC), Entomological Society of America (ESA), 22 July 1997.
Thomas J. Walker, 25 August 1997.
The question was posed whether HTML or PDF would be the more useful
format for ESA's electronic reprints. The short answer is that it depends
on whether the articles are to be viewed and used mainly on the computer
screen or mainly as hardcopy.
Here is a brief comparison of the two:
PDF, Adobe Acrobat's Portable Document Format,
retains the exact appearance of a document, no matter what platform (PC,
Mac, Unix) is used to view or print it.
HTML is the language of the Web. HTML documents are thus designed for on-screen
viewing and interactivity. They are intended to be accessed via computer
whenever needed rather than to be printed to simulate a traditional reprint.
Making HTML versions of traditionally printed articles is more expensive
than making PDF versions, and managing the files that are involved is more
complex.
Here is a more detailed comparison:
- Cost to produce
PDF: Very low, especially if the files used to produce plates for printing
are available.
HTML: Modest to high, in proportion to the attention given to interactivity
and on-screen appearance. Lots of good hyperlinks and lots of thumbnail
and enlarged images require lots of knowledge and labor.
- File structure
PDF: Each article is a single binary file, usually between 100 KB and
a few MB in size. Files are small compared to WinWord files of the same
documents.
HTML: The document and figures are separate files. A richly illustrated
article with both thumbnail and full-screen versions of each figure could
require 10-20 files.
- On-screen appearance
PDF: Retains the exact appearance of the original article. Screen images,
or selected portions, can be zoomed in or out from 12 to 800%.
HTML: Appearance determined by mark-up tags, the browser, the monitor,
and the settings of the browser. Thus an HTML document may look great on
the state-of-the-art system used to create it but be debased by the software
and/or hardware used to view it.
- Interactivity
PDF: Some is possible, but it adds cost.
HTML: In addition to internal and external hyperlinks, HTML permits a variety
of other enhancements over traditional formats. Among these are audio and
video clips, and forms encouraging viewers to submit their corrections
and critiques.
- Printed appearance
PDF: Like a good photocopy of the original article.
HTML: As the document appears when viewed on screen, including thumbnails
(if used) rather than full-size figures. (The full-size figures can be
printed in separate printing operations.) If color is important in the
on-screen image, a color printer is needed.
- Math symbols
PDF: Renders math symbols seamlessly.
HTML: Does not support math symbols, except as auxiliary bit-mapped files.
- Searchability
PDF: While viewed in the Acrobat Reader, a PDF-formatted article can
be searched for any word or phrase, permitting users to find passages of
interest. Search-service robots, such as those of AltaVista, do not retrieve
and index PDF files. Commercial software can index and search all PDF
files on a single server.
HTML: Search-service robots retrieve and index Web-published HTML files
when requested to do so. Once indexed, articles of interest can be
retrieved by free Boolean searches of the search service's index. Free
software can index and search all HTML files on a single server.
- Chief advantages
PDF: Low cost; produces images and hardcopy that researchers know and
like.
HTML: Interactivity (e.g., forms, internal and external hyperlinks); on-screen
appearance; audio and video clips easily incorporated.
- Chief disadvantages
PDF: Not easily made interactive.
HTML: Hardcopy lacks interactivity (links and thumbnails are dead; format
not designed for print). Hardcopy is less compact (more pages) and harder
to read (no columns). Difficult for user to save the article in electronic
form for later viewing (many files, sometimes requiring a particular directory
structure).
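
The "many files" point can be made concrete with a small sketch. The Python below is an illustration only, not part of the original comparison: it parses one HTML document and lists the local files (figures and linked pages) that a reader would also have to save, in the same relative directories, for the article to work offline. The names DependencyCollector, local_dependencies, and SAMPLE_ARTICLE, and the sample file paths, are hypothetical, and the sketch assumes that only img and a tags point at other files.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class DependencyCollector(HTMLParser):
    """Collect the local file references found in one HTML document."""

    # Assumption for this sketch: only these tag/attribute pairs matter.
    REFERENCE_ATTRS = {"img": "src", "a": "href"}

    def __init__(self):
        super().__init__()
        self.files = []

    def handle_starttag(self, tag, attrs):
        wanted = self.REFERENCE_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name != wanted or not value:
                continue
            parsed = urlparse(value)
            # Skip external URLs (scheme or host present) and
            # in-page anchors such as "#methods" (empty path).
            if parsed.scheme or parsed.netloc or not parsed.path:
                continue
            if parsed.path not in self.files:
                self.files.append(parsed.path)


def local_dependencies(html_text):
    """Return the local files an HTML document refers to, in document order."""
    collector = DependencyCollector()
    collector.feed(html_text)
    collector.close()
    return collector.files


# Hypothetical article with two figures, each as thumbnail plus full image.
SAMPLE_ARTICLE = """
<html><body>
<h1>Hypothetical article</h1>
<p>See <a href="#methods">Methods</a> and
<a href="http://www.example.org/">an external site</a>.</p>
<a href="figs/fig1_full.gif"><img src="figs/fig1_thumb.gif"></a>
<a href="figs/fig2_full.gif"><img src="figs/fig2_thumb.gif"></a>
</body></html>
"""

if __name__ == "__main__":
    for path in local_dependencies(SAMPLE_ARTICLE):
        print(path)
```

For the two-figure sample, the sketch reports four files in a figs subdirectory in addition to the HTML document itself, which is why an article with many figures quickly reaches the 10-20 files mentioned under "File structure". The PDF version of the same article would be a single file.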