By Adam Hodgkin
September 29, 2000 (The Times Literary Supplement-London) - The web might have been built for reference works. In his original proposal, Tim Berners-Lee, the web's inventor, articulates the need for a multi-authored and highly adaptable framework for storing and retrieving information. He makes analogies with publications and paper-based systems and envisages a generalized computerized reference system. He notes:
"We should work toward a universal linked information system, in which generality and portability are more important than fancy graphics techniques and complex extra facilities. The aim would be to allow a place to be found for any information or reference which one felt was important, and a way of finding it afterwards. The result should be sufficiently attractive to use that the information contained would grow past a critical threshold, so that the usefulness [of] the scheme would in turn encourage its increased use." (Information Management: A Proposal. March 1989, CERN memorandum)
The web has regularly borrowed heavily from the world of the book, and most heavily of all from the apparatus of reference books. Almost all the most important navigational devices on the web (search engines, hyperlinks, meta-tags) are adaptations or extensions of the apparatus of printed reference works (catalogues/indices, citations, keywords).
But the web is different from other forms of literature. The web is dynamic and virtual, where print is permanent and physical. Indeed, the web is now so heterogeneous and all-embracing that it is much more than a literary structure; images, sounds, multimedia, databases, software, not to mention conventions and user communities, now form a part of its essential nature. But we must admit that the web is also unreliable, slow, frustrating and misleading. Why is this so, when it contains so much information? The short answer is that the web is unreliable precisely because it contains too much information, much of it deliberately misleading. While Tim Berners-Lee was stirring into his recipe "generality", "equality", "simplicity", and "openness", the wicked fairy was peering over his shoulder muttering "Self-advertisement, spam, prolixity and error." The web encourages contributions from every quarter, experimentation, variety and freedom of expression, and it is short of devices to ensure accuracy, reliability, economy, revision and permanence. An expert can use the web to obtain reliable information in many disciplines, but the expert user will be using sources which are known to be reliable. In many cases, these will be reference works or trusted resources (catalogues, timetables, or databases) which have been transferred to the web.
In its short life, the web has become a medium in which many thousands of books are published, distributed and consulted. Among them are at least a thousand reference books (dictionaries, gazetteers, encyclopedias, catalogues, atlases, thesauri and subject handbooks). The most highly trafficked dictionary is probably Merriam-Webster's Collegiate Dictionary, which has been on the web for over three years. Other notable reference works include The Columbia Encyclopedia (in various editions on several different websites) and the Cambridge Encyclopedia, which oddly never appears with the Cambridge title or branding, but can be found on half a dozen websites (including Biography.com and Freeserve). Academic Press offers free use of its Dictionary of Science and Technology. There is a great deal of replication, but the web has not had a deep effect on the way in which we actually use and work with authoritative reference works. Most of the currently available reference books have been simply "ported" to the web. In most cases, the "porting" has been a two-stage process, in which a dictionary or encyclopedia is converted first to a CD-ROM format and then redeployed as a website which serves up entries, one at a time, in response to searches on key items.
Little effort has been spent on working out whether this is the way reference works ought to be realized on the web. For example, Merriam-Webster provides users with a thesaurus as well as the Dictionary at their website. This is a good idea, but it is poorly implemented: the site only permits you to consult the works in sequence, whereas it would be much more useful if one could consult the works simultaneously, seeing on one page the results from a single search term. A similarly plodding design decision means that the phonetic key appears at the bottom of the virtual page. Some printed dictionaries operate with this convention (phonetic keys as "running footers"), but a printed page can be taken in at a glance; with a virtual page, the user is forced to compare a phonetic representation with a key which can only be located by scrolling, or mousing, down and "off the bottom" of the screen. A convention that works in print is almost the worst possible solution on a web page, where we should ideally have a sound-file to pronounce the word for us, rather than a phonetic transcription. The newest edition of Houghton-Mifflin's American Heritage Dictionary at Bartleby.com does offer us audible pronunciations (click on the icon to hear the word). Bartleby has half a dozen reference works, and the overall feel is elegant, but overly bookish. This site also has hundreds of full-text editions of poetry, plays and prose, so it is not surprising that the reference books are also bookish in their design.
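The idea of consulting several works at once, with one page of results from a single search term, can be made concrete with a small sketch. What follows is only an illustration under stated assumptions: the two works are hypothetical in-memory tables with invented entries, and the names lookup_all and render_page are made up for this example. It does not describe how Merriam-Webster or any other site is built.

```python
# Hypothetical in-memory stand-ins for two reference works. The entries are
# invented for illustration and are not taken from any real publication.
DICTIONARY = {"plod": "to walk heavily or slowly"}
THESAURUS = {"plod": ["trudge", "slog", "tramp"]}

WORKS = {"dictionary": DICTIONARY, "thesaurus": THESAURUS}


def lookup_all(term, works):
    """Return each work's entry for one search term, keyed by work name."""
    key = term.strip().lower()  # normalize so " Plod " and "plod" match
    return {name: entries[key] for name, entries in works.items() if key in entries}


def render_page(term, works):
    """Build one plain-text page that shows every work's result together."""
    shown = term.strip()
    results = lookup_all(term, works)
    if not results:
        return f"No entries found for '{shown}'."
    lines = [f"Results for '{shown}':"]
    for name, entry in results.items():
        lines.append(f"  [{name}] {entry}")
    return "\n".join(lines)


print(render_page(" Plod ", WORKS))
```

The design choice worth noticing is that the search term is normalized once and then put to every work, so adding a third or fourth reference book means adding one more table rather than one more page for the reader to visit.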
This reliance on straightforward and book-based implementations will change. The change will be accelerated by the increasing commercialization of the web, and it will come from two directions. First, established reference works will become web resources, and they will no longer be produced as books; second, there will be an increasing number of purely web services which meet reference needs. Mainstream publishers are increasingly looking to deliver their most important reference works through the web (The Oxford English Dictionary, The Dictionary of National Biography, The Grove Dictionary of Art, most of Chapman and Hall's Chemical Dictionaries). While Halsbury's Laws of England has not yet made the plunge, it surely soon will. Several of the largest reference works are already successfully launched as subscription-based web services. It is unlikely that they will ever again be printed as complete editions on paper.
In their first releases, these electronic works have tried to remain close to the publications from which they derive: in content, coverage, arrangement, apparatus and even appearance. They are bound to change and diverge from their printed parents, but we may expect that they will evolve relatively slowly. An important reason for this is that they are large-scale editorial operations. Major reference projects which co-ordinate the works of scores of compilers have a relatively slow "turning circle". But one thing is certain: to the extent that these massive reference ventures prosper, they will inevitably become much larger. Paper requires, and bookselling economics encourages, an editorial policy of concision and stringent selection for any printed publication. Databases and electronic media encourage comprehensiveness, inclusion and multiplicity. An authoritative and complete dictionary or encyclopedia can be much larger as an electronic publication than a print-based publication. We may not know how it will be done, or what these authoritative reference works will include, but it is a safe bet that the most reputable dictionaries and encyclopedias of the twenty-second century will be orders-of-magnitude larger than the Oxford English Dictionary or the Encyclopedia Britannica.
The Encyclopedia Britannica is the most striking example of the way in which commercial publishers of large reference works are adapting to the web. Britannica used to be the largest and most expensive mass-market publication. It is now the largest free mass-market publication, made available through the web to anyone who logs onto the site. Since the early 1990s, Britannica has gradually shifted its sales emphasis from the multi-volume print edition to CD-ROM sales. In the course of the decade, Britannica abandoned its direct-sales force and reduced the price of the work from several thousand dollars to less than $100 for the CD. In the autumn of 1999, the publishers of Britannica announced that they would no longer sell printed sets and simultaneously released the whole of the multi-volume Encyclopedia on a free website. This was a bold and pre-emptive move, which altered the landscape for commercially published reference works. Once a publisher with the reputation and brand of Britannica takes the steps of putting all its best material out on the web for free, it becomes difficult for any other mass-market publisher to take a different approach. Since Britannica moved decisively with its global brand and since it provides many other additional services from its website, it has gained a "first mover" advantage. The site works well. The advertising is no more intrusive than at Yahoo. But there is a risk that too many extraneous e-commerce offerings may be loaded onto the site (a trial search for "Bologna" calls up the expected entry on the city, but, additionally, numerous book "deals" which are only tenuously connected to Bologna).
The business logic behind Britannica's move was impeccable. But it was not without risk, and there remains an unanswered question: will the revenue from advertising and sponsorship be sufficient to support the editorial work and the capital investment needed to create a new edition of Britannica? If not, the commodification and leveling imposed by the web will have the unfortunate effect of preventing a new edition. Britannica, as first mover, has the best chance of establishing an advertising-supported popular reference work, but it is not obvious that there will be room for many similar services. Specialist, scientific and scholarly markets will be served by subscription services, but the mass market will, in this gloomy scenario, become dominated by "lighter" and inferior products.
A commercial publisher who thinks about the web as a distribution medium is likely to be choosing between two "business models": either the specialist and private service which relies on subscription fees to a limited market, or (and for many publishers this will be the "second best") an advertising or sponsorship service which generates revenue from an open publication to an unrestricted market. But this is the choice that faces a publisher who is wondering how to develop his list and considering what to do with the next edition of an already existing book.
And this is not the only way to view the opportunities presented by the web. One of the most interesting features of the web is the way in which a large number of essentially amateur or experimental reference works have appeared, some of which could well become the basis of a new form of reference publishing. There are scores of worthwhile examples, but a few illustrate particularly well some key features of the web.
Robert Beard is Professor of Russian at Bucknell University, PA. Since 1995, Beard has been developing a site which he called a "Web of Online Dictionaries". This site began as little more than a collection of bookmarks to dictionaries on the web. But it grew rapidly, and Beard developed a useful framework for classifying and accessing dictionaries, professional and amateur, bilingual and subject-specific, traditional or web compilations. The site now covers hundreds of dictionaries, and it is an ideal jumping-off point when looking for dictionaries on the web. Professor Beard's site has recently become Yourdictionary.com, with clearly commercial ambitions, and it appears to be aiming to be a specialist portal for languages and dictionaries (a vertical portal sometimes known as a "vortal"). Yourdictionary.com is a "content aggregator", to use a piece of web jargon. Aggregators bring together resources which can be found at hundreds of other sites on the web.
The content aggregator has a role similar to the anthologist of the print world; that is, he brings together content which is otherwise discrete and hard to find. Content "integrators" have another role, for which there is no obvious print analogy. They make sure that content (in this case reference material) is available and integrated within some other web application. So Britannica has reinforced its encyclopedic content with a web search service; the encyclopedic articles are supported by links to useful websites on the same topic. Another model of content integration is provided by gurunet.com, a company which provides a downloadable software application with which users consult various on-line reference sources without leaving the unrelated web pages they happen to be browsing. Xrefer.com, a company I co-founded last year, has a similar mission, with the emphasis this time on aggregating and integrating different reference works through providing additional hyperlinks between the various reference books. There are plenty of other models for reference aggregation and integration.
Note that these examples of web distribution, aggregation and integration leave the editorial and compilation process undisturbed. They are alternative ways of "repurposing" books which publishers have made available for use on the web. But the web is in many ways an ideal medium for organizing and creating a reference work. The central challenge that faces any such collective enterprise is the difficulty of ensuring adequate editorial standards of consistency and reliability, but the technologies of the web can be harnessed to this end. An example of a reference work produced by contributors collaborating through the web is FOLDOC (the Free Online Dictionary of Computing). FOLDOC was started in the early 1990s by Denis Howe, then a research student at Imperial College, London. It is just what the name implies - an online dictionary with over 12,000 extensively hyperlinked terms. The dictionary is undergoing continual revision by more than 1,200 contributors, who have written most of the entries, and some of whom have helped Howe in editing, vetting and updating entries. FOLDOC is a useful, broad and reliable source of information on most aspects of computing. It is much more reliable than most multi-authored web resources, because Howe has built and largely automated a quality control mechanism by which entries are vetted before they are included, and through which revisions can be monitored and approved.
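The principle of vetting entries before they are included can be shown in a few lines. This is a minimal sketch under loud assumptions and is not a description of FOLDOC's actual mechanism, which the article does not detail: the class names Submission and ReferenceWork, the review method and the has_definition check are all hypothetical, and the check is deliberately trivial where a real editor would apply far richer tests.

```python
from dataclasses import dataclass, field


@dataclass
class Submission:
    """A contributor's proposed entry (hypothetical structure)."""
    term: str
    definition: str
    contributor: str


@dataclass
class ReferenceWork:
    """Keeps published entries separate from those still awaiting review."""
    published: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)

    def submit(self, submission):
        # Nothing is published on submission; it only joins the queue.
        self.pending.append(submission)

    def review(self, approve):
        """Publish pending submissions that pass approve(); return the rest."""
        rejected = []
        for submission in self.pending:
            if approve(submission):
                self.published[submission.term.lower()] = submission.definition
            else:
                rejected.append(submission)
        self.pending = []
        return rejected


def has_definition(submission):
    """Illustrative automated check: reject blank definitions."""
    return bool(submission.definition.strip())


work = ReferenceWork()
work.submit(Submission("Hyperlink", "A reference from one document to another.", "alice"))
work.submit(Submission("spam", "   ", "bob"))
rejected = work.review(has_definition)
```

After the review, only the entry with a real definition is published, and the blank one is handed back for revision. The point of the design is that contribution and publication are separate steps, so openness to many contributors need not mean that everything contributed is shown to readers.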
Another example of a new kind of collaborative web-based reference work is the ambitiously conceived Tree of Life produced by Wayne and David Maddison of the University of Arizona. This is an extensive collection of World Wide Web pages that "seeks to comprise an illustrated, annotated phylogeny of all living things". It now has over 300 contributors, and has a multi-level quality-control procedure which should ensure that the enterprise establishes a good standard of reliability.
FOLDOC and Tree of Life are, and will probably remain, works indefinitely in process. They have no obvious editorial "ending". A print edition is, however, perfectly practical. Practical but pointless: as soon as it was printed, the edition would be out of date. But the web has already given rise to reference works which could not under any circumstances be printed. The key to producing reference works which are "beyond the book" lies with the design of new interfaces and new modes of presentation. The most exciting and compelling reference works of the web will be of this kind.
The web has a much richer range of interfaces than the printed book. The web is dynamic and "virtually" 3D. The best example of how 3D interfaces might work with reference material is the Visual Thesaurus, produced by Plumb Design of New York. Users navigate their way through a "cloud" of connected terms by pointing and clicking on terms that are semantically related one to another. We can imagine how an etymological dictionary might allow us to trace, traverse and descend etymological relationships; and the same database could rearrange itself to be given a "thesaurus view" or a conventional alphabetical order. Or to take a different subject matter, how we might be able to crawl through a Tree of Life whose branches and leaves respond to our navigational focus, perhaps by showing us "snapshots" of the environment in which each species might be found, or the timeline of its evolution.
Similarly we can visualize ways in which a music encyclopedia might use graphic and audio illustrations to provide jumping-off points for text-based information. A small set of densely designed scenarios of score, synthesizer, orchestra, portrait gallery, performance, timeline, etc, could provide alternative routes to hyperlinked textual information. The point of such multimedia reference resources is only incidentally to do with the information carried in the illustrations. More important are the ways in which images and design allow us to group concepts and organize information. With a rich and interactive graphical interface, consulting the next-generation music dictionary will be more like playing a computer game than thumbing an index.
The web is, at its foundation ("devoid of fancy graphics techniques", as Berners-Lee put it), a system of plain documents, references and hyperlinks, but it is becoming a multimedia environment, and its reference services will move in the same current. As the web drifts inexorably into all kinds of not-just-computers (mobile phones and interactive TVs, but other things as well), it will drag the web-enabled reference works into the same modalities. The web-based dictionaries, encyclopedias, gazetteers and atlases of the future will be multimedia chameleons so that they can be voice driven, hyperlinked, or image-accessed, as circumstances dictate.