NASSR '96 Seminar on Electronic Texts and Textuality
Hyping the Hypertext: Scholarship and the Limits of Technology
Ashton Nichols, Department of English, Dickinson College
Amid current conversations about the profound impact that hypermedia, electronic texts, and the Internet will have on scholarly activity, less attention has been focused on the precise nature of the problems confronting those who design, implement, and make use of electronic texts and archives. I would like to offer some reflections on the hardware and software limitations of hypermedia and electronic texts generally, presented from the vantage points of producers and users of these technologies. While promising legitimate wonders for the future, the current states of these technologies pose important questions and concerns for scholars, teachers, and students. My goal throughout this essay is to provoke discussion of these issues, not to offer single-voiced or even settled answers. I will focus on three categories of potential promise and peril: hardware, software, and the rationale of hypertext.
I. Hardware and the problem of finance
I teach at an institution that is now equally divided between PC and Mac platforms for computer use. I work with colleagues who range from software-designing wizards to unrepentant legal-pad Luddites. I have experienced a network crash in the middle of a classroom presentation ("The Alpha is down!"), and I have seen a PowerMac literally start to smoke when it was switched too often and too quickly from one platform to another. On another occasion, I helped a computer technician rip a network connection from a classroom wall so that we could link to a database for a student bibliographic exercise. All of these events occurred in the span of one semester's work trying to link hypertext resources with classroom teaching. These anecdotal incidents suggest some of the most basic sorts of technical problems that will continue to confront producers and consumers of new computer technologies.
Networking costs are astonishingly high for most institutions. Capital expenditures are complicated by cost-versus-service anxieties. Fiber optic cable, the current standard, costs ten times as much as traditional copper wire. As the feature-sets on electronic networks expand, the cost rises exponentially. My institution currently sends out 10 million bits per second via an Ethernet connection onto a 1.5 million bits per second feed to the Internet that is shared with a number of servers. That's a nearly sevenfold traffic jam of information processing at the point of entry. As a result of such differences in scale and capacity, the Internet backbone is now, in the words of one technician, "a back country road trying to accommodate several superhighways." In addition, desktop systems that were considered standard at my institution two years ago are now piled up in a warehouse, for sale for $25 (that is to say--obsolete). In the absence of T1 phone lines--my institution purchased a restricted one only with the aid of external funding--such problems will continue to confront networked users. Server access remains a problem, financial and geographic, for all those scholars, readers, and students without access to a university or college server.
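The scale of that bottleneck can be checked directly from the two figures quoted above. The following is a minimal sketch in Python; the constant and function names are illustrative assumptions, not anything in use at the institution described.

```python
# Figures quoted in the paragraph above; the names are hypothetical.
LAN_BITS_PER_SECOND = 10_000_000    # campus Ethernet connection
FEED_BITS_PER_SECOND = 1_500_000    # shared feed to the Internet


def oversubscription_ratio(lan_bps: float, feed_bps: float) -> float:
    """Return how many times more traffic the local network can offer
    than the outbound feed can carry."""
    if feed_bps <= 0:
        raise ValueError("feed_bps must be positive")
    return lan_bps / feed_bps


ratio = oversubscription_ratio(LAN_BITS_PER_SECOND, FEED_BITS_PER_SECOND)
print(f"The LAN can offer about {ratio:.1f} times what the feed can carry.")
```

The ratio ignores the other servers sharing the feed, so the congestion experienced by any single user would be worse than this figure suggests.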
In a course I am currently teaching on a Mellon grant, we have had ongoing difficulty getting three campuses to agree on a mutually usable system for interactive video teleconferences. After frustrations with Sun UNIX workstations (small video image, poor audio quality), elaborate demos of Picture-Tel and V-tel systems, and a brief flirtation with something called the M-Bone on the Internet, we have retreated to the original Suns with only slightly enhanced video capacity, we hope. Hardware problems much less complicated than ours will only be solved when the box we use is as universal as a television and the per-unit cost reaches the level of other standard appliances (TV, VHS, CD player, dishwasher, dryer: all $300-500). Ordinary desktops with basic options still run $2,000 by the time we are printing and modem-linked. We are now promised cheap and almost "empty" boxes that will be simply the conduits for vast amounts of Internetted data of all kinds, but desires for institutional storage, printing, and instant personal access of various kinds will probably limit the universality of real-time online systems.
Finances will only stop being a problem on campuses when colleges and universities are fully networked with each other and when band-width compression technology has improved beyond current standards. By then, of course, the demand for fully interactive Web sites and audio-visual storage will have become a new problem for institutional budget crunchers.
II. Software and the problems of readability
We must have a language that will allow us to format, tag, and enter documents in the precise ways we want and need. We currently have languages and protocols that allow us to do many things with texts, but they are sometimes limited to specific hardware, software, servers, or networks. SGML (Standard Generalized Markup Language) looks like the current best option, because it is an umbrella language that can be adapted for a wide variety of uses. HTML is still lodged in the minds of most early and ongoing workers on the Web as the subset of SGML most useful and easily employed for primary data functions. HTML is not technically a programming language (nor is it a program); it is a single document type, specified by a document type definition (DTD), within SGML. The HTML standards are themselves constantly being revised to account for new extensions and add-ons. At the same time, HTML is becoming an increasingly invisible technology, swallowed up by a range of editors and templates that efface the commands that allow composition, mark-up, and tagging to proceed. SGML, by way of HTML, has received widespread support and development from within the humanities community. But it by no means solves all the problems of attribute handling within documents or other electronic data. Manuscript documents containing the word "Shakespeare" (much less its variant spellings), or the eighteenth-century long "s" that looks like an "f," are hard to mark up and sometimes unsearchable. Graphics of all kinds, much less audio and video files, produce obstacles to any electronic research that would represent itself as complete. Very different results are obtained from key-punched and scanned information. We will clearly need enhanced languages for markup and encoding; the current question is whose language, enhanced for what purposes. The archivers of newspapers from the nineteenth century have very different requirements than the indexers of scholarly essays and articles.
The researcher of Aztec shards places different demands on hyperspace than the cyberpunk novelist or the Renaissance bibliographer. Full text documents are always hard to read on the screen and usually slow to encode.
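The searchability problem raised above, with variant spellings of "Shakespeare" and the long "s" of older printing, can be illustrated with a minimal sketch in Python. Everything named here (the VARIANTS table, normalize, find_name, the sample sentence) is a hypothetical illustration; a real project would draw its list of variants from its own editorial records and would likely record them in its markup rather than in a program.

```python
import re
from typing import List

# Hypothetical table of variant spellings, keyed by a canonical form.
VARIANTS = {
    "shakespeare": ["shakespeare", "shakspere", "shakespear", "shakspeare"],
}


def normalize(word: str) -> str:
    """Replace the long s (U+017F) with a modern s, then lower-case."""
    return word.replace("\u017f", "s").lower()


def find_name(text: str, canonical: str) -> List[str]:
    """Return each word of text, as printed, that matches a known
    variant of the canonical name after normalization."""
    wanted = set(VARIANTS.get(canonical, [canonical]))
    words = re.findall(r"\w+", text)
    return [w for w in words if normalize(w) in wanted]


sample = "Mr. William Shak\u017fpeare, or Shakspere, as \u017fome write it."
print(find_name(sample, "shakespeare"))
```

The sketch returns the words as they appear in the source, so the historical spelling is preserved while the search still succeeds. A scanned text in which the long "s" has been misread as "f" would need those misreadings added as further variants.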
As it now stands, we find HTML zealots, faithful to the print-media model with which SGML seems to work best, competing with free-form "razamataz" HTML-extenders, whose extensions (blinks, moody backgrounds, and the like) are unreadable by certain browsers. The result is a tension between aesthetics and information transfer. Readability is a function of the signal as sent, but it is nevertheless still in the eye of the beholder. Do I want my document to look slick or to work well? Can I combine the two demands without sacrificing either? A similar conflict produces a trade-off between searchability and historical accuracy. Do I need variable fonts, or should I produce easily dissectable photographic images to accomplish the same task? Humanities scholars in particular are pressing to make HTML more sophisticated so that it can deal with a wider range of data: documents, photos, audio files, and films. The complete SGML has a much wider application than HTML; it defines, for example, military standard documents (Milspec) that have thousands of pages of specifications alone. Do we stick with an expanded HTML or move (and this is a gross oversimplification) to the wider superset that is SGML? The answer to such a question is almost entirely a function of our particular and individual needs on the Net.
While researching this essay, I contacted a colleague who is pursuing a graduate degree in computer science and asked him for some comments from the perspective of programming language. His response was revealing:
What everyone now wants are 'interactive' Web pages, accomplished via Java (which is pure programming, plain and not so simple) or Perl, a Unix 'scripting' language with nothing simple about it. If any of this were demonstrated on a single system--as opposed to a distributed (internetworked) environment--or even a local area net, people would think it was quite primitive. We have grown up with disparate systems that didn't communicate well, and now we're getting them all to talk. This is something of a feat, but the Web and HTML are really very primitive from a technological standpoint. Computer scientists will complicate the system as soon as they can get it out of the hands of humanities types. [The system] will then do more, but you will have to hire [computer scientists] at a big fee to get it done.
So what looks like a remarkable development from the consumer side of the equation is seen as "primitive" in the eyes of the program developers. As a result, the skills and equipment in which many of us are now investing time and energy will remain two or three steps behind the curve of the programmers and software developers. At the same time, of course, only the easily reproducible and teachable technologies will ever gain widespread appeal or application. As my programming friend noted: "it's that very simplicity and accessibility that have allowed the Web to get so big so fast. Just like Fax machines, [the system is] accessible and useable without a lot of training or investment of time. If you had to program [any resource you are using on the Internet] in C++, or had to direct someone else to do so, you would never have even started." At the same time, as is clear to those now involved in Web work and tagging schemes, these "visible technologies" (where we actually see the codes required for storage, transfer, and retrieval of data) are becoming increasingly transparent themselves (i.e. replaced by text editors). HTML might disappear altogether with a new generation of users who don't want to expend the time and energy required to learn complex mark-up technologies that may be quickly obsolete.
As I write this, increasing numbers of information content-providers are outsourcing HTML projects to workers offshore and in economically depressed rural areas. Data encoded, or merely keyed in, by seven-dollar-an-hour workers will have very different status as research material than material encoded by graduate students or full professors. Likewise, while certain technology workers may lack academic acumen, full professors and their minions often lack technological sophistication or rudimentary data-basing skills. Academic computing is constrained by a whole series of factors--discipline, rank, promotion, institutional support--that have no specific connection to the wider computing world. The desire to make money is driving an ever-increasing percentage of activity on the Net (AOL, Netscape, Java, faster modems), but the most significant impacts on humanities computing are likely to come from much more obscure research sources (grouped networks, band-width compression, and back-up storage).
One additional anxiety has to do with the process of coding itself. Under current conditions, the human link-maker (or coder) becomes a new guardian of knowledge. Like editors and publishers of old, the flesh-and-blood individuals who determine codes, tags and links to documents, audio files, and visual images are now the new gatekeepers of knowledge. The tag, code, or link becomes a gate through which information must pass. At the same time that such links are emerging as a new form of knowledge ("what link allowed me to get to this point?"), high walls are going up around proprietary data. My server won't locate some of the URL addresses I enter. Certain addresses are on restricted access when I arrive; others cost money that has not been provided for by my project. Likewise, the incessant flood of data is increasingly controlled by servers almost as fast as it is generated. This proprietary control is exercised by a variety of interests: financial, educational, ethical, and sometimes suspect. At the same time, environments like MUDs (Multi-User Dungeons) and MOOs (MUDs: Object Oriented) promise completely new hyperspace environments for the transmission and reception of humanities data. Imagine where most of us were ten years ago (typewriters, Commodores, Kaypros, thundering printers); that image provides some sense of where we may be five years from now.
III. The Rationale of Hypermedia: Information as a Sort of Solution
I draw my title for this section from Jerome McGann's recent essay on "The Rationale of Hypertext." McGann believes, perhaps not surprisingly, that "a fully networked hypermedia archive would be an optimal goal." Of course, such a complete archive presents the same practical problems as those posed by the birth of the Net itself: hundreds, perhaps thousands, of scholarly and not-so-scholarly workers laboring in a staggering array of private and institutional contexts to prepare, present, and inter-relate texts, images, audio, and video files. The mind already boggles when it attempts to negotiate a single straightforward search ("Thomas Hardy and nature poetry" was my most recent example) on Alta Vista, Lycos, Webcrawler or some other search engine. "James Joyce," almost regardless of how I limit the search, brings up lots of information in which I have no interest whatsoever. The newest version of Netscape available to me (3.0) provides a fairly sophisticated search engine as part of its URL window. If I mistype or don't know a precise URL (some now running to dozens of characters), Netscape searches its own database of http-indexed tags and brings forward all occurrences of the term "Hardy" or "Joyce." My view is that information itself will continue to be a key to many of these sorts of problems. As we learn more, we will also learn more of what we need to know: this index was carelessly produced; this tagging was done by someone with little knowledge of the field; this content provider is more concerned with slick graphics than with careful editing. As this kind of information becomes available and understood by users of hypertexts, scholars will be more and more able to make effective use of a wide range of new data. The "hypercontext" will become as important as the hypertext.
Our caveat will always be an old one: data is finally produced, transferred, organized, interpreted, retrieved and assessed by human beings--more or less accurate, more or less fallible--even when sophisticated tools make it seem as though the human is just the ghost outside the machine.
McGann is right to point out that we need hyper editing projects designed with the "largest and most ambitious goals" in mind, rather than those keyed to languages or technologies that may prove obsolete in the short term. We also need to be as flexible in text design as possible, seeking logical mark-up and digitized imagery that will remain stable over time in a variety of media but which can also be reformatted and accessed from future platforms. We also need thoughtful link-makers and coders who can leave more rather than fewer options open within text and other files, who can provide for the widest possible access to information that is relevant to particular tasks. Consider an example based on a single text. I want to read Wordsworth's "Lines composed a few miles above Tintern Abbey" on the Web; someone else wants to word-search the poem. A third scholar wants to scan this text and ten thousand other documents related to it. Yet another researcher wants audio and video files that link to various versions of "Tintern Abbey" and to extensive information about Wordsworth's life. A final scholar wants to correct the errors that are lodged in the various versions of this poem that are currently available in hyperspace. All of these readers and scholars are placing legitimate demands on hypertexts, but all of them have varying hardware, software and server-access requirements. Such multi-tasked uses will be a major challenge for the next generation of programmers and content providers. How do we organize the current library (or is it a landfill?) of information in ways that will make it most effective and useful to those it is designed to serve? How do we tag, mark-up, and classify data sources in the most open-ended and yet useful ways? How do we prevent valuable resources from remaining hidden or inaccessible to all but a few elite users?
How do we guarantee access to those who need and want information while constraining those who might damage, destroy or "misuse" the same materials? Should we maintain any form of control over information once it exists in hyperspace? Can we maintain such control?
In conclusion, let me say that I am not pessimistic about any of these issues. The "library" metaphor, as McGann notes, is the one that workers in the humanities should probably employ, albeit with slight qualifications. A library is a library no matter how many books it contains. It can be a superb library even if it has no books about cookery, or Cleveland, or the French Revolution. It can also be useful if it contains extensive holdings of still pictures, edited films, and "raw" video footage, or if it doesn't. The "ultimate" library of the future will be a linkage of all the world's libraries, text and hypertext, a library that translates documents from one language into another with accuracy and cultural nuance, a library that allows patrons to shift from verbal text to static imagery to full video and back again effortlessly, a library that has every text available to every patron at the precise moment it is needed. Those of us now at work in an early version of such a library are like the literate and semi-literate Europeans who gathered around platen presses in the second half of the fifteenth century, watching the first hand-printed sheets lifted off the presses, thinking to themselves--"this is wonderful, but when will they get to the book I want," or "this is astonishing, but why doesn't it work faster," or "look at that, but what is it good for?" Until we reach the (unreachable?) Platonic hypermedia archive, we should be legitimately astounded with resources already at hand: Gutenberg, Bartleby, The Shuttle, Perseus, The Oxford Text Archive, Romantic Circles, SETIS, CURIA, RECALL. These terms and acronyms have new and powerful roles in the critical lexicon of humanities research. We should not despair that the human limits of all technology--lack of funds, carelessness, confusion, deceit--will render our new world of information any less useful than the old one. 
Nor should we allow any of the problems associated with hypermedia software or the scholarly limitations of electronic media to be reasons for not getting on with our work.
Useful sources of information for issues discussed in this paper:
Let me add a final word about electronic contact and information sharing. We have, on the one hand, a sense of millions of terminals linking technical wizards via instant access and, at the same time, a feeling of isolation and exclusion on the part of many scholars at "lower" levels of familiarity with these tools and technologies. Some of us are afraid to reveal our ignorance of tools we are already using; others sling acronyms and techno-jargon with an ease that borders on intimidation. There are numerous resources easily at hand. The Center for Electronic Texts in the Humanities (CETH) at Princeton and Rutgers as well as the Institute for Advanced Technology in the Humanities (IATH) and the Electronic Text Center (ETC) at the University of Virginia are providing useful on-line information for readers, learners, teachers, and scholars who want to take advantage of the new resources available to them. The Association of College and Research Libraries (ACRL) Electronic Text Centers Discussion Group and the Association for Library Collections and Technical Services (ALCTS) of the American Library Association (ALA) also have valuable discussion groups and resources to offer. Other useful information comes from the Text Encoding Initiative (TEI), The Consortium for the Computer Interchange of Museum Information (CIMI), and the University of California Encoded Archival Description (EAD) project. See also the 1996 CETH Summer Seminar Participants list, which includes individuals engaged in marking up, editing, and establishing a wide range of electronic texts and databases.
Web sites of useful information:
General background to SGML from the Institute for Advanced Technology in the Humanities: http://www.iath.virginia.edu/elab/hfl0109.html
Jerome McGann's "The Rationale of HyperText," a manifesto with case studies: http://jefferson.village.virginia.edu/public/jjm2f/rationale.html
I am grateful to Robert Cavenagh, Director of Instructional Media at Dickinson College; Neal Baker, Librarian at Dickinson; Mike Hite, Network Manager at Dickinson; and Jack Cowardin, of the graduate program in Computer Science at William and Mary College, for assistance and advice in the preparation of this essay. Remaining errors, and technological glitches, are entirely my own.