The Peloponnesian War and the Future of Reference, Cataloging, and Scholarship in Research Libraries

By Thomas Mann

Prepared for AFSCME 2910
The Library of Congress Professional Guild
representing over 1,600 professional employees
www.guild2910.org

June 13, 2007

No copyright is claimed for this paper. It may be freely reproduced, reprinted, and republished.

Thomas Mann, Ph.D., a member of AFSCME 2910, is the author of The Oxford Guide to Library Research, third edition (Oxford and New York: Oxford University Press, 2005) and Library Research Models (Oxford U. Press, 1993).

The judgements made in this paper do not represent official views of the Library of Congress.

Abstract

The paper is an examination of the overall principles and practices of both reference service and cataloging operations in the promotion of scholarly research, pointing out important differences not just in content available onsite and offsite, but also among necessary search techniques. It specifies the differences between scholarship and quick information seeking, and examines the implications of those differences for the future of cataloging. It examines various proposals that the profession should concentrate its efforts on alternatives to cataloging: relevance ranking, tagging, under-the-hood programming, etc. The paper considers the need for, and requirements of, education of researchers; and it examines in detail many of the glaring disconnects between theory and practice in the library profession today. Finally, it provides an overview of the whole “shape of the elephant” of library services, within which cataloging is only one component.

What is involved in providing library service to the academic community? Is our purpose merely to provide “something quickly”? What, exactly, is wrong with promoting that end as our goal?
What is the role of reference work? How does library cataloging fit into a larger scheme of necessary services? What is the larger scheme of which cataloging is only a part? What should research instruction classes strive to cover? What is a good outline for a basic research class? Does anything need to be explained at all if our “under the hood” programming and federated searching capabilities are adequate? In short, what idea of “the shape of the elephant” of research, and of library resources as a whole, do we wish to convey to an academic clientele?

Users of public and special libraries have different needs; my concern in this paper is the future of research libraries. Much of what the latter do, of course, spills over into public and special library practices.

A wide range of important issues and distinctions is involved here:

• Differences in content available onsite and offsite
  - copyright restrictions on what can and cannot be digitized
  - digitized sources restricted by site licenses or password use
• Differences in search methods available onsite and offsite
  - the variety of search methods, beyond keyword access (e.g., controlled vocabulary searching, citation searching, related record searching, browsing classified book stacks, use of published bibliographies), available onsite: their different retrieval capabilities
• Differences between cataloging (conceptual categorization at scope-match level [1], vocabulary standardization within and across multiple languages, systematic linkage of categories) vs. relevance ranking of keywords, tagging, folksonomies, etc.
  - the need for search methods enabling recognition of relevant sources whose characteristics (and keywords) cannot be specified in advance
• Differences between scholarship and quick information seeking
  - relationships, interconnections, contexts, and integrations vs. isolated facts or snippets
  - the need for successive, sequenced steps (with feedback loops) vs.
    “seamless one-stop shopping”
• The problems of federated searching
  - misrepresenting the full contents and search capabilities of individual databases
  - masking the existence of non-included sources
• The inadequacy of the open Internet alone for scholarly research
  - its inability to provide overviews of “the whole elephant” (i.e., not showing all relevant parts, not distinguishing important from tangential, not showing interconnections or relationships, not adequately allowing recognition of what cannot be specified)
• The need for education of users, not just improvements in “under the hood” algorithms
  - education not just on how to use subject headings, but on how to do keyword searching itself
  - education on multiple search techniques other than keyword or subject-heading searching
• The need for increased one-to-one connections with reference librarians, not just the digitizing of more material for direct full-text searching
• The disconnects between library theory and practice
  - the assumption that library catalogs/portals should “seamlessly” cover “everything” to begin with
  - the assumption that library catalogs, or any other access mechanism, can operate efficiently without any prior instruction or point-of-use reference intervention
  - knee-jerk dismissals of enduring cataloging principles only because they originated in times of earlier technologies
  - disregard of the importance of vocabulary control and cross-referencing because it cannot be accomplished by algorithms
  - disregard of the significance of scope-match subject cataloging as the major solution to the problem of excessive irrelevant retrievals at the “granular” level
  - disregard of the importance of shelving books in classified order, on the assumption that everything relevant can be identified online
  - disregard of the extensive web of integral interconnections between LC subject headings and LC class numbers in providing access to book collections
  - disregard of the increased utility of precoordinated strings of subject terms, and
    catalog browse displays of them

The problem with any discussion of such issues lies in the complexity of their interrelationships. It’s like trying to pin down a warped piece of linoleum: flattening a bulge in one area immediately causes other bulges to pop up elsewhere. I cannot claim to have a system that flattens all the lumps, but I am concerned that many of the more important problems facing scholars are being ignored because a “digital library” paradigm puts blinders on our very ability to notice the problems in the first place.

I think the best way to clarify what I mean is to provide a concrete example, as a kind of central spine (I’m changing the metaphor) to which all of these issues are attached; I will discuss the various offshoot “ribs” as they arise in a real-world research situation. A major problem with much of the discussion in our profession these days is that many of us are indeed speaking from different paradigmatic frameworks. The only way to determine which is the better frame is to examine which one works best “at ground level,” i.e., which most readily enables the library profession to serve its scholarly clientele in ways that solve the full range of their problems.

Getting a researcher efficiently from what he or she asks for to what is available in a research library is a much more complex operation than most non-librarians realize; it is also more complex than too many library managers themselves seem to understand. Most of it cannot be done remotely through searching the open Internet, no matter how much under-the-hood programming underlies the utopian “single search box.” As the following example will illustrate, the work involved also escapes description in quantifiable or measurable terms; but when it is done properly it nonetheless makes an enormous difference to the quality of the research that gets done.
(It also justifies the expense of investing in costly resources that would otherwise be overlooked by most researchers, but which can indeed be brought efficiently to their attention.)

I am going to insist on differences between what I’ll call “scholarship,” on the one hand, vs. “quick information seeking” on the other. Obviously there is a spectrum of continuities between the two (no one disputes that), but there are also big differences that are too often swept under the rug. Scholarship requires linkages, connections, contexts, and overviews of relationships; quick information seeking is largely satisfied by discrete information or facts without the need to also establish the contexts and relationships surrounding them. Scholarship is judged by the range, extent, and depth of elements it integrates into a whole; quick information seeking is largely judged by whether it provides a “right” answer or puts out an immediate informational “brush fire.” Because of the range of elements involved, and the complexity of their integration, book formats are unusually important for scholarship (especially outside the hard sciences); more than any other medium, they allow an amplitude of coverage in ways that screen displays (especially of lengthy texts) make much more difficult to grasp.

For scholarly inquiries, the extent and depth of relationships matter; indeed, they are crucial to any judgment of the quality of the research product. Judging the result of a “quick information” search does not require an assessment of whether, or how successfully, it integrates the information discovered within larger expositions or narratives; the adequacy of an overall argument or survey does not arise in the same way it does in scholarly inquiries.
There is a tendency in much current library literature to conflate “knowledge” and “understanding” (levels of learning that require interconnections to be made) with “information”; but they must be distinguished.

The example: Tribute payments in the Peloponnesian War

A graduate student came into the reading room where I work and asked, “Where are the books on ancient Greece?” It was evident this was a new user who was not familiar with the closed-stacks policy of the Library of Congress. I explained that particular books or other resources had to be identified through subject searches in the computer system (or other sources) and requested through call slips. Equally important, I turned this explanation of the stacks policy into a reference interview which elicited the fact that what the student really wanted was information on “the system of tribute payments among the Greek city-states during the Peloponnesian War.”

The student said he had already done Google searches. Today, a search on “tribute” and “Peloponnesian” produces these results:

Google: 78,400 Web sites
Google Book Search [full texts of some digitized books]: 674 hits
Google Scholar [full texts of some digitized journals]: 2,030 hits

In each case, even months ago (when the retrievals were somewhat smaller), the student was overwhelmed with too much information: he “could not see the forest for the trees” or discern if he was finding the best relevant sources.
A search on Wikipedia turned up nothing right on the button, although it does have brief articles on the “Peloponnesian League” and “Peloponnesian War” that have the word “tribute” in them.

Most researchers (at any level, whether undergraduate or professional) who are moving into any new subject area experience the problem of the fabled Six Blind Men of India who were asked to describe an elephant: one grasped a leg and said “the elephant is like a tree”; one felt the side and said “the elephant is like a wall”; one grasped the tail and said “the elephant is like a rope”; and so on with the tusk (“like a spear”), the trunk (“a hose”) and the ear (“a fan”). Each of them discovered something immediately, but none perceived either the existence or the extent of the other important parts, or how they fit together. Finding “something quickly,” in each case, proved to be seriously misleading to their overall comprehension of the subject.

In a very similar way, Google searching leaves remote scholars, outside the research library, in just the situation of the Blind Men of India: it hides the existence and the extent of relevant sources on most topics (by overlooking many relevant sources to begin with, and also by burying the good sources that it does find within massive and incomprehensible retrievals). It also does nothing to show the interconnections of the important parts (assuming that the important can be distinguished, to begin with, from the unimportant).

In this Peloponnesian case, my thinking was, first, to try to guide the student to an intelligible overview of the relevant literature, so that he could indeed see “the whole elephant,” and not just “something” on the topic.
This is the most important function a reference librarian can serve in a large research library.

My first thought was of encyclopedia articles (rather than whole books or journal articles) because their very purpose is to provide concise overviews of topics, with manageably small bibliographies of highly-recommended sources (rather than printouts of “everything”). So I started by searching an obscure subscription database, Reference Universe, which indexes all of the individual articles in over 12,000 reference sources; it is particularly good in its coverage of specialized subject encyclopedias. (As with so many subscription services, the title of the source does not begin to convey what it can do; even if the reader, working on his own, did come across this title in the Library’s list of proprietary database subscriptions, he still would probably not have bothered to explore it.) The indexing in this file immediately identified an article on “Tribute lists (Athenian)” in a highly reliable source, The Oxford Classical Dictionary. This volume was right in the Main Reading Room reference collection; its article provided exactly the concise overview of the topic that the student wanted, without his knowing how to ask for it, or even that it was possible to ask for a concise overview. The article also mentioned at its end that “the standard work on the tribute records is B.D. Meritt, H.T. Wade-Gery, and M.F. McGregor, The Athenian Tribute Lists, 4 vols. (1939-53).”

Whenever there is a “standard work” on a topic, it is better to find this out sooner rather than later in the course of one’s research (as many grad students, myself among them, have discovered “the hard way”). Armed with this information, I showed the reader how to search the computer catalog for that standard work.
The LC cataloging record for the book then provided crucial information for the next step of the search: the record found through a known-item title search indicated that its most promising subject category is “Finance, public--Greece--Athens” (i.e., not “tribute” AND “Peloponnesian”). A search under this standardized LC subject heading retrieved a roster of directly relevant works whose keyword variations could never have been specified in advance:

Tribute Assessments in the Athenian Empire (1919)
Studies in the Athenian Tribute Lists (1926)
Treasurers of Athena (1932)
Athenian Financial Documents of the Fifth Century (1932)
Athenian Assessment of 425 B.C. (1934)
Documents on Athenian Tribute (1937)
Vorschlage zur Beschaffung von Geldmitteln, Oder, Uber die Staatseinkunft (1982)
Finances Publiques et Richesses Privees dans le Discours Athenian au Ve et IVe Siecles (1988)
Pathogene Syndroma sto Demosionomiko Systema tes Archais Athenas (1991)
Money, Expense, and Naval Power in Thucydides’ History 1-5.24 (1993)
Money and the Corrosion of Power in Thucydides (2001)
Poroi: A New Translation / Xenophon (2003)

Advantages of controlled vocabulary use

Note several things about this retrieval:

A) Again, not one of these titles would have been retrieved by a keyword search on “tribute” combined with “Peloponnesian” (let alone “ancient Greece,” the words initially used by the researcher before I did the reference interview).

B) The works found through an LC subject heading search in the Library’s catalog include both current and older works, from 1919 through 2003, together in the same set (not just recent, in-print works).

C) The works found through an LC subject heading search in the Library’s catalog also include both English and foreign language sources (German, French, and Greek) together in the same set, without the searcher having to specify any foreign language terms.
(I should note that this subject heading was not the only one relevant to the topic.)

D) The retrieval was of manageable size, not overwhelming.

E) The works identified were actually owned by the Library, immediately accessible without the delays of borrowing or interlibrary loan. (The Principle of Least Effort needs to be kept in mind: because sources that are readily available are more attractive than those requiring greater time or effort to secure, we need to make high-quality sources as readily retrievable as possible, while we continue to operate in the real world, where paper-copy books are essential to scholarship because copyright and site-license restrictions will never vanish; nor is it likely that future scholars will readily read 300-page texts online. If our goal is to promote scholarship, then “least effort” on the researchers’ part means “most effort” on our part, in our acquisition efforts, in creating high quality cataloging, in providing proactive reference service, and in assuring the long-term preservation of our material.)

F) Each of these books is substantially about the tribute payments; i.e., these are not just works that happen to have the keywords “tribute” and “Peloponnesian” somewhere near each other, as in the Google retrieval. They are essentially whole books on the desired topic, because cataloging works on the assumption of “scope-match” coverage; that is, the assigned LC headings strive to indicate the contents of the book as a whole. (Any single assigned heading may not, by itself, indicate the content of the entire work, but any heading will at least indicate the subject-content of a substantial portion of it. Scope-match cataloging aims to summarize the major overall content of a book, not its individual chapters or smaller subsections. It is the antithesis of “granular” level indexing, as provided by the book’s index pages or by keywords from the entire text.)
By focusing on these books immediately, the reader has no need to wade through hundreds of irrelevant sources that simply mention the desired keywords in passing, or in undesired contexts. The works retrieved under the LC subject heading are thus structural parts of “the elephant,” not insignificant toenails or individual hairs.

To change the metaphor for a moment, consider a mosaic picture of an elephant made up of thousands of small individual colored tiles. Keyword retrieval in a full-text database is like searching at the granular level for individual tiles; if you specify that you want all of the gray pieces (needed for the legs, sides, ears, tail) and all of the white pieces (tusks, teeth) they can indeed be retrieved together in one set. But searching at this level cannot retrieve the image as a whole with all of the parts properly interrelated; it cannot combine just some of the grays into legs or ears or tails, to the exclusion of other gray pieces that belong elsewhere. Nor can it exclude tiles from thousands of other entirely different pictures (rhinoceroses, skyscrapers, dirigibles), which are also retrieved because they happen to have gray and white pieces within their own makeup. For these purposes you need the equivalent of “scope match” cataloging, which both defines what “the whole” object is to begin with and sets conceptual boundaries on what is or is not a legitimate part of that whole. Within these scope boundaries various keywords (from titles, contents, or full texts) are contextually relevant, but outside of them the same words become irrelevant “noise.” Merely giving more weight to certain words tagged as metadata, so that they will be ranked by the software as more important within an overall keyword retrieval, will still not assemble an overall picture with any scope boundaries, or segregate structural from tangential elements within the picture, let alone separate the elements within the desired picture from the same elements appearing in entirely different pictures.
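
The contrast between granular keyword matching and scope-match retrieval can be sketched in a few lines of Python. This is only an illustrative toy, not a description of how any real catalog system works. The Record class, the two search functions, and the last record in the list (an invented title with the right words in the wrong context) are assumptions introduced for the example; the other titles and the subject heading are taken from the retrieval listed earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    title: str
    year: int
    subject_heading: str  # precoordinated string assigned by a cataloger


HEADING = "Finance, public--Greece--Athens"

CATALOG = [
    Record("Tribute Assessments in the Athenian Empire", 1919, HEADING),
    Record("Treasurers of Athena", 1932, HEADING),
    Record(
        "Finances Publiques et Richesses Privees dans le Discours "
        "Athenian au Ve et IVe Siecles",
        1988,
        HEADING,
    ),
    Record("Money and the Corrosion of Power in Thucydides", 2001, HEADING),
    # Hypothetical record (not from the paper): right words, wrong scope.
    Record(
        "A Tribute to Historians of the Peloponnesian War",
        1990,
        "Historians--Biography",
    ),
]


def keyword_search(records, *words):
    """Granular match: keep records whose title contains every word."""
    wanted = [w.lower() for w in words]
    return [r for r in records if all(w in r.title.lower() for w in wanted)]


def heading_search(records, heading):
    """Scope-match: keep records cataloged under exactly this heading."""
    return [r for r in records if r.subject_heading == heading]


if __name__ == "__main__":
    print("Keyword search on 'tribute' AND 'Peloponnesian':")
    for r in keyword_search(CATALOG, "tribute", "Peloponnesian"):
        print("  ", r.title, f"({r.year})")

    print("Subject heading search:")
    for r in heading_search(CATALOG, HEADING):
        print("  ", r.title, f"({r.year})")
```

In this toy data the keyword search returns only the invented off-topic title, while the heading search returns the four on-topic works, including the French one, without the searcher having supplied any of their title words.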
Pictures, of course, don’t contain cross-references to other illustrations; so here the analogy breaks down. But controlled-vocabulary LC subject headings, unlike mosaic tiles or keywords, are indeed linked to broader, related, and narrower terms to establish a road map of relationships to other conceptual headings. This mapping is frequently crucial to scholarly overviews, and it is not provided at all by “ranked” metadata terms, nor provided reliably by democratic tagging. Moreover, this cross-reference network itself functions in a way that refers users to other headings that are themselves at scope-match (rather than granular) conceptual levels, a level that is also lost when precoordinated LCSH subject strings are decomposed into their individual “facet” elements.

The point needs emphasis: some theorists have a knee-jerk aversion to scope-match subject cataloging because they unthinkingly regard it as simply a carry-over from card catalog days. (Cards could not provide granular-level access without making catalogs much too physically large.) What they apparently lack is any experience in dealing with actual researchers, for whom this level of cataloging solves the otherwise intractable problem of retrieving so much chaff with keywords that the whole books they want become buried indistinguishably in huge retrievals (e.g., Google Book Search’s 674 hits combining “tribute” and “Peloponnesian”). Keyword searching at granular levels “overshoots the mark,” as does faceted searching of LCSH elements that must be combined into wholes by searchers who barely know which keywords to enter in the first place, and who also often don’t know what the “whole” is until they recognize it in a precoordinated string. (Would any searcher working entirely on his own know that “Finance, public” needs to be chosen to begin with, and then combined with “Greece” and “Athens”?
As a reference librarian, I can say it is much easier to teach how to find the precoordinated string than to teach how to think up all of the individual facets that need to go into a Boolean combination.) Increasing the granularity of searching to keyword levels, and robbing LCSH “facets” of their conceptual contexts in precoordinated strings, are both practices that directly undermine the scope-match level of traditional indexing; but it is precisely this feature of cataloging that brings about the quick retrieval of the “elephant’s” structural parts (the whole books on, or substantial treatments of, the topic). These are the books readers want to find first, unencumbered by the clutter of thousands of irrelevant hits having the right words in the wrong contexts, outside the desired conceptual boundaries.

Note that neither I nor anyone else is arguing against granular levels of access being provided in addition to scope-match; it is the replacement of one by the other that is objectionable. We need both.

Scope-match cataloging hits the bull’s eye at the level of retrieval most needed for distinguishing structural from ephemeral relevance to a topic. While it is true that the subject-content of a book (or other record) as a whole can indeed be indicated by a combination of individual index elements (“Finance” AND “public” AND “Greece” AND “Athens”), researchers have much more difficulty thinking up all of the terms that go into such combinations; it is much easier for them to simply recognize strings that have already been combined. (“Least effort” is a reality; again, it’s easier for them on the retrieval end if we do more of the work on the input end.) Theorists who assert that simply “digitizing everything” eliminates the need for cataloging [2] evidently have minimal experience with the actual results produced by implementing their theory.
Full-text searching is indeed extremely valuable in many situations; but if a researcher wishes to get an overview of the important works on a topic, that kind of searching is positively counterproductive: it cannot segregate whole books from fragments of books, nor can it separate substantial treatments from trivial. It buries high and low quality sources in huge sets without the discriminations that users need. Granular access precludes overview perspectives unless librarians also provide alternative search mechanisms that solve the problems created by granularity.

G) The problem of keyword variations (see the list, above, of titles retrieved) would not have been solved by “throwing more keywords into the hopper,” i.e., so that words which don’t “hit” within titles (appearing on brief catalog records) can nonetheless be found because they do indeed “hit” within larger digitized full texts. In addition to erasing the necessary conceptual boundaries for determining the relevance of English-language hits (again, Google Book Search: 674 hits), the same keyword searches of English terms would fail to retrieve the relevant French, German, and Greek texts.

H) The catalog could assemble this group of highly-relevant resources, to begin with, because it makes direct use of the subject expertise of the professional catalogers who had previously brought about conceptual categorization of the relevant books in one grouping (under the standardized heading), and done it at the level of the book as a whole, through vocabulary control. A retrieval system based on controlled conceptual categorization of sources is radically different from one that relies on relevance ranking of keywords done by machine algorithms. The latter can take the words specified by a researcher and change the display-order of the retrieved results according to various criteria for weighting the keywords; but such a system cannot find, to begin with, keywords other than those specified.
(Claims for automated “query expansion” need to be examined skeptically; there is usually much “less there than meets the eye.” Demonstrations, as with this Peloponnesian example, are called for, rather than mere assertions lacking concrete examples.) We all need to be very skeptical of the phrase “relevance ranking” (“term weighting” would be more accurate) because it radically changes the very meaning of the word relevance. It entirely divorces its definition from the notion of conceptual appropriateness, across both variant expressions

