Information Access on the Wide Open Web
RLG's James Michalko discusses the issues surrounding the access and retrieval of scholarly information in today's environment of choice.
James Michalko is president and chief executive officer of RLG, a not-for-profit membership corporation of more than 160 universities, national libraries, archives, historical societies and other institutions that have notable collections for research and learning. RLG develops and operates information resources that address its members' shared goals for their collections. The organization was founded in 1974 and incorporated in 1975 by Columbia, Harvard and Yale Universities and The New York Public Library; it has pioneered cooperative solutions to the problems that research collections and their users face in the acquisition, delivery and preservation of information.
UBIQUITY: Let's start the conversation by noting that your organization's name was changed from Research Libraries Group, Inc. to RLG. What is that name change meant to accomplish?
MICHALKO: It's the way people commonly referred to the organization so it made sense. What's more, we were often thought of as a trade association, sometimes even being confused with the Association of Research Libraries, when, in fact, we were transforming ourselves around the issues rather than around the type of institution. We brought together archives, museums and libraries around the common issues of access to scholarly information. We preach that the best partners are not necessarily folks who look just like you; they are the institutions who serve the same clientele. We made a conscious effort to attract and find the common ground among cultural and memory institutions.
UBIQUITY: Is this bringing together of different kinds of institutions largely information-technology driven?
MICHALKO: They are all institutions that have a mission-specific charge to collect, preserve and make available materials to scholars and students. They all have similar challenges in the need to take advantage of the available information technology to meet that mission. Along those dimensions, they share a lot. However, we've discovered, as we get smarter about the differences between communities, some different emphases. The traditional museum community, for example, has an educational function but their priorities and obligations to a broad national or global constituency are very different from those of a research or national library of record. You sometimes have to get more granular about where the common interest is before you can get everybody going in the same direction.
UBIQUITY: What would that granular interest be?
MICHALKO: For instance, all of these communities have hatched their own sets of descriptive practices and standards for the different collections that they keep. That's great. But now that we find it necessary to interoperate, those institutional silos don't make a lot of sense to the target community. You end up having conversations about standards, descriptive practices and encoding practices from one community to the other. Everybody subscribes to the big good of providing broad access. When you get down to the details, you realize that for it to happen, you must honor the existing community practices and yet get people to see how they can work together and interoperate. So there's a little bit of a disconnect with the high level rhetoric. Yes, we all honor the same things but at a practical level, how do we do it? RLG tries to make a contribution at both of those levels and to make connections across these groups.
UBIQUITY: Are your communities -- libraries and museums and archives -- equally comfortable in the brave new world of information technology?
MICHALKO: I think that they are. Some have been a little bit slower to take full advantage but I think that's largely been a function of the resources available to them, as opposed to some sort of residual Luddite instincts.
UBIQUITY: Not too many years ago -- maybe two or three years ago -- there was a sustained conversation about the importance of keeping the card catalog. I haven't heard a word about it lately.
MICHALKO: No, you haven't and neither have I.
UBIQUITY: Does that mean that the battle has receded into history?
MICHALKO: I think it's gone. In 2001, in fact, we printed our last catalog cards. Printing catalog cards was an important part of our work in the early days. As the volume of catalog cards declined, we kept raising the price until they were several dollars each. But we still had institutions that were getting just one card to put into what is called a shelf list catalog which acted as a card replica of the entire library; one card for every book. We finally stopped doing it because we couldn't afford to buy the card stock at such low print volumes. We actually got thank you notes from the institutions that were still ordering cards. They said that they'd been waiting for somebody else to make them stop. I think libraries are fully committed to their databases and online catalogs.
UBIQUITY: Are those communities, to use a common phrase, "embracing technology"? I'm not sure how you embrace technology.
MICHALKO: Oh, I know how you embrace technology: you transfer resources in that direction and you make institutional commitments that you are going to deliver on your mission by being dependent on it in some way or another. All of them have done that. As I said, some are better at it than others, and some are further along. It's largely a function of resources. A state historical society might not be able to take advantage of technology for access to their materials as well as a major urban university library or a national research library. It's the same with museums. I think the biggest differences are content driven, not technology driven. People in the humanities and social science disciplines need access to huge quantities of content in order to do their work. We still haven't hit that threshold where the universe of materials available in an electronic form can sustain working in only that environment.
UBIQUITY: How long do you think that situation will last?
MICHALKO: Some very narrow disciplines already have the critical mass of material they need available to them in an electronic form, and that's where most of their work gets done. I think we'll progress slowly at a sub-disciplinary level until eventually you discover that you've got that threshold level of electronic content along with the tools to exploit it. You will be an electronic scholar.
UBIQUITY: What relationship does RLG itself have with the computer science community? And what relationship would it like to have?
MICHALKO: Some of the institutions that work with us, particularly university libraries, have strong and close connections with the CS people. That's a relatively small number of institutions; places like the University of Michigan and Stanford. The hard thing for the information people is that they have some real applied development challenges that don't always excite the CS people, who are by and large driven by a research imperative. I think there's an opportunity here for both the CS folks and the information institutions. They could be excellent partners in the areas of multilingual access, taxonomy applications and the development of the visualization of complicated information. That's one of the reasons that I think the whole set of cultural memory institutions represents an opportunity for the CS community. They'd be very willing partners. They've got complicated problems that make some of the industrial applications that people are working on seem pretty easy.
UBIQUITY: Give an example of one of those problems.
MICHALKO: Well, we hired a young woman out of industry when the dot-com bubble burst. We were talking in a meeting about recasting our big Union Catalog and somewhere about 90 minutes into the meeting, she said, "I'm sorry to interrupt, but I need to understand this. You mean you've got nearly 700 gigabytes of descriptive data that's all been structured in exactly the same way?" We said, "Well, yes, that's what a library does." And she said, "I've never heard of such a thing. You could do some really interesting forms of retrieval and presentation." She became very excited about what you could do with that data. That's one of the reasons that we are trying to make more connections with the CS community. Among others, there are access problems related to language, description, relatedness, vocabularies, etc. where we could really benefit from some interest on the part of the CS community.
UBIQUITY: What kind of involvement does RLG have with language computing?
MICHALKO: In the early days of computing in this country, the instinct was that a whole category of scholarship was being left behind because the alphabets, the vernacular characters, of certain languages couldn't be represented in the computing environments that were then around. This was not a trivial problem in higher education. The idea was that if you could bring automation into those environments that relied on Chinese, Japanese, Korean, et cetera, you would a) save money and b) not leave them behind as information access moved to the computing environment. We, as an organization, were involved very early on in that. I'm pleased to say that we were one of the founders of the Unicode Consortium, which defines and maintains the encoding for character sets. These capabilities are now leveraged in browsers and much of the everyday-computing environment. The kinds of things that were hugely difficult years ago ended up becoming part of the underlying infrastructure that nobody thinks about. I think there are more of those things out there that we could be attacking together.
UBIQUITY: What kinds of things could be done better?
MICHALKO: For instance, we could do hugely better on multilingual access. I ought to be able to issue a search in my native language and find materials that are relevant in other languages. We ought to be able to do this, given the progress that's been made in statistical analysis and computational linguistics. But we haven't applied those tools to this environment. I'd like to see knowledge tools like vocabularies, taxonomies, thesauri, dictionaries, etc. get exploited to really open up authoritative information for broad. I'd like art historians and social scientists to be as well served in the digital knowledge environment as practitioners in the sciences.
UBIQUITY: Twenty years ago a researcher who had a problem finding something would think first of asking a research librarian for help. Now, probably the first inclination would be to go to Google. What do you make of that?
MICHALKO: Well, a number of things. For daily use, the Internet wins hands down. But what about reliability, quality, trust and credibility? The library still gets winning points for those. Serious researchers still apply some of those trust, credibility and reliability quality filters to what they get from the Internet. But libraries are definitely becoming a secondary stop, relative to their role 20 years ago. Partly it's because they don't have a big presence in the daily Internet environment. I think that's a big issue. Is there a way for the institutions, who have trusted, credible authoritative information, to deliver it into the environment that is now the primary choice of their users? Can they do it in a way that is as pleasing and satisfying as the paradigms that emerged from the dot-com era? This is the challenge for libraries, museums and archives. They have an asset. They have a reputation. Their audience would like to have that in the environment of their choice. The environment of their choice is the open Web, the Internet.
UBIQUITY: Do the institutions that are members of RLG have active relationships with computer companies or Internet companies, such as Amazon and Google and so forth?
MICHALKO: I think most university libraries have relationships with the hardware and network vendors.
UBIQUITY: What about research and development partnerships?
MICHALKO: There are very few that I know of. There seems not to be much interaction with the library or the folks who provide information access. I don't think our community has commanded the attention of those primary destination places on the Net. It's hard for us to make a business case.
UBIQUITY: Would these companies be more interested if they knew what the memory community could offer them? For example, right now, they may think, why does anybody need taxonomies anymore?
MICHALKO: People need taxonomies so that they can dig out the specific and authoritative information that is not found using the current algorithms. It's one of the problems that we're hoping to crack with some of the work that we're currently doing between naïve vocabularies and authoritative vocabularies. For example, when professors put stuff on a Web site they use the authoritative vocabularies associated with their disciplines. So, a student who types the phrase "English Civil War" into a search engine such as Google will probably get some other kid's paper who used the phrase "English Civil War" when what he or she really needs is information on the United Kingdom, Great Britain, civil disorder, 1600s, and so on. People have come to be satisfied with what they get on the Internet by using naïve vocabulary. A lot of stuff that might be of interest to them doesn't get found.
UBIQUITY: How do you propose to fix the problem?
MICHALKO: Some firms, such as Amazon, have created algorithms and done the computational analysis that asks, "Did you really mean this?" and says "If this is what you want, then you will find the following things relevant." We must deliver authoritative, trusted information using those kinds of paradigms or we will simply become museums of long-term storage instead of current use. These are some of the ways in which the CS community could make accessible on behalf of the broad Internet community enormous amounts of wonderful resources that right now are either inaccessible or severely underused.
UBIQUITY: Let's switch to a hot topic nowadays and that's the general idea of intellectual property issues. What is your position about copyright and fair use and similar issues?
MICHALKO: Since we're not a trade group, we don't take positions on behalf of the community. However, one of the things that will continue to be a crucial driver in intellectual productivity and the progress of scholarship in the U.S. is fair use. It is crucial to the continuing intellectual vitality of the country to maintain that concept and practice in the electronic environment. We've been very careful as we work with institutions to ensure that whatever we do honors the fair use principle. Will the debate about digital rights management and digital rights management systems take on board the criticality of fair use? I don't know. Our communities have worked hard to be part of the discussion but they're not a particularly powerful force in that conversation. As an aside, at the last Coalition for Networked Information meeting that Cliff Lynch ran, he characterized digital rights management as the most deeply cynical and nonsensical phrase of the year.
UBIQUITY: Why did he say that the digital rights management phrase is nonsensical?
MICHALKO: He said that it's neither about rights nor about management. Systems are being proposed that structure the conversation on the basis of restrictions.
UBIQUITY: What could be proposed to change that?
MICHALKO: It's a bit on the fringe of things that I focus on. In a sidelong way, the Lessig challenge to the Digital Millennium Copyright Act did, in fact, emerge out of the library, archive, and museum communities. On the other hand, there's Sony and others building technology-based restricted systems. We're not going to have a voice at that level. We're only going to have a voice on the national policy and the legal framework side of things. I'm hoping that the various organizations that are represented in Washington will help make a difference.
UBIQUITY: What is the energy level or the morale level in the library and memory communities?
MICHALKO: Over the last three or four years there was a certain amount of hand wringing. That is to say, there was concern that we were not reaching the audiences that we ought to be reaching. But there wasn't much in the way of effective response. Recently, because we have hard data and good user study input from different levels and perspectives, people were starting to see how they could respond. Then, just as they were gaining momentum, the economic bubble burst. During periods of economic hard times the ability to finance change is compromised. The will and the energy level are focused now. Unfortunately, that third necessary element, the resources to make good on it, is a bit constrained right now.
UBIQUITY: Give an example of an idea that is energizing one of your communities.
MICHALKO: The museum community has certainly progressed in their understanding of why the Web and digital information are good for them. For a long time, they thought of the Web as a publishing medium where they would make money from showing images of what's in their collections. Now they realize that it's a way to build and maintain constituencies at a distance after people have visited and been part of a museum experience. They're starting to think about the physical experience as perhaps a centerpiece in a continuum of a relationship with a patron. That's very different and very energizing for museums. Unfortunately, the resources to build on these insights are really constrained. That's a bit depressing because things march on.
UBIQUITY: Is there any particular relationship you can think of that would be good between RLG and the ACM communities?
MICHALKO: Some years ago, I remember that one of the special interest groups -- the IR or information retrieval people -- had as part of their annual gathering an afternoon summit with the cultural community. They said, "Here's what we think is our big research agenda over the next number of years." Then the people in the cultural community shared their list of the things that they were concerned about and which they would love to progress. An interesting conversation ensued because much of the same stuff showed up on both lists, but in different priorities. It was a way to have the hands-on communities inform the research agenda, and for the researchers to find partners in pursuing their projects. I could see ACM and RLG providing opportunities to compare and integrate that would inform both the consumers and the researchers about their respective agendas.
Ubiquity welcomes the submissions of articles from everyone interested in the future of information technology. Everything published in Ubiquity is copyrighted ©2003 by the ACM and the individual authors.