Two Catalysts for Qualitative Change

Richard Snodgrass
October 1, 1999

Location  

  •     City and State, 2000 BCE
  •     Longitude, 1773 CE
  •     GPS + cell phone, 1999 CE

Confluences               

  •     Underlying technologies
  •     Highly accurate atomic clocks
  •     Geosynchronous satellites
  •     Advances in micro-circuitry
  •     Proliferation of cell phones
  •     Demonstrated need
  •     Catalyst: companies able to produce in quantity at low price
  •     Qualitative change

The Vision

The ACM Computing Portal

A web-based repository of bibliographic information    

  • contains information on all papers and books in the computing literature

  • contains a pointer to the digitized version, if available

Objectives

  •     Qualitatively increase the effectiveness of scientific research into computing
  •     Continue to place ACM as the premier scientific and educational organization for computing
  •     Increase service of ACM and the SIGs to the scientific community
  •     Provide a concrete illustration of the scope of computer science

Presentation

    Components

  •     Bibliographic Entries
  •     Abstracts and Keywords
  •     Full Text
  •     Citation Linking
  •     Demonstration
  •     Realizing the Computing Portal
  •     Revisit the components
  •     The Next Step

Step 1: Bibliographic Entries

  •     Collect all bibliographic entries from all computer science journals, conferences, workshops, technical bulletins, and books.
  •     Over the period from 1940 to 2000
  •     Approximately 1M entries
  •     Provide free searching on the web.
  •     Provide citations in multiple formats: HTML, BibTeX, refer, Word, ...
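
As an illustration of serving one entry in multiple citation formats, the sketch below renders a single record as BibTeX and as refer. The entry, its field names, and both helper functions are hypothetical and are not part of any ACM software; only the BibTeX `@article` syntax and the refer `%A`/`%T`/`%J`/`%D`/`%V`/`%P` field tags are standard.

```python
# Hypothetical entry; the keys and values are made up for illustration.
entry = {
    "key": "author1999example",
    "authors": ["A. Author", "B. Coauthor"],
    "title": "An Example Title",
    "journal": "Example Journal",
    "year": 1999,
    "volume": 1,
    "pages": "1-10",
}


def to_bibtex(e):
    """Render an entry as a BibTeX @article record."""
    fields = [
        ("author", " and ".join(e["authors"])),
        ("title", e["title"]),
        ("journal", e["journal"]),
        ("year", str(e["year"])),
        ("volume", str(e["volume"])),
        # BibTeX convention: page ranges use a double hyphen.
        ("pages", e["pages"].replace("-", "--")),
    ]
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields)
    return f"@article{{{e['key']},\n{body}\n}}"


def to_refer(e):
    """Render an entry in refer format, one %-tagged field per line."""
    lines = [f"%A {author}" for author in e["authors"]]
    lines += [
        f"%T {e['title']}",
        f"%J {e['journal']}",
        f"%D {e['year']}",
        f"%V {e['volume']}",
        f"%P {e['pages']}",
    ]
    return "\n".join(lines)


print(to_bibtex(entry))
print(to_refer(entry))
```

Keeping one neutral record and generating each format on demand is one possible design; it avoids storing the same entry several times.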

Step 2: Abstracts and Keywords

  •     Collect keywords, and later, abstracts, for all entries.
  •     Copyright restrictions on some abstracts?

Step 3: Full Text and Images

Collect full text of each available paper and book for

  •     searching
  •     developing classification maps and lexicons
  •     other analyses
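
One way full text could support searching is an inverted index that maps each word to the papers containing it. The sketch below is a minimal illustration under that assumption; the paper ids, the text snippets, and the function names are hypothetical, and a production search engine would add stemming, ranking, and phrase handling.

```python
import re
from collections import defaultdict

# Hypothetical paper ids and text snippets, made up for illustration.
papers = {
    "p1": "Wavelet transforms for image compression",
    "p2": "Query optimization in relational databases",
    "p3": "Image retrieval using wavelet signatures",
}


def build_index(docs):
    """Map each lowercased word to the set of paper ids containing it."""
    index = defaultdict(set)
    for paper_id, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(paper_id)
    return index


def search(index, query):
    """Return the ids of papers that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


index = build_index(papers)
print(sorted(search(index, "wavelet")))
```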

Step 4: Citation Linking

  •     Start with full text of paper's bibliography.
  •     Out linking: identify bibliographic entry of papers referenced by the paper
  •     In linking: identify bibliographic entries of papers referencing the paper
  •     Use for citation analysis, knowledge diffusion studies
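
The relationship between out linking and in linking can be sketched as inverting a mapping: once each paper's references are resolved to entries (out links), the in links follow by reversing every pair. The paper ids and the `invert` helper below are hypothetical, and the sketch assumes the hard part, matching bibliography text to entries, has already been done.

```python
from collections import defaultdict

# Hypothetical out-links: paper id -> ids of the papers it references.
out_links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": [],
}


def invert(links):
    """Build in-links (cited paper -> citing papers) from out-links."""
    in_links = defaultdict(list)
    for citing, cited_list in links.items():
        for cited in cited_list:
            in_links[cited].append(citing)
    return dict(in_links)


in_links = invert(out_links)

# A simple input to citation analysis: how often each paper is cited.
citation_counts = {paper: len(in_links.get(paper, [])) for paper in out_links}

print(in_links)
print(citation_counts)
```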

Demonstration

Papers with wavelet:

Stage 1: Bibliographic Entries

Propose that each SIG be responsible for collecting
relevant entries.

  • ensure completeness, based on SIG interests
  • reduce overlap between SIGs
  • ensure correctness

Software for data entry, validation, and conversion provided to SIGs

1M entries / 36 SIGs = roughly 30K entries per SIG

  • e.g., SIGMOD: approximately 50K entries

Many resources

  • DBLP: 130K entries
  • Propose that ACM donate the ACM Guide to Computing Literature: 200K entries
  • Collection of Computer Science Bibliographies: 930K entries

Stage 2: Keywords and Abstracts

  • Propose that SIGs collect these.
  • May need copyright permission, negotiated by ACM HQ
  • Collection of CS bibliographies has 100K abstracts

Stage 3: Full Text

Propose SIGs fund populating full ACM Digital Library.

  • PDF files containing encapsulated TIFF and OCRed full text
  • 99% accuracy
  • $1.25 per page.

Alternative: convert to SGML or XML at 99.9% accuracy for $8-$10 per page.

Populating the ACM DL

  •  Journals: 130K pages: $200K
  • Conference and workshop proceedings: 500K pages: $600K
  • Newsletters: 200K pages: $250K
  • Total: 850K pages at $1050K
  • $30K per SIG
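
The figures above can be checked with a short calculation, assuming the $1.25 per page rate and the 36 SIGs quoted elsewhere in this presentation. The variable names are illustrative. The slide's dollar amounts appear to be rounded, and the itemized page counts sum to 830K rather than the 850K stated, so the computed totals come out slightly lower.

```python
RATE_PER_PAGE = 1.25  # dollars per page, from the Stage 3 estimate
NUM_SIGS = 36         # from the Stage 1 estimate

# Page counts as listed on the slide.
pages = {
    "journals": 130_000,
    "proceedings": 500_000,
    "newsletters": 200_000,
}

cost = {kind: count * RATE_PER_PAGE for kind, count in pages.items()}
total_pages = sum(pages.values())
total_cost = sum(cost.values())
per_sig = total_cost / NUM_SIGS

for kind, dollars in cost.items():
    print(f"{kind}: ${dollars:,.0f}")
print(f"total: {total_pages:,} pages, ${total_cost:,.0f}")
print(f"per SIG: ${per_sig:,.0f}")
```

Under these assumptions the per-SIG share is a little under $29K, consistent with the rounded $30K figure.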

Stage 3: Full Text, cont.

ACM papers: 850K pages, or about 50K papers

  • This represents 5% of the total of 1M papers.

ACM books: obtain full text from publishers.

For remaining conference proceedings,

  • Offer full CD-ROM package at cost in exchange for inclusion in the CD-ROM and use of full text for searching.
  • Pay for digitization out of conference profits
  • e.g., IEEE ICDE: 600 pages x 17 years x $1.25 = $13K.
  • SIGs pay for integration: $0.25 - $0.50 per page.
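
The ICDE example can be worked through as follows, reading "600 pages x 17 years" as roughly 600 pages of proceedings per year (an assumption) and applying the integration range of $0.25 to $0.50 per page to the same page count. The variable names are illustrative.

```python
PAGES_PER_YEAR = 600       # assumed average proceedings length
YEARS = 17
DIGITIZATION_RATE = 1.25   # dollars per page
INTEGRATION_RATES = (0.25, 0.50)  # dollars per page, low and high

total_pages = PAGES_PER_YEAR * YEARS
digitization_cost = total_pages * DIGITIZATION_RATE
integration_low = total_pages * INTEGRATION_RATES[0]
integration_high = total_pages * INTEGRATION_RATES[1]

print(f"pages: {total_pages:,}")
print(f"digitization (conference): ${digitization_cost:,.0f}")
print(f"integration (SIG): ${integration_low:,.0f} to ${integration_high:,.0f}")
```

The digitization cost of $12,750 matches the slide's rounded $13K; the SIG's integration share for the same proceedings would fall between $2,550 and $5,100.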

Stage 3: Journal Papers

For other journals,

  • Same offer as with conferences
  • Or, offer URL into their DL in exchange for full text, only for searching
  • ACM Computing Portal provides valuable entry into their DL, enhancing their revenue stream.

For other books, make same offer.

Open Architecture

  • Free searching via web interface, including full text search, at ACM site and SIG portals
  • Bibliographic data available for other search engines
  • As much PDF available for free as possible
  • Encourage digitization of corpus

Summary

The ACM Computing Portal

  • Free searchable access to the entire computer science corpus
  • SIG-specific portals
  • Fully populated ACM DL
  • Inclusion of or portal to other DL resources
  • Capability to purchase papers and to register queries
  • Possibly ancillary SIG-provided benefits, such as CD-ROMs

SGB Portal Committee

  • Rick Snodgrass (University of Arizona, CS), chair
  • Steve Cunningham (Cal State University-Stanislaus, CS)
  • Mary Fernandez (AT&T Labs)
  • Carol Hutchins (Courant Institute of Math. Sci. Library)
  • Bob Krovetz (NEC Research Institute)
  • Michael Ley (University of Trier, CS)
  • Andreas Paepcke (Stanford University)
  • Kathy Preas (KP Pubs on CDROM)
  • Charles Viles (Univ. of North Carolina, Info and Lib Sci)

Individual SIG Commitments

  • Collect and capture SIG-relevant bibliographic entries, abstracts, and keywords, in appropriate format.

Allocate funds to populate the ACM DL: journals, conference and workshop proceedings, SIG newsletter.

  • Roughly $30K for each SIG
  • SIGDA matching funds: $50K

Negotiate with steering committees of associated conferences and workshops.

ACM HQ Commitments

  • Donate entries from ACM Guide to Computing Literature.
  • Negotiate cross-use agreements with associated societies.
  • Acquire full text of books copyrighted by ACM.
  • Provide hardware and software to host CSP.
  • Provide staff to manage CSP, with content provided by SIGs.

ACM HQ Opportunities

  • Integrate CSP with CoRR
  • Provide print and CD-ROM versions of the expanded ACM Guide to Computing Literature
  • Fully populated DL
  • Increased visibility of ACM

Confluences

Underlying technologies

  • Inexpensive scanning, OCR, and disk space; inexpensive, high-capacity CD-ROMs

Demonstrated need
Catalysts: ACM Council and SIG Governing Board
Qualitative change
 
