Improved techniques for processing queries in full-text systems
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 10th annual international ACM SIGIR conference on Research and development in information retrieval
New Orleans, Louisiana, United States
Pages: 306 - 315
Year of Publication: 1987
ISBN:0-89791-232-2
Authors
Y. Choueka
Inst. for Information Retrieval and Computational Linguistics (IRCOL), The Responsa Project, and Department of Mathematics and Computer Science, Bar-Ilan University, Ramat Gan, Israel; on sabbatical leave at Bell Communications Research, Morristown, New Jersey, USA
A. Fraenkel
Department of Applied Mathematics, The Weizmann Institute of Science, Rehovot 76100, Israel
S. Klein
Department of Applied Mathematics, The Weizmann Institute of Science, Rehovot 76100, Israel
E. Segal
Inst. for Information Retrieval and Computational Linguistics (IRCOL), The Responsa Project
Bibliometrics
Downloads (6 Weeks): 3, Downloads (12 Months): 14, Citation Count: 8
ABSTRACT
In static full-text retrieval systems that support metrical as well as Boolean operators, the traditional approach to query processing uses a “concordance”, from which large sets of coordinates are retrieved and then merged and/or collated. Alternatively, in a system with l documents, the concordance can be replaced by a set of bit-maps of fixed length l, one constructed for each distinct word of the database, which serve as occurrence maps. We propose to combine the concordance and bit-map approaches, and show how this can speed up the processing of queries: fast ANDing and ORing of the maps in a preprocessing stage leads to large I/O savings when collating the coordinates of the keywords needed to satisfy the metrical and Boolean constraints. Moreover, the bit-maps give partial information on the distribution of the coordinates of the keywords, which can be used when queries must be processed in stages because of their complexity and the sizes of the sets of coordinates involved. The new techniques are partially implemented at the Responsa Retrieval Project.
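The combination described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy `concordance`, the helper names, and the particular metrical constraint (one word following another within `max_dist` positions) are all assumptions made for illustration. Each word's bit-map is held in a Python integer, the maps are ANDed first, and coordinates are collated only for the documents that survive that prefilter.

```python
from bisect import bisect_left

# Hypothetical toy concordance: word -> {document id -> sorted word positions}.
concordance = {
    "fast": {0: [3, 17], 2: [5], 3: [1]},
    "query": {0: [4], 1: [9], 3: [40]},
}
NUM_DOCS = 4  # plays the role of l, the number of documents


def build_bitmap(word):
    """Occurrence map: bit d is set iff `word` occurs in document d.

    A real system would precompute and store these maps; building them
    on the fly here only keeps the sketch short.
    """
    bits = 0
    for doc_id in concordance.get(word, {}):
        bits |= 1 << doc_id
    return bits


def docs_in(bitmap):
    """List the document ids whose bit is set in `bitmap`."""
    return [d for d in range(NUM_DOCS) if (bitmap >> d) & 1]


def within_distance(word_a, word_b, max_dist):
    """Documents where word_b occurs 1..max_dist positions after word_a."""
    # Cheap prefilter: only documents containing both words can match.
    candidates = build_bitmap(word_a) & build_bitmap(word_b)
    hits = []
    for d in docs_in(candidates):
        # Coordinates are fetched only for the candidate documents.
        pos_a = concordance[word_a][d]
        pos_b = concordance[word_b][d]
        for p in pos_a:
            i = bisect_left(pos_b, p + 1)  # first position of word_b after p
            if i < len(pos_b) and pos_b[i] <= p + max_dist:
                hits.append(d)
                break
    return hits


print(within_distance("fast", "query", 1))
print(within_distance("fast", "query", 50))
```

In this toy data the ANDed map leaves only documents 0 and 3 as candidates, so the coordinate lists for documents 1 and 2 are never read; that skipped work stands in for the I/O savings the abstract refers to.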
CITED BY 8
A. Bookstein, S. T. Klein, T. Raita, Detecting content-bearing words by serial clustering—extended abstract, Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval, p.319-327, July 09-13, 1995, Seattle, Washington, United States