Searching for historical word-forms in a database of 17th-century English text using spelling-correction methods
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
Copenhagen, Denmark
Pages: 256-265
Year of Publication: 1992
ISBN: 0-89791-523-2
Authors
Alexander M. Robertson, Department of Information Studies, University of Sheffield, Western Bank, Sheffield, UK, S10 2TN
Peter Willett, Department of Information Studies, University of Sheffield, Western Bank, Sheffield, UK, S10 2TN
ABSTRACT
This paper discusses the application of algorithmic spelling-correction techniques to the identification of those words in a database of 17th-century English text that are most similar to a query word in modern English. The experiments have used n-gram matching, non-phonetic coding and dynamic programming methods for spelling correction, and have demonstrated that high-recall searches can be carried out, although some of the searches are very demanding of computational resources. The methods are, in principle, applicable to historical texts in many languages and from many different periods.
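To make two of the technique families named in the abstract concrete, here is a minimal Python sketch of trigram matching and of a dynamic-programming edit distance, applied to ranking candidate historical spellings against a modern query word. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `#` boundary padding, the choice of the Dice coefficient, the unit edit costs, and the sample word list are all hypothetical and are not taken from the paper.

```python
def trigrams(word):
    """Return the set of character trigrams of a word.

    Assumption: the word is lower-cased and padded with one '#' on each
    side so that word-initial and word-final letters take part in matching.
    """
    padded = "#" + word.lower() + "#"
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def dice_similarity(a, b):
    """Dice coefficient over trigram sets: 2 * |A & B| / (|A| + |B|)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return 2 * len(ta & tb) / (len(ta) + len(tb))


def edit_distance(a, b):
    """Levenshtein distance computed by dynamic programming.

    Assumption: insertions, deletions and substitutions all cost 1.
    Only the previous row of the table is kept in memory.
    """
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            substitution = previous[j - 1] + (ca != cb)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def rank_by_dice(query, vocabulary):
    """Sort vocabulary words by decreasing trigram similarity to the query."""
    return sorted(vocabulary, key=lambda w: dice_similarity(query, w), reverse=True)


if __name__ == "__main__":
    # Hypothetical historical spellings, chosen only for illustration.
    words = ["magick", "physick", "musick"]
    print(rank_by_dice("music", words))
    print(edit_distance("music", "musick"))
```

The trigram comparison needs only set operations per candidate, whereas the dynamic-programming comparison fills a table whose size is the product of the two word lengths, which gives a rough sense of why some search methods cost more than others.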