Searching for historical word-forms in a database of 17th-century English text using spelling-correction methods
Source Annual ACM Conference on Research and Development in Information Retrieval archive
Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval table of contents
Copenhagen, Denmark
Pages: 256 - 265  
Year of Publication: 1992
ISBN:0-89791-523-2
Authors
Alexander M. Robertson  Department of Information Studies, University of Sheffield, Western Bank, Sheffield, UK, S10 2TN
Peter Willett  Department of Information Studies, University of Sheffield, Western Bank, Sheffield, UK, S10 2TN
Sponsors
Royal School of Lib.
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/133160.133208

ABSTRACT

This paper discusses the application of algorithmic spelling-correction techniques to the identification of those words in a database of 17th-century English text that are most similar to a query word in modern English. The experiments used n-gram matching, non-phonetic coding and dynamic programming methods for spelling correction, and demonstrated that high-recall searches can be carried out, although some of the searches are very demanding of computational resources. The methods are, in principle, applicable to historical texts in many languages and from many different periods.
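
As a rough illustration of two of the technique families named in the abstract, the following Python sketch ranks candidate word-forms against a modern query by trigram overlap and computes an edit distance by dynamic programming. It is a minimal sketch under stated assumptions, not the authors' implementation: the Dice coefficient, the padding character, the unit edit costs, the function names and the sample words are all illustrative choices that do not come from the paper.

```python
from collections import Counter


def ngrams(word, n=3, pad="*"):
    """Return the padded character n-grams of a word (padding is an assumption)."""
    padded = pad * (n - 1) + word.lower() + pad * (n - 1)
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def dice_similarity(a, b, n=3):
    """Dice coefficient over n-gram multisets; 1.0 means identical n-gram sets."""
    grams_a, grams_b = Counter(ngrams(a, n)), Counter(ngrams(b, n))
    shared = sum((grams_a & grams_b).values())
    total = sum(grams_a.values()) + sum(grams_b.values())
    return 2 * shared / total if total else 0.0


def edit_distance(a, b):
    """Unit-cost edit distance (insert, delete, substitute) by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, 1):
        curr = [i]
        for j, ch_b in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                    # deletion
                            curr[j - 1] + 1,                # insertion
                            prev[j - 1] + (ch_a != ch_b)))  # substitution or match
        prev = curr
    return prev[-1]


def rank_by_ngrams(query, vocabulary, n=3):
    """Sort candidate word-forms by decreasing n-gram similarity to the query."""
    return sorted(vocabulary, key=lambda w: dice_similarity(query, w, n), reverse=True)


if __name__ == "__main__":
    # Hypothetical candidate word-forms, chosen only for illustration.
    candidates = ["musick", "magic", "table"]
    print(rank_by_ngrams("music", candidates))
    print(edit_distance("music", "musick"))
```

The dynamic-programming distance fills a table whose size is the product of the two word lengths for every query-candidate pair, which gives some intuition for why searches of this kind can be demanding of computational resources on a large vocabulary.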



