Ophir Frieder
Dept. of Computer Science
Illinois Inst. of Technology
Chicago, IL 60616
Phone: (312) 567-4496
Email: ophir@ir.iit.edu
Biographical Information
Dr. Ophir Frieder is the IITRI Chair Professor of Computer Science and
the Director of the Information Retrieval Laboratory at the Illinois
Institute of Technology. His research interests span the general area
of scalable information retrieval systems. He is a Member of Phi Beta
Kappa and ACM and a Fellow of the IEEE.
Suggested Lecture Topics
Seeing is Believing: Searching and Understanding!
We describe two efforts for document visualization, Harris Corporation's SENTINEL
and an authorship detection system. SENTINEL leverages the fusion of multiple
information retrieval technologies to search and visualize its target information.
This 3-dimensional visualization capability provides users with an intuitive
understanding of document relevance, resulting in higher retrieval accuracy.
In the second effort, a visual stylistic signature of authorship is developed
by displaying a function of the frequency of letter tuples. We evaluated this
approach on the disputed "Federalist Papers" and efficiently determined authorship
consistent with the generally accepted attribution. Additional experimentation
has likewise confirmed the validity of the approach.
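The letter-tuple idea can be illustrated with a minimal sketch. The abstract does not state which function of the tuple frequencies is displayed, so the relative-frequency profile and the cosine comparison below are assumptions chosen for illustration, and the names tuple_profile and cosine are hypothetical. Plotting is omitted; the profile is the data such a display might be drawn from.

```python
import math
from collections import Counter


def tuple_profile(text, n=2):
    """Relative frequency of each letter n-tuple in text.

    Assumption: only alphabetic characters are kept, lowercased, and
    tuples are taken over the resulting letter stream.
    """
    letters = [c for c in text.lower() if c.isalpha()]
    tuples = ["".join(letters[i:i + n]) for i in range(len(letters) - n + 1)]
    counts = Counter(tuples)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}


def cosine(p, q):
    """Cosine similarity between two frequency profiles (dicts)."""
    dot = sum(v * q.get(k, 0.0) for k, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0
```

Under these assumptions, a disputed text would be attributed to the candidate author whose profile is most similar to it; the actual system relies on visual inspection rather than a single similarity score.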
Integrating Structured Data and Text... Easily!
Traditionally, database applications stored predominantly structured data;
today, however, the focus has shifted towards the integration of multiple data
types. We present an overview of a joint industrial-academic project that integrates structured
and text data using strictly the relational model. Using standard SQL query
templates, both structured data and text are searched. In contrast to traditional
information retrieval systems, this approach supports portability across platforms
and exploits parallelism without additional development costs.
The approach was deployed for general use at NIH's NCCAM as their on-line web-based
digital library server. Furthermore, several companies developed information retrieval
engines based on the described effort.
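To make the idea of searching text with standard SQL concrete, here is a minimal sketch using Python's built-in sqlite3 module. The schema (a doc table for structured attributes and a posting table acting as an inverted index), the term-frequency-sum scoring, and the names build and search are all assumptions for illustration, not the project's actual design.

```python
import sqlite3
from collections import Counter

# Hypothetical sample data: (doc_id, title, year).
DOCS = [
    (1, "Parallel text search", 1998),
    (2, "Relational query tuning", 1995),
    (3, "Text retrieval in SQL", 2001),
]


def build(docs):
    """Load structured attributes and an inverted index into plain relations."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE doc (doc_id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")
    conn.execute("CREATE TABLE posting (doc_id INTEGER, term TEXT, tf INTEGER)")
    for doc_id, title, year in docs:
        conn.execute("INSERT INTO doc VALUES (?, ?, ?)", (doc_id, title, year))
        # One posting row per distinct term, with its term frequency.
        for term, tf in Counter(title.lower().split()).items():
            conn.execute("INSERT INTO posting VALUES (?, ?, ?)", (doc_id, term, tf))
    return conn


def search(conn, terms, min_year):
    """Fill a fixed SQL template: keyword match plus a structured filter."""
    marks = ", ".join("?" for _ in terms)
    sql = f"""
        SELECT d.doc_id, d.title, SUM(p.tf) AS score
        FROM posting p JOIN doc d ON d.doc_id = p.doc_id
        WHERE p.term IN ({marks}) AND d.year >= ?
        GROUP BY d.doc_id, d.title
        ORDER BY score DESC, d.doc_id
    """
    return conn.execute(sql, [t.lower() for t in terms] + [min_year]).fetchall()
```

Because the query is ordinary SQL over ordinary tables, the same template could in principle run on any relational engine, which is the portability argument the abstract makes.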
Getting Rid of Duplicates is Harder than You Think!
Duplicate documents hinder both accuracy and efficiency. Given a fixed-size
retrieval set, selecting multiple copies of the same, or virtually the
same, document reduces the number of relevant documents retrieved, thereby
lowering accuracy. Furthermore, global term discrimination
statistics are adversely affected, nullifying the benefits of term
weights. Likewise, runtime efficiency deteriorates because additional
documents that do not improve accuracy are processed unnecessarily.
The joint academic-industrial effort described relies on relevant word
identification and hashing techniques to identify candidate
duplicates. Experimental results demonstrate its accuracy and
efficiency. The approach described is used for multiple collection
fusion and document declassification.
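The combination of relevant word identification and hashing can be sketched as follows. The abstract does not say how relevant words are chosen, so the small stop-word list below is a stand-in assumption, as are the choice of SHA-1 and the names fingerprint and candidate_duplicates.

```python
import hashlib
import re
from collections import defaultdict

# Assumption: a tiny stop-word list stands in for relevant word identification.
STOP_WORDS = {"a", "an", "the", "of", "on", "and", "in", "to"}


def fingerprint(text):
    """Hash the sorted set of relevant words, ignoring order, case, and repeats."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    relevant = sorted(set(tokens) - STOP_WORDS)
    return hashlib.sha1(" ".join(relevant).encode("utf-8")).hexdigest()


def candidate_duplicates(docs):
    """Group document ids that share a fingerprint; docs maps id -> text."""
    groups = defaultdict(list)
    for doc_id, text in docs.items():
        groups[fingerprint(text)].append(doc_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


docs = {
    "a": "The quick brown fox jumps over the lazy dog.",
    "b": "A quick brown fox jumps over a lazy dog",
    "c": "Stock prices fell sharply on the news today.",
}
print(candidate_duplicates(docs))  # [['a', 'b']]
```

Each document is processed once and compared only through its hash, so the cost grows linearly with collection size rather than with the number of document pairs. Documents that share a fingerprint are candidates only and may still need a closer comparison.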
On Scalable Information Retrieval Systems
Implementing scalable information retrieval systems requires the design and development of efficient methods to ingest data from multiple sources, search and retrieve results from both English and foreign language document collections and from collections comprising multiple data types, harness high performance computer technology, and accurately answer user questions. Some recent efforts related to the development of scalable information retrieval systems are described. Particular emphasis is placed on those efforts that were adopted into commercial use.
Association for Computing Machinery Technology Outreach Program