Ophir Frieder

 
Dept. of Computer Science 
Illinois Inst. of Technology
Chicago, IL  60616
Phone: (312) 567-4496
Email: ophir@ir.iit.edu

Biographical Information

Dr. Ophir Frieder is the IITRI Chair Professor of Computer Science and the Director of the Information Retrieval Laboratory at the Illinois Institute of Technology. His research interests span the general area of scalable information retrieval systems. He is a Member of Phi Beta Kappa and ACM and a Fellow of the IEEE.

Suggested Lecture Topics

Seeing is Believing: Searching and Understanding!

We describe two efforts in document visualization: Harris Corporation's SENTINEL and an authorship detection system. SENTINEL leverages the fusion of multiple information retrieval technologies to search and visualize its target information. This 3-dimensional visualization capability provides users with an intuitive understanding of document relevance, resulting in higher retrieval accuracy.
In the second effort, a visual stylistic signature of authorship is developed by displaying a function of the frequency of letter tuples. We evaluated this approach on the disputed "Federalist Papers" and efficiently determined authorship consistent with the commonly accepted attribution. Additional experimentation has likewise confirmed the validity of the approach.
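
As a rough illustration of the letter-tuple idea, the sketch below computes the relative frequency of letter pairs in a text and compares two such profiles numerically. The tuple length of 2, the function names, and the use of cosine similarity in place of a visual display are all assumptions made for this example; they are not taken from the system described in the lecture.

```python
from collections import Counter
import math


def tuple_profile(text, n=2):
    """Relative frequency of each run of n consecutive letters.

    Assumption: non-letters are dropped first, so tuples may span
    word boundaries. n=2 is an illustrative choice.
    """
    letters = [c for c in text.lower() if c.isalpha()]
    counts = Counter(
        "".join(letters[i:i + n]) for i in range(len(letters) - n + 1)
    )
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}


def cosine(p, q):
    """Cosine similarity of two sparse frequency profiles (0.0 if either is empty)."""
    dot = sum(v * q.get(k, 0.0) for k, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0
```

In use, one would build a profile per known author from undisputed texts and compare the profile of a disputed text against each. A plotted profile, rather than a single similarity score, is closer to the visual signature the abstract describes.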

Integrating Structured Data and Text... Easily!

Traditionally, database applications stored predominantly structured data; today, however, the focus has shifted towards the integration of multiple data types. We present an overview of a joint industrial-academic project that integrates structured and text data using strictly the relational model. Both structured data and text are searched using standard SQL query templates. In contrast to traditional information retrieval systems, this approach supports portability across platforms and exploits parallelism without additional development costs.
The approach was deployed for general use at NIH's NCCAM as their on-line web-based digital library server. Furthermore, several companies developed information retrieval engines based on the described effort.
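
To make the idea of searching text with standard SQL concrete, here is a minimal sketch using Python's built-in sqlite3 module. The schema (doc, doc_term, query), the sample rows, and the scoring by summed term frequency are hypothetical choices for illustration only; the deployed system's actual tables and query templates are not given here.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with a hypothetical schema and sample rows."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE doc (doc_id INTEGER PRIMARY KEY, year INTEGER);
        CREATE TABLE doc_term (doc_id INTEGER, term TEXT, tf INTEGER);
        CREATE TABLE query (term TEXT);
    """)
    con.executemany("INSERT INTO doc VALUES (?, ?)",
                    [(1, 1998), (2, 2001), (3, 2001)])
    con.executemany("INSERT INTO doc_term VALUES (?, ?, ?)", [
        (1, "herbal", 3), (1, "therapy", 1),
        (2, "herbal", 1), (2, "therapy", 2),
        (3, "surgery", 4),
    ])
    return con


# One template covers both kinds of data: the joins match query terms
# against the text index, and the WHERE clause filters on a structured column.
SEARCH = """
    SELECT d.doc_id, SUM(dt.tf) AS score
    FROM doc d
    JOIN doc_term dt ON dt.doc_id = d.doc_id
    JOIN query q ON q.term = dt.term
    WHERE d.year >= ?
    GROUP BY d.doc_id
    ORDER BY score DESC, d.doc_id
"""


def search(con, terms, min_year):
    """Load the query terms into the query table, then run the template."""
    con.execute("DELETE FROM query")
    con.executemany("INSERT INTO query VALUES (?)", [(t,) for t in terms])
    return con.execute(SEARCH, (min_year,)).fetchall()
```

Because the search is an ordinary join and aggregate, it should run on any relational engine that supports standard SQL, which is the portability argument made above.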

Getting Rid of Duplicates is Harder than You Think!

Duplicate documents hinder both accuracy and efficiency. Given a fixed-size retrieval set, selecting multiple copies of the same, or virtually the same, document reduces the number of distinct relevant documents retrieved, thereby lowering accuracy. Furthermore, global term discrimination statistics are adversely affected, nullifying the benefits of term weights. Likewise, runtime efficiency deteriorates since additional documents that do not improve accuracy are processed unnecessarily.
The joint academic-industrial effort described relies on relevant word identification and hashing techniques to identify candidate duplicates. Experimental results demonstrate its accuracy and efficiency. The approach described is used for multiple collection fusion and document declassification.
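
The combination of relevant-word identification and hashing can be sketched as follows. The tiny stopword list, the rule for deciding which words are "relevant", and the choice of SHA-1 are assumptions for this example and are not claimed to match the effort described above.

```python
import hashlib
import re
from collections import defaultdict

# Hypothetical stopword list; a real system would choose relevant words
# using collection statistics rather than a fixed list.
STOPWORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "is", "on"})


def fingerprint(text):
    """Hash the sorted set of retained words, ignoring order, case, and repeats."""
    words = re.findall(r"[a-z]+", text.lower())
    kept = sorted({w for w in words if w not in STOPWORDS})
    return hashlib.sha1(" ".join(kept).encode("utf-8")).hexdigest()


def candidate_duplicates(docs):
    """Group document ids that share a fingerprint; return groups of two or more."""
    groups = defaultdict(list)
    for doc_id, text in docs.items():
        groups[fingerprint(text)].append(doc_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]
```

Documents that differ only in punctuation, word order, or discarded words hash to the same value, so they surface as candidates in a single pass over the collection. A final comparison of the candidates would still be needed to rule out false matches.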

On Scalable Information Retrieval Systems

Implementing scalable information retrieval systems requires the design and development of efficient methods to ingest data from multiple sources, search and retrieve results from both English and foreign language document collections and from collections comprising multiple data types, harness high performance computer technology, and accurately answer user questions. Some recent efforts related to the development of scalable information retrieval systems are described. Particular emphasis is placed on those efforts that were adopted into commercial use.
Association for Computing Machinery Technology Outreach Program