Efficiency Considerations for Scalable Information Retrieval Servers
Abstract

We review a variety of techniques for improving efficiency in information retrieval. Given the growing volume of data available electronically, understanding and applying such techniques is critical. We address several efficiency concerns, but our primary focus is index processing, since it dominates the computational demands of information retrieval. Because index processing is so important, we supplement a general overview with some recent results on index maintenance. These results show that delaying index updates when new documents are added to the collection improves efficiency without noticeably degrading retrieval effectiveness. We conclude with an overview of parallel processing in information retrieval. Users cannot tolerate lengthy response times, so searching large text databases requires vast computational resources, and parallel processing is currently the only means of meeting these demands. We focus only on those approaches that are currently commercially viable.