Use of Solr and Xapian in the Invenio document repository software

Speaker(s): 

Schedule info

Time slot: 
9 July 16:00 - 16:30
Room: 
Sir John A

Author(s)
First Name: 
Patrick
Last Name: 
Glauner
Affiliation: 
IT Department, CERN European Organization for Nuclear Research
First Name: 
Jan
Last Name: 
Iwaszkiewicz
Affiliation: 
IT Department, CERN European Organization for Nuclear Research
First Name: 
Jean-Yves
Last Name: 
Le Meur
Affiliation: 
IT Department, CERN European Organization for Nuclear Research
First Name: 
Tibor
Last Name: 
Simko
Affiliation: 
IT Department, CERN European Organization for Nuclear Research
Keywords: 
Invenio; Solr; Xapian; Python; institutional repository; word similarity ranking; scalability
Track: 
General conference
Paper
Abstract: 

Invenio is a free, comprehensive, web-based document repository and digital library software suite originally developed at CERN. It serves a variety of use cases, from an institutional repository or digital library to a web journal. To make full-text documents usable for efficient search and ranking, Solr was integrated into Invenio through a generic bridge. Solr indexes the extracted full texts and the most relevant metadata, so Invenio can take advantage of Solr’s efficient search and word similarity ranking capabilities. In this paper, we first give an overview of Invenio and its features. We then present our open source Solr integration and the scalability challenges that arose for an Invenio-based multi-million record repository: the CERN Document Server. We also compare our Solr adapter with an alternative Xapian adapter that uses the same generic bridge. Both integrations are distributed with the Invenio package and are ready to be used by institutions using or adopting Invenio.
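
The idea of a generic bridge with interchangeable adapters can be illustrated with a minimal Python sketch. It is not taken from Invenio: the names `FulltextBackend`, `InMemoryBackend` and `Bridge` are hypothetical, and the toy backend stands in for a real Solr or Xapian adapter by ranking records on simple term frequency rather than on the engines' own similarity measures.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from collections import Counter


class FulltextBackend(ABC):
    """Hypothetical adapter interface (an assumption, not Invenio's API)."""

    @abstractmethod
    def index(self, recid: int, text: str) -> None:
        """Store the searchable text of one record."""

    @abstractmethod
    def search(self, query: str) -> list[int]:
        """Return matching record IDs, best match first."""


class InMemoryBackend(FulltextBackend):
    """Toy stand-in for a Solr or Xapian adapter; ranks by term frequency."""

    def __init__(self) -> None:
        self._docs: dict[int, Counter] = {}

    def index(self, recid: int, text: str) -> None:
        self._docs[recid] = Counter(text.lower().split())

    def search(self, query: str) -> list[int]:
        terms = query.lower().split()
        # Score each record by how often the query terms occur in it.
        scores = {
            recid: sum(freq[term] for term in terms)
            for recid, freq in self._docs.items()
        }
        hits = [recid for recid, score in scores.items() if score > 0]
        # Highest score first; ties broken by record ID for stable output.
        return sorted(hits, key=lambda recid: (-scores[recid], recid))


class Bridge:
    """Hypothetical bridge: the repository talks only to this class."""

    def __init__(self, backend: FulltextBackend) -> None:
        self.backend = backend

    def add_record(self, recid: int, fulltext: str, metadata: dict[str, str]) -> None:
        # Index the extracted full text together with selected metadata.
        self.backend.index(recid, " ".join([fulltext, *metadata.values()]))

    def ranked_search(self, query: str) -> list[int]:
        return self.backend.search(query)


bridge = Bridge(InMemoryBackend())
bridge.add_record(1, "higgs boson search at the lhc", {"title": "Higgs search"})
bridge.add_record(2, "detector calibration notes", {"title": "Calibration"})
bridge.add_record(3, "higgs higgs higgs coupling measurement", {"title": "Higgs couplings"})
print(bridge.ranked_search("higgs"))
```

Because the repository code in this sketch depends only on `Bridge`, swapping one search engine for another would mean supplying a different `FulltextBackend` subclass, which is the property that makes a side-by-side comparison of two adapters possible.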

Attachment: OR2013_Proposal_Paper_Glauner.pdf
Size: 507.17 KB