Integrating Solr and Fedora

We have created a new handler for Solr that integrates directly with Fedora. Its aim is to reduce the amount of configuration required for Fedora and Muradora to use a full-text and faceted search engine. More information about Solr can be found at http://lucene.apache.org/solr/ and on the Solr wiki.

Our module is based on the GSearch 2.0 code done by Gert Schmeltz Pedersen.

Note: This Solr plugin requires access to the file system where Fedora stores its FOXML files.

Download software

  1. Solr: A build of Apache Solr (development version 1.3) bundled with our requestHandler, which handles Fedora FOXML files.
  2. SolrDOManager: This module is required by Fedora to perform the callback to Solr when objects are added to or removed from Fedora.

Deploy Solr

  1. Download Solr and unpack it into a directory (e.g. /usr/local/solr). By default, the SOLR_HOME variable (where Solr stores its index and configuration files) will be the subdirectory "example/solr" (i.e. /usr/local/solr/example/solr).
  2. Build the Solr webapp
        $ ant dist
    
  3. Deploy the webapp by copying the newly created "apache-solr-1.3-dev.war" from the "dist" directory into the Solr webapps directory under the name "solr.war"
       $ cp dist/apache-solr-1.3-dev.war example/webapps/solr.war
    
  4. Under $TOMCAT_HOME/conf/Catalina/localhost create a new file called solr.xml with the content below (note: you will need to change the path to your appropriate SOLR_HOME directory)
       <Context docBase="/usr/local/solr/example/webapps/solr.war" debug="0" crossContext="false" >
          <Environment name="solr/home" type="java.lang.String" value="/usr/local/solr/example/solr" override="true" />
       </Context>
    
  5. Navigate to the $SOLR_HOME/conf/ folder and check that the file "solrconfig.xml" contains the following requestHandler declaration (note: change the Fedora URL and the admin username and password to match your installation)
      <requestHandler name="/fedora" class="solr.FedoraUpdateRequestHandler">
        <str name="fedoraSoap">http://localhost:8080/fedora/services</str>
        <str name="fedoraUser">fedoraAdmin</str>
        <str name="fedoraPass">adminPassword</str>
        <str name="foxmlXslt">demoFoxmlToSolr.xslt</str>
      </requestHandler>
    
    
  6. Check that you have the required demoFoxmlToSolr.xslt file under $SOLR_HOME/conf/xslt folder, and schema.xml file under $SOLR_HOME/conf/.
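
The checks in steps 5 and 6 can be scripted. The following is a minimal Python sketch, not part of the distribution: the function name check_solr_home is hypothetical, and it only verifies that the three files exist and that solrconfig.xml declares a requestHandler named "/fedora". It does not validate the handler's parameters.

```python
from pathlib import Path
import xml.etree.ElementTree as ET

# Files the deployment steps expect, relative to SOLR_HOME.
REQUIRED_FILES = (
    "conf/solrconfig.xml",
    "conf/schema.xml",
    "conf/xslt/demoFoxmlToSolr.xslt",
)


def check_solr_home(solr_home):
    """Return a list of problems found; an empty list means the checks passed."""
    home = Path(solr_home)
    problems = [
        "missing file: " + rel
        for rel in REQUIRED_FILES
        if not (home / rel).is_file()
    ]
    config = home / "conf" / "solrconfig.xml"
    if config.is_file():
        root = ET.parse(str(config)).getroot()
        # Look for <requestHandler name="/fedora" ...> anywhere in the file.
        declared = any(
            handler.get("name") == "/fedora"
            for handler in root.iter("requestHandler")
        )
        if not declared:
            problems.append("solrconfig.xml does not declare a /fedora requestHandler")
    return problems


if __name__ == "__main__":
    # Path taken from the example in step 1; adjust to your SOLR_HOME.
    for problem in check_solr_home("/usr/local/solr/example/solr"):
        print(problem)
```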

Using the Fedora Solr plugin

Assuming your Solr instance is deployed at http://localhost:8080/solr, you can interact with the plugin through the REST interface exposed by the /fedora request handler.

Please note that the results of these commands will not be reflected in the search results until you issue a "commit" command to the Solr server. If you want the results to be immediately available for searching, append "commit=true" to each command. For example: http://localhost:8080/solr/fedora?pid=THE_OBJECT_PID&action=savePid&commit=true
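The example URL above can be assembled programmatically. The following is a minimal Python sketch, not part of the plugin: the function name fedora_update_url is hypothetical, and only the "savePid" action and the "pid", "action" and "commit" parameters are taken from the example URL. Note that the PID is percent-encoded, so a colon in a PID such as "demo:1" becomes "%3A".

```python
from urllib.parse import urlencode


def fedora_update_url(solr_base, pid, action="savePid", commit=False):
    """Build a URL for the /fedora request handler.

    solr_base is the Solr webapp URL, e.g. http://localhost:8080/solr.
    Only the parameters shown in the example URL are used here.
    """
    params = [("pid", pid), ("action", action)]
    if commit:
        # Ask Solr to commit so the change is searchable immediately.
        params.append(("commit", "true"))
    return solr_base.rstrip("/") + "/fedora?" + urlencode(params)


if __name__ == "__main__":
    print(fedora_update_url("http://localhost:8080/solr", "THE_OBJECT_PID", commit=True))
```

Issuing an HTTP GET against the returned URL (for example with urllib.request.urlopen) sends the command to Solr.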

Configure automatic updates from Fedora

To make Fedora call Solr automatically whenever digital objects change, install the SolrDOManager. Download SolrDOManager, unpack it, and run ant dist-jar to build it. Then copy dist/solr-dom.jar to the $TOMCAT_HOME/webapps/fedora/WEB-INF/lib/ folder. In fedora.fcfg, change the DefaultDOManager class to SolrDOManager:

<module role="fedora.server.storage.DOManager" class="au.edu.mq.melcoe.fedora.SolrDOManager">

You will need to supply Solr-specific configuration, including its base URL and, optionally, a username and password. The username and password are only needed if access to the Solr interface is protected by authentication. You can also configure the DOManager to send commit updates to Solr synchronously (changes are immediately available for searching) or asynchronously.

Required:

<param name="solrUrl" value="http://localhost:8080/solr/"/>

Optional (only needed if basic auth is required for Solr REST access):

<param name="solrUsername" value="exampleUsername"/>
<param name="solrPassword" value="examplePassword"/>

In Solr, a newly inserted or updated document will not appear in the search results until a commit command is sent to the Solr server. You can force this DOManager to send a commit command to Solr synchronously by specifying:

<param name="syncUpdate" value="true"/>

Note that for use with Muradora, we recommend turning this option on; otherwise newly created collections and objects will not show up in Muradora in a timely manner.
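
To double-check the settings above, the configuration can be read back with a short script. The following is a minimal Python sketch under stated assumptions: the function name domanager_settings is hypothetical, and it assumes that the param elements are direct children of the DOManager module element in fedora.fcfg. Elements are matched by local name so the sketch works whether or not the file declares an XML namespace.

```python
import xml.etree.ElementTree as ET

DOMANAGER_ROLE = "fedora.server.storage.DOManager"


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def domanager_settings(fcfg_text):
    """Return (class_name, params) for the DOManager module.

    params maps each param name to its value. Returns (None, {}) if no
    module with the DOManager role is found.
    """
    root = ET.fromstring(fcfg_text)
    for elem in root.iter():
        if local_name(elem.tag) == "module" and elem.get("role") == DOMANAGER_ROLE:
            params = {
                child.get("name"): child.get("value")
                for child in elem
                if local_name(child.tag) == "param"
            }
            return elem.get("class"), params
    return None, {}


if __name__ == "__main__":
    # A cut-down stand-in for fedora.fcfg, for illustration only.
    sample = """
    <server>
      <module role="fedora.server.storage.DOManager"
              class="au.edu.mq.melcoe.fedora.SolrDOManager">
        <param name="solrUrl" value="http://localhost:8080/solr/"/>
        <param name="syncUpdate" value="true"/>
      </module>
    </server>
    """
    class_name, params = domanager_settings(sample)
    print(class_name)
    print(params.get("solrUrl"), params.get("syncUpdate"))
```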
