The Apache Solr configuration for Culturegraph.
# Start solr with 1GB Memory on Port 8983 and be verbose
solr start -m 1g -p 8983 -V
# Create core test with culturegraph config set
solr create_core -c test -d culturegraph -p 8983 -V
# Configure the data import handler for the core
cd solr/test/conf
nano solr-data-config.xml
# Change jdbc configuration, if required
# Data Import
# NOTE: Consult the logs, when a command is not working.
# Retrieve current status
curl -s http://localhost:8983/solr/test/dataimport?command=status
# Abort current action
# curl -s http://localhost:8983/solr/test/dataimport?command=abort
# Full-Import MARCXML from D:\data\marcxml
curl -s http://localhost:8983/solr/test/dataimport?command=full-import&entity=FileSystemFolder&dataDir=D:\data\marcxml&format=marcxml
# Full-Import Marc21 from database
# curl -s http://localhost:8983/solr/test/dataimport?command=full-import&entity=Marc21Database
# Commit
curl -s http://localhost:8983/solr/test/update?commit=true
# Stop solr
solr stop -all
NOTE:
- It is recommended to use the Solr Admin UI
- Select your core and go to Dataimport
- Be aware that API calls are also possible (see quickstart)
- Solr's Import Handler has various commands your may use to control the import procedure.
There are the following two ways to import data with this config set.
FileSystemFolder
You may import your records (Marc21 or MARCXML) from a custom location.
If you run a request with entity=FileSystemFolder
you need to set the following
request parameters (see also: DIH Request Parameters):
Parameter | Description | Example |
---|---|---|
dataDir | Folder location. | D:\data\marcxml |
format | File format. Choose marc21 or marcxml. |
Marc21Database
If you run a request with entity=Marc21Database
, you need a working JDBC configuration in your solr-data-config.xml
.
Inspect the following section (Configure JDBC) for further details.
The database needs at least the following columns:
Column | Type | Description |
---|---|---|
id | CHAR(40) | Record identifer. Format: (ISIL)IDN . |
update_time | TIMESTAMP | Insertion or update date for a record (UTC). |
full_record | TEXT | A bibliographic record formatted as MARC21. |
IMPORTANT: This step is mandatory, otherwise the Data Import Handler won't work with your MySQL Database.
Change the JDBC Data Source in solr-data-config.xml
.
The values in the provided jdbc data source element are only for demonstration purpose.
It is advised to use the Automatic Installation.
NOTE: The config set contains a vanilla solrconfig.xml
that needs to be modified.
Add a /dataimport request handler to solrconfig.xml
(Location: configsets/culturegraph
)
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*\.jar" />
<requestHandler name="/dataimport" class="solr.DataImportHandler">
<lst name="defaults">
<str name="config">solr-data-config.xml</str>
</lst>
</requestHandler>
Install the following add-ons:
- https://github.com/culturegraph/solr-morph-transformer
- https://github.com/culturegraph/solr-metamorph-entity-processor
Copy all Metafacture Module JARs into SOLR_INSTANCE_DIR/lib
and into SOLR_INSTANCE_DIR/server/solr-webapp/webapp/WEB-INF/lib
.
Copy a appropriate version of MySQL Connector/J to SOLR_INSTANCE_DIR/lib/mysql
.
Add libraries to solrconfig.xml
:
<!-- Data Import Handler -->
<lib dir="\${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*\.jar" />
<!-- MySQL -->
<lib dir="\${solr.install.dir:../../../..}/lib/mysql" regex="mysql-connector-java-.*\.jar" />
<!-- Metafacture -->
<lib dir="\${solr.install.dir:../../../..}/lib/metafacture" regex="metafacture-.*\.jar" />
<!-- Data Import Handler Add-Ons -->
<lib dir="\${solr.install.dir:../../../..}/lib/dataimporthandler" regex="solr-metamorph-transformer-.*\.jar" />
<lib dir="\${solr.install.dir:../../../..}/lib/dataimporthandler" regex="solr-metamorph-entity-processor-.*\.jar" />
The solr-respin.sh
script modifies a Apache Solr release by doing the following:
- Download Solr release
- Add the culturegraph config set
- Add MySQL Connector to
lib/mysql
- Add the metafacture modules to
lib/metafacture
- Also copy content of
lib/metafacture
intoserver/solr-webapp/webapp/WEB-INF/lib
- Also copy content of
- Add DIH add-ons to
lib/dataimporthandler
- Modify the
sorlconfig.xml
- Add data import request handler
- Include
<lib ... />
statements for the added JARs
Run it with:
bash solr-respin.sh
The header section of the script contains the following changeable parameters:
Parameter | Description |
---|---|
SOLR_VERSION | Version of the Apache Solr distribution |
METAFACTURE_VERSION | Version of the Metafacture distribution. |
MYSQL_CONNECTOR_VERSION | Version of the used MySQL Connector. |