Konstantinos edited this page Jan 29, 2018 · 63 revisions

Welcome to the AriadneMetadataIdentification wiki!


What is it?

Ariadne Metadata Identification is a standalone Java program extracted from the Ariadne Harvester Identification submodule. Its role is to create global LO and LOM identifiers for IEEE LOM metadata records stored on the filesystem. The created global identifiers are injected into the respective XML files.


Requirements

The latest JDK should be installed from here.


Configuration

Edit the following file:

  • configure.properties

The first step is to choose whether LO and/or LOM global identifiers will be created, using the following attributes in the configure.properties file:

  • identification.xml.addGlobalLOIdentifier=true
  • identification.xml.addGlobalMetadataIdentifier=true

These attributes can be true or false.
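A minimal sketch of how these two flags could be read, assuming configure.properties is a standard Java properties file loaded with java.util.Properties. The class IdentificationFlags is hypothetical and not part of the actual program; treating a missing key as false is also an assumption.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not the program's own code: reads the two
// identification flags from configure.properties content.
class IdentificationFlags {
    final boolean addGlobalLOIdentifier;
    final boolean addGlobalMetadataIdentifier;

    IdentificationFlags(boolean addGlobalLOIdentifier, boolean addGlobalMetadataIdentifier) {
        this.addGlobalLOIdentifier = addGlobalLOIdentifier;
        this.addGlobalMetadataIdentifier = addGlobalMetadataIdentifier;
    }

    static IdentificationFlags load(Reader reader) {
        Properties props = new Properties();
        try {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new IdentificationFlags(
                flag(props, "identification.xml.addGlobalLOIdentifier"),
                flag(props, "identification.xml.addGlobalMetadataIdentifier"));
    }

    // Assumption: a missing key means false; "true" is matched case-insensitively.
    private static boolean flag(Properties props, String key) {
        return Boolean.parseBoolean(props.getProperty(key, "false").trim());
    }

    public static void main(String[] args) {
        String config = "identification.xml.addGlobalLOIdentifier=true\n"
                + "identification.xml.addGlobalMetadataIdentifier=false\n";
        IdentificationFlags flags = load(new StringReader(config));
        // prints "LO: true, LOM: false"
        System.out.println("LO: " + flags.addGlobalLOIdentifier
                + ", LOM: " + flags.addGlobalMetadataIdentifier);
    }
}
```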

The second step is to choose how these global identifiers will be created, using the identification.class attribute, which may take one of the following values:

  • org.ariadne.oai.utils.HashID
  • org.ariadne.oai.utils.HarvesterUtils
  • org.ariadne.oai.utils.PIDS

With the first option (HashID), the global identifier is created from the hexadecimal representation of a hash of the metadata content. The supported hashing algorithms are the following:

  • MD2
  • MD5
  • SHA-1
  • SHA-256
  • SHA-384
  • SHA-512
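The listed names are also standard Java MessageDigest algorithm names, so a hex digest for each of them can be sketched with the JDK alone. This is an illustration, not the HashID class itself: the UTF-8 encoding of the content and lowercase hex output are assumptions, and this page does not name the property that selects the algorithm.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

// Hypothetical sketch: hex representation of a digest of the metadata content.
class HashHex {
    static final List<String> ALGORITHMS =
            List.of("MD2", "MD5", "SHA-1", "SHA-256", "SHA-384", "SHA-512");

    static String hexDigest(String content, String algorithm) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            // Assumption: the record content is hashed as UTF-8 bytes.
            byte[] hash = md.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b)); // two lowercase hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unsupported algorithm: " + algorithm, e);
        }
    }

    public static void main(String[] args) {
        String record = "<lom/>"; // made-up sample content
        for (String algorithm : ALGORITHMS) {
            System.out.println(algorithm + ": " + hexDigest(record, algorithm));
        }
    }
}
```

The hex length depends on the algorithm (32 characters for MD2 and MD5, up to 128 for SHA-512), which directly affects the length of the resulting global identifier.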

With the second option (HarvesterUtils), the global identifiers are created using the following scheme:

global_lo_id=dataprovider+general.identifier.entry

global_lom_id=dataprovider+metaMetadata.identifier.entry
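Read literally, the two scheme lines above can be sketched as string concatenation. The class ProviderSchemeId and the sample values are hypothetical, and reading "+" as concatenation with no separator is an assumption. Note that the "Global Identifier Creation Options" section of this page also mentions catalogValue as part of the ID; this sketch follows only the two lines above.

```java
import java.util.Objects;

// Hypothetical sketch of the HarvesterUtils-style scheme; "+" is read as
// plain string concatenation without a separator (an assumption).
class ProviderSchemeId {
    // global_lo_id = dataprovider + general.identifier.entry
    static String globalLoId(String dataProvider, String generalIdentifierEntry) {
        return Objects.requireNonNull(dataProvider) + Objects.requireNonNull(generalIdentifierEntry);
    }

    // global_lom_id = dataprovider + metaMetadata.identifier.entry
    static String globalLomId(String dataProvider, String metaMetadataIdentifierEntry) {
        return Objects.requireNonNull(dataProvider) + Objects.requireNonNull(metaMetadataIdentifierEntry);
    }

    public static void main(String[] args) {
        // "exampleProvider", "oai:example:123" and "record-123" are made-up values.
        System.out.println(globalLoId("exampleProvider", "oai:example:123"));
        System.out.println(globalLomId("exampleProvider", "record-123"));
    }
}
```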

If the program is configured to create global LOM identifiers, the created global ID is also used as the name of the output XML file.
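Because the global LOM ID becomes a file name, an ID containing characters such as ':' or '/' needs some handling. This page does not say what the program does in that case, so the following is only a hypothetical sketch: both the character replacement and the ".xml" extension are assumptions.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch: derive an output file name from a global LOM ID.
class OutputFileName {
    static String toFileName(String globalLomId) {
        // Replace characters reserved in Windows file names; the set also
        // covers '/', the Linux path separator. (Assumed behavior.)
        return globalLomId.replaceAll("[\\\\/:*?\"<>|]", "_") + ".xml";
    }

    static Path outputPath(Path outputFolder, String globalLomId) {
        return outputFolder.resolve(toFileName(globalLomId));
    }

    public static void main(String[] args) {
        // Made-up ID containing characters that are unsafe in file names.
        System.out.println(outputPath(Paths.get("outputFolder"), "ARIADNEexampleProvider:rec/123"));
    }
}
```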

The third step is to define the value of the catalog element by setting the identification.catalog.value attribute in the configure.properties file.

This value is used as the general.identifier.catalog and/or metaMetadata.identifier.catalog value. It is also one part of the newly created global LO or LOM identifier.

The last step is the logging configuration:

log.file.path=path_to_folder_where_the_logs_will_be_saved

log.file.name=log_file_name
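A sketch of how the two logging attributes could combine into a single log file path, again assuming java.util.Properties is used to read the file. The class LogConfig is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical sketch: resolve log.file.name inside log.file.path.
class LogConfig {
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    static Path logFile(Properties props) {
        String folder = props.getProperty("log.file.path");
        String name = props.getProperty("log.file.name");
        if (folder == null || name == null) {
            throw new IllegalStateException("Both log.file.path and log.file.name must be set");
        }
        return Paths.get(folder.trim()).resolve(name.trim());
    }

    public static void main(String[] args) {
        Properties props = parse("log.file.path=logs\nlog.file.name=identification.log\n");
        System.out.println(logFile(props)); // logs/identification.log on Linux
    }
}
```

If the file is indeed read with java.util.Properties, note that a single backslash is treated as an escape character (and silently dropped before ordinary letters), so a Windows folder such as C:\logs should be written as C:\\logs or C:/logs.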


Run it

Depending on your operating system, open cmd (Windows) or a bash shell (Linux), change to the directory where AriadneIdentification.jar is located, and enter the following:

java -jar AriadneIdentification.jar inputFolder outputFolder

Tip: For large numbers of XML files, add the -Xmx option to the command above to set the maximum heap size of the Java virtual machine, for example:

java -Xmx4096m -jar AriadneIdentification.jar inputFolder outputFolder
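The command-line shape above (an input folder and an output folder) can be illustrated with a small hypothetical entry point. It is not the program's actual main class: listing only the top-level .xml files (no recursion) is an assumption. It also prints Runtime.maxMemory(), which approximately reflects the -Xmx value the JVM was started with, as a quick way to check that the tip took effect.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of the "inputFolder outputFolder" invocation.
class RunArgs {
    // Lists the .xml files directly inside the folder (non-recursive: an assumption).
    static List<Path> xmlFiles(Path inputFolder) {
        try (Stream<Path> entries = Files.list(inputFolder)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().toLowerCase(Locale.ROOT).endsWith(".xml"))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: java -jar AriadneIdentification.jar inputFolder outputFolder");
            System.exit(1);
        }
        Path input = Paths.get(args[0]);
        Path output = Paths.get(args[1]);
        if (!Files.isDirectory(input)) {
            System.err.println("Input folder not found: " + input);
            System.exit(1);
        }
        Files.createDirectories(output);
        // Approximately the -Xmx value (or the JVM default when -Xmx is absent).
        System.out.println("Max heap: " + Runtime.getRuntime().maxMemory() / (1024 * 1024) + " MB");
        for (Path xml : xmlFiles(input)) {
            System.out.println("Would process " + xml + " -> " + output);
        }
    }
}
```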


Global Identifier Creation Options

As described above, there are two options for creating global identifiers:

  • Using the scheme GlobalID=catalogValue+providerName+the pre-existing general.identifier.entry or metaMetadata.identifier.entry element.

  • Using the hexadecimal representation of the MD5 hash of the metadata content. With this approach, the global ID is created from the hexadecimal representation of the MD5 hash function applied to the content of the metadata record*. The steps are the following:

  1. Read the content of the metadata record.
  2. Store it temporarily to a string.
  3. Compute the Hash using the string above as input.
  4. Get the HEX representation of the computed hash.
  5. MetadataRecID=catalogValue+providerName+stringHash.

*This option is used only to create distinct metadata (LOM) identifiers, not learning object (LO) identifiers.
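The five steps above can be sketched with the JDK's MessageDigest. This is a hypothetical illustration, not the program's code: reading the file as UTF-8, lowercase hex, plain concatenation in step 5, and the sample values "ARIADNE" and "exampleProvider" are all assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of the MD5-based metadata identifier steps.
class Md5MetadataId {
    // Steps 1-2: read the metadata record and keep its content in a string.
    static String readRecord(Path xmlFile) {
        try {
            return new String(Files.readAllBytes(xmlFile), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Steps 3-4: compute the MD5 hash of the string and return its hex representation.
    static String md5Hex(String content) {
        try {
            byte[] hash = MessageDigest.getInstance("MD5")
                    .digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }

    // Step 5: MetadataRecID=catalogValue+providerName+stringHash.
    static String metadataRecId(String catalogValue, String providerName, String content) {
        return catalogValue + providerName + md5Hex(content);
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("Usage: Md5MetadataId record.xml");
            return;
        }
        String content = readRecord(Paths.get(args[0]));
        System.out.println(metadataRecId("ARIADNE", "exampleProvider", content));
    }
}
```

Since the ID depends on the exact bytes of the record, any change to the XML (even whitespace) yields a different identifier.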

As shown above, the global LO identifier creation scheme is the same for both approaches. Note that some records have no general.identifier in the first place; as a fallback, the technical.location element value is used as the general.identifier. The metaMetadata.identifier element may also be missing; in that case, the filename of the XML record is used as a fallback.
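The fallback rules in the paragraph above can be sketched as two small selection functions. The class and method names are hypothetical; treating blank values as missing, keeping the file name unchanged (with its extension), and failing when an LO record has neither element are assumptions, since this page does not specify those cases.

```java
// Hypothetical sketch of the identifier fallback rules.
class IdentifierFallback {
    // Assumption: null or whitespace-only values count as missing.
    private static boolean isMissing(String value) {
        return value == null || value.trim().isEmpty();
    }

    // LO: use general.identifier.entry, else technical.location.
    static String loSource(String generalIdentifierEntry, String technicalLocation) {
        if (!isMissing(generalIdentifierEntry)) {
            return generalIdentifierEntry.trim();
        }
        if (!isMissing(technicalLocation)) {
            return technicalLocation.trim();
        }
        // Behavior when both are absent is not documented; failing is an assumption.
        throw new IllegalArgumentException("No general.identifier and no technical.location");
    }

    // LOM: use metaMetadata.identifier.entry, else the XML file name as-is.
    static String lomSource(String metaMetadataIdentifierEntry, String xmlFileName) {
        return isMissing(metaMetadataIdentifierEntry) ? xmlFileName : metaMetadataIdentifierEntry.trim();
    }

    public static void main(String[] args) {
        System.out.println(loSource(null, "http://example.org/object/42"));
        System.out.println(lomSource("", "record42.xml"));
    }
}
```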

Once the global ID is created, it is injected into the respective record as an additional general.identifier or metaMetadata.identifier element, together with its LOM child elements (catalog, entry).
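The injection step can be sketched with the JDK's DOM API. This hypothetical IdentifierInjector adds an identifier element with catalog and entry children to the first general (or metaMetadata) element, reusing that element's namespace and prefix. Placing it after the last existing identifier, so that identifiers stay first as in the LOM XML binding, is a choice made here, not something this page states.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical sketch: add <identifier><catalog/><entry/></identifier> to a LOM section.
class IdentifierInjector {
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse record", e);
        }
    }

    // sectionName is "general" or "metaMetadata".
    static void inject(Document doc, String sectionName, String catalog, String entry) {
        Element section = (Element) doc.getElementsByTagNameNS("*", sectionName).item(0);
        if (section == null) {
            throw new IllegalArgumentException("No <" + sectionName + "> element");
        }
        String ns = section.getNamespaceURI(); // reuse the record's namespace (may be null)
        String prefix = section.getPrefix();
        Element identifier = create(doc, ns, prefix, "identifier");
        Element catalogEl = create(doc, ns, prefix, "catalog");
        catalogEl.setTextContent(catalog);
        Element entryEl = create(doc, ns, prefix, "entry");
        entryEl.setTextContent(entry);
        identifier.appendChild(catalogEl);
        identifier.appendChild(entryEl);

        // Insert after the last existing identifier, or first if there is none.
        Node lastIdentifier = null;
        for (Node n = section.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && "identifier".equals(n.getLocalName())) {
                lastIdentifier = n;
            }
        }
        Node before = (lastIdentifier != null) ? lastIdentifier.getNextSibling() : section.getFirstChild();
        section.insertBefore(identifier, before);
    }

    private static Element create(Document doc, String ns, String prefix, String localName) {
        String qualifiedName = (prefix == null) ? localName : prefix + ":" + localName;
        return doc.createElementNS(ns, qualifiedName);
    }

    public static void main(String[] args) {
        Document doc = parse("<lom xmlns=\"http://ltsc.ieee.org/xsd/LOM\"><general>"
                + "<identifier><catalog>oai</catalog><entry>123</entry></identifier><title/>"
                + "</general></lom>");
        inject(doc, "general", "ARIADNE", "ARIADNEexampleProvider123"); // made-up values
        System.out.println(doc.getElementsByTagNameNS("*", "identifier").getLength());
    }
}
```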

Log file structure

DATE----NUMBER_OF_IDENTIFIED_RECORDS----REPONAME----GLOBAL_GENERALID(TRUE|FALSE)----GLOBAL_METAMETADATAID(TRUE|FALSE)
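A sketch of producing and splitting a log line in the structure above. The class LogLine is hypothetical; the date string is supplied by the caller because its format is not documented, and writing the flags as uppercase TRUE/FALSE is an assumption based on the field labels.

```java
import java.util.Locale;

// Hypothetical sketch of the "----"-separated log line structure.
class LogLine {
    static final String SEP = "----";

    static String format(String date, int identifiedRecords, String repoName,
                         boolean globalGeneralId, boolean globalMetaMetadataId) {
        return String.join(SEP,
                date,
                Integer.toString(identifiedRecords),
                repoName,
                Boolean.toString(globalGeneralId).toUpperCase(Locale.ROOT),
                Boolean.toString(globalMetaMetadataId).toUpperCase(Locale.ROOT));
    }

    // Splits a line back into its five fields; a repository name containing
    // "----" would break this simple format.
    static String[] parse(String line) {
        String[] fields = line.split(SEP, -1);
        if (fields.length != 5) {
            throw new IllegalArgumentException("Expected 5 fields, got " + fields.length);
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = format("2018-01-29", 42, "exampleRepo", true, false); // made-up values
        System.out.println(line);
        System.out.println("Records: " + parse(line)[1]);
    }
}
```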