Home
Welcome to the AriadneMetadataIdentification wiki!
Ariadne Metadata Identification is a standalone Java program exported from the Ariadne Harvester Identification submodule. Its role is to create global LO (learning object) and LOM (learning object metadata) identifiers for IEEE LOM metadata records stored on the filesystem. The created global identifiers are injected into the respective XML files.
The latest jdk should be installed from here
Edit the configure.properties file.
The first step is to choose whether LO and/or LOM global identifiers will be created, using the following attributes in the configure.properties file:
- identification.xml.addGlobalLOIdentifier=true
- identification.xml.addGlobalMetadataIdentifier=true
Each of these attributes can be set to true or false.
The second step is to choose how these global identifiers will be created. The identification.class attribute may have one of the following values:
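As an illustration of how the two switches might be read, here is a minimal Java sketch. The class name `IdentificationConfigSketch` and the choice to treat a missing key as false are assumptions made for this example; they are not taken from the program's source.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of the Ariadne code base.
final class IdentificationConfigSketch {
    final boolean addGlobalLoId;   // identification.xml.addGlobalLOIdentifier
    final boolean addGlobalLomId;  // identification.xml.addGlobalMetadataIdentifier

    private IdentificationConfigSketch(Properties props) {
        // Assumption: a missing key is treated as "false".
        addGlobalLoId = flag(props, "identification.xml.addGlobalLOIdentifier");
        addGlobalLomId = flag(props, "identification.xml.addGlobalMetadataIdentifier");
    }

    private static boolean flag(Properties props, String key) {
        // Boolean.parseBoolean is true only for "true" (case-insensitive).
        return Boolean.parseBoolean(props.getProperty(key, "false").trim());
    }

    // Parses properties-file text; a real run would read configure.properties from disk.
    static IdentificationConfigSketch parse(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new IdentificationConfigSketch(props);
    }

    public static void main(String[] args) {
        IdentificationConfigSketch cfg = parse(
                "identification.xml.addGlobalLOIdentifier=true\n"
              + "identification.xml.addGlobalMetadataIdentifier=false\n");
        System.out.println("LO id: " + cfg.addGlobalLoId + ", LOM id: " + cfg.addGlobalLomId);
    }
}
```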
- org.ariadne.oai.utils.HashID
- org.ariadne.oai.utils.HarvesterUtils
- org.ariadne.oai.utils.PIDS
With the first option, the global identifier is built from the hex representation of a hash of the metadata content. The following hashing algorithms are supported:
- MD2
- MD5
- SHA-1
- SHA-256
- SHA-384
- SHA-512
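The choice of algorithm determines how long the hash part of the identifier is. The sketch below uses the standard `java.security.MessageDigest` API to report the hex length for each listed algorithm; the class name `DigestLengthSketch` is hypothetical, and whether HashID resolves the algorithms this way is an assumption.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of the Ariadne code base.
final class DigestLengthSketch {
    static final String[] ALGORITHMS = {
        "MD2", "MD5", "SHA-1", "SHA-256", "SHA-384", "SHA-512"
    };

    // Number of hex characters produced by the given algorithm
    // (two hex characters per digest byte).
    static int hexLength(String algorithm) {
        try {
            return MessageDigest.getInstance(algorithm).getDigestLength() * 2;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unsupported algorithm: " + algorithm, e);
        }
    }

    public static void main(String[] args) {
        for (String algorithm : ALGORITHMS) {
            System.out.println(algorithm + " -> " + hexLength(algorithm) + " hex characters");
        }
    }
}
```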
With the second option, the global identifier is created using the following scheme:
global_lo_id=dataprovider+general.identifier.entry
global_lom_id=dataprovider+metaMetadata.identifier.entry
If you choose to have the program create a global LOM ID, the created global ID will also be used as the XML file name.
The third step is to define the value of the catalog element by setting the following attribute in the configure.properties file:
identification.catalog.value
The value above will be used as the general.identifier.catalog and/or metaMetadata.identifier.catalog value. It also forms one part of the newly created LO or LOM global identifier.
The last step is the logging configuration:
log.file.path=path_to_folder_where_the_logs_will_be_saved
log.file.name=log_file_name
Depending on your operating system, open cmd (on Windows) or a bash shell (on Linux), change to the directory where AriadneIdentification.jar is located, and enter the following:
java -jar AriadneIdentification.jar inputFolder outputFolder
Tip: For large numbers of XML files, add the -Xmx argument to the command above to set the maximum heap size that the Java virtual machine may use, like this:
java -Xmx4096m -jar AriadneIdentification.jar inputFolder outputFolder
As shown above, there are two ways of creating global identifiers:
- Using the scheme: GlobalID=catalogValue+providerName+pre-existing general.identifier.entry or metaMetadata.identifier.entry element.
- Using the hex representation of the MD5 hash of the metadata content. With this approach, the global ID is created by applying the MD5 hash function to the content of the metadata record* and taking the hex representation of the result. The steps are as follows:
- Read the content of the metadata record.
- Store it temporarily to a string.
- Compute the Hash using the string above as input.
- Get the HEX representation of the computed hash.
- MetadataRecID=catalogValue+providerName+stringHash.
*This option is used only for creating distinct metadata identifiers, not learning object identifiers.
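The hashing steps above can be sketched in Java as follows. This is an illustration under stated assumptions, not the program's actual code: the class and method names are hypothetical, the record is assumed to be UTF-8 encoded, and the three parts are joined without a separator because the scheme above shows none.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of the Ariadne code base.
final class HashIdSketch {

    // Steps 1-2: read the metadata record and keep it in a string (UTF-8 is an assumption).
    static String readRecord(Path xmlFile) {
        try {
            return new String(Files.readAllBytes(xmlFile), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Steps 3-4: compute the MD5 hash and return its lowercase hex representation.
    static String md5Hex(String content) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // Step 5: catalogValue + providerName + stringHash, joined without a separator.
    static String metadataRecordId(String catalogValue, String providerName, String content) {
        return catalogValue + providerName + md5Hex(content);
    }

    public static void main(String[] args) {
        // "exampleCatalog" and "exampleProvider" are placeholder values.
        System.out.println(metadataRecordId("exampleCatalog", "exampleProvider", "<lom/>"));
    }
}
```

Because the identifier depends on the full record content, any change to the file, including whitespace, yields a different ID.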
As shown above, the global LO identifier creation scheme is the same for both approaches. Note that a record sometimes has no general.identifier in the first place; in that case the technical.location element value is used as the general.identifier. The metaMetadata.identifier element may also be missing; in that case the file name of the XML record is used as a fallback.
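The fallback rules can be expressed as a small selection function. The Java sketch below is illustrative only: the names are hypothetical, and treating a blank value the same as a missing one is an assumption. Whether the real program strips the .xml extension from the file name is not stated here, so the file name is passed through unchanged.

```java
// Hypothetical helper, not part of the Ariadne code base.
final class FallbackSketch {

    // Returns the preferred value unless it is missing or blank (assumption),
    // in which case the fallback is returned.
    static String firstNonBlank(String preferred, String fallback) {
        if (preferred != null && !preferred.trim().isEmpty()) {
            return preferred.trim();
        }
        return fallback;
    }

    // LO identifier source: general.identifier.entry, else technical.location.
    static String loEntry(String generalIdentifierEntry, String technicalLocation) {
        return firstNonBlank(generalIdentifierEntry, technicalLocation);
    }

    // LOM identifier source: metaMetadata.identifier.entry, else the XML file name.
    static String lomEntry(String metaMetadataIdentifierEntry, String xmlFileName) {
        return firstNonBlank(metaMetadataIdentifierEntry, xmlFileName);
    }

    public static void main(String[] args) {
        // Placeholder values for illustration.
        System.out.println(loEntry(null, "http://example.org/resource/1"));
        System.out.println(lomEntry("", "record1.xml"));
    }
}
```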
Once the global ID is created, it is injected into the respective record as an additional general.identifier or metaMetadata.identifier element with all its LOM child elements (catalog, entry).
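A minimal sketch of such an injection with the standard Java DOM API is shown below. It assumes un-prefixed, non-namespaced element names (general, metaMetadata, identifier, catalog, entry) and inserts the new identifier as the first child of its parent; real LOM records usually declare a namespace, and the program's actual placement of the element may differ. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the Ariadne code base.
final class IdentifierInjectionSketch {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Adds <identifier><catalog/><entry/></identifier> under the first element
    // named parentName ("general" or "metaMetadata").
    static void addIdentifier(Document doc, String parentName, String catalog, String entry) {
        Node parent = doc.getElementsByTagName(parentName).item(0);
        if (parent == null) {
            throw new IllegalArgumentException("No <" + parentName + "> element found");
        }
        Element identifier = doc.createElement("identifier");
        Element catalogElement = doc.createElement("catalog");
        catalogElement.setTextContent(catalog);
        Element entryElement = doc.createElement("entry");
        entryElement.setTextContent(entry);
        identifier.appendChild(catalogElement);
        identifier.appendChild(entryElement);
        // Assumption: place the new identifier before the existing children.
        parent.insertBefore(identifier, parent.getFirstChild());
    }

    public static void main(String[] args) {
        Document doc = parse("<lom><general><title>t</title></general></lom>");
        // Placeholder catalog and entry values.
        addIdentifier(doc, "general", "exampleCatalog", "exampleCatalogexampleProvider123");
        System.out.println(doc.getElementsByTagName("identifier").getLength());
    }
}
```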
DATE----NUMBER_OF_IDENTIFIED_RECORDS----REPONAME----GLOBAL_GENERALID(TRUE|FALSE)----GLOBAL_METAMETADATAID(TRUE|FALSE)