Some catalogue specific elements do not work properly #14

Open
TobiasNx opened this issue Sep 20, 2024 · 15 comments
@TobiasNx

Some of the changes in GKT and other hbz-specific elements whose tags consist of letters do not validate properly even though they are configured:

[image]

"t", "Titel eines Werkes", "NR",
"f", "Datum eines Werkes", "NR",
"g", "Zusatz", "R",
"h", "Inhaltstyp", "R",
"l", "Sprache der Expression", "R",
"m", "Medium der Musikaufführung", "R",
"n", "Zählung eines Teils/einer Abteilung eines Werkes", "R",
"o", "Angabe des Musikarrangements", "R",
"p", "Titel eines Teils/einer Abteilung eines Werkes", "R",
"r", "Tonart", "NR",

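For readers unfamiliar with the notation: each row above is a (subfield code, label, cardinality) triple, where "R" marks a repeatable and "NR" a non-repeatable subfield. The following is only an illustrative sketch of what checking a field against such a list means. The class SubfieldTable and its methods are hypothetical and are not QA catalogue's actual API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of QA catalogue): holds code/label/cardinality triples. */
final class SubfieldTable {
  // subfield code -> true when repeatable ("R"), false when non-repeatable ("NR")
  private final Map<String, Boolean> repeatable = new LinkedHashMap<>();

  /** Reads a flat list of code, label, cardinality triples like the one above. */
  SubfieldTable(String... triples) {
    if (triples.length % 3 != 0) {
      throw new IllegalArgumentException("expected code, label, cardinality triples");
    }
    for (int i = 0; i < triples.length; i += 3) {
      repeatable.put(triples[i], "R".equals(triples[i + 2]));
    }
  }

  /** Returns one message per undefined code and per repeated non-repeatable code. */
  List<String> validate(List<String> codesInField) {
    // Count how often each subfield code occurs in this field instance.
    Map<String, Integer> counts = new LinkedHashMap<>();
    for (String code : codesInField) {
      counts.merge(code, 1, Integer::sum);
    }
    List<String> errors = new ArrayList<>();
    for (Map.Entry<String, Integer> entry : counts.entrySet()) {
      String code = entry.getKey();
      if (!repeatable.containsKey(code)) {
        errors.add("undefined subfield: " + code);
      } else if (!repeatable.get(code) && entry.getValue() > 1) {
        errors.add("non-repeatable subfield repeated: " + code);
      }
    }
    return errors;
  }
}
```

With a table built from the rows above, a field containing the subfields t, g, g would yield no messages, while a field containing a code that is not listed would yield an "undefined subfield" message, which is the symptom reported here for GKT $t.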
@TobiasNx
Author

TobiasNx commented Sep 26, 2024

The same happens with core-specific enhancements: #13

@pkiraly
Collaborator

pkiraly commented Sep 26, 2024

Did you specify the local version with --marcVersion? Could you send me the parameters you applied for the validation?

@TobiasNx
Author

TYPE_PARAMS="--marcVersion HBZ --marcxml --fixAlma --ignorableRecords DEL$a=Y --ignorableFields 964,940,941,942,944,945,946,947,948,949,950,951,952,955,956,957,958,959,966,967,970,971,972,973,974,975,976,977,978,978,979"
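
A side note on the parameters: the --ignorableFields list contains the tag 978 twice. Whether that matters to QA catalogue is not established in this thread, but long tag lists like this are easy to check mechanically. A minimal sketch follows; the class IgnorableFieldsCheck is hypothetical and not part of QA catalogue.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper (not part of QA catalogue) for sanity-checking a tag list. */
final class IgnorableFieldsCheck {
  /** Returns each tag that occurs more than once in a comma-separated list, reported once. */
  static List<String> duplicates(String commaSeparatedTags) {
    Set<String> seen = new LinkedHashSet<>();
    List<String> repeated = new ArrayList<>();
    for (String tag : commaSeparatedTags.split(",")) {
      String trimmed = tag.trim();
      // Set.add returns false when the tag was already present.
      if (!seen.add(trimmed) && !repeated.contains(trimmed)) {
        repeated.add(trimmed);
      }
    }
    return repeated;
  }
}
```

Called with the --ignorableFields value above, duplicates returns a list containing only 978.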

@pkiraly
Copy link
Collaborator

pkiraly commented Sep 27, 2024

Thanks! It seems OK. The next step would be to write a unit test. Could you add a file with 1-2 MARCXML records to src/test/resources/marc in your fork? I would then write the unit tests, which should help detect the errors.

TobiasNx added a commit that referenced this issue Oct 9, 2024
990082522550206441 -> undefined subfield 710 $$9  (#13)
990171082050206441 -> invalid value 246$ind2 	9 (#13)
991000922029706482 -> undefined subfield GKT 	t  and  981 	b  (#14)
@TobiasNx
Author

@pkiraly we added three MARCXML files to the folder marcxml. I used a separate file for each issue, and each file name hints at the issue it demonstrates.

0358af4

@pkiraly
Collaborator

pkiraly commented Oct 12, 2024

@TobiasNx Thanks! I wrote a unit test against these 3 files, but I was not able to reproduce the error. I found another error, though: the ignorableRecords parameter throws an error when the parameters are saved to a JSON file. This might prevent the validation from completing successfully. Could you check whether validation.log contains a Java exception close to the end of the file?
I just fixed this issue in the main repository, see pkiraly#525.

I cannot push the test against HBZ files because I do not have the necessary permission, so I put the code here. Please add it to src/test/java/de/gwdg/metadataqa/marc/cli/ValidatorCliTest.java:

// add this line to the import section
import java.util.stream.Collectors;

// put it after the last test method
  @Test
  public void validate_whenHbz() throws Exception {
    clearOutput(outputDir, outputFiles);

    ValidatorCli processor = new ValidatorCli(new String[]{
      "--schemaType", "MARC21",
      "--marcVersion", "HBZ",
      "--marcxml",
      "--outputDir", outputDir,
      "--fixAlma",
      "--ignorableRecords", "DEL$a=Y",
      "--ignorableFields", "964,940,941,942,944,945,946,947,948,949,950,951,952,955,956,957,958,959,966,967,970,971,972,973,974,975,976,977,978,978,979",
      "--details",
      "--trimId",
      "--summary",
      TestUtils.getPath("marcxml/990082522550206441_missing_validation_custom_subfield_9_core_710.xml"),
      TestUtils.getPath("marcxml/990171082050206441_missing_validation_custom_ind2_9_core_246.xml"),
      TestUtils.getPath("marcxml/991000922029706482_missing_subfield_validation_t_in_customfield_GKT.xml"),
    });

    RecordIterator iterator = new RecordIterator(processor);
    iterator.setProcessWithErrors(true);
    iterator.start();

    List<String> lines = getFileLines("issue-summary.csv");
    assertEquals(3, lines.size());
    List<String> undefinedFields = lines.stream()
      .filter(line -> line.contains("undefined field"))
      .collect(Collectors.toList());
    assertEquals(0, undefinedFields.size());
    // Pattern pattern = Pattern.compile("^\\d+,952,\\d+,\\d+,undefined field");
    // assertTrue(pattern.matcher(undefinedFields.get(0)).find());
  }
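
Note that the filter in the test above matches the literal text "undefined field", which does not occur inside a message such as "undefined subfield" (the kind of issue reported for 710 $9 and GKT $t). A minimal sketch of a reusable filter that could be called once per message type is shown below. The class IssueLines is hypothetical, and the real column layout of issue-summary.csv is not shown in this thread, so the sketch only relies on the message text appearing somewhere in the line.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper (not part of QA catalogue) for filtering issue summary lines. */
final class IssueLines {
  /** Returns the lines that contain the given message text, in their original order. */
  static List<String> containing(List<String> lines, String message) {
    return lines.stream()
      .filter(line -> line.contains(message))
      .collect(Collectors.toList());
  }
}
```

In the test, IssueLines.containing(lines, "undefined subfield") could then be asserted to be empty in the same way as the "undefined field" case.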

@TobiasNx
Author

@Phu2 will take care of it.

@Phu2

Phu2 commented Oct 18, 2024

Could you check whether validation.log contains a Java exception close to the end of the file?

No exceptions found. These are the last 20 lines from processing our whole base dump, which contains more than 27 million records:

qa-catalogue@quaoar4:~/qa-catalogue$ tail -n 20 logs/hbz/validate.log 
Sep 23, 2024 3:27:51 PM de.gwdg.metadataqa.marc.dao.Control007 processContent
SEVERE: #991013998379706467) invalid category for 007: '|'
Sep 23, 2024 3:27:51 PM de.gwdg.metadataqa.marc.dao.Control007 processContent
SEVERE: #991013998419706467) invalid category for 007: '|'
Sep 23, 2024 3:27:51 PM de.gwdg.metadataqa.marc.dao.Control007 processContent
SEVERE: #991013998449706467) invalid category for 007: '|'
Sep 23, 2024 3:27:51 PM de.gwdg.metadataqa.marc.dao.Control007 processContent
SEVERE: #991003042719706480) invalid category for 007: '|'
Sep 23, 2024 3:27:52 PM de.gwdg.metadataqa.marc.cli.utils.RecordIterator processFile
INFO: Finished processing file. Processed 27,824,613 records.
Sep 23, 2024 3:27:52 PM de.gwdg.metadataqa.marc.cli.ValidatorCli afterIteration
INFO: printCounter
Sep 23, 2024 3:27:52 PM de.gwdg.metadataqa.marc.cli.ValidatorCli afterIteration
INFO: Saving summary
Sep 23, 2024 3:32:46 PM de.gwdg.metadataqa.marc.cli.ValidatorCli afterIteration
INFO: all printing is DONE
Sep 23, 2024 3:32:46 PM de.gwdg.metadataqa.marc.cli.QACli saveParameters
INFO: Saving configuration to /opt/qa-catalogue/output/hbz/validation.params.json.
Sep 23, 2024 3:32:46 PM de.gwdg.metadataqa.marc.cli.utils.RecordIterator start
INFO: Bye! It took: 01:40:42
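
For logs of this size, the same check can be done programmatically instead of by eye. The following is a rough sketch, assuming that an exception shows up either as a class name ending in Exception or Error, or as an indented "at ..." stack frame. The class LogTail and the pattern are hypothetical and not part of QA catalogue.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical helper (not part of QA catalogue) that scans the end of a log. */
final class LogTail {
  // Matches a name ending in Exception or Error (e.g. java.lang.NullPointerException),
  // or an indented stack frame such as "    at pkg.Class.method(".
  private static final Pattern EXCEPTION_MARKER =
    Pattern.compile("\\b[\\w.]+(Exception|Error)\\b|^\\s+at\\s+[\\w.$<>]+\\(");

  /** Returns the lines among the last tailSize lines that look like part of an exception. */
  static List<String> suspiciousLines(List<String> lines, int tailSize) {
    int from = Math.max(0, lines.size() - tailSize);
    return lines.subList(from, lines.size()).stream()
      .filter(line -> EXCEPTION_MARKER.matcher(line).find())
      .collect(Collectors.toList());
  }
}
```

The lines could be read with Files.readAllLines(Paths.get("logs/hbz/validate.log")). None of the lines in the excerpt above match the pattern, which agrees with the manual check.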

Phu2 added a commit that referenced this issue Oct 18, 2024
Phu2 added a commit that referenced this issue Oct 18, 2024
@Phu2

Phu2 commented Oct 18, 2024

I cannot push the test against HBZ files because I do not have the necessary permission

Now you have write access (pending invitation).

@Phu2

Phu2 commented Oct 18, 2024

@pkiraly Updated ValidatorCliTest.java as suggested, see new branch 14-test-validation.

@Phu2 Phu2 assigned TobiasNx and unassigned Phu2 Oct 18, 2024
@TobiasNx
Author

@pkiraly
Collaborator

pkiraly commented Oct 18, 2024

This is what pkiraly#525 fixes. Now that I have write permission, I will fix it in this branch today.

@pkiraly
Collaborator

pkiraly commented Oct 18, 2024

@TobiasNx I pushed the changes. You can try it again.

@Phu2

Phu2 commented Oct 21, 2024

Thanks, @pkiraly! mvn test runs fine without any errors.
@TobiasNx please have a look when you are back from vacation.

@TobiasNx
Author

TobiasNx commented Oct 28, 2024

See: #17

Running mvn clean install locally seems to work now, and the new PR seems to work too.
