

@dr0i
Member

@dr0i dr0i commented Mar 28, 2025

This is a draft and WIP.
@TobiasNx you can use it for functional testing.

Resolves #510.

@dr0i dr0i requested a review from TobiasNx March 28, 2025 12:18
@dr0i dr0i changed the title from WIP SRUopener (#510) to WIP SRUopener Mar 28, 2025
@dr0i dr0i moved this to Review in Metafacture Mar 28, 2025
@TobiasNx
Contributor

Nice, seems to work. +1
The printed logs are a little bit esoteric:

, startRecord=1, maximumRecords=1000, istream.length=7865
urlToOpen=https://services.dnb.de/sru/zdb?query=dnb.isil%3DDE-Sol1&operation=searchRetrieve&recordSchema=MARC21plus-xml&version=1.1&maximumRecords=1000&startRecord=1001
, startRecord=1001, maximumRecords=1000, istream.length=7865
urlToOpen=https://services.dnb.de/sru/zdb?query=dnb.isil%3DDE-Sol1&operation=searchRetrieve&recordSchema=MARC21plus-xml&version=1.1&maximumRecords=1000&startRecord=2001
, startRecord=2001, maximumRecords=1000, istream.length=7865
urlToOpen=https://services.dnb.de/sru/zdb?query=dnb.isil%3DDE-Sol1&operation=searchRetrieve&recordSchema=MARC21plus-xml&version=1.1&maximumRecords=1000&startRecord=3001
, startRecord=3001, maximumRecords=1000, istream.length=7865
urlToOpen=https://services.dnb.de/sru/zdb?query=dnb.isil%3DDE-Sol1&operation=searchRetrieve&recordSchema=MARC21plus-xml&version=1.1&maximumRecords=1000&startRecord=4001
, startRecord=4001, maximumRecords=1000, istream.length=437
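Each request in the log above advances startRecord by maximumRecords (1, 1001, 2001, ...). A minimal sketch of that paging arithmetic, assuming 1-based SRU record positions. The class name SruPaging and the hit count 4500 are hypothetical (the log does not show numberOfRecords):

```java
import java.util.ArrayList;
import java.util.List;

final class SruPaging {

    // Returns the 1-based startRecord of every page needed to cover
    // numberOfRecords hits when each request asks for maximumRecords.
    static List<Integer> startPositions(final int numberOfRecords, final int maximumRecords) {
        if (maximumRecords <= 0) {
            throw new IllegalArgumentException("maximumRecords must be positive");
        }
        final List<Integer> starts = new ArrayList<>();
        // SRU positions start at 1, so the last page starts at or before numberOfRecords.
        for (int start = 1; start <= numberOfRecords; start += maximumRecords) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(final String[] args) {
        // Hypothetical: 4500 hits, 1000 per page
        System.out.println(startPositions(4500, 1000)); // [1, 1001, 2001, 3001, 4001]
    }
}
```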

@TobiasNx
Contributor

@dr0i, is this still in review?

@dr0i
Member Author

dr0i commented Apr 10, 2025

As we found out in #510, this PR needs a complete redesign.

@dr0i dr0i force-pushed the 510-addSruOpener branch from ecd9c8c to c3f3ad6 April 10, 2025 13:32
@dr0i dr0i force-pushed the 510-addSruOpener branch from 84d6845 to 3dc0416 June 2, 2025 14:24
@dr0i
Member Author

dr0i commented Jun 2, 2025

@TobiasNx, can you do functional tests before I continue here? Have a look at the @Description to see how it works (hint: it is "stream" based, i.e. it works differently from how the OAI-PMH opener works at the moment).
I've added the class to flux-commands.
[edit]: Please also ignore the failing editorconfigChecker for now.

@dr0i dr0i requested a review from TobiasNx June 2, 2025 14:38
@TobiasNx
Contributor

TobiasNx commented Jun 4, 2025

@dr0i I tried to install the distribution as described in https://metafacture.github.io/metafacture-documentation/docs/flux/Flux-User-Guide.html#build-from-local-distribution to try the runner for functional testing, but it runs into errors:

$ ./gradlew installDist

> Configure project :
HEAD has no annotated tags
No SCM tag found. Making a snapshot build
Feature branch found
Version is feature-510-addSruOpener-SNAPSHOT

[Incubating] Problems report is available at: file:///home/user/git/metafacture-core/build/reports/problems/problems-report.html

Deprecated Gradle features were used in this build, making it incompatible with Gradle 9.0.

You can use '--warning-mode all' to show the individual deprecation warnings and determine if they come from your own scripts or plugins.

For more on this, please refer to https://docs.gradle.org/8.13/userguide/command_line_interface.html#sec:command_line_warnings in the Gradle documentation.

When I then run flux.sh, it outputs the following:

$ /home/user/git/metafacture-core/metafacture-runner/build/install/metafacture-core/flux.sh
Exception in thread "main" java.lang.ExceptionInInitializerError
        at org.metafacture.runner.Flux.main(Flux.java:62)
Caused by: org.metafacture.commons.reflection.ReflectionException: Class not found: org.metafacture.io.
        at org.metafacture.commons.reflection.ReflectionUtil.loadClass(ReflectionUtil.java:70)
        at org.metafacture.commons.reflection.ObjectFactory.loadClassesFromMap(ObjectFactory.java:57)
        at org.metafacture.flux.parser.FluxProgramm.<clinit>(FluxProgramm.java:54)
        ... 1 more
Caused by: java.lang.ClassNotFoundException: org.metafacture.io.
        at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:594)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:527)
        at org.metafacture.commons.reflection.ReflectionUtil.loadClass(ReflectionUtil.java:67)
        ... 3 more

Can you help? (For comparison, I tested the current master: there, both $ ./gradlew installDist and $ /home/user/git/metafacture-core/metafacture-runner/build/install/metafacture-core/flux.sh work.)
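The trace shows ObjectFactory trying to load the bare package name org.metafacture.io., which suggests a flux-commands entry whose class name got cut off. A hypothetical quick check (not part of metafacture), assuming the command mapping uses the properties format `command-name fully.qualified.ClassName`, that lists every entry whose class cannot be loaded. The sample entries, including java.lang.String as a stand-in for a loadable class, are made up:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.TreeSet;

final class FluxCommandCheck {

    // Returns "command -> className" for every entry whose class cannot be loaded.
    // Assumes the "command-name fully.qualified.ClassName" properties format.
    static List<String> unloadable(final String propertiesText) {
        final Properties commands = new Properties();
        try {
            commands.load(new StringReader(propertiesText));
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
        final List<String> broken = new ArrayList<>();
        // Sorted for stable output; Properties itself is unordered.
        for (final String command : new TreeSet<>(commands.stringPropertyNames())) {
            final String className = commands.getProperty(command).trim();
            try {
                // initialize=false: only check that the class exists, skip static initializers
                Class.forName(className, false, FluxCommandCheck.class.getClassLoader());
            } catch (final ClassNotFoundException | LinkageError e) {
                broken.add(command + " -> " + className);
            }
        }
        return broken;
    }

    public static void main(final String[] args) {
        final String sample = "open-file java.lang.String\n"
                + "broken-entry org.metafacture.io.\n";
        System.out.println(unloadable(sample)); // [broken-entry -> org.metafacture.io.]
    }
}
```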

@TobiasNx TobiasNx assigned dr0i and unassigned TobiasNx Jun 4, 2025
@dr0i
Member Author

dr0i commented Jun 5, 2025

Ah, I accidentally removed the TarReader.
Please try again.

@dr0i dr0i assigned TobiasNx and unassigned dr0i Jun 5, 2025
@dr0i dr0i force-pushed the 510-addSruOpener branch from 4a00838 to ac80718 June 10, 2025 13:02
@TobiasNx
Contributor

TobiasNx commented Jun 18, 2025

The current version gets stuck in an endless SRU request loop: after finishing all requests, it starts again at record 1. This happens regardless of whether a total number of records is given or not:

e.g.

"https://services.dnb.de/sru/authorities"
| open-sru(recordSchema="MARC21plus-xml", query="WOE%3Dsozialistenkongress%20and%20COD%3Ds",version="1.1",maximumRecords="10")
| object-batch-log(batchsize="10")
| as-records
| write(FLUX_DIR + "result.txt")
;
"https://services.dnb.de/sru/authorities"
| open-sru(recordSchema="MARC21plus-xml", query="WOE%3Dsozialistenkongress%20and%20COD%3Ds",version="1.1",maximumRecords="10",total="10")
| object-batch-log(batchsize="10")
| as-records
| write(FLUX_DIR + "result.txt")
;

Both result in the following output. Note that recordPosition 1 turns up again after the expected last recordPosition 8:

<?xml version="1.0" encoding="UTF-8"?><searchRetrieveResponse xmlns="http://www.loc.gov/zing/srw/"><version>1.1</version><numberOfRecords>8</numberOfRecords><records><record><recordSchema>MARC21plus-xml</recordSchema><recordPacking>xml</recordPacking><recordData><collection xmlns="http://www.loc.gov/MARC21/slim">
  <record type="Authority">
    <leader>00000nz  a2200000nc 4500</leader>
    <controlfield tag="001">042278333</controlfield>
    <controlfield tag="003">DE-101</controlfield>
    <controlfield tag="005">20110429135047.0</controlfield>
    <controlfield tag="008">900305n||azznnaabn           | ana    |c</controlfield>
    <datafield ind1="7" ind2=" " tag="024">
      <subfield code="a">4227833-8</subfield>
      <subfield code="0">http://d-nb.info/gnd/4227833-8</subfield>
      <subfield code="2">gnd</subfield>
    </datafield>
    <datafield ind1=" " ind2=" " tag="035">
      <subfield code="a">(DE-101)042278333</subfield>
...
    <datafield ind1=" " ind2=" " tag="913">
      <subfield code="S">swd</subfield>
      <subfield code="i">k</subfield>
      <subfield code="a">Internationaler Sozialistenkongress</subfield>
      <subfield code="0">(DE-588c)4021089-3</subfield>
    </datafield>
  </record>
</collection></recordData><recordPosition>8</recordPosition></record></records><echoedSearchRetrieveRequest><version>1.1</version><query>WOE=sozialistenkongress and COD=s</query><xQuery xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/><startRecord>6</startRecord><maximumRecords>5</maximumRecords><recordSchema>MARC21plus-xml</recordSchema></echoedSearchRetrieveRequest></searchRetrieveResponse>
<?xml version="1.0" encoding="UTF-8"?><searchRetrieveResponse xmlns="http://www.loc.gov/zing/srw/"><version>1.1</version><numberOfRecords>8</numberOfRecords><records><record><recordSchema>MARC21plus-xml</recordSchema><recordPacking>xml</recordPacking><recordData><collection xmlns="http://www.loc.gov/MARC21/slim">
  <record type="Authority">
    <leader>00000nz  a2200000nc 4500</leader>
    <controlfield tag="001">042278333</controlfield>
    <controlfield tag="003">DE-101</controlfield>
    <controlfield tag="005">20110429135047.0</controlfield>
    <controlfield tag="008">900305n||azznnaabn           | ana    |c</controlfield>
    <datafield ind1="7" ind2=" " tag="024">
      <subfield code="a">4227833-8</subfield>
      <subfield code="0">http://d-nb.info/gnd/4227833-8</subfield>
      <subfield code="2">gnd</subfield>
...
    </datafield>
    <datafield ind1=" " ind2=" " tag="913">
      <subfield code="S">swd</subfield>
      <subfield code="i">c</subfield>
      <subfield code="a">Bern / Internationaler Sozialistenkongress &lt;1919&gt;</subfield>
      <subfield code="0">(DE-588c)4227833-8</subfield>
    </datafield>
  </record>
</collection></recordData><recordPosition>1</recordPosition></record><record><recordSchema>MARC21plus-xml</recordSchema><recordPacking>xml</recordPacking><recordData><collection xmlns="http://www.loc.gov/MARC21/slim">
  <record type="Authority">
    <leader>00000nz  a2200000nc 4500</leader>
    <controlfield tag="001">1267605979</controlfield>
    <controlfield tag="003">DE-101</controlfield>
    <controlfield tag="005">20230329111229.0</controlfield>
    <controlfield tag="008">220908n||azznnaabn           | ana    |c</controlfield>
    <datafield ind1="7" ind2=" " tag="024">
      <subfield code="a">1267605979</subfield>
      <subfield code="0">http://d-nb.info/gnd/1267605979</subfield>
      <subfield code="2">gnd</subfield>
...
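To stop instead of wrapping around to recordPosition 1, the request loop needs a termination check against the numberOfRecords reported in each response. A minimal sketch, assuming the SRU 1.1 namespace seen in the output above and treating total as an optional upper limit on fetched records (total <= 0 meaning "no limit" is an assumption). SruTermination, numberOfRecords and hasMore are hypothetical names; this is not the fix that went into the PR:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

final class SruTermination {

    // Namespace of the SRU 1.1 searchRetrieveResponse shown above.
    static final String SRW_NS = "http://www.loc.gov/zing/srw/";

    // Reads <numberOfRecords> from an SRU response.
    static int numberOfRecords(final byte[] responseXml) throws Exception {
        final DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed for getElementsByTagNameNS
        final Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(responseXml));
        final NodeList nodes = doc.getElementsByTagNameNS(SRW_NS, "numberOfRecords");
        if (nodes.getLength() == 0) {
            throw new IllegalStateException("no numberOfRecords in response");
        }
        return Integer.parseInt(nodes.item(0).getTextContent().trim());
    }

    // Decides whether a request starting at nextStart is still needed.
    // Stops once nextStart lies beyond the hit count (no wrap-around to 1)
    // or once the optional total has already been fetched.
    static boolean hasMore(final int nextStart, final int numberOfRecords, final int total) {
        final int alreadyFetched = nextStart - 1;
        if (total > 0 && alreadyFetched >= total) {
            return false;
        }
        return nextStart <= numberOfRecords;
    }

    public static void main(final String[] args) throws Exception {
        final String xml = "<searchRetrieveResponse xmlns=\"" + SRW_NS + "\">"
                + "<version>1.1</version><numberOfRecords>8</numberOfRecords>"
                + "</searchRetrieveResponse>";
        final int hits = numberOfRecords(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(hits);                  // 8
        System.out.println(hasMore(6, hits, 0));   // true: records 6..8 remain
        System.out.println(hasMore(11, hits, 0));  // false: past the last record
        System.out.println(hasMore(11, hits, 10)); // false: total reached
    }
}
```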

@TobiasNx TobiasNx assigned dr0i and unassigned TobiasNx Jun 18, 2025
Contributor

@TobiasNx TobiasNx left a comment

It seems that the SRU opener gets stuck in an infinite loop. See: #682 (comment)

@dr0i
Member Author

dr0i commented Jun 20, 2025

The infinite loop should be fixed with b92238b; please try again, @TobiasNx.

@dr0i
Member Author

dr0i commented Jun 27, 2025

@TobiasNx Can you note here that this is not a bug but is caused by marcxmlplus (or similar)?

@TobiasNx
Contributor

@dr0i The behaviour I reported was not the problem from the workshop. But the indentation behaviour I reported is related to handle-generic-xml, so it is not a bug in the SRU opener.

dr0i added 2 commits July 10, 2025 13:27
Every single output is a valid XML by itself.
@dr0i dr0i force-pushed the 510-addSruOpener branch from 841fae3 to 65e7592 July 10, 2025 11:28
@dr0i
Member Author

dr0i commented Jul 10, 2025

Hi @blackwinter, if you have some time, can you implement tests here? Functionally, the module is ready.

@dr0i dr0i assigned blackwinter and unassigned dr0i Jul 10, 2025
@dr0i dr0i moved this from Working to Selected in Metafacture Jul 10, 2025
Member

@blackwinter blackwinter left a comment

Made a first pass and left some comments. We should discuss tests after the open questions are resolved.

Member

Fails Checkstyle check.

Member Author

Solved in 415dece, except for the ClassFanOutComplexity complaint. What should we do about it?

Member

Just checkstyle-disable-line it for now.

DocumentBuilder docBuilder = factory.newDocumentBuilder();
Document xmldoc = docBuilder.parse(inputStreamOfURl);

Transformer t = TransformerFactory.newInstance().newTransformer();
Member

Ditto.

Member Author

Ditto (made them final).

@blackwinter blackwinter assigned dr0i and unassigned blackwinter Jul 22, 2025
@dr0i dr0i assigned blackwinter and unassigned dr0i Oct 20, 2025
Member

@blackwinter blackwinter left a comment

Apart from the tests, should be mostly okay. Left some more comments regarding small optimizations and naming improvements.

P.S.: Please let the original commenter resolve any open conversations. They should have the opportunity to decide whether they're satisfied with your proposed solution.

Member

Just checkstyle-disable-line it for now.

Member

Fails Checkstyle check.

@Override
public void process(final String baseUrl) {

final StringBuilder srUrl = new StringBuilder(baseUrl);
Member

Should be a String to begin with (no need to build the string for every iteration).

Throw the exception first, then either build the string once or just concatenate all its elements.
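A sketch of the suggested shape: validate first, build the invariant part of the URL once as a String, and only append startRecord per request. The names SruUrlBuilder, baseRequestUrl and pageUrl and the empty-query check are hypothetical; the query is assumed to be already URL-encoded, as in the Flux examples in this thread:

```java
final class SruUrlBuilder {

    // Validates up front, then builds the fixed part of the request URL exactly once.
    static String baseRequestUrl(final String baseUrl, final String query, final String recordSchema,
            final String version, final int maximumRecords) {
        if (query == null || query.isEmpty()) {
            throw new IllegalArgumentException("query must be set");
        }
        return baseUrl + "?query=" + query
                + "&operation=searchRetrieve"
                + "&recordSchema=" + recordSchema
                + "&version=" + version
                + "&maximumRecords=" + maximumRecords;
    }

    // Per iteration only the start position is appended.
    static String pageUrl(final String baseRequestUrl, final int startRecord) {
        return baseRequestUrl + "&startRecord=" + startRecord;
    }

    public static void main(final String[] args) {
        final String base = baseRequestUrl("https://services.dnb.de/sru/zdb", "dnb.isil%3DDE-Sol1",
                "MARC21plus-xml", "1.1", 1000);
        for (int start = 1; start <= 2001; start += 1000) {
            System.out.println(pageUrl(base, start));
        }
    }
}
```

With the ZDB parameters from the earlier log, pageUrl(base, 1001) yields the same URL as the second urlToOpen line there.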

try {
final InputStream inputStreamOfURl = retrieveUrl(srUrl);
final DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
final DocumentBuilder docBuilder = factory.newDocumentBuilder();
Member

Can the DocumentBuilder be reused? Then move it outside the loop or make it static.
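A DocumentBuilder can be reused sequentially, but its Javadoc does not guarantee thread safety, so a per-instance field is arguably safer than a static one; reset() returns it to its freshly created state. A minimal sketch of that pattern; ReusableParser and the exception wrapping are hypothetical choices, not the PR's code:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

final class ReusableParser {

    // Created once per instance and reused for every response.
    // Not static: DocumentBuilder is not guaranteed to be thread-safe.
    private final DocumentBuilder docBuilder;

    ReusableParser() {
        final DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        try {
            docBuilder = factory.newDocumentBuilder();
        } catch (final ParserConfigurationException e) {
            throw new IllegalStateException("cannot create DocumentBuilder", e);
        }
    }

    Document parse(final InputStream in) {
        // Restore the freshly created state before each reuse.
        docBuilder.reset();
        try {
            return docBuilder.parse(in);
        } catch (final SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse response", e);
        }
    }

    public static void main(final String[] args) {
        final ReusableParser parser = new ReusableParser();
        for (final String xml : new String[] {"<a/>", "<b/>"}) {
            final InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
            System.out.println(parser.parse(in).getDocumentElement().getTagName());
        }
    }
}
```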

final DocumentBuilder docBuilder = factory.newDocumentBuilder();
final Document xmldoc = docBuilder.parse(inputStreamOfURl);

final Transformer t = TransformerFactory.newInstance().newTransformer();
Member

Give t a more descriptive name.
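A sketch of a descriptively named identity Transformer, created once and reused to serialize each matching element as a standalone XML string that keeps its own XML declaration (in line with the "Every single output is a valid XML by itself." commit). RecordSerializer, domToXmlSerializer, parse and serializeEach are hypothetical names; like DocumentBuilder, a Transformer should only be reused sequentially:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class RecordSerializer {

    // Identity transformer that turns DOM nodes back into text (instead of "t").
    private final Transformer domToXmlSerializer;

    RecordSerializer() {
        try {
            domToXmlSerializer = TransformerFactory.newInstance().newTransformer();
        } catch (final TransformerException e) {
            throw new IllegalStateException("no XML transformer available", e);
        }
        domToXmlSerializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        // Keep the XML declaration so every output is a standalone document.
        domToXmlSerializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
    }

    // Helper for the sketch: namespace-aware parse of an XML string.
    static Document parse(final String xml) {
        try {
            final DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (final ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("unparsable XML", e);
        }
    }

    // Serializes every element with the given namespace ("*" = any) and local name.
    List<String> serializeEach(final Document doc, final String namespace, final String localName) {
        final NodeList elements = doc.getElementsByTagNameNS(namespace, localName);
        final List<String> outputs = new ArrayList<>();
        for (int i = 0; i < elements.getLength(); i++) {
            final StringWriter out = new StringWriter();
            try {
                domToXmlSerializer.transform(new DOMSource(elements.item(i)), new StreamResult(out));
            } catch (final TransformerException e) {
                throw new IllegalStateException("serialization failed", e);
            }
            outputs.add(out.toString());
        }
        return outputs;
    }

    public static void main(final String[] args) {
        final Document doc = parse("<records><record>1</record><record>2</record></records>");
        for (final String xml : new RecordSerializer().serializeEach(doc, "*", "record")) {
            System.out.println(xml);
        }
    }
}
```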

@blackwinter blackwinter assigned dr0i and unassigned blackwinter Oct 24, 2025

Projects

Status: Selected

Development

Successfully merging this pull request may close these issues.

Add SRU opener / open-sru

3 participants