-
-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Importer project cleanup #41
Conversation
While I haven't encountered one of these previously, they cause an infinite retry loop without this patch. Strangely enough the logged error was an author validation error, although these do not get thrown... I was not able to reproduce this error with the same sample (OL100223A): > Invalid Name section - Invalid name section name null > Invalid name section sort name null > Invalid name section language null
Delete type definitions which have been moved to bookbrainz-data, see metabrainz/bookbrainz-data-js@7679731
The exploration tool was used to collect statistics about available JSON attributes in OL dumps. If we ever need this again, it will be easier to rewrite it from scratch.
Also uninstall the now unused `async` package.
Now that the outdated legacy code is gone, the eslint output finally became clearer as it contained less irrelevant messages.
The config.sample.json file and the importer README are reasonably self-explanatory for now.
Looks all good so far ! I agree with moving files in another PR. Happy to have an IRC chat to figure out folder structure if you want.
It might indeed make sense to do that. |
Let's postpone that until I've got the OpenLibrary importer (including relationships) working, then I will have a better understanding how complex each producer implementation will be and how many common code is left which does not belong into bb-data.
Since the identifier mapping enum will be used by both bb-site and bb-utils, it should definitely live in bb-data in the end. |
So does that mean that work on this PR is done, and it's ready for final review and merging? |
Yes, I would say the work is done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I set up everything from scratch following the instructions in the readme, and successfully produced messages for 100 authors, then started the consumer which successfully consumed 100 messages.
Everything is working fine and the code looks much cleaner, nice work !
Even though I get errors on insertion in the DB, the tool works great and the CLI is very usable (testing the failure queue while I was at it).
IMO the default mechanism should be to post failures to the failure queue, with the ability to pass an option to change the failure queue name or disable it. (probably fine to change this in a separate PR once we've used the tool a bit more, maybe my opinion was actually 💩 )
Good to hear that you have successfully tested it, with 100 messages even 😮 Most of the current errors seem to be the result from languages which are missing in the BB (sample) database but are present in MB. The reason for these rare languages appearing in our sample data at all is the bad performance of the language detection for author aliases, which are way too short to reliably do that. I think we should rather leave the alias language blank. I agree that changing the defaults to have a failure queue makes sense. Better store these messages by default than losing them by default. I think the primary motivation for the current default is that the CLI parameter was easier to implement that way 😁 |
Continuation of #40, depends on metabrainz/bookbrainz-data-js#309
Most commits are only deleting unused stuff, so this one should be easy to review.
Additionally I've fixed a rare bug in ca8f751, improved a few types and resolved a few linter errors. (Now that the outdated legacy code is gone, the eslint output finally became clearer as it contained less irrelevant messages.)
asyncCluster
(currently no replacement) and uninstallasync
npm packagea subfolderbb-data (15abafc depends on TypeScript migration bookbrainz-data-js#309, could be dropped from this PR if necessary)As you can see above, I haven't moved any files yet as I'm still unsure what would be a good folder structure. Until we have found one, I would prefer not to move any files as this only makes it very hard to trace the git history of a file. 8965604 is still annoying me from time to time in this regard. (I wonder why it was decided to have two unrelated project in one repository, that has tripped me a few times already when I accidentally ran npm or yarn inside the root directory...)