Update Saarland Scraper #121
This is my first point of contact with this project's code. Upon inspecting the saarland_scraper test cases I found mostly tests of internal and potentially private methods. I didn't feel confident enough to touch the scraper's implementation without a high-level test perspective of:
- Giving a response derived from the real-world server to the scraper
- Checking the extracted entries and their properties

This commit adds webmock to stub the scraper's network calls. That way the fixtures can be checked more easily against real-world responses, or updated with current data in case the website changes its format again.

references robbi5#119
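The two checks listed in this commit message (feed a recorded response in, inspect the extracted entries) can be sketched roughly as follows. This is not the project's code: the class name, the fixture markup, the URL, and the `Entry` fields are all hypothetical. With WebMock the stub would sit at the HTTP layer (`stub_request(:get, url).to_return(body: fixture)`); the sketch swaps in a plain lambda as the fetcher so it runs without extra gems.

```ruby
require "date"

# Hypothetical fixture: a trimmed-down listing page. The real Landtag markup
# will differ; a real suite would load this from a fixture file.
FIXTURE_HTML = <<~HTML
  <ul class="papers">
    <li><a href="/drucksache/Ag16_0001.pdf">Drucksache 16/1</a> <span class="date">31.03.2019</span></li>
    <li><a href="/drucksache/Ag16_0002.pdf">Drucksache 16/2</a> <span class="date">01.04.2019</span></li>
  </ul>
HTML

# Hypothetical shape of one extracted entry.
Entry = Struct.new(:reference, :url, :published_at)

# Hypothetical scraper. It receives a callable that returns the response body
# for a URL, so a test can hand it canned data instead of touching the network.
class OverviewScraperSketch
  BASE = "https://example.invalid"
  ENTRY = %r{<a href="([^"]+)">Drucksache (\d+/\d+)</a>\s*<span class="date">([\d.]+)</span>}

  def initialize(fetch)
    @fetch = fetch
  end

  # Returns one Entry per listed paper.
  def scrape
    body = @fetch.call("#{BASE}/papers")
    body.scan(ENTRY).map do |path, reference, date|
      Entry.new(reference, BASE + path, Date.strptime(date, "%d.%m.%Y"))
    end
  end
end

# Black-box usage: only the input (fixture) and the output (entries) are visible.
requested = []
fetch = lambda do |url|
  requested << url # record what the scraper asked for
  FIXTURE_HTML
end
entries = OverviewScraperSketch.new(fetch).scrape
```

Because the test only sees the fixture and the resulting entries, private helper methods inside the scraper can be renamed or rewritten without touching the test.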
- Input data was updated
- Test expectations were updated

TODO: Update DetailScraper

references robbi5#119
Some files could not be reviewed due to errors:
.rubocop.yml: Style/AccessModifierIndentation has the wrong namespace - should be Layout
.rubocop.yml: Style/AlignHash has the wrong namespace - should be Layout
.rubocop.yml: Style/AlignParameters has the wrong namespace - should be Layout
.rubocop.yml: Style/CaseIndentation has the wrong namespace - should be Layout
.rubocop.yml: Style/DotPosition has the wrong namespace - should be Layout
.rubocop.yml: Style/EmptyLineBetweenDefs has the wrong namespace - should be Layout
.rubocop.yml: Style/EmptyLinesAroundBlockBody has the wrong namespace - should be Layout
.rubocop.yml: Style/EmptyLinesAroundClassBody has the wrong namespace - should be Layout
.rubocop.yml: Style/EmptyLinesAroundModuleBody has the wrong namespace - should be Layout
.rubocop.yml: Style/FileName has the wrong namespace - should be Naming
.rubocop.yml: Style/FirstParameterIndentation has the wrong namespace - should be Layout
.rubocop.yml: Style/IndentationWidth has the wrong namespace - should be Layout
.rubocop.yml: Style/IndentHash has the wrong namespace - should be Layout
.rubocop.yml: Style/MethodName has the wrong namespace - should be Naming
.rubocop.yml: Style/MultilineOperationIndentation has the wrong namespace - should be Layout
.rubocop.yml: Style/PredicateName has the wrong namespace - should be Naming
.rubocop.yml: Style/SpaceAroundBlockParameters has the wrong namespace - should be Layout
.rubocop.yml: Style/SpaceAroundEqualsInParameterDefault has the wrong namespace - should be Layout
.rubocop.yml: Style/SpaceBeforeBlockBraces has the wrong namespace - should be Layout
.rubocop.yml: Style/SpaceInsideBlockBraces has the wrong namespace - should be Layout
.rubocop.yml: Style/SpaceInsideHashLiteralBraces has the wrong namespace - should be Layout
.rubocop.yml: Style/TrailingBlankLines has the wrong namespace - should be Layout
.rubocop.yml: Style/VariableName has the wrong namespace - should be Naming
.rubocop.yml: Lint/EndAlignment has the wrong namespace - should be Layout
.rubocop.yml: Lint/DefEndAlignment has the wrong namespace - should be Layout
.rubocop.yml: Style/ExtraSpacing has the wrong namespace - should be Layout
.rubocop.yml: Style/AccessorMethodName has the wrong namespace - should be Naming
.rubocop.yml: Style/AlignArray has the wrong namespace - should be Layout
.rubocop.yml: Style/AsciiIdentifiers has the wrong namespace - should be Naming
.rubocop.yml: Style/BlockEndNewline has the wrong namespace - should be Layout
.rubocop.yml: Style/ClassAndModuleCamelCase has the wrong namespace - should be Naming
.rubocop.yml: Style/CommentIndentation has the wrong namespace - should be Layout
.rubocop.yml: Style/ConstantName has the wrong namespace - should be Naming
.rubocop.yml: Style/ElseAlignment has the wrong namespace - should be Layout
.rubocop.yml: Style/EmptyLines has the wrong namespace - should be Layout
.rubocop.yml: Style/EmptyLinesAroundAccessModifier has the wrong namespace - should be Layout
.rubocop.yml: Style/EmptyLinesAroundMethodBody has the wrong namespace - should be Layout
.rubocop.yml: Style/EndOfLine has the wrong namespace - should be Layout
.rubocop.yml: Style/IndentationConsistency has the wrong namespace - should be Layout
.rubocop.yml: Style/IndentArray has the wrong namespace - should be Layout
.rubocop.yml: Style/LeadingCommentSpace has the wrong namespace - should be Layout
.rubocop.yml: Style/MultilineBlockLayout has the wrong namespace - should be Layout
.rubocop.yml: Style/SpaceAfterColon has the wrong namespace - should be Layout
.rubocop.yml: Style/SpaceAfterComma has the wrong namespace - should be Layout
.rubocop.yml: Style/SpaceAfterMethodName has the wrong namespace - should be Layout
.rubocop.yml: Style/SpaceAfterNot has the wrong namespace - should be Layout
.rubocop.yml: Style/SpaceAfterSemicolon has the wrong namespace - should be Layout
.rubocop.yml: Style/SpaceBeforeComma has the wrong namespace - should be Layout
.rubocop.yml: Style/SpaceBeforeComment has the wrong namespace - should be Layout
.rubocop.yml: Style/SpaceBeforeSemicolon has the wrong namespace - should be Layout
.rubocop.yml: Style/SpaceAroundOperators has the wrong namespace - should be Layout
.rubocop.yml: Style/SpaceInsideParens has the wrong namespace - should be Layout
.rubocop.yml: Style/SpaceInsideRangeLiteral has the wrong namespace - should be Layout
.rubocop.yml: Style/Tab has the wrong namespace - should be Layout
.rubocop.yml: Style/TrailingWhitespace has the wrong namespace - should be Layout
.rubocop.yml: Lint/BlockAlignment has the wrong namespace - should be Layout
.rubocop.yml: Lint/ConditionPosition has the wrong namespace - should be Layout
.rubocop.yml: Lint/Eval has the wrong namespace - should be Security
.rubocop.yml: Lint/SpaceBeforeFirstArg has the wrong namespace - should be Layout
Error: The `Style/TrailingComma` cop no longer exists. Please use `Style/TrailingCommaInArguments`, `Style/TrailingCommaInArrayLiteral`, and/or `Style/TrailingCommaInHashLiteral` instead.
(obsolete configuration found in .rubocop.yml, please update it)
The `Rails/DefaultScope` cop no longer exists.
The `Lint/InvalidCharacterLiteral` cop has been removed since it was never being actually triggered.
The `Style/SingleSpaceBeforeFirstArg` cop has been renamed to `Layout/SpaceBeforeFirstArg`.
The `Style/SpaceAfterControlKeyword` cop has been removed. Please use `Layout/SpaceAroundKeyword` instead.
The `Style/SpaceBeforeModifierKeyword` cop has been removed. Please use `Layout/SpaceAroundKeyword` instead.
The `Style/MethodCallParentheses` cop has been renamed to `Style/MethodCallWithoutArgsParentheses`.
The `Style/DeprecatedHashMethods` cop has been renamed to `Style/PreferredHashMethods`.
The `Style/OpMethod` cop has been renamed and moved to `Naming/BinaryOperatorParameterName`.
obsolete parameter EnforcedStyle (for Style/Encoding) found in .rubocop.yml
Style/Encoding no longer supports styles. The "never" behavior is always assumed.
obsolete parameter SupportedStyles (for Style/Encoding) found in .rubocop.yml
obsolete parameter MaxLineLength (for Style/IfUnlessModifier) found in .rubocop.yml
`Style/IfUnlessModifier: MaxLineLength` has been removed. Use `Metrics/LineLength: Max` instead
obsolete parameter MaxLineLength (for Style/WhileUntilModifier) found in .rubocop.yml
`Style/WhileUntilModifier: MaxLineLength` has been removed. Use `Metrics/LineLength: Max` instead
obsolete parameter RunRailsCops (for AllCops) found in .rubocop.yml
Use the following configuration instead:
Rails:
  Enabled: true
obsolete parameter IndentWhenRelativeTo (for Layout/CaseIndentation) found in .rubocop.yml
`IndentWhenRelativeTo` has been renamed to `EnforcedStyle`
obsolete parameter AlignWith (for Layout/EndAlignment) found in .rubocop.yml
`AlignWith` has been renamed to `EnforcedStyleAlignWith`
obsolete parameter AlignWith (for Layout/DefEndAlignment) found in .rubocop.yml
Had failing tests because of daylight saving time.

references robbi5#119
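The commit message does not say how the failure was fixed. As an illustration only, one common way to keep date expectations independent of daylight saving time is to compare zone-free `Date` values rather than local `Time` values. The helper names and the sample date below are hypothetical.

```ruby
require "date"

# Hypothetical helper: parse a German-style date such as "31.03.2019" into a
# Date. A Date carries no time zone, so DST cannot shift it.
def parse_published_date(text)
  Date.strptime(text.strip, "%d.%m.%Y")
end

# Run a block under a temporary TZ setting and restore the old value afterwards.
def with_tz(zone)
  old = ENV["TZ"]
  ENV["TZ"] = zone
  yield
ensure
  ENV["TZ"] = old
end

# The same input yields the same Date whatever the local zone is.
berlin = with_tz("Europe/Berlin") { parse_published_date("31.03.2019") }
utc    = with_tz("UTC") { parse_published_date("31.03.2019") }

# By contrast, the span between two local midnights depends on the zone.
# Where tz data is installed, the Berlin value is typically an hour short
# for this date because clocks moved forward that night.
berlin_day_seconds = with_tz("Europe/Berlin") { Time.local(2019, 4, 1) - Time.local(2019, 3, 31) }
utc_day_seconds    = with_tz("UTC") { Time.local(2019, 4, 1) - Time.local(2019, 3, 31) }
```

Expectations written against `berlin` or `utc` stay stable across the clock change, while expectations built from local-time arithmetic can drift by an hour.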
Thank you for the PR and sorry for the long delay.

I've added code for the Detail Scraper today. For the Overview Scraper I think a bit of pagination is needed to get older published papers. The method for this would be

Rake commands for calling the scrapers already exist, but I see that they could be hard to follow, because they trigger async scraping. For development and debugging, synchronous scraping would be easier to follow.

My method for testing is currently using the
The Saarland Landtag updated their website and listing of published papers. This pull request tries to adjust the scraper to the new listing.
It adds webmock for higher-level testing of the scraper's input. This way the scraping itself can be tested as a black box, and the tests become more robust to internal changes and rewrites of the scraper class.
Side note 1:
Unfortunately I don't yet have the insight needed to test the scraper in a real-world scenario, since I don't understand the surrounding pieces such as the body model. Is there, or could there be, an easy-to-run rake command which checks that the general workflow of scraping and extracting files still works over a live internet connection? Something like an integration test under real-world conditions.
Side note 2:
I just stumbled upon this project a few days ago and it's brilliant! Thanks for all your work, @robbi5!