Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes diff when fuzzy finder anything is used in a Hash object (proof of concept) #596

Closed

Conversation

KarlHeitmann
Copy link
Contributor

@KarlHeitmann KarlHeitmann commented Apr 25, 2024

Hi

I'd like to give you some context about this PR.

TL; DR

See #551 , it is annoying to see the noise added by anything matcher on diffs and I wanted to fix that. I initially thought the solution was trivial, but I quickly found out I was wrong when I began reading the Diff class: there is a lot happening under the hood. I know there are some limitations to the Differ, and I understand there is a lot to change. This PR is a proposal, I don't know if my solution is too hacky, and I'd like to receive some feedback in order to solve this problem.

Before:

@@ -1,4 +1,4 @@
-:an_key => anything,
+:an_key => "dummy",
 :fixed => "fixed",
-:nested => {:anything_key=>anything, :name=>"foo", :trigger=>"wrong"},
+:nested => {:anything_key=>"bcdd0399-1cfe-4de1-a481-ca6b17d41ed8", :name=>"foo", :trigger=>"trigger"},

After:

@@ -1,4 +1,4 @@
 :an_key => "dummy",
 :fixed => "fixed",
-:nested => {:anything_key=>"bcdd0399-1cfe-4de1-a481-ca6b17d41ed8", :name=>"foo", :trigger=>"wrong"},
+:nested => {:anything_key=>"bcdd0399-1cfe-4de1-a481-ca6b17d41ed8", :name=>"foo", :trigger=>"trigger"},

Context

I am currently working on a RoR project that uses Jbuilder to render JSON data that I am delivering on the endpoints of my API.

I am testing my .jbuilder files with type: view RSpec tests, and I have a lot of .json fixture files I am loading and using on my test files as expected values.

The problem this PR addresses.

My plan is to use these fixture files to kill two birds with one stone. On one side, I am comparing those fixture files with the JSON rendered by the .jbuilder files, and on the other side I am using those fixture files as mocks to share with the frontend developers, so they will have the data necessary to write their React application.

The plan is: if somebody in the backend changes a jbuilder, the tests will complain and I will remember to change the corresponding JSON fixture. This way, I will always have the JSON fixtures used by the frontend developers in sync with the backend rendered files. Frontend developers often need to work with data they don't know how to create on their local machines (because they do know React or Vue, but don't know the Rails framework)

In order to achieve the results of my plan, I ~ needed ~ to scrub some fields of my JSON fixture files using the anything fuzzy matcher. An example of fields I often need to scrub are the id, created_at and updated_at values.

I quickly discovered this annoying issue described on #551 : whenever I missed something that should make the test fail and render the diff in STDOUT, I found out the key-value pairs in my expected var with a anything matcher will add noise to the diff message, obfuscating which key actually differs from the expectation.

My approach.

My first idea was: before the place in the source code that renders the diff string between actual and expected vars, I can add a stage where I'll search every key-value pair that has an anything object as value in the expected variable, and replace the anything value with the value found on the actual var I am testing.

Hopefully the commits of this PR will show you what I mean. First commit consists in an implementation that will work with a Hash with no nested hashes, second commit will use recursivity to extend the solution to a hash object with nested hashes.

My reasons.

When I started writing the code to implement my idea, I've got a lot of questions: Can I just mutate the expected var? Shouldn't I get a notice message by the diff remembering that I've used the anything fuzzy matcher in my test? can I just copy the actual value in the expected var? Am I lying when I am doing that?

My answer to these questions was: If I decide to use the fuzzy matcher anything on a key-value pair in a hash object in my test, it implies I do not care about seeing the anything in my diff . Because it is a wild card, because the anything fuzzy matcher should morph into the real value on my actual variable.

I know this solution is hacky.... but it works! It gets rid of the noise generated by the anything fuzzy matcher. I tried to use the super_diff gem, but it has the same problem.

I am sure the code I wrote can be improved and be more concise (as surely could be the text on this PR, lol), and I thought about extending the fix to cover Array objects. But I wanted to know your opinion about this issue before continuing with this. Can I use a recursive function this way? What impact on performance will have my lines of code in this extra stage I am adding? As Aristotle said: every scientific investigation should begin by gathering the opinion of the wise people (ie, the people that earlier on thought and worked on the problem I will begin to work right now).

I always liked the "diff" concept. And I am willing to continue collaborating on enhancing the diff tool. I've seen @mcmire wrote this guide about RSpec, but I have not digged on the article yet. I am doing baby steps. Every feedback is appreciated.

Best.

@KarlHeitmann
Copy link
Contributor Author

I've submitted 3 new commits to get rid of the CyclomaticComplexity, PerceivedComplexity & MethodLength cops. Each commit fixes one offense. The commits can be squashed into one commit, but since there are maaany ways to solve these offenses, these are only proposals.

@KarlHeitmann
Copy link
Contributor Author

Latest commit is to fix a problem when using ruby 2.4, 2.3, 2.2. It substitutes prepend method of Array class by unshift. Because prior ruby 2.4 it was not defined.

I can convert that last commit into a fixup to amend: e4ece3df8e411ea4dab6eb08e02269ccf9c3fb14

@KarlHeitmann
Copy link
Contributor Author

Build is still failing before commit f9a9e9d.

image

I don't know how to reproduce these build environment. I don't know what ruby-head stands for neither I understand the error there. And I don't know how to reproduce the 1.8.7 neither REE (what does this mean?) ruby versions in my computer to dig in and understand why it is failing.

However, I copied and pasted the ruby strings in an irb session and printed them to STDOUT using puts:

image

And found out on version 1.8.7 and REE the keys were not in alphabetical order. I know rspec-support uses pretty print to make the diff of the objects, and my hypothesis is on ruby version 1.8.7, pretty print maybe was not working correctly with some ruby hashes, and its output may be somewhat random.

The only solution I thought of was to cheat and rearrange my tests so the keys are mimicking the output. Hopefully tests will pass now in version 1.8.7 and REE.

Any other ideas are welcome to make those ruby builds pass.

lib/rspec/support/differ.rb Outdated Show resolved Hide resolved
end

def recursive_get_keys(hash)
klass = RSpec::Mocks::ArgumentMatchers::AnyArgMatcher
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if this is not defined, like someone opted out of mocking completely, or uses rr/mocha etc?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it’s ok to use ‘return [] unless defined?(RSpec::Mocks)’ here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I didn't thought someone may opt out mocking completely. I've fixed it on my latest commit!

@pirj
Copy link
Member

pirj commented May 8, 2024

Please accept my apologies for not reviewing this swiftly.

May I ask you to add the output of some spec how it looked before and after this change?

Don’t worry about ruby-head, it is very typical for it to fail. We mostly use it a bit ahead of time to fix issues with upcoming Ruby releases.

Thanks for the effort you’re putting into this.

@pirj
Copy link
Member

pirj commented May 8, 2024

To me it’s high time to soft-deprecate Ruby 1.8.7.
If you feel that fixing it is not something you’d love to dive i to, please make sure no existing spec fail, and just make your new examples pending with a note “broken on old rubies for unknown reasons”.

@KarlHeitmann
Copy link
Contributor Author

Hey @pirj ! Thanks for reaching out! No worries, it is just my hobby try to fix simple things in github. I have on my bucket list to make a tiny contribution to an important project (like this one). But I understand people may be busier than me 🤣

I made this gist with the output of these 3 specs BEFORE the changes here, and this gist with the output AFTER the changes. I hope this clarifies what I intend to do with my change.

My intention is whenever you want to diff a Hash with another Hash, if the expected hash contains an anything fuzzy matcher, that anything value will morph into the value of the actual variable, in order to reduce the noise generated by the diff if another key-value pair has a mismatch. As described here #551

The three examples I wrote on my gists shows you one example of two hash comparisons with NO nested hashes, and the other two shows you what happens if there is a nested hash. Related to your first comment, if you think it is too unsafe to perform this recursively, I can tweak my PR so this will work only with plain hashes. The first commit of this PR implements this behavior ONLY on hash comparisons without nested hashes.

@KarlHeitmann
Copy link
Contributor Author

P.S.1: About support for ruby 1.8.7, I saw you've used if String.method_defined?(:encoding) ... else ... end block to define methods for ruby 1.8.7 and the other ruby versions dynamically. Maybe I can give it a try to fix the ruby build problem.

P.S.2: I noticed something bad while using the debugger... after I mutate the hash on the expected anything value here, when PC returns to the it scope, the expected variable has mutated! As you can see on the screenshot below:
image
Line 611 has mutated the expected[:an_key] var from anything to dummy. I didn't notice that previously, I think this needs to be addressed.

@pirj
Copy link
Member

pirj commented May 10, 2024

I recall something has changed in 1.9 or 2.0 related to the stability of keys order on hashes. This may cause the ordering issue.

Please correct me if I’m wrong, but this seems to only affect the order of pair in the output diff, and won’t cause any other behavioural difference between rspec runnin on 1.8.7 and later. If so, I suggest disregarding this problem, and marking the spec as broken on 1.8.7.

@KarlHeitmann
Copy link
Contributor Author

Yes @pirj , it only affects the order of pair in the output diff. I'll mark the spec as broken on 1.8.7

end
elsif no_procs?(actual, expected) && no_numbers?(actual, expected)
diff = diff_as_string(coerce_to_string(actual), coerce_to_string(expected)) if any_multiline_strings?(actual, expected)
elsif hash_with_anything?(expected)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To reset the discussion below.
I am concerned about side effects of shared state in instance variables.
I’d live to get rid of an instance variable @keys_with_anything.
Only those two methods need it. Can we cache it somehow here in a regular variable? Can hash_with_anything? become something that returns a value that we can subsequently pass to diff_as_object_with_anything?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok! I just came up with something, I didn't thought previously about shared state. Now, I've renamed hash_with_anything? to all_keys_from_hash and it returns a keys array, that is later used in diff_as_object_with_anything. This way we don't have shared state in instance variables.

This approach involved an assignment inside an if condition, it added an extra && operator and it increased the rubocop metric: PerceivedComplexity score, so I had to create another method to wrap all the logic inside it.

@KarlHeitmann
Copy link
Contributor Author

Ok! I think I've solved the issues now. Please tell me if there is anything else I can do. I can also squash all the commits in one or two, but I think it is easier to review the code changes when each commit addresses a code suggestion change.

|@@ -1,4 +1,4 @@
| :an_key => "dummy",
| :fixed => "fixed",
|-:nested => {:anything_key=>"bcdd0399-1cfe-4", :name=>"foo", :trigger=>"wrong"},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m in the fence about this change. For nested arrays, doesn’t it make more sense to indicate that a certain key was anything?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion for nested arrays. Displaying that a certain key was anything for nested arrays was the original behavior.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m still not too familiar with this addition. Do you yhink it is possible to keep the effect of collapsed “an_key” which I undoubtedly like, but make no changes to the “anything_key”’s value in expected of a nested array?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, sure. If you checkout the first commit of this PR, you'll be on a state where there is no recursion, thus no further modifications on nested objects.

I have no problem at all to revert to that state. I don't have a strong opinion about morphing anything fuzzy matcher in nested objects. After all, when a nested object differs you get the entire object highlighted in red anyways, so I understand morphing deep nested values (like the one you are unfamiliar with: :anything_key) may not be worth to do, and it may be better to only keep the effect of collapsed first level keys like "an_key".

But I don't know the answer to this, just want to give you options so you can choose a good answer. You have a lot more experience than I in this project :)

def run(actual, expected)
diff = ""

no_procs_no_numbers = lambda {|var1, var2| no_procs?(var1, var2) && no_numbers?(var1, var2)}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason not to make this a method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know, I don't have a strong opinion either here. Originally I did it this way:

            if no_procs?(actual, expected) && no_numbers?(actual, expected)
               diff = diff_as_object_with_anything(actual, expected)
             end
          elsif no_procs?(actual, expected) && no_numbers?(actual, expected)
             diff = diff_as_object(actual, expected)

But Rubocop complained about offending Metrics::PerceivedComplexity rule, in order to lower the perceived complexity score I made the lambda function to reduce the number of && operators by 1, and that made the offense disappear.

I didn't made a method because I didn't know if it was overkill or not to create a method that only me will use it in only two places. But I don't have a strong opinion, I can write the method if you think it is better :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean is it possible to extract ‘no_procs_no_numbers?’ to a method? That would even further reduce the complexity, right? And (I may be mistaken) reduce memory allocations on subsequent calls.


if all_strings?(actual, expected)
diff = diff_as_string(coerce_to_string(actual), coerce_to_string(expected)) if any_multiline_strings?(actual, expected)
elsif (keys = all_keys_from_hash(expected)) && keys.any?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we check no_procs_no_numbers here, and the other check goes inside?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, that makes it easier to read! Thanks! I've submitted a new commit with your suggestion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something broke specs, I hope it’s not because of this cosmetic suggestion. Please feel free to revert if this is the case

@pirj pirj requested a review from JonRowe May 21, 2024 17:03
@KarlHeitmann
Copy link
Contributor Author

Hey @pirj , what if we just stick with the first commit and improve it with your great suggestion here?

The first commit is a simpler solution that implements the behavior you mentioned here, and I think that behavior although it is simpler is still a good improvement on rspec-support differ. RSpec::Support::Differ already displays nested object diffs in a single line, so the enhancement provided by my suggestion on this PR are marginal in such cases, it may not be worth to implement them.

I recall having seen the broken tests you mentioned while I was working on my last commits, but I am unable to reproduce the broken tests locally. Maybe it is better to simplify the PR, I can even open a new & cleaner PR with the first commit enhanced by your suggestion.

@pirj
Copy link
Member

pirj commented May 24, 2024

Completely agree, @KarlHeitmann. This pr will be kept if we decide to also include diffing for lines.

@KarlHeitmann
Copy link
Contributor Author

Ready, I've just submitted #599 with the simplified version of this PR.

@JonRowe
Copy link
Member

JonRowe commented Jun 30, 2024

👋 Do you want to rebase this taking #599 into account? If you'd prefer to close this thats also fine

@KarlHeitmann
Copy link
Contributor Author

Hey @JonRowe !

To be honest, I don't know what to do here...

I couldn't rebase the branch, because the merged #599 PR turned out to be completely different to what I initially wrote on the first commit of this branch.

Nevertheless, on my local machine I completely rewrote the current branch taking #599 and had success. The changes I made are simpler than what I did on my initial approach on the current PR #596 .

I could force-push my local branch into my repo so you can see the changes. But I don't know if it is worth to do. Because the diff message for nested hashes is already confusing. I mean, if I made the changes to hide any value that matches the anything fuzzy matcher on nested hashes.... it will not make a significant difference, because I would still see a biiiiiiiig single line diff for a hash object anyways 🤷. I don't know if I'm making myself understood correctly.

OMHO, I think I can close this PR (or keep it "on hold") until RSpec::Support::Differ class can handle diffing nested hashes in a readable and pretty way. Once that task is done, continue with the current PR to hide anything matches on nested hashes.

But I think it is your call, I can force push my local branch here or close this PR. Also, I can try to fix RSpec::Support::Differ to display nested hashes diff in a prettier way.

@pirj
Copy link
Member

pirj commented Jul 5, 2024

@KarlHeitmann what is your plan of improving the diff from the description?
The first, non-nested, part is done, right?

As I’m looking at before/after, my preference is to keep things as is. There are certainly cases where diffs can be improved. But any improvement is welcome. And I kindly appreciate your thoughtful and thorough approach!

Please feel free to force-push the rebased branch here.

@JonRowe
Copy link
Member

JonRowe commented Jul 7, 2024

I'm going to close this as is, if / when the differ gets improved over in RSpec Support and you'd like to re-open this as a new improvement then feel free to do so :)

@JonRowe JonRowe closed this Jul 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants