Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding Windows CI, fixing tests on Windows, test robustness #825

Open
wants to merge 19 commits into
base: master
Choose a base branch
from

Conversation

RoryO
Copy link

@RoryO RoryO commented Dec 9, 2019

Short story: I got this working

Build status

This was a journey. Hear my tale.

What started me on this saga was starting work on sonic pi. It depends on this library. Sonic PI's build instructions imply Rugged is the most fragile part in particular on Windows. Looking here I noticed Rugged does not have any Windows CI. This is important due to how fragile the RubyInstaller and msys2 environments are regarding C extensions. It's no fun doing gem install rugged which fails with no discernible fix from the user. Lets make sure that doesn't happen.

Attempt 1: Github Actions

I tried really, really hard to make this work. Short answer is the Windows images produced by Github are unpredictable and do not have any means of remote access for figuring them out.

Longer answer

The most difficult part of dealing with the Github provided images is getting to the bottom of layers of abstraction. There are definite issues regarding the installation of Ruby on the Windows images. I could not get to the bottom of it.

The recommended way is use the setup-ruby action. This works fine on *nix images. I guess it works fine on Windows images in the fact that the requested version is present. Using it is a matter of frustration.

The most frustrating and baffling thing about these images comes from the unpredictability on which tools run and how performing actions does not apparently affect anything.

For example, the last run where I tried and gave up, I discovered the msys2 system changes from underneath the Rake task that builds the gem and runs tests. You can see this here. Drawing attention to a few lines.

First the build step uses an outdated version of GCC even though earlier in the process I specifically updated the msys2 system which absolutely installed GCC 9.

Further, the actual build step of the gem uses a different tool path in C:\strawberry than the tool which configured the build earlier at C:\hostedtoolcache

This is the point where I gave up. I'm sure it's fixable somehow. There is a 10-15 minute iteration time for understanding these images and there is no remote shell option.

Attempt 2: Appveyor

Appveyor's specialty is it's Windows CI support. I found this true through working with it. I didn't have any troubles with their images. I did have some troubles with Bundler exit non-zero on a harmless warning I worked around. Otherwise it's smooth sailing.

Test breakages

This exposed a few issues with the test suite and potentially with libgit2. I left comments in this PR on where and why I changed things.

Further work

Included is a working appveyor config file. A maintainer of this repo must connect appveyor so it picks up on the reporting status.

I'm curious on the source of the failures I skipped over in the tests. It seems like they're within the libgit2 library and not the C binding code.

I'm also wondering about the open file handles after a test completes. On *nix they're definitely closed, on Windows they still linger until after the suite finishes. I don't feel like it affects day to day usage though.

@@ -12,39 +12,35 @@ jobs:
strategy:
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updating and simplifying this. Dropping 2.3 as it's EOL and not available in the CI images.

@@ -138,11 +138,13 @@ def test_raises_when_writing_invalid_entries

def test_can_write_index
Copy link
Author

@RoryO RoryO Dec 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was an interesting test failure. What was happening here is it accidentally working. The procedure adds two new index entries with identical OIDs. Then, when checking for the order of insertion we sorted by OID. The stability of this sort depends on too many factors, and was in fact in a different order on Windows. Quick fix is create the second index entry with an OID higher than the first.

@@ -554,12 +550,15 @@ def test_clone_bare
end

def test_clone_with_transfer_progress_callback
skip 'Callback does not work with filesystem clone on Windows' if Gem.win_platform?
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There may be some sort of difference with Windows where callbacks will not trigger unless a buffer hits a certain size. Other clone tests pass, so we know that cloning actually works, it's only the callback which doesn't trigger.

@@ -804,6 +802,11 @@ def verify_subtrees
end

def test_checkout_tree_with_commit
# the test repo has an unclean status. apparently libgit2 on *nix does not
Copy link
Author

@RoryO RoryO Dec 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another interesting thing I don't know how to resolve. The libgit2 test repo has modifications initially, confirmed on linux and windows. Apparently on *nix checking out another pointer isn't an issue here, on Windows it is for some reason.

# there are race conditions on Windows where the original test run
# did not release handles to the temporary directories.
# waiting until all tests finish works successfully
Minitest.after_run { Rugged::TestCase::FixtureRepo.teardown }
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The largest issue I kept running into with these tests is something about the Windows libgit2 was leaving open file handles to the temporary directories created. It appears the handles close when the test suite finishes. Therefore I think the best thing to do is clean up everything at the end of the suite instead of inline.

@@ -524,11 +518,13 @@ def test_init_with_wrong_argument
class RepositoryCloneTest < Rugged::TestCase
def setup
@tmppath = Dir.mktmpdir
@source_path = "file://" + File.join(Rugged::TestCase::TEST_DIR, 'fixtures', 'testrepo.git')
end
Rugged::TestCase::FixtureRepo.ensure_cleanup @tmppath
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making all temporary directory removal go through the same procedure for working around the open file handle issue mentioned later.

@@ -31,5 +31,8 @@ desc
s.add_development_dependency "rake-compiler", ">= 0.9.0"
s.add_development_dependency "pry"
s.add_development_dependency "minitest", "~> 5.0"
s.add_development_dependency "minitest-reporters"
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added for junit xml output. github actions does not support this at this time.

@@ -54,7 +54,13 @@ def test_fetch_over_https_with_certificate_callback_fail
})
end

assert_equal "user rejected certificate for github.com", exception.message
response_message = if Gem.win_platform?
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the windows response is actually more correct to the scenario. strange how we get different messages for the same action.

@RoryO RoryO marked this pull request as ready for review December 13, 2019 02:43
@@ -124,6 +124,7 @@ def test_remote_lookup_invalid

class RemotePushTest < Rugged::TestCase
def setup
skip 'local files and file:// protocol handled inconsistently with libgit2 on windows' if Gem.win_platform?
Copy link
Author

@RoryO RoryO Dec 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm already several layers of yak shaving doing what I originally wanted so I don't have the effort looking into this. This is worthy of someone looking into the libgit2 source and see what's up there. Some situations file:// works and is necessary, others it's rejected by the kernel with an invalid path name.

@RoryO
Copy link
Author

RoryO commented Dec 13, 2019

Looks like linux builds fail for reasons that are not our own. Appears github is transitioning ubuntu repos from ubuntu.com to microsoft.com

https://github.com/libgit2/rugged/runs/346774249#step:4:61

@RoryO
Copy link
Author

RoryO commented Dec 13, 2019

Pinging @carlosmn @tenderlove for visibility

@RoryO
Copy link
Author

RoryO commented Dec 19, 2019

@carlosmn @tenderlove beep beep

@RoryO
Copy link
Author

RoryO commented Feb 8, 2020

@carlosmn @tenderlove hello

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant