Terragrunt catalog command clones repos on every invocation #3532

Open
tgeijg opened this issue Nov 4, 2024 · 4 comments
Labels
bug (Something isn't working), preserved (Preserved issues never go stale)

Comments

tgeijg commented Nov 4, 2024

Describe the bug

According to the Medium article on terragrunt catalog, the command should clone the configured repos only on the first invocation:

Note: the catalog command will git clone the repos locally, so the first run may take a little while if you have lots of repos, but it should be much faster on all re-runs.

The docs are a little more vague as they just say the repos are cloned into a temporary folder.

Steps To Reproduce

Using the example from the Medium article above, I added the following catalog config to the root Terragrunt file in our infra repo:

catalog {
  urls = [
    "https://github.com/gruntwork-io/terragrunt-infrastructure-modules-example",
    "https://github.com/gruntwork-io/terraform-aws-utilities",
    "https://github.com/gruntwork-io/terraform-kubernetes-namespace"
  ]
}

What I'm seeing is that the repos are cloned again (into the same temporary folders) on each catalog invocation:

❯ tg catalog
10:35:12.380 INFO   Cloning repository "git::https://github.com/gruntwork-io/terragrunt-infrastructure-modules-example.git" to temporary directory "/var/folders/s3/zm9trzfs4cd6900ws1qhvfdh0000gp/T/catalog5a5f7a556b6a764d7757384d7a55516e437742456d616f6f413577/terragrunt-infrastructure-modules-example"
10:35:13.762 INFO   Found 5 modules in repository "https://github.com/gruntwork-io/terragrunt-infrastructure-modules-example"
10:35:13.762 INFO   Cloning repository "git::https://github.com/gruntwork-io/terraform-aws-utilities.git" to temporary directory "/var/folders/s3/zm9trzfs4cd6900ws1qhvfdh0000gp/T/catalog4e5732674b627567715f364f6947554c45717a7a4132514766714d/terraform-aws-utilities"
10:35:16.613 INFO   Found 11 modules in repository "https://github.com/gruntwork-io/terraform-aws-utilities"
10:35:16.613 INFO   Cloning repository "git::https://github.com/gruntwork-io/terraform-kubernetes-namespace.git" to temporary directory "/var/folders/s3/zm9trzfs4cd6900ws1qhvfdh0000gp/T/catalog69625338504b4c684c66353434305f45594d486c74495a4d725959/terraform-kubernetes-namespace"
10:35:17.728 INFO   Found 4 modules in repository "https://github.com/gruntwork-io/terraform-kubernetes-namespace"
❯
❯ tg catalog
10:35:20.898 INFO   Cloning repository "git::https://github.com/gruntwork-io/terragrunt-infrastructure-modules-example.git" to temporary directory "/var/folders/s3/zm9trzfs4cd6900ws1qhvfdh0000gp/T/catalog5a5f7a556b6a764d7757384d7a55516e437742456d616f6f413577/terragrunt-infrastructure-modules-example"
10:35:23.323 INFO   Found 5 modules in repository "https://github.com/gruntwork-io/terragrunt-infrastructure-modules-example"
10:35:23.323 INFO   Cloning repository "git::https://github.com/gruntwork-io/terraform-aws-utilities.git" to temporary directory "/var/folders/s3/zm9trzfs4cd6900ws1qhvfdh0000gp/T/catalog4e5732674b627567715f364f6947554c45717a7a4132514766714d/terraform-aws-utilities"
10:35:27.127 INFO   Found 11 modules in repository "https://github.com/gruntwork-io/terraform-aws-utilities"
10:35:27.127 INFO   Cloning repository "git::https://github.com/gruntwork-io/terraform-kubernetes-namespace.git" to temporary directory "/var/folders/s3/zm9trzfs4cd6900ws1qhvfdh0000gp/T/catalog69625338504b4c684c66353434305f45594d486c74495a4d725959/terraform-kubernetes-namespace"
10:35:29.149 INFO   Found 4 modules in repository "https://github.com/gruntwork-io/terraform-kubernetes-namespace"

Expected behavior

On second catalog invocation the repositories should not be cloned again (into the same temporary folder).

Either the Medium article is incorrect, I'm doing something wrong, or this is a bug.

Versions

  • Terragrunt version: v0.68.4
tgeijg added the bug label on Nov 4, 2024
Collaborator

yhakbar commented Nov 22, 2024

Hey @tgeijg ,

Thanks for submitting this issue! At the very least, our logs will need updating if we're avoiding the secondary clone, but reporting it as another clone.

Marking this issue as preserved so that it doesn't go stale. Definitely worth addressing.

yhakbar added the stale and preserved labels and removed the stale label on Nov 22, 2024
Author

tgeijg commented Dec 13, 2024

Hi @yhakbar

Just to add some more info: out of curiosity, I ran some tests with more repos (40, to be exact) in the catalog config. Results below:

❯❯ time tg catalog 
09:59:07.375 INFO   Cloning repository "git::ssh://[email protected]/rivian/dc/platform/terraform-modules/s3.git//." to temporary directory "/var/folders/s3/zm9trzfs4cd6900ws1qhvfdh0000gp/T/catalog78364c3256384b62486c507854427165422d37756e37585

...

"/var/folders/s3/zm9trzfs4cd6900ws1qhvfdh0000gp/T/catalog55674a5f734835564d4c2d7773333952467078646f525a48625045"
10:00:36.775 INFO   Found 1 modules in repository "[email protected]:rivian/dc/platform/terraform-modules/lakeformation.git//."

terragrunt catalog  7.64s user 13.29s system 22% cpu 1:31.57 total

❯❯ time tg catalog
10:00:41.944 INFO   Cloning repository "git::ssh://[email protected]/rivian/dc/platform/terraform-modules/s3.git//." to temporary directory "/var/folders/s3/zm9trzfs4cd6900ws1qhvfdh0000gp/T/catalog78364c3256384b62486c507854427165422d37756e37585

...

"/var/folders/s3/zm9trzfs4cd6900ws1qhvfdh0000gp/T/catalog55674a5f734835564d4c2d7773333952467078646f525a48625045"
10:02:07.977 INFO   Found 1 modules in repository "[email protected]:rivian/dc/platform/terraform-modules/lakeformation.git//."

terragrunt catalog  7.30s user 12.55s system 22% cpu 1:27.07 total

❯❯ time tg catalog
10:02:15.978 INFO   Cloning repository "git::ssh://[email protected]/rivian/dc/platform/terraform-modules/s3.git//." to temporary directory "/var/folders/s3/zm9trzfs4cd6900ws1qhvfdh0000gp/T/catalog78364c3256384b62486c507854427165422d37756e37585

...

"/var/folders/s3/zm9trzfs4cd6900ws1qhvfdh0000gp/T/catalog55674a5f734835564d4c2d7773333952467078646f525a48625045"
10:03:41.321 INFO   Found 1 modules in repository "[email protected]:rivian/dc/platform/terraform-modules/lakeformation.git//."

terragrunt catalog  7.32s user 12.40s system 22% cpu 1:26.63 total

In short: three consecutive runs, started within a few seconds of each other. The second and third runs were not significantly faster than the first:

1. 7.64s user 13.29s system 22% cpu 1:31.57 total
2. 7.30s user 12.55s system 22% cpu 1:27.07 total
3. 7.32s user 12.40s system 22% cpu 1:26.63 total

Collaborator

yhakbar commented Dec 13, 2024

I believe you that we have a bug preventing re-use of already cloned modules.

My guess as to the origin of this bug is that we initially cloned once, but users wanted catalogs to pick up updates without changing URLs. We could have introduced logic to use git ls-remote to check if the ref moved, but instead we just cloned each time.

Again, this is just a guess, but it would be plausible to me if that was the case.
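
For illustration, the git ls-remote check mentioned above could look roughly like the sketch below. It is written in Python rather than Terragrunt's Go, it is not taken from the Terragrunt codebase, and the function names (parse_ls_remote, remote_head, needs_clone) are hypothetical. The only external command it relies on is `git ls-remote <url> <ref>`, which prints one "<commit>\t<refname>" line per matching ref.

```python
import subprocess
from typing import Optional


def parse_ls_remote(output: str, ref: str = "HEAD") -> Optional[str]:
    """Return the commit hash that `git ls-remote` output lists for `ref`.

    Matches either the exact ref name (e.g. "HEAD") or a short name such as
    "main" against "refs/heads/main". The first match wins; None if no match.
    """
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        commit, name = parts
        if name == ref or name.endswith("/" + ref):
            return commit
    return None


def remote_head(url: str, ref: str = "HEAD") -> Optional[str]:
    """Ask the remote which commit `ref` points at, without cloning."""
    result = subprocess.run(
        ["git", "ls-remote", url, ref],
        check=True, capture_output=True, text=True,
    )
    return parse_ls_remote(result.stdout, ref)


def needs_clone(cached_commit: Optional[str], remote_commit: Optional[str]) -> bool:
    """Clone when nothing is cached, the remote ref is unknown, or it moved."""
    if cached_commit is None or remote_commit is None:
        return True
    return cached_commit != remote_commit
```

A caller would record the commit it cloned alongside the cached directory and, on the next run, compare it with remote_head(url) before deciding whether to clone again. That costs one small network round trip per repo instead of a full clone.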

I think a solution that would be better than saving a stashed clone of each ref would be to use something like this:
https://github.com/yhakbar/cln

The problem with storing a stashed clone of each ref is that there would be no space savings between updates. If you are using the default branch of a Catalog URL as the source of your catalog, then changing one file in your repo would result in an almost entirely duplicate copy of your repo on disk.

I would assume that most users would prefer a Content Addressable Store (CAS) to store an immutable read-only copy of every object in the repo, and to use that when perusing the catalog.

Would you agree with that? If so, I'm not sure when we would get to that, but that's the design we'd use for the final implementation.
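
To make the space-saving argument concrete, here is a minimal sketch of the content-addressable idea in Python. It is a generic illustration under stated assumptions, not the design of cln or of any Terragrunt implementation: blobs are keyed by SHA-256, snapshots are plain dicts of path to bytes, and working trees are built with hard links, which requires the store and the tree to be on the same filesystem. All names (store_blob, materialize, blob_count) are hypothetical.

```python
import hashlib
import os
from pathlib import Path
from typing import Dict


def store_blob(store: Path, data: bytes) -> Path:
    """Write `data` into the store under its SHA-256 digest, at most once."""
    digest = hashlib.sha256(data).hexdigest()
    blob = store / digest[:2] / digest[2:]
    if not blob.exists():
        blob.parent.mkdir(parents=True, exist_ok=True)
        blob.write_bytes(data)
        blob.chmod(0o444)  # stored objects are treated as immutable
    return blob


def materialize(store: Path, files: Dict[str, bytes], dest: Path) -> None:
    """Build a browsable tree at `dest` by hard-linking blobs from the store."""
    for rel_path, data in files.items():
        blob = store_blob(store, data)
        target = dest / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        os.link(blob, target)  # no second copy of the content on disk


def blob_count(store: Path) -> int:
    """Number of distinct objects currently held in the store."""
    return sum(1 for p in store.rglob("*") if p.is_file())
```

If two snapshots of a repo differ in a single file, materializing both adds only one new object to the store; every unchanged file in the second tree is a hard link to an object that already exists. That is the contrast with keeping a full clone per ref.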

Author

tgeijg commented Dec 13, 2024

That's a pretty neat idea @yhakbar. And yes, I do think that would be a better approach in this case.
