Terragrunt exclude block is supported only for run-all #3937

Open
1 of 2 tasks
velkovb opened this issue Feb 26, 2025 · 11 comments
Labels
bug Something isn't working

Comments

@velkovb

velkovb commented Feb 26, 2025

Describe the bug

The skip attribute is deprecated, and the documentation says it should be replaced with an exclude block. However, the exclude block does not provide the same functionality: it appears to be honoured only by run-all commands, not by regular plan or apply commands.

Steps To Reproduce

The following works:

skip = contains(local.environments, local.target) ? false : true

locals {
  environments = ["dev", "prod"]
  target       = "test"
}

This doesn't:

exclude {
  if                   = !contains(local.environments, local.target)
  actions              = ["all"]
  exclude_dependencies = true
}

locals {
  environments = ["dev", "prod"]
  target       = "test"
}

A regular plan is executed even though the exclude condition evaluates to true (the target environment is not in the list).

Expected behavior

Running terragrunt plan in the unit should skip it, as it does with skip = true:

terragrunt plan
17:34:20.994 INFO   Skipping terragrunt module ./terragrunt.hcl due to skip = true.

Nice to haves

  • Terminal output
  • Screenshots

Versions

  • Terragrunt version: 0.73.14
  • OpenTofu/Terraform version: Terraform 1.10.3
  • Environment details (Ubuntu 20.04, Windows 10, etc.): Ubuntu 24.04


@velkovb velkovb added the bug Something isn't working label Feb 26, 2025
@yhakbar
Collaborator

yhakbar commented Feb 26, 2025

This is expected, and documented here:
https://terragrunt.gruntwork.io/docs/features/runtime-control/#exclusion-from-the-run-queue

Can you give an explanation for why you think this should be the default behavior?

The expected usage of the exclude block is to exclude some units from the run queue by default, so that incomplete units can be integrated into a shared codebase and to allow engineers to explicitly test the incomplete units when necessary.

@velkovb
Author

velkovb commented Feb 26, 2025

Because it is offered as a replacement for the skip attribute, which works differently and can be used with plan and apply commands. Since skip is deprecated, once it is removed nothing will provide this functionality. Furthermore, the documentation for the block itself should state that it only applies in the context of run-all, so that this is more visible to users.

@yhakbar
Collaborator

yhakbar commented Feb 26, 2025

Did you note the call out in the docs that a before_hook can be used to achieve the same benefit?

terraform {
  before_hook "prevent_deploy" {
    commands = ["apply", "destroy"]
    execute  = local.ban_deploy ? ["bash", "-c", "echo 'Deploying on weekends is not allowed. Go home.' && exit 1"] : []
  }
}

As for making the run-all scope more visible in the block's own documentation, that's a good idea! If you agree that users already have the tooling they need without changing the configuration block, please feel free to submit a pull request that updates the documentation in a way that would have made the behavior obvious to you when you first looked at it.
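Applied to the environment check from the original report, the before_hook workaround might look roughly like the sketch below. The hook name environment_guard and the not_enabled local are hypothetical. The no-op command ["true"] is used as the pass-through case instead of an empty list, as a cautious assumption. Unlike skip = true, this fails the run with a non-zero exit code, which is the trade-off debated in this thread.

```hcl
locals {
  environments = ["dev", "prod"]
  target       = "test"

  # Hypothetical helper: true when the target environment is not enabled for this unit.
  not_enabled = !contains(local.environments, local.target)
}

terraform {
  # Hypothetical hook name. It runs before plan, apply, and destroy, and exits with
  # status 1 when the target environment is not enabled. Otherwise it runs the
  # POSIX no-op `true`, so the command proceeds normally.
  before_hook "environment_guard" {
    commands = ["plan", "apply", "destroy"]
    execute  = local.not_enabled ? ["bash", "-c", "echo 'Target environment is not enabled for this unit.' && exit 1"] : ["true"]
  }
}
```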

@velkovb
Author

velkovb commented Feb 26, 2025

That is much uglier, and it actually throws an error, which it shouldn't. I will look into a PR.

Does that mean there are no plans to keep the functionality of the existing skip attribute?

@yhakbar
Collaborator

yhakbar commented Feb 26, 2025

I can understand a desire to keep things the same, but that behavior of exiting without error was also surprising to users.

When a unit included a configuration with skip = true, running it would just exit with that easy-to-miss warning. This is especially unclear in CI systems and scripts.

If you have a use-case where you would prefer that the exclude block also exits early for a standalone run, I would like to learn more about it. Knowing that it has value for users in real world scenarios might convince us to change how the exclude block works, or provide some sort of attribute to adjust the behavior.

@velkovb
Author

velkovb commented Feb 26, 2025

I understand that the current behavior might be surprising in some cases, especially in CI systems where warnings can be easy to miss. However, from our perspective, skip = true is an intentional decision: it means the configuration is set up correctly and behaving as expected. Introducing an error in this case would create unnecessary failures, making deployments more confusing rather than clearer.

In our setup, we maintain a structured set of Terragrunt projects and deploy them across multiple environments for consistency. However, not all projects belong in every environment, so we use a configuration file to define where each project should be deployed. During execution, we check this file and skip the project if it isn't needed in the target environment. Seeing errors in this scenario would be misleading since the process is working exactly as designed.

Would you be open to discussing alternative ways to improve visibility in CI while preserving the expected behavior for cases like ours?

@yhakbar
Collaborator

yhakbar commented Feb 26, 2025

Yes! Let's discuss.

It seems like the solution that works best for most users is simply not to add unnecessary units to each environment. As things currently stand, an early exit when running a single unit doesn't seem like good default behavior.

Would a design that allowed users to opt in for an exclude to exit early instead of just excluding the unit from the run queue help?

Alternatively, maybe it would serve users better to have a configuration for the before_hook to exit early without error when a particular status code is detected?

I'd also like to understand better why it wouldn't be an error for a user to try to perform a run in a unit that shouldn't be used for a particular environment. Why is it better to exit with a status code of zero?

@velkovb
Author

velkovb commented Feb 26, 2025

In our use case, we have a monorepo with hundreds of projects. Each project has a configuration file (project.yaml) containing the following YAML:

environments:
  - dev
  - prod

We use the same terragrunt.hcl with different inputs per environment to ensure consistency and to support a promotion process. The current behaviour of the skip attribute lets us control a project's enabled environments via this file, so the project cannot accidentally be applied in an environment it is not enabled for. The other use case is planning multiple projects together, but that uses run-all, so the exclude block should work fine there.

└───project-a
        project.yaml
        terragrunt.hcl
        variables_dev.hcl
        variables_prod.hcl

That is why a project might exist only in dev, for example, and need to be skipped in prod.
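A minimal sketch of how the environment list in project.yaml could drive the exclude block, instead of the hard-coded locals in the original report. It assumes project.yaml sits next to terragrunt.hcl, as in the tree above. It also assumes the target environment comes from a hypothetical TARGET_ENV environment variable; the thread does not say how the target is chosen. As discussed above, this only removes the unit from run-all queues and does not stop a plain terragrunt plan in the unit.

```hcl
locals {
  # Assumption: project.yaml lives in the same directory as this terragrunt.hcl.
  project      = yamldecode(file("${get_terragrunt_dir()}/project.yaml"))
  environments = local.project.environments

  # Hypothetical: the target environment is passed in via TARGET_ENV (defaults to "dev").
  target = get_env("TARGET_ENV", "dev")
}

# Excludes this unit (and its dependencies) from run-all queues
# when the target environment is not listed in project.yaml.
exclude {
  if                   = !contains(local.environments, local.target)
  actions              = ["all"]
  exclude_dependencies = true
}
```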

As for the suggestions, both seem like options. We would be fine with it not being the default behavior, as long as it can be configured. In my experience, the current skip attribute behaves better than a hook because it is evaluated only once, and it may be faster than hooks. If we go the hook route, from what I remember, hooks are evaluated for each dependency, so we could get several hook executions. I may be mistaken, but the hook would also be called twice if it is configured with `commands = ["init", "validate", "plan", "apply", "destroy"]` and we run plan: once for the implicit init and once for the plan.

@yhakbar
Collaborator

yhakbar commented Feb 27, 2025

I see. That seems like a good use-case for early exit even when running directly on the unit.

e.g.

exclude {
    if = true
    early_exit = true # Also exit early when run directly on the unit.
}

In your use-case, though, wouldn't you prefer that an engineer get a non-zero exit code when they try to run the unit directly? That's bad behavior, right? If a user does terragrunt apply -auto-approve, they might think that they successfully applied the unit.

@velkovb
Author

velkovb commented Feb 27, 2025

I will discuss it with the team. Maybe it's just bias because I am used to the current solution. What worries me more is that hooks are executed multiple times, once per dependency. I remember having a really bad time with destroys and hooks. That is why I was excited about the exclude block's attribute for excluding dependencies, but ... :)

@yhakbar
Collaborator

yhakbar commented Feb 27, 2025

I can understand being frustrated when existing patterns change, especially if you feel like you're losing access to features.

We're trying to get to a stable configuration before 1.0 so we can make backwards compatibility guarantees going forward.
