-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/build task overhaul #710
Conversation
@michaelsauter, @oalyman as well as anyone else interested, this is ready for feedback. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the long delay in reviewing this.
After some thought, I believe this is much better than what we had, I really like what you did here 👍
I made some minor inline comments, but generally this looks good to me already.
Regarding the Go / Gradle tasks, I would like to align these as well, but we can do this in a separate PR as well if you prefer.
build/package/scripts/cache-build.sh
Outdated
--cached-outputs) outputs_str="$2"; shift;; | ||
--cached-outputs=*) outputs_str="${1#*=}";; | ||
|
||
--cache-extra-inputs) extra_inputs_str="$2"; shift;; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the name a bit confusing. Maybe something like --cache-dependencies
? Though I am not convinced this is really better. Or maybe just changing the order to --extra-cache-inputs
would help? If you do not like any of these suggestions I'm fine with what you propose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned in the other comment I called now the combination of what previously would have been working-dir
and then the cache-extra-inputs
as cache-sources
. Let me know if you find this confusing or have a worthwhile alternative proposal.
description: >- | ||
List of build output directories (as colon separated string) to be cached. | ||
These directories are relative to `working-dir`. | ||
If caching is not enabled this value is not used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we do not need the separate toggle cache-build
anymore. Its description is a bit outdated now, and I think we could have the caching be based on the presence of the cached-outputs
param. If it is the empty string, no caching should happen. If one or more paths are specified, those should be cached.
That may lead to the question what we should have as defaults for this param, and I am leaning towards using the empty string as a default, and making build caching opt in. I feel that without an explicit output-dir, caching is more subtle and it would be good if people read about the feature and understand what has to be done instead of unknowingly using some magic values which may or may not cache everything they need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finally I decided to remove extra inputs and instead have a cache-sources
parameter:
- List of build source directories (as colon separated string) which influence the build.
These directories are relative to the repository root.
If the contents in these directories change the cache is invalidated so that the build task will rebuild from scratch.
If left empty no caching is enabled. When sources are only in the working-dir, this value can be the same value.
You will have to opt in to build task skipping using parameter cache-sources
for all languages.
The cache-outputs
parameter can be used to determine the folders inside working-dir
to cache which is used as follows in the language tasks:
- go: not supported, passed into caching scripts with value of
output-dir
- npm: supported with default of
'dist'
- gradle: supported with default of
'docker'
- python: not supported, passed into caching scripts as empty value. As a reminder a build may still be skipped if the input doesn't change, but there is no need to cache any built files.
To me this feels quite natural now. What do you think?
@@ -30,11 +30,11 @@ spec: | |||
- name: dockerfile | |||
description: Path to the Dockerfile to build (relative to `docker-dir`). | |||
type: string | |||
default: ./Dockerfile | |||
default: ./docker/Dockerfile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would tend to keep ./Dockerfile
. I know this is a breaking change but changing the docker-dir
already is one.
COPY src /node/ | ||
COPY node_modules /node/ | ||
COPY package.json /node/ | ||
COPY package-lock.json /node/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this could be simplified to COPY src node_modules package.json package-lock.json /node/
5354b99
to
dd90652
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love it 👍
The only thing I am still struggling with is the naming of the parameters. I do find it a bit unintuitive that changing the "cache-sources" parameter influences whether caching is on or off.
So while reading this PR I was reminded of https://go.dev/blog/rebuild, which deals with a different, but related topic. The article uses the term "relevant input" to describe what influences a build. Reading the ADR in this PR, you actually use very similar language. My conclusion then is that:
- "input" and "output" seem to be the natural way to think about this
- "cached outputs" controls what gets cached so it feels more suitable to control whether caching is on or off
My suggestion would be the following:
- rename "cache-sources" to "build-inputs" or "relevant-inputs" or "relevant-cache-inputs" (leaning towards the latter). Naturally, this should default to the working dir and I would expect it to be changed by users very rarely (e.g. one example we know where this is helpful is for monorepos where one build depends on the other)
- keep "cached-outputs" including the defaults you chose. However, introduce a special string that disables caching, I would use "disabled" (or maybe something like "!disabled" - I think I like this most). In the case of Python, there currently is no "cached-outputs" param, but I would add it, with the default being the empty string. This would enable caching by default (caching only the artifacts), and allow to disable the caching by setting it to "!disabled". It would also allow users of the Python task with their own script to cache generated files for example.
Python and other dynamic languages where there is nothing to build have no outputs to cache but use the build for automated tests and thus build skipping makes still sense. I suppose it makes more sense to think and explain this feature with the notion of build skipping. Nonetheless looking at it with the sense of build skipping makes much more sense to me and then enabling/disabling with inputs seems okish for me. An alternative could be to bring the build-skipping-enabled flag back and also add the extra-inputs but the way it is proposed now made sense to me as well. Let me know how this affects your previous comment! |
To me "sources" implied that these are build inputs. Nonetheless I also don't mind changing a less generic term and use inputs instead. Although "relevant-inputs", "relevant-cache-inputs" or "relevant-build-inputs" make sense to me, here is another suggestion: - name: build-inputs
description: |
List of relevant directories (as colon separated string) with files that influence the build.
These directories are relative to the repository root.
If these directories do not change the build is skipped and any files that were
previously built are restored as well as the prior build reports.
The built files to restore are specified with the "build-outputs" parameter.
Otherwise (if build-inputs changed) the build starts from scratch.
The value "!always-rebuild" means that the build is never skipped.
type: string
default: $(parameter.working-dir) One doubt I have is: can a Task parameter's default be a value of another parameter? If not one would have to use some other magic value or reintroduce the "build-extra-inputs" |
After some time I have to say I did not come to a conclusion that feels completely right to me. I tend towards using "extra inputs" as a param, as that captures the intend for most cases, and avoids having to align this param with the working dir param. I am not sure how I would implement the possibility to disable skipping. One thing that came to my mind is that if "extra inputs" would be set to a path that always changes, it would effectively disable skipping. However, I do not know of any file in Unix that one can compute the md5 of that would always change ( In summary, feel free to pick what feels best to you now, and we can always fine-tune later. |
I was pondering about the random suggestion but for now felt that we should instead reintroduce the flag and have extra inputs. Following is the updated draft of the description: Draft to be mergedThis changes build tasks to no longer copy build outputs into the docker-context. This is implemented by having Dockerfile's use files from natural build output location(s) and by default have docker-context to be the same as the repository root. This also allows using the same Dockerfile for ods-pipeline as well as local builds. In addition build task skipping (and the related caching) is enhanced to to allow multiple input and output locations. cache-build.sh and copy-build-if-cached.sh no longer have an output-dir parameter and instead support parameters: --build-extra-inputs: List of build source directories (as colon separated string) which in addition to working-dir influence the build. --cached-outputs: List of build output directories (as colon separated string) to be cached. These directories are relative to the working-dir parameter. The build tasks themselves are adjusted accordingly. All build tasks now support parameter --build-extra-inputs with a default "" (empty String) as a task parameter. In addition: npm: no longer copies build outputs to output-dir. The parameters build-dir, copy-node-modules and output-dir which configure that behavior are removed and the build script is simplified. Instead the task now supports the cached-outputs parameter which defaults to dist. When other locations must be cached, the parameter would have to be adjusted. For example to build:node_modules if an associated Dockerfile would copy from these locations. python: no longer copies build outputs. The parameter output-dir was removed. In addition as for Python the actual build is meant to happen during image building no cached-outputs are needed and the task always passes an empty string here. Build tasks are still skipped if the working-dir is unchanged. go does not support cached-outputs on the task level and instead sets the scripts cached-outputs parameter to output-dir. gradle supports the cached-outputs parameter with a default being 'docker'. |
This changes build tasks to no longer copy build outputs into the docker-context. This is implemented by having Dockerfile's use files from natural build output location(s) and by default have
docker-context
to be the same as the repository root. This also allows using the same Dockerfile for ods-pipeline as well as local builds.In addition build task skipping (and the related caching) is enhanced to to allow multiple input and output locations.
cache-build.sh
andcopy-build-if-cached.sh
no longer have anoutput-dir
parameter and instead support parameters:--cache-sources
: List of source directories (as colon separated string) which influence the build. These directories are relative to the repository root. If the contents in these directories change the cache is invalidated and the build task will rebuild from scratch.The default is that this value is empty which is understood as disabling build task skipping and the related caching.
In the rather typical case where a build tasks' sources match the
working-dir
, this value should be the same. However if there are files outside theworking-dir
the value must capture these correctly as otherwise builds may be skipped although they need to run again.--cached-outputs
: List of build output directories (as colon separated string) to be cached. These directories are relative to theworking-dir
parameter.The build tasks themselves are adjusted accordingly. All build tasks now support parameter
--cache-sources
with a default "" (empty String) as a task parameter, so that build task skipping is now an opt-in feature.In addition:
npm: no longer copies build outputs to
output-dir
. The parametersbuild-dir
,copy-node-modules
andoutput-dir
which configure that behavior are removed and the build script is simplified. Instead the task now supports thecached-outputs
parameter which defaults todist
. When other locations must be cached, the parameter would have to be adjusted. For example tobuild:node_modules
if an associated Dockerfile would copy from these locations.python: no longer copies build outputs. The parameter
output-dir
was removed. In addition as for Python the actual build is meant to happen during image building nocached-outputs
are needed and the task always passes an empty string here. Of course build tasks would still be skipped if one opted in using thecache-sources
parameter.go does not support sets
cached-outputs
on the task level and instead sets the scriptscached-outputs
parameter tooutput-dir
.gradle supports the
cached-outputs
parameter with a default being'docker'
.Closes #678
Tasks:
docs/design
directory or not applicabledocs
directory or not applicablemake test
) or not applicable