You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The size of acryldata/datahub-actions:v0.1.9 reported by docker images is 3.08GB.
I did a little experiment where I changed Line 49 like below, and then the size was 2.27GB.
RUN --mount=type=cache,target=/datahub-ingestion/.cache/uv,uid=1000,gid=1000 \
UV_LINK_MODE=copy uv pip install -e ".[all]"
When I instead removed UV_LINK_MODE=copy like below from line 49, the docker image was 2.35GB, as it instead use hard links (default with UV) so all the package files are not duplicated.
RUN uv pip install -e ".[all]"
I think it would be best to remove the /datahub-ingestion/.cache/uv from the docker image by specifying --mount=type=cache... when it is built and if you really want to keep the uv .cache then remove UV_LINK_MODE instead to avoid many duplicated files.
I can create a PR for any of the alternatives if I only know which alternative you prefer.
The text was updated successfully, but these errors were encountered:
Yup let's change line 49. Adding the --mount=type=cache,target=/datahub-ingestion/.cache/uv,uid=1000,gid=1000 makes sense, and matches the behavior from line 44.
iirc we need to use UV_LINK_MODE=copy when using the cache mount, since hard links don't work across the cache volume into the base image layer.
A PR for this would be great :)
daha
linked a pull request
Jan 24, 2025
that will
close
this issue
Yes, you are correct about UV_LINK_MODE=copy, but if we would keep the uv cache in the image it would have been a different thing and we would not need UV_LINK_MODE=copy as the cache is on the same drive.
I saw that the datahub-actions docker image (latest) is a bit bigger than it needs to be.
As the RUN command on
datahub-actions/docker/datahub-actions/Dockerfile
Line 49 in ec6e41f
--mount=type=cache...
like on https://github.com/acryldata/datahub-actions/blob/main/docker/datahub-actions/Dockerfile#L44C5-L44C77 and is usingUV_LINK_MODE=copy
.The size of
acryldata/datahub-actions:v0.1.9
reported bydocker images
is 3.08GB.I did a little experiment where I changed Line 49 like below, and then the size was 2.27GB.
When I instead removed
UV_LINK_MODE=copy
like below from line 49, the docker image was 2.35GB, as it instead use hard links (default with UV) so all the package files are not duplicated.I think it would be best to remove the
/datahub-ingestion/.cache/uv
from the docker image by specifying--mount=type=cache...
when it is built and if you really want to keep the uv .cache then remove UV_LINK_MODE instead to avoid many duplicated files.I can create a PR for any of the alternatives if I only know which alternative you prefer.
The text was updated successfully, but these errors were encountered: