Fix: nifi reduce image size #1027

Merged
merged 23 commits into from
Apr 3, 2025
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -7,16 +7,20 @@ All notable changes to this project will be documented in this file.
### Added

- spark-connect-client: A new image for Spark connect tests and demos ([#1034])
- nifi: check for correct permissions and ownership in the /stackable folder via
`check-permissions-ownership.sh`, provided in the stackable-base image ([#1027]).

### Changed

- spark-k8s: Include spark-connect jars. Replace OpenJDK with Temurin JDK. Cleanup. ([#1034])

### Fixed

- nifi: reduce docker image size by removing the recursive chown/chmods in the final image ([#1027]).
- spark-k8s: reduce docker image size by removing the recursive chown/chmods in the final image ([#1042]).
- Add `--locked` flag to `cargo install` commands for reproducible builds ([#1044]).

[#1027]: https://github.com/stackabletech/docker-images/pull/1027
[#1034]: https://github.com/stackabletech/docker-images/pull/1034
[#1042]: https://github.com/stackabletech/docker-images/pull/1042
[#1044]: https://github.com/stackabletech/docker-images/pull/1044
131 changes: 81 additions & 50 deletions nifi/Dockerfile
@@ -7,59 +7,78 @@ ARG PRODUCT
ARG MAVEN_VERSION="3.9.8"
ARG STACKABLE_USER_UID

RUN microdnf update && \
microdnf clean all && \
rm -rf /var/cache/yum
RUN <<EOF
microdnf update
microdnf clean all
rm -rf /var/cache/yum
EOF
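
One thing to keep in mind when converting `&&` chains to heredocs: an `&&` chain stops at the first failing command, whereas a plain multi-line script keeps going and reports only the status of its last command unless `errexit` is active. Whether the base image configures `SHELL` with `-e` is not visible in this diff, so the following is only a sketch of the difference. The helper `run_script` is hypothetical and exists purely for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this PR): runs a two-line script whose
# first command fails, optionally with an extra shell flag such as -e.
run_script() {
  local flag="${1:-}" status=0
  # ${flag:+"$flag"} expands to nothing when no flag was passed.
  bash ${flag:+"$flag"} -c 'false
true' || status=$?
  echo "exit status: ${status}"
}

run_script      # prints "exit status: 0": the failure of `false` is masked
run_script -e   # prints "exit status: 1": errexit stops at `false`
```

If the build shell does not already enable `errexit`, adding `set -e` as the first line of each heredoc would restore the fail-fast behaviour of the old `&&` chains.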

# NOTE: From NiFi 2.0.0 upwards Apache Maven 3.9.6+ is required. As of 2024-07-04 the java-devel image
# ships 3.6.3. This will update Maven accordingly depending on the version. The error is due to the maven-enforcer-plugin.
#
# [ERROR] Rule 2: org.apache.maven.enforcer.rules.version.RequireMavenVersion failed with message:
# [ERROR] Detected Maven Version: 3.6.3 is not in the allowed range [3.9.6,).
#
WORKDIR /tmp
RUN if [[ "${PRODUCT}" != 1.* ]] ; then \
curl "https://repo.stackable.tech/repository/packages/maven/apache-maven-${MAVEN_VERSION}-bin.tar.gz" | tar -xzC . && \
ln -sf /tmp/apache-maven-${MAVEN_VERSION}/bin/mvn /usr/bin/mvn ; \
fi
RUN <<EOF
if [[ "${PRODUCT}" != 1.* ]] ; then
cd /tmp
curl "https://repo.stackable.tech/repository/packages/maven/apache-maven-${MAVEN_VERSION}-bin.tar.gz" | tar -xzC .
ln -sf /tmp/apache-maven-${MAVEN_VERSION}/bin/mvn /usr/bin/mvn
fi
EOF
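
The version gate above relies on bash pattern matching: inside `[[ ]]`, an unquoted right-hand side of `!=` is treated as a glob, so `1.*` matches any string that starts with `1.`. A minimal sketch of that behaviour follows; the function name `needs_newer_maven` and the sample version strings are assumptions for illustration, not part of the Dockerfile.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the Dockerfile condition.
# Returns success (0) when the product version is NOT on the 1.x line.
needs_newer_maven() {
  local product="$1"
  # 1.* is a glob pattern here, not a regular expression.
  [[ "${product}" != 1.* ]]
}

# Sample version strings, chosen only to exercise both branches.
for version in 1.27.0 2.0.0 10.0.0; do
  if needs_newer_maven "${version}"; then
    echo "${version}: install Maven ${MAVEN_VERSION:-<MAVEN_VERSION>}"
  else
    echo "${version}: keep the Maven shipped with the image"
  fi
done
```

Because the pattern includes the dot, a hypothetical `10.0.0` is not mistaken for a 1.x release.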

USER ${STACKABLE_USER_UID}
WORKDIR /stackable

COPY --chown=${STACKABLE_USER_UID}:0 nifi/stackable/patches /stackable/patches

RUN curl 'https://repo.stackable.tech/repository/m2/tech/stackable/nifi/stackable-bcrypt/1.0-SNAPSHOT/stackable-bcrypt-1.0-20240508.153334-1-jar-with-dependencies.jar' \
# This used to be located in /bin/stackable-bcrypt.jar. We create a softlink for /bin/stackable-bcrypt.jar in the main container for backwards compatibility.
-o /stackable/stackable-bcrypt.jar && \
# Get the source release from nexus
curl "https://repo.stackable.tech/repository/packages/nifi/nifi-${PRODUCT}-source-release.zip" -o "/stackable/nifi-${PRODUCT}-source-release.zip" && \
unzip "nifi-${PRODUCT}-source-release.zip" && \
# Clean up downloaded source after unzipping
rm -rf "nifi-${PRODUCT}-source-release.zip" && \
# The NiFi "binary" ends up in a folder named "nifi-${PRODUCT}" which should be copied to /stackable
# from /stackable/nifi-${PRODUCT}-src/nifi-assembly/target/nifi-${PRODUCT}-bin/nifi-${PRODUCT} (see later steps)
# Therefore we add the suffix "-src" to be able to copy the binary and remove the unzipped sources afterwards.
mv nifi-${PRODUCT} nifi-${PRODUCT}-src && \
# Apply patches
chmod +x patches/apply_patches.sh && \
patches/apply_patches.sh ${PRODUCT} && \
# Build NiFi
cd /stackable/nifi-${PRODUCT}-src/ && \
# NOTE: In NiFi 2.0.0 the PutIceberg processor and services were removed, so including the `include-iceberg` profile does nothing.
# Additionally some modules were moved to optional build profiles, so we need to add `include-hadoop` to get `nifi-parquet-nar` for example.
if [[ "${PRODUCT}" != 1.* ]] ; then \
mvn --batch-mode --no-transfer-progress clean install -Dmaven.javadoc.skip=true -DskipTests --activate-profiles include-hadoop,include-hadoop-aws,include-hadoop-azure,include-hadoop-gcp ; \
else \
mvn --batch-mode --no-transfer-progress clean install -Dmaven.javadoc.skip=true -DskipTests --activate-profiles include-iceberg,include-hadoop-aws,include-hadoop-azure,include-hadoop-gcp ; \
fi && \
# Copy the binaries to the /stackable folder
mv /stackable/nifi-${PRODUCT}-src/nifi-assembly/target/nifi-${PRODUCT}-bin/nifi-${PRODUCT} /stackable/nifi-${PRODUCT} && \
# Copy the SBOM as well
mv /stackable/nifi-${PRODUCT}-src/nifi-assembly/target/bom.json /stackable/nifi-${PRODUCT}/nifi-${PRODUCT}.cdx.json && \
# Remove the unzipped sources
rm -rf /stackable/nifi-${PRODUCT}-src && \
# Remove generated docs in binary
rm -rf /stackable/nifi-${PRODUCT}/docs
RUN <<EOF
# This used to be located in /bin/stackable-bcrypt.jar. We create a softlink for /bin/stackable-bcrypt.jar in the main container for backwards compatibility.
curl 'https://repo.stackable.tech/repository/m2/tech/stackable/nifi/stackable-bcrypt/1.0-SNAPSHOT/stackable-bcrypt-1.0-20240508.153334-1-jar-with-dependencies.jar' \
-o /stackable/stackable-bcrypt.jar

# Get the source release from nexus
curl "https://repo.stackable.tech/repository/packages/nifi/nifi-${PRODUCT}-source-release.zip" -o "/stackable/nifi-${PRODUCT}-source-release.zip"
unzip "nifi-${PRODUCT}-source-release.zip"

# Clean up downloaded source after unzipping
rm -rf "nifi-${PRODUCT}-source-release.zip"

# The NiFi "binary" ends up in a folder named "nifi-${PRODUCT}" which should be copied to /stackable
# from /stackable/nifi-${PRODUCT}-src/nifi-assembly/target/nifi-${PRODUCT}-bin/nifi-${PRODUCT} (see later steps)
# Therefore we add the suffix "-src" to be able to copy the binary and remove the unzipped sources afterwards.
mv nifi-${PRODUCT} nifi-${PRODUCT}-src

# Apply patches
chmod +x patches/apply_patches.sh
patches/apply_patches.sh ${PRODUCT}

# Build NiFi
cd /stackable/nifi-${PRODUCT}-src/

# NOTE: In NiFi 2.0.0 the PutIceberg processor and services were removed, so including the `include-iceberg` profile does nothing.
# Additionally some modules were moved to optional build profiles, so we need to add `include-hadoop` to get `nifi-parquet-nar` for example.
if [[ "${PRODUCT}" != 1.* ]] ; then
mvn --batch-mode --no-transfer-progress clean install -Dmaven.javadoc.skip=true -DskipTests --activate-profiles include-hadoop,include-hadoop-aws,include-hadoop-azure,include-hadoop-gcp
else
mvn --batch-mode --no-transfer-progress clean install -Dmaven.javadoc.skip=true -DskipTests --activate-profiles include-iceberg,include-hadoop-aws,include-hadoop-azure,include-hadoop-gcp
fi

# Copy the binaries to the /stackable folder
mv /stackable/nifi-${PRODUCT}-src/nifi-assembly/target/nifi-${PRODUCT}-bin/nifi-${PRODUCT} /stackable/nifi-${PRODUCT}

# Copy the SBOM as well
mv /stackable/nifi-${PRODUCT}-src/nifi-assembly/target/bom.json /stackable/nifi-${PRODUCT}/nifi-${PRODUCT}.cdx.json

# Remove the unzipped sources
rm -rf /stackable/nifi-${PRODUCT}-src

# Remove generated docs in binary
rm -rf /stackable/nifi-${PRODUCT}/docs

# Set correct permissions
chmod -R g=u /stackable
EOF
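
For readers unfamiliar with the symbolic mode `g=u`: it copies the owner's permission bits onto the group, leaving the owner and "other" bits untouched. Doing this in the builder stage means the files already carry the desired modes when they are copied into the final image, instead of being rewritten in an additional layer there. A small sketch, assuming GNU coreutils (`stat -c`); the helper name `group_like_user` is made up for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply `chmod g=u` to a scratch file that starts
# with the given octal mode and print the mode before and after.
group_like_user() {
  local mode="$1" file after
  file="$(mktemp)"
  chmod "${mode}" "${file}"
  chmod g=u "${file}"            # group bits := owner bits
  after="$(stat -c '%a' "${file}")"
  rm -f "${file}"
  echo "${mode} -> ${after}"
}

group_like_user 750   # prints "750 -> 770"
group_like_user 604   # prints "604 -> 664"
group_like_user 400   # prints "400 -> 440"
```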

FROM stackable/image/java-base AS final

@@ -83,8 +102,6 @@ COPY --chown=${STACKABLE_USER_UID}:0 nifi/licenses /licenses
COPY --chown=${STACKABLE_USER_UID}:0 nifi/python /stackable/python

RUN <<EOF
ln -s /stackable/nifi-${PRODUCT} /stackable/nifi

microdnf update

# python-pip: Required to install Python packages
@@ -96,24 +113,38 @@ microdnf clean all
rm -rf /var/cache/yum

# The nipyapi is required until NiFi 2.0.x for the ReportingTaskJob
# This can be removed once the 1.x.x line is removed
pip install --no-cache-dir \
nipyapi==0.19.1

# For backwards compatibility we create a softlink in /bin where the jar used to be as long as we are root
# This can be removed once older versions / operators using this are no longer supported
ln -s /stackable/stackable-bcrypt.jar /bin/stackable-bcrypt.jar

# All files and folders owned by root group to support running as arbitrary users.
# This is best practice as all container users will belong to the root group (0).
chown -R ${STACKABLE_USER_UID}:0 /stackable
chmod -R g=u /stackable
ln -s /stackable/nifi-${PRODUCT} /stackable/nifi

# fix missing permissions / ownership
chown --no-dereference ${STACKABLE_USER_UID}:0 /stackable/nifi
chmod --recursive g=u /stackable/python
chmod --recursive g=u /stackable/bin
chmod g=u /stackable/nifi-${PRODUCT}
EOF
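
The `--no-dereference` flag matters because the `/stackable/nifi` symlink is created while the build still runs as root, and a plain `chown` follows the link and changes the target rather than the link itself. The check script inspects the link itself, since `find` lists it as its own entry and `stat` does not follow links unless `-L` is given. A minimal sketch of that distinction, assuming GNU coreutils; the directory names are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical demo: show that stat reports on a symlink itself by default
# and on its target only when -L is passed.
link_demo() {
  local dir
  dir="$(mktemp -d)"
  mkdir "${dir}/nifi-x.y.z"                 # placeholder for nifi-${PRODUCT}
  ln -s "${dir}/nifi-x.y.z" "${dir}/nifi"
  echo "link itself: $(stat -c '%F' "${dir}/nifi")"
  echo "link target: $(stat -L -c '%F' "${dir}/nifi")"
  rm -rf "${dir}"
}

link_demo
# link itself: symbolic link
# link target: directory
```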

# ----------------------------------------
# Checks
# This section is to run final checks to ensure the created final images
# adhere to several minimal requirements like:
# - check file permissions and ownerships
# ----------------------------------------

# Check that permissions and ownership in /stackable are set correctly
# This will fail and stop the build if any mismatches are found.
RUN <<EOF
/bin/check-permissions-ownership.sh /stackable ${STACKABLE_USER_UID} 0
EOF

# ----------------------------------------
# Attention: We are changing the group of all files in /stackable directly above
# If you do any file based actions (copying / creating etc.) below this comment you
# absolutely need to make sure that the correct permissions are applied!
# chown ${STACKABLE_USER_UID}:0
# Attention: Do not perform any file based actions (copying/creating etc.) below this comment because the permissions would not be checked.
# ----------------------------------------

USER ${STACKABLE_USER_UID}
59 changes: 59 additions & 0 deletions shared/checks/check-permissions-ownership.sh
@@ -0,0 +1,59 @@
#!/bin/bash
#
# Purpose
#
# Checks that permissions and ownership in the provided directory are set according to:
#
# chown -R ${STACKABLE_USER_UID}:0 /stackable
# chmod -R g=u /stackable
#
# Will error out and print directories / files that do not match the required permissions or ownership.
#
# Usage
#
# ./check-permissions-ownership.sh <directory> <uid> <gid>
# ./check-permissions-ownership.sh /stackable ${STACKABLE_USER_UID} 0
#

if [[ $# -ne 3 ]]; then
echo "Wrong number of parameters supplied. Usage:"
echo "$0 <directory> <uid> <gid>"
echo "$0 /stackable 1000 0"
exit 1
fi

DIRECTORY=$1
EXPECTED_UID=$2
EXPECTED_GID=$3

error_flag=0

# Check ownership
while IFS= read -r -d '' file; do
uid=$(stat -c "%u" "$file")
gid=$(stat -c "%g" "$file")

if [[ "$uid" -ne "$EXPECTED_UID" || "$gid" -ne "$EXPECTED_GID" ]]; then
echo "Ownership mismatch: $file (Expected: $EXPECTED_UID:$EXPECTED_GID, Found: $uid:$gid)"
error_flag=1
fi
done < <(find "$DIRECTORY" -print0)

# Check permissions
while IFS= read -r -d '' file; do
perms=$(stat -c "%A" "$file")
owner_perms="${perms:1:3}"
group_perms="${perms:4:3}"

if [[ "$owner_perms" != "$group_perms" ]]; then
echo "Permission mismatch: $file (Owner: $owner_perms, Group: $group_perms)"
error_flag=1
fi
done < <(find "$DIRECTORY" -print0)

if [[ $error_flag -ne 0 ]]; then
echo "Permission and Ownership checks failed for $DIRECTORY!"
exit 1
fi

echo "Permission and Ownership checks succeeded for $DIRECTORY!"
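
The script walks the tree twice and calls `stat` up to three times per file. A possible single-pass variant is sketched below. It is not what the PR implements; the function name `check_tree` and the scratch-directory demo are assumptions, and it relies on GNU `stat -c` just like the script above.

```shell
#!/usr/bin/env bash
# Hypothetical single-pass variant of the check: one find traversal and one
# stat call per entry. Returns 1 if any entry has the wrong owner/group or
# if its group permission bits differ from its owner permission bits.
check_tree() {
  local dir="$1" want_uid="$2" want_gid="$3" failed=0
  local file uid gid perms
  while IFS= read -r -d '' file; do
    read -r uid gid perms < <(stat -c '%u %g %A' "$file")
    if [[ "$uid" -ne "$want_uid" || "$gid" -ne "$want_gid" ]]; then
      echo "Ownership mismatch: $file (Expected: $want_uid:$want_gid, Found: $uid:$gid)"
      failed=1
    fi
    # %A looks like "-rwxr-x---": chars 1-3 are owner bits, 4-6 group bits.
    if [[ "${perms:1:3}" != "${perms:4:3}" ]]; then
      echo "Permission mismatch: $file (Owner: ${perms:1:3}, Group: ${perms:4:3})"
      failed=1
    fi
  done < <(find "$dir" -print0)
  return "$failed"
}

# Demo on a scratch directory, using whatever uid/gid the directory has.
demo_dir="$(mktemp -d)"
chmod 770 "$demo_dir"
touch "$demo_dir/file"
chmod 640 "$demo_dir/file"
demo_uid="$(stat -c '%u' "$demo_dir")"
demo_gid="$(stat -c '%g' "$demo_dir")"

if check_tree "$demo_dir" "$demo_uid" "$demo_gid"; then echo "check passed"; else echo "check failed"; fi
chmod g=u "$demo_dir/file"
if check_tree "$demo_dir" "$demo_uid" "$demo_gid"; then echo "check passed"; else echo "check failed"; fi
rm -rf "$demo_dir"
```

The first call reports a permission mismatch for the file with mode 640; after `chmod g=u` the second call passes.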
4 changes: 4 additions & 0 deletions stackable-base/Dockerfile
@@ -204,6 +204,10 @@ COPY --from=config-utils --chown=${STACKABLE_USER_UID}:0 /config-utils/config-ut
# Debug tool that logs generic system information.
COPY --from=containerdebug --chown=${STACKABLE_USER_UID}:0 /containerdebug/target/release/containerdebug /stackable/containerdebug

# **check-permissions-ownership.sh**
# Bash script to check proper permissions and ownership requirements in the final Stackable images
COPY --chown=${STACKABLE_USER_UID}:0 shared/checks/check-permissions-ownership.sh /bin/check-permissions-ownership.sh

ENV PATH="${PATH}:/stackable"

# These labels have mostly been superseded by the OpenContainer spec annotations below but it doesn't hurt to include them
32 changes: 19 additions & 13 deletions vector/Dockerfile
@@ -14,16 +14,22 @@ ARG STACKABLE_USER_UID
# This happens by writing a "shutdown file" in a shared volume
# See https://github.com/stackabletech/airflow-operator/blob/23.4.1/rust/operator-binary/src/airflow_db_controller.rs#L269 for an example
# The Vector container waits for this file to appear and this waiting happens using `inotifywait` which comes from the `inotify-tools` package
RUN ARCH="${TARGETARCH/amd64/x86_64}" ARCH="${ARCH/arm64/aarch64}" && \
rpm --install \
"https://repo.stackable.tech/repository/packages/vector/vector-${PRODUCT}-${RPM_RELEASE}.${ARCH}.rpm" \
"https://repo.stackable.tech/repository/packages/inotify-tools/inotify-tools-${INOTIFY_TOOLS}.${ARCH}.rpm" && \
mkdir /licenses && \
cp /usr/share/licenses/vector-${PRODUCT}/LICENSE /licenses/VECTOR_LICENSE && \
# Create the directory /stackable/vector/var.
# This directory is set by operator-rs in the parameter `data_dir`
# of the Vector configuration. The directory is used for persisting
# Vector state, such as on-disk buffers, file checkpoints, and more.
# Vector needs write permissions.
mkdir --parents /stackable/vector/var && \
chown --recursive ${STACKABLE_USER_UID}:0 /stackable/
RUN <<EOF
ARCH="${TARGETARCH/amd64/x86_64}"
ARCH="${ARCH/arm64/aarch64}"
rpm --install \
"https://repo.stackable.tech/repository/packages/vector/vector-${PRODUCT}-${RPM_RELEASE}.${ARCH}.rpm" \
"https://repo.stackable.tech/repository/packages/inotify-tools/inotify-tools-${INOTIFY_TOOLS}.${ARCH}.rpm"
mkdir /licenses
cp /usr/share/licenses/vector-${PRODUCT}/LICENSE /licenses/VECTOR_LICENSE

# Create the directory /stackable/vector/var.
# This directory is set by operator-rs in the parameter `data_dir`
# of the Vector configuration. The directory is used for persisting
# Vector state, such as on-disk buffers, file checkpoints, and more.
# Vector needs write permissions.
mkdir --parents /stackable/vector/var
chown --recursive ${STACKABLE_USER_UID}:0 /stackable/
# Set correct permissions
chmod -R g=u /stackable
EOF
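
The two `ARCH=` lines translate Docker's `TARGETARCH` values into the architecture names used in the RPM file names, using bash's `${var/pattern/replacement}` expansion. A short sketch of that mapping; the function name `rpm_arch` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a Docker TARGETARCH value to the RPM architecture
# name, using the same two substitutions as the Dockerfile.
rpm_arch() {
  local arch="${1/amd64/x86_64}"   # amd64 -> x86_64
  arch="${arch/arm64/aarch64}"     # arm64 -> aarch64
  echo "${arch}"
}

rpm_arch amd64   # prints "x86_64"
rpm_arch arm64   # prints "aarch64"
```

Any value that matches neither pattern passes through unchanged, so an unsupported architecture would only surface later, when the RPM URL cannot be resolved.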