-
Notifications
You must be signed in to change notification settings - Fork 3.5k
Rewrite Env2yaml in java instead of Go #18423
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
🤖 GitHub commentsJust comment with:
|
|
This pull request does not have a backport label. Could you fix it @donoghuc? 🙏
|
|
/run exhaustive tests |
|
Failure looks suspiciously related to a |
Managing a Go toolchain for persisting ENV vars in logstash container artifacts has become cumbersome. We already manage a java runtime so this commit presents a path forward to use that instead of Go. The Go binary is faster than java (in my testing Go would complete in around less than 200ms while java takes over 300ms). Given the container startup time is on the order of magnitute of seconds this change should be inperceptable to consumers. The benefit from consolidating in Java is worth the slightly lower performance.
1a3eb07 to
aa61b56
Compare
|
Double checked what copilot came up with. Also to show how insignificant the performance is i did this simple ruby test: #!/usr/bin/env ruby
GO_IMAGE = "docker.elastic.co/logstash/logstash:9.2.0"
JAVA_IMAGE = "772bb6342461"
def measure_env2yaml(image, runs = 100)
times = []
runs.times do
start = Time.now
system("docker run --rm -e PIPELINE_WORKERS=4 --entrypoint='' #{image} env2yaml /usr/share/logstash/config/logstash.yml > /dev/null 2>&1")
times << ((Time.now - start) * 1000).to_i
end
times.sum / times.size
end
puts "=== env2yaml execution time (#{100} runs) ==="
puts "Go: #{measure_env2yaml(GO_IMAGE)}ms avg"
puts "Java: #{measure_env2yaml(JAVA_IMAGE)}ms avg"example run: |
|
Still need to do some more |
|
Still working through how ironbank is different. Will push an update when possible. The java re-write is ready for review (and works in the other container), just making sure to get all the golang stuff out of the repo. |
|
/run exhaustive tests |
|
ironbank has been sorted out and i think i've got all the other references to the golang verions removed. Kicked off exhaustive test run https://buildkite.com/elastic/logstash-exhaustive-tests-pipeline/builds/2897 |
|
Exhaustive tests for packages (unrelated to container only env2yaml) are failing due to a separate issue #18424 |
| @@ -0,0 +1,203 @@ | |||
| import org.yaml.snakeyaml.Yaml; | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we use snakeyaml-engine instaed of snakeyaml? couple of reasons:
- we don't need a lot of the higher level snakeyaml machinery for this very simple yaml parsing
- reduces surface for vulnerabilities and cve findings, given most are typically found in snakeyaml itself and not snakeyaml-engine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didnt even know about this 😅 I see it ships with jruby. I pushed a commit using that instead. The only difference i can spot is that it seems to want to quote values (which in my testing does not matter). Take a look, we can always revert back to snakeyaml if needed (assuming we continue to ship that with logstash).
| # The .class file is compiled during Docker build and placed in /usr/local/bin | ||
|
|
||
| exec /usr/share/logstash/jdk/bin/java \ | ||
| -cp "/usr/share/logstash/logstash-core/lib/jars/*:/usr/local/bin" \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tested a series of JVM flags to make startup faster. overall I can see ever-so-slight benefits with -XX:+UseSerialGC -Xms16m -Xmx16m, I tried messing with tiered complation settings but no obvious benefit. best tests are likely run on a very limited container.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, updated with these. I think 16m is probably fine, do you feel comfortable with that or should we bump it a bit?
Use snakeyaml-engine and some java flags for faster execution
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd really prefer if we kept the building of this class into a separate stage of the docker file.
This would make the build environment much cleaner (JDK container with snakeyaml engine jars only).
Also it means it we don't have to have the Env2Yaml.java file in the file docker image, it is one extra layer in the file image that is unnecessary.
The suggestion would be to make env2yaml more of a proper java (gradle) project, with a build.gradle:
dependencies {
implementation 'org.snakeyaml:snakeyaml-engine:2.7'
}then the dockerfile has the env2yaml build stage:
FROM gradle:8.7-jdk21 AS build
WORKDIR /app
COPY build.gradle ./
COPY src ./src
RUN gradle build --no-daemon
# then use it in logstash in the second stage
FROM [..]
COPY --from=build /app/build/libs/*.jar app.jarGradle is not needed, it's just to avoid having to download snakeyaml-engine jar directly.
Build env2yaml in a separate build stage for container artifacts. Include its dependencies and manage separately from logstash. Continue to use the java runtime in the final container to run the program, but manage the classpath separately. Note this did not use gradle for dependency management because installing that as a depdendcy was not worth it compared with downloading a jar directly.
| # Build env2yaml | ||
| FROM ${BASE_REGISTRY}/${BASE_IMAGE}:${BASE_TAG} AS builder-env2yaml | ||
|
|
||
| RUN dnf -y upgrade && \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still working on validating this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cant seem to download the "official" ironbank images. Here is what i tested against:
# build the dockerfile
rake artifact:dockerfile_ironbank
# drop the logstash tarball in next to the dockerfile generated ^^
cp ../logstash-9.3.0-SNAPSHOT-linux-aarch64.tar.gz .
# I'm on mac (arm)
sed -i '' 's/linux-x86_64/linux-aarch64/g' Dockerfile
# Build with redhat 9 ubi
docker build --build-arg BASE_REGISTRY=docker.io --build-arg BASE_IMAGE=redhat/ubi9 --build-arg BASE_TAG=9.4 -t logstash-ironbank:test .
| COPY scripts/env2yaml/Env2Yaml.java /tmp/ | ||
| RUN /usr/share/logstash/jdk/bin/javac -cp "/usr/share/logstash/vendor/jruby/lib/ruby/stdlib/org/snakeyaml/snakeyaml-engine/2.9/*"/tmp/Env2Yaml.java -d /usr/local/bin/ | ||
| # Copy env2yaml and snakeyaml-engine from builder stage | ||
| RUN mkdir -p /usr/share/logstash/env2yaml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hopefully this is a reasonable location for the env2yaml java files.
|
Thats a good idea @jsvd! I added a stage to build, then copy over build artifacts to |
|
I played a bit with it and created a poc diff https://gist.github.com/jsvd/8276ad5d533b7218478671e86cf94e01 the benefit of using gradle is that we don't have the snakeyaml version spread across a few difference places, this way it's only in the env2yaml/build.gradle |
Use gradle (and a dedicated gradle base image) for building env2yaml
| } | ||
|
|
||
| dependencies { | ||
| implementation 'org.snakeyaml:snakeyaml-engine:2.7' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CODEREVIEW: What version should we ship? Should we get on 3.0? I think jruby ships with 2.9
/usr/share/logstash/vendor/jruby/lib/ruby/stdlib/org/snakeyaml/snakeyaml-engine/2.9/snakeyaml-engine-2.9.jar
https://mvnrepository.com/artifact/org.snakeyaml/snakeyaml-engine
|
Thanks @jsvd for #18423 (comment) I had started down that path but didnt know if it was OK to use the gradle-jdk image in the ironbank build. I'm assuming that is OK, so i've updated everything to use gradle. There is an open question of which version to ship, but I do agree with you that this will likely be a good thing to have in place long term. Thanks for the POC that was very nice to go off of :). |
|
Based on an in person conversation we are going to investigate doing an iteration where env2yaml is compiled and included in the tarball consumed by the containers. This will eliminate the need for a separate build step to compile the program during container build. |
Dont rely on compiling at docker build time, rather do it when logstash compilation is done.
|
@jsvd I did an iteration based on our discussion about doing the env2yaml compilation alongside the logstash compilation 8734db3 to eliminate the need for multi stage container builds. Have a look at the management of snakeyaml-engine. I tried to come up with a pattern for copying over the version shipped with jruby without being beholden to updating paths or jar names that are version specific. Not sure how much that gets us... If you would prefer we just manage that in the gradle file we can refactor. |
The complexity around trying to copy over the jar shipped with jruby is not worth how easy it is to just manage it with gradle. This helps with keeping env2yaml contained.
|
Trying to use the snakeyaml-engine from jruby was adding unncessary complexity. I pushed a commit to just vendor it with gradle. I think that simplicity and insulation will long term be the best move. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to be checked but now that env2yaml is a project registered at the top level gradle, we no longer need this empty settings file.
docker/templates/Dockerfile.erb
Outdated
| ln -s /usr/share/logstash /opt/logstash | ||
|
|
||
| COPY --from=builder-env2yaml /tmp/go/src/env2yaml/env2yaml /usr/local/bin/env2yaml | ||
| # Copy env2yaml from tarball (pre-compiled by Gradle) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to have this comment in final dockerfile
| # Copy env2yaml from tarball (pre-compiled by Gradle) | |
| <%# Copy env2yaml from tarball (pre-compiled by Gradle) %> |
💚 Build Succeeded
History
|
Release notes
[rn:skip]
What does this PR do?
Managing a Go toolchain for persisting ENV vars in logstash container artifacts has become cumbersome. We already manage a java runtime so this commit presents a path forward to use that instead of Go. The Go binary is faster than java (in my testing Go would complete in around less than 200ms while java takes over 300ms). Given the container startup time is on the order of magnitute of seconds this change should be inperceptable to consumers. The benefit from consolidating in Java is worth the slightly lower performance.
Why is it important/What is the impact to the user?
This should not be noticeable, though technically starting logstash in a container will take about 200ms longer.
Checklist
How to test this PR locally
Build container:
Run env2yaml directly or check at startup.
Example: