Contributions to RAPIDS Accelerator for Apache Spark fall into the following three categories.
- To report a bug, request a new feature, or report a problem with documentation, please file an issue describing in detail the problem or new feature. The project team evaluates and triages issues, and schedules them for a release. If you believe the issue needs priority attention, please comment on the issue to notify the team.
- To propose and implement a new feature, please file a new feature request issue. Describe the intended feature and discuss the design and implementation with the team and community. Once the team agrees that the plan looks good, go ahead and implement it using the code contributions guide below.
- To implement a feature or bug-fix for an existing outstanding issue, please follow the code contributions guide below. If you need more context on a particular issue, please ask in a comment.
There are two types of branches in this repository:
- `branch-[version]`: development branches, which can change often. Note that we merge into the branch with the greatest version number, as that is our default branch.
- `main`: the branch with the latest released code; the version tag (e.g. `v0.1.0`) is held here. `main` changes with new releases, but otherwise it should not change with every pull request merged, making it a more stable branch.
We use Maven for most aspects of the build. Some important parts of the build execute in the `verify` phase of the Maven build lifecycle. When building, we recommend running at least to the `verify` phase, e.g.:
mvn verify
After a successful build, the RAPIDS Accelerator jar will be in the `dist/target/` directory.
By default the build will only include shims for released versions of Spark. To include shims for snapshot versions of Spark still under development, use the `snapshot-shims` Maven profile (e.g. add `-Psnapshot-shims` to the Maven command line). Note that when a snapshot Spark version later becomes an official release, the snapshot shim for that version may no longer build due to missing snapshot artifacts for that Spark version.
You can build against different versions of the CUDA Toolkit by using one of the following profiles:
- `-Pcuda11` (CUDA 11.0/11.1/11.2, default)
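As a rough sketch of how the profile flags above combine on the command line, the snippet below assembles the `mvn verify` invocation for a few profile selections and prints it rather than running it. The `build_cmd` helper is hypothetical and not part of this repository; the only assumption about Maven is its standard behavior of accepting a comma-separated list of profiles after `-P`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): prints the Maven command
# line for an optional comma-separated list of profiles passed as $1.
build_cmd() {
  if [ -n "$1" ]; then
    echo "mvn verify -P$1"
  else
    echo "mvn verify"
  fi
}

# Default build: shims for released Spark versions and the default CUDA profile.
build_cmd ""
# Also build shims for snapshot versions of Spark.
build_cmd "snapshot-shims"
# Select several profiles at once with a comma-separated list.
build_cmd "snapshot-shims,cuda11"
```

Copy the printed command into your terminal to run the actual build; the jar then lands in `dist/target/` as described above.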
- Read the Developer Overview to understand how the RAPIDS Accelerator plugin works.
- Find an issue to work on. The best way is to look for the good first issue or help wanted labels.
- Comment on the issue stating that you are going to work on it.
- Code! Make sure to update unit tests and integration tests if needed; refer to the testing section.
- When done, create your pull request.
- Verify that CI passes all status checks. Fix if needed.
- Wait for other developers to review your code and update code as needed.
- Once reviewed and approved, a project committer will merge your pull request.
Remember, if you are unsure about anything, don't hesitate to comment on issues and ask for clarifications!
RAPIDS Accelerator for Apache Spark follows the same coding style guidelines as the Apache Spark project. For IntelliJ IDEA users, an example code style settings file is available in the `docs/dev/` directory.
For Scala code, this project follows the official Scala style guide and the Databricks Scala guide, preferring the latter.
For Java code, this project follows the Oracle Java code conventions and the Scala conventions detailed above, preferring the latter.
We require that all contributors sign off on their commits. This certifies that the contribution is your original work, or that you have the rights to submit it under the same license or a compatible license.
Any contribution which contains commits that are not signed off will not be accepted.
To sign off on a commit, use the `--signoff` (or `-s`) option when committing your changes:
git commit -s -m "Add cool feature."
This will append the following to your commit message:
Signed-off-by: Your Name <your@email.com>
The sign-off is a simple line at the end of the explanation for the patch. Your signature certifies that you wrote the patch or otherwise have the right to pass it on as an open-source patch. Use your real name; pseudonyms and anonymous contributions are not accepted. If you set your `user.name` and `user.email` git configs, you can sign your commit automatically with `git commit -s`.
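The sign-off steps can be tried end to end in a throwaway repository. The sketch below is illustrative only: the name, email address, and file name are placeholders, and the temporary directory stands in for your real clone.

```shell
#!/usr/bin/env bash
# Sketch: make a signed-off commit in a throwaway repository and inspect it.
# The name, email, and file below are placeholders for illustration only.
repo="$(mktemp -d)"
git init -q "$repo"
git -C "$repo" config user.name "Jane Developer"
git -C "$repo" config user.email "jane@example.com"

echo "hello" > "$repo/feature.txt"
git -C "$repo" add feature.txt
# -s appends a Signed-off-by trailer built from user.name and user.email.
git -C "$repo" commit -q -s -m "Add cool feature."

# Print the full commit message, including the trailer.
msg="$(git -C "$repo" log -1 --format=%B)"
echo "$msg"
```

Running `git log -1 --format=%B` in your own branch before opening a pull request is a quick way to confirm that the trailer is present.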
The signoff means you certify the below (from developercertificate.org):
Developer Certificate of Origin
Version 1.1
Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129
Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.
Developer's Certificate of Origin 1.1
By making a contribution to this project, I certify that:
(a) The contribution was created in whole or in part by me and I
have the right to submit it under the open source license
indicated in the file; or
(b) The contribution is based upon previous work that, to the best
of my knowledge, is covered under an appropriate open source
license and I have the right under that license to submit that
work with modifications, whether created in whole or in part
by me, under the same open source license (unless I am
permitted to submit under a different license), as indicated
in the file; or
(c) The contribution was provided directly to me by some other
person who certified (a), (b) or (c) and I have not modified
it.
(d) I understand and agree that this project and the contribution
are public and that a record of the contribution (including all
personal information I submit with it, including my sign-off) is
maintained indefinitely and may be redistributed consistent with
this project or the open source license(s) involved.
Please visit the testing doc for details about how to run tests.
We provide a basic config `.pre-commit-config.yaml` for pre-commit to automate some aspects of the development process. As a convenience you can enable automatic copyright year updates by following the installation instructions on the pre-commit homepage.
To this end, first install `pre-commit` itself using the method most suitable for your development environment. Then you will need to run `pre-commit install` to enable it in your local git repository. Using `--allow-missing-config` will make it easy to work with older branches that do not have `.pre-commit-config.yaml`.
pre-commit install --allow-missing-config
Then enable the copyright year updater by setting the environment variable:
export SPARK_RAPIDS_AUTO_COPYRIGHTER=ON
The default value of `SPARK_RAPIDS_AUTO_COPYRIGHTER` is `OFF`.
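If you would rather not export the variable for the whole shell session, standard shell semantics also allow setting it for a single command. The sketch below uses `sh -c` as a stand-in for a child process such as a git hook; the `${VAR:-OFF}` fallback only mimics the documented default and is an assumption, not the hook's actual implementation.

```shell
#!/usr/bin/env bash
# Stand-in for a child process (such as a git hook) that reads the setting.
# The ":-OFF" fallback mirrors the documented default; it is an assumption,
# not a copy of the real hook's logic.
unset SPARK_RAPIDS_AUTO_COPYRIGHTER

# Variable unset: the child process sees the default.
default_value="$(sh -c 'echo "${SPARK_RAPIDS_AUTO_COPYRIGHTER:-OFF}"')"

# Variable set for one command only: the child process sees ON.
one_off_value="$(SPARK_RAPIDS_AUTO_COPYRIGHTER=ON sh -c 'echo "${SPARK_RAPIDS_AUTO_COPYRIGHTER:-OFF}"')"

# The one-off assignment did not leak into the current shell.
after_value="$(sh -c 'echo "${SPARK_RAPIDS_AUTO_COPYRIGHTER:-OFF}"')"

echo "default=$default_value one_off=$one_off_value after=$after_value"
```

The same prefix form works with any command, so `SPARK_RAPIDS_AUTO_COPYRIGHTER=ON git commit -s` would expose the setting to that one commit's hooks without changing your environment.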
When the automatic copyright updater is enabled and you modify a file with a prior year in the copyright header, the header is updated to the current year automatically on `git commit`. However, this aborts the commit process with the following error message:
Update copyright year....................................................Failed
- hook id: auto-copyrighter
- duration: 0.01s
- files were modified by this hook
You can confirm that the update has actually happened either by inspecting its effect with `git diff` first or by simply re-executing `git commit` right away. The second time, no file modification should be triggered by the copyright year update hook and the commit should succeed.
A pull request should pass all status checks before it is merged.
Please follow the steps in the Sign your work section, and make sure at least one commit in your pull request is signed off.
The pre-merge check runs on an NVIDIA self-hosted runner; a project committer can manually trigger it by commenting `build`. It includes the following steps:
- Mergeable check
- Blackduck vulnerability scan
- Fetch merged code (merge the pull request HEAD into BASE branch, e.g. fea-001 into branch-x)
- Run `mvn verify` and unit tests for multiple Spark versions in parallel. Ref: spark-premerge-build.sh
If it fails, you can click the `Details` link of this check and go to `Upload log -> Jenkins log for pull request xxx (click here)` to find the uploaded log.
Portions adopted from https://github.com/rapidsai/cudf/blob/main/CONTRIBUTING.md, https://github.com/NVIDIA/nvidia-docker/blob/main/CONTRIBUTING.md, and https://github.com/NVIDIA/DALI/blob/main/CONTRIBUTING.md