
Support running task pods in other Kubernetes namespaces #17738

Merged
merged 14 commits into from
Apr 1, 2025

Conversation


@GWphua GWphua commented Feb 19, 2025

Description

This PR aims to allow task pods to run in a namespace different from the rest of the cluster.

Running Kubernetes Task Pods in Another Namespace

Currently, Druid already supports running task pods in another namespace by specifying druid.indexer.runner.namespace: taskPodNamespace. If only one Druid cluster spins up these task pods, this PR is not needed.

However, running task pods from multiple clusters will be a problem. Let us imagine this scenario:

  1. We have two Druid clusters running in two separate namespaces: C1 and C2.
  2. Suppose we aim to run our task pods in another namespace, X.
  3. The Overlords in C1 and C2 spin up task pods by creating peon jobs in namespace X.
  4. Without loss of generality, when the Overlord in C1 retrieves peon jobs, it receives jobs from BOTH C1 and C2.
  5. The C1 Overlord throws an error, as it cannot process jobs from C2.

Dealing with Kubernetes Task Pods from Multiple Namespaces

The Overlord service filters peon jobs by labels. The intuition behind this PR is therefore simple: label each peon job (for example, druid.overlord.namespace=C1) so that an Overlord can filter for the jobs relevant to its own cluster.
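The filtering idea can be sketched as follows. This is an illustrative Python sketch, not Druid's actual implementation; the job dictionaries are simplified stand-ins for the Kubernetes Job objects involved.

```python
# Illustrative sketch of label-based peon-job filtering (not Druid's actual code).
# Each peon job carries a label identifying the Overlord's namespace; an
# Overlord only processes jobs whose label matches its own namespace.

def filter_peon_jobs(jobs, overlord_namespace):
    """Return only the jobs labeled as belonging to this Overlord."""
    return [
        job for job in jobs
        if job.get("labels", {}).get("druid.overlord.namespace") == overlord_namespace
    ]

# Jobs from two clusters living in the shared task namespace X:
jobs_in_x = [
    {"name": "peon-1", "labels": {"druid.overlord.namespace": "C1"}},
    {"name": "peon-2", "labels": {"druid.overlord.namespace": "C2"}},
]

# The C1 Overlord now sees only its own job.
print([j["name"] for j in filter_peon_jobs(jobs_in_x, "C1")])  # ['peon-1']
```

In the real extension this filtering is done with a Kubernetes label selector when listing Jobs, so the API server returns only the matching jobs in the first place.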

We therefore introduce a new config: druid.indexer.runner.overlordNamespace. Here's a usage example using the above-mentioned namespaces C1 and X.

  1. When overlordNamespace is not provided, Druid assumes that X = C1 and defaults to the value of druid.indexer.runner.namespace.
  2. When we want to run task pods in X, we set druid.indexer.runner.namespace: X and druid.indexer.runner.overlordNamespace: C1.
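As a concrete sketch, the C1 Overlord's runtime properties would then contain something like the following. The namespace names X and C1 are the placeholders from the scenario above, not real values:

```properties
# Run peon task pods in the shared task namespace X...
druid.indexer.runner.namespace=X
# ...but label them as belonging to this Overlord's namespace, C1,
# so the C1 Overlord only picks up its own jobs.
druid.indexer.runner.overlordNamespace=C1
```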

Documentation

Added documentation for running Kubernetes Task Pods in another namespace.

Release note

You can now run task pods in a namespace different from the rest of the cluster.


Key changed/added classes in this PR
  • k8s-jobs.md
  • KubernetesTaskRunnerConfig.java
  • KubernetesTaskRunnerFactory.java
  • KubernetesPeonClient.java
  • K8sTaskAdapter.java
  • PodTemplateTaskAdapter.java
  • DruidK8sConstants.java
  • Relevant test files and YAML files containing expected output.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • been tested in a test Druid cluster.

```diff
@@ -264,6 +264,7 @@ private Map<String, String> getJobLabels(KubernetesTaskRunnerConfig config, Task
   return ImmutableMap.<String, String>builder()
       .putAll(config.getLabels())
       .put(DruidK8sConstants.LABEL_KEY, "true")
+      .put(DruidK8sConstants.ORIGINAL_NAMESPACE_KEY, config.getOverlordNamespace())
```
Member

does it work well when the namespace is undefined?

Contributor Author

Under KubernetesTaskRunnerConfig, the namespace field is annotated as non-null, so we should always be able to retrieve the namespace.


The extension uses `druid.indexer.runner.capacity` to limit the number of Kubernetes Jobs in flight. A good initial value is the sum of the total task slots of all the Middle Managers you were running before switching to Kubernetes-based ingestion. The Kubernetes task runner uses one thread per Job that is created, so setting this number too large can cause memory issues on the Overlord. Additionally, set `druid.indexer.runner.namespace` to the namespace in which you are running Druid.

Other configurations required are:
`druid.indexer.runner.type: k8s` and `druid.indexer.task.encapsulatedTask: true`
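Putting these together, a minimal illustrative set of Overlord properties might look like the following. The namespace and capacity values are placeholders for this example, not recommendations:

```properties
druid.indexer.runner.type=k8s
druid.indexer.task.encapsulatedTask=true
# Namespace in which you are running Druid (placeholder value).
druid.indexer.runner.namespace=druid
# Cap on concurrent Kubernetes Jobs; the runner uses one thread per Job.
druid.indexer.runner.capacity=30
```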

### Running Task Pods in Another Namespace

It is possible to run task pods in a different namespace from the rest of your Druid cluster.
Contributor
since this feature is for pretty advanced kubernetes users and some of the complexity seems to be around trying to support the K8sTaskAdapter based pod adapters, would it make sense to just require using the pod template adapter if you want to use this feature?

Contributor Author

hi @georgew5656, do you mean that we should only add this functionality for PodTemplateTaskAdapter, and remove support in K8sTaskAdapter, because Single/MultiContainerTaskAdapter do not need this feature?

Member

That makes sense, because in non-pod-template mode, tasks are scheduled in the same namespace as the Overlord.

Contributor Author

done

@github-actions github-actions bot added Area - Batch Ingestion Area - Querying Area - Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 GHA labels Feb 20, 2025
@GWphua GWphua force-pushed the peon-in-other-namespaces branch from 8001e3f to a14434c Compare February 20, 2025 02:55
@FrankChen021 FrankChen021 left a comment

LGTM


GWphua commented Feb 21, 2025

Changed logic to only support pod template adapters as suggested. Please take a look again.
@georgew5656 @FrankChen021


cryptoe commented Feb 24, 2025

@GWphua I am also going through the PR. Will come back to you by the end of the week.


GWphua commented Mar 6, 2025

> @GWphua I am also going through the PR. Will come back to you by the end of the week.

hi @cryptoe, any updates on your side regarding this?


GWphua commented Mar 24, 2025

> @GWphua I am also going through the PR. Will come back to you by the end of the week.

Hey @cryptoe, are you still going through the PR?

@gianm gianm left a comment

I looked at the changes since @FrankChen021's review and they look good to me. There is documentation and test coverage, and the change is done in a way where the risk appears low if the new druid.indexer.runner.overlordNamespace parameter is not set.


gianm commented Apr 1, 2025

@GWphua thank you for your contribution!

@gianm gianm merged commit ec5e231 into apache:master Apr 1, 2025
75 checks passed

cryptoe commented Apr 1, 2025

Thanks @gianm for the review.
Apologies @GWphua for the delay.

@GWphua GWphua deleted the peon-in-other-namespaces branch April 2, 2025 06:24
5 participants