Fix hostname generation issue preventing distributed operations #1833
Problem
The ClickHouse operator generates incorrect hostnames in the remote_servers.xml configuration, causing DNS resolution failures and breaking distributed operations in multi-node clustered deployments (sharded and/or replicated setups).

Issue Details:
- Operator-generated hostname: chi-db-clickhouse-db-{shard}-{replica} (e.g., chi-db-clickhouse-db-0-0)
- Actual pod name: chi-db-clickhouse-db-{shard}-{replica}-{ordinal} (e.g., chi-db-clickhouse-db-0-0-0)

This mismatch causes all nodes to show is_local=0 in system.clusters, breaking distributed operations and ON CLUSTER commands.

Root Cause
The operator did not follow the Kubernetes StatefulSet DNS naming convention. Each StatefulSet pod gets a DNS name of the form <pod-name>.<headless-service-name>.<namespace>.svc.cluster.local.
The createPodFQDN function was incorrectly using createPodHostname() (service name) instead of createPodName() (actual pod name with the -0 ordinal suffix). While the service name would work for network connectivity, ClickHouse's is_local detection requires the hostname in remote_servers.xml to exactly match the pod's actual hostname for proper cluster node identification.

Solution: Fixed Both Hostname Generation Functions
Modified both the createPodHostname and createPodFQDN functions in the CHI and CHK namers:

1. Fixed createPodHostname()
Before (broken): Returned the service name without the ordinal
After (fixed): Returns the actual pod name with the ordinal

2. Fixed createPodFQDN()
Before (broken): Used the service name in the FQDN
After (fixed): Uses the proper StatefulSet DNS pattern
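Both fixes can be sketched as follows. This is a simplified illustration and not the operator's actual code: the host struct, its fields, the Before/After function variants, and all signatures are hypothetical. Only the names createPodName, createPodHostname, and createPodFQDN come from the description above. The sketch also assumes, as in the examples in this description, that the per-host service shares its name with the StatefulSet and that each StatefulSet runs a single pod with ordinal 0.

```go
package main

import "fmt"

// host is a hypothetical stand-in for the operator's host object.
type host struct {
	ServiceName string // assumed equal to the StatefulSet name, e.g. chi-db-clickhouse-db-0-0
	Namespace   string
}

// createPodName returns the pod name. StatefulSet pods are named
// <statefulset-name>-<ordinal>; with a single pod the ordinal is 0.
func createPodName(h host) string {
	return h.ServiceName + "-0"
}

// createPodHostnameBefore models the broken behavior: service name, no ordinal.
func createPodHostnameBefore(h host) string {
	return h.ServiceName
}

// createPodHostnameAfter models the fixed behavior: the actual pod name.
func createPodHostnameAfter(h host) string {
	return createPodName(h)
}

// createPodFQDNBefore models the broken FQDN built from the service name.
func createPodFQDNBefore(h host) string {
	return fmt.Sprintf("%s.%s.svc.cluster.local", createPodHostnameBefore(h), h.Namespace)
}

// createPodFQDNAfter models the fixed FQDN:
// <pod-name>.<headless-service-name>.<namespace>.svc.cluster.local
func createPodFQDNAfter(h host) string {
	return fmt.Sprintf("%s.%s.%s.svc.cluster.local", createPodName(h), h.ServiceName, h.Namespace)
}

func main() {
	h := host{ServiceName: "chi-db-clickhouse-db-0-0", Namespace: "namespace"}
	fmt.Println("hostname before:", createPodHostnameBefore(h))
	fmt.Println("hostname after: ", createPodHostnameAfter(h))
	fmt.Println("FQDN before:    ", createPodFQDNBefore(h))
	fmt.Println("FQDN after:     ", createPodFQDNAfter(h))
}
```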
This ensures both functions return pod names that match actual StatefulSet pod hostnames, enabling proper is_local detection and DNS resolution.

Files Changed:
- pkg/model/chi/namer/name.go - Implemented proper StatefulSet DNS pattern for CHI
- pkg/model/chk/namer/name.go - Implemented proper StatefulSet DNS pattern for CHK

Compatibility with namespaceDomainPattern
This fix is fully compatible with the existing namespaceDomainPattern functionality. Whether or not users specify a custom domain pattern, the implementation handles both cases:
- Default: <pod-name>.<headless-service-name>.<namespace>.svc.cluster.local
- Custom: <pod-name>.<headless-service-name>.<custom-domain-pattern>

The %s placeholder in namespaceDomainPattern is replaced with the namespace name, maintaining full backward compatibility while fixing the underlying DNS resolution issues.

Impact
- ON CLUSTER operations: Distributed DDL now works in sharded configurations
- is_local=0 issue: All cluster nodes correctly identify themselves as local
- No manual extraConfig remote_servers overrides needed
- Compatible with namespaceDomainPattern overrides

Operator Log Messages Explained
This fix resolves the operator's continuous log messages reporting hosts as "outside" the cluster. These occur because the operator's IsHostInCluster() function queries system.clusters for a node marked as local. When hostnames mismatch, this always returns 0 (no local node found), causing the operator to repeatedly log that hosts are "outside" the cluster even when they are functioning correctly.
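The effect of the mismatch on that check can be illustrated with a small model. This is a hedged sketch, not the operator's IsHostInCluster() implementation: the clusterRow type and the countLocal and hostInCluster functions are hypothetical, and the rows are hand-written stand-ins for what system.clusters would report.

```go
package main

import "fmt"

// clusterRow is a hypothetical, reduced view of one row of system.clusters.
type clusterRow struct {
	HostName string
	IsLocal  bool
}

// countLocal counts the rows that ClickHouse flagged as local.
func countLocal(rows []clusterRow) int {
	n := 0
	for _, r := range rows {
		if r.IsLocal {
			n++
		}
	}
	return n
}

// hostInCluster reports whether the queried node found itself in the cluster,
// i.e. whether at least one row is marked local.
func hostInCluster(rows []clusterRow) bool {
	return countLocal(rows) > 0
}

func main() {
	// With mismatched hostnames, no row is local on any node.
	mismatched := []clusterRow{
		{HostName: "chi-db-clickhouse-db-0-0", IsLocal: false},
		{HostName: "chi-db-clickhouse-db-0-1", IsLocal: false},
	}
	// With matching hostnames, the node's own row is local.
	matched := []clusterRow{
		{HostName: "chi-db-clickhouse-db-0-0-0", IsLocal: true},
		{HostName: "chi-db-clickhouse-db-0-1-0", IsLocal: false},
	}
	fmt.Println("mismatched:", hostInCluster(mismatched))
	fmt.Println("matched:   ", hostInCluster(matched))
}
```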
Testing & Validation
Production Tested:
- ON CLUSTER DDL operations on a 4-node cluster (2 shards, 2 replicas each)
- namespaceDomainPattern compatibility

Technical Details: StatefulSet DNS Pattern Implementation
The fix implements the standard Kubernetes StatefulSet DNS pattern by ensuring FQDNs follow the form <pod-name>.<headless-service-name>.<namespace>.svc.cluster.local.

Key Components:
- Pod name: chi-clickhouse-clickhouse-0-0-0 (includes the -0 ordinal)
- Headless service name: chi-clickhouse-clickhouse-0-0 (StatefulSet service)
- Domain: <namespace>.svc.cluster.local (or custom via namespaceDomainPattern)

This ensures proper DNS resolution for StatefulSet pods while maintaining compatibility with all existing cluster configurations and custom domain patterns.
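How the three components combine with namespaceDomainPattern can be sketched like this. It is an illustrative sketch under stated assumptions, not the code in name.go: the podFQDN function, the defaultDomainPattern constant, the "prod" namespace, and the custom pattern value "%s.svc.my.test" are all hypothetical. It assumes an empty pattern means the default cluster domain is used.

```go
package main

import "fmt"

// defaultDomainPattern is the assumed fallback when no custom pattern is set.
const defaultDomainPattern = "%s.svc.cluster.local"

// podFQDN joins pod name, headless service name, and domain.
// The %s placeholder in the pattern is replaced with the namespace.
func podFQDN(podName, serviceName, namespace, namespaceDomainPattern string) string {
	pattern := namespaceDomainPattern
	if pattern == "" {
		pattern = defaultDomainPattern
	}
	domain := fmt.Sprintf(pattern, namespace)
	return podName + "." + serviceName + "." + domain
}

func main() {
	pod := "chi-clickhouse-clickhouse-0-0-0"
	svc := "chi-clickhouse-clickhouse-0-0"

	// Default domain.
	fmt.Println(podFQDN(pod, svc, "prod", ""))
	// Custom domain pattern (hypothetical value).
	fmt.Println(podFQDN(pod, svc, "prod", "%s.svc.my.test"))
}
```

Keeping the domain as the only part derived from the pattern means a custom namespaceDomainPattern changes the suffix but never the pod or service labels.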
Verification
After this fix, users no longer need manual extraConfig overrides. The operator automatically generates hostnames in remote_servers.xml that match the actual StatefulSet pod names and DNS patterns.

Example of corrected hostname generation:
- Before: chi-db-clickhouse-db-0-0.namespace.svc.cluster.local
- After: chi-db-clickhouse-db-0-0-0.chi-db-clickhouse-db-0-0.namespace.svc.cluster.local

This change ensures ClickHouse can properly identify local replicas and enables all distributed operations to work correctly out of the box with proper StatefulSet DNS resolution.