SCS K8s cluster standardization #181
Comments
@jschoone @garloff It seems that this existing epic does for the CaaS track what I intended the new epic SovereignCloudStack/standards#285 to do for the IaaS track. I guess it remains to compare the description here with the table https://input.scs.community/tqKlv1Z_Srmi5e5o76CxhQ?view#KaaS-Layer I took from Kurt's slides and maybe update accordingly? For instance, two standards have already been ticked off, even though we still need to implement the conformance tests -- @cah-hbaum will write the corresponding issues, and so I could add those to this epic. Please tell me if you disagree with anything I just wrote.
Comparison between this epic and the table from Kurt's ALASCA talk slides
Please check what should be added here or what I did wrong @garloff @jschoone.
TL;DR: I want them all to be considered and discussed.
The thing here is that upstream nginx uses externalTrafficPolicy: Local, which has two effects: (1) Traffic is routed only to the nodes that run the nginx container. This requires a health monitor to be configured, which on many LBs (including the Octavia one) needs a special annotation or a changed default. (2) The original client IP is visible and not obscured by the LB (L2/L3 LB instead of L4).
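For illustration, a minimal sketch of the Service shape this point is about, written as a Python dict so it can be checked programmatically. The service name, namespace, and selector labels are assumptions made for the example. The provider-specific health-monitor annotation is deliberately left out, because it differs between load balancer implementations.

```python
import json


def ingress_lb_service(name: str, namespace: str, selector: dict) -> dict:
    """Build a Service manifest of type LoadBalancer with externalTrafficPolicy: Local.

    All argument values are supplied by the caller; nothing here is mandated by SCS.
    """
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "LoadBalancer",
            # "Local" keeps the client source IP and only routes to nodes
            # that actually run a backing pod.
            "externalTrafficPolicy": "Local",
            "selector": dict(selector),
            "ports": [
                {"name": "http", "port": 80, "targetPort": "http", "protocol": "TCP"},
                {"name": "https", "port": 443, "targetPort": "https", "protocol": "TCP"},
            ],
        },
    }


# Hypothetical names, chosen only for the example.
manifest = ingress_lb_service(
    "ingress-nginx-controller",
    "ingress-nginx",
    {"app.kubernetes.io/name": "ingress-nginx"},
)
print(json.dumps(manifest, indent=2))
```

A conformance test could apply such a manifest and then verify that the service gets an external address and that the backend sees the real client IP.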
For both ControlPlane and Worker Nodes, the number of nodes and the flavors need to be configurable. The mandatory SCS flavors need to be accepted for the latter. (Sidenote: This is a cluster-management feature, not a cluster property -- the latter being something you can rely on once a cluster exists.)
We have the sonobuoy binary installed on the management cluster and run it to test the workload clusters for CNCF conformance. So we have tooling to test CNCF conformance and we want to require CNCF conformance for all clusters.
We have a standard on this: scs-0210-v1. Maybe we need to amend it so that providers must not drop support for a minor k8s version before upstream stops security support (~14 months after a release). And maybe we should recommend that, for managed clusters, the provider sends a warning to users when one of their clusters enters the extended support period (after ~12 months) and aligns the needed upgrades with them?
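As a rough sketch of how the two thresholds above could be evaluated, the snippet below classifies a minor version by its age. The 12 and 14 month figures are the approximate values from this comment, not exact upstream policy, and the release date used in the example is hypothetical.

```python
from datetime import date

WARNING_AFTER_MONTHS = 12     # approximate start of the extended support period
SECURITY_SUPPORT_MONTHS = 14  # approximate end of upstream security support


def add_months(d: date, months: int) -> date:
    """Return the date `months` calendar months after d (day clamped to 28)."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, min(d.day, 28))


def support_phase(release: date, today: date) -> str:
    """Classify a minor version as 'supported', 'warn', or 'end-of-support'."""
    if today < add_months(release, WARNING_AFTER_MONTHS):
        return "supported"
    if today < add_months(release, SECURITY_SUPPORT_MONTHS):
        return "warn"
    return "end-of-support"


# Hypothetical release date, used only to exercise the function.
example_release = date(2023, 4, 11)
for day in (date(2024, 1, 1), date(2024, 5, 1), date(2024, 7, 1)):
    print(day.isoformat(), support_phase(example_release, day))
```

A provider could run such a check periodically and notify cluster owners once a cluster reaches the "warn" phase.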
We had some concepts written down for this -- and determined that this should be optional (for the customer).
I did not check these for completeness, but everything above looks desirable to me. Note:
(2) What is the standardized parameter format and API to create, modify and delete clusters?
@garloff I amended the description of this issue by everything that hadn't been in there. Maybe we can now go ahead and group the items a bit, like I did in SovereignCloudStack/standards#285.
I updated the epic and grouped everything a bit more together. But I think in the long run, something like a table would be better, since the "pre"-work for the standard issues is done in other issues or over multiple ones.
I created individual issues for nearly all points not yet covered by previous issues. I left a few open, since they seemed way too general and broad.
@cah-hbaum That sounds great! I also like the new structure in the description above. 👍👍👍
Short term
Medium term
Long term
Not enough information
Blocked
Already working on
Closing in favor of SovereignCloudStack/standards#615.
As a DevOps team (=SCS user), I want to have the ability to create and use clusters on many different SCS-compliant container providers, where all relevant properties are either predefined by the SCS standard or can be controlled by a provider-independent cluster-settings.yaml file.
Relevant properties are those that tend to create trouble for the application deployment, e.g. k8s versions, CNI features, persistent volumes, ingress/load-balancers, anti-affinity rules (to avoid having k8s nodes on the same host) ...
These properties should either be fixed by SCS (and then of course only evolve slowly over time) or be controllable by the customer (via a standardized, provider-independent cluster-params.yaml). For the controllable properties, we mandate existence and syntax and we may mandate all or some of the supported options. In any case, the supported options need to be discoverable (and the mechanism for discoverability should include the fixed properties as well).
Note that there is value in standardizing things that are not mandatory, in order for providers to use the same name/semantics for same things. (Obviously optional features may become mandatory for providers in the future if we decide so.)
Hints:
Extensibility: We allow for extensions, but they must be clearly distinguishable from standardized properties.
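One way to keep extensions distinguishable is a reserved key prefix. The sketch below is only an illustration of that idea: the standardized key names and the "x-" prefix are hypothetical and not part of any SCS standard.

```python
# Hypothetical standardized keys; the real set would be defined by the standard.
STANDARD_KEYS = {"kubernetesVersion", "controlPlane", "workers"}
# Hypothetical marker for provider-specific extensions.
EXTENSION_PREFIX = "x-"


def classify_settings(settings: dict) -> dict:
    """Sort top-level setting keys into standardized, extension, and unknown."""
    result = {"standard": [], "extension": [], "unknown": []}
    for key in sorted(settings):
        if key in STANDARD_KEYS:
            result["standard"].append(key)
        elif key.startswith(EXTENSION_PREFIX):
            result["extension"].append(key)
        else:
            result["unknown"].append(key)
    return result


# Example input, as it might look after parsing a cluster settings file.
example = {
    "kubernetesVersion": "1.28",
    "workers": {"count": 3, "flavor": "SCS-2V-4"},
    "x-provider-gpu": True,
    "foo": 1,
}
print(classify_settings(example))
```

A validator built this way could reject unknown keys while passing extensions through to the provider, so a settings file stays portable as long as it uses only standardized keys.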
This epic should list the standardization proposals / ADRs as issues that we as the SCS community consider relevant for SCS compliance. Some of the proposals might not make it into a v1 of the SCS standard (because they are not ready, are deemed not important enough, or are downgraded to recommendations). The individual proposed properties / ADRs should come with a rationale and with (ideally comprehensive) conformance tests. We want to evolve the reference implementation(s) in parallel to the standardization, but intellectually keep a clear distinction between standards and implementation.
We need to create conformance tests for these properties; it is useful to define standards in terms of tests that must pass. (Test-driven standardization!) Obviously, using existing test suites (such as CNCF/sonobuoy or aqua/kube-bench) and possibly contributing to them is a good way to do this.
Inspiration for the list below:
Individual topics for standardization:
Networking
Standardize k8s networking policies (CNI)
Service type LoadBalancer with externalTrafficPolicy: Local
LoadBalancer with externalTrafficPolicy: Local needs to work out of the box
Ingress Support (OPTIONAL)
Container Registry
Container registry feature overview
Registry Standard from DR SCS-0212
Derive a standard from the DR created in the previous registry issue
Split already existing document into a standard and a Decision Record only concerning the SCS cluster
Meta
Supported k8s versions
K8s version support period
KaaS ControlPlane/worker machine flavors
Cluster management API
Automation
KaaS Cluster Management Gitops Controller
KaaS Gitops/CI tooling
Identity Management
Understand the requirements towards the IdP Broker to support the container layer
Implement Machine Identities
KaaS IAM federation with ID broker
Logging & Metrics
Metrics server support (OPT-OUT)(OPTIONAL)
Logging/Monitoring/Tracing features? (OPTIONAL)
Security & Robustness
Forward-porting and retesting of upstream Intel patchset for SGX and OpenStack
K8s cluster baseline security setup
K8s cluster hardening
Move Keycloak onto kubernetes powered runtime on management plane
KaaS Optional Cert-Manager
Distributed K8s nodes to ensure Anti-Affinity
KaaS Robustness features
Storage
Tests
Definition of Done: