
EBS-CSI controller fails to provision volumes with authorization failure #11495

Open
akini-wso2 opened this issue Aug 31, 2024 · 1 comment
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@akini-wso2

akini-wso2 commented Aug 31, 2024

What happened?

I have provisioned a Kubernetes cluster on EC2 instances in AWS using Kubespray. After the cluster was successfully provisioned and all nodes were healthy and running, I installed the EBS CSI driver by following the recommended steps and then running the cluster.yml Ansible playbook.

Initially, the ebs-csi-controller pod was in a CrashLoopBackOff state, with the ebs-plugin container inside the pod failing with an error that CSI_NODE_NAME was not set. I was able to fix this by adding an env variable to the ebs-csi-controller by editing the deployment. The storage class was then created as expected.

When running the sample PVC and pod from the official Kubespray GitHub repo, the PVC remained in the Pending state.

https://github.com/kubernetes-sigs/kubespray/blob/master/docs/CSI/aws-ebs-csi.md
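For reference, the sample manifests in that document look roughly like the sketch below. This is an approximation, not the verbatim upstream example; I used the class name `ebs-sc-new` from the error log in place of whatever name the docs use, and the PVC name `ebs-pvc` matches the claim `default/ebs-pvc` seen in the events.

```yaml
# Hedged sketch of a StorageClass + PVC for the EBS CSI driver.
# Names (ebs-sc-new, ebs-pvc) are taken from this report's logs,
# not necessarily from the Kubespray sample verbatim.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc-new
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ebs-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ebs-sc-new
  resources:
    requests:
      storage: 4Gi
```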

Error log of ebs-csi-controller pod:

Warning ProvisioningFailed 15m ebs.csi.aws.com_ebs-csi-controller-75d79769b8-bbftz_1cfa04d6-8ed3-42f2-9834-1dfaa7687054 failed to provision volume with StorageClass "ebs-sc-new": rpc error: code = Internal desc = AuthFailure: AWS was not able to validate the provided access credentials
status code: 401, request id: 16eac760-e2e5-4182-a2c1-89cff669f3bd
Warning ProvisioningFailed 14m ebs.csi.aws.com_ebs-csi-controller-75d79769b8-bbftz_1cfa04d6-8ed3-42f2-9834-1dfaa7687054 failed to provision volume with StorageClass "ebs-sc-new": rpc error: code = Internal desc = RequestCanceled: request context canceled
caused by: context deadline exceeded
Normal Provisioning 95s (x12 over 16m) ebs.csi.aws.com_ebs-csi-controller-75d79769b8-bbftz_1cfa04d6-8ed3-42f2-9834-1dfaa7687054 External provisioner is provisioning volume for claim "default/ebs-pvc"
Warning ProvisioningFailed 85s (x7 over 15m) ebs.csi.aws.com_ebs-csi-controller-75d79769b8-bbftz_1cfa04d6-8ed3-42f2-9834-1dfaa7687054 failed to provision volume with StorageClass "ebs-sc-new": rpc error: code = DeadlineExceeded desc = context deadline exceeded
Normal ExternalProvisioning 57s (x62 over 16m) persistentvolume-controller Waiting for a volume to be created either by the external provisioner 'ebs.csi.aws.com' or manually by the system administrator. If volume creation is delayed, please verify that the provisioner is running and correctly registered.

What did you expect to happen?

The PVC to be bound and a volume to be created in AWS for the pod.

How can we reproduce it (as minimally and precisely as possible)?

Provision a Kubernetes cluster on AWS with EC2 instances using Kubespray.

To install the ebs-csi-driver:

Uncommented the aws_ebs_csi_enabled option in group_vars/all/aws.yml and set it to true.
Set persistent_volumes_enabled in group_vars/k8s_cluster/k8s_cluster.yml to true.
Attached role to all the EC2 instances to allow all EBS actions
Created and applied a secret to provide AWS credentials (access key ID and secret access key)
Ran cluster.yml playbook.
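For the credentials step above, the upstream aws-ebs-csi-driver docs describe a Secret named `aws-secret` in kube-system with the keys `key_id` and `access_key`; the sketch below assumes that convention (the exact name/keys Kubespray's manifests expect may differ, so check the deployed driver's env references).

```shell
# Hedged sketch: create the credentials Secret the EBS CSI driver can read.
# Secret name (aws-secret) and keys (key_id, access_key) follow the
# upstream aws-ebs-csi-driver docs and are an assumption here.
kubectl create secret generic aws-secret \
  --namespace kube-system \
  --from-literal "key_id=<your AWS access key ID>" \
  --from-literal "access_key=<your AWS secret access key>"
```

The AuthFailure 401 in the controller log is consistent with the driver not receiving (or not being wired to) these credentials.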

To fix CSI_NODE_NAME env var not set:

kubectl edit deployment.apps/ebs-csi-controller -n kube-system

env:
- name: CSI_NODE_NAME
  valueFrom:
    fieldRef:
      fieldPath: spec.nodeName
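The same edit can be applied non-interactively with a JSON patch. This is a hedged alternative, not what I ran; it assumes the ebs-plugin container is at index 0 in the pod spec, so adjust the path if it is not.

```shell
# Hypothetical one-shot equivalent of the kubectl edit above.
# Assumes the target container is containers/0; verify with
# `kubectl get deploy ebs-csi-controller -n kube-system -o yaml` first.
kubectl patch deployment ebs-csi-controller -n kube-system --type='json' -p='[
  {"op": "add",
   "path": "/spec/template/spec/containers/0/env/-",
   "value": {"name": "CSI_NODE_NAME",
             "valueFrom": {"fieldRef": {"fieldPath": "spec.nodeName"}}}}
]'
```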

OS

Linux 6.5.0-1022-aws x86_64
PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Version of Ansible

ansible [core 2.14.17]
config file = /etc/ansible/ansible.cfg
configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/local/lib/python3.10/dist-packages/ansible
ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
executable location = /usr/local/bin/ansible
python version = 3.10.12 (main, Jul 29 2024, 16:56:48) [GCC 11.4.0] (/usr/bin/python3)
jinja version = 3.1.2
libyaml = True

Version of Python

Python 3.10.12

Version of Kubespray (commit)

kubespray:v2.25.0

Network plugin used

cilium

Full inventory with variables

[all]
master1 ansible_host=10.0.0.101 ip=10.0.0.101
master2 ansible_host=10.0.4.70 ip=10.0.4.70
master3 ansible_host=10.0.15.218 ip=10.0.15.218
worker1 ansible_host=10.0.21.128 ip=10.0.21.128
worker2 ansible_host=10.0.24.96 ip=10.0.24.96
etcd1 ansible_host=10.0.5.14 ip=10.0.5.14

[kube_control_plane]
master2
master1

[etcd]
etcd1

[kube_node]
worker1
worker2

[calico_rr]

[k8s_cluster:children]
kube_control_plane
kube_node
calico_rr

Command used to invoke ansible

sudo docker run --rm -it --mount type=bind,source=/home/ubuntu/kubespray/inventory/mycluster/,dst=/inventory --mount type=bind,source=/home/ubuntu/.ssh/id_rsa,dst=/root/.ssh/id_rsa --mount type=bind,source=/home/ubuntu/.ssh/id_rsa,dst=/home/ubuntu/.ssh/id_rsa quay.io/kubespray/kubespray:v2.25.0 bash ansible-playbook -i /inventory/inventory.ini cluster.yml --user=ubuntu --become --become-user=root --private-key=/home/ubuntu/.ssh/id_rsa -e kube_network_plugin=cilium --flush-cache

Output of ansible run

PLAY RECAP *************************************************************************************
etcd1 : ok=137 changed=11 unreachable=0 failed=0 skipped=340 rescued=0 ignored=0
master1 : ok=491 changed=14 unreachable=0 failed=0 skipped=950 rescued=0 ignored=1
master2 : ok=540 changed=22 unreachable=0 failed=0 skipped=1040 rescued=0 ignored=1
worker1 : ok=412 changed=15 unreachable=0 failed=0 skipped=638 rescued=0 ignored=1
worker2 : ok=412 changed=15 unreachable=0 failed=0 skipped=633 rescued=0 ignored=1

Thursday 29 August 2024 00:22:10 +0000 (0:00:00.302) 0:08:02.550 *******

container-engine/runc : Download_file | Download item ---------------------------------- 10.46s
container-engine/containerd : Download_file | Download item ---------------------------- 10.18s
container-engine/crictl : Download_file | Download item -------------------------------- 10.09s
container-engine/nerdctl : Download_file | Download item -------------------------------- 9.99s
container-engine/crictl : Extract_file | Unpacking archive ------------------------------ 7.79s
kubernetes/preinstall : Update package management cache (APT) --------------------------- 7.64s
container-engine/nerdctl : Extract_file | Unpacking archive ----------------------------- 6.92s
kubernetes-apps/ansible : Kubernetes Apps | Start Resources ----------------------------- 5.63s
download : Download_file | Download item ------------------------------------------------ 5.38s
kubernetes-apps/ansible : Kubernetes Apps | Lay Down CoreDNS templates ------------------ 4.90s
etcdctl_etcdutl : Download_file | Download item ----------------------------------------- 4.84s
kubernetes-apps/ingress_controller/ingress_nginx : NGINX Ingress Controller | Create manifests --- 4.83s
download : Download | Download files / images ------------------------------------------- 4.57s
kubernetes-apps/ingress_controller/ingress_nginx : NGINX Ingress Controller | Apply manifests --- 4.44s
network_plugin/cilium : Cilium | Create Cilium node manifests --------------------------- 4.28s
container-engine/containerd : Containerd | Unpack containerd archive -------------------- 4.10s
etcdctl_etcdutl : Extract_file | Unpacking archive -------------------------------------- 3.94s
kubernetes-apps/metrics_server : Metrics Server | Create manifests ---------------------- 3.75s
network_plugin/cilium : Cilium | Start Resources ---------------------------------------- 3.69s
container-engine/containerd : Download_file | Create dest directory on node ------------- 3.61s

Anything else we need to know

No response

@akini-wso2 akini-wso2 added the kind/bug Categorizes issue or PR as related to a bug. label Aug 31, 2024
@tico88612
Member

Kubespray's EBS CSI is a bit old. If you need to fix it urgently, you can refer to the aws-ebs-csi-driver repo.
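If you go that route, the upstream driver publishes a Helm chart; a minimal install sketch (per the aws-ebs-csi-driver README, default values assumed) would be:

```shell
# Hedged sketch: install the current upstream EBS CSI driver via Helm,
# replacing the Kubespray-managed manifests. Chart repo URL and chart
# name are from the kubernetes-sigs/aws-ebs-csi-driver project.
helm repo add aws-ebs-csi-driver https://kubernetes-sigs.github.io/aws-ebs-csi-driver
helm repo update
helm upgrade --install aws-ebs-csi-driver \
  aws-ebs-csi-driver/aws-ebs-csi-driver \
  --namespace kube-system
```

You would still need to provide credentials (instance profile, IRSA, or a Secret) for the controller to call the EC2 API.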
