Fails to provision iscsi volumes on truenas scale #67

Open
jstewart612 opened this issue Oct 22, 2024 · 17 comments
Labels: duplicate (This issue or pull request already exists), next release (This will be closed in the next release)

Comments

@jstewart612

jstewart612 commented Oct 22, 2024

ElectricEel-24.10-RC.2

Tue, 22 Oct 2024 22:44:13 +0000 backend DEBUG TrueNAS GET Request through fetch: 504
Tue, 22 Oct 2024 22:44:13 +0000 backend ERROR Not found: Volume with name pvc-7ec72f22-d747-4813-9027-72f25459836c not found.
Tue, 22 Oct 2024 22:44:13 +0000 backend DEBUG Falcon Response (to HPE CSI): 404 Not Found
Tue, 22 Oct 2024 22:44:13 +0000 backend DEBUG Last backend requests Response: 504 Gateway Time-out
Tue, 22 Oct 2024 22:44:13 +0000 backend DEBUG API Key detected. Will use token authentication.
Tue, 22 Oct 2024 22:44:13 +0000 backend DEBUG TrueNAS GET request URI: core/ping
Tue, 22 Oct 2024 22:44:13 +0000 backend DEBUG TrueNAS response: "pong"
Tue, 22 Oct 2024 22:44:13 +0000 backend DEBUG API fetch caught 1 item
Tue, 22 Oct 2024 22:44:13 +0000 backend DEBUG HPE CSI Request <==============================>
Tue, 22 Oct 2024 22:44:13 +0000 backend DEBUG          uri: http://truenas-csp-svc:8080/containers/v1/volumes?name=pvc-b59d2d9e-29f4-47b3-9757-0299fca39228
Tue, 22 Oct 2024 22:44:13 +0000 backend DEBUG         body: None
Tue, 22 Oct 2024 22:44:13 +0000 backend DEBUG        query: name=pvc-b59d2d9e-29f4-47b3-9757-0299fca39228
Tue, 22 Oct 2024 22:44:13 +0000 backend DEBUG       method: GET
Tue, 22 Oct 2024 22:44:13 +0000 backend DEBUG content_type: application/json
Tue, 22 Oct 2024 22:44:13 +0000 backend DEBUG      headers: {"HOST": "truenas-csp-svc:8080", "USER-AGENT": "Go-http-client/1.1", "CONNECTION": "close", "ACCEPT": "
Tue, 22 Oct 2024 22:44:13 +0000 backend DEBUG API Key detected. Will use token authentication.
Tue, 22 Oct 2024 22:44:13 +0000 backend DEBUG TrueNAS GET request URI: pool/dataset
Tue, 22 Oct 2024 22:44:16 +0000 backend DEBUG TrueNAS response: <html>
<head><title>504 Gateway Time-out</title></head>
<body>
<center><h1>504 Gateway Time-out</h1></center>
<hr><center>nginx</center>
</body>
</html>
Tue, 22 Oct 2024 22:49:16 +0000 backend ERROR Backend Request (GET) Exception: Traceback (most recent call last):
  File "/app/backend.py", line 404, in get
    self.req_backend.raise_for_status()
  File "/app/lib/python3.12/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 504 Server Error: Gateway Time-out for url: https://hl15-1.home.thecrimsontint.com/api/v2.0/pool/dataset
@jstewart612
Author

StorageClass parameters:

parameters:
  allowOverrides: sparse,compression,deduplication,volblocksize,sync,description
  csi.storage.k8s.io/controller-expand-secret-name: hl15-1-iscsi
  csi.storage.k8s.io/controller-expand-secret-namespace: truenas-csp
  csi.storage.k8s.io/controller-publish-secret-name: hl15-1-iscsi
  csi.storage.k8s.io/controller-publish-secret-namespace: truenas-csp
  csi.storage.k8s.io/fstype: xfs
  csi.storage.k8s.io/node-publish-secret-name: hl15-1-iscsi
  csi.storage.k8s.io/node-publish-secret-namespace: truenas-csp
  csi.storage.k8s.io/node-stage-secret-name: hl15-1-iscsi
  csi.storage.k8s.io/node-stage-secret-namespace: truenas-csp
  csi.storage.k8s.io/provisioner-secret-name: hl15-1-iscsi
  csi.storage.k8s.io/provisioner-secret-namespace: truenas-csp
  root: data/truenas-csp

Screenshot of the dataset "data/truenas-csp":
image

@jstewart612
Author

Proof that truenas-csp pod can hit the API:

/app # hostname; wget -q -O- https://hl15-1.home.thecrimsontint.com/api/v2.0 | head -n5
truenas-csp-dbc4f9d86-xg9xn
{
 "openapi": "3.0.0",
 "info": {
  "title": "TrueNAS RESTful API",
  "version": "v2.0"
/app #

@jstewart612
Author

If I had to guess, your problem is here:

#66 (comment)

If I manually run that API call, it takes about 25s to return.
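
For anyone who wants to reproduce that measurement from inside the CSP pod, here is a minimal sketch. It assumes API key authentication via a Bearer token header; `fetch_datasets`, `timed`, the host name, and the key are hypothetical placeholders, not code from truenas-csp.

```python
import time
import urllib.request


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def fetch_datasets(base_url, api_key, timeout=120):
    """GET pool/dataset unfiltered; base_url and api_key are placeholders."""
    req = urllib.request.Request(
        f"{base_url}/api/v2.0/pool/dataset",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    # A generous timeout so a slow appliance is measured rather than cut off.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


# Example (hypothetical host and key):
# body, seconds = timed(fetch_datasets, "https://truenas.example", "API_KEY")
# print(f"{len(body)} bytes in {seconds:.1f}s")
```

Comparing the elapsed time against whatever timeout the proxy in front of the API enforces would show whether the 504 in the log is simply this call running too long.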

@datamattsson
Collaborator

If I manually run that API call, it takes about 25s to return.

Thank you for reporting this. How many datasets (including snapshots) do you have?

@jstewart612
Author

About 20

@datamattsson
Collaborator

I did some ad hoc curl'ing. As things stand, curl -s root:admin@truenas/api/v2.0/pool/dataset gets roughly one second slower for every 100 datasets. On my VM running on 10-year-old hardware, 600 datasets took 5 seconds to GET.

Using query-filters and dialing query-options down to retrieve just the bare minimum, the same GET takes 700ms. This is definitely the optimization I'm looking to do, but I think something else is broken in your case.
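
To illustrate the idea of narrowing the GET, here is a rough sketch of building a filtered pool/dataset URL. It assumes filters and options are passed as plain query parameters; the parameter names used in the example (`name`, `limit`) and the helper `dataset_query_url` are assumptions for illustration and should be checked against the API docs for your TrueNAS version. This is not the code planned for the next release.

```python
from urllib.parse import urlencode


def dataset_query_url(base_url, filters=None, options=None):
    """Build a pool/dataset URL with filters and options as query parameters.

    Both dicts are merged into one query string; the keys are whatever the
    TrueNAS API accepts (the ones in the example below are assumptions).
    """
    params = {}
    params.update(filters or {})
    params.update(options or {})
    url = f"{base_url}/api/v2.0/pool/dataset"
    query = urlencode(params)
    return f"{url}?{query}" if query else url


# Example: ask for a single dataset by name instead of listing everything.
# dataset_query_url(
#     "https://truenas.example",
#     filters={"name": "data/truenas-csp/pvc-1"},
#     options={"limit": 1},
# )
```

The point is only that the server does less work when the request names the one dataset it needs, rather than returning every dataset and filtering on the client.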

@jstewart612
Author

Same TrueNAS version?

@datamattsson
Collaborator

Dragonfish-24.04.2

@jstewart612
Author

Ah. I am ElectricEel-24.10-RC.2.

Also: the entire way you do business will break in 25.04:

image

@jstewart612
Author

Seems this will be its replacement: https://github.com/truenas/api_client

@jstewart612
Author

That client performs the same api call in 17s.

@jstewart612
Author

Regardless... your hypothesis is overall correct.

I have two TrueNAS SCALE appliances. On one, this RESTful API call works fine as is. On the other, it does not. Do you happen to know where on a TrueNAS SCALE box the RESTful API logs? It would be interesting to fully RCA this.

@jstewart612
Author

Runtime of RESTful call at localhost:

real    1m17.001s
user    0m0.004s
sys     0m0.033s

@jstewart612
Author

Found it.

Used democratic-csi before. It managed to create over 81000 snapshots of various things. Removed them all, and now this API call returns in under 7s, even over REST.
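
For others who hit the same symptom, a quick way to check for runaway snapshots is to count them per dataset. The sketch below assumes you have captured the output of `zfs list -H -t snapshot -o name` on the TrueNAS host; `snapshots_per_dataset` is a hypothetical helper, not part of any tool mentioned here.

```python
from collections import Counter


def snapshots_per_dataset(zfs_list_output):
    """Count snapshots per dataset from `zfs list -H -t snapshot -o name` output.

    Each input line is expected to look like `pool/dataset@snapshot`.
    """
    counts = Counter()
    for line in zfs_list_output.splitlines():
        name = line.strip()
        if "@" not in name:
            continue  # skip blank lines or anything that is not a snapshot
        dataset, _, _snapshot = name.partition("@")
        counts[dataset] += 1
    return counts


# Example with made-up names:
# counts = snapshots_per_dataset("data/a@s1\ndata/a@s2\ndata/b@s1\n")
# counts.most_common(10) lists the datasets holding the most snapshots.
```

Sorting by count makes it easy to see which datasets an old provisioner left littered with snapshots.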

@datamattsson
Collaborator

Also: the entire way you do business will break in 25.04:

Yes, I've been served this notice and will replace my home cooked REST interaction with the websocket library.

Used democratic-csi before. It managed to create over 81000 snapshots of various things. Removed them all, and now this API call returns in under 7s, even over REST.

Still seems very slow.

@jstewart612
Author

Indeed. That, however, is an issue for iXsystems. I'll file a ticket over there.

So it sounds like the path here is probably filters in a future release, then websocket come 25.04, or maybe websocket first?

@datamattsson
Collaborator

So it sounds like the path here is probably filters in a future release, then websocket come 25.04, or maybe websocket first?

The next release, due by the end of the year, will have query option filters. This will most likely be the last RESTful release before the switch to websocket, but that has not been scoped yet.

datamattsson added the duplicate and next release labels on Jan 11, 2025