
StatefulSet does not apply resource limits #223

Open
tmirks opened this issue Jul 3, 2018 · 11 comments


@tmirks

tmirks commented Jul 3, 2018

Description

Elasticsearch cluster StatefulSets are created without resource limits set. The code that sets the limit values appears to be commented out, for whatever reason.

This is a problem if you've defined a LimitRange resource in your cluster to provide a default set of limits when otherwise unspecified.

Example

Our LimitRange:

...
  kind: LimitRange
  spec:
    limits:
    - default:
        cpu: "1"
        memory: 512Mi
...

The StatefulSet is created with the following resources (notice no limits):

...
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
...

The result is an error on the StatefulSet, because the LimitRange applies its default limit (512Mi) and the request (1Gi) is higher than that limit:

Invalid value: "1Gi": must be less than or equal to memory limit

Type     Reason        Age                 From                    Message
----     ------        ----                ----                    -------
Warning  FailedCreate  16s (x15 over 55s)  statefulset-controller  create Pod xxxxx-0 in StatefulSet xxxxx failed error: Pod "xxxxx-0" is invalid: spec.containers[0].resources.requests: Invalid value: "1Gi": must be less than or equal to memory limit
@stevesloka
Member

We did have limits at one time, but ran into issues. There was a PR (#69) which addressed this, but then some other issues came up. We should check upstream and see if any work has already been done that we could incorporate.

@stevesloka
Member

There was this PR as well with some good comments: #120

@tmirks
Author

tmirks commented Jul 31, 2018

@stevesloka what's the recommended course of action for resource limits then? Remove cluster defaults?

@stevesloka
Member

@tmirks the tricky part with limits (at least with memory) is that when the pod exceeds those limits, the pod is restarted. Initially I had them set, but then took them out since during testing I found my data nodes restarting all the time. So I commented out that bit in the code.

I think the proper way might be to do some tuning of the limits in combination with Java options that tell the JVM how much memory it can use.
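
As a rough sketch of that idea (not the operator's current behavior - ES_JAVA_OPTS is Elasticsearch's standard variable for JVM flags, and the sizes here are placeholders), the heap could be pinned below a matching memory limit:

    env:
    - name: ES_JAVA_OPTS
      value: "-Xms1g -Xmx1g"   # pin min/max heap so the JVM never grows past it
    resources:
      requests:
        memory: 2Gi
      limits:
        memory: 2Gi            # limit = request; heap at ~50% leaves headroom for off-heap memory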

@nabadger

nabadger commented Sep 10, 2018

Hi,

I do think this should be an option - our clusters have resource limits enforced, so we have to set them. We would like to be able to specify the exact limits/requests for CPU/RAM (this seems to be the norm for many other operators I've used).

If pods are getting killed, it's more the cluster admin's role to ensure the cluster is sufficiently resourced, configured, and monitored, right?

@stevesloka
Member

Clusters which enforce limits are a valid reason to add them back. I'm not saying we shouldn't have them; it's just that in my experience with using the operator, managing the JVM with limits can be tricky. I'll add a way to make them optional, so you could still define them and others could leave them out.

@nabadger

Awesome, many thanks 👍

@jimmyjones2

Setting requests but not limits results in a Burstable QoS level. Given that Elasticsearch allocates pretty much all its memory on startup and it stays fixed, wouldn't it be better to set the limit and request to the same value (ideally 2x heap), giving Guaranteed QoS so the pod is very unlikely to be killed (Guaranteed < Burstable < BestEffort)?

The only situation where I can see Burstable being useful is setting the request to at least the Java heap size, allowing any unused memory on the node to be used for the page cache - however, I'm assuming that when node memory pressure occurs the page cache would be reclaimed, bringing the container down to its request, rather than the container being killed.
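
For concreteness, a Guaranteed QoS resources block along those lines would set requests equal to limits (values illustrative, assuming a 1g heap so memory is ~2x heap):

    resources:
      requests:
        cpu: "1"
        memory: 2Gi
      limits:
        cpu: "1"
        memory: 2Gi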

@jimmyjones2

I don't think my assumption about Burstable was correct - see kubernetes/kubernetes#43916:

I'd like to share some observations, though I can't say I have a good solution to offer yet, other than to set a memory limit equal to the memory request for any pod that makes use of the file cache.

@kaarolch
Contributor

kaarolch commented Oct 16, 2018

Anyway, in most environments teams have default requests and limits set, in order to constrain any deployment that doesn't specify its own. In this type of environment you have to recompile the operator binary and build your own container. Maybe a good solution would be to allow overriding the default limits from the CRD?
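
A sketch of what such an override might look like in the cluster manifest - the resources block and its placement here are hypothetical, not the operator's actual CRD schema:

    kind: ElasticsearchCluster
    spec:
      ...
      resources:        # hypothetical field overriding the compiled-in defaults
        requests:
          cpu: 500m
          memory: 1Gi
        limits:
          cpu: "1"
          memory: 2Gi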

@alwinmarkcf

Clusters which enforce limits are a valid reason to add them back. I'm not saying we shouldn't have them; it's just that in my experience with using the operator, managing the JVM with limits can be tricky. I'll add a way to make them optional, so you could still define them and others could leave them out.

So is there something in there already? Do we need an additional value? I've hit the same problem.
