
CP ends up ignoring that its jobs have been killed #577


Description

@daledude
  • what is happening and what you expect to see
    Consul had a half-hour period where it had trouble accepting service checks. ContainerPilot eventually stopped PUT-ing health-check updates to Consul for all jobs, although CP does continue to PUT health status updates for itself.

Also, CP seems to get into a state where it doesn't notice that any of the spawned jobs are gone. The /status endpoint shows the jobs as healthy even after I killed them manually.

Also, the rsyslog check that is in every config ends up logging the following, even though running the check manually succeeds:

check.rsyslog timeout after 5s: '[514]'

The "check-port" health check script is merely this:

#!/bin/bash
# Exit 0 if something is listening on the given TCP/UDP port, non-zero otherwise.
/bin/netstat -tunl | /bin/grep ":$1 " > /dev/null 2>&1
ret=$?
exit $ret
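
Running the same check by hand inside the container succeeds (the exact invocations below are just for illustration):

check-port 514; echo "exit code: $?"          # prints "exit code: 0" when rsyslog is listening
time /bin/netstat -tunl | /bin/grep ":514 "   # completes well within the 5s timeout
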
  • the output of containerpilot -version
    Version: 3.8.0
    GitHash: 408dbc9

  • the ContainerPilot configuration you're using
    The config doesn't matter; this happens in all my containers. Here is one anyway:

{
    consul: "{{.CONTAINER_HOST}}:8500",
    logging:
    {
        level: "INFO",
        format: "default",
        output: "stdout"
    },
    jobs: [
        {
            name: "rsyslog",
            exec: [ "rsyslogd-wrapper" ],
            restarts: "unlimited",
            health:
            {
                exec: "check-port 514", // Just simple: netstat ntlp | grep PORT
                interval: 2,
                ttl: 10,
                timeout: "5s",
            },
        },
        {{ if .DNSMASQ_SIDECAR }}
        {
            name: 'dnsmasq-{{.SERVICE_NAME_FULL}}',
            exec: [ "/usr/sbin/dnsmasq", "-k" ],
            restarts: "unlimited",
            port: "53",
            health:
            {
                exec: "check-port 53",
                interval: 2,
                ttl: 10,
                timeout: "5s",
            },
        },
        {{ end }}
        {
            name: "{{.SERVICE_NAME_FULL}}",
            when: {
              source: "watch.namingservice-{{.SERVICE_PLATFORM}}-{{.SERVICE_ENVIRONMENT}}",
              once: "healthy"
            },
            exec: [ 
                   "gosu", "admin",
                   "{{.BINDIR}}/{{.SERVICE_NAME}}", "-c", "{{.BASEDIR}}/cfg/{{.SERVICE_NAME}}.cfg", "-r", "short-recovery"
                  ],
            restarts: "unlimited",
            port: "{{.SERVICE_PORT}}", // Causes service to be registered with Consul.
            health:
            {
                exec: "check-port {{.SERVICE_PORT}}",
                interval: 1,
                ttl: 10,
                timeout: "5s",
            },
            tags: [
                "{{.SERVICE_NAME}}",
                "{{.CONTAINER_HOST}}",
                "{{.SERVICE_ENVIRONMENT}}",
                "{{.SERVICE_PLATFORM}}"
            ],
            interfaces: [
                "10.0.0.0/8"
            ],
            consul:
            {
                enableTagOverride: true,
                deregisterCriticalServiceAfter: "6h"
            }
        },
        {
            // This job watches for the event ContainerPilot fires when the
            //   "source" job in this config exits with a return code > 0.
            // It then sends an event through Consul to notify that this has occurred.
            // A script running on the monitoring server reads the event
            //   from Consul.
            name: "{{.SERVICE_NAME_FULL}}-exit-failed-watcher",
            when: {
                source: "{{.SERVICE_NAME_FULL}}", // Must match the job name of the exec to watch.
                each: "exitFailed"
            },
            exec: [
                "send-consul-event", "service-exit-failed", "container_host={{.CONTAINER_HOST}}|service={{.SERVICE_NAME_FULL}}|hostname={{.HOSTNAME}}"
            ]
        }
    ],
    watches: [
      {
        name: "namingservice-{{.SERVICE_PLATFORM}}-{{.SERVICE_ENVIRONMENT}}",
        interval: 3
      }
    ]
}
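
For context, send-consul-event is a small site-specific helper that fires a Consul user event for the monitoring server to pick up. A minimal sketch of the idea (not the actual script; the real one may use the HTTP API instead and has more error handling):

#!/bin/bash
# send-consul-event <event-name> <payload>
# Fire a Consul user event; a watcher on the monitoring server
# (e.g. `consul watch -type=event -name=service-exit-failed ...`) reads it.
name="$1"
payload="$2"
exec consul event -name="$name" "$payload"
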
  • the output of any logs you can share; if you can, it would be very helpful to turn on debug logging by adding logging: { level: "DEBUG" } to your ContainerPilot configuration.
    I have logging set to DEBUG, but there is nothing related to the issue in the output. It seems the logging output stopped as well?
