
CP ends up ignoring that its jobs have been killed #577


Description

@daledude
  • what is happening and what you expect to see
    Consul had a half-hour period where it had trouble accepting service checks. ContainerPilot eventually stopped PUT-ing health-check updates to Consul for all jobs, although CP does continue to PUT health status updates for itself.

Also, CP seems to get into a state where it doesn't notice that any of the spawned jobs are gone. The /status endpoint shows the jobs as healthy even after I killed them manually.

Also, the rsyslog check that is in every config ends up logging the following, even though running the check manually succeeds:

check.rsyslog timeout after 5s: '[514]'

The "check-port" health check script is merely this:

#!/bin/bash
# Exit 0 if something is listening on the given TCP/UDP port, non-zero otherwise.
/bin/netstat -tunl | /bin/grep ":$1 " > /dev/null 2>&1
ret=$?
exit $ret
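
Running the same check by hand inside the container succeeds (the exact invocations below are just for illustration):

check-port 514; echo "exit code: $?"          # prints "exit code: 0" when rsyslog is listening
time /bin/netstat -tunl | /bin/grep ":514 "   # completes well within the 5s timeout
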
  • the output of containerpilot -version
    Version: 3.8.0
    GitHash: 408dbc9

  • the ContainerPilot configuration you're using
    The config doesn't matter; this happens in all my containers. Here is one anyway:

{
    consul: "{{.CONTAINER_HOST}}:8500",
    logging:
    {
        level: "INFO",
        format: "default",
        output: "stdout"
    },
    jobs: [
        {
            name: "rsyslog",
            exec: [ "rsyslogd-wrapper" ],
            restarts: "unlimited",
            health:
            {
                exec: "check-port 514", // Just simple: netstat ntlp | grep PORT
                interval: 2,
                ttl: 10,
                timeout: "5s",
            },
        },
        {{ if .DNSMASQ_SIDECAR }}
        {
            name: 'dnsmasq-{{.SERVICE_NAME_FULL}}',
            exec: [ "/usr/sbin/dnsmasq", "-k" ],
            restarts: "unlimited",
            port: "53",
            health:
            {
                exec: "check-port 53",
                interval: 2,
                ttl: 10,
                timeout: "5s",
            },
        },
        {{ end }}
        {
            name: "{{.SERVICE_NAME_FULL}}",
            when: {
              source: "watch.namingservice-{{.SERVICE_PLATFORM}}-{{.SERVICE_ENVIRONMENT}}",
              once: "healthy"
            },
            exec: [ 
                   "gosu", "admin",
                   "{{.BINDIR}}/{{.SERVICE_NAME}}", "-c", "{{.BASEDIR}}/cfg/{{.SERVICE_NAME}}.cfg", "-r", "short-recovery"
                  ],
            restarts: "unlimited",
            port: "{{.SERVICE_PORT}}", // Causes service to be registered with Consul.
            health:
            {
                exec: "check-port {{.SERVICE_PORT}}",
                interval: 1,
                ttl: 10,
                timeout: "5s",
            },
            tags: [
                "{{.SERVICE_NAME}}",
                "{{.CONTAINER_HOST}}",
                "{{.SERVICE_ENVIRONMENT}}",
                "{{.SERVICE_PLATFORM}}"
            ],
            interfaces: [
                "10.0.0.0/8"
            ],
            consul:
            {
                enableTagOverride: true,
                deregisterCriticalServiceAfter: "6h"
            }
        },
        {
            // This job watches for the event ContainerPilot fires when the
            //   "source" job in this config exits with a return code > 0.
            // It then sends an event through Consul to notify that this has occurred.
            // A script running on the monitoring server reads the event
            //   from Consul.
            name: "{{.SERVICE_NAME_FULL}}-exit-failed-watcher",
            when: {
                source: "{{.SERVICE_NAME_FULL}}", // Must match the job name of the exec to watch.
                each: "exitFailed"
            },
            exec: [
                "send-consul-event", "service-exit-failed", "container_host={{.CONTAINER_HOST}}|service={{.SERVICE_NAME_FULL}}|hostname={{.HOSTNAME}}"
            ]
        }
    ],
    watches: [
      {
        name: "namingservice-{{.SERVICE_PLATFORM}}-{{.SERVICE_ENVIRONMENT}}",
        interval: 3
      }
    ]
}
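
For context, send-consul-event is a small site-specific helper that fires a Consul user event for the monitoring server to pick up. A minimal sketch of the idea (not the actual script; the real one may use the HTTP API instead and has more error handling):

#!/bin/bash
# send-consul-event <event-name> <payload>
# Fire a Consul user event; a watcher on the monitoring server
# (e.g. `consul watch -type=event -name=service-exit-failed ...`) reads it.
name="$1"
payload="$2"
exec consul event -name="$name" "$payload"
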
  • the output of any logs you can share; if you can, it would be very helpful to turn on debug logging by adding logging: { level: "DEBUG" } to your ContainerPilot configuration.
    I have logging set to DEBUG, but there is nothing related to the issue in the output. It seems the logging output stopped as well?
