Description
- what is happening and what you expect to see
Consul had a half-hour issue during which it was not accepting service checks. ContainerPilot eventually stopped PUT-ing health check updates to Consul for all jobs, although it does continue to PUT health status updates for itself.
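As a point of reference, the TTL check state can be inspected on the Consul agent directly; a rough example, assuming the agent address used in the config below:

# Rough example only: lists every check registered on the local Consul agent,
# including its Status ("passing"/"critical") and Output fields, so a stalled
# TTL update eventually shows up as the check flipping to critical.
curl -s "http://${CONTAINER_HOST}:8500/v1/agent/checks"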
Also, ContainerPilot seems to get into a state where it doesn't notice that any of the spawned jobs are gone: the /status endpoint shows jobs as healthy even after I killed them manually.
Also, the rsyslog check that is in every config ends up reporting the following, even though running the check manually succeeds:
check.rsyslog timeout after 5s: '[514]'
The "check-port" health check script is merely this:
#!/bin/bash
# Succeeds (exit 0) if anything is listening on TCP or UDP port $1.
/bin/netstat -tunl | /bin/grep ":$1 " > /dev/null 2>&1
ret=$?
exit $ret
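For context, invoking the check by hand looks roughly like this (assuming the script is on PATH inside the container, as it is when ContainerPilot runs it):

# Hypothetical manual invocation; exit code 0 means something is listening.
check-port 514
echo $?   # prints 0 while rsyslog is bound to port 514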
- the output of containerpilot -version
Version: 3.8.0
GitHash: 408dbc9
- the ContainerPilot configuration you're using
The config doesn't matter; this happens in all my containers. Here is one anyway:
{
  consul: "{{.CONTAINER_HOST}}:8500",
  logging: {
    level: "INFO",
    format: "default",
    output: "stdout"
  },
  jobs: [
    {
      name: "rsyslog",
      exec: [ "rsyslogd-wrapper" ],
      restarts: "unlimited",
      health: {
        exec: "check-port 514", // Just simple: netstat -tunl | grep PORT
        interval: 2,
        ttl: 10,
        timeout: "5s",
      },
    },
    {{ if .DNSMASQ_SIDECAR }}
    {
      name: 'dnsmasq-{{.SERVICE_NAME_FULL}}',
      exec: [ "/usr/sbin/dnsmasq", "-k" ],
      restarts: "unlimited",
      port: "53",
      health: {
        exec: "check-port 53",
        interval: 2,
        ttl: 10,
        timeout: "5s",
      },
    },
    {{ end }}
    {
      name: "{{.SERVICE_NAME_FULL}}",
      when: {
        source: "watch.namingservice-{{.SERVICE_PLATFORM}}-{{.SERVICE_ENVIRONMENT}}",
        once: "healthy"
      },
      exec: [
        "gosu", "admin",
        "{{.BINDIR}}/{{.SERVICE_NAME}}", "-c", "{{.BASEDIR}}/cfg/{{.SERVICE_NAME}}.cfg", "-r", "short-recovery"
      ],
      restarts: "unlimited",
      port: "{{.SERVICE_PORT}}", // Causes the service to be registered with Consul.
      health: {
        exec: "check-port {{.SERVICE_PORT}}",
        interval: 1,
        ttl: 10,
        timeout: "5s",
      },
      tags: [
        "{{.SERVICE_NAME}}",
        "{{.CONTAINER_HOST}}",
        "{{.SERVICE_ENVIRONMENT}}",
        "{{.SERVICE_PLATFORM}}"
      ],
      interfaces: [
        "10.0.0.0/8"
      ],
      consul: {
        enableTagOverride: true,
        deregisterCriticalServiceAfter: "6h"
      }
    },
    {
      // This job watches for the event that ContainerPilot fires when the
      // "source" job in this config exits with a return code > 0. It then
      // sends an event through Consul to signal that this has occurred.
      // A script running on the monitoring server reads the event from
      // Consul. (A sketch of such a script follows after the config.)
      name: "{{.SERVICE_NAME_FULL}}-exit-failed-watcher",
      when: {
        source: "{{.SERVICE_NAME_FULL}}", // Must match the job name of the exec to watch.
        each: "exitFailed"
      },
      exec: [
        "send-consul-event", "service-exit-failed", "container_host={{.CONTAINER_HOST}}|service={{.SERVICE_NAME_FULL}}|hostname={{.HOSTNAME}}"
      ]
    }
  ],
  watches: [
    {
      name: "namingservice-{{.SERVICE_PLATFORM}}-{{.SERVICE_ENVIRONMENT}}",
      interval: 3
    }
  ]
}
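For completeness: send-consul-event referenced in the exit-failed watcher above is our own helper and isn't shown here. A minimal sketch of what such a script could do with Consul's user-event API (the event name and payload format are just illustrative) would be:

#!/bin/bash
# Minimal sketch only -- not the real send-consul-event script.
# Fires a Consul user event named $1 with the opaque payload $2, which a
# watcher on the monitoring server can then read back from Consul.
# Usage: send-consul-event <event-name> <payload>
event_name="$1"
payload="$2"
curl -s -X PUT \
  --data "$payload" \
  "http://${CONTAINER_HOST}:8500/v1/event/fire/${event_name}"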
- the output of any logs you can share; if you can it would be very helpful to turn on debug logging by adding
logging: { level: "DEBUG"}
to your ContainerPilot configuration.
I have logging set to DEBUG, but there is nothing in the logs related to this issue. It seems the logging output stopped as well?