Logtrain is a system for dynamically forwarding and transforming logs, similar to fluentd but more specialized, built to solve two issues:
- Keep overhead very low (e.g., less than 64Mi of memory).
- Dynamically route logs based on various data sources.
es+https://user:password@host?[auth=apikey|bearer|basic]&[index=...]&[insecure=true]
es+http://user:password@host?[auth=apikey|bearer|basic]&[index=...]&[insecure=true]
The bearer token is taken from the password portion of the URL. For API keys, the API ID should be used as the username and the API key as the password. Setting insecure=true ignores certificate failures.
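As a sketch, a drain using API key auth against a hypothetical Elasticsearch host could look like this (the host, ID, key, and index names below are placeholders, not real values):

```
es+https://myApiId:myApiKey@es.example.com?auth=apikey&index=app-logs
es+https://ignored:myBearerToken@es.example.com?auth=bearer
```

In the bearer example the username portion is unused, since the token comes from the password portion of the URL.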
http://host/path
https://host/path?[insecure=true]
Setting insecure=true ignores certificate failures.
syslog+tls://host:port?[ca=]
syslog+http://host:port
syslog+https://host:port
syslog+tcp://host:port
syslog+udp://host:port
(alias: syslog://host:port)
persistent://key
The persistent:// drain will fail if persistent storage is not configured (see below). The key may be any value up to 128 characters. The persistent key may only be used on finite resources such as pods; it cannot be set on deployments, statefulsets, etc.
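As a sketch, a persistent drain could be attached to a single pod via the drains annotation described below (the pod name my-pod and key my-key are placeholders):

```shell
# Hypothetical example: attach a persistent drain to one pod.
# "my-pod" and "my-key" are placeholder values.
kubectl annotate pod my-pod logtrain.akkeris.io/drains=persistent://my-key
```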
TODO
Below is the recommended way of deploying logtrain. Note that this must run as a privileged container so that it can read files from /var/log/containers. The daemonset also contains an initContainer that will change the sysctl fs.inotify.max_user_instances to 2048 on your nodes (usually from the default of 128).
kubectl apply -f ./deployments/kubernetes/logtrain-serviceaccount.yaml
kubectl apply -f ./deployments/kubernetes/logtrain-service.yaml
kubectl apply -f ./deployments/kubernetes/logtrain-daemonset.yaml
Once deployed, you can use the following annotations on deployments, daemonsets, or statefulsets to forward logs.
logtrain.akkeris.io/drains
This annotation is a comma-delimited list of drains (see Drain Types above).
logtrain.akkeris.io/hostname
Explicitly sets the hostname used when reading logs from kubernetes. If not set, this defaults to name.namespace.
logtrain.akkeris.io/tag
Explicitly sets the tag used when reading logs from kubernetes. If not set, this defaults to the pod name.
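Putting the annotations together, a hypothetical deployment could be pointed at two drains like this (the deployment name, hosts, and tag below are all placeholder values):

```shell
# Hypothetical example: forward a deployment's logs to a TLS syslog
# drain and an https drain, with an explicit hostname and tag.
# "my-app" and the hosts are placeholders.
kubectl annotate deployment my-app \
  "logtrain.akkeris.io/drains=syslog+tls://logs.example.com:514,https://collector.example.com/ingest" \
  "logtrain.akkeris.io/hostname=my-app.example.com" \
  "logtrain.akkeris.io/tag=web"
```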
TODO
HTTP_PORT
- The port to use for the http server, shared by any http (payload) and http (syslog) inputs.
Whether to watch a postgres database for information on where to forward logs.
POSTGRES
- set to true
DATABASE_URL
- The database url to use to listen for drain changes.
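A minimal sketch of enabling the postgres datasource (the connection URL below is a placeholder):

```shell
# Sketch: watch a postgres database for drain changes.
# The connection URL is a placeholder, not a real database.
export POSTGRES=true
export DATABASE_URL=postgres://user:pass@db.example.com:5432/logtrain
```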
Whether to watch kubernetes deployments, statefulsets, and daemonsets for annotations indicating where logs should be forwarded.
KUBERNETES_DATASOURCE
- set to true
Persistent log storage can be done via a postgres database. Set PERSISTENT_DATABASE_URL
to specify the database to store logs in.
Stored logs can be retrieved directly through the database in logs.data, keyed by the logs.id column. In addition, persisted logs can be retrieved via the /logs/:key endpoint.
PERSISTENT
- set to true
PERSISTENT_DATABASE_URL
- A postgres database to store logs in, in the format postgres://user:pass@host:5432/dbname.
PERSISTENT_PATH
- The path on the http endpoint to respond to log requests; defaults to /logs/
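A sketch of enabling persistent storage and then fetching persisted logs over the http server (the database URL, host, port, and key are all placeholders):

```shell
# Sketch: enable persistent log storage (URL is a placeholder) ...
export PERSISTENT=true
export PERSISTENT_DATABASE_URL=postgres://user:pass@db.example.com:5432/logs
# ... then retrieve persisted logs for a key over the http server.
# Host and port are placeholders; the path prefix defaults to /logs/.
curl http://logtrain.example.com:9000/logs/my-key
```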
Whether to watch the KUBERNETES_LOG_PATH
directory for pod logs and forward them.
KUBERNETES
- set to true
KUBERNETES_LOG_PATH
- optional, the path on each node to look for logs. Defaults to /var/log/containers
EXCLUDE_NAMESPACES
- optional, a comma separated list of namespaces to ignore
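A sketch of enabling the kubernetes log input with a couple of namespaces excluded (the namespace names are examples):

```shell
# Sketch: tail pod logs from each node, ignoring two namespaces.
export KUBERNETES=true
export KUBERNETES_LOG_PATH=/var/log/containers  # optional, this is the default
export EXCLUDE_NAMESPACES=kube-system,istio-system
```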
Whether to open a gRPC access log stream end point for istio/envoy to stream http log traffic to.
ENVOY
- set to true
ENVOY_PORT
- The port number to listen on for gRPC access log streams (default is 9001)
HTTP_EVENTS
- set to true
HTTP_EVENTS_PATH
- optional, the path on the http server to receive http event payloads; defaults to /events
Note, the port is inherited from HTTP_PORT. The endpoint only allows one event per request body, which must be in the format defined by pkg/output/packet/packet.go.
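A minimal sketch of posting a single event (the host and port are placeholders, and the body shown is only a stand-in — the actual payload must match the packet format referenced above):

```shell
# Hypothetical: POST one event per request to the events endpoint.
# The body is a placeholder; see pkg/output/packet/packet.go for the
# real packet format. Host and port are placeholders too.
curl -X POST http://logtrain.example.com:9000/events \
  -H "Content-Type: application/json" \
  -d '{...}'
```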
HTTP_SYSLOG
- set to true
HTTP_SYSLOG_PATH
- optional, the path on the http server to receive syslog streams over http; defaults to /syslog. Note, the port is inherited from HTTP_PORT.
SYSLOG_TCP
- set to true
SYSLOG_TCP_PORT
- optional, defaults to 9002
SYSLOG_UDP
- set to true
SYSLOG_UDP_PORT
- optional, defaults to 9003
SYSLOG_TLS
- set to true
SYSLOG_TLS_CERT_PEM
- The PEM encoded certificate
SYSLOG_TLS_CA_PEM
- The PEM encoded certificate authority (optional)
SYSLOG_TLS_KEY_PEM
- The PEM encoded certificate key
SYSLOG_TLS_SERVER_NAME
- The server name the TLS server should use for SNI
SYSLOG_TLS_PORT
- optional, defaults to 9004
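As a quick sanity check, a test message could be sent to the TCP syslog input with util-linux logger (the host below is a placeholder; 9002 is the default SYSLOG_TCP_PORT):

```shell
# Sketch: send one test message over TCP to the syslog input.
# logtrain.example.com is a placeholder host.
export SYSLOG_TCP=true
logger -n logtrain.example.com -P 9002 -T "hello from logtrain test"
```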
AKKERIS=true
- for Akkeris formatting of output
ONLY_AKKERIS=true
- optional, ignore any other Kubernetes pods
Logtrain has been tested to stay below 64MB of memory (59MB average) and under 100m (5%) of CPU with 52 pods on a node and 1500+ deployments being watched. With more pods per node or more deployments than this benchmark, expect higher memory usage and adjust any limits/requests accordingly. While targeting a 64MB ceiling, logtrain should be given a memory limit of 128MB.
If you receive a too many files open error message on startup, you'll need to increase fs.inotify.max_user_instances and user.max_inotify_instances. These are generally set to 128 by default; depending on how many pods are running, this may be insufficient.
sysctl -w fs.inotify.max_user_instances=2048
sysctl -w user.max_inotify_instances=2048
go build -o logtrain github.com/akkeris/logtrain/cmd/logtrain
go build -o logtail github.com/akkeris/logtrain/cmd/logtail
go test -v ./...
(Note: if you're using GoConvey, it's best to set the parallel packages to 1 via -packages 1)
go test -coverprofile cover.out -v ./... && go tool cover -html=cover.out && rm cover.out