From 0572e5740901fbd1a260f92f29cad7c81fec5e82 Mon Sep 17 00:00:00 2001 From: Hernan Cianfagna <110453267+hlcianfagna@users.noreply.github.com> Date: Mon, 15 Sep 2025 21:37:45 +0200 Subject: [PATCH 1/7] rsyslog: Index page and starter tutorial --- docs/integrate/index.md | 1 + docs/integrate/rsyslog/index.md | 45 ++++++++++ docs/integrate/rsyslog/tutorial.md | 133 +++++++++++++++++++++++++++++ 3 files changed, 179 insertions(+) create mode 100644 docs/integrate/rsyslog/index.md create mode 100644 docs/integrate/rsyslog/tutorial.md diff --git a/docs/integrate/index.md b/docs/integrate/index.md index 7c11fc4c..b649aab4 100644 --- a/docs/integrate/index.md +++ b/docs/integrate/index.md @@ -66,6 +66,7 @@ queryzen/index r/index rill/index risingwave/index +rsyslog/index scikit-learn/index sql-server/index streamlit/index diff --git a/docs/integrate/rsyslog/index.md b/docs/integrate/rsyslog/index.md new file mode 100644 index 00000000..1de92f8d --- /dev/null +++ b/docs/integrate/rsyslog/index.md @@ -0,0 +1,45 @@ +(rsyslog)= +# rsyslog + +```{div} .float-right +[![rsyslog logo](https://www.rsyslog.com/files/2019/01/logo_neu_cropped.png){height=60px loading=lazy}][rsyslog] +``` +```{div} .clearfix +``` + + +:::{rubric} About +::: + +[Rsyslog] is a rocket-fast system for log processing. + +It offers high-performance, advanced security features, and a modular design. +Originally a regular syslogd, rsyslog has evolved into a highly versatile +logging solution capable of ingesting data from numerous sources, +transforming it, and outputting it to a wide variety of destinations. + +Rsyslog can deliver over one million messages per second to local +destinations under minimal processing load. Even with complex routing +and remote forwarding, performance remains excellent. + +:::{rubric} Learn +::: + +::::{grid} 2 + +:::{grid-item-card} Tutorial: Store server logs in CrateDB using rsyslog +:link: rsyslog-tutorial +:link-type: ref +Storing server logs in CrateDB delivers fast search and aggregations on them. +::: + +:::: + +:::{toctree} +:maxdepth: 1 +:hidden: +Tutorial +::: + + +[rsyslog]: https://www.rsyslog.com/ diff --git a/docs/integrate/rsyslog/tutorial.md b/docs/integrate/rsyslog/tutorial.md new file mode 100644 index 00000000..b0c5192b --- /dev/null +++ b/docs/integrate/rsyslog/tutorial.md @@ -0,0 +1,133 @@ +(rsyslog-tutorial)= +# Storing server logs on CrateDB for fast search and aggregations + +## Introduction + +Did you know that CrateDB can be a great store for your server logs? + +If you have been using log aggregation tools or even some of the most advanced commercial SIEM systems, you have probably experienced the same frustrations I have: + +* timeouts when searching logs over long periods of time +* a complex and proprietary query syntax +* difficulties integrating queries on logs data into application monitoring dashboards + +Storing server logs on CrateDB solves these problems, it allows to query the logs with standard SQL and from any tool supporting the PostgreSQL protocol; its unique indexing also makes full-text queries and aggregations super fast. +Let me show you an example. + +## Setup + +### CrateDB + +First, we will need an instance of CrateDB, it may be best to have a dedicated cluster for this purpose, to separate the monitoring system from the systems being monitored, but for the purpose of this demo we can just have a single node cluster on a docker container: + +```bash +sudo docker run -d --name cratedb --publish 4200:4200 --publish 5432:5432 --env CRATE_HEAP_SIZE=1g crate -Cdiscovery.type=single-node +``` + +Next, we need a table to store the logs, let's connect to `http://localhost:4200/#!/console` and run: + +```sql +CREATE TABLE doc.systemevents ( + message TEXT + ,INDEX message_ft USING FULLTEXT(message) + ,facility INTEGER + ,fromhost TEXT + ,priority INTEGER + ,DeviceReportedTime TIMESTAMP + ,ReceivedAt TIMESTAMP + ,InfoUnitID INTEGER + ,SysLogTag TEXT + ); +``` +Tip: if you are on a headless system you can also run queries with {ref}`command-line tools `. + +Then we need an account for the logging system: + +```sql +CREATE USER rsyslog WITH (PASSWORD='pwd123'); +``` + +and we need to grant permissions on the table above: + +```sql +GRANT DML ON TABLE doc.systemevents TO rsyslog; +``` + +### rsyslog + +We will use [rsyslog](https://github.com/rsyslog/rsyslog) to send the logs to CrateDB, for this setup we need `rsyslog` v8.2202 or higher and the `ompgsql` module: + +```bash +sudo add-apt-repository ppa:adiscon/v8-stable +sudo apt-get update +sudo apt-get install rsyslog +sudo debconf-set-selections <<< 'rsyslog-pgsql rsyslog-pgsql/dbconfig-install string false' +sudo apt-get install rsyslog-pgsql +``` + +Let's now configure it to use the account we created earlier: + +```bash +echo 'module(load="ompgsql")' | sudo tee /etc/rsyslog.d/pgsql.conf +echo '*.* action(type="ompgsql" conninfo="postgresql://rsyslog:pwd123@localhost/doc")' | sudo tee -a /etc/rsyslog.d/pgsql.conf +sudo systemctl restart rsyslog +``` + +If you are interested in more advanced setups involving queuing for additional reliability in production scenarios, you can read more about available settings in the [rsyslog documentation](https://www.rsyslog.com/doc/v8-stable/tutorials/high_database_rate.html). + +### MediaWiki + +Now let's imagine that we want to run a container with [MediaWiki](https://www.mediawiki.org/wiki/MediaWiki) to host an intranet and we want all logs to go to CrateDB, we can just deploy this with: + +```bash +sudo docker run --name mediawiki -p 80:80 -d --log-driver syslog --log-opt syslog-address=unixgram:///dev/log mediawiki +``` + +If we now point a web browser to port 80 at `http://localhost/`, you will see a new MediaWiki page. +Let's play around a bit to generate log entries, just click on "set up the wiki" and then once on Continue. +This will have generated entries in the `doc.systemevents` table with `syslogtag` matching the container id of the container running the site. + + +## Explore + +We can now use the {ref}`crate-reference:predicates_match` to find the error messages we are interested in: + +```sql +SELECT devicereportedtime,message +FROM doc.systemevents +WHERE MATCH(message_ft, 'Could not reliably determine') USING PHRASE +ORDER BY 1 DESC; +``` + +```text ++--------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| devicereportedtime | message | ++--------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| 1691510710000 | AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 172.17.0.3. Set the 'ServerName' directive globally to suppress this message | +| 1691510710000 | AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 172.17.0.3. Set the 'ServerName' directive globally to suppress this message | ++--------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +``` + +Let's now see which log sources created the most entries: + +```sql +SELECT syslogtag,count(*) +FROM doc.systemevents +GROUP BY 1 +ORDER BY 2 DESC +LIMIT 5; +``` + +```text ++----------------------+----------+ +| syslogtag | count(*) | ++----------------------+----------+ +| kernel: | 23 | +| 083053ae8ea3[52134]: | 20 | +| systemd[1]: | 15 | +| sudo: | 10 | +| rsyslogd: | 5 | ++----------------------+----------+ +``` + +I hope you found this interesting. Please do not hesitate to let us know your thoughts in the [CrateDB Community](https://community.cratedb.com/). From 720ddc21d764e537155e69b0471faefa48f8351b Mon Sep 17 00:00:00 2001 From: Andreas Motl Date: Mon, 15 Sep 2025 22:39:00 +0200 Subject: [PATCH 2/7] rsyslog: Implement suggestions by CodeRabbit --- docs/integrate/rsyslog/tutorial.md | 22 ++++++++++++---------- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/docs/integrate/rsyslog/tutorial.md b/docs/integrate/rsyslog/tutorial.md index b0c5192b..630680ff 100644 --- a/docs/integrate/rsyslog/tutorial.md +++ b/docs/integrate/rsyslog/tutorial.md @@ -28,22 +28,23 @@ Next, we need a table to store the logs, let's connect to `http://localhost:4200 ```sql CREATE TABLE doc.systemevents ( - message TEXT - ,INDEX message_ft USING FULLTEXT(message) - ,facility INTEGER - ,fromhost TEXT - ,priority INTEGER - ,DeviceReportedTime TIMESTAMP - ,ReceivedAt TIMESTAMP - ,InfoUnitID INTEGER - ,SysLogTag TEXT - ); + message TEXT, + INDEX message_ft USING FULLTEXT(message) WITH (analyzer = 'english'), + facility INTEGER, + fromhost TEXT, + priority INTEGER, + DeviceReportedTime TIMESTAMP, + ReceivedAt TIMESTAMP, + InfoUnitID INTEGER, + SysLogTag TEXT +); ``` Tip: if you are on a headless system you can also run queries with {ref}`command-line tools `. Then we need an account for the logging system: ```sql +-- Use a strong secret; e.g. from a secret manager or env var. CREATE USER rsyslog WITH (PASSWORD='pwd123'); ``` @@ -70,6 +71,7 @@ Let's now configure it to use the account we created earlier: ```bash echo 'module(load="ompgsql")' | sudo tee /etc/rsyslog.d/pgsql.conf echo '*.* action(type="ompgsql" conninfo="postgresql://rsyslog:pwd123@localhost/doc")' | sudo tee -a /etc/rsyslog.d/pgsql.conf +sudo chmod 640 /etc/rsyslog.d/pgsql.conf sudo systemctl restart rsyslog ``` From 3bf4588ba3c735743a6d4d9b65685a6b71559b2d Mon Sep 17 00:00:00 2001 From: Andreas Motl Date: Mon, 15 Sep 2025 22:43:52 +0200 Subject: [PATCH 3/7] rsyslog: Cross-link into "Metrics and telemetry data" section --- docs/ingest/telemetry/index.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/docs/ingest/telemetry/index.md b/docs/ingest/telemetry/index.md index eb9e1aff..0d835ad0 100644 --- a/docs/ingest/telemetry/index.md +++ b/docs/ingest/telemetry/index.md @@ -59,6 +59,12 @@ Prometheus is an open-source systems monitoring and alerting toolkit for collecting metrics data from applications and infrastructures. :::: +::::{grid-item-card} Rsyslog +:link: rsyslog +:link-type: ref +Rsyslog is a rocket-fast system for log processing. +:::: + ::::{grid-item-card} Telegraf :link: telegraf :link-type: ref From 2208b53ce0b01b3a17be3ebd79dc268e116249ab Mon Sep 17 00:00:00 2001 From: Andreas Motl Date: Mon, 15 Sep 2025 22:44:16 +0200 Subject: [PATCH 4/7] Naming things: Use "Metrics, telemetry, and logging data" ... after rsyslog joined the gang. --- docs/ingest/telemetry/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/ingest/telemetry/index.md b/docs/ingest/telemetry/index.md index 0d835ad0..5edc61ee 100644 --- a/docs/ingest/telemetry/index.md +++ b/docs/ingest/telemetry/index.md @@ -2,7 +2,7 @@ (metrics-store)= (telemetry)= (integrate-metrics)= -# Metrics and telemetry data +# Metrics, telemetry, and logging data :::::{grid} :padding: 0 From e79d1be67d59c9ac71b6b68566244edf5539c71d Mon Sep 17 00:00:00 2001 From: Andreas Motl Date: Tue, 16 Sep 2025 18:25:16 +0200 Subject: [PATCH 5/7] rsyslog: Implement suggestions by CodeRabbit --- docs/ingest/telemetry/index.md | 6 ++-- docs/integrate/rsyslog/tutorial.md | 51 ++++++++++++++++-------------- 2 files changed, 31 insertions(+), 26 deletions(-) diff --git a/docs/ingest/telemetry/index.md b/docs/ingest/telemetry/index.md index 5edc61ee..18682b34 100644 --- a/docs/ingest/telemetry/index.md +++ b/docs/ingest/telemetry/index.md @@ -2,7 +2,7 @@ (metrics-store)= (telemetry)= (integrate-metrics)= -# Metrics, telemetry, and logging data +# Metrics, telemetry, and logs :::::{grid} :padding: 0 @@ -59,10 +59,10 @@ Prometheus is an open-source systems monitoring and alerting toolkit for collecting metrics data from applications and infrastructures. :::: -::::{grid-item-card} Rsyslog +::::{grid-item-card} rsyslog :link: rsyslog :link-type: ref -Rsyslog is a rocket-fast system for log processing. +Send logs with rsyslog, a rocket‑fast system for log processing. :::: ::::{grid-item-card} Telegraf diff --git a/docs/integrate/rsyslog/tutorial.md b/docs/integrate/rsyslog/tutorial.md index 630680ff..6001b943 100644 --- a/docs/integrate/rsyslog/tutorial.md +++ b/docs/integrate/rsyslog/tutorial.md @@ -1,30 +1,35 @@ (rsyslog-tutorial)= -# Storing server logs on CrateDB for fast search and aggregations +# Store server logs on CrateDB for fast search and aggregations ## Introduction -Did you know that CrateDB can be a great store for your server logs? +CrateDB stores server logs efficiently and makes them easy to query. -If you have been using log aggregation tools or even some of the most advanced commercial SIEM systems, you have probably experienced the same frustrations I have: +Common pain points with traditional log stacks and SIEMs include: -* timeouts when searching logs over long periods of time -* a complex and proprietary query syntax -* difficulties integrating queries on logs data into application monitoring dashboards +* timeouts when searching across long time ranges +* proprietary, complex query syntaxes +* awkward integrations with application monitoring dashboards -Storing server logs on CrateDB solves these problems, it allows to query the logs with standard SQL and from any tool supporting the PostgreSQL protocol; its unique indexing also makes full-text queries and aggregations super fast. -Let me show you an example. +CrateDB addresses these issues: query logs with standard SQL from any +PostgreSQL‑compatible tool, and use full‑text search and aggregations +backed by efficient indexes. The sections below walk through a minimal +setup. ## Setup ### CrateDB -First, we will need an instance of CrateDB, it may be best to have a dedicated cluster for this purpose, to separate the monitoring system from the systems being monitored, but for the purpose of this demo we can just have a single node cluster on a docker container: +First, start CrateDB. For production, use a dedicated cluster. For this demo, run a single‑node container: ```bash -sudo docker run -d --name cratedb --publish 4200:4200 --publish 5432:5432 --env CRATE_HEAP_SIZE=1g crate -Cdiscovery.type=single-node +sudo docker run -d --name cratedb \ + -p 4200:4200 -p 5432:5432 \ + -e CRATE_HEAP_SIZE=1g \ + crate:5.6.0 -Cdiscovery.type=single-node ``` -Next, we need a table to store the logs, let's connect to `http://localhost:4200/#!/console` and run: +Next, create a table for logs. Open `http://localhost:4200/#!/console` and run: ```sql CREATE TABLE doc.systemevents ( @@ -39,7 +44,7 @@ CREATE TABLE doc.systemevents ( SysLogTag TEXT ); ``` -Tip: if you are on a headless system you can also run queries with {ref}`command-line tools `. +Tip: On headless systems, run queries with the {ref}`command-line tools `. Then we need an account for the logging system: @@ -59,11 +64,10 @@ GRANT DML ON TABLE doc.systemevents TO rsyslog; We will use [rsyslog](https://github.com/rsyslog/rsyslog) to send the logs to CrateDB, for this setup we need `rsyslog` v8.2202 or higher and the `ompgsql` module: ```bash -sudo add-apt-repository ppa:adiscon/v8-stable -sudo apt-get update -sudo apt-get install rsyslog +sudo add-apt-repository -y ppa:adiscon/v8-stable +sudo apt-get update -y sudo debconf-set-selections <<< 'rsyslog-pgsql rsyslog-pgsql/dbconfig-install string false' -sudo apt-get install rsyslog-pgsql +sudo apt-get install -y rsyslog rsyslog-pgsql ``` Let's now configure it to use the account we created earlier: @@ -79,20 +83,20 @@ If you are interested in more advanced setups involving queuing for additional r ### MediaWiki -Now let's imagine that we want to run a container with [MediaWiki](https://www.mediawiki.org/wiki/MediaWiki) to host an intranet and we want all logs to go to CrateDB, we can just deploy this with: +To generate logs, run a [MediaWiki](https://www.mediawiki.org/wiki/MediaWiki) container and forward its logs to rsyslog: ```bash sudo docker run --name mediawiki -p 80:80 -d --log-driver syslog --log-opt syslog-address=unixgram:///dev/log mediawiki ``` -If we now point a web browser to port 80 at `http://localhost/`, you will see a new MediaWiki page. -Let's play around a bit to generate log entries, just click on "set up the wiki" and then once on Continue. -This will have generated entries in the `doc.systemevents` table with `syslogtag` matching the container id of the container running the site. +Open `http://localhost/` to see the MediaWiki setup page. +Click “set up the wiki”, then “Continue” to generate log entries. +CrateDB now stores new rows in `doc.systemevents`, with `syslogtag` matching the container ID. ## Explore -We can now use the {ref}`crate-reference:predicates_match` to find the error messages we are interested in: +Use {ref}`crate-reference:predicates_match` to find specific error messages: ```sql SELECT devicereportedtime,message @@ -110,7 +114,7 @@ ORDER BY 1 DESC; +--------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ ``` -Let's now see which log sources created the most entries: +Show the top log sources by event count: ```sql SELECT syslogtag,count(*) @@ -132,4 +136,5 @@ LIMIT 5; +----------------------+----------+ ``` -I hope you found this interesting. Please do not hesitate to let us know your thoughts in the [CrateDB Community](https://community.cratedb.com/). +We hope this was useful. Share feedback and questions in the +[CrateDB Community](https://community.cratedb.com/). From 9baef6bb89551cc85dbbf15a00d09d055886ef9e Mon Sep 17 00:00:00 2001 From: Kenneth Geisshirt Date: Wed, 17 Sep 2025 16:23:59 +0200 Subject: [PATCH 6/7] rsyslog: Implement suggestions by Kenneth --- docs/integrate/rsyslog/index.md | 2 +- docs/integrate/rsyslog/tutorial.md | 6 +++--- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/integrate/rsyslog/index.md b/docs/integrate/rsyslog/index.md index 1de92f8d..76fea489 100644 --- a/docs/integrate/rsyslog/index.md +++ b/docs/integrate/rsyslog/index.md @@ -13,7 +13,7 @@ [Rsyslog] is a rocket-fast system for log processing. -It offers high-performance, advanced security features, and a modular design. +It offers high performance, advanced security features, and a modular design. Originally a regular syslogd, rsyslog has evolved into a highly versatile logging solution capable of ingesting data from numerous sources, transforming it, and outputting it to a wide variety of destinations. diff --git a/docs/integrate/rsyslog/tutorial.md b/docs/integrate/rsyslog/tutorial.md index 6001b943..8d376eb3 100644 --- a/docs/integrate/rsyslog/tutorial.md +++ b/docs/integrate/rsyslog/tutorial.md @@ -26,7 +26,7 @@ First, start CrateDB. For production, use a dedicated cluster. For this demo, ru sudo docker run -d --name cratedb \ -p 4200:4200 -p 5432:5432 \ -e CRATE_HEAP_SIZE=1g \ - crate:5.6.0 -Cdiscovery.type=single-node + crate:latest -Cdiscovery.type=single-node ``` Next, create a table for logs. Open `http://localhost:4200/#!/console` and run: @@ -65,9 +65,9 @@ We will use [rsyslog](https://github.com/rsyslog/rsyslog) to send the logs to Cr ```bash sudo add-apt-repository -y ppa:adiscon/v8-stable -sudo apt-get update -y +sudo apt update -y sudo debconf-set-selections <<< 'rsyslog-pgsql rsyslog-pgsql/dbconfig-install string false' -sudo apt-get install -y rsyslog rsyslog-pgsql +sudo apt install -y rsyslog rsyslog-pgsql ``` Let's now configure it to use the account we created earlier: From 63d9a30985d12b0634229a9495c88a301a5f20c5 Mon Sep 17 00:00:00 2001 From: Andreas Motl Date: Wed, 17 Sep 2025 17:00:27 +0200 Subject: [PATCH 7/7] rsyslog: Improve/fix tutorial command walkthrough --- docs/integrate/rsyslog/tutorial.md | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/docs/integrate/rsyslog/tutorial.md b/docs/integrate/rsyslog/tutorial.md index 8d376eb3..f9382d64 100644 --- a/docs/integrate/rsyslog/tutorial.md +++ b/docs/integrate/rsyslog/tutorial.md @@ -29,7 +29,7 @@ sudo docker run -d --name cratedb \ crate:latest -Cdiscovery.type=single-node ``` -Next, create a table for logs. Open `http://localhost:4200/#!/console` and run: +Next, create a table for logs. Open `http://localhost:4200/#!/console` or invoke `crash` and run: ```sql CREATE TABLE doc.systemevents ( @@ -64,10 +64,11 @@ GRANT DML ON TABLE doc.systemevents TO rsyslog; We will use [rsyslog](https://github.com/rsyslog/rsyslog) to send the logs to CrateDB, for this setup we need `rsyslog` v8.2202 or higher and the `ompgsql` module: ```bash +sudo DEBIAN_FRONTEND=noninteractive apt install --yes software-properties-common sudo add-apt-repository -y ppa:adiscon/v8-stable -sudo apt update -y +sudo apt update --yes sudo debconf-set-selections <<< 'rsyslog-pgsql rsyslog-pgsql/dbconfig-install string false' -sudo apt install -y rsyslog rsyslog-pgsql +sudo apt install --yes rsyslog rsyslog-pgsql ``` Let's now configure it to use the account we created earlier: @@ -86,7 +87,11 @@ If you are interested in more advanced setups involving queuing for additional r To generate logs, run a [MediaWiki](https://www.mediawiki.org/wiki/MediaWiki) container and forward its logs to rsyslog: ```bash -sudo docker run --name mediawiki -p 80:80 -d --log-driver syslog --log-opt syslog-address=unixgram:///dev/log mediawiki +sudo docker run --name mediawiki \ + -p 80:80 -d \ + --log-driver syslog \ + --log-opt syslog-address=unixgram:///dev/log \ + mediawiki ``` Open `http://localhost/` to see the MediaWiki setup page.