Commit 3851526

initial

0 parents  commit 3851526

36 files changed, +2901 -0 lines changed

.gitignore

+33
@@ -0,0 +1,33 @@
*.bak
*.swp
*.pyc
*.log
*.*~
*.mmdb
*.profile
.profile
intelmq.egg-info
build
dist
*.old
.vagrant/
*~
.coverage
.idea/
htmlcov/
*.pem
*.key
.eggs

# Debian build files
debian/files
debian/intelmq.postinst.debhelper
debian/intelmq.prerm.debhelper
debian/intelmq.substvars
debian/intelmq/
/.pc/*

# custom stuff for this repo
scratch/*
*/~$*.pptx
src/data/*.sql
LICENSE

+661
Large diffs are not rendered by default.

README.md

+107
@@ -0,0 +1,107 @@
# Stats portal

The stats portal is a component in the certtools series. It connects to the [eventDB](https://github.com/certtools/intelmq/blob/develop/intelmq/bin/intelmq_psql_initdb.py) (the database of all incident events which were processed by IntelMQ). The following picture shows the place of the stats portal within the certtools components:

![architecture-overview-stats-portal](architecture-overview-stats-portal-screen.png)

The stats portal is the presentation layer for the aggregation tables.
However, this code repository also contains the scripts to create the aggregation tables on a periodic basis (for example via cron jobs).

# Overview of this source code repository

This repo contains two parts:

1. the [code](src/) to aggregate the eventDB (creating aggregation tables)
2. the [Grafana dashboard](src/grafana) which allows you to connect to the aggregation tables

# How to install

## Aggregation tables

Take a look at the [cronjob example](src/crontab) and run the shell scripts on the eventDB server.
The cron job creates aggregation tables, dumps them, pushes them to the server which will host the aggregation tables, loads them into PostgreSQL there, transforms them to a timescaleDB format and cleans up some permissions. If you want to read the source code of this, please start by reading the (trivial) [make_all.sh](src/make_all.sh).
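For illustration, a minimal sketch of what the "create aggregation tables" step boils down to (the `events` table and its column names follow the IntelMQ harmonization; the dimensions used by the real scripts may differ):

```sql
-- Group raw events by day and by the classification dimensions,
-- counting both events and distinct source IP addresses.
CREATE TABLE agg_ndim_day_all_tags AS
SELECT
    date_trunc('day', "time.source")::date AS day,
    "classification.taxonomy",
    "classification.type",
    "classification.identifier",
    "source.asn",
    count(*)                    AS n_events,
    count(DISTINCT "source.ip") AS n_distinct_ips
FROM events
GROUP BY 1, 2, 3, 4, 5;
```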
Assumptions:

We will call the host serving the aggregation tables the "stats server".
We will call the host creating the aggregation tables the "eventDB server".

The stats server creates the visual graphs via Grafana, which pulls its data from the (timescale) aggregation tables.
Please note that this setup assumes a PostgreSQL user "statsro" on the stats server (Grafana needs this).
Furthermore, we assume a (Unix) ssh user "stats-sync" on both servers. ssh keys are used to push the data from the eventDB server to the stats server.
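For illustration, a minimal sketch of how the "statsro" user and its read-only grants could be set up on the stats server (the password is a placeholder; the repository's scripts handle the actual permission cleanup):

```sql
-- Login role used by Grafana, read-only.
CREATE ROLE statsro LOGIN PASSWORD 'change-me';

-- Grant SELECT on the aggregation tables (names from src/agg_tables.txt).
GRANT SELECT ON public.agg_ndim_day_all_tags TO statsro;
GRANT SELECT ON public.agg_ndim_day_netobject_tags TO statsro;
```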
## Grafana dashboard

For the Grafana dashboard (on the stats server), please see the [README](src/grafana/README.md) in the grafana directory.

# General topics

## How to count correctly?

Alas, this question is anything but trivial.
First, let's settle on a couple of definitions:

* **Measurement**: some observation of the Internet which results in multiple events. Measurements usually are [internet-wide scans](https://en.wikipedia.org/wiki/Port_scanner) or [sinkholes](https://en.wikipedia.org/wiki/DNS_sinkhole) of botnets or [honeypot](https://en.wikipedia.org/wiki/Honeypot_(computing)) events.
* **Feed**: a data feed, usually consisting of multiple rows ("events"), which **must** always contain a time stamp (with time zone) and an IP address or a hostname / URL.
* **Event**: a row in the feed.
* **EventDB**: the pre-processed (via IntelMQ or some other Extract Transform Load (ETL) tool) events are stored in a database, the eventDB.
* **Aggregation**: the events in the EventDB are grouped by some criteria and counted.
* **Constituency portal**: a database ("contactDB") of a CERT's constituency. Usually contains contact information (email addresses, PGP keys, etc.) for the constituency's security team, but often also contains relevant network information ("net objects") such as ASNs, netblocks etc. Given the net objects, each network operator can gain insight into the statistics for a specific ASN or some netblocks.
* **ASN**: Autonomous System Number. See [wikipedia:ASN](https://en.wikipedia.org/wiki/Autonomous_system_(Internet))

Now that we have defined some terms, we can discuss how to count correctly, in other words: how to aggregate events properly. First of all, there is no "correct". We can only argue that certain ways of counting make more sense than others for specific questions. Secondly, the internet is a dynamic space. Effects such as [dynamic IP addresses](https://en.wikipedia.org/wiki/IP_address#Dynamic_IP) or [NAT](https://en.wikipedia.org/wiki/Network_address_translation) complicate any measurement. Next to NAT and dynamic IPs, we also have the effect that answers to a measurement might be faked (especially with UDP based scans), packets might be dropped (for example at high scanning speeds), or the effect which is to be measured changes state during the measurement (a server gets turned off, a firewall rule triggers on the scan, etc.).
So, in other words: measuring is hard. We always must keep in mind that the underlying data of the measurement might be biased or skewed somehow. However, most of the time a national CERT will take the data feeds of feed providers (such as Shadowserver), simply treat them as an indication (a possibility) of an event, and pass them on to the network operator as a hint that something might be wrong. So, in a sense, the feed is treated as ground truth from the national CERT's perspective.

Understanding and aggregating the measurement is the second hard problem, which is directly connected with the way we pose a question to the measurement data set (the eventDB).

In our case, we are mostly interested in the following questions:

* what are the trends over time for:
  * malware infections
  * types of problems ([classification.taxonomy](https://www.enisa.europa.eu/publications/reference-incident-classification-taxonomy), classification.type, classification.identifier)
  * involved applications and protocols (for example UDP amplifiers)
  * specific ASNs and network operators (how well is a network operator cleaning up its part of the Internet?)
  * feed providers (internal metric): are we receiving a constant feed or are there big variations within a feed provider?
* number of events per day (overall)
* number of unique (distinct) IP addresses (see the counting sketch below)
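As a concrete illustration of these two ways of counting, a sketch against the eventDB (assuming the IntelMQ-style `events` table with harmonized column names; adjust names to your schema):

```sql
-- Events per day: every row counts, so a single noisy IP
-- reported many times inflates the number.
SELECT date_trunc('day', "time.source")::date AS day,
       count(*) AS n_events
FROM events
GROUP BY day
ORDER BY day;

-- Distinct IPs per day: each affected address counts once,
-- which is more robust against repeated sightings.
SELECT date_trunc('day', "time.source")::date AS day,
       count(DISTINCT "source.ip") AS n_ips
FROM events
GROUP BY day
ORDER BY day;
```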
Use cases
-----------

As can be seen above, one can vary the time interval (we will settle on one day) and the way of counting (do we count the number of events or the number of distinct IP addresses?).
Given the assumption that we take the feed provider's data as ground truth, we will try to answer the following questions:
### Internal view (national CSIRT's overview page)

* how many events per day per taxonomy (taxonomy, type, identifier) exist?
* how many events are we getting per ASN per feed provider?
* what are the top-N ASNs per taxonomy (taxonomy, type, identifier)? (see the query sketch below)
* where (ASN) are we seeing steady increases of problems (versus the regular slow decline of issues)?
* what are the top-N problems in our country?
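A sketch of the top-N question (assuming the `events` table; N = 10 and the taxonomy value are just example choices):

```sql
-- Top 10 ASNs for one taxonomy, counted by distinct IP addresses.
SELECT "source.asn",
       count(DISTINCT "source.ip") AS n_ips
FROM events
WHERE "classification.taxonomy" = 'malicious code'
GROUP BY "source.asn"
ORDER BY n_ips DESC
LIMIT 10;
```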
### Network operator's perspective

* How many events per day are there in my ASN? (--> trendline; see the query sketch below)
* How many events per day per taxonomy are there in my ASN? (--> trendline)
* If settling on a specific taxonomy, how many types are there in my ASN? (--> trendline)
* What malware infections exist in my ASN? (trendline)
* What are the trends for vulnerabilities in my ASN? (trendline)
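A sketch of the per-ASN trendline (assuming the `events` table; ASN 2025 is just an example value from src/data/asns.sql):

```sql
-- Events per day for a single ASN, per taxonomy: the raw
-- material for a trendline in Grafana.
SELECT date_trunc('day', "time.source")::date AS day,
       "classification.taxonomy",
       count(*) AS n_events
FROM events
WHERE "source.asn" = 2025
GROUP BY day, "classification.taxonomy"
ORDER BY day;
```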
These questions are answered by the stats portal.


Funded by
=========

This project was partially funded by the CEF framework.
![cef logo](fundedby.png)

docs/arch-overview.png

135 KB
Binary file not shown.

docs/arch-overview.pptx

38.9 KB
Binary file not shown.

docs/arch-overview_small.png

135 KB
Binary file not shown.

fundedby.png

114 KB
Binary file not shown.

src/README.md

+15
@@ -0,0 +1,15 @@
# Server backend side

We assume that the server backend hosts a PostgreSQL eventDB. See [the corresponding IntelMQ output bot](https://github.com/certtools/intelmq/tree/develop/intelmq/bots/outputs/postgresql) as well as the scripts to [create the eventDB structure](https://github.com/certtools/intelmq/blob/develop/intelmq/bin/intelmq_psql_initdb.py).

From the eventDB, we can create aggregation tables and copy them over to a separate server.
There, we convert the aggregation tables to [timescaleDB](https://www.timescale.com/) in order to speed up time-window based searches.
Have a look at the [make_all.sh](make_all.sh) script.
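For illustration, a minimal sketch of the timescale conversion step (assuming the aggregation table has a date column named `day`; the actual column name may differ):

```sql
-- Enable the timescaleDB extension (once per database).
CREATE EXTENSION IF NOT EXISTS timescaledb;

-- Turn the plain aggregation table into a hypertable,
-- partitioned on its date column, moving the existing
-- rows into the hypertable's chunks.
SELECT create_hypertable('agg_ndim_day_all_tags', 'day', migrate_data => true);
```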
The constituency portal source code resides in [this repo](https://github.com/certat/do-portal).

The whole process can be seen in the architecture sketch below:

![architecture](../docs/arch-overview_small.png)
src/agg_tables.txt

+2
@@ -0,0 +1,2 @@
agg_ndim_day_all_tags
agg_ndim_day_netobject_tags

src/cleanup.sh

+7
@@ -0,0 +1,7 @@
#!/bin/bash

olddate=$( date --iso-8601 -d '3 days ago')
olddatadir="/home/stats-sync/data/${olddate}_dump_agg_tables"

# delete old dump data; quote the path so rm stays safe
rm -rf "$olddatadir"

src/crontab

+2
@@ -0,0 +1,2 @@
# m h  dom mon dow  command
03 04 * * * ( cd $HOME; date; ./make_all.sh ; date )

src/data/README.md

+7
@@ -0,0 +1,7 @@
# data directory

The data/ dir is used for transferring data between the eventDB server and the stats server.
The files residing in this directory in the repository are meant as examples of what the asn, identifier, types and taxonomy tables look like.
They are, however, generated by the make_all.sh script.
src/data/asns.sql

+59
@@ -0,0 +1,59 @@
--
-- PostgreSQL database dump
--

-- Dumped from database version 9.5.16
-- Dumped by pg_dump version 9.5.16

SET statement_timeout = 0;
SET lock_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SELECT pg_catalog.set_config('search_path', '', false);
SET check_function_bodies = false;
SET client_min_messages = warning;
SET row_security = off;

SET default_tablespace = '';

SET default_with_oids = false;

--
-- Name: asns; Type: TABLE; Schema: public; Owner: stats-sync
--

CREATE TABLE public.asns (
    "source.asn" integer
);


ALTER TABLE public.asns OWNER TO "stats-sync";

--
-- Data for Name: asns; Type: TABLE DATA; Schema: public; Owner: stats-sync
--

--- NOTE NOTE NOTE
--- This is a list of all the ASNs in your country. We intentionally removed most so that you can simply
--- see what you would need to put here.
---
--- Please note that you can get a list of all the ASNs in your country via stat.ripe.net:
---
--- https://stat.ripe.net/docs/data_api#country-asns
---
--- Please fill in your own country list this way.

COPY public.asns ("source.asn") FROM stdin;
2025
2033
2036
2047
2055
\N
\.


--
-- PostgreSQL database dump complete
--
src/data/identifier.sql

+73
@@ -0,0 +1,73 @@
--
-- PostgreSQL database dump
--

-- Dumped from database version 9.5.16
-- Dumped by pg_dump version 9.5.16

SET statement_timeout = 0;
SET lock_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SELECT pg_catalog.set_config('search_path', '', false);
SET check_function_bodies = false;
SET client_min_messages = warning;
SET row_security = off;

SET default_tablespace = '';

SET default_with_oids = false;

--
-- Name: identifier; Type: TABLE; Schema: public; Owner: stats-sync
--

CREATE TABLE public.identifier (
    "classification.identifier" text
);


ALTER TABLE public.identifier OWNER TO "stats-sync";

--
-- Data for Name: identifier; Type: TABLE DATA; Schema: public; Owner: stats-sync
--

COPY public.identifier ("classification.identifier") FROM stdin;
neshta
openproxy
open-chargen
ghostpush
gernidru
tinynuke
wordpress-vulnerabilities
monerominer
feodo
accessible-sewage-plant
wannacry
sniperspy
dresscode
ntp-version
smtpauth
open-natpmp
wpad
androidlocker
locky
neutrino
malware-generic
apt-generic
spam
spamlink
parama
mail-password-leak
unknownrat
wordpress-login
zeroaccess
if-you-read-this-far-the-full-list-of-identifiers-are-available-from-cert-at-upon-request-by-friendly-other-CERTs
\.


--
-- PostgreSQL database dump complete
--
src/data/taxonomy.sql

+54
@@ -0,0 +1,54 @@
--
-- PostgreSQL database dump
--

-- Dumped from database version 9.5.16
-- Dumped by pg_dump version 9.5.16

SET statement_timeout = 0;
SET lock_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SELECT pg_catalog.set_config('search_path', '', false);
SET check_function_bodies = false;
SET client_min_messages = warning;
SET row_security = off;

SET default_tablespace = '';

SET default_with_oids = false;

--
-- Name: taxonomy; Type: TABLE; Schema: public; Owner: stats-sync
--

CREATE TABLE public.taxonomy (
    "classification.taxonomy" text
);


ALTER TABLE public.taxonomy OWNER TO "stats-sync";

--
-- Data for Name: taxonomy; Type: TABLE DATA; Schema: public; Owner: stats-sync
--

COPY public.taxonomy ("classification.taxonomy") FROM stdin;
abusive content
availability
fraud
information content security
information gathering
intrusion attempts
intrusions
malicious code
other
test
vulnerable
\.


--
-- PostgreSQL database dump complete
--