IEP007 - Running IntelMQ bots as Python Library

A working example call (Proof of Concept) is located here: https://github.com/wagner-intevation/intelmq/blob/bot-library/intelmq/tests/lib/test_bot.py#L141

Background

As of IntelMQ 3.1.0, IntelMQ Bots can only be started on the command line with intelmqctl. Most tools (including the IntelMQ API, and thus the IntelMQ Manager) use intelmqctl start to start bot instances. intelmqctl start spawns a new child process and detaches it.

Only intelmqctl run can run bots interactively in the foreground of the command line, and it provides some useful features for debugging.

Starting IntelMQ bots from Python code requires a lot of effort (and code complexity). Additionally, the bot's parameters can only be provided by modifying the IntelMQ runtime configuration file. Messages can only be fed to and retrieved from the bot by separately connecting to the pipeline and writing/reading properly serialized messages there.

Integrating IntelMQ Bots into other (Python) tools is therefore hard to impossible in the current IntelMQ 3.1 version.

In a nutshell, calling a bot and processing a message should take at most a few lines of code. The following complete example shows what this could look like: the bot class is instantiated with a few parameters, and a single message is processed.

from intelmq.bots.experts.domain_suffix.expert import DomainSuffixExpertBot
from intelmq.lib.bot import BotLibSettings
domain_suffix = DomainSuffixExpertBot('domain-suffix',  # bot id
                                      # the {} | {} syntax is available in Python >= 3.9
                                      settings=BotLibSettings | {
                                               'field': 'fqdn',
                                               'suffix_file': '/usr/share/publicsuffix/public_suffix_list.dat'})
queues = domain_suffix.process_message({'source.fqdn': 'www.example.com'})
# Select the output queue (as defined in `destination_queues`), first message, access the field 'source.domain_suffix':
# >>> queues['output'][0]['source.domain_suffix']
# 'com'

Use cases

General

Any IntelMQ-related or third-party program can then use IntelMQ's most powerful components: its bots.

The full potential shows when stacking multiple bots together and iterating over lots of data:

# instantiate all bots first, for an example see above
domain_suffix = DomainSuffixExpertBot(...)
url2fqdn = Url2fqdnExpertBot(...)
http_status = HttpstatusExpertBot(...)
tuency = TuencyExpertBot(...)
lookyloo = LookylooExpertBot(...)

# a list of input messages
messages = [{...}]
# collects the fully processed messages
output_messages = []

for message in messages:
    for bot in (domain_suffix,
                url2fqdn,
                http_status,
                tuency,
                lookyloo):
        # for simplicity we assume that each bot always sends exactly one message to the queue 'output'
        message = bot.process_message(message)['output'][0]
    # message now has the accumulated data of all five bots
    output_messages.append(message)

# output_messages now is a list of output messages

IntelMQ Webinput Preview

The IntelMQ webinput can show previews of the processed data to the operator, not just the input data, adding much more value to the preview functionality. Currently the preview gives the operator feedback on the parsing step. The further processing of the data by the bots is invisible to the operator. This causes confusion and uncertainty for the operators.

The webinput backend can call the bots and process the events without interfering with the running bot processes, pipelines and bot management. The data flow, illustrated:

Data provided by operator -> webinput backend parser -> IntelMQ bots as configured in the webinput configuration -> preview shown to operator

The implementation details for the webinput are not part of this proposal document.

As a next step, the webinput could also show previews of notifications (e.g. emails). This is also not part of this proposal document.

Data provided by operator -> webinput backend parser -> IntelMQ bots as configured in the webinput configuration -> notification tool (preview mode) -> notification preview shown to operator

Requirements

Messages and Pipeline

It should be possible to provide input messages as function parameters and to receive the output messages as return values. In this case, messages should not be serialized or encoded; they should stay Message objects (derived from dict and behaving like dictionaries).

It should also be possible to let the bot use the configured pipeline (e.g. redis) and behave like a normal bot.

Exceptions and dumped messages

An exception in the bot's process() method should not be caught in intermediate layers, but propagated to the caller.

Option: If there is a helper function that calls process() multiple times (for a batch of input messages), it catches the exceptions and makes them accessible to the caller, together with the (dumped) messages.
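The requirement that exceptions reach the caller can be illustrated with a minimal, self-contained sketch. `ToyExpertBot` is a hypothetical stand-in, not an IntelMQ class, and its suffix logic is a naive last-label split rather than the Public Suffix List. The only point is that `process_message` contains no `try`/`except`, so the caller decides how to handle the error.

```python
class ToyExpertBot:
    """Hypothetical stand-in for a bot in library mode (not an IntelMQ class)."""

    def process_message(self, message: dict) -> dict:
        # Deliberately no try/except: an error raised while processing
        # reaches the caller unchanged.
        if "source.fqdn" not in message:
            raise ValueError("field 'source.fqdn' is missing")
        event = dict(message)
        # naive last-label split, NOT the Public Suffix List
        event["source.domain_suffix"] = message["source.fqdn"].rsplit(".", 1)[-1]
        return {"output": [event]}


bot = ToyExpertBot()
try:
    bot.process_message({"source.ip": "192.0.2.1"})
except ValueError as exc:
    # the caller handles the failure according to its own needs
    print(f"bot failed: {exc}")
```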

Parameters and Configuration

The global IntelMQ configuration should take effect. The user may override configuration options by providing a configuration dictionary.

Pre-configured bots

It should be possible to run bots defined in IntelMQ's runtime configuration file. Additional overriding parameters can be provided.

Un-Configured bots

It should be possible to run bots that are not defined in IntelMQ's runtime configuration file. The bot configuration is then provided as a function parameter.

Signals

Normally, bots install their own handlers for signals like SIGTERM, SIGHUP and SIGINT. In library mode, this would interfere with the signal handling of the calling code. Thus, IntelMQ bots called as a library must not manipulate the signal handling or call sys.exit.
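A minimal sketch of what this requirement could mean for a bot's constructor and stop routine. `ToyBot` and its `library_mode` flag are hypothetical names for illustration; only the standard `signal` and `sys` modules are real. In library mode no handler is installed, and stopping raises an exception instead of calling `sys.exit`.

```python
import signal
import sys


class ToyBot:
    """Hypothetical bot skeleton; only the signal and exit handling is shown."""

    def __init__(self, library_mode: bool = True):
        self.library_mode = library_mode
        if not library_mode:
            # Daemon mode only: take over process-wide signal handling.
            signal.signal(signal.SIGTERM, self._handle_sigterm)

    def _handle_sigterm(self, signum, frame):
        self.stop()

    def stop(self, exitcode: int = 0):
        if self.library_mode:
            # Never terminate the caller's interpreter; raise instead.
            raise RuntimeError(f"bot stopped with exit code {exitcode}")
        sys.exit(exitcode)
```

The caller's own SIGTERM handler therefore stays in place after `ToyBot(library_mode=True)` is constructed.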

Logging

By default, IntelMQ logs to /var/log/intelmq/ or /opt/intelmq/var/log, depending on the installation type and the actual configuration. When IntelMQ bots are called as a library by other, external scripts, logging to these files is in most cases unwanted and causes permission errors. In other cases, the default logging may be exactly what is wanted. There should be an easy way to disable file logging while keeping the possibility to use the default behavior.
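One way to make file logging optional, sketched with the standard `logging` module. `get_bot_logger` is a hypothetical helper; its `logging_path` parameter is loosely modeled on IntelMQ's global logging path setting, and using `None` to disable file logging is an assumption of this sketch, not IntelMQ's documented behavior.

```python
import logging
import os
from typing import Optional

LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"


def get_bot_logger(bot_id: str, logging_path: Optional[str] = None,
                   level: int = logging.INFO) -> logging.Logger:
    """Hypothetical helper: log to a file only if a directory is given.

    logging_path=None: no file logging, records go to stderr instead.
    logging_path='/var/log/intelmq': file logging as in a normal installation.
    """
    logger = logging.getLogger(bot_id)
    logger.setLevel(level)
    for old in list(logger.handlers):  # avoid duplicate handlers on re-creation
        logger.removeHandler(old)
        old.close()
    if logging_path is None:
        handler: logging.Handler = logging.StreamHandler()
    else:
        handler = logging.FileHandler(os.path.join(logging_path, f"{bot_id}.log"))
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    logger.addHandler(handler)
    return logger
```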

Dependency on configuration files

IntelMQ in library-mode must not depend on existing IntelMQ configuration files or logging directories, but must still be able to behave as IntelMQ normally does. It is up to the user to decide the behavior, e.g. whether the bot's log should be written to files.

IntelMQ normally loads the runtime configuration file /etc/intelmq/runtime.yaml or /opt/intelmq/etc/runtime.yaml. In library-mode, IntelMQ tries to load the file, but continues normally if it does not exist.

IntelMQ normally loads the harmonization configuration file /etc/intelmq/harmonization.yaml or /opt/intelmq/etc/harmonization.yaml. In library-mode, IntelMQ tries to load the file, and if it does not exist, loads the internal default harmonization configuration, which is part of the IntelMQ packages.
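The fallback behavior for both files can be expressed as one small helper, sketched here under assumptions: `load_first_existing` is hypothetical, and the actual YAML parser is passed in as `loader` so that the sketch needs only the standard library.

```python
import os
from typing import Any, Callable, Iterable

RUNTIME_PATHS = ("/etc/intelmq/runtime.yaml", "/opt/intelmq/etc/runtime.yaml")
HARMONIZATION_PATHS = ("/etc/intelmq/harmonization.yaml",
                       "/opt/intelmq/etc/harmonization.yaml")


def load_first_existing(paths: Iterable[str], loader: Callable[[str], Any],
                        fallback: Callable[[], Any]) -> Any:
    """Load the first path that exists; otherwise return fallback().

    loader would typically parse YAML; it is a parameter so this sketch
    does not depend on a YAML library.
    """
    for path in paths:
        if os.path.isfile(path):
            return loader(path)
    return fallback()
```

For the runtime configuration, the fallback would simply be `dict` (an empty configuration). For the harmonization, it would be a function returning the default harmonization shipped with the IntelMQ package, e.g. `load_first_existing(HARMONIZATION_PATHS, load_yaml, load_packaged_harmonization)`, where both function names are hypothetical.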

Rationales

Compatibility

Since the beginning of IntelMQ, the bots' process methods have used the methods self.receive_message, self.acknowledge_message and self.send_message. Breaking this paradigm by switching to method parameters and return values or generator yields would be an API change and would thus lead to IntelMQ version 4.0. Therefore, we stick to the current behavior.

Specification

Only changes in the intelmq.lib.bot.Bot class are needed. No changes in the bots' code are required.
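Why no bot code needs to change can be sketched with a heavily simplified, hypothetical base class: the bot keeps calling `receive_message`, `send_message` and `acknowledge_message`, and only the base class decides that these operate on an in-memory buffer instead of a pipeline. `LibraryModeBot` and `ToyFqdnExpertBot` are illustrative names, not IntelMQ classes, and IntelMQ's real `send_message` accepts more arguments than shown here.

```python
from typing import Dict, List, Optional


class LibraryModeBot:
    """Hypothetical, heavily simplified base class (not intelmq.lib.bot.Bot)."""

    def __init__(self, bot_id: str, destination_queues: Dict[str, str]):
        self.bot_id = bot_id
        # maps paths ('_default', '_on_error', ...) to queue names
        self.destination_queues = destination_queues
        self._current: Optional[dict] = None
        self._sent: Dict[str, List[dict]] = {}

    def receive_message(self) -> Optional[dict]:
        return self._current

    def send_message(self, message: dict, path: str = "_default") -> None:
        queue = self.destination_queues[path]
        self._sent.setdefault(queue, []).append(message)

    def acknowledge_message(self) -> None:
        self._current = None

    def process_message(self, message: Optional[dict] = None) -> Dict[str, List[dict]]:
        # In-memory "pipeline": feed one message, run process() once and
        # return everything sent, grouped by destination queue name.
        self._current, self._sent = message, {}
        self.process()
        return self._sent


class ToyFqdnExpertBot(LibraryModeBot):
    """A bot written in the classic style; it knows nothing about library mode."""

    def process(self):
        event = dict(self.receive_message())
        # naive last-label split, NOT the Public Suffix List
        event["source.domain_suffix"] = event["source.fqdn"].rsplit(".", 1)[-1]
        self.send_message(event)
        self.acknowledge_message()
```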

Bot constructor

The operator constructs the bot by instantiating the bot's class. Global and bot configuration parameters are provided as a parameter to the constructor, in the same format as in the IntelMQ runtime configuration.

from typing import Optional

class Bot:
    def __init__(self, bot_id: str,
                 *args,  # any other parameters left out for clarity
                 settings: Optional[dict] = None, **kwargs):
        ...

After reading the runtime configuration file, the constructor applies all values of the settings parameter.
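A possible order of precedence for the configuration values, sketched as a hypothetical `resolve_settings` function. It assumes a runtime configuration laid out with a `global` section plus per-bot `parameters` sections, and flat `settings` as in the `BotLibSettings` examples. The same logic covers both pre-configured bots and bots absent from the runtime configuration.

```python
from typing import Any, Dict, Optional


def resolve_settings(bot_id: str, runtime: Dict[str, Any],
                     settings: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Hypothetical: compute the effective parameters of one bot.

    Precedence, lowest to highest:
    1. the 'global' section of the runtime configuration,
    2. the bot's own 'parameters' in the runtime configuration (if any),
    3. the settings passed to the bot's constructor.
    """
    effective = dict(runtime.get("global", {}))
    effective.update(runtime.get(bot_id, {}).get("parameters", {}))
    effective.update(settings or {})
    return effective
```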

Method call

The intelmq.lib.bot.Bot class gets a new method process_message. The definition:

from typing import Optional
class Bot:
    def process_message(self, message: Optional[intelmq.lib.message.Message] = None) -> dict:
        ...

For collectors, it takes no message as input. For parsers, experts and outputs, it takes exactly one message as input. The messages are neither serialized nor encoded in any form, but are objects of the intelmq.lib.message.Message class. If the message is a plain dict (with or without a __type item), it is automatically converted to the appropriate Message object (Report or Event, depending on the bot type).
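The automatic conversion of plain dicts can be sketched as follows. `Message`, `Report` and `Event` here are minimal hypothetical stand-ins for the classes in `intelmq.lib.message`, and the rule "parsers receive reports, all other bots receive events" for dicts without a `__type` item is an assumption of this sketch.

```python
from typing import Optional


class Message(dict):
    """Hypothetical stand-in for intelmq.lib.message.Message."""


class Report(Message):
    """Hypothetical stand-in for a report message."""


class Event(Message):
    """Hypothetical stand-in for an event message."""


MESSAGE_TYPES = {"Report": Report, "Event": Event}


def ensure_message(message: Optional[dict], bot_type: str) -> Optional[Message]:
    """Return a Message object suitable as process_message input."""
    if message is None or isinstance(message, Message):
        # collectors get no input; Message objects are used as they are
        return message
    data = dict(message)
    type_name = data.pop("__type", None)
    if type_name is None:
        # assumption: parsers consume reports, all other bots consume events
        type_name = "Report" if bot_type == "Parser" else "Event"
    return MESSAGE_TYPES[type_name](data)
```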

The return value contains all messages sent by the bot. No exceptions of the bot are caught; the caller should handle them according to their needs. The bot does not dump any messages to files on errors, regardless of the bot's dumping configuration.

As bots can send messages to multiple queues, the return value is a dictionary keyed by destination queue name; each value is a list of the messages sent to that queue.

Option: Processing multiple messages at once

This is a more complex situation with regard to error handling. Should one exception stop the processing? Or should the processing continue, with the exceptions saved in a variable that is returned at the end together with the sent messages?
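Both answers to these questions could be offered through a single flag, as in this hypothetical `process_batch` helper, which is not part of the specification. With `stop_on_error=True` the first exception propagates just as for a single call; otherwise the failed messages are returned together with their exceptions. `toy_process` stands in for a bot's `process_message` and uses a naive suffix split.

```python
from typing import Callable, Dict, List, Tuple

Queues = Dict[str, List[dict]]


def process_batch(process: Callable[[dict], Queues], messages: List[dict],
                  stop_on_error: bool = False) -> Tuple[Queues, List[Tuple[dict, Exception]]]:
    """Hypothetical helper: feed several messages to one bot."""
    sent: Queues = {}
    errors: List[Tuple[dict, Exception]] = []
    for message in messages:
        try:
            queues = process(message)
        except Exception as exc:
            if stop_on_error:
                raise  # same behavior as a single process_message call
            errors.append((message, exc))  # keep the failed input with its error
            continue
        for queue, queue_messages in queues.items():
            sent.setdefault(queue, []).extend(queue_messages)
    return sent, errors


def toy_process(message: dict) -> Queues:
    """Hypothetical stand-in for bot.process_message."""
    fqdn = message["source.fqdn"]  # raises KeyError if the field is missing
    return {"output": [{**message, "source.domain_suffix": fqdn.rsplit(".", 1)[-1]}]}
```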

Examples

Domain Suffix Expert Example

from intelmq.bots.experts.domain_suffix.expert import DomainSuffixExpertBot
from intelmq.lib.bot import BotLibSettings
domain_suffix = DomainSuffixExpertBot('domain-suffix',  # bot id
                                      settings=BotLibSettings | {
                                               'field': 'fqdn',
                                               'suffix_file': '/usr/share/publicsuffix/public_suffix_list.dat'})
queues = domain_suffix.process_message({'source.fqdn': 'www.example.com'})
# Select the output queue (as defined in `destination_queues`), first message, access the field 'source.domain_suffix':
# >>> queues['output'][0]['source.domain_suffix']
# 'com'

Accessing queues

from intelmq.lib import utils
from intelmq.lib.bot import BotLibSettings
from intelmq.lib.message import MessageFactory
from intelmq.tests.lib import test_parser_bot

# RAW is the raw report data (not defined in this excerpt)
EXAMPLE_REPORT = {"feed.url": "http://www.example.com/",
                  "time.observation": "2015-08-11T13:03:40+00:00",
                  "raw": utils.base64_encode(RAW),
                  "__type": "Report",
                  "feed.name": "Example"}

bot = test_parser_bot.DummyParserBot('dummy-bot', settings=BotLibSettings |
                                                           {'destination_queues': {'_default': 'output',
                                                                                   '_on_error': 'error'}})

sent_messages = bot.process_message(EXAMPLE_REPORT)
# sent_messages is now a dict with all queues. queue names below are examples

# this is the output queue
assert sent_messages['output'][0] == MessageFactory.from_dict(test_parser_bot.EXAMPLE_EVENT)
# this message was sent via the '_on_error' path to the 'error' queue
# (input_message, the expected message, is not defined in this excerpt)
assert sent_messages['error'][0] == input_message