A working example call (Proof of Concept) is located here: https://github.com/wagner-intevation/intelmq/blob/bot-library/intelmq/tests/lib/test_bot.py#L141
As of IntelMQ 3.1.0, IntelMQ bots can only be started on the command line with `intelmqctl`.
Most tools (including the IntelMQ API, and thus the IntelMQ Manager) use `intelmqctl start` to start bot instances.
`intelmqctl start` spawns a new child process and detaches it.
Only `intelmqctl run` provides the ability to run bots interactively in the foreground of the command line, and it provides some neat features for debugging purposes.
Starting IntelMQ bots from Python code requires a lot of effort (and code complexity). Additionally, the bot's parameters can only be provided by modifying the IntelMQ runtime configuration file. Messages can only be fed to and retrieved from the bot by separately connecting to the pipeline (e.g. Redis) and writing/reading properly serialized messages there.
Integrating IntelMQ bots into other (Python) tools is therefore hard, if not impossible, in the current IntelMQ 3.1 version.
In a nutshell, calling a bot and processing a message should take at most a few lines. The following complete example shows what the procedure could look like: the bot class is instantiated with a few parameters, then a message is processed.
```python
from intelmq.bots.experts.domain_suffix.expert import DomainSuffixExpertBot
from intelmq.lib.bot import BotLibSettings

domain_suffix = DomainSuffixExpertBot('domain-suffix',  # bot id
                                      # the {} | {} syntax is available in Python >= 3.9
                                      settings=BotLibSettings | {
                                          'field': 'fqdn',
                                          'suffix_file': '/usr/share/publicsuffix/public_suffix_list.dat'})
queues = domain_suffix.process_message({'source.fqdn': 'www.example.com'})

# Select the output queue (as defined in `destination_queues`), first message, access the field 'source.domain_suffix':
# >>> queues['output'][0]['source.domain_suffix']
# 'com'
```
Any IntelMQ-related or third-party program could then use IntelMQ's most powerful components: IntelMQ's bots.
The full potential shows when stacking multiple bots together and iterating over lots of data:
```python
# instantiate all bots first, for an example see above
domain_suffix = DomainSuffixExpertBot(...)
url2fqdn = Url2fqdnExpertBot(...)
http_status = HttpstatusExpertBot(...)
tuency = TuencyExpertBot(...)
lookyloo = LookylooExpertBot(...)

# a list of input messages
messages = [{...}]
output_messages = []
for message in messages:
    for bot in (domain_suffix,
                url2fqdn,
                http_status,
                tuency,
                lookyloo):
        # for simplicity we assume that the bots always send exactly one message
        message = bot.process_message(message)['output'][0]
    # message now has the accumulated data of all five bots
    output_messages.append(message)
# output_messages now is a list of output messages
```
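The loop above assumes that every bot sends exactly one message to the `output` queue. A minimal sketch of the general case, where a bot may send zero messages (e.g. a filter drops the event) or several, could process the messages stage by stage. The two bot classes are hypothetical stand-ins that only mimic the return format proposed here (a dict of queue name to list of messages); the queue name `output` is an assumption.

```python
from typing import Dict, Iterable, List


class UppercaseBot:
    """Hypothetical stand-in: sends one modified copy of each message."""
    def process_message(self, message: dict) -> Dict[str, List[dict]]:
        return {'output': [{**message, 'source.fqdn': message['source.fqdn'].upper()}]}


class DropExampleBot:
    """Hypothetical stand-in: drops messages for example domains (sends nothing)."""
    def process_message(self, message: dict) -> Dict[str, List[dict]]:
        if message['source.fqdn'].endswith('EXAMPLE.COM'):
            return {'output': []}
        return {'output': [message]}


def run_chain(bots: Iterable, messages: List[dict], queue: str = 'output') -> List[dict]:
    """Feed all messages through the bots in order, keeping what each stage sends to `queue`."""
    current = list(messages)
    for bot in bots:
        next_stage: List[dict] = []
        for message in current:
            # a bot may send zero, one or many messages per input message
            next_stage.extend(bot.process_message(message).get(queue, []))
        current = next_stage
    return current


result = run_chain([UppercaseBot(), DropExampleBot()],
                   [{'source.fqdn': 'www.example.com'}, {'source.fqdn': 'intelmq.org'}])
print(result)  # [{'source.fqdn': 'INTELMQ.ORG'}]
```

Processing stage by stage keeps the chaining logic independent of how many messages each bot emits.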
The IntelMQ webinput can show previews of the processed data to the operator, not just of the input data, adding much more value to the preview functionality. Currently, the preview only gives the operator feedback on the parsing step. The further processing of the data by the bots is invisible to the operator, which causes confusion and uncertainty.
The webinput backend can call the bots and process the events without interfering with the running bot processes, pipelines and bot management. The data flow:
Data provided by operator -> webinput backend parser -> IntelMQ bots as configured in the webinput configuration -> preview shown to operator
The implementation details for the webinput are not part of this proposal document.
As a next step, the webinput can also show previews of notifications (e.g. emails). This is not part of this proposal document either.
Data provided by operator -> webinput backend parser -> IntelMQ bots as configured in the webinput configuration -> notification tool (preview mode) -> notification preview shown to operator
Providing input messages as function parameters and receiving output messages should be possible.
In this case, messages should not be serialized or encoded; they should stay `Message` objects (derived from `dict` and behaving like dictionaries).
It should also be possible to let the bot use the configured pipeline (e.g. Redis) and behave like a normal bot.
An exception raised in the bot's `process()` method should not be caught in intermediate layers, but should propagate to the caller.
Option: If there is a helper function that calls `process()` multiple times (for a batch of input messages), the exceptions are caught and made accessible to the caller together with the (dumped) messages.
The global IntelMQ configuration should take effect. The user may override configuration options by providing a configuration dictionary.
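A minimal sketch of such an optional helper, assuming it simply wraps a `process_message`-like callable: by default it collects each failing input message with its exception, and with `fail_fast=True` it re-raises the first exception unchanged. The name `process_messages` and the `fail_fast` flag are hypothetical, not part of IntelMQ.

```python
from typing import Callable, Dict, List, Tuple

SentMessages = Dict[str, List[dict]]


def process_messages(process_message: Callable[[dict], SentMessages],
                     messages: List[dict],
                     fail_fast: bool = False
                     ) -> Tuple[SentMessages, List[Tuple[dict, Exception]]]:
    """Call process_message once per input message (hypothetical helper).

    Returns the sent messages merged per queue, plus (input message, exception)
    pairs for the failed calls. With fail_fast=True the first exception propagates.
    """
    sent: SentMessages = {}
    errors: List[Tuple[dict, Exception]] = []
    for message in messages:
        try:
            queues = process_message(message)
        except Exception as exc:
            if fail_fast:
                raise
            errors.append((message, exc))
            continue
        for queue, queue_messages in queues.items():
            sent.setdefault(queue, []).extend(queue_messages)
    return sent, errors


# usage with a stand-in for bot.process_message
def fragile(message: dict) -> SentMessages:
    if 'source.fqdn' not in message:
        raise KeyError('source.fqdn')
    return {'output': [message]}


sent, errors = process_messages(fragile, [{'source.fqdn': 'a.example'}, {}])
# sent == {'output': [{'source.fqdn': 'a.example'}]}; errors holds one ({}, KeyError) pair
```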
It should be possible to run bots defined in IntelMQ's runtime configuration file. Additional overriding parameters can be provided.
It should be possible to run bots that are not defined in IntelMQ's runtime configuration file. The bot configuration is then provided as a function parameter.
Normally, bots handle signals like SIGTERM, SIGHUP and SIGINT specially.
In library mode, this would interfere with the signal handling of the calling code.
Thus, IntelMQ bots called as a library must not manipulate the signal handling or call `sys.exit`.
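One way to verify this requirement in a test suite is to snapshot the handlers of the signals named above before and after a library call, and to turn a `SystemExit` into a failure. This is a sketch using only the standard `signal` module; the helper name `call_without_side_effects` is hypothetical. `signal.signal` may only be called from the main thread, and SIGHUP does not exist on Windows.

```python
import signal
from typing import Any, Callable, List

# only check the signals that exist on this platform (SIGHUP is POSIX-only)
WATCHED_SIGNALS = [getattr(signal, name) for name in ('SIGTERM', 'SIGHUP', 'SIGINT')
                   if hasattr(signal, name)]


def call_without_side_effects(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Call func(*args, **kwargs); fail if it changed signal handlers or called sys.exit."""
    before = {signum: signal.getsignal(signum) for signum in WATCHED_SIGNALS}
    changed: List[str] = []
    try:
        result = func(*args, **kwargs)
    except SystemExit as exc:
        raise AssertionError(f'library code called sys.exit({exc.code!r})') from exc
    finally:
        for signum, handler in before.items():
            if signal.getsignal(signum) != handler:
                changed.append(signal.Signals(signum).name)
                # restore the caller's handler (None means it was not installed from Python)
                if handler is not None:
                    signal.signal(signum, handler)
    if changed:
        raise AssertionError(f'signal handlers modified: {changed}')
    return result
```

A test could then call e.g. `call_without_side_effects(bot.process_message, {'source.fqdn': 'www.example.com'})` and rely on the assertion errors.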
By default, IntelMQ logs to `/var/log/intelmq/` or `/opt/intelmq/var/log`, depending on the installation type and actual configuration.
When IntelMQ bots are called as a library by external scripts, logging to these files is in most cases unwanted and causes permission errors.
On the other hand, file logging might just as well be desired.
There should be an easy way to disable file logging, while keeping the possibility to use the default behavior.
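A minimal sketch of such a switch with the standard `logging` module: a file handler is only attached when a log directory is given, otherwise records go to stderr and no file is opened. The function name `setup_bot_logger` and the convention that `log_dir=None` disables file logging are assumptions for illustration, not IntelMQ's actual parameters.

```python
import logging
import os
import sys
from typing import Optional

LOG_FORMAT = '%(asctime)s - %(name)s - %(levelname)s - %(message)s'


def setup_bot_logger(bot_id: str, log_dir: Optional[str] = None,
                     level: int = logging.INFO) -> logging.Logger:
    """Return the logger for bot_id (hypothetical helper).

    With log_dir=None (library mode) records go to stderr and no file is touched,
    otherwise they are written to <log_dir>/<bot_id>.log.
    """
    logger = logging.getLogger(bot_id)
    logger.setLevel(level)
    # drop handlers of a previous setup so repeated calls do not duplicate output
    for old_handler in list(logger.handlers):
        logger.removeHandler(old_handler)
        old_handler.close()
    if log_dir is None:
        handler: logging.Handler = logging.StreamHandler(sys.stderr)
    else:
        handler = logging.FileHandler(os.path.join(log_dir, f'{bot_id}.log'))
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    logger.addHandler(handler)
    return logger
```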
IntelMQ in library mode must not depend on existing IntelMQ configuration files or logging directories, but must be able to behave as IntelMQ normally does. It is up to the user to decide the behavior, e.g. whether the bot's log should be written to files.
IntelMQ normally loads the runtime configuration file `/etc/intelmq/runtime.yaml` or `/opt/intelmq/etc/runtime.yaml`.
In library mode, IntelMQ tries to load the file, but continues normally if it does not exist.
IntelMQ normally loads the harmonization configuration file `/etc/intelmq/harmonization.yaml` or `/opt/intelmq/etc/harmonization.yaml`.
In library mode, IntelMQ tries to load the file, and if it does not exist, loads the internal default harmonization configuration, which is part of the IntelMQ package.
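The two fallback rules could be sketched as follows. IntelMQ's files are YAML; to keep the sketch dependency-free, it takes a `parse` function (JSON by default, a YAML parser in practice). The function names and the `DEFAULT_HARMONIZATION` placeholder are hypothetical; the real default harmonization ships inside the IntelMQ package.

```python
import copy
import json
from typing import Any, Callable, Dict, Iterable, Optional

# hypothetical placeholder for the harmonization configuration shipped with the package
DEFAULT_HARMONIZATION: Dict[str, Any] = {'event': {}, 'report': {}}


def load_first_existing(paths: Iterable[str],
                        parse: Callable[[str], Any] = json.loads) -> Optional[Any]:
    """Parse the first of `paths` that exists; return None if none of them exists."""
    for path in paths:
        try:
            with open(path, encoding='utf-8') as handle:
                return parse(handle.read())
        except FileNotFoundError:
            continue
    return None


def load_runtime_config(paths: Iterable[str],
                        parse: Callable[[str], Any] = json.loads) -> Dict[str, Any]:
    # library mode: a missing runtime configuration is not an error
    config = load_first_existing(paths, parse)
    return config if config is not None else {}


def load_harmonization(paths: Iterable[str],
                       parse: Callable[[str], Any] = json.loads) -> Dict[str, Any]:
    # library mode: fall back to the default harmonization shipped with the package
    config = load_first_existing(paths, parse)
    return config if config is not None else copy.deepcopy(DEFAULT_HARMONIZATION)
```

With a YAML parser passed as `parse`, `load_runtime_config(['/etc/intelmq/runtime.yaml', '/opt/intelmq/etc/runtime.yaml'], parse=...)` would follow the lookup order of the two installation types.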
Since the beginning of IntelMQ, the bots' `process` methods use the methods `self.receive_message`, `self.acknowledge_message` and `self.send_message`.
Breaking this paradigm by switching to method parameters and return values or generator yields would constitute an API change and thus require IntelMQ version 4.0.
Thus, we stick to the current behavior.
Only changes in the `intelmq.lib.bot.Bot` class are needed.
No changes in the bots' code are required.
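To illustrate why the bots' code can stay unchanged, here is a heavily simplified, self-contained sketch: `process_message` places the input where `receive_message` will find it, calls the unmodified `process()` method, and records everything passed to `send_message` per destination queue instead of writing to a pipeline. `MiniBot`, its private attributes, the simplified `send_message` signature and the `destination_queues` layout (path to list of queue names) are made up for this example; they are not the actual `intelmq.lib.bot.Bot` internals.

```python
from typing import Dict, List, Optional


class MiniBot:
    """Hypothetical, heavily simplified stand-in for intelmq.lib.bot.Bot."""

    def __init__(self, bot_id: str, destination_queues: Dict[str, List[str]]):
        self.bot_id = bot_id
        self.destination_queues = destination_queues  # path -> queue names
        self._input: Optional[dict] = None
        self._sent: Dict[str, List[dict]] = {}

    # the three methods bots already use in their process() implementation
    def receive_message(self) -> Optional[dict]:
        return self._input

    def acknowledge_message(self) -> None:
        self._input = None

    def send_message(self, message: dict, path: str = '_default') -> None:
        # record the message for every queue of the path instead of using a pipeline
        for queue in self.destination_queues.get(path, []):
            self._sent.setdefault(queue, []).append(message)

    def process(self) -> None:  # implemented by each bot
        raise NotImplementedError

    # the new library entry point
    def process_message(self, message: Optional[dict] = None) -> Dict[str, List[dict]]:
        self._input = message
        self._sent = {}
        self.process()  # exceptions propagate to the caller unchanged
        return self._sent


class LowercaseFqdnExpert(MiniBot):
    """Example bot following the existing receive/send/acknowledge paradigm."""

    def process(self) -> None:
        event = self.receive_message()
        event['source.fqdn'] = event['source.fqdn'].lower()
        self.send_message(event)
        self.acknowledge_message()


bot = LowercaseFqdnExpert('lowercase', {'_default': ['output']})
print(bot.process_message({'source.fqdn': 'WWW.Example.COM'}))
# {'output': [{'source.fqdn': 'www.example.com'}]}
```

A collector never calls `receive_message`, so in this sketch it would simply be called with `message=None`.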
The operator constructs the bot by instantiating the bot's class. Global and bot configuration parameters are provided as a parameter to the constructor, in the same format as in the IntelMQ runtime configuration.
```python
from typing import Optional


class Bot:
    def __init__(self, bot_id: str,
                 *args,  # any other parameters left out for clarity
                 settings: Optional[dict] = None,
                 **kwargs):
        ...
```
After reading the runtime configuration file, the constructor applies all values of the `settings` parameter.
The `intelmq.lib.bot.Bot` class gets a new method, `process_message`.
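The resulting precedence could look like the following sketch: values from the runtime configuration's global section, then the bot's own parameters, then the `settings` argument, each overriding the previous. The assumed runtime layout (a `global` mapping plus a `parameters` mapping per bot id) mirrors IntelMQ 3's `runtime.yaml`, but the helper `effective_parameters` itself is hypothetical.

```python
from typing import Any, Dict, Optional


def effective_parameters(runtime: Dict[str, Any], bot_id: str,
                         settings: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Merge bot parameters; later sources override earlier ones (hypothetical helper)."""
    merged: Dict[str, Any] = {}
    merged.update(runtime.get('global', {}))
    # bots not defined in the runtime configuration simply contribute nothing here
    merged.update(runtime.get(bot_id, {}).get('parameters', {}))
    merged.update(settings or {})
    return merged


runtime = {'global': {'logging_level': 'INFO', 'rate_limit': 0},
           'domain-suffix': {'parameters': {
               'field': 'fqdn',
               'suffix_file': '/usr/share/publicsuffix/public_suffix_list.dat'}}}
print(effective_parameters(runtime, 'domain-suffix', {'logging_level': 'DEBUG'}))
# {'logging_level': 'DEBUG', 'rate_limit': 0, 'field': 'fqdn', 'suffix_file': '/usr/share/publicsuffix/public_suffix_list.dat'}
```

The same function covers bots that are absent from the runtime configuration: `effective_parameters({}, 'new-bot', {'field': 'fqdn'})` yields just the provided settings.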
The definition:
```python
from typing import Optional

import intelmq.lib.message


class Bot:
    def process_message(self, message: Optional[intelmq.lib.message.Message] = None):
        ...
```
For collectors:
It takes no messages as input and returns the messages sent by the bot.
For parsers, experts and outputs:
It takes exactly one message as input and returns the messages sent by the bot.
The messages are neither serialized nor encoded in any form, but are objects of the `intelmq.lib.message.Message` class. If the message is a `dict` instance (with or without a `__type` item), it is automatically converted to the appropriate `Message` object (`Report` or `Event`, depending on the bot type).
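A sketch of this conversion rule, using hypothetical `dict` subclasses in place of IntelMQ's `Report` and `Event` (the real classes also validate fields against the harmonization). The mapping of parsers to reports and of experts and outputs to events follows IntelMQ's message flow; the function name `to_message` is made up, and an explicit `__type` item is assumed to take precedence.

```python
from typing import Dict, Type


class Message(dict):
    """Hypothetical stand-in for intelmq.lib.message.Message."""


class Report(Message):
    pass


class Event(Message):
    pass


TYPES: Dict[str, Type[Message]] = {'Report': Report, 'Event': Event}
# input message type per bot group; collectors take no input message
INPUT_TYPE_BY_GROUP = {'Parser': 'Report', 'Expert': 'Event', 'Output': 'Event'}


def to_message(message: dict, bot_group: str) -> Message:
    """Convert a plain dict to a Report or Event; Message objects pass through."""
    if isinstance(message, Message):
        return message
    data = dict(message)  # do not modify the caller's dict
    type_name = data.pop('__type', INPUT_TYPE_BY_GROUP[bot_group])
    return TYPES[type_name](data)


report = to_message({'raw': 'Zm9v'}, 'Parser')
event = to_message({'__type': 'Event', 'source.fqdn': 'www.example.com'}, 'Parser')
print(type(report).__name__, type(event).__name__)  # Report Event
```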
The return value holds the messages sent by the bot. Exceptions raised by the bot are not caught; the caller should handle them according to their needs. The bot does not dump any messages to files on errors, regardless of the bot's dumping configuration.
As bots can send messages to multiple queues, the return value is a dictionary with one item per destination queue. The values are lists holding the messages sent to that queue.
Processing multiple messages in one call is a more complex situation with regard to error handling. Should one exception stop the processing? Or should the processing continue, with the exceptions saved and returned at the end together with the sent messages?
```python
from intelmq.bots.experts.domain_suffix.expert import DomainSuffixExpertBot
from intelmq.lib.bot import BotLibSettings

domain_suffix = DomainSuffixExpertBot('domain-suffix',  # bot id
                                      settings=BotLibSettings | {
                                          'field': 'fqdn',
                                          'suffix_file': '/usr/share/publicsuffix/public_suffix_list.dat'})
queues = domain_suffix.process_message({'source.fqdn': 'www.example.com'})

# Select the output queue (as defined in `destination_queues`), first message, access the field 'source.domain_suffix':
# >>> queues['output'][0]['source.domain_suffix']
# 'com'
```
```python
from intelmq.lib import utils
from intelmq.lib.bot import BotLibSettings
from intelmq.lib.message import MessageFactory
from intelmq.tests.lib import test_parser_bot

EXAMPLE_REPORT = {"feed.url": "http://www.example.com/",
                  "time.observation": "2015-08-11T13:03:40+00:00",
                  "raw": utils.base64_encode(test_parser_bot.RAW),
                  "__type": "Report",
                  "feed.name": "Example"}

bot = test_parser_bot.DummyParserBot('dummy-bot', settings=BotLibSettings |
                                     {'destination_queues': {'_default': 'output',
                                                             '_on_error': 'error'}})
sent_messages = bot.process_message(EXAMPLE_REPORT)
# sent_messages is now a dict with all queues. queue names below are examples
# this is the output queue
assert sent_messages['output'][0] == MessageFactory.from_dict(test_parser_bot.EXAMPLE_EVENT)
# this is a dumped message; input_message (the expected dump) is defined in the full PoC test
assert sent_messages['error'][0] == input_message
```