ENH: Caching, advanced mapping and separating events for MISP Feed output bot #2509

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Open

kamil-certat wants to merge 17 commits into certtools:develop from kamil-certat:misp_feed_mapping

Contributor

kamil-certat commented Jul 4, 2024 •

edited by sebix

Loading

MISP Feed output bot got new features:

caching incoming messages to not re-generate on every run;
an ability to select which fields should be mapped and set additional parameters for attributes;
an ability to group messages in different MISP events, based on a field.

The bot is fully backward-compatible. By default, the previous behaviour is kept.

In addition, code related to Python 3.6 was removed and the message library was fixed not to modify the original dict instance.

This PR replaces on PR #2505.

kamil-certat requested review from sebix and aaronkaplan

July 4, 2024 14:45

kamil-certat force-pushed the misp_feed_mapping branch 3 times, most recently from b4f2e68 to 0e0b949 Compare

July 4, 2024 14:54

kamil-certat added the component: bots label

Contributor Author

kamil-certat commented Jul 8, 2024

Yeah, finally green 🎉

Member

sebix commented Jul 8, 2024

Are you using it in production?

Contributor Author

kamil-certat commented Jul 8, 2024

We started using it on staging, found some pain points, and now I'll test it & hopefully promote to prod in a few days - so, not yet, but soon ;)

Member

sebix commented Jul 9, 2024

There's not much I can check here without reading the MISP code and docs and setting up an instance.

Maybe @Rafiot, can you have a glimpse?

sebix removed their request for review

July 9, 2024 07:36

aaronkaplan requested changes

View reviewed changes

Member

aaronkaplan left a comment

minor changes requested pls.

docs/user/bots.md

		@@ -4595,6 +4595,44 @@ The PyMISP library >= 2.4.119.1 is required, see
		() The output bot creates one event per each interval, all data in this time frame is part of this event. Default "1
		hour", string.

Member

aaronkaplan Jul 9, 2024

what happens when set to 0 ?

Contributor Author

kamil-certat Jul 16, 2024

I've looked in the code, and it looks like in such case, this would be a ValueError. However, you can use "0 hours" what should result in creating a new MISP event for every IntelMQ event.

docs/user/bots.md Show resolved Hide resolved

docs/user/bots.md Outdated

+              (optional, string): If set to a field name from IntelMQ event, the bot will group incoming messages
+              in separated MISP events, based on the value of this field. The `interval_event` parameter acts
+              for all grouping events together.

Member

aaronkaplan Jul 9, 2024

this is not 100% clear for me yet:

will it collect events until (... until when?...) and then sort and group by this viel?
I think rephrasing this slightly makes it clearer in the documentaiton.

Contributor Author

kamil-certat Jul 9, 2024

What do you think about the current version?

will it collect events until (... until when?...) and then sort and group by this viel?

This field is responsible only for separating using the field. Collecting & holding is a different setting, that can be used together, but you don't have to. Let's have a look at the new description

intelmq/bots/outputs/misp/output_feed.py

-                  output_dir: str = "/opt/intelmq/var/lib/bots/mispfeed-output"  # TODO: should be path
+                  output_dir: str = (
+                      "/opt/intelmq/var/lib/bots/mispfeed-output"  # TODO: should be path
+                  )

Member

aaronkaplan Jul 9, 2024

why the tuple?

Contributor Author

kamil-certat Jul 9, 2024

It's not a tuple ;) It's a Python way to put the string into next line (or split into multiple strings) to avoid line-length limit - although it's my fault. The rules we enforce in the repository are very weak, so weak that my almost every time wants tot reformat a lot, and sometimes I forget reverting changes not related to my code.

BTW, I think more and more that we would profit from enforcing black/ruff instead of weak pycodestyle.

Member

aaronkaplan Jul 29, 2024

I am also using black more and more...

intelmq/bots/outputs/misp/output_feed.py Outdated Show resolved Hide resolved

intelmq/bots/outputs/misp/output_feed.py

+                      else:
+                          self._custom_mapping(obj, message)
+                  def _default_mapping(self, obj: "MISPObject", message: dict):

Member

aaronkaplan Jul 9, 2024

docstring?

Contributor Author

kamil-certat Jul 9, 2024

Disagree it's needed here.

intelmq/bots/outputs/misp/output_feed.py Show resolved Hide resolved

intelmq/bots/outputs/misp/output_feed.py Outdated


		feed_output = self.current_event.to_feed(with_meta=False)
		def _custom_mapping(self, obj: "MISPObject", message: dict):

Member

aaronkaplan Jul 9, 2024

same

Contributor Author

kamil-certat Jul 9, 2024

Disagree it's needed here.

intelmq/bots/outputs/misp/output_feed.py Outdated

-                      with self.current_file.open('w') as f:
-                          json.dump(feed_output, f)
+                  def _generate_feed(self, message: dict = None):

Member

aaronkaplan Jul 9, 2024

same

Contributor Author

kamil-certat Jul 9, 2024

Disagree it's needed here.

intelmq/lib/mixins/cache.py Outdated Show resolved Hide resolved

Member

aaronkaplan commented Jul 9, 2024

My main request is:

give the possibility (or document it) how to add default tags to events (such as "FEED:Source=intelmq") to any event which gets generated via this feed. Adding a default tag helps in sorting out automatically added events and attributes in case something goes wrong.
make sure the users understand in the documentation how feeds work. Linking to the MISP documentation and what each parameter when including the MISP feed actually means.
For example, did you know that if you set "caching:enabled" in a MISP feed, then that means the feed events and attributes don't get added to the events (and attributes) table in MISP.
It's totally non-intuitive.
Why am I saying this? I had to figure it out the hard way myself.
basically we need a tutorial on docs.intelmq.org on how to run this properly!
It's a very cool feature and addition but you can quickly shoot yourself in the foot if you don't know how it works and what every parameter in MISP means.

Contributor Author

kamil-certat commented Jul 9, 2024

Thanks for the review!

re 1: It's a good idea, I'll look into it. I was concentrated on what we actually need, and I do think there are more things we can improve in the bot.
re 2: It's a good point, I will point to the MISP documentation. I also had to explore what we really can do, the PyMISP documentation wasn't really helpful :D
re 3: yeah, I agree it needs a tutorial (and I've planned it), but it's a separated thing, I won't include it here. My plan is to write a blog post once we establish the final integration internally, and then we could include it in the documentation as well. I expect to publish it around September, hopefully. There is a queue of what I'd like to write about, and a limited time ;)

kamil-certat added 14 commits

July 16, 2024 14:24


          ENH: Add possibility to delay generating MISP Feed

5caa3e0

Generating MISP feed on every incoming message slows down processing.
The new config option let us decide to save them in batches. Cached
events are stored in a cache list in Redis. In addition, a code
related to Python 3.6 was removed as we do not support this version
any more.


          Add documentation. Fix code compatibility

efb761d


          Fix spelling

b38bbf7


          ENH: Add attribute mapping

659b941

The bot can now construct an event much more alligned to custom
needs, allowing setting comments and selecting just a subset of
fields to export


          ENG: Add support for creating separated MISP Events

62a87e3

With `event_separator` parameter, user can decide to create
more than one MISP event in the output bot and group incomming
messages based on given field.

In additon, the message library was fixed not to modify the
parameter directly.


          Revert early copying message

2b8b617

A lot of tests depend on that, so it looks currently risky
to change.


          FIX: Handle not existing fields with manual mapping


          ENH: Add option to extend default info

6b2f8b7


          Fix typos

3c86980


          ENH: add support for tagging

d014e93


          Fix generating on restart

b1ac8e2


          ENH: Add tagging, check, and improved docs

85ddfb3


          DOC: Update documentation about CacheMixin

7c33962


          Adjust to pycodestyle

05f4aef

kamil-certat force-pushed the misp_feed_mapping branch from 246e181 to 05f4aef Compare

July 16, 2024 12:25


          Fix typo

c57121a


          Clean up imports in tests

2b8fac7

Contributor Author

kamil-certat commented Jul 16, 2024

Could you have a look again? I have implemented tagging as well as rewritten the documentation, added config examples, implemented validation in the check method and refactored code, including renaming methods to be more clear what they do or adding doc strings where I found it useful. I have also added descriptions to complex test cases explaining what should happen.

I do admit that the configuration of the bot is complex. I did my best to marriage flexibility and readability, but I think in the future we may eventually redesign this configuration, based on feedback. I'm also open for any suggestions.

kamil-certat requested a review from aaronkaplan

July 25, 2024 09:21

Member

sebix commented Aug 27, 2024

@aaronkaplan you requested some changes to this PR and @kamil-certat implemented various changes based on your feedback. Can you please re-check?


          Merge branch 'develop' into misp_feed_mapping

c4f6b7e

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

component: bots