Module Todo: Document Metadata Extraction #717
Note: it would also be nice to emit the text from these documents as a generic event consumable by
As a precursor to this, I have written a proof-of-concept. @nicpenning here is the module:
You can use it like this:
bbot -t evilcorp.com -f subdomain-enum -m filedownload
Pairing it with the web spider can also be very effective:
bbot -t evilcorp.com -f subdomain-enum -m filedownload -c web_spider_depth=2 web_spider_distance=2
This is probably relevant to this discussion: #907 (comment). Now there are As mentioned in the linked discussion, that is an ML model to detect human passwords in several file formats. Perhaps more interesting, though, is that it uses Apache Tika to extract the strings from
which we could then raise as
Circling back around to this one: recently we've run into problems with unstructured. Overall, it's great that unstructured runs without a server component and without a Java dependency. However, we should be on the lookout for a better alternative, preferably one written in Rust or Go. These seem to be just starting to emerge. @domwhewell-sage this is one to keep an eye on:
It would be useful to have a collection of modules that download documents (.pdf, .docx, etc.) and extract useful metadata such as usernames and internal domain names. Thanks to @pjhartlieb and @Sw3d1shPh1sh for requesting.
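To make the idea concrete, here is a rough sketch of the extraction step for one file type. It is not the module's actual code and does not use any BBOT API; the function name `extract_docx_metadata` and the returned key names are made up for illustration. It assumes the file has already been downloaded and is an Office Open XML package (.docx, .xlsx, .pptx), which is a zip archive that normally carries author fields in `docProps/core.xml`.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# XML namespaces used by docProps/core.xml in Office Open XML packages
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def extract_docx_metadata(data: bytes) -> dict:
    """Hypothetical helper: pull author-like fields out of an OOXML file.

    Returns a dict with any of the keys "creator" and "last_modified_by"
    that are present and non-empty in the core properties part.
    """
    found = {}
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        if "docProps/core.xml" not in zf.namelist():
            return found  # no core properties part, nothing to report
        root = ET.fromstring(zf.read("docProps/core.xml"))
    fields = (("creator", "dc:creator"), ("last_modified_by", "cp:lastModifiedBy"))
    for key, path in fields:
        element = root.find(path, NS)
        if element is not None and element.text:
            found[key] = element.text.strip()
    return found
```

A module could call something like this on each downloaded file and report the values as usernames. Values such as `EVILCORP\adoe` would also hint at an internal domain name, though splitting them reliably is left out of this sketch.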
Also, per @nicpenning:
Would require:
EDIT: Possible sources of metadata-extraction logic: