This is a fork of the dataset at https://github.com/chesvectain/PackingData with some samples sanitized (e.g. UPX-packed samples found in the `not-packed` folder, or samples with the same hash in both the packer and `not-packed` folders).
It also includes a folder named `outliers` containing samples that we identified as potentially disturbing our models, i.e. samples that were sorted among the not-packed samples while showing characteristics of packed data. This dataset can be used for training machine learning models tailored to the packing of PE files.
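The hash-based check mentioned above can be sketched as follows. This is a minimal illustration, not the procedure actually used to build this fork: the function names are hypothetical, SHA-256 is an assumed choice of hash, and the two directories are passed in by the caller.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hashes_in(folder):
    """Map each digest to the files found (recursively) under `folder`."""
    found = {}
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            found.setdefault(sha256_of(path), []).append(path)
    return found


def cross_folder_duplicates(packed_dir, not_packed_dir):
    """Return digests present in both folders, with the matching files.

    The result maps digest -> (files under packed_dir, files under not_packed_dir).
    """
    packed = hashes_in(packed_dir)
    not_packed = hashes_in(not_packed_dir)
    shared = packed.keys() & not_packed.keys()
    return {digest: (packed[digest], not_packed[digest]) for digest in shared}
```

Any sample reported by `cross_folder_duplicates` carries two contradictory labels, so it is a candidate for manual review before training.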
The `labels` folder contains a Python script for generating labels based on the packer categories listed in the table of the `packed` folder's README.md, along with the resulting JSON dictionaries.
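To give an idea of what such label generation might look like, here is a minimal sketch. It is not the script shipped in the `labels` folder: the function names, the one-subfolder-per-packer layout, the `"unknown"` fallback and the category mapping are all assumptions made for illustration, and the real categories are those of the table in the `packed` folder's README.md.

```python
import json
from pathlib import Path

# Illustrative mapping only (assumption); the actual packer categories are
# defined by the table in the packed folder's README.md and may differ.
PACKER_CATEGORIES = {"UPX": "compressor", "Themida": "protector"}


def build_labels(packed_dir, not_packed_dir, categories=PACKER_CATEGORIES):
    """Return a {sample file name: label} dictionary.

    Assumes `packed_dir` holds one subfolder per packer (hypothetical layout).
    """
    labels = {}
    for path in sorted(Path(not_packed_dir).rglob("*")):
        if path.is_file():
            labels[path.name] = "not-packed"
    for packer_dir in sorted(Path(packed_dir).iterdir()):
        if not packer_dir.is_dir():
            continue  # skip files such as README.md
        label = categories.get(packer_dir.name, "unknown")
        for path in sorted(packer_dir.rglob("*")):
            if path.is_file():
                labels[path.name] = label
    return labels


def write_labels(labels, destination):
    """Dump the label dictionary as a JSON file."""
    text = json.dumps(labels, indent=2, sort_keys=True)
    Path(destination).write_text(text, encoding="utf-8")
```

Keying the dictionary by file name keeps the JSON compact, but it assumes sample names are unique across folders; keying by hash would avoid that assumption.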
You may also like these:
- Awesome Executable Packing: A curated list of awesome resources related to executable packing.
- Bintropy: Analysis tool for estimating the likelihood that a binary contains compressed or encrypted bytes (inspired by this paper).
- Dataset of packed ELF files: Dataset of ELF samples packed with many different packers.
- Docker Packing Box: Docker image gathering packers and tools for making datasets of packed executables.
- DSFF: Library implementing the DataSet File Format (DSFF).
- PEiD: Python implementation of the well-known Packed Executable iDentifier (PEiD).
- PyPackerDetect: Packing detection tool for PE files (fork of this repository).
- REMINDer: Packing detector using a simple heuristic (inspired by this paper).
Example of visualization created with Bintropy: