A tool for detecting botnets, built on an existing P2P botnet packet dataset and a healthy packet dataset. It uses machine learning to distinguish botnet traces from normal traces.
- Extract features with TShark and numpy
- Train and generate results with sklearn's ExtraTreesClassifier
- High true rate of 99%
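The training step in the list above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the project's actual `getResult` code: the feature values, the `true_rate` helper, and the parameter choices (`n_estimators=100`, `random_state=0`) are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

def true_rate(y_true, y_pred):
    """Fraction of samples whose predicted label matches the real label."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

# Synthetic stand-in for flow features (hypothetical values, 4 features per flow).
rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 4))
botnet = rng.normal(loc=5.0, scale=1.0, size=(200, 4))
X = np.vstack([normal, botnet])
y = np.array([0] * 200 + [1] * 200)  # 0 = normal, 1 = botnet

# Shuffle, then use half of the data to train and the other half to test.
order = rng.permutation(len(X))
X, y = X[order], y[order]
half = len(X) // 2

clf = ExtraTreesClassifier(n_estimators=100, random_state=0)
clf.fit(X[:half], y[:half])
predictions = clf.predict(X[half:])
rate = true_rate(y[half:], predictions)
print(f"true rate: {rate:.2%}")
```

The synthetic clusters are deliberately far apart, so the score printed here says nothing about performance on real traffic.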
- `dataExtraction.py`: Extracts packet data from pcap files and saves it as `name.csv`.
- `generateFlow.py`: Combines packets with the same sender IP and receiver IP into flows and saves the flows to `name.flow.csv`.
- `featuresExtraction.py`: Extracts features from the flows and saves them as `name.features.csv`.
- `flowMix`: Generates the train and test files by combining the normal dataset and the malicious dataset, and saves them as `test.csv`, `train.csv` and `testStandard.csv`.
- `getResult`: Trains the model and reports the true rate.
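To illustrate the flow-generation idea (packets sharing the same sender IP and receiver IP are merged into one flow), here is a small sketch. The field names `src`, `dst`, `length` and `time`, and the statistics computed per flow, are hypothetical choices for the example; the real columns and features used by the scripts are not specified here.

```python
from collections import defaultdict

def group_into_flows(packets):
    """Group packet records by their (sender IP, receiver IP) pair."""
    flows = defaultdict(list)
    for pkt in packets:
        flows[(pkt["src"], pkt["dst"])].append(pkt)
    return dict(flows)

def flow_features(flow):
    """Compute a few simple per-flow statistics (illustrative choices only)."""
    lengths = [p["length"] for p in flow]
    times = [p["time"] for p in flow]
    return {
        "packet_count": len(flow),
        "total_bytes": sum(lengths),
        "mean_length": sum(lengths) / len(lengths),
        "duration": max(times) - min(times),
    }

# Hypothetical packet records, as they might look after reading a packet CSV.
packets = [
    {"src": "10.0.0.1", "dst": "10.0.0.2", "length": 60, "time": 0.0},
    {"src": "10.0.0.1", "dst": "10.0.0.2", "length": 1500, "time": 0.5},
    {"src": "10.0.0.3", "dst": "10.0.0.2", "length": 40, "time": 0.1},
]
flows = group_into_flows(packets)
for key, flow in flows.items():
    print(key, flow_features(flow))
```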
- Put a healthy pcap dataset and a botnet/suspicious pcap dataset in the project root directory.
- Modify `constants.py`: put the file names you want to use in `FILENAMES`.
- Modify `all.sh`: on the fourth line, change the second and third parameters to `healthy pcap filename` + `.features.csv` and `malicious pcap filename` + `.features.csv`. This program uses half of the total data for training and the other half for testing; you can change the ratio yourself.
- In the terminal, run the command `. all.sh`
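The effect of changing the train/test ratio can be sketched as below. This is not the actual `flowMix` implementation: the function `mix_and_split`, its `train_ratio` and `seed` parameters, and the sample rows are hypothetical and only show the general idea of labeling, mixing, and splitting the two datasets.

```python
import random

def mix_and_split(normal_rows, malicious_rows, train_ratio=0.5, seed=0):
    """Label rows (0 = normal, 1 = malicious), shuffle them, and split by ratio."""
    labeled = [(row, 0) for row in normal_rows] + [(row, 1) for row in malicious_rows]
    random.Random(seed).shuffle(labeled)  # fixed seed keeps the split reproducible
    cut = int(len(labeled) * train_ratio)
    return labeled[:cut], labeled[cut:]

# Hypothetical feature rows standing in for the two .features.csv files.
normal = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.1, 0.4]]
malicious = [[5.0, 4.8], [4.9, 5.2], [5.1, 5.0], [4.7, 5.3]]

train, test = mix_and_split(normal, malicious, train_ratio=0.5)
print(len(train), len(test))
```

Passing a different `train_ratio`, for example `0.75`, moves more rows into the training set and leaves fewer for testing.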