LogIntelligence/LogPPT

LogPPT 2.0: Effective and Efficient Log Parsing with Prompt-based Few-shot Learning

I. Framework


An overview of LogPPT

LogPPT consists of the following components:

  1. Data Sampling: A few-shot data sampling algorithm, which is used to select $K$ labelled logs for training ($K$ is small).
  2. Prompt-based Parsing: A module that tunes a pre-trained language model with prompt tuning for log parsing.
  3. Online Parsing: A caching mechanism to support efficient and consistent online parsing.
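To make the third component more concrete, here is a minimal sketch of a template cache for online parsing. It is not the repository's implementation: the `TemplateCache` class, the `template_to_regex` helper, and the `toy_parser` stand-in for the tuned model are all hypothetical names introduced for illustration. The idea is that a log matching a previously seen template is answered from the cache, so the model is called only for new templates and identical patterns always receive the same template.

```python
import re
from typing import Callable, Dict, Optional, Pattern


def template_to_regex(template: str) -> Pattern[str]:
    """Turn a template such as 'Exts skipped : <*>' into a compiled regex."""
    parts = [re.escape(part) for part in template.split("<*>")]
    return re.compile("^" + "(.+?)".join(parts) + "$")


class TemplateCache:
    """Hypothetical cache: stores templates and reuses them for matching logs."""

    def __init__(self, parse_with_model: Callable[[str], str]) -> None:
        self._parse_with_model = parse_with_model  # slow path (assumed: the tuned model)
        self._patterns: Dict[str, Pattern[str]] = {}
        self.model_calls = 0

    def lookup(self, log: str) -> Optional[str]:
        """Return a cached template that matches the log, or None."""
        for template, pattern in self._patterns.items():
            if pattern.match(log):
                return template
        return None

    def parse(self, log: str) -> str:
        cached = self.lookup(log)
        if cached is not None:
            return cached  # cache hit: no model call
        self.model_calls += 1
        template = self._parse_with_model(log)
        self._patterns[template] = template_to_regex(template)
        return template


def toy_parser(log: str) -> str:
    """Stand-in for the model: masks every token that contains a digit."""
    return " ".join("<*>" if any(c.isdigit() for c in tok) else tok for tok in log.split())


cache = TemplateCache(toy_parser)
first = cache.parse("Exts skipped : 17")
second = cache.parse("Exts skipped : 23")  # served from the cache
print(first, "|", second, "| model calls:", cache.model_calls)
```

A linear scan over cached patterns keeps the sketch short; a real system would likely index templates (for example by token count or a prefix tree) to keep lookups fast.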

II. Requirements

  1. Python >=3.9
  2. torch
  3. transformers
  4. ...

To install all required libraries:

$ pip install -r requirements.txt

2.2. Pre-trained models

To download the pre-trained language model:

$ cd pretrained_models/roberta-base
$ bash download.sh

III. Usage

3.1. Few-shot Data Sampling

$ cd demo
$ python 01_sampling.py

3.2. Training & Parsing

$ cd demo
$ export dataset=Apache
$ python 02_run_logppt.py \
    --log_file ../datasets/loghub-full/$dataset/${dataset}_full.log_structured.csv \
    --model_name_or_path roberta-base \
    --train_file ../datasets/loghub-full/$dataset/samples/logppt_32.json \
    --validation_file ../datasets/loghub-full/$dataset/validation.json \
    --dataset_name $dataset \
    --parsing_num_processes 4 \
    --output_dir ./results/models/$dataset \
    --task_output_dir ./results/logs \
    --max_train_steps 1000

The parsed logs (parsing results) are saved in the outputs folder.
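When the same command has to be issued for several datasets, it can help to assemble the argument list programmatically. The sketch below only rebuilds the command shown above from a dataset name; `build_logppt_command` is a hypothetical helper, not part of the repository, and it assumes every dataset follows the same directory layout as Apache.

```python
import shlex
from typing import List


def build_logppt_command(dataset: str, max_train_steps: int = 1000) -> List[str]:
    """Rebuild the 02_run_logppt.py invocation for one dataset (hypothetical helper)."""
    base = f"../datasets/loghub-full/{dataset}"  # assumed: same layout as Apache
    return [
        "python", "02_run_logppt.py",
        "--log_file", f"{base}/{dataset}_full.log_structured.csv",
        "--model_name_or_path", "roberta-base",
        "--train_file", f"{base}/samples/logppt_32.json",
        "--validation_file", f"{base}/validation.json",
        "--dataset_name", dataset,
        "--parsing_num_processes", "4",
        "--output_dir", f"./results/models/{dataset}",
        "--task_output_dir", "./results/logs",
        "--max_train_steps", str(max_train_steps),
    ]


# Dataset names must match the folder names under datasets/loghub-full.
for name in ["Apache"]:
    command = build_logppt_command(name)
    print(shlex.join(command))  # pass `command` to subprocess.run(..., cwd="demo") to execute
```

Building a list rather than a single string avoids shell-quoting problems if a path ever contains spaces.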

For the descriptions of all parameters, please use:

$ python 02_run_logppt.py --help

3.3. Evaluation

$ python 03_evaluate.py

** Implementations for baselines are adopted from Tools and Benchmarks for Automated Log Parsing, and Guidelines for Assessing the Accuracy of Log Message Template Identification Techniques.

4.2. RQ1: Parsing Effectiveness

  • Accuracy:

  • Robustness:


Robustness across different log data types


Robustness across different numbers of training data

  • Accuracy on Unseen Logs:


Accuracy on Unseen Logs

4.3. RQ2: Runtime Performance Evaluation


Running time of different log parsers under different log volumes

4.4. RQ3: Ablation Study

  • To measure the contribution of the Virtual Label Token Generation module, we exclude it and let the pre-trained model automatically assign the embedding for the virtual label token “I-PAR”. To measure the contribution of the Adaptive Random Sampling module, we remove it from our model and sample the log messages for labelling at random.


Ablation Study Results

  • We vary the number of label words used in the Virtual Label Token Generation module from 1 to 16.


Results with different numbers of label words

4.5. RQ4: Comparison with Different Tuning Techniques

We compare LogPPT with fine-tuning, hard-prompt tuning, and soft-prompt tuning.

  • Effectiveness:


Accuracy across different tuning methods

  • Efficiency:


Parsing time across different tuning methods

Additional results with PTA and RTA metrics

  • PTA: The ratio of correctly identified templates over the total number of identified templates.

  • RTA: The ratio of correctly identified templates over the total number of oracle templates.
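The two ratios above can be sketched in a few lines. This is an illustration under a simplifying assumption, not the evaluation script used in the repository: a template counts as "correctly identified" here when its string exactly equals an oracle template, whereas the benchmark's own criterion may be stricter. The function name `template_accuracy` and the toy templates are hypothetical.

```python
from typing import Iterable, Tuple


def template_accuracy(identified: Iterable[str], oracle: Iterable[str]) -> Tuple[float, float]:
    """Return (PTA, RTA), assuming exact string equality defines a correct template."""
    identified_set = set(identified)
    oracle_set = set(oracle)
    correct = identified_set & oracle_set
    # PTA: correct templates over all identified templates.
    pta = len(correct) / len(identified_set) if identified_set else 0.0
    # RTA: correct templates over all oracle templates.
    rta = len(correct) / len(oracle_set) if oracle_set else 0.0
    return pta, rta


# Toy data: 3 identified templates, 4 oracle templates, 2 in common.
identified = ["open file <*>", "close file <*>", "user <*> logged"]
oracle = ["open file <*>", "close file <*>", "user <*> logged in", "disk <*> full"]
pta, rta = template_accuracy(identified, oracle)
print(f"PTA={pta:.3f} RTA={rta:.3f}")
```

PTA penalises a parser for emitting spurious templates, while RTA penalises it for missing oracle templates, so the two are best read together.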

Parsing results with Build Log from LogChunks

| Raw logs | Events |
| --- | --- |
| TEST 9/13884 [2/2 concurrent test workers running] | TEST <*> [<*> concurrent test workers running] |
| (1.039 s) Test touch() function : basic functionality [ext/standard/tests/file/touch_basic.phpt] | <*> Test touch() function : basic functionality <*> |
| (120.099 s) Bug #60120 (proc_open hangs when data in stdin/out/err is getting larger or equal to 2048) [ext/standard/tests/file/bug60120.phpt] | <*> Bug <*> (proc_open hangs when data in <*> is getting larger or equal to <*>) <*> |
| SKIP Bug #54977 UTF-8 files and folder are not shown [ext/standard/tests/file/windows_mb_path/bug54977.phpt] reason: windows only test | SKIP Bug <*> UTF-8 files and folder are not shown <*> reason: windows only test |
| Exts skipped : 17 | Exts skipped : <*> |

Full results with 32 shots

About

Log Parsing with Prompt-based Few-shot Learning (ICSE 2023, Technical Track)
