LogPPT consists of the following components:
- Data Sampling: A few-shot data sampling algorithm, which is used to select $K$ labelled logs for training ($K$ is small).
- Prompt-based Parsing: A module to tune a pre-trained language model for log parsing using prompt tuning.
- Online Parsing: A caching mechanism to support efficient and consistent online parsing (illustrated by the sketch below).
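Conceptually, the online parsing stage consults a cache of already-identified templates before invoking the (more expensive) model on a new log message. The sketch below only illustrates that idea and is not LogPPT's actual implementation; the `TemplateCache` class, its token-count bucketing, and the `<*>`-to-regex conversion are assumptions made for this example.

```python
import re
from collections import defaultdict


class TemplateCache:
    """Illustrative template cache: try to match a new log message against
    previously parsed templates before falling back to the model."""

    def __init__(self):
        # Bucket cached templates by token count to keep lookups cheap.
        self._by_length = defaultdict(list)  # {num_tokens: [(template, compiled_regex)]}

    @staticmethod
    def _template_to_regex(template: str) -> re.Pattern:
        # Escape literal text, then turn each "<*>" placeholder into a wildcard.
        pattern = re.escape(template).replace(re.escape("<*>"), r"\S+")
        return re.compile(rf"^{pattern}$")

    def add(self, template: str) -> None:
        self._by_length[len(template.split())].append(
            (template, self._template_to_regex(template))
        )

    def match(self, log_message: str):
        # Return a cached template, or None so the caller can invoke the model.
        for template, regex in self._by_length[len(log_message.split())]:
            if regex.match(log_message):
                return template
        return None


cache = TemplateCache()
cache.add("Exts skipped : <*>")
print(cache.match("Exts skipped : 17"))  # -> "Exts skipped : <*>" (cache hit)
print(cache.match("Exts warned : 3"))    # -> None (would be parsed by the model)
```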
The main requirements are:
- Python >= 3.9
- torch
- transformers
- ...
To install all required libraries:
$ pip install -r requirements.txt
To download the pre-trained language model:
$ cd pretrained_models/roberta-base
$ bash download.sh
To run the few-shot data sampling:
$ cd demo
$ python 01_sampling.py
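Conceptually, this step selects a small but diverse set of log messages to label. The sketch below shows one simple way such a selection could work (greedy max-min selection over token-level Jaccard similarity); it is an illustrative assumption, not the exact algorithm implemented in `01_sampling.py`.

```python
import random


def diversity_sample(logs: list[str], k: int, seed: int = 0) -> list[str]:
    """Greedily pick the log message farthest from everything selected so far,
    so the K labelled examples cover diverse templates."""
    def similarity(a: str, b: str) -> float:
        ta, tb = set(a.split()), set(b.split())
        return len(ta & tb) / max(len(ta | tb), 1)  # token-level Jaccard

    random.seed(seed)
    selected = [random.choice(logs)]
    while len(selected) < k:
        # A candidate's distance is 1 - similarity to its closest selected log.
        farthest = max(logs, key=lambda log: min(1 - similarity(log, s) for s in selected))
        selected.append(farthest)
    return selected


logs = [
    "Exts skipped : 17",
    "Exts skipped : 9",
    "SKIP Bug #54977 UTF-8 files and folder are not shown",
    "TEST 9/13884 [2/2 concurrent test workers running]",
]
print(diversity_sample(logs, k=2))
```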
To train LogPPT and parse a log dataset (e.g., Apache):
$ cd demo
$ export dataset=Apache
$ python 02_run_logppt.py --log_file ../datasets/loghub-full/$dataset/${dataset}_full.log_structured.csv --model_name_or_path roberta-base --train_file ../datasets/loghub-full/$dataset/samples/logppt_32.json --validation_file ../datasets/loghub-full/$dataset/validation.json --dataset_name $dataset --parsing_num_processes 4 --output_dir ./results/models/$dataset --task_output_dir ./results/logs --max_train_steps 1000
The parsed logs (parsing results) are saved in the outputs folder.
For the descriptions of all parameters, please use:
$ python 02_run_logppt.py --help
To evaluate the parsing results:
$ python 03_evaluate.py
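The evaluation script reports standard log parsing metrics such as grouping accuracy (the fraction of log messages placed into exactly the right group). The sketch below shows one way grouping accuracy can be computed from ground-truth and predicted template assignments; it is illustrative and not necessarily how `03_evaluate.py` implements it.

```python
import pandas as pd


def grouping_accuracy(ground_truth: pd.Series, predicted: pd.Series) -> float:
    """A log message counts as correctly parsed only if its predicted group
    contains exactly the same set of messages as its ground-truth group."""
    correct = 0
    gt_groups = ground_truth.groupby(ground_truth).groups    # template -> row labels
    pred_groups = predicted.groupby(predicted).groups
    for indices in gt_groups.values():
        predicted_templates = predicted.loc[indices].unique()
        if len(predicted_templates) == 1 and \
                len(pred_groups[predicted_templates[0]]) == len(indices):
            correct += len(indices)
    return correct / len(ground_truth)


# Toy example: the second ground-truth event is split into two predicted groups.
gt = pd.Series(["E1", "E1", "E2", "E2"])
pred = pd.Series(["T1", "T1", "T2", "T3"])
print(grouping_accuracy(gt, pred))  # 0.5
```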
Note: Implementations for the baselines are adopted from "Tools and Benchmarks for Automated Log Parsing" and "Guidelines for Assessing the Accuracy of Log Message Template Identification Techniques".
We evaluate LogPPT along the following dimensions:
- Accuracy
- Robustness:
  - Robustness across different log data types
  - Robustness across different numbers of training data
- Accuracy on Unseen Logs
- Efficiency:
  - Running time of different log parsers under different volumes
Ablation study:
- Virtual Label Token Generation: we exclude this module and let the pre-trained model automatically assign the embedding for the virtual label token “I-PAR” (a sketch of the label-word initialisation idea follows below).
- Adaptive Random Sampling: to measure the contribution of this module, we remove it from our model and randomly sample the log messages for labelling.
- Number of label words: we vary the number of label words used in the Virtual Label Token Generation module from 1 to 16.
  - Results with different numbers of label words
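For intuition, the Virtual Label Token Generation module initialises the embedding of the virtual label token from label words instead of leaving it random. Below is a minimal sketch of that idea using Hugging Face Transformers; the label words listed are made-up placeholders (LogPPT derives label words from the sampled training data), and the snippet is an illustration rather than the project's actual code.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Hypothetical label words standing in for frequent parameter tokens.
label_words = ["true", "root", "2048", "user", "null"]

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")

# Register the virtual label token and grow the embedding matrix accordingly.
tokenizer.add_tokens(["I-PAR"])
model.resize_token_embeddings(len(tokenizer))

# Initialise the new token's embedding as the mean of the label-word embeddings,
# rather than a random vector; removing this step is what the ablation measures.
embeddings = model.get_input_embeddings().weight
label_ids = [tokenizer.convert_tokens_to_ids(tokenizer.tokenize(w)[0]) for w in label_words]
with torch.no_grad():
    embeddings[tokenizer.convert_tokens_to_ids("I-PAR")] = embeddings[label_ids].mean(dim=0)
```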
We compare LogPPT with fine-tuning, hard-prompt, and soft-prompt.
- Effectiveness:
  - Accuracy across different tuning methods
- Efficiency:
  - Parsing time across different tuning methods
Additional results with PTA and RTA metrics:
- PTA: The ratio of correctly identified templates over the total number of identified templates.
- RTA: The ratio of correctly identified templates over the total number of oracle templates.
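Given the set of identified templates and the set of oracle (ground-truth) templates, the two metrics can be computed as in the short sketch below; counting a template as correct via exact string match is a simplifying assumption for this example.

```python
def template_accuracy(identified: set[str], oracle: set[str]) -> tuple[float, float]:
    """Return (PTA, RTA): correctly identified templates divided by the number
    of identified templates and by the number of oracle templates, respectively."""
    correct = len(identified & oracle)
    pta = correct / len(identified) if identified else 0.0
    rta = correct / len(oracle) if oracle else 0.0
    return pta, rta


# Toy example with hypothetical templates.
identified = {"Exts skipped : <*>", "Exts tested : <*>", "SKIP Bug <*>"}
oracle = {"Exts skipped : <*>", "Exts tested : <*>"}
print(template_accuracy(identified, oracle))  # (0.666..., 1.0)
```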
Sample parsing results:

| Raw logs | Events |
|---|---|
| TEST 9/13884 [2/2 concurrent test workers running] | TEST <*> [<*> concurrent test workers running] |
| (1.039 s) Test touch() function : basic functionality [ext/standard/tests/file/touch_basic.phpt] | <*> Test touch() function : basic functionality <*> |
| (120.099 s) Bug #60120 (proc_open hangs when data in stdin/out/err is getting larger or equal to 2048) [ext/standard/tests/file/bug60120.phpt] | <*> Bug <*> (proc_open hangs when data in <*> is getting larger or equal to <*>) <*> |
| SKIP Bug #54977 UTF-8 files and folder are not shown [ext/standard/tests/file/windows_mb_path/bug54977.phpt] reason: windows only test | SKIP Bug <*> UTF-8 files and folder are not shown <*> reason: windows only test |
| Exts skipped : 17 | Exts skipped : <*> |