This project leverages Large Language Models (LLMs) like GPT-4 and Gemini to classify and analyze software failure incidents. The primary objective is to automate the summarization, categorization, and extraction of key information from articles related to software vulnerabilities, supply chain attacks, and other cyber incidents.
The project uses multiple APIs, including OpenAI's GPT API and Google's Gemini API, to generate content, categorize articles, and analyse data. It utilizes environment variables to manage API keys securely.
This script utilizes the Google Gemini API to generate content and analyze software-related articles. It connects to the Gemini API using the gemini-1.5-pro-latest model to generate summaries, identify key vulnerabilities, and classify information.
Key Features:
- Configures Gemini models using an API key stored in environment variables.
- Summarizes articles to extract critical insights.
- Lists available models for content generation.
- Demonstrates the usage of the Gemini models for content generation and analysis.
This script focuses on utilizing the OpenAI GPT API for content generation. It uses the GPT-4o model to analyze software articles, classify incidents, and generate summaries.
Key Features:
- Configures GPT models using environment variables for secure access.
- Extracts insights from articles related to software vulnerabilities and incidents.
- Includes functionality for categorizing incidents into predefined categories like negligence, malicious maintainers, attack chaining, etc.
- Uses metrics like Cohen's Kappa Score for evaluating classification consistency.
Contains a collection of software-related articles used for testing the Gemini and GPT models. These articles describe real-world incidents involving software vulnerabilities, supply chain attacks, and other cybersecurity issues.
A dataset containing articles with metadata used for training and evaluating the models.
Excel files used for manual classification, data analysis, and evaluation of model performance. These files help in comparing the automated classification results with human-labeled data.
-
Clone the Repository:
git clone https://github.com/YourUsername/YourRepository.git cd YourRepository -
Create a
.envFile:- Add the following lines to your
.envfile:API_KEY=your_gemini_api_key OPENAI_API_KEY=your_openai_api_key
- Add the following lines to your
-
Install Dependencies:
pip install -r requirements.txt
-
Run the
Geminiprompts.pyscript:python Geminiprompts.py
This will generate summaries and analyze the articles using the Gemini API.
-
Run the
Gpt4oPrompts.pyscript:python Gpt4oPrompts.py
This will categorize articles and extract insights using the GPT API.
- Improve the classification accuracy by fine-tuning the prompts for GPT and Gemini models.
- Expand the dataset with more articles to enhance model training.
- Integrate additional metrics for performance evaluation.