Adding the pipeline for the task explanation and Llm #2190

Status: Open. Wants to merge 50 commits into base: main.
Changes from 37 commits.

Commits (50)
adbca17
Add Task EXPLANATION and the visualization of images with description.
Bepitic Jul 15, 2024
5611ec1
upd dataset task with explanation
Bepitic Jul 15, 2024
8ed23a3
fix tasktype on metrics, depth, cataset, inferencer.
Bepitic Jul 15, 2024
a463b5b
Merge branch 'main' into llm-pipeline
Bepitic Jul 15, 2024
d5baf6b
fix lint on visualization/image
Bepitic Jul 16, 2024
b7c8eaa
Merge branch 'openvinotoolkit:main' into llm-pipeline
Bepitic Jul 18, 2024
5b563d9
Merge branch 'llm-pipeline' of github.com:Bepitic/anomalib into llm-p…
Bepitic Jul 18, 2024
bfd936e
Fix formatting dataset
Bepitic Jul 18, 2024
f541316
fix format data/base/depth
Bepitic Jul 18, 2024
4e392a9
Fix formatting openvino_inferencer
Bepitic Jul 18, 2024
5fc70ba
fix formatting
Bepitic Jul 18, 2024
75099af
Add Explanation to error-msg.
Bepitic Aug 2, 2024
e5040d3
OpenAI - VLM init
Bepitic Aug 3, 2024
86ad803
Add wrapper to run OpenAI
Bepitic Aug 4, 2024
3678f72
add in ppyproject
Bepitic Aug 4, 2024
7413842
Add Test and fix description/title
Bepitic Aug 12, 2024
dc42cbd
Add Readme and fix bug.
Bepitic Aug 13, 2024
5788d22
Update src/anomalib/models/image/openai_vlm/lightning_model.py
Bepitic Aug 13, 2024
e4f6bec
Update src/anomalib/models/image/openai_vlm/__init__.py
Bepitic Aug 13, 2024
5437467
Add fix pipeline bug.
Bepitic Aug 13, 2024
982c9ca
Add test.
Bepitic Aug 13, 2024
642fd26
Merge branch 'OpenAI-VLM' of github.com:Bepitic/anomalib into OpenAI-VLM
Bepitic Aug 13, 2024
b8cacf0
add changes
Bepitic Aug 16, 2024
0929dc9
Add integration test and unit test + skip export.
Bepitic Aug 16, 2024
39cf996
change to LANGUAGE
Bepitic Aug 16, 2024
671693d
Update images in Readme.
Bepitic Aug 17, 2024
224118b
Update src/anomalib/models/image/chatgpt_vision/__init__.py
Bepitic Aug 20, 2024
b703a41
Update src/anomalib/models/image/chatgpt_vision/chatgpt.py
Bepitic Aug 20, 2024
24c5486
Update src/anomalib/models/image/chatgpt_vision/lightning_model.py
Bepitic Aug 20, 2024
68e757e
Update tests/integration/model/test_models.py
Bepitic Aug 20, 2024
86714a1
Update src/anomalib/models/image/chatgpt_vision/lightning_model.py
Bepitic Aug 20, 2024
196d2a3
Update src/anomalib/models/image/chatgpt_vision/lightning_model.py
Bepitic Aug 20, 2024
b7f345a
fix comments
Bepitic Aug 20, 2024
b285d10
remove last file of chatgpt_vision.
Bepitic Aug 20, 2024
a688530
fix tests
Bepitic Aug 20, 2024
0fb5f79
Merge pull request #1 from Bepitic/OpenAI-VLM (GPTVad)
Bepitic Aug 20, 2024
6503543
Merge branch 'main' into llm-pipeline
Bepitic Aug 20, 2024
8e92e5e
Update src/anomalib/models/image/gptvad/chatgpt.py
Bepitic Aug 21, 2024
5ab044d
upd: language -> VISUAL_PROMPTING
Bepitic Aug 21, 2024
3f9ca93
fix visual prompting and model_name
Bepitic Aug 21, 2024
391b4c4
fix GPT for Gpt and the folder of the tests.
Bepitic Aug 21, 2024
ca1a0bb
fix: change import error outside.
Bepitic Aug 21, 2024
022dcb7
fix readme pointing to the right model.
Bepitic Aug 21, 2024
af7b9e9
fix import cycle, and separate usecase by explicit if.
Bepitic Aug 21, 2024
faf334f
upd: add comments to the few shot / zero shot.
Bepitic Aug 21, 2024
3ed8d3f
fix: dataset expected colums
Bepitic Aug 21, 2024
7f454c4
upd: add the same logic of the label on visualize_full.
Bepitic Aug 22, 2024
45bd520
Merge branch 'main' into llm-pipeline
Bepitic Aug 22, 2024
44586d6
Fix in the logic of the code.
Bepitic Aug 22, 2024
7adb835
Merge branch 'llm-pipeline' of github.com:Bepitic/anomalib into llm-p…
Bepitic Aug 22, 2024
Binary file added docs/source/images/gptvad/broken.png
Binary file added docs/source/images/gptvad/good.png
1 change: 1 addition & 0 deletions pyproject.toml
@@ -50,6 +50,7 @@ core = [
     "lightning>=2.2",
     "torch>=2",
     "torchmetrics>=1.3.2",
+    "openai>=1.38.0",
     # NOTE: open-clip-torch throws the following error on v2.26.1
     # torch.onnx.errors.UnsupportedOperatorError: Exporting the operator
     # 'aten::_native_multi_head_attention' to ONNX opset version 14 is not supported
1 change: 1 addition & 0 deletions src/anomalib/__init__.py
@@ -22,3 +22,4 @@ class TaskType(str, Enum):
     CLASSIFICATION = "classification"
     DETECTION = "detection"
     SEGMENTATION = "segmentation"
+    LANGUAGE = "language"
4 changes: 2 additions & 2 deletions src/anomalib/callbacks/metrics.py
@@ -75,10 +75,10 @@ def setup(
         pixel_metric_names: list[str] | dict[str, dict[str, Any]]
         if self.pixel_metric_names is None:
             pixel_metric_names = []
-        elif self.task == TaskType.CLASSIFICATION:
+        elif self.task in (TaskType.CLASSIFICATION, TaskType.LANGUAGE):
             pixel_metric_names = []
             logger.warning(
-                "Cannot perform pixel-level evaluation when task type is classification. "
+                "Cannot perform pixel-level evaluation when task type is classification or language. "
                 "Ignoring the following pixel-level metrics: %s",
                 self.pixel_metric_names,
             )
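Because classification and language tasks now share one branch, the rule can be stated compactly. The following is a minimal, standalone sketch rather than anomalib code: it assumes a local copy of `TaskType`, and `IMAGE_LEVEL_TASKS` and `resolve_pixel_metrics` are hypothetical names that mirror the `setup` branch above.

```python
from __future__ import annotations

import logging
from enum import Enum

logger = logging.getLogger(__name__)


class TaskType(str, Enum):
    """Local stand-in for anomalib's TaskType, including the LANGUAGE member added in this PR."""

    CLASSIFICATION = "classification"
    DETECTION = "detection"
    SEGMENTATION = "segmentation"
    LANGUAGE = "language"


# Tasks with image-level outputs only: no anomaly maps, so no pixel-level metrics (hypothetical name).
IMAGE_LEVEL_TASKS = (TaskType.CLASSIFICATION, TaskType.LANGUAGE)


def resolve_pixel_metrics(task: TaskType, requested: list[str] | None) -> list[str]:
    """Return the pixel-level metrics that can actually be evaluated for ``task``."""
    if requested is None:
        return []
    if task in IMAGE_LEVEL_TASKS:
        logger.warning("Ignoring pixel-level metrics for task %r: %s", task.value, requested)
        return []
    return list(requested)


print(resolve_pixel_metrics(TaskType.LANGUAGE, ["AUROC", "F1Score"]))  # []
print(resolve_pixel_metrics(TaskType.SEGMENTATION, ["AUROC"]))  # ['AUROC']
```

Keeping the image-level tasks in one tuple is one way to avoid repeating `(TaskType.CLASSIFICATION, TaskType.LANGUAGE)` at each call site; the PR itself repeats the tuple in the metrics callback, the datasets and the OpenVINO inferencer.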
4 changes: 3 additions & 1 deletion src/anomalib/data/base/dataset.py
@@ -20,9 +20,11 @@
 from anomalib.data.utils import LabelName, masks_to_boxes, read_image, read_mask

 _EXPECTED_COLUMNS_CLASSIFICATION = ["image_path", "split"]
+_EXPECTED_COLUMNS_LANGUAGE = ["image_path", "split"]
 _EXPECTED_COLUMNS_SEGMENTATION = [*_EXPECTED_COLUMNS_CLASSIFICATION, "mask_path"]
 _EXPECTED_COLUMNS_PERTASK = {
     "classification": _EXPECTED_COLUMNS_CLASSIFICATION,
+    "language": _EXPECTED_COLUMNS_LANGUAGE,
     "segmentation": _EXPECTED_COLUMNS_SEGMENTATION,
     "detection": _EXPECTED_COLUMNS_SEGMENTATION,
 }
@@ -169,7 +171,7 @@ def __getitem__(self, index: int) -> dict[str, str | torch.Tensor]:
         image = read_image(image_path, as_tensor=True)
         item = {"image_path": image_path, "label": label_index}

-        if self.task == TaskType.CLASSIFICATION:
+        if self.task in (TaskType.CLASSIFICATION, TaskType.LANGUAGE):
             item["image"] = self.transform(image) if self.transform else image
         elif self.task in (TaskType.DETECTION, TaskType.SEGMENTATION):
             # Only Anomalous (1) images have masks in anomaly datasets
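For the language task a sample needs only `image_path` and `split`, the same columns as classification, and `__getitem__` returns the (optionally transformed) image without a mask. The sketch below checks a sample table's column names against the per-task mapping. It assumes a local copy of `TaskType`; `missing_columns` is a hypothetical helper, not the validation code in anomalib's dataset.

```python
from enum import Enum


class TaskType(str, Enum):
    """Local stand-in for anomalib's TaskType (hypothetical copy for this sketch)."""

    CLASSIFICATION = "classification"
    DETECTION = "detection"
    SEGMENTATION = "segmentation"
    LANGUAGE = "language"


_EXPECTED_COLUMNS_CLASSIFICATION = ["image_path", "split"]
_EXPECTED_COLUMNS_LANGUAGE = ["image_path", "split"]
_EXPECTED_COLUMNS_SEGMENTATION = [*_EXPECTED_COLUMNS_CLASSIFICATION, "mask_path"]
_EXPECTED_COLUMNS_PERTASK = {
    "classification": _EXPECTED_COLUMNS_CLASSIFICATION,
    "language": _EXPECTED_COLUMNS_LANGUAGE,
    "segmentation": _EXPECTED_COLUMNS_SEGMENTATION,
    "detection": _EXPECTED_COLUMNS_SEGMENTATION,
}


def missing_columns(task: TaskType, columns: list[str]) -> list[str]:
    """Return the expected columns for ``task`` that are absent from ``columns``."""
    # The mapping is keyed by plain strings, so look it up with the enum's value.
    expected = _EXPECTED_COLUMNS_PERTASK[task.value]
    return [col for col in expected if col not in columns]


print(missing_columns(TaskType.LANGUAGE, ["image_path", "split", "label"]))  # []
print(missing_columns(TaskType.SEGMENTATION, ["image_path", "split"]))  # ['mask_path']
```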
2 changes: 1 addition & 1 deletion src/anomalib/data/base/depth.py
@@ -48,7 +48,7 @@ def __getitem__(self, index: int) -> dict[str, str | torch.Tensor]:
         depth_image = to_tensor(read_depth_image(depth_path))
         item = {"image_path": image_path, "depth_path": depth_path, "label": label_index}

-        if self.task == TaskType.CLASSIFICATION:
+        if self.task in (TaskType.CLASSIFICATION, TaskType.LANGUAGE):
             item["image"], item["depth_image"] = (
                 self.transform(image, depth_image) if self.transform else (image, depth_image)
             )
2 changes: 1 addition & 1 deletion src/anomalib/deploy/inferencers/openvino_inferencer.py
@@ -277,7 +277,7 @@ def post_process(self, predictions: np.ndarray, metadata: dict | DictConfig | No
             pred_idx = pred_score >= metadata["image_threshold"]
             pred_label = LabelName.ABNORMAL if pred_idx else LabelName.NORMAL

-        if task == TaskType.CLASSIFICATION:
+        if task in (TaskType.CLASSIFICATION, TaskType.LANGUAGE):
             _, pred_score = self._normalize(pred_scores=pred_score, metadata=metadata)
         elif task in (TaskType.SEGMENTATION, TaskType.DETECTION):
             if "pixel_threshold" in metadata:
2 changes: 2 additions & 0 deletions src/anomalib/models/__init__.py
@@ -24,6 +24,7 @@
     Fastflow,
     Fre,
     Ganomaly,
+    GPTVad,
     Padim,
     Patchcore,
     ReverseDistillation,
@@ -51,6 +52,7 @@ class UnknownModelError(ModuleNotFoundError):
     "Fastflow",
     "Fre",
     "Ganomaly",
+    "GPTVad",
     "Padim",
     "Patchcore",
     "ReverseDistillation",
2 changes: 2 additions & 0 deletions src/anomalib/models/image/__init__.py
@@ -14,6 +14,7 @@
 from .fastflow import Fastflow
 from .fre import Fre
 from .ganomaly import Ganomaly
+from .gptvad import GPTVad
 from .padim import Padim
 from .patchcore import Patchcore
 from .reverse_distillation import ReverseDistillation
@@ -34,6 +35,7 @@
     "Fastflow",
     "Fre",
     "Ganomaly",
+    "GPTVad",
     "Padim",
     "Patchcore",
     "ReverseDistillation",
8 changes: 8 additions & 0 deletions src/anomalib/models/image/gptvad/__init__.py
@@ -0,0 +1,8 @@
"""Generative Pre-Trained Transformer (GPT) based Vision Language Model (VLM) for visual anomaly detection."""

# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

from .lightning_model import GPTVad

__all__ = ["GPTVad"]
127 changes: 127 additions & 0 deletions src/anomalib/models/image/gptvad/chatgpt.py
@@ -0,0 +1,127 @@
"""Wrapper for the OpenAI calls to the VLM model."""

import logging
import os
from typing import Any

import openai


class GPTWrapper:
    """A wrapper class for making API calls to OpenAI's GPT-4 model to detect anomalies in images.

    Environment variable OPENAI_API_KEY (str): API key for OpenAI.
    https://platform.openai.com/docs/quickstart/step-2-set-up-your-api-key
    Other possible models: https://platform.openai.com/docs/models/gpt-4-turbo-and-gpt-4
    All models with vision capabilities: 'gpt-4-turbo-2024-04-09', 'gpt-4-turbo',
    all versions of 'gpt-4o-mini', and 'gpt-4o'.

    Args:
        model_name (str): Model name for the OpenAI VLM API. Default: "gpt-4o".
        detail (bool): Whether images are sent with high (True) or low (False) detail.
    """

    def __init__(self, model_name: str = "gpt-4o", detail: bool = True) -> None:
        openai_key = os.getenv("OPENAI_API_KEY")
        self.model_name = model_name
        self.detail = detail
        if not openai_key:
            from anomalib.engine.engine import UnassignedError

            msg = "OpenAI API key not found. Set the OPENAI_API_KEY environment variable."
            raise UnassignedError(msg)

    def api_call(
        self,
        images: list[str],
        extension: str = "png",
    ) -> str:
        """Make an API call to OpenAI's GPT-4 model to detect anomalies in an image.

        Args:
            images (list[str]): Base64-encoded images. Any reference (normal) examples come first;
                the last image is the one checked for anomalies.
            extension (str): File extension of the images, used in the data URL. Default: 'png'.

        Returns:
            str: The response from the GPT-4 model indicating whether the image has anomalies or not.
            It returns 'NO' if there are no anomalies and 'YES: description' if there are anomalies,
            where 'description' provides details of the anomaly and its position.

        Raises:
            openai.OpenAIError: If there is an error during the API call.
        """
        prompt: str = ""
        # The last image is always the query, so reference examples exist only when len(images) > 1.
        if len(images) > 1:
            prompt = """
            You will receive one or more example images showing the typical appearance without any anomaly,
            followed by a last image for which you need to decide whether it has an anomaly or not.
            Answer with 'NO' if it does not have any anomalies, or with 'YES: description',
            where description describes the anomaly and its position.
            """
        else:
            prompt = """
            Examine the provided image carefully to determine if there is an obvious anomaly present.
            Anomalies may include mechanical malfunctions, unexpected objects, safety hazards, structural damage,
            or unusual patterns or defects in the objects.

            Instructions:

            1. Thoroughly inspect the image for any irregularities or deviations from normal operating conditions.

            2. Clearly state if an obvious anomaly is detected.
            - If an anomaly is detected, begin with 'YES:' followed by a detailed description of the anomaly.
            - If no anomaly is detected, simply state 'NO' and end the analysis.

            Example Output Structure:

            'YES:
            - Description: Conveyor belt misalignment causing potential blockages.
            This may result in production delays and equipment damage.
            Immediate realignment and inspection are recommended.'

            'NO'

            Considerations:

            - Ensure accuracy in identifying anomalies to prevent overlooking critical issues.
            - Provide clear and concise descriptions for any detected anomalies.
            - Focus on obvious anomalies that could impact the object's final use, operation, or safety.
            """

        detail_img = "high" if self.detail else "low"
        messages: list[dict[str, Any]] = [
            {
                "role": "system",
                "content": prompt,
            },
        ]
        for image in images:
            image_message = [
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "image_url",
                            "image_url": {
                                "url": f"data:image/{extension};base64,{image}",
                                "detail": detail_img,
                            },
                        },
                    ],
                },
            ]
            messages.extend(image_message)

        try:
            # Make the API call using the openai library
            response = openai.chat.completions.create(
                model=self.model_name,
                messages=messages,
                max_tokens=300,
            )
            return response.choices[-1].message.content or ""
        except Exception:
            msg = "The OpenAI API threw an exception."
            logging.exception(msg)
            raise
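The request that `api_call` sends can be reproduced offline, which helps when checking which prompt a zero-shot run actually receives. The sketch below is a minimal re-implementation under stated assumptions: `select_prompt`, `build_messages` and the two short prompt strings are hypothetical stand-ins, and no call to the OpenAI API is made. Because the image being checked is always the last element of `images`, reference examples are present only when the list holds more than one element, so the sketch selects the few-shot prompt on `len(images) > 1`.

```python
from typing import Any

# Hypothetical placeholder prompts; the real texts are the longer strings in GPTWrapper.api_call.
FEW_SHOT_PROMPT = "Compare the last image with the normal examples. Answer 'NO' or 'YES: description'."
ZERO_SHOT_PROMPT = "Examine the image for obvious anomalies. Answer 'NO' or 'YES: description'."


def select_prompt(images: list[str]) -> str:
    """Choose the system prompt; the query image is always the last entry of ``images``."""
    if not images:
        raise ValueError("At least the query image is required.")
    # Reference examples exist only when there is more than the single query image.
    return FEW_SHOT_PROMPT if len(images) > 1 else ZERO_SHOT_PROMPT


def build_messages(images: list[str], extension: str = "png", detail: bool = True) -> list[dict[str, Any]]:
    """Build the Chat Completions payload: one system message, then one user message per image."""
    detail_img = "high" if detail else "low"
    messages: list[dict[str, Any]] = [{"role": "system", "content": select_prompt(images)}]
    for image in images:
        messages.append(
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        # Each image is inlined as a base64 data URL.
                        "image_url": {"url": f"data:image/{extension};base64,{image}", "detail": detail_img},
                    },
                ],
            },
        )
    return messages


zero_shot = build_messages(["QUERY"])
few_shot = build_messages(["REF1", "REF2", "QUERY"], detail=False)
print(len(zero_shot), len(few_shot))  # 2 4
print(few_shot[-1]["content"][0]["image_url"])  # {'url': 'data:image/png;base64,QUERY', 'detail': 'low'}
```

Passing such a list as `messages=` to `openai.chat.completions.create` corresponds to the request in `api_call`; that call needs a valid `OPENAI_API_KEY` and is left out of the sketch.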
155 changes: 155 additions & 0 deletions src/anomalib/models/image/gptvad/lightning_model.py
@@ -0,0 +1,155 @@
"""OpenAI Vision Language Model (VLM): Zero-/Few-Shot Anomaly Classification.

Paper (No paper)
"""
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

import base64
import logging
from pathlib import Path

import torch
from lightning.pytorch.utilities.types import STEP_OUTPUT
from torch.utils.data import DataLoader

from anomalib import LearningType
from anomalib.metrics.threshold import ManualThreshold
from anomalib.models.components import AnomalyModule

from .chatgpt import GPTWrapper

logger = logging.getLogger(__name__)

__all__ = ["GPTVad"]


class GPTVad(AnomalyModule):
    """OpenAI VLM Lightning model using OpenAI's GPT-4 for image anomaly detection.

    Args:
        k_shot (int): Number of normal reference images sent with each query image. 0 means zero-shot.
        model_name (str): Name of the OpenAI VLM used for visual anomaly detection.
        detail (bool): Whether images are sent to the VLM with 'high' (True) or 'low' (False) detail.
    """

    def __init__(
        self,
        k_shot: int = 0,
        model_name: str = "gpt-4o",
        detail: bool = True,
    ) -> None:
        super().__init__()

        self.k_shot = k_shot

        self.model_name = model_name
        self.detail = detail
        self.image_threshold = ManualThreshold()
        self.vlm = GPTWrapper(model_name=self.model_name, detail=self.detail)

    def _setup(self) -> None:
        dataloader = self.trainer.datamodule.train_dataloader()
        pre_images = self.collect_reference_images(dataloader)
        self.pre_images = pre_images

    def _encode_image(self, image_path: str) -> str:
        """Encode an image file as a base64 string so it can be sent with the prompt."""
        path = Path(image_path)
        with path.open("rb") as image_file:
            return base64.b64encode(image_file.read()).decode("utf-8")

    def training_step(self, batch: dict[str, str | torch.Tensor], *args, **kwargs) -> dict[str, str | torch.Tensor]:
        """Training step; a no-op because the VLM is not trained."""
        del args, kwargs  # These variables are not used.
        # The VLM is not trained, so the batch is returned unchanged.
        return batch

    @staticmethod
    def configure_optimizers() -> None:
        """GPTVad does not require optimization, therefore it returns no optimizers."""
        return

    def validation_step(
        self,
        batch: dict[str, str | list[str] | torch.Tensor],
        *args,
        **kwargs,
    ) -> STEP_OUTPUT:
        """Query the VLM for each image in the batch and convert its answer into a prediction.

        Args:
            batch (dict[str, str | list[str] | torch.Tensor]): Batch containing image filename, image, label and mask
            args: Additional arguments.
            kwargs: Additional keyword arguments.

        Returns:
            dict[str, Any]: The input batch extended with ``str_output`` (the raw VLM answers) and with
            ``pred_scores`` and ``pred_labels`` (1.0 if the image is predicted anomalous, otherwise 0.0).
        """
        del args, kwargs  # These variables are not used.
        batch_size = len(batch["image_path"])
        outputs: list[str] = []
        predictions: list[float] = []
        # The reference images are identical for every query, so encode them once per batch.
        reference_images = [self._encode_image(img) for img in self.pre_images]
        for i in range(batch_size):
            # Base64 strings: the references first, then the image to check.
            base64_images = [*reference_images, self._encode_image(batch["image_path"][i])]

            try:
                output = self.vlm.api_call(base64_images)
            except Exception:
                logger.exception("Error calling the OpenAI API for image %s", batch["image_path"][i])
                output = "Error"

            # Default to normal (0.0) when the answer starts with neither 'Y' nor 'N'.
            prediction = 0.0
            if output.startswith("Y"):
                prediction = 1.0
            elif not output.startswith("N"):
                logger.warning(
                    "Could not determine from the output whether there is an anomaly; "
                    "setting the prediction to 0 (normal). Output:\n%s",
                    output,
                )

            outputs.append(output)
            predictions.append(prediction)
            logger.debug("Output: %s, Prediction: %s", output, prediction)

        batch["str_output"] = outputs
        batch["pred_scores"] = torch.tensor(predictions).to(self.device)
        batch["pred_labels"] = torch.tensor(predictions).to(self.device)
        return batch

    @property
    def trainer_arguments(self) -> dict[str, int | float]:
        """Set model-specific trainer arguments."""
        return {}

    @property
    def learning_type(self) -> LearningType:
        """The learning type of the model.

        GPTVad is a zero-/few-shot model, depending on the user configuration. Therefore, the learning type is
        set to ``LearningType.FEW_SHOT`` when ``k_shot`` is greater than zero and ``LearningType.ZERO_SHOT`` otherwise.
        """
        return LearningType.ZERO_SHOT if self.k_shot == 0 else LearningType.FEW_SHOT

    def collect_reference_images(self, dataloader: DataLoader) -> list[str]:
        """Collect reference images for few-shot inference.

        The reference images are collected by iterating over the training dataset until the required number of
        images has been collected.

        Args:
            dataloader (DataLoader): Dataloader over the training (normal) images.

        Returns:
            list[str]: Paths of the collected reference images.
        """
        reference_images_paths: list[str] = []
        for batch in dataloader:
            image_paths = batch["image_path"][: self.k_shot - len(reference_images_paths)]
            reference_images_paths.extend(image_paths)
            if self.k_shot == len(reference_images_paths):
                break
        return reference_images_paths
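The per-image logic of `GPTVad` can be exercised without Lightning, a dataloader or network access. This is a minimal sketch built on plain Python stand-ins: `encode_image`, `collect_reference_paths` and `parse_vlm_answer` are hypothetical helpers that mirror `_encode_image`, `collect_reference_images` and the YES/NO parsing in `validation_step`, and a list of dicts stands in for the `DataLoader`. The `normalize` option is not part of the PR; it shows one way to handle replies that start with a quote or whitespace (the zero-shot prompt's own example output is quoted), which the plain `startswith` check maps to normal.

```python
import base64
import tempfile
from collections.abc import Iterable
from pathlib import Path


def encode_image(image_path: str) -> str:
    """Return the file's bytes as a base64 string, as GPTVad._encode_image does."""
    with Path(image_path).open("rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


def collect_reference_paths(batches: Iterable[dict[str, list[str]]], k_shot: int) -> list[str]:
    """Take the first ``k_shot`` image paths, slicing each batch like collect_reference_images."""
    paths: list[str] = []
    for batch in batches:
        # Take only as many paths as are still missing.
        paths.extend(batch["image_path"][: k_shot - len(paths)])
        if len(paths) == k_shot:
            break
    return paths


def parse_vlm_answer(output: str, normalize: bool = False) -> tuple[float, bool]:
    """Return (score, recognized): 1.0 for answers starting with 'Y', 0.0 otherwise."""
    # normalize=True is an optional extension (not in the PR): strip whitespace/quotes, ignore case.
    text = output.strip().lstrip("'\"").upper() if normalize else output
    if text.startswith("Y"):
        return 1.0, True
    if text.startswith("N"):
        return 0.0, True
    # Unrecognized answers (including the "Error" placeholder) default to normal.
    return 0.0, False


batches = [{"image_path": ["a.png", "b.png"]}, {"image_path": ["c.png", "d.png"]}]
print(collect_reference_paths(batches, 3))  # ['a.png', 'b.png', 'c.png']
print([parse_vlm_answer(a)[0] for a in ["NO", "YES: scratch", "Error"]])  # [0.0, 1.0, 0.0]

with tempfile.TemporaryDirectory() as tmp_dir:
    sample = Path(tmp_dir) / "sample.png"
    sample.write_bytes(b"not a real PNG")  # the bytes are only encoded, never decoded as an image
    encoded = encode_image(str(sample))
print(base64.b64decode(encoded))  # b'not a real PNG'
```

As in `collect_reference_images`, asking for more references than the training set holds (for example `k_shot=5` with the four paths above) silently returns fewer paths.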