[AGENT] Add reusable hallucination audit learnings by BASIC-BIT · Pull Request #117 · Chronote-gg/Chronote

BASIC-BIT · 2026-02-13T06:05:15Z

[AGENT] Extracts permanent, low-footprint learnings from the hallucination audit into reusable tooling and docs, without carrying large raw artifacts into mainline history.

What this PR includes

Adds reusable audit scripts under analysis/hallucination-audit/.
Adds the audit findings summary doc docs/hallucination-audit-20260210.md.
Adds a concrete mitigation plan doc docs/hallucination-mitigation-plan-20260213.md.
Updates .gitignore for Python cache and husky internal files.

Script hardening included

align_with_full_transcript.py
- returns word-window for exact substring matches
- uses csv.DictWriter with full field union for robust CSV output
- adds lightweight pruning before expensive similarity checks
create_langfuse_dataset_sample.py
- default class counts now respect --sample-size
- validates unknown class keys in --counts

Intentionally excluded

No large meeting artifacts, raw trace dumps, or audio blobs.
No production runtime behavior changes in bot paths.

Context

This branch is the durable successor for learnings from the archive PR [AGENT] Add hallucination audit artifacts and docs #116.

greptile-apps · 2026-02-13T06:07:27Z

Greptile Overview

Greptile Summary

Extracts reusable hallucination audit tooling and documentation from audit work, including 6 Python analysis scripts, findings documentation, and a concrete mitigation plan. Scripts implement transcript classification, duplicate detection, audio volume analysis, full audio transcription, and Langfuse dataset sampling. Documentation captures key findings (379 hallucinated, 495 legit, 1178 unknown from 2052 traces) and recommends keeping current prompt-echo guards while adding tunable config keys and a vote-transcription path for suspicious snippets. All scripts follow clean architecture with proper error handling, retry logic for rate limits, and incremental output saves.

Adds complete audit toolkit under analysis/hallucination-audit/ with main orchestration script and supporting utilities
Documents audit findings with classification breakdowns, syllable-rate analysis, and audio loudness thresholds
Provides actionable mitigation plan with phased rollout strategy (config keys, vote transcription, threshold tuning)
Updates .gitignore for Python artifacts (__pycache__/, *.pyc) and husky internals (.husky/_)
All Python scripts properly handle env loading, API retries, and edge cases (rate limits, missing files, silence detection)

Confidence Score: 5/5

Safe to merge - well-architected analysis tooling with no production runtime changes
All changes are isolated to analysis scripts and documentation with no impact on bot runtime behavior. Scripts demonstrate good practices (retry logic, error handling, incremental saves). Documentation is clear and actionable. The PR explicitly excludes large artifacts and production changes.
No files require special attention

Important Files Changed

Filename	Overview
analysis/hallucination-audit/align_with_full_transcript.py	Implements exact and fuzzy transcript matching with word-window alignment
analysis/hallucination-audit/compute_audio_volume.py	Fetches Langfuse audio media and computes volume metrics via ffmpeg
analysis/hallucination-audit/create_langfuse_dataset_sample.py	Creates balanced Langfuse dataset samples for manual hallucination labeling
analysis/hallucination-audit/run_audit.py	Main audit script that classifies transcriptions and detects duplicates
docs/hallucination-mitigation-plan-20260213.md	Outlines mitigation strategy with config recommendations and rollout plan

Last reviewed commit: bb013a2

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bb013a236a

chatgpt-codex-connector · 2026-02-13T06:08:34Z

+    chosen_window = max(window_results.keys(), key=lambda k: len(window_results[k]))
+    traces = window_results[chosen_window]


Honor --meeting-id across all fetched windows

When --date is omitted, the script fetches both today and yesterday but then picks only the window with the most traces before it applies --meeting-id. If the requested meeting exists in the other window, filtered becomes empty and the audit proceeds with zero records, producing a misleading summary and incorrect downstream artifacts. Resolve the meeting ID against all fetched windows (or choose the matching window first) before narrowing the trace set.
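One way to apply this suggestion is to filter every fetched window by meeting ID first and only then pick the largest remaining window. The sketch below is illustrative, not the script's actual code: `select_traces` is a hypothetical helper, and it assumes each trace is a dict with a top-level `meeting_id` key, which may not match how the real traces store it.

```python
from typing import Any, Dict, List, Optional, Tuple


def select_traces(
    window_results: Dict[str, List[Dict[str, Any]]],
    meeting_id: Optional[str] = None,
) -> Tuple[Optional[str], List[Dict[str, Any]]]:
    """Pick the window to audit, filtering by meeting ID before choosing."""
    if meeting_id is not None:
        # Narrow every window first so a meeting that only exists in the
        # smaller window is not discarded by the "most traces" rule.
        # ASSUMPTION: traces expose the meeting under a "meeting_id" key.
        window_results = {
            window: [t for t in traces if t.get("meeting_id") == meeting_id]
            for window, traces in window_results.items()
        }
    if not window_results:
        return None, []
    chosen = max(window_results, key=lambda k: len(window_results[k]))
    return chosen, window_results[chosen]
```

With this ordering, an empty result means the meeting is absent from every fetched window, so the caller can fail loudly instead of auditing zero records.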

chatgpt-codex-connector · 2026-02-13T06:08:34Z

+    if list(segment_dir.glob("segment_*.mp3")):
+        return sorted(segment_dir.glob("segment_*.mp3"))


Regenerate segments when audio/chunk inputs change

The segment cache is reused whenever any segment_*.mp3 exists, without verifying that those files were produced from the current audio input or current --chunk-seconds value. Re-running after replacing audio_combined.mp3 or changing chunk size will silently transcribe stale segments, so full_transcript.txt can be out of sync with the intended meeting audio and corrupt alignment results.
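A possible fix is to store a small manifest next to the segments and reuse the cache only when the manifest matches the current inputs. The following is a minimal sketch under stated assumptions: the helper names and the `manifest.json` file are hypothetical, and the fingerprint uses file size and modification time rather than a content hash to stay cheap on large audio files.

```python
import hashlib
import json
from pathlib import Path


def cache_key(audio_path: Path, chunk_seconds: int) -> str:
    """Fingerprint the inputs that determine what the segments contain."""
    stat = audio_path.stat()
    raw = f"{audio_path.name}:{stat.st_size}:{stat.st_mtime_ns}:{chunk_seconds}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def segments_are_fresh(segment_dir: Path, audio_path: Path, chunk_seconds: int) -> bool:
    """Return True only if cached segments were built from the current inputs."""
    manifest = segment_dir / "manifest.json"  # hypothetical file name
    if not manifest.exists() or not list(segment_dir.glob("segment_*.mp3")):
        return False
    try:
        stored = json.loads(manifest.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return False
    if not isinstance(stored, dict):
        return False
    return stored.get("key") == cache_key(audio_path, chunk_seconds)


def write_manifest(segment_dir: Path, audio_path: Path, chunk_seconds: int) -> None:
    """Record the fingerprint after segments have been (re)generated."""
    payload = {"key": cache_key(audio_path, chunk_seconds)}
    (segment_dir / "manifest.json").write_text(json.dumps(payload), encoding="utf-8")
```

The segmenting step would call `segments_are_fresh` before returning cached files, and `write_manifest` after regenerating them.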

chatgpt-codex-connector · 2026-02-13T06:08:34Z

+        if response.status_code in (400, 409):
+            return


Raise on dataset-item 400 responses

Treating HTTP 400 as success causes malformed dataset-item requests to be silently dropped while the script continues as if sampling succeeded. If Langfuse rejects an item payload (schema change, invalid field, etc.), the resulting dataset can be smaller or imbalanced without any failure signal, which undermines audit/eval accuracy.
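The distinction can be sketched as a small status handler. This is an illustration only: `handle_dataset_item_status` is a hypothetical helper, and treating 409 as "item already exists" is an assumption inferred from the original code, not something this thread confirms about the Langfuse API.

```python
def handle_dataset_item_status(status_code: int, body: str = "") -> bool:
    """Return True when the item is stored; raise when the payload was rejected."""
    if status_code in (200, 201):
        return True
    if status_code == 409:
        # ASSUMPTION: a conflict means the item already exists, so skipping is safe.
        return True
    if status_code == 400:
        # A rejected payload should stop the run instead of shrinking the dataset silently.
        raise ValueError(f"dataset item rejected (HTTP 400): {body[:200]}")
    # Anything else is left to the caller (retry or raise_for_status).
    return False
```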

Copilot

Pull request overview

This PR extracts reusable learnings and tooling from a hallucination audit (meeting ID 3837e4e0-64e9-44ba-b5de-c3a6849832d6, conducted on 2026-02-10) into the mainline repository without including large raw artifacts. The audit analyzed 2052 traces, finding 379 hallucinated outputs (primarily prompt echo), 495 legitimate outputs, and 1178 unknown. The PR provides documentation, a mitigation plan, and a set of Python scripts for future audit work.

Changes:

Adds documentation of audit findings and mitigation recommendations
Adds six reusable Python scripts for conducting hallucination audits
Updates .gitignore for Python cache files and husky internals

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 9 comments.

File	Description
`docs/hallucination-audit-20260210.md`	Documents the 2026-02-10 hallucination audit findings including classification counts, SPS analysis, and audio volume metrics
`docs/hallucination-mitigation-plan-20260213.md`	Provides concrete mitigation recommendations including config changes, system improvements, and rollout strategy
`analysis/hallucination-audit/README.md`	Overview of the audit workspace and available scripts
`analysis/hallucination-audit/run_audit.py`	Main audit orchestration script that fetches Langfuse traces, classifies them, and builds duplicate groups
`analysis/hallucination-audit/compute_audio_volume.py`	Computes mean and max audio volume (dB) for Langfuse media using ffmpeg
`analysis/hallucination-audit/download_full_audio.py`	Downloads full meeting audio from S3 for analysis
`analysis/hallucination-audit/transcribe_full_audio.py`	Transcribes full meeting audio using OpenAI in segments
`analysis/hallucination-audit/align_with_full_transcript.py`	Aligns snippet transcripts with full transcript using fuzzy matching; uses csv.DictWriter for robust CSV output
`analysis/hallucination-audit/create_langfuse_dataset_sample.py`	Creates balanced Langfuse dataset samples for labeling with validation of class keys
`.gitignore`	Adds Python cache patterns and husky internal files

Copilot · 2026-02-13T06:10:55Z

+def build_index(words: List[str]) -> Dict[str, List[int]]:
+    index: Dict[str, List[int]] = {}
+    for pos, word in enumerate(words):
+        if len(word) < 4:
+            continue
+        index.setdefault(word, []).append(pos)
+    return index
+
+
+def find_subsequence_window(
+    snippet_words: List[str],
+    full_words: List[str],
+) -> Optional[Tuple[int, int]]:
+    if not snippet_words or not full_words or len(snippet_words) > len(full_words):
+        return None
+    snippet_length = len(snippet_words)
+    last_start = len(full_words) - snippet_length
+    for start in range(last_start + 1):
+        if full_words[start : start + snippet_length] == snippet_words:
+            return (start, start + snippet_length)
+    return None
+
+
+def best_match(
+    snippet_text: str,
+    snippet_words: List[str],
+    full_text: str,
+    full_words: List[str],
+    index: Dict[str, List[int]],
+) -> Tuple[Optional[float], str, Optional[Tuple[int, int]]]:
+    if not snippet_text:
+        return None, "empty", None
+    if snippet_text in full_text:
+        return 1.0, "substring", find_subsequence_window(snippet_words, full_words)
+
+    unique_words = sorted(set(snippet_words), key=len, reverse=True)
+    candidates = [word for word in unique_words if len(word) >= 4][:3]
+    if not candidates:
+        return None, "no_candidates", None
+
+    window_size = max(8, min(len(full_words), len(snippet_words) + 6))
+    snippet_word_set = set(snippet_words)
+    best_score: Optional[float] = None
+    best_window: Optional[Tuple[int, int]] = None
+    for word in candidates:
+        positions = index.get(word, [])
+        if len(positions) > 100:
+            positions = positions[:100]
+        for pos in positions:
+            start = max(0, pos - 3)
+            end = min(len(full_words), start + window_size)
+            window_text = " ".join(full_words[start:end])
+            if not window_text:
+                continue
+
+            if best_score is not None:
+                max_possible = 1 - (
+                    abs(len(snippet_text) - len(window_text))
+                    / max(len(snippet_text), len(window_text))
+                )
+                if max_possible <= best_score:
+                    continue
+
+            window_word_set = set(full_words[start:end])
+            if snippet_word_set and window_word_set:
+                overlap_ratio = len(snippet_word_set & window_word_set) / len(
+                    snippet_word_set
+                )
+                if overlap_ratio < 0.25:
+                    continue
+
+            dist = levenshtein_distance(snippet_text, window_text)
+            ratio = dist / max(len(snippet_text), len(window_text))
+            score = 1 - ratio
+            if best_score is None or score > best_score:
+                best_score = score
+                best_window = (start, end)
+    return best_score, "fuzzy", best_window


This function contains several magic numbers that should be extracted as named constants for better code clarity:

4 (lines 40, 73): minimum word length for indexing and candidate selection

3 (line 73): maximum number of candidate words

8 (line 77): minimum window size

6 (line 77): window size padding

100 (line 83): maximum positions to check per word

3 (line 86): position offset for window start

0.25 (line 105): minimum overlap ratio threshold

These magic numbers represent important thresholds for the fuzzy matching algorithm. Extracting them as named constants would make the algorithm's behavior more transparent and easier to tune.
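As a rough sketch of what this extraction could look like, the constants below mirror the literals in `build_index` and `best_match`. All constant names and the two small helpers are hypothetical, chosen only for illustration.

```python
from typing import List

# Hypothetical names; the values mirror the literals in the reviewed code.
MIN_INDEX_WORD_LENGTH = 4      # shorter words are neither indexed nor used as candidates
MAX_CANDIDATE_WORDS = 3        # longest unique snippet words used as anchors
MIN_WINDOW_WORDS = 8           # lower bound on the comparison window
WINDOW_PADDING_WORDS = 6       # extra words allowed beyond the snippet length
MAX_POSITIONS_PER_WORD = 100   # cap on index hits examined per candidate
WINDOW_START_OFFSET = 3        # words of left context before the anchor position
MIN_OVERLAP_RATIO = 0.25       # prune windows sharing fewer snippet words than this


def pick_candidates(snippet_words: List[str]) -> List[str]:
    """Select the longest unique words as anchors for the index lookup."""
    unique_words = sorted(set(snippet_words), key=len, reverse=True)
    long_enough = [w for w in unique_words if len(w) >= MIN_INDEX_WORD_LENGTH]
    return long_enough[:MAX_CANDIDATE_WORDS]


def window_size(snippet_len: int, full_len: int) -> int:
    """Compute the comparison window size, in words."""
    return max(MIN_WINDOW_WORDS, min(full_len, snippet_len + WINDOW_PADDING_WORDS))
```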

Copilot · 2026-02-13T06:10:56Z

+        and avg_logprob > -1.2
+        and min_logprob > -2.5
+    )


The logprob thresholds -1.2 and -2.5 used to classify records as "legit" are magic numbers that should be extracted as named constants. These thresholds are critical for the classification algorithm and may need to be tuned based on further analysis.

Consider extracting these as module-level constants like CLEAN_AVG_LOGPROB_THRESHOLD = -1.2 and CLEAN_MIN_LOGPROB_THRESHOLD = -2.5 to make them more discoverable and easier to adjust.
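A minimal sketch of that extraction follows. The constant names come from the comment above; the helper `passes_logprob_gate` is hypothetical, and rejecting missing values is an assumption, since the snippet does not show how the script handles them.

```python
from typing import Optional

CLEAN_AVG_LOGPROB_THRESHOLD = -1.2
CLEAN_MIN_LOGPROB_THRESHOLD = -2.5


def passes_logprob_gate(
    avg_logprob: Optional[float],
    min_logprob: Optional[float],
) -> bool:
    """Check only the logprob part of the "legit" classification."""
    if avg_logprob is None or min_logprob is None:
        # ASSUMPTION: a record without logprobs cannot be confirmed as clean.
        return False
    return (
        avg_logprob > CLEAN_AVG_LOGPROB_THRESHOLD
        and min_logprob > CLEAN_MIN_LOGPROB_THRESHOLD
    )
```

Both comparisons are strict, matching the `>` operators in the reviewed lines, so a value exactly at a threshold does not pass.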

Copilot · 2026-02-13T06:10:56Z

+            data={
+                "model": model,
+                "response_format": "json",
+                "temperature": "0",


The temperature parameter is passed as a string "0" instead of a numeric value. While the OpenAI API may accept this, it's better practice to pass numeric parameters as their proper type (integer or float) for API calls. This should be "temperature": 0 or "temperature": 0.0.

Suggested change

-                "temperature": "0",
+                "temperature": 0,

Copilot · 2026-02-13T06:10:56Z

+def load_env(path: Path) -> None:
+    if not path.exists():
+        return
+    for line in path.read_text(encoding="utf-8").splitlines():
+        stripped = line.strip()
+        if not stripped or stripped.startswith("#") or "=" not in stripped:
+            continue
+        key, value = stripped.split("=", 1)
+        key = key.strip()
+        value = value.strip().strip('"').strip("'")
+        if key and key not in os.environ:
+            os.environ[key] = value


The load_env function is duplicated across all six Python scripts. This violates the DRY (Don't Repeat Yourself) principle and makes maintenance harder. If a bug is found or improvement is needed in this function, it would need to be fixed in six places.

Consider extracting this function into a shared utility module (for example, analysis/hallucination-audit/utils.py or analysis/hallucination-audit/common.py) that all scripts can import from. This would ensure consistent behavior across all scripts and make future updates easier.
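A sketch of such a module, using the hypothetical file name `common.py` from the comment above. The function body is the one shown in the diff, with a docstring and comments added.

```python
import os
from pathlib import Path


def load_env(path: Path) -> None:
    """Load KEY=VALUE pairs from a .env file without overriding existing variables."""
    if not path.exists():
        return
    for line in path.read_text(encoding="utf-8").splitlines():
        stripped = line.strip()
        # Skip blanks, comments, and lines that are not KEY=VALUE pairs.
        if not stripped or stripped.startswith("#") or "=" not in stripped:
            continue
        key, value = stripped.split("=", 1)
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        # Values already set in the process environment take precedence.
        if key and key not in os.environ:
            os.environ[key] = value
```

Because Python puts the script's own directory on `sys.path` when a script is run directly, each audit script could then use `from common import load_env` without packaging changes.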

Copilot · 2026-02-13T06:10:56Z

+def build_near_duplicate_groups(records: List[Dict[str, Any]]) -> Dict[str, int]:
+    candidates: List[Tuple[str, str]] = []
+    for record in records:
+        norm = record.get("normalized_text")
+        if not norm or len(norm) < 12:
+            continue
+        candidates.append((record["trace_id"], norm))
+    if not candidates:
+        return {}
+
+    buckets: Dict[Tuple[int, str], List[Tuple[str, str]]] = defaultdict(list)
+    for trace_id, norm in candidates:
+        length_bucket = len(norm) // 20
+        prefix = norm[:5]
+        buckets[(length_bucket, prefix)].append((trace_id, norm))
+
+    parent: Dict[str, str] = {}
+
+    def find(x: str) -> str:
+        root = parent.get(x, x)
+        if root != x:
+            parent[x] = find(root)
+        return parent.get(x, x)
+
+    def union(a: str, b: str) -> None:
+        ra = find(a)
+        rb = find(b)
+        if ra != rb:
+            parent[rb] = ra
+
+    for items in buckets.values():
+        if len(items) < 2:
+            continue
+        if len(items) > 200:
+            continue
+        for i in range(len(items)):
+            trace_a, text_a = items[i]
+            for j in range(i + 1, len(items)):
+                trace_b, text_b = items[j]
+                if abs(len(text_a) - len(text_b)) > 20:
+                    continue
+                dist = levenshtein_distance(text_a, text_b)
+                ratio = dist / max(len(text_a), len(text_b))
+                if ratio <= 0.2:
+                    union(trace_a, trace_b)
+
+    groups: Dict[str, int] = {}
+    group_id = 1
+    clusters: Dict[str, List[str]] = defaultdict(list)
+    for trace_id, _ in candidates:
+        root = find(trace_id)
+        clusters[root].append(trace_id)
+    for trace_ids in clusters.values():
+        if len(trace_ids) < 2:
+            continue
+        for trace_id in trace_ids:
+            groups[trace_id] = group_id
+        group_id += 1
+    return groups


This function contains several magic numbers that should be extracted as named constants to improve code clarity:

12 (line 126): minimum normalized text length for near-duplicate detection

20 (line 134): length bucket divisor for grouping similar-length texts

5 (line 135): prefix length for bucketing

200 (line 155): maximum bucket size before skipping

20 (line 161): maximum length difference threshold

0.2 (line 165): maximum distance ratio for near-duplicates

These magic numbers represent important thresholds for the near-duplicate detection algorithm. Extracting them as named constants would make the code more maintainable and the algorithm's behavior more transparent.
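Naming the thresholds also makes one consequence of the bucketing easier to see: two texts are only compared when they share both a length bucket and a prefix. The sketch below uses hypothetical constant and helper names with the values from the reviewed function.

```python
from typing import Tuple

# Hypothetical names; the values mirror the literals in build_near_duplicate_groups.
MIN_NORMALIZED_LENGTH = 12   # shorter texts are skipped entirely
LENGTH_BUCKET_SIZE = 20      # divisor used to bucket texts by length
BUCKET_PREFIX_LENGTH = 5     # leading characters that must match exactly
MAX_BUCKET_ITEMS = 200       # larger buckets are skipped to bound pairwise cost
MAX_LENGTH_DIFFERENCE = 20   # pairs differing more than this are not compared
MAX_DISTANCE_RATIO = 0.2     # edit distance / longer length must not exceed this


def bucket_key(norm: str) -> Tuple[int, str]:
    """Return the (length bucket, prefix) key that gates pairwise comparison."""
    return (len(norm) // LENGTH_BUCKET_SIZE, norm[:BUCKET_PREFIX_LENGTH])


def is_near_duplicate(distance: int, len_a: int, len_b: int) -> bool:
    """Apply the length and distance-ratio thresholds to a precomputed distance."""
    if abs(len_a - len_b) > MAX_LENGTH_DIFFERENCE:
        return False
    return distance / max(len_a, len_b) <= MAX_DISTANCE_RATIO
```

For example, a 39-character text and a 40-character text fall into different length buckets, so they are never compared even though a single edit separates them. Whether that trade-off is acceptable depends on how the duplicate groups are used.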

Copilot · 2026-02-13T06:10:57Z

+        response = requests.get(
+            f"{base_url.rstrip('/')}/api/public/media/{media_id}",
+            auth=(public_key, secret_key),
+            timeout=60,
+        )
+        if response.status_code == 429:
+            retry_after = response.headers.get("Retry-After")
+            wait_seconds = float(retry_after) if retry_after else delay
+            time.sleep(wait_seconds)
+            delay = min(delay * 2, 30)
+            continue
+        response.raise_for_status()
+        payload = response.json()
+        return payload["url"]


The fetch_media_url function raises RuntimeError after exhausting retries, but it retries only on HTTP 429. Any other failure ends the retry loop immediately: a 5xx response makes response.raise_for_status() raise, and a network error (timeout, connection reset) is raised by requests.get itself, even though retrying might succeed.

Consider wrapping the request and response.raise_for_status() in a try-except block that catches requests.RequestException (the base class of requests.HTTPError), retrying on transient errors and raising only on the final failure.

Suggested change

-        response = requests.get(
-            f"{base_url.rstrip('/')}/api/public/media/{media_id}",
-            auth=(public_key, secret_key),
-            timeout=60,
-        )
-        if response.status_code == 429:
-            retry_after = response.headers.get("Retry-After")
-            wait_seconds = float(retry_after) if retry_after else delay
-            time.sleep(wait_seconds)
-            delay = min(delay * 2, 30)
-            continue
-        response.raise_for_status()
-        payload = response.json()
-        return payload["url"]
+        try:
+            response = requests.get(
+                f"{base_url.rstrip('/')}/api/public/media/{media_id}",
+                auth=(public_key, secret_key),
+                timeout=60,
+            )
+            if response.status_code == 429:
+                retry_after = response.headers.get("Retry-After")
+                wait_seconds = float(retry_after) if retry_after else delay
+                time.sleep(wait_seconds)
+                delay = min(delay * 2, 30)
+                continue
+            response.raise_for_status()
+            payload = response.json()
+            return payload["url"]
+        except (requests.HTTPError, requests.RequestException):
+            if attempt == retries - 1:
+                raise
+            time.sleep(delay)
+            delay = min(delay * 2, 30)
+            continue

Copilot · 2026-02-13T06:10:57Z

+def create_dataset_item(
+    base_url: str,
+    auth: Tuple[str, str],
+    payload: Dict[str, Any],
+    retries: int = 5,
+) -> None:
+    delay = 0.5
+    for attempt in range(retries):
+        response = requests.post(
+            f"{base_url.rstrip('/')}/api/public/dataset-items",
+            auth=auth,
+            json=payload,
+            timeout=30,
+        )
+        if response.status_code == 429:
+            retry_after = response.headers.get("Retry-After")
+            wait_seconds = float(retry_after) if retry_after else delay
+            time.sleep(wait_seconds)
+            delay = min(delay * 2, 10)
+            continue
+        if response.status_code in (200, 201):
+            return
+        if response.status_code in (400, 409):
+            return
+        response.raise_for_status()
+    raise RuntimeError("rate_limited dataset-items")


Similar to fetch_media_url in compute_audio_volume.py, the create_dataset_item function can raise exceptions from response.raise_for_status() that terminate the retry loop prematurely. If a transient 5xx error or network error occurs, the function raises immediately instead of retrying.

Consider wrapping response.raise_for_status() in a try-except block to handle transient errors gracefully and only fail after exhausting all retries.
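One way to structure this is a small retry wrapper that both Langfuse helpers could share. The sketch below is an assumption-laden illustration, not the scripts' code: `send_with_retry` and `TRANSIENT_STATUS` are hypothetical names, the request is injected as a zero-argument callable so the example stays free of third-party imports, and the set of status codes treated as transient is a guess.

```python
import time
from typing import Any, Callable, Optional

# ASSUMPTION: these status codes are worth retrying; tune for the real API.
TRANSIENT_STATUS = {429, 500, 502, 503, 504}


def send_with_retry(
    send: Callable[[], Any],
    retries: int = 5,
    initial_delay: float = 0.5,
    max_delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call send() until it returns a non-transient response or retries run out."""
    delay = initial_delay
    last_error: Optional[Exception] = None
    for attempt in range(retries):
        wait = delay
        try:
            response = send()
        except OSError as exc:
            # requests.RequestException derives from IOError (an alias of OSError),
            # so timeouts and connection errors from requests land here too.
            last_error = exc
        else:
            if response.status_code not in TRANSIENT_STATUS:
                return response
            last_error = RuntimeError(f"transient HTTP {response.status_code}")
            header = response.headers.get("Retry-After")
            try:
                wait = float(header) if header else delay
            except ValueError:
                # Retry-After may also be an HTTP date; fall back to backoff.
                wait = delay
        if attempt < retries - 1:
            sleep(wait)
            delay = min(delay * 2, max_delay)
    raise RuntimeError(f"gave up after {retries} attempts") from last_error
```

A caller would pass something like `lambda: requests.post(url, auth=auth, json=payload, timeout=30)` and then inspect the returned response, so non-transient errors such as 400 are still handled explicitly by the calling code.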

Copilot · 2026-02-13T06:10:57Z

+    if len(a) < len(b):
+        a, b = b, a
+    previous = list(range(len(b) + 1))
+    for i, ca in enumerate(a, start=1):
+        current = [i]
+        for j, cb in enumerate(b, start=1):
+            insert_cost = current[j - 1] + 1
+            delete_cost = previous[j] + 1
+            replace_cost = previous[j - 1] + (0 if ca == cb else 1)
+            current.append(min(insert_cost, delete_cost, replace_cost))
+        previous = current
+    return previous[-1]


The normalize_text and levenshtein_distance functions are duplicated in both run_audit.py and align_with_full_transcript.py. This code duplication violates the DRY principle and creates maintenance burden.

Consider extracting these shared functions into a common utility module that both scripts can import from. This ensures consistent text normalization and distance calculation logic across different parts of the audit tooling.

Suggested change

-    if len(a) < len(b):
-        a, b = b, a
-    previous = list(range(len(b) + 1))
-    for i, ca in enumerate(a, start=1):
-        current = [i]
-        for j, cb in enumerate(b, start=1):
-            insert_cost = current[j - 1] + 1
-            delete_cost = previous[j] + 1
-            replace_cost = previous[j - 1] + (0 if ca == cb else 1)
-            current.append(min(insert_cost, delete_cost, replace_cost))
-        previous = current
-    return previous[-1]
+    len_a = len(a)
+    len_b = len(b)
+    dp = [[0] * (len_b + 1) for _ in range(len_a + 1)]
+    for i in range(len_a + 1):
+        dp[i][0] = i
+    for j in range(len_b + 1):
+        dp[0][j] = j
+    for i in range(1, len_a + 1):
+        ca = a[i - 1]
+        for j in range(1, len_b + 1):
+            cb = b[j - 1]
+            cost = 0 if ca == cb else 1
+            dp[i][j] = min(
+                dp[i - 1][j] + 1,
+                dp[i][j - 1] + 1,
+                dp[i - 1][j - 1] + cost,
+            )
+    return dp[len_a][len_b]
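For the deduplication itself, the existing two-row implementation could be moved unchanged into a shared module that both scripts import. The sketch below assumes a hypothetical module such as `common.py` and a `(a: str, b: str) -> int` signature inferred from the call sites; `normalize_text` is omitted because its body does not appear in this thread.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Edit distance using two rows, so memory grows with the shorter string only."""
    if len(a) < len(b):
        # Keep b as the shorter string so each row is as small as possible.
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            replace_cost = previous[j - 1] + (0 if ca == cb else 1)
            current.append(min(insert_cost, delete_cost, replace_cost))
        previous = current
    return previous[-1]
```

The full-matrix variant in the suggested change computes the same distance but allocates `(len(a) + 1) * (len(b) + 1)` cells, so it changes memory use without addressing the duplication.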

Copilot · 2026-02-13T06:10:57Z

+    output_csv = meeting_dir / "transcriptions_classified_with_audio.csv"
+    fields = list(records[0].keys()) if records else []
+    if records:
+        with output_csv.open("w", encoding="utf-8", newline="") as handle:
+            handle.write(",".join(fields) + "\n")
+            for record in records:
+                row = []
+                for field in fields:
+                    value = record.get(field)
+                    if isinstance(value, list):
+                        value = "|".join(str(item) for item in value)
+                    elif isinstance(value, dict):
+                        value = json.dumps(value)
+                    elif value is None:
+                        value = ""
+                    text = str(value)
+                    if "," in text or "\n" in text or '"' in text:
+                        text = '"' + text.replace('"', '""') + '"'
+                    row.append(text)
+                handle.write(",".join(row) + "\n")


This script uses manual CSV writing (lines 242-257) instead of the more robust csv.DictWriter approach that is properly used in align_with_full_transcript.py. The PR description mentions "uses csv.DictWriter with full field union for robust CSV output" as a hardening improvement, but this script wasn't updated.

Manual CSV writing is more error-prone and harder to maintain. The implementation in align_with_full_transcript.py (lines 190-197) demonstrates the better approach using csv.DictWriter with proper field handling. Consider applying the same pattern here for consistency and robustness.
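A sketch of how the same pattern might look in this script, assuming the list and dict flattening rules from the manual writer should be kept. The helper names `flatten` and `write_records_csv` are hypothetical, and `lineterminator="\n"` is set only to match the line endings the manual writer produced.

```python
import csv
import json
from pathlib import Path
from typing import Any, Dict, List


def flatten(value: Any) -> str:
    """Convert a record value to the string form used by the manual writer."""
    if isinstance(value, list):
        return "|".join(str(item) for item in value)
    if isinstance(value, dict):
        return json.dumps(value)
    if value is None:
        return ""
    return str(value)


def write_records_csv(path: Path, records: List[Dict[str, Any]]) -> None:
    """Write records with a header built from the union of all keys."""
    if not records:
        return
    fields: List[str] = []
    for record in records:
        for key in record:
            if key not in fields:
                fields.append(key)  # preserve first-seen order
    with path.open("w", encoding="utf-8", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields, lineterminator="\n")
        writer.writeheader()
        for record in records:
            writer.writerow({key: flatten(record.get(key)) for key in fields})
```

Unlike the manual version, which takes its header from `records[0]` alone, the field union keeps columns that first appear in later records.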

github-actions · 2026-02-13T06:11:33Z

Visual regression report

No visual diffs detected.

Run: https://github.com/Chronote-gg/Chronote/actions/runs/21976613954

[AGENT] Add reusable hallucination audit toolkit and mitigation plan

bb013a2

Copilot AI review requested due to automatic review settings February 13, 2026 06:05

BASIC-BIT mentioned this pull request Feb 13, 2026

[AGENT] Add hallucination audit artifacts and docs #116

Closed

Copilot started reviewing on behalf of BASIC-BIT February 13, 2026 06:05

BASIC-BIT marked this pull request as draft February 14, 2026 21:36
