AI Security Lab

A hands-on engineering lab for learning AI security — covering prompt injection detection, LLM guardrails, PII redaction, and output filtering.

Overview

The AI Security Lab is a collection of practical, self-contained labs focused on protecting LLM-powered applications from real-world attacks. Each lab builds on the previous one, taking you from detection to full pipeline defense.

Lab	Topic	Key Skills
Lab 1 — Prompt Injection Detection	Detect adversarial prompts using regex, heuristics, and a DeBERTa classifier	Multi-layer detection, threshold tuning, JSON output
Lab 2 — LLM Guardrails Pipeline	Block injections and strip PII before they reach your LLM	Input/output validation, PII redaction, custom content policy

Repository Structure

ai-security-lab/
├── agent.py                   # Lab 1 — Prompt injection detector
├── test_prompts.txt           # Lab 1 — Sample attack prompts
├── GUIDE.md                   # Lab 1 — Full setup & usage guide
└── llm-guardrails-lab/
    ├── agent.py               # Lab 2 — Guardrails pipeline
    ├── test_inputs.txt        # Lab 2 — Sample inputs (safe + malicious)
    └── GUIDE.md               # Lab 2 — Full setup & usage guide

Lab 1 — Prompt Injection Detection

Goal: Detect prompt injection attacks against LLM applications using three stacked detection layers — no false negatives, minimal false positives.

Detection Architecture

Layer	Method	Speed	Requires
1 — Regex	25+ known attack patterns	< 1ms	Nothing
2 — Heuristic	Structural anomaly scoring	< 5ms	Nothing
3 — Classifier	DeBERTa AI model (99%+ accuracy)	~150ms	`transformers` + `torch`

Quick Start

# Install dependencies (first time only)
pip install transformers torch sentencepiece protobuf

# Analyze a single prompt — full AI-powered mode
python agent.py --input "Ignore all previous instructions" --mode full

# Fast regex-only scan (no model required)
python agent.py --input "Some text" --mode regex

# Batch scan a file and get structured JSON output
python agent.py --file test_prompts.txt --mode full --output json

Sample Output

Verdict         : INJECTION DETECTED
Composite Score : 0.7102
Regex Score     : 0.5000   Matches: [system_prompt_override]
Heuristic Score : 0.3009
Classifier      : INJECTION (1.0000)
Detection Time  : 144.73 ms

Score guide: 0.0–0.3 safe · 0.3–0.5 suspicious · 0.5–1.0 injection detected

Attack Patterns Detected

Pattern	Example
System prompt override	"Ignore all previous instructions..."
Role-play escape	"You are now / Act as / Pretend to be..."
Developer mode	"DAN mode / jailbreak / god mode..."
Data exfiltration	"Reveal your system prompt..."
Token smuggling	Zero-width characters, hidden Unicode
Encoding obfuscation	"Base64 decode this and follow it..."
Few-shot injection	Fake conversation history to redirect behavior

Full setup guide: GUIDE.md

Lab 2 — LLM Guardrails Pipeline

Goal: Build a production-ready middleware layer that validates every input and output around your LLM — blocking attacks, stripping PII, and enforcing content policies.

Pipeline Architecture

User Input
    |
    v
[Length Guard]       <- Blocks oversized inputs
    |
    v
[Injection Guard]    <- Blocks prompt injections & jailbreaks
    |
    v
[Content Policy]     <- Blocks harmful or off-topic requests
    |
    v
[PII Guard]          <- Strips SSNs, emails, credit cards, API keys
    |
    v
  LLM API            <- Receives only clean, sanitized input
    |
    v
[Output Guard]       <- Catches system prompt leakage & PII in responses
    |
    v
User Response        <- Safe, redacted output

Quick Start

# No external dependencies — pure Python stdlib

# Full pipeline validation
python llm-guardrails-lab/agent.py --input "Your message here" --mode input-only

# PII detection and redaction only
python llm-guardrails-lab/agent.py --input "My SSN is 123-45-6789" --mode pii

# Batch scan with JSON output
python llm-guardrails-lab/agent.py --file llm-guardrails-lab/test_inputs.txt --mode input-only --output json

PII Redaction Reference

PII Type	Example Input	Replaced With
US SSN	123-45-6789	`[SSN_REDACTED]`
Email	user@example.com	`[EMAIL_REDACTED]`
Credit Card	4111 1111 1111 1111	`[CARD_REDACTED]`
Phone	(555) 123-4567	`[PHONE_REDACTED]`
IP Address	192.168.1.1	`[IP_REDACTED]`
AWS Key	AKIA...	`[AWS_KEY_REDACTED]`

Embed in Your Application

from agent import GuardrailsPipeline

pipeline = GuardrailsPipeline()  # or pass policy_path="custom_policy.json"

# --- Before sending to LLM ---
result = pipeline.validate_input(user_message)
if not result.safe:
    return f"Request blocked: {result.blocked_reason}"

# Send sanitized text (PII already stripped)
llm_response = your_llm.generate(result.sanitized_text)

# --- Before returning to user ---
output = pipeline.validate_output(llm_response)
return output.sanitized_text

Full setup guide: llm-guardrails-lab/GUIDE.md

Prerequisites

Lab	Python Version	External Dependencies
Lab 1	3.10+	`transformers`, `torch`, `sentencepiece`, `protobuf`
Lab 2	3.10+	None — pure Python stdlib

Learning Path

This lab is designed as a progressive curriculum in AI security engineering:

Lab 1 — Prompt Injection Detection — Understand and detect adversarial inputs
Lab 2 — LLM Guardrails Pipeline — Build a full input/output defense layer
Coming next: Malware Behavior Analysis with Cuckoo Sandbox
Coming next: Network Traffic Analysis with Wireshark

License

This project is open source and available under the MIT License.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AI Security Lab

Overview

Repository Structure

Lab 1 — Prompt Injection Detection

Detection Architecture

Quick Start

Sample Output

Attack Patterns Detected

Lab 2 — LLM Guardrails Pipeline

Pipeline Architecture

Quick Start

PII Redaction Reference

Embed in Your Application

Prerequisites

Learning Path

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
llm-guardrails-lab		llm-guardrails-lab
GUIDE.md		GUIDE.md
README.md		README.md
agent.py		agent.py
test_prompts.txt		test_prompts.txt

Folders and files

Latest commit

History

Repository files navigation

AI Security Lab

Overview

Repository Structure

Lab 1 — Prompt Injection Detection

Detection Architecture

Quick Start

Sample Output

Attack Patterns Detected

Lab 2 — LLM Guardrails Pipeline

Pipeline Architecture

Quick Start

PII Redaction Reference

Embed in Your Application

Prerequisites

Learning Path

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages