ahnaf015/cybersec-agentic-soc

Agentic SOC Orchestrator (MCP + Gemini) with Streamlit UI

An agentic SOC investigation workflow that orchestrates a multi-stage incident pipeline across:

  • Telemetry (cases + events)
  • Threat Intel (IOC enrichment)
  • Knowledge / Runbooks
  • Response controls (governed, human-in-the-loop actions)
  • Gemini LLM Analyst Notes (executive summary, hypotheses, safe query suggestions)

This repo includes two telemetry modes:

  • Dummy mode (synthetic demo): deterministic, easy to reproduce
  • IoT-23 mode (real IoT telemetry): Zeek-derived network telemetry from the IoT-23 dataset (processed into this project’s event schema)

Gemini never invokes tools directly. It produces advisory output only, and every tool call remains allowlisted and governed.


What this project demonstrates

  • Agentic SOC workflow with a staged state machine: triage → investigate → validate → recommend → respond
  • MCP tool orchestration across telemetry, threat intel, runbooks, and response services
  • Governed actions + audit trails (human-in-the-loop, logged)
  • LLM-assisted reasoning (Gemini):
    • Executive summary (“what happened”)
    • Hypotheses (“plausible narratives”)
    • Suggested next queries (advisory; validated/allowlisted; not blindly executed)

Screenshots

Streamlit UI

[Image: UI screenshot]


Architecture (high level)

Streamlit UI → calls → FastAPI Orchestrator → orchestrates MCP tools:

  • telemetry.get_case, telemetry.search_events (+ optional pivots)
  • threatintel.enrich / enrich_batch
  • knowledge.get_runbook
  • response.request_action(s) (governed + audited)
  • Gemini produces LLM Analysis, embedded into the final report
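To make the tool orchestration concrete, here is a rough sketch of what a JSON-RPC 2.0 request for one of the tools listed above might look like. It only builds the payload and sends nothing. The `tools/call` method name follows the public MCP convention and is an assumption about this project, as are the `build_tool_call` helper and the `case_id` argument name; check mcp_client.py for the actual wire format.

```python
import itertools
import json

# Monotonic request ids so responses can be matched to requests.
_ids = itertools.count(1)


def build_tool_call(tool: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request envelope for one tool invocation."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",  # assumed method name, not confirmed by this repo
        "params": {"name": tool, "arguments": arguments},
    }


# "case_id" is a hypothetical argument name used for illustration.
request = build_tool_call("telemetry.get_case", {"case_id": "1002"})
body = json.dumps(request)
```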

Outputs:

  • JSON response (structured result)
  • Markdown incident report
  • Logs: logs/agent.jsonl
  • Audit trail: logs/audit.jsonl

Repository layout

services/
  agent_orchestrator/
    app.py               # FastAPI API
    agent.py              # Core agent pipeline + guardrails + Gemini integration
    planner.py            # Deterministic planner (tools + state transitions)
    report.py             # Markdown report generator (includes LLM section)
    mcp_client.py         # JSON-RPC MCP client + logging + audit
    llm_gemini.py         # Gemini wrapper
    logging_config.py     # JSON logging config
    audit.py              # AuditTrail JSONL writer
  mcp_telemetry/          # Telemetry MCP service (dummy or IoT-23 backend)
  mcp_threatintel/        # Threat intel MCP service (demo)
  mcp_knowledge/          # Runbooks MCP service (demo)
  mcp_response/           # Response MCP service (demo / governed actions)
frontend/
  streamlit_app.py        # Streamlit UI
data/
  cases.json              # Dummy cases (demo mode)
  telemetry.jsonl         # Dummy telemetry events (demo mode)
  threat_intel.json       # Demo threat intel mappings
  runbooks.json           # Demo runbooks
  iot23/
    processed/            # IoT-23 processed outputs (cases.json + telemetry.jsonl)
    raw/
Dockerfile
Dockerfile.streamlit
docker-compose.yml
requirements.txt

Telemetry modes

A) Dummy mode (default)

Uses:

  • data/cases.json
  • data/telemetry.jsonl

B) IoT-23 mode (real IoT telemetry)

Uses processed files (generated by running the preprocessing script scripts/iot23_build_processed.py):

  • data/iot23/processed/cases.json
  • data/iot23/processed/telemetry.jsonl

Enable IoT-23 mode with:

TELEMETRY_BACKEND=iot23

Prerequisites

  • Python 3.11+
  • (Recommended) Docker + Docker Compose
  • (Optional) Kubernetes (Docker Desktop Kubernetes / local cluster)

Environment setup (.env)

Create a .env file in the repo root:

# Gemini
GEMINI_API_KEY=YOUR_KEY_HERE
GEMINI_MODEL=gemini-2.5-flash
ENABLE_LLM=true

# Telemetry backend
# dummy | iot23
TELEMETRY_BACKEND=dummy

# MCP endpoints (local)
MCP_TELEMETRY_URL=http://127.0.0.1:8011
MCP_THREATINTEL_URL=http://127.0.0.1:8012
MCP_RESPONSE_URL=http://127.0.0.1:8013
MCP_KNOWLEDGE_URL=http://127.0.0.1:8014

Notes:

  • .env is intentionally not committed (see .gitignore).
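  • Docker Compose reads .env automatically for variable substitution in docker-compose.yml. Containers receive these variables only when the service declares env_file: .env (or an equivalent environment: mapping).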
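As an illustration of how a service might consume these variables, the sketch below reads them from the process environment and validates the backend value. The `Settings` class and `load_settings` function are hypothetical. The defaults mirror the example .env above, except that `ENABLE_LLM` defaulting to false when unset and the case-insensitive backend match are assumptions made for this sketch.

```python
import os
from dataclasses import dataclass

_TRUE = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    telemetry_backend: str
    enable_llm: bool
    gemini_model: str
    mcp_urls: dict


def load_settings(env=None) -> Settings:
    """Read settings from `env` (defaults to os.environ) and validate them."""
    env = os.environ if env is None else env
    backend = env.get("TELEMETRY_BACKEND", "dummy").strip().lower()
    if backend not in {"dummy", "iot23"}:
        raise ValueError(f"unsupported TELEMETRY_BACKEND: {backend!r}")
    enable_llm = env.get("ENABLE_LLM", "false").strip().lower() in _TRUE
    if enable_llm and not env.get("GEMINI_API_KEY"):
        raise ValueError("ENABLE_LLM is true but GEMINI_API_KEY is missing")
    return Settings(
        telemetry_backend=backend,
        enable_llm=enable_llm,
        gemini_model=env.get("GEMINI_MODEL", "gemini-2.5-flash"),
        mcp_urls={
            "telemetry": env.get("MCP_TELEMETRY_URL", "http://127.0.0.1:8011"),
            "threatintel": env.get("MCP_THREATINTEL_URL", "http://127.0.0.1:8012"),
            "response": env.get("MCP_RESPONSE_URL", "http://127.0.0.1:8013"),
            "knowledge": env.get("MCP_KNOWLEDGE_URL", "http://127.0.0.1:8014"),
        },
    )
```

Failing fast on a bad backend name or a missing key surfaces configuration mistakes at startup rather than midway through an investigation.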

Run locally (no Docker)

1) Install deps

python -m venv .venv
# Windows:
.venv\Scripts\activate
# macOS/Linux:
source .venv/bin/activate

pip install -r requirements.txt

2) Start MCP services (in separate terminals)

Start each service (example pattern):

python -m uvicorn services.mcp_telemetry.app:app --host 127.0.0.1 --port 8011 --reload
python -m uvicorn services.mcp_threatintel.app:app --host 127.0.0.1 --port 8012 --reload
python -m uvicorn services.mcp_response.app:app --host 127.0.0.1 --port 8013 --reload
python -m uvicorn services.mcp_knowledge.app:app --host 127.0.0.1 --port 8014 --reload

3) Start FastAPI orchestrator

python -m uvicorn services.agent_orchestrator.app:app --host 127.0.0.1 --port 8020 --reload

Health:

  • http://127.0.0.1:8020/health

4) Start Streamlit UI (new terminal)

streamlit run frontend/streamlit_app.py --server.port 8502

Open:

  • http://localhost:8502

Run with Docker Compose

This setup runs:

  • FastAPI orchestrator (port 8020)
  • Streamlit UI (port 8502)

1) Build + run

docker compose up --build

2) Verify health

curl http://localhost:8020/health

Run an investigation

Option A: FastAPI directly

curl "http://localhost:8020/investigate/1002?simulate_response=true&requested_by=analyst"
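The same request can be issued from Python using only the standard library. This is a sketch that assumes the endpoint accepts GET with the two query parameters shown in the curl command; the helper names and the 120-second timeout are illustrative choices, not part of the project.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def investigate_url(base: str, case_id, simulate_response: bool = True,
                    requested_by: str = "analyst") -> str:
    """Build the /investigate/{case_id} URL with its query parameters."""
    query = urlencode({
        "simulate_response": str(simulate_response).lower(),
        "requested_by": requested_by,
    })
    return f"{base.rstrip('/')}/investigate/{quote(str(case_id), safe='')}?{query}"


def investigate(base: str, case_id, **kwargs) -> dict:
    """Call the orchestrator and return the decoded JSON response."""
    # LLM calls can take several seconds, so the timeout is generous.
    with urlopen(investigate_url(base, case_id, **kwargs), timeout=120) as resp:
        return json.load(resp)
```

With the orchestrator running, `investigate("http://localhost:8020", 1002)` would issue the same request as the curl command above.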

Option B: Streamlit UI

Open http://localhost:8502, enter a Case ID, click Investigate.


How to confirm Gemini is running

1) Look for log markers

Check logs/agent.jsonl for:

  • llm_call_start
  • llm_call_done (with ok: true)
  • llm_analysis_done

Example:

{"logger":"agent.core","msg":"llm_call_done","duration_ms":11246,"ok":true}
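Checking for these markers can be scripted. The sketch below scans JSONL lines and reports which markers were seen. It assumes every marker appears in the `msg` field, as in the example line above; that is confirmed only for `llm_call_done`. The `llm_markers` helper is hypothetical.

```python
import json

MARKERS = ("llm_call_start", "llm_call_done", "llm_analysis_done")


def llm_markers(lines) -> dict:
    """Return {marker: seen}; llm_call_done only counts when ok is true."""
    seen = {marker: False for marker in MARKERS}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or corrupted lines
        if not isinstance(record, dict):
            continue
        msg = record.get("msg")
        if msg == "llm_call_done":
            seen[msg] = seen[msg] or record.get("ok") is True
        elif msg in seen:
            seen[msg] = True
    return seen


# Usage against the real log file:
#     with open("logs/agent.jsonl", encoding="utf-8") as fh:
#         print(llm_markers(fh))
```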

2) Confirm in API response

The /investigate/{case_id} response includes:

  • result.summary.llm_analysis

And the Markdown report contains:

  • ## LLM Analysis (Gemini)

Logs and audit trail

  • Runtime + tool calls: logs/agent.jsonl
  • Tool calls + state transitions (audit): logs/audit.jsonl

If you run in Docker/Kubernetes and want logs persisted:

  • Use a volume mount for /app/logs (recommended for demos)

Guardrails / Safety model

  • Deterministic state machine workflow
  • Allowlisted tool calls only
  • Response actions are governed
    • simulate_response=true allows simulation without real execution
    • confidence threshold gates execution
  • Audit logs for every tool call + state transition
  • LLM outputs are advisory, not executable commands
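The gating rules in the list above might combine roughly as follows. This is an illustrative sketch, not the project's logic: the `gate_action` function, the `Decision` type, the mode names, and the 0.8 default threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    mode: str    # "simulate", "blocked", or "eligible"
    reason: str


def gate_action(confidence: float, simulate_response: bool,
                threshold: float = 0.8) -> Decision:
    """Decide how a requested response action should be handled."""
    if simulate_response:
        # Simulation always wins: nothing is executed for real.
        return Decision("simulate", "simulate_response=true; nothing is executed")
    if confidence < threshold:
        return Decision(
            "blocked",
            f"confidence {confidence:.2f} below threshold {threshold:.2f}",
        )
    # "eligible" still implies human approval before anything runs.
    return Decision("eligible", "confidence meets threshold")
```

Returning a reason with every decision gives the audit trail something to record for each state transition.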

IoT-23 dataset notes

This project can run on processed IoT-23 telemetry (Zeek logs transformed into the same event schema used by the agent).

  • The agent extracts IOCs (IPs/domains/hashes) from events
  • Pivot tools (telemetry.pivot, telemetry.pivot_domain) help explore related activity
  • Confidence can start low if only network connection logs are present; it improves with richer telemetry (DNS/HTTP/SSL, auth logs, process logs, EDR signals, etc.)
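IOC extraction of the kind described above can be approximated with regular expressions. The sketch below is a simplified stand-in for the agent's extractor, which this README does not show. The patterns and the `extract_iocs` name are assumptions, and the domain pattern will also match things like short file names, so a real extractor needs further filtering.

```python
import ipaddress
import re

IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
DOMAIN_RE = re.compile(
    r"\b(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}\b",
    re.IGNORECASE,
)
# MD5 (32), SHA-1 (40), and SHA-256 (64) hex digests.
HASH_RE = re.compile(r"\b(?:[a-f0-9]{64}|[a-f0-9]{40}|[a-f0-9]{32})\b", re.IGNORECASE)


def extract_iocs(text: str) -> dict[str, set[str]]:
    """Return candidate IPs, domains, and hashes found in `text`."""
    ips = set()
    for candidate in IPV4_RE.findall(text):
        try:
            # Reject out-of-range octets such as 999.1.1.1.
            ips.add(str(ipaddress.IPv4Address(candidate)))
        except ValueError:
            pass
    domains = {d.lower() for d in DOMAIN_RE.findall(text)}
    hashes = {h.lower() for h in HASH_RE.findall(text)}
    return {"ips": ips, "domains": domains, "hashes": hashes}
```

Lowercasing domains and hashes keeps later threat intel lookups from treating the same indicator as two different ones.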

Roadmap: next improvements

Already implemented / in-progress ideas:

  1. Better IOC extraction for domains (URLs/emails + normalization)
  2. Action deduplication to avoid repeated response actions
  3. Strict validator for LLM suggested_next_queries (allowlist tools + args schema)
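A strict validator for LLM-suggested queries could look something like the sketch below. The tool names come from this README, but the argument names, the `tool`/`args` suggestion shape, and the `validate_suggestion` helper are hypothetical and only illustrate the allowlist-plus-schema idea.

```python
# Allowlist mapping each permitted tool to its required arguments and types.
# Argument names are illustrative assumptions, not the project's schema.
ALLOWED_TOOLS = {
    "telemetry.search_events": {"case_id": str, "query": str},
    "telemetry.pivot": {"ioc": str},
    "threatintel.enrich": {"ioc": str},
}


def validate_suggestion(suggestion) -> tuple[bool, str]:
    """Accept a suggestion only if its tool and arguments match the allowlist."""
    if not isinstance(suggestion, dict):
        return False, "suggestion is not an object"
    tool = suggestion.get("tool")
    if not isinstance(tool, str) or tool not in ALLOWED_TOOLS:
        return False, f"tool not allowlisted: {tool!r}"
    args = suggestion.get("args", {})
    if not isinstance(args, dict):
        return False, "args is not an object"
    schema = ALLOWED_TOOLS[tool]
    unknown = sorted(set(args) - set(schema))
    if unknown:
        return False, f"unknown arguments: {unknown}"
    for name, expected in schema.items():
        if name not in args:
            return False, f"missing argument: {name}"
        if not isinstance(args[name], expected):
            return False, f"wrong type for argument: {name}"
    return True, "ok"
```

Rejecting unknown arguments, rather than ignoring them, prevents model output from passing unexpected parameters through to a tool.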

Top additional improvements:

  1. Richer tagging + confidence calibration
    Add heuristics that leverage IoT-23 labels (Benign vs Malicious) and connection features (rare ports, burst patterns, beaconing) to improve tags + confidence scores.
  2. Pivot explorer + timeline UI
    Add a timeline view (group by minute/hour) and an IOC pivot explorer to navigate events interactively in Streamlit.
  3. Observability + metrics
    Add Prometheus metrics for tool latency, error rate, LLM latency, and pipeline stage durations (great for Docker/Kubernetes demos).

Common issues

Streamlit can’t reach FastAPI in Docker

Inside Docker, use the Compose service's DNS name instead of localhost:

  • AGENT_ORCH_URL=http://agent_orchestrator:8020

Confirm from inside the container:

docker compose exec streamlit_ui python -c "import os,requests; u=os.getenv('AGENT_ORCH_URL'); print(u, requests.get(u+'/health').json())"

Gemini “API key missing” in Docker

Make sure:

  • .env exists in repo root
  • GEMINI_API_KEY is present
  • env_file: .env is set for the service in docker-compose.yml

License

This project is licensed under the MIT License; see the LICENSE.txt file for details.


Contributing

Contributions are welcome!

  1. Fork the repository
  2. Create a feature branch
    git checkout -b feature/your-feature
  3. Make your changes (include clear documentation / comments)
  4. Run tests (if applicable)
    pytest
  5. Submit a Pull Request with a detailed description of what you changed and why

Support

For questions or issues:

  • Check logs:
    • logs/agent.jsonl (runtime + tool calls)
    • logs/audit.jsonl (audit trail + state transitions)
  • Open a GitHub issue and include:
    • steps to reproduce
    • expected vs actual behavior
    • relevant log lines (redact secrets)
    • your environment (OS, Python version, Docker version)
