Commit 08ad7cb

Merge pull request #3 from msitarzewski/v0.2.0
v0.2.0 — It Thinks Deeper
2 parents 4856ad8 + 8a6c5fc commit 08ad7cb

84 files changed

Lines changed: 10877 additions & 213 deletions

README.md

Lines changed: 34 additions & 9 deletions
@@ -22,21 +22,30 @@ duh ask "What database should I use for a new SaaS product?"
 ## Features
 
 - **Multi-model consensus** -- Claude and GPT debate. Sycophantic challenges are detected and flagged.
-- **Persistent memory** -- Every thread, contribution, and decision stored in SQLite. Search with `duh recall`.
+- **Voting protocol** -- Fan out to all models in parallel, aggregate answers via majority or weighted synthesis.
+- **Query decomposition** -- Break complex questions into subtask DAGs, solve in parallel, synthesize results.
+- **Decision taxonomy** -- Auto-classify decisions by intent, category, and genus for structured recall.
+- **Outcome tracking** -- Record success/failure/partial feedback on past decisions.
+- **Tool-augmented reasoning** -- Models can call web search, read files, and execute code during consensus.
+- **Persistent memory** -- Every thread, contribution, decision, vote, and subtask stored in SQLite. Search with `duh recall`.
 - **Cost tracking** -- Per-model token costs in real-time. Configurable warn threshold and hard limit.
 - **Local models** -- Ollama and LM Studio via the OpenAI-compatible API. Mix cloud + local.
-- **Docker** -- Run in a container with persistent volume storage.
 - **Rich CLI** -- Styled panels, spinners, and formatted output.
 
 ## Commands
 
 ```bash
-duh ask "question" # Run consensus query
-duh recall "keyword" # Search past decisions
-duh threads # List past threads
-duh show <thread-id> # Inspect full debate history
-duh models # List available models
-duh cost # Show cumulative costs
+duh ask "question" # Run consensus query
+duh ask "question" --decompose # Decompose into subtasks first
+duh ask "question" --protocol voting # Use voting protocol instead
+duh ask "question" --protocol auto # Auto-select protocol by question type
+duh ask "question" --tools # Enable tool use (web search, file read, code exec)
+duh feedback <thread-id> --result success # Record outcome for a decision
+duh recall "keyword" # Search past decisions
+duh threads # List past threads
+duh show <thread-id> # Inspect full debate history
+duh models # List available models
+duh cost # Show cumulative costs
 ```
 
 ## How consensus works
@@ -46,12 +55,28 @@ PROPOSE --> CHALLENGE --> REVISE --> COMMIT
 ```
 
 1. Strongest model proposes an answer
-2. Other models challenge with forced disagreement
+2. Other models challenge with forced disagreement (4 framing types: flaw, alternative, risk, devil's advocate)
 3. Proposer revises, addressing each valid challenge
 4. Decision extracted with confidence score and preserved dissent
 
 Convergence detection (Jaccard similarity >= 0.7) stops early when challenges repeat.
 
+### Voting protocol
+
+```
+FAN-OUT (all models) --> AGGREGATE (majority / weighted)
+```
+
+All models answer independently in parallel. A meta-judge (strongest model) picks the best answer (majority) or synthesizes all answers weighted by capability (weighted).
+
+### Decomposition
+
+```
+DECOMPOSE --> SCHEDULE (topological sort) --> SYNTHESIZE
+```
+
+Complex questions are broken into a subtask DAG. Independent subtasks run in parallel. Results are synthesized into a final answer by the strongest model.
+
 ## Phase 0 benchmark
 
 Before building duh, we validated the thesis: 50 questions, 4 methods, blind LLM-as-judge evaluation. Consensus consistently outperformed direct answers, self-debate, and ensemble approaches -- especially on questions requiring nuanced judgment and multi-perspective analysis. See [full benchmark results](docs/reference/benchmarks.md).
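
Reviewer sketch (not part of the commit): the README gives only the metric and the threshold for convergence detection (Jaccard similarity >= 0.7). One way such a check could look is shown below. The word-level tokenization, the function names `jaccard` and `has_converged`, and the rule that every new challenge must match some earlier one are all assumptions made for illustration, not the project's implementation.

```python
from __future__ import annotations


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the lowercase word sets of two texts."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 1.0  # two empty texts are treated as identical
    return len(words_a & words_b) / len(words_a | words_b)


def has_converged(
    previous: list[str], current: list[str], threshold: float = 0.7
) -> bool:
    """True when every current challenge closely matches a previous one.

    Assumption: "challenges repeat" means each new challenge has a
    similarity of at least `threshold` with some earlier challenge.
    """
    if not previous or not current:
        return False
    return all(
        max(jaccard(cur, prev) for prev in previous) >= threshold
        for cur in current
    )


round_1 = ["postgres lacks built-in horizontal scaling"]
round_2 = ["postgres lacks built-in horizontal scaling"]
round_3 = ["vendor lock-in is a long term risk"]

repeat_detected = has_converged(round_1, round_2)
new_ground = has_converged(round_1, round_3)
```

Here `repeat_detected` is true because the second round repeats the first, while `new_ground` is false because the third round raises a different objection, so the debate would continue.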
Lines changed: 134 additions & 0 deletions
@@ -0,0 +1,134 @@
+"""v0.1 baseline schema.
+
+Revision ID: 001
+Revises:
+Create Date: 2026-02-16
+"""
+
+from __future__ import annotations
+
+import sqlalchemy as sa
+from alembic import op
+
+revision: str = "001"
+down_revision: str | None = None
+branch_labels: tuple[str, ...] | None = None
+depends_on: str | None = None
+
+
+def upgrade() -> None:
+    op.create_table(
+        "threads",
+        sa.Column("id", sa.String(36), primary_key=True),
+        sa.Column("question", sa.Text(), nullable=False),
+        sa.Column("status", sa.String(20), nullable=False, server_default="active"),
+        sa.Column("created_at", sa.DateTime(), nullable=False),
+        sa.Column("updated_at", sa.DateTime(), nullable=False),
+    )
+    op.create_index("ix_threads_status", "threads", ["status"])
+    op.create_index("ix_threads_created_at", "threads", ["created_at"])
+
+    op.create_table(
+        "turns",
+        sa.Column("id", sa.String(36), primary_key=True),
+        sa.Column(
+            "thread_id",
+            sa.String(36),
+            sa.ForeignKey("threads.id"),
+            nullable=False,
+        ),
+        sa.Column("round_number", sa.Integer(), nullable=False),
+        sa.Column("state", sa.String(20), nullable=False),
+        sa.Column("created_at", sa.DateTime(), nullable=False),
+        sa.Column("completed_at", sa.DateTime(), nullable=True),
+    )
+    op.create_index("ix_turns_thread_id", "turns", ["thread_id"])
+    op.create_index(
+        "ix_turns_thread_round",
+        "turns",
+        ["thread_id", "round_number"],
+        unique=True,
+    )
+
+    op.create_table(
+        "contributions",
+        sa.Column("id", sa.String(36), primary_key=True),
+        sa.Column(
+            "turn_id",
+            sa.String(36),
+            sa.ForeignKey("turns.id"),
+            nullable=False,
+        ),
+        sa.Column("model_ref", sa.String(100), nullable=False),
+        sa.Column("role", sa.String(20), nullable=False),
+        sa.Column("content", sa.Text(), nullable=False),
+        sa.Column("input_tokens", sa.Integer(), nullable=False, server_default="0"),
+        sa.Column("output_tokens", sa.Integer(), nullable=False, server_default="0"),
+        sa.Column("cost_usd", sa.Float(), nullable=False, server_default="0.0"),
+        sa.Column("latency_ms", sa.Float(), nullable=False, server_default="0.0"),
+        sa.Column("created_at", sa.DateTime(), nullable=False),
+    )
+    op.create_index("ix_contributions_turn_id", "contributions", ["turn_id"])
+    op.create_index("ix_contributions_model_ref", "contributions", ["model_ref"])
+
+    op.create_table(
+        "turn_summaries",
+        sa.Column("id", sa.String(36), primary_key=True),
+        sa.Column(
+            "turn_id",
+            sa.String(36),
+            sa.ForeignKey("turns.id"),
+            unique=True,
+            nullable=False,
+        ),
+        sa.Column("summary", sa.Text(), nullable=False),
+        sa.Column("model_ref", sa.String(100), nullable=False),
+        sa.Column("created_at", sa.DateTime(), nullable=False),
+    )
+
+    op.create_table(
+        "thread_summaries",
+        sa.Column("id", sa.String(36), primary_key=True),
+        sa.Column(
+            "thread_id",
+            sa.String(36),
+            sa.ForeignKey("threads.id"),
+            unique=True,
+            nullable=False,
+        ),
+        sa.Column("summary", sa.Text(), nullable=False),
+        sa.Column("model_ref", sa.String(100), nullable=False),
+        sa.Column("created_at", sa.DateTime(), nullable=False),
+    )
+
+    op.create_table(
+        "decisions",
+        sa.Column("id", sa.String(36), primary_key=True),
+        sa.Column(
+            "turn_id",
+            sa.String(36),
+            sa.ForeignKey("turns.id"),
+            unique=True,
+            nullable=False,
+        ),
+        sa.Column(
+            "thread_id",
+            sa.String(36),
+            sa.ForeignKey("threads.id"),
+            nullable=False,
+        ),
+        sa.Column("content", sa.Text(), nullable=False),
+        sa.Column("confidence", sa.Float(), nullable=False, server_default="0.0"),
+        sa.Column("dissent", sa.Text(), nullable=True),
+        sa.Column("created_at", sa.DateTime(), nullable=False),
+    )
+    op.create_index("ix_decisions_thread_id", "decisions", ["thread_id"])
+
+
+def downgrade() -> None:
+    op.drop_table("decisions")
+    op.drop_table("thread_summaries")
+    op.drop_table("turn_summaries")
+    op.drop_table("contributions")
+    op.drop_table("turns")
+    op.drop_table("threads")
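
Reviewer sketch (not part of the commit): the baseline migration puts a unique index on `turns (thread_id, round_number)` and a foreign key from `turns.thread_id` to `threads.id`. The snippet below mirrors only those two constraints in an in-memory SQLite database, using the standard library instead of Alembic, to show what they reject. The trimmed column list and the helper `try_insert` are illustrative assumptions. SQLite enforces foreign keys only when `PRAGMA foreign_keys = ON` is set on the connection, which the migration itself does not do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite ignores foreign keys unless this pragma is enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE threads (id TEXT PRIMARY KEY, question TEXT NOT NULL);
    CREATE TABLE turns (
        id TEXT PRIMARY KEY,
        thread_id TEXT NOT NULL REFERENCES threads(id),
        round_number INTEGER NOT NULL
    );
    CREATE UNIQUE INDEX ix_turns_thread_round
        ON turns (thread_id, round_number);
    """
)

conn.execute("INSERT INTO threads VALUES ('t1', 'Which database?')")
conn.execute("INSERT INTO turns VALUES ('a', 't1', 1)")
conn.execute("INSERT INTO turns VALUES ('b', 't1', 2)")


def try_insert(row: tuple) -> bool:
    """Hypothetical helper: report whether the turn row was accepted."""
    try:
        conn.execute("INSERT INTO turns VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


# A second round 1 in the same thread violates the unique index.
duplicate_round_ok = try_insert(("c", "t1", 1))
# A turn pointing at a thread that does not exist violates the foreign key.
orphan_turn_ok = try_insert(("d", "missing", 1))
```

Both inserts are rejected, so each thread keeps exactly one turn per round number.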

alembic/versions/002_v02_schema.py

Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
+"""v0.2 schema — taxonomy, outcomes, subtasks.
+
+Revision ID: 002
+Revises: 001
+Create Date: 2026-02-16
+"""
+
+from __future__ import annotations
+
+import sqlalchemy as sa
+from alembic import op
+
+revision: str = "002"
+down_revision: str = "001"
+branch_labels: tuple[str, ...] | None = None
+depends_on: str | None = None
+
+
+def upgrade() -> None:
+    # Add taxonomy columns to decisions
+    with op.batch_alter_table("decisions") as batch_op:
+        batch_op.add_column(sa.Column("intent", sa.String(50), nullable=True))
+        batch_op.add_column(sa.Column("category", sa.String(50), nullable=True))
+        batch_op.add_column(sa.Column("genus", sa.String(50), nullable=True))
+
+    # Outcomes table
+    op.create_table(
+        "outcomes",
+        sa.Column("id", sa.String(36), primary_key=True),
+        sa.Column(
+            "decision_id",
+            sa.String(36),
+            sa.ForeignKey("decisions.id"),
+            unique=True,
+            nullable=False,
+        ),
+        sa.Column(
+            "thread_id",
+            sa.String(36),
+            sa.ForeignKey("threads.id"),
+            nullable=False,
+        ),
+        sa.Column("result", sa.String(20), nullable=False),
+        sa.Column("notes", sa.Text(), nullable=True),
+        sa.Column("created_at", sa.DateTime(), nullable=False),
+        sa.Column("updated_at", sa.DateTime(), nullable=False),
+    )
+    op.create_index("ix_outcomes_thread_id", "outcomes", ["thread_id"])
+
+    # Subtasks table
+    op.create_table(
+        "subtasks",
+        sa.Column("id", sa.String(36), primary_key=True),
+        sa.Column(
+            "parent_thread_id",
+            sa.String(36),
+            sa.ForeignKey("threads.id"),
+            nullable=False,
+        ),
+        sa.Column(
+            "child_thread_id",
+            sa.String(36),
+            sa.ForeignKey("threads.id"),
+            nullable=True,
+        ),
+        sa.Column("label", sa.String(200), nullable=False),
+        sa.Column("description", sa.Text(), nullable=False),
+        sa.Column("dependencies", sa.Text(), nullable=False, server_default="[]"),
+        sa.Column(
+            "status",
+            sa.String(20),
+            nullable=False,
+            server_default="pending",
+        ),
+        sa.Column("sequence_order", sa.Integer(), nullable=False, server_default="0"),
+        sa.Column("created_at", sa.DateTime(), nullable=False),
+        sa.Column("updated_at", sa.DateTime(), nullable=False),
+    )
+    op.create_index(
+        "ix_subtasks_parent_thread_id", "subtasks", ["parent_thread_id"]
+    )
+
+
+def downgrade() -> None:
+    op.drop_table("subtasks")
+    op.drop_table("outcomes")
+    with op.batch_alter_table("decisions") as batch_op:
+        batch_op.drop_column("genus")
+        batch_op.drop_column("category")
+        batch_op.drop_column("intent")
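
Reviewer sketch (not part of the commit): the `subtasks.dependencies` column is a text field defaulting to `"[]"`, and the README describes scheduling subtasks by topological sort. Assuming that the column holds a JSON array of sibling subtask labels (the migration does not say so), rows could be grouped into parallel batches with the standard library's `graphlib`. The sample rows and the function name `schedule` are hypothetical.

```python
from __future__ import annotations

import json
from graphlib import TopologicalSorter

# Hypothetical subtask rows; only the two columns used here are shown.
rows = [
    {"label": "compare_costs", "dependencies": '["list_options"]'},
    {"label": "list_options", "dependencies": "[]"},
    {"label": "assess_scaling", "dependencies": '["list_options"]'},
    {
        "label": "recommend",
        "dependencies": '["compare_costs", "assess_scaling"]',
    },
]


def schedule(subtask_rows: list[dict]) -> list[list[str]]:
    """Group subtask labels into batches that can run in parallel."""
    # Map each label to the set of labels it depends on.
    graph = {
        row["label"]: set(json.loads(row["dependencies"]))
        for row in subtask_rows
    }
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for a stable order
        batches.append(ready)
        sorter.done(*ready)
    return batches


batches = schedule(rows)
# batches == [["list_options"], ["assess_scaling", "compare_costs"], ["recommend"]]
```

Every label within a batch depends only on earlier batches, so the members of one batch could be dispatched concurrently before the final synthesis step.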

alembic/versions/003_v02_votes.py

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+"""v0.2 schema -- votes table for voting protocol.
+
+Revision ID: 003
+Revises: 002
+Create Date: 2026-02-16
+"""
+
+from __future__ import annotations
+
+import sqlalchemy as sa
+from alembic import op
+
+revision: str = "003"
+down_revision: str = "002"
+branch_labels: tuple[str, ...] | None = None
+depends_on: str | None = None
+
+
+def upgrade() -> None:
+    op.create_table(
+        "votes",
+        sa.Column("id", sa.String(36), primary_key=True),
+        sa.Column(
+            "thread_id",
+            sa.String(36),
+            sa.ForeignKey("threads.id"),
+            nullable=False,
+        ),
+        sa.Column("model_ref", sa.String(100), nullable=False),
+        sa.Column("content", sa.Text(), nullable=False),
+        sa.Column("created_at", sa.DateTime(), nullable=False),
+    )
+    op.create_index("ix_votes_thread_id", "votes", ["thread_id"])
+
+
+def downgrade() -> None:
+    op.drop_table("votes")
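
Reviewer sketch (not part of the commit): the `votes` table stores one row per model answer, which matches the fan-out step of the voting protocol described in the README. A minimal version of that step might look like the following, with `ask_model` as a stand-in for a real provider call and `fan_out` as a hypothetical name. The dictionaries use the column names from the migration, and the aggregation by a meta-judge is left out.

```python
from __future__ import annotations

import asyncio
import uuid
from datetime import datetime, timezone


async def ask_model(model_ref: str, question: str) -> str:
    """Stand-in for a real provider call (hypothetical)."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return f"{model_ref} answer to: {question}"


async def fan_out(
    thread_id: str, question: str, model_refs: list[str]
) -> list[dict]:
    """Ask every model concurrently and shape the answers as vote rows."""
    # gather() returns results in the same order as its arguments.
    answers = await asyncio.gather(
        *(ask_model(ref, question) for ref in model_refs)
    )
    now = datetime.now(timezone.utc)
    return [
        {
            "id": str(uuid.uuid4()),  # 36 characters, fits String(36)
            "thread_id": thread_id,
            "model_ref": ref,
            "content": answer,
            "created_at": now,
        }
        for ref, answer in zip(model_refs, answers)
    ]


votes = asyncio.run(fan_out("t1", "Which database?", ["model-a", "model-b"]))
```

Each resulting dictionary has the shape of one `votes` row, so a later aggregation step could load all rows for a thread through the `ix_votes_thread_id` index.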
