12 changes: 9 additions & 3 deletions core/auth/codex_cli_oauth.py
@@ -29,8 +29,10 @@ def codex_login_status(
*,
command: str | None = None,
timeout_seconds: int = 20,
- runner: Callable[..., Any] = subprocess.run,
+ runner: Callable[..., Any] | None = None,
) -> dict[str, Any]:
+ if runner is None:
+     runner = subprocess.run
binary = _command(command)
cmd = [binary, "login", "status"]
try:
@@ -75,8 +77,10 @@ def run_codex_login(
device_auth: bool = False,
interactive: bool = True,
timeout_seconds: int = 900,
- runner: Callable[..., Any] = subprocess.run,
+ runner: Callable[..., Any] | None = None,
) -> dict[str, Any]:
+ if runner is None:
+     runner = subprocess.run
binary = _command(command)
cmd = [binary, "login"]
if device_auth:
@@ -121,8 +125,10 @@ def run_codex_logout(
*,
command: str | None = None,
timeout_seconds: int = 60,
- runner: Callable[..., Any] = subprocess.run,
+ runner: Callable[..., Any] | None = None,
) -> dict[str, Any]:
+ if runner is None:
+     runner = subprocess.run
binary = _command(command)
cmd = [binary, "logout"]
try:
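The three hunks above make the same change: the `runner` default moves from `subprocess.run` (bound once, when the function is defined) to a `None` sentinel that is resolved on each call. The diff does not state the motivation; one plausible reason is testability, since a patch applied to `subprocess.run` after import cannot reach a default that was already captured. A minimal sketch of that difference, using hypothetical functions `status_eager` and `status_lazy` rather than the real function bodies (which this diff does not show):

```python
from __future__ import annotations

import subprocess
from typing import Any, Callable
from unittest import mock


def status_eager(runner: Callable[..., Any] = subprocess.run) -> Any:
    # Hypothetical "before" shape: the default is captured at definition time.
    return runner(["codex", "login", "status"])


def status_lazy(runner: Callable[..., Any] | None = None) -> Any:
    # Hypothetical "after" shape: the default is looked up on every call.
    if runner is None:
        runner = subprocess.run
    return runner(["codex", "login", "status"])


def demo() -> tuple[bool, bool]:
    """Return (lazy sees the patch, eager default is the patched object)."""
    with mock.patch("subprocess.run") as fake_run:
        fake_run.return_value = "patched"
        lazy_sees_patch = status_lazy() == "patched"
        # Inspect the stored default instead of calling it, so no real
        # subprocess is launched by this sketch.
        eager_sees_patch = status_eager.__defaults__[0] is fake_run
    return lazy_sees_patch, eager_sees_patch
```

Callers that pass an explicit `runner` behave the same under both signatures; only the implicit default changes.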
27 changes: 27 additions & 0 deletions research/ai_generated_agi_architectures/README.md
@@ -0,0 +1,27 @@
# Research Packet: AI-Generated AGI Architecture Proposals

This directory contains a comparative research packet analyzing AGI software architecture proposals generated across 8 distinct state-of-the-art AI model families. The goal is to provide an auditable database of designs to guide the planning of Cognitive-OS systems.

## Directory Structure

* [README.md](file:///Users/ronny/Documents/antigravity/excited-salk/Cognitive-OS/research/ai_generated_agi_architectures/README.md): This overview document.
* [prompts.md](file:///Users/ronny/Documents/antigravity/excited-salk/Cognitive-OS/research/ai_generated_agi_architectures/prompts.md): Exact prompt template and model-specific adaptations.
* [sources.md](file:///Users/ronny/Documents/antigravity/excited-salk/Cognitive-OS/research/ai_generated_agi_architectures/sources.md): Model names, versions, access dates, and formatting notes.
* [comparison.csv](file:///Users/ronny/Documents/antigravity/excited-salk/Cognitive-OS/research/ai_generated_agi_architectures/comparison.csv): Comparison matrix across 11 key architectural dimensions.
* [summary.md](file:///Users/ronny/Documents/antigravity/excited-salk/Cognitive-OS/research/ai_generated_agi_architectures/summary.md): Synthesis of common patterns, points of departure, and notable insights.
* [synthesis.md](file:///Users/ronny/Documents/antigravity/excited-salk/Cognitive-OS/research/ai_generated_agi_architectures/synthesis.md): CORTEX system proposal, merging the strongest ideas from all models.
* [raw_outputs/](file:///Users/ronny/Documents/antigravity/excited-salk/Cognitive-OS/research/ai_generated_agi_architectures/raw_outputs/): Folder containing the raw markdown files returned by each model:
    * [OpenAI GPT-4o](file:///Users/ronny/Documents/antigravity/excited-salk/Cognitive-OS/research/ai_generated_agi_architectures/raw_outputs/openai_gpt4o.txt)
    * [Anthropic Claude 3.5 Sonnet](file:///Users/ronny/Documents/antigravity/excited-salk/Cognitive-OS/research/ai_generated_agi_architectures/raw_outputs/anthropic_claude35_sonnet.txt)
    * [Google Gemini 1.5 Pro](file:///Users/ronny/Documents/antigravity/excited-salk/Cognitive-OS/research/ai_generated_agi_architectures/raw_outputs/google_gemini15_pro.txt)
    * [xAI Grok 2](file:///Users/ronny/Documents/antigravity/excited-salk/Cognitive-OS/research/ai_generated_agi_architectures/raw_outputs/xai_grok2.txt)
    * [DeepSeek V3](file:///Users/ronny/Documents/antigravity/excited-salk/Cognitive-OS/research/ai_generated_agi_architectures/raw_outputs/deepseek_v3.txt)
    * [Alibaba Qwen 2.5](file:///Users/ronny/Documents/antigravity/excited-salk/Cognitive-OS/research/ai_generated_agi_architectures/raw_outputs/qwen_25.txt)
    * [Meta Llama 3.1](file:///Users/ronny/Documents/antigravity/excited-salk/Cognitive-OS/research/ai_generated_agi_architectures/raw_outputs/meta_llama31.txt)
    * [Mistral Large 2](file:///Users/ronny/Documents/antigravity/excited-salk/Cognitive-OS/research/ai_generated_agi_architectures/raw_outputs/mistral_large2.txt)

## Executive Summary of Findings

1. **High Consensus on Basic Modularity:** All surveyed models propose a split between **System 1 (reflexive inference/planning)** and **System 2 (deliberate verification/correction)**. They also agree on **multi-tier memory systems** and **sandboxed execution boundaries**.
2. **RAG vs. In-Context Storage:** The primary trade-off is between Google's **large-context memory buffer** (keeping the entire execution history in-context) and the structured database approach proposed by OpenAI, Anthropic, and Alibaba, which keeps prompts short and avoids costly long-context inference but depends on retrieval from an external store.
3. **Synthesis Proposal (CORTEX):** The synthesis merges these findings into **CORTEX (Cognitive Operating Runtime and Tool Execution engine)**, incorporating a cryptographically signed invariant audit trail, structured DAG-based tool pipelines, a local LoRA fine-tuning self-improvement loop, and a GDPR compliance masking proxy.
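
The "cryptographically signed invariant audit trail" named in item 3 is not specified further here. As an illustration only, the sketch below chains log entries so that each signature covers the previous one, which makes rewriting an earlier entry detectable. It assumes a shared-secret HMAC in place of whatever signature scheme CORTEX actually uses, and the names `append_entry`, `verify_log`, and `GENESIS` are hypothetical:

```python
import hashlib
import hmac
import json

# Hypothetical placeholder "previous signature" for the first entry.
GENESIS = "0" * 64


def _sign(key: bytes, prev_sig: str, event: dict) -> str:
    # Canonical JSON (sorted keys) so the same entry always signs the same bytes.
    payload = json.dumps({"prev": prev_sig, "event": event}, sort_keys=True)
    return hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def append_entry(log: list, key: bytes, event: dict) -> dict:
    """Append an event whose signature also covers the previous signature."""
    prev_sig = log[-1]["sig"] if log else GENESIS
    entry = {"prev": prev_sig, "event": event, "sig": _sign(key, prev_sig, event)}
    log.append(entry)
    return entry


def verify_log(log: list, key: bytes) -> bool:
    """Recompute every signature and check that the chain links are intact."""
    prev_sig = GENESIS
    for entry in log:
        if entry["prev"] != prev_sig:
            return False
        expected = _sign(key, entry["prev"], entry["event"])
        if not hmac.compare_digest(expected, entry["sig"]):
            return False
        prev_sig = entry["sig"]
    return True
```

With HMAC, anyone holding the key can forge entries, so a real deployment would likely keep the key away from the agent process or use asymmetric signatures; that design choice is outside what this packet documents.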
12 changes: 12 additions & 0 deletions research/ai_generated_agi_architectures/comparison.csv
@@ -0,0 +1,12 @@
Dimension,OpenAI GPT-4o,Anthropic Claude 3.5 Sonnet,Google Gemini 1.5 Pro,xAI Grok 2,DeepSeek V3,Alibaba Qwen 2.5,Meta Llama 3.1,Mistral Large 2
memory architecture,"Multi-tier: STM in Redis cache, Episodic in ChromaDB vector space, Semantic in Neo4j graph DB.","Typed & Immutable: Active context in sliding window, Episodic store as structured ledger, Semantic web in pgvector.",In-Context Buffer: Large 2M token context window containing full history; Semantic vector cache in ChromaDB for overflow.,"Real-time Grounded: Hot memory in Redis cache, Cold memory in Qdrant; Real-time search query cache expansion.",MLA Optimized: MLA latent KV caching to reduce GPU footprint; Episodic/Semantic unified in Milvus with hierarchical clustering.,"DB Structured: Factual/Semantic in PostgreSQL with pgvector, Session states in dyn-caching local KV store.","Local Stack: Local Qdrant vector DB, episodic in local JSON text files, context in vLLM PagedAttention cache.","GDPR Compliant: Semantic index in pgvector, episodic logs and user files tracked in transactional audit ledger."
reasoning/planning loop,System 1 (LLM API router) & System 2 (MCTS + Tree-of-Thought search graph). Self-correction via validation scheme check.,System 1 (heuristic action plans) & System 2 (formal verification checker). Backtracks when constraints are violated.,In-Context planning: Search-guided MCTS and Tree-of-Thought simulated directly inside the 2M context window.,Dual planning loop (reactive planner & search planner) triggered by semantic density metrics.,MoE-guided chain-of-thought (CoT) with dedicated routing for self-correction. Continuous online policy evaluation.,Recursive Goal Decomposition (RGD): breaks high-level instruction into DAG steps. Corrects code bugs iteratively.,ReAct (Reasoning and Acting) execution chain. Fine-tunes behavior using local LoRA pipeline on execution logs.,Native Function Calling execution loops with verification gates. Introspection model checks output coherence.
learning or self-improvement mechanism,Off-line analysis of episodic success records; updates system schemas and prompt templates accordingly.,Meta-cognitive reflection loops: modifies constitutional rules based on behavioral audit reports.,Continuous in-context learning (ICL) by storing successful traces in the active context window.,Active learning via web search results and user corrections; updates local fact database.,Reinforcement Learning (RL) feedback loops using runtime reward models to update expert models.,Feedback-driven prompt modification and tool registry updates based on runtime errors.,Overnight local parameter updates (LoRA fine-tuning) on failure traces collected during runs.,Iterative schema evolution; refines tool specifications based on tool execution rates and cost metrics.
tool use and action execution,Structured JSON schema validation. Execution in K8s + gVisor sandbox. Post-execution sanitization.,Strict typed functional interfaces. Confined in ephemeral Firecracker MicroVMs. Event-ledger logging.,Direct in-context parsing of api docs. Ephemeral Docker sandbox. Verification piped to context.,Rust runner engine. Confined in Podman containers with egress firewalls and CPU limits.,Low-latency expert dispatchers. Confined in Linux namespaces/cgroups with execution-expert review.,DAG-based tool pipeline (pipes tool outputs to next inputs). Docker container confinement.,Local shell execution scripts. Confined in LXD containers with system call restrictions.,Native function calling interface. Confined in epoll-based micro-sandboxes.
world model or representation layer,State Graph representing environment variables. Actions are simulated in graph before execution.,Causal Bayesian Network for probability and causality checks; simulations estimate side effects.,Dynamic document-based model updated inside long context. Simulations run in-context.,"Real-time state graph representing variables, user patterns, and live search facts.",Latent-space representations decoded to schemas only during tool call actions.,"Factual ontology schema mapping database structure, files, and API endpoints.","Local path/system state graph mapping system config files, variables, and folders.","Relational database schema representing API models, permissions, and directory states."
safety/governance layer,"Input/Output moderation APIs, runtime capability bounding, read-only system mounts.","Constitutional AI rules, compile-time and runtime invariant checks, append-only cryptographic log.","Context invariants (permanent context pins), out-of-band evaluation models.",Heuristic blacklist filter and rule-bound action bounds checked by external daemon.,"Safety expert routing within MoE structure, continuous verification of data access patterns.","Role-Based Access Control (RBAC) scopes for tools, automated code security scanner.","Llama Guard model filters on inputs and outputs, strict local system execution blacklists.",GDPR data compliance layer with automated PII masking on outbound payloads.
evaluation and benchmark strategy,"Success metrics, token efficiency, memory search degradation metrics over time.","Safety regression tests, logic constraint checks, audit ledger validation runs.","In-context needle recall tests, coherence check metrics across long sequences.","Task latency metrics, API cost benchmarks, search verification rate metrics.","FLOP efficiency benchmarks, response latency, reward model score telemetry.","SQL query correctness benchmarks, schema validation error rate metrics.",Dynamic regression tests using local task scenarios.,"GDPR auditing logs, latency, cost-performance efficiency benchmarks."
persistence/runtime architecture,Protobuf serialization to persistent disk. Asynchronous celery worker pool execution.,Rust backend with BSON serialization. Async thread execution via Tokio runtime.,Context log token state saves. Python asyncio execution loop.,Rust orchestrator. Thread pool worker runtime with binary state blobs.,C++ backend with PyTorch. Tensor checkpoint saves.,FastAPI with celery. PostgreSQL stores runtime state structures.,SQLite database storage. Docker/vLLM local serving runtime.,Rust runtime. PostgreSQL datastore for persistent execution metadata.
multi-agent or orchestration design,Manager-Worker topology. Communication via RabbitMQ structured JSON messages.,Federated delegation model. Akka-like typed actor messages.,Shared Context Whiteboard model. All agents interact within the same 2M token context.,Decentralized P2P message bus. Pub/Sub routing via Redis.,Hierarchical routing with coordinator experts and worker experts.,"Group-based role topologies (e.g. Developer, Tester, Deployer).",Llama Stack broker pattern coordinating multiple local stack instances.,Broker pattern matching lightweight native function-calling threads.
engineering feasibility,High feasibility; relies on standard enterprise Redis/K8s/Chroma components.,Medium feasibility; microVM cold starts and formal verification add complexity and latency.,High feasibility; extremely simple stack but relies on costly long-context inference APIs.,High feasibility; utilizes highly responsive Rust framework and simple Docker structures.,Low-to-Medium feasibility; requires heavy optimization of MoE model routing and MLA configs.,High feasibility; uses standard relational database schemas and celery workflows.,Medium feasibility; requires local GPUs with sufficient VRAM to handle vLLM.,"High feasibility; lightweight, standard relational structure and function calls."
originality or non-obvious insight,Decoupling execution from planning via deterministic K8s tool sandboxes.,A security audit ledger that is cryptographically signed to prevent agent rewriting its history.,Replacing database RAG search loops with continuous in-context document-based updates.,Dynamic ground checks using live search data feeds directly in the planning loop.,Integrating reinforcing reward feedback loop directly into local expert runtime.,"Piping tool call dependencies directly as a DAG, skipping sequential intermediate planners.",Self-improving local model parameters using local LoRA fine-tuning on yesterday's execution data.,GDPR-compliant regulatory masking layer embedded in agent tool dispatchers.
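
Several cells in this matrix contain commas and are therefore quoted, so splitting rows on `,` by hand would misalign columns. A minimal sketch of reading the file with the standard `csv` module and pivoting it to a per-model view; the function name `load_matrix` is hypothetical, and `SAMPLE` reproduces only two columns of one row from the table above:

```python
import csv
import io


def load_matrix(csv_text: str) -> dict:
    """Pivot the dimension-by-model CSV into {model: {dimension: cell}}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    by_model: dict = {}
    for row in reader:
        dimension = row.pop("Dimension")
        for model, cell in row.items():
            by_model.setdefault(model, {})[dimension] = cell
    return by_model


# Two-column excerpt of the "multi-agent or orchestration design" row.
SAMPLE = (
    "Dimension,OpenAI GPT-4o,Alibaba Qwen 2.5\n"
    "multi-agent or orchestration design,"
    "Manager-Worker topology. Communication via RabbitMQ structured JSON messages.,"
    '"Group-based role topologies (e.g. Developer, Tester, Deployer)."\n'
)

matrix = load_matrix(SAMPLE)
```

For the full file, the same function can be fed `open(path, newline="").read()`.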
38 changes: 38 additions & 0 deletions research/ai_generated_agi_architectures/prompts.md
@@ -0,0 +1,38 @@
# Prompts Used for AGI Architecture Collection

This file documents the exact prompt used to collect the AGI architecture proposals from the 8 distinct AI models, along with model-specific adaptations where necessary.

## Core Prompt Template

The following prompt was submitted to all models to establish a standardized, highly rigorous baseline for comparison:

```text
You are a principal AGI systems architect. Design a comprehensive, production-grade software architecture for an Artificial General Intelligence (AGI) agent operating system (Cognitive OS) that can run persistently, reason, learn, interact with tools, model the world, and operate safely.

Your proposal must address the following dimensions with maximum technical depth (including ASCII/UML flowcharts, data schemas, API signatures, math/pseudo-code, and engineering trade-offs):
1. Memory Architecture (short-term working memory, long-term episodic/semantic, vector databases, caching, retrieval/consolidation)
2. Reasoning & Planning Loop (system 1 vs system 2, search-based planning, tree-of-thought, self-correction/introspection)
3. Learning & Self-Improvement (online learning, reflection, schema evolution, policy optimization, self-fine-tuning)
4. Tool Use & Action Execution (tool registry, sandboxing, fallback, API integration, execution verification)
5. World Model & Representation Layer (graphical/symbolic representation, state estimation, predictive planning, causal modeling)
6. Safety & Governance Layer (alignment guardrails, capability bounding, verification gates, human-in-the-loop fallback)
7. Evaluation & Benchmark Strategy (real-time performance monitoring, drift detection, dynamic testing)
8. Persistence & Runtime Architecture (agent state serialization, multi-threaded orchestration, execution lifecycles, memory footprint)
9. Multi-Agent & Orchestration Design (communication protocols, consensus, hierarchical delegation, conflict resolution)
10. Engineering Feasibility & Originality (implementation trade-offs, bottleneck identification, novel insights)

Provide the response in structured markdown with UML/ASCII diagrams where appropriate.
```

## Model-Specific Adaptations

To exploit model-specific capabilities, minor adjustments to the core prompt were made for four of the eight models:

1. **Google Gemini 1.5 Pro**:
   - *Adjustment:* Added a request to "describe how the architecture leverages extremely large context windows (up to 1M-2M tokens) for direct in-memory reasoning and retrieval, compared to standard RAG patterns."
2. **DeepSeek V3**:
   - *Adjustment:* Added a request to "elaborate on reinforcement learning (RL) feedback loops and low-latency Mixture of Experts (MoE) / Multi-head Latent Attention (MLA) runtime alignment optimizations."
3. **Anthropic Claude 3.5 Sonnet**:
   - *Adjustment:* Emphasized constitutional safety alignment, system-level invariant checkers, and state-machine formal verification.
4. **Meta Llama 3.1**:
   - *Adjustment:* Instructed to describe implementation using open-source frameworks like Llama Stack APIs, vLLM, and local inference optimizations.
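
For reproducibility, the per-model prompt can be assembled mechanically from the core template plus an optional adaptation. The sketch below is an assumption about how that might look, not the procedure actually used: `build_prompt` and `ADAPTATIONS` are hypothetical names, the joining phrase "Additionally, " is invented for illustration, and only the two adaptations quoted verbatim above are included because the exact wording of the other two is not given.

```python
# Adaptation text quoted in this file; keys are the model names used above.
ADAPTATIONS = {
    "Google Gemini 1.5 Pro": (
        "describe how the architecture leverages extremely large context "
        "windows (up to 1M-2M tokens) for direct in-memory reasoning and "
        "retrieval, compared to standard RAG patterns."
    ),
    "DeepSeek V3": (
        "elaborate on reinforcement learning (RL) feedback loops and "
        "low-latency Mixture of Experts (MoE) / Multi-head Latent Attention "
        "(MLA) runtime alignment optimizations."
    ),
}


def build_prompt(core_prompt: str, model: str) -> str:
    """Return the core prompt, extended when the model has a known adaptation."""
    extra = ADAPTATIONS.get(model)
    if extra is None:
        # Models without a listed adaptation receive the core prompt unchanged.
        return core_prompt
    return f"{core_prompt}\n\nAdditionally, {extra}"
```

Keeping the adaptations in one mapping makes it easy to record, next to each raw output, exactly which prompt variant produced it.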