
Add pre-execution shell command analysis#534

Open
jamiepine wants to merge 1 commit into main from command-analysis

Conversation

Member

@jamiepine jamiepine commented Apr 4, 2026

Summary

  • Adds a shell_analysis module that parses and categorizes shell commands before execution, assessing risk level, detecting dangerous patterns, and emitting UX metadata (duration hints, collapsed_by_default, expects_no_output)
  • Integrates the analyzer into the shell tool — risky commands are blocked pending confirmation, and ShellOutput now includes an analysis field for downstream UI/worker logic
  • Updates the shell tool prompt description and documents the mitigation for the verbose-output production failure case
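As a rough mental model of the metadata the summary describes, a hypothetical sketch follows. The field names (collapsed_by_default, expects_no_output, duration hints, risk level) are taken from the summary above, but the branch's actual type definitions may differ:

```rust
// Hypothetical sketch of the emitted analysis metadata; not the branch's
// actual definition, just the shape implied by the PR summary.
#[derive(Debug, PartialEq)]
enum RiskLevel {
    Safe,
    Caution,
    Dangerous,
}

#[derive(Debug)]
struct CommandAnalysis {
    risk_level: RiskLevel,
    duration_hint: Option<String>, // e.g. "seconds", "minutes"
    collapsed_by_default: bool,    // UI may collapse verbose output
    expects_no_output: bool,       // render "Done" instead of "[No output]"
    requires_confirmation: bool,   // risky commands block until confirmed
}

fn main() {
    let analysis = CommandAnalysis {
        risk_level: RiskLevel::Safe,
        duration_hint: None,
        collapsed_by_default: false,
        expects_no_output: true,
        requires_confirmation: false,
    };
    println!("{analysis:?}");
}
```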

Test plan

  • Verify safe commands (e.g. ls, cargo build) execute normally with correct analysis metadata
  • Verify risky/destructive commands (e.g. rm -rf /) are blocked with a confirmation error
  • Verify expects_no_output commands show Done instead of [No output] on success
  • Verify existing shell tool tests still pass
  • Verify that ShellOutput.analysis fields are populated correctly for various command categories

🤖 Generated with Claude Code

Introduces a command analysis layer that categorizes shell commands,
assesses risk, detects patterns, and emits metadata (category, risk
level, duration hint, UX flags) before execution. Risky commands are
blocked pending confirmation. Downstream UI can use the analysis to
collapse verbose output and render silent successes cleanly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor

coderabbitai bot commented Apr 4, 2026

Walkthrough

This PR introduces pre-execution shell command analysis by adding a new ShellAnalyzer that categorizes commands, detects security patterns, assesses risk levels, and computes execution metadata before running sandboxed shell operations. Analysis results are returned in ShellOutput to enable downstream UI handling of command visibility and output behavior.

Changes

Cohort / File(s) Summary
Documentation
docs/design-docs/production-worker-failures.md, prompts/en/tools/shell_description.md.j2
Updated design docs and shell tool description to describe command pre-execution analysis behavior, including metadata emission and command validation before execution.
Module Structure
src/tools.rs, src/tools/shell_analysis.rs
Added shell_analysis module with exports of CommandAnalysis, CommandCategory, DetectedPattern, DurationHint, PatternType, and RiskLevel types for external use.
Analysis Types & Parsing
src/tools/shell_analysis/types.rs, src/tools/shell_analysis/parser.rs
Defined serializable data models for analysis results (CommandAnalysis, CommandCategory, RiskLevel, DurationHint, PatternType, DetectedPattern) and implemented quote-aware shell command parsing with operator recognition, word splitting, and path normalization.
Analysis Detectors
src/tools/shell_analysis/categorizer.rs, src/tools/shell_analysis/security.rs
Implemented command categorization based on command type (search/read/list/write/destructive/network/silent) with output redirection awareness; added security pattern detection for command substitution, obfuscation, git commit injection, IFS injection, newlines, environment access, and exfiltration.
Analyzer Orchestration
src/tools/shell_analysis/analyzer.rs
Introduced ShellAnalyzer that chains parsing, categorization, and security detection; computes derived fields including risk level, duration hints, path traversal detection, and confirmation requirements for destructive/suspicious commands.
Shell Tool Integration
src/tools/shell.rs
Added ShellAnalyzer field to ShellTool, extended ShellOutput with analysis field, modified execution flow to analyze commands before running and return error if confirmation is required, updated output formatting to show "Done" for silent successful commands.
Formatting Adjustments
src/agent/channel.rs, src/config/load.rs
Restructured string concatenation and macro invocation formatting without logic changes.
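As a rough illustration of the quote-aware word splitting attributed to parser.rs above, a minimal version might look like the following. The function name and exact semantics are assumptions for illustration, not the crate's API:

```rust
// Minimal sketch of quote-aware word splitting: whitespace separates words
// only when we are outside single and double quotes.
fn split_words(input: &str) -> Vec<String> {
    let mut words = Vec::new();
    let mut current = String::new();
    let (mut in_single, mut in_double) = (false, false);
    for ch in input.chars() {
        match ch {
            '\'' if !in_double => in_single = !in_single,
            '"' if !in_single => in_double = !in_double,
            c if c.is_whitespace() && !in_single && !in_double => {
                if !current.is_empty() {
                    words.push(std::mem::take(&mut current));
                }
            }
            c => current.push(c),
        }
    }
    if !current.is_empty() {
        words.push(current);
    }
    words
}

fn main() {
    println!("{:?}", split_words(r#"grep "a b" file.txt"#));
}
```

A real parser would also handle backslash escapes and operators; this sketch only shows the quote-state idea.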

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Docstring Coverage — ⚠️ Warning: docstring coverage is 35.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Add pre-execution shell command analysis' directly and clearly summarizes the main change: introducing a new shell_analysis module for analyzing commands before execution.
Description check ✅ Passed The PR description clearly describes the main changes: adding shell_analysis module for command parsing/categorization, integrating it into the shell tool with risk assessment and UX metadata, and updating documentation.


}
}

let analysis = self.analyzer.analyze(&args.command, &working_dir);

Heads-up: analysis.requires_confirmation currently hard-blocks execution, but ShellArgs doesn't include any confirm/ack flag. If the intention is “ask user then re-run”, we probably need an explicit arg (default false) or a structured “needs_confirmation” error that carries analysis so the caller can display details and retry.
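The gating the reviewer proposes could look roughly like this. `ShellArgs` exists in the PR, but the `confirm` field and this exact shape are the reviewer's suggestion, not code from the branch:

```rust
// Illustrative sketch of a confirm flag on ShellArgs: block only when the
// analyzer asks for confirmation AND the caller has not opted in.
struct ShellArgs {
    command: String,
    confirm: bool, // hypothetical field, defaulting to false when omitted
}

struct Analysis {
    requires_confirmation: bool,
}

fn should_block(analysis: &Analysis, args: &ShellArgs) -> bool {
    analysis.requires_confirmation && !args.confirm
}

fn main() {
    let analysis = Analysis { requires_confirmation: true };
    let args = ShellArgs { command: "rm -rf target".into(), confirm: false };
    println!("blocked: {}", should_block(&analysis, &args));
}
```

The alternative the comment mentions — a structured "needs_confirmation" error carrying the analysis — would let the caller display details and retry instead of re-sending with a flag.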

continue;
}

if character == '\r' && !in_double_quote {

This flags carriage returns inside single quotes as risky (the condition only checks !in_double_quote). Probably want to treat \r like \n and ignore it in both quote modes.

Suggested change
-if character == '\r' && !in_double_quote {
+if character == '\r' && !in_single_quote && !in_double_quote {
     patterns.push(pattern(
         PatternType::CarriageReturn,
         "Command contains a carriage return outside quotes.",
     ));
     continue;
 }

}

fn base_command(words: &[String]) -> Option<String> {
let command_word = command_words(words).first()?;

Potential bypass: base_command is derived from the first command word, so wrappers like sudo rm -rf … / env rm -rf … get categorized as sudo/env (likely Other) and won't trigger destructive confirmation. Might be worth skipping common wrapper commands (+ their flags) to recover the underlying command for analysis.
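One way to sketch the suggested wrapper-skipping: walk the words past known wrapper executables, their flags, and env assignments until a real command token is found. The wrapper list and helper name here are assumptions for illustration:

```rust
// Sketch of recovering the underlying command behind common wrappers, so
// e.g. "sudo rm -rf target" analyzes as "rm" rather than "sudo".
fn effective_command(words: &[&str]) -> Option<String> {
    const WRAPPERS: [&str; 4] = ["sudo", "env", "nice", "nohup"];
    words
        .iter()
        .copied()
        .find(|w| {
            !WRAPPERS.contains(w)      // skip wrapper executables
                && !w.starts_with('-') // skip their leading flags
                && !w.contains('=')    // skip env assignments like FOO=1
        })
        .map(str::to_string)
}

fn main() {
    println!("{:?}", effective_command(&["sudo", "rm", "-rf", "target"]));
}
```

A production version would need per-wrapper flag handling (e.g. flags that take a value), but even this naive pass catches the `sudo rm` / `env rm` cases the comment describes.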

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/tools/shell_analysis/categorizer.rs`:
- Around line 31-46: DESTRUCTIVE_COMMANDS currently misses "rmdir" so rmdir is
classified as SILENT and skips confirmation; update the DESTRUCTIVE_COMMANDS
LazyLock initialization to include "rmdir" in the HashSet and also remove
"rmdir" from the SILENT_COMMANDS HashSet to avoid conflicting classification;
modify the static DESTRUCTIVE_COMMANDS and SILENT_COMMANDS definitions (refer to
symbols DESTRUCTIVE_COMMANDS and SILENT_COMMANDS in categorizer.rs) accordingly
so rmdir is treated as CommandCategory::Destructive.
- Around line 166-223: segment_semantics incorrectly assumes the subcommand is
words.get(1) and misses cases where wrappers (e.g., "env") or command-global
options (e.g., "-C", "--context") appear; add a helper (e.g.,
normalize_command_words) that takes the output of command_words(&segment.words)
and strips leading wrapper executables ("env", etc.), environment assignments,
and any leading global-style options (short/long flags and known global flags
like -C/--context) until the real subcommand token is reached; then replace the
current subcommand derivation (words.get(1).map(String::as_str)) with one based
on the normalized list and use that normalized list for checks in the "git",
"docker", "chmod" branches (and any flag helpers like
recursive_flag_present/force_flag_present/long_flag_present as needed).

In `@src/tools/shell.rs`:
- Around line 241-253: The analyzer is blocking "risky" commands with no way to
proceed; add a confirmation pathway by adding a confirm: bool field to ShellArgs
(or equivalent caller payload) and update the check in the function that calls
self.analyzer.analyze(&args.command, &working_dir) to only return the ShellError
when requires_confirmation is true AND args.confirm is false; if args.confirm is
true, allow execution to continue (or set a clear pending state), and ensure any
user-facing messages reflect that confirmation was used (use ShellError only for
unconfirmed attempts).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: caaa9a9b-ab61-425c-9be5-e819a6838369

📥 Commits

Reviewing files that changed from the base of the PR and between fb5c0f3 and 0268b9b.

📒 Files selected for processing (12)
  • docs/design-docs/production-worker-failures.md
  • prompts/en/tools/shell_description.md.j2
  • src/agent/channel.rs
  • src/config/load.rs
  • src/tools.rs
  • src/tools/shell.rs
  • src/tools/shell_analysis.rs
  • src/tools/shell_analysis/analyzer.rs
  • src/tools/shell_analysis/categorizer.rs
  • src/tools/shell_analysis/parser.rs
  • src/tools/shell_analysis/security.rs
  • src/tools/shell_analysis/types.rs

Comment on lines +49 to +87
fn detect_outside_workspace_paths(
&self,
parsed: &ParsedCommand,
working_dir: &Path,
) -> Vec<DetectedPattern> {
for segment in parsed.executable_segments() {
let words = command_words(&segment.words);
for word in words.iter().skip(1) {
if let Some(path) = resolve_candidate_path(working_dir, word)
&& !path.starts_with(&self.workspace)
{
return vec![DetectedPattern {
pattern_type: PatternType::OutsideWorkspacePath,
description: format!(
"Command references a path outside the workspace: {word}"
),
position: None,
}];
}
}
}

for segment in parsed.redirect_targets() {
for word in &segment.words {
if let Some(path) = resolve_candidate_path(working_dir, word)
&& !path.starts_with(&self.workspace)
{
return vec![DetectedPattern {
pattern_type: PatternType::OutsideWorkspacePath,
description: format!(
"Command redirects to a path outside the workspace: {word}"
),
position: None,
}];
}
}
}

Vec::new()

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, check the structure of the shell_analysis module
fd -t f "analyzer.rs" src/tools/shell_analysis/

Repository: spacedriveapp/spacebot

Length of output: 102


🏁 Script executed:

# Read the analyzer.rs file to understand resolve_candidate_path and the full implementation
cat -n src/tools/shell_analysis/analyzer.rs | head -300

Repository: spacedriveapp/spacebot

Length of output: 12190


🏁 Script executed:

# Also check if there are related files that might contain resolve_candidate_path
rg "resolve_candidate_path" src/tools/shell_analysis/ -A 10 -B 2

Repository: spacedriveapp/spacebot

Length of output: 3109


🏁 Script executed:

# Check lines 235-263 specifically mentioned in the comment
sed -n '235,263p' src/tools/shell_analysis/analyzer.rs

Repository: spacedriveapp/spacebot

Length of output: 796


Special-case device sinks in the outside-workspace guard; treat unresolved home/env expansions in write positions as confirmation-worthy.

resolve_candidate_path() rejects ~ and $-prefixed paths, but accepts absolute paths like /dev/null. This causes benign redirects such as 2>/dev/null to be flagged as dangerous off-workspace writes and require confirmation. Meanwhile, real off-workspace writes using ~/backup.txt or $HOME/backup.txt are silently ignored because they don't resolve. The guard should whitelist safe device sinks (/dev/null, /dev/stdout, /dev/stderr) and instead treat unresolved expansions in write/redirect positions as confirmation-worthy rather than ignoring them.
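The whitelist part of this suggestion could be sketched as a check that runs before the workspace-prefix test. The function names are illustrative, not the PR's actual code:

```rust
use std::path::Path;

// Sketch: treat standard device sinks as always-safe so redirects like
// 2>/dev/null are not flagged as off-workspace writes.
fn is_safe_device_sink(path: &Path) -> bool {
    matches!(
        path.to_str(),
        Some("/dev/null" | "/dev/stdout" | "/dev/stderr")
    )
}

fn flags_outside_workspace(path: &Path, workspace: &Path) -> bool {
    !is_safe_device_sink(path) && !path.starts_with(workspace)
}

fn main() {
    let ws = Path::new("/workspace");
    println!("{}", flags_outside_workspace(Path::new("/dev/null"), ws));
}
```

The other half of the suggestion — treating unresolved `~`/`$VAR` expansions in write positions as confirmation-worthy — would need a separate signal from the resolver, since those paths never reach the prefix check at all.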

Comment on lines +31 to +46
static DESTRUCTIVE_COMMANDS: LazyLock<HashSet<&'static str>> =
LazyLock::new(|| HashSet::from(["dd", "mkfs", "rm", "shred"]));

static NETWORK_COMMANDS: LazyLock<HashSet<&'static str>> = LazyLock::new(|| {
HashSet::from([
"apt", "apt-get", "brew", "bun", "curl", "ftp", "npm", "pip", "pip3", "pnpm", "rsync",
"scp", "sftp", "ssh", "telnet", "wget", "yarn",
])
});

static SILENT_COMMANDS: LazyLock<HashSet<&'static str>> = LazyLock::new(|| {
HashSet::from([
"cd", "chmod", "chgrp", "chown", "cp", "export", "ln", "mkdir", "mv", "rm", "rmdir",
"touch", "unset", "wait",
])
});

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's check the file structure and locate the categorizer
find . -name "categorizer.rs" -type f

Repository: spacedriveapp/spacebot

Length of output: 107


🏁 Script executed:

# Read the categorizer.rs file to understand the current implementation
wc -l src/tools/shell_analysis/categorizer.rs

Repository: spacedriveapp/spacebot

Length of output: 109


🏁 Script executed:

# Get the full content of the categorizer file to understand the structure
cat -n src/tools/shell_analysis/categorizer.rs | head -250

Repository: spacedriveapp/spacebot

Length of output: 9657


🏁 Script executed:

# Search for RiskLevel and CommandCategory definitions
rg "enum RiskLevel|enum CommandCategory" -A 10

Repository: spacedriveapp/spacebot

Length of output: 1196


🏁 Script executed:

# Check if there's any logic that handles commands appearing in multiple sets
rg "SILENT_COMMANDS|DESTRUCTIVE_COMMANDS" -B 2 -A 2

Repository: spacedriveapp/spacebot

Length of output: 1467


Add rmdir to DESTRUCTIVE_COMMANDS.

rmdir currently classifies as CommandCategory::Silent because it only appears in SILENT_COMMANDS, not DESTRUCTIVE_COMMANDS. This causes it to bypass the confirmation gate for destructive operations despite permanently removing filesystem state. The fix is straightforward:

Fix
 static DESTRUCTIVE_COMMANDS: LazyLock<HashSet<&'static str>> =
-    LazyLock::new(|| HashSet::from(["dd", "mkfs", "rm", "shred"]));
+    LazyLock::new(|| HashSet::from(["dd", "mkfs", "rm", "rmdir", "shred"]));
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 static DESTRUCTIVE_COMMANDS: LazyLock<HashSet<&'static str>> =
-    LazyLock::new(|| HashSet::from(["dd", "mkfs", "rm", "shred"]));
+    LazyLock::new(|| HashSet::from(["dd", "mkfs", "rm", "rmdir", "shred"]));
 static NETWORK_COMMANDS: LazyLock<HashSet<&'static str>> = LazyLock::new(|| {
     HashSet::from([
         "apt", "apt-get", "brew", "bun", "curl", "ftp", "npm", "pip", "pip3", "pnpm", "rsync",
         "scp", "sftp", "ssh", "telnet", "wget", "yarn",
     ])
 });
 static SILENT_COMMANDS: LazyLock<HashSet<&'static str>> = LazyLock::new(|| {
     HashSet::from([
         "cd", "chmod", "chgrp", "chown", "cp", "export", "ln", "mkdir", "mv", "rm", "rmdir",
         "touch", "unset", "wait",
     ])
 });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tools/shell_analysis/categorizer.rs` around lines 31 - 46,
DESTRUCTIVE_COMMANDS currently misses "rmdir" so rmdir is classified as SILENT
and skips confirmation; update the DESTRUCTIVE_COMMANDS LazyLock initialization
to include "rmdir" in the HashSet and also remove "rmdir" from the
SILENT_COMMANDS HashSet to avoid conflicting classification; modify the static
DESTRUCTIVE_COMMANDS and SILENT_COMMANDS definitions (refer to symbols
DESTRUCTIVE_COMMANDS and SILENT_COMMANDS in categorizer.rs) accordingly so rmdir
is treated as CommandCategory::Destructive.

Comment on lines +166 to +223
fn segment_semantics(
segment: &crate::tools::shell_analysis::parser::ParsedSegment,
) -> CommandSemantics {
let mut semantics = CommandSemantics::default();
let Some(base_command) = segment.base_command.as_deref() else {
return semantics;
};

if SEMANTIC_NEUTRAL_COMMANDS.contains(base_command) {
semantics.is_neutral = true;
return semantics;
}

semantics.is_search = SEARCH_COMMANDS.contains(base_command);
semantics.is_read = READ_COMMANDS.contains(base_command);
semantics.is_list = LIST_COMMANDS.contains(base_command);
semantics.is_write = WRITE_COMMANDS.contains(base_command);
semantics.is_destructive = DESTRUCTIVE_COMMANDS.contains(base_command);
semantics.is_network = NETWORK_COMMANDS.contains(base_command);
semantics.is_silent = SILENT_COMMANDS.contains(base_command);

let words = command_words(&segment.words);
let subcommand = words.get(1).map(String::as_str);

match base_command {
"chmod" => {
semantics.is_write = true;
semantics.is_silent = true;
if recursive_flag_present(words) {
semantics.is_destructive = true;
}
}
"chgrp" | "chown" => {
semantics.is_write = true;
semantics.is_silent = true;
}
"docker" => {
if matches!(subcommand, Some("build" | "compose" | "pull" | "push")) {
semantics.is_network = true;
}
}
"git" => match subcommand {
Some("checkout" | "switch") => {
semantics.is_silent = true;
}
Some("clean") if force_flag_present(words) => {
semantics.is_destructive = true;
semantics.is_silent = true;
}
Some("clone" | "fetch" | "pull" | "push" | "submodule") => {
semantics.is_network = true;
}
Some("reset") if long_flag_present(words, "--hard") => {
semantics.is_destructive = true;
semantics.is_silent = true;
}
_ => {}
},

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the categorizer.rs file to understand the full context
wc -l src/tools/shell_analysis/categorizer.rs

Repository: spacedriveapp/spacebot

Length of output: 109


🏁 Script executed:

# Read the segment_semantics function and surrounding context
head -230 src/tools/shell_analysis/categorizer.rs | tail -100

Repository: spacedriveapp/spacebot

Length of output: 3433


🏁 Script executed:

# Let's also look at the beginning to understand the constants
head -100 src/tools/shell_analysis/categorizer.rs

Repository: spacedriveapp/spacebot

Length of output: 3323


🏁 Script executed:

# Let's look at the parser.rs file to understand how ParsedSegment is structured
fd parser.rs src/tools/shell_analysis/

Repository: spacedriveapp/spacebot

Length of output: 100


🏁 Script executed:

# Check the parser file
wc -l src/tools/shell_analysis/parser.rs

Repository: spacedriveapp/spacebot

Length of output: 104


🏁 Script executed:

# Look at the ParsedSegment definition and base_command extraction
rg -A 10 "struct ParsedSegment" src/tools/shell_analysis/parser.rs

Repository: spacedriveapp/spacebot

Length of output: 401


🏁 Script executed:

# Search for base_command assignment in parser
rg "base_command" src/tools/shell_analysis/parser.rs -B 3 -A 3

Repository: spacedriveapp/spacebot

Length of output: 654


🏁 Script executed:

# Get the full base_command function
rg -A 20 "^fn base_command" src/tools/shell_analysis/parser.rs

Repository: spacedriveapp/spacebot

Length of output: 735


🏁 Script executed:

# Get the command_words function
rg -B 5 -A 15 "^pub\(crate\) fn command_words" src/tools/shell_analysis/parser.rs

Repository: spacedriveapp/spacebot

Length of output: 663


🏁 Script executed:

# Also check if there's any handling for common wrappers
rg -i "env|wrapper|global.option" src/tools/shell_analysis/parser.rs

Repository: spacedriveapp/spacebot

Length of output: 178


🏁 Script executed:

# Let's check if there's usage of CategorizationResult for confirmation/gating
rg "CategorizationResult" src/ -B 2 -A 5 | head -100

Repository: spacedriveapp/spacebot

Length of output: 4023


🏁 Script executed:

# Look for confirmation or gate-related logic
rg -i "confirm|gate|check.*destructive|check.*network" src/tools/shell_analysis/ -B 2 -A 2

Repository: spacedriveapp/spacebot

Length of output: 3917


🏁 Script executed:

# Let's see the full command_words definition and any wrapper handling
sed -n '1,100p' src/tools/shell_analysis/parser.rs | grep -A 30 "command_words"

Repository: spacedriveapp/spacebot

Length of output: 48


🏁 Script executed:

# Let's test the examples by tracing through the logic
# First, let's see what command_words does with our examples

cat > /tmp/test_parsing.py << 'EOF'
# Simulate command_words behavior

def is_env_assignment(word):
    if '=' not in word:
        return False
    name, _ = word.split('=', 1)
    if not name or '/' in name:
        return False
    if not name[0].isalpha() and name[0] not in ('_',):
        return False
    return all(c.isalnum() or c == '_' for c in name)

def first_command_word_index(words):
    for i, word in enumerate(words):
        if not is_env_assignment(word):
            return i
    return None

def command_words(words):
    idx = first_command_word_index(words)
    return words[idx:] if idx is not None else []

# Test examples
examples = [
    ["env", "FOO=1", "rm", "-rf", "target"],
    ["git", "-C", "repo", "reset", "--hard", "HEAD~1"],
    ["docker", "--context", "prod", "compose", "pull"],
]

for ex in examples:
    cw = command_words(ex)
    base = cw[0] if cw else None
    subcommand = cw[1] if len(cw) > 1 else None
    print(f"Command: {' '.join(ex)}")
    print(f"  base_command: {base}")
    print(f"  subcommand (words[1]): {subcommand}")
    print()
EOF
python /tmp/test_parsing.py

Repository: spacedriveapp/spacebot

Length of output: 343


🏁 Script executed:

# Let's also check what happens with "env" - is it in the base_command filtering?
# Let's check if "env" is handled as a wrapper
rg "env" src/tools/shell_analysis/ -i -B 2 -A 2 | head -50

Repository: spacedriveapp/spacebot

Length of output: 2881


🏁 Script executed:

# Check the full segment_semantics for git and docker handling
sed -n '166,324p' src/tools/shell_analysis/categorizer.rs

Repository: spacedriveapp/spacebot

Length of output: 5168


🏁 Script executed:

# Let's trace through what happens with git -C repo reset --hard
cat > /tmp/test_git.py << 'EOF'
# Simulate the categorizer logic for git -C repo reset --hard

def command_words(words):
    # Only filters env assignments, not flags
    result = []
    for word in words:
        if '=' not in word:
            result.append(word)
        else:
            # Check if it's an assignment
            name, _ = word.split('=', 1)
            if not name or '/' in name:
                result.append(word)
            else:
                continue
    return result if result else []

words = ["git", "-C", "repo", "reset", "--hard", "HEAD~1"]
cw = command_words(words)
base_command = cw[0] if cw else None
subcommand = cw[1] if len(cw) > 1 else None

print(f"Input: {' '.join(words)}")
print(f"command_words result: {cw}")
print(f"base_command: {base_command}")
print(f"subcommand (words[1]): {subcommand}")
print()
print("Expected to match 'git' with subcommand 'reset':")
print(f"  base_command == 'git': {base_command == 'git'}")
print(f"  subcommand == 'reset': {subcommand == 'reset'}")
print()
print("What actually happens:")
print(f"  Match fails because subcommand is '{subcommand}' not 'reset'")
EOF
python /tmp/test_git.py

Repository: spacedriveapp/spacebot

Length of output: 409


🏁 Script executed:

# Check if there are any tests that show the expected behavior with global options
rg "git.*-C\|docker.*--context\|env.*rm" src/tools/shell_analysis/ --type rs

Repository: spacedriveapp/spacebot

Length of output: 92


🏁 Script executed:

# Let's check if there's any wrapper or global option handling elsewhere
rg -B 5 -A 10 "skip.*option\|skip.*flag\|wrapper.*exec\|normalize.*command" src/tools/shell_analysis/

Repository: spacedriveapp/spacebot

Length of output: 48


🏁 Script executed:

# Let's verify the confirmation logic actually uses has_destructive and has_network
sed -n '1,200p' src/tools/shell_analysis/analyzer.rs | grep -A 20 "confirmation_reason"

Repository: spacedriveapp/spacebot

Length of output: 1903


🏁 Script executed:

# Test with an actual example - check test to see if it covers these cases
rg "test.*git\|test.*docker\|test.*env" src/tools/shell_analysis/categorizer.rs -A 5

Repository: spacedriveapp/spacebot

Length of output: 48


🏁 Script executed:

# Let's check if there are existing tests that would catch this bug
cat > /tmp/verify_bug.py << 'EOF'
# Test the three examples from the review to confirm they bypass detection

examples = {
    "env FOO=1 rm -rf target": {
        "expected": "Destructive",
        "actual_base": "env",
        "actual_subcommand": "FOO=1",
        "will_match": False,
        "reason": "env is not a destructive command and is in SEMANTIC_NEUTRAL_COMMANDS or not matched"
    },
    "git -C repo reset --hard HEAD~1": {
        "expected": "Destructive (via git reset --hard)",
        "actual_base": "git",
        "actual_subcommand": "-C",
        "will_match": False,
        "reason": "subcommand is -C not reset, so pattern doesn't match"
    },
    "docker --context prod compose pull": {
        "expected": "Network (via docker compose pull)",
        "actual_base": "docker",
        "actual_subcommand": "--context",
        "will_match": False,
        "reason": "subcommand is --context not compose, so pattern doesn't match"
    }
}

for cmd, info in examples.items():
    print(f"Command: {cmd}")
    print(f"  Expected: {info['expected']}")
    print(f"  Actual base_command: {info['actual_base']}")
    print(f"  Actual subcommand: {info['actual_subcommand']}")
    print(f"  Will trigger confirmation: {info['will_match']}")
    print(f"  Reason: {info['reason']}")
    print()
EOF
python /tmp/verify_bug.py

Repository: spacedriveapp/spacebot

Length of output: 803


🏁 Script executed:

# Check if "env" is in SEMANTIC_NEUTRAL_COMMANDS
rg "SEMANTIC_NEUTRAL_COMMANDS" src/tools/shell_analysis/categorizer.rs -A 1

Repository: spacedriveapp/spacebot

Length of output: 310


🏁 Script executed:

# Let's verify what category "env FOO=1 rm -rf target" actually gets categorized as
cat > /tmp/test_final.py << 'EOF'
# Simulate the full categorizer for "env FOO=1 rm -rf target"

# First segment would be just "env" with words ["env", "FOO=1", "rm", "-rf", "target"]
# OR it could be parsed differently

# Let's check: command_words skips env assignments
def is_env_assignment(word):
    if '=' not in word:
        return False
    name, _ = word.split('=', 1)
    if not name or '/' in name:
        return False
    return all(c.isalnum() or c == '_' for c in name)

def command_words(words):
    for i, word in enumerate(words):
        if not is_env_assignment(word):
            return words[i:]
    return []

# Test case 1: "env FOO=1 rm -rf target" as single segment
words1 = ["env", "FOO=1", "rm", "-rf", "target"]
cw1 = command_words(words1)
print("Case 1: env FOO=1 rm -rf target (single segment)")
print(f"  Input words: {words1}")
print(f"  command_words result: {cw1}")
print(f"  base_command: {cw1[0] if cw1 else None}")
print()

# But wait - let's check the actual code again
# base_command gets the FIRST command_word via Path::file_name
# So for ["env", "FOO=1", "rm", ...], command_words gives ["env", "FOO=1", "rm", ...]
# And base_command takes the first element "env"
EOF
python /tmp/test_final.py

Repository: spacedriveapp/spacebot

Length of output: 254


🏁 Script executed:

# Let's see exactly how the segments are created - are they split by pipes/redirects?
rg -B 5 -A 10 "executable_segments\|ParsedPart" src/tools/shell_analysis/parser.rs | head -80

Repository: spacedriveapp/spacebot

Length of output: 48


🏁 Script executed:

# Check the test to see if there are any tests for wrapped commands
rg "test_" src/tools/shell_analysis/categorizer.rs | head -20

Repository: spacedriveapp/spacebot

Length of output: 48


🏁 Script executed:

# Search for executable_segments definition
rg "executable_segments" src/tools/shell_analysis/ -B 2 -A 10

Repository: spacedriveapp/spacebot

Length of output: 9639


🏁 Script executed:

# Let's look at the full parser to understand how segments are split
sed -n '100,200p' src/tools/shell_analysis/parser.rs

Repository: spacedriveapp/spacebot

Length of output: 3628


🏁 Script executed:

# Check ParsedCommand structure and executable_segments
rg "impl.*ParsedCommand" src/tools/shell_analysis/parser.rs -A 20

Repository: spacedriveapp/spacebot

Length of output: 737


🏁 Script executed:

```shell
# Let's verify the final categorization for each example
cat > /tmp/final_verification.py << 'EOF'
# Full trace through categorization logic

SEMANTIC_NEUTRAL = {":", "echo", "false", "printf", "true"}
DESTRUCTIVE_COMMANDS = {"dd", "mkfs", "rm", "shred"}
NETWORK_COMMANDS = {"apt", "apt-get", "brew", "bun", "curl", "ftp", "npm", "pip", "pip3", "pnpm", "rsync", "scp", "sftp", "ssh", "telnet", "wget", "yarn", "docker"}

def categorize(base_cmd, subcommand):
    semantics = {
        'is_destructive': False,
        'is_network': False,
        'is_neutral': False,
    }

    if base_cmd in SEMANTIC_NEUTRAL:
        semantics['is_neutral'] = True
        return semantics

    semantics['is_destructive'] = base_cmd in DESTRUCTIVE_COMMANDS
    semantics['is_network'] = base_cmd in NETWORK_COMMANDS

    # Special handling for git
    if base_cmd == 'git':
        if subcommand == 'reset':  # Simplified - in reality needs --hard check
            semantics['is_destructive'] = True

    # Special handling for docker
    if base_cmd == 'docker':
        if subcommand in ['build', 'compose', 'pull', 'push']:
            semantics['is_network'] = True

    return semantics

examples = [
    ("env", "FOO=1", "env FOO=1 rm -rf target"),
    ("git", "-C", "git -C repo reset --hard HEAD~1"),
    ("docker", "--context", "docker --context prod compose pull"),
]

for base, sub, cmd in examples:
    sem = categorize(base, sub)
    print(f"Command: {cmd}")
    print(f"  base_command: {base}")
    print(f"  subcommand (words[1]): {sub}")
    print(f"  is_destructive: {sem['is_destructive']}")
    print(f"  is_network: {sem['is_network']}")
    if sem['is_destructive']:
        print(f"  → Requires confirmation (destructive)")
    elif sem['is_network']:
        print(f"  → May require confirmation (network)")
    else:
        print(f"  → NO confirmation (safe/other)")
    print()
EOF
python /tmp/final_verification.py
```

Repository: spacedriveapp/spacebot

Length of output: 579


🏁 Script executed:

```shell
# Check if network commands actually require confirmation
grep -A 30 "fn confirmation_reason" src/tools/shell_analysis/analyzer.rs
```

Repository: spacedriveapp/spacebot

Length of output: 995


🏁 Script executed:

```shell
# Final check: does network require confirmation?
grep -B 5 -A 20 "pattern_requires_confirmation" src/tools/shell_analysis/analyzer.rs | head -50
```

Repository: spacedriveapp/spacebot

Length of output: 1549


🏁 Script executed:

```shell
# Check if network alone requires confirmation or only destructive
rg "has_network" src/tools/shell_analysis/analyzer.rs
```

Repository: spacedriveapp/spacebot

Length of output: 103


🏁 Script executed:

```shell
# Get more context around has_network usage
rg "has_network" src/tools/shell_analysis/analyzer.rs -B 5 -A 5
```

Repository: spacedriveapp/spacebot

Length of output: 381


🏁 Script executed:

```shell
# Let's create a comprehensive test to confirm the bug with all three examples
cat > /tmp/comprehensive_test.py << 'EOF'
# Comprehensive verification of the review comment

# Based on code analysis:
# 1. confirmation_reason() only adds to reasons if has_destructive OR pattern-specific
# 2. has_network affects risk_level (Caution) but NOT confirmation requirement
# 3. Confirmation is ONLY required for destructive or certain patterns

print("=" * 70)
print("VERIFICATION OF REVIEW COMMENT")
print("=" * 70)
print()

examples = [
    {
        "cmd": "env FOO=1 rm -rf target",
        "expected": "Should require confirmation (destructive)",
        "base_command": "env",
        "subcommand": "FOO=1",
        "in_destructive_set": False,
        "has_special_handling": False,
        "result": "NO confirmation required",
        "bug": True,
    },
    {
        "cmd": "git -C repo reset --hard HEAD~1",
        "expected": "Should require confirmation (destructive git reset)",
        "base_command": "git",
        "subcommand": "-C",
        "subcommand_match": "reset",
        "subcommand_matches": False,
        "result": "NO confirmation required",
        "bug": True,
    },
    {
        "cmd": "docker --context prod compose pull",
        "expected": "Should have network risk",
        "base_command": "docker",
        "subcommand": "--context",
        "in_network_set": True,
        "subcommand_match": "compose",
        "subcommand_matches": False,
        "result": "Risk=Caution, but NO confirmation required (network alone doesn't confirm)",
        "bug": True,
        "note": "Network risk detected but confirmation not triggered",
    }
]

for i, ex in enumerate(examples, 1):
    print(f"{i}. {ex['cmd']}")
    print(f"   Expected: {ex['expected']}")
    print(f"   Result:   {ex['result']}")
    print(f"   Bug:      {'YES' if ex['bug'] else 'NO'}")
    if 'note' in ex:
        print(f"   Note:     {ex['note']}")
    print()

print("=" * 70)
print("CONCLUSION: Review comment is CORRECT")
print("=" * 70)
print()
print("Root cause: The categorizer assumes:")
print("  • base_command is always the effective executable")
print("  • subcommand is always words[1]")
print()
print("This fails when:")
print("  • Wrapper executables like 'env' are used")
print("  • Global options like '-C', '--context' precede the actual subcommand")
print()
print("Impact: Confirmation gate can be bypassed with common shell patterns")
EOF
python /tmp/comprehensive_test.py
```

Repository: spacedriveapp/spacebot

Length of output: 1289


Handle command-global options and wrapper executables in subcommand matching.

The matcher assumes `words[1]` always contains the subcommand, but common patterns bypass detection:

  • `env FOO=1 rm -rf target` → base=`"env"`, subcommand=`"FOO=1"` (no confirmation despite a destructive operation)
  • `git -C repo reset --hard` → base=`"git"`, subcommand=`"-C"` (fails to match `"reset"`, no confirmation)
  • `docker --context prod compose pull` → base=`"docker"`, subcommand=`"--context"` (fails to match `"compose"`)

The `command_words()` function correctly skips environment-variable assignments but does not skip command-global options (e.g., `-C`, `--context`), so subcommand matching fails and destructive commands bypass the confirmation gate. A helper that normalizes the argument list by skipping leading command-global options before extracting the actual subcommand would close this gap.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tools/shell_analysis/categorizer.rs` around lines 166 - 223,
segment_semantics incorrectly assumes the subcommand is words.get(1) and misses
cases where wrappers (e.g., "env") or command-global options (e.g., "-C",
"--context") appear; add a helper (e.g., normalize_command_words) that takes the
output of command_words(&segment.words) and strips leading wrapper executables
("env", etc.), environment assignments, and any leading global-style options
(short/long flags and known global flags like -C/--context) until the real
subcommand token is reached; then replace the current subcommand derivation
(words.get(1).map(String::as_str)) with one based on the normalized list and use
that normalized list for checks in the "git", "docker", "chmod" branches (and
any flag helpers like
recursive_flag_present/force_flag_present/long_flag_present as needed).
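The normalization the prompt describes can be sketched in miniature. All names here (`normalize_command_words`, `effective_subcommand`) and the wrapper/flag tables are illustrative assumptions, not the crate's actual constants or signatures:

```rust
// Hypothetical tables; the real categorizer would maintain its own lists.
const WRAPPERS: &[&str] = &["env", "nice", "nohup", "time"];
const GLOBAL_OPTS_WITH_VALUE: &[&str] = &["-C", "--context"]; // flags that consume a value

/// Strip leading env-style assignments (FOO=1) and wrapper executables so the
/// first remaining word is the effective base command.
fn normalize_command_words(words: &[String]) -> Vec<&str> {
    let mut rest: Vec<&str> = words.iter().map(String::as_str).collect();
    while let Some(&first) = rest.first() {
        let is_assignment = first.contains('=') && !first.starts_with('-');
        if is_assignment || WRAPPERS.contains(&first) {
            rest.remove(0);
        } else {
            break;
        }
    }
    rest
}

/// Skip leading global-style options after the base command and return the
/// first positional word as the subcommand.
fn effective_subcommand<'a>(words: &[&'a str]) -> Option<&'a str> {
    let mut i = 1;
    while i < words.len() {
        let w = words[i];
        if GLOBAL_OPTS_WITH_VALUE.contains(&w) {
            i += 2; // skip the flag and the value it consumes
        } else if w.starts_with('-') {
            i += 1; // skip a bare flag
        } else {
            return Some(w);
        }
    }
    None
}

fn main() {
    let cmd: Vec<String> = ["env", "FOO=1", "rm", "-rf", "target"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    let words = normalize_command_words(&cmd);
    println!("base = {:?}", words.first()); // Some("rm"), not Some("env")
    println!("sub  = {:?}", effective_subcommand(&words));
}
```

With this normalization, all three bypass examples resolve as intended: the `env` wrapper yields base `rm` (destructive), and the `-C`/`--context` flags no longer hide the `reset` and `compose` subcommands.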

Comment on lines +241 to +253
```rust
        let analysis = self.analyzer.analyze(&args.command, &working_dir);
        if analysis.requires_confirmation {
            return Err(ShellError {
                message: format!(
                    "Command requires confirmation: {}",
                    analysis
                        .confirmation_reason
                        .as_deref()
                        .unwrap_or("the command was flagged as risky before execution")
                ),
                exit_code: -1,
            });
        }
```

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
# Check if there's any confirmation bypass mechanism elsewhere in the codebase
rg -n "requires_confirmation|confirmation_reason" --type=rs -C3
```

Repository: spacedriveapp/spacebot

Length of output: 92


🏁 Script executed:

```shell
#!/bin/bash
# Search for confirmation-related code in Rust files
rg -n "requires_confirmation|confirmation_reason" --glob="*.rs" -C3
```

Repository: spacedriveapp/spacebot

Length of output: 5534


🏁 Script executed:

```shell
#!/bin/bash
# Also check the PR description or comments about the intended behavior
git log --oneline --all -n 20 | head -20
```

Repository: spacedriveapp/spacebot

Length of output: 1270


🏁 Script executed:

```shell
#!/bin/bash
# Check if there's a bypass flag or confirmation approval mechanism
rg -n "bypass|approval|confirm.*true|force" --glob="*.rs" src/tools/shell.rs -C2
```

Repository: spacedriveapp/spacebot

Length of output: 592


🏁 Script executed:

```shell
#!/bin/bash
# Check the function signature and args for the shell tool
rg -n "struct.*Args|pub.*confirm|confirmed.*execution" --glob="*.rs" src/tools/shell.rs -C5
```

Repository: spacedriveapp/spacebot

Length of output: 444


🏁 Script executed:

```shell
#!/bin/bash
# Check if there's a separate execution path or function for confirmed commands
rg -n "async fn.*execute|async fn.*call" --glob="*.rs" src/tools/shell.rs -B2 -A10
```

Repository: spacedriveapp/spacebot

Length of output: 718


🏁 Script executed:

```shell
#!/bin/bash
# Look for the full commit message of the most recent pre-execution analysis commit
git show --format=fuller 0268b9b | head -100
```

Repository: spacedriveapp/spacebot

Length of output: 3816


🏁 Script executed:

```shell
#!/bin/bash
# Check if there are any tests that show confirmation being handled
rg -n "requires_confirmation.*true|blocked.*confirmation" --glob="*.rs" -A10 -B2
```

Repository: spacedriveapp/spacebot

Length of output: 48


🏁 Script executed:

```shell
#!/bin/bash
# Look at more complete context around the error return to see if there's any other path
sed -n '230,260p' src/tools/shell.rs
```

Repository: spacedriveapp/spacebot

Length of output: 1243


🏁 Script executed:

```shell
#!/bin/bash
# Check if there's a bypass or force flag in any of the args or config
rg -n "force|bypass|override.*confirm|skip.*check" --glob="*.rs" src/tools/
```

Repository: spacedriveapp/spacebot

Length of output: 1835


Hard-blocked commands have no recovery pathway despite the commit message promising "pending confirmation."

The PR description states "Risky commands are blocked pending confirmation," but `ShellArgs` contains no confirmation flag, and the tool returns `ShellError` immediately with no mechanism to re-execute after approval. The LLM cannot proceed even if the user intends to run the flagged command.

Either add a `confirm: bool` parameter to allow bypassing the check on confirmed calls, or clarify that this is an intentional hard block and update the commit message accordingly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tools/shell.rs` around lines 241 - 253, The analyzer is blocking "risky"
commands with no way to proceed; add a confirmation pathway by adding a confirm:
bool field to ShellArgs (or equivalent caller payload) and update the check in
the function that calls self.analyzer.analyze(&args.command, &working_dir) to
only return the ShellError when requires_confirmation is true AND args.confirm
is false; if args.confirm is true, allow execution to continue (or set a clear
pending state), and ensure any user-facing messages reflect that confirmation
was used (use ShellError only for unconfirmed attempts).
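The gating change the prompt describes can be shown with a hedged mock: a `confirm` field on `ShellArgs` (an assumed addition) plus simplified stand-ins for the real `Analysis`/`ShellError` types, so only the gate logic is illustrated:

```rust
// Simplified stand-ins for the real types; only the gating logic matters here.
struct ShellArgs {
    command: String,
    confirm: bool, // assumed field: set true on a retried, user-approved call
}

struct Analysis {
    requires_confirmation: bool,
    confirmation_reason: Option<String>,
}

#[derive(Debug)]
struct ShellError {
    message: String,
    exit_code: i32,
}

/// Block only when the analyzer flags the command AND the caller has not
/// already confirmed; a confirmed retry proceeds.
fn gate(args: &ShellArgs, analysis: &Analysis) -> Result<(), ShellError> {
    if analysis.requires_confirmation && !args.confirm {
        return Err(ShellError {
            message: format!(
                "Command requires confirmation: {}",
                analysis
                    .confirmation_reason
                    .as_deref()
                    .unwrap_or("the command was flagged as risky before execution")
            ),
            exit_code: -1,
        });
    }
    Ok(())
}

fn main() {
    let analysis = Analysis {
        requires_confirmation: true,
        confirmation_reason: Some("destructive command".into()),
    };
    let first = ShellArgs { command: "rm -rf target".into(), confirm: false };
    let retry = ShellArgs { command: "rm -rf target".into(), confirm: true };
    println!("first attempt blocked: {}", gate(&first, &analysis).is_err()); // true
    println!("confirmed retry runs:  {}", gate(&retry, &analysis).is_ok());  // true
}
```

The first unconfirmed call surfaces the analyzer's reason to the LLM; the retried call with `confirm: true` passes the gate, which matches the "pending confirmation" behavior the commit message promises.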
