fix: auto-correct dash-to-underscore filename mismatches in path validation (#48)#161
fix: auto-correct dash-to-underscore filename mismatches in path validation (#48)#161williamp44 wants to merge 1 commit intobhauman:mainfrom
Conversation
…dation (bhauman#48) LLMs frequently request Clojure files using dashes (core-stuff.clj) when the filesystem uses underscores (core_stuff.clj). This adds transparent correction in validate-path-with-client so all tools (read_file, edit, grep, etc.) get the fix for free. Only corrects .clj/.cljs/.cljc/.cljd files. Directory components are preserved. Corrected path is re-validated for defense-in-depth (symlink protection).
📝 WalkthroughWalkthroughIntroduces defensive path validation with automatic dash-to-underscore filename correction for Clojure source files. When a file with dashes is not found, the system attempts to locate and return the underscore-corrected version instead, with corresponding test coverage for validation logic and tool integration. Changes
Sequence DiagramsequenceDiagram
actor Client
participant ReadFileTool as Read File Tool
participant PathValidator as Path Validator
participant FileSystem as File System
Client->>ReadFileTool: Request file with dashes<br/>(e.g., "core-stuff.clj")
ReadFileTool->>PathValidator: Validate path
PathValidator->>FileSystem: Check if path exists
FileSystem-->>PathValidator: Not found
PathValidator->>PathValidator: Extract extension<br/>(is Clojure source?)
alt Is Clojure file
PathValidator->>PathValidator: Replace dashes with<br/>underscores in filename
PathValidator->>FileSystem: Check corrected path
FileSystem-->>PathValidator: Found (e.g., "core_stuff.clj")
PathValidator-->>ReadFileTool: Return corrected path
else Not Clojure file
PathValidator-->>ReadFileTool: Return original path
end
ReadFileTool->>FileSystem: Read file content
FileSystem-->>ReadFileTool: File content
ReadFileTool-->>Client: Content + corrected filename
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Tip Migrating from UI to YAML configuration.Use the |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
test/clojure_mcp/utils/valid_paths_test.clj (1)
125-127: Use a unique temp directory name to avoid test collisions.Using a fixed temp dir can make this test flaky under parallel runs or failed prior cleanup.
Suggested change
- (let [test-dir (io/file (System/getProperty "java.io.tmpdir") "valid-paths-dash-test") + (let [test-dir (io/file (System/getProperty "java.io.tmpdir") + (str "valid-paths-dash-test-" (System/nanoTime))) canonical-dir (.getCanonicalPath test-dir)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/clojure_mcp/utils/valid_paths_test.clj` around lines 125 - 127, The test currently uses a fixed temp directory name via test-dir and canonical-dir which can collide in parallel runs; change creation of test-dir in the let binding (and any helper make-client that uses it) to use a unique temp directory (e.g., java.nio.file.Files/createTempDirectory or java.util.UUID/randomUUID appended to the name) and ensure any setup/teardown deletes the created temp directory to avoid leftover state between tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/clojure_mcp/utils/valid_paths.clj`:
- Around line 118-122: The validate-path flow currently returns only the
(possibly corrected) path string; update it and callers so dash-to-underscore
corrections are surfaced as structured metadata: have validate-path (or a new
helper used here) return a map like {:path validated-path :original
original-path :corrected? true/false :correction-source :dash-to-underscore}
when try-dash-to-underscore-correction is applied (use the existing
try-dash-to-underscore-correction, path-exists? and validate-path symbols to
detect correction), and modify read_file/format-results callers to consume this
map and include explicit fields (e.g. corrected_path, original_path, notice) in
the read_file response object instead of relying only on markdown text.
In `@test/clojure_mcp/tools/unified_read_file/tool_test.clj`:
- Around line 103-182: Add assertions to these tests to ensure the
dash→underscore correction is surfaced in tool metadata: after calling
tool-system/validate-inputs, tool-system/execute-tool, and
tool-system/format-results (and when using test-utils/make-tool-tester), assert
that the validated/result/formatted maps include the original requested path
(e.g. :original-path or similar), a corrected path (e.g. :path or
:corrected-path showing the actual file used), and an explicit flag/notice (e.g.
:corrected? true or :correction-notice) indicating auto-correction occurred;
update each relevant testing block (the ones that read corrected files and the
full pipeline block) to check these fields flow through validate-inputs →
execute-tool → format-results using the same symbol names used by the tool
implementation (unified-read-file-tool/create-unified-read-file-tool,
tool-system/validate-inputs, tool-system/execute-tool,
tool-system/format-results).
---
Nitpick comments:
In `@test/clojure_mcp/utils/valid_paths_test.clj`:
- Around line 125-127: The test currently uses a fixed temp directory name via
test-dir and canonical-dir which can collide in parallel runs; change creation
of test-dir in the let binding (and any helper make-client that uses it) to use
a unique temp directory (e.g., java.nio.file.Files/createTempDirectory or
java.util.UUID/randomUUID appended to the name) and ensure any setup/teardown
deletes the created temp directory to avoid leftover state between tests.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2341f1ff-28c5-4253-97ec-4b579fbf0732
📒 Files selected for processing (3)
src/clojure_mcp/utils/valid_paths.cljtest/clojure_mcp/tools/unified_read_file/tool_test.cljtest/clojure_mcp/utils/valid_paths_test.clj
| (let [validated (validate-path path current-dir allowed-dirs)] | ||
| (if (path-exists? validated) | ||
| validated | ||
| (or (try-dash-to-underscore-correction validated current-dir allowed-dirs) | ||
| validated))))) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify whether unified-read-file surfaces correction metadata/notice in its response contract.
# 1) Find path-validation callsites and nearby response construction.
rg -n -C4 'validate-path-with-client|validate-inputs|execute-tool|format-results' src/clojure_mcp/tools/unified_read_file test/clojure_mcp/tools/unified_read_file
# 2) Search for correction-related fields/strings in tool code and tests.
rg -n -C3 'corrected|correction|original-path|original_filename|notice|dash|underscore' src testRepository: bhauman/clojure-mcp
Length of output: 47980
🏁 Script executed:
# Check what execute-tool returns for unified-read-file
rg -n -A 20 'defmethod tool-system/execute-tool :unified-read-file' src/clojure_mcp/tools/unified_read_file/tool.clj | head -50
# Check the result structure being returned
rg -n -B 5 -A 10 'defn execute-tool\|execute-tool.*result' src/clojure_mcp/tools/unified_read_file/tool.clj | head -80
# Look at what's in the result map returned
rg -n ':result\|:path\|:error' src/clojure_mcp/tools/unified_read_file/tool.clj | head -40Repository: bhauman/clojure-mcp
Length of output: 1295
🏁 Script executed:
# Get the complete execute-tool method implementation
rg -n -A 100 'defmethod tool-system/execute-tool :unified-read-file' src/clojure_mcp/tools/unified_read_file/tool.clj | head -150
# Now check format-results to see what it includes
rg -n -A 30 'defmethod tool-system/format-results :unified-read-file' src/clojure_mcp/tools/unified_read_file/tool.cljRepository: bhauman/clojure-mcp
Length of output: 6567
🏁 Script executed:
# Check what format-raw-file returns to see if path is included
rg -n -B 5 -A 20 'defn format-raw-file' src/clojure_mcp/tools/unified_read_file/tool.clj
# Search for issue `#48` reference
rg -n '#48\|issue 48' src test
# Check if there's any correction metadata/flag being tracked anywhere
rg -n 'correction-notice\|corrected-path\|original-path\|was-corrected' src testRepository: bhauman/clojure-mcp
Length of output: 1400
🏁 Script executed:
# Search more broadly for correction references
rg -n 'correction\|corrected\|notice' src/clojure_mcp/tools/unified_read_file test/clojure_mcp/tools/unified_read_file
# Look at the actual response structure in tests - do they check for path metadata?
rg -n -B 5 -A 5 ':path\|:corrected\|:notice' test/clojure_mcp/tools/unified_read_file/tool_test.clj
# Check what the MCP tool response contract looks like
rg -n 'result\|response' src/clojure_mcp/tool_system.clj | head -40Repository: bhauman/clojure-mcp
Length of output: 45
🏁 Script executed:
# Check the final response structure more explicitly - what does the callback return?
cat -n src/clojure_mcp/tool_system.clj | grep -A 20 'tool-fn.*params.*callback'
# Look for any existing correction metadata handling
fd 'issue\|Issue' src test --type f -x cat {} \; | head -100
# Check unified-read-file responses in actual test assertions
rg -n 'result\|:error' test/clojure_mcp/tools/unified_read_file/tool_test.clj | grep -A 2 -B 2 'is ('Repository: bhauman/clojure-mcp
Length of output: 1438
Correction metadata should be surfaced in read_file responses.
The path validation transparently corrects dash-to-underscore mismatches but returns only the corrected path string without metadata. The downstream response from format-results includes the corrected path only as embedded text in markdown, not as an explicit flag or notice in the response structure. Per issue #48, read_file should include both the corrected filename and an explicit correction indicator (e.g., {"corrected_path": "...", "original_path": "...", "notice": "..."}) in its structured response, not just in the formatted markdown text.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/clojure_mcp/utils/valid_paths.clj` around lines 118 - 122, The
validate-path flow currently returns only the (possibly corrected) path string;
update it and callers so dash-to-underscore corrections are surfaced as
structured metadata: have validate-path (or a new helper used here) return a map
like {:path validated-path :original original-path :corrected? true/false
:correction-source :dash-to-underscore} when try-dash-to-underscore-correction
is applied (use the existing try-dash-to-underscore-correction, path-exists? and
validate-path symbols to detect correction), and modify read_file/format-results
callers to consume this map and include explicit fields (e.g. corrected_path,
original_path, notice) in the read_file response object instead of relying only
on markdown text.
| (deftest dash-to-underscore-correction-test | ||
| (testing "File with dashes requested, underscore version exists - reads successfully" | ||
| (let [underscore-file (io/file *test-dir* "core_stuff.clj") | ||
| _ (spit underscore-file "(ns core-stuff)\n(defn hello [] :world)") | ||
| tool-instance (unified-read-file-tool/create-unified-read-file-tool *nrepl-client-atom*) | ||
| dash-path (.getAbsolutePath (io/file *test-dir* "core-stuff.clj")) | ||
| validated (tool-system/validate-inputs tool-instance {:path dash-path}) | ||
| result (tool-system/execute-tool tool-instance validated) | ||
| formatted (tool-system/format-results tool-instance result)] | ||
| (is (not (:error formatted)) "Should successfully read the corrected file") | ||
| (is (some #(str/includes? % "core-stuff") (:result formatted)) | ||
| "Should contain the file content") | ||
| (.delete underscore-file))) | ||
|
|
||
| (testing "File with underscores requested directly - reads successfully" | ||
| (let [underscore-file (io/file *test-dir* "core_stuff.clj") | ||
| _ (spit underscore-file "(ns core-stuff)\n(defn hello [] :world)") | ||
| tool-instance (unified-read-file-tool/create-unified-read-file-tool *nrepl-client-atom*) | ||
| underscore-path (.getAbsolutePath underscore-file) | ||
| validated (tool-system/validate-inputs tool-instance {:path underscore-path}) | ||
| result (tool-system/execute-tool tool-instance validated) | ||
| formatted (tool-system/format-results tool-instance result)] | ||
| (is (not (:error formatted)) "Should successfully read the file") | ||
| (.delete underscore-file))) | ||
|
|
||
| (testing "Neither dash nor underscore version exists - returns normal error" | ||
| (let [tool-instance (unified-read-file-tool/create-unified-read-file-tool *nrepl-client-atom*) | ||
| non-existent-path (.getAbsolutePath (io/file *test-dir* "no-such-file.clj"))] | ||
| (is (thrown-with-msg? clojure.lang.ExceptionInfo | ||
| #"does not exist" | ||
| (tool-system/validate-inputs tool-instance {:path non-existent-path})) | ||
| "Should throw an error when neither file version exists"))) | ||
|
|
||
| (testing "Non-Clojure files (.java) - do NOT auto-correct" | ||
| (let [underscore-file (io/file *test-dir* "core_stuff.java") | ||
| _ (spit underscore-file "public class core_stuff {}") | ||
| tool-instance (unified-read-file-tool/create-unified-read-file-tool *nrepl-client-atom*) | ||
| dash-path (.getAbsolutePath (io/file *test-dir* "core-stuff.java"))] | ||
| (is (thrown-with-msg? clojure.lang.ExceptionInfo | ||
| #"does not exist" | ||
| (tool-system/validate-inputs tool-instance {:path dash-path})) | ||
| "Should NOT auto-correct non-Clojure file extensions") | ||
| (.delete underscore-file))) | ||
|
|
||
| (testing "Directory components with dashes - only filename corrected" | ||
| (let [dashed-dir (io/file *test-dir* "my-cool-dir") | ||
| _ (.mkdirs dashed-dir) | ||
| underscore-file (io/file dashed-dir "my_file.clj") | ||
| _ (spit underscore-file "(ns my-cool-dir.my-file)") | ||
| tool-instance (unified-read-file-tool/create-unified-read-file-tool *nrepl-client-atom*) | ||
| dash-path (.getAbsolutePath (io/file dashed-dir "my-file.clj")) | ||
| validated (tool-system/validate-inputs tool-instance {:path dash-path}) | ||
| result (tool-system/execute-tool tool-instance validated) | ||
| formatted (tool-system/format-results tool-instance result)] | ||
| (is (not (:error formatted)) "Should read file with dashed directory and corrected filename") | ||
| (is (str/includes? (:path validated) "my-cool-dir") | ||
| "Directory dashes should be preserved") | ||
| (.delete underscore-file) | ||
| (.delete dashed-dir))) | ||
|
|
||
| (testing "Full pipeline through make-tool-tester with dash correction" | ||
| (let [underscore-file (io/file *test-dir* "pipeline_test.clj") | ||
| _ (spit underscore-file "(ns pipeline-test)\n(defn greet [name] (str \"Hello \" name))") | ||
| tool-instance (unified-read-file-tool/create-unified-read-file-tool *nrepl-client-atom*) | ||
| tool-fn (test-utils/make-tool-tester tool-instance) | ||
| dash-path (.getAbsolutePath (io/file *test-dir* "pipeline-test.clj")) | ||
| result (tool-fn {:path dash-path})] | ||
| (is (not (:error? result)) "Full pipeline should succeed with dash correction") | ||
| (.delete underscore-file))) | ||
|
|
||
| (testing "Babashka (.bb) files - do NOT auto-correct" | ||
| (let [underscore-file (io/file *test-dir* "my_script.bb") | ||
| _ (spit underscore-file "(println :hello)") | ||
| tool-instance (unified-read-file-tool/create-unified-read-file-tool *nrepl-client-atom*) | ||
| dash-path (.getAbsolutePath (io/file *test-dir* "my-script.bb"))] | ||
| (is (thrown-with-msg? clojure.lang.ExceptionInfo | ||
| #"does not exist" | ||
| (tool-system/validate-inputs tool-instance {:path dash-path})) | ||
| "Should NOT auto-correct .bb files") | ||
| (.delete underscore-file)))) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify response-shape fields for unified-read-file and whether correction metadata exists.
# Inspect response map construction in the tool implementation.
rg -n -C5 'defn|format-results|execute|validate|:result|:error|:path|corrected|correction|notice|original' src/clojure_mcp/tools/unified_read_file
# Inspect existing tests for assertions on correction metadata/notice.
rg -n -C3 'dash-to-underscore|corrected|correction|notice|original|:path validated|format-results' test/clojure_mcp/tools/unified_read_file/tool_test.cljRepository: bhauman/clojure-mcp
Length of output: 50377
🏁 Script executed:
#!/bin/bash
# Search for correction-related metadata in response building
rg -n "correct|original|dash|underscore" src/clojure_mcp/tools/unified_read_file/tool.clj | head -30
# Check if there's any mention of what should be in the response for corrections
rg -n "corrected\|correction\|auto.*correct" src/clojure_mcp/tools/unified_read_file/
# Look at the valid_paths module to see what validate-path-with-client returns
find src -name "*valid*" -o -name "*path*" | head -20Repository: bhauman/clojure-mcp
Length of output: 149
🏁 Script executed:
#!/bin/bash
# Check the valid_paths implementation to understand path correction
find src -type f -name "*.clj" | xargs rg -l "validate-path-with-client" | head -5Repository: bhauman/clojure-mcp
Length of output: 271
🏁 Script executed:
#!/bin/bash
# Look for the actual implementation of dash-to-underscore correction
rg -n "dash\|underscore\|kebab" src/clojure_mcp/tools/unified_read_file/ -A5 -B5Repository: bhauman/clojure-mcp
Length of output: 45
🏁 Script executed:
#!/bin/bash
# Inspect valid_paths module
cat src/clojure_mcp/utils/valid_paths.cljRepository: bhauman/clojure-mcp
Length of output: 12403
🏁 Script executed:
#!/bin/bash
# Check if the tool response includes any correction metadata
rg -n "original\|corrected\|correction" src/clojure_mcp/tools/unified_read_file/tool.cljRepository: bhauman/clojure-mcp
Length of output: 45
🏁 Script executed:
#!/bin/bash
# Verify test expectations - what fields are being checked in format-results
sed -n '103,182p' test/clojure_mcp/tools/unified_read_file/tool_test.clj | grep -E "is |let \[|formatted|result"Repository: bhauman/clojure-mcp
Length of output: 1634
🏁 Script executed:
#!/bin/bash
# Check what fields validate-path-with-client actually returns
sed -n '78,120p' src/clojure_mcp/utils/valid_paths.cljRepository: bhauman/clojure-mcp
Length of output: 1922
🏁 Script executed:
#!/bin/bash
# Look at the full validate-inputs implementation to see all returned fields
sed -n '111,141p' src/clojure_mcp/tools/unified_read_file/tool.cljRepository: bhauman/clojure-mcp
Length of output: 1408
🏁 Script executed:
#!/bin/bash
# Check execute-tool return structure - does it include correction metadata?
sed -n '143,217p' src/clojure_mcp/tools/unified_read_file/tool.clj | grep -E ":path|:error|:original|:corrected|:mode" | head -20Repository: bhauman/clojure-mcp
Length of output: 602
🏁 Script executed:
#!/bin/bash
# Check format-results to see what fields are in final response
sed -n '275,299p' src/clojure_mcp/tools/unified_read_file/tool.cljRepository: bhauman/clojure-mcp
Length of output: 909
Add assertions verifying correction metadata is included in tool responses.
Tests confirm the dash-to-underscore correction works functionally, but they don't verify that the correction is communicated to the MCP client. The current implementation silently corrects file paths without surfacing this correction in the response.
Responses need to include correction metadata (corrected filename and indication that auto-correction occurred) so users and client tools understand what path was actually accessed. This aligns with the guideline to "Validate inputs and provide helpful error messages in MCP tool implementations."
Add assertions to verify:
- The corrected filename appears in the response metadata
- The response includes a flag or notice indicating auto-correction occurred (not just relying on implicit behavior)
Example assertions to include:
- Verify
:original-pathor similar field captures the requested path - Verify
:corrected?or:correction-noticeflag communicates the auto-correction to the client - Ensure this metadata flows through from
validate-inputs→execute-tool→format-results
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/clojure_mcp/tools/unified_read_file/tool_test.clj` around lines 103 -
182, Add assertions to these tests to ensure the dash→underscore correction is
surfaced in tool metadata: after calling tool-system/validate-inputs,
tool-system/execute-tool, and tool-system/format-results (and when using
test-utils/make-tool-tester), assert that the validated/result/formatted maps
include the original requested path (e.g. :original-path or similar), a
corrected path (e.g. :path or :corrected-path showing the actual file used), and
an explicit flag/notice (e.g. :corrected? true or :correction-notice) indicating
auto-correction occurred; update each relevant testing block (the ones that read
corrected files and the full pipeline block) to check these fields flow through
validate-inputs → execute-tool → format-results using the same symbol names used
by the tool implementation
(unified-read-file-tool/create-unified-read-file-tool,
tool-system/validate-inputs, tool-system/execute-tool,
tool-system/format-results).
|
Hi Bruce, i see Coderabbit comments about adding :corrected? or :correction-notice flag in the feature, but not clear this was what you wanted. let me know if any change is needed. |
Summary
-and return file content along with corrected filename. #48 — LLMs commonly request Clojure files using namespace-style dashes (core-stuff.clj) when the filesystem uses underscores (core_stuff.clj)validate-path-with-clientso all tools (read_file, edit, grep, glob, etc.) get the fix for free.clj,.cljs,.cljc,.cljdfiles — extensions that follow the Clojure namespace-to-filename conventionApproach
Per your review feedback: correction lives in
validate-path-with-clientinvalid_paths.clj, not in the read_file tool. Aftervalidate-pathreturns, if the file doesn't exist and has a Clojure source extension, tries the underscore version. The tool already returns the path it read, so the LLM sees the corrected filename without needing an explicit notice..cljdincluded per your confirmation — ClojureDart'sns-to-pathsincompiler.cljcuses the same dash-to-underscore mapping.Changes
utils/valid_paths.clj—clojure-source-ext?predicate,try-dash-to-underscore-correctionhelper, updatedvalidate-path-with-clientutils/valid_paths_test.clj— 13 unit test scenarios (including.cljd)tools/unified_read_file/tool_test.clj— 7 integration tests through full tool pipeline + recursive fixture cleanupKnown limitation
Correction applies to the filename only, not directory components. If an LLM sends
src/my-app/cool-module.cljdwhen the filesystem hassrc/my_app/cool_module.cljd, only the filename gets corrected (cool-module.cljd→cool_module.cljd). The directory mismatch (my-app/vsmy_app/) is not corrected. Directory correction could be a follow-up if needed.Test plan
.cljdsupport)Summary by CodeRabbit
New Features
Tests