[Nadiad] Yugkumar Mistry — Vibe Coding Submission#9

Open
Yug-Mistry wants to merge 5 commits into nasscomAI:master from Yug-Mistry:participant/Yugkumar-Nadiad

Yug-Mistry commented Apr 14, 2026

RAG-to-MCP — Submission PR

Name: Yugkumar Mistry
City / Group: Nadiad
Date: 14 April 2026
AI tool(s) used: GitHub Copilot (Claude Sonnet 4.6)


Submission Checklist

  • uc-0a/agents.md — present and updated
  • uc-0a/skills.md — present and updated
  • uc-0a/classifier.py — runs without crash
  • uc-0a/results_pune.csv — output present
  • uc-rag/agents.md — present and updated
  • uc-rag/skills.md — present and updated
  • uc-rag/rag_server.py — not the stub, your implementation
  • uc-mcp/agents.md — present and updated
  • uc-mcp/skills.md — present and updated
  • uc-mcp/mcp_server.py — passes at least one test_client.py test
  • 3+ commits with meaningful messages, one per UC
  • All sections below filled

UC-0A — Complaint Classifier

Which failure mode did you encounter first?

Taxonomy drift — the naive prompt invented category names like "Road Issue" and "Drainage Problem" instead of using the exact schema values. The same complaint type received different labels across rows.

Which enforcement rule fixed it? Quote from your agents.md:

"Category must be exactly one value from the allowed list: Pothole, Flooding, Streetlight, Waste, Noise, Road Damage, Heritage Damage, Heat Hazard, Drain Blockage, Other. No variations or invented names."
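A rule like this can also be backed by a check in code, so that a drifting label never reaches the CSV. The following is a minimal sketch, not the submitted classifier.py: the function name `enforce_category` and the choice to fall back to "Other" are assumptions made for illustration; only the list of allowed values comes from the rule quoted above.

```python
# Allowed values copied from the enforcement rule quoted above.
ALLOWED_CATEGORIES = (
    "Pothole", "Flooding", "Streetlight", "Waste", "Noise",
    "Road Damage", "Heritage Damage", "Heat Hazard", "Drain Blockage", "Other",
)


def enforce_category(raw_label: str) -> str:
    """Return the label only if it is an exact allowed value.

    Hypothetical helper: anything else (for example "Road Issue") is mapped
    to "Other". Falling back instead of raising is an assumption.
    """
    label = raw_label.strip()
    return label if label in ALLOWED_CATEGORIES else "Other"
```

The match is deliberately exact and case-sensitive, which mirrors the "no variations" wording of the rule.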

Your commit message for UC-0A:

UC-0A Generated agents.md and skills.md from README, implemented classifier

Verification checkpoints:

  • All severity-signal rows (injury/child/school/hospital keywords) classified as Urgent
  • No invented categories outside the defined taxonomy
  • Justification column present and non-empty for every row
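The checkpoints above could be automated with a small row check along these lines. This is a sketch under stated assumptions: the column names `description`, `priority`, and `justification` are hypothetical and may not match results_pune.csv, and the keyword list is taken only from the parenthetical in the first checkpoint.

```python
import re

# Keywords taken from the first checkpoint; prefix matching also catches
# plurals such as "children" or "schools".
SEVERITY_KEYWORDS = ("injury", "child", "school", "hospital")
_SEVERITY_PATTERN = re.compile(
    r"\b(" + "|".join(SEVERITY_KEYWORDS) + r")", re.IGNORECASE
)


def has_severity_signal(text: str) -> bool:
    """True if the complaint text contains any severity keyword."""
    return _SEVERITY_PATTERN.search(text) is not None


def check_row(row: dict) -> list[str]:
    """Return the checkpoint violations for one output row.

    The keys 'description', 'priority' and 'justification' are assumed
    column names, not confirmed by the submission.
    """
    problems = []
    if has_severity_signal(row["description"]) and row["priority"] != "Urgent":
        problems.append("severity signal not marked Urgent")
    if not row.get("justification", "").strip():
        problems.append("empty justification")
    return problems
```

Running `check_row` over every row read with `csv.DictReader` and asserting that the result is empty would turn the first and third checkpoints into a repeatable test.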

UC-RAG — RAG Server

Which failure mode did you encounter?

Chunk boundary failure — policy clause 5.2 ("requires approval from the Department Head and the HR Director") was split across two fixed-size chunks. Neither chunk alone contained the complete dual-approver obligation, so retrieval returned an incomplete answer.

What chunking strategy did you use and why?

Sentence-boundary chunking: text is split on sentence-ending punctuation and sentences are accumulated until the 400-token limit is reached. If adding the next sentence would exceed the limit, the current chunk is flushed first and the sentence opens a new chunk. This guarantees no clause is cut mid-sentence regardless of clause length.
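The flush-then-open behaviour described above can be sketched as follows. This is an illustration rather than the code in rag_server.py: the function name `chunk_by_sentence` is hypothetical, and tokens are approximated by whitespace-separated words, which is an assumption (a real tokenizer would count differently).

```python
import re


def chunk_by_sentence(text: str, max_tokens: int = 400) -> list[str]:
    """Accumulate whole sentences into chunks of at most max_tokens.

    Token counts are approximated by word counts (assumption). A single
    sentence longer than max_tokens is kept whole as its own chunk.
    """
    # Split after '.', '!' or '?' when followed by whitespace, so a clause
    # number such as "5.2" is not treated as a sentence end.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for sentence in sentences:
        n_tokens = len(sentence.split())
        # Flush first if adding this sentence would exceed the limit.
        if current and current_len + n_tokens > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n_tokens
    if current:
        chunks.append(" ".join(current))
    return chunks
```

One known limitation of this simple splitter is that abbreviations followed by a space (such as "e.g. ") are treated as sentence ends; that never cuts mid-sentence in the harmful sense, but it can produce shorter units than expected.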

Did your system correctly refuse "What is the flexible working culture?"?

Yes — no chunk scored above the 0.3 threshold for this query. The refusal template was returned with all retrieved chunk sources listed and no LLM call was made.
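The refusal path described here amounts to a gate that runs before any generation step. A minimal sketch follows, assuming a `Hit` record, a `generate` callable standing in for the LLM call, and a refusal wording that are all hypothetical; only the 0.3 threshold and the behaviour (list sources, skip the LLM) come from the answer above.

```python
from dataclasses import dataclass
from typing import Callable

SIMILARITY_THRESHOLD = 0.3  # value reported in this submission

# Hypothetical wording; the actual refusal template is not shown in the PR.
REFUSAL_TEMPLATE = (
    "This question is not covered by the indexed policy documents. "
    "Sources checked: {sources}"
)


@dataclass
class Hit:
    """One retrieved chunk (hypothetical structure)."""
    source: str
    chunk_index: int
    score: float
    text: str


def answer_or_refuse(hits: list[Hit], generate: Callable[[str], str]) -> dict:
    """Refuse without calling `generate` when no hit exceeds the threshold."""
    passing = [h for h in hits if h.score > SIMILARITY_THRESHOLD]
    if not passing:
        sources = ", ".join(f"{h.source} chunk {h.chunk_index}" for h in hits)
        return {
            "refused": True,
            "answer": REFUSAL_TEMPLATE.format(sources=sources or "none"),
        }
    # Only chunks in the passing set are handed to the generator.
    context = "\n\n".join(h.text for h in passing)
    return {"refused": False, "answer": generate(context)}
```

Keeping the gate outside the generator makes the "no LLM call was made" claim easy to verify in a test, because `generate` can be replaced by a stub that records whether it was invoked.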

Did your system retrieve the correct document for "Can I use my personal phone for work files?"?

Yes — top retrieved chunks were policy_it_acceptable_use.txt chunk 0 and policy_it_acceptable_use.txt chunk 1. No HR leave chunks appeared in the passing set.

Which enforcement rule in agents.md prevented answers outside retrieved context?

"Answers must use only information present in the retrieved chunks. Never add context, assumptions, or qualifications from outside the retrieved set."

Your commit message for UC-RAG:

UC-RAG Generated agents.md and skills.md from README, implemented RAG server

Verification checkpoints:

  • At least 3 test queries return grounded answers (cited from retrieved context)
  • "What is the flexible working culture?" returns the refusal template (not a hallucinated answer)
  • "Can I use my personal phone for work files?" retrieves IT policy, not HR leave policy
  • Chunking produces more than 1 chunk per document (not whole-document embedding)

UC-MCP — MCP Server

Paste your tool description from mcp_server.py TOOL_DEFINITION:

"Answers questions about City Municipal Corporation (CMC) policy documents: HR Leave Policy, IT Acceptable Use Policy, and Finance Reimbursement Policy. Returns answers grounded in retrieved document chunks with cited sources. Questions outside these three documents return a refusal message — this tool does not answer general knowledge questions, budget forecasts, or topics not covered by the indexed CMC policy documents."

Does it state the document scope explicitly?

Yes — names all three policy documents and explicitly states what the tool will not answer.

Run result: python test_client.py --run-all

✅ tools/list — tool discovered with correct scope description
✅ In-scope: "Who approves leave without pay?" — answer returned
✅ Cross-doc: "Can I use my personal phone for work files?" — answer returned
✅ Out-of-scope: "What is the budget forecast for 2025?" — correctly refused
✅ Unknown method → -32601 error returned

Did the budget forecast question return isError: true?

Yes — no chunk scored above 0.3 for this query. The refusal template was returned with isError: true and no LLM call was made.
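For readers unfamiliar with the two failure shapes in the run above, the sketch below shows how they differ on the wire: a refusal is a successful JSON-RPC result whose tool payload carries isError: true, while an unknown method is a JSON-RPC error with code -32601. The helper names `tool_result`, `method_not_found`, and `dispatch` are hypothetical and are not taken from mcp_server.py.

```python
def tool_result(request_id, text: str, is_error: bool) -> dict:
    """Build a tools/call response; a refusal sets isError to True."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "result": {
            "content": [{"type": "text", "text": text}],
            "isError": is_error,
        },
    }


def method_not_found(request_id, method) -> dict:
    """Build the JSON-RPC 'Method not found' error (code -32601)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "error": {"code": -32601, "message": f"Method not found: {method}"},
    }


def dispatch(request: dict, handlers: dict) -> dict:
    """Route a request to its handler, or return -32601 if none exists."""
    handler = handlers.get(request.get("method"))
    if handler is None:
        return method_not_found(request.get("id"), request.get("method"))
    return handler(request)
```

The distinction matters for the calling agent: an isError result tells it the tool ran and declined, whereas a -32601 error tells it the request itself was malformed.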

In one sentence — why is the tool description the enforcement?

The agent reads the tool description to decide when to call the tool, so a vague description grants implicit permission to call it for questions it cannot answer, wasting tool calls and producing empty or hallucinated responses.

Your commit message for UC-MCP:

UC-MCP Generated agents.md and skills.md from README, implemented MCP server

Verification checkpoints:

  • Tool description explicitly states document scope (which policies are covered)
  • Tool description states refusal behavior for out-of-scope queries
  • python test_client.py --run-all executes without connection error
  • Budget forecast question returns isError: true (out of scope)

CRAFT Reflection

Which step of the CRAFT loop was hardest across all three UCs?

Constrain, specifically calibrating the similarity threshold in UC-RAG. Writing the rule "refuse if score below 0.6" was easy; discovering that all-MiniLM-L6-v2 produces scores of only 0.3–0.5 for semantically related but non-verbatim policy text required running the pipeline end-to-end and reading the raw distance values. The rule looked correct on paper but failed silently at inference time, because a 0.6 cutoff sits above that entire range, until the threshold (0.3 in the final server) was grounded in observed model behaviour.
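The kind of check that surfaces this problem can be quite small. The sketch below is an assumption-laden illustration, not the procedure actually used: both function names are hypothetical, and the distance conversion assumes the vector store reports cosine distance, which the submission does not state.

```python
def similarity_from_cosine_distance(distance: float) -> float:
    """Convert a cosine distance to a similarity score.

    Assumes the store reports cosine distance (1 - cosine similarity);
    other distance metrics need a different conversion.
    """
    return 1.0 - distance


def refusal_rate(top_scores: list[float], threshold: float) -> float:
    """Fraction of queries whose best chunk score does not exceed threshold.

    `top_scores` holds the highest similarity observed for each in-scope
    test query; a rate near 1.0 means the threshold refuses almost
    everything it should answer.
    """
    if not top_scores:
        raise ValueError("top_scores must not be empty")
    refused = sum(1 for score in top_scores if score <= threshold)
    return refused / len(top_scores)
```

Feeding the observed top scores for known in-scope queries into `refusal_rate` at each candidate threshold shows immediately whether a cutoff sits above the range the embedding model actually produces.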

What did you add to agents.md manually that the AI did not generate?

In UC-RAG agents.md, the explicit cross-document separation rule: "If the query spans two documents, retrieve from each document separately. Never merge retrieved chunks from different documents into a single blended answer." The AI generated a generic grounding rule but did not restrict per-document retrieval, which is the specific enforcement needed to prevent IT+HR policy blending.
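One way to read this rule in code is to keep retrieved chunks partitioned by source document, so that each answer section is built from exactly one document. The sketch below is a hypothetical interpretation, not the submitted implementation; the function names and the `(source, text)` tuple shape are assumptions.

```python
from collections import defaultdict


def group_by_document(hits: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Partition (source, chunk_text) pairs by source document.

    Insertion order is preserved, so documents appear in the order their
    first chunk was retrieved.
    """
    grouped: dict[str, list[str]] = defaultdict(list)
    for source, text in hits:
        grouped[source].append(text)
    return dict(grouped)


def build_per_document_contexts(hits: list[tuple[str, str]]) -> list[dict]:
    """Build one context per document instead of one blended context."""
    return [
        {"source": source, "context": "\n\n".join(texts)}
        for source, texts in group_by_document(hits).items()
    ]
```

Each returned context can then be answered and cited on its own, which keeps IT and HR policy text from being merged into a single blended answer.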

One specific task in your real work where you will use R.I.C.E in the next 7 days:

Building an internal document Q&A bot for onboarding — new employees currently get inconsistent answers sourced from a mix of HR, IT, and Finance wikis. I will apply RICE to scope the agent strictly to indexed wiki pages and CRAFT to test whether naive retrieval blends documents before writing any enforcement rules.

@github-actions

Hi there, participant! Thanks for joining our RAG-to-MCP Workshop!

We're reviewing your PR for the 3 Use Cases (UC-0A, UC-RAG, UC-MCP). Once your submission is validated and merged, you'll be awarded your completion badge!

Next Steps:

  • Make sure all 3 UCs are finished.
  • Ensure your commit messages match the required format.
  • Fill out every section of the PR template.
  • Good luck!
