Sanitize user-controlled inputs in XML-structured LLM prompts to prevent injection attacks

**User Story**  
As a security-conscious developer,  
I want to sanitize user-controlled inputs in prompt formatting  
so that malicious XML tags can't disrupt LLM response parsing.

**Background**  
The current `user_prompt.format()` in `eknowledge/main.py` directly inserts raw text into XML-structured LLM prompts. This allows injection of fake `<node>` entries through inputs containing XML syntax (e.g., `"<node><from_node>HACK</from_node>"`). The vulnerability exists in:
```python
# main.py line 92:
HumanMessage(content=user_prompt.format(text=chunk, relationships=relations))
```
Attackers could manipulate knowledge graph outputs by poisoning text inputs with XML tags, potentially creating虚假 relationships or disrupting parsing logic.

**Acceptance Criteria**  
- [ ] Modify `execute_graph_generation` in `eknowledge/main.py` to sanitize text inputs  
- [ ] Replace special XML characters (`<`, `>`, `&`) with entities (`&lt;`, `&gt;`, `&amp;`) before string formatting  
- [ ] Add test case in `tests/test_eknowledge.py` that verifies:  
  - Inputs containing `<node>TEST</node>` get converted to `&lt;node&gt;TEST&lt;/node&gt;` in prompts  
  - LLM receives sanitized text that doesn't create unintended XML nodes  
- [ ] Ensure verbose mode logs show original vs sanitized text when enabled  
- [ ] Maintain existing chunk processing performance (add benchmark assertion if missing)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Sanitize user-controlled inputs in XML-structured LLM prompts to prevent injection attacks #11

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Uh oh!

Sanitize user-controlled inputs in XML-structured LLM prompts to prevent injection attacks #11

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions