
Add Cursor rules to improve Cursor support for development #319


Merged — 47 commits merged on Jun 13, 2025

Commits
4ece11c
get aiq workflow create to work with Cursor rules
yczhang-nv May 27, 2025
7c6d67a
Add rules for aiq workflow delete
yczhang-nv May 27, 2025
f8f4de5
separate different commands
yczhang-nv May 27, 2025
2f94b32
add reinstall command
yczhang-nv May 27, 2025
c254bf3
add aiq run
yczhang-nv May 27, 2025
e11bb5f
add rules for adding tools to workflow config
yczhang-nv May 27, 2025
a33ef52
Merge remote-tracking branch 'upstream/develop' into yuchen-add-curso…
yczhang-nv May 28, 2025
c1a352e
add documentation
yczhang-nv May 28, 2025
ad2cbc8
add other commands to cursor rules
yczhang-nv May 28, 2025
d65e8f0
update and clean-ups
yczhang-nv May 28, 2025
0293d59
add cursor rules into .mdc files
yczhang-nv May 28, 2025
0e66623
Merge remote-tracking branch 'upstream/develop' into yuchen-add-curso…
yczhang-nv May 30, 2025
285cada
separate rules
yczhang-nv May 31, 2025
b573b2a
re-format rules
yczhang-nv Jun 2, 2025
a34740c
restructure Cursor Rules
yczhang-nv Jun 3, 2025
b1b28af
The AIQ CLI rules are working
yczhang-nv Jun 4, 2025
fdf4911
update documentation
yczhang-nv Jun 4, 2025
c0d9b52
some updates on the rules
yczhang-nv Jun 4, 2025
97d55e1
add several Cursor rules
yczhang-nv Jun 9, 2025
d6622ac
restructered cursor rules and updated descriptions
yczhang-nv Jun 9, 2025
bd21607
Update general rules
yczhang-nv Jun 9, 2025
eca52fc
add Cursor rule user guide
yczhang-nv Jun 9, 2025
dbc0613
fix broken link
yczhang-nv Jun 9, 2025
3aeae3d
add Cursor rules for agents
yczhang-nv Jun 9, 2025
1dae425
initial version of developer guide
yczhang-nv Jun 10, 2025
8278714
revised developer guide
yczhang-nv Jun 10, 2025
fae3758
update user guide
yczhang-nv Jun 11, 2025
83b5723
add Cursor rules for documentation
yczhang-nv Jun 11, 2025
c0ce206
Merge remote-tracking branch 'upstream/develop' into yuchen-add-curso…
yczhang-nv Jun 11, 2025
8e5cdad
fix broken links
yczhang-nv Jun 11, 2025
62211c3
fix broken links
yczhang-nv Jun 11, 2025
0918153
fix broken link
yczhang-nv Jun 11, 2025
429ae2e
fix CI
yczhang-nv Jun 11, 2025
5a7eabc
fix CI
yczhang-nv Jun 11, 2025
e1a67f4
fix CI
yczhang-nv Jun 11, 2025
997b83f
fix toctree
yczhang-nv Jun 11, 2025
61e3816
remove lines
yczhang-nv Jun 11, 2025
3140c1c
fix comments
yczhang-nv Jun 12, 2025
64f9f7d
fix comments
yczhang-nv Jun 12, 2025
fa30b28
fix comments
yczhang-nv Jun 12, 2025
2e3ab29
update
yczhang-nv Jun 12, 2025
5a6f9db
revert
yczhang-nv Jun 12, 2025
dcb80fc
fix broken links
yczhang-nv Jun 12, 2025
a76c206
fix comments
yczhang-nv Jun 13, 2025
390b8b2
fix broken link
yczhang-nv Jun 13, 2025
2f82d0f
small fixes
yczhang-nv Jun 13, 2025
c7c35dd
add Cursor rules for documentation
yczhang-nv Jun 13, 2025
61 changes: 61 additions & 0 deletions .cursor/rules/aiq-agents/general.mdc
@@ -0,0 +1,61 @@
---
description: Follow these rules when the user's request involves integrating or selecting ReAct, Tool-Calling, Reasoning, or ReWOO agents within AIQ workflows
globs:
alwaysApply: false
---
# AIQ Agents Integration & Selection Rules

These rules standardize how the four built-in AIQ agents are configured inside YAML-based workflows/functions and provide guidance for choosing the most suitable agent for a task.

## Referenced Documentation

- **ReAct Agent Docs**: [react-agent.md](mdc:docs/source/workflows/about/react-agent.md) – Configuration, prompt format and limitations.
- **Tool-Calling Agent Docs**: [tool-calling-agent.md](mdc:docs/source/workflows/about/tool-calling-agent.md) – Configuration, tool schema routing and limitations.
- **Reasoning Agent Docs**: [reasoning-agent.md](mdc:docs/source/workflows/about/reasoning-agent.md) – Configuration, wrapper semantics and limitations.
- **ReWOO Agent Docs**: [rewoo-agent.md](mdc:docs/source/workflows/about/rewoo-agent.md) – Configuration, planning/solver architecture and limitations.

## Integration Guidelines

1. **ReAct Agent**
   - Use `_type: react_agent` in either the top-level `workflow:` or inside `functions:`.
   - Always provide `tool_names` (a list of YAML-defined functions) and `llm_name`.
   - Optional but recommended parameters: `verbose`, `max_iterations`, `handle_parsing_errors`, `max_retries`.
   - When overriding the prompt, keep the `{tools}` and `{tool_names}` placeholders and ensure the LLM's output follows the ReAct format.

2. **Tool-Calling Agent**
   - Use `_type: tool_calling_agent`.
   - Requires an LLM that supports function/tool calling (e.g., OpenAI or NIM chat-completion models).
   - Mandatory fields: `tool_names`, `llm_name`.
   - Recommended fields: `verbose`, `handle_tool_errors`, `max_iterations`.
   - Tool input parameters must be well named; the agent relies on them for routing.

3. **ReWOO Agent**
   - Use `_type: rewoo_agent`.
   - Provide `tool_names` and `llm_name`.
   - The agent executes a *planning* phase and then a *solver* phase; advanced users may override `planner_prompt` or `solver_prompt` but must preserve the required placeholders.
   - Use `include_tool_input_schema_in_tool_description: true` to improve tool disambiguation.

4. **Reasoning Agent**
   - Use `_type: reasoning_agent`.
   - Requires a *reasoning-capable* LLM (e.g., DeepSeek-R1) that supports `<think></think>` tags.
   - Mandatory fields: `llm_name`, `augmented_fn` (the underlying function/agent to wrap).
   - Optional fields: `verbose`, `reasoning_prompt_template`, `instruction_prompt_template`.
   - The `augmented_fn` must itself be defined in the YAML (commonly a ReAct or Tool-Calling agent); see the combined sketch below.
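
Putting the guidelines together, here is a minimal sketch of a Reasoning agent wrapping a ReAct agent. This is an illustration only: `nim_llm`, `internet_search`, and `my_react_agent` are hypothetical names standing in for entries defined elsewhere in the same YAML file.

```yaml
functions:
  my_react_agent:
    _type: react_agent               # iterative Think -> Act -> Observe loop
    tool_names: [internet_search]    # hypothetical tool, defined under functions:
    llm_name: nim_llm
    verbose: true
    max_iterations: 15

workflow:
  _type: reasoning_agent             # plans up front, then delegates execution
  llm_name: nim_llm                  # assumed to be a reasoning-capable model
  augmented_fn: my_react_agent       # the wrapped function defined above
  verbose: true
```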

## Selection Guidelines

Use this quick heuristic when deciding which agent best fits a workflow:

| Scenario | Recommended Agent | Rationale |
| --- | --- | --- |
| Simple, schema-driven tasks (single or few tool calls) | **Tool-Calling** | Lowest latency; leverages function-calling; no iterative reasoning needed |
| Multi-step tasks requiring dynamic reasoning between tool calls | **ReAct** | Iterative Think → Act → Observe loop excels at adaptive decision-making |
| Complex tasks where token/latency cost of ReAct is high but advance planning is beneficial | **ReWOO** | Plans once, then executes; reduces token usage vs. ReAct |
| Need to bolt an upfront reasoning/planning layer onto an existing agent or function | **Reasoning Agent** | Produces a plan that guides the wrapped function; separates planning from execution |

### Additional Tips

- If the LLM **does not** support function/tool calling, prefer **ReAct** or **ReWOO**.
- If up-front planning suffices and adaptability during execution is less critical, prefer **ReWOO** over **ReAct** for better token efficiency.
- When using **Reasoning Agent**, ensure the underlying `augmented_fn` itself can handle the planned steps (e.g., is a ReAct or Tool-Calling agent with relevant tools).
- For workflows that need parallel execution of independent tool calls, none of these agents currently offer built-in parallelism; consider splitting tasks or using custom orchestration.
312 changes: 312 additions & 0 deletions .cursor/rules/aiq-cli/aiq-eval.mdc
@@ -0,0 +1,312 @@
---
description: Follow these rules when the user requests to evaluate a workflow
globs:
alwaysApply: false
---
# AIQ Evaluation Commands

This rule provides guidance for using the `aiq eval` command to assess the accuracy of AIQ toolkit workflows and profile their performance characteristics.

## aiq eval

Evaluates a workflow with a specified dataset to assess accuracy and performance.

### Basic Usage
```bash
aiq eval --config_file CONFIG_FILE [OPTIONS]
```

### Required Arguments
- `--config_file FILE`: A JSON/YAML file that sets the parameters for the workflow and evaluation

### Available Options
- `--dataset FILE`: A JSON file with questions and ground truth answers (overrides dataset path in config)
- `--result_json_path TEXT`: JSON path to extract result from workflow output (default: `$`)
- `--skip_workflow`: Skip workflow execution and use provided dataset for evaluation
- `--skip_completed_entries`: Skip dataset entries that already have generated answers
- `--endpoint TEXT`: Use endpoint for running workflow (e.g., `http://localhost:8000/generate`)
- `--endpoint_timeout INTEGER`: HTTP response timeout in seconds (default: 300)
- `--reps INTEGER`: Number of repetitions for evaluation (default: 1)

### Examples
```bash
# Basic evaluation with config file
aiq eval --config_file configs/eval_config.yml

# Evaluate with custom dataset
aiq eval --config_file configs/eval_config.yml --dataset data/test_questions.json

# Evaluate against running endpoint
aiq eval --config_file configs/eval_config.yml --endpoint http://localhost:8000/generate

# Skip workflow execution (evaluate existing results)
aiq eval --config_file configs/eval_config.yml --skip_workflow

# Multiple evaluation repetitions
aiq eval --config_file configs/eval_config.yml --reps 3

# Extract specific result field
aiq eval --config_file configs/eval_config.yml --result_json_path "$.response.answer"

# Skip already completed entries and extend timeout
aiq eval --config_file configs/eval_config.yml --skip_completed_entries --endpoint_timeout 600
```

## Dataset Format

The evaluation dataset should be a JSON file containing questions and ground truth answers:

### Basic Format
```json
[
  {
    "question": "What is machine learning?",
    "ground_truth": "Machine learning is a subset of artificial intelligence..."
  },
  {
    "question": "Explain neural networks",
    "ground_truth": "Neural networks are computing systems inspired by..."
  }
]
```

### Extended Format
```json
[
  {
    "question": "What is deep learning?",
    "ground_truth": "Deep learning is a subset of machine learning...",
    "context": "AI fundamentals",
    "difficulty": "intermediate",
    "category": "technical"
  }
]
```

### Dataset with Generated Answers (for skip_workflow)
```json
[
  {
    "question": "What is AI?",
    "ground_truth": "Artificial intelligence refers to...",
    "generated_answer": "AI is the simulation of human intelligence..."
  }
]
```

## Configuration File for Evaluation

The evaluation configuration should include both workflow and evaluation settings:

```yaml
# Workflow components
llms:
  nim_llm:
    _type: "nim_llm"
    model: "meta/llama-3.1-8b-instruct"
    temperature: 0.7

workflow:
  _type: "simple_rag"
  llm: llms.nim_llm

# Evaluation settings
evaluation:
  dataset: "data/eval_dataset.json"
  evaluators:
    - _type: "semantic_similarity"
      threshold: 0.8
    - _type: "factual_accuracy"

  metrics:
    - "accuracy"
    - "bleu_score"
    - "semantic_similarity"
```

## Handling Missing Evaluation Configuration

When working with configuration files that may not contain an evaluation section, follow these rules:

### 1. Auto-detection of Evaluation Configuration
If the specified configuration file does not contain an `evaluation` section:

1. **Search for alternative config files**: Look for configuration files in the same directory that contain an `evaluation` section.
2. **Common evaluation config patterns**: Check for files with names like the following (a shell sketch of this search follows the list):
   - `*_eval.yml` or `*_eval.yaml`
   - `*_evaluation.yml` or `*_evaluation.yaml`
   - `eval_*.yml` or `eval_*.yaml`
   - `evaluation_*.yml` or `evaluation_*.yaml`
3. **Suggest available options**: If multiple evaluation configs are found, present them to the user for selection.
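
A minimal shell sketch of this search, assuming the candidate configs live in the current directory and the `evaluation:` key sits at the top level of each YAML file:

```bash
# List sibling configs that appear to contain a top-level evaluation section;
# unmatched globs are passed through literally, so grep's errors are silenced
grep -l '^evaluation:' \
  *_eval.yml *_eval.yaml *_evaluation.yml *_evaluation.yaml \
  eval_*.yml eval_*.yaml evaluation_*.yml evaluation_*.yaml 2>/dev/null
```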

### 2. User Guidance for Missing Evaluation Section
If no evaluation configuration can be found automatically:

1. **Inform the user**: Clearly explain that no evaluation section was found in the configuration file
2. **Request essential information**: Ask the user to provide the following required information:
   - **Dataset path**: Location of the evaluation dataset (JSON file with questions and ground truth)
   - **Evaluators**: Which evaluation metrics to use (e.g., semantic_similarity, factual_accuracy)
   - **Output preferences**: Where to save results and what format to use

### 3. Interactive Configuration Building
When evaluation configuration is missing, guide the user through creating one:

```bash
# Example prompts for missing evaluation config
"No evaluation section found in config file. Please provide:"
"1. Dataset file path (JSON with questions and ground_truth):"
"2. Evaluation metrics (comma-separated): [semantic_similarity, factual_accuracy, bleu_score]:"
"3. Output file path (optional):"
```

### 4. Minimal Evaluation Configuration Template
When the user provides minimal information, create a basic evaluation configuration:

```yaml
evaluation:
  dataset: "path/to/user/provided/dataset.json"
  evaluators:
    - _type: "semantic_similarity"
      threshold: 0.8
  metrics:
    - "accuracy"
    - "semantic_similarity"
```

### 5. Configuration Validation
Before proceeding with evaluation (a shell sketch of the first two checks follows this list):
1. Verify the dataset file exists and is accessible
2. Validate the dataset format (contains required `question` and `ground_truth` fields)
3. Confirm all specified evaluators are available
4. Warn if essential evaluation components are missing
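
A shell sketch of the first two checks, assuming `jq` is installed and using an illustrative dataset path:

```bash
# 1. Verify the dataset file exists and is readable
test -r data/eval_dataset.json || { echo "dataset not found"; exit 1; }

# 2. Verify every entry carries the required question and ground_truth fields
jq -e 'all(.[]; has("question") and has("ground_truth"))' data/eval_dataset.json \
  && echo "dataset format OK" \
  || echo "some entries are missing required fields"
```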

## Result JSON Path Usage

Use `--result_json_path` to extract specific fields from complex workflow outputs:

### Example Workflow Output
```json
{
  "metadata": {"timestamp": "2024-01-01T00:00:00"},
  "response": {
    "answer": "The actual answer text",
    "confidence": 0.95,
    "sources": ["doc1.pdf", "doc2.pdf"]
  },
  "debug_info": {"tokens_used": 150}
}
```

### JSON Path Examples
```bash
# Extract just the answer
aiq eval --config_file config.yml --result_json_path "$.response.answer"

# Extract answer with confidence
aiq eval --config_file config.yml --result_json_path "$.response"

# Extract root level (default)
aiq eval --config_file config.yml --result_json_path "$"
```

## Endpoint Evaluation

When evaluating against a running service:

### Prerequisites
1. Start the service: `aiq serve --config_file config.yml --host localhost --port 8000`
2. Verify the service is running: check `http://localhost:8000/docs` (a quick curl check is sketched below)
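
A quick reachability check before launching a long evaluation run, assuming the default host and port from step 1:

```bash
# curl -sf stays quiet and returns a non-zero exit code on HTTP errors,
# so this fails fast if the service is down or still starting up
curl -sf http://localhost:8000/docs > /dev/null \
  && echo "service is up" \
  || echo "service unreachable - check the aiq serve process"
```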

### Evaluation
```bash
# Evaluate against local service
aiq eval --config_file eval_config.yml --endpoint http://localhost:8000/generate

# Evaluate against remote service with timeout
aiq eval --config_file eval_config.yml --endpoint https://api.example.com/workflow --endpoint_timeout 300
```

## Evaluation Workflows

### 1. Initial Workflow Evaluation
```bash
# Validate configuration
aiq validate --config_file eval_config.yml

# Run evaluation
aiq eval --config_file eval_config.yml --dataset test_data.json

# Review results and iterate
```

### 2. Continuous Evaluation
```bash
# Skip completed entries for incremental evaluation
aiq eval --config_file eval_config.yml --skip_completed_entries

# Multiple repetitions for statistical significance
aiq eval --config_file eval_config.yml --reps 5
```

### 3. Production Endpoint Evaluation
```bash
# Start production service
aiq serve --config_file prod_config.yml --host 0.0.0.0 --port 8000 --workers 4

# Evaluate production endpoint
aiq eval --config_file eval_config.yml --endpoint http://localhost:8000/generate --endpoint_timeout 600
```

### 4. Evaluation-Only Mode
```bash
# When you have pre-generated results
aiq eval --config_file eval_config.yml --skip_workflow --dataset results_with_generated_answers.json
```

## Best Practices

1. **Prepare Quality Datasets**: Ensure ground truth answers are accurate and comprehensive
2. **Use Representative Data**: Include diverse questions that reflect real-world usage
3. **Configure Multiple Evaluators**: Use different evaluation metrics for comprehensive assessment
4. **Start Small**: Test with a small dataset before running full evaluations
5. **Version Control Datasets**: Track dataset versions alongside code changes
6. **Document Evaluation Setup**: Keep clear records of evaluation configurations and results
7. **Use Timeouts Appropriately**: Set reasonable timeouts based on expected response times
8. **Incremental Evaluation**: Use `--skip_completed_entries` for long-running evaluations
9. **Statistical Significance**: Use multiple repetitions (`--reps`) for robust results
10. **Monitor Resource Usage**: Consider memory and compute requirements for large datasets

## Common Evaluation Scenarios

### A/B Testing Configurations
```bash
# Evaluate baseline configuration
aiq eval --config_file baseline_config.yml --dataset test_set.json --output results_baseline.json

# Evaluate improved configuration
aiq eval --config_file improved_config.yml --dataset test_set.json --output results_improved.json

# Compare results
```

### Parameter Tuning
```bash
# Evaluate different temperature settings
aiq eval --config_file config.yml --override llms.nim_llm.temperature 0.3 --dataset tune_set.json
aiq eval --config_file config.yml --override llms.nim_llm.temperature 0.7 --dataset tune_set.json
aiq eval --config_file config.yml --override llms.nim_llm.temperature 0.9 --dataset tune_set.json
```

### Performance Monitoring
```bash
# Regular evaluation with metrics collection
aiq eval --config_file monitor_config.yml --endpoint http://prod-service:8000/generate --reps 3
```

## Troubleshooting

- **Timeout Errors**: Increase `--endpoint_timeout` for slow workflows
- **Memory Issues**: Process datasets in smaller batches
- **Connection Errors**: Verify endpoint URLs and service availability
- **JSON Path Errors**: Test JSON paths with sample outputs first
- **Missing Ground Truth**: Ensure dataset format matches expected structure