Quick Start

1. Prerequisites

  • Python >=3.10,<3.13
  • Data-Juicer runtime (py-data-juicer)
  • A DashScope or OpenAI-compatible API key
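
The Python bound above can be checked before creating the virtual environment. The following is a minimal sketch rather than part of the project; the helper name `python_version_supported` is hypothetical, and only the range `>=3.10,<3.13` comes from the prerequisites.

```python
import sys

# Supported range taken from the prerequisites: >=3.10,<3.13.
MIN_VERSION = (3, 10)
MAX_VERSION_EXCLUSIVE = (3, 13)


def python_version_supported(version=None):
    """Return True if (major, minor) falls within >=3.10,<3.13.

    `version` is any sequence starting with (major, minor); it defaults
    to the running interpreter. This helper is illustrative only.
    """
    if version is None:
        version = sys.version_info
    major_minor = tuple(version[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION_EXCLUSIVE


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "supported" if python_version_supported() else "outside the supported range"
    print(f"Python {current} is {status}")
```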

2. Install

Choose one installation profile:

  • core: full data_juicer_agents command surface
  • harness: minimal install for the djx tool harness profile
  • full: core plus copilot and interecipe

Core install:

cd ./data-juicer-agents
uv venv .venv
source .venv/bin/activate
uv pip install -e '.[core]'

Harness install:

uv pip install -e '.[harness]'
export DJX_TOOL_PROFILE=harness

Full install:

uv pip install -e '.[full]'

3. Configure model access

export DASHSCOPE_API_KEY="<your_key>"
# or:
# export MODELSCOPE_API_TOKEN="<your_key>"

# Optional overrides
export DJA_OPENAI_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
export DJA_SESSION_MODEL="qwen3-max-2026-01-23"
export DJA_PLANNER_MODEL="qwen3-max-2026-01-23"
export DJA_MODEL_FALLBACKS="qwen-max,qwen-plus"
export DJA_LLM_THINKING="true"

4. Minimal CLI path

Optional inspection step:

djx retrieve "remove duplicate text records" \
  --dataset ./data/demo-dataset.jsonl \
  --top-k 8

Generate a plan:

djx plan "deduplicate and clean text for RAG" \
  --dataset ./data/demo-dataset.jsonl \
  --export ./data/demo-dataset-processed.jsonl \
  --output ./data/demo-plan.yaml

Apply the saved plan:

djx apply --plan ./data/demo-plan.yaml --yes

Dry-run without executing dj-process:

djx apply --plan ./data/demo-plan.yaml --yes --dry-run

Notes:

  • djx plan already performs internal operator retrieval before building the final plan.
  • djx retrieve is useful for inspection and debugging.
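
When the plan and apply steps are driven from a script, it can help to build the argument lists in one place. The sketch below assumes only the flags shown in the commands above; the function names `build_plan_cmd` and `build_apply_cmd` are hypothetical and not part of the `djx` package.

```python
import shlex


def build_plan_cmd(intent, dataset, export, output):
    """Argument list for `djx plan`, using only the flags shown above."""
    return [
        "djx", "plan", intent,
        "--dataset", dataset,
        "--export", export,
        "--output", output,
    ]


def build_apply_cmd(plan, dry_run=False):
    """Argument list for `djx apply --plan ... --yes`, optionally with --dry-run."""
    cmd = ["djx", "apply", "--plan", plan, "--yes"]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


if __name__ == "__main__":
    plan_cmd = build_plan_cmd(
        "deduplicate and clean text for RAG",
        "./data/demo-dataset.jsonl",
        "./data/demo-dataset-processed.jsonl",
        "./data/demo-plan.yaml",
    )
    apply_cmd = build_apply_cmd("./data/demo-plan.yaml", dry_run=True)
    # Print the commands instead of running them; to execute, pass each
    # list to subprocess.run(cmd, check=True).
    print(shlex.join(plan_cmd))
    print(shlex.join(apply_cmd))
```

Keeping the commands as lists avoids shell quoting problems when the intent string contains spaces.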

Minimal atomic tool path:

djx tool list --tag plan
djx tool schema inspect_dataset
djx tool run list_system_config --input-json '{}'

Notes:

  • djx tool is JSON-only and primarily intended for agent / skill automation.
  • Write or execute tools require an explicit --yes.
  • DJX_TOOL_PROFILE=harness limits djx tool to the harness tool set (apply, context, retrieve, plan).
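
Because `djx tool run` takes its input as a JSON string, an automation script can serialize the payload with `json.dumps` rather than hand-writing it. This is a minimal sketch: `build_tool_run_cmd` is a hypothetical helper, and placing `--yes` at the end of the command is an assumption, since the source only states that the flag is required for write or execute tools.

```python
import json


def build_tool_run_cmd(tool_name, payload=None, confirm=False):
    """Argument list for `djx tool run <tool_name> --input-json <json>`.

    `payload` is serialized with json.dumps; None becomes '{}'.
    `confirm=True` appends --yes (flag position is an assumption).
    """
    cmd = ["djx", "tool", "run", tool_name, "--input-json", json.dumps(payload or {})]
    if confirm:
        cmd.append("--yes")
    return cmd


if __name__ == "__main__":
    # Mirrors the list_system_config example above; prints instead of executing.
    print(build_tool_run_cmd("list_system_config"))
```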

5. Session mode (dj-agents)

Default TUI:

dj-agents --dataset ./data/demo-dataset.jsonl --export ./data/demo-dataset-processed.jsonl

Plain terminal mode:

dj-agents --ui plain --dataset ./data/demo-dataset.jsonl --export ./data/demo-dataset-processed.jsonl

AgentScope Studio mode:

as_studio
dj-agents --ui as_studio --studio-url http://localhost:3000 --dataset ./data/demo-dataset.jsonl --export ./data/demo-dataset-processed.jsonl

Notes:

  • dj-agents requires LLM access.
  • In session mode, press Ctrl+C to interrupt the current turn and Ctrl+D to exit.
  • In as_studio mode, start AgentScope Studio separately before launching dj-agents.
  • The session agent usually plans with inspect_dataset -> retrieve_operators -> build_dataset_spec -> build_process_spec -> build_system_spec -> assemble_plan -> plan_validate -> plan_save.
  • For operator-level discovery and schema lookup, prefer retrieve_operators / retrieve_operators_api, then get_operator_info.

6. Basic sanity checks

djx --help
djx retrieve "filter long text" --dataset ./data/demo-dataset.jsonl --json
djx plan "filter long text" --dataset ./data/demo-dataset.jsonl --export ./data/out.jsonl --verbose
djx apply --plan ./data/demo-plan.yaml --yes --dry-run
dj-agents --help
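
Before running the checks above, it may be worth confirming that both entry points are on PATH in the active environment. The sketch below uses the standard-library `shutil.which`; the helper name `missing_commands` is hypothetical.

```python
import shutil

# Entry points named in this guide.
EXPECTED_COMMANDS = ("djx", "dj-agents")


def missing_commands(names=EXPECTED_COMMANDS, which=shutil.which):
    """Return the command names that `which` cannot resolve on PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = missing_commands()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All expected commands are on PATH")
```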

7. Troubleshooting

If planning or session startup fails with API/model errors, verify:

  • DASHSCOPE_API_KEY or MODELSCOPE_API_TOKEN
  • DJA_OPENAI_BASE_URL
  • DJA_SESSION_MODEL and DJA_PLANNER_MODEL
  • DJA_MODEL_FALLBACKS when you expect model fallback
  • DJA_LLM_THINKING if your provider rejects the thinking flag
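
The checklist above can be run as a quick script that reports which variables are set without printing the key itself. This is a sketch under the assumption that one of the two key variables is sufficient and the `DJA_*` variables are optional overrides, as the configuration step suggests; `summarize_model_env` is a hypothetical helper.

```python
import os

# Either key variable is accepted according to the configuration step.
KEY_VARS = ("DASHSCOPE_API_KEY", "MODELSCOPE_API_TOKEN")
# Optional overrides listed in the configuration step.
OPTIONAL_VARS = (
    "DJA_OPENAI_BASE_URL",
    "DJA_SESSION_MODEL",
    "DJA_PLANNER_MODEL",
    "DJA_MODEL_FALLBACKS",
    "DJA_LLM_THINKING",
)


def summarize_model_env(env=None):
    """Report whether a key is present and the value of each optional override.

    Key values are never copied into the report, only their presence.
    """
    if env is None:
        env = os.environ
    report = {"has_key": any(env.get(name) for name in KEY_VARS)}
    for name in OPTIONAL_VARS:
        report[name] = env.get(name) or None
    return report


if __name__ == "__main__":
    for name, value in summarize_model_env().items():
        print(f"{name}: {value}")
```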

If a command reports missing optional dependencies, install the matching profile:

  • data-juicer-agents[harness] for harness-only djx tool usage
  • data-juicer-agents[core] for the full djx / dj-agents command set
  • data-juicer-agents[full] for core + copilot + interecipe