- Python >=3.10,<3.13
- Data-Juicer runtime (py-data-juicer)
- A DashScope or OpenAI-compatible API key
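The Python range above can be checked before installing anything. The following is a minimal sketch, not part of the project: `python_range_status` is a hypothetical helper name, and the usage part assumes the interpreter is reachable as `python3`.

```shell
# Hypothetical helper (not shipped with the project): prints "supported" when
# MAJOR.MINOR satisfies >=3.10,<3.13, otherwise prints "unsupported".
python_range_status() {
  if [ "$1" -eq 3 ] && [ "$2" -ge 10 ] && [ "$2" -lt 13 ]; then
    echo "supported"
  else
    echo "unsupported"
  fi
}

# Check the interpreter on PATH, if there is one.
if command -v python3 >/dev/null 2>&1; then
  ver="$(python3 -c 'import sys; print(sys.version_info[0], sys.version_info[1])')"
  echo "Python ${ver% *}.${ver#* }: $(python_range_status "${ver% *}" "${ver#* }")"
fi
```

If the result is "unsupported", create the virtual environment with an interpreter inside the range before running the install steps.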
Choose one installation profile:
- core: full data_juicer_agents command surface
- harness: minimal install for the djx tool harness profile
- full: core plus copilot and interecipe
cd ./data-juicer-agents
uv venv .venv
source .venv/bin/activate
uv pip install -e '.[core]'

Harness install:
uv pip install -e '.[harness]'
export DJX_TOOL_PROFILE=harness

Full install:
uv pip install -e '.[full]'

Configure API access:
export DASHSCOPE_API_KEY="<your_key>"
# or:
# export MODELSCOPE_API_TOKEN="<your_key>"
# Optional overrides
export DJA_OPENAI_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
export DJA_SESSION_MODEL="qwen3-max-2026-01-23"
export DJA_PLANNER_MODEL="qwen3-max-2026-01-23"
export DJA_MODEL_FALLBACKS="qwen-max,qwen-plus"
export DJA_LLM_THINKING="true"

Optional inspection step:
djx retrieve "remove duplicate text records" \
--dataset ./data/demo-dataset.jsonl \
--top-k 8

Generate a plan:
djx plan "deduplicate and clean text for RAG" \
--dataset ./data/demo-dataset.jsonl \
--export ./data/demo-dataset-processed.jsonl \
--output ./data/demo-plan.yaml

Apply the saved plan:
djx apply --plan ./data/demo-plan.yaml --yes

Dry-run without executing dj-process:
djx apply --plan ./data/demo-plan.yaml --yes --dry-run

Notes:
- djx plan already performs internal operator retrieval before building the final plan.
- djx retrieve is useful for inspection and debugging.
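The plan and dry-run steps above can be chained so that a plan is always validated before a real apply. This is a minimal sketch under assumptions: `plan_and_dry_run` is a hypothetical wrapper name, it uses only the `djx plan` and `djx apply` flags already shown, and it assumes `djx plan` exits non-zero on failure, which this guide does not state.

```shell
# Hypothetical wrapper: generate a plan, then dry-run it.
# $1 = intent, $2 = dataset path, $3 = export path, $4 = plan path
plan_and_dry_run() {
  # Stop early if planning fails (assumes a non-zero exit code on failure).
  djx plan "$1" \
    --dataset "$2" \
    --export "$3" \
    --output "$4" || return 1
  # Validate the saved plan without executing dj-process.
  djx apply --plan "$4" --yes --dry-run
}

# Example call with the demo paths used in this guide:
# plan_and_dry_run "deduplicate and clean text for RAG" \
#   ./data/demo-dataset.jsonl ./data/demo-dataset-processed.jsonl ./data/demo-plan.yaml
```

Once the dry-run looks right, run `djx apply --plan <plan> --yes` without `--dry-run` to execute it.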
Minimal atomic tool path:
djx tool list --tag plan
djx tool schema inspect_dataset
djx tool run list_system_config --input-json '{}'

Notes:
- djx tool is JSON-only and primarily intended for agent / skill automation.
- Write or execute tools require explicit --yes.
- DJX_TOOL_PROFILE=harness limits djx tool to the harness tool set (apply, context, retrieve, plan).
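For automation, it can help to look up several tool schemas in one pass before calling `djx tool run`. The sketch below is illustrative only: `show_tool_schemas` is a hypothetical helper, it calls nothing beyond the `djx tool schema <name>` form shown above, and it assumes that form works for any tool name reported by `djx tool list`.

```shell
# Hypothetical helper: print the schema of each named tool in turn.
# Tool names are passed as arguments; only `djx tool schema` is invoked.
show_tool_schemas() {
  for name in "$@"; do
    echo "== $name"
    djx tool schema "$name" || echo "schema lookup failed for $name" >&2
  done
}

# Example call:
# show_tool_schemas inspect_dataset list_system_config
```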
Default TUI:
dj-agents --dataset ./data/demo-dataset.jsonl --export ./data/demo-dataset-processed.jsonl

Plain terminal mode:
dj-agents --ui plain --dataset ./data/demo-dataset.jsonl --export ./data/demo-dataset-processed.jsonl

AgentScope Studio mode (as_studio):
dj-agents --ui as_studio --studio-url http://localhost:3000 --dataset ./data/demo-dataset.jsonl --export ./data/demo-dataset-processed.jsonl

Notes:
- dj-agents requires LLM access.
- In session mode, press Ctrl+C to interrupt the current turn and Ctrl+D to exit.
- In as_studio mode, start AgentScope Studio separately before launching dj-agents.
- The session agent usually plans with inspect_dataset -> retrieve_operators -> build_dataset_spec -> build_process_spec -> build_system_spec -> assemble_plan -> plan_validate -> plan_save.
- For operator-level discovery and schema lookup, prefer retrieve_operators / retrieve_operators_api, then get_operator_info.
djx --help
djx retrieve "filter long text" --dataset ./data/demo-dataset.jsonl --json
djx plan "filter long text" --dataset ./data/demo-dataset.jsonl --export ./data/out.jsonl --verbose
djx apply --plan ./data/demo-plan.yaml --yes --dry-run
dj-agents --help

If planning or session startup fails with API/model errors, verify:
- DASHSCOPE_API_KEY or MODELSCOPE_API_TOKEN
- DJA_OPENAI_BASE_URL
- DJA_SESSION_MODEL and DJA_PLANNER_MODEL
- DJA_MODEL_FALLBACKS when you expect model fallback
- DJA_LLM_THINKING if your provider rejects the thinking flag
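The checks above can be run in one step from the shell that launches `djx` or `dj-agents`. This is a minimal sketch: `check_llm_env` is a hypothetical helper, it only reads the variables named in this guide, and it deliberately never prints the credential values.

```shell
# Hypothetical helper: summarize the LLM-related environment.
check_llm_env() {
  # At least one of the two credentials is expected to be set.
  if [ -n "${DASHSCOPE_API_KEY:-}" ] || [ -n "${MODELSCOPE_API_TOKEN:-}" ]; then
    echo "credentials: set"
  else
    echo "credentials: MISSING (set DASHSCOPE_API_KEY or MODELSCOPE_API_TOKEN)"
  fi
  # Optional overrides: show the value, or "(unset)" when not defined.
  echo "DJA_OPENAI_BASE_URL=${DJA_OPENAI_BASE_URL:-(unset)}"
  echo "DJA_SESSION_MODEL=${DJA_SESSION_MODEL:-(unset)}"
  echo "DJA_PLANNER_MODEL=${DJA_PLANNER_MODEL:-(unset)}"
  echo "DJA_MODEL_FALLBACKS=${DJA_MODEL_FALLBACKS:-(unset)}"
  echo "DJA_LLM_THINKING=${DJA_LLM_THINKING:-(unset)}"
}

# Example call:
# check_llm_env
```

An "(unset)" override is not an error by itself, since those variables are optional; it only tells you the tool is running on its defaults.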
If a command reports missing optional dependencies, install the matching profile:
- data-juicer-agents[harness] for harness-only djx tool usage
- data-juicer-agents[core] for the full djx / dj-agents command set
- data-juicer-agents[full] for core + copilot + interecipe
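When working from a source checkout, the profile names map directly onto the editable-install commands from the installation steps. The helper below is a hypothetical sketch (`install_command_for` is not part of the project) that prints the matching command rather than running it.

```shell
# Hypothetical helper: map an installation profile to its editable-install
# command, to be run from the repository root as in the install steps.
install_command_for() {
  case "$1" in
    core|harness|full) echo "uv pip install -e '.[$1]'" ;;
    *) echo "unknown profile: $1" >&2; return 1 ;;
  esac
}

# Example call:
# install_command_for harness
```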