feat(agent): interaction quality ops & recipe, bad-case HTML report, and robust JSONL / HF meta loading (#957)
Open
Conversation
…igs conflicts (keep qwen-turbo + full 07 entity config) Made-with: Cursor
* preliminary test of minimal_configs (01 to 05) Made-with: Cursor
* test of minimal_configs (06 to 08), optimize some ops Made-with: Cursor
* conflicts resolved and Gemini's suggestions adopted Made-with: Cursor
…ale samples Made-with: Cursor
…ested on small-scale samples (#949)
* end-to-end yaml, analysis toolchain, ui developed; tested on small-scale samples Made-with: Cursor
… evidence Made-with: Cursor
…truncation
- Normalize LLM recommendation to list[str] in parse_output (fixes HF datasets shard align)
- agent_dialog_normalize_mapper: configurable history caps, head+tail write-back, meta flag
- dialog_* mappers: shared max_*_chars_for_prompt via dialog_llm_input_utils
- Recipe/docs: agent_interaction_quality_analysis, PERFORMANCE_LLM, BAD_CASE_INSIGHTS
- Tests: agent_dialog_normalize_mapper, llm_analysis_filter parse_output
- build_op_doc: exclude dialog_llm_input_utils helper; video_camera_pose droid_args docstring
Made-with: Cursor
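The list[str] normalization above can be illustrated with a minimal sketch. The function name and the exact coercion rules are assumptions; the point is that every record carries the same Arrow-compatible type so HF datasets shards align.

```python
def normalize_recommendation(value):
    """Coerce an LLM 'recommendation' field to list[str] so all HF datasets
    shards share one Arrow type. Hypothetical sketch of the parse_output
    normalization; the real rules in the PR may differ."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value] if value.strip() else []
    if isinstance(value, (list, tuple)):
        return [str(v) for v in value if str(v).strip()]
    return [str(value)]
```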
…er + evidence, dialog history caps (#950)
* fix(agent): multi-turn tool dialog, bad-case gating, report zh-tier + evidence Made-with: Cursor
Bad-case report (generate_bad_case_report.py):
- CJK fonts for matplotlib/body; bar labels; section order (charts → insights → cases)
- LLM page-top summary: compact digest, shorter prompt/tokens/timeout; default qwen3.5-plus
- Drilldown: page cap + sidecar *_drilldown_full.jsonl; copy and nav tweaks
- Richer agent_insight_llm cards; rule-based fallback summary

agent_dialog_normalize_mapper:
- Stable HF Arrow meta: always agent_dialog_history_compressed bool; list[str] placeholders for empty tool/skill types; filter falsy in tool_type_mapper and skill_insight_mapper

Pipeline: run_bad_case_pipeline report uses argv array safe under set -u; BAD_CASE_REPORT_LLM=1

Tests + recipe yaml aligned. Made-with: Cursor
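The "stable HF Arrow meta" idea can be sketched as follows; the field names (`tool_types`, `skill_types`) and placeholder choice are assumptions, not the PR's exact implementation:

```python
def stable_meta(sample):
    """Keep HF Arrow meta schema-stable across shards: the compressed flag
    is always a bool, and empty tool/skill type lists get a list[str]
    placeholder instead of None, so Arrow infers one consistent type.
    Hypothetical sketch; field names are assumed."""
    meta = dict(sample.get("meta") or {})
    meta["agent_dialog_history_compressed"] = bool(
        meta.get("agent_dialog_history_compressed", False))
    for key in ("tool_types", "skill_types"):
        values = [v for v in (meta.get(key) or []) if v]  # filter falsy entries
        meta[key] = values if values else [""]  # list[str] placeholder when empty
    sample["meta"] = meta
    return sample
```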
…malize, tests, yaml) Made-with: Cursor
#951)
* feat(agent): bad-case HTML report UX + HF meta stability for normalize Made-with: Cursor
- Add DATA_JUICER_USE_STDLIB_JSON env patch in init_configs
- Document workaround in config_all.yaml and DatasetCfg guides
Made-with: Cursor
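The env patch can be approximated with a small sketch: when the flag is set, later imports of ujson resolve to the stdlib json module, which handles arbitrarily large integers. This is a simplified illustration; the real patch in init_configs may target other module attributes.

```python
import json
import os
import sys

def maybe_patch_stdlib_json():
    """If DATA_JUICER_USE_STDLIB_JSON=1, shadow ujson with stdlib json so
    downstream loaders avoid ujson's 'Value too big' error on large ints.
    Hypothetical sketch of the init_configs patch."""
    if os.environ.get("DATA_JUICER_USE_STDLIB_JSON") != "1":
        return False
    # Any later `import ujson` now resolves to the stdlib json module.
    sys.modules["ujson"] = json
    return True
```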
* fix: optional stdlib json for HF datasets JSONL (ujson Value too big) Made-with: Cursor
- Add load_jsonl_lenient config and DATA_JUICER_JSONL_LENIENT env
- Stream jsonl-only inputs via Dataset.from_generator; document in DatasetCfg
- Add unit tests for jsonl_lenient_loader
Made-with: Cursor
…and config_all Made-with: Cursor
* feat: lenient JSONL load (stdlib json, skip bad lines) Made-with: Cursor
Mixed extensions previously forced the HuggingFace JSON loader and ujson ("Value too big"). Now only jsonl* shards are read; files with other suffixes are skipped with warnings. The log line "[lenient jsonl] ACTIVE" confirms the path is in use. Made-with: Cursor
…vior Made-with: Cursor
* fix(lenient jsonl): do not fall back to HF when folder mixes .json Made-with: Cursor
* use list type for the arg to avoid ckpt failure
…ests
- Add dialog_* LLM axis mappers, trace coherence, tool relevance, PII suspect
- agent_output_locale; extend bad-case signals, insight & usage/tool mappers
- generate_bad_case_report: TOC/sidebar, insight↔drill links, snapshot operator row
- Recipe/docs/Operators.md/pyproject; mapper & locale tests
- build_op_doc: exclude dialog_quality_llm_utils (helper, not an OP)
Made-with: Cursor
- HTML report: macro distributions (tools, skills, intent/topic/sentiment) with bar charts and optional word clouds; TOC and chart section wiring.
- Omit PII audit / redaction-related samples from high_precision and watchlist insight excerpts (drilldown/export unchanged).
- agent_skill_insight_mapper: prompt asks for concrete ~10-char (zh) / 4–8-word (en) capability phrases; forbid vague read/write-style tags.
- Docs: root README link to demos/agent; maintainer checklist in demos/agent README; YAML/minimal_configs notes.
- Tests: generate_bad_case_report smoke (PII omission); agent_skill_insight prompt assertions.
Made-with: Cursor
upstream: https://github.com/datajuicer/data-juicer.git
Resolve config.py conflict by using build_base_parser(); take upstream uv.lock (validated with uv lock --check).
Made-with: Cursor
…ot registered into the OPERATORS
yxdyc
commented
Mar 27, 2026
- agent_skill_insight_mapper: split labels on ,，、;； for CN/EN separators
- generate_bad_case_report: mirror split in macro stats (no re-run required)
- Optional semantic clustering for insight headlines/audit (scikit-learn)
- Insight model tabs: default full model id, family mode flag; order by batch volume
- Stack request_model chart: Top 5 by requests + merged remainder bar
- Extend model family hints (Kimi, GLM, MiniMax); tab/chart copy updates
- Smoke tests: PII omission, skill-insight macro split; --no-insight-semantic-cluster in PII run
Made-with: Cursor
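The mixed CN/EN separator split can be sketched with a single character class; I'm assuming the separators are the ASCII and full-width commas and semicolons plus the Chinese enumeration comma 、:

```python
import re

# Hypothetical sketch of splitting skill labels on mixed CN/EN separators.
_LABEL_SEP = re.compile(r"[,\uFF0C\u3001;\uFF1B]")  # , ， 、 ; ；

def split_labels(raw):
    """Split a raw label string on CN/EN separators, trimming whitespace
    and dropping empty parts."""
    return [part.strip() for part in _LABEL_SEP.split(raw) if part.strip()]
```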
ShenQianli
reviewed
Mar 30, 2026
    tier = "high_precision"
elif len(mediums) >= self.min_medium_signals_for_watchlist:
    tier = "watchlist"
elif len(signals) == 1 and signals[0].get("weight") == "medium":
Collaborator
When self.min_medium_signals_for_watchlist == 2, a sample with a single medium signal will be added to the watchlist. Not sure if it is intended.
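The behaviour being questioned can be reproduced with a minimal reconstruction of the gating; everything outside the four lines shown in the diff (the high-signal condition, the tier assigned by the last branch) is assumed:

```python
def classify_tier(signals, min_medium_signals_for_watchlist=2):
    """Minimal reconstruction of the tier gating from the review snippet.
    Field names and branches beyond the snippet are assumptions."""
    highs = [s for s in signals if s.get("weight") == "high"]
    mediums = [s for s in signals if s.get("weight") == "medium"]
    if highs:
        return "high_precision"
    if len(mediums) >= min_medium_signals_for_watchlist:
        return "watchlist"
    if len(signals) == 1 and signals[0].get("weight") == "medium":
        # Even with the threshold at 2, this branch still promotes a lone
        # medium signal to the watchlist: the case the reviewer flags.
        return "watchlist"
    return "none"
```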
ShenQianli
reviewed
Mar 30, 2026
p = sum(u.get("prompt_tokens") or 0 for u in usages)
c = sum(u.get("completion_tokens") or 0 for u in usages)
totals = [u.get("total_tokens") for u in usages if u.get("total_tokens") is not None]
t = totals[0] if totals else None
Collaborator
total_tokens is taken from the first non-null entry. Given the limited information available about the raw data contract, it is unclear whether this always matches the intended aggregation logic.
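Wrapping the snippet in a function makes the asymmetry concrete: prompt and completion tokens are summed across all usage entries, but total_tokens comes from the first non-null entry only.

```python
def aggregate_usage(usages):
    """Direct reconstruction of the aggregation in the review snippet:
    prompt/completion tokens are summed, but total_tokens is taken from
    the FIRST non-null entry, which is the point under question."""
    p = sum(u.get("prompt_tokens") or 0 for u in usages)
    c = sum(u.get("completion_tokens") or 0 for u in usages)
    totals = [u.get("total_tokens") for u in usages if u.get("total_tokens") is not None]
    t = totals[0] if totals else None
    return p, c, t
```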
cmgzn
reviewed
Mar 30, 2026
# load_jsonl_lenient: true
# # or: DATA_JUICER_JSONL_LENIENT=1
load_dataset_kwargs: {}  # extra kwargs passed to datasets.load_dataset(). Useful for format-specific options, e.g. chunksize (JSON), columns (Parquet), delimiter (CSV).
load_jsonl_lenient: false  # if true, stream jsonl* shards with stdlib json and skip bad lines; other suffixes in the same folder are ignored (no HF fallback). Confirm logs contain "[lenient jsonl] ACTIVE".
Collaborator
Suggest adding a --load_jsonl_lenient parser argument in config.py.
This enables command-line passing and better external access for tools like dj-agents.
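A minimal sketch of the suggested flag, using plain argparse for illustration (data-juicer's config.py builds its parser via build_base_parser(), so the real change would go through that path):

```python
import argparse

# Hypothetical sketch of the suggested CLI flag; flag name follows the
# existing load_jsonl_lenient config key.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--load_jsonl_lenient",
    action="store_true",
    help="stream jsonl* shards with stdlib json and skip bad lines",
)
args = parser.parse_args(["--load_jsonl_lenient"])
```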
PII / redaction:
- Expand pii_redaction_mapper (PEM/JWT/URL/IP/MAC ordering, optional extended PII) with tests.
- pii_llm_suspect_mapper: spaCy install/locks, safer logging; prompts mention URL/IP/MAC/JWT/PEM leaks.

Reporting / demo:
- generate_bad_case_report: non-PII vs PII-flagged insight subsections; minimal PII cards; headline clusters use non-PII rows only; align redaction placeholders for grouping.
- agent_interaction_quality_analysis.yaml: pii_redaction indentation and default-behavior comment.

Recent branch history (already on upstream before this commit):
- OP doc build skips unregistered base classes; accelerator assignment fix; nested-query dict guard; bad-case report + skill insight parsing enrichments.
Made-with: Cursor
- Default writes safe HTML plus *_pii_audit.html; --report-pii-variants safe|audit|both
- Case study: ~half high_precision / half watchlist quota with spillover
- Reuse char TF-IDF + MiniBatchKMeans round-robin for Insight cards and case-study page
- Remove in-page PII minimal-card split; safe variant omits PII rows from insight + drill
- run_bad_case_pipeline.sh echoes audit path; smoke tests updated
Made-with: Cursor
- New section #sec-dialog-metrics: messages length, user turns, agent_turn_count, text chars, choices length, tool-touch message count, tokens (meta then stats), latency.
- Optional matplotlib histograms when --no-charts is off; TOC and charts intro link to section.
- Smoke test asserts sec-dialog-metrics anchor.
Made-with: Cursor
- Repeatable --input in generate_bad_case_report and verify; load_merged_rows reads paths in order.
- run_bad_case_pipeline report: multiple JSONL, optional trailing OUT.html.
- Multi-input: compact page meta and bottom #sec-data-provenance details for audit.
Made-with: Cursor
Summary
This branch adds an agent / dialog interaction quality path (mappers, signals, optional LLM insight), a bad-case HTML report with supporting demos and tooling, and hardens JSONL loading and Hugging Face datasets usage for large integers and mixed folders. upstream/main has been merged in, so the branch is up to date for landing on main.
Agent & bad-case
Data loading & config
Config / CLI: align with upstream build_base_parser() refactor after merging upstream/main.
Tests & docs