
The chatbot datasets contain only 10 and 100 queries, respectively, which may limit statistical robustness and generalizability to diverse domains. #6

@Qingbolan

Description

While the released chatbot subsets contain only 10 and 100 queries to keep reproduction lightweight, our curated datasets include xxx candidate queries for LM-Market and CA-Product.

To verify robustness, we scaled the CA-Product dataset to [TODO: 500 and 1,000] queries and observed consistent trends across all key metrics [TODO: as shown in the Table below], indicating that the benchmark conclusions hold at larger scales.
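One way to make the "consistent trends" check concrete is to draw nested random subsets (so every smaller set is contained in the larger ones) and compare how the evaluated systems are ranked at each scale. The sketch below assumes that "consistent" means the system ranking agrees across scales, measured with Kendall's tau; the function names `nested_subsets` and `kendall_tau` are hypothetical and are not part of the released benchmark code.

```python
import random
from itertools import combinations


def nested_subsets(queries, sizes, seed=0):
    """Draw nested random subsets: each smaller subset is a prefix of the same shuffle,
    so it is contained in every larger one. Assumption: a fixed seed is used for reproducibility."""
    rng = random.Random(seed)
    shuffled = list(queries)
    rng.shuffle(shuffled)
    if max(sizes) > len(shuffled):
        raise ValueError("requested subset size exceeds the number of available queries")
    return {n: shuffled[:n] for n in sorted(sizes)}


def kendall_tau(a, b):
    """Kendall tau-a between two metric lists scored over the same systems (no tie correction).
    1.0 means identical system ranking, -1.0 means fully reversed."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length score lists with at least two systems")
    concordant = discordant = 0
    for i, j in combinations(range(len(a)), 2):
        sign = (a[i] - a[j]) * (b[i] - b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    pairs = len(a) * (len(a) - 1) / 2
    return (concordant - discordant) / pairs
```

In use, each system would be evaluated on every subset returned by `nested_subsets`, and `kendall_tau` would be computed between the per-system metric lists at, for example, 100 and 1,000 queries; values close to 1.0 would support the claim that the ranking is stable as the dataset grows.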

Experiment configuration

  • dataset:
    • CA-Product: modify only the last step and search for the best number of queries.
    • LM-Sys-Market: use an LLM to select the top 1,000 queries most suitable for ad insertion.
  • base LLM: Doubao; judge LLM: 4.1-mini; embedding model: text-embedding-small
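The LM-Sys-Market selection step (score each candidate with an LLM, keep the top 1,000) can be sketched as a score-then-rank pass. This is a minimal sketch, not the pipeline used for the paper: `select_top_k` is a hypothetical name, and `score_fn` is a hypothetical stand-in for whatever LLM prompt rates a query's suitability for ad insertion.

```python
from typing import Callable, Iterable, List


def select_top_k(
    queries: Iterable[str],
    score_fn: Callable[[str], float],
    k: int = 1000,
) -> List[str]:
    """Score every query exactly once and keep the k highest-scoring ones.

    score_fn is a hypothetical wrapper around an LLM call that returns a
    suitability score for ad insertion; higher means more suitable.
    Ties are broken by the query's original position, so the result is deterministic.
    """
    # Cache (score, original index, query) so the LLM is called once per query.
    scored = [(score_fn(q), i, q) for i, q in enumerate(queries)]
    # Sort by descending score, then ascending original index.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [q for _, _, q in scored[:k]]
```

Scoring each query once and caching the result matters here because every call to `score_fn` would be a paid LLM request; if fewer than `k` candidates exist, all of them are returned in ranked order.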

Metadata

    Labels

      • expand-chatbot-dataset-queries: Chatbot datasets contain only 10 and 100 queries; generalizability concern
      • rebuttle: rebuttal of paper submission