
The best parameters for PICARD in eval. #85

Closed
JiexingQi opened this issue Apr 11, 2022 · 2 comments
Labels
question Further information is requested

Comments

@JiexingQi

Hi, @tscholak. I ran eval with a T5-3B + PICARD model. The performance improvement from PICARD is not as significant as the results reported in your paper. For example, when I fine-tune a T5-3B model myself, the improvement is 3 for EM and <4 for EX, whereas your paper reports an improvement of 4.0 for EM (71.5 --> 75.5 with PICARD) and 4.9 for EX (74.4 --> 79.3 with PICARD).

My eval.json is:

{
    "run_name": "eval_0411_spider_1984",
    "model_name_or_path": "t5-3b",
    "dataset": "spider",
    "source_prefix": "",
    "schema_serialization_type": "custom",
    "schema_serialization_randomized": false,
    "schema_serialization_with_db_id": true,
    "schema_serialization_with_db_content": true,
    "normalize_query": true,
    "target_with_db_id": true,
    "output_dir": "./experiment/eval_0411_spider_1984/",
    "cache_dir": "./transformers_cache",
    "do_train": false,
    "do_eval": true,
    "fp16": false,
    "per_device_eval_batch_size": 2,
    "seed": 1,
    "report_to": ["wandb"],
    "predict_with_generate": true,
    "num_beams": 4,
    "num_beam_groups": 1,
    "diversity_penalty": 0.0,
    "max_val_samples": 1034,
    "use_picard": true,
    "launch_picard": true,
    "picard_mode": "parse_with_guards",
    "picard_schedule": "incremental",
    "picard_max_tokens_to_check": 2,
    "eval_accumulation_steps": 1,
    "metric_config": "both",
    "val_max_target_length": 512,
    "val_max_time": 1200
}

Could you please tell me the eval parameters you used to obtain the best improvement? Thank you.

@tscholak
Collaborator

Hi @JiexingQi. Your configuration looks fine. What is the performance of your T5-3B model without PICARD? What happens if you use tscholak/cxmefzzi? Can you reproduce my results with it?
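A minimal way to try the released checkpoint is to point `model_name_or_path` at `tscholak/cxmefzzi` in the eval config and keep the remaining settings unchanged; the `run_name` and `output_dir` values below are only illustrative, not names used in this thread:

```json
{
    "run_name": "eval_cxmefzzi",
    "model_name_or_path": "tscholak/cxmefzzi",
    "output_dir": "./experiment/eval_cxmefzzi/"
}
```

If the released checkpoint reproduces the paper numbers while the self-fine-tuned T5-3B does not, the gap is in the fine-tuning rather than in the PICARD eval settings.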

@tscholak
Collaborator

There is currently also an issue with the PICARD parser that may cause timeouts and therefore empty results; see #80 and #82 (comment).

@tscholak tscholak added the question Further information is requested label Apr 15, 2022