You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When I run the command: python evaluation.py --db spider/test_database --etype exec --gold spider/test_data/dev_gold.sql --pred spider/test_data/dev_gold.sql
Then returns:
easy medium hard extra all
count 470 857 463 357 2147
===================== EXECUTION ACCURACY =====================
execution 0.996 0.977 0.983 0.994 0.985
However, in above command, the gold file and pred file are the same.
When predicted sql contains "value" identifiers (e.g. a column called "value"), p_str = p_str.replace("value", "1") replace string "value" in predicted sql incorrectly.
There are such sqls in Spider-test dataset:
SELECT CONTENTS FROM boxes WHERE Value > 150 warehouse_1
SELECT CONTENTS FROM boxes WHERE Value > 150 warehouse_1
SELECT warehouse , avg(value) FROM boxes GROUP BY warehouse warehouse_1
SELECT warehouse , avg(value) FROM boxes GROUP BY warehouse warehouse_1
SELECT avg(value) , sum(value) FROM boxes warehouse_1
SELECT avg(value) , sum(value) FROM boxes warehouse_1
SELECT avg(capacity) , sum(capacity) FROM warehouses warehouse_1
SELECT avg(capacity) , sum(capacity) FROM warehouses warehouse_1
SELECT avg(value) , max(value) , CONTENTS FROM boxes GROUP BY CONTENTS warehouse_1
SELECT avg(value) , max(value) , CONTENTS FROM boxes GROUP BY CONTENTS warehouse_1
SELECT CONTENTS FROM boxes ORDER BY value DESC LIMIT 1 warehouse_1
SELECT CONTENTS FROM boxes ORDER BY value DESC LIMIT 1 warehouse_1
SELECT avg(value) FROM boxes warehouse_1
SELECT avg(value) FROM boxes warehouse_1
......
When I delete p_str = p_str.replace("value", "1"), the evaluation returns 100% execution score as expected.
The text was updated successfully, but these errors were encountered:
I download spider dataset from Spider Leaderboard.
When I run the command:
python evaluation.py --db spider/test_database --etype exec --gold spider/test_data/dev_gold.sql --pred spider/test_data/dev_gold.sql
Then returns:
easy medium hard extra all
count 470 857 463 357 2147
===================== EXECUTION ACCURACY =====================
execution 0.996 0.977 0.983 0.994 0.985
However, in above command, the gold file and pred file are the same.
I find the problem in
evaluation.py
, code link (line# 569):https://github.com/taoyds/test-suite-sql-eval/blob/e97acc546ecbee8fa27fa8dbf025ef61493a876c/evaluation.py#L569C13-L569C48
When predicted sql contains "value" identifiers (e.g. a column called "value"),
p_str = p_str.replace("value", "1")
replace string "value" in predicted sql incorrectly.There are such sqls in Spider-test dataset:
SELECT CONTENTS FROM boxes WHERE Value > 150 warehouse_1
SELECT CONTENTS FROM boxes WHERE Value > 150 warehouse_1
SELECT warehouse , avg(value) FROM boxes GROUP BY warehouse warehouse_1
SELECT warehouse , avg(value) FROM boxes GROUP BY warehouse warehouse_1
SELECT avg(value) , sum(value) FROM boxes warehouse_1
SELECT avg(value) , sum(value) FROM boxes warehouse_1
SELECT avg(capacity) , sum(capacity) FROM warehouses warehouse_1
SELECT avg(capacity) , sum(capacity) FROM warehouses warehouse_1
SELECT avg(value) , max(value) , CONTENTS FROM boxes GROUP BY CONTENTS warehouse_1
SELECT avg(value) , max(value) , CONTENTS FROM boxes GROUP BY CONTENTS warehouse_1
SELECT CONTENTS FROM boxes ORDER BY value DESC LIMIT 1 warehouse_1
SELECT CONTENTS FROM boxes ORDER BY value DESC LIMIT 1 warehouse_1
SELECT avg(value) FROM boxes warehouse_1
SELECT avg(value) FROM boxes warehouse_1
......
When I delete
p_str = p_str.replace("value", "1")
, the evaluation returns 100% execution score as expected.The text was updated successfully, but these errors were encountered: