no Selected json #1

peng913 · 2025-01-24T13:28:17Z

Dear author, I'm running self_step2.py for the self-training stage, but it does not generate the 'data_self/Selected_w_dialogs_' JSON file that R2 requires.

xiaowudeshen · 2025-01-24T16:03:53Z

Hi there,

Thanks for your interest in this project. I am revisiting this repo after a long time myself, so let me try my best to answer your question.

I think your issue concerns the self-training process in the paper. The self-training code is stored in the self_step2.py file (excerpt below). I checked, and the "Selected_w_dialogs_" file is "output_file_R2", i.e. the output of self-training step 2.

    vargs = vars(args)
    print("Self_training_starts++++++++++++++++++++++++++++++++++++++++++++++")
    training_data_path = "data/train_dials.json"
    #file to save all the predicted results
    output_file_R1 = 'data_self_2/slot_train_dials_' + vargs["only_domain"] + '.json'
    #file to save all the selected dialogues
    output_file_R2 = 'data_self_2/Selected_w_dialogs_' + vargs["only_domain"] + '.json'

Step 2 of self-training runs the process in the opposite direction and performs two operations:

1. Use the generated value to inversely predict the slot type, e.g., Hotel-A ---> hotel-name. The results are saved to pred_path2.
2. Check whether the predicted slot type matches the original one. If it does, we consider it a good label and save it to "output_file_R2" under the name "Selected_w_dialogs".
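To make the second operation concrete, here is a minimal sketch of the consistency check. It is not the repository's implementation: the function name and the "value", "slot", and "predicted_slot" fields are hypothetical and only illustrate the idea of keeping a label when the inverse prediction agrees with the original slot type.

```python
from typing import Dict, List


def select_consistent_labels(examples: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep examples whose inversely predicted slot type matches the original.

    The "slot" and "predicted_slot" keys are hypothetical field names,
    not the format used by self_step2.py.
    """
    selected = []
    for example in examples:
        original = example["slot"].strip().lower()
        predicted = example["predicted_slot"].strip().lower()
        if original == predicted:
            selected.append(example)
    return selected


# Toy data: the first label is confirmed by the inverse prediction, the second is not.
examples = [
    {"value": "Hotel-A", "slot": "hotel-name", "predicted_slot": "hotel-name"},
    {"value": "centre", "slot": "hotel-area", "predicted_slot": "attraction-area"},
]
selected = select_consistent_labels(examples)
print(len(selected), "of", len(examples), "labels kept")
```

If no example passes a check of this kind, there is nothing to select, so it is worth looking at how many inverse predictions actually match before concluding that the file-saving step is broken.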

The code for this part (the R2 function) is shown below. It should be in the same self_step2.py file.

    pred_path2 = eval_from_checkpoint(args)
    # pred_path2 = 't5_self/t5_flan_joint/google/flan-t5-smallt5_except_domain_hoteljoint_taskmask_slot_slotlang_question_lr_0.0001_epoch_5_seed_577/mask_slott5/R2/none/results/zeroshot_prediction.json'
    preprocessed_data = data_preprocessing(training_data_path, pred_path2, output_file_R2, purpose='prepare_finetuning')
    joint_acc, turn_acc_dict = preprocessed_data.check_slot_prediction_acc(pred_path1, purpose='prepare_finetuning')

I have just tested this code again using the commented-out "pred_path2", which points to my original prediction file, and it generates the file that you want.

I suggest that you first check whether pred_path2 is produced successfully. If it is, you can skip "eval_from_checkpoint", which takes a long time, and point pred_path2 directly at the existing prediction file. Then run "data_preprocessing" on its own to debug, and check whether the "data_preprocessing" function from "prepare_self_training.py" is running well; that is where good labels are checked and saved to the desired output file. If both steps work, the file should be saved in the 'data_self_2' folder.
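A quick way to run these checks is a small helper like the one below. This is only a sketch: `check_json_file` is a hypothetical helper that is not part of the repository, and the "hotel" domain in the example path is an assumption, so substitute your own `only_domain` value and your actual pred_path2.

```python
import json
import os


def check_json_file(path: str) -> str:
    """Return a short status for a JSON file: missing, empty, invalid, no_entries, or ok."""
    if not os.path.isfile(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"
    try:
        with open(path, "r", encoding="utf-8") as f:
            data = json.load(f)
    except json.JSONDecodeError:
        return "invalid"
    # A file holding an empty list or dict parses fine but contains no selected dialogues.
    if isinstance(data, (list, dict)) and len(data) == 0:
        return "no_entries"
    return "ok"


# Example paths; the domain name "hotel" is an assumption for illustration.
output_dir = "data_self_2"
output_file_R2 = os.path.join(output_dir, "Selected_w_dialogs_hotel.json")

print("output folder exists:", os.path.isdir(output_dir))
print("output_file_R2 status:", check_json_file(output_file_R2))
```

The same helper can be called on pred_path2 once you know where your prediction file was written. A "missing" status for pred_path2 points to the evaluation step, while an "ok" pred_path2 combined with a missing output file points to the preprocessing step or to the output folder.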

Let me know if you have further questions.

Victor
