

@wallbloggerbeing

  • Opening files with the 'utf-8' encoding in the save_transcriptions and load_transcriptions functions, to avoid encoding errors when reading or writing non-ASCII characters.

  • Passing the start parameter to enumerate in the load_transcriptions function, so that the line numbers in exception messages match the line numbers in the file instead of being confusingly counted from 0.

  • Calling rstrip in the parse_transcription_line function, to remove trailing whitespace (including newlines) from each transcription.

  • Replacing the for loop that appends transcriptions in the parse_transcription function with a list comprehension, which is shorter and typically somewhat faster.
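The PR description does not include the code, so the following is only a minimal sketch of the four changes. Only the function names come from the description; their signatures and the record format (one hypothetical `<id>\t<text>` line per transcription) are assumptions made for illustration.

```python
import os
import tempfile
from typing import Iterable, List, Tuple

# Assumed record format (not from the PR): "<id>\t<text>" per line.
Transcription = Tuple[str, str]


def parse_transcription_line(line: str) -> Transcription:
    """Split one line into (id, text); hypothetical signature."""
    key, sep, text = line.partition("\t")
    if not sep:
        raise ValueError(f"expected '<id>\\t<text>', got {line!r}")
    # rstrip() removes the trailing newline and any trailing whitespace
    return key, text.rstrip()


def parse_transcription(lines: Iterable[str]) -> List[Transcription]:
    """Parse many lines at once, skipping blank ones."""
    # A list comprehension replaces a for loop that called append()
    return [parse_transcription_line(line) for line in lines if line.strip()]


def load_transcriptions(path: str) -> List[Transcription]:
    transcriptions = []
    # Explicit UTF-8 so decoding does not depend on the platform default
    with open(path, "r", encoding="utf-8") as f:
        # start=1 makes lineno match the line numbers an editor shows
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue
            try:
                transcriptions.append(parse_transcription_line(line))
            except ValueError as exc:
                # Report the 1-based line number together with the file path
                raise ValueError(f"{path}:{lineno}: {exc}") from exc
    return transcriptions


def save_transcriptions(path: str, transcriptions: Iterable[Transcription]) -> None:
    # Explicit UTF-8 so non-ASCII text can always be written
    with open(path, "w", encoding="utf-8") as f:
        for key, text in transcriptions:
            f.write(f"{key}\t{text}\n")


if __name__ == "__main__":
    data = [("utt1", "naïve café"), ("utt2", "日本語")]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "transcriptions.tsv")
        save_transcriptions(path, data)
        print(load_transcriptions(path) == data)  # True
```

Without `encoding="utf-8"`, `open()` falls back to a platform-dependent default, which is where non-ASCII text such as the example data can fail to round-trip.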

These changes make the code more robust to edge cases such as non-ASCII text and trailing whitespace. I have tested the code with various transcription files, and it works as expected.

Improved file handling and error handling in transcription parser
Fixed the indexing error by using len(chars) instead of line_probs.shape[1].

Used the += operator instead of + for concatenating strings.

Fixed the error in the list comprehension that averages the probabilities by iterating over item instead of indexing probs[i].

Removed unnecessary parentheses and added spaces around operators for readability.
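The commit message refers to code that is not shown (line_probs, chars, probs). The sketch below is a hypothetical greedy decoder that illustrates the three fixes together. It assumes line_probs is a list of per-step score rows whose width may exceed len(chars) (for example because of an extra blank class); the name decode_line, the argmax decoding, and the per-word grouping of probabilities are illustrative choices, not the PR's code.

```python
from typing import List, Sequence, Tuple


def decode_line(
    line_probs: Sequence[Sequence[float]], chars: str
) -> Tuple[str, List[float]]:
    """Greedy decode of one text line (hypothetical helper).

    Assumes line_probs[t][j] scores chars[j] at step t, and that a row may
    hold extra trailing columns with no entry in chars.
    """
    text = ""
    probs: List[List[float]] = [[]]  # one list of character scores per word
    for row in line_probs:
        # Loop over len(chars), not the row width, so chars[best] stays in range
        best = max(range(len(chars)), key=lambda j: row[j])
        ch = chars[best]
        text += ch  # augmented assignment for string concatenation
        if ch == " ":
            probs.append([])  # a space starts a new word
        else:
            probs[-1].append(row[best])
    # Iterate over the items directly instead of indexing probs[i]
    word_avgs = [sum(item) / len(item) for item in probs if item]
    return text, word_avgs


if __name__ == "__main__":
    rows = [
        [0.75, 0.125, 0.0, 0.5],
        [0.25, 0.5, 0.125, 0.0],
        [0.125, 0.125, 0.75, 0.0],
        [0.5, 0.25, 0.125, 1.0],  # the extra 4th column scores highest here
    ]
    print(decode_line(rows, "ab "))  # ('ab a', [0.625, 0.5])
```

In the last row the extra fourth column holds the highest score, so taking the argmax over the full row width would index chars[3] and raise IndexError; bounding the search by len(chars) ignores that column.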