Script: parsing transcript .srt files into readable text #76

Jamoverjelly · 2018-06-27T00:53:06Z

Hello,

I am working through an online class and trying to produce notes based on the instructional video content. Since many of the concepts covered in these videos are worth taking note of, I'm finding myself writing out nearly every line spoken by the instructor. Obviously, this process is laborious and extremely time-consuming. I am wondering if there is an easier way to extract the text from these videos using an srt tool to help parse and modify the text.

The syntax of the transcript files for each video is identical to the standard srt format. Here's an example:

1
00:00:00,710 --> 00:00:03,220
Rob just showed us how we can
make things accessible to

2
00:00:03,220 --> 00:00:05,970
anyone who can't use a mouse or
pointing device.

3
00:00:05,970 --> 00:00:09,130
Whether that's because it's any
type of physical impairment or

4
00:00:09,130 --> 00:00:11,510
a technology issue or
simply personal preference.

Does pysrt currently provide any tools for reformatting the text content so that it is more readable? To clarify, for the above example, I would like to remove the blank lines and the record-number and time-stamp lines, and then join the remaining lines with single spaces (so that a space follows each period), like so:

Rob just showed us how we can make things accessible to anyone who can't use a mouse or pointing device. Whether that's because it's any type of physical impairment or a technology issue or simply personal preference.

I would like to produce the output above from the example and to apply the same transformation to the other files in the series. I am pretty rusty with Python at the moment, though I believe this could be implemented fairly easily with an understanding of common string methods.

Can anyone contributing to this project let me know how this is done or if the functionality already exists in pysrt?

Thanks!

whoizit · 2019-05-04T07:05:25Z

@Jamoverjelly https://gist.github.com/whoizit/c54f916c1c6d78ad5ac88cf4735c9d7d
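
For anyone reading later, the steps described in the question (drop blank lines, record numbers and time-stamp lines, then join what is left) can be sketched with the standard library alone. This is only a minimal sketch and not necessarily what the linked gist does; the function name `srt_to_text` and the `TIMING` pattern are made up for illustration, and the code assumes well-formed cues where each record number sits directly above its time-stamp line.

```python
import re

# Hypothetical helper pattern: matches a time-stamp line such as
# "00:00:00,710 --> 00:00:03,220".
TIMING = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)


def srt_to_text(srt: str) -> str:
    """Drop record numbers, time-stamp lines and blank lines; join the rest."""
    # Strip a leading byte-order mark, which some .srt files carry.
    lines = [line.strip() for line in srt.lstrip("\ufeff").splitlines()]
    kept = []
    for i, line in enumerate(lines):
        if not line or TIMING.match(line):
            continue
        # Treat a digits-only line as a record number only when a
        # time-stamp line follows it, so numeric subtitle text survives.
        if line.isdigit() and i + 1 < len(lines) and TIMING.match(lines[i + 1]):
            continue
        kept.append(line)
    # Joining with single spaces also puts a space after each period
    # that ended a subtitle line.
    return " ".join(kept)


if __name__ == "__main__":
    import sys

    # Usage: python srt_to_text.py lesson1.srt lesson2.srt ...
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8-sig") as handle:
            print(srt_to_text(handle.read()))
```

If pysrt is installed, the same joining step could presumably be applied to the `text` attribute of each item returned by `pysrt.open`, which would leave the parsing of numbers and time stamps to the library.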
