How can I download the subtitles for a list of videos using youtube-dl? I could not find an option that downloads only the subtitles, without the video.
5 Answers
There is an option, mentioned in the documentation:
Subtitle Options:
--write-sub Write subtitle file
--write-auto-sub Write automatic subtitle file (YouTube only)
--all-subs Download all the available subtitles of the video
--list-subs List all available subtitles for the video
--sub-format FORMAT Subtitle format, accepts formats preference, for example: "srt" or "ass/srt/best"
--sub-lang LANGS Languages of the subtitles to download (optional) separated by commas, use IETF language tags like 'en,pt'
So for example, to list all subs for a video:
youtube-dl --list-subs https://www.youtube.com/watch?v=Ye8mB6VsUHw
To download all subs, but not the video:
youtube-dl --all-subs --skip-download https://www.youtube.com/watch?v=Ye8mB6VsUHw
If a video has only auto-generated subtitles, --all-subs still won't download them; use --write-auto-sub instead:
youtube-dl --write-auto-sub --skip-download https://www.youtube.com/watch?v=Ye8mB6VsUHw
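Since the question is about a list of videos, the commands above could be wrapped in a small script. This is only a minimal sketch: it assumes youtube-dl is installed and on your PATH, the helper names build_subtitle_command and download_subtitles are made up for illustration, and it uses only the flags quoted above.

```python
import subprocess


def build_subtitle_command(urls, lang=None, auto=False):
    """Build a youtube-dl argument list that fetches subtitles only.

    Hypothetical helper for illustration; it uses only the flags quoted above.
    """
    cmd = ["youtube-dl", "--skip-download"]
    # --write-auto-sub requests auto-generated captions, --write-sub uploaded ones
    cmd.append("--write-auto-sub" if auto else "--write-sub")
    if lang:
        cmd += ["--sub-lang", lang]
    # youtube-dl accepts several URLs in a single invocation
    return cmd + list(urls)


def download_subtitles(urls, **kwargs):
    """Run the command; assumes youtube-dl is installed and on PATH."""
    return subprocess.run(build_subtitle_command(urls, **kwargs), check=False).returncode


if __name__ == "__main__":
    videos = ["https://www.youtube.com/watch?v=Ye8mB6VsUHw"]
    # Print the command instead of running it, so you can check it first
    print(" ".join(build_subtitle_command(videos, lang="en")))
```

If the URLs are already in a text file, youtube-dl's own --batch-file (-a) option reads one URL per line, so a script may not be needed at all.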
Or you can download the subtitles for just one language:
youtube-dl --write-sub --sub-lang en --skip-download URL
Just run the following command:
youtube-dl --write-auto-sub --convert-subs=srt --skip-download URL
For example, if you are downloading https://www.youtube.com/watch?v=example with the title "example", --convert-subs=srt will write a file named example.en.srt, where en stands for English, es for Spanish, and so on.
The file will contain something like this:
00:00:04.259 --> 00:00:05.259
>> I’m Elon Musk.
00:00:05.259 --> 00:00:06.669
>> What is your claim to fame?
00:00:06.669 --> 00:00:07.669
>> I’m the founder of
00:00:07.669 --> 00:00:08.669
Tesla.com.
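If you want to keep the timings rather than throw them away, the structure shown above (a timing line followed by one or more caption lines) can be read into tuples. This is a rough sketch, not a full SRT/VTT parser: parse_cues is a name I made up, and it assumes that a line consisting only of digits is an SRT cue index, so a caption that is just a number would be dropped.

```python
import re

# Matches cue timing lines such as "00:00:04.259 --> 00:00:05.259"
# (SRT files use a comma instead of a dot before the milliseconds).
TIMING = re.compile(r"(\d{2}:\d{2}:\d{2}[.,]\d{3}) --> (\d{2}:\d{2}:\d{2}[.,]\d{3})")


def parse_cues(text):
    """Return a list of (start, end, caption) tuples from subtitle text."""
    cues = []
    for line in text.splitlines():
        line = line.strip()
        match = TIMING.match(line)
        if match:
            # A timing line opens a new cue with an empty caption
            cues.append([match.group(1), match.group(2), []])
        elif line and cues and not line.isdigit():
            # Caption text belongs to the most recent timing line;
            # bare numbers are treated as SRT cue indexes and skipped.
            cues[-1][2].append(line)
    return [(start, end, " ".join(words)) for start, end, words in cues]


sample = """00:00:04.259 --> 00:00:05.259
>> I'm Elon Musk.
00:00:05.259 --> 00:00:06.669
>> What is your claim to fame?
"""

for cue in parse_cues(sample):
    print(cue)
```

Anything before the first timing line (such as a WEBVTT header) is ignored, because there is no cue yet to attach it to.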
OPTIONAL - If you need the text cleaned up, you can use Python to tidy it a little:
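import re

bad_words = ['-->', '</c>']   # markers of timing lines and inline-tag lines
prefix = re.compile(r"^>> ")  # speaker marker at the start of a caption

# Pass 1: drop timing lines and lines that carry inline tags.
# Use the name of the file youtube-dl actually wrote (.vtt or .srt).
with open('example.en.vtt') as oldfile, open('newfile.txt', 'w') as newfile:
    for line in oldfile:
        if not any(bad_word in line for bad_word in bad_words):
            newfile.write(line)

# Pass 2: strip the ">> " prefix and remove duplicate lines.
with open('newfile.txt') as result:
    cleaned = [prefix.sub("", each) for each in result]

with open('sub_out.txt', 'w') as rmdup:
    # dict.fromkeys removes duplicates but keeps the original order
    # (a set would shuffle the lines)
    rmdup.writelines(dict.fromkeys(cleaned))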
Output (sub_out.txt):
I’m Elon Musk.
What is your claim to fame?
I’m the founder of
Tesla.com.
Another simple way to download subtitles from YouTube is Google2SRT, a free, open-source program for Windows, Mac and Linux that can download, save and convert multiple subtitles from YouTube videos.
Usage
1. Paste the URL into the Google subtitles text box and click Read.
2. Choose the language by selecting the appropriate check box and press Go.
3. Open the destination folder entered in the SRT subtitles text box to find the SRT files.
youtube-dl has been forked as yt-dlp, and the equivalent command there would be:
yt-dlp --write-auto-sub --convert-subs=srt --skip-download https://www.youtube.com/watch?v=example
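If you would rather call yt-dlp from Python than from the shell, roughly the same request can be expressed through its embedding interface. Treat this as a sketch under assumptions: subtitle_options and fetch_subtitles are names I made up, the yt_dlp package has to be installed separately, and the sketch does not cover the --convert-subs step, so the subtitles arrive in whatever format the site provides.

```python
def subtitle_options(langs=("en",), auto=True):
    """Options dict for yt_dlp.YoutubeDL, mirroring the CLI flags above."""
    return {
        "skip_download": True,          # --skip-download
        "writesubtitles": not auto,     # --write-sub
        "writeautomaticsub": auto,      # --write-auto-sub
        "subtitleslangs": list(langs),  # --sub-lang
    }


def fetch_subtitles(urls, **kwargs):
    """Download subtitles only for every URL in the list."""
    import yt_dlp  # third-party package, imported lazily so the helper above works without it

    with yt_dlp.YoutubeDL(subtitle_options(**kwargs)) as ydl:
        return ydl.download(list(urls))


if __name__ == "__main__":
    # Show the options without contacting the network
    print(subtitle_options(langs=("en", "es"), auto=False))
```

Calling fetch_subtitles(["https://www.youtube.com/watch?v=example"]) would then play the role of the command line above, minus the conversion to SRT.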
Since I needed only the text, I adapted @Hernan Pesantez's answer as follows to clean up the downloaded file:
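import re
import sys

bad_words = ['-->', '</c>']   # markers of timing lines and inline-tag lines
new_lines = []
prefix = re.compile(r"^>> ")  # speaker marker at the start of a caption

# Usage: python script.py SUBTITLE_FILE
with open(sys.argv[1]) as oldfile:
    for line in oldfile:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        line = re.sub(prefix, "", line)
        if any(bad_word in line for bad_word in bad_words):
            # Timing line: also drop the numeric cue index stored just before it
            if new_lines and re.match(r'^\d+$', new_lines[-1]):
                new_lines.pop()
            continue
        # Rolling captions repeat the previous line with more text appended;
        # keep only the longer, newer version
        if new_lines and line.startswith(new_lines[-1]):
            new_lines.pop()
        new_lines.append(line)

for line in new_lines:
    print(line)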