You probably already found a solution, but since this was the first search result for
'ffmpeg ocr dvdsub srt', here's a tool I use.
https://github.com/ruediger/VobSub2SRT
It is not perfect and may require some editing.
I was trying to find a feature in ffmpeg that does this better than my method, but I found this and remembered the rabbit hole I had to go down, so I hope this helps someone.
Here's my process
For extracting dvdsub from a .mkv
Using mkvextract from mkvtoolnix-cli
mkvextract video.mkv tracks 2:video.idx
- arg 1 - The filename of video containing dvdsub
- arg 2 - The extraction type
- arg 3 - [Track ID of the dvdsub stream, as listed by mkvmerge --identify video.mkv]:[Desired filename of extracted files].idx
My example would've produced a video.idx and a video.sub file
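If you don't know which track ID holds the dvdsub, something like the sketch below may help. It filters the output of mkvmerge --identify for VobSub subtitle tracks. The function name vobsub_track_ids is made up for this example, and it assumes the English output format "Track ID N: subtitles (VobSub)", so a localized or much older mkvtoolnix may print something different.

```shell
# Hypothetical helper: read `mkvmerge --identify` output on stdin and
# print the track ID of every VobSub subtitle track, one per line.
# Assumes lines of the form "Track ID 2: subtitles (VobSub)".
vobsub_track_ids() {
  sed -n 's/^Track ID \([0-9][0-9]*\): subtitles (VobSub).*$/\1/p'
}
```

Usage would be: mkvmerge --identify video.mkv | vobsub_track_ids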
Generating subrip from .idx and .sub files
Using vobsub2srt
vobsub2srt uses tesseract and I found using tesseract's legacy mode works the best.
vobsub2srt --tesseract-oem 0 video
- arg 1 - Tesseract OCR engine mode option (run tesseract --help-oem for the list of modes)
- arg 2 - 0 selects the legacy engine
- arg 3 - Filename of BOTH .idx and .sub WITHOUT extension
My example would've produced video.srt
Inspect and edit subrip file
Mistakes I've experienced
- '|' instead of 'I', tesseract's legacy mode doesn't seem to make this mistake often.
- ` instead of '
- Spacing, when a line starts with '-', there may not be space between '-' and the first word.
- Missing ' & "
- 'I' or '|' instead of '[', legacy doesn't seem to make this mistake often.
Edit it
If you're not familiar with subrip files, they can be simply tossed into a text editor.
grep, vim, and sed are your friends.
Most mistakes from legacy mode can easily be ignored, however.
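As a rough starting point, the first three mistakes in the list can be handled with sed. This is only a sketch: the function name fix_ocr is made up, and the rules are blunt. Every '|' becomes 'I', which is wrong wherever the OCR actually meant '[', so check the result by hand. Missing quotes can't be guessed automatically and are left alone.

```shell
# Hypothetical helper: apply a few blunt OCR fixes to subrip text on stdin.
fix_ocr() {
  # 1) '|' -> 'I'
  # 2) backtick -> apostrophe
  # 3) "-word" at the start of a line -> "- word"
  #    (timestamp lines start with a digit, so they are not touched)
  sed -e 's/|/I/g' \
      -e "s/\`/'/g" \
      -e 's/^-\([^ -]\)/- \1/'
}
```

Usage would be: fix_ocr < video.srt > video.fixed.srt (then diff the two files before replacing the original).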
Replacing dvdsub with subrip(srt)
Using ffmpeg
ffmpeg -i video.mkv -i video.srt -c copy -c:s subrip -map 0:v -map 0:a -map 1 final-video.mkv
- arg 1 & 2 - Input #1 - Video file containing dvdsub
- arg 3 & 4 - Input #2 - Subrip file
- arg 5 - Codec option applied to all streams
- arg 6 - Copies streams without re-encoding (in practice only video and audio get copied, see arg 7)
- arg 7 - Subtitle codec option (overrides arg 5 for subtitles)
- arg 8 - Selects subrip as subtitle codec (may be redundant, but safe>sorry)
- arg 9 & 10 - Maps video stream from 1st input to 1st stream in output
- arg 11 & 12 - Maps audio stream from 1st input to 2nd stream in output
- arg 13 & 14 - Maps subtitle stream from 2nd input to 3rd stream in output
- arg 15 - Output filename
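To double-check the result, ffprobe (it ships with ffmpeg) can list the subtitle streams of the new file. A minimal sketch, with list_subtitle_streams being a made-up name:

```shell
# Hypothetical helper: print "index,codec_name" for each subtitle stream
# in the file given as the first argument.
list_subtitle_streams() {
  ffprobe -v error -select_streams s \
          -show_entries stream=index,codec_name \
          -of csv=p=0 "$1"
}
```

Usage would be: list_subtitle_streams final-video.mkv. If the replacement worked, the codec name should read subrip rather than dvd_subtitle.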
And done, I hope there is no character limit on here.