How to extract parts of an international conference video based on spoken language using the command line

Question

I have an international conference video that contains two spoken languages, i.e. it mixes English and Chinese sentences. I would like to remove the Chinese parts from the command line.

First, generate a subtitle file using whisper:

whisper myvideo --model large --language en

The subtitle file contains both languages and their timings:

1
00:00:00,000 --> 00:00:04,220
if you are not concerned and doing the work of the Lord.

2
00:00:09,120 --> 00:00:13,880
如果你不愿意去遵行耶稣基督的话语的话,

3
00:00:14,220 --> 00:00:18,220
就没有必要昼夜去默想神的话。

4
00:00:18,220 --> 00:00:22,200
Take more of me, give me more of you.

....

The question is how to use the command line and ffmpeg to remove all the Chinese parts of the video based on the timing in the subtitle file. The video is very long, so the task needs to be done from the command line rather than manually.

Step 1) I need to identify the language of every line of the subtitle file:

#!/bin/bash
while IFS= read -r line
do
  echo "text: $line"
  lan=$(trans -id $line |awk '/^Code/ {print $2}')
  echo "lan: $lan"
done < "$1"

The bash script above doesn't work properly yet. How can I do this?

score 0 · Accepted Answer · answered Feb 14 '24 at 08:28

I wrote a bash script that just barely manages to remove the Chinese parts of the video.

1. Use openai-whisper to generate a subtitle file with English and Chinese timings:

whisper myvideo --model large-v3 --language Chinese

2. Use bash to extract the desired English timings.
3. Use ffmpeg to extract and merge the English parts of the video.
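
Step 2 can also be sketched without counting lines, by reading the subtitle one cue at a time. This is only a minimal sketch under stated assumptions, not the script used in this answer: it assumes cues are separated by blank lines, and it treats any cue whose text contains a non-ASCII byte as Chinese, which is a heuristic rather than real language detection. The function name english_ranges is hypothetical.

```shell
# Print one "start end" pair per run of consecutive ASCII-only cues.
# Assumption: a cue containing any non-ASCII byte counts as Chinese.
english_ranges() {
  # $1: path to an .srt file whose cues are separated by blank lines
  LC_ALL=C awk '
    BEGIN { RS = ""; FS = "\n" }          # paragraph mode: one record per cue
    {
      split($2, t, " --> ")               # $2 is the timing line
      text = ""
      for (i = 3; i <= NF; i++) text = text $i
      if (text ~ /[\200-\377]/) {         # non-ASCII byte: close the open run
        if (in_run) print start, stop
        in_run = 0
      } else {                            # ASCII-only cue: open or extend a run
        if (!in_run) { start = t[1]; in_run = 1 }
        stop = t[2]
      }
    }
    END { if (in_run) print start, stop }
  ' "$1" | tr ',' '.'                     # ffmpeg expects a dot before the milliseconds
}
```

Calling english_ranges myvideo.srt prints one pair of timestamps per English run, which is the same shape as the timing file the script writes.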

A full script for steps 2 and 3 is here:

#!/bin/bash
# Date: 20240214
# c275633094@gmail.com
# usage: bash video_rm_chi.sh myvideo.*
# output: *_eng.srt
# output: *_t.txt
# output: *_list.txt
# output: *_list.*
# output: *_eng.*
re_encode=false
clean_final=false
fullfile="$1"
filename=$(basename -- "$fullfile")
extension="${filename##*.}"
filename="${filename%.*}"
# remove leftovers from a previous run (-f: no error if they do not exist)
rm -f "$filename"_t.txt
rm -f "$filename"_list*
rm -f "$filename"_eng*
N=$(wc "$filename".srt |awk '{print $1}')
echo "N= $N"
N4=$((N / 4))
echo "N/4= $N4"
#Nr=$((N % 4))
#echo "N%4= $Nr"
n=1
jump=1
eng=0
tstart=0
tend=0
#while IFS= read -r line
#while read -r line
# Each cue is assumed to span exactly 4 lines (index, timing, text, blank),
# so line n*4-1 holds the text of cue n and line n*4-2 its timing.
while (( n <= N4 ))
do
  n4=$((n * 4 - 1))
  echo "n = $n , n4 = $n4"
  if sed "$n4!d" "$filename".srt | ugrep "[\x{4e00}-\x{9fcc}]"; then 
    echo "line $n4 contains Chinese, reject, and jump to next (x4) lyrics line"
    if [ $eng -gt 0 ]; then
      # an English run just ended: its end time is on the previous cue's timing line
      line_eng=$(sed -n "$((n4 - 5))p" "$filename".srt)
      time_eng=($line_eng)
      t2=${time_eng[2]}
      echo "t2 $t2"
    fi
    jump=$((jump + 1))
    eng=0
  else
    line_eng=$(sed -n "$((n4 - 1))p" "$filename".srt)
    time_eng=($line_eng)
    if [ $jump -gt 0 ] && [ $eng -eq 0 ]; then
      echo '======================' 
      t1=${time_eng[0]}   # a new English run starts here
      #sed "$((n4 - 1))!d" "$filename".srt >> "$filename""_eng.srt"
      #if [[ $(sed "$((n4 - 1))!d" "$filename".srt) = '-->' ]]; then sed "$((n4 - 1))!d" "$filename".srt >> "$filename"_t.txt; fi 
    fi
    # keep the end time current, so a run that reaches the end of the file is written in full
    t2=${time_eng[2]}
    jump=0 
    eng=$((eng + 1))
  fi 
  #if [[ $(sed "$((n4 - 1))!d" "$filename".srt) = '-->' ]]; then sed "$((n4 - 1))!d" "$filename".srt >> "$filename"_t.txt; fi 
  if [ $jump -eq 1 ] && [ $eng -eq 0 ] ; then
    echo "write1 $t1 $t2"
    echo "$t1 $t2" >> "$filename"_t.txt
  fi
  if [ $jump -eq 0 ] && [ $n -eq $N4 ] ; then
    echo "write2 $t1 $t2"
    echo "$t1 $t2" >> "$filename"_t.txt
  fi
  n=$((n + 1))
done
sed -i 's/,/./g' "$filename"_t.txt
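# Convert an HH:MM:SS.mmm timestamp to milliseconds.
ts_get_msec()
{
  read -r h m s ms <<< $(echo $1 | tr '.:' ' ' )
  # 10# forces base 10, so fields such as 08 or 009 are not read as octal
  h=$((10#$h))
  m=$((10#$m))
  s=$((10#$s))
  ms=$((10#$ms))
  echo $(((h*60*60*1000)+(m*60*1000)+(s*1000)+ms))
}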
duration=0
i=1
while read -r start_ts stop_ts <&3; do
  i_formatted=$(printf "%04d" "$i")
  if [ $re_encode = true ] ; then
    ffmpeg -i "$fullfile" -ss "$start_ts" -to "$stop_ts" -c copy "$filename"_list"$i_formatted"."$extension"
  else
    ffmpeg -i "$fullfile" -ss "$start_ts" -to "$stop_ts" -c copy "$filename"_list"$i_formatted"."$extension"
    #ffmpeg -ss $start_ts -to $stop_ts -i $fullfile -vcodec copy -acodec copy -avoid_negative_ts make_zero "$filename"_list"$i_formatted"."$extension"
  fi
  i=$((i + 1))
  START=$(ts_get_msec $start_ts)
  STOP=$(ts_get_msec $stop_ts)
  DIFF=$((STOP-START))
  duration=$((duration + DIFF))
  echo "start $START"
  echo "stop $STOP"
  echo "diff $DIFF"
  echo "duration $duration"
done 3< "$filename"_t.txt
#https://stackoverflow.com/a/55682555/5845212
#ffmpeg would otherwise consume stdin, so the timing file is read through fd 3 (3<)
echo "f duration $duration"
for f in "$filename"_list*."$extension"; do echo "file '$f'" >> "$filename"_list.txt; done
if [ $re_encode = true ] ; then
  ffmpeg -f concat -safe 0 -i "$filename"_list.txt "$filename"_eng."$extension"
else
  ffmpeg -f concat -safe 0 -i "$filename"_list.txt -c copy "$filename"_eng."$extension"
fi
min=$(($duration /(60*1000)))
sec=$((($duration %(60*1000))/1000))
ms=$((($duration %(60*1000))%1000))
echo "video duration ${min}:${sec}.$ms"
if [ $clean_final = true ] ; then
  rm -f "$filename"_eng.srt 
  rm -f "$filename"_t.txt 
  rm -f "$filename"_list.*
fi
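
As a cross-check on the duration the script reports, the kept time can also be summed directly from the timing file. This is a minimal sketch, assuming each line holds a "start end" pair in HH:MM:SS.mmm form, as in the script's *_t.txt. The function name total_kept_ms is hypothetical.

```shell
# Sum (end - start) over all lines of a timing file, in milliseconds.
total_kept_ms() {
  # $1: timing file with one "HH:MM:SS.mmm HH:MM:SS.mmm" pair per line
  awk '
    function to_ms(ts,   p) {             # awk reads "08" as decimal 8, not octal
      split(ts, p, /[:.,]/)
      return ((p[1] * 60 + p[2]) * 60 + p[3]) * 1000 + p[4]
    }
    NF >= 2 { total += to_ms($2) - to_ms($1) }
    END { printf "%d\n", total }
  ' "$1"
}
```

Comparing total_kept_ms myvideo_t.txt with the length of the merged file gives a rough idea of how far the cuts drifted. Some drift is expected with -c copy, since stream copying can only cut at keyframes.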

