Use silencedetect audio filter and feed its debugging output to segment output format parameter.
Here is a ready-made script:
#!/bin/bash
IN=$1
OUT=$2
true ${SD_PARAMS:="-55dB:d=0.3"};
true ${MIN_FRAGMENT_DURATION:="20"};
export MIN_FRAGMENT_DURATION
if [ -z "$OUT" ]; then
    echo "Usage: split_by_silence.sh input_media.mp4 output_template_%03d.mkv"
    echo "Depends on FFmpeg, Bash, Awk, Perl 5. Not tested on Mac or Windows."
    echo ""
    echo "Environment variables (with their current values):"
    echo "    SD_PARAMS=$SD_PARAMS       Parameters for FFmpeg's silencedetect filter: noise tolerance and minimal silence duration"
    echo "    MIN_FRAGMENT_DURATION=$MIN_FRAGMENT_DURATION    Minimal fragment duration"
    exit 1
fi
echo "Determining split points..." >& 2
SPLITS=$(
    ffmpeg -nostats -v repeat+info -i "${IN}" -af silencedetect="${SD_PARAMS}" -vn -sn  -f s16le  -y /dev/null \
    |& grep '\[silencedetect.*silence_start:' \
    | awk '{print $5}' \
    | perl -ne '
        our $prev;
        INIT { $prev = 0.0; }
        chomp;
        if (($_ - $prev) >= $ENV{MIN_FRAGMENT_DURATION}) {
            print "$_,";
            $prev = $_;
        }
    ' \
    | sed 's!,$!!'
)
echo "Splitting points are $SPLITS"
ffmpeg -v warning -i "$IN" -c copy -map 0 -f segment -segment_times "$SPLITS" "$OUT"
You specify input file, output file template, silence detection parametres and minimum fragment size, it writes multiple files.
Silence detection parameters may need to be tuned:
SD_PARAMS environment variable contains two parameters: noise tolerance level and minimum silence duration. Default value is -55dB:d=0.3. 
- Decrease the 
-55dB to e.g. -70dB if some faint non-silent sounds trigger spitting when they should not. Increase it to e.g. -40dB if it does not split on silence because of there is some noise in it, making it not completely silent. 
d=0.3 is a minimum silence duration to be considered as a splitting point. Increase it if only serious (e.g. whole 3 seconds) silence should be considered as real, split-worthy silence. 
- Another environment variable 
MIN_FRAGMENT_DURATION defines amount of time silence events are ignored after each split. This sets minimum fragment duration. 
The script would fail if no silence is detected at all.
There is a refactored version on Github Gist, but there was a problem with it for one user.