Use the atrim, asetpts, and concat filters:
ffmpeg -i video.mkv -i audio.mp3 -filter_complex \
"[0:a]atrim=end=12,asetpts=PTS-STARTPTS[aud1]; \
[1:a]atrim=30:42,asetpts=PTS-STARTPTS[aud2]; \
[0:a]atrim=start=24,asetpts=PTS-STARTPTS[aud3]; \
[aud1][aud2][aud3]concat=n=3:v=0:a=1[aout]" \
-map 0:v -map "[aout]" -c:v copy -c:a libfdk_aac output.mp4
- The first atrim gets the first 12 seconds of audio from the first input (video.mkv).
- The second atrim gets seconds 30-42 from the second input (audio.mp3).
- The third atrim gets the audio from second 24 to the end of the first input (video.mkv).
- concat then combines these three segments into one audio stream.
- The video is stream copied instead of being re-encoded in this example.
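- Without asetpts I was getting buffer queue overflows, which resulted in "jerky" output. atrim leaves the original timestamps on each segment, so asetpts=PTS-STARTPTS is needed to make every segment start at zero before concat joins them. See the atrim documentation for more info.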
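
If the cut points change often, the filtergraph can be generated instead of edited by hand. The sketch below assumes the goal is what the numbers in the command suggest: replace a span of the video's audio with an equally long span from the second input. build_audio_splice is a hypothetical helper name, not part of ffmpeg, and it only handles whole seconds.

```shell
#!/bin/sh
# Hypothetical helper (not part of ffmpeg): prints a filtergraph that replaces
# the audio between cut_start and cut_end (whole seconds) of input 0 with
# audio from input 1 beginning at src_start.
build_audio_splice() {
    cut_start=$1
    cut_end=$2
    src_start=$3
    # The replacement has to be exactly as long as the gap it fills.
    src_end=$(( src_start + cut_end - cut_start ))

    printf '[0:a]atrim=end=%s,asetpts=PTS-STARTPTS[aud1];' "$cut_start"
    printf '[1:a]atrim=%s:%s,asetpts=PTS-STARTPTS[aud2];' "$src_start" "$src_end"
    printf '[0:a]atrim=start=%s,asetpts=PTS-STARTPTS[aud3];' "$cut_end"
    printf '[aud1][aud2][aud3]concat=n=3:v=0:a=1[aout]\n'
}

# Same values as the command above: seconds 12-24 of video.mkv are replaced
# with the audio starting at second 30 of audio.mp3.
filter=$(build_audio_splice 12 24 30)
echo "$filter"

# Uncomment to run; assumes an ffmpeg build that includes libfdk_aac.
# ffmpeg -i video.mkv -i audio.mp3 -filter_complex "$filter" \
#   -map 0:v -map "[aout]" -c:v copy -c:a libfdk_aac output.mp4
```

Computing the end of the replacement from the gap length keeps the total audio duration the same as the video's, so the stream-copied video and the new audio stay aligned after the splice.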