Note for future readers:
when I wrote this answer, the question didn't specify any environment, OS or language; it just asked how to merge audio.
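Basic concept: waves in phase reinforce each other, waves in opposite phase attenuate (or cancel) each other. Mixing digital audio therefore comes down to adding the sample values of the tracks.
So the basic idea is: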
- Convert each track to PCM if it isn't PCM already (you might want to do this in chunks).
- For each channel, sum the corresponding samples of all tracks and divide by the number of tracks (substituting
0 for tracks that are shorter than the longest track).
- Pack the result (into WAV etc.) and store it as binary (optionally compressed as MP3 etc.).
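The three steps above can be sketched with Python's standard `wave` and `array` modules. This is a minimal sketch under several assumptions of mine: all inputs are already 16-bit signed PCM WAV files (no decoding step) with the same channel count and sample rate, everything fits in memory (no chunking), and MP3 compression is left out. The helper names `read_pcm16` and `mix_wav_files` are made up for illustration.

```python
import array
import sys
import wave


def read_pcm16(path):
    """Read a 16-bit PCM WAV file and return (params, samples as array('h'))."""
    with wave.open(path, "rb") as wf:
        params = wf.getparams()
        if params.sampwidth != 2:
            raise ValueError(f"{path}: expected 16-bit samples")
        samples = array.array("h", wf.readframes(params.nframes))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return params, samples


def mix_wav_files(in_paths, out_path):
    """Mix 16-bit WAV files by averaging samples; shorter tracks count as 0."""
    tracks = [read_pcm16(p) for p in in_paths]
    first = tracks[0][0]
    for params, _ in tracks:
        if (params.nchannels, params.framerate) != (first.nchannels, first.framerate):
            raise ValueError("all inputs must share channel count and sample rate")
    # Samples are interleaved, so mixing index by index handles every channel at once.
    longest = max(len(samples) for _, samples in tracks)
    mixed = array.array("h", [0] * longest)
    for i in range(longest):
        total = sum(samples[i] for _, samples in tracks if i < len(samples))
        mixed[i] = int(total / len(tracks))  # the average always fits in int16
    if sys.byteorder == "big":
        mixed.byteswap()
    # Pack the mixed samples back into a WAV container.
    with wave.open(out_path, "wb") as wf:
        wf.setnchannels(first.nchannels)
        wf.setsampwidth(2)
        wf.setframerate(first.framerate)
        wf.writeframes(mixed.tobytes())
```

Dividing by the number of tracks keeps the result in range, at the cost of each track becoming quieter in the mix.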
Example: 3 mono tracks (T = track, C = channel, Sn = the value of sample n, or 0 if the track has no sample n):
resultTrack_C1_Sn = ( T1_C1_Sn + T2_C1_Sn + T3_C1_Sn ) / totalTracks
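The formula above translates almost literally into Python. In this sketch each track is assumed to be a plain list of signed sample values; `zip_longest` supplies the 0 for tracks that have already ended, and truncating toward zero is just my choice of rounding.

```python
from itertools import zip_longest


def mix_tracks(tracks):
    """resultTrack_C1_Sn = (T1_C1_Sn + T2_C1_Sn + ...) / totalTracks."""
    total_tracks = len(tracks)
    return [
        int(sum(samples) / total_tracks)  # truncate toward zero, like C integer division
        for samples in zip_longest(*tracks, fillvalue=0)  # missing samples count as 0
    ]


print(mix_tracks([[300, -600, 90], [0, 300], [-300]]))  # [0, -100, 30]
```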
For multi-channel audio (like stereo), repeat this for each channel and interleave the results (left channel first). Naturally you can do this in a single pass.
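To make the per-channel step explicit, here is a sketch for interleaved tracks (L, R, L, R, ... for stereo). It assumes every track is a flat list of signed samples with the same channel count; `mix_interleaved` is a hypothetical helper name.

```python
from itertools import zip_longest


def mix_interleaved(tracks, channels=2):
    """Mix interleaved multi-channel tracks; shorter tracks are padded with 0."""
    for t in tracks:
        if len(t) % channels:
            raise ValueError("track length must be a multiple of the channel count")
    # Split each track per channel: channel c holds samples c, c + channels, ...
    per_channel = [[t[c::channels] for t in tracks] for c in range(channels)]
    # Apply the mono formula to every channel separately.
    mixed_channels = [
        [int(sum(s) / len(tracks)) for s in zip_longest(*chan, fillvalue=0)]
        for chan in per_channel
    ]
    # Re-interleave frame by frame: left sample first, then right, ...
    return [sample for frame in zip(*mixed_channels) for sample in frame]
```

Because every output sample only depends on the samples at the same position, applying the mono formula directly to the interleaved lists gives exactly the same result; that is the single pass.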
Note (for PCM):
Samples of 8 bits or fewer are unsigned (as in WAV files), so silence is 128: negative wave swings are below 128 and positive swings above 128. You need to account for that, e.g. subtract 128 from each sample before summing (so silence becomes 0) and add 128 back to the result.
Samples wider than 8 bits are signed, so you can safely use the basic equation above (since -4 + -5 = -9).
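A sketch of the unsigned 8-bit case, assuming tracks are lists of values 0..255 (`mix_u8` is a hypothetical name). Note that a track that has ended must be padded with 128 (silence), not 0, which in unsigned 8-bit would be the most negative value.

```python
from itertools import zip_longest

SILENCE_U8 = 128  # midpoint of unsigned 8-bit PCM


def mix_u8(tracks):
    """Mix unsigned 8-bit PCM tracks (0..255) by averaging around silence."""
    mixed = []
    for samples in zip_longest(*tracks, fillvalue=SILENCE_U8):
        signed = [s - SILENCE_U8 for s in samples]  # shift to -128..127
        avg = int(sum(signed) / len(tracks))        # basic equation on signed values
        mixed.append(avg + SILENCE_U8)              # shift back to 0..255
    return mixed
```

For example, mixing `[200, 200]` with `[200]` gives `[200, 164]`; averaging the raw unsigned values with 0 padding would instead give 100 for the second sample.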
That should get you started with the basic concept.