haimg's command is not completely right. Classically, you pan by reducing the volume of only one channel. That means: if you want your signal to be 80% left, the left channel keeps the original volume while the right channel gets only 20% of the original volume. At least that's what sox's panning code did (and what Alan Blumlein seems to have proposed when inventing stereo).
Also, his command can be shortened using the remix option.
Therefore, the corrected and shortened command is:
sox -M left.wav right.wav stereo.ogg remix 1,2v0.2 1v0.2,2
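To make the arithmetic behind that remix spec concrete, here is a minimal Python sketch of the classical panning described above. It only illustrates the math and is not sox's code; the function name pan_mix, the bleed parameter and the sample values are made up for this example.

```python
def pan_mix(left_src, right_src, bleed=0.2):
    """Mix two mono sources into stereo the way "1,2v0.2 1v0.2,2" reads:
    each source keeps full volume on its own side and is attenuated
    to `bleed` on the opposite side. (Hypothetical helper, not a sox API.)"""
    left_out = [l + bleed * r for l, r in zip(left_src, right_src)]
    right_out = [bleed * l + r for l, r in zip(left_src, right_src)]
    return left_out, right_out


# Made-up sample values, just to show the numbers.
left_out, right_out = pan_mix([0.5, -0.5], [0.25, 0.25])
print("left :", left_out)
print("right:", right_out)
```

Note that nothing is scaled down here, so loud inputs can add up to more than full scale, which is the clipping question that came up in the comments.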
Edit in answer to haimg's comment:
sox will warn you if clipping occurs. But yes, it is possible. With the remix option, every channel's volume is scaled by the factor 1/n, where n is the number of input channels mixed into that output channel. But that only happens if NO VOLUME OPTION is specified for the output channel (so your 100% + 20% is correct).
sox also has an option to scale any channel without explicit volume information: just add "-a" after "remix" (like "remix -a 1,2v0.2 1v0.2,2") and the volumes will be 50% + 20% = 70%. It's pretty confusing, and by now I'm not sure whether you also have to scale the panned channel's volume by 1/n, which would result in "remix -a 1,2v0.1 1v0.1,2", or 50% + 10% = 60%. I will have to investigate further in this direction. Meanwhile, you could read the remix section in the sox man page (also available at the sox homepage).
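The three cases discussed here (100% + 20%, 50% + 20%, 50% + 10%) can be written out as a small Python sketch. It follows the reasoning in this answer, not sox's source code; the function name and its parameters are hypothetical.

```python
def out_spec_gains(n_inputs, pan_gain, automatic=False, scale_pan=False):
    """Return (own_gain, panned_gain) for one output channel.

    Assumptions taken from the text above, not from sox's code:
    - without -a, the channel with no volume option stays at 1.0,
      because another channel in the same spec has a volume option;
    - with -a (automatic=True), that channel is scaled by 1/n;
    - scale_pan=True means you divided the explicit pan volume by n by hand.
    """
    own = 1.0 / n_inputs if automatic else 1.0
    panned = pan_gain / n_inputs if scale_pan else pan_gain
    return own, panned


cases = [
    ("remix 1,2v0.2", dict()),
    ("remix -a 1,2v0.2", dict(automatic=True)),
    ("remix -a 1,2v0.1", dict(automatic=True, scale_pan=True)),
]
for label, kwargs in cases:
    own, panned = out_spec_gains(2, 0.2, **kwargs)
    print(f"{label}: {own:.0%} + {panned:.0%} = {own + panned:.0%}")
```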
Edit after further reflection:
After thinking about it, I am pretty sure that you have to scale the panned volumes by 1/n, too.
About the clipping issue: by dividing ALL the volumes by the number of channels, this problem cannot occur. But that does not preserve the original power of the signal, because power goes with the square of the amplitude, not linearly with it, so with 1/n scaling the mix gets quieter the more channels you mix. That's why sox also has an option for this, where the volumes get scaled by 1/sqrt(n). To use it, take a "p" instead of a "v" in the remix part and adjust the values accordingly, and also add a "-p" option after the remix statement. You can see the difference between scaling by 1/n and by 1/sqrt(n) here.
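A rough way to see why 1/sqrt(n) keeps the power while 1/n does not is sketched below. It assumes the mixed signals are uncorrelated and each has power 1, so their powers simply add; correlated signals behave differently. The function name mixed_power is made up for this illustration.

```python
import math


def mixed_power(n, gain):
    """Power of a sum of n signals, each scaled by `gain`.

    Assumption: the signals are uncorrelated and each has power 1,
    so the total is n times (gain squared)."""
    return n * gain ** 2


for n in (2, 4, 8):
    p_linear = mixed_power(n, 1 / n)            # 1/n scaling
    p_sqrt = mixed_power(n, 1 / math.sqrt(n))   # 1/sqrt(n) scaling
    print(f"n={n}: 1/n -> {p_linear:.3f}, 1/sqrt(n) -> {p_sqrt:.3f}")
```

Under that assumption the 1/n column shrinks as n grows, while the 1/sqrt(n) column stays at 1.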
The following is how I think the correct power values are computed: for each channel you have to evaluate 20*log_10(factor). A factor of 2 results in about 6 dB, and a factor of 0.5 results in about -6 dB. That's exactly what the sox manual says, so I guess this is right.
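For converting between the "v" factors and the "p" dB values, a short Python sketch of the 20*log_10(factor) formula may help. The two function names are only illustrative.

```python
import math


def factor_to_db(factor):
    """Voltage multiplier (as used with "v") to dB (as used with "p")."""
    return 20 * math.log10(factor)


def db_to_factor(db):
    """Inverse conversion: dB back to a voltage multiplier."""
    return 10 ** (db / 20)


print(round(factor_to_db(2), 2))    # 6.02
print(round(factor_to_db(0.5), 2))  # -6.02
print(round(db_to_factor(-6), 3))   # 0.501
```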
So, finally, the command in your case should be:
sox -M left.wav right.wav stereo.ogg remix -p -a 1,2p-6 1p-6,2
I don't have sox on this machine, so I can't test this command for correct syntax; please tell me if there is a problem. I will test all this theory as soon as I get the chance, because I will face a similar issue, only with many more channels than just 2 to mix, and that is why I came up with the signal power stuff.