I'm working on a Discord bot and would like it to listen in a Discord voice channel and process voice commands.
I'm using an open-source speech-to-text Java library called Sphinx (https://cmusphinx.github.io/). I'm receiving audio data from the Discord server via the JDA library (https://github.com/DV8FromTheWorld/JDA).
This interface (https://github.com/DV8FromTheWorld/JDA/blob/master/src/main/java/net/dv8tion/jda/core/audio/AudioReceiveHandler.java#L65) is used for receiving audio.
The method handleCombinedAudio(CombinedAudio audio) is called every 20 ms, and a byte[] of the audio data can be retrieved with audio.getAudioData(1.0), where the argument is a volume multiplier.
The voice recognition software requires an InputStream over a byte array to recognize the data properly. I have a method that concatenates the byte arrays into 3-second chunks of sound, each of which is processed by the voice recognition software. The problem I've run into is a mismatch of sound formats.
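For reference, the 3-second chunking mentioned above amounts to something like the sketch below. This is an illustration only, not my exact code: the class name ChunkBuffer and its add method are hypothetical, and the byte counts assume the 48 kHz, 16-bit, stereo format that Discord delivers (192000 bytes per second, so 3840 bytes per 20 ms frame).

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper: collects 20 ms PCM frames until 3 seconds are buffered.
public class ChunkBuffer {
    // 48000 samples/s * 2 channels * 2 bytes per sample = 192000 bytes per second
    public static final int BYTES_PER_SECOND = 48000 * 2 * 2;
    public static final int CHUNK_BYTES = BYTES_PER_SECOND * 3;

    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream(CHUNK_BYTES);

    /**
     * Appends one frame. Returns the buffered audio once at least
     * CHUNK_BYTES have accumulated, otherwise returns null.
     */
    public synchronized byte[] add(byte[] frame) {
        buffer.write(frame, 0, frame.length);
        if (buffer.size() < CHUNK_BYTES) {
            return null;
        }
        byte[] chunk = buffer.toByteArray();
        buffer.reset(); // start collecting the next chunk
        return chunk;
    }
}
```

With 3840-byte frames, a chunk is returned on every 150th call to add.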
Sphinx requires: RIFF (little-endian) WAVE audio, Microsoft PCM, 16-bit, mono, 16000 Hz.
Discord returns audio as: 48 kHz, 16-bit, stereo, signed, big-endian PCM.
How do I convert the received byte[] array from Discord into the proper format for Sphinx?
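As far as I can tell, the conversion comes down to three steps: swapping the byte order, mixing stereo down to mono, and resampling from 48 kHz to 16 kHz (a factor of 3). Below is a minimal hand-rolled sketch of that idea. The class name DiscordToSphinx and its methods are hypothetical, and averaging three consecutive frames is only a crude low-pass filter, so a proper resampler would likely sound better. The output is headerless PCM; whether Sphinx also needs an actual RIFF header on the stream is something I have not confirmed.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

// Hypothetical converter: 48 kHz 16-bit stereo big-endian PCM
// to 16 kHz 16-bit mono little-endian PCM (no RIFF header).
public class DiscordToSphinx {
    private static final int BYTES_PER_FRAME = 4; // 2 channels * 2 bytes
    private static final int RATIO = 3;           // 48000 / 16000

    public static byte[] convert(byte[] in) {
        int frames = in.length / BYTES_PER_FRAME;
        int outSamples = frames / RATIO; // trailing partial group is dropped
        byte[] out = new byte[outSamples * 2];
        for (int i = 0; i < outSamples; i++) {
            int sum = 0;
            for (int k = 0; k < RATIO; k++) {
                int p = (i * RATIO + k) * BYTES_PER_FRAME;
                // Big-endian: high byte first. The cast to short restores the sign.
                short left = (short) ((in[p] << 8) | (in[p + 1] & 0xFF));
                short right = (short) ((in[p + 2] << 8) | (in[p + 3] & 0xFF));
                sum += left + right;
            }
            // Average over both channels and the three source frames.
            int mono = sum / (RATIO * 2);
            // Little-endian: low byte first.
            out[2 * i] = (byte) (mono & 0xFF);
            out[2 * i + 1] = (byte) ((mono >> 8) & 0xFF);
        }
        return out;
    }

    /** Wraps the converted audio in an InputStream for the recognizer. */
    public static InputStream asStream(byte[] discordPcm) {
        return new ByteArrayInputStream(convert(discordPcm));
    }
}
```

A 3-second chunk of 576000 bytes from Discord would come out as 96000 bytes (48000 mono samples) after this conversion.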
Any ideas would be greatly appreciated. Please be specific in answers.
