Does Azure's batch transcription support speaker diarization for more than 2 speakers?
I checked their Rest API documentation and didn't find anything relevant.
Are there other ways to do this using Azure cognitive services?
I believe that diarization is limited to two speakers. From the Microsoft documentation on speech-to-text batch transcription:
diarizationEnabled - Optional, false by default. Specifies that diarization analysis should be carried out on the input, which is expected to be a mono channel that contains two voices. Requires wordLevelTimestampsEnabled to be set to true. [emphasis added]
Source: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/batch-transcription
Cognitive Services now supports Speaker Recognition, which verifies the voice print of enrolled speakers and may work for conversations with more than two parties, but it only applies to speakers who have enrolled profiles.
To update Frank's answer: more than 2 speakers now appears to be supported without Speaker Recognition, as of API version 3.1, through the diarization property:
diarization - [..] You need to use this property when you expect three or more speakers. For two speakers setting diarizationEnabled property to true is enough. [..] The maximum number of speakers for diarization must be less than 36 and more or equal to the minSpeakers property.
As noted above, the older diarizationEnabled method is still available, but only for a maximum of 2 speakers.
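As a rough illustration of the quoted docs, here is a sketch of the JSON body you would POST to the v3.1 transcriptions endpoint to request diarization for up to 5 speakers. The audio URL, speaker counts, and display name are placeholders, and the exact property shape (`speakers` with `minCount`/`maxCount`) should be checked against the current v3.1 reference before use:

```python
import json

# Hypothetical request body for:
#   POST https://<region>.api.cognitive.microsoft.com/speechtotext/v3.1/transcriptions
# (endpoint, URL, and counts are placeholders, not values from the question)
payload = {
    "displayName": "multi-speaker-diarization",
    "locale": "en-US",
    "contentUrls": ["https://example.com/audio.wav"],  # placeholder audio file
    "properties": {
        # Required by diarization per the older docs quoted above
        "wordLevelTimestampsEnabled": True,
        # Needed for 3+ speakers in v3.1; maxCount must be < 36
        # and >= minCount, per the quoted documentation
        "diarization": {
            "speakers": {
                "minCount": 1,
                "maxCount": 5,
            }
        },
    },
}

print(json.dumps(payload, indent=2))
```

For exactly two speakers, the quoted docs say setting `diarizationEnabled` to `true` (with word-level timestamps) remains sufficient, so the nested `diarization` block is only needed when you expect three or more.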