It seems that, for now, Windows has no built-in program that can do this, although one can expect one in the future, especially since the Windows assistant Cortana already exists and a Speech-To-Text app is already available on a smaller scale.
So, for now, "other solutions" are needed:
You need to search for an ASR (=STT) model, that is, an "Automatic Speech Recognition" (=Speech-To-Text) model.
A nice theoretical overview of ASR is at https://maelfabien.github.io/machinelearning/speech_reco/#.
As this question is about the practical side:
- You will either need to buy a Speech-To-Text program. I once bought Dragon NaturallySpeaking from the market leader Nuance, which was sold bundled with a Philips VoiceTracer. This is not meant as advertising; it is just how I got my first Speech-To-Text program. I have never tested it, although doing so is still on my list :).
- Or you need to find a pretrained model or train a model yourself.
I will just describe how I searched, which is the main answer, rather than give the exact links. Stack Exchange is not meant for just dropping product names or links; that is usually considered off-topic. I have not tested anything, and I am not a professional user.
Searching for ASR models, I found three pretrained models at Hugging Face, an AI community that seems to offer the most relevant selection of models, which is good if you want to find only a few but relevant results at first: https://huggingface.co/models?pipeline_tag=automatic-speech-recognition.
Looking at them in detail, I found that they are based on models that are publicly available on GitHub:
So everything starts and ends on GitHub, which should come as no surprise. On GitHub, you would search for ASR, STT, Automatic Speech Recognition, Speech-To-Text, and perhaps just "speech", sorting the results by stars. Doing so, I found "Mozilla DeepSpeech" to be the most promising project: https://github.com/mozilla/DeepSpeech#project-deepspeech.
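The GitHub search described above can also be scripted. This is a minimal sketch using only the Python standard library, assuming GitHub's public REST endpoint for repository search (`https://api.github.com/search/repositories` with `sort=stars` and `order=desc`); the function names are made up for this example. Like the manual search, it runs one query per keyword, then merges the hits and ranks them by stars.

```python
import json
import urllib.parse
import urllib.request

# GitHub's REST endpoint for repository search.
SEARCH_URL = "https://api.github.com/search/repositories"


def search_url(keyword, per_page=5):
    """Build a search URL for one keyword, most-starred repositories first."""
    # Quote anything that is not a single plain word (e.g. "Speech-To-Text")
    # so that GitHub matches it as a phrase.
    q = keyword if keyword.isalnum() else f'"{keyword}"'
    params = {"q": q, "sort": "stars", "order": "desc", "per_page": per_page}
    return SEARCH_URL + "?" + urllib.parse.urlencode(params)


def top_repos(payload, n=5):
    """Return (full_name, stars, html_url) tuples from a search response."""
    items = payload.get("items", [])
    ranked = sorted(items, key=lambda r: r["stargazers_count"], reverse=True)
    return [(r["full_name"], r["stargazers_count"], r["html_url"]) for r in ranked[:n]]


def fetch_json(url):
    """GET a URL and decode the JSON body (unauthenticated, so rate-limited)."""
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def search_all(keywords, n=5):
    """One search per keyword; merge, deduplicate by name, rank by stars."""
    merged = {}
    for kw in keywords:
        for name, stars, url in top_repos(fetch_json(search_url(kw)), n):
            merged[name] = (name, stars, url)
    return sorted(merged.values(), key=lambda t: t[1], reverse=True)[:n]
```

Calling `search_all(["ASR", "STT", "Automatic Speech Recognition", "Speech-To-Text", "speech"])` needs network access, and the ranking changes over time, so it will not necessarily reproduce what I found.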
For Chrome, there is SpeechTexter, which supports all the various dialects of Spanish.
You should try the free version of Google Speech-to-Text.
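If you go the Google route, the Cloud Speech-to-Text v1 REST method `speech:recognize` takes a JSON body with a recognition config and base64-encoded audio. The following is a minimal sketch, not Google's official client library: it only builds that body and extracts the transcripts from a response, and it leaves out authentication and sending. The function names are made up for illustration; the field names follow Google's v1 REST reference as far as I know it, and `es-ES` or `es-MX` are examples of how a Spanish variant would be selected.

```python
import base64

# Synchronous v1 REST method, meant for short audio clips.
RECOGNIZE_URL = "https://speech.googleapis.com/v1/speech:recognize"


def build_recognize_body(pcm_bytes, sample_rate_hz=16000, language_code="es-ES"):
    """Return the JSON-serializable body for a speech:recognize request.

    pcm_bytes: 16-bit linear PCM audio (encoding "LINEAR16").
    language_code: a BCP-47 tag such as "es-ES" or "es-MX".
    """
    return {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": sample_rate_hz,
            "languageCode": language_code,
        },
        "audio": {
            # Inline audio has to be sent as base64 text.
            "content": base64.b64encode(pcm_bytes).decode("ascii"),
        },
    }


def transcripts(response):
    """Take the top alternative of each result in a recognize response."""
    return [r["alternatives"][0]["transcript"]
            for r in response.get("results", []) if r.get("alternatives")]
```

To actually send it, you would POST `json.dumps(body)` to `RECOGNIZE_URL` with credentials from your Google Cloud project; longer recordings need Google's asynchronous `speech:longrunningrecognize` method instead.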
Also, if you search with the right keywords and add your language, you will find models pretrained in the language you need, for example:
If you keep searching like this, you will find more projects. You will usually not need any programming skills, since the demos are mostly a copy-and-paste job. The only thing you need is to have the right programming framework installed.
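To see which of the usual frameworks are already installed before trying a demo, a quick check in Python can help. This is only a sketch: the selection of frameworks and their import names (`torch`, `tensorflow`, `transformers`, `deepspeech`) is my own assumption of what ASR demos commonly need, so adapt the list to the demo's README.

```python
import importlib.util

# Assumed selection: display name -> Python import name.
FRAMEWORKS = {
    "PyTorch": "torch",
    "TensorFlow": "tensorflow",
    "Hugging Face Transformers": "transformers",
    "DeepSpeech": "deepspeech",
}


def available_frameworks(frameworks=FRAMEWORKS):
    """Map each framework name to True if its module can be found."""
    # find_spec only locates a top-level module; it does not import it.
    return {name: importlib.util.find_spec(module) is not None
            for name, module in frameworks.items()}
```

Printing `available_frameworks()` shows at a glance what is still missing; anything reported as `False` has to be installed first (usually with `pip`).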
Note that some models or programs require a specific input sample rate, for example 16 kHz, so you will sometimes need to convert your audio files or your audio input.
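For WAV files, such a conversion can be done with the Python standard library alone. This is a minimal sketch under the assumption that the input is 16-bit PCM WAV; the function name `to_mono_16k` is made up. It mixes all channels down to mono and resamples by plain linear interpolation, which has no anti-aliasing filter, so dedicated tools will give better quality.

```python
import array
import io
import sys
import wave


def to_mono_16k(wav_bytes, target_rate=16000):
    """Return a mono 16-bit WAV (as bytes) resampled to target_rate Hz.

    wav_bytes must be the content of a 16-bit PCM WAV file. Resampling is
    linear interpolation without an anti-aliasing filter (a quick fix only).
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        channels, width, rate = w.getnchannels(), w.getsampwidth(), w.getframerate()
        raw = w.readframes(w.getnframes())
    if width != 2:
        raise ValueError(f"expected 16-bit samples, got {8 * width}-bit")

    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    # Down-mix interleaved channels to mono by averaging each frame.
    mono = [sum(samples[i:i + channels]) / channels
            for i in range(0, len(samples), channels)]

    # Linear interpolation onto the new time grid.
    n_out = int(len(mono) * target_rate / rate)
    step = rate / target_rate
    out = array.array("h")
    for j in range(n_out):
        pos = j * step
        i = int(pos)
        frac = pos - i
        nxt = mono[i + 1] if i + 1 < len(mono) else mono[i]
        value = mono[i] + (nxt - mono[i]) * frac
        out.append(max(-32768, min(32767, round(value))))  # clip to int16

    if sys.byteorder == "big":
        out.byteswap()
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(target_rate)
        w.writeframes(out.tobytes())
    return buf.getvalue()
```

Usage would be along the lines of reading `in.wav` as bytes, passing them to `to_mono_16k`, and writing the result to `out.wav`. For other formats such as MP3, a converter like FFmpeg is the easier way, e.g. `ffmpeg -i in.mp3 -ar 16000 -ac 1 out.wav`.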