My actual goal was to use Google's Speech-to-Text API to transcribe lectures that I had recorded with my MBP. It turned out that the quality of those audio files was not good enough for the API: the transcriptions were garbage, even though a human listener can understand most of the audio easily. My best guess is that one needs audio recorded with a dedicated microphone to get good results.
Transcribing an English audio message that a friend sent me over Telegram gave OK-ish results. The message was somewhat technical (about computer processors, RAM, etc.), and the API did not recognize some of the technical terms. However, since that friend is not a native English speaker, I'm not sure whether the API is weak with technical vocabulary or with non-native speakers.
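One option that may help with the technical vocabulary is to pass phrase hints to the API. The following is only a minimal sketch of how a request body for the REST `speech:recognize` method could be built with such hints via the `speechContexts` field. The helper name `build_recognize_request`, the example phrases, and the defaults (`OGG_OPUS` at 48000 Hz, which is an assumption about how Telegram encodes voice messages) are illustrative and not taken from the experiment above. Whether hints actually improve results for this kind of audio is untested here.

```python
import base64
import json


def build_recognize_request(audio_bytes, language_code="en-US",
                            encoding="OGG_OPUS", sample_rate_hertz=48000,
                            phrases=None):
    """Build a JSON-serializable body for a speech:recognize request.

    Hypothetical helper: the defaults are assumptions and must match
    the real encoding and sample rate of the audio file.
    """
    config = {
        "encoding": encoding,
        "sampleRateHertz": sample_rate_hertz,
        "languageCode": language_code,
    }
    if phrases:
        # Phrase hints nudge the recognizer toward expected terms.
        config["speechContexts"] = [{"phrases": list(phrases)}]
    return {
        "config": config,
        # The REST API expects the raw audio as a base64 string.
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }


if __name__ == "__main__":
    # Placeholder bytes; in practice, read them from the audio file.
    body = build_recognize_request(b"\x00\x01", phrases=["RAM", "CPU cache"])
    print(json.dumps(body, indent=2))
```

The resulting dictionary would then be sent as the JSON body of an authenticated POST request; authentication and the HTTP call itself are left out of this sketch.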