[Speech] to Text
The Speech-to-Text operation transcribes audio into text, in whatever language the audio is in.
Input Fields
Module Configuration
This refers to the MAC Whisperer LLM Configuration set up in the Getting Started section.
General Operation Fields
- Audio file path: Contains the full file path to the audio file to be transcribed.
- Finetuning prompt (Optional): Instructions to fine-tune the output.
Additional Properties
- Model Name: Currently only OpenAI is supported with Whisper (default is whisper-1).
- Language: If auto is provided, the language will be auto-detected. Otherwise, specify the language in ISO-639-1 format.
- Temperature: A number between 0 and 2, with a default value of 0.7. It controls the randomness of the output: higher values produce more random output, while values closer to 0 make the output more deterministic.
- Response format: The format of the response (default is json).
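The constraints on Language and Temperature described above can be checked before the values reach the operation. The following is a minimal sketch in Python, not part of the connector: the function name check_properties and its parameters are hypothetical, and the language check only verifies the two-letter shape of an ISO-639-1 code, not that the code is actually assigned.

```python
import re


def check_properties(language, temperature=0.7):
    """Return a list of problems with the given values (empty if none).

    Hypothetical helper: the rules mirror the field descriptions above,
    and the default temperature of 0.7 is the documented default.
    """
    problems = []
    # "auto" requests auto-detection; otherwise expect the two-letter
    # shape of an ISO-639-1 code (this does not check the official list).
    if language != "auto" and not re.fullmatch(r"[a-z]{2}", language):
        problems.append(
            f"language {language!r} is neither 'auto' nor a two-letter code"
        )
    # The documented range for temperature is 0 to 2 inclusive.
    if not 0 <= temperature <= 2:
        problems.append(f"temperature {temperature} is outside the range 0 to 2")
    return problems


print(check_properties("auto"))
print(check_properties("german", 2.5))
```

The first call returns an empty list, and the second reports both the malformed language value and the out-of-range temperature.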
XML Configuration
Below is the XML configuration for this operation:
<whisperer:speech-to-text
  doc:name="Speech to text"
  doc:id="019cdce3-8f86-4eb4-8a16-3c19a8d11026"
  config-ref="OpenAI"
  audioFilePath="#[payload.filePath]"
/>
Output Field
This operation responds with a JSON payload.
Example Output
This output has been converted to JSON.
{
"text": "Hi, the capital of Switzerland is Bern."
}
- text: The transcription of the audio file.
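A downstream consumer of this payload only needs to read the text field. As a rough illustration, assuming the payload arrives as a JSON string shaped like the example output, it could be handled in Python as follows. The function name extract_transcription is hypothetical and not part of the connector.

```python
import json


def extract_transcription(payload: str) -> str:
    """Parse the operation's JSON payload and return the transcribed text."""
    data = json.loads(payload)
    # "text" is the only field shown in the example output above.
    return data["text"]


example = '{"text": "Hi, the capital of Switzerland is Bern."}'
print(extract_transcription(example))
```

A payload without a text field raises a KeyError here, so a caller that cannot guarantee the shape may prefer data.get("text") with an explicit fallback.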