[Speech] to Text

The Speech-to-Text operation transcribes an audio file into text in the language spoken in the recording.

Input Fields

Module Configuration

This refers to the MAC Whisperer LLM Configuration set up in the Getting Started section.

General Operation Fields

Audio file path: The full path to the audio file to transcribe.
Finetuning prompt (Optional): Instructions that guide the transcription output, for example its style.

Additional Properties

Model Name: Currently, only OpenAI's Whisper model is supported (default: whisper-1).
Language: Set to auto to detect the language automatically. Otherwise, specify the language as an ISO-639-1 code (for example, en or de).
Temperature: A number between 0 and 2 that controls the randomness of the output (default: 0.7). Higher values produce more random output; lower values, towards 0, make the output more deterministic.
Response format: The format of the response (default: json).

XML Configuration

Below is the XML configuration for this operation:

<whisperer:speech-to-text
  doc:name="Speech to text"
  doc:id="019cdce3-8f86-4eb4-8a16-3c19a8d11026"
  config-ref="OpenAI"
  audioFilePath="#[payload.filePath]"
/>

Output Field

This operation returns a JSON payload.

Example Output

This output has been converted to JSON.

{
    "text": "Hi, the capital of Switzerland is Bern."
}

text: The transcription of the audio file.
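The following is a minimal sketch of how the operation might sit inside a Mule 4 flow. It assumes the fragment is placed inside a `<mule>` root element that already declares the `doc` and `whisperer` namespaces, and that a configuration named `OpenAI` exists. The flow name `transcribe-audio-flow` and the path `/tmp/meeting.mp3` are hypothetical placeholders. The flow builds a payload with a `filePath` field so that the `#[payload.filePath]` expression resolves, calls the operation, and then logs the `text` field of the JSON result.

```xml
<!-- Hypothetical flow: the flow name and file path are placeholders -->
<flow name="transcribe-audio-flow">
  <!-- Build the payload that the operation reads through #[payload.filePath] -->
  <set-payload value='#[{ filePath: "/tmp/meeting.mp3" }]' />
  <whisperer:speech-to-text
    doc:name="Speech to text"
    config-ref="OpenAI"
    audioFilePath="#[payload.filePath]" />
  <!-- The result is a JSON object; log the transcription from its text field -->
  <logger level="INFO" message="#[payload.text]" />
</flow>
```

In a real application, the file path would more likely come from an earlier step, such as a file listener or an HTTP request, than from a hard-coded value.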
