[Speech] to Text Local

The Speech-to-Text-Local operation converts audio files to text directly on your machine, using local CPU resources. It provides a secure, cloud-independent solution for on-premise applications, ensuring full data control.

Input Fields

Module Configuration

This refers to the MAC Whisperer LLM Configuration set up in the Getting Started section.

General Operation Fields

Audio file path: Contains the full file path to the audio file to be transcribed. This should be specified in multipart/form-data mode.

Additional Properties

Model Path: The path to the Whisper model. When deploying the Mule application, place the model file (.bin) in the app.home directory and reference it with the expression mule.home ++ "/apps/" ++ app.name ++ "/model.bin". Available models can be downloaded from Whisper.cpp on Hugging Face.
NThreads: Number of CPU threads allocated to the operation. The default value is 4.
Language: Specifies the language of the input audio. By default, the language is automatically detected.
Translate: A boolean that indicates whether the transcription should be translated (Whisper's translation task targets English).
Print Progress: Displays the progress of the STT operation in the logs.

XML Configuration

Below is the XML configuration for this operation:

<whisperer:speech-to-text-local
    doc:name="Speech to text local"
    doc:id="1806fc15-3530-44ec-b686-343e5d8eeb5c"
    config-ref="OpenAI"
    audioFile="#[payload.audio]"
    nThreads="#[payload.nthreads]"
    language="#[payload.language]"
    modelPath='#[mule.home ++ "/apps/" ++ app.name ++ "/model.bin"]'
/>
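
The expressions in this configuration read audio, nthreads, and language from the incoming payload. As a minimal sketch, assuming a Set Payload component runs just before the operation, the payload could be shaped as shown below. The file path and the language code "en" are placeholder values chosen for illustration, not values taken from the connector; nthreads is set to the documented default of 4.

```xml
<!-- Illustrative input only: the path and language code are assumptions, adjust them to your setup -->
<set-payload
    doc:name="Build STT input"
    value='#[{ audio: "/tmp/sample.wav", nthreads: 4, language: "en" }]' />
```

In a real flow these values would more likely come from an upstream request than from a hard-coded payload.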

Output Field

This operation responds with a JSON payload.

Example Output

This output has been converted to JSON.

{
    "transcription": "Hi, the capital of France is Paris."
}

transcription: The transcription of the audio file.
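
To use the result later in the flow, the transcription field can be read from the payload with a DataWeave expression. The following is a sketch that assumes the operation's JSON output is still the current payload; the variable name transcript is illustrative.

```xml
<!-- Store the transcribed text in a flow variable (the name "transcript" is an arbitrary choice) -->
<set-variable
    doc:name="Store transcription"
    variableName="transcript"
    value="#[payload.transcription]" />

<!-- Write the stored transcription to the application log -->
<logger
    doc:name="Log transcription"
    level="INFO"
    message="#[vars.transcript]" />
```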
