[Speech] to Text Local
The Speech-to-Text-Local operation converts audio files to text directly on your machine, using local CPU resources. It provides a secure, cloud-independent solution for on-premise applications, ensuring full data control.
Input Fields
Module Configuration
This refers to the MAC Whisperer LLM Configuration set up in the Getting Started section.
General Operation Fields
- Audio file path: The full path to the audio file to be transcribed. This should be specified in multipart/form-data mode.
Additional Properties
- Model Path: The path to the Whisper model. When deploying the Mule application, place the model file (.bin) in the app.home directory. Use the expression mule.home ++ "/apps/" ++ app.name ++ "/model.bin" to specify its path. Available models can be accessed via Whisper.cpp on Hugging Face.
- NThreads: Number of CPU threads allocated to the operation. The default value is 4.
- Language: Specifies the language of the input audio. By default, the language is automatically detected.
- Translate: A boolean that indicates whether the transcription should be translated rather than kept in the source language (Whisper's translation task targets English).
- Print Progress: Displays the progress of the STT operation in the logs.
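The Model Path expression above is a plain string concatenation, so it can be useful to check before deployment that the model file sits where the expression will point. The following is a minimal sketch in Python, not part of the connector: the function names and the example values for mule.home and app.name are hypothetical and only illustrate how the path is assembled.

```python
from pathlib import Path


def resolve_model_path(mule_home: str, app_name: str,
                       model_file: str = "model.bin") -> str:
    """Mirror the expression mule.home ++ "/apps/" ++ app.name ++ "/model.bin".

    mule_home and app_name stand in for the Mule runtime variables;
    the values passed in are assumptions supplied by the caller.
    """
    return mule_home + "/apps/" + app_name + "/" + model_file


def model_is_deployed(mule_home: str, app_name: str) -> bool:
    """Return True if a model file exists at the resolved location."""
    return Path(resolve_model_path(mule_home, app_name)).is_file()


if __name__ == "__main__":
    # Hypothetical installation directory and application name.
    print(resolve_model_path("/opt/mule", "stt-demo"))
    # prints /opt/mule/apps/stt-demo/model.bin
```

Running a check like `model_is_deployed` on the target host is one way to catch a missing or misnamed .bin file before the operation is first invoked.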
XML Configuration
Below is the XML configuration for this operation:
<whisperer:speech-to-text-local
  doc:name="Speech to text local"
  doc:id="1806fc15-3530-44ec-b686-343e5d8eeb5c"
  config-ref="OpenAI"
  audioFile="#[payload.audio]"
  nThreads="#[payload.nthreads]"
  language="#[payload.language]"
  modelPath='#[mule.home ++ "/apps/" ++ app.name ++ "/model.bin"]'
/>
Output Field
This operation responds with a JSON payload.
Example Output
This output has been converted to JSON.
{
"transcription": "Hi, the capital of France is Paris."
}
- transcription: The transcription of the audio file.
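A client that receives this payload only needs to read the transcription field. The sketch below, in Python, shows one way to do that defensively. It assumes the response body arrives as a JSON string shaped like the example output; the function name is hypothetical and not part of the connector.

```python
import json


def extract_transcription(response_body: str) -> str:
    """Return the "transcription" field from the operation's JSON output.

    Raises ValueError if the body is not a JSON object or the field
    is missing, so callers do not silently continue with empty text.
    """
    data = json.loads(response_body)
    if not isinstance(data, dict) or "transcription" not in data:
        raise ValueError("response has no 'transcription' field")
    return data["transcription"]


if __name__ == "__main__":
    body = '{"transcription": "Hi, the capital of France is Paris."}'
    print(extract_transcription(body))
    # prints Hi, the capital of France is Paris.
```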