Voice Q&A (Local)
A fully offline voice Q&A pipeline: microphone → local Whisper (speech-to-text) → local Mistral 7B (text generation) → log and file output. No API keys are required after the initial model download from HuggingFace.
Running
melodium run 07_voice_qa_local/Compo.toml --output qa.txt
[…] info: pipeline: both models ready — listening…
[…] info: answer: Mélodium is a dataflow programming language designed for …
Requires approximately 14 GB of RAM for Mistral 7B.
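As a rough sanity check on the 14 GB figure, the weight footprint can be estimated from the parameter count. This is only a back-of-the-envelope sketch: the parameter count (about 7.24 billion) and the 16-bit weight format are assumptions, not values taken from this example, and activations and the KV cache add to the total.

```python
# Back-of-the-envelope estimate of the memory needed to hold the model weights.
# Both constants are assumptions made for illustration.
PARAMS = 7.24e9        # approximate parameter count of Mistral 7B
BYTES_PER_PARAM = 2    # 16-bit (half-precision) weights


def weight_memory_gb(params: float, bytes_per_param: int) -> float:
    """Return the size of the weights in decimal gigabytes."""
    return params * bytes_per_param / 1e9


print(round(weight_memory_gb(PARAMS, BYTES_PER_PARAM), 1))  # 14.5
```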
How it works
Four models are declared: two HfHub pointers (one per model repository) and two inference models:
model WhisperHub() : HfHub { repo_id = "openai/whisper-tiny" }
model MistralHub() : HfHub { repo_id = "mistralai/Mistral-7B-v0.1" }
model Asr() : Whisper {}
model Llm() : Mistral { temperature = 0.7, top_p = 0.9, max_new_tokens = 256 }
Parallel model loading
Both models are fetched concurrently from the moment startup fires:
Audio capture starts as soon as the ASR model is loaded (loadAsr.loaded). The LLM keeps loading in parallel; if it is not ready by the time the first transcription arrives, the dataflow blocks until it is.
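The ordering described here can be illustrated with a small Python analogy. This is not Mélodium code and not how the runtime is implemented: the loaders are simulated with sleeps, and the names load_model and run_pipeline are hypothetical. It only shows the shape of the behaviour: both loads start together, capture waits on the ASR model alone, and generation waits on the LLM.

```python
import threading
import time


def load_model(name: str, seconds: float, ready: threading.Event, log: list) -> None:
    """Simulate a slow model load, then signal readiness."""
    time.sleep(seconds)
    log.append(f"{name} loaded")
    ready.set()


def run_pipeline() -> list:
    log = []
    asr_ready = threading.Event()
    llm_ready = threading.Event()

    # Both loads start at the same moment, like the two fetches fired at startup.
    loaders = [
        threading.Thread(target=load_model, args=("asr", 0.05, asr_ready, log)),
        threading.Thread(target=load_model, args=("llm", 0.20, llm_ready, log)),
    ]
    for thread in loaders:
        thread.start()

    # Capture depends only on the ASR model.
    asr_ready.wait()
    log.append("capture started")
    transcription = "what is melodium"  # placeholder for a transcribed segment

    # Generation blocks here until the LLM has finished loading.
    llm_ready.wait()
    log.append(f"generate: {transcription}")

    for thread in loaders:
        thread.join()
    return log


if __name__ == "__main__":
    for line in run_pipeline():
        print(line)
```

In the dataflow version no explicit wait is written; the blocking falls out of the connections between treatments.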
Prompt formatting
Each transcribed segment is formatted into the Mistral [INST] prompt template before being sent to the model:
Self.question -> wrapEntry.value,map -> fmt.entries,formatted -> generate.prompt,generated -> Self.answer
entry(key="q") wraps the string into a StringMap, and format(format="[INST] {q} [/INST]") interpolates it. This avoids string concatenation and keeps the template readable.
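The same two steps can be sketched in Python to show what the prompt looks like after interpolation. The functions below are hypothetical stand-ins for the entry and format treatments, with a plain dict playing the role of the StringMap; they are not the Mélodium implementation.

```python
def entry(key: str, value: str) -> dict:
    """Wrap a single string in a one-element map (stand-in for a StringMap)."""
    return {key: value}


def format_prompt(template: str, entries: dict) -> str:
    """Substitute each {key} placeholder in the template with its map value."""
    return template.format(**entries)


prompt = format_prompt("[INST] {q} [/INST]", entry("q", "What is Mélodium?"))
print(prompt)  # [INST] What is Mélodium? [/INST]
```

Because the transcribed text is passed as a value and never parsed as part of the template, braces or brackets spoken in a question do not disturb the template in this sketch.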
Dependencies
[dependencies]
std = "0.10.1" # core flows, logging, data structures
fs = "0.10.1" # local file I/O
audio = "0.10.1" # audio decode / encode / resample
record = "0.10.1" # microphone capture
ml = "0.10.1" # LLM, STT, TTS and local model inference