LLM Chat Server
An HTTP server that accepts plain-text prompts via POST /chat and streams the LLM response token by token back to the caller. The LLM connection is declared as a model and shared across all concurrent requests.
Running
melodium run 01_text_llm_chat/Compo.toml --api_key sk-... --model claude-sonnet-4-6

$ curl -X POST http://127.0.0.1:8080/chat -d "What is Mélodium?"
Mélodium is a dataflow programming language…

How it works
Two models are instantiated at startup:
model server: HttpServer(host=|from_ipv4(|localhost_ipv4()), port=port)
model llm: ChatLlm(api_key=api_key, model=model)

ChatLlm wraps RemoteLlm with a fixed system prompt and token limit. The backend field selects the provider ("anthropic", "openai", etc.); switching providers only requires changing backend and model in the model definition. The rest of the pipeline is unaffected.
The chat sub-treatment
Each incoming connection is handled by the chat sub-treatment, which is a straight three-step pipeline:
Self.data -> decode.data,text -> llmStream.prompt,token -> encode.text,data -> Self.data

- decode converts raw request bytes to UTF-8 text
- llmStream sends the text as a prompt and emits tokens as a Stream<string> as they arrive
- encode converts each token back to bytes and forwards them directly into connection.data
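The three steps can be pictured outside Mélodium as a chain of lazy generators. The following is only a Python analogy, not Mélodium code: the function names mirror the treatment names for readability, and fake_llm_stream is a hypothetical stand-in for the real LLM call.

```python
from typing import Iterable, Iterator


def decode(chunks: Iterable[bytes]) -> str:
    """Join the raw request bytes and decode them as UTF-8 text."""
    return b"".join(chunks).decode("utf-8")


def fake_llm_stream(prompt: str) -> Iterator[str]:
    """Hypothetical stand-in for the LLM: yields one token at a time."""
    for word in f"You asked: {prompt}".split(" "):
        yield word + " "


def encode(tokens: Iterable[str]) -> Iterator[bytes]:
    """Turn each token back into bytes as soon as it is available."""
    for token in tokens:
        yield token.encode("utf-8")


def chat(request_body: Iterable[bytes]) -> Iterator[bytes]:
    """decode -> LLM stream -> encode, evaluated lazily token by token."""
    prompt = decode(request_body)
    return encode(fake_llm_stream(prompt))
```

Because every stage after decode is a generator, the first output bytes are available before the last token has been produced, which is the property the pipeline above relies on.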
Because llmStream emits tokens as a stream, they reach the HTTP response as they are produced, with no buffering and no explicit async logic.
Prompt text and LLM errors are independently forwarded to loggers via separate connections, without interrupting the token stream.
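A client can observe the streaming behaviour by reading the response incrementally instead of waiting for the full body. The sketch below is a minimal Python client using only the standard library; the function name stream_chat, the chunk size, the timeout, and the assumption that errors are signalled by a 4xx or 5xx status are illustrative choices, not part of this example project.

```python
import codecs
import http.client


def stream_chat(prompt, host="127.0.0.1", port=8080, chunk_size=64):
    """Yield decoded pieces of the /chat response as they arrive."""
    conn = http.client.HTTPConnection(host, port, timeout=60)
    try:
        conn.request(
            "POST",
            "/chat",
            body=prompt.encode("utf-8"),
            headers={"Content-Type": "text/plain; charset=utf-8"},
        )
        response = conn.getresponse()
        # Assumption: the server reports failures with a 4xx/5xx status.
        if response.status >= 400:
            raise RuntimeError(f"unexpected status {response.status}")
        # An incremental decoder copes with multi-byte UTF-8 characters
        # that happen to be split across two reads.
        decoder = codecs.getincrementaldecoder("utf-8")()
        while True:
            chunk = response.read1(chunk_size)  # returns what is available
            if not chunk:
                break
            text = decoder.decode(chunk)
            if text:
                yield text
        tail = decoder.decode(b"", final=True)
        if tail:
            yield tail
    finally:
        conn.close()


# Usage, with the server from this README running:
#   for piece in stream_chat("What is Mélodium?"):
#       print(piece, end="", flush=True)
```

Printing each piece with flush=True makes the tokens appear progressively in the terminal, much like the curl invocation shown in the Running section.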
Dependencies
[dependencies]
std = "0.10.1" # core flows, logging, data structures
http = "0.10.1" # HTTP server and client
net = "0.10.1" # IP address helpers
encoding = "0.10.1" # UTF-8 encode / decode
ml = "0.10.1" # LLM, STT, TTS and local model inference