
LLM Chat Server

Source: 01_text_llm_chat

An HTTP server that accepts plain-text prompts via POST /chat and streams the LLM response token by token back to the caller. The LLM connection is declared as a model and shared across all concurrent requests.

Running

```shell
melodium run 01_text_llm_chat/Compo.toml --api_key sk-... --model claude-sonnet-4-6
```

```shell
$ curl -X POST http://127.0.0.1:8080/chat -d "What is Mélodium?"
Mélodium is a dataflow programming language…
```
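
Any HTTP client that reads the response body incrementally will see the tokens arrive one by one. The following is a minimal Python sketch using only the standard library; the function names `iter_text` and `stream_chat` are hypothetical, and it assumes the server is listening on the address used in the curl example above. It uses an incremental UTF-8 decoder because a multi-byte character such as "é" can be split across two network reads.

```python
import codecs
import io
import urllib.request
from typing import Iterator


def iter_text(body: io.BufferedIOBase, chunk_size: int = 1024) -> Iterator[str]:
    """Yield decoded text as soon as bytes arrive on a binary stream."""
    # An incremental decoder keeps incomplete UTF-8 sequences between reads.
    decoder = codecs.getincrementaldecoder("utf-8")()
    while True:
        # read1() returns whatever is available instead of waiting for a full chunk.
        chunk = body.read1(chunk_size)
        if not chunk:
            break
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush; raises UnicodeDecodeError if the stream ended mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


def stream_chat(prompt: str, url: str = "http://127.0.0.1:8080/chat") -> Iterator[str]:
    """POST a plain-text prompt and yield the reply piece by piece."""
    request = urllib.request.Request(url, data=prompt.encode("utf-8"), method="POST")
    with urllib.request.urlopen(request) as response:
        yield from iter_text(response)
```

With the server running, `for piece in stream_chat("What is Mélodium?"): print(piece, end="", flush=True)` should print the reply progressively rather than all at once.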

How it works

Two models are instantiated at startup:

```
model server: HttpServer(host=|from_ipv4(|localhost_ipv4()), port=port)
model llm: ChatLlm(api_key=api_key, model=model)
```
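
What matters about a model is that it is created once at startup and then referenced by every request that needs it, rather than rebuilt per request. A rough Python analogy follows; the class `SharedLlm`, its `complete` method and the `serve` helper are hypothetical stand-ins, not part of any Mélodium API.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SharedLlm:
    """Hypothetical stand-in for a long-lived LLM connection."""

    instances = 0  # counts how many connections were ever opened

    def __init__(self, api_key: str, model: str) -> None:
        SharedLlm.instances += 1
        self.api_key = api_key
        self.model = model
        self.requests_served = 0
        self._lock = threading.Lock()  # guards the counter across threads

    def complete(self, prompt: str) -> str:
        with self._lock:
            self.requests_served += 1
        return f"[{self.model}] reply to {prompt!r}"


def handle_request(llm: SharedLlm, prompt: str) -> str:
    # Per-request work: only a reference to the shared object is passed in.
    return llm.complete(prompt)


def serve(prompts: list[str]) -> tuple[SharedLlm, list[str]]:
    # Created once "at startup", analogous to `model llm: ChatLlm(...)`.
    llm = SharedLlm(api_key="sk-test", model="example-model")
    with ThreadPoolExecutor(max_workers=4) as pool:
        replies = list(pool.map(lambda p: handle_request(llm, p), prompts))
    return llm, replies
```

However many requests run concurrently, only one `SharedLlm` is ever constructed.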

ChatLlm wraps RemoteLlm with a fixed system prompt and token limit. The backend field selects the provider ("anthropic", "openai", etc.), so switching providers only means changing backend and model in the model definition; the rest of the pipeline is unaffected.
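
The wrapping pattern can be pictured in Python as one object that pins the system prompt and token limit and delegates everything else to a generic remote client. This is only an analogy: `RemoteClient`, `ChatClient`, the example system prompt, the 1024-token limit and the model name `some-openai-model` are hypothetical, and the real ChatLlm and RemoteLlm parameters may differ. It illustrates that changing provider touches only `backend` and `model`.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RemoteClient:
    """Hypothetical generic client: the provider it talks to is just data."""
    backend: str  # e.g. "anthropic" or "openai"
    model: str
    api_key: str

    def describe_call(self, system: str, prompt: str, max_tokens: int) -> dict:
        # Stands in for a real network call; returns what would be sent.
        return {
            "backend": self.backend,
            "model": self.model,
            "system": system,
            "prompt": prompt,
            "max_tokens": max_tokens,
        }


@dataclass(frozen=True)
class ChatClient:
    """Hypothetical wrapper that fixes the system prompt and token limit."""
    remote: RemoteClient
    system: str = "You are a concise assistant."
    max_tokens: int = 1024

    def ask(self, prompt: str) -> dict:
        return self.remote.describe_call(self.system, prompt, self.max_tokens)


chat = ChatClient(RemoteClient(backend="anthropic", model="claude-sonnet-4-6", api_key="sk-test"))
# Switching provider: only backend and model change; prompt and limit are kept.
other = replace(chat, remote=replace(chat.remote, backend="openai", model="some-openai-model"))
```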


The chat sub-treatment

Each incoming connection is handled by the chat sub-treatment, which is a linear three-step pipeline:

```
Self.data -> decode.data,text -> llmStream.prompt,token -> encode.text,data -> Self.data
```
  • decode converts raw request bytes to UTF-8 text
  • llmStream sends the text as a prompt and emits the response tokens as a Stream<string>, each one as soon as it arrives
  • encode converts each token back to bytes and forwards them directly into connection.data

Because llmStream emits tokens as a stream, each token reaches the HTTP response as soon as it is produced, with no buffering and no explicit async logic.
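
The same decode, stream, encode shape can be mimicked with Python generators, which also makes the absence of buffering visible: each token is encoded and handed on the moment the LLM yields it. `fake_llm_stream` is a hypothetical stand-in for llmStream that yields fixed words instead of calling a real model, and joining the whole request body before prompting is an assumption of this sketch.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical signature for the LLM step: prompt in, tokens out.
LlmStream = Callable[[str], Iterator[str]]


def fake_llm_stream(prompt: str, log: list[str]) -> Iterator[str]:
    """Stand-in for llmStream: emits one word at a time and records each emission."""
    for word in ["Mélodium", " is", " a", " dataflow", " language."]:
        log.append(word)
        yield word


def chat(request: Iterable[bytes], llm: LlmStream) -> Iterator[bytes]:
    # decode: raw request bytes -> UTF-8 text
    prompt = b"".join(request).decode("utf-8")
    # llmStream: prompt -> tokens, consumed lazily as they are produced
    for token in llm(prompt):
        # encode: token -> bytes, passed on immediately (nothing accumulates)
        yield token.encode("utf-8")
```

Pulling the first chunk from `chat(...)` runs the fake LLM for exactly one token, the generator equivalent of that token reaching the HTTP response before the rest of the answer exists.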

Prompt text and LLM errors are independently forwarded to loggers via separate connections, without interrupting the token stream.
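
A Python sketch of the same idea: the prompt and any error go to a logger as side effects while the token path keeps yielding. Here the LLM output is modeled as hypothetical `("token", text)` and `("error", message)` events, an assumption made for illustration; in the Mélodium version, errors travel on their own connection rather than in a tagged stream. `chat_logged` and `ListHandler` are also hypothetical names.

```python
import logging
from typing import Callable, Iterator, Tuple

# Hypothetical event shape: ("token", text) or ("error", message).
Event = Tuple[str, str]


def chat_logged(body: bytes, llm: Callable[[str], Iterator[Event]],
                logger: logging.Logger) -> Iterator[bytes]:
    prompt = body.decode("utf-8")
    logger.info("prompt: %s", prompt)             # side path 1: prompt text
    for kind, value in llm(prompt):
        if kind == "error":
            logger.error("llm error: %s", value)  # side path 2: LLM errors
            continue                              # the token output keeps going
        yield value.encode("utf-8")


class ListHandler(logging.Handler):
    """Collects log messages in memory so they can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(f"{record.levelname}: {record.getMessage()}")
```

Attaching a `ListHandler` to the logger shows that the prompt and the error were both recorded, while the caller still receives every token.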


Dependencies

```toml
[dependencies]
std = "0.10.1"       # core flows, logging, data structures
http = "0.10.1"      # HTTP server and client
net = "0.10.1"       # IP address helpers
encoding = "0.10.1"  # UTF-8 encode / decode
ml = "0.10.1"        # LLM, STT, TTS and local model inference
```