LLM Chat Server

Source: 01_text_llm_chat

An HTTP server that accepts plain-text prompts via POST /chat and streams the LLM response token by token back to the caller. The LLM connection is declared as a model and shared across all concurrent requests.

Running

melodium run 01_text_llm_chat/Compo.toml --api_key sk-... --model claude-sonnet-4-6
$ curl -X POST http://127.0.0.1:8080/chat -d "What is Mélodium?"
Mélodium is a dataflow programming language…
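
The same request can be issued from a script. The following is a minimal Python sketch of a client, not part of the example project: it assumes the server is listening on the address used in the curl call above, and the function name `stream_chat` and its `chunk_size` parameter are hypothetical. It reads the response incrementally so that text can be shown as it arrives.

```python
import codecs
import urllib.request
from typing import Iterator


def stream_chat(
    prompt: str,
    url: str = "http://127.0.0.1:8080/chat",  # assumed: same address as the curl call
    chunk_size: int = 64,                      # hypothetical tuning knob
) -> Iterator[str]:
    """POST a plain-text prompt and yield pieces of the reply as they arrive."""
    request = urllib.request.Request(
        url,
        data=prompt.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "text/plain; charset=utf-8"},
    )
    # An incremental decoder copes with multi-byte UTF-8 characters
    # that happen to be split across two network reads.
    decoder = codecs.getincrementaldecoder("utf-8")()
    with urllib.request.urlopen(request) as response:
        while True:
            # read1 returns as soon as some data is available,
            # rather than waiting for chunk_size bytes.
            chunk = response.read1(chunk_size)
            if not chunk:
                break
            text = decoder.decode(chunk)
            if text:
                yield text
        tail = decoder.decode(b"", final=True)
        if tail:
            yield tail
```

With the server running, `for piece in stream_chat("What is Mélodium?"): print(piece, end="", flush=True)` prints the reply progressively. The pieces yielded here follow network reads, so they do not necessarily line up one-to-one with LLM tokens.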

How it works

Two models are instantiated at startup:

model server: HttpServer(host=|from_ipv4(|localhost_ipv4()), port=port)
model llm: ChatLlm(api_key=api_key, model=model)

ChatLlm wraps RemoteLlm with a fixed system prompt and token limit. The backend field selects the provider ("anthropic", "openai", etc.), so switching providers only requires changing backend and model in the model definition. The rest of the pipeline is unaffected.

The chat sub-treatment

Each incoming connection is handled by the chat sub-treatment, which is a straight three-step pipeline:

Self.data -> decode.data,text -> llmStream.prompt,token -> encode.text,data -> Self.data
  • decode converts raw request bytes to UTF-8 text
  • llmStream sends the text as a prompt and emits tokens as a Stream<string> as they arrive
  • encode converts each token back to bytes and forwards them directly into connection.data

Because llmStream emits tokens as a stream, they reach the HTTP response as they are produced, with no buffering and no explicit async logic.
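
For readers more familiar with general-purpose languages, the shape of this pipeline can be approximated with Python generators. This is only an analogy, not Mélodium code: the function names mirror the three steps, `llm_stream` is a hypothetical stand-in that echoes the prompt word by word instead of calling a model, and unlike the real treatment this `decode` collects the whole request body before decoding it.

```python
from typing import Iterable, Iterator


def decode(data: Iterable[bytes]) -> str:
    """Join the raw request bytes and decode them as UTF-8 text."""
    return b"".join(data).decode("utf-8")


def llm_stream(prompt: str) -> Iterator[str]:
    """Hypothetical stand-in for the model: emits one token at a time."""
    for word in f"You asked: {prompt}".split(" "):
        yield word + " "


def encode(tokens: Iterable[str]) -> Iterator[bytes]:
    """Convert each token back to bytes as soon as it is produced."""
    for token in tokens:
        yield token.encode("utf-8")


def chat(data: Iterable[bytes]) -> Iterator[bytes]:
    """Chain the three steps; nothing runs until the caller pulls a chunk."""
    return encode(llm_stream(decode(data)))


if __name__ == "__main__":
    for chunk in chat([b"What is ", "Mélodium?".encode("utf-8")]):
        # Each chunk is available before the following token is generated.
        print(chunk)
```

Because every stage is a generator, the first encoded chunk is available before later tokens have been produced, which is the property the paragraph above describes. One difference worth noting: Python generators are pulled by the consumer, whereas the description above has llmStream emitting tokens as they arrive.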

Prompt text and LLM errors are independently forwarded to loggers via separate connections, without interrupting the token stream.

Dependencies

[dependencies]
std = "0.10.1"       # core flows, logging, data structures
http = "0.10.1"      # HTTP server and client
net = "0.10.1"       # IP address helpers
encoding = "0.10.1"  # UTF-8 encode / decode
ml = "0.10.1"        # LLM, STT, TTS and local model inference