Distributed LLM Inference
Source: 15_distributed_llm_inference
An HTTP server that accepts plain-text prompts and streams LLM responses back. The LLM call runs on a Mélodium cloud runner, so the ml package only needs to be available on the runner; the front-end machine requires no ML dependencies at all.
Running
melodium run 15_distributed_llm_inference/Compo.toml \
--api_token "my-api-token" \
--openai_key sk-...
$ curl -X POST http://127.0.0.1:8080/chat -d "Explain Mélodium in one sentence."
Mélodium is a dataflow programming language…
How it works
The Assistant model and the inferText treatment run on the remote runner. The front-end only needs the http, distrib, and work packages:
model distributor: DistributionEngine(
treatment = "distributed_llm_inference/main::inferText",
version = "0.1.0"
)
Passing const parameters to the remote treatment
inferText needs the openai_key to configure its Assistant model, but const parameters cannot be passed through streams. They are sent via the distribution engine’s start call:
distribStart: start[distributor=distributor](
params = |map([|entry<string>("openai_key", openai_key)])
)
On the remote side, inferText declares:
treatment inferText(const openai_key: string)
model llm: Assistant(openai_key=openai_key)
The const is set once when the runner starts and shared across all invocations of that treatment.
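The "set once, shared across invocations" behaviour can be illustrated outside Mélodium with a small Python analogy. This is only a sketch of the idea, not the distrib API: the Assistant class, start_infer_text, and the "example-key" value are hypothetical stand-ins invented for illustration.

```python
from typing import Callable, Dict, Iterable, Iterator


class Assistant:
    """Hypothetical placeholder for the remote model; counts constructions."""

    instances = 0

    def __init__(self, openai_key: str) -> None:
        Assistant.instances += 1
        self.openai_key = openai_key

    def chat(self, prompt: str) -> str:
        # A real model would call the LLM here; this just echoes.
        return f"echo: {prompt}"


def start_infer_text(params: Dict[str, str]) -> Callable[[Iterable[str]], Iterator[str]]:
    """Analogy for the start call: const params arrive once, before any stream."""
    # Configured a single time, like a const parameter on the runner.
    llm = Assistant(openai_key=params["openai_key"])

    def infer_text(prompts: Iterable[str]) -> Iterator[str]:
        # Only per-request data flows through the stream; the key is never re-sent.
        for prompt in prompts:
            yield llm.chat(prompt)

    return infer_text


infer_text = start_infer_text({"openai_key": "example-key"})
first = list(infer_text(["hello"]))
second = list(infer_text(["again"]))
```

Both invocations reuse the same model instance, which mirrors how the const value is fixed at start and never travels with the streamed data.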
Token streaming end-to-end
chat on the remote side emits response tokens as Stream<string>. They are encoded to bytes, sent back through recvStream<byte>, and forwarded directly into connection.data on the front-end — tokens appear in the HTTP response as they are generated, with no intermediate buffering.
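To observe the streaming from the client side, the response body can be read incrementally rather than all at once. The following Python sketch assumes the server from the Running section is listening on http://127.0.0.1:8080/chat; the helper names decode_stream, read_chunks, and stream_chat are hypothetical. Because the tokens travel as UTF-8 bytes, a chunk boundary may fall inside a multi-byte character such as "é", so the sketch uses an incremental decoder.

```python
import codecs
import urllib.request
from typing import Iterable, Iterator


def decode_stream(chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield text as soon as complete UTF-8 characters are available."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in chunks:
        text = decoder.decode(chunk)  # keeps incomplete byte sequences buffered
        if text:
            yield text
    # Flush the decoder; raises UnicodeDecodeError if the stream ended mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


def read_chunks(response, size: int = 64) -> Iterator[bytes]:
    """Read the body piece by piece instead of waiting for the full response."""
    while True:
        chunk = response.read1(size)  # returns whatever is available, up to size bytes
        if not chunk:
            break
        yield chunk


def stream_chat(prompt: str, url: str = "http://127.0.0.1:8080/chat") -> Iterator[str]:
    """POST a plain-text prompt and yield response text as it arrives."""
    request = urllib.request.Request(url, data=prompt.encode("utf-8"), method="POST")
    with urllib.request.urlopen(request) as response:
        yield from decode_stream(read_chunks(response))


# Usage, with the server running:
#     for piece in stream_chat("Explain Mélodium in one sentence."):
#         print(piece, end="", flush=True)
```

Nothing here is specific to Mélodium; any HTTP client that reads the body incrementally should show the tokens arriving progressively.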
Dependencies
[dependencies]
std = "0.10.1" # core flows, logging, data structures
http = "0.10.1" # HTTP server and client
net = "0.10.1" # IP address helpers
encoding = "0.10.1" # UTF-8 encode / decode
work = "0.10.1" # cloud runner provisioning
distrib = "0.10.1" # stream distribution across runners
ml = "0.10.1" # LLM, STT, TTS and local model inference