Vision Chat
Sends an image URL and a question to a vision-capable LLM (GPT-4o) and returns the model's description. There are two entrypoints: main for a one-shot CLI call, and server for an HTTP server that accepts JSON requests.
Running
One-shot CLI:
melodium run 12_vision_chat/Compo.toml \
    --image_url "https://example.com/photo.jpg" \
    --question "What do you see?" \
    --openai_key sk-...
HTTP server:
melodium run 12_vision_chat/Compo.toml server -- --openai_key sk-... --port 8080

$ curl -X POST http://127.0.0.1:8080/describe \
    -H "Content-Type: application/json" \
    -d '{"url":"https://example.com/photo.jpg","question":"Describe this image."}'
The image shows a…
How it works
Both entrypoints instantiate the same Vision model:
model Vision(const openai_key: string) : RemoteLlm {
    backend = "openai"
    model = "gpt-4o"
    system = "You are an expert image analyst. Describe images clearly and in detail."
    …
}
main — CLI entrypoint
describeUrl receives the image URL and question as const parameters, emits a StringMap, and uses format to build the prompt string — no string concatenation in the dataflow. The response fans out to the log and a local file simultaneously.
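To make the template step concrete, here is a minimal JavaScript sketch of placeholder substitution from a string map. It is an illustration only: the `{name}` placeholder syntax, the `formatTemplate` helper, the map keys, and the template text are all assumptions, not the Mélodium `format` implementation or the template this example actually uses.

```javascript
// Hypothetical helper: fills {key} placeholders from a string-to-string map.
// Unknown placeholders are left untouched so mistakes stay visible.
function formatTemplate(template, values) {
  return template.replace(/\{(\w+)\}/g, (match, key) =>
    Object.prototype.hasOwnProperty.call(values, key) ? values[key] : match
  );
}

// Assumed entries, mirroring the two CLI parameters.
const entries = {
  image_url: "https://example.com/photo.jpg",
  question: "What do you see?",
};

// Assumed template text; the real one lives in the Mélodium source.
const prompt = formatTemplate("{question}\nImage: {image_url}", entries);
console.log(prompt);
```

Keeping the values in a map and the wording in a single template means the prompt text can change without touching the steps that produce the values.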
server — HTTP entrypoint
The handleDescribe sub-treatment uses a JavaScriptEngine model (PromptBuilder) to parse the JSON body and build the prompt dynamically. This is more flexible than a fixed format string when the input structure may vary.
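The kind of logic such a script might contain can be sketched in plain JavaScript. This is a hedged illustration, not the example's actual PromptBuilder code: the `buildPrompt` function name, the default question, and the prompt layout are assumptions, and how the engine hands the body to the script and collects the result is not shown. Only the `url` and `question` field names come from the request in the Running section.

```javascript
// Hypothetical prompt builder: takes the already-parsed JSON body and
// returns a prompt string, or null when no usable image URL is present.
function buildPrompt(body) {
  if (body === null || typeof body !== "object" || Array.isArray(body)) {
    return null; // body is not a JSON object
  }
  if (typeof body.url !== "string" || body.url.trim() === "") {
    return null; // the image URL is required
  }
  // Assumed default when the caller omits the question.
  const question =
    typeof body.question === "string" && body.question.trim() !== ""
      ? body.question.trim()
      : "Describe this image.";
  return `${question}\nImage: ${body.url.trim()}`;
}

console.log(buildPrompt({ url: "https://example.com/photo.jpg", question: "What do you see?" }));
console.log(buildPrompt({ question: "No URL here" })); // prints null
```

In this sketch a malformed body yields null rather than an exception, so a later step can substitute a fallback value.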
decode() -> toJson() -> unwrapBody -> buildPrompt[engine=promptBuilder] -> unwrapPrompt -> promptStr -> promptOr -> chat[llm=llm]
tryToString<Json>() extracts a plain string from the JSON result as an Option<string>, which unwrapOr<string>(default="") then resolves to an empty-string fallback when no string is present.
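The two-step fallback can be modelled in a few lines of JavaScript. This sketch only mirrors the behaviour described above: `tryToString` and `unwrapOr` here are stand-in functions, with `undefined` playing the role of an empty option, and are not the Mélodium treatments themselves.

```javascript
// Stand-in for tryToString<Json>: yields the string only when the JSON
// value really is a string; undefined models an empty Option<string>.
function tryToString(jsonValue) {
  return typeof jsonValue === "string" ? jsonValue : undefined;
}

// Stand-in for unwrapOr<string>: replaces an empty option with a default.
function unwrapOr(option, fallback) {
  return option === undefined ? fallback : option;
}

const fromString = unwrapOr(tryToString("Describe this image."), "");
const fromNull = unwrapOr(tryToString(null), "");
const fromNumber = unwrapOr(tryToString(42), "");

console.log(JSON.stringify([fromString, fromNull, fromNumber]));
// prints ["Describe this image.","",""]
```

Splitting extraction from defaulting keeps the "no string here" case explicit until the point where a fallback is chosen.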
Dependencies
[dependencies]
std = "0.10.1" # core flows, logging, data structures
fs = "0.10.1" # local file I/O
http = "0.10.1" # HTTP server and client
net = "0.10.1" # IP address helpers
json = "0.10.1" # JSON parsing and serialisation
encoding = "0.10.1" # UTF-8 encode / decode
javascript = "0.10.1" # embedded JavaScript engine
ml = "0.10.1" # LLM, STT, TTS and local model inference