Vision Chat

Source: 12_vision_chat

Sends an image URL with a question to a vision-capable LLM (GPT-4o) and returns the description. There are two entrypoints: main, for a one-shot CLI call, and server, which runs an HTTP server accepting JSON requests.

Running

One-shot CLI:

melodium run 12_vision_chat/Compo.toml \
    --image_url "https://example.com/photo.jpg" \
    --question "What do you see?" \
    --openai_key sk-...

HTTP server:

melodium run 12_vision_chat/Compo.toml server -- --openai_key sk-... --port 8080

$ curl -X POST http://127.0.0.1:8080/describe \
    -H "Content-Type: application/json" \
    -d '{"url":"https://example.com/photo.jpg","question":"Describe this image."}'
The image shows a…
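For callers that prefer a script over curl, the same request can be sketched in Python with only the standard library. This is an illustrative client, not part of the example's source: the helper names `build_describe_request` and `describe` are hypothetical, and the sketch assumes the server replies with a UTF-8 text body, as the curl output above suggests.

```python
import json
import urllib.request


def build_describe_request(base_url: str, image_url: str, question: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for the /describe endpoint."""
    # Same JSON shape as the curl example: {"url": ..., "question": ...}
    body = json.dumps({"url": image_url, "question": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/describe",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def describe(base_url: str, image_url: str, question: str, timeout: float = 60.0) -> str:
    """Send the request and return the response body as text.

    Assumption: the server answers with a UTF-8 encoded description.
    """
    request = build_describe_request(base_url, image_url, question)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server from the command above running, `describe("http://127.0.0.1:8080", "https://example.com/photo.jpg", "Describe this image.")` would return the description text.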

How it works

Both entrypoints instantiate the same Vision model:

model Vision(const openai_key: string) : RemoteLlm
{
    backend = "openai"
    model = "gpt-4o"
    system = "You are an expert image analyst. Describe images clearly and in detail."
}

main — CLI entrypoint

describeUrl receives the image URL and the question as const parameters, emits them as a StringMap, and uses format to build the prompt string, so the dataflow needs no string concatenation. The response is then sent to both the log and a local file at the same time.
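The idea of filling a fixed template from a map of named strings can be illustrated in Python. This is only an analogy for the StringMap plus format step: the template text, the `{name}` placeholder syntax, and the `build_prompt` helper are assumptions made for this sketch, not taken from the example's source.

```python
from typing import Mapping

# Hypothetical template; the real format string lives in the example's source.
TEMPLATE = "Question about the image at {image_url}: {question}"


def build_prompt(template: str, values: Mapping[str, str]) -> str:
    """Substitute each {key} placeholder with the matching entry of the map."""
    # str.format_map raises KeyError if the template names a key the map lacks.
    return template.format_map(values)


# The two const parameters become the entries of the map.
prompt = build_prompt(
    TEMPLATE,
    {"image_url": "https://example.com/photo.jpg", "question": "What do you see?"},
)
```

The template stays a single constant, and the inputs only ever appear as map entries, which mirrors why the dataflow needs no concatenation step.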

server — HTTP entrypoint

The handleDescribe sub-treatment uses a JavaScriptEngine model (PromptBuilder) to parse the JSON body and build the prompt dynamically. This is more flexible than a fixed format string when the input structure may vary.

decode() -> toJson() -> unwrapBody -> buildPrompt[engine=promptBuilder] -> unwrapPrompt -> promptStr -> promptOr -> chat[llm=llm]

tryToString<Json>() extracts a plain string from the JSON result and returns it as an Option<string>. unwrapOr<string>(default="") then resolves that option, falling back to an empty string when no string is present.
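The parse, build, and fallback steps can be sketched in Python to make the Option handling concrete. This is a rough analogue under stated assumptions, not the example's PromptBuilder script (which runs in the JavaScript engine): `prompt_from_body`, `unwrap_or`, `DEFAULT_QUESTION`, and the prompt layout are all hypothetical. `None` stands in for an empty Option.

```python
import json
from typing import Optional

# Hypothetical default used when the request carries no usable question.
DEFAULT_QUESTION = "Describe this image."


def prompt_from_body(body: bytes) -> Optional[str]:
    """Decode the body, parse it as JSON, and build a prompt.

    Returns None (the analogue of an empty Option) when the body is not
    valid UTF-8 JSON, is not an object, or lacks a string "url" field.
    """
    try:
        payload = json.loads(body.decode("utf-8"))
    except ValueError:  # covers UnicodeDecodeError and json.JSONDecodeError
        return None
    if not isinstance(payload, dict):
        return None
    url = payload.get("url")
    if not isinstance(url, str) or not url:
        return None
    question = payload.get("question")
    if not isinstance(question, str) or not question:
        question = DEFAULT_QUESTION
    # Hypothetical prompt layout, chosen only for this sketch.
    return f"{question}\nImage: {url}"


def unwrap_or(value: Optional[str], default: str = "") -> str:
    """Resolve an optional string to a concrete one, like unwrapOr(default="")."""
    return default if value is None else value
```

Calling `unwrap_or(prompt_from_body(raw_body))` always yields a string, so the stage that talks to the LLM never has to handle a missing value itself.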

Dependencies

[dependencies]
std = "0.10.1"         # core flows, logging, data structures
fs = "0.10.1"          # local file I/O
http = "0.10.1"        # HTTP server and client
net = "0.10.1"         # IP address helpers
json = "0.10.1"        # JSON parsing and serialisation
encoding = "0.10.1"    # UTF-8 encode / decode
javascript = "0.10.1"  # embedded JavaScript engine
ml = "0.10.1"          # LLM, STT, TTS and local model inference