Cloud Worker Pipeline

Source: 14_cloud_worker_pipeline

Reads a local text file, sends it to a cloud runner for word-count processing (one count per line), and writes the results locally. The cloud runner is released explicitly after the pipeline completes.

Running


melodium run 14_cloud_worker_pipeline/Compo.toml \
  --api_token "my-api-token" \
  --input     data.txt \
  --output    word_counts.txt

Note

api_token here authenticates against a Mélodium Services API, such as Cadence.CI.


[…] info: cloud: provisioning cloud runner…
[…] info: cloud: runner provisioned, distributor connecting…
[…] info: cloud: pipeline complete

How it works

main instantiates the DistantEngine and DistributionEngine models, then wires the read/dispatch/write pipeline:


model runner: DistantEngine(
  api_url   = |wrap<string>("https://api-staging.melodium.tech/0.1"),
  api_token = |wrap<string>(api_token)
)
model distributor: DistributionEngine(
  treatment = "cloud_worker_pipeline/main::transform",
  version   = "0.1.0"
)

Provisioning and the read gate

distant provisions a container (512 MB RAM, 1 CPU, 1 GB storage). Its access output feeds start, and the local input file is read only once the distributor reports ready:


provisionRunner: distant[distant_engine=runner](
    max_duration = 600,
    memory       = 512, // MB
    cpu          = 1000, // millicores
    storage      = 1024, // MB
    edition      = _,
    arch         = _,
    volumes      = [],
    containers   = [],
    service_containers = [],
    tags         = []
)
startup.trigger -> provisionRunner.trigger,access -> distribStart.access
 
distribStart: start[distributor=distributor](params=|map([]))
 
read: readLocal(path=input)
distribStart.ready -> read.trigger


Dispatching to the remote `transform`

dispatch wraps distribute, sendStream, and recvStream. The sent and received types differ: raw bytes (Stream<byte>) go out and word-count strings (Stream<string>) come back.


treatment dispatch[distributor: DistributionEngine]()
  input  data:   Stream<byte>
  output result: Stream<string>
{
    trig: trigger<byte>()
    dist: distribute[distributor=distributor]()
 
    Self.data -> trig.stream,start -> dist.trigger
 
    sendData:   sendStream<byte>[distributor=distributor](name="data")
    recvResult: recvStream<string>[distributor=distributor](name="result")
 
    dist.distribution_id -> sendData.distribution_id
    dist.distribution_id -> recvResult.distribution_id
 
    Self.data       -> sendData.data
    recvResult.data -> Self.result
}
 
read.data -> dispatch.data,result -> write.text

The name values ("data", "result") match transform’s own input/output port names exactly.

Explicit cleanup

Unlike other distributed examples where the runner runs indefinitely, this pipeline calls stop[distributor=distributor]() once writing is complete:


distribStop: stop[distributor=distributor]()
write.completed -> distribStop.trigger

This releases the cloud runner and stops billing. The max_duration = 600 on provisionRunner acts as a safety cap if cleanup fails.
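
The interplay between the explicit stop and the max_duration cap can be pictured with a small simulation. This is only a conceptual sketch: RunnerLease, advance, and releasedBy are hypothetical names invented for illustration and are not part of Mélodium or its services API, and the time unit is left abstract because the example does not state one.

```javascript
// Hypothetical model of a runner lease; none of these names are Mélodium APIs.
class RunnerLease {
    constructor(maxDuration) {
        this.maxDuration = maxDuration; // plays the role of max_duration = 600
        this.elapsed = 0;               // simulated time, unit left abstract
        this.releasedBy = null;         // null while the runner is still held
    }

    // Explicit cleanup, analogous to triggering stop once writing completes.
    stop() {
        if (this.releasedBy === null) this.releasedBy = "stop";
    }

    // Advance the simulated clock; the cap releases the runner if stop never ran.
    advance(units) {
        this.elapsed += units;
        if (this.releasedBy === null && this.elapsed >= this.maxDuration) {
            this.releasedBy = "max_duration";
        }
    }
}

// Normal path: writing completes early and stop releases the runner.
const ok = new RunnerLease(600);
ok.advance(42);
ok.stop();
console.log(ok.releasedBy); // logs stop

// Failure path: stop is never triggered, so the cap ends the lease.
const stuck = new RunnerLease(600);
stuck.advance(600);
console.log(stuck.releasedBy); // logs max_duration
```

In this model the explicit stop always wins when it runs first, which matches the intent described above: the cap only matters when cleanup does not happen.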

The remote `transform` treatment

The WordCounter model embeds a JavaScript function that counts whitespace-separated words per line. It is defined in the same file as main but instantiated only on the remote runner:


model WordCounter() : JavaScriptEngine {
    code = ${{
        function countWords(line) {
            var s = line.trim();
            if (s.length === 0) return '0';
            return s.split(/\s+/).length.toString();
        }
    }}
}
 
treatment transform()
  model counter: WordCounter()
  input  data:   Stream<byte>
  output result: Stream<string>
{
    decode()
    wrapStr:      fromString<string>()
    jsCount:      process[engine=counter](code="countWords(value)")
    unwrapResult: unwrapOr<Json>(default=|null())
    resultStr:    tryToString<Json>()
    unwrapStr:    unwrapOr<string>(default="0")
 
    Self.data -> decode.data,text -> wrapStr.value,json -> jsCount.value,result -> unwrapResult.option,value -> resultStr.value,into -> unwrapStr.option,value -> Self.result
}

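decode turns the incoming UTF-8 bytes into text, and fromString<string>() wraps each line as a Json value for process. The rest of the chain extracts the count string again: unwrapOr<Json> substitutes null when process yields no result, tryToString<Json> attempts to read the value as a string, and unwrapOr<string> falls back to "0" when that fails.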
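
To check what the remote side should return for a given input, the same logic can be run locally in plain JavaScript (Node.js). The sketch below reuses countWords as it appears in the WordCounter model; transformLocally is a hypothetical helper, and splitting the decoded text on newlines is an assumption made here to obtain one string per line, since the treatment shown above does not include an explicit line-splitting step.

```javascript
// Same function body as in the WordCounter model.
function countWords(line) {
    var s = line.trim();
    if (s.length === 0) return '0';
    return s.split(/\s+/).length.toString();
}

// Hypothetical local stand-in for the remote transform treatment:
// decode bytes, count words per line, fall back to "0" on any error.
function transformLocally(bytes) {
    const text = new TextDecoder("utf-8").decode(bytes);
    // Assumption: one value per newline-separated line.
    return text.split("\n").map((line) => {
        try {
            return countWords(line);
        } catch (err) {
            return "0"; // plays the role of unwrapOr<string>(default="0")
        }
    });
}

const sample = new TextEncoder().encode("hello cloud worker\n\n  two   words  ");
console.log(transformLocally(sample)); // logs [ '3', '0', '2' ]
```

Blank and whitespace-only lines yield "0" because countWords trims before splitting, so they never reach the fallback branch.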

Dependencies


[dependencies]
std        = "0.10.1"  # core flows, logging, data structures
fs         = "0.10.1"  # local file I/O
encoding   = "0.10.1"  # UTF-8 encode / decode
javascript = "0.10.1"  # embedded JavaScript engine
json       = "0.10.1"  # JSON parsing and serialisation
work       = "0.10.1"  # cloud runner provisioning
distrib    = "0.10.1"  # stream distribution across runners