Job Operations Reference

Complete reference for all job operations in the rbee system.

Architecture Overview

API Split

┌─────────────────────────────────────────────────────┐
│ Queen Job API (Port 7833)                           │
│ - Status (query registries)                         │
│ - Infer (schedule and route to workers)             │
└─────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────┐
│ Hive Job API (Port 7835)                            │
│ - WorkerSpawn, WorkerProcessList, WorkerProcessGet  │
│ - WorkerProcessDelete                               │
│ - ModelDownload, ModelList, ModelGet, ModelDelete   │
└─────────────────────────────────────────────────────┘

Key principle: NO PROXYING. Talk to Queen for orchestration, talk to Hive for worker/model management.
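
Because the two APIs never proxy for each other, a client must pick the right daemon per operation. A minimal routing sketch in Python (the endpoint constants assume the default local ports shown above):

# Route each operation to the daemon that owns it (assumed default ports).
QUEEN_JOBS = "http://localhost:7833/v1/jobs"  # status, infer
HIVE_JOBS = "http://localhost:7835/v1/jobs"   # worker_* and model_* operations

def jobs_url(operation: str) -> str:
    """Return the job endpoint that owns the given operation."""
    return QUEEN_JOBS if operation in ("status", "infer") else HIVE_JOBS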

Queen Operations

Status

Query live status of all hives and workers from registries.

Endpoint: POST http://localhost:7833/v1/jobs

Request:

{ "operation": "status" }

Response (via SSE):

Status Output
data: {"action":"status_start","message":"Querying registries..."} data: {"action":"status_hives","message":"Hives: 2 online, 2 available"} data: {"action":"status_workers","message":"Workers: 4 online, 3 available"} data: {"action":"status_complete","message":"Status query complete"} data: [DONE]

Use case: Check cluster health, see what’s online
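
A quick health check from Python, as a minimal sketch: it assumes the requests library and the submit-then-stream job pattern described under Job Pattern below.

import json
import requests

QUEEN = "http://localhost:7833"

# Submit the status job, then follow its SSE stream and print each message.
job = requests.post(f"{QUEEN}/v1/jobs", json={"operation": "status"}).json()
with requests.get(f"{QUEEN}{job['sse_url']}", stream=True,
                  headers={"Accept": "text/event-stream"}) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        if line and line.startswith("data: ") and line != "data: [DONE]":
            print(json.loads(line[len("data: "):])["message"])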


Infer

Run inference with automatic worker provisioning.

Endpoint: POST http://localhost:7833/v1/jobs

Request:

{ "operation": "infer", "hive_id": "localhost", "model": "meta-llama/Llama-3.2-1B", "prompt": "Hello, how are you?", "max_tokens": 100, "temperature": 0.7, "top_p": 0.9, "stream": true }

Parameters:

Parameter | Type | Required | Default | Description
----------|------|----------|---------|------------
operation | string | Required | — | Must be "infer"
hive_id | string | Required | — | Target hive ID (e.g., "localhost", "gpu-0")
model | string | Required | — | Model name or HuggingFace ID
prompt | string | Required | — | Input prompt text
max_tokens | number | Optional | 100 | Maximum tokens to generate
temperature | number | Optional | 0.7 | Sampling temperature 0.0-2.0
top_p | number | Optional | 0.9 | Nucleus sampling threshold
stream | boolean | Optional | true | Enable streaming output

Response (streaming):

Inference Output
data: {"action":"infer_start","message":"Starting inference..."} data: {"action":"token","message":"Hello"} data: {"action":"token","message":"!"} data: {"action":"token","message":" How"} data: {"action":"token","message":" can"} data: {"action":"token","message":" I"} data: {"action":"token","message":" help"} data: {"action":"token","message":" you"} data: {"action":"token","message":" today"} data: {"action":"token","message":"?"} data: {"action":"infer_complete","message":"Inference complete"} data: [DONE]

Flow:

  1. Queen checks worker registry for available worker
  2. If no worker: Queen internally sends WorkerSpawn to hive, waits for heartbeat
  3. Queen routes request DIRECTLY to worker (bypassing hive)
  4. Queen relays SSE stream back to client
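
Putting the pieces together, here is a minimal streaming client sketch in Python. It assumes the requests library and the job_id/sse_url submit response documented under Job Pattern below; only the "token" events are accumulated into the completion.

import json
import requests

QUEEN = "http://localhost:7833"

def infer(prompt: str, model: str = "meta-llama/Llama-3.2-1B") -> str:
    # 1. Submit the infer job to the Queen.
    job = requests.post(f"{QUEEN}/v1/jobs", json={
        "operation": "infer",
        "hive_id": "localhost",
        "model": model,
        "prompt": prompt,
        "max_tokens": 100,
        "stream": True,
    }).json()

    # 2. Follow the SSE stream, collecting only token events.
    tokens = []
    with requests.get(f"{QUEEN}{job['sse_url']}", stream=True,
                      headers={"Accept": "text/event-stream"}) as resp:
        for line in resp.iter_lines(decode_unicode=True):
            if not line or not line.startswith("data: "):
                continue
            payload = line[len("data: "):]
            if payload == "[DONE]":
                break
            event = json.loads(payload)
            if event["action"] == "token":
                tokens.append(event["message"])
    return "".join(tokens)

print(infer("Hello, how are you?"))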

Hive Operations

Hive Job Server: http://localhost:7835/v1/jobs

Worker Operations

WorkerSpawn

Spawn a new worker process on the hive.

curl -X POST http://localhost:7835/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "worker_spawn",
    "hive_id": "localhost",
    "model": "meta-llama/Llama-3.2-1B",
    "worker": "cpu",
    "device": 0
  }'

Parameters:

Parameter | Type | Required | Default | Description
----------|------|----------|---------|------------
operation | string | Required | — | Must be "worker_spawn"
hive_id | string | Required | — | Target hive ID
model | string | Required | — | Model to load
worker | string | Required | — | Worker type: "cpu", "cuda", "metal"
device | number | Required | — | Device index (0, 1, 2...)

Response:

Worker Spawn
data: {"action":"worker_spawn_start","message":"Spawning worker..."} data: {"action":"worker_spawn_health_check","message":"Waiting for worker to start..."} data: {"action":"worker_spawn_complete","message":"Worker spawned (PID: 1234, port: 9301)"} data: [DONE]

WorkerProcessList

List all running worker processes on the hive.

curl -X POST http://localhost:7835/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "worker_process_list",
    "hive_id": "localhost"
  }'

Response:

Worker List
data: {"action":"worker_proc_list_entry","message":"PID 1234 | llama-3.2-1b | GPU 0 | running"} data: {"action":"worker_proc_list_entry","message":"PID 1235 | llama-3.2-3b | GPU 1 | running"} data: [DONE]

WorkerProcessGet

Get details of a specific worker process.

curl -X POST http://localhost:7835/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "worker_process_get",
    "hive_id": "localhost",
    "worker_id": "worker-123"
  }'

WorkerProcessDelete

Kill a worker process (SIGTERM → SIGKILL).

curl -X POST http://localhost:7835/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "worker_process_delete",
    "hive_id": "localhost",
    "worker_id": "worker-123"
  }'

Response:

Worker Delete
data: {"action":"worker_proc_del_start","message":"Killing worker PID 1234"} data: {"action":"worker_proc_del_sigterm","message":"Sent SIGTERM"} data: {"action":"worker_proc_del_ok","message":"Worker killed successfully"} data: [DONE]

Model Operations

ModelDownload

Download a model from HuggingFace.

curl -X POST http://localhost:7835/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "model_download",
    "hive_id": "localhost",
    "model": "meta-llama/Llama-3.2-1B"
  }'

Response:

Model Download
data: {"action":"model_download_start","message":"Downloading llama-3.2-1b"} data: {"action":"model_download_progress","message":"Downloaded 123 MB / 1230 MB (10%)"} data: {"action":"model_download_progress","message":"Downloaded 246 MB / 1230 MB (20%)"} data: {"action":"model_download_complete","message":"Download complete"} data: [DONE]

ModelList

List all models available on the hive.

curl -X POST http://localhost:7835/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "model_list",
    "hive_id": "localhost"
  }'

Response:

Model List
data: {"action":"model_list_entry","message":"llama-3.2-1b | 1.23 GB | available"} data: {"action":"model_list_entry","message":"llama-3.2-3b | 3.45 GB | available"} data: [DONE]

ModelGet

Get details of a specific model.

curl -X POST http://localhost:7835/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "model_get",
    "hive_id": "localhost",
    "model": "meta-llama/Llama-3.2-1B"
  }'

ModelDelete

Delete a model from disk.

curl -X POST http://localhost:7835/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "model_delete",
    "hive_id": "localhost",
    "model": "meta-llama/Llama-3.2-1B"
  }'

Response:

Model Delete
data: {"action":"model_delete_complete","message":"Model deleted successfully"} data: [DONE]

Usage Examples

Example 1: Manual Worker Management

# 1. Spawn worker (talk to HIVE directly)
curl -X POST http://localhost:7835/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "worker_spawn",
    "hive_id": "localhost",
    "model": "meta-llama/Llama-3.2-1B",
    "worker": "cpu",
    "device": 0
  }'

# 2. Wait for worker heartbeat (automatic)

# 3. Run inference (talk to QUEEN)
curl -X POST http://localhost:7833/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "infer",
    "hive_id": "localhost",
    "model": "meta-llama/Llama-3.2-1B",
    "prompt": "Hello!",
    "max_tokens": 50,
    "stream": true
  }'

Example 2: Automatic Worker Management

# Just run inference - Queen spawns worker if needed
curl -X POST http://localhost:7833/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "infer",
    "hive_id": "localhost",
    "model": "meta-llama/Llama-3.2-1B",
    "prompt": "Hello!",
    "max_tokens": 50,
    "stream": true
  }'

# Queen internally:
# 1. Checks worker registry
# 2. If no worker: sends WorkerSpawn to hive (internal)
# 3. Waits for worker heartbeat
# 4. Routes inference directly to worker

Job Pattern

All operations follow the same pattern:

1. Submit Job

POST /v1/jobs
Content-Type: application/json

{
  "operation": "...",
  ...parameters...
}

Response:

{ "job_id": "abc-123-def-456", "sse_url": "/v1/jobs/abc-123-def-456/stream" }

2. Connect to SSE Stream

GET /v1/jobs/abc-123-def-456/stream
Accept: text/event-stream

Response:

data: {"action":"...","message":"..."} data: {"action":"...","message":"..."} data: [DONE]

3. Process Events

Parse SSE events and handle based on action field.
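
The three steps generalize into one helper. A minimal Python sketch, assuming the requests library and the submit response shown above; events with unhandled actions are ignored:

import json
import requests

def run_job(base_url: str, request: dict, handlers: dict) -> None:
    """Submit a job, then dispatch each SSE event on its 'action' field."""
    job = requests.post(f"{base_url}/v1/jobs", json=request).json()
    with requests.get(f"{base_url}{job['sse_url']}", stream=True,
                      headers={"Accept": "text/event-stream"}) as resp:
        for line in resp.iter_lines(decode_unicode=True):
            if not line or not line.startswith("data: "):
                continue
            payload = line[len("data: "):]
            if payload == "[DONE]":
                return
            event = json.loads(payload)
            handlers.get(event["action"], lambda e: None)(event)

# Usage: download a model and print only progress events.
run_job(
    "http://localhost:7835",
    {"operation": "model_download", "hive_id": "localhost",
     "model": "meta-llama/Llama-3.2-1B"},
    {"model_download_progress": lambda e: print(e["message"])},
)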

See: Job-Based Pattern for complete details

Operation Summary

Queen Operations (Port 7833)

Operation | "operation" value | Description
----------|-------------------|------------
Status | status | Query registries for cluster status
Infer | infer | Run inference with automatic worker provisioning

Hive Operations (Port 7835)

Worker Management:

Operation | "operation" value | Description
----------|-------------------|------------
WorkerSpawn | worker_spawn | Spawn new worker process
WorkerProcessList | worker_process_list | List all running workers
WorkerProcessGet | worker_process_get | Get worker details
WorkerProcessDelete | worker_process_delete | Kill worker process

Model Management:

Operation | "operation" value | Description
----------|-------------------|------------
ModelDownload | model_download | Download model from HuggingFace
ModelList | model_list | List available models
ModelGet | model_get | Get model details
ModelDelete | model_delete | Delete model from disk
