Job Operations Reference

Complete reference for all job operations in the rbee system.

Architecture Overview

API Split

┌─────────────────────────────────────────────────────┐
│ Queen Job API (Port 7833)                           │
│ - Status (query registries)                         │
│ - Infer (schedule and route to workers)             │
└─────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────┐
│ Hive Job API (Port 7835)                            │
│ - WorkerSpawn, WorkerProcessList, WorkerProcessGet  │
│ - WorkerProcessDelete                               │
│ - ModelDownload, ModelList, ModelGet, ModelDelete   │
└─────────────────────────────────────────────────────┘

Key principle: NO PROXYING. Talk to Queen for orchestration, talk to Hive for worker/model management.
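
Because the two APIs never proxy for each other, a client must pick the right daemon per operation. A minimal routing sketch in Python (the endpoint constants assume the default local ports shown above):

# Route each operation to the daemon that owns it (assumed default ports).
QUEEN_JOBS = "http://localhost:7833/v1/jobs"  # status, infer
HIVE_JOBS = "http://localhost:7835/v1/jobs"   # worker_* and model_* operations

def jobs_url(operation: str) -> str:
    """Return the job endpoint that owns the given operation."""
    return QUEEN_JOBS if operation in ("status", "infer") else HIVE_JOBS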

Queen Operations

Status

Query live status of all hives and workers from registries.

Endpoint: POST http://localhost:7833/v1/jobs

Request:

{ "operation": "status" }

Response (via SSE):

Status Output
data: {"action":"status_start","message":"Querying registries..."} data: {"action":"status_hives","message":"Hives: 2 online, 2 available"} data: {"action":"status_workers","message":"Workers: 4 online, 3 available"} data: {"action":"status_complete","message":"Status query complete"} data: [DONE]

Use case: Check cluster health, see what’s online
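
A quick health check from Python, as a minimal sketch: it assumes the requests library and the submit-then-stream job pattern described under Job Pattern below.

import json
import requests

QUEEN = "http://localhost:7833"

# Submit the status job, then follow its SSE stream and print each message.
job = requests.post(f"{QUEEN}/v1/jobs", json={"operation": "status"}).json()
with requests.get(f"{QUEEN}{job['sse_url']}", stream=True,
                  headers={"Accept": "text/event-stream"}) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        if line and line.startswith("data: ") and line != "data: [DONE]":
            print(json.loads(line[len("data: "):])["message"])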


Infer

Run inference with automatic worker provisioning.

Endpoint: POST http://localhost:7833/v1/jobs

Request:

{ "operation": "infer", "hive_id": "localhost", "model": "meta-llama/Llama-3.2-1B", "prompt": "Hello, how are you?", "max_tokens": 100, "temperature": 0.7, "top_p": 0.9, "stream": true }

Parameters:

Parameter | Type | Required | Default | Description
----------|------|----------|---------|------------
operation | string | Required | — | Must be "infer"
hive_id | string | Required | — | Target hive ID (e.g., "localhost", "gpu-0")
model | string | Required | — | Model name or HuggingFace ID
prompt | string | Required | — | Input prompt text
max_tokens | number | Optional | 100 | Maximum tokens to generate
temperature | number | Optional | 0.7 | Sampling temperature 0.0-2.0
top_p | number | Optional | 0.9 | Nucleus sampling threshold
stream | boolean | Optional | true | Enable streaming output

Response (streaming):

Inference Output
data: {"action":"infer_start","message":"Starting inference..."} data: {"action":"token","message":"Hello"} data: {"action":"token","message":"!"} data: {"action":"token","message":" How"} data: {"action":"token","message":" can"} data: {"action":"token","message":" I"} data: {"action":"token","message":" help"} data: {"action":"token","message":" you"} data: {"action":"token","message":" today"} data: {"action":"token","message":"?"} data: {"action":"infer_complete","message":"Inference complete"} data: [DONE]

Flow:

  1. Queen checks worker registry for available worker
  2. If no worker: Queen internally sends WorkerSpawn to hive, waits for heartbeat
  3. Queen routes request DIRECTLY to worker (bypassing hive)
  4. Queen relays SSE stream back to client
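
Putting the pieces together, here is a minimal streaming client sketch in Python. It assumes the requests library and the job_id/sse_url submit response documented under Job Pattern below; only the "token" events are accumulated into the completion.

import json
import requests

QUEEN = "http://localhost:7833"

def infer(prompt: str, model: str = "meta-llama/Llama-3.2-1B") -> str:
    # 1. Submit the infer job to the Queen.
    job = requests.post(f"{QUEEN}/v1/jobs", json={
        "operation": "infer",
        "hive_id": "localhost",
        "model": model,
        "prompt": prompt,
        "max_tokens": 100,
        "stream": True,
    }).json()

    # 2. Follow the SSE stream, collecting only token events.
    tokens = []
    with requests.get(f"{QUEEN}{job['sse_url']}", stream=True,
                      headers={"Accept": "text/event-stream"}) as resp:
        for line in resp.iter_lines(decode_unicode=True):
            if not line or not line.startswith("data: "):
                continue
            payload = line[len("data: "):]
            if payload == "[DONE]":
                break
            event = json.loads(payload)
            if event["action"] == "token":
                tokens.append(event["message"])
    return "".join(tokens)

print(infer("Hello, how are you?"))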

Hive Operations

Hive Job Server: http://localhost:7835/v1/jobs

Worker Operations

WorkerSpawn

Spawn a new worker process on the hive.

curl -X POST http://localhost:7835/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "worker_spawn",
    "hive_id": "localhost",
    "model": "meta-llama/Llama-3.2-1B",
    "worker": "cpu",
    "device": 0
  }'

Parameters:

Parameter | Type | Required | Default | Description
----------|------|----------|---------|------------
operation | string | Required | — | Must be "worker_spawn"
hive_id | string | Required | — | Target hive ID
model | string | Required | — | Model to load
worker | string | Required | — | Worker type: "cpu", "cuda", "metal"
device | number | Required | — | Device index (0, 1, 2...)

Response:

Worker Spawn
data: {"action":"worker_spawn_start","message":"Spawning worker..."} data: {"action":"worker_spawn_health_check","message":"Waiting for worker to start..."} data: {"action":"worker_spawn_complete","message":"Worker spawned (PID: 1234, port: 9301)"} data: [DONE]

WorkerProcessList

List all running worker processes on the hive.

curl -X POST http://localhost:7835/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "worker_process_list",
    "hive_id": "localhost"
  }'

Response:

Worker List
data: {"action":"worker_proc_list_entry","message":"PID 1234 | llama-3.2-1b | GPU 0 | running"} data: {"action":"worker_proc_list_entry","message":"PID 1235 | llama-3.2-3b | GPU 1 | running"} data: [DONE]

WorkerProcessGet

Get details of a specific worker process.

curl -X POST http://localhost:7835/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "worker_process_get",
    "hive_id": "localhost",
    "worker_id": "worker-123"
  }'

WorkerProcessDelete

Kill a worker process (SIGTERM → SIGKILL).

curl -X POST http://localhost:7835/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "worker_process_delete",
    "hive_id": "localhost",
    "worker_id": "worker-123"
  }'

Response:

Worker Delete
data: {"action":"worker_proc_del_start","message":"Killing worker PID 1234"} data: {"action":"worker_proc_del_sigterm","message":"Sent SIGTERM"} data: {"action":"worker_proc_del_ok","message":"Worker killed successfully"} data: [DONE]

Model Operations

ModelDownload

Download a model from HuggingFace.

curl -X POST http://localhost:7835/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "model_download",
    "hive_id": "localhost",
    "model": "meta-llama/Llama-3.2-1B"
  }'

Response:

Model Download
data: {"action":"model_download_start","message":"Downloading llama-3.2-1b"} data: {"action":"model_download_progress","message":"Downloaded 123 MB / 1230 MB (10%)"} data: {"action":"model_download_progress","message":"Downloaded 246 MB / 1230 MB (20%)"} data: {"action":"model_download_complete","message":"Download complete"} data: [DONE]

ModelList

List all models available on the hive.

curl -X POST http://localhost:7835/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "model_list",
    "hive_id": "localhost"
  }'

Response:

Model List
data: {"action":"model_list_entry","message":"llama-3.2-1b | 1.23 GB | available"} data: {"action":"model_list_entry","message":"llama-3.2-3b | 3.45 GB | available"} data: [DONE]

ModelGet

Get details of a specific model.

curl -X POST http://localhost:7835/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "model_get",
    "hive_id": "localhost",
    "model": "meta-llama/Llama-3.2-1B"
  }'

ModelDelete

Delete a model from disk.

curl -X POST http://localhost:7835/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "model_delete",
    "hive_id": "localhost",
    "model": "meta-llama/Llama-3.2-1B"
  }'

Response:

Model Delete
data: {"action":"model_delete_complete","message":"Model deleted successfully"} data: [DONE]

Usage Examples

Example 1: Manual Worker Management

# 1. Spawn worker (talk to HIVE directly)
curl -X POST http://localhost:7835/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "worker_spawn",
    "hive_id": "localhost",
    "model": "meta-llama/Llama-3.2-1B",
    "worker": "cpu",
    "device": 0
  }'

# 2. Wait for worker heartbeat (automatic)

# 3. Run inference (talk to QUEEN)
curl -X POST http://localhost:7833/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "infer",
    "hive_id": "localhost",
    "model": "meta-llama/Llama-3.2-1B",
    "prompt": "Hello!",
    "max_tokens": 50,
    "stream": true
  }'

Example 2: Automatic Worker Management

# Just run inference - Queen spawns worker if needed
curl -X POST http://localhost:7833/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "infer",
    "hive_id": "localhost",
    "model": "meta-llama/Llama-3.2-1B",
    "prompt": "Hello!",
    "max_tokens": 50,
    "stream": true
  }'

# Queen internally:
# 1. Checks worker registry
# 2. If no worker: sends WorkerSpawn to hive (internal)
# 3. Waits for worker heartbeat
# 4. Routes inference directly to worker

Job Pattern

All operations follow the same pattern:

1. Submit Job

POST /v1/jobs
Content-Type: application/json

{
  "operation": "...",
  ...parameters...
}

Response:

{ "job_id": "abc-123-def-456", "sse_url": "/v1/jobs/abc-123-def-456/stream" }

2. Connect to SSE Stream

GET /v1/jobs/abc-123-def-456/stream
Accept: text/event-stream

Response:

data: {"action":"...","message":"..."} data: {"action":"...","message":"..."} data: [DONE]

3. Process Events

Parse SSE events and handle based on action field.
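
The three steps generalize into one helper. A minimal Python sketch, assuming the requests library and the submit response shown above; events with unhandled actions are ignored:

import json
import requests

def run_job(base_url: str, request: dict, handlers: dict) -> None:
    """Submit a job, then dispatch each SSE event on its 'action' field."""
    job = requests.post(f"{base_url}/v1/jobs", json=request).json()
    with requests.get(f"{base_url}{job['sse_url']}", stream=True,
                      headers={"Accept": "text/event-stream"}) as resp:
        for line in resp.iter_lines(decode_unicode=True):
            if not line or not line.startswith("data: "):
                continue
            payload = line[len("data: "):]
            if payload == "[DONE]":
                return
            event = json.loads(payload)
            handlers.get(event["action"], lambda e: None)(event)

# Usage: download a model and print only progress events.
run_job(
    "http://localhost:7835",
    {"operation": "model_download", "hive_id": "localhost",
     "model": "meta-llama/Llama-3.2-1B"},
    {"model_download_progress": lambda e: print(e["message"])},
)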

See: Job-Based Pattern for complete details

Operation Summary

Queen Operations (Port 7833)

Operation | "operation" value | Description
----------|-------------------|------------
Status | status | Query registries for cluster status
Infer | infer | Run inference with automatic worker provisioning

Hive Operations (Port 7835)

Worker Management:

Operation | "operation" value | Description
----------|-------------------|------------
WorkerSpawn | worker_spawn | Spawn new worker process
WorkerProcessList | worker_process_list | List all running workers
WorkerProcessGet | worker_process_get | Get worker details
WorkerProcessDelete | worker_process_delete | Kill worker process

Model Management:

Operation | "operation" value | Description
----------|-------------------|------------
ModelDownload | model_download | Download model from HuggingFace
ModelList | model_list | List available models
ModelGet | model_get | Get model details
ModelDelete | model_delete | Delete model from disk
