Job Operations Reference
Complete reference for all job operations in the rbee system.
Queen’s job API handles ONLY orchestration operations (Status, Infer). Worker/Model management operations go directly to Hive’s job server.
Architecture Overview
API Split
┌─────────────────────────────────────────────────────┐
│ Queen Job API (Port 7833) │
│ - Status (query registries) │
│ - Infer (schedule and route to workers) │
└─────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────┐
│ Hive Job API (Port 7835) │
│ - WorkerSpawn, WorkerProcessList, WorkerProcessGet │
│ - WorkerProcessDelete │
│ - ModelDownload, ModelList, ModelGet, ModelDelete │
└─────────────────────────────────────────────────────┘
Key principle: NO PROXYING. Talk to Queen for orchestration, talk to Hive for worker/model management.
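To make the split concrete: a status query goes to Queen on port 7833, while a model listing goes directly to Hive on port 7835. Both payloads reuse operations documented later on this page:
curl -X POST http://localhost:7833/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{"operation": "status"}'

curl -X POST http://localhost:7835/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{"operation": "model_list", "hive_id": "localhost"}'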
Queen Operations
Status
Query live status of all hives and workers from registries.
Endpoint: POST http://localhost:7833/v1/jobs
Request:
{
"operation": "status"
}
Response (via SSE):
Use case: Check cluster health and see which hives and workers are online
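A minimal end-to-end sketch, assuming the generic flow described under Job Pattern below (the <job_id> in the stream URL is a placeholder for whatever the submit call returns):
curl -X POST http://localhost:7833/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "status"
  }'

# The response contains job_id and sse_url; follow the stream to read the results:
curl -N -H "Accept: text/event-stream" \
  http://localhost:7833/v1/jobs/<job_id>/stream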
Infer
Run inference with automatic worker provisioning.
Endpoint: POST http://localhost:7833/v1/jobs
Request:
{
"operation": "infer",
"hive_id": "localhost",
"model": "meta-llama/Llama-3.2-1B",
"prompt": "Hello, how are you?",
"max_tokens": 100,
"temperature": 0.7,
"top_p": 0.9,
"stream": true
}
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| operation | string | Required | — | Must be "infer" |
| hive_id | string | Required | — | Target hive ID (e.g., "localhost", "gpu-0") |
| model | string | Required | — | Model name or HuggingFace ID |
| prompt | string | Required | — | Input prompt text |
| max_tokens | number | Optional | 100 | Maximum tokens to generate |
| temperature | number | Optional | 0.7 | Sampling temperature (0.0-2.0) |
| top_p | number | Optional | 0.9 | Nucleus sampling threshold |
| stream | boolean | Optional | true | Enable streaming output |
Response (streaming):
Flow:
- Queen checks worker registry for available worker
- If no worker exists: Queen internally sends WorkerSpawn to the hive and waits for the worker's heartbeat
- Queen routes the request DIRECTLY to the worker (bypassing the hive)
- Queen relays the SSE stream back to the client
If no worker exists for the model, Queen automatically spawns one on the target hive. You don’t need to manually spawn workers!
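A sketch of driving this flow from a shell client, assuming the submit response follows the Job Pattern described at the end of this page and that jq is available for JSON parsing (both are assumptions of this example, not requirements of the API):
# Submit the inference job to Queen
RESPONSE=$(curl -s -X POST http://localhost:7833/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "infer",
    "hive_id": "localhost",
    "model": "meta-llama/Llama-3.2-1B",
    "prompt": "Hello, how are you?",
    "max_tokens": 100,
    "stream": true
  }')

# Extract the relative SSE stream URL from the job submission response
SSE_URL=$(echo "$RESPONSE" | jq -r '.sse_url')

# Follow the token stream until the job completes
curl -N -H "Accept: text/event-stream" "http://localhost:7833$SSE_URL"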
Hive Operations
Hive Job Server: http://localhost:7835/v1/jobs
These operations are NOT available through Queen’s API. Connect directly to Hive’s job server.
Worker Operations
WorkerSpawn
Spawn a new worker process on the hive.
curl -X POST http://localhost:7835/v1/jobs \
-H "Content-Type: application/json" \
-d '{
"operation": "worker_spawn",
"hive_id": "localhost",
"model": "meta-llama/Llama-3.2-1B",
"worker": "cpu",
"device": 0
}'
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| operation | string | Required | — | Must be "worker_spawn" |
| hive_id | string | Required | — | Target hive ID |
| model | string | Required | — | Model to load |
| worker | string | Required | — | Worker type: "cpu", "cuda", "metal" |
| device | number | Required | — | Device index (0, 1, 2...) |
Response:
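The operation-specific events are not reproduced here. As with every job operation in this reference, the immediate reply to the submit call is a job handle; a sketch assuming the generic Job Pattern at the end of this page (the job_id value is illustrative):
{
  "job_id": "abc-123-def-456",
  "sse_url": "/v1/jobs/abc-123-def-456/stream"
}
Progress events for the spawn then arrive on the SSE stream at sse_url.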
WorkerProcessList
List all running worker processes on the hive.
curl -X POST http://localhost:7835/v1/jobs \
-H "Content-Type: application/json" \
-d '{
"operation": "worker_process_list",
"hive_id": "localhost"
}'
Response:
WorkerProcessGet
Get details of a specific worker process.
curl -X POST http://localhost:7835/v1/jobs \
-H "Content-Type: application/json" \
-d '{
"operation": "worker_process_get",
"hive_id": "localhost",
"worker_id": "worker-123"
}'
WorkerProcessDelete
Kill a worker process (SIGTERM → SIGKILL).
curl -X POST http://localhost:7835/v1/jobs \
-H "Content-Type: application/json" \
-d '{
"operation": "worker_process_delete",
"hive_id": "localhost",
"worker_id": "worker-123"
}'
Response:
Model Operations
ModelDownload
Download a model from HuggingFace.
curl -X POST http://localhost:7835/v1/jobs \
-H "Content-Type: application/json" \
-d '{
"operation": "model_download",
"hive_id": "localhost",
"model": "meta-llama/Llama-3.2-1B"
}'
Response:
ModelList
List all models available on the hive.
curl -X POST http://localhost:7835/v1/jobs \
-H "Content-Type: application/json" \
-d '{
"operation": "model_list",
"hive_id": "localhost"
}'
Response:
ModelGet
Get details of a specific model.
curl -X POST http://localhost:7835/v1/jobs \
-H "Content-Type: application/json" \
-d '{
"operation": "model_get",
"hive_id": "localhost",
"model": "meta-llama/Llama-3.2-1B"
}'
ModelDelete
Delete a model from disk.
curl -X POST http://localhost:7835/v1/jobs \
-H "Content-Type: application/json" \
-d '{
"operation": "model_delete",
"hive_id": "localhost",
"model": "meta-llama/Llama-3.2-1B"
}'
Response:
Usage Examples
Example 1: Manual Worker Management
# 1. Spawn worker (talk to HIVE directly)
curl -X POST http://localhost:7835/v1/jobs \
-H "Content-Type: application/json" \
-d '{
"operation": "worker_spawn",
"hive_id": "localhost",
"model": "meta-llama/Llama-3.2-1B",
"worker": "cpu",
"device": 0
}'
# 2. Wait for worker heartbeat (automatic)
# 3. Run inference (talk to QUEEN)
curl -X POST http://localhost:7833/v1/jobs \
-H "Content-Type: application/json" \
-d '{
"operation": "infer",
"hive_id": "localhost",
"model": "meta-llama/Llama-3.2-1B",
"prompt": "Hello!",
"max_tokens": 50,
"stream": true
}'
Example 2: Automatic Worker Management
# Just run inference - Queen spawns worker if needed
curl -X POST http://localhost:7833/v1/jobs \
-H "Content-Type: application/json" \
-d '{
"operation": "infer",
"hive_id": "localhost",
"model": "meta-llama/Llama-3.2-1B",
"prompt": "Hello!",
"max_tokens": 50,
"stream": true
}'
# Queen internally:
# 1. Checks worker registry
# 2. If no worker: sends WorkerSpawn to hive (internal)
# 3. Waits for worker heartbeat
# 4. Routes inference directly to worker
Let Queen handle worker provisioning automatically. Only manually spawn workers for advanced use cases.
Job Pattern
All operations follow the same pattern:
1. Submit Job
POST /v1/jobs
Content-Type: application/json
{
"operation": "...",
...parameters...
}
Response:
{
"job_id": "abc-123-def-456",
"sse_url": "/v1/jobs/abc-123-def-456/stream"
}
2. Connect to SSE Stream
GET /v1/jobs/abc-123-def-456/stream
Accept: text/event-stream
Response:
data: {"action":"...","message":"..."}
data: {"action":"...","message":"..."}
data: [DONE]
3. Process Events
Parse each SSE event and handle it based on its action field.
See: Job-Based Pattern for complete details
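A minimal consumer sketch in shell, assuming jq is available and that every event carries an action field as shown above (the job ID in the URL is a placeholder):
curl -N -H "Accept: text/event-stream" \
  http://localhost:7833/v1/jobs/abc-123-def-456/stream |
while IFS= read -r line; do
  case "$line" in
    "data: [DONE]")
      echo "job complete"
      break
      ;;
    data:*)
      # Strip the "data: " prefix and pull out the action field
      payload="${line#data: }"
      action=$(printf '%s' "$payload" | jq -r '.action')
      echo "received event: $action"
      ;;
  esac
done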
Operation Summary
Queen Operations (Port 7833)
| Operation | Description |
|---|---|
| Status | Query registries for cluster status |
| Infer | Run inference with automatic worker provisioning |
Hive Operations (Port 7835)
Worker Management:
| Operation | Description |
|---|---|
| WorkerSpawn | Spawn new worker process |
| WorkerProcessList | List all running workers |
| WorkerProcessGet | Get worker details |
| WorkerProcessDelete | Kill worker process |
Model Management:
| Operation | Description |
|---|---|
| ModelDownload | Download model from HuggingFace |
| ModelList | List available models |
| ModelGet | Get model details |
| ModelDelete | Delete model from disk |