# Queen vs Hive: API Split
rbee has TWO separate job servers. Understanding which to use is critical.
## Why Two Servers?
Queen handles orchestration (inference, status).
Hive handles lifecycle (workers, models).
This separation keeps concerns clean:
- Queen focuses on routing and scheduling
- Hive focuses on resource management
- Workers focus on inference execution
## Queen Job Server (Port 7833)
Only two operations:
| Operation | Description |
|---|---|
| `Status` | Query worker and hive registries for current state |
| `Infer` | Schedule an inference request (queen routes directly to the worker) |
### Status Operation
```bash
curl -X POST http://localhost:7833/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{"operation": "status"}'
```

**Returns:** Current state of all hives and workers from the registries.
### Infer Operation
```bash
curl -X POST http://localhost:7833/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "infer",
    "model": "llama-3-8b",
    "prompt": "Hello!",
    "max_tokens": 50
  }'
```

Flow:
1. Queen checks the worker registry for an available worker
2. If no worker is available: queen sends a `WorkerSpawn` job to the hive (internal) and waits for the heartbeat
3. Queen routes the request DIRECTLY to the worker (bypassing the hive)
4. Queen relays the SSE stream back to the client
> **Critical:** Inference NEVER goes through the hive. Queen routes directly to the worker.
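To watch that relayed stream from a terminal, `curl -N` disables output buffering so events print as they arrive. A minimal sketch, assuming the infer job answers with an SSE body as described in the flow above (the exact event payload format is not specified here):

```bash
# Stream the inference response as it arrives.
# -N disables curl's output buffering so SSE events print immediately.
# Assumption: the infer job responds with an SSE stream, per the flow above.
curl -N -X POST http://localhost:7833/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{"operation": "infer", "model": "llama-3-8b", "prompt": "Hello!", "max_tokens": 50}'
```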
### OpenAI-Compatible Endpoints
Queen also provides OpenAI-compatible endpoints:
| Method | Endpoint | Description |
|---|---|---|
| POST | `/openai/v1/chat/completions` | OpenAI chat completions (streaming supported) |
| GET | `/openai/v1/models` | List available models |
| GET | `/openai/v1/models/{model}` | Get model details |
| GET | `/v1/heartbeats/stream` | SSE stream of all heartbeat events |
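As a sketch of the compatibility surface, a standard OpenAI-style chat request should work against queen; which request fields beyond `model`, `messages`, and `stream` are honored is an assumption here, not confirmed by this page:

```bash
# OpenAI-style chat completion against queen's compatibility endpoint.
# Assumption: support for fields beyond model/messages/stream is unverified.
curl -X POST http://localhost:7833/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3-8b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'
```

The heartbeat feed is plain SSE, so `curl -N http://localhost:7833/v1/heartbeats/stream` tails it directly.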
## Hive Job Server (Port 7835)
Eight operations, covering worker and model management:
### Worker Operations
| Operation | Description |
|---|---|
| `WorkerSpawn` | Spawn a new worker process |
| `WorkerProcessList` | List all worker processes on this hive |
| `WorkerProcessGet` | Get details of a specific worker |
| `WorkerProcessDelete` | Kill a worker process |
#### Example: Spawn Worker
```bash
curl -X POST http://localhost:7835/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "worker_spawn",
    "hive_id": "localhost",
    "model": "meta-llama/Llama-3.2-1B",
    "worker": "cpu",
    "device": 0
  }'
```

#### Example: List Workers
```bash
curl -X POST http://localhost:7835/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "worker_process_list",
    "hive_id": "localhost"
  }'
```

### Model Operations
| Operation | Description |
|---|---|
| `ModelDownload` | Download a model from HuggingFace |
| `ModelList` | List all models in the local catalog |
| `ModelGet` | Get details of a specific model |
| `ModelDelete` | Delete a model from the local catalog |
#### Example: Download Model
```bash
curl -X POST http://localhost:7835/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "model_download",
    "hive_id": "localhost",
    "model": "meta-llama/Llama-3.2-1B"
  }'
```

#### Example: List Models
```bash
curl -X POST http://localhost:7835/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "model_list",
    "hive_id": "localhost"
  }'
```

## Architecture Summary
### rbee-keeper CLI

```
rbee-keeper CLI
├─→ Queen Job Server (http://localhost:7833/v1/jobs)
│    ├─ Status
│    └─ Infer
│
└─→ Hive Job Server (http://localhost:7835/v1/jobs)
     ├─ WorkerSpawn, WorkerProcessList, WorkerProcessGet, WorkerProcessDelete
     └─ ModelDownload, ModelList, ModelGet, ModelDelete
```

> **No Proxying:** rbee-keeper talks directly to queen AND hive. There is no proxying.
### rbee-keeper GUI

```
rbee-keeper GUI
├─→ Queen Web UI  (iframe: http://localhost:7833/)
├─→ Hive Web UI   (iframe: http://localhost:7835/)
└─→ Worker Web UI (iframe: http://localhost:8080/)
```

Direct SDK access: the GUI opens each web UI in an iframe and uses the SDK directly.
### Inference Flow

```
Client → Queen (scheduling) → Worker (DIRECT)
              ↘ Hive (internal: spawn worker if needed)
```

> **Critical:**
> - Hive is NEVER in the inference path
> - Queen routes directly to the worker
> - Hive is only used for worker lifecycle (an internal queen operation)
## Decision Tree: Which Server?
**Use Queen (7833) when:**
- ✅ Running inference
- ✅ Checking system status
- ✅ Using OpenAI-compatible API
- ✅ Monitoring heartbeats
**Use Hive (7835) when:**
- ✅ Managing workers manually
- ✅ Managing models manually
- ✅ Checking local hive status
- ✅ Debugging worker issues
## Port Reference
| Port | Component | Description |
|---|---|---|
| 7833 | Queen | Queen job server and OpenAI-compatible API |
| 7835 | Hive | Hive job server for worker/model management |
| 9000+ | Workers | Worker inference servers (9001, 9002, 9003, ...) |
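A quick way to confirm both servers are up is to exercise the cheapest documented operation on each. This sketch only chains operations listed above:

```bash
# Check queen (7833) with its status operation.
curl -s -X POST http://localhost:7833/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{"operation": "status"}'

# Check hive (7835) by listing worker processes.
curl -s -X POST http://localhost:7835/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{"operation": "worker_process_list", "hive_id": "localhost"}'
```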
## Examples
### Example 1: Manual Worker Management
```bash
# 1. Spawn worker (talk to HIVE directly)
curl -X POST http://localhost:7835/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{"operation": "worker_spawn", "hive_id": "localhost", "model": "llama-3-8b"}'

# 2. Wait for worker heartbeat (automatic)

# 3. Run inference (talk to QUEEN)
curl -X POST http://localhost:7833/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{"operation": "infer", "model": "llama-3-8b", "prompt": "Hello!"}'
```

### Example 2: Automatic Worker Management
```bash
# Just run inference - queen spawns a worker if needed
curl -X POST http://localhost:7833/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{"operation": "infer", "model": "llama-3-8b", "prompt": "Hello!"}'

# Queen internally:
# 1. Checks the worker registry
# 2. If no worker: sends WorkerSpawn to the hive (internal)
# 3. Waits for the worker heartbeat
# 4. Routes inference directly to the worker
```
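Putting the two servers together, a cold start from nothing looks roughly like the sequence below. It is a sketch that only chains operations documented on this page, reusing the model name from the earlier hive examples:

```bash
# Cold start: download a model, spawn a worker for it, then run inference.

# 1. Download the model (HIVE, port 7835)
curl -X POST http://localhost:7835/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{"operation": "model_download", "hive_id": "localhost", "model": "meta-llama/Llama-3.2-1B"}'

# 2. Spawn a worker for it (HIVE, port 7835)
curl -X POST http://localhost:7835/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{"operation": "worker_spawn", "hive_id": "localhost", "model": "meta-llama/Llama-3.2-1B", "worker": "cpu", "device": 0}'

# 3. Run inference (QUEEN, port 7833 - routed directly to the worker)
curl -X POST http://localhost:7833/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{"operation": "infer", "model": "meta-llama/Llama-3.2-1B", "prompt": "Hello!", "max_tokens": 50}'
```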