# Queen vs Hive: API Split
rbee has TWO separate job servers. Understanding which to use is critical.
## Why Two Servers?
Queen handles orchestration (inference, status).
Hive handles lifecycle (workers, models).
This separation keeps concerns clean:
- Queen focuses on routing and scheduling
- Hive focuses on resource management
- Workers focus on inference execution
## Queen Job Server (Port 7833)
Only two operations:
| Operation | Description |
|---|---|
| `Status` | Query worker and hive registries for current state |
| `Infer` | Schedule an inference request (queen routes directly to the worker) |
### Status Operation
```bash
curl -X POST http://localhost:7833/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{"operation": "status"}'
```

**Returns:** Current state of all hives and workers from the registries.
### Infer Operation
```bash
curl -X POST http://localhost:7833/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "infer",
    "model": "llama-3-8b",
    "prompt": "Hello!",
    "max_tokens": 50
  }'
```

Flow:
1. Queen checks the worker registry for an available worker
2. If no worker is available: queen sends a `WorkerSpawn` job to the hive (internal) and waits for the heartbeat
3. Queen routes the request DIRECTLY to the worker (bypassing the hive)
4. Queen relays the SSE stream back to the client
> **Critical:** Inference NEVER goes through the hive. Queen routes directly to the worker.
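To watch that relayed stream from a terminal, `curl -N` disables output buffering so events print as they arrive. A minimal sketch, assuming the infer job answers with an SSE body as described in the flow above (the exact event payload format is not specified here):

```bash
# Stream the inference response as it arrives.
# -N disables curl's output buffering so SSE events print immediately.
# Assumption: the infer job responds with an SSE stream, per the flow above.
curl -N -X POST http://localhost:7833/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{"operation": "infer", "model": "llama-3-8b", "prompt": "Hello!", "max_tokens": 50}'
```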
### OpenAI-Compatible Endpoints
Queen also provides OpenAI-compatible endpoints:
| Method | Endpoint | Description |
|---|---|---|
| POST | `/openai/v1/chat/completions` | OpenAI chat completions (streaming supported) |
| GET | `/openai/v1/models` | List available models |
| GET | `/openai/v1/models/{model}` | Get model details |
| GET | `/v1/heartbeats/stream` | SSE stream of all heartbeat events |
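As a sketch of the compatibility surface, a standard OpenAI-style chat request should work against queen; which request fields beyond `model`, `messages`, and `stream` are honored is an assumption here, not confirmed by this page:

```bash
# OpenAI-style chat completion against queen's compatibility endpoint.
# Assumption: support for fields beyond model/messages/stream is unverified.
curl -X POST http://localhost:7833/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3-8b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'
```

The heartbeat feed is plain SSE, so `curl -N http://localhost:7833/v1/heartbeats/stream` tails it directly.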
## Hive Job Server (Port 7835)
Eight operations, covering worker and model management:
### Worker Operations
| Operation | Description |
|---|---|
| `WorkerSpawn` | Spawn a new worker process |
| `WorkerProcessList` | List all worker processes on this hive |
| `WorkerProcessGet` | Get details of a specific worker |
| `WorkerProcessDelete` | Kill a worker process |
#### Example: Spawn Worker
```bash
curl -X POST http://localhost:7835/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "worker_spawn",
    "hive_id": "localhost",
    "model": "meta-llama/Llama-3.2-1B",
    "worker": "cpu",
    "device": 0
  }'
```

#### Example: List Workers
```bash
curl -X POST http://localhost:7835/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "worker_process_list",
    "hive_id": "localhost"
  }'
```

### Model Operations
| Operation | Description |
|---|---|
| `ModelDownload` | Download a model from HuggingFace |
| `ModelList` | List all models in the local catalog |
| `ModelGet` | Get details of a specific model |
| `ModelDelete` | Delete a model from the local catalog |
#### Example: Download Model
```bash
curl -X POST http://localhost:7835/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "model_download",
    "hive_id": "localhost",
    "model": "meta-llama/Llama-3.2-1B"
  }'
```

#### Example: List Models
```bash
curl -X POST http://localhost:7835/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "model_list",
    "hive_id": "localhost"
  }'
```

## Architecture Summary
### rbee-keeper CLI

```
rbee-keeper CLI
├─→ Queen Job Server (http://localhost:7833/v1/jobs)
│    ├─ Status
│    └─ Infer
│
└─→ Hive Job Server (http://localhost:7835/v1/jobs)
     ├─ WorkerSpawn, WorkerProcessList, WorkerProcessGet, WorkerProcessDelete
     └─ ModelDownload, ModelList, ModelGet, ModelDelete
```

> **No Proxying:** rbee-keeper talks directly to queen AND hive. There is no proxying.
### rbee-keeper GUI

```
rbee-keeper GUI
├─→ Queen Web UI  (iframe: http://localhost:7833/)
├─→ Hive Web UI   (iframe: http://localhost:7835/)
└─→ Worker Web UI (iframe: http://localhost:8080/)
```

Direct SDK access: the GUI opens each web UI in an iframe and uses the SDK directly.
### Inference Flow

```
Client → Queen (scheduling) → Worker (DIRECT)
              ↘ Hive (internal: spawn worker if needed)
```

> **Critical:**
> - Hive is NEVER in the inference path
> - Queen routes directly to the worker
> - Hive is only used for worker lifecycle (an internal queen operation)
## Decision Tree: Which Server?
**Use Queen (7833) when:**
- ✅ Running inference
- ✅ Checking system status
- ✅ Using OpenAI-compatible API
- ✅ Monitoring heartbeats
**Use Hive (7835) when:**
- ✅ Managing workers manually
- ✅ Managing models manually
- ✅ Checking local hive status
- ✅ Debugging worker issues
## Port Reference
| Port | Component | Description |
|---|---|---|
| 7833 | Queen | Queen job server and OpenAI-compatible API |
| 7835 | Hive | Hive job server for worker/model management |
| 9000+ | Workers | Worker inference servers (9001, 9002, 9003, ...) |
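A quick way to confirm both servers are up is to exercise the cheapest documented operation on each. This sketch only chains operations listed above:

```bash
# Check queen (7833) with its status operation.
curl -s -X POST http://localhost:7833/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{"operation": "status"}'

# Check hive (7835) by listing worker processes.
curl -s -X POST http://localhost:7835/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{"operation": "worker_process_list", "hive_id": "localhost"}'
```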
## Examples
### Example 1: Manual Worker Management
```bash
# 1. Spawn worker (talk to HIVE directly)
curl -X POST http://localhost:7835/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{"operation": "worker_spawn", "hive_id": "localhost", "model": "llama-3-8b"}'

# 2. Wait for worker heartbeat (automatic)

# 3. Run inference (talk to QUEEN)
curl -X POST http://localhost:7833/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{"operation": "infer", "model": "llama-3-8b", "prompt": "Hello!"}'
```

### Example 2: Automatic Worker Management
```bash
# Just run inference - queen spawns a worker if needed
curl -X POST http://localhost:7833/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{"operation": "infer", "model": "llama-3-8b", "prompt": "Hello!"}'

# Queen internally:
# 1. Checks the worker registry
# 2. If no worker: sends WorkerSpawn to the hive (internal)
# 3. Waits for the worker heartbeat
# 4. Routes inference directly to the worker
```
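Putting the two servers together, a cold start from nothing looks roughly like the sequence below. It is a sketch that only chains operations documented on this page, reusing the model name from the earlier hive examples:

```bash
# Cold start: download a model, spawn a worker for it, then run inference.

# 1. Download the model (HIVE, port 7835)
curl -X POST http://localhost:7835/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{"operation": "model_download", "hive_id": "localhost", "model": "meta-llama/Llama-3.2-1B"}'

# 2. Spawn a worker for it (HIVE, port 7835)
curl -X POST http://localhost:7835/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{"operation": "worker_spawn", "hive_id": "localhost", "model": "meta-llama/Llama-3.2-1B", "worker": "cpu", "device": 0}'

# 3. Run inference (QUEEN, port 7833 - routed directly to the worker)
curl -X POST http://localhost:7833/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{"operation": "infer", "model": "meta-llama/Llama-3.2-1B", "prompt": "Hello!", "max_tokens": 50}'
```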