
Queen vs Hive: API Split

rbee has TWO separate job servers. Understanding which to use is critical.

Why Two Servers?

This separation keeps concerns clean:

  • Queen focuses on routing and scheduling
  • Hive focuses on resource management
  • Workers focus on inference execution

Queen Job Server (Port 7833)

The Queen exposes only two operations:

Operation   Description
Status      Query the worker and hive registries for current state
Infer       Schedule an inference request (Queen routes it directly to a worker)

Status Operation

bash
curl -X POST http://localhost:7833/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{"operation": "status"}'

Returns: Current state of all hives and workers from registries.

Infer Operation

bash
curl -X POST http://localhost:7833/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "infer",
    "model": "llama-3-8b",
    "prompt": "Hello!",
    "max_tokens": 50
  }'

Flow:

  1. Queen checks the worker registry for an available worker
  2. If none is available, Queen sends a WorkerSpawn job to the hive (internal) and waits for the worker's heartbeat
  3. Queen routes the request DIRECTLY to the worker, bypassing the hive
  4. Queen relays the worker's SSE stream back to the client
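The routing decision above can be expressed as a short sketch. This is illustrative only: the registry shape and the `spawn_worker` / `route_to_worker` helpers are hypothetical stand-ins, not rbee's actual internals.

```python
# Minimal sketch of the Queen-side routing decision described above.
# The registry shape and helper names are hypothetical, not rbee's API.

def route_infer(request, worker_registry, spawn_worker, route_to_worker):
    """Route an infer job: reuse a live worker, or ask the hive to spawn one."""
    model = request["model"]
    # 1. Check the worker registry for a live worker already serving this model
    worker = next(
        (w for w in worker_registry if w["model"] == model and w["alive"]),
        None,
    )
    # 2. No worker available: send an internal WorkerSpawn job to the hive
    #    and wait for the new worker's first heartbeat
    if worker is None:
        worker = spawn_worker(model)  # blocks until heartbeat received
        worker_registry.append(worker)
    # 3. Route the request DIRECTLY to the worker, bypassing the hive
    return route_to_worker(worker, request)

# Toy usage with stub helpers
registry = [{"model": "llama-3-8b", "alive": True, "url": "http://localhost:9001"}]
result = route_infer(
    {"operation": "infer", "model": "llama-3-8b", "prompt": "Hello!"},
    registry,
    spawn_worker=lambda m: {"model": m, "alive": True, "url": "http://localhost:9002"},
    route_to_worker=lambda w, r: w["url"],
)
print(result)  # → http://localhost:9001
```

The key design point is step 3: the hive is only involved in spawning, never in the request path, so inference latency does not pay for an extra hop.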

OpenAI-Compatible Endpoints

Queen also provides OpenAI-compatible endpoints:

Endpoint                           Description
POST /openai/v1/chat/completions   OpenAI chat completions (streaming supported)
GET /openai/v1/models              List available models
GET /openai/v1/models/{model}      Get model details
GET /v1/heartbeats/stream          SSE stream of all heartbeat events
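With streaming enabled, the chat-completions endpoint emits server-sent events, one `data:` line per token chunk, ending with a `data: [DONE]` sentinel. A minimal client-side sketch, assuming the standard OpenAI wire format (the `parse_sse_chunks` helper is hypothetical, not part of rbee):

```python
import json

def parse_sse_chunks(lines):
    """Collect content deltas from OpenAI-style SSE lines ('data: {...}')."""
    text = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data: "):]
        if payload == "[DONE]":  # end-of-stream sentinel
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"].get("content", "")
        text.append(delta)
    return "".join(text)

# Example stream, shaped like what Queen would relay for a streaming completion
stream = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    "data: [DONE]",
]
print(parse_sse_chunks(stream))  # → Hello!
```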

Hive Job Server (Port 7835)

The Hive exposes eight operations for worker and model management:

Worker Operations

Operation            Description
WorkerSpawn          Spawn a new worker process
WorkerProcessList    List all worker processes on this hive
WorkerProcessGet     Get details of a specific worker
WorkerProcessDelete  Kill a worker process

Example: Spawn Worker

bash
curl -X POST http://localhost:7835/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "worker_spawn",
    "hive_id": "localhost",
    "model": "meta-llama/Llama-3.2-1B",
    "worker": "cpu",
    "device": 0
  }'

Example: List Workers

bash
curl -X POST http://localhost:7835/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "worker_process_list",
    "hive_id": "localhost"
  }'

Model Operations

Operation      Description
ModelDownload  Download a model from HuggingFace
ModelList      List all models in the local catalog
ModelGet       Get details of a specific model
ModelDelete    Delete a model from the local catalog

Example: Download Model

bash
curl -X POST http://localhost:7835/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "model_download",
    "hive_id": "localhost",
    "model": "meta-llama/Llama-3.2-1B"
  }'

Example: List Models

bash
curl -X POST http://localhost:7835/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "model_list",
    "hive_id": "localhost"
  }'

Architecture Summary

rbee-keeper CLI

rbee-keeper CLI
├─→ Queen Job Server (http://localhost:7833/v1/jobs)
│   ├─ Status
│   └─ Infer
└─→ Hive Job Server (http://localhost:7835/v1/jobs)
    ├─ WorkerSpawn, WorkerProcessList, WorkerProcessGet, WorkerProcessDelete
    └─ ModelDownload, ModelList, ModelGet, ModelDelete

rbee-keeper GUI

rbee-keeper GUI
├─→ Queen Web UI (iframe: http://localhost:7833/)
├─→ Hive Web UI (iframe: http://localhost:7835/)
└─→ Worker Web UI (iframe: http://localhost:8080/)

Direct SDK access: the GUI opens each web UI in an iframe and uses the SDK directly.

Inference Flow

Client → Queen (scheduling) → Worker (DIRECT)
              ↘ Hive (internal: spawn worker if needed)

Decision Tree: Which Server?

Use Queen (7833) when:

  • ✅ Running inference
  • ✅ Checking system status
  • ✅ Using OpenAI-compatible API
  • ✅ Monitoring heartbeats

Use Hive (7835) when:

  • ✅ Managing workers manually
  • ✅ Managing models manually
  • ✅ Checking local hive status
  • ✅ Debugging worker issues
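The decision tree can be encoded as a small lookup, using the operation names from the tables above (the `job_server_url` helper itself is hypothetical, for illustration only):

```python
# Sketch: map a job operation to the server that accepts it,
# following the Queen/Hive split above. The helper is hypothetical.

QUEEN_OPS = {"status", "infer"}
HIVE_OPS = {
    "worker_spawn", "worker_process_list", "worker_process_get",
    "worker_process_delete", "model_download", "model_list",
    "model_get", "model_delete",
}

def job_server_url(operation: str) -> str:
    """Return the job-server URL that handles a given operation."""
    if operation in QUEEN_OPS:
        return "http://localhost:7833/v1/jobs"   # Queen: scheduling, inference
    if operation in HIVE_OPS:
        return "http://localhost:7835/v1/jobs"   # Hive: worker/model management
    raise ValueError(f"unknown operation: {operation}")

print(job_server_url("infer"))         # → http://localhost:7833/v1/jobs
print(job_server_url("worker_spawn"))  # → http://localhost:7835/v1/jobs
```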

Port Reference

Port   Component  Description
7833   Queen      Queen job server and OpenAI-compatible API
7835   Hive       Hive job server for worker/model management
9000+  Workers    Worker inference servers (9001, 9002, 9003, ...)

Examples

Example 1: Manual Worker Management

bash
# 1. Spawn worker (talk to HIVE directly)
curl -X POST http://localhost:7835/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{"operation": "worker_spawn", "hive_id": "localhost", "model": "llama-3-8b"}'

# 2. Wait for worker heartbeat (automatic)

# 3. Run inference (talk to QUEEN)
curl -X POST http://localhost:7833/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{"operation": "infer", "model": "llama-3-8b", "prompt": "Hello!"}'

Example 2: Automatic Worker Management

bash
# Just run inference - Queen spawns a worker if needed
curl -X POST http://localhost:7833/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{"operation": "infer", "model": "llama-3-8b", "prompt": "Hello!"}'

# Queen internally:
# 1. Checks the worker registry
# 2. If no worker: sends WorkerSpawn to the hive (internal)
# 3. Waits for the worker heartbeat
# 4. Routes inference directly to the worker


2025 © rbee. Your private AI cloud, in one command.