
Complete API Reference

Complete HTTP API reference for all rbee services: Queen, Hive, and Workers.

Queen API (Port 7833)

Health & Info

GET /health

Health check endpoint (no authentication required).

curl http://localhost:7833/health

Response:

{ "status": "healthy" }

GET /v1/info

Get Queen service information.

curl http://localhost:7833/v1/info

Response:

{
  "service": "queen-rbee",
  "version": "0.1.0",
  "base_url": "http://localhost:7833",
  "port": 7833
}

Job Management

POST /v1/jobs

Create a new job.

curl -X POST http://localhost:7833/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "spawn_worker",
    "params": {
      "model": "llama-3-8b",
      "worker_type": "cuda",
      "device": 0
    }
  }'

Response:

{
  "job_id": "job_abc123",
  "status": "pending",
  "created_at": "2025-11-08T15:30:00Z"
}

GET /v1/jobs/{job_id}/stream

Stream job events via Server-Sent Events (SSE).

curl -N http://localhost:7833/v1/jobs/job_abc123/stream

Event Stream:

data: {"event":"job_started","timestamp":"2025-11-08T15:30:01Z"}
data: {"event":"progress","progress":50,"message":"Downloading model..."}
data: {"event":"job_completed","result":{"worker_id":"worker_xyz"}}
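The stream above is a sequence of newline-delimited `data:` frames, each carrying a JSON object. A minimal Python parser for such frames (a client-side sketch, not code from the rbee project) could look like:

```python
import json

def parse_sse_events(stream_text):
    """Parse `data:` frames from an SSE stream into JSON objects."""
    events = []
    for line in stream_text.splitlines():
        line = line.strip()
        if line.startswith("data:"):
            events.append(json.loads(line[len("data:"):].strip()))
    return events

# Sample frames matching the event stream shown above.
sample = (
    'data: {"event":"job_started","timestamp":"2025-11-08T15:30:01Z"}\n'
    'data: {"event":"progress","progress":50,"message":"Downloading model..."}\n'
    'data: {"event":"job_completed","result":{"worker_id":"worker_xyz"}}\n'
)
events = parse_sse_events(sample)
```

A real client would read frames incrementally from the HTTP response body rather than from a complete string.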

DELETE /v1/jobs/{job_id}

Cancel a running job.

curl -X DELETE http://localhost:7833/v1/jobs/job_abc123

Response:

{ "job_id": "job_abc123", "status": "cancelled" }

Hive Management

POST /v1/hive/ready

Hive discovery callback (called by hives on startup).

curl -X POST http://localhost:7833/v1/hive/ready \
  -H "Content-Type: application/json" \
  -d '{
    "hive_id": "localhost",
    "hive_url": "http://localhost:7835"
  }'

GET /v1/heartbeats/stream

Stream live heartbeat events from all hives (SSE).

curl -N http://localhost:7833/v1/heartbeats/stream

Event Stream:

data: {"hive_id":"localhost","status":"healthy","workers":3}
data: {"hive_id":"gpu-server","status":"healthy","workers":5}

System

POST /v1/shutdown

Graceful shutdown of Queen service.

curl -X POST http://localhost:7833/v1/shutdown

Hive API (Port 7835)

Health

GET /health

Health check endpoint.

curl http://localhost:7835/health

Response:

ok

Note: Hive’s health endpoint returns plain text “ok”, not JSON.

Job Management

POST /v1/jobs

Create a job on this hive (worker/model operations).

curl -X POST http://localhost:7835/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "download_model",
    "params": {
      "model_id": "meta-llama/Llama-3-8B-GGUF",
      "filename": "llama-3-8b-q4_k_m.gguf"
    }
  }'

GET /v1/jobs/{job_id}/stream

Stream job events (SSE).

curl -N http://localhost:7835/v1/jobs/job_xyz/stream

DELETE /v1/jobs/{job_id}

Cancel a job.

curl -X DELETE http://localhost:7835/v1/jobs/job_xyz

Capabilities

GET /v1/capabilities

Get hive capabilities (GPU, CPU, available workers).

curl http://localhost:7835/v1/capabilities

Response:

{
  "cpu_cores": 16,
  "total_ram_gb": 64,
  "gpus": [
    { "id": 0, "name": "NVIDIA RTX 3090", "vram_gb": 24 }
  ],
  "workers": [
    { "type": "cpu", "available": true },
    { "type": "cuda", "available": true }
  ]
}

Telemetry

GET /v1/heartbeats/stream

Stream hive heartbeat events (SSE).

curl -N http://localhost:7835/v1/heartbeats/stream

Worker API (Dynamic Ports)

Workers use dynamically assigned ports starting from 8080.

The hive assigns ports automatically when spawning workers. To find a worker’s port:

  • Check queen telemetry: GET /v1/heartbeats/stream
  • Check hive telemetry: Worker process stats include port
  • Use ps aux | grep worker to see command-line args

Examples below use port 8080, but your worker may be on a different port.
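If the telemetry you consume includes per-worker entries with a port, discovery can be scripted. The payload shape below (a `workers` list with `id` and `port` keys) is a hypothetical example for illustration, not a documented rbee schema:

```python
import json

def find_worker_port(telemetry_json, worker_id):
    """Return the port for worker_id from a telemetry payload, or None.

    NOTE: the `workers`/`id`/`port` field names are assumptions made for
    this sketch; check your hive's actual telemetry output.
    """
    payload = json.loads(telemetry_json)
    for worker in payload.get("workers", []):
        if worker.get("id") == worker_id:
            return worker.get("port")
    return None

sample = '{"hive_id":"localhost","workers":[{"id":"worker_xyz","port":8080}]}'
port = find_worker_port(sample, "worker_xyz")
```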

Health & Info

GET /health

Worker health check.

curl http://localhost:8080/health

Response:

OK

GET /info

Worker information.

curl http://localhost:8080/info

Response:

{
  "name": "llm-worker-cuda",
  "version": "0.1.0",
  "worker_type": "cuda",
  "model_loaded": "llama-3-8b-q4_k_m",
  "capabilities": ["text-generation"]
}

Inference

POST /v1/infer

Run inference on the worker.

curl -X POST http://localhost:8080/v1/infer \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Once upon a time",
    "max_tokens": 100,
    "temperature": 0.7
  }'

Response:

{
  "output": "Once upon a time, in a land far away...",
  "tokens_generated": 100,
  "inference_time_ms": 1250
}
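The `tokens_generated` and `inference_time_ms` fields allow a quick throughput check. A small sketch using the example response above:

```python
import json

# Example response body from POST /v1/infer (as shown above).
response = json.loads(
    '{"output": "Once upon a time, in a land far away...",'
    ' "tokens_generated": 100, "inference_time_ms": 1250}'
)

# Throughput in tokens per second: tokens / (milliseconds / 1000).
tokens_per_second = response["tokens_generated"] / (response["inference_time_ms"] / 1000)
```

For the example values (100 tokens in 1250 ms) this works out to 80 tokens/second.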

OpenAI Compatible

POST /v1/chat/completions

OpenAI-compatible chat endpoint.

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3-8b",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'

See: OpenAI Compatible API for full details.

Authentication

API Token

For non-loopback deployments, use bearer token authentication:

export LLORCH_API_TOKEN="your-secure-token"
curl -H "Authorization: Bearer $LLORCH_API_TOKEN" \
  http://queen.example.com:7833/v1/info

See: Security Configuration for setup.

Error Codes

Code  Description
200   Success
201   Created
400   Bad Request - invalid parameters
401   Unauthorized - missing or invalid token
404   Not Found - resource doesn't exist
500   Internal Server Error
503   Service Unavailable - service not ready

Error Response Format

{
  "error": {
    "code": "invalid_request",
    "message": "Missing required parameter: model"
  }
}
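Clients can branch on the HTTP status code and the error envelope above. One possible helper (the retry policy here is an assumption for illustration, not something the rbee API specifies):

```python
import json

# Assumption: only server-side errors are worth retrying.
RETRYABLE = {500, 503}

def classify_error(status_code, body):
    """Return (error_code, message, should_retry) from an error response."""
    err = json.loads(body).get("error", {})
    return err.get("code"), err.get("message"), status_code in RETRYABLE

body = '{"error": {"code": "invalid_request", "message": "Missing required parameter: model"}}'
code, message, retry = classify_error(400, body)
```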

Server-Sent Events (SSE)

Connection

curl -N http://localhost:7833/v1/heartbeats/stream

Headers:

Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive

Event Format

data: {"event":"heartbeat","hive_id":"localhost","timestamp":"2025-11-08T15:30:00Z"}
data: {"event":"worker_spawned","worker_id":"worker_123"}

Reconnection

Clients should implement automatic reconnection with exponential backoff:

let eventSource;

function connect() {
  eventSource = new EventSource('/v1/heartbeats/stream');
  eventSource.onerror = () => {
    eventSource.close();
    // Reconnect after a delay (use exponential backoff in production)
    setTimeout(connect, 5000);
  };
}

connect();

Rate Limiting

Current Implementation: No rate limiting

Future: Rate limiting will be added in a future release.

Best Practices:

  • Limit concurrent SSE connections to 10-20 per client
  • Use connection pooling
  • Implement client-side backoff for failed requests
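The backoff recommended above can be expressed as a capped exponential schedule. A minimal sketch (the base delay and cap are illustrative choices, not rbee requirements):

```python
def backoff_delays(attempts, base=1.0, cap=30.0):
    """Exponential backoff delays in seconds, doubling each attempt, capped at `cap`."""
    return [min(base * (2 ** attempt), cap) for attempt in range(attempts)]

delays = backoff_delays(6)
```

Adding random jitter to each delay helps avoid many clients reconnecting in lockstep.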

Completed by: TEAM-427
Based on: bin/10_queen_rbee/src/main.rs, bin/20_rbee_hive/src/main.rs, worker implementations
