# Complete API Reference
Complete HTTP API reference for all rbee services: Queen, Hive, and Workers.
## Queen API (Port 7833)
### Health & Info
#### GET /health

Health check endpoint (no authentication required).

```bash
curl http://localhost:7833/health
```

Response:

```json
{
  "status": "healthy"
}
```

#### GET /v1/info
Get Queen service information.
```bash
curl http://localhost:7833/v1/info
```

Response:

```json
{
  "service": "queen-rbee",
  "version": "0.1.0",
  "base_url": "http://localhost:7833",
  "port": 7833
}
```

### Job Management
#### POST /v1/jobs
Create a new job.
```bash
curl -X POST http://localhost:7833/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "spawn_worker",
    "params": {
      "model": "llama-3-8b",
      "worker_type": "cuda",
      "device": 0
    }
  }'
```

Response:

```json
{
  "job_id": "job_abc123",
  "status": "pending",
  "created_at": "2025-11-08T15:30:00Z"
}
```

#### GET /v1/jobs/{job_id}/stream
Stream job events via Server-Sent Events (SSE).
```bash
curl -N http://localhost:7833/v1/jobs/job_abc123/stream
```

Event stream:

```
data: {"event":"job_started","timestamp":"2025-11-08T15:30:01Z"}
data: {"event":"progress","progress":50,"message":"Downloading model..."}
data: {"event":"job_completed","result":{"worker_id":"worker_xyz"}}
```

#### DELETE /v1/jobs/{job_id}
Cancel a running job.
```bash
curl -X DELETE http://localhost:7833/v1/jobs/job_abc123
```

Response:

```json
{
  "job_id": "job_abc123",
  "status": "cancelled"
}
```

### Hive Management
#### POST /v1/hive/ready
Hive discovery callback (called by hives on startup).
```bash
curl -X POST http://localhost:7833/v1/hive/ready \
  -H "Content-Type: application/json" \
  -d '{
    "hive_id": "localhost",
    "hive_url": "http://localhost:7835"
  }'
```

#### GET /v1/heartbeats/stream
Stream live heartbeat events from all hives (SSE).
```bash
curl -N http://localhost:7833/v1/heartbeats/stream
```

Event stream:

```
data: {"hive_id":"localhost","status":"healthy","workers":3}
data: {"hive_id":"gpu-server","status":"healthy","workers":5}
```

### System
#### POST /v1/shutdown
Graceful shutdown of Queen service.
```bash
curl -X POST http://localhost:7833/v1/shutdown
```

## Hive API (Port 7835)
### Health
#### GET /health
Health check endpoint.
```bash
curl http://localhost:7835/health
```

Response:

```
ok
```

Note: the Hive health endpoint returns the plain-text body `ok`, not JSON.
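Because the Queen and Hive health endpoints return different body formats (JSON vs plain text), a client that checks both services needs to handle each shape. A minimal sketch in Python; the `is_healthy` helper is ours, not part of rbee:

```python
import json

def is_healthy(body: str) -> bool:
    """Interpret a health-check body from either service.

    The Hive returns the plain text "ok"; the Queen returns
    JSON like {"status": "healthy"}.
    """
    text = body.strip()
    if text.lower() == "ok":
        return True
    try:
        return json.loads(text).get("status") == "healthy"
    except (json.JSONDecodeError, AttributeError):
        return False
```

This lets one monitoring loop poll both `/health` endpoints without special-casing the service type at the call site.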
### Job Management
#### POST /v1/jobs
Create a job on this hive (worker/model operations).
```bash
curl -X POST http://localhost:7835/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "download_model",
    "params": {
      "model_id": "meta-llama/Llama-3-8B-GGUF",
      "filename": "llama-3-8b-q4_k_m.gguf"
    }
  }'
```

#### GET /v1/jobs/{job_id}/stream
Stream job events (SSE).
```bash
curl -N http://localhost:7835/v1/jobs/job_xyz/stream
```

#### DELETE /v1/jobs/{job_id}

Cancel a job.

```bash
curl -X DELETE http://localhost:7835/v1/jobs/job_xyz
```

### Capabilities
#### GET /v1/capabilities
Get hive capabilities (GPU, CPU, available workers).
```bash
curl http://localhost:7835/v1/capabilities
```

Response:

```json
{
  "cpu_cores": 16,
  "total_ram_gb": 64,
  "gpus": [
    {
      "id": 0,
      "name": "NVIDIA RTX 3090",
      "vram_gb": 24
    }
  ],
  "workers": [
    {
      "type": "cpu",
      "available": true
    },
    {
      "type": "cuda",
      "available": true
    }
  ]
}
```

### Telemetry
#### GET /v1/heartbeats/stream
Stream hive heartbeat events (SSE).
```bash
curl -N http://localhost:7835/v1/heartbeats/stream
```

## Worker API (Dynamic Ports)
Workers use dynamically assigned ports starting from 8080.
The hive assigns ports automatically when spawning workers. To find a worker’s port:
- Check Queen telemetry: `GET /v1/heartbeats/stream`
- Check Hive telemetry: worker process stats include the port
- Use `ps aux | grep worker` to see the command-line arguments

Examples below use port 8080, but your worker may be on a different port.
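As a sketch of the last option, the snippet below pulls a port out of worker lines in `ps aux` output. It assumes workers are launched with an explicit `--port` argument, which may not match your build; adapt the pattern to whatever your worker command lines actually contain.

```python
import re

def find_worker_ports(ps_output: str) -> list[int]:
    """Extract --port values from worker lines in `ps aux` output.

    Assumes workers are spawned with a `--port N` or `--port=N`
    flag (an assumption, not confirmed by the rbee source).
    """
    ports = []
    for line in ps_output.splitlines():
        if "worker" not in line:
            continue
        match = re.search(r"--port[= ](\d+)", line)
        if match:
            ports.append(int(match.group(1)))
    return ports
```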
### Health & Info
#### GET /health
Worker health check.
```bash
curl http://localhost:8080/health
```

Response:

```
OK
```

#### GET /info
Worker information.
```bash
curl http://localhost:8080/info
```

Response:

```json
{
  "name": "llm-worker-cuda",
  "version": "0.1.0",
  "worker_type": "cuda",
  "model_loaded": "llama-3-8b-q4_k_m",
  "capabilities": ["text-generation"]
}
```

### Inference
#### POST /v1/infer
Run inference on the worker.
```bash
curl -X POST http://localhost:8080/v1/infer \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Once upon a time",
    "max_tokens": 100,
    "temperature": 0.7
  }'
```

Response:

```json
{
  "output": "Once upon a time, in a land far away...",
  "tokens_generated": 100,
  "inference_time_ms": 1250
}
```

### OpenAI Compatible
#### POST /v1/chat/completions
OpenAI-compatible chat endpoint.
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3-8b",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```

See: OpenAI Compatible API for full details.
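Because the endpoint follows the OpenAI wire format, a request body can be built the same way as for any OpenAI-style server. A minimal payload builder as a sketch (the `chat_request` helper is hypothetical, shown only to illustrate the request shape):

```python
import json

def chat_request(model: str, user_message: str, **options) -> str:
    """Build an OpenAI-style chat completion request body.

    Extra keyword options (e.g. temperature, max_tokens) are
    passed through unchanged, matching the OpenAI wire format.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        **options,
    }
    return json.dumps(payload)
```

The returned string can be sent as the body of a `POST /v1/chat/completions` request with `Content-Type: application/json`, exactly as in the curl example above.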
## Authentication
### API Token
For non-loopback deployments, use bearer token authentication:
```bash
export LLORCH_API_TOKEN="your-secure-token"

curl -H "Authorization: Bearer $LLORCH_API_TOKEN" \
  http://queen.example.com:7833/v1/info
```

See: Security Configuration for setup.
## Error Codes
| Code | Description |
|---|---|
| 200 | Success |
| 201 | Created |
| 400 | Bad Request - Invalid parameters |
| 401 | Unauthorized - Missing or invalid token |
| 404 | Not Found - Resource doesn’t exist |
| 500 | Internal Server Error |
| 503 | Service Unavailable - Service not ready |
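When scripting against these endpoints it helps to separate transient failures from requests that will keep failing. One possible classifier based on the table above, combined with the error body format documented below; treating only 500 and 503 as retryable is our suggestion, not an rbee guarantee:

```python
import json

def parse_error(status: int, body: str) -> tuple[str, str, bool]:
    """Return (code, message, retryable) for a failed response.

    Bodies are expected to follow the documented error format:
    {"error": {"code": ..., "message": ...}}. Non-JSON bodies
    fall back to code "unknown".
    """
    try:
        err = json.loads(body)["error"]
        code, message = err["code"], err["message"]
    except (json.JSONDecodeError, KeyError, TypeError):
        code, message = "unknown", body.strip()
    # 5xx statuses may recover on retry; 4xx indicate a bad request.
    return code, message, status in (500, 503)
```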
### Error Response Format
```json
{
  "error": {
    "code": "invalid_request",
    "message": "Missing required parameter: model"
  }
}
```

## Server-Sent Events (SSE)
### Connection
```bash
curl -N http://localhost:7833/v1/heartbeats/stream
```

Headers:

```
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive
```

### Event Format
```
data: {"event":"heartbeat","hive_id":"localhost","timestamp":"2025-11-08T15:30:00Z"}
data: {"event":"worker_spawned","worker_id":"worker_123"}
```

### Reconnection
Clients should implement automatic reconnection with exponential backoff:
```javascript
// Reconnect with exponential backoff, capped at 30 seconds.
let delay = 1000;
function connect() {
  const es = new EventSource('/v1/heartbeats/stream');
  es.onopen = () => { delay = 1000; };   // reset backoff on success
  es.onerror = () => {
    es.close();
    setTimeout(connect, delay);          // reconnect after the delay
    delay = Math.min(delay * 2, 30000);  // double the delay each failure
  };
}
connect();
```

## Rate Limiting
**Current implementation:** no rate limiting.

**Future:** rate limiting will be added in a future release.

Best practices:
- Limit concurrent SSE connections to 10-20 per client
- Use connection pooling
- Implement client-side backoff for failed requests
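The client-side backoff mentioned above can be as simple as a capped doubling schedule. A sketch (delays in seconds; the base, cap, and retry count are arbitrary choices, not rbee requirements):

```python
def backoff_schedule(base: float = 1.0, cap: float = 30.0, retries: int = 6) -> list[float]:
    """Exponential backoff delays: base, 2*base, 4*base, ... capped at `cap`."""
    return [min(base * (2 ** i), cap) for i in range(retries)]
```

Sleeping for each delay in turn between failed requests (and resetting on success) matches the reconnection pattern shown for SSE above.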
## Related Documentation

- Job Operations: detailed job operations
- OpenAI Compatible API: OpenAI-style endpoints
- Security Configuration: authentication setup
Completed by: TEAM-427
Based on: `bin/10_queen_rbee/src/main.rs`, `bin/20_rbee_hive/src/main.rs`, worker implementations