# Complete API Reference
Complete HTTP API reference for all rbee services: Queen, Hive, and Workers.
## Queen API (Port 7833)
### Health & Info
#### GET /health

Health check endpoint (no authentication required).

```bash
curl http://localhost:7833/health
```

Response:

```json
{
  "status": "healthy"
}
```

#### GET /v1/info
Get Queen service information.
```bash
curl http://localhost:7833/v1/info
```

Response:

```json
{
  "service": "queen-rbee",
  "version": "0.1.0",
  "base_url": "http://localhost:7833",
  "port": 7833
}
```

### Job Management
#### POST /v1/jobs
Create a new job.
```bash
curl -X POST http://localhost:7833/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "spawn_worker",
    "params": {
      "model": "llama-3-8b",
      "worker_type": "cuda",
      "device": 0
    }
  }'
```

Response:

```json
{
  "job_id": "job_abc123",
  "status": "pending",
  "created_at": "2025-11-08T15:30:00Z"
}
```

#### GET /v1/jobs/{job_id}/stream
Stream job events via Server-Sent Events (SSE).
```bash
curl -N http://localhost:7833/v1/jobs/job_abc123/stream
```

Event stream:

```
data: {"event":"job_started","timestamp":"2025-11-08T15:30:01Z"}
data: {"event":"progress","progress":50,"message":"Downloading model..."}
data: {"event":"job_completed","result":{"worker_id":"worker_xyz"}}
```

#### DELETE /v1/jobs/{job_id}
Cancel a running job.
```bash
curl -X DELETE http://localhost:7833/v1/jobs/job_abc123
```

Response:

```json
{
  "job_id": "job_abc123",
  "status": "cancelled"
}
```

### Hive Management
#### POST /v1/hive/ready
Hive discovery callback (called by hives on startup).
```bash
curl -X POST http://localhost:7833/v1/hive/ready \
  -H "Content-Type: application/json" \
  -d '{
    "hive_id": "localhost",
    "hive_url": "http://localhost:7835"
  }'
```

#### GET /v1/heartbeats/stream
Stream live heartbeat events from all hives (SSE).
```bash
curl -N http://localhost:7833/v1/heartbeats/stream
```

Event stream:

```
data: {"hive_id":"localhost","status":"healthy","workers":3}
data: {"hive_id":"gpu-server","status":"healthy","workers":5}
```

### System
#### POST /v1/shutdown
Graceful shutdown of Queen service.
```bash
curl -X POST http://localhost:7833/v1/shutdown
```

## Hive API (Port 7835)
### Health
#### GET /health
Health check endpoint.
```bash
curl http://localhost:7835/health
```

Response:

```
ok
```

Note: the Hive health endpoint returns the plain-text body `ok`, not JSON.
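Because the Queen and Hive health endpoints return different body formats (JSON vs plain text), a client that checks both services needs to handle each shape. A minimal sketch in Python; the `is_healthy` helper is ours, not part of rbee:

```python
import json

def is_healthy(body: str) -> bool:
    """Interpret a health-check body from either service.

    The Hive returns the plain text "ok"; the Queen returns
    JSON like {"status": "healthy"}.
    """
    text = body.strip()
    if text.lower() == "ok":
        return True
    try:
        return json.loads(text).get("status") == "healthy"
    except (json.JSONDecodeError, AttributeError):
        return False
```

This lets one monitoring loop poll both `/health` endpoints without special-casing the service type at the call site.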
### Job Management
#### POST /v1/jobs
Create a job on this hive (worker/model operations).
```bash
curl -X POST http://localhost:7835/v1/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "download_model",
    "params": {
      "model_id": "meta-llama/Llama-3-8B-GGUF",
      "filename": "llama-3-8b-q4_k_m.gguf"
    }
  }'
```

#### GET /v1/jobs/{job_id}/stream
Stream job events (SSE).
```bash
curl -N http://localhost:7835/v1/jobs/job_xyz/stream
```

#### DELETE /v1/jobs/{job_id}

Cancel a job.

```bash
curl -X DELETE http://localhost:7835/v1/jobs/job_xyz
```

### Capabilities
#### GET /v1/capabilities
Get hive capabilities (GPU, CPU, available workers).
```bash
curl http://localhost:7835/v1/capabilities
```

Response:

```json
{
  "cpu_cores": 16,
  "total_ram_gb": 64,
  "gpus": [
    {
      "id": 0,
      "name": "NVIDIA RTX 3090",
      "vram_gb": 24
    }
  ],
  "workers": [
    {
      "type": "cpu",
      "available": true
    },
    {
      "type": "cuda",
      "available": true
    }
  ]
}
```

### Telemetry
#### GET /v1/heartbeats/stream
Stream hive heartbeat events (SSE).
```bash
curl -N http://localhost:7835/v1/heartbeats/stream
```

## Worker API (Dynamic Ports)
Workers use dynamically assigned ports starting from 8080.
The hive assigns ports automatically when spawning workers. To find a worker’s port:
- Check Queen telemetry: `GET /v1/heartbeats/stream`
- Check Hive telemetry: worker process stats include the port
- Use `ps aux | grep worker` to see the command-line arguments

Examples below use port 8080, but your worker may be on a different port.
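As a sketch of the last option, the snippet below pulls a port out of worker lines in `ps aux` output. It assumes workers are launched with an explicit `--port` argument, which may not match your build; adapt the pattern to whatever your worker command lines actually contain.

```python
import re

def find_worker_ports(ps_output: str) -> list[int]:
    """Extract --port values from worker lines in `ps aux` output.

    Assumes workers are spawned with a `--port N` or `--port=N`
    flag (an assumption, not confirmed by the rbee source).
    """
    ports = []
    for line in ps_output.splitlines():
        if "worker" not in line:
            continue
        match = re.search(r"--port[= ](\d+)", line)
        if match:
            ports.append(int(match.group(1)))
    return ports
```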
### Health & Info
#### GET /health
Worker health check.
```bash
curl http://localhost:8080/health
```

Response:

```
OK
```

#### GET /info
Worker information.
```bash
curl http://localhost:8080/info
```

Response:

```json
{
  "name": "llm-worker-cuda",
  "version": "0.1.0",
  "worker_type": "cuda",
  "model_loaded": "llama-3-8b-q4_k_m",
  "capabilities": ["text-generation"]
}
```

### Inference
#### POST /v1/infer
Run inference on the worker.
```bash
curl -X POST http://localhost:8080/v1/infer \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Once upon a time",
    "max_tokens": 100,
    "temperature": 0.7
  }'
```

Response:

```json
{
  "output": "Once upon a time, in a land far away...",
  "tokens_generated": 100,
  "inference_time_ms": 1250
}
```

### OpenAI Compatible
#### POST /v1/chat/completions
OpenAI-compatible chat endpoint.
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3-8b",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```

See: OpenAI Compatible API for full details.
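Because the endpoint follows the OpenAI wire format, a request body can be built the same way as for any OpenAI-style server. A minimal payload builder as a sketch (the `chat_request` helper is hypothetical, shown only to illustrate the request shape):

```python
import json

def chat_request(model: str, user_message: str, **options) -> str:
    """Build an OpenAI-style chat completion request body.

    Extra keyword options (e.g. temperature, max_tokens) are
    passed through unchanged, matching the OpenAI wire format.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        **options,
    }
    return json.dumps(payload)
```

The returned string can be sent as the body of a `POST /v1/chat/completions` request with `Content-Type: application/json`, exactly as in the curl example above.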
## Authentication
### API Token
For non-loopback deployments, use bearer token authentication:
```bash
export LLORCH_API_TOKEN="your-secure-token"

curl -H "Authorization: Bearer $LLORCH_API_TOKEN" \
  http://queen.example.com:7833/v1/info
```

See: Security Configuration for setup.
## Error Codes
| Code | Description |
|---|---|
| 200 | Success |
| 201 | Created |
| 400 | Bad Request - Invalid parameters |
| 401 | Unauthorized - Missing or invalid token |
| 404 | Not Found - Resource doesn’t exist |
| 500 | Internal Server Error |
| 503 | Service Unavailable - Service not ready |
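When scripting against these endpoints it helps to separate transient failures from requests that will keep failing. One possible classifier based on the table above, combined with the error body format documented below; treating only 500 and 503 as retryable is our suggestion, not an rbee guarantee:

```python
import json

def parse_error(status: int, body: str) -> tuple[str, str, bool]:
    """Return (code, message, retryable) for a failed response.

    Bodies are expected to follow the documented error format:
    {"error": {"code": ..., "message": ...}}. Non-JSON bodies
    fall back to code "unknown".
    """
    try:
        err = json.loads(body)["error"]
        code, message = err["code"], err["message"]
    except (json.JSONDecodeError, KeyError, TypeError):
        code, message = "unknown", body.strip()
    # 5xx statuses may recover on retry; 4xx indicate a bad request.
    return code, message, status in (500, 503)
```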
### Error Response Format
```json
{
  "error": {
    "code": "invalid_request",
    "message": "Missing required parameter: model"
  }
}
```

## Server-Sent Events (SSE)
### Connection
```bash
curl -N http://localhost:7833/v1/heartbeats/stream
```

Headers:

```
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive
```

### Event Format
```
data: {"event":"heartbeat","hive_id":"localhost","timestamp":"2025-11-08T15:30:00Z"}
data: {"event":"worker_spawned","worker_id":"worker_123"}
```

### Reconnection
Clients should implement automatic reconnection with exponential backoff:
```javascript
// Reconnect with exponential backoff, capped at 30 seconds.
let delay = 1000;
function connect() {
  const es = new EventSource('/v1/heartbeats/stream');
  es.onopen = () => { delay = 1000; };   // reset backoff on success
  es.onerror = () => {
    es.close();
    setTimeout(connect, delay);          // reconnect after the delay
    delay = Math.min(delay * 2, 30000);  // double the delay each failure
  };
}
connect();
```

## Rate Limiting
**Current implementation:** no rate limiting.

**Future:** rate limiting will be added in a future release.

Best practices:
- Limit concurrent SSE connections to 10-20 per client
- Use connection pooling
- Implement client-side backoff for failed requests
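The client-side backoff mentioned above can be as simple as a capped doubling schedule. A sketch (delays in seconds; the base, cap, and retry count are arbitrary choices, not rbee requirements):

```python
def backoff_schedule(base: float = 1.0, cap: float = 30.0, retries: int = 6) -> list[float]:
    """Exponential backoff delays: base, 2*base, 4*base, ... capped at `cap`."""
    return [min(base * (2 ** i), cap) for i in range(retries)]
```

Sleeping for each delay in turn between failed requests (and resetting on success) matches the reconnection pattern shown for SSE above.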
## Related Documentation

- Job Operations: detailed job operations
- OpenAI Compatible API: OpenAI-style endpoints
- Security Configuration: authentication setup
Completed by: TEAM-427
Based on: `bin/10_queen_rbee/src/main.rs`, `bin/20_rbee_hive/src/main.rs`, worker implementations