
Getting Started

This guide will help you set up rbee and run your first LLM inference.

Prerequisites

Before you begin, ensure you have:

  • A Linux system with an NVIDIA CUDA-capable GPU
  • Docker installed (optional, for containerized deployment)
  • At least 16GB of RAM
  • 50GB+ of free disk space for models
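
A quick way to confirm these requirements before installing (a minimal sketch using standard Linux tools; the thresholds mirror the list above):

# Check for an NVIDIA GPU and its driver/CUDA version
nvidia-smi

# Check total RAM (the guide recommends at least 16GB)
free -h

# Check free disk space on the filesystem that will hold models (50GB+)
df -h ~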

Installation

Using the CLI

# Install rbee CLI
curl -sSL https://install.rbee.dev | sh

# Verify installation
rbee --version

Using Docker

# Pull the latest image
docker pull rbee/orchestrator:latest

# Run the orchestrator
docker run -d --gpus all -p 8080:8080 rbee/orchestrator:latest
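
To confirm the container came up cleanly, standard Docker commands are enough (nothing rbee-specific is assumed here):

# Confirm the orchestrator container is running
docker ps --filter ancestor=rbee/orchestrator:latest

# Follow the orchestrator logs to watch startup
docker logs -f $(docker ps -q --filter ancestor=rbee/orchestrator:latest)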

Configuration

Create a configuration file at ~/.rbee/config.yaml:

orchestrator:
  host: 0.0.0.0
  port: 8080

models:
  - name: llama-3.1-8b
    source: huggingface
    repo: meta-llama/Llama-3.1-8B
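
One way to create the file from the shell (a sketch; note that meta-llama repositories on Hugging Face are gated, so downloading this particular model may require a Hugging Face access token):

# Create the config directory and write the configuration
mkdir -p ~/.rbee
cat > ~/.rbee/config.yaml <<'EOF'
orchestrator:
  host: 0.0.0.0
  port: 8080

models:
  - name: llama-3.1-8b
    source: huggingface
    repo: meta-llama/Llama-3.1-8B
EOF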

Running Your First Inference

# Start the orchestrator
rbee start

# Send a test request
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
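
The /v1/chat/completions path suggests an OpenAI-compatible API. Assuming the response follows the OpenAI response shape (an assumption, not confirmed by this guide), the reply text can be extracted with jq:

# Extract just the assistant's reply (assumes an OpenAI-style response body)
curl -s -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }' | jq -r '.choices[0].message.content'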

Next Steps
