
Server Guide

Run LLMForge as an API server.

Basic Server

from llmforge.server import serve

serve(
    model="meta-llama/Llama-3.2-1B-Instruct",
    host="0.0.0.0",  # bind to all network interfaces
    port=8000,
)

With Authentication

from llmforge.server import serve

serve(
    model="meta-llama/Llama-3.2-1B-Instruct",
    api_key="your-api-key",  # placeholder; replace with your own secret
    allowed_origins=["https://yourapp.com"],  # CORS origins
)
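
The placeholder key above has to be replaced with a real secret. One way to produce one is Python's standard `secrets` module, sketched below. The helper name `new_api_key` is hypothetical and not part of LLMForge.

```python
import secrets


def new_api_key(num_bytes: int = 32) -> str:
    """Return a random URL-safe token suitable for use as a bearer key."""
    return secrets.token_urlsafe(num_bytes)


# Generate once, store it somewhere safe, then pass it as api_key to serve().
key = new_api_key()
print(f"Generated a key of {len(key)} characters")
```

Generate the key once and keep it outside source control; clients send the same value in the `Authorization: Bearer` header.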

Using the Server

Generate Endpoint

curl -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR-API-KEY" \
  -d '{
    "prompt": "Write a poem",
    "max_tokens": 128,
    "temperature": 0.7
  }'
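
The same request can be sent from Python using only the standard library. This is a minimal sketch mirroring the curl call above; the helper names `build_generate_request` and `generate` are hypothetical, and because the response schema is not described here, the decoded JSON is returned unchanged.

```python
import json
import urllib.request


def build_generate_request(prompt, api_key=None, base_url="http://localhost:8000",
                           max_tokens=128, temperature=0.7):
    """Build a POST request for /generate with the same body as the curl example."""
    payload = {"prompt": prompt, "max_tokens": max_tokens, "temperature": temperature}
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Only sent when a key is supplied, matching the curl Authorization header.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the request and return the decoded JSON body as-is."""
    with urllib.request.urlopen(build_generate_request(prompt, **kwargs)) as resp:
        return json.load(resp)
```

With a server running locally, `generate("Write a poem", api_key="YOUR-API-KEY")` performs the call.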

Chat Endpoint

The request below omits the Authorization header, which presumably works only when the server runs without api_key. With authentication enabled, add the header as in the generate example.

curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "max_tokens": 128
  }'
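
For a multi-turn conversation, the client typically keeps the message history and resends it with each request. The sketch below only builds the request bodies. The helper `chat_payload` is hypothetical, and the "assistant" role follows the common chat convention; it is an assumption, since only the "user" role appears in this guide.

```python
import json


def chat_payload(history, user_message, max_tokens=128):
    """Append a user turn and return (updated history, JSON body for /chat)."""
    messages = history + [{"role": "user", "content": user_message}]
    body = json.dumps({"messages": messages, "max_tokens": max_tokens})
    return messages, body


history = []
history, body = chat_payload(history, "Hello!")

# After the server replies, record its text before the next turn.
# The "assistant" role name is an assumption (see above).
history.append({"role": "assistant", "content": "<reply text from the server>"})
history, body = chat_payload(history, "Tell me more.")
```

Each `body` string can be posted to `/chat` exactly like the `-d` argument in the curl example.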

OpenAI-Compatible API

from llmforge.server import OpenAICompatibleServer

server = OpenAICompatibleServer(
    model="meta-llama/Llama-3.2-1B-Instruct",
)

OpenAI SDK Usage

from openai import OpenAI

client = OpenAI(
    api_key="sk-dummy",  # placeholder; the SDK requires some value
    base_url="http://localhost:8000/v1"
)

response = client.chat.completions.create(
    model="llama-3.2-1b",
    messages=[{"role": "user", "content": "Hi"}],
    max_tokens=128,
)
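
In the OpenAI Python SDK, the generated text of a chat completion is found at `response.choices[0].message.content`. The helper below is a small sketch for reading it defensively. `first_reply_text` is a hypothetical name, and it is an assumption that LLMForge fills these fields the same way the OpenAI API does.

```python
from types import SimpleNamespace


def first_reply_text(response) -> str:
    """Return the text of the first choice, or "" if there is none."""
    choices = getattr(response, "choices", None) or []
    if not choices:
        return ""
    return choices[0].message.content or ""


# Stand-in object with the same shape, so the helper can be tried offline.
fake = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="Hello!"))]
)
print(first_reply_text(fake))  # prints "Hello!"
```

With a live server, pass the `response` returned by `client.chat.completions.create(...)` instead of the stand-in.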

Server Options

Parameter         Default     Description
model             required    Model name or path
host              "0.0.0.0"   Server host
port              8000        Server port
api_key           None        API key for authentication
allowed_origins   ["*"]       Allowed CORS origins
max_batch_size    8           Maximum batch size
timeout           120         Request timeout
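
The options above can be gathered from environment variables so that secrets such as the API key stay out of source code. This is a sketch under stated assumptions: the `LLMFORGE_*` variable names and the helper `serve_kwargs_from_env` are hypothetical, not something LLMForge reads on its own. The defaults mirror the table.

```python
import os


def serve_kwargs_from_env(env=None):
    """Build keyword arguments for serve() from (hypothetical) LLMFORGE_* variables."""
    env = os.environ if env is None else env
    kwargs = {
        "model": env["LLMFORGE_MODEL"],  # required, so a missing value raises KeyError
        "host": env.get("LLMFORGE_HOST", "0.0.0.0"),
        "port": int(env.get("LLMFORGE_PORT", "8000")),
        "max_batch_size": int(env.get("LLMFORGE_MAX_BATCH_SIZE", "8")),
        "timeout": int(env.get("LLMFORGE_TIMEOUT", "120")),
    }
    api_key = env.get("LLMFORGE_API_KEY")
    if api_key:
        kwargs["api_key"] = api_key
    origins = env.get("LLMFORGE_ALLOWED_ORIGINS")
    if origins:
        # Comma-separated list, e.g. "https://a.com, https://b.com"
        kwargs["allowed_origins"] = [o.strip() for o in origins.split(",") if o.strip()]
    return kwargs


def main():
    # Imported here so the helper above can be used without LLMForge installed.
    from llmforge.server import serve

    serve(**serve_kwargs_from_env())
```

Calling `main()` starts the server with whatever the environment provides; options left unset fall back to the defaults listed in the table.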