Chat API Guide

The chat module provides an interactive command-line chat interface and functions for generating chat responses programmatically.

Command Line Interface

# Basic chat
llmforge chat --model meta-llama/Llama-3.2-1B-Instruct

# With custom settings
llmforge chat --model meta-llama/Llama-3.2-1B-Instruct \
    --temp 0.8 \
    --max-tokens 512 \
    --system-prompt "You are a helpful assistant"

Chat Commands

Command  Description
q        Exit chat
r        Reset conversation
h        Show help
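
To make the command semantics concrete, here is a minimal sketch of how a chat loop might dispatch one line of input. The `handle_command` helper is hypothetical and is not part of llmforge. Keeping the system prompt after a reset is an assumption, not documented behavior.

```python
from typing import Dict, List, Optional

Message = Dict[str, str]


def handle_command(line: str, messages: List[Message],
                   system_prompt: Optional[str] = None) -> str:
    """Classify one line of input as "exit", "reset", "help", or "message".

    Hypothetical helper: it only illustrates the q / r / h commands above.
    """
    command = line.strip()
    if command == "q":
        return "exit"
    if command == "r":
        # Drop the conversation so far. Re-adding the system prompt is an
        # assumption about what "reset" should preserve.
        messages.clear()
        if system_prompt is not None:
            messages.append({"role": "system", "content": system_prompt})
        return "reset"
    if command == "h":
        return "help"
    # Anything else is treated as a new user turn.
    messages.append({"role": "user", "content": line})
    return "message"
```

A caller would loop over user input, stop on "exit", print the command table on "help", and generate a reply on "message".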

Programmatic Chat

from llmforge.chat import stream_generate
from llmforge import load, make_sampler, make_prompt_cache

# Load model and tokenizer
model, tokenizer = load("meta-llama/Llama-3.2-1B-Instruct")

# Prepare messages
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is PyTorch?"},
]

# Apply chat template
prompt = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True
)

# Create sampler
sampler = make_sampler(
    temperature=0.7,
    top_p=0.9
)

# Create prompt cache
prompt_cache = make_prompt_cache(model)

# Stream generated text
for response in stream_generate(
    model,
    tokenizer,
    prompt,
    max_tokens=256,
    sampler=sampler,
    prompt_cache=prompt_cache
):
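    # Flush so each chunk appears as soon as it is generated
    print(response.text, end="", flush=True)
print()  # End the streamed output with a newline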
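
The example above handles a single turn. For a multi-turn conversation, the streamed chunks need to be joined and stored as an assistant message before the next user turn. The sketch below shows one way to do that. `collect_reply` is a hypothetical helper, and it assumes only that each streamed item has a `.text` attribute, as in the loop above. Stand-in chunks are used so the snippet runs without a model.

```python
from types import SimpleNamespace
from typing import Dict, Iterable, List


def collect_reply(chunks: Iterable, messages: List[Dict[str, str]]) -> str:
    """Print streamed chunks as they arrive and record the full reply."""
    parts = []
    for chunk in chunks:
        print(chunk.text, end="", flush=True)
        parts.append(chunk.text)
    print()
    reply = "".join(parts)
    # Store the reply so the next prompt includes the whole conversation.
    messages.append({"role": "assistant", "content": reply})
    return reply


# Stand-in chunks for demonstration; in practice, pass stream_generate(...).
fake_chunks = [SimpleNamespace(text="Hello"), SimpleNamespace(text=", world")]
history = [{"role": "user", "content": "Hi"}]
reply = collect_reply(fake_chunks, history)
```

After each reply, append the next user message to `history` and call `tokenizer.apply_chat_template` again to build the next prompt.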

Parameters

Parameter      CLI Flag           Type   Default                     Description
model          --model            str    Llama-3.2-3B-Instruct-4bit  Model path or Hugging Face repo
temperature    --temp             float  0.0                         Sampling temperature
top_p          --top-p            float  1.0                         Nucleus sampling threshold
max_tokens     -m, --max-tokens   int    256                         Maximum number of tokens to generate
system_prompt  --system-prompt    str    None                        System prompt
adapter_path   --adapter-path     str    None                        LoRA adapter path
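
As an illustration of how the flags relate to the parameter names and defaults listed above, here is a small `argparse` sketch. It is not llmforge's actual parser: `build_parser` and the program name are hypothetical, and treating `-m` as the short form of `--max-tokens` is an assumption based on the CLI example earlier in this guide.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the flags and defaults in the table."""
    parser = argparse.ArgumentParser(prog="chat-sketch")
    parser.add_argument("--model", type=str,
                        default="Llama-3.2-3B-Instruct-4bit")
    # --temp is stored under the parameter name "temperature".
    parser.add_argument("--temp", dest="temperature", type=float, default=0.0)
    parser.add_argument("--top-p", type=float, default=1.0)
    # Assumption: -m and --max-tokens are the same option.
    parser.add_argument("-m", "--max-tokens", type=int, default=256)
    parser.add_argument("--system-prompt", type=str, default=None)
    parser.add_argument("--adapter-path", type=str, default=None)
    return parser


# Flags that are omitted fall back to the defaults from the table.
args = build_parser().parse_args(["--temp", "0.8", "--max-tokens", "512"])
```

With the defaults above, the temperature is 0.0, which in most samplers means greedy decoding, so pass `--temp` explicitly when varied output is wanted.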