jason 412ca5575d Create AI.md with project overview and details
Add comprehensive documentation for ClawLama project, detailing architecture, features, installation, and usage.
2026-02-17 22:51:41 -05:00

Project Overview

ClawLama is a CPU-optimized, multi-arch (amd64/arm64), single-container AI assistant that bundles OpenClaw and Ollama into one self-contained Docker image. It is zero-cost, fully local, and privacy-first: GPU-accelerated when a GPU is available, fully functional without one.

  • Image: docker.io/casjaysdevdocker/clawlama:latest
  • Base: debian:bookworm-slim
  • Platforms: linux/amd64, linux/arm64
  • Single container: Both Ollama and OpenClaw run inside one image, no compose required.

Source Reference: Based on iam-veeramalla's OpenClaw + Ollama guide.


Problem Statement

Running OpenClaw with a local Ollama model requires manual multi-step setup: installing OpenClaw, installing Ollama, pulling a model, writing a JSON config, and wiring everything together. ClawLama eliminates this friction by packaging everything into a single container — just docker run.


Architecture

┌───────────────────────────────────────────────────────────┐
│  docker.io/casjaysdevdocker/clawlama:latest               │
│  debian:bookworm-slim | linux/amd64, linux/arm64          │
│                                                           │
│  ┌─────────────────────────────────────────────────────┐  │
│  │                   entrypoint.sh                     │  │
│  │  • Detect arch (amd64/arm64) + GPU (nvidia-smi)     │  │
│  │  • Generate openclaw.json from env vars             │  │
│  │  • Start Ollama (background)                        │  │
│  │  • Wait for Ollama health                           │  │
│  │  • Pull model if not cached                         │  │
│  │  • Start OpenClaw gateway (foreground)              │  │
│  └─────────────────────────────────────────────────────┘  │
│                                                           │
│  ┌──────────────┐    ┌───────────────────┐                │
│  │  OpenClaw    │───▶│  Ollama (CPU/GPU) │                │
│  │  Gateway +   │    │  localhost:11434   │                │
│  │  Agent       │    │  Auto-detects GPU  │                │
│  │  :18789      │    │  at runtime        │                │
│  └──────────────┘    └───────────────────┘                │
│         │                      │                          │
│         ▼                      ▼                          │
│  ┌─────────────┐     ┌────────────────┐                   │
│  │  /data/      │     │  /data/        │                   │
│  │  workspace/  │     │  ollama/       │                   │
│  │  (volume)    │     │  (volume)      │                   │
│  └─────────────┘     └────────────────┘                   │
└───────────────────────────────────────────────────────────┘

Single-Image Architecture

Both Ollama and OpenClaw run inside one container. The entrypoint manages process lifecycle:

  1. Ollama starts as a background process, binding to localhost:11434.
  2. OpenClaw starts as the foreground process after Ollama is healthy, connecting to http://localhost:11434/v1.
  3. If Ollama crashes, the entrypoint detects it and exits with a non-zero status. Docker restarts the container only when a restart policy (for example --restart unless-stopped) is set.
  4. Signals (SIGTERM/SIGINT) are forwarded to both processes for clean shutdown.
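
The lifecycle above can be sketched in POSIX shell. This is a minimal illustration, not the project's entrypoint.sh: the function name supervise and the one-second polling interval are assumptions, and both commands run as waited children (rather than one true foreground process) so that the shell's trap can fire while they are running.

```shell
#!/bin/sh
# Hypothetical sketch: supervise BACKEND_CMD FRONTEND_CMD
# Starts both commands, forwards TERM/INT to them, and stops the
# frontend if the backend dies first.
supervise() {
  sh -c "$1" &
  backend_pid=$!
  sh -c "$2" &
  frontend_pid=$!

  # Forward termination signals to both children.
  trap 'kill -TERM "$backend_pid" "$frontend_pid" 2>/dev/null' TERM INT

  # Poll once per second for as long as the frontend is alive.
  while kill -0 "$frontend_pid" 2>/dev/null; do
    if ! kill -0 "$backend_pid" 2>/dev/null; then
      echo "backend exited; stopping frontend" >&2
      kill -TERM "$frontend_pid" 2>/dev/null || true
      wait "$frontend_pid" 2>/dev/null || true
      return 1
    fi
    sleep 1
  done

  # Frontend finished on its own: collect its status, then stop the backend.
  status=0
  wait "$frontend_pid" || status=$?
  kill -TERM "$backend_pid" 2>/dev/null || true
  wait "$backend_pid" 2>/dev/null || true
  return "$status"
}
```

In the container this might be called as supervise "exec ollama serve" "exec <openclaw gateway command>", where the OpenClaw command line is left as a placeholder because the spec does not state it. Using exec keeps each signal target the real process rather than a wrapper shell.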

GPU Runtime Detection

The image ships no GPU libraries. GPU acceleration is achieved through NVIDIA Container Toolkit runtime passthrough:

  • At startup, entrypoint runs nvidia-smi to detect GPU availability.
  • GPU found: Ollama automatically uses CUDA via the mounted NVIDIA runtime. Logs report GPU model and VRAM.
  • No GPU: Ollama falls back to CPU inference. No errors, no warnings — this is the expected default path.
  • User enables GPU by passing --gpus all to docker run (requires NVIDIA Container Toolkit on host).
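
A minimal sketch of the startup probe described above, assuming the entrypoint only needs a one-line summary for its logs. The function name detect_gpu and the output format are hypothetical; the nvidia-smi query flags are the standard ones for reading GPU name and total memory.

```shell
#!/bin/sh
# Hypothetical sketch: print "gpu: <name>, <vram>" when nvidia-smi is
# present and reports a device, otherwise print "cpu".
detect_gpu() {
  if command -v nvidia-smi >/dev/null 2>&1 &&
     gpu_info=$(nvidia-smi --query-gpu=name,memory.total \
                           --format=csv,noheader 2>/dev/null) &&
     [ -n "$gpu_info" ]; then
    printf 'gpu: %s\n' "$gpu_info"
  else
    # Expected default path: no binary, no device, or the query failed.
    printf 'cpu\n'
  fi
}
```

Because every failure mode falls through to the cpu branch without printing an error, a host without the NVIDIA Container Toolkit produces no GPU-related noise in the logs.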

Ports

  • 18789 — OpenClaw gateway (exposed)
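  • 11434 — Ollama API (internal only by default; for debugging, publish it with -p 11434:11434 and set OLLAMA_HOST=0.0.0.0:11434, because the default bind address 127.0.0.1 is not reachable through a published port)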

Installation Method (No curl | sh)

  • Ollama: Latest release binary downloaded directly from GitHub at build time: https://github.com/ollama/ollama/releases/latest/download/ollama-linux-${TARGETARCH}. No version pinning — always gets the newest stable release.
  • OpenClaw: Installed via npm install -g openclaw@latest. Always gets the newest published version.
  • Node.js: Current LTS from NodeSource apt repository (node_lts.x) with GPG key verification. Auto-advances to next LTS major (e.g., 22 → 24) when Node promotes it.

Features

  1. Single container, single command: docker run -d docker.io/casjaysdevdocker/clawlama:latest launches both Ollama and OpenClaw. No compose file required for basic use.
  2. debian:bookworm-slim base — Minimal Debian with glibc for full Ollama SIMD compatibility (AVX/AVX2 on amd64, NEON on arm64).
  3. Direct binary installation (no curl | sh):
    • Ollama: latest release binary from GitHub releases (/releases/latest/download/), selected per TARGETARCH.
    • OpenClaw: npm install -g openclaw@latest.
    • Node.js: current LTS from NodeSource apt repo (node_lts.x) with GPG key verification.
  4. Pre-configured OpenClaw ↔ Ollama wiring — OpenClaw config auto-generated at startup pointing to http://localhost:11434/v1 with zero-cost pricing.
  5. Persistent volumes — Two mount points: /data/ollama (model store), /data/workspace (OpenClaw). Survive restarts.
  6. Default model: gpt-oss:20b — Automatically pulled on first launch before OpenClaw starts.
  7. Health checks — Container HEALTHCHECK verifies both Ollama API and OpenClaw gateway.
  8. Environment variable overrides:
    • CLAWLAMA_MODEL — Model to pull and use (default: gpt-oss:20b)
    • CLAWLAMA_CONTEXT_WINDOW — Context window size (default: 131072)
    • CLAWLAMA_MAX_TOKENS — Max output tokens (default: 8192)
    • CLAWLAMA_MAX_CONCURRENT — Agent concurrency (default: 4)
    • CLAWLAMA_SUBAGENT_CONCURRENT — Subagent concurrency (default: 8)
    • CLAWLAMA_OPENCLAW_PORT — OpenClaw gateway port (default: 18789)
    • OLLAMA_NUM_THREADS — CPU threads for inference (default: auto-detect physical cores)
    • OLLAMA_NUM_PARALLEL — Max parallel requests (default: 1)
    • OLLAMA_MAX_LOADED_MODELS — Models in memory (default: 1)
    • OLLAMA_HOST — Ollama bind address (default: 127.0.0.1:11434)
  9. Multi-arch (amd64 + arm64) — Single manifest tag built with docker buildx. Ollama binary selected by TARGETARCH. All scripts POSIX shell.
  10. Runtime GPU detection — Entrypoint probes nvidia-smi. GPU used automatically if available via --gpus all. No GPU libs in image; NVIDIA Container Toolkit handles passthrough. CPU is the default and primary path.
  11. Model swap without rebuild — Changing CLAWLAMA_MODEL and restarting pulls the new model and regenerates config.
  12. CPU performance auto-tuning — Entrypoint auto-detects physical cores, available RAM, and sets OLLAMA_NUM_THREADS optimally if unset. Logs detected values.
  13. Telegram integration helper — Optional CLAWLAMA_TELEGRAM_BOT_TOKEN env var auto-configures Telegram channel.
  14. Full tool profile with layered restrictions — All OpenClaw tools enabled via profile: "full". Git and /etc restrictions enforced via TOOLS.md (soft) and container filesystem (hard).
  15. Startup banner — Print connection info, detected arch, CPU cores, RAM, GPU status, and model to stdout.
  16. Quantized model recommendations — README documents CPU-friendly models by RAM tier:
    • 8 GB RAM: 7B Q4 variants
    • 16 GB RAM: gpt-oss:20b (default) or 13B Q5
    • 32+ GB RAM: 20B+ full or 34B Q4
  17. docker-compose.yml included — Provided for users who prefer compose, with volume mounts and restart policy pre-configured.
  18. Multi-model support — Comma-separated CLAWLAMA_MODELS env var configures multiple models in the OpenClaw provider config.
  19. Backup/restore scripts — Shell scripts to tar /data volumes for migration.
  20. Portainer/Dockge compatible — Compose file works with popular Docker management UIs.
  21. Architecture detection in logs — Log detected arch and SIMD instruction sets (AVX, AVX2, AVX-512, NEON) for performance troubleshooting.
  22. GPU VRAM-aware model selection — When GPU detected, log VRAM and suggest optimal model/quantization for available resources.
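
Feature 12 (CPU performance auto-tuning) could be approached as follows. This is a sketch under assumptions: the function names are hypothetical, physical cores are counted as unique core/socket pairs from lscpu's parseable output, and the fallback to the logical CPU count via getconf is a choice made here, not something the spec prescribes.

```shell
#!/bin/sh
# Hypothetical sketch: count unique "core,socket" pairs read from stdin
# (the format produced by `lscpu -p=CORE,SOCKET`).
count_physical_cores() {
  grep -v '^#' | sort -u | wc -l | tr -d ' '
}

# Respect an explicit OLLAMA_NUM_THREADS; otherwise detect a value.
detect_threads() {
  if [ -n "${OLLAMA_NUM_THREADS:-}" ]; then
    printf '%s\n' "$OLLAMA_NUM_THREADS"
  elif command -v lscpu >/dev/null 2>&1; then
    lscpu -p=CORE,SOCKET | count_physical_cores
  else
    # Fallback: logical CPUs, which may include SMT siblings.
    getconf _NPROCESSORS_ONLN
  fi
}
```

An entrypoint might then run OLLAMA_NUM_THREADS=$(detect_threads), export it, and log the chosen value.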

File Structure

clawlama/
├── AI.md                              # This spec
├── TODO.AI.md                         # Task tracking
├── Dockerfile                         # Multi-stage, multi-arch (amd64 + arm64)
├── docker-compose.yml                 # Optional compose file for convenience
├── .env.example                       # Template environment variables
├── rootfs/
│   ├── usr/local/bin/
│   │   ├── entrypoint.sh             # Main entrypoint: detect GPU, gen config, start services
│   │   └── healthcheck.sh            # Health check script for HEALTHCHECK instruction
│   └── etc/clawlama/
│       ├── openclaw.template.json    # OpenClaw config template (envsubst-ready)
│       └── TOOLS.md                  # Agent tool usage rules (git deny, /etc deny)
├── scripts/
│   ├── build.sh                      # Multi-arch buildx build + push
│   ├── backup.sh                     # Backup /data volumes
│   └── restore.sh                    # Restore /data volumes
└── README.md                         # User-facing documentation
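
The scripts/backup.sh and scripts/restore.sh entries are only named in the tree, so the following is a guess at their core logic rather than their actual contents. The function names and the choice of a gzip-compressed tar archive are assumptions.

```shell
#!/bin/sh
# Hypothetical sketch: archive the contents of a data directory.
# Write the archive outside the source directory.
backup_data() {
  src=$1
  archive=$2
  tar -C "$src" -czf "$archive" .
}

# Hypothetical sketch: unpack an archive into a (possibly new) directory.
restore_data() {
  archive=$1
  dest=$2
  mkdir -p "$dest"
  tar -C "$dest" -xzf "$archive"
}
```

For a named Docker volume, the same tar commands would typically run inside a throwaway container that mounts both the volume and a host directory for the archive.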

Quick Start

# CPU-only (default)
docker run -d \
  --name clawlama \
  -v clawlama-data:/data \
  -p 18789:18789 \
  docker.io/casjaysdevdocker/clawlama:latest

# With NVIDIA GPU acceleration
docker run -d \
  --name clawlama \
  --gpus all \
  -v clawlama-data:/data \
  -p 18789:18789 \
  docker.io/casjaysdevdocker/clawlama:latest

# Custom model + expose Ollama API for debugging
docker run -d \
  --name clawlama \
  -v clawlama-data:/data \
  -p 18789:18789 \
  -p 11434:11434 \
  -e CLAWLAMA_MODEL=qwen2:7b \
  -e OLLAMA_HOST=0.0.0.0:11434 \
  docker.io/casjaysdevdocker/clawlama:latest

Dockerfile Sketch

# ── Stage 1: Build dependencies ──────────────────────────────
FROM debian:bookworm-slim AS builder

ARG TARGETARCH

# Install Node.js LTS from NodeSource apt repo (no curl | sh)
RUN apt-get update && apt-get install -y --no-install-recommends \
      ca-certificates curl gnupg gettext-base && \
    mkdir -p /etc/apt/keyrings && \
    curl -fsSL https://deb.nodesource.com/gpgkey/nodesource-repo.gpg.key \
      | gpg --dearmor -o /etc/apt/keyrings/nodesource.gpg && \
    echo "deb [signed-by=/etc/apt/keyrings/nodesource.gpg] https://deb.nodesource.com/node_lts.x nodistro main" \
      > /etc/apt/sources.list.d/nodesource.list && \
    apt-get update && apt-get install -y --no-install-recommends nodejs && \
    rm -rf /var/lib/apt/lists/*

# Download latest Ollama binary directly (no curl | sh, no version pinning)
RUN curl -fsSL -o /usr/local/bin/ollama \
      "https://github.com/ollama/ollama/releases/latest/download/ollama-linux-${TARGETARCH}" && \
    chmod +x /usr/local/bin/ollama

# Install latest OpenClaw via npm
RUN npm install -g openclaw@latest

# ── Stage 2: Runtime ─────────────────────────────────────────
FROM debian:bookworm-slim

# Install Node.js LTS runtime (same repo method, no dev packages)
RUN apt-get update && apt-get install -y --no-install-recommends \
      ca-certificates curl gnupg tini procps gettext-base && \
    mkdir -p /etc/apt/keyrings && \
    curl -fsSL https://deb.nodesource.com/gpgkey/nodesource-repo.gpg.key \
      | gpg --dearmor -o /etc/apt/keyrings/nodesource.gpg && \
    echo "deb [signed-by=/etc/apt/keyrings/nodesource.gpg] https://deb.nodesource.com/node_lts.x nodistro main" \
      > /etc/apt/sources.list.d/nodesource.list && \
    apt-get update && apt-get install -y --no-install-recommends nodejs && \
    rm -rf /var/lib/apt/lists/*

# Copy Ollama binary
COPY --from=builder /usr/local/bin/ollama /usr/local/bin/ollama

# Copy OpenClaw global install
COPY --from=builder /usr/lib/node_modules /usr/lib/node_modules
COPY --from=builder /usr/bin/openclaw /usr/bin/openclaw

# Copy rootfs overlay
COPY rootfs/ /

# Create non-root user and data directories
RUN groupadd -r clawlama && useradd -r -g clawlama -m clawlama && \
    mkdir -p /data/ollama /data/workspace && \
    chown -R clawlama:clawlama /data

# Hard restriction: make /etc read-only for non-root users
RUN chmod -R a-w /etc

# Environment defaults (CPU-optimized)
ENV CLAWLAMA_MODEL=gpt-oss:20b \
    CLAWLAMA_CONTEXT_WINDOW=131072 \
    CLAWLAMA_MAX_TOKENS=8192 \
    CLAWLAMA_MAX_CONCURRENT=4 \
    CLAWLAMA_SUBAGENT_CONCURRENT=8 \
    CLAWLAMA_OPENCLAW_PORT=18789 \
    OLLAMA_HOST=127.0.0.1:11434 \
    OLLAMA_MODELS=/data/ollama \
    OLLAMA_NUM_PARALLEL=1 \
    OLLAMA_MAX_LOADED_MODELS=1

VOLUME ["/data"]
EXPOSE 18789

HEALTHCHECK --interval=30s --timeout=10s --start-period=120s --retries=3 \
  CMD /usr/local/bin/healthcheck.sh

ENTRYPOINT ["tini", "--"]
CMD ["/usr/local/bin/entrypoint.sh"]

Notes:

  • tini is the PID 1 init for proper signal handling — no zombie processes.
  • TARGETARCH is automatically set by docker buildx (amd64 or arm64).
  • curl used only for apt key download and direct binary fetch — never piped to shell.
  • Builder stage is discarded; runtime image contains only what's needed.
  • OLLAMA_MODELS=/data/ollama ensures models persist in the volume.
  • /etc is made non-writable via chmod -R a-w. This is hard enforcement only while processes run as the non-root clawlama user; root bypasses file mode bits.
  • Entrypoint copies TOOLS.md into /data/workspace/TOOLS.md on first run (OpenClaw reads this as agent instructions).
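
The HEALTHCHECK instruction calls healthcheck.sh, which the spec says verifies both services. A minimal sketch of that script's logic follows. The function names are hypothetical, /api/version is Ollama's version endpoint, and the OpenClaw gateway URL is an assumption because the spec does not name a health route.

```shell
#!/bin/sh
# Hypothetical sketch: succeed when the URL answers with a 2xx status.
probe() {
  curl -fsS --max-time 5 -o /dev/null "$1"
}

# Report healthy only when every URL passes; stop at the first failure.
healthcheck() {
  for url in "$@"; do
    if ! probe "$url"; then
      echo "unhealthy: $url" >&2
      return 1
    fi
  done
  echo healthy
}
```

A script built on this might end with healthcheck "http://127.0.0.1:11434/api/version" "http://127.0.0.1:${CLAWLAMA_OPENCLAW_PORT:-18789}/", where the second URL is the assumed gateway address.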

OpenClaw Configuration

The following JSON config is generated at container startup from environment variables via envsubst:

{
  "models": {
    "providers": {
      "ollama": {
        "baseUrl": "http://localhost:11434/v1",
        "apiKey": "ollama-local",
        "api": "openai-completions",
        "models": [
          {
            "id": "${CLAWLAMA_MODEL}",
            "name": "${CLAWLAMA_MODEL}",
            "reasoning": false,
            "input": ["text"],
            "cost": {
              "input": 0,
              "output": 0,
              "cacheRead": 0,
              "cacheWrite": 0
            },
            "contextWindow": ${CLAWLAMA_CONTEXT_WINDOW},
            "maxTokens": ${CLAWLAMA_MAX_TOKENS}
          }
        ]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "ollama/${CLAWLAMA_MODEL}"
      },
      "workspace": "/data/workspace",
      "maxConcurrent": ${CLAWLAMA_MAX_CONCURRENT},
      "subagents": {
        "maxConcurrent": ${CLAWLAMA_SUBAGENT_CONCURRENT}
      }
    }
  },
  "tools": {
    "profile": "full",
    "exec": {
      "security": "full",
      "ask": "off",
      "backgroundMs": 10000,
      "timeoutSec": 1800,
      "applyPatch": {
        "enabled": true,
        "workspaceOnly": true
      }
    },
    "fs": {
      "workspaceOnly": false
    },
    "elevated": {
      "enabled": false
    }
  }
}
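
envsubst substitutes scalar values only, so the comma-separated CLAWLAMA_MODELS variable from the feature list needs a small expansion step to become one entry per model in the "models" array. The sketch below shows one way to do that in POSIX shell. The function name models_json is hypothetical, and only the id and name fields are emitted for brevity; the remaining per-model fields from the config above would need to be added.

```shell
#!/bin/sh
# Hypothetical sketch: turn "a,b" into a JSON array with one object per model.
# Assumes model names contain no quotes, whitespace, or glob characters.
models_json() {
  list=$1
  sep=""
  printf '['
  old_ifs=$IFS
  IFS=,
  for m in $list; do
    printf '%s{"id":"%s","name":"%s"}' "$sep" "$m" "$m"
    sep=","
  done
  IFS=$old_ifs
  printf ']\n'
}
```

The result could be exported as a variable and substituted into the template by envsubst in place of the single-model block.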

Tool Permission Policy

OpenClaw's tool policy operates at the tool level (allow/deny entire tools like exec, read, write), not at the command or path level. To enforce the desired restrictions (no git commit/push/reset --hard, no writes to /etc), ClawLama uses a layered approach:

| Layer | Mechanism | What It Enforces |
|---|---|---|
| Tool profile | tools.profile: "full" | All tools enabled: group:fs, group:runtime, group:ui, group:sessions, group:memory, group:automation, web_search, web_fetch |
| Exec security | tools.exec.security: "full" | Shell commands auto-approved (no prompt). Agent has full exec access. |
| Workspace TOOLS.md | Agent instruction file | Soft restrictions: instructs agent to never run git commit, git push, git reset --hard |
| Container filesystem | Dockerfile RUN chmod / read-only mounts | Hard restriction: /etc read-only at container level, preventing writes regardless of agent behavior |
| Elevated mode | tools.elevated.enabled: false | No host-level exec breakout (good hygiene) |
| Workspace scope | tools.exec.applyPatch.workspaceOnly: true | apply_patch operations restricted to workspace directory |

Tool Groups (Reference)

OpenClaw's built-in tool groups for use in tools.allow / tools.deny:

| Group | Tools |
|---|---|
| group:runtime | exec, bash, process |
| group:fs | read, write, edit, apply_patch |
| group:sessions | sessions_list, sessions_history, sessions_send, sessions_spawn, session_status |
| group:memory | memory_search, memory_get |
| group:ui | browser, canvas |
| group:automation | cron, gateway |
| group:messaging | message |
| group:nodes | nodes |

Git Restrictions (via TOOLS.md)

Since OpenClaw has no command-pattern deny list for exec, git restrictions are enforced via the workspace TOOLS.md file — an agent instruction document that OpenClaw injects into the system prompt:

<!-- /data/workspace/TOOLS.md -->
# Tool Usage Rules

## Git Restrictions (MANDATORY)
- NEVER run `git commit` in any form
- NEVER run `git push` in any form
- NEVER run `git reset --hard` in any form
- All other git commands are allowed (status, diff, log, add, branch, checkout, clone, pull, stash, etc.)

## Filesystem Restrictions
- Do NOT write to or delete files in /etc/
- The /etc directory is read-only at the container level

Important: TOOLS.md is a soft restriction. The LLM is instructed not to run these commands, but OpenClaw's tool policy engine does not technically block them. For hard enforcement, users should enable exec approvals (tools.exec.ask: "always"). The container-level /etc write protection (applied with chmod, not a read-only mount) is a hard restriction regardless.
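
One possible way to harden the git rules beyond TOOLS.md, which is not part of this spec, is a wrapper script placed ahead of the real git on PATH. The sketch below shows only the decision logic. The function name git_allowed is hypothetical, and the check inspects just the first argument as the subcommand, so invocations that put global options first (for example git -c key=value commit) would slip past it.

```shell
#!/bin/sh
# Hypothetical sketch: return 0 if the git invocation is permitted,
# 1 if it matches one of the denied forms from TOOLS.md.
git_allowed() {
  case "$1" in
    commit|push)
      return 1
      ;;
    reset)
      shift
      for arg in "$@"; do
        if [ "$arg" = "--hard" ]; then
          return 1
        fi
      done
      ;;
  esac
  return 0
}
```

A wrapper would call git_allowed "$@" and exec the real git binary only on success, printing a refusal message otherwise.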

/etc Protection (via Container)

Since OpenClaw's tools.fs.workspaceOnly is an all-or-nothing toggle, and we want reads everywhere but writes denied only to /etc, this is enforced at the Docker layer:

RUN chmod -R a-w /etc

Override Path

Users can override tool policy by bind-mounting a custom config:

docker run -v "$(pwd)/my-openclaw.json":/data/workspace/.openclaw/openclaw.json ...

Technical Constraints

  • CPU-optimized, GPU-optional — Image ships zero GPU libraries. Ollama runs CPU inference by default. GPU is activated automatically when user passes --gpus all and NVIDIA Container Toolkit is installed on host. No image rebuild needed.
  • Direct binary installation only — No curl | sh or curl | bash anywhere in the Dockerfile. All software installed via apt packages (Node.js via NodeSource repo with GPG key), direct binary download (Ollama from GitHub releases), and npm package manager (OpenClaw).
  • No version pinning — Ollama uses /releases/latest/download/, OpenClaw uses @latest, Node.js uses node_lts.x. Each docker build picks up the newest stable versions. No version ARGs to maintain.
  • Multi-arch manifest — Published image contains both linux/amd64 and linux/arm64. Built with docker buildx.
  • Platform-specific performance:
    • amd64: Ollama leverages AVX/AVX2/AVX-512 SIMD when available (AVX2 is present on most x86_64 CPUs from 2013 onward; AVX-512 is limited to some server and newer desktop parts).
    • arm64: Ollama leverages NEON SIMD (all ARMv8+). Apple Silicon performs well; Raspberry Pi 5 is functional but slower.
  • Node.js LTS required for OpenClaw (currently ≥ 22).
  • Single-entrypoint supervision pattern — Ollama runs as a background process, OpenClaw as the foreground process. The entrypoint handles process supervision, signal forwarding, and crash detection. No external supervisor (s6, supervisord) required.
  • Ollama must be healthy before OpenClaw starts — Entrypoint polls localhost:11434 with retry loop before launching OpenClaw.
  • No external API keys required — entire stack is zero-cost by design.
  • Model storage can be large (20B model ≈ 12-15 GB); /data/ollama volume mount is mandatory, not tmpfs.
  • RAM requirements (CPU inference):
    • 7B Q4 model: ~4-6 GB RAM minimum
    • 13B Q4 model: ~8-10 GB RAM minimum
    • 20B model: ~16 GB RAM minimum (default)
    • Recommend at least 2 GB headroom above model size for OpenClaw + Node.js + OS.
  • RAM requirements (GPU inference): Model must fit in VRAM. Partial offload (CPU+GPU split) is handled automatically by Ollama.
  • apiKey: "ollama-local" — Ollama doesn't require auth but OpenClaw config requires a non-empty value; this is a dummy placeholder.
  • Soft vs hard restrictions — OpenClaw's tool policy operates at tool granularity only. Git command restrictions are soft (TOOLS.md). /etc write protection is hard (OS-level chmod). For maximum safety, enable tools.exec.ask: "always".
  • Image size target — Under 500 MB compressed (excluding pulled models).
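
The "Ollama must be healthy before OpenClaw starts" constraint above implies a bounded retry loop. A minimal sketch follows, assuming a generic helper that retries any command. The name wait_until and its argument order are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

The entrypoint might call wait_until 60 2 curl -fsS -o /dev/null http://127.0.0.1:11434/api/version and exit with an error if it returns non-zero, so that OpenClaw never starts against an unavailable backend. The 60 attempts and 2-second delay are illustrative values.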

Security Considerations

  • OpenClaw gateway should NOT be exposed to the public internet without authentication.
  • Ollama API binds to 127.0.0.1 inside the container by default — not accessible from host unless explicitly exposed.
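  • All data stays local: no telemetry, no cloud inference calls, no API key leakage. Outbound traffic still occurs for model downloads and for any web tools (web_search, web_fetch) or messaging integrations that are enabled.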
  • Container runs as non-root user (clawlama) where possible. Entrypoint drops privileges after setup.
  • OpenClaw's prompt injection surface is inherited; users should review OpenClaw's security docs before enabling messaging integrations.
  • When GPU passthrough is enabled (--gpus all), the container gains access to host GPU devices — standard NVIDIA Container Toolkit security model applies.

Success Criteria

  1. docker pull docker.io/casjaysdevdocker/clawlama:latest succeeds on both amd64 and arm64 hosts.
  2. docker run -d -v clawlama-data:/data -p 18789:18789 docker.io/casjaysdevdocker/clawlama:latest brings up both services with no manual intervention.
  3. Ollama model is pulled automatically on first run using CPU inference.
  4. OpenClaw agent responds to queries using the local Ollama model within 120 seconds of container start once the model is cached (CPU inference baseline; faster with GPU). The first run additionally waits for the model download.
  5. Changing CLAWLAMA_MODEL env var and restarting pulls the new model and regenerates config.
  6. Container restart preserves all workspace data and downloaded models via /data volume.
  7. docker stop clawlama && docker start clawlama recovers to a working state.
  8. Runs successfully on: x86_64 Linux server (cloud VM), Apple Silicon Mac (Docker Desktop), Raspberry Pi 5 (arm64).
  9. When launched with --gpus all on an NVIDIA host, Ollama detects and uses GPU — verified in logs.
  10. When launched without --gpus on any host, Ollama runs CPU-only — no GPU-related errors in logs.

Out of Scope

  • Baking GPU libraries into the image — GPU support is via NVIDIA Container Toolkit runtime passthrough only.
  • AMD ROCm / Intel Arc GPU support — Only NVIDIA GPUs supported via container toolkit.
  • Custom OpenClaw skill development (users add their own post-deploy).
  • Building or fine-tuning custom Ollama models.
  • Production-grade reverse proxy / TLS termination (user's responsibility).
  • OpenClaw's built-in onboarding wizard (openclaw onboard) — replaced by container auto-config.
  • iMessage / BlueBubbles / platform-specific integrations requiring host OS access.

References