clawlama/AI.md

## Project Overview

**ClawLama** is a CPU-optimized, multi-arch (amd64/arm64), single-container AI assistant that bundles [OpenClaw](https://github.com/openclaw/openclaw) and [Ollama](https://ollama.com) into one self-contained Docker image. It is zero-cost, fully local, and privacy-first: GPU-accelerated when available, fully functional without it.

- **Image:** `docker.io/casjaysdevdocker/clawlama:latest`
- **Base:** `debian:bookworm-slim`
- **Platforms:** `linux/amd64`, `linux/arm64`
- **Single container:** Both Ollama and OpenClaw run inside one image; no compose file is required.

**Source Reference:** Based on [iam-veeramalla's OpenClaw + Ollama guide](https://gist.github.com/iam-veeramalla/9d10f968038ee76d5bc374b44f0cf8bb).

---

## Problem Statement

Running OpenClaw with a local Ollama model requires manual multi-step setup: installing OpenClaw, installing Ollama, pulling a model, writing a JSON config, and wiring everything together. ClawLama eliminates this friction by packaging everything into a single container — just `docker run`.

---

## Architecture

```
┌───────────────────────────────────────────────────────────┐
│  docker.io/casjaysdevdocker/clawlama:latest               │
│  debian:bookworm-slim | linux/amd64, linux/arm64          │
│                                                           │
│  ┌─────────────────────────────────────────────────────┐  │
│  │                   entrypoint.sh                     │  │
│  │  • Detect arch (amd64/arm64) + GPU (nvidia-smi)     │  │
│  │  • Generate openclaw.json from env vars             │  │
│  │  • Start Ollama (background)                        │  │
│  │  • Wait for Ollama health                           │  │
│  │  • Pull model if not cached                         │  │
│  │  • Start OpenClaw gateway (foreground)              │  │
│  └─────────────────────────────────────────────────────┘  │
│                                                           │
│  ┌──────────────┐    ┌───────────────────┐                │
│  │  OpenClaw    │───▶│  Ollama (CPU/GPU) │                │
│  │  Gateway +   │    │  localhost:11434   │                │
│  │  Agent       │    │  Auto-detects GPU  │                │
│  │  :18789      │    │  at runtime        │                │
│  └──────────────┘    └───────────────────┘                │
│         │                      │                          │
│         ▼                      ▼                          │
│  ┌─────────────┐     ┌────────────────┐                   │
│  │  /data/      │     │  /data/        │                   │
│  │  workspace/  │     │  ollama/       │                   │
│  │  (volume)    │     │  (volume)      │                   │
│  └─────────────┘     └────────────────┘                   │
└───────────────────────────────────────────────────────────┘
```

### Single-Image Architecture

Both Ollama and OpenClaw run inside one container. The entrypoint manages process lifecycle:

1. **Ollama** starts as a background process, binding to `localhost:11434`.
2. **OpenClaw** starts as the foreground process after Ollama is healthy, connecting to `http://localhost:11434/v1`.
3. If Ollama crashes, the entrypoint detects it and exits; the container then restarts only if a Docker restart policy (e.g., `--restart unless-stopped`) is set.
4. Signals (SIGTERM/SIGINT) are forwarded to both processes for clean shutdown.
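
Steps 3 and 4 above can be sketched in POSIX shell roughly as follows. This is a minimal sketch, not the project's actual `entrypoint.sh`: the `supervise` function name and the choice to pass both commands as plain strings (assumed to contain no shell-special characters) are illustrative assumptions.

```sh
#!/bin/sh
# Hypothetical helper: run a backend and a frontend command side by side,
# forward SIGTERM/SIGINT to both, and return as soon as either one exits.
supervise() {
    backend_cmd=$1
    frontend_cmd=$2

    # "exec" makes $! the PID of the real process, not of a wrapper shell.
    sh -c "exec $backend_cmd" &
    backend_pid=$!
    sh -c "exec $frontend_cmd" &
    frontend_pid=$!

    # Forward termination signals to both children.
    trap 'kill -TERM "$backend_pid" "$frontend_pid" 2>/dev/null' TERM INT

    # Crash detection: poll until one of the two processes is gone.
    while kill -0 "$backend_pid" 2>/dev/null &&
          kill -0 "$frontend_pid" 2>/dev/null; do
        sleep 1
    done

    # Stop whichever process is still running, then collect exit codes.
    kill -TERM "$backend_pid" "$frontend_pid" 2>/dev/null || true
    backend_rc=0
    wait "$backend_pid" || backend_rc=$?
    frontend_rc=0
    wait "$frontend_pid" || frontend_rc=$?
    trap - TERM INT

    echo "backend exited with $backend_rc, frontend with $frontend_rc" >&2
    # Succeed only if both sides exited cleanly, so a restart policy can react.
    [ "$backend_rc" -eq 0 ] && [ "$frontend_rc" -eq 0 ]
}
```

In the container this might be invoked as `supervise "ollama serve" "openclaw gateway --port ${CLAWLAMA_OPENCLAW_PORT}"`, where the gateway command line is an assumption about OpenClaw's CLI. A real entrypoint would also wait for Ollama to become healthy before launching the gateway, which this sketch leaves out.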

### GPU Runtime Detection

The image ships **no GPU libraries**. GPU acceleration is achieved through NVIDIA Container Toolkit runtime passthrough:

- At startup, entrypoint runs `nvidia-smi` to detect GPU availability.
- **GPU found:** Ollama automatically uses CUDA via the mounted NVIDIA runtime. Logs report GPU model and VRAM.
- **No GPU:** Ollama falls back to CPU inference. No errors, no warnings — this is the expected default path.
- To enable GPU acceleration, pass `--gpus all` to `docker run` (requires the NVIDIA Container Toolkit on the host).
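
The probe described above could look something like the following. It is a sketch under assumptions: the `detect_gpu` name and its output format are hypothetical, while `--query-gpu` and `--format=csv,noheader` are standard `nvidia-smi` options.

```sh
#!/bin/sh
# Hypothetical helper: print one "gpu: <name>, <vram>" line per visible
# NVIDIA GPU, or "cpu-only" when nvidia-smi is missing or reports nothing.
detect_gpu() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        gpu_info=$(nvidia-smi --query-gpu=name,memory.total \
                              --format=csv,noheader 2>/dev/null)
        if [ -n "$gpu_info" ]; then
            printf '%s\n' "$gpu_info" | sed 's/^/gpu: /'
            return 0
        fi
    fi
    # Expected default path: no GPU is not an error.
    echo "cpu-only"
}
```

The function always returns success, so a caller running under `set -e` does not abort on the CPU-only path; the caller only inspects the printed text for the startup banner.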

### Ports

- `18789` — OpenClaw gateway (exposed)
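- `11434` — Ollama API (internal only by default; to reach it from the host for debugging, publish it with `-p 11434:11434` and set `OLLAMA_HOST=0.0.0.0:11434`, since the default bind address is container-local)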

### Installation Method (No curl | sh)

- **Ollama:** Latest release binary downloaded directly from GitHub at build time: `https://github.com/ollama/ollama/releases/latest/download/ollama-linux-${TARGETARCH}`. No version pinning — always gets the newest stable release.
- **OpenClaw:** Installed via `npm install -g openclaw@latest`. Always gets the newest published version.
- **Node.js:** Current LTS from [NodeSource apt repository](https://deb.nodesource.com) (`node_lts.x`) with GPG key verification. Auto-advances to next LTS major (e.g., 22 → 24) when Node promotes it.

---

## Features

1. **Single container, single command** — `docker run -d docker.io/casjaysdevdocker/clawlama:latest` launches both Ollama and OpenClaw. No compose file required for basic use.
2. **debian:bookworm-slim base** — Minimal Debian with glibc for full Ollama SIMD compatibility (AVX/AVX2 on amd64, NEON on arm64).
3. **Direct binary installation (no curl | sh):**
   - Ollama: latest release binary from GitHub releases (`/releases/latest/download/`), selected per `TARGETARCH`.
   - OpenClaw: `npm install -g openclaw@latest`.
   - Node.js: current LTS from NodeSource apt repo (`node_lts.x`) with GPG key verification.
4. **Pre-configured OpenClaw ↔ Ollama wiring** — OpenClaw config auto-generated at startup pointing to `http://localhost:11434/v1` with zero-cost pricing.
5. **Persistent volumes** — Two data directories under the single `/data` volume: `/data/ollama` (model store) and `/data/workspace` (OpenClaw). Both survive restarts.
6. **Default model: `gpt-oss:20b`** — Automatically pulled on first launch before OpenClaw starts.
7. **Health checks** — Container `HEALTHCHECK` verifies both Ollama API and OpenClaw gateway.
8. **Environment variable overrides:**
   - `CLAWLAMA_MODEL` — Model to pull and use (default: `gpt-oss:20b`)
   - `CLAWLAMA_CONTEXT_WINDOW` — Context window size (default: `131072`)
   - `CLAWLAMA_MAX_TOKENS` — Max output tokens (default: `8192`)
   - `CLAWLAMA_MAX_CONCURRENT` — Agent concurrency (default: `4`)
   - `CLAWLAMA_SUBAGENT_CONCURRENT` — Subagent concurrency (default: `8`)
   - `CLAWLAMA_OPENCLAW_PORT` — OpenClaw gateway port (default: `18789`)
   - `OLLAMA_NUM_THREADS` — CPU threads for inference (default: auto-detect physical cores)
   - `OLLAMA_NUM_PARALLEL` — Max parallel requests (default: `1`)
   - `OLLAMA_MAX_LOADED_MODELS` — Models in memory (default: `1`)
   - `OLLAMA_HOST` — Ollama bind address (default: `127.0.0.1:11434`)
9. **Multi-arch (amd64 + arm64)** — Single manifest tag built with `docker buildx`. Ollama binary selected by `TARGETARCH`. All scripts POSIX shell.
10. **Runtime GPU detection** — Entrypoint probes `nvidia-smi`. GPU used automatically if available via `--gpus all`. No GPU libs in image; NVIDIA Container Toolkit handles passthrough. CPU is the default and primary path.
11. **Model swap without rebuild** — Changing `CLAWLAMA_MODEL` and restarting pulls the new model and regenerates config.
12. **CPU performance auto-tuning** — Entrypoint auto-detects physical cores and available RAM, sets `OLLAMA_NUM_THREADS` to the physical core count if unset, and logs the detected values.
13. **Telegram integration helper** — Optional `CLAWLAMA_TELEGRAM_BOT_TOKEN` env var auto-configures Telegram channel.
14. **Full tool profile with layered restrictions** — All OpenClaw tools enabled via `profile: "full"`. Git and `/etc` restrictions enforced via TOOLS.md (soft) and container filesystem (hard).
15. **Startup banner** — Print connection info, detected arch, CPU cores, RAM, GPU status, and model to stdout.
16. **Quantized model recommendations** — README documents CPU-friendly models by RAM tier:
    - 8 GB RAM: 7B Q4 variants
    - 16 GB RAM: `gpt-oss:20b` (default) or 13B Q5
    - 32+ GB RAM: 20B+ full or 34B Q4
17. **docker-compose.yml included** — Provided for users who prefer compose, with volume mounts and restart policy pre-configured.
18. **Multi-model support** — Comma-separated `CLAWLAMA_MODELS` env var configures multiple models in the OpenClaw provider config.
19. **Backup/restore scripts** — Shell scripts to tar `/data` volumes for migration.
20. **Portainer/Dockge compatible** — Compose file works with popular Docker management UIs.
21. **Architecture detection in logs** — Log detected arch and SIMD instruction sets (AVX, AVX2, AVX-512, NEON) for performance troubleshooting.
22. **GPU VRAM-aware model selection** — When GPU detected, log VRAM and suggest optimal model/quantization for available resources.
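
Features 12 and 21 both rely on reading CPU details at startup. One way to do that, sketched here under assumptions, is to parse `/proc/cpuinfo`. The function names are hypothetical, and the fallback of counting `processor` lines when core IDs are absent (common on arm64) is a simplification that may overcount on SMT systems.

```sh
#!/bin/sh
# Hypothetical helper: count physical cores from a cpuinfo-style file.
# x86 kernels list "physical id" and "core id" for every logical CPU.
count_physical_cores() {
    cpuinfo=${1:-/proc/cpuinfo}
    cores=$(awk -F': *' '
        /^physical id/ { socket = $2 }
        /^core id/     { seen[socket "/" $2] = 1 }
        END            { n = 0; for (k in seen) n++; print n }
    ' "$cpuinfo")
    if [ "${cores:-0}" -eq 0 ]; then
        # No core IDs available: fall back to logical CPUs, at least 1.
        cores=$(grep -c '^processor' "$cpuinfo") || cores=1
    fi
    echo "$cores"
}

# Hypothetical helper: list the SIMD extensions named in this spec.
# On arm64 the kernel reports NEON as "asimd" in the Features line.
detect_simd() {
    cpuinfo=${1:-/proc/cpuinfo}
    flags=$(awk -F: '/^(flags|Features)/ { print $2; exit }' "$cpuinfo")
    found=""
    for f in $flags; do
        case $f in
            avx)     found="$found AVX" ;;
            avx2)    found="$found AVX2" ;;
            avx512f) found="$found AVX-512" ;;
            asimd)   found="$found NEON" ;;
        esac
    done
    echo "${found# }"
}
```

An entrypoint could then apply the default with `: "${OLLAMA_NUM_THREADS:=$(count_physical_cores)}"` and print `detect_simd` output in the startup banner.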

---

## File Structure

```
clawlama/
├── AI.md                              # This spec
├── TODO.AI.md                         # Task tracking
├── Dockerfile                         # Multi-stage, multi-arch (amd64 + arm64)
├── docker-compose.yml                 # Optional compose file for convenience
├── .env.example                       # Template environment variables
├── rootfs/
│   ├── usr/local/bin/
│   │   ├── entrypoint.sh             # Main entrypoint: detect GPU, gen config, start services
│   │   └── healthcheck.sh            # Health check script for HEALTHCHECK instruction
│   └── etc/clawlama/
│       ├── openclaw.template.json    # OpenClaw config template (envsubst-ready)
│       └── TOOLS.md                  # Agent tool usage rules (git deny, /etc deny)
├── scripts/
│   ├── build.sh                      # Multi-arch buildx build + push
│   ├── backup.sh                     # Backup /data volumes
│   └── restore.sh                    # Restore /data volumes
└── README.md                         # User-facing documentation
```
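
As a rough illustration of what `scripts/backup.sh` and `scripts/restore.sh` in the tree above might contain, the sketch below archives and restores a data directory with `tar`. The function names and argument order are assumptions, not the project's actual scripts.

```sh
#!/bin/sh
# Hypothetical sketch: archive a data directory into a gzip-compressed tarball.
backup_data() {
    src_dir=$1
    out_file=$2
    [ -d "$src_dir" ] || { echo "no such directory: $src_dir" >&2; return 1; }
    # -C keeps archive paths relative to the data root.
    tar -czf "$out_file" -C "$src_dir" .
}

# Hypothetical sketch: unpack such an archive into a (possibly new) directory.
restore_data() {
    archive=$1
    dest_dir=$2
    mkdir -p "$dest_dir" && tar -xzf "$archive" -C "$dest_dir"
}
```

For a named Docker volume, the same `tar` call can run in a throwaway container, for example `docker run --rm -v clawlama-data:/data:ro -v "$(pwd)":/backup debian:bookworm-slim tar -czf /backup/clawlama-data.tgz -C /data .`.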

---

## Quick Start

```bash
# CPU-only (default)
docker run -d \
  --name clawlama \
  -v clawlama-data:/data \
  -p 18789:18789 \
  docker.io/casjaysdevdocker/clawlama:latest

# With NVIDIA GPU acceleration
docker run -d \
  --name clawlama \
  --gpus all \
  -v clawlama-data:/data \
  -p 18789:18789 \
  docker.io/casjaysdevdocker/clawlama:latest

# Custom model + expose Ollama API for debugging
docker run -d \
  --name clawlama \
  -v clawlama-data:/data \
  -p 18789:18789 \
  -p 11434:11434 \
  -e CLAWLAMA_MODEL=qwen2:7b \
  -e OLLAMA_HOST=0.0.0.0:11434 \
  docker.io/casjaysdevdocker/clawlama:latest
```

---

## Dockerfile Sketch

```dockerfile
# ── Stage 1: Build dependencies ──────────────────────────────
FROM debian:bookworm-slim AS builder

ARG TARGETARCH

# Install Node.js LTS from NodeSource apt repo (no curl | sh)
RUN apt-get update && apt-get install -y --no-install-recommends \
      ca-certificates curl gnupg gettext-base && \
    mkdir -p /etc/apt/keyrings && \
    curl -fsSL https://deb.nodesource.com/gpgkey/nodesource-repo.gpg.key \
      | gpg --dearmor -o /etc/apt/keyrings/nodesource.gpg && \
    echo "deb [signed-by=/etc/apt/keyrings/nodesource.gpg] https://deb.nodesource.com/node_lts.x nodistro main" \
      > /etc/apt/sources.list.d/nodesource.list && \
    apt-get update && apt-get install -y --no-install-recommends nodejs && \
    rm -rf /var/lib/apt/lists/*

# Download latest Ollama binary directly (no curl | sh, no version pinning)
RUN curl -fsSL -o /usr/local/bin/ollama \
      "https://github.com/ollama/ollama/releases/latest/download/ollama-linux-${TARGETARCH}" && \
    chmod +x /usr/local/bin/ollama

# Install latest OpenClaw via npm
RUN npm install -g openclaw@latest

# ── Stage 2: Runtime ─────────────────────────────────────────
FROM debian:bookworm-slim

# Install Node.js LTS runtime (same repo method, no dev packages)
RUN apt-get update && apt-get install -y --no-install-recommends \
      ca-certificates curl gnupg tini procps gettext-base && \
    mkdir -p /etc/apt/keyrings && \
    curl -fsSL https://deb.nodesource.com/gpgkey/nodesource-repo.gpg.key \
      | gpg --dearmor -o /etc/apt/keyrings/nodesource.gpg && \
    echo "deb [signed-by=/etc/apt/keyrings/nodesource.gpg] https://deb.nodesource.com/node_lts.x nodistro main" \
      > /etc/apt/sources.list.d/nodesource.list && \
    apt-get update && apt-get install -y --no-install-recommends nodejs && \
    rm -rf /var/lib/apt/lists/*

# Copy Ollama binary
COPY --from=builder /usr/local/bin/ollama /usr/local/bin/ollama

# Copy OpenClaw global install
COPY --from=builder /usr/lib/node_modules /usr/lib/node_modules
COPY --from=builder /usr/bin/openclaw /usr/bin/openclaw

# Copy rootfs overlay
COPY rootfs/ /

# Create non-root user and data directories
RUN groupadd -r clawlama && useradd -r -g clawlama -m clawlama && \
    mkdir -p /data/ollama /data/workspace && \
    chown -R clawlama:clawlama /data

# Hard restriction: make /etc read-only for non-root users
RUN chmod -R a-w /etc

# Environment defaults (CPU-optimized)
ENV CLAWLAMA_MODEL=gpt-oss:20b \
    CLAWLAMA_CONTEXT_WINDOW=131072 \
    CLAWLAMA_MAX_TOKENS=8192 \
    CLAWLAMA_MAX_CONCURRENT=4 \
    CLAWLAMA_SUBAGENT_CONCURRENT=8 \
    CLAWLAMA_OPENCLAW_PORT=18789 \
    OLLAMA_HOST=127.0.0.1:11434 \
    OLLAMA_MODELS=/data/ollama \
    OLLAMA_NUM_PARALLEL=1 \
    OLLAMA_MAX_LOADED_MODELS=1

VOLUME ["/data"]
EXPOSE 18789

HEALTHCHECK --interval=30s --timeout=10s --start-period=120s --retries=3 \
  CMD /usr/local/bin/healthcheck.sh

ENTRYPOINT ["tini", "--"]
CMD ["/usr/local/bin/entrypoint.sh"]
```

**Notes:**
- `tini` is the PID 1 init for proper signal handling — no zombie processes.
- `TARGETARCH` is automatically set by `docker buildx` (`amd64` or `arm64`).
- `curl` used only for apt key download and direct binary fetch — never piped to shell.
- Builder stage is discarded; runtime image contains only what's needed.
- `OLLAMA_MODELS=/data/ollama` ensures models persist in the volume.
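- `/etc` made non-writable via `chmod` — this only constrains non-root processes (root bypasses permission bits), so hard enforcement of `/etc` write protection depends on the agent running as the `clawlama` user.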
- Entrypoint copies `TOOLS.md` into `/data/workspace/TOOLS.md` on first run (OpenClaw reads this as agent instructions).
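
The `HEALTHCHECK` instruction above delegates to `healthcheck.sh`, which the spec says must verify both services. A minimal sketch follows. `/api/tags` is Ollama's model-listing endpoint; treating any HTTP response on the gateway port as healthy is an assumption, since this spec names no OpenClaw health endpoint.

```sh
#!/bin/sh
# Hypothetical sketch of healthcheck.sh: succeed only if both services answer.
healthcheck() {
    ollama_url="http://127.0.0.1:11434/api/tags"
    gateway_url="http://127.0.0.1:${CLAWLAMA_OPENCLAW_PORT:-18789}/"

    # -f makes curl fail on HTTP error statuses from Ollama.
    curl -fsS --max-time 5 -o /dev/null "$ollama_url" || {
        echo "unhealthy: ollama" >&2
        return 1
    }

    # Assumption: any HTTP response from the gateway port counts as healthy.
    curl -sS --max-time 5 -o /dev/null "$gateway_url" || {
        echo "unhealthy: openclaw gateway" >&2
        return 1
    }

    return 0
}
```

A script built on this would end with a call to `healthcheck`, so its exit status becomes the container's health status.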

---

## OpenClaw Configuration

The following template is rendered into the OpenClaw JSON config at container startup by substituting environment variables via `envsubst` (it becomes valid JSON only after substitution):

```json
{
  "models": {
    "providers": {
      "ollama": {
        "baseUrl": "http://localhost:11434/v1",
        "apiKey": "ollama-local",
        "api": "openai-completions",
        "models": [
          {
            "id": "${CLAWLAMA_MODEL}",
            "name": "${CLAWLAMA_MODEL}",
            "reasoning": false,
            "input": ["text"],
            "cost": {
              "input": 0,
              "output": 0,
              "cacheRead": 0,
              "cacheWrite": 0
            },
            "contextWindow": ${CLAWLAMA_CONTEXT_WINDOW},
            "maxTokens": ${CLAWLAMA_MAX_TOKENS}
          }
        ]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "ollama/${CLAWLAMA_MODEL}"
      },
      "workspace": "/data/workspace",
      "maxConcurrent": ${CLAWLAMA_MAX_CONCURRENT},
      "subagents": {
        "maxConcurrent": ${CLAWLAMA_SUBAGENT_CONCURRENT}
      }
    }
  },
  "tools": {
    "profile": "full",
    "exec": {
      "security": "full",
      "ask": "off",
      "backgroundMs": 10000,
      "timeoutSec": 1800,
      "applyPatch": {
        "enabled": true,
        "workspaceOnly": true
      }
    },
    "fs": {
      "workspaceOnly": false
    },
    "elevated": {
      "enabled": false
    }
  }
}
```
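
Rendering the template above might look like the following sketch. The `render_config` name is hypothetical. Passing an explicit variable list to `envsubst` is a standard feature of the tool and keeps any other `$...` sequences in the template untouched. The variables must be exported, which Docker `ENV` values are.

```sh
#!/bin/sh
# Hypothetical helper: render the OpenClaw config template with envsubst.
render_config() {
    template=$1
    output=$2
    # Only these variables are substituted; everything else is left as-is.
    vars='${CLAWLAMA_MODEL} ${CLAWLAMA_CONTEXT_WINDOW} ${CLAWLAMA_MAX_TOKENS}'
    vars="$vars "'${CLAWLAMA_MAX_CONCURRENT} ${CLAWLAMA_SUBAGENT_CONCURRENT}'
    mkdir -p "$(dirname "$output")"
    envsubst "$vars" < "$template" > "$output"
}
```

With the paths used elsewhere in this spec, the call would be `render_config /etc/clawlama/openclaw.template.json /data/workspace/.openclaw/openclaw.json`.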

### Tool Permission Policy

OpenClaw's tool policy operates at the **tool level** (allow/deny entire tools like `exec`, `read`, `write`), not at the command or path level. To enforce the desired restrictions (no `git commit/push/reset --hard`, no writes to `/etc`), ClawLama uses a **layered approach**:

| Layer | Mechanism | What It Enforces |
|-------|-----------|------------------|
| **Tool profile** | `tools.profile: "full"` | All tools enabled: `group:fs`, `group:runtime`, `group:ui`, `group:sessions`, `group:memory`, `group:automation`, `web_search`, `web_fetch` |
| **Exec security** | `tools.exec.security: "full"` | Shell commands auto-approved (no prompt). Agent has full exec access. |
| **Workspace TOOLS.md** | Agent instruction file | Soft restrictions: instructs agent to never run `git commit`, `git push`, `git reset --hard` |
| **Container filesystem** | Dockerfile `RUN chmod -R a-w /etc` plus non-root user | Hard restriction: `/etc` is not writable by the non-root `clawlama` user, so writes fail regardless of agent behavior |
| **Elevated mode** | `tools.elevated.enabled: false` | No host-level exec breakout (good hygiene) |
| **Workspace scope** | `tools.exec.applyPatch.workspaceOnly: true` | `apply_patch` operations restricted to workspace directory |

### Tool Groups (Reference)

OpenClaw's built-in tool groups for use in `tools.allow` / `tools.deny`:

| Group | Tools |
|-------|-------|
| `group:runtime` | `exec`, `bash`, `process` |
| `group:fs` | `read`, `write`, `edit`, `apply_patch` |
| `group:sessions` | `sessions_list`, `sessions_history`, `sessions_send`, `sessions_spawn`, `session_status` |
| `group:memory` | `memory_search`, `memory_get` |
| `group:ui` | `browser`, `canvas` |
| `group:automation` | `cron`, `gateway` |
| `group:messaging` | `message` |
| `group:nodes` | `nodes` |

### Git Restrictions (via TOOLS.md)

Since OpenClaw has no command-pattern deny list for `exec`, git restrictions are enforced via the workspace `TOOLS.md` file — an agent instruction document that OpenClaw injects into the system prompt:

```markdown
<!-- /data/workspace/TOOLS.md -->
# Tool Usage Rules

## Git Restrictions (MANDATORY)
- NEVER run `git commit` in any form
- NEVER run `git push` in any form
- NEVER run `git reset --hard` in any form
- All other git commands are allowed (status, diff, log, add, branch, checkout, clone, pull, stash, etc.)

## Filesystem Restrictions
- Do NOT write to or delete files in /etc/
- The /etc directory is read-only at the container level
```

**Important:** `TOOLS.md` is a **soft restriction** — the LLM is instructed not to run these commands, but OpenClaw's tool policy engine does not block them. For hard enforcement, users should enable exec approvals (`tools.exec.ask: "always"`). The container-level `/etc` write protection (permission bits, not a read-only mount) is a **hard restriction** regardless, as long as the agent runs as a non-root user.

### `/etc` Protection (via Container)

Since OpenClaw's `tools.fs.workspaceOnly` is an all-or-nothing toggle, and the goal is to allow reads everywhere while denying writes to `/etc` specifically, the restriction is enforced in the **image filesystem** rather than in OpenClaw:

```dockerfile
RUN chmod -R a-w /etc
```

### Override Path

Users can override tool policy by bind-mounting a custom config:
```bash
docker run -v "$(pwd)/my-openclaw.json":/data/workspace/.openclaw/openclaw.json ...
```

---

## Technical Constraints

- **CPU-optimized, GPU-optional** — Image ships zero GPU libraries. Ollama runs CPU inference by default. GPU is activated automatically when user passes `--gpus all` and NVIDIA Container Toolkit is installed on host. No image rebuild needed.
- **Direct binary installation only** — No `curl | sh` or `curl | bash` anywhere in the Dockerfile. All software installed via apt packages (Node.js via NodeSource repo with GPG key), direct binary download (Ollama from GitHub releases), and npm package manager (OpenClaw).
- **No version pinning** — Ollama uses `/releases/latest/download/`, OpenClaw uses `@latest`, Node.js uses `node_lts.x`. Each `docker build` picks up the newest stable versions. No version ARGs to maintain.
- **Multi-arch manifest** — Published image contains both `linux/amd64` and `linux/arm64`. Built with `docker buildx`.
- **Platform-specific performance:**
  - **amd64:** Ollama leverages AVX/AVX2/AVX-512 SIMD when available (AVX2 is present on most x86_64 CPUs from 2013 onward; AVX-512 only on some server and newer desktop parts).
  - **arm64:** Ollama leverages NEON SIMD (all ARMv8+). Apple Silicon performs well; Raspberry Pi 5 is functional but slower.
- **Node.js LTS** required for OpenClaw (currently ≥ 22).
- **Single-entrypoint supervision** — Ollama runs as a background process, OpenClaw as the foreground process. The entrypoint script handles process supervision, signal forwarding, and crash detection. No external supervisor (s6, supervisord) required.
- **Ollama must be healthy before OpenClaw starts** — Entrypoint polls `localhost:11434` with retry loop before launching OpenClaw.
- **No external API keys required** — entire stack is zero-cost by design.
- **Model storage** can be large (20B model ≈ 12-15 GB); `/data/ollama` volume mount is mandatory, not tmpfs.
- **RAM requirements (CPU inference):**
  - 7B Q4 model: ~4-6 GB RAM minimum
  - 13B Q4 model: ~8-10 GB RAM minimum
  - 20B model: ~16 GB RAM minimum (default)
  - Recommend at least 2 GB headroom above model size for OpenClaw + Node.js + OS.
- **RAM requirements (GPU inference):** For full GPU speed the model must fit in VRAM. If it does not, Ollama automatically splits layers between GPU and CPU (partial offload), at reduced throughput.
- **`apiKey: "ollama-local"`** — Ollama doesn't require auth but OpenClaw config requires a non-empty value; this is a dummy placeholder.
- **Soft vs hard restrictions** — OpenClaw's tool policy operates at tool granularity only. Git command restrictions are soft (TOOLS.md). `/etc` write protection is hard (OS-level chmod). For maximum safety, enable `tools.exec.ask: "always"`.
- **Image size target** — Under 500 MB compressed (excluding pulled models).
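
The "Ollama must be healthy before OpenClaw starts" constraint, together with the first-launch model pull, could be implemented along these lines. This is a sketch under assumptions: the function names, retry count, and delay are hypothetical. `/api/version`, `ollama show`, and `ollama pull` are existing Ollama interfaces.

```sh
#!/bin/sh
# Hypothetical helper: poll Ollama until it answers or retries run out.
wait_for_ollama() {
    url=${1:-http://127.0.0.1:11434/api/version}
    retries=${2:-30}
    delay=${3:-2}
    attempt=1
    while [ "$attempt" -le "$retries" ]; do
        if curl -fsS --max-time 2 -o /dev/null "$url" 2>/dev/null; then
            echo "ollama ready after $attempt attempt(s)"
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    echo "ollama not ready after $retries attempts" >&2
    return 1
}

# Hypothetical helper: pull a model only when it is not already stored.
# "ollama show" exits non-zero for a model that is not present locally.
ensure_model() {
    model=$1
    if ollama show "$model" >/dev/null 2>&1; then
        echo "model $model already cached"
    else
        ollama pull "$model"
    fi
}
```

An entrypoint would typically chain them as `wait_for_ollama && ensure_model "$CLAWLAMA_MODEL"` before starting the gateway, because `ollama pull` talks to the running server.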

---

## Security Considerations

- OpenClaw gateway should NOT be exposed to the public internet without authentication.
- Ollama API binds to `127.0.0.1` inside the container by default, so it is not reachable from the host unless `OLLAMA_HOST` is changed and the port is published.
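- Inference and workspace data stay local: no telemetry and no cloud model API keys. Outbound traffic is limited to model pulls and to any web tools or messaging integrations the user enables.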
- Container runs as non-root user (`clawlama`) where possible. Entrypoint drops privileges after setup.
- OpenClaw's prompt injection surface is inherited; users should review OpenClaw's security docs before enabling messaging integrations.
- When GPU passthrough is enabled (`--gpus all`), the container gains access to host GPU devices — standard NVIDIA Container Toolkit security model applies.

---

## Success Criteria

1. `docker pull docker.io/casjaysdevdocker/clawlama:latest` succeeds on both amd64 and arm64 hosts.
2. `docker run -d -v clawlama-data:/data -p 18789:18789 docker.io/casjaysdevdocker/clawlama:latest` brings up both services with no manual intervention.
3. The Ollama model is pulled automatically on first run, and inference works on CPU alone.
4. OpenClaw agent responds to queries using the local Ollama model within 120 seconds of container start when the model is already cached (CPU inference baseline; faster with GPU). The first run additionally waits for the model download.
5. Changing `CLAWLAMA_MODEL` env var and restarting pulls the new model and regenerates config.
6. Container restart preserves all workspace data and downloaded models via `/data` volume.
7. `docker stop clawlama && docker start clawlama` recovers to working state.
8. Runs successfully on: x86_64 Linux server (cloud VM), Apple Silicon Mac (Docker Desktop), Raspberry Pi 5 (arm64).
9. When launched with `--gpus all` on an NVIDIA host, Ollama detects and uses GPU — verified in logs.
10. When launched without `--gpus` on any host, Ollama runs CPU-only — no GPU-related errors in logs.

---

## Out of Scope

- **Baking GPU libraries into the image** — GPU support is via NVIDIA Container Toolkit runtime passthrough only.
- **AMD ROCm / Intel Arc GPU support** — Only NVIDIA GPUs supported via container toolkit.
- Custom OpenClaw skill development (users add their own post-deploy).
- Building or fine-tuning custom Ollama models.
- Production-grade reverse proxy / TLS termination (user's responsibility).
- OpenClaw's built-in onboarding wizard (`openclaw onboard`) — replaced by container auto-config.
- iMessage / BlueBubbles / platform-specific integrations requiring host OS access.

---

## References

- [OpenClaw GitHub](https://github.com/openclaw/openclaw)
- [Ollama GitHub](https://github.com/ollama/ollama)
- [Source Gist — iam-veeramalla](https://gist.github.com/iam-veeramalla/9d10f968038ee76d5bc374b44f0cf8bb)
- [OpenClaw Docker Docs](https://github.com/openclaw/openclaw#docker)
- [Ollama Docker Image](https://hub.docker.com/r/ollama/ollama)