
# How to Connect WhatsApp to Ollama (Local AI)
Run your WhatsApp chatbot entirely on your own machine — no API keys, no cloud costs, no data leaving your network. Ollama makes it easy to run open-source models like Llama, Gemma, and Mistral locally.
## Prerequisites

- A paired whatsmeow-node session (How to Pair)
- Ollama installed and running
- A model pulled: `ollama pull llama3.2`
- The Ollama SDK: `npm install ollama`
## Step 1: Pull a Model

```shell
# Install Ollama from https://ollama.com, then:
ollama pull llama3.2
```
Other good choices for chat:

- `gemma3` — Google's open model, fast and capable
- `mistral` — Strong for its size
- `llama3.2:1b` — Smallest Llama, fastest responses
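If you want to experiment with these models without editing code, the model name can be read from an environment variable instead of being hard-coded. A small sketch — `OLLAMA_BOT_MODEL` is an arbitrary variable name chosen here, not something the SDK reads:

```typescript
// Pick the model from the environment, falling back to llama3.2.
// OLLAMA_BOT_MODEL is a made-up name for this sketch.
const MODEL = process.env.OLLAMA_BOT_MODEL ?? "llama3.2";
console.log(`Using model: ${MODEL}`);
```

Then `OLLAMA_BOT_MODEL=gemma3 node bot.js` switches models without touching the source.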
## Step 2: Set Up Both Clients

```ts
import { createClient } from "@whatsmeow-node/whatsmeow-node";
import { Ollama } from "ollama";

const client = createClient({ store: "session.db" });
const ollama = new Ollama({ host: "http://localhost:11434" });

const MODEL = "llama3.2";
const SYSTEM_PROMPT =
  "You are a helpful WhatsApp assistant. Keep responses concise — under 500 characters when possible, since this is a chat interface.";
```
## Step 3: Handle Incoming Messages

```ts
client.on("message", async ({ info, message }) => {
  if (info.isFromMe) return;

  const text =
    (message.conversation as string) ??
    (message.extendedTextMessage as { text?: string } | undefined)?.text;
  if (!text) return;

  await client.sendChatPresence(info.chat, "composing");

  const reply = await askOllama(info.sender, text);
  await client.sendMessage(info.chat, { conversation: reply });
});
```
## Step 4: Send to Ollama

```ts
async function askOllama(userJid: string, userMessage: string): Promise<string> {
  const response = await ollama.chat({
    model: MODEL,
    messages: [
      { role: "system", content: SYSTEM_PROMPT },
      { role: "user", content: userMessage },
    ],
  });
  return response.message.content;
}
```
## Step 5: Add Conversation History

```ts
import type { Message } from "ollama";

const conversations = new Map<string, Message[]>();
const MAX_HISTORY = 20;

async function askOllama(userJid: string, userMessage: string): Promise<string> {
  const history = conversations.get(userJid) ?? [];
  history.push({ role: "user", content: userMessage });

  // Keep only the most recent turns so the prompt stays bounded.
  if (history.length > MAX_HISTORY) {
    history.splice(0, history.length - MAX_HISTORY);
  }

  const response = await ollama.chat({
    model: MODEL,
    messages: [{ role: "system", content: SYSTEM_PROMPT }, ...history],
  });

  const reply = response.message.content;
  history.push({ role: "assistant", content: reply });
  conversations.set(userJid, history);
  return reply;
}
```
## Complete Example

```ts
import { createClient } from "@whatsmeow-node/whatsmeow-node";
import { Ollama } from "ollama";
import type { Message } from "ollama";

const client = createClient({ store: "session.db" });
const ollama = new Ollama({ host: "http://localhost:11434" });

const MODEL = "llama3.2";
const SYSTEM_PROMPT =
  "You are a helpful WhatsApp assistant. Keep responses concise — under 500 characters when possible, since this is a chat interface.";

const conversations = new Map<string, Message[]>();
const MAX_HISTORY = 20;

async function askOllama(userJid: string, userMessage: string): Promise<string> {
  const history = conversations.get(userJid) ?? [];
  history.push({ role: "user", content: userMessage });
  if (history.length > MAX_HISTORY) {
    history.splice(0, history.length - MAX_HISTORY);
  }

  const response = await ollama.chat({
    model: MODEL,
    messages: [{ role: "system", content: SYSTEM_PROMPT }, ...history],
  });

  const reply = response.message.content;
  history.push({ role: "assistant", content: reply });
  conversations.set(userJid, history);
  return reply;
}

client.on("message", async ({ info, message }) => {
  if (info.isFromMe) return;

  const text =
    (message.conversation as string) ??
    (message.extendedTextMessage as { text?: string } | undefined)?.text;
  if (!text) return;

  console.log(`${info.pushName}: ${text}`);
  await client.sendChatPresence(info.chat, "composing");

  try {
    const reply = await askOllama(info.sender, text);
    await client.sendMessage(info.chat, { conversation: reply });
    console.log(`→ ${reply.slice(0, 80)}...`);
  } catch (err) {
    console.error("Ollama error:", err);
    await client.sendMessage(info.chat, {
      conversation: "Sorry, I'm having trouble right now. Make sure Ollama is running.",
    });
  }
});

client.on("logged_out", ({ reason }) => {
  console.error(`Logged out: ${reason}`);
  client.close();
  process.exit(1);
});

async function main() {
  const { jid } = await client.init();
  if (!jid) {
    console.error("Not paired! See: How to Pair WhatsApp");
    process.exit(1);
  }

  await client.connect();
  await client.sendPresence("available");
  console.log(`Ollama bot is online! (model: ${MODEL})`);

  process.on("SIGINT", async () => {
    await client.sendPresence("unavailable");
    await client.disconnect();
    client.close();
    process.exit(0);
  });
}

main().catch(console.error);
```
## Common Pitfalls
Make sure the Ollama server is running (`ollama serve`) before starting the bot. If it's not running, all requests will fail.
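Rather than discovering the problem on the first incoming message, you can fail fast with an explicit health check at startup. A minimal sketch — `isOllamaUp` is a helper name invented here, and it relies on the Ollama server answering plain HTTP requests on its root URL:

```typescript
// Returns true when an Ollama server answers at the given host.
async function isOllamaUp(host = "http://localhost:11434"): Promise<boolean> {
  try {
    const res = await fetch(host); // a running server responds on its root URL
    return res.ok;
  } catch {
    return false; // connection refused: server not running
  }
}
```

Call it before `client.init()` and exit with a clear error message when it returns `false`, instead of letting every chat request fail later.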
Local models run on your CPU/GPU. Smaller models like `llama3.2:1b` respond in 1–3 seconds on modern hardware. Larger models may take 10+ seconds — the typing indicator keeps the user informed while they wait.
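If you'd rather not keep users waiting indefinitely on a slow model, you can cap the wait. This is a generic promise-timeout wrapper, not part of whatsmeow-node or the Ollama SDK; `withTimeout` is a name made up for this sketch:

```typescript
// Rejects with a timeout error if the wrapped promise takes longer than ms.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`Model did not reply within ${ms}ms`)),
      ms,
    );
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Example: give the model 30 seconds, then let the existing catch block
// send the fallback apology.
// const reply = await withTimeout(askOllama(info.sender, text), 30_000);
```

Note this only stops the bot from waiting; the underlying Ollama request may still run to completion in the background.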
Always check `info.isFromMe` first. Without this check, the bot replies to its own messages forever.
Run `ollama pull llama3.2` before starting the bot. If the model isn't downloaded, requests will fail.
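You can also verify this at startup by listing installed models with the SDK's `list()` call. The name-matching helper below (`hasModel`, invented for this sketch) assumes Ollama reports untagged names like `llama3.2` as `llama3.2:latest`:

```typescript
// True when `wanted` matches an installed model name, treating an
// untagged name like "llama3.2" as "llama3.2:latest".
function hasModel(installed: string[], wanted: string): boolean {
  const withTag = wanted.includes(":") ? wanted : `${wanted}:latest`;
  return installed.some((name) => name === wanted || name === withTag);
}

// Startup check, using the `ollama` client from Step 2
// (requires a running Ollama server):
// const { models } = await ollama.list();
// if (!hasModel(models.map((m) => m.name), MODEL)) {
//   console.error(`Model missing. Run: ollama pull ${MODEL}`);
//   process.exit(1);
// }
```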