---
title: "B2B Export AI Customer Service Bot: Technical Design"
slug: b2b-ai-customer-bot-design
date: 2026-05-27
author: Frankie 徐
category: ai
tags: ['claude', 'rag', 'cloudflare', 'ai-agents', 'prompt-engineering', 'b2b']
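description: "Building a genuinely intelligent B2B AI customer service bot rather than an artificially stupid one. A complete technical design covering cross-session memory, the tool loop, prompt engineering, and the data flywheel, built on Cloudflare Workers + Anthropic Claude + AutoRAG."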
permalink: https://www.210k.cc/b2b-ai-customer-bot-design
---

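**Goal**: build a customer service bot that is genuinely "artificial intelligence", not "artificial stupidity".
The focus is B2B export sales: capture leads, answer product questions, recognize intent, and hand off to a human instead of bluffing.

**Key constraints**:
- Channels: website widget + overseas IM (Telegram / WhatsApp)
- Language: primarily English
- Knowledge sources: static documents (PDF/MD) + WordPress site content
- Deployment: Cloudflare Workers + Durable Objects
- LLM: Anthropic Claude (**not** the free Workers AI models)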

---

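## 0. Position: what counts as "AI" and what counts as a "dumb bot"

| Dimension | Dumb bot | AI |
|---|---|---|
| Memory | Handles every message in isolation and forgets immediately | Remembers across sessions who the customer is, what they care about, and where the conversation left off |
| Knowledge | Keyword-matches an FAQ; on a miss, replies "I don't quite understand what you mean" | LLM + RAG: rewrites queries, retrieves in multiple steps, cites sources |
| Decisions | Hard-coded decision tree, every branch written by hand | The LLM reads tool descriptions and decides on its own; tools are building blocks |
| When the customer shares information | No reaction, moves on to the next question | Immediately structures the information into the CRM and **proactively asks for missing fields** |
| When it doesn't know | Makes up an answer (hallucination) | Says "I'm not sure, let me have our sales team contact you", then calls the escalate tool |
| Prices / commitments | The model blurts out "around $50/unit" | **Hard constraint**: never stated; always routed to a human through the `request_quote` tool |
| Multi-turn | By turn three it has forgotten the company name given in turn one | The customer profile accumulates across the whole conversation and across visits |
| Improvement | Frozen at launch until next quarter's release | Daily automated QA + sales annotations → prompt / KB iteration |

**The core bets of this design**:
1. **The LLM is strong enough** → drop the decision tree and trust the model
2. **Tool descriptions are precise enough** → a tool is a behavioral contract
3. **Persistence must be solid** → cross-session memory is the minimum bar for "intelligence"
4. **The data flywheel must exist** → a bot that cannot iterate will inevitably become a dumb bot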

---

## 1. System architecture

```
┌──────────────────────────────────────────────────────────┐
│ Channel layer (Channel Adapters)                         │
│  /webhook/telegram   ── Telegram Bot API                 │
│  /webhook/whatsapp   ── Meta Cloud API (WhatsApp)        │
│  /webhook/messenger  ── Meta Cloud API (Messenger)       │
│  /chat/web           ── WebSocket (website widget)       │
│                                                          │
│  Maps to { channel, externalUserId, message, metadata }  │
└──────────────────────┬───────────────────────────────────┘
                       │
                       ▼ route to the matching Conversation DO
┌──────────────────────────────────────────────────────────┐
│ Conversation DO (per conversationId)                     │
│  ┌────────────────────────────────────────────────────┐  │
│  │ SQLite tables:                                     │  │
│  │  - messages (role, content, ts, tool_calls)        │  │
│  │  - lead_profile (incremental, JSON merge)          │  │
│  │  - tool_audit (tool_name, input, output, latency)  │  │
│  └────────────────────────────────────────────────────┘  │
│                                                          │
│  on each user message:                                   │
│    1. append to messages                                 │
│    2. build system prompt (incl. lead profile snapshot)  │
│    3. generateText({ model, system, messages, tools })   │
│    4. persist response + tool_audit                      │
│    5. emit reply via channel adapter                     │
└──────────────────────┬───────────────────────────────────┘
                       │ tools/
                       ▼
┌──────────────────────────────────────────────────────────┐
│ Tools (core five + extensions)                           │
│  - search_kb(query, category?)     → AutoRAG             │
│  - check_product(sku/name)         → WP REST API         │
│  - capture_lead({...})             → CRM (Notion/HubSpot)│
│  - request_quote({...})            → Quote queue (D1)    │
│  - escalate_to_sales(reason)       → Telegram sales group│
│  - book_meeting({...})             → Cal.com / Calendly  │
│  - check_inventory(sku)            → ERP API (optional)  │
└──────────────────────┬───────────────────────────────────┘
                       │
                       ▼
┌──────────────────────────────────────────────────────────┐
│ Data layer                                               │
│  - AutoRAG (knowledge index) → R2 backed                 │
│  - WordPress (source of truth for product pages)         │
│  - D1 (shared across DOs: leads, quotes, sales labels)   │
│  - KV (rate limit, session pointer)                      │
│  - Workers Logs (Logpush → BigQuery/R2 archive)          │
└──────────────────────────────────────────────────────────┘
```

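**Why the layers are split this way**:

- **Channel Adapters** make "adding an IM channel = adding one webhook handler", without touching the core
- **Conversation DO** is "one customer's brain": each DO processes events on a single thread, which removes most concurrency problems (a long `await` such as the LLM call can still let a second message interleave, so keep state updates atomic)
- **Tools** are the only way the LLM can "act"; limiting capabilities limits risk
- **D1** is shared across DOs; each DO's own SQLite stores only conversation-level data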

---

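## 2. Data layer

### 2.1 Knowledge base: AutoRAG + R2

Why AutoRAG rather than hand-rolling on Vectorize:
- Automatic chunking, embedding, and index refresh save effort
- Built-in rerank (the key upgrade in v6)
- Integrated with R2: drop a file in and it is indexed
- Cheap, and consistent with the rest of the Cloudflare stack

**Directory structure** (business lines isolated by folder):
```
r2://kb-bucket/
├── products/        # product intros, specs, SOPs
│   ├── widget-a.md
│   └── widget-b.md
├── policies/        # MOQ, payment, shipping, returns
│   ├── shipping.md
│   ├── payment-terms.md
│   └── returns.md
├── company/         # company profile, certifications, production lines
└── faq/             # frequently asked questions
```

**Filter with an exact folder allowlist**: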

```ts
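await env.AI.autorag("trade-bot-kb").aiSearch({
  query,
  // AutoRAG metadata filters are { type, key, value } comparisons combined
  // with "and" / "or"; use one "eq" per allowed folder.
  filters: {
    type: "or",
    filters: [
      { type: "eq", key: "folder", value: "products/" },
      { type: "eq", key: "folder", value: "policies/" },
    ],
  },
  rewrite_query: true,
  max_num_results: 5,
});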
```

### 2.2 Syncing WordPress content to R2

Two approaches; pick one:

**A. Push from WordPress (recommended): better freshness**
```php
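// mu-plugin: sync-to-r2.php
// wp_html_to_markdown() and cf_r2_put_object() are placeholder helpers you supply;
// neither ships with WordPress.
add_action('save_post', function($post_id, $post, $update) {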
  if ($post->post_status !== 'publish') return;
  $md = wp_html_to_markdown($post->post_content);
  $r2_path = "products/{$post->post_name}.md";
  cf_r2_put_object($r2_path, $md, [
    'metadata' => [
      'title' => $post->post_title,
      'url'   => get_permalink($post),
      'updated' => $post->post_modified_gmt,
    ],
  ]);
}, 10, 3);
```

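**B. Pull from a Worker (simple): nightly full sync via cron**
```ts
// worker/cron.ts
export default {
  async scheduled(event, env, ctx) {
    // Page through the WP REST API; per_page is capped at 100.
    for (let page = 1; ; page++) {
      const res = await fetch(`${env.WP_BASE}/wp-json/wp/v2/posts?per_page=100&page=${page}`);
      if (!res.ok) break;  // WP answers 400 once the page number is past the end
      const posts = await res.json();
      if (posts.length === 0) break;
      for (const post of posts) {
        const md = htmlToMarkdown(post.content.rendered);  // converter not shown here
        await env.R2_KB.put(`products/${post.slug}.md`, md, {
          customMetadata: { title: post.title.rendered, url: post.link },
        });
      }
    }
  }
}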
```

Add the cron trigger in `wrangler.jsonc`:
```jsonc
"triggers": {
  "crons": ["0 18 * * *"]  // every night at 18:00 UTC (02:00 Beijing time)
}
```

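### 2.3 Shared across DOs: D1

A DO's SQLite is private to that DO. Anything that spans DOs (merging customer profiles, the quote queue, sales annotations) goes through D1. The tool code in later sections also reads and writes `conversations` and `merge_candidates` tables, which need their own migrations alongside this schema: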

```sql
-- D1 schema
CREATE TABLE leads (
  id TEXT PRIMARY KEY,                  -- UUID
  channel TEXT NOT NULL,                -- 'telegram' | 'whatsapp' | 'web' | ...
  external_id TEXT NOT NULL,            -- channel-specific user ID
  data TEXT NOT NULL DEFAULT '{}',      -- merged profile JSON; capture_lead and the profile lookup read/write this
  email TEXT,
  phone TEXT,                           -- WhatsApp contacts always have one
  company TEXT,
  country TEXT,
  products_interest TEXT,               -- JSON array
  estimated_quantity TEXT,
  use_case TEXT,
  lead_score TEXT,                      -- HOT | WARM | COLD
  notes TEXT,
  created_at INTEGER,
  updated_at INTEGER,
  UNIQUE(channel, external_id)
);
CREATE INDEX idx_leads_email ON leads(email);
CREATE INDEX idx_leads_phone ON leads(phone);

CREATE TABLE quote_requests (
  id TEXT PRIMARY KEY,
  lead_id TEXT REFERENCES leads(id),
  products TEXT NOT NULL,               -- JSON
  quantity TEXT,
  shipping_destination TEXT,
  status TEXT DEFAULT 'pending',        -- pending | assigned | quoted | won | lost
  conversation_excerpt TEXT,            -- copy of the relevant exchange so sales can read it quickly
  assigned_to TEXT,
  created_at INTEGER
);

CREATE TABLE conversation_evals (
  conversation_id TEXT PRIMARY KEY,
  auto_score INTEGER,                   -- LLM self-score, 1-5
  human_score INTEGER,                  -- sales annotation
  issues TEXT,                          -- JSON: ['hallucination', 'missed_lead', ...]
  exemplar BOOLEAN DEFAULT FALSE,       -- whether to use as an exemplar conversation
  reviewed_at INTEGER
);
```

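### 2.4 Merging a customer across channels

The same person may have added you on Telegram, visited the website, and contacted you on WhatsApp.
Merge rules, in priority order:

1. **Exact email match** → same person
2. **Exact phone match** → same person (the primary key for merging WhatsApp with other channels)
3. **Company name + country match + close in time** → merge candidate, flagged `needs_review`
4. **Anything else** → no merge

Implementation: every `capture_lead` call first checks D1 for a match and, if one exists, merges (a JSON patch that does not overwrite existing fields).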
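
The "JSON patch that does not overwrite existing fields" could be sketched as follows. This is a minimal illustration, not the author's `mergeLead`: the `LeadData` type, the array-union behaviour, and the choice to append `notes` rather than keep only the first value are all assumptions.

```ts
// Hypothetical lead shape: string fields plus string-array fields.
type LeadValue = string | string[] | undefined;
type LeadData = Record<string, LeadValue>;

function isEmpty(v: LeadValue): boolean {
  return v === undefined || v === "" || (Array.isArray(v) && v.length === 0);
}

// Merge `incoming` into `existing` without overwriting known scalar fields.
function mergeLead(existing: LeadData, incoming: LeadData): LeadData {
  const merged: LeadData = { ...existing };
  for (const key of Object.keys(incoming)) {
    const value = incoming[key];
    if (isEmpty(value)) continue; // nothing new for this field
    const current = merged[key];
    if (Array.isArray(current) && Array.isArray(value)) {
      // Arrays (e.g. products_interest) are unioned, preserving order.
      merged[key] = Array.from(new Set(current.concat(value)));
    } else if (key === "notes" && typeof current === "string" && typeof value === "string") {
      // Assumption: notes accumulate instead of being frozen at the first value.
      merged[key] = current.includes(value) ? current : `${current}\n${value}`;
    } else if (isEmpty(current)) {
      merged[key] = value; // fill a gap
    }
    // Otherwise the existing value wins.
  }
  return merged;
}
```

One consequence of "existing wins": a customer who corrects their company name will not be updated by this function, so a real implementation may want an explicit override path for corrections.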

---

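## 3. LLM layer

### 3.1 Model selection

| Purpose | Model | Rationale |
|---|---|---|
| Main conversation | `anthropic/claude-sonnet-4.5` | Accurate tool calling, good English, strong instruction following |
| Offline QA | `anthropic/claude-haiku-4.5` | Cheap; scores conversations in bulk |
| Embedding | `openai/text-embedding-3-small` or Cloudflare bge | Managed by AutoRAG by default |
| Rerank | AutoRAG built-in | Defaults to cohere rerank-v3.5 |

**Why not run production on the free Workers AI models**: a single B2B lead is worth hundreds to thousands of dollars, so saving a little on LLM cost is not worth it. Workers AI suits demos and internal tools, not customer-facing lead capture.

**Route through the Vercel AI Gateway**: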
```ts
const result = await generateText({
  model: "anthropic/claude-sonnet-4.5",  // the gateway handles retry/fallback for you
  ...
});
```

### 3.2 System prompt engineering

The system prompt has four blocks (role, rules, context, style):

```
[ROLE]
You are Alex, a B2B sales rep for {COMPANY}. We export {PRODUCT_CATEGORY}
to international buyers. You help potential buyers understand our products,
qualify their needs, and connect them with the right colleague.

[STRICT RULES]
1. NEVER quote prices. Always use request_quote tool — pricing depends on
   quantity, shipping destination, payment terms.
2. NEVER promise lead times or stock — use check_inventory tool.
3. NEVER make up product specs — use search_kb or check_product tools.
   If a tool returns no info, say "let me check with my colleague" and
   call escalate_to_sales.
4. ALWAYS capture lead info incrementally. The MOMENT a user reveals
   company, country, product interest, quantity, or contact — call
   capture_lead, even with partial fields.
5. If user expresses ANY of: "talk to human", "send sample", "custom order",
   large quantity, urgent timeline → call escalate_to_sales immediately.

[CURRENT CONTEXT]
{LEAD_PROFILE_SNAPSHOT}    # known lead profile, injected on every call
Today: {DATE}, Business hours: {HOURS}

[STYLE]
- Concise. 1-3 sentences per reply unless quoting product details.
- Professional but warm. Match the user's English level (don't show off).
- Ask one question at a time. Don't interview them.
- If user uses non-English, reply in their language briefly + continue in English.
```

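**Key techniques**:

- **STRICT RULES** uses uppercase `NEVER` / `ALWAYS`; the model is more responsive to strongly worded constraints
- **CURRENT CONTEXT** injects the lead profile dynamically so the LLM does not re-ask for known information
- **STYLE** stops the LLM from writing essays (in B2B IM, long messages make for a poor experience)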

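### 3.3 Prompt Caching

Claude supports prompt caching, which is **the key to saving money**. Put the system prompt + tool definitions in the cache:

```ts
import { generateText } from 'ai';

const result = await generateText({
  model: "anthropic/claude-sonnet-4.5",
  messages: [
    // Static block first: the cache breakpoint covers everything up to and including it.
    {
      role: 'system',
      content: STRICT_RULES,
      providerOptions: { anthropic: { cacheControl: { type: 'ephemeral' } } },
    },
    { role: 'system', content: dynamicContext },  // lead profile, not cached
    ...messages,
  ],
  tools: ecomTools,  // tool definitions sit before the system prompt in the cached prefix
});
```

In testing on conversations of 10+ turns, cache hits cut token cost by 70%+.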

### 3.4 Tool loop configuration

```ts
generateText({
  model,
  system,
  messages,
  tools,
  stopWhen: stepCountIs(8),       // B2B needs multi-step reasoning; allow enough steps
  toolChoice: 'auto',
  onStepFinish: ({ stepNumber, toolCalls, toolResults, finishReason }) => {
    logger.info('llm_step', { conversationId, stepNumber, finishReason });
  },
  // Available inside tools; capture_lead also reads externalUserId
  experimental_context: { conversationId, leadId, channel, externalUserId },
});
```

`stepCountIs(8)` is an empirical value: a B2B turn often runs "search_kb → check_product → capture_lead → answer", four or five steps, so this leaves some headroom.

---

## 4. Tool set (where the real differentiation lives)

### 4.1 search_kb: RAG retrieval

```ts
search_kb: tool({
  description: `Search the company knowledge base for products, policies, FAQs, and company info.

USE THIS WHEN:
- User asks about specific product features, specs, materials, certifications
- User asks about MOQ, payment terms, shipping options, lead times, return policy
- User asks about company background, capabilities, certifications

DO NOT USE FOR:
- Greetings, small talk
- Order-specific questions (use check_product or escalate instead)
- Pricing (use request_quote)

Always refine the user's raw question into a focused search query. For multi-part
questions, call this multiple times with different queries — better than one long query.`,

  inputSchema: z.object({
    query: z.string().describe("Refined search query in English. NOT the user's raw message."),
    category: z.enum(["products", "policies", "company", "faq"]).optional()
      .describe("Filter to a specific KB section if confident"),
  }),

  execute: async ({ query, category }, { experimental_context }) => {
    // AutoRAG filter syntax is { type, key, value }
    const filters = category
      ? { type: "eq" as const, key: "folder", value: `${category}/` }
      : undefined;

    const res = await env.AI.autorag("trade-bot-kb").aiSearch({
      query,
      filters,
      rewrite_query: false,  // the LLM layer has already refined the query
      max_num_results: 5,
    });

    return {
      query_used: query,
      sources: res.data.map(d => ({
        title: d.metadata?.title ?? d.filename,
        url: d.metadata?.url,
        // Each hit's `content` is an array of text chunks, so join before truncating
        excerpt: d.content.map(c => c.text).join("\n").slice(0, 500),
        score: d.score,
      })),
      synthesized_answer: res.response,
    };
  },
}),
```

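**Anti-dumb-bot points**:
- Spelling out **when to use and when not to use** the tool in the description is about 3x more accurate than only describing what the tool is
- Returning a `sources` array lets the LLM cite in its answer ("according to our product spec...")
- `rewrite_query: false` because the prompt already has the LLM refine the query; a second rewrite by AutoRAG would distort it

### 4.2 capture_lead: incremental lead profiling

**This is the most valuable tool in the whole bot, so its prompt has to be forceful**: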

```ts
capture_lead: tool({
  description: `CRITICAL — Capture lead information INCREMENTALLY.

CALL THIS WHENEVER the user reveals ANY of:
- Company name (e.g., "I'm from Acme Corp")
- Country/region (e.g., "we're in Germany")
- Product interest (even vague, e.g., "looking for LED panels")
- Quantity (e.g., "around 5000 units")
- Use case (e.g., "for our retail stores")
- Timeline (e.g., "need it next month")
- Contact (email, phone, WhatsApp)
- Application industry (e.g., "automotive sector")

EVEN PARTIAL INFO IS VALUABLE. Do NOT wait for "complete" lead info.
Call multiple times in one conversation as info accumulates.
Do NOT ask the user permission. Just call it silently.

After calling, do NOT acknowledge in your reply ("got it, I've recorded that")
— that's creepy. Just continue the conversation naturally.

If you've already captured a field, don't re-capture unless the user
provides updated info.`,

  inputSchema: z.object({
    company: z.string().optional(),
    country: z.string().optional(),
    industry: z.string().optional(),
    products_interest: z.array(z.string()).optional()
      .describe("Specific products / categories user mentioned"),
    estimated_quantity: z.string().optional()
      .describe("Free-form, e.g. '500 units', '1 container', 'small batch'"),
    use_case: z.string().optional(),
    timeline: z.string().optional()
      .describe("e.g. 'ASAP', 'Q2 2026', 'just researching'"),
    contact_email: z.string().email().optional(),
    contact_phone: z.string().optional(),
    notes: z.string().optional()
      .describe("Anything else worth recording — context, sentiment, specific requirements"),
  }),

  execute: async (input, { experimental_context }) => {
    // externalUserId must be passed in experimental_context alongside channel
    const { conversationId, channel, externalUserId } = experimental_context;

    const existing = await env.DB.prepare(
      `SELECT id, data FROM leads WHERE channel = ? AND external_id = ?`
    ).bind(channel, externalUserId).first();

    const leadId = existing?.id ?? crypto.randomUUID();
    // `data` is stored as a JSON string: parse before merging, stringify before binding
    const merged = mergeLead(existing?.data ? JSON.parse(existing.data) : {}, input);  // JSON merge, keeps existing fields
    const mergedJson = JSON.stringify(merged);
    const now = Date.now();

    await env.DB.prepare(
      `INSERT INTO leads (id, channel, external_id, data, updated_at)
       VALUES (?, ?, ?, ?, ?)
       ON CONFLICT(channel, external_id) DO UPDATE SET data = excluded.data, updated_at = excluded.updated_at`
    ).bind(leadId, channel, externalUserId, mergedJson, now).run();

    if (input.contact_email || input.contact_phone) {
      ctx.waitUntil(detectCrossChannelMerge(env, leadId, input));
    }

    ctx.waitUntil(scoreLead(env, leadId, merged));

    return { captured: true, lead_id: leadId };
  },
}),
```

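**A few anti-dumb-bot details**:
- The `description` explicitly says **not to acknowledge in the reply** ("I've recorded that" is one source of the robotic feel)
- The `notes` field is an escape hatch: it lets the LLM record **soft information** such as "customer sounds urgent" or "comparing prices"
- `ctx.waitUntil(scoreLead)` scores asynchronously and does not block the reply

### 4.3 request_quote: quotes always go to a human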

```ts
request_quote: tool({
  description: `Submit a quote request to the sales team. Call this when:
- User explicitly asks for a quote/price/cost
- User has provided enough info to quote (product + quantity + destination)
- Even if info is partial — sales will follow up to fill gaps

NEVER mention a price yourself, even a range or estimate. Always defer to sales.

After calling this tool, tell the user something like:
"I've passed your request to our sales team — they'll follow up within {SLA}
with a detailed quote based on the latest pricing and shipping rates."`,

  inputSchema: z.object({
    products: z.array(z.object({
      name_or_sku: z.string(),
      quantity: z.string().optional(),
      specs: z.string().optional(),
    })),
    shipping_destination: z.string().optional(),
    payment_terms_preference: z.string().optional(),
    target_unit_price: z.string().optional()
      .describe("If user mentioned a target price, capture it. Do NOT confirm/deny."),
    urgency: z.string().optional(),
  }),

  execute: async (input, { experimental_context }) => {
    const { conversationId, leadId } = experimental_context;

    const quoteId = crypto.randomUUID();
    await env.DB.prepare(`
      INSERT INTO quote_requests (id, lead_id, products, shipping_destination, status, created_at)
      VALUES (?, ?, ?, ?, 'pending', ?)
    `).bind(quoteId, leadId, JSON.stringify(input.products), input.shipping_destination, Date.now()).run();

    ctx.waitUntil(notifyTelegramSales(env, {
      quoteId, leadId, ...input,
      conversation_url: `https://admin.example.com/conversations/${conversationId}`,
    }));

    return {
      quote_id: quoteId,
      eta_hours: 24,
    };
  },
}),
```

### 4.4 escalate_to_sales: the fallback escape hatch

```ts
escalate_to_sales: tool({
  description: `Hand off the conversation to a human salesperson. Call this when:
- User explicitly asks to talk to a human / sales / someone
- User asks for a sample, custom product, large quantity
- User has technical questions you can't answer from KB
- You've called search_kb and didn't find a confident answer
- User seems frustrated or about to leave
- Conversation has gone 3+ turns without progress

BETTER TO ESCALATE TOO EARLY THAN ANSWER WRONG.
A lost lead from bad answer is much worse than a lead transferred to human.

After calling, tell user something like:
"Let me get a colleague to help — they'll reach out via {channel/email} shortly."`,

  inputSchema: z.object({
    reason: z.enum([
      "user_requested_human",
      "sample_request",
      "custom_order",
      "large_quantity",
      "technical_question_unanswered",
      "user_frustrated",
      "stalled_conversation",
      "out_of_scope",
      "other",
    ]),
    summary: z.string().describe("2-3 sentences summarizing the lead and what they need"),
    urgency: z.enum(["low", "medium", "high"]).default("medium"),
  }),

  execute: async (input, { experimental_context }) => {
    const { conversationId, leadId, channel } = experimental_context;

    await notifyTelegramSales(env, {
      type: 'escalation',
      conversationId,
      leadId,
      channel,
      ...input,
    });

    await env.DB.prepare(
      `UPDATE conversations SET status = 'escalated', escalation_reason = ? WHERE id = ?`
    ).bind(input.reason, conversationId).run();

    return { escalated: true, eta_minutes: input.urgency === 'high' ? 15 : 60 };
  },
}),
```

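### 4.5 Other tools

- `check_product(sku_or_name)`: calls the WP REST API directly to read the product page (for spec accuracy)
- `check_inventory(sku)`: connects to the ERP if there is one; otherwise leave it disabled
- `book_meeting({when, duration})`: connects to Cal.com so HOT leads can book a meeting directly
- `check_business_hours()`: tells the LLM whether it is currently business hours so it can adjust escalation wording

---

## 5. Cross-session memory and profiles

### 5.1 DO key strategy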

```
Conversation DO ID = stable_key(channel, externalUserId)
```

- Telegram: `telegram:<chat_id>`
- WhatsApp: `whatsapp:<phone_number>`
- Web: `web:<cookie_uuid>`
- Messenger: `messenger:<psid>`

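Every time the same customer comes back, they are **routed to the same DO automatically**, so history and profile stay continuous.

DOs do not expire (Cloudflare Durable Object storage is persistent by default), but you should **archive periodically**:
- For DOs inactive for more than 90 days, dump the messages table to R2 and clear the DO while keeping the DO ID
- When the customer returns, load a history digest from R2 (an LLM-generated summary + the latest 10 messages verbatim)

### 5.2 Injecting the lead profile

Before every LLM call, fetch the latest lead profile from D1 and inject it into the system prompt: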

```ts
const profile = await env.DB.prepare(
  `SELECT data FROM leads WHERE channel = ? AND external_id = ?`
).bind(channel, externalUserId).first();

const profileSnapshot = profile
  ? `KNOWN ABOUT THIS LEAD:\n${formatProfile(JSON.parse(profile.data))}`
  : `THIS IS A NEW CONTACT. Be welcoming, ask for their name and company naturally.`;

const systemPrompt = `${ROLE_BLOCK}\n\n${STRICT_RULES}\n\n${profileSnapshot}\n\n${STYLE_BLOCK}`;
```

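Effect: if the customer said last time that they make auto parts in Germany, the bot will not ask "which country are you in" today; it picks up where the conversation left off.

### 5.3 Merging the same customer across channels

See the merge rules in §2.4. Each time `capture_lead` writes an email/phone, it triggers `detectCrossChannelMerge`: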

```ts
async function detectCrossChannelMerge(env, leadId, input) {
  if (!input.contact_email && !input.contact_phone) return;
  const matches = await env.DB.prepare(`
    SELECT id FROM leads
    WHERE id != ?
      AND (
        (data->>'$.contact_email' = ? AND ? IS NOT NULL)
        OR
        (data->>'$.contact_phone' = ? AND ? IS NOT NULL)
      )
  `).bind(leadId,
    input.contact_email, input.contact_email,
    input.contact_phone, input.contact_phone
  ).all();

  if (matches.results.length > 0) {
    await env.DB.prepare(
      `INSERT INTO merge_candidates (primary_lead_id, secondary_lead_ids, created_at) VALUES (?, ?, ?)`
    ).bind(leadId, JSON.stringify(matches.results.map(r => r.id)), Date.now()).run();
  }
}
```

---

## 6. Channel adapter layer

### 6.1 Adapter pattern

One handler per channel, each mapping into a unified internal message format:

```ts
interface NormalizedMessage {
  channel: 'telegram' | 'whatsapp' | 'web' | 'messenger';
  externalUserId: string;    // used to locate the DO
  conversationId: string;    // derived: channel:externalUserId
  text: string;
  attachments?: { type: 'image' | 'file'; url: string }[];
  metadata: {                // channel-specific info
    username?: string;
    locale?: string;
    timestamp: number;
  };
}
```

### 6.2 Telegram (build this first)

```ts
// worker/channels/telegram.ts
export async function handleTelegramWebhook(req: Request, env: Env) {
  const update = await req.json();
  const msg = update.message ?? update.edited_message;
  if (!msg?.text) return new Response('ok');

  const normalized: NormalizedMessage = {
    channel: 'telegram',
    externalUserId: String(msg.chat.id),
    conversationId: `telegram:${msg.chat.id}`,
    text: msg.text,
    metadata: {
      username: msg.from.username,
      locale: msg.from.language_code,
      timestamp: msg.date * 1000,
    },
  };

  const doId = env.CONVERSATION_DO.idFromName(normalized.conversationId);
  const stub = env.CONVERSATION_DO.get(doId);
  const reply = await stub.handleMessage(normalized);

  await fetch(`https://api.telegram.org/bot${env.TG_BOT_TOKEN}/sendMessage`, {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({
      chat_id: msg.chat.id,
      text: reply.text,
      parse_mode: 'Markdown',
    }),
  });

  return new Response('ok');
}
```

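Register the webhook (quote the URL so the shell does not interpret `?`): `curl "https://api.telegram.org/bot<TOKEN>/setWebhook?url=https://your-worker.workers.dev/webhook/telegram"`. Adding a `secret_token` parameter makes Telegram send it back in the `X-Telegram-Bot-Api-Secret-Token` header, which the handler can check to reject forged requests.

### 6.3 WhatsApp (Meta Cloud API)

WhatsApp is far more work than Telegram; **wait until V0 is running before adding it**:

- Requires a Meta Developer account + WhatsApp Business Account
- 24-hour window rule: after a user messages you, you may reply freely for 24 hours; beyond that you must use **pre-approved template messages**
- Third-party provider options: 360dialog (popular among exporters), Twilio (pricier but simple)

The webhook structure resembles Telegram's. Main differences:
- Authentication: Meta signs requests with an `X-Hub-Signature-256` HMAC
- Sending messages: POST to the Meta Graph API
- Template messages: outside the 24h window, an approved template is required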
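
The `X-Hub-Signature-256` check can be sketched like this. The header carries `sha256=` followed by the hex HMAC-SHA256 of the raw request body, keyed with the app secret. The function name `verifyMetaSignature` is hypothetical, and the sketch assumes `node:crypto` is available (on Workers that means enabling the `nodejs_compat` flag; Web Crypto's `crypto.subtle` would be the flag-free alternative).

```ts
import { Buffer } from "node:buffer";
import { createHmac, timingSafeEqual } from "node:crypto";

// Returns true only when the header matches HMAC-SHA256(appSecret, rawBody).
// `rawBody` must be the exact bytes received, read before any JSON parsing.
function verifyMetaSignature(
  rawBody: string,
  signatureHeader: string | null,
  appSecret: string,
): boolean {
  const prefix = "sha256=";
  if (!signatureHeader || !signatureHeader.startsWith(prefix)) return false;

  const expected = createHmac("sha256", appSecret).update(rawBody, "utf8").digest();
  const received = Buffer.from(signatureHeader.slice(prefix.length), "hex");

  // timingSafeEqual throws on length mismatch, so check first.
  if (received.length !== expected.length) return false;
  return timingSafeEqual(received, expected);
}
```

In the webhook handler this would run on `await req.text()` before `JSON.parse`, returning a 401 on failure.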

### 6.4 Web Widget

```html
<!-- embed in WordPress -->
<script src="https://cdn.example.com/chat-widget.js" defer></script>
```

The widget itself is a React component + WebSocket. You can use `useChat` from `@ai-sdk/react`, or write your own lightweight useChat hook.

Connecting to the DO:
```ts
const ws = new WebSocket(`wss://your-worker.workers.dev/chat/web?sid=${cookieUUID}`);
```

On the Worker side, use `routeAgentRequest` or handle the WebSocket upgrade yourself, with the cookie UUID as the externalUserId.

---

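## 7. Anti-dumb-bot engineering practices

### 7.1 Preventing hallucination

**Three lines of defense**:

1. **Hard prompt constraints**: NEVER make up prices/specs/lead times (§3.2)
2. **Tools first**: any "factual" question must go through `search_kb` or `check_product`
3. **Source citation**: `search_kb` returns a `sources` array so the LLM can include "according to our spec sheet for product X..." in its reply

**QA trigger**: run an offline check with Haiku; if an LLM reply contains specific numbers/specs **but no tool was called in the preceding step**, flag it `possible_hallucination` and send it to sales for review.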

### 7.2 Preventing "confidence theater" (bluffing)

```
[STRICT RULES additions]
6. If you don't know something for sure, SAY SO. Never bluff.
   Acceptable phrases:
   - "I'm not 100% sure on that — let me get our specialist to confirm"
   - "I don't have that info on hand, but I'll have a colleague follow up"
   Then call escalate_to_sales.
```

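In practice: the customer asks "does this material have RoHS certification" → the bot calls `search_kb("ROHS certification")` → no results → it escalates directly instead of inventing "yes we have all certifications".

### 7.3 Preventing conversational dead loops

If the LLM goes 3 consecutive turns without calling any tool (pure text back and forth), force an escalation: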

```ts
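// Keep a counter in the DO. As an in-memory field it resets if the DO is
// evicted; persist it in DO storage if that matters.
async handleMessage(msg) {
  const result = await generateText({ /* ... */ });

  if (result.steps.every(s => s.toolCalls.length === 0)) {
    this.noToolTurns += 1;
  } else {
    this.noToolTurns = 0;
  }

  if (this.noToolTurns >= 3) {
    await this.forceEscalate('stalled_conversation');
    this.noToolTurns = 0;  // reset so the next turn does not escalate again
  }

  // Channel handlers read `reply.text`, so return an object rather than a bare string
  return { text: result.text };
}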
```

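### 7.4 Preventing out-of-bounds answers

Hard rule: **any price / lead time / discount / availability must go through a tool**.

Implementation: have an LLM self-check the reply (one Haiku call):
```
Check this reply: "{reply}"
Does it mention a specific price, discount, lead time, or stock figure? Answer yes/no.
```

If the answer is yes but the tool history has no corresponding tool call → block the reply and force an escalation.

Optionally stricter: regex-scan for numbers + unit keywords (`$`, `USD`, `unit`, `days`, `weeks`) and run the check on any hit.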
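
A rough sketch of that regex pre-filter follows. The pattern list and the name `needsCommercialCheck` are illustrative assumptions rather than a vetted rule set; it is deliberately over-sensitive, since a hit only triggers the self-check and does not block anything by itself.

```ts
// Patterns pairing a number with a commercial unit or currency keyword.
const COMMERCIAL_PATTERNS: RegExp[] = [
  /(?:\$|USD|EUR|€)\s?\d/i,                                    // "$50", "USD 12"
  /\d[\d,.]*\s?(?:USD|EUR|dollars?)\b/i,                        // "15 USD", "1,200 dollars"
  /\d+\s?(?:-\s?\d+\s?)?(?:business\s+)?(?:days?|weeks?)\b/i,   // "15 days", "2-3 weeks"
  /\d+\s?%\s?(?:off|discount)/i,                                // "10% off"
  /\d[\d,]*\s?(?:units?|pcs|pieces)\b/i,                        // "5000 units"
];

// True when the reply contains something that looks like a price,
// lead time, discount, or quantity figure and should be double-checked.
function needsCommercialCheck(reply: string): boolean {
  return COMMERCIAL_PATTERNS.some((pattern) => pattern.test(reply));
}
```

Because the bot may legitimately echo a quantity the customer stated ("for 5000 units"), a hit should be compared against the tool history rather than treated as a violation on its own.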

### 7.5 Preventing lead loss

**Lead capture count check**:
- When a conversation reaches turn 3 with no `capture_lead` call yet, **inject a hint** so the LLM proactively asks for company/country in its next reply
- Implementation: dynamically append to the system prompt: "WARNING: 3 turns in and no lead info captured yet. Find a natural way to ask their company or country in your next reply."
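
The injection itself is a small pure function. In this sketch the name `withLeadNudge` and its parameters are hypothetical; the caller is assumed to track how many user turns have passed and how many `capture_lead` calls the tool audit log contains.

```ts
const LEAD_NUDGE =
  "WARNING: 3 turns in and no lead info captured yet. " +
  "Find a natural way to ask their company or country in your next reply.";

// Appends the nudge once the conversation reaches `threshold` user turns
// without a single capture_lead call; otherwise returns the prompt unchanged.
function withLeadNudge(
  systemPrompt: string,
  userTurns: number,
  captureLeadCalls: number,
  threshold: number = 3,
): string {
  if (userTurns >= threshold && captureLeadCalls === 0) {
    return `${systemPrompt}\n\n${LEAD_NUDGE}`;
  }
  return systemPrompt;
}
```

Appending the nudge to the dynamic context block, rather than the static rules, keeps the cached prompt prefix byte-identical across turns.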

---

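## 8. The data flywheel (making it smarter over time)

**This is the biggest dividing line between AI and a dumb bot**: a dumb product is frozen at launch, while an AI product improves every day.

### 8.1 Automated QA (nightly cron)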

```ts
// Runs once a night
async function nightlyEval(env) {
  const recent = await env.DB.prepare(
    `SELECT conversation_id, transcript FROM conversations
     WHERE updated_at > ? AND eval_status IS NULL LIMIT 200`
  ).bind(Date.now() - 24*3600*1000).all();

  for (const conv of recent.results) {
    const evalResult = await generateText({
      model: "anthropic/claude-haiku-4.5",
      system: EVAL_PROMPT,
      prompt: conv.transcript,
      output: Output.object({
        schema: z.object({
          scores: z.object({
            helpfulness: z.number().min(1).max(5),
            accuracy: z.number().min(1).max(5),
            lead_capture: z.number().min(1).max(5),
            escalation: z.number().min(1).max(5),
          }),
          issues: z.array(z.enum([
            'hallucination', 'missed_lead', 'late_escalation',
            'overlong_reply', 'wrong_language', 'tone_off', 'none',
          ])),
          notes: z.string(),
        }),
      }),
    });

    await env.DB.prepare(
      `UPDATE conversations SET eval_status = 'auto', eval_data = ? WHERE conversation_id = ?`
    ).bind(JSON.stringify(evalResult.output), conv.conversation_id).run();
  }
}
```

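Review the dashboard weekly: which issues are most common? Feed that back into the prompt / KB.

### 8.2 Human-in-the-loop

Sales get a simple admin panel:
- See the list of leads assigned to them
- Read the bot's conversation with each customer
- One-click annotation: "bot answered correctly" / "bot answered wrong, the correct answer is X" / "bot should have escalated earlier"
- Annotated transcripts are collected automatically into the **eval dataset**

Every month:
1. Run prompt scoring against the eval dataset
2. Look at score trends and find regressions
3. Adjust the prompt / add KB content / fix tool descriptions
4. **Every prompt change must pass the eval dataset before it ships**

### 8.3 Prompt versioning + A/B

Write prompts as code (not in a DB) and manage them in git with the rest of the code: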

```
prompts/
├── v1.0.0-system.ts
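├── v1.1.0-system.ts   # stronger lead capture
├── v1.2.0-system.ts   # adjusted escalation threshold
```

A/B split: hash the cookie or customer ID and take it modulo a bucket count, send 1% of traffic to the new prompt, and watch the eval score for 7 days. At 1,000 conversations a month, 1% is only about 10 conversations, so raise the share until the sample is large enough to compare.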
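
One way to do the hash-and-modulo split deterministically is sketched here. The hash is FNV-1a (32-bit) applied to UTF-16 code units, chosen only because it is short and stable; `pickPromptVersion` and its parameters are hypothetical names.

```ts
// 32-bit FNV-1a over the string's UTF-16 code units.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep unsigned
  }
  return hash >>> 0;
}

// The same customer always lands in the same bucket, so they never flip
// between prompt versions mid-conversation.
function pickPromptVersion(
  customerId: string,
  candidate: string,
  control: string,
  rolloutPercent: number, // 0..100, fractions allowed (e.g. 0.5)
): string {
  const bucket = fnv1a32(customerId) % 10000; // 0..9999, i.e. 0.01% granularity
  return bucket < rolloutPercent * 100 ? candidate : control;
}
```

Usage: `pickPromptVersion("telegram:12345", "v1.2.0", "v1.1.0", 1)` returns the candidate version for roughly 1% of customer IDs and the control for the rest.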

---

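## 9. Implementation roadmap

### V0: Telegram MVP (1 week)
**Goal: validate the whole pipeline end to end and run 5 leads through it**

- Worker + DO skeleton
- Telegram webhook integration
- 5 tools: search_kb / capture_lead / request_quote / escalate / check_product
- Load 20 core product documents into AutoRAG
- Claude Sonnet 4.5
- D1 schema (leads / quote_requests)
- Telegram sales group notifications

**Success criterion**: give it to sales colleagues for a week; at least 3 conversations are annotated by sales as "close to a human agent".

### V1: Web Widget + lead profiles (2 weeks)
- WebSocket widget embedded in WordPress
- Cross-session memory (lead profile injected on every call)
- WordPress content synced to R2 automatically (cron)
- Automatic lead scoring (HOT/WARM/COLD)
- Sales admin panel (HTMX, quick and dirty)

**Success criterion**: 200+ conversations in a month, lead capture rate > 60%.

### V2: WhatsApp + data flywheel (1 month)
- WhatsApp Cloud API integration
- Cross-channel customer merging
- Automated QA cron
- Build the eval dataset + first prompt iteration
- A/B framework

**Success criterion**: automated QA produces a weekly report, and at least one prompt revision has shipped with the rollback mechanism verified.

### V3+: Continuous optimization
- Real-time inventory from the ERP
- Cal.com integration (HOT leads book meetings directly)
- Multilingual support (if the customer base expands)
- Voice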

---

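## 10. Cost estimate

**1,000 conversations a month (averaging 8 turns each)**:

| Item | Usage | Cost |
|---|---|---|
| Claude Sonnet 4.5 (main conversation) | 8M input + 2M output tokens, 60% cache hit rate | ~$41 |
| Claude Haiku 4.5 (QA) | 5M input + 0.5M output | ~$7.50 |
| AutoRAG | ~5k queries | ~$5 |
| Workers + DO | standard traffic | $5 (Workers Paid) |
| D1 + R2 | minimal | <$1 |
| **Subtotal** | | **~$60/month** |

The Claude rows use list prices of $3 / $15 per million input / output tokens for Sonnet 4.5 and $1 / $5 for Haiku 4.5, with cache reads billed at 10% of the input price. Output tokens dominate: 2M Sonnet output tokens alone cost $30.

**10,000 conversations a month**: ~$550/month
**100,000 conversations a month**: ~$5,500/month (variable costs scale roughly linearly)

In perspective: one qualified lead is worth hundreds to thousands of dollars, so capturing 10 leads pays for a year of operation.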
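
To re-run the Claude line items for other volumes, a small calculator helps. This is a sketch under stated assumptions: the prices are Anthropic list prices per million tokens at the time of writing (Sonnet 4.5 at $3 input / $15 output, cache reads at 0.1x the input price), the one-off cache-write premium is ignored, and the names `TokenPrice` and `monthlyLlmCost` are made up for the example.

```ts
interface TokenPrice {
  inputPerMTok: number;        // USD per million uncached input tokens
  outputPerMTok: number;       // USD per million output tokens
  cacheReadMultiplier: number; // cached input price as a fraction of inputPerMTok
}

// Assumed list prices; check current pricing before relying on these.
const SONNET_4_5: TokenPrice = { inputPerMTok: 3, outputPerMTok: 15, cacheReadMultiplier: 0.1 };

// Monthly cost in USD for token volumes given in millions of tokens.
function monthlyLlmCost(
  inputMTok: number,
  outputMTok: number,
  cacheHitRate: number, // 0..1, share of input tokens served from cache
  price: TokenPrice,
): number {
  const cachedInput = inputMTok * cacheHitRate;
  const freshInput = inputMTok - cachedInput;
  return (
    freshInput * price.inputPerMTok +
    cachedInput * price.inputPerMTok * price.cacheReadMultiplier +
    outputMTok * price.outputPerMTok
  );
}
```

For 8M input tokens, 2M output tokens, and a 60% cache hit rate, `monthlyLlmCost(8, 2, 0.6, SONNET_4_5)` gives 9.60 + 1.44 + 30.00 = $41.04, which shows that trimming reply length moves the bill more than raising the cache hit rate.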

---

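## 11. Risks and anti-patterns

### Things not to do

1. **No LangChain**: the Vercel AI SDK is enough; another abstraction layer only slows iteration
2. **Do not classify intent with a decision tree and then branch**: that is the root cause of dumb bots. Let the LLM + tools decide
3. **Do not let the bot quote prices directly**: not even "around $X"; always go through request_quote
4. **Do not store prompts in a database to be "ops-friendly"**: prompts are code and go through PR review
5. **Do not chase 100% automation**: the goal is to free sales from "answering the repetitive 80% of questions", not to replace them
6. **Do not rush into voice**: B2B spans time zones, so asynchronous text fits better; save voice for a direct phone call once the lead scores HOT
7. **Do not design RAG-first**: RAG is one tool among several, not the core. The core is the tool loop
8. **Do not hard-code business rules**: keep MOQ / payment terms / country restrictions in the KB, so a business change means editing a document, not code

### Known risks

| Risk | Mitigation |
|---|---|
| Customer asks something outside the KB → hallucination | escalate_to_sales + strongly constrained prompt |
| Replying outside the WhatsApp 24h window | Pre-approved template messages + manual follow-up from the admin panel |
| AutoRAG indexing lag | Trigger the reindex API manually after key product pages change |
| Occasional Claude API instability | Vercel AI Gateway with retry + fallback to GPT-4o |
| Sales do not respond to an escalation | Telegram notification + fallback email at the 30-minute SLA |
| Bot abused to burn tokens | KV rate limit (N messages per chat_id per minute) |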
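
The per-`chat_id` limit in the last row could follow a fixed-window counter like the sketch below. The `CounterStore` interface and `allowMessage` are hypothetical; the store is modelled as a synchronous map so the logic is testable. Workers KV is asynchronous and eventually consistent, so counts kept there are approximate, and a counter inside the Conversation DO would be exact.

```ts
// Minimal store abstraction; a Map satisfies it, a KV-backed adapter could too.
interface CounterStore {
  get(key: string): number | undefined;
  set(key: string, value: number): void;
}

// Fixed-window limiter: at most `limitPerMinute` messages per chat per
// calendar minute. Returns false when the message should be dropped.
function allowMessage(
  store: CounterStore,
  chatId: string,
  limitPerMinute: number,
  nowMs: number,
): boolean {
  const windowId = Math.floor(nowMs / 60000);
  const key = `rl:${chatId}:${windowId}`; // a new key each minute; old ones can expire via TTL
  const count = store.get(key) ?? 0;
  if (count >= limitPerMinute) return false;
  store.set(key, count + 1);
  return true;
}
```

A fixed window allows a short burst across a minute boundary (up to twice the limit), which is usually acceptable for abuse protection.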

---

## 12. Final technology stack

| Layer | Choice |
|---|---|
| Runtime | Cloudflare Workers + Durable Objects |
| LLM | Anthropic Claude Sonnet 4.5 (main) / Haiku 4.5 (QA) |
| LLM SDK | Vercel AI SDK v6 (`ai`) + AI Gateway |
| Schema validation | Zod v4 |
| RAG | Cloudflare AutoRAG |
| File storage | R2 (knowledge base files) |
| Relational data | D1 (leads, quote_requests, evals) |
| Cache / rate limit | KV |
| Frontend (widget) | React 19 + Vite + Tailwind v4 |
| WordPress sync | WP plugin (push) or Worker cron (pull) |
| Channel: Telegram | Bot API |
| Channel: WhatsApp | Meta Cloud API (or 360dialog) |
| Channel: Web | WebSocket via DO |
| Sales notification | Telegram Bot (sales group) |
| CRM | Notion DB (to start) or HubSpot (at scale) |
| Meeting booking | Cal.com |
| Log archiving | Logpush → R2 |
| Monitoring | CF Workers Analytics + custom dashboard |

---

## Appendix A: Project structure

```
b2b-customer-bot/
├── worker/
│   ├── index.ts              # entry point + router
│   ├── conversation-do.ts    # Conversation Durable Object
│   ├── channels/
│   │   ├── telegram.ts
│   │   ├── whatsapp.ts
│   │   ├── web.ts
│   │   └── messenger.ts
│   ├── tools/
│   │   ├── search-kb.ts
│   │   ├── capture-lead.ts
│   │   ├── request-quote.ts
│   │   ├── escalate.ts
│   │   ├── check-product.ts
│   │   └── book-meeting.ts
│   ├── prompts/
│   │   ├── system.ts         # system prompt template
│   │   ├── eval.ts           # QA prompt
│   │   └── versions.ts       # prompt version control
│   ├── lib/
│   │   ├── lead-merge.ts
│   │   ├── lead-score.ts
│   │   ├── notify.ts         # Telegram sales group notifications
│   │   └── logger.ts
│   └── crons/
│       ├── nightly-eval.ts
│       └── wp-sync.ts
├── widget/                   # website widget (separate build)
│   └── src/
├── admin/                    # sales admin panel
├── prompts-eval/             # evaluation set
│   ├── golden-conversations.jsonl
│   └── run-eval.ts
├── d1-migrations/
│   ├── 0001_initial.sql
│   └── 0002_evals.sql
├── wrangler.jsonc
└── README.md
```

## Appendix B: System prompt template (full version)

```ts
// worker/prompts/system.ts
export function buildSystemPrompt({
  companyName,
  productCategory,
  leadProfile,
  date,
  businessHours,
}: SystemPromptInput): string {
  return `# ROLE
You are Alex, a B2B sales rep for ${companyName}. We export ${productCategory}
to international buyers worldwide. Your job: help potential buyers understand
our products, qualify their needs, capture lead info, and route them to the
right colleague at the right time.

# STRICT RULES (NEVER VIOLATE)
1. NEVER quote prices, even ranges or estimates. ALWAYS use the request_quote
   tool — pricing depends on quantity, destination, payment terms, lead time.

2. NEVER promise lead times, stock levels, or delivery dates. Use check_product
   if available, or escalate_to_sales otherwise.

3. NEVER fabricate product specs, certifications, or compliance info. Always use
   search_kb or check_product. If a tool returns no info, say "let me get a
   specialist to confirm" and call escalate_to_sales.

4. ALWAYS capture lead info INCREMENTALLY. The moment the user reveals company,
   country, product interest, quantity, use case, timeline, or contact info —
   silently call capture_lead. Don't ask permission. Don't acknowledge ("got it,
   noted") — just continue naturally.

5. If user asks: to talk to a human, for a sample, for a custom order, mentions
   a large quantity (e.g. container-load), or sounds urgent → call escalate_to_sales
   IMMEDIATELY. Don't try to handle it yourself.

6. If you don't know something for sure, SAY SO. Acceptable: "I'm not 100% sure,
   let me check with our specialist." Then call escalate_to_sales. Never bluff.

# CURRENT CONTEXT
Today: ${date}
Business hours: ${businessHours}
${leadProfile ? `\n## What we know about this lead:\n${formatProfile(leadProfile)}`
              : `\n## This is a NEW contact. Be welcoming. Ask their name and company naturally in your first 2-3 turns.`}

# STYLE
- Concise. 1-3 sentences per reply unless you're explaining product details.
- Professional but warm. Match the user's English level — don't show off.
- Ask ONE question at a time. Don't interrogate.
- If user writes in non-English, briefly acknowledge in their language, then
  continue in English ("我用英文继续回复,方便你保留邮件记录 — let me continue...").
- Cite sources when quoting specs ("according to our spec sheet for X...").
- No markdown formatting in IM channels (Telegram allows light *bold*).
- Never use emojis unless the user does first.`;
}
```
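
`buildSystemPrompt` relies on a `SystemPromptInput` type and a `formatProfile` helper that this appendix does not define. The block below is a minimal sketch of both, not the actual schema: the `LeadProfile` fields are assumptions chosen to mirror what rule 4 tells the model to capture (`products_interest` follows the `capture_lead` input used in the eval samples).

```typescript
// Hypothetical lead shape; every field name except `id` is an assumption.
interface LeadProfile {
  id: string;
  name?: string;
  company?: string;
  country?: string;
  products_interest?: string[];
  quantity?: string;
  timeline?: string;
}

// Input for buildSystemPrompt, inferred from its destructured parameters.
interface SystemPromptInput {
  companyName: string;
  productCategory: string;
  leadProfile?: LeadProfile | null;
  date: string; // ISO date, e.g. "2026-05-27"
  businessHours: string;
}

// Render only the fields we actually know as "- Label: value" bullets,
// so the prompt never contains "undefined" or empty entries.
function formatProfile(p: LeadProfile): string {
  const rows: Array<[string, string | undefined]> = [
    ['Name', p.name],
    ['Company', p.company],
    ['Country', p.country],
    ['Products of interest', p.products_interest?.join(', ')],
    ['Quantity', p.quantity],
    ['Timeline', p.timeline],
  ];
  const known = rows.filter(([, v]) => v !== undefined && v.trim() !== '');
  if (known.length === 0) return '- (no details captured yet)';
  return known.map(([label, v]) => `- ${label}: ${v}`).join('\n');
}
```

Skipping unknown fields keeps the "What we know about this lead" section short, which also helps the model see which fields are still missing and worth asking about.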

## Appendix C: Eval dataset design

Store both "ideal" conversations and "trap" conversations as JSONL:

```jsonl
{"id":"golden-001","scenario":"first contact, German buyer, LED panels","messages":[{"role":"user","content":"Hi, do you make LED panels?"},{"role":"assistant","tool_calls":[{"name":"capture_lead","input":{"products_interest":["LED panels"]}}],"content":"Yes — we manufacture LED panels for retail, office, and outdoor signage. Where are you based?"}],"expected":{"tool_calls_made":["capture_lead"],"escalated":false,"lead_score":"WARM"}}
{"id":"golden-002","scenario":"price probe should NOT quote","messages":[]}
{"id":"trap-001","scenario":"customer tries to trick bot into quoting","messages":[],"expected":{"tool_calls_made":["request_quote"],"price_mentioned_in_reply":false}}
```

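Run the eval:
```ts
// `dataset` is the parsed JSONL; `runBot` and `check` come from the eval harness.
const metrics: Record<string, boolean> = {};
for (const sample of dataset) {
  const result = await runBot(sample.messages);
  metrics[sample.id] = check(result, sample.expected);
}
const passRate = Object.values(metrics).filter(Boolean).length / dataset.length;
```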

A prompt change may be merged only if the eval pass rate does not regress.
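
The eval loop calls `check(result, sample.expected)` without defining it. One possible shape is sketched below, assuming `runBot` returns the reply text, the names of the tools it called, and an escalation flag (the `BotResult` interface is hypothetical). The `Expected` keys are the ones that appear in the JSONL samples. The price detector is a rough regex heuristic, and `meetsBaseline` is an illustrative merge gate rather than part of the original design.

```typescript
// Assumed output of runBot; not defined in the original.
interface BotResult {
  text: string;
  toolCallsMade: string[];
  escalated: boolean;
  leadScore?: string;
}

// Mirrors the "expected" keys used in the JSONL samples.
interface Expected {
  tool_calls_made?: string[];
  escalated?: boolean;
  lead_score?: string;
  price_mentioned_in_reply?: boolean;
}

// Heuristic: a currency symbol before a digit, or a number followed by a
// currency word, counts as a price mention.
const PRICE_PATTERN =
  /(?:[$€£¥]\s?\d|\b\d[\d.,]*\s?(?:usd|eur|gbp|cny|rmb|dollars?)\b)/i;

// A sample passes only if every key present in `expected` is satisfied.
// Samples without an `expected` object pass trivially.
function check(result: BotResult, expected: Expected = {}): boolean {
  // Subset semantics: every expected tool must have been called at least once.
  if (expected.tool_calls_made &&
      !expected.tool_calls_made.every((t) => result.toolCallsMade.includes(t))) {
    return false;
  }
  if (expected.escalated !== undefined && result.escalated !== expected.escalated) {
    return false;
  }
  if (expected.lead_score !== undefined && result.leadScore !== expected.lead_score) {
    return false;
  }
  if (expected.price_mentioned_in_reply !== undefined &&
      PRICE_PATTERN.test(result.text) !== expected.price_mentioned_in_reply) {
    return false;
  }
  return true;
}

// Merge gate: the candidate prompt must reach at least the baseline pass rate.
function meetsBaseline(metrics: Record<string, boolean>, baselineRate: number): boolean {
  const results = Object.values(metrics);
  if (results.length === 0) return false; // no samples means nothing was verified
  return results.filter(Boolean).length / results.length >= baselineRate;
}
```

A regex will miss prices written out in words, so in practice it would likely be paired with the LLM-based quality check. It is cheap enough to run on every sample, though.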

---

## Appendix D: Key code skeleton (copy to get started)

### worker/index.ts
```ts
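import { handleTelegramWebhook } from './channels/telegram';
import { handleWebChat } from './channels/web';
// Cron handlers used in scheduled(); export names assumed from the call sites.
import { syncWordPress } from './crons/wp-sync';
import { nightlyEval } from './crons/nightly-eval';
export { ConversationDO } from './conversation-do';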

export default {
  async fetch(req: Request, env: Env, ctx: ExecutionContext) {
    const url = new URL(req.url);
    if (url.pathname === '/webhook/telegram') return handleTelegramWebhook(req, env, ctx);
    if (url.pathname.startsWith('/chat/web')) return handleWebChat(req, env, ctx);
    return new Response('Not found', { status: 404 });
  },
  async scheduled(event, env, ctx) {
    // Cron Triggers fire in UTC: 18:00 UTC is 02:00 in UTC+8.
    if (event.cron === '0 18 * * *') ctx.waitUntil(syncWordPress(env));
    if (event.cron === '0 2 * * *') ctx.waitUntil(nightlyEval(env));
  },
} satisfies ExportedHandler<Env>;
```
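
`handleTelegramWebhook` is not shown here. Part of its job is mapping a Telegram `Update` to the `NormalizedMessage` that `ConversationDO.handleMessage` consumes. A minimal sketch of that mapping follows. Two things in it are assumptions: the `NormalizedMessage` shape is inferred from how the DO reads it, and the `telegram:<chat id>` format for `conversationId` is made up for illustration.

```typescript
// Subset of Telegram's Update object (Bot API); only the fields used here.
interface TelegramUpdate {
  update_id: number;
  message?: {
    message_id: number;
    from?: { id: number; username?: string; language_code?: string };
    chat: { id: number; type: string };
    text?: string;
  };
}

// Assumed shape, inferred from ConversationDO.handleMessage.
interface NormalizedMessage {
  conversationId: string;
  channel: 'telegram' | 'whatsapp' | 'messenger' | 'web';
  externalUserId: string;
  text: string;
  metadata: Record<string, string | undefined>;
}

// Returns null for updates the bot should ignore (edits, stickers, joins, ...).
function normalizeTelegramUpdate(update: TelegramUpdate): NormalizedMessage | null {
  const m = update.message;
  if (!m?.text || !m.from) return null;
  return {
    conversationId: `telegram:${m.chat.id}`, // hypothetical ID scheme
    channel: 'telegram',
    externalUserId: String(m.from.id),
    text: m.text,
    metadata: { username: m.from.username, languageCode: m.from.language_code },
  };
}
```

Keeping this function pure (no `env`, no network) makes each channel adapter easy to unit-test against recorded webhook payloads.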

### worker/conversation-do.ts
```ts
import { DurableObject } from 'cloudflare:workers';
import { generateText, stepCountIs } from 'ai';
import { buildSystemPrompt } from './prompts/system';
import { buildTools } from './tools';

export class ConversationDO extends DurableObject<Env> {
  async handleMessage(msg: NormalizedMessage): Promise<{ text: string }> {
    const history = await this.getRecentMessages(50);
    const profile = await this.getLeadProfile(msg.channel, msg.externalUserId);

    const system = buildSystemPrompt({
      companyName: this.env.COMPANY_NAME,
      productCategory: this.env.PRODUCT_CATEGORY,
      leadProfile: profile,
      date: new Date().toISOString().slice(0,10),
      businessHours: '9am-6pm CST (UTC+8), Mon-Fri',
    });

    const tools = buildTools(this.env, this.ctx, {
      conversationId: msg.conversationId,
      channel: msg.channel,
      externalUserId: msg.externalUserId,
      leadId: profile?.id,
    });

    const result = await generateText({
      model: 'anthropic/claude-sonnet-4.5',
      system,
      messages: [...history, { role: 'user', content: msg.text }],
      tools,
      stopWhen: stepCountIs(8),
      onStepFinish: ({ stepNumber, toolCalls, toolResults }) => {
        this.auditStep(msg.conversationId, stepNumber, toolCalls, toolResults);
      },
      experimental_context: {
        conversationId: msg.conversationId,
        channel: msg.channel,
        externalUserId: msg.externalUserId,
      },
    });

    await this.appendMessage('user', msg.text);
    await this.appendMessage('assistant', result.text);

    if (await this.detectStall(result)) {
      await this.forceEscalate(msg.conversationId, 'stalled_conversation');
    }

    return { text: result.text };
  }
}
```
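
The skeleton calls `detectStall(result)` but does not show it. One simple heuristic it could build on is sketched below: treat the conversation as stalled when the new reply is empty or repeats one of the last few assistant replies. The function names, the window size, and the choice of heuristic are all assumptions, not the original implementation.

```typescript
// Normalize case, punctuation, and whitespace so trivially different
// replies compare equal.
function normalizeReply(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9\s]/g, '').replace(/\s+/g, ' ').trim();
}

// Stalled = the newest reply is empty, or it repeats one of the previous
// `window` assistant replies after normalization.
function looksStalled(previousReplies: string[], newReply: string, window = 3): boolean {
  const target = normalizeReply(newReply);
  if (target === '') return true; // an empty reply is itself a stall
  return previousReplies.slice(-window).some((r) => normalizeReply(r) === target);
}
```

Inside the DO this could be called with the last few stored assistant messages and `result.text`. A production version would probably also look at whether any tool was called in recent turns.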

---

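## Closing: core principles

**What makes this design behave like "AI" rather than a dumb bot is not the technology but these three things:**

1. **Trust the LLM and hand it the decisions**: don't write if-else decision trees; let the model read the tool descriptions and choose for itself.
2. **Tool descriptions are the contract**: if they are inaccurate, the bot will inevitably act dumb. Prompt engineering = tool description engineering.
3. **Build the data flywheel so it improves every day**: a product that ships once and is then left alone will be dead within 6 months.

The rest is engineering detail: DO persistence, cross-channel lead merging, cache hits, and cron-driven quality checks. Get those right and the bot won't act dumb.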