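Is your local LLM outputting Mainland-style wording? Gemma 4 e2b vs 12b: a Traditional Chinese drift audit
Across the full batch of 80 outputs, 95% of the banned-term hits were 「聯繫」. The prefix had no effect; it takes a post-process replacement to fix.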
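The hit rate is 80%, but the failing 20% is wrong all the way down, avalanche-style. Fine for first-stage coarse filtering in RAG, not for production ranking.
JSON is the safest. When other formats fail, it is usually because the model returned an empty response, not because the format was wrong.
1K/2.5K/4.5K × five positions: all 30 questions correct. The 8K context is real, but remember that it can't shake its habit of verbosity.
Streaming isn't faster; it just lets you see text sooner. Always stream long responses, and don't bother for short ones.
The Dynamic workflow is 22% more accurate than the Static one, but the retry mechanism never fired once; the improvement came entirely from prompt design.
Role-based prompts noticeably improve the feel of the content, yet the LLM-as-judge scores differ by only 4.5%, which exposes the engineering reality of self-evaluation bias.
RAG is no cure-all: it can gracefully decline unknown questions, but for questions about its own corpus, the hit rate depends on how the chunks are cut.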
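n8n conducts, Python does the grunt work, and Ollama handles the human-sounding talk. This combination went far more smoothly than I expected.
Fewer features than Claude Code, but zero cost. Good for document cleanup and knowledge management, not for complex architecture design.
The barrier to entry is absurdly low, but there is one pitfall you should know about first.
minicpm-v reaches 100% accuracy (on totals) for structured extraction from "clean receipts", which is surprisingly useful. But this is a controlled setting; real photos are still untested.
MCP isn't a new concept, it's a new interface: it turns the LLM from a chat box into an executor that can direct tools.
Hard-coded selectors are 1500× faster; LLM-driven is slow but survives DOM redesigns. The real question for browser agents is "speed vs resilience".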
Our editorial machine in production — the tools we trust with our own publishing pipeline. If we recommend it elsewhere, it earns a row here first.
| Tool | Category | Why we use it | Last reviewed |
|---|---|---|---|
| n8n v1.94 | Workflow runtime | Self-hosted on a $6 VPS. Survived 1.4M executions last quarter. | 2026-05-18 |
| Ollama v0.6.2 | Local inference | Local model serving for drafting and classification. CPU fallback works. | 2026-04-12 |
| Gemma 3-12B | Model | Default local model for editorial drafts and rewriting. | 2026-05-11 |
| PostgreSQL 16.3 | Storage | Article store + vector store via pgvector. Boringly reliable. | 2026-03-22 |
| Caddy 2.8 | Edge | TLS, reverse proxy, and rate limiting in one config file. | 2026-02-09 |
| Mac Mini M2 Pro | Inference box | 32GB unified memory. Quiet enough to live under the desk. | 2026-04-27 |
Operations are now metered separately from "polls." For solo operators on the Pro plan, the math gets worse below ~8k ops/mo and better above it. Full breakdown →
We've stopped recommending the official Docker compose. Three flags do the work of a separate ops team. Notes coming this weekend.
Quantized to Q4_K_M, runs at 11 tok/s on a 4090. Not fast — but fast enough that we're rewriting our model-choice rubric for next month's issue.
Short answer: hosting in SG1 doesn't make you compliant; processing terms do. Long answer: a piece is forming on this.
Plain text. No tracking pixels. Unsubscribe in one click. We'd rather have 800 readers who finish each piece than 80,000 who skim.
↳ 2,148 readers · Issue 014 ships Sunday.