Featured Teardown · Strategy Signature № 03 · INTERFERENCE

Static 還是 Dynamic？一個框架，決定你的 AI 自動化架構Static or Dynamic? A Framework for Choosing Your AI Automation Architecture

Josh Chen · June 7, 2026 · 14 min read

你已經用 n8n 把帳單分類跑得很順——那你還需要 LangGraph 嗎？

You've got your n8n billing workflow running smoothly — so do you actually need LangGraph?

這個問題的答案比我預期的複雜。我原本以為實驗結果會很直白：dynamic workflow 因為有重試機制，準確率就是比 static 高。結果確實如此——但原因完全不同。

這篇文章用同一份信用卡帳單資料，分別跑了靜態版（直接 Ollama 呼叫）和動態版（LangGraph StateGraph），記錄兩者的真實差距，以及讓我改變想法的那個發現。

The answer turned out to be more complicated than I expected. I assumed the experiment would be straightforward: dynamic workflow has retry logic, so accuracy would be higher. It was — but not for the reason I thought.

This article runs the same credit card billing dataset through both a static version (direct Ollama calls) and a dynamic version (LangGraph StateGraph), and reports the real-world difference between them, including the finding that changed my conclusion.

什麼是 Dynamic Workflow？

LangGraph StateGraph：classify → verify → 低信心時回到 classify，信心足夠時接受輸出

Static DAG（靜態有向無環圖）：你把每一步寫死，機器照順序跑。n8n、Make、Zapier 都是這種模型。邏輯清楚，出錯容易追。

Dynamic Workflow：流程圖裡有「條件邊」可以回頭。AI 根據當前狀態決定下一步——接受結果、重試、或呼叫另一個工具。LangGraph 是目前最成熟的開源實作，它把工作流程包裝成 StateGraph：每個節點讀取狀態、輸出狀態，邊的走向由函式決定。

這兩種不是優劣之分，是不同適用場景。2025 年底這個詞突然變熱，因為三件事同時發生：Anthropic Claude Agent SDK 正式釋出、LangGraph 1.0 GA、n8n 加入 AI Agent node。

What is a Dynamic Workflow?

LangGraph StateGraph: classify → verify → loop back on low confidence, accept on high confidence

Static DAG (Directed Acyclic Graph): every step is hardcoded, executed in sequence. n8n, Make, Zapier — all this model. Logic is explicit, errors are traceable.

Dynamic Workflow: the graph has conditional edges that loop back. The AI decides the next step based on current state — accept the result, retry, or call a different tool. LangGraph wraps this into a StateGraph: each node reads state and outputs state, and edge routing is determined by a function.

These aren't better or worse. They solve different problems. The term went mainstream in late 2025 because three things happened simultaneously: Anthropic Claude Agent SDK went GA, LangGraph 1.0 released, and n8n shipped an AI Agent node.

實驗設計：同一份資料，兩種架構

實驗環境：Python 3.12、gemma4:e2b（Ollama）、langgraph 1.2.2 安裝確認

輸入：12 個月匯豐信用卡帳單（與上一篇 n8n 文章同一份資料），取 40 筆代表樣本，跨月份、跨類別。

任務：把每筆交易歸類到 8 個消費類別（餐飲、訂閱服務、超市便利、交通、購物、水電通訊、醫療健康、娛樂休閒）。

版本 A — Static：Python 直接迴圈，每筆呼叫一次 Ollama，不管信心高低都接受第一次答案。

版本 B — Dynamic：LangGraph StateGraph，classify → verify → retry_inc → classify（信心 < 0.7 且重試次數 < 2 才回頭）。

兩個版本都用 gemma4:e2b（本地 Ollama，M1 MacBook 16GB），資料不出本機。

⚠️ 踩雷：gemma4:e2b 是 thinking model。第一次設定 num_predict: 20，結果 405 筆全部回傳空字串——thinking tokens 把 token 預算吃完了。正確設定是 num_predict: 2000，每筆約 8-10 秒。

Experiment Design: Same Data, Two Architectures

Experiment environment: Python 3.12, gemma4:e2b (Ollama), langgraph 1.2.2 installed

Input: 12 months of HSBC credit card statements (same dataset as the n8n billing article), 40 representative transactions sampled across months and categories.

Task: Classify each transaction into one of 8 spending categories (dining, subscriptions, groceries, transport, shopping, utilities, healthcare, entertainment).

Version A — Static: Direct Python loop, one Ollama call per transaction, accepts the first answer regardless of confidence.

Version B — Dynamic: LangGraph StateGraph, classify → verify → retry_inc → classify (loops back only when confidence < 0.7 and retries < 2).

Both versions use gemma4:e2b via local Ollama on an M1 MacBook 16GB. Data never leaves the machine.

⚠️ Gotcha: gemma4:e2b is a thinking model. Setting num_predict: 20 caused 405 consecutive empty responses — the thinking tokens consumed the entire token budget. The correct setting is num_predict: 2000, at roughly 8–10 seconds per transaction.

Static 版本：132 行，直接了當

static_classifier.py 核心：單一迴圈呼叫 Ollama，第一次答案即為最終答案

Static 執行結果：40 筆，332 秒，7.5% 落入「其他」，3 筆明確誤分類

核心是一個迴圈：讀交易、組 prompt、打 Ollama、解析回答、存檔。沒有狀態管理，沒有重試邏輯。

Prompt 最後一行是：「如果不確定，回答『其他』。」

這句話在實驗中製造了麻煩。中油加油站出現 4 次，static 版給出了三種不同的分類：

筆次	商家	Static 答案
[009]	APE中油民生東路站	其他 ❌
[014]	APE中油民生東路站	交通 ✅
[020]	APE中油-金山南路站	超市便利 ❌
[026]	APE中油-金山南路站	超市便利 ❌

同一個商家，四次，三種答案。這不是模型能力問題——是 prompt 的「不確定就回其他」這個後路，讓模型的決策在邊界案例上變得隨機。

Static Version: 132 Lines, Straightforward

static_classifier.py core: single loop calling Ollama, first answer is final answer

Static execution: 40 transactions, 332s, 7.5% fell into 'other', 3 clear misclassifications

The core is a single loop: read transaction, build prompt, call Ollama, parse response, save. No state management, no retry logic.

The last line of the prompt: "If unsure, answer '其他' (other)."

That sentence caused problems. The gas station chain 中油 appeared 4 times. Static gave three different answers:

Transaction	Merchant	Static answer
[009]	APE中油 Minsheng Rd	其他 (other) ❌
[014]	APE中油 Minsheng Rd	交通 (transport) ✅
[020]	APE中油 Jinshan Rd	超市便利 (grocery) ❌
[026]	APE中油 Jinshan Rd	超市便利 (grocery) ❌

Same merchant chain, four appearances, three different answers. This isn't a model capability problem — it's the "if unsure, answer other" escape hatch giving the model an inconsistent way out on borderline inputs.

Dynamic 版本：237 行，四個節點

LangGraph 執行：40 筆完成，5 筆與 static 不同，全部修正正確

StateGraph 定義了 4 個節點：

classify：呼叫 Ollama，估算信心（精確命中類別名稱 = 0.9，否則 = 0.55）
verify：信心 < 0.7 且重試次數 < 2 → 走重試路徑，否則接受
retry_inc：重試計數 +1，回到 classify（第二次 prompt 加入商家原始全文）
END：接受最終分類，寫出結果

Dynamic 版的 prompt 拿掉了「如果不確定，回答其他」這句，改成只給 8 個類別，要求直接命中其中一個。

實驗結果：重試機制在 40 筆裡一次都沒觸發。 每一筆的模型回答都精確命中類別名稱（信心 0.9），verify 節點直接放行。

但準確率還是從 77.3% 跳到 100%。

Dynamic Version: 237 Lines, Four Nodes

LangGraph execution: 40 transactions complete, 5 differ from static, all corrected

Retry mechanism: triggers when model response isn't an exact category name, re-classifies with full merchant text

The StateGraph defines 4 nodes:

classify: calls Ollama, estimates confidence (exact category match = 0.9, otherwise = 0.55)
verify: confidence < 0.7 and retries < 2 → retry path; otherwise accept
retry_inc: increments retry counter, returns to classify (second attempt includes original full merchant text)
END: accepts final classification, writes result

The Dynamic prompt dropped the "if unsure, answer other" instruction and replaced it with a direct command to pick one of the 8 categories.

Experiment result: the retry mechanism never fired across all 40 transactions. Every model response hit a category name exactly (confidence 0.9), and the verify node waved it through.

But accuracy still jumped from 77.3% to 100%.

結果：改善來自哪裡？

Static vs Dynamic 完整對比：準確率、耗時、程式複雜度、重試觸發率

指標	Static	Dynamic
準確率（22 筆可驗證）	77.3%	100.0% ↑
修正筆數	—	5 筆（全部正確）
總耗時	332s	355s（+7%）
平均 s/筆	8.3s	8.9s
程式碼行數	132 行	237 行（+80%）
重試觸發	無	0 次 / 40 筆
改善來源	—	Prompt 設計差異

5 筆修正全部來自同一個原因：dynamic prompt 沒有「不確定回其他」這條退路，模型必須在 8 個類別裡做出決定。中油三次全部正確地分到「交通」，KKBOX 和 Netflix 手續費也都歸到「訂閱服務」。

關鍵洞察：LangGraph 的節點設計逼你把每個 node 的指令想清楚。這個「設計副作用」可能比重試機制本身更有價值——即使重試從來沒有觸發。

Results: Where Did the Improvement Come From?

Full comparison: accuracy, time, code complexity, retry rate

Metric	Static	Dynamic
Accuracy (22 verifiable)	77.3%	100.0% ↑
Corrections	—	5 (all correct)
Total time	332s	355s (+7%)
Avg s/transaction	8.3s	8.9s
Lines of code	132	237 (+80%)
Retry triggered	none	0 / 40 (0%)
Improvement source	—	Prompt design difference

All 5 corrections trace to the same cause: the Dynamic prompt removed the "if unsure, answer other" escape hatch, forcing the model to commit to one of the 8 categories. All three 中油 appearances correctly landed on "transport." KKBOX and the Netflix foreign transaction fee both went to "subscriptions."

Key insight: LangGraph's node-based design forces you to think carefully about each node's instructions. This design side effect may be more valuable than the retry mechanism itself — even when retries never fire.

決策框架：什麼情況選哪個

先問這個問題：你的問題是「模型給出了錯誤答案」，還是「模型給出了不確定的答案」？

「錯誤答案」（中油被分成超市便利）→ 先改 prompt，不需要 LangGraph
「不確定答案」（模型說「可能是 A 或 B」）→ retry 機制有用，考慮 Dynamic

選 Static 當：

輸入格式固定（帳單、表單、結構化資料）
分類邊界清楚，prompt 設計得當就能解決
重視可維護性，出錯要容易追

選 Dynamic 當：

輸入是自由文字，模型回答可能不規則（不是精確命中 label）
需要多步驟工具呼叫（搜尋商家資訊 → 再分類）
任務邊界模糊，需要明確的「自我修正」邏輯
預計未來要加更多節點（多模型路由、外部 API）

不要用 Dynamic 當：

只是想讓流程「感覺更 AI」
問題其實是 prompt 寫得不夠好（先修 prompt）
還沒有靜態版本作為基準

Dynamic workflow 的架構成本是真實的：237 行 vs 132 行，程式複雜度 +80%，每筆多 0.6 秒。如果改一句 prompt 能解決問題，就先改 prompt。

Decision Framework: Which One to Choose

Decision framework 2×2: task boundary clarity × error tolerance, four quadrants

Start with this question: is your problem "the model gave the wrong answer," or "the model gave an uncertain answer"?

"Wrong answer" (中油 classified as grocery store) → fix the prompt first, no LangGraph needed
"Uncertain answer" (model responds with "maybe A or B") → retry is useful, consider Dynamic

Choose Static when:

Input format is fixed (statements, forms, structured data)
Classification boundaries are clear — good prompt design solves it
Maintainability matters, errors need to be traceable

Choose Dynamic when:

Input is free text, model responses may be irregular (not exact label matches)
Multi-step tool calls are needed (look up merchant info → re-classify)
Task boundary is genuinely fuzzy and self-correction logic is required
You expect to add more nodes later (multi-model routing, external APIs)

Don't use Dynamic when:

You just want the flow to "feel more AI"
The real problem is a poorly written prompt (fix the prompt first)
You don't have a working static version as a baseline yet

The architectural cost of Dynamic is real: 237 lines vs 132, +80% code complexity, +0.6s per transaction. If one prompt change solves the problem, change the prompt.

Verdict

Signature № 03 · INTERFERENCE

Dynamic workflow 比 Static 準確 22%——但重試機制一次都沒觸發，改善全來自 prompt 設計。Dynamic beat static by 22% accuracy — but the retry mechanism never fired once. The improvement came entirely from prompt design.

用同一份帳單資料跑了 Static（132 行）和 Dynamic（237 行）兩個版本。Static 準確率 77.3%，Dynamic 100%，差了 22.7%。但 LangGraph 的重試邏輯在 40 筆裡一次都沒觸發——改善的真正來源是 Dynamic 版的 prompt 少了一句「如果不確定，回答其他」。結論：Dynamic workflow 架構逼你把每個 node 的指令設計得更精確，這個副作用可能比重試機制本身更有價值。

Ran Static (132 lines) and Dynamic (237 lines) against the same billing dataset. Static accuracy: 77.3%. Dynamic: 100%. A 22.7% gap. But LangGraph's retry logic never fired across 40 transactions — the real source of improvement was a single missing sentence in the Dynamic prompt: 'if unsure, answer 其他.' Conclusion: dynamic workflow architecture forces you to write more precise node instructions. That side effect may be more valuable than the retry mechanism itself.