Featured Teardown · Strategy Signature № 03 · INTERFERENCE

Static 還是 Dynamic?一個框架,決定你的 AI 自動化架構Static or Dynamic? A Framework for Choosing Your AI Automation Architecture

你已經用 n8n 把帳單分類跑得很順——那你還需要 LangGraph 嗎?

You've got your n8n billing workflow running smoothly — so do you actually need LangGraph?

這個問題的答案比我預期的複雜。我原本以為實驗結果會很直白:dynamic workflow 因為有重試機制,準確率就是比 static 高。結果確實如此——但原因完全不同。

這篇文章用同一份信用卡帳單資料,分別跑了靜態版(直接 Ollama 呼叫)和動態版(LangGraph StateGraph),記錄兩者的真實差距,以及讓我改變想法的那個發現。

The answer turned out to be more complicated than I expected. I assumed the experiment would be straightforward: dynamic workflow has retry logic, so accuracy would be higher. It was — but not for the reason I thought.

This article runs the same credit card billing dataset through both a static version (direct Ollama calls) and a dynamic version (LangGraph StateGraph), and reports the real-world difference between them, including the finding that changed my conclusion.


什麼是 Dynamic Workflow?

LangGraph StateGraph:classify → verify → 低信心時回到 classify,信心足夠時接受輸出

Static DAG(靜態有向無環圖):你把每一步寫死,機器照順序跑。n8n、Make、Zapier 都是這種模型。邏輯清楚,出錯容易追。

Dynamic Workflow:流程圖裡有「條件邊」可以回頭。AI 根據當前狀態決定下一步——接受結果、重試、或呼叫另一個工具。LangGraph 是目前最成熟的開源實作,它把工作流程包裝成 StateGraph:每個節點讀取狀態、輸出狀態,邊的走向由函式決定。

這兩種不是優劣之分,是不同適用場景。2025 年底這個詞突然變熱,因為三件事同時發生:Anthropic Claude Agent SDK 正式釋出、LangGraph 1.0 GA、n8n 加入 AI Agent node。

What is a Dynamic Workflow?

LangGraph StateGraph: classify → verify → loop back on low confidence, accept on high confidence

Static DAG (Directed Acyclic Graph): every step is hardcoded, executed in sequence. n8n, Make, Zapier — all this model. Logic is explicit, errors are traceable.

Dynamic Workflow: the graph has conditional edges that loop back. The AI decides the next step based on current state — accept the result, retry, or call a different tool. LangGraph wraps this into a StateGraph: each node reads state and outputs state, and edge routing is determined by a function.

These aren't better or worse. They solve different problems. The term went mainstream in late 2025 because three things happened simultaneously: Anthropic Claude Agent SDK went GA, LangGraph 1.0 released, and n8n shipped an AI Agent node.


實驗設計:同一份資料,兩種架構

實驗環境:Python 3.12、gemma4:e2b(Ollama)、langgraph 1.2.2 安裝確認

輸入:12 個月匯豐信用卡帳單(與上一篇 n8n 文章同一份資料),取 40 筆代表樣本,跨月份、跨類別。

任務:把每筆交易歸類到 8 個消費類別(餐飲、訂閱服務、超市便利、交通、購物、水電通訊、醫療健康、娛樂休閒)。

版本 A — Static:Python 直接迴圈,每筆呼叫一次 Ollama,不管信心高低都接受第一次答案。

版本 B — Dynamic:LangGraph StateGraph,classify → verify → retry_inc → classify(信心 < 0.7 且重試次數 < 2 才回頭)。

兩個版本都用 gemma4:e2b(本地 Ollama,M1 MacBook 16GB),資料不出本機。

⚠️ 踩雷:gemma4:e2b 是 thinking model。第一次設定 num_predict: 20,結果 405 筆全部回傳空字串——thinking tokens 把 token 預算吃完了。正確設定是 num_predict: 2000,每筆約 8-10 秒。

Experiment Design: Same Data, Two Architectures

Experiment environment: Python 3.12, gemma4:e2b (Ollama), langgraph 1.2.2 installed

Input: 12 months of HSBC credit card statements (same dataset as the n8n billing article), 40 representative transactions sampled across months and categories.

Task: Classify each transaction into one of 8 spending categories (dining, subscriptions, groceries, transport, shopping, utilities, healthcare, entertainment).

Version A — Static: Direct Python loop, one Ollama call per transaction, accepts the first answer regardless of confidence.

Version B — Dynamic: LangGraph StateGraph, classify → verify → retry_inc → classify (loops back only when confidence < 0.7 and retries < 2).

Both versions use gemma4:e2b via local Ollama on an M1 MacBook 16GB. Data never leaves the machine.

⚠️ Gotcha: gemma4:e2b is a thinking model. Setting num_predict: 20 caused 405 consecutive empty responses — the thinking tokens consumed the entire token budget. The correct setting is num_predict: 2000, at roughly 8–10 seconds per transaction.

Static 版本:132 行,直接了當

static_classifier.py 核心:單一迴圈呼叫 Ollama,第一次答案即為最終答案 Static 執行結果:40 筆,332 秒,7.5% 落入「其他」,3 筆明確誤分類

核心是一個迴圈:讀交易、組 prompt、打 Ollama、解析回答、存檔。沒有狀態管理,沒有重試邏輯。

Prompt 最後一行是:「如果不確定,回答『其他』。」

這句話在實驗中製造了麻煩。中油加油站出現 4 次,static 版給出了三種不同的分類:

筆次商家Static 答案
[009]APE中油民生東路站其他
[014]APE中油民生東路站交通 ✅
[020]APE中油-金山南路站超市便利
[026]APE中油-金山南路站超市便利

同一個商家,四次,三種答案。這不是模型能力問題——是 prompt 的「不確定就回其他」這個後路,讓模型的決策在邊界案例上變得隨機。

Static Version: 132 Lines, Straightforward

static_classifier.py core: single loop calling Ollama, first answer is final answer Static execution: 40 transactions, 332s, 7.5% fell into 'other', 3 clear misclassifications

The core is a single loop: read transaction, build prompt, call Ollama, parse response, save. No state management, no retry logic.

The last line of the prompt: "If unsure, answer '其他' (other)."

That sentence caused problems. The gas station chain 中油 appeared 4 times. Static gave three different answers:

TransactionMerchantStatic answer
[009]APE中油 Minsheng Rd其他 (other)
[014]APE中油 Minsheng Rd交通 (transport) ✅
[020]APE中油 Jinshan Rd超市便利 (grocery)
[026]APE中油 Jinshan Rd超市便利 (grocery)

Same merchant chain, four appearances, three different answers. This isn't a model capability problem — it's the "if unsure, answer other" escape hatch giving the model an inconsistent way out on borderline inputs.


Dynamic 版本:237 行,四個節點

LangGraph 執行:40 筆完成,5 筆與 static 不同,全部修正正確 Retry 機制說明:模型回答不精確時觸發重試,加入商家原始全文再分類

StateGraph 定義了 4 個節點:

  • classify:呼叫 Ollama,估算信心(精確命中類別名稱 = 0.9,否則 = 0.55)
  • verify:信心 < 0.7 且重試次數 < 2 → 走重試路徑,否則接受
  • retry_inc:重試計數 +1,回到 classify(第二次 prompt 加入商家原始全文)
  • END:接受最終分類,寫出結果

Dynamic 版的 prompt 拿掉了「如果不確定,回答其他」這句,改成只給 8 個類別,要求直接命中其中一個。

實驗結果:重試機制在 40 筆裡一次都沒觸發。 每一筆的模型回答都精確命中類別名稱(信心 0.9),verify 節點直接放行。

但準確率還是從 77.3% 跳到 100%。

Dynamic Version: 237 Lines, Four Nodes

LangGraph execution: 40 transactions complete, 5 differ from static, all corrected Retry mechanism: triggers when model response isn't an exact category name, re-classifies with full merchant text

The StateGraph defines 4 nodes:

  • classify: calls Ollama, estimates confidence (exact category match = 0.9, otherwise = 0.55)
  • verify: confidence < 0.7 and retries < 2 → retry path; otherwise accept
  • retry_inc: increments retry counter, returns to classify (second attempt includes original full merchant text)
  • END: accepts final classification, writes result

The Dynamic prompt dropped the "if unsure, answer other" instruction and replaced it with a direct command to pick one of the 8 categories.

Experiment result: the retry mechanism never fired across all 40 transactions. Every model response hit a category name exactly (confidence 0.9), and the verify node waved it through.

But accuracy still jumped from 77.3% to 100%.


結果:改善來自哪裡?

Static vs Dynamic 完整對比:準確率、耗時、程式複雜度、重試觸發率
指標StaticDynamic
準確率(22 筆可驗證)77.3%100.0%
修正筆數5 筆(全部正確)
總耗時332s355s(+7%)
平均 s/筆8.3s8.9s
程式碼行數132 行237 行(+80%)
重試觸發0 次 / 40 筆
改善來源Prompt 設計差異

5 筆修正全部來自同一個原因:dynamic prompt 沒有「不確定回其他」這條退路,模型必須在 8 個類別裡做出決定。中油三次全部正確地分到「交通」,KKBOX 和 Netflix 手續費也都歸到「訂閱服務」。

關鍵洞察:LangGraph 的節點設計逼你把每個 node 的指令想清楚。這個「設計副作用」可能比重試機制本身更有價值——即使重試從來沒有觸發。

Results: Where Did the Improvement Come From?

Full comparison: accuracy, time, code complexity, retry rate
MetricStaticDynamic
Accuracy (22 verifiable)77.3%100.0%
Corrections5 (all correct)
Total time332s355s (+7%)
Avg s/transaction8.3s8.9s
Lines of code132237 (+80%)
Retry triggerednone0 / 40 (0%)
Improvement sourcePrompt design difference

All 5 corrections trace to the same cause: the Dynamic prompt removed the "if unsure, answer other" escape hatch, forcing the model to commit to one of the 8 categories. All three 中油 appearances correctly landed on "transport." KKBOX and the Netflix foreign transaction fee both went to "subscriptions."

Key insight: LangGraph's node-based design forces you to think carefully about each node's instructions. This design side effect may be more valuable than the retry mechanism itself — even when retries never fire.

決策框架:什麼情況選哪個

決策框架 2×2:任務邊界清晰度 × 容錯需求,四個象限對應不同建議

先問這個問題:你的問題是「模型給出了錯誤答案」,還是「模型給出了不確定的答案」?

  • 「錯誤答案」(中油被分成超市便利)→ 先改 prompt,不需要 LangGraph
  • 「不確定答案」(模型說「可能是 A 或 B」)→ retry 機制有用,考慮 Dynamic

選 Static 當:

  • 輸入格式固定(帳單、表單、結構化資料)
  • 分類邊界清楚,prompt 設計得當就能解決
  • 重視可維護性,出錯要容易追

選 Dynamic 當:

  • 輸入是自由文字,模型回答可能不規則(不是精確命中 label)
  • 需要多步驟工具呼叫(搜尋商家資訊 → 再分類)
  • 任務邊界模糊,需要明確的「自我修正」邏輯
  • 預計未來要加更多節點(多模型路由、外部 API)

不要用 Dynamic 當:

  • 只是想讓流程「感覺更 AI」
  • 問題其實是 prompt 寫得不夠好(先修 prompt)
  • 還沒有靜態版本作為基準

Dynamic workflow 的架構成本是真實的:237 行 vs 132 行,程式複雜度 +80%,每筆多 0.6 秒。如果改一句 prompt 能解決問題,就先改 prompt。

Decision Framework: Which One to Choose

Decision framework 2×2: task boundary clarity × error tolerance, four quadrants

Start with this question: is your problem "the model gave the wrong answer," or "the model gave an uncertain answer"?

  • "Wrong answer" (中油 classified as grocery store) → fix the prompt first, no LangGraph needed
  • "Uncertain answer" (model responds with "maybe A or B") → retry is useful, consider Dynamic

Choose Static when:

  • Input format is fixed (statements, forms, structured data)
  • Classification boundaries are clear — good prompt design solves it
  • Maintainability matters, errors need to be traceable

Choose Dynamic when:

  • Input is free text, model responses may be irregular (not exact label matches)
  • Multi-step tool calls are needed (look up merchant info → re-classify)
  • Task boundary is genuinely fuzzy and self-correction logic is required
  • You expect to add more nodes later (multi-model routing, external APIs)

Don't use Dynamic when:

  • You just want the flow to "feel more AI"
  • The real problem is a poorly written prompt (fix the prompt first)
  • You don't have a working static version as a baseline yet

The architectural cost of Dynamic is real: 237 lines vs 132, +80% code complexity, +0.6s per transaction. If one prompt change solves the problem, change the prompt.