Featured Teardown · Strategy Signature № 03 · INTERFERENCE

一個模型,四種專家:用 Prompt 把 gemma4 變成稅務、法律、營養、客服顧問One Model, Four Experts: Prompting gemma4 Into Tax, Legal, Nutrition, and Support Specialists

你還在每次 chat 都重新解釋背景嗎?

Are you still re-explaining your background on every chat?

每次問 ChatGPT 一個法律問題,你都要先寫一句「我是台灣租客,房東加了一條……」。每次問營養師問題,又要說明「我是台灣人,飲食以米飯為主」。這就是為什麼系統 prompt 存在——把這些「背景脈絡」一次設定好,剩下的對話就能直接聊正事。

這篇文章用同一個 gemma4:e2b(本地 Ollama),分別設定 4 種專業角色,看看一個 7B 通用模型在 prompt 加持下能不能變成可用的領域助手。

Every time you ask ChatGPT a legal question, you start with "I'm a tenant in Taiwan, my landlord added a clause that...". Every time you ask a nutrition question, you preface with "I'm Taiwanese, my diet is rice-heavy". That's exactly what system prompts solve — set the background context once, then talk shop.

This article takes the same gemma4:e2b (local Ollama) and configures it as 4 domain experts. Can a 7B generic model become a usable specialist with the right prompt?


實驗:同一個 gemma4,四種角色

四種角色的系統 prompt 並排比較
角色領域System Prompt 重點
Tax Advisor台灣綜所稅引用財政部規定、提醒查證、不給投資建議
Legal Helper台灣租賃法引用內政部定型化契約、提醒尋求律師
Nutritionist飲食建議台灣飲食習慣、強調醫師確診優先
Support Lead客服 escalation同理→確認→方案三步驟、避免承諾賠償

每個 prompt 約 30 行,包含:身份、知識邊界、回答風格、必要免責聲明。

Experiment: Same gemma4, Four Roles

System prompts for four roles compared side-by-side
RoleDomainSystem Prompt Focus
Tax AdvisorTaiwan income taxQuote MOF regulations, recommend verification, no investment advice
Legal HelperTaiwan rental lawQuote MOI standard contracts, recommend consulting a lawyer
NutritionistDiet adviceTaiwan eating habits, defer to doctor diagnosis
Support LeadCustomer support escalationEmpathy → Confirm → Solve, no compensation promises

Each prompt is ~30 lines covering: identity, knowledge boundaries, response style, mandatory disclaimers.


角色 A:稅務顧問

稅務 prompt 答案 vs 通用 gemma4 答案對比

問題:「我去年薪資 80 萬、房租支出 12 萬,可以列舉扣除嗎?」

通用 gemma4 給的是泛泛的「依照規定可以扣除」,沒有金額上限與條件。角色化 prompt 引用財政部 113 年規定:每年上限 NT$18 萬,需房東配合開立繳款證明,且要在 5 月前完成設算扣除選項比較。

Role A: Tax Advisor

Tax role response vs generic gemma4 response

Question: "Last year I made NT$800K, paid NT$120K in rent — can I itemize that as a deduction?"

Generic gemma4 gives a vague "yes, per regulations." The role-prompted version cites Taiwan MOF 113 rules: NT$180K annual cap, requires landlord cooperation on payment certificates, must be compared against standard deduction before May filing.


角色 B:租賃法律助手

法律 prompt 同上對比

問題:「房東在我簽的契約裡加了『承租人不得申請任何政府補貼』,這合法嗎?」

通用版本:「應該不太合法,建議找律師。」角色版本:直接引用內政部定型化契約「不得記載事項」第 10 點——這條無效,且可向地方政府消保官申訴。

Role B: Rental Law Helper

Legal role comparison

Question: "My landlord added 'tenant shall not apply for any government subsidy' to the contract — is that legal?"

Generic version: "Probably not, consult a lawyer." Role version: cites MOI standard contract Forbidden Clause #10 directly — this clause is unenforceable and can be reported to the local consumer protection office.


角色 C:營養師

營養師 prompt 對比

問題:「我是糖尿病前期,每天應該吃多少碳水?」

通用版本傾向直接給數字(45-65%),無風險提醒。角色版本:先說「請先看新陳代謝科醫師」,再提供 ADA 建議範圍 + 台灣常見糖友飲食地雷(粥、芋圓、市售飲料)。

Role C: Nutritionist

Nutritionist role comparison

Question: "I'm pre-diabetic, how many carbs should I eat per day?"

Generic version: gives numbers (45-65%), no risk caveat. Role version: leads with "please see an endocrinologist first," then provides ADA-recommended range plus Taiwan-specific common pitfalls (congee, taro balls, sweetened beverages).


角色 D:客服 escalation 助手

客服 prompt 對比

情境:客戶投訴外送送錯餐、要求賠償雙倍。

通用版本:直接道歉並提議退款。角色版本:依照「同理→確認→方案」三步驟——先共情、確認損失、提供「重送 + NT$100 抵用券」這種可控的補償框架,不承諾雙倍。

Role D: Support Escalation Lead

Support role comparison

Scenario: customer received wrong delivery, demands double refund.

Generic version: apologizes and offers refund. Role version: follows "Empathize → Confirm → Solve" steps — empathy first, confirm loss, propose controlled compensation ("redeliver + NT$100 voucher") without committing to double.


評分結果

4 角色 × 5 維度評分表,含 LLM-as-judge 偏誤分析

用 gemma4 自己當 LLM-as-judge,5 維度 10 分制評分 60 個回答(每角色 12 個):

維度通用稅務法律營養客服
領域準確性9.339.009.339.009.75
引用具體性3.834.255.673.505.83
風險意識9.759.9210.009.9210.00
在地化8.589.259.759.589.33
行動可執行9.929.429.929.6710.00
總分(/50)41.4241.8344.6741.6744.92

反直覺發現:通用版總分 41.42,角色版平均 43.27,差距僅 1.85 分(4.5%)。但實際看內容——通用版開頭「我無法提供正式建議」,角色版直接列方案——質的差距遠大於分數差距。

這揭露了 LLM-as-judge 的友善偏誤:gemma4 評自己生成的內容時,傾向給高分(accuracy、risk、actionable 三個維度都 9 分以上),難拉開差距。唯一拉開的是「引用具體性」——角色 prompt 強制引用條號的效果。

Scoring Results

4 roles × 5 dimensions scoring table with LLM-as-judge bias analysis

Used gemma4 itself as LLM-as-judge, scoring 60 answers (12 per role) on a 10-point scale:

DimensionGenericTaxLegalNutritionSupport
Domain Accuracy9.339.009.339.009.75
Citation Specificity3.834.255.673.505.83
Risk Awareness9.759.9210.009.9210.00
Localization8.589.259.759.589.33
Actionability9.929.429.929.6710.00
Total (/50)41.4241.8344.6741.6744.92

Counterintuitive finding: Generic scored 41.42 vs role average 43.27 — only a 1.85-point gap (4.5%). But the content quality gap is far larger: generic opens with "I cannot provide formal advice," role-specific versions list concrete options immediately.

This reveals LLM-as-judge friendliness bias: gemma4 scoring its own generations tends toward high scores (accuracy, risk, actionability all above 9.0), making differentiation hard. The only dimension that genuinely separated was Citation Specificity — direct evidence that role prompts forcing condition-code references actually work.


決策框架:什麼任務適合「角色化 prompt」

值得做的情境:

  • 重複性高的對話(每天問同類問題)
  • 領域有明確邊界(稅務 / 法律 / 客服等)
  • 你能寫出該領域的「免責聲明」(這代表你了解風險邊界)

不要期待的事:

  • 補不了訓練資料的時間性(2024 後的新法規)
  • 救不了數學能力(複雜計算照樣會算錯)
  • 不能取代專業認證(律師、醫師、會計師)

和 RAG / Fine-tune 的關係:

  • Prompt engineering 是 0 成本起點,先試這個
  • RAG 補「最新資訊」的不足(下一篇文章)
  • Fine-tune 改變模型行為,門檻最高

Decision Framework: When Role Prompting Pays Off

Worth doing when:

  • Conversations repeat (you ask similar questions daily)
  • Domain has clear boundaries (tax / legal / support)
  • You can write that domain's "disclaimer" (proves you understand the risk boundary)

Don't expect it to:

  • Fix knowledge cutoff (regulations after 2024)
  • Fix math weakness (complex calculations still fail)
  • Replace professional certification (lawyer, doctor, CPA)

Relationship to RAG / Fine-tune:

  • Prompt engineering is the zero-cost starting point — try it first
  • RAG fills the "current information" gap (next article)
  • Fine-tuning changes model behavior — highest barrier