---
name: tai-ch100-token-efficiency-model-choice-and-business-value
description: 'Apply chapter 100 of Testing AI, Token Efficiency, Model Choice, and Business Value, as a workflow for evaluating AI and non-deterministic systems. Use for test planning, eval design, quality review, release evidence, examples, or coaching related to token efficiency, model choice, and business value.'
---

# Token Efficiency, Model Choice, and Business Value

Skill name: `tai-ch100-token-efficiency-model-choice-and-business-value`

Based on **Testing AI: Engineering Confidence in AI Systems** by **Jason Arbon**.

## Purpose

The best AI system is not the biggest model or the cheapest model. It is the model path that
delivers the most trustworthy value within the risk, cost, latency, and business constraints
of the use case.

## Use This Workflow

- Identify the AI behavior or release decision being evaluated.
- Define realistic cases, slices, unacceptable outcomes, and evidence needed for confidence.
- Choose measurements that match the risk: rubric scores, samples, intervals, traces, human review, deterministic checks, or production monitors.
- Report uncertainty, severe failures, and decision impact instead of only a pass/fail result.
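
As one way to report uncertainty rather than a bare pass/fail, the sketch below computes a pass rate with a 95% Wilson score interval and counts severe failures separately. The result record shape (`passed`, `severe`) and the function names are assumptions made for illustration, not something the book prescribes.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial pass rate (z=1.96 gives roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def summarize(results: list[dict]) -> dict:
    """Summarize eval results.

    Each result is a hypothetical record: {"passed": bool, "severe": bool}.
    Severe failures are counted on their own so they are not averaged away.
    """
    n = len(results)
    passed = sum(1 for r in results if r["passed"])
    severe = sum(1 for r in results if r["severe"])
    low, high = wilson_interval(passed, n)
    return {
        "n": n,
        "pass_rate": passed / n,
        "ci95": (low, high),
        "severe_failures": severe,
    }
```

A summary like this supports a release decision better than a single percentage: a wide interval signals that more cases are needed, and any nonzero `severe_failures` count calls for a separate review.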

## Key Guidance

Token efficiency is not just a cost-control exercise. It is a quality strategy. Every prompt,
retrieved chunk, tool call, retry, judge pass, and output token consumes time, money, context
budget, and operational capacity. For example, a customer-support agent may answer correctly
with a frontier model, 30 retrieved chunks, and three judge passes. That might be acceptable for
a high-risk legal escalation. It is probably wasteful for a low-risk password-reset question.
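
The support-agent example can be made concrete with a rough per-request cost estimate for each path. The following is a minimal sketch: the prices, token counts, path names, and the assumption that judge passes run on the same model are all hypothetical placeholders, not real vendor figures.

```python
from dataclasses import dataclass

# All numbers below are illustrative assumptions, not real prices or measurements.
PROMPT_TOKENS = 600        # system prompt plus user question
TOKENS_PER_CHUNK = 400     # average size of one retrieved chunk
OUTPUT_TOKENS = 300        # generated answer
JUDGE_OUTPUT_TOKENS = 50   # short verdict per judge pass


@dataclass(frozen=True)
class ModelPath:
    name: str
    usd_per_1k_input: float
    usd_per_1k_output: float
    retrieved_chunks: int
    judge_passes: int


PATHS = {
    "low": ModelPath("small-model", 0.0005, 0.0015, retrieved_chunks=5, judge_passes=0),
    "high": ModelPath("frontier-model", 0.01, 0.03, retrieved_chunks=30, judge_passes=3),
}


def choose_path(risk: str) -> ModelPath:
    """Route by risk tier; unknown tiers fall back to the most careful path."""
    return PATHS.get(risk, PATHS["high"])


def estimate_cost(path: ModelPath) -> float:
    """Estimate USD per request for one model path."""
    gen_input = PROMPT_TOKENS + path.retrieved_chunks * TOKENS_PER_CHUNK
    # Assumption: each judge pass re-reads the full context plus the answer
    # and is billed at the same model's rates.
    judge_input = path.judge_passes * (gen_input + OUTPUT_TOKENS)
    input_tokens = gen_input + judge_input
    output_tokens = OUTPUT_TOKENS + path.judge_passes * JUDGE_OUTPUT_TOKENS
    return (
        input_tokens / 1000 * path.usd_per_1k_input
        + output_tokens / 1000 * path.usd_per_1k_output
    )
```

Under these placeholder numbers the high-risk path costs several hundred times more per request than the low-risk path, which is the kind of gap that makes routing by risk worth testing rather than sending every request down the most expensive path.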

## Apply The Approach

Create representative cases, score them with explicit criteria, review severe failures separately, report uncertainty, and connect the evidence to a concrete decision.

## Expert Notes

At expert level, build an efficient frontier for AI quality: the set of model paths for which
no alternative is better in quality without being worse in cost or latency. Compare marginal
quality gain against marginal cost, latency, privacy exposure, security risk, regional
availability, and continuity risk. Track cost per successful outcome, not cost per request,
because a cheap path that fails often can cost more per success once retries and escalations
are counted. Maintain fallback models, provider substitution tests, cached-path tests, and
region-aware deployment checks so the business can keep operating when a model, vendor,
region, or policy changes.
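
A minimal sketch of both ideas follows, assuming each candidate path has already been measured on the same eval set. The `Candidate` fields and the sample numbers are invented for illustration, and dividing cost by success rate is a simple approximation that spreads the cost of failed requests over the successful ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    success_rate: float      # fraction of eval cases judged successful
    cost_per_request: float  # USD, hypothetical
    p95_latency_s: float     # seconds, hypothetical


def cost_per_success(c: Candidate) -> float:
    """Approximate USD spent per successful outcome."""
    if c.success_rate <= 0:
        return float("inf")
    return c.cost_per_request / c.success_rate


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on every axis and strictly better on one."""
    no_worse = (
        a.success_rate >= b.success_rate
        and a.cost_per_request <= b.cost_per_request
        and a.p95_latency_s <= b.p95_latency_s
    )
    strictly_better = (
        a.success_rate > b.success_rate
        or a.cost_per_request < b.cost_per_request
        or a.p95_latency_s < b.p95_latency_s
    )
    return no_worse and strictly_better


def efficient_frontier(candidates: list[Candidate]) -> list[Candidate]:
    """Keep only candidates that no other candidate dominates (Pareto frontier)."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Invented sample measurements, for illustration only.
CANDIDATES = [
    Candidate("small", 0.70, 0.002, 0.8),
    Candidate("mid", 0.90, 0.010, 1.5),
    Candidate("mid-verbose", 0.88, 0.020, 2.0),
    Candidate("frontier", 0.93, 0.060, 4.0),
]
```

With this sample data, `mid-verbose` falls off the frontier because `mid` is better on all three axes. The remaining candidates represent genuine trade-offs, which is where the marginal comparison against privacy, security, availability, and continuity risk has to be made by people rather than by a formula.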
