---
name: tai-ch108-glossary-of-ai-testing-terms
description: 'Apply chapter 108 of Testing AI, Glossary of AI Testing Terms, as a workflow for evaluating AI and non-deterministic systems. Use for test planning, eval design, quality review, release evidence, examples, or coaching related to the glossary of AI testing terms.'
---

# Glossary of AI Testing Terms

Skill name: `tai-ch108-glossary-of-ai-testing-terms`

Based on **Testing AI: Engineering Confidence in AI Systems** by **Jason Arbon**.

## Purpose

A shared vocabulary makes AI quality work easier to teach, debate, and improve.

## Use This Workflow

- Identify the AI behavior or release decision being evaluated.
- Define realistic cases, slices, unacceptable outcomes, and evidence needed for confidence.
- Choose measurements that match the risk: rubric scores, samples, intervals, traces, human review, deterministic checks, or production monitors.
- Report uncertainty, severe failures, and decision impact instead of only a pass/fail result.
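
The reporting step above can be sketched in a few lines. This is a minimal illustration, not a method taken from the book: the `EvalResult` record, its `severe` flag, and the `summarize` helper are hypothetical names, and the Wilson score interval is one common choice for putting an interval around a pass rate measured on a small sample.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class EvalResult:
    """One scored case. All field names here are illustrative assumptions."""
    case_id: str
    passed: bool
    severe: bool = False  # marks an unacceptable outcome, reviewed separately


def wilson_interval(passes: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a pass rate (z = 1.96 gives roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = passes / total
    denom = 1 + z ** 2 / total
    center = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def summarize(results: List[EvalResult]) -> Dict[str, object]:
    """Report the pass rate with its interval, and list severe failures on their own."""
    total = len(results)
    passes = sum(r.passed for r in results)
    return {
        "pass_rate": passes / total,
        "interval_95": wilson_interval(passes, total),
        "severe_failures": [r.case_id for r in results if r.severe],
    }
```

Listing severe failures separately keeps a single unacceptable outcome from disappearing inside an otherwise healthy average, and the interval shows how much the pass rate could move with a different sample of cases.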

## Key Guidance

A glossary is not filler. It is infrastructure for shared understanding. AI quality work mixes
testing, statistics, machine learning, security, product, and operations. When people use the
same words differently, eval discussions become confusion disguised as alignment.
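
One way to surface that kind of disagreement is to collect each team's definition of a term and flag the terms whose definitions differ. The sketch below assumes a hypothetical input of `(term, team, definition)` tuples and compares definitions only after normalizing case and whitespace, so it catches literal differences, not subtle semantic ones.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def find_conflicts(entries: Iterable[Tuple[str, str, str]]) -> Dict[str, Dict[str, str]]:
    """Return terms that at least two teams define differently.

    `entries` holds (term, team, definition) tuples; this shape is an
    assumption made for illustration.
    """
    by_term: Dict[str, Dict[str, str]] = defaultdict(dict)
    for term, team, definition in entries:
        # Normalize case and whitespace so cosmetic differences are ignored.
        normalized = " ".join(definition.lower().split())
        by_term[term.strip().lower()][team] = normalized
    return {
        term: defs
        for term, defs in by_term.items()
        if len(set(defs.values())) > 1
    }


entries = [
    ("hallucination", "ml", "Output not supported by the provided context."),
    ("hallucination", "product", "Any answer the user reports as wrong."),
    ("slice", "ml", "A subset of cases sharing an attribute."),
    ("slice", "qa", "A subset of cases  sharing an attribute."),
]
conflicts = find_conflicts(entries)
print(sorted(conflicts))
```

The flagged terms are candidates for a short discussion and a single agreed entry in the glossary.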

## Apply The Approach

Create representative cases, score them with explicit criteria, review severe failures separately, report uncertainty, and connect the evidence to a concrete decision.

## Expert Notes

At the expert level, treat the glossary as a living artifact. Update it whenever the organization
introduces new failure categories, metrics, release gates, or governance concepts.