---
name: tai-theme-tools-and-appendices
description: 'Use the Testing AI theme Tools and Appendices to plan, review, or teach related AI quality work. Applies concepts and techniques from the book to testing AI, AI-generated software, and non-deterministic systems when relevant.'
---

# Tools and Appendices

Skill name: `tai-theme-tools-and-appendices`

Based on **Testing AI: Engineering Confidence in AI Systems** by **Jason Arbon**.

## Theme Purpose

Use these approaches when creating executive summaries, release checklists, eval reports, worked examples, templates, governance, taxonomies, glossaries, prompt cases, MCP checks, agentic workflow guidance, SKILLS.md tests, Jank review loops, AI failure maps, fail-safe designs, and variance-aware measurement infrastructure.

Apply these concepts when testing AI, AI-generated software, model-backed features, agents, search, chatbots, RAG systems, generated code, dynamic interfaces, or other software whose behavior can vary across runs, users, data, tools, or time.

## How To Use This Theme

- Identify the behavior, capability, risk, or release decision being evaluated.
- Choose the relevant concepts below and turn them into concrete eval cases, samples, traces, checks, rubrics, metrics, or release gates.
- Prefer evidence that supports a decision: ship, canary, hold, rollback, or collect more samples.
- Report results by slice and list severe failures separately when an average would hide the risk.
- Preserve enough evidence that another person or agent can understand what was tested, how it was measured, and why the recommendation follows.
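
The steps above can be sketched as a small slice-aware summary that feeds a decision. This is a minimal illustration, not a method from the book: the function names, the result fields (`slice`, `passed`, `severe`), and the thresholds `MIN_PASS_RATE` and `MIN_SAMPLES` are all hypothetical and should be replaced with values that fit the product and its risk level.

```python
from collections import defaultdict

# Hypothetical thresholds for illustration; tune them per product and risk level.
MIN_PASS_RATE = 0.90
MIN_SAMPLES = 30


def summarize_by_slice(results):
    """Aggregate eval results per slice.

    Each result is a dict with 'slice' (str), 'passed' (bool), and
    'severe' (bool, True when the failure counts as severe).
    """
    totals = defaultdict(lambda: {"n": 0, "passed": 0, "severe": 0})
    for r in results:
        s = totals[r["slice"]]
        s["n"] += 1
        s["passed"] += int(r["passed"])
        s["severe"] += int(r["severe"])
    return {name: {**s, "pass_rate": s["passed"] / s["n"]} for name, s in totals.items()}


def recommend(summary):
    """Map per-slice evidence to a decision and a one-line reason."""
    # Severe failures block a release regardless of the average.
    for name, s in sorted(summary.items()):
        if s["severe"] > 0:
            return "hold", f"{s['severe']} severe failure(s) in slice '{name}'"
    # Too little evidence is its own outcome, not a pass.
    for name, s in sorted(summary.items()):
        if s["n"] < MIN_SAMPLES:
            return "collect more samples", f"slice '{name}' has only {s['n']} samples"
    for name, s in sorted(summary.items()):
        if s["pass_rate"] < MIN_PASS_RATE:
            return "hold", f"slice '{name}' pass rate {s['pass_rate']:.2f} is below {MIN_PASS_RATE:.2f}"
    return "ship", "all slices meet the sample and pass-rate thresholds"


# Made-up sample data: a strong 'en' slice and a weak 'es' slice.
results = (
    [{"slice": "en", "passed": True, "severe": False}] * 95
    + [{"slice": "en", "passed": False, "severe": False}] * 5
    + [{"slice": "es", "passed": True, "severe": False}] * 28
    + [{"slice": "es", "passed": False, "severe": False}] * 12
)
summary = summarize_by_slice(results)
decision, reason = recommend(summary)
print(decision, "-", reason)
```

In this made-up data the overall pass rate is about 0.88, which looks close to the threshold, but the `es` slice sits at 0.70, so the sketch returns `hold` and names the slice. Keeping the reason string next to the decision is one simple way to preserve evidence for the next reader.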

## Concepts And Techniques To Apply

- Create executive summaries, release checklists, eval reports, governance artifacts, failure taxonomies, glossaries, and reusable templates.
- Use worked examples to turn fuzzy AI quality into concrete cases, rubrics, samples, traces, and decisions.
- Test prompt inputs, chatbot inputs, MCP workflows, agentic workflows, SKILLS.md files, Jank-style review loops, and fail-safe behavior.
- Build measurement infrastructure that accounts for variance, repeated runs, high-water marks, and uncertainty.
- Use tools and templates to make quality work repeatable, inspectable, and teachable.
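
As one illustration of variance-aware measurement, the sketch below summarizes repeated runs of a single eval case. It assumes each run yields a score between 0 and 1, treats "high-water mark" as the best score seen across runs (an assumption about the term), and uses a Wilson score interval to express uncertainty in the pass rate. The names `wilson_interval`, `summarize_runs`, and `pass_threshold` are hypothetical.

```python
import math


def wilson_interval(passes, n, z=1.96):
    """Wilson score interval for a pass rate (z=1.96 gives roughly 95% coverage)."""
    if n <= 0:
        raise ValueError("need at least one run")
    p = passes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the extremes.
    return max(0.0, center - half), min(1.0, center + half)


def summarize_runs(scores, pass_threshold=0.8):
    """Summarize repeated runs of one case instead of trusting a single run."""
    n = len(scores)
    passes = sum(1 for s in scores if s >= pass_threshold)
    low, high = wilson_interval(passes, n)
    return {
        "runs": n,
        "pass_rate": passes / n,
        "pass_rate_interval": (low, high),
        "high_water_mark": max(scores),  # best score observed (assumed meaning)
        "worst_score": min(scores),      # the tail matters as much as the peak
    }


# Made-up scores from ten runs of the same prompt.
scores = [0.90, 0.85, 0.40, 0.95, 0.88, 0.91, 0.83, 0.30, 0.87, 0.92]
report = summarize_runs(scores)
print(report)
```

With only ten runs the interval is wide, which is the point: a single passing run, or even eight passes out of ten, leaves a lot of uncertainty about the true pass rate. Reporting the worst score alongside the high-water mark keeps a strong best case from hiding a weak tail.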

## Reporting Guidance

- State what was tested and what population the evidence represents.
- Explain uncertainty, missing coverage, severe failures, and known blind spots.
- Connect findings to a concrete decision or next action.
- Use topic-specific chapter skills only when deeper detail is needed; this theme skill should stand alone as practical guidance.
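
One way to make this guidance repeatable is a small report structure that refuses to leave the key sections blank. The sketch below is a hypothetical template, not a format defined by the book: the `EvalReport` class, its field names, and the sample values are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EvalReport:
    """Hypothetical eval report template; field names are illustrative."""
    what_was_tested: str
    population: str          # what the evidence represents
    decision: str            # e.g. ship, canary, hold, rollback, collect more samples
    next_action: str
    uncertainty: str = "Not assessed"
    severe_failures: List[str] = field(default_factory=list)
    blind_spots: List[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        def bullets(items):
            # An explicit "None recorded" is clearer than a silently empty section.
            return "\n".join(f"- {i}" for i in items) if items else "- None recorded"

        return "\n".join([
            "## Eval Report",
            f"Tested: {self.what_was_tested}",
            f"Population: {self.population}",
            f"Uncertainty: {self.uncertainty}",
            "### Severe failures",
            bullets(self.severe_failures),
            "### Known blind spots",
            bullets(self.blind_spots),
            f"Decision: {self.decision}",
            f"Next action: {self.next_action}",
        ])


# Made-up example values.
report = EvalReport(
    what_was_tested="Refund-policy answers from the support chatbot",
    population="140 sampled English and Spanish queries",
    decision="hold",
    next_action="Fix Spanish retrieval, then rerun the Spanish slice",
    uncertainty="Spanish slice has 40 samples; interval is wide",
    blind_spots=["No voice input", "No multi-turn conversations"],
)
print(report.to_markdown())
```

Rendering "None recorded" for an empty list is a deliberate choice in this sketch: it distinguishes "we looked and found nothing" from "nobody filled this in" only if the author confirms it, so reviewers should still question an empty section.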