---
name: tai-ch114-testing-jank-directly-with-claude
description: 'Apply chapter 114 of Testing AI, "Testing Jank Directly With Claude", as a workflow for evaluating AI and non-deterministic systems. Use for test planning, eval design, quality review, release evidence, examples, or coaching related to testing Jank directly with Claude.'
---

# Testing Jank Directly With Claude

Skill name: `tai-ch114-testing-jank-directly-with-claude`

Based on **Testing AI: Engineering Confidence in AI Systems** by **Jason Arbon**.

## Purpose

A coding agent should not be the only judge of its own work. Jank gives Claude a direct
quality-checking loop for code, documents, and live product behavior.

## Use This Workflow

- Identify the AI behavior or release decision being evaluated.
- Define realistic cases, slices, unacceptable outcomes, and evidence needed for confidence.
- Choose measurements that match the risk: rubric scores, samples, intervals, traces, human review, deterministic checks, or production monitors.
- Report uncertainty, severe failures, and decision impact instead of only a pass/fail result.
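
One way to report uncertainty rather than a bare pass/fail is to attach an interval to each slice's pass rate. The sketch below is illustrative only: it uses the Wilson score interval (a standard choice for binomial rates, not something the chapter prescribes), and the `(slice_name, passed)` input shape and the function names are assumptions, not part of Jank.

```python
import math
from collections import defaultdict


def wilson_interval(passes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a pass rate; z=1.96 gives roughly 95% coverage."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = passes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def summarize_by_slice(outcomes):
    """Group (slice_name, passed) pairs and report rate plus interval per slice.

    The input shape is a hypothetical convention chosen for this sketch.
    """
    counts = defaultdict(lambda: [0, 0])  # slice -> [passes, total]
    for slice_name, passed in outcomes:
        counts[slice_name][0] += int(passed)
        counts[slice_name][1] += 1
    report = {}
    for name, (passes, total) in sorted(counts.items()):
        low, high = wilson_interval(passes, total)
        report[name] = {
            "passes": passes,
            "total": total,
            "rate": passes / total,
            "low": low,
            "high": high,
        }
    return report


if __name__ == "__main__":
    sample = [("checkout", True), ("checkout", False), ("login", True)]
    for name, row in summarize_by_slice(sample).items():
        print(name, row)
```

Small slices produce wide intervals, which is the point: a slice with one passing case should not read as "100% reliable" in a release report.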

## Key Guidance

When Claude or another AI coding agent builds something, the first pass often looks convincing.
The code compiles. The page loads. The answer sounds confident. But AI-generated work can still
contain broken flows, weak edge-case handling, misleading copy, accessibility problems, missing
tests, stale assumptions, and risky changes.

## Apply The Approach

Create representative cases, score them with explicit criteria, review severe failures separately, report uncertainty, and connect the evidence to a concrete decision.
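
The approach above can be sketched as a small scoring routine. Everything here is an assumption made for illustration: the 0 to 4 rubric scale, the `CaseResult` fields, the `min_mean` threshold, and the three decision labels are hypothetical and should be replaced with whatever criteria the team has agreed on.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class CaseResult:
    case_id: str
    scores: dict          # criterion name -> rubric score, 0 (fails) to 4 (excellent)
    severe: bool = False  # flagged when the outcome is unacceptable regardless of score
    note: str = ""


def evaluate(results, min_mean: float = 3.0) -> dict:
    """Aggregate rubric scores, keep severe failures separate, and suggest a decision."""
    if not results:
        raise ValueError("no results to evaluate")

    # Collect scores per criterion so a weak criterion is not hidden by a strong one.
    per_criterion = {}
    for result in results:
        for criterion, score in result.scores.items():
            per_criterion.setdefault(criterion, []).append(score)
    criterion_means = {c: mean(s) for c, s in per_criterion.items()}
    weakest = min(criterion_means, key=criterion_means.get)

    # Severe failures are reviewed on their own, never averaged away.
    severe = [r for r in results if r.severe]

    if severe:
        decision = "block"
    elif criterion_means[weakest] < min_mean:
        decision = "review"
    else:
        decision = "proceed"

    return {
        "decision": decision,
        "criterion_means": criterion_means,
        "weakest_criterion": weakest,
        "severe_case_ids": [r.case_id for r in severe],
        "cases": len(results),
    }
```

Keeping severe cases outside the averages means a single unacceptable outcome changes the decision even when the mean scores look healthy.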

## Expert Notes

At expert level, Jank should be part of the agentic development contract. Define when it runs,
what targets it covers, which findings block release, how reports are stored, and how fixes are
verified. The point is not to worship a tool. The point is to create an independent quality loop
close enough to the coding agent that it actually gets used.
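
A contract like the one described can be written down as data so that the agent and the reviewers share the same rules. The following is a minimal sketch under stated assumptions: the field names, trigger names, severity labels, and report layout are hypothetical and do not reflect Jank's actual configuration or output format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Finding:
    finding_id: str
    target: str               # e.g. "code", "document", or "live"
    severity: str             # e.g. "low", "medium", "high", "critical"
    summary: str
    fix_verified: bool = False  # set only after a re-check confirms the fix


@dataclass
class QualityContract:
    # All defaults below are illustrative placeholders, not recommended values.
    run_on: tuple = ("pre-commit", "pre-release")
    targets: tuple = ("code", "document", "live")
    blocking_severities: frozenset = frozenset({"high", "critical"})
    report_path: str = "quality-report.json"


def open_blockers(findings, contract: QualityContract):
    """Findings that are in scope, severe enough to block, and not yet verified fixed."""
    return [
        f
        for f in findings
        if f.target in contract.targets
        and f.severity in contract.blocking_severities
        and not f.fix_verified
    ]


def build_report(findings, contract: QualityContract) -> str:
    """Serialize the findings and the release verdict as JSON for storage."""
    blockers = open_blockers(findings, contract)
    return json.dumps(
        {
            "release_blocked": bool(blockers),
            "blockers": [f.finding_id for f in blockers],
            "findings": [asdict(f) for f in findings],
        },
        indent=2,
    )
```

A blocking finding stops counting against the release only once `fix_verified` is set, which keeps "the agent says it fixed it" distinct from "an independent check confirmed it".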
