---
name: tai-theme-embodied-robotics-and-physical-ai
description: 'Use the Testing AI theme "Embodied Robotics and Physical AI" to plan, review, or teach related AI quality work. Applies concepts and techniques from the book to testing AI, AI-generated software, and non-deterministic systems when relevant.'
---

# Embodied Robotics and Physical AI

Skill name: `tai-theme-embodied-robotics-and-physical-ai`

Based on **Testing AI: Engineering Confidence in AI Systems** by **Jason Arbon**.

## Theme Purpose

Use these approaches when testing robots and other physical AI systems for real-world safety, simulation coverage, planning, recovery, power, latency, cost, social acceptance, perception, containment, fail-safes, monitoring, and field learning.

Apply these concepts when testing AI, AI-generated software, model-backed features, agents, search, chatbots, RAG systems, generated code, dynamic interfaces, or other software whose behavior can vary across runs, users, data, tools, or time.

## How To Use This Theme

- Identify the behavior, capability, risk, or release decision being evaluated.
- Choose the relevant concepts below and turn them into concrete eval cases, samples, traces, checks, rubrics, metrics, or release gates.
- Prefer evidence that supports a decision: ship, canary, hold, rollback, or collect more samples.
- Report results by slice and call out severe failures separately, because averages can hide risk.
- Preserve enough evidence that another person or agent can understand what was tested, how it was measured, and why the recommendation follows.
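
The steps above can be sketched as a small slice report that feeds a decision. This is a minimal illustration, not a method taken from the book: the `EvalResult` record, the slice names, and the thresholds (`min_samples`, `min_pass_rate`) are all assumptions chosen for the example, and real gates should come from the risk analysis of the system under test.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    slice_name: str       # hypothetical slice label, e.g. "low_light"
    passed: bool
    severe: bool = False  # True for failures that must not be averaged away


def summarize_by_slice(results):
    """Return {slice: {"n", "pass_rate", "severe"}} for a list of EvalResult."""
    buckets = defaultdict(list)
    for r in results:
        buckets[r.slice_name].append(r)
    return {
        name: {
            "n": len(rs),
            "pass_rate": sum(r.passed for r in rs) / len(rs),
            "severe": sum(r.severe for r in rs),
        }
        for name, rs in buckets.items()
    }


def recommend(summary, min_samples=20, min_pass_rate=0.95):
    """Map slice evidence to a decision. Thresholds are illustrative assumptions."""
    if not summary:
        return "collect more samples"
    # Any severe failure blocks release, whatever the average says.
    if any(s["severe"] > 0 for s in summary.values()):
        return "hold"
    # Thin slices cannot support a ship decision.
    if any(s["n"] < min_samples for s in summary.values()):
        return "collect more samples"
    # A weak slice limits exposure instead of blocking outright.
    if any(s["pass_rate"] < min_pass_rate for s in summary.values()):
        return "canary"
    return "ship"
```

For example, 30 passing runs in a `daylight` slice plus 29 passes and one severe failure in a `low_light` slice give an overall pass rate of about 98%, yet `recommend` returns `"hold"` because the severe failure is checked before any rate.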

## Concepts And Techniques To Apply

- Evaluate embodied AI as perception, planning, action, human interaction, safety, recovery, and operating economics, not only task completion.
- Use scenario catalogs for real-world hazards: people, clutter, lighting, occlusion, blocked paths, fragile objects, restricted zones, and emergency interruptions.
- Use simulation and virtual worlds to expand coverage cheaply and safely, then calibrate against physical tests to measure the sim-to-real gap.
- Score trajectories, navigation, recovery, safe stops, near misses, force limits, speed limits, permissions, and intervention rates.
- Measure energy use, latency, compute cost, fleet maintenance, rescue frequency, and quality per dollar.
- Test human-robot interaction, social acceptance, accessibility, privacy, consent, personal space, and cultural context.
- Test sensor fusion, perception uncertainty, stale maps, sensor disagreement, world-model drift, and downstream action impact.
- Apply containment, physical fail-safes, geofences, emergency stops, approval gates, audit logs, production monitoring, and field-learning regression loops.
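
One way to make the sim-to-real calibration above concrete is to compare per-scenario success rates from simulation against the same scenarios run on hardware. The sketch below assumes success rate is the metric of interest and that scenarios share names across both settings; the function names, scenario names, and the `tolerance` value are hypothetical, not taken from the book.

```python
def sim_to_real_gap(sim_rates, real_rates):
    """Compare success rates for scenarios run both in simulation and physically.

    Returns (gaps, uncalibrated):
      gaps         - {scenario: sim rate minus real rate} for shared scenarios
      uncalibrated - scenarios with simulation data but no physical test
    """
    shared = sorted(set(sim_rates) & set(real_rates))
    gaps = {name: sim_rates[name] - real_rates[name] for name in shared}
    uncalibrated = sorted(set(sim_rates) - set(real_rates))
    return gaps, uncalibrated


def flag_scenarios(gaps, tolerance=0.05):
    """List scenarios where simulation is more optimistic than hardware
    by more than `tolerance` (an illustrative threshold)."""
    return [name for name, gap in gaps.items() if gap > tolerance]


if __name__ == "__main__":
    # Made-up rates for illustration only.
    sim = {"occlusion": 0.98, "blocked_path": 0.95, "low_light": 0.97}
    real = {"occlusion": 0.80, "blocked_path": 0.94}
    gaps, uncalibrated = sim_to_real_gap(sim, real)
    print("over tolerance:", flag_scenarios(gaps))
    print("no physical calibration:", uncalibrated)
```

Scenarios returned as uncalibrated are worth reporting as missing coverage: simulation evidence for them has not yet been checked against a physical test.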

## Reporting Guidance

- State what was tested and what population the evidence represents.
- Explain uncertainty, missing coverage, severe failures, and known blind spots.
- Connect findings to a concrete decision or next action.
- Use topic-specific chapter skills only when deeper detail is needed; this theme skill should stand alone as practical guidance.
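
When explaining uncertainty, a pass rate is easier to interpret alongside an interval that reflects the sample size. The sketch below uses the Wilson score interval, a standard choice for binomial proportions; it is offered as one reasonable option, not as the book's prescribed method, and it assumes trials are independent, which repeated runs on the same robot or map may violate.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a pass rate (z=1.96 is roughly 95% confidence)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    p = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    center = (p + z2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the extremes.
    return max(0.0, center - half), min(1.0, center + half)
```

With 20 passes out of 20 trials the lower bound is roughly 0.84, so a clean run on a small sample still leaves room for a meaningful failure rate. Reporting the interval next to the point estimate makes that visible to whoever owns the decision.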
