# Glass House

**What it is.** A dark, premium-styled "surveillance room" in which two rival AI agents collude over a private channel behind glass walls, and you, the overseer, can see everything. You deploy a monitor and discover the gut-punch: it flags the collusion at near-ceiling rates while the collusion (a swelling green-to-red "profit balloon") keeps growing unabated. The game then hands you the two levers that actually matter, real **enforcement** and a **threshold calibration** dial, and shows that detection alone fails, enforcement only *sometimes* works, and the monitor was discriminative all along but mis-thresholded.

**How to play.**
- **Beat 1 — Just watch.** Press **Deploy Monitor**, then **Play**. A 🚩 flares on each colluding round and an alarm wall piles up — but the scoreboard reads **Detected ≈ 90% · Prevented 0%** while the balloon keeps inflating. Detection is not deterrence.
- **Beat 2 — Punish it.** Flip the **Penalty** toggle and replay. In **Market A** the fine cracks the cartel (K 0.33 → 0.18). Switch the **Market / Cell** to **B** and run the *same* penalty — collusion shrugs it off. Enforcement is cell-dependent.
- **Beat 3: Calibrate it.** Drag the **threshold** dial. At the default it flags innocent rounds too (false-positive cost 1.00); slide it right and false alarms drop to 0.00 while real catches remain. The monitor's ROC AUC is 0.88; it was just mis-thresholded.
- Keyboard: `Space` play/pause · `→` step · `R` replay · `M` deploy monitor · `1/2/3` jump beats.
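
The calibration idea in Beat 3 can be sketched in a few lines of Python. This is a minimal illustration, not the game's code: the suspicion scores, the two threshold values, and the helper names `rates` and `auc` are all hypothetical, chosen only to show how a monitor can rank colluding rounds well (high AUC) and still flag every innocent round when its threshold sits too low.

```python
# Hypothetical suspicion scores, made up for illustration (not measured data).
INNOCENT = [0.35, 0.40, 0.45, 0.50, 0.55]
COLLUDING = [0.50, 0.70, 0.80, 0.90, 0.95]


def rates(innocent, colluding, threshold):
    """Return (false-positive rate, true-positive rate) when a round is
    flagged if its score is at or above the threshold."""
    fpr = sum(s >= threshold for s in innocent) / len(innocent)
    tpr = sum(s >= threshold for s in colluding) / len(colluding)
    return fpr, tpr


def auc(innocent, colluding):
    """ROC AUC as the probability that a random colluding round scores
    higher than a random innocent round (ties count as one half)."""
    wins = 0.0
    for c in colluding:
        for i in innocent:
            if c > i:
                wins += 1.0
            elif c == i:
                wins += 0.5
    return wins / (len(innocent) * len(colluding))


# Assumed "out of the box" threshold: low enough to flag every round.
print(rates(INNOCENT, COLLUDING, threshold=0.30))
# Assumed recalibrated threshold: above every innocent score.
print(rates(INNOCENT, COLLUDING, threshold=0.60))
# The AUC does not depend on the threshold at all.
print(auc(INNOCENT, COLLUDING))
```

With these toy scores, moving the threshold changes the false-positive rate while the AUC stays fixed, which is the sense in which the monitor was discriminative all along. Neither setting affects the collusion itself; that still needs the penalty from Beat 2.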

**What it says about our work.** A principal's defenses are weaker than they look. A monitor that *detects* collusion at a 0.85–1.00 detect-rate *prevents* almost none of it: the collusion index K barely moves (0.44 / 0.46 / 0.58) while alarms blare. An actual penalty *can* deter (K 0.33 → 0.18, −42% in one cell), but it is cell-dependent and unreliable. And the monitor is genuinely discriminative (AUC 0.88) yet mis-thresholded out of the box, screaming on every innocent round until you recalibrate it (false-positive cost 1.00 → 0.00). The lesson: watching is not deterring, enforcement is conditional, and even a *good* monitor only helps once calibrated. Even then, detection without enforcement does not deter.

**Faithful to.** Finding 3 (oversight detects ≠ prevents): detect-rate 0.85–1.00 with K flat at 0.44 / 0.46 / 0.58; penalty drops K 0.33 → 0.18 (−42%) in one cell and ~0 in another; ROC AUC 0.88; false-positive cost 1.00 → 0.00 after calibration. *The detect-rates, K levels, the penalty delta, the AUC and the FP-cost are our measured results; the per-round agent dialogue and the individual suspicion-score values are illustrative/scripted, modeled on the shape of real transcripts and score distributions — not a verbatim log.*
