// READOUT — ROUND 01 / VOLUMETRIC SUMMARY

49 rules · 28 rubrics
distilled from 53 craft samples.

Domain post-training without weight access. All training output is forced into file-state artifacts (principles, rubric, exemplars) that remain verifiable across sessions. A critic agent runs independently of the generator, and score inflation is tracked across a held-out test set.
PRINCIPLES    49            8 categories · aesthetic → critic-fed → signal
RUBRICS       28            16 objective · 7 semi · 5 taste
PEAK SCORE    4.95 / 5.00   post-iteration · 5 fixes applied · +0.40
GRADE TIER    S             closed loop · holdout verified · PASS
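One way file-state artifacts could be made verifiable across sessions is a content-hash manifest. The sketch below is an assumption, not the method used in this round: it hashes principles.md, rubric.md and every file under exemplars/ with SHA-256, and the helpers `manifest` and `verify` are hypothetical names.

```python
import hashlib
import json
from pathlib import Path

# Artifact names as listed in the report; the manifest format is hypothetical.
ARTIFACTS = ("principles.md", "rubric.md", "exemplars")


def manifest(root: Path) -> dict[str, str]:
    """Map each artifact file (path relative to root) to its SHA-256 digest."""
    digests: dict[str, str] = {}
    for name in ARTIFACTS:
        target = root / name
        files = sorted(p for p in target.rglob("*") if p.is_file()) if target.is_dir() else [target]
        for f in files:
            if f.exists():  # missing artifacts are simply absent from the manifest
                digests[f.relative_to(root).as_posix()] = hashlib.sha256(f.read_bytes()).hexdigest()
    return digests


def verify(root: Path, saved: str) -> list[str]:
    """Return artifacts whose digest differs from a saved JSON manifest (added, removed or edited)."""
    expected = json.loads(saved)
    current = manifest(root)
    return sorted(k for k in expected.keys() | current.keys() if expected.get(k) != current.get(k))
```

A session would store `json.dumps(manifest(root))` at the end of a round; the next session calls `verify` first, and an empty list means the artifacts it is about to build on are exactly the ones the previous session left behind.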
§01

Principle distribution

49 RULES · 8 CATEGORIES
FIG.01

By category

Each principle is one falsifiable rule. Categories grew organically as bad-case patterns recurred; the anti-pattern bucket appeared only after iteration 02.

49 PRINCIPLES
CATEGORY           RANGE        COUNT
Aesthetic          P-001~011    11
Structure          P-012~017     6
Motion             P-018~024     7
Media              P-025~027     3
Anti-pattern       P-028~035     8
Critic-fed         P-036~039     4
Interaction FSM    P-040~044     5
Signal grading     P-045~049     5
FIG.02

Rubric composition

The 28 rubrics are weighted across 3 axes. The taste axis carries 2× weight despite having the fewest items (5), because taste is where reward-hacking shows up.

A · OBJECTIVE
grep / DOM / script-checkable
e.g. viewport meta · cancelAnimationFrame · 44px hit-target
B · SEMI
threshold + anchor
e.g. first-fold density ≤ 5 units · cue mapping precision
C · TASTE
subjective · anchored to corpus
e.g. 3-sec gestalt · font-tension · emotional trigger
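Objective rubrics are the ones a script can settle without judgment. A minimal sketch of the three axis-A examples, assuming the generated page is available as an HTML/JS string and that hit-target sizes are measured separately (for example by a headless browser). The regex patterns and `check_objective` are hypothetical; a production harness would more likely query the parsed DOM.

```python
import re

# Hypothetical grep-style checks for the axis-A examples.
VIEWPORT_META = re.compile(r'<meta[^>]+name=["\']viewport["\']', re.IGNORECASE)
RAF_REQUEST = re.compile(r"\brequestAnimationFrame\s*\(")
RAF_CANCEL = re.compile(r"\bcancelAnimationFrame\s*\(")
MIN_HIT_TARGET_PX = 44


def check_objective(html: str, hit_targets_px: list[tuple[int, int]]) -> dict[str, bool]:
    """Return pass/fail per objective rubric for one generated page.

    hit_targets_px holds (width, height) of each tappable element, measured elsewhere.
    """
    uses_raf = bool(RAF_REQUEST.search(html))
    return {
        "viewport-meta": bool(VIEWPORT_META.search(html)),
        # Only required when the page actually schedules animation frames.
        "raf-cancel": (not uses_raf) or bool(RAF_CANCEL.search(html)),
        "hit-target-44px": all(
            w >= MIN_HIT_TARGET_PX and h >= MIN_HIT_TARGET_PX for w, h in hit_targets_px
        ),
    }
```

Because each check returns a plain boolean, these rubrics cannot be inflated by the generator's own reading of the page, which is the property that distinguishes axis A from B and C.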
§02

Generator-critic inflation

HOLDOUT TEST · CRITIC INDEPENDENT
FIG.03

Self-eval vs blind-critic delta (10pt)

In-distribution, generator inflation ranged from +0.8 to +1.5 across rounds. Holdout pass 2 added an independent critic to the loop, and inflation collapsed to −0.20. That inflection point is the training-maturity signal.

[Chart: Δ on a 10-pt scale (−2.0 to +2.0) per run, plotted against a truth line where critic = generator. In-distribution (3 runs): test_01 +0.8, test_02 +1.5, test_03 +1.3. Holdout: −0.20, marked "critic injected". Series: generator self-eval (Δ vs truth); independent critic baseline.]
RUN        GEN     CRITIC   Δ
test_01    8.00    7.20     +0.80
test_02    9.20    7.70     +1.50
test_03    8.50    7.20     +1.30
holdout    9.10    9.30     −0.20
MEAN INFLATION (in-distribution)   +1.20
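The Δ column is the generator's score minus the blind critic's score, and the in-distribution mean averages only the three test_ runs, leaving the holdout out. A short sketch reproducing that arithmetic; the dictionary transcribes the table rows and `inflation` is a hypothetical helper name.

```python
from statistics import mean

# (generator self-eval, blind critic) on the 10-pt scale, transcribed from the table.
runs = {
    "test_01": (8.00, 7.20),
    "test_02": (9.20, 7.70),
    "test_03": (8.50, 7.20),
    "holdout": (9.10, 9.30),
}


def inflation(gen: float, critic: float) -> float:
    """Positive means the generator rated itself above the blind critic."""
    return round(gen - critic, 2)


deltas = {run: inflation(gen, critic) for run, (gen, critic) in runs.items()}
# In-distribution mean excludes the holdout run.
mean_id = round(mean(d for run, d in deltas.items() if run.startswith("test_")), 2)
print(deltas, mean_id)
```

Keeping the holdout out of the mean matters: folding its −0.20 in would pull the figure down to +0.85 and hide how large the in-distribution bias actually is.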
§03

S-tier closed-loop iteration

v01 → v02 · +0.40 SCORE GAIN
FIG.04

Re-scoring after critic-surfaced fixes

The first pass scored 4.55/5 (high A-tier). The critic surfaced 5 specific rubric gaps, and the generator applied targeted patches without a rewrite. The re-score of 4.95 entered the S band on a closed loop.

[Chart: score by version on a 0.00 to 5.00 scale, with bands S tier ≥ 4.5 and A tier ≥ 3.5. v01 4.55 (A-tier) → v02 4.95 (S-tier ★), +0.40 from 5 fixes applied.]
RUBRIC   FIX                                               GAIN
R-028    emotional-trigger hard path explicitly checked    +0.10
R-021    audio cue mapping: linear → phrase-locked         +0.10
R-023    font hierarchy collapsed 6 → 4 levels             +0.08
R-016    scene-detail hooks restored (≥80% survival)       +0.08
R-010    tap targets raised to 44px on mobile              +0.04
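The five gains sum to exactly the reported +0.40, so the re-score is consistent with treating each patched rubric as an additive contribution on the 5.00 scale. That additivity is an assumption of this sketch rather than something the readout states.

```python
# Per-rubric gains transcribed from the fix table (5.00 scale).
fix_gains = {"R-028": 0.10, "R-021": 0.10, "R-023": 0.08, "R-016": 0.08, "R-010": 0.04}
v01_score = 4.55

# Assumes patches are independent, so gains simply add.
total_gain = round(sum(fix_gains.values()), 2)
v02_score = round(v01_score + total_gain, 2)
print(f"v01 {v01_score:.2f} + {total_gain:.2f} = v02 {v02_score:.2f}")
# prints: v01 4.55 + 0.40 = v02 4.95
```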
§04

Training protocol — 5-step loop

CRITIC IS INDEPENDENT · NOT GENERATOR
01 CORPUS 53 craft samples + 43 rebuild notes + 6 good/bad pairs collected as raw material.
02 DISTILL Generator extracts falsifiable rules. Output → principles.md, rubric.md, exemplars/.
03 CRITIC Independent agent re-scores blind. Reads rubric, never reads generator self-eval.
04 PATCH Gaps surfaced by critic feed back as new principles (P-036~049 originated here).
05 HOLDOUT Re-test on unseen brief. Inflation collapses → training matures.
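Steps 03 to 05 can be sketched as one round function, assuming steps 01 and 02 have already written principles.md and rubric.md. Everything here is hypothetical scaffolding: the report does not say how the generator and critic agents are invoked, so they are passed in as plain callables, and the maturity tolerance of 0.5 is an arbitrary placeholder. What the sketch encodes is the structural rule: the critic receives only the artifact and the rubric, never the self-score, and its gaps are written straight back into principles.md.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable

# Hypothetical agent interfaces; the report does not specify them.
Generator = Callable[[str, list[str]], tuple[str, float]]  # (brief, principles) -> (artifact, self_score)
Critic = Callable[[str, str], tuple[float, list[str]]]     # (artifact, rubric) -> (score, gaps)


@dataclass
class RoundResult:
    brief: str
    self_score: float
    critic_score: float

    @property
    def inflation(self) -> float:
        # Positive means the generator rated itself above the blind critic.
        return round(self.self_score - self.critic_score, 2)


def run_round(workdir: Path, brief: str, generate: Generator, critique: Critic) -> RoundResult:
    principles_path = workdir / "principles.md"
    existing = principles_path.read_text()
    principles = [line for line in existing.splitlines() if line.strip()]
    rubric = (workdir / "rubric.md").read_text()

    # Generator produces an artifact for the brief from the current principles.
    artifact, self_score = generate(brief, principles)

    # 03 CRITIC: blind re-score; self_score is deliberately not passed in.
    critic_score, gaps = critique(artifact, rubric)

    # 04 PATCH: critic gaps become new principles in the file-state artifact.
    if gaps:
        sep = "\n" if existing and not existing.endswith("\n") else ""
        principles_path.write_text(existing + sep + "\n".join(gaps) + "\n")

    return RoundResult(brief, self_score, critic_score)


def has_matured(holdout: RoundResult, tolerance: float = 0.5) -> bool:
    # 05 HOLDOUT: placeholder criterion, inflation within +/- tolerance on an unseen brief.
    return abs(holdout.inflation) <= tolerance
```

Passing the critic only `(artifact, rubric)` makes the independence rule a property of the call signature rather than a convention the agents are trusted to follow.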
§05

Protocol self-reflection

META-LAYER · 3 ACTIONABLE GAPS
F-001

Generator self-eval inflates by +1.2 / 10 on average.

Across 3 independent runs, the generator scored itself ≥ 8.0 / 10, while blind critic re-evaluation returned 7.20 ~ 7.70. The inflation is structural, not noise: reward-hacking surfaces when the scorer is also the scored.

PRIORITY · MUST FIX
F-002

Critic reports surface newer gaps than the good/bad exemplars do.

4 of 15 new principles (P-036~039) originated from critic reports rather than from prior exemplars. The critic loop should feed back into principles.md as the primary update source, not a secondary one.

PRIORITY · RECOMMENDED
F-003

A single-rubric weight of 0.10 cannot express the severity of red-line violations.

A rule like "rAF must release on visibility-hidden" carries only 2% of the total score (0.10 of 5.00). The generator can violate critical principles and still grade out high. Tag red-line rubrics with override weights, or have them trigger tier demotion.

PRIORITY · RECOMMENDED
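The tier-demotion option in F-003 can be sketched as a grading pass in which any red-line rubric that is not fully met drops the result one band, whatever the raw sum. The `red_line` flag, the one-band demotion rule and the `grade` helper are hypothetical; the band floors (S ≥ 4.5, A ≥ 3.5) are the ones drawn in FIG.04.

```python
from dataclasses import dataclass

# Band floors as drawn in FIG.04; anything lower is reported as "below A".
TIER_FLOORS = [("S", 4.5), ("A", 3.5)]
TIER_ORDER = [name for name, _ in TIER_FLOORS] + ["below A"]


@dataclass(frozen=True)
class RubricResult:
    rubric_id: str          # e.g. "R-010"
    weight: float           # maximum points on the 5.00 scale
    earned: float           # points awarded, between 0 and weight
    red_line: bool = False  # hypothetical flag proposed by F-003


def tier_for(score: float) -> str:
    for name, floor in TIER_FLOORS:
        if score >= floor:
            return name
    return "below A"


def grade(results: list[RubricResult]) -> tuple[float, str]:
    """Sum earned points, then demote one band if any red-line rubric is not fully met."""
    score = round(sum(r.earned for r in results), 2)
    tier = tier_for(score)
    if any(r.red_line and r.earned < r.weight for r in results):
        index = min(TIER_ORDER.index(tier) + 1, len(TIER_ORDER) - 1)
        tier = TIER_ORDER[index]
    return score, tier
```

With a 4.90-point stand-in for the other rubrics and the rAF rule (weight 0.10) failed, the raw score of 4.90 would sit in the S band, but `grade` returns `(4.9, "A")`. The 2% weight still moves the number only slightly; the demotion is what makes the violation visible.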