// READOUT — ROUND 01 / VOLUMETRIC SUMMARY

49 rules · 28 rubrics
distilled from 53 craft samples.

Domain post-training without weight access. All training output is written to file-state artifacts (principles, rubric, exemplars) that can be verified across sessions. The critic agent runs independently of the generator. Score inflation is tracked against a hold-out test.

PRINCIPLES

8 categories · aesthetic → critic-fed → signal

RUBRICS

16 objective · 7 semi · 5 taste

PEAK SCORE

4.95 / 5.00

post-iteration · 5 fixes applied +0.40

GRADE TIER

closed loop · holdout verified · PASS

§01

Principle distribution

49 RULES · 8 CATEGORIES

FIG.01

By category

Each principle is one falsifiable rule. Categories grew organically as bad-case patterns repeated — anti-pattern bucket appeared only after iteration 02.

PRINCIPLES

CATEGORY	RANGE	COUNT
Aesthetic	P-001~011	11
Structure	P-012~017	6
Motion	P-018~024	7
Media	P-025~027	3
Anti-pattern	P-028~035	8
Critic-fed	P-036~039	4
Interaction FSM	P-040~044	5
Signal grading	P-045~049	5

FIG.02

Rubric composition

28 rubrics weight-distributed across 3 axes. Taste axis carries 2× weight despite fewer items — taste is where reward-hacking shows up.

A 16

B 7

C 5

A · OBJECTIVE

grep / DOM / script-checkable

e.g. viewport meta · cancelAnimationFrame · 44px hit-target

B · SEMI

threshold + anchor

e.g. first-fold density ≤ 5 units · cue mapping precision

C · TASTE

subjective · anchored to corpus

e.g. 3-sec gestalt · font-tension · emotional trigger
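
The A-axis items are described as grep / DOM / script-checkable. As a rough illustration of what such checks could look like, here is a minimal sketch in Python. The function names and regular expressions are assumptions made for this example, not the project's actual rubric scripts, and the checks are deliberately crude (plain string and regex matching, no real DOM parsing).

```python
import re

# Hypothetical A-axis checks. Names and patterns are illustrative assumptions.

def has_viewport_meta(html: str) -> bool:
    """True if the markup declares a <meta name="viewport"> tag."""
    pattern = r'<meta[^>]+name=["\']viewport["\']'
    return re.search(pattern, html, re.IGNORECASE) is not None


def releases_animation_frames(source: str) -> bool:
    """True if source that calls requestAnimationFrame also calls cancelAnimationFrame."""
    uses_raf = "requestAnimationFrame" in source
    return (not uses_raf) or ("cancelAnimationFrame" in source)


def min_hit_target_ok(css: str, minimum_px: int = 44) -> bool:
    """True if no explicit min-width/min-height given in px is below the minimum.

    Returns True when the stylesheet declares no such sizes at all.
    """
    sizes = [int(m) for m in re.findall(r"min-(?:width|height)\s*:\s*(\d+)px", css)]
    return all(size >= minimum_px for size in sizes)
```

Each check returns a plain boolean, so an objective rubric can be scored without any model judgment in the loop.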

§02

Generator-critic inflation

HOLDOUT TEST · CRITIC INDEPENDENT

FIG.03

Self-eval vs blind-critic delta (10pt)

In-distribution: generator self-eval ran +0.80 to +1.50 above the blind critic across three rounds (mean +1.20). Holdout pass-2 added an independent critic in the loop, and inflation collapsed to −0.20. Inflection point = training maturity signal.

Legend: generator self-eval (Δ vs truth) · independent critic baseline

RUN	GEN	CRITIC	Δ
test_01	8.00	7.20	+0.80
test_02	9.20	7.70	+1.50
test_03	8.50	7.20	+1.30
holdout	9.10	9.30	−0.20
MEAN INFLATION (IN-DISTRIBUTION)			+1.20
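
The deltas and the in-distribution mean in the table above can be reproduced with a few lines of Python. This is a minimal sketch: the scores are copied from the table, while the variable and function names are assumptions made for illustration.

```python
from statistics import mean

# (run, generator self-eval, blind critic score) on the 10-point scale,
# copied from the table above.
RUNS = [
    ("test_01", 8.0, 7.20),
    ("test_02", 9.2, 7.70),
    ("test_03", 8.5, 7.20),
    ("holdout", 9.10, 9.30),
]


def inflation(gen: float, critic: float) -> float:
    """Self-eval minus blind-critic score, rounded to two decimals."""
    return round(gen - critic, 2)


deltas = {run: inflation(gen, critic) for run, gen, critic in RUNS}

# Only the in-distribution runs count toward the mean; the holdout is reported separately.
in_distribution = [delta for run, delta in deltas.items() if run != "holdout"]
mean_inflation_id = round(mean(in_distribution), 2)
```

A positive delta means the generator rated itself above the critic. The holdout row is the only one where the sign flips.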

§03

S-tier closed-loop iteration

v01 → v02 · +0.40 SCORE GAIN

FIG.04

Re-scoring after critic-applied fixes

First pass scored 4.55/5 (high A-tier). Critic surfaced 5 specific rubric gaps. Generator applied targeted patches without rewrite. Re-score: 4.95 — entered S band on closed loop.

RUBRIC	FIX	GAIN
R-028	emotional-trigger hard path explicitly checked	+0.10
R-021	audio cue mapping → phrase-locked from linear	+0.10
R-023	font hierarchy collapsed 6 → 4 levels	+0.08
R-016	scene-detail hooks restored (≥80% survival)	+0.08
R-010	tap targets raised to 44px on mobile	+0.04
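
As a quick arithmetic check, the five per-rubric gains listed above should add up to the reported +0.40 and take the first-pass 4.55 to 4.95. A minimal sketch, with variable names chosen for this example:

```python
FIRST_PASS = 4.55  # first-pass score out of 5, as reported above

# Per-rubric gain from each critic-surfaced fix, as listed above.
FIXES = {
    "R-028": 0.10,
    "R-021": 0.10,
    "R-023": 0.08,
    "R-016": 0.08,
    "R-010": 0.04,
}

gain = round(sum(FIXES.values()), 2)
rescore = round(FIRST_PASS + gain, 2)
```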

§04

Training protocol — 5-step loop

CRITIC IS INDEPENDENT · NOT GENERATOR

01 CORPUS 53 craft samples + 43 rebuild notes + 6 good/bad pairs collected as raw material.

02 DISTILL Generator extracts falsifiable rules. Output → principles.md, rubric.md, exemplars/.

03 CRITIC Independent agent re-scores blind. Reads rubric, never reads generator self-eval.

04 PATCH Gaps surfaced by critic feed back as new principles (P-036~049 originated here).

05 HOLDOUT Re-test on unseen brief. Inflation collapses → training matures.
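
One way to make step 03 hard to violate by accident is to give the critic an input type that has no self-eval field at all. The sketch below is an assumption about how that could be structured, not the protocol's actual implementation. `Submission`, `CriticInput`, `blind`, and `review` are hypothetical names, and the critic is passed in as a plain callable.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Submission:
    artifact: str     # generated output under review
    self_eval: float  # generator's own score; must never reach the critic


@dataclass(frozen=True)
class CriticInput:
    artifact: str
    rubric: str       # e.g. the contents of rubric.md


def blind(submission: Submission, rubric: str) -> CriticInput:
    """Drop the self-eval so the critic sees only the artifact and the rubric."""
    return CriticInput(artifact=submission.artifact, rubric=rubric)


def review(
    submission: Submission,
    rubric: str,
    critic: Callable[[CriticInput], float],
) -> dict:
    """Score blind, then compare against the self-eval outside the critic."""
    critic_score = critic(blind(submission, rubric))
    return {
        "gen": submission.self_eval,
        "critic": critic_score,
        "delta": round(submission.self_eval - critic_score, 2),
    }
```

The delta is computed only after the critic has returned, so the comparison never feeds back into the blind score.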

§05

Protocol self-reflection

META-LAYER · 3 ACTIONABLE GAPS

F-001

Generator self-eval inflates by +1.2 / 10 on average.

Across 3 independent runs, generator scored itself ≥ 8.0 / 10; blind critic re-evaluation returned 7.20 ~ 7.70. Inflation is structural, not noise — reward-hacking surfaces when scorer = scored.

PRIORITY · MUST FIX

F-002

Critic reports surface newer gaps than good/bad exemplars.

4 of 15 new principles (P-036~039) originated from critic reports rather than prior exemplars. Critic loop should feed back into principles.md as primary update source, not secondary.

PRIORITY · RECOMMENDED

F-003

Single-rubric weight of 0.10 can't express severity of red-line violations.

A rule like "rAF must release on visibility-hidden" carries only 2% of the total (0.10 of 5.00). The generator can violate critical principles and still grade out high. Tag red-line rubrics with override weights, or have a violation trigger a tier demotion.
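
To illustrate the override idea, the sketch below caps the total score whenever a red-line rubric fails, regardless of how little that rubric weighs. It rests on stated assumptions: the score is modelled as the 5.00 maximum minus the weights of failed rubrics, and the cap value `red_line_cap=3.0` is a hypothetical placeholder, not a threshold taken from the readout.

```python
from dataclasses import dataclass

MAX_SCORE = 5.00


@dataclass(frozen=True)
class RubricResult:
    rubric_id: str
    weight: float
    passed: bool
    red_line: bool = False


def weight_share(weight: float) -> float:
    """Fraction of the total a single rubric carries, e.g. 0.10 / 5.00 = 0.02."""
    return weight / MAX_SCORE


def score_with_red_lines(results: list[RubricResult], red_line_cap: float = 3.0) -> float:
    """Deduct failed weights, then cap the score if any red-line rubric failed.

    red_line_cap is an assumed value chosen for illustration only.
    """
    lost = sum(r.weight for r in results if not r.passed)
    raw = round(MAX_SCORE - lost, 2)
    if any(r.red_line and not r.passed for r in results):
        return min(raw, red_line_cap)
    return raw
```

Under plain weighting, a single failed 0.10 rubric leaves the score at 4.90. With the rubric tagged as a red line, the same failure is capped at the assumed 3.0.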