Transformer

v0.24 introduced a tiny decoder-only transformer in closed_world_lm.transformer_char_model.

It is intentionally small:

corpus-trained character tokenizer
learned token and position embeddings
one causal self-attention block
one feed-forward block
next-character language-model head
dependency-free scalar autodiff
random initialization only
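
To make the component list concrete, here is a minimal sketch of two of the pieces: a corpus-trained character tokenizer and a single causal attention pass. It is illustrative only. The names `build_vocab`, `encode`, and `causal_attention` are hypothetical and are not taken from closed_world_lm, and the sketch attends over raw vectors rather than learned query/key/value projections.

```python
import math

# Hypothetical helper names; a sketch, not the module's actual API.
def build_vocab(corpus):
    """Map each distinct character in the corpus to an integer id."""
    return {ch: i for i, ch in enumerate(sorted(set(corpus)))}

def encode(text, vocab):
    """Encode text as ids; characters absent from the corpus raise KeyError."""
    return [vocab[ch] for ch in text]

def softmax(xs):
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention(queries, keys, values):
    """Single-head attention where position t only sees positions <= t."""
    dim = len(queries[0])
    out = []
    for t, q in enumerate(queries):
        scores = [
            sum(a * b for a, b in zip(q, keys[s])) / math.sqrt(dim)
            for s in range(t + 1)  # the causal mask: no future positions
        ]
        weights = softmax(scores)
        out.append([
            sum(w * values[s][d] for s, w in enumerate(weights))
            for d in range(len(values[0]))
        ])
    return out

vocab = build_vocab("the cat sat")
ids = encode("cat", vocab)
vectors = [[float(i), 1.0] for i in ids]  # toy stand-in for embeddings
attended = causal_attention(vectors, vectors, vectors)
```

Because the first position can only attend to itself, its output equals its own value vector, which is a quick sanity check for the mask.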

Train a smoke checkpoint:

PYTHONPATH=src python3 -m closed_world_lm.transformer_char_model train \
  --run runs/transformer-smoke \
  --steps 40 \
  --context-size 8 \
  --embedding-dim 6 \
  --feedforward-dim 12

Evaluate answer probes:

PYTHONPATH=src python3 -m closed_world_lm.transformer_char_model eval \
  --checkpoint runs/transformer-smoke/transformer.json \
  --json runs/transformer-smoke/transformer_eval.json

Train on corpus-derived answer lessons:

PYTHONPATH=src python3 -m closed_world_lm.transformer_char_model answer-train \
  --run runs/transformer-answer-smoke \
  --steps 100 \
  --eval-every 0 \
  --candidate-scope eval \
  --selector-steps 200 \
  --selector-eval-every 0 \
  --selector-emit-completions \
  --generator-steps 400 \
  --generator-eval-every 0 \
  --direct-answer-steps 100 \
  --direct-answer-eval-every 0 \
  --direct-answer-mode periodic-balanced-repair-unlikelihood \
  --direct-answer-negative-weight 1.0 \
  --direct-answer-positive-weight 1.0 \
  --direct-answer-rollout-interval 50

From v0.71 onward, answer-train writes experiment_intent.json before training and closes it with a decision in transformer_answer_metrics.json. Use --experiment-hypothesis, --experiment-acceptance-gate name:rule, --experiment-failure-criterion, and --experiment-note to make a screen's intent more specific. From v0.77 onward, transformer screens close through the constraint-first promotion report.
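
The open-then-close lifecycle described above can be sketched as follows. This is a guess at the general shape, not the real schema: the function names, the JSON field names, the example rule strings, and the `accept`/`reject` decision values are all assumptions. Only the `name:rule` gate format comes from the flag description.

```python
# Hypothetical sketch of an intent record; field names are assumptions.
def parse_gate(spec):
    """Split a 'name:rule' gate spec on the first colon."""
    name, sep, rule = spec.partition(":")
    if not sep or not name.strip() or not rule.strip():
        raise ValueError(f"expected name:rule, got {spec!r}")
    return {"name": name.strip(), "rule": rule.strip()}

def open_intent(hypothesis, gate_specs, failure_criterion, note=""):
    """Build the record that would be written before training starts."""
    return {
        "hypothesis": hypothesis,
        "acceptance_gates": [parse_gate(s) for s in gate_specs],
        "failure_criterion": failure_criterion,
        "note": note,
        "status": "open",
    }

def close_intent(intent, gate_results):
    """Close the intent: accept only if every declared gate passed."""
    failed = [
        gate["name"]
        for gate in intent["acceptance_gates"]
        if not gate_results.get(gate["name"], False)
    ]
    return {"decision": "reject" if failed else "accept", "failed_gates": failed}

intent = open_intent(
    "prompt ownership restores coverage",
    ["coverage:min_target_coverage >= 0.5", "diversity:max_dominant_rate <= 0.9"],
    "any profile falls below the step-0 floor",
)
decision = close_intent(intent, {"coverage": True, "diversity": False})
```

Declaring the gates before training means the closing decision can only be judged against criteria that were fixed in advance.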

From v0.72 onward, profile-aware replay planning lives in src/closed_world_lm/replay_plan.py. The transformer still emits the same direct_answer_replay_plan.json shape for profile-aware modes, but replay record normalization, profile grouping, coverage floors, and missing-target summaries are now standalone training-planning mechanics.
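
As a rough illustration of profile grouping, coverage floors, and missing-target summaries, the sketch below builds a per-profile plan from replay records. It is not the code in replay_plan.py. The function name, the record fields, and the definition of the coverage floor as the represented fraction of targets are assumptions. The notion of a target being "represented" when it appears among current replay predictions follows the mode descriptions later in this document.

```python
from collections import defaultdict

# Hypothetical sketch; record fields and output keys are assumptions.
def build_replay_plan(records):
    """records: dicts with 'profile', 'target_id', and 'predicted_id'."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["profile"]].append(record)
    plan = {}
    for profile in sorted(grouped):  # sorted for deterministic output
        group = grouped[profile]
        targets = {r["target_id"] for r in group}
        predicted = {r["predicted_id"] for r in group}
        represented = targets & predicted
        plan[profile] = {
            "replay_count": len(group),
            "target_ids": sorted(targets),
            "represented_target_ids": sorted(represented),
            "missing_target_ids": sorted(targets - represented),
            # Assumed definition: fraction of targets currently predicted.
            "coverage_floor": len(represented) / len(targets),
        }
    return plan

plan = build_replay_plan([
    {"profile": "owner", "target_id": 3, "predicted_id": 3},
    {"profile": "owner", "target_id": 5, "predicted_id": 3},
    {"profile": "color", "target_id": 7, "predicted_id": 7},
])
```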

From v0.73 onward, answer-train also writes corpus_hygiene.json and training_plan.json. These artifacts record source mixture, duplicate checks, train/eval prompt overlap, candidate ratio, rare-profile coverage, allowed data sources, planned artifacts, and replay-plan summaries when profile-aware replay writes a plan.
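
Two of the checks listed above, duplicate detection and train/eval prompt overlap, can be sketched in a few lines. The normalization rule (lowercase, collapsed whitespace), the function names, and the report keys are assumptions for illustration and may not match corpus_hygiene.json.

```python
from collections import Counter

# Hypothetical sketch; the real normalization and report keys may differ.
def normalize(prompt):
    """Lowercase and collapse whitespace so near-identical prompts compare equal."""
    return " ".join(prompt.lower().split())

def hygiene_report(train_prompts, eval_prompts):
    train_norm = [normalize(p) for p in train_prompts]
    eval_norm = {normalize(p) for p in eval_prompts}
    counts = Counter(train_norm)
    overlap = sorted(set(train_norm) & eval_norm)
    return {
        "train_count": len(train_norm),
        "duplicate_prompts": sorted(p for p, n in counts.items() if n > 1),
        "train_eval_overlap": overlap,
        "overlap_free": not overlap,
    }

report = hygiene_report(
    ["Who owns the kite?", "who owns  the kite?", "Where is the lamp?"],
    ["where is the lamp?", "What color is the door?"],
)
```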

From v0.75 onward, answer-train also writes candidate_quarantine.json. The manifest records candidate lifecycle state and is linked from training_plan.json; candidate records are not training data until admitted into the ledgered corpus.

From v0.76 onward, answer-train also writes closed_world_verifier.json. The verifier is deterministic and checks that the closed-world data boundary, candidate exclusion policy, quarantine manifest, and protected train/eval overlap all pass before transformer screen evidence is trusted.

From v0.77 onward, answer-train also writes training_recipe.json and constraint_first_promotion.json. The recipe records model, tokenizer, data, objective, optimizer, replay, artifacts, gates, and rerun details. The constraint-first report blocks loss, NLL, rank, top-k, or exact quality evidence until verifier, contamination, branch-context, coverage, and diversity constraints pass first.
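
The ordering rule in the constraint-first report can be sketched as a simple gate: quality evidence is only surfaced once every constraint has passed. The constraint names mirror the list above, but the function name, the dictionary keys, and the `blocked`/`eligible` labels are hypothetical.

```python
# Hypothetical sketch of constraint-first ordering; keys are assumptions.
CONSTRAINTS = ("verifier", "contamination", "branch_context", "coverage", "diversity")

def promotion_report(constraint_results, quality_evidence):
    """Withhold loss/NLL/rank/top-k/exact evidence until all constraints pass."""
    blocked_on = [name for name in CONSTRAINTS if not constraint_results.get(name, False)]
    if blocked_on:
        return {"decision": "blocked", "blocked_on": blocked_on, "quality_evidence": None}
    return {"decision": "eligible", "blocked_on": [], "quality_evidence": quality_evidence}

evidence = {"answer_target_nll": 2.41}  # made-up value for the example
blocked = promotion_report(
    {"verifier": True, "contamination": True, "branch_context": True,
     "coverage": True, "diversity": False},
    evidence,
)
eligible = promotion_report({name: True for name in CONSTRAINTS}, evidence)
```

A missing constraint result is treated as a failure here, so an incomplete report cannot promote by omission.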

From v0.78 onward, the answer-training stack starts using separate transformer responsibility surfaces for artifact contracts, experiment/recipe decisions, trainer utilities, and the direct-answer objective catalog. The public CLI and artifact names remain stable.

From v0.79 onward, src/closed_world_lm/transformer_model.py owns model, optimizer, and generation config validation, checkpoint identity, closed-world dataset metadata, and run metadata. transformer_char_model.py still exports the old names for compatibility.

From v0.80 onward, src/closed_world_lm/transformer_checkpoint.py owns checkpoint payload loading and identity validation, and src/closed_world_lm/transformer_eval.py owns generic transformer probe loading, candidate collection, scoring, eval report assembly, samples JSONL writing, and eval JSON writing. The public eval CLI and artifact shapes remain stable.

From v0.81 onward, branch-balanced-context-profile-target-share-preserving-deficit-unlikelihood adds balanced owned target-share pressure across replay targets inside each profile-aware replay group. It keeps the existing profile replay plan, deficit focus, and represented-target preservation, but adds a per-target anti-collapse term so one represented target cannot dominate a multi-target profile without pressure on the remaining replay targets.

v0.82 screens that objective under the modern artifact stack and constraint-first gates. The screen fixes the transformer metrics purity field for external_embeddings, passes the verifier and branch-context gate, and preserves coverage by restoring step 0, but trained snapshots still collapse QA and heldout branch diversity.

v0.83 adds branch-balanced-context-profile-prompt-ownership-target-share-preserving-deficit-unlikelihood. It keeps the profile target-share objective and adds a prompt-specific sibling-target margin, so each replay context is trained to rank its own target above other targets from the same profile. The focused mechanic passes, but the full screen still rejects trained snapshots that lose target-token coverage.
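
One plausible form of a sibling-target margin is a smooth hinge on the gap between a context's own target logit and each sibling target logit from the same profile. The softplus form, the default margin, and the function name below are assumptions; the source does not specify the exact loss.

```python
import math

# Hypothetical sketch; the exact margin form used by the objective is not given.
def sibling_margin_loss(logits, own_target, sibling_targets, margin=1.0):
    """Mean softplus(margin - (own - sibling)) over the other profile targets."""
    siblings = [t for t in sibling_targets if t != own_target]
    if not siblings:
        return 0.0  # a single-target profile has nothing to compete against
    total = 0.0
    for sibling in siblings:
        gap = logits[own_target] - logits[sibling]
        total += math.log1p(math.exp(margin - gap))
    return total / len(siblings)

owned = sibling_margin_loss([4.0, 0.0, 0.5], own_target=0, sibling_targets=[0, 1, 2])
buried = sibling_margin_loss([0.0, 4.0, 0.5], own_target=0, sibling_targets=[0, 1, 2])
```

The loss shrinks as the own target pulls ahead of its siblings, so `owned` is smaller than `buried`.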

v0.84 adds branch-balanced-context-profile-baseline-anchored-prompt-ownership-target-share-preserving-deficit-unlikelihood. It keeps prompt ownership but anchors replay preservation to the baseline replay predictions recorded before direct-answer training, so preservation no longer follows prediction drift. The screen improves trained coverage relative to v0.83 but still restores baseline because it misses the full coverage floor.

v0.85 adds branch-balanced-context-profile-baseline-floor-gated-prompt-ownership-target-share-preserving-deficit-unlikelihood. It keeps baseline replay anchors and rejects any attempted direct-answer update whose branch-profile target-token coverage falls below the step-0 floor. The screen preserves coverage by rejecting all attempted updates, so the next repair must produce accepted safe updates rather than looser promotion gates.

v0.86 adds branch-balanced-context-profile-baseline-floor-adaptive-prompt-ownership-target-share-preserving-deficit-unlikelihood. It keeps the baseline-floor guard and retries the same update at smaller learning-rate scales after restoring model, optimizer, and RNG state. The screen shows that step size alone is not enough: all scaled attempts are still rejected.
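
The guard-and-retry shape described for v0.85 and v0.86 can be sketched as below: snapshot the state, attempt the update at shrinking scales, and restore state and RNG after every rejected attempt. All names, the scale schedule, and the toy update and coverage functions are assumptions for illustration; a single dictionary stands in for model and optimizer state.

```python
import copy
import random

# Hypothetical sketch of a baseline-floor guard with scaled retries.
def passes_floor(coverage, floor):
    """True when every profile keeps at least its step-0 coverage."""
    return all(coverage.get(p, 0.0) >= minimum for p, minimum in floor.items())

def guarded_update(state, rng, propose, measure, floor, scales=(1.0, 0.5, 0.25, 0.1)):
    """Try one update at shrinking scales; restore everything between attempts."""
    saved_state = copy.deepcopy(state)  # stands in for model + optimizer state
    saved_rng = rng.getstate()
    for scale in scales:
        candidate = propose(copy.deepcopy(saved_state), rng, scale)
        if passes_floor(measure(candidate), floor):
            return candidate, scale
        rng.setstate(saved_rng)  # replay the same random draw at the next scale
    return saved_state, None  # every attempt rejected: keep the baseline

def toy_propose(state, rng, scale):
    state["weight"] += scale * rng.uniform(0.5, 1.0)
    state["optimizer_steps"] += 1
    return state

def toy_measure(state):
    # Toy stand-in: coverage drops as the weight drifts away from zero.
    return {"owner": max(0.0, 1.0 - state["weight"])}

start = {"weight": 0.0, "optimizer_steps": 0}
accepted, scale = guarded_update(start, random.Random(0), toy_propose, toy_measure, {"owner": 0.7})
rejected, no_scale = guarded_update(start, random.Random(0), toy_propose, toy_measure, {"owner": 1.0})
```

The second call mirrors the screen result reported above: when the floor leaves no room for any movement, every scaled attempt is rejected and the baseline state is returned unchanged.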

v0.87 adds branch-balanced-context-profile-baseline-floor-repaired-prompt-ownership-target-share-preserving-deficit-unlikelihood. It keeps the adaptive guard and adds one bounded baseline-covered anchor repair before each failed retry is accepted or rejected. The screen shows that post-update repair is not enough: all repaired attempts are still rejected.

v0.88 adds branch-balanced-context-profile-baseline-floor-objective-prompt-ownership-target-share-preserving-deficit-unlikelihood. It puts balanced baseline-floor anchors inside the same loss and backward pass as the branch-diversity pressure. The screen shows that the combined objective is still not enough: all objective-shaped attempts are rejected.

v0.89 adds branch-context-profile-baseline-floor-stabilization-unlikelihood. It removes branch-diversity pressure from guarded attempts and trains only baseline-covered floor anchors. The screen shows that floor-only stabilization is still not enough: all stabilization-shaped attempts are rejected.

v0.90 adds baseline-floor rejection diagnostics to the same stabilization mode. The guard now records rejected update-shape counts, rejected learning-rate scale counts, violation profile counts, compact per-attempt floor diagnostics, and the worst rejected coverage violation.

v0.91 adds branch-context-profile-baseline-floor-profile-targeted-stabilization-unlikelihood. It covers every baseline-covered floor-anchor profile-target group in each guarded attempt. The screen shows that broader floor-anchor coverage is still not enough: all profile-targeted attempts are rejected with the same violation pattern as v0.90.

v0.92 adds branch-context-profile-baseline-floor-sequential-profile-stabilization-unlikelihood. It changes the repair shape to sequential source-profile floor batches with rollback after each unsafe profile group. The screen shows that source-profile ordering is still not enough: all profile-local attempts are rejected before any effective guarded update survives.

v0.93 adds branch-context-profile-baseline-floor-calibrated-sequential-profile-stabilization-unlikelihood. It keeps the sequential rollback shape, extends calibrated adaptive scales below 0.01, and uses coverage-only guard probes. The diagnostic screen accepts the first nonzero source-profile update that preserves the baseline floor.

v0.94 adds branch-context-profile-baseline-floor-profile-scale-calibrated-sequential-profile-stabilization-unlikelihood. It searches calibrated scales separately for each source profile, keeps the first safe update for that profile, and rolls back only unsafe profile-scale attempts. The diagnostic screen accepts eight source-profile updates while the baseline floor remains preserved.

v0.95 adds branch-context-profile-baseline-floor-diversity-profile-scale-calibrated-sequential-profile-stabilization-unlikelihood. It keeps the profile-scale search, then accepts a source-profile update only when it preserves the baseline coverage floor and does not regress the branch-diversity score from that profile's pre-update state. The diagnostic screen accepts five score-improving source-profile updates and rejects eleven floor-preserving score regressions before promotion still blocks on branch diversity.

v0.96 adds branch-context-profile-baseline-floor-diversity-frontier-profile-scale-calibrated-sequential-profile-stabilization-unlikelihood. It adds missing-target frontier anchors to each eligible source profile before the same floor and diversity acceptance gates run. The diagnostic screen accepts nine score-improving source-profile updates, lowers max dominant predicted rate to 0.9, and raises minimum target-token coverage to 0.1667 before promotion still blocks on branch diversity.

v0.97 adds branch-context-profile-baseline-floor-diversity-coverage-frontier-profile-scale-calibrated-sequential-profile-stabilization-unlikelihood. It keeps the missing-target frontier anchors but accepts a profile-scale update only when the candidate preserves the baseline floor and gains target-token coverage over that profile's pre-update branch snapshot. The diagnostic screen accepts one coverage-gaining source-profile update, rejects coverage ties and coverage regressions explicitly, and shows the strict monotonic screen is auditable but too conservative to recover full branch diversity yet.

v0.98 adds branch-context-profile-baseline-floor-diversity-coverage-prep-frontier-profile-scale-calibrated-sequential-profile-stabilization-unlikelihood. It keeps the v0.97 coverage audit but allows coverage-tied preparation moves when they improve the branch-diversity score. The diagnostic screen restores the nine accepted source-profile frontier updates while separating three coverage gains from six coverage-preparation moves, so the self-improvement ledger can distinguish real coverage recovery from safe setup movement.
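
The acceptance rule from v0.97 and v0.98 can be read as a small decision function. The sketch below assumes a higher branch-diversity score is better; the field names and label strings are hypothetical.

```python
# Hypothetical sketch of the coverage-gain / preparation split.
def classify_update(before, after, floor, tolerance=1e-9):
    """before/after: dicts with 'coverage' and 'diversity_score' for one profile."""
    if after["coverage"] < floor - tolerance:
        return "reject_floor_violation"
    if after["coverage"] > before["coverage"] + tolerance:
        return "accept_coverage_gain"
    if after["coverage"] < before["coverage"] - tolerance:
        return "reject_coverage_regression"
    # Coverage tie: only a diversity improvement counts as a preparation move.
    if after["diversity_score"] > before["diversity_score"] + tolerance:
        return "accept_coverage_preparation"
    return "reject_coverage_tie"

before = {"coverage": 0.25, "diversity_score": 0.40}
gain = classify_update(before, {"coverage": 0.50, "diversity_score": 0.40}, floor=0.25)
prep = classify_update(before, {"coverage": 0.25, "diversity_score": 0.55}, floor=0.25)
tie = classify_update(before, {"coverage": 0.25, "diversity_score": 0.40}, floor=0.25)
```

Keeping the two accept labels separate is what lets a ledger distinguish real coverage recovery from safe setup movement.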

v0.99 adds branch-context-profile-baseline-floor-diversity-coverage-recovery-frontier-profile-scale-calibrated-sequential-profile-stabilization-unlikelihood. It keeps the v0.98 preparation path but gives each safe preparation candidate a small missing-target recovery retry before falling back to the prepared state. The diagnostic screen accepts six source-profile updates, converts two preparation candidates into direct coverage recoveries, keeps four preparation fallbacks, and keeps promotion blocked on branch diversity.

v0.100.0 adds branch-context-profile-baseline-floor-diversity-branch-stable-coverage-recovery-frontier-profile-scale-calibrated-sequential-profile-stabilization-unlikelihood. It keeps the v0.99 recovery retry but adds a branch-stability check: a recovery candidate must preserve the branch-diversity score of its prepared state before the recovered weights are accepted. The diagnostic screen keeps the two coverage recoveries, records fifteen branch-stability checks, rejects one retry for branch-score regression, and keeps promotion blocked on branch diversity.

v0.101.0 adds branch-context-profile-baseline-floor-diversity-branch-stable-coverage-recovery-branch-diversity-frontier-profile-scale-calibrated-sequential-profile-stabilization-unlikelihood. It keeps the v0.100.0 branch-stable recovery guard and adds a bounded branch-diversity recovery step after already-safe profile updates. The diagnostic screen accepts six source-profile updates, runs branch-diversity recovery on all six, keeps five branch-score-improving refinements, falls back once, and keeps promotion blocked on branch diversity.

v0.102.0 adds branch-context-profile-baseline-floor-diversity-branch-stable-coverage-recovery-branch-diversity-collapsed-profile-binding-frontier-profile-scale-calibrated-sequential-profile-stabilization-unlikelihood. It keeps the v0.101.0 branch-diversity recovery guard and adds a bounded collapsed-profile binding step after already-safe profile updates. The diagnostic screen accepts eleven source-profile updates, keeps four branch-diversity refinements, accepts one collapsed-profile binding update, narrows final collapse from nine eval profiles to three, and keeps promotion blocked on branch diversity.

v0.103.0 adds branch-context-profile-baseline-floor-diversity-branch-stable-coverage-recovery-branch-diversity-collapsed-profile-binding-remaining-profile-frontier-profile-scale-calibrated-sequential-profile-stabilization-unlikelihood. It keeps the v0.102.0 binding guard and prioritizes source-profile groups for the remaining collapsed eval profiles: learning, owner, and paraphrase coverage through color, owner, place, and training_data source labels. The diagnostic screen accepts eleven source-profile updates, records twenty-one prioritized attempts, accepts six prioritized updates, improves learning coverage from 0.0 to 0.25, preserves target coverage, and keeps promotion blocked on branch diversity.

v0.104.0 adds branch-context-profile-baseline-floor-diversity-branch-stable-coverage-recovery-branch-diversity-collapsed-profile-binding-remaining-profile-owner-paraphrase-frontier-profile-scale-calibrated-sequential-profile-stabilization-unlikelihood. It keeps the v0.103.0 remaining-profile curriculum, narrows residual binding targets to owner and paraphrases, and protects learning as a preserved profile. The diagnostic screen records sixteen owner/paraphrase-prioritized attempts, accepts six prioritized source-profile updates, runs seventy-five learning-preservation checks, rejects twenty-four preservation failures, keeps learning non-collapsed at coverage 0.25, and keeps promotion blocked on branch diversity.

v0.105.0 adds deterministic closed-world retrieval memory in src/closed_world_lm/memory_retrieval.py. Transformer answer-training runs now write retrieval_memory_report.json as a separate artifact from neural weight metrics. The diagnostic screen at runs/transformer-answer-v0.105.0-retrieval-memory-owner-paraphrase-frontier-profile-scale-step1-dim4-context80/ builds 497 memory cards from story facts, admitted memories, self facts, learning rules, and glossary entries; answers 219/219 eval probes exactly; uses no external model, no pretrained retriever, and no external embeddings; and performs no weight updates. The direct-answer transformer screen remains blocked on branch diversity, so retrieval success is evidence for the memory-first rail, not neural promotion.
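
A deterministic retriever of this kind needs no embeddings or learned weights. The sketch below scores memory cards by token overlap and breaks ties by card order. It is not the code in memory_retrieval.py: the card fields, the scoring rule, and the example cards are made up for illustration.

```python
# Hypothetical sketch of embedding-free retrieval; card fields are assumptions.
def tokens(text):
    """Lowercase alphanumeric tokens; punctuation becomes whitespace."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return set(cleaned.split())

def retrieve(cards, question):
    """Return the card with the largest token overlap, or None if nothing matches."""
    query = tokens(question)
    best_index, best_overlap = None, 0
    for index, card in enumerate(cards):
        overlap = len(query & tokens(card["prompt"]))
        if overlap > best_overlap:  # strict '>' keeps the earliest card on ties
            best_index, best_overlap = index, overlap
    return None if best_index is None else cards[best_index]

# Illustrative cards only; not drawn from the project's corpus.
cards = [
    {"source": "story", "prompt": "who owns the red kite", "answer": "mira"},
    {"source": "glossary", "prompt": "what does admitted mean", "answer": "accepted into the corpus"},
]
hit = retrieve(cards, "Who owns the red kite?")
miss = retrieve(cards, "xyzzy")
```

Since nothing is trained, a retriever like this performs no weight updates, which is why its success is reported separately from neural metrics.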

v0.106.0 adds deterministic memory-guided consolidation planning in src/closed_world_lm/memory_consolidation.py. Transformer answer-training runs now write memory_consolidation_plan.json after retrieval and branch diagnostics are available. The diagnostic screen at runs/transformer-answer-v0.106.0-memory-guided-consolidation-owner-paraphrase-frontier-profile-scale-step1-dim4-context80/ keeps retrieval at 219/219, records 9 memory-backed neural failed profiles, and ranks owner, paraphrases, glossary, admission_paraphrases, and admissions as the top consolidation priorities. It identifies collapsed memory-backed profiles owner, paraphrases, and glossary; the transformer still rejects promotion on branch diversity.

v0.107.0 adds gated memory-consolidation direct-answer training. The diagnostic screen at runs/transformer-answer-v0.107.0-gated-memory-consolidation-owner-paraphrase-glossary-frontier-profile-scale-step1-dim4-context80/ loads the v0.106.0 consolidation plan as a declared source artifact, consumes owner, paraphrases, and glossary as the target profile list, records 26 memory-consolidation prioritized attempts with 8 acceptances and 18 rejections, keeps retrieval exact at 219/219, and still rejects promotion on branch_diversity_target. Retrieval remains separate memory evidence; the new mode proves a plan-to-weight-update handoff under gates, not completed neural learning.

v0.108.0 expands the memory-consolidation target window. Target-only profiles such as heldout, qa, admissions, and admission_paraphrases now map back to admitted corpus source labels before replay ordering. The diagnostic screen at runs/transformer-answer-v0.108.0-expanded-memory-consolidation-owner-paraphrase-heldout-qa-glossary-frontier-profile-scale-step1-dim4-context80/ consumes the v0.107.0 plan with five target profiles: owner, paraphrases, heldout, qa, and glossary. Retrieval remains exact at 219/219, the guard again records 26 prioritized attempts with 8 acceptances and 18 rejections, and promotion still rejects on branch_diversity_target. The next mechanic should directly target missing first-token diversity for that expanded profile set.

v0.109.0 adds that missing first-token memory-consolidation pressure. The diagnostic screen at runs/transformer-answer-v0.109.0-missing-first-token-memory-consolidation-owner-paraphrase-heldout-qa-glossary-frontier-profile-scale-step1-dim4-context80/ consumes the v0.108.0 plan, preserves the five-profile target window, extracts plan-derived missing first-token maps, and records 8 missing-token candidates, 22 missing-token attempts, 1 accepted guarded coverage-gain update, 21 rejections, and 7 fallback acceptances. Retrieval remains exact at 219/219; promotion still rejects on branch_diversity_target, and the next plan narrows remaining collapsed memory-backed profiles to owner, paraphrases, and learning.

v0.110.0 consumes that narrowed plan directly. The diagnostic screen at runs/transformer-answer-v0.110.0-remaining-collapsed-missing-first-token-memory-consolidation-owner-paraphrase-learning-frontier-profile-scale-step1-dim4-context80/ requires source-plan collapsed_memory_backed_profiles, targets only owner, paraphrases, and learning, and records the remaining-collapsed target contract in the replay plan and direct-answer guard. Retrieval remains exact at 219/219; the missing-token phase records 6 candidates, 16 attempts, 1 accepted guarded coverage-gain update, 15 rejections, and 5 fallback acceptances. Promotion still rejects on branch_diversity_target, so the next mechanic should repair those three profiles with more profile-specific pressure.

v0.111.0 adds that profile-specific pressure. The diagnostic screen at runs/transformer-answer-v0.111.0-profile-specific-missing-first-token-memory-consolidation-owner-paraphrase-learning-frontier-profile-scale-step1-dim4-context80/ consumes the v0.110.0 plan, keeps owner, paraphrases, and learning as the target profiles, and maps each admitted source label to only the unresolved targets it can support: learning -> learning, owner -> owner/paraphrases, and color/place/training_data -> paraphrases. Retrieval remains exact at 219/219; memory-prioritized consolidation records 16 attempts with 6 acceptances and 10 rejections. The profile-specific missing-token phase records 6 candidates, 18 attempts, 0 direct missing-token acceptances, 18 rejections, and 6 fallbacks, while the guard records 1 accepted profile-specific update shape. Promotion still rejects on branch_diversity_target, so the next mechanic should use the per-profile acceptance deltas to repair paraphrases, owner, and re-emergent glossary collapse.

v0.112.0 pauses repair-objective churn and adds branch-diversity root-cause diagnostics. The diagnostic screen at runs/transformer-answer-v0.112.0-branch-diversity-root-cause-profile-specific-memory-consolidation-step1-dim4-context80/ consumes the v0.111.0 plan, targets owner, paraphrases, and glossary, and keeps retrieval exact at 219/219. It records 24 memory-prioritized attempts with 8 acceptances and 16 rejections, plus 24 profile-specific missing-token attempts with 0 direct missing-token acceptances, 24 rejections, and 8 fallbacks. The new branch_diversity_target.root_cause report classifies the final failure as a critical target_routing_gap: 9/9 profiles fail, 3 remain collapsed, 1 has zero target-token coverage, and 6 have buried targets. Promotion still rejects on branch_diversity_target, so the next mechanic should audit routing, representation separation, and profile/target imbalance before adding another objective.
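
Per-profile collapse signals of the kind summarized in that report can be computed from target and predicted first tokens alone. The sketch below is an assumption about how such signals might be defined: a multi-target profile counts as collapsed when it predicts a single token, and coverage is the fraction of target tokens that appear among predictions. The "buried target" classification is left out because its definition is not given here.

```python
from collections import Counter, defaultdict

# Hypothetical sketch; metric definitions and key names are assumptions.
def profile_diagnostics(records):
    """records: (profile, target_token, predicted_token) triples."""
    by_profile = defaultdict(list)
    for profile, target, predicted in records:
        by_profile[profile].append((target, predicted))
    report = {}
    for profile, pairs in sorted(by_profile.items()):
        targets = {target for target, _ in pairs}
        predicted = Counter(prediction for _, prediction in pairs)
        coverage = len(targets & set(predicted)) / len(targets)
        report[profile] = {
            "dominant_predicted_rate": max(predicted.values()) / len(pairs),
            "target_token_coverage": coverage,
            "collapsed": len(targets) > 1 and len(predicted) == 1,
            "zero_coverage": coverage == 0.0,
        }
    return report

report = profile_diagnostics([
    ("owner", "m", "a"), ("owner", "t", "a"), ("owner", "k", "a"),
    ("qa", "r", "r"), ("qa", "b", "b"),
])
collapsed_profiles = [name for name, row in report.items() if row["collapsed"]]
```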

v0.113.0 adds that routing audit to direct-answer snapshots. The diagnostic screen at runs/transformer-answer-v0.113.0-branch-routing-audit-profile-specific-memory-consolidation-step1-dim4-context80/ consumes the v0.112.0 plan, targets owner, paraphrases, and learning, and keeps retrieval exact at 219/219. It records 16 memory-prioritized attempts with 6 acceptances and 10 rejections, plus 18 profile-specific missing-token attempts with 0 direct missing-token acceptances, 18 rejections, and 6 fallbacks. The root cause remains a critical target_routing_gap, and branch_routing_audit reports high output-bias escape risk, low representation separation across 9/9 multi-target profiles, and a glossary target-imbalance hotspot. Promotion still rejects on branch_diversity_target.

v0.114.0 adds branch_logit_prior_profiles and centroid separation metrics to direct-answer snapshots. The diagnostic screen at runs/transformer-answer-v0.114.0-logit-prior-representation-instrumentation-profile-specific-memory-consolidation-step1-dim4-context80/ consumes the v0.113.0 plan, targets owner, paraphrases, and glossary, and keeps retrieval exact at 219/219. It records 24 profile-specific missing-token attempts with 0 direct missing-token acceptances, 24 rejections, and 8 fallbacks. The root cause remains a critical target_routing_gap; output-bias risk remains high, but logit-prior decomposition says dominant-token wins are hidden-projection driven across 9/9 multi-target profiles. Promotion still rejects on branch_diversity_target.
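
The distinction between a bias-driven and a projection-driven win can be illustrated by splitting each logit into its hidden-projection part and its output-bias part, then comparing the two margins between the winning token and the target. This is a sketch of the idea, not the branch_logit_prior_profiles implementation; the function name, return keys, and the tie rule are assumptions.

```python
# Hypothetical sketch: logit = hidden . weight_row + bias, compared by part.
def decompose_win(hidden, weights, bias, target):
    projection = [sum(h * w for h, w in zip(hidden, row)) for row in weights]
    logits = [p + b for p, b in zip(projection, bias)]
    winner = max(range(len(logits)), key=lambda i: logits[i])
    projection_margin = projection[winner] - projection[target]
    bias_margin = bias[winner] - bias[target]
    if winner == target:
        driver = "target_wins"
    elif projection_margin >= bias_margin:
        driver = "hidden_projection"
    else:
        driver = "output_bias"
    return {
        "winner": winner,
        "projection_margin": projection_margin,
        "bias_margin": bias_margin,
        "driver": driver,
    }

hidden = [1.0, 0.0]
weights = [[2.0, 0.0], [0.5, 0.0], [0.0, 0.0]]
by_projection = decompose_win(hidden, weights, [0.0, 0.0, 1.0], target=1)
by_bias = decompose_win(hidden, weights, [0.0, 0.0, 3.0], target=1)
```

In the first case token 0 wins on its projection alone; in the second, token 2 wins only because of its bias, which is the pattern that freezing the output bias is meant to rule out.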

Add --use-context-mean to either train or answer-train to test the experimental mean-pooled context residual in the final transformer representation. It is diagnostic architecture evidence only until it improves prompt-conditioned branch profiles and complete answer metrics. Add --use-context-projection to test a zero-initialized trainable projection of that context summary; it starts baseline-equivalent and must prove that its learned parameters improve branch profiles before it can be promoted.

Add --use-prompt-prefix-projection to test a zero-initialized trainable projection of non-padding prompt-prefix positions before the final answer token.

Add --use-prompt-attention-summary to test a trainable attention-pooled summary of the current context through a zero-initialized output projection. It is also diagnostic until branch profiles improve.

Add --direct-answer-require-branch-context-gate to skip direct-answer training unless branch contexts are semantically complete and unambiguous.

Add --direct-answer-snapshot-mode branch-only for bounded longer-context screens that need branch profiles and branch-context gate evidence but can intentionally skip greedy completion evals in direct-answer JSONL snapshots. Direct-answer snapshots also emit branch_diversity_target, which fails when multi-target eval profiles collapse to too few predicted branch tokens.

Use --direct-answer-mode branch-diversity-unlikelihood to train distinct branch targets while also suppressing each branch context's current wrong prediction.

Use --direct-answer-freeze-output-bias to exclude the transformer output bias from direct-answer updates when screening whether a branch objective is learning prompt-specific weights rather than moving one global token bias.

Use --direct-answer-mode branch-target-softmax-unlikelihood to add a restricted softmax over the distinct branch targets in each batch, making the right target compete directly against the other observed branch targets.
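
For the branch-diversity-unlikelihood and branch-target-softmax-unlikelihood modes described above, the two loss ingredients can be sketched as follows. This is a minimal illustration under assumptions: the function names are hypothetical, the weights echo the --direct-answer-positive-weight and --direct-answer-negative-weight flags, and the real objectives may combine or normalize the terms differently.

```python
import math

# Hypothetical sketch of an unlikelihood term and a restricted softmax term.
def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def unlikelihood_loss(logits, target, positive_weight=1.0, negative_weight=1.0, eps=1e-12):
    """Raise the target's likelihood and lower the current wrong prediction's."""
    probs = softmax(logits)
    predicted = max(range(len(probs)), key=lambda i: probs[i])
    loss = -positive_weight * math.log(probs[target] + eps)
    if predicted != target:
        # Unlikelihood term: penalize probability mass on the wrong top token.
        loss += -negative_weight * math.log(1.0 - probs[predicted] + eps)
    return loss

def restricted_softmax_loss(logits, target, branch_targets):
    """NLL of the target under a softmax over the distinct branch targets only."""
    candidates = sorted(set(branch_targets) | {target})
    probs = softmax([logits[i] for i in candidates])
    return -math.log(probs[candidates.index(target)])

logits = [2.0, 0.0, 1.0]
wrong = unlikelihood_loss(logits, target=1)   # token 0 is the wrong prediction
right = unlikelihood_loss(logits, target=0)   # prediction already correct
restricted = restricted_softmax_loss(logits, target=1, branch_targets=[1, 2])
```

The restricted softmax ignores token 0 entirely, so the target competes only against the other observed branch target.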
Use --direct-answer-restore-best-branch-snapshot to restore the best scored branch-diversity checkpoint before final metrics and checkpoint writing.

Add --use-prompt-position-projection to test a zero-initialized position-specific projection of non-padding prompt-prefix positions. Add --prompt-position-projection-scale to scale that prompt-position projection residual before it is added to the final branch representation.

Use --direct-answer-mode branch-target-margin-unlikelihood to add a smooth pairwise target-margin loss over the distinct branch targets in each batch.

Direct-answer snapshots include branch_representation_profiles so runs can measure hidden-state pairwise distance before the output head. Use --direct-answer-mode branch-representation-contrast-unlikelihood to penalize nearly identical hidden states for different branch targets. Use --direct-answer-mode branch-balanced-representation-contrast-unlikelihood to build that representation-contrast batch from target buckets so frequent first answer tokens cannot crowd out rare branch targets.

Direct-answer branch profiles also include target-rank diagnostics: average target rank, top-3/top-5 target coverage, and the top predicted alternatives on failed branch records.

Use --direct-answer-mode branch-output-binding-unlikelihood to combine restricted branch-target softmax with branch representation contrast in the same update.

Use --direct-answer-mode branch-rank-margin-unlikelihood to push each branch target above the model's current top wrong tokens. The --direct-answer-hard-negatives value controls how many top wrong tokens each branch target is margined against. Use --direct-answer-mode branch-balanced-rank-margin-unlikelihood to apply the same rank-margin repair with target-balanced branch batches.

Use --direct-answer-mode branch-topk-softmax-unlikelihood to train each branch target against a restricted softmax over the target and the model's current top wrong tokens.
Use --direct-answer-mode branch-balanced-topk-softmax-unlikelihood for the same objective with target-balanced branch batches. The --direct-answer-hard-negatives value controls the top-wrong-token candidate count, and --direct-answer-contrast-weight controls the restricted-softmax loss weight.

Use --direct-answer-mode branch-bidirectional-binding-unlikelihood to bind prompt contexts and branch targets in both directions: row-wise target choice inside each prompt context, and column-wise target-token ownership across prompt contexts. Use --direct-answer-mode branch-balanced-bidirectional-binding-unlikelihood for the same objective with target-balanced branch batches.

Use --direct-answer-mode branch-coverage-binding-unlikelihood to combine bidirectional binding with hard-wrong-token competition and a target-set mass coverage guard. Use --direct-answer-mode branch-balanced-coverage-binding-unlikelihood for the same objective with target-balanced branch batches, and use --direct-answer-hard-negatives to choose the hard wrong-token pool size.

Use --direct-answer-mode branch-target-set-coverage-unlikelihood to train only target-set mass against hard wrong tokens before exact-target sharpening. Use --direct-answer-mode branch-balanced-target-set-coverage-unlikelihood for the same objective with target-balanced branch batches.

Use --direct-answer-mode branch-target-diversity-unlikelihood to keep target-set mass pressure while adding an explicit target-share diversity term over the branch target set. Use --direct-answer-mode branch-balanced-target-diversity-unlikelihood for the same objective with target-balanced branch batches.

Use --direct-answer-mode branch-target-replay-coverage-unlikelihood to apply target-set mass and target-share balance over the broader admitted branch training pool at the same branch position. Use --direct-answer-mode branch-balanced-target-replay-coverage-unlikelihood for the same objective with target-balanced sampled branch batches.
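
The row-wise and column-wise directions of the bidirectional binding mode described above can be pictured on a small score matrix. The sketch below assumes one branch target per prompt context, aligned so that context i owns target i; the function names and the equal weighting of the two directions are assumptions.

```python
import math

# Hypothetical sketch of a two-direction binding loss on a square score matrix.
def log_softmax_at(values, index):
    peak = max(values)
    log_total = peak + math.log(sum(math.exp(v - peak) for v in values))
    return values[index] - log_total

def bidirectional_binding_loss(logits):
    """logits[i][j]: score of branch target j in prompt context i."""
    n = len(logits)
    # Row-wise: each context should choose its own target among the targets.
    row_loss = -sum(log_softmax_at(logits[i], i) for i in range(n)) / n
    # Column-wise: each target token should be owned by its own context.
    col_loss = -sum(
        log_softmax_at([logits[k][j] for k in range(n)], j) for j in range(n)
    ) / n
    return row_loss + col_loss

aligned = bidirectional_binding_loss([[3.0, 0.0], [0.0, 3.0]])
collapsed = bidirectional_binding_loss([[3.0, 0.0], [3.0, 0.0]])
```

When both contexts prefer the same target, the column term cannot separate them, so the collapsed matrix scores a clearly higher loss than the aligned one.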
Use --direct-answer-mode branch-context-replay-coverage-unlikelihood to train each sampled replay branch context to own its own target within the replay target set. Use --direct-answer-mode branch-balanced-context-replay-coverage-unlikelihood for the same objective with target-balanced sampled branch and replay batches.

Use --direct-answer-mode branch-context-coverage-anchor-unlikelihood to add a covered-target anchor for replay branches whose own target is already top-1. Use --direct-answer-mode branch-balanced-context-coverage-anchor-unlikelihood for the same objective with target-balanced sampled branch and replay batches.

Use --direct-answer-mode branch-context-target-balanced-anchor-unlikelihood to average covered-target anchors by covered target and skip singleton covered target batches. Use --direct-answer-mode branch-balanced-context-target-balanced-anchor-unlikelihood for the same objective with target-balanced sampled branch and replay batches.

Use --direct-answer-mode branch-context-coverage-deficit-unlikelihood to identify replay target tokens that are absent from current replay predictions and focus extra target pressure on those missing targets. Use --direct-answer-mode branch-balanced-context-coverage-deficit-unlikelihood for the same objective with target-balanced sampled branch and replay batches.

Use --direct-answer-mode branch-context-coverage-preserving-deficit-unlikelihood to combine missing-target pressure with target-balanced preservation anchors for target tokens currently represented in replay predictions. Use --direct-answer-mode branch-balanced-context-coverage-preserving-deficit-unlikelihood for the same objective with target-balanced sampled branch and replay batches.

Use --direct-answer-mode branch-context-profile-coverage-preserving-deficit-unlikelihood to compute those deficits and preservation anchors inside each admitted source/profile instead of one global replay target set.
Use --direct-answer-mode branch-balanced-context-profile-coverage-preserving-deficit-unlikelihood for the same objective with target-balanced sampled branch and replay batches. Profile-aware modes emit direct_answer_replay_plan.json with branch counts, replay counts, target ids, represented target ids, missing target ids, and coverage floors by profile before direct-answer training starts.

Best branch snapshot scoring first enforces a profile-wise target-token coverage floor against the baseline snapshot. Eligible snapshots then use target-rank/top-k evidence before generic wrong-token diversity, so restore prefers snapshots that move correct targets upward without trading away coverage.

v0.51 adds opt-in foundation-stack controls before the next repair objective: --optimizer adamw, --gradient-accumulation-steps, warmup/decay schedule flags, --resume-checkpoint, --resume-optimizer, --attention-heads, --use-rms-norm, --use-gated-mlp, --tie-output-embeddings, --use-rotary-positions, --use-kv-cache-path, generation sampling controls, and eval --samples-jsonl trace artifacts.

Use STRUCTURE_AUDIT.md before adding the next transformer repair objective: QuarkLM may study open-source model/trainer/tokenizer/checkpoint structure, but must not import external weights, tokenizers, embeddings, datasets, or training text.

Use --use-pre-layer-norm to run the audited opt-in GPT-style pre-layer-norm block path with final normalization before the language-model head.
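
The best-branch-snapshot ordering described above (coverage floor first, then target-rank and top-k evidence, then generic wrong-token diversity) amounts to a filter followed by a lexicographic sort key. The sketch below shows that shape; the snapshot field names and the exact tie-break order are assumptions, and the numbers are made up.

```python
# Hypothetical sketch of floor-then-rank snapshot selection.
def select_best_snapshot(snapshots, baseline_coverage):
    """Pick the restore candidate: floor filter, then lexicographic scoring."""
    def keeps_floor(snapshot):
        return all(
            snapshot["coverage"].get(profile, 0.0) >= floor
            for profile, floor in baseline_coverage.items()
        )
    eligible = [s for s in snapshots if keeps_floor(s)]
    if not eligible:
        return None
    return min(
        eligible,
        key=lambda s: (
            s["average_target_rank"],    # lower rank is better
            -s["top3_coverage"],         # then higher top-k coverage
            -s["wrong_token_diversity"], # generic diversity comes last
            s["step"],                   # earliest step breaks exact ties
        ),
    )

snapshots = [  # made-up values for the example
    {"step": 0, "coverage": {"owner": 0.5}, "average_target_rank": 4.0,
     "top3_coverage": 0.2, "wrong_token_diversity": 0.1},
    {"step": 10, "coverage": {"owner": 0.25}, "average_target_rank": 1.5,
     "top3_coverage": 0.6, "wrong_token_diversity": 0.9},
    {"step": 20, "coverage": {"owner": 0.5}, "average_target_rank": 3.0,
     "top3_coverage": 0.3, "wrong_token_diversity": 0.2},
]
best = select_best_snapshot(snapshots, baseline_coverage={"owner": 0.5})
```

Step 10 has the best rank and diversity but loses coverage, so it is filtered out before scoring and step 20 is restored instead.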

Current language-model evidence from runs/transformer-v0.25/:

Signal	Value
Steps	`40`
Validation NLL	`3.5885 -> 3.4382`
Answer exact eval	`0/28`
Pretrained weights	`false`
Pretrained tokenizer	`false`

Current promoted answer-lesson evidence from runs/transformer-answer-v0.42-branch-repair-contrast50-dim8-context32/:

Signal	Value
Steps	`80`
Context size	`32`
Embedding / feed-forward dimensions	`8 / 16`
Candidate scope	`eval`
Direct answer steps	`1000`
Direct answer mode	`periodic-branch-repair-contrast-unlikelihood`
Direct answer negative weight	`1.0`
Direct answer positive weight	`1.0`
Direct answer contrast weight	`1.0`
Direct answer branch position	`1`
Direct answer rollout interval	`50`
Direct answer training examples	`9144`
Direct answer exact	`0/219 -> 0/219`
Direct answer target loss	`3.4278 -> 2.2708`
Direct answer uses candidates	`false`
Direct answer auxiliary weights	`false`
Answer target NLL	`3.5850 -> 2.4129`
Transformer-only candidate accuracy	`15/219 -> 37/219`
Selector-emitted exact answers	`18/219 -> 219/219`
Selector candidate accuracy	`18/219 -> 219/219`
v0.31 generator exact without candidates	`0/219 -> 219/219`
v0.31 generator target loss	`3.3160 -> 0.0029`
Pretrained weights	`false`
Pretrained tokenizer	`false`
External embeddings	`false`
v0.31 generator uses answer candidates	`false`
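
The positive and negative weights in the table suggest a weighted composite of a likelihood term and an unlikelihood term. A minimal sketch of that general shape follows, assuming the standard unlikelihood form (penalize `-log(1 - p)` for tokens that should not be emitted). The function name and the way negatives are chosen are assumptions; the project's actual objective may differ.

```python
import math


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def unlikelihood_loss(logits, target, negatives,
                      positive_weight=1.0, negative_weight=1.0):
    """Weighted target cross-entropy plus unlikelihood on unwanted tokens."""
    probs = softmax(logits)
    positive = -math.log(probs[target])
    # Clamp so a wrong token with probability ~1 does not produce log(0).
    negative = sum(-math.log(max(1.0 - probs[n], 1e-12)) for n in negatives)
    return positive_weight * positive + negative_weight * negative


uniform = unlikelihood_loss([0.0, 0.0, 0.0, 0.0], target=0, negatives=[1])
confident = unlikelihood_loss([5.0, 0.0, 0.0, 0.0], target=0, negatives=[1])
```

With both weights at `1.0`, the loss falls as the target gains probability and the listed wrong tokens lose it, so `confident` is smaller than `uniform`.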

Latest bounded stacked-transformer screen:

Signal	Value
Run	`runs/transformer-answer-v0.43-two-layer-toponly-skip-screen-dim8-context32/`
Layers	`2`
Steps	`40` target-loss + `80` direct-answer
Direct-answer update scope	top layer and language-model head only
Post-direct candidate snapshot	skipped and recorded in metrics
Pre-direct candidate accuracy	`15/219 -> 15/219`
Pre-direct answer target NLL	`3.5855 -> 3.4796`
Direct answer target loss	`3.5186 -> 3.2436`
Direct answer exact	`0/219 -> 0/219`
Failure pattern	repeated `"a"` greedy completion
Promotion status	screening evidence only

Latest direct-answer diagnostic smoke:

Signal	Value
Run	`runs/transformer-answer-v0.43-branch-profile-smoke-dim4-context16/`
Diagnostic	branch profiles from model logits
Branch position	`1`
Smoke steps	`5` target-loss + `5` direct-answer
Post-direct candidate snapshot	skipped and recorded in metrics
QA branch accuracy	`1/8 -> 1/8`
Dominant QA branch prediction	all `"o"` -> all `"y"`
Final QA target margin	negative, about `-0.0048`
Promotion status	diagnostic smoke only

Latest branch repair smoke:

Signal	Value
Selected comparison run	`runs/transformer-answer-v0.43-periodic-branch-batch-smoke-dim4-context16/`
Prior rejected repair	`runs/transformer-answer-v0.43-periodic-branch-collapse-smoke-dim4-context16/`
Mode	`periodic-branch-batch-contrast-unlikelihood`
Branch batch size	`4`
Rollout interval	`5`
Steps	`5` target-loss + `20` direct-answer
Direct answer loss	`3.5800 -> 3.5248`
QA branch accuracy	`1/8 -> 0/8`
Dominant QA branch prediction	all `"o"` -> all `"a"`
Promotion status	rejected repair evidence

Latest representation-side smoke:

Signal	Value
Selected run	`runs/transformer-answer-v0.43-context-mean-branch-repair-smoke-dim4-context16/`
Comparison run	`runs/transformer-answer-v0.43-context-mean-branch-batch-smoke-dim4-context16/`
Representation option	`--use-context-mean`
Selected mode	`periodic-branch-repair-unlikelihood`
Comparison mode	`periodic-branch-batch-contrast-unlikelihood`
Steps	`5` target-loss + `20` direct-answer
Post-direct candidate snapshot	skipped and recorded in metrics
Selected direct answer loss	`3.5805 -> 3.5310`
Comparison direct answer loss	`3.5805 -> 3.5252`
Selected QA branch accuracy	`1/8 -> 0/8`
Comparison QA branch accuracy	`1/8 -> 0/8`
Dominant QA branch prediction	all `"o"` -> all `"a"` in both screens
Promotion status	rejected representation evidence

Latest learned-representation smoke:

Signal	Value
Selected run	`runs/transformer-answer-v0.43-context-projection-branch-repair-smoke-dim4-context16/`
Comparison run	`runs/transformer-answer-v0.43-context-projection-branch-batch-smoke-dim4-context16/`
Representation option	`--use-context-projection`
Selected mode	`periodic-branch-repair-unlikelihood`
Comparison mode	`periodic-branch-batch-contrast-unlikelihood`
Steps	`5` target-loss + `20` direct-answer
Post-direct candidate snapshot	skipped and recorded in metrics
Projection parameter movement	all `20` parameters moved in both screens
Selected direct answer loss	`3.5802 -> 3.5217`
Comparison direct answer loss	`3.5802 -> 3.5252`
Selected QA branch accuracy	`1/8 -> 0/8`
Comparison QA branch accuracy	`1/8 -> 0/8`
Dominant QA branch prediction	all `"o"` -> all `"a"` in both screens
Promotion status	rejected representation evidence

Latest prompt-attention representation smoke:

Signal	Value
Selected run	`runs/transformer-answer-v0.43-prompt-attention-branch-repair-smoke-dim4-context16/`
Comparison run	`runs/transformer-answer-v0.43-prompt-attention-branch-batch-smoke-dim4-context16/`
Representation option	`--use-prompt-attention-summary`
Selected mode	`periodic-branch-repair-unlikelihood`
Comparison mode	`periodic-branch-batch-contrast-unlikelihood`
Steps	`5` target-loss + `20` direct-answer
Post-direct candidate snapshot	skipped and recorded in metrics
Output projection movement	all `20` zero-initialized parameters moved in both screens
Selected direct answer loss	`3.5802 -> 3.5217`
Comparison direct answer loss	`3.5802 -> 3.5252`
Selected QA branch accuracy	`1/8 -> 0/8`
Comparison QA branch accuracy	`1/8 -> 0/8`
Dominant QA branch prediction	all `"o"` -> all `"a"` in both screens
Promotion status	rejected representation evidence

Latest branch-context coverage diagnostic:

Signal	Context 16	Context 32	Context 80
Run	`runs/transformer-answer-v0.43-branch-context-coverage-smoke-dim4-context16/`	`runs/transformer-answer-v0.43-branch-context-coverage-smoke-dim4-context32/`	`runs/transformer-answer-v0.43-branch-context-coverage-smoke-dim4-context80/`
QA semantic coverage	`0/8`	`0/8`	`8/8`
QA ambiguous branch contexts	`4`	`0`	`0`
All-eval semantic coverage	`0/219`	`53/219`	`219/219`
All-eval ambiguous branch contexts	`40`	`0`	`0`
Promotion status	diagnostic only	diagnostic only	diagnostic only
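
One way to read the ambiguous-context counts is that a short context window can hide the part of the prompt that distinguishes two answers. The sketch below counts truncated contexts that precede more than one distinct first answer character. It is an assumption about what the diagnostic measures, it does not attempt to reproduce the semantic-coverage figure, and the records are invented for illustration.

```python
from collections import defaultdict


def ambiguous_branch_contexts(records, context_size):
    """Count truncated contexts followed by more than one distinct target."""
    targets_by_context = defaultdict(set)
    for prompt, answer in records:
        # Only the last context_size characters are visible at the branch.
        context = prompt[-context_size:]
        targets_by_context[context].add(answer[0])
    return sum(1 for targets in targets_by_context.values() if len(targets) > 1)


records = [
    ("Q: what color is the sky? A: ", "blue"),
    ("Q: what color is the sun? A: ", "yellow"),
    ("Q: what shape is the sun? A: ", "round"),
]
short_window = ambiguous_branch_contexts(records, 16)
long_window = ambiguous_branch_contexts(records, 24)
```

Here a 16-character window cannot tell the two sun questions apart, while a 24-character window can, which mirrors the direction of the table: ambiguity disappears as the context grows.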

Latest branch-context gate smoke:

Signal	Context 16	Context 80
Run	`runs/transformer-answer-v0.43-branch-context-gate-smoke-dim4-context16/`	`runs/transformer-answer-v0.43-branch-context-gate-smoke-dim4-context80/`
Required gate	`true`	`true`
Gate status	failed	passed
Requested direct steps	`5`	`1`
Actual direct steps	`0`	`1`
Training skipped	`true`	`false`
Promotion status	guardrail evidence only	guardrail evidence only
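
The gate behaviour in the table (requested steps collapse to zero when a required gate fails) can be sketched as a small planning function. The pass criterion used here, every record covered, and all names are assumptions made for illustration.

```python
def plan_direct_steps(requested_steps, covered_records, total_records,
                      gate_required=True):
    """Decide how many direct-answer steps to run given context coverage."""
    gate_passed = total_records > 0 and covered_records == total_records
    skipped = gate_required and not gate_passed
    return {
        "gate_status": "passed" if gate_passed else "failed",
        "requested_direct_steps": requested_steps,
        "actual_direct_steps": 0 if skipped else requested_steps,
        "training_skipped": skipped,
    }


context16 = plan_direct_steps(5, covered_records=0, total_records=219)
context80 = plan_direct_steps(1, covered_records=219, total_records=219)
```

Under these assumptions `context16` records `0` actual steps with training skipped, and `context80` runs its single requested step.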

Latest branch-only snapshot smoke:

Signal	Initial smoke	Repair/contrast screen	Branch-batch screen
Run	`runs/transformer-answer-v0.43-branch-context-gated-branchonly-smoke-dim4-context80/`	`runs/transformer-answer-v0.43-branchonly-periodic-repair-contrast50-dim8-context80/`	`runs/transformer-answer-v0.43-branchonly-branch-batch-dim8-context80/`
Context size	`80`	`80`	`80`
Embedding/feed-forward dim	`4/8`	`8/16`	`8/16`
Snapshot mode	`branch-only`	`branch-only`	`branch-only`
Required gate	passed, `219/219` semantic records covered	passed, `219/219` semantic records covered	passed, `219/219` semantic records covered
Requested/actual direct steps	`5/5`	`100/100`	`50/50`
JSONL greedy evals skipped	`true`	`true`	`true`
QA branch profile	all `"x"` -> all `"r"`; `1/8` final	all space -> all `"a"`; `0/8` final	all space -> all `"a"`; `0/8` final

Direct loss signal	smoke only	interval train loss `6.7890 -> 6.4326`	interval train loss `3.4614 -> 3.1976`
Promotion status	screening efficiency evidence only	rejected screening evidence	rejected screening evidence

Latest branch-diversity target smoke:

Signal	Value
Run	`runs/transformer-answer-v0.43-branch-diversity-target-smoke-dim4-context80/`
Context gate	passed, `219/219` semantic records covered
Direct steps	`5/5`
Snapshot mode	`branch-only`
Diversity target	failed, `0/9` multi-target profiles passed
Final QA target/predicted unique	`8` / `1`
Final QA dominant prediction	`"r"` at rate `1.0`
Final QA target-token coverage	`0.125`
Promotion status	explicit target evidence only
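
The profile figures above (unique targets, unique predictions, dominant prediction rate, target-token coverage) are consistent with the following reading, sketched here as an assumption: coverage is the share of distinct target tokens that appear at least once among the top-1 predictions. The target characters are invented.

```python
from collections import Counter


def branch_profile(targets, predictions):
    """Summarize top-1 branch predictions against their targets."""
    counts = Counter(predictions)
    dominant, dominant_count = counts.most_common(1)[0]
    target_set = set(targets)
    covered = target_set & set(predictions)
    return {
        "target_unique": len(target_set),
        "predicted_unique": len(counts),
        "dominant_prediction": dominant,
        "dominant_rate": dominant_count / len(predictions),
        "target_token_coverage": len(covered) / len(target_set),
        "accuracy": sum(t == p for t, p in zip(targets, predictions)) / len(targets),
    }


targets = list("rabcdefg")          # eight distinct first answer characters
collapsed = branch_profile(targets, ["r"] * 8)
```

A model that always answers `"r"` yields `8` unique targets, `1` unique prediction, a dominant rate of `1.0`, and coverage `0.125`, the same pattern as the table. If the single predicted token is not one of the targets, coverage drops to `0.0`.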

Latest branch-diversity training smoke:

Signal	Value
Run	`runs/transformer-answer-v0.43-branch-diversity-train-smoke-dim4-context80/`
Mode	`branch-diversity-unlikelihood`
Context gate	passed, `219/219` semantic records covered
Direct steps	`10/10`
Snapshot mode	`branch-only`
Diversity target	failed, `0/9` multi-target profiles passed
Final QA target/predicted unique	`8` / `1`
Final QA dominant prediction	all `"b"`
Final QA target-token coverage	`0.125`
Promotion status	rejected training-mode evidence

Latest branch-diversity freeze-bias smoke:

Signal	Value
Run	`runs/transformer-answer-v0.43-branch-diversity-freezebias-smoke-dim4-context80/`
Mode	`branch-diversity-unlikelihood`
Stabilizer	`--direct-answer-freeze-output-bias`
Context gate	passed, `219/219` semantic records covered
Direct steps	`50/50`
Snapshot mode	`branch-only`
Diversity target	failed, `0/9` multi-target profiles passed
Direct answer train loss	`3.6149 -> 3.5016`
Final QA target/predicted unique	`8` / `1`
Final QA dominant prediction	all `"w"`
Final QA target-token coverage	`0.0`
Promotion status	rejected stabilizer evidence

Latest branch-target softmax smoke:

Signal	Value
Run	`runs/transformer-answer-v0.43-branch-target-softmax-freezebias-smoke-dim4-context80/`
Mode	`branch-target-softmax-unlikelihood`
Stabilizer	`--direct-answer-freeze-output-bias`
Context gate	passed, `219/219` semantic records covered
Direct steps	`50/50`
Snapshot mode	`branch-only`
Diversity target	failed, `0/9` multi-target profiles passed
Composite train loss	`5.6671 -> 5.5820`
Best QA predicted unique	`2/8` at step `20`
Final QA target/predicted unique	`8` / `1`
Final QA dominant prediction	all `"w"`
Final QA target-token coverage	`0.0`
Promotion status	rejected target-set evidence

Latest branch restore-best smoke:

Signal	Value
Run	`runs/transformer-answer-v0.43-branch-target-softmax-restorebest-smoke-dim4-context80/`
Mode	`branch-target-softmax-unlikelihood`
Stabilizers	`--direct-answer-freeze-output-bias`, `--direct-answer-restore-best-branch-snapshot`
Context gate	passed, `219/219` semantic records covered
Direct steps	`50/50`
Restored best branch snapshot	yes, from step `40`
Best branch score	`[0.0, 0.0, -9.0, 0.0, 0.0946, 0.1409, 0.0]`
Snapshot mode	`branch-only`
Diversity target	failed, `0/9` multi-target profiles passed
Final QA target/predicted unique	`8` / `1`
Final QA dominant prediction	all `"u"`
Final QA target-token coverage	`0.125`
Promotion status	rejected guardrail evidence

Latest prompt-prefix projection smoke:

Signal	Value
Run	`runs/transformer-answer-v0.43-prompt-prefix-target-softmax-restorebest-smoke-dim4-context80/`
Representation option	`--use-prompt-prefix-projection`
Mode	`branch-target-softmax-unlikelihood`
Stabilizers	`--direct-answer-freeze-output-bias`, `--direct-answer-restore-best-branch-snapshot`
Context gate	passed, `219/219` semantic records covered
Direct steps	`50/50`
Prompt-prefix projection movement	all `20` parameters moved, max abs about `0.0942`
Composite train loss	`5.6649 -> 5.5679`
Restored best branch snapshot	yes, from step `40`
Diversity target	failed, `0/9` multi-target profiles passed
Final QA target/predicted unique	`8` / `1`
Final QA dominant prediction	all `"u"`
Final QA target-token coverage	`0.125`
Promotion status	rejected representation evidence

Latest prompt-position projection smoke:

Signal	Value
Run	`runs/transformer-answer-v0.43-prompt-position-target-softmax-restorebest-smoke-dim4-context80/`
Representation option	`--use-prompt-position-projection`
Mode	`branch-target-softmax-unlikelihood`
Stabilizers	`--direct-answer-freeze-output-bias`, `--direct-answer-restore-best-branch-snapshot`
Context gate	passed, `219/219` semantic records covered
Direct steps	`50/50`
Prompt-position projection movement	`1108/1284` parameters moved, max abs about `0.0942`
Composite train loss	`5.6649 -> 5.5679`
Restored best branch snapshot	yes, from step `40`
Diversity target	failed, `0/9` multi-target profiles passed
Final QA target/predicted unique	`8` / `1`
Final QA dominant prediction	all `"u"`
Final QA target-token coverage	`0.125`
Promotion status	rejected representation evidence

Latest branch-target margin smoke:

Signal	Value
Run	`runs/transformer-answer-v0.43-branch-target-margin-prompt-position-smoke-dim4-context80/`
Mode	`branch-target-margin-unlikelihood`
Representation option	`--use-prompt-position-projection`
Stabilizers	`--direct-answer-freeze-output-bias`, `--direct-answer-restore-best-branch-snapshot`
Context gate	passed, `219/219` semantic records covered
Direct steps	`50/50`
Prompt-position projection movement	`1108/1284` parameters moved, max abs about `0.1096`
Train loss	`4.8973 -> 4.7784`
Restored best branch snapshot	yes, from step `40`
Diversity target	failed, `0/9` multi-target profiles passed
Final QA target/predicted unique	`8` / `1`
Final QA dominant prediction	all `"u"`
Final QA target-token coverage	`0.125`
Promotion status	rejected target-margin evidence

Latest branch-representation contrast smoke:

Signal	Value
Run	`runs/transformer-answer-v0.43-branch-representation-contrast50-prompt-position-smoke-dim4-context80/`
Mode	`branch-representation-contrast-unlikelihood`
Representation option	`--use-prompt-position-projection`
Representation contrast weight	`50.0`
Stabilizers	`--direct-answer-freeze-output-bias`, `--direct-answer-restore-best-branch-snapshot`
Context gate	passed, `219/219` semantic records covered
Direct steps	`50/50`
Snapshot diagnostic	`branch_representation_profiles`
Train loss	`53.5827 -> 53.4342`
Restored best branch snapshot	yes, from step `40`
Diversity target	failed, `0/9` multi-target profiles passed
Final QA target/predicted unique	`8` / `1`
Final QA dominant prediction	all `"u"`
Final QA target-token coverage	`0.125`
Final QA different-target hidden distance	avg about `0.00107`, max about `0.00237`
Promotion status	rejected representation-contrast evidence
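
The different-target hidden distance can be read as a pairwise statistic over branch hidden states whose targets differ. The sketch below assumes Euclidean distance and plain Python lists for the hidden vectors; the metric and the names are assumptions, not a description of the `branch_representation_profiles` diagnostic itself.

```python
import math
from itertools import combinations


def different_target_distances(hidden_states, targets):
    """Average and maximum distance between hidden states with different targets."""
    distances = [
        math.dist(hidden_states[i], hidden_states[j])
        for i, j in combinations(range(len(targets)), 2)
        if targets[i] != targets[j]
    ]
    if not distances:
        return {"avg": 0.0, "max": 0.0}
    return {"avg": sum(distances) / len(distances), "max": max(distances)}


stats = different_target_distances(
    [[0.0, 0.0], [3.0, 4.0], [0.0, 0.0]],
    ["a", "b", "a"],
)
```

Values near zero, as in the table, mean that contexts with different targets reach the language-model head as nearly identical vectors, which leaves the head little to separate them with.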

Latest branch-representation capacity smoke:

Signal	Value
Run	`runs/transformer-answer-v0.43-branch-representation-contrast50-prompt-position-smoke-dim8-context80-steps40/`
Mode	`branch-representation-contrast-unlikelihood`
Embedding/feed-forward dim	`8/16`
Representation option	`--use-prompt-position-projection`
Representation contrast weight	`50.0`
Stabilizers	`--direct-answer-freeze-output-bias`, `--direct-answer-restore-best-branch-snapshot`
Context gate	passed, `219/219` semantic records covered
Direct steps	`40/40`; a 50-step dim8 screen was too slow for the regular loop
Train loss	`53.6111 -> 53.5752`
Restored best branch snapshot	yes, from step `10`
Diversity target	failed, `0/9` multi-target profiles passed
Final QA target/predicted unique	`8` / `1`
Final QA dominant prediction	all `"u"`
Final QA target-token coverage	`0.125`
Final QA different-target hidden distance	avg about `0.00209`, max about `0.00367`
Promotion status	rejected capacity evidence

Latest prompt-position scale smoke:

Signal	Value
Run	`runs/transformer-answer-v0.43-prompt-position-scale32-repcontrast50-smoke-dim4-context80/`
Mode	`branch-representation-contrast-unlikelihood`
Representation option	`--use-prompt-position-projection`
Prompt-position scale	`32.0`
Representation contrast weight	`50.0`
Stabilizers	`--direct-answer-freeze-output-bias`, `--direct-answer-restore-best-branch-snapshot`
Context gate	passed, `219/219` semantic records covered
Direct steps	`50/50`
Train loss	`55.3835 -> 50.8435`
Prompt-position parameters moved	`1108/1284`, max absolute value about `0.07087`
Restored best branch snapshot	yes, from step `40`
Diversity target	failed, `0/9` multi-target profiles passed
Final QA target/predicted unique	`8` / `1`
Final QA dominant prediction	all `"u"`
Final QA target-token coverage	`0.125`
Final QA different-target hidden distance	restored avg about `0.01235`, max about `0.03610`; raw step-50 avg about `0.4115` before restore
Promotion status	rejected prompt-signal scale evidence

Latest pre-layer-norm structural smoke:

Signal	Value
Run	`runs/transformer-answer-v0.44-prelayernorm-repcontrast50-prompt-position-smoke-dim4-context80/`
Mode	`branch-representation-contrast-unlikelihood`
Architecture option	`--use-pre-layer-norm`
Representation option	`--use-prompt-position-projection`
Representation contrast weight	`50.0`
Stabilizers	`--direct-answer-freeze-output-bias`, `--direct-answer-restore-best-branch-snapshot`
Context gate	passed, `219/219` semantic records covered
Direct steps	`50/50`
Train loss	final interval `43.8918`
Prompt-position parameters moved	`1108/1284`, max absolute value about `0.44679`
Final-norm parameters moved	`8/8`, max absolute value about `2.6389`
Restored best branch snapshot	no; step `50` was best
Diversity target	failed, `0/9` multi-target profiles passed
Final QA target/predicted unique	`8` / `1`
Final QA dominant prediction	all `"y"`
Final QA target-token coverage	`0.125`
Final QA different-target hidden distance	avg about `0.2835`, max about `0.5151`
Partial diversity	`7/9` multi-target profiles no longer fully collapsed
Promotion status	partial structural evidence, rejected for promotion

Latest target-balanced branch-batch smoke:

Signal	Value
Run	`runs/transformer-answer-v0.44-target-balanced-prelayernorm-repcontrast50-prompt-position-smoke-dim4-context80/`
Mode	`branch-balanced-representation-contrast-unlikelihood`
Architecture option	`--use-pre-layer-norm`
Representation option	`--use-prompt-position-projection`
Batch sampler	target-bucket balanced branch batch
Representation contrast weight	`50.0`
Stabilizers	`--direct-answer-freeze-output-bias`, `--direct-answer-restore-best-branch-snapshot`
Context gate	passed, `219/219` semantic records covered
Direct steps	`50/50`
Train loss	final interval `50.6619`
Prompt-position parameters moved	`516/1284`, max absolute value about `0.05881`
Final-norm parameters moved	`8/8`, max absolute value about `1.0013`
Restored best branch snapshot	yes, restored to baseline step `0`
Diversity target	failed, `0/9` multi-target profiles passed
Final QA target/predicted unique	`8` / `1`
Final QA dominant prediction	all `"n"`
Final QA target-token coverage	`0.125`
Final QA different-target hidden distance	restored avg about `0.1261`, max about `0.2476`
Promotion status	rejected sampler evidence
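
A target-bucket balanced batch can be sketched as round-robin sampling over per-target buckets, so frequent targets do not dominate a batch. The record shape, the function name, and the round-robin policy are assumptions for illustration.

```python
import random
from collections import defaultdict


def target_balanced_batch(records, batch_size, rng):
    """Sample a batch that cycles through target buckets in random order."""
    buckets = defaultdict(list)
    for record in records:
        buckets[record["target"]].append(record)
    targets = sorted(buckets)
    rng.shuffle(targets)
    batch = []
    for index in range(batch_size):
        target = targets[index % len(targets)]
        # Sampling with replacement lets rare targets fill their share.
        batch.append(rng.choice(buckets[target]))
    return batch


records = [{"id": i, "target": "a"} for i in range(6)]
records += [{"id": 6, "target": "b"}, {"id": 7, "target": "c"}]
batch = target_balanced_batch(records, batch_size=6, rng=random.Random(0))
```

Although `"a"` makes up six of the eight records, a batch of six contains each target exactly twice.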

Latest branch-rank diagnostic smoke:

Signal	Value
Run	`runs/transformer-answer-v0.45-branch-rank-diagnostic-smoke-dim4-context80/`
Diagnostic	branch target rank, top-3/top-5 target coverage, and failed-record top predictions
Architecture option	`--use-pre-layer-norm`
Representation option	`--use-prompt-position-projection`
Snapshot mode	`branch-only`
Context gate	passed, `219/219` semantic records covered
Direct steps	`1/1`
Final QA dominant prediction	all `"n"`
Final QA average target rank	`14.25`
Final QA top-3/top-5 target coverage	`0.125` / `0.125`
Final heldout dominant prediction	all `"n"`
Final heldout average target rank	`14.25`
Final heldout top-3/top-5 target coverage	`0.125` / `0.125`
Promotion status	diagnostic evidence only
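
The rank figures can be computed from branch logits along these lines. This sketch assumes a 1-based rank where ties count against the target, and top-k coverage as the share of records whose target rank is at most k; both conventions are assumptions.

```python
def target_rank(logits, target):
    """1-based rank of the target logit; ties count against the target."""
    return 1 + sum(
        1 for index, value in enumerate(logits)
        if index != target and value >= logits[target]
    )


def rank_summary(rows, ks=(3, 5)):
    """rows is a list of (logits, target_index) pairs."""
    ranks = [target_rank(logits, target) for logits, target in rows]
    summary = {"average_target_rank": sum(ranks) / len(ranks)}
    for k in ks:
        summary[f"top_{k}_coverage"] = sum(r <= k for r in ranks) / len(ranks)
    return summary


logits = [0.1, 0.5, 0.3, 0.2, 0.0, -1.0]
summary = rank_summary([(logits, 1), (logits, 4), (logits, 5)])
```

The three targets rank first, fifth, and sixth, so the average rank is `4.0`, one of three records is inside the top 3, and two of three are inside the top 5.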

Latest output-binding repair smoke:

Signal	Value
Run	`runs/transformer-answer-v0.46-output-binding-rankscore-smoke-dim4-context80/`
Mode	`branch-output-binding-unlikelihood`
Architecture option	`--use-pre-layer-norm`
Representation option	`--use-prompt-position-projection`
Binding weight	`2.0`
Stabilizers	`--direct-answer-freeze-output-bias`, rank-aware `--direct-answer-restore-best-branch-snapshot`
Context gate	passed, `219/219` semantic records covered
Direct steps	`20/20`
Train loss	`8.7064 -> 8.2205`
Restored best branch snapshot	no; step `20` was best by aggregate rank-aware score
Diversity target	failed, `0/9` multi-target profiles passed
Final QA target/predicted unique	`8` / `2`
Final QA dominant predictions	wrong `"l"`/`"j"` branch tokens
Final QA average target rank	`17.375 -> 14.125`
Final QA top-3/top-5 target coverage	`0.125 -> 0.0` / `0.125 -> 0.25`
Final heldout average target rank	`17.25 -> 14.375`
Final heldout top-3/top-5 target coverage	`0.125 -> 0.0` / `0.125 -> 0.25`
Promotion status	rejected output-binding evidence

Latest rank-margin repair smoke:

Signal	Value
Run	`runs/transformer-answer-v0.47-rank-margin-steps50-smoke-dim4-context80/`
Mode	`branch-rank-margin-unlikelihood`
Architecture option	`--use-pre-layer-norm`
Representation option	`--use-prompt-position-projection`
Hard wrong tokens	`5`
Margin weight	`2.0`
Stabilizers	`--direct-answer-freeze-output-bias`, rank-aware `--direct-answer-restore-best-branch-snapshot`
Context gate	passed, `219/219` semantic records covered
Direct steps	`50/50`
Train loss	`7.3649 -> 6.1629`
Restored best branch snapshot	yes, restored from step `40`
Diversity target	failed, `0/9` multi-target profiles passed
Final QA target/predicted unique	`8` / `1`
Final QA dominant prediction	wrong `"n"`
Final QA average target rank	`17.375 -> 9.0`
Final QA target-token coverage	`0.0 -> 0.125`
Final QA top-3/top-5 target coverage	`0.125 -> 0.25` / `0.125 -> 0.5`
Final heldout average target rank	`17.25 -> 9.0`
Final heldout target-token coverage	`0.0 -> 0.125`
Final heldout top-3/top-5 target coverage	`0.125 -> 0.25` / `0.125 -> 0.375`
Promotion status	rejected rank-lift evidence
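
A rank-margin term with a fixed number of hard wrong tokens can be sketched as a hinge loss against the highest-scoring wrong logits. The table gives only the hard-wrong-token count and the margin weight; the margin size of `1.0`, the averaging, and the function name are assumptions.

```python
def rank_margin_loss(logits, target, hard_wrong_tokens=5,
                     margin=1.0, margin_weight=2.0):
    """Hinge loss pushing the target above its hardest wrong competitors."""
    wrong = sorted(
        (value for index, value in enumerate(logits) if index != target),
        reverse=True,
    )[:hard_wrong_tokens]
    hinge = [max(0.0, margin - (logits[target] - value)) for value in wrong]
    if not hinge:
        return 0.0
    return margin_weight * sum(hinge) / len(hinge)


loss = rank_margin_loss([2.0, 1.5, 0.0, -1.0], target=0, hard_wrong_tokens=2)
```

Only competitors within the margin contribute, so a term like this can raise the average target rank without necessarily making the target top-1, which is the pattern the table reports.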

Latest balanced rank-margin repair smoke:

Signal	Value
Run	`runs/transformer-answer-v0.48-balanced-rank-margin-smoke-dim4-context80/`
Mode	`branch-balanced-rank-margin-unlikelihood`
Architecture option	`--use-pre-layer-norm`
Representation option	`--use-prompt-position-projection`
Batch sampler	target-balanced branch batch
Hard wrong tokens	`5`
Margin weight	`2.0`
Stabilizers	`--direct-answer-freeze-output-bias`, rank-aware `--direct-answer-restore-best-branch-snapshot`
Context gate	passed, `219/219` semantic records covered
Direct steps	`50/50`
Train loss	`7.2303 -> 6.3662`
Restored best branch snapshot	no; step `50` was best by aggregate rank-aware score
Diversity target	failed, `0/9` multi-target profiles passed
Final QA target/predicted unique	`8` / `2`
Final QA dominant predictions	wrong `"a"`/`"n"` branch tokens
Final QA average target rank	`17.375 -> 9.375`
Final QA target-token coverage	`0.0 -> 0.125`
Final QA top-3/top-5 target coverage	`0.125 -> 0.375` / `0.125 -> 0.5`
Final heldout average target rank	`17.25 -> 9.625`
Final heldout target-token coverage	`0.0 -> 0.125`
Final heldout top-3/top-5 target coverage	`0.125 -> 0.25` / `0.125 -> 0.5`
Promotion status	rejected balanced rank-lift evidence

Latest top-one hard-negative rank-margin smoke:

Signal	Value
Run	`runs/transformer-answer-v0.49-balanced-rank-margin-top1-smoke-dim4-context80/`
Mode	`branch-balanced-rank-margin-unlikelihood`
Architecture option	`--use-pre-layer-norm`
Representation option	`--use-prompt-position-projection`
Batch sampler	target-balanced branch batch
Hard wrong tokens	`1`
Margin weight	`2.0`
Stabilizers	`--direct-answer-freeze-output-bias`, rank-aware `--direct-answer-restore-best-branch-snapshot`
Context gate	passed, `219/219` semantic records covered
Direct steps	`50/50`
Train loss	`7.3512 -> 6.3642`
Restored best branch snapshot	yes, restored from step `10`
Diversity target	failed, `0/9` multi-target profiles passed
Final QA target/predicted unique	`8` / `1`
Final QA dominant prediction	wrong `"n"`
Final QA average target rank	`17.375 -> 12.5`
Final QA target-token coverage	`0.0 -> 0.125`
Final QA top-3/top-5 target coverage	`0.125 -> 0.125` / `0.125 -> 0.25`
Final heldout average target rank	`17.25 -> 12.375`
Final heldout target-token coverage	`0.0 -> 0.125`
Final heldout top-3/top-5 target coverage	`0.125 -> 0.125` / `0.125 -> 0.25`
Promotion status	rejected top-one hard-negative evidence

Latest top-k softmax branch repair smoke:

Signal	Value
Run	`runs/transformer-answer-v0.50-balanced-topk-softmax-w5-smoke-dim4-context80/`
Mode	`branch-balanced-topk-softmax-unlikelihood`
Architecture option	`--use-pre-layer-norm`
Representation option	`--use-prompt-position-projection`
Batch sampler	target-balanced branch batch
Hard wrong tokens	`5`
Restricted-softmax weight	`5.0`
Stabilizers	`--direct-answer-freeze-output-bias`, rank-aware `--direct-answer-restore-best-branch-snapshot`
Context gate	passed, `219/219` semantic records covered
Direct steps	`50/50`
Restored best branch snapshot	yes, restored from step `40`
Diversity target	failed, `0/9` multi-target profiles passed
Final QA target/predicted unique	`8` / `1`
Final QA dominant prediction	wrong `"u"`
Final QA average target rank	`17.375 -> 8.75`
Final QA target-token coverage	`0.0 -> 0.125`
Final QA top-3/top-5 target coverage	`0.125 -> 0.375` / `0.125 -> 0.5`
Final heldout average target rank	`17.25 -> 8.75`
Final heldout target-token coverage	`0.0 -> 0.125`
Final heldout top-3/top-5 target coverage	`0.125 -> 0.375` / `0.125 -> 0.5`
Promotion status	rejected top-k softmax rank-lift evidence
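
A restricted softmax over the target and its hardest wrong tokens can be sketched as cross-entropy computed on that small candidate set instead of the full vocabulary. The candidate selection and the function name are assumptions; the weight default matches the table.

```python
import math


def topk_softmax_loss(logits, target, hard_wrong_tokens=5, weight=5.0):
    """Cross-entropy of the target within {target} plus the top-k wrong tokens."""
    wrong = sorted(
        (value for index, value in enumerate(logits) if index != target),
        reverse=True,
    )[:hard_wrong_tokens]
    candidates = [logits[target]] + wrong
    peak = max(candidates)
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in candidates))
    return weight * (log_norm - logits[target])


flat = topk_softmax_loss([0.0] * 10, target=0)
```

With ten equal logits and five hard wrong tokens the candidate set has six entries, so the unweighted loss is `log(6)`. Tokens outside the top-k set do not affect the value.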

Latest foundation-stack smoke:

Signal	Value
Run	`runs/transformer-v0.51-foundation-stack-smoke/`
Checkpoint format	`quarklm-transformer-v2`
Optimizer	`adamw` with saved `optimizer_state.json`
Schedule / accumulation	warmup `1`, decay `2`, gradient accumulation `2`
Architecture switches	`--attention-heads 2`, `--use-rms-norm`, `--use-gated-mlp`, `--tie-output-embeddings`, `--use-rotary-positions`
Runtime switches	`--use-kv-cache-path`, eval `--use-kv-cache`, top-k/top-p/temperature/repetition controls
Eval artifacts	`eval.json` and replayable `eval_samples.jsonl` with token traces
Steps	`2/2` language-model smoke steps
Validation status	mechanics smoke completed; transformer tests pass
Promotion status	foundation mechanics evidence only

Latest full-stack top-k branch repair smoke:

Signal	Value
Run	`runs/transformer-answer-v0.52-fullstack-topk-softmax-smoke-dim4-context80/`
Mode	`branch-balanced-topk-softmax-unlikelihood`
Foundation stack	AdamW, gradient accumulation, two heads, RMSNorm, gated MLP, tied output embeddings, rotary positions, cache-aware metadata
Context / representation	context `80`, `--use-pre-layer-norm`, `--use-prompt-position-projection`
Direct steps	`50/50`
Restored best branch snapshot	yes, restored from step `0`
Diversity target	failed, `0/9` multi-target profiles passed
Final QA target/predicted unique	`8` / `3`
Final QA dominant prediction	wrong `"i"`
Final QA average target rank	`13.25`
Final QA target-token coverage	`0.25`
Final QA top-3/top-5 target coverage	`0.25` / `0.375`
Final heldout average target rank	`13.375`
Final heldout target-token coverage	`0.25`
Final heldout top-3/top-5 target coverage	`0.25` / `0.375`
Promotion status	rejected unchanged top-k pressure under full stack

Latest full-stack bidirectional binding branch repair smoke:

Signal	Value
Run	`runs/transformer-answer-v0.53-fullstack-bidir-binding-smoke-dim4-context80/`
Mode	`branch-balanced-bidirectional-binding-unlikelihood`
Foundation stack	AdamW, gradient accumulation, two heads, RMSNorm, gated MLP, tied output embeddings, rotary positions, cache-aware metadata
Context / representation	context `80`, `--use-pre-layer-norm`, `--use-prompt-position-projection`
Binding pressure	row-wise branch target choice plus column-wise target-token ownership across prompt contexts
Unit coverage	focused transformer tests pass, including the context-ownership regression
Direct steps	`50/50`
Restored best branch snapshot	yes, restored from step `40`
Diversity target	failed, `0/9` multi-target profiles passed
Final QA target/predicted unique	`8` / `2`
Final QA dominant prediction	wrong `"a"`
Final QA average target rank	`7.875`
Final QA target-token coverage	`0.125`
Final QA top-3/top-5 target coverage	`0.25` / `0.5`
Step-50 QA note	target-token coverage briefly reached `0.25` with average rank `8.375` before restore selected step `40`
Final heldout average target rank	`9.0`
Final heldout target-token coverage	`0.125`
Final heldout top-3/top-5 target coverage	`0.25` / `0.375`
Promotion status	partial rank-pressure progress; rejected until target coverage is preserved and top-1 branch choices improve
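
The row-wise and column-wise pressures named in the table resemble a symmetric contrastive loss over a context-by-target score matrix. The sketch below assumes a square matrix in which context `i` owns target `i`; the layout, the equal weighting of both directions, and the names are assumptions.

```python
import math


def log_softmax_at(values, index):
    peak = max(values)
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in values))
    return values[index] - log_norm


def bidirectional_binding_loss(scores):
    """scores[i][j]: logit that context i gives to the target owned by context j."""
    n = len(scores)
    # Row-wise: each context should choose its own target among branch targets.
    row_loss = -sum(log_softmax_at(scores[i], i) for i in range(n)) / n
    # Column-wise: each target token should be owned by its own context.
    col_loss = -sum(
        log_softmax_at([scores[r][j] for r in range(n)], j) for j in range(n)
    ) / n
    return row_loss + col_loss


collapsed = bidirectional_binding_loss([[3.0, 0.0], [3.0, 0.0]])
bound = bidirectional_binding_loss([[3.0, 0.0], [0.0, 3.0]])
```

When every context prefers the same token, the column term stays at `log(2)` per target because no context owns anything, so `collapsed` is much larger than `bound`.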

Latest full-stack coverage binding branch repair smoke:

Signal	Value
Run	`runs/transformer-answer-v0.54-fullstack-coverage-binding-smoke-dim4-context80/`
Mode	`branch-balanced-coverage-binding-unlikelihood`
Foundation stack	AdamW, gradient accumulation, two heads, RMSNorm, gated MLP, tied output embeddings, rotary positions, cache-aware metadata
Context / representation	context `80`, `--use-pre-layer-norm`, `--use-prompt-position-projection`
Binding pressure	branch targets versus sibling targets plus hard wrong tokens, with target-set mass coverage guard
Hard wrong tokens	`5`
Unit coverage	focused transformer tests pass, including the hard-wrong-token coverage regression
Direct steps	`50/50`
Restored best branch snapshot	yes, restored from step `0`
Diversity target	failed, `0/9` multi-target profiles passed
Final QA target/predicted unique	`8` / `3`
Final QA dominant prediction	wrong `"i"`
Final QA average target rank	`13.25`
Final QA target-token coverage	`0.25`
Final QA top-3/top-5 target coverage	`0.25` / `0.375`
Training snapshot note	step `50` improved QA average target rank to `8.125`, but target-token coverage collapsed to `0.0` with wrong `"a"` top-1 collapse
Final heldout average target rank	`13.375`
Final heldout target-token coverage	`0.25`
Final heldout top-3/top-5 target coverage	`0.25` / `0.375`
Promotion status	rejected; best-snapshot scoring protected the checkpoint, but the objective traded target coverage away for rank
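
The way best-snapshot scoring can protect a checkpoint from a rank-for-coverage trade is sketched below: snapshots that fall under the baseline coverage in any profile are excluded first, and the remainder are ordered by top-k and rank evidence before prediction diversity. The snapshot keys, the exact tuple order, and all numbers are illustrative assumptions, not values from a run.

```python
def select_best_snapshot(snapshots):
    """Pick the best snapshot; snapshots[0] is treated as the baseline.

    Each snapshot is a dict with hypothetical keys:
      step, coverage (profile -> target-token coverage),
      top5, avg_rank, predicted_unique
    """
    baseline = snapshots[0]

    def eligible(snapshot):
        return all(
            snapshot["coverage"][profile] >= floor
            for profile, floor in baseline["coverage"].items()
        )

    def score(snapshot):
        # Rank/top-k evidence first, generic diversity last.
        return (snapshot["top5"], -snapshot["avg_rank"], snapshot["predicted_unique"])

    # The baseline always meets its own floor, so this list is never empty.
    return max((s for s in snapshots if eligible(s)), key=score)


baseline = {"step": 0, "coverage": {"qa": 0.25, "heldout": 0.25},
            "top5": 0.375, "avg_rank": 13.25, "predicted_unique": 3}
collapsed = {"step": 50, "coverage": {"qa": 0.0, "heldout": 0.0},
             "top5": 0.5, "avg_rank": 8.0, "predicted_unique": 1}
best = select_best_snapshot([baseline, collapsed])
```

Here `best` is the step `0` baseline: the later snapshot has a better rank but falls under the coverage floor, so it is never scored.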

Latest full-stack target-set coverage branch repair smoke:

Signal	Value
Run	`runs/transformer-answer-v0.55-fullstack-target-set-coverage-smoke-dim4-context80/`
Mode	`branch-balanced-target-set-coverage-unlikelihood`
Foundation stack	AdamW, gradient accumulation, two heads, RMSNorm, gated MLP, tied output embeddings, rotary positions, cache-aware metadata
Context / representation	context `80`, `--use-pre-layer-norm`, `--use-prompt-position-projection`
Coverage pressure	branch target set versus hard wrong tokens, without exact-target row loss or cross-context ownership
Positive target CE	`0.0`
Hard wrong tokens	`5`
Unit coverage	focused transformer tests pass, including the target-set-only coverage regression
Direct steps	`50/50`
Restored best branch snapshot	yes, restored from step `0`
Diversity target	failed, `0/9` multi-target profiles passed
Final QA target/predicted unique	`8` / `3`
Final QA dominant prediction	wrong `"i"`
Final QA average target rank	`13.25`
Final QA target-token coverage	`0.25`
Final QA top-3/top-5 target coverage	`0.25` / `0.375`
Training snapshot note	step `50` improved QA average target rank to `10.0`, but target-token coverage collapsed to `0.0` with wrong `"a"` top-1 collapse
Final heldout average target rank	`13.375`
Final heldout target-token coverage	`0.25`
Final heldout top-3/top-5 target coverage	`0.25` / `0.375`
Promotion status	rejected; batch-local target-set mass is not enough to preserve eval target-token coverage

Latest full-stack target-diversity branch repair smoke:

Signal	Value
Run	`runs/transformer-answer-v0.57-fullstack-target-diversity-smoke-dim4-context80/`
Mode	`branch-balanced-target-diversity-unlikelihood`
Foundation stack	AdamW, gradient accumulation, two heads, RMSNorm, gated MLP, tied output embeddings, rotary positions, cache-aware metadata
Context / representation	context `80`, `--use-pre-layer-norm`, `--use-prompt-position-projection`
Diversity pressure	target-set mass plus target-share balance over branch targets
Positive target CE	`0.0`
Hard wrong tokens	`5`
Unit coverage	focused transformer tests pass, including restricted target-set mass and weakest target-share balance regression
Direct steps	`50/50`
Restored best branch snapshot	yes, restored from step `0`
Diversity target	failed, `0/9` multi-target profiles passed
Final QA target/predicted unique	`8` / `3`
Final QA dominant prediction	wrong `"i"`
Final QA average target rank	`13.25`
Final QA target-token coverage	`0.25`
Final QA top-3/top-5 target coverage	`0.25` / `0.375`
Training snapshot note	step `50` improved QA average target rank to `10.0`, but target-token coverage collapsed to `0.0` with wrong `"a"` top-1 collapse
Final heldout average target rank	`13.375`
Final heldout target-token coverage	`0.25`
Final heldout top-3/top-5 target coverage	`0.25` / `0.375`
Promotion status	rejected; batch-local target-share diversity still does not preserve eval-wide target-token coverage

Latest full-stack target-replay coverage branch repair smoke:

Signal	Value
Run	`runs/transformer-answer-v0.58-fullstack-target-replay-coverage-smoke-dim4-context80/`
Mode	`branch-balanced-target-replay-coverage-unlikelihood`
Foundation stack	AdamW, gradient accumulation, two heads, RMSNorm, gated MLP, tied output embeddings, rotary positions, cache-aware metadata
Context / representation	context `80`, `--use-pre-layer-norm`, `--use-prompt-position-projection`
Replay pressure	target-set mass plus target-share balance over admitted branch-pool targets
Positive target CE	`0.0`
Hard wrong tokens	`5`
Unit coverage	focused transformer tests pass, including sampled-batch missing pool-target replay regression
Direct steps	`50/50`
Restored best branch snapshot	yes, restored from step `0`
Diversity target	failed, `0/9` multi-target profiles passed
Final QA target/predicted unique	`8` / `3`
Final QA dominant prediction	wrong `"i"`
Final QA average target rank	`13.25`
Final QA target-token coverage	`0.25`
Final QA top-3/top-5 target coverage	`0.25` / `0.375`
Training snapshot note	step `40` improved QA average target rank to `6.875` and top-5 coverage to `0.5`; by step `50`, QA/heldout top-1 collapsed to wrong `"n"` and target-token coverage had hit `0.0` during training
Final heldout average target rank	`13.375`
Final heldout target-token coverage	`0.25`
Final heldout top-3/top-5 target coverage	`0.25` / `0.375`
Promotion status	rejected; pool-owned replay coverage still does not preserve context-specific target ownership

Latest full-stack context-replay coverage branch repair smoke:

Signal	Value
Run	`runs/transformer-answer-v0.59-fullstack-context-replay-coverage-smoke-dim4-context80/`
Mode	`branch-balanced-context-replay-coverage-unlikelihood`
Foundation stack	AdamW, gradient accumulation, two heads, RMSNorm, gated MLP, tied output embeddings, rotary positions, cache-aware metadata
Context / representation	context `80`, `--use-pre-layer-norm`, `--use-prompt-position-projection`
Replay pressure	target-set mass plus context-owned target share over admitted branch-pool replay contexts
Positive target CE	`0.0`
Hard wrong tokens	`5`
Unit coverage	focused transformer tests pass, including fixed replay-context owned-target share regression
Direct steps	`50/50`
Restored best branch snapshot	yes, restored from step `0`
Diversity target	failed, `0/9` multi-target profiles passed
Final QA target/predicted unique	`8` / `3`
Final QA dominant prediction	wrong `"i"`
Final QA average target rank	`13.25`
Final QA target-token coverage	`0.25`
Final QA top-3/top-5 target coverage	`0.25` / `0.375`
Training snapshot note	step `40` improved QA average target rank to `7.375`, top-3 to `0.375`, and top-5 to `0.5`; by step `50`, QA predicted diversity was only `2/8` and target-token coverage had hit `0.0` during training
Final heldout average target rank	`13.375`
Final heldout target-token coverage	`0.25`
Final heldout top-3/top-5 target coverage	`0.25` / `0.375`
Promotion status	rejected; context-owned replay improves rank/top-k snapshots but still does not preserve target-token coverage

Latest full-stack coverage-floor branch restore smoke:

Signal	Value
Run	`runs/transformer-answer-v0.60-fullstack-context-replay-coverage-floor-metadata-smoke-dim4-context80/`
Mode	`branch-balanced-context-replay-coverage-unlikelihood`
Scoring guard	profile-wise target-token coverage floor before rank/top-k scoring
Snapshot metadata	direct-answer JSONL rows include `branch_target_coverage_by_profile`
Foundation stack	AdamW, gradient accumulation, two heads, RMSNorm, gated MLP, tied output embeddings, rotary positions, cache-aware metadata
Context / representation	context `80`, `--use-pre-layer-norm`, `--use-prompt-position-projection`
Positive target CE	`0.0`
Hard wrong tokens	`5`
Unit coverage	focused transformer tests pass, including profile-wise coverage-floor regression
Direct steps	`50/50`
Direct-answer JSONL rows	`7` clean rows
Restored best branch snapshot	yes, restored from step `0`
Baseline coverage floor	`qa` `0.25`, `heldout` `0.25`, `admissions` `0.1429`, minimum profile `0.0714`
Diversity target	failed, `0/9` multi-target profiles passed
Final QA target/predicted unique	`8` / `3`
Final QA dominant prediction	wrong `"i"`
Final QA average target rank	`13.25`
Final QA target-token coverage	`0.25`
Final QA top-3/top-5 target coverage	`0.25` / `0.375`
Training snapshot note	step `40` improved QA average target rank to `7.375`, top-3 to `0.375`, and top-5 to `0.5`, but regressed profile coverage and was ineligible for restore
Final heldout average target rank	`13.375`
Final heldout target-token coverage	`0.25`
Final heldout top-3/top-5 target coverage	`0.25` / `0.375`
Promotion status	gate repair accepted; trained model behavior rejected because coverage still collapses during training
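
The scoring guard in this screen applies a coverage floor before any rank or top-k comparison. A minimal sketch of that ordering follows, assuming a snapshot is eligible only when every profile stays at or above its step-0 coverage and that lower average target rank wins among eligible snapshots. The function names and snapshot layout are hypothetical; the floors and the two rank values come from the table, while the step-40 coverage values are illustrative.

```python
BASELINE_FLOOR = {"qa": 0.25, "heldout": 0.25, "admissions": 0.1429}


def is_eligible(coverage_by_profile, floor=BASELINE_FLOOR, tol=1e-9):
    """A snapshot is eligible only if no profile drops below its baseline coverage."""
    return all(
        coverage_by_profile.get(profile, 0.0) + tol >= minimum
        for profile, minimum in floor.items()
    )


def pick_restore_step(snapshots):
    """Apply the coverage floor first, then prefer the lowest average target rank."""
    eligible = [s for s in snapshots if is_eligible(s["coverage"])]
    return min(eligible, key=lambda s: s["average_target_rank"])["step"]


snapshots = [
    {"step": 0, "average_target_rank": 13.25,
     "coverage": {"qa": 0.25, "heldout": 0.25, "admissions": 0.1429}},
    # Rank is from the table; these coverage values are illustrative, not measured.
    {"step": 40, "average_target_rank": 7.375,
     "coverage": {"qa": 0.125, "heldout": 0.125, "admissions": 0.1429}},
]
restore_step = pick_restore_step(snapshots)
```

With the floor applied first, the step-40 snapshot never reaches the rank comparison, so the restore falls back to step `0` despite the better rank.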

Latest full-stack covered-target anchor branch repair smoke:

Signal	Value
Run	`runs/transformer-answer-v0.61-fullstack-context-coverage-anchor-smoke-dim4-context80/`
Mode	`branch-balanced-context-coverage-anchor-unlikelihood`
Scoring guard	profile-wise target-token coverage floor before rank/top-k scoring
Anchor pressure	covered replay branches add a cross-entropy (CE) term that contrasts the target with replay targets and hard wrong tokens
Foundation stack	AdamW, gradient accumulation, two heads, RMSNorm, gated MLP, tied output embeddings, rotary positions, cache-aware metadata
Context / representation	context `80`, `--use-pre-layer-norm`, `--use-prompt-position-projection`
Positive target CE	`0.0`
Hard wrong tokens	`5`
Unit coverage	focused transformer tests pass, including anchored-vs-unanchored covered branch regression
Direct steps	`50/50`
Direct-answer JSONL rows	`7` clean rows
Restored best branch snapshot	yes, restored from step `0`
Diversity target	failed, `0/9` multi-target profiles passed
Final QA target/predicted unique	`8` / `3`
Final QA dominant prediction	wrong `"i"`
Final QA average target rank	`13.25`
Final QA target-token coverage	`0.25`
Final QA top-3/top-5 target coverage	`0.25` / `0.375`
Training snapshot note	snapshots collapsed harder to covered wrong `"i"`; QA/heldout predicted diversity fell to `1/8`, target-token coverage fell to `0.125`, and average target rank rose above `21`
Final heldout average target rank	`13.375`
Final heldout target-token coverage	`0.25`
Final heldout top-3/top-5 target coverage	`0.25` / `0.375`
Promotion status	rejected; global covered-target anchoring over-protects one covered token instead of preserving coverage diversity

Latest full-stack coverage-preserving deficit branch repair smoke:

Signal	Value
Run	`runs/transformer-answer-v0.65-fullstack-coverage-preserving-deficit-smoke-dim4-context80/`
Mode	`branch-balanced-context-coverage-preserving-deficit-unlikelihood`
Scoring guard	profile-wise target-token coverage floor before rank/top-k scoring
Deficit pressure	replay target tokens absent from current replay predictions receive target-vs-hard-candidate pressure
Preservation pressure	target tokens currently represented in replay predictions receive target-balanced anchors
Foundation stack	AdamW, gradient accumulation, two heads, RMSNorm, gated MLP, tied output embeddings, rotary positions, cache-aware metadata
Context / representation	context `80`, `--use-pre-layer-norm`, `--use-prompt-position-projection`
Positive target CE	`0.0`
Hard wrong tokens	`5`
Unit coverage	focused transformer tests pass, including missing-target lift and represented-target preservation regressions
Direct steps	`50/50`
Direct-answer JSONL rows	`7` clean rows
Restored best branch snapshot	yes, restored from step `0`
Diversity target	failed, `0/9` multi-target profiles passed
Final QA target/predicted unique	`8` / `3`
Final QA dominant prediction	wrong `"i"`
Final QA average target rank	`13.25`
Final QA target-token coverage	`0.25`
Final QA top-3/top-5 target coverage	`0.25` / `0.375`
Training snapshot note	step `50` reached QA/heldout branch accuracy `1/8`, QA average target rank `7.75`, heldout average target rank `7.125`, and top-5 coverage `0.5`, but both profiles collapsed to predicted diversity `1/8` and target-token coverage `0.125`
Final heldout average target rank	`13.375`
Final heldout target-token coverage	`0.25`
Final heldout top-3/top-5 target coverage	`0.25` / `0.375`
Promotion status	rejected; current-prediction preservation improves rank but over-preserves one represented target token
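
The deficit and preservation pressures above depend on splitting replay targets by whether they currently appear among replay predictions. The split can be sketched as below, assuming both sides are plain token lists; `split_replay_targets` is a hypothetical helper, and the tokens are made up.

```python
def split_replay_targets(replay_targets, replay_predictions):
    """Return (missing, represented) replay target tokens.

    missing: targets absent from current predictions (deficit pressure).
    represented: targets present in current predictions (preservation anchors).
    """
    targets = set(replay_targets)
    represented = targets & set(replay_predictions)
    missing = targets - represented
    return sorted(missing), sorted(represented)


# Toy replay batch: four distinct targets, predictions dominated by "i".
missing, represented = split_replay_targets(
    replay_targets=["i", "n", "a", "o"],
    replay_predictions=["i", "i", "a", "i"],
)
```

Because the represented set is recomputed from current predictions, it shrinks as predictions collapse, which matches the over-preservation of one token noted in the promotion status.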

Latest profile-aware replay-plan smoke:

Signal	Value
Run	`runs/transformer-answer-v0.67-profile-aware-replay-plan-smoke-dim4-context80/`
Mode	`branch-balanced-context-profile-coverage-preserving-deficit-unlikelihood`
Replay plan artifact	`direct_answer_replay_plan.json`
Replay plan size	`9144` branch records and `9144` replay records across `21` profiles
Example profile floors	`qa:place` coverage floor `0.5`; `qa:color` coverage floor `0.0`
Foundation stack	AdamW, two heads, RMSNorm, gated MLP, tied output embeddings, rotary positions, cache-aware metadata
Context / representation	context `80`, `--use-pre-layer-norm`, `--use-prompt-position-projection`
Unit coverage	focused transformer tests pass, including profile-deficit isolation and shared-target source preservation
Branch-context gate	passed across `219/219` semantic records with no ambiguous, colliding, or skipped records
Direct steps	`1/1` bounded smoke
Snapshot mode	`branch-only`; the post-direct candidate snapshot was skipped, and the skip was recorded
Restored best branch snapshot	yes, restored from step `0`
Diversity target	failed, `0/9` multi-target profiles passed
Promotion status	mechanics-readiness evidence only; profile-aware plan exists, but model quality is not promoted

Latest profile-aware full-stack repair screen:

Signal	Value
Run	`runs/transformer-answer-v0.68-fullstack-profile-aware-preserving-deficit-smoke-dim4-context80/`
Mode	`branch-balanced-context-profile-coverage-preserving-deficit-unlikelihood`
Replay plan artifact	`direct_answer_replay_plan.json`
Replay plan size	`9144` branch records and `9144` replay records across `21` profiles
Foundation stack	AdamW, two heads, RMSNorm, gated MLP, tied output embeddings, rotary positions, cache-aware metadata
Context / representation	context `80`, `--use-pre-layer-norm`, `--use-prompt-position-projection`
Branch-context gate	passed across `219/219` semantic records with no ambiguous, colliding, or skipped records
Direct steps	`50/50`
Direct-answer JSONL rows	`7` clean rows
Restored best branch snapshot	yes, restored from step `0`
Diversity target	failed, `0/9` multi-target profiles passed
Final QA target/predicted unique	`8` / `3` after restore
Final QA average target rank	`13.25` after restore
Final QA target-token coverage	`0.25` after restore
Training snapshot note	step `40` improved QA average target rank to `6.5` and top-5 coverage to `0.625`, but QA target-token coverage regressed to `0.125` and predicted diversity collapsed to `1/8`
Final heldout average target rank	`13.375` after restore
Training heldout note	step `40` improved heldout average target rank to `6.875` and top-5 coverage to `0.5`, but target-token coverage regressed to `0.125` and predicted diversity collapsed to `1/8`
Promotion status	rejected; profile-aware rank gains still trade away coverage and diversity

Latest profile target-share full-stack screen:

Signal	Value
Run	`runs/transformer-answer-v0.82-fullstack-profile-target-share-smoke-dim4-context80/`
Mode	`branch-balanced-context-profile-target-share-preserving-deficit-unlikelihood`
Artifact stack	experiment intent, corpus hygiene, training plan, candidate quarantine, deterministic verifier, recipe, replay plan, constraint-first report, metrics, tokenizer, optimizer, lessons, checkpoint
Replay plan size	`9144` branch records and `9144` replay records across `21` profiles
Foundation stack	AdamW, two heads, RMSNorm, gated MLP, tied output embeddings, rotary positions, cache-aware metadata
Context / representation	context `80`, `--use-pre-layer-norm`, `--use-prompt-position-projection`
Branch-context gate	passed across `219/219` semantic records with no ambiguous, colliding, or skipped records
Purity gates	no pretrained weights, no pretrained tokenizer, no external embeddings
Direct steps	`50/50`
Direct-answer JSONL rows	`7` clean rows
Restored best branch snapshot	yes, restored from step `0`
Diversity target	failed, `0/9` multi-target profiles passed
Final QA target/predicted unique	`8` / `3` after restore
Final QA average target rank	`13.25` after restore
Final QA target-token coverage	`0.25` after restore
Training snapshot note	step `40` improved QA average target rank to `9.125` and top-5 coverage to `0.375`, but QA target-token coverage regressed to `0.0` and predicted diversity collapsed to `1/8`
Training heldout note	step `40` improved heldout average target rank to `9.25` and top-5 coverage to `0.375`, but heldout target-token coverage regressed to `0.0` and predicted diversity collapsed to `1/8`
Promotion status	rejected; target-share pressure still trades coverage and diversity away for rank

Latest prompt-specific branch ownership full-stack screen:

Signal	Value
Run	`runs/transformer-answer-v0.83-fullstack-prompt-ownership-smoke-dim4-context80/`
Mode	`branch-balanced-context-profile-prompt-ownership-target-share-preserving-deficit-unlikelihood`
Added mechanic	sibling-target margin inside each profile so a replay context is trained to outrank other profile targets
Unit coverage	focused transformer test passes; prompt ownership lifts a context-specific target more than v0.82 target-share pressure
Artifact stack	experiment intent, corpus hygiene, training plan, candidate quarantine, deterministic verifier, recipe, replay plan, constraint-first report, metrics, tokenizer, optimizer, lessons, checkpoint
Replay plan size	`9144` branch records and `9144` replay records across `21` profiles
Branch-context gate	passed across `219/219` semantic records with no ambiguous, colliding, or skipped records
Purity gates	no pretrained weights, no pretrained tokenizer, no external embeddings
Direct steps	`50/50`
Direct-answer JSONL rows	`7` clean rows
Restored best branch snapshot	yes, restored from step `0`
Diversity target	failed, `0/9` multi-target profiles passed
Final QA target/predicted unique	`8` / `3` after restore
Final QA average target rank	`13.25` after restore
Final QA target-token coverage	`0.25` after restore
Training snapshot note	step `50` improved QA average target rank to `8.625`, but QA target-token coverage regressed to `0.0` and predicted diversity collapsed to `1/8`
Training heldout note	step `50` improved heldout average target rank to `8.5`, but heldout target-token coverage regressed to `0.0` and predicted diversity collapsed to `1/8`
Promotion status	rejected; prompt ownership needs coverage-preserving training before rank gains can be trusted

Latest baseline-anchored prompt ownership full-stack screen:

Signal	Value
Run	`runs/transformer-answer-v0.84-fullstack-baseline-anchored-prompt-ownership-smoke-dim4-context80/`
Mode	`branch-balanced-context-profile-baseline-anchored-prompt-ownership-target-share-preserving-deficit-unlikelihood`
Added mechanic	replay preservation uses baseline profile-aware replay predictions instead of current prediction drift
Unit coverage	focused transformer tests pass; baseline prediction overrides are used by profiled replay batches and protect a covered target better than dynamic prediction preservation
Artifact stack	experiment intent, corpus hygiene, training plan, candidate quarantine, deterministic verifier, recipe, replay plan, constraint-first report, metrics, tokenizer, optimizer, lessons, checkpoint
Replay plan size	`9144` branch records and `9144` replay records across `21` profiles
Baseline prediction anchors	`562` recorded and active
Branch-context gate	passed across `219/219` semantic records with no ambiguous, colliding, or skipped records
Purity gates	no pretrained weights, no pretrained tokenizer, no external embeddings
Direct steps	`50/50`
Direct-answer JSONL rows	`7` clean rows
Restored best branch snapshot	yes, restored from step `0`
Diversity target	failed, `0/9` multi-target profiles passed
Final QA target/predicted unique	`8` / `3` after restore
Final QA average target rank	`13.25` after restore
Final QA target-token coverage	`0.25` after restore
Training snapshot note	step `40` improved QA average target rank to `8.0`, but QA target-token coverage regressed to `0.125` and predicted diversity collapsed to `1/8`
Training heldout note	step `40` improved heldout average target rank to `8.375`, but heldout target-token coverage regressed to `0.125` and predicted diversity collapsed to `1/8`
Promotion status	rejected; baseline anchors improve coverage over v0.83 but still miss the full `0.25` coverage floor

Latest baseline-floor update-gated prompt ownership full-stack screen:

Signal	Value
Run	`runs/transformer-answer-v0.85-fullstack-baseline-floor-gated-prompt-ownership-smoke-dim4-context80/`
Mode	`branch-balanced-context-profile-baseline-floor-gated-prompt-ownership-target-share-preserving-deficit-unlikelihood`
Added mechanic	direct-answer updates are rolled back when branch-profile target-token coverage falls below the step-0 baseline floor
Unit coverage	focused transformer tests pass; the new mode records active baseline replay anchors and update-guard accounting
Artifact stack	experiment intent, corpus hygiene, training plan, candidate quarantine, deterministic verifier, recipe, replay plan, constraint-first report, metrics, tokenizer, optimizer, lessons, checkpoint
Replay plan size	`9144` branch records and `9144` replay records across `21` profiles
Baseline prediction anchors	`562` recorded and active
Update guard	checked `50/50` attempted updates; accepted `0`; rejected `50`
Branch-context gate	passed across `219/219` semantic records with no ambiguous, colliding, or skipped records
Purity gates	no pretrained weights, no pretrained tokenizer, no external embeddings
Direct steps	`50/50` attempted
Direct-answer JSONL rows	`7` clean rows
Restored best branch snapshot	yes, restored from step `0`
Diversity target	failed, `0/9` multi-target profiles passed
Final QA target/predicted unique	`8` / `3`
Final QA average target rank	`13.25`
Final QA target-token coverage	`0.25`
Training snapshot note	every recorded trained snapshot preserved QA target-token coverage at `0.25`, but only because every attempted update was rejected
Training heldout note	every recorded trained snapshot preserved heldout target-token coverage at `0.25`, but only because every attempted update was rejected
Promotion status	rejected; the guard prevents unsafe forgetting, but no weight update is accepted and branch diversity still fails
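
The update guard in this screen can be pictured as a snapshot, apply, probe, and rollback loop. The sketch below is a toy under stated assumptions, not the project's implementation: `guarded_update`, the dict-of-weights model, and the toy update and probe are all hypothetical, and the toy is built so that every update breaks the floor, mirroring the `0` accepted and `50` rejected accounting above.

```python
import copy


def guarded_update(weights, apply_update, probe_coverage, baseline_floor):
    """Apply one update; roll back if any profile falls below its baseline floor."""
    saved = copy.deepcopy(weights)
    apply_update(weights)
    coverage = probe_coverage(weights)
    violations = {
        profile: value
        for profile, value in coverage.items()
        if value < baseline_floor.get(profile, 0.0)
    }
    if violations:
        # Restore the pre-update weights in place.
        weights.clear()
        weights.update(saved)
        return False, violations
    return True, {}


BASELINE_FLOOR = {"qa": 0.25, "heldout": 0.25}


def toy_update(weights):
    # Hypothetical update that always moves the single weight by 1.0.
    weights["bias"] += 1.0


def toy_probe(weights):
    # Hypothetical probe: coverage survives only while the weight stays small.
    covered = 0.25 if weights["bias"] < 0.5 else 0.0
    return {"qa": covered, "heldout": covered}


weights = {"bias": 0.0}
accepted = rejected = 0
for _ in range(50):
    ok, _violations = guarded_update(weights, toy_update, toy_probe, BASELINE_FLOOR)
    accepted += ok
    rejected += not ok
```

A guard of this shape preserves the floor by construction, so a run that accepts nothing ends with the step-0 weights and step-0 metrics.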

Latest adaptive baseline-floor prompt ownership full-stack screen:

Signal	Value
Run	`runs/transformer-answer-v0.86-fullstack-baseline-floor-adaptive-prompt-ownership-smoke-dim4-context80/`
Mode	`branch-balanced-context-profile-baseline-floor-adaptive-prompt-ownership-target-share-preserving-deficit-unlikelihood`
Added mechanic	rejected updates are retried at learning-rate scales `1.0`, `0.25`, `0.05`, and `0.01` after restoring model, optimizer, and RNG state
Unit coverage	focused transformer tests pass; the new mode records active baseline replay anchors and adaptive retry accounting
Artifact stack	experiment intent, corpus hygiene, training plan, candidate quarantine, deterministic verifier, recipe, replay plan, constraint-first report, metrics, tokenizer, optimizer, lessons, checkpoint
Replay plan size	`9144` branch records and `9144` replay records across `21` profiles
Baseline prediction anchors	`562` recorded and active
Adaptive scales	`1.0`, `0.25`, `0.05`, `0.01`
Update guard	checked `50/50` steps; attempted `200` scaled updates; accepted `0`; rejected `200`
Branch-context gate	passed across `219/219` semantic records with no ambiguous, colliding, or skipped records
Purity gates	no pretrained weights, no pretrained tokenizer, no external embeddings
Direct steps	`50/50` attempted
Direct-answer JSONL rows	`7` clean rows
Restored best branch snapshot	yes, restored from step `0`
Diversity target	failed, `0/9` multi-target profiles passed
Final QA target/predicted unique	`8` / `3`
Final QA average target rank	`13.25`
Final QA target-token coverage	`0.25`
Training snapshot note	every recorded trained snapshot preserved QA target-token coverage at `0.25`, but only because no adaptive retry was accepted
Training heldout note	every recorded trained snapshot preserved heldout target-token coverage at `0.25`, but only because no adaptive retry was accepted
Promotion status	rejected; smaller learning-rate scales do not make the update safe, which sets up the v0.87 repair-retry screen
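
The adaptive retry mechanic tries the same step at each scale in turn and restores model, optimizer, and RNG state after every rejection. A minimal sketch follows, assuming the state is a plain dict and the RNG is a `random.Random` instance. `retry_at_scales`, `toy_step`, and `toy_is_safe` are hypothetical, and the toy probe is tuned so that even scale `0.01` is unsafe.

```python
import copy
import random

ADAPTIVE_SCALES = (1.0, 0.25, 0.05, 0.01)


def retry_at_scales(state, rng, step_fn, is_safe, scales=ADAPTIVE_SCALES):
    """Retry one update at decreasing scales; return (accepted_scale_or_None, attempts)."""
    saved_state = copy.deepcopy(state)
    saved_rng = rng.getstate()
    attempts = 0
    for scale in scales:
        attempts += 1
        step_fn(state, rng, scale)
        if is_safe(state):
            return scale, attempts
        # Unsafe: restore model, optimizer, and RNG state before the next scale.
        state.clear()
        state.update(copy.deepcopy(saved_state))
        rng.setstate(saved_rng)
    return None, attempts


def toy_step(state, rng, scale):
    # Hypothetical update: the move is at least `scale`, with seeded jitter.
    state["weights"]["w"] += scale * (1.0 + rng.random())
    state["optimizer"]["steps"] += 1


def toy_is_safe(state):
    # Hypothetical floor probe: any move of 0.005 or more breaks the floor.
    return abs(state["weights"]["w"]) < 0.005


state = {"weights": {"w": 0.0}, "optimizer": {"steps": 0}}
rng = random.Random(0)
attempted = accepted = 0
for _ in range(50):
    scale, attempts = retry_at_scales(state, rng, toy_step, toy_is_safe)
    attempted += attempts
    accepted += scale is not None
```

Restoring the RNG along with the weights means each scale sees the same sampled batch, so a rejection reflects the step size and not a different draw.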

Latest repaired baseline-floor prompt ownership full-stack screen:

Signal	Value
Run	`runs/transformer-answer-v0.87-fullstack-baseline-floor-repaired-prompt-ownership-clean-smoke-dim4-context80/`
Mode	`branch-balanced-context-profile-baseline-floor-repaired-prompt-ownership-target-share-preserving-deficit-unlikelihood`
Added mechanic	failed adaptive retries get one bounded baseline-covered anchor repair before the floor probe decides whether to keep or roll back the update
Unit coverage	focused transformer tests pass; the new mode records active baseline replay anchors, repair anchors, repair attempts, and accepted update-shape accounting
Artifact stack	experiment intent, corpus hygiene, training plan, candidate quarantine, deterministic verifier, recipe, replay plan, constraint-first report, metrics, tokenizer, optimizer, lessons, checkpoint
Replay plan size	`9144` branch records and `9144` replay records across `21` profiles
Baseline prediction anchors	`562` recorded and active
Repair anchors	`227` recorded; one repair step per failed retry
Adaptive scales	`1.0`, `0.25`, `0.05`, `0.01`
Update guard	checked `50/50` steps; attempted `200` updates; ran `200` one-step repairs; accepted `0`; rejected `200`
Branch-context gate	passed across `219/219` semantic records with no ambiguous, colliding, or skipped records
Purity gates	no pretrained weights, no pretrained tokenizer, no external embeddings
Direct steps	`50/50` attempted
Direct-answer JSONL rows	`7` clean rows
Restored best branch snapshot	yes, restored from step `0`
Diversity target	failed, `0/9` multi-target profiles passed
Final QA target/predicted unique	`8` / `3`
Final QA average target rank	`13.25`
Final QA target-token coverage	`0.25`
Training snapshot note	every recorded trained snapshot preserved QA target-token coverage at `0.25`, but only because no repaired retry was accepted
Training heldout note	every recorded trained snapshot preserved heldout target-token coverage at `0.25`, but only because no repaired retry was accepted
Promotion status	rejected; post-update repair is insufficient and the next repair needs a floor-preserving objective before optimizer application

Latest objective-side baseline-floor prompt ownership full-stack screen:

Signal	Value
Run	`runs/transformer-answer-v0.88-fullstack-baseline-floor-objective-prompt-ownership-smoke-dim4-context80/`
Mode	`branch-balanced-context-profile-baseline-floor-objective-prompt-ownership-target-share-preserving-deficit-unlikelihood`
Added mechanic	a balanced batch of baseline-covered floor anchors is included in the same direct-answer loss and backward pass as branch-diversity pressure
Unit coverage	focused transformer tests pass; the new mode records objective anchor counts, anchor batch size, anchor weight, and accepted/rejected guard accounting
Artifact stack	experiment intent, corpus hygiene, training plan, candidate quarantine, deterministic verifier, recipe, replay plan, constraint-first report, metrics, tokenizer, optimizer, lessons, checkpoint
Replay plan size	`9144` branch records and `9144` replay records across `21` profiles
Baseline prediction anchors	`562` recorded and active
Objective-side floor anchors	`227` recorded; batch size `32`; weight `10.0`
Adaptive scales	`1.0`, `0.25`, `0.05`, `0.01`
Update guard	checked `50/50` steps; attempted `200` updates; ran `200` objective anchor batches covering `2400` anchor records; accepted `0`; rejected `200`
Branch-context gate	passed across `219/219` semantic records with no ambiguous, colliding, or skipped records
Purity gates	no pretrained weights, no pretrained tokenizer, no external embeddings
Direct steps	`50/50` attempted
Direct-answer JSONL rows	`7` clean rows
Restored best branch snapshot	yes, restored from step `0`
Diversity target	failed, `0/9` multi-target profiles passed
Final QA target/predicted unique	`8` / `3`
Final QA average target rank	`13.25`
Final QA target-token coverage	`0.25`
Training snapshot note	every recorded trained snapshot preserved QA target-token coverage at `0.25`, but only because the guard accepted no updates trained with objective-side floor anchors
Training heldout note	every recorded trained snapshot preserved heldout target-token coverage at `0.25`, but only because the guard accepted no updates trained with objective-side floor anchors
Promotion status	rejected; the combined floor-anchor and branch-pressure objective is insufficient, which sets up the stabilization-only screen

Latest stabilization-only baseline-floor full-stack screen:

Signal	Value
Run	`runs/transformer-answer-v0.89-fullstack-baseline-floor-stabilization-smoke-dim4-context80/`
Mode	`branch-context-profile-baseline-floor-stabilization-unlikelihood`
Added mechanic	guarded attempts train only baseline-covered floor anchors, with branch-diversity pressure removed from the update shape
Unit coverage	focused transformer tests pass; the new mode records stabilization anchor counts, anchor batch size, stabilization batches, and accepted/rejected guard accounting
Artifact stack	experiment intent, corpus hygiene, training plan, candidate quarantine, deterministic verifier, recipe, replay plan, constraint-first report, metrics, tokenizer, optimizer, lessons, checkpoint
Replay plan size	`9144` branch records and `9144` replay records across `21` profiles
Baseline prediction anchors	`562` recorded and active
Stabilization floor anchors	`227` recorded; batch size `32`
Adaptive scales	`1.0`, `0.25`, `0.05`, `0.01`
Update guard	checked `50/50` steps; attempted `200` updates; ran `200` stabilization anchor batches covering `2400` anchor records; accepted `0`; rejected `200`
Branch-context gate	passed across `219/219` semantic records with no ambiguous, colliding, or skipped records
Purity gates	no pretrained weights, no pretrained tokenizer, no external embeddings
Direct steps	`50/50` attempted
Direct-answer JSONL rows	`7` clean rows
Restored best branch snapshot	yes, restored from step `0`
Diversity target	failed, `0/9` multi-target profiles passed
Final QA target/predicted unique	`8` / `3`
Final QA average target rank	`13.25`
Final QA target-token coverage	`0.25`
Training snapshot note	every recorded trained snapshot preserved QA target-token coverage at `0.25`, but only because the guard accepted no stabilization-only floor-anchor updates
Training heldout note	every recorded trained snapshot preserved heldout target-token coverage at `0.25`, but only because the guard accepted no stabilization-only floor-anchor updates
Promotion status	rejected; floor-only anchor updates are insufficient under the current guard, so the next repair should diagnose the guard/update interaction before branch pressure is added back

Latest baseline-floor rejection diagnostics screen:

Signal	Value
Run	`runs/transformer-answer-v0.90-fullstack-baseline-floor-stabilization-diagnostics-smoke-dim4-context80/`
Mode	`branch-context-profile-baseline-floor-stabilization-unlikelihood`
Added mechanic	guard records rejected update-shape counts, rejected scale counts, violation profile counts, diagnostic samples, and worst rejected floor violation
Unit coverage	focused transformer tests pass; the reusable coverage diagnostic helper reports profile deficits and the stabilization guard records rejection diagnostics
Update guard	checked `50/50` steps; attempted `200` updates; accepted `0`; rejected `200`
Rejected update shapes	`stabilization: 200`
Rejected adaptive scales	`1: 50`, `0.25: 50`, `0.05: 50`, `0.01: 50`
Violation profile counts	`heldout: 200`, `admissions: 150`, `glossary: 150`, `qa: 150`, `self: 100`, `learning: 50`, `owner: 50`
Worst rejected floor violation	`learning`, baseline coverage `0.25`, snapshot coverage `0.0`, deficit `0.25`
Branch-context gate	passed across `219/219` semantic records with no ambiguous, colliding, or skipped records
Deterministic verifier	passed with no external model
Diversity target	failed, `0/9` multi-target profiles passed
Promotion status	rejected for model promotion, but diagnostic evidence is usable for the next profile-targeted floor repair
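
The rejection diagnostics above amount to counting, per rejected attempt, which profiles fell below their baseline and tracking the largest deficit. A rough sketch of that bookkeeping follows; `summarize_rejections` and the record layout are hypothetical, and the two sample records are made up apart from the `learning` deficit of `0.25`, which mirrors the worst violation in the table.

```python
from collections import Counter


def summarize_rejections(rejections):
    """Count rejected scales and violating profiles; track the worst deficit.

    rejections: list of {"scale", "baseline", "snapshot"} dicts (hypothetical layout).
    """
    scale_counts = Counter()
    profile_counts = Counter()
    worst = None
    for rejection in rejections:
        scale_counts[rejection["scale"]] += 1
        for profile, floor in rejection["baseline"].items():
            observed = rejection["snapshot"].get(profile, 0.0)
            deficit = floor - observed
            if deficit > 0:
                profile_counts[profile] += 1
                if worst is None or deficit > worst["deficit"]:
                    worst = {"profile": profile, "baseline": floor,
                             "snapshot": observed, "deficit": deficit}
    return scale_counts, profile_counts, worst


baseline = {"heldout": 0.25, "learning": 0.25, "qa": 0.25}
rejections = [
    {"scale": 1.0, "baseline": baseline,
     "snapshot": {"heldout": 0.125, "learning": 0.0, "qa": 0.25}},
    {"scale": 0.25, "baseline": baseline,
     "snapshot": {"heldout": 0.125, "learning": 0.25, "qa": 0.125}},
]
scale_counts, profile_counts, worst = summarize_rejections(rejections)
```

A profile that violates on every attempt, like `heldout` in the table, points at a profile the update shape cannot protect at any tested scale.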

Profile-targeted baseline-floor stabilization screen:

Signal	Value
Run	`runs/transformer-answer-v0.91-fullstack-baseline-floor-profile-targeted-stabilization-smoke-dim4-context80/`
Mode	`branch-context-profile-baseline-floor-profile-targeted-stabilization-unlikelihood`
Added mechanic	guarded attempts train the full baseline-covered floor-anchor profile-target surface instead of a random 32-anchor sample
Unit coverage	focused transformer tests pass; the new mode records profile-target activity, full floor batch sizing, profile-target counts, and source-profile anchor counts
Floor anchors	`227` recorded; requested batch size `227`; `12` profile-target groups
Anchor profile counts	`qa:owner 48`, `qa:place 41`, `fact:owner 40`, `fact:place 40`, `bridge:owner 20`, `bridge:place 16`, `fact:learning 8`, `qa:glossary 6`, `qa:learning 5`, `qa:self 3`
Update guard	checked `50/50` steps; attempted `200` updates; accepted `0`; rejected `200`
Rejected update shapes	`profile_targeted_stabilization: 200`
Rejected adaptive scales	`1: 50`, `0.25: 50`, `0.05: 50`, `0.01: 50`
Violation profile counts	`heldout: 200`, `admissions: 150`, `glossary: 150`, `qa: 150`, `self: 100`, `learning: 50`, `owner: 50`
Worst rejected floor violation	`learning`, baseline coverage `0.25`, snapshot coverage `0.0`, deficit `0.25`
Branch-context gate	passed across `219/219` semantic records with no ambiguous, colliding, or skipped records
Deterministic verifier	passed with no external model
Diversity target	failed, `0/9` multi-target profiles passed
Promotion status	rejected; full profile-target floor coverage alone does not make guarded updates safe

Sequential profile-floor stabilization screen:

Signal	Value
Run	`runs/transformer-answer-v0.92-fullstack-baseline-floor-sequential-profile-stabilization-smoke-dim4-context80/`
Mode	`branch-context-profile-baseline-floor-sequential-profile-stabilization-unlikelihood`
Added mechanic	guarded attempts train source-profile floor-anchor groups sequentially and roll back each unsafe group before trying the next one
Unit coverage	focused transformer tests pass; the new mode records sequential profile attempts, accept/reject counts, no-effective-update attempts, and profile probe samples
Floor anchors	`227` recorded; requested batch size `227`; `12` profile-target groups; `10` source-profile groups
Sequential profile attempts	`2000` attempted; `0` accepted; `2000` rejected; `2400` anchor records
Source-profile rejection counts	each of `bridge:owner`, `bridge:place`, `fact:learning`, `fact:owner`, `fact:place`, `qa:glossary`, `qa:learning`, `qa:owner`, `qa:place`, and `qa:self` rejected `200` times
Update guard	checked `50/50` steps; attempted `200` updates; accepted `0`; rejected `200`; no-effective-update attempts `200`
Rejected update shapes	`sequential_profile_stabilization: 200`
Rejected adaptive scales	`1: 50`, `0.25: 50`, `0.05: 50`, `0.01: 50`
Branch-context gate	passed across `219/219` semantic records with no ambiguous, colliding, or skipped records
Deterministic verifier	passed with no external model
Diversity target	failed, `0/9` multi-target profiles passed
Promotion status	rejected; sequential source-profile repair still cannot produce safe weight movement
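
The counts in the v0.92 table fit a simple nested loop: each checked step retries four adaptive scales, and each retry visits every source-profile group in turn. The sketch below replays that shape with a stand-in guard that always rejects. The function name and loop order are assumptions made for illustration, not the trainer's actual code.

```python
from collections import Counter

# Values copied from the v0.92 table above.
STEPS_CHECKED = 50
ADAPTIVE_SCALES = [1, 0.25, 0.05, 0.01]
SOURCE_PROFILE_GROUPS = [
    "bridge:owner", "bridge:place", "fact:learning", "fact:owner", "fact:place",
    "qa:glossary", "qa:learning", "qa:owner", "qa:place", "qa:self",
]


def replay_sequential_screen(steps, scales, groups, is_safe):
    """Hypothetical loop shape: every guarded attempt visits each group in order."""
    guard_attempts = 0
    accepted = 0
    rejected_by_group = Counter()
    for step in range(steps):
        for scale in scales:
            guard_attempts += 1
            for group in groups:
                if is_safe(step, scale, group):
                    accepted += 1  # keep this group's update
                else:
                    rejected_by_group[group] += 1  # roll back, try the next group
    return guard_attempts, accepted, rejected_by_group


# Stand-in guard: the v0.92 screen accepted nothing, so always reject.
guard_attempts, accepted, rejected_by_group = replay_sequential_screen(
    STEPS_CHECKED, ADAPTIVE_SCALES, SOURCE_PROFILE_GROUPS, lambda *_: False
)
profile_attempts = accepted + sum(rejected_by_group.values())
print(guard_attempts, profile_attempts, set(rejected_by_group.values()))
```

Under this reading, 50 steps times 4 scales gives the 200 guarded updates, and 200 updates times 10 groups gives the 2000 sequential profile attempts, with 200 rejections per group.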

Calibrated sequential profile-floor stabilization screen:

Signal	Value
Run	`runs/transformer-answer-v0.93-baseline-floor-calibrated-sequential-profile-stabilization-step1-dim4-context80/`
Mode	`branch-context-profile-baseline-floor-calibrated-sequential-profile-stabilization-unlikelihood`
Added mechanic	calibrated adaptive scales below `0.01` plus coverage-only guard probes for floor checks
Unit coverage	focused transformer tests pass; the mode records calibrated activation, extended scale metadata, replay-plan scales, and accepted/rejected update-shape accounting
Calibrated scales	`1`, `0.25`, `0.05`, `0.01`, `0.0025`, `0.0005`, `0.0001`
Update guard	checked `1/1` step; attempted `5` updates; accepted `1`; rejected `4`; no-effective-update attempts `4`
Accepted update	`bridge:owner` source-profile group at scale `0.0025`
Sequential profile attempts	`50` attempted; `1` accepted; `49` rejected; `60` anchor records
Rejected adaptive scales	`1: 1`, `0.25: 1`, `0.05: 1`, `0.01: 1`
Accepted update shapes	`calibrated_sequential_profile_stabilization: 1`
Rejected update shapes	`calibrated_sequential_profile_stabilization: 4`
Branch-context gate	passed across `219/219` semantic records with no ambiguous, colliding, or skipped records
Deterministic verifier	passed with no external model
Diversity target	failed, `0/9` multi-target profiles passed
Promotion status	rejected for model promotion; calibrated floor-preserving movement is now proven possible
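
One way to read the v0.93 numbers is a descending scale search that stops at the first scale where any source-profile group passes the guard. The sketch below replays that reading. The guard is a stand-in that encodes the reported outcome, and the function name and stopping rule are assumptions made for illustration.

```python
# Scales and group names copied from the tables above.
CALIBRATED_SCALES = [1, 0.25, 0.05, 0.01, 0.0025, 0.0005, 0.0001]
SOURCE_PROFILE_GROUPS = [
    "bridge:owner", "bridge:place", "fact:learning", "fact:owner", "fact:place",
    "qa:glossary", "qa:learning", "qa:owner", "qa:place", "qa:self",
]


def calibrated_sequential_search(scales, groups, is_safe):
    """Walk scales from largest to smallest and stop at the first scale
    that yields at least one safe group update (assumed stopping rule)."""
    summary = {
        "updates_attempted": 0,
        "profile_attempts": 0,
        "accepted": [],
        "rejected_scales": [],
    }
    for scale in scales:
        summary["updates_attempted"] += 1
        safe_groups = []
        for group in groups:
            summary["profile_attempts"] += 1
            if is_safe(group, scale):
                safe_groups.append(group)
        if safe_groups:
            summary["accepted"] = [(group, scale) for group in safe_groups]
            break
        summary["rejected_scales"].append(scale)
    return summary


def replayed_guard(group, scale):
    # Stand-in for the real floor check: replay the single reported acceptance.
    return (group, scale) == ("bridge:owner", 0.0025)


summary = calibrated_sequential_search(
    CALIBRATED_SCALES, SOURCE_PROFILE_GROUPS, replayed_guard
)
print(summary)
```

The replay stops after five updates and fifty profile attempts, with one acceptance and four rejected scales, which matches the table.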

Profile-scale calibrated floor stabilization screen:

Signal	Value
Run	`runs/transformer-answer-v0.94-baseline-floor-profile-scale-calibrated-sequential-stabilization-step1-dim4-context80/`
Mode	`branch-context-profile-baseline-floor-profile-scale-calibrated-sequential-profile-stabilization-unlikelihood`
Added mechanic	profile-scale memory: search calibrated scales per source profile, preserve the first safe profile update, and roll back unsafe profile-scale attempts
Unit coverage	focused transformer tests pass; the mode records profile-scale activation, search/outer scales, profile-scale attempts, acceptance/rejection scale counts, and accepted profile scales
Search scales	`1`, `0.25`, `0.05`, `0.01`, `0.0025`, `0.0005`, `0.0001`
Outer guard	checked `1/1` step; attempted `1` update; accepted `1`; rejected `0`; no-effective-update attempts `0`
Profile-scale attempts	`60` attempted; `8` accepted; `52` rejected; `72` anchor records
Accepted profile scales	`bridge:owner 0.0025`, `bridge:place 0.0005`, `fact:learning 0.0005`, `fact:owner 0.0001`, `fact:place 0.0001`, `qa:glossary 0.0001`, `qa:place 0.0001`, `qa:self 1`
Accepted update shapes	`profile_scale_calibrated_sequential_profile_stabilization: 1`
Branch-context gate	passed across `219/219` semantic records with no ambiguous, colliding, or skipped records
Deterministic verifier	passed with no external model
Diversity target	failed, `0/9` multi-target profiles passed
Promotion status	rejected for model promotion; safe calibrated movement now spans eight source profiles
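
Profile-scale memory can be pictured as a per-profile search: walk the scales from largest to smallest for each source profile, remember the first safe scale, and move on. The sketch below assumes that order and assumes the ten source-profile groups named in the v0.92 table. The guard is a stand-in that replays the accepted scales reported above, not the real floor check.

```python
SEARCH_SCALES = [1, 0.25, 0.05, 0.01, 0.0025, 0.0005, 0.0001]
SOURCE_PROFILES = [
    "bridge:owner", "bridge:place", "fact:learning", "fact:owner", "fact:place",
    "qa:glossary", "qa:learning", "qa:owner", "qa:place", "qa:self",
]
# Accepted profile scales copied from the v0.94 table above.
REPORTED_ACCEPTED = {
    "bridge:owner": 0.0025, "bridge:place": 0.0005, "fact:learning": 0.0005,
    "fact:owner": 0.0001, "fact:place": 0.0001, "qa:glossary": 0.0001,
    "qa:place": 0.0001, "qa:self": 1,
}


def search_profile_scales(profiles, scales, is_safe):
    """Keep the first safe scale per profile; count every attempt."""
    memory = {}
    attempts = 0
    for profile in profiles:
        for scale in scales:
            attempts += 1
            if is_safe(profile, scale):
                memory[profile] = scale  # preserve the first safe update
                break
            # otherwise the attempt is rolled back and a smaller scale is tried
    return memory, attempts


def replayed_guard(profile, scale):
    # Stand-in guard: accept exactly the scale the screen reported.
    return REPORTED_ACCEPTED.get(profile) == scale


memory, attempts = search_profile_scales(SOURCE_PROFILES, SEARCH_SCALES, replayed_guard)
rejected = attempts - len(memory)
print(attempts, len(memory), rejected)
```

Under these assumptions the replay reproduces the reported `60` attempts, `8` acceptances, and `52` rejections. The two profiles without an accepted scale exhaust all seven scales.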

Diversity-aware profile-scale floor stabilization screen:

Signal	Value
Run	`runs/transformer-answer-v0.95-baseline-floor-diversity-profile-scale-calibrated-sequential-stabilization-configured-step1-dim4-context80/`
Mode	`branch-context-profile-baseline-floor-diversity-profile-scale-calibrated-sequential-profile-stabilization-unlikelihood`
Added mechanic	diversity-aware profile-scale memory: accept profile-scale updates only when they preserve the baseline floor and do not regress branch-diversity score from the profile's pre-update state
Unit coverage	focused transformer tests pass; the mode records diversity activation, attempts, score improvements/ties/regressions, floor rejections, rejection reasons, and accepted profile outcomes
Search scales	`1`, `0.25`, `0.05`, `0.01`, `0.0025`, `0.0005`, `0.0001`
Outer guard	checked `1/1` step; attempted `1` update; accepted `1`; rejected `0`; outer diversity rejections `0`
Profile-scale attempts	`58` attempted; `5` accepted; `53` rejected
Diversity outcomes	`5` score improvements; `0` ties; `11` score regressions; `42` floor regressions
Accepted profile scales	`bridge:owner 0.0025`, `bridge:place 0.0005`, `fact:learning 0.0005`, `qa:glossary 1`, `qa:learning 0.0025`
Accepted update shapes	`profile_scale_diversity_calibrated_sequential_profile_stabilization: 1`
Branch-context gate	passed across `219/219` semantic records with no ambiguous, colliding, or skipped records
Deterministic verifier	passed with no external model
Diversity target	failed, `0/9` multi-target profiles passed
Promotion status	rejected for model promotion; accepted movement is now explicitly diversity-score non-regressive

Frontier profile-scale floor stabilization screen:

Signal	Value
Run	`runs/transformer-answer-v0.96-baseline-floor-diversity-frontier-profile-scale-calibrated-sequential-stabilization-step1-dim4-context80/`
Mode	`branch-context-profile-baseline-floor-diversity-frontier-profile-scale-calibrated-sequential-profile-stabilization-unlikelihood`
Added mechanic	frontier target anchors: add missing-target branch contexts to eligible profile-scale batches, then accept only floor-preserving and branch-diversity-score-improving updates
Unit coverage	focused transformer tests pass; the mode records frontier activation, frontier anchor counts, attempts, score outcomes, floor rejections, rejection reasons, and accepted profile outcomes
Search scales	`1`, `0.25`, `0.05`, `0.01`, `0.0025`, `0.0005`, `0.0001`
Frontier anchors	`52` anchors across `10` source-profile groups and `52` source-profile targets
Outer guard	checked `1/1` step; attempted `1` update; accepted `1`; rejected `0`
Profile-scale attempts	`43` attempted; `9` accepted; `34` rejected; `224` frontier records sampled
Diversity outcomes	`9` score improvements; `0` ties; `6` score regressions; `28` floor regressions
Accepted profile scales	`bridge:owner 0.0025`, `fact:learning 0.0025`, `fact:owner 0.0025`, `fact:place 0.25`, `qa:glossary 0.05`, `qa:learning 0.05`, `qa:owner 0.01`, `qa:place 0.0005`, `qa:self 0.05`
Accepted update shapes	`profile_scale_frontier_diversity_calibrated_sequential_profile_stabilization: 1`
Branch-context gate	passed across `219/219` semantic records with no ambiguous, colliding, or skipped records
Deterministic verifier	passed with no external model
Diversity target	failed, `0/9` multi-target profiles passed; max dominant predicted rate improved to `0.9`; minimum target-token coverage improved to `0.1667`
Promotion status	rejected for model promotion; frontier movement improves diversity but does not yet satisfy full target coverage
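
The diversity target refers to a dominant predicted rate and a target-token coverage. This document does not define them, so the sketch below uses assumed definitions: the dominant rate is the share of a profile's branch predictions taken by its most common token, and coverage is the share of distinct target tokens that appear among the predictions. The toy data is invented so the results are easy to check by hand.

```python
from collections import Counter


def dominant_predicted_rate(predictions):
    """Share of predictions taken by the most common token (assumed definition)."""
    return max(Counter(predictions).values()) / len(predictions)


def target_token_coverage(predictions, targets):
    """Share of distinct target tokens that were predicted at least once
    (assumed definition)."""
    distinct_targets = set(targets)
    return len(distinct_targets & set(predictions)) / len(distinct_targets)


# Invented toy profiles: (predicted first tokens, target first tokens).
toy_profiles = {
    "qa": (list("uuun"), list("unab")),
    "fact": (list("aabb"), list("abab")),
}

max_dominant = max(dominant_predicted_rate(p) for p, _ in toy_profiles.values())
min_coverage = min(target_token_coverage(p, t) for p, t in toy_profiles.values())
print(max_dominant, min_coverage)
```

With these definitions, a fully collapsed profile has a dominant rate of `1.0`, and a profile whose predictions hit none of its targets has a coverage of `0.0`.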

Coverage-frontier profile-scale floor stabilization screen:

Signal	Value
Run	`runs/transformer-answer-v0.97-baseline-floor-diversity-coverage-frontier-profile-scale-calibrated-sequential-stabilization-step1-dim4-context80/`
Mode	`branch-context-profile-baseline-floor-diversity-coverage-frontier-profile-scale-calibrated-sequential-profile-stabilization-unlikelihood`
Added mechanic	coverage-frontier acceptance: keep frontier anchors active, then accept only floor-preserving updates that gain target-token coverage over the current profile-base snapshot
Unit coverage	focused transformer tests pass; the mode records coverage-frontier activation, coverage gain/tie/regression counts, coverage rejection reasons, accepted coverage deltas, and the new update shape
Search scales	`1`, `0.25`, `0.05`, `0.01`, `0.0025`, `0.0005`, `0.0001`
Frontier anchors	`52` anchors across `10` source-profile groups and `52` source-profile targets
Outer guard	checked `1/1` step; attempted `1` update; accepted `1`; rejected `0`
Profile-scale attempts	`68` attempted; `1` accepted; `67` rejected
Coverage outcomes	`1` coverage gain; `15` coverage ties; `52` coverage regressions
Coverage rejection reasons	`50` floor regressions; `15` coverage ties; `2` coverage regressions
Accepted profile scales	`bridge:owner 0.0025`
Accepted update shapes	`profile_scale_coverage_frontier_diversity_calibrated_sequential_profile_stabilization: 1`
Branch-context gate	passed across `219/219` semantic records with no ambiguous, colliding, or skipped records
Deterministic verifier	passed with no external model
Diversity target	failed, `0/9` multi-target profiles passed; strict coverage gating accepted only one source-profile update
Promotion status	rejected for model promotion; monotonic coverage gains are now auditable, but the screen starves later source-profile repairs

Coverage-prep frontier profile-scale floor stabilization screen:

Signal	Value
Run	`runs/transformer-answer-v0.98-baseline-floor-diversity-coverage-prep-frontier-profile-scale-calibrated-sequential-stabilization-step1-dim4-context80/`
Mode	`branch-context-profile-baseline-floor-diversity-coverage-prep-frontier-profile-scale-calibrated-sequential-profile-stabilization-unlikelihood`
Added mechanic	coverage-preparation acceptance: keep coverage-frontier accounting, accept coverage gains, and also accept coverage-tied moves only when branch-diversity score improves
Unit coverage	focused transformer tests pass; the mode records coverage-prep activation, gain/preparation acceptances, rejection reasons, accepted preparation outcomes, and the new update shape
Search scales	`1`, `0.25`, `0.05`, `0.01`, `0.0025`, `0.0005`, `0.0001`
Frontier anchors	`52` anchors across `10` source-profile groups and `52` source-profile targets
Outer guard	checked `1/1` step; attempted `1` update; accepted `1`; rejected `0`
Profile-scale attempts	`43` attempted; `9` accepted; `34` rejected
Coverage outcomes	`3` coverage gains; `10` coverage ties; `30` coverage regressions
Coverage-prep outcomes	`3` gain acceptances; `6` preparation acceptances; `34` rejections
Coverage rejection reasons	`28` floor regressions; `4` coverage ties without score gain; `2` coverage regressions
Accepted profile scales	`bridge:owner 0.0025`, `fact:learning 0.0025`, `fact:owner 0.0025`, `fact:place 0.25`, `qa:glossary 0.05`, `qa:learning 0.05`, `qa:owner 0.01`, `qa:place 0.0005`, `qa:self 0.05`
Accepted update shapes	`profile_scale_coverage_prep_frontier_diversity_calibrated_sequential_profile_stabilization: 1`
Branch-context gate	passed across `219/219` semantic records with no ambiguous, colliding, or skipped records
Deterministic verifier	passed with no external model
Diversity target	failed, `0/9` multi-target profiles passed; max dominant predicted rate remains `0.9`; minimum target-token coverage remains `0.1667`
Promotion status	rejected for model promotion; coverage-prep restores frontier movement while preserving explicit coverage-gain accounting
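
The coverage-prep rule above can be written as a small decision function. The sketch below assumes the floor is checked first, then coverage, then branch-diversity score. The label strings are made up for illustration, and the toy attempts are invented. The last lines check that the reported v0.98 counts add up to the `43` attempts.

```python
from collections import Counter


def coverage_prep_decision(floor_ok, coverage_delta, score_delta):
    """Classify one profile-scale attempt (assumed check order)."""
    if not floor_ok:
        return "reject:floor_regression"
    if coverage_delta > 0:
        return "accept:coverage_gain"
    if coverage_delta < 0:
        return "reject:coverage_regression"
    # Coverage tie: only a branch-diversity score gain makes it a safe setup move.
    if score_delta > 0:
        return "accept:preparation"
    return "reject:coverage_tie_without_score_gain"


# Invented toy attempts: (floor_ok, coverage_delta, score_delta).
toy_attempts = [
    (True, 0.1, 0.0),   # coverage gain
    (True, 0.0, 0.2),   # tie with score gain
    (True, 0.0, 0.0),   # tie without score gain
    (True, -0.1, 0.3),  # coverage regression, score gain does not rescue it
    (False, 0.2, 0.2),  # floor regression overrides everything else
]
tally = Counter(coverage_prep_decision(*attempt) for attempt in toy_attempts)

# Counts copied from the v0.98 table above.
reported = {
    "accept:coverage_gain": 3,
    "accept:preparation": 6,
    "reject:floor_regression": 28,
    "reject:coverage_tie_without_score_gain": 4,
    "reject:coverage_regression": 2,
}
total_reported = sum(reported.values())
print(dict(tally), total_reported)
```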

Coverage-recovery frontier profile-scale floor stabilization screen:

Signal	Value
Run	`runs/transformer-answer-v0.99-baseline-floor-diversity-coverage-recovery-frontier-profile-scale-calibrated-sequential-stabilization-step1-dim4-context80/`
Mode	`branch-context-profile-baseline-floor-diversity-coverage-recovery-frontier-profile-scale-calibrated-sequential-profile-stabilization-unlikelihood`
Added mechanic	coverage-recovery retry: after a safe coverage-preparation candidate, attempt a small missing-target update and keep either the recovered candidate or the original prepared state
Unit coverage	focused transformer tests pass; the mode records recovery activation, retry scales, prepared candidates, retry attempts, recovery acceptances, fallback preparations, rejection reasons, accepted outcomes, and the recovery update shape
Search scales	`1`, `0.25`, `0.05`, `0.01`, `0.0025`, `0.0005`, `0.0001`; recovery retry scales `1`, `0.25`, `0.05`
Frontier anchors	`52` anchors across `10` source-profile groups and `52` source-profile targets
Outer guard	checked `1/1` step; attempted `1` update; accepted `1`; rejected `0`
Profile-scale attempts	`54` attempted; `6` accepted; `48` rejected
Coverage outcomes	`2` coverage gains; `11` coverage ties; `41` coverage regressions
Coverage-prep outcomes	`2` gain acceptances; `4` preparation acceptances; `48` rejections
Coverage-recovery outcomes	`6` prepared candidates; `15` recovery retries over `95` recovery records; `2` recoveries; `4` preparation fallbacks; `13` retry rejections
Recovery rejection reasons	`7` floor regressions; `6` coverage ties
Coverage rejection reasons	`38` floor regressions; `7` coverage ties without score gain; `3` coverage regressions
Accepted profile scales	`bridge:owner 1`, `bridge:place 1`, `fact:glossary 0.01`, `fact:owner 0.01`, `fact:place 0.01`, `qa:glossary 0.0025`
Accepted recovery outcomes	`bridge:place` and `fact:glossary` recovered coverage; `bridge:owner`, `fact:owner`, `fact:place`, and `qa:glossary` fell back to preparation
Accepted update shapes	`profile_scale_coverage_recovery_frontier_diversity_calibrated_sequential_profile_stabilization: 1`
Branch-context gate	passed across `219/219` semantic records with no ambiguous, colliding, or skipped records
Deterministic verifier	passed with no external model
Diversity target	failed, `0/9` multi-target profiles passed; coverage preservation passed, but max dominant predicted rate reached `1.0` and minimum target-token coverage remained `0.0`
Promotion status	rejected for model promotion; v0.99 proves recovery conversion is auditable, but branch-diverse behavior is not yet stable enough to promote

Branch-stable coverage-recovery frontier profile-scale floor stabilization screen:

Signal	Value
Run	`runs/transformer-answer-v0.100.0-baseline-floor-diversity-branch-stable-coverage-recovery-frontier-profile-scale-calibrated-sequential-stabilization-step1-dim4-context80/`
Mode	`branch-context-profile-baseline-floor-diversity-branch-stable-coverage-recovery-frontier-profile-scale-calibrated-sequential-profile-stabilization-unlikelihood`
Added mechanic	branch-stable recovery acceptance: a coverage recovery must preserve the prepared candidate's branch-diversity score before the recovered state is accepted
Unit coverage	focused transformer tests pass; the mode records branch-stable activation, checks, acceptances, rejections, fallback preparations, rejection reasons, accepted outcomes, replay-plan activation, and the branch-stable update shape
Search scales	`1`, `0.25`, `0.05`, `0.01`, `0.0025`, `0.0005`, `0.0001`; recovery retry scales `1`, `0.25`, `0.05`
Frontier anchors	`52` anchors across `10` source-profile groups and `52` source-profile targets
Outer guard	checked `1/1` step; attempted `1` update; accepted `1`; rejected `0`
Profile-scale attempts	`54` attempted; `6` accepted; `48` rejected
Coverage outcomes	`2` coverage gains; `11` coverage ties; `41` coverage regressions
Coverage-prep outcomes	`2` gain acceptances; `4` preparation acceptances; `48` rejections
Branch-stable recovery outcomes	`6` prepared candidates; `15` branch-stability checks; `2` branch-stable recoveries; `4` preparation fallbacks; `13` retry rejections
Branch-stable rejection reasons	`7` floor regressions; `5` coverage ties; `1` branch-score regression
Coverage rejection reasons	`38` floor regressions; `7` coverage ties without score gain; `3` coverage regressions
Accepted profile scales	`bridge:owner 1`, `bridge:place 1`, `fact:glossary 0.01`, `fact:owner 0.01`, `fact:place 0.01`, `qa:glossary 0.0025`
Accepted branch-stable outcomes	`bridge:place` and `fact:glossary` recovered coverage while preserving prepared branch score; `bridge:owner`, `fact:owner`, `fact:place`, and `qa:glossary` fell back to preparation
Accepted update shapes	`profile_scale_branch_stable_coverage_recovery_frontier_diversity_calibrated_sequential_profile_stabilization: 1`
Branch-context gate	passed across `219/219` semantic records with no ambiguous, colliding, or skipped records
Deterministic verifier	passed with no external model
Diversity target	failed, `0/9` multi-target profiles passed; coverage preservation passed, but max dominant predicted rate remains `1.0` and minimum target-token coverage remains `0.0`
Promotion status	rejected for model promotion; v0.100.0 proves recovery conversion can be checked against branch stability, but branch-diverse behavior is still not stable enough to promote
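
The recovery retry with the branch-stability check can be sketched as "try each retry scale, keep the first candidate that passes every gate, otherwise keep the prepared state". The `State` fields, the function names, and the toy retry below are hypothetical; only the retry scales and the rejection-reason names come from the tables above.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

RECOVERY_RETRY_SCALES = [1, 0.25, 0.05]  # from the tables above


@dataclass(frozen=True)
class State:
    floor_ok: bool       # baseline floor still preserved
    coverage: float      # target-token coverage
    branch_score: float  # branch-diversity score


def rejection_reason(prepared: State, candidate: State) -> Optional[str]:
    """Return None when the candidate may replace the prepared state."""
    if not candidate.floor_ok:
        return "floor_regression"
    if candidate.coverage < prepared.coverage:
        return "coverage_regression"
    if candidate.coverage == prepared.coverage:
        return "coverage_tie"
    if candidate.branch_score < prepared.branch_score:
        return "branch_score_regression"
    return None


def recover_or_fall_back(
    prepared: State,
    retry_scales: Sequence[float],
    retry: Callable[[State, float], State],
) -> Tuple[State, List[str]]:
    """Keep the first branch-stable recovery, else fall back to preparation."""
    reasons: List[str] = []
    for scale in retry_scales:
        candidate = retry(prepared, scale)
        reason = rejection_reason(prepared, candidate)
        if reason is None:
            return candidate, reasons
        reasons.append(reason)
    return prepared, reasons


def toy_retry(prepared: State, scale: float) -> State:
    # Invented outcomes: large steps break something, the smallest one is clean.
    if scale == 1:
        return State(False, 0.75, prepared.branch_score)
    if scale == 0.25:
        return State(True, 0.75, prepared.branch_score - 0.1)
    return State(True, 0.75, prepared.branch_score)


prepared = State(True, 0.5, 0.4)
final_state, reasons = recover_or_fall_back(prepared, RECOVERY_RETRY_SCALES, toy_retry)
print(final_state, reasons)
```

A retry that never passes returns the prepared state unchanged, which corresponds to the preparation fallbacks counted in the tables.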

Branch-diversity recovery frontier profile-scale floor stabilization screen:

Signal	Value
Run	`runs/transformer-answer-v0.101.0-baseline-floor-diversity-branch-diversity-recovery-frontier-profile-scale-calibrated-sequential-stabilization-step1-dim4-context80/`
Mode	`branch-context-profile-baseline-floor-diversity-branch-stable-coverage-recovery-branch-diversity-frontier-profile-scale-calibrated-sequential-profile-stabilization-unlikelihood`
Added mechanic	branch-diversity recovery: after a profile update is already safe under the floor, coverage, and branch-stability gates, try a small profile-local branch-diversity update and keep it only if branch score improves without coverage regression
Unit coverage	focused transformer tests pass; the mode records branch-diversity recovery activation, candidates, attempts, acceptances, fallback acceptances, rejection reasons, accepted outcomes, score deltas, replay-plan activation, and the branch-diversity recovery update shape
Search scales	`1`, `0.25`, `0.05`, `0.01`, `0.0025`, `0.0005`, `0.0001`; recovery retry scales `1`, `0.25`, `0.05`; branch-diversity recovery scales `0.25`, `0.05`, `0.01`
Frontier anchors	`52` anchors across `10` source-profile groups and `52` source-profile targets
Outer guard	checked `1/1` step; attempted `1` update; accepted `1`; rejected `0`
Profile-scale attempts	`52` attempted; `6` accepted; `46` rejected
Coverage outcomes	`2` coverage gains; `16` coverage ties; `34` coverage regressions
Coverage-prep outcomes	`2` gain acceptances; `4` preparation acceptances; `46` rejections
Branch-stable recovery outcomes	`4` prepared candidates; `12` branch-stability checks; `0` branch-stable recoveries; `4` preparation fallbacks; `12` retry rejections
Branch-diversity recovery outcomes	`6` candidates; `9` attempts; `5` branch-score-improving refinements; `1` fallback; `4` rejected attempts
Branch-diversity rejection reasons	`1` floor regression; `1` score regression; `2` score ties
Accepted profile scales	`bridge:owner 1`, `bridge:place 1`, `fact:glossary 0.05`, `fact:owner 0.01`, `fact:place 0.05`, `fact:self 0.0025`
Accepted branch-diversity outcomes	`bridge:owner`, `bridge:place`, `fact:glossary`, `fact:owner`, and `fact:place` improved branch score; `fact:self` fell back
Accepted update shapes	`profile_scale_branch_diversity_recovery_frontier_calibrated_sequential_profile_stabilization: 1`
Branch-context gate	passed across `219/219` semantic records with no ambiguous, colliding, or skipped records
Deterministic verifier	passed with no external model
Diversity target	failed, `0/9` multi-target profiles passed; coverage preservation passed, but max dominant predicted rate remains `1.0` and minimum target-token coverage remains `0.0`
Promotion status	rejected for model promotion; v0.101.0 proves local branch-diversity recovery can improve guarded profile states, but global branch-diverse behavior is still not stable enough to promote

Latest collapsed-profile binding frontier profile-scale floor stabilization screen:

Signal	Value
Run	`runs/transformer-answer-v0.102.0-baseline-floor-diversity-collapsed-profile-binding-frontier-profile-scale-calibrated-sequential-stabilization-step1-dim4-context80/`
Mode	`branch-context-profile-baseline-floor-diversity-branch-stable-coverage-recovery-branch-diversity-collapsed-profile-binding-frontier-profile-scale-calibrated-sequential-profile-stabilization-unlikelihood`
Added mechanic	collapsed-profile binding: after a profile update is already safe under the floor, coverage, branch-stability, and branch-diversity recovery gates, try a small profile-local binding update and keep it only if a still-collapsed eval profile improves without coverage regression
Unit coverage	focused transformer tests pass; the mode records collapsed-profile binding activation, candidates, attempts, acceptances, fallback acceptances, rejection reasons, target collapsed profiles, profile-diversity deltas, replay-plan activation, and the collapsed-profile binding update shape
Search scales	`1`, `0.25`, `0.05`, `0.01`, `0.0025`, `0.0005`, `0.0001`; recovery retry scales `1`, `0.25`, `0.05`; branch-diversity recovery scales `0.25`, `0.05`, `0.01`; collapsed-profile binding scales `0.25`, `0.05`, `0.01`
Frontier anchors	`52` anchors across `10` source-profile groups and `52` source-profile targets
Outer guard	checked `1/1` step; attempted `1` update; accepted `1`; rejected `0`
Profile-scale attempts	`54` attempted; `11` accepted; `43` rejected
Branch-diversity recovery outcomes	`11` candidates; `26` attempts; `4` branch-score refinements; `7` fallbacks
Collapsed-profile binding outcomes	`11` candidates; `31` attempts; `1` binding update; `10` fallbacks; `30` rejected attempts
Collapsed-profile binding rejection reasons	`27` collapsed-profile ties; `1` floor regression; `2` score regressions
Collapsed profiles	baseline `9/9`; final `3/9` remaining: `learning`, `owner`, `paraphrases`
Accepted update shapes	`profile_scale_collapsed_profile_binding_frontier_calibrated_sequential_profile_stabilization: 1`
Branch-context gate	passed across `219/219` semantic records with no ambiguous, colliding, or skipped records
Deterministic verifier	passed with no external model
Diversity target	failed, `0/9` multi-target profiles passed; coverage preservation passed, but max dominant predicted rate remains `1.0` and minimum target-token coverage remains `0.0`
Promotion status	rejected for model promotion; v0.102.0 proves a targeted collapsed-profile binding update can survive the guard, but collapse in the `learning`, `owner`, and `paraphrases` profiles still blocks a functional transformer responder

The transformer is not yet promoted as a reliable responder. It is architecture evidence: a from-scratch attention model can update weights on the admitted corpus and leave a checkpoint plus metrics. v0.42 preserves the 37/219 transformer-only candidate result while improving answer-target NLL versus v0.41, but raw greedy completion still fails exact answers with the short wrong completion " te.". The latest v0.43 stacked screen proves that two-layer top-layer-only direct-answer training can complete and write a checkpoint when the expensive post-direct candidate snapshot is explicitly skipped, but its repeated "a" output is still a failed direct decoder. v0.31's no-candidate transformer-guided generator remains useful comparison evidence, but it is not raw transformer decoding. The branch-profile smoke adds a sharper diagnosis: at the configured branch position, the model is selecting one global token across prompts instead of separating target-specific answer branches. The branch-collapse repair uses that diagnosis by penalizing the sampled dominant branch token, but the evidence shows it only moves the collapse to a new global token. Branch-batch contrast then trains several distinct target branches in one update; it lowers loss under sparse dosage, but the branch profile still collapses globally and even loses the one initially correct QA branch. --use-context-mean then adds a mean-pooled context residual to the final hidden representation, but the bounded screens still collapse the QA branch to one wrong global token. The next repair needs a stronger prompt-conditioned representation signal than simple prompt averaging. --use-context-projection then lets the model learn a zero-initialized projection of that context summary, and the projection weights do move during training, but the branch profile still collapses globally. --use-prompt-attention-summary makes the summary itself attention-pooled and trainable, but the bounded screens still collapse globally. 
The branch-context coverage diagnostic explains why context-16 branch screens were partly underdetermined: QA had only four visible branch contexts for eight records, and those windows mapped to different first target tokens. Context-32 removes literal QA ambiguity but still truncates semantic prompt features. Context-80 gives every current eval record complete semantic branch-context coverage with no ambiguity. The next repair needs efficient longer-context prompt-specific discrimination, not just suppression, batching, or a trainable summary of a truncated context. The optional branch-context gate now enforces that distinction for direct-answer screens: unsafe context-16 branch repair can be skipped and recorded, while complete context-80 branch repair is allowed to run. The branch-only snapshot mode keeps those longer-context screens practical by skipping greedy completion evals while still recording the branch diagnostics and gate evidence needed for the next decision. The first dim8 follow-ups show that lower branch loss and complete branch context are still not enough: both repair/contrast and branch-batch contrast collapse QA branch prediction to one global token. A full greedy-eval promotion snapshot is not warranted until a screen improves prompt-specific branch diversity. The branch-diversity target now makes that requirement machine-readable in every direct-answer snapshot. branch-diversity-unlikelihood trains directly against the observed collapse token and improves the tiny unit case, but the first corpus smoke only moves the dominant global prediction. Freezing the output bias removes one cheap global escape hatch, but the corpus smoke still rotates to a single dominant branch token. Restricted target-set softmax briefly raises QA predicted diversity to two tokens, then collapses back by the final snapshot. The next repair needs to make diversity stable across prompts, not just rotate or momentarily crack the collapsed token. 
Best-snapshot restoration can preserve a better measured branch state, but it still ends as a one-token collapse until the underlying representation separates prompts. Prompt-prefix projection gives the model a targeted trainable prompt path and the new parameters move, but the evidence still ends in the same all-"u" branch collapse. Prompt-position projection keeps position-specific prompt access and moves many more parameters, but the branch profile remains collapsed too. Branch-target margin adds pairwise target separation on top of that prompt path and lowers bounded train loss, but the restored branch profile remains the same one-token collapse. Branch-representation contrast exposes that the hidden states themselves remain nearly indistinguishable at the answer branch, so the next repair needs a stronger prompt-conditioned representation path rather than another output-head loss alone. The dim-8 capacity screen increases measured hidden distance, but branch predictions still collapse globally, so width alone is not the missing repair. Prompt-position projection scaling shows the prompt residual can be made louder and the restored hidden-state distance can rise, but the branch prediction still collapses globally. The pre-layer-norm/final-normalization path is now implemented and screened; it cracks full collapse in most multi-target profiles but leaves QA and heldout collapsed. Target-balanced branch batching then regresses to a baseline-restored global "n" collapse, so the next repair should strengthen prompt-to-answer binding for QA and heldout rather than rely on sampler balancing or another unrelated loss term. The branch-rank diagnostic confirms the correct target is usually buried outside the top five predictions, which points the next repair toward output-head prompt binding instead of a simple near-miss margin tweak. 
The first output-binding repair combines that target-set pressure with representation contrast and improves average target rank/top-5 evidence, but it still fails target-token coverage and collapses to wrong branch tokens. The next repair needs to promote the correct target into the top branch set, not only move it upward while the wrong tokens remain on top. Hard rank-margin repair is the first screen to make that movement clear: it lifts correct targets into the top five more often and improves target-token coverage, but it still leaves a single global wrong prediction. The next repair needs to convert rank lift into prompt-specific top-1 branch choices. Target-balanced rank-margin adds some wrong-token diversity and better QA top-3 coverage, but it still does not make correct target tokens win the branch. The top-one hard-negative screen then regresses rank and top-k coverage, so the next repair should not simply concentrate more pressure on the current top wrong token. It needs a prompt-conditioned mechanism that selects among near-tied branch candidates.
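
The branch-context ambiguity described above, where a short context window hides the part of the prompt that distinguishes two answers, can be checked with a few lines. This is a minimal sketch: it assumes a character tokenizer, treats the visible branch context as the last `context_size` characters of the prompt, and uses invented records. The function name and report keys are hypothetical.

```python
from collections import defaultdict


def branch_context_report(records, context_size):
    """Group records by visible prompt window and flag windows that map to
    more than one first target token."""
    targets_by_window = defaultdict(set)
    for prompt, answer in records:
        window = prompt[-context_size:]  # what the model can see at the branch
        targets_by_window[window].add(answer[0])
    ambiguous = [w for w, targets in targets_by_window.items() if len(targets) > 1]
    return {
        "records": len(records),
        "visible_contexts": len(targets_by_window),
        "ambiguous_contexts": len(ambiguous),
    }


# Invented records: the prompts differ only before the last 12 characters.
toy_records = [
    ("Q: who owns the lab? A:", "mira"),
    ("Q: who runs the lab? A:", "tess"),
]

short_report = branch_context_report(toy_records, context_size=12)
long_report = branch_context_report(toy_records, context_size=80)
print(short_report)
print(long_report)
```

With the short window both prompts collapse to one visible context with two different first target tokens, so no update can satisfy both. With the long window each record keeps its own context and the ambiguity disappears.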

The v0.66 open-source mechanics audit reframes the current blocker as trainer mechanics rather than another global branch loss. v0.67 implements the first profile-aware replay-plan surface: branch records carry source/profile keys, deficits and preservation are computed per profile, and the plan is written as a run artifact before training. v0.68 proves that constraint is doing useful work: profile-aware training moved correct targets upward in the ranked list, but only by collapsing target-token coverage and branch diversity, so the snapshot gate restored baseline. The next trainer change needs anti-collapse preservation inside the profile-aware plan. v0.81 implements that trainer change as a profile target-share objective mechanic. v0.82 screens it and rejects the trained snapshots because rank lift still comes from branch collapse. v0.83 adds prompt-specific sibling-target ownership margins and proves the focused mechanic, but the screen still restores step 0 because trained snapshots lose target-token coverage. v0.84 anchors replay preservation to baseline predictions and improves trained coverage relative to v0.83, but still restores step 0 because snapshots miss the full coverage floor. v0.85 adds a baseline-floor update guard that preserves the floor by rejecting all attempted unsafe updates. v0.86 retries those updates at four smaller scales and still rejects every attempt. v0.87 adds one baseline-covered repair after each failed retry and still rejects every attempt; v0.88 moves floor anchors into the objective and still rejects every attempt; v0.89 removes branch pressure and still rejects every floor-stabilization attempt. v0.90 records the rejected profile floors directly, showing heldout violates every attempt and the worst deficit is 0.25 on learning. v0.91 covers the full profile-target floor surface and still rejects every attempt. v0.92 changes the repair shape to sequential source-profile batches and still rejects every profile-local attempt. 
v0.93 calibrates the adaptive scales below 0.01 and accepts one source-profile update at scale 0.0025. v0.94 adds profile-scale memory and accepts eight source-profile updates. v0.95 adds diversity-aware profile-scale acceptance, preserves five score-improving source-profile updates, and rejects eleven floor-preserving score regressions. v0.96 adds frontier target anchors, preserves nine score-improving source-profile updates, and lowers max dominant predicted rate to 0.9. v0.97 adds coverage-frontier acceptance and shows strict monotonic coverage gating is auditable but too conservative, accepting only one coverage-gaining source-profile update. v0.98 adds coverage-prep acceptance, restores nine source-profile updates, and separates three coverage gains from six safe setup moves. v0.99 adds coverage-recovery retry, converts two prepared candidates into direct coverage recoveries, and preserves four preparation fallbacks. v0.100.0 adds branch-stable recovery acceptance, keeps those two recoveries, and records one branch-score regression rejection. v0.101.0 adds branch-diversity recovery, accepts five local branch-score refinements, and falls back once. v0.102.0 adds collapsed-profile binding, accepts one targeted binding update, and narrows final collapse from nine eval profiles to three. The next repair should target learning, owner, and paraphrases without weakening the coverage floor.