
Transformer

v0.24 introduced a tiny decoder-only transformer in closed_world_lm.transformer_char_model.

It is intentionally small:

  • corpus-trained character tokenizer
  • learned token and position embeddings
  • one causal self-attention block
  • one feed-forward block
  • next-character language-model head
  • dependency-free scalar autodiff
  • random initialization only
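"Dependency-free scalar autodiff" means gradients are computed one scalar at a time, with no array library. The sketch below illustrates the idea in a micrograd style. It is not the code in closed_world_lm, and the class name Value and its methods are hypothetical names chosen for illustration.

```python
import math


class Value:
    """Hypothetical scalar autodiff node; illustrative, not closed_world_lm code."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents          # upstream Value nodes
        self._local_grads = local_grads  # d(self)/d(parent), fixed at forward time

    @staticmethod
    def lift(x):
        return x if isinstance(x, Value) else Value(x)

    def __add__(self, other):
        other = Value.lift(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = Value.lift(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    __radd__ = __add__
    __rmul__ = __mul__

    def exp(self):
        out = math.exp(self.data)
        return Value(out, (self,), (out,))

    def log(self):
        return Value(math.log(self.data), (self,), (1.0 / self.data,))

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse order.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad
```

For z = x * y + x at x = 2 and y = 3, calling z.backward() leaves x.grad = 4.0 (that is, y + 1) and y.grad = 2.0 (that is, x). Because visit is recursive, a very deep graph would need an iterative sort to stay under Python's recursion limit.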

Train a smoke checkpoint:

PYTHONPATH=src python3 -m closed_world_lm.transformer_char_model train \
--run runs/transformer-smoke \
--steps 40 \
--context-size 8 \
--embedding-dim 6 \
--feedforward-dim 12

Evaluate answer probes:

PYTHONPATH=src python3 -m closed_world_lm.transformer_char_model eval \
--checkpoint runs/transformer-smoke/transformer.json \
--json runs/transformer-smoke/transformer_eval.json

Train on corpus-derived answer lessons:

PYTHONPATH=src python3 -m closed_world_lm.transformer_char_model answer-train \
--run runs/transformer-answer-smoke \
--steps 100 \
--eval-every 0 \
--candidate-scope eval \
--selector-steps 200 \
--selector-eval-every 0 \
--selector-emit-completions \
--generator-steps 400 \
--generator-eval-every 0 \
--direct-answer-steps 100 \
--direct-answer-eval-every 0 \
--direct-answer-mode periodic-balanced-repair-unlikelihood \
--direct-answer-negative-weight 1.0 \
--direct-answer-positive-weight 1.0 \
--direct-answer-rollout-interval 50

From v0.71 onward, answer-train writes experiment_intent.json before training and closes it with a decision in transformer_answer_metrics.json. Use --experiment-hypothesis, --experiment-acceptance-gate name:rule, --experiment-failure-criterion, and --experiment-note to make a screen's intent more specific. From v0.77 onward, transformer screens close through the constraint-first promotion report.

From v0.72 onward, profile-aware replay planning lives in src/closed_world_lm/replay_plan.py. The transformer still emits the same direct_answer_replay_plan.json shape for profile-aware modes, but replay record normalization, profile grouping, coverage floors, and missing-target summaries are now standalone training-planning mechanics.
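The sketch below gives a rough idea of those planning mechanics. It groups replay records by profile and reports represented targets, missing targets, and a coverage floor. The record fields (profile, target, predicted), the function name plan_replay, and the choice of pre-training coverage as the floor are assumptions for illustration. The actual replay_plan.py schema may differ.

```python
from collections import defaultdict


def plan_replay(records):
    """Hypothetical profile-aware replay summary; not the replay_plan.py API.

    Each record is assumed to be a dict with 'profile', 'target' (gold branch
    token), and 'predicted' (the model's current top-1 branch token).
    """
    groups = defaultdict(list)
    for record in records:
        groups[record["profile"]].append(record)

    plan = {}
    for profile, rows in sorted(groups.items()):
        targets = sorted({row["target"] for row in rows})
        predicted = {row["predicted"] for row in rows}
        represented = [t for t in targets if t in predicted]
        missing = [t for t in targets if t not in predicted]
        coverage = len(represented) / len(targets)
        plan[profile] = {
            "replay_count": len(rows),
            "target_ids": targets,
            "represented_target_ids": represented,
            "missing_target_ids": missing,
            # Assumed: the floor is the coverage observed before training starts.
            "coverage_floor": coverage,
        }
    return plan
```

A target counts as represented if any context in its profile currently predicts it, so one dominant token can represent at most one target per profile.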

From v0.73 onward, answer-train also writes corpus_hygiene.json and training_plan.json. These artifacts record source mixture, duplicate checks, train/eval prompt overlap, candidate ratio, rare-profile coverage, allowed data sources, planned artifacts, and replay-plan summaries when profile-aware replay writes a plan.
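The duplicate and train/eval overlap checks could look roughly like the sketch below. The normalization rule (case-folding plus whitespace collapsing), the report keys, and the decision to fail only on overlap are all assumptions. The sketch does not reproduce the corpus_hygiene.json format.

```python
from collections import Counter


def normalize(prompt):
    # Assumed normalization: case-fold and collapse runs of whitespace.
    return " ".join(prompt.lower().split())


def hygiene_report(train_prompts, eval_prompts):
    """Hypothetical duplicate and train/eval overlap check."""
    train = [normalize(p) for p in train_prompts]
    counts = Counter(train)
    duplicates = sorted(p for p, n in counts.items() if n > 1)
    overlap = sorted({normalize(p) for p in eval_prompts} & set(train))
    return {
        "train_prompts": len(train),
        "duplicate_prompts": duplicates,
        "train_eval_overlap": overlap,
        # Any protected train/eval overlap fails the check.
        "passes": not overlap,
    }
```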

From v0.75 onward, answer-train also writes candidate_quarantine.json. The manifest records candidate lifecycle state and is linked from training_plan.json; candidate records are not training data until admitted into the ledgered corpus.

From v0.76 onward, answer-train also writes closed_world_verifier.json. The verifier is deterministic and checks that the closed-world data boundary, candidate exclusion policy, quarantine manifest, and protected train/eval overlap all pass before transformer screen evidence is trusted.

From v0.77 onward, answer-train also writes training_recipe.json and constraint_first_promotion.json. The recipe records model, tokenizer, data, objective, optimizer, replay, artifacts, gates, and rerun details. The constraint-first report blocks loss, NLL, rank, top-k, or exact quality evidence until verifier, contamination, branch-context, coverage, and diversity constraints pass first.
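The ordering rule of the constraint-first report can be sketched as follows. Quality evidence is consulted only after every constraint passes. The constraint names come from the paragraph above, while the function name, the dict inputs, and the "all quality checks must pass" rule are assumptions for illustration.

```python
# Constraint names from the text above; the pass/fail inputs are hypothetical.
CONSTRAINTS = ("verifier", "contamination", "branch_context", "coverage", "diversity")


def constraint_first_decision(constraints, quality):
    """Hypothetical sketch of constraint-first promotion.

    constraints: {name: bool} for each constraint gate.
    quality: {name: bool} for quality evidence such as loss, NLL, rank, top-k, exact.
    """
    # Fail closed: a constraint that was never evaluated counts as failed.
    failed = [name for name in CONSTRAINTS if not constraints.get(name, False)]
    if failed:
        # Quality evidence is not consulted at all while any constraint fails.
        return {"decision": "reject", "blocked_by": failed, "quality_considered": False}
    failed_quality = [name for name, ok in quality.items() if not ok]
    promote = bool(quality) and not failed_quality
    return {
        "decision": "promote" if promote else "reject",
        "blocked_by": failed_quality,
        "quality_considered": True,
    }
```

Under this rule, a run with a large loss improvement but a failing diversity constraint is still rejected with blocked_by == ["diversity"].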

From v0.78 onward, the answer-training stack starts using separate transformer responsibility surfaces for artifact contracts, experiment/recipe decisions, trainer utilities, and the direct-answer objective catalog. The public CLI and artifact names remain stable.

From v0.79 onward, src/closed_world_lm/transformer_model.py owns model, optimizer, and generation config validation, checkpoint identity, closed-world dataset metadata, and run metadata. transformer_char_model.py still exports the old names for compatibility.

From v0.80 onward, src/closed_world_lm/transformer_checkpoint.py owns checkpoint payload loading and identity validation, and src/closed_world_lm/transformer_eval.py owns generic transformer probe loading, candidate collection, scoring, eval report assembly, samples JSONL writing, and eval JSON writing. The public eval CLI and artifact shapes remain stable.

From v0.81 onward, branch-balanced-context-profile-target-share-preserving-deficit-unlikelihood adds balanced owned target-share pressure across replay targets inside each profile-aware replay group. It keeps the existing profile replay plan, deficit focus, and represented-target preservation, but adds a per-target anti-collapse term so one represented target cannot dominate a multi-target profile without pressure on the remaining replay targets.
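One way to read "per-target anti-collapse term" is as a penalty on how unevenly a replay group's probability mass is shared across its targets. The sketch computes such a penalty for a single profile-aware replay group. The squared-deviation-from-uniform form and the function names are assumptions, not the project's objective. A training loss would compute the same quantity on autodiff values rather than floats.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def target_share_penalty(context_logits, target_ids):
    """Hypothetical anti-collapse term for one profile-aware replay group.

    context_logits: one logit list per replay context in the group.
    target_ids: the group's replay target token ids.
    """
    targets = sorted(set(target_ids))
    mass = {t: 0.0 for t in targets}
    for logits in context_logits:
        probs = softmax(logits)
        for t in targets:
            mass[t] += probs[t]
    total = sum(mass.values())
    uniform = 1.0 / len(targets)
    # Zero when every target owns an equal share of the group's target mass.
    return sum((mass[t] / total - uniform) ** 2 for t in targets)
```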

v0.82 screens that objective under the modern artifact stack and constraint-first gates. The screen fixes the transformer metrics purity field for external_embeddings, passes the verifier and branch-context gate, and preserves coverage by restoring step 0, but trained snapshots still collapse QA and heldout branch diversity.

v0.83 adds branch-balanced-context-profile-prompt-ownership-target-share-preserving-deficit-unlikelihood. It keeps the profile target-share objective and adds a prompt-specific sibling-target margin, so each replay context is trained to rank its own target above other targets from the same profile. The focused mechanic passes, but the full screen still rejects trained snapshots that lose target-token coverage.
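A prompt-specific sibling-target margin can be sketched as a smooth hinge. The replay context's own target logit should exceed every sibling target's logit from the same profile by a margin. The softplus form, the default margin of 1.0, and the function names are assumptions for illustration.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def prompt_ownership_loss(logits, own_target, sibling_targets, margin=1.0):
    """Hypothetical smooth margin: the own target should beat each sibling by `margin`."""
    siblings = [t for t in sibling_targets if t != own_target]
    if not siblings:
        return 0.0  # a single-target profile has nothing to compete against
    own = logits[own_target]
    return sum(softplus(margin - (own - logits[s])) for s in siblings) / len(siblings)
```

The loss is near zero once the own target leads every sibling by well over the margin, and it grows roughly linearly when a sibling outranks it.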

v0.84 adds branch-balanced-context-profile-baseline-anchored-prompt-ownership-target-share-preserving-deficit-unlikelihood. It keeps prompt ownership but anchors replay preservation to the baseline replay predictions recorded before direct-answer training, so preservation no longer follows prediction drift. The screen improves trained coverage relative to v0.83 but still restores baseline because it misses the full coverage floor.

v0.85 adds branch-balanced-context-profile-baseline-floor-gated-prompt-ownership-target-share-preserving-deficit-unlikelihood. It keeps baseline replay anchors and rejects any attempted direct-answer update whose branch-profile target-token coverage falls below the step-0 floor. The screen preserves coverage by rejecting all attempted updates, so the next repair must produce accepted safe updates rather than looser promotion gates.

v0.86 adds branch-balanced-context-profile-baseline-floor-adaptive-prompt-ownership-target-share-preserving-deficit-unlikelihood. It keeps the baseline-floor guard and retries the same update at smaller learning-rate scales after restoring model, optimizer, and RNG state. The screen shows that step size alone is not enough: all scaled attempts are still rejected.
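The guard-and-retry loop from v0.85 and v0.86 can be sketched as follows. An update is tried at decreasing learning-rate scales, and model, optimizer, and RNG state are restored after every unsafe attempt. The state layout, the callback signatures, and the scale list are hypothetical. Python's global random module stands in for the trainer's RNG.

```python
import copy
import random


def guarded_update(state, apply_update, coverage_fn, floor, scales=(1.0, 0.5, 0.1)):
    """Hypothetical baseline-floor guard with learning-rate-scale retries.

    state: dict holding 'params' and 'optimizer' (illustrative structure).
    apply_update(state, scale): mutates state in place using a scaled step.
    coverage_fn(params): returns branch-profile target-token coverage.
    """
    snapshot = copy.deepcopy(state)
    rng_state = random.getstate()
    attempts = []
    for scale in scales:
        apply_update(state, scale)
        coverage = coverage_fn(state["params"])
        attempts.append((scale, coverage))
        if coverage >= floor:
            return True, attempts  # accept the first attempt that keeps the floor
        # Unsafe: restore params, optimizer, and RNG before the next, smaller attempt.
        state.clear()
        state.update(copy.deepcopy(snapshot))
        random.setstate(rng_state)
    return False, attempts  # every scale rejected; state equals the pre-update snapshot
```

Restoring the RNG means each retry sees the same random draws as the first attempt, so only the step size differs between attempts. That is the comparison v0.86 relies on to conclude that step size alone is not enough.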

v0.87 adds branch-balanced-context-profile-baseline-floor-repaired-prompt-ownership-target-share-preserving-deficit-unlikelihood. It keeps the adaptive guard and adds one bounded baseline-covered anchor repair before each failed retry is accepted or rejected. The screen shows that post-update repair is not enough: all repaired attempts are still rejected.

v0.88 adds branch-balanced-context-profile-baseline-floor-objective-prompt-ownership-target-share-preserving-deficit-unlikelihood. It puts balanced baseline-floor anchors inside the same loss and backward pass as the branch-diversity pressure. The screen shows that the combined objective is still not enough: all objective-shaped attempts are rejected.

v0.89 adds branch-context-profile-baseline-floor-stabilization-unlikelihood. It removes branch-diversity pressure from guarded attempts and trains only baseline-covered floor anchors. The screen shows that floor-only stabilization is still not enough: all stabilization-shaped attempts are rejected.

v0.90 adds baseline-floor rejection diagnostics to the same stabilization mode. The guard now records rejected update-shape counts, rejected learning-rate scale counts, violation profile counts, compact per-attempt floor diagnostics, and the worst rejected coverage violation.

v0.91 adds branch-context-profile-baseline-floor-profile-targeted-stabilization-unlikelihood. It covers every baseline-covered floor-anchor profile-target group in each guarded attempt. The screen shows that broader floor-anchor coverage is still not enough: all profile-targeted attempts are rejected with the same violation pattern as v0.90.

v0.92 adds branch-context-profile-baseline-floor-sequential-profile-stabilization-unlikelihood. It changes the repair shape to sequential source-profile floor batches with rollback after each unsafe profile group. The screen shows that source-profile ordering is still not enough: all profile-local attempts are rejected before any effective guarded update survives.

v0.93 adds branch-context-profile-baseline-floor-calibrated-sequential-profile-stabilization-unlikelihood. It keeps the sequential rollback shape, extends calibrated adaptive scales below 0.01, and uses coverage-only guard probes. The diagnostic screen accepts the first nonzero source-profile update that preserves the baseline floor.

v0.94 adds branch-context-profile-baseline-floor-profile-scale-calibrated-sequential-profile-stabilization-unlikelihood. It searches calibrated scales separately for each source profile, keeps the first safe update for that profile, and rolls back only unsafe profile-scale attempts. The diagnostic screen accepts eight source-profile updates while the baseline floor remains preserved.

v0.95 adds branch-context-profile-baseline-floor-diversity-profile-scale-calibrated-sequential-profile-stabilization-unlikelihood. It keeps the profile-scale search, then accepts a source-profile update only when it preserves the baseline coverage floor and does not regress the branch-diversity score from that profile's pre-update state. The diagnostic screen accepts five score-improving source-profile updates and rejects eleven floor-preserving score regressions before promotion still blocks on branch diversity.

v0.96 adds branch-context-profile-baseline-floor-diversity-frontier-profile-scale-calibrated-sequential-profile-stabilization-unlikelihood. It adds missing-target frontier anchors to each eligible source profile before the same floor and diversity acceptance gates run. The diagnostic screen accepts nine score-improving source-profile updates, lowers max dominant predicted rate to 0.9, and raises minimum target-token coverage to 0.1667 before promotion still blocks on branch diversity.

v0.97 adds branch-context-profile-baseline-floor-diversity-coverage-frontier-profile-scale-calibrated-sequential-profile-stabilization-unlikelihood. It keeps the missing-target frontier anchors but accepts a profile-scale update only when the candidate preserves the baseline floor and gains target-token coverage over that profile's pre-update branch snapshot. The diagnostic screen accepts one coverage-gaining source-profile update, rejects coverage ties and coverage regressions explicitly, and shows the strict monotonic screen is auditable but too conservative to recover full branch diversity yet.

v0.98 adds branch-context-profile-baseline-floor-diversity-coverage-prep-frontier-profile-scale-calibrated-sequential-profile-stabilization-unlikelihood. It keeps the v0.97 coverage audit but allows coverage-tied preparation moves when they improve the branch-diversity score. The diagnostic screen restores the nine accepted source-profile frontier updates while separating three coverage gains from six coverage-preparation moves, so the self-improvement ledger can distinguish real coverage recovery from safe setup movement.
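The v0.97 and v0.98 acceptance rules can be summarized as a small classifier over one profile-scale candidate. The label strings, the argument list, and the allow_preparation switch (False approximates v0.97, True approximates v0.98) are assumptions for illustration.

```python
def classify_profile_update(floor_ok, coverage_before, coverage_after,
                            score_before, score_after, allow_preparation=True):
    """Hypothetical ledger label for a source-profile update candidate.

    coverage_*: target-token coverage of the profile's branch snapshot.
    score_*: branch-diversity score before and after the candidate update.
    """
    if not floor_ok:
        return "reject_floor"
    if coverage_after > coverage_before:
        return "coverage_gain"
    if coverage_after < coverage_before:
        return "reject_coverage_regression"
    # Coverage tie: accepted only as preparation, and only if diversity improves.
    if allow_preparation and score_after > score_before:
        return "coverage_preparation"
    return "reject_coverage_tie"
```

Keeping coverage_gain and coverage_preparation as distinct labels is what lets the ledger separate real coverage recovery from safe setup movement.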

v0.99 adds branch-context-profile-baseline-floor-diversity-coverage-recovery-frontier-profile-scale-calibrated-sequential-profile-stabilization-unlikelihood. It keeps the v0.98 preparation path but gives each safe preparation candidate a small missing-target recovery retry before falling back to the prepared state. The diagnostic screen accepts six source-profile updates, converts two preparation candidates into direct coverage recoveries, keeps four preparation fallbacks, and keeps promotion blocked on branch diversity.

v0.100.0 adds branch-context-profile-baseline-floor-diversity-branch-stable-coverage-recovery-frontier-profile-scale-calibrated-sequential-profile-stabilization-unlikelihood. It keeps the v0.99 recovery retry but adds a branch-stability check: a recovery candidate must preserve the branch-diversity score of its prepared state before the recovered weights are accepted. The diagnostic screen keeps the two coverage recoveries, records fifteen branch-stability checks, rejects one retry for branch-score regression, and keeps promotion blocked on branch diversity.

v0.101.0 adds branch-context-profile-baseline-floor-diversity-branch-stable-coverage-recovery-branch-diversity-frontier-profile-scale-calibrated-sequential-profile-stabilization-unlikelihood. It keeps the v0.100.0 branch-stable recovery guard and adds a bounded branch-diversity recovery step after already-safe profile updates. The diagnostic screen accepts six source-profile updates, runs branch-diversity recovery on all six, keeps five branch-score-improving refinements, falls back once, and keeps promotion blocked on branch diversity.

v0.102.0 adds branch-context-profile-baseline-floor-diversity-branch-stable-coverage-recovery-branch-diversity-collapsed-profile-binding-frontier-profile-scale-calibrated-sequential-profile-stabilization-unlikelihood. It keeps the v0.101.0 branch-diversity recovery guard and adds a bounded collapsed-profile binding step after already-safe profile updates. The diagnostic screen accepts eleven source-profile updates, keeps four branch-diversity refinements, accepts one collapsed-profile binding update, narrows final collapse from nine eval profiles to three, and keeps promotion blocked on branch diversity.

v0.103.0 adds branch-context-profile-baseline-floor-diversity-branch-stable-coverage-recovery-branch-diversity-collapsed-profile-binding-remaining-profile-frontier-profile-scale-calibrated-sequential-profile-stabilization-unlikelihood. It keeps the v0.102.0 binding guard and prioritizes source-profile groups for the remaining collapsed eval profiles: learning, owner, and paraphrase coverage through color, owner, place, and training_data source labels. The diagnostic screen accepts eleven source-profile updates, records twenty-one prioritized attempts, accepts six prioritized updates, improves learning coverage from 0.0 to 0.25, preserves target coverage, and keeps promotion blocked on branch diversity.

v0.104.0 adds branch-context-profile-baseline-floor-diversity-branch-stable-coverage-recovery-branch-diversity-collapsed-profile-binding-remaining-profile-owner-paraphrase-frontier-profile-scale-calibrated-sequential-profile-stabilization-unlikelihood. It keeps the v0.103.0 remaining-profile curriculum, narrows residual binding targets to owner and paraphrases, and protects learning as a preserved profile. The diagnostic screen records sixteen owner/paraphrase-prioritized attempts, accepts six prioritized source-profile updates, runs seventy-five learning-preservation checks, rejects twenty-four preservation failures, keeps learning non-collapsed at coverage 0.25, and keeps promotion blocked on branch diversity.

v0.105.0 adds deterministic closed-world retrieval memory in src/closed_world_lm/memory_retrieval.py. Transformer answer-training runs now write retrieval_memory_report.json as a separate artifact from neural weight metrics. The diagnostic screen at runs/transformer-answer-v0.105.0-retrieval-memory-owner-paraphrase-frontier-profile-scale-step1-dim4-context80/ builds 497 memory cards from story facts, admitted memories, self facts, learning rules, and glossary entries; answers 219/219 eval probes exactly; uses no external model, no pretrained retriever, and no external embeddings; and performs no weight updates. The direct-answer transformer screen remains blocked on branch diversity, so retrieval success is evidence for the memory-first rail, not neural promotion.
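The sketch below shows what deterministic, model-free retrieval over memory cards could look like. It uses lexical word overlap with a fixed tie-break and no learned weights or embeddings. The overlap scoring, the card fields, and the function names are assumptions. The page does not describe how memory_retrieval.py actually ranks cards.

```python
import re


def tokens(text):
    # Lowercased alphanumeric words; no external tokenizer or embeddings.
    return frozenset(re.findall(r"[a-z0-9]+", text.lower()))


def build_cards(entries):
    """entries: (key_text, answer, source) triples, e.g. story facts or glossary items."""
    return [{"key": tokens(key), "answer": answer, "source": source}
            for key, answer, source in entries]


def retrieve(cards, question):
    """Return the answer of the highest-overlap card, or None if nothing overlaps."""
    query = tokens(question)
    best_score, best_card = 0, None
    for card in cards:
        score = len(query & card["key"])
        # Strict '>' keeps the earliest card on ties, so results are deterministic.
        if score > best_score:
            best_score, best_card = score, card
    return best_card["answer"] if best_card else None
```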

v0.106.0 adds deterministic memory-guided consolidation planning in src/closed_world_lm/memory_consolidation.py. Transformer answer-training runs now write memory_consolidation_plan.json after retrieval and branch diagnostics are available. The diagnostic screen at runs/transformer-answer-v0.106.0-memory-guided-consolidation-owner-paraphrase-frontier-profile-scale-step1-dim4-context80/ keeps retrieval at 219/219, records 9 memory-backed neural failed profiles, and ranks owner, paraphrases, glossary, admission_paraphrases, and admissions as the top consolidation priorities. It identifies collapsed memory-backed profiles owner, paraphrases, and glossary; the transformer still rejects promotion on branch diversity.

v0.107.0 adds gated memory-consolidation direct-answer training. The diagnostic screen at runs/transformer-answer-v0.107.0-gated-memory-consolidation-owner-paraphrase-glossary-frontier-profile-scale-step1-dim4-context80/ loads the v0.106.0 consolidation plan as a declared source artifact, consumes owner, paraphrases, and glossary as the target profile list, records 26 memory-consolidation prioritized attempts with 8 acceptances and 18 rejections, keeps retrieval exact at 219/219, and still rejects promotion on branch_diversity_target. Retrieval remains separate memory evidence; the new mode proves a plan-to-weight-update handoff under gates, not completed neural learning.

v0.108.0 expands the memory-consolidation target window. Target-only profiles such as heldout, qa, admissions, and admission_paraphrases now map back to admitted corpus source labels before replay ordering. The diagnostic screen at runs/transformer-answer-v0.108.0-expanded-memory-consolidation-owner-paraphrase-heldout-qa-glossary-frontier-profile-scale-step1-dim4-context80/ consumes the v0.107.0 plan with five target profiles: owner, paraphrases, heldout, qa, and glossary. Retrieval remains exact at 219/219, the guard again records 26 prioritized attempts with 8 acceptances and 18 rejections, and promotion still rejects on branch_diversity_target. The next mechanic should directly target missing first-token diversity for that expanded profile set.

v0.109.0 adds that missing first-token memory-consolidation pressure. The diagnostic screen at runs/transformer-answer-v0.109.0-missing-first-token-memory-consolidation-owner-paraphrase-heldout-qa-glossary-frontier-profile-scale-step1-dim4-context80/ consumes the v0.108.0 plan, preserves the five-profile target window, extracts plan-derived missing first-token maps, and records 8 missing-token candidates, 22 missing-token attempts, 1 accepted guarded coverage-gain update, 21 rejections, and 7 fallback acceptances. Retrieval remains exact at 219/219; promotion still rejects on branch_diversity_target, and the next plan narrows remaining collapsed memory-backed profiles to owner, paraphrases, and learning.
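A missing first-token map can be sketched as the set difference between gold first tokens and currently predicted first tokens, computed per profile. The input shape and the function name are hypothetical. Treating the first non-space character as the first token is an assumption based on the character tokenizer.

```python
def missing_first_token_map(rows_by_profile):
    """Hypothetical: gold first tokens never predicted at the branch position, per profile.

    rows_by_profile maps a profile name to (gold_answer, predicted_first_token)
    pairs. With a character tokenizer, an answer's first token is taken to be its
    first non-space character (an assumption for this sketch).
    """
    result = {}
    for profile, rows in sorted(rows_by_profile.items()):
        gold = {answer.strip()[0] for answer, _ in rows if answer.strip()}
        predicted = {prediction for _, prediction in rows}
        missing = sorted(gold - predicted)
        if missing:
            result[profile] = missing  # profiles with full first-token coverage are omitted
    return result
```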

v0.110.0 consumes that narrowed plan directly. The diagnostic screen at runs/transformer-answer-v0.110.0-remaining-collapsed-missing-first-token-memory-consolidation-owner-paraphrase-learning-frontier-profile-scale-step1-dim4-context80/ requires source-plan collapsed_memory_backed_profiles, targets only owner, paraphrases, and learning, and records the remaining-collapsed target contract in the replay plan and direct-answer guard. Retrieval remains exact at 219/219; the missing-token phase records 6 candidates, 16 attempts, 1 accepted guarded coverage-gain update, 15 rejections, and 5 fallback acceptances. Promotion still rejects on branch_diversity_target, so the next mechanic should repair those three profiles with more profile-specific pressure.

v0.111.0 adds that profile-specific pressure. The diagnostic screen at runs/transformer-answer-v0.111.0-profile-specific-missing-first-token-memory-consolidation-owner-paraphrase-learning-frontier-profile-scale-step1-dim4-context80/ consumes the v0.110.0 plan, keeps owner, paraphrases, and learning as the target profiles, and maps each admitted source label to only the unresolved targets it can support: learning -> learning, owner -> owner/paraphrases, and color/place/training_data -> paraphrases. Retrieval remains exact at 219/219; memory-prioritized consolidation records 16 attempts with 6 acceptances and 10 rejections. The profile-specific missing-token phase records 6 candidates, 18 attempts, 0 direct missing-token acceptances, 18 rejections, and 6 fallbacks, while the guard records 1 accepted profile-specific update shape. Promotion still rejects on branch_diversity_target, so the next mechanic should use the per-profile acceptance deltas to repair paraphrases, owner, and re-emergent glossary collapse.

v0.112.0 pauses repair-objective churn and adds branch-diversity root-cause diagnostics. The diagnostic screen at runs/transformer-answer-v0.112.0-branch-diversity-root-cause-profile-specific-memory-consolidation-step1-dim4-context80/ consumes the v0.111.0 plan, targets owner, paraphrases, and glossary, and keeps retrieval exact at 219/219. It records 24 memory-prioritized attempts with 8 acceptances and 16 rejections, plus 24 profile-specific missing-token attempts with 0 direct missing-token acceptances, 24 rejections, and 8 fallbacks. The new branch_diversity_target.root_cause report classifies the final failure as a critical target_routing_gap: 9/9 profiles fail, 3 remain collapsed, 1 has zero target-token coverage, and 6 have buried targets. Promotion still rejects on branch_diversity_target, so the next mechanic should audit routing, representation separation, and profile/target imbalance before adding another objective.

v0.113.0 adds that routing audit to direct-answer snapshots. The diagnostic screen at runs/transformer-answer-v0.113.0-branch-routing-audit-profile-specific-memory-consolidation-step1-dim4-context80/ consumes the v0.112.0 plan, targets owner, paraphrases, and learning, and keeps retrieval exact at 219/219. It records 16 memory-prioritized attempts with 6 acceptances and 10 rejections, plus 18 profile-specific missing-token attempts with 0 direct missing-token acceptances, 18 rejections, and 6 fallbacks. The root cause remains a critical target_routing_gap, and branch_routing_audit reports high output-bias escape risk, low representation separation across 9/9 multi-target profiles, and a glossary target-imbalance hotspot. Promotion still rejects on branch_diversity_target.

v0.114.0 adds branch_logit_prior_profiles and centroid separation metrics to direct-answer snapshots. The diagnostic screen at runs/transformer-answer-v0.114.0-logit-prior-representation-instrumentation-profile-specific-memory-consolidation-step1-dim4-context80/ consumes the v0.113.0 plan, targets owner, paraphrases, and glossary, and keeps retrieval exact at 219/219. It records 24 profile-specific missing-token attempts with 0 direct missing-token acceptances, 24 rejections, and 8 fallbacks. The root cause remains a critical target_routing_gap; output-bias risk remains high, but logit-prior decomposition says dominant-token wins are hidden-projection driven across 9/9 multi-target profiles. Promotion still rejects on branch_diversity_target.
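A logit-prior decomposition can be sketched by assuming a linear output head, logit_v = h · W_v + b_v. The gap between the dominant token and the target then splits exactly into a hidden-projection part and an output-bias part. The function name, the row-per-token weight layout, and the "larger part is the driver" rule are assumptions for illustration.

```python
def decompose_win(hidden, weights, bias, dominant, target):
    """Hypothetical split of a dominant-token win into hidden-projection and bias parts.

    hidden: final hidden state before the output head.
    weights: one output-weight row per token id; bias: one output bias per token id.
    """
    def projection(token):
        return sum(h * w for h, w in zip(hidden, weights[token]))

    hidden_gap = projection(dominant) - projection(target)
    bias_gap = bias[dominant] - bias[target]
    # Assumed rule: whichever part contributes more to the gap is called the driver.
    driver = "hidden_projection" if hidden_gap >= bias_gap else "output_bias"
    return {"hidden_gap": hidden_gap, "bias_gap": bias_gap,
            "total_gap": hidden_gap + bias_gap, "driver": driver}
```

This split also clarifies why freezing the output bias alone cannot repair a hidden-projection-driven collapse: the bias term is the smaller part of the gap.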

Experimental representation flags:

  • --use-context-mean, added to either train or answer-train, tests the experimental mean-pooled context residual in the final transformer representation. It is diagnostic architecture evidence only until it improves prompt-conditioned branch profiles and complete answer metrics.
  • --use-context-projection tests a zero-initialized trainable projection of that context summary. It starts baseline-equivalent and must prove that its learned parameters improve branch profiles before it can be promoted.
  • --use-prompt-prefix-projection tests a zero-initialized trainable projection of non-padding prompt-prefix positions before the final answer token.
  • --use-prompt-attention-summary tests a trainable attention-pooled summary of the current context through a zero-initialized output projection. It is also diagnostic until branch profiles improve.
  • --use-prompt-position-projection tests a zero-initialized position-specific projection of non-padding prompt-prefix positions.
  • --prompt-position-projection-scale scales that prompt-position projection residual before it is added to the final branch representation.

Direct-answer gating, snapshot, and restore controls:

  • --direct-answer-require-branch-context-gate skips direct-answer training unless branch contexts are semantically complete and unambiguous.
  • --direct-answer-snapshot-mode branch-only is for bounded longer-context screens that need branch profiles and branch-context gate evidence but can intentionally skip greedy completion evals in direct-answer JSONL snapshots.
  • --direct-answer-freeze-output-bias excludes the transformer output bias from direct-answer updates, for screening whether a branch objective is learning prompt-specific weights rather than moving one global token bias.
  • --direct-answer-restore-best-branch-snapshot restores the best-scored branch-diversity checkpoint before final metrics and checkpoint writing.
  • --direct-answer-hard-negatives sets the hard wrong-token pool size: how many of the model's current top wrong tokens each branch target is margined against in the rank-margin modes, and the top-wrong-token candidate count in the top-k softmax and coverage-binding modes.
  • --direct-answer-contrast-weight controls the restricted-softmax loss weight.

Direct-answer snapshot diagnostics:

  • branch_diversity_target fails when multi-target eval profiles collapse to too few predicted branch tokens.
  • branch_representation_profiles let runs measure hidden-state pairwise distance before the output head.
  • Branch profiles also include target-rank diagnostics: average target rank, top-3/top-5 target coverage, and the top predicted alternatives on failed branch records.

Branch objectives are selected with --direct-answer-mode. Where a balanced variant is listed, it applies the same objective with target-balanced branch batches:

  • branch-diversity-unlikelihood trains distinct branch targets while also suppressing each branch context's current wrong prediction.
  • branch-target-softmax-unlikelihood adds a restricted softmax over the distinct branch targets in each batch, making the right target compete directly against the other observed branch targets.
  • branch-target-margin-unlikelihood adds a smooth pairwise target-margin loss over the distinct branch targets in each batch.
  • branch-representation-contrast-unlikelihood penalizes nearly identical hidden states for different branch targets. Its variant branch-balanced-representation-contrast-unlikelihood builds the representation-contrast batch from target buckets so frequent first answer tokens cannot crowd out rare branch targets.
  • branch-output-binding-unlikelihood combines restricted branch-target softmax with branch representation contrast in the same update.
  • branch-rank-margin-unlikelihood (balanced: branch-balanced-rank-margin-unlikelihood) pushes each branch target above the model's current top wrong tokens.
  • branch-topk-softmax-unlikelihood (balanced: branch-balanced-topk-softmax-unlikelihood) trains each branch target against a restricted softmax over the target and the model's current top wrong tokens.
  • branch-bidirectional-binding-unlikelihood (balanced: branch-balanced-bidirectional-binding-unlikelihood) binds prompt contexts and branch targets in both directions: row-wise target choice inside each prompt context, and column-wise target-token ownership across prompt contexts.
  • branch-coverage-binding-unlikelihood (balanced: branch-balanced-coverage-binding-unlikelihood) combines bidirectional binding with hard-wrong-token competition and a target-set mass coverage guard.
  • branch-target-set-coverage-unlikelihood (balanced: branch-balanced-target-set-coverage-unlikelihood) trains only target-set mass against hard wrong tokens before exact-target sharpening.
  • branch-target-diversity-unlikelihood (balanced: branch-balanced-target-diversity-unlikelihood) keeps target-set mass pressure while adding an explicit target-share diversity term over the branch target set.
  • branch-target-replay-coverage-unlikelihood (balanced, with target-balanced sampled branch batches: branch-balanced-target-replay-coverage-unlikelihood) applies target-set mass and target-share balance over the broader admitted branch training pool at the same branch position.

Replay-context objectives are also selected with --direct-answer-mode. Their balanced variants use target-balanced sampled branch and replay batches:

  • branch-context-replay-coverage-unlikelihood (balanced: branch-balanced-context-replay-coverage-unlikelihood) trains each sampled replay branch context to own its own target within the replay target set.
  • branch-context-coverage-anchor-unlikelihood (balanced: branch-balanced-context-coverage-anchor-unlikelihood) adds a covered-target anchor for replay branches whose own target is already top-1.
  • branch-context-target-balanced-anchor-unlikelihood (balanced: branch-balanced-context-target-balanced-anchor-unlikelihood) averages covered-target anchors by covered target and skips singleton covered-target batches.
  • branch-context-coverage-deficit-unlikelihood (balanced: branch-balanced-context-coverage-deficit-unlikelihood) identifies replay target tokens that are absent from current replay predictions and focuses extra target pressure on those missing targets.
  • branch-context-coverage-preserving-deficit-unlikelihood (balanced: branch-balanced-context-coverage-preserving-deficit-unlikelihood) combines missing-target pressure with target-balanced preservation anchors for target tokens currently represented in replay predictions.
  • branch-context-profile-coverage-preserving-deficit-unlikelihood (balanced: branch-balanced-context-profile-coverage-preserving-deficit-unlikelihood) computes those deficits and preservation anchors inside each admitted source/profile instead of one global replay target set.

Profile-aware modes emit direct_answer_replay_plan.json with branch counts, replay counts, target ids, represented target ids, missing target ids, and coverage floors by profile before direct-answer training starts. Best-branch-snapshot scoring first enforces a profile-wise target-token coverage floor against the baseline snapshot. Eligible snapshots are then ranked by target-rank/top-k evidence before generic wrong-token diversity, so restore prefers snapshots that move correct targets upward without trading away coverage.

v0.51 adds opt-in foundation-stack controls before the next repair objective:

  • --optimizer adamw
  • --gradient-accumulation-steps
  • warmup/decay schedule flags
  • --resume-checkpoint and --resume-optimizer
  • --attention-heads
  • --use-rms-norm, --use-gated-mlp, --tie-output-embeddings, --use-rotary-positions, and --use-kv-cache-path
  • generation sampling controls
  • eval --samples-jsonl trace artifacts

Use STRUCTURE_AUDIT.md before adding the next transformer repair objective: QuarkLM may study open-source model/trainer/tokenizer/checkpoint structure, but must not import external weights, tokenizers, embeddings, datasets, or training text. Use --use-pre-layer-norm to run the audited opt-in GPT-style pre-layer-norm block path with final normalization before the language-model head.
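The target-rank diagnostics in branch profiles can be sketched from raw logits: the 1-based rank of each gold target, top-3/top-5 coverage, and the top alternatives on failed records. The record shape, the function name, and the reading of "coverage" as a fraction of branch records (rather than of distinct targets) are assumptions.

```python
def target_rank_diagnostics(records, top_alternatives=3):
    """Hypothetical branch-profile rank summary.

    records: (logits, target_id) pairs, one per branch record.
    """
    ranks, failed = [], []
    for logits, target in records:
        # Sort token ids by descending logit; ties break toward the lower id.
        order = sorted(range(len(logits)), key=lambda i: (-logits[i], i))
        rank = order.index(target) + 1  # 1 means the target is the top prediction
        ranks.append(rank)
        if rank > 1:
            failed.append({"target": target, "top_alternatives": order[:top_alternatives]})
    n = len(ranks)
    return {
        "average_target_rank": sum(ranks) / n,
        "top3_coverage": sum(r <= 3 for r in ranks) / n,
        "top5_coverage": sum(r <= 5 for r in ranks) / n,
        "failed": failed,
    }
```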

Current language-model evidence from runs/transformer-v0.25/:

| Signal | Value |
| --- | --- |
| Steps | 40 |
| Validation NLL | 3.5885 -> 3.4382 |
| Answer exact eval | 0/28 |
| Pretrained weights | false |
| Pretrained tokenizer | false |

Current promoted answer-lesson evidence from runs/transformer-answer-v0.42-branch-repair-contrast50-dim8-context32/:

| Signal | Value |
| --- | --- |
| Steps | 80 |
| Context size | 32 |
| Embedding / feed-forward dimensions | 8 / 16 |
| Candidate scope | eval |
| Direct answer steps | 1000 |
| Direct answer mode | periodic-branch-repair-contrast-unlikelihood |
| Direct answer negative weight | 1.0 |
| Direct answer positive weight | 1.0 |
| Direct answer contrast weight | 1.0 |
| Direct answer branch position | 1 |
| Direct answer rollout interval | 50 |
| Direct answer training examples | 9144 |
| Direct answer exact | 0/219 -> 0/219 |
| Direct answer target loss | 3.4278 -> 2.2708 |
| Direct answer uses candidates | false |
| Direct answer auxiliary weights | false |
| Answer target NLL | 3.5850 -> 2.4129 |
| Transformer-only candidate accuracy | 15/219 -> 37/219 |
| Selector-emitted exact answers | 18/219 -> 219/219 |
| Selector candidate accuracy | 18/219 -> 219/219 |
| v0.31 generator exact without candidates | 0/219 -> 219/219 |
| v0.31 generator target loss | 3.3160 -> 0.0029 |
| Pretrained weights | false |
| Pretrained tokenizer | false |
| External embeddings | false |
| v0.31 generator uses answer candidates | false |
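
The direct-answer modes in the table above pair a positive (target) term with unlikelihood pressure on wrong tokens. Below is a minimal sketch of the standard token-level likelihood-plus-unlikelihood loss, assuming the two weights play roles like --direct-answer-positive-weight and --direct-answer-negative-weight. The function names are hypothetical and the project's exact loss may differ.

```python
import math


def softmax(logits):
    # Subtract the max logit for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def unlikelihood_loss(logits, target, negatives, positive_weight=1.0, negative_weight=1.0):
    """Likelihood on the target plus unlikelihood on each wrong token."""
    probs = softmax(logits)
    likelihood = -math.log(probs[target])
    # Penalize probability mass on each distinct wrong token: -log(1 - p(wrong)).
    unlikelihood = -sum(
        math.log(max(1.0 - probs[token], 1e-12))
        for token in set(negatives)
        if token != target
    )
    return positive_weight * likelihood + negative_weight * unlikelihood
```

Setting the negative weight to 0.0 reduces this to plain cross-entropy on the target, which is a convenient sanity check.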

Latest bounded stacked-transformer screen:

| Signal | Value |
| --- | --- |
| Run | runs/transformer-answer-v0.43-two-layer-toponly-skip-screen-dim8-context32/ |
| Layers | 2 |
| Steps | 40 target-loss + 80 direct-answer |
| Direct-answer update scope | top layer and language-model head only |
| Post-direct candidate snapshot | skipped and recorded in metrics |
| Pre-direct candidate accuracy | 15/219 -> 15/219 |
| Pre-direct answer target NLL | 3.5855 -> 3.4796 |
| Direct answer target loss | 3.5186 -> 3.2436 |
| Direct answer exact | 0/219 -> 0/219 |
| Failure pattern | repeated "a" greedy completion |
| Promotion status | screening evidence only |

Latest direct-answer diagnostic smoke:

| Signal | Value |
| --- | --- |
| Run | runs/transformer-answer-v0.43-branch-profile-smoke-dim4-context16/ |
| Diagnostic | branch profiles from model logits |
| Branch position | 1 |
| Smoke steps | 5 target-loss + 5 direct-answer |
| Post-direct candidate snapshot | skipped and recorded in metrics |
| QA branch accuracy | 1/8 -> 1/8 |
| Dominant QA branch prediction | all "o" -> all "y" |
| Final QA target margin | negative, about -0.0048 |
| Promotion status | diagnostic smoke only |

Latest branch repair smoke:

| Signal | Value |
| --- | --- |
| Selected comparison run | runs/transformer-answer-v0.43-periodic-branch-batch-smoke-dim4-context16/ |
| Prior rejected repair | runs/transformer-answer-v0.43-periodic-branch-collapse-smoke-dim4-context16/ |
| Mode | periodic-branch-batch-contrast-unlikelihood |
| Branch batch size | 4 |
| Rollout interval | 5 |
| Steps | 5 target-loss + 20 direct-answer |
| Direct answer loss | 3.5800 -> 3.5248 |
| QA branch accuracy | 1/8 -> 0/8 |
| Dominant QA branch prediction | all "o" -> all "a" |
| Promotion status | rejected repair evidence |

Latest representation-side smoke:

| Signal | Value |
| --- | --- |
| Selected run | runs/transformer-answer-v0.43-context-mean-branch-repair-smoke-dim4-context16/ |
| Comparison run | runs/transformer-answer-v0.43-context-mean-branch-batch-smoke-dim4-context16/ |
| Representation option | --use-context-mean |
| Selected mode | periodic-branch-repair-unlikelihood |
| Comparison mode | periodic-branch-batch-contrast-unlikelihood |
| Steps | 5 target-loss + 20 direct-answer |
| Post-direct candidate snapshot | skipped and recorded in metrics |
| Selected direct answer loss | 3.5805 -> 3.5310 |
| Comparison direct answer loss | 3.5805 -> 3.5252 |
| Selected QA branch accuracy | 1/8 -> 0/8 |
| Comparison QA branch accuracy | 1/8 -> 0/8 |
| Dominant QA branch prediction | all "o" -> all "a" in both screens |
| Promotion status | rejected representation evidence |

Latest learned-representation smoke:

| Signal | Value |
| --- | --- |
| Selected run | runs/transformer-answer-v0.43-context-projection-branch-repair-smoke-dim4-context16/ |
| Comparison run | runs/transformer-answer-v0.43-context-projection-branch-batch-smoke-dim4-context16/ |
| Representation option | --use-context-projection |
| Selected mode | periodic-branch-repair-unlikelihood |
| Comparison mode | periodic-branch-batch-contrast-unlikelihood |
| Steps | 5 target-loss + 20 direct-answer |
| Post-direct candidate snapshot | skipped and recorded in metrics |
| Projection parameter movement | all 20 parameters moved in both screens |
| Selected direct answer loss | 3.5802 -> 3.5217 |
| Comparison direct answer loss | 3.5802 -> 3.5252 |
| Selected QA branch accuracy | 1/8 -> 0/8 |
| Comparison QA branch accuracy | 1/8 -> 0/8 |
| Dominant QA branch prediction | all "o" -> all "a" in both screens |
| Promotion status | rejected representation evidence |

Latest prompt-attention representation smoke:

| Signal | Value |
| --- | --- |
| Selected run | runs/transformer-answer-v0.43-prompt-attention-branch-repair-smoke-dim4-context16/ |
| Comparison run | runs/transformer-answer-v0.43-prompt-attention-branch-batch-smoke-dim4-context16/ |
| Representation option | --use-prompt-attention-summary |
| Selected mode | periodic-branch-repair-unlikelihood |
| Comparison mode | periodic-branch-batch-contrast-unlikelihood |
| Steps | 5 target-loss + 20 direct-answer |
| Post-direct candidate snapshot | skipped and recorded in metrics |
| Output projection movement | all 20 zero-initialized parameters moved in both screens |
| Selected direct answer loss | 3.5802 -> 3.5217 |
| Comparison direct answer loss | 3.5802 -> 3.5252 |
| Selected QA branch accuracy | 1/8 -> 0/8 |
| Comparison QA branch accuracy | 1/8 -> 0/8 |
| Dominant QA branch prediction | all "o" -> all "a" in both screens |
| Promotion status | rejected representation evidence |

Latest branch-context coverage diagnostic:

| Signal | Context 16 | Context 32 | Context 80 |
| --- | --- | --- | --- |
| Run | runs/transformer-answer-v0.43-branch-context-coverage-smoke-dim4-context16/ | runs/transformer-answer-v0.43-branch-context-coverage-smoke-dim4-context32/ | runs/transformer-answer-v0.43-branch-context-coverage-smoke-dim4-context80/ |
| QA semantic coverage | 0/8 | 0/8 | 8/8 |
| QA ambiguous branch contexts | 4 | 0 | 0 |
| All-eval semantic coverage | 0/219 | 53/219 | 219/219 |
| All-eval ambiguous branch contexts | 40 | 0 | 0 |
| Promotion status | diagnostic only | diagnostic only | diagnostic only |
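
One plausible reading of "ambiguous branch contexts" is a truncated context window that is shared by branches with different targets; longer windows separate them, which would fit the counts dropping to 0 once the window grows past 16 characters. The sketch below assumes that definition; the (prefix, target) input shape and the function name are hypothetical.

```python
from collections import defaultdict


def ambiguous_branch_contexts(examples, context_size):
    """Count branch examples whose truncated window is shared with a different target.

    examples: list of (prefix_text, target_char) pairs; context_size must be > 0.
    """
    targets_by_window = defaultdict(set)
    for prefix, target in examples:
        # The model only sees the last context_size characters before the branch.
        targets_by_window[prefix[-context_size:]].add(target)
    return sum(
        1 for prefix, _ in examples if len(targets_by_window[prefix[-context_size:]]) > 1
    )
```

For example, "aaaa-cat:" and "bbbb-cat:" collide at a 4-character window but not at a 9-character one.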

Latest branch-context gate smoke:

| Signal | Context 16 | Context 80 |
| --- | --- | --- |
| Run | runs/transformer-answer-v0.43-branch-context-gate-smoke-dim4-context16/ | runs/transformer-answer-v0.43-branch-context-gate-smoke-dim4-context80/ |
| Required gate | true | true |
| Gate status | failed | passed |
| Requested direct steps | 5 | 1 |
| Actual direct steps | 0 | 1 |
| Training skipped | true | false |
| Promotion status | guardrail evidence only | guardrail evidence only |

Latest branch-only snapshot smoke:

| Signal | Initial smoke | Repair/contrast screen | Branch-batch screen |
| --- | --- | --- | --- |
| Run | runs/transformer-answer-v0.43-branch-context-gated-branchonly-smoke-dim4-context80/ | runs/transformer-answer-v0.43-branchonly-periodic-repair-contrast50-dim8-context80/ | runs/transformer-answer-v0.43-branchonly-branch-batch-dim8-context80/ |
| Context size | 80 | 80 | 80 |
| Embedding/feed-forward dim | 4/8 | 8/16 | 8/16 |
| Snapshot mode | branch-only | branch-only | branch-only |
| Required gate | passed, 219/219 semantic records covered | passed, 219/219 semantic records covered | passed, 219/219 semantic records covered |
| Requested/actual direct steps | 5/5 | 100/100 | 50/50 |
| JSONL greedy evals skipped | true | true | true |
| QA branch profile | all "x" to all "r"; 1/8 final | all space to all "a"; 0/8 final | all space to all "a"; 0/8 final |
| Direct loss signal | smoke only | interval train loss 6.7890 -> 6.4326 | interval train loss 3.4614 -> 3.1976 |
| Promotion status | screening efficiency evidence only | rejected screening evidence | rejected screening evidence |

Latest branch-diversity target smoke:

| Signal | Value |
| --- | --- |
| Run | runs/transformer-answer-v0.43-branch-diversity-target-smoke-dim4-context80/ |
| Context gate | passed, 219/219 semantic records covered |
| Direct steps | 5/5 |
| Snapshot mode | branch-only |
| Diversity target | failed, 0/9 multi-target profiles passed |
| Final QA target/predicted unique | 8 / 1 |
| Final QA dominant prediction | "r" at rate 1.0 |
| Final QA target-token coverage | 0.125 |
| Promotion status | explicit target evidence only |
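
The diversity signals in these tables (target/predicted unique counts, dominant prediction and its rate, target-token coverage) can be derived from a list of branch targets and top-1 predictions. This sketch assumes target-token coverage is the share of distinct targets that appear anywhere among the predictions, which is consistent with the 8 / 1 and 0.125 values reported; the function name is hypothetical.

```python
from collections import Counter


def branch_diversity_summary(targets, predictions):
    """Summarize top-1 branch predictions against their targets (non-empty inputs)."""
    unique_targets = set(targets)
    unique_predictions = set(predictions)
    dominant, dominant_count = Counter(predictions).most_common(1)[0]
    return {
        "target_unique": len(unique_targets),
        "predicted_unique": len(unique_predictions),
        "dominant_prediction": dominant,
        "dominant_rate": dominant_count / len(predictions),
        # Fraction of distinct targets that some branch currently predicts.
        "target_token_coverage": len(unique_targets & unique_predictions) / len(unique_targets),
        "exact": sum(t == p for t, p in zip(targets, predictions)),
    }
```

Under this reading, a fully collapsed branch set scores 1/8 coverage when the collapsed token happens to be one of the eight targets and 0.0 when it is not.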

Latest branch-diversity training smoke:

| Signal | Value |
| --- | --- |
| Run | runs/transformer-answer-v0.43-branch-diversity-train-smoke-dim4-context80/ |
| Mode | branch-diversity-unlikelihood |
| Context gate | passed, 219/219 semantic records covered |
| Direct steps | 10/10 |
| Snapshot mode | branch-only |
| Diversity target | failed, 0/9 multi-target profiles passed |
| Final QA target/predicted unique | 8 / 1 |
| Final QA dominant prediction | all "b" |
| Final QA target-token coverage | 0.125 |
| Promotion status | rejected training-mode evidence |

Latest branch-diversity freeze-bias smoke:

| Signal | Value |
| --- | --- |
| Run | runs/transformer-answer-v0.43-branch-diversity-freezebias-smoke-dim4-context80/ |
| Mode | branch-diversity-unlikelihood |
| Stabilizer | --direct-answer-freeze-output-bias |
| Context gate | passed, 219/219 semantic records covered |
| Direct steps | 50/50 |
| Snapshot mode | branch-only |
| Diversity target | failed, 0/9 multi-target profiles passed |
| Direct answer train loss | 3.6149 -> 3.5016 |
| Final QA target/predicted unique | 8 / 1 |
| Final QA dominant prediction | all "w" |
| Final QA target-token coverage | 0.0 |
| Promotion status | rejected stabilizer evidence |

Latest branch-target softmax smoke:

| Signal | Value |
| --- | --- |
| Run | runs/transformer-answer-v0.43-branch-target-softmax-freezebias-smoke-dim4-context80/ |
| Mode | branch-target-softmax-unlikelihood |
| Stabilizer | --direct-answer-freeze-output-bias |
| Context gate | passed, 219/219 semantic records covered |
| Direct steps | 50/50 |
| Snapshot mode | branch-only |
| Diversity target | failed, 0/9 multi-target profiles passed |
| Composite train loss | 5.6671 -> 5.5820 |
| Best QA predicted unique | 2/8 at step 20 |
| Final QA target/predicted unique | 8 / 1 |
| Final QA dominant prediction | all "w" |
| Final QA target-token coverage | 0.0 |
| Promotion status | rejected target-set evidence |

Latest branch restore-best smoke:

| Signal | Value |
| --- | --- |
| Run | runs/transformer-answer-v0.43-branch-target-softmax-restorebest-smoke-dim4-context80/ |
| Mode | branch-target-softmax-unlikelihood |
| Stabilizers | --direct-answer-freeze-output-bias, --direct-answer-restore-best-branch-snapshot |
| Context gate | passed, 219/219 semantic records covered |
| Direct steps | 50/50 |
| Restored best branch snapshot | yes, from step 40 |
| Best branch score | [0.0, 0.0, -9.0, 0.0, 0.0946, 0.1409, 0.0] |
| Snapshot mode | branch-only |
| Diversity target | failed, 0/9 multi-target profiles passed |
| Final QA target/predicted unique | 8 / 1 |
| Final QA dominant prediction | all "u" |
| Final QA target-token coverage | 0.125 |
| Promotion status | rejected guardrail evidence |
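
The restore rule described for best-branch snapshots (a profile-wise coverage floor against the baseline first, then target-rank/top-k evidence, then generic diversity) can be sketched as a filter followed by a lexicographic comparison. The snapshot field names and the exact tie-breaker order are assumptions; the logged best-branch score vector above has more components than this sketch uses.

```python
def meets_coverage_floor(snapshot, baseline):
    """Eligible only if no profile drops below the baseline's target-token coverage."""
    return all(
        snapshot["coverage_by_profile"].get(profile, 0.0) >= floor
        for profile, floor in baseline["coverage_by_profile"].items()
    )


def rank_evidence_key(snapshot):
    # Larger tuples win: lower average target rank first, then top-5
    # target coverage, then generic predicted-token diversity.
    return (
        -snapshot["average_target_rank"],
        snapshot["top5_coverage"],
        snapshot["predicted_unique"],
    )


def select_restore_snapshot(baseline, snapshots):
    candidates = [baseline] + [s for s in snapshots if meets_coverage_floor(s, baseline)]
    # max() keeps the first maximal element, so ties resolve to the baseline.
    return max(candidates, key=rank_evidence_key)
```

Because the floor is checked before rank, a snapshot with a much better average rank but collapsed coverage can never displace the baseline.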

Latest prompt-prefix projection smoke:

| Signal | Value |
| --- | --- |
| Run | runs/transformer-answer-v0.43-prompt-prefix-target-softmax-restorebest-smoke-dim4-context80/ |
| Representation option | --use-prompt-prefix-projection |
| Mode | branch-target-softmax-unlikelihood |
| Stabilizers | --direct-answer-freeze-output-bias, --direct-answer-restore-best-branch-snapshot |
| Context gate | passed, 219/219 semantic records covered |
| Direct steps | 50/50 |
| Prompt-prefix projection movement | all 20 parameters moved, max abs about 0.0942 |
| Composite train loss | 5.6649 -> 5.5679 |
| Restored best branch snapshot | yes, from step 40 |
| Diversity target | failed, 0/9 multi-target profiles passed |
| Final QA target/predicted unique | 8 / 1 |
| Final QA dominant prediction | all "u" |
| Final QA target-token coverage | 0.125 |
| Promotion status | rejected representation evidence |

Latest prompt-position projection smoke:

| Signal | Value |
| --- | --- |
| Run | runs/transformer-answer-v0.43-prompt-position-target-softmax-restorebest-smoke-dim4-context80/ |
| Representation option | --use-prompt-position-projection |
| Mode | branch-target-softmax-unlikelihood |
| Stabilizers | --direct-answer-freeze-output-bias, --direct-answer-restore-best-branch-snapshot |
| Context gate | passed, 219/219 semantic records covered |
| Direct steps | 50/50 |
| Prompt-position projection movement | 1108/1284 parameters moved, max abs about 0.0942 |
| Composite train loss | 5.6649 -> 5.5679 |
| Restored best branch snapshot | yes, from step 40 |
| Diversity target | failed, 0/9 multi-target profiles passed |
| Final QA target/predicted unique | 8 / 1 |
| Final QA dominant prediction | all "u" |
| Final QA target-token coverage | 0.125 |
| Promotion status | rejected representation evidence |

Latest branch-target margin smoke:

| Signal | Value |
| --- | --- |
| Run | runs/transformer-answer-v0.43-branch-target-margin-prompt-position-smoke-dim4-context80/ |
| Mode | branch-target-margin-unlikelihood |
| Representation option | --use-prompt-position-projection |
| Stabilizers | --direct-answer-freeze-output-bias, --direct-answer-restore-best-branch-snapshot |
| Context gate | passed, 219/219 semantic records covered |
| Direct steps | 50/50 |
| Prompt-position projection movement | 1108/1284 parameters moved, max abs about 0.1096 |
| Train loss | 4.8973 -> 4.7784 |
| Restored best branch snapshot | yes, from step 40 |
| Diversity target | failed, 0/9 multi-target profiles passed |
| Final QA target/predicted unique | 8 / 1 |
| Final QA dominant prediction | all "u" |
| Final QA target-token coverage | 0.125 |
| Promotion status | rejected target-margin evidence |

Latest branch-representation contrast smoke:

| Signal | Value |
| --- | --- |
| Run | runs/transformer-answer-v0.43-branch-representation-contrast50-prompt-position-smoke-dim4-context80/ |
| Mode | branch-representation-contrast-unlikelihood |
| Representation option | --use-prompt-position-projection |
| Representation contrast weight | 50.0 |
| Stabilizers | --direct-answer-freeze-output-bias, --direct-answer-restore-best-branch-snapshot |
| Context gate | passed, 219/219 semantic records covered |
| Direct steps | 50/50 |
| Snapshot diagnostic | branch_representation_profiles |
| Train loss | 53.5827 -> 53.4342 |
| Restored best branch snapshot | yes, from step 40 |
| Diversity target | failed, 0/9 multi-target profiles passed |
| Final QA target/predicted unique | 8 / 1 |
| Final QA dominant prediction | all "u" |
| Final QA target-token coverage | 0.125 |
| Final QA different-target hidden distance | avg about 0.00107, max about 0.00237 |
| Promotion status | rejected representation-contrast evidence |
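
The different-target hidden distance can be read as a pairwise distance between branch hidden states whose targets differ: values near zero mean the model represents branches that need different answers almost identically. Here is a sketch assuming Euclidean distance over all such pairs; the metric, the pairing rule, and the function name are assumptions.

```python
import math
from itertools import combinations


def different_target_hidden_distance(hiddens, targets):
    """Return (average, max) distance over branch pairs whose targets differ."""
    distances = [
        math.dist(hiddens[i], hiddens[j])
        for i, j in combinations(range(len(targets)), 2)
        if targets[i] != targets[j]
    ]
    if not distances:
        return 0.0, 0.0
    return sum(distances) / len(distances), max(distances)
```

Pairs that share a target are skipped on purpose, since a contrast objective has no reason to push those apart.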

Latest branch-representation capacity smoke:

| Signal | Value |
| --- | --- |
| Run | runs/transformer-answer-v0.43-branch-representation-contrast50-prompt-position-smoke-dim8-context80-steps40/ |
| Mode | branch-representation-contrast-unlikelihood |
| Embedding/feed-forward dim | 8/16 |
| Representation option | --use-prompt-position-projection |
| Representation contrast weight | 50.0 |
| Stabilizers | --direct-answer-freeze-output-bias, --direct-answer-restore-best-branch-snapshot |
| Context gate | passed, 219/219 semantic records covered |
| Direct steps | 40/40; the 50-step dim8 screen was too slow for the regular loop |
| Train loss | 53.6111 -> 53.5752 |
| Restored best branch snapshot | yes, from step 10 |
| Diversity target | failed, 0/9 multi-target profiles passed |
| Final QA target/predicted unique | 8 / 1 |
| Final QA dominant prediction | all "u" |
| Final QA target-token coverage | 0.125 |
| Final QA different-target hidden distance | avg about 0.00209, max about 0.00367 |
| Promotion status | rejected capacity evidence |

Latest prompt-position scale smoke:

| Signal | Value |
| --- | --- |
| Run | runs/transformer-answer-v0.43-prompt-position-scale32-repcontrast50-smoke-dim4-context80/ |
| Mode | branch-representation-contrast-unlikelihood |
| Representation option | --use-prompt-position-projection |
| Prompt-position scale | 32.0 |
| Representation contrast weight | 50.0 |
| Stabilizers | --direct-answer-freeze-output-bias, --direct-answer-restore-best-branch-snapshot |
| Context gate | passed, 219/219 semantic records covered |
| Direct steps | 50/50 |
| Train loss | 55.3835 -> 50.8435 |
| Prompt-position parameters moved | 1108/1284, max absolute value about 0.07087 |
| Restored best branch snapshot | yes, from step 40 |
| Diversity target | failed, 0/9 multi-target profiles passed |
| Final QA target/predicted unique | 8 / 1 |
| Final QA dominant prediction | all "u" |
| Final QA target-token coverage | 0.125 |
| Final QA different-target hidden distance | restored avg about 0.01235, max about 0.03610; raw step-50 avg about 0.4115 before restore |
| Promotion status | rejected prompt-signal scale evidence |

Latest pre-layer-norm structural smoke:

| Signal | Value |
| --- | --- |
| Run | runs/transformer-answer-v0.44-prelayernorm-repcontrast50-prompt-position-smoke-dim4-context80/ |
| Mode | branch-representation-contrast-unlikelihood |
| Architecture option | --use-pre-layer-norm |
| Representation option | --use-prompt-position-projection |
| Representation contrast weight | 50.0 |
| Stabilizers | --direct-answer-freeze-output-bias, --direct-answer-restore-best-branch-snapshot |
| Context gate | passed, 219/219 semantic records covered |
| Direct steps | 50/50 |
| Train loss | final interval 43.8918 |
| Prompt-position parameters moved | 1108/1284, max absolute value about 0.44679 |
| Final-norm parameters moved | 8/8, max absolute value about 2.6389 |
| Restored best branch snapshot | no; step 50 was best |
| Diversity target | failed, 0/9 multi-target profiles passed |
| Final QA target/predicted unique | 8 / 1 |
| Final QA dominant prediction | all "y" |
| Final QA target-token coverage | 0.125 |
| Final QA different-target hidden distance | avg about 0.2835, max about 0.5151 |
| Partial diversity | 7/9 multi-target profiles no longer fully collapsed |
| Promotion status | partial structural evidence, rejected for promotion |
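
For reference, a pre-layer-norm block normalizes the input to each sublayer and adds the sublayer output back to the un-normalized residual stream, with one final normalization before the language-model head. Below is a dependency-free sketch under these assumptions: LayerNorm with per-feature gain and bias (the stack can use RMSNorm instead via --use-rms-norm), and attention and feed-forward supplied as hypothetical callables.

```python
import math


def layer_norm(vector, gain, bias, eps=1e-5):
    mean = sum(vector) / len(vector)
    variance = sum((v - mean) ** 2 for v in vector) / len(vector)
    inv_std = 1.0 / math.sqrt(variance + eps)
    return [g * (v - mean) * inv_std + b for v, g, b in zip(vector, gain, bias)]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def pre_ln_block(states, attention, feedforward, norm1, norm2):
    """Normalize before each sublayer; add its output to the raw residual."""
    # attention maps a list of normalized vectors to a same-length list.
    attended = attention([layer_norm(s, *norm1) for s in states])
    states = [add(s, a) for s, a in zip(states, attended)]
    # feedforward is applied position by position.
    return [add(s, feedforward(layer_norm(s, *norm2))) for s in states]


def pre_ln_stack(states, blocks, final_norm):
    for attention, feedforward, norm1, norm2 in blocks:
        states = pre_ln_block(states, attention, feedforward, norm1, norm2)
    # Final normalization right before the language-model head.
    return [layer_norm(s, *final_norm) for s in states]
```

With zero-output sublayers each block is an exact identity, which is one reason pre-LN stacks are often easier to optimize than post-LN ones.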

Latest target-balanced branch-batch smoke:

| Signal | Value |
| --- | --- |
| Run | runs/transformer-answer-v0.44-target-balanced-prelayernorm-repcontrast50-prompt-position-smoke-dim4-context80/ |
| Mode | branch-balanced-representation-contrast-unlikelihood |
| Architecture option | --use-pre-layer-norm |
| Representation option | --use-prompt-position-projection |
| Batch sampler | target-bucket balanced branch batch |
| Representation contrast weight | 50.0 |
| Stabilizers | --direct-answer-freeze-output-bias, --direct-answer-restore-best-branch-snapshot |
| Context gate | passed, 219/219 semantic records covered |
| Direct steps | 50/50 |
| Train loss | final interval 50.6619 |
| Prompt-position parameters moved | 516/1284, max absolute value about 0.05881 |
| Final-norm parameters moved | 8/8, max absolute value about 1.0013 |
| Restored best branch snapshot | yes, restored to baseline step 0 |
| Diversity target | failed, 0/9 multi-target profiles passed |
| Final QA target/predicted unique | 8 / 1 |
| Final QA dominant prediction | all "n" |
| Final QA target-token coverage | 0.125 |
| Final QA different-target hidden distance | restored avg about 0.1261, max about 0.2476 |
| Promotion status | rejected sampler evidence |
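
A target-bucket balanced branch batch can be sketched as round-robin sampling over target buckets, so that one frequent target cannot fill the batch. The shuffled bucket order, sampling with replacement inside a bucket, and the function name are assumptions, not the sampler in transformer_char_model.

```python
import random
from collections import defaultdict


def target_balanced_batch(examples, batch_size, rng):
    """Cycle through target buckets, drawing one example per bucket per pass."""
    if not examples:
        return []
    buckets = defaultdict(list)
    for example in examples:
        buckets[example["target"]].append(example)
    targets = sorted(buckets)
    rng.shuffle(targets)
    batch = []
    while len(batch) < batch_size:
        for target in targets:
            if len(batch) == batch_size:
                break
            # Sample with replacement inside the bucket.
            batch.append(rng.choice(buckets[target]))
    return batch


if __name__ == "__main__":
    pool = [{"target": "a"}] * 10 + [{"target": "b"}]
    print([e["target"] for e in target_balanced_batch(pool, 4, random.Random(0))])
```

The rare target "b" gets as many slots as the common target "a", which is the point of balancing, but it also means rare examples are repeated within a batch.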

Latest branch-rank diagnostic smoke:

| Signal | Value |
| --- | --- |
| Run | runs/transformer-answer-v0.45-branch-rank-diagnostic-smoke-dim4-context80/ |
| Diagnostic | branch target rank, top-3/top-5 target coverage, and failed-record top predictions |
| Architecture option | --use-pre-layer-norm |
| Representation option | --use-prompt-position-projection |
| Snapshot mode | branch-only |
| Context gate | passed, 219/219 semantic records covered |
| Direct steps | 1/1 |
| Final QA dominant prediction | all "n" |
| Final QA average target rank | 14.25 |
| Final QA top-3/top-5 target coverage | 0.125 / 0.125 |
| Final heldout dominant prediction | all "n" |
| Final heldout average target rank | 14.25 |
| Final heldout top-3/top-5 target coverage | 0.125 / 0.125 |
| Promotion status | diagnostic evidence only |
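
The rank diagnostics can be computed from per-branch logits. This sketch assumes a 1-based rank in which ties count in the target's favor, and reads top-k target coverage as the fraction of branch rows whose target ranks within the top k; both conventions and the function names are assumptions.

```python
def target_rank(logits, target):
    # 1-based rank: 1 means the target is the top-1 prediction.
    return 1 + sum(1 for score in logits if score > logits[target])


def rank_summary(rows, ks=(3, 5)):
    """rows: list of (logits, target_index) pairs, one per branch."""
    ranks = [target_rank(logits, target) for logits, target in rows]
    summary = {"average_target_rank": sum(ranks) / len(ranks)}
    for k in ks:
        summary[f"top{k}_coverage"] = sum(r <= k for r in ranks) / len(ranks)
    return summary
```

Average rank moves smoothly even while top-1 accuracy is stuck at zero, which is why these screens track it alongside exact answers.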

Latest output-binding repair smoke:

| Signal | Value |
| --- | --- |
| Run | runs/transformer-answer-v0.46-output-binding-rankscore-smoke-dim4-context80/ |
| Mode | branch-output-binding-unlikelihood |
| Architecture option | --use-pre-layer-norm |
| Representation option | --use-prompt-position-projection |
| Binding weight | 2.0 |
| Stabilizers | --direct-answer-freeze-output-bias, rank-aware --direct-answer-restore-best-branch-snapshot |
| Context gate | passed, 219/219 semantic records covered |
| Direct steps | 20/20 |
| Train loss | 8.7064 -> 8.2205 |
| Restored best branch snapshot | no; step 20 was best by aggregate rank-aware score |
| Diversity target | failed, 0/9 multi-target profiles passed |
| Final QA target/predicted unique | 8 / 2 |
| Final QA dominant predictions | wrong "l"/"j" branch tokens |
| Final QA average target rank | 17.375 -> 14.125 |
| Final QA top-3/top-5 target coverage | 0.125 -> 0.0 / 0.125 -> 0.25 |
| Final heldout average target rank | 17.25 -> 14.375 |
| Final heldout top-3/top-5 target coverage | 0.125 -> 0.0 / 0.125 -> 0.25 |
| Promotion status | rejected output-binding evidence |

Latest rank-margin repair smoke:

| Signal | Value |
| --- | --- |
| Run | runs/transformer-answer-v0.47-rank-margin-steps50-smoke-dim4-context80/ |
| Mode | branch-rank-margin-unlikelihood |
| Architecture option | --use-pre-layer-norm |
| Representation option | --use-prompt-position-projection |
| Hard wrong tokens | 5 |
| Margin weight | 2.0 |
| Stabilizers | --direct-answer-freeze-output-bias, rank-aware --direct-answer-restore-best-branch-snapshot |
| Context gate | passed, 219/219 semantic records covered |
| Direct steps | 50/50 |
| Train loss | 7.3649 -> 6.1629 |
| Restored best branch snapshot | yes, restored from step 40 |
| Diversity target | failed, 0/9 multi-target profiles passed |
| Final QA target/predicted unique | 8 / 1 |
| Final QA dominant prediction | wrong "n" |
| Final QA average target rank | 17.375 -> 9.0 |
| Final QA target-token coverage | 0.0 -> 0.125 |
| Final QA top-3/top-5 target coverage | 0.125 -> 0.25 / 0.125 -> 0.5 |
| Final heldout average target rank | 17.25 -> 9.0 |
| Final heldout target-token coverage | 0.0 -> 0.125 |
| Final heldout top-3/top-5 target coverage | 0.125 -> 0.25 / 0.125 -> 0.375 |
| Promotion status | rejected rank-lift evidence |
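
A rank-margin objective over hard wrong tokens is commonly written as a hinge on the logit gap between the target and each of the k highest-scoring wrong tokens. The sketch defaults to the 5 hard wrong tokens and 2.0 weight from the table above; the margin of 1.0, averaging over the hard tokens, and the function names are assumptions.

```python
def hardest_wrong_tokens(logits, target, count):
    wrong = [i for i in range(len(logits)) if i != target]
    wrong.sort(key=lambda i: logits[i], reverse=True)
    return wrong[:count]


def rank_margin_loss(logits, target, hard_wrong_tokens=5, margin=1.0, weight=2.0):
    """Hinge loss asking the target logit to beat each hard wrong token by `margin`."""
    hardest = hardest_wrong_tokens(logits, target, hard_wrong_tokens)
    if not hardest:
        return 0.0
    hinge = sum(max(0.0, margin - (logits[target] - logits[i])) for i in hardest)
    return weight * hinge / len(hardest)
```

Once the target clears every hard wrong token by the margin, the loss and its gradient are exactly zero, so this objective alone stops pushing after a rank lift.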

Latest balanced rank-margin repair smoke:

| Signal | Value |
| --- | --- |
| Run | runs/transformer-answer-v0.48-balanced-rank-margin-smoke-dim4-context80/ |
| Mode | branch-balanced-rank-margin-unlikelihood |
| Architecture option | --use-pre-layer-norm |
| Representation option | --use-prompt-position-projection |
| Batch sampler | target-balanced branch batch |
| Hard wrong tokens | 5 |
| Margin weight | 2.0 |
| Stabilizers | --direct-answer-freeze-output-bias, rank-aware --direct-answer-restore-best-branch-snapshot |
| Context gate | passed, 219/219 semantic records covered |
| Direct steps | 50/50 |
| Train loss | 7.2303 -> 6.3662 |
| Restored best branch snapshot | no; step 50 was best by aggregate rank-aware score |
| Diversity target | failed, 0/9 multi-target profiles passed |
| Final QA target/predicted unique | 8 / 2 |
| Final QA dominant predictions | wrong "a"/"n" branch tokens |
| Final QA average target rank | 17.375 -> 9.375 |
| Final QA target-token coverage | 0.0 -> 0.125 |
| Final QA top-3/top-5 target coverage | 0.125 -> 0.375 / 0.125 -> 0.5 |
| Final heldout average target rank | 17.25 -> 9.625 |
| Final heldout target-token coverage | 0.0 -> 0.125 |
| Final heldout top-3/top-5 target coverage | 0.125 -> 0.25 / 0.125 -> 0.5 |
| Promotion status | rejected balanced rank-lift evidence |

Latest top-one hard-negative rank-margin smoke:

| Signal | Value |
| --- | --- |
| Run | runs/transformer-answer-v0.49-balanced-rank-margin-top1-smoke-dim4-context80/ |
| Mode | branch-balanced-rank-margin-unlikelihood |
| Architecture option | --use-pre-layer-norm |
| Representation option | --use-prompt-position-projection |
| Batch sampler | target-balanced branch batch |
| Hard wrong tokens | 1 |
| Margin weight | 2.0 |
| Stabilizers | --direct-answer-freeze-output-bias, rank-aware --direct-answer-restore-best-branch-snapshot |
| Context gate | passed, 219/219 semantic records covered |
| Direct steps | 50/50 |
| Train loss | 7.3512 -> 6.3642 |
| Restored best branch snapshot | yes, restored from step 10 |
| Diversity target | failed, 0/9 multi-target profiles passed |
| Final QA target/predicted unique | 8 / 1 |
| Final QA dominant prediction | wrong "n" |
| Final QA average target rank | 17.375 -> 12.5 |
| Final QA target-token coverage | 0.0 -> 0.125 |
| Final QA top-3/top-5 target coverage | 0.125 -> 0.125 / 0.125 -> 0.25 |
| Final heldout average target rank | 17.25 -> 12.375 |
| Final heldout target-token coverage | 0.0 -> 0.125 |
| Final heldout top-3/top-5 target coverage | 0.125 -> 0.125 / 0.125 -> 0.25 |
| Promotion status | rejected top-one hard-negative evidence |

Latest top-k softmax branch repair smoke:

| Signal | Value |
| --- | --- |
| Run | runs/transformer-answer-v0.50-balanced-topk-softmax-w5-smoke-dim4-context80/ |
| Mode | branch-balanced-topk-softmax-unlikelihood |
| Architecture option | --use-pre-layer-norm |
| Representation option | --use-prompt-position-projection |
| Batch sampler | target-balanced branch batch |
| Hard wrong tokens | 5 |
| Restricted-softmax weight | 5.0 |
| Stabilizers | --direct-answer-freeze-output-bias, rank-aware --direct-answer-restore-best-branch-snapshot |
| Context gate | passed, 219/219 semantic records covered |
| Direct steps | 50/50 |
| Restored best branch snapshot | yes, restored from step 40 |
| Diversity target | failed, 0/9 multi-target profiles passed |
| Final QA target/predicted unique | 8 / 1 |
| Final QA dominant prediction | wrong "u" |
| Final QA average target rank | 17.375 -> 8.75 |
| Final QA target-token coverage | 0.0 -> 0.125 |
| Final QA top-3/top-5 target coverage | 0.125 -> 0.375 / 0.125 -> 0.5 |
| Final heldout average target rank | 17.25 -> 8.75 |
| Final heldout target-token coverage | 0.0 -> 0.125 |
| Final heldout top-3/top-5 target coverage | 0.125 -> 0.375 / 0.125 -> 0.5 |
| Promotion status | rejected top-k softmax rank-lift evidence |
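
A restricted top-k softmax replaces full-vocabulary cross-entropy with cross-entropy over the target plus the k hardest wrong tokens, so pressure concentrates on the competitors that actually outrank the target. The assumed defaults mirror the table (5 hard wrong tokens, weight 5.0); the function name is hypothetical.

```python
import math


def restricted_topk_softmax_loss(logits, target, hard_wrong_tokens=5, weight=5.0):
    """Cross-entropy over {target} plus the k highest-scoring wrong tokens only."""
    wrong = sorted(
        (i for i in range(len(logits)) if i != target),
        key=lambda i: logits[i],
        reverse=True,
    )[:hard_wrong_tokens]
    subset = [logits[target]] + [logits[i] for i in wrong]
    # Log-sum-exp over the restricted subset, shifted by its max for stability.
    peak = max(subset)
    log_normalizer = peak + math.log(sum(math.exp(s - peak) for s in subset))
    return weight * (log_normalizer - logits[target])
```

Tokens outside the restricted subset receive no gradient at all, which is one plausible reason such an objective can lift rank without changing the dominant wrong prediction.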

Latest foundation-stack smoke:

| Signal | Value |
| --- | --- |
| Run | runs/transformer-v0.51-foundation-stack-smoke/ |
| Checkpoint format | quarklm-transformer-v2 |
| Optimizer | adamw with saved optimizer_state.json |
| Schedule / accumulation | warmup 1, decay 2, gradient accumulation 2 |
| Architecture switches | --attention-heads 2, --use-rms-norm, --use-gated-mlp, --tie-output-embeddings, --use-rotary-positions |
| Runtime switches | --use-kv-cache-path, eval --use-kv-cache, top-k/top-p/temperature/repetition controls |
| Eval artifacts | eval.json and replayable eval_samples.jsonl with token traces |
| Steps | 2/2 language-model smoke steps |
| Validation status | mechanics smoke completed; transformer tests pass |
| Promotion status | foundation mechanics evidence only |
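
The generation sampling controls listed above (top-k, top-p, temperature, repetition) are standard decoding filters. Here is a sketch of one common way to compose them; the filter order, the CTRL-style repetition penalty, the greedy fallback for non-positive temperature, and all names are assumptions rather than the transformer_char_model flags.

```python
import math
import random


def sample_next_token(logits, rng, temperature=1.0, top_k=0, top_p=1.0,
                      repetition_penalty=1.0, history=()):
    scores = list(logits)
    # CTRL-style repetition penalty: shrink scores of already generated tokens.
    for token in set(history):
        if scores[token] > 0:
            scores[token] /= repetition_penalty
        else:
            scores[token] *= repetition_penalty
    if temperature <= 0:
        # Treat non-positive temperature as greedy decoding.
        return max(range(len(scores)), key=scores.__getitem__)
    scores = [s / temperature for s in scores]
    order = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    if top_k > 0:
        order = order[:top_k]
    peak = scores[order[0]]
    weights = [math.exp(scores[i] - peak) for i in order]
    total = sum(weights)
    kept_tokens, kept_weights, cumulative = [], [], 0.0
    for token, weight in zip(order, weights):
        kept_tokens.append(token)
        kept_weights.append(weight)
        cumulative += weight / total
        # Nucleus (top-p): stop once the kept probability mass reaches top_p.
        if cumulative >= top_p:
            break
    return rng.choices(kept_tokens, weights=kept_weights, k=1)[0]


if __name__ == "__main__":
    print(sample_next_token([0.1, 2.0, 0.5], random.Random(0), temperature=0.8, top_k=2))
```

Passing an explicit random.Random instance keeps sampled traces reproducible, which matters when eval samples are meant to be replayable.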

Latest full-stack top-k branch repair smoke:

| Signal | Value |
| --- | --- |
| Run | runs/transformer-answer-v0.52-fullstack-topk-softmax-smoke-dim4-context80/ |
| Mode | branch-balanced-topk-softmax-unlikelihood |
| Foundation stack | AdamW, gradient accumulation, two heads, RMSNorm, gated MLP, tied output embeddings, rotary positions, cache-aware metadata |
| Context / representation | context 80, --use-pre-layer-norm, --use-prompt-position-projection |
| Direct steps | 50/50 |
| Restored best branch snapshot | yes, restored from step 0 |
| Diversity target | failed, 0/9 multi-target profiles passed |
| Final QA target/predicted unique | 8 / 3 |
| Final QA dominant prediction | wrong "i" |
| Final QA average target rank | 13.25 |
| Final QA target-token coverage | 0.25 |
| Final QA top-3/top-5 target coverage | 0.25 / 0.375 |
| Final heldout average target rank | 13.375 |
| Final heldout target-token coverage | 0.25 |
| Final heldout top-3/top-5 target coverage | 0.25 / 0.375 |
| Promotion status | rejected unchanged top-k pressure under full stack |

Latest full-stack bidirectional binding branch repair smoke:

| Signal | Value |
| --- | --- |
| Run | runs/transformer-answer-v0.53-fullstack-bidir-binding-smoke-dim4-context80/ |
| Mode | branch-balanced-bidirectional-binding-unlikelihood |
| Foundation stack | AdamW, gradient accumulation, two heads, RMSNorm, gated MLP, tied output embeddings, rotary positions, cache-aware metadata |
| Context / representation | context 80, --use-pre-layer-norm, --use-prompt-position-projection |
| Binding pressure | row-wise branch target choice plus column-wise target-token ownership across prompt contexts |
| Unit coverage | focused transformer tests pass, including the context-ownership regression |
| Direct steps | 50/50 |
| Restored best branch snapshot | yes, restored from step 40 |
| Diversity target | failed, 0/9 multi-target profiles passed |
| Final QA target/predicted unique | 8 / 2 |
| Final QA dominant prediction | wrong "a" |
| Final QA average target rank | 7.875 |
| Final QA target-token coverage | 0.125 |
| Final QA top-3/top-5 target coverage | 0.25 / 0.5 |
| Step-50 QA note | target-token coverage briefly reached 0.25 with average rank 8.375 before restore selected step 40 |
| Final heldout average target rank | 9.0 |
| Final heldout target-token coverage | 0.125 |
| Final heldout top-3/top-5 target coverage | 0.25 / 0.375 |
| Promotion status | partial rank-pressure progress; rejected until target coverage is preserved and top-1 branch choices improve |

Latest full-stack coverage binding branch repair smoke:

| Signal | Value |
| --- | --- |
| Run | runs/transformer-answer-v0.54-fullstack-coverage-binding-smoke-dim4-context80/ |
| Mode | branch-balanced-coverage-binding-unlikelihood |
| Foundation stack | AdamW, gradient accumulation, two heads, RMSNorm, gated MLP, tied output embeddings, rotary positions, cache-aware metadata |
| Context / representation | context 80, --use-pre-layer-norm, --use-prompt-position-projection |
| Binding pressure | branch targets versus sibling targets plus hard wrong tokens, with target-set mass coverage guard |
| Hard wrong tokens | 5 |
| Unit coverage | focused transformer tests pass, including the hard-wrong-token coverage regression |
| Direct steps | 50/50 |
| Restored best branch snapshot | yes, restored from step 0 |
| Diversity target | failed, 0/9 multi-target profiles passed |
| Final QA target/predicted unique | 8 / 3 |
| Final QA dominant prediction | wrong "i" |
| Final QA average target rank | 13.25 |
| Final QA target-token coverage | 0.25 |
| Final QA top-3/top-5 target coverage | 0.25 / 0.375 |
| Training snapshot note | step 50 improved QA average target rank to 8.125, but target-token coverage collapsed to 0.0 with wrong "a" top-1 collapse |
| Final heldout average target rank | 13.375 |
| Final heldout target-token coverage | 0.25 |
| Final heldout top-3/top-5 target coverage | 0.25 / 0.375 |
| Promotion status | rejected; best-snapshot scoring protected the checkpoint, but the objective traded target coverage away for rank |

Latest full-stack target-set coverage branch repair smoke:

| Signal | Value |
| --- | --- |
| Run | runs/transformer-answer-v0.55-fullstack-target-set-coverage-smoke-dim4-context80/ |
| Mode | branch-balanced-target-set-coverage-unlikelihood |
| Foundation stack | AdamW, gradient accumulation, two heads, RMSNorm, gated MLP, tied output embeddings, rotary positions, cache-aware metadata |
| Context / representation | context 80, --use-pre-layer-norm, --use-prompt-position-projection |
| Coverage pressure | branch target set versus hard wrong tokens, without exact-target row loss or cross-context ownership |
| Positive target CE | 0.0 |
| Hard wrong tokens | 5 |
| Unit coverage | focused transformer tests pass, including the target-set-only coverage regression |
| Direct steps | 50/50 |
| Restored best branch snapshot | yes, restored from step 0 |
| Diversity target | failed, 0/9 multi-target profiles passed |
| Final QA target/predicted unique | 8 / 3 |
| Final QA dominant prediction | wrong "i" |
| Final QA average target rank | 13.25 |
| Final QA target-token coverage | 0.25 |
| Final QA top-3/top-5 target coverage | 0.25 / 0.375 |
| Training snapshot note | step 50 improved QA average target rank to 10.0, but target-token coverage collapsed to 0.0 with wrong "a" top-1 collapse |
| Final heldout average target rank | 13.375 |
| Final heldout target-token coverage | 0.25 |
| Final heldout top-3/top-5 target coverage | 0.25 / 0.375 |
| Promotion status | rejected; batch-local target-set mass is not enough to preserve eval target-token coverage |

Latest full-stack target-diversity branch repair smoke:

| Signal | Value |
| --- | --- |
| Run | runs/transformer-answer-v0.57-fullstack-target-diversity-smoke-dim4-context80/ |
| Mode | branch-balanced-target-diversity-unlikelihood |
| Foundation stack | AdamW, gradient accumulation, two heads, RMSNorm, gated MLP, tied output embeddings, rotary positions, cache-aware metadata |
| Context / representation | context 80, --use-pre-layer-norm, --use-prompt-position-projection |
| Diversity pressure | target-set mass plus target-share balance over branch targets |
| Positive target CE | 0.0 |
| Hard wrong tokens | 5 |
| Unit coverage | focused transformer tests pass, including restricted target-set mass and weakest target-share balance regression |
| Direct steps | 50/50 |
| Restored best branch snapshot | yes, restored from step 0 |
| Diversity target | failed, 0/9 multi-target profiles passed |
| Final QA target/predicted unique | 8 / 3 |
| Final QA dominant prediction | wrong "i" |
| Final QA average target rank | 13.25 |
| Final QA target-token coverage | 0.25 |
| Final QA top-3/top-5 target coverage | 0.25 / 0.375 |
| Training snapshot note | step 50 improved QA average target rank to 10.0, but target-token coverage collapsed to 0.0 with wrong "a" top-1 collapse |
| Final heldout average target rank | 13.375 |
| Final heldout target-token coverage | 0.25 |
| Final heldout top-3/top-5 target coverage | 0.25 / 0.375 |
| Promotion status | rejected; batch-local target-share diversity still does not preserve eval-wide target-token coverage |

Latest full-stack target-replay coverage branch repair smoke:

| Signal | Value |
| --- | --- |
| Run | runs/transformer-answer-v0.58-fullstack-target-replay-coverage-smoke-dim4-context80/ |
| Mode | branch-balanced-target-replay-coverage-unlikelihood |
| Foundation stack | AdamW, gradient accumulation, two heads, RMSNorm, gated MLP, tied output embeddings, rotary positions, cache-aware metadata |
| Context / representation | context 80, --use-pre-layer-norm, --use-prompt-position-projection |
| Replay pressure | target-set mass plus target-share balance over admitted branch-pool targets |
| Positive target CE | 0.0 |
| Hard wrong tokens | 5 |
| Unit coverage | focused transformer tests pass, including sampled-batch missing pool-target replay regression |
| Direct steps | 50/50 |
| Restored best branch snapshot | yes, restored from step 0 |
| Diversity target | failed, 0/9 multi-target profiles passed |
| Final QA target/predicted unique | 8 / 3 |
| Final QA dominant prediction | wrong "i" |
| Final QA average target rank | 13.25 |
| Final QA target-token coverage | 0.25 |
| Final QA top-3/top-5 target coverage | 0.25 / 0.375 |
| Training snapshot note | step 40 improved QA average target rank to 6.875 and top-5 coverage to 0.5; by step 50, QA/heldout top-1 collapsed to wrong "n" and target-token coverage had hit 0.0 during training |
| Final heldout average target rank | 13.375 |
| Final heldout target-token coverage | 0.25 |
| Final heldout top-3/top-5 target coverage | 0.25 / 0.375 |
| Promotion status | rejected; pool-owned replay coverage still does not preserve context-specific target ownership |

Latest full-stack context-replay coverage branch repair smoke:

| Signal | Value |
| --- | --- |
| Run | `runs/transformer-answer-v0.59-fullstack-context-replay-coverage-smoke-dim4-context80/` |
| Mode | `branch-balanced-context-replay-coverage-unlikelihood` |
| Foundation stack | AdamW, gradient accumulation, two heads, RMSNorm, gated MLP, tied output embeddings, rotary positions, cache-aware metadata |
| Context / representation | context 80, `--use-pre-layer-norm`, `--use-prompt-position-projection` |
| Replay pressure | target-set mass plus context-owned target share over admitted branch-pool replay contexts |
| Positive target CE | 0.0 |
| Hard wrong tokens | 5 |
| Unit coverage | focused transformer tests pass, including fixed replay-context owned-target share regression |
| Direct steps | 50/50 |
| Restored best branch snapshot | yes, restored from step 0 |
| Diversity target | failed, 0/9 multi-target profiles passed |
| Final QA target/predicted unique | 8 / 3 |
| Final QA dominant prediction | wrong "i" |
| Final QA average target rank | 13.25 |
| Final QA target-token coverage | 0.25 |
| Final QA top-3/top-5 target coverage | 0.25 / 0.375 |
| Training snapshot note | step 40 improved QA average target rank to 7.375, top-3 to 0.375, and top-5 to 0.5; by step 50, QA predicted diversity was only 2/8 and target-token coverage had hit 0.0 during training |
| Final heldout average target rank | 13.375 |
| Final heldout target-token coverage | 0.25 |
| Final heldout top-3/top-5 target coverage | 0.25 / 0.375 |
| Promotion status | rejected; context-owned replay improves rank/top-k snapshots but still does not preserve target-token coverage |

Latest full-stack coverage-floor branch restore smoke:

| Signal | Value |
| --- | --- |
| Run | `runs/transformer-answer-v0.60-fullstack-context-replay-coverage-floor-metadata-smoke-dim4-context80/` |
| Mode | `branch-balanced-context-replay-coverage-unlikelihood` |
| Scoring guard | profile-wise target-token coverage floor before rank/top-k scoring |
| Snapshot metadata | direct-answer JSONL rows include `branch_target_coverage_by_profile` |
| Foundation stack | AdamW, gradient accumulation, two heads, RMSNorm, gated MLP, tied output embeddings, rotary positions, cache-aware metadata |
| Context / representation | context 80, `--use-pre-layer-norm`, `--use-prompt-position-projection` |
| Positive target CE | 0.0 |
| Hard wrong tokens | 5 |
| Unit coverage | focused transformer tests pass, including profile-wise coverage-floor regression |
| Direct steps | 50/50 |
| Direct-answer JSONL rows | 7 clean rows |
| Restored best branch snapshot | yes, restored from step 0 |
| Baseline coverage floor | qa 0.25, heldout 0.25, admissions 0.1429, minimum profile 0.0714 |
| Diversity target | failed, 0/9 multi-target profiles passed |
| Final QA target/predicted unique | 8 / 3 |
| Final QA dominant prediction | wrong "i" |
| Final QA average target rank | 13.25 |
| Final QA target-token coverage | 0.25 |
| Final QA top-3/top-5 target coverage | 0.25 / 0.375 |
| Training snapshot note | step 40 improved QA average target rank to 7.375, top-3 to 0.375, and top-5 to 0.5, but regressed profile coverage and was ineligible for restore |
| Final heldout average target rank | 13.375 |
| Final heldout target-token coverage | 0.25 |
| Final heldout top-3/top-5 target coverage | 0.25 / 0.375 |
| Promotion status | gate repair accepted; trained model behavior rejected because coverage still collapses during training |
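The v0.60 scoring guard can be pictured as a two-stage snapshot selector: first drop any snapshot whose per-profile target-token coverage falls below the step-0 baseline, then rank the survivors. The sketch assumes each snapshot row carries a `branch_target_coverage_by_profile` mapping (the field name comes from the JSONL rows above), a `step`, and an `average_target_rank`; the rest of the row layout and the function names are hypothetical. The coverage values are illustrative, and the step-40 rank mirrors the v0.60 training note.

```python
def meets_floor(coverage: dict[str, float], baseline: dict[str, float]) -> bool:
    """True when every profile is at or above its baseline coverage."""
    return all(coverage.get(profile, 0.0) >= floor for profile, floor in baseline.items())


def select_snapshot(snapshots: list[dict], baseline: dict[str, float]) -> dict:
    """Apply the coverage floor first, then pick the best (lowest) average target rank."""
    eligible = [s for s in snapshots if meets_floor(s["branch_target_coverage_by_profile"], baseline)]
    # Ties on rank prefer the earlier step.
    return min(eligible, key=lambda s: (s["average_target_rank"], s["step"]))


baseline = {"qa": 0.25, "heldout": 0.25}
snapshots = [
    {"step": 0, "average_target_rank": 13.25,
     "branch_target_coverage_by_profile": {"qa": 0.25, "heldout": 0.25}},
    {"step": 40, "average_target_rank": 7.375,
     "branch_target_coverage_by_profile": {"qa": 0.125, "heldout": 0.25}},
]
chosen = select_snapshot(snapshots, baseline)
```

The step-40 snapshot has the better rank but fails the floor, so the selector falls back to step 0. Because the step-0 snapshot meets its own floor by construction, the eligible list is never empty when it is included.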

Latest full-stack covered-target anchor branch repair smoke:

| Signal | Value |
| --- | --- |
| Run | `runs/transformer-answer-v0.61-fullstack-context-coverage-anchor-smoke-dim4-context80/` |
| Mode | `branch-balanced-context-coverage-anchor-unlikelihood` |
| Scoring guard | profile-wise target-token coverage floor before rank/top-k scoring |
| Anchor pressure | covered replay branches add target-vs-replay-target/hard-wrong CE |
| Foundation stack | AdamW, gradient accumulation, two heads, RMSNorm, gated MLP, tied output embeddings, rotary positions, cache-aware metadata |
| Context / representation | context 80, `--use-pre-layer-norm`, `--use-prompt-position-projection` |
| Positive target CE | 0.0 |
| Hard wrong tokens | 5 |
| Unit coverage | focused transformer tests pass, including anchored-vs-unanchored covered branch regression |
| Direct steps | 50/50 |
| Direct-answer JSONL rows | 7 clean rows |
| Restored best branch snapshot | yes, restored from step 0 |
| Diversity target | failed, 0/9 multi-target profiles passed |
| Final QA target/predicted unique | 8 / 3 |
| Final QA dominant prediction | wrong "i" |
| Final QA average target rank | 13.25 |
| Final QA target-token coverage | 0.25 |
| Final QA top-3/top-5 target coverage | 0.25 / 0.375 |
| Training snapshot note | snapshots collapsed harder to the covered wrong token "i"; QA/heldout predicted diversity fell to 1/8, target-token coverage fell to 0.125, and average target rank rose above 21 |
| Final heldout average target rank | 13.375 |
| Final heldout target-token coverage | 0.25 |
| Final heldout top-3/top-5 target coverage | 0.25 / 0.375 |
| Promotion status | rejected; global covered-target anchoring over-protects one covered token instead of preserving coverage diversity |

Latest full-stack coverage-preserving deficit branch repair smoke:

| Signal | Value |
| --- | --- |
| Run | `runs/transformer-answer-v0.65-fullstack-coverage-preserving-deficit-smoke-dim4-context80/` |
| Mode | `branch-balanced-context-coverage-preserving-deficit-unlikelihood` |
| Scoring guard | profile-wise target-token coverage floor before rank/top-k scoring |
| Deficit pressure | replay target tokens absent from current replay predictions receive target-vs-hard-candidate pressure |
| Preservation pressure | target tokens currently represented in replay predictions receive target-balanced anchors |
| Foundation stack | AdamW, gradient accumulation, two heads, RMSNorm, gated MLP, tied output embeddings, rotary positions, cache-aware metadata |
| Context / representation | context 80, `--use-pre-layer-norm`, `--use-prompt-position-projection` |
| Positive target CE | 0.0 |
| Hard wrong tokens | 5 |
| Unit coverage | focused transformer tests pass, including missing-target lift and represented-target preservation regressions |
| Direct steps | 50/50 |
| Direct-answer JSONL rows | 7 clean rows |
| Restored best branch snapshot | yes, restored from step 0 |
| Diversity target | failed, 0/9 multi-target profiles passed |
| Final QA target/predicted unique | 8 / 3 |
| Final QA dominant prediction | wrong "i" |
| Final QA average target rank | 13.25 |
| Final QA target-token coverage | 0.25 |
| Final QA top-3/top-5 target coverage | 0.25 / 0.375 |
| Training snapshot note | step 50 reached QA/heldout branch accuracy 1/8, QA average target rank 7.75, heldout average target rank 7.125, and top-5 coverage 0.5, but both profiles collapsed to predicted diversity 1/8 and target-token coverage 0.125 |
| Final heldout average target rank | 13.375 |
| Final heldout target-token coverage | 0.25 |
| Final heldout top-3/top-5 target coverage | 0.25 / 0.375 |
| Promotion status | rejected; current-prediction preservation improves rank but over-preserves one represented target token |
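The v0.65 split between deficit and preservation pressure amounts to partitioning replay records by whether their target is currently represented among replay predictions. A minimal sketch with hypothetical names; the `(context_id, target)` record shape is an assumption rather than the replay plan's format, and "target-balanced" is interpreted here as giving each represented target the same total weight:

```python
from collections import Counter


def split_replay(records: list[tuple[str, str]], predictions: dict[str, str]):
    """Partition (context_id, target) records into deficit and preservation sets."""
    represented = set(predictions.values())  # tokens some replay context currently predicts
    deficit = [r for r in records if r[1] not in represented]
    preserve = [r for r in records if r[1] in represented]
    return deficit, preserve


def target_balanced_weights(records: list[tuple[str, str]]) -> list[float]:
    """Give each distinct target the same total weight across its records."""
    counts = Counter(target for _, target in records)
    return [1.0 / counts[target] for _, target in records]


records = [("c1", "a"), ("c2", "a"), ("c3", "b")]
predictions = {"c1": "a", "c2": "i", "c3": "i"}
deficit, preserve = split_replay(records, predictions)
weights = target_balanced_weights(preserve)
```

If predictions collapse onto a single token, the represented set shrinks to that token and every preservation anchor protects it, which is consistent with the over-preservation failure recorded for v0.65.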

Latest profile-aware replay-plan smoke:

| Signal | Value |
| --- | --- |
| Run | `runs/transformer-answer-v0.67-profile-aware-replay-plan-smoke-dim4-context80/` |
| Mode | `branch-balanced-context-profile-coverage-preserving-deficit-unlikelihood` |
| Replay plan artifact | `direct_answer_replay_plan.json` |
| Replay plan size | 9144 branch records and 9144 replay records across 21 profiles |
| Example profile floors | qa:place coverage floor 0.5; qa:color coverage floor 0.0 |
| Foundation stack | AdamW, two heads, RMSNorm, gated MLP, tied output embeddings, rotary positions, cache-aware metadata |
| Context / representation | context 80, `--use-pre-layer-norm`, `--use-prompt-position-projection` |
| Unit coverage | focused transformer tests pass, including profile-deficit isolation and shared-target source preservation |
| Branch-context gate | passed across 219/219 semantic records with no ambiguous, colliding, or skipped records |
| Direct steps | 1/1 bounded smoke |
| Snapshot mode | branch-only; post-direct candidate snapshot skipped and recorded |
| Restored best branch snapshot | yes, restored from step 0 |
| Diversity target | failed, 0/9 multi-target profiles passed |
| Promotion status | mechanics-readiness evidence only; profile-aware plan exists, but model quality is not promoted |
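A profile-aware replay plan of the kind summarized above groups replay records by profile, derives a coverage floor for each profile, and lists the targets that no current prediction covers. The sketch below assumes the floor is the profile's baseline target-token coverage and uses hypothetical record fields (`profile`, `context_id`, `target`); it does not reproduce the structures in `replay_plan.py`.

```python
from collections import defaultdict


def build_replay_plan(records: list[dict], predictions: dict[str, str]) -> dict[str, dict]:
    """Group records by profile and summarize the coverage floor and missing targets."""
    by_profile: dict[str, list[dict]] = defaultdict(list)
    for record in records:
        by_profile[record["profile"]].append(record)

    plan = {}
    for profile, rows in sorted(by_profile.items()):
        targets = {row["target"] for row in rows}
        predicted = {predictions[row["context_id"]] for row in rows}
        covered = targets & predicted
        plan[profile] = {
            "records": len(rows),
            "coverage_floor": len(covered) / len(targets),
            "missing_targets": sorted(targets - covered),
        }
    return plan


records = [
    {"profile": "qa:place", "context_id": "p1", "target": "h"},
    {"profile": "qa:place", "context_id": "p2", "target": "k"},
    {"profile": "qa:color", "context_id": "c1", "target": "r"},
    {"profile": "qa:color", "context_id": "c2", "target": "g"},
]
predictions = {"p1": "h", "p2": "h", "c1": "i", "c2": "i"}
plan = build_replay_plan(records, predictions)
```

The made-up records reproduce the shape of the example floors in the table: 0.5 for `qa:place` and 0.0 for `qa:color`.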

Latest profile-aware full-stack repair screen:

| Signal | Value |
| --- | --- |
| Run | `runs/transformer-answer-v0.68-fullstack-profile-aware-preserving-deficit-smoke-dim4-context80/` |
| Mode | `branch-balanced-context-profile-coverage-preserving-deficit-unlikelihood` |
| Replay plan artifact | `direct_answer_replay_plan.json` |
| Replay plan size | 9144 branch records and 9144 replay records across 21 profiles |
| Foundation stack | AdamW, two heads, RMSNorm, gated MLP, tied output embeddings, rotary positions, cache-aware metadata |
| Context / representation | context 80, `--use-pre-layer-norm`, `--use-prompt-position-projection` |
| Branch-context gate | passed across 219/219 semantic records with no ambiguous, colliding, or skipped records |
| Direct steps | 50/50 |
| Direct-answer JSONL rows | 7 clean rows |
| Restored best branch snapshot | yes, restored from step 0 |
| Diversity target | failed, 0/9 multi-target profiles passed |
| Final QA target/predicted unique | 8 / 3 after restore |
| Final QA average target rank | 13.25 after restore |
| Final QA target-token coverage | 0.25 after restore |
| Training snapshot note | step 40 improved QA average target rank to 6.5 and top-5 coverage to 0.625, but QA target-token coverage regressed to 0.125 and predicted diversity collapsed to 1/8 |
| Final heldout average target rank | 13.375 after restore |
| Training heldout note | step 40 improved heldout average target rank to 6.875 and top-5 coverage to 0.5, but target-token coverage regressed to 0.125 and predicted diversity collapsed to 1/8 |
| Promotion status | rejected; profile-aware rank gains still trade away coverage and diversity |

Latest profile target-share full-stack screen:

| Signal | Value |
| --- | --- |
| Run | `runs/transformer-answer-v0.82-fullstack-profile-target-share-smoke-dim4-context80/` |
| Mode | `branch-balanced-context-profile-target-share-preserving-deficit-unlikelihood` |
| Artifact stack | experiment intent, corpus hygiene, training plan, candidate quarantine, deterministic verifier, recipe, replay plan, constraint-first report, metrics, tokenizer, optimizer, lessons, checkpoint |
| Replay plan size | 9144 branch records and 9144 replay records across 21 profiles |
| Foundation stack | AdamW, two heads, RMSNorm, gated MLP, tied output embeddings, rotary positions, cache-aware metadata |
| Context / representation | context 80, `--use-pre-layer-norm`, `--use-prompt-position-projection` |
| Branch-context gate | passed across 219/219 semantic records with no ambiguous, colliding, or skipped records |
| Purity gates | no pretrained weights, no pretrained tokenizer, no external embeddings |
| Direct steps | 50/50 |
| Direct-answer JSONL rows | 7 clean rows |
| Restored best branch snapshot | yes, restored from step 0 |
| Diversity target | failed, 0/9 multi-target profiles passed |
| Final QA target/predicted unique | 8 / 3 after restore |
| Final QA average target rank | 13.25 after restore |
| Final QA target-token coverage | 0.25 after restore |
| Training snapshot note | step 40 improved QA average target rank to 9.125 and top-5 coverage to 0.375, but QA target-token coverage regressed to 0.0 and predicted diversity collapsed to 1/8 |
| Training heldout note | step 40 improved heldout average target rank to 9.25 and top-5 coverage to 0.375, but heldout target-token coverage regressed to 0.0 and predicted diversity collapsed to 1/8 |
| Promotion status | rejected; target-share pressure still trades coverage and diversity away for rank |

Latest prompt-specific branch ownership full-stack screen:

| Signal | Value |
| --- | --- |
| Run | `runs/transformer-answer-v0.83-fullstack-prompt-ownership-smoke-dim4-context80/` |
| Mode | `branch-balanced-context-profile-prompt-ownership-target-share-preserving-deficit-unlikelihood` |
| Added mechanic | sibling-target margin inside each profile so a replay context is trained to outrank other profile targets |
| Unit coverage | focused transformer test passes; prompt ownership lifts a context-specific target more than v0.82 target-share pressure |
| Artifact stack | experiment intent, corpus hygiene, training plan, candidate quarantine, deterministic verifier, recipe, replay plan, constraint-first report, metrics, tokenizer, optimizer, lessons, checkpoint |
| Replay plan size | 9144 branch records and 9144 replay records across 21 profiles |
| Branch-context gate | passed across 219/219 semantic records with no ambiguous, colliding, or skipped records |
| Purity gates | no pretrained weights, no pretrained tokenizer, no external embeddings |
| Direct steps | 50/50 |
| Direct-answer JSONL rows | 7 clean rows |
| Restored best branch snapshot | yes, restored from step 0 |
| Diversity target | failed, 0/9 multi-target profiles passed |
| Final QA target/predicted unique | 8 / 3 after restore |
| Final QA average target rank | 13.25 after restore |
| Final QA target-token coverage | 0.25 after restore |
| Training snapshot note | step 50 improved QA average target rank to 8.625, but QA target-token coverage regressed to 0.0 and predicted diversity collapsed to 1/8 |
| Training heldout note | step 50 improved heldout average target rank to 8.5, but heldout target-token coverage regressed to 0.0 and predicted diversity collapsed to 1/8 |
| Promotion status | rejected; prompt ownership needs coverage-preserving training before rank gains can be trusted |
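The v0.83 sibling-target margin can be written as a hinge penalty: for a replay context that owns target `t` inside a profile, every other target `s` of the same profile is penalized until the logit of `t` exceeds the logit of `s` by a margin. The margin value and the function name below are assumptions; the sketch only illustrates the shape of the constraint.

```python
def sibling_margin_loss(
    logits: dict[str, float],
    owned_target: str,
    profile_targets: set[str],
    margin: float = 1.0,
) -> float:
    """Sum of max(0, margin - (z_owned - z_sibling)) over the profile's other targets."""
    owned = logits[owned_target]
    return sum(
        max(0.0, margin - (owned - logits[sibling]))
        for sibling in profile_targets
        if sibling != owned_target
    )


loss = sibling_margin_loss({"h": 2.0, "k": 1.5, "r": 0.0}, "h", {"h", "k", "r"})
```

Only `k` contributes here: it sits within the margin of the owned target, while `r` is already far enough below it.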

Latest baseline-anchored prompt ownership full-stack screen:

| Signal | Value |
| --- | --- |
| Run | `runs/transformer-answer-v0.84-fullstack-baseline-anchored-prompt-ownership-smoke-dim4-context80/` |
| Mode | `branch-balanced-context-profile-baseline-anchored-prompt-ownership-target-share-preserving-deficit-unlikelihood` |
| Added mechanic | replay preservation uses baseline profile-aware replay predictions instead of current prediction drift |
| Unit coverage | focused transformer tests pass; baseline prediction overrides are used by profiled replay batches and protect a covered target better than dynamic prediction preservation |
| Artifact stack | experiment intent, corpus hygiene, training plan, candidate quarantine, deterministic verifier, recipe, replay plan, constraint-first report, metrics, tokenizer, optimizer, lessons, checkpoint |
| Replay plan size | 9144 branch records and 9144 replay records across 21 profiles |
| Baseline prediction anchors | 562 recorded and active |
| Branch-context gate | passed across 219/219 semantic records with no ambiguous, colliding, or skipped records |
| Purity gates | no pretrained weights, no pretrained tokenizer, no external embeddings |
| Direct steps | 50/50 |
| Direct-answer JSONL rows | 7 clean rows |
| Restored best branch snapshot | yes, restored from step 0 |
| Diversity target | failed, 0/9 multi-target profiles passed |
| Final QA target/predicted unique | 8 / 3 after restore |
| Final QA average target rank | 13.25 after restore |
| Final QA target-token coverage | 0.25 after restore |
| Training snapshot note | step 40 improved QA average target rank to 8.0, but QA target-token coverage regressed to 0.125 and predicted diversity collapsed to 1/8 |
| Training heldout note | step 40 improved heldout average target rank to 8.375, but heldout target-token coverage regressed to 0.125 and predicted diversity collapsed to 1/8 |
| Promotion status | rejected; baseline anchors improve coverage over v0.83 but still miss the full 0.25 coverage floor |

Latest baseline-floor update-gated prompt ownership full-stack screen:

| Signal | Value |
| --- | --- |
| Run | `runs/transformer-answer-v0.85-fullstack-baseline-floor-gated-prompt-ownership-smoke-dim4-context80/` |
| Mode | `branch-balanced-context-profile-baseline-floor-gated-prompt-ownership-target-share-preserving-deficit-unlikelihood` |
| Added mechanic | direct-answer updates are rolled back when branch-profile target-token coverage falls below the step-0 baseline floor |
| Unit coverage | focused transformer tests pass; the new mode records active baseline replay anchors and update-guard accounting |
| Artifact stack | experiment intent, corpus hygiene, training plan, candidate quarantine, deterministic verifier, recipe, replay plan, constraint-first report, metrics, tokenizer, optimizer, lessons, checkpoint |
| Replay plan size | 9144 branch records and 9144 replay records across 21 profiles |
| Baseline prediction anchors | 562 recorded and active |
| Update guard | checked 50/50 attempted updates; accepted 0; rejected 50 |
| Branch-context gate | passed across 219/219 semantic records with no ambiguous, colliding, or skipped records |
| Purity gates | no pretrained weights, no pretrained tokenizer, no external embeddings |
| Direct steps | 50/50 attempted |
| Direct-answer JSONL rows | 7 clean rows |
| Restored best branch snapshot | yes, restored from step 0 |
| Diversity target | failed, 0/9 multi-target profiles passed |
| Final QA target/predicted unique | 8 / 3 |
| Final QA average target rank | 13.25 |
| Final QA target-token coverage | 0.25 |
| Training snapshot note | every recorded trained snapshot preserved QA target-token coverage at 0.25, but only because every attempted update was rejected |
| Training heldout note | every recorded trained snapshot preserved heldout target-token coverage at 0.25, but only because every attempted update was rejected |
| Promotion status | rejected; the guard prevents unsafe forgetting, but no weight update is accepted and branch diversity still fails |
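The v0.85 update guard follows a snapshot, apply, probe, and roll-back pattern. The sketch below assumes parameters live in a plain dict and that a caller-supplied probe returns per-profile target-token coverage; `guarded_update` and its arguments are hypothetical names, not the repository API.

```python
import copy
from typing import Callable


def guarded_update(
    params: dict,
    apply_update: Callable[[dict], None],
    probe_coverage: Callable[[dict], dict[str, float]],
    baseline_floor: dict[str, float],
) -> tuple[bool, dict[str, float]]:
    """Keep an update only if every profile stays at or above its step-0 floor."""
    saved = copy.deepcopy(params)
    apply_update(params)
    coverage = probe_coverage(params)
    deficits = {
        profile: floor - coverage.get(profile, 0.0)
        for profile, floor in baseline_floor.items()
        if coverage.get(profile, 0.0) < floor
    }
    if deficits:
        # Roll back in place so callers holding `params` see the restored values.
        params.clear()
        params.update(saved)
        return False, deficits
    return True, {}


params = {"w": 1.0}
accepted, deficits = guarded_update(
    params,
    apply_update=lambda p: p.update(w=p["w"] + 1.0),
    probe_coverage=lambda p: {"qa": 0.25 if p["w"] < 1.5 else 0.125},
    baseline_floor={"qa": 0.25},
)
```

In this toy case the update halves QA coverage, so it is rejected and `params` returns to its pre-update value.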

Latest adaptive baseline-floor prompt ownership full-stack screen:

| Signal | Value |
| --- | --- |
| Run | `runs/transformer-answer-v0.86-fullstack-baseline-floor-adaptive-prompt-ownership-smoke-dim4-context80/` |
| Mode | `branch-balanced-context-profile-baseline-floor-adaptive-prompt-ownership-target-share-preserving-deficit-unlikelihood` |
| Added mechanic | rejected updates are retried at learning-rate scales 1.0, 0.25, 0.05, and 0.01 after restoring model, optimizer, and RNG state |
| Unit coverage | focused transformer tests pass; the new mode records active baseline replay anchors and adaptive retry accounting |
| Artifact stack | experiment intent, corpus hygiene, training plan, candidate quarantine, deterministic verifier, recipe, replay plan, constraint-first report, metrics, tokenizer, optimizer, lessons, checkpoint |
| Replay plan size | 9144 branch records and 9144 replay records across 21 profiles |
| Baseline prediction anchors | 562 recorded and active |
| Adaptive scales | 1.0, 0.25, 0.05, 0.01 |
| Update guard | checked 50/50 steps; attempted 200 scaled updates; accepted 0; rejected 200 |
| Branch-context gate | passed across 219/219 semantic records with no ambiguous, colliding, or skipped records |
| Purity gates | no pretrained weights, no pretrained tokenizer, no external embeddings |
| Direct steps | 50/50 attempted |
| Direct-answer JSONL rows | 7 clean rows |
| Restored best branch snapshot | yes, restored from step 0 |
| Diversity target | failed, 0/9 multi-target profiles passed |
| Final QA target/predicted unique | 8 / 3 |
| Final QA average target rank | 13.25 |
| Final QA target-token coverage | 0.25 |
| Training snapshot note | every recorded trained snapshot preserved QA target-token coverage at 0.25, but adaptive retries accepted no updates |
| Training heldout note | every recorded trained snapshot preserved heldout target-token coverage at 0.25, but adaptive retries accepted no updates |
| Promotion status | rejected; smaller learning-rate scales do not make the update safe, which sets up the v0.87 repair-retry screen |
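The v0.86 retry loop restores model, optimizer, and RNG state before trying each smaller learning-rate scale. A minimal sketch, assuming a `state` dict that holds parameters, optimizer state, and a `random.Random` instance; `adaptive_guarded_step`, `compute_update`, `is_safe`, and `toy_update` are hypothetical names.

```python
import copy
import random
from typing import Callable, Optional

SCALES = (1.0, 0.25, 0.05, 0.01)


def adaptive_guarded_step(
    state: dict,
    compute_update: Callable[[dict, float], None],
    is_safe: Callable[[dict], bool],
    scales: tuple[float, ...] = SCALES,
) -> Optional[float]:
    """Return the first learning-rate scale whose update passes the guard, else None."""
    saved_params = copy.deepcopy(state["params"])
    saved_optimizer = copy.deepcopy(state["optimizer"])
    saved_rng = state["rng"].getstate()
    for scale in scales:
        compute_update(state, scale)
        if is_safe(state["params"]):
            return scale
        # Restore params, optimizer, and RNG so the next scale starts from the same point.
        state["params"] = copy.deepcopy(saved_params)
        state["optimizer"] = copy.deepcopy(saved_optimizer)
        state["rng"].setstate(saved_rng)
    return None


def toy_update(state: dict, scale: float) -> None:
    state["rng"].random()  # consume randomness, as dropout or sampling would
    state["optimizer"]["step"] += 1
    state["params"]["w"] += scale


state = {"params": {"w": 1.0}, "optimizer": {"step": 0}, "rng": random.Random(0)}
chosen_scale = adaptive_guarded_step(state, toy_update, lambda p: p["w"] <= 1.1)
```

Scales 1.0 and 0.25 overshoot the toy safety bound and are rolled back, so 0.05 is accepted with the optimizer showing a single applied step.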

Latest repaired baseline-floor prompt ownership full-stack screen:

| Signal | Value |
| --- | --- |
| Run | `runs/transformer-answer-v0.87-fullstack-baseline-floor-repaired-prompt-ownership-clean-smoke-dim4-context80/` |
| Mode | `branch-balanced-context-profile-baseline-floor-repaired-prompt-ownership-target-share-preserving-deficit-unlikelihood` |
| Added mechanic | failed adaptive retries get one bounded baseline-covered anchor repair before the floor probe decides whether to keep or roll back the update |
| Unit coverage | focused transformer tests pass; the new mode records active baseline replay anchors, repair anchors, repair attempts, and accepted update-shape accounting |
| Artifact stack | experiment intent, corpus hygiene, training plan, candidate quarantine, deterministic verifier, recipe, replay plan, constraint-first report, metrics, tokenizer, optimizer, lessons, checkpoint |
| Replay plan size | 9144 branch records and 9144 replay records across 21 profiles |
| Baseline prediction anchors | 562 recorded and active |
| Repair anchors | 227 recorded; one repair step per failed retry |
| Adaptive scales | 1.0, 0.25, 0.05, 0.01 |
| Update guard | checked 50/50 steps; attempted 200 updates; ran 200 one-step repairs; accepted 0; rejected 200 |
| Branch-context gate | passed across 219/219 semantic records with no ambiguous, colliding, or skipped records |
| Purity gates | no pretrained weights, no pretrained tokenizer, no external embeddings |
| Direct steps | 50/50 attempted |
| Direct-answer JSONL rows | 7 clean rows |
| Restored best branch snapshot | yes, restored from step 0 |
| Diversity target | failed, 0/9 multi-target profiles passed |
| Final QA target/predicted unique | 8 / 3 |
| Final QA average target rank | 13.25 |
| Final QA target-token coverage | 0.25 |
| Training snapshot note | every recorded trained snapshot preserved QA target-token coverage at 0.25, but repair retries accepted no updates |
| Training heldout note | every recorded trained snapshot preserved heldout target-token coverage at 0.25, but repair retries accepted no updates |
| Promotion status | rejected; post-update repair is insufficient and the next repair needs a floor-preserving objective before optimizer application |

Latest objective-side baseline-floor prompt ownership full-stack screen:

| Signal | Value |
| --- | --- |
| Run | `runs/transformer-answer-v0.88-fullstack-baseline-floor-objective-prompt-ownership-smoke-dim4-context80/` |
| Mode | `branch-balanced-context-profile-baseline-floor-objective-prompt-ownership-target-share-preserving-deficit-unlikelihood` |
| Added mechanic | a balanced batch of baseline-covered floor anchors is included in the same direct-answer loss and backward pass as branch-diversity pressure |
| Unit coverage | focused transformer tests pass; the new mode records objective anchor counts, anchor batch size, anchor weight, and accepted/rejected guard accounting |
| Artifact stack | experiment intent, corpus hygiene, training plan, candidate quarantine, deterministic verifier, recipe, replay plan, constraint-first report, metrics, tokenizer, optimizer, lessons, checkpoint |
| Replay plan size | 9144 branch records and 9144 replay records across 21 profiles |
| Baseline prediction anchors | 562 recorded and active |
| Objective-side floor anchors | 227 recorded; batch size 32; weight 10.0 |
| Adaptive scales | 1.0, 0.25, 0.05, 0.01 |
| Update guard | checked 50/50 steps; attempted 200 updates; ran 200 objective anchor batches covering 2400 anchor records; accepted 0; rejected 200 |
| Branch-context gate | passed across 219/219 semantic records with no ambiguous, colliding, or skipped records |
| Purity gates | no pretrained weights, no pretrained tokenizer, no external embeddings |
| Direct steps | 50/50 attempted |
| Direct-answer JSONL rows | 7 clean rows |
| Restored best branch snapshot | yes, restored from step 0 |
| Diversity target | failed, 0/9 multi-target profiles passed |
| Final QA target/predicted unique | 8 / 3 |
| Final QA average target rank | 13.25 |
| Final QA target-token coverage | 0.25 |
| Training snapshot note | every recorded trained snapshot preserved QA target-token coverage at 0.25, but objective-side floor anchors accepted no updates |
| Training heldout note | every recorded trained snapshot preserved heldout target-token coverage at 0.25, but objective-side floor anchors accepted no updates |
| Promotion status | rejected; the combined floor-anchor and branch-pressure objective is insufficient, which sets up the stabilization-only screen |
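The v0.88 objective puts floor anchors and branch-diversity pressure into one loss before a single backward pass. The sketch below shows one way to build a balanced anchor batch (round-robin over profile-target groups) and fold it into the combined loss. The grouping key, the round-robin policy, and the function names are assumptions; only the defaults of 32 anchors and weight 10.0 come from the table.

```python
from collections import defaultdict


def balanced_anchor_batch(anchors: list[dict], batch_size: int = 32) -> list[dict]:
    """Round-robin over (profile, target) groups so no single group dominates the batch."""
    groups: dict[tuple[str, str], list[dict]] = defaultdict(list)
    for anchor in anchors:
        groups[(anchor["profile"], anchor["target"])].append(anchor)
    queues = [list(rows) for _, rows in sorted(groups.items())]
    batch: list[dict] = []
    while len(batch) < batch_size and any(queues):
        for queue in queues:
            if queue and len(batch) < batch_size:
                batch.append(queue.pop(0))
    return batch


def combined_loss(branch_loss: float, anchor_losses: list[float], anchor_weight: float = 10.0) -> float:
    """Branch-diversity loss plus the weighted mean anchor loss, as one scalar objective."""
    anchor_term = sum(anchor_losses) / len(anchor_losses) if anchor_losses else 0.0
    return branch_loss + anchor_weight * anchor_term


anchors = [
    {"profile": "qa:owner", "target": "a", "id": 1},
    {"profile": "qa:owner", "target": "a", "id": 2},
    {"profile": "qa:owner", "target": "a", "id": 3},
    {"profile": "qa:place", "target": "b", "id": 4},
]
batch = balanced_anchor_batch(anchors, batch_size=3)
total = combined_loss(0.5, [0.1, 0.3])
```

The `qa:place` anchor enters the batch second even though `qa:owner` has three times as many records, and the combined loss is `0.5 + 10.0 * 0.2 = 2.5`.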

Latest stabilization-only baseline-floor full-stack screen:

| Signal | Value |
| --- | --- |
| Run | `runs/transformer-answer-v0.89-fullstack-baseline-floor-stabilization-smoke-dim4-context80/` |
| Mode | `branch-context-profile-baseline-floor-stabilization-unlikelihood` |
| Added mechanic | guarded attempts train only baseline-covered floor anchors, with branch-diversity pressure removed from the update shape |
| Unit coverage | focused transformer tests pass; the new mode records stabilization anchor counts, anchor batch size, stabilization batches, and accepted/rejected guard accounting |
| Artifact stack | experiment intent, corpus hygiene, training plan, candidate quarantine, deterministic verifier, recipe, replay plan, constraint-first report, metrics, tokenizer, optimizer, lessons, checkpoint |
| Replay plan size | 9144 branch records and 9144 replay records across 21 profiles |
| Baseline prediction anchors | 562 recorded and active |
| Stabilization floor anchors | 227 recorded; batch size 32 |
| Adaptive scales | 1.0, 0.25, 0.05, 0.01 |
| Update guard | checked 50/50 steps; attempted 200 updates; ran 200 stabilization anchor batches covering 2400 anchor records; accepted 0; rejected 200 |
| Branch-context gate | passed across 219/219 semantic records with no ambiguous, colliding, or skipped records |
| Purity gates | no pretrained weights, no pretrained tokenizer, no external embeddings |
| Direct steps | 50/50 attempted |
| Direct-answer JSONL rows | 7 clean rows |
| Restored best branch snapshot | yes, restored from step 0 |
| Diversity target | failed, 0/9 multi-target profiles passed |
| Final QA target/predicted unique | 8 / 3 |
| Final QA average target rank | 13.25 |
| Final QA target-token coverage | 0.25 |
| Training snapshot note | every recorded trained snapshot preserved QA target-token coverage at 0.25, but stabilization-only floor anchors accepted no updates |
| Training heldout note | every recorded trained snapshot preserved heldout target-token coverage at 0.25, but stabilization-only floor anchors accepted no updates |
| Promotion status | rejected; floor-only anchor updates are insufficient under the current guard, so the next repair should diagnose the guard/update interaction before branch pressure is added back |

Latest baseline-floor rejection diagnostics screen:

| Signal | Value |
| --- | --- |
| Run | `runs/transformer-answer-v0.90-fullstack-baseline-floor-stabilization-diagnostics-smoke-dim4-context80/` |
| Mode | `branch-context-profile-baseline-floor-stabilization-unlikelihood` |
| Added mechanic | guard records rejected update-shape counts, rejected scale counts, violation profile counts, diagnostic samples, and worst rejected floor violation |
| Unit coverage | focused transformer tests pass; the reusable coverage diagnostic helper reports profile deficits and the stabilization guard records rejection diagnostics |
| Update guard | checked 50/50 steps; attempted 200 updates; accepted 0; rejected 200 |
| Rejected update shapes | `stabilization`: 200 |
| Rejected adaptive scales | 1: 50, 0.25: 50, 0.05: 50, 0.01: 50 |
| Violation profile counts | heldout: 200, admissions: 150, glossary: 150, qa: 150, self: 100, learning: 50, owner: 50 |
| Worst rejected floor violation | learning, baseline coverage 0.25, snapshot coverage 0.0, deficit 0.25 |
| Branch-context gate | passed across 219/219 semantic records with no ambiguous, colliding, or skipped records |
| Deterministic verifier | passed with no external model |
| Diversity target | failed, 0/9 multi-target profiles passed |
| Promotion status | rejected for model promotion, but diagnostic evidence is usable for the next profile-targeted floor repair |
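The v0.90 rejection diagnostics amount to counters over rejected attempts plus a running maximum of the floor deficit. A minimal sketch, assuming each rejected attempt is recorded as a (scale, baseline coverage, snapshot coverage) triple; the function name and return layout are hypothetical.

```python
from collections import Counter


def summarize_rejections(attempts: list[tuple[float, dict[str, float], dict[str, float]]]):
    """Count rejected scales and violating profiles, and track the worst floor deficit."""
    scale_counts: Counter = Counter()
    profile_counts: Counter = Counter()
    worst = None
    for scale, baseline, snapshot in attempts:
        scale_counts[scale] += 1
        for profile, floor in baseline.items():
            observed = snapshot.get(profile, 0.0)
            if observed < floor:
                profile_counts[profile] += 1
                deficit = floor - observed
                if worst is None or deficit > worst["deficit"]:
                    worst = {"profile": profile, "baseline": floor,
                             "snapshot": observed, "deficit": deficit}
    return scale_counts, profile_counts, worst


baseline = {"learning": 0.25, "qa": 0.25}
attempts = [
    (1.0, baseline, {"learning": 0.0, "qa": 0.125}),
    (0.25, baseline, {"learning": 0.25, "qa": 0.125}),
]
scales, profiles, worst = summarize_rejections(attempts)
```

The worst-violation record in this toy input has the same shape as the table's row: profile `learning`, baseline 0.25, snapshot 0.0, deficit 0.25.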

Profile-targeted baseline-floor stabilization screen:

| Signal | Value |
| --- | --- |
| Run | `runs/transformer-answer-v0.91-fullstack-baseline-floor-profile-targeted-stabilization-smoke-dim4-context80/` |
| Mode | `branch-context-profile-baseline-floor-profile-targeted-stabilization-unlikelihood` |
| Added mechanic | guarded attempts train the full baseline-covered floor-anchor profile-target surface instead of a random 32-anchor sample |
| Unit coverage | focused transformer tests pass; the new mode records profile-target activity, full floor batch sizing, profile-target counts, and source-profile anchor counts |
| Floor anchors | 227 recorded; requested batch size 227; 12 profile-target groups |
| Anchor profile counts | qa:owner 48, qa:place 41, fact:owner 40, fact:place 40, bridge:owner 20, bridge:place 16, fact:learning 8, qa:glossary 6, qa:learning 5, qa:self 3 |
| Update guard | checked 50/50 steps; attempted 200 updates; accepted 0; rejected 200 |
| Rejected update shapes | `profile_targeted_stabilization`: 200 |
| Rejected adaptive scales | 1: 50, 0.25: 50, 0.05: 50, 0.01: 50 |
| Violation profile counts | heldout: 200, admissions: 150, glossary: 150, qa: 150, self: 100, learning: 50, owner: 50 |
| Worst rejected floor violation | learning, baseline coverage 0.25, snapshot coverage 0.0, deficit 0.25 |
| Branch-context gate | passed across 219/219 semantic records with no ambiguous, colliding, or skipped records |
| Deterministic verifier | passed with no external model |
| Diversity target | failed, 0/9 multi-target profiles passed |
| Promotion status | rejected; full profile-target floor coverage alone does not make guarded updates safe |

Sequential profile-floor stabilization screen:

| Signal | Value |
| --- | --- |
| Run | `runs/transformer-answer-v0.92-fullstack-baseline-floor-sequential-profile-stabilization-smoke-dim4-context80/` |
| Mode | `branch-context-profile-baseline-floor-sequential-profile-stabilization-unlikelihood` |
| Added mechanic | guarded attempts train source-profile floor-anchor groups sequentially and roll back each unsafe group before trying the next one |
| Unit coverage | focused transformer tests pass; the new mode records sequential profile attempts, accept/reject counts, no-effective-update attempts, and profile probe samples |
| Floor anchors | 227 recorded; requested batch size 227; 12 profile-target groups; 10 source-profile groups |
| Sequential profile attempts | 2000 attempted; 0 accepted; 2000 rejected; 2400 anchor records |
| Source-profile rejection counts | each of bridge:owner, bridge:place, fact:learning, fact:owner, fact:place, qa:glossary, qa:learning, qa:owner, qa:place, and qa:self rejected 200 times |
| Update guard | checked 50/50 steps; attempted 200 updates; accepted 0; rejected 200; no-effective-update attempts 200 |
| Rejected update shapes | `sequential_profile_stabilization`: 200 |
| Rejected adaptive scales | 1: 50, 0.25: 50, 0.05: 50, 0.01: 50 |
| Branch-context gate | passed across 219/219 semantic records with no ambiguous, colliding, or skipped records |
| Deterministic verifier | passed with no external model |
| Diversity target | failed, 0/9 multi-target profiles passed |
| Promotion status | rejected; sequential source-profile repair still cannot produce safe weight movement |
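The v0.92 sequential pass trains one source-profile group at a time and rolls back each unsafe group before moving on. A minimal sketch with hypothetical names, assuming parameters in a plain dict and a caller-supplied safety probe; visiting profiles in sorted order is also an assumption.

```python
import copy
from typing import Callable


def sequential_profile_pass(
    params: dict,
    groups: dict[str, list],
    train_group: Callable[[dict, list], None],
    is_safe: Callable[[dict], bool],
) -> tuple[list[str], list[str]]:
    """Apply each profile group's update in turn, keeping only floor-safe ones."""
    accepted, rejected = [], []
    for profile in sorted(groups):
        saved = copy.deepcopy(params)
        train_group(params, groups[profile])
        if is_safe(params):
            accepted.append(profile)  # later groups build on this accepted update
        else:
            params.clear()
            params.update(saved)
            rejected.append(profile)
    return accepted, rejected


params = {"w": 0.0}
accepted, rejected = sequential_profile_pass(
    params,
    groups={"bridge:owner": ["x"], "qa:self": ["y", "z"]},
    train_group=lambda p, anchors: p.update(w=p["w"] + len(anchors)),
    is_safe=lambda p: p["w"] <= 1.0,
)
```

The first group's update stays, and the second is rolled back to the state the first one left behind rather than to the original parameters.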

Calibrated sequential profile-floor stabilization screen:

| Signal | Value |
| --- | --- |
| Run | `runs/transformer-answer-v0.93-baseline-floor-calibrated-sequential-profile-stabilization-step1-dim4-context80/` |
| Mode | `branch-context-profile-baseline-floor-calibrated-sequential-profile-stabilization-unlikelihood` |
| Added mechanic | calibrated adaptive scales below 0.01 plus coverage-only guard probes for floor checks |
| Unit coverage | focused transformer tests pass; the mode records calibrated activation, extended scale metadata, replay-plan scales, and accepted/rejected update-shape accounting |
| Calibrated scales | 1, 0.25, 0.05, 0.01, 0.0025, 0.0005, 0.0001 |
| Update guard | checked 1/1 step; attempted 5 updates; accepted 1; rejected 4; no-effective-update attempts 4 |
| Accepted update | bridge:owner source-profile group at scale 0.0025 |
| Sequential profile attempts | 50 attempted; 1 accepted; 49 rejected; 60 anchor records |
| Rejected adaptive scales | 1: 1, 0.25: 1, 0.05: 1, 0.01: 1 |
| Accepted update shapes | `calibrated_sequential_profile_stabilization`: 1 |
| Rejected update shapes | `calibrated_sequential_profile_stabilization`: 4 |
| Branch-context gate | passed across 219/219 semantic records with no ambiguous, colliding, or skipped records |
| Deterministic verifier | passed with no external model |
| Diversity target | failed, 0/9 multi-target profiles passed |
| Promotion status | rejected for model promotion; this one-step smoke shows that calibrated floor-preserving movement is possible |

Latest profile-scale calibrated floor stabilization screen:

| Signal | Value |
| --- | --- |
| Run | `runs/transformer-answer-v0.94-baseline-floor-profile-scale-calibrated-sequential-stabilization-step1-dim4-context80/` |
| Mode | `branch-context-profile-baseline-floor-profile-scale-calibrated-sequential-profile-stabilization-unlikelihood` |
| Added mechanic | profile-scale memory: search calibrated scales per source profile, preserve the first safe profile update, and roll back unsafe profile-scale attempts |
| Unit coverage | focused transformer tests pass; the mode records profile-scale activation, search/outer scales, profile-scale attempts, acceptance/rejection scale counts, and accepted profile scales |
| Search scales | 1, 0.25, 0.05, 0.01, 0.0025, 0.0005, 0.0001 |
| Outer guard | checked 1/1 step; attempted 1 update; accepted 1; rejected 0; no-effective-update attempts 0 |
| Profile-scale attempts | 60 attempted; 8 accepted; 52 rejected; 72 anchor records |
| Accepted profile scales | bridge:owner 0.0025, bridge:place 0.0005, fact:learning 0.0005, fact:owner 0.0001, fact:place 0.0001, qa:glossary 0.0001, qa:place 0.0001, qa:self 1 |
| Accepted update shapes | `profile_scale_calibrated_sequential_profile_stabilization`: 1 |
| Branch-context gate | passed across 219/219 semantic records with no ambiguous, colliding, or skipped records |
| Deterministic verifier | passed with no external model |
| Diversity target | failed, 0/9 multi-target profiles passed |
| Promotion status | rejected for model promotion; safe calibrated movement now spans eight source profiles |
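The v0.94 profile-scale memory adds a per-profile scale search to the sequential pass: each source profile walks down the calibrated scales, keeps the first update that passes the floor, and records that scale. A minimal sketch with hypothetical names, assuming parameters in a plain dict and a caller-supplied safety probe; the scale ladder is the one listed in the table.

```python
import copy
from typing import Callable

SEARCH_SCALES = (1.0, 0.25, 0.05, 0.01, 0.0025, 0.0005, 0.0001)


def profile_scale_search(
    params: dict,
    groups: dict[str, list],
    train_group: Callable[[dict, list, float], None],
    is_safe: Callable[[dict], bool],
    scales: tuple[float, ...] = SEARCH_SCALES,
) -> tuple[dict[str, float], int]:
    """Per profile, keep the first scale whose update passes the floor; also count attempts."""
    accepted_scales: dict[str, float] = {}
    attempts = 0
    for profile in sorted(groups):
        for scale in scales:
            attempts += 1
            saved = copy.deepcopy(params)
            train_group(params, groups[profile], scale)
            if is_safe(params):
                accepted_scales[profile] = scale  # remembered scale for this profile
                break
            params.clear()
            params.update(saved)
    return accepted_scales, attempts


params = {"w": 0.0}
scales_by_profile, attempts = profile_scale_search(
    params,
    groups={"a": ["x"], "b": ["y", "z"]},
    train_group=lambda p, anchors, scale: p.update(w=p["w"] + scale * len(anchors)),
    is_safe=lambda p: p["w"] <= 0.02,
)
```

A profile that never passes the floor is simply absent from the returned mapping, much like the two source profiles missing from the accepted-scale row above.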

Latest diversity-aware profile-scale floor stabilization screen:

| Signal | Value |
| --- | --- |
| Run | `runs/transformer-answer-v0.95-baseline-floor-diversity-profile-scale-calibrated-sequential-stabilization-configured-step1-dim4-context80/` |
| Mode | `branch-context-profile-baseline-floor-diversity-profile-scale-calibrated-sequential-profile-stabilization-unlikelihood` |
| Added mechanic | diversity-aware profile-scale memory: accept profile-scale updates only when they preserve the baseline floor and do not regress branch-diversity score from the profile's pre-update state |
| Unit coverage | focused transformer tests pass; the mode records diversity activation, attempts, score improvements/ties/regressions, floor rejections, rejection reasons, and accepted profile outcomes |
| Search scales | 1, 0.25, 0.05, 0.01, 0.0025, 0.0005, 0.0001 |
| Outer guard | checked 1/1 step; attempted 1 update; accepted 1; rejected 0; outer diversity rejections 0 |
| Profile-scale attempts | 58 attempted; 5 accepted; 53 rejected |
| Diversity outcomes | 5 score improvements; 0 ties; 11 score regressions; 42 floor regressions |
| Accepted profile scales | bridge:owner 0.0025, bridge:place 0.0005, fact:learning 0.0005, qa:glossary 1, qa:learning 0.0025 |
| Accepted update shapes | `profile_scale_diversity_calibrated_sequential_profile_stabilization`: 1 |
| Branch-context gate | passed across 219/219 semantic records with no ambiguous, colliding, or skipped records |
| Deterministic verifier | passed with no external model |
| Diversity target | failed, 0/9 multi-target profiles passed |
| Promotion status | rejected for model promotion; accepted movement is now explicitly diversity-score non-regressive |
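
The four buckets in the Diversity outcomes row add up to the 58 attempts, so each attempt lands in exactly one bucket. A minimal sketch of that bookkeeping, assuming the floor check runs before the score comparison and that the branch-diversity score is a hypothetical scalar where higher is better; `classify_attempt` and `summarize` are illustrative names, not project functions.

```python
from collections import Counter
from typing import Iterable, Tuple

# Non-regressive moves (improvements and ties) are accepted.
ACCEPTED_OUTCOMES = {"score_improvement", "tie"}


def classify_attempt(floor_preserved: bool, score_before: float, score_after: float) -> str:
    """Put one profile-scale attempt into exactly one diversity-outcome bucket."""
    if not floor_preserved:
        return "floor_regression"  # the floor is checked before the score
    if score_after > score_before:
        return "score_improvement"
    if score_after == score_before:
        return "tie"
    return "score_regression"


def summarize(attempts: Iterable[Tuple[bool, float, float]]) -> Tuple[Counter, int]:
    outcomes = Counter(classify_attempt(*attempt) for attempt in attempts)
    accepted = sum(outcomes[name] for name in ACCEPTED_OUTCOMES)
    return outcomes, accepted


outcomes, accepted = summarize([
    (True, 0.2, 0.3),   # improvement
    (True, 0.3, 0.3),   # tie
    (True, 0.3, 0.1),   # floor holds, score drops
    (False, 0.3, 0.9),  # score rises, but the floor breaks
])
print(accepted, outcomes["floor_regression"])  # 2 1
```

Checking the floor first means a score gain can never buy back a floor violation, which matches the screen's rule that accepted movement must be both floor-preserving and diversity-score non-regressive.
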

Latest frontier profile-scale floor stabilization screen:

| Signal | Value |
| --- | --- |
| Run | `runs/transformer-answer-v0.96-baseline-floor-diversity-frontier-profile-scale-calibrated-sequential-stabilization-step1-dim4-context80/` |
| Mode | `branch-context-profile-baseline-floor-diversity-frontier-profile-scale-calibrated-sequential-profile-stabilization-unlikelihood` |
| Added mechanic | frontier target anchors: add missing-target branch contexts to eligible profile-scale batches, then accept only floor-preserving and branch-diversity-score-improving updates |
| Unit coverage | focused transformer tests pass; the mode records frontier activation, frontier anchor counts, attempts, score outcomes, floor rejections, rejection reasons, and accepted profile outcomes |
| Search scales | 1, 0.25, 0.05, 0.01, 0.0025, 0.0005, 0.0001 |
| Frontier anchors | 52 anchors across 10 source-profile groups and 52 source-profile targets |
| Outer guard | checked 1/1 step; attempted 1 update; accepted 1; rejected 0 |
| Profile-scale attempts | 43 attempted; 9 accepted; 34 rejected; 224 frontier records sampled |
| Diversity outcomes | 9 score improvements; 0 ties; 6 score regressions; 28 floor regressions |
| Accepted profile scales | bridge:owner 0.0025, fact:learning 0.0025, fact:owner 0.0025, fact:place 0.25, qa:glossary 0.05, qa:learning 0.05, qa:owner 0.01, qa:place 0.0005, qa:self 0.05 |
| Accepted update shapes | `profile_scale_frontier_diversity_calibrated_sequential_profile_stabilization`: 1 |
| Branch-context gate | passed across 219/219 semantic records with no ambiguous, colliding, or skipped records |
| Deterministic verifier | passed with no external model |
| Diversity target | failed, 0/9 multi-target profiles passed; max dominant predicted rate improved to 0.9; minimum target-token coverage improved to 0.1667 |
| Promotion status | rejected for model promotion; frontier movement improves diversity but does not yet satisfy full target coverage |
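
The Diversity target row summarizes every multi-target profile with two numbers. This section does not define them precisely, so the sketch below assumes one plausible reading: each profile is a list of hypothetical `(target_token, predicted_token)` pairs at the branch position, "dominant predicted rate" is the share of records that receive the profile's most common prediction, and "target-token coverage" is the share of the profile's distinct targets predicted correctly at least once. The row then reports the worst profile on each axis.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (target token, predicted token) at the branch position


def dominant_predicted_rate(pairs: List[Pair]) -> float:
    """Share of records that receive the profile's most common prediction."""
    counts = Counter(predicted for _, predicted in pairs)
    return max(counts.values()) / len(pairs)


def target_token_coverage(pairs: List[Pair]) -> float:
    """Share of the profile's distinct targets that are predicted correctly at least once."""
    targets = {target for target, _ in pairs}
    hit = {target for target, predicted in pairs if target == predicted}
    return len(hit) / len(targets)


def diversity_summary(profiles: Dict[str, List[Pair]]) -> Tuple[float, float]:
    # Only multi-target profiles can show branch diversity at all.
    multi = [pairs for pairs in profiles.values() if len({t for t, _ in pairs}) > 1]
    return (
        max(dominant_predicted_rate(pairs) for pairs in multi),
        min(target_token_coverage(pairs) for pairs in multi),
    )


profiles = {
    "qa": [("a", "u"), ("b", "u"), ("c", "u")],  # collapsed on one wrong token
    "owner": [("m", "m"), ("n", "m"), ("m", "m"), ("o", "o")],
}
print(diversity_summary(profiles))  # (1.0, 0.0)
```

Under this reading, a single fully collapsed profile is enough to pin the summary at a dominant rate of 1.0 and coverage of 0.0, however well the other profiles behave.
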

Latest coverage-frontier profile-scale floor stabilization screen:

| Signal | Value |
| --- | --- |
| Run | `runs/transformer-answer-v0.97-baseline-floor-diversity-coverage-frontier-profile-scale-calibrated-sequential-stabilization-step1-dim4-context80/` |
| Mode | `branch-context-profile-baseline-floor-diversity-coverage-frontier-profile-scale-calibrated-sequential-profile-stabilization-unlikelihood` |
| Added mechanic | coverage-frontier acceptance: keep frontier anchors active, then accept only floor-preserving updates that gain target-token coverage over the current profile-base snapshot |
| Unit coverage | focused transformer tests pass; the mode records coverage-frontier activation, coverage gain/tie/regression counts, coverage rejection reasons, accepted coverage deltas, and the new update shape |
| Search scales | 1, 0.25, 0.05, 0.01, 0.0025, 0.0005, 0.0001 |
| Frontier anchors | 52 anchors across 10 source-profile groups and 52 source-profile targets |
| Outer guard | checked 1/1 step; attempted 1 update; accepted 1; rejected 0 |
| Profile-scale attempts | 68 attempted; 1 accepted; 67 rejected |
| Coverage outcomes | 1 coverage gain; 15 coverage ties; 52 coverage regressions |
| Coverage rejection reasons | 50 floor regressions; 15 coverage ties; 2 coverage regressions |
| Accepted profile scales | bridge:owner 0.0025 |
| Accepted update shapes | `profile_scale_coverage_frontier_diversity_calibrated_sequential_profile_stabilization`: 1 |
| Branch-context gate | passed across 219/219 semantic records with no ambiguous, colliding, or skipped records |
| Deterministic verifier | passed with no external model |
| Diversity target | failed, 0/9 multi-target profiles passed; strict coverage gating accepted only one source-profile update |
| Promotion status | rejected for model promotion; monotonic coverage gains are now auditable, but the screen starves later source-profile repairs |
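
In this screen the 52 coverage regressions equal the 50 floor-regression plus 2 coverage-regression rejection reasons, which is consistent with recording the coverage outcome for every attempt while letting the floor take precedence as the rejection reason. A minimal sketch of that strict gate under that assumption; `coverage_frontier_decision` is a hypothetical name.

```python
from typing import Optional, Tuple


def coverage_frontier_decision(
    floor_preserved: bool, base_coverage: float, new_coverage: float
) -> Tuple[str, bool, Optional[str]]:
    """Return (coverage outcome, accepted, rejection reason) for one attempt."""
    if new_coverage > base_coverage:
        outcome = "gain"
    elif new_coverage == base_coverage:
        outcome = "tie"
    else:
        outcome = "regression"
    # The floor check wins when recording why an attempt failed.
    if not floor_preserved:
        return outcome, False, "floor_regression"
    if outcome == "gain":
        return outcome, True, None
    # Strict gate: a tie is rejected just like a regression.
    return outcome, False, "coverage_" + outcome


print(coverage_frontier_decision(True, 0.5, 0.5))  # ('tie', False, 'coverage_tie')
```

Rejecting ties is what starves later repairs: an update that reshapes a profile without yet adding a covered target never gets a chance to set up a later gain.
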

Latest coverage-prep frontier profile-scale floor stabilization screen:

| Signal | Value |
| --- | --- |
| Run | `runs/transformer-answer-v0.98-baseline-floor-diversity-coverage-prep-frontier-profile-scale-calibrated-sequential-stabilization-step1-dim4-context80/` |
| Mode | `branch-context-profile-baseline-floor-diversity-coverage-prep-frontier-profile-scale-calibrated-sequential-profile-stabilization-unlikelihood` |
| Added mechanic | coverage-preparation acceptance: keep coverage-frontier accounting, accept coverage gains, and also accept coverage-tied moves only when branch-diversity score improves |
| Unit coverage | focused transformer tests pass; the mode records coverage-prep activation, gain/preparation acceptances, rejection reasons, accepted preparation outcomes, and the new update shape |
| Search scales | 1, 0.25, 0.05, 0.01, 0.0025, 0.0005, 0.0001 |
| Frontier anchors | 52 anchors across 10 source-profile groups and 52 source-profile targets |
| Outer guard | checked 1/1 step; attempted 1 update; accepted 1; rejected 0 |
| Profile-scale attempts | 43 attempted; 9 accepted; 34 rejected |
| Coverage outcomes | 3 coverage gains; 10 coverage ties; 30 coverage regressions |
| Coverage-prep outcomes | 3 gain acceptances; 6 preparation acceptances; 34 rejections |
| Coverage rejection reasons | 28 floor regressions; 4 coverage ties without score gain; 2 coverage regressions |
| Accepted profile scales | bridge:owner 0.0025, fact:learning 0.0025, fact:owner 0.0025, fact:place 0.25, qa:glossary 0.05, qa:learning 0.05, qa:owner 0.01, qa:place 0.0005, qa:self 0.05 |
| Accepted update shapes | `profile_scale_coverage_prep_frontier_diversity_calibrated_sequential_profile_stabilization`: 1 |
| Branch-context gate | passed across 219/219 semantic records with no ambiguous, colliding, or skipped records |
| Deterministic verifier | passed with no external model |
| Diversity target | failed, 0/9 multi-target profiles passed; max dominant predicted rate remains 0.9; minimum target-token coverage remains 0.1667 |
| Promotion status | rejected for model promotion; coverage-prep restores frontier movement while preserving explicit coverage-gain accounting |
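
Coverage-prep relaxes the strict gate in exactly one place: a coverage tie becomes acceptable when the branch-diversity score rises. A minimal sketch under the same assumptions as before (floor checked first, higher score is better); the function name and labels are illustrative, chosen to mirror the table's outcome names.

```python
from typing import Tuple


def coverage_prep_decision(
    floor_preserved: bool,
    base_coverage: float,
    new_coverage: float,
    base_score: float,
    new_score: float,
) -> Tuple[bool, str]:
    """Return (accepted, label), where label is an acceptance kind or a rejection reason."""
    if not floor_preserved:
        return False, "floor_regression"
    if new_coverage > base_coverage:
        return True, "gain_acceptance"
    if new_coverage == base_coverage:
        # New in coverage-prep: a coverage tie is a safe setup move if branch score rises.
        if new_score > base_score:
            return True, "preparation_acceptance"
        return False, "coverage_tie_without_score_gain"
    return False, "coverage_regression"


print(coverage_prep_decision(True, 0.5, 0.5, 0.2, 0.3))  # (True, 'preparation_acceptance')
```

Keeping gain and preparation acceptances as separate labels is what lets the screen report three real coverage gains apart from six setup moves.
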

Latest coverage-recovery frontier profile-scale floor stabilization screen:

| Signal | Value |
| --- | --- |
| Run | `runs/transformer-answer-v0.99-baseline-floor-diversity-coverage-recovery-frontier-profile-scale-calibrated-sequential-stabilization-step1-dim4-context80/` |
| Mode | `branch-context-profile-baseline-floor-diversity-coverage-recovery-frontier-profile-scale-calibrated-sequential-profile-stabilization-unlikelihood` |
| Added mechanic | coverage-recovery retry: after a safe coverage-preparation candidate, attempt a small missing-target update and keep either the recovered candidate or the original prepared state |
| Unit coverage | focused transformer tests pass; the mode records recovery activation, retry scales, prepared candidates, retry attempts, recovery acceptances, fallback preparations, rejection reasons, accepted outcomes, and the recovery update shape |
| Search scales | 1, 0.25, 0.05, 0.01, 0.0025, 0.0005, 0.0001; recovery retry scales 1, 0.25, 0.05 |
| Frontier anchors | 52 anchors across 10 source-profile groups and 52 source-profile targets |
| Outer guard | checked 1/1 step; attempted 1 update; accepted 1; rejected 0 |
| Profile-scale attempts | 54 attempted; 6 accepted; 48 rejected |
| Coverage outcomes | 2 coverage gains; 11 coverage ties; 41 coverage regressions |
| Coverage-prep outcomes | 2 gain acceptances; 4 preparation acceptances; 48 rejections |
| Coverage-recovery outcomes | 6 prepared candidates; 15 recovery retries over 95 recovery records; 2 recoveries; 4 preparation fallbacks; 13 retry rejections |
| Recovery rejection reasons | 7 floor regressions; 6 coverage ties |
| Coverage rejection reasons | 38 floor regressions; 7 coverage ties without score gain; 3 coverage regressions |
| Accepted profile scales | bridge:owner 1, bridge:place 1, fact:glossary 0.01, fact:owner 0.01, fact:place 0.01, qa:glossary 0.0025 |
| Accepted recovery outcomes | bridge:place and fact:glossary recovered coverage; bridge:owner, fact:owner, fact:place, and qa:glossary fell back to preparation |
| Accepted update shapes | `profile_scale_coverage_recovery_frontier_diversity_calibrated_sequential_profile_stabilization`: 1 |
| Branch-context gate | passed across 219/219 semantic records with no ambiguous, colliding, or skipped records |
| Deterministic verifier | passed with no external model |
| Diversity target | failed, 0/9 multi-target profiles passed; coverage preservation passed, but max dominant predicted rate reached 1.0 and minimum target-token coverage remained 0.0 |
| Promotion status | rejected for model promotion; v0.99 proves recovery conversion is auditable, but branch-diverse behavior is not yet stable enough to promote |
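
The recovery retry is a small loop over the retry scales that starts from an already-safe prepared state, so the worst case is simply keeping that state. A minimal sketch, assuming hypothetical `missing_target_update`, `floor_ok`, and `coverage` callbacks (none of these names come from `closed_world_lm`) and that a recovery must strictly gain coverage over the prepared state.

```python
from typing import Callable, List, Tuple, TypeVar

S = TypeVar("S")
RETRY_SCALES = [1, 0.25, 0.05]  # the "recovery retry scales" listed above


def recover_or_fallback(
    prepared: S,
    missing_target_update: Callable[[S, float], S],  # hypothetical update hook
    floor_ok: Callable[[S], bool],  # hypothetical baseline-floor check
    coverage: Callable[[S], float],  # hypothetical target-token coverage
    scales: List[float] = RETRY_SCALES,
) -> Tuple[S, str, List[str]]:
    """Try to turn a prepared state into a coverage recovery; otherwise keep it as-is."""
    base = coverage(prepared)
    rejections: List[str] = []
    for scale in scales:
        candidate = missing_target_update(prepared, scale)
        if not floor_ok(candidate):
            rejections.append("floor_regression")
            continue
        gained = coverage(candidate) - base
        if gained > 0:
            return candidate, "recovery", rejections
        rejections.append("coverage_tie" if gained == 0 else "coverage_regression")
    # Nothing recovered: the already-safe prepared state is still a valid result.
    return prepared, "preparation_fallback", rejections


# Toy state: (coverage, drift). Small steps find a missing target; large steps break the floor.
def toy_update(state, scale):
    coverage_value, drift = state
    return (coverage_value + (0.25 if scale <= 0.25 else 0.0), drift + scale)


state, kind, rejected = recover_or_fallback(
    (0.25, 0.0), toy_update, lambda s: s[1] <= 0.5, lambda s: s[0]
)
print(state, kind, rejected)  # (0.5, 0.25) recovery ['floor_regression']
```

Because the fallback returns the prepared state unchanged, a failed retry costs compute but never undoes the safe preparation move.
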

Latest branch-stable coverage-recovery frontier profile-scale floor stabilization screen:

| Signal | Value |
| --- | --- |
| Run | `runs/transformer-answer-v0.100.0-baseline-floor-diversity-branch-stable-coverage-recovery-frontier-profile-scale-calibrated-sequential-stabilization-step1-dim4-context80/` |
| Mode | `branch-context-profile-baseline-floor-diversity-branch-stable-coverage-recovery-frontier-profile-scale-calibrated-sequential-profile-stabilization-unlikelihood` |
| Added mechanic | branch-stable recovery acceptance: a coverage recovery must preserve the prepared candidate's branch-diversity score before the recovered state is accepted |
| Unit coverage | focused transformer tests pass; the mode records branch-stable activation, checks, acceptances, rejections, fallback preparations, rejection reasons, accepted outcomes, replay-plan activation, and the branch-stable update shape |
| Search scales | 1, 0.25, 0.05, 0.01, 0.0025, 0.0005, 0.0001; recovery retry scales 1, 0.25, 0.05 |
| Frontier anchors | 52 anchors across 10 source-profile groups and 52 source-profile targets |
| Outer guard | checked 1/1 step; attempted 1 update; accepted 1; rejected 0 |
| Profile-scale attempts | 54 attempted; 6 accepted; 48 rejected |
| Coverage outcomes | 2 coverage gains; 11 coverage ties; 41 coverage regressions |
| Coverage-prep outcomes | 2 gain acceptances; 4 preparation acceptances; 48 rejections |
| Branch-stable recovery outcomes | 6 prepared candidates; 15 branch-stability checks; 2 branch-stable recoveries; 4 preparation fallbacks; 13 retry rejections |
| Branch-stable rejection reasons | 7 floor regressions; 5 coverage ties; 1 branch-score regression |
| Coverage rejection reasons | 38 floor regressions; 7 coverage ties without score gain; 3 coverage regressions |
| Accepted profile scales | bridge:owner 1, bridge:place 1, fact:glossary 0.01, fact:owner 0.01, fact:place 0.01, qa:glossary 0.0025 |
| Accepted branch-stable outcomes | bridge:place and fact:glossary recovered coverage while preserving prepared branch score; bridge:owner, fact:owner, fact:place, and qa:glossary fell back to preparation |
| Accepted update shapes | `profile_scale_branch_stable_coverage_recovery_frontier_diversity_calibrated_sequential_profile_stabilization`: 1 |
| Branch-context gate | passed across 219/219 semantic records with no ambiguous, colliding, or skipped records |
| Deterministic verifier | passed with no external model |
| Diversity target | failed, 0/9 multi-target profiles passed; coverage preservation passed, but max dominant predicted rate remains 1.0 and minimum target-token coverage remains 0.0 |
| Promotion status | rejected for model promotion; v0.100.0 proves recovery conversion can be checked against branch stability, but branch-diverse behavior is still not stable enough to promote |
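
The branch-stable check adds one condition after the recovery rules: the coverage gain must not lower the branch-diversity score relative to the prepared candidate. A minimal sketch of the acceptance predicate, assuming the rejection reasons are checked in the order the table lists them; `branch_stable_rejection` is a hypothetical name.

```python
from typing import Optional


def branch_stable_rejection(
    floor_preserved: bool,
    prepared_coverage: float,
    recovered_coverage: float,
    prepared_score: float,
    recovered_score: float,
) -> Optional[str]:
    """Return None if the recovered state may replace the prepared one, else the reason."""
    if not floor_preserved:
        return "floor_regression"
    if recovered_coverage == prepared_coverage:
        return "coverage_tie"
    if recovered_coverage < prepared_coverage:
        return "coverage_regression"
    # New in branch-stable recovery: the coverage gain must not cost branch-diversity score.
    if recovered_score < prepared_score:
        return "branch_score_regression"
    return None


print(branch_stable_rejection(True, 0.25, 0.5, 0.4, 0.3))  # branch_score_regression
```

An equal score counts as preserved here, so a recovery that gains coverage without changing branch diversity is still accepted.
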

Latest branch-diversity recovery frontier profile-scale floor stabilization screen:

| Signal | Value |
| --- | --- |
| Run | `runs/transformer-answer-v0.101.0-baseline-floor-diversity-branch-diversity-recovery-frontier-profile-scale-calibrated-sequential-stabilization-step1-dim4-context80/` |
| Mode | `branch-context-profile-baseline-floor-diversity-branch-stable-coverage-recovery-branch-diversity-frontier-profile-scale-calibrated-sequential-profile-stabilization-unlikelihood` |
| Added mechanic | branch-diversity recovery: after a profile update is already safe under the floor, coverage, and branch-stability gates, try a small profile-local branch-diversity update and keep it only if branch score improves without coverage regression |
| Unit coverage | focused transformer tests pass; the mode records branch-diversity recovery activation, candidates, attempts, acceptances, fallback acceptances, rejection reasons, accepted outcomes, score deltas, replay-plan activation, and the branch-diversity recovery update shape |
| Search scales | 1, 0.25, 0.05, 0.01, 0.0025, 0.0005, 0.0001; recovery retry scales 1, 0.25, 0.05; branch-diversity recovery scales 0.25, 0.05, 0.01 |
| Frontier anchors | 52 anchors across 10 source-profile groups and 52 source-profile targets |
| Outer guard | checked 1/1 step; attempted 1 update; accepted 1; rejected 0 |
| Profile-scale attempts | 52 attempted; 6 accepted; 46 rejected |
| Coverage outcomes | 2 coverage gains; 16 coverage ties; 34 coverage regressions |
| Coverage-prep outcomes | 2 gain acceptances; 4 preparation acceptances; 46 rejections |
| Branch-stable recovery outcomes | 4 prepared candidates; 12 branch-stability checks; 0 branch-stable recoveries; 4 preparation fallbacks; 12 retry rejections |
| Branch-diversity recovery outcomes | 6 candidates; 9 attempts; 5 branch-score-improving refinements; 1 fallback; 4 rejected attempts |
| Branch-diversity rejection reasons | 1 floor regression; 1 score regression; 2 score ties |
| Accepted profile scales | bridge:owner 1, bridge:place 1, fact:glossary 0.05, fact:owner 0.01, fact:place 0.05, fact:self 0.0025 |
| Accepted branch-diversity outcomes | bridge:owner, bridge:place, fact:glossary, fact:owner, and fact:place improved branch score; fact:self fell back |
| Accepted update shapes | `profile_scale_branch_diversity_recovery_frontier_calibrated_sequential_profile_stabilization`: 1 |
| Branch-context gate | passed across 219/219 semantic records with no ambiguous, colliding, or skipped records |
| Deterministic verifier | passed with no external model |
| Diversity target | failed, 0/9 multi-target profiles passed; coverage preservation passed, but max dominant predicted rate remains 1.0 and minimum target-token coverage remains 0.0 |
| Promotion status | rejected for model promotion; v0.101.0 proves local branch-diversity recovery can improve guarded profile states, but global branch-diverse behavior is still not stable enough to promote |
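
Branch-diversity recovery runs only on a state that has already cleared every earlier gate, so it can afford to be greedy: the first small step that raises branch score without losing coverage wins, and otherwise the safe state is kept. A minimal sketch, assuming hypothetical `diversity_update`, `floor_ok`, `coverage`, and `branch_score` callbacks that are not part of the documented API.

```python
from typing import Callable, List, Tuple, TypeVar

S = TypeVar("S")
BRANCH_DIVERSITY_SCALES = [0.25, 0.05, 0.01]  # the branch-diversity recovery scales listed above


def refine_branch_diversity(
    safe_state: S,
    diversity_update: Callable[[S, float], S],  # hypothetical profile-local update
    floor_ok: Callable[[S], bool],
    coverage: Callable[[S], float],
    branch_score: Callable[[S], float],
    scales: List[float] = BRANCH_DIVERSITY_SCALES,
) -> Tuple[S, str, List[str]]:
    """Try small branch-diversity steps on an already-safe state; fall back if none helps."""
    base_coverage, base_score = coverage(safe_state), branch_score(safe_state)
    rejections: List[str] = []
    for scale in scales:
        candidate = diversity_update(safe_state, scale)
        if not floor_ok(candidate):
            rejections.append("floor_regression")
        elif coverage(candidate) < base_coverage:
            rejections.append("coverage_regression")
        elif branch_score(candidate) > base_score:
            return candidate, "refinement", rejections
        elif branch_score(candidate) == base_score:
            rejections.append("score_tie")
        else:
            rejections.append("score_regression")
    return safe_state, "fallback", rejections


# Toy state: (coverage, branch score). Only the middle scale helps.
def toy_step(state, scale):
    return (state[0], state[1] + (0.1 if scale == 0.05 else -0.1))


result, kind, rejected = refine_branch_diversity(
    (0.5, 0.4), toy_step, lambda s: True, lambda s: s[0], lambda s: s[1]
)
print(kind, rejected)  # refinement ['score_regression']
```
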

Latest collapsed-profile binding frontier profile-scale floor stabilization screen:

| Signal | Value |
| --- | --- |
| Run | `runs/transformer-answer-v0.102.0-baseline-floor-diversity-collapsed-profile-binding-frontier-profile-scale-calibrated-sequential-stabilization-step1-dim4-context80/` |
| Mode | `branch-context-profile-baseline-floor-diversity-branch-stable-coverage-recovery-branch-diversity-collapsed-profile-binding-frontier-profile-scale-calibrated-sequential-profile-stabilization-unlikelihood` |
| Added mechanic | collapsed-profile binding: after a profile update is already safe under the floor, coverage, branch-stability, and branch-diversity recovery gates, try a small profile-local binding update and keep it only if a still-collapsed eval profile improves without coverage regression |
| Unit coverage | focused transformer tests pass; the mode records collapsed-profile binding activation, candidates, attempts, acceptances, fallback acceptances, rejection reasons, target collapsed profiles, profile-diversity deltas, replay-plan activation, and the collapsed-profile binding update shape |
| Search scales | 1, 0.25, 0.05, 0.01, 0.0025, 0.0005, 0.0001; recovery retry scales 1, 0.25, 0.05; branch-diversity recovery scales 0.25, 0.05, 0.01; collapsed-profile binding scales 0.25, 0.05, 0.01 |
| Frontier anchors | 52 anchors across 10 source-profile groups and 52 source-profile targets |
| Outer guard | checked 1/1 step; attempted 1 update; accepted 1; rejected 0 |
| Profile-scale attempts | 54 attempted; 11 accepted; 43 rejected |
| Branch-diversity recovery outcomes | 11 candidates; 26 attempts; 4 branch-score refinements; 7 fallbacks |
| Collapsed-profile binding outcomes | 11 candidates; 31 attempts; 1 binding update; 10 fallbacks; 30 rejected attempts |
| Collapsed-profile binding rejection reasons | 27 collapsed-profile ties; 1 floor regression; 2 score regressions |
| Collapsed profiles | baseline 9/9; final 3/9 remaining: learning, owner, paraphrases |
| Accepted update shapes | `profile_scale_collapsed_profile_binding_frontier_calibrated_sequential_profile_stabilization`: 1 |
| Branch-context gate | passed across 219/219 semantic records with no ambiguous, colliding, or skipped records |
| Deterministic verifier | passed with no external model |
| Diversity target | failed, 0/9 multi-target profiles passed; coverage preservation passed, but max dominant predicted rate remains 1.0 and minimum target-token coverage remains 0.0 |
| Promotion status | rejected for model promotion; v0.102.0 proves a targeted collapsed-profile binding update can survive the guard, but learning, owner, and paraphrase collapse still block a functional transformer responder |
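
The binding gate needs a concrete notion of a "still-collapsed" profile. The sketch below assumes a multi-target profile counts as collapsed when every record receives the same branch prediction, and approximates "a collapsed profile improves" as "the number of collapsed profiles goes down". Both definitions, and the names `is_collapsed`, `collapsed_profiles`, and `binding_decision`, are illustrative assumptions rather than the project's code.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (target token, predicted token)
Profiles = Dict[str, List[Pair]]


def is_collapsed(pairs: List[Pair]) -> bool:
    """Multi-target profile whose records all receive the same prediction."""
    targets = {target for target, _ in pairs}
    predictions = {predicted for _, predicted in pairs}
    return len(targets) > 1 and len(predictions) == 1


def collapsed_profiles(profiles: Profiles) -> List[str]:
    return sorted(name for name, pairs in profiles.items() if is_collapsed(pairs))


def binding_decision(
    before: Profiles, after: Profiles, floor_preserved: bool, coverage_regressed: bool
) -> Tuple[bool, str]:
    if not floor_preserved:
        return False, "floor_regression"
    if coverage_regressed:
        return False, "coverage_regression"
    n_before, n_after = len(collapsed_profiles(before)), len(collapsed_profiles(after))
    if n_after < n_before:
        return True, "binding_update"
    if n_after == n_before:
        return False, "collapsed_profile_tie"
    return False, "collapsed_profile_regression"


before = {"qa": [("a", "u"), ("b", "u")], "owner": [("m", "x"), ("n", "x")]}
after = {"qa": [("a", "a"), ("b", "u")], "owner": [("m", "x"), ("n", "x")]}
print(collapsed_profiles(after), binding_decision(before, after, True, False))
# ['owner'] (True, 'binding_update')
```

Under this reading, most rejected attempts would land in the tie bucket: a small binding step usually shifts logits without breaking a profile out of its single shared prediction.
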

The transformer is not yet promoted as a reliable responder. It is architecture evidence: a from-scratch attention model can update weights on the admitted corpus and leave a checkpoint plus metrics. v0.42 preserves the 37/219 transformer-only candidate result while improving answer-target NLL versus v0.41, but raw greedy completion still fails exact answers with the short wrong completion " te.". The latest v0.43 stacked screen proves that two-layer, top-layer-only direct-answer training can complete and write a checkpoint when the expensive post-direct candidate snapshot is explicitly skipped, but its repeated "a" output shows that the direct decoder still fails. v0.31's no-candidate transformer-guided generator remains useful comparison evidence, but it is not raw transformer decoding.

The branch-profile smoke adds a sharper diagnosis: at the configured branch position, the model selects one global token across prompts instead of separating target-specific answer branches. The branch-collapse repair uses that diagnosis by penalizing the sampled dominant branch token, but the evidence shows it only moves the collapse to a new global token. Branch-batch contrast then trains several distinct target branches in one update; it lowers loss under sparse dosage, but the branch profile still collapses globally and even loses the one initially correct QA branch. `--use-context-mean` then adds a mean-pooled context residual to the final hidden representation, but the bounded screens still collapse the QA branch to one wrong global token, so the next repair needs a stronger prompt-conditioned representation signal than simple prompt averaging. `--use-context-projection` then lets the model learn a zero-initialized projection of that context summary; the projection weights do move during training, but the branch profile still collapses globally. `--use-prompt-attention-summary` makes the summary itself attention-pooled and trainable, but the bounded screens still collapse globally.
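
The `--use-context-mean` and `--use-context-projection` paths can be read as adding a context summary to the last hidden state before the language-model head. The forward-pass sketch below uses plain lists in the spirit of the dependency-free model, but it is an illustration only: `mean_pool`, `matvec`, and `final_hidden` are hypothetical names, and pooling over every context position is an assumption. It also shows why a zero-initialized projection is a safe starting point: at initialization the residual contributes nothing.

```python
from typing import List, Optional

Vector = List[float]


def mean_pool(hiddens: List[Vector]) -> Vector:
    """Average the hidden vectors across positions, one dimension at a time."""
    return [sum(column) / len(hiddens) for column in zip(*hiddens)]


def matvec(matrix: List[Vector], vector: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def final_hidden(hiddens: List[Vector], projection: Optional[List[Vector]] = None) -> Vector:
    """Last-position hidden state plus an (optionally projected) mean-pooled context residual."""
    summary = mean_pool(hiddens)
    if projection is not None:
        # A learned square matrix; starting it at zero makes the residual a no-op at init.
        summary = matvec(projection, summary)
    return [h + s for h, s in zip(hiddens[-1], summary)]


hiddens = [[1.0, 0.0], [0.0, 2.0], [2.0, 1.0]]  # one vector per context position
print(final_hidden(hiddens))  # [3.0, 2.0]
print(final_hidden(hiddens, [[0.0, 0.0], [0.0, 0.0]]))  # [2.0, 1.0]
```

A plain mean adds the same kind of averaged signal to every prompt, which is one way to see why averaging alone may not separate prompts that share most of their characters.
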
The branch-context coverage diagnostic explains why context-16 branch screens were partly underdetermined: QA had only four visible branch contexts for eight records, and those windows mapped to different first target tokens. Context-32 removes the literal QA ambiguity but still truncates semantic prompt features. Context-80 gives every current eval record complete semantic branch-context coverage with no ambiguity. The next repair therefore needs efficient longer-context, prompt-specific discrimination, not just suppression, batching, or a trainable summary of a truncated context. The optional branch-context gate now enforces that distinction for direct-answer screens: unsafe context-16 branch repair can be skipped and recorded, while complete context-80 branch repair is allowed to run. The branch-only snapshot mode keeps those longer-context screens practical by skipping greedy completion evals while still recording the branch diagnostics and gate evidence needed for the next decision. The first dim8 follow-ups show that lower branch loss and complete branch context are still not enough: both repair/contrast and branch-batch contrast collapse QA branch prediction to one global token. A full greedy-eval promotion snapshot is not warranted until a screen improves prompt-specific branch diversity. The branch-diversity target now makes that requirement machine-readable in every direct-answer snapshot. `branch-diversity-unlikelihood` trains directly against the observed collapse token and improves the tiny unit case, but the first corpus smoke only moves the dominant global prediction. Freezing the output bias removes one cheap global escape hatch, but the corpus smoke still rotates to a single dominant branch token. Restricted target-set softmax briefly raises QA predicted diversity to two tokens, then collapses back by the final snapshot. The next repair needs to make diversity stable across prompts, not just rotate or momentarily crack the collapsed token.
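
The branch-context coverage diagnostic can be sketched as grouping records by the character window the model sees just before the first answer character, then flagging windows that map to more than one first target character. The sketch assumes records are hypothetical `(prompt, answer)` pairs and that the window is the last `context_size` characters of the prompt; the real diagnostic may align windows differently, and the two prompts are made up.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def branch_context_report(records: List[Tuple[str, str]], context_size: int) -> Dict[str, int]:
    """Count visible branch contexts and how many map to conflicting first answer characters."""
    first_tokens: Dict[str, Set[str]] = defaultdict(set)
    for prompt, answer in records:
        # The window visible just before emitting the first answer character.
        first_tokens[prompt[-context_size:]].add(answer[0])
    ambiguous = sum(1 for tokens in first_tokens.values() if len(tokens) > 1)
    return {
        "records": len(records),
        "visible_contexts": len(first_tokens),
        "ambiguous_contexts": ambiguous,
    }


records = [
    ("Q: who owns the red key? A:", "mara"),
    ("Q: who owns the blue key? A:", "tovin"),
]
print(branch_context_report(records, 8))
# {'records': 2, 'visible_contexts': 1, 'ambiguous_contexts': 1}
print(branch_context_report(records, 32))
# {'records': 2, 'visible_contexts': 2, 'ambiguous_contexts': 0}
```

With the short window both prompts collapse to the same visible context, so no amount of training can pick the right first character; the longer window removes that ambiguity, which is the distinction the branch-context gate enforces.
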
Best-snapshot restoration can preserve a better measured branch state, but it still ends as a one-token collapse until the underlying representation separates prompts. Prompt-prefix projection gives the model a targeted trainable prompt path and the new parameters move, but the evidence still ends in the same all-"u" branch collapse. Prompt-position projection keeps position-specific prompt access and moves many more parameters, but the branch profile remains collapsed too. Branch-target margin adds pairwise target separation on top of that prompt path and lowers bounded train loss, but the restored branch profile remains the same one-token collapse. Branch-representation contrast exposes that the hidden states themselves remain nearly indistinguishable at the answer branch, so the next repair needs a stronger prompt-conditioned representation path rather than another output-head loss alone. The dim-8 capacity screen increases measured hidden distance, but branch predictions still collapse globally, so width alone is not the missing repair. Prompt-position projection scaling shows the prompt residual can be made louder and the restored hidden-state distance can rise, but the branch prediction still collapses globally. The pre-layer-norm/final-normalization path is now implemented and screened; it cracks full collapse in most multi-target profiles but leaves QA and heldout collapsed. Target-balanced branch batching then regresses to a baseline-restored global "n" collapse, so the next repair should strengthen prompt-to-answer binding for QA and heldout rather than rely on sampler balancing or another unrelated loss term. The branch-rank diagnostic confirms the correct target is usually buried outside the top five predictions, which points the next repair toward output-head prompt binding instead of a simple near-miss margin tweak. 
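
The branch-rank diagnostic reduces to computing where the correct target sits in each branch prediction. A minimal sketch, assuming each row is a hypothetical dictionary of next-character logits plus the target character; `target_rank` and `rank_summary` are illustrative names. Ties are resolved in the target's favor by counting only strictly higher logits.

```python
from typing import Dict, List, Tuple


def target_rank(logits: Dict[str, float], target: str) -> int:
    """1-based rank of the target token; ties are resolved in the target's favor."""
    target_logit = logits[target]
    return 1 + sum(1 for value in logits.values() if value > target_logit)


def rank_summary(rows: List[Tuple[Dict[str, float], str]], k: int = 5) -> Tuple[float, float]:
    """Mean target rank and the share of rows whose target lands in the top k."""
    ranks = [target_rank(logits, target) for logits, target in rows]
    return sum(ranks) / len(ranks), sum(rank <= k for rank in ranks) / len(ranks)


logits = {"u": 3.0, "n": 2.0, "e": 1.0, "t": 0.5, "a": 0.4, "o": 0.3, "m": 0.1}
print(rank_summary([(logits, "u"), (logits, "m")]))  # (4.0, 0.5)
```

Reporting rank alongside top-1 accuracy separates "the right answer is a near miss" from "the right answer is buried", which is the distinction the diagnostic uses to rule out a simple margin tweak.
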
The first output-binding repair combines that target-set pressure with representation contrast and improves average target rank/top-5 evidence, but it still fails target-token coverage and collapses to wrong branch tokens. The next repair needs to promote the correct target into the top branch set, not only move it upward while the wrong tokens remain on top. Hard rank-margin repair is the first screen to make that movement clear: it lifts correct targets into the top five more often and improves target-token coverage, but it still leaves a single global wrong prediction. The next repair needs to convert rank lift into prompt-specific top-1 branch choices. Target-balanced rank-margin adds some wrong-token diversity and better QA top-3 coverage, but it still does not make correct target tokens win the branch. The top-one hard-negative screen then regresses rank and top-k coverage, so the next repair should not simply concentrate more pressure on the current top wrong token. It needs a prompt-conditioned mechanism that selects among near-tied branch candidates.
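
This section does not give the exact hard rank-margin objective, so the sketch below shows one common form as an assumption: a hinge that asks the target logit to beat each of the `k` highest-scoring wrong tokens by at least `margin`, averaged over those tokens. Setting `k=1` roughly corresponds to concentrating pressure on the current top wrong token, as in the top-one hard-negative screen. `hard_rank_margin_loss` is a hypothetical name and this is forward-only; the project's scalar autodiff would supply gradients.

```python
from typing import Dict


def hard_rank_margin_loss(
    logits: Dict[str, float], target: str, margin: float = 1.0, k: int = 5
) -> float:
    """Mean hinge penalty against the k highest-scoring wrong tokens."""
    target_logit = logits[target]
    hardest = sorted((v for tok, v in logits.items() if tok != target), reverse=True)[:k]
    # Each wrong token should trail the target by at least `margin`.
    return sum(max(0.0, margin - (target_logit - v)) for v in hardest) / len(hardest)


logits = {"a": 2.0, "u": 3.0, "b": 0.5}
print(hard_rank_margin_loss(logits, "a", k=2))  # 1.0
print(hard_rank_margin_loss(logits, "a", k=1))  # 2.0
```

Only wrong tokens inside the margin contribute, so the loss pushes the target past its closest competitors without spending effort on tokens that are already far behind.
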

The v0.66 open-source mechanics audit reframes the current blocker as a trainer-mechanics problem rather than a need for another global branch loss. v0.67 implements the first profile-aware replay-plan surface: branch records carry source/profile keys, deficits and preservation are computed per profile, and the plan is written as a run artifact before training. v0.68 proves that this constraint does useful work: profile-aware training moved correct targets upward in the ranked list, but only by collapsing target-token coverage and branch diversity, so the snapshot gate restored the baseline. The next trainer change needs anti-collapse preservation inside the profile-aware plan. v0.81 implements that change as a profile target-share objective mechanic. v0.82 screens it and rejects the trained snapshots because rank lift still comes from branch collapse. v0.83 adds prompt-specific sibling-target ownership margins and proves the focused mechanic, but the screen still restores step 0 because trained snapshots lose target-token coverage. v0.84 anchors replay preservation to baseline predictions and improves trained coverage relative to v0.83, but still restores step 0 because its snapshots miss the full coverage floor. v0.85 adds a baseline-floor update guard that preserves the floor by rejecting every unsafe update it attempts. The next four screens keep that guard, vary the repair, and still reject every attempt: v0.86 retries the rejected updates at four smaller scales, v0.87 adds one baseline-covered repair after each failed retry, v0.88 moves floor anchors into the objective, and v0.89 removes branch pressure from floor stabilization. v0.90 records the rejected profile floors directly, showing that heldout violates every attempt and that the worst deficit is 0.25 on learning. v0.91 covers the full profile-target floor surface and still rejects every attempt. v0.92 changes the repair shape to sequential source-profile batches and still rejects every profile-local attempt.
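
The baseline-floor guard and the v0.90 floor records can be sketched together: compute each profile's shortfall below its baseline, reject the update if any profile falls short, and report the worst deficit. The sketch assumes the floor is a per-profile baseline target-token coverage value; `floor_deficits` and `floor_guard` are hypothetical names, and the numbers are toy values, not screen results.

```python
from typing import Dict, Tuple


def floor_deficits(baseline: Dict[str, float], candidate: Dict[str, float]) -> Dict[str, float]:
    """Per-profile shortfall below baseline coverage; 0.0 means that profile's floor holds."""
    return {p: max(0.0, floor - candidate.get(p, 0.0)) for p, floor in baseline.items()}


def floor_guard(
    baseline: Dict[str, float], candidate: Dict[str, float]
) -> Tuple[bool, Dict[str, float], Tuple[str, float]]:
    """Accept only if no profile falls below its floor; also report the worst deficit."""
    deficits = floor_deficits(baseline, candidate)
    violated = {p: d for p, d in deficits.items() if d > 0}
    worst = max(deficits.items(), key=lambda item: item[1])
    return not violated, violated, worst


baseline = {"learning": 1.0, "heldout": 0.5, "qa": 0.5}
candidate = {"learning": 0.5, "heldout": 0.25, "qa": 0.75}
print(floor_guard(baseline, candidate))
# (False, {'learning': 0.5, 'heldout': 0.25}, ('learning', 0.5))
```

A gain on one profile (here `qa`) cannot offset a loss on another, which is why an all-or-nothing guard rejects every attempt until the repair becomes profile-local and small enough.
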
v0.93 calibrates that movement below 0.01 and accepts one source-profile update at scale 0.0025. v0.94 adds profile-scale memory and accepts eight source-profile updates. v0.95 adds diversity-aware profile-scale acceptance, preserving five score-improving source-profile updates and rejecting eleven floor-preserving score regressions. v0.96 adds frontier target anchors, preserving nine score-improving source-profile updates while lowering the max dominant predicted rate to 0.9. v0.97 adds coverage-frontier acceptance and shows that strict monotonic coverage gating is auditable but too conservative: it accepts only one coverage-gaining source-profile update. v0.98 adds coverage-prep acceptance, restores nine source-profile updates, and separates three coverage gains from six safe setup moves. v0.99 adds coverage-recovery retry, converts two prepared candidates into direct coverage recoveries, and preserves four preparation fallbacks. v0.100.0 adds branch-stable recovery acceptance, keeps those two recoveries, and records one branch-score regression rejection. v0.101.0 adds branch-diversity recovery, accepts five local branch-score refinements, and falls back once. v0.102.0 adds collapsed-profile binding, accepts one targeted binding update, and narrows the final collapse from nine eval profiles to three. The next repair should target learning, owner, and paraphrases without weakening the coverage floor.