Current Evidence

The current promoted run is runs/self-improve-v0.42/.

Older milestone evidence that is no longer part of the current-state table is preserved in the Historical evidence archive.

Signal	Current value
Product name	`QuarkLM`
Package / repository slug	`quark-lm`
Docs host	`Read the Docs`
Marketing host	`GitHub Pages`
RC spec	`RC_SPEC.md`
RC gap audit	`RC_GAP_AUDIT.md`
RC checklist	`RC_CHECKLIST.md`
Recommended RC track	`Research Prototype RC`
Research Prototype RC status	`near: the closed-world self-improvement system is reproducible, auditable, documented, and honest about unpromoted transformer evidence.`
Language Model RC status	`not ready: the from-scratch transformer still fails branch_diversity_target after v0.115.`
Next model bundle	`Profile-Balanced Routing Repair with representation-separation acceptance checks.`
Research grounding	`sites/docs/docs/learn/research-grounding.md`
Research reviewed	`2026-06-15`
Research pass	`v0.71-v0.115 implementation evidence; v0.115 adds a bias-frozen hidden-projection margin candidate with exact retrieval memory and rejected neural promotion.`
Research decision	`QuarkLM should model self-improvement as a closed-loop lifecycle with ledgered admission, verified selection, auditable weight optimization, separate inference rails, and promotion gates that can reject regressions.`
Research next step	`Use the v0.115 hidden-projection candidate evidence to design a broader guarded routing repair that can lift coverage without collapsing profiles.`
Open-source mechanics audit	`MECHANICS_AUDIT.md and sites/docs/docs/learn/open-source-mechanics-audit.md added as the v0.66 deeper comparison of open-source LLM, tokenizer, continual-learning, transparency, and self-improvement mechanics.`
Mechanics audit decision	`The next bottleneck is trainer mechanics, not another global branch-loss term: direct-answer replay should be profile-aware, artifacted, coverage-constrained, and tested for profile isolation before the next full-stack repair run.`
Mechanics audit next step	`Improve target-token diversity for remaining memory-backed failures using v0.115 hidden-projection candidate evidence; do not count retrieval as weight learning.`
Forward research plan	`FORWARD_RESEARCH_PLAN.md and sites/docs/docs/learn/forward-research-plan.md added as the v0.69 cross-referenced implementation strategy.`
Forward plan decision	`Pause direct-answer objective churn until QuarkLM has the self-improvement operating system needed to decide which training changes are legitimate: experiment registry, replay extraction, corpus governance, candidate quarantine, closed-world verifier checks, recipe boundaries, and constraint-first promotion gates.`
Forward plan next step	`After v0.115, expand guarded hidden-projection repair only if it improves target coverage under the branch-diversity gate.`
Deep research review	`DEEP_RESEARCH_REVIEW.md and sites/docs/docs/learn/deep-research-review.md added as the v0.70 deeper cross-referenced research and implementation-gap review.`
Deep review decision	`No larger transformer repair screen should run until experiment intent, corpus plans, replay plans, verifier checks, recipes, and constraint-first promotion are explicit artifacts.`
Deep review next step	`After v0.115, tie the target-routing gap to a guarded hidden-projection and prompt-representation repair surface that can survive promotion constraints.`
Research implementation map	`RESEARCH_IMPLEMENTATION_MAP.md and sites/docs/docs/learn/research-implementation-map.md added as the v0.74 source-to-gap-to-version implementation map.`
Implementation map decision	`Deep cross-referenced research and open-source mechanics review are now a required implementation control: each next mechanic should cite its research pattern, name the closed-world boundary it protects, and produce acceptance evidence.`
Implementation map next step	`The mechanics from candidate quarantine through hidden-projection margin repair are implemented and screened through v0.115.0; broader guarded routing repair is next.`
Experiment registry	`src/closed_world_lm/experiment_registry.py and sites/docs/docs/operate/experiment-registry.md added as the v0.71 run-intent implementation.`
Experiment registry decision	`Self-improvement and transformer answer-training runs now declare hypothesis, allowed data, planned artifacts, recipe id, acceptance gates, failure criteria, notes, and final decision before their outputs are trusted as evidence.`
Experiment registry next step	`Use the registry as the required evidence wrapper for replay, corpus, verifier, recipe, and promotion-gate mechanics.`
Replay planning	`src/closed_world_lm/replay_plan.py added as the v0.72 standalone replay-planning module.`
Replay planning decision	`Transformer training still uses the existing profile-aware replay behavior, but replay record normalization, profile grouping, coverage floors, missing-target summaries, and JSON-safe plan shape now live outside the transformer monolith.`
Replay planning next step	`Use standalone replay planning as input to corpus hygiene, candidate quarantine, verifier, recipe, and promotion-gate reports.`
Corpus hygiene	`src/closed_world_lm/corpus_hygiene.py and sites/docs/docs/operate/corpus-hygiene.md added as the v0.73 corpus hygiene and training-plan artifact implementation.`
Corpus hygiene decision	`Self-improvement and transformer answer-training runs now write corpus_hygiene.json and training_plan.json with source mixtures, duplicate checks, train/eval prompt overlap, candidate ratios, rare-profile coverage, allowed data sources, planned artifacts, and replay-plan summaries when available.`
Corpus hygiene next step	`Use candidate ratios, quarantine summaries, overlap evidence, verifier summaries, recipe summaries, transformer responsibility surfaces, checkpoint metadata surfaces, and eval surfaces as inputs to objective-repair work.`
Candidate quarantine	`src/closed_world_lm/candidate_quarantine.py and sites/docs/docs/operate/candidate-quarantine.md added as the v0.75 candidate lifecycle implementation.`
Candidate quarantine decision	`Self-improvement and transformer answer-training runs now write candidate_quarantine.json, and training_plan.json records that candidate records are not training data until admitted into the ledgered corpus and converted into curriculum lessons.`
Candidate quarantine next step	`Use the candidate quarantine manifest as input to deterministic verifier checks, recipe artifacts, and future promotion gates.`
Closed-world verifier	`src/closed_world_lm/closed_world_verifier.py and sites/docs/docs/operate/closed-world-verifier.md added as the v0.76 deterministic verifier implementation.`
Verifier decision	`Self-improvement and transformer answer-training runs now write closed_world_verifier.json, embed verifier summaries in training_plan.json, and require verifier approval as a run-intent gate without using an external model.`
Verifier next step	`Use verifier evidence as an input to recipe objects, constraint-first promotion gates, transformer responsibility surfaces, model/checkpoint metadata, eval surfaces, and objective-repair work.`
Training recipe	`src/closed_world_lm/training_recipe.py and sites/docs/docs/operate/training-recipes.md added as the v0.77 recipe and constraint-first promotion implementation.`
Training recipe decision	`Self-improvement and transformer answer-training runs now write training_recipe.json and constraint_first_promotion.json. Transformer decisions cannot promote from loss, NLL, rank, top-k, or exact quality evidence unless closed-world constraints pass first.`
Training recipe next step	`Use recipe and constraint-first artifacts as the surfaces for transformer objective-repair work.`
Transformer responsibility	`src/closed_world_lm/transformer_experiment.py, src/closed_world_lm/transformer_training.py, src/closed_world_lm/transformer_objectives.py, and sites/docs/docs/build/transformer-responsibilities.md added as the v0.78 transformer responsibility implementation.`
Transformer responsibility decision	`Transformer answer-training now keeps artifact contracts, experiment intent, recipe creation, promotion decisions, JSONL snapshot writing, shuffled training cursors, loss averaging, and the direct-answer objective catalog behind narrow tested surfaces while preserving the public CLI.`
Transformer responsibility next step	`Use the v0.78 responsibility surfaces together with the v0.115.0 hidden-projection candidate evidence before broader routing repair.`
Transformer model surface	`src/closed_world_lm/transformer_model.py and tests/test_transformer_model.py added as the v0.79 transformer model/config and checkpoint metadata implementation.`
Transformer model decision	`Transformer config, optimizer config, generation config, validation, checkpoint architecture, checkpoint format, tokenizer identity, closed-world dataset metadata, arg-to-config adapters, and run metadata now live outside transformer_char_model.py while remaining re-exported for compatibility.`
Transformer model next step	`Use model/checkpoint metadata surfaces with the v0.115.0 hidden-projection candidate evidence before broader routing repair.`
Transformer eval surface	`src/closed_world_lm/transformer_checkpoint.py, src/closed_world_lm/transformer_eval.py, tests/test_transformer_checkpoint.py, and tests/test_transformer_eval.py added as the v0.80 transformer eval/checkpoint-load implementation.`
Transformer eval decision	`Checkpoint payload loading and identity validation, checkpoint summaries, probe loading, candidate collection, generic transformer scoring, eval report assembly, samples JSONL writing, and eval JSON writing now live outside transformer_char_model.py while preserving CLI behavior and artifact shapes.`
Transformer eval next step	`v0.115.0 uses eval and promotion surfaces to screen a bias-frozen hidden-projection margin candidate; branch_diversity_target still blocks promotion.`
Latest repository version	`v0.115.0`
Latest version summary	`bias-frozen hidden-projection margin candidate evidence`
Current version	`v0.42`
Admitted facts	`12`
Direct admission probes	`48/48`
Admission paraphrase probes	`84/84`
Glossary probes	`38/38`
QA exact	`8/8`
Admissions exact	`48/48`
Admission paraphrases exact	`84/84`
Glossary exact	`38/38`
Self exact	`7/7`
Learning exact	`4/4`
Forgetting audit	`passed`
Prompt leakage audit	`passed`
Exact eval audit	`passed`
Promotion gate	`passed`
Self-diagnosis	`passed`
Self-diagnosis external model	`false`
Self-diagnosis recommendation	`promote_or_expand_corpus`
Attempt archive	`enabled`
Transformer run	`runs/transformer-answer-v0.42-branch-repair-contrast50-dim8-context32/`
Transformer validation NLL	`answer target NLL 3.5850 -> 2.4129`
Transformer exact	`0/219 -> 0/219 direct greedy`
Transformer candidate accuracy	`15/219 -> 37/219 eval-scoped`
Direct transformer exact	`0/219 -> 0/219 direct greedy`
Direct transformer loss	`3.4278 -> 2.2708`
Direct transformer mode	`periodic-branch-repair-contrast-unlikelihood`
Direct transformer failure pattern	`short wrong ' te.' greedy completion after wider sparse branch contrast`
Latest transformer screen	`runs/transformer-answer-v0.115.0-hidden-projection-margin-candidate-step1-dim4-context80/`
Latest screen direct loss	`4.9050 one-step hidden-projection margin candidate screen`
Latest screen direct exact	`branch-only screen; direct greedy eval skipped`
Latest screen post-direct candidate snapshot skipped	`true`
Latest retrieval memory report	`runs/transformer-answer-v0.115.0-hidden-projection-margin-candidate-step1-dim4-context80/retrieval_memory_report.json`
Retrieval memory artifact	`retrieval_memory_report.json is now a transformer answer-training artifact declared in experiment intent and training plans.`
Retrieval memory summary	`497 corpus-only memory cards; 219/219 exact retrieval evals; no external model, embeddings, pretrained retriever, or weight updates.`
Retrieval memory status	`memory-first evidence remains exact in v0.115.0 and is consumed only as source-plan evidence, not neural promotion.`
Latest memory consolidation plan	`runs/transformer-answer-v0.115.0-hidden-projection-margin-candidate-step1-dim4-context80/memory_consolidation_plan.json`
Memory consolidation summary	`v0.115 keeps retrieval exact, screens hidden-projection margin repair with output bias frozen, and still rejects neural promotion on branch_diversity_target.`
Memory consolidation status	`logit-prior representation evidence: 8 missing-token candidates, 24 attempts, 0 direct missing-token acceptances, 24 rejections, 8 fallbacks, 1 accepted profile-specific update shape, no external model, embeddings, or pretrained retriever; branch_diversity_target still blocks promotion with critical target_routing_gap, high output-bias escape risk, low representation separation across 9/9 profiles, and hidden-projection pressure across 9/9 multi-target profiles.`
Latest transformer diagnostic run	`runs/transformer-answer-v0.43-branch-profile-smoke-dim4-context16/`
Latest transformer diagnostic	`direct-answer branch profiles from model logits`
Latest diagnostic QA branch accuracy	`1/8 -> 1/8`
Latest diagnostic dominant prediction	`all 'o' -> all 'y'`
Latest transformer repair run	`runs/transformer-answer-v0.43-periodic-branch-batch-smoke-dim4-context16/`
Latest transformer repair mode	`periodic-branch-batch-contrast-unlikelihood`
Latest transformer repair status	`rejected: loss improved but prompt-independent branch collapse worsened`
Latest representation screen	`runs/transformer-answer-v0.43-prompt-attention-branch-repair-smoke-dim4-context16/`
Latest representation option	`--use-prompt-attention-summary`
Latest representation status	`rejected: prompt-attention summary projection moved and lowered loss, but QA branch collapse still worsened`
Latest branch-context diagnostic	`runs/transformer-answer-v0.43-branch-context-coverage-smoke-dim4-context80/`
Branch context 16 QA	`0/8 semantic covered; 4 ambiguous QA branch contexts`
Branch context 32 QA	`0/8 semantic covered; 0 ambiguous QA branch contexts`
Branch context 80 all evals	`219/219 semantic covered; 0 ambiguous branch contexts`
Latest branch-context gate run	`runs/transformer-answer-v0.43-branch-context-gate-smoke-dim4-context80/`
Branch context gate at 16	`required gate failed; requested 5 direct steps, ran 0`
Branch context gate at 80	`required gate passed; requested 1 direct step, ran 1`
Latest branch-only screen	`runs/transformer-answer-v0.43-branch-context-gated-branchonly-smoke-dim4-context80/`
Branch-only gate	`passed; requested 5 direct steps, ran 5`
Branch-only eval skipping	`direct greedy evals skipped in JSONL snapshots; branch profiles and branch-context gate retained`
Latest branch-only repair screen	`runs/transformer-answer-v0.43-branchonly-periodic-repair-contrast50-dim8-context80/`
Branch-only repair status	`rejected screen: gate passed and 100/100 direct steps ran, but QA branch prediction collapsed from all ' ' (space) to all 'a'`
Latest branch-only batch screen	`runs/transformer-answer-v0.43-branchonly-branch-batch-dim8-context80/`
Branch-only batch status	`rejected screen: gate passed and 50/50 direct steps ran, but QA branch prediction still collapsed to all 'a'`
Latest branch-diversity target run	`runs/transformer-answer-v0.43-branch-diversity-target-smoke-dim4-context80/`
Branch-diversity target	`direct-answer snapshots include branch_diversity_target over multi-target eval profiles`
Branch-diversity smoke	`context gate passed; diversity target failed 0/9 multi-target profiles; QA target_unique 8, predicted_unique 1, dominant 'r' rate 1.0`
Latest branch-diversity training run	`runs/transformer-answer-v0.43-branch-diversity-train-smoke-dim4-context80/`
Branch-diversity training mode	`branch-diversity-unlikelihood`
Branch-diversity training status	`rejected smoke: gate passed and 10/10 direct steps ran, but diversity target still failed 0/9 multi-target profiles`
Latest branch-diversity freeze-bias run	`runs/transformer-answer-v0.43-branch-diversity-freezebias-smoke-dim4-context80/`
Branch-diversity freeze-bias mode	`branch-diversity-unlikelihood with --direct-answer-freeze-output-bias`
Branch-diversity freeze-bias status	`rejected stabilizer: gate passed and 50/50 direct steps ran with output bias frozen, but diversity target still failed 0/9 multi-target profiles`
Latest branch-target softmax run	`runs/transformer-answer-v0.43-branch-target-softmax-freezebias-smoke-dim4-context80/`
Branch-target softmax mode	`branch-target-softmax-unlikelihood with --direct-answer-freeze-output-bias`
Branch-target softmax status	`rejected target-set screen: gate passed and 50/50 direct steps ran, composite train loss moved 5.6671 -> 5.5820, but diversity target still failed 0/9 multi-target profiles`
Latest branch restore run	`runs/transformer-answer-v0.43-branch-target-softmax-restorebest-smoke-dim4-context80/`
Branch restore mode	`branch-target-softmax-unlikelihood with --direct-answer-restore-best-branch-snapshot`
Branch restore status	`rejected guardrail: restored best aggregate branch snapshot from step 40 after 50/50 direct steps, but diversity target still failed 0/9 multi-target profiles`
Latest prompt-prefix projection run	`runs/transformer-answer-v0.43-prompt-prefix-target-softmax-restorebest-smoke-dim4-context80/`
Prompt-prefix projection option	`--use-prompt-prefix-projection`
Prompt-prefix projection status	`rejected representation screen: all 20 prompt-prefix projection parameters moved and loss improved 5.6649 -> 5.5679, but diversity target still failed 0/9 multi-target profiles`
Latest prompt-position projection run	`runs/transformer-answer-v0.43-prompt-position-target-softmax-restorebest-smoke-dim4-context80/`
Prompt-position projection option	`--use-prompt-position-projection`
Prompt-position projection status	`rejected representation screen: 1108/1284 prompt-position projection parameters moved and loss improved 5.6649 -> 5.5679, but diversity target still failed 0/9 multi-target profiles`
Latest branch-target margin run	`runs/transformer-answer-v0.43-branch-target-margin-prompt-position-smoke-dim4-context80/`
Branch-target margin mode	`branch-target-margin-unlikelihood with --use-prompt-position-projection`
Branch-target margin status	`rejected target-margin screen: gate passed and 50/50 direct steps ran, train loss moved 4.8973 -> 4.7784, but diversity target still failed 0/9 multi-target profiles`
Latest branch-representation contrast run	`runs/transformer-answer-v0.43-branch-representation-contrast50-prompt-position-smoke-dim4-context80/`
Branch-representation contrast mode	`branch-representation-contrast-unlikelihood with --direct-answer-contrast-weight 50.0`
Branch-representation contrast status	`rejected representation-contrast screen: direct snapshots now record hidden-distance profiles, but high-weight contrast still failed diversity target 0/9 multi-target profiles`
Latest branch-representation capacity run	`runs/transformer-answer-v0.43-branch-representation-contrast50-prompt-position-smoke-dim8-context80-steps40/`
Branch-representation capacity mode	`dim8 branch-representation-contrast-unlikelihood with --direct-answer-contrast-weight 50.0`
Branch-representation capacity status	`rejected capacity screen: 40/40 direct steps ran after the 50-step dim8 screen proved too slow; hidden distance increased, but diversity target still failed 0/9 multi-target profiles`
Latest prompt-position scale run	`runs/transformer-answer-v0.43-prompt-position-scale32-repcontrast50-smoke-dim4-context80/`
Prompt-position scale mode	`branch-representation-contrast-unlikelihood with --prompt-position-projection-scale 32.0`
Prompt-position scale status	`rejected prompt-signal scale screen: 50/50 direct steps ran, 1108/1284 prompt-position projection parameters moved, hidden distance increased, but diversity target still failed 0/9 multi-target profiles`
Transformer structure audit	`STRUCTURE_AUDIT.md now gates the next transformer repair: study open-source model/trainer/tokenizer/checkpoint structure without importing external weights, tokenizers, embeddings, datasets, or training text`
Transformer structure decision	`implemented and screened an opt-in pre-layer-norm transformer block path with final normalization; target-balanced branch sampling was rejected, so the next target is prompt-to-answer binding for QA and heldout`
Latest pre-layer-norm run	`runs/transformer-answer-v0.44-prelayernorm-repcontrast50-prompt-position-smoke-dim4-context80/`
Pre-layer-norm mode	`branch-representation-contrast-unlikelihood with --use-pre-layer-norm and --use-prompt-position-projection`
Pre-layer-norm status	`partial structural evidence: 50/50 direct steps ran, 1108/1284 prompt-position parameters and all 8 final-norm parameters moved, but diversity target still failed 0/9 multi-target profiles`
Latest target-balanced run	`runs/transformer-answer-v0.44-target-balanced-prelayernorm-repcontrast50-prompt-position-smoke-dim4-context80/`
Target-balanced mode	`branch-balanced-representation-contrast-unlikelihood with --use-pre-layer-norm and target-bucket branch batches`
Target-balanced status	`rejected sampler evidence: 50/50 direct steps ran, but best-snapshot restoration returned to step 0 and all 9/9 multi-target profiles collapsed to global 'n'`
Latest branch-rank diagnostic run	`runs/transformer-answer-v0.45-branch-rank-diagnostic-smoke-dim4-context80/`
Branch-rank diagnostic	`direct-answer branch profiles include average target rank, top-3/top-5 target coverage, and failed-record top predictions`
Branch-rank QA	`final QA collapsed to all 'n' with average target rank 14.25 and top-3/top-5 target coverage 0.125`
Branch-rank heldout	`final heldout collapsed to all 'n' with average target rank 14.25 and top-3/top-5 target coverage 0.125`
Branch-rank status	`diagnostic evidence: correct branch targets are usually buried, so the next repair should improve prompt-to-answer output binding`
Latest output-binding run	`runs/transformer-answer-v0.46-output-binding-rankscore-smoke-dim4-context80/`
Output-binding mode	`branch-output-binding-unlikelihood with rank-aware best-snapshot scoring and frozen output bias`
Output-binding QA	`QA average target rank improved 17.375 -> 14.125 and top-5 coverage reached 0.25, but target-token coverage stayed 0.0 and top-3 coverage ended 0.0`
Output-binding heldout	`heldout average target rank improved 17.25 -> 14.375 and top-5 coverage reached 0.25, but target-token coverage stayed 0.0 and top-3 coverage ended 0.0`
Output-binding status	`rejected repair evidence: output binding cracked wrong-token diversity but still collapsed QA and heldout to wrong branch tokens`
Latest rank-margin run	`runs/transformer-answer-v0.47-rank-margin-steps50-smoke-dim4-context80/`
Rank-margin mode	`branch-rank-margin-unlikelihood against top wrong branch tokens with frozen output bias`
Rank-margin QA	`QA average target rank improved 17.375 -> 9.0, target-token coverage reached 0.125, top-3 coverage reached 0.25, and top-5 coverage reached 0.5`
Rank-margin heldout	`heldout average target rank improved 17.25 -> 9.0, target-token coverage reached 0.125, top-3 coverage reached 0.25, and top-5 coverage reached 0.375`
Rank-margin status	`strongest rank-lift evidence so far, but rejected for promotion because predicted diversity stayed 1/8 and branches still collapsed to wrong 'n'`
Latest balanced rank-margin run	`runs/transformer-answer-v0.48-balanced-rank-margin-smoke-dim4-context80/`
Balanced rank-margin mode	`branch-balanced-rank-margin-unlikelihood with target-balanced branch batches and top wrong-token margins`
Balanced rank-margin QA	`QA predicted diversity reached 2/8, target-token coverage stayed 0.125, average target rank reached 9.375, top-3 reached 0.375, and top-5 reached 0.5`
Balanced rank-margin heldout	`heldout predicted diversity reached 2/8, target-token coverage stayed 0.125, average target rank reached 9.625, top-3 reached 0.25, and top-5 reached 0.5`
Balanced rank-margin status	`rejected evidence: target-balanced rank margin improves wrong-token diversity and top-3/top-5 coverage, but top-1 branch choices are still wrong`
Latest top-one rank-margin run	`runs/transformer-answer-v0.49-balanced-rank-margin-top1-smoke-dim4-context80/`
Top-one rank-margin mode	`branch-balanced-rank-margin-unlikelihood with one top wrong token`
Top-one rank-margin QA	`QA target-token coverage stayed 0.125, but average target rank regressed to 12.5, top-3 fell to 0.125, and top-5 fell to 0.25`
Top-one rank-margin heldout	`heldout target-token coverage stayed 0.125, but average target rank regressed to 12.375, top-3 fell to 0.125, and top-5 fell to 0.25`
Top-one rank-margin status	`rejected evidence: concentrating on one current top wrong token regressed rank/top-k evidence instead of converting targets into top-1 choices`
Latest top-k softmax run	`runs/transformer-answer-v0.50-balanced-topk-softmax-w5-smoke-dim4-context80/`
Top-k softmax mode	`branch-balanced-topk-softmax-unlikelihood with target-balanced branch batches and restricted target-vs-top-wrong-token softmax`
Top-k softmax QA	`QA target-token coverage stayed 0.125, average target rank improved to 8.75, top-3 reached 0.375, and top-5 reached 0.5`
Top-k softmax heldout	`heldout target-token coverage stayed 0.125, average target rank improved to 8.75, top-3 reached 0.375, and top-5 reached 0.5`
Top-k softmax status	`rejected evidence: top-k softmax recovers rank/top-k evidence after v0.49 but still leaves QA and heldout collapsed to wrong 'u' top-1 branch choices`
Latest foundation-stack run	`runs/transformer-v0.51-foundation-stack-smoke/`
Foundation-stack mode	`full mechanics stack: AdamW/SGD state, scheduling, accumulation, resume validation, multi-head/RMSNorm/gated/tied/rotary architecture options, generation traces, and replayable eval samples`
Foundation-stack smoke	`2/2 language-model steps completed with AdamW, attention_heads 2, RMSNorm, gated MLP, tied output embeddings, rotary positions, and cache-aware generation metadata`
Foundation-stack artifacts	`quarklm-transformer-v2 checkpoint, optimizer_state.json, eval.json, and eval_samples.jsonl`
Foundation-stack status	`mechanics-readiness evidence only; not a promoted responder or direct-answer repair run`
Latest full-stack top-k run	`runs/transformer-answer-v0.52-fullstack-topk-softmax-smoke-dim4-context80/`
Full-stack top-k mode	`branch-balanced-topk-softmax-unlikelihood under the full v0.51 stack`
Full-stack top-k QA	`restored step 0; QA predicted diversity 3/8, target-token coverage 0.25, average target rank 13.25, top-3 0.25, top-5 0.375`
Full-stack top-k heldout	`restored step 0; heldout predicted diversity 3/8, target-token coverage 0.25, average target rank 13.375, top-3 0.25, top-5 0.375`
Full-stack top-k status	`rejected evidence: full-stack baseline improves diversity, but unchanged top-k pressure collapses training to one wrong token; next repair should bind prompt contexts to target tokens`
Latest bidirectional binding run	`runs/transformer-answer-v0.53-fullstack-bidir-binding-smoke-dim4-context80/`
Bidirectional binding mode	`branch-balanced-bidirectional-binding-unlikelihood under the full v0.51 stack`
Bidirectional binding unit test	`focused transformer tests pass; context-ownership regression verifies target tokens gain probability mass on their own prompt contexts`
Bidirectional binding QA	`restored step 40; QA predicted diversity 2/8, dominant wrong 'a', target-token coverage 0.125, average target rank 7.875, top-3 0.25, top-5 0.5`
Bidirectional binding heldout	`restored step 40; heldout predicted diversity 2/8, dominant wrong 'a', target-token coverage 0.125, average target rank 9.0, top-3 0.25, top-5 0.375`
Bidirectional binding history	`training step 50 briefly reached QA target-token coverage 0.25 with average target rank 8.375 before best-snapshot restore selected the rank-focused step 40 checkpoint`
Bidirectional binding status	`partial progress, rejected for promotion: bidirectional binding improves rank pressure under the full stack, but target coverage is not preserved and diversity target still fails 0/9 multi-target profiles`
Latest coverage binding run	`runs/transformer-answer-v0.54-fullstack-coverage-binding-smoke-dim4-context80/`
Coverage binding mode	`branch-balanced-coverage-binding-unlikelihood under the full v0.51 stack`
Coverage binding unit test	`focused transformer tests pass; hard-wrong-token coverage regression verifies target-set mass and exact target probability improve in the restricted candidate set`
Coverage binding QA	`restored step 0; QA predicted diversity 3/8, target-token coverage 0.25, average target rank 13.25, top-3 0.25, top-5 0.375`
Coverage binding heldout	`restored step 0; heldout predicted diversity 3/8, target-token coverage 0.25, average target rank 13.375, top-3 0.25, top-5 0.375`
Coverage binding history	`training step 50 improved QA average target rank to 8.125, but target-token coverage collapsed to 0.0 with one wrong 'a' top-1 branch token`
Coverage binding status	`rejected evidence: best-snapshot scoring restored the baseline because bundled hard-negative coverage binding traded away target coverage for rank; next repair should preserve target-set coverage before exact-target sharpening`
Latest target-set coverage run	`runs/transformer-answer-v0.55-fullstack-target-set-coverage-smoke-dim4-context80/`
Target-set coverage mode	`branch-balanced-target-set-coverage-unlikelihood under the full v0.51 stack with positive target CE disabled`
Target-set coverage unit test	`focused transformer tests pass; target-set-only coverage regression verifies target-set mass improves against hard wrong tokens without requiring exact-target sharpening`
Target-set coverage QA	`restored step 0; QA predicted diversity 3/8, target-token coverage 0.25, average target rank 13.25, top-3 0.25, top-5 0.375`
Target-set coverage heldout	`restored step 0; heldout predicted diversity 3/8, target-token coverage 0.25, average target rank 13.375, top-3 0.25, top-5 0.375`
Target-set coverage history	`training step 50 improved QA average target rank to 10.0, but target-token coverage collapsed to 0.0 with one wrong 'a' top-1 branch token`
Target-set coverage status	`rejected evidence: batch-local target-set mass still trades away eval target-token coverage; next repair should add explicit anti-collapse pressure over predicted target tokens`
Latest target-diversity run	`runs/transformer-answer-v0.57-fullstack-target-diversity-smoke-dim4-context80/`
Target-diversity mode	`branch-balanced-target-diversity-unlikelihood under the full v0.51 stack with positive target CE disabled`
Target-diversity unit test	`focused transformer tests pass; target-diversity regression verifies restricted target-set mass and weakest target-share balance improve in a small branch batch`
Target-diversity QA	`restored step 0; QA predicted diversity 3/8, target-token coverage 0.25, average target rank 13.25, top-3 0.25, top-5 0.375`
Target-diversity heldout	`restored step 0; heldout predicted diversity 3/8, target-token coverage 0.25, average target rank 13.375, top-3 0.25, top-5 0.375`
Target-diversity history	`training step 50 improved QA average target rank to 10.0, but target-token coverage collapsed to 0.0 with one wrong 'a' top-1 branch token`
Target-diversity status	`rejected evidence: batch-local target-share diversity still trades away eval target-token coverage; next repair should preserve eval-wide target coverage directly`
Latest target-replay coverage run	`runs/transformer-answer-v0.58-fullstack-target-replay-coverage-smoke-dim4-context80/`
Target-replay coverage mode	`branch-balanced-target-replay-coverage-unlikelihood under the full v0.51 stack with positive target CE disabled`
Target-replay coverage unit test	`focused transformer tests pass; target-replay regression verifies replay target-set mass and weakest missing-target share improve when the sampled branch batch omits admitted pool targets`
Target-replay coverage QA	`restored step 0; QA predicted diversity 3/8, target-token coverage 0.25, average target rank 13.25, top-3 0.25, top-5 0.375`
Target-replay coverage heldout	`restored step 0; heldout predicted diversity 3/8, target-token coverage 0.25, average target rank 13.375, top-3 0.25, top-5 0.375`
Target-replay coverage history	`training step 40 improved QA average target rank to 6.875 and top-5 coverage to 0.5; by step 50, QA/heldout top-1 collapsed to wrong 'n' and target-token coverage had hit 0.0 during training`
Target-replay coverage status	`rejected evidence: pool-owned replay target coverage still trades away context-specific target ownership; next repair should bind replay pressure to branch contexts`
Latest context-replay coverage run	`runs/transformer-answer-v0.59-fullstack-context-replay-coverage-smoke-dim4-context80/`
Context-replay coverage mode	`branch-balanced-context-replay-coverage-unlikelihood under the full v0.51 stack with positive target CE disabled`
Context-replay coverage unit test	`focused transformer tests pass; context-replay regression verifies replay target-set mass and weakest owned-target share improve on fixed replay contexts`
Context-replay coverage QA	`restored step 0; QA predicted diversity 3/8, target-token coverage 0.25, average target rank 13.25, top-3 0.25, top-5 0.375`
Context-replay coverage heldout	`restored step 0; heldout predicted diversity 3/8, target-token coverage 0.25, average target rank 13.375, top-3 0.25, top-5 0.375`
Context-replay coverage history	`training step 40 improved QA average target rank to 7.375, top-3 to 0.375, and top-5 to 0.5; by step 50, QA predicted diversity was only 2/8 and target-token coverage had hit 0.0 during training`
Context-replay coverage status	`rejected evidence: context-owned replay improves rank/top-k snapshots but still does not preserve target-token coverage; next repair should strengthen target-preserving ownership or scoring gates`
Latest coverage-floor run	`runs/transformer-answer-v0.60-fullstack-context-replay-coverage-floor-metadata-smoke-dim4-context80/`
Coverage-floor mode	`profile-wise target-token coverage floor before branch snapshot rank/top-k scoring`
Coverage-floor unit test	`focused transformer tests pass; coverage-floor regression rejects a rank-lifted candidate when QA target-token coverage falls below baseline`
Coverage-floor QA	`restored step 0; QA predicted diversity 3/8, target-token coverage 0.25, average target rank 13.25, top-3 0.25, top-5 0.375`
Coverage-floor heldout	`restored step 0; heldout predicted diversity 3/8, target-token coverage 0.25, average target rank 13.375, top-3 0.25, top-5 0.375`
Coverage-floor history	`clean v0.60 JSONL wrote 7 direct-answer rows with branch_target_coverage_by_profile; step 40 improved QA rank/top-k but was ineligible because profile coverage regressed`
Coverage-floor status	`gate repair accepted, model behavior rejected: coverage floor prevents rank/top-k gains from promoting snapshots that regress target-token coverage`
Latest coverage-anchor run	`runs/transformer-answer-v0.61-fullstack-context-coverage-anchor-smoke-dim4-context80/`
Coverage-anchor mode	`branch-balanced-context-coverage-anchor-unlikelihood under the full v0.51 stack with the v0.60 coverage floor`
Coverage-anchor unit test	`focused transformer tests pass; anchor regression verifies covered-target probability is protected better than the same replay training without anchors`
Coverage-anchor QA	`restored step 0; QA predicted diversity 3/8, target-token coverage 0.25, average target rank 13.25, top-3 0.25, top-5 0.375`
Coverage-anchor heldout	`restored step 0; heldout predicted diversity 3/8, target-token coverage 0.25, average target rank 13.375, top-3 0.25, top-5 0.375`
Coverage-anchor history	`training snapshots over-anchored covered wrong 'i'; QA/heldout predicted diversity fell to 1/8, target-token coverage fell to 0.125, and average target rank worsened to above 21`
Coverage-anchor status	`rejected evidence: global covered-target anchors protect one covered token but do not preserve coverage diversity; next repair should be target-balanced or profile-aware`
Latest target-balanced anchor run	`runs/transformer-answer-v0.62-fullstack-target-balanced-anchor-smoke-dim4-context80/`
Target-balanced anchor mode	`branch-balanced-context-target-balanced-anchor-unlikelihood under the full v0.51 stack with the v0.60 coverage floor`
Target-balanced anchor unit test	`focused transformer tests pass; singleton covered-target regression verifies target-balanced anchors skip the v0.61 one-token over-anchor`
Target-balanced anchor QA	`restored step 0; QA predicted diversity 3/8, target-token coverage 0.25, average target rank 13.25, top-3 0.25, top-5 0.375`
Target-balanced anchor heldout	`restored step 0; heldout predicted diversity 3/8, target-token coverage 0.25, average target rank 13.375, top-3 0.25, top-5 0.375`
Target-balanced anchor history	`training avoided the v0.61 hard 'i' attractor, but QA/heldout target-token coverage still collapsed to 0.0 and trained snapshots remained ineligible`
Target-balanced anchor status	`rejected evidence: target-balanced anchors prevent singleton over-anchoring but do not preserve profile coverage; next repair should train from profile-level coverage deficits`
Latest coverage-deficit run	`runs/transformer-answer-v0.64-fullstack-coverage-deficit-smoke-dim4-context80/`
Coverage-deficit mode	`branch-balanced-context-coverage-deficit-unlikelihood under the full v0.51 stack with the v0.60 coverage floor`
Coverage-deficit unit test	`focused transformer tests pass; deficit regression verifies missing replay targets gain restricted probability over the old context replay objective`
Coverage-deficit QA	`restored step 0; QA predicted diversity 3/8, target-token coverage 0.25, average target rank 13.25, top-3 0.25, top-5 0.375`
Coverage-deficit heldout	`restored step 0; heldout predicted diversity 3/8, target-token coverage 0.25, average target rank 13.375, top-3 0.25, top-5 0.375`
Coverage-deficit history	`training step 50 reached QA accuracy 1/8 and predicted diversity 4/8 with average target rank 10.0, but QA/heldout target-token coverage regressed to 0.125 and trained snapshots remained ineligible`
Coverage-deficit status	`rejected evidence: deficit pressure can crack the top-1 branch in training but still trades away coverage, so the next repair should combine deficit pressure with an explicit coverage-preserving constraint`
Latest coverage-preserving deficit run	`runs/transformer-answer-v0.65-fullstack-coverage-preserving-deficit-smoke-dim4-context80/`
Coverage-preserving deficit mode	`branch-balanced-context-coverage-preserving-deficit-unlikelihood under the full v0.51 stack with the v0.60 coverage floor`
Coverage-preserving deficit unit test	`focused transformer tests pass; preserving-deficit regression verifies missing targets still lift while represented target tokens are protected better than deficit-only training`
Coverage-preserving deficit QA	`restored step 0; QA predicted diversity 3/8, target-token coverage 0.25, average target rank 13.25, top-3 0.25, top-5 0.375`
Coverage-preserving deficit heldout	`restored step 0; heldout predicted diversity 3/8, target-token coverage 0.25, average target rank 13.375, top-3 0.25, top-5 0.375`
Coverage-preserving deficit history	`training step 50 reached QA/heldout branch accuracy 1/8, QA average target rank 7.75, heldout average target rank 7.125, and top-5 coverage 0.5, but both profiles collapsed to predicted_unique 1/8 with target-token coverage 0.125`
Coverage-preserving deficit status	`rejected evidence: predicted-target preservation over-preserved the represented 'i' token and improved rank while regressing coverage diversity; next repair should make the coverage constraint profile-aware instead of anchoring current predicted target tokens`
Latest profile-aware replay run	`runs/transformer-answer-v0.67-profile-aware-replay-plan-smoke-dim4-context80/`
Profile-aware replay mode	`branch-balanced-context-profile-coverage-preserving-deficit-unlikelihood under the full v0.51 stack with the v0.60 coverage floor and v0.67 per-profile replay plan`
Profile-aware replay unit test	`focused transformer tests pass; profile replay plan verifies profile deficits are not hidden by global target coverage, and profiled replay records preserve source keys for shared branch targets`
Profile-aware replay plan	`direct_answer_replay_plan.json records 9144 branch/replay records across 21 profiles; example floors include qa:place 0.5 and qa:color 0.0`
Profile-aware replay gate	`branch-context gate passed 219/219 semantic records with 0 ambiguous contexts, 0 context collisions, and 0 skipped records`
Profile-aware replay smoke	`one gated branch-only direct step ran, post-direct candidate snapshot was skipped by configuration, and the best branch snapshot restored from step 0`
Profile-aware replay status	`mechanics-readiness evidence: replay plan and profile-aware objective surface are implemented, but branch-diversity target still failed 0/9 multi-target profiles so no model-quality promotion`
Latest profile-aware full-stack run	`runs/transformer-answer-v0.68-fullstack-profile-aware-preserving-deficit-smoke-dim4-context80/`
Profile-aware full-stack mode	`branch-balanced-context-profile-coverage-preserving-deficit-unlikelihood under the full v0.51 stack with the v0.60 coverage floor and v0.67 replay-plan artifact`
Profile-aware full-stack plan	`direct_answer_replay_plan.json records 9144 branch/replay records across 21 profiles; branch-context gate passed 219/219 semantic records`
Profile-aware full-stack QA	`restored step 0; QA predicted diversity 3/8, target-token coverage 0.25, average target rank 13.25, top-3 0.25, top-5 0.375`
Profile-aware full-stack heldout	`restored step 0; heldout predicted diversity 3/8, target-token coverage 0.25, average target rank 13.375, top-3 0.25, top-5 0.375`
Profile-aware full-stack history	`step 40 improved QA average target rank to 6.5 and top-5 to 0.625, with heldout rank 6.875 and top-5 0.5, but QA/heldout target-token coverage regressed to 0.125 and predicted diversity collapsed to 1/8`
Profile-aware full-stack status	`rejected evidence: profile-aware preservation can improve rank under training, but best-snapshot scoring restored step 0 because trained snapshots still erase coverage and diversity`
Profile target-share objective	`src/closed_world_lm/transformer_char_model.py and src/closed_world_lm/transformer_objectives.py add branch-balanced-context-profile-target-share-preserving-deficit-unlikelihood as the v0.81 profile target-share objective implementation.`
Profile target-share decision	`Profile-aware replay can now add balanced owned target-share pressure across each profile's replay targets while retaining deficit focus, represented-target preservation, replay-plan artifacts, and recipe/promotion surfaces.`
Profile target-share unit test	`focused transformer tests pass; the minority replay target gains more share with balanced profile target-share pressure than under the previous profile-aware replay loss.`
Latest profile target-share run	`runs/transformer-answer-v0.82-fullstack-profile-target-share-smoke-dim4-context80/`
Profile target-share mode	`branch-balanced-context-profile-target-share-preserving-deficit-unlikelihood`
Profile target-share artifacts	`experiment_intent.json, corpus_hygiene.json, training_plan.json, candidate_quarantine.json, closed_world_verifier.json, training_recipe.json, direct_answer_replay_plan.json, constraint_first_promotion.json, metrics JSON/JSONL, tokenizer, optimizer, lessons, and checkpoint are written.`
Profile target-share gate	`branch-context gate passed 219/219 semantic records; replay plan records 9144 branch/replay records across 21 profiles; deterministic verifier passed; purity gates include external_embeddings false.`
Profile target-share history	`50/50 direct steps completed with 7 clean JSONL rows. Step 40 lowered train loss to 19.7378 and improved QA average rank to 9.125, but QA and heldout collapsed to one 'c' prediction with target-token coverage 0.0.`
Profile target-share status	`rejected evidence: best-snapshot scoring restored step 0, preserving QA/heldout target-token coverage at 0.25, but branch_diversity_target still failed across all 9 multi-target profiles.`
Latest prompt-ownership run	`runs/transformer-answer-v0.83-fullstack-prompt-ownership-smoke-dim4-context80/`
Prompt-ownership mode	`branch-balanced-context-profile-prompt-ownership-target-share-preserving-deficit-unlikelihood`
Prompt-ownership unit test	`focused transformer tests pass; prompt-specific ownership margins lift a context's own target above a sibling profile target more than the v0.82 profile target-share pressure.`
Prompt-ownership gate	`branch-context gate passed 219/219 semantic records; replay plan records 9144 branch/replay records across 21 profiles; deterministic verifier passed; purity gates include external_embeddings false.`
Prompt-ownership history	`50/50 direct steps completed with 7 clean JSONL rows. Step 50 improved QA average target rank to 8.625 and heldout average target rank to 8.5, but QA and heldout collapsed to one 'c' prediction with target-token coverage 0.0 during training.`
Prompt-ownership status	`rejected evidence: best-snapshot scoring restored step 0, preserving QA/heldout target-token coverage at 0.25, but branch_diversity_target still failed across all 9 multi-target profiles.`
Latest baseline-anchor run	`runs/transformer-answer-v0.84-fullstack-baseline-anchored-prompt-ownership-smoke-dim4-context80/`
Baseline-anchor mode	`branch-balanced-context-profile-baseline-anchored-prompt-ownership-target-share-preserving-deficit-unlikelihood`
Baseline-anchor unit test	`focused transformer tests pass; profiled replay batches can use baseline prediction overrides, and anchored replay preservation protects a covered target better than following current prediction drift.`
Baseline-anchor gate	`branch-context gate passed 219/219 semantic records; replay plan records 9144 branch/replay records across 21 profiles; 562 baseline prediction anchors were recorded and active; deterministic verifier passed; purity gates include external_embeddings false.`
Baseline-anchor history	`50/50 direct steps completed with 7 clean JSONL rows. Step 40 improved QA average target rank to 8.0 and heldout rank to 8.375, but QA and heldout collapsed to one 'i' prediction with target-token coverage 0.125 during training.`
Baseline-anchor status	`rejected evidence: anchoring improves over the v0.83 zero-coverage collapse, but best-snapshot scoring restored step 0 because trained snapshots still fell below the 0.25 QA/heldout coverage floor and branch_diversity_target failed across all 9 multi-target profiles.`
Latest baseline-floor gate run	`runs/transformer-answer-v0.85-fullstack-baseline-floor-gated-prompt-ownership-smoke-dim4-context80/`
Baseline-floor gate mode	`branch-balanced-context-profile-baseline-floor-gated-prompt-ownership-target-share-preserving-deficit-unlikelihood`
Baseline-floor gate unit test	`focused transformer tests pass; the new mode records baseline replay anchors, a baseline-floor update guard, and one-step accepted/rejected guard accounting.`
Baseline-floor gate	`branch-context gate passed 219/219 semantic records; replay plan records 9144 branch/replay records across 21 profiles; 562 baseline prediction anchors were recorded and active; the baseline-floor update guard checked 50 attempted steps and rejected 50 unsafe updates; deterministic verifier passed; purity gates include external_embeddings false.`
Baseline-floor gate history	`50/50 attempted direct steps completed with 7 clean JSONL rows. The guard preserved baseline/final QA and heldout target-token coverage at 0.25, predicted diversity at 3/8, QA average target rank at 13.25, and heldout average rank at 13.375, but accepted 0/50 attempted updates.`
Baseline-floor gate status	`rejected evidence: v0.85 prevents unsafe forgetting by refusing every update below the profile-wise baseline coverage floor, but branch_diversity_target still fails across all 9 multi-target profiles and no weight update is accepted.`
Latest baseline-floor adaptive run	`runs/transformer-answer-v0.86-fullstack-baseline-floor-adaptive-prompt-ownership-smoke-dim4-context80/`
Baseline-floor adaptive mode	`branch-balanced-context-profile-baseline-floor-adaptive-prompt-ownership-target-share-preserving-deficit-unlikelihood`
Baseline-floor adaptive unit test	`focused transformer tests pass; the adaptive mode records baseline replay anchors, adaptive learning-rate scales, checked steps, attempted updates, accepted attempts, and rejected attempts.`
Baseline-floor adaptive gate	`branch-context gate passed 219/219 semantic records; replay plan records 9144 branch/replay records across 21 profiles; 562 baseline prediction anchors were recorded and active; adaptive scales were 1.0, 0.25, 0.05, and 0.01; the guard checked 50 steps, attempted 200 updates, and rejected 200 unsafe attempts; deterministic verifier passed; purity gates include external_embeddings false.`
Baseline-floor adaptive history	`50/50 attempted direct steps completed with 7 clean JSONL rows. The guard preserved baseline/final QA and heldout target-token coverage at 0.25, predicted diversity at 3/8, QA average target rank at 13.25, and heldout average rank at 13.375, but accepted 0/200 scaled attempted updates.`
Baseline-floor adaptive status	`rejected evidence: v0.86 proves the unsafe-update problem is not fixed by four learning-rate scales; every scaled retry still falls below at least one profile-wise baseline coverage floor and branch_diversity_target still fails across all 9 multi-target profiles.`
Latest baseline-floor repaired run	`runs/transformer-answer-v0.87-fullstack-baseline-floor-repaired-prompt-ownership-clean-smoke-dim4-context80/`
Baseline-floor repaired mode	`branch-balanced-context-profile-baseline-floor-repaired-prompt-ownership-target-share-preserving-deficit-unlikelihood`
Baseline-floor repaired unit test	`focused transformer tests pass; the repaired mode records baseline replay anchors, adaptive learning-rate scales, repair-anchor counts, repair attempts, repaired attempts, accepted update-shape counts, and rejected samples.`
Baseline-floor repaired gate	`branch-context gate passed 219/219 semantic records; replay plan records 9144 branch/replay records across 21 profiles; 562 baseline prediction anchors and 227 baseline-covered repair anchors were recorded; adaptive scales were 1.0, 0.25, 0.05, and 0.01; the guard checked 50 steps, attempted 200 updates, ran 200 one-step repairs, and rejected 200 unsafe attempts; deterministic verifier passed; purity gates include external_embeddings false.`
Baseline-floor repaired history	`50/50 attempted direct steps completed with 7 clean JSONL rows. The guard preserved baseline/final QA and heldout target-token coverage at 0.25, predicted diversity at 3/8, QA average target rank at 13.25, and heldout average rank at 13.375, but accepted 0/200 repaired attempted updates.`
Baseline-floor repaired status	`rejected evidence: v0.87 proves one bounded baseline-covered anchor repair after an unsafe update is still not enough; every repaired retry falls below at least one profile-wise baseline coverage floor and branch_diversity_target still fails across all 9 multi-target profiles.`
Latest baseline-floor objective run	`runs/transformer-answer-v0.88-fullstack-baseline-floor-objective-prompt-ownership-smoke-dim4-context80/`
Baseline-floor objective mode	`branch-balanced-context-profile-baseline-floor-objective-prompt-ownership-target-share-preserving-deficit-unlikelihood`
Baseline-floor objective unit test	`focused transformer tests pass; the objective mode records baseline replay anchors, objective-side floor-anchor counts, anchor batch size, anchor weight, objective anchor batches, accepted attempts, and rejected attempts.`
Baseline-floor objective gate	`branch-context gate passed 219/219 semantic records; replay plan records 9144 branch/replay records across 21 profiles; 562 baseline prediction anchors and 227 objective-side floor anchors were recorded; anchor batch size was 32, anchor weight was 10.0, adaptive scales were 1.0, 0.25, 0.05, and 0.01; the guard checked 50 steps, attempted 200 updates, ran 200 objective anchor batches covering 2400 anchor records, and rejected 200 unsafe attempts; deterministic verifier passed; purity gates include external_embeddings false.`
Baseline-floor objective history	`50/50 attempted direct steps completed with 7 clean JSONL rows. The guard preserved baseline/final QA and heldout target-token coverage at 0.25, predicted diversity at 3/8, QA average target rank at 13.25, and heldout average rank at 13.375, but accepted 0/200 objective-shaped attempted updates.`
Baseline-floor objective status	`rejected evidence: v0.88 proves a balanced objective-side floor-anchor term is still not enough when coupled to branch-diversity pressure; every retry falls below at least one profile-wise baseline coverage floor and branch_diversity_target still fails across all 9 multi-target profiles.`
Profile target-share next	`Use the v0.115 hidden-projection candidate evidence with profile target-share and branch-diversity gates before promotion.`
Transformer selector exact	`18/219 -> 219/219 selector-emitted`
Transformer selector candidate accuracy	`18/219 -> 219/219 eval-scoped`
Transformer-guided generator exact	`0/219 -> 219/219 no-candidate`
Tokenizer	`corpus-trained character tokenizer`
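
The baseline-floor guard rows above (v0.85 through v0.88) describe an update guard that retries each step at several learning-rate scales and rejects any attempt that drops a profile below its baseline target-token coverage. The following is a minimal sketch of that accept/reject logic, not the QuarkLM implementation: the function names, the `propose` callback, and the dictionary shapes are assumptions made for illustration. Only the scale values and the 0.25 baseline coverage come from the table.

```python
def passes_coverage_floor(baseline, candidate):
    """Return True when no profile's coverage falls below its baseline floor."""
    return all(
        candidate.get(profile, 0.0) >= floor
        for profile, floor in baseline.items()
    )


def guarded_step(baseline, propose, scales=(1.0, 0.25, 0.05, 0.01)):
    """Try each learning-rate scale in order and accept the first safe update.

    `propose(scale)` is a hypothetical callback that returns the per-profile
    target-token coverage measured after a trial update at that scale.
    """
    attempts = 0
    for scale in scales:
        attempts += 1
        if passes_coverage_floor(baseline, propose(scale)):
            return {"accepted": True, "scale": scale, "attempts": attempts}
    # Every scaled retry regressed at least one profile: keep the old weights.
    return {"accepted": False, "scale": None, "attempts": attempts}


# Baseline coverage of 0.25 for both profiles, as reported in the table.
baseline = {"qa": 0.25, "heldout": 0.25}

# Hypothetical trial updates that always regress QA coverage to 0.125.
rejected = guarded_step(baseline, lambda scale: {"qa": 0.125, "heldout": 0.25})
print(rejected)
```

Under these assumptions, a guard that never finds a safe scale makes four attempts per step, so 50 checked steps produce 200 rejected attempts, which matches the counts reported in the v0.86 row.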

v0.42 Summary

QuarkLM v0.42 keeps the admitted corpus unchanged from v0.41 and widens the from-scratch transformer used by the sparse prompt-contrast branch repair path. The stable self-improvement run is runs/self-improve-v0.42/; the current transformer answer-lesson run is runs/transformer-answer-v0.42-branch-repair-contrast50-dim8-context32/.

The current corpus remains at 12 admitted facts. Direct admission probes pass 48/48, admission paraphrase probes pass 84/84, and glossary probes pass 38/38.

The transformer is a tiny decoder-only language model built in the Python standard library. It uses learned token and position embeddings, one causal self-attention block, a feed-forward block, and QuarkLM's corpus-trained character tokenizer. It starts from random weights and imports no pretrained model, vocabulary, or embeddings.
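
As an illustration of what a corpus-trained character tokenizer can look like in the standard library, here is a minimal sketch. The class name, method names, and sample corpus string are hypothetical; QuarkLM's actual tokenizer may differ in its interface and in how it handles unseen characters.

```python
class CharTokenizer:
    """Character-level tokenizer whose vocabulary comes only from the corpus."""

    def __init__(self, corpus):
        # Sorting gives a deterministic token-id assignment across runs.
        self.itos = sorted(set(corpus))
        self.stoi = {ch: i for i, ch in enumerate(self.itos)}

    def encode(self, text):
        # Raises KeyError for characters the corpus never contained.
        return [self.stoi[ch] for ch in text]

    def decode(self, ids):
        return "".join(self.itos[i] for i in ids)


# Hypothetical corpus text; no pretrained vocabulary is involved.
corpus = "question: where is the ball?\nanswer: box."
tok = CharTokenizer(corpus)
ids = tok.encode("answer: box.")
print(tok.decode(ids))
```

Because the vocabulary is derived entirely from the admitted text, the sketch imports no external vocabulary, which is the property the paragraph above describes.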

Transformer direct-answer evidence:

transformer checkpoint: runs/transformer-answer-v0.42-branch-repair-contrast50-dim8-context32/transformer_answer.json
v0.31 generator checkpoint retained for comparison: runs/transformer-answer-v0.31-generator-weighted-lr035-80k/answer_generator.json
selector checkpoint: runs/transformer-answer-v0.31-generator-weighted-lr035-80k/answer_selector.json
training steps: 80
context size: 32
embedding dimension: 8
feed-forward dimension: 16
direct answer steps: 1000
direct answer mode: periodic-branch-repair-contrast-unlikelihood
direct answer negative weight: 1.0
direct answer positive weight: 1.0
direct answer contrast weight: 1.0
branch position: 1
contrast interval: 50
direct answer training examples: 9144
answer target NLL: 3.5850 -> 2.4129
direct answer target loss: 3.4278 -> 2.2708
raw direct greedy exact answers: 0/219 -> 0/219
transformer-only eval-scoped candidate accuracy: 15/219 -> 37/219
selector-emitted exact answers: 18/219 -> 219/219
selector eval-scoped candidate accuracy: 18/219 -> 219/219
v0.31 generator exact answers without candidates: 0/219 -> 219/219
pretrained weights: false
pretrained tokenizer: false
external embeddings: false
direct path uses answer candidates: false
direct path uses auxiliary weights: false
generator uses answer candidates: false

This is real progress toward raw transformer answering, with a clear boundary. v0.42 preserves v0.33's transformer-only candidate discrimination while testing whether a wider random transformer gives sparse branch contrast more room to represent prompt differences. The direct path improves answer-target NLL versus v0.41 and reduces runaway greedy looping, but raw greedy completions still fail exact answer generation: the dominant failure is now the short wrong completion " te.". The next structured repair should make the prompt representation more target-specific without losing these scored gains. v0.31's auxiliary no-candidate generator remains the best exact no-candidate answer evidence. The current reliable response gate still belongs to the responder, learned answer classifier, and generative answer decoder.

Unpromoted v0.43 Findings

v0.43 development added transformer-loop improvements, but did not replace the v0.42 promoted checkpoint.

The transformer forward pass now computes only the final position consumed by the language-model head, preserving the next-character objective while making longer-context experiments practical.
Transformer answer artifacts now include prompt context-coverage metrics. A context size of 80 covers all current semantic eval templates (219/219), while context size 32 drops complete template coverage for many prompts.
runs/transformer-answer-v0.43-hard-branch-contrast4-dim8-context32/ preserved candidate accuracy at 37/219, but regressed direct loss to 2.4225, answer NLL to 2.5402, and collapsed greedy output to a repeated " a" loop.
runs/transformer-answer-v0.43-branch-repair-contrast50-dim8-context80/ achieved full context coverage and the shorter failure " t.", but still trailed v0.42 with direct loss 2.3122 and answer NLL 2.4546.
runs/transformer-answer-v0.43-branch-repair-contrast50-dim8-context80-1500/ reached 38/219 candidates, but regressed direct loss, answer NLL, and greedy output. It remains archived evidence rather than a promoted release.
runs/transformer-answer-v0.43-layernorm-screen-dim8-context80/ tested optional layer normalization with full context coverage. It preserved 37/219 candidates, but answer NLL regressed to 2.5881 and greedy output collapsed into repeated " y"/"e" loops, so it was not promoted.
runs/transformer-answer-v0.43-branch-span3-screen-dim8-context32/ tested branch repair over answer positions 1..3. It preserved 37/219 candidates, but answer NLL regressed to 2.7426 and greedy output became a long "neeee" loop, so it was not promoted.
runs/transformer-answer-v0.43-two-layer-screen-dim8-context32/ tested the new multi-layer transformer path. It was interrupted before final direct-answer metrics because two-layer full-block scalar autograd was too slow for the regular loop. The partial JSONL history is runtime evidence, not promotion evidence.
runs/transformer-answer-v0.43-two-layer-finalopt-screen-dim8-context32/ tested the optimized stacked path where the final layer computes only the last state. The optimization is covered by logit-equivalence tests, but the run was still interrupted before final metrics because the intermediate full-state layer remains too expensive for direct-answer repair updates.
runs/transformer-answer-v0.43-two-layer-toponly-skip-screen-dim8-context32/ tested top-layer-only direct-answer updates for a two-layer transformer and the explicit post-direct snapshot skip used for bounded screens. It completed and saved a checkpoint after 40 target-loss steps and 80 direct-answer steps, recorded the skipped post-direct candidate snapshot, improved direct-answer target loss 3.5186 -> 3.2436, but kept direct greedy exact at 0/219 -> 0/219 with repeated "a" output. It is training-loop completion evidence, not promotion evidence.
runs/transformer-answer-v0.43-branch-profile-smoke-dim4-context16/ verified direct-answer branch-profile metrics. The QA branch-position-1 profile stayed at 1/8 accuracy, moved from all "o" predictions to all "y" predictions after five tiny direct updates, and kept a negative average target margin. This is model-native self-diagnosis evidence for prompt-independent branch collapse, not promotion evidence.
runs/transformer-answer-v0.43-branch-collapse-smoke-dim4-context16/ tested full-dose dominant-branch-token suppression. It regressed direct loss and moved QA branch collapse from all "o" predictions to all "a" predictions.
runs/transformer-answer-v0.43-periodic-branch-collapse-smoke-dim4-context16/ tested sparse dominant-token suppression every five direct steps. It improved direct loss 3.5800 -> 3.5157, but QA branch accuracy stayed 1/8 -> 1/8 and the dominant prediction moved from all "o" to all "n". It remains rejected repair evidence because the branch stayed prompt-independent.
runs/transformer-answer-v0.43-branch-batch-smoke-dim4-context16/ tested full-dose distinct-target branch batching. It improved direct loss only slightly and moved QA branch collapse from all "o" predictions to all "y" predictions.
runs/transformer-answer-v0.43-periodic-branch-batch-smoke-dim4-context16/ tested sparse branch-batch contrast every five direct steps. It improved direct loss 3.5800 -> 3.5248, but QA branch accuracy regressed 1/8 -> 0/8 and the dominant prediction moved from all "o" to all "a". It is rejected evidence that distinct-target batching still does not force prompt-conditioned branch separation in the current representation.
runs/transformer-answer-v0.43-context-mean-branch-batch-smoke-dim4-context16/ added --use-context-mean, a representation-side option that adds the mean-pooled prompt context to the final transformer hidden state. With sparse branch-batch contrast it improved direct loss 3.5805 -> 3.5252, but QA branch accuracy regressed 1/8 -> 0/8 and the dominant prediction moved from all "o" to all "a".
runs/transformer-answer-v0.43-context-mean-branch-repair-smoke-dim4-context16/ tested the same context-mean representation with sparse branch repair. It improved direct loss 3.5805 -> 3.5310, but again regressed QA branch accuracy 1/8 -> 0/8 and collapsed to all "a" predictions. This is rejected representation evidence: prompt averaging alone is not enough to produce prompt-specific branch choices.
runs/transformer-answer-v0.43-context-projection-branch-repair-smoke-dim4-context16/ added --use-context-projection, a zero-initialized trainable projection of the mean-pooled context. It starts baseline-equivalent, moved all 20 projection parameters during training, and improved direct loss 3.5802 -> 3.5217, but QA branch accuracy regressed 1/8 -> 0/8 and the dominant prediction moved from all "o" to all "a".
runs/transformer-answer-v0.43-context-projection-branch-batch-smoke-dim4-context16/ tested the same learned projection with sparse branch-batch contrast. It moved all 20 projection parameters and improved direct loss 3.5802 -> 3.5252, but also regressed QA branch accuracy 1/8 -> 0/8 and collapsed to all "a" predictions. This keeps learned context projection in rejected representation evidence.
runs/transformer-answer-v0.43-prompt-attention-branch-repair-smoke-dim4-context16/ added --use-prompt-attention-summary, a trainable attention-pooled context summary with a zero-initialized output projection. It moved all 20 output projection parameters and improved direct loss 3.5802 -> 3.5217, but QA branch accuracy regressed 1/8 -> 0/8 and the dominant prediction moved from all "o" to all "a".
runs/transformer-answer-v0.43-prompt-attention-branch-batch-smoke-dim4-context16/ tested the same prompt-attention summary with sparse branch-batch contrast. It moved all 20 output projection parameters and improved direct loss 3.5802 -> 3.5252, but again regressed QA branch accuracy 1/8 -> 0/8 and collapsed to all "a" predictions. This keeps trainable prompt attention in rejected representation evidence.
runs/transformer-answer-v0.43-branch-context-coverage-smoke-dim4-context16/ added branch_context_coverage diagnostics to direct-answer snapshots. At context size 16, QA had 0/8 semantic coverage and 4 ambiguous branch contexts; for example "s ball?\nanswer: " mapped both place and color first target tokens.
runs/transformer-answer-v0.43-branch-context-coverage-smoke-dim4-context32/ removed QA branch ambiguity (0 ambiguous contexts), but still had 0/8 semantic coverage at the branch point because the prompt prefix was truncated.
runs/transformer-answer-v0.43-branch-context-coverage-smoke-dim4-context80/ reached complete branch-context coverage across all eval sets (219/219) with zero ambiguous branch contexts. This is diagnostic evidence for efficient longer-context branch repair.
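
The ambiguity check described in the three coverage runs above can be sketched like this. The prompts, the character-level truncation, and the "prompt fits in the window" stand-in for semantic coverage are assumptions made for illustration; only the shared 16-character tail "s ball?\nanswer: " comes from the evidence.

```python
from collections import defaultdict


def ambiguous_branch_contexts(records, context_size):
    """Return truncated contexts that map to more than one first target token."""
    targets_by_context = defaultdict(set)
    for prompt, answer in records:
        truncated = prompt[-context_size:]
        targets_by_context[truncated].add(answer[0])
    return {
        context: sorted(tokens)
        for context, tokens in targets_by_context.items()
        if len(tokens) > 1
    }


def prompts_that_fit(records, context_size):
    """Count prompts visible in full (assumed proxy for semantic coverage)."""
    return sum(1 for prompt, _ in records if len(prompt) <= context_size)


# Hypothetical place and color questions that share the same 16-character tail.
records = [
    ("question: where is sam's ball?\nanswer: ", "kitchen"),
    ("question: what color is sam's ball?\nanswer: ", "red"),
]

at_16 = ambiguous_branch_contexts(records, 16)
at_32 = ambiguous_branch_contexts(records, 32)
at_80 = ambiguous_branch_contexts(records, 80)
fit_32 = prompts_that_fit(records, 32)
fit_80 = prompts_that_fit(records, 80)
```

In this toy data, context size 32 removes the ambiguity while both prompts are still truncated, which mirrors the pattern reported for the context-32 run.
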
runs/transformer-answer-v0.43-branch-context-gate-smoke-dim4-context16/ made that diagnostic actionable with --direct-answer-require-branch-context-gate. The required gate failed at context size 16, so the run recorded actual_steps: 0 for 5 requested direct-answer steps.
runs/transformer-answer-v0.43-branch-context-gate-smoke-dim4-context80/ passed the same required gate at context size 80 and recorded actual_steps: 1 for 1 requested direct-answer step.
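
A required gate of this kind can be sketched as a function that zeroes the step count on failure. The pass criterion used here (complete coverage and no ambiguous contexts) and the record field names other than actual_steps are assumptions; the input numbers reuse the QA context-16 and context-80 figures reported above.

```python
def apply_branch_context_gate(requested_steps, covered, total, ambiguous,
                              required=True):
    """Return a run record; a failed required gate skips every requested step."""
    gate_passed = covered == total and ambiguous == 0
    actual_steps = requested_steps if (gate_passed or not required) else 0
    return {
        "gate_passed": gate_passed,
        "requested_steps": requested_steps,
        "actual_steps": actual_steps,
    }


# Context 16: 0/8 covered with 4 ambiguous contexts, so no step runs.
context16 = apply_branch_context_gate(5, covered=0, total=8, ambiguous=4)

# Context 80: 219/219 covered with 0 ambiguous contexts, so the step runs.
context80 = apply_branch_context_gate(1, covered=219, total=219, ambiguous=0)
```
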
runs/transformer-answer-v0.43-branch-context-gated-branchonly-smoke-dim4-context80/ added --direct-answer-snapshot-mode branch-only to keep longer-context branch screens bounded. The required context-80 gate passed across all 219/219 semantic records, all 5 requested direct-answer steps ran, and JSONL snapshots recorded evals_skipped: true while retaining branch profiles and branch-context gate evidence.
runs/transformer-answer-v0.43-branchonly-periodic-repair-contrast50-dim8-context80/ used branch-only snapshots for a dim8 context-80 version of the best prior sparse repair/contrast policy. The required gate passed and all 100 direct steps ran, but QA branch prediction collapsed to all "a" with final QA branch accuracy 0/8.
runs/transformer-answer-v0.43-branchonly-branch-batch-dim8-context80/ tested branch-batch contrast under the same complete context. It lowered interval train loss 3.4614 -> 3.1976, but final QA branch prediction still collapsed to all "a" with final QA branch accuracy 0/8.
runs/transformer-answer-v0.43-branch-diversity-target-smoke-dim4-context80/ added a first-class branch_diversity_target to direct-answer snapshots. The required branch-context gate passed and all 5 direct steps ran, but the diversity target failed across all 9 multi-target eval profiles. Final QA had target_unique: 8, predicted_unique: 1, dominant predicted token "r" at rate 1.0, and target-token coverage 0.125.
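
The reported QA numbers can be reproduced with a small metric sketch. The definitions below are assumptions, in particular that target-token coverage is the share of distinct target tokens that appear anywhere among the predictions; the eight target tokens are hypothetical.

```python
from collections import Counter


def branch_diversity_profile(targets, predictions):
    """Summarize how varied the branch predictions are relative to the targets."""
    counts = Counter(predictions)
    dominant_token, dominant_count = counts.most_common(1)[0]
    target_set = set(targets)
    covered = target_set & set(predictions)
    return {
        "target_unique": len(target_set),
        "predicted_unique": len(counts),
        "dominant_predicted_token": dominant_token,
        "dominant_predicted_rate": dominant_count / len(predictions),
        "target_token_coverage": len(covered) / len(target_set),
    }


# Eight distinct hypothetical targets, one of which happens to be "r".
targets = list("rkbgwsmt")
# A fully collapsed model predicts the same token for every branch context.
predictions = ["r"] * 8

profile = branch_diversity_profile(targets, predictions)
```

Under these assumed definitions, a collapse onto a token that is itself one of the eight targets yields coverage 0.125, while a collapse onto a non-target token yields 0.0.
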
runs/transformer-answer-v0.43-branch-diversity-train-smoke-dim4-context80/ added branch-diversity-unlikelihood, which trains distinct branch targets while penalizing each branch context's current wrong prediction. The required branch-context gate passed and 10/10 direct steps ran, but the diversity target still failed across all 9 multi-target profiles. QA moved from all "x" to all "b" predictions, with target-token coverage 0.0 -> 0.125 and predicted_unique still 1/8.
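
One common way to penalize a context's current wrong prediction is an unlikelihood term added to the usual cross-entropy. The sketch below assumes that form; the function names, the weight parameter, and the epsilon clamp are illustrative and are not taken from the QuarkLM trainer.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a token -> logit mapping."""
    peak = max(logits.values())
    exps = {token: math.exp(value - peak) for token, value in logits.items()}
    total = sum(exps.values())
    return {token: value / total for token, value in exps.items()}


def diversity_unlikelihood_loss(logits, target_token, weight=1.0):
    """Cross-entropy on the target plus a penalty on the current wrong top-1."""
    probs = softmax(logits)
    predicted = max(probs, key=probs.get)
    loss = -math.log(probs[target_token])
    if predicted != target_token:
        # Unlikelihood term: push probability mass away from the wrong winner.
        loss += -weight * math.log(max(1.0 - probs[predicted], 1e-12))
    return loss


logits = {"x": 3.0, "b": 1.0, "k": 0.0}
loss_wrong_top1 = diversity_unlikelihood_loss(logits, "b")
loss_right_top1 = diversity_unlikelihood_loss(logits, "x")
```
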
runs/transformer-answer-v0.43-branch-diversity-freezebias-smoke-dim4-context80/ added --direct-answer-freeze-output-bias, which excludes the transformer output bias from direct-answer updates. The required branch-context gate passed and 50/50 direct steps ran with the output bias frozen. Loss moved 3.6149 -> 3.5016, but the diversity target still failed across all 9 multi-target profiles. QA moved from all "x" to all "w" predictions, final target-token coverage was 0.0, and predicted_unique stayed 1/8.
runs/transformer-answer-v0.43-branch-target-softmax-freezebias-smoke-dim4-context80/ added branch-target-softmax-unlikelihood, which applies a restricted softmax over the distinct branch targets in each batch. The required branch-context gate passed, output bias was frozen, and 50/50 direct steps ran. Composite train loss moved 5.6671 -> 5.5820, but the diversity target still failed across all 9 multi-target profiles. QA briefly reached predicted_unique: 2 at step 20, then collapsed back to all "w" by step 50.
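
A restricted softmax over the batch's distinct branch targets can be sketched as a cross-entropy whose normalizer only sums over those targets. This is a minimal reading of the description above; the function name and the toy logits are assumptions.

```python
import math


def restricted_softmax_loss(logits, target_token, batch_targets):
    """Cross-entropy of the target under a softmax limited to batch targets."""
    candidates = sorted(set(batch_targets))
    peak = max(logits[token] for token in candidates)
    log_norm = peak + math.log(
        sum(math.exp(logits[token] - peak) for token in candidates)
    )
    return log_norm - logits[target_token]


# "w" dominates the full vocabulary but is not a branch target in this batch.
logits = {"a": 2.0, "k": 0.0, "r": 0.0, "w": 5.0}
loss = restricted_softmax_loss(logits, "k", batch_targets=["k", "r", "k"])
```

In this toy example the restricted loss does not depend on the logit of the out-of-set token "w", so lowering the loss does not by itself change which token wins the full-vocabulary argmax.
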
runs/transformer-answer-v0.43-branch-target-softmax-restorebest-smoke-dim4-context80/ added --direct-answer-restore-best-branch-snapshot. The required branch-context gate passed, output bias was frozen, and 50/50 direct steps ran. The run restored the final checkpoint from step 40; final QA moved from the prior all-"w" endpoint to all "u" with target-token coverage 0.125, but predicted_unique stayed 1/8 and all 9 multi-target profiles still failed the diversity target.
runs/transformer-answer-v0.43-prompt-prefix-target-softmax-restorebest-smoke-dim4-context80/ added --use-prompt-prefix-projection, a zero-initialized trainable projection over non-padding prompt-prefix positions before the final answer token. All 20 projection parameters moved and composite train loss improved 5.6649 -> 5.5679, but the final checkpoint, restored from step 40, showed the same all-"u" QA collapse with target-token coverage 0.125.
runs/transformer-answer-v0.43-prompt-position-target-softmax-restorebest-smoke-dim4-context80/ added --use-prompt-position-projection, a position-specific trainable projection over non-padding prompt-prefix positions before the final answer token. 1108/1284 projection parameters moved and composite train loss improved 5.6649 -> 5.5679, but the final checkpoint, restored from step 40, showed the same all-"u" QA collapse with target-token coverage 0.125.
runs/transformer-answer-v0.43-branch-target-margin-prompt-position-smoke-dim4-context80/ added branch-target-margin-unlikelihood, a smooth pairwise target-margin loss over each batch's distinct branch targets. The prompt-position context-80 screen moved train loss 4.8973 -> 4.7784 and moved 1108/1284 prompt-position projection parameters, but the final checkpoint, restored from step 40, showed the same all-"u" QA collapse with target-token coverage 0.125.
runs/transformer-answer-v0.43-branch-representation-contrast50-prompt-position-smoke-dim4-context80/ added branch_representation_profiles and branch-representation-contrast-unlikelihood. The high-weight prompt-position context-80 screen used --direct-answer-contrast-weight 50.0 and moved QA different-target hidden distance only about 0.00097 -> 0.00107 at the restored checkpoint; the final branch profile still restored to the same all-"u" QA collapse.
runs/transformer-answer-v0.43-branch-representation-contrast50-prompt-position-smoke-dim8-context80-steps40/ tested the same high-weight representation-contrast path at embedding/feed-forward dimensions 8/16. The completed 40/40 step screen restored from step 10, moved QA different-target hidden distance to about 0.00209, and still restored to the same all-"u" QA collapse with target-token coverage 0.125.
runs/transformer-answer-v0.43-prompt-position-scale32-repcontrast50-smoke-dim4-context80/ added --prompt-position-projection-scale 32.0 to test whether the prompt-position residual was simply too quiet. The completed 50/50 step screen moved 1108/1284 prompt-position projection parameters and restored from step 40; restored QA different-target hidden distance rose to about 0.01235, but QA still collapsed to all "u" with target-token coverage 0.125.
STRUCTURE_AUDIT.md now records the next transformer checkpoint: study open-source model, trainer, tokenizer, checkpoint, and transparency patterns before adding another repair objective, while keeping all external weights, tokenizers, embeddings, datasets, and training text outside QuarkLM's closed-world boundary. The completed comparison table chooses an opt-in pre-layer-norm transformer block path with final normalization as the next structural implementation target.
runs/transformer-answer-v0.44-prelayernorm-repcontrast50-prompt-position-smoke-dim4-context80/ implemented that path with --use-pre-layer-norm. The bounded context-80 screen ran 50/50 direct steps, moved 1108/1284 prompt-position parameters and all 8 final-norm parameters, and cracked full collapse in 7/9 multi-target profiles. The formal diversity target still failed 0/9, and QA stayed collapsed to all "y" with target-token coverage 0.125.
runs/transformer-answer-v0.44-target-balanced-prelayernorm-repcontrast50-prompt-position-smoke-dim4-context80/ added target-bucket branch batch sampling through branch-balanced-representation-contrast-unlikelihood. The screen ran 50/50 direct steps, but best-snapshot restoration returned to step 0 because every trained snapshot scored worse than baseline. All 9/9 multi-target profiles collapsed to "n", so target balancing is rejected as a standalone repair.
runs/transformer-answer-v0.45-branch-rank-diagnostic-smoke-dim4-context80/ adds target-rank diagnostics to branch profiles. The smoke used the pre-layer-norm prompt-position path and recorded QA and heldout both collapsed to "n" with average target rank 14.25 and top-3/top-5 target coverage 0.125. The correct branch target is usually buried behind several global alternatives, so this is output-binding evidence rather than a near-miss rank problem.
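
Target-rank diagnostics of this kind can be sketched as follows. The definitions are assumptions: rank is taken as 1-based position by descending logit, and top-k coverage as the share of branch contexts whose target sits within the top k. The logits are toy values.

```python
def target_rank(logits, target_token):
    """1-based rank of the target when tokens are sorted by descending logit."""
    target_logit = logits[target_token]
    return 1 + sum(1 for value in logits.values() if value > target_logit)


def rank_profile(examples, ks=(3, 5)):
    """Average target rank plus top-k coverage over (logits, target) pairs."""
    ranks = [target_rank(logits, target) for logits, target in examples]
    profile = {"average_target_rank": sum(ranks) / len(ranks)}
    for k in ks:
        profile[f"top{k}_coverage"] = sum(rank <= k for rank in ranks) / len(ranks)
    return profile


logits = {"a": 3.0, "b": 2.0, "c": 1.0, "d": 0.0}
examples = [(logits, "a"), (logits, "d")]
profile = rank_profile(examples)
```

An average rank far above k together with low top-k coverage is the "buried target" pattern described above, as opposed to a near-miss where targets sit just below the winner.
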
runs/transformer-answer-v0.46-output-binding-rankscore-smoke-dim4-context80/ adds branch-output-binding-unlikelihood, combining branch target softmax with representation contrast, and makes best-snapshot scoring rank-aware. It ran 20/20 direct steps with output bias frozen. QA average target rank improved 17.375 -> 14.125, and QA/heldout top-5 coverage reached 0.25. Target-token coverage stayed 0.0, top-3 coverage ended 0.0, and the branch prediction still collapsed to wrong tokens, so the repair is rejected for promotion.
runs/transformer-answer-v0.47-rank-margin-steps50-smoke-dim4-context80/ adds branch-rank-margin-unlikelihood, which pushes each branch target above the model's own top wrong tokens. The screen ran 50/50 direct steps, restored the rank-aware best snapshot from step 40, and improved QA average target rank 17.375 -> 9.0. QA target-token coverage rose to 0.125, top-3 coverage rose to 0.25, and top-5 coverage rose to 0.5. It is still rejected because predicted diversity stayed 1/8 and QA/heldout remained collapsed to wrong "n".
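
Pushing a branch target above the model's own top wrong tokens can be sketched as a smooth margin loss over hard negatives. The softplus form, the margin value, and the function name are assumptions; the hard_negatives parameter loosely mirrors the idea behind --direct-answer-hard-negatives, whose exact semantics are not specified here.

```python
import math


def rank_margin_loss(logits, target_token, hard_negatives=3, margin=1.0):
    """Mean softplus margin loss of the target against the top wrong tokens."""
    wrong_logits = sorted(
        (value for token, value in logits.items() if token != target_token),
        reverse=True,
    )[:hard_negatives]
    target_logit = logits[target_token]
    penalties = [
        math.log1p(math.exp(margin - (target_logit - wrong)))
        for wrong in wrong_logits
    ]
    return sum(penalties) / len(penalties)


# Target "k" starts buried behind the collapsed winner "n".
buried = {"n": 4.0, "a": 2.0, "k": 0.0, "r": -1.0}
lifted = dict(buried, k=6.0)

loss_buried = rank_margin_loss(buried, "k")
loss_lifted = rank_margin_loss(lifted, "k")
# Concentrating all pressure on the single top wrong token.
loss_top1_only = rank_margin_loss(buried, "k", hard_negatives=1)
```
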
runs/transformer-answer-v0.48-balanced-rank-margin-smoke-dim4-context80/ combines target-balanced branch batches with the same rank-margin repair. It ran 50/50 direct steps and reached QA predicted diversity 2/8, target-token coverage 0.125, average target rank 9.375, top-3 coverage 0.375, and top-5 coverage 0.5. It is still rejected because QA and heldout remain wrong top-1 branch choices.
runs/transformer-answer-v0.49-balanced-rank-margin-top1-smoke-dim4-context80/ tests the same balanced rank-margin path with --direct-answer-hard-negatives 1, concentrating margin pressure on only the current top wrong token. It restored from step 10; QA target-token coverage stayed 0.125, but average target rank regressed to 12.5, top-3 coverage fell to 0.125, and top-5 coverage fell to 0.25. This is rejected evidence.
runs/transformer-answer-v0.50-balanced-topk-softmax-w5-smoke-dim4-context80/ adds branch-balanced-topk-softmax-unlikelihood, where each correct branch target competes in a restricted softmax against the model's current top wrong tokens. It restored from step 40; QA target-token coverage stayed 0.125, average target rank improved to 8.75, top-3 coverage reached 0.375, and top-5 coverage reached 0.5. This recovers rank/top-k evidence after v0.49, but prediction diversity stayed 1/8 and top-1 branch choices remained wrong, so it is rejected repair evidence.
runs/transformer-v0.51-foundation-stack-smoke/ verifies the full transformer foundation stack before the next direct-answer repair run. It ran 2/2 language-model steps with AdamW, gradient accumulation, two attention heads, RMSNorm, gated MLPs, tied output embeddings, rotary positions, and cache-aware generation metadata. The run wrote a quarklm-transformer-v2 checkpoint, optimizer_state.json, eval.json, and replayable eval_samples.jsonl traces. This is mechanics-readiness evidence, not model-quality promotion evidence.
runs/transformer-answer-v0.52-fullstack-topk-softmax-smoke-dim4-context80/ reruns the v0.50 top-k branch objective under the full v0.51 stack. It completed 50/50 direct steps and restored to step 0: the full-stack baseline had QA and heldout predicted diversity 3/8 and target-token coverage 0.25, but training collapsed to one wrong token at later snapshots. This rejects unchanged top-k pressure under the full stack and points the next repair toward prompt-context-to-target-token binding.
runs/transformer-answer-v0.53-fullstack-bidir-binding-smoke-dim4-context80/ adds branch-balanced-bidirectional-binding-unlikelihood. The objective trains each prompt context to choose its own branch target and each target token to assign cross-context probability mass back to its own prompt contexts. The focused transformer unit test verifies that context-ownership signal on a small branch batch. The full-stack screen completed 50/50 direct steps and restored from step 40: QA average target rank improved to 7.875 with top-5 coverage 0.5, but target-token coverage ended at 0.125 and the diversity target still failed 0/9 multi-target profiles. This is partial rank-pressure evidence, not promotion evidence.
runs/transformer-answer-v0.54-fullstack-coverage-binding-smoke-dim4-context80/ adds branch-balanced-coverage-binding-unlikelihood, which makes every branch target compete against sibling branch targets and hard wrong tokens while adding a target-set mass coverage guard. The focused transformer test verifies that this pressure lifts target-set mass against hard wrong tokens. The full-stack screen completed 50/50 direct steps but restored from step 0: training snapshots improved QA average target rank to 8.125, but target-token coverage collapsed to 0.0 and top-1 predictions collapsed to wrong "a". This rejects the bundled coverage-binding loss under the full stack.
runs/transformer-answer-v0.55-fullstack-target-set-coverage-smoke-dim4-context80/ isolates target-set coverage with branch-balanced-target-set-coverage-unlikelihood, positive target CE disabled, and no exact-target row or cross-context ownership losses. The focused transformer test verifies that target-set mass can increase against hard wrong tokens without asserting exact-target sharpening. The full-stack screen completed 50/50 direct steps and restored from step 0: training snapshots improved QA average target rank to 10.0, but target-token coverage still collapsed to 0.0 with wrong "a" top-1 predictions. This rejects batch-local target-set mass as a sufficient coverage repair.
runs/transformer-answer-v0.57-fullstack-target-diversity-smoke-dim4-context80/ adds target-share anti-collapse pressure with branch-balanced-target-diversity-unlikelihood, positive target CE disabled, and hard wrong-token competition. The focused transformer test verifies that restricted target-set mass and weakest target-share balance can both improve in a small branch batch. The full-stack screen completed 50/50 direct steps and restored from step 0: training snapshots improved QA average target rank to 10.0, but target-token coverage again collapsed to 0.0 with wrong "a" top-1 predictions. This rejects batch-local target sharing as a sufficient eval-wide anti-collapse repair.
runs/transformer-answer-v0.58-fullstack-target-replay-coverage-smoke-dim4-context80/ extends the repair from batch-local target sharing to closed-world replay targets with branch-balanced-target-replay-coverage-unlikelihood, positive target CE disabled, and hard wrong-token competition. The focused transformer test verifies that replay target-set mass and weakest missing-target share can both improve when the sampled branch batch omits some admitted pool targets. The full-stack screen completed 50/50 direct steps and restored from step 0: training snapshots improved QA average target rank as far as 6.875 and top-5 coverage to 0.5, but target-token coverage still hit 0.0 during training and QA/heldout top-1 predictions collapsed to wrong "n" by step 50. This rejects pool-owned replay coverage as a sufficient context-specific target-ownership repair.
runs/transformer-answer-v0.59-fullstack-context-replay-coverage-smoke-dim4-context80/ makes replay context-owned with branch-balanced-context-replay-coverage-unlikelihood, positive target CE disabled, and hard wrong-token competition. The focused transformer test verifies that replay target-set mass and weakest owned-target share can both improve on fixed replay contexts. The full-stack screen completed 50/50 direct steps and restored from step 0: training snapshots improved QA average target rank as far as 7.375, QA top-3 to 0.375, QA top-5 to 0.5, and admissions top-5 to 0.5208 by step 50, but target-token coverage still hit 0.0 during training and the diversity target failed 0/9. This rejects context-owned replay coverage as implemented.
runs/transformer-answer-v0.60-fullstack-context-replay-coverage-floor-metadata-smoke-dim4-context80/ adds a profile-wise target-token coverage floor to branch snapshot selection: rank/top-k gains are eligible only when every multi-target profile preserves its baseline coverage. Direct-answer JSONL snapshots now write branch_target_coverage_by_profile, and the focused transformer test rejects a rank-lifted candidate that regresses QA coverage. The clean full-stack screen completed 50/50 direct steps, wrote 7 JSONL rows, and restored from step 0: the baseline coverage floor remained visible in the final row (qa 0.25, heldout 0.25, admissions 0.1429, minimum profile 0.0714). This accepts the self-improvement gate repair while still rejecting the trained model behavior.
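
The profile-wise floor can be sketched as an eligibility filter applied before rank-aware scoring. The snapshot layout, the "lowest average target rank wins" score, and the step-40 numbers are assumptions; the baseline coverage values reuse the qa, heldout, and admissions figures reported above.

```python
def preserves_floor(baseline_coverage, candidate_coverage):
    """True when every profile keeps at least its baseline coverage."""
    return all(
        candidate_coverage.get(profile, 0.0) >= floor
        for profile, floor in baseline_coverage.items()
    )


def select_best_snapshot(snapshots):
    """Pick the lowest average target rank among floor-preserving snapshots.

    The first snapshot is treated as the baseline and is always eligible.
    """
    baseline = snapshots[0]
    eligible = [
        snap for snap in snapshots
        if preserves_floor(baseline["coverage"], snap["coverage"])
    ]
    return min(eligible, key=lambda snap: snap["average_target_rank"])


snapshots = [
    {"step": 0, "average_target_rank": 12.0,
     "coverage": {"qa": 0.25, "heldout": 0.25, "admissions": 0.1429}},
    # Hypothetical rank-lifted candidate that regresses QA coverage.
    {"step": 40, "average_target_rank": 8.0,
     "coverage": {"qa": 0.125, "heldout": 0.25, "admissions": 0.1429}},
]

restored = select_best_snapshot(snapshots)
```

Here the step-40 candidate has the better rank but falls below the QA floor, so selection restores step 0.
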
runs/transformer-answer-v0.61-fullstack-context-coverage-anchor-smoke-dim4-context80/ adds a covered-target anchor to context replay: replay branches whose own target is already top-1 receive extra target-vs-replay-target/hard-wrong pressure. The focused transformer test verifies that the anchor protects a covered branch better than identical replay training without the anchor. The full-stack screen completed 50/50 direct steps and restored from step 0 under the v0.60 coverage floor, but trained snapshots over-anchored the already-covered wrong "i" token: QA/heldout predicted diversity fell to 1/8, target-token coverage to 0.125, and average target rank above 21. This rejects global covered-target anchoring as implemented.
runs/transformer-answer-v0.62-fullstack-target-balanced-anchor-smoke-dim4-context80/ makes covered-target anchoring target-balanced: anchor losses are averaged by covered target and skipped when only one covered target is present. The focused transformer test verifies that this singleton guard skips the v0.61 one-token over-anchor while the old global anchor still raises that token. The full-stack screen completed 50/50 direct steps and restored from step 0 under the v0.60 coverage floor. It avoided the hard "i" attractor, but QA/heldout target-token coverage still collapsed to 0.0 during training. This rejects target-balanced anchoring as sufficient.
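
Target-balanced averaging with a singleton guard can be sketched as below. The input shape (pairs of covered target and per-record anchor loss) and the function name are assumptions made for illustration.

```python
from collections import defaultdict


def target_balanced_anchor_loss(anchor_losses):
    """Average anchor losses per covered target, then across targets.

    Returns 0.0 when fewer than two covered targets are present, so a single
    covered token cannot be reinforced on its own (the singleton guard).
    """
    by_target = defaultdict(list)
    for target, loss in anchor_losses:
        by_target[target].append(loss)
    if len(by_target) < 2:
        return 0.0
    per_target = [sum(values) / len(values) for values in by_target.values()]
    return sum(per_target) / len(per_target)


# Only "i" is covered: the guard skips the anchor entirely.
singleton = target_balanced_anchor_loss([("i", 1.0), ("i", 3.0)])

# Two covered targets: each target contributes equally regardless of count.
balanced = target_balanced_anchor_loss([("i", 1.0), ("i", 3.0), ("k", 4.0)])
```
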
runs/transformer-answer-v0.64-fullstack-coverage-deficit-smoke-dim4-context80/ adds branch-balanced-context-coverage-deficit-unlikelihood, which computes replay target tokens that are absent from the current replay predictions and adds target pressure only for those missing targets. The focused transformer test verifies that the deficit term lifts a missing replay target above the old context replay objective. The full-stack screen completed 50/50 direct steps and restored from step 0 under the v0.60 coverage floor. Step 50 cracked QA top-1 behavior enough to reach 1/8 branch accuracy and predicted diversity 4/8, but QA/heldout target-token coverage regressed to 0.125, so the trained snapshots remained ineligible. This rejects deficit pressure by itself.
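
The deficit computation can be sketched as a set difference between replay targets and current replay predictions, with extra weight applied only to the missing targets. The weighting scheme and all names are assumptions.

```python
def missing_replay_targets(replay_targets, current_predictions):
    """Replay target tokens that no current replay prediction produces."""
    return sorted(set(replay_targets) - set(current_predictions))


def deficit_weights(replay_targets, current_predictions, extra=1.0):
    """Per-record loss weights: add pressure only where the target is missing."""
    missing = set(missing_replay_targets(replay_targets, current_predictions))
    return [1.0 + extra if target in missing else 1.0 for target in replay_targets]


replay_targets = ["k", "r", "k", "b"]
current_predictions = ["k", "k", "k", "k"]  # collapsed onto "k"

missing = missing_replay_targets(replay_targets, current_predictions)
weights = deficit_weights(replay_targets, current_predictions)
```
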
runs/transformer-answer-v0.65-fullstack-coverage-preserving-deficit-smoke-dim4-context80/ adds branch-balanced-context-coverage-preserving-deficit-unlikelihood, which balances missing-target deficit pressure with preservation anchors for target tokens currently represented in replay predictions. Focused tests pass and verify both effects in isolation. The full-stack screen completed 50/50 direct steps and restored from step 0. Step 50 improved QA average target rank to 7.75, heldout average target rank to 7.125, and top-5 coverage to 0.5, but both profiles collapsed to one predicted target token with target-token coverage 0.125. This rejects current-prediction preservation as implemented.
runs/transformer-answer-v0.67-profile-aware-replay-plan-smoke-dim4-context80/ adds profile-aware replay records and direct_answer_replay_plan.json for the preserving-deficit path. Focused tests verify that global target coverage cannot hide a profile-local missing target and that profiled replay records keep their admitted source keys even when branch target tokens are shared. The bounded smoke wrote a plan for 9144 branch/replay records across 21 profiles, passed the branch-context gate across 219/219 semantic records, and showed profile-specific coverage floors such as qa:place at 0.5 and qa:color at 0.0. It ran one branch-only direct step and restored from step 0; branch diversity still failed 0/9 multi-target profiles. This is mechanics-readiness evidence, not model-quality promotion evidence.
runs/transformer-answer-v0.68-fullstack-profile-aware-preserving-deficit-smoke-dim4-context80/ spends that replay plan on the comparable full-stack repair screen. The run completed 50/50 direct steps, wrote 7 direct-answer JSONL rows, passed the branch-context gate, and used a replay plan for 9144 branch records across 21 profiles. Training step 40 improved QA average target rank to 6.5 and top-5 coverage to 0.625; heldout average rank improved to 6.875 with top-5 coverage 0.5. Those rank gains came with QA/heldout target-token coverage regressing to 0.125 and predicted diversity collapsing to 1/8, so best-snapshot scoring restored step 0. This is rejected evidence.

v0.81 keeps the context-coverage audit, profile-wise coverage floor, and replay-plan artifact, then adds balanced profile target-share pressure inside the profile-local direct-answer objective. Focused tests verify the minority replay target gains more share than under the previous profile-aware replay loss. v0.82 then screens that objective in runs/transformer-answer-v0.82-fullstack-profile-target-share-smoke-dim4-context80/. The run records the modern artifact stack and passes the verifier, branch-context, purity, and coverage-preservation gates, but it still fails branch diversity. Step 40 improves QA average target rank to 9.125, yet does so by collapsing QA and heldout to one "c" prediction with 0.0 target-token coverage. Best-snapshot scoring restores step 0, so this is rejected evidence.

v0.83 adds prompt-specific sibling-target ownership margins on top of that profile target-share objective. Focused tests show the new term lifts a context-specific target more than v0.82 target-share pressure. The full screen in runs/transformer-answer-v0.83-fullstack-prompt-ownership-smoke-dim4-context80/ writes the modern artifacts and passes the verifier, branch-context, and purity gates. It still fails branch diversity: step 50 improves QA average target rank to 8.625, but QA and heldout collapse to one "c" prediction with 0.0 target-token coverage during training. Best-snapshot scoring restores step 0.

v0.84 anchors replay preservation to the baseline profile-aware replay predictions captured before direct-answer training. Focused tests show replay batches can use those baseline prediction overrides and that anchored preservation protects a covered target better than following current prediction drift. The full screen in runs/transformer-answer-v0.84-fullstack-baseline-anchored-prompt-ownership-smoke-dim4-context80/ records 562 active baseline prediction anchors, passes the verifier, branch-context, and purity gates, and avoids the v0.83 zero-coverage collapse. Step 40 improves QA average target rank to 8.0, but QA and heldout still collapse to one "i" prediction with target-token coverage 0.125, below the baseline 0.25 floor. Best-snapshot scoring restores step 0, so the next repair must preserve the full baseline target-token floor.

v0.85 adds a baseline-floor update guard around the baseline-anchored prompt-ownership mode. The full screen in runs/transformer-answer-v0.85-fullstack-baseline-floor-gated-prompt-ownership-smoke-dim4-context80/ records 562 active baseline prediction anchors and checks 50/50 attempted direct-answer updates. The guard rejects all 50 unsafe updates, preserving QA and heldout target-token coverage at the baseline 0.25 floor in every recorded snapshot. It is still rejected evidence: no weight update is accepted and branch diversity still fails across all 9 multi-target profiles.

v0.86 adds adaptive retries around that update guard. The full screen in runs/transformer-answer-v0.86-fullstack-baseline-floor-adaptive-prompt-ownership-smoke-dim4-context80/ records 562 active baseline prediction anchors and attempts 200 scaled updates across 50 checked direct-answer steps. Scales 1.0, 0.25, 0.05, and 0.01 all still violate at least one profile-wise baseline coverage floor, so the guard rejects 200/200 attempts. It is still rejected evidence: step-size retry alone does not produce accepted safe updates.
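
The retry accounting (4 scales across 50 checked steps gives 200 attempts) can be sketched as a guard that tries each scale in order and keeps the old weights when every scale is unsafe. The guard signature, the toy weights, and the floor predicates are assumptions; only the scale list and the step count come from the run description.

```python
SCALES = (1.0, 0.25, 0.05, 0.01)


def guarded_update(weights, update, violates_floor, scales=SCALES):
    """Try the update at each scale; accept the first one that keeps the floor.

    Returns (weights, accepted_scale, attempts). When every scale violates the
    floor, the original weights are returned and accepted_scale is None.
    """
    attempts = 0
    for scale in scales:
        attempts += 1
        candidate = [w + scale * u for w, u in zip(weights, update)]
        if not violates_floor(candidate):
            return candidate, scale, attempts
    return weights, None, attempts


def always_violates(_candidate):
    """Stand-in floor check that rejects every candidate."""
    return True


weights = [0.1, -0.2]
rejected_attempts = 0
accepted_updates = 0
for _ in range(50):
    weights, accepted_scale, attempts = guarded_update(
        weights, [0.5, 0.5], always_violates
    )
    if accepted_scale is None:
        rejected_attempts += attempts
    else:
        accepted_updates += 1

# Hypothetical floor that tolerates only very small movement of weight 0.
_, small_scale, small_attempts = guarded_update(
    [0.1, -0.2], [0.5, 0.5], lambda cand: abs(cand[0] - 0.1) > 0.01
)
```

The function accepts any scale tuple, so a finer schedule can be passed in without changing the guard itself.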

v0.87 adds one bounded baseline-covered anchor repair after each unsafe adaptive retry. The clean full screen in runs/transformer-answer-v0.87-fullstack-baseline-floor-repaired-prompt-ownership-clean-smoke-dim4-context80/ records 562 active baseline prediction anchors, 227 repair anchors, 200 repair attempts, and 200/200 rejected update attempts. QA and heldout coverage remain at 0.25, but no repaired update is accepted, so post-update repair is also rejected as the missing mechanic.

v0.88 moves balanced baseline-floor anchors into the direct-answer objective itself. The full screen in runs/transformer-answer-v0.88-fullstack-baseline-floor-objective-prompt-ownership-smoke-dim4-context80/ records 562 active baseline prediction anchors, 227 objective-side floor anchors, 200 objective anchor batches, 2400 anchor records, and 200/200 rejected update attempts. QA and heldout coverage remain at 0.25, but no objective-shaped update is accepted, so branch-pressure coupling is rejected as the missing mechanic.

v0.89 isolates baseline-floor stabilization updates. The full screen in runs/transformer-answer-v0.89-fullstack-baseline-floor-stabilization-smoke-dim4-context80/ records 562 active baseline prediction anchors, 227 stabilization anchors, 200 stabilization anchor batches, 2400 anchor records, and 200/200 rejected update attempts. QA and heldout coverage remain at 0.25, but no stabilization-only update is accepted, so the next repair should diagnose why floor-only updates still violate the baseline floor.

v0.90 adds the missing rejection diagnosis. The full screen in runs/transformer-answer-v0.90-fullstack-baseline-floor-stabilization-diagnostics-smoke-dim4-context80/ records rejected update-shape counts of stabilization: 200, 50 rejected attempts at each adaptive scale, violation profile counts of heldout: 200, and a worst floor deficit of 0.25 on the learning profile. Promotion still rejects the transformer, but the next repair now has measured profile-level floor evidence.

v0.91 applies that evidence by covering the full baseline-covered profile-target floor surface. The full screen in runs/transformer-answer-v0.91-fullstack-baseline-floor-profile-targeted-stabilization-smoke-dim4-context80/ records 227 floor anchors, 12 profile-target groups, profile_targeted_stabilization: 200 rejected attempts, and the same violation profile counts as v0.90. Promotion still rejects the transformer, and the next repair must change the floor repair shape rather than only broaden anchor coverage.

v0.92 changes that shape to sequential source-profile floor repair. The full screen in runs/transformer-answer-v0.92-fullstack-baseline-floor-sequential-profile-stabilization-smoke-dim4-context80/ records 10 source-profile groups, 2000 profile-local repair attempts, 2000 profile-local rejections, and 200 no-effective-update outer attempts. Promotion still rejects the transformer, and the next repair must isolate floor-preserving weight movement rather than only broaden coverage or reorder profiles.

v0.93 calibrates that movement below 0.01. The diagnostic screen in runs/transformer-answer-v0.93-baseline-floor-calibrated-sequential-profile-stabilization-step1-dim4-context80/ records calibrated scales down to 0.0001, 50 profile-local repair attempts, 49 profile-local rejections, and one accepted nonzero bridge:owner update at scale 0.0025. Promotion still rejects the transformer on branch_diversity_target, but the baseline floor guard has now accepted real weight movement.

v0.94 adds profile-scale memory to that calibrated path. The diagnostic screen in runs/transformer-answer-v0.94-baseline-floor-profile-scale-calibrated-sequential-stabilization-step1-dim4-context80/ records one accepted outer profile-scale update, 60 profile-scale attempts, 8 accepted source-profile updates, and 52 rejected profile-scale attempts. Promotion still rejects the transformer on branch_diversity_target, but safe floor-preserving movement now spans multiple source profiles.

v0.95 adds diversity-aware profile-scale memory to that path. The diagnostic screen in runs/transformer-answer-v0.95-baseline-floor-diversity-profile-scale-calibrated-sequential-stabilization-configured-step1-dim4-context80/ records one accepted outer diversity-aware profile-scale update, 58 profile-scale attempts, 5 score-improving accepted source-profile updates, 42 floor regressions, and 11 floor-preserving diversity-score regressions. Promotion still rejects the transformer on branch_diversity_target, but the training loop now records which safe movements are non-regressive for branch diversity.

v0.96 adds missing-target frontier anchors to that path. The diagnostic screen in runs/transformer-answer-v0.96-baseline-floor-diversity-frontier-profile-scale-calibrated-sequential-stabilization-step1-dim4-context80/ records 52 frontier anchors, one accepted outer frontier profile-scale update, 43 profile-scale attempts, 9 score-improving accepted source-profile updates, 28 floor regressions, and 6 floor-preserving diversity-score regressions. Promotion still rejects the transformer on branch_diversity_target, but max dominant predicted rate improves to 0.9 and minimum target-token coverage improves to 0.1667.

v0.97 adds coverage-frontier acceptance to that path. The diagnostic screen in runs/transformer-answer-v0.97-baseline-floor-diversity-coverage-frontier-profile-scale-calibrated-sequential-stabilization-step1-dim4-context80/ records one accepted outer coverage-frontier profile-scale update, 68 profile-scale attempts, 1 coverage-gaining accepted source-profile update, 50 floor regressions, 15 coverage ties, and 2 coverage regressions. Promotion still rejects the transformer on branch_diversity_target, but the update guard now records accepted coverage deltas and proves the strict monotonic screen is currently too conservative for full missing-target repair.

v0.98 adds coverage-prep frontier acceptance to that path. The diagnostic screen in runs/transformer-answer-v0.98-baseline-floor-diversity-coverage-prep-frontier-profile-scale-calibrated-sequential-stabilization-step1-dim4-context80/ records one accepted outer coverage-prep profile-scale update, 43 profile-scale attempts, 9 accepted source-profile updates, 3 coverage gains, 6 coverage-preparation moves, 28 floor regressions, 4 coverage ties without score gain, and 2 coverage regressions. Promotion still rejects the transformer on branch_diversity_target, but the update guard now separates direct coverage gains from safe preparation moves.

v0.99 adds coverage-recovery frontier retry to that path. The diagnostic screen in runs/transformer-answer-v0.99-baseline-floor-diversity-coverage-recovery-frontier-profile-scale-calibrated-sequential-stabilization-step1-dim4-context80/ records one accepted outer coverage-recovery profile-scale update, 54 profile-scale attempts, 6 accepted source-profile updates, 6 prepared recovery candidates, 15 recovery retries, 2 direct coverage recoveries, 4 preparation fallbacks, 38 floor regressions, 7 coverage ties without score gain, and 3 coverage regressions. Promotion still rejects the transformer on branch_diversity_target, but the guard now proves preparation can be tested as direct missing-target recovery before it is admitted as self-improvement evidence.

v0.100.0 adds branch-stable coverage-recovery acceptance to that path. The diagnostic screen in runs/transformer-answer-v0.100.0-baseline-floor-diversity-branch-stable-coverage-recovery-frontier-profile-scale-calibrated-sequential-stabilization-step1-dim4-context80/ records one accepted outer branch-stable recovery update, 54 profile-scale attempts, 6 accepted source-profile updates, 6 prepared recovery candidates, 15 branch-stability checks, 2 branch-stable coverage recoveries, 4 preparation fallbacks, 7 floor-regressed recovery retries, 5 coverage-tied retries, and 1 branch-score regression rejection. Promotion still rejects the transformer on branch_diversity_target, but the guard now proves recovery can be checked against the prepared branch-diversity score instead of coverage alone.
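As a rough illustration of what checking recovery "against the prepared branch-diversity score instead of coverage alone" could look like, the sketch below classifies one recovery retry. It is not the project's guard: the `Snapshot` fields, the order of the checks, and the label strings are all assumptions made for this example, loosely modeled on the rejection categories listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical metrics captured before and after a recovery retry."""
    coverage: float      # share of missing targets now covered
    floor: float         # coverage floor that must not regress
    branch_score: float  # branch-diversity score of the prepared state


def classify_recovery(prepared: Snapshot, candidate: Snapshot) -> str:
    """Label a retry; only 'branch_stable_recovery' would be accepted."""
    if candidate.floor < prepared.floor:
        return "floor_regression"
    if candidate.coverage < prepared.coverage:
        return "coverage_regression"
    if candidate.coverage == prepared.coverage:
        return "coverage_tie"
    # Coverage improved: additionally require that the branch-diversity
    # score did not drop relative to the prepared state.
    if candidate.branch_score < prepared.branch_score:
        return "branch_score_regression"
    return "branch_stable_recovery"


prepared = Snapshot(coverage=0.25, floor=0.1, branch_score=0.4)
print(classify_recovery(prepared, Snapshot(0.5, 0.1, 0.4)))
print(classify_recovery(prepared, Snapshot(0.5, 0.1, 0.3)))
```

The point of the extra branch-score check is that a retry can gain coverage and still be rejected, which matches the single branch-score regression rejection recorded for this screen.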

v0.101.0 adds branch-diversity recovery after already-safe profile updates. The diagnostic screen in runs/transformer-answer-v0.101.0-baseline-floor-diversity-branch-diversity-recovery-frontier-profile-scale-calibrated-sequential-stabilization-step1-dim4-context80/ records one accepted outer branch-diversity recovery update, 52 profile-scale attempts, 6 accepted source-profile updates, 6 branch-diversity recovery candidates, 9 branch-diversity recovery attempts, 5 branch-score-improving refinements, 1 fallback, 1 floor-regression rejection, 1 score-regression rejection, and 2 score-tie rejections. Promotion still rejects the transformer on branch_diversity_target, but the guard now proves local branch-diversity score can improve without weakening the coverage floor.

v0.102.0 adds collapsed-profile binding after branch-diversity recovery. The diagnostic screen in runs/transformer-answer-v0.102.0-baseline-floor-diversity-collapsed-profile-binding-frontier-profile-scale-calibrated-sequential-stabilization-step1-dim4-context80/ records one accepted outer collapsed-profile binding update, 54 profile-scale attempts, 11 accepted source-profile updates, 11 branch-diversity recovery candidates, 26 branch-diversity recovery attempts, 4 branch-score refinements, 31 collapsed-profile binding attempts, 1 accepted binding update, 10 binding fallbacks, 27 collapsed-profile ties, 1 floor-regression rejection, and 2 score-regression rejections. Promotion still rejects the transformer on branch_diversity_target, but the guard now proves a targeted binding update can survive while final collapse narrows from 9/9 eval profiles at baseline to 3/9 remaining collapsed profiles: learning, owner, and paraphrases.

v0.103.0 adds remaining-profile binding after collapsed-profile binding. The diagnostic screen in runs/transformer-answer-v0.103.0-baseline-floor-diversity-remaining-profile-binding-frontier-profile-scale-calibrated-sequential-stabilization-step1-dim4-context80/ records one accepted outer remaining-profile binding update, 56 profile-scale attempts, 11 accepted source-profile updates, 21 prioritized remaining-profile attempts, 6 prioritized acceptances, 15 prioritized rejections, 3 branch-diversity refinements, and 2 collapsed-profile binding updates. Promotion still rejects the transformer on branch_diversity_target, but the guard proves the remaining-profile curriculum can improve learning coverage from 0.0 to 0.25 without target coverage regression.

v0.104.0 adds owner/paraphrase residual binding after remaining-profile binding. The diagnostic screen in runs/transformer-answer-v0.104.0-baseline-floor-diversity-owner-paraphrase-binding-frontier-profile-scale-calibrated-sequential-stabilization-step1-dim4-context80/ records one accepted outer owner/paraphrase binding update, 16 owner/paraphrase-prioritized attempts, 6 prioritized acceptances, 10 prioritized rejections, 75 learning-preservation checks, 24 preservation failures, and 33 narrowed collapsed-profile binding rejections. Promotion still rejects the transformer on branch_diversity_target, but learning finishes non-collapsed with coverage 0.25 and predicted diversity 2.

v0.105.0 adds corpus-only retrieval memory as a separate evidence rail before weight consolidation. The diagnostic screen in runs/transformer-answer-v0.105.0-retrieval-memory-owner-paraphrase-frontier-profile-scale-step1-dim4-context80/ writes retrieval_memory_report.json, builds 497 memory cards from the closed-world corpus, answers 219/219 eval probes exactly, and records uses_external_model: false, external_embeddings: false, pretrained_retriever: false, and updates_weights: false. The neural transformer still rejects promotion on branch_diversity_target; v0.105.0 therefore proves immediate memory serving, not completed weight learning.
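A reader auditing retrieval_memory_report.json might verify the four closed-world flags mechanically. The sketch below assumes the flags are top-level boolean keys in the report; the actual file layout is not shown here, and the helper name `closed_world_violations` is hypothetical.

```python
import json

# Flag names as quoted in the paragraph above; each is expected to be false.
REQUIRED_FALSE = (
    "uses_external_model",
    "external_embeddings",
    "pretrained_retriever",
    "updates_weights",
)


def closed_world_violations(report: dict) -> list:
    """Return the flags that are missing or not exactly False."""
    return [key for key in REQUIRED_FALSE if report.get(key) is not False]


# Inline sample standing in for the real report file (assumed top-level keys).
sample = json.loads(
    '{"uses_external_model": false, "external_embeddings": false,'
    ' "pretrained_retriever": false, "updates_weights": false}'
)
print(closed_world_violations(sample))
```

Treating a missing key as a violation is a deliberate choice in this sketch: an absent flag should not count as evidence that the rail is closed-world.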

v0.106.0 adds memory-guided consolidation planning. The diagnostic screen in runs/transformer-answer-v0.106.0-memory-guided-consolidation-owner-paraphrase-frontier-profile-scale-step1-dim4-context80/ writes memory_consolidation_plan.json, keeps retrieval at 219/219, records 9 memory-backed neural failed profiles, and ranks owner, paraphrases, glossary, admission_paraphrases, and admissions as the top consolidation priorities. The collapsed memory-backed profiles are owner, paraphrases, and glossary; neural promotion still rejects on branch_diversity_target.

v0.107.0 adds gated memory-consolidation training. The diagnostic screen in runs/transformer-answer-v0.107.0-gated-memory-consolidation-owner-paraphrase-glossary-frontier-profile-scale-step1-dim4-context80/ consumes the v0.106.0 plan, targets owner, paraphrases, and glossary, records 26 memory-consolidation prioritized attempts with 8 acceptances and 18 rejections, and keeps retrieval at 219/219. The transformer still rejects neural promotion on branch_diversity_target, so this is plan-guided weight-consolidation evidence rather than promoted model evidence.

v0.108.0 expands that consolidation window. The diagnostic screen in runs/transformer-answer-v0.108.0-expanded-memory-consolidation-owner-paraphrase-heldout-qa-glossary-frontier-profile-scale-step1-dim4-context80/ consumes the v0.107.0 plan, targets owner, paraphrases, heldout, qa, and glossary, maps target-only profiles back to admitted source labels, and keeps retrieval at 219/219. Branch-diversity still blocks promotion, which means the next repair needs direct missing first-token diversity pressure.

v0.109.0 adds that direct missing first-token pressure. The diagnostic screen in runs/transformer-answer-v0.109.0-missing-first-token-memory-consolidation-owner-paraphrase-heldout-qa-glossary-frontier-profile-scale-step1-dim4-context80/ consumes the v0.108.0 plan, extracts missing first-token target maps for owner, paraphrases, heldout, qa, and glossary, and records 8 missing-token candidates, 22 attempts, 1 accepted guarded coverage-gain update, 21 rejections, and 7 fallback acceptances. Retrieval remains exact at 219/219; branch-diversity still blocks promotion, and the next plan narrows the collapsed memory-backed profiles to owner, paraphrases, and learning.

v0.110.0 makes that narrowed plan the explicit training contract. The diagnostic screen in runs/transformer-answer-v0.110.0-remaining-collapsed-missing-first-token-memory-consolidation-owner-paraphrase-learning-frontier-profile-scale-step1-dim4-context80/ consumes the v0.109.0 plan, requires source-plan collapsed_memory_backed_profiles, targets only owner, paraphrases, and learning, and records no unconsumed collapsed targets. Retrieval remains exact at 219/219; the missing-token phase records 6 candidates, 16 attempts, 1 accepted guarded coverage-gain update, 15 rejections, and 5 fallback acceptances. Branch-diversity still blocks promotion.

v0.111.0 makes that pressure profile-specific. The diagnostic screen in runs/transformer-answer-v0.111.0-profile-specific-missing-first-token-memory-consolidation-owner-paraphrase-learning-frontier-profile-scale-step1-dim4-context80/ consumes the v0.110.0 plan, keeps the targets owner, paraphrases, and learning, and records the target map learning -> learning, owner -> owner/paraphrases, and color/place/training_data -> paraphrases. Retrieval remains exact at 219/219; memory-prioritized consolidation records 16 attempts with 6 acceptances and 10 rejections, and the missing-token phase records 6 candidates, 18 attempts, 0 direct missing-token acceptances, 18 rejections, and 6 fallbacks. The guard records 1 accepted profile-specific update shape, but branch-diversity still blocks promotion.

v0.112.0 adds branch-diversity root-cause diagnostics before another repair objective. The diagnostic screen in runs/transformer-answer-v0.112.0-branch-diversity-root-cause-profile-specific-memory-consolidation-step1-dim4-context80/ consumes the v0.111.0 plan, targets owner, paraphrases, and glossary, keeps retrieval exact at 219/219, records 24 profile-specific missing-token attempts with 0 direct acceptances and 8 fallbacks, and classifies the final branch-diversity failure as a critical target_routing_gap. The root-cause report records 9/9 failed profiles, 3 collapsed profiles, 1 zero-coverage profile, 6 buried-target profiles, and reused dominant tokens "n" and "a". Branch-diversity still blocks promotion.

v0.113.0 adds branch routing audit diagnostics to the same branch-only screen surface. The diagnostic screen in runs/transformer-answer-v0.113.0-branch-routing-audit-profile-specific-memory-consolidation-step1-dim4-context80/ consumes the v0.112.0 plan, targets owner, paraphrases, and learning, keeps retrieval exact at 219/219, records 18 profile-specific missing-token attempts with 0 direct acceptances and 6 fallbacks, and keeps branch-diversity as the blocker. The routing audit reports high output-bias escape risk ("n" bias rank 2), low representation separation across 9/9 multi-target profiles, and a glossary target-imbalance hotspot.

v0.114.0 adds logit-prior and centroid-separation instrumentation to the same screen surface. The diagnostic screen in runs/transformer-answer-v0.114.0-logit-prior-representation-instrumentation-profile-specific-memory-consolidation-step1-dim4-context80/ consumes the v0.113.0 plan, targets owner, paraphrases, and glossary, keeps retrieval exact at 219/219, records 24 profile-specific missing-token attempts with 0 direct acceptances and 8 fallbacks, and keeps branch-diversity as the blocker. The new logit-prior profiles report hidden-projection pressure across 9/9 multi-target profiles, while centroid separation remains poor.

v0.115.0 adds a bias-frozen hidden-projection margin candidate. The candidate screen in runs/transformer-answer-v0.115.0-hidden-projection-margin-candidate-step1-dim4-context80/ introduces branch-hidden-projection-margin-unlikelihood and tests one direct-answer step that compares target-token hidden * output_weight contributions directly. The step lowers the average collapsed-token hidden advantage from about 0.0842 to 0.0736, but promotion remains blocked before quality metrics are evaluated: 10/11 constraints pass, branch_diversity_target fails, all 9 multi-target profiles still collapse to "n", and 2 profiles still have zero target-token coverage.
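To make "hidden * output_weight contributions" concrete, the sketch below computes a bias-free logit contribution for two tokens and the resulting collapsed-token advantage. It is an illustration under stated assumptions, not the branch-hidden-projection-margin-unlikelihood objective itself: the vectors are invented, the 4-dimensional size only echoes the dim4 run name, and the hinge penalty with its `margin` value is a hypothetical stand-in.

```python
def hidden_contribution(hidden, output_weight, token):
    """Bias-free logit contribution: dot product of hidden state and token weights."""
    return sum(h * w for h, w in zip(hidden, output_weight[token]))


def collapsed_advantage(hidden, output_weight, target, collapsed):
    """How far the collapsed token's contribution exceeds the target token's."""
    return (hidden_contribution(hidden, output_weight, collapsed)
            - hidden_contribution(hidden, output_weight, target))


def margin_penalty(advantage, margin=0.125):
    """Hypothetical hinge penalty: zero once the target leads by at least `margin`."""
    return max(0.0, margin + advantage)


# Invented 4-dimensional example; "n" plays the collapsed token, "a" the target.
hidden = [0.5, -0.25, 0.0, 1.0]
output_weight = {
    "n": [0.5, 0.0, 0.25, 0.25],
    "a": [0.25, 1.0, 0.0, 0.25],
}

advantage = collapsed_advantage(hidden, output_weight, target="a", collapsed="n")
print(advantage)                  # 0.375
print(margin_penalty(advantage))  # 0.5
```

Leaving the output bias out of the comparison is what "bias-frozen" suggests here: the measurement isolates how much of the collapse comes from the hidden projection rather than from a token's bias term.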

The v0.42 self-improvement run passed:

direct admission-probe audit
admission-paraphrase audit
glossary-probe audit
exact eval audit
promotion gate
forgetting audit against v0.41
protected prompt leakage audit
responder exact evals
learned answer classifier exact evals
generative answer decoder exact evals
rule-based self-diagnosis with no external model

Admission probes now pass 48/48 direct records and 84/84 paraphrase records. Glossary probes now pass 38/38 records. The passing attempt is archived at runs/self-improve-v0.42/attempts/attempt-001/ before the top-level latest report is updated.

v0.23 added attempt archives. A deliberately undertrained attempt failed at runs/self-improve-v0.23/attempts/attempt-001/, and the repaired passing attempt remains at runs/self-improve-v0.23/attempts/attempt-002/.

v0.42 Summary

Unpromoted v0.43 Findings
