Historical Evidence Archive
This page preserves older run evidence that used to live in GOAL.md.
GOAL.md is now the durable goal contract. Current status, release-candidate
posture, and the latest transformer evidence live in
Current evidence, STATUS.md, and
sites/shared/current-state.json.
Early Learned Components
| Run | Archived signal |
|---|---|
runs/context64-v0.2/ | Character model validation NLL moved 3.4968 -> 2.6545; known QA target NLL moved 3.4979 -> 2.4155; held-out target NLL moved 3.4978 -> 2.5788; free-form exact remained 0. |
runs/answer-v0.1/ | Learned answer model moved QA, unknown, held-out, and paraphrase exactness from weak baselines to full exactness before stricter unseen-paraphrase tightening. |
runs/answer-v0.2/ | Learned answer model passed stricter unseen paraphrase probes: QA 8/8, unknown 4/4, held-out 8/8, paraphrase 8/8. |
runs/decoder-v0.2/ | Generative answer decoder moved from 0/8, 0/4, 0/8, 0/8 exactness to QA 8/8, unknown 4/4, held-out 8/8, paraphrase 8/8. |
Early Self-Improvement Runs
| Run | Archived signal |
|---|---|
runs/self-improve-v0.9/ | Stricter lesson split kept held-out facts out of exact held-out prompt training; prompt leakage audit passed; answer model and decoder passed QA, unknown, held-out, and paraphrase evals. |
runs/self-improve-v0.12/ | Added operational self and learning-admission concepts plus the first admitted memory event; answer model and decoder passed owner, self, learning, and admissions evals. |
runs/self-improve-v0.14/ | Expanded admitted memory log to two facts; admission probes expanded to 8; forgetting and prompt leakage audits passed. |
runs/self-improve-v0.16/ | Moved provenance code into closed_world_lm.provenance; wrote corpus snapshots and diffs; forgetting and leakage audits passed. |
runs/self-improve-v0.17/ | Generated admission probes from corpus/admissions.jsonl; probe sync passed with zero missing, extra, or mismatched ids. |
runs/self-improve-v0.18/ | Renamed the product to QuarkLM, added quark-lm-* script aliases, and generated admission paraphrase probes. |
runs/self-improve-v0.19/ | Added glossary word stone and admitted learned-ivy-stone; direct probes reached 12/12, paraphrase probes 21/21, and bridge lessons protected held-out transfer. |
runs/self-improve-v0.20/ | Generated glossary probes from corpus/glossary.json; glossary probes passed 20/20; exact eval audit and promotion gate passed. |
runs/self-improve-v0.21/ | Added glossary words shell, coin, and drum; admitted three new memories; direct probes reached 24/24, paraphrase probes 42/42, glossary probes 26/26; rule-based self-diagnosis reported uses_external_model: false. |
runs/self-improve-v0.22/ | Expanded operational self facts and learning rules, added self-diagnosis corpus facts, and exposed the need to preserve failed-attempt evidence. |
runs/self-improve-v0.23/ | Attempt archives became part of the loop so failed gates remain preserved instead of being overwritten by repair attempts. |
runs/self-improve-v0.24/ | First transformer architecture work was kept separate from promoted responder evidence. |
runs/self-improve-v0.25/ through runs/self-improve-v0.42/ | Continued the promoted responder track while transformer screens stayed separate until neural promotion gates mature. Current promoted responder evidence remains runs/self-improve-v0.42/. |
Transformer Evidence Index
The transformer run history is now documented primarily in
Transformer, Provenance,
and Current evidence. The old GOAL.md evidence
section included these major phases:
| Phase | Representative runs | Archived signal |
|---|---|---|
| Architecture start | runs/transformer-v0.24/, runs/transformer-v0.25/ | Tiny decoder-only transformer from random weights using the corpus-trained character tokenizer. |
| Answer training start | runs/transformer-answer-v0.26/, runs/transformer-answer-v0.27/ | First transformer answer-training and faster eval-scoped candidate evaluator. |
| Choice/selector path | runs/transformer-answer-v0.28-choice-prefix-pilot/, runs/transformer-answer-v0.29-selector-fast/, runs/transformer-answer-v0.30-selector-emission/ | Candidate-selector evidence improved answer selection while raw greedy generation stayed weak. |
| Generator path | runs/transformer-answer-v0.31-generator-weighted-lr035-80k/ | No-candidate auxiliary generator moved exact generation from 0/219 -> 219/219; this remains generator evidence, not transformer greedy promotion. |
| Direct-answer repair | runs/transformer-answer-v0.32-direct-base-context32/ through runs/transformer-answer-v0.42-branch-repair-contrast50-dim8-context32/ | Direct-answer modes improved distributional metrics and candidate behavior but did not make raw greedy transformer answers reliable. |
| Branch diagnostics | runs/transformer-answer-v0.43-branch-profile-smoke-dim4-context16/ through runs/transformer-answer-v0.43-branch-diversity-target-smoke-dim4-context80/ | Branch profiles, context coverage, and branch-diversity targets exposed prompt-independent first-token collapse. |
| Representation screens | runs/transformer-answer-v0.43-context-mean-branch-batch-smoke-dim4-context16/ through runs/transformer-answer-v0.43-prompt-position-scale32-repcontrast50-smoke-dim4-context80/ | Context summaries, projections, prompt attention, prompt-position projections, and representation contrast moved measured surfaces but did not pass branch diversity. |
| Structure audit and pre-layer norm | STRUCTURE_AUDIT.md, runs/transformer-answer-v0.44-prelayernorm-repcontrast50-prompt-position-smoke-dim4-context80/, runs/transformer-answer-v0.44-target-balanced-prelayernorm-repcontrast50-prompt-position-smoke-dim4-context80/ | Open-source structure was studied as reference only; pre-layer-norm partially cracked non-QA collapse but remained rejected because formal branch-diversity gates failed. |
Archive Rule
Historical evidence should not drift back into GOAL.md or README. Add
version-specific detail to this page only when it is archival context. Add
current release evidence to Current evidence, shared
current state, and the relevant Build or Operate docs.