Provenance
QuarkLM records corpus provenance in each self-improvement report.
corpus_snapshot.json captures:
- ledger source ids
- file paths
- training permissions
- curriculum-generation permissions
- file hashes
- JSONL record counts
- admitted memory ids
corpus_diff.json compares the current snapshot to the previous promoted run.
This makes corpus changes visible next to weight changes and eval changes.
v0.18 added admission-paraphrase-probes-v0 as an eval-only source.
v0.19 changed the glossary, admitted-memory log, and generated admission probes
when learned-ivy-stone entered the corpus.
v0.20 added glossary-probes-v0 as an eval-only source and changed the
glossary source to include explicit glossary probe words.
v0.21 added admitted memories learned-noah-shell, learned-ava-coin, and
learned-omar-drum; changed admission, admission-paraphrase, glossary, and
glossary-probe sources; and kept grammar unchanged.
v0.22 changed grammar, self-probe, and learning-probe sources to teach self-diagnosis source, external-model-shaping policy, and report-evidence repair selection. Admissions and generated admission/glossary probes stayed unchanged.
v0.23 added attempt archives under attempts/attempt-###/, preserving a failed
undertrained attempt and a repaired passing attempt in the same run directory.
v0.24 kept corpus sources unchanged from v0.23, passed the archived
self-improvement run, and added separate transformer architecture evidence under
runs/transformer-v0.24/.
v0.25 added admitted memories learned-mia-ring, learned-leo-kite, and
learned-nina-bell; added glossary probe words ring, kite, and bell;
changed admission, admission-paraphrase, glossary, and glossary-probe sources;
and added transformer architecture evidence under runs/transformer-v0.25/.
v0.26 kept corpus sources unchanged from v0.25, passed the archived
self-improvement run, and added transformer answer-lesson evidence under
runs/transformer-answer-v0.26/.
v0.27 kept corpus sources unchanged from v0.26, passed the archived
self-improvement run, and added context-48 transformer answer evidence plus a
faster eval-scoped candidate evaluator under runs/transformer-answer-v0.27/.
v0.28 added admitted memories learned-sara-marble, learned-milo-spoon, and
learned-ruth-ribbon; added glossary probe words marble, spoon, and
ribbon; changed admission, admission-paraphrase, glossary, and glossary-probe
sources; preserved two failed self-improvement attempts before passing
attempt-003; and added transformer prefix-choice evidence under
runs/transformer-answer-v0.28-choice-prefix-pilot/.
v0.29 kept corpus sources unchanged from v0.28, passed the archived
self-improvement run on attempt-001, and added transformer answer-selector
evidence under runs/transformer-answer-v0.29-selector-fast/.
v0.30 kept corpus sources unchanged from v0.29, passed the archived
self-improvement run on attempt-001, and added transformer selector-emission
evidence under runs/transformer-answer-v0.30-selector-emission/.
v0.31 kept corpus sources unchanged from v0.30, passed the archived
self-improvement run on attempt-001, and added no-candidate transformer-guided
answer-generator evidence under
runs/transformer-answer-v0.31-generator-weighted-lr035-80k/.
v0.32 kept corpus sources unchanged from v0.31, passed the archived
self-improvement run on attempt-001, and added direct greedy transformer
answer-training evidence under
runs/transformer-answer-v0.32-direct-base-context32/. Direct transformer loss
improved, but strict raw greedy exact answers remained 0/219.
v0.33 kept corpus sources unchanged from v0.32, passed the archived
self-improvement run on attempt-001, and added first-error unlikelihood
transformer evidence under
runs/transformer-answer-v0.33-unlikelihood-context32/. Transformer-only
candidate accuracy improved to 37/219, but strict raw greedy exact answers
remained 0/219.
v0.34 kept corpus sources unchanged from v0.33, passed the archived
self-improvement run on attempt-001, and added staged rollout unlikelihood
transformer evidence under runs/transformer-answer-v0.34-staged-context32/.
Direct target loss improved and the repeated-output failure changed shape, but
strict raw greedy exact answers remained 0/219.
v0.35 kept corpus sources unchanged from v0.34, passed the archived
self-improvement run on attempt-001, and added periodic rollout unlikelihood
transformer evidence under
runs/transformer-answer-v0.35-periodic10-context32/. Candidate discrimination
returned to 37/219, but strict raw greedy exact answers remained 0/219.
v0.36 kept corpus sources unchanged from v0.35, passed the archived
self-improvement run on attempt-001, and added periodic early-stop
unlikelihood transformer evidence under
runs/transformer-answer-v0.36-periodic-earlystop10-context32/. Candidate
discrimination stayed at 37/219, answer-target NLL improved to 2.9311, and
strict raw greedy exact answers remained 0/219 with a repeated " a" loop.
v0.37 kept corpus sources unchanged from v0.36, passed the archived
self-improvement run on attempt-001, and added periodic repeat-loop
unlikelihood transformer evidence under
runs/transformer-answer-v0.37-periodic-repeat50-context32/. Candidate
discrimination stayed at 37/219, answer-target NLL improved to 2.9041, and
strict raw greedy exact answers remained 0/219 with a repeated " t" loop.
v0.38 kept corpus sources unchanged from v0.37, passed the archived
self-improvement run on attempt-001, and added periodic balanced repair
transformer evidence under
runs/transformer-answer-v0.38-periodic-balanced50-context32/. Candidate
discrimination stayed at 37/219, answer-target NLL improved to 2.8552, and
strict raw greedy exact answers remained 0/219 with a repeated " t" loop.
v0.39 kept corpus sources unchanged from v0.38, passed the archived
self-improvement run on attempt-001, rejected two generated-prefix recovery
pilots, and added periodic sequence-repair transformer evidence under
runs/transformer-answer-v0.39-periodic-sequence50-context32/. Candidate
discrimination stayed at 37/219, answer-target NLL improved to 2.8257, and
strict raw greedy exact answers remained 0/219 with a repeated " t" loop.
v0.40 kept corpus sources unchanged from v0.39, passed the archived
self-improvement run on attempt-001, rejected a loop-escape-only pilot, kept a
protected sequence-plus-loop pilot as non-selected evidence, and added branch
repair transformer evidence under runs/transformer-answer-v0.40-branch-context32/.
Candidate discrimination stayed at 37/219, answer-target NLL improved to
2.5427, and strict raw greedy exact answers remained 0/219 with a repeated
"ten" loop.
v0.41 kept corpus sources unchanged from v0.40, passed the archived
self-improvement run on attempt-001, rejected a full-dose branch-contrast
pilot, and added sparse branch-repair/contrast transformer evidence under
runs/transformer-answer-v0.41-branch-repair-contrast50-context32/. Candidate
discrimination stayed at 37/219, answer-target NLL improved to 2.4734, and
strict raw greedy exact answers remained 0/219 with a repeated "te"/"e"
loop.
v0.42 kept corpus sources unchanged from v0.41, passed the archived
self-improvement run on attempt-001, and widened the sparse branch-contrast
transformer evidence under
runs/transformer-answer-v0.42-branch-repair-contrast50-dim8-context32/.
Candidate discrimination stayed at 37/219, answer-target NLL improved to
2.4129, and strict raw greedy exact answers remained 0/219 with the short
wrong completion " te.".
Post-v0.42 unpromoted transformer work kept corpus sources unchanged and added
runtime, diagnosis, and rejected checkpoint evidence. The transformer forward
pass now computes only the final position used by the language-model head.
Answer-training artifacts now record prompt context-coverage metrics. The
hard-negative context-32 run
runs/transformer-answer-v0.43-hard-branch-contrast4-dim8-context32/ preserved
37/219 candidate discrimination but regressed loss, NLL, and greedy output.
The context-80 run
runs/transformer-answer-v0.43-branch-repair-contrast50-dim8-context80/
covered all semantic eval templates (219/219) but still trailed v0.42 on
direct loss and answer NLL. The 1500-step context-80 run reached 38/219
candidates but regressed other promotion metrics. Optional layer normalization
was added as a tested architecture flag, but the context-80 screen
runs/transformer-answer-v0.43-layernorm-screen-dim8-context80/ preserved only
37/219 candidates and regressed answer NLL with repeated " y"/"e" greedy
loops. Branch-span repair was added as a tested direct-answer policy, but
runs/transformer-answer-v0.43-branch-span3-screen-dim8-context32/ preserved
only 37/219 candidates and regressed answer NLL with a long "neeee" greedy
loop. Multi-layer transformer support was added as a tested architecture option,
but runs/transformer-answer-v0.43-two-layer-screen-dim8-context32/ was
interrupted before final direct-answer metrics because the full-block scalar
autograd path was too slow for the regular loop. The final stacked layer was
then optimized to compute only the last state with logit-equivalence coverage,
but runs/transformer-answer-v0.43-two-layer-finalopt-screen-dim8-context32/
was still interrupted before final metrics because intermediate full-state
training remains too expensive. A later bounded screen added top-layer-only
direct-answer updates for stacked transformers plus the explicit
--skip-post-direct-snapshot control:
runs/transformer-answer-v0.43-two-layer-toponly-skip-screen-dim8-context32/
completed, saved a checkpoint, recorded that the post-direct candidate snapshot
was skipped, improved direct-answer target loss from 3.5186 to 3.2436, and
still failed direct greedy exact at 0/219 with repeated "a" output. It is
runtime and training-loop evidence, not promotion evidence. Direct-answer
snapshots then gained branch-profile diagnostics under
runs/transformer-answer-v0.43-branch-profile-smoke-dim4-context16/, recording
the model's own branch-position logits, dominant predicted tokens, target-token
distribution, and target margin. The smoke profile showed QA branch accuracy
staying at 1/8 while dominant prediction moved from all "o" to all "y",
which is prompt-independent branch-collapse evidence. Branch-collapse repair
then used the dominant sampled branch token as the unlikelihood negative. The
full-dose smoke at
runs/transformer-answer-v0.43-branch-collapse-smoke-dim4-context16/ regressed
loss and moved collapse to all "a" predictions. The periodic smoke at
runs/transformer-answer-v0.43-periodic-branch-collapse-smoke-dim4-context16/
improved direct loss to 3.5157, but branch accuracy stayed 1/8 and the
dominant prediction moved to all "n". Branch-batch contrast then trained
several distinct target branches in one update. The full-dose smoke at
runs/transformer-answer-v0.43-branch-batch-smoke-dim4-context16/ improved
loss only slightly and moved collapse to all "y" predictions. The periodic
smoke at
runs/transformer-answer-v0.43-periodic-branch-batch-smoke-dim4-context16/
improved direct loss to 3.5248, but QA branch accuracy regressed to 0/8
and the dominant prediction moved to all "a". A representation-side
context-mean option was then added without changing corpus sources. The
branch-batch screen
runs/transformer-answer-v0.43-context-mean-branch-batch-smoke-dim4-context16/
improved direct loss to 3.5252, and the branch-repair screen
runs/transformer-answer-v0.43-context-mean-branch-repair-smoke-dim4-context16/
improved direct loss to 3.5310; both regressed QA branch accuracy to 0/8
and collapsed the dominant prediction to all "a". A learned context-projection
option followed, again without changing corpus sources. The branch-repair
screen
runs/transformer-answer-v0.43-context-projection-branch-repair-smoke-dim4-context16/
improved direct loss to 3.5217, and the branch-batch screen
runs/transformer-answer-v0.43-context-projection-branch-batch-smoke-dim4-context16/
improved direct loss to 3.5252; both moved their projection weights, regressed
QA branch accuracy to 0/8, and collapsed the dominant prediction to all
"a". A prompt-attention summary option followed, again without changing corpus
sources. The branch-repair screen
runs/transformer-answer-v0.43-prompt-attention-branch-repair-smoke-dim4-context16/
improved direct loss to 3.5217, and the branch-batch screen
runs/transformer-answer-v0.43-prompt-attention-branch-batch-smoke-dim4-context16/
improved direct loss to 3.5252; both moved their zero-initialized output
projection weights, regressed QA branch accuracy to 0/8, and collapsed the
dominant prediction to all "a". Branch-context coverage diagnostics were then
added to direct-answer snapshots without changing corpus sources. The context-16
screen
runs/transformer-answer-v0.43-branch-context-coverage-smoke-dim4-context16/
showed QA branch contexts had 0/8 semantic coverage and 4 ambiguous branch
windows. The context-32 screen
runs/transformer-answer-v0.43-branch-context-coverage-smoke-dim4-context32/
removed QA ambiguity but still had 0/8 semantic coverage. The context-80
screen
runs/transformer-answer-v0.43-branch-context-coverage-smoke-dim4-context80/
reached complete branch-context coverage across all eval sets (219/219) with
zero ambiguous branch contexts. The branch-context gate was then added as an
opt-in training guardrail, again without changing corpus sources. The context-16
gate screen
runs/transformer-answer-v0.43-branch-context-gate-smoke-dim4-context16/
failed the required gate and recorded actual_steps: 0 for 5 requested
direct-answer steps. The context-80 gate screen
runs/transformer-answer-v0.43-branch-context-gate-smoke-dim4-context80/
passed the required gate and recorded actual_steps: 1 for 1 requested
direct-answer step. Branch-only direct-answer snapshots were then added as an
explicit screening mode for longer-context repair runs, again without changing
corpus sources. The context-80 gated branch-only screen
runs/transformer-answer-v0.43-branch-context-gated-branchonly-smoke-dim4-context80/
passed the required gate across all 219/219 semantic records, ran all 5
requested direct-answer steps, and recorded skipped greedy evals while retaining
branch profiles and branch-context gate evidence. Two dim8 context-80
branch-only follow-up screens then tested the best prior sparse repair/contrast
policy and branch-batch contrast under complete branch context. The
repair/contrast screen
runs/transformer-answer-v0.43-branchonly-periodic-repair-contrast50-dim8-context80/
ran all 100 requested direct steps and lowered interval train loss, but final
QA branch prediction collapsed to all "a". The branch-batch screen
runs/transformer-answer-v0.43-branchonly-branch-batch-dim8-context80/ ran all
50 requested direct steps and lowered interval train loss further, but also
collapsed QA branch prediction to all "a". Branch diversity was then promoted
from narrative diagnosis to an explicit snapshot target, again without changing
corpus sources. The smoke run
runs/transformer-answer-v0.43-branch-diversity-target-smoke-dim4-context80/
passed the branch-context gate, ran all 5 requested direct steps, and recorded
branch_diversity_target failure across all 9 multi-target eval profiles.
The first diversity-aware training objective was then added as
branch-diversity-unlikelihood, still without changing corpus sources. The
context-80 smoke
runs/transformer-answer-v0.43-branch-diversity-train-smoke-dim4-context80/
passed the branch-context gate and ran all 10 requested direct steps, but the
diversity target still failed across all 9 multi-target eval profiles.
Output-bias freezing was then added as a direct-answer stabilizer, still without
changing corpus sources. The context-80 smoke
runs/transformer-answer-v0.43-branch-diversity-freezebias-smoke-dim4-context80/
passed the branch-context gate and ran all 50 requested direct steps with
--direct-answer-freeze-output-bias, but the diversity target still failed
across all 9 multi-target eval profiles.
A restricted branch-target softmax objective followed, still without changing
corpus sources. The context-80 smoke
runs/transformer-answer-v0.43-branch-target-softmax-freezebias-smoke-dim4-context80/
passed the branch-context gate, froze output bias, and ran all 50 requested
direct steps. It briefly raised QA predicted diversity to two tokens at step
20, but the final diversity target still failed across all 9 multi-target
eval profiles.
Best-branch-snapshot restoration followed, still without changing corpus
sources. The context-80 smoke
runs/transformer-answer-v0.43-branch-target-softmax-restorebest-smoke-dim4-context80/
passed the branch-context gate, froze output bias, ran all 50 requested direct
steps, and restored the final checkpoint from step 40; the final diversity
target still failed across all 9 multi-target eval profiles.
Prompt-prefix projection followed, still without changing corpus sources. The
context-80 smoke
runs/transformer-answer-v0.43-prompt-prefix-target-softmax-restorebest-smoke-dim4-context80/
passed the branch-context gate, moved all 20 prompt-prefix projection
parameters, and restored the final checkpoint from step 40; the final
diversity target still failed across all 9 multi-target eval profiles.
Prompt-position projection followed, still without changing corpus sources. The
context-80 smoke
runs/transformer-answer-v0.43-prompt-position-target-softmax-restorebest-smoke-dim4-context80/
passed the branch-context gate, moved 1108/1284 prompt-position projection
parameters, and restored the final checkpoint from step 40; the final
diversity target still failed across all 9 multi-target eval profiles.
A pairwise branch-target margin objective followed, still without changing
corpus sources. The prompt-position context-80 smoke
runs/transformer-answer-v0.43-branch-target-margin-prompt-position-smoke-dim4-context80/
passed the branch-context gate, ran all 50 direct steps, moved train loss
4.8973 -> 4.7784, moved 1108/1284 prompt-position projection parameters,
and restored the final checkpoint from step 40; the final diversity target
still failed across all 9 multi-target eval profiles.
Branch representation diagnostics and contrastive hidden-state training
followed, still without changing corpus sources. The high-weight
prompt-position context-80 smoke
runs/transformer-answer-v0.43-branch-representation-contrast50-prompt-position-smoke-dim4-context80/
recorded hidden-state distance profiles, ran all 50 direct steps with
--direct-answer-contrast-weight 50.0, and restored the final checkpoint from
step 40; the final diversity target still failed across all 9 multi-target
eval profiles.
A dim-8 capacity screen followed, still without changing corpus sources. The
completed 40-step prompt-position context-80 smoke
runs/transformer-answer-v0.43-branch-representation-contrast50-prompt-position-smoke-dim8-context80-steps40/
used embedding/feed-forward dimensions 8/16, restored the final checkpoint
from step 10, and increased measured QA hidden distance; the final diversity
target still failed across all 9 multi-target eval profiles.
Prompt-position scale screening followed, still without changing corpus
sources. The scale-32 context-80 smoke
runs/transformer-answer-v0.43-prompt-position-scale32-repcontrast50-smoke-dim4-context80/
ran all 50 direct steps, moved 1108/1284 prompt-position projection
parameters, restored the final checkpoint from step 40, and increased
restored QA hidden distance to about 0.01235; the final diversity target
still failed across all 9 multi-target eval profiles.
The next checkpoint records an engineering-only open-source structure audit in
STRUCTURE_AUDIT.md: QuarkLM may study model/trainer/tokenizer/checkpoint
patterns, but no external weights, tokenizer vocabularies, embeddings,
datasets, or training text enter the corpus or learned artifacts. The audit
selects an opt-in pre-layer-norm transformer block path with final
normalization as the next structural screen before another branch-loss repair.
That path was implemented and screened in
runs/transformer-answer-v0.44-prelayernorm-repcontrast50-prompt-position-smoke-dim4-context80/
without changing corpus sources. The run moved prompt-position and final-norm
parameters and cracked full collapse in 7/9 multi-target profiles, but QA and
heldout remained collapsed and the final diversity target still failed across
all 9 multi-target eval profiles.
A target-balanced branch-batch screen followed, still without changing corpus
sources. The run
runs/transformer-answer-v0.44-target-balanced-prelayernorm-repcontrast50-prompt-position-smoke-dim4-context80/
used target-bucket branch batches, ran all 50 direct steps, and restored the
final checkpoint to baseline step 0 because trained snapshots scored worse;
all 9/9 multi-target eval profiles remained collapsed to one global token.
None of these runs were promoted.