Self-Improvement Loop
QuarkLM improves by changing the system that teaches and audits it:
new lesson -> corpus -> retrieval memory -> training candidates -> guarded weight update -> evaluation -> accepted or rejected
That lifecycle is the difference between QuarkLM and a conventional large-model workflow. A large pretrained model usually starts with world knowledge already encoded in its weights, then uses a smaller supervised or retrieval layer to shape behavior. QuarkLM starts with no world knowledge in its weights. A lesson is first ledgered into the corpus. Retrieval memory can then answer it exactly, the trainer builds closed-world candidates from that evidence, and only guarded updates are allowed to modify weights.
Lifecycle Contract
| Step | System responsibility | Evidence artifact |
|---|---|---|
| New lesson | Receive a proposed fact, rule, probe, or repair with source context. | Candidate record or admission request. |
| Corpus | Admit only verified material into the ledgered closed world. | Ledger, corpus diff, curriculum manifest. |
| Retrieval memory | Make admitted knowledge answerable without weight movement. | Retrieval memory cards and exact retrieval evals. |
| Training candidates | Convert admitted sources and failure reports into bounded examples. | Training plan, replay plan, candidate quarantine, source map. |
| Guarded weight update | Apply only constrained pressure to randomly initialized or closed-world checkpointed weights. | Update guard, accepted/rejected attempt records, checkpoint metadata. |
| Evaluation | Test current behavior before promotion is allowed. | Constraint-first promotion, forgetting audit, probe audits, branch metrics. |
| Accepted or rejected | Promote only passing evidence; keep failed runs as diagnostics. | Current-state docs, run report, release notes, archived attempt. |
This is why QuarkLM can say "I learned something new" only after the admission and evidence chain is visible. A retrieved answer means the corpus can serve the knowledge. A promoted guarded update means the learned model consolidated behavior from that knowledge without breaking the boundary.
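The evidence chain described above can be sketched as an ordered record in which no stage may be skipped. This is a minimal illustration, not the project's implementation: `Stage`, `LessonRecord`, and both `can_claim_*` methods are hypothetical names introduced here, and the artifact strings are placeholders.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Stage(IntEnum):
    """Lifecycle steps in the order given by the contract table (names are assumptions)."""
    NEW_LESSON = 0
    CORPUS = 1
    RETRIEVAL_MEMORY = 2
    TRAINING_CANDIDATES = 3
    GUARDED_WEIGHT_UPDATE = 4
    EVALUATION = 5
    DECISION = 6


@dataclass
class LessonRecord:
    lesson_id: str
    # Maps each completed stage to the name of its evidence artifact.
    evidence: dict = field(default_factory=dict)

    def record(self, stage: Stage, artifact: str) -> None:
        """Attach evidence for the next stage; out-of-order stages are rejected."""
        done = len(self.evidence)
        if done >= len(Stage) or stage != Stage(done):
            raise ValueError(f"stage {stage.name} is out of order")
        self.evidence[stage] = artifact

    def can_claim_retrievable(self) -> bool:
        """The corpus can serve the knowledge once memory evidence exists."""
        return Stage.RETRIEVAL_MEMORY in self.evidence

    def can_claim_learned(self, accepted: bool) -> bool:
        """'Learned' needs the full chain plus an accepted guarded update."""
        return accepted and len(self.evidence) == len(Stage)
```

Under this sketch, a lesson with only corpus and memory evidence can be reported as retrievable but not as learned.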
Release loop:
- Admit or refine corpus data.
- Regenerate curriculum files and retrieval memory artifacts.
- Build training candidates from admitted sources and current failure reports.
- Train learned components from random initialization or a declared closed-world checkpoint.
- Evaluate responder, classifier, decoder, transformer, and retrieval memory.
- Audit generated probes, prompt leakage, provenance, forgetting, and exact eval coverage.
- Diagnose the report and name the next action without using an external model.
- Archive the attempt before updating the latest report pointer.
- Promote only when the promotion gate passes and docs are current.
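The ordering rule in the release loop (archive the attempt first, then move the latest report pointer) can be sketched as follows. The directory layout, the `latest.json` file name, and the `run_id` field are assumptions made for illustration; the actual report format is not described here.

```python
import json
import os
from pathlib import Path


def archive_then_point(report: dict, root: Path) -> Path:
    """Write the attempt to the archive, then repoint 'latest' at it."""
    attempts = root / "attempts"  # hypothetical archive directory
    attempts.mkdir(parents=True, exist_ok=True)

    # Step 1: the attempt is archived whether or not it is later promoted.
    archived = attempts / f"{report['run_id']}.json"
    archived.write_text(json.dumps(report, sort_keys=True, indent=2))

    # Step 2: only after the archive exists, replace the pointer in one step
    # so readers never see a pointer to a missing report.
    pointer = {"run_id": report["run_id"],
               "path": archived.relative_to(root).as_posix()}
    tmp = root / "latest.json.tmp"
    tmp.write_text(json.dumps(pointer))
    os.replace(tmp, root / "latest.json")
    return archived
```

Writing the pointer to a temporary file and renaming it is one way to keep the pointer consistent if the process stops between the two steps.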
Components
| Component | Role |
|---|---|
| closed_world_lm.curriculum | Builds build/train.txt, build/valid.txt, and manifest data. |
| closed_world_lm.respond | Reliable corpus-only responder used as a grounded rail. |
| closed_world_lm.answer_model | Learned answer classifier trained from random softmax weights. |
| closed_world_lm.answer_decoder | Generative answer decoder trained from random prompt-conditioned weights. |
| closed_world_lm.transformer_char_model | Experimental decoder-only transformer trained from random weights on the corpus tokenizer. |
| closed_world_lm.self_improve | Orchestrates training, evaluation, audits, and run reports. |
| closed_world_lm.self_diagnose | Reads a run report and emits deterministic repair recommendations with uses_external_model: false. |
Promotion Rule
A run is not promoted because it completed. A run is promoted only when it preserves the purity boundary, records baseline and final metrics, passes the required audits, passes the recorded promotion gate, and updates the docs that describe current state.
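As a rough sketch, the promotion rule is a conjunction of named conditions, and reporting which conditions failed is more useful than a bare yes or no. The `RunEvidence` fields below are hypothetical names that mirror the sentence above; they are not the recorded gate's actual schema.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RunEvidence:
    # One flag per condition in the promotion rule (field names are assumptions).
    purity_boundary_preserved: bool
    baseline_metrics_recorded: bool
    final_metrics_recorded: bool
    audits_passed: bool
    promotion_gate_passed: bool
    docs_current: bool


def promotion_blockers(run: RunEvidence) -> list[str]:
    """Return the names of every unmet condition, in declaration order."""
    return [name for name, ok in asdict(run).items() if not ok]


def may_promote(run: RunEvidence) -> bool:
    """A run is promotable only when no condition is unmet."""
    return not promotion_blockers(run)
```

Note that completion is deliberately absent from the fields: finishing a run contributes nothing toward promotion on its own.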
The docs are part of the loop. If the README, the Docusaurus site, or the marketing page references a current release, that surface must be updated with the release.
The current diagnosis layer is intentionally rule-based. It is not the final form of autonomous improvement, but it establishes the interface: QuarkLM should learn from its own reports, name what changed, and propose the next repair without another model shaping that decision.
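A rule-based diagnosis step of this kind might look like the sketch below. The report keys (`audits`, `baseline_metrics`, `final_metrics`) and the assumption that higher metric values are better are illustrative only; the `uses_external_model` flag is the one field taken from the component description.

```python
def diagnose(report: dict) -> dict:
    """Map a run report to repair recommendations using fixed rules only."""
    recommendations = []

    # Rule 1: every failing audit yields a repair item (sorted for determinism).
    audits = report.get("audits", {})
    for name in sorted(audits):
        if not audits[name]:
            recommendations.append(f"repair failing audit: {name}")

    # Rule 2: any metric below its baseline is flagged as a regression.
    # Assumes higher is better; a missing final value counts as a regression.
    baseline = report.get("baseline_metrics", {})
    final = report.get("final_metrics", {})
    for metric in sorted(baseline):
        if final.get(metric, float("-inf")) < baseline[metric]:
            recommendations.append(f"investigate regression: {metric}")

    if not recommendations:
        recommendations.append("no repair needed")
    return {"uses_external_model": False, "recommendations": recommendations}
```

Because the rules iterate in sorted order and consult nothing outside the report, the same report always produces the same recommendations.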
Memory Before Consolidation
QuarkLM treats memory and weights as separate evidence rails. Retrieval memory can prove that admitted knowledge is available without pretending the neural model has learned it. Training candidates are the bridge: they decide which parts of retrieved or ledgered knowledge deserve pressure on the transformer. Guarded weight updates are accepted only if they improve the targeted profile while preserving closed-world constraints, prior coverage, and promotion gates.
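The acceptance condition for a guarded update can be sketched as a comparison of metrics before and after the update. The metric dictionaries, the `target` key, and the `tolerance` parameter are assumptions for illustration, and higher scores are assumed to be better.

```python
def accept_update(before: dict, after: dict, target: str,
                  constraints_ok: bool, tolerance: float = 0.0) -> bool:
    """Accept only if the target improves and nothing else regresses."""
    if not constraints_ok:
        # Closed-world constraints are a hard requirement, not a trade-off.
        return False

    # The targeted profile must strictly improve.
    improved = after.get(target, float("-inf")) > before[target]

    # Every previously measured profile must hold within the tolerance.
    # A profile missing from `after` counts as lost coverage.
    preserved = all(
        after.get(name, float("-inf")) >= score - tolerance
        for name, score in before.items()
        if name != target
    )
    return improved and preserved
```

For example, raising the target score from 0.5 to 0.7 is rejected under this sketch if any prior profile drops by more than the tolerance.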
Research Guardrails
The self-improvement loop is guided by continual learning, lifelong pretraining, replay, self-generated reasoning, and model-editing research, but QuarkLM applies those ideas under a stricter data boundary. A generated repair or lesson is only a candidate until a deterministic verifier accepts it against admitted sources. Weight updates still come from versioned corpus-derived curriculum, and every admitted batch must preserve prior accepted behavior through forgetting checks or replay.
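One simple form a deterministic verifier could take is an exact-quote check against the cited admitted source. This is a stand-in chosen for illustration, not a description of the verifier QuarkLM uses; the candidate fields `source_id` and `claim` are hypothetical.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting does not affect matching."""
    return " ".join(text.lower().split())


def verify_candidate(candidate: dict, admitted: dict) -> bool:
    """Accept a candidate only if its claim appears in its cited admitted source."""
    source = admitted.get(candidate.get("source_id"))
    if source is None:
        return False  # The cited source is outside the closed world.

    claim = normalize(candidate.get("claim", ""))
    if not claim:
        return False  # An empty claim would trivially match any source.

    # Deterministic rule: the normalized claim must be a substring of the source.
    return claim in normalize(source)
```

A candidate that fails this check stays a candidate and is never used as a source of weight updates.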
See Research grounding for the current paper map and the design rules it implies.