Release Discipline
A promoted QuarkLM run must have:
- a named RC track when tagging or announcing a release candidate
- a versioned run directory
- baseline and final metrics
- responder, answer-model, and decoder evals
- generated admission-probe audit
- generated glossary-probe audit
- prompt leakage audit
- forgetting audit against the prior promoted run
- exact eval audit
- passing promotion gate
- experiment_intent.json with hypothesis, allowed data, planned artifacts, acceptance gates, failure criteria, and final decision
- corpus_hygiene.json with source mixture, duplicate, train/eval overlap, candidate-ratio, and rare-profile evidence
- training_plan.json with allowed data sources, scheduled example mixture, replay-plan status, and planned artifacts
- training_recipe.json with model, tokenizer, data, objective, optimizer, artifact, gate, replay, and rerun details
- candidate_quarantine.json with candidate lifecycle state and proof that candidate records are not training data until admitted
- closed_world_verifier.json with deterministic pass/fail evidence for candidate checks and training-plan approval
- constraint_first_promotion.json with proof that quality metrics were blocked until closed-world constraints passed
- self-diagnosis with uses_external_model: false unless a future release explicitly admits and documents a different source
- archived attempts under attempts/attempt-###/
- corpus snapshot and diff
- docs updated for the current state
- RC spec, gap audit, and checklist reviewed for forbidden claims
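Part of the checklist above can be checked mechanically. The following is a minimal sketch, not the project's actual tooling: it assumes the seven named JSON artifacts sit at the top level of the versioned run directory, and the helper name missing_artifacts is hypothetical.

```python
from pathlib import Path

# Artifact file names come from the checklist above. Their location inside the
# run directory is an assumption (here: directly at its top level).
REQUIRED_ARTIFACTS = (
    "experiment_intent.json",
    "corpus_hygiene.json",
    "training_plan.json",
    "training_recipe.json",
    "candidate_quarantine.json",
    "closed_world_verifier.json",
    "constraint_first_promotion.json",
)


def missing_artifacts(run_dir: Path) -> list[str]:
    """Return the required artifact names that are absent from run_dir."""
    return [name for name in REQUIRED_ARTIFACTS if not (run_dir / name).is_file()]
```

An empty result only shows that the files exist. It says nothing about whether their contents satisfy the gate, so it complements the audits rather than replacing them.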
New release identifiers use SemVer (Semantic Versioning) with
vMAJOR.MINOR.PATCH tags and matching run paths. The next release after
v0.99 is v0.100.0, not v1.00; the current line then advances as
v0.101.0, v0.102.0, and so on. Do not use XX.YY.ZZ placeholders in
release docs. v1.0.0 is reserved for a deliberate stable milestone.
Historical artifacts keep their existing names so provenance remains exact.
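The numbering rule is easiest to see when tags are compared as integers rather than strings. The sketch below is illustrative only: parse_tag and next_minor are hypothetical helpers, and the pattern covers new vMAJOR.MINOR.PATCH identifiers, not historical two-part names such as v0.99.

```python
import re

# Accepts vMAJOR.MINOR.PATCH with no leading zeros in any component.
TAG_PATTERN = re.compile(r"v(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def parse_tag(tag: str) -> tuple[int, int, int]:
    """Parse a vMAJOR.MINOR.PATCH tag into integers, or raise ValueError."""
    match = TAG_PATTERN.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a vMAJOR.MINOR.PATCH tag: {tag!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def next_minor(tag: str) -> str:
    """Return the tag for the next minor release on the same major line."""
    major, minor, _ = parse_tag(tag)
    return f"v{major}.{minor + 1}.0"
```

Comparing the parsed tuples orders v0.100.0 after v0.99.0, whereas a plain string comparison would place it before. Placeholder tags such as vXX.YY.ZZ and malformed tags such as v1.00 are rejected by the pattern.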
The release is not complete until the docs are complete. If a page references current eval counts, commands, run ids, hosting targets, or roadmap commitments, that page must move with the version.
closed_world_lm.self_improve answer-cycle should return failure when the
promotion gate fails. A report can remain useful evidence when it fails, but it
is not a promoted release.
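One way to make a failed gate visible to calling scripts is a non-zero exit code. The sketch below assumes a report shape that this document does not specify: the promotion_gate and passed keys are hypothetical, as is the function name.

```python
def exit_code_for_report(report: dict) -> int:
    """Map a report to a process exit code: 0 only when the gate passed.

    The "promotion_gate" and "passed" keys are hypothetical. The real report
    schema is not shown in this document.
    """
    gate = report.get("promotion_gate")
    if isinstance(gate, dict) and gate.get("passed") is True:
        return 0
    # A missing or malformed gate is treated as a failure, never as a pass.
    return 1
```

Treating anything other than an explicit pass as failure keeps a partially written report from being mistaken for a promoted release, while the report itself stays on disk as evidence.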
closed_world_lm.self_diagnose can be run directly against a report to inspect
the recommended next action. The recommendation must come from report evidence,
not from an external model.
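The no-external-model rule can also be checked on the self-diagnosis record itself. This is a minimal sketch that assumes the record is a JSON object loaded into a dict. Only the uses_external_model field name comes from the checklist; the function name is hypothetical.

```python
def check_self_diagnosis(diagnosis: dict) -> None:
    """Raise ValueError unless the record declares uses_external_model: false."""
    # A missing field is rejected too: the record must state the value explicitly.
    if diagnosis.get("uses_external_model") is not False:
        raise ValueError("self-diagnosis must declare uses_external_model: false")
```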
Architecture prototypes, such as the v0.24 transformer, can be documented as evidence only for the behavior they actually show. Lower language-model loss is not a reliable-answer claim until answer evals pass. From v0.71 onward, transformer answer-training screens also write experiment intent artifacts, but they close as rejected screen evidence until a dedicated transformer promotion gate exists.