
Project Overview

QuarkLM is a closed-world language model research prototype. It asks whether a language model can grow from a tiny owned dataset while keeping every learning claim tied to admitted sources and promotion evidence.

This page holds the durable project orientation that previously made the README hard to scan. The README should stay short, so the docs carry the model philosophy, evidence trail, operating rules, and release-candidate boundaries.

What QuarkLM Is

QuarkLM starts from a constrained world:

  • human-authored seed glossary, grammar, stories, self facts, and admitted memories;
  • deterministic curriculum generated from ledgered corpus files;
  • a character tokenizer trained only on admitted text;
  • tiny learned components initialized from random weights;
  • corpus-only retrieval memory;
  • a tiny decoder-only transformer initialized from random weights.
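The character tokenizer in this list can be illustrated with a minimal sketch. It assumes the vocabulary is simply the sorted set of characters found in admitted text, plus a hypothetical `<unk>` id for anything outside it. `CharTokenizer` and its methods are illustrative names, not the tokenizer that ships in src/closed_world_lm/.

```python
class CharTokenizer:
    """Character-level tokenizer whose vocabulary comes only from admitted text."""

    UNK = "<unk>"  # hypothetical id 0 for characters never seen in admitted text

    def __init__(self, itos):
        self.itos = list(itos)
        self.stoi = {tok: i for i, tok in enumerate(self.itos)}

    @classmethod
    def train(cls, admitted_texts):
        # Sorting the character set makes ids independent of file order,
        # so the same admitted corpus always yields the same vocabulary.
        chars = sorted(set("".join(admitted_texts)))
        return cls([cls.UNK] + chars)

    def encode(self, text):
        unk_id = self.stoi[self.UNK]
        return [self.stoi.get(ch, unk_id) for ch in text]

    def decode(self, ids):
        # Unknown ids decode to U+FFFD so out-of-corpus input stays visible.
        unk_id = self.stoi[self.UNK]
        return "".join("\ufffd" if i == unk_id else self.itos[i] for i in ids)
```

Encoding a character that never appeared in admitted text maps it to the `<unk>` id instead of growing the vocabulary, which keeps the tokenizer inside the closed world.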

The project does not claim to be a useful assistant yet. It is a research system for testing whether self-improvement can remain bounded, inspectable, and honest about failures.

What QuarkLM Does Not Use

The current prototype does not use:

  • pretrained weights;
  • pretrained tokenizers;
  • external embeddings;
  • unledgered training text;
  • external model outputs as training authority.

Generated material can propose lessons, probes, or repairs, but it is not training data until it is verified against admitted sources and included in the ledgered curriculum.
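The rule above amounts to an admission gate between generated proposals and the curriculum. The sketch below is a hedged illustration, not QuarkLM's verifier. It assumes a ledger that maps each admitted corpus path to a SHA-256 hash of its contents, and it treats "verified" as "the proposal quotes a span that appears verbatim in an unchanged ledgered file." `Proposal`, `admit`, and `build_curriculum` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass


def sha256_text(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Proposal:
    text: str         # generated lesson, probe, or repair
    source_path: str  # ledgered file the proposal claims to rest on
    evidence: str     # span that must appear verbatim in that file


def admit(proposal, ledger, sources):
    """Return True only if the proposal is grounded in an unchanged ledgered source.

    ledger:  path -> expected SHA-256 of the admitted file
    sources: path -> current file text
    """
    expected = ledger.get(proposal.source_path)
    current = sources.get(proposal.source_path)
    if expected is None or current is None:
        return False  # source was never admitted, or is missing
    if sha256_text(current) != expected:
        return False  # source drifted from its ledger entry
    return bool(proposal.evidence) and proposal.evidence in current


def build_curriculum(proposals, ledger, sources):
    # Only admitted proposals become training text; order is preserved
    # so the same inputs always produce the same curriculum.
    return [p.text for p in proposals if admit(p, ledger, sources)]
```

In this sketch a proposal that cites an unledgered file, quotes text the source does not contain, or rests on a file whose contents changed after ledgering is rejected. The path corpus/glossary.txt used when testing it is invented for illustration.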

Current Release Posture

QuarkLM separates release-candidate readiness into two tracks:

| Track | Current posture |
| --- | --- |
| Research Prototype RC | Near. The closed-world self-improvement system is reproducible, auditable, documented, and clear about what is not promoted. |
| Language Model RC | Not ready. The from-scratch transformer still fails branch_diversity_target after v0.115. |

The current promoted responder evidence is runs/self-improve-v0.42/. The latest transformer screen is runs/transformer-answer-v0.115.0-hidden-projection-margin-candidate-step1-dim4-context80/. That screen is diagnostic evidence, not promoted neural model evidence.

Use Release Candidate Readiness, RC_SPEC.md, RC_GAP_AUDIT.md, and RC_CHECKLIST.md before tagging or announcing an RC.

Where The Long Evidence Trail Lives

The historical version narrative belongs in the docs, not in the README:

| Topic | Canonical docs |
| --- | --- |
| Model philosophy and closed-world boundaries | Language model |
| Learning lifecycle | Self-improvement loop |
| Paper-backed control matrix | Research grounding |
| Open-source architecture/mechanics comparison | Open-source mechanics audit |
| Branch-diversity root cause and v0.115 evidence | Branch diversity research |
| Source-to-gap implementation trail | Research implementation map |
| Latest metrics and run history | Current evidence |
| Promotion, release, and docs-drift rules | Operate |

Public Surfaces

QuarkLM has two public surfaces with separate hosts:

| Surface | Host | Target |
| --- | --- | --- |
| Docusaurus docs | Read the Docs | docs.quark-lm.eidetic-labs.com |
| Standalone marketing page | GitHub Pages | quark-lm.eidetic-labs.com |

See sites/DEPLOYMENT.md for deployment details.

Repository Orientation

| Path | Purpose |
| --- | --- |
| corpus/ | Ledgered source files allowed to influence training or evaluation. |
| src/closed_world_lm/ | Curriculum, models, responder, retrieval memory, verifier, trainer, and eval surfaces. |
| tests/ | Regression coverage for core mechanics. |
| runs/ | Local run evidence and checkpoints; ignored by git. |
| sites/docs/ | Docusaurus source for Learn, Build, Operate, and Secure docs. |
| sites/marketing/ | Standalone marketing page source. |
| sites/shared/current-state.json | Shared state consumed by docs and marketing. |

Verification

Run these commands from the repository root before release-candidate packaging or upload preparation:

PYTHONPATH=src python3 -m unittest discover -s tests
npm run sites:build
python3 -m json.tool sites/shared/current-state.json >/dev/null
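For scripted release prep, the three checks above can be chained so the run stops at the first failure. This is a minimal Python sketch, not a script that ships with the repository. `CHECKS` and `run_checks` are hypothetical names, and the sketch assumes it runs from the repository root with `python3` and `npm` on PATH.

```python
import os
import subprocess
import sys

# Hypothetical mirror of the page's verification commands:
# (argv, extra environment variables, whether to discard stdout).
CHECKS = [
    (["python3", "-m", "unittest", "discover", "-s", "tests"], {"PYTHONPATH": "src"}, False),
    (["npm", "run", "sites:build"], {}, False),
    (["python3", "-m", "json.tool", "sites/shared/current-state.json"], {}, True),
]


def run_checks(checks=CHECKS, run=subprocess.run) -> bool:
    """Run each check in order and stop at the first failure."""
    for argv, extra_env, quiet in checks:
        # Like `PYTHONPATH=src cmd` in a shell: override only for this command.
        env = {**os.environ, **extra_env}
        stdout = subprocess.DEVNULL if quiet else None  # mirrors `>/dev/null`
        try:
            result = run(argv, env=env, stdout=stdout)
        except FileNotFoundError:
            # The executable (for example npm) is not installed or not on PATH.
            print(f"missing executable: {argv[0]}", file=sys.stderr)
            return False
        if result.returncode != 0:
            print(f"check failed: {' '.join(argv)}", file=sys.stderr)
            return False
    return True
```

Calling `sys.exit(0 if run_checks() else 1)` from a small wrapper script turns this into a single pass/fail gate. The `run` parameter exists so the sequencing can be tested without launching real processes.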

For day-to-day local use, start with Quickstart.