
Documentation · v0.42 · Research prototype

Big idea. Tiny package.

QuarkLM is a closed-world language model: random weights, no pretrained tokenizer, no external embeddings, and learning only through the admitted corpus. These docs cover the model, build loop, operating discipline, and security boundary.

Quickstart · Read the evidence

$ python3 -m closed_world_lm.self_improve

Run: runs/self-improve-v0.42/
Admission probes: 48/48 direct · 84/84 paraphrase · 38/38 glossary
Boundary: no pretrained weights · no external embeddings
Diagnosis: passed · no external model

Navigate

Four entry points into QuarkLM.

  1. Learn

    Concepts and product model

    Understand closed-world learning, the language model, the admitted dataset, and the current evidence.

    • vision
    • model
    • self-improvement
    • evidence
    Open Learn
  2. Build

    Run and extend QuarkLM

    Generate curriculum, train random weights, admit new facts, and add generated probes without crossing the purity boundary.

    • quickstart
    • admission
    • probes
    • commands
    Open Build
  3. Operate

    Promote releases with evidence

    Use RC readiness, self-improvement reports, forgetting audits, provenance snapshots, and docs freshness gates.

    • RC readiness
    • release gates
    • provenance
    • docs drift
    Open Operate
  4. Secure

    Keep the world closed

    Guard against pretrained weights, unledgered text, prompt leakage, and claims outside the corpus.

    • purity
    • leakage
    • unknowns
    • boundaries
    Open Secure
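
The Secure entry point above mentions guarding against unledgered text. As a minimal sketch of what such a guard could look like, the function below compares the files a run touched against a ledger. The ledger format (one path per line, `#` for comments) and the name `unledgered` are assumptions made for illustration; they are not taken from the QuarkLM repository.

```python
from pathlib import PurePosixPath


def unledgered(ledger_lines, used_paths):
    """Return the used paths that do not appear in the ledger, sorted.

    Assumed (hypothetical) ledger format: one relative path per line,
    blank lines and lines starting with '#' are ignored.
    """
    allowed = set()
    for line in ledger_lines:
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue
        # Normalize so that equivalent spellings of a path compare equal.
        allowed.add(PurePosixPath(entry).as_posix())

    offenders = []
    for path in used_paths:
        normalized = PurePosixPath(path).as_posix()
        if normalized not in allowed:
            offenders.append(normalized)
    return sorted(offenders)


if __name__ == "__main__":
    ledger = ["# admitted corpus", "corpus/facts.txt", "", "corpus/glossary.txt"]
    used = ["corpus/facts.txt", "downloads/wiki.txt"]
    print(unledgered(ledger, used))
```

A run would be rejected whenever the returned list is non-empty, which keeps the check deterministic and free of any external model.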

Pick a path

Where are you trying to go?

Curated paths for the most common moves in the prototype: understand the experiment, admit new memory, and promote evidence without drift.

  1. New to QuarkLM

    Read the model, then run a smoke cycle

    1. Language model
    2. Quickstart
    3. Current evidence
  2. Teaching a new fact

    Admit memory, generate probes, retrain weights

    1. Admission workflow
    2. Generated probes
    3. RC readiness
  3. Protecting the experiment

    Audit provenance, leakage, and docs freshness

    1. Purity boundary
    2. Prompt leakage
    3. Docs drift

Primitives

The loop is thirteen auditable objects.

  • corpus.ledger

    Ledger

    The explicit list of files allowed to influence training or evaluation.
  • admission.log

    Admitted memory

    Structured facts that become learnable only after admission.
  • probe.audit

    Generated probes

    Direct and paraphrase checks derived from the admitted-memory log.
  • weight.run

    Versioned weights

    Randomly initialized checkpoints promoted only with recorded metrics.
  • forgetting.audit

    Forgetting audit

    A comparison against the previous promoted report.
  • diagnosis.report

    Self-diagnosis

    Rule-based repair recommendations derived from the run report, with no external model.
  • verifier.report

    Closed-world verifier

    Deterministic approval for candidate checks and training plans, with no external model.
  • recipe.run

    Training recipe

    A reproducible record of model, tokenizer, data, objective, optimizer, artifacts, gates, and rerun details.
  • transformer.surface

    Transformer surfaces

    Experiment, artifact, trainer, and objective catalog boundaries for answer-training screens.
  • checkpoint.meta

    Checkpoint metadata

    Centralized transformer config, checkpoint identity, dataset metadata, and run metadata.
  • eval.report

    Eval report

    Checkpoint loading, probe scoring, sample JSONL, and eval JSON assembled through narrow surfaces.
  • rc.boundary

    RC boundary

    Research Prototype RC and Language Model RC stay separate until transformer promotion gates pass.
  • docs.release

    Docs gate

    README, docs, and marketing content updated with each release when they reference current state.
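
The forgetting audit in the list above is described as a comparison against the previous promoted report. A minimal sketch of that idea follows, assuming each report maps a probe category to a passed/total pair. The `ProbeScore` type, the `forgetting_audit` function, and the report shape are hypothetical and chosen only to illustrate the comparison; the actual report format is not documented on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeScore:
    """Passed and total probe counts for one category (hypothetical shape)."""
    passed: int
    total: int

    @property
    def rate(self) -> float:
        # An empty category counts as a 0.0 pass rate rather than dividing by zero.
        return self.passed / self.total if self.total else 0.0


def forgetting_audit(previous, current):
    """Return categories that regressed relative to the previous report.

    A category regresses if it is missing from the current report or if
    its pass rate is lower than in the previous promoted report.
    """
    regressions = {}
    for category, prev in previous.items():
        cur = current.get(category)
        if cur is None:
            regressions[category] = "missing from current report"
        elif cur.rate < prev.rate:
            regressions[category] = (
                f"{prev.passed}/{prev.total} -> {cur.passed}/{cur.total}"
            )
    return regressions


if __name__ == "__main__":
    previous = {"direct": ProbeScore(48, 48), "paraphrase": ProbeScore(84, 84)}
    current = {"direct": ProbeScore(48, 48), "paraphrase": ProbeScore(80, 84)}
    print(forgetting_audit(previous, current))
```

Comparing pass rates rather than raw counts lets the probe set grow between releases without a larger set being flagged as forgetting by itself.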

Eidetic Labs

Need the product story or the source?

The marketing page carries the concise product position. The repository and these docs remain the source of truth for commands, evidence, and release gates.