Branch Diversity Research
Last reviewed: 2026-06-15.
QuarkLM's branch-diversity problem is the current transformer bottleneck. Retrieval memory can serve the admitted corpus exactly, and guarded weight updates can be accepted locally, but the transformer still predicts too few branch tokens across multi-target profiles.
v0.115 Evidence
Candidate run:
runs/transformer-answer-v0.115.0-hidden-projection-margin-candidate-step1-dim4-context80/.
v0.115 adds branch-hidden-projection-margin-unlikelihood, a repair candidate
that compares target-token hidden * output_weight contributions without using
output bias as the margin surface. The screen runs one direct-answer step with
output bias frozen. It reduces average collapsed-token hidden advantage from
about 0.0842 to 0.0736, which supports hidden projection as a relevant
repair surface.
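A minimal sketch of the margin surface described above, assuming each token's logit decomposes into a hidden * output_weight projection plus an output bias. The function names, the toy dim-4 vectors, and the hinge form of the penalty are illustrative assumptions, not QuarkLM's implementation.

```python
def projection(hidden, weight_row):
    """Bias-free contribution of one token: dot(hidden, output_weight[token])."""
    return sum(h * w for h, w in zip(hidden, weight_row))


def collapsed_token_advantage(hidden, output_weight, target, collapsed):
    """Projection lead of the collapsed token over the target (output bias ignored)."""
    return projection(hidden, output_weight[collapsed]) - projection(hidden, output_weight[target])


def margin_penalty(advantage, margin=0.0):
    """Hinge penalty (an assumed form): zero once the target leads by at least `margin`."""
    return max(0.0, advantage + margin)


# Toy dim-4 values chosen for illustration only (hypothetical, not from any run).
hidden = [0.5, -0.25, 0.0, 1.0]
output_weight = {
    "n": [0.5, 0.0, 0.0, 0.25],   # collapsed token
    "a": [0.25, 0.0, 0.0, 0.25],  # target token
}
advantage = collapsed_token_advantage(hidden, output_weight, target="a", collapsed="n")
penalty = margin_penalty(advantage, margin=0.125)
```

Because the bias term never enters `projection`, any reduction in `advantage` has to come from the hidden state or the output weights, which is the property the frozen-bias screen relies on.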
The candidate is still rejected for neural promotion. Constraint-first
promotion passes 10/11 constraints and fails branch_diversity_target;
9/9 multi-target profiles still collapse to "n", 2 profiles keep zero
target-token coverage, and hidden-projection pressure remains primary across
9/9 profiles. The next repair must scale beyond a single branch batch while
preserving coverage and representation-separation gates.
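The rejection logic can be sketched as follows, assuming "constraint-first" means that every constraint must pass before a candidate is promoted. The function name and the placeholder constraint names are hypothetical; only branch_diversity_target comes from the run described above.

```python
def promotion_decision(constraints):
    """constraints: dict mapping constraint name -> bool (True means passed)."""
    failed = sorted(name for name, ok in constraints.items() if not ok)
    return {
        "promoted": not failed,  # assumed rule: a single failure blocks promotion
        "passed": len(constraints) - len(failed),
        "total": len(constraints),
        "failed": failed,
    }


# Ten placeholder constraints that pass, plus the one failure named in the text.
constraints = {f"placeholder_constraint_{i:02d}": True for i in range(1, 11)}
constraints["branch_diversity_target"] = False
decision = promotion_decision(constraints)
```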
v0.114 Evidence
Diagnostic run:
runs/transformer-answer-v0.114.0-logit-prior-representation-instrumentation-profile-specific-memory-consolidation-step1-dim4-context80/.
The run consumes the v0.113 memory-consolidation plan, targets owner,
paraphrases, and glossary, keeps retrieval exact at 219/219, records
24 profile-specific missing-token attempts with 0 direct missing-token
acceptances and 8 fallbacks, and remains rejected on
branch_diversity_target.
The root-cause diagnosis remains a critical target_routing_gap. The routing
audit still flags high output-bias escape risk ("n" bias rank 1, "a" bias
rank 3), but branch_logit_prior_profiles show the dominant-token wins are
driven by hidden-projection pressure across 9/9 multi-target profiles.
Centroid separation remains poor across the sampled profiles.
v0.113 Evidence
Diagnostic run:
runs/transformer-answer-v0.113.0-branch-routing-audit-profile-specific-memory-consolidation-step1-dim4-context80/.
The run consumes the v0.112 memory-consolidation plan, targets owner,
paraphrases, and learning, keeps retrieval exact at 219/219, records
18 profile-specific missing-token attempts with 0 direct missing-token
acceptances and 6 fallbacks, and remains rejected on
branch_diversity_target.
The root-cause diagnosis remains a critical target_routing_gap: 9/9
profiles fail, 3 remain collapsed, 1 has zero target-token coverage, and
6 have buried targets. The new branch_routing_audit narrows the next
mechanics target:
- audit_hypothesis: routing_gap_requires_representation_and_logit_audit
- output-bias escape risk: high, with "n" at bias rank 2
- prompt-to-branch representation separation: low across 9/9 multi-target profiles
- minimum different-target hidden distance: about 0.00077
- target imbalance hotspot: glossary, with top target share 0.6316
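Two of the audit quantities above can be sketched in a few lines, assuming the minimum different-target hidden distance is the smallest Euclidean distance between prompt hidden states whose targets differ, and the top target share is the most frequent target's fraction of a profile's examples. Both definitions, the function names, and the toy data are assumptions for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def min_different_target_distance(states):
    """states: list of (target_token, hidden_vector); returns None if all targets match."""
    best = None
    for (t1, h1), (t2, h2) in combinations(states, 2):
        if t1 == t2:
            continue  # only pairs with different targets count
        d = math.dist(h1, h2)
        if best is None or d < best:
            best = d
    return best


def top_target_share(targets):
    """Fraction of examples taken by the most common target token."""
    return Counter(targets).most_common(1)[0][1] / len(targets)


# Toy data (hypothetical): the same-target pair is closest but is ignored.
states = [("n", [0.0, 0.0]), ("n", [0.0, 1.0]), ("a", [3.0, 5.0])]
min_dist = min_different_target_distance(states)
share = top_target_share(["n", "n", "n", "a"])
```

A value near zero for `min_dist` means at least two prompts with different targets are nearly indistinguishable in hidden space, so the output head has little to route on.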
v0.112 Evidence
Diagnostic run:
runs/transformer-answer-v0.112.0-branch-diversity-root-cause-profile-specific-memory-consolidation-step1-dim4-context80/.
The run consumes the v0.111 memory-consolidation plan, targets owner,
paraphrases, and glossary, keeps retrieval exact at 219/219, records
24 profile-specific missing-token attempts with 0 direct missing-token
acceptances and 8 fallbacks, and remains rejected on
branch_diversity_target.
The new root-cause diagnostic classifies the final failure as
target_routing_gap with critical severity. It records 9/9 failed
profiles, 3 collapsed profiles, 1 zero-coverage profile, 6 buried-target
profiles, and reused dominant tokens: "n" across 5 profiles and "a"
across 4 profiles. The worst profile is paraphrases: 0.0 target-token
coverage, predicted_unique: 1, and average target rank 22.5.
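The per-profile numbers above can be sketched under assumed definitions: target-token coverage is the fraction of a profile's distinct targets that appear among its top-1 predictions, predicted_unique is the number of distinct top-1 predictions, and average target rank is the mean 1-based rank of the target in the score ordering. The function name, input shape, and toy scores are hypothetical.

```python
def profile_metrics(examples):
    """examples: list of dicts with 'target' (token) and 'scores' (token -> logit)."""
    predicted = []
    ranks = []
    for ex in examples:
        ordered = sorted(ex["scores"], key=ex["scores"].get, reverse=True)
        predicted.append(ordered[0])                    # top-1 prediction
        ranks.append(ordered.index(ex["target"]) + 1)   # 1-based target rank
    targets = {ex["target"] for ex in examples}
    covered = targets & set(predicted)
    return {
        "target_token_coverage": len(covered) / len(targets),
        "predicted_unique": len(set(predicted)),
        "average_target_rank": sum(ranks) / len(ranks),
    }


# Toy profile (hypothetical): both examples collapse to "n".
metrics = profile_metrics([
    {"target": "a", "scores": {"n": 2.0, "a": 1.0, "b": 0.5}},
    {"target": "b", "scores": {"n": 3.0, "a": 1.0, "b": 0.0}},
])
```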
What External Work Suggests
| Source | What others do | QuarkLM implication |
|---|---|---|
| The Curious Case of Neural Text Degeneration | Shows that common decoding choices can produce bland or repetitive language even from strong likelihood-trained models. | Decoding diversity is not enough. QuarkLM needs branch diversity in the learned distribution before promotion. |
| Neural Text Generation with Unlikelihood Training | Penalizes undesirable tokens or sequences during training. | QuarkLM's unlikelihood variants can move the collapse token, but the v0.112 root-cause diagnostic shows that target routing remains broken. |
| Hugging Face generation configuration | Exposes repetition penalties, no-repeat n-gram controls, diversity penalties, sampling, beam settings, and other generation-time controls. | Sampling or penalties may become inference rails, but they cannot prove closed-world weight consolidation. |
| Hugging Face generation utilities | Exposes logits processors, processed score tensors, and optional hidden-state outputs for instrumentation. | v0.114 follows this diagnostic pattern by inspecting output-bias ranks, hidden projections, and prompt-to-branch hidden-state separation. |
| Diverse Beam Search | Adds diversity to beam decoding to avoid near-duplicate candidate outputs. | Useful later for candidate exploration; not a substitute for target-token coverage. |
| fairseq search mechanics | Implements search variants, including diversity-aware beam scoring. | Mature stacks keep search diversity separate from model learning, so QuarkLM should keep decoding diversity out of promotion claims. |
| Class-Balanced Loss | Reweights long-tailed classes using effective sample counts. | Audit profile/target imbalance before changing another objective. |
| Supervised Contrastive Learning | Separates representations by label. | Measure whether prompt states separate by branch target before adding output-head pressure. |
| OLMo and LLM360 | Release training code, data, checkpoints, evaluations, and intermediate artifacts. | Keep root-cause diagnostics and promotion decisions artifacted. |
| nanoGPT and minGPT | Use clean GPT mechanics, cross-entropy training, logits, validation loss, checkpointing, and sampling. | Structure references only; they do not directly solve QuarkLM's tiny closed-world branch gate. |
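For the Class-Balanced Loss row, the effective-sample-count reweighting can be sketched as below. The effective number (1 - beta^n) / (1 - beta) follows the published formulation; the function names, the toy target counts, and the choice to normalize weights to sum to the number of classes are assumptions for illustration.

```python
def effective_number(count, beta):
    """Effective number of samples: (1 - beta**count) / (1 - beta), for 0 <= beta < 1."""
    return (1.0 - beta ** count) / (1.0 - beta)


def class_balanced_weights(counts, beta=0.999):
    """Weights proportional to 1 / effective_number, scaled to sum to the class count."""
    raw = {tok: 1.0 / effective_number(n, beta) for tok, n in counts.items()}
    scale = len(raw) / sum(raw.values())
    return {tok: w * scale for tok, w in raw.items()}


# Toy target counts (hypothetical); beta=0.5 keeps the arithmetic easy to follow.
weights = class_balanced_weights({"n": 2, "a": 1}, beta=0.5)
```

Auditing these weights per profile before training shows how much of a dominant token's advantage could be explained by target frequency alone.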
Taxonomy
v0.112 adds branch_diversity_target.root_cause, v0.113 adds
branch_routing_audit, and v0.114 adds branch_logit_prior_profiles:
| Hypothesis | Meaning |
|---|---|
| global_output_prior_collapse | Multi-target profiles collapse to one shared dominant token. |
| profile_local_prediction_collapse | Profiles collapse, but not to one shared token. |
| target_routing_gap | At least one profile has zero target-token coverage. |
| target_rank_burial | Correct targets are usually outside the top-k set. |
| wrong_diversity_not_target_coverage | Predictions are diverse but miss the target tokens. |
| mixed_branch_diversity_gap | Multiple weaker failure modes appear together. |
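A rough sketch of how the table's conditions could be checked against per-profile summaries. It only flags which rows match and does not decide between overlapping hypotheses, because the source does not state how QuarkLM resolves them. The thresholds (top_k, "usually" read as a majority), the field names, and the toy profiles are all assumptions.

```python
def matching_hypotheses(profiles, top_k=5):
    """profiles: dicts with coverage, predicted_unique, dominant_token, average_target_rank."""
    if not profiles:
        raise ValueError("need at least one profile")
    flags = set()
    collapsed = [p for p in profiles if p["predicted_unique"] == 1]
    dominant = {p["dominant_token"] for p in collapsed}
    if len(collapsed) > 1 and len(dominant) == 1:
        flags.add("global_output_prior_collapse")
    elif len(dominant) > 1:
        flags.add("profile_local_prediction_collapse")
    if any(p["coverage"] == 0.0 for p in profiles):
        flags.add("target_routing_gap")
    buried = [p for p in profiles if p["average_target_rank"] > top_k]
    if len(buried) > len(profiles) / 2:  # "usually" read as a majority
        flags.add("target_rank_burial")
    if any(p["predicted_unique"] > 1 and p["coverage"] == 0.0 for p in profiles):
        flags.add("wrong_diversity_not_target_coverage")
    return sorted(flags)


# Toy profiles (hypothetical values, not taken from any run).
flags = matching_hypotheses([
    {"coverage": 0.0, "predicted_unique": 1, "dominant_token": "n", "average_target_rank": 22.5},
    {"coverage": 0.5, "predicted_unique": 1, "dominant_token": "a", "average_target_rank": 3.0},
    {"coverage": 0.5, "predicted_unique": 2, "dominant_token": "n", "average_target_rank": 8.0},
])
```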
Decision
The next repair should instrument the route from prompt evidence to branch target before adding another branch objective:
- Target hidden-projection contributions that make dominant tokens beat missing target tokens.
- Compare prompt-to-branch hidden-state separation for failed profiles.
- Separate zero-coverage profiles from buried-target profiles.
- Add candidate construction and sampling diagnostics for profile/target imbalance.
- Require both branch_diversity_target.root_cause and branch_routing_audit to improve without relaxing branch_diversity_target.