Research & results
We publish our losses.
The RSN research program produced one genuine positive result, one rigorously characterized negative result, and an evaluation protocol that keeps the field honest. All three are below — with the numbers that go against us shown at the same size as the ones that don't.
Training-free classification is competitive.
| Benchmark | RSN — zero training | Trained baseline | Synthesis time | Verdict |
|---|---|---|---|---|
| 20 Newsgroups (4-class); 1,732 train / 1,126 test docs | 87.0% ± 0.5% (ensemble, 9/10 seeds above baseline); 86.4% single-model | 86.1% (TF-IDF + tuned linear SVM) | ≈60 s CPU | ahead |
| Topical text (5-class); 105 train / 45 test docs | 68.9% (embedding-only synthesis) | 60.0% (TF-IDF + cosine) | ≈15 s CPU | ahead |
| Digit recognition (8×8); 1,437 train / 360 test images | 90.8% (zero gradient steps) | 96.4% (MLP trained with backpropagation) | < 1 s CPU | behind |
Read the loss row plainly: on 8×8 digits, a backpropagation-trained MLP remains 5.6 points ahead. RSN's digit result is reported as a zero-training achievement, not a victory. The 20 Newsgroups win is small (+0.9) but consistent — 9 of 10 seeds — and costs roughly a minute of CPU against a trained, tuned baseline.
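The margins quoted above can be re-derived from the table in a few lines. This is a sketch only: the accuracies and test-set sizes are copied from the table, the names `RESULTS` and `margin` are illustrative, and the item counts are approximate because the published percentages are rounded and the 20 Newsgroups figure is a mean over seeds.

```python
# Accuracies (percent) and test-set sizes copied from the results table.
# RESULTS and margin() are illustrative names, not part of the RSN package.
RESULTS = {
    "20 Newsgroups (4-class)": (87.0, 86.1, 1126),
    "Topical text (5-class)": (68.9, 60.0, 45),
    "Digit recognition (8x8)": (90.8, 96.4, 360),
}

def margin(rsn_acc, baseline_acc):
    """RSN accuracy minus baseline accuracy, in percentage points."""
    return round(rsn_acc - baseline_acc, 1)

for name, (rsn_acc, baseline_acc, n_test) in RESULTS.items():
    points = margin(rsn_acc, baseline_acc)
    # Approximate number of test items that the margin corresponds to.
    items = round(abs(points) / 100 * n_test)
    verdict = "ahead" if points > 0 else "behind"
    print(f"{name}: {points:+.1f} points, about {items} test items ({verdict})")
```

Expressed as item counts, the margins sit next to the test-set sizes, which range from 45 documents to 1,126.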
Zero-training generation hits the n-gram ceiling.
Twelve closed-form enrichments — synthesized transformers, kernel methods, latent state models, retrieval, caches — and none beat a well-tuned n-gram. Measured fairly in bits-per-byte on identical bytes, against a GPT-2 we ran ourselves:
Could more data raise the ceiling? We scaled training data 33× — to the full WikiText-103 — and watched the curve flatten:
Scaling training data 33× lowers bits-per-byte by only 0.085; the curve flattens at roughly 1.9, far above GPT-2's 1.04. The ceiling is representational, not data-limited.
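Bits-per-byte is what makes a byte-level n-gram and a subword model such as GPT-2 comparable. Each model's total negative log-likelihood over the evaluation text is converted to bits and divided by the number of UTF-8 bytes in that text, so the tokenizer drops out. A minimal sketch of the conversion, assuming per-token losses are available in nats (the function name and the toy inputs are illustrative, not taken from the benchmark package):

```python
import math

def bits_per_byte(token_nll_nats, text):
    """Convert per-token negative log-likelihoods (nats) to bits per UTF-8 byte."""
    n_bytes = len(text.encode("utf-8"))
    if n_bytes == 0:
        raise ValueError("text must contain at least one byte")
    total_bits = sum(token_nll_nats) / math.log(2)  # nats -> bits
    return total_bits / n_bytes

text = "abcdefgh"  # 8 bytes of toy evaluation text

# Toy byte-level model: uniform over 256 byte values, one prediction per byte.
byte_level = bits_per_byte([math.log(256)] * 8, text)

# Toy subword model: two tokens covering the same 8 bytes, 4 bits each.
subword = bits_per_byte([math.log(16)] * 2, text)

print(byte_level, subword)
```

The two toy models make a different number of predictions over the same text, yet both scores are normalized by the same 8 bytes, which is the property that lets models with different tokenizers share one axis.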
Synthesis is a floor, not a shortcut to training.
Could synthesized weights at least warm-start a trained model? We ran the controlled ablation. Only the trivial unigram signal helped, and only marginally (2.72 vs. 2.74 bits/byte for the random control). The spectral embeddings actively hurt, and every trained variant finished well short of the 2.12 bits/byte that the pure n-gram reaches with zero gradient steps (lower is better). We report this null result because the temptation to over-claim here is exactly what our field suffers from.
| Initialization (byte GRU, equal budget) | Final bits/byte |
|---|---|
| Random (control) | 2.74 |
| Output bias = KN unigram | 2.72 |
| Embeddings = PPMI+SVD | 2.81 |
| Both (full synthesis) | 2.83 |
| Pure RSN n-gram — zero gradient steps | 2.12 |
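The "output bias = unigram" row can be pictured as follows. Before any gradient step, a byte model whose logits are all zero predicts a uniform distribution over the 256 byte values (8 bits/byte); setting the output bias to log unigram probabilities starts it at roughly the unigram entropy instead. The sketch below illustrates only that starting point. It uses add-one smoothing as a stand-in for the Kneser-Ney unigram named in the table, it does not model the GRU, and all function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter

def unigram_log_bias(data, alpha=1.0):
    """Log of add-alpha smoothed byte frequencies, one entry per byte value.

    Assumption: add-alpha smoothing stands in for the Kneser-Ney unigram.
    """
    counts = Counter(data)  # iterating over bytes yields ints in 0..255
    total = len(data) + alpha * 256
    return [math.log((counts.get(b, 0) + alpha) / total) for b in range(256)]

def initial_bits_per_byte(bias, data):
    """Bits/byte of a model whose only non-zero logits are the output bias."""
    log_z = math.log(sum(math.exp(v) for v in bias))  # softmax normalizer
    nll_nats = -sum(bias[b] - log_z for b in data)
    return nll_nats / (math.log(2) * len(data))

data = b"abracadabra" * 20  # toy corpus, for illustration only

zero_bias = initial_bits_per_byte([0.0] * 256, data)
unigram_bias = initial_bits_per_byte(unigram_log_bias(data), data)

print(f"zero bias: {zero_bias:.2f} bits/byte")
print(f"unigram bias: {unigram_bias:.2f} bits/byte")
```

The comparison here is between starting points only; the table above reports what remains of that head start after training on an equal budget.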
The claims ledger
What we claim. What we refuse to.
- ✓ Training-free classification competitive with trained linear baselines: 87.0% on 20 Newsgroups vs. a tuned SVM's 86.1%, in about a minute of CPU.
- ✓ A clean, mechanistically explained negative result: closed-form synthesis of a text generator cannot exceed the n-gram ceiling.
- ✓ An honest, tokenizer-agnostic evaluation protocol: bits-per-byte against a locally measured GPT-2.
- ✓ Synthesis as a cheap, strong floor: a ~2.0 bits-per-byte language model for zero gradient steps and seconds of CPU.
- ✕ Beating trained deep networks everywhere: on 8×8 digits RSN reaches 90.8% where a backprop MLP reaches 96.4%.
- ✕ A zero-training GPT-2-grade generator: there is a real ~1.9× bits-per-byte gap that closed-form statistics do not close.
- ✕ That an n-gram model is an LLM, or that synthesis accelerates the path to the trained frontier (our controlled ablation found no meaningful speed-up).
Sources: the RSN paper (Tables 1–6), the adversarial peer review, and the reproducible benchmark package — every number regenerable on a CPU.