Research & results

We publish our losses.

The RSN research program produced one genuine positive result, one rigorously characterized negative result, and an evaluation protocol that keeps the field honest. All three are below — with the numbers that go against us shown at the same size as the ones that don't.

Result 1 · positive

Training-free classification is competitive.

Benchmark	RSN (zero training)	Trained baseline	Synthesis time	Verdict
20 Newsgroups (4-class; 1,732 train / 1,126 test docs)	87.0% ± 0.5% (ensemble; 9/10 seeds above baseline); 86.4% (single model)	86.1% (TF-IDF + tuned linear SVM)	≈60 s CPU	ahead
Topical text (5-class; 105 train / 45 test docs)	68.9% (embedding-only synthesis)	60.0% (TF-IDF + cosine)	≈15 s CPU	ahead
Digit recognition (8×8; 1,437 train / 360 test images)	90.8% (zero gradient steps)	96.4% (MLP trained with backpropagation)	< 1 s CPU	behind

Read the loss row plainly: on 8×8 digits, a backpropagation-trained MLP remains 5.6 points ahead. RSN's digit result is reported as a zero-training achievement, not a victory. The 20 Newsgroups win is small (+0.9) but consistent — 9 of 10 seeds — and costs roughly a minute of CPU against a trained, tuned baseline.

Result 2 · negative — published on purpose

Zero-training generation hits the n-gram ceiling.

We tried twelve closed-form enrichments (synthesized transformers, kernel methods, latent state models, retrieval, caches), and none beat a well-tuned n-gram. Every model below is measured in bits-per-byte on identical bytes, against a GPT-2 we ran ourselves:

Model	Size	Training	Bits/byte
RSN byte n-gram, order 4 (ours)	24k n-grams	none (zero training)	2.48
RSN byte n-gram, order 6 (ours)	348k n-grams	none (zero training)	2.02
RSN byte n-gram, order 8 (ours)	1.64M n-grams	none (zero training)	1.97
GPT-2 small (measured locally)	124M params	gradient descent	1.04
GPT-2 medium (measured locally)	355M params	gradient descent	0.95
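
Bits-per-byte puts a byte-level n-gram and a subword-tokenized model on the same scale, because the denominator is the byte count of the text rather than a tokenizer-dependent token count. A minimal sketch of the standard conversion, assuming the model's summed negative log-likelihood over the text is available in nats; the function name and the toy numbers are illustrative and are not taken from the RSN benchmark package:

```python
import math


def bits_per_byte(total_nll_nats: float, text: str) -> float:
    """Convert a summed negative log-likelihood (nats) to bits per UTF-8 byte."""
    n_bytes = len(text.encode("utf-8"))
    if n_bytes == 0:
        raise ValueError("text must contain at least one byte")
    # nats -> bits is a division by ln(2); normalizing by bytes (not tokens)
    # is what makes the number comparable across tokenizers.
    return total_nll_nats / (math.log(2) * n_bytes)


# Hypothetical example: a model assigns the 5-byte string "hello"
# a total loss of 10 bits, expressed here in nats.
toy_nll_nats = 10 * math.log(2)
print(round(bits_per_byte(toy_nll_nats, "hello"), 6))
```

The same function applies whether the loss was accumulated per byte or per subword token, as long as it is summed over exactly the same text.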

The 100-million-token stress test

Could more data raise the ceiling? We scaled training data 33× — to the full WikiText-103 — and watched the curve flatten:

Training data	Bits/byte
16 MB	1.989
56 MB	1.961
152 MB	1.929
304 MB	1.912
531 MB (full WikiText-103)	1.904

Scaling training data 33× lowers bits-per-byte by only 0.085 (from 1.989 to 1.904). The curve flattens near 1.9, far above GPT-2 small's 1.04. The ceiling is representational, not data-limited.
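
The arithmetic behind that reading can be checked directly from the five reported points. The sketch below uses only numbers stated in this section; the variable names are ours for illustration:

```python
# (training data in MB, bits-per-byte), as reported in the stress test
curve = [(16, 1.989), (56, 1.961), (152, 1.929), (304, 1.912), (531, 1.904)]
GPT2_SMALL_BPB = 1.04  # GPT-2 small, measured locally (reported above)

scale = curve[-1][0] / curve[0][0]          # how much more data
total_drop = curve[0][1] - curve[-1][1]     # total improvement in bits/byte
# Improvement gained at each successive data increase
step_drops = [round(a[1] - b[1], 3) for a, b in zip(curve, curve[1:])]
remaining_gap = curve[-1][1] - GPT2_SMALL_BPB

print(f"data scale-up:       {scale:.1f}x")
print(f"total drop:          {total_drop:.3f} bits/byte")
print(f"per-step drops:      {step_drops}")
print(f"gap to GPT-2 small:  {remaining_gap:.3f} bits/byte")
```

The whole 33× scale-up recovers 0.085 bits/byte, while 0.864 bits/byte still separates the largest run from GPT-2 small, roughly ten times what the extra data bought.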

Result 3 · controlled null

Synthesis is a floor, not a shortcut to training.

Could synthesized weights at least warm-start a trained model? We ran the controlled ablation. Only the trivial unigram signal helped, and only marginally (2.72 vs. 2.74 for the random control); the spectral embeddings actively hurt (2.81 alone, 2.83 combined). Every trained variant finished with far higher, that is worse, bits-per-byte than the 2.12 that the pure n-gram provides for zero gradient steps. We report this null result because the temptation to over-claim here is exactly what our field suffers from.

Initialization (byte GRU, equal budget)	Final bits/byte
Random (control)	2.74
Output bias = KN unigram	2.72
Embeddings = PPMI+SVD	2.81
Both (full synthesis)	2.83
Pure RSN n-gram — zero gradient steps	2.12
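
To make the "output bias = unigram" row concrete: setting a softmax layer's bias to log unigram probabilities makes the untrained model predict the unigram distribution whenever the rest of its logits are zero. The sketch below is an assumption-laden illustration, not the RSN code. It swaps the Kneser-Ney unigram named in the table for plain add-alpha smoothing, and the function names and toy corpus are hypothetical:

```python
import math
from collections import Counter


def unigram_log_bias(data: bytes, alpha: float = 1.0) -> list[float]:
    """256 log-probabilities from add-alpha smoothed byte counts.

    Simplification: add-alpha smoothing stands in for the Kneser-Ney
    unigram used in the ablation.
    """
    counts = Counter(data)
    total = len(data) + alpha * 256
    return [math.log((counts.get(b, 0) + alpha) / total) for b in range(256)]


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


corpus = b"the cat sat on the mat"  # toy corpus, hypothetical
bias = unigram_log_bias(corpus)

# With all other logits at zero, the output distribution is softmax(bias),
# which recovers the smoothed unigram probabilities.
probs = softmax(bias)
print(round(sum(probs), 6), probs[ord(" ")] > probs[ord("z")])
```

This is the cheapest of the synthesized initializations and the only one that the table shows helping, by 0.02 bits/byte.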

The claims ledger

What we claim. What we refuse to.

Claimed — supported by the numbers

✓ Training-free classification competitive with trained linear baselines: 87.0% on 20 Newsgroups vs a tuned SVM's 86.1%, in about a minute of CPU.
✓ A clean, mechanistically explained negative result: closed-form synthesis of a text generator cannot exceed the n-gram ceiling.
✓ An honest, tokenizer-agnostic evaluation protocol: bits-per-byte against a locally measured GPT-2.
✓ Synthesis as a cheap, strong floor: a ~2.0 bits-per-byte language model for zero gradient steps and seconds of CPU.
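
For readers who want to see what a zero-gradient-step floor looks like in code, here is a minimal byte n-gram sketch. It is not the RSN implementation: the class name, the fixed interpolation weight `lam`, and the recursive smoothing scheme are assumptions chosen to keep the example short, and a toy corpus will not reproduce the numbers above.

```python
import math
from collections import defaultdict


class ByteNGram:
    """Count-based byte n-gram with recursive interpolation; no gradient steps."""

    def __init__(self, order: int = 4, lam: float = 1.0):
        self.order = order  # an order-n model conditions on n-1 previous bytes
        self.lam = lam      # weight of the lower-order estimate (assumed value)
        self.counts = defaultdict(lambda: defaultdict(int))  # context -> byte -> count
        self.totals = defaultdict(int)                       # context -> count

    def fit(self, data: bytes) -> "ByteNGram":
        for i, b in enumerate(data):
            for k in range(self.order):  # context lengths 0 .. order-1
                if i - k < 0:
                    break
                ctx = data[i - k:i]
                self.counts[ctx][b] += 1
                self.totals[ctx] += 1
        return self

    def prob(self, ctx: bytes, b: int) -> float:
        p = 1.0 / 256  # uniform base distribution over bytes
        max_k = min(len(ctx), self.order - 1)
        for k in range(max_k + 1):  # shortest context first
            c = ctx[len(ctx) - k:]
            total = self.totals.get(c, 0)
            count = self.counts[c].get(b, 0) if c in self.counts else 0
            # Blend this level's counts with the lower-order estimate;
            # the result still sums to 1 over all 256 bytes.
            p = (count + self.lam * p) / (total + self.lam)
        return p

    def bits_per_byte(self, data: bytes) -> float:
        if not data:
            raise ValueError("data must be non-empty")
        bits = 0.0
        for i, b in enumerate(data):
            ctx = data[max(0, i - (self.order - 1)):i]
            bits -= math.log2(self.prob(ctx, b))
        return bits / len(data)


model = ByteNGram(order=3).fit(b"abababababab")
print(model.prob(b"ab", ord("a")) > model.prob(b"ab", ord("b")))
```

"Training" here is a single counting pass over the bytes, which is why synthesis time is measured in seconds of CPU rather than in optimizer steps.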

Not claimed — and we'll say it first

✕ Beating trained deep networks everywhere: on 8×8 digits RSN reaches 90.8% where a backprop MLP reaches 96.4%.
✕ A zero-training GPT-2-grade generator: there is a real ~1.9× bits-per-byte gap that closed-form statistics do not close.
✕ That an n-gram model is an LLM, or that synthesis accelerates the path to the trained frontier (our controlled ablation found no meaningful speed-up).

Sources: the RSN paper (Tables 1–6), the adversarial peer review, and the reproducible benchmark package — every number regenerable on a CPU.