Siddha

Research & results

We publish our losses.

The RSN research program produced one genuine positive result, one rigorously characterized negative result, and an evaluation protocol that keeps the field honest. All three are below — with the numbers that go against us shown at the same size as the ones that don't.

Result 1 · positive

Training-free classification is competitive.

Benchmark | RSN (zero training) | Trained baseline | Synthesis time | Verdict
20 Newsgroups (4-class); 1,732 train / 1,126 test docs | 87.0% ± 0.5% (ensemble, 9/10 seeds above baseline; 86.4% single-model) | 86.1% (TF-IDF + tuned linear SVM) | ≈60 s CPU | ahead
Topical text (5-class); 105 train / 45 test docs | 68.9% (embedding-only synthesis) | 60.0% (TF-IDF + cosine) | ≈15 s CPU | ahead
Digit recognition (8×8); 1,437 train / 360 test images | 90.8% (zero gradient steps) | 96.4% (MLP trained with backpropagation) | < 1 s CPU | behind

Read the loss row plainly: on 8×8 digits, a backpropagation-trained MLP remains 5.6 points ahead. RSN's digit result is reported as a zero-training achievement, not a victory. The 20 Newsgroups win is small (+0.9) but consistent — 9 of 10 seeds — and costs roughly a minute of CPU against a trained, tuned baseline.
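To put the margins in context, the sketch below recomputes each row's gap from the table and converts it into an approximate number of test items. It assumes accuracy is simply correct items divided by test-set size; the `ROWS` list and `margin_report` helper are illustrative names, not part of the RSN benchmark package. The 20 Newsgroups figure is a mean over seeds, so its item count is only a rough guide.

```python
# Accuracy figures and test-set sizes copied from the table above.
# (name, RSN accuracy %, baseline accuracy %, number of test items)
ROWS = [
    ("20 Newsgroups (4-class)", 87.0, 86.1, 1126),
    ("Topical text (5-class)", 68.9, 60.0, 45),
    ("Digit recognition (8x8)", 90.8, 96.4, 360),
]


def margin_report(rows):
    """Return (name, margin in points, points per test item, approx. items)."""
    report = []
    for name, rsn, baseline, n_test in rows:
        margin = round(rsn - baseline, 1)
        # Accuracy change caused by a single test item flipping.
        one_item = 100.0 / n_test
        # Assumption: accuracy = correct / n_test, so margin / one_item
        # approximates how many test items separate the two systems.
        approx_items = round(margin / one_item)
        report.append((name, margin, round(one_item, 2), approx_items))
    return report


if __name__ == "__main__":
    for name, margin, one_item, items in margin_report(ROWS):
        print(f"{name}: {margin:+.1f} points, "
              f"1 item = {one_item} points, about {items:+d} items")
```

Under that assumption, the 5-class lead corresponds to about 4 of 45 test documents, and the digits deficit to about 20 of 360 images, so the smaller benchmark is best read as directional.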

Result 2 · negative — published on purpose

Zero-training generation hits the n-gram ceiling.

Twelve closed-form enrichments — synthesized transformers, kernel methods, latent state models, retrieval, caches — and none beat a well-tuned n-gram. Measured fairly in bits-per-byte on identical bytes, against a GPT-2 we ran ourselves:

Model | Training | Size | Bits/byte
RSN byte n-gram, order 4 (ours) | none (zero training) | 24k n-grams | 2.48
RSN byte n-gram, order 6 (ours) | none (zero training) | 348k n-grams | 2.02
RSN byte n-gram, order 8 (ours) | none (zero training) | 1.64M n-grams | 1.97
GPT-2 small (measured locally) | gradient descent | 124M params | 1.04
GPT-2 medium (measured locally) | gradient descent | 355M params | 0.95
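As a rough illustration of how a bits-per-byte comparison like this can be computed, here is a minimal sketch of a byte n-gram scorer. It is not the RSN implementation: the smoothing is a simple Witten-Bell-style interpolation chosen for brevity, and every function name is hypothetical. The last helper shows the standard conversion that lets a token-level model be scored on the same bytes: sum its negative log-likelihood in nats over the text, then divide by ln 2 times the number of bytes.

```python
import math
from collections import Counter, defaultdict


def train_counts(data: bytes, order: int):
    """Count next-byte occurrences for contexts of length 0 .. order-1."""
    counts = [defaultdict(Counter) for _ in range(order)]
    for i in range(len(data)):
        for k in range(order):
            if i - k < 0:
                break
            counts[k][data[i - k:i]][data[i]] += 1
    return counts


def prob(counts, context: bytes, byte: int) -> float:
    """Interpolate from a uniform prior up to the longest seen context.

    Assumption: Witten-Bell-style weights, p = (c + T * p_lower) / (n + T),
    where n is the context count and T the number of distinct followers.
    """
    p = 1.0 / 256  # uniform prior over byte values
    for k in range(len(counts)):
        if k > len(context):
            break
        seen = counts[k].get(context[len(context) - k:])
        if not seen:
            break
        n, t = sum(seen.values()), len(seen)
        p = (seen[byte] + t * p) / (n + t)
    return p


def bits_per_byte(counts, test: bytes) -> float:
    """Average negative log2-probability per byte of the test text."""
    max_ctx = len(counts) - 1
    total_bits = 0.0
    for i in range(len(test)):
        context = test[max(0, i - max_ctx):i]
        total_bits -= math.log2(prob(counts, context, test[i]))
    return total_bits / len(test)


def token_nll_to_bits_per_byte(total_nll_nats: float, n_bytes: int) -> float:
    """Convert a token model's summed NLL (nats) to bits per byte of text."""
    return total_nll_nats / (math.log(2) * n_bytes)


if __name__ == "__main__":
    corpus = b"the quick brown fox jumps over the lazy dog. " * 200
    model = train_counts(corpus[:6000], order=4)
    print(round(bits_per_byte(model, corpus[6000:]), 3))
```

Because both quantities are normalized by the byte count of the same text, the comparison does not depend on either model's tokenizer.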

The 100-million-token stress test

Could more data raise the ceiling? We scaled training data 33× — to the full WikiText-103 — and watched the curve flatten:

Training data | Bits/byte
16 MB | 1.989
56 MB | 1.961
152 MB | 1.929
304 MB | 1.912
531 MB (full WikiText-103) | 1.904

Scaling training data 33× lowers bits-per-byte by only 0.085 (from 1.989 to 1.904); the curve flattens toward roughly 1.9, far above GPT-2 small's 1.04. The ceiling is representational, not data-limited.
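The arithmetic behind that summary can be checked directly from the five measured points. The sketch below recomputes the data ratio, the total drop, and the gain per doubling of data on each segment; `CURVE` and `scaling_summary` are illustrative names, and the values are copied from the stress-test figures above.

```python
import math

# (training data in MB, bits/byte), copied from the stress-test figures.
CURVE = [(16, 1.989), (56, 1.961), (152, 1.929), (304, 1.912), (531, 1.904)]


def scaling_summary(curve):
    """Summarize how much bits/byte improves as training data grows."""
    (mb_first, bpb_first), (mb_last, bpb_last) = curve[0], curve[-1]
    per_doubling = []
    for (a_mb, a_bpb), (b_mb, b_bpb) in zip(curve, curve[1:]):
        doublings = math.log2(b_mb / a_mb)  # doublings of data on this segment
        per_doubling.append((a_bpb - b_bpb) / doublings)
    return {
        "data_ratio": mb_last / mb_first,
        "total_drop": bpb_first - bpb_last,
        "per_doubling": per_doubling,
    }


if __name__ == "__main__":
    summary = scaling_summary(CURVE)
    print(f"data ratio: {summary['data_ratio']:.1f}x")
    print(f"total drop: {summary['total_drop']:.3f} bits/byte")
    for gain in summary["per_doubling"]:
        print(f"gain per doubling: {gain:.4f} bits/byte")
```

On these figures the final segment (304 MB to 531 MB) yields the smallest gain per doubling, about 0.01 bits/byte, which is what a flattening curve looks like numerically.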

Result 3 · controlled null

Synthesis is a floor, not a shortcut to training.

Could synthesized weights at least warm-start a trained model? We ran the controlled ablation. Only the trivial unigram signal helped, and only marginally (2.72 vs. 2.74 bits/byte for random initialization); the spectral embeddings actively hurt (2.81); and every trained variant stayed far above the n-gram floor (2.12) that synthesis provides for free. We report this null result because the temptation to over-claim here is exactly what our field suffers from.

Initialization (byte GRU, equal budget) | Final bits/byte
Random (control) | 2.74
Output bias = KN unigram | 2.72
Embeddings = PPMI+SVD | 2.81
Both (full synthesis) | 2.83
Pure RSN n-gram (zero gradient steps) | 2.12
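To make the "output bias = unigram" row concrete, here is a minimal sketch of the general idea: set the output layer's bias to the log of a unigram byte distribution, so that a model whose other output weights contribute nothing predicts that distribution before any gradient step. This is an illustration under stated assumptions, not the ablation's code: it uses add-one smoothing in place of the Kneser-Ney unigram named in the table, assumes a 256-way byte vocabulary, and all function names are hypothetical.

```python
import math
from collections import Counter


def unigram_log_bias(data: bytes, vocab_size: int = 256, add: float = 1.0):
    """Log-probabilities of an add-`add` smoothed byte unigram model.

    Assumption: plain additive smoothing stands in for the Kneser-Ney
    unigram used in the ablation.
    """
    counts = Counter(data)
    total = len(data) + add * vocab_size
    return [math.log((counts[b] + add) / total) for b in range(vocab_size)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


if __name__ == "__main__":
    bias = unigram_log_bias(b"hello world, hello bytes")
    # With all other output weights at zero, the logits equal the bias,
    # so the predicted distribution is the smoothed unigram itself.
    probs = softmax(bias)
    print(round(probs[ord("l")], 4), round(sum(probs), 6))
```

The design point is that this initialization only hands the network the byte frequencies it would learn within the first few updates anyway, which is consistent with the small difference the table reports for that row.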

The claims ledger

What we claim. What we refuse to.

Claimed — supported by the numbers
  • Training-free classification competitive with trained linear baselines — 87.0% on 20 Newsgroups vs a tuned SVM's 86.1%, in about a minute of CPU.
  • A clean, mechanistically explained negative result: closed-form synthesis of a text generator cannot exceed the n-gram ceiling.
  • An honest, tokenizer-agnostic evaluation protocol — bits-per-byte against a locally measured GPT-2.
  • Synthesis as a cheap, strong floor: a ~2.0 bits-per-byte language model for zero gradient steps and seconds of CPU.
Not claimed — and we'll say it first
  • Beating trained deep networks everywhere — on 8×8 digits RSN reaches 90.8% where a backprop MLP reaches 96.4%.
  • A zero-training GPT-2-grade generator — there is a real ~1.9× bits-per-byte gap that closed-form statistics do not close.
  • That an n-gram model is an LLM, or that synthesis accelerates the path to the trained frontier (our controlled ablation found no meaningful speed-up).

Sources: the RSN paper (Tables 1–6), the adversarial peer review, and the reproducible benchmark package — every number regenerable on a CPU.