The science

Every weight, computed.
Not learned.

A Reverse Synthetic Network contains no loss function, no .backward(), no optimizer, and no gradient loop — verified by source inspection. Here is what it does instead, in plain language.

Fig. 2

Conventional training runs the river uphill

A standard network is an empty architecture filled with random numbers. Gradient descent then pushes information backward through it, millions of times, until the numbers fit. The intelligence arrives by erosion — expensive, slow, and opaque.

Fig. 3

Synthesis lets the data flow downhill

An RSN starts from the statistics the data already contains — co-occurrence, covariance, discriminant geometry, spectral structure — and computes the weights directly from them. One pass. Deterministic. The same corpus always grows the same model.

The four pillars

Neuron synthesis

Features are discovered, not designed.

Candidate features are scored by Fisher discriminant ratio; prototypes grow by recursive principal-direction splitting. The data decides what the network's neurons are.
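To make the scoring step concrete, here is a minimal sketch of a per-feature Fisher discriminant ratio: between-class variance divided by within-class variance. The function name `fisher_scores`, the `eps` guard, and the toy data are illustrative assumptions, not the RSN source.

```python
import numpy as np

def fisher_scores(X, y, eps=1e-12):
    """Per-feature Fisher ratio: between-class scatter over within-class scatter."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]
        n_c = Xc.shape[0]
        between += n_c * (Xc.mean(axis=0) - overall_mean) ** 2
        within += n_c * Xc.var(axis=0)
    # eps (an assumption) avoids division by zero for constant features.
    return between / (within + eps)

# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
X = np.array([[0.0, 1.0], [0.2, 3.0], [0.1, 2.0],
              [5.0, 1.0], [5.2, 3.0], [5.1, 2.0]])
y = np.array([0, 0, 0, 1, 1, 1])

scores = fisher_scores(X, y)
ranking = np.argsort(scores)[::-1]  # highest-scoring feature first
```

In this toy case feature 0 ranks first, because the class means differ on it while the within-class spread stays small.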

Interconnection

Topology from statistics.

An 11-stage pipeline — similarity graphs, label propagation, spectral communities, hub/authority scores — computes how neurons connect and how much each one's vote counts.

Transformer synthesis

Every weight in closed form.

Embeddings from PPMI + SVD, attention heads from discriminant directions and cross-covariance, feed-forward maps from PCA — a complete transformer with no loss function and no gradient.

Convergence

Refinement without training.

Statistical re-weighting, spectral error correction, and an ensemble across embedding geometries — the layer that lifts 20 Newsgroups from 86.3% to 87.0%.

The mathematics, translated

Five ideas carry the whole system.

PPMI + SVD embeddings

Which words keep which company, compressed into coordinates. Closely related to what word2vec learns by gradient descent: Levy & Goldberg (2014) showed that skip-gram with negative sampling implicitly factorizes a shifted PMI matrix. Here the factorization is obtained by pure linear algebra.
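As a rough illustration, the sketch below builds PPMI from a small word-by-context count matrix and takes a truncated SVD. The counts, the helper names, and the square-root weighting of the singular values are assumptions chosen for the example; the page does not specify those details.

```python
import numpy as np

def ppmi_svd_embeddings(cooc, dim):
    """Embeddings from positive PMI followed by truncated SVD."""
    cooc = np.asarray(cooc, dtype=float)
    total = cooc.sum()
    row = cooc.sum(axis=1, keepdims=True)
    col = cooc.sum(axis=0, keepdims=True)
    expected = row @ col / total  # counts expected under independence
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(cooc / expected)
        # Keep only positive, finite PMI values; everything else becomes 0.
        ppmi = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)
    U, S, _ = np.linalg.svd(ppmi, full_matrices=False)
    # Weighting by sqrt(S) is one common choice (an assumption here).
    return U[:, :dim] * np.sqrt(S[:dim])

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical counts. Rows: cat, dog, car, bus. Columns: four context words.
cooc = np.array([[9, 8, 0, 1],
                 [8, 9, 1, 0],
                 [0, 1, 9, 8],
                 [1, 0, 8, 9]])

emb = ppmi_svd_embeddings(cooc, dim=2)
same_topic = cosine(emb[0], emb[1])   # cat vs dog
cross_topic = cosine(emb[0], emb[2])  # cat vs car
```

Words that share contexts (cat and dog) end up with nearly parallel vectors, while words from different context groups (cat and car) end up nearly orthogonal.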

Fisher discriminant directions

The axes along which the classes separate most cleanly — used to score features and to point attention heads at what matters.
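For two classes, the Fisher direction has a textbook closed form: the inverse within-class scatter applied to the difference of class means. The sketch below shows that form; the function name, the small `ridge` term, and the toy points are assumptions for illustration only.

```python
import numpy as np

def fisher_direction(X0, X1, ridge=1e-6):
    """Unit vector w proportional to Sw^{-1} (mu1 - mu0) for two classes."""
    X0 = np.asarray(X0, dtype=float)
    X1 = np.asarray(X1, dtype=float)
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: summed outer products of centered samples.
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    # Small ridge (an assumption) keeps the solve stable if Sw is near-singular.
    Sw = Sw + ridge * np.eye(Sw.shape[0])
    w = np.linalg.solve(Sw, mu1 - mu0)
    return w / np.linalg.norm(w)

# Toy classes: wide spread along x, separated along y.
X0 = np.array([[-2.0, 0.0], [0.0, 0.1], [2.0, -0.1]])
X1 = np.array([[-2.0, 1.0], [0.0, 1.1], [2.0, 0.9]])

w = fisher_direction(X0, X1)
proj0, proj1 = X0 @ w, X1 @ w  # 1-D projections onto the discriminant axis
```

The direction points almost entirely along y, the axis that separates the classes, and ignores the high-variance x axis that carries no class information.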

Cross-covariance attention

How positions in a sequence statistically co-vary at different distances — synthesized directly into attention weight matrices.
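One way to picture this is a lagged cross-covariance matrix: how the embedding at position t co-varies with the embedding at position t + d. The sketch below computes that statistic and then uses it as a bilinear attention score. The scoring step, the function names, and the toy sequence are all assumptions; the page does not say how the statistic is mapped into attention weights.

```python
import numpy as np

def lagged_cross_covariance(seqs, lag):
    """Mean of (x_t - mu)(x_{t+lag} - mu)^T over every valid position t."""
    seqs = [np.asarray(s, dtype=float) for s in seqs]
    mu = np.concatenate(seqs, axis=0).mean(axis=0)
    dim = mu.shape[0]
    acc = np.zeros((dim, dim))
    count = 0
    for seq in seqs:
        n_pairs = len(seq) - lag
        if n_pairs <= 0:
            continue  # sequence too short for this lag
        centered = seq - mu
        acc += centered[:n_pairs].T @ centered[lag:]
        count += n_pairs
    return acc / max(count, 1), mu

def bilinear_score(query, key, C, mu):
    """Hypothetical attention score: (q - mu)^T C (k - mu)."""
    return float((query - mu) @ C @ (key - mu))

# Toy corpus: two token embeddings that strictly alternate.
a, b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
seqs = [np.array([a, b, a, b, a, b])]

C1, mu = lagged_cross_covariance(seqs, lag=1)
C2, _ = lagged_cross_covariance(seqs, lag=2)
score_ab = bilinear_score(a, b, C1, mu)  # a attending to b at distance 1
score_aa = bilinear_score(a, a, C1, mu)  # a attending to a at distance 1
```

Because the tokens alternate, the lag-1 statistic favors the other token and the lag-2 statistic favors the same token, so a head built from each lag would attend to a different distance pattern.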

Spectral interconnection

Graph structure — communities, hubs, anomalies — computed from neuron similarity, deciding how much each neuron's vote counts.
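A small sketch of two of these graph computations, under stated assumptions: communities from the sign of the Fiedler vector of the graph Laplacian, and a hub-style score from the principal eigenvector of the similarity matrix (on a symmetric graph, hub and authority scores coincide with this eigenvector centrality). The six-node graph and the function names are hypothetical, and this is not the 11-stage pipeline itself.

```python
import numpy as np

def spectral_bipartition(W):
    """Split nodes by the sign of the Fiedler vector of L = D - W."""
    W = np.asarray(W, dtype=float)
    L = np.diag(W.sum(axis=1)) - W
    _, eigvecs = np.linalg.eigh(L)  # eigenvalues come back in ascending order
    fiedler = eigvecs[:, 1]         # eigenvector of the second-smallest eigenvalue
    return fiedler >= 0

def eigenvector_centrality(W):
    """Principal eigenvector of a symmetric similarity matrix, normalized to sum 1."""
    W = np.asarray(W, dtype=float)
    _, eigvecs = np.linalg.eigh(W)
    principal = np.abs(eigvecs[:, -1])  # abs() removes the arbitrary sign
    return principal / principal.sum()

# Toy similarity graph: two tight triangles joined by one weak edge (2-3).
W = np.array([
    [0, 1, 1, 0,   0, 0],
    [1, 0, 1, 0,   0, 0],
    [1, 1, 0, 0.1, 0, 0],
    [0, 0, 0.1, 0, 1, 1],
    [0, 0, 0, 1,   0, 1],
    [0, 0, 0, 1,   1, 0],
], dtype=float)

community = spectral_bipartition(W)
centrality = eigenvector_centrality(W)
```

The split recovers the two triangles, and the two bridge nodes receive slightly higher centrality than their neighbors. Scores like these could then serve as vote weights.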

Ensemble convergence

Several synthesized views of the same data, at different embedding geometries, combined by log-odds — the step that lifts 20 Newsgroups past the SVM.
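The combination step can be sketched as summing per-class log-probabilities across views and renormalizing, which for two classes reduces to adding log-odds. The function name, the optional `weights`, and the three toy probability vectors are assumptions; the page does not give the exact combination rule.

```python
import numpy as np

def combine_log_odds(prob_views, weights=None, eps=1e-12):
    """Combine class probabilities from several views in log space."""
    P = np.clip(np.asarray(prob_views, dtype=float), eps, 1.0)  # (views, classes)
    if weights is None:
        weights = np.ones(P.shape[0])
    weights = np.asarray(weights, dtype=float)
    logits = (weights[:, None] * np.log(P)).sum(axis=0)
    logits -= logits.max()  # subtract the max for numerical stability
    combined = np.exp(logits)
    return combined / combined.sum()

# Hypothetical outputs of three views over three classes.
views = [
    [0.5, 0.4, 0.1],
    [0.3, 0.6, 0.1],
    [0.4, 0.5, 0.1],
]

combined = combine_log_odds(views)
predicted_class = int(np.argmax(combined))
```

The first view on its own prefers class 0, but the combined evidence favors class 1, and the class that every view considers unlikely is pushed close to zero.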

Why the name

“Reverse Synthetic Network” — earned, word by word.

Reverse

The usual flow runs architecture → random weights → data nudges them by gradients. Here the flow is inverted: the data comes first and constructs the network. Nothing about the model precedes the data — not the features, not the neurons, not the connections.

Synthetic

Every component is synthesized in closed form — computed, not learned. In the digit engine, neurons are literally statistical condensates of training samples: each class is recursively split along its principal directions of variation, and each split's centroid becomes a neuron. That is why our demo can show you the neuron your drawing matched, as a picture.
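The recursive splitting described above can be sketched in a few lines. This version splits one class's samples along their first principal direction and returns the centroids of the final pieces. Whether intermediate centroids are also kept, the stopping rule (`max_depth`, `min_size`), and the toy data are assumptions made for the example, not details of the digit engine.

```python
import numpy as np

def split_into_prototypes(X, max_depth=2, min_size=2):
    """Recursively split samples along their principal direction; return leaf centroids."""
    X = np.asarray(X, dtype=float)
    centroid = X.mean(axis=0)
    if max_depth == 0 or len(X) < 2 * min_size:
        return [centroid]
    centered = X - centroid
    # The first right singular vector is the principal direction of variation.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    side = centered @ Vt[0] >= 0
    left, right = X[~side], X[side]
    if len(left) < min_size or len(right) < min_size:
        return [centroid]  # refuse splits that would leave a tiny piece
    return (split_into_prototypes(left, max_depth - 1, min_size)
            + split_into_prototypes(right, max_depth - 1, min_size))

# Toy class with two visibly different sub-styles (hypothetical data).
X_class = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
                    [10, 0], [10, 1], [11, 0], [11, 1]], dtype=float)

prototypes = split_into_prototypes(X_class, max_depth=1)
```

With one level of splitting, the class yields two prototypes, one per sub-style. Each prototype lives in the same space as the input, which is why a prototype of this kind can be displayed as a picture.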

Network

The synthesized neurons are then interconnected — similarity graphs, label propagation, spectral communities, hub scores — and each neuron's vote is weighted by that topology. A genuine network, with the caveat our own peer review insisted on: these are prototype neurons and statistical edges, not biological metaphors.

Honest footnote: the closest established relatives are prototype methods, extreme learning machines, and reservoir computing, but none of them synthesizes all components from data statistics. The name is our coinage, the mechanism it describes is verifiable in the code, and we keep it.

The dividing line

Why classification, and not generation.

Statistics of the kind synthesis can compute — co-occurrence, covariance, spectra — encode distributional similarity: which things resemble which. Classification is exactly a similarity question, which is why synthesis competes there.

Generation asks a different question, predictive composition: what follows from this context. Our research showed, across twelve experiments and a 100-million-token scaling study, that closed-form statistics cannot reach it; they plateau at the n-gram ceiling, ~1.9× the bits of a trained GPT-2. Predictive composition is precisely the structure that gradient descent discovers and closed-form spectra do not.

We regard publishing that boundary — with its mechanism — as part of the product. You should know exactly what you are buying, and exactly what nobody can sell you yet. See the full evidence →

Every weight, computed.
Not learned.