Priscus
P1
A new class of language model — engineered for parameter efficiency and convergence speed.
At 431M parameters, our flagship configuration reaches a validation loss of 0.28 on a cross-domain held-out set, roughly 18× lower than comparable dense transformers trained on the same corpus and hardware. A compact configuration reaches competitive quality at under half the parameter count. The architecture, the training procedure, and the primitives that make this possible are held under NDA.
What the architecture actually does
Every number below comes from a controlled training run on shared hardware and a common corpus. The full benchmark suite, measurement methodology, validation harness, comparator runs, and reproduction artifacts are documented under NDA.
Priscus P1 configurations
Three configurations trained on shared hardware and a common corpus. Each was measured on the same held-out cross-domain validation set.
| Configuration | Params | Val Loss | Notes |
|---|---|---|---|
| Priscus P1 (flagship) | 431M | 0.28 | Best-performing configuration, held-out validation |
| Priscus P1-XL | 408M | 0.78 | Alternate configuration under evaluation |
| Priscus P1-mini | 178M | 0.80 | Competitive quality at under half the parameter count |
Validation loss vs industry baselines
Every model below was trained on the same hardware and the same corpus. Because data and compute were held constant, held-out cross-domain validation loss directly reflects how much learning each training token produced. Lower is better, and here the gap is not subtle.
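For concreteness, the sketch below shows how held-out validation loss is conventionally computed: mean negative log-likelihood per predicted token over data the model never trained on. It is a generic illustration of the metric being compared on this page, not Priscus's evaluation harness; the `model` and `val_batches` interfaces are placeholders.

```python
# Generic held-out validation loss: mean per-token cross-entropy over data
# the model never saw during training. Illustrative only; `model` and
# `val_batches` are placeholder interfaces, not Priscus APIs.
import math

def validation_loss(model, val_batches):
    """`model(tokens)` is assumed to return one negative log-likelihood
    per predicted token in the batch."""
    total_nll, total_tokens = 0.0, 0
    for tokens in val_batches:
        nlls = model(tokens)
        total_nll += sum(nlls)
        total_tokens += len(nlls)
    mean_loss = total_nll / total_tokens   # lower = better next-token prediction
    return mean_loss, math.exp(mean_loss)  # exp(loss) is the perplexity
```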
Reading the chart: every training token consumed by Priscus P1 produces roughly 18× the validation-loss reduction of the same token consumed by a comparable-scale dense transformer. The compact 178M configuration still outperforms 400M-class baselines by ~6× at under half the parameter count.
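As a sanity check, the arithmetic behind those ratios can be reproduced from the figures quoted on this page alone. The baseline's absolute loss is not published here; the snippet back-derives it from the stated 18× figure, so treat it as implied rather than measured.

```python
# Back-of-the-envelope check of the ratios quoted above, using only figures
# from this page. The baseline's absolute loss is implied, not published.
flagship_loss = 0.28   # Priscus P1 (431M), held-out validation loss
mini_loss = 0.80       # Priscus P1-mini (178M)

implied_baseline = flagship_loss * 18          # "roughly 18x lower" -> ~5.0
mini_advantage = implied_baseline / mini_loss  # ~6.3x, matching the "~6x" claim

print(f"implied 400M-class baseline loss: {implied_baseline:.2f}")
print(f"P1-mini advantage over that baseline: {mini_advantage:.1f}x")
```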
Comparisons run on shared hardware and a common corpus. Validation harness, comparator configuration, and vocabulary-normalization methodology documented in full under NDA.
Why the numbers are what they are, without revealing how
The results on the page above are direct outcomes of three design principles. We can describe the principles. The mechanisms that implement them are held under NDA.
Learning Density
Every training token buys dramatically more validation-loss reduction than the same token does for a comparable dense transformer on the same corpus. The separation is not marginal: it is multi-fold, and it holds across held-out domains.
Parameter Discipline
Reaching competitive quality at substantially lower parameter counts changes the economics of training and the reach of deployment. A configuration at under half the usual parameter count still outperforms larger prior-generation baselines.
Production Viability
Training throughput is on par with production open-weight models, not with a slow research curiosity. The design is engineered from the outset for real inference workloads, not just research-bench numbers.
The results page is public. The architecture is not.
The design is protected as a trade secret. We are not disclosing how the architecture is structured, how training proceeds, or what the underlying primitives are. Serious partners — investors, strategic customers, acquirers — can arrange an NDA briefing via the waitlist below.
Get notified when we share more
Paper releases, technical briefings, early access, and investor conversations all flow through this list. No newsletter spam — we email only when there's a concrete milestone.