Priscus
P1
A new class of language model — engineered for parameter efficiency and convergence speed.
At 431M parameters, our flagship configuration reaches a validation loss of 0.28 on a cross-domain held-out set, roughly 18× lower than comparable dense transformers trained on the same corpus and hardware. A compact configuration reaches competitive quality at under half the parameter count. The architecture, the training procedure, and the primitives that make this possible are held under NDA.
What the architecture actually does
Every number below comes from a controlled training run on shared hardware and a common corpus. The full benchmark suite, measurement methodology, validation harness, comparator runs, and reproduction artifacts are documented under NDA.
Priscus P1 configurations
Three configurations trained on shared hardware and a common corpus. Each was measured on the same held-out cross-domain validation set.
| Configuration | Params | Val Loss | Notes |
|---|---|---|---|
| Priscus P1 (flagship) | 431M | 0.28 | Best-performing configuration, held-out validation |
| Priscus P1-XL | 408M | 0.78 | Alternate configuration under evaluation |
| Priscus P1-mini | 178M | 0.80 | Competitive quality at under half the parameter count |
Validation loss vs industry baselines
Every model below was trained on the same hardware and the same corpus. Because data and compute were held constant, held-out cross-domain validation loss directly reflects how much learning each training token produced. Lower is better, and here the gap is not subtle.
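For concreteness, the sketch below shows how held-out validation loss is conventionally computed: mean negative log-likelihood per predicted token over data the model never trained on. It is a generic illustration of the metric being compared on this page, not Priscus's evaluation harness; the `model` and `val_batches` interfaces are placeholders.

```python
# Generic held-out validation loss: mean per-token cross-entropy over data
# the model never saw during training. Illustrative only; `model` and
# `val_batches` are placeholder interfaces, not Priscus APIs.
import math

def validation_loss(model, val_batches):
    """`model(tokens)` is assumed to return one negative log-likelihood
    per predicted token in the batch."""
    total_nll, total_tokens = 0.0, 0
    for tokens in val_batches:
        nlls = model(tokens)
        total_nll += sum(nlls)
        total_tokens += len(nlls)
    mean_loss = total_nll / total_tokens   # lower = better next-token prediction
    return mean_loss, math.exp(mean_loss)  # exp(loss) is the perplexity
```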
Reading the chart: every training token consumed by Priscus P1 produces roughly 18× the validation-loss reduction of the same token consumed by a comparable-scale dense transformer. The compact 178M configuration still outperforms 400M-class baselines by ~6× at under half the parameter count.
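As a sanity check, the arithmetic behind those ratios can be reproduced from the figures quoted on this page alone. The baseline's absolute loss is not published here; the snippet back-derives it from the stated 18× figure, so treat it as implied rather than measured.

```python
# Back-of-the-envelope check of the ratios quoted above, using only figures
# from this page. The baseline's absolute loss is implied, not published.
flagship_loss = 0.28   # Priscus P1 (431M), held-out validation loss
mini_loss = 0.80       # Priscus P1-mini (178M)

implied_baseline = flagship_loss * 18          # "roughly 18x lower" -> ~5.0
mini_advantage = implied_baseline / mini_loss  # ~6.3x, matching the "~6x" claim

print(f"implied 400M-class baseline loss: {implied_baseline:.2f}")
print(f"P1-mini advantage over that baseline: {mini_advantage:.1f}x")
```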
Comparisons run on shared hardware and a common corpus. Validation harness, comparator configuration, and vocabulary-normalization methodology documented in full under NDA.
Why the numbers are what they are, without revealing how
The results on the page above are direct outcomes of three design principles. We can describe the principles. The mechanisms that implement them are held under NDA.
Learning Density
Every training token buys dramatically more validation-loss reduction than the same token does for a comparable dense transformer on the same corpus. The separation is not marginal: it is multi-fold, and it holds across held-out domains.
Parameter Discipline
Reaching competitive quality at substantially lower parameter counts changes the economics of training and the reach of deployment. A configuration at under half the usual parameter count still outperforms larger prior-generation baselines.
Production Viability
Training throughput is on par with production open-weight models, not with a slow research curiosity. The design is engineered from the outset for real inference workloads, not just research-bench numbers.
The results page is public. The architecture is not.
The design is protected as a trade secret. We are not disclosing how the architecture is structured, how training proceeds, or what the underlying primitives are. Serious partners — investors, strategic customers, acquirers — can arrange an NDA briefing via the waitlist below.
Get notified when we share more
Paper releases, technical briefings, early access, and investor conversations all flow through this list. No newsletter spam — we email only when there's a concrete milestone.