We build a new AI inference accelerator that pairs optimized compute blocks with integrated memory, delivering radically better tokens per watt.
Modern LLM inference is dominated by transformer matrix multiplications and the movement of weights and activations through memory. GPU-style processors are constrained by energy, HBM bandwidth, cooling, and chip-to-chip interconnects.
Most of the energy is spent moving data — not performing arithmetic. Conventional scaling alone is unlikely to deliver another 10× improvement in performance per watt.
Our A2I accelerator brings AI calculations next to memory — eliminating the data-movement tax that defines today's inference economics.
Memory islands purpose-built for storing and streaming LLM model weights, not generic system RAM.
Matrix-matrix operations execute next to where the weights live, so weights no longer cross a chip-to-chip link and external bandwidth requirements collapse.
Mixed-signal compute blocks inspired by how biological systems achieve work-per-watt orders of magnitude better than digital ASICs.
Flash and DRAM tiles laid out to feed compute blocks at the right cadence — no idle bandwidth, no wasted joules.
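To make the data-movement argument concrete, here is a back-of-envelope sketch in Python. It compares the bytes that cross the memory-to-compute boundary for one layer when weights are streamed to a separate processor and when they stay next to compute. The function name `bytes_moved`, the 4096×4096 layer, the batch of 8 tokens, and the 1-byte value width are illustrative assumptions, not A2I specifications. The sketch counts traffic only and says nothing about energy per byte.

```python
def bytes_moved(d_in, d_out, batch, bytes_per_value=1, weights_stationary=False):
    """Bytes crossing the memory-to-compute boundary for one d_in x d_out layer.

    Hypothetical traffic model: activations always cross the boundary
    (inputs in, outputs out); weights cross it only when they are not
    stored next to the compute array.
    """
    activations = batch * (d_in + d_out) * bytes_per_value
    weights = 0 if weights_stationary else d_in * d_out * bytes_per_value
    return activations + weights


# Assumed example layer: 4096 x 4096 weights, 8 tokens per batch, 1 byte per value.
streamed = bytes_moved(4096, 4096, batch=8)
stationary = bytes_moved(4096, 4096, batch=8, weights_stationary=True)
ratio = streamed / stationary

print(streamed, stationary, ratio)  # 16842752 65536 257.0
```

Under these assumptions, keeping weights in place cuts boundary traffic for the layer by a factor of 257. The ratio shrinks as batch size grows, because activation traffic scales with the batch while weight traffic does not.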
Tiled arrays — each computing a shard of LLM MoE layers — stitched together by digital control for accuracy, memory, routing, and programmability.
Mixed-signal multipliers running at just 0.1 W per array, performing the matrix-matrix operations that dominate transformer inference.
Model weights stored in Flash or DRAM, matrix ops adjacent, and a small CPU handling per-layer control flow — all co-packaged.
Tiled arrays each compute a shard of LLM MoE layers — scaling out across chiplets the way models scale out across experts.
Islands of 16 MB memory, 256×256 AI compute arrays, and a RISC-V CPU controller, all engineered around what LLM inference actually needs.
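The island figures above can be related with a little arithmetic. The following Python sketch estimates how many 256×256 weight tiles fit in a 16 MB island and shows a matrix-vector product computed tile by tile with digital accumulation of the partial sums. Several things here are assumptions rather than details from the design: 1 byte per weight, MB read as 2^20 bytes, the helper names, and the use of a matrix-vector product as a simplified stand-in for the matrix-matrix operations the hardware targets.

```python
import math

TILE = 256                     # compute array is 256 x 256 (from the text)
ISLAND_BYTES = 16 * 1024 ** 2  # 16 MB island, assuming 1 MB = 2**20 bytes


def tiles_for_matrix(rows, cols, tile=TILE):
    """Number of tile x tile blocks needed to cover a rows x cols weight matrix."""
    return math.ceil(rows / tile) * math.ceil(cols / tile)


def tiles_per_island(bytes_per_weight=1, tile=TILE, island_bytes=ISLAND_BYTES):
    """How many full weight tiles one island can hold (assumed 1 byte per weight)."""
    return island_bytes // (tile * tile * bytes_per_weight)


def tiled_matvec(weights, x, tile):
    """Compute y = W @ x one tile at a time, summing partial results per row."""
    rows, cols = len(weights), len(x)
    y = [0] * rows
    for r0 in range(0, rows, tile):          # tile row offset
        for c0 in range(0, cols, tile):      # tile column offset
            for r in range(r0, min(r0 + tile, rows)):
                y[r] += sum(weights[r][c] * x[c]
                            for c in range(c0, min(c0 + tile, cols)))
    return y


# Capacity check for an assumed 4096 x 4096 layer with 8-bit weights.
needed = tiles_for_matrix(4096, 4096)
capacity = tiles_per_island()
print(needed, capacity)  # 256 256

# Small correctness check with a toy 3 x 3 matrix and 2 x 2 tiles.
W = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
x = [1, 0, 2]
print(tiled_matvec(W, x, tile=2))  # [7, 16, 25]
```

With these assumptions, a 4096×4096 layer of 8-bit weights maps to 256 tiles, which is exactly what one 16 MB island holds. Wider weights or larger layers would spread across several islands.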
Three commercial stages — each independently financeable, each carrying real customer value to the next.
Non-recurring engineering (NRE) funding from strategic partners, government and defense grants, pilot fees, joint development agreements.
Simulation stack, test chip, benchmark results.
Sales of accelerator modules and cards, support contracts, software runtime licenses.
Rack-compatible inference module that runs selected LLM blocks efficiently.
IP licensing, per-wafer royalty, per-module royalty, joint-branded product.
Memory manufacturer integrates our optimized compute blocks and controller with their memory technology.
Five sequenced phases, each unlocking the next — from feasibility study to volume product, licensing, or joint venture.
NRE from strategic partner
Memory macro / memory die
Memory + compute integration
Production-class module
Product, licensing, joint venture
We're talking to memory manufacturers, hyperscalers, defense partners, and strategic investors. If you build, deploy, or buy LLM inference at scale — let's talk.