AI inference memory and compute - united!

VeloxMem
A new generation of AI
memory and compute.

We are building an AI inference accelerator that pairs optimized compute blocks with integrated memory, delivering radically better tokens per watt.

25× less energy vs. HBM
0.1 W per multiplier array
256×256 compute array tile
// Problem

The limiting factor
in AI is power.

Modern LLM inference is dominated by transformer matrix multiplications and the movement of weights and activations through memory. GPU-style processors are constrained by energy, HBM bandwidth, cooling, and chip-to-chip interconnects.

Chatbot session
700 W
Human brain
20 W
Conclusion
There is a path. Not the same old GPU design.

Most of the energy is spent moving data — not performing arithmetic. Conventional scaling alone is unlikely to deliver another 10× improvement in performance per watt.
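To make the data-movement argument concrete, here is a back-of-the-envelope sketch, not a measurement. The function name and every numeric input (bytes moved per token, energy per byte, energy per multiply-accumulate) are hypothetical placeholders chosen only to illustrate the arithmetic; the one figure taken from this page is the 25× reduction in memory energy.

```python
def energy_per_token_joules(bytes_moved, joules_per_byte, macs, joules_per_mac):
    """Split per-token energy into a data-movement part and an arithmetic part."""
    movement = bytes_moved * joules_per_byte
    arithmetic = macs * joules_per_mac
    return movement, arithmetic


# Hypothetical placeholder inputs, NOT measured values.
BYTES_MOVED = 1e9      # bytes of weights/activations fetched per token
MACS = 1e9             # multiply-accumulates per token
J_PER_BYTE = 100e-12   # assumed off-chip memory access cost
J_PER_MAC = 1e-12      # assumed arithmetic cost

movement, arithmetic = energy_per_token_joules(
    BYTES_MOVED, J_PER_BYTE, MACS, J_PER_MAC
)
baseline_total = movement + arithmetic
movement_share = movement / baseline_total

# Apply the page's 25x memory-energy reduction to the movement term only.
near_movement, _ = energy_per_token_joules(
    BYTES_MOVED, J_PER_BYTE / 25, MACS, J_PER_MAC
)
near_total = near_movement + arithmetic
overall_gain = baseline_total / near_total

print(f"movement share of baseline energy: {movement_share:.1%}")
print(f"overall energy gain: {overall_gain:.1f}x")
```

Under these placeholder inputs, data movement accounts for almost all of the baseline energy, so cutting its cost moves the total nearly as much. The overall gain stays below the memory-only factor because the arithmetic term is unchanged, which is why the more the workload is movement-bound, the more a memory-side improvement matters.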

// Solution · A2I Accelerator

Compute, redesigned around the memory.

Our A2I accelerator brings AI calculations next to memory — eliminating the data-movement tax that defines today's inference economics.

25×
Lower energy per token (vs. HBM baseline)
01

Custom memory

Memory islands purpose-built for storing and streaming LLM model weights, not generic system RAM.

02

Near-memory compute

Matrix-matrix operations execute next to where weights live — bandwidth requirements collapse.

03

Brain-logic design

Mixed-signal compute blocks inspired by how biological systems achieve work-per-watt orders of magnitude better than digital ASICs.

04

Efficient storage

Flash and DRAM tiles laid out to feed compute blocks at the right cadence — no idle bandwidth, no wasted joules.

// Architecture · A2I Chiplet

Inside the A2I chiplet.

Tiled arrays — each computing a shard of LLM MoE layers — stitched together by digital control for accuracy, memory, routing, and programmability.

Optimized multiplier array

Mixed-signal multipliers running at just 0.1 W per array, performing the matrix-matrix operations that dominate transformer inference.

0.1 W, Matrix · Matrix, Mixed-signal

Memory + Op + CPU on one chiplet

Model weights stored in Flash or DRAM, matrix ops adjacent, and a small CPU handling per-layer control flow — all co-packaged.

Flash / DRAM, RISC-V, Co-packaged

Tiled, sharded MoE

Tiled arrays each compute a shard of LLM MoE layers — scaling out across chiplets the way models scale out across experts.

MoE-native, Tile-scalable, Programmable
Analog AI compute array
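The idea of tiled arrays each computing a shard of a larger matrix operation can be illustrated with a generic blocked matrix multiply. This is a minimal sketch of the standard technique, not a description of how the A2I chiplet schedules work: the function names are hypothetical, and the demo uses a tile size of 4 instead of 256 so it runs instantly in pure Python.

```python
def matmul(a, b):
    """Reference dense product of two matrices stored as lists of rows."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def tiled_matmul(a, b, tile):
    """Compute a @ b by accumulating tile-by-tile block products."""
    n, inner, m = len(a), len(b), len(b[0])
    out = [[0] * m for _ in range(n)]
    for i0 in range(0, n, tile):              # block row of the output
        for j0 in range(0, m, tile):          # block column of the output
            for k0 in range(0, inner, tile):  # block along the shared dimension
                # One block product: the unit of work a single compute
                # array could take on (assumption for illustration).
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = 0
                        for k in range(k0, min(k0 + tile, inner)):
                            acc += a[i][k] * b[k][j]
                        out[i][j] += acc
    return out


# Small integer matrices so the comparison is exact.
a = [[(3 * i + k) % 7 for k in range(8)] for i in range(6)]
b = [[(k + 2 * j) % 5 for j in range(10)] for k in range(8)]

assert tiled_matmul(a, b, tile=4) == matmul(a, b)
```

Each block product touches only one tile of weights, so the weights for that tile can stay local to the array that uses them. Partial sums along the shared dimension still have to be combined, which is one plausible role for the digital control layer mentioned above.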
// The MEM Scenario

AI memory,
not generic RAM.

16 MB memory islands, 256×256 AI compute arrays, and a RISC-V CPU controller: all engineered around what LLM inference actually needs.

Memory island
16 MB
Per tile · sized for transformer weight shards
Compute array
256 × 256
AI-native matrix-matrix unit
Bandwidth
Reduced
Local compute eliminates HBM pressure
Power vs. HBM
~25× less
Comparable workloads, same precision
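As a quick sizing check on the figures above, the sketch below counts how many 256 × 256 weight tiles fit in one 16 MB island. The weight precisions are assumptions (this page does not state one), 16 MB is read as 16 MiB, and the function name is hypothetical.

```python
def weight_tiles_per_island(island_bytes, rows, cols, bits_per_weight):
    """Number of rows x cols weight tiles that fit in one memory island."""
    bytes_per_tile = rows * cols * bits_per_weight // 8
    return island_bytes // bytes_per_tile


ISLAND_BYTES = 16 * 1024 * 1024  # assumption: "16 MB" read as 16 MiB

for bits in (4, 8, 16):  # assumed precisions, not stated on this page
    tiles = weight_tiles_per_island(ISLAND_BYTES, 256, 256, bits)
    print(f"{bits:>2}-bit weights: {tiles} tiles per island")

# Output:
#  4-bit weights: 512 tiles per island
#  8-bit weights: 256 tiles per island
# 16-bit weights: 128 tiles per island
```

At 8 bits per weight, a 256 × 256 tile holds 65,536 weights in 64 KiB, so one island stores 256 tiles' worth of weights next to its compute array.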
// Business Model

A path from IP
to volume.

Three commercial stages — each independently financeable, each carrying real customer value to the next.

Stage 01

IP, prototype
& paid pilots

Revenue

NRE from strategic partners, government and defense grants, pilot fees, joint development agreements.

Deliverable

Simulation stack, test chip, benchmark results.

Stage 02

Inference
module

Revenue

Accelerator module and card sales, support contracts, software runtime licenses.

Deliverable

Rack-compatible inference module that runs selected LLM blocks efficiently.

Stage 03

Chiplet & IP
licensing

Revenue

IP licensing, per-wafer royalty, per-module royalty, joint-branded product.

Deliverable

Memory manufacturer integrates our optimized compute blocks and controller with their memory technology.

Memory-manufacturer partnership

Five sequenced phases, each unlocking the next — from feasibility study to volume product, licensing, or joint venture.

01

Joint feasibility

NRE from strategic partner

02

Test vehicle

Memory macro / memory die

03

Co-packaged prototype

Memory + compute integration

04

Customer pilot

Production-class module

05

Volume / JV

Product, licensing, joint venture

Tokens per watt is the new economics.

We're talking to memory manufacturers, hyperscalers, defense partners, and strategic investors. If you build, deploy, or buy LLM inference at scale — let's talk.