Majestic Labs unveils Prometheus Server with up to 128 TB DRAM to address LLM memory bottleneck

News

6/1/2026, 4:31:53 PM

Majestic Labs is building Prometheus, a DRAM‑centric AI server that packs up to 128 TB of memory and aims to overcome the memory‑bound limits on large language model inference by increasing single‑node capacity and bandwidth.

Majestic Labs is developing Prometheus, a high‑capacity AI server designed to break the LLM “memory wall” by putting up to 128 terabytes of DRAM into a single node. That is more than 60 times the memory capacity of NVIDIA’s DGX B300. The design targets the fundamental limit on token‑generation rates, which is set by how fast model data can be read from memory, and could change how inference is scaled and deployed.

Prometheus departs from the current trend of stacking high‑bandwidth memory (HBM) close to processors and instead uses a unified, DRAM‑centric memory pool based on LPDDR6. Majestic combines commodity DRAM with a proprietary high‑speed memory interface built from miniature copper cables that work over distances of up to about a meter. The company places custom memory‑aggregation chips beside the memory modules to fan out and coordinate access across many DRAM devices, creating a single large address space.

The architecture emphasizes both bandwidth and scale: Majestic says Prometheus can deliver up to 25.6 terabytes per second of memory bandwidth while enabling much larger single‑node memory capacity than typical GPU‑centered systems. By keeping larger models and longer context windows resident in one server’s memory, the design aims to reduce cross‑node communication and the operational complexity of sharding models across many machines.
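To make the memory‑bound argument concrete, here is a rough back‑of‑the‑envelope sketch in Python. It assumes a simple model in which generating each token for a single request requires streaming all model weights from memory once, so the token rate cannot exceed bandwidth divided by bytes read per token. The 1‑trillion‑parameter, 8‑bit model is a hypothetical example, not a figure from Majestic, and the function names are illustrative. The bound ignores KV‑cache traffic, batching, sparsity and compute limits, and it treats the vendor's "up to" bandwidth as fully achievable.

```python
# Rough upper bound on memory-bound decoding; a sketch, not a performance claim.
TB = 10**12  # decimal terabyte (assumption: the article does not say TB vs. TiB)

# Figures quoted in the article (vendor "up to" numbers).
PROMETHEUS_CAPACITY_BYTES = 128 * TB
PROMETHEUS_BANDWIDTH_BYTES_PER_S = 25_600 * 10**9  # 25.6 TB/s


def weights_bytes(num_params: int, bytes_per_param: int) -> int:
    """Memory needed to hold the model weights alone."""
    return num_params * bytes_per_param


def max_tokens_per_second(bandwidth_bytes_per_s: int, bytes_read_per_token: int) -> float:
    """Bandwidth-limited ceiling: every token streams bytes_read_per_token once."""
    return bandwidth_bytes_per_s / bytes_read_per_token


# Hypothetical dense model: 1 trillion parameters stored at 1 byte each.
model_bytes = weights_bytes(num_params=10**12, bytes_per_param=1)

fits = model_bytes <= PROMETHEUS_CAPACITY_BYTES
rate = max_tokens_per_second(PROMETHEUS_BANDWIDTH_BYTES_PER_S, model_bytes)

print(f"weights: {model_bytes / TB:.1f} TB, fits in one node: {fits}")
print(f"single-stream ceiling: {rate:.1f} tokens/s")  # 25.6 under these assumptions
```

Under these assumptions the weights occupy 1 TB of the 128 TB pool, and the single‑stream ceiling is 25.6 tokens per second. Real throughput depends on batching and on how much of the quoted bandwidth the workload can actually use.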

Compute in Prometheus is handled by Majestic’s Ignite AI processor. Each server contains 12 Ignite chips that combine data‑center ARM application cores with RISC‑V vector and tensor cores on a single die sharing the same memory space. In that arrangement, ARM cores act as on‑chip hosts to orchestrate models while RISC‑V cores execute LLM workloads; Majestic has not published specific compute performance metrics yet.

Majestic frames Prometheus as a more economical alternative for growing models, arguing that GPU‑centered setups scale out compute but can remain memory‑constrained and thus over‑provision compute relative to memory needs. Co‑founder and president Sha Rabii contrasts the approaches, saying the DRAM‑first design better matches the resource profile of very large models.

The system is built for data‑center deployment: it is Open Compute Project compliant and supports modular, upgradeable memory, and up to four Prometheus servers fit in a rack. Majestic expects a full rack to draw on the order of 120 kilowatts and uses cold‑plate liquid cooling for heat management. Prometheus also aims to lower migration friction for developers by supporting PyTorch, vLLM and OpenAI’s Triton without requiring code changes to existing models.
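The rack‑level figures above imply some simple totals, sketched below in Python. The calculation assumes a full rack holds four servers, each populated with the maximum 128 TB, and that the roughly 120 kW is split evenly across servers with no separate rack overhead. None of the derived numbers are stated by Majestic; they are arithmetic on the quoted figures only.

```python
# Derived rack-level totals from the article's quoted figures (illustrative only).
SERVERS_PER_RACK = 4        # "up to four Prometheus servers" per rack
DRAM_PER_SERVER_TB = 128    # maximum configuration
CHIPS_PER_SERVER = 12       # Ignite processors per server
RACK_POWER_KW = 120         # "on the order of" 120 kW for a full rack

rack_dram_tb = SERVERS_PER_RACK * DRAM_PER_SERVER_TB
rack_chips = SERVERS_PER_RACK * CHIPS_PER_SERVER

# Assumption: power divides evenly across servers, ignoring rack overhead.
power_per_server_kw = RACK_POWER_KW / SERVERS_PER_RACK
watts_per_tb = RACK_POWER_KW * 1000 / rack_dram_tb

print(f"DRAM per rack: {rack_dram_tb} TB")
print(f"Ignite chips per rack: {rack_chips}")
print(f"Power per server: {power_per_server_kw:.0f} kW")
print(f"Rack power per TB of DRAM: {watts_per_tb:.0f} W")
```

Because the power figure is approximate, the per‑server and per‑terabyte values should be read as order‑of‑magnitude estimates.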

Sources

IEEE Spectrum AI · 6/1/2026
