Tenstorrent’s Galaxy Blackhole: Can RISC-V Processors Expand Fast Inference Globally?

Tenstorrent Galaxy Blackhole

Tenstorrent has moved into volume production with its Galaxy Blackhole compute server, a unified AI compute platform that integrates tensor processors, RISC-V CPUs, near-compute memory, and 400G networking in a single box. Built around the Blackhole chip, a 6nm tensor processor, and relying on GDDR6 RAM, direct-attach Ethernet networking, and air cooling, the platform aims to drive down costs and simplify scaling. Tenstorrent’s focus on generality, open standards, and record-setting AI inference and video generation benchmarks positions it as a credible challenger to incumbent architectures.

What is Covered in this Article

  • Tenstorrent’s Galaxy Blackhole system: hardware, software, and developer innovations
  • Record-setting AI inference and video generation performance
  • Open-source software stack and broad model compatibility
  • Strategic partnerships and global deployments

The News: Tenstorrent has announced general availability and volume production of its Galaxy Blackhole system, a server that tightly integrates SRAM, DRAM, compute, and networking to enable massive scaling. The company highlighted ‘Supercluster 36,’ which links 36 Galaxy boxes into a single supercomputer. The system is powered by the Blackhole chip, a 6nm tensor processor designed for lower costs through GDDR6 RAM, a direct-attach Ethernet fabric, and air cooling. For developers, Tenstorrent introduced the TT-QuietBox 2, a compact, water-cooled unit with 128 GB of memory that is quiet enough for home use. The company emphasized record-breaking AI inference and video generation, including DeepSeek running at 308 tokens per second per user (TSU) with a roadmap to 500 TSU at $6 per million output tokens, and a world record in video generation with Prodia, producing a 5-second Wan 2.2 video in just 3.5 seconds. Tenstorrent’s software stack is fully open source, with a 90% pass rate for running Hugging Face models, and supports PyTorch, TensorFlow, CUDA, ONNX, and Triton. Strategic partnerships with Equinix, OrionVM, and BetterBrain are enabling full-stack sovereign AI hubs, with deployments in Tokyo, Seattle, and India, as well as for high-frequency trading research.


Analyst Take: Tenstorrent’s Galaxy Blackhole system is a bold attempt to redefine AI compute infrastructure. By tightly integrating hardware and delivering a fully open-source software stack, Tenstorrent addresses key pain points, including networking bottlenecks, compiler headaches, and closed-source vendor lock-in. The company’s focus on generality, supporting 2.5 million open-source models and compiling from multiple frameworks, sets it apart from closed approaches that hill climb on frontier lab challenges. The company now represents a bet on the future of RISC-V processors to power a globally open innovation ecosystem built on open-source and sovereign AI models.

Hardware Advancements and Product Availability

Tenstorrent Galaxy is now in volume production, integrating SRAM, DRAM, compute, and networking for scaling to 36-server clusters. The Blackhole Supercluster configuration links 36 Galaxy boxes into a single domain, demonstrating the architecture’s scalability. The Blackhole chip, built on a 6nm process, uses GDDR6 RAM, direct-attach Ethernet networking, and air cooling to reduce total cost of ownership (TCO). For developers, the TT-QuietBox 2 offers a compact, water-cooled unit with 128 GB of memory, quiet enough for home or office use. These advancements give Tenstorrent a broader addressable market than other chip startups that have focused only on hyperscale deployments.

Record-Breaking Video Generation Speed

Tenstorrent has set new benchmarks for AI inference and video generation. The company demonstrated DeepSeek running at 308 tokens per second per user (TSU), with a 350 TSU version coming soon and a roadmap to 500 TSU. The total cost of ownership is highly competitive at $6 per million output tokens. In partnership with Prodia, Tenstorrent achieved a world record by generating a 5-second video with Wan 2.2 in just 3.5 seconds, per Artificial Analysis testing, an 83% reduction from the previous industry record of 20.9 seconds. These results point toward hill climbing on specialized content workloads that other silicon providers have not prioritized, yet may grow significantly as models improve.
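The benchmark claims above can be sanity-checked with simple arithmetic. The sketch below uses only the figures reported in this article (3.5s vs. the 20.9s prior record, and the roadmap target of 500 TSU at $6 per million output tokens); the cost-per-user-hour figure is derived for illustration and is not from the announcement.

```python
# Sanity-check the benchmark arithmetic reported above.

# Video generation: 5-second Wan 2.2 clip (Tenstorrent + Prodia).
previous_record_s = 20.9   # prior industry record, per Artificial Analysis
tenstorrent_s = 3.5        # Tenstorrent result

time_reduction = 1 - tenstorrent_s / previous_record_s
print(f"Generation time reduced by {time_reduction:.0%}")  # ~83%

# Inference cost: roadmap target of 500 tokens/s/user (TSU)
# at $6 per million output tokens.
cost_per_token = 6.0 / 1_000_000
tokens_per_second = 500
cost_per_user_hour = cost_per_token * tokens_per_second * 3600
print(f"Implied cost per user-hour at 500 TSU: ${cost_per_user_hour:.2f}")
```

The 83% figure falls out directly as a reduction in wall-clock time; note that the same comparison expressed as a speedup is roughly 6x, which is why "83% faster" understates the gain.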

Generality and a 100% Open-Source Software Stack

A major theme for Tenstorrent is generality. The Galaxy Blackhole system boasts a 90% pass rate for running models directly from Hugging Face, supporting roughly 2.5 million AI models. The software stack can compile models from PyTorch, TensorFlow, CUDA, ONNX, and even from PDFs of AI papers. The entire stack, including the TT-Forge compiler and the new Python-based TT-Lang domain-specific language, is 100% open source and available on GitHub. This approach lowers barriers for developers and enterprises, enabling rapid adoption and customization. The architecture uses the Tensix NEO cluster design for high performance-per-watt and flexible data movement.

Go-to-market via Sovereign AI

Tenstorrent is building a global ecosystem to follow the inference chip startup playbook of proving cost savings with sovereign customers before shipping to hyperscalers. The company announced a Sovereign AI partnership with Equinix (data centers), OrionVM (cloud orchestration), and BetterBrain (Agentic AI applications) to deliver a turnkey, secure, distributed AI platform for enterprise customers. Galaxy hardware is now deployed in at least five neocloud colocations, with flagship installations in Tokyo (the largest deployment by ai&), Cirrascale in Seattle, Turium AI in India for sovereign AI and image-as-a-service, and Virtu Financial for high-frequency trading research. These deployments show real-world traction and validate the platform’s readiness for sovereign AI.

Read the announcement on Tenstorrent’s website.

What to Watch

  • Will enterprises port their models to Galaxy Blackhole in Cirrascale and Equinix data centers as supply constraints and GPU integration headaches persist?
  • Can Tenstorrent’s open-source approach attract enough developer and ISV support to drive broad adoption?
  • Will AI-native customer case studies and internal benchmarks confirm the claimed performance and cost advantages?
  • What workloads will Cirrascale port to Tenstorrent compared to other fast inference providers like Cerebras?

Declaration of generative AI and AI-assisted technologies in the writing process: This content has been generated with the support of artificial intelligence technologies. Due to the fast pace of content creation and the continuous evolution of data and information, The Futurum Group and its analysts strive to ensure the accuracy and factual integrity of the information presented. However, the opinions and interpretations expressed in this content reflect those of the individual author/analyst. The Futurum Group makes no guarantees regarding the completeness, accuracy, or reliability of any information contained herein. Readers are encouraged to verify facts independently and consult relevant sources for further clarification.
Disclosure: Futurum is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.
Analysis and opinions expressed herein are specific to the analyst individually, informed by data and other information that might have been provided for validation, and do not represent those of Futurum as a whole.
Read the full Futurum Group Disclosure.

Other Insights from Futurum:

Can AMD’s Edge Silicon Scale to the Trillion Dollar Orbital Opportunity?

Arm AGI CPU Goes to Market via Supermicro and Verda at 2026 OCP EMEA Summit

Orbital Computing Can Reach $1 Trillion Addressable Market by 2030

Author Information

Brendan Burke, Research Director

Brendan is Research Director, Semiconductors, Supply Chain, and Emerging Tech. He advises clients on strategic initiatives and leads the Futurum Semiconductors Practice. He is an experienced tech industry analyst who has guided tech leaders in identifying market opportunities spanning edge processors, generative AI applications, and hyperscale data centers. 

Before joining Futurum, Brendan consulted with global AI leaders and served as a Senior Analyst in Emerging Technology Research at PitchBook. At PitchBook, he developed market intelligence tools for AI, highlighted by one of the industry’s most comprehensive AI semiconductor market landscapes encompassing both public and private companies. He has advised Fortune 100 tech giants, growth-stage innovators, global investors, and leading market research firms. Before PitchBook, he led research teams in tech investment banking and market research.

Brendan is based in Seattle, Washington. He has a Bachelor of Arts Degree from Amherst College.
