Tenstorrent’s Galaxy Blackhole: Can RISC-V Processors Expand Fast Inference Globally?

Tenstorrent has moved into volume production with its Galaxy Blackhole compute server, a unified AI compute platform that integrates tensor processors, RISC-V CPUs, near-compute memory, and 400G networking in a single box. Powered by the Blackhole chip, a 6nm tensor processor using GDDR6 RAM, direct-attach Ethernet networking, and air cooling, the platform aims to drive down costs and simplify scaling. Tenstorrent’s focus on generality, open standards, and record-setting AI inference and video generation benchmarks positions it as a credible challenger to incumbent architectures.

What is Covered in this Article

  • Tenstorrent’s Galaxy Blackhole system: hardware, software, and developer innovations
  • Record-setting AI inference and video generation performance
  • Open-source software stack and broad model compatibility
  • Strategic partnerships and global deployments

The News: Tenstorrent has announced general availability and volume production of its Galaxy Blackhole system, a server that tightly integrates SRAM, DRAM, compute, and networking to enable massive scaling. The company highlighted ‘supercluster 36,’ which links 36 Galaxy boxes into a single supercomputer. The system is powered by the Blackhole chip, a 6nm tensor processor designed to lower costs by using GDDR6 RAM, a direct-attach Ethernet fabric, and air cooling. For developers, Tenstorrent introduced the ‘TT-QuietBox 2,’ a compact, water-cooled unit with 128 GB of memory, quiet enough for home use. The company emphasized record-breaking AI inference and video generation, including DeepSeek running at 308 tokens per second per user (TSU) with a roadmap to 500 TSU at $6 per million output tokens, and a world record in video generation with Prodia, producing a 5-second video with Wan 2.2 in just 3.5 seconds. Tenstorrent’s software stack is fully open source, with a 90% pass rate for running Hugging Face models, and supports PyTorch, TensorFlow, CUDA, ONNX, and Triton. Strategic partnerships with Equinix, OrionVM, and BetterBrain are enabling full-stack sovereign AI hubs, with deployments in Tokyo, Seattle, and India, alongside a deployment for high-frequency trading research.

Analyst Take: Tenstorrent’s Galaxy Blackhole system is a bold attempt to redefine AI compute infrastructure. By tightly integrating hardware and delivering a fully open-source software stack, Tenstorrent addresses key pain points, including networking bottlenecks, compiler headaches, and closed-source vendor lock-in. The company’s focus on generality, supporting 2.5 million open-source models and compiling from multiple frameworks, sets it apart from closed approaches that optimize narrowly for frontier-lab workloads. The company now represents a bet that RISC-V processors can power a globally open innovation ecosystem built on open-source and sovereign AI models.

Hardware Advancements and Product Availability

Tenstorrent Galaxy is now in volume production, integrating SRAM, DRAM, compute, and networking to scale to clusters of 36 servers. The Blackhole Supercluster configuration links 36 Galaxy boxes into a single domain, demonstrating the architecture’s scalability. The Blackhole chip, built on a 6nm process, uses GDDR6 RAM, direct-attach Ethernet networking, and air cooling to reduce the total cost of ownership (TCO). For developers, the TT-QuietBox 2 offers a compact, water-cooled unit with 128 GB of memory, quiet enough for home or office use. Together, these products address a broader market than that of other chip startups, which have focused only on hyperscale deployments.

Record-Breaking Video Generation Speed

Tenstorrent has set new benchmarks for AI inference and video generation. The company demonstrated DeepSeek running at 308 tokens per second per user (TSU), with a 350 TSU version coming soon and a roadmap to 500 TSU. The cost is highly competitive at $6 per million output tokens. In partnership with Prodia, Tenstorrent achieved a world record by generating a 5-second video with Wan 2.2 in just 3.5 seconds, according to Artificial Analysis testing. That is an 83% reduction from the previous industry record of 20.9 seconds. These results suggest Tenstorrent is optimizing for specialized content workloads that other silicon providers have not prioritized but that may grow significantly as models improve.
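The figures in this section can be sanity-checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and pairing the 308 TSU rate with the $6 per million token price is an assumption, since the announcement also ties that price to the 500 TSU roadmap.

```python
"""Back-of-envelope checks on the benchmark figures quoted in this section."""

# Figures as quoted in the article.
CLIP_SECONDS = 5.0               # length of the generated Wan 2.2 video
GENERATION_SECONDS = 3.5         # reported generation time
PREVIOUS_RECORD_SECONDS = 20.9   # previous record cited in the article
TOKENS_PER_SECOND_PER_USER = 308
USD_PER_MILLION_TOKENS = 6.0


def time_reduction(new_seconds: float, old_seconds: float) -> float:
    """Fraction of wall-clock time saved relative to the old result."""
    return (old_seconds - new_seconds) / old_seconds


def speedup(new_seconds: float, old_seconds: float) -> float:
    """How many times faster the new result is than the old one."""
    return old_seconds / new_seconds


def realtime_factor(clip_seconds: float, generation_seconds: float) -> float:
    """Seconds of video produced per second of compute (above 1.0 beats real time)."""
    return clip_seconds / generation_seconds


def hourly_stream_cost(tokens_per_second: float, usd_per_million_tokens: float) -> float:
    """Cost of one user stream decoding nonstop for an hour.

    Assumption: a flat per-token price applies at this decode rate.
    """
    tokens_per_hour = tokens_per_second * 3600
    return tokens_per_hour / 1_000_000 * usd_per_million_tokens


if __name__ == "__main__":
    reduction = time_reduction(GENERATION_SECONDS, PREVIOUS_RECORD_SECONDS)
    factor = speedup(GENERATION_SECONDS, PREVIOUS_RECORD_SECONDS)
    realtime = realtime_factor(CLIP_SECONDS, GENERATION_SECONDS)
    cost = hourly_stream_cost(TOKENS_PER_SECOND_PER_USER, USD_PER_MILLION_TOKENS)
    print(f"time reduction vs. previous record: {reduction:.1%}")
    print(f"speedup vs. previous record: {factor:.2f}x")
    print(f"real-time factor: {realtime:.2f}")
    print(f"cost of one continuous user-hour: ${cost:.2f}")
```

Run as a script, this shows that the 83% figure describes a reduction in generation time, equivalent to roughly a sixfold speedup, and that the clip is produced faster than real time.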

Generality and a 100% Open-Source Software Stack

A major theme for Tenstorrent is generality. The Galaxy Blackhole system boasts a 90% pass rate for running models directly from Hugging Face, supporting roughly 2.5 million AI models. The software stack can compile models from PyTorch, TensorFlow, CUDA, ONNX, and even from PDFs of AI papers. The entire stack, including the TT-Forge compiler and the new Python-based TT-Lang domain-specific language, is 100% open source and available on GitHub. This approach lowers barriers for developers and enterprises, enabling rapid adoption and customization. The architecture uses the Tensix NEO cluster design for high performance-per-watt and flexible data movement.

Go-to-market via Sovereign AI

Tenstorrent is building a global ecosystem that follows the inference chip startup playbook: prove cost savings with sovereign customers before shipping to hyperscalers. The company announced a Sovereign AI partnership with Equinix (data centers), OrionVM (cloud orchestration), and BetterBrain (agentic AI applications) to deliver a turnkey, secure, distributed AI platform for enterprise customers. Galaxy hardware is now deployed in at least five neocloud colocations, with flagship installations in Tokyo (the largest deployment by ai&), at Cirrascale in Seattle, at Turium AI in India for sovereign AI and image-as-a-service, and at Virtu Financial for high-frequency trading research. These deployments show real-world traction and validate the platform’s readiness for sovereign AI.

Read the announcement on Tenstorrent’s website.

What to Watch

  • Will enterprises port their models to Galaxy Blackhole in Cirrascale and Equinix data centers as supply constraints and GPU integration headaches persist?
  • Can Tenstorrent’s open-source approach attract enough developer and ISV support to drive broad adoption?
  • Will AI-native customer case studies and internal benchmarks confirm the claimed performance and cost advantages?
  • What workloads will Cirrascale port to Tenstorrent compared to other fast inference providers like Cerebras?

Declaration of generative AI and AI-assisted technologies in the writing process: This content has been generated with the support of artificial intelligence technologies. Due to the fast pace of content creation and the continuous evolution of data and information, The Futurum Group and its analysts strive to ensure the accuracy and factual integrity of the information presented. However, the opinions and interpretations expressed in this content reflect those of the individual author/analyst. The Futurum Group makes no guarantees regarding the completeness, accuracy, or reliability of any information contained herein. Readers are encouraged to verify facts independently and consult relevant sources for further clarification.
Disclosure: Futurum is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.
Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of Futurum as a whole.
Read the full Futurum Group Disclosure.

Author Information

Brendan Burke, Research Director

Brendan is Research Director, Semiconductors, Supply Chain, and Emerging Tech. He advises clients on strategic initiatives and leads the Futurum Semiconductors Practice. He is an experienced tech industry analyst who has guided tech leaders in identifying market opportunities spanning edge processors, generative AI applications, and hyperscale data centers. 

Before joining Futurum, Brendan consulted with global AI leaders and served as a Senior Analyst in Emerging Technology Research at PitchBook. At PitchBook, he developed market intelligence tools for AI, highlighted by one of the industry’s most comprehensive AI semiconductor market landscapes encompassing both public and private companies. He has advised Fortune 100 tech giants, growth-stage innovators, global investors, and leading market research firms. Before PitchBook, he led research teams in tech investment banking and market research.

Brendan is based in Seattle, Washington. He has a Bachelor of Arts Degree from Amherst College.
