General Compute Launches Inference Neocloud to Solve AI Chip Shortage and Data Center Bottlenecks

By admin | May 28, 2026 | 3 min read

The appetite for computers capable of running AI models keeps growing, but anyone in this business faces two major hurdles: securing the right chips and placing them in data centers where they can start generating revenue. General Compute is a new inference neocloud, a company that rents out AI processing power and specializes in the phase when models are actively running and responding to users rather than being trained. Its answers to both hurdles shed light on where the AI ecosystem is heading. They also helped it raise a $15 million seed round at a $60 million post-money valuation, led by FUSE VC with participation from Carya Venture Partners and Village Global Ventures.

First, there’s the question of the right chip. Demand for GPUs has skyrocketed, but it’s becoming conventional wisdom that they aren’t the most suitable chips for running AI models after training. Inference, the phase where a model actively generates responses, has different computational requirements than training, and a new class of chips is being designed specifically for it. Nvidia’s $20 billion Groq transaction in December and Cerebras’ $57 billion IPO last week point in this direction.

With capacity strained at both companies, General Compute’s co-founders, CEO Finn Puklowski and CTO Jason Goodison, found another option. They’re turning to specialized chips built by SambaNova, an Intel-backed chipmaker focused on inference that has somewhat faded from Silicon Valley’s conversation. That may change when SambaNova releases its new chips this year. The new architecture is more flexible and uses more memory to store context during inference calculations, and SambaNova claims the chips outperform not only GPUs but also other specialized chips from companies like Groq or Cerebras. Puklowski says the new chips will generate 600 to 700 tokens per second, compared to about 250 tokens per second for GPUs.
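To put the quoted throughput figures in perspective, the short Python sketch below works out the implied speedup and the time to generate a single response. The two rates are the ones Puklowski cites. The 2,000-token response length is a hypothetical value chosen only for illustration, and the calculation assumes generation time is the only cost, which ignores network latency, queuing, and prompt processing.

```python
# Throughput figures quoted in the article (tokens per second).
SN50_TOKENS_PER_SEC = (600, 700)   # range Puklowski cites for the new chips
GPU_TOKENS_PER_SEC = 250           # approximate GPU figure he cites

# Hypothetical response length, chosen only for illustration.
RESPONSE_TOKENS = 2_000


def speedup(new_rate, baseline_rate):
    """Return how many times faster new_rate is than baseline_rate."""
    return new_rate / baseline_rate


def seconds_to_generate(num_tokens, tokens_per_sec):
    """Return generation time in seconds, assuming a constant token rate."""
    return num_tokens / tokens_per_sec


low, high = (speedup(rate, GPU_TOKENS_PER_SEC) for rate in SN50_TOKENS_PER_SEC)
print(f"Implied speedup over GPUs: {low:.1f}x to {high:.1f}x")

gpu_time = seconds_to_generate(RESPONSE_TOKENS, GPU_TOKENS_PER_SEC)
print(f"GPU at {GPU_TOKENS_PER_SEC} tok/s: {gpu_time:.1f} s")

for rate in SN50_TOKENS_PER_SEC:
    chip_time = seconds_to_generate(RESPONSE_TOKENS, rate)
    print(f"New chip at {rate} tok/s: {chip_time:.1f} s")
```

Under these assumptions the quoted rates work out to roughly a 2.4x to 2.8x speedup, so a response that takes 8 seconds at the GPU rate would take about 3 seconds on the new chips.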

General Compute has $300 million worth of SambaNova’s SN50 chips on order and says it will be the first neocloud to deploy them. The chips also help General Compute with the second big problem, where to put them: they are air-cooled rather than water-cooled and consume less power, so they can be installed in existing data center facilities without new infrastructure investments. Puklowski is pursuing colocation deals, arrangements in which General Compute installs its hardware in someone else’s facility, not just with data center providers but also with crypto miners looking to repurpose their infrastructure, as the cost of producing a bitcoin has often exceeded its price.

General Compute launched its cloud offering last week, claiming it is already the fastest at running MiniMax 2.7, a powerful open-source LLM. Joe Hasselmann, a venture investor who got in on the ground floor of the inference boom when he invested in Groq in 2021, launched a new AI-focused fund this year called Evercrest Capital Partners and made General Compute his first investment. Hasselmann sees parallels between SambaNova’s partnership with General Compute and CoreWeave’s relationship with Nvidia, as well as the pairing of Groq’s chip-making with its former cloud offering. “They do need a healthy mix of customers that are going to put their chips in environments that are going to have high growth to them,” Hasselmann said. “As much as General Compute is making a bet on SambaNova, SambaNova is making a bet on General Compute.”

The question is what kind of computer architecture will capture the most value in the AI future. Inference clouds are implicit bets on a world of multiple models and agents, one where no single provider dominates and the speed and cost of inference become the key competitive variables. Consider the $113 million Series B that OpenRouter raised this week, which reflects the company’s ability to offer customers access to multiple models so they can optimize their token spending. Speed matters in that calculation, both for price and for capability. Puklowski wants to turn hour-long workloads for coding agents into five- or ten-minute tasks, and to make audio agents for customer service, which require faster inference to converse effectively, more economical.