DeepSeek’s Large Language Models Cost More Than $1.5 Billion To Produce

A new report is shedding light on the production costs of one of the world’s leading LLMs.

DeepSeek turned heads after it released its latest V3 and R1 models, which left many in shock. The fact that they performed so strongly while reportedly costing a fraction of the budgets spent by tech giants in the West was alarming to some.

The Chinese startup went head to head with leading competitors in the West, sparking serious discussion about whether companies like OpenAI really need to spend billions on their models.

Now, we’re getting more insight into the real cost of producing DeepSeek’s large language models. The figure the company shared on GitHub in December covered only training costs, which came to $5.6M. Many incorrectly assumed that this was the entire cost of building the product.

In reality, the $5.6M covered only pre-training expenses and excluded items such as maintenance, hardware, and operations. According to another report, the company has spent more than $500 million acquiring GPUs from Nvidia, putting its total server CapEx at around $1.6 billion, with a further $944M spent on operating its clusters.

The company is believed to hire talent exclusively from universities within China. Particularly promising candidates are reportedly offered salaries of over $1.3M per year, far above what other Chinese tech firms pay.

The startup’s biggest advantage over its Western rivals is how quickly it turns ideas into reality. Running its own data centers is another major benefit, as it doesn’t rely on external providers to get tasks done, which allows for more experimentation and innovation across the board.

One recent analysis also sheds light on how the firm’s models fare against those produced in the West. The V3 model outperformed OpenAI’s GPT-4o in several respects, so many are not surprised by DeepSeek’s success, given that it uses less compute to achieve stronger capabilities.

The R1 model can reason and holds its own against OpenAI’s o1, which launched late last year. According to SemiAnalysis, the latest paradigm focuses on aspects such as reasoning trained through synthetic data and reinforcement learning, which delivers results faster and at lower cost.

Image: DIW-Aigen
