AI Model Battle: Engineering-led Competitions and Commercialization Challenges

The "Battle of a Hundred Models" in the AI Field: An Engineering-Based Competition

Last month, the AI industry staged an "Animal Fight."

In one corner is Meta's Llama model, favored by developers for being open source. After studying the Llama paper and source code, Japan's NEC quickly built a Japanese-language ChatGPT, easing Japan's bottleneck in AI technology.

In the other corner is a large model named Falcon. Its 40B version, released in May, overtook Llama to claim the top of the open-source LLM rankings.

The rankings, produced by the open-source model community Hugging Face, have become a standard for gauging LLM capability, and the top spot has largely alternated between Llama and Falcon.

Llama 2 briefly took the lead after its release, but in early September Falcon launched a 180B version and reclaimed the top spot.

Interestingly, Falcon's developer is not a tech company but the Technology Innovation Institute, a research institute in Abu Dhabi, the capital of the UAE. Government officials say they entered the race to break the dominance of the field's leaders.

The day after the 180B release, the UAE's Minister of AI was named to TIME magazine's list of the "100 Most Influential People in AI," alongside "godfather of AI" Geoffrey Hinton, OpenAI's Sam Altman, and others.

Today, the AI field has entered a stage where a hundred flowers bloom: any country or company with sufficient resources is trying to build its own version of ChatGPT. In the Gulf region alone, Saudi Arabia has just purchased more than 3,000 H100 chips for its domestic universities to use in LLM training.

Some investors have lamented that they once looked down on internet business-model innovation for having no barriers to entry, only to find that hard-tech large-model entrepreneurship has turned into a hundred-model battle all the same.

How did this supposedly high-barrier hard technology turn into a race that anyone can enter?

The Transformer algorithm has changed the game.

American startups, Chinese tech giants, and Middle Eastern oil money can all pursue large-model R&D thanks to one famous paper: "Attention Is All You Need."

In 2017, eight Google scientists published the Transformer algorithm in that paper. It has since become the third most cited paper in AI history, and the Transformer's emergence set off the current wave of AI enthusiasm.

Today's large models of every stripe, including the sensational GPT series, are built on the Transformer.

Previously, "teaching machines to read" has been recognized as an academic challenge. Unlike image recognition, human reading not only focuses on the current words and sentences but also understands them in context. Early neural networks struggled to handle long texts and could not comprehend the context.

In 2014, Google scientist Ilya Sutskever made a breakthrough: he applied recurrent neural networks (RNNs) to natural language, markedly improving the performance of Google Translate. The RNN's "recurrent design" lets the network carry information forward from one step to the next, giving it a grasp of context.
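
To make the "recurrent design" concrete, here is a minimal sketch of a vanilla RNN in NumPy. It illustrates the idea only; it is not Google Translate's actual model, and every name and dimension below is made up for the example.

```python
import numpy as np

def rnn_read(tokens, W_h, W_x):
    """Vanilla RNN: a hidden state h is carried from word to word,
    accumulating a summary of everything read so far. That carried
    state is what gives the network its sense of context."""
    h = np.zeros(W_h.shape[0])           # start with an empty memory
    for x in tokens:                     # strictly sequential: step t needs step t-1
        h = np.tanh(W_h @ h + W_x @ x)   # mix previous memory with the current word
    return h                             # a context-aware summary of the sentence

# Toy usage: 5 words as 8-dim embeddings, 16-dim hidden state.
rng = np.random.default_rng(0)
words = [rng.standard_normal(8) for _ in range(5)]
W_h = rng.standard_normal((16, 16)) * 0.1
W_x = rng.standard_normal((16, 8)) * 0.1
print(rnn_read(words, W_h, W_x).shape)   # (16,)
```

Note that the loop cannot be parallelized: each step must wait for the previous one, which foreshadows the flaw discussed next.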

The RNN's success sparked heated discussion in academia, and Transformer author Ashish Vaswani studied it in depth. But developers soon found a serious flaw: as the loop in the sketch above suggests, an RNN must compute strictly in sequence, which makes training inefficient and makes it hard to scale to large parameter counts.

From 2015 onward, Vaswani and other researchers worked on alternatives to the RNN, and the end result was the Transformer. Compared with RNNs, the Transformer brought two major innovations:

First, it replaced the recurrent design with positional encoding, enabling parallel computation, dramatically improving training efficiency, and ushering AI into the era of large models.

Second, its self-attention mechanism further strengthened the ability to understand context, since every token can directly attend to every other token in the sequence (a minimal sketch of both ideas follows below).
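
As a rough illustration of both innovations, here is a minimal NumPy sketch: sinusoidal positional encoding as defined in "Attention Is All You Need," and single-head scaled dot-product self-attention with the learned projection matrices omitted for brevity. It shows the mechanics, not the full architecture.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding: stamps each position with a
    unique pattern so word order survives without any recurrence,
    letting all tokens be processed in parallel."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])   # even dimensions use sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])   # odd dimensions use cosine
    return pe

def self_attention(X):
    """Scaled dot-product self-attention (single head, no learned
    weights): every token attends to every other token directly,
    which is where the Transformer's grasp of context comes from."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                     # pairwise token affinities
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                # softmax over each row
    return w @ X                                      # context-mixed representations

# Toy usage: 6 tokens with 16-dim embeddings.
X = np.random.default_rng(1).standard_normal((6, 16))
out = self_attention(X + positional_encoding(6, 16))
print(out.shape)  # (6, 16)
```

Unlike the RNN loop earlier, nothing here depends on a previous step, so the whole sequence can be computed at once on a GPU.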

The Transformer solved multiple technical challenges in one stroke and gradually became the mainstream approach in NLP. Even Ilya Sutskever, the great champion of the RNN route, defected to the Transformer camp.

It is fair to say that the Transformer is the cornerstone of every large model today, because it turned large models from a theoretical research question into an engineering problem.

In 2019, OpenAI's Transformer-based GPT-2 caused a sensation in academia. Google then responded with the more powerful Meena, which surpassed GPT-2 simply by piling on more training parameters and computing power. Transformer co-author Noam Shazeer was so struck by this that he wrote an internal memo titled "Meena Eats the World."

The Transformer's arrival slowed the pace of fundamental algorithmic innovation in academia. Engineering factors such as data pipelines, compute scale, and model architecture became the keys to the AI race instead, so any tech company with moderate technical capability can now build a large model.

Computer scientist Andrew Ng pointed out during a speech at Stanford University: "AI is a collection of tools, including supervised learning, unsupervised learning, reinforcement learning, and now generative artificial intelligence. These are all general-purpose technologies, similar to other general-purpose technologies like electricity and the internet."

OpenAI remains the leader in LLMs, but semiconductor analysts argue that GPT-4's edge comes mainly from engineering solutions: if it were open-sourced, competitors could replicate it quickly. They expect other large tech companies to field models matching GPT-4's performance before long.

Fragile Moat

Currently, the "Hundred Model War" has become an objective reality.

One report shows that as of July this year, China had 130 large models, surpassing the United States' 114, and domestic tech companies have nearly run out of myths and legends to name them after.

Beyond China and the United States, other wealthy countries have made a first pass at "one country, one model": Japan and the UAE have their own models, the Indian government has built Bhashini, and the South Korean internet company Naver has launched HyperClova X, among others.

The scene feels like a throwback to the early internet era, when capital of every stripe burned money to grab territory.

As noted earlier, the Transformer turned large models into a pure engineering problem: anyone with money and computing power can build one. But a low entry barrier does not mean everyone gets to become a giant of the AI era.

The "Animal Battle" mentioned at the beginning is a typical case: although Falcon is temporarily leading, it is hard to say how much impact it has had on Meta.

Companies open-source their work not only to share the dividends of the technology but also to mobilize collective intelligence. As every industry uses and improves Llama, Meta can fold the results back into its own products.

For open-source large models, an active developer community is the core competitive advantage.

Meta established its AI lab back in 2015 and set an open-source course from the start; Zuckerberg well understands the importance of "maintaining good relations with the masses."

In October, Meta also held an "AI Creator Incentive" event: developers who use Llama 2 to tackle social problems stand a chance of winning a $500,000 grant.

Today, Meta's Llama series has become a benchmark for open-source LLMs.

As of early October, eight of the top ten open-source LLMs on the leaderboard were built on Llama 2, and on Hugging Face alone more than 1,500 LLMs use the Llama 2 open-source license.

Building on Llama to improve performance is all well and good, but most LLMs still trail GPT-4 by a visible margin.

For example, GPT-4 recently topped the AgentBench leaderboard with a score of 4.41. AgentBench, launched jointly by Tsinghua University and several other universities, evaluates an LLM's reasoning and decision-making abilities across a range of open-ended environments.

The results show that second-place Claude scored just 2.77, already a wide gap, while the much-hyped open-source LLMs mostly hover around 1 point, less than a quarter of GPT-4's score.

Bear in mind that GPT-4 was released back in March of this year; these scores come after global peers have had more than half a year to catch up. Behind the gap stand OpenAI's top-flight team of scientists and the experience accumulated over years of LLM research.

In other words, a large model's core competitiveness lies not in its parameter count but in ecosystem building (for open-source models) or pure reasoning ability (for closed-source ones).

As the open-source community grows more active, the performance of the various LLMs may converge, since everyone is using similar model architectures and similar datasets.

A more intuitive problem remains: apart from Midjourney, hardly any large model seems to be making money.

Anchor Points of Value

In August this year, an article titled "OpenAI may go bankrupt by the end of 2024" drew wide attention. Its gist: OpenAI is burning cash too fast.

The article notes that since building ChatGPT, OpenAI's losses have ballooned rapidly, reaching roughly $540 million in 2022 and leaving it dependent on Microsoft's investment to stay afloat.

Exaggerated headline aside, the article captures the plight of large-model providers: costs and revenues are badly out of balance.

Costs run so high that, for now, only Nvidia is making serious money from AI, with perhaps Broadcom as well.

Consulting firms estimate that Nvidia sold more than 300,000 H100s in the second quarter of this year. The H100 is the high-performance AI chip that tech companies and research institutes worldwide are scrambling to buy; stacked together, those 300,000 units would weigh as much as 4.5 Boeing 747s.

Nvidia's results duly soared, up 854% year-on-year and stunning Wall Street. On the second-hand market, the H100 now trades for $40,000 to $50,000, against a build cost of only about $3,000.

Expensive computing power has become a drag on the industry's development. One institution estimates that global tech companies will spend $200 billion a year on large-model infrastructure, while large models generate at most $75 billion in annual revenue, leaving a gap of at least $125 billion.

Moreover, with a few exceptions, most software companies have yet to find a profitable model despite heavy investment; even leaders like Microsoft and Adobe have stumbled.

GitHub Copilot, the AI code-generation tool Microsoft built with OpenAI, charges $10 a month, yet because of infrastructure costs Microsoft loses about $20 per user per month on average, and heavy users can cost it as much as $80. By extension, the $30-a-month Microsoft 365 Copilot may lose even more.
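
Reading those reported figures backwards gives the implied serving cost per user, a back-of-the-envelope sketch using only the numbers above (the real cost structure is not public):

```python
price = 10                       # GitHub Copilot subscription, $ per user per month
avg_loss, heavy_loss = 20, 80    # reported monthly losses: average and heavy users

avg_cost = price + avg_loss      # implied average serving cost: $30 per user
heavy_cost = price + heavy_loss  # implied heavy-user serving cost: $90 per user
print(f"implied cost: ${avg_cost}/user on average, up to ${heavy_cost} for heavy users")
```

By this arithmetic, even tripling the price to $30 would still lose money on the heaviest users at today's inference costs.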

Similarly, Adobe, which has just launched its Firefly AI tool, quickly introduced a credit system to keep users from over-consuming and costing the company money: once a user exceeds the monthly credit allotment, Adobe throttles the service.

Bear in mind that Microsoft and Adobe are software giants with proven business models and huge paying user bases, whereas for most parameter-laden large models, the main application is still chat.

It is undeniable that without OpenAI and ChatGPT, this AI revolution might not have happened. However, the value brought by training large models is still up for debate.

Moreover, as competition intensifies and the number of open-source models increases, pure large model providers may face greater pressure.

The success of the iPhone 4 was not due to the 45nm A4 processor, but because it could play Plants vs. Zombies and Angry Birds.
