The Integration of AI and the Crypto Industry: How Deep Learning is Reshaping the Web3 Landscape
AI x Crypto: From Zero to Peak
Introduction
The recent development of the artificial intelligence industry is viewed by some as the Fourth Industrial Revolution. The emergence of large models has significantly improved efficiency across many industries, with some estimates putting the gain in US work efficiency at around 20%. At the same time, the generalization capability of large models is regarded as a new software design paradigm: where software once relied on precise, hand-written code, it now increasingly embeds more generalized large-model frameworks that can handle a far wider range of input and output modalities. Deep learning technology has driven the latest boom in the AI industry, and this wave has also reached the cryptocurrency industry.
This report details the development history of the AI industry, its technology categories, and the impact of deep learning on the industry. It then analyzes the current state and trends of the upstream and downstream of the deep learning industry chain, including GPUs, cloud computing, data sources, and edge devices. Finally, it explores the fundamental relationship between cryptocurrency and the AI industry and maps out the structure of the crypto-related AI industry chain.
The Development History of the AI Industry
The AI industry began in the 1950s. To realize the vision of artificial intelligence, academia and industry have, across different eras and disciplinary backgrounds, developed a range of schools of thought for achieving it.
Modern artificial intelligence technology mainly uses the term "machine learning", which is based on the concept of enabling machines to iteratively improve system performance on tasks by relying on data. The main steps involve feeding data into algorithms to train models, testing and deploying the models, and using the models to perform automated prediction tasks.
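As a rough illustration of this workflow (feed data to an algorithm, train a model, test it, then use it to predict), here is a minimal sketch in Python using scikit-learn. The dataset, model choice, and split ratio are arbitrary assumptions chosen purely for illustration.

```python
# Minimal sketch of the machine-learning workflow described above:
# feed data to an algorithm, train a model, test it, then use it to predict.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)                      # 1. collect data
X_train, X_test, y_train, y_test = train_test_split(   # 2. hold out a test set
    X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000)              # 3. pick an algorithm
model.fit(X_train, y_train)                            # 4. train: fit parameters to the data

print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))  # 5. evaluate
print("prediction for a new sample:", model.predict(X_test[:1]))        # 6. deploy / predict
```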
Currently, there are three main schools of thought in machine learning: connectionism, symbolism, and behaviorism, which mimic the human nervous system, human reasoning, and human behavior, respectively. At present, connectionism, represented by neural networks (the basis of deep learning), is dominant. The main reason lies in the architecture itself: a network has one input layer, one output layer, and multiple hidden layers. Once the number of layers and of neurons (parameters) is large enough, the network has enough capacity to fit complex, general tasks. As data is fed in, the parameters of the neurons are continuously adjusted, and after enough data has passed through, the neurons converge toward an optimal state (set of parameters). This is also the origin of the word "deep": a sufficient number of layers and neurons.
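The toy example below is a minimal sketch of this mechanism, assuming a single hidden layer trained with plain gradient descent: the parameters (neuron weights) are nudged repeatedly as data is fed in until the network fits the target function. All sizes and learning rates are made up for illustration.

```python
# Toy illustration of connectionism: a small neural network whose parameters
# (weights of "neurons" in hidden layers) are adjusted repeatedly as data is fed in.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(256, 2))          # input layer: 2 features
y = (X[:, :1] ** 2 + X[:, 1:]).reshape(-1, 1)  # target function to fit

W1, b1 = rng.normal(size=(2, 16)), np.zeros(16)   # hidden layer (16 neurons)
W2, b2 = rng.normal(size=(16, 1)), np.zeros(1)    # output layer
lr = 0.05

for step in range(2000):
    h = np.tanh(X @ W1 + b1)          # forward pass through the hidden layer
    pred = h @ W2 + b2                # output layer
    err = pred - y                    # prediction error under the current parameters

    # backward pass: nudge every parameter a little toward lower error
    grad_W2 = h.T @ err / len(X)
    grad_b2 = err.mean(axis=0)
    grad_h = err @ W2.T * (1 - h ** 2)
    grad_W1 = X.T @ grad_h / len(X)
    grad_b1 = grad_h.mean(axis=0)

    W2 -= lr * grad_W2; b2 -= lr * grad_b2
    W1 -= lr * grad_W1; b1 -= lr * grad_b1

print("mean squared error after training:", float((err ** 2).mean()))
```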
Deep learning technology based on neural networks has gone through multiple technical iterations, from the earliest neural networks to feedforward neural networks, RNNs, CNNs, and GANs, finally evolving into modern large models such as GPT, which use Transformer technology. The Transformer is one evolutionary direction of neural networks: it adds an encoding step that converts data from any modality (such as audio, video, or images) into numerical representations, which are then fed into the neural network. This allows the network to fit any type of data, thereby achieving multimodality.
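The "encode everything into numbers" idea can be illustrated with a toy tokenizer and embedding table. The vocabulary, dimensions, and random embedding values below are made-up stand-ins for illustration, not anything from a real Transformer.

```python
# Toy illustration of how a Transformer-style model consumes data:
# raw input (here, text) is first converted into token ids, then into
# numerical vectors (embeddings) that the neural network can process.
import numpy as np

vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4, "<unk>": 5}
embedding_dim = 8
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), embedding_dim))

def encode(sentence: str) -> np.ndarray:
    """Map words -> token ids -> embedding vectors (shape: [seq_len, dim])."""
    token_ids = [vocab.get(word, vocab["<unk>"]) for word in sentence.lower().split()]
    return embedding_table[token_ids]

x = encode("The cat sat on the mat")
print(x.shape)  # (6, 8): six tokens, each represented by an 8-dimensional vector
```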
The development of AI has gone through three technological waves: The first wave occurred in the 1960s, a decade after AI technology was proposed. This wave was driven by the development of symbolic technology, which addressed the issues of general natural language processing and human-computer dialogue. During this period, expert systems were born.
The second wave of AI technology occurred in 1997 when IBM's Deep Blue defeated chess champion Garry Kasparov 3.5:2.5, a victory regarded as a milestone for artificial intelligence.
The third wave of AI technology occurred in 2006. The three giants of deep learning, Yann LeCun, Geoffrey Hinton, and Yoshua Bengio, proposed the concept of deep learning, an algorithm based on artificial neural networks for representation learning of data. Subsequently, deep learning algorithms gradually evolved, from RNNs and GANs to Transformers and Stable Diffusion; these algorithms collectively shaped this third technological wave and marked the peak of connectionism.
Deep Learning Industry Chain
Currently, the large language models in use are all built on neural-network-based deep learning methods. Led by GPT, large models have set off a wave of enthusiasm for artificial intelligence; a large number of players have poured into the field, and market demand for data and computing power has surged. This section explores the industry chain of deep learning algorithms: its upstream and downstream components, their current state and supply-demand relationships, and how they may develop in the future.
The training of Transformer-based LLMs (Large Language Models), with GPT as the leading example, is divided into three steps:
The first step is pre-training. By feeding the input layer with a large number of data pairs, the model searches for the optimal parameters of each neuron. This step requires the largest amount of data and is by far the most computationally intensive.
The second step is fine-tuning. A smaller batch of high-quality data is used for further training to improve the quality of the model's output.
The third step is reinforcement learning. A "reward model" is built to judge whether the large model's output is of high quality, and it is used to iterate the large model's parameters automatically.
In short, during the training of a large model, pre-training demands a very large amount of data and the most GPU computing power; fine-tuning needs higher-quality data to improve the parameters; and reinforcement learning iterates the parameters through a reward model to produce higher-quality output.
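The sketch below is a deliberately oversimplified, hypothetical illustration of where the three stages sit relative to one another; the "model" here is just a token-frequency table, not a real language model, and the reward function is an arbitrary stand-in.

```python
# Highly simplified sketch of the three-stage pipeline described above.
# The "model" is a unigram token distribution; the point is only to show
# where pre-training, fine-tuning, and reward-model feedback fit in.
from collections import Counter
import random

def train_distribution(corpus, prior=None, weight=1.0):
    """Count token frequencies; optionally start from an existing distribution."""
    counts = Counter(prior or {})
    for token in corpus.split():
        counts[token] += weight
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}

def sample_text(dist, length=5, seed=None):
    rng = random.Random(seed)
    tokens = list(dist)
    weights = [dist[t] for t in tokens]
    return " ".join(rng.choices(tokens, weights=weights, k=length))

# 1. Pre-training: a large, mixed-quality corpus sets the initial parameters.
pretrain_corpus = "the cat sat on the mat " * 1000 + "lol random noise " * 200
model = train_distribution(pretrain_corpus)

# 2. Fine-tuning: a small, high-quality corpus shifts the parameters.
finetune_corpus = "the cat sat politely on the clean mat"
model = train_distribution(finetune_corpus, prior=model, weight=50.0)

# 3. Reinforcement-style step: a reward model scores candidate outputs,
#    and the best-scoring candidates would drive further parameter updates.
def reward_model(text):          # stand-in: penalizes outputs containing "noise"
    return -text.split().count("noise")

candidates = [sample_text(model, seed=s) for s in range(4)]
best = max(candidates, key=reward_model)
print("best candidate by reward model:", best)
```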
The performance of large models is mainly determined by three factors: the number of parameters, the amount and quality of data, and computational power. These three factors collectively influence the quality of the results and the generalization ability of large models. Assuming the number of parameters is p and the amount of data is n (calculated in terms of the number of tokens), the required computational amount can be estimated using empirical rules, which can then be used to estimate the computational power needed for purchase and the training time.
Computational work is generally measured in FLOPs, where one FLOP is one floating-point operation. According to the rule of thumb, pre-training a large model requires about 6np FLOPs; inference (feeding input to the trained model and waiting for its output) requires about 2np FLOPs.
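A back-of-envelope calculator for these rules of thumb might look like the following; the parameter count, token count, chip throughput, and chip count are hypothetical values chosen purely for illustration.

```python
# Back-of-envelope estimate using the 6np (training) and 2np (inference) rules of thumb.
# All concrete numbers below are hypothetical inputs for illustration.

def training_flops(params: float, tokens: float) -> float:
    return 6 * params * tokens      # ~6 FLOPs per parameter per training token

def inference_flops(params: float, tokens: float) -> float:
    return 2 * params * tokens      # ~2 FLOPs per parameter per processed token

p = 7e9        # hypothetical model: 7 billion parameters
n = 1e12       # hypothetical training set: 1 trillion tokens

train = training_flops(p, n)
print(f"training compute: {train:.2e} FLOPs")

# Time estimate on a hypothetical accelerator sustaining 300 TFLOPS at FP16
# (3e14 floating-point operations per second), with 1,000 chips in parallel.
chip_flops_per_s = 300e12
num_chips = 1000
seconds = train / (chip_flops_per_s * num_chips)
print(f"ideal training time: {seconds / 86400:.1f} days on {num_chips} such chips")
```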
In the early days, CPUs supplied the computing power for training; they were later gradually replaced by GPUs such as Nvidia's A100 and H100 chips, because GPUs act as dedicated compute units and significantly outperform CPUs in energy efficiency. GPUs perform floating-point operations mainly through their Tensor Core modules. A chip's FLOPS figure at FP16/FP32 precision represents its main computing capability and is one of the key metrics for evaluating it.
Take GPT-3 as an example: it has 175 billion parameters and a dataset of 180 billion tokens (approximately 570 GB). A single pre-training run then requires 6np FLOPs, about 3.15×10^22 FLOPs, or roughly 3.15×10^10 TFLOPs (one TFLOP being a trillion floating-point operations), meaning a single pre-training of GPT-3 on one SXM-class chip would take about 584 days.
It can be seen that the vast amount of computation required for pre-training needs multiple advanced chips to work together. The number of parameters in GPT-4 is ten times that of GPT-3, which means that even if the amount of data remains unchanged, the number of chips to be purchased must also increase tenfold. The number of tokens in GPT-4 is 13 trillion, which is also ten times that of GPT-3, and ultimately GPT-4 may require more than 100 times the chip computing power.
Large-model training also faces data-storage issues. GPU memory is generally small (an A100 has 80 GB, for example) and cannot hold all of the data, so the chip's bandwidth, that is, the speed at which data moves from storage into memory, also has to be examined. Because training uses many GPU chips, the transfer rate between GPUs matters as well. In many cases, therefore, the factor or cost constraining training in practice is not the chips' computing power but their bandwidth: slow data transfer stretches out the training time and drives up electricity costs.
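The numbers below are a hypothetical worked example of why capacity and bandwidth can bind before raw FLOPS do: the weights alone can exceed a single accelerator's memory, and moving data between devices can dominate step time. All figures are illustrative assumptions.

```python
# Rough illustration of why memory capacity and bandwidth, not raw FLOPS,
# often limit large-model training. All numbers are hypothetical examples.

params = 175e9                 # parameter count of a hypothetical large model
bytes_per_param = 2            # FP16 weights
weights_gb = params * bytes_per_param / 1e9
print(f"weights alone: {weights_gb:.0f} GB")                 # ~350 GB of weights

gpu_memory_gb = 80             # e.g. an 80 GB accelerator
min_gpus_for_weights = -(-weights_gb // gpu_memory_gb)       # ceiling division
print(f"at least {int(min_gpus_for_weights)} GPUs needed just to hold the weights")

# Moving data is the other constraint: if a training step must shuffle
# 1 TB of weights/activations between devices over a 600 GB/s interconnect,
# the transfer alone takes longer than many compute-bound steps.
data_to_move_gb = 1000
interconnect_gb_per_s = 600
print(f"transfer time per step: {data_to_move_gb / interconnect_gb_per_s:.2f} s")
```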
The deep learning industry chain mainly includes the following parts:
Hardware GPU Providers
Nvidia currently holds an absolute leading position in the AI GPU chip field. The academic community mainly uses consumer-grade GPUs (RTX series); the industrial sector mainly uses H100, A100, and other models for the commercialization of large models.
In 2023, Nvidia's cutting-edge H100 chip was snapped up by multiple companies as soon as it was released. Global demand for the H100 far exceeds supply, with delivery cycles stretching to 52 weeks. In response to Nvidia's dominance, Google has taken the lead, joined by Intel, Qualcomm, Microsoft, and Amazon, in establishing the CUDA Alliance, with the aim of jointly developing GPUs and breaking free of Nvidia's grip.
Ultra-large technology companies, cloud service providers, and national laboratories often purchase thousands or tens of thousands of H100 chips to build HPC (high-performance computing) centers. By the end of 2023, orders for H100 chips had exceeded 500,000.
On the supply side, Nvidia has already announced the H200, whose performance is expected to be double that of the H100, while the B100 is slated for release in late 2024 or early 2025. GPU development still follows Moore's Law, with performance doubling and prices halving roughly every two years.
Cloud Service Providers
After purchasing enough GPUs to build HPC clusters, cloud service providers can offer flexible computing power and managed training solutions to AI companies with limited funds. The market is currently divided mainly into three types of cloud computing power providers.
Training Data Source Providers
Training a large model goes through three main steps: pre-training, fine-tuning, and reinforcement learning. Pre-training requires a huge amount of data, while fine-tuning requires high-quality data. As a result, companies such as Google, with its search-engine data, and Reddit, with its high-quality conversational data, have drawn widespread market attention.
Some developers choose to focus on niche areas such as finance, healthcare, and chemistry to avoid competing head-on with general-purpose large models, which requires domain-specific data. Companies have therefore emerged that supply specific data to these models, commonly known as data labeling companies.
For model research and development companies, a large amount of data, high-quality data, and specific data are the three main data demands.
Microsoft's research suggests that if the data quality of small language models is significantly better than that of large language models, their performance may not necessarily be worse. In fact, GPT does not have a clear advantage in originality or data; its success is mainly due to its investment in this direction. Sequoia Capital in the United States also believes that GPT may not maintain a competitive edge in the future, as there is not a deep moat in this area, and the main limitation comes from the acquisition of computing power.
According to predictions, based on the current growth of model scale, all low-quality and high-quality data will be exhausted by 2030. Therefore, the industry is exploring synthetic data generated by artificial intelligence to create infinite data, leaving only computational power as the bottleneck. This direction is still in the exploration stage and is worth paying attention to.
Database Providers
For AI data and deep learning training and inference tasks, the industry currently uses vector databases, which are designed to efficiently store, manage, and index massive amounts of high-dimensional vector data. They store unstructured data in a unified vector (embedding) form and are well suited to storing and processing such vectors.
The main players include Chroma, Zilliz, Pinecone, Weaviate, among others. It is expected that as the demand for data increases, along with the emergence of large models and applications in various niche fields, the demand for Vector Databases will surge significantly. Due to the strong technical barriers in this field, investment considerations will lean more towards mature companies with clients.
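The core operation these products provide can be illustrated with a brute-force sketch in numpy: store embeddings, then retrieve the entries most similar to a query by cosine similarity. Production vector databases add persistence, metadata filtering, and approximate-nearest-neighbor indexes on top of this idea; the random embeddings below are stand-ins for real encoder output.

```python
# Minimal sketch of what a vector database does at its core: store high-dimensional
# embeddings and return the entries most similar to a query vector.
# Real systems add persistence, metadata filtering, and approximate-nearest-neighbor
# indexes (HNSW, IVF, etc.) so that search stays fast at billions of vectors.
import numpy as np

class TinyVectorStore:
    def __init__(self, dim: int):
        self.dim = dim
        self.ids: list[str] = []
        self.vectors = np.empty((0, dim))

    def add(self, item_id: str, vector: np.ndarray) -> None:
        self.ids.append(item_id)
        self.vectors = np.vstack([self.vectors, vector.reshape(1, self.dim)])

    def query(self, vector: np.ndarray, top_k: int = 3) -> list[tuple[str, float]]:
        # cosine similarity between the query and every stored vector
        stored = self.vectors / np.linalg.norm(self.vectors, axis=1, keepdims=True)
        q = vector / np.linalg.norm(vector)
        scores = stored @ q
        best = np.argsort(scores)[::-1][:top_k]
        return [(self.ids[i], float(scores[i])) for i in best]

# Usage with random stand-in embeddings (a real pipeline would use a text/image encoder).
rng = np.random.default_rng(0)
store = TinyVectorStore(dim=128)
for i in range(1000):
    store.add(f"doc-{i}", rng.normal(size=128))
print(store.query(rng.normal(size=128), top_k=3))
```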
Edge Devices
A GPU HPC (high-performance computing) cluster usually consumes a great deal of energy and generates heat, so cooling equipment is needed to keep it at operating temperature.