Evolution of Blockchain Data Indexing: From Node to AI-empowered Full-chain Services

From Data Source to Smart Analysis: The Evolution of Blockchain Data Indexing Technology

1. Introduction

Since the first wave of decentralized applications (dApps) appeared in 2017, the blockchain application ecosystem has flourished. Yet when we discuss these dApps, how often do we consider where the data they use comes from?

In 2024, artificial intelligence and Web3 have become hot topics. For AI, data is the source of growth and evolution: just as plants need sunlight and water, AI systems rely on vast amounts of data to keep learning and improving. Without data, even the most advanced AI algorithms struggle to realize their potential.

This article will delve into the evolution of data indexing in the industry from the perspective of blockchain data accessibility, comparing traditional data indexing protocols with emerging blockchain data service protocols, and exploring the characteristics of new protocols that integrate AI technology in data services and product architecture.

Reading, indexing to analysis, a brief overview of the Web3 data indexing track

2. The Evolution of Data Indexing: From Blockchain Nodes to Full Chain Database

2.1 Data Source: Blockchain Node

A blockchain is often described as a decentralized ledger. Blockchain nodes are the foundation of the entire network, responsible for recording, storing, and disseminating all on-chain transaction data. Each full node keeps a complete copy of the blockchain data, which maintains the decentralized nature of the network. However, for the average user, building and maintaining a node is not an easy task: it requires specialized expertise and comes with high costs. At the same time, the query capabilities of ordinary nodes are limited and cannot meet the needs of developers. Therefore, users typically rely on third-party services.

To address this issue, RPC node providers have emerged. They take care of node management and expose data through RPC endpoints. Public RPC endpoints are free but rate-limited, which may affect the dApp user experience. Private RPC endpoints offer better performance but remain inefficient for complex queries and difficult to scale. Even so, the standardized API interfaces of node providers lower the barrier to accessing on-chain data, laying the foundation for subsequent data applications.
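As a rough illustration of what an RPC endpoint exchange looks like, the sketch below builds a JSON-RPC 2.0 request for the standard Ethereum method eth_getBlockByNumber and parses a response. It makes no network call; the helper names and the sample response are assumptions made up for this example, and a real client would POST the body to a provider's endpoint URL.

```python
import json


def build_rpc_request(method, params, request_id=1):
    """Build a JSON-RPC 2.0 request body as accepted by Ethereum-style nodes."""
    return {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}


def block_by_number_request(block_number, full_transactions=False):
    # eth_getBlockByNumber takes a hex-encoded block number and a boolean that
    # selects full transaction objects (True) or only their hashes (False).
    return build_rpc_request(
        "eth_getBlockByNumber", [hex(block_number), full_transactions]
    )


def parse_block_number(response):
    """Read the block number from a response; quantities are hex strings."""
    if "error" in response:
        raise RuntimeError(response["error"].get("message", "RPC error"))
    return int(response["result"]["number"], 16)


payload = block_by_number_request(19_000_000)
body = json.dumps(payload)  # this string would be the HTTP POST body

# Hypothetical, heavily trimmed response used only to exercise the parser.
sample_response = {"jsonrpc": "2.0", "id": 1, "result": {"number": "0x121eac0"}}
print(body)
print(parse_block_number(sample_response))
```

Each call returns a single block or transaction, which is why multi-condition queries over long ranges become slow and expensive when done through RPC alone.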

2.2 Data Parsing: From Raw Data to Usable Data

The raw data provided by blockchain nodes is usually stored in low-level encoded formats (binary or hexadecimal) and protected by cryptographic hashes and signatures. Although this preserves the integrity and security of the blockchain, it makes the data harder to analyze. For ordinary users or developers, handling this data directly requires significant technical knowledge and computational resources.

The data parsing process is particularly important in this context. By converting complex raw data into a more understandable and operable format, users can utilize this data more intuitively. The quality of the parsing directly affects the efficiency and effectiveness of blockchain data applications and is a key link in the entire data indexing process.
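To make the parsing step concrete, here is a minimal sketch that decodes an ERC-20 Transfer event log from its raw hex form into readable fields. The topic hash is the well-known signature hash of Transfer(address,address,uint256); the sample log, the addresses, and the 18-decimal assumption are hypothetical values chosen for the example.

```python
from decimal import Decimal

# keccak256("Transfer(address,address,uint256)"), the ERC-20 Transfer signature.
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"


def topic_to_address(topic):
    # Indexed addresses are left-padded to 32 bytes; the last 20 bytes
    # (40 hex characters) are the address itself.
    return "0x" + topic[-40:]


def decode_transfer(log, decimals=18):
    """Turn a raw Transfer log into a dict with sender, recipient, and amount."""
    if log["topics"][0] != TRANSFER_TOPIC:
        raise ValueError("not an ERC-20 Transfer event")
    raw_value = int(log["data"], 16)  # the non-indexed uint256 amount
    return {
        "from": topic_to_address(log["topics"][1]),
        "to": topic_to_address(log["topics"][2]),
        "raw_value": raw_value,
        "value": Decimal(raw_value) / Decimal(10) ** decimals,
    }


# Hypothetical log: 1.5 tokens (18 decimals) sent between two made-up addresses.
SAMPLE_LOG = {
    "topics": [
        TRANSFER_TOPIC,
        "0x" + "00" * 12 + "11" * 20,
        "0x" + "00" * 12 + "22" * 20,
    ],
    "data": "0x" + format(1_500_000_000_000_000_000, "064x"),
}

decoded = decode_transfer(SAMPLE_LOG)
print(decoded)
```

A production parser would read the event layout from the contract ABI instead of hard-coding it, but the principle is the same: fixed-width hex fields are mapped back to typed, named values.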

2.3 Development of Data Indexers

As the amount of blockchain data increases, the demand for data indexers is growing. Indexers organize on-chain data and load it into databases for querying. They index blockchain data and expose API interfaces with query languages such as SQL and GraphQL, making the data readily available. Indexers give developers a unified query interface, greatly simplifying data retrieval.
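The idea of an indexer can be sketched in a few lines: decoded events are loaded into a database, an index is built on the columns that queries filter on, and the data is then queried with SQL. The example below uses Python's built-in sqlite3 module with an in-memory database; the table layout, addresses, and amounts are assumptions for illustration and do not reflect any particular indexing protocol.

```python
import sqlite3


def build_index(events):
    """Load decoded transfer events into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE transfers (
               block_number INTEGER,
               tx_hash      TEXT,
               sender       TEXT,
               recipient    TEXT,
               amount       INTEGER  -- real uint256 amounts would need TEXT
           )"""
    )
    # Index the column that the queries below filter and group on.
    conn.execute("CREATE INDEX idx_recipient ON transfers (recipient)")
    conn.executemany(
        "INSERT INTO transfers VALUES "
        "(:block_number, :tx_hash, :sender, :recipient, :amount)",
        events,
    )
    conn.commit()
    return conn


def total_received(conn, address):
    """Sum everything an address has received; 0 if it never appears."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM transfers WHERE recipient = ?",
        (address,),
    ).fetchone()
    return row[0]


# Hypothetical decoded events.
EVENTS = [
    {"block_number": 1, "tx_hash": "0xa1", "sender": "0xbob", "recipient": "0xalice", "amount": 100},
    {"block_number": 2, "tx_hash": "0xa2", "sender": "0xcarol", "recipient": "0xalice", "amount": 50},
    {"block_number": 2, "tx_hash": "0xa3", "sender": "0xalice", "recipient": "0xbob", "amount": 30},
]

conn = build_index(EVENTS)
print(total_received(conn, "0xalice"))
top = conn.execute(
    "SELECT recipient, SUM(amount) AS total FROM transfers "
    "GROUP BY recipient ORDER BY total DESC"
).fetchall()
print(top)
```

Answering the same question through RPC alone would mean scanning every block and filtering client-side, which is the gap indexers are designed to close.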

Different types of indexers optimize data retrieval methods differently:

  1. Full Node Indexer: Extracts data directly from full nodes, ensuring data integrity, but requires substantial storage and processing power.
  2. Lightweight Indexer: Relies on full nodes to fetch specific data on demand, reducing storage requirements but potentially increasing query time.
  3. Specialized Indexer: Optimized for specific types of data or specific blockchains, such as NFT data or DeFi transactions.
  4. Aggregated Indexer: Extracts data from multiple blockchains and sources, including off-chain information, providing a unified query interface, suitable for multi-chain dApps.

Currently, the storage requirements for Ethereum archival nodes vary significantly across different clients. In the face of large data volumes, mainstream indexing protocols not only support multi-chain indexing but also customize data parsing frameworks for different application needs.

The emergence of indexers has greatly improved the efficiency of data indexing and querying. Compared to traditional RPC endpoints, indexers can efficiently index large amounts of data, supporting high-speed complex queries and data filtering. Some indexers also support aggregating data sources from multiple blockchains, avoiding the issue of multi-chain dApp deployments requiring multiple APIs. Distributed operation provides stronger security and performance, reducing the risks that centralized RPC providers may pose.

2.4 Full Chain Database: Moving Toward Stream-First

Querying data through indexer nodes usually means the API is the only gateway for consuming on-chain data. However, once a project enters its expansion stage, it often needs more flexible data sources. As application demands grow more complex, basic data indexers struggle to meet diverse query needs such as search, cross-chain access, or off-chain data mapping.

In modern data pipeline architecture, the "stream-first" approach has emerged as an answer to the limitations of traditional batch processing, enabling real-time data processing and analysis. Blockchain data service providers are also moving towards building data streams, launching products that deliver real-time blockchain data in a streaming manner.

These services aim to meet the demand for real-time analysis of blockchain transactions and for comprehensive query capabilities. Reframing the challenges of on-chain data from the perspective of modern data pipelines offers a new angle on how on-chain data can be managed, stored, and served.
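The difference between batch and stream-first processing can be shown with a small sketch: instead of recomputing an aggregate over all stored blocks, the consumer updates its state as each block arrives and can answer queries at any moment. The generator below stands in for a live feed, and the class, token symbols, and amounts are hypothetical names and values for this example only.

```python
from collections import defaultdict


def block_stream(blocks):
    """Stand-in for a live feed: yields one block at a time as it 'arrives'."""
    for block in blocks:
        yield block


class RunningVolume:
    """Keeps per-token transfer volume up to date as each block is consumed."""

    def __init__(self):
        self.volume_by_token = defaultdict(int)
        self.last_block = None

    def on_block(self, block):
        # Incremental update: only the new block is touched, never history.
        for transfer in block["transfers"]:
            self.volume_by_token[transfer["token"]] += transfer["amount"]
        self.last_block = block["number"]
        return dict(self.volume_by_token)  # snapshot available immediately


# Hypothetical blocks with already-decoded transfers.
SAMPLE_BLOCKS = [
    {"number": 1, "transfers": [{"token": "USDC", "amount": 100},
                                {"token": "DAI", "amount": 40}]},
    {"number": 2, "transfers": [{"token": "USDC", "amount": 25}]},
]

state = RunningVolume()
snapshots = [state.on_block(block) for block in block_stream(SAMPLE_BLOCKS)]
print(snapshots)
```

In a real pipeline the generator would be replaced by a subscription to a streaming service, and the in-memory state by a store that survives restarts and handles chain reorganizations, neither of which this sketch attempts.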

3. The Combination of AI and Databases: A Comparison of The Graph, Chainbase, and Space and Time

3.1 The Graph

The Graph network provides multi-chain data indexing and query services through a decentralized network of nodes, making it easier for developers to index blockchain data and build dApps. Its main product models are a data query execution market and a data index caching market, which together serve users' data query needs.

Subgraphs are the basic data structure of The Graph network: each one defines how to extract data from the blockchain and transform it into a queryable format. The network has four key roles (indexers, curators, delegators, and developers), and economic incentives keep the system operating.

The Graph's products are also evolving quickly amid the AI wave. Tools developed by Semiotic Labs, such as AutoAgora, Allocation Optimizer, and AgentC, optimize pricing strategies, resource allocation, and user experience, making the system more intelligent and more user-friendly.

3.2 Chainbase

Chainbase is a full-chain data network that integrates all blockchain data onto one platform. Its features include:

  • Real-time Data Lake: Provides a real-time data lake specifically for blockchain data streams.
  • Dual-chain architecture: The execution layer is built on an EigenLayer AVS and runs in parallel with the CometBFT consensus algorithm.
  • Innovative data format standard: Introduces the "manuscripts" data format standard.
  • Crypto World Model: Combines AI model technology to create the AI model Theia, which can understand and predict blockchain transactions.

Chainbase's AI model Theia is based on NVIDIA's DORA model and combines on-chain and off-chain data to analyze crypto patterns, providing users with intelligent data services.

3.3 Space and Time

Space and Time (SxT) is dedicated to building a verifiable computing layer that extends zero-knowledge proofs to decentralized data warehouses. Its core technology, Proof of SQL, makes SQL query results tamper-proof and verifiable, offering a solution for industries with high data reliability requirements.

SxT collaborates with Microsoft's AI Innovation Lab to develop generative AI tools that make it easier for users to work with blockchain data through natural language. In Space and Time Studio, users can have AI automatically convert natural-language requests into SQL and execute the queries.

4. Conclusion and Outlook

Blockchain data indexing technology has improved step by step: from nodes as the original data source, through data parsing and indexers, to AI-enabled full-chain data services. This evolution has made data access more efficient and accurate, and it has also made the experience of using on-chain data more intelligent.

In the future, as new technologies such as AI and zero-knowledge proofs mature, blockchain data services will become more intelligent and more secure. As infrastructure, they will continue to support industry advancement and innovation.
