
CAP Theorem & Latency: How Engineers Fight the Tradeoffs

July 15, 2021

The Vision Behind CockroachDB: A Globally Distributed Database

From its inception, CockroachDB was designed with global data distribution as a core principle. The creators at Cockroach Labs aimed to provide immediate data visibility across vast distances – specifically, ensuring data written in one location is instantly accessible 10,000 miles away.

While the concept sounds straightforward, the implementation is a significant undertaking, and the company is heavily invested in overcoming what remains a major hurdle for large-scale web applications.

A Complex Solution with Significant Potential

CockroachDB’s methodology is innovative but complex, especially for those without a technical background. Leveraging its experienced team and strong engineering capabilities, however, the company is successfully navigating these challenges.

This progress positions CockroachDB to substantially influence the database landscape, making a thorough understanding of its technology highly beneficial. Exploring the intricacies of its design is therefore a worthwhile endeavor.

Understanding the Technology: Three Key Questions

In the first part of this series, a general overview of Cockroach Labs and its origins was presented. This installment will delve into the technical aspects of CockroachDB, tailored for a non-technical audience.

The technology will be explained through the following three core questions:

1. What makes global data distribution so difficult?

Distributing data across a global network introduces considerable difficulties. The speed of light imposes a hard floor on communication: even at light speed, a signal needs roughly 54 milliseconds to cover 10,000 miles, and signals in fiber travel about a third slower, so intercontinental round trips routinely exceed 100 milliseconds. Network latency therefore becomes a critical factor.

Maintaining data consistency across geographically dispersed nodes is also a major concern. Ensuring all locations have the most up-to-date information requires sophisticated mechanisms.

2. How does CockroachDB solve these problems?

CockroachDB tackles these challenges through a unique architecture. It employs a distributed SQL database model, leveraging the familiar SQL language while providing the scalability and resilience of a NoSQL system.

Replication is central to its design. Data is automatically copied across multiple nodes, ensuring high availability and fault tolerance. This means that even if one or more nodes fail, the data remains accessible.

Furthermore, CockroachDB builds on Raft, a widely used consensus algorithm. Raft ensures that all replicas agree on the order of transactions, preserving data consistency even in the face of network partitions or node failures.
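
In practice, replication in CockroachDB is adjusted through ordinary SQL statements. The snippet below is a minimal sketch, assuming a running cluster; the replication factor of five is an example value, not a recommendation.

    -- Replicate every range of data five ways instead of the
    -- default three; more replicas tolerate more node failures.
    ALTER RANGE default CONFIGURE ZONE USING num_replicas = 5;

    -- Inspect the resulting zone configuration.
    SHOW ZONE CONFIGURATION FOR RANGE default;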

3. What does this mean for CockroachDB users?

For users, CockroachDB offers several key advantages. Its global distribution capabilities enable low-latency access to data from anywhere in the world.

The system’s inherent resilience ensures high availability and minimizes downtime. Automatic scaling allows the database to adapt to changing workloads without manual intervention.

Ultimately, CockroachDB aims to simplify the development and deployment of globally distributed applications, allowing developers to focus on building features rather than managing infrastructure.

The Challenges of Global Data Access

Spencer Kimball, the CEO and co-founder of Cockroach Labs, illustrates the difficulties of reading and writing data across vast distances with a simple analogy.

The challenges inherent in accessing data globally stem from the fundamental constraints of time and space, mirroring the difference in delivery times between a local pizzeria and one situated across a city. Distance, whether physical or digital, inevitably impacts speed.

The primary goal of data architecture is to minimize this distance, bringing data as close as possible to the user. A straightforward method involves creating a local copy, exemplified by Amazon’s practice of downloading e-books directly to a Kindle device.

This direct download approach is effective for static data like e-books. However, dynamic information, such as fluctuating stock prices, necessitates a more advanced technological solution – namely, databases.

A database serves as a centralized hub for managing extensive and rapid data read and write operations, often occurring within fractions of a second. Unlike a single author contributing to a Kindle book, databases are designed to accommodate millions, potentially billions, of data “authors” and their subsequent reads.

Consider Kimball’s Quora example. A user in New York posting an answer to a database located in New York will experience a faster response time than a user in Sydney. Network conditions can introduce a lag ranging from half a second to several seconds, a significant issue, as Kimball points out.

To address this, global applications employ a network of database servers distributed worldwide. When a user modifies data on a nearby server, these changes are replicated to all other servers globally.

While replication eliminates the need for users to access distant servers, it introduces a new challenge: maintaining data synchronization. Let's revisit the Quora scenario, as illustrated in Figure 1.

If a user in New York asks, “Who is the greatest guitarist of all time?”, the question appears immediately for New York users. However, Sydney-based users will experience a delay of a second or two. This indicates the Sydney server is temporarily out of sync with the New York server.

Eventually, the Sydney server will receive the updated data through replication. But, if a user in Paris then answers, “Eric Clapton is the greatest guitarist of all time,” the cycle repeats. Paris users see the answer immediately, while those in New York and Sydney remain unaware until replication occurs.

This situation mirrors the initial problem: proximity to data dictates access speed. Now, however, the distance is between the database servers themselves, rather than between user and server.

This issue of data inconsistency is well-recognized and formally defined by the CAP theorem.

The CAP theorem – short for consistency, availability, and partition tolerance – states that when a network partition separates database servers, a trade-off is forced: the system can return strictly consistent data or remain immediately available with potentially outdated data, but not both simultaneously.

This inherent difficulty is what makes global data management complex, and it’s the challenge that CockroachDB aims to overcome.

How does CockroachDB address the problem?

CockroachDB prioritizes rapid data reading and writing globally by integrating geographic optimization into its core database design. It also automates many performance-related complexities. Andy Woods, group product manager at Cockroach Labs, states this is about simplifying the creation of reliable, low-latency applications on a worldwide scale.

To achieve this, Cockroach Labs introduces a multiregion database architecture. Let's explore the specifics of this approach.

Understanding multiregion databases

To manage large datasets, database technologies frequently utilize a logical strategy called sharding to segment data. This defines where specific data will reside on database servers.

Traditionally, shards are determined by a standardized characteristic of the data. For example, zip codes could be divided into categories based on their initial digit. Each shard would then be stored on a separate computer, hard drive, or even data center, following the defined sharding rules.

Sharding simplifies data retrieval. When a user submits a query, the database engine locates and assembles the data based on the query's criteria. A query like “find users with zip codes between 50000 and 50009” would be directed to the server hosting that zip code range.
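
To make this concrete, here is a minimal sketch of such a query; the users table and zip_code column are hypothetical names used purely for illustration.

    -- With zip-code-based sharding, this query only touches the
    -- shard holding the 5xxxx range instead of every server.
    SELECT * FROM users
    WHERE zip_code BETWEEN '50000' AND '50009';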

In a relational database, this involves examining each table row. This can be time-consuming with millions of rows. However, grouping rows closely on a hard disk or within a regional data center, based on a logical rule, significantly reduces query execution time, as shown in Figure 2. Sharding offers a partial solution to the challenges posed by the CAP theorem.

However, sharding requires substantial technical expertise and ongoing maintenance for reliable operation. Tasks like adding a column or modifying sharding logic are risky and demand significant technical skill. Errors can severely disrupt the database.

While most sharding relies on data characteristics, an alternative approach considers physical geography. A multiregion database segments and distributes data across database servers based on user location. CockroachDB leverages this architecture to minimize the complexities of sharding.

With CockroachDB, an engineer designates a database as multiregion. The database, or even individual tables, are then assigned to regions globally.

Figure 3 illustrates a CockroachDB database, MyDB, linked to servers in US EAST and US WEST. Three tables within MyDB are assigned to US WEST, while one is assigned to US EAST. This demonstrates how the multiregion feature segments tables geographically.
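
In CockroachDB’s SQL dialect, this assignment is expressed declaratively. The sketch below assumes a cluster whose nodes were started with localities named us-east1 and us-west1, and a hypothetical users table; only the database name mirrors Figure 3.

    -- Declare mydb as multiregion, anchored in the eastern US.
    ALTER DATABASE mydb SET PRIMARY REGION "us-east1";
    ALTER DATABASE mydb ADD REGION "us-west1";

    -- Pin one table's data to the western region.
    ALTER TABLE mydb.users SET LOCALITY REGIONAL BY TABLE IN "us-west1";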

Utilizing CockroachDB’s multiregion feature to segment data by proximity directly supports the company’s goal: to position data as close as possible to the user.

Understanding row-level partitioning

Cockroach Labs further advances geographical data segmentation with row-level geopartitioning, assigning individual table rows to specific regions. This is a significant advancement, offered exclusively to enterprise customers.

Figure 4 displays a customer table with last name, first name, city, state, and zip code.

The first two rows represent customers in the Western US, while the last represents an East Coast customer. When row-level partitioning is enabled, CockroachDB automatically places West Coast customer data in the US WEST region and East Coast data in the US EAST region. Applications on the West Coast will then access West Coast customer data with exceptional speed.
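
In SQL terms, enabling this looks roughly like the following. The customers table is hypothetical, and the region names again assume a cluster configured with us-east1 and us-west1.

    -- Home each row in its own region; CockroachDB adds a hidden
    -- crdb_region column recording where each row lives.
    ALTER TABLE customers SET LOCALITY REGIONAL BY ROW;

    -- Rows can be rehomed by updating that column, for example
    -- moving West Coast customers into the western region.
    UPDATE customers
    SET crdb_region = 'us-west1'
    WHERE state IN ('CA', 'OR', 'WA');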

The company aims to soften the core constraint of the CAP theorem. In a partitioned database, data can be consistent or immediately available, but not both at once. CockroachDB attempts to sidestep this by partitioning data geographically, allowing most users to experience both consistency and availability most of the time.

Row-level geopartitioning is a complex undertaking, but its benefits – fine-grained sharding without operational overhead – justify the effort. CockroachDB aims to handle the complexities for its users.

Implications for CockroachDB Users

For organizations requiring global data management, CockroachDB presents a compelling solution. The database’s automatic partitioning capabilities – functioning at the database, table, or row levels – are managed directly by the CockroachDB server engine. This represents a significant advancement in database architecture.

However, a period of adaptation is necessary. Developers should anticipate dedicating time to the SQL extensions Cockroach Labs has introduced, since SQL remains the standard language for programming the database.

Fault tolerance is another key feature of CockroachDB, enhanced by a mechanism called a Survival Goal. This refers to a database’s capacity to recover following system failures. In extensive, worldwide database infrastructures – potentially encompassing thousands of physical servers – server outages due to hardware or network issues are inevitable.

A Survival Goal is a configuration defined by the CockroachDB database administrator (DBA) that dictates how logical database data will be replicated across the server cluster. Survival Goals are central to Cockroach Labs’ database architecture. Both developers and DBAs must invest time in understanding and utilizing Survival Goals, as they are integral to the CockroachDB operational model.
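
As a rough sketch, a survival goal is also set with a single SQL statement; the database name here is hypothetical, and REGION FAILURE is the stricter of the two goals CockroachDB offers.

    -- Tolerate the loss of an availability zone (the default goal).
    ALTER DATABASE mydb SURVIVE ZONE FAILURE;

    -- Or spread replicas so the database survives an entire
    -- region going offline, at the cost of extra replication.
    ALTER DATABASE mydb SURVIVE REGION FAILURE;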

A significant consideration for companies evaluating CockroachDB is the level of commitment involved in its adoption. Database selection is a substantial decision, often resulting in a long-term relationship – spanning years or even decades. The success of this adoption can significantly impact the careers of those responsible for the decision, highlighting the importance of meticulous planning and architectural alignment.

While versatile, CockroachDB excels in environments where global users demand rapid, accurate, and dependable access to data. Applications with less stringent requirements may find simpler database solutions more appropriate.

Despite these considerations, CockroachDB delivers a unique and valuable offering to a specific customer base, making its business model particularly noteworthy – a topic we will explore further in the subsequent section of this EC-1.

CockroachDB EC-1 Table of Contents

  • Introduction
  • Part 1: Origin story
  • Part 2: Technical design
  • Part 3: Developer relations and business
  • Part 4: Competitive landscape and future

Explore additional EC-1s available on Extra Crunch.

Tags: CAP Theorem, latency, distributed systems, consistency, availability, partition tolerance