
a crypto company’s journey to data 3.0

Michael Li
March 16, 2021

The Value of Data Evolution for Businesses

Data represents a significant asset for any organization. Effective management unlocks crucial clarity and actionable insights, enabling more informed and scalable decision-making. Furthermore, it serves as a vital instrument for ensuring accountability across all levels.

Currently, many companies operate within what we term Data 1.0, utilizing data in a manual and reactive capacity. Some are transitioning towards Data 2.0, employing basic automation to enhance team efficiency.

The inherent complexities of crypto data have opened up new possibilities in data management, specifically the advancement to Data 3.0: a new frontier focused on scaling value creation through systematic intelligence and automation. What follows is our journey to Data 3.0.

Coinbase's Unique Data Landscape

Coinbase defines itself as a crypto company, distinct from traditional finance or technology firms. This categorization profoundly influences our approach to data handling.

As a crypto company, we manage three primary data types – a complexity exceeding the typical one or two found elsewhere:

  • Blockchain data: This is decentralized and publicly accessible.
  • Product data: Characterized by its large volume and real-time nature.
  • Financial data: Requiring high precision and adherence to numerous financial, legal, and compliance standards.

Our primary objective is to maximize value creation by integrating these diverse data sources, eliminating data silos, proactively addressing potential issues, and uncovering opportunities unique to Coinbase.
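
To make that integration concrete, below is a minimal, hypothetical sketch in Python using pandas. Every table, column, and value is invented for illustration, but the join pattern shows how on-chain activity, in-product behavior, and the financial ledger might be tied together.

```python
import pandas as pd

# Hypothetical, simplified extracts of the three data types.
blockchain_txns = pd.DataFrame({
    "tx_hash": ["0xabc", "0xdef"],     # decentralized, public on-chain data
    "wallet": ["w1", "w2"],
    "amount_btc": [0.5, 1.2],
})
product_events = pd.DataFrame({
    "user_id": ["u1", "u2"],           # high-volume, real-time product data
    "wallet": ["w1", "w2"],
    "event": ["buy_clicked", "sell_clicked"],
})
financial_ledger = pd.DataFrame({
    "user_id": ["u1", "u2"],           # high-precision financial records
    "fiat_amount_usd": [20000.0, 48000.0],
})

# Joining the three sources breaks down silos: on-chain activity is tied
# to in-product behavior and to the precise financial record.
unified = (
    blockchain_txns
    .merge(product_events, on="wallet")
    .merge(financial_ledger, on="user_id")
)
print(unified)
```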

From Reactive to Proactive Data Strategies

Drawing from experience at technology companies like LinkedIn and eBay, as well as financial institutions such as Capital One, I’ve directly witnessed the evolution from Data 1.0 to Data 3.0.

In Data 1.0, data is often perceived as a reactive function, primarily used for ad-hoc requests or resolving urgent problems.

Data 2.0 introduces simple tools and third-party solutions to automate repetitive tasks, improving team productivity. However, the data team often still relies on increasing personnel to deliver greater value.

Finally, Data 3.0 involves the deliberate creation of data systems using both open-source and internally developed technologies. This approach is designed to fundamentally scale value creation.

Embarking on the Journey to Data 3.0

A primary advantage of Data 3.0 lies in the efficiency and uniformity it establishes across all data processes. This allows an organization to build a robust data infrastructure positioned for long-term success while still meeting present needs with constrained resources.

The importance of this may not be immediately apparent for smaller, rapidly evolving companies. However, as an organization expands and undergoes rapid growth, inconsistencies in data flows – or the absence of consistent flows – can become a significant obstacle. Correcting these issues later, without a pre-defined vision, can prove exceptionally challenging.

Even leading technology firms can inadvertently develop problematic practices. Independent engineering teams may construct customized data products and services to resolve particular issues.

The Pitfalls of Disparate Systems

This approach can result in substantial deficiencies within the standardized processes of a complete data system, hindering the ability to effectively build and manage data at scale. Furthermore, these isolated initiatives can expand into independent systems, requiring considerable effort to consolidate and migrate.

Such systems frequently persist as legacy infrastructure, accumulating significant technical debt for the company over time. Addressing this debt can be costly and time-consuming.

With blockchain technologies and their data applications still evolving, our Data 3.0 initiative remains an ongoing process. Nevertheless, we are pleased with the progress achieved to date.

Below is a summary of our work and the systems currently in place.

Data Storage and Processing Strategies

A well-defined strategy encompassing data storage, processing, and semantic consistency is crucial, irrespective of the technologies ultimately selected.

Specifically, a clear approach to separating storage from compute, and to establishing a definitive “single source of truth,” is paramount.

The Importance of Decoupling

Decoupling these core components – storage, compute, and the semantic layer – and formulating a robust technical strategy proactively prevents performance and capacity limitations as the organization expands.

By isolating these elements, scalability is significantly improved, and the system becomes more adaptable to evolving business needs.

Key Components Explained

  • Separation of Storage: This involves distinct management of where data resides, independent of how it is processed.
  • Separation of Compute: This focuses on isolating the processing power used to manipulate data from the storage itself.
  • Single Source of Truth: Establishing a single, authoritative source for all data ensures consistency and reliability across the organization.

Maintaining a single source of truth is vital for data integrity and informed decision-making.

A thoughtfully designed architecture, prioritizing these separations, will yield a more resilient and efficient data ecosystem.
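
As a minimal sketch of what this decoupling looks like in practice, the Python snippet below uses Apache Spark, one common open-source choice, to read from and write back to object storage. The bucket paths, table layout, and column names are illustrative assumptions, not a description of our actual stack.

```python
from pyspark.sql import SparkSession

# Compute: an ephemeral Spark cluster that can be resized or replaced
# independently of the data it processes.
spark = SparkSession.builder.appName("decoupled-example").getOrCreate()

# Storage: data lives in object storage (illustrative path), not on the
# cluster's local disks, so each layer scales on its own.
trades = spark.read.parquet("s3a://example-data-lake/trades/")

# The curated output written back to storage can then act as the single,
# authoritative table that all downstream consumers query.
daily_volume = trades.groupBy("trade_date").sum("quantity")
daily_volume.write.mode("overwrite").parquet(
    "s3a://example-data-lake/curated/daily_volume/"
)
```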

Data Platform and Application Strategies

Our data infrastructure relies on a carefully curated mix of internally developed technologies, publicly available open-source tools, and solutions provided by external vendors. We deliberately evaluate and select specific tools for each functional area.

This strategic approach is designed to prevent redundancy and eliminate potential confusion as our systems evolve. It impacts our choices for event management, data orchestration, business intelligence, and experimentation platforms.

Key Architectural Principles

The conscious selection process fosters a highly decoupled and scalable architecture. This allows for independent development and deployment of individual components.

We prioritize avoiding overlap in functionality across different systems. This ensures efficient resource utilization and simplifies maintenance.

Components of the Data Ecosystem

  • Eventing System: Managed with a focus on reliability and real-time data propagation.
  • Data Orchestration Workflow: Designed for efficient and automated data processing pipelines.
  • Business Intelligence Layer: Provides insights through data analysis and reporting capabilities.
  • Experimentation Platform: Facilitates A/B testing and data-driven decision-making.

Each of these components is chosen to integrate seamlessly while maintaining its distinct purpose. This modularity is central to our data strategy.

The resulting system is not only robust but also adaptable to future growth and changing business requirements.
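
As one small illustration of the orchestration component, here is a hedged sketch of a daily pipeline in Apache Airflow, a widely used open-source workflow tool. The DAG name, task functions, and schedule are hypothetical and are not meant to depict our internal setup.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Placeholder task bodies; in a real pipeline these would call into the
# eventing, warehouse, and BI layers described above.
def extract():
    print("pull raw events")

def transform():
    print("clean and aggregate")

def load():
    print("publish curated tables")

with DAG(
    dag_id="example_daily_pipeline",  # hypothetical name
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Each step is an independent, retryable task; the DAG encodes the
    # dependency order so the pipeline runs unattended every day.
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> transform_task >> load_task
```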

Machine Learning and the Platform Infrastructure

While machine learning currently receives the most attention, owing to the rising prominence of artificial intelligence, it is above all a highly collaborative function within the data team. Our comprehensive machine learning platform, known as Nostradamus, provides the capabilities to power all machine learning models at Coinbase.

This includes data pipelines, model training, deployment processes, serving infrastructure, and experimentation frameworks. The platform’s design prioritizes integration with the broader data ecosystem, ensuring it not only addresses current challenges but also supports future growth and scalability.
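
To ground what “training plus serving” means at its simplest, the sketch below trains and scores a model with scikit-learn on synthetic data. It is a deliberately tiny stand-in for two of the stages such a platform standardizes; the features, labels, and model choice are invented, and nothing here reflects Nostradamus’s actual interfaces.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for features produced by upstream data pipelines.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Training stage: one of the steps (pipelines, training, deployment,
# serving, experimentation) a platform standardizes end to end.
model = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression()),
])
model.fit(X, y)

# Serving stage, reduced to its essence: score fresh traffic with the
# same versioned artifact every caller sees.
print(model.predict(X[:5]))
```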

Data Science and Data Product Development

These two disciplines represent the most user-facing aspects of the data team, functioning as the presentation layer for refined data insights. The goal is to deliver curated information that provides value and enhances the experience for our customers. They directly benefit from the foundational work completed in other areas.

A key objective is to shift the focus of data scientists away from manual processes and towards enabling automated systems. This allows the infrastructure to efficiently deliver data and generate value for end-users at scale, rather than requiring scientists to act as intermediaries.

This transition promotes a more scalable and sustainable approach to data utilization.
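
One way to picture this shift is a declarative metric registry: a data scientist defines a metric once, and the platform computes and refreshes it on a schedule, with no analyst in the loop for each request. The sketch below is entirely hypothetical; the class, fields, and SQL are invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical metric registry: the scientist declares the metric once,
# and automated infrastructure handles computation and delivery at scale.
@dataclass
class MetricDefinition:
    name: str
    sql: str
    schedule: str  # cron expression for automated refresh

daily_active_traders = MetricDefinition(
    name="daily_active_traders",
    sql="SELECT trade_date, COUNT(DISTINCT user_id) FROM trades GROUP BY 1",
    schedule="0 6 * * *",  # refresh every morning, unattended
)

REGISTRY = {daily_active_traders.name: daily_active_traders}
```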
