Delta Sharing: Open-Source Data Sharing by Databricks

May 26, 2021
Databricks Introduces Delta Sharing for Open Data Exchange

Today, Databricks unveiled its fifth open-source project, a new tool called Delta Sharing. It is designed to provide a vendor-neutral way to share data across different cloud infrastructures and Software-as-a-Service (SaaS) platforms, as long as a connector is available for the platform in question.

Delta Sharing is built as an extension of the broader Databricks open-source Delta Lake project.

Addressing the Challenges of Data Sharing

According to CEO Ali Ghodsi, the volume of data is experiencing exponential growth, and the process of transferring data between locations is becoming increasingly complex when relying on proprietary solutions. “A primary obstacle hindering organizational success with data is the sharing of information, both internally and externally – this is the most significant challenge we observe,” Ghodsi stated.

The Delta Sharing protocol is specifically designed to overcome this hurdle.

An Open Standard for Secure Data Access

“This represents the industry’s inaugural open protocol, establishing an open standard for the secure sharing of datasets,” Ghodsi said. Organizations can choose their preferred platform, whether it be Databricks or an alternative.

For example, organizations that already use AWS Data Exchange, Power BI, or Tableau can now securely access shared data through the new protocol.
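To make the recipient side more concrete, here is a minimal sketch in Python, assuming the open-source `delta-sharing` connector package is installed and the data provider has supplied a profile file. The file name `config.share` and the share, schema, and table names are hypothetical placeholders and do not come from Databricks' announcement.

```python
def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build a table address of the form '<profile>#<share>.<schema>.<table>'.

    All four values here are illustrative; real ones come from the data provider.
    """
    for label, value in (("share", share), ("schema", schema), ("table", table)):
        if not value:
            raise ValueError(f"{label} name must not be empty")
    return f"{profile_path}#{share}.{schema}.{table}"


def load_shared_table(url: str):
    """Read a shared table into a pandas DataFrame.

    Assumes `pip install delta-sharing`; the import is deferred so the URL
    helper above can be used without the package installed.
    """
    import delta_sharing

    return delta_sharing.load_as_pandas(url)


if __name__ == "__main__":
    # Hypothetical profile file and table coordinates.
    url = table_url("config.share", "example_share", "default", "trips")
    print(url)
    # With a valid profile file in place, the table could then be read with:
    # df = load_shared_table(url)
```

The recipient only needs the profile file and the table's coordinates; nothing in this sketch requires the recipient to be a Databricks customer.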

Broad Industry Support and Partnerships

The tool is designed for compatibility with a wide range of cloud infrastructures and SaaS offerings. Initial partners include the leading cloud providers – Amazon, Microsoft, and Google – alongside data visualization and management companies such as Qlik, Starburst, Collibra, and Alation.

Furthermore, data providers like Nasdaq, S&P, and Foursquare are also involved.

The Power of Open Source and Collaboration

Ghodsi emphasized that the open-source nature of the project is crucial for its success. The donation of the project to The Linux Foundation is intended to ensure interoperability across various environments.

The extensive partnerships are also a key factor, as involvement from prominent companies increases the likelihood of widespread adoption and compatibility with popular services.

Numerous connectors are currently available, with Databricks anticipating further expansion as contributors develop integrations for additional services.

Flexibility and Consumption-Based Pricing

Databricks employs a consumption-based pricing model, similar to Snowflake, where costs are determined by the volume of data processed. However, Delta Sharing enables data sharing with any recipient, not solely other Databricks users.

Ghodsi believes that the open-source nature of Delta Sharing allows his company to remain competitive while providing customers with greater flexibility in data movement.

Benefits for Cloud Infrastructure Providers

Cloud infrastructure vendors also benefit from this model, as cloud data lake tools facilitate substantial data transfer through their services, generating revenue for them. This likely explains their enthusiastic support for the initiative.

Avoiding Vendor Lock-In

A significant concern for modern cloud users is the risk of vendor lock-in, which recalls the 1990s and early 2000s, when companies often relied on a single vendor like Microsoft, IBM, or Oracle.

While this offered a single point of contact, it also created dependence, as switching vendors was prohibitively expensive. Companies are keen to avoid such limitations, and open-source tooling provides a means of preventing this.

Databricks’ Growth and Market Position

Founded in 2013, Databricks has secured nearly $2 billion in funding. The most recent funding round in February raised $1 billion at a valuation of $28 billion, a remarkable figure for a private company.

Snowflake, a key competitor, went public last September and currently boasts a market capitalization exceeding $66 billion.