
Tonic is betting that synthetic data is the new big data to solve scalability and security

Danny Crichton
Contributor, TechCrunch
December 14, 2020

The prevailing notion of “big data” is being challenged. For some time, organizations have been encouraged to retain every piece of digital information generated, under the assumption that doing so will provide a crucial competitive advantage.

However, big data has a significant drawback: there is simply too much of it.

Analyzing massive amounts of data to produce actionable business intelligence is both costly and lengthy. Furthermore, storing such extensive data creates a prominent vulnerability, making the company a target for malicious actors. Maintaining, securing, and protecting the privacy of big data is an expensive undertaking. Ultimately, the resulting insights may not justify the investment—often, carefully selected and refined data sets can yield quicker and more valuable results than vast quantities of unprocessed data.

What course of action should a company take? It needs a solution like Tonic to address the drawbacks of its big data practices.

Tonic is a platform for creating “synthetic data” that converts raw data into more manageable and secure data sets for use by software developers and data analysts. During this process, Tonic’s algorithms remove identifying information from the original data and generate statistically equivalent synthetic data sets, ensuring that sensitive personal information is not compromised.

For example, an e-commerce business collects transaction data detailing customer purchases. Providing this data to all engineers and analysts within the company poses a risk, as purchase histories may contain personally identifiable information that should only be accessible to those with a legitimate need. Tonic can transform this original payment data into a new, smaller data set that maintains the same statistical characteristics but is not linked to individual customers. This allows engineers to test applications and analysts to evaluate marketing campaigns without raising privacy concerns.
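To make the idea concrete, here is a minimal sketch of that kind of transformation. This is not Tonic's actual algorithm; it is an illustrative toy in which a hypothetical `synthesize` function drops the identifying column and resamples purchase amounts from a distribution fit to the originals, so the output resembles the real data statistically without pointing back to any customer.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)

# Hypothetical raw transaction data containing PII (illustrative only).
raw = pd.DataFrame({
    "customer_email": ["alice@example.com", "bob@example.com",
                       "carol@example.com", "dan@example.com"],
    "amount": [19.99, 250.00, 34.50, 75.25],
})

def synthesize(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Build a synthetic table: drop real identifiers and sample the
    numeric column from a distribution fit to the original values."""
    mu, sigma = df["amount"].mean(), df["amount"].std()
    return pd.DataFrame({
        # Replace real identifiers with opaque synthetic IDs.
        "customer_id": [f"synth-{i}" for i in range(n)],
        # Draw amounts from a normal fit to the originals, clipped at 0.
        "amount": np.clip(rng.normal(mu, sigma, size=n), 0, None).round(2),
    })

synthetic = synthesize(raw, n=3)
print(synthetic)
```

Real synthetic-data systems model joint distributions and cross-column correlations rather than one column in isolation, but the privacy shape is the same: the output carries the statistics, not the people.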

Synthetic data and other methods for safeguarding large data sets have recently attracted considerable investor interest. We previously reported on Skyflow, which secured funding to utilize polymorphic encryption, restricting employee access to only the data required for their roles. BigID adopts a broader approach, focusing on tracking data location and access permissions based on applicable privacy regulations.

Tonic’s methodology offers the advantage of addressing both privacy concerns and scalability challenges as data sets grow in size. This combination has garnered investor attention: the company announced today that it has secured $8 million in Series A funding, led by Glenn Solomon and Oren Yunger of GGV, with the latter joining the company’s board.

The company was established in 2018 by four founders: CEO Ian Coe, COO Karl Hanson (the two first met in middle school), CTO Andrew Colombi and head of engineering Adam Kamor. Coe, Hanson and Colombi previously worked together at Palantir, and Coe and Kamor overlapped at Tableau. That experience at leading Silicon Valley data infrastructure companies has shaped the core of Tonic's product development.

Coe explained that Tonic is engineered to prevent common security vulnerabilities in contemporary software development. In addition to accelerating data pipelining for engineering teams, Tonic “also ensures that sensitive data does not move from production environments to less secure lower environments.”

He stated that the concept behind Tonic originated while resolving issues for a Palantir banking client. The team required data to address a problem, but the data was highly sensitive, leading them to utilize synthetic data as a solution. Coe aims to broaden the application of synthetic data in a more systematic manner, particularly in light of evolving legal requirements. “I believe increasing regulatory pressure is driving teams to modify their data practices,” he observed.

The foundation of Tonic’s technology is its subsetter, which analyzes raw data and begins to define the statistical relationships between records. Some of this analysis is automated depending on the data source, and when automation is not possible, Tonic’s user interface allows a data scientist to manually onboard data sets and establish these relationships. Ultimately, Tonic generates these synthetic data sets, making them accessible to all data users within the organization.
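One core idea behind subsetting can be sketched in a few lines. The code below is a hypothetical illustration, not Tonic's implementation: given two tables linked by a foreign key, a `subset` helper keeps a chosen slice of the parent table and follows the key into the child table, so the smaller data set stays referentially consistent.

```python
import pandas as pd

# Hypothetical parent and child tables linked by a foreign key (illustrative).
customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "region": ["US", "EU", "US", "APAC"],
})
orders = pd.DataFrame({
    "order_id": [10, 11, 12, 13, 14],
    "customer_id": [1, 1, 2, 3, 4],
    "total": [20.0, 35.0, 12.5, 99.0, 5.0],
})

def subset(customers: pd.DataFrame, orders: pd.DataFrame, keep_ids: set):
    """Keep only the chosen customers, then follow the foreign key so the
    orders table contains no rows pointing at a dropped customer."""
    kept_customers = customers[customers["customer_id"].isin(keep_ids)]
    kept_orders = orders[orders["customer_id"].isin(keep_ids)]
    return kept_customers, kept_orders

small_customers, small_orders = subset(customers, orders, keep_ids={1, 3})
```

In a real schema the relationship graph spans many tables, which is why the statistical analysis (or, failing that, a data scientist working through the UI) has to map those links before any subset can be cut.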

With this new funding, Coe intends to continue prioritizing ease of use and onboarding, and to promote the benefits of this approach to clients. “In many respects, we are establishing a new category, which means people need to understand and recognize the value [and adopt] an early-adopter mindset,” he said.

In addition to lead investor GGV, Bloomberg Beta, Xfund, Heavybit and Silicon Valley CISO Investments participated in the funding round, along with angel investors Assaf Wand and Anthony Goldbloom.

Tags: synthetic data, big data, scalability, security, Tonic, data privacy

Danny Crichton currently serves as an investor with CRV, and previously worked as a regular contributor for TechCrunch.