DataFleets keeps private data useful and useful data private with federated learning and $4.5M seed

Devin Coldewey
Writer & Photographer, TechCrunch
October 26, 2020

A significant amount of data has the potential to provide valuable insights, but privacy and security concerns frequently restrict how it can be used and analyzed. DataFleets introduces a novel method for securely accessing and analyzing databases, eliminating the risk of privacy violations or misuse, and has secured $4.5 million in seed funding to facilitate its expansion.

Effective data utilization requires access to the information. For financial institutions, this encompasses transaction records and account details; for retailers, it includes inventory and supply chain data, and so forth. Valuable insights and actionable patterns are hidden within this data, and it is the responsibility of data scientists and related professionals to uncover them.

However, what if data access is restricted? Numerous industries, such as healthcare, have regulations or policies that discourage or even prohibit such access. It is impractical to obtain an entire hospital’s patient records and delegate data analysis to an external firm with a request to identify useful information. These, along with many other datasets, are considered too confidential or sensitive to permit unrestricted access, as even minor errors or malicious intent could have severe consequences.

In recent years, several technologies have emerged that enable a more effective approach: analyzing data without actually exposing it. While seemingly impossible, computational techniques allow data manipulation without granting the user direct access to the data itself. Homomorphic encryption is the most commonly employed method, but it unfortunately results in a substantial, many-fold decrease in efficiency – a critical drawback given the importance of efficiency in big data applications.
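To make the idea of computing on data you cannot read more concrete, here is a minimal sketch of an additively homomorphic scheme (a toy version of the Paillier cryptosystem). It is an illustration only: the primes are far too small to be secure, the function names are made up for this example, and nothing here reflects DataFleets' product or any particular library.

```python
import math
import random

# Toy key material: tiny primes chosen for readability, NOT for security.
P, Q = 17, 19
N = P * Q                # public modulus; messages must satisfy 0 <= m < N
N_SQ = N * N
G = N + 1                # standard simple choice of generator
LAMBDA = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p-1, q-1), private


def _l(x: int) -> int:
    """Paillier's L function: L(x) = (x - 1) / N."""
    return (x - 1) // N


MU = pow(_l(pow(G, LAMBDA, N_SQ)), -1, N)  # private decryption constant


def encrypt(m: int) -> int:
    """Encrypt m under the public key (N, G) with fresh randomness r."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c: int) -> int:
    """Decrypt c using the private values LAMBDA and MU."""
    return (_l(pow(c, LAMBDA, N_SQ)) * MU) % N


def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the hidden plaintexts (mod N)."""
    return (c1 * c2) % N_SQ


# A third party can combine two ciphertexts without ever seeing 20 or 22.
combined = add_encrypted(encrypt(20), encrypt(22))
print(decrypt(combined))
```

Even in this toy form, every addition costs several modular exponentiations on numbers much larger than the inputs, which hints at why the approach becomes expensive at big-data scale.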

This is where DataFleets provides a solution. Rather than reinventing homomorphic encryption, it employs an alternative approach known as federated learning, which focuses on bringing the model to the data instead of the data to the model.

DataFleets functions as a trusted intermediary, connecting a private database with those seeking access, and securely transferring information between them without ever revealing any of the underlying raw data.

Consider this scenario: a pharmaceutical company aims to create a machine-learning model that predicts potential side effects of a new drug based on a patient’s medical history. A medical research facility’s private patient database would be ideal for training this model, but access is strictly controlled.

The pharmaceutical company’s analyst develops a machine-learning training program and submits it to DataFleets, which establishes contracts with both the analyst and the facility. DataFleets translates the model into its own proprietary runtime and distributes it to the servers hosting the medical data; within this secure environment, the model develops into a fully functional ML agent, which is then translated back into the analyst’s preferred format or platform. The analyst never views the actual data, but benefits from its full analytical potential.
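The general pattern described above can be sketched in a few lines. What follows is a minimal, hypothetical example of federated averaging on a tiny linear model, not DataFleets' runtime or API: the `DataSite` class, its `local_update` method and the `federated_average` function are names invented for this illustration. Each site trains on records it never releases and hands back only updated model parameters, which a coordinator averages.

```python
from typing import List, Tuple


class DataSite:
    """Hypothetical data holder. Raw records stay inside this object."""

    def __init__(self, records: List[Tuple[float, float]]) -> None:
        self._records = records  # private (x, y) pairs, never returned

    def local_update(self, w: float, b: float,
                     lr: float = 0.05, epochs: int = 20) -> Tuple[float, float, int]:
        """Run gradient descent locally; return parameters and record count only."""
        n = len(self._records)
        for _ in range(epochs):
            grad_w = sum(2 * (w * x + b - y) * x for x, y in self._records) / n
            grad_b = sum(2 * (w * x + b - y) for x, y in self._records) / n
            w -= lr * grad_w
            b -= lr * grad_b
        return w, b, n


def federated_average(sites: List[DataSite], rounds: int = 50) -> Tuple[float, float]:
    """Coordinator loop: send the model out, average what comes back."""
    w, b = 0.0, 0.0
    for _ in range(rounds):
        updates = [site.local_update(w, b) for site in sites]
        total = sum(n for _, _, n in updates)
        # Weight each site's parameters by how many records it trained on.
        w = sum(wi * n for wi, _, n in updates) / total
        b = sum(bi * n for _, bi, n in updates) / total
    return w, b


# Two sites holding disjoint slices of data that follow y = 2x + 1.
site_a = DataSite([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)])
site_b = DataSite([(3.0, 7.0), (4.0, 9.0)])
w, b = federated_average([site_a, site_b])
print(round(w, 3), round(b, 3))
```

The coordinator only ever sees two numbers per site per round, and the fitted values should land close to the slope of 2 and intercept of 1 used to generate the data. A production system would also need safeguards this sketch ignores, since model parameters themselves can leak information about the records they were trained on.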

Screenshot of the DataFleets interface. The applications are the primary focus. Image Credits: DataFleets

The process is straightforward: DataFleets acts as a secure messenger between platforms, performing the analysis on behalf of others and never storing or transferring sensitive data.

Many organizations are investigating federated learning; however, the challenge lies in constructing the infrastructure for a comprehensive, enterprise-level service. This requires accommodating a wide range of use cases, supporting diverse languages, platforms, and techniques, and ensuring complete security.

“We prioritize enterprise readiness, offering policy management, identity-access management, and are currently pursuing SOC 2 certification,” stated Nick Elledge, COO and co-founder of DataFleets. “Clients, including banks and hospitals, will confirm that prior privacy software lacked the flexibility to integrate your own tools.”

Once federated learning is implemented, the advantages are considerable. For example, a significant obstacle in the fight against COVID-19 has been the difficulty hospitals, health authorities, and other organizations worldwide have experienced in securely sharing data related to the virus, despite their willingness to do so.

Everyone wants to share data widely, but questions about how it is transmitted, where it is stored, and who holds authority and liability are complex. Traditional methods are often confusing and homomorphic encryption is slow, whereas federated learning, in theory, simplifies access control.

Because the data remains in its original location, this approach is inherently anonymous and therefore highly compliant with regulations such as HIPAA and GDPR, representing another significant benefit. Elledge points out: “Leading healthcare institutions are utilizing our services, recognizing that HIPAA does not provide sufficient protection when making a dataset available to third parties.”

There are also less critical, yet equally valid, applications in other sectors: wireless carriers could provide subscriber metadata without compromising individual privacy; banks could offer consumer data under the same protection; and large datasets, such as video files, could remain in their current location, eliminating the need for costly duplication and maintenance.

The company’s $4.5 million seed round reflects investor confidence (as summarized by Elledge): AME Cloud Ventures (Jerry Yang of Yahoo) and Morado Ventures, Lightspeed Venture Partners, Peterson Ventures, Mark Cuban, LG, Marty Chavez (president of the board of overseers of Harvard), Stanford-StartX fund, and three unicorn founders (Rappi, Quora and Lucid).

With a team of only 11 full-time employees, DataFleets is achieving significant results with limited resources, and the seed funding will accelerate product scaling and maturation. “We have had to decline or postpone new customer requests to concentrate on our work with key customers,” Elledge said. The company plans to hire engineers in the U.S. and Europe to support the launch of a self-service product next year.

“We are transitioning from a data ownership model to a data access economy, where information can be valuable without requiring a transfer of ownership,” Elledge explained. If the company’s strategy proves successful, federated learning is poised to play a crucial role in this evolution.

Devin Coldewey

Devin Coldewey is a writer and photographer who lives in Seattle. You can find his portfolio and personal website at coldewey.cc.