Code Ocean: Reproducible Research with $21M Funding

May 17, 2021

The Growing Need for Research Collaboration and Reproducibility

Nearly every scientific discipline now depends on large datasets and complex analyses. That reliance produces a proliferation of formats and platforms, and the result is more than an inconvenience: it can actively impede peer review and the replication of research.

Code Ocean: A Platform for Streamlined Scientific Collaboration

Code Ocean aims to address these issues by providing scientists with a versatile and shareable platform for datasets and methodologies. The company has secured $21 million in funding to further develop and expand its capabilities.

Addressing the Challenges of Sharing Research

While numerous tools exist for data analysis – such as Jupyter, GitLab, and Docker – Code Ocean distinguishes itself as a container platform designed for ease of sharing. It packages all essential components of data and analysis into a readily distributable format, regardless of the native platform.

A significant hurdle arises when researchers attempt to share their work with colleagues, whether nearby or geographically distant. For accurate replication, data analysis, like any scientific procedure, must be executed identically. However, inconsistencies in structures, formats, and settings are common.

Sharing work isn’t impossible, but it necessitates extensive verification by those attempting to replicate or build upon the original research. This involves confirming the use of identical tools, versions, and configurations. Even minor discrepancies can have substantial consequences.
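To make that verification burden concrete, here is a minimal Python sketch of the kind of check a colleague might otherwise do by hand: record the interpreter, operating system, and package versions on one machine, then list every mismatch on another. The function names `capture_environment` and `diff_environments` and the record layout are hypothetical, chosen for illustration; this is not how Code Ocean itself works.

```python
import platform
from importlib import metadata


def capture_environment(packages):
    """Record the interpreter, OS, and versions of the named packages."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed on this machine
    return {
        "python": platform.python_version(),
        "os": platform.system(),
        "packages": versions,
    }


def diff_environments(original, replica):
    """Return human-readable mismatches between two environment records."""
    mismatches = []
    for key in ("python", "os"):
        if original[key] != replica[key]:
            mismatches.append(f"{key}: {original[key]} != {replica[key]}")
    for name, wanted in original["packages"].items():
        found = replica["packages"].get(name)
        if found != wanted:
            mismatches.append(f"{name}: {wanted} != {found}")
    return mismatches
```

An empty list from `diff_environments` only says the recorded items agree; settings that were never recorded can still differ, which is part of why manual verification is so laborious.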

The Containerization Solution Inspired by Cloud Services

This problem mirrors challenges encountered in software deployment. Like scientific experiments, software deployments can be delicate, and containerization offers a solution. Containers behave much like lightweight virtual machines (although they share the host’s operating system kernel rather than emulating hardware), encapsulating everything needed for a computing task in a portable format.

Applying this concept to research allows for the bundling of data, software, and specific techniques into a single, organized package – the core offering of Code Ocean’s platform and “Compute Capsules.”
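The bundling idea can be illustrated with a small Python sketch that writes a manifest for a project folder: a checksum for every file, plus free-form notes about the environment. This is only an assumption-laden illustration of packaging data and code together; the manifest layout and the names `build_manifest` and `verify_manifest` are hypothetical and do not describe the actual Compute Capsule format.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root, environment):
    """List every file under root with its checksum, plus environment notes."""
    root = Path(root)
    files = {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
    return {"environment": environment, "files": files}


def verify_manifest(root, manifest):
    """Return the relative paths whose contents are missing or changed."""
    root = Path(root)
    changed = []
    for rel, expected in manifest["files"].items():
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            changed.append(rel)
    return changed
```

A recipient could call `verify_manifest` on a received folder to confirm that the data and code match what the author packaged. A real container goes further by shipping the software environment itself, not just a description of it.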

A Practical Example: Microbiological Research

Consider a microbiologist studying a compound’s effect on muscle cells, using R and RStudio on an Ubuntu machine with specific in vitro data. A published paper typically declares these details, but a colleague without a compatible Ubuntu and RStudio setup may still be unable to replicate the analysis, even with access to the code.

Compute Capsules: Accessible and Reproducible Research

Code Ocean addresses this by making the code readily available and executable with a single click. Colleagues can inspect, run, or modify the code as needed, all through a web app accessible across platforms. The capsules can even be embedded on webpages.

Furthermore, others can adapt Compute Capsules with new data and modifications. A general-purpose RNA sequence analysis tool, for instance, can be used by anyone who provides properly formatted data, so they need not write their own analysis code.

Users can clone a capsule, run it with their own data, and verify the original results. This can be done directly on the Code Ocean website or by downloading the capsule for local execution. Additional examples are available for review.

Breaking Down Silos in Data-Heavy Research

The exchange of research techniques has long been a cornerstone of scientific progress. However, modern, data-intensive experimentation often becomes isolated due to difficulties in sharing and verification, despite code availability. This leads to redundant efforts and reinforces existing silos.

Current Usage and Impact

Currently, Code Ocean hosts approximately 2,000 public Compute Capsules, many linked to published papers. These capsules have been utilized by others for replication and innovation, with some open-source libraries seeing usage by thousands of researchers.

Addressing Security Concerns

Recognizing the sensitivity of proprietary and medical data, Code Ocean offers an enterprise product allowing the system to operate on a private cloud platform, providing a secure internal tool for research institutions.

A Vision for Inclusive Collaboration

Code Ocean strives to foster a more collaborative research environment by embracing a wide range of codebases, platforms, and compute services.

Funding and Future Development

The company’s ambition is supported by $21 million in funding, including $15 million in a recent Series A round led by Battery Ventures, with participation from Digitalis Ventures, EBSCO, and Vaal Partners. This investment will fuel further platform development, scaling, and promotion.

The goal is to establish Code Ocean as an essential, deeply integrated, and profitable Software-as-a-Service (SaaS) solution within the scientific community.
