Synthetic Data for Enterprises | Rockfish

The Genesis of Rockfish: Addressing the Synthetic Data Need
For several years, Vyas Sekar routinely called Muckai Girish, a former undergraduate classmate, to pitch potential startup ideas and ask for his feedback. Those conversations usually ended when the call did. But when Sekar approached Girish in early 2022 with an idea centered on synthetic data, the discussion continued well after they hung up.
Reproducibility Challenges and Enterprise Validation
Sekar, alongside Giulia Fanti, a colleague from Carnegie Mellon University, had been developing synthetic data solutions to address the growing reproducibility crisis within academic research – the difficulty in replicating research findings due to data access limitations. While Sekar initially identified the need within academia, Girish recognized that his existing clientele were grappling with a similar issue. Subsequent conversations with various enterprises further substantiated this observation.
“It became apparent at that juncture that this was a genuine and significant opportunity,” Girish, the CEO, explained to TechCrunch. “This realization spurred us to initiate the project, and over the following months, we engaged with investors, individuals within our network, and crucially, enterprises, confirming the substantial nature of the problem and its worthiness as a long-term endeavor.”
Introducing Rockfish: Generative AI for Data Silos
This led to the creation of Rockfish, a startup leveraging generative AI to produce synthetic data for operational processes, enabling enterprises to overcome data silos. Rockfish is compatible with database providers such as AWS and Azure, and assists users in selecting optimal data configurations based on organizational policies or intended data applications.
Differentiating in a Growing Market
Demand for synthetic data in AI has grown steadily and was already rising when the company launched in June 2022. Girish emphasized Rockfish’s commitment to building a product that would stand apart from competitors and serve as a daily operational tool for enterprises, rather than an occasional fix.
Consequently, the company’s product is engineered for continuous data ingestion and focuses on operational data, encompassing information related to financial transactions, cybersecurity measures, and supply chain management. These areas generate a constant stream of data and are subject to frequent changes. Girish believes this focus provides Rockfish with a competitive advantage.
Current Clients and Applications
Rockfish currently works with a small group of enterprise clients, including the streaming analytics platform Conviva, and with government entities such as the U.S. Army and the U.S. Department of Defense.
Seed Funding and Future Growth
Rockfish has recently announced a $4 million seed funding round, spearheaded by Emergent Ventures, with additional participation from Foster Ventures, TEN13, and Dallas VC. This investment brings the company’s total funding to approximately $6 million.
Emergent Ventures’ Investment Rationale
Anupam Rastogi, a managing partner at Emergent Ventures, told TechCrunch that he had been following Sekar’s work before Rockfish was founded. He said the firm’s decision to invest was based on the “team, market, and product, in that order.” Rockfish’s focus on serving enterprises also fit Emergent’s investment strategy more closely than other companies in the field did.
“The team comprises exceptionally skilled data scientists, including multiple PhD holders,” Rastogi noted. “This is a technically demanding field, and possessing such technical expertise is paramount. They have conducted substantial foundational work, not only within the company but also across the broader industry.”
Navigating a Competitive Landscape
While Rockfish’s strategic focus aims to establish a strong market position, the synthetic data market is expected to grow more competitive. AI companies are turning to synthetic data as traditional sources of AI training data become exhausted.
Several startups are already vying for market share, including Tonic AI, which has secured over $45 million in venture capital; Mostly AI, with $31 million in VC funding; and Hazy, which was acquired by SAS in 2024 after raising $14.5 million.
Future Development and Innovation
Girish indicated that the company intends to enhance its synthetic data approach by integrating other modeling techniques, such as state space models, which describe a system through a set of state variables that evolve over time. The company also plans to refine its end-to-end functionality.
“Simply utilizing random data from the internet to generate synthetic data is insufficient,” Girish explained. “There’s no assurance of success. However, when you integrate all these elements for enterprises, the results become highly relevant and realistic. The ability to perform this process continuously is what we consider to be truly valuable.”