LOGO

data scientists: bring the narrative to the forefront

AVATAR Peter Wang
Peter Wang
April 16, 2021
data scientists: bring the narrative to the forefront

The Growing Importance of Storytelling in Data Science

Estimates suggest that by the year 2025, a staggering 463 exabytes of data will be generated on a daily basis. To illustrate the scale, a single exabyte is capable of storing 50,000 years of DVD-quality video.

The conversion of both physical and digital activities into quantifiable data has become increasingly streamlined. Consequently, organizations across all sectors have been actively pursuing the accumulation of vast datasets, seeking a competitive advantage.

Data Alone Isn't Enough

Despite this widespread focus on data acquisition, the crucial role of storytelling in unlocking its true potential is frequently underestimated. Raw data, in and of itself, proves inadequate when attempting to genuinely impact human decisions.

Whether the objective is to enhance a company’s financial performance or encourage public health measures, it is the compelling narrative, not merely the figures, that drives action.

Communication as a Key Skill

As data collection and analysis continue to expand, effective communication and the art of storytelling will become increasingly vital within the field of data science. This is due to their ability to distinguish meaningful insights from irrelevant information.

However, this represents a challenge for many data scientists. A 2020 survey conducted by Anaconda, involving over 2,300 professionals, revealed that nearly 25% of data science and machine learning (ML) teams reported deficiencies in communication skills.

This lack of communication proficiency may contribute to the fact that approximately 40% of respondents indicated they could only demonstrate business impact sporadically, or rarely.

Bridging the Gap: Data Science and Narrative

Successful data professionals must possess a comparable level of expertise in narrative construction as they do in coding and model deployment. This extends beyond simply producing visualizations for reports.

Here are some suggestions for how data scientists can effectively contextualize their findings within broader, more comprehensive narratives:

  • Focus on crafting a clear and concise message.
  • Understand your audience and tailor your story accordingly.
  • Use data visualizations to support, not overshadow, your narrative.
  • Emphasize the 'so what?' – the practical implications of your findings.

Rendering the Abstract Concrete

The continuous expansion of datasets aids machine learning models in achieving a more comprehensive grasp of problem domains. However, an increase in data volume doesn't automatically translate to enhanced human understanding.

Even individuals predisposed to analytical thinking find it challenging to process large, abstract figures or subtle gains in accuracy. Therefore, incorporating relatable references into data narratives is crucial for making information accessible.

The Pandemic as a Case Study

During the recent pandemic, the public was inundated with statistics concerning case numbers, mortality rates, and positivity percentages. While this data held significance, methods like interactive maps and discussions surrounding reproduction rates proved more effective.

These tools provided essential context, communicated risk levels, and ultimately facilitated behavioral adjustments when necessary, surpassing the impact of simply presenting large volumes of raw data. Data professionals bear the responsibility of structuring information for clear comprehension by their target audience.

A Gap in Data Science Education

Notably, the 2020 State of Data Science survey revealed that only 49% of student respondents were receiving instruction in data visualization techniques.

Regardless of whether pursued through self-directed learning or a structured academic program, this skillset warrants investment by data scientists. Developing proficiency in data visualization can significantly broaden the impact and effectiveness of their work.

The Importance of Subject Matter Expertise in Data Science

Effective data science isn't conducted in isolation. To leverage data for impactful problem-solving, data scientists require close collaboration with subject matter experts. This partnership is crucial for comprehending the nuances of the problem and ensuring hypotheses are well-aligned.

While data scientists focus on model development and coding, the true value of their results is unlocked through the insights of experts who can provide essential business or industry-specific context.

Identifying Overlooked Hypotheses

Experts play a vital role in identifying hypotheses that a data scientist might not consider. Their deep understanding of the field can broaden the scope of investigation and lead to more comprehensive analyses.

Furthermore, subject matter experts are essential for preventing misinterpretations of data, particularly avoiding the logical fallacy that correlation automatically implies causation.

Avoiding Spurious Correlations

As demonstrated by Tyler Vigen’s work on “Spurious Correlations,” apparent correlations don't always signify a meaningful relationship. The example of a strong correlation between margarine consumption and divorce rates in Maine highlights this potential pitfall.

Similar misleading patterns can easily occur in business environments without the guiding influence of industry-specific knowledge.

Contextualizing Analytical Findings

Consider a data scientist working for an e-commerce platform who observes increased website traffic on Mondays. They might initially interpret this as a preference for post-weekend shopping.

However, a marketing analyst could clarify that these spikes are likely attributable to marketing campaigns strategically launched on weekends. This illustrates how data science’s analytical power is maximized when combined with relevant domain expertise.

The Subjectivity Inherent in Data Interpretation

As previously noted, charts and graphs represent powerful instruments, but their perceived authority can sometimes obscure the fact that data visualizations, and even data presented in non-visual formats, are susceptible to manipulation. Consider, for instance, the correlation between margarine consumption and divorce rates.

Data is frequently regarded as an objective reality, however, its presentation is inherently subjective. This subjectivity stems from the interpreter and the methods employed. While data visualization excels at revealing trends, patterns, and anomalies within datasets, it rarely provides a complete basis for analysis and interpretation on its own.

The Importance of Contextualized Analysis

Comprehensive data analysis necessitates the inclusion of the complete context surrounding data collection. This includes detailing the analytical techniques applied and acknowledging potential limitations. Critical questions should be posed, such as “What rationale underlies this specific data presentation?” and “What compromises were made during model parameter optimization?”

Data scientists must convey the complete narrative of their findings, outlining the process that led to a particular result or insight. This practice not only mitigates potential biases and blind spots but also fosters greater trust in data-driven decision-making. The application of data science within a business context can be likened to consuming raw sushi; confidence is significantly enhanced by understanding the origin and handling of the ingredients.

Data Storytelling and its Impact

Innovation isn't solely driven by data itself; it’s the compelling narrative constructed from data that unlocks hidden trends, enables personalization, and optimizes workflows.

While data holds the potential to enhance decision-making and stimulate innovation, it can also introduce confusion and ambiguity. Data professionals bear an increasing responsibility to not only analyze data but also to guide their organizations in leveraging data for impactful storytelling.