Twitter Tests Reply Revision Feature to Combat Harmful Content

February 23, 2021

Twitter's New Pause-and-Think Prompt for Tweets

Twitter is currently evaluating a new feature designed to encourage more thoughtful posting. In certain instances, the platform asks users to reconsider a tweet before it is published.

Specifically, when Twitter’s systems identify a reply that appears potentially harmful or offensive, a prompt appears encouraging the user to revise the message before sending it.

How the Prompt Works

Users encountering this prompt will see a pop-up message asking, “Want to review this before Tweeting?”

They are then presented with three options:

  • Tweet the reply as is.
  • Use the Edit option to modify the text.
  • Delete the reply entirely.

A link is also provided to submit feedback if the system incorrectly flagged the message.
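
Twitter has not published implementation details, but the flow it describes is straightforward to model. The sketch below is purely illustrative: all names (handle_reply, looks_harmful, ask_user, edit_text) are hypothetical stand-ins, not Twitter's actual API.

```python
from enum import Enum
from typing import Callable, Optional

class Choice(Enum):
    TWEET = "Tweet"    # send the reply as is
    EDIT = "Edit"      # revise the text before sending
    DELETE = "Delete"  # discard the reply

def handle_reply(
    text: str,
    looks_harmful: Callable[[str], bool],
    ask_user: Callable[[str], Choice],
    edit_text: Callable[[str], str],
) -> Optional[str]:
    """Return the text to post, or None if the user deletes the reply."""
    while looks_harmful(text):
        choice = ask_user("Want to review this before Tweeting?")
        if choice is Choice.TWEET:
            break                  # user sends it unchanged despite the flag
        if choice is Choice.DELETE:
            return None            # nothing is posted
        text = edit_text(text)     # Choice.EDIT: the revision is re-checked
    return text
```

In this sketch, an edited reply loops back through the check before posting, so revised text is evaluated the same way as the original draft.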

Previous Testing and Current Scope

This isn't the first time Twitter has explored this concept.

Similar experiments were conducted in May and August of 2020. While the core message remained consistent, the visual presentation of the buttons differed in those earlier iterations.

This latest test is limited to iOS users accessing the platform in English.

Detecting Harmful Language

Twitter has explained that its detection systems rely on identifying language patterns similar to those found in previously reported tweets.

The platform leverages data from past reports to recognize potentially problematic phrasing.
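
Twitter describes only the signal, not the model. As a toy illustration of matching a draft against previously reported tweets, a similarity check might compare token overlap; everything below (the function names, the Jaccard measure, the 0.5 threshold) is invented for illustration, and Twitter's real detection is certainly more sophisticated.

```python
def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of bare words."""
    return {word.strip(".,!?\"'").lower() for word in text.split()}

def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two token sets: |a & b| / |a | b|."""
    return len(a & b) / len(a | b) if a and b else 0.0

def looks_harmful(reply: str, reported: list[str], threshold: float = 0.5) -> bool:
    """Flag a draft reply that closely resembles any previously reported tweet."""
    tokens = tokenize(reply)
    return any(jaccard(tokens, tokenize(past)) >= threshold for past in reported)

# Example: a near-duplicate of a reported tweet is flagged.
# looks_harmful("you are such an idiot", ["you are an idiot"])  -> True
```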

The Impact of Subtle Nudges

Research indicates that even minor interventions can influence user behavior.

For instance, prompting users to read an article before retweeting it resulted in a 40% increase in article opens.

Twitter has also implemented features to discourage unreflective retweets and slow the spread of misinformation.

Industry-Wide Trend

Other social media platforms are also employing similar techniques.

Instagram introduced a feature in 2019 that flagged potentially offensive comments before posting, later extending it to captions.

TikTok recently launched a banner questioning users before they share videos containing “unverified content.”

Why Not a Full Rollout?

The reason for not immediately deploying the prompt more broadly remains unclear, especially considering the ongoing issue of online abuse on the platform.

Given the scale of other projects, such as Fleets and Spaces, a simple prompt seems like a feature that could have been fully implemented by now.

Twitter's Explanation

Twitter states that the initial experiment was paused to allow for improvements.

A spokesperson explained, “We paused the experiment once we realized the prompt was inconsistent and that we needed to be more mindful of how we prompted potentially harmful Tweets.”

Further refinements were made to the health model, and a system for user feedback was implemented to address potential errors.

The company also focused on improving the evaluation of offensive language, including insults and hateful remarks, and on differentiating between genuine harm and playful banter between friends.

Tags: Twitter, harmful replies, online safety, social media, content moderation, reply revision