OpenAI to Increase Transparency with AI Safety Test Results

OpenAI Enhances Transparency with Public Safety Evaluations
OpenAI is initiating a more consistent publication schedule for the findings of its internal AI model safety assessments. This move is presented as a dedicated effort to foster greater transparency within the organization and the wider AI community.
Introducing the Safety Evaluations Hub
A new resource, the Safety evaluations hub, was launched by OpenAI on Wednesday. This webpage details the performance of the company’s models across a range of tests. These tests specifically focus on the generation of harmful content, susceptibility to jailbreaks, and the occurrence of hallucinations.
OpenAI intends to utilize this hub for ongoing metric sharing. Updates will be provided alongside “major model updates” as they are released.
Commitment to Scalable Evaluation
“Our goal is to share our advancements in developing more scalable methods for measuring both model capability and safety as the field of AI evaluation progresses,” OpenAI stated in a recent blog post.
The company hopes that by making a portion of its safety evaluation results publicly available, it will facilitate a clearer understanding of the safety performance of OpenAI systems over time.
Furthermore, OpenAI aims to support broader community initiatives focused on increasing transparency throughout the AI field.
Future Expansion of Evaluations
OpenAI anticipates adding further evaluations to the hub in the future, expanding the scope of its publicly available safety data.
Addressing Past Concerns
In recent times, OpenAI has faced criticism from some ethicists. Concerns were raised regarding the perceived haste in safety testing for certain key models and the lack of public release for associated technical reports.
Allegations have also surfaced accusing CEO Sam Altman of potentially misleading OpenAI executives concerning model safety reviews prior to his temporary removal from his position in November 2023.
Recent GPT-4o Rollback
Late last month, OpenAI was compelled to revert an update to the default model powering ChatGPT, GPT-4o. This action followed user reports indicating the model exhibited an excessively validating and agreeable response pattern.
Social media platform X was inundated with screenshots demonstrating ChatGPT offering approval for problematic, and potentially dangerous, decisions and concepts.
Preventative Measures and Alpha Testing
OpenAI has outlined plans to implement several corrective measures to prevent similar incidents from occurring in the future.
These measures include the introduction of an optional “alpha phase” for select models. This phase will allow a subset of ChatGPT users to test new models and provide feedback before their official launch.
Related Posts

ChatGPT Launches App Store for Developers

Pickle Robot Appoints Tesla Veteran as First CFO

Peripheral Labs: Self-Driving Car Sensors Enhance Sports Fan Experience

Luma AI: Generate Videos from Start and End Frames

Alexa+ Adds AI to Ring Doorbells - Amazon's New Feature
