iPhone Accessibility: Automatic UI Recognition for Blind Users

Apple has long prioritized features that help users with disabilities, and VoiceOver on iOS is an invaluable tool for people with visual impairments, provided every element of an interface has been labeled accurately. The company has now introduced a new capability that uses machine learning to automatically identify and label every button, slider, and tab.
Screen Recognition, available now in iOS 14, is a computer vision system trained on a large collection of images of apps in use, teaching it what buttons look like, what icons mean, and so on. Systems like this are very flexible: depending on the data you give them, they can become expert at recognizing cats, facial expressions, or, as in this case, the different parts of a user interface.
The result is that in any app, users can now invoke the feature and, in a fraction of a second, every item on the screen will be labeled. And “every” means every one: screen readers need to be aware of everything a sighted user can see and interact with, from images (which iOS has been able to summarize briefly for some time) to common icons (home, back) and context-specific ones such as the “…” menus found in many apps.
The idea is not to make manual labeling unnecessary: developers know their own apps best, but updates, changing standards, and difficult situations such as in-game interfaces can still leave gaps in accessibility.
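For a sense of what manual labeling looks like in practice, here is a minimal UIKit sketch; the control, its label, and its hint are invented for illustration, not taken from any real app:

```swift
import UIKit

final class PlayerViewController: UIViewController {
    // An icon-only custom control; VoiceOver can't infer its purpose from pixels
    // alone, so the developer describes it explicitly.
    private let shuffleButton = UIButton(type: .custom)

    override func viewDidLoad() {
        super.viewDidLoad()
        shuffleButton.setImage(UIImage(systemName: "shuffle"), for: .normal)

        // Manual labeling: tell VoiceOver what the element is and what it does.
        shuffleButton.isAccessibilityElement = true
        shuffleButton.accessibilityLabel = "Shuffle"
        shuffleButton.accessibilityTraits = .button
        shuffleButton.accessibilityHint = "Plays songs in random order."

        view.addSubview(shuffleButton)
    }
}
```

When labels like these are missing, VoiceOver has nothing to announce, and that is the gap Screen Recognition is meant to fill.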
I spoke with Chris Fleizach from Apple’s iOS accessibility engineering team and Jeff Bigham on the AI/ML accessibility team about the development of this extremely helpful new feature. (It’s described in a paper due to be presented next year.)
Image Credits: Apple
“We sought opportunities to improve accessibility, such as with image descriptions,” explained Fleizach. “In iOS 13, we automated icon labeling—Screen Recognition represents a further advancement. We can analyze the pixels on the screen to determine the hierarchical structure of interactive objects, and this process occurs directly on the device in a matter of tenths of a second.”
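Apple hasn’t published how Screen Recognition is implemented internally, but on iOS this kind of on-device inference is typically expressed through Vision and Core ML. A rough sketch under that assumption, with a hypothetical UI-element detection model standing in for the real one:

```swift
import CoreGraphics
import CoreML
import Vision

// "model" is a placeholder for a hypothetical UI-element detector; Apple has
// not published the model actually used by Screen Recognition.
func detectUIElements(in screenshot: CGImage, using model: MLModel) throws {
    let visionModel = try VNCoreMLModel(for: model)

    let request = VNCoreMLRequest(model: visionModel) { request, _ in
        guard let results = request.results as? [VNRecognizedObjectObservation] else { return }
        for element in results {
            // Each observation carries a predicted class (button, slider, tab, ...)
            // and a normalized bounding box locating it on screen.
            let label = element.labels.first?.identifier ?? "unknown"
            print(label, element.boundingBox)
        }
    }

    // Runs entirely on device; the Neural Engine accelerates the model where available.
    try VNImageRequestHandler(cgImage: screenshot, options: [:]).perform([request])
}
```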
This concept is not entirely new; Bigham referenced Outspoken, a screen reader that previously attempted to utilize pixel-level data to identify UI elements. However, while that system required exact matches, the flexible nature of machine learning systems and the processing power of iPhones’ integrated AI accelerators make Screen Recognition significantly more adaptable and effective.
This wouldn’t have been possible just a few years ago: the state of machine learning at the time, and the lack of a dedicated unit to run it, would have made such a system enormously taxing on resources, resulting in sluggish performance and heavy battery drain.
However, once the possibility of such a system emerged, the team began prototyping it with the assistance of their accessibility staff and testing community.
“VoiceOver has long been a leading solution for vision accessibility. The development of Screen Recognition was rooted in collaboration across teams—Accessibility was involved throughout, along with our data collection and annotation partners, the AI/ML team, and, of course, design. This collaborative approach ensured that our machine learning development continued to prioritize an excellent user experience,” stated Bigham.
The team did it by taking thousands of screenshots of popular apps and games, then manually labeling the elements in them as one of several standard UI types. That labeled data was fed to the machine learning system, which soon became proficient at picking out those same elements on its own.
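For a concrete picture of what a single labeled example might contain, here is a simplified sketch; the element classes and field names are illustrative, not Apple’s actual annotation schema:

```swift
import CoreGraphics

// Illustrative element classes; the real taxonomy is Apple's own.
enum UIElementClass: String, Codable {
    case button, slider, tab, textField, icon, image
}

// One hand-drawn annotation: what the element is and where it sits.
struct ElementAnnotation: Codable {
    let elementClass: UIElementClass
    let boundingBox: CGRect
}

// One training example: a screenshot plus all of its annotated elements.
struct LabeledScreenshot: Codable {
    let appName: String
    let imageFile: String
    let annotations: [ElementAnnotation]
}
```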
The task is more intricate than it sounds: humans are very good at inferring the intent behind a graphic or a bit of text, which lets us navigate even abstract or creatively designed interfaces. That skill doesn’t come easily to a machine learning model, and the team had to work with it to build a complex set of rules and hierarchies that ensure the resulting screen reader interpretation makes sense.
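To give a flavor of the kind of rules involved, here is a toy heuristic that turns unordered detections into a top-to-bottom, left-to-right reading order; it is entirely illustrative and far simpler than the system the team describes:

```swift
import CoreGraphics

struct DetectedElement {
    let label: String
    let frame: CGRect   // screen coordinates, origin at the top left
}

// Sort detections so a screen reader would traverse them row by row,
// left to right within each row.
func readingOrder(for elements: [DetectedElement]) -> [DetectedElement] {
    elements.sorted { a, b in
        // Treat elements whose tops are within 16 points as the same row.
        if abs(a.frame.minY - b.frame.minY) > 16 {
            return a.frame.minY < b.frame.minY
        }
        return a.frame.minX < b.frame.minX
    }
}
```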
The new capability should make millions of apps more accessible, or accessible for the first time, to users with visual impairments. It can be turned on by going to Accessibility settings, then VoiceOver, then VoiceOver Recognition, where image, screen, and text recognition can each be switched on or off.
Bringing Screen Recognition to other platforms, such as the Mac, would be a substantial undertaking, so don’t expect it any time soon. The principle is sound, but the model itself isn’t directly transferable to desktop apps, which look very different from mobile ones. Others may well take on that task; the potential of AI-driven accessibility features is only beginning to be explored.
TechCrunch Editor-in-Chief Matthew Panzarino recently interviewed Apple’s Chris Fleizach (Accessibility Engineering Lead for iOS) and Sarah Herrlinger (Senior Director of Global Accessibility Policy & Initiatives); the interview can be found here.