AI-Generated Image Captions: Facebook & Instagram Updates

Facebook has significantly upgraded the AI that analyzes photos uploaded to Facebook and Instagram and generates accompanying text. The improvement promises real benefits for users with visual impairments and could streamline photo search down the line.
Alternative text, or alt text, is a description that accompanies an image in a page's markup or metadata and explains its content—for example, "A person standing in a field with a horse," or "a dog on a boat." This allows people who cannot see the image to understand what it depicts.
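On the web, alt text rides along in the image element itself rather than in the pixels, which is what lets a screen reader announce it. A minimal sketch of that idea (the filename and caption below are illustrative, not Facebook's actual markup or output):

```python
from html import escape


def img_with_alt(src: str, alt: str) -> str:
    """Return an <img> tag whose alt attribute a screen reader will announce."""
    # escape() guards against quotes or angle brackets in the description.
    return f'<img src="{escape(src, quote=True)}" alt="{escape(alt, quote=True)}">'


tag = img_with_alt("beach.jpg", "a dog on a boat")
print(tag)  # <img src="beach.jpg" alt="a dog on a boat">
```

When the `alt` attribute is empty or missing, assistive technology has nothing to read out—which is exactly the gap automatic captioning aims to fill.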
While these descriptions are often added manually by professionals, people sharing photos on social media rarely add them, or may not even have the option to do so. Automatically generating alt text—a capability that has reached sufficient accuracy only in the last few years—has proven invaluable in making social media more accessible overall.
Facebook first launched its Automatic Alt Text system in 2016—an eternity in the fast-moving field of machine learning. Since then, the team has steadily refined the system, improving both its speed and the level of detail it provides. The latest update adds the ability to generate a more comprehensive description on request.
The upgraded system can now identify ten times more objects and concepts than it could at its inception, currently recognizing approximately 1,200 different elements. Furthermore, the generated descriptions now include a greater degree of specificity. What was previously described as “Two people by a building” might now be articulated as “A selfie of two people by the Eiffel Tower.” (It’s important to note that the actual descriptions use qualifying language like “may be…” and avoid making unsubstantiated assumptions.)
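The hedging could be as simple as prefixing recognized concepts with a qualifier rather than asserting them as fact. A hypothetical sketch of that idea (the function, wording, and concept names are invented for illustration, not Facebook's actual pipeline):

```python
def hedged_caption(concepts: list[str]) -> str:
    """Join recognized concepts into a caption that avoids asserting certainty."""
    if not concepts:
        return "No description available."
    # Lead with "May be" so the description never overstates what the model knows.
    return "May be an image of " + ", ".join(concepts) + "."


print(hedged_caption(["two people", "the Eiffel Tower"]))
# May be an image of two people, the Eiffel Tower.
```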
But the level of detail extends beyond that, even if it isn’t always essential. For example, the AI can identify the relative positioning of people and objects within an image:
Image Credits: Facebook

It is evident that the people are positioned above the drums, and the hats are above the people, details that may seem unnecessary for a general understanding of the image. However, consider an image described as “A house and some trees and a mountain.” Is the house located on the mountain, or in front of it? Are the trees positioned before or behind the house, or perhaps on the mountain in the distance?
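One simple way a system could derive such relations is by comparing the bounding boxes of detected objects. The sketch below is a hypothetical illustration—the box format, coordinates, and threshold rule are assumptions, not Facebook's actual method:

```python
# Boxes are (x0, y0, x1, y1) in image coordinates, with y increasing downward.
def is_above(box_a: tuple, box_b: tuple) -> bool:
    """True if box_a sits entirely above box_b (a's bottom edge above b's top)."""
    return box_a[3] <= box_b[1]


# Illustrative detections from the drummer photo described above.
hat = (40, 10, 80, 30)
person = (30, 35, 90, 120)
drum = (20, 130, 100, 180)

if is_above(hat, person) and is_above(person, drum):
    print("hats above people, people above drums")
```

Real systems would need to handle overlapping boxes and pick which relations are worth stating, but the underlying comparison can be this straightforward.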
To provide a complete and accurate description, these details are crucial, even if the core idea can be conveyed with fewer words. A sighted user can examine the image more closely or enlarge it for further inspection—this new “generate detailed image description” feature provides a similar option for those unable to do so. (This feature is activated with a long press within the Android application or through a custom action in iOS.)
A more descriptive rendering might be something like “A house and some trees in front of a mountain with snow on it.” That paints a clearer and more complete mental image, doesn’t it? (These examples are illustrative, but they represent the type of improvement to expect.)
The new detailed description functionality will initially be available on Facebook for testing purposes, while the expanded vocabulary will soon be implemented on Instagram. The descriptions are also designed to be concise to facilitate easy translation into the various languages already supported by the applications, although the feature’s rollout across different countries may not be simultaneous.