This is a summary of our publication The Effect of Computer-Generated Descriptions on Photo-Sharing Experiences of People with Visual Impairments, being presented at CSCW 2018. The lead author, Yuhang Zhao, worked on this project during a research internship with the Core Data Science team at Facebook, in collaboration with Facebook researchers Shaomei Wu and Lindsay Reynolds.
– – –
“A picture is worth a thousand words.” Visual content plays an increasingly important role in our daily communications online, but consuming and creating it poses potential challenges for people with visual impairments. There are more than 39 million people in the world who are blind, and over 246 million people have a severe visual impairment. We want to build technology that helps this community experience Facebook and express themselves the same way that sighted people do.
Previously, we talked to people who use Facebook and are visually impaired to better understand the challenges they face in social interactions around visual content on Facebook. These conversations helped shape the creation of automatic alt-text, an AI-powered feature that automatically describes the content of photos on Facebook in real time across 29 languages.
With automatic alt-text, people with visual impairments can access information that was previously unavailable to them when they encounter images posted by others on Facebook. But people with visual impairments do more than consume content shared by others: they have a strong presence on social networks, where actively curating and sharing visual content raises the stakes considerably. Just like most people on Facebook, they take photos to capture daily moments, and they want to share those photos with their friends and family [1,2].
However, people with visual impairments often find it difficult to understand the contents of photos and select quality photos to share. If a person who is visually impaired takes a photo and does not upload it immediately, it is hard for them to navigate through the album and find that photo independently, especially when many photos have accumulated in their album over time. Moreover, it is difficult for them to judge the quality of a photo—for example, whether the photo is blurry, whether a person has their eyes closed in the photo, or whether the photo is aesthetically pleasing.
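To make the blur problem above concrete: one common heuristic for flagging blurry photos (a simple illustrative sketch, not the model described in our paper) is the variance of the image Laplacian. Sharp images have strong edges, so the Laplacian response varies widely; blurry or featureless images do not. The threshold below is a made-up example value.

```python
def laplacian_variance(pixels):
    """Variance of a 4-neighbour discrete Laplacian over a 2-D grayscale image,
    given as a list of rows of pixel intensities (0-255)."""
    h, w = len(pixels), len(pixels[0])
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Discrete Laplacian: sum of the four neighbours minus 4x the centre.
            lap = (pixels[y - 1][x] + pixels[y + 1][x] +
                   pixels[y][x - 1] + pixels[y][x + 1] - 4 * pixels[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)

def looks_blurry(pixels, threshold=100.0):
    # Threshold is illustrative only; a real system would tune it on data.
    return laplacian_variance(pixels) < threshold

# Tiny synthetic example: a sharp checkerboard vs. a flat, featureless patch.
sharp = [[255 if (x + y) % 2 == 0 else 0 for x in range(8)] for y in range(8)]
flat = [[128 for _ in range(8)] for _ in range(8)]
print(looks_blurry(sharp), looks_blurry(flat))  # prints: False True
```

A screen-reader-friendly app could run a check like this locally and surface "this photo may be blurry" alongside the generated description, which is the kind of quality signal our participants asked for.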
Most existing technology solutions to this problem rely on humans (e.g., friends, family, crowd workers) to provide photo descriptions or answer photo-based questions. However, these solutions are hard to scale and sustain: they demand time and social capital from sighted reviewers, and they raise privacy concerns for people with visual impairments, who may end up sharing private photos without knowing what the photos contain.
To make it easier for people with visual impairments to take and share photos, we designed and experimented with AI-generated photo descriptions to help them select and upload photos.
We interviewed 12 people who use Facebook and are visually impaired. We listened to their stories about photo sharing on Facebook, especially their challenges, current practices, and specific photo-description preferences. Based on these interviews, we designed a new feature for Facebook’s Android app, adding computer-generated descriptions to the photos in local albums to help people with visual impairments select photos to post on Facebook. The image below illustrates how this feature works.
Fig. 2. The workflow of the photo-sharing feature: (a) The entry point to the photo-sharing feature in Facebook; (b) A user can get a photo’s description by setting the focus on a photo; (c) After a user selects a photo to upload, she can still get the photo description by tapping that photo; (d) When finished selecting photos, the user double-taps the “Done” button to post them.
We then recruited 6 people who are visually impaired to test the feature for a week. The feedback showed that computer-generated descriptions were useful in terms of helping people recall memories and organize local photos. Participants especially appreciated that the AI could tell them which photos were blurry, who was in the photos, and whether or not people were smiling.
However, since AI-generated descriptions are still limited in accuracy and richness, participants could not rely on the technology alone to pick the best image without human input. For example, one participant took several photos of her dog, and the AI’s description “the photo contains dog” was not enough; she needed more details to decide which photos to share, such as “I want to see which direction the dog is looking, what they’re doing, is their tongue out or something.”
The accuracy of our computer vision models also degraded when processing photos taken by people with visual impairments, as those photos are more likely to have issues with blurriness, lighting, or cropping. Adapting the current model, which was trained using photos taken by sighted people, to this application domain is a new challenge that we think is ripe for exploration by the broader research community.
Overall, our research found that people with visual impairments enjoyed using the AI-generated descriptions to understand photo content, recall memories, and delete low-quality photos (e.g., photos with blurry faces). This highlighted the feature’s potential to make photo sharing more efficient and reduce reliance on sighted assistance. Encouraged by these findings, we launched automatic alt-text for photo uploads in 2016, and we will continue expanding its descriptiveness and richness in the future.
We believe that applying AI technologies will improve accessibility and inclusivity. We are excited to share this research as another small step towards empowering everyone to participate and share their voices on Facebook.