WebAIM - Web Accessibility In Mind

Image Recognition

The Washington Post had an interesting article yesterday about teaching computers to recognize images. Computers do not currently have the ability to determine whether an image of a cat is actually a cat or whether it is a dog, a human, or a telephone booth. But this new technology is teaching computers to better recognize images. It learns by having humans describe images in an online matching-type game. Millions of images have been and continue to be identified this way, so over time the system can begin to recognize elements of images and better determine what an image contains. Google is also getting on board with such technologies.

Because everyday computers cannot currently do this type of processing, developers instead provide alternative text for all non-text elements. These upcoming technologies could describe images without the need for developers to provide the alternative in text form. However, the computer will likely never be able to determine the content an image is meant to convey. Yes, it might be able to determine if a picture of a cat is a cat, but maybe ‘cat’ is not what the page author is trying to convey. I can imagine lots of descriptions of “blue right arrow”, when “next” is the real content.

As I’ve noted before, alternative text is about the content that is being conveyed, and it should rarely be a description of the image. Unfortunately, the web is full of images with alternative text that describes the image rather than conveying its content. While this exciting technology may provide great advancements for images that do not have alternative text defined, I hope it does not somehow become an excuse for developers not to provide equivalent alternative text for all images.
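
To make the distinction concrete, here is a minimal sketch in Python that uses the standard library's `html.parser` to list which images in a page have alternative text and which do not. The markup, the file names, and the names `AltAudit` and `audit` are hypothetical, made up for illustration. Note what the sketch cannot do: it can find a missing `alt` attribute, but it has no way to tell whether “Next” or “blue right arrow” is the right text for the arrow image. That judgment still belongs to the author.

```python
from html.parser import HTMLParser

# Hypothetical sample markup: a linked arrow whose alt text conveys its
# function, a decorative image with empty alt text, and an image with no
# alt attribute at all.
SAMPLE = """
<a href="page2.html"><img src="arrow.png" alt="Next"></a>
<img src="divider.png" alt="">
<img src="cat.jpg">
"""


class AltAudit(HTMLParser):
    """Sort <img> tags by whether they carry an alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []    # src of images with no alt attribute
        self.described = []  # (src, alt) pairs, including empty alt

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src", "")
        if "alt" not in attributes:
            self.missing.append(src)
        else:
            # A bare `alt` with no value is reported as None; treat it as empty.
            self.described.append((src, attributes["alt"] or ""))


def audit(html):
    """Parse the given HTML string and return the populated AltAudit."""
    parser = AltAudit()
    parser.feed(html)
    parser.close()
    return parser


if __name__ == "__main__":
    result = audit(SAMPLE)
    print("Missing alt:", result.missing)
    print("Has alt:", result.described)
```

An image recognition system might one day fill in something for `cat.jpg`, but for `arrow.png` the useful text depends on what the link does, not on what the picture shows.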


  1. dave

    An excuse? No, nothing excuses creating exclusionary content. However, I believe the hope is that in time, a computer could “read” an image as easily as it reads text. If the image is a graphic of words, the computer would recognize it as such and read the words. If there is a picture of a blue arrow, perhaps the creator of the page chose the image so that it would have a touch of ambiguity? Should we assume that someone with a vision impairment can’t learn that a right-pointing arrow in this context means “next”?

    Not to imply that there isn’t a bunch of garbage masquerading as alternative text. There happen to be bunches of pixels masquerading as effective visual communication, too.

  2. Jared Smith

    As I wrote, I see great potential in this technology. The ability to read text from images would alone be very valuable. But even humans can’t seem to get alternative text right much of the time – I just would not want to see technology like this interpreted as a solution for the accessibility of all images. I’m not even sure how such technology could distinguish between decorative and non-decorative images.

  3. dave

    I really agree that we often don’t get alternative text “right”.

    I also think that an image is often chosen based on nuance. To provide alternative text as you describe would be to offer one interpretation of the nuance. Possibly, just as I see a blue right-pointing arrow and understand “next”, users with visual impairments can make that same leap? Should the writer of alternative text be responsible for interpreting an image that could be perfectly interpreted by any thinking adult who hears it described? Isn’t it a bit handicapist to assume an interpretation is needed?