Semantic automation is when user agents, such as browsers and screen readers, create meaning and relationships where the presented meaning and relationships are missing, ambiguous, or incorrect. In short, it’s applying algorithms to try to fix things that are probably broken. It’s computers guessing for good.
A very simple example is Google’s “Did you mean…?” functionality. It’s much of what allows the iPhone’s Siri to use loads of data to hopefully figure out what the heck you’re asking it to do.
As an example in the accessibility realm, if a form control does not have an associated label, the JAWS and VoiceOver screen readers will apply algorithms to auto-associate adjacent text with the control. In short, they guess what the label probably is. While this can improve the user experience in many cases, this semantic automation often fails. Even a line break or spanned text can break the current algorithms. Worse, an incorrect label might be read for a control if the layout is complex or different from the norm (such as when labels for checkboxes are placed to the left of the checkboxes).
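To make this concrete, here is a minimal, invented sketch of the markup involved (not taken from any screen reader’s documentation): the first control needs no guessing, the second forces the screen reader to guess from nearby text, and the checkboxes show a layout where a guesser that looks to the right picks up the wrong text.

```html
<!-- Explicit association: no guessing needed -->
<label for="email">Email address</label>
<input type="text" id="email" name="email">

<!-- No association: the screen reader must guess that "Email address" is the label -->
Email address <input type="text" name="email2">

<!-- Checkbox labels placed to the left: a guesser that looks to the right
     may announce "Unsubscribe" for the first checkbox -->
Subscribe <input type="checkbox" name="subscribe">
Unsubscribe <input type="checkbox" name="unsubscribe">
```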
When computers guess, the results are often not very good. But guessing is usually better than nothing.
Automation and Evaluation
I switched my primary evaluation platform from JAWS to VoiceOver some time ago because, until recently, VoiceOver did not implement semantic automation. It was very literal. If a text box was not properly labeled, it simply identified the presence of the text box, even if there was descriptive text next to it. With the release of iOS 5 and Lion, VoiceOver will now auto-associate the adjacent text. When the guess is correct, this will be very helpful to users, but for evaluation there’s no way to know whether label text is actually associated or whether VoiceOver or JAWS is just assuming it should be. And there’s no option to disable this functionality.
This creates a situation where screen reader evaluation and even user testing may not accurately reveal underlying accessibility issues. But it raises the question: if the user agent fixes the issue most of the time, is it really an issue at all?
Automation and Conformance
To conform to the Web Content Accessibility Guidelines 2.0, authors have to implement accessibility themselves. The guidelines don’t address or allow for semantic automation. But what if they did? Most of the impactful success criteria could be automated by user agents to some extent:
- 1.1.1 – Alternative text: Image analysis could be performed to determine the content or description of an image.
- 1.2 – Captions and transcripts: Audio recognition could be done to auto-generate a transcript and captions, similar to YouTube’s automatic captioning functionality.
- 1.3.1 – Information and relationships: Headings could be assumed based on text size, length, and location. Form labels could be auto-associated. Table headings could be assumed based on styling and table structure. Lists could be auto-generated when numbers, bullets, sequential items, etc. are used.
- 1.3.2 – Meaningful sequence: The reading and navigation order of content could be based on the visual layout, rather than the underlying markup.
- 1.4.1 – Color, 1.4.3 – Contrast: Browsers could automatically replace colors or increase contrast if they don’t meet certain thresholds.
- 1.4.5 – Images of text: Character recognition could be implemented to replace images of text with true text.
- 2.4.1 – Bypass blocks: A user agent could analyze the document and define navigable page areas based on structure and visual presentation. VoiceOver does this now with auto-webspots.
- 2.4.4 – Link purpose: Screen readers could analyze link context to turn “Click here” into meaningful, descriptive text.
- 3.1.1 and 3.1.2 – Language of Page and Parts: The computer could determine the language of content automatically, or even automatically translate it.
And there’s more. These types of semantic automation would all be very beneficial to users with disabilities, but they will never be as good as authors just doing it right (compare the explicit markup sketched below).
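For contrast, here is the kind of explicit markup (with invented content) that makes several of the guesses above unnecessary: real headings, real table headers, real lists, and a declared language, rather than visual styling the user agent would have to interpret.

```html
<html lang="en">            <!-- 3.1.1: language declared, not detected -->
<body>
  <h2>Order summary</h2>    <!-- 1.3.1: a real heading, not just large bold text -->

  <table>                   <!-- 1.3.1: real table headers, not styled cells -->
    <tr><th scope="col">Item</th><th scope="col">Price</th></tr>
    <tr><td>Widget</td><td>$5.00</td></tr>
  </table>

  <ol>                      <!-- 1.3.1: a real list, not lines starting with "1." -->
    <li>Add the item to the cart.</li>
    <li>Check out.</li>
  </ol>
</body>
</html>
```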
Defining the boundaries between what the web page author’s intentions are and what the browser can automatically do for the user is difficult. Should a screen reader automatically perform image analysis on an image that is missing alternative text? Should it do so even though this would present incorrect content much of the time? How would a user know if the screen reader is presenting true page semantics or automated semantics? How can the algorithms be improved to avoid spectacular failures in semantic automation? Etc.
Then Why Bother?
If screen readers automatically and correctly associate labels with 95% of form controls, why bother using label elements? If computers can usually determine table headers or heading structure or video transcripts, etc., then is it worth the effort to do it on my own? Of course the only way to ensure that accessibility is done right is for authors to do it right. Semantic automation will never be perfect, yet because accessibility is about the human experience, it’s the obligation of the assistive technology to provide the best experience, regardless of the page’s accessibility or lack thereof.
The question, then, is what will ultimately lead to optimal accessibility: avoiding semantic automation so that authors are more motivated and required to do it right, or implementing eternally-less-than-perfect semantic automation with the knowledge that authors might never bother to do it right? As with most things in accessibility, the answer is probably somewhere in the middle. What do you think?
Thanks for writing a blog post about this issue! I thought Apple’s motto was to “Think Different” than Microsoft or Freedom Scientific. With this move they are thinking the same as JAWS by using the same form field guessing algorithm.
I stumbled upon this problem because I need to use my Mac to demonstrate an inaccessible form at work. Of course the form sounded terrible in NVDA, which I always use as my second tool for testing accessibility; the first being the WAVE 🙂 toolbar. So I tested the inaccessible form, which is laid out using a table with inconsistent connections of labels to inputs. Half were connected, half were not; many with titles that made no sense. Then I switched to the Mac with Lion 10.7.2 installed and, lo and behold, it sounded accessible. But when I clicked on the labels, focus did not go into the input. I began to think I was crazy and started inspecting the code, and there was no for attribute connected to the input’s id attribute. So I pulled out the iPhone with iOS 5 to test, and same issue: VoiceOver correctly guessed the field labels and the terribly designed form sounded accessible.
I even slapped together a quick test case to make sure I wasn’t imagining things. http://pauljadam.com/axtest/formTableLayout.html
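Roughly, the problematic pattern looks like this (a simplified, invented sketch, not the exact test case): some inputs explicitly labelled, others relying only on adjacent cell text or a meaningless title.

```html
<table>
  <tr>
    <td><label for="first">First name</label></td>
    <td><input type="text" id="first"></td>                    <!-- explicitly associated -->
  </tr>
  <tr>
    <td>Last name</td>
    <td><input type="text" name="last" title="field2"></td>    <!-- no label; meaningless title -->
  </tr>
</table>
```

A screen reader that guesses will announce “Last name” for the second input, even though clicking that text does nothing and the code contains no real association.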
What bothers me is that there is no ability to turn this guessing feature on or off, and there is no documentation of what Apple’s guessing algorithm is. It appears to guess the same way JAWS does; I’m not sure if Freedom Scientific documents their method either. Apple did not list this as a new feature for Lion and iOS 5 either, so it came as a big surprise.
There is also no consistency between Lion and iOS. You would expect them to guess in the same manner but it appears that iOS is the better guesser of the two.
I absolutely agree that this is a great feature for blind users and will only improve their access to forms on the web, which are almost always sloppily coded. But it does guess wrong in certain situations, which you can test with VO at these two links: http://www.w3.org/WAI/demos/bad/draft/2009/before/survey/ & http://www.washington.edu/accesscomputing/AU/before.html.
The first site, from the W3C WAI, does a great job of tricking VoiceOver into guessing wrong on both Lion and iOS 5. On the second site, Lion does not guess at all, while iOS 5 guesses correctly for the text inputs but guesses wrong for all checkboxes.
Steve Faulkner tweeted that the W3C’s User Agent Accessibility Guidelines recommend that label guessing be a user preference, which I agree with. VoiceOver on iOS has a setting where the user has three choices for navigating images: Always, With Descriptions, or Never. This type of setting could be implemented for the guessing feature.
My hope is that this does not encourage sloppy forms from developers. I already have to worry about the fact that many people will simply run a form through JAWS and say it’s accessible because “it works with JAWS”. It’s going to be harder to convince developers to create accessible HTML if JAWS and VoiceOver keep creating a false impression of a11y.
One thing this guessing feature does not do is make the click target larger than the input area, which is a major issue with radio buttons and checkboxes that are very hard to activate if you cannot click the label.
Sadly I will no longer be able to reliably test for accessibility using VoiceOver on my iPhone and Mac. I always use NVDA at work where I’m forced to use Windows XP like most IT folks but I really did not want to have to install Windows on my Mac just for reliable testing with NVDA.
If this new feature is better for blind users to access the web, then it does not matter what I think and I will learn to work around it.
I’m curious to hear what other folks in the accessibility world think about this issue as well.
You can find me on twitter @pauljadam
Personally, I feel that screen-reader guessing is counter-productive to broader accessibility.
Catering to one user group or one AT will almost always exclude another.
Picture the scenario where a form uses a table for layout and the label is either before or after the control. The heuristics of a screen reader are able to (mostly) predict the association.
Someone with a mobility issue may not be able to readily achieve focus on the control, such as a checkbox or radio button.
Explicit association makes the label an active control for these elements, thus creating a larger target for the control.
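In markup terms, the difference is something like this (a simplified, made-up example): with the explicit association, the label text itself activates the control.

```html
<!-- Explicit association: clicking the text also toggles the checkbox -->
<input type="checkbox" id="news">
<label for="news">Send me the newsletter</label>

<!-- No association: only the small checkbox itself is clickable,
     even if a screen reader guesses the label correctly -->
<input type="checkbox" name="news2"> Send me the newsletter
```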
If screen readers could interpret the implicit association with 100% accuracy, saying this was “accessible” would still exclude a large number of users.
AT should respect standards and not compensate for author failures. Period.
This is the only way we can move standards forward.
If I hear “It works in JAWS 12” one more time, …
It’s fascinating. I increasingly believe that some form of automation / screen scraping is likely to be the way forward in the future; step by step, the technologies are reaching the point where content and presentation can be separated. This is a good example of one of those technologies in practice. A second technology would be auto-captions in YouTube or face tagging in Facebook. Each of these examines a form of content and then offers to represent that information in an alternative format, or adds additional information to the original, which potentially makes it more accessible.
We’ve been pushing standards for a very long time, and for the foreseeable future we need to continue to do so. But we also need to explore, in a very realistic way, whether the long-term solution to accessible content lies in technology innovation, initially alongside our existing work.
This is no great surprise; we already have a number of technologies that deliver against this model. OCR would be the classic example: without the maturity of that technology we would all still be retyping every document from scratch to create alternative formats.
The model is also important as a solution because such technologies would need to operate within the mainstream. The extraction of meaning from content in this way would potentially support greater cross-platform interoperability, would support the need for information in different formats which the user could select depending on setting or specific needs at a moment in time, and would potentially assist with translation issues.
I guess we need to ask a question: if the technology which is creating your problem became pervasive, giving that increased level of access, surely that is a good thing, not a bad one. The problem I see is that the level of sophistication is still early and that it is device/solution specific, not that it is a bad thing per se.
While I have huge sympathy for Steve’s view, there will always be author failures; amateurs creating content is the way we all envisage the web should be, and hence there will always be a very high level of author failure. Perhaps the only long-term way to address that is through the technology itself.
I think this is an important debate, and one that the accessibility community needs to consider at a more strategic level, not least because the balance of approaches would suggest the balance of investment for the future.
On twitter at @davebanesaccess
While I understand the developer’s view on this issue and internally agree with it, the user doesn’t really care where the form labels come from. On this I will agree with David–both automation technology and developers need to be educated. 🙂
Regardless, the auto-labeling feature should probably be a toggle, just as most browsers these days provide a developer menu.
Further to the points here, another issue that must be emphasised with regard to guessing is trust. As Vic points out, the user doesn’t care where form labels come from. As a user, it follows that if my screen reader correctly reports labels in 90% of cases, I am going to start to trust that it will report them correctly in all cases. So, when the guess is wrong, I’m probably going to trust it. This could be a real problem. Guessing can create a false sense of trust. Even with a configuration option, this problem still exists because most users will probably leave it enabled.
One thing we’ve considered for NVDA is a command to report a label guess for an unlabelled field. This way, the user is aware that they’re taking a risk and cannot fully trust the result, but they still have the option if they wish.
The situation here seems very similar to how browsers treat invalid markup – they try to guess what the author meant – but results can vary a lot between browsers.
HTML5 recognises that a large number of web pages use invalid markup, and has separate conformance requirements for user agents describing how to treat things like incorrectly nested tags (i.e. authors are required to produce valid markup, but user agent behaviour is standardised when they don’t).
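For example (a made-up fragment), given mis-nested inline elements, the HTML5 parsing rules rebuild the tree in a defined way, so every conforming parser recovers the same result:

```html
<!-- Invalid input: the b and i elements are incorrectly nested -->
<p><b>bold <i>both</b> italic</i></p>

<!-- HTML5 parsers recover it roughly as if the author had written -->
<p><b>bold <i>both</i></b><i> italic</i></p>
```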
This begs the question: should UAAG try to standardise semantic automation? Or would this just codify bad practice?
A successful guessing system is a self-training system, but there are two questions:
- Is there a semiotic tree of meanings and keywords included as a starting point?
- How does the system apply self-correction processes?
The microformat/microdata syntax tries to create classes that are very close to the human mind, but it has a long way to go.
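For example, microdata lets the author state the meaning explicitly instead of leaving the user agent to infer it (a small, invented fragment using the schema.org vocabulary):

```html
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Jane Example</span>,
  <span itemprop="jobTitle">Web Developer</span>,
  <a itemprop="url" href="http://example.com">example.com</a>
</div>
```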
Sometimes the “nothing” is better than the “wrong” …
As a screen reader user I’m against it unless it is 100% accurate 100% of the time.
Why do we always find ways to let developers off the hook from not doing the job properly in the first place?
cheers
Geof
Very interesting article Jared. Steve gave me a heads up about it. I also use VO for a lot of my testing and this will make me rethink some of the results from HTML5 test cases. Good work!
An interesting issue and set of posts, though not a new set of related problems (i.e., screen readers ignoring valid HTML, providing false positives, or testers/developers not checking the source for explicit control labeling versus implicit labeling or, in general, the best valid coding technique)! In perhaps ironic contrast to the VoiceOver implicit-labels example, until recently JAWS used to recognize simple data table column headers tagged with TDs as if they were appropriately tagged with THs and the scope attribute with a “col” value. I believe VoiceOver still ignores those invalid TDs and only recognizes the appropriate THs… I concur with my fellow posters above that advancing ICT accessibility will largely occur via our appropriately educating and engaging with all involved parties (e.g., developers, testers, IT/AT developers/vendors, users)…
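The contrast in table markup is roughly this (invented data): properly tagged header cells versus styled data cells that a screen reader might treat as headers anyway.

```html
<!-- Proper markup: header cells tagged as TH with scope -->
<table>
  <tr><th scope="col">Name</th><th scope="col">Phone</th></tr>
  <tr><td>Jane</td><td>555-0100</td></tr>
</table>

<!-- Improper markup: styled TDs that some screen readers used to announce as headers -->
<table>
  <tr><td><b>Name</b></td><td><b>Phone</b></td></tr>
  <tr><td>Jane</td><td>555-0100</td></tr>
</table>
```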
Thanks, Happy New Year 2012, and Live Well! - Paul