Web and Software:Accessibility API Proposal
Information needed in a Desktop Accessibility API
Author: Peter Korn, Sun Microsystems, Inc.
Andrew Kirkpatrick, Adobe Systems
Reviewers: Luke Kowalski, Oracle Corporation
Michele Budris, Earl Johnson, Sun Microsystems, Inc.
Status: Draft, 12 December 2006
Summary
This is a proposal to more fully specify the programmatic exposure of user interface element information for accessibility: a specification of the contents of an accessibility API (Application Programming Interface, a "software contract" for software applications). The ISO accessibility standard language addresses interoperability with assistive technology (AT) by describing generally the user interface element information that applications should provide. This enumeration is insufficient to provide AT with all that it needs, and is also not specific enough to be testable.
The ISO language in question (see Discussion of 1194.21(d) and 1194.21(f)) is:
User interface element information includes, but is not limited to: general states (such as existence, selection, focus, and position), attributes (such as size, colour, role and name), values (such as the text in a static or editable text field), states specific to particular classes of user interface elements (such as On/Off, depressed/released), and relationships between user interface elements (such as when one user interface element contains, names, describes, or affects another).
The proposal below attempts to address these goals of harmonization, AT interoperability, and testability by specifying more precisely the minimum information applications must provide AT. The proposal is a further specification of the ISO enumeration of user interface element information needed for accessibility. Nothing in this proposal runs counter to the ISO language. An application that exposed all of the information described below would meet the existing ISO provision.
This proposal is informed by the author's experiences with the Java platform accessibility API, the UNIX accessibility framework, and the WAI ARIA Roadmap. To the author's knowledge, everything specified below is already covered by the accessibility APIs in the Java platform, UNIX, WAI ARIA, Mac OS X NSAccessibility, or Microsoft UI Automation.
In addition to enumerating in detail the user interface element information required for accessibility, this proposal contains several example use cases, illustrating how an accessibility API (the platform-specific protocol for providing user interface element information) is used in conjunction with assistive technologies to provide rich access that is equivalent to or better than what users enjoy today (two of at least four examples are fully fleshed out).
Finally, this proposal ends with a small collection of questions – whether certain things are or are not necessary in all accessibility APIs (whether they should be part of the minimum specification of user interface element information needed for accessibility).
Accessibility API information contents
Minimum 'static' information for all user interface elements shown on the screen:
- Role of the object in the user interface (e.g. 'checkbox', 'radio button', 'menu item')
- Current state(s) of the object (e.g. 'checked', 'focused')
- “Parent” and “children” information – what user interface element contains this one (parent), and what user interface elements are contained within this one (children)
- Bounding rectangle
- Name of the object (note: not all objects necessarily will have a name, especially if that duplicates text provided elsewhere in the API; but all objects must be able to answer the question “what is your name?”)
- Description of the object (note: not all objects necessarily will have a description, especially if that duplicates text provided elsewhere in the API; but all objects must be able to answer the question “what is your description?”)
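As a rough illustration of the minimum 'static' information listed above, the following Python sketch models one accessible object. The class name `Accessible`, its field names, and the `add_child` helper are hypothetical and are not taken from any existing accessibility API; the dialog and button are sample data.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

# Hypothetical names throughout; not taken from any real accessibility API.
Rect = Tuple[int, int, int, int]  # (x, y, width, height) in screen coordinates


@dataclass(eq=False)
class Accessible:
    role: str                      # e.g. "checkbox", "radio button", "menu item"
    bounds: Rect                   # bounding rectangle on the screen
    name: str = ""                 # may be empty, but the question is always answerable
    description: str = ""          # likewise may be empty
    states: Set[str] = field(default_factory=set)  # e.g. {"checked", "focused"}
    parent: Optional["Accessible"] = field(default=None, repr=False)
    children: List["Accessible"] = field(default_factory=list)

    def add_child(self, child: "Accessible") -> "Accessible":
        """Attach a child and record this object as its parent."""
        child.parent = self
        self.children.append(child)
        return child


# Sample data: a dialog containing one focused button.
dialog = Accessible(role="dialog", bounds=(100, 100, 400, 300), name="Save As")
ok = dialog.add_child(
    Accessible(role="button", bounds=(380, 350, 80, 30), name="OK", states={"focused"})
)
```

Note that `description` defaults to an empty string rather than being absent: an object without a description can still answer the question.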
Additional object information needed for objects that contain text:
- The complete text contents
Additional object information needed for editable text objects:
- The location of the text insertion caret within the text (caret index/offset)
- The contents & location of any text selection
- The bounding rectangle of each character of text
- The text attributes for each character of text (e.g.: bold, italic, underline, font name, font size, font/text color)
- General styling information (margins, indent level, etc.). These should be consistent with W3C CSS properties, and should be specified rather than limited to bold, etc. Since the group is looking to harmonize web and GUI, this is applicable.
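The text information above could be sketched as follows. `AccessibleText`, its fields, and the `attribute_runs` helper are hypothetical names chosen for illustration, and the character rectangles are made-up sample values. The sample text is the half-typed "OAK" auto-completion from the screen reader example later in this document.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)


@dataclass
class AccessibleText:
    """Hypothetical text interface for an editable text object."""
    text: str
    caret_offset: int = 0
    selection: Optional[Tuple[int, int]] = None  # (start, end) offsets, end exclusive
    char_bounds: List[Rect] = field(default_factory=list)
    char_attributes: List[Dict[str, object]] = field(default_factory=list)

    def selected_text(self) -> str:
        """Return the selected text, or an empty string if nothing is selected."""
        if self.selection is None:
            return ""
        start, end = self.selection
        return self.text[start:end]

    def attribute_runs(self, key: str) -> List[Tuple[int, int, object]]:
        """Group consecutive characters that share the same value for `key`."""
        runs: List[Tuple[int, int, object]] = []
        for offset, attrs in enumerate(self.char_attributes):
            value = attrs.get(key)
            if runs and runs[-1][2] == value:
                runs[-1] = (runs[-1][0], offset + 1, value)
            else:
                runs.append((offset, offset + 1, value))
        return runs


# Sample data: "O" typed by the user, "AK" offered in grey, caret after the "O".
departure = AccessibleText(
    text="OAK",
    caret_offset=1,
    char_bounds=[(10 + 8 * i, 20, 8, 14) for i in range(3)],
    char_attributes=[{"color": "black"}, {"color": "grey"}, {"color": "grey"}],
)
runs = departure.attribute_runs("color")
```

An AT could use runs like these to decide which characters to mark as differently styled, without querying the attributes of each character separately.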
Additional object information needed for objects within a table:
- The row & column of the object
- The row & column headers (if any) for the row/column of the object
- Cell span information
- A mechanism for determining that a cell is active (the one where user input is directed).
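A minimal sketch of the table information above, assuming a simple grid with one header per row and per column. The names `TableCell`, `AccessibleTable`, and `headers_for` are hypothetical, and the table contents are illustrative sample data.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TableCell:
    text: str
    row: int
    column: int
    row_span: int = 1      # number of rows the cell covers
    column_span: int = 1   # number of columns the cell covers


class AccessibleTable:
    """Hypothetical table interface; assumes one header per row and per column."""

    def __init__(self, column_headers: List[str], row_headers: List[str]) -> None:
        self.column_headers = column_headers
        self.row_headers = row_headers
        self.cells: List[TableCell] = []
        self.active_cell: Optional[TableCell] = None  # where user input is directed

    def add(self, cell: TableCell) -> TableCell:
        self.cells.append(cell)
        return cell

    def headers_for(self, cell: TableCell) -> Tuple[List[str], List[str]]:
        """Return (row headers, column headers) for every row/column the cell spans."""
        rows = self.row_headers[cell.row:cell.row + cell.row_span]
        cols = self.column_headers[cell.column:cell.column + cell.column_span]
        return rows, cols


# Sample data: two flight options; the second row has a cell spanning two columns.
table = AccessibleTable(["Departs", "Arrives", "Flight"], ["Option 1", "Option 2"])
departs = table.add(TableCell("9:45am", row=0, column=0))
note = table.add(TableCell("Times not yet confirmed", row=1, column=0, column_span=2))
table.active_cell = departs
```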
Additional object information needed for objects a user can change the state of, or interact with (besides editable text):
- The named actions one can take on an object (e.g. checking/unchecking a checkbox)
- Programmatically taking one of those actions
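The two requirements above (listing the named actions, and taking one programmatically) might look like this in outline. `AccessibleAction`, `register`, `action_names`, `do_action`, and the action name "toggle" are all hypothetical.

```python
from typing import Callable, Dict, List


class AccessibleAction:
    """Hypothetical action interface: list the named actions and invoke one by name."""

    def __init__(self) -> None:
        self._actions: Dict[str, Callable[[], None]] = {}

    def register(self, name: str, callback: Callable[[], None]) -> None:
        self._actions[name] = callback

    def action_names(self) -> List[str]:
        return list(self._actions)

    def do_action(self, name: str) -> bool:
        """Run the named action; return False if the object does not offer it."""
        callback = self._actions.get(name)
        if callback is None:
            return False
        callback()
        return True


class Checkbox(AccessibleAction):
    def __init__(self, name: str) -> None:
        super().__init__()
        self.name = name
        self.checked = False
        self.register("toggle", self._toggle)

    def _toggle(self) -> None:
        self.checked = not self.checked


# An AT first asks which actions exist, then takes one without simulating the mouse.
box = Checkbox("Remember me")
performed = box.do_action("toggle")
```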
Additional object information needed for objects that present one of a range of values (e.g. a slider or scroll bar):
- The minimum value
- The maximum value
- The current value
- Programmatically setting a new value
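A sketch of the value information above for a slider-like object. The name `AccessibleValue` and its methods are hypothetical. Rejecting out-of-range values (rather than clamping them) is an assumption of this sketch; the proposal does not specify that behavior.

```python
from dataclasses import dataclass


@dataclass
class AccessibleValue:
    """Hypothetical value interface for a slider, scroll bar, or similar object."""
    minimum: float
    maximum: float
    current: float

    def set_value(self, new_value: float) -> bool:
        """Set a new value. Assumption: out-of-range requests are rejected, not clamped."""
        if not (self.minimum <= new_value <= self.maximum):
            return False
        self.current = new_value
        return True


# Sample data: a volume slider that an AT moves programmatically.
volume = AccessibleValue(minimum=0, maximum=100, current=40)
accepted = volume.set_value(75)
rejected = volume.set_value(150)
```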
Object relationship information needed:
- The relationship between labels and the user interface element they are labeling (e.g. in a form)
General system information needed:
- What windows are on the screen?
- What is the top-most window?
- What object has focus (duplicates the FOCUS state information)?
- What is the accessible at this (x,y) coordinate?
- Where is the mouse, and what is its shape?
- Is the object described by another object?
- Does the object control or flow into another object?
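One of the system-level questions above, finding the accessible at an (x,y) coordinate, can be sketched as a walk down the parent/children hierarchy using bounding rectangles. `Node`, `contains`, and `accessible_at` are hypothetical names, and the assumption that later children are drawn on top of earlier ones belongs to this sketch, not to the proposal.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)


@dataclass
class Node:
    role: str
    name: str
    bounds: Rect
    children: List["Node"] = field(default_factory=list)


def contains(bounds: Rect, x: int, y: int) -> bool:
    bx, by, bw, bh = bounds
    return bx <= x < bx + bw and by <= y < by + bh


def accessible_at(root: Node, x: int, y: int) -> Optional[Node]:
    """Return the deepest object whose bounding rectangle contains (x, y)."""
    if not contains(root.bounds, x, y):
        return None
    # Assumption: later children are drawn on top, so search them first.
    for child in reversed(root.children):
        hit = accessible_at(child, x, y)
        if hit is not None:
            return hit
    return root


# Sample data: a window with a text field and a button.
window = Node("window", "Save As", (0, 0, 400, 300), [
    Node("text", "Filename:", (100, 50, 200, 24)),
    Node("button", "OK", (300, 250, 80, 30)),
])
under_mouse = accessible_at(window, 310, 260)
```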
Dynamic (event) information needed:
- State changes (e.g. “checked” to “unchecked”, but also “focused” to “unfocused”, “active” to “inactive”)
- Text caret movement
- Text insertion
- Text selection change
- Value changes (e.g. a slider moving up/down and changing value)
- Top-level windows appearing/disappearing/moving
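The dynamic information above implies some way for an AT to register interest in events and be notified when they occur. The following is a minimal sketch of that pattern; `AccessibleEvent`, `EventBus`, and the event kind strings are hypothetical and do not correspond to any particular platform's event names.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List


@dataclass
class AccessibleEvent:
    kind: str     # e.g. "state-changed", "caret-moved", "text-inserted"
    source: str   # name of the object that fired the event
    detail: dict  # event-specific data


Listener = Callable[[AccessibleEvent], None]


class EventBus:
    """Hypothetical event channel between applications and an AT."""

    def __init__(self) -> None:
        self._listeners: DefaultDict[str, List[Listener]] = defaultdict(list)

    def subscribe(self, kind: str, listener: Listener) -> None:
        self._listeners[kind].append(listener)

    def fire(self, event: AccessibleEvent) -> int:
        """Deliver the event to interested listeners; return how many received it."""
        listeners = self._listeners.get(event.kind, [])
        for listener in listeners:
            listener(event)
        return len(listeners)


# The AT records what it would speak; the application side fires the events.
heard: List[str] = []
bus = EventBus()
bus.subscribe("state-changed", lambda e: heard.append(
    f"{e.source}: {e.detail['state']} is now {e.detail['value']}"))
bus.subscribe("text-inserted", lambda e: heard.append(
    f"{e.source}: inserted {e.detail['text']!r} at {e.detail['offset']}"))

delivered = bus.fire(AccessibleEvent(
    "state-changed", "Remember me", {"state": "checked", "value": False}))
bus.fire(AccessibleEvent("text-inserted", "Filename field", {"text": "a", "offset": 0}))
ignored = bus.fire(AccessibleEvent("window-moved", "Save As", {}))
```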
Note: there are things in the Java/UNIX accessibility API, the Microsoft UI Automation specification, and the Apple Accessibility API, which aren't listed above. This is intentional. This proposal attempts to specify the minimum information needed to support general desktop accessibility and the existing known assistive technology use cases. By being “as small as possible”, it attempts to preserve the maximum amount of accessibility innovation in the future. Of course, this proposal should be taken in the context of the existing 508 “equivalent facilitation” language – if an application is able to provide accessibility facilitation with assistive technologies equivalent to (or better than) what can be done via an accessibility API that meets these criteria, it shall be perfectly acceptable under 508 to do so.
Examples of this API in use
Screen Magnifier with a dialog box
The user is interacting with the Save-As dialog box of a word processor. The dialog box has just popped up onto the screen.
- The screen magnifier (the AT) is continuously receiving events noting the mouse location and caret, and panning the magnified view to track the mouse position
- The AT receives an event noting that a new window (the dialog box) appeared, and further that the “OK” button is focused. It makes accessibility API calls to obtain the bounding rectangle of the “OK” button and moves the magnified view to encompass it. It also speaks the text “OK” (which it likewise retrieved via the accessibility API).
- As the user TABs through the dialog box, the AT likewise receives those focus events, makes accessibility API calls to obtain information about the user interface objects that now have focus, and where necessary pans the magnified view to encompass them. It also speaks their names.
- When the user TABs to the edittext field bearing the label “Filename:”, the AT uses the accessibility API to discover that the edittext field is in a labeled-by relationship with another user interface element – the static text label “Filename:”. It obtains information about both objects, and pans the magnified view to encompass both the label and the start of the blank edittext field, and it further speaks the text “Filename:”.
- As the user types their filename, the AT obtains the text that appears in the edittext field not from snooping the keyboard, but from the text-insertion events coming from the edittext field. It likewise receives caret events and pans the magnified view as necessary to ensure the caret always remains within the field of view.
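The focus handling in the steps above can be sketched as follows: on a focus event the magnifier looks up the labeled-by relationship, combines the bounding rectangles of the field and its label, and pans its view the minimum distance needed to show them. All names (`union`, `pan_to_show`, `on_focus`, the `labelled_by` key) and all coordinates are hypothetical. If the target is larger than the view, this sketch aligns the view with the target's top-left corner, which keeps the label and the start of the field visible.

```python
from typing import Optional, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)


def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle that contains both a and b."""
    x1, y1 = min(a[0], b[0]), min(a[1], b[1])
    x2 = max(a[0] + a[2], b[0] + b[2])
    y2 = max(a[1] + a[3], b[1] + b[3])
    return (x1, y1, x2 - x1, y2 - y1)


def pan_to_show(view: Rect, target: Rect) -> Rect:
    """Move the view the minimum distance so that the target is visible."""
    def shift(v: int, vsize: int, t: int, tsize: int) -> int:
        if t < v or tsize > vsize:
            return t                  # align with the start of the target
        if t + tsize > v + vsize:
            return t + tsize - vsize  # pan just far enough to show its end
        return v                      # already visible on this axis

    vx, vy, vw, vh = view
    tx, ty, tw, th = target
    return (shift(vx, vw, tx, tw), shift(vy, vh, ty, th), vw, vh)


def on_focus(view: Rect, focused: dict) -> Tuple[Rect, str]:
    """Return the new view and the text to speak for a newly focused object."""
    target: Rect = focused["bounds"]
    spoken: str = focused["name"]
    label: Optional[dict] = focused.get("labelled_by")
    if label is not None:
        target = union(target, label["bounds"])
        spoken = label["name"]
    return pan_to_show(view, target), spoken


# Sample data: a blank edit field labeled by the static text "Filename:".
label = {"role": "label", "name": "Filename:", "bounds": (40, 200, 70, 20)}
edit = {"role": "text", "name": "", "bounds": (120, 200, 200, 20), "labelled_by": label}
new_view, spoken = on_focus((300, 0, 320, 240), edit)
```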
Screen Reader with an AJAX-based Web application
The user is interacting with a website, making an airline reservation using a rich web application. The web application follows the WAI ARIA specification, and while it uses asynchronous Javascript, downloaded images, and other techniques to render form controls, popup menus, etc., it also uses appropriately marked up XHTML to indicate which graphical elements are buttons, text fields, popup menus, etc. and fires events as per the WAI ARIA specification to indicate changes in those user interface elements.
- The screen reader (AT) is continuously receiving events noting what object has the focus, and speaking and sending to the attached refreshable Braille display the appropriate information for each object.
- The user TABs into the web application, and onto a button titled “Make a new reservation”. The web application generates a DOM event indicating the focus change onto the “Make a new reservation” button, which the browser exposes via the platform accessibility API as an accessible user interface element of role “button”, and further fires a focus event indicating that that button now has the focus. The AT receives the focus event, and uses the accessibility API to obtain information about the button – specifically the button's text “Button: Make a new reservation”, which the AT speaks to the user. The AT makes further accessibility API calls to the browser, and obtains the text of the objects to either side of that button, and renders the entire line of information to the refreshable Braille display, using dot 8 on the display to “underline” the text “Make a new reservation”, which is how this particular AT has been configured to indicate focus information to the user in Braille.
- After pressing <ENTER> to activate the “Make a new reservation” button, the user decides to explore the new web user interface presented by the airline reservation web app. The page updates, with a form appearing seeking travel destinations, travel dates, etc. Focus is on a text field whose label is “Departure City”. The web app fires a DOM event indicating focus, which the browser echoes as a focus event in the platform accessibility API. The AT receives this focus event, and makes accessibility API calls to get more information about the editable text field. Among other things, the AT discovers that the text field is in a 'labeling relationship' with another user interface element – specifically it is labeled by a static text field that contains the text “Departure City”. The AT speaks the text “Text field: Departure City”. The AT makes further accessibility API calls to obtain the text of the objects to either side of the text field to format them all on the Braille line, and finds that there aren't any. The AT then renders in Braille the text label “Departure City” and uses a slowly flashing dot 7 & 8 to indicate the location of the text input caret.
- The user starts typing the name of their departure city airport code. Departing from Oakland, California, they type the letter 'o'. The Web application offers auto-completion of this text field. The first airport code this airline flies to that starts with O is OAK, for Oakland. The web app displays the letter 'O' in the text field in normal black text, followed by the caret, followed by the letters 'AK' in a light grey font. The Web application sends DOM events indicating the addition of the letters O, A, and K to the text field, which the browser in turn sends on as accessibility API events for the newly created text (and the new caret position within that text). The AT receives these events, and makes further accessibility API calls to determine the character attributes of the three letters O, A, and K. Since the AT is configured to echo keystrokes, the AT first speaks the letter “O”, and then further speaks the new text that appeared, “OAK”. In Braille, the AT appends “OAK” to the existing Braille line, and because it was configured to do so with text of any kind of different attribute than plain text, it places a dot 7 underneath the A and K. It further indicates the caret location by flashing dots 7 & 8 on the letter A.
- The user presses <ENTER> to accept this as the departure city, and goes on to fill in the destination city and departure date (which we won't describe here for brevity's sake). The user TABs to the “Find flights” button, and presses <ENTER> to activate it, bringing up a table filled with options.
- The user invokes the flat review feature of the AT, and starts navigating downward through the table to review the options. The AT makes a succession of accessibility API calls, first traversing upward in the parent/child hierarchy from the first cell of the table (which has the focus), and then downward through all of the children, building an in-memory cache of the objects in this portion of the web page. Within that cache, it then constructs a left-to-right, top-to-bottom ordering of them and uses that ordering for the user's flat review path. As the user issues flat review navigation commands, the AT speaks the appropriate letter/word/line being reviewed. In parallel, the AT updates the Braille display to show a line at a time of flat review, using dot 7 & 8 to indicate where in Braille the flat review is occurring (which letter/word).
- The user finds the desired flight option, and uses the touch-cursor on the Braille display that is above one of the characters contained within the radio button for the desired flight. The AT knows (from its cache) that the object whose text is being rendered at that Braille cell location is of role “radio button”, and further that it is an object that can be manipulated via the action portion of the accessibility API. The AT discovers that there is only the “select” action available, and since there is only one, it programmatically activates that action. This causes the browser to convey to the XHTML object that the “selection” action has been taken, and the web application updates itself, just as if a user with a mouse had clicked on that radio button. This in turn causes a state change in the radio button (to the “selected” state), which fires an XHTML event to the browser, which in turn fires the state change event to the AT. The AT makes accessibility API calls to determine the text of the object whose state has changed, finds that the text is “9:45am flight #324”, and speaks “Selected, 9:45am flight #324”. As this text is already what is shown on the Braille display, the display isn't updated.
- Finally, the user TABs to the “purchase ticket” button, and finishes the transaction.
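The flat review step in this example (walking the parent/child hierarchy, caching the objects, and ordering them left-to-right, top-to-bottom) might be sketched as below. The function names and the dictionary layout are hypothetical, the second flight and all coordinates are made-up sample data, and the sketch assumes that objects on the same visual line share the same y coordinate; a real AT would need some tolerance when grouping objects into lines.

```python
from typing import Dict, Iterator, List


def leaves(node: Dict) -> Iterator[Dict]:
    """Yield every object in the subtree that has no children."""
    children = node.get("children", [])
    if not children:
        yield node
        return
    for child in children:
        yield from leaves(child)


def flat_review_order(root: Dict) -> List[Dict]:
    """Order leaf objects top-to-bottom, then left-to-right, by bounding rectangle."""
    # bounds are (x, y, width, height); sort on y first, then x.
    return sorted(leaves(root), key=lambda n: (n["bounds"][1], n["bounds"][0]))


# Sample data: the tree order deliberately differs from the visual order.
table = {
    "role": "table", "name": "Flights", "bounds": (100, 100, 300, 60),
    "children": [
        {"role": "row", "name": "", "bounds": (100, 130, 300, 30), "children": [
            {"role": "cell", "name": "flight #511", "bounds": (200, 130, 200, 30)},
            {"role": "cell", "name": "11:30am", "bounds": (100, 130, 100, 30)},
        ]},
        {"role": "row", "name": "", "bounds": (100, 100, 300, 30), "children": [
            {"role": "cell", "name": "flight #324", "bounds": (200, 100, 200, 30)},
            {"role": "cell", "name": "9:45am", "bounds": (100, 100, 100, 30)},
        ]},
    ],
}
review_path = [obj["name"] for obj in flat_review_order(table)]
```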
Voice Recognition application with a VNC-based remote desktop
The user is running a remote desktop application, which has placed the entire desktop of a remote computer into a window on the computer in front of the user. [[[illustrate voice recognition for switching from local to remote desktop, and interacting with the remote desktop for both command-and-control, and text entry (dictation); should include an example of moving a slider by voice, that illustrates the Accessible Value interface]]]
- This section needs a lot of work. What has been described is access to a form in regular HTML with very limited use of JavaScript and no indication of AJAX. You cannot assume tabbing.
Text reading & composition assistance (cognitive impairment support) with an internally developed application for use in schools
The user is a student in high school, interacting with an educational program developed by a local University for use in teaching comparative literature. [[[illustrate TextHelp-like functionality, only working automatically in the text content fields of the University-developed app (perhaps doing database lookups to pull the text citations)]]]
Questions about this document:
- This document doesn't contain a set of minimum roles. This is intentional, as that would imply a minimum set of user interface element types. However, does that gap present AT-IT interoperability issues?
- This document doesn't contain a set of minimum state definitions. This is intentional, as that would imply a minimum set of user interface element behaviors. However, does that gap present AT-IT interoperability issues?
- There is no specification for text attributes (e.g. CSS). This is intentional, as that would imply a minimum set of text stylings. However, does that gap present AT-IT interoperability issues?
--Korn 19:50, December 12, 2006 (MST)