Note

This archival content is maintained by WebAIM and NCDAE on behalf of TEITAC and the U.S. Access Board. Additional and up-to-date details on the updates to Section 508 and Section 255 can be found at the Access Board web site.

Web and Software: Accessibility API Requirements Proposal

Contents

Information needed in a Desktop Accessibility API

Authors:    Peter Korn, Sun Microsystems, Inc.
            Andrew Kirkpatrick, Adobe Systems
            Sean Hayes, Microsoft
            Rich Schwerdtfeger, IBM
Reviewers:  Luke Kowalski, Oracle Corporation
Status:     Draft, 31 January 2007
            Discussed at the February 7, 2007 meeting

Notes on this draft

This is an update to the API Requirements Proposal ISO Mapping, in which the authors took apart the ISO language and examined how this API requirements proposal supports it.

This update contains the following notable changes:

  1. Renames the proposal to reflect what it is: requirements for an Accessibility API, not a specific Accessibility API itself
  2. Changes references from "bounding rectangle" to "boundary", because what AT needs is the bounds of objects on the screen, and as 2.5D and 3D interfaces appear on the desktop, objects will increasingly have non-rectangular boundaries
  3. Clarifies the language around text character boundaries (and not bounding rectangles)
  4. Removes the "General system information needed" section, as that was the one part of the proposal that went beyond the requirements set forth in the ISO "Compatibility with assistive technology" language
  5. Adds the requirement for an AT to be able to set the keyboard focus to a particular object, and to select a range of text [from ISO 8.6.5]
  6. Annotates each individual Accessibility API requirement with a reference to the part of the ISO language that supports it

Summary

This is a proposal to more fully specify the programmatic exposure of user interface element information for accessibility: a specification of the contents of an accessibility API (Application Programming Interface, a "software contract" for software applications). The ISO accessibility standard language addresses interoperability with AT by describing generally the user interface element information that applications should provide. This enumeration is insufficient to provide AT with all that it needs, and is also not specific enough to be testable.

This proposal flows from a discussion of the issues of programmatic exposure of information raised by parts of 1194.21 and 1194.22 (see Discussion of 1194.21(d) and 1194.21(f)) and the related language in ISO 9241-171 section 8.4.2 Note 1, which states:

User interface element information includes, but is not
limited to: general states (such as existence, selection,
focus, and position), attributes (such as size, colour,
role and name), values (such as the text in a static or
editable text field), states specific to particular
classes of user interface elements (such as On/Off,
depressed/released), and relationships between user interface
elements (such as when one user interface element contains,
names, describes, or affects another).

The proposal below attempts to address these goals of harmonization, AT interoperability, and testability by specifying more precisely the minimum information applications must provide AT. The proposal is a further specification of the ISO enumeration of user interface element information needed for accessibility. This proposal has been harmonized with the existing ISO language. An application that exposed all of the information described below would meet the existing ISO provision.

This proposal is informed by the authors' experiences with the Java platform accessibility API, the UNIX accessibility framework, and the WAI ARIA Roadmap. To the authors' knowledge, everything specified below is already covered by the Accessibility APIs in the Java platform, UNIX, WAI ARIA, Mac OS X NSAccessibility, and Microsoft UI Automation.

In addition to enumerating in detail the user interface element information required for accessibility, this proposal contains several example use cases, illustrating how an accessibility API (the platform-specific protocol for providing user interface element information) is used in conjunction with assistive technologies to provide rich access that is equivalent to or better than what users enjoy today (two of at least four examples are fully fleshed out).

Finally, this proposal ends with a small collection of questions – whether certain things are or are not necessary in all accessibility APIs (whether they should be part of the minimum specification of user interface element information needed for accessibility).

Accessibility API Requirements

Minimum 'static' information for all user interface elements shown on the screen:

  1. Role of the object in the user interface (e.g. 'checkbox', 'radio button', 'menu item') [from ISO 8.6.4, Note 1]
  2. Current state(s) of the object (e.g. 'checked', 'focused') [from ISO 8.6.4, Note 1]
  3. Boundary of the object [from ISO 8.6.4, Note 1]
  4. Name of the object (note: not all objects necessarily will have a name, especially if that duplicates text provided elsewhere in the API; but all objects must be able to answer the question “what is your name?”) [from ISO 8.6.4, Note 1]
  5. Description of the object (note: not all objects necessarily will have a description, especially if that duplicates text provided elsewhere in the API; but all objects must be able to answer the question “what is your description?”) [from ISO 8.6.6]
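
As a rough illustration only (this proposal does not define a concrete API), the five items above could be modeled as a small read-only record. The names AccessibleElement and Boundary, and every field below, are hypothetical; Python is used purely for illustration, and the boundary is modeled as a polygon on the assumption that outlines may be non-rectangular.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Boundary:
    """Hypothetical outline of an on-screen object as (x, y) vertices."""
    vertices: Tuple[Tuple[int, int], ...]

    def bounding_box(self) -> Tuple[int, int, int, int]:
        """Return (left, top, right, bottom) enclosing the outline."""
        xs = [x for x, _ in self.vertices]
        ys = [y for _, y in self.vertices]
        return (min(xs), min(ys), max(xs), max(ys))


@dataclass(frozen=True)
class AccessibleElement:
    """Hypothetical record of the minimum 'static' information."""
    role: str                 # e.g. 'checkbox', 'radio button'
    states: FrozenSet[str]    # e.g. {'checked', 'focused'}
    boundary: Boundary
    name: str = ""            # may be empty, but is always answerable
    description: str = ""     # may be empty, but is always answerable


ok_button = AccessibleElement(
    role="button",
    states=frozenset({"focused", "enabled"}),
    boundary=Boundary(((10, 20), (90, 20), (90, 44), (10, 44))),
    name="OK",
)
```

Here an object without a description still answers the question with an empty string rather than failing, which is one possible reading of the notes on items 4 and 5.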

Additional requirement for all user interface elements that a user can manipulate/interact with:

  1. Programmatically setting focus to the object [from ISO 8.6.5]

Additional object information needed for objects that contain text:

  1. The complete text contents [from ISO 8.6.4, Note 1]

Additional object information needed for editable text objects:

  1. The index/offset of any text insertion caret(s) within the text [from ISO 8.6.4, Note 1]
  2. The contents & location of any text selection [from ISO 8.6.4, Note 1]
  3. Programmatically selecting a range of text [from ISO 8.6.5]
  4. The boundary of the character(s) of text returned in any text retrieval call [from ISO 8.6.4, Note 1]
  5. The text attributes of the character(s) of text returned in any text retrieval call (e.g.: bold, italic, underline, font name, font size, font/text color) [from ISO 8.6.4, Note 1]
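
A minimal sketch of how the text items above might fit together, assuming a hypothetical EditableText class (not part of any real accessibility API). Character boundaries (item 4) are left out for brevity; attributes are modeled as one dictionary per character, which is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class EditableText:
    """Hypothetical editable text object."""
    text: str = ""
    caret: int = 0                               # offset of the insertion caret
    selection: Optional[Tuple[int, int]] = None  # (start, end), end exclusive
    # One attribute dictionary per character, parallel to `text`.
    attributes: List[Dict[str, object]] = field(default_factory=list)

    def select_range(self, start: int, end: int) -> None:
        """Programmatically select a range of text."""
        if not (0 <= start <= end <= len(self.text)):
            raise ValueError("selection out of range")
        self.selection = (start, end)
        self.caret = end

    def selected_text(self) -> str:
        """Contents of the current selection, or '' if there is none."""
        if self.selection is None:
            return ""
        start, end = self.selection
        return self.text[start:end]

    def attribute_runs(self) -> List[Tuple[int, int, Dict[str, object]]]:
        """Group consecutive characters that share identical attributes."""
        runs: List[Tuple[int, int, Dict[str, object]]] = []
        for i, attrs in enumerate(self.attributes):
            if runs and runs[-1][2] == attrs:
                runs[-1] = (runs[-1][0], i + 1, attrs)
            else:
                runs.append((i, i + 1, attrs))
        return runs


# A typed 'O' in black followed by a grey auto-completion 'AK'.
departure = EditableText(
    text="OAK",
    caret=1,
    attributes=[{"color": "black"}, {"color": "grey"}, {"color": "grey"}],
)
```

Grouping attributes into runs lets an AT notice where styling changes (for example, where auto-completed text begins) without comparing characters one by one.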

Additional object information needed for objects within a table:

  1. The row & column of the object [from ISO 8.6.10]
  2. The row & column headers (if any) for the row/column of the object [from ISO 8.6.10]
  3. Cell span information should be provided [from ISO 8.6.10]
  4. A mechanism for determining that a cell is active (the one where user input is directed) [from ISO 8.6.10]
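
The table items above could be sketched as follows. Cell and AccessibleTable are hypothetical names, and the header lookup assumes one header string per row and per column, which is a simplification.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Cell:
    """Hypothetical table cell with position and span information."""
    text: str
    row: int
    column: int
    row_span: int = 1
    column_span: int = 1


class AccessibleTable:
    """Hypothetical table exposing headers, spans, and the active cell."""

    def __init__(self, column_headers: List[str], row_headers: List[str]) -> None:
        self.column_headers = column_headers
        self.row_headers = row_headers
        self.cells: List[Cell] = []
        self.active: Optional[Cell] = None  # cell where user input is directed

    def add(self, cell: Cell) -> Cell:
        self.cells.append(cell)
        return cell

    def headers_for(self, cell: Cell) -> Tuple[List[str], List[str]]:
        """Return (row headers, column headers) covering the cell's span."""
        rows = self.row_headers[cell.row:cell.row + cell.row_span]
        cols = self.column_headers[cell.column:cell.column + cell.column_span]
        return rows, cols


flights = AccessibleTable(
    column_headers=["Departs", "Arrives", "Price"],
    row_headers=["Flight 324", "Flight 410"],
)
first = flights.add(Cell("9:45am", row=0, column=0))
sold_out = flights.add(Cell("Sold out", row=1, column=0, column_span=3))
flights.active = first
```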

Additional object information needed for objects a user can change the state of, or interact with (besides editable text):

  1. The named actions one can take on an object (e.g. checking/unchecking a checkbox) [from ISO 8.6.5]
  2. Programmatically taking one of those actions [from ISO 8.6.5]
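
One way to picture these two requirements is a registry of named actions that an AT can list and then invoke. ActionableElement and its method names are hypothetical and chosen only for this sketch.

```python
from typing import Callable, Dict, List


class ActionableElement:
    """Hypothetical element exposing named actions to assistive technology."""

    def __init__(self, role: str) -> None:
        self.role = role
        self._actions: Dict[str, Callable[[], None]] = {}

    def add_action(self, name: str, handler: Callable[[], None]) -> None:
        self._actions[name] = handler

    def action_names(self) -> List[str]:
        """Requirement 1: list the named actions available on this object."""
        return list(self._actions)

    def do_action(self, name: str) -> bool:
        """Requirement 2: take an action; return False if it does not exist."""
        handler = self._actions.get(name)
        if handler is None:
            return False
        handler()
        return True


# A checkbox whose only action toggles its 'checked' state.
state = {"checked": False}
checkbox = ActionableElement("checkbox")
checkbox.add_action("toggle", lambda: state.update(checked=not state["checked"]))

checkbox.do_action("toggle")
```

An AT that finds exactly one action, as in the radio button example later in this document, can invoke it without asking the user to choose.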

Additional object information needed for objects that present one of a range of values (e.g. a slider or scroll bar):

  1. The minimum value, if one exists for this object [from ISO 8.6.4, Note 1]
  2. The maximum value, if one exists for this object [from ISO 8.6.4, Note 1]
  3. The current value [from ISO 8.6.4, Note 1]
  4. Programmatically setting a new value [from ISO 8.6.5]
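
A minimal sketch of the value requirements, assuming a hypothetical AccessibleValue class. The minimum and maximum are optional because the list above only requires them "if one exists"; rejecting out-of-range values (rather than clamping them) is a design choice made for this sketch, not something the proposal specifies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AccessibleValue:
    """Hypothetical value interface for sliders, scroll bars, and similar."""
    current: float
    minimum: Optional[float] = None  # None means no minimum exists
    maximum: Optional[float] = None  # None means no maximum exists

    def set_value(self, new_value: float) -> bool:
        """Try to set a new value; return False if it is out of bounds."""
        if self.minimum is not None and new_value < self.minimum:
            return False
        if self.maximum is not None and new_value > self.maximum:
            return False
        self.current = new_value
        return True


volume = AccessibleValue(current=50, minimum=0, maximum=100)
volume.set_value(75)    # accepted
volume.set_value(150)   # rejected, current stays at 75
```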

Object relationship information needed:

  1. The relationship between labels and the user interface element they are labeling (e.g. in a form) [from ISO 8.6.4, Note 1]
  2. “Parent” and “children” information – what user interface element contains this one (parent), and what user interface elements are contained within this one (children) [from ISO 8.6.4, Note 1]
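
The two relationships above can be sketched with a small tree node that also carries a labeled-by link. Node, labelled_by, and accessible_name are hypothetical names, and falling back to the label's text when an object has no name of its own is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """Hypothetical UI object with containment and labeling relationships."""
    role: str
    name: str = ""
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    labelled_by: Optional["Node"] = None

    def add_child(self, child: "Node") -> "Node":
        """Attach a child and record this node as its parent."""
        child.parent = self
        self.children.append(child)
        return child

    def accessible_name(self) -> str:
        """Own name if present, else the text of the labeling object."""
        if self.name:
            return self.name
        if self.labelled_by is not None:
            return self.labelled_by.name
        return ""


dialog = Node("dialog", "Save As")
label = dialog.add_child(Node("label", "Filename:"))
text_field = dialog.add_child(Node("text", labelled_by=label))
```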

Dynamic (event) information needed:

  1. State changes (e.g. “checked” to “unchecked”, but also “focused” to “unfocused”, “active” to “inactive”) [from ISO 8.6.7]
  2. Text caret movement [from ISO 8.6.7]
  3. Text insertion [from ISO 8.6.7]
  4. Text selection change [from ISO 8.6.7]
  5. Value changes (e.g. a slider moving up/down and changing value) [from ISO 8.6.7]
  6. Top-level windows appearing/disappearing/moving [from ISO 8.6.7]
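
The event requirements above imply some way for AT to register interest in changes and be notified of them. The sketch below assumes a simple publish/subscribe design; EventBus, AccessibilityEvent, and the event kind strings are all hypothetical and not taken from any real platform API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List


@dataclass(frozen=True)
class AccessibilityEvent:
    """Hypothetical event record."""
    kind: str          # e.g. 'state-changed', 'caret-moved', 'text-inserted'
    source: str        # name of the object that fired the event
    detail: str = ""   # e.g. the new state or the inserted text


Listener = Callable[[AccessibilityEvent], None]


class EventBus:
    """Hypothetical dispatcher between applications and AT listeners."""

    def __init__(self) -> None:
        self._listeners: DefaultDict[str, List[Listener]] = defaultdict(list)

    def subscribe(self, kind: str, listener: Listener) -> None:
        self._listeners[kind].append(listener)

    def emit(self, event: AccessibilityEvent) -> int:
        """Deliver the event; return how many listeners were notified."""
        listeners = self._listeners.get(event.kind, [])
        for listener in listeners:
            listener(event)
        return len(listeners)


# A screen reader stand-in that records what it would speak.
spoken: List[str] = []
bus = EventBus()
bus.subscribe("state-changed", lambda e: spoken.append(f"{e.source}: {e.detail}"))
bus.emit(AccessibilityEvent("state-changed", "Remember me", "checked"))
```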

Note: there are things in the Java/UNIX accessibility API, the Microsoft UI Automation specification, and the Apple Accessibility API, which aren't listed above. This is intentional. This proposal attempts to specify the minimum information needed to support general desktop accessibility and the existing known assistive technology use cases. By being “as small as possible”, it attempts to preserve the maximum amount of accessibility innovation in the future. Of course, this proposal should be taken in the context of the existing 508 “equivalent facilitation” language – if an application is able to provide accessibility facilitation with assistive technologies equivalent to (or better than) what can be done via an accessibility API that meets these criteria, it shall be perfectly acceptable under 508 to do so.

Examples of this API in use

Screen Magnifier with a dialog box

The user is interacting with the Save-As dialog box of a word processor. The dialog box has just popped up onto the screen.

  1. The screen magnifier (the AT) is continuously receiving events noting the mouse location and caret, and panning the magnified view to track the mouse position.
  2. The AT receives an event noting that a new window (the dialog box) appeared, and further that the “OK” button is focused. It makes accessibility API calls to obtain the boundary of the “OK” button and moves the magnified view to encompass it. It also speaks the text “OK” (which it likewise retrieved via the accessibility API).
  3. As the user TABs through the dialog box, the AT likewise receives those focus events, makes accessibility API calls to obtain information about the user interface objects that now have focus, and where necessary pans the magnified view to encompass them. It also speaks their names.
  4. When the user TABs to the edittext field bearing the label “Filename:”, the AT uses the accessibility API to discover that the edittext field is in a labeled-by relationship with another user interface element – the static text label “Filename:”. It obtains information about both objects, and pans the magnified view to encompass both the label and the start of the blank edittext field, and it further speaks the text “Filename:”.
  5. As the user types their filename, the AT obtains the text that appears in the edittext field not from snooping the keyboard, but from the text-insertion events coming from the edittext field. It likewise receives caret events and pans the magnified view as necessary to ensure the caret always remains within the field of view.
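
The panning behavior described in the steps above can be sketched as a small geometric helper. This is only an illustration: pan_to_include is a hypothetical function, and it assumes the magnifier works with the rectangular extent of an object's boundary, given as (left, top, right, bottom) in screen coordinates.

```python
from typing import Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom)


def _shift(lo: int, hi: int, t_lo: int, t_hi: int) -> int:
    """Smallest shift of the span [lo, hi] that brings [t_lo, t_hi] into view."""
    if t_hi - t_lo >= hi - lo or t_lo < lo:
        # Target is too large to fit, or starts before the view:
        # align the view's start with the target's start.
        return t_lo - lo
    if t_hi > hi:
        # Target ends past the view: move just far enough to show its end.
        return t_hi - hi
    return 0


def pan_to_include(view: Rect, target: Rect) -> Rect:
    """Return the view moved the minimum distance needed to show the target."""
    left, top, right, bottom = view
    t_left, t_top, t_right, t_bottom = target
    dx = _shift(left, right, t_left, t_right)
    dy = _shift(top, bottom, t_top, t_bottom)
    return (left + dx, top + dy, right + dx, bottom + dy)


view = (0, 0, 400, 300)
ok_button_extent = (500, 350, 580, 380)
new_view = pan_to_include(view, ok_button_extent)
```

Moving the view by the minimum distance, rather than centering on each focused object, keeps more of the surrounding context stable as the user TABs through a dialog; a real magnifier may well choose differently.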


Screen Reader with an AJAX-based Web application

The user is interacting with a website, making an airline reservation using a rich web application. The web application follows the WAI ARIA specification, and while it uses asynchronous JavaScript, downloaded images, and other techniques to render form controls, popup menus, etc., it also uses appropriately marked up XHTML to indicate which graphical elements are buttons, text fields, popup menus, etc. and fires events as per the WAI ARIA specification to indicate changes in those user interface elements.

  1. The screen reader (AT) is continuously receiving events noting what object has the focus, and speaking and sending to the attached refreshable Braille display the appropriate information for each object.
  2. The user TABs into the web application, and onto a button titled “Make a new reservation”. The web application generates a DOM event indicating the focus change onto the “Make a new reservation” button, which the browser exposes via the platform accessibility API as an accessible user interface element of role “button”, and further fires a focus event indicating that that button now has the focus. The AT receives the focus event, and uses the accessibility API to obtain information about the button – specifically the button's text “Button: Make a new reservation”, which the AT speaks to the user. The AT makes further accessibility API calls to the browser, and obtains the text of the objects to either side of that button, and renders the entire line of information to the refreshable Braille display, using dot 8 on the display to “underline” the text “Make a new reservation”, which is how this particular AT has been configured to indicate focus information to the user in Braille.
  3. After pressing <ENTER> to activate the “Make a new reservation” button, the user decides to explore the new web user interface presented by the airline reservation web app. The page updates, with a form appearing seeking travel destinations, travel dates, etc. Focus is on a text field whose label is “Departure City”. The web app fires a DOM event indicating focus, which the browser echoes as a focus event in the platform accessibility API. The AT receives this focus event, and makes accessibility API calls to get more information about the editable text field. Among other things, the AT discovers that the text field is in a 'labeling relationship' with another user interface element – specifically it is labeled by a static text field that contains the text “Departure City”. The AT speaks the text “Text field: Departure City”. The AT makes further accessibility API calls to obtain the text of the objects to either side of the text field to format them all on the Braille line, and finds that there aren't any. The AT then renders in Braille the text label “Departure City” and uses a slowly flashing dot 7 & 8 to indicate the location of the text input caret.
  4. The user starts typing the name of their departure city airport code. Departing from Oakland, California, they type the letter 'o'. The Web application offers auto-completion of this text field. The first airport code starting with O that this airline flies to is OAK, for Oakland. The web app displays the letter 'O' in the text field in normal black text, followed by the caret, followed by the letters 'AK' in a light grey font. The Web application sends DOM events indicating the addition of the letters O, A, and K to the text field, which the browser in turn sends on as accessibility API events for the newly created text (and the new caret position within that text). The AT receives these events, and makes further accessibility API calls to determine the character attributes of the three letters O, A, and K. Since the AT is configured to echo keystrokes, the AT first speaks the letter “O”, and then further speaks the new text that appeared, “OAK”. In Braille, the AT appends “OAK” to the existing Braille line, and because it was configured to do so with text of any kind of different attribute than plain text, it places a dot 7 underneath the A and K. It further indicates the caret location by flashing dots 7 & 8 on the letter A.
  5. The user presses <ENTER> to accept this as the departure city, and goes on to fill in the destination city and departure date (which we won't describe here for brevity's sake). The user TABs to the “Find flights” button, and presses <ENTER> to activate it, bringing up a table filled with options.
  6. The user invokes the flat review feature of the AT, and starts navigating downward through the table to review the options. The AT makes a succession of accessibility API calls, first traversing upward in the parent/child hierarchy from the first cell of the table (which has the focus), and then downward through all of the children, building an in-memory cache of the objects in this portion of the web page. Within that cache, it then constructs a left-to-right, top-to-bottom ordering of them and uses that ordering for the user's flat review path. As the user issues flat review navigation commands, the AT speaks the appropriate letter/word/line being reviewed. In parallel, the AT updates the Braille display to show a line at a time of flat review, using dot 7 & 8 to indicate where in Braille the flat review is occurring (which letter/word).
  7. The user finds the desired flight option, and uses the touch-cursor on the Braille display that is above one of the characters contained within the radio button for the desired flight. The AT knows (from its cache) that the object whose text is being rendered at that Braille cell location is of role “radio button”, and further that it is an object that can be manipulated via the action portion of the accessibility API. The AT discovers that there is only the “select” action available, and since there is only one, it programmatically activates that action. This causes the browser to convey to the XHTML object that the “select” action has been taken, and the web application updates itself, just as if a user with a mouse had clicked on that radio button. This in turn causes a state change in the radio button (to the “selected” state), which fires an XHTML event to the browser, which in turn fires the state change event to the AT. The AT makes accessibility API calls to determine the text of the object whose state has changed, finds that the text is “9:45am flight #324”, and speaks “Selected, 9:45am flight #324”. As this text is already what is shown on the Braille display, the display isn't updated.
  8. Finally, the user TABs to the “purchase ticket” button, and finishes the transaction.
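
The flat review step in item 6 above (cache the object tree, then order it left to right, top to bottom) can be sketched as below. ReviewNode and flat_review_order are hypothetical names, and sorting on exact top coordinates is a simplification: a real AT would likely group objects into lines with some tolerance.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReviewNode:
    """Hypothetical cached object: its text and the top-left of its boundary."""
    text: str
    left: int
    top: int
    children: List["ReviewNode"] = field(default_factory=list)


def flat_review_order(root: ReviewNode) -> List[str]:
    """Walk the parent/child tree, then order text top-to-bottom, left-to-right."""
    cache: List[ReviewNode] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.text:                 # containers without text are skipped
            cache.append(node)
        stack.extend(node.children)
    cache.sort(key=lambda n: (n.top, n.left))
    return [n.text for n in cache]


# Rows and cells deliberately listed out of visual order.
table = ReviewNode("", 0, 0, [
    ReviewNode("", 0, 40, [
        ReviewNode("$129", 200, 40),
        ReviewNode("9:45am", 0, 40),
    ]),
    ReviewNode("", 0, 0, [
        ReviewNode("Departs", 0, 0),
        ReviewNode("Price", 200, 0),
    ]),
])
review_path = flat_review_order(table)
```

Because the ordering depends only on each object's boundary and not on tree order, the review path matches what a sighted user sees even when the underlying hierarchy is arranged differently.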


Voice Recognition application with a VNC-based remote desktop

The user is running a remote desktop application, which has placed the entire desktop of a remote computer into a window on the computer in front of the user. [[[illustrate voice recognition for switching from local to remote desktop, and interacting with the remote desktop for both command-and-control, and text entry (dictation); should include an example of moving a slider by voice, that illustrates the Accessible Value interface]]]

  1. This section needs a lot of work. What has been described is access to a form in regular HTML with very limited use of JavaScript and no indication of AJAX. You cannot assume tabbing.

Text reading & composition assistance (cognitive impairment support) with an internally developed application for use in schools

The user is a student in high school, interacting with an educational program developed by a local University for use in teaching comparative literature. [[[illustrate TextHelp-like functionality, only working automatically in the text content fields of the University-developed app (perhaps doing database lookups to pull the text citations)]]]

Questions about this document:

  1. This document doesn't contain a set of minimum roles. This is intentional, as that would imply a minimum set of user interface element types. However, does that gap present AT-IT interoperability issues?
  2. This document doesn't contain a set of minimum state definitions. This is intentional, as that would imply a minimum set of user interface element behaviors. However, does that gap present AT-IT interoperability issues?
  3. There is no specification for text attributes (e.g. CSS). This is intentional, as that would imply a minimum set of text stylings. However, does that gap present AT-IT interoperability issues?

--Korn 19:50, January 30, 2007 (MST)
