WebAIM - Web Accessibility In Mind

E-mail List Archives

More on doctype


From: Terence de Giere
Date: Jun 4, 2002 9:13AM

The doctype declaration and its associated document type definition file
are part of the International Organization for Standardization's 'ISO
8879:1986 SGML' standard for office documents. The standard was meant to
provide a system for structured documents that have a predefined and
known structure. The doctype declaration refers to the document type
definition file, which, in an SGML processing system, allows the system
to build the document according to its specific definition. For example,
HTML 3.2 and HTML 4.01 each have their own document type definition
file, and in an SGML system, the doctype declaration allows the system
to find and load the right definition file, so the HTML produced is the
correct one for that type. Without the document type definition file,
the processing system has nothing to work with and can't do anything.
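
To make the lookup step concrete, here is a minimal sketch in Python of
how a processing system might read the doctype declaration to learn
which definition file to load. The function name read_doctype and the
regular expression are my own illustration, not part of any SGML
toolkit, and a real SGML parser handles many more forms of the
declaration than this pattern does. The two declarations shown are the
published ones for HTML 4.01 Strict and HTML 3.2.

```python
import re

# Matches a doctype with a PUBLIC identifier and an optional system
# identifier (the URL of the document type definition file).
# Illustrative only: real declarations can take other forms.
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+(?P<root>\w+)\s+PUBLIC\s+"(?P<public>[^"]+)"'
    r'(?:\s+"(?P<system>[^"]+)")?\s*>',
    re.IGNORECASE,
)


def read_doctype(page):
    """Return (root, public_id, system_id), or None if no doctype is found."""
    match = DOCTYPE_RE.search(page)
    if match is None:
        return None
    return match.group("root"), match.group("public"), match.group("system")


html401 = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    '"http://www.w3.org/TR/html4/strict.dtd">\n<html>...</html>'
)
html32 = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">\n<html>...</html>'

print(read_doctype(html401))
print(read_doctype(html32))
print(read_doctype("<html>no doctype here</html>"))
```

The public identifier names the document type, and the system
identifier, when present, says where the definition file can be found.
A page with no declaration gives the system nothing to look up.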

Most graphical HTML editors are 'hard wired' with some mongrel form of
HTML. While this is not necessarily bad, others on the forum have
pointed out that the Web is accessed by many kinds of user agents. The
crux of the problem is how a user agent is supposed to interpret the
HTML. If each user agent has its own idea of what HTML is, then to work
properly with that agent, the HTML has to be customized for that agent.
The purpose of the ISO standard is to eliminate that kind of variation
so that all conforming SGML systems can process the file in a
satisfactory manner. A doctype and its definition file provide a common
application programming interface (API) to which both the content
creation system and the user agent that reads the file refer to get
consistent results. The process broke down as Netscape and Microsoft
battled for control of Internet content by creating proprietary
versions of HTML that 'work best' with their respective browsers.

Much of the problem of accessibility on the Web has to do with trying to
create pages that look the same on browsers that interpret HTML
according to their own rules rather than a common set of rules. Hand
coding HTML to overcome browser quirks produces good visual results
for that browser, but those same lines of code may not work well
on another browser. If all browsers followed the standard, there would
not be so many of these problems.

In my own experience, valid HTML, accessible or not, displays more
consistently across a wider range of graphical browsers. HTML
validation is a powerful quality control mechanism. Hand coding for
specific browsers is a habit programmers have. They are used to doing
anything to get code to run on the target system. Usually, however, a
program runs on a single type of system, or on some small variations
of that system, such as versions of Windows. The program may not have
to work on UNIX or the Mac. On the Internet there are so many devices
and programs that process Web pages that it is impossible, at least
economically, to deal with them all. HTML (and of course XML)
validation requires a partnership between content creators and device
and browser makers, which, if successful, will achieve the goal of
creating a single page for all devices.

As other members of the forum have pointed out, the change from HTML
formatting to CSS formatting requires a valid, known structure so that
the CSS can map onto the HTML (or XHTML or XML) page correctly. The only
way this will ever work is for both the content structure and the
browser to follow the same set of rules. A correctly built browser will
not need any error correction to process a valid HTML or XML file. As
new browsers, special access technology browsers, and new devices are
coming onto the Web, we are no longer dealing with just the quirks of
Internet Explorer and Netscape browsers. There are a hundred different
presentations of Web pages on small devices. It is not practical to deal
with all of this by coding for each device and its associated software.

The ISO also standardized credit cards. Credit cards have to be a
certain size, a certain thickness, and have other standard
characteristics. They work everywhere. But suppose the machines that can
automatically process a credit card all had differing characteristics.
Going to an automatic teller machine (ATM) would require one kind of
card. Getting a subway ticket would require another. Another version of
the card might be needed at a restaurant. Yet another might be needed at
a small shop that does not have electronic processing and has to imprint
the card by hand with a carbon paper form. The purpose of the standard
is universal use and
efficiency in processing information. The Web is like this. We need
efficiency in processing and presenting information. One of the main
cogs in this process is standardized page coding - valid HTML. It
reduces errors, it works better overall, it provides a powerful quality
check, and it provides one of the means for universalizing the Internet.
It is estimated that just dealing with the differences between Internet
Explorer and Netscape adds an average of 25% to development costs. It
might be better to spend this money on something more useful, like
usability and accessibility testing.

I know there are a lot of developers and designers that do not agree
with this, which is why we see so much HTML coding on the Web working on
a conceptual model a half-decade old, related to version 3.0 browsers
from Microsoft and Netscape. If there were only one or two browsers, and
disabled access was not a problem, this model would work well, but it is
beginning to outlive its usefulness. If the doctype declaration is used,
it also means that the HTML or XML that follows on the page should match
the document type definition. If it does not match, it means the coder
has made mistakes that need to be corrected. Adding the doctype does not
make the code match the definition, and if another coder tries to load
such a page into a validating SGML or XML processing system, all those
errors will need to be fixed before the page can be properly loaded. It
reflects poorly on the developer to use a doctype and follow it with bad
HTML. Without the doctype, the HTML is undefined and invalid, and thus
one cannot say exactly that it should have a certain structure. For
mongrel HTML it is better to leave the doctype off. One can always be
added later, the file checked and the errors repaired. It can take a
long time to fix those errors.
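
As a rough illustration of a strict processing system refusing a page
with errors, here is a small Python sketch using the standard library's
XML parser. Note the assumptions: this parser checks only that the
markup is well formed, which is a weaker test than validating against a
document type definition, and the helper name first_xml_error and the
two sample pages are invented for the example.

```python
import xml.etree.ElementTree as ET


def first_xml_error(markup):
    """Return None if the markup parses as XML, else the parser's message."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        return str(err)
    return None


# Every element is closed, so an XML parser accepts this page.
good = "<html><head><title>t</title></head><body><p>Hello<br/></p></body></html>"

# The unclosed <br> and <p> are tolerated by most browsers,
# but a strict XML parser stops at the first mismatched tag.
bad = "<html><head><title>t</title></head><body><p>Hello<br></body></html>"

print(first_xml_error(good))
print(first_xml_error(bad))
```

A browser would quietly repair the second page. A strict system reports
the first error and stops, and the page cannot be loaded until each
such error has been fixed by hand.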

Valid HTML is also easier to convert to valid XML, such as transforming
HTML 4.01 Transitional into XHTML 1.0 Transitional, and valid XML can
be transformed into other valid XML doctypes using XSLT. Without
validation, this process would lead to errors, similar in a way to
genetic mutations caused by errors in the transcription of DNA.

A doctype declaration and valid HTML are required to meet Priority 2
compliance (Level Double-A) under the W3C Web Content Accessibility
Guidelines. They are not required for Priority 1 compliance (Level A) or
to comply with Section 508 rules. I would, however, recommend that valid
HTML always be written, simply as a matter of quality control. I realize
that the situation developers are in may prevent this. A content
management system or graphical HTML software may actively prevent the
creation of valid HTML.

The user is king. Give users the information and services in a way
that lets them get it whenever, and in whatever form, they want. You can
only control how the file will be rendered so much. Thus one does have
to give up some artistic control to do this, but not necessarily give
up a general, overarching artistic approach. There
will be more users if you give up some control and provide a
user-centric approach using validated code. The Internet is information
first - presentation is important, but it is better to get the
information in a usable form, rather than a pretty, unusable form. Note
that in usability circles, the Web site that tests as the most
usable is often not the one that scores highest on visual appeal.

Terence de Giere
