E-mail List Archives
Thread: how to detect images having math expressions
Number of posts in this thread: 8 (In chronological order)
From: Shrirang Prakash Sahasrabudhe
Date: Wed, Jun 04 2008 3:00AM
Subject: how to detect images having math expressions
Hi,
Is there any programmatic way of detecting use of math expressions inside images?
How do existing accessibility checkers deal with it?
Something like looking at height and width?
"This error is generated for all img elements that have a width and height greater than 100."
http://checker.atrc.utoronto.ca/servlet/ShowCheck?check=135
Looking for pointers to related research.
Thanks.
Shrirang Prakash Sahasrabudhe
Accessibility specialist- Web 2.0 Research Lab
Infosys Technologies Ltd- Bangalore
From: Steve Green
Date: Wed, Jun 04 2008 3:40AM
Subject: Re: how to detect images having math expressions
This is a typical example of how automated tools waste so much of your time.
It's ludicrous to automatically generate this warning based on the size of
an image. If the tool is configurable I would turn off this check and any
others like it.
The only way to programmatically test for the presence of text or math
expressions is to use some form of OCR. I do not believe there are any
automated tools that are capable of correctly assessing this checkpoint. It
would probably be more cost effective and reliable to produce a list of all
the images and have someone (this need only be a very cheap resource) look
at them all and flag any that contain text or math expressions.
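A minimal sketch of the listing step described above, assuming the pages are already available as HTML text. The names `ImageLister` and `list_images` are hypothetical and do not come from any checker mentioned in this thread. The code only gathers the images for a human reviewer; it makes no attempt to decide whether an image contains text or math.

```python
from html.parser import HTMLParser


class ImageLister(HTMLParser):
    """Collect src, alt, width and height of every img element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing <img ... /> via handle_startendtag.
        if tag == "img":
            a = dict(attrs)
            self.images.append({
                "src": a.get("src"),
                "alt": a.get("alt"),  # None means the alt attribute is missing
                "width": a.get("width"),
                "height": a.get("height"),
            })


def list_images(html_text):
    """Return one dict per img element found in html_text."""
    parser = ImageLister()
    parser.feed(html_text)
    parser.close()
    return parser.images


if __name__ == "__main__":
    page = '<p><img src="quadeqn.png" alt="" width="179" height="63"></p>'
    for image in list_images(page):
        print(image)
```

The resulting list can be written to a spreadsheet or a contact sheet so that a reviewer flags each image by eye.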
This is really just an extension of checkpoint 1.1, which raises the question:
how are you verifying that all images have appropriate 'alt' attributes?
Steve
From: Jukka K. Korpela
Date: Wed, Jun 04 2008 3:50AM
Subject: Re: how to detect images having math expressions
Shrirang Prakash Sahasrabudhe wrote:
> Is there any programmatic way of detecting use of math expressions
> inside images?
Yes, but nothing practical for the purposes of accessibility testing.
There are programs that scan images for text in them, but math
expressions are very hard in this respect, partly because they contain
special symbols and are often two-dimensional. And how would you
distinguish, say, a short formula containing Greek letters from
non-mathematical text?
> How do existing accessibility checkers deal with it?
They don't.
> Something like looking at height and width?
> "This error is generated for all img elements that have a width and
> height greater than 100. "
That would be surely wrong. A large image can be just anything, and a
math expression can be very compact and fit into small space.
> http://checker.atrc.utoronto.ca/servlet/ShowCheck?check=135
It's about a guideline "All img elements with images containing math
expressions have equivalent MathML markup."
That's rather absurd, since adding MathML is of very little practical
value (for accessibility or otherwise) and takes quite a lot of work if
we count the initial labor needed to dig into MathML, and why shouldn't
we?
Moreover, the page really suggests the wild guess based on dimensions.
No wonder accessibility promotion has a bad reputation in some circles.
The "test process" would just throw images (except, arbitrarily, small
images) at a human tester and ask him to decide whether the image
contains a math expression, and the decision text is "Does this image
contain any math statements that are not described in the document?"
which is something quite different from guideline presented.
There's worse. The page says that an XHTML + MathML page (which is by
its format currently inaccessible to most users!) will pass the test
when it has
<img src="quadeqn.png" alt="solution to the quadratic equation"
width="179" height="63">
apparently because the MathML code is assumed to give the actual
solution. The alt="..." attribute is simply inadequate, since it is not
a _replacement_ for the image. It does not tell the solution. A correct
(though perhaps not optimal) alt attribute would be
alt="(-b±sqrt(b²-4ac))/(2a)"
or, equivalently,
alt="(-b&plusmn;sqrt(b&sup2;-4ac))/(2a)"
(You might consider appending " where sqrt means square root".)
Often, for complex formulas, there is no good solution to the
accessibility issue, since the formula cannot be easily presented in
plain text (in alt="...") or as HTML text. But it is better to face this
rather than require MathML and claim that a page passes accessibility
test when the most important content is not accessible without seeing
the image (except to very few people via MathML).
For further bogosity, the "passing" example isn't even well-formed XML,
so even Firefox refuses to display it. (Missing "/" in the <img> tag.)
Some notes of mine on presenting math formulas on web pages:
http://www.cs.tut.fi/~jkorpela/math/
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/
From: Jukka K. Korpela
Date: Wed, Jun 04 2008 4:00AM
Subject: Re: how to detect images having math expressions
Steve Green wrote:
> It would probably be more cost effective and reliable to
> produce a list of all the images and have someone (this need only be
> a very cheap resource) look at them all and flag any that contain
> text or math expressions.
It would be even better to ask a human tester to evaluate all images for
accessibility. Images that contain text or math are just a special case,
and it's probably better to work out all images in one pass. They should
of course be evaluated in context, since it is generally impossible to
judge what is an appropriate alt text just by looking at the image. (For
example, an image might be purely decorative, calling for alt="", in
some context, and a content-rich image requiring a longish textual
alternative in another.)
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/
From: Steve Green
Date: Wed, Jun 04 2008 4:20AM
Subject: Re: how to detect images having math expressions
I would agree with that but manually visiting every page simply isn't
practical for large websites. I was thinking that a more cost-effective
option might be to assess all the images and identify those that may need
non-null 'alt' attributes. When you find out which pages use these images it
may mean that perhaps only 10% of the pages need to be assessed for that
checkpoint.
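A rough sketch of that idea, assuming a crawl has already produced a mapping from each page URL to the image URLs it uses. The function names and the sample URLs are hypothetical. Each distinct image is reviewed once, and the inverted index then tells you which pages need a closer look.

```python
from collections import defaultdict


def index_images_by_page(pages):
    """Invert {page URL: [image URLs]} into {image URL: set of page URLs}."""
    pages_by_image = defaultdict(set)
    for page_url, image_urls in pages.items():
        for image_url in image_urls:
            pages_by_image[image_url].add(page_url)
    return pages_by_image


def pages_to_review(pages_by_image, flagged_images):
    """Return the sorted pages that use at least one image flagged by a reviewer."""
    to_review = set()
    for image_url in flagged_images:
        to_review |= pages_by_image.get(image_url, set())
    return sorted(to_review)


if __name__ == "__main__":
    crawl = {
        "/a": ["logo.png", "eq1.png"],
        "/b": ["logo.png"],
        "/c": ["eq1.png", "spacer.gif"],
    }
    index = index_images_by_page(crawl)
    print(pages_to_review(index, ["eq1.png"]))
```

Whether this actually reduces the workload depends on how many pages share the flagged images, which is the point disputed later in the thread.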
Once you get above a few hundred pages it becomes impossible to test every
page for every checkpoint. It is not cost-effective (and may not even be
possible) to do it manually, and automated tools only do a very small part
of the job. I am interested in techniques that make better use of the
available manual resources to do things that automated tools cannot do at
all, when a brute force (i.e. view every page) approach is not an option.
Steve
From: Jukka K. Korpela
Date: Wed, Jun 04 2008 4:50AM
Subject: Re: how to detect images having math expressions
Steve Green wrote:
> I would agree with that but manually visiting every page simply isn't
> practical for large websites.
That's why "accessibility testing" is a fairly impractical concept in
general. Alt attributes are just a small part (very small part, really) of the
problem.
Pages should be _created_ and maintained so that they are accessible.
Testing them for accessibility is usually not useful, except when there
is a well-defined realistic goal (e.g., testing 0.1% of the pages of a
large site to prove that the site needs a redesign).
> I was thinking that a more
> cost-effective option might be to assess all the images and identify
> those that may need non-null 'alt' attributes.
Can't be done, I'm afraid.
> When you find out
> which pages use these images it may mean that perhaps only 10% of the
> pages need to be assessed for that checkpoint.
How come? We might _guess_ that, say, 1 by 1 pixel images need alt="",
but that's just a guess, and such images are often symptoms of
accessibility problems that should be studied, instead of blindly
guessing that they are ignorable images. Besides, such images aren't
very popular these days, when formatting can be better achieved using
CSS.
I would say that even with a fairly sophisticated algorithm for
distinguishing "nullable" images from others, the percentage would
rather be 90 %, and what would be the point then?
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/
From: Christophe Strobbe
Date: Wed, Jun 04 2008 5:00AM
Subject: Re: how to detect images having math expressions
Hi,
At 12:15 4/06/2008, Steve Green wrote:
>I would agree with that but manually visiting every page simply isn't
>practical for large websites. I was thinking that a more cost-effective
>option might be to assess all the images and identify those that may need
>non-null 'alt' attributes. When you find out which pages use these images it
>may mean that perhaps only 10% of the pages need to be assessed for that
>checkpoint.
>
>Once you get above a few hundred pages it becomes impossible to test every
>page for every checkpoint.
You need a good sampling method when evaluating large sites.
In Europe, the WAB Cluster researched this (with a lot of input from
EIAO, one of the three WAB Cluster projects). See Chapter 4 (Scope and
sampling of resources) in the "Core" section of the Unified Web
Evaluation Methodology (UWEM) at
<http://www.wabcluster.org/uwem1_2/>. Section 4.3.2 discusses automated
sampling.
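As a simple illustration of sampling in general, and not of the UWEM procedure itself, the sketch below draws a uniform random sample of page URLs for manual review. The function name `sample_pages` and its parameters are assumptions made for this example. A fixed seed makes the sample reproducible, so the same pages can be retested later.

```python
import random


def sample_pages(page_urls, sample_size, seed=None):
    """Return a uniform random sample of distinct page URLs.

    If fewer distinct URLs exist than sample_size, all of them are returned.
    """
    rng = random.Random(seed)
    unique_urls = sorted(set(page_urls))  # sort so a given seed is reproducible
    k = min(sample_size, len(unique_urls))
    return rng.sample(unique_urls, k)


if __name__ == "__main__":
    urls = [f"/page{i}" for i in range(10000)]
    for url in sample_pages(urls, 5, seed=2008):
        print(url)
```

A real evaluation would probably stratify the sample (for example by template or site section) rather than sampling uniformly.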
>It is not cost-effective (and may not even be
>possible) to do it manually, and automated tools only do a very small part
>of the job. I am interested in techniques that make better use of the
>available manual resources to do things that automated tools cannot do at
>all, when a brute force (i.e. view every page) approach is not an option.
There are open-source libraries for OCR (<http://jocr.sourceforge.net/>)
but I am not aware of efforts to use them in accessibility evaluation
tools (not even rough classification).
Best regards,
Christophe
From: Steve Green
Date: Wed, Jun 04 2008 6:30AM
Subject: Re: how to detect images having math expressions
Of course pages should be created and maintained so that they are
accessible. I hope none of us would disagree with that. However, the reality
is that there are millions of websites that were not created or maintained
that way. As an independent testing company that does not design or maintain
websites, 100% of our clients have existing websites that they wish to
assess and improve. Few if any are in a position to do a ground-up rebuild.
It is not helpful to say that an approach should not be considered just
because it is not 100% reliable. Nor is it helpful to state that sites
should be rebuilt or that every page should be tested. The question that we
and many other organisations face is how to make best use of the available
resources.
So I would be interested to know how you might approach the task of
assessing a 10,000 page website with a view to making the most beneficial
improvements within a budget and timescale that does not allow all pages to
be assessed. Or would you simply not undertake such a task, and leave it to
someone else to worry about?
Steve