E-mail List Archives

Re: Well formed versus Valid code

From: Phil Teare
Date: Feb 26, 2007 6:20AM


Sorry didn't realize Josh and I had gone 'off forum' for a bit (just kept
hitting reply, thinking I was answering you too)...


> Well (from memory), the spec says to either encode the ampersands, or
> use semi-colons (which very few people do), so it should be on the
> authoring tool and/or server config really.

Agreed, but in practice it'll be me who has to clean up after those who
don't, so I'll keep believing it's up to me (and other AT devs) for the bits
I can possibly do something about...
My suggestion (and that's all it is) is that people put code snippets in a
headed div

<h3> CODE </h3>
<div>
<code>Evil marked up markup</code>
</div>

As this allows the user to avoid the code altogether, more easily
(especially if the div is styled and the user has some sight - most do).

Just: "Text you want to read <code> encoded stuff that could adjust your AT
settings or just babble crap for 5 mins... </code> More stuff you want to
read"
is not good. More below as to why.
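
As a rough illustration of the headed-div suggestion above, here is a minimal Python sketch that encodes a snippet and wraps it in a labelled container. The function name headed_code_block and the default heading text are hypothetical, chosen only for this example; it is one way an authoring tool might do it, not a description of any particular product.

```python
from html import escape


def headed_code_block(snippet: str, heading: str = "CODE") -> str:
    """Encode a code snippet and wrap it in a headed div.

    Hypothetical helper: mirrors the <h3>/<div>/<code> layout suggested
    in the post so a listener can skip the whole container.
    """
    # quote=False encodes only &, < and >, leaving quotes readable.
    encoded = escape(snippet, quote=False)
    return (
        f"<h3>{escape(heading)}</h3>\n"
        "<div>\n"
        f"<code>{encoded}</code>\n"
        "</div>"
    )


print(headed_code_block('<a href="x?a=1&b=2">link</a>'))
```

The heading gives the listener a warning before the snippet starts, and the div gives a single element to skip or style.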

Josh had asked if it was only Talklets that was affected (poss why he
politely took it off forum) and I've said no, I doubt it. Because of the
complexity of possibilities that could arise from partial code snippets and
the fact that most issues are engine dependent, not reader dependent, code
snippets, even if encoded, are likely to cause issues with all AT, depending
upon which engine the user uses, its settings, the code snippet, as well as
the AT's methods of parsing and passing:

Part of my reply [slightly edited]:

> *"e.g. if any of the code snippet looks like the TTS engine's take
> on SOAP, SAMPA, IPA, or more likely its own proprietary XML based insertable
> command strings (for a section to be read faster, louder, by a different
> voice etc...), then you've got an issue. If the TTS engine has its own
> 'on/off switch' for reading XML containing strings at all (many do, and
> they're usually proprietary methods not supported by all AT), then you have a
> problem. If you strip everything XMLish, then the code will hardly be read,
> and thus not make sense. If the code is malformed and you try to strip it,
> you'll have a problem. There are tricks, like converting "<" to " angle
> bracket ". But obviously this puts a stop to any hope of using the engine's
> nice XML based command string features... the issues are complex, endless and
> often impossible to entirely resolve for all TTS engines."*
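
The "angle bracket" trick mentioned in the quote can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name verbalise and the SPOKEN_NAMES table are hypothetical, and only the "<" to " angle bracket " mapping comes from the text above; the other spoken names are invented for the example.

```python
import html

# Hypothetical spoken names. Only "<" -> "angle bracket" is taken from
# the post; the rest are assumptions made for this sketch.
SPOKEN_NAMES = {
    "<": "angle bracket",
    ">": "close angle bracket",
    "/": "slash",
    "&": "ampersand",
    "|": "pipe",
}


def verbalise(snippet: str) -> str:
    """Replace markup characters with words before text reaches a TTS engine."""
    # Decode entities first so "&lt;" and "<" are spoken the same way.
    decoded = html.unescape(snippet)
    # Pad each spoken name with spaces so it never fuses with its neighbours.
    spoken = "".join(
        f" {SPOKEN_NAMES[ch]} " if ch in SPOKEN_NAMES else ch
        for ch in decoded
    )
    # Collapse the runs of whitespace the padding introduces.
    return " ".join(spoken.split())


print(verbalise("&lt;b&gt;hi&lt;/b&gt;"))
```

Because every "<" is spelled out, nothing in the snippet can be mistaken for an engine command, which is also why this approach rules out using the engine's own XML command strings on the same text.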


Sorry if this is slightly tangential from the original issue, but does it
make sense? How would others see this better dealt with?

Cheers
Phil

Another reply I thought was to you guys too:


> Sorry. Yes, I'll clarify...
>
> When using & or &amp; or numeric Unicode representation even, the system
> had bugged in several different ways. Not reading that which came after (due
> to the system thinking the string variable to be read had finished and the
> next variable was being passed), misreading it (as 'amperes' - the most
> recent issue spotted - and fixed), etc....
>
> All of these I'm happy to call bugs, and rest on my head to solve (which I
> hope I have).
>
> However! Angle brackets, slashes,
>
> > && and ||
>
> etc... encoded or otherwise, can simply be too complex to combat, as the
> permutations are endless... Sure, you can find a solution to fix any one
> issue that is found, similar to the above, but in practice, not every one.
>
> So if you are putting code or markup on a page (obviously most of the time
> you'd have to encode this, as otherwise it'd be rendered), the best thing to
> do IMO would be to put it in its own container (most likely a div). That
> way, if it messes with the AT, at least you can read the rest of the page/
> adjoining elements and you'll know why it may be reading badly, or not at
> all (because it'll be headed 'Code').
>
> So... For small common instances of special characters (e.g. &) just encode
> them and use them as you wish, and we (the AT Devs) will try to cope. BUT if
> you're marking up markup or script (which you'll prob have to encode anyway)
> please stick it in a div and label it code (as is often the case on BBs and
> such anyway). I'm sure this could be refined, but it's better than nothing.
>
>
> Is that clearer? Hope so.
>

--
Phil Teare,
Technical Director & Lead Developer,
http://www.talklets.com from Textic Ltd.
(44) [0] 77 68479904