Update on Speech Recognition

In the past year, the field of voice recognition technology (hereafter, speech recognition) has been completely revolutionized by the advent of continuous speech technology. Continuous speech recognition, which allows the user to dictate at a normal rate of speech, is leading the way for speech recognition to become a mainstream software item. Prices of speech recognition software have dropped by an order of magnitude, to the point where excellent programs can be purchased for under $100. Moreover, where this kind of software used to require special resellers who would not only sell it but also set up systems and provide extensive training, it is now available at your local software store. Television commercials suggest that anyone can use speech recognition simply by sitting down in front of a computer and talking, which is definitely something of an oversimplification.

By contrast, the older speech recognition technology, discrete speech recognition, on which this Web site and study are based, has always been an exciting technology that failed to break into the mainstream because of its inefficiency for most users: even at its best input rates, it could not come close to the rates of most typists in typical business settings. However, discrete speech technology had a tremendous impact on the "disabilities market," especially among people with physical problems that interfere with their ability to use the keyboard. It was also being discovered by a growing number of users whose other kinds of disabilities interfered with their ability to write.

The study described in this Web site was conceived and funded before continuous speech recognition was even commercially available, and so was initially based on discrete speech systems, especially DragonDictate for Windows, whose interface is far more satisfactory than those of other discrete speech systems for most individuals with physical and learning disabilities. Now that continuous speech recognition software is readily available and inexpensive, we have had to address this technology alongside discrete speech. Within the confines of our study's budget, our efforts in this regard have been limited. However, based on our explorations of continuous speech with a few students, and on our own impressions after years of watching individuals with disabilities use discrete speech recognition software, we have serious questions about the utility of continuous speech recognition for many individuals with disabilities. Our concerns are especially strong for children who are less experienced in writing because of their age, and for those who have difficulty with the language-based aspects of writing.

Using continuous speech software imposes a relatively high cognitive load. In other words, it is relatively complex to use, for the following reasons:

1. Language processing

Continuous speech recognition requires that the individual speak in "chunks" of language, from phrases to whole sentences; the longer the chunk, the better the software recognizes individual words. As a result, changing one's mind about the wording of an utterance, or about the point one is making, can require a fairly elaborate sequence of internal evaluation and reformulation in the midst of dictating.

2. Pronunciation

Continuous speech requires that the person enunciate each word within the continuous stream of speech with relative precision; that is, they must pronounce words quite clearly. This may not seem like much of an issue, because the human brain has little difficulty understanding the imprecisions of normal continuous speech, and we rarely think about our pronunciation when we talk to others. This is not true when speaking to a computer.

3. Monitoring

The person may already be at the end of a spoken sentence before the words from its beginning start to appear on the screen, which makes monitoring one's own performance very difficult and can be confusing.

4. Learning to write

Because mature users are generally already familiar with the differences in voice and style between speaking and writing, they can adjust their speech-like dictation to reflect their intention to create a written document. This adjustment is much more difficult for young, inexperienced users and for users with learning disabilities.

5. Error correction

Errors often must be corrected not just for single words but for groups of words that the software has misidentified, and determining which words need correcting can be tricky.

6. Voice models

Voice files in continuous speech technology are developed from standard adult voices speaking long, complete utterances. These models are likely very different from the speech patterns of children and of people with disabilities.

In each of the areas identified above, discrete speech technology has some advantage over continuous speech recognition for the population of interest in this study. Discrete speech slows down the dictation process, so that difficulties in language processing, pronunciation and monitoring are much more manageable by the inexperienced writer. In addition, the word-by-word style much more closely reflects the text creation style of beginning writers, and makes error correction more comprehensible. Finally, the voice models in most discrete speech systems are based on single word pronunciations, not continuous speech models, so that system adaptation to the younger user's voice, while not always straightforward, can be more manageable.

Continuous speech recognition has only two real advantages over discrete speech recognition for the typical user: speed and accuracy of input. While obviously important, the speed of input is itself one of the sources of difficulty in language processing and pronunciation. Moreover, in our experience, students with disabilities who become fluent users of discrete speech recognition make significant gains in rate of production and spelling accuracy compared with their usual writing methods. The accuracy of continuous speech recognition is enhanced by the much stronger contextual clues that longer utterances provide for identifying words, but this benefit would likely be lost on most younger or language-disabled users.

We already know that discrete speech recognition can be very successful for many learning disabled youngsters (see the accompanying Web pages). Our advice at this point to anyone introducing speech recognition to younger or disabled users is not necessarily to avoid continuous speech recognition software, but to be aware of its potential pitfalls and, if it is not successful, to consider discrete speech software as an alternative. The danger of trying continuous speech recognition first is that, if it is unsuccessful, it may sour the hopeful user on trying the alternative; if that user is already a "failed writer," this may be one failure too many.