
Spotlight on Voice Recognition





THE FUNDAMENTALS OF VOICE RECOGNITION

1. What is voice recognition technology?

Voice recognition is a computer application that lets people control a computer by speaking to it. In other words, rather than using a keyboard to communicate with the computer, the user speaks into a microphone (usually on a headset) that is connected to the computer.

By speaking into the microphone, users can do two things. First, they can tell their computers to execute commands such as opening a document, saving changes, deleting a paragraph, or even moving the cursor, all without touching a key. Second, users can write by using voice recognition in conjunction with a standard word processing program. When users speak into the microphone, their words appear on the computer screen in a word processing document, ready for revision and editing.

2. How does voice recognition work?

First, to operate a computer by voice, the user must learn to dictate in a word-by-word manner known as "discrete speech." Current systems cannot recognize individual words when they are spoken the way people usually talk, in fluent sentences or "continuous speech." Next, the user must "teach" the system to recognize his or her voice through a combination of training and usage. Because we all pronounce words differently, voice recognition software cannot recognize everyone's voice right off the bat.

As the user speaks to the system, the software creates a user-specific voice file that contains detailed information about his or her voice qualities and pronunciations. The system uses this information to make its best guess at each word as it is dictated.

The process of "familiarizing" the voice recognition software with an individual voice takes time. When a user takes the time to train and use the system properly, building a strong and accurate voice file, the system will supply the correct word most of the time. However, no system achieves 100% accuracy in all situations. Sometimes the computer simply suggests the wrong word, and the user must stop and correct it.

3. What happens when the computer does not recognize a dictated word correctly?

Because the computer "knows" it occasionally makes mistakes, it generates a list of alternative words along with each best guess it offers. In some voice recognition programs, this list appears in a suggestion window on the screen and changes with each dictated word. If the desired word appears in the list, the user can correct the mistake by choosing it.

If the correct word is not in this list of alternatives, the user can spell it aloud, letter by letter, or begin typing the letters on the keyboard. The computer will use this information to predict the right word.
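The correction process described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not how any particular product works: the probability scores, the word lists, and the function names `alternatives` and `narrow_by_prefix` are all invented for the example. It assumes the recognizer has already given each candidate word a score, and that spelled or typed letters are matched first against the user's active dictionary and then against the larger background dictionary.

```python
# Hypothetical sketch of the correction process: a ranked list of
# alternatives, then narrowing by spelled or typed letters.
# Scores and word lists are invented; real products use their own models.


def alternatives(candidates: dict[str, float], n: int = 5) -> list[str]:
    """Return up to n candidate words, most likely (the best guess) first."""
    return sorted(candidates, key=candidates.get, reverse=True)[:n]


def narrow_by_prefix(
    prefix: str, active: list[str], background: list[str], n: int = 5
) -> list[str]:
    """Offer words that begin with the letters spelled or typed so far.

    Assumption: words from the user's active dictionary are offered first,
    and the larger background dictionary only fills the remaining slots.
    """
    prefix = prefix.lower()
    matches = [w for w in active if w.lower().startswith(prefix)]
    for word in background:
        if len(matches) >= n:
            break
        if word.lower().startswith(prefix) and word not in matches:
            matches.append(word)
    return matches[:n]


# Example: the best guess is wrong, so the user starts spelling "t-h-e-s".
scores = {"their": 0.5, "there": 0.3, "they're": 0.15, "the": 0.05}
active_words = ["their", "there", "then"]
background_words = ["theme", "theory", "thesaurus"]
print(alternatives(scores, 3))  # ['their', 'there', "they're"]
print(narrow_by_prefix("thes", active_words, background_words))  # ['thesaurus']
```

Each additional letter shrinks the list, which is why a few spelled letters are usually enough for the system to offer the intended word.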


4. What exactly constitutes a voice recognition system?

A voice recognition system is made up of a computer with system software, voice recognition software, a microphone, and usually a sound card. To use voice recognition for word processing, a word processing program is also needed. Each voice recognition program has its own hardware requirements, but generally speaking a relatively powerful computer is needed, typically one with a Pentium or a very fast 486-based CPU and at least 16 MB of RAM.

In general, the voice recognition software itself is built on three parts: a large electronic dictionary (e.g., a 150,000-word dictionary from a publisher such as Merriam-Webster), a smaller active dictionary that reflects the user's own usage, and a voice model.

5. How do voice recognition systems differ from one another?

Voice recognition systems vary along several parameters. First, dictation systems come with vocabularies of varying size. Large dictation systems generally have active vocabularies of 20,000 words or more and let users enter large amounts of text with varied vocabulary. Smaller dictation systems may have vocabularies of 10,000 words or fewer.

Second, there are command and control systems, which are not designed for text entry but for controlling applications on the computer or for using the computer as a voice-activated environmental control unit for other electronic devices (e.g., answering the phone, turning on the VCR). These systems typically have vocabularies of about 1,000 words or commands.

Third, systems are sometimes described as "speaker dependent" or "speaker independent." Speaker-dependent systems cannot be used at all until they have been trained to recognize an individual speaker's voice. Speaker-independent systems claim generally high recognition rates "right out of the box," without training. Some training is still required, however, to achieve an acceptable level of accuracy.

6. Aren't voice recognition systems prohibitively expensive?

When I first started using and teaching about voice recognition in 1988, the basic system cost $9,000, not including the computer itself, which had to be a relatively powerful, and therefore costly, machine. Fortunately, the cost of both the hardware and software has dropped dramatically and there are more choices. There are still some powerful voice recognition options that cost thousands. However, it is now possible to purchase a beginning-level voice recognition system for $100 or less, and these will run on a basic multimedia computer. Once an individual has the computer itself, voice recognition is no longer an unusually costly option.

7. Which are the leading voice recognition systems on the market?

For users of IBM PCs and compatibles, the three leading voice recognition systems are DragonDictate, IBM VoiceType, and Kurzweil Applied Intelligence. For Macintosh users, the primary system is Power Secretary.


8. Which system do you think is the best?

As with all technological solutions, there is no single "best" system; the right choice depends on the user's needs and the available resources. If the user needs 100% hands-free access to the computer, DragonDictate is, as of January 1997, the only program that offers this feature without patching several programs together. Otherwise, the preferable system depends on factors such as the type of computer you prefer or have access to and the kind of support available for that system.

9. How fast can a person "type" or input text using voice recognition?

Once again, this varies greatly from user to user and, as with typical keyboarding, depends largely on the amount of training and practice a user has had. Single-word dictation rates of up to 90 wpm have been reported, but the average rate for typical adults after training is probably anywhere from 45 to 65 wpm. Speeds may be significantly lower for students with disabilities, depending on their oral formulation skills and the consistency of their articulation. Word rates can be increased significantly by using macros in predictable dictation situations. For example, a user can set up a macro that enters his or her full name and address with one voice command.
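A voice macro amounts to a lookup table from a spoken command to stored text. The Python sketch below is hypothetical: the command names, the sample address, and the functions `handle_utterance` and `dictate` are invented, and it assumes the recognizer has already matched each utterance to a string. It illustrates why macros raise word rates: three utterances here produce ten words of text.

```python
# Hypothetical sketch of voice macros: one spoken command expands into a
# stored block of text. Command names and text are invented examples.

MACROS: dict[str, str] = {
    "insert my address": "Jane Doe\n12 Main Street\nAnytown, MA 01234",
    "insert closing": "Sincerely,\nJane Doe",
}


def handle_utterance(utterance: str, document: list[str]) -> None:
    """Append the macro's text if the utterance is a macro command,
    otherwise append the recognized word itself."""
    key = utterance.strip().lower()
    document.append(MACROS.get(key, utterance))


def dictate(utterances: list[str]) -> list[str]:
    """Process a sequence of recognized utterances into document text."""
    document: list[str] = []
    for utterance in utterances:
        handle_utterance(utterance, document)
    return document


doc = dictate(["Dear", "Sir", "insert my address"])
word_count = sum(len(chunk.split()) for chunk in doc)
print(f"3 utterances produced {word_count} words")  # 3 utterances produced 10 words
```

The benefit grows with how predictable the text is: a letter closing or an address block is a good candidate, while free composition still proceeds word by word.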

10. Will voice recognition systems ever be able to recognize continuous speech?

As discussed earlier, current voice recognition systems require the user to dictate in "discrete speech" (i.e., inserting a brief pause between words). Several manufacturers are working on "continuous speech" systems that will be able to recognize continuous strings of words, more like typical speech. Several current systems already offer a "continuous number generator," which lets the user enter a string of numbers without inserting pauses.

POTENTIAL USERS

1. How can voice recognition benefit students with physical disabilities?

Some students and adults have physical disabilities that preclude their using a standard keyboard or mouse effectively. For these students, voice recognition is one of several alternative input methods to be explored. Voice recognition may provide a more efficient means of controlling a computer that is less physically and cognitively taxing than other alternative input methods.

Other students may seem able to use the keyboard but have subtler physical difficulties that make voice recognition a more attractive option for them.

Take, for example, Jason, a 19-year-old who sustained a head injury in a boating accident at the age of 14. Jason suffered a significant impairment in his production of oral language, known as "aphasia," characterized mostly by great difficulty recalling words and formulating sentences. In addition, he incurred a variety of other cognitive impairments, as well as subtle physical difficulties, including a difficulty with intentional movement called "apraxia," which limited his ability to gain facility with the keyboard.

When we saw Jason, he was two and a half years post-accident and making considerable progress in regaining language. However, he had been diagnosed with "dyslexia" before his injury, and it had already affected his ability to read and write. So while he was overcoming the aphasia and apraxia, he still had dyslexia, and together these made written language production very difficult for him.

Jason had already had an assistive technology consult elsewhere and had been using word prediction, but with little apparent success or interest. We explored it again, looking at different and slightly newer programs, but found that he frequently lost his train of thought as he coped with the multiple demands of formulating and remembering a sentence, locating the desired key on the keyboard, beginning to spell individual words, locating them in a list, looking back and forth from the keyboard to the monitor, etc.

We then presented voice recognition with synthetic speech readback of the text Jason had created. In the highly supported evaluation setting, it worked very well for him: he could keep his attention focused in one place for much of the time, the preferred word choice was usually offered first, and so forth.

Based on our recommendation, Jason's parents and school district collaborated to purchase a voice recognition system on a notebook computer for him so that he could work at home and at school. At school he did his dictation with his tutor/aide in a resource room, where it was relatively quiet compared to many other environments in the school. The school also placed another voice recognition system in Jason's classroom for other students to be trained on and to use for some writing. As of this year, up to five students in the school are beginning to use voice recognition for writing. Jason graduated last year and has gone on to an art college in another state, where he continues to be a successful and increasingly independent voice recognition user.


2. How can voice recognition benefit students with learning disabilities?

Voice recognition technology can benefit students who have learning disabilities that interfere with their ability to spell and write. While many such students benefit from standard word processing, the visual-motor demands of keyboarding can be a major stumbling block that compounds their writing difficulties. Similarly, the poorest spellers are frequently unable to use standard spell checkers effectively. Whatever the reason, if students' oral language skills far outstrip their ability to generate text with pencil and paper or standard word processing, voice recognition may enable them to become accomplished writers by circumventing the most frustrating aspects of text generation.

Take, for example, Sara, a 15-year-old sophomore in high school. Sara is a very bright young woman with a learning disability in the area of written language. Like many students with written output difficulties, Sara has the "gift of gab," and readily provides vivid oral descriptions and explanations. Unlike many such students, Sara loves to read and has always been reasonably successful at it. Writing has been a different story for Sara. Her spelling is idiosyncratic at best, and her handwriting is very labored and difficult to read.

I first saw Sara as a fifth grader, after her parents had already purchased a computer for her in hopes that it would address her writing difficulties. The purpose of this visit was to address issues about using the computer in school. However, we quickly discovered that Sara was still struggling. As bad as her handwriting was, it was still faster than her ability to use the keyboard, and she did not have the patience to plod along in her "hunt and peck mode." Despite several months of keyboarding instruction in a computer lab at school, Sara was still struggling with learning key locations.

The computer provided little support in spelling as well. Her attempted spellings were so discrepant from the correct form that they foiled regular spellcheckers. Despite recommendations for training and support, by the end of fifth grade, Sara was not progressing in using the computer, and was getting ever more discouraged about school. Her preferred mode was to write as little as she could, and if possible, not at all.

We decided to launch a series of trial sessions with voice recognition over the summer. Within two sessions, Sara had begun to tell a yarn that would eventually spin out over the summer into a 10-page neighborhood epic. She was very enthusiastic and felt she had found the answer to her problems. Unfortunately, voice recognition systems at that point cost thousands of dollars and required a different computer than Sara had access to at home or at school. Two years passed as Sara became more discouraged about school and recommendations for the system fell on deaf ears at the school department. Eventually Sara's parents were able to secure a system for her to use at home. Sara learned quickly and once again her natural writing talents came to the fore. At the end of that year, Sara was one of two school-wide recipients of a coveted creative writing award.


3. If learning disabled students use voice recognition for writing, are they still able to use other methods?

Certainly. Usually, learning disabled students who use voice recognition are only able to do so in certain circumstances and therefore must use other methods of writing at other times. However, they often come to view voice recognition as their text-entry method of choice whenever they have a chance. Moreover, for some students, using voice recognition for writing enables them to regain confidence in themselves as writers, and in turn to persevere with other writing methods.

Ben, a 17-year-old boy, is a clear example of this phenomenon. He is a very bright youngster who came to our program during fifth grade for an assistive technology consult because of his increasing frustration with getting his ideas down in writing. He had used the computer and word processors for a couple of years, but he still was not working at a pace that satisfied him. His parents, who brought him to the session, were very concerned about his frustration and his perception of himself as unsuccessful and even incapable. They told me later that Ben, who had always loved school, had grown to dread this daily, negative experience because it reinforced his image of himself as a poor writer. Even at that young age, Ben had expressed a desire to quit school.

We looked at a number of "lower-tech" options, but none of these worked for Ben. He simply could not manage to write efficiently enough, even with the benefit of word prediction. However, when he first tried voice recognition, it was like watching a light go on over his head. He and his parents were immediately very excited by the potential they saw in this system, and they went about obtaining a system on their own that Ben could use at home.

Ben used voice recognition throughout sixth and most of seventh grade. At the end of seventh grade, two critical events occurred: he got a terrible head cold and his voice changed! During this period, the voice recognition software had great difficulty understanding his changed voice, and Ben found himself typing many corrections for the software. Despite his frustration, Ben learned to type in the process of correcting the software. In fact, during that time, Ben became so proficient with the keyboard that he dropped voice recognition altogether.

It has now been more than two years since Ben stopped using voice recognition, and he has successfully maintained his transition back to typing. He now attends an academically challenging high school in the Boston area and is doing very well. In his own estimation, Ben thinks he is an "average" writer among his peers.

Despite the fact that Ben's family bought the voice recognition system when it was still fairly expensive, they think it was money very well spent. His father said that using voice recognition "saved Ben's life" in the sense that it kept him from giving up in school.


4. Is voice recognition appropriate for all students with writing difficulties?

No. Voice recognition IS a promising technology, but like all other technological solutions, it is not necessarily appropriate for every student who experiences difficulty with writing. In exploring the use of voice recognition technology by a particular student, one should consider several skill areas that come into play:

Cognitively, students are asked to attend to several tasks at the same time. For example, students must be able to compose orally while operating the system through oral commands. They must be able to tell which aspect of the program is voice recognition and which is word processing. In other words, students will most likely fare better if they are somewhat flexible in their thinking and are able to juggle several tasks at once.

Linguistically, students must eventually understand the differences between written and spoken forms of language so that they can adopt a more formal style of speaking when they write. They must be able to dictate in a word-by-word manner and simultaneously monitor both their written language and the system.

Academically, students must have sufficient word reading skills to accurately read alternative word lists and distinguish between visually similar words. They must be able to detect when the system makes a mistake. And, they must have sufficient phonetic spelling skills to prompt the system to generate the correct word when it has made a mistake.

Behaviorally, students must be motivated to learn the system and improve their writing skills. They must persevere through training and accept that they use a methodology different from the one most of their peers use. If students bring a positive attitude to the process, they can help themselves a great deal.

5. Do students need to have all the skills you mentioned in the preceding question to be able to use voice recognition?

Not necessarily. Most individuals who are strong in the skill areas I've mentioned will likely be able to write independently after being properly trained in the use of the system. However, if a student is weak in one or two of these areas, he or she may still be able to become proficient with voice recognition and derive significant benefit from it. In this case, however, more intensive instructional support will be needed, particularly in the early stages of training and use. As a clinician, I rarely discount any individual as a potential user based solely on his or her cognitive and language profile. A wide spectrum of individuals can use voice recognition if enough external support is provided.

6. How can one best determine whether or not an individual student can use voice recognition?

To rule voice recognition in or out, the student must have the opportunity to try voice recognition, perhaps over several sessions. If a school has purchased a system for multiple users, appropriate students can experiment with the approach in this setting. Alternatively, this exploration can be done with the help of an assistive technology evaluation team in a clinical setting or a person who routinely trains users. In either case, be cautious when working with trainers who also sell the software, because their assessment of the student's potential may be colored by their desire to sell the product.

7. Can students with speech impairments use voice recognition?

Some students with physical disabilities may also have labored or inconsistent speech. Even though speech impairments may complicate the picture, they do not necessarily preclude the student's using voice recognition. More likely, students with speech difficulties will need to spend more time training the system than students without such impairments.

8. Is there any research on who are the best potential users?

Because educational research on the use of voice recognition technology is in its infancy, very few studies exist to date on the possible benefits of this technology for students with disabilities. One promising study (Higgins & Zvi, 1995) at California State University, Northridge explored the performance of learning disabled college students using voice recognition technology to complete the university's written proficiency exam. With the use of this innovation, the learning disabled students achieved the same distribution of scores on the exam as their nondisabled peers. With a human transcriber's assistance or with no assistance at all, these same students' score distribution fell below that of their nondisabled peers.

Another exploratory study (Wetzel, 1996) focused on a single subject, a sixth-grade student with learning disabilities. Wetzel was interested in whether middle school students could learn to use a voice recognition system, in this case IBM VoiceType, and whether this system would enhance their communication skills. Wetzel found that the student was able to learn to use the software, but that difficulties with the system's recognition accuracy and the complexity of editing compromised this student's success. This early research points to some of the difficulties in using this technology with students who have disabilities as well as to the potential benefits. For example, because the technology was developed with adult voice models, the software is not as proficient at recognizing the speech of prepubescent youth. The research also suggests that younger students may struggle to a greater degree with the cognitive demands of composing orally while also giving the computer oral directions.



References:

Higgins, E.L., & Zvi, J.C. (1995). Assistive technology for postsecondary students with learning disabilities: From research to practice. Annals of Dyslexia, 45: 123-143.

Wetzel, K. (1996). Speech-recognizing computers: A written-communication tool for students with learning disabilities? Journal of Learning Disabilities, 29(4): 371-380.

TRAINING

1. How important is training?

Proper training is critical. A solid training foundation is the key to on-going success with voice recognition for all users regardless of skill or age. There are in fact two aspects of training with voice recognition.

First, the voice recognition system itself must be properly trained to recognize the student's words. Much like a speaker and listener who know the same language but have widely differing accents, the software tries to accustom itself to the user's voice. The goal is for the software to understand any word the user says, even one he or she has never said to the system before; the user does not have to say every word before the system can understand it.

As we discussed earlier, the software gets accustomed to the user's voice by building an individual model that is modified with every utterance. This model helps the software predict what word to display from the active dictionary with every subsequent user utterance. The better the model, the better the prediction, so that if the software is used correctly, prediction improves with increased usage.

Therefore, the trainer should help the student gain a general understanding of how the voice recognition software works, so that he or she understands the importance of proper usage.

This brings us to the second aspect of training: the student must be trained in all aspects of the system that he or she needs to know. All users, especially younger ones, must be properly trained in the process of saying and selecting words. Users must also learn how to correct mismatches between their spoken words and the software's predictions. Beyond this, some students may also want or need to learn how to spell by voice, give voice commands to the computer, or even operate the mouse by voice in order to play their favorite game.

It is also critical that the parents, teachers, tutors, or aides who work most closely with the student during writing attend and observe some of the initial training. If these adults have an opportunity to try the system themselves, they can gain insight into the student's needs during use.

2. Is training school-age students different from training adults?

Yes. Current training protocols and materials developed for voice recognition are designed to move adults toward independence with the system at a relatively fast clip and provide few accommodations for individual differences. In my experience, younger students can rapidly become overwhelmed during the training process unless modifications are made. In fact, training goals and methods need to be reconceptualized for students, and a slower, more incremental approach is often more successful with this population.

When initiating voice recognition training with students, trainers should consider building knowledge and mastery of three distinct, but interrelated aspects of the task.

First, students must learn how to dictate in a word-by-word manner and at the same time maintain some vocal consistency. To accomplish this, students could dictate in a "free-writing mode" or even practice this skill while reading from a text.

Second, students must learn how to master the voice recognition program itself. This involves learning a relatively complex set of commands and editing procedures that must be applied in conjunction with the word processing program. Again, the composition tasks should be relatively simple while the student is learning to operate the system.

Third, as students gain mastery in using the system, they can begin the task of becoming better writers. Students who have struggled with writing do not automatically become accomplished writers with voice recognition. Students will continue to need help with such skills as idea generation, organization, grammar, and vocabulary.


3. What might a voice recognition training sequence for younger students look like?

The process of teaching students to use voice recognition must be individualized to their own learning needs and style. Remember that we are talking about students who may be skeptical of their own abilities and who may lack experience in writing. However, experience and common sense about teaching suggest that the process should usually adhere to some variation of the following steps:

  1. The student observes the evaluator or trainer inputting text using a few of the most basic procedures, such as word-by-word dictation, selection of alternative words, spelling to generate additional choices, and basic error correction.

  2. The student undergoes enough initial training for the system to create an initial voice file.

  3. The student undergoes additional training to facilitate accurate dictation.

  4. The student is prompted to compose a single, simple sentence (e.g., "I like to go snowboarding and skateboarding."). This is done with the voice recognition system turned off. The student says the sentence aloud so that the trainer knows what is going to be said.

  5. Using a word processor appropriate to the student's developmental level and interests, the student begins word-by-word dictation while the evaluator attends to all other operational matters, such as using the keyboard, watching the suggestion window, and making alternative selections. The goal at this point is for the student to dictate one sentence so that the system becomes familiar with the words.

  6. The student dictates the same sentence once or twice more, which allows him or her to experience a greater level of fluency.

  7. The student and trainer decide on a second sentence which uses some of the same words (e.g., "Snowboarding and skateboarding are popular sports.").

  8. The student dictates the new sentence, and the preceding steps are repeated, with the student gradually taking on responsibility for operating the system.

4. Once the voice file has been set up, how does the student learn to operate the system?

As the student tries new sentences and gradually assumes responsibility for more functions in the software, the trainer should carefully introduce the following sequence of steps for operating the system. This is the sequence that we use:

  1. dictation only

  2. dictation plus selection of words from the list of alternatives

  3. dictation and selection plus spelling to train new words or elicit them from the background dictionary

  4. dictation, selection, spelling plus error correction

  5. dictation, selection, spelling, error correction, plus...(At this point, the sequence can be customized even more to the individual student's needs. For example, does he or she need to use voice to control the mouse or access the menus?)

IMPLEMENTATION

1. Where should a system reside? At school, at home, or both?

Once it has been determined that voice recognition is a good fit for a particular student, the issue of where to place the system arises. Ideally, the system should be placed in the environment where it will be most effective in meeting the student's writing goals.

Most secondary students that we know use their voice recognition systems at home. This makes sense for several reasons: students generally have larger blocks of time to write at home than during school; the potential to find a relatively quiet spot for dictation may be greater at home; and students can work at home without concern over how they might appear to others while dictating.

On the other hand, many students who will benefit from voice recognition are not experienced writers; they may need considerable instructional support while they compose. A tutor working with the student at home can help remedy this problem to some extent, but is rarely available for the entire dictation session.

For this reason use of voice recognition in school is also an important option to consider. Jason used a notebook computer and carried his voice recognition system back and forth from home to school. At school he worked in the resource room while writing or dictating, with his instructional aide nearby to provide any assistance or guidance needed. (Note that, based on the school's experience with Jason, five other students now have access to voice recognition systems that they also use in the resource room.)

2. What about the noise factor at school?

Yes, excessive background noise can be a problem. To use voice recognition in the school setting, the student needs a relatively quiet, more or less private place to work. Although voice recognition is used in office environments and the software offers a number of internal settings that help compensate for ambient noise, middle and secondary school students can generate a lot of background noise, so placement must be considered carefully. Voice recognition systems are sometimes placed in resource rooms, but a corner of the library or a station in the computer lab can also serve the purpose under the right circumstances (e.g., during a quieter, lower-use period).

3. Who should provide ongoing training and technical assistance?

At least one person in the educational or home setting should have a deeper technical knowledge of the system so that he or she can provide ongoing training and technical assistance to the student without having to depend on outside trainers or consultants indefinitely.

4. How will students and teachers in the school setting react to a student using voice recognition?

All teachers and service providers working with the student should have a fundamental understanding of voice recognition and of what it does and doesn't provide the student. This became a problem for Sara, who used the system at home. When Sara brought in papers that she had written with voice recognition, her teacher, who assumed that Sara's system would automatically correct any mistakes she made, was surprised to see occasional grammatical errors.

Peer acceptance of this technology is generally fairly high, especially if the student is perceived as successful or capable in other ways. However, the perceptions of others rarely matter as much as the student's own perceptions of his or her abilities and potential when using the technology. We have not found any single approach that works in this regard, other than general sensitivity to the issue and helping the student find the location in which he or she feels most comfortable working.

5. What about ongoing instructional support?

We have found that even if students do not need help in producing their first drafts by voice recognition, they usually require some additional support in editing and revising. This is particularly true of students who have an aversion to writing and therefore have little experience with editing and revision. As a consequence, these students have often missed many of the incremental writing skills that are usually learned in the earlier grades.

As a result, many students who are successfully using voice recognition to create lengthy first drafts (often for the first time) need help knowing how to revise and edit these texts. They should not be cut off from individual instructional support simply because they are using a voice recognition system.

6. What are the implications for the teaching of writing and the curriculum?

Teachers may need assistance in thinking through the implications of voice recognition technology for the teaching of writing at the classroom level and for writing as it is integrated throughout the curriculum. With careful planning, voice recognition can be used to support the various stages of the writing process (brainstorming, outlining, drafting, revising, editing, and publishing). Voice recognition software generally provides a way to segment the vocabulary so that the system can be fine-tuned for specific writing assignments in subjects such as history or science.

