Speech Processing - Getting on Speaking Terms With Computers:
Speech-recognition systems can be used to enter certain kinds and quantities of data. Successful speech-recognition systems are limited to accepting words and tasks within a relatively small domain. Despite these limitations, speech recognition has a number of applications. Salespeople in the field can enter an order simply by calling the computer and stating the customer number, item number, and quantity. Physicians in the operating room can request certain information about a patient while operating.
Steps in Speech Recognition:
a. Say the Word:
When you speak into a microphone, each sound is broken down into its various frequencies.
b. Digitize the word:
The sounds in each frequency are digitized so they can be manipulated by the computer. Speech-recognition systems actually recognize phonemes, unique sounds that are the basic building blocks of speech. Speech-recognition software identifies the phonemes and groups them into words.
c. Match the word:
The digitized version of the word is matched against similarly formed templates in the system's electronic dictionary. The digitized template is a form that can be stored and interpreted by computers (in 1s and 0s).
d. Display the word or perform the command:
When a match is found, the word is displayed on a monitor or the appropriate command is performed (for example, 'move' the marked text). In some cases, the word is displayed or repeated by a speech synthesizer for confirmation. If no match is found, the speaker is asked to repeat the word.
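The four steps above can be sketched in a few lines of Python. This is only a toy illustration, not how a commercial product works: a pure sine wave stands in for a spoken sound, a plain discrete Fourier transform breaks it into frequencies, and the "electronic dictionary" holds two made-up templates. All function names (spectrum, digitize, match, tone, recognize), the number of quantization levels, and the distance threshold are assumptions chosen for the example. A real system would match phonemes rather than single tones.

```python
import cmath
import math

def spectrum(samples):
    """Step a: break the sound into frequencies (plain discrete Fourier transform)."""
    n = len(samples)
    return [
        abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2)
    ]

def digitize(magnitudes, levels=4):
    """Step b: quantize each frequency's strength to a small integer."""
    peak = max(magnitudes) or 1.0  # avoid dividing by zero on silence
    return [round((levels - 1) * m / peak) for m in magnitudes]

def match(pattern, dictionary, max_distance=2):
    """Step c: find the closest template; None means nothing is close enough."""
    best_word, best_dist = None, None
    for word, template in dictionary.items():
        dist = sum(abs(a - b) for a, b in zip(pattern, template))
        if best_dist is None or dist < best_dist:
            best_word, best_dist = word, dist
    if best_dist is not None and best_dist <= max_distance:
        return best_word
    return None

def tone(freq_bin, n=16):
    """Synthetic stand-in for a spoken sound: a pure sine wave."""
    return [math.sin(2 * math.pi * freq_bin * t / n) for t in range(n)]

# Hypothetical two-word dictionary built from synthetic tones.
dictionary = {
    "move": digitize(spectrum(tone(2))),
    "exit": digitize(spectrum(tone(5))),
}

def recognize(samples):
    """Step d: return the matched word, or ask the speaker to repeat."""
    word = match(digitize(spectrum(samples)), dictionary)
    return word if word is not None else "please repeat"
```

Calling recognize(tone(2)) returns "move", while a sound that resembles neither template, such as tone(7), returns "please repeat".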
If you were to purchase a speech-recognition system for your PC, you would receive software, a generic vocabulary database, and a high-quality microphone with noise-canceling capabilities. The vocabulary may be only a few hundred words to enable navigation around Windows (exit, copy, drag, mouse, and so on, called reserved words) and spreadsheets, or it may be 100,000 words for legal or medical dictation. Once you have installed the hardware and software, you would need to train the system to recognize your unique speech pattern. We all sound different, even to a computer. To train the system, we simply talk to it for at least 5 minutes; the longer, the better. Even if we say a word twice in succession, it will probably have a different inflection or quality. The system uses artificial intelligence techniques to learn our speech patterns and update the vocabulary database accordingly.
The typical speech-recognition system never stops learning, for it is always fine-tuning the vocabulary so it can recognize words with greater speed and accuracy. Each user on a given PC would need to customize his or her own vocabulary database. Some speech-recognition systems are smart enough to recognize the speaker and switch to that person's customized vocabulary database for speech interpretation. To further customize our personal vocabulary database, we can add words that are unique to our working environment.
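The idea of a per-user vocabulary database that keeps fine-tuning itself can be sketched as follows. This is a simplified illustration under stated assumptions: the class name VocabularyDatabase, the function database_for, the two-number templates, and the learning rate are all invented for the example, and a simple running average stands in for the artificial intelligence techniques a real product would use.

```python
class VocabularyDatabase:
    """Hypothetical per-user store of word templates (lists of numbers)."""

    def __init__(self, generic_templates):
        # Start from a private copy of the generic vocabulary.
        self.templates = {w: list(t) for w, t in generic_templates.items()}

    def train(self, word, sample, rate=0.5):
        """Nudge the stored template toward a new sample of the user's voice."""
        if word not in self.templates:
            # A new word unique to this user's working environment.
            self.templates[word] = list(sample)
            return
        old = self.templates[word]
        self.templates[word] = [
            (1 - rate) * o + rate * s for o, s in zip(old, sample)
        ]

# Made-up generic vocabulary shipped with the software.
generic = {"copy": [1.0, 0.0], "exit": [0.0, 1.0]}
databases = {}  # one customized database per speaker

def database_for(speaker):
    """Switch to (or create) the named speaker's customized database."""
    if speaker not in databases:
        databases[speaker] = VocabularyDatabase(generic)
    return databases[speaker]
```

For example, after database_for("alice").train("copy", [0.0, 1.0]), Alice's template for "copy" moves halfway toward her sample, while the database for another speaker is left unchanged.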
Uses of Automatic Speech Recognition:
Can you imagine that one day you will sit in front of a computer to type a term paper, and instead of typing, you will actually speak your paper? That possibility is just around the corner. In fact, many people believe that Automatic Speech Recognition (ASR) systems will be standard technology on home computers within the next few years. That's only a small portion of the real potential of ASR systems. Imagine driving in your car and adjusting the temperature by simply saying, "Make it hotter," or watching television and saying, "ESPN" to switch the channel. This will become a reality in your lifetime. Not to be outdone, businesses are seeking innovative ASR implementations to gain advantage in the marketplace. Some of those organizations are listed below:
- Sprint, US West, Southwestern Bell, and many other telephone service providers already offer voice dialing to their customers. By simply saying "dad" or "pizza," your telephone will automatically dial the number from a list of predefined numbers.
- Kitchen-Aid recently demonstrated voice-controlled refrigerators, ovens, dishwashers, washing machines, and dryers. With a voice-controlled oven, for example, all you have to say is 'prime rib, 8 pounds,' and the oven will automatically set the temperature and notify you when dinner is ready.
- Thomas Cook Travel is working on a voice-controlled travel agency system that you can use over the phone. When you call for plane reservations, a computer will ask you for your destination and decode your response to determine where and when you want to go and when you want to return.
Many organizations are even exploring "interviewer-less interviews." With this type of system, marketing research firms will be able to perform telemarketing activities without human operators. The possibilities really are limitless; anything you can communicate by typing, pointing, or speaking can probably benefit from an ASR system.
Future of Automatic Speech Recognition:
ASR is an emerging technology because it has a long way to go before it becomes a standard business application. ASR will not become a standard business technology until the following conditions are met.
- Greater Storage for an Expandable Vocabulary:
Sounds, even when phonetically digitized, require more storage space than a word in text form. If you need an ASR system with a large vocabulary, you will need more storage for an audio model.
- Better Feature Analyses to Support Continuous Speech:
The most notable drawback to continuous ASR systems is their limited ability to distinguish words that are quickly and continuously spoken. One of the problems is that we tend to drop consonants when we speak, making it difficult for an ASR system to determine where one word ends and another begins.
- More Dynamic Language Models to Support Speech Understanding:
Speech recognition is great, but true speech understanding would be much better. For this to happen, language models that understand words in context must become more dynamic, understanding your words not only within the context of a sentence, but also in a paragraph or even in an entire conversation.
- More Flexible Pattern Classification to Support Many People:
For ASR to become truly viable in the workplace, a given system must be usable by anyone, in the same sense that anyone can use a keyboard or mouse. With the exception of speaker-independent systems, which usually have a limited vocabulary, ASR systems lack this quality. The production of ASR systems that can interpret the speech of anyone, even those suffering from a head cold or speaking in a phrase, will define the true success of automatic speech recognition in business.
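Two of the conditions above, finding word boundaries in continuous speech and using context to understand what was said, can be illustrated with a small sketch. The classic example is that "ice cream" and "I scream" are made of the same run of sounds. The lexicon below uses ARPAbet-style phoneme labels, and the function names (segmentations, pick_by_context) and the context scores in AFFINITY are made-up values for illustration only, not data from any real language model.

```python
# Toy pronunciation lexicon: word -> sequence of phoneme labels.
LEXICON = {
    "I": ("AY",),
    "ice": ("AY", "S"),
    "cream": ("K", "R", "IY", "M"),
    "scream": ("S", "K", "R", "IY", "M"),
}

def segmentations(phonemes, lexicon=LEXICON):
    """Return every way to split the phoneme stream into lexicon words."""
    phonemes = tuple(phonemes)
    if not phonemes:
        return [[]]  # one way to segment nothing: no words at all
    results = []
    for word, pron in sorted(lexicon.items()):
        if phonemes[:len(pron)] == pron:
            for rest in segmentations(phonemes[len(pron):], lexicon):
                results.append([word] + rest)
    return results

def pick_by_context(candidates, previous_word, affinity):
    """Choose the candidate whose first word best fits the preceding word."""
    return max(
        candidates,
        key=lambda words: affinity.get((previous_word, words[0]), 0),
    )

# One continuous stream of sounds with no pauses between words.
stream = ["AY", "S", "K", "R", "IY", "M"]
candidates = segmentations(stream)

# Invented scores: how well a word follows the previous word.
AFFINITY = {("eat", "ice"): 2, ("when", "I"): 2}
```

Here candidates holds both readings, "I scream" and "ice cream". With the previous word "eat", pick_by_context prefers "ice cream"; with "when", it prefers "I scream". A language model that looked at the whole paragraph or conversation, as the text suggests, would have far more context to draw on than this one-word lookback.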