In the past few years the DoD has placed an emphasis on Command, Control, Communications, Computers, and Intelligence For the Warrior (C4IFTW). C4I is the future for all the military services and is playing a major role in the planning of future capabilities, makeup, and budgetary issues within DoD. A major factor in C4IFTW is the interface between man and computer. One of the technologies that is "coming of age" is voice recognition. Within a few years (some experts say within the next ten years), giving "orders" or inputting data into a computer by voice may be the normal way of doing business. One way for C4IFTW to give the computer a common look and feel, so that interfacing with it is almost natural, is to incorporate voice recognition as an interface between the user and the machine.
Voice technology has made great strides within the past three to five years. Manufacturers are beginning to produce voice recognition packages that are ready to use right out of the box; training of commands and vocabularies is optional. These packages are being produced for all of the major computing platforms and operating systems, including MS Windows (versions 3.x, 95, and NT), UNIX, SunOS, OpenWindows 3.x, and even OS/2. With more of the computing industry focusing on multimedia, voice recognition is becoming a more popular technology.
This thesis examined three voice recognition software packages currently available on the commercial market: DragonDictate version 1.3, VoicePilot version 2.0, and IN3 Voice Command for the SPARCstation version 2.2.2. These three packages were implemented on various systems and evaluated. Of the three, DragonDictate was the best choice for dictation and navigation. DragonDictate's accuracy improved steadily with increased usage, remaining above 98% in a quiet environment and reaching 93.5% in a relatively noisy environment. The accuracy improved because DragonDictate was able to "learn" the user's speech patterns and apply corrections to voice commands to avoid future errors. The user needed to perform a twenty-minute initial training session, but this was the only extensive training the program needed. Navigational commands did not have to be trained for each application. VoicePilot and IN3 Voice Command both required training for each application or command within each vocabulary. DragonDictate was the simplest package to use, as well as the most accurate in recognizing voice commands.
This thesis provides a preliminary study of the application of voice recognition technology. The following are three areas dealing with applications of voice recognition technology (although this is clearly not an exhaustive list of possible research areas involving voice recognition).
This thesis used voice recognition to automate many of the menu and button commands in software used to access the Internet and the World Wide Web, such as Netscape, Mosaic, and FTP tools. However, once connected, many of the functions performed while "browsing" the Web were still done using the mouse. Possible research topics exist in the area of SLAM (Spoken Language Access to Multimedia) and its possible implementation on a machine at the Naval Postgraduate School.
The use of voice recognition has become popular in many commercial professional areas. Many vendors are currently shipping special editions of voice recognition software with vocabularies specifically created for the medical and legal professions. Research could examine the application of profession-specific voice recognition software in the military counterpart or equivalent Warfare Specialty area, especially under field conditions.
A group of people suffering from RSI (repetitive strain injury) has used a2x, a piece of public domain software designed to interface the DragonDictate speech recognition system on a PC to a workstation running the X Window System. Research could be performed at the Naval Postgraduate School on using a2x to interface voice recognition on a PC to a workstation.