

Forum Proceedings

Stakeholder Forum on Communication Enhancement

Input Technologies: Problem Statement




Input technology refers to the equipment that an individual uses to access an augmentative and alternative communication (AAC) device. The characteristics and capabilities of input technologies are a critical determinant of communication rate and accuracy. Using AAC devices with current input technology can be challenging for some persons with severe disabilities, and advances in input technology are especially needed for AAC users with limited or unreliable motor abilities. Individuals who are unable to use direct selection (e.g. with a finger, toe, headwand, or mouthstick) are often limited to switch and scanning methods of input, which are slow and laborious. The T2RERC, along with its customer, industry, and research partners, seeks emerging technologies that will improve the input interface. The areas that need improvement and/or innovation are:

  • Biosignal-based technology
  • Eye gaze
  • Gesture recognition
  • Multi-modal technology
  • Speech recognition


Biosignal-Based Technology

Biosignal-based systems are needed that are reliable, not affected by involuntary movement (e.g. spasticity), not in continuous contact with the body (e.g. to prevent skin breakdown), and not affected by skin conditions (e.g. perspiration). Biosignal-based technology could serve a large market of users, ranging from people with severe disabilities to people without disabilities. It could be used in diverse environments (e.g. indoors, outdoors, at work, at school, day or night) and would not be restricted by lighting conditions.

Description of the Problem

Biosignal-based systems use a combination of eye movements, facial muscle movements, and brain wave bio-potentials as input signals for device access. Signals produced by the brain, nervous system, and muscles are amplified, digitized, and translated into commands that can be used as input for AAC devices or other related computer systems. Electroencephalographic (EEG, produced by brain activity), electrooculographic (EOG, produced during eye movement), and electromyographic (EMG, produced during muscle contraction and relaxation) signals are used to collect specific information from various body-brain systems for control and command of the device. Biosignals can be used for discrete on/off control of program commands, switch closures, keyboard commands, the left and/or right mouse buttons, and other functions. Some disabilities (e.g. ALS) lend themselves to biosignal control, while others (e.g. MS, Huntington's disease) may be less compatible.
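To make the amplify, digitize, and translate chain concrete, the following is a minimal Python sketch of how a single digitized EMG channel might be turned into discrete switch closures. It assumes the signal has already been amplified and sampled; the moving-average window and the two thresholds are illustrative values, and the function names are hypothetical rather than taken from any product. Full-wave rectification plus smoothing yields an amplitude envelope, and two thresholds (hysteresis) keep small fluctuations around a single threshold from producing a burst of spurious presses.

```python
from typing import List, Tuple


def emg_envelope(samples: List[float], window: int = 5) -> List[float]:
    """Full-wave rectify digitized EMG samples and smooth them with a moving average."""
    rectified = [abs(s) for s in samples]
    envelope: List[float] = []
    running = 0.0
    for i, value in enumerate(rectified):
        running += value
        if i >= window:
            running -= rectified[i - window]  # drop the sample that left the window
        envelope.append(running / min(i + 1, window))
    return envelope


def switch_events(envelope: List[float], on_threshold: float,
                  off_threshold: float) -> List[Tuple[str, int]]:
    """Turn the envelope into discrete press/release events using hysteresis:
    a press needs the level to reach on_threshold, a release needs it to fall
    to the lower off_threshold, so noise near one threshold cannot chatter."""
    events: List[Tuple[str, int]] = []
    pressed = False
    for i, level in enumerate(envelope):
        if not pressed and level >= on_threshold:
            pressed = True
            events.append(("press", i))
        elif pressed and level <= off_threshold:
            pressed = False
            events.append(("release", i))
    return events


# Synthetic signal: rest, a muscle contraction, rest.
quiet = [0.01, -0.01] * 5
burst = [0.8, -0.8] * 5
signal = quiet + burst + quiet
print(switch_events(emg_envelope(signal), on_threshold=0.4, off_threshold=0.2))
# [('press', 12), ('release', 23)]
```

The press arrives a couple of samples after the contraction starts because the moving average needs time to rise; a shorter window reacts faster but passes more noise.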

Biosignal-based systems are currently used by AAC users, but only for accessing single-switch applications. Individuals using biosignal-based technologies can respond to a signal within 10 milliseconds. This is significantly faster than volitional hand movement and offers potential time savings of 100-150 milliseconds per response (the time it takes the brain to produce volitional muscle movement).

Biosignals are continuously produced by movements, emotions, and other non-communicative activities, and biosignal-based systems are very sensitive to these non-communicative signals. In addition, perspiration and other skin conditions can affect the reliability of biosignal systems to some degree.

Current biosignal device sensors require tethering to the device, and the sensors can cause skin breakdown from continuous contact. Biosignal-based systems also cannot prevent or halt signal transmission, so non-communicative signals can be incorrectly interpreted as communicative ones. Biosignal technology currently cannot distinguish between intentional and non-intentional movement. Current biosignal systems require continuous vigilance by the user, which reduces or eliminates the AAC user's conversational signals (e.g. eye contact, nodding). Biosignal systems are also subject to interruption from the environment (e.g. noise, light flashes). When using a biosignal system for input to an AAC device or other related computer system, a user is unable to perform multiple tasks simultaneously because of signal interference.

Technology Requirements

There is a clear need for advanced biosignal-based systems for AAC and related computer systems. The following requirements provide a guideline for technology solutions. It is not expected that any particular solution will satisfy all requirements.

  • Must be able to distinguish between communicative and non-communicative signals (e.g. a communicative signal being an eye movement versus a non-communicative signal being an eye blink).
  • Must not be affected by non-intentional uncontrollable signals (e.g. emotions, reorienting eye positions, physical repositioning, twitch or spasm).
  • Must not be affected by skin conditions (e.g. perspiration, oily skin).
  • Must be able to control or halt signal transmission (i.e. to gate communicative and non-communicative signals, reducing signal misinterpretation).
  • Must not require sensors to remain in continuous contact with skin.
  • Should provide a biosignal input unit that has a wireless communication link to the AAC device.
  • Should have an automatic power saver mode.
  • Should have an independent power supply.
  • Should be reliable throughout the day (i.e. is not affected by time of day factors).
  • Should not require a high degree of concentration.
  • Should employ algorithms that do not remove (filter out) too much of the signal (i.e. more information is available in an unfiltered signal to perform more complex tasks).
  • Should have signal processing capabilities that do not require excessive processing time or power.
  • Should support flexible control options (e.g. multi-level switch and/or continuous mouse control).
  • Should provide a universal interface across multiple platforms (e.g. PC, AAC device).
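Several of the requirements above (distinguishing communicative from non-communicative signals, rejecting twitches, and being able to halt transmission) can be illustrated with a small post-processing stage. The sketch below is hypothetical: it assumes activations have already been detected (for example by an envelope-and-threshold stage), treats activations that are too short or too long as non-intentional, and uses a deliberate double activation to open or close a gate so that, while the gate is closed, nothing is passed on to the AAC device. The duration limits and the double-activation toggle are illustrative design choices, not established parameters.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Activation:
    start: int  # sample index where the activation began
    end: int    # sample index where it ended


def filter_intentional(activations: List[Activation], min_len: int,
                       max_len: int) -> List[Activation]:
    """Keep activations whose duration lies in [min_len, max_len]. Shorter bursts are
    treated as twitches; longer ones as sustained, non-communicative contraction."""
    return [a for a in activations if min_len <= a.end - a.start <= max_len]


def gated_selections(activations: List[Activation], toggle_gap: int) -> List[int]:
    """Two activations separated by at most toggle_gap samples toggle a gate.
    While the gate is open, each single activation is passed on as a selection;
    while it is closed, single activations are ignored (transmission is halted)."""
    selections: List[int] = []
    gate_open = False
    i = 0
    while i < len(activations):
        current = activations[i]
        nxt = activations[i + 1] if i + 1 < len(activations) else None
        if nxt is not None and nxt.start - current.end <= toggle_gap:
            gate_open = not gate_open  # deliberate double activation
            i += 2
            continue
        if gate_open:
            selections.append(current.start)
        i += 1
    return selections


raw = [Activation(0, 2), Activation(10, 20), Activation(25, 35), Activation(100, 110),
       Activation(200, 212), Activation(300, 310), Activation(315, 325),
       Activation(400, 410), Activation(500, 900)]
accepted = filter_intentional(raw, min_len=5, max_len=50)
print(gated_selections(accepted, toggle_gap=10))  # [100, 200]
```

Filtering must come before gating: without it, the brief twitch at samples 0-2 would pair with the next activation and toggle the gate by accident.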



Eye Gaze

Eye gaze systems are needed that are accurate, easily calibrated, non-fatiguing, and unobtrusive to the user. They are needed for use in a variety of settings (e.g. work, home, social) and environments (e.g. sunlight, fluorescent light, darkness) for access to AAC devices, PCs, and environmental control units (ECUs).

Description of the Problem

Most AAC users employ some sort of pointing (physical pressure or non-contact pointing) or switch technology to interface with their device. [1] For individuals who are severely involved and cannot efficiently use direct selection methods, improved input technologies are needed. Eye gaze systems are a viable solution that could allow people using AAC to access their devices using discrete eye movement.

Eye gaze systems can employ galvanometric sensors, which measure voltages across the eye, or video image processors that examine optical images of the eye. Current eye gaze systems are broadly divided into two categories: head-mounted and remote. Remote systems (remote cameras that measure eye movement) are easier to use and less obtrusive because they do not need to be physically connected to the user. Video-based eye gaze systems typically direct a low-power infrared light at the eye, producing a reflection off the retina (which makes the pupil appear bright) and a small reflection, or glint, off the surface of the cornea. The camera records these reflections, and the computer calculates the person's gaze point on the display screen from the relative positions of the pupil center and the corneal glint. [2]
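The mapping from eye image to gaze point is usually established by calibration: the user looks at a few known on-screen targets, and the system fits a function from measured eye features to screen coordinates. The sketch below assumes image processing already delivers, for each frame, the vector (dx, dy) from the corneal glint to the pupil center, and fits a simple affine mapping per screen axis by least squares. Real systems may use higher-order polynomials or compensate for head position; the function names and the synthetic calibration data are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, float]


def _solve3(m: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system with Gaussian elimination and partial pivoting."""
    a = [list(row) + [rhs] for row, rhs in zip(m, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= factor * a[col][c]
    x = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):
        x[r] = (a[r][3] - sum(a[r][c] * x[c] for c in range(r + 1, 3))) / a[r][r]
    return x


def fit_affine(vectors: Sequence[Vector], targets: Sequence[Vector]) -> List[List[float]]:
    """Least-squares fit of screen = c0 + c1*dx + c2*dy, separately for x and y,
    from calibration pairs (pupil-glint vector, known target position)."""
    rows = [(1.0, dx, dy) for dx, dy in vectors]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    coeffs = []
    for axis in range(2):
        atb = [sum(r[i] * t[axis] for r, t in zip(rows, targets)) for i in range(3)]
        coeffs.append(_solve3(ata, atb))
    return coeffs


def gaze_point(coeffs: List[List[float]], vector: Vector) -> Tuple[float, float]:
    """Map one pupil-glint vector to an estimated (x, y) screen position."""
    dx, dy = vector
    x, y = (c[0] + c[1] * dx + c[2] * dy for c in coeffs)
    return (x, y)


# Synthetic 3x3 calibration grid generated from an assumed "true" mapping.
grid = [(dx, dy) for dy in (-5.0, 0.0, 5.0) for dx in (-5.0, 0.0, 5.0)]
screen = [(512 + 40 * dx, 384 + 30 * dy) for dx, dy in grid]
coeffs = fit_affine(grid, screen)
print(gaze_point(coeffs, (2.0, -1.0)))  # approximately (592.0, 354.0)
```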

Current eye gaze systems are activated using either dwell or switch modes of control. Systems are becoming more refined: they can be accurate to within 1 cm, can locate the eye 60 times per second, and can interface with other computer software currently available on the market. [3] However, the sampling rate and spatial resolution of these devices are not sufficient for use with many AAC devices. Current eye gaze systems cannot sustain the rate needed by high-end communicators (e.g. 25 words per minute at 5 characters per word requires a selection rate of 125 characters per minute, or one selection every 0.48 seconds).
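The rate figures above can be checked with a little arithmetic, and a dwell-mode selector shows where the time goes. In this sketch, gaze samples are assumed to arrive at the cited 60 per second and to have already been mapped to on-screen target ids; the 0.5 s dwell and all names are illustrative assumptions. A target is selected once the gaze has rested on it for the dwell time.

```python
from typing import Iterable, List, Optional, Tuple

SAMPLE_RATE_HZ = 60  # the 60 samples per second cited above


def seconds_per_selection(words_per_minute: float, chars_per_word: float = 5) -> float:
    """Time budget per selection needed to reach a given text rate."""
    return 60.0 / (words_per_minute * chars_per_word)


def dwell_select(targets: Iterable[Optional[str]],
                 dwell_seconds: float) -> List[Tuple[str, int]]:
    """targets holds, for each gaze sample, the id of the on-screen target under the
    gaze point (None if none). A target is selected once the gaze has stayed on it
    for dwell_seconds; it is not selected again until the gaze leaves it."""
    needed = round(dwell_seconds * SAMPLE_RATE_HZ)
    selections: List[Tuple[str, int]] = []
    current: Optional[str] = None
    count = 0
    for i, target in enumerate(targets):
        if target is not None and target == current:
            count += 1
        else:
            current = target
            count = 1 if target is not None else 0
        if current is not None and count == needed:
            selections.append((current, i))
    return selections


print(seconds_per_selection(25))  # 0.48
samples = [None] * 10 + ["A"] * 30 + ["B"] * 5 + ["C"] * 40
print(dwell_select(samples, dwell_seconds=0.5))  # [('A', 39), ('C', 74)]
```

With the assumed 0.5 s dwell, the dwell alone already exceeds the 0.48 s per-selection budget for 25 words per minute, before any time is spent moving the eyes between targets, which illustrates why dwell-based eye gaze struggles to reach that rate.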

A wide range of individuals can use eye gaze, from persons without disabilities to persons with severe disabilities. Eye gaze systems might address the needs of individuals with significant cognitive disabilities by drawing on a person's innate tendency to look at what they want. Advanced eye gaze systems also have potential for individuals who cannot use a traditional keyboard because of hand or wrist impairments such as carpal tunnel syndrome or arthritis.

Technology Requirements

An eye gaze system for AAC devices would address important market needs and represent a clear business opportunity. The following "requirements" provide guidelines for a technology solution, though it is not expected that all requirements will be satisfied in any single solution.

  • Must be customizable to accommodate the user's postural orientation (e.g. slouching, head tilting, lying, sitting).
  • Must be compatible with other electronic devices (e.g. PC, cell phone).
  • Must work with multiple platforms (e.g. AAC devices, ECUs, PCs).
  • Must not be affected by environmental factors such as ambient lighting, fluorescent lighting, and/or lack of lighting.
  • Must be able to be set up, optimized, connected, and disconnected without the help of a third party.
  • Must be self-adjusting, keeping the eye gaze system synchronized with the display to compensate for drift (creep) of display targets.
  • Must have a wireless connection to the central processing unit.
  • Should eliminate the need for high levels of concentration by providing smart controls for initiating and terminating input into the system.
  • Should not be affected by extraneous factors such as contact lenses, eyeglasses, eyelashes, mascara, or dust.
  • Should not require recalibration for changes in the user's performance (e.g. an increase in the number of errors automatically signals the system to increase dwell time).
  • Should have an automatic power saver mode.
  • Should have an independent power source separate from the central processing unit.
  • Should be usable at a reasonable distance from the display (at least 10 ft).
  • Should not raise cosmetic concerns due to sensors, wires, etc.
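The recalibration requirement above suggests that dwell time could adapt automatically to the user's error rate. A minimal sketch, assuming each selection can be labeled as an error (for example, because the user immediately undid it); the class name, window size, error thresholds, step, and limits are hypothetical values chosen for illustration.

```python
from collections import deque


class AdaptiveDwell:
    """Hypothetical controller that lengthens the dwell time when the recent error
    rate is high and shortens it when the user is accurate. Times are in ms."""

    def __init__(self, dwell_ms: int = 1000, min_ms: int = 400, max_ms: int = 2500,
                 step_ms: int = 100, window: int = 10,
                 high_error: float = 0.3, low_error: float = 0.1) -> None:
        self.dwell_ms = dwell_ms
        self.min_ms, self.max_ms, self.step_ms = min_ms, max_ms, step_ms
        self.high_error, self.low_error = high_error, low_error
        self.outcomes: deque = deque(maxlen=window)  # most recent selection outcomes

    def record(self, was_error: bool) -> int:
        """Log one selection (True if it was an error) and return the dwell to use next."""
        self.outcomes.append(was_error)
        if len(self.outcomes) < self.outcomes.maxlen:
            return self.dwell_ms  # not enough history yet
        rate = sum(self.outcomes) / len(self.outcomes)
        if rate > self.high_error:
            self.dwell_ms = min(self.max_ms, self.dwell_ms + self.step_ms)
        elif rate < self.low_error:
            self.dwell_ms = max(self.min_ms, self.dwell_ms - self.step_ms)
        return self.dwell_ms


controller = AdaptiveDwell()
for _ in range(10):
    controller.record(True)   # a run of errors
print(controller.dwell_ms)    # 1100
```

The dead band between the two thresholds keeps the dwell time from oscillating when the error rate is moderate.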

1. Beukelman, David R. & Mirenda, Pat. Augmentative and Alternative Communication. 2nd Edition. Paul H. Brookes Publishing Co., Baltimore, MD. 1998.

2. Cleveland, Nancy. (1994). Eye gaze Human-Computer Interface for People with Disabilities. [Online: www.eyegaze.com/doc.cathuniv.htm]

3. Department of Systems Engineering at the University of Virginia. (5/4/01). [Online: http://www.sys.virginia.edu/research/erica.html]


Gesture Recognition

Gesture recognition systems are needed that are accurate, non-fatiguing, not affected by proximity, based on "natural" gestures, and reliable even with involuntary movements (e.g. spasms). Gestures are considered a socially acceptable form of communication and can increase interactions in a variety of settings (e.g. home, school, work). Gesture recognition is well suited to device input because gestures are already part of "natural" communication, which makes them both efficient and intuitive for the AAC user.

Description of the Problem

Gesture recognition can be defined as the recognition and interpretation of voluntary movements (e.g. of the face, head, shoulders, or hands) for the purpose of controlling and providing input to an AAC device. A gesture can be defined as any movement of the body, whether idiosyncratic (i.e. recognized only by individuals familiar with the person's gesture language) or iconic (i.e. recognizable by anyone because it directly represents the word or action), that is used to convey meaning to an interactant or input interface. Gesture recognition systems use video cameras (both visible-light and infrared cameras have been employed) to record gestures. Signal processing is then used to interpret various types of gestures (e.g. continuous and discrete; head and hand gestures) and to provide control signals for accessing devices.
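One common way to interpret recorded gestures is template matching with dynamic time warping (DTW), which compares two movement trajectories even when one is performed more slowly than the other. The sketch below is not the method of any specific system; it assumes video processing has already reduced each gesture to a sequence of (x, y) positions of the tracked body part, and that a few examples of each of the user's own (possibly idiosyncratic) gestures have been recorded as templates. A rejection threshold lets the system ignore movements that resemble no known gesture. Names and values are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def dtw_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Dynamic time warping distance between two 2-D trajectories."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            # extend the cheapest of: repeat a point of b, repeat a point of a, advance both
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def classify(trajectory: Sequence[Point], templates: Dict[str, List[Sequence[Point]]],
             reject_above: float) -> Optional[str]:
    """Return the label of the closest template, or None if even the closest
    template is farther than reject_above (treated as a non-gesture)."""
    best_label, best = None, float("inf")
    for label, examples in templates.items():
        for example in examples:
            d = dtw_distance(trajectory, example)
            if d < best:
                best_label, best = label, d
    return best_label if best <= reject_above else None


templates = {
    "nod": [[(0, 0), (0, 1), (0, 2), (0, 1), (0, 0)]],
    "shake": [[(0, 0), (1, 0), (2, 0), (1, 0), (0, 0)]],
}
slow_nod = [(0, 0), (0, 0.5), (0, 1), (0, 1.5), (0, 2), (0, 1), (0, 0)]
print(classify(slow_nod, templates, reject_above=1.5))  # nod
```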

Many individuals using AAC devices have accompanying physical impairments (e.g. limited range of motion, paralysis, flaccidity, spasticity) that limit their ability to access the device. Individuals who have severe speech impairments such as apraxia (i.e. a problem with motor programming that affects a person's ability to sequence and say sounds, syllables, and words) or dysarthria (i.e. difficulty producing speech due to muscle incoordination and/or weakness) may benefit from gesture recognition as an input system. Non-traditional AAC users such as tracheotomy patients could also benefit from gesture recognition systems. Gesture recognition systems can provide an alternative means of controlling and accessing a variety of devices, including personal computers and AAC devices.

Gesture recognition systems that use remote cameras involve no physical contact: the person is not tethered to the device, which eliminates skin breakdown from sensors. In addition, gesture recognition systems could augment and improve current telecommunication systems (e.g. video conferencing, telephones) and could help prevent the repetitive motion injuries (e.g. carpal tunnel syndrome) that sometimes accompany input systems.

Technology Requirements

There is a clear need for advanced gesture recognition systems for AAC and related computer systems. The following "requirements" provide guidelines for technology solutions, though it is not expected that all requirements will be satisfied in any single solution.

  • Must be able to distinguish between communicative gestures and continuous movement.

  • Must not be affected by non-communicative movement (e.g. ordinary movement, involuntary reflexes).

  • Must not be affected by environmental factors (e.g. lighting, vibration, dust).

  • Must function with the user at a "normal" distance from the PC or AAC device.

  • Must accommodate a wide range of user orientations relative to the gesture recognition system (e.g. body posture, head angle).

  • Must be wireless (i.e. the person is not tethered to the device being controlled).

  • Must have a recognizable point of initiation and point of termination for control gestures.

  • Should allow the user to quickly switch attention between the gesture recognition system and other activities (e.g. a communication partner).

  • Should have automatic/smart adaptation (not require assistance for customization or calibration beyond initial setup).

  • Should require little training to recognize/interpret idiosyncratic gestures.

  • Should not be disrupted by background gestures (e.g. by other people) in the video field.

  • Should have an independent power supply.

  • Should have an automatic power saver mode (i.e. power down when the gesture recognition system is not in use).

  • Should not require a high degree of concentration.

  • Should not be fatiguing.
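The requirements for a recognizable point of initiation and termination, and for telling communicative gestures apart from continuous movement, can be approached by segmenting the motion stream around periods of rest. A hypothetical sketch, assuming a per-frame movement speed for the tracked body part is available; the stillness threshold, rest length, and maximum gesture length are illustrative values.

```python
from typing import List, Optional, Sequence, Tuple


def segment_gestures(speeds: Sequence[float], still_below: float,
                     rest_frames: int, max_frames: int) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices of candidate control gestures.

    speeds[i] is the tracked body part's movement speed in frame i. A gesture may
    only start after at least rest_frames still frames (point of initiation) and
    ends at the next run of rest_frames still frames (point of termination).
    Segments longer than max_frames are treated as continuous, non-communicative
    movement and dropped. A gesture still in progress when input ends is not reported."""
    segments: List[Tuple[int, int]] = []
    still_run = 0
    start: Optional[int] = None
    for i, speed in enumerate(speeds):
        still = speed < still_below
        if start is None and not still and still_run >= rest_frames:
            start = i  # movement that follows a rest: gesture begins
        still_run = still_run + 1 if still else 0
        if start is not None and still_run >= rest_frames:
            end = i - rest_frames  # last moving frame before the rest
            if end - start + 1 <= max_frames:
                segments.append((start, end))
            start = None
    return segments


# rest, short movement, rest, long continuous movement, rest
speeds = [0.0] * 3 + [1.0] * 4 + [0.0] * 3 + [1.0] * 15 + [0.0] * 3
print(segment_gestures(speeds, still_below=0.1, rest_frames=3, max_frames=10))
# [(3, 6)]
```

Each accepted segment could then be passed to a classifier such as a template matcher; the long movement is discarded before any recognition is attempted.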


Multimodal and Multichannel Technology

Multi-modal technology has emerged in the field of AAC as a way to address the needs of a variety of users (from non-disabled to severely disabled). Multi-modal systems provide a user with more than one method of input for their AAC device or related computer system. Multi-modal systems can be created using a combination of several input systems such as speech recognition, gesture recognition, eye gaze, infrared, etc. Seamless switching between input devices in a multi-modal system would allow an AAC user to change access methods as the environment (e.g. noise level), context (e.g. classroom, home), and device needs (e.g. cell phone, PC, ECU) change. Multi-channel input simultaneously uses control signals generated by two or more input devices (e.g. voice plus hand pointer, gestures plus data glove). Multi-channel input has the potential to dramatically increase selection rate and enable innovative interface designs. Wireless technology offers great potential for multi-modal access systems.

Description of the Problem

Multi-channel input is one method of improving device use by incorporating multiple signals that are generated by one or more methods of input. Researchers are looking to combine multiple simultaneous gestural inputs with other access techniques (e.g. voice input, switches) in order to improve device input and access for the user. [1] The combination of direct manipulation plus speech is intended to use the strengths of one modality to overcome the weaknesses of the other. [2] Multi-channel capabilities also support rapid improvements in scanning-based interfaces. Systems incorporating multi-channel access should not require tethering of the input device to the AAC device or other related computer systems.
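To illustrate how direct manipulation and speech can complement each other in a multi-channel interface, the sketch below pairs each spoken command (the action) with the most recent pointing event (the object) that occurred shortly before it. The event types, the 1.5 s pairing window, and the command and target names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PointEvent:
    time_ms: int
    target: str   # on-screen item the pointer (or gaze) was on


@dataclass
class SpeechEvent:
    time_ms: int
    command: str  # recognized spoken command, e.g. "open"


def fuse(speech: List[SpeechEvent], points: List[PointEvent],
         window_ms: int = 1500) -> List[Tuple[str, Optional[str]]]:
    """Pair each spoken command with the latest pointing event that happened at most
    window_ms before it; the target is None when no pointing event qualifies."""
    fused: List[Tuple[str, Optional[str]]] = []
    for s in speech:
        recent = [p for p in points if 0 <= s.time_ms - p.time_ms <= window_ms]
        target = max(recent, key=lambda p: p.time_ms).target if recent else None
        fused.append((s.command, target))
    return fused


points = [PointEvent(1000, "photo.jpg"), PointEvent(4000, "letter.doc")]
speech = [SpeechEvent(1800, "open"), SpeechEvent(4200, "delete"), SpeechEvent(9000, "print")]
print(fuse(speech, points))
# [('open', 'photo.jpg'), ('delete', 'letter.doc'), ('print', None)]
```

Here the pointing channel removes the need to name the object aloud, and the speech channel removes the need to navigate a menu for the action; a command with no recent pointing event is left incomplete rather than guessed.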

Multi-modal systems can provide users with more efficient access methods that may increase the rate, reliability, and ease of use of AAC and other related computer systems. Any number of current input technologies can be used and combined to create multi-modal systems for individuals using AAC. Examples of systems that could be combined for multi-modal access are isometric joysticks, speech recognition, virtual reality technologies (e.g. data gloves), eye gaze systems, gesture recognition systems, biosignal-based systems, etc. Two or more of these systems could be combined to provide the user with continuous, reliable input that accommodates varying environments and changing cognitive and physical capabilities.

Some multi-modal input systems are currently being developed. The Archimedes Project at Stanford University seeks to address two crucial access problems: 1) a particular individual's access to one computer, and 2) that individual accessing any computer. The system they created is called the Total Access System. This system consists of two main components, the Personal Accessor (roughly an input system such as speech recognition, keyboard, etc.) and the Total Access Port (TAP, roughly a universal interface between any Personal Accessor and personal computer). Accessors (input system and its customization to the user) vary from person to person according to their abilities and preferences. The TAP interfaces a Personal Accessor to any host computer that the user wants to work on. The Personal Accessor can serve as a communication aid for face-to-face conversation (by controlling a PC-based speech synthesizer or AAC device) by connecting directly with another accessor used by a conversational participant. [3] Initial research focused on dedicated wire-based approaches but systems employing commercial LAN (Local Area Network) and wireless infrastructures are envisioned.

Technology Requirements

A multi-modal system for AAC devices would address important market needs and represent a clear business opportunity. The following "requirements" provide guidelines for a technology solution, though it is not expected that all requirements will be satisfied by any single solution.

HUB (Central Processing Unit)

  • Must have a HUB that acts as an interface between the input device and the AAC system.
  • Must have a HUB that automatically recognizes the input device and establishes communication.
  • Must have a HUB that automatically recognizes the AAC system and establishes communication.
  • Must have a HUB that provides appropriate signals to the AAC device without installing software or calibrating the system.
  • Must have a HUB that transforms signals from the input device into signals appropriate for the AAC system.
  • Should have a HUB that accommodates simultaneous signals, such as gesture recognition input, a virtual reality control glove, and a camera.
  • Must have a HUB that is compatible with a broad range of input systems (e.g. eye gaze, biosignal-based, voice recognition, keyboard).
  • Should have an input system that communicates with the HUB by wireless means (ideal).
  • Should have a HUB that communicates with the AAC device via a Universal Serial Bus (USB) port (ideal).
  • Should have a HUB that has its own power supply.
  • Should have a HUB (and input devices) able to control a PC.
  • Should have a HUB (and input devices) compatible with other electronic devices (e.g. cell phones) without interference.
  • Should allow input devices and the HUB to be set up and taken down without the intervention of a third party (except perhaps for the initial setup).
  • Should have an input device and HUB that are portable.
  • Should have a HUB and input device that adapt to changing user abilities (self-calibrating) throughout the day (e.g. fatigue) and over longer periods (e.g. degenerative diseases).

Input Device

  • Must have an input device that automatically turns off when another input device is being used, to conserve its power supply.
  • Must allow the user to maintain communication across a broad range of environments and activities (e.g. switch easily from voice input in a quiet room to a direct access method in a loud room).
  • Should provide independent input systems (i.e. if one system goes down, another input system can easily be substituted).
  • Should have input devices with no positioning requirements (e.g. no need for line of sight or for the head to be held in a certain position or orientation).
  • Should have an input device that has its own power supply.
  • Should not have input devices in continuous contact with the body.
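The HUB requirements amount to an adapter layer: each input device gets a translator into a common set of AAC commands, and the HUB tracks which device is in use so the others can power down. The sketch below is a hypothetical, in-memory illustration only; the class, device ids, and command names are invented, registration stands in for automatic device recognition, and real discovery over USB or a wireless link is not modeled.

```python
from typing import Callable, Dict, Optional, Set

Adapter = Callable[[str], Optional[str]]  # raw device event -> AAC command (or None)


class Hub:
    """Hypothetical HUB: translates events from any registered input device into
    AAC commands and reports which other devices are idle and may power down."""

    def __init__(self) -> None:
        self._adapters: Dict[str, Adapter] = {}
        self.active: Optional[str] = None
        self.idle: Set[str] = set()

    def register(self, device_id: str, adapter: Adapter) -> None:
        # Stands in for automatic recognition of a newly connected input device.
        self._adapters[device_id] = adapter
        self.idle.add(device_id)

    def handle(self, device_id: str, raw_event: str) -> Optional[str]:
        """Translate one raw event; events from unregistered devices are ignored."""
        if device_id not in self._adapters:
            return None
        if self.active != device_id:
            self.active = device_id
            # every other device is now idle and could enter power-saver mode
            self.idle = set(self._adapters) - {device_id}
        return self._adapters[device_id](raw_event)


hub = Hub()
hub.register("switch", lambda e: "SELECT" if e == "closed" else None)
hub.register("voice", {"next": "SCAN_NEXT", "yes": "SPEAK_YES"}.get)
print(hub.handle("switch", "closed"), hub.idle)  # SELECT {'voice'}
print(hub.handle("voice", "next"), hub.idle)     # SCAN_NEXT {'switch'}
```

Because the AAC system only ever sees the common commands, a user can move from voice input to a switch (or back) without the AAC device being reconfigured.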


1. Department of Systems Engineering at the University of Virginia. (5/4/01) [Online: http://www.sys.virginia.edu/research/erica.html]

2. Wright State University College of Engineering and Computer Science. (1999). Selection Systems. [Online: http://www.cs.wright.edu/bie/rehabengr/AAC/selectmethod.htm] (accessed January 24, 2001)

3. Stanford University: Archimedes Project. (5/10/01). [Online: http://archimedes.stanford.edu//arch.html]

Speech Recognition

Speech recognition systems are needed that are reliable regardless of the user's speech quality. Speech recognition should allow for editing, should be applicable in a variety of environments (e.g. home, work, school), and should not be affected by environmental factors (e.g. background noise). Current speech recognition systems for computer and telephone access are designed for individuals with unimpaired speech (articulation). Additional research is being conducted to develop speech recognition systems for individuals with dysarthria (e.g. resulting from CP, ALS, or MS). These systems may be used to access AAC devices, environmental control units, PCs, etc.

Description of the Problem

Speech recognition is already incorporated into current software systems (e.g. ViaVoice for Macintosh, Speech Works 6.5 for telephony). Speech recognition systems are transparent in that they are easy to learn, natural to the individual, and widely accepted by society. In addition, the systems are relatively inexpensive, and for non-disabled individuals with typical speech and volume, they are easy to set up and use.

One area of speech recognition currently being researched is dysarthric speech recognition. These systems would give a dysarthric speaker the ability to use their own voice to access an AAC device: the device would recognize the user's voice and process the input. The ENABL system is one example of a system being used for dysarthric speech recognition. Processing begins when the system detects a spoken command. The command is analyzed by the speech recognition module, which draws upon acoustic models, a grammar, and a lexicon (vocabulary). The result is then fed to an output recognizer and parser (a program that analyzes the recognized word sequence according to the grammar to determine its structure), which translates the command. [1]
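The grammar-and-lexicon stage described above can be illustrated with a toy parser. This is not ENABL's actual vocabulary or grammar, which the source does not give; the lexicon, rules, and command fields below are hypothetical. The recognizer's word sequence is tagged using the lexicon, and the command is accepted only if its category sequence matches a grammar rule.

```python
from typing import Dict, List, Optional

# Hypothetical lexicon: word -> grammatical category.
LEXICON: Dict[str, str] = {
    "select": "VERB", "move": "VERB", "delete": "VERB",
    "line": "OBJECT", "circle": "OBJECT", "word": "OBJECT",
    "left": "DIRECTION", "right": "DIRECTION", "up": "DIRECTION", "down": "DIRECTION",
    "the": "DET",
}

# Hypothetical grammar: each rule is an allowed sequence of categories.
GRAMMAR: List[List[str]] = [
    ["VERB", "OBJECT"],
    ["VERB", "OBJECT", "DIRECTION"],
]


def parse(words: List[str]) -> Optional[Dict[str, str]]:
    """Tag recognized words with the lexicon, drop determiners, and accept the
    utterance only if its category sequence matches a grammar rule."""
    tagged = []
    for w in words:
        category = LEXICON.get(w.lower())
        if category is None:
            return None  # out-of-vocabulary word: reject instead of guessing
        if category != "DET":
            tagged.append((category, w.lower()))
    if [c for c, _ in tagged] not in GRAMMAR:
        return None
    return {c.lower(): w for c, w in tagged}


print(parse(["Move", "the", "circle", "left"]))
# {'verb': 'move', 'object': 'circle', 'direction': 'left'}
print(parse(["circle", "move"]))  # None
```

Constraining input to a small grammar is one way such systems can compensate for imprecise articulation: the recognizer only has to choose among word sequences the grammar allows.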

Other dysarthric speech recognition research focuses on creating teachable interfaces for individuals with dysarthric speech and other severe physical disabilities. This technology would be capable of translating unintelligible vocalizations into effective actions or clearly articulated synthesized speech (e.g. Toco the Toucan [2]). Dysarthric speech databases should be established to create voice templates for speech recognition systems.
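A teachable interface of this kind could, in its simplest form, store a few examples of each vocalization the user can reproduce, labeled with the action it should trigger, and map new vocalizations to the closest stored action. The sketch assumes each vocalization has already been reduced to a fixed-length acoustic feature vector (feature extraction is not shown) and uses a nearest-centroid rule with a rejection distance; the class name and values are hypothetical, and this is not a description of how Toco the Toucan works.

```python
import math
from typing import Dict, List, Optional, Sequence


class TeachableVocalizer:
    """Hypothetical teachable mapping from a user's own vocalizations to actions."""

    def __init__(self, reject_above: float) -> None:
        self.reject_above = reject_above
        self._examples: Dict[str, List[List[float]]] = {}

    def teach(self, action: str, features: Sequence[float]) -> None:
        """Store one example vocalization (as a feature vector) for an action."""
        self._examples.setdefault(action, []).append(list(features))

    def _centroid(self, action: str) -> List[float]:
        examples = self._examples[action]
        return [sum(col) / len(examples) for col in zip(*examples)]

    def interpret(self, features: Sequence[float]) -> Optional[str]:
        """Return the action whose centroid is nearest, or None if nothing is close
        enough (the system can then ask the user to repeat)."""
        best_action, best_dist = None, math.inf
        for action in self._examples:
            d = math.dist(features, self._centroid(action))
            if d < best_dist:
                best_action, best_dist = action, d
        return best_action if best_dist <= self.reject_above else None


vocalizer = TeachableVocalizer(reject_above=1.0)
vocalizer.teach("yes", [1.0, 0.0])
vocalizer.teach("yes", [1.2, 0.2])
vocalizer.teach("no", [0.0, 1.0])
vocalizer.teach("no", [0.2, 1.2])
print(vocalizer.interpret([1.0, 0.1]))  # yes
print(vocalizer.interpret([5.0, 5.0]))  # None
```

The rejection distance reflects the point made below about tolerance: a sound the user cannot reproduce consistently will tend to fall far from every stored centroid.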

Background noise can reduce the reliability of speech recognition, causing speech sounds to be misinterpreted. Speech recognition systems may not be appropriate in certain environments and settings (e.g. work and school) because speaking aloud adds to the noise level (e.g. a student cannot compose work silently, so dictating distracts other students). Another issue is that current speech recognition technology does not allow for editing (e.g. once the device records speech, the user cannot go back and make changes).

Speech recognition systems have a narrow range of tolerance when recognizing speech. This factor limits their reliability for individuals with varying speech patterns. The user has to be able to reproduce a sound consistently in order for the device to recognize and use it as input. Persons with apraxia may not be able to use speech recognition because of inconsistencies in their speech. Speech recognition is most applicable for individuals with high-level spinal cord injuries whose speech quality is not affected but who may have problems with loudness.

Technology Requirements

There is a clear need for advanced speech recognition systems for AAC and related computer systems. The following requirements provide a guideline for technology solutions, though it is not expected that all requirements will be satisfied in any single solution.

  • Must be able to consistently recognize any reproducible sound.
  • Must adapt to a broad range of users' speech abilities (taking advantage of residual speech skills).
  • Must tolerate speech variance, so that reliability does not decrease with changing or inconsistent speech patterns.
  • Must not be affected by background noise.
  • Must be able to be used in a variety of environments (e.g. home, school) without becoming a distraction for others.
  • Should have a wireless link between the voice recognition unit and the AAC device being controlled.
  • Should have its own power supply.
  • Should have an automatic power saver mode.
  • Should be easy to set up and use independently.
  • Should be a stand-alone system with accessibility to multiple platforms (e.g. telephone, AAC device, computer).
  • Should tolerate time-dependent speech changes (e.g. fatigue, time of day).
  • Should provide a stand-alone system that allows the user to edit speech before output.
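The editing requirement implies holding recognized words in a buffer that the user can review and correct before anything is spoken. A minimal, hypothetical sketch of such a buffer; the class and method names are invented, and how edit commands are issued (by voice, switch, or another input) is left open.

```python
from typing import List


class EditBuffer:
    """Hypothetical pre-output buffer: recognized words are held for review and
    only released to the speech synthesizer when the user confirms."""

    def __init__(self) -> None:
        self.words: List[str] = []

    def add(self, word: str) -> None:
        self.words.append(word)

    def delete_last(self) -> None:
        if self.words:
            self.words.pop()

    def replace(self, index: int, word: str) -> None:
        self.words[index] = word

    def release(self) -> str:
        """Return the edited utterance for output and clear the buffer."""
        text = " ".join(self.words)
        self.words.clear()
        return text


buffer = EditBuffer()
for word in ["I", "want", "tee"]:
    buffer.add(word)      # "tee" was misrecognized
buffer.replace(2, "tea")
buffer.add("now")
buffer.delete_last()      # the user changes their mind
print(buffer.release())   # I want tea
```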


1. Rosengren, Elisabeth. (2000). "Perceptual analysis of dysarthric speech in the ENABL project." TMH-QPSR, KTH 1/2000, pp. 13-18.

2. MIT Media Laboratory. "Toco the Toucan." SIGGRAPH 1997. [Online: http://vismod.www.media.mit.edu/vismod/demos/toco/]
