T2RERC  


Forum Proceedings

Stakeholder Forum on Communication Enhancement

Voice Output and Display Technologies: Forum Data

 


The following is the raw data collected during the T2RERC's Stakeholder Forum. It reflects the comments and needs as expressed by the Forum participants.

1. Needs (Unmet needs of consumers, clinicians, etc.)

CONTEXT RECOGNITION

  • Need context recognition (automatic knowledge of time and space) that makes appropriate vocabulary available and easily accessible.
  • Device should respond to communication environment (e.g. activities, environments, group members, work, home, and educational situations) and should change performance with context.
  • When accessing computer applications, a different voice should be associated with each program so that active programs can be identified just by voice. (Similar to text readers that use different voices when different applications are active.)
  • Need individualized, pre-programmed prompts for different situations.
  • Need voice cues for the interlocutor (i.e. the interlocutor pays attention to a louder "speaking voice" and ignores the user's "composition voice").
  • Voices need ability to automatically adapt in response to AAC processing.
  • Need automatic set-up of device features for various environments (i.e. change voices, loudness, etc).
  • Need automatic volume control that adjusts to environmental noise level.
  • Need speech recognition to obtain contextual information from interlocutor speech for the prediction of vocabulary. (Note: Substantial privacy issues may result from the recording of another's voice or discussion.)
  • Need clock for time-contextual information (i.e. 8 AM pulls up breakfast food page, etc.)
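
The clock-based need above (e.g. 8 AM pulls up the breakfast food page) could be sketched roughly as a lookup from time of day to vocabulary page. This is only an illustration: the schedule, the page names, and the `page_for` function are assumptions made here and do not describe any existing AAC device.

```python
from datetime import time

# Hypothetical schedule of (start, end, page name); all values are assumptions.
SCHEDULE = [
    (time(6, 0), time(10, 0), "breakfast"),
    (time(11, 30), time(13, 30), "lunch"),
    (time(17, 0), time(19, 30), "dinner"),
]
DEFAULT_PAGE = "core vocabulary"  # shown when no time window matches


def page_for(now: time) -> str:
    """Return the vocabulary page whose time window contains `now`."""
    for start, end, page in SCHEDULE:
        if start <= now < end:  # window includes start, excludes end
            return page
    return DEFAULT_PAGE
```

In a sketch like this the user or clinician would edit the schedule, and the device would still let the user move to any other page manually.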

DISPLAY TECHNOLOGY

  • In large social settings, some users currently utilize display for communicating privately. This solution is not acceptable for bright or sunlit environments.
  • Utilizing both speech and text output may make it easier for unfamiliar interlocutors to understand speech. Interlocutor can use single display positioned for user, but this may not be as effective because the communication partner must re-position himself (i.e. lean over shoulder) to read screen, and this may invade the user's space. A dual display would facilitate dual mode communication by making it easier for interlocutor to view text.
  • Learning one device should generalize to other devices. Customization, symbols, options, layout, operation capabilities, interface capabilities, controls etc. should all have similarities.
  • Need for backlighting on keys.
  • Need to read displays in brightly lit environments (need non-glare coatings/shields for displays).
  • Need ability to customize display (i.e. change font size, color, and style)
  • Need display that can be read in sunlight.
  • Need display that will not create a physical barrier between two communication partners.
  • Need age-friendly displays. Some printed displays are too text based for children, many would like integrated pictures and text.
  • The speaker should face the communication partner.

SPEECH OUTPUT

  • AAC devices need higher voice quality. Speech output should sound natural and human, not computer-generated or synthesized.
  • Need ability to change gender, quality, intonation, and inflection in voice.
  • Need significant improvement to female voices. (Female voice personalities should be at least of equal quality to DECtalk™'s "Perfect Paul".) Not many people utilize a female voice. Female users must currently sacrifice femininity of voice for a man's voice with greater clarity and quality.
  • In a fighter airplane, the warning system is given in a female voice, since normal voices are considered to be masculine, and use of a female voice should make it stand out as an alert.
  • Voice output should be at least of equal quality to the phone recordings that are supposed to sound computerized.
  • Automated voice response systems (i.e. those used to check financial account information, university grades, investments, and automated answering systems, etc.) are a specific problem for AAC users because of time constraints. The user can't respond to prompts in real-time and so is kicked off the system. The user must call the auto-response system two or more times to prepare responses to each prompt. This accommodation does not work well for long or complex automated systems.
  • Need ability to sing, tell jokes, and be sarcastic.
  • To accommodate for the device's inability to support intonation and emphasis, consumers use repetition of words (double or triple hitting a word to make a point, for example, "I don't don't don't want it" can substitute for the inflection).
  • People are identified by their AAC voice, therefore quality is important. (People link factors such as identity, intelligence, humor, etc., to a person's voice.)
  • There is a need for individualization and the expression of emotion. (A person may hear one voice from down the hall, and they think it's their friend "Bob", but realize it is some other person also using the "Perfect Paul" voice.)
  • Devices should take advantage of digitized speech for intonation changes. Need digital quality voice that is varied by synthesized speech technology to achieve intonation, emphasis, and stress.
  • Auto-response systems understand synthesized speech better than natural speech. Synthesized speech is reproducible, invariant, clear, and monotone and therefore has higher recognition and better accuracy than recognition of a human's voice.
  • Users and clinicians often are not taught how to customize voices.
  • Need ability to choose from (and adapt) a wider selection of natural voices.
  • User needs the ability to quickly and easily select or change the voice personality for different contexts. Many consumers will switch voices when they are not understood. Some choose a male voice for a specific context (e.g. when using the telephone).
  • Need voices with regional and international accents.
  • Need ability to program slang terms with accuracy. Device doesn't know how to pronounce slang, and even when words are added it is difficult to customize pronunciation. New words can be placed into the device's dictionary, but slang usage requires elongated vowels or varied pitch that is not available.
  • Once a word has been added to the dictionary, it is a complex process to tell the device how you want the word pronounced (syntax, prosody, etc)
  • Pronunciation needs to depend on context of sentence, not just a properly pronounced word standing alone.
  • Messages often lose their power [emotional appeal, fallacy, manipulation, sarcasm, etc.] due to incorrect pronunciation.
  • Those who speak naturally add emphasis to their message and can transition smoothly to different words in the sentence. Similarly, there is a need for voice output that allows context to influence word production and pronounce it accordingly (e.g. read, for different tenses). Context recognition should account for the proper pronunciation.
  • Voice output should, when possible and desired, utilize the AAC user's "pre-injury" voice. For acquired, progressive diseases, people want the option to store their old, but own, voice for later use in their device.
  • Need intuitive, easy to use control for instantaneously changing amplification, speed, inflection, volume and emphasis. For example, adding information as the user composes a message without compromising rate of communication.
  • Need for real-time output. People tend to look at the display because of the delay in speech output.
  • Users need more choices in, and should have full control of, speed variability.
  • Privacy is compromised due to volume range of speech output. Problem environments include bars or other noisy social settings. Participants expressed difficulty participating in any large social activities.
  • Need to control speed and volume of voice output to facilitate communication in diverse environments. In groups, consumers sometimes switch from speech output to using their text display because the speed and volume does not facilitate natural conversation.
  • Need ability to increase volume of output to overcome degree of environmental noise.
  • Need for correct selection of words, individual word recognition and phrasing, and pauses between sentences.
  • Need to insert pauses to facilitate understanding of phrases, especially when giving speeches. Currently spaces are inserted only at sentence ends. (e.g. user may want to insert a pause after "but".) Some consumers slow speech way down to make it more understandable, by introducing spaces so people understand word by word or in short phrases.
  • Communications (i.e. sentences or paragraphs) should include proper spacing, pacing, prosody, etc. A specific problem is prepared speech - preparing speech ahead of time is only effective if understood.
  • Loudness and direction of sound output should be controllable to facilitate communication in different environments (e.g. in cars, buses and vans, in classrooms when facing forward, and anytime when the communication partner is not directly in front of device.) Note: Better control of loudness of output also improves privacy.
  • Need AAC speaker accessory on power chairs.
  • Phonetic content should align automatically.
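
The need for user-placed pauses (for example, a pause after "but") could be illustrated with a small text pre-processing step that adds a pause marker after chosen words before the message is spoken. This is a sketch under stated assumptions: the `<pause>` marker and the `add_pauses` function are hypothetical, and a real synthesizer would use its own pause syntax.

```python
PAUSE = "<pause>"  # hypothetical marker; not the syntax of any real synthesizer


def add_pauses(message: str, pause_after: set[str]) -> str:
    """Insert PAUSE after every word listed in `pause_after`."""
    out = []
    for word in message.split():
        out.append(word)
        # Compare without trailing punctuation and without regard to case.
        if word.strip(".,;:!?").lower() in pause_after:
            out.append(PAUSE)
    return " ".join(out)
```

For example, `add_pauses("I like it but I can't afford it", {"but"})` places one marker directly after "but", which is where a speaker giving a prepared speech might want the listener to catch up.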

GENERAL

  • AAC should support increased independence.
  • AAC should allow the user to freely communicate.
  • AAC should better support socialization skills (e.g. allow user to be comfortable in initiating conversation, allow user to go into depth in conversation, to negotiate, express feelings, show enthusiasm etc.)
  • AAC should open employment opportunities and help the user achieve success.
  • AAC should facilitate language acquisition.
  • AAC should fit in with and support standard educational structures and methodology for literacy and language education. There must be a clear relationship between symbols and language.
  • AAC device should eliminate false selections due to accidental touches of input device. Accidental touches may produce vocabularies not meant for conversation that get in the way of proper, efficient, flowing communication. Dwell time, pressure, dual switch selection, etc. provide potential solutions to this problem.
  • AAC should support private communication.
  • AAC should facilitate telephone communication. Participants noted that some AAC users are often mistaken for telemarketers due to the delay in initial communication, and hung up on.
  • There is a general difficulty in interfacing AAC device to other electronic devices such as a telephone.
  • Need ability to use AAC device as control interface for PCs. (Users noted problems controlling PCs from their current AAC devices.)
  • Overall rate of communication (including all aspects of communication from input through output) should be faster. Participants addressed specific difficulties with touch screen input due to the reaction, or refresh, time of dynamic pages.
  • Device should be portable (i.e. for transportation in car, in plane, etc.).
  • Device should be smarter.
  • Device should give user the option to enable and disable certain features, functions, and programs of the device as desired.
  • Need longer lasting battery.
  • Device should be waterproofed for inclement weather (or, participants noted, when sitting by a waterfall).
  • Light-pointer should be waterproof (producers should test for and provide evidence that waterproofing is successful in all conditions).
  • Devices should be made more usable (e.g. learn, use, optimize). Currently the burden of responsibility is placed on clinicians and users.
  • Less time should be required to learn to customize devices. The time associated with reading and learning complex manuals discourages clinicians from learning the device's full capabilities and they often specialize in one product. This time factor is a barrier to optimizing the device for the user and often discourages clinicians from learning multiple systems.
  • Less time should be required for users to learn devices. Participants noted that currently one learns the device by "playing around" with the functions.
  • Devices should be intuitive and easy to learn, set up, and operate, without making users rely on the manual or spend hours learning it.
  • AAC devices should be more alike so that clinicians don't need to re-learn everything. Participants noted analogy to car, in which some things are basic to all. This change would significantly increase consumer choices. If more similar, clinicians can spend more time customizing for each user.
  • AAC users and clinicians should drive the process of AAC design and development rather than being evaluators of the products brought into the marketplace by manufacturers.
  • AAC devices and manuals are often complex and poorly documented, making it difficult for clinicians to set up, optimize and anatomize.
  • Clinicians should not have to be programmers in order to optimize AAC devices. Device optimization should be done within the clinical intervention.
  • Should be easy for caregivers and non-specialists (including family, friends, etc) to use and to assist in device customization.
  • Device training should be easier. Participants commented that clinicians now find AAC device training difficult.
  • Training manuals currently are written to have the device or software in front of you. (Note: A possible solution is training via CD or Internet simulation.)
  • Timeframe for the release of new technologies is too long.
  • AAC manufacturers should follow the Microsoft model - develop and share open operating system and support software development.
  • Need calculator, clock, calendar, date, etc. capabilities.
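
Dwell time is named above as one possible way to reject accidental touches. A minimal sketch of that idea follows, assuming each touch is recorded as a key name with press and release times in seconds. The data layout, the `accepted_selections` function, and the 0.5 second threshold are all assumptions for illustration, not values taken from any device.

```python
def accepted_selections(touches, dwell_s=0.5):
    """Keep only the keys that were held for at least `dwell_s` seconds.

    `touches` is a list of (key, press_time, release_time) tuples.
    """
    return [key for key, down, up in touches if up - down >= dwell_s]


# Example input: the middle touch is a brief brush and is filtered out.
touches = [("hello", 0.0, 0.8), ("bye", 1.0, 1.1), ("yes", 2.0, 2.5)]
selected = accepted_selections(touches)
```

The threshold would need to be adjustable per user, since a longer dwell time lowers false selections but also slows the overall rate of communication.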


2. State-of-the-Practice (current technology, strengths, weaknesses, etc.)

CONTEXT RECOGNITION

  • Automatic volume sensor (Note: Currently available in cars [1])
  • Currently some laptop displays and some televisions have ambient light sensors. (Note: Ambient light sensors are light sensors at the top of the monitor that gauge ambient light in the work environment and automatically adjust the brightness of the monitor for optimum viewing. This takes away the frequent and tedious task of manually adjusting brightness and contrast on the screen; it is particularly beneficial in environments where light in the office is subject to change throughout the day. Currently found on the Compaq iPaq H3600, a personal digital assistant. [2])
  • Manufacturers Bose and Bang & Olufsen offer quality speakers which could be appropriate for AAC.

DISPLAY TECHNOLOGY

  • Dual mode (speech and text) output enhances communication and understanding because the interlocutor has text to follow along with.
  • The two-way screen currently available on Zygo's LightWRITER™ has a user-controlled display that keeps the interlocutor from jumping to conclusions about what the user is trying to say. (For example, the user may wish to have the words pop up one-by-one to engage the interlocutor in the conversation, and allow friends or family to guess the progression of the sentences to speed up communication. On the other hand, the user may wish to wait until they have completed their thought before the dual display reads out the sentence to the interlocutor.)
  • Technology has evolved from single scan to dual scan to active matrix screens. Effect is increased contrast and resolution especially for daylight viewing.
  • When laptop PCs are used as AAC devices, the screen gets in the way of two communication partners.
  • The display appears dim and is not easily read when outside, in bright areas or in sunlit classrooms.
  • Old [liquid] crystal display has "light up" feature for viewing in dark locations.
  • Most AAC devices lack key back-lighting.
  • Back-lit keys (on keyboard or touchscreen) are convenient for the evening.
  • Display print should be modified (as in Pathfinder).

SPEECH OUTPUT

  • DECtalk™ developed software-only version of their speech synthesizer that allowed software developers to modify and build upon this software.
  • DECtalk™'s "Perfect Paul" voice is most intelligible, recognizable and most widely used. Many consumers use this male voice regardless of their sex.
  • DECtalk™ is currently the standard for speech output on AAC.
  • Limited choices in voice output but can choose between different voice personalities, and change those as desired. (Note: By personality, the group meant those personalities of the speech synthesis machine MITalk, designed by Dennis Klatt at the MIT Speech Lab and currently marketed by the Digital Equipment Corporation, DECtalk™. These personalities include "Perfect Paul", "Huge Harry", "Whispering Wendy", "Frail Frank", "Dr. Dennis", "Beautiful Betty" and "Kit the Kid".)
  • Some devices support voice customization using macros for intonation, emphasis, tone and prosody, which goes along with toolbar to make device more powerful and user-friendly. (Note: A macro is a sequence of letters or commands run from one voice command. One example is 'asap': you say 'asap' and the computer types 'as soon as possible'. Another might be 'quick backup', where the computer changes all of the settings for a quick backup. This is done by recording keystrokes.)
  • Devices have both synthesized and digital speech output. Newer synthesized speech technology is available (see SpeechWorks).
  • Eloquent Technologies, a division of SpeechWorks, developed ETI-Eloquence, a concatenation-based speech software. Can highlight words for emphasis using "toolbar", has different pitch patterns, detects dialectal differences, and is available in 13 different languages.
  • Systems don't recognize and pronounce words properly even when words are spelled correctly - "limited dictionary".
  • Difficult to add words to dictionary.
  • In some systems there is great flexibility in changing device parameters (adding words, changing volume level, stress and emphasis, customization of vocabulary), however to do so requires extensive programming.
  • For devices with synthesized speech, non-speech sounds can be recorded to enhance communication. For synthesized and digitized devices, specialized speech sounds (three stooges for example) are incorporated to provide additional inflection and emotion.
  • With some devices you have the capability of recording a "database" of speech with popular words and phrases. (Note: This could partly address limited word dictionaries to support local slang and dialect.)
  • AAC voice output has little capability for voice intonation.
  • Language modules are available for some languages. Modules are not available for Asian and Arabic languages that work with the eye gaze system.
  • Female voice is inferior due to pitch range.
  • For telephone conversations in which there is dead time and often hang-ups, the "speak on entry" function is used and usually grabs the attention of the person you are conversing with.
  • Various language options should be explored including AT&T speech synthesizers that provide voices in 58 languages, as well as banking and ATM machines which offer a number of language options.
  • Most of current research goes toward speech recognition rather than speech synthesis.
  • Speech recognition systems are not powerful enough to accurately recognize non-standard speech (e.g. dysarthric speech).
  • Eye-gaze system from LC Technologies uses synthesized speech and will continue using synthesized speech.
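
The macro note above ('asap' expanding to 'as soon as possible') can be pictured as a simple lookup table applied word by word. The sketch below is only an illustration of that idea; the `MACROS` table and the `expand` function are assumptions and do not represent how any particular device stores or runs its macros.

```python
# Hypothetical macro table; the single entry comes from the example in the note.
MACROS = {"asap": "as soon as possible"}


def expand(text, macros=MACROS):
    """Replace each word that is a macro name with its expansion."""
    # Lookup ignores case; words that are not macros pass through unchanged.
    return " ".join(macros.get(word.lower(), word) for word in text.split())
```

A settings macro such as 'quick backup' would instead map a name to a recorded sequence of keystrokes or commands, which this text-only sketch does not cover.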

GENERAL

  • Some devices support wireless, high speed data-transfer via infrared data ports (ex. DynaBeam). Note: DynaBeam consists of an infrared receiver and cables which allow a DynaVox or DynaMyte user to access Mac and PC computers (both keyboard and mouse) by simply pointing the device's infrared transmitter at the DynaBeam's receiver.
  • Some devices are supporting wireless networking (ex. Gemini from Assistive Technology Inc.).
  • Some AAC devices support word processing, Internet access, data transfer to/from PC etc.
  • Hardware should have increased processing speed and increased memory. Need 166 MHz. More powerful hardware is needed to run more powerful software (ex. 200 MHz might be necessary to run more complex programming such as Gus software from Gus Communications).
  • There is a time delay in selecting item and page coming up. This delay reflects language processing time and display-refresh time. Increased processor speed and memory will reduce both.
  • Many AAC systems use word prediction to increase communication rate and optimize vocabulary selection.
  • There exists a significant gap between current devices for small AAC markets (orphan product market) and technology readily available in large markets (and whether compatible for AAC). Participants stated that essentially the higher-ups in industry have the ability (political clout) to press to get new technology out faster than the current turn-around.
  • The AAC market is small but is expanding to the Amyotrophic Lateral Sclerosis and Autism populations.
  • AAC manufacturers should consider partnerships with mainstream companies (i.e. Lucent Technologies, cell phone companies, tablet computer companies, etc)
  • AAC developers should consider partnerships with Internet kiosk developers.


3. Needed Technology (refinements, innovations, etc.)

CONTEXT RECOGNITION

  • Need automatic volume control capabilities (turning up/down) that user can override manually.
  • Need to utilize context recognition (cultural, local and physical recognition for language processing and voice production) in order to improve rate and quality of communication.
  • Recognition needs to take place in real time; i.e. should not introduce communication delays.
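
The first item above (automatic volume control that the user can override manually) could be sketched as follows. The function name, the 10 dB margin over measured noise, and the 50 to 85 dB output limits are all assumptions chosen for illustration; they are not recommendations from the Forum or measured values.

```python
def output_volume(noise_db, manual_volume=None,
                  margin_db=10.0, min_db=50.0, max_db=85.0):
    """Pick a speech output level in dB from the measured ambient noise level.

    A manual setting, when present, always takes priority over the automatic one.
    """
    if manual_volume is not None:
        return manual_volume
    # Stay a fixed margin above the noise, clamped to the allowed range.
    return max(min_db, min(max_db, noise_db + margin_db))
```

Keeping the override as the first check reflects the participants' point that the user, not the device, should have the final say over loudness, which also matters for privacy.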

DISPLAY TECHNOLOGY

  • Need universal wireless capabilities including remote displays, monitors and speakers (e.g. Ability to have wireless input so that user can access input device while in bed, a remote wireless display to communicate with teacher across classroom, remote speaker to contact caregiver in another room.)
  • Users need separate composition display that only user can see. Some participants suggested a glasses mounted display or eyepiece screen as a possible location.
  • User should have choice between remote or on-machine display (optional).

SPEECH OUTPUT

  • Need natural sounding speech.
  • Need volume and range of output to match human range
  • Need to utilize user's own voice when possible and desired. Use pre-injury quality and tone (ex. Answering machine voice clips could be used as samples). Use voice as starting point and vary from there.
  • Some users may benefit from a manual voice control, for example, a tone bender/tone pedal, as in a piano.
  • Need synthesized speech for international languages including Asian and Arabic.
  • Need to broaden frequency range (both higher and lower frequencies) of speech production (applies to digitized or synthesized).
  • Speaker placement directs sound away from interlocutor (e.g. speakers located on bottom or rear of device). Need to control sound directionality for different environments (in car, directed to your interlocutor who may be to your right). By directing sound, it allows user to lower volume and improve privacy.
  • Need real time conversation. Shorten delays in responding to dialogue.
  • Need wearable speaker array to support directional speech output and bring focus of listener attention from AAC device to user.
  • Increase quality of speakers.

GENERAL

  • Integration of email, cell phones, Internet capabilities, etc. into device.
  • Need wireless control of environment for access to cell phones, Internet, television, PC, speakerphones etc. Need device to be fully integrated with wireless environment (i.e. have wireless transmitters to deliver contextual information to AAC device.) Need device to be able to access household products such as refrigerator door, microwave, window shades, temperature and humidity control, oven etc.
  • Device needs wireless access to networks and Internet (ex. via wireless modems).
  • Need to connect AAC device directly to telephone line for increased privacy (AAC transmitted directly through phone line).
  • Need AAC device to have a stable operating system.
  • AAC users would like AAC software packages to be supported across AAC hardware platforms (i.e. Microsoft model).
  • Some participants believed Windows CE-based systems would be a good AAC device platform.
  • Users would like to load AAC software packages in order to customize to meet their communication needs (i.e. Microsoft model).
  • Software upgrades should be available. Device should be automatically notified of available software (e.g. via Internet).
  • Need extended battery life.
  • Improved input interface is needed. Input technology should not limit communication rate.
  • Need improved user interface design (test for usability, ergonomics, performance, operation, learnability etc.)
  • Need Beta testing and structured user trials for all devices.
  • Device should be affordable without third party reimbursement
  • System must be durable (to heat, humidity, vibrations, dust, water etc.).
  • Needs to be portable. Note: portability is dependent on whether device is being carried or mounted to wheelchair or other location.
  • Need quantitative performance data (e.g. Prentke-Romich Company's Language Activity Monitor, LAM, and the RERC on Communication Enhancement's Augmentative Communication Quantitative Analysis, or ACQUA). These assessment techniques record (LAM) and analyze (LAM and ACQUA) such things as character, word, and sentence selection; speed; rate; duration; efficiency etc.
  • Need performance monitoring to be user-controlled, including the ability to turn on/off and edit as desired.
  • Need 24 hour support and guarantee for any device.
  • Participation should occur in real time and delays in dialogue should be eliminated.


4. Barriers (to obtaining technology, to developing technology, etc.)

  • High purchase cost.
  • Compromises with Medicare. Lack of full coverage of all AAC devices complicates the process of device recommendation.
  • The assistive technology and AAC markets are small therefore profits are low and less funding is allocated to research and development.
  • The inability to provide developers and programmers a comparable salary to other mainstream markets (e.g. video game industry) limits their willingness to obtain employment in small markets such as augmentative communication.
  • Small markets such as AAC are not getting the attention of manufacturers with technological capabilities (such as phone, Internet companies).
  • Larger corporations are not specifically concerned with AAC users, nor in expanding their markets to include AAC.
  • AAC products require a long design time, while mainstream technologies such as cell phones are always changing. It is difficult for small companies to keep up.
  • Manufacturers don't consider it a high priority to improve speech synthesis.
  • Societal stereotypes and attitudes hold that current speech-producing devices are not as good as natural voices.
  • Massive cultural change is needed to get away from societal misconceptions regarding AAC.
  • Public awareness needed
  • Federal laws and restrictions.
  • Technologies exist that have transfer capability, but incorporation into products takes time and money, and a willingness of the large mainstream companies to invest.
  • Need increased computer power to handle more powerful speech recognition software.
  • Dedicated vs. non-dedicated systems. Need to run software on off-the-shelf PC and related platforms.
  • There are products with more capabilities than people are aware of.
  • Eye gaze systems use synthesized speech only.
  • Restrictive vocabulary for speech recognition, high error rates
  • User has low expectations for technology
  • Off the shelf software needs more sophisticated end user
  • Speech Language Pathologists don't often recommend AAC devices
  • AT vendors are not aware of delivery channel - not working with Speech Language Pathologists
  • Self-identification of current knowledge and its application
  • Voluntary information transfer


References

  1. Fujitsu. "Automatic Volume Noise-Sensor-Equipped Volume Control: Using Noise Level In Automobiles to Adjust Sound Level of Audio Systems for Greater Listening Pleasure." August 1997.
  2. Piros, William. "Compaq iPaq H3650 Review." August 23, 2001. [Online: www.neoseeker.com/Articles/Hardware/Reviews/compaqh3650/2.html]
