The following is the raw data collected during the T2RERC's
Stakeholder Forum. It reflects the comments and needs as expressed by
the Forum participants.
1. Needs (Unmet needs of consumers, clinicians, etc.)
CONTEXT RECOGNITION
- Need context recognition (automatic knowledge of time and space)
that makes appropriate vocabulary available and easily accessible.
- Device should respond to communication environment (e.g. activities,
environments, group members, work, home, and educational situations)
and should change performance with context.
- When accessing computer applications, a different voice should
be associated with each program so that active programs can be identified
just by voice. (Similar to text readers that use different voices
when different applications are active.)
- Need individualized, pre-programmed prompts for different situations.
- Need voice cues for the interlocutor (i.e. the interlocutor pays
attention to a louder "speaking voice" and ignores the user's "composition
voice").
- Voices need ability to automatically adapt in response to AAC
processing.
- Need automatic set-up of device features for various environments
(i.e. change voices, loudness, etc).
- Need automatic volume control that adjusts to environmental noise
level.
- Need speech recognition to obtain contextual information from
interlocutor speech for the prediction of vocabulary. (Note: Substantial
privacy issues may result from the recording of another's voice
or discussion.)
- Need clock for time-contextual information (i.e. 8 AM pulls up
breakfast food page, etc.)
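The clock-based need above can be illustrated with a minimal sketch. It assumes a hypothetical schedule table that maps time windows to vocabulary pages; only the 8 AM breakfast example comes from the forum notes, and the other rows, the page names, and the `page_for` function are placeholders, not features of any existing AAC device.

```python
from datetime import time

# Hypothetical schedule: (window start, window end, vocabulary page).
# Only the breakfast row reflects the forum example; the rest are placeholders.
SCHEDULE = [
    (time(7, 0), time(9, 30), "breakfast foods"),
    (time(11, 30), time(13, 30), "lunch foods"),
    (time(17, 0), time(19, 0), "dinner foods"),
]
DEFAULT_PAGE = "home"


def page_for(now: time) -> str:
    """Return the vocabulary page whose time window contains `now`."""
    for start, end, page in SCHEDULE:
        if start <= now < end:
            return page
    # Outside every window, fall back to the default page.
    return DEFAULT_PAGE
```

With this table, `page_for(time(8, 0))` selects the breakfast page, and any time outside the listed windows falls back to the default page. A real device would presumably let the user or clinician edit the schedule.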
DISPLAY TECHNOLOGY
- In large social settings, some users currently utilize display
for communicating privately. This solution is not acceptable for
bright or sunlit environments.
- Utilizing both speech and text output may make it easier for unfamiliar
interlocutors to understand speech. Interlocutor can use single
display positioned for user, but this may not be as effective because
the communication partner must re-position himself (i.e. lean over
shoulder) to read screen, and this may invade the user's space.
A dual display would facilitate dual mode communication by making
it easier for interlocutor to view text.
- Learning one device should generalize to other devices. Customization,
symbols, options, layout, operation capabilities, interface capabilities,
controls etc. should all have similarities.
- Need for backlighting on keys.
- Need to read displays in brightly lit environments (need non-glare
coatings/shields for displays).
- Need ability to customize display (i.e. change font size, color,
and style)
- Need display that can be read in sunlight.
- Need display that will not create a physical barrier between two
communication partners.
- Need age-friendly displays. Some printed displays are too text
based for children, many would like integrated pictures and text.
- The speaker should face the communication partner.
SPEECH OUTPUT
- AAC devices need higher voice quality. Speech output should sound
natural and human, not computer-generated or synthesized.
- Need ability to change gender, quality, intonation, and inflection
in voice.
- Need significant improvement to female voices. (Female voice personalities
should be at least of equal quality to DECtalk™'s
"Perfect Paul".) Not many people utilize a female voice. Female
users must currently sacrifice femininity of voice for a man's voice
with greater clarity and quality.
- In a fighter airplane, the warning system is given in a female
voice, since normal voices are considered to be masculine, and use
of a female voice should make it stand out as an alert.
- Voice output should be at least of equal quality to the phone
recordings that are supposed to sound computerized.
- Automated voice response systems (i.e. those used to check financial
account information, university grades, investments, and automated
answering systems, etc.) are a specific problem for AAC users because
of time constraints. The user can't respond to prompts in real-time
and so is kicked off the system. The user must call the auto-response
system two or more times to prepare responses to each prompt. This
accommodation does not work well for long or complex automated systems.
- Need ability to sing, tell jokes, and be sarcastic.
- To accommodate for the device's inability to support intonation
and emphasis, consumers use repetition of words (double or triple
hitting a word to make a point, for example, "I don't don't don't
want it" can substitute for the inflection).
- People are identified by their AAC voice, therefore quality is
important. (People link factors such as identity, intelligence,
humor, etc., to a person's voice.)
- There is a need for individualization and the expression of emotion.
(A person may hear one voice from down the hall, and they think
it's their friend "Bob", but realize it is some other person also
using the "Perfect Paul" voice.)
- Devices should take advantage of digitized speech for intonation
changes. Need digital quality voice that is varied by synthesized
speech technology to achieve intonation, emphasis, and stress.
- Auto-response systems understand synthesized speech better than
natural speech. Synthesized speech is reproducible, invariant, clear,
and monotone and therefore has higher recognition and better accuracy
than recognition of a human's voice.
- Users and clinicians often are not taught how to customize voices.
- Need ability to choose from (and adapt) a wider selection of natural
voices.
- User needs the ability to quickly and easily select or change
the voice personality for different contexts. Many consumers will
switch voices when they are not understood. Some choose a male voice
for a specific context (e.g. when using the telephone).
- Need voices with regional and international accents.
- Need ability to program slang terms with accuracy. Device doesn't
know how to pronounce slang, and even when words are added it is
difficult to customize pronunciation. New words can be placed into
the device's dictionary, but slang usage requires elongated vowels
or varied pitch that is not available.
- Once a word has been added to the dictionary, it is a complex
process to tell the device how you want the word pronounced (syntax,
prosody, etc)
- Pronunciation needs to depend on context of sentence, not just
a properly pronounced word standing alone.
- Messages often lose their power [emotional appeal, fallacy,
manipulation, sarcasm, etc.] due to incorrect pronunciation.
- Those who speak naturally add emphasis to their message and can
transition smoothly to different words in the sentence. Similarly,
there is a need for voice output that allows context to influence
word production and pronounce it accordingly (e.g. read, for different
tenses). Context recognition should account for the proper pronunciation.
- Voice output should, when possible and desired, utilize the AAC
user's "pre-injury" voice. For acquired, progressive diseases, people
want the option to store their old, but own, voice for later use
in their device.
- Need intuitive, easy to use control for instantaneously changing
amplification, speed, inflection, volume and emphasis. For example,
adding information as the user composes a message without compromising
rate of communication.
- Need for real-time output. People tend to look at the display
because speech output is delayed.
- Users need more choices in, and should have full control of, speed
variability.
- Privacy is compromised due to volume range of speech output. Problem
environments include bars or other noisy social settings. Participants
expressed difficulty participating in any large social activities.
- Need to control speed and volume of voice output to facilitate
communication in diverse environments. In groups, consumers sometimes
switch from speech output to using their text display because the
speed and volume does not facilitate natural conversation.
- Need ability to increase volume of output to overcome degree of
environmental noise.
- Need for correct selection of words, individual word recognition
and phrasing, and pauses between sentences.
- Need to insert pauses to facilitate understanding of phrases,
especially when giving speeches. Currently spaces are inserted only
at sentence ends. (e.g. user may want to insert a pause after "but".)
Some consumers slow speech way down to make it more understandable,
by introducing spaces so people understand word by word or in short
phrases.
- Communications (i.e. sentences or paragraphs) should include proper
spacing, pacing, prosody, etc. A specific problem is prepared speech
- preparing speech ahead of time is only effective if understood.
- Loudness and direction of sound output should be controllable
to facilitate communication in different environments (e.g. in cars,
buses and vans, in classrooms when facing forward, and anytime when
the communication partner is not directly in front of device.) Note:
Better control of loudness of output also improves privacy.
- Need AAC speaker accessory on power chairs.
- Phonetic content should align automatically.
GENERAL
- AAC should support increased independence.
- AAC should allow the user to freely communicate.
- AAC should better support socialization skills (e.g. allow user
to be comfortable in initiating conversation, allow user to go into
depth in conversation, to negotiate, express feelings, show enthusiasm
etc.)
- AAC should open employment opportunities and help the user achieve
success.
- AAC should facilitate language acquisition.
- AAC should fit in with and support standard educational structures
and methodology for literacy and language education. There must
be a clear relationship between symbols and language.
- AAC device should eliminate false selections due to accidental
touches of input device. Accidental touches may produce vocabularies
not meant for conversation that get in the way of proper, efficient,
flowing communication. Dwell time, pressure, dual switch selection,
etc. provide potential solutions to this problem.
- AAC should support private communication.
- AAC should facilitate telephone communication. Participants noted
that some AAC users are often mistaken for telemarketers due to
the delay in initial communication, and hung up on.
- There is a general difficulty in interfacing AAC device to other
electronic devices such as a telephone.
- Need ability to use AAC device as control interface for PC's.
(Users noted problems controlling PCs from their current AAC devices.)
- Overall rate of communication (including all aspects of communication
from input through output) should be faster. Participants addressed
specific difficulties with touch screen input due to the reaction,
or refresh, time of dynamic pages.
- Device should be portable (i.e. for transportation in car, in
plane, etc.).
- Device should be smarter.
- Device should give user the option to enable and disable certain
features, functions, and programs of the device as desired.
- Need longer lasting battery.
- Device should be waterproofed for inclement weather (or, participants
noted, when sitting by a waterfall).
- Light-pointer should be waterproof (producers should test for
and provide evidence that waterproofing is successful in all conditions).
- Devices should be made more usable (e.g. learn, use, optimize).
Currently the burden of responsibility is placed on clinicians and
users.
- Less time should be required to learn to customize devices. The
time associated with reading and learning complex manuals discourages
clinicians from learning the device's full capabilities and they
often specialize in one product. This time factor is a barrier to
optimizing the device for the user and often discourages clinicians
from learning multiple systems.
- Less time should be required for users to learn devices. Participants
noted that currently one learns the device by "playing around" with
the functions.
- Devices should be intuitive and easy to learn, set up, and operate,
so that users do not have to rely on the manual or spend hours
studying it.
- AAC devices should be more alike so that clinicians don't need
to re-learn everything. Participants noted analogy to car, in which
some things are basic to all. This change would significantly increase
consumer choices. If more similar, clinicians can spend more time
customizing for each user.
- AAC users and clinicians should drive the process of AAC design
and development rather than being evaluators of the products brought
into the marketplace by manufacturers.
- AAC devices and manuals are often complex and poorly documented,
making it difficult for clinicians to set up, optimize and anatomize.
- Clinicians should not have to be programmers in order to optimize
AAC devices. Device optimization should be done within the clinical
intervention.
- Should be easy for caregivers and non-specialists (including family,
friends, etc) to use and to assist in device customization.
- Device training should be easier. Participants commented that
clinicians now find AAC device training difficult.
- Training manuals currently are written to have the device or software
in front of you. (Note: A possible solution is training via CD or
Internet simulation.)
- Timeframe for the release of new technologies is too long.
- AAC manufacturers should follow the Microsoft model - develop
and share open operating system and support software development.
- Need calculator, clock, calendar, date, etc. capabilities.
2. State-of-the-Practice (current technology, strengths, weaknesses, etc.)
CONTEXT RECOGNITION
- Automatic volume sensor (Note: Currently available in cars [1])
- Currently some laptop displays and some televisions
have ambient light sensors. (Note: Ambient light sensors are
light sensors at the top of the monitor that gauge ambient light
in the work environment and automatically adjust the brightness
of the monitor for optimum viewing. This takes away the frequent
and tedious task of manually adjusting brightness and contrast
on the screen; it is particularly beneficial in environments
where light in the office is subject to change throughout the day.
Currently found on the Compaq iPaq H3600, a personal digital assistant.
[2])
- Manufacturers Bose and Bang & Olufsen offer quality
speakers which could be appropriate for AAC.
DISPLAY TECHNOLOGY
- Dual mode (speech and text) output enhances communication
and understanding because the interlocutor has text to follow
along with.
- The two-way screen currently available on Zygo's
LightWRITER™ has a user-controlled display
that keeps the interlocutor from jumping to conclusions about
what the user is trying to say. (For example, the user may wish
to have the words pop up one-by-one to engage the interlocutor
in the conversation, and allow friends or family to guess the
progression of the sentences to speed up communication. On the
other hand, the user may wish to wait until they have completed
their thought before the dual display reads out the sentence to
the interlocutor.)
- Technology has evolved from single scan to dual
scan to active matrix screens. The effect is increased contrast and
resolution, especially for daylight viewing.
- When laptop PCs are used as AAC devices, the screen
gets in the way of two communication partners.
- The display appears dim and is not easily read
when outside, in bright areas or in sunlit classrooms.
- Old [liquid] crystal display has "light up" feature
for viewing in dark locations.
- Most AAC devices lack key back-lighting.
- Back-lit keys (on keyboard or touchscreen) are
convenient for the evening.
- Display print should be modified (as in Pathfinder).
SPEECH OUTPUT
- DECtalk™ developed a software-only version of their speech
synthesizer that allowed software developers to modify and build
upon this software.
- DECtalk™'s "Perfect Paul" voice is most intelligible,
recognizable and most widely used. Many consumers use this male
voice regardless of their sex.
- DECtalk™ is currently the standard for speech output on AAC
devices.
- Limited choices in voice output but can choose
between different voice personalities, and change those as desired.
(Note: By personality, the group meant those personalities of
the speech synthesis machine MITalk, designed by Dennis Klatt
at the MIT Speech Lab and currently marketed by the Digital Equipment
Corporation as DECtalk™. These personalities
include "Perfect Paul", "Huge Harry", "Whispering Wendy", "Frail
Frank", "Dr. Dennis", "Beautiful Betty" and "Kit the Kid".)
- Some devices support voice customization using
macros for intonation, emphasis, tone and prosody, which goes
along with toolbar to make device more powerful and user-friendly.
(Note: A macro is a sequence of letters or commands run from
one voice command. One example is 'asap'. You say 'asap' and
the computer types 'as soon as possible'. Another might be 'quick
backup', and the computer will change all of the settings for
a quick backup. This is done by recording keystrokes.)
- Devices have both synthesized and digital speech
output. Newer synthesized speech technology is available - see
SpeechWorks.
- Eloquent Technologies, a division of SpeechWorks,
developed ETI-Eloquence, a concatenation-based speech software.
It can highlight words for emphasis using "toolbar", has different
pitch patterns, detects dialectical differences, and is available
in 13 different languages.
- Systems don't recognize and pronounce words properly
even when words are spelled correctly - "limited dictionary".
- Difficult to add words to dictionary.
- In some systems there is great flexibility in
changing device parameters (adding words, changing volume level,
stress and emphasis, customization of vocabulary); however, to
do so requires extensive programming.
- For devices with synthesized speech, non-speech
sounds can be recorded to enhance communication. For synthesized
and digitized devices, specialized speech sounds (Three Stooges
for example) are incorporated to provide additional inflection
and emotion.
- With some devices you have the capability of recording
a "database" of speech with popular words and phrases. (Note:
This could partly address limited word dictionaries to support
local slang and dialect.)
- AAC voice output has little capability for voice
intonation.
- Language modules are available for some languages.
Modules are not available for Asian and Arabic languages that
work with the eye gaze system.
- Female voice is inferior due to pitch range.
- For telephone conversations in which there is
dead time and often hang-ups, the "speak on entry" function is
used and usually grabs the attention of the person you are conversing
with.
- Various language options should be explored including
AT&T speech synthesizers that provide voices in 58 languages,
as well as banking and ATM machines which offer a number of language
options.
- Most of current research goes toward speech recognition
rather than speech synthesis.
- Speech recognition systems are not powerful enough
to accurately recognize non-standard speech (e.g. dysarthric
speech).
- Eye-gaze system from LC Technologies uses synthesized
speech and will continue using synthesized speech.
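The macro note earlier in this list ('asap' expands to 'as soon as possible') can be sketched as a simple text expansion step applied to a composed message before it is spoken. This is only an illustration under stated assumptions: the `MACROS` table and the `expand_macros` function are hypothetical, and the sketch does not describe how any particular device implements macros.

```python
import re

# Hypothetical macro table; the 'asap' entry is the example from the note.
MACROS = {"asap": "as soon as possible"}


def expand_macros(message: str, macros: dict = MACROS) -> str:
    """Replace each whole word that matches a macro name with its expansion."""

    def replace(match: re.Match) -> str:
        word = match.group(0)
        # Look up case-insensitively; leave unknown words untouched.
        return macros.get(word.lower(), word)

    return re.sub(r"[A-Za-z']+", replace, message)
```

For example, `expand_macros("Call me asap please")` yields "Call me as soon as possible please". Matching whole words keeps a macro name from being expanded when it merely appears inside a longer word.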
GENERAL
- Some devices support wireless, high speed data-transfer
via infrared data ports (ex. DynaBeam). Note: DynaBeam consists
of an infrared receiver and cables which allow a DynaVox or DynaMyte
user to access Mac and PC computers (both keyboard and mouse)
by simply pointing the device's infrared transmitter at the DynaBeam's
receiver.
- Some devices are supporting wireless networking
(ex. Gemini from Assistive Technology Inc.).
- Some AAC devices support word processing, Internet
access, data transfer to/from PC etc.
- Hardware should have increased processing speed
and increased memory. Need 166 MHz. More powerful hardware is
needed to run more powerful software (ex. 200 MHz might be necessary
to run more complex programming such as Gus software from Gus
Communications).
- There is a time delay between selecting an item and the page
coming up. This delay reflects language processing time and display-refresh
time. Increased processor speed and memory will reduce both.
- Many AAC systems use word prediction to increase
communication rate and optimize vocabulary selection.
- There exists a significant gap between current
devices for small AAC markets (orphan product market) and technology
readily available in large markets (and whether compatible for
AAC). Participants stated that essentially the higher-ups in
industry have the ability (political clout) to press to get new
technology out faster than the current turn-around.
- The AAC market is small but is expanding to the
Amyotrophic Lateral Sclerosis and Autism populations.
- AAC manufacturers should consider partnerships
with mainstream companies (i.e. Lucent Technologies, cell phone
companies, tablet computer companies, etc).
- AAC developers should consider partnerships with
Internet kiosk developers.
3. Needed Technology (refinements, innovations, etc.)
CONTEXT RECOGNITION
- Need automatic volume control capabilities (turning up/down)
that user can override manually.
- Need to utilize context recognition (cultural, local and physical
recognition for language processing and voice production) in order
to improve rate and quality of communication.
- Recognition needs to take place in real time; i.e. should not
introduce communication delays.
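The first need in this list, automatic volume control that the user can override manually, can be sketched as follows. This is a minimal illustration, not a device specification: the function name `output_volume`, the decibel thresholds, and the volume percentages are all assumed values chosen for the example.

```python
from typing import Optional


def output_volume(noise_db: float,
                  manual_override: Optional[int] = None,
                  quiet_db: float = 40.0,   # assumed "quiet room" level
                  loud_db: float = 85.0,    # assumed "noisy venue" level
                  min_vol: int = 20,        # assumed volume (%) in quiet
                  max_vol: int = 100) -> int:
    """Map ambient noise (dB) to an output volume in percent.

    A manual setting, when present, always wins over the automatic value.
    """
    if manual_override is not None:
        return max(0, min(100, manual_override))
    # Clamp the reading to the calibrated range, then interpolate linearly.
    clamped = max(quiet_db, min(loud_db, noise_db))
    fraction = (clamped - quiet_db) / (loud_db - quiet_db)
    return round(min_vol + fraction * (max_vol - min_vol))
```

With these assumed defaults, a quiet room at 40 dB gives the minimum volume, a reading of 85 dB or more gives full volume, and passing `manual_override=30` returns 30 regardless of the noise reading. Because the mapping is a single arithmetic step, it would not add the kind of communication delay the last item in this list warns about.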
DISPLAY TECHNOLOGY
- Need universal wireless capabilities including remote displays,
monitors and speakers (e.g. ability to have wireless input so
that user can access input device while in bed, a remote wireless
display to communicate with teacher across classroom, remote
speaker to contact caregiver in another room).
- Users need separate composition display that only user can
see. Some participants suggested a glasses mounted display or
eyepiece screen as a possible location.
- User should have choice between remote or on-machine display
(optional).
SPEECH OUTPUT
- Need natural sounding speech.
- Need volume and range of output to match human range.
- Need to utilize user's own voice when possible and desired.
Use pre-injury quality and tone (ex. answering machine voice
clips could be used as samples). Use voice as starting point
and vary from there.
- Some users may benefit from a manual voice control, for example,
a tone bender/tone pedal, as in a piano.
- Need synthesized speech for international languages including
Asian and Arabic.
- Need to broaden frequency range (both higher and lower frequencies)
of speech production (applies to digitized or synthesized).
- Speaker placement directs sound away from interlocutor (e.g.
speakers located on bottom or rear of device). Need to control
sound directionality for different environments (in car, directed
to your interlocutor who may be to your right). Directing
sound allows the user to lower volume and improve privacy.
- Need real time conversation. Shorten delays in responding to
dialogue.
- Need wearable speaker array to support directional speech output
and bring focus of listener attention from AAC device to user.
- Increase quality of speakers.
GENERAL
- Integration of email, cell phones, Internet capabilities, etc.
into device.
- Need wireless control of environment for access to cell phones,
Internet, television, PC, speakerphones etc. Need device to
be fully integrated with wireless environment (i.e. have wireless
transmitters to deliver contextual information to AAC device).
Need device to be able to access household products such as
refrigerator door, microwave, window shades, temperature and
humidity control, oven etc.
- Device needs wireless access to networks and Internet (ex.
via wireless modems).
- Need to connect AAC device directly to telephone line for increased
privacy (AAC transmitted directly through phone line).
- Need AAC device to have a stable operating system.
- AAC users would like AAC software packages to be supported
across AAC hardware platforms (i.e. Microsoft model).
- Some participants believed Windows CE-based systems would be
a good AAC device platform.
- Users would like to load AAC software packages in order to
customize to meet their communication needs (i.e. Microsoft
model).
- Software upgrades should be available. Device should be automatically
notified of available software (e.g. via Internet).
- Need extended battery life.
- Improved input interface is needed. Input technology should
not limit communication rate.
- Need improved user interface design (test for usability, ergonomics,
performance, operation, learnability etc.).
- Need Beta testing and structured user trials for all devices.
- Device should be affordable without third party reimbursement.
- System must be durable (to heat, humidity, vibrations, dust,
water etc.).
- Needs to be portable. Note: portability is dependent on whether
device is being carried or mounted to wheelchair or other location.
- Need quantitative performance data (e.g. Prentke-Romich Company's
Language Activity Monitor, LAM, and the RERC on Communication
Enhancement's Augmentative Communication Quantitative Analysis,
or ACQUA). These assessment techniques record (LAM) and analyze
(LAM and ACQUA) such things as character, word, and sentence
selection; speed; rate; duration; efficiency etc.
- Need performance monitoring to be user-controlled, including
the ability to turn on/off and edit as desired.
- Need 24 hour support and guarantee for any device.
- Participation should occur in real time and delays in dialogue
should be eliminated.
4. Barriers (to obtaining technology, to developing technology, etc.)
- High purchase cost.
- Compromises with Medicare. Lack of full coverage of all AAC devices
complicates the process of device recommendation.
- The assistive technology and AAC markets are small therefore profits
are low and less funding is allocated to research and development.
- The inability to provide developers and programmers a comparable
salary to other mainstream markets (e.g. video game industry) limits
their willingness to obtain employment in small markets such as
augmentative communication.
- Small markets such as AAC are not getting the attention of manufacturers
with technological capabilities (such as phone, Internet companies).
- Larger corporations are not specifically concerned with AAC users,
nor in expanding their markets to include AAC.
- AAC products require a long design time, while mainstream technologies
such as cell phones are always changing. It is difficult for small
companies to keep up.
- Manufacturers don't consider it a high priority to improve speech
synthesis.
- Stereotypes and attitudes of society that current speech-producing
devices are not as good as natural voices.
- Massive cultural change is needed to get away from societal misconceptions
regarding AAC.
- Public awareness needed
- Federal laws and restrictions.
- Technologies exist that have transfer capability, but incorporation
into products takes time and money, and a willingness of the large
mainstream companies to invest.
- Need increased computer power to handle more powerful speech recognition
software.
- Dedicated vs. non-dedicated systems. Need to run software on off-the-shelf
PC and related platforms.
- There are products with more capabilities than people are aware
of.
- Eye gaze systems use synthesized speech only.
- Restrictive vocabulary for speech recognition, high error rates
- User has low expectations for technology
- Off the shelf software needs more sophisticated end user
- Speech Language Pathologists don't often recommend AAC devices
- AT vendors are not aware of delivery channel - not working with
Speech Language Pathologists
- Self-identification of current knowledge and its application
- Voluntary information transfer
References
[1] Fujitsu. "Automatic Volume Noise-Sensor-Equipped Volume Control:
Using Noise Level In Automobiles to Adjust Sound Level of Audio
Systems for Greater Listening Pleasure." August 1997.
[2] Piros, William. "Compaq iPaq H3650 Review." August 23, 2001.
[Online: www.neoseeker.com/Articles/Hardware/Reviews/compaqh3650/2.html]