Pakistan's Fortnightly Magazine For IT Leaders IDG Publication

Smart Talk
Companies are using speech-enabled applications to cut average call times, decrease staff requirements and enable new features.
(By Robert L. Mitchell)

When TV Guide subscribers want to notify the magazine about a change of address, they simply call customer service. But the friendly voice on the other end of the line isn’t a human call center representative. It’s a virtual agent - a speech-enabled application that can understand and respond to requests from the customer.


If TV Guide’s 40 million customers have any qualms about speaking with a machine, they aren’t complaining. One reason may be that using the system is easier and faster than talking to a live representative, says Steve Martin, executive director of fulfillment operations at the New York-based publication. The system, purchased from Tuvox Inc. in Cupertino, Calif., halved average call times, from four minutes to two. That improved customer service while also reducing telecommunications and staffing costs, Martin says.

Long considered overly expensive and complicated, speech-enabled applications are finally beginning to deliver bottom-line benefits, says Daniel Hong, an analyst at Datamonitor PLC in New York. Today, the systems can eliminate the old, stilted voice recordings used in interactive voice response (IVR) systems and add a more friendly voice-user interface (VUI) that understands natural, conversational language. The VUI accepts verbal input rather than requiring the caller to enter information from a touch-tone keypad.

State-of-the-art speech-enabled systems can cut through complex and confusing touch-tone menu hierarchies used in traditional dual-tone multifrequency (DTMF) systems by allowing users to say exactly what they want and then jump directly to that function. Speech-enabled systems are faster than touch-tone IVR systems for more advanced transactions and are more efficient at tasks like accepting alphanumeric serial numbers.

Competition in the speech-enabled applications market has increased, and prices have dropped by 30% over the past five years, according to Datamonitor. The emergence of open platforms built around standards such as VoiceXML and Speech Application Language Tags (SALT) has fostered the competition, spurring new entrants such as Microsoft Corp.’s Speech Server, which debuted last year.

The proprietary IVR system hardware and software in common use today are gradually being replaced with industry-standard servers with plug-in telephony cards. Vendors of speech-enabled IVR applications typically work with multiple speech engines, which provide basic speech-recognition, authentication and text-to-speech technology. Most offer prebuilt components that can be assembled into custom and packaged vertical-market applications.

The trend toward the use of prebuilt modules and reusable components has made the construction of speech-enabled applications easier. “Right now, we’re on the brink of going from the early adopter to the pragmatist phase,” says Hong. Although only about 7% to 10% of currently installed IVR systems will be speech-enabled this year, one in three new systems ships with the capability, and 50% will by 2009, according to Hong. The real potential of speech technology lies in new applications rather than in the replacement of functions handled by touch-tone systems, says Steve Coplan, an analyst at The 451 Group in New York.

Building on its initial success, TV Guide is adding caller self-service features. “We’ve expanded it to do surveys and to handle our in-house employee directory,” Martin says. And the system will soon handle online subscription payments as well.

Gtech Holdings Corp. in West Greenwich, R.I., has begun using the technology to automate field-service calls for retail machines it maintains for government lotteries. “It’s cumbersome to collect [alphanumeric] serial number information via a DTMF application,” says Mike Sax, director of global technology services. The application takes 3 million calls annually. A voice-enabled system changed that. “We saw a 15% increase in acceptance almost out of the gate,” Sax says, adding that he expects the system to pay for itself in 18 months.

Despite the advantages, speech-enabled applications still require specialists to perfect the VUI, customize “grammars” that the speech engine recognizes and tune the systems to improve accuracy. “They’re cheaper to implement relative to where the technology was but still require a lot of tuning and manual overhead to get the applications up and running,” says Coplan.

Users shouldn’t expect the systems to be perfect right away, adds Martin. “When we first started, we were getting about 40% success rate,” he says. With tuning, that rate has climbed to more than 70%, and Martin expects the system to top out above 80%, which he says is acceptable. Dialogues are carefully constructed so when the system fails to interpret a request, the caller may transfer to a live operator or use touch-tone keys to complete a transaction. Failures are tracked by the system, which is periodically tuned to improve accuracy.
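
The fallback behavior described above can be pictured with a minimal sketch. It is not TV Guide's or Tuvox's implementation; the grammar entries, the retry limit and every function name here are hypothetical, and a simple phrase lookup stands in for a real speech engine. It only illustrates the pattern: try to recognize the request, log each failure for later tuning, and hand the caller off after repeated misses.

```python
from dataclasses import dataclass, field

# Hypothetical grammar: phrases the recognizer accepts, mapped to an intent.
GRAMMAR = {
    "change of address": "change_address",
    "change my address": "change_address",
    "check my balance": "check_balance",
}

MAX_ATTEMPTS = 2  # assumed retry limit before falling back


@dataclass
class CallLog:
    """Collects unrecognized utterances so the grammar can be tuned later."""
    failures: list = field(default_factory=list)

    def record_failure(self, utterance):
        self.failures.append(utterance)


def recognize(utterance):
    """Stand-in for a speech engine: return an intent, or None on a miss."""
    return GRAMMAR.get(utterance.strip().lower())


def handle_call(utterances, log):
    """Route a caller by voice; fall back after repeated recognition failures."""
    for attempt, utterance in enumerate(utterances, start=1):
        intent = recognize(utterance)
        if intent is not None:
            return ("route", intent)
        log.record_failure(utterance)
        if attempt >= MAX_ATTEMPTS:
            break
    # Caller is offered a live operator or the touch-tone menu.
    return ("fallback", "operator_or_touch_tone")
```

In this sketch the logged failures play the role of the tracking data the article mentions: reviewing them periodically would show which phrases to add to the grammar.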

That process can’t be rushed, says Casey Lewis, a software development manager at DST Systems Inc. in Kansas City, Mo. DST provides customer service outsourcing for mutual fund companies and their more than 90 million shareholders. The company uses a natural-language system from Edify Corp. in Santa Clara, Calif., that allows callers to perform activities such as checking their balances and redeeming shares.

While DST’s staff handled much of the programming and construction of the system, a lot of the nine-month deployment effort was spent on multiple iterations of tuning - an area where the staff had little experience. “Don’t hesitate to rely on vendors who can provide expertise,” Lewis advises.

Finding Your Voice

Just as tuning is important, developing a user-friendly VUI is also critical. At American Savings Bank (ASB) in Honolulu, success meant tuning the system to understand local dialects as well as creating a friendly virtual agent that would become part of the bank’s brand image. The institution, which takes more than 300,000 calls per month, already had a touch-tone IVR system. “Our local competition doesn’t have speech, so it was an opportunity to be first,” says Renee Lum, assistant vice president and manager of the bank’s customer service center.

Lum brought in Dallas-based InterVoice Inc. to help develop its virtual agent and tune the system. “The personality for the voice was very important. We got down to her age, her hobbies and how many kids she had,” says Lum. ASB then hired professional talent to record the voice and worked with InterVoice to develop the dialogues. “There were a lot of tuning cycles,” Lum says. The system needed to recognize local words such as aloha, as well as local pronunciations for words like four, which sounds more like “foa.” Testing with real users helped to refine the pace and pitch of interactions. The feedback also helped ASB refine dialogues by replacing confusing words like debit with the more straightforward withdrawal, for example.

The system runs in parallel with the existing touch-tone system. Unfortunately, 94% of callers still press 9 to go to the touch-tone system as soon as they call in, bypassing the voice interface. Those customers have memorized the touch-tone options and in some cases may not even realize that a new option exists, Lum says.

For its part, DST addressed that challenge through careful scripting of the initial call dialogues, according to Lewis. By making some adjustments, DST was able to retain 80% of callers within the speech-enabled system. “The way you build that [script] has a lot of impact on what your speech-recognition retention rate will be. We discourage scripting that lets people press [the star key] and go right to the touch-tone system,” Lewis says.

Although the technology has improved, the underlying complexity of speech application development remains a challenge, says Coplan. “There’s no real abstraction layer to separate out the complexities of the speech-recognition engine,” he says, and that makes the development process more complicated than it should be. Ultimately, the key to success may be the continued development of middleware from vendors such as Microsoft and IBM. But so far, the vendors have had little success, Coplan says.

Datamonitor’s Hong agrees that the technology is still evolving but says that the development of prebuilt modules means that users can build a speech-enabled application without doing the kind of low-level programming that used to bog down such projects. A speech-enabled IVR can pay for itself in 12 to 24 months through cost savings alone and may also offer a competitive advantage, Hong says. “It improves customer service while reducing costs for the company,” he says.