This is really scary.
The Human Voice, as Game Changer
By NATASHA SINGER
NEW YORK TIMES
VLAD SEJNOHA is talking to the TV again.
O.K., maybe you've done that, too. But here's the weird thing: His TV is listening.
"Dragon TV," Mr. Sejnoha says to the screen, "find movies with Meryl Streep." Up pops a list of films like "Out of Africa" and "It's Complicated."
"Dragon TV, change to CNN," he says. Presto — the channel flips to CNN.
Mr. Sejnoha is sitting in what looks like a living room but is, in fact, a sort of laboratory inside Nuance Communications, the leading force in voice technology, and the speech-recognition engine behind Siri, the virtual personal assistant on the Apple iPhone 4S.
Here, Mr. Sejnoha, the company's chief technology officer, and other executives are plotting a voice-enabled future where human speech brings responses from not only smartphones and televisions, cars and computers, but also coffee makers, refrigerators, thermostats, alarm systems and other smart devices and appliances.
It is a wildly disruptive idea. But such systems are already beginning to change the way we interact with the world and, for better and worse, how we think about technology. Until now, after all, we've talked only to one another. What if we begin talking to all sorts of machines, too — and, like Siri, those machines respond as if they were human?
Granted, people have been talking into machines and at machines since the days of Edison's phonograph. By the 1980s, commercial speech recognition systems had become sophisticated enough to transcribe spoken words into text. Today, voice technology is a fixture of many companies' customer-service operations, albeit an occasionally maddening one.
But now the race is on to make the voice the sought-after new interface between us and our technology. The results could rival innovations like the computer mouse and the graphic icon and, some experts say, eventually pose challenges for giants like Google by bypassing their traditional search engines.
No player is bigger in voice technology than Nuance, of Burlington, Mass., an industry pioneer that has acquired more than 40 companies in the field and today employs 7,300 people. It is one of the companies that helped make a big technological leap from programs that take dictation to systems that actually extract meaning from words and respond to them. Now it wants to push far beyond that.
"They are the equivalent of Microsoft, Google or Amazon in a very niche technological space," says Andrew Rosenberg, an assistant professor of computer science at Queens College.
Like many new technologies, sophisticated voice systems have potential drawbacks. Some experts worry about privacy invasions, others about our ever-deepening attachment to devices like smartphones.
Humans are wired for speech and tend to respond to talking devices as if they were kindred spirits, says Sherry Turkle, a professor of the social studies of science and technology at the Massachusetts Institute of Technology.
"I'm not saying voice recognition is bad," Professor Turkle says. "I'm saying it's part of a package of attachments to objects where we should tread carefully because we are pushing a lot of Darwinian buttons in our psychology."
ONLY a decade ago, voice-enabled virtual assistants seemed more science fiction than business fact. But in 2000, Paul Ricci, a former executive at Xerox, concluded that voice software could one day disrupt the marketplace the way the mouse and the icon had in the 1980s.
"We had to decide early on where there were markets where we could successfully deploy the technology," says Mr. Ricci, Nuance's chief executive.
Nuance, then known as ScanSoft, went on an aggressive acquisition spree. It bought a desktop dictation system called Dragon NaturallySpeaking, as well as dozens of small companies that had carved niches in medical dictation, automated voice-response systems and speech research. Its most significant acquisition was Nuance, a rival that had been spun off from S.R.I. International of Menlo Park, Calif. The combined company took the Nuance name. (S.R.I. International later developed and spun off Siri, which was acquired by Apple in 2010.)
"They have literally tried to buy every good asset out there, or build it themselves, knit it all together and augment it," Richard Davis, an analyst at Canaccord Genuity, says of Nuance.
Nuance reported revenue of about $1.3 billion for 2011, with $515 million of that coming from its health care technology business.
The stock market seems to like what it sees: Nuance's share price touched a record high of $31.15 on Feb. 9, about double a level of $15.59 last August; it closed on Friday at $25.58.
Not everyone is as enamored with voice technology. Some privacy advocates worry that it adds an audio track to the digital trail that people leave behind when they use the Web or apps, potentially exposing them to more data mining.
Voice recognition software works by sending speech to processors that break down spoken words into sound waves and use algorithms to identify the most likely words formed by the sounds. The system typically records and stores speech so it can teach itself to become more accurate over time. Nuance, for example, believes that, aside from the federal government, it has amassed the largest archive of recorded speech in the United States.
"We have no idea who you are today," says Peter Mahoney, the company's chief marketing officer.
Such assurances aside, voice recognition software could conceivably pose enough of a risk to people's privacy that regulators in Washington are watching.
"Just as we are concerned about the possible applications of facial recognition, there are other forms of biometric identification, like voice, that pose the same kind of problems," says David C. Vladeck, the director of the Bureau of Consumer Protection at the Federal Trade Commission. He was speaking about voice technology in general, not about Nuance in particular.
"DRAGON GO," Mr. Sejnoha says into his iPhone, "I want to make reservations for three tomorrow night at Craigie on Main."
Dragon Go is Nuance's own virtual assistant, an app that has been downloaded several million times since its introduction last summer.
Unlike Siri, however, Dragon Go doesn't talk back. Mr. Sejnoha was asking for a reservation at a restaurant in Cambridge, Mass., and the app went directly to OpenTable and displayed his reservation options.
Ask Dragon Go for tickets to, say, "The Hunger Games," and it typically displays a listing of showtimes at the nearest cinema from Fandango. A query about a particular spa might elicit reviews from Yelp.
Dragon Go, Nuance's first direct-to-consumer app, is part of a push to build the brand's visibility and demonstrate Nuance's technological advances to business customers. Its real goal is even bigger: to disrupt the role of search engines as gatekeepers to the Web.
For the most common queries, Dragon Go usually bypasses search engines by taking users directly to Web sites of companies like Amazon, Expedia and OpenTable, which are Nuance partners on the app. If people don't find what they're looking for there, Dragon Go offers traditional Web search.
The benefit for consumers, Nuance executives say, is faster answers in fewer steps. In many cases, Nuance collects a small fee from partner sites when people make restaurant reservations or complete purchases. The app could be construed as a challenge to the likes of Google and Microsoft, which have their own voice products — such as Google Voice Actions and Microsoft Tellme — as well as search engines.
"If you are Google," says Mr. Davis, the analyst, "you are saying, 'Holy smokes, we are about to get cut out of the equation.' "
Christopher Katsaros, a Google spokesman, declined to comment. The company has recently updated Google Voice Actions, its voice-command system for Android phones, with a feature that continuously converts people's speech to text, making it faster and smoother to dictate and send text messages, search Google aloud, or ask for directions.
Lezli Goheen, a spokeswoman for Microsoft, said that the company had addressed consumers' expectations for easier access to information through several means. In addition to Tellme, a program included in all new Windows products that lets people dictate text messages and commands to applications like calendars, she said, the company has introduced Bing Voice Search, a program that lets people speak their Bing searches.
Nuance, meanwhile, has similarly ambitious plans for its health care business. In collaboration with I.B.M., the company is developing analytics to scour the medical notes that doctors dictate after they see patients. The idea is to search the text for common red flags — like medicines that interact dangerously — and automatically alert doctors, hopefully reducing problems and health care costs.
MEMBERS of US Airways' frequent-flier program who have registered their mobile phone numbers are greeted by name by "Wally," an interactive voice system that Nuance created for the airline.
One day last month, Wally was talking to Kerry Hester, a senior vice president at US Airways, who had called to check on her own flight.
"Hello, Kerry, I've matched your mobile number to your Dividend mileage account," Wally said. Her flight from Phoenix to Los Angeles, Wally reported, unprompted, was "still scheduled to depart on time at 11:20 from Gate A23."
If Wally's voice sounds familiar, that's because it belongs to Wally Wingert, the announcer on "The Tonight Show With Jay Leno," who prerecorded all the words that callers hear.
US Airways introduced Wally last summer, as part of a relocation of its offshore customer service call-in operations back to the United States. Nuance designed the system to anticipate callers' requests. Wally, for example, can automatically tell frequent-flier members their seat assignments or report whether they have received upgrades. It also converts people's speech to text, so that, should customers ask to speak to a live operator, they don't have to repeat their original requests.
Wally, Ms. Hester says, has reduced the number of customers who ask to speak with agents, as well as the average length of customer calls. "Without the system, we would have had to hire a couple hundred more agents," she says.
Wally, which never lets on that it is an automated system, seems so personable that many people say "thank you" before hanging up, Ms. Hester says.
"I think that tells us that they were satisfied," she says. "I think it tells us that they felt they were interacting with a person."
But the lack of disclosure bothers Professor Turkle of M.I.T. As voice-enabled systems become more sophisticated, she says, they create the illusion that we are interacting with other people, rather than with machines. In the long term, she says, the systems' sleekness and ease of use could end up diminishing the value of slower, messier, real human connections. Reminding users that they are talking to a machine can make them more conscious of the superficiality of the exchange.
"We need to make a cultural decision," Professor Turkle says. "Either we want to alert people when they are talking to a machine, or we don't."
IN 2008, Nuance sued a fierce rival in the voice technology market, contending patent infringement. The company, Vlingo, which markets its own virtual-assistant apps for Android phones, BlackBerries and iPhones, countersued, making similar allegations.
Last year, a court found that Vlingo had not infringed. In several other lawsuits, including the Vlingo countersuit, the parties have agreed to stays. That is because, last December, Nuance agreed to buy Vlingo for an undisclosed sum. It plans to complete the acquisition in the first half of this year.
"From our standpoint, the ability to compete with Google, that owns half the smartphone market, and Microsoft, that bundles voice with their products, that's the real business logic behind merging with Nuance," says Dave Grannan, the chief executive of Vlingo, which is based in Cambridge, Mass.
Nuance and Vlingo share a vision of a world populated by cloud-based, voice-enabled virtual assistants that move seamlessly from one device to another.
One afternoon earlier this year, a team of Vlingo executives demonstrated their own TV voice-command system to a New York Times reporter. The executives also showed a short animated video in which a fictional couple merrily conversed with their smartphones, tablet computer, TV and car — and the devices replied in kind, alerting the male character that his car needed gas and the woman that her flight that day had been canceled because of bad weather.
"More proactively alerting you with voice, telling you something about your car or an accident ahead, a personal assistant thinking about your needs and keeping you connected to other people is where we think this technology is really going," Mr. Grannan says.
BACK in Nuance's mock living room, Mr. Sejnoha is finishing his demo of Dragon TV, the company's new software that can be built into Web-connected TVs. With it, viewers can use voice commands not only to find programs but also to make Skype calls or shop on Amazon via the TV. The technology is scheduled to hit the market shortly: LG Electronics plans to introduce a smart TV powered by Nuance software that lets viewers update Facebook and Twitter accounts by speaking into a special remote control.
Soon, Mr. Sejnoha predicts, many other devices, not just televisions, will be taking voiced commands, and talking back. In Germany, people can already ask a Nuance-powered coffee maker — marketed as "the first fully automatic machine that obeys" speech — to make cappuccino. The machine, called the Jura Impressa Z7 One Touch Voice, speaks both English and German.
Dragon TV, meanwhile, is already available in about two dozen languages.
"Dragon TV, mute," Mr. Sejnoha says.
"See," he says, "it's useful."