Carnegie Mellon behind region's companies leading in language tech

Sunday, Nov. 29, 2009

If that Northwest Airlines plane had some Pittsburgh-grown technology for its co-pilot, it would not have over-shot its airport by 150 miles in late October.

A synthesized voice would have repeatedly told the pilots something like, "You just missed Minneapolis." The voice might even have scolded the crew in Spanish or some other language.

Led by a deep research base at Carnegie Mellon University, Pittsburgh is a virtual mecca of language technology. The region is home to 20 or more companies, mainly spun out of CMU, that employ computer technology to unravel babel.

The software applications at these companies can convert text to synthesized speech or human speech to text, sort vast amounts of text, and even translate human speech into synthesized speech of another language in seconds.

"In the United States, this is clearly the place for language technology," said Jaime Carbonell, a CMU professor of computer science and director of the university's Language Technologies Institute. He has been involved with five spinout companies.

They range from older spinouts, such as Lycos Inc., a popular Internet search engine company that now is part of a Korean company, to Vocollect Inc., a Wilkins-based company whose computer products use voice-recognition technology that supports 26 languages.

More recent spinouts include M*Modal, Squirrel Hill, which converts doctors' and others' spoken words into text; and Carnegie Speech, Downtown, whose software detects and corrects speech for people in classrooms, government and corporations in 20 countries.

The University of Pennsylvania, Philadelphia, and Massachusetts Institute of Technology have language technology programs, said Carbonell. "But CMU is by far the largest and has the advantage in that we started first."

The Language Technologies Institute, in fact, grew out of voice-recognition research funded in the 1970s by the Defense Advanced Research Projects Agency, whose support led to development of the Internet. Three CMU programs were merged to form the institute in 1996.

CMU's speech recognition systems could convert only "a few dozen words" into text at first, Carbonell said. But by the 1990s, the capability had grown to hundreds of words, and it now stands at thousands.

Alex Waibel, CMU professor of computer science and language technology, has worked in the field since the 1980s and heads the institute's International Center for Advanced Communication Technologies, or InterACT.

Around 1990, he began developing programs to translate English spoken into a microphone into synthetic Spanish played back through a computer speaker. It was "a bit awkward" and "took several minutes to do one sentence," Waibel said.

Now, he's got an iPhone app for that. It can begin spitting back up to 40,000 words -- from English to Spanish, or vice versa -- within about three seconds and costs $24.99.

That means an American who speaks no foreign language can take an iPhone to a Spanish-speaking country and converse and function effortlessly. Conversely, a Spanish-only speaker can use the iPhone app to communicate in English.

"We don't really know how many of these apps we might sell. But it just came out Oct. 21, so it's very new," said Waibel, who developed the technology at his recently launched Jibbigo Inc. The name stands for "the gibberish of language on the go," he says.

But the market should be vast, he estimates, because users avoid two downsides: No need to type anything, and especially, no need to connect to a server and ring up a big phone bill from abroad. The voice recognition, language translation and speech synthesis capabilities are built into the cell phone.

"That means you can use it in the remotest village or on a plane or in the military without the enemy detecting where you are," said Waibel, who intends for Jibbigo to target health care workers in developing nations and government installations overseas.

To use the app, the user speaks a sentence, such as, "Where is the nearest hospital?" into the iPhone. Within three seconds, the device repeats the sentence in the opposite language. To erase and rephrase the sentence, the user just shakes the iPhone.

"So far, it's English to Spanish and Spanish to English," Waibel said. "But in the next six months, we hope to have four more languages." A laptop version already handles seven languages.

Another CMU spinout, Cepstral LLC, South Side, could have helped those Northwest pilots touch down in Minneapolis or provided navigational or other assistance. Founded in 2001, the company offers technology that incorporates voice recognition, text searching and voice synthesis, Carbonell said.

"Like if a pilot is arriving somewhere and needs to be told something," he said. "That can be done by speech, and it calls attention to things immediately, without distracting your eyes."

Carnegie Speech, Downtown, provides voice recognition and speech synthesis technology. Co-founded by Carbonell in 2001, it went into commercial production in 2005, reached "several millions" in revenue and sold into several major markets, said CEO Angela Kennedy.

For instance, Carnegie Speech technology helps call-center operators and others with thick accents perfect their English pronunciation. The market is especially large in India and the Philippines, where operators' accents often frustrate English callers.

"The global spending in the market for improving speaking skills is about $6 billion right now," Kennedy said.

When CMU foreign language students were given 15 minutes to try out the system, they didn't want to leave the lab, said Maxine Eskenazi, associate teaching professor and Carnegie Speech co-founder. "So I figured it was pretty good," she said.

Called "Native Accent," the system displays on a computer screen the words a student speaks, with mispronounced words shown in red. When pronounced better, those words appear in yellow, then in green when the student "gets it right," she said. The screen illustrates the correct tongue and teeth placement to form the words.

Bus riders have also heard the company's technology on the phone since 2005, when they call the Port Authority of Allegheny County after hours for scheduling and route information.

Carnegie Speech plans to serve a market being created by a new rule. Namely, the International Civil Aviation Organization, part of the United Nations, is requiring all civilian pilots and air traffic controllers to speak English by March 2011 in order to be certified.

"We think the size of this aviation market is probably about $300 million," Kennedy said. "This is a great thing for us."

Another product of CMU is M*Modal, Squirrel Hill, founded by three of the university's students about 10 years ago. The company converts speech into text, especially for creating clinical documentation and data.

"In its simplest form, it's converting what a doctor says into text," said Dr. Nick van Terheyden, M*Modal's chief medical officer.

The specialized speech-to-text technology enables hospitals and other health care providers to more cost-effectively and accurately search and use medical data. It grew the local company's work rolls from a half-dozen originally to about 40 today, he said.
