More on voicemail transcription

In a previous posting about Jott, I mentioned GotVoice. The other day I spoke with Colin Lamont, the VP of Sales and Marketing at GotVoice. GotVoice is a voicemail-to-email company with some interesting claims. First, it collects voicemail from all your voice mailboxes (cell phone, company, personal), then it transcribes it to text and sends it to you by email and SMS.

GotVoice sells its service directly to end users, and also licenses it to service providers. The largest end-user company that has licensed it to date has about 1,000 employees. The largest service provider licensed to date has 13 million subscribers. Most wireless companies bundle voicemail for free, so GotVoice appeals to them as a way to glean revenues from their voicemail repositories. Many service providers have cobbled-together networks formed by a series of acquisitions. For these, a by-product of the GotVoice service is that it pulls all their voicemail systems from multiple vendors into a unified system.

GotVoice claims that it works with any voicemail service. This is technically challenging. There are about eight major systems vendors from whom telephone service providers buy voicemail equipment, and each of those vendors has multiple iterations of its products. So GotVoice has done extensive work to integrate with all of these, first by dial-up emulation of a user, then by direct access through the system APIs for service provider deployments.

A second ingredient in the GotVoice special sauce is its transcription technology. GotVoice has established an exclusive partnership with an ASR (Automatic Speech Recognition) vendor, working together to achieve a remarkable level of accuracy for automated recognition. The basis for this accuracy is twofold. First, it is tailored to voicemail, which tends to have a relatively consistent structure. Second, GotVoice had a non-transcription voicemail service for a few years, and amassed a collection of archival voicemails from hundreds of thousands of users with which to train its recognizer. As a result, GotVoice claims 90% recognition accuracy, compared with 60%-65% from rivals.

This high accuracy enables GotVoice to depend less heavily on human transcribers. The obvious benefit of this is that its cost of doing business is lower because it needs fewer workers. A less obvious benefit is that GotVoice claims greater confidentiality than its competitors. The agents who transcribe the parts that the ASR misses are presented only with small fragments of speech, and with a list of guesses from the recognizer. This means that the overall meaning of the message is less likely to be revealed to call center workers.
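The fragment-only review described above might look roughly like the following. This is a minimal sketch, not GotVoice's actual pipeline: the `Fragment` type, the per-fragment confidence score, and the 0.8 threshold are all assumptions made for illustration. The only ideas taken from the description are that agents see small fragments, each with a list of recognizer guesses, rather than the whole message.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Fragment:
    """One short stretch of a voicemail (hypothetical structure)."""
    guesses: List[str]   # recognizer guesses, best first
    confidence: float    # assumed score in [0, 1] for the best guess


# Hypothetical cutoff: below this, a human looks at the fragment.
REVIEW_THRESHOLD = 0.8


def split_for_review(
    fragments: List[Fragment], threshold: float = REVIEW_THRESHOLD
) -> Tuple[List[Optional[str]], List[Tuple[int, List[str]]]]:
    """Keep confident guesses; queue the rest for human agents.

    Returns the draft transcript (None marks a gap awaiting review)
    and a queue of (position, guesses) pairs. Each queue entry holds
    only one fragment, so an agent never sees the full message.
    """
    draft: List[Optional[str]] = []
    review_queue: List[Tuple[int, List[str]]] = []
    for position, fragment in enumerate(fragments):
        if fragment.confidence >= threshold:
            draft.append(fragment.guesses[0])
        else:
            draft.append(None)
            review_queue.append((position, fragment.guesses))
    return draft, review_queue
```

In a sketch like this, the higher the recognizer's accuracy, the shorter the review queue, which is where both the cost and the confidentiality benefits would come from.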

GotVoice charges $0.25 for each transcribed voicemail, with a minimum of $5.00 per month for the service.
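To see what that pricing means in practice, here is a small sketch of the monthly bill. It assumes the $5.00 minimum acts as a floor on the per-message total rather than a base fee charged on top of it; the function names are mine, and amounts are kept in cents to avoid floating-point rounding.

```python
PER_MESSAGE_CENTS = 25       # $0.25 per transcribed voicemail
MONTHLY_MINIMUM_CENTS = 500  # $5.00 minimum per month


def monthly_charge_cents(transcribed_messages: int) -> int:
    """Monthly bill in cents, assuming the minimum is a floor."""
    if transcribed_messages < 0:
        raise ValueError("message count cannot be negative")
    return max(MONTHLY_MINIMUM_CENTS, PER_MESSAGE_CENTS * transcribed_messages)


def format_dollars(cents: int) -> str:
    """Render a non-negative amount in cents as a dollar string."""
    return f"${cents // 100}.{cents % 100:02d}"


if __name__ == "__main__":
    for count in (0, 10, 20, 21, 100):
        print(count, format_dollars(monthly_charge_cents(count)))
```

Under that assumption the minimum covers the first 20 transcriptions a month, and each message beyond that adds another $0.25.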

GigaOM reviewed GotVoice in February. The review elicited some informative comments from users of various similar services.

I haven’t tried GotVoice yet, mainly because my current setup works well enough that my motivation to change is weak. I don’t have all that GotVoice offers, but I do have a single voice mailbox with a visual list of its contents.

My personal unified voicemail system is very simple. I give out only my landline number, which is provisioned to forward to my cell phone on busy or no answer. That way I pick up at my desk when I am in the office, and when I am out of the office the call rolls over to my mobile phone. If I don’t answer it there, it goes to voicemail. So all my voicemail is on the mobile.
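The rollover can be written down as a tiny decision function. This is only a sketch of the behavior described above; the function name and the two boolean inputs are hypothetical, and "busy" is treated the same as "not answered" at the desk.

```python
def route_call(answered_at_desk: bool, answered_on_mobile: bool) -> str:
    """Return where an incoming call to the landline number ends up."""
    if answered_at_desk:
        return "landline"
    # Busy or no answer at the desk: the call forwards to the mobile.
    if answered_on_mobile:
        return "mobile"
    # Unanswered on the mobile too: the mobile's voicemail takes it.
    return "mobile voicemail"
```

Because the landline never takes a message itself, every unanswered call lands in the same mailbox.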

Since my mobile is an iPhone, I get a nice visual voicemail interface. For each voicemail it shows the Caller ID and the time, though of course no text indicator of the contents. Unfortunately the iPhone visual voicemail has an irritating flaw: there is a long pause (4 or 5 seconds) between pressing the play button and starting to hear the message.

Jott for iPhone and other ASR products

In a previous posting I wished for an iPhone voice memo recorder, and I was disappointed to find that the 2.0 software load still lacked one. I now conclude that this was an intentional omission, yielding the opportunity to the new iPhone third-party software community.

Last week I downloaded Jott, a free application, from the iTunes store. It is a serviceable voice recorder, so my wish is fulfilled.

But the beauty of the third-party software community concept is that motivated, talented people in hungry startups will go beyond what’s justifiable in a large company like Apple, and this is what Jott has done. It doesn’t just record voice memos, it transcribes them into written text.

It works very well. It uses people to do the transcriptions. I am not sure if the utterances are preprocessed with Automatic Speech Recognition (ASR) and transmitted to humans for verification and correction, or if it is entirely done by people in a call center somewhere. When I mumble, the text comes back as “Unclear,” but I can still play back what I said and recognize it for myself.

There are a few other transcription-type applications out there. SpinVox and PhoneTag transcribe voicemail into SMS and email. A great idea. Nuance, the world leader in voice recognition technology, announced a similar service in April.

In contrast to the foregoing, Yap is 100% automated, so to avoid mistakes it has the user verify its efforts. You speak the text you want to send as an SMS (or that you want to search the web for) and Yap renders it as text on your phone’s screen. You correct it and send it off. Yap doesn’t appear to be deployed yet.

Similar to Yap, but already deployed in the real world, is Vlingo. I went to the Vlingo website to download a trial, but didn’t when I discovered I would have to buy a BlackBerry to try it on. Vlingo was recently adopted by Yahoo! to power its oneSearch mobile product. Nuance is suing Vlingo for patent infringement. Nuance has announced an application like this for the iPhone, but a search for “Nuance” in the iTunes store doesn’t yield any results for it yet.

Another ASR granddaddy is Tellme (now owned by Microsoft), which powers the Sprint Live Search service. Tellme also lets developers do free hosted low-volume implementations of their concepts in VoiceXML.

Getting back to my iPhone wish list, I am still baffled as to why it doesn’t do cut and paste. The argument that it would require an awkward user interface was exploded a year ago.

Update July 25th: I neglected to mention some other voicemail transcription services. Here is a comparative review of GotVoice, SpinVox, YouMail, and PhoneTag.