Skype’s new super-wideband codec

I spoke with Jonathan Christensen of Skype yesterday about the new codec in the latest Windows beta of Skype:

MS: Skype announced a new voice codec at CES. What’s different about it from the old one?

JC: The new codec is code-named SILK. Compared to its predecessor, SVOPC, the new codec gives the same or better audio response at half the bit-rate for wideband, and we also introduced a super wideband mode. SVOPC has a 16 kHz sample rate and 8 kHz audio bandwidth. The new codec has that mode as well, but it also has a mode with a 24 kHz sample rate and 12 kHz audio bandwidth. Most USB headsets have enough capture and render fidelity that you can experience the 12 kHz super wideband audio.
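The pairing of sample rate and audio bandwidth in that answer follows from the Nyquist limit: a sampled signal can represent frequencies up to half its sample rate. Here is a minimal sketch of that arithmetic; the function name and the dictionary of modes are my own illustration, not anything from Skype.

```python
def audio_bandwidth_hz(sample_rate_hz: float) -> float:
    """Return the highest audio frequency a given sample rate can represent (Nyquist limit)."""
    return sample_rate_hz / 2


# Mode names follow the interview; the dictionary itself is illustrative.
modes = {"wideband": 16_000, "super wideband": 24_000}

for name, rate in modes.items():
    print(f"{name}: {rate} Hz sampling -> {audio_bandwidth_hz(rate):.0f} Hz audio bandwidth")
```

In practice the usable bandwidth is slightly below this limit because of anti-aliasing filters, so the 8 kHz and 12 kHz figures are best read as upper bounds.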

MS: Is the new codec an evolution of SVOPC?

JC: The new codec was a separate development branch from SVOPC. It has been under development for over 3 years, during which we focused both on the codec and the echo canceller and all the surrounding bits, and eventually got all that put together.

MS: What about the computational complexity?

JC: The new codec design point was different from SVOPC. SVOPC was designed for use on the desktop with a math coprocessor. It is actually pretty efficient. It’s just that it has a number of floats in it so it becomes extremely inefficient when it’s not on a PC.
The new codec’s design goal was to be ultra lightweight and embeddable. The vast majority of the addressable device market is better suited to fixed point, so it’s written in fixed point ANSI C – it’s as lightweight as a codec can be in terms of CPU utilization. Our design point was to be able to put it into mobile devices where battery life and CPU power are constrained, and it took almost 3 years to put it together. It’s a fundamental, ground up development; lots of very interesting science going into it, and a really talented developer leading the project. And now it’s ready. It’s a pretty significant jump forward.
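To make the fixed point versus floating point distinction concrete, here is a small sketch of fixed-point multiplication using only integers. It assumes the common Q15 format (16-bit values with 15 fractional bits); the interview does not say which formats SILK uses, and the helper names below are hypothetical. On a processor without a floating-point unit, this kind of integer arithmetic avoids costly software emulation of floats.

```python
Q15_ONE = 1 << 15          # scale factor: 32768 represents 1.0
Q15_MAX = Q15_ONE - 1      # largest representable value, just under 1.0
Q15_MIN = -Q15_ONE         # smallest representable value, exactly -1.0


def saturate(value: int) -> int:
    """Clamp an integer to the signed 16-bit range used by Q15."""
    return max(Q15_MIN, min(Q15_MAX, value))


def to_q15(x: float) -> int:
    """Convert a float in roughly [-1, 1) to a Q15 integer, saturating at the edges."""
    return saturate(int(round(x * Q15_ONE)))


def from_q15(q: int) -> float:
    """Convert a Q15 integer back to a float, for inspection only."""
    return q / Q15_ONE


def q15_mul(a: int, b: int) -> int:
    """Multiply two Q15 values with integer operations only.

    The raw product has 30 fractional bits, so add half an output step
    for rounding and shift right by 15 to return to Q15.
    """
    product = a * b
    return saturate((product + (1 << 14)) >> 15)


half = to_q15(0.5)
print(from_q15(q15_mul(half, half)))  # 0.5 * 0.5 computed without floats
```

The saturation step matters: multiplying -1.0 by -1.0 would give +1.0, which Q15 cannot represent, so the result is clamped to the largest value instead of wrapping around.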

MS: Is the new codec based on predictive voice coding?

JC: SVOPC has two modes, an audio mode and a speech mode, and the speech mode is much more structured towards speech. The new codec strikes a little bit more of a balance between a general audio coder and a speech coder. So it does a pretty good job with stuff like background noise and music. But to get that kind of bit-rate reduction there are things about speech that you can capitalize on and get huge efficiency; we didn’t toss all that out. We are definitely using some of the model approach.

MS: Normally one expects the increments to get smaller over time as a technology evolves. With the new codec you are getting a 50% improvement in bandwidth utilization, so you can’t be at the incremental stage yet?

JC: I don’t think we are. We were listening to samples from various versions of the client going back to 2.6; now we are at 4.0. In the same situation, pushing the same files in the same acoustic settings through the different client versions, there’s a noticeable (even to the naked ear) difference in quality with every release.

We are not completely done with it. There are many different areas where we can continue to optimize and tweak it, but we believe it’s at or above the current state of the industry in terms of performance.


Skype 4.0 for Windows has the new codec.
The current Mac beta doesn’t yet support the new codec.

Update: February 3rd, 2009: Here is a write-up of SILK from the Skype Journal.

Update: March 7th, 2009: Skype has announced a royalty-free SDK for third parties to implement SILK in their products.

Fixed Mobile Substitution and Voice over Wi-Fi

Getting rid of your land-line phone and relying on your cell phone instead is called Fixed Mobile Substitution (FMS).

A report from the National Center for Health Statistics of the Centers for Disease Control and Prevention (CDC) shows a linear increase in the number of households that have a cell phone but no land-line, starting at 4.4% in 2004 and reaching 16.1% in the first half of 2008.
[Figure: US Fixed Mobile Substitution 2005-2008. Source: CDC]
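If the increase really is linear, the two endpoints quoted from the CDC report are enough to estimate the average yearly growth. The sketch below does that arithmetic. The survey midpoints (mid-2004 and the end of the first quarter of 2008) are my assumption, since only the year and half-year are given, so treat the result as a rough figure.

```python
def average_growth_per_year(start_pct: float, end_pct: float,
                            start_year: float, end_year: float) -> float:
    """Average change in percentage points per year between two observations."""
    return (end_pct - start_pct) / (end_year - start_year)


# Percentages are from the CDC report quoted above.
# The decimal years are assumed midpoints of the survey periods.
growth = average_growth_per_year(start_pct=4.4, end_pct=16.1,
                                 start_year=2004.5, end_year=2008.25)

print(f"Cell-only households grew by about {growth:.1f} percentage points per year")
```

Shifting the assumed midpoints by a few months changes the slope only slightly, so the overall picture of a few percentage points per year holds either way.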

These numbers match those in a recent Nielsen report on FMS.

FMS will most likely accelerate in 2009 because of the recession. It will be interesting to see by how much. With 13% of households already having a landline that they don’t use, we will reach a tipping point soon.

There are about 112 million occupied housing units in the US, and about 71 million broadband subscribers.
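These round numbers can be combined into a quick back-of-the-envelope estimate. The sketch below assumes one broadband subscription per household, which overstates home penetration to the extent that some subscribers are businesses, and it applies the 16.1% cell-only share from the CDC report to the housing-unit count. The variable names are mine.

```python
HOUSING_UNITS = 112_000_000          # occupied US housing units (approximate)
BROADBAND_SUBSCRIBERS = 71_000_000   # US broadband subscribers (approximate)
CELL_ONLY_SHARE = 0.161              # CDC figure for the first half of 2008

# Assumption: each broadband subscription corresponds to one household.
broadband_share = BROADBAND_SUBSCRIBERS / HOUSING_UNITS

# Households with a cell phone but no land-line, under the same household count.
cell_only_households = CELL_ONLY_SHARE * HOUSING_UNITS

print(f"Broadband penetration: about {broadband_share:.0%} of households")
print(f"Cell-only households: about {cell_only_households / 1e6:.0f} million")
```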

So what does this mean for Wi-Fi VoIP? One of the primary reasons for FMS is to save money; it is more prevalent in lower income households. There are two kinds of phone that do VoWi-Fi: smartphones and UMA phones. Smartphones are expensive, and so probably less common among the cord-cutting demographic. On the other hand, that demographic is not only of modest income but also younger and better educated; many are students.

Wi-Fi VoIP in smartphones is still negligible, but the seeds are planted: vigorous growth of smartphones, a Wi-Fi attach rate in smartphones trending to 100%, a slow but steady opening up of smartphones to third-party applications, broadband in most homes, and Wi-Fi growing in all markets.