I spoke with Jonathan Christensen of Skype yesterday, about the new codec in the latest Windows beta of Skype:
MS: Skype announced a new voice codec at CES. What’s different about it from the old one?
JC: The new codec is code-named SILK. Compared to its predecessor, SVOPC, the new codec gives the same or better audio response at half the bit-rate for wideband, and we also introduced a super wideband mode. SVOPC is a 16kHz sample rate, 8kHz audio bandwidth. The new codec has that mode as well, but it also has a 24 kHz sample rate, 12 kHz audio bandwidth mode. Most USB headsets have enough capture and render fidelity that you can experience the 12 kHz super wideband audio.
MS: Is the new codec an evolution of SVOPC?
JC: The new codec was a separate development branch from SVOPC. It has been under development for over 3 years, during which we focused both on the codec and the echo canceller and all the surrounding bits, and eventually got all that put together.
MS: What about the computational complexity?
JC: The new codec design point was different from SVOPC. SVOPC was designed for use on the desktop with a math coprocessor. It is actually pretty efficient. It’s just that it has a number of floats in it so it becomes extremely inefficient when it’s not on a PC.
The new codec’s design goal was to be ultra lightweight and embeddable. The vast majority of the addressable device market is better suited to fixed point, so it’s written in fixed point ANSI C – it’s as lightweight as a codec can be in terms of CPU utilization. Our design point was to be able to put it into mobile devices where battery life and CPU power are constrained, and it took almost 3 years to put it together. It’s a fundamental, ground up development; lots of very interesting science going into it, and a really talented developer leading the project. And now it’s ready. It’s a pretty significant jump forward.
MS: Is the new codec based on predictive voice coding?
JC: SVOPC has two modes, an audio mode and a speech mode, and the speech mode is much more structured towards speech. The new codec strikes little bit more of a balance between a general audio coder and a speech coder. So it does a pretty good job with stuff like background noise and music. But to get that kind of bit-rate reduction there are things about speech that you can capitalize on and get huge efficiency; we didn’t toss all that out. We are definitely using some of the model approach.
MS: Normally one expects with an evolution for the increments to get smaller over time. With the new codec you are getting a 50% improvement in bandwidth utilization, so you can’t be at the incremental stage yet?
JC: I don’t think we are. We were listening to samples from various versions of the client going back to 2.6, now we are at 4.0. In the same situation – pushing the same files in the same acoustic settings through the different client versions – in every release there’s a noticeable (even to the naked ear) difference in quality between the releases.
We are not completely done with it. There are many different areas where we can continue to optimize and tweak it, but we believe it’s at or above the current state of the industry in terms of performance.
Update: February 3rd,2009: Here is a write-up of SILK from the Skype Journal.
Update: March 7th, 2009: Skype has announced an SDK for third parties to implement SILK in their products, royalty free.