I spoke with Jonathan Christensen of Skype yesterday about the new codec in the latest Windows beta of Skype:
MS: Skype announced a new voice codec at CES. What’s different about it from the old one?
JC: The new codec is code-named SILK. Compared to its predecessor, SVOPC, the new codec gives the same or better audio response at half the bit-rate for wideband, and we also introduced a super wideband mode. SVOPC uses a 16 kHz sample rate, which gives 8 kHz of audio bandwidth. The new codec has that mode as well, but it also has a super wideband mode with a 24 kHz sample rate and 12 kHz of audio bandwidth. Most USB headsets have enough capture and render fidelity that you can experience the 12 kHz super wideband audio.
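The pairing of sample rates and audio bandwidths above follows from the Nyquist limit: a signal sampled at a rate f_s can only represent frequencies up to f_s/2 (in practice slightly less, because of anti-aliasing filter roll-off). A minimal Python sketch of that arithmetic; the function name is hypothetical and nothing here comes from Skype:

```python
def audio_bandwidth_hz(sample_rate_hz: int) -> float:
    """Upper bound on representable audio bandwidth (the Nyquist limit)."""
    return sample_rate_hz / 2


# The two modes discussed: wideband (SVOPC and SILK) and super wideband (SILK only).
for name, rate in [("wideband", 16_000), ("super wideband", 24_000)]:
    bw = audio_bandwidth_hz(rate)
    print(f"{name}: {rate / 1000:g} kHz sampling -> {bw / 1000:g} kHz audio bandwidth")
```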
MS: Is the new codec an evolution of SVOPC?
JC: The new codec was a separate development branch from SVOPC. It has been under development for over 3 years, during which we focused both on the codec and the echo canceller and all the surrounding bits, and eventually got all that put together.
MS: What about the computational complexity?
JC: The new codec’s design point was different from SVOPC’s. SVOPC was designed for use on the desktop with a math coprocessor. It is actually pretty efficient. It’s just that it uses a lot of floating-point math, so it becomes extremely inefficient when it’s not running on a PC.
The new codec’s design goal was to be ultra-lightweight and embeddable. The vast majority of the addressable device market is better suited to fixed point, so it’s written in fixed-point ANSI C – it’s as lightweight as a codec can be in terms of CPU utilization. Our design point was to be able to put it into mobile devices where battery life and CPU power are constrained, and it took almost 3 years to put it together. It’s a fundamental, ground-up development, with lots of very interesting science going into it and a really talented developer leading the project. And now it’s ready. It’s a pretty significant jump forward.
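To make "fixed point" concrete: instead of floats, a fixed-point codec stores samples and coefficients as scaled integers and uses integer multiplies and shifts. The sketch below is a generic Q15 illustration (1 sign bit, 15 fractional bits in a 16-bit integer), not SILK's code; it is written in Python to emulate the int16/int32 arithmetic a C implementation would use, and all names are hypothetical.

```python
Q = 15                              # Q15: value = integer / 2**15
INT16_MIN, INT16_MAX = -32768, 32767


def saturate16(v: int) -> int:
    """Clamp to the int16 range, as fixed-point DSP code does instead of wrapping."""
    return max(INT16_MIN, min(INT16_MAX, v))


def to_q15(x: float) -> int:
    """Convert a float in [-1, 1) to Q15 (saturating, so 1.0 becomes 32767)."""
    return saturate16(round(x * (1 << Q)))


def from_q15(v: int) -> float:
    return v / (1 << Q)


def q15_mul(a: int, b: int) -> int:
    """Multiply two Q15 values with integer ops only."""
    product = a * b                              # Q30 intermediate; fits in an int32 in C
    rounded = (product + (1 << (Q - 1))) >> Q    # add half an LSB, shift back to Q15
    return saturate16(rounded)                   # only -1 * -1 overflows


# Apply a gain of 0.5 to a sample of 0.25 without touching floating point.
gain, sample = to_q15(0.5), to_q15(0.25)
print(from_q15(q15_mul(gain, sample)))
```

The only operations in `q15_mul` are an integer multiply, an add and a shift, which is why this style runs well on processors without a floating-point unit.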
MS: Is the new codec based on predictive voice coding?
JC: SVOPC has two modes, an audio mode and a speech mode, and the speech mode is much more structured towards speech. The new codec strikes a little bit more of a balance between a general audio coder and a speech coder. So it does a pretty good job with stuff like background noise and music. But to get that kind of bit-rate reduction there are things about speech that you can capitalize on and get huge efficiency; we didn’t toss all that out. We are definitely using some of the model approach.
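Jonathan doesn't spell out the model here. The classic model-based idea in speech coding is linear prediction: each sample is predicted from the previous few, and only the (much smaller) prediction residual has to be coded. The sketch below is a generic textbook version (autocorrelation method plus the Levinson-Durbin recursion) in floating-point Python. Treat it as an illustration of the technique under that assumption, not as SILK's algorithm; the function names are hypothetical.

```python
import math


def autocorrelation(x: list[float], max_lag: int) -> list[float]:
    """r[k] = sum over n of x[n] * x[n - k], treating samples outside the frame as zero."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(max_lag + 1)]


def levinson_durbin(r: list[float], order: int) -> list[float]:
    """Solve the LPC normal equations; returns a[1..order] with x[n] ~ sum_k a[k] * x[n - k]."""
    a = [0.0] * (order + 1)
    err = r[0]                                   # prediction-error energy at order 0
    for i in range(1, order + 1):
        if err <= 0.0:                           # silent or perfectly predicted: stop early
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err  # reflection coefficient
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k                       # error energy shrinks at each order
    return a[1:]


def lpc_residual(x: list[float], coeffs: list[float]) -> list[float]:
    """Prediction error e[n] = x[n] - sum_k a[k] * x[n - k], for n >= order."""
    p = len(coeffs)
    return [x[n] - sum(coeffs[k] * x[n - 1 - k] for k in range(p)) for n in range(p, len(x))]


# A 440 Hz tone sampled at 16 kHz is almost fully predictable from two past samples,
# so the residual carries only a small fraction of the original energy.
fs, f0 = 16_000, 440
x = [math.sin(2 * math.pi * f0 * n / fs) for n in range(1000)]
a = levinson_durbin(autocorrelation(x, 2), 2)
e = lpc_residual(x, a)
print(a, sum(v * v for v in e) / sum(v * v for v in x))
```

Real speech is far less predictable than a pure tone, which is why a codec that keeps this model still needs a good way to code the residual, and why balancing it against general audio (noise, music) is a design choice.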
MS: Normally one expects with an evolution for the increments to get smaller over time. With the new codec you are getting a 50% improvement in bandwidth utilization, so you can’t be at the incremental stage yet?
JC: I don’t think we are. We were listening to samples from various versions of the client going back to 2.6; now we are at 4.0. In the same situation – pushing the same files in the same acoustic settings through the different client versions – there’s a noticeable (even to the naked ear) difference in quality from each release to the next.
We are not completely done with it. There are many different areas where we can continue to optimize and tweak it, but we believe it’s at or above the current state of the industry in terms of performance.
Skype 4.0 for Windows has the new codec.
The current Mac beta doesn’t yet support the new codec.
Update: February 3rd, 2009: Here is a write-up of SILK from the Skype Journal.
Update: March 7th, 2009: Skype has announced an SDK for third parties to implement SILK in their products, royalty free.
10 Replies to “Skype’s new super-wideband codec”
I use Skype on my 3G connection on my mobile. Would love to see this codec make its way to the Windows Mobile Skype client and 3rd party clients like Fring. Not sure if it has already…?
I don’t believe it is on either of these clients yet. Skype has made no announcement on any platform other than Windows. As for Fring, I don’t know the answer, but here’s my guess: looking at their sketchy technical documentation, it appears that Fring runs the Skype client on its own servers and connects to it from a “thin client” on the handset. This implies that the handset uses Fring’s codec to reach the servers, where it is transcoded to Skype. The Skype running on the server probably doesn’t support the new codec yet, but even if it did, it would not help unless the Fring codec were wideband. I don’t know what codec Fring uses, but I don’t see any mention of wideband on the Fring website. Even if the Fring codec were wideband, the transcoding step would still introduce latency and noise.
Update 8 February: Skype supports G.729 in both the clients and gateways so this is probably what Fring uses.
Any information or insight into the delay performance of this codec?
Thanks for asking. The total delay is 25 ms: 20 ms frame size + 5 ms look-ahead.
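In samples, the 20 ms frame and 5 ms look-ahead work out as follows. The sketch assumes (the reply doesn't say) that the same timings apply in both the 16 kHz wideband and 24 kHz super wideband modes. It covers only the codec's algorithmic delay; network transit, jitter buffering and sound-card buffers add to what a caller actually hears. Names are hypothetical.

```python
FRAME_MS = 20       # frame size quoted in the reply
LOOKAHEAD_MS = 5    # look-ahead quoted in the reply


def ms_to_samples(ms: int, sample_rate_hz: int) -> int:
    return sample_rate_hz * ms // 1000


def codec_delay(sample_rate_hz: int) -> tuple[int, int, int]:
    """Return (frame samples, look-ahead samples, total algorithmic delay in ms)."""
    frame = ms_to_samples(FRAME_MS, sample_rate_hz)
    lookahead = ms_to_samples(LOOKAHEAD_MS, sample_rate_hz)
    return frame, lookahead, FRAME_MS + LOOKAHEAD_MS


for rate in (16_000, 24_000):
    frame, lookahead, total_ms = codec_delay(rate)
    print(f"{rate} Hz: {frame}-sample frame + {lookahead}-sample look-ahead = {total_ms} ms")
```

The encoder cannot emit a frame until it has buffered the whole frame plus the look-ahead, which is why the two figures add up to the stated 25 ms regardless of sample rate.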
Skype 4.0 has now been released, and the users have been vociferous in their dislike of it, see Skype Users are Revolting.
Skype account systems are NOT safe. Google “skype account hijacked” [as mine was] and see how many folk are having their accounts compromised and credit stolen. It is HIGH time that public warnings were issued so that everyone realises the folly of contracting with skype – no matter what codec they may invent … it’s account access and security that matters the MOST to users.