Skype’s new super-wideband codec

I spoke with Jonathan Christensen of Skype yesterday, about the new codec in the latest Windows beta of Skype:

MS: Skype announced a new voice codec at CES. What’s different about it from the old one?

JC: The new codec is code-named SILK. Compared to its predecessor, SVOPC, the new codec gives the same or better audio response at half the bit-rate for wideband, and we also introduced a super wideband mode. SVOPC is a 16kHz sample rate, 8kHz audio bandwidth. The new codec has that mode as well, but it also has a 24 kHz sample rate, 12 kHz audio bandwidth mode. Most USB headsets have enough capture and render fidelity that you can experience the 12 kHz super wideband audio.

MS: Is the new codec an evolution of SVOPC?

JC: The new codec was a separate development branch from SVOPC. It has been under development for over 3 years, during which we focused both on the codec and the echo canceller and all the surrounding bits, and eventually got all that put together.

MS: What about the computational complexity?

JC: The new codec design point was different from SVOPC. SVOPC was designed for use on the desktop with a math coprocessor. It is actually pretty efficient. It’s just that it has a number of floats in it so it becomes extremely inefficient when it’s not on a PC.
The new codec’s design goal was to be ultra lightweight and embeddable. The vast majority of the addressable device market is better suited to fixed point, so it’s written in fixed point ANSI C – it’s as lightweight as a codec can be in terms of CPU utilization. Our design point was to be able to put it into mobile devices where battery life and CPU power are constrained, and it took almost 3 years to put it together. It’s a fundamental, ground up development; lots of very interesting science going into it, and a really talented developer leading the project. And now it’s ready. It’s a pretty significant jump forward.

MS: Is the new codec based on predictive voice coding?

JC: SVOPC has two modes, an audio mode and a speech mode, and the speech mode is much more structured towards speech. The new codec strikes a little bit more of a balance between a general audio coder and a speech coder. So it does a pretty good job with stuff like background noise and music. But to get that kind of bit-rate reduction there are things about speech that you can capitalize on and get huge efficiency; we didn’t toss all that out. We are definitely using some of the model approach.

MS: Normally one expects with an evolution for the increments to get smaller over time. With the new codec you are getting a 50% improvement in bandwidth utilization, so you can’t be at the incremental stage yet?

JC: I don’t think we are. We were listening to samples from various versions of the client going back to 2.6, now we are at 4.0. In the same situation – pushing the same files in the same acoustic settings through the different client versions – in every release there’s a noticeable (even to the naked ear) difference in quality between the releases.

We are not completely done with it. There are many different areas where we can continue to optimize and tweak it, but we believe it’s at or above the current state of the industry in terms of performance.

————–

Skype 4.0 for Windows has the new codec.
The current Mac beta doesn’t yet support the new codec.

Update: February 3rd, 2009: Here is a write-up of SILK from the Skype Journal.

Update: March 7th, 2009: Skype has announced an SDK for third parties to implement SILK in their products, royalty free.

Fixed Mobile Substitution and Voice over Wi-Fi

Getting rid of your land-line phone and relying on your cell phone instead is called Fixed Mobile Substitution (FMS).

A report from the National Center for Health Statistics of the Centers for Disease Control (CDC) shows a linear increase in the number of households that have a cell phone but no land-line, starting at 4.4% in 2004 and reaching 16.1% in the first half of 2008.
US Fixed Mobile Substitution 2005-2008 - source: CDC

These numbers match those in a recent Nielsen report on FMS.

FMS will most likely accelerate in 2009 because of the recession. It will be interesting to see by how much. We will reach a tipping point soon. 13% of households have a landline that they don’t use.

There are about 112 million occupied housing units in the US, and about 71 million broadband subscribers.

So what does this mean for Wi-Fi VoIP? One of the primary reasons for FMS is to save money; it is more prevalent in lower income households. There are two kinds of phone that do VoWi-Fi, smartphones and UMA phones. Smartphones are expensive, and probably less common among the cord-cutting demographic – except that that demographic is also younger and better educated as well as having a modest income – many are students.

Wi-Fi VoIP in smart phones is still negligible, but the seeds are planted: vigorous growth of smart phones, Wi-Fi attach rate to smart phones trending to 100%, a slow but steady opening up of smart phones to third party applications, broadband in most homes, Wi-Fi growing in all markets.

Wideband codecs and IPR

Wideband codecs are a good thing. They have been slow to enter the mainstream, but there are several reasons why this is about to change.

Voice codecs are benefiting from the usual good effects of Moore’s law. Each year higher-complexity (higher computation load) codecs become feasible on low-cost hardware, and each year it is cheaper to fit multiple codecs into a ROM (adding multiple codecs increases the chance that two endpoints will have one in common).

Voice codecs are often burdened by claims of intellectual property rights (IPR) by multiple players. This can make it difficult for software and equipment vendors to use codecs in their products without fear of litigation. The industry response has been to create “patent pools” where the patent owners agree to let a single party negotiate a blanket license on their behalf:

Prior to establishment of the Pool, the complexity of negotiating IPRs with each intellectual property owner discouraged potential integrators.

Unfortunately there is still no pool for the standard wideband codec ratified by the 3GPP for use in cell phones, AMR-WB (G.722.2). Even where there is a pool, getting a license from it doesn’t mean that a use of the codec doesn’t infringe some yet-to-be-revealed patent not in the pool, and it doesn’t indemnify the licensee from such a claim.

There are several royalty-free wideband codecs available. I mentioned a couple of them (from Microsoft and from Skype) in an Internet Telephony Column.

Microsoft and Skype have got around the royalty issue to some extent by creating proprietary codecs. They have researched their algorithms and have either concluded that they don’t infringe or have bought licenses for the patents they use.

G.722 (note that G.722, G.722.1 and G.722.2 are independent of each other, both technically and from the point of view of IPR) is so old that its patent restrictions have expired, making it an attractive choice of common baseline wideband codec for all devices. Unfortunately its antiquity also means that it is relatively inefficient in its use of bandwidth.

Polycom did a major good thing for the industry when it made G.722.1 (Siren7) available on a royalty-free basis. G.722.1 is considerably better than G.722, though it is not as efficient as G.722.2.

The open-source Speex codec is efficient and royalty free, but being open source it bears a little more fear of infringement than the other codecs mentioned here. There are three reasons why this fear may be misplaced. First, the coders claim to have based it on old (1980’s) technology. Second, it has now been available for some years, and has been shipped by large companies and no claims of infringement have surfaced. Third, while it is possible in these times of outrageous patent trolling that somebody will pop up with some claim against Speex, a similar risk exists for all the other codecs, including the ones with patent pools.

So we now have three royalty-free wideband codecs (G.722, G.722.1 and Speex); we have hardware capable of running them cheaply; we have broad deployment of VoIP and growing implementation of VoIP trunking. We have increasing data bandwidth to homes and businesses, to the point where the bandwidth demands of voice are trivial compared to other uses like streaming video and music downloads. Plus there’s a wild card. By 2010 over 300 million people will have mobile smartphones capable of running software that will give them wideband phone conversations over a Wi-Fi connection.

Perhaps the time for wideband telephony is at hand.

Counterpath’s new strategy

Counterpath has an enviable incumbency in the PC soft-phone market. Their eyeBeam soft phone is licensed by numerous service providers and PBX manufacturers. But the soft phone business is not enormous, so Counterpath is looking to use its leadership in the soft phone business as a beachhead into the fixed-mobile convergence space. Fixed-mobile convergence comes in two flavors: service provider and enterprise. So last year Counterpath made three acquisitions to fill in the spaces of a two by two matrix, with enterprise and service provider on one axis, and client software and mobility controller server software on the other.

Counterpath bought FirstHand for its Enterprise Mobility Gateway (EMG) and Bridgeport Networks for its service provider Network Convergence Gateway (NCG). It already had client software for service providers covered with its eyeBeam software. It bought NewHeights for its enterprise client software, a softphone with PBX features to complement the more consumer-oriented eyeBeam phone. These two soft phones have already been integrated by Counterpath into their new Bria softphone. It remains a challenge to get the soft phones and the two gateways working together seamlessly. It will also be a challenge to gain market share in the mobility gateway market.

Most mobility gateway vendors tend to focus on either service provider or enterprise customers, but Counterpath is not unique in having gateway devices for both. Tango Networks claims this as the differentiating feature of their solution; Tango’s two devices were designed from the outset to work together and complement each other. Counterpath must integrate two products with independent pedigrees. The NCG that came from Bridgeport is a pre-IMS solution. When a call comes in for a cell phone, the NCG can decide whether to ring the cell phone, a soft phone on a PC or both. The EMG that came from FirstHand is an enterprise mobility controller similar to RIM’s Ascendent product.

Neither of the two Gateways provides “true” FMC, namely the ability to run a call over Wi-Fi to a dual mode cell phone; this is presumably in the near future. The NCG fields calls to a cell phone number and directs them to a PC in the enterprise, while the EMG fields calls to the PBX and can route them to a 3G cellphone via a VoIP connection. What’s interesting about this particular solution is that it uses the 3G data connection for the VoIP call, rather than using the regular cellular voice connection. According to Counterpath the QoS (latency, jitter, packet loss) on the 3G data connection provides equivalent call quality to a cellular voice connection.

Low cost international calls from your mobile phone

I wrote about the vast array of ways to bypass international tolls in my Internet Telephony column a while back. Now there is an interesting web site, LowCostMob.com, that gives a listing of the services available and technical explanations of how they work.

If you go to the “contact us” link on the website you can type in “user feedback” with mini-reviews of the services. I presume that over time the database of user comments will become an additional helpful resource on the site.

All these services work to make calls to international destinations cheaper, but if you actually travel abroad you still have to pay exorbitant roaming charges for using the cellular network. The benefit of dual-mode (Wi-Fi plus cellular) phones is that with some of them you can use the Wi-Fi connection to make VoIP calls and completely bypass the cellular network, avoiding international roaming charges. Not all the listed services support this feature, and not all dual mode phones do either.

Verizon’s basic VoIP patents ruled invalid

Back in 2007, Verizon sued Vonage over three basic VoIP patents, and Vonage ended up settling for $120 million. It was a complicated story. Three US patents were involved: 6,104,711, 6,282,574 and 6,359,880. Verizon won that case, and was awarded $58 million plus a 5.5% royalty on Vonage’s future business. Vonage appealed, and the appeals court vacated the $58 million damages award and the 5.5% royalty. But it was on a minor point:

We hold that the district court did not err in its construction of disputed claim terms of the ’574 and ’711 patents. Therefore, we affirm the judgment of infringement with respect to those claims. However, we hold that the district court improperly construed one of the disputed terms in the ’880 patent, and accordingly vacate the judgment of infringement with respect to the ’880 patent and remand for a new trial… We vacate in its entirety the award of $58,000,000 in damages and the 5.5% royalty and remand to the district court for further proceedings.

But the case never went back to the district court! Verizon and Vonage had settled before the verdict, and under the terms of the settlement the verdict triggered a $120 million payment from Vonage to Verizon. Vonage went on to settle similar patent issues with AT&T for $39 million and Sprint for $80 million.

This year Verizon sued Cox on similar issues in the same court, Judge Claude Hilton’s court in the Eastern Virginia Federal District. This time Verizon lost. The jury found the claims of the ’711 and ’574 patents to be invalid, and Cox not guilty of infringing the others. Here is my summary of the claims that were found to be invalid:

US patent 6,104,711:
Claim 1 – A DNS (or similar) server translating an address based on a condition
Claim 3 – Like claim 1, where the condition is the status of an endpoint
Claim 11 – Like claim 1, where the condition is a query of an endpoint

US patent 6,282,574:
Claim 5 – Like 711.1, where the server returns a phone number (but no condition is involved)
Claim 6 – Like 574.5, where the server returns a phone number plus an IP (or similar) address

Presumably Verizon will appeal, but to this layman they seem unlikely to win. Their previous victory over Vonage was pyrrhic; the definitions returned by the Markman hearing in that case and the reasoning of the appeal court ruling broadened the scope of the patents to the extent that they encompassed a ton of prior art, as you probably expected when you saw the claim summaries above.

There are numerous patents covering VoIP, and numerous patent holders wanting a slice of the pie. James Surowiecki wrote a characteristically good piece on this type of situation in the New Yorker in August.

Femtocell versus Wi-Fi

Rethink Research has published an interesting article relating the new Wi-Fi voice certification to the outlook for femtocells.

The idea of the article is that voice over Wi-Fi for cell phones is competing with femtocells, and that femtocells may win out. The article distinguishes between business voice and consumer voice, saying that service providers see femtocells as “an important stalking horse for greater control of corporate customers.” This gives a hint of why femtocells may be unattractive to businesses: many of them would rather not yield this control.

Consumer voice service is controlled by service providers. They have three options in this space: do nothing, deploy femtocells or deploy Wi-Fi. Do nothing is the obvious best choice, since neither of the other options carries a revenue upside. But poor coverage in a home discourages usage and risks cancellations of subscriptions. So in areas of poor coverage something like femtocells or UMA (voice over Wi-Fi) is attractive to service providers. For both technologies the service provider subsidizes the wireless router, but femtocells will remain more expensive than Wi-Fi routers because of their lower sales volumes, so Wi-Fi is more attractive on this count. But UMA requires phones with Wi-Fi, while femtocells will work with any phone in the service provider’s line-up, including legacy ones. So the customers’ experience of femtocells is better – they can choose or keep the phone they want and still get improved coverage at home. This benefit of femtocells clearly outweighs the marginal price advantage of Wi-Fi routers. Femtocells may help subscriber retention in another way: a Wi-Fi router is not tied to any particular cellular service provider, while a femtocell only works with the carrier that supplied it.

The situation in businesses is different. They generally prefer to control their own voice systems, which is why they have PBXs. But a substantial number of business calls are now made on cell phones, even on company premises. These calls don’t go through the PBX, so they are not least-cost-routed and they are not logged or managed by the IT department. Femtocells don’t fix these problems, but Voice over Wi-Fi does. Not service provider Voice over Wi-Fi, like UMA, but SIP-based Voice over Wi-Fi from companies like DiVitas and Agito. What about phone choice though? Won’t corporate customers be stuck with a limited choice of handsets? The answer is yes, only a limited number of phones have Wi-Fi: less than 10% of those sold in 2008. But in the category of enterprise smart phones, like the Nokia Eseries and Blackberries, the attach rate of Wi-Fi will soon be close to 100%.

So femtocells are a good way for service providers to remedy churn caused by poor residential coverage for consumers, but Wi-Fi may be the better option for businesses that want to regain control over their voice traffic.

Wi-Fi certification for voice devices

In news that is huge for VoWi-Fi, the Wi-Fi Alliance announced on June 30th a new certification program, “Voice-Personal.” Eight devices have already been certified under this program, including enterprise access points from Cisco and Meru, a residential access point from Broadcom, and client adapters from Intel and Redpine Signals.

Why is this huge news? Well, as the press release points out, by 2011 annual shipments of cell phones with Wi-Fi will be running at roughly 300 million units. The Wi-Fi in these phones will be used for Internet browsing, for syncing photos and music with PCs, and for cheap or free voice calls.

The certification requirements for Voice-Personal are not aggressive: only four simultaneous voice calls in the presence of data traffic, with a latency of less than 50 milliseconds and a maximum jitter of less than 50 milliseconds. These numbers will produce an acceptable call under most conditions, but a network round-trip delay of 300 ms is generally considered to approach the limit of acceptability, and with a Wi-Fi hop at each end running at the limit of these specifications there would be no room in the latency budget for any additional delays in the voice path. The packet loss requirement, 1% with no burst losses, is a very good number considering that modern voice codecs from companies like GIPS can yield excellent sound quality in the presence of much higher packet loss. This number is hard to achieve in the real world, as phones encounter microwave ovens, move through spots of poor coverage and transition between access points.

Since this certification is termed “Voice-Personal,” four active calls per access point is acceptable; a residence is unlikely to need more than that. Three of the four access points submitted for this certification are enterprise access points. They should be able to handle many more calls, and probably can. The Wi-Fi Alliance is planning a “Voice-Enterprise” certification for 2009.

There are several things that are good about this certification. First, the WFA has seen fit to highlight voice as a primary use for Wi-Fi, and has set a performance baseline. Second, this certification requires some other certifications as well, like WMM power save and WMM QoS. So far in 2008, of 99 residential access points certified only 6 support WMM power save, and of 52 enterprise access points only 13 support WMM power save. One of the biggest criticisms of Wi-Fi in handsets is that it draws too much power. WMM power save yields radical improvements in battery life – better than doubling talk time and increasing standby time by over 30%, according to numbers in the WFA promotional materials.

Making lemons into lemonade

Phybridge is a Canadian startup (founded May 2007) aiming to solve some of the problems of VoIP implementation. Its premise is that in many cases, an organization seeking to move from a traditional TDM phone network to a VoIP network does not have an Ethernet LAN capable of supporting VoIP. This inadequacy may result from insufficient capacity, QoS or reliability.

The conventional solution in these cases is to upgrade the Ethernet network while junking the old phone wiring.

Phybridge proposes to leave the Ethernet network alone, and to reuse the old phone wiring to implement a parallel data network, using Ethernet over a flavor of DSL. This is similar to HomePNA, but aimed at business use rather than consumer, and done point-to-point rather than into a shared medium.

The solution consists of two parts. A central box called the “Uniphyer” has 24 ports connected to the legacy phone wiring. At the other end of each cable run is a “phy adapter” the size of a pack of cigarettes that you plug into the legacy phone jack, and into which you plug your Ethernet VoIP phone.

The Uniphyer provides power over the same copper pair as the data, so you can plug power-over-Ethernet phones into the client adapters.

The data rate is 3 megabits per second upstream, 30 down. This is slow for a data network, but certainly adequate for VoIP, so an organization that is replacing a conventional PBX phone system with a VoIP one may find Phybridge a cost effective solution if their existing data network isn’t up to VoIP, and the required improvements are extensive.
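
To put "certainly adequate for VoIP" into numbers, here is a rough sketch of the bandwidth one call needs. The post names no codec, so the figures are assumptions: G.711 at 64 kbps, 20 ms packets, and 58 bytes of per-packet overhead (12 RTP + 8 UDP + 20 IPv4 + 14 Ethernet header + 4 Ethernet FCS). The function name is made up for illustration.

```python
def call_bandwidth_kbps(codec_kbps=64, packet_ms=20, overhead_bytes=58):
    """Bandwidth of one direction of a VoIP call on the wire, in kbps.

    Defaults are assumptions, not Phybridge figures: G.711 with 20 ms
    packets and RTP/UDP/IPv4/Ethernet headers on every packet.
    """
    payload_bytes = codec_kbps * 1000 / 8 * packet_ms / 1000
    packets_per_second = 1000 / packet_ms
    return (payload_bytes + overhead_bytes) * 8 * packets_per_second / 1000


upstream_kbps = 3000  # the 3 Mbps upstream rate quoted above
per_call = call_bandwidth_kbps()
print(f"one call: {per_call:.1f} kbps")
print(f"calls that fit in the upstream: {int(upstream_kbps // per_call)}")
```

Under those assumptions even the slower upstream direction has room for dozens of calls, while each port only has to carry the one phone plugged into it.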

The Uniphyer is scheduled to launch at the end of September.

How does 802.11n get to 600Mbps?

802.11n incorporates all earlier amendments to 802.11, including the MAC enhancements in 802.11e for QoS and power savings.

The design goal of the 802.11n amendment is “HT” for High Throughput. The throughput it claims is high indeed: up to 600 Mbps in raw bit-rate. Let’s start with the maximum throughput of 802.11g (54 Mbps), and see what techniques 802.11n applies to boost it to 600 Mbps:

1. More subcarriers: 802.11g has 48 OFDM data subcarriers. 802.11n increases this number to 52, thereby boosting throughput from 54 Mbps to 58.5 Mbps.

2. FEC: 802.11g has a maximum FEC (Forward Error Correction) coding rate of 3/4. 802.11n squeezes some redundancy out of this with a 5/6 coding rate, boosting the link rate from 58.5 Mbps to 65 Mbps.

3. Guard Interval: 802.11a/g has a guard interval of 800ns between symbols. 802.11n has an option to reduce this to 400ns, which boosts the throughput from 65 Mbps to 72.2 Mbps.

4. MIMO: thanks to the magical effect of spatial multiplexing, provided there are sufficient multi-path reflections, the throughput of a system goes up linearly with each extra antenna at both ends. Two antennas at each end double the throughput, three antennas at each end triple it, and four quadruple it. The maximum number of antennas in the receive and transmit arrays specified by 802.11n is four. This allows four simultaneous 72.2 Mbps streams, yielding a total throughput of 288.9 Mbps.

5. 40 MHz channels: all previous versions of 802.11 have a channel bandwidth of 20 MHz. 802.11n has an optional mode (controversial and not usable in many circumstances) where the channel bandwidth is 40 MHz. While the channel bandwidth is doubled, the number of data subcarriers is slightly more than doubled, going from 52 to 108. This yields a throughput of 150 Mbps per spatial stream. So again combining four spatial streams with MIMO, we get 600 Mbps.
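
The five steps above can be checked with a few lines of arithmetic. This is only a sketch of the ratios, not anything taken from the standard's rate tables. The one figure not given in the text is the 3.2 microsecond data portion of each OFDM symbol, which is assumed here so that the guard interval change becomes a ratio of symbol times (4.0 vs 3.6 microseconds). The function name is made up for illustration.

```python
from fractions import Fraction


def ht_rate_steps():
    """Return (label, Mbps) pairs for each step from 54 Mbps to 600 Mbps."""
    steps = []
    rate = Fraction(54)                        # 802.11g maximum link rate
    steps.append(("802.11g baseline", rate))
    rate *= Fraction(52, 48)                   # 48 -> 52 data subcarriers
    steps.append(("52 subcarriers", rate))
    rate *= Fraction(5, 6) / Fraction(3, 4)    # coding rate 3/4 -> 5/6
    steps.append(("5/6 coding rate", rate))
    # Assumed symbol timing: 3.2 us of data plus the guard interval,
    # so 4.0 us with an 800 ns guard and 3.6 us with a 400 ns guard.
    rate *= Fraction(3200 + 800, 3200 + 400)
    steps.append(("400 ns guard interval", rate))
    steps.append(("4 streams, 20 MHz", rate * 4))
    rate *= Fraction(108, 52)                  # 40 MHz: 52 -> 108 subcarriers
    steps.append(("1 stream, 40 MHz", rate))
    steps.append(("4 streams, 40 MHz", rate * 4))
    return steps


for label, mbps in ht_rate_steps():
    print(f"{label:24s} {float(mbps):6.1f} Mbps")
```

Exact fractions are used so that the intermediate 72.2 and 288.9 figures are rounded only when printed, and the final result comes out at exactly 600.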

Lower MAC overhead
But raw throughput is not a very informative number.

The 11a/g link rate is 54 Mbps, but the higher layer throughput is only 26 Mbps; the MAC overhead is over 50%! In 11n when the link rate is 65 Mbps, the higher layer throughput is about 50 Mbps; the MAC overhead is down to 25%.
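
As a quick check on those percentages, the overhead is simply the share of the link rate that never reaches the higher layers. The helper below is an illustrative name, and the inputs are the round figures quoted above, so the results are approximate (about 52% and 23%).

```python
def mac_overhead(link_rate_mbps, higher_layer_mbps):
    """Fraction of the link rate that does not reach the higher layers."""
    return 1 - higher_layer_mbps / link_rate_mbps


for name, link, goodput in [("802.11a/g", 54, 26), ("802.11n", 65, 50)]:
    print(f"{name}: {mac_overhead(link, goodput):.0%} overhead")
```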

Bear in mind that these numbers are the absolute top speed you can get out of the system. 802.11n has numerous modulation schemes to fall back to when the conditions are less than perfect, which is most of the time.

But to minimize these fall-backs, 11n contains additional improvements to make the effective throughput as high as possible under all circumstances. These improvements are described in the following paragraphs.

Fast MCS feedback – rate selection.
Existing equipment finds it hard to track rapid changes in the channel. Say you walk through the shadow of a pole in the building. The rate may go from 50 to 6 to 50 Mbps in one step. It’s hard for conventional systems to track this, because they adapt based on transmit errors. With delay sensitive data like voice you have to be very conservative, so adapting up is much slower than down. 11n adds explicit per-packet feedback, recommending the transmission speed for the next packet. This is called Fast MCS (Modulation and Coding Scheme) Feedback.
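
A toy simulation makes the difference concrete. This is a sketch under loud assumptions, not the 802.11n mechanism itself: the rate table is the 802.11a/g set, the channel is a made-up trace that drops to 6 Mbps for five packets, the error-driven adapter steps down one rate per failed packet and up one rate per three successes, and the feedback adapter simply uses the rate the receiver recommended on the last delivered packet. All function names are hypothetical.

```python
RATES = [6, 9, 12, 18, 24, 36, 48, 54]  # 802.11a/g link rates in Mbps


def error_driven(channel, up_after=3):
    """Adapt on transmit errors: quick to step down, slow to step up."""
    idx, streak, used = len(RATES) - 1, 0, []
    for best in channel:              # best = highest rate the channel supports
        used.append(RATES[idx])
        if RATES[idx] <= best:        # packet delivered
            streak += 1
            if streak >= up_after and idx < len(RATES) - 1:
                idx, streak = idx + 1, 0
        else:                         # packet lost: step down one rate
            idx, streak = max(idx - 1, 0), 0
    return used


def explicit_feedback(channel):
    """Use the receiver's recommendation for the next packet (toy model)."""
    rate, used = RATES[-1], []
    for best in channel:
        used.append(rate)
        if rate <= best:              # delivered: feedback names the next rate
            rate = max(r for r in RATES if r <= best)
        else:                         # lost, no feedback: drop to the lowest rate
            rate = RATES[0]
    return used


def packets_to_recover(trace, shadow_end):
    """Packets sent after the shadow ends before the top rate is used again."""
    first = next(i for i in range(shadow_end, len(trace)) if trace[i] == RATES[-1])
    return first - shadow_end


channel = [54] * 5 + [6] * 5 + [54] * 20   # made-up trace: a five-packet shadow
print("error-driven:", packets_to_recover(error_driven(channel), 10))
print("feedback:    ", packets_to_recover(explicit_feedback(channel), 10))
```

In this trace the error-driven adapter loses every packet sent in the shadow and then takes many packets to climb back, while the feedback adapter loses one packet and is back at the top rate almost immediately. The exact counts depend entirely on the assumed step sizes.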

LDPC (Low Density Parity Check) coding
LDPC is a super duper Forward Error Correction mechanism. Although it is almost 50 years old, it is the most effective error correcting code developed to date; it nears the theoretical limit of efficiency. It was little used until recently because of its high compute requirement. An interesting by-product of its antiquity is that it is relatively free of patent issues.

Transmit beam-forming
The term beam-forming conjures up images of a laser-like beam of radio waves pointing exactly at the client device, but it doesn’t really work like that. If you look at a fine-resolution map of signal intensity in a room covered by a Wi-Fi access point, it looks like the surface of a pond disturbed by a gust of wind – it is a patchwork of bumps and dips in signal intensity, some as small as a few cubic inches in volume. Transmit beam-forming adjusts the phase and transmit power at the various antennas to move one of the maxima of signal intensity to where the client device is.

STBC
In a phone the chances are that there will only be one Wi-Fi antenna, so there will be only one spatial channel. Even so, the MIMO technique of STBC (Space-Time Block Coding) enables the handset to take advantage of the multiple antennas on the Access Point to improve range, both rate-at-range and limiting range.

Incidentally, to receive 802.11n certification by the Wi-Fi Alliance, all devices must have two or more antennas, except handsets, which can optionally have a single antenna. Several considerations went into allowing this concession to handsets, mainly size and power constraints. STBC is particularly useful to handsets. It yields the robustness of MIMO without a second radio, which saves all the power the second radio would burn. This power saving is compounded with another: because of the greater rate-at-range the radio is on for less time while transmitting a given quantity of data. STBC is optional in 802.11n, though it should always be implemented for systems that support 802.11n handsets.

Hardware assistance
Many of these features impose a considerable compute load. LDPC and STBC fall into this category. This is an issue for handsets, since computation costs battery life. Fortunately these features are amenable to hardware implementation. With dedicated hardware the computation happens rapidly and with little cost in power.