AT&T to deploy Voice over Wi-Fi on iPhones

Don’t get too excited by Apple’s announcement of a Voice over IP service on the iPhone 3.0. It strains credulity that AT&T would open up the iPhone to work on third party VoIP networks, so presumably the iPhone’s VoIP service will be locked down to AT&T.

AT&T has a large network of Wi-Fi hotspots where iPhone users can get free Wi-Fi service. The iPhone VoIP announcement indicates that AT&T may be rolling out voice over Wi-Fi service for the iPhone. It will probably be SIP, rather than UMA, the technology that T-Mobile uses for this type of service. It is likely to be based on some flavor of IMS, especially since AT&T has recently been rumored to be spinning up its IMS efforts for its U-verse service, which happens to include VoIP. AT&T is talking about a June launch.

An advantage of the SIP flavor of Voice over Wi-Fi is that unlike UMA it can theoretically negotiate any codec, allowing HD Voice conversations between subscribers when they are both on Wi-Fi; wouldn’t that be great? The reference to the “Voice over IP service” in the announcement is too cryptic to determine what’s involved. It may not even include seamless roaming of a call between the cellular and Wi-Fi networks (VCC).
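
To make the codec-negotiation point concrete, here is a minimal sketch of how two SIP endpoints might settle on a codec. It is a deliberate simplification of the SDP offer/answer exchange that SIP actually uses, and the function name and the codec lists are illustrative assumptions, not anything AT&T or Apple has described.

```python
def negotiate_codec(offered, supported):
    """Pick the first codec in the caller's preference order that the callee supports.

    Both arguments are lists of codec names. This is a simplified stand-in
    for SDP offer/answer, not a real SIP stack.
    """
    supported_names = {name.lower() for name in supported}
    for name in offered:
        if name.lower() in supported_names:
            return name
    return None  # no codec in common, so the call cannot be set up


# Hypothetical capability lists, wideband codecs first.
caller = ["AMR-WB", "G722", "PCMU"]
wifi_callee = ["G722", "PCMU"]   # also on Wi-Fi, supports a wideband codec
legacy_callee = ["PCMU"]         # narrowband only

print(negotiate_codec(caller, wifi_callee))    # G722
print(negotiate_codec(caller, legacy_callee))  # PCMU
```

The call only goes wideband when both ends list a wideband codec; otherwise it falls back to the narrowband codec they share.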

AT&T has several Wi-Fi smartphones in addition to the iPhone. They are mostly based on Windows Mobile, so they can probably be enabled for this service with a software download. The same goes for Blackberries. Actually, RIM may be ahead of the game, since it already has FMC products in the field with T-Mobile, albeit on UMA rather than SIP, while Windows Mobile phones are generally ill-suited to VoIP.

Skype’s SILK codec available royalty free to third parties

I wrote earlier about the need for royalty-free wideband codecs, and about a conversation with Jonathan Christensen about SILK, Skype’s new super-wideband codec.

This week Jonathan announced that Skype is releasing an SDK to let third parties integrate SILK with their products, and distribute it royalty free. This is very good news. It comes on top of Skype’s announcement that Nokia is putting the Skype client on some of its high end phones. If the Nokia deal includes SILK, and the platform exposes SILK to third party applications on the phones, SILK will quickly become the most widely used wideband codec for SIP as well as the most widely used wideband codec, period. That is, if the Nokia deal stands.

Polycom has been leading the wideband codec charge on deskphones, and it already co-brands a phone with Skype. It would make sense for Polycom to add SILK to its entire line of IP phones.

For network applications like voice, Metcalfe’s Law is like gravity. Skype has over 400 million users. If the royalty-free license has no catches, the wideband codec debate is history, at least until LTE brings AMR-WB to mass-market cell phones.

Skype’s new super-wideband codec

I spoke with Jonathan Christensen of Skype yesterday, about the new codec in the latest Windows beta of Skype:

MS: Skype announced a new voice codec at CES. What’s different about it from the old one?

JC: The new codec is code-named SILK. Compared to its predecessor, SVOPC, the new codec gives the same or better audio response at half the bit-rate for wideband, and we also introduced a super wideband mode. SVOPC is a 16kHz sample rate, 8kHz audio bandwidth. The new codec has that mode as well, but it also has a 24 kHz sample rate, 12 kHz audio bandwidth mode. Most USB headsets have enough capture and render fidelity that you can experience the 12 kHz super wideband audio.

MS: Is the new codec an evolution of SVOPC?

JC: The new codec was a separate development branch from SVOPC. It has been under development for over 3 years, during which we focused both on the codec and the echo canceller and all the surrounding bits, and eventually got all that put together.

MS: What about the computational complexity?

JC: The new codec design point was different from SVOPC. SVOPC was designed for use on the desktop with a math coprocessor. It is actually pretty efficient. It’s just that it has a number of floats in it so it becomes extremely inefficient when it’s not on a PC.
The new codec’s design goal was to be ultra lightweight and embeddable. The vast majority of the addressable device market is better suited to fixed point, so it’s written in fixed point ANSI C – it’s as lightweight as a codec can be in terms of CPU utilization. Our design point was to be able to put it into mobile devices where battery life and CPU power are constrained, and it took almost 3 years to put it together. It’s a fundamental, ground up development; lots of very interesting science going into it, and a really talented developer leading the project. And now it’s ready. It’s a pretty significant jump forward.
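
For readers unfamiliar with the float versus fixed-point distinction, the following is a small sketch of Q15 fixed-point arithmetic, one common way to represent fractional values with 16-bit integers on processors that lack floating-point hardware. It is not taken from SILK; the Q15 format and the helper names are assumptions chosen purely for illustration.

```python
Q15_ONE = 1 << 15  # 32768 represents 1.0 in Q15


def saturate(value):
    """Clamp an integer to the signed 16-bit range used by Q15."""
    return max(-Q15_ONE, min(Q15_ONE - 1, value))


def to_q15(x):
    """Convert a float in roughly [-1.0, 1.0) to a Q15 integer."""
    return saturate(int(round(x * Q15_ONE)))


def q15_mul(a, b):
    """Multiply two Q15 values using only integer operations, with rounding."""
    product = (a * b + (1 << 14)) >> 15
    return saturate(product)


def from_q15(value):
    """Convert a Q15 integer back to a float, for inspection only."""
    return value / Q15_ONE


half = to_q15(0.5)
print(half)                            # 16384
print(from_q15(q15_mul(half, half)))   # 0.25
```

The multiply needs only an integer multiplication, an addition and a shift, which is why code written this way suits small embedded processors.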

MS: Is the new codec based on predictive voice coding?

JC: SVOPC has two modes, an audio mode and a speech mode, and the speech mode is much more structured towards speech. The new codec strikes a little bit more of a balance between a general audio coder and a speech coder. So it does a pretty good job with stuff like background noise and music. But to get that kind of bit-rate reduction there are things about speech that you can capitalize on and get huge efficiency; we didn’t toss all that out. We are definitely using some of the model approach.

MS: Normally one expects with an evolution for the increments to get smaller over time. With the new codec you are getting a 50% improvement in bandwidth utilization, so you can’t be at the incremental stage yet?

JC: I don’t think we are. We were listening to samples from various versions of the client going back to 2.6, now we are at 4.0. In the same situation – pushing the same files in the same acoustic settings through the different client versions – in every release there’s a noticeable (even to the naked ear) difference in quality between the releases.

We are not completely done with it. There are many different areas where we can continue to optimize and tweak it, but we believe it’s at or above the current state of the industry in terms of performance.


Skype 4.0 for Windows has the new codec.
The current Mac beta doesn’t yet support the new codec.

Update: February 3rd, 2009: Here is a write-up of SILK from the Skype Journal.

Update: March 7th, 2009: Skype has announced an SDK for third parties to implement SILK in their products, royalty free.

Wideband codecs and IPR

Wideband codecs are a good thing. They have been slow to enter the mainstream, but there are several reasons why this is about to change.

Voice codecs are benefiting from the usual good effects of Moore’s law. Each year higher-complexity (higher computation load) codecs become feasible on low-cost hardware, and each year it is cheaper to fit multiple codecs into a ROM (adding multiple codecs increases the chance that two endpoints will have one in common).

Voice codecs are often burdened by claims of intellectual property rights (IPR) by multiple players. This can make it difficult for software and equipment vendors to use codecs in their products without fear of litigation. The industry response has been to create “patent pools” where the patent owners agree to let a single party negotiate a blanket license on their behalf:

Prior to establishment of the Pool, the complexity of negotiating IPRs with each intellectual property owner discouraged potential integrators.

Unfortunately there is still no pool for the standard wideband codec ratified by the 3GPP for use in cell phones, AMR-WB (G.722.2). Even where there is a pool, getting a license from it doesn’t mean that a use of the codec doesn’t infringe some yet-to-be-revealed patent not in the pool, and it doesn’t indemnify the licensee from such a claim.

There are several royalty-free wideband codecs available. I mentioned a couple of them (from Microsoft and from Skype) in an Internet Telephony Column.

Microsoft and Skype have got around the royalty issue to some extent by creating proprietary codecs. They have researched their algorithms and have either concluded that they don’t infringe or have bought licenses for the patents they use.

G.722 (note that G.722, G.722.1 and G.722.2 are independent of each other, both technically and from the point of view of IPR) is so old that its patent restrictions have expired, making it an attractive choice of common baseline wideband codec for all devices. Unfortunately its antiquity also means that it is relatively inefficient in its use of bandwidth.

Polycom did a major good thing for the industry when it made G.722.1 (Siren7) available on a royalty-free basis. G.722.1 is considerably better than G.722, though it is not as efficient as G.722.2.

The open-source Speex codec is efficient and royalty free, but being open source it bears a little more fear of infringement than the other codecs mentioned here. There are three reasons why this fear may be misplaced. First, the coders claim to have based it on old (1980’s) technology. Second, it has now been available for some years, and has been shipped by large companies and no claims of infringement have surfaced. Third, while it is possible in these times of outrageous patent trolling that somebody will pop up with some claim against Speex, a similar risk exists for all the other codecs, including the ones with patent pools.

So we now have three royalty-free wideband codecs (G.722, G.722.1 and Speex); we have hardware capable of running them cheaply; we have broad deployment of VoIP and growing implementation of VoIP trunking. We have increasing data bandwidth to homes and businesses, to the point where the bandwidth demands of voice are trivial compared to other uses like streaming video and music downloads. Plus there’s a wild card. By 2010 over 300 million people will have mobile smartphones capable of running software that will give them wideband phone conversations over a Wi-Fi connection.

Perhaps the time for wideband telephony is at hand.

Wideband audio conferencing bridge

Skype lets you do audio conferencing with wideband codecs, and a service called Vapps High Definition Conferencing does the same thing for non-Skype VoIP calls.

Now other VoIP providers can offer wideband conferencing too. A company called Wyde Voice sells an all-IP conferencing platform that natively uses wideband codecs. The Wyde platform uses the iSAC codec from GIPS, so anybody calling in from a soft phone like the Gizmo5 client, or the Google, AOL or Yahoo VoIP clients can enjoy a conference in wideband. If one of the participants in the call is using a narrow-band codec, the Wyde device up-samples the signal to wideband quality for mixing.
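
As a rough illustration of what up-sampling for mixing involves, the sketch below doubles the sample rate of a narrowband stream by linear interpolation and then sums it with a wideband stream. How the Wyde platform actually does this is not public here; a production bridge would use a proper interpolation filter, and the function names are hypothetical.

```python
def upsample_2x(samples):
    """Double the sample rate (e.g. 8 kHz to 16 kHz) by linear interpolation.

    Each output pair is the original sample followed by the midpoint between
    it and the next sample. The last sample is simply repeated.
    """
    out = []
    for i, current in enumerate(samples):
        following = samples[i + 1] if i + 1 < len(samples) else current
        out.append(current)
        out.append((current + following) // 2)
    return out


def mix(streams):
    """Sum equal-rate 16-bit streams sample by sample, clipping to the 16-bit range."""
    length = min(len(stream) for stream in streams)
    mixed = []
    for i in range(length):
        total = sum(stream[i] for stream in streams)
        mixed.append(max(-32768, min(32767, total)))
    return mixed


narrowband = [0, 100, 200]              # three samples at 8 kHz
wideband = [10, 10, 10, 10, 10, 10]     # six samples at 16 kHz
print(mix([upsample_2x(narrowband), wideband]))  # [10, 60, 110, 160, 210, 210]
```

Note that up-sampling only matches the sample rates so the streams can be added together. It cannot restore the high frequencies that the narrowband codec discarded, so that participant still sounds narrowband to everyone else.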

I have always been an enthusiastic proponent of wideband audio – it is one of the major potential advantages of VoIP over circuit switched telephony. Circuit switched calls are encoded with G.711, which yields 12 bits of effective dynamic range and a maximum frequency of about 3.5 kHz. Human speech has harmonics even above 10 kHz, which is why it is hard to tell the difference between an “F” and an “S” over the phone. The G.711 codec places an absolute limit on the sound quality of a regular phone call. A VoIP phone call can use a wideband codec, with whatever dynamic range and frequency range you want. There are several of them, commonly with a sample size of 16 bits and a sampling rate of 16 kHz, which captures a maximum audio frequency of 8 kHz. When you have a good enough connection Skype uses a wideband codec by default, which is why it can sound better than “toll quality” (if you aren’t limited by your loudspeaker and microphone).
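
The relationship between sampling rate and audio bandwidth in the paragraph above is just the Nyquist limit: a sampled signal can carry frequencies up to half its sampling rate. A short sketch of the arithmetic follows; the function names are my own. The G.711 figures of 8 kHz sampling and 8 bits per sample are the standard ones, and the wideband bit rate shown is for uncompressed samples before any codec is applied.

```python
def max_audio_frequency_hz(sample_rate_hz):
    """Nyquist limit: the highest frequency a given sampling rate can represent."""
    return sample_rate_hz / 2


def raw_bit_rate_bps(sample_rate_hz, bits_per_sample):
    """Bit rate of a single uncompressed audio channel."""
    return sample_rate_hz * bits_per_sample


# G.711: 8 kHz sampling, 8 bits per (companded) sample.
print(max_audio_frequency_hz(8000))   # 4000.0
print(raw_bit_rate_bps(8000, 8))      # 64000

# Typical wideband capture: 16 kHz sampling, 16 bits per sample.
print(max_audio_frequency_hz(16000))  # 8000.0
print(raw_bit_rate_bps(16000, 16))    # 256000
```

The theoretical limit for 8 kHz sampling is 4 kHz; the usable telephone band is somewhat lower, around the 3.5 kHz mentioned above, because of the filtering applied before sampling. The 256 kbit/s raw figure is what a wideband codec then compresses down to something practical for a network connection.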

Unfortunately, for the non-Skype world there’s a chicken and egg problem – almost no phones support wideband codecs, so the carriers aren’t motivated to support them either. Worse, any VoIP call that traverses the PSTN at any point is converted to G.711, losing the wideband frequencies. Worse yet, to cut costs most carrier implementations of VoIP use a bandwidth-saving codec that intrinsically delivers inferior sound quality to G.711; for example, last I heard Vonage was using G.729A.

As VoIP matures, and more and more calls are IP end-to-end through VoIP peering and ENUM arrangements (what Gizmo5 calls “back-door dialing”) wideband codecs will become more pervasive and our conversations will become clearer. The Wyde announcement is a step towards that world.

CSR pitches better sound quality, battery life in Bluetooth headsets

CSR announced their Bluecore 6 chip today. It will ship in production volumes in January 2008. CSR claims a more robust connection – with increased transmit power and receive sensitivity. CSR also claims a breakthrough in sound quality, achieved by going from a Continuous Variable Slope Delta (CVSD) codec to Adaptive Differential Pulse Code Modulation (ADPCM). This enables packet retransmission and a halving of transmission bandwidth. The reduced bandwidth requirement results in a reduction in power consumption, and the ADPCM codec yields a MOS of 4.14 compared with a maximum of 2.41 for CVSD.

This is a welcome change, but doesn’t really go far enough. What’s needed is a wideband codec like AMR-WB to yield better-than-toll quality sound. While this would be redundant in a regular cell phone – ADPCM is more than adequate to carry a signal that has been encoded in GSM – it would make a huge difference in dual-mode phones carrying Voice over Wi-Fi.

T-Mobile launches FMC nationally in USA

***Update: I went to the T-Mobile store this morning and signed up. The service here in Dallas is $10 per month, not $20 as reported by Reuters. The store manager also told me that people with poor cellular reception at home can use the UMA service at no additional monthly charge, but that this usage is treated the same way as cellular usage – in other words, it counts against your cellular minutes.***

***Update 2: Here are some details on the T-Mobile launch campaign.***

Reuters reported this morning that T-Mobile is rolling out FMC service nationally.

Subscribers would pay an extra fee of up to $19.99 per line or $29.99 for five lines on top of regular monthly cellular bills for unlimited calls in a subscriber’s home or the nearly 8,500 places T-Mobile runs Wi-Fi, like Starbucks coffee shops.

This pricing model seems ambitious, compared to what it is competing with. T-Mobile’s MyFaves 300 plan gives you unlimited minutes nights and weekends and unlimited minutes to a list of five people that you choose. So the 300 minutes are consumed during the day, calling people whom you call infrequently. For $20 more you can bump this to 1,000 minutes. Alternatively, you can spend that $20 on the FMC service. It seems like the FMC service would only be a better deal for people who are home all day (or at Starbucks), who want to talk a lot to people beyond their five most frequently called. MyFaves 1000 would be a better deal for people who want to talk to a large variety of people during the day when they are not at home, for example in the car or at work, out of range of a Starbucks.
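
The comparison can be put as simple arithmetic. The sketch below assumes the figures quoted above (a 300-minute base bucket, a 1,000-minute bucket for $20 more, or unlimited Wi-Fi calling for the same $20) and counts only daytime minutes to people outside the five MyFaves numbers. The function name is hypothetical, and overage charges are left out because the rates are not given here.

```python
def uncovered_minutes(wifi_minutes, away_minutes):
    """Return the monthly minutes that each $20 upgrade would leave uncovered.

    wifi_minutes: daytime non-MyFaves minutes made at home or at a hotspot.
    away_minutes: daytime non-MyFaves minutes made anywhere else.
    """
    base_bucket = 300
    bigger_bucket = 1000
    # MyFaves 1000: every minute counts against the 1,000-minute bucket.
    myfaves_1000 = max(0, wifi_minutes + away_minutes - bigger_bucket)
    # MyFaves 300 plus FMC: Wi-Fi minutes are unlimited, the rest use the 300.
    myfaves_300_fmc = max(0, away_minutes - base_bucket)
    return {"MyFaves 1000": myfaves_1000, "MyFaves 300 + FMC": myfaves_300_fmc}


# Someone at home all day who talks a lot: FMC wins.
print(uncovered_minutes(wifi_minutes=1500, away_minutes=100))
# {'MyFaves 1000': 600, 'MyFaves 300 + FMC': 0}

# Someone who mostly calls from the car or the office: MyFaves 1000 wins.
print(uncovered_minutes(wifi_minutes=100, away_minutes=700))
# {'MyFaves 1000': 0, 'MyFaves 300 + FMC': 400}
```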

So who are these people that this “HotSpot@Home” service is aimed at? Surely there can’t be many. Why doesn’t T-Mobile use this technology to gain more customers, by giving it away free to subscribers? This would appeal to all the people who have poor reception at home, who would feel bilked by having to pay extra just for acceptable quality of service there (Hey! They do! See the update above). Another way to increase customer appeal would be to go with a wideband codec for Wi-Fi calls, guaranteeing CD-quality sound to Wi-Fi on-network calls. Or why not do both? This would provide a viral motivation to complement MyFaves, it would be unique among US carriers, it would improve retention, and it would bring new subscribers to start exploiting all that spectrum that T-Mobile picked up in the AWS auction in September 2006.

Dual-mode phones are the key to better-sounding calls

Potentially VoIP calls can sound radically better than what we are used to even on landline phones. So why don’t they? It may be lack of will. Some say the success of the mobile phone industry proves that people don’t care about sound quality on their calls. I don’t think this is a valid inference. All it proves is that people value mobility higher than sound quality.

The telephonic journey from mouth to ear, often thousands of miles in tens of milliseconds, traverses a chain of many weak links, each compounding the impairment of the sound. First, the phone. Whether it’s a headset, a desk phone or a PC, the microphone and speakers have to be capable of transmitting the full frequency spectrum of the human voice without loss, distortion or echo. Second, the digital encoding of the call; it has to be done with a wideband codec. Third, the codec has to be end-to-end, so no hops through the circuit switched phone network. Finally, the network must convey the media packets swiftly and reliably, since delayed packets are effectively lost, and lost packets reduce sound quality.

Discussions of VoIP QoS normally dwell mainly on the last of these factors, but the others are at least as important. The exciting thing about dual-mode cell phones is that they provide a means to cut through them. Because they must handle polyphonic ring tones and iPod-type capabilities, the speakers on most cell phones can easily carry the full frequency range of the human voice. Cell phone microphones can also pick up the required range, and DSP techniques can mitigate the physical acoustic design challenges of the cell phone form factor. Smart phone processors have the oomph to run modern wideband codecs. This leaves the issue of staying on the IP network from end-to-end. The great thing about dual-mode phones is that they can connect directly to the Internet in the two places where most people spend most of their time: at work and at home.

So if you and the person you are talking to are both in a Wi-Fi enabled location, and you both have a dual mode cell phone, your calls should not only be free, but the sound should be way better than toll quality.

Check out the V2oIP website for an industry initiative on this topic.