Using the Google Chrome Browser

I have some deep seated opinions about user interfaces and usability. It normally only takes me a few seconds to get irritated by a new application or device, since they almost always contravene one or more of my fundamental precepts of usability. So when I see a product that gets it righter than I could have done myself, I have to say it warms my heart.

I just noticed a few minutes ago, using Chrome, that the tabs behave in a better way than on any other browser that I have checked (Safari, Firefox, IE8). If you have a lot of tabs open, and you click on an X to close one of them, the tabs rearrange themselves so that the X of the next tab is right under the mouse, ready to get clicked to close that one too. Then after closing all the tabs that you are no longer interested in, when you click on a remaining one, the tabs rearrange themselves to a right size. This is a very subtle user interface feature. Chrome has another that is a monster, not subtle at all, and so nice that only stubborn sour grapes (or maybe patents) stop the others from emulating it. That is the single input field for URLs and searches. I’m going to talk about how that fits with my ideas about user interface design in just a moment, but first let’s go back to the tab sizing on closing with the mouse.

I like this feature because it took a programmer some effort to get it right, yet it only saves a user a fraction of a second each time it is used, and only some users close tabs with the mouse (I normally use Cmd-W), and only some users open large numbers of tabs simultaneously. So why did the programmer take the trouble? There are at least two good reasons: first, let’s suppose that 100 million people use the Chrome browser, and that they each use the mouse to close 12 tabs a day, and that in 3 of these closings, this feature saved the user from moving the mouse, and the time saved for each of these three mouse movements was a third of a second. The aggregate time saved per day across 100 million users is 100 million seconds. At 2,000 working hours per year, that’s more than 10 work-years saved per day. The altruistic programmer sacrificed an hour or a day or whatever of his valuable time, to give the world far more. But does anybody apart from me notice? As I have remarked before, at some level the answer is yes.

The second reason it was a good idea for the programmer to take this trouble is to do with the nature of usability and choice of products. There is plenty of competition in the browser market, and it is trivial for a user to switch browsers. Usability of a program is an accretion of lots of little ingredients. So in the solution space addressed by a particular application, the potential gradation of usability is very fine-grained, each tiny design decision moving the needle a tiny increment in the direction of greater or lesser usability. But although ease of use of an application is an infinitely variable property, whether a product is actually used or not is effectively a binary property. It is a very unusual consumer (guilty!) who continues to use multiple browsers on a daily basis. Even if you start out that way you will eventually fall into the habit of using just one. For each user of a product, there is a threshold on that infinite gradation of usability, that balances against the benefit of using the product. If the product falls below that effort/benefit threshold it gradually falls into disuse. Above that threshold the user forms the habit of using it regularly. Many years ago I bought a Palm Pilot. For me, that user interface was right on my threshold. It teetered there for several weeks as I tried to get into the habit of depending on it, but after I missed a couple of important appointments because I had neglected to put them into the device, I went back to my trusty pocket Day-Timer. For other people, the Palm Pilot was above their threshold of usability, and they loved it, used it and depended on it. Not all products are so close to the threshold of usability. Some fall way below it. You have never heard of them – or maybe you have: how about the Apple Newton? And some land way above it; before the iPhone nobody browsed the Internet on their phones – the experience was too painful. In one leap the iPhone landed so far above that threshold that it routed the entire industry.
The razor thin line between use and disuse
The point here is that the ‘actual use’ threshold is a a razor-thin line on the smooth scale of usability, so if a product lies close to that line, the tiniest, most subtle change to usability can move it from one side of the line to the other. And in a competitive market where the cost of switching is low, that line isn’t static; the competition is continuously moving the threshold up. This is consistent with “natural selection by variation and survival of the fittest.” So product managers who believe their usability is “good enough,” and that they need to focus on new features to beat the competition are often misplacing their efforts – they may be moving their product further to the right on the diagram above than they are moving it up.

Now let’s go on to Chrome’s single field for URLs and searches. Computer applications address complicated problem spaces. In the diagram below, each circle represents the aggregate complexity of an activity performed with the help of a computer. The horizontal red line represents the division between the complexity handled by the user, and that handled by the computer. In the left circle most of the complexity is dealt with by the user, in the right circle most is dealt with by the computer. For a given problem space, an application will fall somewhere on this line. For searching databases HAL 9000 has the circle almost entirely above this line, SQL is way further down. The classic example of this is the graphical user interface. It is vastly more programming work to create a GUI system like Windows than a command-line system like MS-DOS, and a GUI is correspondingly vastly easier on the user.

Its single field for typing queries and URLs clearly makes Chrome sit higher on this line than the browsers that use two fields. With Chrome the user has less work to do: he just gives the browser an instruction. With the others the user has to both give the instruction and tell the computer what kind of instruction it is. On the other hand, the programmer has to do more work, because he has to write code to determine whether the user is typing a URL or a search. But this is always going to be the case when you make a task of a given complexity easier on the user. In order to relieve the user, the computer has to handle more complexity. That means more work for the programmer. Hard-to-use applications are the result of lazy programmers.

The programming required to implement the single field for URLs and searches is actually trivial. All browsers have code to try to form a URL out of what’s typed into the address field; the programmer just has to assume it’s a search when that code can’t generate a URL. So now, having checked my four browsers, I have to partially eat my words. Both Firefox and IE8, even though they have separate fields for web addresses and searches, do exactly what I just said: address field input that can’t be made into a URL is treated as a search query. Safari, on the other hand, falls into the lazy programmer hall of shame.

This may be a result of a common “ease of use” fallacy: that what is easier for the programmer to conceive is easier for the user to use. The programmer has to imagine the entire solution space, while the user only has to deal with what he comes across. I can imagine a Safari programmer saying “We have to meet user expectations consistently – it will be confusing if the address field behaves in an unexpected way by doing a search when the user was simply mistyping a URL.” The fallacy of this argument is that while the premise is true (“it is confusing to behave in an unexpected way,”) you can safely assume that an error message is always unexpected, so rather than deliver one of those, the kind programmer will look at what is provoking the error message, and try to guess what the user might have been trying to achieve, and deliver that instead.

There are two classes of user mistake here: one is typing into the “wrong” field, the other is mistyping a URL. On all these browsers, if you mistype a URL you get an unwanted result. On Safari it’s an error page, on the others it’s an error page or a search, depending on what you typed. So Safari isn’t better, it just responds differently to your mistake. But if you make the other kind of “mistake,” typing a search into the “wrong” field, Safari gives an error, while the others give you what you actually wanted. So in this respect, they are twice as good, because the computer has gracefully relieved the user of some work by figuring out what they really wanted. But Chrome goes one step further, making it impossible to type into the “wrong” field, because there is only one field. That’s a better design in my opinion, though I’m open to changing my mind: the designers at Firefox and Microsoft may argue that they are giving the best of both worlds, since users accustomed to separate fields for search and addresses might be confused if they can’t find a separate search field.

ITExpo West — Building Better HD Video Conferencing & Collaboration Systems

I will be moderating a session at ITExpo West on Tuesday 5th October at 9:30 am: “Building Better HD Video Conferencing & Collaboration Systems,” will be held in room 306A.

Here’s the session description:

Visual communications are becoming more and more commonplace. As networks improve to support video more effectively, the moment is right for broad market adoption of video conferencing and collaboration systems.

Delivering high quality video streams requires expertise in both networks and audio/video codec technology. Often, however, audio quality gets ignored, despite it being more important to efficient communication than the video component. Intelligibility is the key metric here, where wideband audio and voice quality enhancement algorithms can greatly improve the quality of experience.

This session will cover both audio and video aspects of today’s conferencing systems, and the various criteria that are used to evaluate them, including round-trip delay, lip-sync, smooth motion, bit-rate required, visual artifacts and network traversal – and of course pure audio quality. The emphasis will be on sharing best practices for building and deploying high-definition conferencing systems.

The panelists are:

  • James Awad, Marketing Product Manager, Octasic
  • Amir Zmora, VP Products and Marketing, RADVISION
  • Andy Singleton, Product Manager, MASERGY

These panelists cover the complete technology stack from chips (Octasic), to equipment (Radvison) to network services (Masergy), so please bring your questions about any technical aspect of video conferencing systems.

Why are we waiting?

I just clicked on a calendar appointment enclosure in an email. Nothing happened, so I clicked again. Then suddenly two copies of the appointment appeared in my iCal calendar. Why on earth did the computer take so long to respond to my click? I waited an eternity – maybe even as long as a second.

The human brain has a clock speed of about 15 Hz. So anything that happens in less than 70 milliseconds seems instant. The other side of this coin is that when your computer takes longer than 150 ms to respond to you, it’s slowing you down.

I have difficulty fathoming how programmers are able to make modern computers run so slowly. The original IBM PC ran at well under 2 MIPS. The computer you are sitting at is around ten thousand times faster. It’s over 100 times faster than a Cray-1 supercomputer. This means that when your computer keeps you waiting for a quarter of a second, equally inept programming on the same task on an eighties-era IBM PC would have kept you waiting 40 minutes. I don’t know about you, but I encounter delays of over a quarter of a second with distressing frequency in my daily work.

I blame Microsoft. Around Intel the joke line about performance was “what Andy giveth, Bill taketh away.” This was actually a winning strategy for decades of Microsoft success: concentrate on features and speed of implementation and never waste time optimizing for performance because the hardware will catch up. It’s hard to argue with success, but I wonder if a software company obsessed with performance could be even more successful than Microsoft?

Interview with Jean-Marc Valin of Speex

I have written before about the appeal of wideband codecs, and the damping effect that royalty issues have on them. Speex is an interesting alternative wideband codec: open source and royalty free. Having discussed the new Skype codec with Jonathan Christensen earlier this year I thought it would be interesting to hear from the creator of Speex, Jean-Marc Valin.
MS: What got you into codecs?
JMV: I did a master’s in Sherbrooke lab – the same lab that did G.729 and all that. I did speech enhancement rather than codecs, but learned a bit about codecs there and after I did my master’s I thought it would be nice to have a codec that everybody could use, especially on Linux. All those proprietary codecs were completely unavailable on Linux because of patent issues, so I thought it would be something nice to do. I met a guy named David Rowe who thought the same thing and knew more about codecs than I did, so we started Speex together. In the end he didn’t have much time to write code but I did and he had great advice and feedback.
MS: How much of Speex did you write, and how much was contributed by others?
JMV: I wrote about 90%, but most of the contributions were not in code but in terms of bug reports, feedback, suggestions, or in the early beginning David Rowe didn’t write much code but he gave me really good advice. So a lot was contributed, but not a lot of the contributions were code. The port to windows was contributed.
MS: Were there any radical innovations in algorithms that a contributor came up with?
JMV: No, I don’t think there’s an issue of that. And even what I wrote it was mostly just a matter of putting together building blocks that were generally known, and just putting together so that a decent codec resulted. There’s nothing in Speex that somebody would look at and say “Wow, this is completely unheard of.” There are a few features that aren’t in other codecs, but they’re not fundamental breakthroughs or anything like that.
MS: How is Speex IPR-free? Do you just study the patents and figure out work-arounds or do you just assume that if you write code from scratch it’s not infringing, or do you look at patents for speech technologies that have already expired…
JMV: It’s actually a mixture of all that. Basically the first thing with Speex is that I wasn’t trying to innovate, especially in the technological sense. A lot of Speex is built on really old technology that either wasn’t patented or if it was the patents had expired. A lot of 80’s technologies.
CELP is 80’s technology. CELP was not patented. There are developments of it like ACELP which was patented – actually by my former university, so although its actually a pretty nice technique I couldn’t use it so I just avoided it and used something else, which turned out to be not that much worse, and in the end it didn’t really matter – it was just a bit of an inconvenience.
MS: Are the users like Adobe calling you to verify that Speex is IPR free?
JMV: I had a few short contacts with them. I didn’t speak with any lawyers, so I assume somebody had a look at it and decided that it was safe enough to use. It’s a fundamental problem with patents, in any case, regardless of whether you’re open source royalty free or proprietary, patented or anything like that. Anyone can claim they have a patent on whatever you do. At some point it’s a calculated risk, and Speex is not more risky than any other technology. Even if you license the patents you never know who else might claim they have a patent on the thing.
MS: Has anybody tried that with Speex?
JMV: No.
MS: How long has Speex been in use?
JMV: I started Speex in Feb. 02 and v1.1 was released in March 03 at which point the bit stream was frozen. All codecs have to freeze the bit stream at some point. All the G.72x ITU codecs have a development phase, then they agree on the codec and they say “this is the bit stream and it’s frozen in stone,” because you don’t want people changing the definition of the codec because nobody would be able to talk to each other.
MS: But you can change the implementations of the algorithms that generate the bit stream?
JMV: Most of the ITU codecs have a so-called “bit-exact” definition, which means a certain bit pattern as audio input has to produce exactly this pattern as the compressed version. This leaves a lot less room for optimization.
MS: Does Speex have a bit exact definition?
JMV: No. The decoder is defined, so the bit stream itself is defined, but there is no bit-exact definition, and there can’t really be because there is a floating point version and you can’t do bit exact with floating point anyway.
In that sense it’s more similar to the MPEG codecs that are also not bit-exact.
After the bit stream was frozen I spent quite a lot of time doing a fixed point port of Speex so it could run on ARM and other processors that don’t have floating point support. I also spent some time doing quality optimizations that didn’t involve changing the bit stream. There are still a lot of things you can do in terms of improving the encoder to produce a better combination of bits.
MS: So the decoder doesn’t change, but the encoder can be improved and that will give you a better end result?
JMV: Exactly. That’s what happened for example with MP3, where the first encoders were really, really bad. And over time they improved, and the current encoders are much better than the older ones.
MS: Have you optimized Speex for particular instruction sets?
JMV: There are a few optimizations that have been done in assembly code for ARM. Mostly for the ARM4 architecture there’s a tiny bit of work that I did several years ago to use the DSP instructions where available.
MS: How much attention have you paid to tweaking Speex for a particular platform, like for example a particular notebook computer?
JMV: Oh, no, no. First, all of that is completely independent of the actual codec. In the case of Speex I have in the same package I have the Speex codec and a lot of helper functions, echo cancellation, noise suppression and things like that. Those are completely independent of the codec. You could apply them to another codec or you could use Speex with another echo canceller. It’s completely interchangeable, and there are not really any laptop specific things. The only distinction between echo cancellers is between acoustic echo cancellers and line echo cancellers, which are usually completely different. The acoustic echo will be used mostly in VoIP when you have speakers and microphones instead of headsets. What really changes in terms of acoustic echo is not really from one laptop to another but from one room to another because you are canceling the echo from the whole room acoustics.
MS: Isn’t there a direct coupling between the speaker and the microphone?
JMV: Overall what you need to cancel is not just the path from the mic to the speakers. Even with the same laptop the model will change depending on all kinds of conditions. There’s the direct path which you need to cancel, but there’s also all the paths that go through all the walls in your room. Line echo cancellers only have a few tens of milliseconds, whereas acoustic echo cancellers need to cancel over more than 100 milliseconds and cancel all kinds of reflections and things like that.
Even if you are in front of your laptop and you just move slightly the path changes and you have to adjust for that.
MS: So who did the echo cancellers in the Speex package – was that you?
JMV: Yes.
MS: G.711 has an annex that includes PLC (Packet Loss Concealment), and others say PLC is a part of their codec.
JMV: The PLC is tied to the codec in the case of Speex and pretty much all relatively advanced codecs. G.711 is pretty old and all packets are completely independent, so you can do pretty much anything you like for concealment. For Speex or any other CELP codec you need to tie the PLC to the codec.
MS: As far as wideband is concerned, how wideband is Speex? What are the available sample rates?
JMV: Wideband was part of the original idea of Speex. I didn’t even think about writing a narrowband version of it. And in the end some people convinced me that narrowband was still useful so I did it. But it was always meant to be wideband. The way it turned out to be done was in an embedded way, which means that if you take a wideband Speex stream it is made up of narrowband stream and extra information for the higher frequencies. That makes it pretty easy to interoperate with narrowband systems. For instance if you have a wideband stream and you want to convert it to the PSTN you just remove the bits that correspond to the higher frequencies and you have something narrowband. This is for 16 kHz. For higher frequencies, Speex will support also a 32 kHz mode – I wouldn’t say that it’s that great, and that’s one of the reasons I wrote another codec which is called CELT (pronounced selt).
MS: What about the minimum packet size you have for Speex?
JMV: Packetization for Speex is in multiples of 20 ms. The total delay is slightly more than that – around 10 ms, so the total delay introduced by the codec is around 30 ms, which is similar to the other CELP based codecs.
MS: is Speex a variable bit rate codec?
JMV: It has both modes. In most VoIP applications people want to use constant bit rate because they know what their link can operate at. In some cases you can use VBR, that’s an option that Speex supports.
VBR will reduce the average bandwidth so if you have hundreds of conversations going through the same link, then at the same quality VBR will take of the order of 20% less bandwidth, or something like that. I don’t remember the exact figures.
A conversation can go above the average bit rate just as easily as it can go below.
MS: Can you put a ceiling on it, suppose you specify a variable bit rate not to exceed 40 kbps?
JMV: Yes, that’s a supported option. It would sound slightly worse than a constant bit rate of 40 kbps. There’s always the trade-off of bit rate and quality. I believe some people already do it in the Asterisk PBX, but I could be wrong on that one.
MS: How does Speex compare to other codecs on MIPS requirement?
JMV: I haven’t done precise comparisons, but I can say that in terms of computational complexity Speex narrowband is comparable to G.729 (not G.729A, which is less complex) and AMR-NB and Speex wideband is comparable to AMR-WB. The actual performance on a particular architecture may vary depending on how much optimization has been done. In most applications I’ve seen, the complexity of Speex is not a problem.
MS: So what about AMR-WB? Seems like it’s laden with IP encumbrances? What are the innovations in that that make it really good, and do you think it’s better than Speex or does Speex have alternative means of getting the same performance?
JMV: I never did a complete listening test comparing Speex to AMR-WB. To me Speex sounds more natural, but I’m the author, so possibly someone would disagree with me on that. In any case there wouldn’t b a huge difference of one being much better than the other. The techniques are pretty different, AMR-WB uses ACELP. Both are fundamentally CELP but they do it very differently.
MS: The textbooks say CELP models the human voice tract. What does that mean?
JMV: It’s not really modeling; it’s making many assumptions that are sort of true if the signal is actually voice. Basically the LP part of CELP is Linear Prediction, and that is a model that would perfectly fit the vocal tract if we didn’t have a nose. The rest has to do with modeling the pitch, which is very efficient, assuming the signal has a pitch, which is not true of music, for instance. The Code Excited part is mostly about vector quantization, which is an efficient way of coding signals in general.
The whole thing all put together makes it pretty efficient for voice.
MS: What is the biggest design win that you know of for Speex?
JMV: There are a couple of high profile companies that use Speex. Recently the one that people talked about was Flash version 10. Google Talk is using it as well.
MS: Do you track at all how many people are using it in terms of which applications are using it?
JMV: In some cases I hear about this company using Speex, or that company tells me they are using it or they ask me a few questions so they can use it. So I have a vague idea of a few companies using it, but I don’t really track them or even have a way to track them because a part of the idea of being open source is that anyone can use it with very few restrictions, and with no restrictions from me on having to get a license or anything like that.
MS: How many endpoints are running Speex now?
JMV: It’s pretty much impossible to say. There are a large number of video games that use Speex. It’s very popular in that market because it’s free.
MS: Would gamers want to use CELT instead? That’s a very delay-sensitive environment.
JMV: I think it depends on the bandwidth. I was involved in one of the first games that used it was unreal tournament in 2004 and they were using a very low bit rate, so CELT wouldn’t have worked. Now the bandwidths are larger, so possibly someone will want to use CELT at some point.
MS: What is CELT?
JMV: It’s an acronym for Constrained Energy Lap Transform. It actually works pretty equally either on voice or music. The bandwidth is a bit more than Speex. Speex in wideband mode will usually take about 30 kbps at 16 kHz, whereas with CELT usually you want to use at least 40 kbps. At 40 kbps you have pretty decent voice quality, at full audio bandwidth, 44 or 48 kHz. This is the CD sample rate or slightly higher, 48 which is another widely used rate. For those sample rates which basically give you the entire audio spectrum in terms of bit rate usually you need at least 40 for voice. You can go a bit lower but not much. If you use 48 you get decent music quality and at 64 you get pretty good music quality.
MS: Is CELT a replacement for Speex?
JMV: No, there is definitely a place for both of them. There’s actually very little competition between them. usually people either want the lower rate of Speex; for instance if you want something that works at 20 kbps, you use Speex, and CELT is for higher bit rate, lower delays, and also supports music, so there’s nearly no overlap between the two.
MS: How does CELT compare to MP3?
JMV: I actually did some tests with an older version, and in terms of quality alone it was already better than MP3 which was originally quite surprising to me, because my original goal was not to beat MP3 but to make the delay much lower, because you can’t use MP3 for any kind of real-time communication, because the delay will be way more than 100 ms. CELT is designed for delays that are lower than 10 ms.
MS: Wow! So how many milliseconds are in a packet?
JMV: It is configurable. You can use CELT with packets as small as around 2 ms. or you can use packets that are up to 10. The default I recommend is around 5 ms.
MS: So the IP overhead must be astronomical! 2 ms at 64 kbps is 16 bytes per packet!
JMV: In normal operation you wouldn’t use the 2 ms mode, but I wanted to enable real-time music collaboration over the network. So you can have two people on the same continent that play music together in real time over the net. This is something you can’t do with any other codec that exists today.
MS: So the Internet delay is going to be at least 70ms.
JMV: It depends. Overall you need to have less than 25 ms one – way delay for the communication to work. That’s the total delay. So if you look at existing codecs, even so-called low-delay AAC already has at least 20 ms of packetization delay. So if you add the network and anything else you can’t make it. Codecs such as AMR-WB will have 25 or 30 ms for packetization, so you have already busted right there. It won’t work for music. So this is one reason why I wrote CELT.
MS: Have you played music over the Internet with CELT yet?
JMV: I haven’t tried it yet but some other people have tried it and reported pretty good results.
The other goal, even if you are not trying to play music over the network, it has to do with AEC which although some software does it relatively well it’s still not a completely solved problem. If you are able to have the delay low enough you almost don’t have to care about echo cancellation at all because the feedback is so quick that you don’t even notice the echo. Just like when you speak on the phone you hear your own voice in the head set and it doesn’t bother you because it’s instantaneous.
MS: Has anybody done any research on how long the delay can get before it begins to become disorienting?
JMV: There’s is some research, usually for a particular application and it will depend on the amount of echo and all of that but usually if you manage to get it below around 50 ms for the round trip usually it won’t be a problem, and the lower the delay the less annoying the echo is, even if you don’t cancel it.
MS: What phone are you talking on now?
JMV: My company phone. At home I’m set up with my cable provider.
MS: So you don’t use Speex yourself?
JMV: I used it when I was in Australia and I wanted to talk with my family back here. But I’m not using it in any regular fashion now.
MS: So what software did you use with the webcam?
JMV: At the time I was using Ekiga and OpenWengo. Both are Linux clients, because I don’t run Windows on my machines. Open Wengo is one of the few on Linux that can talk to a Windows machine.
MS: Have you ever used Skype?
JMV: Once or twice, but not regularly.
MS: What kind of cell phone do you have?
JMV: A really basic cell phone I think I have sent maybe one or two SMS messages in my life and that’s the most complicated I have ever done with that phone, which I use mainly in case of emergencies. I am not a telecommunication power user or anything like that.
MS: I am thinking that cell phones are how wideband codecs are going to take off. I’m not talking about AMR-WB. There are going to be hundreds of millions of smart phones sold, over the next few years that have Wi-Fi in them. And because you will be able to do VoIP from a platform that you can load applications onto, it seems like a Wi-Fi voice client for smart phones is going to be a way that wideband audio can really infiltrate and take off. I’m thinking that that might be a way for you to start using Speex in your daily life.
JMV: Well, I sure hope that Speex will take off a lot more in that area. Originally it wasn’t planned to go that far. Originally the only market I had in mind was the Linux or open source market with PC based soft phones. That’s the only thing I cared about. It was designed mainly for IP networks as opposed to wireless, and I just wanted to see far it would go. It turned out to be a lot further than I expected.
Porting to Windows was done pretty early in the game. That was a contribution – I have never actually compiled it for Windows. And eventually people started porting it to all sorts of devices I have never heard of like embedded DSPs and lots of different architectures.
MS: I must say, thank you very much. I feel that wideband audio is a great benefit to the telephone world, and will undoubtedly become very common over time, and one of the biggest impediments to wideband audio is the intellectual property issue, so having an open source, IPR-free implementation of a wideband codec that seems to be a good one is just a great thing for the world, and a wonderful thing you have done for the world.
JMV: I think wideband will be pretty important, especially for voice over IP because it’s basically the only way that VoIP can ever say that it’s better than the PSTN. As long as people stay with narrowband the best VoIP can be is “almost as good as PSTN.” And yes, IPR is a pretty serious issue there.

Skype’s new super-wideband codec

I spoke with Jonathan Christensen of Skype yesterday, about the new codec in the latest Windows beta of Skype:

MS: Skype announced a new voice codec at CES. What’s different about it from the old one?

JC: The new codec is code-named SILK. Compared to its predecessor, SVOPC, the new codec gives the same or better audio response at half the bit-rate for wideband, and we also introduced a super wideband mode. SVOPC is a 16kHz sample rate, 8kHz audio bandwidth. The new codec has that mode as well, but it also has a 24 kHz sample rate, 12 kHz audio bandwidth mode. Most USB headsets have enough capture and render fidelity that you can experience the 12 kHz super wideband audio.

MS: Is the new codec an evolution of SVOPC?

JC: The new codec was a separate development branch from SVOPC. It has been under development for over 3 years, during which we focused both on the codec and the echo canceller and all the surrounding bits, and eventually got all that put together.

MS: What about the computational complexity?

JC: The new codec design point was different from SVOPC. SVOPC was designed for use on the desktop with a math coprocessor. It is actually pretty efficient. It’s just that it has a number of floats in it so it becomes extremely inefficient when it’s not on a PC.
The new codec’s design goal was to be ultra lightweight and embeddable. The vast majority of the addressable device market is better suited to fixed point, so it’s written in fixed point ANSI C – it’s as lightweight as a codec can be in terms of CPU utilization. Our design point was to be able to put it into mobile devices where battery life and CPU power are constrained, and it took almost 3 years to put it together. It’s a fundamental, ground up development; lots of very interesting science going into it, and a really talented developer leading the project. And now it’s ready. It’s a pretty significant jump forward.

MS: Is the new codec based on predictive voice coding?

JC: SVOPC has two modes, an audio mode and a speech mode, and the speech mode is much more structured towards speech. The new codec strikes little bit more of a balance between a general audio coder and a speech coder. So it does a pretty good job with stuff like background noise and music. But to get that kind of bit-rate reduction there are things about speech that you can capitalize on and get huge efficiency; we didn’t toss all that out. We are definitely using some of the model approach.

MS: Normally one expects with an evolution for the increments to get smaller over time. With the new codec you are getting a 50% improvement in bandwidth utilization, so you can’t be at the incremental stage yet?

JC: I don’t think we are. We were listening to samples from various versions of the client going back to 2.6, now we are at 4.0. In the same situation – pushing the same files in the same acoustic settings through the different client versions – in every release there’s a noticeable (even to the naked ear) difference in quality between the releases.

We are not completely done with it. There are many different areas where we can continue to optimize and tweak it, but we believe it’s at or above the current state of the industry in terms of performance.

————–

Skype 4.0 for Windows has the new codec.
The current Mac beta doesn’t yet support the new codec.

Update: February 3rd,2009: Here is a write-up of SILK from the Skype Journal.

Update: March 7th, 2009: Skype has announced an SDK for third parties to implement SILK in their products, royalty free.

iPhone 3G, SDK, enterprise orientation

UBS thinks that the 3G iPhone will be released mid-year. iLounge reports that the much-anticipated iPhone SDK will be delivered in June, at Apple’s Worldwide Developer Conference. A beta version will be released at an announcement event on March 6th.

There are several reports that Apple intends to target business users with the iPhone, competing with Blackberries, Nokia’s Eseries and Windows Mobile devices. Since the SDK reportedly will expose interfaces to the phone and Wi-Fi, developers of Wi-Fi soft-phones and enterprise Fixed-Mobile Convergence systems will presumably add iPhone support to their existing Symbian and Windows-supporting products. It remains to be seen how easy it will be for developers to actually get their software “officially” onto the iPhone. Apple can choose their degree of open-ness from a variety of options discussed here.

For Apple to aim at the business market makes a lot of sense. With the successful transition to Intel processors Macs already run Windows natively, and iPhones are supposedly making inroads among executives. According to ChangeWave, summarized here, the iPhone has a 5% share of corporate smartphones already, with astronomical ratings for satisfaction.

To make enterprise IT departments happy, though, Apple will have to make the iPhone more manageable; either by building in OMA DM like Nokia with the Eseries, or by letting third parties develop enterprise manageability clients using the iPhone SDK.

Competitors aren’t sitting still for this. The October 2007 announcement of “Microsoft System Center Mobile Device Manager” was a step forward for Windows Mobile in the enterprise. Microsoft is also leaking stories about how when Windows Mobile 7 is released in 2009 it is going to be more of a pleasure to use than the iPhone. It is conceivable, I suppose, but Microsoft’s track record on usability is pretty consistent. The fundamental part that they invariably seem to get wrong is instant response to user input.

Google phone developer community

The genesis of the Google phone project is described in this Boston Globe article by Scott Kirsner.

The Open Handset Alliance will release its SDK on November 12th, 2007. The iPhone SDK will not be released until four months later. Microsoft and Symbian already have not only mature SDKs, but vigorous development communities: in mid-2007 Windows Mobile had 650 thousand registered developers, and Forum Nokia had 2 million registered individuals and 440 companies in its “Platinum Program.”

As usual with a new development environment, it’s a chicken and egg situation, but the chicken is coming out in pretty good shape; if the base platform debuts with comparable functionality to what the iPhone came out with, it’s a low-risk proposition for the phone OEMs, and Google’s magic coattails will ensure hysterical enthusiasm in the developer community.

Google phone alliance members

The Open Handset Alliance was announced today by Google and 30 or so other companies. Until now the highest-profile open source handset operating environment was OpenMoko.

The list of participants has no real surprises in it. Nokia isn’t on the list, most likely because this project competes head on with Symbian. This may also help to explain why Sony Ericsson isn’t a supporter yet, either. But the other three of the top five handset manufacturers are members: Motorola, Samsung and LG. All of these ship Symbian-based phones, but they also ship Windows based phones, so they are already pursuing an OS-agnostic strategy. Open standards are less helpful to a market leader than to its competitors.

Of course the other leading smartphone OS vendors are also missing from the list: Microsoft, Apple, Palm and RIM.

Ebay is there because this massively benefits Skype.

Silicon vendors retain more control of their destiny when there is a competitive software community, so it makes sense that TI is aboard even though it is the market leader in cellphone chips. Intel is another chip vendor that is a member. Intel can normally be relied on to support this type of open platform initiative, and although Intel sold its handset-related businesses in 2006, its low power CPU efforts may evolve from ultra-mobile PCs down to smartphones in a few years.

Among MNOs Verizon and AT&T Mobile are notorious for their walled-garden policies, so it makes sense that they aren’t on the list, though Sprint and T-Mobile are, which is an encouraging indication.

At the launch of the iPhone Steve Jobs said that the reason there would be no SDK for the iPhone was that AT&T didn’t want their network brought down by a rogue application. I ridiculed this excuse in an Internet Telephony column. Even so, the carriers do have a valid objection to completely open platforms: their subscribers will call them for support when the phone crashes. For this reason, applications that use sensitive APIs in Symbian must be “Symbian signed.” When he announced the iPhone SDK, Steve Jobs alluded to this as a model that Apple may follow.

So Sprint’s and T-Mobile’s participation in this initiative is very interesting. Sprint’s press release says:

Unlike other wireless carriers, Sprint allows data users to freely browse the Internet outside its portal and has done so since first offering access to the Internet on its phones in 2001.

Open Internet access is actually available from all the major US MNOs other than Verizon; AT&T ships the best handset for this, the iPhone. But the iPhone doesn’t (officially) let users load whatever software they want onto the phone. Symbian and Windows-based phones generally do, and again all the major MNOs ship handsets based on these operating systems. An open source handset goes a big step further, but who benefits depends on what parts of the source code are published, and what APIs are exposed by the proprietary parts of the system. As a rule of thumb, one would think that giving developers this greater degree of control over the system will increase their scope for innovation.

Apple to let outsiders create programs for iPhone???

Reuters carried a story yesterday from the Apple World-Wide Developer Conference in San Francisco. The headline is “Apple to let outsiders create programs for iPhone,” and the story says “Apple Inc. will allow independent developers to write applications for its upcoming iPhone by tapping into the device’s built-in Web browser.” The story was presumably based on Apple’s press release on the topic.

This sounds exciting, so why did Apple stock lose $4.30 on the day? Well, the market focused on the glass-half-empty. Apple didn’t open the iPhone up to third party developers in the way that most developers want. The comment that squelched the crowd was “there’s no SDK!” The official version, what Jobs, Forstall and the press release said beyond that, is too scant and ambiguous to draw a clear idea of how well developers will be able to exploit the iPhone as a platform. Here’s a link to the video of the Jobs keynote. The iPhone developer part starts at time index 1:14. Ryan Block of Engadget was there live-blogging the Jobs keynote. His transcription and commentary:

Jobs: “We have been trying to come up with a solution to expand the capabilities of the iPhone so developers can write great apps for it, but keep the iPhone secure. And we’ve come up with a very. Sweet. Solution. Let me tell you about it. An innovative new way to create applications for mobile devices… it’s all based on the fact that we have the full Safari engine in the iPhone. And so you can write amazing Web 2.0 and AJAX apps that look and behave exactly like apps on the iPhone, and these apps can integrate perfectly with iPhone services. They can make a call, check email, look up a location on Gmaps… don’t worry about distribution, just put ’em on an internet server. They’re easy to update, just update it on your server. They’re secure, and they run securely sandboxed on the iPhone. And guess what, there’s no SDK you need! You’ve got everything you need if you can write modern web apps…”
Block: “Weeeeeaaaak.”
Scott Forstall, VP of iPhone software: “Your applications can take advantage of the built-in native services.”
Block: “He’s in the iPhone — no new apps up on screen, the same 11 as before — sorry iPhone fans!”
Forstall: “We built a custom corporate address book app to use our internal LDAP… it actually took less than one person-month to do this. It’s under 600 lines of code to do the whole thing.”
Block: “Shows up the vCards as they look in the built-in contact app. Not too shabby!”

The Web 2.0/AJAX model is great for AT&T, because this model requires continuous interaction with the server, so you will be burning up your data minutes. Except, I hope, when you are at home or at work and can use the Wi-Fi connection.

This is, as Block says, weak compared to loading real OSX applications on the phone. How weak depends on what Steve Jobs means by “the full Safari engine.” Apple’s Safari FAQ page says “All versions of Safari support Netscape-style plug-ins.” This undoubtedly applies to the iPhone version of Safari, since Steve Jobs has been toying with the idea of including Flash. The published Safari plugin SDK isn’t any use to iPhone developers, since the CPU is an ARM. So if Apple doesn’t publish an iPhone SDK, even the Safari plugin support is moot to third parties, except those working closely with Apple, like Google. One obvious Google plugin that would reduce the sting of no SDK would be Google Gears, which lets you run server-based applications off-line. The usual example is Google’s complete suite of Office applications.

From the overall context it appears that there is a JavaScript API to control some elements of the iPhone subsystem. That could be cool, depending how capable the API is. As for documentation of the API, a check of the Apple Developer website doesn’t reveal anything of that nature yet. There was a session at WWDC called “Developing Web Sites for iPhone,” which may have had some related information.

Blog reaction has been hysterical (but when has it ever not been?) Jesus Diaz of Gizmodo says No iPhone SDK Means No Killer iPhone Apps. One interesting tidbit in his piece concerns the degree of integration with the the iPhone’s services. Here’s what he says about clicking on a phone number in the browser to place a call:

This is nothing new, however. We knew this from the very beginning because iPhone’s Safari was already doing it. It’s called auto-detection of phone numbers and addresses: you click on a phone or address in your web page and it gets passed by Safari to the operating system, which calls the number or shows the address in the Google Maps app.

I certainly hope this isn’t the extent of it. If so, this guy is right. Nothing special here at all.

We live in hope, though. Steve Jobs was accurate in saying that Web 2.0/AJAX programming is the hot new thing, and that highly capable applications (especially enterprise applications) are being built like this. Users don’t care how software is written, they just want it to perform a useful function in a responsive and considerate way. If the API is rich enough, popular opinion will follow the trail that Ryan Block blazed, from “Weeeeeaaaak” to “Not too shabby!”

Parvesh Sethi’s opinions

Parvesh Sethi is Cisco’s Vice President, Advanced Services. In his keynote at the Communications Developer Conference this week in Santa Clara, he described an interesting use case for future wireless devices:

Your phone automatically notifies the hotel when you arrive – no need to stand in line to check in. Your assigned room number appears on your phone screen. The phone acts as a wireless key for your room. In your room the hotel puts targeted ads onto your phone’s screen.

The bulk of his talk consisted of advice for developers. The two main themes were “leverage the power of the network” and “exploit the long tail.”

The power of the network bit is to be expected from Cisco. The long tail part was a theme at many of the other presentations in the conference. For those who haven’t read the book, the idea is that the enormous reach of the web at relatively minuscule cost allows products that in the past would have been too narrow in appeal now to be commercially viable, and when combined with enough other low-volume products, to be lucrative. For example, a book that sells two copies a month isn’t worth carrying in a retail bookshop. But an online bookstore with a hundred thousand such titles would glean annual revenues in the tens of millions of dollars.

Sethi explained that custom programming for a particular enterprise used to be prohibitively expensive. But now the Web is packed with useful components that you can invoke through simple APIs. Web development environments automatically take care of the hard stuff for you, stuff like security, transcoding, QoS, authentication. Application acceleration is available right in the network. The open application development environment makes it possible for people to add their own value. This unleashes the long tail effect for the component vendors.

The example Sethi gave for this type of application was a real-world one, from an individual Subway franchisee – not Subway Corporate. The application runs on a Cisco phone. When an employee arrives in the morning, he logs in on the phone. This means no need for a time clock. Suppose four employees are scheduled to work a shift, but only three clock in. Previously one would have to start calling to find a substitute, leaving only two to perform the work. Now the phone system starts outdialing automatically, calling down the list of substitutes until one responds with a touchtone. Meanwhile, back in the store, the phone reminds the employees about essential process steps, like putting the bread in the oven. If the employee doesn’t acknowledge that he has done it, the system calls the supervisor to snitch. Sethi claimed that this application yielded a 30% increase in lunchtime revenue in its first month of operation.

Openness of the development environment, the ability for users to modify Cisco’s system and incorporate it into applications built on a whole set of such open components was one of four “Pillars of UC development” that Sethi identified. The other three were security, simplicity and virtuality (access to the application via any device, any where).