Gesture recognition in smartphones

This piece from the Aberdeen Group shows accelerometers and gyroscopes becoming universal in smartphones by 2018.

Accelerometers were exotic in smartphones when the first iPhone came out – used mainly for sensing the orientation of the phone for displaying portrait or landscape mode. Then came the idea of using them for dead-reckoning-assist in location-sensing. iPhones have always had accelerometers; since all current smartphones are basically copies of the original iPhone, it is actually odd that some smartphones lack accelerometers.

Predictably, when supplied with a hardware feature, the app developer community came up with a ton of creative uses for the accelerometer: magic tricks, pedometers, air-mice, and even user authentication based on waving the phone around.

Not all sensor technologies are so fertile. For example the proximity sensor is still pretty much only used to dim the screen and disable the touch sensing when you hold the phone to your ear or put it in your pocket.

So what about the user-facing camera? Is it a one-trick pony like the proximity sensor, or a springboard to innovation like the accelerometer? Although videophoning has been a perennial bust, I would argue for the latter: the you-facing camera is pregnant with possibilities as a sensor.

Looking at the Aberdeen report, I was curious to see “gesture recognition” on a list of features that will appear on 60% of phones by 2018. The others on the list are hardware features, but once you have a camera, gesture recognition is just a matter of software. (The Kinect is a sidetrack to this, provoked by lack of compute power.)

In a phone, compute power means battery-drain, so that’s a limitation to using the camera as a sensor. But each generation of chips becomes more power-efficient as well as more powerful, and as phone makers add more and more GPU cores, the developer community delivers new useful uses for them that max them out.

Gesture recognition is already here with Samsung, and soon every Android device. The industry is gearing up for innovation in phone based computer vision with OpenVX from Khronos. When always-on computer vision becomes feasible from a power-drain point of view, gesture recognition and face tracking will look like baby-steps. Smart developers will come up with killer applications that are currently unimaginable. For example, how about a library implementation of Paul Ekman’s emotion recognition algorithms to let you know how you are really feeling right now? Or, in concert with Google Glass, so you will never again be oblivious to your spouse’s emotional temperature.
Update November 19th: Here‘s some news and a little bit of technology background on this topic…
Update November 22:It looks like a company is already engaged on the emotion-recognition technology.

ITExpo: The Future is Now: Mobile Callers Want Visuals with Voice over the existing network

I will be moderating a panel on this topic at ITExpo East 2012 in Miami at 2:30 pm on Wednesday, February 1st.

The panelists will be Theresa Szczurek of Radish Systems, LLC, Jim Machi of Dialogic, Niv Kagan of Surf Communications Solutions and Bogdan-George Pintea of Damaka.

The concept of visuals with voice is a compelling one, and there are numerous kinds of visual content that you may want to convey. For example, when you do a video call with FaceTime or Skype, you can switch the camera to show what you are looking at if you wish, but you can’t share your screen or photos during a call.

FaceTime, Skype and Google Talk all use the data connection for both the voice and video streams, and the streams travel over the Internet.

A different, non-IP technology for videophone service called 3G-324M, is widely used by carriers in Europe and Asia. It carries the video over the circuit-switched channel, which enables better quality (lower latency) than the data channel. An interesting application of this lets companies put their IVR menus into a visual format, so instead of having to listen through a tedious listing of options that you don’t want, you can instantly select your choice from an on-screen menu. Dialogic makes back-end equipment that makes applications like on-screen IVR possible on 3G-324M networks.

Radish Systems uses a different method to provide a similar visual IVR capability for when your carrier doesn’t support 3G-324M (none of the US carriers do). The Radish application is called Choiceview. When you make a call from your iPhone to a Choiceview-enabled IVR, you dial the call the regular way, then start the Choiceview app on your iPhone. The Choiceview IVR matches the Caller ID on the call with your phone number that you typed into the app setup, and pushes a menu to the appropriate client. So the call goes over the old circuit-switched network, while Choiceview communicates over the data network. Choiceview is strictly a client-server application. A Choiceview server can push any data to a phone, but the phone can’t send data the other way, neither can two phones exchange data via Choiceview.

So this ITExpo session will try to make sense of this mix: multiple technologies, multiple geographies and multiple use cases for visual data exchange during phone calls.

Droid Razr first look.

First impression is very good. The industrial design on this makes the iPhone look clunky. The screen is much bigger, the overall feel reeks of quality, just like the iPhone. The haptic feedback felt slightly odd at first, but I think I will like it when I get used to it.

I was disappointed when the phone failed to detect my 5GHz Wi-Fi network. This is like the iPhone, but the Samsung Galaxy S2 and Galaxy Nexus support 5 Ghz, and I had assumed parity for the Razr.

Oddly, bearing in mind its dual core processor, the Droid Razr sometimes seems sluggish compared to the iPhone 4. But the Android user interface is polished and usable, and it has a significant user interface feature that the iPhone sorely lacks: a universal ‘back’ button. The ‘back’ button, like the ‘undo’ feature in productivity apps, fits with the way people work and learn: try something, and if that doesn’t work, try something else.

The Razr camera is currently unusable for me. The first photo I took had a 4 second shutter lag. On investigation, I found that if you hold the phone still, pointed at a static scene, it takes a couple of seconds to auto-focus. If you wait patiently for this to happen, watching the screen and waiting for the focus to sharpen, then press the shutter button, there is almost no shutter lag. But if you try to ‘point and shoot’ the shutter lag can be agonizingly long – certainly long enough for a kid to dodge out of the frame. This may be fixable in software, and if so, I hope Motorola gets the fix out fast.

While playing with the phone, I found it got warm. Not uncomfortably hot, but warm enough to worry about the battery draining too fast. Investigating this, I found a wonderful power analysis display, showing which parts of the phone are consuming the most power. The display, not surprisingly, was consuming the most – 35%. But the second most, 24%, was being used by ‘Android OS’ and ‘Android System.’ As the battery expired, the phone kindly suggested that it could automatically shut things off for me when the power got low, like social network updates and GPS. It told me that this could double my battery life. Even so, battery life does not seem to be a strength of the Droid Razr. Over a few days, I observed that even when the phone was completely unused, the battery got down to 20% in 14 hours, and the vast majority of the power was spent on ‘Android OS.’

So nice as the Droid Razr is, on balance I still prefer the iPhone.

P.S. I had a nightmare activation experience – I bought the phone at Best Buy and supposedly due to a failure to communicate between the servers at Best Buy and Verizon, the phone didn’t activate on the Verizon network. After 8 hours of non-activation including an hour on the phone with Verizon customer support (30 minutes of which was the two of us waiting for Best Buy to answer their phone), I went to a local Verizon store which speedily activated the phone with a new SIM.

Deciding on the contract, I was re-stunned to rediscover that Verizon charges $20 per month for SMS. I gave this a miss since I can just use Google Voice, which costs $480 less over the life of the contract.

HTML 5 takes iPhone developer support full circle

Today Rethink Wireless reported that Facebook is moving towards HTML 5 in preference to native apps on phones.

When the iPhone in arrived 2007, this was Steve Jobs’ preferred way to do third party applications:

We have been trying to come up with a solution to expand the capabilities of the iPhone so developers can write great apps for it, but keep the iPhone secure. And we’ve come up with a very. Sweet. Solution. Let me tell you about it. An innovative new way to create applications for mobile devices… it’s all based on the fact that we have the full Safari engine in the iPhone. And so you can write amazing Web 2.0 and AJAX apps that look and behave exactly like apps on the iPhone, and these apps can integrate perfectly with iPhone services. They can make a call, check email, look up a location on Gmaps… don’t worry about distribution, just put ‘em on an internet server. They’re easy to update, just update it on your server. They’re secure, and they run securely sandboxed on the iPhone. And guess what, there’s no SDK you need! You’ve got everything you need if you can write modern web apps…

But the platform and the developer community weren’t ready for it, so Apple was quickly forced to come up with an SDK for native apps, and the app store was born.

So it seems that Apple was four years early on its iPhone developer solution, and that in bowing to public pressure in 2007 to deliver an SDK, it made a ton of money that it otherwise wouldn’t have:

A web service which mirrors or enhances the experience of a downloaded app significantly weakens the control that a platform company like Apple has over its user base. This has already been seen in examples like the Financial Times newspaper’s HTML5 app, which has already outsold its former iOS native app, with no revenue cut going to Apple.

Video calling from your cell phone

Although phone numbers are an antiquated kind of thing, we are sufficiently beaten down by the machines that we think of it as natural to identify a person by a 10 digit number. Maybe the demise of the numeric phone keypad as big touch-screens take over will change matters on this front. But meanwhile, phone numbers are holding us back in important ways. Because phone numbers are bound to the PSTN, which doesn’t carry video calls, it is harder to make video calls than voice, because we don’t have people’s video addresses so handy.

This year, three new products attempted to address this issue in remarkably similar ways – clearly an idea whose time has come. The products are Apple’s FaceTime, Cisco’s IME and a startup product called Tango.

In all three of these products, you make a call to a regular phone number, which triggers a video session over the Internet. You only need the phone number – the Internet addressing is handled automatically. The two problems the automatic addressing has to handle are finding a candidate address, then verifying that it is the right one. Here’s how each of those three new products does the job:

1. FaceTime. When you first start FaceTime, it sends an SMS (text message) to an Apple server. The SMS contains sufficient information for the Apple server to reliably associate your phone number with the XMPP (push services) client running on your iPhone. With this authentication performed, anybody else who has your phone number in their address book on their iPhone or Mac can place a videophone call to you via FaceTime.

2. Cisco IME (Inter-Company Media Engine). The protocol used by IME to securely associate your phone number with your IP address is ViPR (Verification Involving PSTN Reachability), an open protocol specified in several IETF drafts co-authored by Jonathan Rosenberg who is now at Skype. ViPR can be embodied in a network box like IME, or in an endpoint like a phone of PC.
Here’s how it works: you make a phone call in the usual way. After you hang up, ViPR looks up the phone number you called to see if it is also ViPR-enabled. If it is, ViPR performs a secure mutual verification, by using proof-of-knowledge of the previous PSTN call as a shared secret. The next time you dial that phone number, ViPR makes the call through the Internet rather than through the phone network, so you can do wideband audio and video with no per-minute charge. A major difference between ViPR and FaceTime or Tango is that ViPR does not have a central registration server. The directory that ViPR looks up phone numbers in is stored in a distributed hash table (DHT). This is basically a distributed database with the contents stored across the network. Each ViPR participant contributes a little bit of storage to the network. The DHT itself defines an algorithm – called Chord – which describes how each node connects to other nodes, and how to look up information.

3. Tango, like FaceTime, has its own registration servers. The authentication on these works slightly differently. When you register with Tango, it looks in the address book on your iPhone for other registered Tango users, and displays them in your Tango address book. So if you already know somebody’s phone number, and that person is a registered Tango user, Tango lets you call them in video over the Internet.

iPhone 4 gets competition

When the iPhone came out it redefined what a smartphone is. The others scrambled to catch up, and now with Android they pretty much have. The iPhone 4 is not in a different league from its competitors the way the original iPhone was. So I have been trying to decide between the iPhone 4 and the EVO for a while. I didn’t look at the Droid X or the Samsung Galaxy S, either of which may be better in some ways than the EVO.

Each hardware and software has stronger and weaker points. The Apple wins on the subtle user interface ingredients that add up to delight. It is a more polished user experience. Lots of little things. For example I was looking at the clock applications. The Apple stopwatch has a lap feature and the Android doesn’t. I use the timer a lot; the Android timer copied the Apple look and feel almost exactly, but a little worse. It added a seconds display, which is good, but the spin-wheel to set the timer doesn’t wrap. To get from 59 seconds to 0 seconds you have to spin the display all the way back through. The whole idea of a clock is that it wraps, so this indicates that the Android clock programmer didn’t really understand time. Plus when the timer is actually running, the Android cutely just animates the time-set display, while the Apple timer clears the screen and shows a count-down. This is debatable, but I think the Apple way is better. The countdown display is less cluttered, more readable, and more clearly in a “timer running” state. The Android clock has a wonderful “desk clock” mode, which the iPhone lacks, I was delighted with the idea, especially the night mode which dims the screen and lets you use it as a bedside clock. Unfortunately when I came to actually use it the hardware let the software down. Even in night mode the screen is uncomfortably bright, so I had to turn the phone face down on the bedside table.

The EVO wins on screen size. Its 4.3 inch screen is way better than the iPhone’s 3.5 inch screen. The “retina” definition on the iPhone may look like a better specification but the difference in image quality is indistinguishable to my eye, and the greater size of the EVO screen is a compelling advantage.

The iPhone has far more apps, but there are some good ones on the Android that are missing on the iPhone, for example the amazing Wi-Fi Analyzer. On the other hand, this is also an example of the immaturity of the Android platform, since there is a bug in Android’s Wi-Fi support that makes the Wi-Fi Analyzer report out-of-date results. Other nice Android features are the voice search feature and the universal “back” button. Of course you can get the same voice search with the iPhone Google app, but the iPhone lacks a universal “back” button.

The GPS on the EVO blows away the GPS on the iPhone for accuracy and responsiveness. I experimented with the Google Maps app on each phone, walking up and down my street. Apple changed the GPS chip in this rev of the iPhone, going from an Infineon/GlobalLocate to a Broadcom/GlobalLocate. The EVO’s GPS is built-in to the Qualcomm transceiver chip. The superior performance may be a side effect of assistance from the CDMA radio network.

Incidentally, the GPS test revealed that the screens are equally horrible under bright sunshine.

The iPhone is smaller and thinner, though the smallness is partly a function of the smaller screen size.

The EVO has better WAN speed, thanks to the Clearwire WiMax network, but my data-heavy usage is mainly over Wi-Fi in my home, so that’s not a huge concern for me.

Battery life is an issue. I haven’t done proper tests, but I have noticed that the EVO seems to need charging more often than the iPhone.

Shutter lag is a major concern for me. On almost all digital cameras and phones I end up taking many photos of my shoes as I put the camera back in my pocket after pressing the shutter button and assuming the photo got taken at that time rather than half a second later. I just can’t get into the habit of standing still and waiting for a while after pressing the shutter button. The iPhone and the EVO are about even on this score, both sometimes taking an inordinately long time to respond to the shutter – presumably auto-focusing. The pictures taken with the iPhone and the EVO look very different; the iPhone camera has a wider angle, but the picture quality of each is adequate for snapshots. On balance the iPhone photos appeal to my eye more than the EVO ones.

For me the antenna issue is significant. After dropping several calls I stuck some black electrical tape over the corner of the phone which seems to have somewhat fixed it. Coverage inside my home in the middle of Dallas is horrible for both AT&T and Sprint.

The iPhone’s FM radio chip isn’t enabled, so I was pleased when I saw FM radio as a built-in app on the EVO, but disappointed when I fired it up and discovered that it needed a headset to be plugged in to act as an antenna. Modern FM chips should work with internal antennas. In any case, the killer app for FM radio is on the transmit side, so you can play music from your phone through your car stereo. Neither phone supports that yet.

So on the plus side, the EVO’s compelling advantage is the screen size. On the negative side, it is bulkier, the battery life is less, the software experience isn’t quite so polished.

The bottom line is that the iPhone is no longer in a class of its own. The Android iClones are respectable alternatives.

It was a tough decision, but I ended up sticking with the iPhone.

VoIP over the 3G data channel comes to the iPhone

I discussed last September how AT&T was considering opening up the 3G data channel to third party voice applications like Skype. According to Rethink Wireless, Steve Jobs mentioned in passing at this week’s iPad extravaganza that it is now a done deal.

Rethink mentions iCall and Skype as beneficiaries. Another notable one is Fring. Google Voice is not yet in this category, since it uses the cellular voice channel rather than the data channel, so it is not strictly speaking VoIP; the same applies to Skype for the iPhone.

According to Boaz Zilberman, Chief Architect at Fring, the Fring iPhone client needed no changes to implement VoIP on the 3G data channel. It was simply a matter of reprogramming the Fring servers to not block it. Apple also required a change to Fring’s customer license agreements, requiring the customer to use this feature only if permitted by his service provider. AT&T now allows it, but non-US carriers may have different policies.

Boaz also mentioned some interesting points about VoIP on the 3G data channel compared with EDGE/GPRS and Wi-Fi. He said that Fring only uses the codecs built in to handsets to avoid the battery drain of software codecs. He said that his preferred codec is AMR-NB; he feels the bandwidth constraints and packet loss inherent in wireless communications negate the audio quality benefits of wideband codecs. 3G data calls often sound better than Wi-Fi calls – the increased latency (100 ms additional round-trip according to Boaz) is balanced by reduced packet loss. 20% of Fring’s calls run on GPRS/EDGE, where the latency is even greater than on 3G; total round trip latency on a GPRS VoIP call is 400-500ms according to Boaz.

As for handsets, Boaz says that Symbian phones are best suited for VoIP, the Nokia N97 being the current champion. Windows Mobile has poor audio path support in its APIs. The iPhone’s greatest advantage is its user interface, it’s disadvantages are lack of background execution and lack of camera APIs. Android is fragmented: each Android device requires different programming to implement VoIP.

Apple’s App-roval process

I wrote earlier about AT&T’s responses to FCC’s questions concerning the iPhone App Store and Google Voice.

Now Apple has posted its responses to the same questions, which are basically the same as AT&T’s. Among the differences are that Apple’s responses contain some hard numbers on its controversial App Store approval process:

  • 80% of applications are approved as originally submitted.
  • 95% of applications are approved within 14 days of submission.
  • 65,000 applications have been approved.
  • 200,000 submissions and re-submissions have been made.
  • 8,500 submissions are coming in each week.
  • Each submission is reviewed by two reviewers.
  • There are 40 reviewers.

These numbers don’t really add up. So what Apple probably means is that 95% of the applications that have been approved were approved within 14 days of their final submission. Even so, each reviewer must look at an average of 425 submissions per week (8,500*2/40), which is 10 per hour per reviewer – an average of 12 minutes of reviewer time per submission, which doesn’t seem to justify the terms “comprehensive” and “rigorous” used in Apple’s description of the process:

Apple developed a comprehensive review process that looks at every iPhone application that is submitted to Apple. Applications and marketing text are submitted through a web interface. Submitted applications undergo a rigorous review process that tests for vulnerabilities such as software bugs, instability on the iPhone platform, and the use of unauthorized protocols. Applications are also reviewed to try to prevent privacy issues, safeguard children from exposure to inappropriate content, and avoid applications that degrade the core experience of the iPhone. There are more than 40 full-time trained reviewers, and at least two different reviewers study each application so that the review process is applied uniformly. Apple also established an App Store executive review board that determines procedures and sets policy for the review process, as well as reviews applications that are escalated to the board because they raise new or complex issues. The review board meets weekly and is comprised of senior management with responsibilities for the App Store. 95% of applications are approved within 14 days of being submitted.

Of course much of this might be automated, which would explain both the superhuman productivity of the reviewers and the alleged mindlessness of the decision-making.

AT&T, Apple and VoIP on the iPhone

The phone OEMs are customer-driven, and I mean that in a bad way. They view service providers rather than consumers as their customers, and therefore have historically tended to be relatively uninterested in ease of use or performance, concentrating on packing in long checklists of features, many of which went unused by baffled consumers. Nokia seemed to have factions that were more user-oriented, but it took the chutzpah of Steve Jobs to really change the game.

A recent FCC inquiry has provoked a fascinating letter from AT&T on the background of the iPhone and AT&T’s relationship with Apple, including Voice over IP on the iPhone. On the topic of VoIP, the letter says that AT&T bound Apple to not create a VoIP capability for the iPhone, but Apple did not commit to prevent third parties from doing so. AT&T says that it never had any objection to iPhone VoIP applications that run over Wi-Fi, and that it is currently reconsidering its opposition to VoIP applications that run over the 3G data connection. Since the argument that AT&T presents in the letter in favor of restrictions on VoIP is weak, such a reconsideration seems in order.

The argument goes as follows: the explosion of the mobile Internet led by the iPhone was catalyzed by cheap iPhones. iPhones are cheap because of massive subsidies. The subsidies are paid for by the voice services. Therefore, AT&T is justified in protecting its voice service revenues because the subsidies they allow had such a great result: the flourishing of the mobile Internet. The reason this argument is weak is that voice service revenues are not the only way to recoup subsidies. AT&T has discovered that it can charge for the mobile Internet directly, and recoup its subsidies that way. It will not sell a subsidized iPhone without an unlimited data plan, and it increased the price of that mandatory plan by 50% last year. Even with this price increase iPhone sales continued to burgeon. In other words, AT&T may be able to recoup lost voice revenues by charging more for its data services.

This is exactly what the “dumb pipes” crowd has been advocating for over a decade now: connectivity providers should charge a realistic price for connectivity, and not try to subsidize it with unrealistic charges for other services.