I’ve written several times in the past about the “ultimate device”, a kind of laptop replacement. Over the months I’ve become convinced that the keyboard specifically, and maybe even the mouse, are incredibly unintuitive and user-unfriendly input devices, and that consumers and users would much prefer something geared to how they work.

As such, I’ve been following advances in hardware and software very closely. Not because I necessarily want to do something rash, but because the possibilities are becoming more and more realistic.

The Idea

I had a conversation with an incredibly smart guy here at work, and we just brainstormed a bit.

Over nearly a dozen posts I’ve come to the conclusion that we could easily build:

- a hiptop device with a 20GB HDD, P4M 1.5GHz processor, 128MB of RAM
- visual output device (glasses or projection-based) which sits in front of the eyes
- input for flash memory sticks, keyboard and mouse

This would give you a true “mobile desktop”. You could take it anywhere, run any app and interact via keyboard and mouse. Your “desktop” would become a “hub”. The problem is that, ultimately, I want something I can walk down the street with. I want something I can sit in the airport with. I want something that is more user-friendly and more intuitive than a standard OS.

The Issues

As such, there are a few key problems:

- we need better, or more context-sensitive, voice recognition
- we need some kind of “mouse replacement”
- we need a new “shell” (at least) or a new OS (at most)

We need voice recognition because if you’re on the go you need to be able to interact with the device without a keyboard (mobile or otherwise). I’ll touch on the context-sensitive part a bit later, as it was really today’s “breakthrough”.

We need a mouse replacement, because even with the most intuitive and advanced voice recognition, it’s really, really cumbersome to, for instance, bold specified text without some kind of pointing / selecting mechanism.

And we need a new shell…

The Shell

The idea for this was thrown out somehow in our discussion today. If a user is working in Word, for instance, and already has decent voice recognition (doable) and some kind of imaginary mouse replacement, the only thing that is difficult is doing things like interacting with documents, opening programs, etc.

In a “mobile mode” situation, we thought, there is a very limited scope of things a user needs to be able to do. Either they are working in an application OR they are trying to work with other applications and data.

By taking this concept up a level, we realised that if a user isn’t working in an application, they must then be searching for applications or data. As such, they get to “the Windows environment”. But, really, the Windows environment is an entirely point-and-click environment. That won’t work for our mobile device.

We need a voice environment. Not a vocal environment, or a voice-feedback environment (though that would be cool as well). So, we just bounced back and forth about what kind of other environments there were besides standard desktop environments, and two jumped out at us.

First, the old Mac environment was really “single point of entry”. Most Mac users didn’t use their desktops a lot; they used the little Apple symbol to access just about everything. Single point of entry. Just the kind of thing we’d need if users could only give vocal input.

Then we realised that many older PDAs had ‘single point of entry’ type environments for when you weren’t working in an application. They had, in essence, multiple “launchers”.

You could go to your Application Launcher, to your Games Launcher, to your Documents Launcher, etc.

We quickly realised that something like this could be just what we were looking for. If a user was inside Microsoft Word and wanted to send an email, the voice commands could easily be:

- “Open Application Launcher. Open Outlook”

Boom, they’re in. Obviously things like voice shortcuts would be useful as well (“Execute Open Outlook” should work on its own, for instance, or even “Execute O” maybe).

They could easily navigate between preset “Launchers”, as well as virtual ones.
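To make the launcher idea a bit more concrete, here is a rough Python sketch of how such a shell might resolve commands once the speech has already been turned into text. Everything in it is an assumption made for illustration: the `VoiceShell` class, the launcher contents and the returned action strings are hypothetical, and the prefix matching for “Execute O” is just one possible way to do shortcuts.

```python
# Hypothetical launcher contents; every name here is illustrative only.
LAUNCHERS = {
    "application launcher": ["outlook", "word", "excel"],
    "games launcher": ["solitaire", "minesweeper"],
    "documents launcher": ["budget", "trip notes"],
}


class VoiceShell:
    """Resolves already-recognized phrases against a set of launchers."""

    def __init__(self, launchers):
        self.launchers = launchers
        self.current = None  # name of the launcher that is open, if any

    def handle(self, phrase):
        """Handle one phrase and return a description of the action taken."""
        parts = phrase.lower().strip(" .").split(maxsplit=1)
        if len(parts) != 2:
            return "unrecognized"
        verb, target = parts
        if verb == "open":
            return self._open(target)
        if verb == "execute":
            return self._execute(target)
        return "unrecognized"

    def _open(self, target):
        # "Open <launcher>" switches launcher; "Open <item>" needs one open.
        if target in self.launchers:
            self.current = target
            return f"show {target}"
        if self.current and target in self.launchers[self.current]:
            return f"launch {target}"
        return "unrecognized"

    def _execute(self, target):
        # Shortcut usable from anywhere: "Execute Open Outlook" or "Execute O".
        if target.startswith("open "):
            target = target[len("open "):]
        matches = [
            item
            for items in self.launchers.values()
            for item in items
            if item.startswith(target)
        ]
        if len(matches) == 1:
            return f"launch {matches[0]}"
        return "ambiguous" if matches else "unrecognized"

    def run(self, utterance):
        """Split an utterance into sentences and handle each in turn."""
        return [self.handle(p) for p in utterance.split(".") if p.strip()]


shell = VoiceShell(LAUNCHERS)
actions = shell.run("Open Application Launcher. Open Outlook")
```

The point of the sketch is that the shell only ever has to understand two verbs plus whatever names are in the launchers, which is what makes a voice-only “desktop” plausible.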

If you’ve seen the Longhorn Video for Healthcare, you’ve seen the concept of meta, or virtual, filesets. Something like this would be incredibly useful for our “Launchers” environment.
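One way to read a “virtual” launcher is as a saved query over items rather than a fixed list. That reading is my assumption, not something taken from the video, and the `Item` fields, tags and sample data below are all made up for the sake of the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    name: str
    kind: str                       # e.g. "application" or "document"
    tags: set = field(default_factory=set)


# Hypothetical items; a real shell would discover these on the device.
ITEMS = [
    Item("outlook", "application", {"mail"}),
    Item("budget", "document", {"finance", "recent"}),
    Item("trip notes", "document", {"recent"}),
]


def virtual_launcher(items, predicate):
    """Return the names of all items the predicate accepts."""
    return [item.name for item in items if predicate(item)]


# A "Recent Documents" launcher defined by a rule, not by a folder.
recent_documents = virtual_launcher(
    ITEMS, lambda item: item.kind == "document" and "recent" in item.tags
)
```

A launcher built this way stays current on its own: tag a new document as recent and it shows up without anyone filing it anywhere.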

Context Sensitive

One of the biggest reasons this is useful is that in “desktop” mode (as opposed to “application” mode, I guess) the voice recognition software has a much smaller ‘vocabulary’ to work with. If you’ve only got 20 apps, there are only 100 (ish) words it needs to try and recognize, which greatly increases the chance of it getting the word right.

A great example of this is the Tablet PC Demo I pointed to earlier today. They show off just this kind of functionality for text recognition.
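Here is a small Python sketch of why the smaller vocabulary helps. It works on text, not audio: I’m assuming the recognizer hands over a rough guess at the word, and the shell snaps that guess to the closest entry in the active vocabulary using the standard library’s `difflib`. The function names, the command words and the 0.6 cutoff are assumptions for illustration only.

```python
import difflib


def build_vocabulary(app_names, commands=("open", "execute", "launcher")):
    """Collect the only words the shell has to know in "desktop" mode."""
    words = set(commands)
    for name in app_names:
        words.update(name.lower().split())
    return sorted(words)


def recognize(heard, vocabulary, cutoff=0.6):
    """Snap a rough guess to the closest vocabulary word, or None."""
    matches = difflib.get_close_matches(
        heard.lower(), vocabulary, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None


# Hypothetical installed apps; a real device would list its own.
vocabulary = build_vocabulary(["Outlook", "Word", "Excel"])

guess = recognize("outlock", vocabulary)   # a slightly misheard "outlook"
nothing = recognize("xyzzy", vocabulary)   # nothing close enough
```

With only a handful of candidates, a slightly wrong guess still has just one plausible neighbour. In a full dictation vocabulary the same guess would be competing with many more similar-sounding words.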

The Hub

One of the key features of such a device should be (in my mind anyways) that it could also go into “real desktop” mode: a full OS environment. You sit down, plug in a real keyboard and mouse (and maybe even a monitor) and you use this thing as a desktop replacement.

Summary

I still don’t think we’re at the point where we could realistically build a device like this out of commodity products, but we are much, much closer. The hardware can easily be built for less than $1,000. The biggest hurdles now are the voice recognition, the mouse replacement and the shell.

I’m confident that if the mouse replacement issue could be solved, and if voice recognition improved (maybe Microsoft Speech Server is the key here?), the shell would be easy to do.