May 19 2004

Hiptop: Context & Communication

Category: IT Thoughts

Jeremy C. Wright @ 11:13 am

I’ve been thinking about this Hiptop device a lot lately, and my genius colleague and I did some more thinking today.

From the comments yesterday, everyone seemed to think voice wasn’t very practical. I’d disagree. Completely.

The Voice

If you step back from computers, you’ll quickly realise that the most natural form of communication is simply talking. Setting aside the most complex ideas, human beings are communicative by nature. The fact that we don’t talk to our computers just goes to show how piss-poor computers are for us.

For example, humans don’t multitask by nature. Until the last 200 or so years, humans were single-task oriented. I know some would argue that we can’t really think about more than one thing at once (others would argue with that!), so we aren’t really multitasking. That’s not the point.

The point is that it isn’t natural for us to do so. Therefore, in order to design a hiptop computer which works best with humans, we may need to step away from the ALT+TAB mentality.

I still have to think on this aspect more, but one of the suggestions I’ve been given is that while the “Launcher” idea is good, it might be even better to have an “avatar” (personal respondent… doesn’t need to be Clippy :p) that you can simply talk to. Talking is the most natural form of communication, yeah?

The key to something like this is limiting context.

I touched on this a little yesterday, but the narrower your required vocabulary, the more likely it is that a speech recognition system will pick it up. For instance, if you go from the ‘complete’ active dictionary of the English language (roughly 200K words) down to, say, just commonly used words (20K), your accuracy rate goes from something like 85% to 93%. If you can further whittle this down with context to a few hundred words, you start getting into 99.x% accuracy.
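To make the idea concrete, here is a rough Python sketch of what “limiting context” could look like. It isn’t a speech recogniser: it assumes some recogniser has already produced a (possibly wrong) word, and it only shows how a small per-context vocabulary lets you snap that guess to the nearest allowed word. The vocabularies, the context names and the `snap_to_context` function are all made up for illustration.

```python
from difflib import get_close_matches

# Hypothetical per-context vocabularies, invented for this example.
CONTEXT_VOCAB = {
    "launcher": ["mail", "letter", "calendar", "contacts", "music"],
    "mail": ["send", "reply", "delete", "next", "previous"],
}


def snap_to_context(heard, context, cutoff=0.6):
    """Return the in-context word closest to `heard`, or None if nothing is close."""
    candidates = CONTEXT_VOCAB[context]
    # With only a handful of candidates, a slightly wrong guess still has
    # one obvious nearest neighbour.
    matches = get_close_matches(heard.lower(), candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    print(snap_to_context("male", "launcher"))   # a mis-heard "mail"
    print(snap_to_context("xyzzy", "launcher"))  # nothing close enough
```

The smaller the candidate list, the fewer near-misses compete with each other, which is the same intuition as the accuracy figures above.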

One of the points yesterday was that speech recognition is too slow. While I’d agree, I’d also disagree. What’s currently on the market is too slow, but what’s currently on the market is also really badly designed. It can be done well: for instance, one of HP’s call centres can currently do speech recognition help in 3 different languages, and 7 different English accents.

Getting back to context, if your avatar is only responding to ‘commands’, that will decrease the likelihood of errors. Not terse, pure-form commands, but commands in regular speech: “Computer, I’d like to write a letter to Shannon”, for instance.
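Here’s a minimal Python sketch of that kind of regular-speech command, assuming the speech has already been turned into text. The wake word, the patterns and the intent names (`write_letter` and so on) are invented for the example, not part of any real product.

```python
import re

# Hypothetical wake word and command patterns, invented for illustration.
WAKE_WORD = "computer"
PATTERNS = [
    ("write_letter", re.compile(r"write a letter to (?P<recipient>\w+)")),
    ("check_mail", re.compile(r"check (?:my )?mail")),
    ("tell_time", re.compile(r"what time is it")),
]


def parse_command(utterance):
    """Return (intent, arguments) for speech addressed to the avatar, else None."""
    text = utterance.lower().strip()
    if not text.startswith(WAKE_WORD):
        return None  # not addressed to the avatar, so ignore it
    for intent, pattern in PATTERNS:
        match = pattern.search(text)
        if match:
            return intent, match.groupdict()
    return None  # addressed to the avatar, but not a known command


if __name__ == "__main__":
    print(parse_command("Computer, I'd like to write a letter to Shannon"))
    print(parse_command("So I hate Netscape"))
```

The filler words (“I’d like to”) are simply skipped over: only the handful of command phrases matter, so the vocabulary the avatar has to get right stays tiny.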

Fun, fun.

A few thoughts have come out of this voice thing, as well. Subvocalization may be one way to do this. Just have a necklace around your neck. It might work (I have no idea); I’m just not sure how realistic it is at this point in time. Others may have some thoughts.

The Mouse

One of the fundamental things that my coworker realised was that we don’t actually need a pointer. Freedom of movement is not required in a voice or mobile environment.

You may (or may not) need to point and click. But you do not need to wave the little cursor around the screen. This, again, brings us to limited context. Voice could easily be used to issue commands to applications (as long as those apps ‘expose’ the commands they accept to the ‘avatar’). Beyond that, you could have a series of commands for ‘move to next selectable’, as well as a series for moving the screen around (scrolling in a long document).
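A small Python sketch of how apps might ‘expose’ commands to the avatar, and how ‘move to next selectable’ can replace the pointer with a simple focus index. The `Avatar` and `Form` classes and their method names are hypothetical, just one way it could be wired up.

```python
class Avatar:
    """Routes spoken phrases to whichever commands the apps have exposed."""

    def __init__(self):
        self._commands = {}

    def expose(self, phrase, handler):
        # An app registers a phrase it accepts, plus the function to run.
        self._commands[phrase.lower()] = handler

    def vocabulary(self):
        # The full set of phrases the recogniser would need to listen for.
        return sorted(self._commands)

    def hear(self, phrase):
        handler = self._commands.get(phrase.lower().strip())
        if handler is None:
            return False  # not an exposed command, nothing happens
        handler()
        return True


class Form:
    """A toy app: a list of selectable items and a focus index, no pointer."""

    def __init__(self, selectables):
        self.selectables = list(selectables)
        self.focus = 0
        self.activated = []

    def next_selectable(self):
        self.focus = (self.focus + 1) % len(self.selectables)

    def previous_selectable(self):
        self.focus = (self.focus - 1) % len(self.selectables)

    def click(self):
        self.activated.append(self.selectables[self.focus])

    def register(self, avatar):
        avatar.expose("next", self.next_selectable)
        avatar.expose("previous", self.previous_selectable)
        avatar.expose("click", self.click)


if __name__ == "__main__":
    avatar = Avatar()
    form = Form(["To", "Subject", "Send"])
    form.register(avatar)
    for phrase in ["next", "next", "click"]:
        avatar.hear(phrase)
    print(form.activated, avatar.vocabulary())
```

Because the avatar only ever listens for what has been exposed, the vocabulary stays limited to the current context for free.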

One of the other thoughts was to create a fully 3D environment which could respond to visual stimulation: wiggle your finger in front of some goggles, move right, jab.

I’m not sure.

Open Source Ideas. Keep the feedback coming. Ultimately I’d like to see a design that could be built in this ‘era’ of technology: in the next 1-5 years. If it can’t be done in that time, and someone can convince me of that, I’ll happily nix the idea :)

2 Responses to “Hiptop: Context & Communication”

  1. Armas says:

    As for using speech to control a computer, Mac OS 8.x/9.x is the only experience I’ve had with it. You can say “Show speakable commands”, “make this speakable”, launch, open, close, next window, check mail, what time is it, tell me a joke… etc.
    I found it VERY fast! All Macs have a small mic, in laptops or desktops, above the screen.

    It recognizes things well mostly, over 90%, but sometimes people walk into the room and start talking to you: “So I hate Netscape” (launches Netscape on the computer), or “I have TIME” (computer says, “It’s 3:30pm, great lord and master…”), and applications are flying open, closed, sending emails…
    So speech is gonna kill you in public places.

    Again, see Handspring Treo, slap in a 1.5″ 40GB Microdrive, and all these are solved – speech may work in some cases, but not on a train, mall, office….

  2. Michael Giagnocavo says:

    Microsoft’s speech recognition product (Voice Commander? something like that) for Pocket PC does recognition in context. For a 300MHz proc and limited RAM, I was surprised it worked at all (I had tried IBM’s product for PPC and it didn’t run well at all).

    I can say “Show ”, “Play ” and “Start ”. Since it knows every possible value, it does provide great accuracy, and this is on a slow, small machine without amazing input (I’ve used it while driving).