Wednesday, April 13, 2011

A voice command philosophy

I don't use voice recognition on my Windows 7 computer for speech-to-text. Touch-typing is so ingrained in me that it's very difficult to switch. (I tried a few times, with not-so-great results.) I know that one day I will work harder at it, but for now I'm content to use voice only to control my computer.

When it works right, using voice to control your computer is such a natural way to command it that it makes using a computer pleasant. What is the right way? That will differ from person to person, but suffice it to say that voice command designers need to change their philosophy to usher in the next big thing in voice interface control.

Take printing as an example. To open the Print dialog, you could press Ctrl-P, if you knew the shortcut. If you didn't, you would stumble into the File menu, click Print, and then click OK. When toolbars first became popular, especially in Word, programs added a little picture of a printer, which you immediately thought of as a shortcut to the Print dialog. The problem was that, when you clicked it, some programs assumed you meant "Print now". After all, the toolbar is for shortcuts, right?

Except that's not how most people were thinking. A "print now" command is more of a macro than a toolbar command. Most people thought clicking the print icon would bring up the Print dialog, because that's how it worked before. Unfortunately, for a few years, many trees died for nothing. Finally someone somewhere set this right, and now clicking a Print button brings up the Print dialog. Behavior matches user expectation.

But what if you did want a "Print Now" macro, i.e., one that clicks Print and then OK for you? What you really want is to be able to choose either of the following:

  • "Show me the print dialog so I can select options before sending it to the printer"
  • "Print now using current settings"

Taking Word 2007 as an example, there is no such "print now" command, so you would have to write a macro in Word or program an external macro utility to press the keys for you.
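A "print now" macro is, at bottom, just the keystrokes you would otherwise press by hand. The following Python sketch models it that way. `run_macro` and its `send` callback are hypothetical stand-ins for whatever key-sending macro utility you use, and the sequence assumes that Ctrl-P opens the Print dialog and Enter accepts it with the current settings.

```python
from typing import Callable, List

# Hypothetical "print now" macro: open the Print dialog (Ctrl-P), then accept
# the current settings (Enter presses the dialog's default OK button).
PRINT_NOW: List[str] = ["ctrl+p", "enter"]


def run_macro(keys: List[str], send: Callable[[str], None]) -> None:
    """Replay each keystroke through `send`.

    `send` stands in for a real key-sending backend (an assumption here),
    so the macro logic can be exercised without touching the keyboard.
    """
    for key in keys:
        send(key)


if __name__ == "__main__":
    pressed: List[str] = []
    run_macro(PRINT_NOW, pressed.append)  # record keystrokes instead of sending them
    print(pressed)  # ['ctrl+p', 'enter']
```

Swapping `pressed.append` for a real key sender is what would turn this sketch into a working macro.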

This works fine for a few macros, if you are inclined to create them. However, if you start creating a lot of them for different programs, they quickly become impossible to track, and the efficiency gained from a "print now" macro is outweighed by the effort of remembering that you assigned it to Ctrl-Alt-P. You would also have to remember that in Photoshop, that combination already performs another command. It becomes a lot of mental work to use something that is supposed to make your life easier.
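The bookkeeping problem is easy to see once the hotkeys are written down. This sketch keeps a table of the shortcuts each program already claims and reports where a new global hotkey would collide. The table entries, including the Photoshop one, are illustrative placeholders rather than a verified shortcut list.

```python
from typing import Dict, List

# Illustrative data only: shortcuts each program already uses.
# The Photoshop entry is a hypothetical placeholder, not a verified binding.
APP_SHORTCUTS: Dict[str, Dict[str, str]] = {
    "Word": {"ctrl+p": "Print dialog"},
    "Photoshop": {"ctrl+alt+p": "existing Photoshop command"},
}


def conflicts(hotkey: str, apps: Dict[str, Dict[str, str]]) -> List[str]:
    """Return, sorted, the programs in which `hotkey` is already taken."""
    return sorted(app for app, keys in apps.items() if hotkey in keys)


if __name__ == "__main__":
    print(conflicts("ctrl+alt+p", APP_SHORTCUTS))  # ['Photoshop']
    print(conflicts("ctrl+shift+alt+c", APP_SHORTCUTS))  # []
```

Every new hotkey has to be checked against every program you use, which is exactly the mental overhead described above.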

Using language commands is different. Set up the right way, language commands that invoke multiple steps can be incredibly empowering and can reduce drudgery. You end up performing tasks, not the substeps of tasks. I'm old enough to remember when my mother had to squeeze wet clothes through a wringer before hanging them up to dry; now all washing machines have spin cycles.

One example: I sometimes want to copy a filename from Windows Explorer. The steps: click the file, wait, and click it again so that Windows Explorer enters "rename file" mode; then right-click the name and click Copy. You can press F2 and then Ctrl-C to speed things up, but your hands are still moving back and forth between the mouse and the keyboard. You might need to look down to make sure you are pressing the right keys, and moving your hands back and forth takes time. A better way, please?

Using a macro program I could write a macro that would do these steps for me. But then to initiate the macro, I would have to either click a 'macro' menu or press a non-standard key combination.

But (getting back to speech) if you assign this macro to a voice command, your life becomes much easier. You click the file, say "Copy filename", wait a second, and then continue. Your hand never leaves the mouse. Better still, it's much easier to remember to say "copy filename" than to remember to press Ctrl-Shift-Alt-C, a combination that is likely to be free but that also requires you to let go of the mouse and perhaps look down to see which keys you are pressing.
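Binding the macro to a phrase instead of a key chord amounts to a lookup from spoken text to an action. In this sketch, `copy_filename` is a hypothetical macro that returns the keystrokes it would send: F2 to enter rename mode with the name selected, Ctrl-C to copy it, and, as an assumption not in the text above, Esc to leave rename mode without renaming the file. Real speech engines and macro tools have their own ways of registering commands.

```python
from typing import Callable, Dict, List, Optional


def copy_filename() -> List[str]:
    # Hypothetical macro body: F2 enters rename mode with the name selected,
    # Ctrl-C copies it, and Esc (an assumption) exits rename mode unchanged.
    return ["f2", "ctrl+c", "esc"]


# Spoken phrase -> macro. The phrase is the only thing the user has to remember.
VOICE_COMMANDS: Dict[str, Callable[[], List[str]]] = {
    "copy filename": copy_filename,
}


def on_speech(phrase: str) -> Optional[List[str]]:
    """Run the macro bound to `phrase`, or return None if nothing matches."""
    action = VOICE_COMMANDS.get(phrase.strip().lower())
    return action() if action else None


if __name__ == "__main__":
    print(on_speech("Copy filename"))  # ['f2', 'ctrl+c', 'esc']
```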

Through a combination of Windows Speech Recognition, Windows Speech Recognition Macros, and AutoHotkey (a macro program), I am able to do this now. I have commands for copying files, opening a DOS window at a folder, opening Word's Paragraph dialog box (directly to the tab of my choice), zooming in and out, and, of course, printing now.

Additionally, I noticed that I sometimes vary my phrasing when working quickly. So I might say "Print it now" instead of just "Print now". My macros are smart enough to figure out that both mean the same thing. The computer is accommodating the fact that I'm not a computer and that I can't remember how to do everything perfectly every time. The computer is working for *me*.
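One simple way to tolerate small variations like "Print it now" is to normalize what was heard before looking it up, for example by dropping filler words. This is a sketch of that idea, not a description of how Windows Speech Recognition Macros match phrases; the filler-word list and the command names are hypothetical.

```python
import re
from typing import Dict, Optional

# Hypothetical filler words a hurried speaker might add without changing intent.
FILLER = {"a", "it", "please", "the", "this"}

# Canonical phrase -> internal command name (names are illustrative).
COMMANDS: Dict[str, str] = {
    "print now": "PRINT_NOW",
    "new email": "NEW_EMAIL",
}


def normalize(utterance: str) -> str:
    """Lowercase, keep only runs of letters, and drop filler words."""
    words = re.findall(r"[a-z]+", utterance.lower())
    return " ".join(w for w in words if w not in FILLER)


def match(utterance: str) -> Optional[str]:
    """Return the command for `utterance`, or None if it is not recognized."""
    return COMMANDS.get(normalize(utterance))


if __name__ == "__main__":
    for said in ["Print now", "Print it now", "Please print this now"]:
        print(said, "->", match(said))  # each line ends with -> PRINT_NOW
```

A global stop list is crude (it would also turn "print it" into "print"), so a real system would more likely use a grammar with optional words than strip words everywhere.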

It requires a lot of investment on my part, and I'm continuously updating my macros. The point, however, is that the masses will not put this much work into their computer setup, never mind acquire the technical knowledge needed to get what I want. A next-generation voice command system will know that, no matter where I am in Windows, saying "new email" means opening Outlook and pressing Ctrl-N. It should also be flexible enough to know that if a browser window is open and I'm logged into Gmail, saying "new email" should start a new email within Gmail instead of Outlook. (In fact, this is how my speech macros behave.) Customizing which program responds to which command, and when, requires some input on my part, but why should anyone need to know how to write code to accomplish so many everyday tasks?
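The context-sensitive "new email" behavior boils down to a routing decision made at the moment the phrase is heard. This sketch makes that decision from the titles of the open windows. Treating "Gmail" in a window title as proof of a logged-in Gmail session is a simplifying assumption, and the returned actions are labels, not real API calls.

```python
from typing import List, Tuple


def route_new_email(window_titles: List[str]) -> Tuple[str, str]:
    """Choose where "new email" should go, given the open window titles.

    Assumption: a Gmail session is detected by "gmail" appearing in a
    window title; a real system would need a more reliable signal.
    """
    if any("gmail" in title.lower() for title in window_titles):
        return ("Gmail", "compose")  # start a new message in the browser
    return ("Outlook", "ctrl+n")  # Outlook's shortcut for a new item


if __name__ == "__main__":
    print(route_new_email(["Inbox - Gmail - Mozilla Firefox"]))  # ('Gmail', 'compose')
    print(route_new_email(["Document1 - Microsoft Word"]))  # ('Outlook', 'ctrl+n')
```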

Finally, as I hinted in the previous paragraph, a proper voice command system should not be so literal about context. If I'm in Word and I want to create a new email, I don't want to have to say "Start Outlook, new email". Nor do I want to have to remember that Outlook is already open, in which case I should say "Switch to Outlook, new email". I want to say "New email" and have my computer, with its gazillion circuits, figure it out for me.
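Being less literal about context can be as simple as checking whether the target program is already running before deciding how to get there. In this sketch the set of running programs is passed in directly; how a real system would obtain it (a process list, window enumeration) is left out, and the step strings are illustrative.

```python
from typing import List, Set


def plan(app: str, command: str, running: Set[str]) -> List[str]:
    """Expand a bare command into the steps the user no longer has to say.

    If `app` is already running, switch to it; otherwise start it first.
    """
    first = f"switch to {app}" if app in running else f"start {app}"
    return [first, command]


if __name__ == "__main__":
    print(plan("Outlook", "new email", {"Word", "Outlook"}))
    # ['switch to Outlook', 'new email']
    print(plan("Outlook", "new email", {"Word"}))
    # ['start Outlook', 'new email']
```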

Getting back to Word as an example, I think the Fluent interface introduced in Word 2007 is great. It has some shortcomings, but I like it better than plain menus. You can also use Windows Speech Recognition to control Word out of the box: whatever you see, you can click. So, for example, if I'm in a table and I want to add a new row below, I can say "Click Layout tab" and then "Click Insert Below".

That's great, but why should I even have to remember that the command for inserting table rows is on the Layout tab? Why can't I just say "Insert a new row below" and have the computer figure it out? Think about it from a non-technical user's perspective. He's created a table. He wants a new row. He's thinking "new row". Why can't he just say "new row"?
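What the non-technical user wants amounts to an intent table: several ways of saying the same thing, all leading to the UI steps he would otherwise have to know. The sketch below maps a few phrasings onto the Word 2007 steps quoted above. The synonym list is hypothetical, and a real system would need far broader language understanding than exact-match lookups.

```python
from typing import Dict, List, Optional

# Intent -> the clicks it stands for (the Word 2007 steps described above).
INTENTS: Dict[str, List[str]] = {
    "new row": ["Click Layout tab", "Click Insert Below"],
}

# Hypothetical alternative phrasings that should mean the same intent.
SYNONYMS: Dict[str, str] = {
    "insert a new row below": "new row",
    "add a row": "new row",
}


def ui_steps(utterance: str) -> Optional[List[str]]:
    """Resolve an utterance (directly or via a synonym) to UI steps, or None."""
    key = utterance.strip().lower()
    return INTENTS.get(SYNONYMS.get(key, key))


if __name__ == "__main__":
    print(ui_steps("Insert a new row below"))
    # ['Click Layout tab', 'Click Insert Below']
```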

There are some programs out there that try to do this for you, but, as is often the case, this is something that should be built into the operating system. Until it is, unless you are dedicated to learning it (because you have a disability, say), you will stick with the tried and true: mouse and keyboard. That is unfortunate, because there is joy in being able to tell the computer "Empty the trash." This is what computers were meant to be.

Looking forward to this future.


  1. Hello, thank you for writing on this obscure topic. I would like to open a folder that is not shown on the screen through a voice macro. Could you show what that code would look like? Thanks.
