Thursday, April 21, 2011

Editing tables with speech recognition and Word

Technology: Microsoft Word 2007, Windows Speech Recognition, Windows Speech Macros

If you read my previous post on speech interfaces, you know I am a fan of the Word Ribbon (or Fluent) interface but also gripe about the compromises made in its design. My biggest gripe concerns elements whose menus are contextual. Take tables as an example. If the Home tab is visible and I want to add a new row above the current row, I have to find and click the Layout tab, and then click "Insert row above".

Doesn't seem like a big deal, but as soon as you click off the table, the Layout tab disappears. Click back on the table, and if you want to insert another row, you have to click the Layout tab again. The Ribbon interface is jumping around and doesn't understand my mind's context.

To improve my Word efficiency and reduce clicking, my goal was to be able to say "Insert row above" any time I wanted. At first I used existing Word keyboard shortcuts, and when none existed, I created Word macros and assigned them keystrokes, all of which I called from Windows Speech Macros. However, I soon realized that I would have to replicate these macros on all the computers I use regularly (2 desktops at work, 1 laptop, 2 laptops I borrow from the kids when my laptop is at work). I sought a better way to maintain it all. Did Word have standard keyboard combos for every command?

It turns out, every program does. You press the Alt key and get access to menus. Duh. I had completely forgotten about such a basic feature. In Microsoft Office programs that use the Ribbon, pressing Alt reveals a keyboard combination for everything in the Ribbon. Sometimes 2 keys must be pressed in a row.

For example, to make something bold, you can press Alt-H, 1. (I know you can press Ctrl-B, but it's an example.) To apply strikethrough, whose keyboard shortcut I can never remember (if there is one), you can press Alt-H, 4.

Getting back to the table example, assuming a table already exists and the cursor is blinking inside one of the cells, you can press Alt-J, L, A to insert a row above. (When you first press Alt, the Table Tools tabs show the KeyTips "JT" and "JL"; you only have to press Alt-J to begin the sequence.)

Converting these to Windows speech macros, you can create a speech macro called "apply strikethrough":

<command priority="100">
<listenFor>apply strikethrough</listenFor>
<sendKeys>%h4</sendKeys>
</command>

With shortcuts coded for the commands I use most often (insert row, insert column, delete row, table properties, show/hide gridlines, select table/row/column, etc.), I really fly when using Word. It makes using Word almost pleasant.
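Put together as a complete macro file, the Alt-J, L, A sequence for "insert row above" might look like the sketch below. It assumes `<sendKeys>` accepts SendKeys-style notation (`%` for Alt, in the same spirit as the `^` used for Ctrl in the mouse macros later on), and it assumes an `<appIsInForeground>` condition with Word's process name `winword.exe` so the command only listens while Word is in front. Check both assumptions against the Windows Speech Macros documentation for your version.

```xml
<speechMacros>
  <command priority="100">
    <!-- Assumed condition: only listen while Word (winword.exe) is the foreground app -->
    <appIsInForeground processName="winword.exe"/>
    <!-- Optional words let "insert row above" and "insert a new row above" both match -->
    <listenFor>insert ?a ?new row above</listenFor>
    <!-- Alt+J shows the Table Tools KeyTips, L picks Layout, A is Insert Above -->
    <!-- Assumed notation: % means "hold Alt for the next key" -->
    <sendKeys>%jla</sendKeys>
  </command>
</speechMacros>
```

The cursor still has to be inside a table; otherwise the Table Tools tabs, and their KeyTips, are not there to receive the keystrokes.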

Here is a list of all the Word Fluent shortcuts I use, in basic Windows speech macro format. Note that many will work in PowerPoint, Excel and other programs; most shortcuts are similar but not always exactly the same. Since your approach may differ, if you have a question on how to code one for your setup, send me an email.

Table commands

<command priority="100">
  <listenFor>insert row above</listenFor>
</command>
<command priority="100">
  <listenFor>insert row below</listenFor>
</command>
<command priority="100">
  <listenFor>insert column before</listenFor>
</command>
<command priority="100">
  <listenFor>insert column after</listenFor>
</command>
<command priority="100">
  <listenFor>merge cells</listenFor>
</command>
<command priority="100">
  <listenFor>split table</listenFor>
</command>
<command priority="100">
  <listenFor>delete row</listenFor>
  <listenFor>delete rows</listenFor>
</command>
<command priority="100">
  <listenFor>delete column</listenFor>
  <listenFor>delete columns</listenFor>
</command>
<command priority="100">
  <listenFor>?show table properties</listenFor>
</command>
<command priority="100">
  <listenFor>?show ?hide table gridlines</listenFor>
</command>

Track Changes

<command priority="100">
  <listenFor>next change</listenFor>
</command>
<command priority="100">
  <listenFor>accept change</listenFor>
</command>

Zoom

<command priority="100">
  <listenFor>zoom ?to whole page</listenFor>
</command>
<command priority="100">
  <listenFor>zoom ?to page width</listenFor>
</command>

As always, feel free to give me a shout if you have questions or new ideas. If I can't help, someone out there can.

Wednesday, April 13, 2011

A voice command philosophy

I don't use voice recognition on my Windows 7 computer for speech-to-text. I have touch-typing so ingrained in me that it's very difficult to switch. (I tried a few times with not so great results.) I know one day I will work harder at it, but for now I'm content with using voice to control my computer only.

When it works right, using voice to control your computer is such a natural way to command it that it makes using a computer pleasant. What is the right way? That will differ from person to person, but suffice it to say that voice command designers need to change their philosophy to usher in the next big thing in voice interface control.

Take printing as an example. To open the Print dialog, you could press Ctrl-P, if you knew the shortcut. If you didn't, you would stumble into the File menu, click Print, and then click OK. When toolbars first became popular, especially in Word, programs added a little picture of a printer, which you immediately thought of as a shortcut to the Print dialog. The problem was that, when you clicked it, some programs would think you meant "Print now" - after all, the toolbar is for shortcuts, right?

Except that's not how most people were thinking. A "print now" command is more of a macro than a toolbar command. Most people thought clicking the print icon would bring up the Print dialog, because that's how it worked before. Unfortunately, for a few years, many trees died for nothing. Finally someone somewhere set this right, and now clicking a Print button brings up the Print dialog. Behavior matches user expectation.

But what if you wanted a "Print Now" macro, i.e., one that clicks Print and then OK for you? What you really want is to be able to do one of the following:

  • "Show me the print dialog so I can select options before sending it to the printer"
  • "Print now using current settings"
Taking Word 2007 as an example, since there is no such command to print now, you could make a macro, or you could program an external macro utility to press the keys for you.
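As a sketch of the "print now" idea driven by speech rather than a key combination, the macro below presses Ctrl-P, waits for the Print dialog, and presses Enter to accept it. It assumes SendKeys-style notation in `<sendKeys>` (`^` for Ctrl, `{ENTER}` for Enter) and that OK is the dialog's default button; the one-second wait is a guess you may need to tune for your machine.

```xml
<speechMacros>
  <command priority="100">
    <!-- "?it" is optional, so both "print now" and "print it now" match -->
    <listenFor>print ?it now</listenFor>
    <!-- Ctrl+P opens the Print dialog -->
    <sendKeys>^p</sendKeys>
    <!-- Assumed delay: give the dialog time to appear -->
    <waitFor seconds="1"/>
    <!-- Enter presses the default button (assumed to be OK), printing with current settings -->
    <sendKeys>{ENTER}</sendKeys>
  </command>
</speechMacros>
```

Because it has no application condition, the command applies in any program where Ctrl-P opens a print dialog, which suits a command you say rather than one you have to remember as a key combination.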

This works fine for a few macros, if you are inclined to create them. However, if you start creating a lot of these for different programs, they become impossible to track, and the efficiency gained by using a "print now" macro is outweighed by having to remember that you assigned it to Ctrl-Alt-P. You would also have to remember that in Photoshop, that combination already performs a command. It becomes a lot of mental work to use something that is supposed to make your life easier.

Using language commands is different. If set up the right way, language commands that invoke multiple steps can be incredibly empowering and reduce drudgery. You end up performing tasks, not substeps of tasks. I'm old enough to remember when my mother had to squeeze wet clothes through the wringer before hanging them up to dry; now washing machines all have spin cycles.

One example: I sometimes want to copy a filename from Windows Explorer. The steps: click the file, wait, and click it again so that Windows Explorer enters "rename file" mode; then right-click and click Copy. You can press F2 and Ctrl-C to speed things up, but your hands are still moving between mouse and keyboard. You might need to look down to make sure you are pressing the right keys, and it takes time to move your hands back and forth. A better way, please?

Using a macro program I could write a macro that would do these steps for me. But then to initiate the macro, I would have to either click a 'macro' menu or press a non-standard key combination.

But... (and getting back to speech) if you were to assign this macro to a voice command, your life becomes much easier. You could click the file, say "Copy filename", wait a second, and then continue. Your hand doesn't move from the mouse. Better still, it's much easier to remember to say "copy filename" than it is to remember to press ctrl-shift-alt-c, a combination which is likely to be available but also requires you to let go of the mouse and perhaps look down to see what keys you are pressing.
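A minimal sketch of such a "copy filename" command, using the F2 and Ctrl-C keystrokes described above. The final Esc (to leave rename mode without changing anything) is my addition, as are the `explorer.exe` foreground condition and the SendKeys-style notation; treat all three as assumptions to verify.

```xml
<speechMacros>
  <command priority="100">
    <!-- Assumed condition: only active while Windows Explorer (explorer.exe) is in front -->
    <appIsInForeground processName="explorer.exe"/>
    <listenFor>copy ?the filename</listenFor>
    <!-- F2 puts the selected file into rename mode with its name highlighted -->
    <sendKeys>{F2}</sendKeys>
    <!-- Ctrl+C copies the highlighted name to the clipboard -->
    <sendKeys>^c</sendKeys>
    <!-- Esc leaves rename mode without renaming the file -->
    <sendKeys>{ESC}</sendKeys>
  </command>
</speechMacros>
```

On Windows 7, F2 highlights the name without its extension, so that is what ends up on the clipboard.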

Through a combination of Windows Speech Recognition, Windows Speech Macros, and AutoHotkey (a macro program), I am able to do this now. I have commands for copying files, opening a DOS window at a folder, opening the Word Paragraph dialog box (directly to the tab of my choice), zooming in and out, and of course a print now command.

Additionally, I started to notice that I sometimes change my speech when working quickly. So I might say "Print it now" instead of just "Print now". My macros are smart enough to figure out that it means the same thing. The computer is accommodating the fact that I'm not a computer, and that I can't remember how to do everything perfectly every time. The computer is working for *me*.

It requires a lot of investment on my part, and I'm continuously updating my macros. The point, however, is that the masses will not put this much work into their computer setup, never mind become technical enough to do it. A next-generation voice command system will know that, no matter where I am in Windows, when I say "new email", it should open Outlook and press Ctrl-N. It should also be flexible enough to know that if a browser window is open and I'm logged into Gmail, saying "new email" should start a new email within Gmail instead of through Outlook. (In fact, this is how my speech macros behave.) Customizing which target listens for what, and when, requires some input on my part, but why should anyone need to know how to write code to accomplish so many everyday tasks?
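As one possible sketch of a context-free "new email" command, the macro below swaps in Outlook's `/c ipm.note` command-line switch instead of switching to Outlook and pressing Ctrl-N; the switch opens a new message form whether or not Outlook is already running. It assumes Windows can find `outlook.exe` by name (use the full path if it can't), and it does not cover the Gmail case, which would need its own condition.

```xml
<speechMacros>
  <command priority="100">
    <listenFor>new email</listenFor>
    <!-- /c ipm.note tells Outlook to open a blank mail message -->
    <!-- Assumption: outlook.exe is resolvable by name; otherwise give its full path -->
    <run command="outlook.exe" params="/c ipm.note"/>
  </command>
</speechMacros>
```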

Finally, as I hinted in the previous paragraph, a proper voice-command system should not be so literal about context. If I'm in Word and I want to create a new email, I don't want to have to say "Start Outlook, new email". I don't want to have to remember that Outlook is already open, in which case I should say "Switch to Outlook, new email". I want to say "New email" and have my computer, with its gazillion circuits, figure it out for me.

Getting back to Word as an example, I think the Fluent interface introduced in Word 2007 is great. There are some shortcomings, but I like it better than plain menus. You can also use Windows Speech Recognition to control Word out of the box. Whatever you see, you can click. So, for example, if I'm in a table and I want to add a new row below, I can say "Click Layout tab" and then "Click Insert Below".

That's great, but why do I even have to remember that the table insert command is in the Layout tab? Why can't I just say "Insert a new row below" and the computer will figure it out? Think about it from a non-technical user's perspective. He's created a table. He wants a new row. He's thinking "new row". Why can't he just say "new row"?

There are some programs out there that try to do this for you, but as is often the case, this is something that should be built into the operating system. Unless you are dedicated to learning such a program (you have a disability, say), you will stick with the tried and true: mouse and keyboard. That is unfortunate, because there is joy in being able to tell the computer "Empty the trash." This is what computers were meant to be.

Looking forward to this future.

Monday, August 23, 2010

Clicking with Windows Speech Macros - Update

My trip down speech recognition lane started with the discovery of the free Camera Mouse program. I started using Camera Mouse to control the mouse pointer, but you could not click, at least not easily. I experimented with various ways of accomplishing a click and settled on using Windows 7 Speech Recognition and speech macros you can install. (This PDF and other downloads at:

After several weeks I began to try different ways of improving how the macros work. Here is an update to the downloadable mouse click macros. If you want to jump right into them, scroll to the bottom of this post.

Keywords Used

Because Windows Speech Recognition is trying its best to understand everything you say as a command, it was tough to find a unique word that means "click". This is because, if you use "click" as your word to listen for, Windows Speech Recognition will often confuse it with other commands of its own. In the end I chose "iclick" because it was different enough, but even then it wasn't 100% foolproof, and sometimes Windows thought I said "iright" instead of "iclick".

I've been devouring the MSDN speech documentation and Rob's Rhapsody, and I soon discovered a seemingly undocumented speech macro attribute called "priority", which appears to tell Speech Recognition to give your macros precedence over what Windows wants to do.

Because of the priority attribute, I'm able to use the words "click", "right-click" and "double-click" in my macros and they work almost flawlessly. Example left click:

<command priority="100">
<listenFor>?mouse click</listenFor> <mouse button="left" command="click" />
</command>

What's the question mark for?

You'll notice that the "listenFor" definition has "?mouse click". The question mark indicates that the word is optional. This is because when I'm "talking" to my computer, I sometimes say "mouse click" and sometimes just "click". You might want to adjust the "listenFor" to your own habits. For example, if you say "left-click" often, you could write:

<listenFor>?left click</listenFor>

What else is new?

I added a triple-click command. This was accomplished by clicking three times with a slight delay between each click.

I also added a Ctrl-Click command, which means you can open multiple windows in a browser (for example) without touching the keyboard. This is done by adding a modifierKeys attribute to the mouse statement:

<mouse button="left" command="click" modifierKeys="^" /> 

My current mouse macros

Note: Be sure to replace "[YOURPATHTOAUTOHOTKEYMACROS]" with the path to the two AutoHotkey scripts (mousedown.ahk and mouseup.ahk). These are separate scripts used to perform the hold-down and release mouse button actions.

<speechMacros>

<command priority="100">
<listenFor>?mouse click</listenFor> <mouse button="left" command="click" />
</command>

<command priority="100">
<listenFor>?mouse control-click</listenFor> <mouse button="left" command="click" modifierKeys="^" />
</command>

<command priority="100">
<listenFor>?mouse double-click</listenFor> <mouse button="left" command="dblclick" />
</command>

<command priority="100">
<listenFor>?mouse triple-click</listenFor> <mouse button="left" command="click" />  <waitFor seconds="0.05" /> <mouse button="left" command="click" />  <waitFor seconds="0.05" /> <mouse button="left" command="click" />
</command>

<command priority="100">
<listenFor>?mouse context ?menu</listenFor> <listenFor>?mouse right-click</listenFor> <mouse button="right" command="click" />
</command>

<command priority="100">
<listenFor>?mouse hold</listenFor> <run command="[YOURPATHTOAUTOHOTKEYMACROS]\mousedown.ahk" params=""/>
</command>

<command priority="100">
<listenFor>?mouse release</listenFor> <run command="[YOURPATHTOAUTOHOTKEYMACROS]\mouseup.ahk" params=""/>
</command>

</speechMacros>

As always feel free to send me questions.

Monday, August 9, 2010

Using Windows 7 Speech Recognition - Introduction

Recently I've been trying to use Windows 7 Speech Recognition to control my computer. I can still type, but constant soreness is forcing me to find alternate ways to perform actions. I'm a very efficient computer user, so the new way of doing things has to be pretty good.

I've tried speech recognition over the years, starting with Macintosh, but nothing quite fit for me.

Since my current employer uses Windows 7, I decided to turn it on and see how well it works.

The short conclusion is: pretty well. Many people are saying that Windows 7 Speech Recognition is just as good as Dragon NaturallySpeaking. I've never used Dragon, so I can't say. My needs are also different in that, for the moment, I just want to control my computer; I don't want to try dictation (though it's a goal down the road).

The essential part of the solution for me was installing Windows Speech Macros. Why these aren't simply included with Windows 7, I can't guess, but they go a long way toward tailoring speech control to your needs.

Basic Setup Instructions
These instructions work for Windows Vista and Windows 7.
  1. Connect a microphone to your computer.
  2. Turn on Speech Recognition (Control Panel > Speech Recognition, click Start Speech Recognition) and follow the wizard.
  3. Follow the speech tutorial and try out some commands to get the hang of it. For example, you can say “Show Desktop” to minimize all windows, or “Start” to “click” the Start menu.

Commands I commonly use 'out of the box':
  • Show Desktop
  • Close Window (actually closes the entire application, even if multiple tabs are open)
  • Minimize Window
  • Maximize Window
  • Open Control Panel (or anything else in the Start menu)
  • Copy, Cut, Delete, Paste, Undo, OK, Cancel
  • Any command in the Windows Explorer toolbar (such as New Folder), as long as it is visible
  • Click [button] - example, if a dialog box is visible, I can say Click Save.

Tips and Gotchas
  • Lower your expectations. Don't try to control the entire UI right from the start. Get used to making sure a few commands work before you try to use a few more.
  • If you say something that has multiple targets given the current context, you will see numbers appear on screen. If you see these, say the number out loud. The number will change to OK, and then you say OK to activate the command. For example, if you are in Microsoft Word 2007, and the Home tab is visible, and you say Paste, there are actually two paste commands available.
  • Controlling your computer with verbal commands is a cognitive challenge once you get into it. You may have noticed that you find it easier to perform commands when given a list of options - you are relying on recognition of a command and you are not having to recall it. However, with voice commands, you have to recall everything, which means it gets harder as you try to do more. I keep a list of commands taped to my monitor, and when I've got one down pat I remove it and put something else in its place.

I'll post information on what I've done with Windows Speech Macros to make my life easier in a future post. Have fun for now!