Thursday, April 21, 2011

Editing tables with speech recognition and Word

Technology: Microsoft Word 2007, Windows Speech Recognition, Windows Speech Macros

If you read my previous post on speech interfaces, you know I am a fan of the Word Ribbon (or Fluent) interface but also gripe about the compromises made in the design. My biggest gripe is when working with elements whose menus are contextual. Take tables as an example. If the Home tab is visible and I want to add a new row above the current row, I have to find and click the Layout tab, and then click "Insert row above".

Doesn't seem like a big deal, but as soon as you click off the table, the Layout tab disappears. Click back on the table, and if you want to insert another row, you have to click the Layout tab again. The Ribbon interface is jumping around and doesn't understand my mind's context.

To improve my Word efficiency and reduce clicking, my goal was to be able to say "Insert row above" any time I wanted. At first I used existing Word keyboard shortcuts, and when none existed, I created Word macros and assigned them keystrokes, all of which I called from Windows Speech Macros. However, I soon realized that I would have to replicate these macros on all the computers I use regularly (2 desktops at work, 1 laptop, 2 laptops I borrow from the kids when my laptop is at work). I sought a better way to maintain it all. Did Word have standard keyboard combos for every command?

It turns out, every program does. You press the Alt key and get access to menus. Duh. I had completely forgotten about such a basic feature. In Microsoft Office programs that use the Ribbon, pressing Alt reveals a keyboard combination for everything in the Ribbon. Sometimes 2 keys must be pressed in a row.

For example, to make something bold, you can press Alt-H, 1. (I know you can press Ctrl-B, but it's an example.) To apply strikethrough, whose keyboard shortcut I can never remember (if there is one), you can press Alt-H, 4.

Getting back to the table example, assuming a table already exists and the cursor is blinking inside one of the cells, you can press Alt-J, L, A. (When you first hold down Alt, the Table Tools context menu set shows "JT" and "JL" - you only have to press Alt-J to begin the sequence.)

To convert these to Windows Speech Macros, you can create a speech macro called "apply strikethrough":

<command priority="100">
<listenFor>apply strikethrough</listenFor>
<sendKeys>{ALT}H4</sendKeys>
</command>
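For anyone starting from scratch, here is a minimal sketch of how a command like this might sit inside a complete macro file. It assumes the `speechMacros` root element that the Windows Speech Macros editor normally generates. The "make it bold" phrase is made up for illustration and reuses the Alt-H, 1 sequence mentioned earlier.

```xml
<speechMacros>
  <!-- Strikethrough via the Ribbon key tips: Alt, H, 4 -->
  <command priority="100">
    <listenFor>apply strikethrough</listenFor>
    <sendKeys>{ALT}H4</sendKeys>
  </command>
  <!-- Hypothetical phrase for illustration; Bold via Alt, H, 1 -->
  <command priority="100">
    <listenFor>make it bold</listenFor>
    <sendKeys>{ALT}H1</sendKeys>
  </command>
</speechMacros>
```

Each additional command is just another `command` element inside the same root.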

Coding shortcuts for the commands I use most often (insert row, insert column, delete row, table properties, show/hide gridlines, select table/row/column, etc.), I really fly when using Word. It makes it almost pleasant.

Here is a list of all the Word Fluent shortcuts I use, in a basic Windows Speech Macros format. Note that many will work in PowerPoint, Excel and other Office programs; most shortcuts are similar but not always exactly the same. Since your approach may differ, if you have a question on how to code it for your setup, send me an email.
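Because the key sequences are not always identical across Office programs, it may be worth restricting a command so it only fires when Word is in front. The following is a sketch only: it assumes the `appIsInForeground` condition and its `processName` attribute are available in your version of Windows Speech Macros, and that Word runs as `winword.exe`. Check both against your own setup.

```xml
<command priority="100">
  <!-- Assumption: limits this command to the Word process -->
  <appIsInForeground processName="winword.exe"/>
  <listenFor>insert row above</listenFor>
  <sendKeys>{ALT}jla</sendKeys>
</command>
```

With a condition like this in place, the same spoken phrase could be mapped to a different key sequence in another program.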


Table commands


<command priority="100">
  <listenFor>insert row above</listenFor>
  <sendKeys>{ALT}jla</sendKeys>
</command>
<command priority="100">
  <listenFor>insert row below</listenFor>
  <sendKeys>{ALT}jle</sendKeys>
</command>
<command priority="100">
  <listenFor>insert column before</listenFor>
  <sendKeys>{ALT}jll</sendKeys>
</command>
<command priority="100">
  <listenFor>insert column after</listenFor>
  <sendKeys>{ALT}jlr</sendKeys>
</command>
<command priority="100">
  <listenFor>merge cells</listenFor>
  <sendKeys>{ALT}jlm</sendKeys>
</command>
<command priority="100">
  <listenFor>split table</listenFor>
  <sendKeys>{ALT}jlq</sendKeys>
</command>
<command priority="100">
  <listenFor>delete row</listenFor>
  <listenFor>delete rows</listenFor>
  <sendKeys>{ALT}jldr</sendKeys>
</command>
<command priority="100">
  <listenFor>delete column</listenFor>
  <listenFor>delete columns</listenFor>
  <sendKeys>{ALT}jldc</sendKeys>
</command>
<command priority="100">
  <listenFor>?show table properties</listenFor>
  <sendKeys>{ALT}jlo</sendKeys>
</command>
<command priority="100">
  <listenFor>?show ?hide table gridlines</listenFor>
  <sendKeys>{ALT}jltg</sendKeys>
</command>



Track Changes


<command priority="100">
  <listenFor>next change</listenFor>
  <sendKeys>{ALT}rh</sendKeys>
</command>
<command priority="100">
  <listenFor>accept change</listenFor>
  <sendKeys>{ALT}rac</sendKeys>
</command>





Zooming


<command priority="100">
  <listenFor>zoom ?to whole page</listenFor>
  <sendKeys>{ALT}w1</sendKeys>
</command>
<command priority="100">
  <listenFor>zoom ?to page width</listenFor>
  <sendKeys>{ALT}wi</sendKeys>
</command>




As always, feel free to give me a shout if you have questions or new ideas. If I can't help, someone out there can.

Wednesday, April 13, 2011

A voice command philosophy

I don't use voice recognition on my Windows 7 computer for speech-to-text. I have touch-typing so ingrained in me that it's very difficult to switch. (I tried a few times with not so great results.) I know one day I will work harder at it, but for now I'm content with using voice to control my computer only.

When it works right, voice is such a natural way to command your computer that it makes using one pleasant. What is the right way? That will differ from person to person, but suffice it to say that a switch in philosophy on the part of voice command designers is necessary to usher in the next big thing in voice interface control.

Take printing as an example. To open the print dialog, you could press Ctrl-P, if you knew the shortcut. If you didn't, you would stumble into the File menu, click Print, and then click OK. When toolbars first became popular, especially in Word, they added a little picture of a printer, which you immediately thought of as a shortcut to the Print dialog. The problem was that, when you clicked it, some programs would think you meant "Print now". After all, the toolbar is for shortcuts, right?

Except that's not how most people were thinking. A "print now" command is more of a macro than a toolbar command. Most people thought clicking the print icon would bring up the Print dialog, because that's how it worked before. Unfortunately, for a few years, many trees died for nothing. Finally someone somewhere set this right, and now clicking a Print button brings up the Print dialog. Behavior matches user expectation.

But then what if you wanted a "Print Now" macro, i.e., one that clicks Print and then OK for you? What you really want is to be able to do either of the following actions:

  • "Show me the print dialog so I can select options before sending it to the printer"
  • "Print now using current settings"

Taking Word 2007 as an example, since there is no such command to print now, you could make a macro, or you could program an external macro utility to press the keys for you.

This works fine for a few macros, if you are inclined to create them. However, if you start creating a lot of these for different programs, they become impossible to track, and the efficiencies gained in using a "print now" macro are outweighed by trying to remember that you assigned this macro to Ctrl-Alt-P. You would also have to remember that in Photoshop that key combination already performs a command. It becomes a lot of mental work to use something that is supposed to make your life easier.

Using language commands is different. If set up the right way, language commands that invoke multiple steps can be incredibly empowering and reduce drudgery. You end up performing tasks, not substeps of tasks. I'm old enough to remember when my mother had to squeeze wet clothes through the wringer before hanging them up to dry; now washing machines all have spin cycles.

One example: I sometimes want to copy a filename from Windows Explorer. Steps: click the file, wait, click it again, Windows Explorer enters "rename file" mode; right-click and then click Copy. You can press F2 and Ctrl-C to speed things up. Your hands and mouse are moving a lot. You might need to look down to make sure you are pressing the right keys. It takes time to move your hands back and forth. A better way please?

Using a macro program I could write a macro that would do these steps for me. But then to initiate the macro, I would have to either click a 'macro' menu or press a non-standard key combination.

But... (and getting back to speech) if you were to assign this macro to a voice command, your life becomes much easier. You could click the file, say "Copy filename", wait a second, and then continue. Your hand doesn't move from the mouse. Better still, it's much easier to remember to say "copy filename" than it is to remember to press Ctrl-Shift-Alt-C, a combination which is likely to be available but also requires you to let go of the mouse and perhaps look down to see what keys you are pressing.
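A minimal sketch of such a command in Windows Speech Macros format might replay the F2 and Ctrl-C keystrokes described above, then press Enter to leave rename mode with the name unchanged. The key names `{F2}`, `{CTRL}` and `{ENTER}` are assumed to follow the same `sendKeys` conventions as `{ALT}`, and a short pause between keys may be needed on a slower machine.

```xml
<command priority="100">
  <listenFor>copy filename</listenFor>
  <!-- F2 enters rename mode with the name selected,
       Ctrl-C copies it, Enter exits without renaming -->
  <sendKeys>{F2}{CTRL}c{ENTER}</sendKeys>
</command>
```

This assumes the file is already selected in Windows Explorer when you speak the command.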

Through a combination of Windows Speech Recognition, Windows Speech Macros, and AutoHotkey (a macro program), I am able to do this now. I have commands for copying files, opening a folder in a DOS window, opening the Word paragraph dialog box (directly to the tab of my choice), zooming in and out, and of course a print now command.

Additionally, I started to notice that I sometimes change my speech when working quickly. So I might say "Print it now" instead of just "Print now". My macros are smart enough to figure out that it means the same thing. The computer is accommodating the fact that I'm not a computer, and that I can't remember how to do everything perfectly every time. The computer is working for *me*.
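In Windows Speech Macros, one way to get this kind of flexibility is to mark a word as optional with a leading question mark in the `listenFor` phrase. A minimal sketch, assuming Ctrl-P opens the Print dialog and Enter accepts its current settings (other programs may behave differently):

```xml
<command priority="100">
  <!-- "?it" is optional: matches "print now" and "print it now" -->
  <listenFor>print ?it now</listenFor>
  <sendKeys>{CTRL}p{ENTER}</sendKeys>
</command>
```

Completely different wordings can be covered by adding more `listenFor` lines to the same command.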

It requires a lot of investment on my part, and I'm continuously updating my macros. However, the point is that most people will not put this much work into their computer setup, never mind acquire the technical skill needed to do it. A next-generation voice command system will know that, no matter where I am in Windows, when I say "new email", it should open Outlook and hit Ctrl-N. It should also be flexible enough to know that if a browser window is open and I'm logged into Gmail, saying "new email" should start a new email within Gmail instead of Outlook. (In fact this is how my speech macros behave.) Customizing which target listens to what, and when, requires some input on my part, but why should anyone need to know how to write code to accomplish so many everyday tasks?

Finally, as I hinted in the previous paragraph, a proper voice-command system should not be so literal with context. If I'm in Word and I want to create a new email, I don't want to have to say "Start Outlook, new email". I don't want to have to remember that Outlook is already open, in which case I should say "Switch to Outlook, new email". I want to say "New email" and have my computer with a gazillion circuits figure it out for me.

Getting back to Word as an example, I think the Fluent interface introduced in Word 2007 is great. There are some shortcomings, but I like it better than plain menus. You can also use Windows Speech Recognition to control Word out of the box. Whatever you see, you can click. So for example, if I'm in a table, and I want to add a new row below, I can say "Click Layout tab" and then "Click Insert Below".

That's great, but why do I even have to remember that the table insert command is in the Layout tab? Why can't I just say "Insert a new row below" and the computer will figure it out? Think about it from a non-technical user's perspective. He's created a table. He wants a new row. He's thinking "new row". Why can't he just say "new row"?

There are some programs out there that try to do this for you, but as is often the case, this is something that should be built into the operating system, and unless you are dedicated to learning it (you have a disability, say), you will stick with tried and true (mouse and keyboard). Which is unfortunate, because there is joy in being able to tell the computer "Empty the trash." This is what computers were meant to be.

Looking forward to this future.