Diff for "Voice Input" - 4ourth Mobile Design Pattern Library

Differences between revisions 9 and 10

Problem

A method must be provided to control some or all of the functions of the mobile device, or provide text input, without handling the device.

Solution

For decades, voice input has promised to relieve users of all sorts of systems from executing complex commands in unnatural or distracting ways. Some specialized, expensive products have met these goals in fields such as aviation, while on desktop computers voice input has yet to fulfill its promise or gain wide acceptance.

Mobile, however, is uniquely positioned to exploit voice as an input and control mechanism, and has a uniquely high demand for it. Because the devices are so ubiquitous, many users with low vision or poor motor function (and therefore difficulty with manual entry) need alternative methods of input. Near-universal use and contextual requirements such as safety (for example, using navigation devices while operating a vehicle) demand eyes-off and hands-off control methods.

Lastly, many mobile devices are (or are based on) mobile handsets, so they already have speakers, microphones designed for voice-quality communications, and voice processing built into the device chipset.

Since most mobile devices are now connected, or are useful only when connected to the network, an increasingly practical option is to have a remote server perform all the speech recognition. This can be used even for fairly core functions, such as dialing the handset, as long as the function requires a network connection anyway. For mobile handsets, using the voice channel is especially advantageous, as no special effort is needed to gather or encode the input audio.
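The condition above (server-side recognition is acceptable when the function needs the network anyway) can be written as a simple routing rule. This is a minimal sketch under stated assumptions: `DeviceFunction`, its `needs_network` flag, and `choose_recognizer` are hypothetical names, not any platform's API, and the returned strings only label where recognition would run.

```python
from dataclasses import dataclass


@dataclass
class DeviceFunction:
    """A voice-controllable device function (hypothetical model)."""
    name: str
    needs_network: bool  # True if the function cannot be performed offline anyway


def choose_recognizer(func: DeviceFunction, network_available: bool) -> str:
    """Decide where speech recognition runs for this function.

    Functions that already need the network (such as dialing) can hand
    recognition to a remote server at no extra cost in availability.
    Functions that work offline keep on-device recognition so they still
    work without a connection.
    """
    if func.needs_network:
        # No network means the function itself cannot run, so voice control is moot.
        return "server" if network_available else "unavailable"
    return "local"
```

For example, a "dial" function marked `needs_network=True` is routed to the server when a connection exists, while an offline function stays on the device regardless of connectivity.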

Variations

Voice Command - Use the voice to input a limited number of commands. This is akin to Accesskeys, but with a larger set of commands. Much like gestural or other touch commands, voice commands have a serious affordance problem: they are not shown on screen, and generally cannot be, due to limited space.
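Like Accesskeys, a voice command set is essentially a fixed table from inputs to actions; the difference is that the table is not limited by the number of physical keys. A minimal sketch, assuming the recognizer has already produced text: the phrases and action names in `COMMANDS` are hypothetical examples, not a real command set.

```python
from typing import Dict, Optional

# Hypothetical command vocabulary: spoken phrase -> action name.
COMMANDS: Dict[str, str] = {
    "call home": "dial_home",
    "open messages": "launch_messages",
    "read new mail": "read_mail",
    "volume up": "volume_up",
    "volume down": "volume_down",
}


def normalize(utterance: str) -> str:
    """Lower-case and collapse whitespace so minor variations still match."""
    return " ".join(utterance.lower().split())


def match_command(utterance: str) -> Optional[str]:
    """Return the action for a recognized phrase, or None if it is not a command."""
    return COMMANDS.get(normalize(utterance))
```

Because the vocabulary is closed, anything outside the table returns `None`, which the caller can treat as "not understood" rather than guessing.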

Text Input - Use speech recognition to type with the voice. (The term "voice recognition" implies user-dependent input.) Always use user-independent systems for general use; build user voice profiles (user-dependent recognition) only when needed, such as for specialized languages or libraries of words.

A detailed discussion of the methods used for recognizing speech is beyond the scope of this book, and is covered in detail in a number of other sources.

Interaction Details

Mobile devices usually rely on key or touch input and visual output, so voice input must be initiated through one of these methods. To support low-vision users and eyes-off use cases, assign activation to a key or key combination. A common choice is a key already associated with audio, such as a long press of the speakerphone key.

When voice input becomes active, the device should play a Tone, or a voice readback or reminder of the current condition (e.g., "Say a command"). After this, the system accepts input.

When input is complete, the system should usually read back what was entered.

During this readback, much as with pen input where the user is given time to make corrections, saying "no" should either wipe the entry or let the user select the intended entry from a list.
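The activation, prompt, input, readback, and correction steps above form a small state machine. The sketch below is hypothetical: `VoiceSession`, its states, and its methods are illustrative names, and `say()` only records output in a list where a real device would call its text-to-speech engine.

```python
from enum import Enum, auto
from typing import List, Optional


class State(Enum):
    IDLE = auto()
    PROMPTING = auto()   # playing the Tone or "Say a command" reminder
    LISTENING = auto()   # microphone open, accepting input
    READBACK = auto()    # reading back what was recognized (correction window)
    CORRECTING = auto()  # user said "no"; offering a list to choose from


class VoiceSession:
    """Hypothetical voice-input session following the interaction steps."""

    def __init__(self) -> None:
        self.state = State.IDLE
        self.pending: Optional[str] = None
        self.alternatives: List[str] = []
        self.committed: List[str] = []
        self.spoken: List[str] = []  # audio output, recorded here for illustration

    def say(self, text: str) -> None:
        # Stand-in for a text-to-speech call on a real device.
        self.spoken.append(text)

    def long_press(self) -> None:
        """Activation key, e.g. a long press of the speakerphone key."""
        if self.state is State.IDLE:
            self.state = State.PROMPTING
            self.say("Say a command")

    def prompt_finished(self) -> None:
        if self.state is State.PROMPTING:
            self.state = State.LISTENING

    def recognized(self, best: str, alternatives: List[str]) -> None:
        """Called with the recognizer's best guess and any other candidates."""
        if self.state is State.LISTENING:
            self.pending = best
            self.alternatives = alternatives
            self.state = State.READBACK
            self.say(best)  # read back what was entered

    def accept(self) -> None:
        """User confirms (or the correction window times out)."""
        if self.state is State.READBACK:
            self._commit()

    def reject(self) -> None:
        """User says "no" during the readback."""
        if self.state is not State.READBACK:
            return
        if self.alternatives:
            self.state = State.CORRECTING
            self.say(", ".join(self.alternatives))
        else:
            # Nothing to choose from: wipe the entry and listen again.
            self.pending = None
            self.state = State.LISTENING

    def choose(self, index: int) -> None:
        if self.state is State.CORRECTING:
            self.pending = self.alternatives[index]
            self._commit()

    def _commit(self) -> None:
        if self.pending is not None:
            self.committed.append(self.pending)
        self.pending = None
        self.alternatives = []
        self.state = State.IDLE
```

Keeping the correction window as its own state makes it explicit that nothing is acted on until the user has heard the readback and either accepted it or chosen a replacement.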

For Voice Command, provide as much interactivity as practical. When controlling the device OS, it must be possible to perform all the basic functions, by offering controls such as Directional Entry and the ability to activate menus. This may also mean that a complete scroll-and-select style focus assignment system is required, even for devices that otherwise rely purely on touch or pen input.
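A scroll-and-select focus system lets spoken directional commands drive an interface that would otherwise be touched. A minimal sketch, assuming a single vertical list of focusable items; `FocusList` and the command words "up", "down", "select", and "menu" are hypothetical choices, not a defined vocabulary.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FocusList:
    """Hypothetical scroll-and-select focus model driven by spoken commands."""
    items: List[str]
    focus: int = 0
    menu_open: bool = False
    activated: List[str] = field(default_factory=list)

    def handle(self, command: str) -> bool:
        """Apply one directional or menu command; return False if unrecognized."""
        if not self.items:
            return False
        if command in ("down", "next"):
            # Clamp at the last item instead of wrapping around.
            self.focus = min(self.focus + 1, len(self.items) - 1)
        elif command in ("up", "previous"):
            self.focus = max(self.focus - 1, 0)
        elif command == "select":
            self.activated.append(self.items[self.focus])
        elif command == "menu":
            self.menu_open = not self.menu_open
        else:
            return False
        return True
```

Clamping rather than wrapping is one design choice; it keeps an eyes-off user from silently jumping from the bottom of a list back to the top.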

Provide an easy method to abandon the Voice Input function and return to keyboard or touch screen entry without abandoning the entire current process. The best method reverses the command used to enter the mode, such as the press-and-hold of the speakerphone key.
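Leaving voice mode by reversing the entry command, while keeping the work in progress, can be sketched as a toggle on the input field. This is a hypothetical model: `EntryField` and `speakerphone_long_press` are illustrative names, and the field simply keeps its text when the mode changes.

```python
from dataclasses import dataclass


@dataclass
class EntryField:
    """A text field accepting both key/touch and voice input (hypothetical)."""
    text: str = ""
    voice_mode: bool = False

    def speakerphone_long_press(self) -> None:
        # The same gesture enters and leaves voice mode, so leaving simply
        # reverses entering. The field's contents are never discarded.
        self.voice_mode = not self.voice_mode

    def voice_text(self, words: str) -> None:
        # Recognized speech is only appended while voice mode is active.
        if self.voice_mode:
            self.text += words

    def key_text(self, chars: str) -> None:
        # Keyboard or touch entry continues on the same field.
        self.text += chars
```

A user can dictate part of a search, long-press again to leave voice mode, and finish typing without losing what was already entered.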

Presentation Details

Voice input should also have a visual component, to support glancing at the device or completing the task by switching to a hands-on, eyes-on mode.

Hints for activating voice input should be provided on screen, using common shorthand icons when possible. When space allows, such as for text input into a single field (e.g., a search field), provide additional on-screen instructions so first-time users can become accustomed to the functionality.

Antipatterns

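Audio systems and processing cannot be relied on to be full duplex, so do not get in the user's way with responses that come too quickly. On a half-duplex audio path the device cannot listen while it is playing a prompt or readback, so speech that overlaps the response may be lost.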
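One way to respect a half-duplex audio path is to serialize playback and capture so a response waits until the microphone has closed. A minimal sketch using Python's `asyncio.Lock`; `HalfDuplexAudio`, `play`, and `listen` are hypothetical names, and `asyncio.sleep` stands in for real playback and recording time.

```python
import asyncio
from typing import List


class HalfDuplexAudio:
    """Hypothetical audio path that cannot play and record at the same time."""

    def __init__(self) -> None:
        # Create this inside a running event loop (e.g. within asyncio.run).
        self._lock = asyncio.Lock()
        self.log: List[str] = []

    async def play(self, prompt: str, seconds: float) -> None:
        async with self._lock:  # wait until any capture has finished
            self.log.append(f"play:{prompt}")
            await asyncio.sleep(seconds)  # stand-in for real playback time
            self.log.append("play:done")

    async def listen(self, seconds: float) -> None:
        async with self._lock:  # never open the mic over a playing prompt
            self.log.append("mic:open")
            await asyncio.sleep(seconds)  # stand-in for capturing speech
            self.log.append("mic:closed")


async def demo() -> List[str]:
    audio = HalfDuplexAudio()
    # The response is requested while the user is still speaking; the lock
    # makes it wait instead of cutting off the input.
    await asyncio.gather(audio.listen(0.01), audio.play("OK", 0.0))
    return audio.log
```

Running `asyncio.run(demo())` records the microphone opening and closing before the response plays, rather than the two overlapping.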

Revision 9 as of 2011-04-05 01:08:39 (size 3853), editor shoobe01.
Revision 10 as of 2011-04-05 01:14:55 (size 4430), editor shoobe01.