First Thought
Siri: Designing the Invisible Interface
Last week’s First Thought focused on how a combination of new interfaces and integrated services provides a foundation for disruption. Illustrated first in the touch-based smartphone revolution, with the iPhone leading the way, this argument can be extrapolated further. The “killer app” that makes something like an “iTV” a viable, premium market opportunity is that very combination of disruptive interface with integrated service, only this time, I would propose, the interface will be Siri, Apple’s natural language assistant, rather than touch.
We won’t all have an opportunity to bring “disruptive interfaces” to scale like Apple. On the other hand, as I've said earlier:
...recognition of this disruption by strategy, design and product teams of any size, combined with fluid network integration into useful packages of functionality and content, is something we can all do. Most product and service teams have jumped on the “touch” and app trains. Be ready to jump on the natural language train because Apple is likely to open Siri to developers after they iron out its kinks.
So let’s ponder a few questions. First, will Apple ultimately open Siri to the App Store developer community? Will they enable Mac OS X with Siri capability? Will they launch an Apple-branded physical television with Siri or at least Siri-enable their Apple TV companion box? Finally, what will all this mean for us as strategists, designers and product developers?
Let’s answer these four questions rapid fire, one by one.
Question #1: Will Apple open Siri to the App Store developer community? Apple has not shared news of any potential “opening” of Siri APIs to developers. In fact, they’ve stated quite clearly that they are not planning to bring Siri to older-generation devices like the iPhone 4 or 3GS and don’t necessarily plan to open Siri to new non-Apple apps. The thing is, what Apple says is often not indicative of where they are going. We can remember back just a few years ago, at the iPhone’s launch in 2007, when Jobs himself claimed they were not enabling developers to build apps and pushed them to the web. It took about a year for Apple to build out infrastructure, stabilize iOS and ultimately launch the ridiculously successful iOS App Store. It’s a fair bet that Siri will be opened up to developers, albeit probably with some limitations, and that this will spur a host of fascinating new applications and experiences.
Question #2: Will they enable Mac OS X with Siri capability? Apple already has speech recognition built into OS X. You probably just didn’t know about it. Of course, Speech Commands in its current form leaves something to be desired given it’s missing the whole “natural” part in language recognition. You have to spend far too much time setting up very specific commands for a result that never seems to pay off. Given that a similar, but lesser, capability already exists on the Mac, we can be fairly confident Siri is coming. The question then becomes “When is it coming?” While it seems a long way off, we can probably bet on the next full version of the Mac OS following Lion. We don’t even have a codename for 10.8 yet (Liger?) so this could be a ways off. That said, it probably isn’t a huge issue because, unlike phones, laptops and desktops are devices we use for extended periods of time, working on potentially very private subjects in the presence of multiple other people. We don’t all want to be talking to our devices in the middle of a busy office or Starbucks.
Question #3: Will they launch an Apple-branded physical television with Siri or at least Siri-enable their Apple TV companion box? There are signs to suggest Apple won’t be entering the television business. If you think about it, the reason they built the iPod rather than enter the digital camera business–an industry where they had already innovated with the QuickTake–is that they saw the digital music space lacking compelling offerings. There was also a belief that digital music players could be a massive growth area, since few consumers had them, whereas lots of people already had digital cameras. If we consider TVs, there are many competitors producing quality products at remarkably low prices. It’s a tough market, similar to where cameras were back when Jobs and Apple decided not to compete there. It’s also hard to ignore that televisions are products which people tend to keep around for years. While growth with TVs might be coming in emerging markets, we can’t say the same for the US or Europe.
All that said, it’s on record in the Jobs biography that he claimed to have “cracked TV”. If it’s true that he believed this and communicated it to Apple’s executive team, we can expect they will be working very hard to make this a reality. My hunch machine tells me it will be a next-generation Apple TV box, not the entire TV, that somehow integrates cable services, iCloud, music, etc., in one device controlled effortlessly via Siri.
Question #4: What will all this mean for us as strategists, designers and product developers? This is the big question. While we can speculate that the App Store will become “Sirified”, assume that the next version of Mac OS X will be natural voice-enabled, and even believe that Siri is the perfect way to interact with our TVs, it’s much more complicated to think about how designers and development teams with expertise in visually based customer experiences will make the leap (or not) to natural voice.
In a couple of recent pieces worth reading, GigaOM’s Kevin Tofel dubbed Siri the invisible interface of the future while John Pavlus of Fast Company noted it’s The Ultimate Interface: None at All. I keep thinking of the not-too-distant future where our mobile communications devices need be no larger than a Star Trek: The Next Generation badge. Given Apple’s willingness to reduce the interface of the iPod shuffle to nothing more than a single button, it’s completely reasonable to imagine where this could be going. If Siri is “invisible” yet also the “ultimate” because of its lack of form, how the hell do we design it?

One good thing is that over the last 15 years we’ve had to develop new methods of design and development for the web, for mobile, for tablets, for services and now for integrated multichannel experiences. This has prepared us to effectively manage the heuristic analysis and generative concepting required for multi-modal devices set in a variety of contexts. While it’s complex, experience strategy and design is better equipped than ever to manage it. Many voice-enabled devices will most likely just add another complementary interaction mode–natural voice–to the palette.
A second valuable fact is that while “natural voice” hasn’t been a massive focus of user experience professionals, “voice” has been getting quite a bit of attention. The notion of a VUI, or Voice User Interface, is ready for prime time and we should learn from those who have been focused on it to date. One example: Cohen, Balogh and Giangola’s book titled Voice User Interface Design. I’ll be honest: I have not read this book so I can’t wholeheartedly recommend it. That said, the first two authors listed are key players at Nuance Communications, the provider of a key piece of technology which drives Dragon Dictation, a whole host of enterprise voice solutions and Apple’s Siri. I expect this book, and others like it, to become much more popular with the mainstream customer experience field rather than just its fringe.
A third fact is that Siri achieves a “natural voice” rather than just a “voice” interface by responding to factors (good) customer experience professionals already obsess about: context and recognition. Context is the understanding of “where”, “when” and “what” an interaction is about, while recognition further refines context by providing the “who”, a persistent memory of its primary user. By articulating these, we can help shape how a Siri-like assistant behaves. Tools like journey maps, experience models and the definition of the typical roles of key channels already help us achieve much of what we’ll need to know to also design for natural voice, even if we can’t see it.
As customer experience professionals, we’re in exciting times, as the rapid introduction of new channels (search, social, mobile, tablet, etc.) and new user interfaces (touch, gestural, natural voice, etc.), along with the complexities of managing them, makes our capabilities more important than ever. This is great for job security but also great because we’re in a period of growth as a field which, hopefully, precedes a golden era of significantly better experiences for people. We can expect to extend tools we already use into the design of voice assistants for our products and services, as well as create new tools. That is, at least until the assistants become capable of designing themselves.
“Siri, design a voice interface for this.”