Last week’s First Thought focused on how a combination of new interfaces and integrated services provides a foundation for disruption. Illustrated first in the touch-based smartphone revolution, with iPhone leading the way, this argument can be extrapolated further. The “killer app” that makes something like an “iTV” a viable, premium market opportunity is that very combination of disruptive interface with integrated service, only this time, I would propose, the interface will be Siri, Apple’s natural language assistant, rather than touch.
We won’t all have an opportunity to bring “disruptive interfaces” to scale like Apple. On the other hand, as I've said earlier:
...recognition of this disruption by strategy, design and product teams of any size, combined with fluid network integration into useful packages of functionality and content, is something we can all do. Most product and service teams have jumped on the “touch” and app trains. Be ready to jump on the natural language train because Apple is likely to open Siri to developers after they iron out its kinks.
So let’s ponder a few questions. First, will Apple ultimately open Siri to the App Store developer community? Will they enable Mac OS X with Siri capability? Will they launch an Apple branded physical television with Siri or at least Siri-enable their AppleTV companion box? Finally, what will all this mean to us as strategists, designers and product developers?
Let’s answer these four questions rapid fire, one by one.
Question #1: Will Apple open Siri to the App Store developer community? Apple has not shared news of any potential “opening” of Siri APIs to developers. In fact, they’ve stated quite clearly that they are not planning to bring Siri to older-generation devices like the iPhone 4 or 3GS and don’t necessarily plan to open Siri to non-Apple apps. The thing is, what Apple says is often not indicative of where they are going. Remember that at the iPhone’s launch in 2007, Jobs himself claimed they were not enabling developers to build apps and pushed them to the web. It took about a year for Apple to build out infrastructure, stabilize iOS and ultimately launch the ridiculously successful iOS App Store. It’s fair to say Siri will be opened up to developers, albeit probably with some limitations, and this will spur a host of fascinating new applications and experiences.
Question #2: Will they enable Mac OS X with Siri capability? Apple already has speech recognition built into OS X. You probably just didn’t know about it. Of course, Speech Commands in its current form leaves something to be desired, given it’s missing the whole “natural” part of language recognition. You have to spend far too much time setting up very specific commands for a result that never seems to pay off. Given that a similar, but lesser, capability already exists on the Mac, we can be assured Siri is coming. The question then becomes “When is it coming?” We can probably bet on the next full version of the Mac OS following Lion. We don’t even have a codename for 10.8 yet (Liger?) so this could be a ways off. That said, the wait probably isn’t a huge issue because, unlike phones, laptops and desktops are devices we use for extended periods of time, working on potentially very private subjects in the presence of multiple other people. We don’t all want to be talking to our devices in the middle of a busy office or Starbucks.
Question #3: Will they launch an Apple-branded physical television with Siri or at least Siri-enable their AppleTV companion box? There are signs to suggest Apple won’t be entering the television business. If you think about it, the reason they built the iPod rather than enter the digital camera business–an industry where they had already innovated with the QuickTake–is that they saw the digital music space lacking compelling offerings. There was also a belief that digital music players could be a massive growth area, since few consumers had one, whereas lots of people already had digital cameras. If we consider TVs, there are many competitors producing quality products at remarkably low prices. It’s a tough market, similar to where cameras were back when Jobs and Apple decided not to compete there. It’s also hard to ignore that televisions are products which people tend to keep around for years. While growth in TVs might be coming in emerging markets, we can’t say the same for the US or Europe.
That all said, it’s on record in the Jobs biography that he claimed to have “cracked TV”. If it’s true that he believed this and communicated it to Apple’s executive team, we know they will be working very hard to make it a reality. My hunch machine tells me it will be a next-generation AppleTV box, not the entire TV, that somehow integrates cable services, iCloud, music, etc., in one device controlled effortlessly via Siri.
Question #4: What will all this mean to us as strategists, designers and product developers? This is the big question. While we can speculate the App Store will become “Sirified”, can absolutely assume the next version of Mac OS X will be natural voice-enabled, and might even believe that Siri is the perfect way to interact with our TVs, it’s much more complicated to think about how design and development teams with expertise in visually-based customer experiences will make the leap (or not) to natural voice.
In a couple of recent pieces worth reading, GigaOM’s Kevin Tofel dubbed Siri the invisible interface of the future while John Pavlus of Fast Company noted it’s The Ultimate Interface: None at All. I keep thinking of the not-too-distant future where our mobile communications devices need be no larger than a Star Trek: The Next Generation badge. Given Apple’s willingness to reduce the interface of the iPod shuffle to nothing more than a single button, it’s completely reasonable to imagine where this could be going. If Siri is “invisible” yet also the “ultimate” because of its lack of form, how the hell do we design it?
One good thing is that over the last 15 years we’ve had to develop new methods of design and development for the web, for mobile, for tablets, for services and now for integrated multichannel experiences. This has prepared us to effectively manage the heuristic analysis and generative concepting required for multi-modal devices set in a variety of contexts. While it’s complex, the field of experience strategy and design is better equipped than ever to manage it. Many voice-enabled devices will most likely just add another complementary interaction mode–natural voice–to the palette.
A second valuable fact is that while “natural voice” hasn’t been a massive focus of user experience professionals, “voice” has been getting quite a bit of attention. The notion of a VUI, or Voice User Interface, is ready for prime time and we should learn from those who have been focused on it to date. One example: Cohen, Balogh and Giangola’s book titled Voice User Interface Design. I’ll be honest: I have not read this book so I can’t wholeheartedly recommend it. That said, the first two authors listed are key players at Nuance Communications, the provider of a key piece of technology which drives Dragon Dictation and a whole host of enterprise voice solutions, and which powers Apple’s Siri. I expect this book, and others like it, to become much more popular with the mainstream customer experience field rather than just its fringe.
A third fact is that Siri achieves a “natural voice” rather than just a “voice” interface by responding to factors (good) customer experience professionals already obsess about: context and recognition. Context is the understanding of the “where”, “when” and “what” of an interaction, while recognition further refines context by providing the “who”, a persistent memory of its primary user. By articulating these, we can help shape how Siri behaves. Tools like journey maps, experience models and the definition of the typical roles of key channels already help us achieve much of what we’ll need to know to also design for natural voice, even if we can’t see it.
As customer experience professionals, we’re in exciting times: the rapid introduction of new channels (search, social, mobile, tablet, etc.) and new user interfaces (touch, gestural, natural voice, etc.), and the complexities of managing them, make our capabilities more important than ever. This is great for job security but also great because we’re in a period of growth as a field which, hopefully, precedes a golden era of significantly better experiences for people. We can expect to extend tools we already use into the design of voice assistants for our products and services as well as create new ones. That is, at least until they become capable of designing themselves.
“Siri, design a voice interface for this.”
Two pieces of financial news released last week caught my eye: RIM, maker of the once business-ubiquitous Blackberry, traded below book value while, simultaneously, Sony announced an expected $1.2 billion loss for fiscal year 2011. The once mighty Sony now looks like it will stop manufacturing TVs altogether. Two companies that were once leaders in product performance and profits in mobile phones and televisions, respectively, are struggling to justify their existence. These pieces of officially reported news are especially interesting in the context of the rumor of the moment: that Apple is finalizing a new TV to bring to market and that Steve Jobs, as quoted in his eponymously titled biography, had “finally cracked” TV.
While Sony’s and RIM’s situations have many differences, the companies also have a similarity: they were formerly believed to be leaders in design, i.e., the kind of design that produced high quality physical products typically sold at a premium. They held long-standing periods of dominance which are now crumbling around them. What happened?
This week, the illustrious Horace Dediu wrote a piece titled Revolutionary User Interfaces. In it, he uses market and financial performance data to show just how devastating the flexibility of multitouch has been to the mobile phone industry. The mighty have fallen and upstarts based on multitouch, and the benefits it affords, now control the vast majority of revenue and profit. To quote Dediu, “My hypothesis is that The Primary Cause for the shift of profits from Incumbents to Entrants has been the disruptive impact of a new input method.” He ends his analysis by suggesting that with Siri, Apple’s groundbreaking new natural language interface, “It looks like things are about to change all over again.”
Back in January of 2007, I wrote a piece purposely and provocatively titled iPhone - the death of product design. I presented Motorola’s iconic RAZR as the negative foil to everything that was different (and great) about the iPhone, but could just as easily have used a Blackberry Pearl. As noted in that piece, “iPhone’s large touch screen elegantly transforms it into whatever it needs to be: a keyboard, a widescreen movie viewer, a random access voicemail interface.” Also, the “integration of experience (with the ‘network’)... (was) considerably more important than (the) product design.”
I recognize, as Dediu does, that the change to an infinitely more flexible input method was critical, but I suggest that coupling it with this shift of focus from product to network was just as critical. These two interrelated factors ultimately drove the iPhone’s groundbreaking success. Touch wouldn’t have exploded if we didn’t have the addictive ability to download apps or connect at a moment’s notice. Moto and RIM were absolutely outclassed by the integrated breadth of what was being addressed in the iPhone: input method and larger product ecosystem.
The big strategic leap to iTV is to connect Siri–a disruptive input method–with what Gruber described, pointing to BloombergTV-like apps, as “Apps as the New Channels”–an integration with the network in bite-sized chunks. It’s magic. If you could control your entertainment center–still very much a hassle in 2011–through natural language, it might be enough to actually push people to pay a significant premium for an Apple-branded television.
Think about it. Touch interfaces, while good for many tasks, just aren’t that brilliant for running an entertainment device sitting across the room. Although all my iOS devices have the Remote app installed, I typically default to the surprisingly poorly designed physical Apple remote. Touch just doesn’t work that well when you’re mediating through a separate device. Sitting in my flat in London, I fiddle with a minimum of four remotes to control my Samsung television and a variety of devices.
But let’s stand in the future for a second.
Wouldn’t it be great to just speak the command, “Siri, are there any NBA games tonight? Put the Celtics game on.” Or, “Siri, play a slideshow of my photos of the Jurassic Coast.” What if we had complete access to any photo, video or piece of music we owned with almost zero effort? That’s what sits right around the corner by tying a Siri-enabled iTV to TV apps and iCloud. If Apple does release a television, that is our future. Bet on it.
The crazy thing is that Sony had similar assets at its disposal years ago. Sony was producing the best TVs, powerful Vaio computers, the PS3 network and, critically, a working artificial intelligence product with a natural language interface: AIBO the robot dog. AIBO was an early Siri in cute robot dog form. This stuff was all out for sale in the world! It just wasn’t tied together.
Back in graduate school at the IIT Institute of Design, I mused on a project notionally titled AIBO media system integrating hardware, software and network access controlled by an AIBO natural language interface:
How insanely great would it be to have AIBO meet you at the door as you walk in, ask it to put on some soothing music and bring up the latest tech news on the TV? Combining complementary capabilities which were already developed, Sony could have integrated, transformed and dominated the living room and home media experience. Unfortunately, Sony’s splintered global business structure focused the company into silos which could not conceive or execute the type of integration required to stitch this experience together. They missed their opportunity.
Because of the overhead required, bringing a disruptive input method to scale is typically the purview of only a few tech behemoths: the Apples and Microsofts of the world. On the other hand, recognition of this disruption by strategy, design and product teams of any size, combined with fluid network integration into useful packages of functionality and content, is something we can all do. Most product and service teams have jumped on the “touch” and app trains. Be ready to jump on the natural language train because Apple is likely to open Siri to developers after they iron out its kinks. More on that later.
*Update*
One day after I posted this on the 10th of November, it was reported first in the Wall Street Journal and then in a variety of other places that Sir Howard Stringer, CEO of Sony, was soon to unveil a platform which he had spent "five years building" specifically to tackle the emerging Apple and Jobs digital entertainment hegemony. The platform is dubbed a "four screen strategy" which magically links the TV, PC, tablet and smartphone in a differentiated manner, and Stringer placed a big emphasis on televisions themselves. As the WSJ quotes him, "There's a tremendous amount of R&D going into a different kind of TV set." He also admitted that there was "no doubt" that Steve Jobs knew that a reinvention of the TV market was needed. "That's what we're all looking for," said Mr. Stringer. "We can't continue selling TV sets [like the industry has been]. Every TV set we all make loses money."
It's a great thing Stringer recognizes this, but I'm not so sure he realizes it's not just about providing media fluidly across channels but also about optimizing those experiences themselves through unique and, as Dediu describes them, "disruptive" interfaces. Regardless, it is a boon for consumers to have another consumer electronics heavyweight bringing competition to the ecosystem and forcing everyone to raise their game.