Siri - Opportunity (10/25/13 & 12/17/11)
Nearly two years have passed since I published the below. I believe it safe to suggest that Siri has not performed up to market expectations and its vast potential remains largely unfulfilled. It has improved since launch (the "beta" tag was removed with Apple's release of iOS 7) but user frustrations with Siri mean that only a small number of iPhone owners regularly use it.
I planned to write a Siri update blog until I came across a lengthy article that pretty much covered everything I wanted to share (as well as rehashing a number of things I included in my first two Siri blogs). Rather than replicating what Bianca Bosker has done, I will instead include this link to her excellent piece: SIRI RISING: The Inside Story Of Siri's Origins -- And Why She Could Overshadow The iPhone.
The below is my original blog published 12/17/2011.
The first time I experienced Siri I quickly concluded that Apple’s voice-driven digital assistant and knowledge navigator will eventually be considered as pivotal a development as the Mac, iPod and iPhone. For the first time Apple’s Siri has given us a “good enough” general purpose, natural language voice interface and this means that there is no going back.
This seminal event, combined with my voice technology experience and entrepreneurial bent, inspired me to begin investigating business models that tap into Siri, trying to figure out where opportunities will exist and what form they will take. To do this I decided to explore some fundamental questions…
- Will Apple license its Siri technology?
- Will third parties be allowed to develop for the Siri Backend?
- Will there be Siri iOS app development opportunities?
This blog addresses these questions and related topics.
Before jumping into exploring opportunities, I believe it worthwhile to share my enhanced understanding of Siri’s competitive landscape. I began my previous Siri blog with a background of Siri. What I knew but now more fully appreciate is that Siri is the progeny of 40+ years of bleeding edge artificial intelligence, machine learning and natural language research and development. So comparing Siri to Google’s Voice Actions, Amazon’s Yap or worse, Microsoft’s TellMe is like comparing a computer-controlled Christmas light show to the Clapper…they both turn lights on and off but they are two entirely different things.
Only Siri benefits from thousands of person-years of research and untold millions of DARPA dollars. Comparisons of Siri to Android are sort of pointless, even given Android’s natural language extensions and Google’s promise to improve things further. Reviews often focus on features, conceding that Siri is superior at understanding but treating this as a feature somehow on par with, say, voice versus button activation. By doing so writers miss the point and diminish the truth of Siri.
The truth is that Siri is really good at understanding meaning and context, maintaining conversations, handling accents and learning, while all other solutions really just understand words, and only if spoken clearly enough. Ask TellMe, “Will I need an umbrella today?” and you’ll get nothing. Ask Siri the same thing and it will tell you if it looks like rain and display the weather forecast…two different things.
Does everyone care about understanding versus acting on learned commands? No. Are there those out there that will never buy an Apple product? You bet. Thus Voice Actions, Yap and others will find their markets in the near term. But note that Siri today only focuses on the same core things as its competitors—and they all have been around a lot longer. I think that as Apple extends Siri beyond contacts, calendaring, messaging, etc., the market will come to appreciate that Siri is an, ah, Apple and that every other competitor is an orange.
To give context to that which follows, the below summarizes essential, verifiable statements/facts about Siri:
- Apple designated its initial Siri release a Beta product as the company knew full well that as impressive as Siri is today, there is still a great deal of work to be done before they can mainstream the product; Siri today is but a first step.
- As expected from a Beta release, Apple is dealing with some very real issues… outages, inability to launch apps, incomplete app support (e.g., Siri can access contacts but cannot create them), voice recognition limitations, etc.
- Where Apple will take Siri is not publicly known though they have committed to support 5 new languages: Japanese, Chinese, Korean, Italian and Spanish.
- Siri only works in the US with US English and is only approved for use with iPhone 4S—when the rest of the world gets the full experience is not known.
- Siri is largely closed—aside from Google, Yahoo, Wolfram|Alpha, Yelp, and Wikipedia, Siri only works with core Apple iPhone services.
- Apple has not announced a Siri API/SDK or plans to license the technology to others; moreover, there is no guarantee that Apple will ever open the platform to third party developers.
- Siri interfaces with core iPhone apps, data and services but does not launch or communicate with third party iOS apps.
- Developers have shown that third parties can use the iPhone 4S’s user interface to perform tasks not supported by Siri’s Backend thereby proving that Apple cannot exert total control over Siri and this underscores that near-term opportunities exist.
- Siri learns, both at the user level and, through the analysis of crowd sourced requests, at Siri’s Backend.
- What separates Siri from its competition is its handling of natural language (understanding what is said, what is meant and the context), but it works correctly only around 90% of the time; to “compete” with humans, Siri likely needs to handle around 97% of all requests and reaching this point is years away.
- What makes Siri delightful is the personality that Apple has programmed into Siri; Siri’s persona is well on its way to becoming a powerful Apple brand.
I believe that Siri reflects what is best about Apple…that they go to great lengths to humanize technology and this is especially true with Siri.
Any exploration of Siri opportunities must begin with Apple Inc. Apple is secretive and its technology mostly closed, so it is extremely challenging to accurately predict how they might open the platform, should they ever decide to do so. Given Apple’s secrecy-enhanced marketing (e.g., frenzied launch events), the premium the brand commands, a leadership team at Apple’s helm that worked for Jobs for many years and the fact that Apple is arguably one of the most extraordinary business successes in history, I very much doubt Apple’s secrecy doctrine will change significantly with Jobs’s passing, especially in the nearer term.
Put another way, everything written about where Apple will take Siri—including this blog—must be considered highly speculative. Still, there is much historical, circumstantial, market and Siri platform information available that can help guide those looking to profit from Siri, which is what this blog addresses.
How it Works...A Correction
My original Siri primer blog indicated that I still needed to confirm a few things regarding how Siri worked...as it turns out I did get something wrong. After the Siri protocol was cracked, it became clear that instead of Nuance’s speech-to-text engine somehow living on the iPhone 4S, what happens instead is that the handset records and compresses the voice file for processing by Nuance's speech-to-text engine at the Siri Backend. This makes a great deal of sense and further explains why simple commands that have been executed previously (e.g., “Call home”) can’t work unless the 4S is connected to the Internet.
Similarly, the Siri Backend uses Nuance’s text-to-speech engine and creates a voice response to each request, which is then sent to the 4S handset along with data and/or triggers that invoke services on the iPhone. Besides where speech recognition and responses are done, the How Siri Works section of the primer appears correct (note: I updated my Siri – A Primer blog to reflect this enhanced understanding).
The primer blog I posted provides a description of the Siri platform, which the following summarizes:
A few weeks after the iPhone 4S was released, a developer posted videos showing a compliant 4S (i.e., not “jail broken”) using “SiriProxy”. This hack allows him to use the 4S to control devices that support plug-ins. Contrasting SiriProxy to the above, the developer’s proxy server platform looks something like this:
SiriProxy works by spoofing the iPhone into thinking that it’s communicating with the Siri Backend when in fact the server is acting as an intermediary. SiriProxy passes requests to the Siri Backend to turn spoken requests into text but takes over when Siri sends back what was requested and it “sees” words or phrases it is looking for. Before sending a response back to the 4S, SiriProxy acts upon key words it has been programmed to look for, invoking actions that control devices or access Internet services not currently supported by Apple.
While SiriProxy is not a product or perhaps even a development opportunity in line with Apple’s various licenses, it does allow Siri to do useful things beyond Apple’s prescribed use cases. To me, SiriProxy serves as a valuable “proof of concept” that speaks to very real third party Siri platform opportunities.
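To make the intermediary idea concrete, here is a minimal sketch in Python. It is not SiriProxy's actual code or plugin interface; every name in it (`BackendReply`, `fake_backend`, `KeywordProxy`) is hypothetical, and the "backend" is a stand-in that simply echoes text rather than doing real speech recognition. It only illustrates the pattern described above: forward the request, inspect the recognized text for registered phrases, and either act locally or pass the backend's response through.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class BackendReply:
    """What the simulated backend returns for one request (hypothetical shape)."""
    recognized_text: str   # result of speech-to-text
    spoken_response: str   # what the handset would normally say back


def fake_backend(audio_text: str) -> BackendReply:
    """Stand-in for the real backend; the 'audio' here is already plain text."""
    return BackendReply(audio_text, f"Here is what I found for '{audio_text}'.")


class KeywordProxy:
    """Sits between handset and backend and intercepts registered phrases."""

    def __init__(self, backend: Callable[[str], BackendReply]) -> None:
        self.backend = backend
        self.handlers: Dict[str, Callable[[str], str]] = {}

    def register(self, phrase: str, handler: Callable[[str], str]) -> None:
        # Phrases are matched case-insensitively as plain substrings.
        self.handlers[phrase.lower()] = handler

    def handle(self, audio_text: str) -> str:
        # Always let the backend turn speech into text first.
        reply = self.backend(audio_text)
        text = reply.recognized_text.lower()
        for phrase, handler in self.handlers.items():
            if phrase in text:
                # A registered phrase was found: the backend's own response
                # is discarded and the local handler's response is returned.
                return handler(reply.recognized_text)
        # Nothing matched: pass the backend's response through untouched.
        return reply.spoken_response
```

A proxy set up with `proxy.register("thermostat", ...)` would answer "Set the thermostat to 70" locally and hand everything else back unchanged. Note that the match is a plain substring test, so "Do not touch the thermostat" would trigger the same handler.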
What the SiriProxy postings fail to point out is that SiriProxy is essentially relying upon voice command technology. A SiriProxy server doesn’t understand what is being said, it just acts when it finds the right words or phrases. When requests are handled like this, there are very real risks to Apple’s Siri investment.
What makes Siri exceptional is its ability to understand natural language, maintain conversations, learn and deliver results that surprise and delight by being fresh, unexpected, entertaining and engaging. When a SiriProxy receives a word/phrase that it has been programmed to act upon, the Siri Backend response is tossed; what users hear are responses programmed by the developer that set up the SiriProxy, which could be very unSiri-like.
This noted, SiriProxy effectively opens Siri up to custom handlers for different actions, something that iOS developers hope that Apple will eventually provide in a Siri SDK. Moreover, it demonstrates that Siri can be used to control local services in concert with the overall Siri platform.
Perhaps most important, SiriProxy demonstrates that local extensions to the Siri platform actually offer real value and this puts pressure on Apple to open Siri to third parties. The last thing that Apple wants is unbridled third parties diluting or screwing up the Siri brand/persona, subjecting Siri to doing things poorly or inappropriately or compromising the large investment Apple continues to make in the platform. Now that Apple knows developers can develop for the platform without a Siri SDK, I think that we can expect Apple to release some kind of third-party support far sooner than some industry wags have suggested.
Let’s say that Apple decides to open the Siri platform to third parties; how will they approach this? Gary Morgenthaler, whose eponymous VC firm was the first investor in the company that Apple bought to take control of Siri, suggests that Apple needs to determine who will provide the service: third party developers that license Siri or Siri in the iCloud (“SITI”). I believe a third option exists: both.
I think that one of the important things that Apple is working on now is defining what Apple wants to control. But until Apple lets us know what they want, all we can do is speculate. Given this, the following projects how Apple might approach licensing Siri, how they could allow third parties to develop for SITI and how Apple might allow third parties to develop iOS apps that integrate with Siri.
Part of the reason I profile SiriProxy above is because I believe that Apple has got to see this development as a “smoking gun,” one that shows that Apple can be disintermediated. Think about it: within weeks of the iPhone 4S hitting the market a few developers, building upon the work of Applidium (a French software company that reverse-engineered Siri’s protocol), posted useful solutions that Apple would never develop. Wow.
Looking into SiriProxy got me wondering if the iPhone 4S user license includes language that prohibits licensees from using proxy servers. Searching the 4S license led to a somewhat startling discovery and an interesting epiphany.
The discovery is that Apple’s iPhone 4S license has bloated to 364 pages, fully 160 more pages than 2010’s iPhone 4 agreement and 205 pages more than the 3GS. Apple is clearly going to great lengths to protect its IP and cover its tail. A search for the word “proxy”, by the way, turned up nothing and this led to…
The epiphany—iPhone 4S users should not be in violation of their 4S license simply because they requested something that Siri doesn’t support. I am certain that this already happens many millions of times every day. And while I don’t have the slightest credential to weigh in on the legality of setting up a proxy server to process requests that Siri cannot handle, my gut tells me that if a proxy server passes through requests to Apple’s Siri Backend, then it wouldn’t violate any laws. It’s kind of similar to when you type in a URL that doesn’t exist and your ISP shoots you a “404” page loaded with ads that generate revenue for the ISP.
Now consider a few of the SiriProxy demos: thermostat, automobile starter, and advanced TV (Plex) controls and Internet access to a 40+ year old natural language computer therapist. Setting aside the Plex control, would Apple ever attempt to extend Siri to support controls/services like these? I think not. Bottom line, there are thousands of controls/services that developers will want to develop that Apple will never support.
The Plex control is a different animal as it is all but certain that Apple will introduce a line of HDTVs with Siri support. As well, I believe Apple will add Siri to Apple TV to generate revenues from the installed base of consumer televisions and media centers. Even given Apple’s likely release of Siri-enabled TV products, will the US government allow Apple to block third parties from using Siri to control non-Apple products? To me, if Apple works to exclude competitive products it could face the same antitrust challenges that dogged Microsoft since the 1990’s for using Windows to thwart competitors.
All this has led me to conclude that Apple will indeed offer (or at least attempt to offer) some sort of licensing program. The initial licenses will likely allow licensees the right to use Siri’s UI. I believe it reasonable that Apple should earn a license fee to support provisioning SiriProxy solutions, for confirming that they are compliant with the Siri platform and that the Siri brand is not compromised.
Everyone benefits from Apple’s stewardship of Siri provided that Apple is, or is forced to be, reasonable. I agree with Morgenthaler that Apple has some work to do to figure out its licensing model and that they face some new challenges but I disagree that there is an extraordinary distance that Apple must go before Siri is ready for third party developers…clearly, the platform is pretty much ready today.
In the end, I believe that Apple will come up with licenses for specific use/local services that leverage Siri’s UI. This might be the easiest way for Apple to gain market and developer mind share as this approach avoids most of the very real challenges associated with allowing third parties to develop for SITI.
Siri in the iCloud (SITI)
What exactly does SITI mean? I think it means allowing third parties to somehow add to Siri’s core understanding and services. I also think it could mean creating new iOS apps or extending existing ones so they can work with Siri once Apple allows Siri to interact with non-Apple iOS apps (see next section).
The Siri development team is reportedly one of the largest at Apple. I have got to believe that a meaningful number of these folk are tasked with figuring out how best to allow third parties to develop for Siri so that Apple can better handle competitive threats.
Allowing third parties to develop for SITI is no small undertaking. Apple must develop a pricing/revenue share model, manage provisioning, quality control, etc. It also requires that Apple meter and charge for the use of Apple resources, handle resource scaling issues and, perhaps most challenging, allow others to somehow add to Siri’s ability to understand, learn and access new resources in a way that maintains the consistency of Siri’s persona.
Developing for SITI will be dramatically different than developing standalone Xcode apps that are downloaded to mobile devices. Adding new services to SITI could require complicated natural language processing extensions, essentially programming knowledge into Siri’s AI-like brain, and in Siri’s proprietary way. Developing an SDK that gives third parties a way to add to Siri’s ability to understand, learn, access new resources and spit out responses in line with Siri’s personality is an extreme challenge. I think it could be some time before we see a "SITI SDK" that supports all this.
One of the more insightful blogs I read strongly posits that Apple will retain exclusive dominion over most of Siri’s core capabilities (referred to as Siri’s “Intent engine,” which figures out what to do based on what has been requested) and that third party developers should focus on aggregating data and services and build APIs that Siri can access. With this belief, developing for SITI means building the best APIs to data and services. While this is perhaps a prudent view given the challenges associated with allowing third parties to “mess with Siri’s brain,” I think that Apple will reap much larger rewards if they can predictably allow third parties to extend Siri’s core capabilities. Whether or not this is tenable is open to debate.
Until some sort of SITI SDK is available, what are the opportunities then? I think that Apple will add core services similar to Google, Yelp, Wolfram|Alpha, etc. For example, social media (Facebook, Twitter, etc.) is clearly lacking and a logical place for them to start. For these types of opportunities I don’t see Apple allowing others to do this for them. Instead, I think Apple will take responsibility for extending Siri if for no other reason than Android’s Voice Actions already supports limited social media interactions and Apple cannot allow Android to gain a large competitive advantage.
Beyond core services additions with large partners, I see Apple assisting Fortune 500 companies with creating Siri interfaces to cloud services that Apple is unlikely to pursue on its own. To better understand what I mean by this, let’s look at an example—Dell Inc.’s technical support call center.
While Dell is a very large company whose customers number in the tens of millions, they serve a closed population; i.e., they do not serve non-Dell customers so this could hardly be a core Siri service. Today, tech support calls are answered by an interactive voice response system and then routed to Dell’s hundreds or thousands of phone support personnel. When a rep takes a call, they ask the caller a number of questions and, based on their answers, punch up likely solutions using Dell’s knowledge base. If this doesn’t resolve their issue, the caller is routed to level-two support.
I believe that Siri could be extended to handle Dell’s level-one support—the backend infrastructure needed is largely there. People would prefer Siri as it takes a lot less energy to deal with a machine than a human…think: ATM. Also, most customers dislike offshore support so if Siri resolves an acceptable number of level-one calls, I think that these kinds of Siri solutions would be a big winner.
Dell likely spends many tens of millions a year on support calls and I would guess that if Dell could cut its cost meaningfully in favor of Siri-based level-one service then Apple would be only too happy to support them for a nice-sized recurring revenue stream. There exist many dozens of solutions like this just waiting to be “SITIized”. These are big dollar opportunities for Apple and these companies are likely willing to make up-front investments as their ongoing savings readily justify the initial cost. Unless and until Apple publishes a SITI SDK, I see Apple offering some sort of professional services toolkit where top tier developers work independently but in concert with Apple to enable these kinds of solutions.
Over time Apple might move to a more inclusive SITI approach, similar to the way that they support adding apps to the App Store. To do this Apple will need an extensive and competent developer platform that enforces quality, consistency and prevents something done by a third party from impacting others. Given Siri’s AI-like natural language learning platform, this will be extremely difficult and it will likely take Apple a long time to develop, if Apple ever decides to go this way.
iOS Apps & Siri
Siri on the iPhone 4S is responsible for a small number of things: recording and playing voice files, communicating with the Siri Backend and displaying results obtained from the backend. iOS 5 with Siri also interacts with Apple apps (e.g., maps and location services) and retrieves data on the iPhone (e.g., contacts)—clear indications that Siri can work with iOS apps.
Should Apple release a “Siri App SDK” there would doubtless be thousands of developers clamoring to hop on board. The challenge with offering developers the means to add voice interfaces to iOS apps is the same issue with developing for SITI…Siri could need to be “taught” what the app does. This raises at least some of the same challenges as those noted in the previous section. Again, an example might make this clear.
Let’s consider a workgroup scheduler, something that supports groups of people that either lack a common calendaring solution like Outlook or want to add a voice interface to a common calendar application. This app would allow group members to share schedules and set up meetings within the group.
Likely a web app of some sort would be needed to handle application and user administration, security and other features such as integration with other calendaring programs. The iOS app would access cloud-based and local admin and user data (i.e., contacts), handle access to the group members’ calendars, understand which contacts belong to the overall group, are members of subgroups, etc., and, importantly, interact with Siri.
Now look at what could be a killer Siri supported function: allowing each member of the group to use Siri to set up, accept, change and decline appointments for the group, subgroups and one or more individual members. Consider a few requests that users might make:
For Siri to handle this, the third party developer or Apple will need to “teach” Siri about users' relationship to groups, subgroups and members. Beyond this, Siri already understands calendars, setting up appointments, resolving conflicts, etc. Clearly there are other variables to consider but with Siri, solutions like this offer extraordinary productivity enhancement opportunities.
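One small piece of that "teaching" can be sketched in code: turning a spoken target (the whole group, a subgroup, or one person) into the set of individuals whose calendars are affected. The following Python sketch is purely illustrative; the `DIRECTORY` data, the names in it and the `resolve_invitees` function are all hypothetical, and nothing here reflects how Siri or any Apple SDK actually represents groups.

```python
from typing import Dict, List, Set

# Hypothetical directory: a group or subgroup name maps to its members,
# which may be individuals or other (sub)group names.
DIRECTORY: Dict[str, List[str]] = {
    "team": ["design", "engineering", "Pat"],
    "design": ["Ana", "Raj"],
    "engineering": ["Lee", "Raj"],
}


def resolve_invitees(target: str,
                     directory: Dict[str, List[str]] = DIRECTORY) -> Set[str]:
    """Expand a group, subgroup, or individual name into individual people."""
    seen_groups: Set[str] = set()
    people: Set[str] = set()
    pending = [target]
    while pending:
        name = pending.pop()
        if name in directory:
            # A group name: expand it once, guarding against circular nesting.
            if name not in seen_groups:
                seen_groups.add(name)
                pending.extend(directory[name])
        else:
            # Anything not listed as a group is treated as an individual.
            people.add(name)
    return people
```

With this data, asking for the "design" subgroup yields Ana and Raj, while asking for the whole "team" yields Ana, Raj, Lee and Pat, with Raj counted once even though he sits in two subgroups. The harder part, mapping a natural language request onto the right target in the first place, is exactly what the app would rely on Siri for.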
Once opened to iOS apps, Siri holds the potential to serve an extraordinarily broad range of markets. Consider the handicapped, infirm, geriatric and infant…they may not be able to gesture but they can speak. I can envision hundreds of basic productivity extensions to business apps. Security can be cost effectively revolutionized with biometric voice extensions and so many more.
After 50 years of sci-fi movies where folk talked to computers that understood them and performed tasks, that time is finally here. I am convinced that Siri represents a crossing of a threshold and it’s only just the beginning.
Aside from SiriProxy, third parties are forced to guess what to do until Apple tells developers where they will allow them to “play”. With so much potential to drive new Apple revenues, it should be clear why Apple’s Siri development team is so large as they see Siri’s potential better than anyone. Let's hope that they are hard at work figuring out how to involve the third party development community.
Apple faces some daunting challenges opening Siri up to developers. But the Apple community is used to being controlled and has shown faith that Apple will reward their patience. SiriProxy opens the door but only Apple can swing it wide open. Until then, third parties can only develop SiriProxy solutions, speculative Siri APIs or for what I believe are inferior platforms but with an eye towards Siri.