As we approach WWDC 2016, a lot of posts have been floating around suggesting that Siri will get an official API in just a few short days. Given that it will be almost half a decade since Siri was introduced and an API was first speculated about, it's probably not inappropriate to give this one the tag of 'finally.'
A few years back, my employer wanted to know how we could incorporate Siri-like capabilities in our app. It was a great idea, and one that our competitors eventually did implement, but at the time we considered it, the only truly viable platform was Nuance's, and the economics behind it were untenable without a major infusion of cash from marketing, something they were unwilling to provide. Hey, I tried!
Ever since then, as Apple's major competitors have announced their own voice assistant APIs, I've wondered why Apple didn't press its first-mover advantage and go down that route as well. Sure, they had other things keeping them busy, but it was possible and routinely requested, so why hadn't they done it?
The biggest reason I can see is security. Apple's competitors rely heavily on server-side execution and logic, with a great deal of personalization based on the user's data, but Apple has historically avoided keeping this kind of data unencrypted on its servers, precisely for security reasons. As a loyal customer with nothing to hide, I'm very grateful to them for not keeping my data that way.
If the server can't access the data, you're really looking at pushing that logic down to the device. This sounds OK, but a couple of things make it a lot trickier than it looks on the surface.
On the server, you've got massive CPU, RAM and disk resources; they're effectively limitless (within economic constraints, of course). You don't need to worry about how many processor cycles you're burning; you just churn the data and push some results back down to the device. But when you move this processing down to the device, you're suddenly pushing voice data up to the server, getting text back on the phone, and then having to turn that text into meaning on the device itself, with a fraction of those resources. Yes, non-personalized data could be added while the request is on the server, but then you have to think about how you anonymize that data and purge it from the logs. Not impossible, but more moving pieces.
There is also the question of how quickly you can add new features. On the server side, you have a large farm of servers to deploy code to, but you control those environments; on a user's device, you do not. Apple has historically added new device features only in major and sometimes minor releases, and there are only a few of those every year. Your responsiveness in adding new features is now tied to the iOS release cycle, and that's not something you can just push whenever you feel like it. (Or, rather, you can, if you're willing to change your entire release strategy, but Apple hasn't so far shown it's willing to do that.) Yes, I'm sure there is a way to make changes between releases by pushing configuration files (Apple has done this before), but that approach has issues all of its own when it comes to support and reliability.
But what would it do?
There are two categories of things I can see a Siri API doing, one of which breaks down into two subcategories. Let's take the undivided category first: Information.
You have a question and you need information. Who ya gonna call? Siri-buster! (Yes, even I'm appalled I just typed that.) Joking aside, one of the biggest reasons you would use an API is to find data. Siri already integrates with third-party sources like Wolfram Alpha when it can't find the answer in its own internal memory, but it would be great to unlock the knowledge in custom data stores through voice-activated assistants.
Let's say you work at a pharmaceutical company and you've got a lot of proprietary data that you want your drug reps to be able to search. Since these users are mostly on the move as they visit doctors, often in a vehicle, having them type to search a database is probably not optimal. Also, you don't want Google indexing this data on its servers because, well, proprietary data! So what do you do? You make an app, load your data into it, then expose that data to the Siri API. All the info you want is contained within your own data store, so if you remove the app from a now ex-employee's device (hello, mobile device management solutions!), the data goes away, too. But as long as the app is on the device, the data is safe and sound.
Looking up information is great, but we can go further. That leads us to the next step: making things happen. Hello, Interactions! We're going to break this down into a couple of subcategories, so let's talk about virtual interactions first.
Now let's assume that you work for a company that has created a task management app. You want to allow your customers to add tasks using voice commands. Today, there are ways to do this using some very hacky solutions involving Apple's Reminders app and email, but a Siri API would make this a lot easier.
But the virtual world, as much as it's become a dominant part of all our lives, is still mostly subservient to the physical world. That's where the second half of Interactions comes into play: physical!
This is a use case that's also been well defined. Want to turn on the lights in your house? Ask Siri. Want to call your car? Tell Siri to call your Tesla. Need a lift? Siri can snag you an Uber. These types of interactions are going to increase, especially once a Siri integration makes them significantly easier.
I've walked through a few situations where a Siri API would be useful, but the truth is, there are other uses out there that we won't see until Siri finally gets this long-overdue API. This is the first step in unlocking that future. But beware! Just like so many Apple technologies before it, we're likely to see some missteps as well.
App Store Rejections
Remember when PCalc was rejected from the App Store for including a calculator widget in the Today view? Its developer ran afoul of Apple's vague guideline against putting actionable items in that view; it was for information only! The outcry from developers was loud and swift, helping Apple realize it really ought to reverse that decision. I see the same rapids ahead for the Siri API.
Apple, like most tech companies, tends to iterate on its APIs, starting out with a few simple use cases and broadening to new areas over time. There will be developers who, just as PCalc did, push the boundaries of what Apple intended, causing some heartburn in the developer community and likely within Apple itself. This will happen. People will get bent out of shape. Apple will adjust and reach a new equilibrium, not where it started out, but likely not as open as developers want, either.
Wait for it; it's coming.
How will it work?
I linked to Erica Sadun at the top of this post and when it comes to technical details, she's unmatched, so I'm not going to talk about that level of implementation. What I am going to (wildly) speculate on is how, from a functional standpoint, I see this happening.
Four words: Verb, Noun, Message and App (if needed).
A Verb is an action; when you're calling the Siri API, you're assuming it needs to do something. "Create", "Open", "Play", and "Tell" are all Verbs, and they're how you'll start your command to Siri.
Next, you'll need to tell it more about the action; this is your Noun. "A Note", "My Recipe File", "Dave Matthews Band", and "My Wife" are all examples of Nouns. It's the "who" or "what" in the command.
Third would be the Message. Think of this as a loose set of instructions that only the App (more on that in a moment) will need to figure out on its own. This would be things like "about this great idea I have to...", "and read the instructions for making biscuits", "Song That Jane Likes", or "to stop nagging me" (respectively).
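To make the Verb/Noun/Message split a little more concrete, here is a rough sketch in Python. To be clear, this is not Apple's API or anything Apple has announced: `Command`, `parse_command`, and the fixed `VERBS`/`NOUNS` vocabularies are hypothetical stand-ins for whatever the system would learn from installed apps. The idea it shows is simple: match a known Verb, then the longest known Noun, and hand everything left over to the app as an uninterpreted Message.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical vocabularies; in this sketch they stand in for whatever
# verbs and nouns installed apps would register with the system.
VERBS = {"create", "open", "play", "tell"}
NOUNS = {"a note", "my recipe file", "dave matthews band", "my wife"}


@dataclass
class Command:
    verb: str
    noun: str
    message: str               # loose instructions only the app interprets
    app: Optional[str] = None  # only needed to break ties between apps


def parse_command(utterance: str) -> Optional[Command]:
    """Split an utterance into verb, noun and free-form message.

    Returns None unless the utterance starts with a known verb
    followed by a known noun.
    """
    words = utterance.split()
    if not words or words[0].lower() not in VERBS:
        return None
    verb, rest = words[0].lower(), words[1:]
    lowered = [w.lower() for w in rest]
    # Try nouns with more words first so a longer noun wins over a shorter one.
    for noun in sorted(NOUNS, key=lambda n: len(n.split()), reverse=True):
        noun_words = noun.split()
        if lowered[:len(noun_words)] == noun_words:
            # Keep the message's original casing; the app decides what it means.
            message = " ".join(rest[len(noun_words):])
            return Command(verb=verb, noun=noun, message=message)
    return None


print(parse_command("Tell my wife to stop nagging me"))
# Command(verb='tell', noun='my wife', message='to stop nagging me', app=None)
```

Note that the sketch never tries to understand the Message; that part is deliberately left to the app, which matches the idea that only the App needs to figure it out.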
The last, possibly unnecessary element, the App, is really only needed if you've got multiple apps installed that share the same Verb/Noun combination. If you've got two music apps installed, you've got an issue, because they're both likely to "Play" and have access to the same music library, so the Nouns will overlap as well. If you don't specify an App, expect Apple's apps to be the default or Apple to make the decision for you. If you think there will be some kind of settings screen letting you pick which app responds first to certain key phrases, you're obviously not thinking about the Apple we all know and love. That isn't magical enough.
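Here's one way that tie-breaking might look, again as a speculative Python sketch rather than anything Apple has described. The `registry` (which apps claim which Verb/Noun pairs), the `system_defaults` table, and the app names are all hypothetical. A named App wins if it can handle the pair; a single candidate wins outright; otherwise the system's own app is preferred. Where the post says Apple would make the call, the sketch simply returns None, since how Apple would actually decide is anyone's guess.

```python
from typing import Dict, Optional, Set, Tuple

Pair = Tuple[str, str]  # (verb, noun)


def resolve_app(
    verb: str,
    noun: str,
    registry: Dict[str, Set[Pair]],
    system_defaults: Dict[Pair, str],
    requested_app: Optional[str] = None,
) -> Optional[str]:
    """Pick which app should handle a verb/noun pair.

    Returns None when no app can handle the pair, or when several can
    and there is no system default among them to fall back on.
    """
    pair = (verb, noun)
    candidates = sorted(app for app, pairs in registry.items() if pair in pairs)
    if requested_app is not None:
        # The user named an app; honor it only if it can handle the pair.
        return requested_app if requested_app in candidates else None
    if len(candidates) == 1:
        return candidates[0]
    # Several apps (or none) match: fall back to the system's own app, if any.
    default = system_defaults.get(pair)
    return default if default in candidates else None


# Hypothetical registrations: two music apps claim the same pair.
registry = {
    "Music": {("play", "dave matthews band")},
    "OtherMusicApp": {("play", "dave matthews band")},
    "Notes": {("create", "a note")},
}
defaults = {("play", "dave matthews band"): "Music"}

print(resolve_app("play", "dave matthews band", registry, defaults))  # Music
print(resolve_app("play", "dave matthews band", registry, defaults, "OtherMusicApp"))  # OtherMusicApp
```

Notice there is no user-facing preference table anywhere in this sketch; the only way to reach the third-party music app is to name it, which is roughly the "no settings screen" world described above.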
My caveat to all this speculation is that there is a real possibility Apple will come out of the gate with only a few limited Verb/Noun pairs, none of which will be allowed to overlap with its own default apps. Yes, this would put a big limitation on the usefulness of a Siri API, but it's Apple's playground, not yours. It's also not the first time they've put big limitations on a new feature at launch. They did it with background execution for apps, and they will most definitely do it again in the future.
Speak now or forever hold your peace
Is any of this a guarantee? Not a chance. Maybe the Siri API isn't ready yet, or maybe Apple withholds it until the September launch of the new phones. Maybe it ends up a 2017 project, or maybe it never launches at all. Anything is still possible until the lights go back up at WWDC on June 13. Until then, we'll just have to continue to speculate. This time of year is definitely not boring.