Api.ai is a conversational user experience platform that was recently acquired by Google. It uses natural language processing and machine learning algorithms to extract entities and actions from text. The best thing is that it has a web application in which you can train your intents with custom sentences and, based on that training, get a JSON response with the recognized data. This opens up a whole new set of opportunities for developers, since natural language processing and machine learning are not trivial tasks: getting them right requires a lot of expertise and research. On top of that, the service is currently free for developers. As we will see, api.ai offers a lot of powerful features and is definitely worth a look.
In this post, we will extend the grocery list app we developed in Playing with Speech and Text to speech with synthesizers, so make sure to check out those two posts first. One thing we did very naively in them was extracting the words from a sentence: it was done by plain string matching against words hardcoded in our app, without taking into consideration the context in which the keywords were spoken. For example, if you said something like “I don’t need chicken anymore”, the app would still add chicken to the list, even though it’s clear that we should remove it. Let’s solve this and put some intelligence into our app by using api.ai!
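To make the problem concrete, here is a minimal Swift sketch of that kind of keyword matching. It is not the code from the earlier posts; the product list and the `naiveProducts(in:)` function are hypothetical. Because it only checks whether a known word appears somewhere in the sentence, the negation is simply ignored:

```swift
import Foundation

// Hypothetical hardcoded vocabulary, in the spirit of the earlier posts.
let knownProducts = ["chicken", "milk", "bread", "eggs"]

/// Returns every known product that appears anywhere in the sentence,
/// ignoring the surrounding words (and therefore the user's intent).
func naiveProducts(in sentence: String) -> [String] {
    let words = sentence.lowercased()
        .components(separatedBy: CharacterSet.alphanumerics.inverted)
    return knownProducts.filter { words.contains($0) }
}

// "don't ... anymore" is ignored, so chicken is still treated as something to add.
print(naiveProducts(in: "I don't need chicken anymore"))
```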
First, let’s see how to get started with api.ai. There’s a great video on YouTube from Google that explains everything you need to get up and running. You need to sign up with a Google account and then create an agent. I’ve already created one, called GroceryList:
An agent is a container for a group of actions that you build with intents. You can define multiple intents for an action. An intent is, simply put, what the user says. It can contain entities, which are the objects you want to be recognized in the sentence. You can define your own entities or use one of the ones provided by the system. In the sentence above, chicken is an entity of our Product type. It might look a bit abstract at the moment, but it will all make sense when we look at the examples. The developer documentation can also help you better understand these concepts.
Now let’s create the Product entity and add some values to it. One cool thing is that you can also define synonyms for the words you’ve provided. After you create the entity, you can always update the list with new values, which is already a lot better than our current implementation:
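Conceptually, an entity is a set of reference values, each with its own synonyms, and any synonym resolves to its reference value. The toy Swift model below only illustrates that idea; api.ai does the actual matching on its servers, and the type names and sample values here are made up for the example:

```swift
// Toy model of a developer-defined entity (illustration only, not api.ai's format).
struct EntityEntry {
    let value: String        // reference value that would be reported as the parameter
    let synonyms: [String]   // alternative words that resolve to `value`
}

struct Entity {
    let name: String
    var entries: [EntityEntry]

    /// Resolves a word (case-insensitively) to its reference value, or nil if unknown.
    func resolve(_ word: String) -> String? {
        let lowered = word.lowercased()
        let match = entries.first(where: { entry in
            entry.value.lowercased() == lowered ||
                entry.synonyms.contains(where: { $0.lowercased() == lowered })
        })
        return match?.value
    }
}

var product = Entity(name: "Product", entries: [
    EntityEntry(value: "chicken", synonyms: ["chicken breast", "poultry"]),
    EntityEntry(value: "soda", synonyms: ["pop", "soft drink"]),
])
// Like in the web console, the entity can be extended with new values later.
product.entries.append(EntityEntry(value: "meat", synonyms: ["beef", "pork"]))

print(product.resolve("Pop") ?? "unknown")   // soda
print(product.resolve("pork") ?? "unknown")  // meat
```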
Now let’s add intents. In our use case, the user can have two intents: either adding something to the list or removing it. We will create a separate intent for each, but we will also keep in mind cases where the user has two intents in one sentence, that is, wants to both add and remove something.
First, we will create the AddProduct intent. We will add the product.add action to it and create two parameters for it. As mentioned above, we would like to handle two possible intents in a sentence, and that’s why we need two parameters. The first one is called AddProduct and is an entity of the Product type that we’ve just created. It’s mandatory (there has to be a product in an add product intent) and it can also be a list (users can provide as many values as they want in a sentence). We can also define a prompt: an additional question that the platform returns when a mandatory value is missing.
The second parameter we’ve defined is RemoveProduct, which is also an entity of type Product and can also be a list. This one is optional for the add action, since users don’t have to remove anything when they add products to the list. However, this parameter will be mandatory for the product.remove action.
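To summarize how the two parameters and the prompt fit together, here is a small hypothetical Swift model of the AddProduct intent. It is not api.ai’s data format; the struct names and the prompt text are invented, and `missingPrompt(for:)` only mimics the behaviour described above by returning the prompt when a mandatory parameter has no value:

```swift
// Hypothetical model of the intent configuration (not api.ai's real format).
struct IntentParameter {
    let name: String       // parameter name, e.g. "AddProduct"
    let entity: String     // entity type, e.g. "Product"
    let isRequired: Bool   // mandatory parameters trigger the prompt when missing
    let isList: Bool       // whether several values are allowed in one sentence
    let prompt: String?    // follow-up question for a missing mandatory value
}

struct Intent {
    let name: String
    let action: String
    let parameters: [IntentParameter]

    /// Returns the prompt of the first mandatory parameter that has no value,
    /// or nil when every mandatory parameter is filled.
    func missingPrompt(for values: [String: [String]]) -> String? {
        for parameter in parameters where parameter.isRequired {
            if values[parameter.name]?.isEmpty ?? true {
                return parameter.prompt ?? "Please provide \(parameter.name)."
            }
        }
        return nil
    }
}

let addProductIntent = Intent(
    name: "AddProduct",
    action: "product.add",
    parameters: [
        IntentParameter(name: "AddProduct", entity: "Product", isRequired: true,
                        isList: true, prompt: "What would you like to add?"),
        IntentParameter(name: "RemoveProduct", entity: "Product", isRequired: false,
                        isList: true, prompt: nil),
    ]
)

print(addProductIntent.missingPrompt(for: ["RemoveProduct": ["milk"]]) ?? "complete")
// What would you like to add?
print(addProductIntent.missingPrompt(for: ["AddProduct": ["meat"]]) ?? "complete")
// complete
```

The RemoveProduct intent would mirror this, with RemoveProduct marked as mandatory and AddProduct as optional.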
You can also define a text response for your action. This is useful for testing, as we will see, and it’s also included in the JSON response if you want to show it in your apps.
Now let’s add some sentences to the intent. These are samples of what the user might say, and with every new entry the model is re-trained and becomes more precise at handling sentences it hasn’t seen before. When you add sentences, you train the model by marking which part of the sentence corresponds to which parameter. Over time, the model learns to find those parameters by itself, which is pretty cool and gives a glimpse of the huge power of machine learning. The AddProduct parameters are labeled in orange (or maybe light brown?) and the RemoveProduct parameters in yellow.
Note the types of sentences we’ve added. They are all different and look like real sentences a user might say. The idea behind these platforms is to enable free-flowing conversation with users, not just a set of commands that the user must say and that the machine tries to match against some strictly defined format. Users are not machines: they can easily forget part of such a command, or change the words or their order. An intelligent system should be able to handle that, and this is exactly what api.ai offers. The more examples you add, the more precise the responses you get from api.ai will be.
Let’s test this with a new sentence and see what response we get. The example is “Hmm.. I think I want something to eat, maybe some meat.”, and there’s nothing similar among the samples provided above. Here’s the response:
It correctly put meat in the group of added products, which is pretty awesome. You can try this out with a few more new examples. If some of them are handled incorrectly, you can manually label the parameters in the sentence, re-train the model (a fancy way of saying “click Save”) and try again. You will then see that the text is handled correctly.
Similarly to the AddProduct intent, we will create the RemoveProduct intent and write some examples for it:
You can also define contexts, which, as the name implies, capture the context the user is in when making a request. This is useful if you handle several requests in a row and need to keep track of what was said before. For example, if the user says “Play me a U2 song” and then “Play another one”, api.ai can use contexts to infer that the next song should also be by U2. More information about contexts is available in the api.ai documentation.
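The U2 example can be sketched as a toy conversation that stores the artist in a short-lived context and reads it back on the next request. This is only an illustration of the idea, not how api.ai implements contexts; the `Conversation` class, the string-matching rules and the lifespan of 3 requests are assumptions made for the sketch:

```swift
// Toy illustration of contexts: a value set by one request stays available
// for a limited number of follow-up requests.
struct Context {
    let name: String
    var lifespan: Int                 // remaining requests this context stays active
    var parameters: [String: String]
}

final class Conversation {
    private(set) var contexts: [Context] = []
    private let prefix = "Play me a "
    private let suffix = " song"

    func handle(_ query: String) -> String {
        // Every request uses up one unit of lifespan; expired contexts are dropped.
        contexts = contexts.compactMap { (context: Context) -> Context? in
            var aged = context
            aged.lifespan -= 1
            return aged.lifespan > 0 ? aged : nil
        }
        if query.hasPrefix(prefix) && query.hasSuffix(suffix) {
            let artist = String(query.dropFirst(prefix.count).dropLast(suffix.count))
            // Replace any previous "playing" context with the new artist.
            contexts.removeAll { $0.name == "playing" }
            contexts.append(Context(name: "playing", lifespan: 3, parameters: ["artist": artist]))
            return "Playing a song by \(artist)"
        }
        if query == "Play another one",
           let artist = contexts.first(where: { $0.name == "playing" })?.parameters["artist"] {
            return "Playing another song by \(artist)"
        }
        return "Sorry, which artist?"
    }
}

let conversation = Conversation()
print(conversation.handle("Play me a U2 song"))  // Playing a song by U2
print(conversation.handle("Play another one"))   // Playing another song by U2
```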
Another cool feature of api.ai is webhooks (the Fulfillment section). Webhook integration lets you pass the information extracted from a phrase to a web service and get a result back from it. For example, if your agent provides transport information such as bus or train departures, you can attach a webhook to a service that finds routes based on the from/to locations extracted by api.ai. That way, the whole flow takes only one request to the api.ai service. Otherwise, you would have to handle the response from api.ai yourself and then send it to a routing service; if you have a web, an iOS and an Android app, that’s an extra implementation per platform. With webhooks, you don’t need that.
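Here is a rough sketch of the server side of the transport example, written in Swift with Foundation only. Several things are assumptions: that the webhook request carries the extracted values under `result.parameters` (the same layout as the query response), the parameter names `from` and `to`, and the `speech`/`displayText`/`source` reply fields, which follow the v1 webhook format as we understand it. `findRoute` is a hypothetical stand-in for the routing service, and hooking `handleWebhook(body:)` up to an HTTP server is left out:

```swift
import Foundation

// Reply body, assuming the v1-style webhook fields; check the Fulfillment docs.
struct WebhookReply: Codable {
    let speech: String       // text to be spoken back to the user
    let displayText: String  // text to be shown on screen
    let source: String       // name of the service that produced the answer
}

// Hypothetical stand-in for a real routing/timetable service.
func findRoute(from origin: String, to destination: String) -> String {
    return "Route from \(origin) to \(destination)"
}

/// Handles the POST body api.ai sends to the webhook: reads the extracted
/// parameters and answers with the routing result, so the client app only
/// has to make one request to api.ai.
func handleWebhook(body: Data) throws -> Data {
    let json = try JSONSerialization.jsonObject(with: body) as? [String: Any]
    let result = json?["result"] as? [String: Any]
    let parameters = result?["parameters"] as? [String: Any]

    let text: String
    if let origin = parameters?["from"] as? String,
       let destination = parameters?["to"] as? String {
        text = findRoute(from: origin, to: destination)
    } else {
        // Missing locations: ask a follow-up question instead.
        text = "Where are you travelling from and to?"
    }
    let reply = WebhookReply(speech: text, displayText: text, source: "routing-webhook")
    return try JSONEncoder().encode(reply)
}
```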
Now let’s go back to our grocery list app. How can we integrate everything we’ve discussed so far into it? Api.ai also provides native SDKs for iOS and Android. You can, of course, connect directly to the REST API, but using an SDK is the faster approach. You can see all the available SDKs in the api.ai documentation.
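For comparison with the SDK route, this is roughly what building a direct REST request could look like with Foundation’s `URLRequest`. The endpoint URL, the `v=20150910` protocol-version parameter and the body fields (`query`, `lang`, `sessionId`) reflect the v1 query API as we understand it and should be verified against the current documentation; `makeQueryRequest` is a hypothetical helper:

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking // URLRequest lives here on Linux
#endif

/// Builds a POST request to the (assumed) v1 query endpoint.
func makeQueryRequest(text: String, sessionId: String,
                      clientAccessToken: String) throws -> URLRequest {
    // Assumed endpoint and protocol version parameter; verify against the docs.
    var request = URLRequest(url: URL(string: "https://api.api.ai/v1/query?v=20150910")!)
    request.httpMethod = "POST"
    request.setValue("Bearer \(clientAccessToken)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json; charset=utf-8", forHTTPHeaderField: "Content-Type")
    // The recognized sentence, its language and an id that groups requests into a session.
    let body: [String: Any] = ["query": text, "lang": "en", "sessionId": sessionId]
    request.httpBody = try JSONSerialization.data(withJSONObject: body)
    return request
}
```

The request would then be sent with `URLSession.shared.dataTask(with:completionHandler:)` and the JSON parsed by hand, which is exactly the plumbing the SDK saves you from writing.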
The iOS SDK can be found on GitHub. We will integrate it as a CocoaPod, so go ahead and create a new Podfile with the following contents:
target 'SpeechPlayground' do
  use_frameworks!
  pod 'ApiAI'
end
Run ‘pod install’, open the generated .xcworkspace file, and the SDK should be there. We will create a wrapper around the ApiAI SDK in a new class called ApiAIService. This class handles the communication with the ApiAI SDK and returns an ApiAIResponse struct, which contains two arrays representing the added and removed products:
We are creating an instance of the ApiAI class, which handles the communication with the platform’s REST service. The SDK needs a clientAccessToken, so please replace the placeholder value of that constant with your own access token. We also define two closure types (SuccesfullApiAIResponseBlock and FailureApiAIResponseBlock), which are used as callbacks in extractProducts(fromText:success:failure:), the most important method in this class.
This method takes the text we’ve already recognized with the Speech framework in the previous post and sends it to api.ai for analysis. If the request is successful, we take the needed values from the JSON response and create an ApiAIResponse, which we pass back through the success handler. In any other case, we just return the error.
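Since the service code isn’t reproduced here, below is a sketch of the same shape in Swift. The struct, the two callback types and `extractProducts(fromText:success:failure:)` follow the names used in this post; the `QueryTransport` closure and `ApiAIServiceError` are hypothetical stand-ins for the ApiAI SDK request (which is where the clientAccessToken would be configured), injected so the sketch runs without the SDK or a network connection. The `result` → `parameters` nesting is an assumption about the v1 response layout:

```swift
// The two product lists the view controller cares about.
struct ApiAIResponse {
    let addedProducts: [String]
    let removedProducts: [String]
}

// Callback types named as in the post.
typealias SuccesfullApiAIResponseBlock = (ApiAIResponse) -> Void
typealias FailureApiAIResponseBlock = (Error) -> Void

// Hypothetical seam standing in for the ApiAI SDK request: it receives the text
// and reports the decoded JSON dictionary (or an error) asynchronously.
typealias QueryTransport = (String, @escaping (Result<[String: Any], Error>) -> Void) -> Void

enum ApiAIServiceError: Error {
    case missingParameters
}

final class ApiAIService {
    private let transport: QueryTransport

    init(transport: @escaping QueryTransport) {
        self.transport = transport
    }

    func extractProducts(fromText text: String,
                         success: @escaping SuccesfullApiAIResponseBlock,
                         failure: @escaping FailureApiAIResponseBlock) {
        transport(text) { outcome in
            switch outcome {
            case .success(let json):
                // Walk result -> parameters (assumed layout of the response).
                guard let resultSection = json["result"] as? [String: Any],
                      let parameters = resultSection["parameters"] as? [String: Any] else {
                    failure(ApiAIServiceError.missingParameters)
                    return
                }
                let added = parameters["AddProduct"] as? [String] ?? []
                let removed = parameters["RemoveProduct"] as? [String] ?? []
                success(ApiAIResponse(addedProducts: added, removedProducts: removed))
            case .failure(let error):
                failure(error)
            }
        }
    }
}
```

In the app, the transport closure would wrap the SDK’s text request; in a unit test it can simply return a canned dictionary, which makes both the success and the failure paths easy to exercise.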
Now let’s see the JSON we get from api.ai:
There is a lot of interesting information here. For example, the score property tells you how confident api.ai is in its resolution of the query. You can also see information about contexts, whether webhooks were used, what the fulfillment message is, and everything else we’ve discussed above.
What we are really interested in is the parameters section: we want to know what’s inside the “AddProduct” and “RemoveProduct” lists, so we can put them into the ApiAIResponse struct. To do this, we will add a new method:
Nothing special here: we just walk through the response dictionary until we reach the parameters section. Now that we are done with the service, let’s go back to the grocery list ViewController.
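Walking a `[String: Any]` dictionary works, but a typed alternative is to decode only the fields we need with `Decodable`. This is our suggestion rather than what the post’s code does; the `QueryResponse` types and `apiAIResponse(from:)` are hypothetical, and the `result` → `parameters` nesting is assumed from the api.ai v1 response layout. Everything else in the payload is ignored:

```swift
import Foundation

// Only the parts of the query response we need; all other fields are ignored.
struct QueryResponse: Decodable {
    struct ResultSection: Decodable {
        struct Parameters: Decodable {
            let addProduct: [String]?     // absent key decodes as nil
            let removeProduct: [String]?

            enum CodingKeys: String, CodingKey {
                case addProduct = "AddProduct"
                case removeProduct = "RemoveProduct"
            }
        }
        let parameters: Parameters
    }
    let result: ResultSection
}

struct ApiAIResponse {
    let addedProducts: [String]
    let removedProducts: [String]
}

/// Decodes raw response data into the app's ApiAIResponse.
func apiAIResponse(from data: Data) throws -> ApiAIResponse {
    let decoded = try JSONDecoder().decode(QueryResponse.self, from: data)
    return ApiAIResponse(addedProducts: decoded.result.parameters.addProduct ?? [],
                         removedProducts: decoded.result.parameters.removeProduct ?? [])
}

// Hand-written, trimmed payload for illustration (not a captured response).
let sample = Data(#"{"result": {"parameters": {"AddProduct": ["meat"], "RemoveProduct": []}}}"#.utf8)
do {
    let parsed = try apiAIResponse(from: sample)
    print(parsed.addedProducts)  // ["meat"]
} catch {
    print("Unexpected response: \(error)")
}
```

A decoding failure then surfaces as a thrown error that can be forwarded to the failure callback, instead of silently producing empty arrays.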
Since we now have proper handling of the transcribed text, it’s time to get rid of some of the improvisations we made earlier. Delete the SpeechHelper class and all its related arrays, like removalWords, sessionProducts and deletedProducts. The code becomes cleaner and more robust, with only one array that keeps the products displayed in the list (addedProducts). This also simplifies our startRecording method a lot; we just need to restart the timer there:
With the new implementation, we extract the products once the recording has finished. Apart from being cleaner, this is also faster than constantly iterating through the segments of the recognized text as they arrive. Let’s have a look at the extractProducts(fromText:) method.
It calls ApiAIService to get the products that need to be added and removed. The merging logic is similar to what we had before.
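The merge step itself isn’t shown, so here is a hypothetical version in Swift: new products are appended if they aren’t already on the list, and removed products are filtered out afterwards. Applying removals last is our own choice, so a sentence that both adds and removes the same item leaves it off the list; the function name and signature are assumptions:

```swift
/// Applies one set of recognized changes to the products shown in the list.
func merge(current: [String], added: [String], removed: [String]) -> [String] {
    var products = current
    for product in added where !products.contains(product) {
        products.append(product) // keep the existing order, skip duplicates
    }
    let removedSet = Set(removed)
    return products.filter { !removedSet.contains($0) }
}

var addedProducts = ["milk", "chicken"]
addedProducts = merge(current: addedProducts, added: ["meat"], removed: ["chicken"])
print(addedProducts) // ["milk", "meat"]
```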
That completes our grocery list app. You can now say something fancier, like this:
I hope you enjoyed the grocery list posts. You can find the implementation with ApiAI on GitHub.