At the Google I/O conference, Google announced ML Kit, an exciting new framework for machine learning. This is another sign that machine learning and artificial intelligence will be everywhere and will shape the future of our everyday lives. Unlike Apple, Google has made this framework cross-platform, which means we can use it for both iOS and Android apps.
In this post, we will build an app that detects faces in a picture and determines whether the people in it are smiling. We will also check whether their eyes are open.
How the app works
Introducing ML Kit
You can find the documentation for ML Kit on Firebase here. Similar to Apple’s Core ML, ML Kit relieves developers of the burden of creating and training complex machine learning models. This means that you can focus on the business aspects of your apps, without having to know much about machine learning.
ML Kit can run locally on the device, as well as on the Firebase cloud. The benefits of running it on the device are that it doesn’t require an internet connection and it’s free. If you want to run ML Kit in the cloud, you will have to pay fees after some initial limited free usage. In return, the cloud services run on much larger, continuously updated data, which means better quality.
ML Kit currently supports several machine learning features that you can integrate directly into your apps. First, you can do text recognition, something that I have already done with Core ML. Next, you can do barcode scanning, image labelling (determining what’s in an image), landmark recognition, as well as face detection. As you might have guessed, we will use the face detection feature in this post. If none of these work for you, you can also integrate a custom model into the Firebase service, but that requires some machine learning expertise.
The face detection feature can do several cool things. First, it can recognize and locate facial features, like the coordinates of the eyes, ears, nose and so on, similar to Apple’s Vision framework. This is handy if you want to build apps that add a beard, glasses or something else to the face. Then, you can determine whether a person is smiling or has their eyes closed, which is what we need for our app. You can also track faces across video frames, which is very handy if you want to build a fancy chat application with video calls.
Setting up Firebase
ML Kit is part of the Firebase services, so to get started, you need to create a project on the Firebase console with your Google account.
After the project is created, follow the steps to add Firebase to your iOS app.
During the process, you will need to download the GoogleService-Info.plist file, which contains information about the Firebase services you use in your app. This file needs to be added to your project. You can always download an updated version of this file from the Settings page of your Firebase console.
Now, let’s switch to the iOS implementation. The UI of the app is pretty basic: a button to select an image, an image view displaying the selected image, and a label that shows what’s detected in the image.
Next, we will add Firebase with all the required Vision pods to our project. Create a Podfile with the following content and install the pods.
platform :ios, '9.0'
target 'SmileMLKit' do
  # Firebase and the ML Kit Vision pods for face detection go here
end
After you have added the GoogleService-Info.plist and installed the pods, you need to configure Firebase for your app. To do this, you need to call a method called (you’ve guessed it) configure.
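For reference, here is a minimal sketch of where that call usually lives. It assumes a standard UIKit app delegate and the Firebase umbrella module installed through the pods; the class layout is the default Xcode template, not necessarily the one used in this project.

```swift
import UIKit
import Firebase

@main
class AppDelegate: UIResponder, UIApplicationDelegate {

    var window: UIWindow?

    func application(_ application: UIApplication,
                     didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]?) -> Bool {
        // Sets up the default Firebase app using the values in GoogleService-Info.plist.
        FirebaseApp.configure()
        return true
    }
}
```

Calling it once at launch, before any other Firebase API is touched, is the safest place for it.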
Since we will be using the camera, don’t forget to add the required “Privacy – Camera Usage Description” entry (the NSCameraUsageDescription key) to your Info.plist file. We can now focus on coding.
When the user taps the “Select image” button, we need to present the standard iOS image picker (boilerplate code ahead warning).
When an image is selected, we present it in the image view and call the faceDetection method, where all the magic happens.
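The picker flow described above could look roughly like the sketch below. The class name, outlet name, action name and the faceDetection(fromImage:) parameter label are assumptions made for illustration, and the syntax is for a recent Swift version.

```swift
import UIKit

final class SmileViewController: UIViewController,
                                 UIImagePickerControllerDelegate,
                                 UINavigationControllerDelegate {

    // Hypothetical outlet, connected in the storyboard.
    @IBOutlet private weak var imageView: UIImageView!

    // Hypothetical action wired to the "Select image" button.
    @IBAction private func selectImageTapped(_ sender: UIButton) {
        let picker = UIImagePickerController()
        picker.delegate = self
        // Use the camera when there is one, otherwise fall back to the photo library (e.g. in the simulator).
        picker.sourceType = UIImagePickerController.isSourceTypeAvailable(.camera) ? .camera : .photoLibrary
        present(picker, animated: true)
    }

    func imagePickerController(_ picker: UIImagePickerController,
                               didFinishPickingMediaWithInfo info: [UIImagePickerController.InfoKey: Any]) {
        picker.dismiss(animated: true)
        guard let image = info[.originalImage] as? UIImage else { return }
        imageView.image = image
        faceDetection(fromImage: image)
    }

    func imagePickerControllerDidCancel(_ picker: UIImagePickerController) {
        picker.dismiss(animated: true)
    }

    private func faceDetection(fromImage image: UIImage) {
        // Placeholder: the actual detection is covered further down in the post.
    }
}
```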
Before we see this method, let’s see how to create the face detector, which will do all the work. To create it, we use the Vision API from Firebase and the method faceDetector(options:).
What options are available there? First, we can trade off detection speed against accuracy with the modeType. The more accurate mode is also slower but, as you might have noticed in the video above, detection is still very fast with the .accurate option. The landmarkType determines whether facial landmarks (eyes, ears, nose and so on) should be returned. We will not be using them, so you can put anything there. The classificationType must be set to .all, since classification is what produces the smiling and eyes-open probabilities.
The minFaceSize is the smallest desired face size. The size is expressed as a proportion of the width of the head to the image width. In our case, the smallest face to search for is roughly 10% of the width of the image being processed. We don’t need tracking in our case, so we will set the isTrackingEnabled flag to false.
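Putting the options together, the detector setup might look like the sketch below. The property names are the ones mentioned in this post, so they reflect the SDK as it was at the time of writing; later releases may name them differently. The makeFaceDetector function name is made up for this example.

```swift
import UIKit
import Firebase

/// Builds a face detector configured as described above (hypothetical helper name).
func makeFaceDetector() -> VisionFaceDetector {
    let options = VisionFaceDetectorOptions()
    options.modeType = .accurate          // favour accuracy over speed
    options.landmarkType = .all           // not used by the app, any value works
    options.classificationType = .all     // required for smiling / eyes-open probabilities
    options.minFaceSize = 0.1             // smallest face is roughly 10% of the image width
    options.isTrackingEnabled = false     // no tracking needed for still images
    return Vision.vision().faceDetector(options: options)
}
```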
Now, we can finally see the faceDetection method. The image that was picked with the image picker is used to create a VisionImage, the ML Kit wrapper that the face detector expects as input. In the result handler, we receive either an error or an array of detected faces.
If there’s an error, we show an alert. Otherwise, we get the face states and update the label with those states.
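As a rough sketch of that flow: the function below wraps the image, runs the detector and either shows an alert or hands the faces to a callback. The detect(in:) call is the detector API as it existed around the time of writing; the extra parameters (using, presentingErrorsOn, onFaces) are assumptions added so the example stands on its own, since in the app these would simply be properties of the view controller.

```swift
import UIKit
import Firebase

/// Runs face detection on `image` and reports the result on the main queue.
func faceDetection(fromImage image: UIImage,
                   using faceDetector: VisionFaceDetector,
                   presentingErrorsOn viewController: UIViewController,
                   onFaces: @escaping ([VisionFace]) -> Void) {
    // ML Kit works on its own image wrapper rather than on UIImage directly.
    let visionImage = VisionImage(image: image)

    faceDetector.detect(in: visionImage) { faces, error in
        // Hop to the main queue before touching any UI, whatever queue the callback arrives on.
        DispatchQueue.main.async {
            if let error = error {
                let alert = UIAlertController(title: "Detection failed",
                                              message: error.localizedDescription,
                                              preferredStyle: .alert)
                alert.addAction(UIAlertAction(title: "OK", style: .default))
                viewController.present(alert, animated: true)
                return
            }
            onFaces(faces ?? [])
        }
    }
}
```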
But what is a FaceState? It’s our custom struct, which keeps three booleans describing the state of the face (smiling, left eye open and right eye open).
With the faceStates(forDetectedFaces:) method, we extract the required information from ML Kit and fill our struct.
The detected faces are of type VisionFace. We go through these faces and check whether each one has a smiling, left eye open and right eye open probability. If it does, we check whether the probability is over some threshold (in our case, it’s 75%, but feel free to play with it). Once we have determined all the boolean values, we create our struct and append it to the face states array.
The easy part is displaying this information in the label: based on the face states, we build the string that is presented on the screen.
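To make the thresholding and the label text concrete, here is a small sketch in plain Swift. FaceProbabilities is a hypothetical stand-in for VisionFace (a nil value means the detector reported no probability), and the property names and label wording are assumptions, so this illustrates the logic rather than the exact code of the app.

```swift
import Foundation

/// The three booleans we keep for every detected face.
struct FaceState {
    let smiling: Bool
    let leftEyeOpen: Bool
    let rightEyeOpen: Bool
}

/// Hypothetical stand-in for the probabilities of a detected face; nil means "not reported".
struct FaceProbabilities {
    var smiling: Double?
    var leftEyeOpen: Double?
    var rightEyeOpen: Double?
}

/// Converts probabilities into booleans; a missing probability counts as false.
func faceStates(forDetectedFaces faces: [FaceProbabilities],
                threshold: Double = 0.75) -> [FaceState] {
    return faces.map { face in
        FaceState(smiling: (face.smiling ?? 0) > threshold,
                  leftEyeOpen: (face.leftEyeOpen ?? 0) > threshold,
                  rightEyeOpen: (face.rightEyeOpen ?? 0) > threshold)
    }
}

/// Builds the text for the label, one line per face.
func labelText(for states: [FaceState]) -> String {
    guard !states.isEmpty else { return "No faces detected" }
    return states.enumerated().map { pair -> String in
        let (index, state) = pair
        let smile = state.smiling ? "smiling" : "not smiling"
        let left = state.leftEyeOpen ? "open" : "closed"
        let right = state.rightEyeOpen ? "open" : "closed"
        return "Face \(index + 1): \(smile), left eye \(left), right eye \(right)"
    }.joined(separator: "\n")
}
```

In the app, the optional values would be filled from each VisionFace only when the detector says the corresponding probability is available.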
That’s pretty much everything that we need to do. Run the app, test it with some pictures and analyse the results.
Let’s try with an easy one. You can see that it provides the correct answer.
Next, let’s add Bono to the app, to see how it reacts to people with sunglasses.
As you can see, his eyes were detected as closed, since it’s impossible to tell what’s behind those dark glasses. By the way, the order of detection is based on who’s closer to the camera, not left to right.
Now let’s test with more people, some chaps in the UK. All smiling, with eyes wide open.
A more complex image might be a suited-up one, where different people feel differently about wearing that suit. Some of them are smiling, some of them are not. Again, the order is right to left, since the people on the right are closer to the camera.
As you can see, the results are pretty good. But if you test it with a selfie, where the face is the focus, it doesn’t detect anything. That is a bit strange, but probably the training set doesn’t include selfies for the smiling part.
In any case, you can see that it’s pretty easy to do this face detection. We haven’t even trained a model; we just focused on the Firebase setup and the integration in the mobile app.
Also, keep in mind that ML Kit is (at the time of writing) only 3 days old and the framework will change and improve in the future.
What do you think about ML Kit? Will you prefer it over Core ML? What about machine learning and AI in general and their integration in mobile apps? Write your thoughts in the comment section and keep on smiling! 🙂
You can find the complete source code for this app here.