Here’s the second part of the ML Kit series and it’s going to be Face Detection! You pass in an image and you can get the coordinates of each face’s eyes, ears, etc. and recognise facial expressions like people’s sweet smiles!

Or you can pass in a video then track and manipulate people’s faces in real-time (in the video of course, real life will have to wait). We won’t dive too much into the video part of it yet (I’ll try to get this updated ASAP to include it) so we’ll be focusing on image face detection for now.

If this is the first time you’ve heard about the Firebase ML Kit, check out its introduction here.

Add the Dependencies and Metadata

As with any other Firebase Service, we’ll start by importing this dependency which is the same one used for all the ML Kit features.
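In your app-level build.gradle it looks something like this (the exact version number isn’t fixed here; grab the latest one from the Firebase release notes):

```groovy
dependencies {
    // One dependency covers all of the ML Kit Vision features
    implementation 'com.google.firebase:firebase-ml-vision:<latest-version>'
}
```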
Although this is optional, it’s highly recommended to add this to your AndroidManifest.xml as well. Doing so will have the machine learning model downloaded along with your app from the Play Store. Otherwise, the model will be downloaded during the first ML request you make, and you won’t get any results from ML operations until that download completes.
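The meta-data tag goes inside your application tag, with android:value naming the models to pre-download (here, the face model):

```xml
<!-- inside the <application> element of AndroidManifest.xml -->
<meta-data
    android:name="com.google.firebase.ml.vision.DEPENDENCIES"
    android:value="face" />
```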

Configuring Face Detection Settings

There are a few settings you might want to configure based on your app’s needs. I’m just going to rip this table straight off of the official docs.

Detection mode FAST_MODE (default) | ACCURATE_MODE

Favor speed or accuracy when detecting faces.

Detect landmarks NO_LANDMARKS (default) | ALL_LANDMARKS

Whether or not to attempt to identify facial “landmarks”: eyes, ears, nose, cheeks, mouth.


Classify faces NO_CLASSIFICATIONS (default) | ALL_CLASSIFICATIONS

Whether or not to classify faces into categories such as “smiling”, and “eyes open”.

Minimum face size FLOAT (default: 0.1f)

The minimum size, relative to the image, of faces to detect.

Enable face tracking false (default) | true

Whether or not to assign faces an ID, which can be used to track faces across images.

And here’s an example, also ripped straight off of the official docs.
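The setters below correspond to the rows of the table above (this matches the firebase-ml-vision API at the time of writing; later SDK versions renamed some of these, e.g. setModeType became setPerformanceMode):

```kotlin
val options = FirebaseVisionFaceDetectorOptions.Builder()
    .setModeType(FirebaseVisionFaceDetectorOptions.ACCURATE_MODE)             // favor accuracy over speed
    .setLandmarkType(FirebaseVisionFaceDetectorOptions.ALL_LANDMARKS)         // eyes, ears, nose, cheeks, mouth
    .setClassificationType(FirebaseVisionFaceDetectorOptions.ALL_CLASSIFICATIONS) // smiling, eyes open
    .setMinFaceSize(0.15f)               // ignore faces smaller than 15% of the image width
    .setTrackingEnabled(true)            // assign IDs to track faces across images
    .build()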

Create the FirebaseVisionImage

Here’s where my interpretation of the article starts differing from the official docs. This first step, though, is identical to that of Text Recognition, so if you’ve already read how to create a FirebaseVisionImage there, this step will be EXACTLY the same.

This object will prepare the image for ML Kit processing. You can make a FirebaseVisionImage from a bitmap, media.Image, ByteBuffer, byte array, or a file on the device.

From Bitmap

The simplest way to do it, and it works as long as your image is upright.
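A minimal sketch, assuming you already have a bitmap variable in scope:

```kotlin
// Works as long as the bitmap is already upright
val image = FirebaseVisionImage.fromBitmap(bitmap)
```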

From media.Image

Such as when taking a photo with your device’s camera. You’ll need to get the angle by which the image must be rotated to be turned upright, given the device’s orientation while taking the photo, and calculate that against the default camera orientation of the device (which is 90 on most devices, but can differ on others).

It’s a long method to make all those calculations, but it’s pretty copy-pastable. Then you can pass in the mediaImage and the rotation to generate your FirebaseVisionImage.
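A condensed Kotlin sketch of that calculation, adapted from the official docs (cameraId, activity, context, and mediaImage are assumed to be available in your camera code):

```kotlin
// Maps a device rotation (Surface.ROTATION_*) to degrees, offset by the
// default rear-camera orientation of 90 degrees
private val ORIENTATIONS = SparseIntArray().apply {
    append(Surface.ROTATION_0, 90)
    append(Surface.ROTATION_90, 0)
    append(Surface.ROTATION_180, 270)
    append(Surface.ROTATION_270, 180)
}

@Throws(CameraAccessException::class)
private fun getRotationCompensation(cameraId: String, activity: Activity, context: Context): Int {
    val deviceRotation = activity.windowManager.defaultDisplay.rotation
    var rotationCompensation = ORIENTATIONS.get(deviceRotation)

    // Combine with the actual sensor orientation of this camera
    val cameraManager = context.getSystemService(Context.CAMERA_SERVICE) as CameraManager
    val sensorOrientation = cameraManager
        .getCameraCharacteristics(cameraId)
        .get(CameraCharacteristics.SENSOR_ORIENTATION)!!
    rotationCompensation = (rotationCompensation + sensorOrientation + 270) % 360

    // Translate degrees into the constants ML Kit expects
    return when (rotationCompensation) {
        0 -> FirebaseVisionImageMetadata.ROTATION_0
        90 -> FirebaseVisionImageMetadata.ROTATION_90
        180 -> FirebaseVisionImageMetadata.ROTATION_180
        270 -> FirebaseVisionImageMetadata.ROTATION_270
        else -> FirebaseVisionImageMetadata.ROTATION_0
    }
}

// Then:
val rotation = getRotationCompensation(cameraId, activity, context)
val image = FirebaseVisionImage.fromMediaImage(mediaImage, rotation)
```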

From ByteBuffer

You’ll need the above (from media.Image) rotation method as well, on top of having to build the FirebaseVisionImage with the metadata of your image.
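A sketch, assuming a buffer holding NV21 image data; the width, height, and rotation values here are placeholders you’d replace with your image’s actual properties:

```kotlin
val metadata = FirebaseVisionImageMetadata.Builder()
    .setWidth(1280)    // actual width of the image in the buffer
    .setHeight(720)    // actual height of the image in the buffer
    .setFormat(FirebaseVisionImageMetadata.IMAGE_FORMAT_NV21)
    .setRotation(rotation) // from the rotation method in the media.Image step
    .build()

val image = FirebaseVisionImage.fromByteBuffer(buffer, metadata)
```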

From File

Simple to present here in one line, but you’ll be wrapping this in a try-catch block.
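Something like this, assuming a uri pointing at the image file on the device:

```kotlin
val image: FirebaseVisionImage
try {
    image = FirebaseVisionImage.fromFilePath(context, uri)
} catch (e: IOException) {
    // The file couldn't be opened or decoded
    e.printStackTrace()
    return
}
```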

Instantiate a FirebaseVisionFaceDetector

The actual face detection method belongs to this object.
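Pass in the options you configured earlier:

```kotlin
val detector = FirebaseVision.getInstance()
    .getVisionFaceDetector(options)
```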

Call detectInImage

Use the detector: call detectInImage, pass in the image, and add success and failure listeners. The success listener gives you a list of FirebaseVisionFaces to work with.
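A sketch of the call, using the image and detector from the previous steps:

```kotlin
detector.detectInImage(image)
    .addOnSuccessListener { faces ->
        // faces is a List<FirebaseVisionFace>
        for (face in faces) {
            // handle each detected face here
        }
    }
    .addOnFailureListener { e ->
        // Detection failed, e.g. the model hasn't finished downloading yet
        e.printStackTrace()
    }
```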

What you can do with each FirebaseVisionFace

Here’s where the fun begins… All the following code assumes you loop through the FirebaseVisionFaces and are currently handling an object called face.

Get Face coordinates and rotation
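Each face gives you a bounding box plus two rotation angles:

```kotlin
val bounds = face.boundingBox     // Rect bounding the face within the image
val rotY = face.headEulerAngleY   // head is rotated rotY degrees to the right
val rotZ = face.headEulerAngleZ   // head is tilted rotZ degrees sideways
```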

Get Facial Landmark Positions (Requires Landmark Detection enabled)
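A landmark comes back null if it wasn’t detected, so check before using it; LEFT_EAR below is just one of the available landmark constants:

```kotlin
val leftEar = face.getLandmark(FirebaseVisionFaceLandmark.LEFT_EAR)
leftEar?.let { ear ->
    val position = ear.position // a FirebaseVisionPoint with x and y coordinates
}
```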

Identify Facial Expressions (Requires Face Classification enabled)
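Classification results are probabilities, and come back as UNCOMPUTED_PROBABILITY when the model couldn’t compute them, so guard first:

```kotlin
if (face.smilingProbability != FirebaseVisionFace.UNCOMPUTED_PROBABILITY) {
    val smileProb = face.smilingProbability // 0.0 to 1.0
}
if (face.rightEyeOpenProbability != FirebaseVisionFace.UNCOMPUTED_PROBABILITY) {
    val rightEyeOpenProb = face.rightEyeOpenProbability
}
```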

Get Face Tracking ID (Requires Face Tracking enabled)
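Similarly, guard against the invalid-ID sentinel:

```kotlin
if (face.trackingId != FirebaseVisionFace.INVALID_ID) {
    val id = face.trackingId // stays the same for this face across images
}
```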