Ever tried out Google Lens and seen the app just tell you everything the camera sees?

That’s exactly what Image Labelling is. You pass in an image and in return you get a list of FirebaseVisionLabel objects, each of which contains the label, the confidence (how sure ML Kit is of its correctness), and the entity ID, which references the entity’s entry in Google’s Knowledge Graph.

If this is your first time seeing Firebase ML Kit, I recommend you check out my introduction on it to get a nice little overview of the whole thing.

The process of running this in your app’s code is fairly simple, and it’s very similar to the other ML Kit operations.

Add the Dependencies and Metadata

Add these dependencies to your app-level build.gradle file.
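Something like this, assuming you’ve already connected your app to Firebase (the version numbers here are only examples — check the Firebase release notes for the current ones):

```gradle
dependencies {
    // ML Kit vision APIs
    implementation 'com.google.firebase:firebase-ml-vision:17.0.0'
    // The on-device image labelling model
    implementation 'com.google.firebase:firebase-ml-vision-image-label-model:15.0.0'
}
```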
Then add this meta-data tag to your AndroidManifest.xml file. This ensures the ML Kit data model is downloaded along with your app; otherwise the model will be downloaded when the operation is run for the first time, which tends to slow it down.
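The tag goes inside your application element — the "label" value tells Firebase which model to download at install time:

```xml
<application ...>
    <meta-data
        android:name="com.google.firebase.ml.vision.DEPENDENCIES"
        android:value="label" />
</application>
```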

Setting the Confidence Level

By using a FirebaseVisionLabelDetectorOptions you can set the minimum confidence level needed for an entity to be returned. By default, this value is 0.5.
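A minimal sketch — the 0.7 here is just an example threshold:

```java
FirebaseVisionLabelDetectorOptions options =
        new FirebaseVisionLabelDetectorOptions.Builder()
                .setConfidenceThreshold(0.7f) // only return labels with confidence >= 0.7
                .build();
```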

Get the FirebaseVisionImage

The first step to most ML Kit operations is to get a FirebaseVisionImage, which you can get from a Bitmap, a media.Image, a ByteBuffer, a byte[], or a file on the device.

From Bitmap

This is normally the simplest way to get a FirebaseVisionImage, but your image must be upright for it to work.
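Assuming you already have a Bitmap in hand (here called bitmap), it’s a one-liner:

```java
// The bitmap must already be upright — no rotation metadata is applied here
FirebaseVisionImage image = FirebaseVisionImage.fromBitmap(bitmap);
```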

From media.Image

This is what you’ll have when taking a photo using your device’s camera. You’ll need to get the angle by which the image must be rotated to be turned upright, given the device’s orientation while taking the photo, and calculate that against the default camera orientation of the device (which is 90 on most devices, but can be different for others).

The method that makes all those calculations is long, but it’s pretty copy-pastable. Once you have the rotation, pass it in along with the mediaImage to generate your FirebaseVisionImage.
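Here’s a sketch of that calculation, closely following the pattern in the Firebase documentation — cameraId is the ID you opened the camera with:

```java
private static final SparseIntArray ORIENTATIONS = new SparseIntArray();
static {
    // Map the device rotation to degrees, offset by the default camera orientation (90)
    ORIENTATIONS.append(Surface.ROTATION_0, 90);
    ORIENTATIONS.append(Surface.ROTATION_90, 0);
    ORIENTATIONS.append(Surface.ROTATION_180, 270);
    ORIENTATIONS.append(Surface.ROTATION_270, 180);
}

private int getRotationCompensation(String cameraId, Activity activity, Context context)
        throws CameraAccessException {
    // How far the device is rotated from its "natural" orientation
    int deviceRotation = activity.getWindowManager().getDefaultDisplay().getRotation();
    int rotationCompensation = ORIENTATIONS.get(deviceRotation);

    // How the camera sensor is mounted relative to the device
    CameraManager cameraManager =
            (CameraManager) context.getSystemService(Context.CAMERA_SERVICE);
    int sensorOrientation = cameraManager
            .getCameraCharacteristics(cameraId)
            .get(CameraCharacteristics.SENSOR_ORIENTATION);
    rotationCompensation = (rotationCompensation + sensorOrientation + 270) % 360;

    // Translate the degrees into one of ML Kit's rotation constants
    switch (rotationCompensation) {
        case 0:   return FirebaseVisionImageMetadata.ROTATION_0;
        case 90:  return FirebaseVisionImageMetadata.ROTATION_90;
        case 180: return FirebaseVisionImageMetadata.ROTATION_180;
        case 270: return FirebaseVisionImageMetadata.ROTATION_270;
        default:  return FirebaseVisionImageMetadata.ROTATION_0;
    }
}
```

With the rotation computed, `FirebaseVisionImage.fromMediaImage(mediaImage, rotation)` gives you the image.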

From ByteBuffer

You’ll need the method above (from media.Image) to get the rotation, then you can build the FirebaseVisionImage with the metadata of your image.
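A sketch, assuming an NV21 frame — the width and height here are example values and must match the image actually in the buffer:

```java
FirebaseVisionImageMetadata metadata = new FirebaseVisionImageMetadata.Builder()
        .setWidth(480)   // must match the buffered image's width
        .setHeight(360)  // must match the buffered image's height
        .setFormat(FirebaseVisionImageMetadata.IMAGE_FORMAT_NV21)
        .setRotation(rotation) // from the rotation-compensation method
        .build();

FirebaseVisionImage image = FirebaseVisionImage.fromByteBuffer(buffer, metadata);
```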

From File

This one comes down to a single call, but you’ll need to wrap it in a try-catch block.
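Something like this, with uri pointing at the file on the device:

```java
try {
    // fromFilePath reads the file and applies its rotation metadata for you
    FirebaseVisionImage image = FirebaseVisionImage.fromFilePath(context, uri);
    // ...run the detector with the image
} catch (IOException e) {
    e.printStackTrace();
}
```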

Instantiate a FirebaseVisionLabelDetector

The detector exposes the detectInImage method, which does the main work of the process. Pass in your options to set the confidence level, or leave the parameters empty if you’re fine with the default value.
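Both variants, sketched:

```java
// With the options from earlier:
FirebaseVisionLabelDetector detector = FirebaseVision.getInstance()
        .getVisionLabelDetector(options);

// Or with the default confidence threshold of 0.5:
FirebaseVisionLabelDetector defaultDetector = FirebaseVision.getInstance()
        .getVisionLabelDetector();
```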

Call detectInImage

Use the detector, call detectInImage, and add success and failure listeners. In the success listener you have access to a list of FirebaseVisionLabels from which you can get the information you need.
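A sketch of the whole call — the log tag is just a made-up example:

```java
detector.detectInImage(image)
        .addOnSuccessListener(labels -> {
            // labels is a List<FirebaseVisionLabel>
            for (FirebaseVisionLabel label : labels) {
                Log.d("ImageLabelling", label.getLabel());
            }
        })
        .addOnFailureListener(e ->
                Log.e("ImageLabelling", "Labelling failed", e));
```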

Get the Information from each label

Each FirebaseVisionLabel in the list represents a different entity detected in the image. There are 3 pieces of information you can get from each: the label, the confidence, and the entity ID.
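For each label in the list:

```java
for (FirebaseVisionLabel label : labels) {
    String text = label.getLabel();           // the human-readable label
    float confidence = label.getConfidence(); // between your threshold and 1.0
    String entityId = label.getEntityId();    // the Knowledge Graph entity ID
}
```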


I’ll be honest, this is the one part of ML Kit where I feel it could offer a little more. If we could get the coordinates of each entity (like we can in every other ML Kit operation), we could do some pretty sick stuff with that.

If you want to learn more about ML Kit operations like Text Recognition, Face Detection, and Barcode Scanning, check out the rest of my ML Kit series.