Ever tried out Google Lens and saw the app just tell you everything the camera sees?
That’s exactly what Image Labelling is. You pass in an image and in return you get a list of FirebaseVisionLabel objects, each of which contains the label text, the confidence (how sure ML Kit is of its correctness), and the entity ID, which references the entity’s ID in Google’s Knowledge Graph.
If this is your first time seeing Firebase ML Kit, I recommend checking out my introduction to it to get a nice little overview of the whole thing.
The process of running this in your app’s code is fairly simple, and it’s very similar to the other ML Kit operations.
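Before walking through each step, here’s a rough end-to-end sketch of what we’re building towards. It assumes the code lives inside an Activity that already has a Bitmap called bitmap, and that your project enables Java 8 language features for the lambdas; each piece is broken down in the sections below.

// A minimal sketch of the whole flow; each step is explained below.
FirebaseVisionImage image = FirebaseVisionImage.fromBitmap(bitmap);
FirebaseVisionLabelDetector detector = FirebaseVision.getInstance().getVisionLabelDetector();
detector.detectInImage(image)
        .addOnSuccessListener(labels -> {
            for (FirebaseVisionLabel label : labels) {
                Log.d("ImageLabelling", label.getLabel() + " (" + label.getConfidence() + ")");
            }
        })
        .addOnFailureListener(e -> Log.e("ImageLabelling", "Labelling failed", e));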
Add the Dependencies and Metadata
// In your app-level build.gradle
implementation 'com.google.firebase:firebase-ml-vision:16.0.0'
implementation 'com.google.firebase:firebase-ml-vision-image-label-model:15.0.0'
<application ...>
    ...
    <!-- Have the labelling model downloaded at install time instead of on first use -->
    <meta-data
        android:name="com.google.firebase.ml.vision.DEPENDENCIES"
        android:value="label" />
    <!-- To use multiple models: android:value="label,model2,model3" -->
</application>
Setting the Confidence Level
// Only return labels with a confidence of 0.8 or higher
FirebaseVisionLabelDetectorOptions options =
        new FirebaseVisionLabelDetectorOptions.Builder()
                .setConfidenceThreshold(0.8f)
                .build();
Get the FirebaseVisionImage
The first step in most ML Kit operations is to get a FirebaseVisionImage, which you can create from a Bitmap, a media.Image, a ByteBuffer, a byte array, or a file on the device.
From Bitmap
// The bitmap must already be upright; no extra rotation is applied
FirebaseVisionImage image = FirebaseVisionImage.fromBitmap(bitmap);
From media.Image
You’d use this when taking a photo with your device’s camera. You’ll need to work out the angle by which the image must be rotated to appear upright, based on the device’s orientation when the photo was taken and the device’s default camera sensor orientation (90 degrees on most devices, though it can differ).
private static final SparseIntArray ORIENTATIONS = new SparseIntArray();
static {
    ORIENTATIONS.append(Surface.ROTATION_0, 90);
    ORIENTATIONS.append(Surface.ROTATION_90, 0);
    ORIENTATIONS.append(Surface.ROTATION_180, 270);
    ORIENTATIONS.append(Surface.ROTATION_270, 180);
}

private int getRotationCompensation(String cameraId) throws CameraAccessException {
    // Look up the compensation for the device's current rotation,
    // assuming a default camera orientation of 90 degrees.
    int deviceRotation = getWindowManager().getDefaultDisplay().getRotation();
    int rotationCompensation = ORIENTATIONS.get(deviceRotation);

    // Combine that with the camera sensor's actual orientation.
    CameraManager cameraManager = (CameraManager) getSystemService(CAMERA_SERVICE);
    int sensorOrientation = cameraManager
            .getCameraCharacteristics(cameraId)
            .get(CameraCharacteristics.SENSOR_ORIENTATION);
    rotationCompensation = (rotationCompensation + sensorOrientation + 270) % 360;

    // Return the corresponding FirebaseVisionImageMetadata rotation value.
    int result;
    switch (rotationCompensation) {
        case 0:
            result = FirebaseVisionImageMetadata.ROTATION_0;
            break;
        case 90:
            result = FirebaseVisionImageMetadata.ROTATION_90;
            break;
        case 180:
            result = FirebaseVisionImageMetadata.ROTATION_180;
            break;
        case 270:
            result = FirebaseVisionImageMetadata.ROTATION_270;
            break;
        default:
            result = FirebaseVisionImageMetadata.ROTATION_0;
            Log.e(LOG_TAG, "Bad rotation value: " + rotationCompensation);
    }
    return result;
}

private void someOtherMethod() throws CameraAccessException {
    // cameraId and mediaImage come from your camera setup
    int rotation = getRotationCompensation(cameraId);
    FirebaseVisionImage image = FirebaseVisionImage.fromMediaImage(mediaImage, rotation);
}
It’s a long method to make all those calculations, but it’s pretty copy-pastable. You can then pass the mediaImage and the rotation to generate your FirebaseVisionImage.
From ByteBuffer
// The metadata must describe the image in the buffer: its dimensions,
// its format (NV21 here), and the rotation calculated as shown above
FirebaseVisionImageMetadata metadata = new FirebaseVisionImageMetadata.Builder()
        .setWidth(1280)
        .setHeight(720)
        .setFormat(FirebaseVisionImageMetadata.IMAGE_FORMAT_NV21)
        .setRotation(rotation)
        .build();
FirebaseVisionImage image = FirebaseVisionImage.fromByteBuffer(buffer, metadata);
From File
// fromFilePath throws an IOException, so call it inside a try/catch
FirebaseVisionImage image = FirebaseVisionImage.fromFilePath(context, uri);
Instantiate a FirebaseVisionLabelDetector
// Pass in the options from earlier; calling getVisionLabelDetector() with no
// arguments uses the default confidence threshold instead
FirebaseVisionLabelDetector detector = FirebaseVision.getInstance()
        .getVisionLabelDetector(options);
Call detectInImage
Task<List<FirebaseVisionLabel>> result = detector.detectInImage(image)
        .addOnSuccessListener(
                new OnSuccessListener<List<FirebaseVisionLabel>>() {
                    @Override
                    public void onSuccess(List<FirebaseVisionLabel> labels) {
                        // Task completed successfully
                        // ...
                    }
                })
        .addOnFailureListener(
                new OnFailureListener() {
                    @Override
                    public void onFailure(@NonNull Exception e) {
                        // Task failed with an exception
                        // ...
                    }
                });
Get the Information from Each Label
for (FirebaseVisionLabel label : labels) {
    String text = label.getLabel();           // human-readable description of the label
    String entityId = label.getEntityId();    // the Knowledge Graph entity ID
    float confidence = label.getConfidence(); // value between 0 and 1
}
Conclusion
I’ll be honest, this is the one part of ML Kit where I feel like it could offer a little more. If we could get the coordinates of each entity (like we do in most other ML Kit operations), we could do some pretty sick stuff with that.
If you want to learn more about ML Kit operations like Text Recognition, Face Detection, and Barcode Scanning, check out the rest of my ML Kit series.