TFJS image classification model maker using Google Colab and Google Drive
The premise of this approach is to create your model files in a way that is simple, straightforward, cheap, and easy to do. This simple Python script outputs model.json, the binary weight files (group1-shard1of2.bin, group1-shard1of3.bin…), and labels.txt. These files can be loaded in a TFJS (TensorFlow.js) app to test and run image classification. You can look at the documentation from the Google TensorFlow.js team for further information about this.
Both graph and Keras models can be created or converted from the SavedModel format, and creating a SavedModel is also the first part of this model maker conversion script.
A TFJS app can be a web app, a web-based desktop app (this can be done using ElectronJS), or a web-based mobile app. I created an Ionic mobile app to test image classification on your Android or iOS device, or even in your desktop web browser. You should enable your webcam if you use the latter. The tfjs-converter and the model makers by Google do not really include the creation of labels.txt in the process. I added that at the end of the script so that you won’t need to hard-code the model classes in the TFJS source code.
Model classes are just the names or kinds of images you are training for. For example, the classes for a flower model can be daisy, dandelion, roses, tulips, and sunflowers. So, basically, labels.txt contains the list of the model classes. If you use my Ionic app, you need to put this txt file in the same path as the model files. My code reads the list from labels.txt and uses it to name the object identified during image classification testing. You won’t need to change or understand the code. I created this Ionic testing app so that anyone can use it, not only developers. You just need to open it in an IDE like Visual Studio Code, drop your model files in the specified path, and then run the app.
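To give a rough idea of what the labels.txt step involves, here is a minimal sketch. It assumes the class names are simply the subfolder names of the unzipped dataset, sorted alphabetically so that line i of the file matches class index i. The function names write_labels and read_labels are made up for illustration and are not the ones used in the notebook.

```python
from pathlib import Path


def write_labels(dataset_dir, labels_path):
    """Write one class name per line, using subfolder names as classes."""
    # Sorted so that line i matches class index i (assumption: the
    # training loader also orders the class folders alphabetically).
    classes = sorted(p.name for p in Path(dataset_dir).iterdir() if p.is_dir())
    Path(labels_path).write_text("\n".join(classes) + "\n", encoding="utf-8")
    return classes


def read_labels(labels_path):
    """Return the class names from labels.txt, skipping blank lines."""
    text = Path(labels_path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]
```

At inference time the predicted class index then picks the name, for example read_labels(path)[index].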
You will need a Google account to use Google Colab and Google Drive. Google Colab is a free service offered by Google where you can run Python scripts and use machine learning libraries, taking advantage of their powerful hardware. Google Drive comes with 15GB of free storage. Google Drive is where you will store your images.zip (or any filename you prefer) as your input dataset, and it is also where your output model files will be saved once the script has completed successfully. Google Colab will connect to Google Drive, and you need to authenticate and give permission for it to do so. You can copy the Python script ‘Retraining_Image_Classifier_for_TFJS.ipynb’ I made for Google Colab here to a folder of your choice in your Google Drive. Just double click the file to open it in another browser tab. If it asks you to ‘Open with Google Colaboratory’, just click and proceed.
Google Colab is just a cloud-based Jupyter Notebook. The advantage of using it is that you don’t have to worry about the expensive hardware you would otherwise need to run your Python scripts. In addition, we often run into problems with library installation and hardware prerequisites on our own on-premise, local, or server machines. We don’t have to worry about that here.
The Python scripts in the .ipynb file can be grouped as follows:
- 1) You have the section for authenticating and giving permission to access your Google Drive. We need to read your zipped input file from your Google Drive and also save the output model files there.
- 2) You have the sections that create the SavedModel format first. Of course, this includes all the bells and whistles of creating a model using TensorFlow, such as setting up and importing the prerequisite libraries; unzipping the zipped image file to session storage to set up the input dataset; and defining, training, and creating the new SavedModel. The scripts here were based solely on this document. Note that the word ‘Retraining’ in the title indicates the use of transfer learning for image classification, and not training on the dataset from scratch, which takes much longer to execute. It reuses the pretrained feature-extraction layers (everything before the final classification layer), called ‘feature_extractor_layer’, of the already pretrained ‘mobilenet_v2_100_224’ or ‘inception_v3’ model. Use the former if you are testing on an IoT or mobile device. Inception_v3 creates bigger sets of model files, which can make inference slow on mobile devices. Tick ‘do_data_augmentation’ if you only have a small collection of images for your input dataset. It is explained here how data augmentation can generate several more images to train and test with. I would highly advise ticking ‘do_fine_tuning’ because this will give higher accuracy in the final output. It takes a little more time to finish, but it’s worth it.
- 3) The last part of the script converts the SavedModel format to the TFJS web format. The SavedModel files are created and stored in session storage, and the final TFJS web format model files are saved in your Google Drive. The script for model conversion was based on these documents here and here. I also added a small script to create labels.txt.
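The conversion in the last group can be sketched roughly as below. This is not the notebook’s exact cell. It assumes the tensorflowjs pip package is installed, that a graph model is the desired output, and that the SavedModel uses the usual ‘serve’ tag and ‘serving_default’ signature. The function names and any paths you pass in are hypothetical.

```python
import subprocess


def build_converter_command(saved_model_dir, output_dir):
    """Return the tensorflowjs_converter call as an argument list."""
    return [
        "tensorflowjs_converter",
        "--input_format=tf_saved_model",     # input is a TF2 SavedModel folder
        "--output_format=tfjs_graph_model",  # assumption: graph model output
        "--signature_name=serving_default",
        "--saved_model_tags=serve",
        saved_model_dir,
        output_dir,
    ]


def convert(saved_model_dir, output_dir):
    """Run the converter; needs `pip install tensorflowjs` beforehand."""
    subprocess.run(build_converter_command(saved_model_dir, output_dir), check=True)
```

Building the argument list separately from running it makes it easy to print and check the command before spending time on the conversion.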
The TFJS web format consists of model.json and the binary weight files (group1-shard1of2.bin, group1-shard1of3.bin…). You may ask why the binary weights are divided (sharded and grouped, as their filenames imply) into several files of up to 4MB each. The reason is that the TFJS web format is meant for browser use: by default, the converter splits the weights into 4MB shards so that browsers can cache them. So the reason is better web performance through caching.
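As a quick worked example of the sharding arithmetic, here is a small sketch. It assumes a single weight group (‘group1’) and the converter’s default shard size of 4MB (4194304 bytes). The 9MB figure in the usage note is only an illustration, not a measurement of any particular model.

```python
import math

DEFAULT_SHARD_BYTES = 4 * 1024 * 1024  # 4MB, the converter's default shard size


def estimate_shard_count(total_weight_bytes, shard_bytes=DEFAULT_SHARD_BYTES):
    """Number of shard files needed to hold the weights of one group."""
    return max(1, math.ceil(total_weight_bytes / shard_bytes))


def shard_names(total_weight_bytes, shard_bytes=DEFAULT_SHARD_BYTES):
    """File names in the group1-shardXofN.bin pattern for one weight group."""
    n = estimate_shard_count(total_weight_bytes, shard_bytes)
    return [f"group1-shard{i}of{n}.bin" for i in range(1, n + 1)]
```

For instance, 9MB of weights needs ceil(9 / 4) = 3 shards, so shard_names(9 * 1024 * 1024) lists group1-shard1of3.bin through group1-shard3of3.bin.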
Just a reminder: the Google Colab session storage only lasts up to 12 hours after the files are created there. After that, they are all deleted. That was the main reason why I decided to use Google Drive as the permanent storage for the output in the first place. The unzipped image files and the SavedModel files, which are both saved in session storage, are not the main focus of this topic and are not really needed for future use. Of course, if you need to get hold of the SavedModel for other uses, it is not hard to trace the script and redirect the saving to your Google Drive.
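If you do want to keep the SavedModel, one simple option is to copy its folder from session storage to the mounted Drive before the session expires. The sketch below assumes both locations are ordinary folders at that point. The function name backup_saved_model is hypothetical.

```python
import shutil
from pathlib import Path


def backup_saved_model(saved_model_dir, drive_dir):
    """Copy the SavedModel folder into a Drive folder and return the new path."""
    src = Path(saved_model_dir)
    dst = Path(drive_dir) / src.name
    # dirs_exist_ok lets a rerun overwrite an earlier backup (Python 3.8+).
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst
```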
The following are the detailed steps on how your model is created on your Google Drive:
Prepare the Image dataset
- 1) An image dataset is a main folder containing one subfolder per type of image, with each subfolder holding the images of that type. Each subfolder represents a class in the model that is to be created. I suggest collecting hundreds of images for each class.
- 2) Compress the dataset starting from its main folder. Name the zipped file whatever you prefer.
Later on, you will set variables in the ipynb script that point to the zipped file on your Google Drive.
- 3) Upload the zipped file to your designated path on your Google Drive. Please note that the zipped file should reside in the same path as ‘Retraining_Image_Classifier_for_TFJS.ipynb’.
If you do not have a set of images to try this script out with, you can download my sample zipped file here as a starter.
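Before uploading, it can help to check how many images each class has and to create the zip so that it keeps the main folder at its top level. The sketch below does both with the standard library only. The list of image extensions and the function names are my own assumptions, not something the notebook requires.

```python
import shutil
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}  # assumed formats


def count_images_per_class(dataset_dir):
    """Map each class subfolder name to the number of image files in it."""
    counts = {}
    for class_dir in sorted(Path(dataset_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1 for f in class_dir.iterdir() if f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts


def zip_dataset(dataset_dir, zip_stem):
    """Zip the dataset and return the path of the created .zip file."""
    dataset = Path(dataset_dir)
    # root_dir/base_dir keep the main folder name inside the archive, so
    # unzipping recreates <main folder>/<class>/<image>.
    return shutil.make_archive(
        str(zip_stem), "zip", root_dir=str(dataset.parent), base_dir=dataset.name
    )
```

A class with only a handful of images shows up right away in the counts, which is a cheap check before spending time on training.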
Prepare your .ipynb script
- 1) Copy my ‘Retraining_Image_Classifier_for_TFJS.ipynb’ script to your Google Drive. It is also available here.
- 2) Double click on this file to open it in Google Colab.
Run ‘Retraining_Image_Classifier_for_TFJS.ipynb’ script
This section is executed exclusively on Google Colab. In these steps, you adjust the Notebook settings and set the variable names and the various dropdown and checkbox options before running the script. When the script starts, you first give it permission to read from and write to your Google Drive. After a successful authentication, the script runs the remaining cells, which unzip your compressed image file into session storage and then define, train, and create the model. If everything goes well, your model is exported and saved back to your Google Drive.
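The first two stages of that flow, mounting Drive and unzipping the dataset into session storage, can be sketched as follows. drive.mount from google.colab only exists inside a Colab runtime, so it is imported lazily here. The ‘/content/drive’ mount point and the function names are assumptions for illustration, not the notebook’s exact code.

```python
import zipfile
from pathlib import Path


def mount_drive(mount_point="/content/drive"):
    """Mount Google Drive; only works inside a Colab runtime."""
    from google.colab import drive  # available only on Colab

    drive.mount(mount_point)  # prompts for authentication and permission


def unzip_dataset(zip_path, session_dir):
    """Extract the dataset zip into session storage and return that folder."""
    target = Path(session_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target
```

On Colab you would call mount_drive() first and then pass the path of your zipped file under the mounted Drive to unzip_dataset, with a session folder such as /content as the target.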
- 1) Change the hardware accelerator to GPU for better and faster performance. You can find this dialog box under Edit>Notebook settings or Runtime>Change runtime type.
- 2) Set the variables data_dir, base_path, and imagezipped. data_dir is the name of the main folder of the image dataset before it was compressed. base_path is the path on your Google Drive where the ipynb file and the compressed image file reside. imagezipped is the name of your compressed image file. There is also an option to set the epoch value, such as 10. An epoch is one full pass over the training images, during which backpropagation adjusts the model to correct the errors from earlier passes and so increase accuracy. The larger the epoch value, the more passes and thus the longer the training process.
- 3) Set the script options. You can choose between InceptionV3 and MobileNetV2 as your pretrained TF2 SavedModel module. Tick Do Data Augmentation to enlarge your image dataset with generated variations. Tick Do Fine Tuning for better performance and accuracy during training.
- 4) Run the script from Runtime>Run all. Alternatively, you can run each cell manually. Please note that the first cell handles the authentication to connect to Google Drive. You will need to give permission there before proceeding to the next cell.
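To show how the module and fine-tuning options might translate into code, here is a hedged sketch in the style of the TF Hub retraining tutorial. The handle URL pattern, the ‘/4’ version suffix, the dropout rate, and the function names are assumptions, not values read from the notebook. build_model also assumes a TensorFlow 2 and tensorflow_hub setup in which hub.KerasLayer can be placed in a Keras Sequential model. Both libraries are imported lazily so the rest of the snippet runs without them.

```python
# Input sizes: 224 px for MobileNetV2, 299 px for InceptionV3.
MODULES = {
    "mobilenet_v2_100_224": 224,
    "inception_v3": 299,
}


def module_config(name):
    """Return (TF Hub handle, image size) for a pretrained module name."""
    pixels = MODULES[name]
    # Assumed handle pattern and version; check TF Hub for the current one.
    handle = f"https://tfhub.dev/google/imagenet/{name}/feature_vector/4"
    return handle, (pixels, pixels)


def build_model(num_classes, name="mobilenet_v2_100_224", do_fine_tuning=False):
    """Stack a new classifier head on the pretrained feature extractor."""
    import tensorflow as tf  # imported lazily: heavy dependencies
    import tensorflow_hub as hub

    handle, image_size = module_config(name)
    return tf.keras.Sequential([
        tf.keras.layers.InputLayer(input_shape=image_size + (3,)),
        # trainable=True also updates the pretrained weights (fine-tuning).
        hub.KerasLayer(handle, trainable=do_fine_tuning),
        tf.keras.layers.Dropout(rate=0.2),
        tf.keras.layers.Dense(num_classes),  # one output per class
    ])
```

With do_fine_tuning left at False, only the small Dense head is trained, which is why the run is quicker but usually a little less accurate.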
Your TFJS web-format model files are now exported to your Google Drive
Once the ipynb script has run successfully, your model will have been created and exported to a new ‘output’ path on your Google Drive.
Notable observations on your Google Colab upon running the script
- 1) Your Google Drive is mounted on your session storage after you successfully give permission to read from and write to it.
- 2) Your image dataset is unzipped on your session storage. This will be the source of your image dataset for defining the model.
To view the session storage, click the Files icon (a manila folder icon) on the left pane and press the Refresh button on the toolbar to see the updates.
- 3) After the training section is done, your new SavedModel is created on session storage. This will be the basis for the conversion to the TFJS web format on your Google Drive.
- 4) View the accuracy reached when training the model. You can see it in the output log of the last epoch or in the Accuracy graph created by matplotlib. This is a significant part of running the script because the accuracy will dictate the performance and acceptance of your testing app when you finally deploy it and include the model files in the source code. If the accuracy is very low and the error is high, you would not want to proceed with the output model for deployment. You could review your input image dataset by adding more images or by checking for images filed under the wrong class. You can also tick ‘Do Fine Tuning’ in the script. Furthermore, there can be many reasons for low accuracy when training the model, and transfer learning may not be the right solution for it (you would otherwise train the model from scratch).
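As a small aid for the accuracy check in the last item, the sketch below summarizes the last epoch of a Keras-style history dictionary. It assumes the metric keys are ‘accuracy’ and ‘val_accuracy’, as in TF2 Keras when the model is compiled with an accuracy metric and validation data is supplied. The 0.9 threshold and the function name are arbitrary choices for illustration.

```python
def summarize_history(history, min_val_accuracy=0.9):
    """Summarize the last epoch of a Keras-style history dict."""
    train_acc = history["accuracy"][-1]
    val_acc = history["val_accuracy"][-1]
    return {
        "train_accuracy": train_acc,
        "val_accuracy": val_acc,
        # A large gap between training and validation accuracy hints at
        # overfitting.
        "gap": round(train_acc - val_acc, 4),
        "acceptable": val_acc >= min_val_accuracy,
    }
```

You would pass in the history attribute of the object returned by model.fit, and only move on to deployment when the summary looks acceptable for your use case.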
Other related links:
- Test your model on a desktop or mobile device using my Ionic app.
- If you want a TFLite image classification model maker, please check out ‘TFLite image classification model maker using Google Colab and Google Drive’. It creates the model files model.tflite and labels.txt.
Consider donating via PayPal if you appreciate the effort that went into creating this article.