I had a chance to play with TensorFlow for one of my projects. I had no machine learning experience before, and none with Python either (🔥). In this article I will explain how I trained my first model on a custom image set for object recognition. For some of you this article may be supremely mediocre, so adjust your expectations accordingly.

First of all, this tutorial was designed for UNIX systems, so if you are running Windows, the steps may be quite different.

Let's start!

The problem 🤔

I have a small sticky note with chessboard-style rectangular shapes. My imaginary task is to track/recognise this particular sticker in any given image. Here is the sticker for reference:

And the main idea is to find this sticker in any other image. Like this one:

Project setup 👩‍💻

For this task we need to install/download a couple of mandatory repositories/tools.

1. Python

TensorFlow's main API is Python, and about 99% of this tutorial relies on it. Please install the latest version (3.7 at the time of writing), since some of the supporting TensorFlow libraries are unavailable for versions older than 3.6.
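
If you are not sure which interpreter you are running, a small check like the one below can help. It is only a sketch: the 3.6 minimum comes from the note above, and the function name `python_is_supported` is made up for this example.

```python
import sys

# Minimum version taken from the note above; adjust it if your setup differs.
MIN_PYTHON = (3, 6)


def python_is_supported(version=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    if version is None:
        version = sys.version_info[:2]
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    status = "OK" if python_is_supported() else "too old"
    print("Python", sys.version.split()[0], status)
```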


2. TensorFlow

pip install tensorflow

This installs it through pip, Python's package manager. To get a different version, follow the link:


3. TensorFlow models

git clone https://github.com/tensorflow/models.git

Once everything is set up, it is time to prepare some test data!

Preparing test data

Our training data will be a set of labeled pictures described in CSV format. Simply put, we are using a supervised ML approach, where the model is trained on data with known correct outcomes. Imagine that we are going to show it a number of pictures with the sticker highlighted in each :)

For this purpose I used LabelImg, which has an easy UI for zone highlighting and XML export. Let's install it!

Go to https://github.com/tzutalin/labelImg and follow the installation instructions for your system. For me it was a Python 3 setup on macOS High Sierra (from the author):

brew install qt  # will install qt-5.x.x
brew install libxml2
make qt5py3
python3 labelImg.py

As a side note, if pyrcc5 or lxml is missing, try:

pip3 install pyqt5 lxml

Once the UI tool has started, we highlight the desired area (the sticker in our case) and save it as XML. In the end, your output file should look like:
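
LabelImg writes Pascal VOC style XML by default. As an illustration only, here is how such a file can be read with Python's standard library. The file name, image size, label and coordinates in `SAMPLE_XML` are made up, and `read_boxes` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Illustrative annotation in the Pascal VOC layout; all values are made up.
SAMPLE_XML = """
<annotation>
    <folder>images</folder>
    <filename>1.jpg</filename>
    <size>
        <width>800</width>
        <height>600</height>
        <depth>3</depth>
    </size>
    <object>
        <name>sticker</name>
        <bndbox>
            <xmin>120</xmin>
            <ymin>200</ymin>
            <xmax>260</xmax>
            <ymax>340</ymax>
        </bndbox>
    </object>
</annotation>
"""


def read_boxes(xml_text):
    """Return a list of (label, xmin, ymin, xmax, ymax) tuples."""
    root = ET.fromstring(xml_text.strip())
    boxes = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        boxes.append((
            obj.findtext("name"),
            int(box.findtext("xmin")),
            int(box.findtext("ymin")),
            int(box.findtext("xmax")),
            int(box.findtext("ymax")),
        ))
    return boxes


print(read_boxes(SAMPLE_XML))
```

Each `object` element is one highlighted zone, and `bndbox` holds its corners in pixels.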


Now we should repeat the same process... for every test picture (in my case I had only 5, so it was not a big deal).

What we have basically done is mark the sticker area, to highlight what we expect the model to recognise.

But there is another thing!

TensorFlow doesn't read these XML files directly, so we need to convert all the generated XML snippets into CSV first. Thankfully, there is already a short Python script on GitHub that converts the labeled XML to CSV:


You can run the script above in the same folder as all of your XML label files.

Don't forget to change the output file name from raccoon_labels.csv to training.csv.

The output, training.csv, is an aggregated version of all the XML files in a single CSV:
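
To make the aggregation concrete, here is a minimal sketch of this kind of conversion using only the standard library. It is not the script referenced above. The column names in `COLUMNS` and the function names are assumptions for illustration, and the XML is assumed to follow LabelImg's Pascal VOC layout.

```python
import csv
import glob
import os
import xml.etree.ElementTree as ET

# Assumed column layout: one row per labeled box.
COLUMNS = ["filename", "width", "height", "class",
           "xmin", "ymin", "xmax", "ymax"]


def xml_to_rows(xml_path):
    """Yield one CSV row per labeled object in a single annotation file."""
    root = ET.parse(xml_path).getroot()
    filename = root.findtext("filename")
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        yield [filename, width, height, obj.findtext("name"),
               int(box.findtext("xmin")), int(box.findtext("ymin")),
               int(box.findtext("xmax")), int(box.findtext("ymax"))]


def xml_dir_to_csv(xml_dir, csv_path):
    """Aggregate every *.xml file in xml_dir into one CSV; return row count."""
    count = 0
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(COLUMNS)
        for xml_path in sorted(glob.glob(os.path.join(xml_dir, "*.xml"))):
            for row in xml_to_rows(xml_path):
                writer.writerow(row)
                count += 1
    return count
```

Assuming the XML files were saved next to the pictures in an `images` folder, a call could look like `xml_dir_to_csv("images", "data/training.csv")`.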


Since we are already generating many files, let's agree on a folder structure:

└── data
    └── training.csv

Where training.csv is our CSV aggregation of XML labels.

What's next? CSV is awesome, but TensorFlow doesn't understand it either. We need to convert the aggregated dataset into a TensorFlow Record (TFRecord) 🔥

Converting CSV aggregated file into TensorFlow Record (TFR)

Luckily, there is already a script that transforms CSV into TFR:


This script accepts our CSV input path and TFR file output path. We can launch it by executing:

python generate_tfRecord.py --csv_input=data/training.csv --output_path=data/training.record

This record file will be fed into the training process.
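
Before anything is written to a record, conversion scripts of this kind typically group the CSV rows per image and scale the box coordinates to the 0..1 range, which is what the TensorFlow object detection tooling expects as far as I understand. The sketch below shows only that preparation step with the standard library. It does not write a TFRecord, and the column names, the sample row and the function name `group_and_normalize` are assumptions.

```python
import csv
import io
from collections import defaultdict


def group_and_normalize(csv_file):
    """Group CSV rows per image and scale box coordinates to 0..1."""
    grouped = defaultdict(list)
    for row in csv.DictReader(csv_file):
        width, height = float(row["width"]), float(row["height"])
        grouped[row["filename"]].append({
            "class": row["class"],
            "xmin": int(row["xmin"]) / width,
            "ymin": int(row["ymin"]) / height,
            "xmax": int(row["xmax"]) / width,
            "ymax": int(row["ymax"]) / height,
        })
    return dict(grouped)


# Made-up sample: one 800x600 picture with a single labeled box.
SAMPLE_CSV = (
    "filename,width,height,class,xmin,ymin,xmax,ymax\n"
    "1.jpg,800,600,sticker,200,150,400,300\n"
)

print(group_and_normalize(io.StringIO(SAMPLE_CSV)))
```

Grouping matters because one picture can contain several boxes, while a record holds one entry per picture.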

Now our directory looks like this:

└── data
    ├── training.csv
    └── training.record

Oh, almost forgot!

We will also need to create a testing record file, which is fed in together with the training record when we train the model. I was lazy enough to literally copy-paste the training dataset, but I definitely don't recommend doing that. In a perfect scenario the training dataset should be much, much bigger than the testing one.

And don't forget our images! We will need them... The final directory should look like this:
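
If you have enough pictures, a proper split is easy to sketch. The example below shuffles image names and sets a fraction aside for testing. The 20% fraction, the fixed seed and the function name `split_filenames` are arbitrary choices for illustration, not a recommendation from any library.

```python
import random


def split_filenames(filenames, test_fraction=0.2, seed=42):
    """Shuffle image names and split them into (training, testing) lists."""
    names = sorted(set(filenames))
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(names)
    if len(names) > 1:
        n_test = max(1, round(len(names) * test_fraction))
    else:
        n_test = 0
    return names[n_test:], names[:n_test]


train, test = split_filenames(["1.jpg", "2.jpg", "3.jpg", "4.jpg", "5.jpg"])
print(len(train), "training images,", len(test), "testing images")
```

Splitting by picture rather than by CSV row keeps all boxes of one image on the same side of the split.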

├── data
│   ├── testing.csv
│   ├── testing.record
│   ├── training.csv
│   └── training.record
└── images
    ├── 1.jpg
    ├── 2.jpg
    ├── 3.jpg
    ├── 4.jpg
    └── 5.jpg
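
A quick sanity check before training is to confirm that every picture mentioned in a CSV actually exists in the images folder. This is a small sketch under the assumption that the CSV has a `filename` column. `missing_images` is a hypothetical helper name.

```python
import csv
import os


def missing_images(csv_path, images_dir):
    """Return filenames listed in the CSV that are absent from images_dir."""
    with open(csv_path, newline="") as handle:
        listed = {row["filename"] for row in csv.DictReader(handle)}
    return sorted(
        name for name in listed
        if not os.path.isfile(os.path.join(images_dir, name))
    )
```

With the layout above, `missing_images("data/training.csv", "images")` should return an empty list.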

Alright, we are done with the data! 🎉 Let's train!

...in part II 🤞