====== Object Detection ======
====== Introduction ======
An object detection algorithm takes in an image and outputs bounding boxes surrounding the objects of interest in the image. The example below shows three detected objects: a dog, a bicycle, and a car.
{{:cs:vision:object_detection:object_detection_example.jpg?400|}}
Before you can use an object detection algorithm, you must first predefine what sort of objects (aka classes) you want to be able to detect. Next, you must train the algorithm on example images that contain the objects you want it to learn. These images must be manually labeled by people, showing where the objects are in the image. In our experience, you probably need at least a few hundred example images for each class to get decent results.
After the algorithm has been trained on the example data, you can then use it to find objects in new images that it hasn't seen before! The process of using the object detection algorithm to find objects is also known as "inference".
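To make the idea of inference output concrete, here is a minimal sketch of what a detector hands back: a list of labeled, scored bounding boxes, which you usually filter by confidence. The ''Detection'' container, the ''keep_confident'' helper, the 0.5 threshold, and all of the numbers are made up for illustration; they are not part of any library.

```python
from collections import namedtuple

# Hypothetical container for one detection; field names are illustrative only.
Detection = namedtuple("Detection", ["label", "score", "box"])


def keep_confident(detections, min_score=0.5):
    """Return only the detections whose score is at least min_score."""
    return [d for d in detections if d.score >= min_score]


# Made-up example output. Boxes are (ymin, xmin, ymax, xmax), normalized to 0..1.
detections = [
    Detection("dog", 0.92, (0.35, 0.10, 0.95, 0.45)),
    Detection("bicycle", 0.81, (0.20, 0.15, 0.80, 0.75)),
    Detection("car", 0.30, (0.10, 0.60, 0.30, 0.90)),
]

for d in keep_confident(detections):
    print(d.label, d.score, d.box)
```

With the default threshold, the low-scoring car detection is dropped. In practice the threshold is something you tune against your own data.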
====== Tools ======
We currently use the [[https://github.com/tensorflow/models/tree/master/research/object_detection|Tensorflow object detection API]] (henceforth abbreviated as TFODA) for both training and inference. Previously we used the [[https://pjreddie.com/darknet/|Darknet framework]], but we found it rather difficult to use because the code is messy and depends on rather old versions of libraries. The TFODA is developed by more people, is easier to use, and is kept more up to date.
====== Setup ======
These instructions are for manual installation on your own computer. We really need to automate the installation of everything. Also, ideally we should be training on Kamiak, not our own computers.
Before you can start training, you first need to install tensorflow and the Tensorflow Object Detection API.
===== Tensorflow =====
Tensorflow can run either on your CPU or on an Nvidia GPU (unfortunately AMD support isn't ready at this time). If you have an Nvidia GPU, it's highly recommended to use it, since it is orders of magnitude faster. Instructions are adapted from [[https://www.tensorflow.org/install/install_linux | here]]. Once you have followed the CPU or GPU installation steps below, test your installation by opening python and running:
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))
==== CPU Installation ====
If you're running on CPU only, you simply need to run:
pip install --user tensorflow
==== GPU Installation ====
If you're running on an Nvidia GPU, you'll need to install Nvidia's CUDA and CuDNN libraries. Currently, Tensorflow requires CUDA 9.0 and CuDNN 7.0, so it's important that you install the right versions. To install tensorflow for GPU, run:
pip install --user tensorflow-gpu
=== CUDA Installation ===
You can download CUDA 9.0 for Ubuntu 16.04 from [[https://developer.nvidia.com/cuda-90-download-archive?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1604&target_type=debnetwork | here]]. To install, run:
sudo dpkg -i cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
sudo apt-key adv --fetch-keys \
http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
sudo aptitude update
sudo aptitude install cuda
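Since the CUDA version matters, it can be worth confirming what actually got installed. The sketch below parses the release number out of ''nvcc --version'' output. It assumes ''nvcc'' is on your PATH, which may not be the case by default (it usually lives under /usr/local/cuda/bin); the helper names are made up for this example.

```python
import re
import subprocess


def parse_cuda_release(nvcc_output):
    """Extract the release number (e.g. '9.0') from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def installed_cuda_release(nvcc="nvcc"):
    """Run nvcc and return its release string, or None if nvcc can't be run."""
    try:
        output = subprocess.check_output([nvcc, "--version"]).decode()
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_cuda_release(output)


release = installed_cuda_release()
if release is None:
    print("Could not run nvcc; is the CUDA bin directory on your PATH?")
elif release != "9.0":
    print("Found CUDA %s, but these instructions expect 9.0" % release)
else:
    print("CUDA 9.0 found")
```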
=== CuDNN Installation ===
To download CuDNN you'll need an Nvidia account. Create an account and download CuDNN [[https://developer.nvidia.com/rdp/cudnn-download | here]]. You'll want CuDNN 7.0.x for CUDA 9.0. Download and install both the developer and runtime libraries.
===== Object Detection API =====
Instructions were adapted from [[https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/installation.md | here]].
Install dependencies:
sudo aptitude install protobuf-compiler
pip install --user Cython pillow lxml jupyter matplotlib
Download the Object Detection API from GitHub. I'm putting it in ~/.local. You can put it wherever you'd like, but you'll have to change the rest of the instructions accordingly.
git clone git@github.com:tensorflow/models.git ~/.local/tensorflow_object_detection_api
Notice how long that took to clone? That's why we don't put binary files in our git repos!
Set up the API:
cd ~/.local/tensorflow_object_detection_api/research/
protoc object_detection/protos/*.proto --python_out=.
echo -e "# Tensorflow object detection api\nexport PYTHONPATH=\$PYTHONPATH:$(pwd):$(pwd)/slim" >> ~/.bashrc
source ~/.bashrc
Test it:
python ~/.local/tensorflow_object_detection_api/research/object_detection/builders/model_builder_test.py
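If the test fails with an import error, the most likely culprit is PYTHONPATH. Here is a minimal sketch that checks whether the ''research'' and ''research/slim'' directories are on it. It assumes the API was cloned to ~/.local/tensorflow_object_detection_api as above; ''missing_path_entries'' is a made-up helper, not part of the API.

```python
import os


def missing_path_entries(pythonpath, required):
    """Return the required directories that are absent from a PYTHONPATH string."""
    entries = {os.path.normpath(p) for p in pythonpath.split(os.pathsep) if p}
    return [r for r in required if os.path.normpath(r) not in entries]


# Assumed install location; change this if you cloned the API somewhere else.
research = os.path.expanduser("~/.local/tensorflow_object_detection_api/research")
required = [research, os.path.join(research, "slim")]

missing = missing_path_entries(os.environ.get("PYTHONPATH", ""), required)
if missing:
    print("Missing from PYTHONPATH: " + ", ".join(missing))
else:
    print("PYTHONPATH contains both required directories")
```

Run it from a fresh terminal, so that you are checking what ~/.bashrc sets up rather than whatever your current shell happens to have.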
====== Generating Training Data ======
To label our images, we use sloth. For more details, see (link). Sloth creates a JSON file that describes the labels for each image. The TFODA requires the example images and their labels to be packaged into a specific file format called a TFRecord.
We have a [[https://github.com/PalouseRobosub/vision_dev/blob/master/sloth_to_tfrecord.py|python script]] in our vision_dev repository for converting the sloth JSON format into TFRecord files. To create a TFRecord file, you need both the JSON file containing the labels and the images the JSON file refers to. Usage is shown below:
./sloth_to_tf_record.py
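To give a feel for what the conversion involves, here is a rough sketch of one step: turning a sloth pixel rectangle into the normalized box corners (values between 0 and 1) that the TFODA stores in a TFRecord. This is not our actual script. The JSON layout is an assumption based on sloth's usual rectangle format (''x'', ''y'', ''width'', ''height''), and the filename, the ''buoy'' class name, and the 640x480 image size are made up. A real converter would read the size from each image file.

```python
import json


def normalized_box(rect, image_width, image_height):
    """Convert a pixel rectangle (x, y, width, height) to normalized corners."""
    return {
        "xmin": rect["x"] / float(image_width),
        "xmax": (rect["x"] + rect["width"]) / float(image_width),
        "ymin": rect["y"] / float(image_height),
        "ymax": (rect["y"] + rect["height"]) / float(image_height),
    }


# Made-up sample in the assumed sloth layout: one image with one labeled rectangle.
sample = json.loads("""
[{"class": "image", "filename": "frame_0001.jpg",
  "annotations": [{"class": "buoy", "x": 160, "y": 120,
                   "width": 320, "height": 240}]}]
""")

for image in sample:
    for annotation in image["annotations"]:
        # Image size is hard-coded here purely for illustration.
        box = normalized_box(annotation, image_width=640, image_height=480)
        print(image["filename"], annotation["class"], box)
```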