A²I: Detectron—usage and features

  • August 06, 2018

Facebook AI Research (FAIR) open sourced its object detection platform, Detectron, in January 2018. Its distinguishing feature is instance segmentation: rather than only drawing a bounding box around each detected object, it can trace a pixel-level mask that follows the object's outline. Detectron is written in Python and powered by the Caffe2 deep learning framework.
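To make the difference between a box and a mask concrete, here is a minimal NumPy sketch. It does not use Detectron; the toy mask and the helper name mask_to_box are made up for illustration. It shows how much background a bounding box includes compared with the mask it was derived from.

```python
import numpy as np


def mask_to_box(mask):
    """Return the tightest (x1, y1, x2, y2) box around the nonzero pixels of a binary mask."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None  # empty mask: there are no object pixels to enclose
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


# Toy 6x6 "image" holding a diagonal, non-rectangular object (illustrative data).
mask = np.zeros((6, 6), dtype=np.uint8)
for i in range(1, 5):
    mask[i, i] = 1
    mask[i, i - 1] = 1

box = mask_to_box(mask)
box_area = (box[2] - box[0] + 1) * (box[3] - box[1] + 1)  # pixels covered by the box
mask_area = int(mask.sum())                               # pixels covered by the object

print("box:", box)
print("box area:", box_area, "mask area:", mask_area)
```

The box covers every pixel in its rectangle, while the mask counts only the pixels that belong to the object, which is the extra information a segmentation model provides.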


Detectron includes implementations of the following algorithms:

  • Mask R-CNN
  • RetinaNet
  • Faster R-CNN
  • Fast R-CNN


Detectron can be used for general object detection out of the box. It can also be trained on your own dataset by slightly modifying the training and inference files.


Note: Detectron currently has no CPU implementation; a GPU is required for inference as well as for training.

1. Install Caffe2 with CUDA support. If you already have Caffe2 installed, make sure to update it to a version that includes the Detectron module.

2. Install the Python dependencies and COCO API:

    # Quote the version specifiers so the shell does not treat ">" as a redirect
    pip install 'numpy>=1.13' 'pyyaml>=3.12' matplotlib 'opencv-python>=3.2' setuptools Cython mock scipy

    # COCOAPI=/path/to/clone/cocoapi
    git clone https://github.com/cocodataset/cocoapi.git $COCOAPI
    cd $COCOAPI/PythonAPI
    # Install into global site-packages
    make install
    # Alternatively, if you do not have permissions or prefer
    # not to install the COCO API into global site-packages
    python setup.py install --user

3. Clone the Detectron repository and set up Python modules:

    # DETECTRON=/path/to/clone/detectron
    git clone https://github.com/facebookresearch/detectron $DETECTRON
    cd $DETECTRON/lib && make

4. Check that the Detectron tests pass:

python2 $DETECTRON/detectron/tests/test_spatial_narrow_as_op.py

For detailed instructions, refer to the Detectron installation guide.
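Because a GPU is required, it can help to confirm that the installed Caffe2 build actually sees a CUDA device before going further. The sketch below relies on Caffe2's workspace.NumCudaDevices(); the wrapper function count_cuda_devices is a hypothetical helper added here so that the check fails gracefully when Caffe2 is not importable.

```python
def count_cuda_devices():
    """Return the number of CUDA devices Caffe2 can see, or None if Caffe2 is not importable."""
    try:
        from caffe2.python import workspace
    except ImportError:
        return None
    return workspace.NumCudaDevices()


num_gpus = count_cuda_devices()
if num_gpus is None:
    print("Caffe2 is not importable from this Python environment")
elif num_gpus > 0:
    print("Caffe2 sees %d CUDA device(s)" % num_gpus)
else:
    print("Caffe2 imported, but no CUDA devices were found")
```

A result of zero usually points to a CPU-only Caffe2 build or a driver problem, either of which would prevent Detectron from running.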

Getting Started

1. Inference using pre-trained models:

We can run inference on a single image file or on a directory of image files using the infer_simple.py tool. In this example, we're using an end-to-end trained Mask R-CNN model with a ResNet-101-FPN backbone from the model zoo (discussed later). Store your test images in a folder named input and run:

    python2 tools/infer_simple.py \
        --cfg configs/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml \
        --output-dir /inference/detectron-visualizations \
        --image-ext jpg \
        --wts https://s3-us-west-2.amazonaws.com/detectron/35861858/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml.02_32_51.SgT4y1cO/output/train/coco_2014_train:coco_2014_valminusminival/generalized_rcnn/model_final.pkl \
        input/

Detectron will automatically download the model from the URL specified by the --wts argument. This tool will output visualizations of the detections in PDF format in the directory specified by --output-dir.
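The way the last argument and --image-ext interact can be sketched in a few lines of Python. This is an illustration of the behavior described above, not the tool's source: the helpers collect_images and visualization_path are hypothetical, and the "image name plus .pdf" output naming is an assumption.

```python
import glob
import os
import tempfile


def collect_images(im_or_folder, image_ext="jpg"):
    """Expand a folder into its images with the given extension; pass a single file through."""
    if os.path.isdir(im_or_folder):
        return sorted(glob.glob(os.path.join(im_or_folder, "*." + image_ext)))
    return [im_or_folder]


def visualization_path(output_dir, image_path):
    """Assumed naming scheme: one PDF per image, named after the image file."""
    return os.path.join(output_dir, os.path.basename(image_path) + ".pdf")


# Build a throwaway input folder with two images and one unrelated file.
demo_dir = tempfile.mkdtemp()
for name in ("a.jpg", "b.jpg", "notes.txt"):
    open(os.path.join(demo_dir, name), "w").close()

images = collect_images(demo_dir, "jpg")
outputs = [visualization_path("/inference/detectron-visualizations", p) for p in images]
print(outputs)
```

Only files matching the extension are picked up, so a folder with mixed formats needs one run per extension.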

2. Training your own model:

This tiny tutorial shows you how to train a model on COCO. The model will be an end-to-end trained Faster R-CNN using a ResNet-50-FPN backbone. For this tutorial, we’ll use a short training schedule and a small input image size so that training and inference will be relatively fast.

    python2 tools/train_net.py \
        --cfg configs/getting_started/tutorial_1gpu_e2e_faster_rcnn_R-50-FPN.yaml \
        OUTPUT_DIR /path/to/output

Output (models, validation set detections, etc.) will be saved under OUTPUT_DIR. Training with multiple GPUs works the same way with a config written for that number of GPUs. Adding the --multi-gpu-testing flag instructs Detectron to also parallelize inference over multiple GPUs.
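In the training command, the trailing OUTPUT_DIR /path/to/output is a KEY VALUE pair that overrides a setting from the YAML config. The sketch below illustrates that idea on a plain nested dictionary. The function merge_from_list and the toy config values are hypothetical, and unlike a real config system it keeps every value as a string rather than decoding its type.

```python
def merge_from_list(cfg, opts):
    """Apply ['KEY', 'value', 'SECTION.KEY', 'value', ...] overrides to a nested dict in place."""
    if len(opts) % 2 != 0:
        raise ValueError("overrides must come in KEY VALUE pairs")
    for key, value in zip(opts[0::2], opts[1::2]):
        node = cfg
        parts = key.split(".")
        for part in parts[:-1]:
            node = node[part]  # a KeyError here means the section does not exist
        if parts[-1] not in node:
            raise KeyError("unknown config key: " + key)
        node[parts[-1]] = value  # stored as given; no type decoding in this sketch
    return cfg


# Toy defaults for illustration only, not Detectron's real defaults.
cfg = {"OUTPUT_DIR": "/tmp", "NUM_GPUS": 1, "TRAIN": {"WEIGHTS": ""}}

merge_from_list(cfg, ["OUTPUT_DIR", "/path/to/output", "TRAIN.WEIGHTS", "/path/to/weights.pkl"])
print(cfg["OUTPUT_DIR"], cfg["TRAIN"]["WEIGHTS"])
```

Rejecting unknown keys is a deliberate choice in this sketch: a misspelled override fails loudly instead of being silently ignored.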

Model Zoo

Detectron provides a large collection of baselines in its Model Zoo. The ones trained in late December 2017 are referred to as the “12 2017 baselines”. All configurations for these baselines are located in the configs/12_2017_baselines directory. Links to the trained models as well as their output are also provided. Please refer to the Model Zoo for information about common settings, training schedules, ImageNet pretrained models, RPN Proposal Baselines, Fast & Mask R-CNN Baselines Using Precomputed RPN Proposals, End-to-End Faster & Mask R-CNN Baselines, RetinaNet Baselines, Keypoint Detection Baselines, Person-Specific RPN Baselines and more.
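The --wts URL used in the inference example appears to encode a model id, the baseline set, and the config name. The sketch below splits that one URL into those parts. The layout is inferred from this single example and is an assumption rather than a documented contract, and describe_weights_url is a hypothetical helper.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2


def describe_weights_url(url):
    """Split a model URL into its parts, assuming the layout seen in the example URL."""
    parts = urlparse(url).path.strip("/").split("/")
    # Assumed layout: <bucket>/<model id>/<baseline set>/<config>.yaml.<run suffix>/.../<file>
    config_name = parts[3].split(".yaml")[0] + ".yaml"
    return {
        "model_id": parts[1],
        "baseline_set": parts[2],
        "config": "/".join(["configs", parts[2], config_name]),
        "filename": parts[-1],
    }


url = (
    "https://s3-us-west-2.amazonaws.com/detectron/35861858/12_2017_baselines/"
    "e2e_mask_rcnn_R-101-FPN_2x.yaml.02_32_51.SgT4y1cO/output/train/"
    "coco_2014_train:coco_2014_valminusminival/generalized_rcnn/model_final.pkl"
)
info = describe_weights_url(url)
print(info)
```

For this URL, the derived config path matches the --cfg value passed in the inference command, which is a handy cross-check that weights and config belong together.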