Commit ca4313d9 by Ting PAN

Update version to 0.6.0a

1 parent efc0106a
Showing with 1925 additions and 2440 deletions
@@ -9,4 +9,3 @@ ignore = E741, # ambiguous variable name
W504, # line break after binary operator
# module imported but unused
per-file-ignores = __init__.py: F401
exclude = seetadet/utils/pycocotools
@@ -2,25 +2,9 @@
## Introduction
### ImageNet Pretrained Models
### Pretrained Models
#### ResNet Models
- [R-50.pkl](https://dragon.seetatech.com/download/models/seetadet/imagenet/R-50.pkl)
- [R-101.pkl](https://dragon.seetatech.com/download/models/seetadet/imagenet/R-101.pkl)
#### VGG Models
- [VGG16.SSD.pkl](https://dragon.seetatech.com/download/models/seetadet/imagenet/VGG16.SSD.pkl)
#### MobileNet Models
- [MobileNetV2.pkl](https://dragon.seetatech.com/download/models/seetadet/imagenet/MobileNetV2.pkl)
- [ProxylessMobile.pkl](https://dragon.seetatech.com/download/models/seetadet/imagenet/ProxylessMobile.pkl)
#### AirNet Models
- [AirNet.pkl](https://dragon.seetatech.com/download/models/seetadet/imagenet/AirNet.pkl)
Please refer to [Pretrained Models](data/pretrained/README.md) for details.
## Baselines
@@ -7,10 +7,6 @@ while the style of codes is torch.
The torch-style code helps us simplify the hierarchical pipelines of modern detection.
## Requirements
seeta-dragon >= 0.3.0.dev20201024
## Installation
### Build From Source
@@ -57,35 +53,23 @@ python test.py --cfg <MODEL_YAML> --exp_dir <EXP_DIR> --iter <ITERATION>
```
Or
```bash
cd tools
python test_all.py --cfg <MODEL_YAML> --exp_dir <EXP_DIR> --last 1
```
### Export a detection model to ONNX
```bash
cd tools
python export.py --cfg <MODEL_YAML> --exp_dir <EXP_DIR> --iter <ITERATION>
```
### Serve a detection model
```bash
cd tools
python serve.py --cfg <MODEL_YAML> --model_dir <MODEL_DIR>
```
## Benchmark and Model Zoo
Results and models are available in the [Model Zoo](MODEL_ZOO.md).
### Supported Backbones
- [ResNet](MODEL_ZOO.md#resnet-models)
- [VGG](MODEL_ZOO.md#vgg-models)
- [MobileNet](MODEL_ZOO.md#mobilenet-models)
- [AirNet](MODEL_ZOO.md#airnet-models)
### Supported Algorithms
- [Faster R-CNN](configs/faster_rcnn)
- [Mask R-CNN](configs/mask_rcnn)
- [SSD](configs/ssd)
- [RetinaNet](configs/retinanet)
## License
[BSD 2-Clause license](LICENSE)
@@ -14,13 +14,7 @@
## COCO Object Detection Baselines
| Model | Lr sched | Infer time (s/im) | box AP | Download |
| :---: | :------: | :---------------: | :----: | :------: |
| [R-50-FPN-800](coco_faster_rcnn_R-50-FPN_800_1x.yml) | 1x | 0.046 | 38.3 | [model](https://dragon.seetatech.com/download/models/seetadet/faster_rcnn/coco_faster_rcnn_R-50-FPN_800_1x/model_final.pkl) |
| [R-50-FPN-800](coco_faster_rcnn_R-50-FPN_800_2x.yml) | 2x | 0.046 | 39.7 | [model](https://dragon.seetatech.com/download/models/seetadet/faster_rcnn/coco_faster_rcnn_R-50-FPN_800_2x/model_final.pkl) |

| Model | Lr sched | Infer time (fps) | box AP | Download |
| :---: | :------: | :--------------: | :----: | :------: |
| [R-50-FPN](coco_faster_rcnn_R_50_FPN_1x.yml) | 1x | 27.78 | 38.4 | [model](https://dragon.seetatech.com/download/seetadet/faster_rcnn/coco_faster_rcnn_R_50_FPN_1x/model_adb024b6.pkl) &#124; [log]() |
| [R-50-FPN](coco_faster_rcnn_R_50_FPN_2x.yml) | 2x | 27.78 | 39.8 | [model](https://dragon.seetatech.com/download/seetadet/faster_rcnn/coco_faster_rcnn_R_50_FPN_2x/model_9a8c9ae5.pkl) &#124; [log]() |

## Pascal VOC Object Detection Baselines
| Model | Infer time (s/im) | AP@0.5 | Download |
| :---: | :---------------: | :----: | :------: |
| [R-50-FPN-640](voc_faster_rcnn_R-50-FPN_640.yml) | 0.030 | 80.8 | [model](https://dragon.seetatech.com/download/models/seetadet/faster_rcnn/voc_faster_rcnn_R-50-FPN_640_1x/model_final.pkl) |
NUM_GPUS: 8
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: 'faster_rcnn'
BACKBONE: 'resnet50.fpn'
PRECISION: 'float16'
CLASSES: ['__background__',
'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
@@ -19,27 +17,30 @@ MODEL:
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
BACKBONE:
TYPE: 'resnet50.fpn'
FPN:
MIN_LEVEL: 2
MAX_LEVEL: 6
ANCHOR_GENERATOR:
STRIDES: [4, 8, 16, 32, 64]
SOLVER:
BASE_LR: 0.02
DECAY_STEPS: [60000, 80000]
MAX_STEPS: 90000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'coco_faster_rcnn_R-50-FPN_800_1x'
FRCNN:
BATCH_SIZE: 512
ROI_XFORM_RESOLUTION: 7
SNAPSHOT_PREFIX: 'coco_faster_rcnn_R_50_FPN'
TRAIN:
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/coco_2017_train'
WEIGHTS: '../data/pretrained/R-50_in1k_cls90e.pkl'
DATASET: '../data/datasets/coco_train2017'
IMS_PER_BATCH: 2
SCALES: [640, 672, 704, 736, 768, 800]
MAX_SIZE: 1333
USE_DIFF: False # Do not use crowd objects
TEST:
DATASET: '/data/coco_2017_val'
JSON_FILE: '/data/instances_val2017.json'
PROTOCOL: 'coco'
DATASET: '../data/datasets/coco_val2017'
JSON_DATASET: '../data/datasets/coco_instances_val2017.json'
EVALUATOR: 'coco'
IMS_PER_BATCH: 1
SCALES: [800]
MAX_SIZE: 1333
NMS: 0.5
NUM_GPUS: 8
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: 'faster_rcnn'
BACKBONE: 'resnet50.fpn'
PRECISION: 'float16'
CLASSES: ['__background__',
'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
@@ -19,27 +17,30 @@ MODEL:
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
BACKBONE:
TYPE: 'resnet50.fpn'
FPN:
MIN_LEVEL: 2
MAX_LEVEL: 6
ANCHOR_GENERATOR:
STRIDES: [4, 8, 16, 32, 64]
SOLVER:
BASE_LR: 0.02
DECAY_STEPS: [120000, 160000]
MAX_STEPS: 180000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'coco_faster_rcnn_R-50-FPN_800_2x'
FRCNN:
BATCH_SIZE: 512
ROI_XFORM_RESOLUTION: 7
SNAPSHOT_PREFIX: 'coco_faster_rcnn_R-50-FPN_800'
TRAIN:
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/coco_2017_train'
WEIGHTS: '../data/pretrained/R-50_in1k_cls90e.pkl'
DATASET: '../data/datasets/coco_train2017'
IMS_PER_BATCH: 2
SCALES: [640, 672, 704, 736, 768, 800]
MAX_SIZE: 1333
USE_DIFF: False # Do not use crowd objects
TEST:
DATASET: '/data/coco_2017_val'
JSON_FILE: '/data/instances_val2017.json'
PROTOCOL: 'coco'
DATASET: '../data/datasets/coco_val2017'
JSON_DATASET: '../data/datasets/coco_instances_val2017.json'
EVALUATOR: 'coco'
IMS_PER_BATCH: 1
SCALES: [800]
MAX_SIZE: 1333
NMS: 0.5
NUM_GPUS: 1
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: 'faster_rcnn'
BACKBONE: 'resnet50.fpn'
CLASSES: ['__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
FRCNN:
BATCH_SIZE: 128
ROI_XFORM_RESOLUTION: 7
SOLVER:
BASE_LR: 0.002
DECAY_STEPS: [80000, 100000]
MAX_STEPS: 120000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'voc_faster_rcnn_R-50-FPN_640'
TRAIN:
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/voc_0712_trainval'
IMS_PER_BATCH: 2
SCALES: [480, 512, 544, 576, 608, 640]
MAX_SIZE: 1066
USE_COLOR_JITTER: True
TEST:
DATASET: '/data/voc_2007_test'
PROTOCOL: 'voc2007'
IMS_PER_BATCH: 1
SCALES: [640]
MAX_SIZE: 1066
NMS: 0.45
@@ -14,7 +14,7 @@
## COCO Instance Segmentation Baselines
| Model | Lr sched | Infer time (s/im) | box AP | mask AP | Download |
| :---: | :------: | :---------------: | :----: | :-----: | :------: |
| [R-50-FPN-800](coco_mask_rcnn_R-50-FPN_800_1x.yml) | 1x | 0.056 | 39.2 | 34.8 | [model](https://dragon.seetatech.com/download/models/seetadet/mask_rcnn/coco_mask_rcnn_R-50-FPN_800_1x/model_final.pkl) |
| [R-50-FPN-800](coco_mask_rcnn_R-50-FPN_800_2x.yml) | 2x | 0.056 | 41.4 | 36.5 | [model](https://dragon.seetatech.com/download/models/seetadet/mask_rcnn/coco_mask_rcnn_R-50-FPN_800_2x/model_final.pkl) |

| Model | Lr sched | Infer time (fps) | box AP | mask AP | Download |
| :---: | :------: | :--------------: | :----: | :-----: | :------: |
| [R-50-FPN](coco_mask_rcnn_R_50_FPN_1x.yml) | 1x | 22.22 | 39.2 | 35.1 | [model](https://dragon.seetatech.com/download/seetadet/mask_rcnn/coco_mask_rcnn_R_50_FPN_1x/model_90266029.pkl) &#124; [log]() |
| [R-50-FPN](coco_mask_rcnn_R_50_FPN_2x.yml) | 2x | 22.22 | 41.4 | 36.7 | [model](https://dragon.seetatech.com/download/seetadet/mask_rcnn/coco_mask_rcnn_R_50_FPN_2x/model_4ace9d05.pkl) &#124; [log]() |
NUM_GPUS: 8
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: 'mask_rcnn'
BACKBONE: 'resnet50.fpn'
TYPE: mask_rcnn
PRECISION: 'float16'
CLASSES: ['__background__',
'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
@@ -19,28 +17,31 @@ MODEL:
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
BACKBONE:
TYPE: 'resnet50.fpn'
FPN:
MIN_LEVEL: 2
MAX_LEVEL: 6
ANCHOR_GENERATOR:
STRIDES: [4, 8, 16, 32, 64]
SOLVER:
BASE_LR: 0.02
DECAY_STEPS: [60000, 80000]
MAX_STEPS: 90000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'coco_mask_rcnn_R-50-FPN_800_1x'
FRCNN:
BATCH_SIZE: 512
ROI_XFORM_RESOLUTION: 7
MRCNN:
ROI_XFORM_RESOLUTION: 14
SNAPSHOT_PREFIX: 'coco_mask_rcnn_R-50-FPN'
TRAIN:
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/coco_2017_train'
WEIGHTS: '../data/pretrained/R-50_in1k_cls90e.pkl'
DATASET: '../data/datasets/coco_train2017'
LOADER: 'mask_train'
IMS_PER_BATCH: 2
SCALES: [640, 672, 704, 736, 768, 800]
MAX_SIZE: 1333
USE_DIFF: False # Do not use crowd objects
TEST:
DATASET: '/data/coco_2017_val'
JSON_FILE: '/data/instances_val2017.json'
PROTOCOL: 'coco'
DATASET: '../data/datasets/coco_val2017'
JSON_DATASET: '../data/datasets/coco_instances_val2017.json'
EVALUATOR: 'coco'
IMS_PER_BATCH: 1
SCALES: [800]
MAX_SIZE: 1333
NMS: 0.5
NUM_GPUS: 8
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: 'mask_rcnn'
BACKBONE: 'resnet50.fpn'
TYPE: mask_rcnn
PRECISION: 'float16'
CLASSES: ['__background__',
'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
@@ -19,28 +17,31 @@ MODEL:
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
BACKBONE:
TYPE: 'resnet50.fpn'
FPN:
MIN_LEVEL: 2
MAX_LEVEL: 6
ANCHOR_GENERATOR:
STRIDES: [4, 8, 16, 32, 64]
SOLVER:
BASE_LR: 0.02
DECAY_STEPS: [120000, 160000]
MAX_STEPS: 180000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'coco_mask_rcnn_R-50-FPN_800_2x'
FRCNN:
BATCH_SIZE: 512
ROI_XFORM_RESOLUTION: 7
MRCNN:
ROI_XFORM_RESOLUTION: 14
SNAPSHOT_PREFIX: 'coco_mask_rcnn_R_50_FPN'
TRAIN:
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/coco_2017_train'
WEIGHTS: '../data/pretrained/R-50_in1k_cls90e.pkl'
DATASET: '../data/datasets/coco_train2017'
LOADER: 'mask_train'
IMS_PER_BATCH: 2
SCALES: [640, 672, 704, 736, 768, 800]
MAX_SIZE: 1333
USE_DIFF: False # Do not use crowd objects
TEST:
DATASET: '/data/coco_2017_val'
JSON_FILE: '/data/instances_val2017.json'
PROTOCOL: 'coco'
DATASET: '../data/datasets/coco_val2017'
JSON_DATASET: '../data/datasets/coco_instances_val2017.json'
EVALUATOR: 'coco'
IMS_PER_BATCH: 1
SCALES: [800]
MAX_SIZE: 1333
NMS: 0.5
@@ -12,16 +12,7 @@
## COCO Object Detection Baselines
| Model | Lr sched | Infer time (s/im) | box AP | Download |
| :---: | :------: | :---------------: | :----: | :------: |
| [R-50-FPN-416](coco_retinanet_R-50-FPN_416_6x.yml) | 6x | 0.019 | 34.4 | [model](https://dragon.seetatech.com/download/models/seetadet/retinanet/coco_retinanet_R-50-FPN_416_6x/model_final.pkl) |
| [R-50-FPN-512](coco_retinanet_R-50-FPN_512_6x.yml) | 6x | 0.022 | 36.4 | [model](https://dragon.seetatech.com/download/models/seetadet/retinanet/coco_retinanet_R-50-FPN_512_6x/model_final.pkl) |
| [R-50-FPN-800](coco_retinanet_R-50-FPN_800_1x.yml) | 1x | 0.051 | 37.4 | [model](https://dragon.seetatech.com/download/models/seetadet/retinanet/coco_retinanet_R-50-FPN_800_1x/model_final.pkl) |
| [R-50-FPN-800](coco_retinanet_R-50-FPN_800_2x.yml) | 2x | 0.051 | 39.1 | [model](https://dragon.seetatech.com/download/models/seetadet/retinanet/coco_retinanet_R-50-FPN_800_2x/model_final.pkl) |

| Model | Lr sched | Infer time (fps) | box AP | Download |
| :---: | :------: | :--------------: | :----: | :------: |
| [R-50-FPN](coco_retinanet_R_50_FPN_1x.yml) | 1x | 23.3 | 37.4 | [model](https://dragon.seetatech.com/download/seetadet/retinanet/coco_retinanet_R_50_FPN_1x/model_01a4d35f.pkl) &#124; [log]() |
| [R-50-FPN](coco_retinanet_R_50_FPN_2x.yml) | 2x | 23.3 | 39.0 | [model](https://dragon.seetatech.com/download/seetadet/retinanet/coco_retinanet_R_50_FPN_2x/model_7e81f3ad.pkl) &#124; [log]() |

## Pascal VOC Object Detection Baselines
| Model | Infer time (s/im) | AP@0.5 | Download |
| :---: | :---------------: | :----: | :------: |
| [R-50-FPN-416](voc_retinanet_R-50-FPN_416.yml) | 0.015 | 82.3 | [model](https://dragon.seetatech.com/download/models/seetadet/retinanet/voc_retinanet_R-50-FPN_416/model_final.pkl) |
| [R-50-FPN-512](voc_retinanet_R-50-FPN_512.yml) | 0.017 | 83.0 | [model](https://dragon.seetatech.com/download/models/seetadet/retinanet/voc_retinanet_R-50-FPN_512/model_final.pkl) |
NUM_GPUS: 8
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: 'retinanet'
BACKBONE: 'resnet50.fpn'
CLASSES: ['__background__',
'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
'fire hydrant', 'stop sign', 'parking meter', 'bench',
'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant',
'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife',
'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli',
'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop',
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
FPN:
RPN_MIN_LEVEL: 3
RPN_MAX_LEVEL: 7
SOLVER:
BASE_LR: 0.01
DECAY_STEPS: [90000, 120000]
MAX_STEPS: 135000
SNAPSHOT_EVERY: 2500
SNAPSHOT_PREFIX: 'coco_retinanet_R-50-FPN_416_6x'
PIPELINE:
TYPE: 'ssd'
TRAIN:
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/coco_2017_train'
IMS_PER_BATCH: 8
SCALES: [416]
USE_DIFF: False # Do not use crowd objects
TEST:
DATASET: '/data/coco_2017_val'
JSON_FILE: '/data/instances_val2017.json'
PROTOCOL: 'coco'
IMS_PER_BATCH: 1
SCALES: [416]
NMS: 0.5
NUM_GPUS: 8
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: 'retinanet'
BACKBONE: 'resnet50.fpn'
CLASSES: ['__background__',
'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
'fire hydrant', 'stop sign', 'parking meter', 'bench',
'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant',
'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife',
'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli',
'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop',
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
FPN:
RPN_MIN_LEVEL: 3
RPN_MAX_LEVEL: 7
SOLVER:
BASE_LR: 0.01
DECAY_STEPS: [90000, 120000]
MAX_STEPS: 135000
SNAPSHOT_EVERY: 2500
SNAPSHOT_PREFIX: 'coco_retinanet_R-50-FPN_512_6x'
PIPELINE:
TYPE: 'ssd'
TRAIN:
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/coco_2017_train'
IMS_PER_BATCH: 8
SCALES: [512]
USE_DIFF: False # Do not use crowd objects
TEST:
DATASET: '/data/coco_2017_val'
JSON_FILE: '/data/instances_val2017.json'
PROTOCOL: 'coco'
IMS_PER_BATCH: 1
SCALES: [512]
NMS: 0.5
NUM_GPUS: 8
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: 'retinanet'
BACKBONE: 'resnet50.fpn'
PRECISION: 'float16'
CLASSES: ['__background__',
'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
@@ -19,27 +17,25 @@ MODEL:
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
FPN:
RPN_MIN_LEVEL: 3
RPN_MAX_LEVEL: 7
BACKBONE:
TYPE: 'resnet50.fpn'
SOLVER:
BASE_LR: 0.01
DECAY_STEPS: [60000, 80000]
MAX_STEPS: 90000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'coco_retinanet_R-50-FPN_800_1x'
SNAPSHOT_PREFIX: 'coco_retinanet_R-50-FPN_800'
TRAIN:
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/coco_2017_train'
WEIGHTS: '../data/pretrained/R-50_in1k_cls90e.pkl'
DATASET: '../data/datasets/coco_train2017'
IMS_PER_BATCH: 2
SCALES: [640, 672, 704, 736, 768, 800]
MAX_SIZE: 1333
USE_DIFF: False # Do not use crowd objects
TEST:
DATASET: '/data/coco_2017_val'
JSON_FILE: '/data/instances_val2017.json'
PROTOCOL: 'coco'
DATASET: '../data/datasets/coco_val2017'
JSON_DATASET: '../data/datasets/coco_instances_val2017.json'
EVALUATOR: 'coco'
IMS_PER_BATCH: 1
SCALES: [800]
MAX_SIZE: 1333
NMS: 0.5
NUM_GPUS: 8
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: 'retinanet'
BACKBONE: 'resnet50.fpn'
PRECISION: 'float16'
CLASSES: ['__background__',
'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
@@ -19,27 +17,25 @@ MODEL:
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
FPN:
RPN_MIN_LEVEL: 3
RPN_MAX_LEVEL: 7
BACKBONE:
TYPE: 'resnet50.fpn'
SOLVER:
BASE_LR: 0.01
DECAY_STEPS: [120000, 160000]
MAX_STEPS: 180000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'coco_retinanet_R-50-FPN_800_2x'
SNAPSHOT_PREFIX: 'coco_retinanet_R-50-FPN_800'
TRAIN:
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/coco_2017_train'
WEIGHTS: '../data/pretrained/R-50_in1k_cls90e.pkl'
DATASET: '../data/datasets/coco_train2017'
IMS_PER_BATCH: 2
SCALES: [640, 672, 704, 736, 768, 800]
MAX_SIZE: 1333
USE_DIFF: False # Do not use crowd objects
TEST:
DATASET: '/data/coco_2017_val'
JSON_FILE: '/data/instances_val2017.json'
PROTOCOL: 'coco'
DATASET: '../data/datasets/coco_val2017'
JSON_DATASET: '../data/datasets/coco_instances_val2017.json'
EVALUATOR: 'coco'
IMS_PER_BATCH: 1
SCALES: [800]
MAX_SIZE: 1333
NMS: 0.5
NUM_GPUS: 1
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: 'retinanet'
BACKBONE: 'resnet50.fpn'
CLASSES: ['__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
FPN:
RPN_MIN_LEVEL: 3
RPN_MAX_LEVEL: 7
RETINANET:
NUM_CONVS: 2
SOLVER:
BASE_LR: 0.01
DECAY_STEPS: [80000, 100000]
MAX_STEPS: 120000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'voc_retinanet_R-50-FPN_416'
PIPELINE:
TYPE: 'ssd'
TRAIN:
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/voc_0712_trainval'
IMS_PER_BATCH: 16
SCALES: [416]
RANDOM_SCALES: [0.25, 1.0]
USE_COLOR_JITTER: True
TEST:
DATASET: '/data/voc_2007_test'
PROTOCOL: 'voc2007'
IMS_PER_BATCH: 1
SCALES: [416]
NMS: 0.45
RETINANET_PRE_NMS_TOP_N: 1000
NUM_GPUS: 2
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: 'retinanet'
BACKBONE: 'resnet50.fpn'
CLASSES: ['__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
FPN:
RPN_MIN_LEVEL: 3
RPN_MAX_LEVEL: 7
RETINANET:
NUM_CONVS: 2
SOLVER:
BASE_LR: 0.01
DECAY_STEPS: [80000, 100000]
MAX_STEPS: 120000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'voc_retinanet_R-50-FPN_512'
PIPELINE:
TYPE: 'ssd'
TRAIN:
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/voc_0712_trainval'
IMS_PER_BATCH: 8
SCALES: [512]
RANDOM_SCALES: [0.25, 1.0]
USE_COLOR_JITTER: True
TEST:
DATASET: '/data/voc_2007_test'
PROTOCOL: 'voc2007'
IMS_PER_BATCH: 1
SCALES: [512]
NMS: 0.45
RETINANET_PRE_NMS_TOP_N: 1000
@@ -12,7 +12,9 @@
## Pascal VOC Object Detection Baselines
| Model | Infer time (s/im) | AP@0.5 | Download |
| :---: | :---------------: | :----: | :------: |
| [VGG-16-300](voc_ssd_VGG-16_300.yml) | 0.012 | 78.3 | [model](https://dragon.seetatech.com/download/models/seetadet/ssd/voc_ssd_VGG-16_300/model_final.pkl) |
| [VGG-16-512](voc_ssd_VGG-16_512.yml) | 0.021 | 80.1 | [model](https://dragon.seetatech.com/download/models/seetadet/ssd/voc_ssd_VGG-16_512/model_final.pkl) |

| Model | Lr sched | Infer time (fps) | AP@0.5 | Download |
| :---: | :------: | :--------------: | :----: | :------: |
| [VGG-16-SSD300](voc_ssd300_VGG_16_120e.yml) | 120e | 100.0 | 78.3 | [model](https://dragon.seetatech.com/download/seetadet/ssd/voc_ssd300_VGG-16_120e/model_54664312.pkl) &#124; [log](https://dragon.seetatech.com/download/seetadet/ssd/voc_ssd300_VGG-16_120e/logs.json) |
| [VGG-16-SSD512](voc_ssd512_VGG_16_120e.yml) | 120e | 71.4 | 80.6 | [model](https://dragon.seetatech.com/download/seetadet/ssd/voc_ssd512_VGG-16_120e/model_e332ebfe.pkl) &#124; [log](https://dragon.seetatech.com/download/seetadet/ssd/voc_ssd512_VGG-16_120e/logs.json) |
| [MobileNetV2-SSDLite](voc_ssdlite_MobileNetV2_300e.yml) | 300e | 76.9 | 71.6 | [model](https://dragon.seetatech.com/download/seetadet/ssd/voc_ssdlite_MobileNetV2_300e/model_da31ebe7.pkl) &#124; [log](https://dragon.seetatech.com/download/seetadet/ssd/voc_ssdlite_MobileNetV2_300e/logs.json) |
| [MobileNetV3L-SSDLite](voc_ssdlite_MobileNetV3L_300e.yml) | 300e | 66.7 | 72.6 | [model](https://dragon.seetatech.com/download/seetadet/ssd/voc_ssdlite_MobileNetV3L_300e/model_43b33a97.pkl) &#124; [log](https://dragon.seetatech.com/download/seetadet/ssd/voc_ssdlite_MobileNetV3L_300e/logs.json) |
NUM_GPUS: 1
PIXEL_STDS: [1.0, 1.0, 1.0]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: 'ssd'
BACKBONE: 'vgg16_reduced_300'
COARSEST_STRIDE: 0
PRECISION: 'float16'
CLASSES: ['__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
SSD:
BACKBONE:
TYPE: 'vgg16_fcn.ssd300'
NORM: ''
COARSEST_STRIDE: 300
FPN:
ACTIVATION: 'ReLU'
ANCHOR_GENERATOR:
STRIDES: [8, 16, 32, 64, 100, 300]
ANCHOR_SIZES: [[30, 60],
[60, 110],
[110, 162],
[162, 213],
[213, 264],
[264, 315]]
SIZES: [[30, 60], [60, 110], [110, 162],
[162, 213], [213, 264], [264, 315]]
ASPECT_RATIOS: [[1, 2, 0.5],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
@@ -31,18 +30,21 @@ SOLVER:
DECAY_STEPS: [80000, 100000]
MAX_STEPS: 120000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'voc_ssd_VGG-16_300'
SNAPSHOT_PREFIX: 'voc_ssd300_VGG_16'
AUG:
COLOR_JITTER: 0.5
TRAIN:
WEIGHTS: '/model/VGG16.SSD.pkl'
DATASET: '/data/voc_0712_trainval'
WEIGHTS: '../data/pretrained/VGG-16-FCN_in1k.pkl'
DATASET: '../data/datasets/voc_trainval0712'
IMS_PER_BATCH: 16
SCALES: [300]
RANDOM_SCALES: [0.25, 1.0]
USE_COLOR_JITTER: True
SCALES_RANGE: [0.25, 1.0]
LOADER: 'ssd_train'
TEST:
DATASET: '/data/voc_2007_test'
PROTOCOL: 'voc2007'
DATASET: '../data/datasets/voc_test2007'
JSON_DATASET: '../data/datasets/voc_test2007.json'
EVALUATOR: 'voc2007'
IMS_PER_BATCH: 1
SCALES: [300]
NMS: 0.45
NMS_THRESH: 0.45
SCORE_THRESH: 0.01
NUM_GPUS: 2
PIXEL_STDS: [1.0, 1.0, 1.0]
PIXEL_MEANS: [103.53, 116.28, 123.675]
NUM_GPUS: 1
MODEL:
TYPE: 'ssd'
BACKBONE: 'vgg16_reduced_512'
PRECISION: 'float16'
CLASSES: ['__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
SSD:
BACKBONE:
TYPE: 'vgg16_fcn.ssd512'
NORM: ''
COARSEST_STRIDE: 512
FPN:
ACTIVATION: 'ReLU'
ANCHOR_GENERATOR:
STRIDES: [8, 16, 32, 64, 128, 256, 512]
ANCHOR_SIZES: [[35.84, 76.8],
SIZES: [[35.84, 76.8],
[76.8, 153.6],
[153.6, 230.4],
[230.4, 307.2],
@@ -32,18 +36,21 @@ SOLVER:
DECAY_STEPS: [80000, 100000]
MAX_STEPS: 120000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'voc_ssd_VGG-16_512'
SNAPSHOT_PREFIX: 'voc_ssd512_VGG_16'
AUG:
COLOR_JITTER: 0.5
TRAIN:
WEIGHTS: '/model/VGG16.SSD.pkl'
DATASET: '/data/voc_0712_trainval'
IMS_PER_BATCH: 8
WEIGHTS: '../data/pretrained/VGG-16-FCN_in1k.pkl'
DATASET: '../data/datasets/voc_trainval0712'
IMS_PER_BATCH: 16
SCALES: [512]
RANDOM_SCALES: [0.25, 1.0]
USE_COLOR_JITTER: True
SCALES_RANGE: [0.25, 1.0]
LOADER: 'ssd_train'
TEST:
DATASET: '/data/voc_2007_test'
PROTOCOL: 'voc2007'
DATASET: '../data/datasets/voc_test2007'
JSON_DATASET: '../data/datasets/voc_test2007.json'
EVALUATOR: 'voc2007'
IMS_PER_BATCH: 1
SCALES: [512]
NMS: 0.45
NMS_THRESH: 0.45
SCORE_THRESH: 0.01
NUM_GPUS: 1
MODEL:
TYPE: 'ssd'
PRECISION: 'float16'
CLASSES: ['__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
BACKBONE:
TYPE: 'mobilenet_v2.ssdlite'
NORM: 'BN'
FPN:
CONV: 'SepConv2d'
NORM: 'BN'
ACTIVATION: 'ReLU6'
ANCHOR_GENERATOR:
STRIDES: [16, 32, 64, 107, 160, 320]
SIZES: [[48, 100], [100, 150], [150, 202],
[202, 253], [253, 304], [304, 320]]
ASPECT_RATIOS: [[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33]]
SOLVER:
BASE_LR: 0.04
WEIGHT_DECAY: 0.00004
DECAY_STEPS: [50000, 62500]
MAX_STEPS: 75000
SNAPSHOT_EVERY: 1250
SNAPSHOT_PREFIX: 'voc_ssdlite_MobileNetV2'
AUG:
COLOR_JITTER: 0.5
TRAIN:
WEIGHTS: '../data/pretrained/MobileNetV2_in1k_cls300e.pkl'
DATASET: '../data/datasets/voc_trainval0712'
IMS_PER_BATCH: 64
SCALES: [320]
SCALES_RANGE: [0.25, 1.0]
LOADER: 'ssd_train'
NUM_WORKERS: 12
TEST:
DATASET: '../data/datasets/voc_test2007'
JSON_DATASET: '../data/datasets/voc_test2007.json'
EVALUATOR: 'voc2007'
IMS_PER_BATCH: 1
SCALES: [320]
NMS_THRESH: 0.45
SCORE_THRESH: 0.01
NUM_GPUS: 1
MODEL:
TYPE: 'ssd'
PRECISION: 'float16'
CLASSES: ['__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
BACKBONE:
TYPE: 'mobilenet_v3_large.ssdlite'
NORM: 'BN'
FPN:
CONV: 'SepConv2d'
NORM: 'BN'
ACTIVATION: 'ReLU'
ANCHOR_GENERATOR:
STRIDES: [16, 32, 64, 107, 160, 320]
SIZES: [[48, 100], [100, 150], [150, 202],
[202, 253], [253, 304], [304, 320]]
ASPECT_RATIOS: [[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33]]
SOLVER:
BASE_LR: 0.04
WEIGHT_DECAY: 0.00004
DECAY_STEPS: [50000, 62500]
MAX_STEPS: 75000
SNAPSHOT_EVERY: 1250
SNAPSHOT_PREFIX: 'voc_ssdlite_MobileNetV3L'
AUG:
COLOR_JITTER: 0.5
TRAIN:
WEIGHTS: '../data/pretrained/MobileNetV3L_in1k_cls600e.pkl'
DATASET: '../data/datasets/voc_trainval0712'
IMS_PER_BATCH: 64
SCALES: [320]
SCALES_RANGE: [0.25, 1.0]
LOADER: 'ssd_train'
NUM_WORKERS: 12
TEST:
DATASET: '../data/datasets/voc_test2007'
JSON_DATASET: '../data/datasets/voc_test2007.json'
EVALUATOR: 'voc2007'
IMS_PER_BATCH: 1
SCALES: [320]
NMS_THRESH: 0.45
SCORE_THRESH: 0.01
#include "nms_op.h"
#include "../utils/detection_utils.h"
#include "../operators/nms_op.h"
#include "../utils/detection.h"
namespace dragon {
template <class Context>
template <typename T>
void NonMaxSuppressionOp<Context>::DoRunWithType() {
int num_selected;
utils::detection::ApplyNMS(
Output(0)->count(),
Output(0)->count(),
auto &X = Input(0), *Y = Output(0);
CHECK(X.ndim() == 2 && X.dim(1) == 5)
<< "\nThe dimensions of boxes should be (num_boxes, 5).";
detection::ApplyNMS(
X.dim(0),
X.dim(0),
iou_threshold_,
Input(0).template mutable_data<T, Context>(),
Output(0)->template mutable_data<int64_t, CPUContext>(),
num_selected,
X.template mutable_data<T, Context>(),
out_indices_,
ctx());
Output(0)->Reshape({num_selected});
}
template <class Context>
void NonMaxSuppressionOp<Context>::RunOnDevice() {
CHECK(Input(0).ndim() == 2 && Input(0).dim(1) == 5)
<< "\nThe dimensions of boxes should be (num_boxes, 5).";
Output(0)->Reshape({Input(0).dim(0)});
DispatchHelper<TensorTypes<float>>::Call(this, Input(0));
Y->template CopyFrom<int64_t>(out_indices_);
}
DEPLOY_CPU_OPERATOR(NonMaxSuppression);
@@ -10,8 +10,8 @@
* ------------------------------------------------------------
*/
#ifndef SEETADET_CXX_OPERATORS_NMS_OP_H_
#define SEETADET_CXX_OPERATORS_NMS_OP_H_
#ifndef DRAGON_EXTENSION_OPERATORS_NMS_OP_H_
#define DRAGON_EXTENSION_OPERATORS_NMS_OP_H_
#include "dragon/core/operator.h"
@@ -25,15 +25,18 @@ class NonMaxSuppressionOp final : public Operator<Context> {
iou_threshold_(OP_SINGLE_ARG(float, "iou_threshold", 0.5f)) {}
USE_OPERATOR_FUNCTIONS;
void RunOnDevice() override;
void RunOnDevice() override {
DispatchHelper<dtypes::TypesBase<float>>::Call(this, Input(0));
}
template <typename T>
void DoRunWithType();
protected:
float iou_threshold_;
vector<int64_t> out_indices_;
};
} // namespace dragon
#endif // SEETADET_CXX_OPERATORS_NMS_OP_H_
#endif // DRAGON_EXTENSION_OPERATORS_NMS_OP_H_
#include <dragon/utils/math_functions.h>
#include "../utils/detection_utils.h"
#include "retinanet_decoder_op.h"
#include "../operators/retinanet_decoder_op.h"
#include "../utils/detection.h"
namespace dragon {
template <class Context>
template <typename T>
void RetinaNetDecoderOp<Context>::DoRunWithType() {
using BT = float; // DType of BBox
using BC = CPUContext; // Context of BBox
int total_proposals = 0;
auto* batch_scores = Input(SCORES).template data<T, Context>();
auto* batch_deltas = Input(DELTAS).template data<T, BC>();
auto* im_info = Input(IMAGE_INFO).template data<BT, BC>();
auto* all_proposals = Output(0)->template mutable_data<BT, BC>();
for (int im_idx = 0; im_idx < num_images_; ++im_idx) {
BT im_h = im_info[0];
BT im_w = im_info[1];
BT im_scale_h = im_info[2];
BT im_scale_w = im_info[2];
if (Input(IMAGE_INFO).dim(1) == 4) im_scale_w = im_info[3];
CHECK_EQ(strides_.size(), InputSize() - 3)
<< "\nGiven " << strides_.size() << " strides "
<< "and " << InputSize() - 3 << " features";
// Select the top-k candidates as proposals
auto num_boxes = Input(SCORES).dim(1);
auto num_images = Input(SCORES).dim(0);
auto num_anchors = Input(SCORES).dim(1);
auto num_classes = Input(SCORES).dim(2);
utils::detection::SelectProposals(
Input(SCORES).count(1),
score_thr_,
batch_scores + im_idx * Input(SCORES).stride(0),
roi_scores_,
roi_indices_,
ctx());
auto num_candidates = (int)roi_scores_.size();
auto num_proposals = std::min(num_candidates, (int)pre_nms_topn_);
utils::detection::ArgPartition(
num_candidates, num_proposals, true, roi_scores_.data(), indices_);
scores_.resize(indices_.size());
for (int i = 0; i < num_proposals; ++i) {
scores_[i] = roi_scores_[indices_[i]];
indices_[i] = roi_indices_[indices_[i]];
}
// Decode proposals via anchors
int stride_offset = 0;
for (int i = 0; i < strides_.size(); i++) {
auto feature_h = Input(i).dim(2);
auto feature_w = Input(i).dim(3);
auto K = feature_h * feature_w;
auto A = int(ratios_.size() * scales_.size());
anchors_.resize((size_t)(A * 4));
utils::detection::GenerateAnchors(
auto num_scores = num_anchors * num_classes;
auto num_cell_anchors = int64_t(ratios_.size() * scales_.size());
// Generate anchors.
CHECK_EQ(Input(GRID_INFO).dim(0), int64_t(strides_.size()))
<< "\nProvide " << Input(GRID_INFO).dim(0) << " grids for "
<< strides_.size() << " strides.";
cell_anchors_.resize(strides_.size());
vector<detection::GridArgs<int64_t>> grid_args(strides_.size());
for (int i = 0; i < strides_.size(); ++i) {
grid_args[i].stride = strides_[i];
auto& anchors = cell_anchors_[i];
if (int64_t(anchors.size()) == num_cell_anchors * 4) continue;
anchors.resize(num_cell_anchors * 4);
detection::GenerateAnchors(
strides_[i],
(int)ratios_.size(),
(int)scales_.size(),
int64_t(ratios_.size()),
int64_t(scales_.size()),
ratios_.data(),
scales_.data(),
anchors_.data());
utils::detection::GetShiftedAnchors(
num_proposals,
anchors.data());
}
// Set grid arguments.
auto* grid_info = Input(GRID_INFO).template data<int64_t, CPUContext>();
detection::SetGridArgs(num_anchors, num_cell_anchors, grid_info, grid_args);
// Decode detections.
auto* scores = Input(SCORES).template data<T, Context>();
auto* deltas = Input(DELTAS).template data<T, CPUContext>();
auto* im_info = Input(IM_INFO).template data<float, CPUContext>();
auto* Y = Output(0)->Reshape({num_images * pre_nms_topn_, 7});
auto* dets = Y->template mutable_data<float, CPUContext>();
int64_t size_dets = 0;
for (int batch_ind = 0; batch_ind < num_images; ++batch_ind) {
detection::ImageArgs<int64_t> im_args(im_info + batch_ind * 4);
im_args.batch_ind = batch_ind;
detection::SelectProposals(
num_scores,
pre_nms_topn_,
score_thresh_,
scores + batch_ind * num_scores,
scores_,
indices_,
ctx());
auto* offset_dets = dets + size_dets * 7;
auto num_dets = int64_t(indices_.size());
size_dets += num_dets;
for (int i = 0; i < strides_.size(); ++i) {
detection::GetAnchors(
num_dets,
num_cell_anchors,
num_classes,
A,
feature_h,
feature_w,
strides_[i],
stride_offset,
anchors_.data(),
grid_args[i],
cell_anchors_[i].data(),
indices_.data(),
all_proposals);
stride_offset += (A * K);
offset_dets);
}
utils::detection::GenerateDetections(
num_proposals,
num_boxes,
detection::DecodeDetections(
num_dets,
num_anchors,
num_classes,
im_idx,
im_h,
im_w,
im_scale_h,
im_scale_w,
im_args,
scores_.data(),
batch_deltas + im_idx * Input(DELTAS).stride(0),
deltas + batch_ind * Input(DELTAS).stride(0),
indices_.data(),
all_proposals);
total_proposals += num_proposals;
all_proposals += (num_proposals * 7);
im_info += Input(IMAGE_INFO).dim(1);
offset_dets);
}
Output(0)->Reshape({total_proposals, 7});
}
template <class Context>
void RetinaNetDecoderOp<Context>::RunOnDevice() {
num_images_ = Input(0).dim(0);
CHECK_EQ(Input(-1).dim(0), num_images_)
<< "\nExcepted " << num_images_ << " groups info, got "
<< Input(-1).dim(0) << ".";
Output(0)->Reshape({num_images_ * pre_nms_topn_, 7});
DispatchHelper<TensorTypes<float>>::Call(this, Input(SCORES));
// Shrink to the correct dimensions.
Y->Reshape({size_dets, 7});
}
DEPLOY_CPU_OPERATOR(RetinaNetDecoder);
@@ -109,7 +88,7 @@ DEPLOY_CPU_OPERATOR(RetinaNetDecoder);
DEPLOY_CUDA_OPERATOR(RetinaNetDecoder);
#endif
OPERATOR_SCHEMA(RetinaNetDecoder).NumInputs(3, INT_MAX).NumOutputs(1, INT_MAX);
OPERATOR_SCHEMA(RetinaNetDecoder).NumInputs(4).NumOutputs(1);
NO_GRADIENT(RetinaNetDecoder);
@@ -10,8 +10,8 @@
* ------------------------------------------------------------
*/
#ifndef SEETADET_CXX_OPERATORS_RETINANET_DECODER_OP_H_
#define SEETADET_CXX_OPERATORS_RETINANET_DECODER_OP_H_
#ifndef DRAGON_EXTENSION_OPERATORS_RETINANET_DECODER_OP_H_
#define DRAGON_EXTENSION_OPERATORS_RETINANET_DECODER_OP_H_
#include "dragon/core/operator.h"
@@ -26,24 +26,29 @@ class RetinaNetDecoderOp final : public Operator<Context> {
ratios_(OP_REPEATED_ARG(float, "ratios")),
scales_(OP_REPEATED_ARG(float, "scales")),
pre_nms_topn_(OP_SINGLE_ARG(int64_t, "pre_nms_top_n", 6000)),
score_thr_(OP_SINGLE_ARG(float, "score_thresh", 0.05f)) {}
score_thresh_(OP_SINGLE_ARG(float, "score_thresh", 0.05f)) {}
USE_OPERATOR_FUNCTIONS;
void RunOnDevice() override;
void RunOnDevice() override {
DispatchHelper<dtypes::TypesBase<float>>::Call(this, Input(SCORES));
}
template <typename T>
void DoRunWithType();
enum INPUT_TAGS { SCORES = -3, DELTAS = -2, IMAGE_INFO = -1 };
enum INPUT_TAGS { SCORES = 0, DELTAS = 1, IM_INFO = 2, GRID_INFO = 3 };
protected:
float score_thr_;
vec64_t strides_, indices_, roi_indices_;
vector<float> ratios_, scales_, anchors_;
vector<float> scores_, roi_scores_;
int64_t num_images_, pre_nms_topn_;
float score_thresh_;
vector<int64_t> strides_;
vector<float> ratios_, scales_;
int64_t pre_nms_topn_;
vector<float> scores_;
vector<int64_t> indices_;
vector<vector<float>> cell_anchors_;
};
} // namespace dragon
#endif // SEETADET_CXX_OPERATORS_RETINANET_DECODER_OP_H_
#endif // DRAGON_EXTENSION_OPERATORS_RETINANET_DECODER_OP_H_
#include <dragon/utils/math_functions.h>
#include "../utils/detection_utils.h"
#include "rpn_decoder_op.h"
#include "../operators/rpn_decoder_op.h"
#include "../utils/detection.h"
namespace dragon {
template <class Context>
template <typename T>
void RPNDecoderOp<Context>::DoRunWithType() {
using BT = float; // DType of BBox
using BC = CPUContext; // Context of BBox
int feat_h, feat_w, K, A;
int total_rois = 0, num_rois;
int num_candidates, num_proposals;
auto* batch_scores = Input(SCORES).template data<T, BC>();
auto* batch_deltas = Input(DELTAS).template data<T, BC>();
auto* im_info = Input(IMAGE_INFO).template data<BT, BC>();
auto* all_rois = Output(0)->template mutable_data<BT, BC>();
for (int im_idx = 0; im_idx < num_images_; ++im_idx) {
const BT im_h = im_info[0];
const BT im_w = im_info[1];
auto* scores = batch_scores + im_idx * Input(SCORES).stride(0);
auto* deltas = batch_deltas + im_idx * Input(DELTAS).stride(0);
CHECK_EQ(strides_.size(), InputSize() - 3)
<< "\nGiven " << strides_.size() << " strides "
<< "and " << InputSize() - 3 << " feature inputs";
CHECK_EQ(strides_.size(), scales_.size())
<< "\nGiven " << strides_.size() << " strides "
<< "and " << scales_.size() << " scales";
// Select the top-k candidates as proposals
num_candidates = Input(SCORES).dim(1);
num_proposals = std::min(num_candidates, (int)pre_nms_top_n_);
utils::math::ArgPartition(
num_candidates, num_proposals, true, scores, indices_);
// Decode the candidates
int stride_offset = 0;
proposals_.Reshape({num_proposals, 5});
auto* proposals = proposals_.template mutable_data<BT, BC>();
for (int i = 0; i < strides_.size(); i++) {
feat_h = Input(i).dim(2);
feat_w = Input(i).dim(3);
K = feat_h * feat_w;
A = (int)ratios_.size();
anchors_.resize((size_t)(A * 4));
utils::detection::GenerateAnchors(
auto num_images = Input(SCORES).dim(0);
auto num_anchors = Input(SCORES).dim(1);
auto num_cell_anchors = int64_t(ratios_.size() * scales_.size());
// Generate anchors.
CHECK_EQ(Input(GRID_INFO).dim(0), int64_t(strides_.size()))
<< "\nProvide " << Input(GRID_INFO).dim(0) << " grids for "
<< strides_.size() << " strides.";
cell_anchors_.resize(strides_.size());
vector<detection::GridArgs<int64_t>> grid_args(strides_.size());
for (int i = 0; i < strides_.size(); ++i) {
grid_args[i].stride = strides_[i];
auto& anchors = cell_anchors_[i];
if (int64_t(anchors.size()) == num_cell_anchors * 4) continue;
anchors.resize(num_cell_anchors * 4);
detection::GenerateAnchors(
strides_[i],
(int)ratios_.size(),
1,
int64_t(ratios_.size()),
int64_t(scales_.size()),
ratios_.data(),
scales_.data(),
anchors_.data());
utils::detection::GetShiftedAnchors(
anchors.data());
}
// Set grid arguments.
auto* grid_info = Input(GRID_INFO).template data<int64_t, CPUContext>();
detection::SetGridArgs(num_anchors, num_cell_anchors, grid_info, grid_args);
// Decode proposals.
auto* scores = Input(SCORES).template data<T, CPUContext>();
auto* deltas = Input(DELTAS).template data<T, CPUContext>();
auto* im_info = Input(IM_INFO).template data<float, CPUContext>();
auto* Y = Output("Y")->Reshape({num_images * pre_nms_topn_, 5});
auto* proposals = Y->template mutable_data<float, CPUContext>();
vector<int64_t> size_proposals({0});
for (int batch_ind = 0; batch_ind < num_images; ++batch_ind) {
detection::ImageArgs<int64_t> im_args(im_info + batch_ind * 4);
im_args.batch_ind = batch_ind;
detection::SelectProposals(
num_anchors,
pre_nms_topn_,
score_thresh_,
scores + batch_ind * num_anchors,
scores_,
indices_,
(CPUContext*)nullptr); // Faster.
auto* offset_proposals = proposals + size_proposals.back() * 5;
auto num_proposals = int64_t(indices_.size());
size_proposals.push_back(size_proposals.back() + num_proposals);
for (int i = 0; i < strides_.size(); ++i) {
detection::GetAnchors(
num_proposals,
A,
feat_h,
feat_w,
strides_[i],
stride_offset,
anchors_.data(),
num_cell_anchors,
grid_args[i],
cell_anchors_[i].data(),
indices_.data(),
proposals);
stride_offset += (A * K);
offset_proposals);
}
utils::detection::GenerateProposals(
num_candidates,
detection::DecodeProposals(
num_proposals,
im_h,
im_w,
scores,
deltas,
&indices_[0],
proposals);
// Sort, NMS and Retrieve
utils::detection::SortProposals(
0, num_proposals - 1, num_proposals, proposals);
utils::detection::ApplyNMS(
num_anchors,
im_args,
scores_.data(),
deltas + batch_ind * Input(DELTAS).stride(0),
indices_.data(),
offset_proposals);
detection::SortBoxes<T, detection::Box5d<T>>(
num_proposals, offset_proposals);
}
// Apply NMS.
auto* proposals_v2 = Y->template data<float, Context>();
int64_t size_rois = 0;
for (int batch_ind = 0; batch_ind < num_images; ++batch_ind) {
auto offset = size_proposals[batch_ind];
auto num_proposals = size_proposals[batch_ind + 1] - offset;
detection::ApplyNMS(
num_proposals,
post_nms_top_n_,
nms_thr_,
proposals_.template mutable_data<BT, Context>(),
roi_indices_.data(),
num_rois,
post_nms_topn_,
nms_thresh_,
proposals_v2 + offset * 5,
nms_indices_,
ctx());
utils::detection::RetrieveRoIs(
num_rois, im_idx, proposals, roi_indices_.data(), all_rois);
total_rois += num_rois;
all_rois += (num_rois * 5);
im_info += Input(IMAGE_INFO).dim(1);
num_proposals = int64_t(nms_indices_.size());
for (int i = 0; i < num_proposals; ++i) {
scores_[size_rois] = batch_ind;
indices_[size_rois++] = nms_indices_[i] + offset;
}
}
Output(0)->Reshape({total_rois, 5});
// Distribute rois into K bins
if (OutputSize() > 1) {
CHECK_EQ(max_level_ - min_level_ + 1, OutputSize())
<< "\nExcepted " << OutputSize() << " outputs for levels "
<< "between [" << min_level_ << ", " << max_level_ << "].";
vector<BT*> ys(OutputSize());
vector<vec64_t> bins(OutputSize());
Tensor RoIs;
RoIs.ReshapeLike(*Output(0));
auto* rois = RoIs.template mutable_data<BT, BC>();
ctx()->template Copy<BT, BC, BC>(
Output(0)->count(), rois, Output(0)->template data<BT, BC>());
utils::detection::CollectRoIs(
total_rois,
// Apply Histogram.
detection::ApplyHistogram(
size_rois,
min_level_,
max_level_,
canonical_level_,
canonical_scale_,
rois,
bins);
for (int i = 0; i < OutputSize(); i++) {
Output(i)->Reshape({std::max((int)bins[i].size(), 1), 5});
ys[i] = Output(i)->template mutable_data<BT, BC>();
}
utils::detection::DistributeRoIs(bins, rois, ys);
proposals,
scores_.data(),
indices_.data(),
output_rois_);
// Copy to outputs.
for (int i = 0; i < OutputSize(); ++i) {
const auto& rois = output_rois_[i];
vector<int64_t> dims({int64_t(rois.size()) / 5, 5});
auto* Yi = Output(i)->Reshape(dims);
std::memcpy(
Yi->template mutable_data<T, CPUContext>(),
rois.data(),
sizeof(T) * rois.size());
}
}
template <class Context>
void RPNDecoderOp<Context>::RunOnDevice() {
num_images_ = Input(0).dim(0);
CHECK_EQ(Input(IMAGE_INFO).dim(0), num_images_)
<< "\nExcepted " << num_images_ << " groups info, got "
<< Input(IMAGE_INFO).dim(0) << ".";
roi_indices_.resize(post_nms_top_n_);
Output(0)->Reshape({num_images_ * post_nms_top_n_, 5});
DispatchHelper<TensorTypes<float>>::Call(this, Input(SCORES));
}
DEPLOY_CPU_OPERATOR(RPNDecoder);
#ifdef USE_CUDA
DEPLOY_CUDA_OPERATOR(RPNDecoder);
#endif
OPERATOR_SCHEMA(RPNDecoder).NumInputs(3, INT_MAX).NumOutputs(1, INT_MAX);
OPERATOR_SCHEMA(RPNDecoder).NumInputs(4).NumOutputs(1, INT_MAX);
NO_GRADIENT(RPNDecoder);
@@ -10,8 +10,8 @@
* ------------------------------------------------------------
*/
#ifndef SEETADET_CXX_OPERATORS_RPN_DECODER_OP_H_
#define SEETADET_CXX_OPERATORS_RPN_DECODER_OP_H_
#ifndef DRAGON_EXTENSION_OPERATORS_RPN_DECODER_OP_H_
#define DRAGON_EXTENSION_OPERATORS_RPN_DECODER_OP_H_
#include "dragon/core/operator.h"
@@ -25,32 +25,39 @@ class RPNDecoderOp final : public Operator<Context> {
strides_(OP_REPEATED_ARG(int64_t, "strides")),
ratios_(OP_REPEATED_ARG(float, "ratios")),
scales_(OP_REPEATED_ARG(float, "scales")),
pre_nms_top_n_(OP_SINGLE_ARG(int64_t, "pre_nms_top_n", 6000)),
post_nms_top_n_(OP_SINGLE_ARG(int64_t, "post_nms_top_n", 1000)),
nms_thr_(OP_SINGLE_ARG(float, "nms_thresh", 0.7f)),
pre_nms_topn_(OP_SINGLE_ARG(int64_t, "pre_nms_top_n", 6000)),
post_nms_topn_(OP_SINGLE_ARG(int64_t, "post_nms_top_n", 1000)),
nms_thresh_(OP_SINGLE_ARG(float, "nms_thresh", 0.7f)),
score_thresh_(OP_SINGLE_ARG(float, "score_thresh", 0.f)),
min_level_(OP_SINGLE_ARG(int64_t, "min_level", 2)),
max_level_(OP_SINGLE_ARG(int64_t, "max_level", 5)),
canonical_level_(OP_SINGLE_ARG(int64_t, "canonical_level", 4)),
canonical_scale_(OP_SINGLE_ARG(int64_t, "canonical_scale", 224)) {}
USE_OPERATOR_FUNCTIONS;
void RunOnDevice() override;
void RunOnDevice() override {
DispatchHelper<dtypes::TypesBase<float>>::Call(this, Input(SCORES));
}
template <typename T>
void DoRunWithType();
enum INPUT_TAGS { SCORES = -3, DELTAS = -2, IMAGE_INFO = -1 };
enum INPUT_TAGS { SCORES = 0, DELTAS = 1, IM_INFO = 2, GRID_INFO = 3 };
protected:
float nms_thr_;
vec64_t strides_, indices_, roi_indices_;
vector<float> ratios_, scales_, scores_, anchors_;
int64_t pre_nms_top_n_, post_nms_top_n_;
int64_t num_images_, min_level_, max_level_;
float nms_thresh_, score_thresh_;
vector<int64_t> strides_;
vector<float> ratios_, scales_;
int64_t min_level_, max_level_;
int64_t pre_nms_topn_, post_nms_topn_;
int64_t canonical_level_, canonical_scale_;
Tensor proposals_;
vector<float> scores_;
vector<int64_t> indices_, nms_indices_;
vector<vector<float>> cell_anchors_;
vector<vector<float>> output_rois_;
};
} // namespace dragon
#endif // SEETADET_CXX_OPERATORS_RPN_DECODER_OP_H_
#endif // DRAGON_EXTENSION_OPERATORS_RPN_DECODER_OP_H_
@@ -8,7 +8,7 @@
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Build cxx sources."""
"""Build cpp extensions."""
from __future__ import absolute_import
from __future__ import division
@@ -16,7 +16,7 @@ from __future__ import print_function
import glob
from dragon.tools import cpp_extension
from dragon.utils import cpp_extension
from setuptools import setup
Extension = cpp_extension.CppExtension
@@ -32,23 +32,18 @@ def find_sources(*dirs):
sources = []
for path in dirs:
for ext_suffix in ext_suffixes:
sources += glob.glob(
path + '/*' + ext_suffix,
recursive=True,
)
sources += glob.glob(path + '/*' + ext_suffix, recursive=True)
return sources
ext_modules = [
Extension(
name='install.lib.modules._C',
name='seetadet.ops._C',
sources=find_sources('**'),
define_macros=[('THRUST_IGNORE_CUB_VERSION_CHECK', None)],
),
]
setup(
name='SeetaDet',
setup(name='seetadet',
ext_modules=ext_modules,
cmdclass={'build_ext': cpp_extension.BuildExtension},
)
cmdclass={'build_ext': cpp_extension.BuildExtension})
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, see
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_H_
#include "../utils/detection/anchors.h"
#include "../utils/detection/bbox.h"
#include "../utils/detection/nms.h"
#include "../utils/detection/proposals.h"
#include "../utils/detection/types.h"
#endif // DRAGON_EXTENSION_UTILS_DETECTION_H_
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, see
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_ANCHORS_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_ANCHORS_H_
#include "../../utils/detection/types.h"
namespace dragon {
namespace detection {
/*!
* Anchor Functions.
*/
template <typename IndexT>
inline void SetGridArgs(
const int num_anchors,
const int num_cell_anchors,
const IndexT* grid_info,
vector<GridArgs<IndexT>>& grid_args) {
IndexT grid_offset = 0;
for (int i = 0; i < grid_args.size(); ++i, grid_info += 2) {
auto& args = grid_args[i];
args.h = grid_info[0];
args.w = grid_info[1];
args.offset = grid_offset;
grid_offset += num_cell_anchors * args.h * args.w;
}
std::stringstream ss;
if (grid_offset != num_anchors) {
ss << "Mismatched number of anchors. (Excepted ";
ss << num_anchors << ", Got " << grid_offset << ")";
for (int i = 0; i < grid_args.size(); ++i) {
ss << "\nGrid #" << i << ": "
<< "A=" << num_cell_anchors << ", H=" << grid_args[i].h
<< ", W=" << grid_args[i].w << "\n";
}
}
if (!ss.str().empty()) LOG(FATAL) << ss.str();
}
template <typename T>
inline void GenerateAnchors(
const int stride,
const int num_ratios,
const int num_scales,
const T* ratios,
const T* scales,
T* anchors) {
T* offset_anchors = anchors;
const T area = T(stride * stride);
const T ctr = T(0.5) * T(stride - 1);
for (int i = 0; i < num_ratios; ++i) {
const T ratio_w = std::round(std::sqrt(area / ratios[i]));
const T ratio_h = std::round(ratio_w * ratios[i]);
for (int j = 0; j < num_scales; ++j) {
const T w_half = T(0.5) * (ratio_w * scales[j] - T(1));
const T h_half = T(0.5) * (ratio_h * scales[j] - T(1));
offset_anchors[0] = ctr - w_half;
offset_anchors[1] = ctr - h_half;
offset_anchors[2] = ctr + w_half;
offset_anchors[3] = ctr + h_half;
offset_anchors += 4;
}
}
}
template <typename T>
inline void GetAnchors(
const int num_anchors,
const int num_cell_anchors,
const GridArgs<int64_t>& args,
const T* cell_anchors,
const int64_t* indices,
T* anchors) {
const int64_t index_min = args.offset;
const int64_t index_max = num_cell_anchors * args.h * args.w;
for (int i = 0; i < num_anchors; ++i) {
auto index = indices[i] - index_min;
if (index >= 0 && index < index_max) {
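      // Decode the flat index: each grid stores anchors in (A, H, W) order
      // with W varying fastest, offset by the anchor count of earlier grids.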
const auto w = index % args.w;
index /= args.w;
const auto h = index % args.h;
index /= args.h;
const auto shift_x = T(w * args.stride);
const auto shift_y = T(h * args.stride);
auto* offset_anchors = anchors + i * 5;
const auto* offset_cell_anchors = cell_anchors + index * 4;
offset_anchors[0] = shift_x + offset_cell_anchors[0];
offset_anchors[1] = shift_y + offset_cell_anchors[1];
offset_anchors[2] = shift_x + offset_cell_anchors[2];
offset_anchors[3] = shift_y + offset_cell_anchors[3];
}
}
}
template <typename T>
inline void GetAnchors(
const int num_anchors,
const int num_cell_anchors,
const int num_classes,
const GridArgs<int64_t>& args,
const T* cell_anchors,
const int64_t* indices,
T* anchors) {
const int64_t index_min = num_classes * args.offset;
const int64_t index_max = num_classes * (num_cell_anchors * args.h * args.w);
for (int i = 0; i < num_anchors; ++i) {
auto index = indices[i] - index_min;
if (index >= 0 && index < index_max) {
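      // Indices address (anchor, class) score pairs with class varying
      // fastest, so strip the class dimension before decoding the grid.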
index /= num_classes;
const auto w = index % args.w;
index /= args.w;
const auto h = index % args.h;
index /= args.h;
const auto shift_x = T(w * args.stride);
const auto shift_y = T(h * args.stride);
auto* offset_anchors = anchors + i * 7 + 1;
const auto* offset_cell_anchors = cell_anchors + index * 4;
offset_anchors[0] = shift_x + offset_cell_anchors[0];
offset_anchors[1] = shift_y + offset_cell_anchors[1];
offset_anchors[2] = shift_x + offset_cell_anchors[2];
offset_anchors[3] = shift_y + offset_cell_anchors[3];
}
}
}
} // namespace detection
} // namespace dragon
#endif // DRAGON_EXTENSION_UTILS_DETECTION_ANCHORS_H_
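For intuition, here is a minimal standalone sketch of `GenerateAnchors` (a hypothetical usage; it assumes this extension's headers and dragon's core headers are reachable on the include path). With stride 16, ratios {0.5, 1, 2} and a single scale of 8, the rounding scheme above reproduces the classic Faster R-CNN base anchors.
```cpp
// Sanity check for detection::GenerateAnchors (include path assumed).
#include <cstdio>
#include "utils/detection/anchors.h"

int main() {
  const float ratios[] = {0.5f, 1.f, 2.f};
  const float scales[] = {8.f};
  float anchors[3 * 1 * 4];  // num_ratios * num_scales * (x1, y1, x2, y2)
  dragon::detection::GenerateAnchors(16, 3, 1, ratios, scales, anchors);
  for (int i = 0; i < 3; ++i) {
    // Expected: (-84, -40, 99, 55), (-56, -56, 71, 71), (-36, -80, 51, 95).
    std::printf("anchor %d: (%g, %g, %g, %g)\n", i,
                anchors[i * 4 + 0], anchors[i * 4 + 1],
                anchors[i * 4 + 2], anchors[i * 4 + 3]);
  }
  return 0;
}
```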
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, see
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_BBOX_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_BBOX_H_
#include "../../utils/detection/types.h"
#if defined(__CUDACC__)
#define HOSTDEVICE_DECL inline __host__ __device__
#else
#define HOSTDEVICE_DECL inline
#endif
namespace dragon {
namespace detection {
/*
* BBox Functions.
*/
template <typename T, class BoxT>
inline void SortBoxes(const int N, T* data, bool descend = true) {
auto* boxes = reinterpret_cast<BoxT*>(data);
std::sort(boxes, boxes + N, [descend](BoxT lhs, BoxT rhs) {
return descend ? (lhs.score > rhs.score) : (lhs.score < rhs.score);
});
}
/*
* BBox Utilities.
*/
namespace utils {
template <typename T>
HOSTDEVICE_DECL bool CheckIoU(const T thresh, const T* a, const T* b) {
#if defined(__CUDACC__)
const T x1 = max(a[0], b[0]);
const T y1 = max(a[1], b[1]);
const T x2 = min(a[2], b[2]);
const T y2 = min(a[3], b[3]);
const T width = max(T(0), x2 - x1 + T(1));
const T height = max(T(0), y2 - y1 + T(1));
#else
const T x1 = std::max(a[0], b[0]);
const T y1 = std::max(a[1], b[1]);
const T x2 = std::min(a[2], b[2]);
const T y2 = std::min(a[3], b[3]);
const T width = std::max(T(0), x2 - x1 + T(1));
const T height = std::max(T(0), y2 - y1 + T(1));
#endif
const T inter = width * height;
const T Sa = (a[2] - a[0] + T(1)) * (a[3] - a[1] + T(1));
const T Sb = (b[2] - b[0] + T(1)) * (b[3] - b[1] + T(1));
return inter > thresh * (Sa + Sb - inter);
}
template <typename T>
inline void BBoxTransform(
const T dx,
const T dy,
const T dw,
const T dh,
const T im_w,
const T im_h,
const T im_scale_h,
const T im_scale_w,
T* bbox) {
const T w = bbox[2] - bbox[0] + 1;
const T h = bbox[3] - bbox[1] + 1;
const T ctr_x = bbox[0] + T(0.5) * w;
const T ctr_y = bbox[1] + T(0.5) * h;
const T pred_ctr_x = dx * w + ctr_x;
const T pred_ctr_y = dy * h + ctr_y;
const T pred_w = std::exp(dw) * w;
const T pred_h = std::exp(dh) * h;
const T x1 = pred_ctr_x - T(0.5) * pred_w;
const T y1 = pred_ctr_y - T(0.5) * pred_h;
const T x2 = pred_ctr_x + T(0.5) * pred_w;
const T y2 = pred_ctr_y + T(0.5) * pred_h;
bbox[0] = std::max(T(0), std::min(x1, im_w - T(1))) / im_scale_w;
bbox[1] = std::max(T(0), std::min(y1, im_h - T(1))) / im_scale_h;
bbox[2] = std::max(T(0), std::min(x2, im_w - T(1))) / im_scale_w;
bbox[3] = std::max(T(0), std::min(y2, im_h - T(1))) / im_scale_h;
}
template <typename T>
inline int GetBBoxLevel(
const int lvl_min,
const int lvl_max,
const int lvl0,
const int s0,
T* bbox) {
const T w = bbox[2] - bbox[0] + 1;
const T h = bbox[3] - bbox[1] + 1;
const T s = std::sqrt(w * h);
const int lvl = lvl0 + std::log2(s / s0 + T(1e-6));
return std::min(std::max(lvl, lvl_min), lvl_max);
}
} // namespace utils
} // namespace detection
} // namespace dragon
#undef HOSTDEVICE_DECL
#endif // DRAGON_EXTENSION_UTILS_DETECTION_BBOX_H_
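As a quick numeric check of `CheckIoU` (a sketch under the same include-path assumption): two 10x10 boxes offset by 5 pixels overlap in a 5x5 region, so IoU = 25 / (100 + 100 - 25) ≈ 0.143, which passes a 0.1 threshold but not 0.5.
```cpp
// Worked IoU example for detection::utils::CheckIoU (include path assumed).
#include <cstdio>
#include "utils/detection/bbox.h"

int main() {
  // Two 10x10 boxes (the +1 convention counts both endpoints).
  const float a[4] = {0.f, 0.f, 9.f, 9.f};
  const float b[4] = {5.f, 5.f, 14.f, 14.f};
  // Intersection 5x5 = 25, union 100 + 100 - 25 = 175, IoU ~= 0.143.
  std::printf("IoU > 0.5? %d\n", dragon::detection::utils::CheckIoU(0.5f, a, b));  // 0
  std::printf("IoU > 0.1? %d\n", dragon::detection::utils::CheckIoU(0.1f, a, b));  // 1
  return 0;
}
```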
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, see
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_ITERATOR_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_ITERATOR_H_
#include <dragon/core/common.h>
namespace dragon {
namespace detection {
template <typename MapT>
class KeyValueMapIterator
: public std::iterator<std::input_iterator_tag, MapT> {
public:
typedef KeyValueMapIterator self_type;
typedef ptrdiff_t difference_type;
typedef MapT value_type;
typedef MapT& reference;
KeyValueMapIterator(
typename MapT::key_type* key_ptr,
typename MapT::value_type* value_ptr)
: key_ptr_(key_ptr), value_ptr_(value_ptr) {}
self_type operator++(int) {
self_type ret = *this;
key_ptr_++;
value_ptr_++;
return ret;
}
self_type operator++() {
key_ptr_++;
value_ptr_++;
return *this;
}
self_type operator--() {
key_ptr_--;
value_ptr_--;
return *this;
}
self_type operator--(int) {
self_type ret = *this;
key_ptr_--;
value_ptr_--;
return ret;
}
reference operator*() const {
if (map_.key_ptr != key_ptr_) {
map_.key_ptr = key_ptr_;
map_.value_ptr = value_ptr_;
}
return map_;
}
self_type operator+(difference_type n) const {
return self_type(key_ptr_ + n, value_ptr_ + n);
}
self_type& operator+=(difference_type n) {
key_ptr_ += n;
value_ptr_ += n;
return *this;
}
self_type operator-(difference_type n) const {
return self_type(key_ptr_ - n, value_ptr_ - n);
}
self_type& operator-=(difference_type n) {
key_ptr_ -= n;
value_ptr_ -= n;
return *this;
}
difference_type operator-(self_type other) const {
return key_ptr_ - other.key_ptr_;
}
bool operator<(const self_type& rhs) const {
return key_ptr_ < rhs.key_ptr_;
}
bool operator<=(const self_type& rhs) const {
return key_ptr_ <= rhs.key_ptr_;
}
bool operator==(const self_type& rhs) const {
return key_ptr_ == rhs.key_ptr_;
}
bool operator!=(const self_type& rhs) const {
return key_ptr_ != rhs.key_ptr_;
}
private:
mutable MapT map_;
typename MapT::key_type* key_ptr_;
typename MapT::value_type* value_ptr_;
};
} // namespace detection
} // namespace dragon
#endif // DRAGON_EXTENSION_UTILS_DETECTION_ITERATOR_H_
#include <dragon/core/context.h>
#include "../../utils/detection/bbox.h"
#include "../../utils/detection/nms.h"
namespace dragon {
namespace detection {
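// Greedy NMS over score-sorted boxes: keep a box, suppress every later box
// whose IoU with it exceeds the threshold, and stop after K keeps.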
template <>
void ApplyNMS<float, CPUContext>(
const int N,
const int K,
const float thresh,
const float* boxes,
vector<int64_t>& indices,
CPUContext* ctx) {
int num_selected = 0;
indices.resize(K);
vector<char> is_dead(N, 0);
for (int i = 0; i < N; ++i) {
if (is_dead[i]) continue;
indices[num_selected++] = i;
if (num_selected >= K) break;
for (int j = i + 1; j < N; ++j) {
if (is_dead[j]) continue;
if (!utils::CheckIoU(thresh, &boxes[i * 5], &boxes[j * 5])) continue;
is_dead[j] = 1;
}
}
indices.resize(num_selected);
}
} // namespace detection
} // namespace dragon
#include <dragon/core/context_cuda.h>
#include <dragon/core/workspace.h>
#include "../../utils/detection/bbox.h"
#include "../../utils/detection/nms.h"
#include "../../utils/detection/utils.h"
namespace dragon {
namespace detection {
namespace {
#define NUM_THREADS 64
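// Each block compares a 64-box row tile against a 64-box column tile and
// records the suppression decisions as one 64-bit mask per (box, tile) pair.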
template <typename T>
__global__ void _NonMaxSuppression(
const int N,
const T thresh,
const T* boxes,
uint64_t* mask) {
const int row_start = blockIdx.y;
const int col_start = blockIdx.x;
if (row_start > col_start) return;
const int row_size = min(N - row_start * NUM_THREADS, NUM_THREADS);
const int col_size = min(N - col_start * NUM_THREADS, NUM_THREADS);
__shared__ T block_boxes[NUM_THREADS * 4];
if (threadIdx.x < col_size) {
auto* offset_block_boxes = block_boxes + threadIdx.x * 4;
auto* offset_boxes = boxes + (col_start * NUM_THREADS + threadIdx.x) * 5;
#pragma unroll
for (int i = 0; i < 4; ++i) {
*(offset_block_boxes++) = *(offset_boxes++);
}
}
__syncthreads();
if (threadIdx.x < row_size) {
const int index = row_start * NUM_THREADS + threadIdx.x;
const T* offset_boxes = boxes + index * 5;
unsigned long long val = 0;
const int start = (row_start == col_start) ? (threadIdx.x + 1) : 0;
for (int i = start; i < col_size; ++i) {
if (utils::CheckIoU(thresh, offset_boxes, block_boxes + i * 4)) {
val |= 1ULL << i;
}
}
mask[index * gridDim.x + col_start] = val;
}
}
} // namespace
template <>
void ApplyNMS<float, CUDAContext>(
const int N,
const int K,
const float thresh,
const float* boxes,
vector<int64_t>& indices,
CUDAContext* ctx) {
const auto num_blocks = utils::DivUp(N, NUM_THREADS);
vector<uint64_t> mask_host(N * num_blocks);
auto* mask_dev = (uint64_t*)ctx->workspace()->data<CUDAContext>(
mask_host.size() * sizeof(uint64_t), "BufferKernel");
_NonMaxSuppression<<<
dim3(num_blocks, num_blocks),
NUM_THREADS,
0,
ctx->cuda_stream()>>>(N, thresh, boxes, mask_dev);
CUDA_CHECK(cudaMemcpyAsync(
mask_host.data(),
mask_dev,
mask_host.size() * sizeof(uint64_t),
cudaMemcpyDeviceToHost,
ctx->cuda_stream()));
ctx->FinishDeviceComputation();
vector<uint64_t> is_dead(num_blocks, 0);
int num_selected = 0;
indices.resize(K);
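// Sequential scan: a box survives iff no earlier kept box flagged it;
// each kept box then merges its mask to suppress later candidates.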
for (int i = 0; i < N; ++i) {
const int nblock = i / NUM_THREADS;
const int inblock = i % NUM_THREADS;
if (!(is_dead[nblock] & (1ULL << inblock))) {
indices[num_selected++] = i;
if (num_selected >= K) break;
auto* offset_mask = &mask_host[0] + i * num_blocks;
for (int j = nblock; j < num_blocks; ++j) {
is_dead[j] |= offset_mask[j];
}
}
}
indices.resize(num_selected);
}
} // namespace detection
} // namespace dragon
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_NMS_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_NMS_H_
#include "../../utils/detection/types.h"
namespace dragon {
namespace detection {
template <typename T, class Context>
void ApplyNMS(
const int N,
const int K,
const T thresh,
const T* boxes,
vector<int64_t>& indices,
Context* ctx);
} // namespace detection
} // namespace dragon
#endif // DRAGON_EXTENSION_UTILS_DETECTION_NMS_H_
#include <dragon/core/context.h>
#include "../../utils/detection/proposals.h"
namespace dragon {
namespace detection {
namespace {
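// Partially reorder keys so the K highest-valued entries come first
// (average O(N) via std::nth_element; order within each side is arbitrary).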
template <typename KeyT, typename ValueT>
inline void
ArgPartition(const int N, const int K, const ValueT* values, KeyT* keys) {
std::nth_element(keys, keys + K, keys + N, [&values](KeyT lhs, KeyT rhs) {
return values[lhs] > values[rhs];
});
}
} // namespace
template <>
void SelectProposals<float, CPUContext>(
const int N,
const int K,
const float thresh,
const float* scores,
vector<float>& out_scores,
vector<int64_t>& out_indices,
CPUContext* ctx) {
int num_selected = 0;
out_indices.resize(N);
if (thresh > 0.f) {
for (int i = 0; i < N; ++i) {
if (scores[i] > thresh) {
out_indices[num_selected++] = i;
}
}
} else {
num_selected = N;
std::iota(out_indices.begin(), out_indices.end(), 0);
}
if (num_selected > K) {
ArgPartition(num_selected, K, scores, out_indices.data());
out_scores.resize(K);
out_indices.resize(K);
for (int i = 0; i < K; ++i) {
out_scores[i] = scores[out_indices[i]];
}
} else {
out_scores.resize(num_selected);
out_indices.resize(num_selected);
for (int i = 0; i < num_selected; ++i) {
out_scores[i] = scores[out_indices[i]];
}
}
}
} // namespace detection
} // namespace dragon
#include <dragon/core/context_cuda.h>
#include <dragon/core/workspace.h>
#include <dragon/utils/device/common_thrust.h>
#include "../../utils/detection/iterator.h"
#include "../../utils/detection/proposals.h"
namespace dragon {
namespace detection {
namespace {
template <typename KeyT, typename ValueT>
struct ThresholdFunctor {
ThresholdFunctor(ValueT thresh) : thresh_(thresh) {}
inline __device__ bool operator()(
const thrust::tuple<KeyT, ValueT>& kv) const {
return thrust::get<1>(kv) > thresh_;
}
ValueT thresh_;
};
template <typename IterT>
inline void ArgPartition(const int N, const int K, IterT data) {
std::nth_element(
data,
data + K,
data + N,
[](const typename IterT::value_type& lhs,
const typename IterT::value_type& rhs) {
return *lhs.value_ptr > *rhs.value_ptr;
});
}
} // namespace
template <>
void SelectProposals<float, CUDAContext>(
const int N,
const int K,
const float thresh,
const float* scores,
vector<float>& out_scores,
vector<int64_t>& out_indices,
CUDAContext* ctx) {
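// Partition (index, score) pairs on device so entries above the threshold
// come first, then copy only the surviving prefix back to the host.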
int num_selected = N;
int64_t* indices = nullptr;
if (thresh > 0.f) {
indices = ctx->workspace()->data<int64_t, CUDAContext>(N, "BufferKernel");
auto policy = thrust::cuda::par.on(ctx->cuda_stream());
auto functor = ThresholdFunctor<int64_t, float>(thresh);
thrust::sequence(policy, indices, indices + N);
auto kv = thrust::make_tuple(indices, const_cast<float*>(scores));
auto first = thrust::make_zip_iterator(kv);
auto last = thrust::partition(policy, first, first + N, functor);
num_selected = last - first;
}
out_scores.resize(num_selected);
out_indices.resize(num_selected);
CUDA_CHECK(cudaMemcpyAsync(
out_scores.data(),
scores,
num_selected * sizeof(float),
cudaMemcpyDeviceToHost,
ctx->cuda_stream()));
if (thresh > 0.f) {
CUDA_CHECK(cudaMemcpyAsync(
out_indices.data(),
indices,
num_selected * sizeof(int64_t),
cudaMemcpyDeviceToHost,
ctx->cuda_stream()));
} else {
std::iota(out_indices.begin(), out_indices.end(), 0);
}
ctx->FinishDeviceComputation();
if (num_selected > K) {
auto iter = KeyValueMapIterator<KeyValueMap<int64_t, float>>(
out_indices.data(), out_scores.data());
ArgPartition(num_selected, K, iter);
out_scores.resize(K);
out_indices.resize(K);
}
}
} // namespace detection
} // namespace dragon
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_PROPOSALS_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_PROPOSALS_H_
#include "../../utils/detection/bbox.h"
#include "../../utils/detection/types.h"
namespace dragon {
namespace detection {
template <typename T, class Context>
void SelectProposals(
const int N,
const int K,
const float thresh,
const T* input_scores,
vector<T>& output_scores,
vector<int64_t>& output_indices,
Context* ctx);
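// Decode selected anchors into (x1, y1, x2, y2, score) proposals.
// Deltas are laid out channel-major: [dx | dy | dw | dh], each of
// length num_anchors.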
template <typename T>
void DecodeProposals(
const int num_proposals,
const int num_anchors,
const ImageArgs<int64_t>& im_args,
const T* scores,
const T* deltas,
const int64_t* indices,
T* proposals) {
T* offset_proposals = proposals;
const T* offset_dx = deltas;
const T* offset_dy = deltas + num_anchors;
const T* offset_dw = deltas + num_anchors * 2;
const T* offset_dh = deltas + num_anchors * 3;
for (int i = 0; i < num_proposals; ++i) {
const auto index = indices[i];
utils::BBoxTransform(
offset_dx[index],
offset_dy[index],
offset_dw[index],
offset_dh[index],
T(im_args.w),
T(im_args.h),
T(1),
T(1),
offset_proposals);
offset_proposals[4] = scores[i];
offset_proposals += 5;
}
}
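// Decode top detections into (batch_ind, x1, y1, x2, y2, score, class).
// Each flattened index enumerates an (anchor, class) pair, hence the
// div/mod below; class ids are shifted by 1 to skip the background.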
template <typename T>
void DecodeDetections(
const int num_dets,
const int num_anchors,
const int num_classes,
const ImageArgs<int64_t>& im_args,
const T* scores,
const T* deltas,
const int64_t* indices,
T* dets) {
T* offset_dets = dets;
const T* offset_dx = deltas;
const T* offset_dy = deltas + num_anchors;
const T* offset_dw = deltas + num_anchors * 2;
const T* offset_dh = deltas + num_anchors * 3;
for (int i = 0; i < num_dets; ++i) {
const auto index = indices[i] / num_classes;
utils::BBoxTransform(
offset_dx[index],
offset_dy[index],
offset_dw[index],
offset_dh[index],
T(im_args.w),
T(im_args.h),
T(im_args.scale_h),
T(im_args.scale_w),
offset_dets + 1);
offset_dets[0] = T(im_args.batch_ind);
offset_dets[5] = scores[i];
offset_dets[6] = T(indices[i] % num_classes + 1);
offset_dets += 7;
}
}
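// Bucket RoIs into per-level outputs with a counting sort over FPN levels;
// an empty level receives a single dummy RoI with batch index -1.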
template <typename T>
inline void ApplyHistogram(
const int N,
const int lvl_min,
const int lvl_max,
const int lvl0,
const int s0,
const T* boxes,
const T* batch_indices,
const int64_t* box_indices,
vector<vector<T>>& output_rois) {
vector<int> bin_indices(N);
vector<int> bin_count(lvl_max - lvl_min + 1, 0);
for (int i = 0; i < N; ++i) {
const T* offset_boxes = boxes + box_indices[i] * 5;
auto lvl = utils::GetBBoxLevel(lvl_min, lvl_max, lvl0, s0, offset_boxes);
bin_indices[i] = lvl - lvl_min;
bin_count[lvl - lvl_min]++;
}
output_rois.resize(lvl_max - lvl_min + 1);
for (int i = 0; i < output_rois.size(); ++i) {
auto& rois = output_rois[i];
rois.resize(std::max(bin_count[i], 1) * 5);
if (bin_count[i] == 0) rois[0] = T(-1); // Ignored.
}
for (int i = 0; i < N; ++i) {
const T* offset_boxes = boxes + box_indices[i] * 5;
const auto bin_index = bin_indices[i];
const auto roi_index = --bin_count[bin_index];
auto& rois = output_rois[bin_index];
T* offset_rois = rois.data() + roi_index * 5;
offset_rois[0] = batch_indices[i];
offset_rois[1] = offset_boxes[0];
offset_rois[2] = offset_boxes[1];
offset_rois[3] = offset_boxes[2];
offset_rois[4] = offset_boxes[3];
}
}
} // namespace detection
} // namespace dragon
#endif // DRAGON_EXTENSION_UTILS_DETECTION_PROPOSALS_H_
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_TYPES_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_TYPES_H_
#include <dragon/core/common.h>
namespace dragon {
namespace detection {
template <typename T>
struct Box4d {
T x1, y1, x2, y2;
};
template <typename T>
struct Box5d {
T x1, y1, x2, y2, score;
};
template <typename IndexT>
struct ImageArgs {
ImageArgs(const float* im_info) {
h = im_info[0], w = im_info[1];
scale_h = im_info[2], scale_w = im_info[3];
}
IndexT batch_ind, h, w;
float scale_h, scale_w;
};
template <typename IndexT>
struct GridArgs {
IndexT h, w, stride, offset;
};
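// Proxy 'pair' over separate key/value arrays; swap() exchanges the
// pointed-to elements so sorting algorithms can permute both arrays.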
template <typename KeyT, typename ValueT>
struct KeyValueMap {
typedef KeyT key_type;
typedef ValueT value_type;
friend void swap(KeyValueMap& x, KeyValueMap& y) {
std::swap(*x.key_ptr, *y.key_ptr);
std::swap(*x.value_ptr, *y.value_ptr);
}
KeyT* key_ptr = nullptr;
ValueT* value_ptr = nullptr;
};
} // namespace detection
} // namespace dragon
#endif // DRAGON_EXTENSION_UTILS_DETECTION_TYPES_H_
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_UTILS_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_UTILS_H_
namespace dragon {
namespace detection {
/*
* Detection Utilities.
*/
namespace utils {
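// Ceiling division: the number of size-b blocks needed to cover a elements.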
template <typename T>
inline T DivUp(const T a, const T b) {
return (a + b - T(1)) / b;
}
} // namespace utils
} // namespace detection
} // namespace dragon
#endif // DRAGON_EXTENSION_UTILS_DETECTION_UTILS_H_
#include "detection_utils.h"
#include <dragon/core/context.h>
namespace dragon {
namespace utils {
namespace detection {
template <typename T>
T IoU(const T A[], const T B[]) {
if (A[0] > B[2] || A[1] > B[3] || A[2] < B[0] || A[3] < B[1]) return 0;
const T x1 = std::max(A[0], B[0]);
const T y1 = std::max(A[1], B[1]);
const T x2 = std::min(A[2], B[2]);
const T y2 = std::min(A[3], B[3]);
const T width = std::max((T)0, x2 - x1 + 1);
const T height = std::max((T)0, y2 - y1 + 1);
const T area = width * height;
const T A_area = (A[2] - A[0] + 1) * (A[3] - A[1] + 1);
const T B_area = (B[2] - B[0] + 1) * (B[3] - B[1] + 1);
return area / (A_area + B_area - area);
}
template <>
void ApplyNMS<float, CPUContext>(
const int num_boxes,
const int max_keeps,
const float thresh,
const float* boxes,
int64_t* keep_indices,
int& num_keep,
CPUContext* ctx) {
int count = 0;
std::vector<char> is_dead(num_boxes, 0);
for (int i = 0; i < num_boxes; ++i) {
if (is_dead[i]) continue;
keep_indices[count++] = i;
if (count == max_keeps) break;
for (int j = i + 1; j < num_boxes; ++j)
if (!is_dead[j] && IoU(&boxes[i * 5], &boxes[j * 5]) > thresh) {
is_dead[j] = 1;
}
}
num_keep = count;
}
template <>
void SelectProposals<float, CPUContext>(
const int count,
const float score_thresh,
const float* input_scores,
vector<float>& output_scores,
vector<int64_t>& output_indices,
CPUContext* ctx) {
int num_proposals = 0;
output_indices.resize(count); // Size the buffer before writing indices.
for (int i = 0; i < count; ++i) {
if (input_scores[i] > score_thresh) {
output_indices[num_proposals++] = i;
}
}
output_indices.resize(num_proposals);
output_scores.resize(num_proposals);
for (int i = 0; i < num_proposals; ++i) {
output_scores[i] = input_scores[output_indices[i]];
}
}
} // namespace detection
} // namespace utils
} // namespace dragon
#ifdef USE_CUDA
#include <dragon/core/context_cuda.h>
#include <dragon/core/workspace.h>
#include <dragon/utils/device/common_cub.h>
#include <dragon/utils/device/common_thrust.h>
#include "detection_utils.h"
namespace dragon {
namespace utils {
namespace detection {
#define DIV_UP(m, n) ((m) / (n) + ((m) % (n) > 0))
#define NUM_THREADS 64
namespace {
template <typename T>
struct ThresholdFunctor {
ThresholdFunctor(float thresh) : thresh_(thresh) {}
inline __device__ bool operator()(
const thrust::tuple<int64_t, T>& key_val) const {
return thrust::get<1>(key_val) > thresh_;
}
float thresh_;
};
template <typename T>
__device__ bool _CheckIoU(const T* a, const T* b, const float thresh) {
const T x1 = max(a[0], b[0]);
const T y1 = max(a[1], b[1]);
const T x2 = min(a[2], b[2]);
const T y2 = min(a[3], b[3]);
const T width = max(T(0), x2 - x1 + 1);
const T height = max(T(0), y2 - y1 + 1);
const T inter = width * height;
const T Sa = (a[2] - a[0] + T(1)) * (a[3] - a[1] + T(1));
const T Sb = (b[2] - b[0] + T(1)) * (b[3] - b[1] + T(1));
return inter > thresh * (Sa + Sb - inter);
}
template <typename T>
__global__ void _NonMaxSuppression(
const int num_blocks,
const int num_boxes,
const T thresh,
const T* dev_boxes,
uint64_t* dev_mask) {
const int row_start = blockIdx.y;
const int col_start = blockIdx.x;
if (row_start > col_start) return;
const int row_size = min(num_boxes - row_start * NUM_THREADS, NUM_THREADS);
const int col_size = min(num_boxes - col_start * NUM_THREADS, NUM_THREADS);
__shared__ T block_boxes[NUM_THREADS * 4];
if (threadIdx.x < col_size) {
const int c1 = threadIdx.x * 4;
const int c2 = (col_start * NUM_THREADS + threadIdx.x) * 5;
block_boxes[c1] = dev_boxes[c2];
block_boxes[c1 + 1] = dev_boxes[c2 + 1];
block_boxes[c1 + 2] = dev_boxes[c2 + 2];
block_boxes[c1 + 3] = dev_boxes[c2 + 3];
}
__syncthreads();
if (threadIdx.x < row_size) {
const int index = row_start * NUM_THREADS + threadIdx.x;
const T* dev_box = dev_boxes + index * 5;
unsigned long long val = 0;
const int start = (row_start == col_start) ? (threadIdx.x + 1) : 0;
for (int i = start; i < col_size; ++i) {
if (_CheckIoU(dev_box, block_boxes + i * 4, thresh)) {
val |= 1ULL << i;
}
}
dev_mask[index * num_blocks + col_start] = val;
}
}
} // namespace
template <>
void SelectProposals<float, CUDAContext>(
const int count,
const float score_thresh,
const float* in_scores,
vector<float>& out_scores,
vector<int64_t>& out_indices,
CUDAContext* ctx) {
auto* in_indices = ctx->workspace()->template data<int64_t, CUDAContext>(
{count}, "data:1")[0];
auto iter = thrust::make_zip_iterator(
thrust::make_tuple(in_indices, const_cast<float*>(in_scores)));
auto policy = thrust::cuda::par.on(ctx->cuda_stream());
thrust::counting_iterator<int64_t> offset(0);
thrust::copy(policy, offset, offset + count, in_indices);
auto last = thrust::partition(
policy, iter, iter + count, ThresholdFunctor<float>(score_thresh));
size_t num_proposals = last - iter;
out_scores.resize(num_proposals);
out_indices.resize(num_proposals);
CUDA_CHECK(cudaMemcpyAsync(
out_scores.data(),
in_scores,
num_proposals * sizeof(float),
cudaMemcpyDeviceToHost,
ctx->cuda_stream()));
CUDA_CHECK(cudaMemcpyAsync(
out_indices.data(),
in_indices,
num_proposals * sizeof(int64_t),
cudaMemcpyDeviceToHost,
ctx->cuda_stream()));
ctx->FinishDeviceComputation();
}
template <>
void ApplyNMS<float, CUDAContext>(
const int num_boxes,
const int max_keeps,
const float thresh,
const float* boxes,
int64_t* keep_indices,
int& num_keep,
CUDAContext* ctx) {
const int num_blocks = DIV_UP(num_boxes, NUM_THREADS);
vector<uint64_t> mask_host(num_boxes * num_blocks);
auto* mask_dev = (uint64_t*)ctx->workspace()->data<CUDAContext>(
{mask_host.size() * sizeof(uint64_t)}, "data:1")[0];
_NonMaxSuppression<<<
dim3(num_blocks, num_blocks),
NUM_THREADS,
0,
ctx->cuda_stream()>>>(num_blocks, num_boxes, thresh, boxes, mask_dev);
CUDA_CHECK(cudaMemcpyAsync(
mask_host.data(),
mask_dev,
mask_host.size() * sizeof(uint64_t),
cudaMemcpyDeviceToHost,
ctx->cuda_stream()));
ctx->FinishDeviceComputation();
vector<uint64_t> dead_bit(num_blocks, 0);
int num_selected = 0;
for (int i = 0; i < num_boxes; ++i) {
const int nblock = i / NUM_THREADS;
const int inblock = i % NUM_THREADS;
if (!(dead_bit[nblock] & (1ULL << inblock))) {
keep_indices[num_selected++] = i;
auto* mask_i = &mask_host[0] + i * num_blocks;
for (int j = nblock; j < num_blocks; ++j)
dead_bit[j] |= mask_i[j];
if (num_selected == max_keeps) break;
}
}
num_keep = num_selected;
}
} // namespace detection
} // namespace utils
} // namespace dragon
#endif // USE_CUDA
/**************************************************************************
* Microsoft COCO Toolbox. version 2.0
* Data, paper, and tutorials available at: http://mscoco.org/
* Code written by Piotr Dollar and Tsung-Yi Lin, 2015.
* Licensed under the Simplified BSD License [see coco/license.txt]
**************************************************************************/
#include "maskApi.h"
#include <math.h>
#include <stdlib.h>
uint umin( uint a, uint b ) { return (a<b) ? a : b; }
uint umax( uint a, uint b ) { return (a>b) ? a : b; }
void rleInit( RLE *R, siz h, siz w, siz m, uint *cnts ) {
R->h=h; R->w=w; R->m=m; R->cnts=(m==0)?0:malloc(sizeof(uint)*m);
if(cnts) for(siz j=0; j<m; j++) R->cnts[j]=cnts[j];
}
void rleFree( RLE *R ) {
free(R->cnts); R->cnts=0;
}
void rlesInit( RLE **R, siz n ) {
*R = (RLE*) malloc(sizeof(RLE)*n);
for(siz i=0; i<n; i++) rleInit((*R)+i,0,0,0,0);
}
void rlesFree( RLE **R, siz n ) {
for(siz i=0; i<n; i++) rleFree((*R)+i); free(*R); *R=0;
}
void rleEncode( RLE *R, const byte *M, siz h, siz w, siz n ) {
siz i, j, k, a=w*h; uint c, *cnts; byte p;
cnts = malloc(sizeof(uint)*(a+1));
for(i=0; i<n; i++) {
const byte *T=M+a*i; k=0; p=0; c=0;
for(j=0; j<a; j++) { if(T[j]!=p) { cnts[k++]=c; c=0; p=T[j]; } c++; }
cnts[k++]=c; rleInit(R+i,h,w,k,cnts);
}
free(cnts);
}
void rleDecode( const RLE *R, byte *M, siz n ) {
for( siz i=0; i<n; i++ ) {
byte v=0; for( siz j=0; j<R[i].m; j++ ) {
for( siz k=0; k<R[i].cnts[j]; k++ ) *(M++)=v; v=!v; }}
}
void rleMerge( const RLE *R, RLE *M, siz n, bool intersect ) {
uint *cnts, c, ca, cb, cc, ct; bool v, va, vb, vp;
siz i, a, b, h=R[0].h, w=R[0].w, m=R[0].m; RLE A, B;
if(n==0) { rleInit(M,0,0,0,0); return; }
if(n==1) { rleInit(M,h,w,m,R[0].cnts); return; }
cnts = malloc(sizeof(uint)*(h*w+1));
for( a=0; a<m; a++ ) cnts[a]=R[0].cnts[a];
for( i=1; i<n; i++ ) {
B=R[i]; if(B.h!=h||B.w!=w) { h=w=m=0; break; }
rleInit(&A,h,w,m,cnts); ca=A.cnts[0]; cb=B.cnts[0];
v=va=vb=0; m=0; a=b=1; cc=0; ct=1;
while( ct>0 ) {
c=umin(ca,cb); cc+=c; ct=0;
ca-=c; if(!ca && a<A.m) { ca=A.cnts[a++]; va=!va; } ct+=ca;
cb-=c; if(!cb && b<B.m) { cb=B.cnts[b++]; vb=!vb; } ct+=cb;
vp=v; if(intersect) v=va&&vb; else v=va||vb;
if( v!=vp||ct==0 ) { cnts[m++]=cc; cc=0; }
}
rleFree(&A);
}
rleInit(M,h,w,m,cnts); free(cnts);
}
void rleArea( const RLE *R, siz n, uint *a ) {
for( siz i=0; i<n; i++ ) {
a[i]=0; for( siz j=1; j<R[i].m; j+=2 ) a[i]+=R[i].cnts[j]; }
}
void rleIou( RLE *dt, RLE *gt, siz m, siz n, byte *iscrowd, double *o ) {
siz g, d; BB db, gb; bool crowd;
db=malloc(sizeof(double)*m*4); rleToBbox(dt,db,m);
gb=malloc(sizeof(double)*n*4); rleToBbox(gt,gb,n);
bbIou(db,gb,m,n,iscrowd,o); free(db); free(gb);
for( g=0; g<n; g++ ) for( d=0; d<m; d++ ) if(o[g*m+d]>0) {
crowd=iscrowd!=NULL && iscrowd[g];
if(dt[d].h!=gt[g].h || dt[d].w!=gt[g].w) { o[g*m+d]=-1; continue; }
siz ka, kb, a, b; uint c, ca, cb, ct, i, u; bool va, vb;
ca=dt[d].cnts[0]; ka=dt[d].m; va=vb=0;
cb=gt[g].cnts[0]; kb=gt[g].m; a=b=1; i=u=0; ct=1;
while( ct>0 ) {
c=umin(ca,cb); if(va||vb) { u+=c; if(va&&vb) i+=c; } ct=0;
ca-=c; if(!ca && a<ka) { ca=dt[d].cnts[a++]; va=!va; } ct+=ca;
cb-=c; if(!cb && b<kb) { cb=gt[g].cnts[b++]; vb=!vb; } ct+=cb;
}
if(i==0) u=1; else if(crowd) rleArea(dt+d,1,&u);
o[g*m+d] = (double)i/(double)u;
}
}
void bbIou( BB dt, BB gt, siz m, siz n, byte *iscrowd, double *o ) {
double h, w, i, u, ga, da; siz g, d; bool crowd;
for( g=0; g<n; g++ ) {
BB G=gt+g*4; ga=G[2]*G[3]; crowd=iscrowd!=NULL && iscrowd[g];
for( d=0; d<m; d++ ) {
BB D=dt+d*4; da=D[2]*D[3]; o[g*m+d]=0;
w=fmin(D[2]+D[0],G[2]+G[0])-fmax(D[0],G[0]); if(w<=0) continue;
h=fmin(D[3]+D[1],G[3]+G[1])-fmax(D[1],G[1]); if(h<=0) continue;
i=w*h; u = crowd ? da : da+ga-i; o[g*m+d]=i/u;
}
}
}
void rleToBbox( const RLE *R, BB bb, siz n ) {
for( siz i=0; i<n; i++ ) {
uint h, w, x, y, xs, ys, xe, ye, cc, t; siz j, m;
h=(uint)R[i].h; w=(uint)R[i].w; m=R[i].m;
m=((siz)(m/2))*2; xs=w; ys=h; xe=ye=0; cc=0;
if(m==0) { bb[4*i+0]=bb[4*i+1]=bb[4*i+2]=bb[4*i+3]=0; continue; }
for( j=0; j<m; j++ ) {
cc+=R[i].cnts[j]; t=cc-j%2; y=t%h; x=(t-y)/h;
xs=umin(xs,x); xe=umax(xe,x); ys=umin(ys,y); ye=umax(ye,y);
}
bb[4*i+0]=xs; bb[4*i+2]=xe-xs+1;
bb[4*i+1]=ys; bb[4*i+3]=ye-ys+1;
}
}
void rleFrBbox( RLE *R, const BB bb, siz h, siz w, siz n ) {
for( siz i=0; i<n; i++ ) {
double xs=bb[4*i+0], xe=xs+bb[4*i+2];
double ys=bb[4*i+1], ye=ys+bb[4*i+3];
double xy[8] = {xs,ys,xs,ye,xe,ye,xe,ys};
rleFrPoly( R+i, xy, 4, h, w );
}
}
int uintCompare(const void *a, const void *b) {
uint c=*((uint*)a), d=*((uint*)b); return c>d?1:c<d?-1:0;
}
void rleFrPoly( RLE *R, const double *xy, siz k, siz h, siz w ) {
// upsample and get discrete points densely along entire boundary
siz j, m=0; double scale=5; int *x, *y, *u, *v; uint *a, *b;
x=malloc(sizeof(int)*(k+1)); y=malloc(sizeof(int)*(k+1));
for(j=0; j<k; j++) x[j]=(int)(scale*xy[j*2+0]+.5); x[k]=x[0];
for(j=0; j<k; j++) y[j]=(int)(scale*xy[j*2+1]+.5); y[k]=y[0];
for(j=0; j<k; j++) m+=umax(abs(x[j]-x[j+1]),abs(y[j]-y[j+1]))+1;
u=malloc(sizeof(int)*m); v=malloc(sizeof(int)*m); m=0;
for( j=0; j<k; j++ ) {
int xs=x[j], xe=x[j+1], ys=y[j], ye=y[j+1], dx, dy, t;
bool flip; double s; dx=abs(xe-xs); dy=abs(ys-ye);
flip = (dx>=dy && xs>xe) || (dx<dy && ys>ye);
if(flip) { t=xs; xs=xe; xe=t; t=ys; ys=ye; ye=t; }
s = dx>=dy ? (double)(ye-ys)/dx : (double)(xe-xs)/dy;
if(dx>=dy) for( int d=0; d<=dx; d++ ) {
t=flip?dx-d:d; u[m]=t+xs; v[m]=(int)(ys+s*t+.5); m++;
} else for( int d=0; d<=dy; d++ ) {
t=flip?dy-d:d; v[m]=t+ys; u[m]=(int)(xs+s*t+.5); m++;
}
}
// get points along y-boundary and downsample
free(x); free(y); k=m; m=0; double xd, yd;
x=malloc(sizeof(int)*k); y=malloc(sizeof(int)*k);
for( j=1; j<k; j++ ) if(u[j]!=u[j-1]) {
xd=(double)(u[j]<u[j-1]?u[j]:u[j]-1); xd=(xd+.5)/scale-.5;
if( floor(xd)!=xd || xd<0 || xd>w-1 ) continue;
yd=(double)(v[j]<v[j-1]?v[j]:v[j-1]); yd=(yd+.5)/scale-.5;
if(yd<0) yd=0; else if(yd>h) yd=h; yd=ceil(yd);
x[m]=(int) xd; y[m]=(int) yd; m++;
}
// compute rle encoding given y-boundary points
k=m; a=malloc(sizeof(uint)*(k+1));
for( j=0; j<k; j++ ) a[j]=(uint)(x[j]*(int)(h)+y[j]);
a[k++]=(uint)(h*w); free(u); free(v); free(x); free(y);
qsort(a,k,sizeof(uint),uintCompare); uint p=0;
for( j=0; j<k; j++ ) { uint t=a[j]; a[j]-=p; p=t; }
b=malloc(sizeof(uint)*k); j=m=0; b[m++]=a[j++];
while(j<k) if(a[j]>0) b[m++]=a[j++]; else {
j++; if(j<k) b[m-1]+=a[j++]; }
rleInit(R,h,w,m,b); free(a); free(b);
}
char* rleToString( const RLE *R ) {
// Similar to LEB128 but using 6 bits/char and ascii chars 48-111.
siz i, m=R->m, p=0; long x; bool more;
char *s=malloc(sizeof(char)*m*6);
for( i=0; i<m; i++ ) {
x=(long) R->cnts[i]; if(i>2) x-=(long) R->cnts[i-2]; more=1;
while( more ) {
char c=x & 0x1f; x >>= 5; more=(c & 0x10) ? x!=-1 : x!=0;
if(more) c |= 0x20; c+=48; s[p++]=c;
}
}
s[p]=0; return s;
}
void rleFrString( RLE *R, char *s, siz h, siz w ) {
siz m=0, p=0, k; long x; bool more; uint *cnts;
while( s[m] ) m++; cnts=malloc(sizeof(uint)*m); m=0;
while( s[p] ) {
x=0; k=0; more=1;
while( more ) {
char c=s[p]-48; x |= (c & 0x1f) << 5*k;
more = c & 0x20; p++; k++;
if(!more && (c & 0x10)) x |= -1 << 5*k;
}
if(m>2) x+=(long) cnts[m-2]; cnts[m++]=(uint) x;
}
rleInit(R,h,w,m,cnts); free(cnts);
}
/**************************************************************************
* Microsoft COCO Toolbox. version 2.0
* Data, paper, and tutorials available at: http://mscoco.org/
* Code written by Piotr Dollar and Tsung-Yi Lin, 2015.
* Licensed under the Simplified BSD License [see coco/license.txt]
**************************************************************************/
#pragma once
#include <stdbool.h>
typedef unsigned int uint;
typedef unsigned long siz;
typedef unsigned char byte;
typedef double* BB;
typedef struct { siz h, w, m; uint *cnts; } RLE;
// Initialize/destroy RLE.
void rleInit( RLE *R, siz h, siz w, siz m, uint *cnts );
void rleFree( RLE *R );
// Initialize/destroy RLE array.
void rlesInit( RLE **R, siz n );
void rlesFree( RLE **R, siz n );
// Encode binary masks using RLE.
void rleEncode( RLE *R, const byte *mask, siz h, siz w, siz n );
// Decode binary masks encoded via RLE.
void rleDecode( const RLE *R, byte *mask, siz n );
// Compute union or intersection of encoded masks.
void rleMerge( const RLE *R, RLE *M, siz n, bool intersect );
// Compute area of encoded masks.
void rleArea( const RLE *R, siz n, uint *a );
// Compute intersection over union between masks.
void rleIou( RLE *dt, RLE *gt, siz m, siz n, byte *iscrowd, double *o );
// Compute intersection over union between bounding boxes.
void bbIou( BB dt, BB gt, siz m, siz n, byte *iscrowd, double *o );
// Get bounding boxes surrounding encoded masks.
void rleToBbox( const RLE *R, BB bb, siz n );
// Convert bounding boxes to encoded masks.
void rleFrBbox( RLE *R, const BB bb, siz h, siz w, siz n );
// Convert polygon to encoded mask.
void rleFrPoly( RLE *R, const double *xy, siz k, siz h, siz w );
// Get compressed string representation of encoded mask.
char* rleToString( const RLE *R );
// Convert from compressed string representation of encoded mask.
void rleFrString( RLE *R, char *s, siz h, siz w );
......@@ -8,7 +8,7 @@
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Compile the cython extensions."""
"""Build cython extensions."""
from __future__ import absolute_import
from __future__ import division
......@@ -16,34 +16,25 @@ from __future__ import print_function
from distutils.extension import Extension
from distutils.core import setup
import os
from Cython.Distutils import build_ext
import numpy as np
ext_modules = [
Extension(
'install.lib.utils.cython_bbox',
'seetadet.utils.bbox.cython_bbox',
['cython_bbox.pyx'],
extra_compile_args=['-w'],
include_dirs=[np.get_include()]
include_dirs=[np.get_include()],
),
Extension(
'install.lib.utils.cython_nms',
'seetadet.utils.nms.cython_nms',
['cython_nms.pyx'],
extra_compile_args=['-w'],
include_dirs=[np.get_include()]
),
Extension(
'install.lib.utils.pycocotools._mask',
['maskApi.c', '_mask.pyx'],
include_dirs=[np.get_include(), os.path.dirname(os.path.abspath(__file__))],
extra_compile_args=['-w']
include_dirs=[np.get_include()],
),
]
setup(
name='SeetaDet',
setup(name='seetadet',
ext_modules=ext_modules,
cmdclass={'build_ext': build_ext},
)
cmdclass={'build_ext': build_ext})
# Datasets
## Introduction
This folder is kept for the record and json datasets.
Please prepare the datasets following the [documentation](../../scripts/datasets/README.md).
# Demo Images
## Introduction
This folder is kept for the demo images.
# Pretrained Models
## Introduction
This folder is kept for the pretrained models.
## ImageNet Pretrained Models
### Training settings
- ResNet models trained for 200 epochs follow the procedure in arXiv:1812.01187.
### ResNet
| Model | Lr sched | Acc@1 | Acc@5 | Source |
| :---: | :------: | :---: | :---: | :----: |
| [R-50](https://dragon.seetatech.com/download/seetadet/pretrained/R-50_in1k_cls90e.pkl) | 90e | 76.53 | 93.16 | Ours |
| [R-50](https://dragon.seetatech.com/download/seetadet/pretrained/R-50_in1k_cls200e.pkl) | 200e | 78.64 | 94.30 | Ours |
### MobileNet
| Model | Lr sched | Acc@1 | Acc@5 | Source |
| :---: | :------: | :---: | :---: | :----: |
| [MobileNetV2](https://dragon.seetatech.com/download/seetadet/pretrained/MobileNetV2_in1k_cls300e.pkl) | 300e | 71.88 | 90.29 | TorchVision |
| [MobileNetV3L](https://dragon.seetatech.com/download/seetadet/pretrained/MobileNetV3L_in1k_cls600e.pkl) | 600e | 74.04 | 91.34 | TorchVision |
### VGG
| Model | Lr sched | Acc@1 | Acc@5 | Source |
| :---: | :------: | :---: | :---: | :----: |
| [VGG-16-FCN](https://dragon.seetatech.com/download/seetadet/pretrained/VGG-16-FCN_in1k.pkl) | - | - | - | weiliu89 |
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Make record file for COCO dataset."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import shutil
from maker import make_record
from roidb import make_database
if __name__ == '__main__':
COCO_ROOT = '/data'
# Encode masks to RLE bytes
if not os.path.exists('build'):
os.makedirs('build')
make_database('train', '2017', COCO_ROOT)
make_database('val', '2017', COCO_ROOT)
# coco_2017_train
make_record(
db_file='build/coco_2017_train.db.pkl',
record_file=os.path.join(COCO_ROOT, 'coco_2017_train'),
images_path=[os.path.join(COCO_ROOT, 'images/train2017')],
splits_path=[os.path.join(COCO_ROOT, 'splits')],
splits=['train2017'],
)
# coco_2017_val
make_record(
db_file='build/coco_2017_val.db.pkl',
record_file=os.path.join(COCO_ROOT, 'coco_2017_val'),
images_path=[os.path.join(COCO_ROOT, 'images/val2017')],
splits_path=[os.path.join(COCO_ROOT, 'splits')],
splits=['val2017'],
)
shutil.rmtree('build')
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
import os
import pickle
import time
import cv2
import dragon
import numpy as np
def make_example(image_file, objects, im_scale=None):
filename = os.path.split(image_file)[-1]
example = {'id': filename.split('.')[0], 'object': []}
if im_scale:
img = cv2.imread(image_file)
img = cv2.resize(
img, None,
fx=im_scale, fy=im_scale,
interpolation=cv2.INTER_LINEAR,
)
example['height'], example['width'], example['depth'] = img.shape
_, img = cv2.imencode('.jpg', img, [int(cv2.IMWRITE_JPEG_QUALITY), 95])
example['content'] = img.tobytes()
else:
with open(image_file, 'rb') as f:
img_bytes = bytes(f.read())
img = cv2.imdecode(np.frombuffer(img_bytes, 'uint8'), 3)
example['height'], example['width'], example['depth'] = img.shape
example['content'] = img_bytes
for obj in objects:
x1, y1, x2, y2 = obj['bbox']
example['object'].append({
'name': obj['name'],
'xmin': x1,
'ymin': y1,
'xmax': x2,
'ymax': y2,
'mask': obj['mask'],
'polygons': obj['polygons'],
'difficult': obj.get('crowd', 0),
})
return example
def make_record(
record_file,
images_path,
db_file,
splits_path,
splits,
ext='.jpg',
im_scale=None,
):
if os.path.exists(record_file):
raise ValueError('The record file already exists.')
os.makedirs(record_file)
if not isinstance(images_path, list):
images_path = [images_path]
if not isinstance(splits_path, list):
splits_path = [splits_path]
assert len(splits) == len(splits_path)
assert len(splits) == len(images_path)
if db_file is not None:
with open(db_file, 'rb') as f:
all_entries = pickle.load(f)
else:
all_entries = {}
print('Start Time:', time.strftime("%a, %d %b %Y %H:%M:%S", time.gmtime()))
writer = dragon.io.KPLRecordWriter(
path=record_file,
protocol={
'id': 'string',
'content': 'bytes',
'height': 'int64',
'width': 'int64',
'depth': 'int64',
'object': [{
'name': 'string',
'xmin': 'float64',
'ymin': 'float64',
'xmax': 'float64',
'ymax': 'float64',
'mask': 'bytes',
'polygons': [['float64']],
'difficult': 'int64',
}]
}
)
count, total_line = 0, 0
start_time = time.time()
for db_idx, split in enumerate(splits):
split_file = os.path.join(splits_path[db_idx], split + '.txt')
if not os.path.exists(split_file):
# Fall back to a split file provided in JSON format.
split_file = os.path.join(splits_path[db_idx], split + '.json')
if not os.path.exists(split_file):
raise FileNotFoundError('Unable to find the split: {}'.format(split))
import json
with open(split_file, 'r') as f:
images_info = json.load(f)
total_line += len(images_info['images'])
lines = []
for info in images_info['images']:
lines.append(os.path.splitext(info['file_name'])[0])
else:
with open(split_file, 'r') as f:
lines = f.readlines()
total_line += len(lines)
for line in lines:
count += 1
if count % 2000 == 0:
now_time = time.time()
print('{} / {} in {:.2f} sec'.format(
count, total_line, now_time - start_time))
filename = line.strip()
image_file = os.path.join(images_path[db_idx], filename + ext)
objects = all_entries.get(filename, [])
writer.write(make_example(image_file, objects, im_scale))
now_time = time.time()
print('{} / {} in {:.2f} sec'.format(count, total_line, now_time - start_time))
writer.close()
end_time = time.time()
data_size = os.path.getsize(record_file + '/root.data') * 1e-6
print('{} images take {:.2f} MB in {:.2f} sec.'
.format(total_line, data_size, end_time - start_time))
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import os
import os.path as osp
import pickle
from seetadet.utils.pycocotools import mask_utils
from seetadet.utils.pycocotools.coco import COCO
class COCOWrapper(object):
def __init__(self, image_set, year, data_dir):
self._year = year
self._image_set = image_set
self._data_path = osp.join(data_dir)
self.invalid_cnt = 0
self.ignore_cnt = 0
# Load COCO API, classes, class <-> id mappings
self._COCO = COCO(self._get_ann_file())
cats = self._COCO.loadCats(self._COCO.getCatIds())
self._classes = tuple(['__background__'] + [c['name'] for c in cats])
self._class_to_ind = dict(zip(self._classes, range(self.num_classes)))
self._ind_to_class = dict(zip(range(self.num_classes), self._classes))
self._class_to_cat_id = dict(zip([c['name'] for c in cats], self._COCO.getCatIds()))
self._cat_id_to_class_id = dict([(self._class_to_cat_id[cls], self._class_to_ind[cls])
for cls in self._classes[1:]])
self._data_name = {
# 5k ``val2014`` subset
'minival2014': 'val2014',
# ``val2014`` minus ``minival2014``
'valminusminival2014': 'val2014',
}.get(image_set + year, image_set + year)
self._image_index = self._load_image_set_index()
self._annotations = self._load_annotations()
def _get_ann_file(self):
prefix = 'instances' \
if self._image_set.find('test') == -1 \
else 'image_info'
return osp.join(
self._data_path,
'annotations',
prefix + '_' +
self._image_set +
self._year + '.json'
)
def _load_image_set_index(self):
"""Load image ids."""
image_ids = self._COCO.getImgIds()
return image_ids
def _load_annotations(self):
"""Load annotations."""
annotations = [self._load_coco_annotation(index)
for index in self._image_index]
return annotations
def image_path_from_index(self, index):
"""Construct an image path from the image's "index" identifier."""
# Example image path for index=119993:
# images/train2014/COCO_train2014_000000119993.jpg
# images/train2017/000000119993.jpg
filename = str(index).zfill(12) + '.jpg'
if '2014' in self._data_name:
filename = 'COCO_{}_{}'.format(self._data_name, filename)
image_path = osp.join(self._data_path, 'images',
self._data_name, filename)
assert osp.exists(image_path), \
'Path does not exist: {}'.format(image_path)
return image_path
def image_path_at(self, i):
"""Return the absolute path to image i in the image sequence."""
return self.image_path_from_index(self._image_index[i])
def annotation_at(self, i):
"""Return the absolute path to image i in the image sequence."""
return self._annotations[i]
def _load_coco_annotation(self, index):
"""Loads COCO bounding-box instance annotations."""
im_ann = self._COCO.loadImgs(index)[0]
width, height = im_ann['width'], im_ann['height']
ann_ids = self._COCO.getAnnIds(imgIds=index, iscrowd=None)
objects = self._COCO.loadAnns(ann_ids)
# Sanitize boxes -- some are invalid
valid_objects = []
mask, polygons = b'', []
for obj in objects:
x1 = float(max(0, obj['bbox'][0]))
y1 = float(max(0, obj['bbox'][1]))
x2 = float(min(width - 1, x1 + max(0, obj['bbox'][2] - 1)))
y2 = float(min(height - 1, y1 + max(0, obj['bbox'][3] - 1)))
if isinstance(obj['segmentation'], list):
for p in obj['segmentation']:
if len(p) < 6:
print('Removing invalid segmentation.')
# Valid polygons have >= 3 points, so require >= 6 coordinates
polygons = [p for p in obj['segmentation'] if len(p) >= 6]
else:
# Crowd masks.
# Some are encoded with a height or width that runs
# outside the image bounds; do not use them, or a
# decoding error is inevitable.
mask = mask_utils.poly2bytes(obj['segmentation'], height, width)
if obj['area'] > 0 and x2 > x1 and y2 > y1:
obj['clean_bbox'] = [x1, y1, x2, y2]
valid_objects.append({
'bbox': [x1, y1, x2, y2],
'mask': mask,
'polygons': polygons,
'category_id': obj['category_id'],
'class_id': self._cat_id_to_class_id[obj['category_id']],
'crowd': obj['iscrowd'],
})
valid_objects[-1]['name'] = \
self._ind_to_class[valid_objects[-1]['class_id']]
return height, width, valid_objects
@property
def num_images(self):
return len(self._image_index)
@property
def num_classes(self):
return len(self._classes)
def make_database(split, year, data_dir):
coco = COCOWrapper(split, year, data_dir)
print('Preparing to make split: {}, total {} images'
.format(split, coco.num_images))
if not osp.exists(osp.join(coco._data_path, 'splits')):
os.makedirs(osp.join(coco._data_path, 'splits'))
entries = collections.OrderedDict()
for i in range(coco.num_images):
filename = osp.basename(coco.image_path_at(i)).split('.')[0]
h, w, objects = coco.annotation_at(i)
entries[filename] = objects
with open(osp.join('build',
'coco_' + year + '_' + split +
'.db.pkl'), 'wb') as f:
pickle.dump(entries, f, pickle.HIGHEST_PROTOCOL)
with open(osp.join(coco._data_path, 'splits',
split + year + '.txt'), 'w') as f:
for i in range(coco.num_images):
filename = str(osp.basename(coco.image_path_at(i)).split('.')[0])
if i != coco.num_images - 1:
filename += '\n'
f.write(filename)
def merge_database(split, year, db_files):
entries = collections.OrderedDict()
data_path = os.path.dirname(db_files[0])
for db_file in db_files:
with open(db_file, 'rb') as f:
entries.update(pickle.load(f))
with open(osp.join(data_path,
'coco_' + year + '_' + split +
'.db.pkl'), 'wb') as f:
pickle.dump(entries, f, pickle.HIGHEST_PROTOCOL)
# Prepare Datasets
## Create Datasets for PASCAL VOC
We assume that the raw dataset has the following structure:
```
VOC<year>
|_ JPEGImages
| |_ <im-1-name>.jpg
| |_ ...
| |_ <im-N-name>.jpg
|_ Annotations
| |_ <im-1-name>.xml
| |_ ...
| |_ <im-N-name>.xml
|_ ImageSets
| |_ Main
| | |_ trainval.txt
| | |_ test.txt
| | |_ ...
```
Create record and json dataset by:
```
python pascal_voc.py \
--rec /path/to/datasets/voc_trainval0712 \
--gt /path/to/datasets/voc_trainval0712.json \
--images /path/to/VOC2007/JPEGImages \
/path/to/VOC2012/JPEGImages \
--annotations /path/to/VOC2007/Annotations \
/path/to/VOC2012/Annotations \
--splits /path/to/VOC2007/ImageSets/Main/trainval.txt \
/path/to/VOC2012/ImageSets/Main/trainval.txt
```
## Create Datasets for COCO
We assume that the raw dataset has the following structure:
```
COCO
|_ images
| |_ train2017
| | |_ <im-1-name>.jpg
| | |_ ...
| | |_ <im-N-name>.jpg
|_ annotations
| |_ instances_train2017.json
| |_ ...
```
Create record dataset by:
```
python coco.py \
--rec /path/to/datasets/coco_train2017 \
--images /path/to/COCO/images/train2017 \
--annotations /path/to/COCO/annotations/instances_train2017.json
```
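To sanity-check a generated record, it can be read back with Dragon's `KPLRecordDataset`. Below is a minimal sketch; the dataset path is hypothetical, so substitute the one passed to `--rec` above:

```python
import dragon

# A minimal sketch: read back a few examples from a record dataset.
# The path is an assumption; use the one passed to --rec above.
dataset = dragon.io.KPLRecordDataset('/path/to/datasets/coco_train2017')
print('Total examples:', len(dataset))
for _ in range(min(3, len(dataset))):
    example = dataset.get()
    print(example['id'], example['height'], example['width'],
          'objects:', len(example['object']))
```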
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Prepare MS COCO datasets."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import os
import sys
import time
import dragon
from pycocotools.coco import COCO
from pycocotools.mask import frPyObjects
def parse_args():
"""Parse arguments."""
parser = argparse.ArgumentParser(
description='Prepare MS COCO datasets')
parser.add_argument(
'--rec',
default=None,
help='path to write record dataset')
parser.add_argument(
'--images',
nargs='+',
type=str,
default=None,
help='path of images folder')
parser.add_argument(
'--annotations',
nargs='+',
type=str,
default=None,
help='path of annotations folder')
parser.add_argument(
'--splits',
nargs='+',
type=str,
default=None,
help='path of split file')
if len(sys.argv) == 1:
parser.print_help()
sys.exit(1)
return parser.parse_args()
def make_example(img_id, img_file, cocoGt):
"""Return the record example."""
img_meta = cocoGt.imgs[img_id]
img_anns = cocoGt.loadAnns(cocoGt.getAnnIds(imgIds=[img_id]))
cat_id_to_cat = dict((v['id'], v['name'])
for v in cocoGt.cats.values())
with open(img_file, 'rb') as f:
img_bytes = bytes(f.read())
height, width = img_meta['height'], img_meta['width']
example = {'id': str(img_id), 'height': height, 'width': width,
'depth': 3, 'content': img_bytes, 'object': []}
for ann in img_anns:
x1 = float(max(0, ann['bbox'][0]))
y1 = float(max(0, ann['bbox'][1]))
x2 = float(min(width - 1, x1 + max(0, ann['bbox'][2] - 1)))
y2 = float(min(height - 1, y1 + max(0, ann['bbox'][3] - 1)))
mask, polygons = b'', []
segm = ann.get('segmentation', None)
if segm is not None and isinstance(segm, list):
for p in ann['segmentation']:
if len(p) < 6:
print('Removing invalid segmentation.')
# Valid polygons have >= 3 points, so require >= 6 coordinates
polygons = [p for p in ann['segmentation'] if len(p) >= 6]
elif segm is not None:
# Crowd masks.
# Some are encoded with wrong height or width.
# Do not use them, or a decoding error is inevitable.
rle = frPyObjects(ann['segmentation'], height, width)
assert type(rle) == dict
mask = rle['counts']
example['object'].append({
'name': cat_id_to_cat[ann['category_id']],
'xmin': x1, 'ymin': y1, 'xmax': x2, 'ymax': y2,
'mask': mask, 'polygons': polygons,
'difficult': ann.get('iscrowd', 0)})
return example
def write_dataset(args):
assert len(args.images) == len(args.annotations)
if os.path.exists(args.rec):
raise ValueError('The record path already exists.')
os.makedirs(args.rec)
print('Write record dataset to {}'.format(args.rec))
writer = dragon.io.KPLRecordWriter(
path=args.rec,
protocol={
'id': 'string',
'content': 'bytes',
'height': 'int64',
'width': 'int64',
'depth': 'int64',
'object': [{
'name': 'string',
'xmin': 'float64',
'ymin': 'float64',
'xmax': 'float64',
'ymax': 'float64',
'mask': 'bytes',
'polygons': [['float64']],
'difficult': 'int64',
}]
}
)
# Scan all available entries.
print('Scan entries...')
entries, cocoGts = [], []
for ann_file in args.annotations:
cocoGts.append(COCO(ann_file))
if args.splits is not None:
assert len(args.splits) == len(args.images)
for i, split in enumerate(args.splits):
f = open(split, 'r')
for line in f.readlines():
filename = line.strip()
img_id = int(filename)
img_file = os.path.join(args.images[i], filename + '.jpg')
entries.append((img_id, img_file, cocoGts[i]))
f.close()
else:
for i, cocoGt in enumerate(cocoGts):
for info in cocoGt.imgs.values():
img_id = info['id']
img_file = os.path.join(args.images[i], info['file_name'])
entries.append((img_id, img_file, cocoGts[i]))
print('Start Time:', time.strftime("%a, %d %b %Y %H:%M:%S", time.gmtime()))
start_time = time.time()
for i, entry in enumerate(entries):
if i > 0 and i % 2000 == 0:
now_time = time.time()
print('{} / {} in {:.2f} sec'.format(
i, len(entries), now_time - start_time))
writer.write(make_example(*entry))
now_time = time.time()
print('{} / {} in {:.2f} sec'.format(
len(entries), len(entries), now_time - start_time))
writer.close()
end_time = time.time()
data_size = os.path.getsize(args.rec + '/root.data') * 1e-6
print('{} images take {:.2f} MB in {:.2f} sec.'
.format(len(entries), data_size, end_time - start_time))
if __name__ == '__main__':
args = parse_args()
if args.rec is not None:
write_dataset(args)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Prepare JSON datasets."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import json
import os
import sys
import dragon
def parse_args():
"""Parse arguments."""
parser = argparse.ArgumentParser(
description='Prepare JSON datasets')
parser.add_argument(
'--rec',
default=None,
help='path to read record')
parser.add_argument(
'--gt',
default=None,
help='path to write json ground-truth')
parser.add_argument(
'--categories',
nargs='+',
type=str,
default=None,
help='dataset object categories')
if len(sys.argv) == 1:
parser.print_help()
sys.exit(1)
return parser.parse_args()
def get_image_id(image_name):
image_id = image_name.split('_')[-1].split('.')[0]
try:
return int(image_id)
except ValueError:
return image_name
def write_dataset(args):
dataset = {'images': [], 'categories': [], 'annotations': []}
kpl_dataset = dragon.io.KPLRecordDataset(args.rec)
cat_to_cat_id = dict(zip(args.categories,
range(1, len(args.categories) + 1)))
print('Writing json dataset to {}'.format(args.gt))
for cat in args.categories:
dataset['categories'].append({
'name': cat, 'id': cat_to_cat_id[cat]})
for _ in range(len(kpl_dataset)):
example = kpl_dataset.get()
image_id = get_image_id(example['id'])
dataset['images'].append({
'id': image_id, 'height': example['height'],
'width': example['width']})
for obj in example['object']:
if 'x2' in obj:
x1, y1, x2, y2 = obj['x1'], obj['y1'], obj['x2'], obj['y2']
elif 'xmin' in obj:
x1, y1, x2, y2 = obj['xmin'], obj['ymin'], obj['xmax'], obj['ymax']
else:
x1, y1, x2, y2 = obj['bbox']
w, h = x2 - x1 + 1, y2 - y1 + 1
dataset['annotations'].append({
'id': str(len(dataset['annotations'])),
'bbox': [x1, y1, w, h],
'area': w * h,
'iscrowd': obj.get('difficult', 0),
'image_id': image_id,
'category_id': cat_to_cat_id[obj['name']]})
with open(args.gt, 'w') as f:
json.dump(dataset, f)
if __name__ == '__main__':
args = parse_args()
if args.rec is None or not os.path.exists(args.rec):
raise ValueError('Specify the prepared record dataset.')
if args.gt is None:
raise ValueError('Specify the path to write json dataset.')
write_dataset(args)
......@@ -8,27 +8,67 @@
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Prepare PASCAL VOC datasets."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import os
import sys
import time
import cv2
import dragon
import numpy as np
import xml.etree.ElementTree as ET
import xml.etree.ElementTree
def make_example(image_file, xml_file):
tree = ET.parse(xml_file)
def parse_args():
"""Parse arguments."""
parser = argparse.ArgumentParser(
description='Prepare PASCAL VOC datasets')
parser.add_argument(
'--rec',
default=None,
help='path to write record dataset')
parser.add_argument(
'--gt',
default=None,
help='path to write json dataset')
parser.add_argument(
'--images',
nargs='+',
type=str,
default=None,
help='path of images folder')
parser.add_argument(
'--annotations',
nargs='+',
type=str,
default=None,
help='path of annotations folder')
parser.add_argument(
'--splits',
nargs='+',
type=str,
default=None,
help='path of split file')
if len(sys.argv) == 1:
parser.print_help()
sys.exit(1)
return parser.parse_args()
def make_example(img_file, xml_file):
"""Return the record example."""
tree = xml.etree.ElementTree.parse(xml_file)
filename = os.path.split(xml_file)[-1]
objs = tree.findall('object')
objects = tree.findall('object')
size = tree.find('size')
example = {'id': filename.split('.')[0], 'object': []}
with open(image_file, 'rb') as f:
with open(img_file, 'rb') as f:
img_bytes = bytes(f.read())
if size is not None:
example['height'] = int(size.find('height').text)
......@@ -38,7 +78,7 @@ def make_example(image_file, xml_file):
img = cv2.imdecode(np.frombuffer(img_bytes, 'uint8'), 3)
example['height'], example['width'], example['depth'] = img.shape
example['content'] = img_bytes
for ix, obj in enumerate(objs):
for obj in objects:
bbox = obj.find('bndbox')
is_diff = 0
if obj.find('difficult') is not None:
......@@ -49,35 +89,21 @@ def make_example(image_file, xml_file):
'ymin': float(bbox.find('ymin').text),
'xmax': float(bbox.find('xmax').text),
'ymax': float(bbox.find('ymax').text),
'difficult': is_diff,
})
'difficult': is_diff})
return example
def make_record(
record_file,
images_path,
annotations_path,
splits_path,
splits
):
if os.path.exists(record_file):
raise ValueError('The record file already exists.')
os.makedirs(record_file)
if not isinstance(images_path, list):
images_path = [images_path]
if not isinstance(annotations_path, list):
annotations_path = [annotations_path]
if not isinstance(splits_path, list):
splits_path = [splits_path]
assert len(splits) == len(splits_path)
assert len(splits) == len(images_path)
assert len(splits) == len(annotations_path)
def write_dataset(args):
"""Write the record dataset."""
assert len(args.splits) == len(args.images)
assert len(args.splits) == len(args.annotations)
if os.path.exists(args.rec):
raise ValueError('The record path already exists.')
os.makedirs(args.rec)
print('Write record dataset to {}'.format(args.rec))
writer = dragon.io.KPLRecordWriter(
path=record_file,
path=args.rec,
protocol={
'id': 'string',
'content': 'bytes',
......@@ -95,36 +121,56 @@ def make_record(
}
)
# Scan all available entries
# Scan all available entries.
print('Scan entries...')
entries = []
for i, split in enumerate(splits):
split_file = os.path.join(splits_path[i], split + '.txt')
with open(split_file, 'r') as f:
for i, split in enumerate(args.splits):
with open(split, 'r') as f:
lines = f.readlines()
for line in lines:
filename = line.strip()
img_file = os.path.join(images_path[i], filename + '.jpg')
ann_file = os.path.join(annotations_path[i], filename + '.xml')
img_file = os.path.join(args.images[i], filename + '.jpg')
ann_file = os.path.join(args.annotations[i], filename + '.xml')
entries.append((img_file, ann_file))
# Parse and write into record file
# Parse and write into record file.
print('Start Time:', time.strftime("%a, %d %b %Y %H:%M:%S", time.gmtime()))
start_time = time.time()
for i, (img_file, ann_file) in enumerate(entries):
for i, (img_file, xml_file) in enumerate(entries):
if i > 0 and i % 2000 == 0:
now_time = time.time()
print('{} / {} in {:.2f} sec'.format(
i, len(entries), now_time - start_time))
writer.write(make_example(img_file, ann_file))
writer.write(make_example(img_file, xml_file))
now_time = time.time()
print('{} / {} in {:.2f} sec'.format(
len(entries), len(entries), now_time - start_time))
writer.close()
end_time = time.time()
data_size = os.path.getsize(record_file + '/root.data') * 1e-6
data_size = os.path.getsize(args.rec + '/root.data') * 1e-6
print('{} images take {:.2f} MB in {:.2f} sec.'
.format(len(entries), data_size, end_time - start_time))
def write_json_dataset(args):
"""Write the json dataset."""
categories = ['aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
import subprocess
script = os.path.dirname(os.path.abspath(__file__)) + '/json_dataset.py'
cmd = '{} {} '.format(sys.executable, script)
cmd += '--rec {} --gt {} '.format(args.rec, args.gt)
cmd += '--categories {} '.format(' '.join(categories))
return subprocess.call(cmd, shell=True)
if __name__ == '__main__':
args = parse_args()
if args.rec is not None:
write_dataset(args)
if args.gt is not None:
write_json_dataset(args)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import time
import cv2
import dragon
import numpy as np
import xml.etree.ElementTree as ET
def make_example(image_file, xml_file):
tree = ET.parse(xml_file)
filename = os.path.split(xml_file)[-1]
objs = tree.findall('object')
example = {'id': filename.split('.')[0], 'object': []}
with open(image_file, 'rb') as f:
img_bytes = bytes(f.read())
img = cv2.imdecode(np.frombuffer(img_bytes, 'uint8'), 1)
example['height'], example['width'], example['depth'] = img.shape
example['content'] = img_bytes
    for obj in objs:
bbox = obj.find('bndbox')
is_diff = 0
if obj.find('difficult') is not None:
is_diff = int(obj.find('difficult').text) == 1
example['object'].append({
'name': obj.find('name').text.strip(),
'x1': float(bbox.find('x1').text),
'y1': float(bbox.find('y1').text),
'x2': float(bbox.find('x2').text),
'y2': float(bbox.find('y2').text),
'x3': float(bbox.find('x3').text),
'y3': float(bbox.find('y3').text),
'x4': float(bbox.find('x4').text),
'y4': float(bbox.find('y4').text),
'difficult': is_diff,
})
return example
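To make the expected annotation format explicit: `make_example` assumes a VOC-style XML whose `bndbox` carries four corner points `x1..y4` (a quadrilateral layout rather than the usual `xmin/ymin/xmax/ymax`) plus an optional `difficult` flag. A hedged sketch with hypothetical filenames:

```python
# Hypothetical files; the XML must provide x1..y4 under <bndbox>.
example = make_example('JPEGImages/000001.jpg', 'Annotations/000001.xml')
assert set(example) >= {'id', 'content', 'height', 'width', 'depth', 'object'}
# Each entry of example['object'] looks like:
#   {'name': 'dog', 'x1': 48.0, 'y1': 240.0, 'x2': 195.0, 'y2': 240.0,
#    'x3': 195.0, 'y3': 371.0, 'x4': 48.0, 'y4': 371.0, 'difficult': 0}
```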
def make_record(
record_file,
images_path,
annotations_path,
splits_path,
splits
):
if os.path.exists(record_file):
        raise ValueError('The record file already exists.')
os.makedirs(record_file)
if not isinstance(images_path, list):
images_path = [images_path]
if not isinstance(annotations_path, list):
annotations_path = [annotations_path]
if not isinstance(splits_path, list):
splits_path = [splits_path]
assert len(splits) == len(splits_path)
assert len(splits) == len(images_path)
assert len(splits) == len(annotations_path)
print('Start Time:', time.strftime("%a, %d %b %Y %H:%M:%S", time.gmtime()))
writer = dragon.io.KPLRecordWriter(
path=record_file,
protocol={
'id': 'string',
'content': 'bytes',
'height': 'int64',
'width': 'int64',
'depth': 'int64',
'object': [{
'name': 'string',
'x1': 'float64',
'y1': 'float64',
'x2': 'float64',
'y2': 'float64',
'x3': 'float64',
'y3': 'float64',
'x4': 'float64',
'y4': 'float64',
'difficult': 'int64',
}]
}
)
# Scan all available entries
print('Scan entries...')
entries = []
for i, split in enumerate(splits):
split_file = os.path.join(splits_path[i], split + '.txt')
with open(split_file, 'r') as f:
lines = f.readlines()
for line in lines:
filename = line.strip()
img_file = os.path.join(images_path[i], filename + '.jpg')
ann_file = os.path.join(annotations_path[i], filename + '.xml')
entries.append((img_file, ann_file))
# Parse and write into record file
print('Start Time:', time.strftime("%a, %d %b %Y %H:%M:%S", time.gmtime()))
start_time = time.time()
for i, (img_file, ann_file) in enumerate(entries):
if i > 0 and i % 2000 == 0:
now_time = time.time()
print('{} / {} in {:.2f} sec'.format(
i, len(entries), now_time - start_time))
writer.write(make_example(img_file, ann_file))
now_time = time.time()
print('{} / {} in {:.2f} sec'.format(
len(entries), len(entries), now_time - start_time))
writer.close()
end_time = time.time()
data_size = os.path.getsize(record_file + '/root.data') * 1e-6
print('{} images take {:.2f} MB in {:.2f} sec.'
.format(len(entries), data_size, end_time - start_time))
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Make record file for VOC dataset."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from os import path as osp
from maker import make_record
if __name__ == '__main__':
voc_root = '/data'
make_record(
record_file=osp.join(voc_root, 'voc_0712_trainval'),
images_path=[osp.join(voc_root, 'VOCdevkit2007/VOC2007/JPEGImages'),
osp.join(voc_root, 'VOCdevkit2012/VOC2012/JPEGImages')],
annotations_path=[osp.join(voc_root, 'VOCdevkit2007/VOC2007/Annotations'),
osp.join(voc_root, 'VOCdevkit2012/VOC2012/Annotations')],
splits_path=[osp.join(voc_root, 'VOCdevkit2007/VOC2007/ImageSets/Main'),
osp.join(voc_root, 'VOCdevkit2012/VOC2012/ImageSets/Main')],
splits=['trainval', 'trainval']
)
make_record(
record_file=osp.join(voc_root, 'voc_2007_test'),
images_path=osp.join(voc_root, 'VOCdevkit2007/VOC2007/JPEGImages'),
annotations_path=osp.join(voc_root, 'VOCdevkit2007/VOC2007/Annotations'),
splits_path=osp.join(voc_root, 'VOCdevkit2007/VOC2007/ImageSets/Main'),
splits=['test']
)
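Assuming the default `voc_root` above, each `make_record` call materializes one record directory whose payload lives in `root.data` (the file sized at the end of `make_record`). A hedged sanity check one could run afterwards:

```python
import os.path as osp

# Hypothetical check after running the script above.
for name in ('voc_0712_trainval', 'voc_2007_test'):
    data_file = osp.join('/data', name, 'root.data')
    print(name, 'written:', osp.exists(data_file))
```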
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.core.config import cfg
class AnchorSampler(object):
"""Sample precomputed anchors asynchronously."""
def __init__(self):
self._rpn_target = None
self._retinanet_target = None
self._ssd_target = None
if 'rcnn' in cfg.MODEL.TYPE:
from seetadet.algo.faster_rcnn import anchor_target
self._rpn_target = anchor_target.AnchorTarget()
elif cfg.MODEL.TYPE == 'retinanet':
from seetadet.algo.retinanet import anchor_target
self._retinanet_target = anchor_target.AnchorTarget()
elif cfg.MODEL.TYPE == 'ssd':
from seetadet.algo.ssd import anchor_target
self._ssd_target = anchor_target.AnchorTarget()
def __call__(self, **inputs):
"""Return the sample anchors."""
if self._rpn_target:
fg_inds, bg_inds = \
self._rpn_target.sample_anchors(
gt_boxes=inputs['gt_boxes'],
im_info=inputs['im_info'],
)
return {'fg_inds': fg_inds, 'bg_inds': bg_inds}
if self._retinanet_target:
fg_inds, ignore_inds = \
self._retinanet_target.sample_anchors(
gt_boxes=inputs['gt_boxes'],
im_info=inputs['im_info'],
)
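            # The RetinaNet ignore indices reuse the 'bg_inds' key for a uniform interface.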
return {'fg_inds': fg_inds, 'bg_inds': ignore_inds}
if self._ssd_target:
fg_inds, neg_inds = \
self._ssd_target.sample_anchors(
gt_boxes=inputs['gt_boxes'],
)
return {'fg_inds': fg_inds, 'bg_inds': neg_inds}
return {}
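A hedged usage sketch: whichever algorithm the config selects, the sampler returns per-image anchor indices keyed uniformly as `fg_inds`/`bg_inds`. The inputs below are placeholders, not values from this commit:

```python
import numpy as np

# A sketch under a Faster R-CNN config ('rcnn' in cfg.MODEL.TYPE).
sampler = AnchorSampler()
gt_boxes = np.array([[10., 10., 120., 200., 1.]], 'float32')   # (x1, y1, x2, y2, class)
outputs = sampler(gt_boxes=gt_boxes, im_info=(600, 800, 1.0))  # (height, width, scale)
print(sorted(outputs))  # ['bg_inds', 'fg_inds'] for any configured algorithm
```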
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import math
import numpy as np
import numpy.random as npr
from seetadet.algo.faster_rcnn import generate_anchors as anchor_util
from seetadet.algo.faster_rcnn import utils as rcnn_util
from seetadet.core.config import cfg
from seetadet.utils import boxes as box_util
from seetadet.utils.env import new_tensor
class AnchorTarget(object):
"""Assign ground-truth targets to anchors."""
def __init__(self):
super(AnchorTarget, self).__init__()
# Load the basic configs
self.scales = cfg.RPN.SCALES
self.strides = cfg.RPN.STRIDES
self.ratios = cfg.RPN.ASPECT_RATIOS
self.num_strides = len(self.strides)
# Generate base anchors
self.base_anchors = []
for i in range(self.num_strides):
self.base_anchors.append(
anchor_util.generate_anchors(
self.strides[i],
self.ratios,
np.array([self.scales[i]])
if self.num_strides > 1
else np.array(self.scales)))
# Plan the maximum shifted anchor layout
max_size = max(cfg.TRAIN.MAX_SIZE, max(cfg.TRAIN.SCALES))
if cfg.MODEL.COARSEST_STRIDE > 0:
stride = float(cfg.MODEL.COARSEST_STRIDE)
max_size = int(math.ceil(max_size / stride) * stride)
self.max_shapes = [[math.ceil(max_size / stride)] * 2
for stride in self.strides]
self.all_coords = rcnn_util.get_shifted_coords(
self.max_shapes, self.base_anchors)
self.all_anchors = rcnn_util.get_shifted_anchors(
self.max_shapes, self.base_anchors, self.strides)
def sample_anchors(self, gt_boxes, im_info, all_anchors=None):
if all_anchors is None:
all_anchors = self.all_anchors
# Only keep anchors inside the image
# to get higher quality proposals.
inds_inside = np.where(
(all_anchors[:, 0] >= 0) &
(all_anchors[:, 1] >= 0) &
(all_anchors[:, 2] < im_info[1]) &
(all_anchors[:, 3] < im_info[0]))[0]
anchors = all_anchors[inds_inside, :]
num_inside = len(inds_inside)
labels = np.empty((num_inside,), 'int32')
labels.fill(-1)
# Overlaps between the anchors and the gt boxes.
overlaps = box_util.bbox_overlaps(anchors, gt_boxes)
argmax_overlaps = overlaps.argmax(axis=1)
max_overlaps = overlaps[np.arange(num_inside), argmax_overlaps]
# Overlaps between the gt boxes and anchors with highest IoU.
gt_argmax_overlaps = overlaps.argmax(axis=0)
gt_max_overlaps = overlaps[gt_argmax_overlaps, np.arange(overlaps.shape[1])]
gt_argmax_overlaps = np.where(overlaps == gt_max_overlaps)[0]
# Foreground: for each gt, anchor with highest overlap.
labels[gt_argmax_overlaps] = 1
# Foreground: above threshold IoU.
labels[max_overlaps >= cfg.RPN.POSITIVE_OVERLAP] = 1
# Background: below threshold IoU.
labels[max_overlaps < cfg.RPN.NEGATIVE_OVERLAP] = 0
        # Fall back to the per-gt argmax anchors if thresholding assigned no foregrounds.
fg_inds = np.where(labels == 1)[0]
if len(fg_inds) == 0:
labels[gt_argmax_overlaps] = 1
fg_inds = np.where(labels == 1)[0]
# Subsample positive labels if we have too many.
num_fg = int(cfg.RPN.FG_FRACTION * cfg.RPN.BATCH_SIZE)
if len(fg_inds) > num_fg:
fg_inds = npr.choice(fg_inds, num_fg, False)
# Subsample negative labels if we have too many.
num_bg = cfg.RPN.BATCH_SIZE - len(fg_inds)
bg_inds = np.where(labels == 0)[0]
if len(bg_inds) > num_bg:
bg_inds = npr.choice(bg_inds, num_bg, False)
return inds_inside[fg_inds], inds_inside[bg_inds]
def __call__(self, **inputs):
num_images = cfg.TRAIN.IMS_PER_BATCH
shapes = [f.shape[-2:] for f in inputs['features']]
image_stride = sum(self.base_anchors[i].shape[0] * np.prod(shapes[i])
for i in range(len(inputs['features'])))
narrow_args = [self.all_coords, self.base_anchors, self.max_shapes, shapes]
outputs = collections.defaultdict(list)
for ix in range(num_images):
fg_inds = inputs['fg_inds'][ix]
bg_inds = inputs['bg_inds'][ix]
gt_boxes = inputs['gt_boxes'][ix]
# Narrow anchors to match the feature layout
anchors = self.all_anchors[fg_inds]
bg_inds = rcnn_util.narrow_anchors(*(narrow_args + [bg_inds]))
_, anchors = rcnn_util.narrow_anchors(*(narrow_args + [fg_inds, anchors]))
fg_inds = rcnn_util.narrow_anchors(*(narrow_args + [fg_inds]))
# Compute bbox targets
gt_assignment = box_util.bbox_overlaps(anchors, gt_boxes).argmax(axis=1)
bbox_targets = box_util.bbox_transform(anchors, gt_boxes[gt_assignment, :4])
outputs['bbox_anchors'].append(anchors)
outputs['bbox_targets'].append(bbox_targets)
# Compute sparse indices
fg_inds += ix * image_stride
bg_inds += ix * image_stride
outputs['cls_inds'].extend([fg_inds, bg_inds])
outputs['bbox_inds'].extend([fg_inds])
outputs['labels'].extend([np.ones_like(fg_inds, 'float32'),
np.zeros_like(bg_inds, 'float32')])
return {
'labels': new_tensor(
np.concatenate(outputs['labels'])),
'cls_inds': new_tensor(
np.concatenate(outputs['cls_inds'])),
'bbox_inds': new_tensor(
np.concatenate(outputs['bbox_inds'])),
'bbox_targets': new_tensor(
np.concatenate(outputs['bbox_targets']).astype('float32')),
'bbox_anchors': new_tensor(
np.concatenate(outputs['bbox_anchors']).astype('float32')),
}
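The sampling budget in `sample_anchors` follows the standard RPN recipe: cap the foregrounds at a fraction of the batch, then fill the remainder with backgrounds. A worked sketch with commonly used values (the real numbers live in `cfg.RPN` and are assumptions here):

```python
# Hypothetical config values; the real ones come from cfg.RPN.
BATCH_SIZE, FG_FRACTION = 256, 0.5
num_fg_budget = int(FG_FRACTION * BATCH_SIZE)  # cap on foreground anchors: 128
len_fg = 40                                    # suppose only 40 anchors pass the IoU tests
num_bg = BATCH_SIZE - len_fg                   # background fills the remainder: 216
print(num_fg_budget, num_bg)                   # 128 216
```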
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import multiprocessing as mp
import time
import threading
import queue
import dragon
import dragon.vm.torch as torch
import numpy as np
from seetadet.algo.faster_rcnn import data_transformer
from seetadet.core.config import cfg
from seetadet.datasets.factory import get_dataset
from seetadet.utils import blob as blob_util
from seetadet.utils import logger
class DataLoader(object):
"""Load mini-batches of data."""
def __init__(self):
super(DataLoader, self).__init__()
dataset = get_dataset(cfg.TRAIN.DATASET)
self.iterator = Iterator(**{
'dataset': dataset.cls,
'source': dataset.source,
'classes': dataset.classes,
'shuffle': cfg.TRAIN.USE_SHUFFLE,
'batch_size': cfg.TRAIN.IMS_PER_BATCH * 2,
'num_transformers': cfg.TRAIN.NUM_THREADS - 1,
})
self.iterator.start()
def __call__(self):
outputs = self.iterator.next()
if isinstance(outputs['image'], np.ndarray):
outputs['image'] = torch.from_numpy(outputs['image'])
return outputs
class Iterator(threading.Thread):
"""Iterator to return the batch of data."""
def __init__(self, **kwargs):
super(Iterator, self).__init__()
# Distributed settings
rank, group_size = 0, 1
process_group = dragon.distributed.get_group()
if process_group is not None and \
kwargs.get('phase', 'TRAIN') == 'TRAIN':
group_size = process_group.size
rank = dragon.distributed.get_rank(process_group)
# Configuration
self._batch_size = kwargs.get('batch_size', 2)
self._num_readers = kwargs.get('num_readers', 1)
self._num_transformers = kwargs.get('num_transformers', 3)
self.daemon = True
# Initialize queues
num_batches = self._num_readers
self._queue1 = mp.Queue(num_batches * self._batch_size)
self._queue2 = mp.Queue(num_batches * self._batch_size)
self._queue3 = queue.Queue(num_batches)
# Initialize readers
self._readers = []
for i in range(self._num_readers):
part_idx, num_parts = i, self._num_readers
num_parts *= group_size
part_idx += rank * self._num_readers
self._readers.append(dragon.io.DataReader(
part_idx=part_idx, num_parts=num_parts, **kwargs))
self._readers[i]._seed += part_idx
self._readers[i].q_out = self._queue1
self._readers[i].start()
time.sleep(0.1)
# Initialize transformers
self._transformers = []
for i in range(self._num_transformers):
p = data_transformer.DataTransformer(**kwargs)
p._seed += (i + rank * self._num_transformers)
p.q_in, p.q_out = self._queue1, self._queue2
p.start()
self._transformers.append(p)
time.sleep(0.1)
# Register cleanup callbacks
def cleanup():
def terminate(processes):
for p in processes:
p.terminate()
p.join()
terminate(self._transformers)
logger.info('Terminate DataTransformer.')
terminate(self._readers)
logger.info('Terminate DataReader.')
import atexit
atexit.register(cleanup)
def next(self):
"""Return the next batch of data."""
return self.__next__()
def run(self):
"""Main loop."""
num_images = cfg.TRAIN.IMS_PER_BATCH
num_batches = cfg.TRAIN.ASPECT_GROUPING
logger.info('Initialize prefetching batches...')
example_buffer = [self._queue2.get()
for _ in range(num_images * num_batches)]
next_examples = []
while True:
# Use cached buffer for next N examples
# Examples are sorted to simulate aspect grouping
if len(next_examples) == 0:
next_examples = example_buffer
next_examples.sort(key=lambda d: d['aspect_ratio'])
example_buffer = []
# Prepare the next batch
outputs = collections.defaultdict(list)
for i in range(num_images):
example = next_examples.pop(0)
outputs['image'].append(example['image'])
outputs['gt_boxes'].append(example['boxes'])
outputs['im_info'].append(example['im_info'])
outputs['fg_inds'].append(example.get('fg_inds', None))
outputs['bg_inds'].append(example.get('bg_inds', None))
example_buffer.append(self._queue2.get())
outputs['image'] = blob_util.im_list_to_blob(
outputs['image'], coarsest_stride=cfg.MODEL.COARSEST_STRIDE)
# Send batch data to consumer
self._queue3.put(outputs)
def __iter__(self):
"""Return the iterator self."""
return self
def __next__(self):
"""Return the next batch of data."""
return self._queue3.get()
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import multiprocessing
import cv2
import numpy as np
import numpy.random as npr
from seetadet.algo import common as algo_common
from seetadet.core.config import cfg
from seetadet.datasets.example import Example
from seetadet.utils import boxes as box_util
from seetadet.utils import image as image_util
class DataTransformer(multiprocessing.Process):
"""DataTransformer."""
def __init__(self, **kwargs):
super(DataTransformer, self).__init__()
self._scales = cfg.TRAIN.SCALES
self._random_scales = cfg.TRAIN.RANDOM_SCALES
self._max_size = cfg.TRAIN.MAX_SIZE
self._seed = cfg.RNG_SEED
self._use_diff = cfg.TRAIN.USE_DIFF
self._use_flipped = cfg.TRAIN.USE_FLIPPED
self._use_distort = cfg.TRAIN.USE_COLOR_JITTER
self._classes = kwargs.get('classes', ('__background__',))
self._num_classes = len(self._classes)
self._class_to_ind = dict(zip(self._classes, range(self._num_classes)))
self._anchor_sampler = algo_common.AnchorSampler()
self.q_in = self.q_out = None
self.daemon = True
def get_boxes(self, example, im_scale, im_offset, flipped):
objects, num_objects = example.objects, 0
height, width = example.height, example.width
if not self._use_diff:
for obj in objects:
if obj.get('difficult', 0) == 0:
num_objects += 1
else:
num_objects = len(objects)
boxes = np.zeros((num_objects, 4), 'float32')
gt_classes = np.zeros((num_objects,), 'float32')
# Filter the difficult instances.
object_idx = 0
for obj in objects:
if not self._use_diff and obj.get('difficult', 0) > 0:
continue
bbox = obj['bbox']
boxes[object_idx, :] = [max(0, bbox[0]),
max(0, bbox[1]),
min(bbox[2], width - 1),
min(bbox[3], height - 1)]
gt_classes[object_idx] = self._class_to_ind[obj['name']]
object_idx += 1
# Flip the boxes if necessary.
if flipped:
boxes = box_util.flip_boxes(boxes, width)
# Scale the boxes to the detecting scale.
boxes *= im_scale
# Offset the boxes to align the cropping.
if im_offset is not None:
boxes[:, 0::2] += im_offset[1]
boxes[:, 1::2] += im_offset[0]
boxes[:, :] = np.minimum(
np.maximum(boxes[:, :], 0),
[im_offset[2][1] - 1, im_offset[2][0] - 1] * 2)
# Attach the classes.
gt_boxes = np.empty((num_objects, 5), dtype=np.float32)
gt_boxes[:, :4], gt_boxes[:, 4] = boxes, gt_classes
return gt_boxes
def get(self, example):
example = Example(example)
# Resize.
target_size = npr.choice(self._scales)
img, im_scale = image_util.resize_image_with_target_size(
example.image,
target_size=target_size,
max_size=self._max_size,
random_scales=self._random_scales,
)
# Flip.
flipped = False
if self._use_flipped and npr.randint(2) > 0:
img = img[:, ::-1]
flipped = True
# Crop or Pad.
im_offset = None
if self._max_size == 0:
img, im_offset = image_util.get_image_with_target_size(
img, target_size)
# Distort.
if self._use_distort:
img = image_util.distort_image(img)
# Boxes.
boxes = self.get_boxes(example, im_scale, im_offset, flipped)
# Standard outputs.
outputs = {'image': img,
'boxes': boxes,
'im_info': img.shape[:2] + (im_scale,)}
# Attach precomputed targets.
if len(boxes) > 0:
outputs.update(
self._anchor_sampler(
gt_boxes=boxes,
im_info=outputs['im_info']))
return outputs
def run(self):
# Disable the opencv threading.
cv2.setNumThreads(1)
# Fix the process-local random seed.
np.random.seed(self._seed)
# Main prefetch loop
while True:
outputs = self.get(self.q_in.get())
if len(outputs['boxes']) < 1:
continue # Ignore non-object image.
height, width = outputs['image'].shape[:2]
outputs['aspect_ratio'] = float(height) / float(width)
self.q_out.put(outputs)
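A hedged sketch of what one transformed example carries downstream (keys as produced by `get()` above; shapes are illustrative, not measured):

```python
# outputs = transformer.get(raw_example)  # raw_example comes off the reader queue
# outputs['image']   : HxWx3 uint8 array, resized / flipped / cropped / distorted
# outputs['boxes']   : (N, 5) float32 rows of [x1, y1, x2, y2, class]
# outputs['im_info'] : (height, width, im_scale)
# plus 'fg_inds'/'bg_inds' when the AnchorSampler precomputed targets,
# and 'aspect_ratio' attached in run() for aspect grouping.
```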
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# Codes are based on:
#
# <https://github.com/rbgirshick/py-faster-rcnn/blob/master/lib/rpn/generate_anchors.py>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
# Verify that we compute the same anchors as Shaoqing's matlab implementation:
#
# >> load output/rpn_cachedir/faster_rcnn_VOC2007_ZF_stage1_rpn/anchors.mat
# >> anchors
#
# anchors =
#
# -83 -39 100 56
# -175 -87 192 104
# -359 -183 376 200
# -55 -55 72 72
# -119 -119 136 136
# -247 -247 264 264
# -35 -79 52 96
# -79 -167 96 184
# -167 -343 184 360
# array([[ -83., -39., 100., 56.],
# [-175., -87., 192., 104.],
# [-359., -183., 376., 200.],
# [ -55., -55., 72., 72.],
# [-119., -119., 136., 136.],
# [-247., -247., 264., 264.],
# [ -35., -79., 52., 96.],
# [ -79., -167., 96., 184.],
# [-167., -343., 184., 360.]])
def generate_anchors(
base_size=16,
ratios=(0.5, 1, 2),
scales=2**np.arange(3, 6),
):
"""
Generate anchor (reference) windows by enumerating aspect ratios X
scales wrt a reference (0, 0, 15, 15) window.
"""
base_anchor = np.array([1, 1, base_size, base_size]) - 1
ratio_anchors = _ratio_enum(base_anchor, ratios)
anchors = np.vstack([_scale_enum(ratio_anchors[i, :], scales)
for i in range(ratio_anchors.shape[0])])
return anchors
def generate_anchors_v2(
stride=16,
ratios=(0.5, 1, 2),
sizes=(32, 64, 128, 256, 512),
):
"""
Generates a matrix of anchor boxes in (x1, y1, x2, y2) format. Anchors
are centered on stride / 2, have (approximate) sqrt areas of the specified
sizes, and aspect ratios as given.
"""
return generate_anchors(
base_size=stride,
ratios=ratios,
        scales=np.array(sizes, dtype=np.float64) / stride,
)
def _whctrs(anchor):
"""Return width, height, x center, and y center for an anchor (window)."""
w = anchor[2] - anchor[0] + 1
h = anchor[3] - anchor[1] + 1
x_ctr = anchor[0] + 0.5 * (w - 1)
y_ctr = anchor[1] + 0.5 * (h - 1)
return w, h, x_ctr, y_ctr
def _mkanchors(ws, hs, x_ctr, y_ctr):
"""
Given a vector of widths (ws) and heights (hs) around a center
(x_ctr, y_ctr), output a set of anchors (windows).
"""
ws = ws[:, np.newaxis]
hs = hs[:, np.newaxis]
anchors = np.hstack((x_ctr - 0.5 * (ws - 1),
y_ctr - 0.5 * (hs - 1),
x_ctr + 0.5 * (ws - 1),
y_ctr + 0.5 * (hs - 1)))
return anchors
def _ratio_enum(anchor, ratios):
"""Enumerate a set of anchors for each aspect ratio wrt an anchor."""
w, h, x_ctr, y_ctr = _whctrs(anchor)
size = w * h
size_ratios = size / ratios
ws = np.round(np.sqrt(size_ratios))
hs = np.round(ws * ratios)
anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
return anchors
def _scale_enum(anchor, scales):
"""Enumerate a set of anchors for each scale wrt an anchor."""
w, h, x_ctr, y_ctr = _whctrs(anchor)
ws = w * scales
hs = h * scales
anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
return anchors
if __name__ == '__main__':
print(generate_anchors())
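As a quick cross-check (a sketch, not part of the file): `generate_anchors_v2` is just `generate_anchors` with scales derived from absolute sizes, so the two calls below should agree with the matrix in the comment above.

```python
import numpy as np

a1 = generate_anchors(base_size=16, ratios=(0.5, 1, 2), scales=2**np.arange(3, 6))
a2 = generate_anchors_v2(stride=16, ratios=(0.5, 1, 2), sizes=(128, 256, 512))
assert np.allclose(a1, a2)  # 2**(3..5) * 16 == (128, 256, 512)
```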