Commit ca4313d9 by Ting PAN

Update version to 0.6.0a

1 parent efc0106a
Showing with 9307 additions and 13443 deletions
@@ -9,4 +9,3 @@ ignore = E741, # ambiguous variable name
W504, # line break after binary operator
# module imported but unused
per-file-ignores = __init__.py: F401
-exclude = seetadet/utils/pycocotools
@@ -2,25 +2,9 @@
## Introduction
-### ImageNet Pretrained Models
-#### ResNet Models
-- [R-50.pkl](https://dragon.seetatech.com/download/models/seetadet/imagenet/R-50.pkl)
-- [R-101.pkl](https://dragon.seetatech.com/download/models/seetadet/imagenet/R-101.pkl)
-#### VGG Models
-- [VGG16.SSD.pkl](https://dragon.seetatech.com/download/models/seetadet/imagenet/VGG16.SSD.pkl)
-#### MobileNet Models
-- [MobileNetV2.pkl](https://dragon.seetatech.com/download/models/seetadet/imagenet/MobileNetV2.pkl)
-- [ProxylessMobile.pkl](https://dragon.seetatech.com/download/models/seetadet/imagenet/ProxylessMobile.pkl)
-#### AirNet Models
-- [AirNet.pkl](https://dragon.seetatech.com/download/models/seetadet/imagenet/AirNet.pkl)
+### Pretrained Models
+Please refer to [Pretrained Models](data/pretrained/README.md) for details.
## Baselines
...
@@ -7,10 +7,6 @@ while the style of codes is torch.
The torch-style codes help us to simplify the hierarchical pipeline of modern detection.
-## Requirements
-seeta-dragon >= 0.3.0.dev20201024
## Installation
### Build From Source
@@ -57,35 +53,23 @@ python test.py --cfg <MODEL_YAML> --exp_dir <EXP_DIR> --iter <ITERATION>
```
Or
+### Export a detection model to ONNX
```bash
cd tools
-python test_all.py --cfg <MODEL_YAML> --exp_dir <EXP_DIR> --last 1
+python export.py --cfg <MODEL_YAML> --exp_dir <EXP_DIR> --iter <ITERATION>
```
-### Export a detection model to ONNX
+### Serve a detection model
```bash
cd tools
-python export.py --cfg <MODEL_YAML> --exp_dir <EXP_DIR> --iter <ITERATION>
+python serve.py --cfg <MODEL_YAML> --model_dir <MODEL_DIR>
```
## Benchmark and Model Zoo
Results and models are available in the [Model Zoo](MODEL_ZOO.md).
-### Supported Backbones
-- [ResNet](MODEL_ZOO.md#resnet-models)
-- [VGG](MODEL_ZOO.md#vgg-models)
-- [MobileNet](MODEL_ZOO.md#mobilenet-models)
-- [AirNet](MODEL_ZOO.md#airnet-models)
-### Supported Algorithms
-- [Faster R-CNN](configs/faster_rcnn)
-- [Mask R-CNN](configs/mask_rcnn)
-- [SSD](configs/ssd)
-- [RetinaNet](configs/retinanet)
## License
[BSD 2-Clause license](LICENSE)
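Once `export.py` has written an ONNX graph, it can be smoke-tested outside the framework. A minimal sketch with ONNX Runtime; the file name, input layout, and shapes below are assumptions for illustration, not part of this commit:
```python
# Minimal smoke test for an exported detector (assumptions: onnxruntime is
# installed and the graph takes a single NCHW float32 image input).
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx")
name = sess.get_inputs()[0].name          # query the input name from the graph
dummy = np.zeros((1, 3, 800, 1333), dtype=np.float32)
outputs = sess.run(None, {name: dummy})   # run all graph outputs
print([tuple(o.shape) for o in outputs])
```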
@@ -14,13 +14,7 @@
## COCO Object Detection Baselines
-| Model | Lr sched | Infer time (s/im) | box AP | Download |
-| :---: | :------: | :---------------: | :----: | :------: |
-| [R-50-FPN-800](coco_faster_rcnn_R-50-FPN_800_1x.yml) | 1x | 0.046 | 38.3 | [model](https://dragon.seetatech.com/download/models/seetadet/faster_rcnn/coco_faster_rcnn_R-50-FPN_800_1x/model_final.pkl) |
-| [R-50-FPN-800](coco_faster_rcnn_R-50-FPN_800_2x.yml) | 2x | 0.046 | 39.7 | [model](https://dragon.seetatech.com/download/models/seetadet/faster_rcnn/coco_faster_rcnn_R-50-FPN_800_2x/model_final.pkl) |
+| Model | Lr sched | Infer time (fps) | box AP | Download |
+| :---: | :------: | :--------------: | :----: | :-----: |
+| [R-50-FPN](coco_faster_rcnn_R_50_FPN_1x.yml) | 1x | 27.78 | 38.4 | [model](https://dragon.seetatech.com/download/seetadet/faster_rcnn/coco_faster_rcnn_R_50_FPN_1x/model_adb024b6.pkl) &#124; [log]() |
+| [R-50-FPN](coco_faster_rcnn_R_50_FPN_2x.yml) | 2x | 27.78 | 39.8 | [model](https://dragon.seetatech.com/download/seetadet/faster_rcnn/coco_faster_rcnn_R_50_FPN_2x/model_9a8c9ae5.pkl) &#124; [log]() |
-## Pascal VOC Object Detection Baselines
-| Model | Infer time (s/im) | AP@0.5 | Download |
-| :---: | :---------------: | :----: | :------: |
-| [R-50-FPN-640](voc_faster_rcnn_R-50-FPN_640.yml) | 0.030 | 80.8 | [model](https://dragon.seetatech.com/download/models/seetadet/faster_rcnn/voc_faster_rcnn_R-50-FPN_640_1x/model_final.pkl) |
NUM_GPUS: 8
-PIXEL_STDS: [57.375, 57.12, 58.395]
-PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
  TYPE: 'faster_rcnn'
-  BACKBONE: 'resnet50.fpn'
+  PRECISION: 'float16'
  CLASSES: ['__background__',
            'person', 'bicycle', 'car', 'motorcycle', 'airplane',
            'bus', 'train', 'truck', 'boat', 'traffic light',
@@ -19,27 +17,30 @@ MODEL:
            'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
            'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
            'teddy bear', 'hair drier', 'toothbrush']
+  BACKBONE:
+    TYPE: 'resnet50.fpn'
+  FPN:
+    MIN_LEVEL: 2
+    MAX_LEVEL: 6
+  ANCHOR_GENERATOR:
+    STRIDES: [4, 8, 16, 32, 64]
SOLVER:
  BASE_LR: 0.02
  DECAY_STEPS: [60000, 80000]
  MAX_STEPS: 90000
  SNAPSHOT_EVERY: 5000
-  SNAPSHOT_PREFIX: 'coco_faster_rcnn_R-50-FPN_800_1x'
+  SNAPSHOT_PREFIX: 'coco_faster_rcnn_R_50_FPN'
-FRCNN:
-  BATCH_SIZE: 512
-  ROI_XFORM_RESOLUTION: 7
TRAIN:
-  WEIGHTS: '/model/R-50.pkl'
+  WEIGHTS: '../data/pretrained/R-50_in1k_cls90e.pkl'
-  DATASET: '/data/coco_2017_train'
+  DATASET: '../data/datasets/coco_train2017'
  IMS_PER_BATCH: 2
  SCALES: [640, 672, 704, 736, 768, 800]
  MAX_SIZE: 1333
  USE_DIFF: False # Do not use crowd objects
TEST:
-  DATASET: '/data/coco_2017_val'
+  DATASET: '../data/datasets/coco_val2017'
-  JSON_FILE: '/data/instances_val2017.json'
+  JSON_DATASET: '../data/datasets/coco_instances_val2017.json'
-  PROTOCOL: 'coco'
+  EVALUATOR: 'coco'
  IMS_PER_BATCH: 1
  SCALES: [800]
  MAX_SIZE: 1333
-  NMS: 0.5
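A note on the "1x" schedule in this config: with NUM_GPUS: 8 and IMS_PER_BATCH: 2, each step consumes 16 images, so MAX_STEPS: 90000 covers roughly 12 epochs of COCO train2017. A quick check (the ~118k image count is an assumption about the split, not read from the config):
```python
# Rough epoch count implied by the schedule above (assuming COCO
# train2017 has ~118287 images; batch = NUM_GPUS * IMS_PER_BATCH).
num_gpus, ims_per_batch, max_steps = 8, 2, 90000
epochs = max_steps * num_gpus * ims_per_batch / 118287
print(f"{epochs:.1f} epochs")  # ~12.2; the "2x" config below doubles this
```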
NUM_GPUS: 8
-PIXEL_STDS: [57.375, 57.12, 58.395]
-PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
  TYPE: 'faster_rcnn'
-  BACKBONE: 'resnet50.fpn'
+  PRECISION: 'float16'
  CLASSES: ['__background__',
            'person', 'bicycle', 'car', 'motorcycle', 'airplane',
            'bus', 'train', 'truck', 'boat', 'traffic light',
@@ -19,27 +17,30 @@ MODEL:
            'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
            'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
            'teddy bear', 'hair drier', 'toothbrush']
+  BACKBONE:
+    TYPE: 'resnet50.fpn'
+  FPN:
+    MIN_LEVEL: 2
+    MAX_LEVEL: 6
+  ANCHOR_GENERATOR:
+    STRIDES: [4, 8, 16, 32, 64]
SOLVER:
  BASE_LR: 0.02
  DECAY_STEPS: [120000, 160000]
  MAX_STEPS: 180000
  SNAPSHOT_EVERY: 5000
-  SNAPSHOT_PREFIX: 'coco_faster_rcnn_R-50-FPN_800_2x'
+  SNAPSHOT_PREFIX: 'coco_faster_rcnn_R-50-FPN_800'
-FRCNN:
-  BATCH_SIZE: 512
-  ROI_XFORM_RESOLUTION: 7
TRAIN:
-  WEIGHTS: '/model/R-50.pkl'
+  WEIGHTS: '../data/pretrained/R-50_in1k_cls90e.pkl'
-  DATASET: '/data/coco_2017_train'
+  DATASET: '../data/datasets/coco_train2017'
  IMS_PER_BATCH: 2
  SCALES: [640, 672, 704, 736, 768, 800]
  MAX_SIZE: 1333
  USE_DIFF: False # Do not use crowd objects
TEST:
-  DATASET: '/data/coco_2017_val'
+  DATASET: '../data/datasets/coco_val2017'
-  JSON_FILE: '/data/instances_val2017.json'
+  JSON_DATASET: '../data/datasets/coco_instances_val2017.json'
-  PROTOCOL: 'coco'
+  EVALUATOR: 'coco'
  IMS_PER_BATCH: 1
  SCALES: [800]
  MAX_SIZE: 1333
-  NMS: 0.5
NUM_GPUS: 1
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: 'faster_rcnn'
BACKBONE: 'resnet50.fpn'
CLASSES: ['__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
FRCNN:
BATCH_SIZE: 128
ROI_XFORM_RESOLUTION: 7
SOLVER:
BASE_LR: 0.002
DECAY_STEPS: [80000, 100000]
MAX_STEPS: 120000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'voc_faster_rcnn_R-50-FPN_640'
TRAIN:
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/voc_0712_trainval'
IMS_PER_BATCH: 2
SCALES: [480, 512, 544, 576, 608, 640]
MAX_SIZE: 1066
USE_COLOR_JITTER: True
TEST:
DATASET: '/data/voc_2007_test'
PROTOCOL: 'voc2007'
IMS_PER_BATCH: 1
SCALES: [640]
MAX_SIZE: 1066
NMS: 0.45
@@ -14,7 +14,7 @@
## COCO Instance Segmentation Baselines
-| Model | Lr sched | Infer time (s/im) | box AP | mask AP | Download |
+| Model | Lr sched | Infer time (fps) | box AP | mask AP | Download |
| :---: | :------: | :---------------: | :----: | :-----: | :------: |
-| [R-50-FPN-800](coco_mask_rcnn_R-50-FPN_800_1x.yml) | 1x | 0.056 | 39.2 | 34.8 | [model](https://dragon.seetatech.com/download/models/seetadet/mask_rcnn/coco_mask_rcnn_R-50-FPN_800_1x/model_final.pkl) |
-| [R-50-FPN-800](coco_mask_rcnn_R-50-FPN_800_2x.yml) | 2x | 0.056 | 41.4 | 36.5 | [model](https://dragon.seetatech.com/download/models/seetadet/mask_rcnn/coco_mask_rcnn_R-50-FPN_800_2x/model_final.pkl) |
+| [R-50-FPN](coco_mask_rcnn_R_50_FPN_1x.yml) | 1x | 22.22 | 39.2 | 35.1 | [model](https://dragon.seetatech.com/download/seetadet/mask_rcnn/coco_mask_rcnn_R_50_FPN_1x/model_90266029.pkl) &#124; [log]() |
+| [R-50-FPN](coco_mask_rcnn_R_50_FPN_2x.yml) | 2x | 22.22 | 41.4 | 36.7 | [model](https://dragon.seetatech.com/download/seetadet/mask_rcnn/coco_mask_rcnn_R_50_FPN_2x/model_4ace9d05.pkl) &#124; [log]() |
NUM_GPUS: 8
-PIXEL_STDS: [57.375, 57.12, 58.395]
-PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
-  TYPE: 'mask_rcnn'
+  TYPE: mask_rcnn
-  BACKBONE: 'resnet50.fpn'
+  PRECISION: 'float16'
  CLASSES: ['__background__',
            'person', 'bicycle', 'car', 'motorcycle', 'airplane',
            'bus', 'train', 'truck', 'boat', 'traffic light',
@@ -19,28 +17,31 @@ MODEL:
            'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
            'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
            'teddy bear', 'hair drier', 'toothbrush']
+  BACKBONE:
+    TYPE: 'resnet50.fpn'
+  FPN:
+    MIN_LEVEL: 2
+    MAX_LEVEL: 6
+  ANCHOR_GENERATOR:
+    STRIDES: [4, 8, 16, 32, 64]
SOLVER:
  BASE_LR: 0.02
  DECAY_STEPS: [60000, 80000]
  MAX_STEPS: 90000
  SNAPSHOT_EVERY: 5000
-  SNAPSHOT_PREFIX: 'coco_mask_rcnn_R-50-FPN_800_1x'
+  SNAPSHOT_PREFIX: 'coco_mask_rcnn_R-50-FPN'
-FRCNN:
-  BATCH_SIZE: 512
-  ROI_XFORM_RESOLUTION: 7
-MRCNN:
-  ROI_XFORM_RESOLUTION: 14
TRAIN:
-  WEIGHTS: '/model/R-50.pkl'
+  WEIGHTS: '../data/pretrained/R-50_in1k_cls90e.pkl'
-  DATASET: '/data/coco_2017_train'
+  DATASET: '../data/datasets/coco_train2017'
+  LOADER: 'mask_train'
  IMS_PER_BATCH: 2
  SCALES: [640, 672, 704, 736, 768, 800]
  MAX_SIZE: 1333
  USE_DIFF: False # Do not use crowd objects
TEST:
-  DATASET: '/data/coco_2017_val'
+  DATASET: '../data/datasets/coco_val2017'
-  JSON_FILE: '/data/instances_val2017.json'
+  JSON_DATASET: '../data/datasets/coco_instances_val2017.json'
-  PROTOCOL: 'coco'
+  EVALUATOR: 'coco'
-  IMS_PER_BATCH: 1
  SCALES: [800]
  MAX_SIZE: 1333
-  NMS: 0.5
NUM_GPUS: 8
-PIXEL_STDS: [57.375, 57.12, 58.395]
-PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
-  TYPE: 'mask_rcnn'
+  TYPE: mask_rcnn
-  BACKBONE: 'resnet50.fpn'
+  PRECISION: 'float16'
  CLASSES: ['__background__',
            'person', 'bicycle', 'car', 'motorcycle', 'airplane',
            'bus', 'train', 'truck', 'boat', 'traffic light',
@@ -19,28 +17,31 @@ MODEL:
            'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
            'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
            'teddy bear', 'hair drier', 'toothbrush']
+  BACKBONE:
+    TYPE: 'resnet50.fpn'
+  FPN:
+    MIN_LEVEL: 2
+    MAX_LEVEL: 6
+  ANCHOR_GENERATOR:
+    STRIDES: [4, 8, 16, 32, 64]
SOLVER:
  BASE_LR: 0.02
  DECAY_STEPS: [120000, 160000]
  MAX_STEPS: 180000
  SNAPSHOT_EVERY: 5000
-  SNAPSHOT_PREFIX: 'coco_mask_rcnn_R-50-FPN_800_2x'
+  SNAPSHOT_PREFIX: 'coco_mask_rcnn_R_50_FPN'
-FRCNN:
-  BATCH_SIZE: 512
-  ROI_XFORM_RESOLUTION: 7
-MRCNN:
-  ROI_XFORM_RESOLUTION: 14
TRAIN:
-  WEIGHTS: '/model/R-50.pkl'
+  WEIGHTS: '../data/pretrained/R-50_in1k_cls90e.pkl'
-  DATASET: '/data/coco_2017_train'
+  DATASET: '../data/datasets/coco_train2017'
+  LOADER: 'mask_train'
  IMS_PER_BATCH: 2
  SCALES: [640, 672, 704, 736, 768, 800]
  MAX_SIZE: 1333
  USE_DIFF: False # Do not use crowd objects
TEST:
-  DATASET: '/data/coco_2017_val'
+  DATASET: '../data/datasets/coco_val2017'
-  JSON_FILE: '/data/instances_val2017.json'
+  JSON_DATASET: '../data/datasets/coco_instances_val2017.json'
-  PROTOCOL: 'coco'
+  EVALUATOR: 'coco'
-  IMS_PER_BATCH: 1
  SCALES: [800]
  MAX_SIZE: 1333
-  NMS: 0.5
@@ -12,16 +12,7 @@
## COCO Object Detection Baselines
-| Model | Lr sched | Infer time (s/im) | box AP | Download |
-| :---: | :------: | :---------------: | :----: | :------: |
-| [R-50-FPN-416](coco_retinanet_R-50-FPN_416_6x.yml) | 6x | 0.019 | 34.4 | [model](https://dragon.seetatech.com/download/models/seetadet/retinanet/coco_retinanet_R-50-FPN_416_6x/model_final.pkl) |
-| [R-50-FPN-512](coco_retinanet_R-50-FPN_512_6x.yml) | 6x | 0.022 | 36.4 | [model](https://dragon.seetatech.com/download/models/seetadet/retinanet/coco_retinanet_R-50-FPN_512_6x/model_final.pkl) |
-| [R-50-FPN-800](coco_retinanet_R-50-FPN_800_1x.yml) | 1x | 0.051 | 37.4 | [model](https://dragon.seetatech.com/download/models/seetadet/retinanet/coco_retinanet_R-50-FPN_800_1x/model_final.pkl) |
-| [R-50-FPN-800](coco_retinanet_R-50-FPN_800_2x.yml) | 2x | 0.051 | 39.1 | [model](https://dragon.seetatech.com/download/models/seetadet/retinanet/coco_retinanet_R-50-FPN_800_2x/model_final.pkl) |
-## Pascal VOC Object Detection Baselines
-| Model | Infer time (s/im) | AP@0.5 | Download |
-| :---: | :---------------: | :----: | :------: |
-| [R-50-FPN-416](voc_retinanet_R-50-FPN_416.yml) | 0.015 | 82.3 | [model](https://dragon.seetatech.com/download/models/seetadet/retinanet/voc_retinanet_R-50-FPN_416/model_final.pkl) |
-| [R-50-FPN-512](voc_retinanet_R-50-FPN_512.yml) | 0.017 | 83.0 | [model](https://dragon.seetatech.com/download/models/seetadet/retinanet/voc_retinanet_R-50-FPN_512/model_final.pkl) |
+| Model | Lr sched | Infer time (fps) | box AP | Download |
+| :---: | :------: | :--------------: | :----: | :------: |
+| [R-50-FPN](coco_retinanet_R_50_FPN_1x.yml) | 1x | 23.3 | 37.4 | [model](https://dragon.seetatech.com/download/seetadet/retinanet/coco_retinanet_R_50_FPN_1x/model_01a4d35f.pkl) &#124; [log]() |
+| [R-50-FPN](coco_retinanet_R_50_FPN_2x.yml) | 2x | 23.3 | 39.0 | [model](https://dragon.seetatech.com/download/seetadet/retinanet/coco_retinanet_R_50_FPN_2x/model_7e81f3ad.pkl) &#124; [log]() |
NUM_GPUS: 8
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: 'retinanet'
BACKBONE: 'resnet50.fpn'
CLASSES: ['__background__',
'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
'fire hydrant', 'stop sign', 'parking meter', 'bench',
'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant',
'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife',
'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli',
'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop',
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
FPN:
RPN_MIN_LEVEL: 3
RPN_MAX_LEVEL: 7
SOLVER:
BASE_LR: 0.01
DECAY_STEPS: [90000, 120000]
MAX_STEPS: 135000
SNAPSHOT_EVERY: 2500
SNAPSHOT_PREFIX: 'coco_retinanet_R-50-FPN_416_6x'
PIPELINE:
TYPE: 'ssd'
TRAIN:
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/coco_2017_train'
IMS_PER_BATCH: 8
SCALES: [416]
USE_DIFF: False # Do not use crowd objects
TEST:
DATASET: '/data/coco_2017_val'
JSON_FILE: '/data/instances_val2017.json'
PROTOCOL: 'coco'
IMS_PER_BATCH: 1
SCALES: [416]
NMS: 0.5
NUM_GPUS: 8
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: 'retinanet'
BACKBONE: 'resnet50.fpn'
CLASSES: ['__background__',
'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
'fire hydrant', 'stop sign', 'parking meter', 'bench',
'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant',
'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife',
'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli',
'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop',
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
FPN:
RPN_MIN_LEVEL: 3
RPN_MAX_LEVEL: 7
SOLVER:
BASE_LR: 0.01
DECAY_STEPS: [90000, 120000]
MAX_STEPS: 135000
SNAPSHOT_EVERY: 2500
SNAPSHOT_PREFIX: 'coco_retinanet_R-50-FPN_512_6x'
PIPELINE:
TYPE: 'ssd'
TRAIN:
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/coco_2017_train'
IMS_PER_BATCH: 8
SCALES: [512]
USE_DIFF: False # Do not use crowd objects
TEST:
DATASET: '/data/coco_2017_val'
JSON_FILE: '/data/instances_val2017.json'
PROTOCOL: 'coco'
IMS_PER_BATCH: 1
SCALES: [512]
NMS: 0.5
NUM_GPUS: 8
-PIXEL_STDS: [57.375, 57.12, 58.395]
-PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
  TYPE: 'retinanet'
-  BACKBONE: 'resnet50.fpn'
+  PRECISION: 'float16'
  CLASSES: ['__background__',
            'person', 'bicycle', 'car', 'motorcycle', 'airplane',
            'bus', 'train', 'truck', 'boat', 'traffic light',
@@ -19,27 +17,25 @@ MODEL:
            'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
            'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
            'teddy bear', 'hair drier', 'toothbrush']
-  FPN:
-    RPN_MIN_LEVEL: 3
-    RPN_MAX_LEVEL: 7
+  BACKBONE:
+    TYPE: 'resnet50.fpn'
SOLVER:
  BASE_LR: 0.01
  DECAY_STEPS: [60000, 80000]
  MAX_STEPS: 90000
  SNAPSHOT_EVERY: 5000
-  SNAPSHOT_PREFIX: 'coco_retinanet_R-50-FPN_800_1x'
+  SNAPSHOT_PREFIX: 'coco_retinanet_R-50-FPN_800'
TRAIN:
-  WEIGHTS: '/model/R-50.pkl'
+  WEIGHTS: '../data/pretrained/R-50_in1k_cls90e.pkl'
-  DATASET: '/data/coco_2017_train'
+  DATASET: '../data/datasets/coco_train2017'
  IMS_PER_BATCH: 2
  SCALES: [640, 672, 704, 736, 768, 800]
  MAX_SIZE: 1333
  USE_DIFF: False # Do not use crowd objects
TEST:
-  DATASET: '/data/coco_2017_val'
+  DATASET: '../data/datasets/coco_val2017'
-  JSON_FILE: '/data/instances_val2017.json'
+  JSON_DATASET: '../data/datasets/coco_instances_val2017.json'
-  PROTOCOL: 'coco'
+  EVALUATOR: 'coco'
  IMS_PER_BATCH: 1
  SCALES: [800]
  MAX_SIZE: 1333
-  NMS: 0.5
NUM_GPUS: 8
-PIXEL_STDS: [57.375, 57.12, 58.395]
-PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
  TYPE: 'retinanet'
-  BACKBONE: 'resnet50.fpn'
+  PRECISION: 'float16'
  CLASSES: ['__background__',
            'person', 'bicycle', 'car', 'motorcycle', 'airplane',
            'bus', 'train', 'truck', 'boat', 'traffic light',
@@ -19,27 +17,25 @@ MODEL:
            'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
            'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
            'teddy bear', 'hair drier', 'toothbrush']
-  FPN:
-    RPN_MIN_LEVEL: 3
-    RPN_MAX_LEVEL: 7
+  BACKBONE:
+    TYPE: 'resnet50.fpn'
SOLVER:
  BASE_LR: 0.01
  DECAY_STEPS: [120000, 160000]
  MAX_STEPS: 180000
  SNAPSHOT_EVERY: 5000
-  SNAPSHOT_PREFIX: 'coco_retinanet_R-50-FPN_800_2x'
+  SNAPSHOT_PREFIX: 'coco_retinanet_R-50-FPN_800'
TRAIN:
-  WEIGHTS: '/model/R-50.pkl'
+  WEIGHTS: '../data/pretrained/R-50_in1k_cls90e.pkl'
-  DATASET: '/data/coco_2017_train'
+  DATASET: '../data/datasets/coco_train2017'
  IMS_PER_BATCH: 2
  SCALES: [640, 672, 704, 736, 768, 800]
  MAX_SIZE: 1333
  USE_DIFF: False # Do not use crowd objects
TEST:
-  DATASET: '/data/coco_2017_val'
+  DATASET: '../data/datasets/coco_val2017'
-  JSON_FILE: '/data/instances_val2017.json'
+  JSON_DATASET: '../data/datasets/coco_instances_val2017.json'
-  PROTOCOL: 'coco'
+  EVALUATOR: 'coco'
  IMS_PER_BATCH: 1
  SCALES: [800]
  MAX_SIZE: 1333
-  NMS: 0.5
NUM_GPUS: 1
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: 'retinanet'
BACKBONE: 'resnet50.fpn'
CLASSES: ['__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
FPN:
RPN_MIN_LEVEL: 3
RPN_MAX_LEVEL: 7
RETINANET:
NUM_CONVS: 2
SOLVER:
BASE_LR: 0.01
DECAY_STEPS: [80000, 100000]
MAX_STEPS: 120000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'voc_retinanet_R-50-FPN_416'
PIPELINE:
TYPE: 'ssd'
TRAIN:
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/voc_0712_trainval'
IMS_PER_BATCH: 16
SCALES: [416]
RANDOM_SCALES: [0.25, 1.0]
USE_COLOR_JITTER: True
TEST:
DATASET: '/data/voc_2007_test'
PROTOCOL: 'voc2007'
IMS_PER_BATCH: 1
SCALES: [416]
NMS: 0.45
RETINANET_PRE_NMS_TOP_N: 1000
NUM_GPUS: 2
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: 'retinanet'
BACKBONE: 'resnet50.fpn'
CLASSES: ['__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
FPN:
RPN_MIN_LEVEL: 3
RPN_MAX_LEVEL: 7
RETINANET:
NUM_CONVS: 2
SOLVER:
BASE_LR: 0.01
DECAY_STEPS: [80000, 100000]
MAX_STEPS: 120000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'voc_retinanet_R-50-FPN_512'
PIPELINE:
TYPE: 'ssd'
TRAIN:
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/voc_0712_trainval'
IMS_PER_BATCH: 8
SCALES: [512]
RANDOM_SCALES: [0.25, 1.0]
USE_COLOR_JITTER: True
TEST:
DATASET: '/data/voc_2007_test'
PROTOCOL: 'voc2007'
IMS_PER_BATCH: 1
SCALES: [512]
NMS: 0.45
RETINANET_PRE_NMS_TOP_N: 1000
@@ -12,7 +12,9 @@
## Pascal VOC Object Detection Baselines
-| Model | Infer time (s/im) | AP@0.5 | Download |
-| :---: | :---------------: | :----: | :------: |
-| [VGG-16-300](voc_ssd_VGG-16_300.yml) | 0.012 | 78.3 | [model](https://dragon.seetatech.com/download/models/seetadet/ssd/voc_ssd_VGG-16_300/model_final.pkl) |
-| [VGG-16-512](voc_ssd_VGG-16_512.yml) | 0.021 | 80.1 | [model](https://dragon.seetatech.com/download/models/seetadet/ssd/voc_ssd_VGG-16_512/model_final.pkl) |
+| Model | Lr sched | Infer time (fps) | AP@0.5 | Download |
+| :---: | :----: | :--------------: | :----: | :------: |
+| [VGG-16-SSD300](voc_ssd300_VGG_16_120e.yml) | 120e | 100.0 | 78.3 | [model](https://dragon.seetatech.com/download/seetadet/ssd/voc_ssd300_VGG-16_120e/model_54664312.pkl) &#124; [log](https://dragon.seetatech.com/download/seetadet/ssd/voc_ssd300_VGG-16_120e/logs.json) |
+| [VGG-16-SSD512](voc_ssd512_VGG_16_120e.yml) | 120e | 71.4 | 80.6 | [model](https://dragon.seetatech.com/download/seetadet/ssd/voc_ssd512_VGG-16_120e/model_e332ebfe.pkl) &#124; [log](https://dragon.seetatech.com/download/seetadet/ssd/voc_ssd512_VGG-16_120e/logs.json) |
+| [MobileNetV2-SSDLite](voc_ssdlite_MobileNetV2_300e.yml) | 300e | 76.9 | 71.6 | [model](https://dragon.seetatech.com/download/seetadet/ssd/voc_ssdlite_MobileNetV2_300e/model_da31ebe7.pkl) &#124; [log](https://dragon.seetatech.com/download/seetadet/ssd/voc_ssdlite_MobileNetV2_300e/logs.json) |
+| [MobileNetV3L-SSDLite](voc_ssdlite_MobileNetV3L_300e.yml) | 300e | 66.7 | 72.6 | [model](https://dragon.seetatech.com/download/seetadet/ssd/voc_ssdlite_MobileNetV3L_300e/model_43b33a97.pkl) &#124; [log](https://dragon.seetatech.com/download/seetadet/ssd/voc_ssdlite_MobileNetV3L_300e/logs.json) |
NUM_GPUS: 1
-PIXEL_STDS: [1.0, 1.0, 1.0]
-PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
  TYPE: 'ssd'
-  BACKBONE: 'vgg16_reduced_300'
+  PRECISION: 'float16'
-  COARSEST_STRIDE: 0
  CLASSES: ['__background__',
            'aeroplane', 'bicycle', 'bird', 'boat',
            'bottle', 'bus', 'car', 'cat', 'chair',
            'cow', 'diningtable', 'dog', 'horse',
            'motorbike', 'person', 'pottedplant',
            'sheep', 'sofa', 'train', 'tvmonitor']
-SSD:
+  BACKBONE:
+    TYPE: 'vgg16_fcn.ssd300'
+    NORM: ''
+    COARSEST_STRIDE: 300
+  FPN:
+    ACTIVATION: 'ReLU'
+  ANCHOR_GENERATOR:
    STRIDES: [8, 16, 32, 64, 100, 300]
-  ANCHOR_SIZES: [[30, 60],
-                 [60, 110],
-                 [110, 162],
-                 [162, 213],
-                 [213, 264],
-                 [264, 315]]
+    SIZES: [[30, 60], [60, 110], [110, 162],
+            [162, 213], [213, 264], [264, 315]]
    ASPECT_RATIOS: [[1, 2, 0.5],
                    [1, 2, 0.5, 3, 0.33],
                    [1, 2, 0.5, 3, 0.33],
@@ -31,18 +30,21 @@ SOLVER:
  DECAY_STEPS: [80000, 100000]
  MAX_STEPS: 120000
  SNAPSHOT_EVERY: 5000
-  SNAPSHOT_PREFIX: 'voc_ssd_VGG-16_300'
+  SNAPSHOT_PREFIX: 'voc_ssd300_VGG_16'
+AUG:
+  COLOR_JITTER: 0.5
TRAIN:
-  WEIGHTS: '/model/VGG16.SSD.pkl'
+  WEIGHTS: '../data/pretrained/VGG-16-FCN_in1k.pkl'
-  DATASET: '/data/voc_0712_trainval'
+  DATASET: '../data/datasets/voc_trainval0712'
  IMS_PER_BATCH: 16
  SCALES: [300]
-  RANDOM_SCALES: [0.25, 1.0]
+  SCALES_RANGE: [0.25, 1.0]
-  USE_COLOR_JITTER: True
+  LOADER: 'ssd_train'
TEST:
-  DATASET: '/data/voc_2007_test'
+  DATASET: '../data/datasets/voc_test2007'
-  PROTOCOL: 'voc2007'
+  JSON_DATASET: '../data/datasets/voc_test2007.json'
+  EVALUATOR: 'voc2007'
  IMS_PER_BATCH: 1
  SCALES: [300]
-  NMS: 0.45
+  NMS_THRESH: 0.45
  SCORE_THRESH: 0.01
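The SIZES list in this config chains [min, max] anchor pairs so that each layer's max is the next layer's min, stepping roughly 17% of the 300-pixel input per level. A quick inspection of the values (this only checks the config numbers; it is not the anchor generator itself):
```python
# The [min, max] anchor sizes from the SSD300 config above: adjacent
# layers share endpoints, with ~51 px (~17% of 300) between thresholds.
sizes = [[30, 60], [60, 110], [110, 162], [162, 213], [213, 264], [264, 315]]
assert all(a[1] == b[0] for a, b in zip(sizes, sizes[1:]))  # chained pairs
print([b[0] - a[0] for a, b in zip(sizes, sizes[1:])])      # [30, 50, 52, 51, 51]
```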
-NUM_GPUS: 2
+NUM_GPUS: 1
-PIXEL_STDS: [1.0, 1.0, 1.0]
-PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
  TYPE: 'ssd'
-  BACKBONE: 'vgg16_reduced_512'
+  PRECISION: 'float16'
  CLASSES: ['__background__',
            'aeroplane', 'bicycle', 'bird', 'boat',
            'bottle', 'bus', 'car', 'cat', 'chair',
            'cow', 'diningtable', 'dog', 'horse',
            'motorbike', 'person', 'pottedplant',
            'sheep', 'sofa', 'train', 'tvmonitor']
-SSD:
+  BACKBONE:
+    TYPE: 'vgg16_fcn.ssd512'
+    NORM: ''
+    COARSEST_STRIDE: 512
+  FPN:
+    ACTIVATION: 'ReLU'
+  ANCHOR_GENERATOR:
    STRIDES: [8, 16, 32, 64, 128, 256, 512]
-  ANCHOR_SIZES: [[35.84, 76.8],
+    SIZES: [[35.84, 76.8],
            [76.8, 153.6],
            [153.6, 230.4],
            [230.4, 307.2],
@@ -32,18 +36,21 @@ SOLVER:
  DECAY_STEPS: [80000, 100000]
  MAX_STEPS: 120000
  SNAPSHOT_EVERY: 5000
-  SNAPSHOT_PREFIX: 'voc_ssd_VGG-16_512'
+  SNAPSHOT_PREFIX: 'voc_ssd512_VGG_16'
+AUG:
+  COLOR_JITTER: 0.5
TRAIN:
-  WEIGHTS: '/model/VGG16.SSD.pkl'
+  WEIGHTS: '../data/pretrained/VGG-16-FCN_in1k.pkl'
-  DATASET: '/data/voc_0712_trainval'
+  DATASET: '../data/datasets/voc_trainval0712'
-  IMS_PER_BATCH: 8
+  IMS_PER_BATCH: 16
  SCALES: [512]
-  RANDOM_SCALES: [0.25, 1.0]
+  SCALES_RANGE: [0.25, 1.0]
-  USE_COLOR_JITTER: True
+  LOADER: 'ssd_train'
TEST:
-  DATASET: '/data/voc_2007_test'
+  DATASET: '../data/datasets/voc_test2007'
-  PROTOCOL: 'voc2007'
+  JSON_DATASET: '../data/datasets/voc_test2007.json'
+  EVALUATOR: 'voc2007'
  IMS_PER_BATCH: 1
  SCALES: [512]
-  NMS: 0.45
+  NMS_THRESH: 0.45
  SCORE_THRESH: 0.01
NUM_GPUS: 1
MODEL:
TYPE: 'ssd'
PRECISION: 'float16'
CLASSES: ['__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
BACKBONE:
TYPE: 'mobilenet_v2.ssdlite'
NORM: 'BN'
FPN:
CONV: 'SepConv2d'
NORM: 'BN'
ACTIVATION: 'ReLU6'
ANCHOR_GENERATOR:
STRIDES: [16, 32, 64, 107, 160, 320]
SIZES: [[48, 100], [100, 150], [150, 202],
[202, 253], [253, 304], [304, 320]]
ASPECT_RATIOS: [[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33]]
SOLVER:
BASE_LR: 0.04
WEIGHT_DECAY: 0.00004
DECAY_STEPS: [50000, 62500]
MAX_STEPS: 75000
SNAPSHOT_EVERY: 1250
SNAPSHOT_PREFIX: 'voc_ssdlite_MobileNetV2'
AUG:
COLOR_JITTER: 0.5
TRAIN:
WEIGHTS: '../data/pretrained/MobileNetV2_in1k_cls300e.pkl'
DATASET: '../data/datasets/voc_trainval0712'
IMS_PER_BATCH: 64
SCALES: [320]
SCALES_RANGE: [0.25, 1.0]
LOADER: 'ssd_train'
NUM_WORKERS: 12
TEST:
DATASET: '../data/datasets/voc_test2007'
JSON_DATASET: '../data/datasets/voc_test2007.json'
EVALUATOR: 'voc2007'
IMS_PER_BATCH: 1
SCALES: [320]
NMS_THRESH: 0.45
SCORE_THRESH: 0.01
NUM_GPUS: 1
MODEL:
TYPE: 'ssd'
PRECISION: 'float16'
CLASSES: ['__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
BACKBONE:
TYPE: 'mobilenet_v3_large.ssdlite'
NORM: 'BN'
FPN:
CONV: 'SepConv2d'
NORM: 'BN'
ACTIVATION: 'ReLU'
ANCHOR_GENERATOR:
STRIDES: [16, 32, 64, 107, 160, 320]
SIZES: [[48, 100], [100, 150], [150, 202],
[202, 253], [253, 304], [304, 320]]
ASPECT_RATIOS: [[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33]]
SOLVER:
BASE_LR: 0.04
WEIGHT_DECAY: 0.00004
DECAY_STEPS: [50000, 62500]
MAX_STEPS: 75000
SNAPSHOT_EVERY: 1250
SNAPSHOT_PREFIX: 'voc_ssdlite_MobileNetV3L'
AUG:
COLOR_JITTER: 0.5
TRAIN:
WEIGHTS: '../data/pretrained/MobileNetV3L_in1k_cls600e.pkl'
DATASET: '../data/datasets/voc_trainval0712'
IMS_PER_BATCH: 64
SCALES: [320]
SCALES_RANGE: [0.25, 1.0]
LOADER: 'ssd_train'
NUM_WORKERS: 12
TEST:
DATASET: '../data/datasets/voc_test2007'
JSON_DATASET: '../data/datasets/voc_test2007.json'
EVALUATOR: 'voc2007'
IMS_PER_BATCH: 1
SCALES: [320]
NMS_THRESH: 0.45
SCORE_THRESH: 0.01
#include "nms_op.h" #include "../operators/nms_op.h"
#include "../utils/detection_utils.h" #include "../utils/detection.h"
namespace dragon { namespace dragon {
template <class Context> template <class Context>
template <typename T> template <typename T>
void NonMaxSuppressionOp<Context>::DoRunWithType() { void NonMaxSuppressionOp<Context>::DoRunWithType() {
int num_selected; auto &X = Input(0), *Y = Output(0);
utils::detection::ApplyNMS( CHECK(X.ndim() == 2 && X.dim(1) == 5)
Output(0)->count(), << "\nThe dimensions of boxes should be (num_boxes, 5).";
Output(0)->count(), detection::ApplyNMS(
X.dim(0),
X.dim(0),
iou_threshold_, iou_threshold_,
Input(0).template mutable_data<T, Context>(), X.template mutable_data<T, Context>(),
Output(0)->template mutable_data<int64_t, CPUContext>(), out_indices_,
num_selected,
ctx()); ctx());
Output(0)->Reshape({num_selected}); Y->template CopyFrom<int64_t>(out_indices_);
}
template <class Context>
void NonMaxSuppressionOp<Context>::RunOnDevice() {
CHECK(Input(0).ndim() == 2 && Input(0).dim(1) == 5)
<< "\nThe dimensions of boxes should be (num_boxes, 5).";
Output(0)->Reshape({Input(0).dim(0)});
DispatchHelper<TensorTypes<float>>::Call(this, Input(0));
} }
DEPLOY_CPU_OPERATOR(NonMaxSuppression); DEPLOY_CPU_OPERATOR(NonMaxSuppression);
......
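For intuition about the contract above, here is a NumPy reference of the same greedy IoU-threshold NMS over (num_boxes, 5) rows; a sketch assuming [x1, y1, x2, y2, score] rows with the +1 box-area convention, not the operator's actual kernel:
```python
import numpy as np

def nms(boxes, iou_threshold=0.5):
    """Greedy NMS over rows of (x1, y1, x2, y2, score); returns kept indices."""
    x1, y1, x2, y2, scores = boxes.T
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]          # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the top box with all remaining boxes.
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        iou = w * h / (areas[i] + areas[order[1:]] - w * h)
        order = order[1:][iou <= iou_threshold]  # drop overlapping boxes
    return np.array(keep, dtype=np.int64)
```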
@@ -10,8 +10,8 @@
 * ------------------------------------------------------------
 */
-#ifndef SEETADET_CXX_OPERATORS_NMS_OP_H_
-#define SEETADET_CXX_OPERATORS_NMS_OP_H_
+#ifndef DRAGON_EXTENSION_OPERATORS_NMS_OP_H_
+#define DRAGON_EXTENSION_OPERATORS_NMS_OP_H_
#include "dragon/core/operator.h"
@@ -25,15 +25,18 @@ class NonMaxSuppressionOp final : public Operator<Context> {
        iou_threshold_(OP_SINGLE_ARG(float, "iou_threshold", 0.5f)) {}
  USE_OPERATOR_FUNCTIONS;
-  void RunOnDevice() override;
+  void RunOnDevice() override {
+    DispatchHelper<dtypes::TypesBase<float>>::Call(this, Input(0));
+  }
  template <typename T>
  void DoRunWithType();
 protected:
  float iou_threshold_;
+  vector<int64_t> out_indices_;
};
} // namespace dragon
-#endif // SEETADET_CXX_OPERATORS_NMS_OP_H_
+#endif // DRAGON_EXTENSION_OPERATORS_NMS_OP_H_
-#include <dragon/utils/math_functions.h>
-#include "../utils/detection_utils.h"
-#include "retinanet_decoder_op.h"
+#include "../operators/retinanet_decoder_op.h"
+#include "../utils/detection.h"
namespace dragon {
template <class Context>
template <typename T>
void RetinaNetDecoderOp<Context>::DoRunWithType() {
-  using BT = float; // DType of BBox
-  using BC = CPUContext; // Context of BBox
-  int total_proposals = 0;
-  auto* batch_scores = Input(SCORES).template data<T, Context>();
-  auto* batch_deltas = Input(DELTAS).template data<T, BC>();
-  auto* im_info = Input(IMAGE_INFO).template data<BT, BC>();
-  auto* all_proposals = Output(0)->template mutable_data<BT, BC>();
-  for (int im_idx = 0; im_idx < num_images_; ++im_idx) {
-    BT im_h = im_info[0];
-    BT im_w = im_info[1];
-    BT im_scale_h = im_info[2];
-    BT im_scale_w = im_info[2];
-    if (Input(IMAGE_INFO).dim(1) == 4) im_scale_w = im_info[3];
-    CHECK_EQ(strides_.size(), InputSize() - 3)
-        << "\nGiven " << strides_.size() << " strides "
-        << "and " << InputSize() - 3 << " features";
-    // Select the top-k candidates as proposals
-    auto num_boxes = Input(SCORES).dim(1);
-    auto num_classes = Input(SCORES).dim(2);
-    utils::detection::SelectProposals(
-        Input(SCORES).count(1),
-        score_thr_,
-        batch_scores + im_idx * Input(SCORES).stride(0),
-        roi_scores_,
-        roi_indices_,
-        ctx());
-    auto num_candidates = (int)roi_scores_.size();
-    auto num_proposals = std::min(num_candidates, (int)pre_nms_topn_);
-    utils::detection::ArgPartition(
-        num_candidates, num_proposals, true, roi_scores_.data(), indices_);
-    scores_.resize(indices_.size());
-    for (int i = 0; i < num_proposals; ++i) {
-      scores_[i] = roi_scores_[indices_[i]];
-      indices_[i] = roi_indices_[indices_[i]];
-    }
-    // Decode proposals via anchors
-    int stride_offset = 0;
-    for (int i = 0; i < strides_.size(); i++) {
-      auto feature_h = Input(i).dim(2);
-      auto feature_w = Input(i).dim(3);
-      auto K = feature_h * feature_w;
-      auto A = int(ratios_.size() * scales_.size());
-      anchors_.resize((size_t)(A * 4));
-      utils::detection::GenerateAnchors(
-          strides_[i],
-          (int)ratios_.size(),
-          (int)scales_.size(),
-          ratios_.data(),
-          scales_.data(),
-          anchors_.data());
-      utils::detection::GetShiftedAnchors(
-          num_proposals,
-          num_classes,
-          A,
-          feature_h,
-          feature_w,
-          strides_[i],
-          stride_offset,
-          anchors_.data(),
-          indices_.data(),
-          all_proposals);
-      stride_offset += (A * K);
-    }
-    utils::detection::GenerateDetections(
-        num_proposals,
-        num_boxes,
-        num_classes,
-        im_idx,
-        im_h,
-        im_w,
-        im_scale_h,
-        im_scale_w,
-        scores_.data(),
-        batch_deltas + im_idx * Input(DELTAS).stride(0),
-        indices_.data(),
-        all_proposals);
-    total_proposals += num_proposals;
-    all_proposals += (num_proposals * 7);
-    im_info += Input(IMAGE_INFO).dim(1);
-  }
-  Output(0)->Reshape({total_proposals, 7});
-}
-template <class Context>
-void RetinaNetDecoderOp<Context>::RunOnDevice() {
-  num_images_ = Input(0).dim(0);
-  CHECK_EQ(Input(-1).dim(0), num_images_)
-      << "\nExcepted " << num_images_ << " groups info, got "
-      << Input(-1).dim(0) << ".";
-  Output(0)->Reshape({num_images_ * pre_nms_topn_, 7});
-  DispatchHelper<TensorTypes<float>>::Call(this, Input(SCORES));
-}
+  auto num_images = Input(SCORES).dim(0);
+  auto num_anchors = Input(SCORES).dim(1);
+  auto num_classes = Input(SCORES).dim(2);
+  auto num_scores = num_anchors * num_classes;
+  auto num_cell_anchors = int64_t(ratios_.size() * scales_.size());
+  // Generate anchors.
+  CHECK_EQ(Input(GRID_INFO).dim(0), int64_t(strides_.size()))
+      << "\nProvide " << Input(GRID_INFO).dim(0) << " grids for "
+      << strides_.size() << " strides.";
+  cell_anchors_.resize(strides_.size());
+  vector<detection::GridArgs<int64_t>> grid_args(strides_.size());
+  for (int i = 0; i < strides_.size(); ++i) {
+    grid_args[i].stride = strides_[i];
+    auto& anchors = cell_anchors_[i];
+    if (int64_t(anchors.size()) == num_cell_anchors * 4) continue;
+    anchors.resize(num_cell_anchors * 4);
+    detection::GenerateAnchors(
+        strides_[i],
+        int64_t(ratios_.size()),
+        int64_t(scales_.size()),
+        ratios_.data(),
+        scales_.data(),
+        anchors.data());
+  }
+  // Set grid arguments.
+  auto* grid_info = Input(GRID_INFO).template data<int64_t, CPUContext>();
+  detection::SetGridArgs(num_anchors, num_cell_anchors, grid_info, grid_args);
+  // Decode detections.
+  auto* scores = Input(SCORES).template data<T, Context>();
+  auto* deltas = Input(DELTAS).template data<T, CPUContext>();
+  auto* im_info = Input(IM_INFO).template data<float, CPUContext>();
+  auto* Y = Output(0)->Reshape({num_images * pre_nms_topn_, 7});
+  auto* dets = Y->template mutable_data<float, CPUContext>();
+  int64_t size_dets = 0;
+  for (int batch_ind = 0; batch_ind < num_images; ++batch_ind) {
+    detection::ImageArgs<int64_t> im_args(im_info + batch_ind * 4);
+    im_args.batch_ind = batch_ind;
+    detection::SelectProposals(
+        num_scores,
+        pre_nms_topn_,
+        score_thresh_,
+        scores + batch_ind * num_scores,
+        scores_,
+        indices_,
+        ctx());
+    auto* offset_dets = dets + size_dets * 7;
+    auto num_dets = int64_t(indices_.size());
+    size_dets += num_dets;
+    for (int i = 0; i < strides_.size(); ++i) {
+      detection::GetAnchors(
+          num_dets,
+          num_cell_anchors,
+          num_classes,
+          grid_args[i],
+          cell_anchors_[i].data(),
+          indices_.data(),
+          offset_dets);
+    }
+    detection::DecodeDetections(
+        num_dets,
+        num_anchors,
+        num_classes,
+        im_args,
+        scores_.data(),
+        deltas + batch_ind * Input(DELTAS).stride(0),
+        indices_.data(),
+        offset_dets);
+  }
+  // Shrink to the correct dimensions.
+  Y->Reshape({size_dets, 7});
+}
DEPLOY_CPU_OPERATOR(RetinaNetDecoder);
@@ -109,7 +88,7 @@ DEPLOY_CPU_OPERATOR(RetinaNetDecoder);
DEPLOY_CUDA_OPERATOR(RetinaNetDecoder);
#endif
-OPERATOR_SCHEMA(RetinaNetDecoder).NumInputs(3, INT_MAX).NumOutputs(1, INT_MAX);
+OPERATOR_SCHEMA(RetinaNetDecoder).NumInputs(4).NumOutputs(1);
NO_GRADIENT(RetinaNetDecoder);
...
@@ -10,8 +10,8 @@
 * ------------------------------------------------------------
 */
-#ifndef SEETADET_CXX_OPERATORS_RETINANET_DECODER_OP_H_
-#define SEETADET_CXX_OPERATORS_RETINANET_DECODER_OP_H_
+#ifndef DRAGON_EXTENSION_OPERATORS_RETINANET_DECODER_OP_H_
+#define DRAGON_EXTENSION_OPERATORS_RETINANET_DECODER_OP_H_
#include "dragon/core/operator.h"
@@ -26,24 +26,29 @@ class RetinaNetDecoderOp final : public Operator<Context> {
        ratios_(OP_REPEATED_ARG(float, "ratios")),
        scales_(OP_REPEATED_ARG(float, "scales")),
        pre_nms_topn_(OP_SINGLE_ARG(int64_t, "pre_nms_top_n", 6000)),
-        score_thr_(OP_SINGLE_ARG(float, "score_thresh", 0.05f)) {}
+        score_thresh_(OP_SINGLE_ARG(float, "score_thresh", 0.05f)) {}
  USE_OPERATOR_FUNCTIONS;
-  void RunOnDevice() override;
+  void RunOnDevice() override {
+    DispatchHelper<dtypes::TypesBase<float>>::Call(this, Input(SCORES));
+  }
  template <typename T>
  void DoRunWithType();
-  enum INPUT_TAGS { SCORES = -3, DELTAS = -2, IMAGE_INFO = -1 };
+  enum INPUT_TAGS { SCORES = 0, DELTAS = 1, IM_INFO = 2, GRID_INFO = 3 };
 protected:
-  float score_thr_;
-  vec64_t strides_, indices_, roi_indices_;
-  vector<float> ratios_, scales_, anchors_;
-  vector<float> scores_, roi_scores_;
-  int64_t num_images_, pre_nms_topn_;
+  float score_thresh_;
+  vector<int64_t> strides_;
+  vector<float> ratios_, scales_;
+  int64_t pre_nms_topn_;
+  vector<float> scores_;
+  vector<int64_t> indices_;
+  vector<vector<float>> cell_anchors_;
};
} // namespace dragon
-#endif // SEETADET_CXX_OPERATORS_RETINANET_DECODER_OP_H_
+#endif // DRAGON_EXTENSION_OPERATORS_RETINANET_DECODER_OP_H_
-#include <dragon/utils/math_functions.h>
-#include "../utils/detection_utils.h"
-#include "rpn_decoder_op.h"
+#include "../operators/rpn_decoder_op.h"
+#include "../utils/detection.h"
namespace dragon {
template <class Context>
template <typename T>
void RPNDecoderOp<Context>::DoRunWithType() {
-  using BT = float; // DType of BBox
-  using BC = CPUContext; // Context of BBox
-  int feat_h, feat_w, K, A;
-  int total_rois = 0, num_rois;
-  int num_candidates, num_proposals;
-  auto* batch_scores = Input(SCORES).template data<T, BC>();
-  auto* batch_deltas = Input(DELTAS).template data<T, BC>();
-  auto* im_info = Input(IMAGE_INFO).template data<BT, BC>();
-  auto* all_rois = Output(0)->template mutable_data<BT, BC>();
-  for (int im_idx = 0; im_idx < num_images_; ++im_idx) {
-    const BT im_h = im_info[0];
-    const BT im_w = im_info[1];
-    auto* scores = batch_scores + im_idx * Input(SCORES).stride(0);
-    auto* deltas = batch_deltas + im_idx * Input(DELTAS).stride(0);
-    CHECK_EQ(strides_.size(), InputSize() - 3)
-        << "\nGiven " << strides_.size() << " strides "
-        << "and " << InputSize() - 3 << " feature inputs";
-    CHECK_EQ(strides_.size(), scales_.size())
-        << "\nGiven " << strides_.size() << " strides "
-        << "and " << scales_.size() << " scales";
-    // Select the top-k candidates as proposals
-    num_candidates = Input(SCORES).dim(1);
-    num_proposals = std::min(num_candidates, (int)pre_nms_top_n_);
-    utils::math::ArgPartition(
-        num_candidates, num_proposals, true, scores, indices_);
-    // Decode the candidates
-    int stride_offset = 0;
-    proposals_.Reshape({num_proposals, 5});
-    auto* proposals = proposals_.template mutable_data<BT, BC>();
-    for (int i = 0; i < strides_.size(); i++) {
-      feat_h = Input(i).dim(2);
-      feat_w = Input(i).dim(3);
-      K = feat_h * feat_w;
-      A = (int)ratios_.size();
-      anchors_.resize((size_t)(A * 4));
-      utils::detection::GenerateAnchors(
-          strides_[i],
-          (int)ratios_.size(),
-          1,
-          ratios_.data(),
-          scales_.data(),
-          anchors_.data());
-      utils::detection::GetShiftedAnchors(
-          num_proposals,
-          A,
-          feat_h,
-          feat_w,
-          strides_[i],
-          stride_offset,
-          anchors_.data(),
-          indices_.data(),
-          proposals);
-      stride_offset += (A * K);
-    }
-    utils::detection::GenerateProposals(
-        num_candidates,
-        num_proposals,
-        im_h,
-        im_w,
-        scores,
-        deltas,
-        &indices_[0],
-        proposals);
-    // Sort, NMS and Retrieve
-    utils::detection::SortProposals(
-        0, num_proposals - 1, num_proposals, proposals);
-    utils::detection::ApplyNMS(
-        num_proposals,
-        post_nms_top_n_,
-        nms_thr_,
-        proposals_.template mutable_data<BT, Context>(),
-        roi_indices_.data(),
-        num_rois,
-        ctx());
-    utils::detection::RetrieveRoIs(
-        num_rois, im_idx, proposals, roi_indices_.data(), all_rois);
-    total_rois += num_rois;
-    all_rois += (num_rois * 5);
-    im_info += Input(IMAGE_INFO).dim(1);
-  }
-  Output(0)->Reshape({total_rois, 5});
-  // Distribute rois into K bins
-  if (OutputSize() > 1) {
-    CHECK_EQ(max_level_ - min_level_ + 1, OutputSize())
-        << "\nExcepted " << OutputSize() << " outputs for levels "
-        << "between [" << min_level_ << ", " << max_level_ << "].";
-    vector<BT*> ys(OutputSize());
-    vector<vec64_t> bins(OutputSize());
-    Tensor RoIs;
-    RoIs.ReshapeLike(*Output(0));
-    auto* rois = RoIs.template mutable_data<BT, BC>();
-    ctx()->template Copy<BT, BC, BC>(
-        Output(0)->count(), rois, Output(0)->template data<BT, BC>());
-    utils::detection::CollectRoIs(
-        total_rois,
-        min_level_,
-        max_level_,
-        canonical_level_,
-        canonical_scale_,
-        rois,
-        bins);
-    for (int i = 0; i < OutputSize(); i++) {
-      Output(i)->Reshape({std::max((int)bins[i].size(), 1), 5});
-      ys[i] = Output(i)->template mutable_data<BT, BC>();
-    }
-    utils::detection::DistributeRoIs(bins, rois, ys);
-  }
-}
-template <class Context>
-void RPNDecoderOp<Context>::RunOnDevice() {
-  num_images_ = Input(0).dim(0);
-  CHECK_EQ(Input(IMAGE_INFO).dim(0), num_images_)
-      << "\nExcepted " << num_images_ << " groups info, got "
-      << Input(IMAGE_INFO).dim(0) << ".";
-  roi_indices_.resize(post_nms_top_n_);
-  Output(0)->Reshape({num_images_ * post_nms_top_n_, 5});
-  DispatchHelper<TensorTypes<float>>::Call(this, Input(SCORES));
-}
+  auto num_images = Input(SCORES).dim(0);
+  auto num_anchors = Input(SCORES).dim(1);
+  auto num_cell_anchors = int64_t(ratios_.size() * scales_.size());
+  // Generate anchors.
+  CHECK_EQ(Input(GRID_INFO).dim(0), int64_t(strides_.size()))
+      << "\nProvide " << Input(GRID_INFO).dim(0) << " grids for "
+      << strides_.size() << " strides.";
+  cell_anchors_.resize(strides_.size());
+  vector<detection::GridArgs<int64_t>> grid_args(strides_.size());
+  for (int i = 0; i < strides_.size(); ++i) {
+    grid_args[i].stride = strides_[i];
+    auto& anchors = cell_anchors_[i];
+    if (int64_t(anchors.size()) == num_cell_anchors * 4) continue;
+    anchors.resize(num_cell_anchors * 4);
+    detection::GenerateAnchors(
+        strides_[i],
+        int64_t(ratios_.size()),
+        int64_t(scales_.size()),
+        ratios_.data(),
+        scales_.data(),
+        anchors.data());
+  }
+  // Set grid arguments.
+  auto* grid_info = Input(GRID_INFO).template data<int64_t, CPUContext>();
+  detection::SetGridArgs(num_anchors, num_cell_anchors, grid_info, grid_args);
+  // Decode proposals.
+  auto* scores = Input(SCORES).template data<T, CPUContext>();
+  auto* deltas = Input(DELTAS).template data<T, CPUContext>();
+  auto* im_info = Input(IM_INFO).template data<float, CPUContext>();
+  auto* Y = Output("Y")->Reshape({num_images * pre_nms_topn_, 5});
+  auto* proposals = Y->template mutable_data<float, CPUContext>();
+  vector<int64_t> size_proposals({0});
+  for (int batch_ind = 0; batch_ind < num_images; ++batch_ind) {
+    detection::ImageArgs<int64_t> im_args(im_info + batch_ind * 4);
+    im_args.batch_ind = batch_ind;
+    detection::SelectProposals(
+        num_anchors,
+        pre_nms_topn_,
+        score_thresh_,
+        scores + batch_ind * num_anchors,
+        scores_,
+        indices_,
+        (CPUContext*)nullptr); // Faster.
+    auto* offset_proposals = proposals + size_proposals.back() * 5;
+    auto num_proposals = int64_t(indices_.size());
+    size_proposals.push_back(size_proposals.back() + num_proposals);
+    for (int i = 0; i < strides_.size(); ++i) {
+      detection::GetAnchors(
+          num_proposals,
+          num_cell_anchors,
+          grid_args[i],
+          cell_anchors_[i].data(),
+          indices_.data(),
+          offset_proposals);
+    }
+    detection::DecodeProposals(
+        num_proposals,
+        num_anchors,
+        im_args,
+        scores_.data(),
+        deltas + batch_ind * Input(DELTAS).stride(0),
+        indices_.data(),
+        offset_proposals);
+    detection::SortBoxes<T, detection::Box5d<T>>(
+        num_proposals, offset_proposals);
+  }
+  // Apply NMS.
+  auto* proposals_v2 = Y->template data<float, Context>();
+  int64_t size_rois = 0;
+  for (int batch_ind = 0; batch_ind < num_images; ++batch_ind) {
+    auto offset = size_proposals[batch_ind];
+    auto num_proposals = size_proposals[batch_ind + 1] - offset;
+    detection::ApplyNMS(
+        num_proposals,
+        post_nms_topn_,
+        nms_thresh_,
+        proposals_v2 + offset * 5,
+        nms_indices_,
+        ctx());
+    num_proposals = int64_t(nms_indices_.size());
+    for (int i = 0; i < num_proposals; ++i) {
+      scores_[size_rois] = batch_ind;
+      indices_[size_rois++] = nms_indices_[i] + offset;
+    }
+  }
+  // Apply Histogram.
+  detection::ApplyHistogram(
+      size_rois,
+      min_level_,
+      max_level_,
+      canonical_level_,
+      canonical_scale_,
+      proposals,
+      scores_.data(),
+      indices_.data(),
+      output_rois_);
+  // Copy to outputs.
+  for (int i = 0; i < OutputSize(); ++i) {
+    const auto& rois = output_rois_[i];
+    vector<int64_t> dims({int64_t(rois.size()) / 5, 5});
+    auto* Yi = Output(i)->Reshape(dims);
+    std::memcpy(
+        Yi->template mutable_data<T, CPUContext>(),
+        rois.data(),
+        sizeof(T) * rois.size());
+  }
+}
DEPLOY_CPU_OPERATOR(RPNDecoder);
#ifdef USE_CUDA
DEPLOY_CUDA_OPERATOR(RPNDecoder);
#endif
-OPERATOR_SCHEMA(RPNDecoder).NumInputs(3, INT_MAX).NumOutputs(1, INT_MAX);
+OPERATOR_SCHEMA(RPNDecoder).NumInputs(4).NumOutputs(1, INT_MAX);
NO_GRADIENT(RPNDecoder);
...
@@ -10,8 +10,8 @@
 * ------------------------------------------------------------
 */
-#ifndef SEETADET_CXX_OPERATORS_RPN_DECODER_OP_H_
-#define SEETADET_CXX_OPERATORS_RPN_DECODER_OP_H_
+#ifndef DRAGON_EXTENSION_OPERATORS_RPN_DECODER_OP_H_
+#define DRAGON_EXTENSION_OPERATORS_RPN_DECODER_OP_H_
#include "dragon/core/operator.h"
@@ -25,32 +25,39 @@ class RPNDecoderOp final : public Operator<Context> {
        strides_(OP_REPEATED_ARG(int64_t, "strides")),
        ratios_(OP_REPEATED_ARG(float, "ratios")),
        scales_(OP_REPEATED_ARG(float, "scales")),
-        pre_nms_top_n_(OP_SINGLE_ARG(int64_t, "pre_nms_top_n", 6000)),
-        post_nms_top_n_(OP_SINGLE_ARG(int64_t, "post_nms_top_n", 1000)),
-        nms_thr_(OP_SINGLE_ARG(float, "nms_thresh", 0.7f)),
+        pre_nms_topn_(OP_SINGLE_ARG(int64_t, "pre_nms_top_n", 6000)),
+        post_nms_topn_(OP_SINGLE_ARG(int64_t, "post_nms_top_n", 1000)),
+        nms_thresh_(OP_SINGLE_ARG(float, "nms_thresh", 0.7f)),
+        score_thresh_(OP_SINGLE_ARG(float, "score_thresh", 0.f)),
        min_level_(OP_SINGLE_ARG(int64_t, "min_level", 2)),
        max_level_(OP_SINGLE_ARG(int64_t, "max_level", 5)),
        canonical_level_(OP_SINGLE_ARG(int64_t, "canonical_level", 4)),
        canonical_scale_(OP_SINGLE_ARG(int64_t, "canonical_scale", 224)) {}
  USE_OPERATOR_FUNCTIONS;
-  void RunOnDevice() override;
+  void RunOnDevice() override {
+    DispatchHelper<dtypes::TypesBase<float>>::Call(this, Input(SCORES));
+  }
  template <typename T>
  void DoRunWithType();
-  enum INPUT_TAGS { SCORES = -3, DELTAS = -2, IMAGE_INFO = -1 };
+  enum INPUT_TAGS { SCORES = 0, DELTAS = 1, IM_INFO = 2, GRID_INFO = 3 };
 protected:
-  float nms_thr_;
-  vec64_t strides_, indices_, roi_indices_;
-  vector<float> ratios_, scales_, scores_, anchors_;
-  int64_t pre_nms_top_n_, post_nms_top_n_;
-  int64_t num_images_, min_level_, max_level_;
+  float nms_thresh_, score_thresh_;
+  vector<int64_t> strides_;
+  vector<float> ratios_, scales_;
+  int64_t min_level_, max_level_;
+  int64_t pre_nms_topn_, post_nms_topn_;
  int64_t canonical_level_, canonical_scale_;
-  Tensor proposals_;
+  vector<float> scores_;
+  vector<int64_t> indices_, nms_indices_;
+  vector<vector<float>> cell_anchors_;
+  vector<vector<float>> output_rois_;
};
} // namespace dragon
-#endif // SEETADET_CXX_OPERATORS_RPN_DECODER_OP_H_
+#endif // DRAGON_EXTENSION_OPERATORS_RPN_DECODER_OP_H_
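The min_level/max_level/canonical_level/canonical_scale arguments control how decoded RoIs are spread across the FPN outputs (the ApplyHistogram step in the .cc file). A sketch of the standard FPN assignment rule these defaults correspond to; assuming the operator follows the FPN paper's formula:
```python
import math

# FPN paper rule: k = floor(k0 + log2(sqrt(w * h) / s0)), clipped to the
# level range. Defaults mirror the header above: k0=4 (canonical_level),
# s0=224 (canonical_scale), levels [min_level, max_level] = [2, 5].
def fpn_level(w, h, k0=4, s0=224, k_min=2, k_max=5):
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / s0))
    return min(max(k, k_min), k_max)

print(fpn_level(224, 224))  # 4: a canonical-scale RoI stays on level 4
print(fpn_level(112, 112))  # 3: half the scale drops one level
```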
@@ -8,7 +8,7 @@
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
-"""Build cxx sources."""
+"""Build cpp extensions."""
from __future__ import absolute_import
from __future__ import division
@@ -16,7 +16,7 @@ from __future__ import print_function
import glob
-from dragon.tools import cpp_extension
+from dragon.utils import cpp_extension
from setuptools import setup
Extension = cpp_extension.CppExtension
@@ -32,23 +32,18 @@ def find_sources(*dirs):
    sources = []
    for path in dirs:
        for ext_suffix in ext_suffixes:
-            sources += glob.glob(
-                path + '/*' + ext_suffix,
-                recursive=True,
-            )
+            sources += glob.glob(path + '/*' + ext_suffix, recursive=True)
    return sources
ext_modules = [
    Extension(
-        name='install.lib.modules._C',
+        name='seetadet.ops._C',
        sources=find_sources('**'),
        define_macros=[('THRUST_IGNORE_CUB_VERSION_CHECK', None)],
    ),
]
-setup(
-    name='SeetaDet',
-    ext_modules=ext_modules,
-    cmdclass={'build_ext': cpp_extension.BuildExtension},
-)
+setup(name='seetadet',
+      ext_modules=ext_modules,
+      cmdclass={'build_ext': cpp_extension.BuildExtension})
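With the `BuildExtension` cmdclass above, compiling follows the standard setuptools flow, e.g. `python setup.py build_ext --inplace` (a typical invocation, assuming a working dragon toolchain), after which the compiled module is importable as `seetadet.ops._C`.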
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_H_
#include "../utils/detection/anchors.h"
#include "../utils/detection/bbox.h"
#include "../utils/detection/nms.h"
#include "../utils/detection/proposals.h"
#include "../utils/detection/types.h"
#endif // DRAGON_EXTENSION_UTILS_DETECTION_H_
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_ANCHORS_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_ANCHORS_H_
#include "../../utils/detection/types.h"
namespace dragon {
namespace detection {
/*!
* Anchor Functions.
*/
template <typename IndexT>
inline void SetGridArgs(
const int num_anchors,
const int num_cell_anchors,
const IndexT* grid_info,
vector<GridArgs<IndexT>>& grid_args) {
IndexT grid_offset = 0;
for (int i = 0; i < grid_args.size(); ++i, grid_info += 2) {
auto& args = grid_args[i];
args.h = grid_info[0];
args.w = grid_info[1];
args.offset = grid_offset;
grid_offset += num_cell_anchors * args.h * args.w;
}
std::stringstream ss;
if (grid_offset != num_anchors) {
ss << "Mismatched number of anchors. (Excepted ";
ss << num_anchors << ", Got " << grid_offset << ")";
for (int i = 0; i < grid_args.size(); ++i) {
ss << "\nGrid #" << i << ": "
<< "A=" << num_cell_anchors << ", H=" << grid_args[i].h
<< ", W=" << grid_args[i].w << "\n";
}
}
if (!ss.str().empty()) LOG(FATAL) << ss.str();
}
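// Illustrative example (values assumed): with num_cell_anchors = 3 and two
// grids of (h, w) = (100, 152) and (50, 76), the offsets become 0 and
// 3 * 100 * 152 = 45600, so num_anchors must equal 45600 + 3 * 50 * 76 =
// 57000; any other total triggers the LOG(FATAL) above.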
template <typename T>
inline void GenerateAnchors(
const int stride,
const int num_ratios,
const int num_scales,
const T* ratios,
const T* scales,
T* anchors) {
T* offset_anchors = anchors;
const T area = T(stride * stride);
const T ctr = T(0.5) * T(stride - 1);
for (int i = 0; i < num_ratios; ++i) {
const T ratio_w = std::round(std::sqrt(area / ratios[i]));
const T ratio_h = std::round(ratio_w * ratios[i]);
for (int j = 0; j < num_scales; ++j) {
const T w_half = T(0.5) * (ratio_w * scales[j] - T(1));
const T h_half = T(0.5) * (ratio_h * scales[j] - T(1));
offset_anchors[0] = ctr - w_half;
offset_anchors[1] = ctr - h_half;
offset_anchors[2] = ctr + w_half;
offset_anchors[3] = ctr + h_half;
offset_anchors += 4;
}
}
}
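// Worked example (illustrative): stride = 16, ratio = 0.5, scale = 8 gives
// area = 256, ratio_w = round(sqrt(256 / 0.5)) = 23, ratio_h = round(11.5) = 12,
// i.e. a 184 x 96 anchor centered at 7.5: (-84, -40, 99, 55), matching the
// classic py-faster-rcnn generate_anchors output.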
template <typename T>
inline void GetAnchors(
const int num_anchors,
const int num_cell_anchors,
const GridArgs<int64_t>& args,
const T* cell_anchors,
const int64_t* indices,
T* anchors) {
const int64_t index_min = args.offset;
const int64_t index_max = num_cell_anchors * args.h * args.w;
for (int i = 0; i < num_anchors; ++i) {
auto index = indices[i] - index_min;
if (index >= 0 && index < index_max) {
const auto w = index % args.w;
index /= args.w;
const auto h = index % args.h;
index /= args.h;
const auto shift_x = T(w * args.stride);
const auto shift_y = T(h * args.stride);
auto* offset_anchors = anchors + i * 5;
const auto* offset_cell_anchors = cell_anchors + index * 4;
offset_anchors[0] = shift_x + offset_cell_anchors[0];
offset_anchors[1] = shift_y + offset_cell_anchors[1];
offset_anchors[2] = shift_x + offset_cell_anchors[2];
offset_anchors[3] = shift_y + offset_cell_anchors[3];
}
}
}
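// Index layout note: each index addresses the flattened (A, H, W) grid of
// one level, shifted by args.offset. E.g. with A = 3, H = 2, W = 4, a local
// index of 13 decodes to w = 1, h = 1, a = 1; indices outside
// [offset, offset + A * H * W) belong to another level and are skipped.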
template <typename T>
inline void GetAnchors(
const int num_anchors,
const int num_cell_anchors,
const int num_classes,
const GridArgs<int64_t>& args,
const T* cell_anchors,
const int64_t* indices,
T* anchors) {
const int64_t index_min = num_classes * args.offset;
const int64_t index_max = num_classes * (num_cell_anchors * args.h * args.w);
for (int i = 0; i < num_anchors; ++i) {
auto index = indices[i] - index_min;
if (index >= 0 && index < index_max) {
index /= num_classes;
const auto w = index % args.w;
index /= args.w;
const auto h = index % args.h;
index /= args.h;
const auto shift_x = T(w * args.stride);
const auto shift_y = T(h * args.stride);
auto* offset_anchors = anchors + i * 7 + 1;
const auto* offset_cell_anchors = cell_anchors + index * 4;
offset_anchors[0] = shift_x + offset_cell_anchors[0];
offset_anchors[1] = shift_y + offset_cell_anchors[1];
offset_anchors[2] = shift_x + offset_cell_anchors[2];
offset_anchors[3] = shift_y + offset_cell_anchors[3];
}
}
}
} // namespace detection
} // namespace dragon
#endif // DRAGON_EXTENSION_UTILS_DETECTION_ANCHORS_H_
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_BBOX_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_BBOX_H_
#include "../../utils/detection/types.h"
#if defined(__CUDACC__)
#define HOSTDEVICE_DECL inline __host__ __device__
#else
#define HOSTDEVICE_DECL inline
#endif
namespace dragon {
namespace detection {
/*
* BBox Functions.
*/
template <typename T, class BoxT>
inline void SortBoxes(const int N, T* data, bool descend = true) {
auto* boxes = reinterpret_cast<BoxT*>(data);
std::sort(boxes, boxes + N, [descend](BoxT lhs, BoxT rhs) {
return descend ? (lhs.score > rhs.score) : (lhs.score < rhs.score);
});
}
/*
* BBox Utilities.
*/
namespace utils {
template <typename T>
HOSTDEVICE_DECL bool CheckIoU(const T thresh, const T* a, const T* b) {
#if defined(__CUDACC__)
const T x1 = max(a[0], b[0]);
const T y1 = max(a[1], b[1]);
const T x2 = min(a[2], b[2]);
const T y2 = min(a[3], b[3]);
const T width = max(T(0), x2 - x1 + T(1));
const T height = max(T(0), y2 - y1 + T(1));
#else
const T x1 = std::max(a[0], b[0]);
const T y1 = std::max(a[1], b[1]);
const T x2 = std::min(a[2], b[2]);
const T y2 = std::min(a[3], b[3]);
const T width = std::max(T(0), x2 - x1 + T(1));
const T height = std::max(T(0), y2 - y1 + T(1));
#endif
const T inter = width * height;
const T Sa = (a[2] - a[0] + T(1)) * (a[3] - a[1] + T(1));
const T Sb = (b[2] - b[0] + T(1)) * (b[3] - b[1] + T(1));
return inter > thresh * (Sa + Sb - inter);
}
template <typename T>
inline void BBoxTransform(
const T dx,
const T dy,
const T dw,
const T dh,
const T im_w,
const T im_h,
const T im_scale_h,
const T im_scale_w,
T* bbox) {
const T w = bbox[2] - bbox[0] + 1;
const T h = bbox[3] - bbox[1] + 1;
const T ctr_x = bbox[0] + T(0.5) * w;
const T ctr_y = bbox[1] + T(0.5) * h;
const T pred_ctr_x = dx * w + ctr_x;
const T pred_ctr_y = dy * h + ctr_y;
const T pred_w = std::exp(dw) * w;
const T pred_h = std::exp(dh) * h;
const T x1 = pred_ctr_x - T(0.5) * pred_w;
const T y1 = pred_ctr_y - T(0.5) * pred_h;
const T x2 = pred_ctr_x + T(0.5) * pred_w;
const T y2 = pred_ctr_y + T(0.5) * pred_h;
bbox[0] = std::max(T(0), std::min(x1, im_w - T(1))) / im_scale_w;
bbox[1] = std::max(T(0), std::min(y1, im_h - T(1))) / im_scale_h;
bbox[2] = std::max(T(0), std::min(x2, im_w - T(1))) / im_scale_w;
bbox[3] = std::max(T(0), std::min(y2, im_h - T(1))) / im_scale_h;
}
template <typename T>
inline int GetBBoxLevel(
const int lvl_min,
const int lvl_max,
const int lvl0,
const int s0,
T* bbox) {
const T w = bbox[2] - bbox[0] + 1;
const T h = bbox[3] - bbox[1] + 1;
const T s = std::sqrt(w * h);
const int lvl = lvl0 + std::log2(s / s0 + T(1e-6));
return std::min(std::max(lvl, lvl_min), lvl_max);
}
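// Worked example (illustrative): with lvl0 = 4 and s0 = 224 (the canonical
// FPN settings), a 112 x 112 box gives lvl = 4 + log2(112 / 224) = 3 before
// clamping to [lvl_min, lvl_max]; this is Eq. (1) of the FPN paper
// (arXiv:1612.03144).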
} // namespace utils
} // namespace detection
} // namespace dragon
#undef HOSTDEVICE_DECL
#endif // DRAGON_EXTENSION_UTILS_DETECTION_BBOX_H_
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_ITERATOR_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_ITERATOR_H_
#include <dragon/core/common.h>
namespace dragon {
namespace detection {
template <typename MapT>
class KeyValueMapIterator
: public std::iterator<std::input_iterator_tag, MapT> {
public:
typedef KeyValueMapIterator self_type;
typedef ptrdiff_t difference_type;
typedef MapT value_type;
typedef MapT& reference;
KeyValueMapIterator(
typename MapT::key_type* key_ptr,
typename MapT::value_type* value_ptr)
: key_ptr_(key_ptr), value_ptr_(value_ptr) {}
self_type operator++(int) {
self_type ret = *this;
key_ptr_++;
value_ptr_++;
return ret;
}
self_type operator++() {
key_ptr_++;
value_ptr_++;
return *this;
}
self_type operator--() {
key_ptr_--;
value_ptr_--;
return *this;
}
self_type operator--(int) {
self_type ret = *this;
key_ptr_--;
value_ptr_--;
return ret;
}
reference operator*() const {
if (map_.key_ptr != key_ptr_) {
map_.key_ptr = key_ptr_;
map_.value_ptr = value_ptr_;
}
return map_;
}
self_type operator+(difference_type n) const {
return self_type(key_ptr_ + n, value_ptr_ + n);
}
self_type& operator+=(difference_type n) {
key_ptr_ += n;
value_ptr_ += n;
return *this;
}
self_type operator-(difference_type n) const {
return self_type(key_ptr_ - n, value_ptr_ - n);
}
self_type& operator-=(difference_type n) {
key_ptr_ -= n;
value_ptr_ -= n;
return *this;
}
difference_type operator-(self_type other) const {
return key_ptr_ - other.key_ptr_;
}
bool operator<(const self_type& rhs) const {
return key_ptr_ < rhs.key_ptr_;
}
bool operator<=(const self_type& rhs) const {
return key_ptr_ <= rhs.key_ptr_;
}
bool operator==(const self_type& rhs) const {
return key_ptr_ == rhs.key_ptr_;
}
bool operator!=(const self_type& rhs) const {
return key_ptr_ != rhs.key_ptr_;
}
private:
mutable MapT map_;
typename MapT::key_type* key_ptr_;
typename MapT::value_type* value_ptr_;
};
} // namespace detection
} // namespace dragon
#endif // DRAGON_EXTENSION_UTILS_DETECTION_ITERATOR_H_
#include <dragon/core/context.h>
#include "../../utils/detection/bbox.h"
#include "../../utils/detection/nms.h"
namespace dragon {
namespace detection {
template <>
void ApplyNMS<float, CPUContext>(
const int N,
const int K,
const float thresh,
const float* boxes,
vector<int64_t>& indices,
CPUContext* ctx) {
int num_selected = 0;
indices.resize(K);
vector<char> is_dead(N, 0);
for (int i = 0; i < N; ++i) {
if (is_dead[i]) continue;
indices[num_selected++] = i;
if (num_selected >= K) break;
for (int j = i + 1; j < N; ++j) {
if (is_dead[j]) continue;
if (!utils::CheckIoU(thresh, &boxes[i * 5], &boxes[j * 5])) continue;
is_dead[j] = 1;
}
}
indices.resize(num_selected);
}
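// Note: the standard greedy NMS sweep. It assumes boxes are pre-sorted by
// descending score (see SortBoxes in bbox.h), keeps at most K survivors,
// and runs in O(N^2) time in the worst case.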
} // namespace detection
} // namespace dragon
#include <dragon/core/context_cuda.h>
#include <dragon/core/workspace.h>
#include "../../utils/detection/bbox.h"
#include "../../utils/detection/nms.h"
#include "../../utils/detection/utils.h"
namespace dragon {
namespace detection {
namespace {
#define NUM_THREADS 64
template <typename T>
__global__ void _NonMaxSuppression(
const int N,
const T thresh,
const T* boxes,
uint64_t* mask) {
const int row_start = blockIdx.y;
const int col_start = blockIdx.x;
if (row_start > col_start) return;
const int row_size = min(N - row_start * NUM_THREADS, NUM_THREADS);
const int col_size = min(N - col_start * NUM_THREADS, NUM_THREADS);
__shared__ T block_boxes[NUM_THREADS * 4];
if (threadIdx.x < col_size) {
auto* offset_block_boxes = block_boxes + threadIdx.x * 4;
auto* offset_boxes = boxes + (col_start * NUM_THREADS + threadIdx.x) * 5;
#pragma unroll
for (int i = 0; i < 4; ++i) {
*(offset_block_boxes++) = *(offset_boxes++);
}
}
__syncthreads();
if (threadIdx.x < row_size) {
const int index = row_start * NUM_THREADS + threadIdx.x;
const T* offset_boxes = boxes + index * 5;
unsigned long long val = 0;
const int start = (row_start == col_start) ? (threadIdx.x + 1) : 0;
for (int i = start; i < col_size; ++i) {
if (utils::CheckIoU(thresh, offset_boxes, block_boxes + i * 4)) {
val |= 1ULL << i;
}
}
mask[index * gridDim.x + col_start] = val;
}
}
} // namespace
template <>
void ApplyNMS<float, CUDAContext>(
const int N,
const int K,
const float thresh,
const float* boxes,
vector<int64_t>& indices,
CUDAContext* ctx) {
const auto num_blocks = utils::DivUp(N, NUM_THREADS);
vector<uint64_t> mask_host(N * num_blocks);
auto* mask_dev = (uint64_t*)ctx->workspace()->data<CUDAContext>(
mask_host.size() * sizeof(uint64_t), "BufferKernel");
_NonMaxSuppression<<<
dim3(num_blocks, num_blocks),
NUM_THREADS,
0,
ctx->cuda_stream()>>>(N, thresh, boxes, mask_dev);
CUDA_CHECK(cudaMemcpyAsync(
mask_host.data(),
mask_dev,
mask_host.size() * sizeof(uint64_t),
cudaMemcpyDeviceToHost,
ctx->cuda_stream()));
ctx->FinishDeviceComputation();
vector<uint64_t> is_dead(num_blocks);
memset(&is_dead[0], 0, sizeof(uint64_t) * num_blocks);
int num_selected = 0;
indices.resize(K);
for (int i = 0; i < N; ++i) {
const int nblock = i / NUM_THREADS;
const int inblock = i % NUM_THREADS;
if (!(is_dead[nblock] & (1ULL << inblock))) {
indices[num_selected++] = i;
if (num_selected >= K) break;
auto* offset_mask = &mask_host[0] + i * num_blocks;
for (int j = nblock; j < num_blocks; ++j) {
is_dead[j] |= offset_mask[j];
}
}
}
indices.resize(num_selected);
}
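// Note: the kernel tiles the N boxes into 64-wide blocks; for each box i it
// writes one 64-bit word per column block whose set bits mark the boxes
// that i suppresses. The greedy sweep over this bitmask runs on the host.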
} // namespace detection
} // namespace dragon
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_NMS_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_NMS_H_
#include "../../utils/detection/types.h"
namespace dragon {
namespace detection {
template <typename T, class Context>
void ApplyNMS(
const int N,
const int K,
const T thresh,
const T* boxes,
vector<int64_t>& indices,
Context* ctx);
} // namespace detection
} // namespace dragon
#endif // DRAGON_EXTENSION_UTILS_DETECTION_NMS_H_
#include <dragon/core/context.h>
#include "../../utils/detection/proposals.h"
namespace dragon {
namespace detection {
namespace {
template <typename KeyT, typename ValueT>
inline void
ArgPartition(const int N, const int K, const ValueT* values, KeyT* keys) {
std::nth_element(keys, keys + K, keys + N, [&values](KeyT lhs, KeyT rhs) {
return values[lhs] > values[rhs];
});
}
} // namespace
template <>
void SelectProposals<float, CPUContext>(
const int N,
const int K,
const float thresh,
const float* scores,
vector<float>& out_scores,
vector<int64_t>& out_indices,
CPUContext* ctx) {
int num_selected = 0;
out_indices.resize(N);
if (thresh > 0.f) {
for (int i = 0; i < N; ++i) {
if (scores[i] > thresh) {
out_indices[num_selected++] = i;
}
}
} else {
num_selected = N;
std::iota(out_indices.begin(), out_indices.end(), 0);
}
if (num_selected > K) {
ArgPartition(num_selected, K, scores, out_indices.data());
out_scores.resize(K);
out_indices.resize(K);
for (int i = 0; i < K; ++i) {
out_scores[i] = scores[out_indices[i]];
}
} else {
out_scores.resize(num_selected);
out_indices.resize(num_selected);
for (int i = 0; i < num_selected; ++i) {
out_scores[i] = scores[out_indices[i]];
}
}
}
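// Note: scores above thresh are kept (all N when thresh <= 0), then
// ArgPartition applies std::nth_element for an O(N) average-time top-K
// cut; the surviving K entries are not returned in score order.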
} // namespace detection
} // namespace dragon
#include <dragon/core/context_cuda.h>
#include <dragon/core/workspace.h>
#include <dragon/utils/device/common_thrust.h>
#include "../../utils/detection/iterator.h"
#include "../../utils/detection/proposals.h"
namespace dragon {
namespace detection {
namespace {
template <typename KeyT, typename ValueT>
struct ThresholdFunctor {
ThresholdFunctor(ValueT thresh) : thresh_(thresh) {}
inline __device__ bool operator()(
const thrust::tuple<KeyT, ValueT>& kv) const {
return thrust::get<1>(kv) > thresh_;
}
ValueT thresh_;
};
template <typename IterT>
inline void ArgPartition(const int N, const int K, IterT data) {
std::nth_element(
data,
data + K,
data + N,
[](const typename IterT::value_type& lhs,
const typename IterT::value_type& rhs) {
return *lhs.value_ptr > *rhs.value_ptr;
});
}
} // namespace
template <>
void SelectProposals<float, CUDAContext>(
const int N,
const int K,
const float thresh,
const float* scores,
vector<float>& out_scores,
vector<int64_t>& out_indices,
CUDAContext* ctx) {
int num_selected = N;
int64_t* indices = nullptr;
if (thresh > 0.f) {
indices = ctx->workspace()->data<int64_t, CUDAContext>(N, "BufferKernel");
auto policy = thrust::cuda::par.on(ctx->cuda_stream());
auto functor = ThresholdFunctor<int64_t, float>(thresh);
thrust::sequence(policy, indices, indices + N);
auto kv = thrust::make_tuple(indices, const_cast<float*>(scores));
auto first = thrust::make_zip_iterator(kv);
auto last = thrust::partition(policy, first, first + N, functor);
num_selected = last - first;
}
out_scores.resize(num_selected);
out_indices.resize(num_selected);
CUDA_CHECK(cudaMemcpyAsync(
out_scores.data(),
scores,
num_selected * sizeof(float),
cudaMemcpyDeviceToHost,
ctx->cuda_stream()));
if (thresh > 0.f) {
CUDA_CHECK(cudaMemcpyAsync(
out_indices.data(),
indices,
num_selected * sizeof(int64_t),
cudaMemcpyDeviceToHost,
ctx->cuda_stream()));
} else {
std::iota(out_indices.begin(), out_indices.end(), 0);
}
ctx->FinishDeviceComputation();
if (num_selected > K) {
auto iter = KeyValueMapIterator<KeyValueMap<int64_t, float>>(
out_indices.data(), out_scores.data());
ArgPartition(num_selected, K, iter);
out_scores.resize(K);
out_indices.resize(K);
}
}
} // namespace detection
} // namespace dragon
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_PROPOSALS_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_PROPOSALS_H_
#include "../../utils/detection/bbox.h"
#include "../../utils/detection/types.h"
namespace dragon {
namespace detection {
template <typename T, class Context>
void SelectProposals(
const int N,
const int K,
const float thresh,
const T* input_scores,
vector<T>& output_scores,
vector<int64_t>& output_indices,
Context* ctx);
template <typename T>
void DecodeProposals(
const int num_proposals,
const int num_anchors,
const ImageArgs<int64_t>& im_args,
const T* scores,
const T* deltas,
const int64_t* indices,
T* proposals) {
T* offset_proposals = proposals;
const T* offset_dx = deltas;
const T* offset_dy = deltas + num_anchors;
const T* offset_dw = deltas + num_anchors * 2;
const T* offset_dh = deltas + num_anchors * 3;
for (int i = 0; i < num_proposals; ++i) {
const auto index = indices[i];
utils::BBoxTransform(
offset_dx[index],
offset_dy[index],
offset_dw[index],
offset_dh[index],
T(im_args.w),
T(im_args.h),
T(1),
T(1),
offset_proposals);
offset_proposals[4] = scores[i];
offset_proposals += 5;
}
}
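// Layout note: deltas are planar, i.e. shaped (4, num_anchors) with
// contiguous dx / dy / dw / dh planes, and each output proposal is a
// 5-tuple (x1, y1, x2, y2, score) clipped to the image extent.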
template <typename T>
void DecodeDetections(
const int num_dets,
const int num_anchors,
const int num_classes,
const ImageArgs<int64_t>& im_args,
const T* scores,
const T* deltas,
const int64_t* indices,
T* dets) {
T* offset_dets = dets;
const T* offset_dx = deltas;
const T* offset_dy = deltas + num_anchors;
const T* offset_dw = deltas + num_anchors * 2;
const T* offset_dh = deltas + num_anchors * 3;
for (int i = 0; i < num_dets; ++i) {
const auto index = indices[i] / num_classes;
utils::BBoxTransform(
offset_dx[index],
offset_dy[index],
offset_dw[index],
offset_dh[index],
T(im_args.w),
T(im_args.h),
T(im_args.scale_h),
T(im_args.scale_w),
offset_dets + 1);
offset_dets[0] = T(im_args.batch_ind);
offset_dets[5] = scores[i];
offset_dets[6] = T(indices[i] % num_classes + 1);
offset_dets += 7;
}
}
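// Layout note: each output detection is a 7-tuple
// (batch_ind, x1, y1, x2, y2, score, class). Indices enumerate the
// flattened (num_anchors, num_classes) scores, and boxes are mapped back
// to the original image by dividing out im_scale_h / im_scale_w.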
template <typename T>
inline void ApplyHistogram(
const int N,
const int lvl_min,
const int lvl_max,
const int lvl0,
const int s0,
const T* boxes,
const T* batch_indices,
const int64_t* box_indices,
vector<vector<T>>& output_rois) {
vector<int> bin_indices(N);
vector<int> bin_count(lvl_max - lvl_min + 1, 0);
for (int i = 0; i < N; ++i) {
const T* offset_boxes = boxes + box_indices[i] * 5;
auto lvl = utils::GetBBoxLevel(lvl_min, lvl_max, lvl0, s0, offset_boxes);
bin_indices[i] = lvl - lvl_min;
bin_count[lvl - lvl_min]++;
}
output_rois.resize(lvl_max - lvl_min + 1);
for (int i = 0; i < output_rois.size(); ++i) {
auto& rois = output_rois[i];
rois.resize(std::max(bin_count[i], 1) * 5);
if (bin_count[i] == 0) rois[0] = T(-1); // Ignored.
}
for (int i = 0; i < N; ++i) {
const T* offset_boxes = boxes + box_indices[i] * 5;
const auto bin_index = bin_indices[i];
const auto roi_index = --bin_count[bin_index];
auto& rois = output_rois[bin_index];
T* offset_rois = rois.data() + roi_index * 5;
offset_rois[0] = batch_indices[i];
offset_rois[1] = offset_boxes[0];
offset_rois[2] = offset_boxes[1];
offset_rois[3] = offset_boxes[2];
offset_rois[4] = offset_boxes[3];
}
}
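// Note: a two-pass counting sort over FPN levels. The second pass fills
// each level bucket back-to-front via --bin_count; a level with no RoIs
// gets one placeholder row with batch index -1 so downstream RoI ops
// still see a non-empty input.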
} // namespace detection
} // namespace dragon
#endif // DRAGON_EXTENSION_UTILS_DETECTION_PROPOSALS_H_
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_TYPES_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_TYPES_H_
#include <dragon/core/common.h>
namespace dragon {
namespace detection {
template <typename T>
struct Box4d {
T x1, y1, x2, y2;
};
template <typename T>
struct Box5d {
T x1, y1, x2, y2, score;
};
template <typename IndexT>
struct ImageArgs {
ImageArgs(const float* im_info) {
h = im_info[0], w = im_info[1];
scale_h = im_info[2], scale_w = im_info[3];
}
IndexT batch_ind, h, w;
float scale_h, scale_w;
};
template <typename IndexT>
struct GridArgs {
IndexT h, w, stride, offset;
};
template <typename KeyT, typename ValueT>
struct KeyValueMap {
typedef KeyT key_type;
typedef ValueT value_type;
friend void swap(KeyValueMap& x, KeyValueMap& y) {
std::swap(*x.key_ptr, *y.key_ptr);
std::swap(*x.value_ptr, *y.value_ptr);
}
KeyT* key_ptr = nullptr;
ValueT* value_ptr = nullptr;
};
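// Note: KeyValueMap pairs with KeyValueMapIterator (iterator.h) so that
// std algorithms such as std::nth_element can reorder two parallel arrays
// (e.g. indices and scores) in lockstep: the ADL-found swap above
// exchanges the pointed-to key and value together.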
} // namespace detection
} // namespace dragon
#endif // DRAGON_EXTENSION_UTILS_DETECTION_TYPES_H_
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_UTILS_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_UTILS_H_
namespace dragon {
namespace detection {
/*
* Detection Utilities.
*/
namespace utils {
template <typename T>
inline T DivUp(const T a, const T b) {
return (a + b - T(1)) / b;
}
} // namespace utils
} // namespace detection
} // namespace dragon
#endif // DRAGON_EXTENSION_UTILS_DETECTION_UTILS_H_
#include "detection_utils.h"
#include <dragon/core/context.h>
namespace dragon {
namespace utils {
namespace detection {
template <typename T>
T IoU(const T A[], const T B[]) {
if (A[0] > B[2] || A[1] > B[3] || A[2] < B[0] || A[3] < B[1]) return 0;
const T x1 = std::max(A[0], B[0]);
const T y1 = std::max(A[1], B[1]);
const T x2 = std::min(A[2], B[2]);
const T y2 = std::min(A[3], B[3]);
const T width = std::max((T)0, x2 - x1 + 1);
const T height = std::max((T)0, y2 - y1 + 1);
const T area = width * height;
const T A_area = (A[2] - A[0] + 1) * (A[3] - A[1] + 1);
const T B_area = (B[2] - B[0] + 1) * (B[3] - B[1] + 1);
return area / (A_area + B_area - area);
}
template <>
void ApplyNMS<float, CPUContext>(
const int num_boxes,
const int max_keeps,
const float thresh,
const float* boxes,
int64_t* keep_indices,
int& num_keep,
CPUContext* ctx) {
int count = 0;
std::vector<char> is_dead(num_boxes);
for (int i = 0; i < num_boxes; ++i)
is_dead[i] = 0;
for (int i = 0; i < num_boxes; ++i) {
if (is_dead[i]) continue;
keep_indices[count++] = i;
if (count == max_keeps) break;
for (int j = i + 1; j < num_boxes; ++j)
if (!is_dead[j] && IoU(&boxes[i * 5], &boxes[j * 5]) > thresh) {
is_dead[j] = 1;
}
}
num_keep = count;
}
template <>
void SelectProposals<float, CPUContext>(
const int count,
const float score_thresh,
const float* input_scores,
vector<float>& output_scores,
vector<int64_t>& output_indices,
CPUContext* ctx) {
int num_proposals = 0;
for (int i = 0; i < count; ++i) {
if (input_scores[i] > score_thresh) {
output_indices[num_proposals++] = i;
}
}
output_scores.resize(num_proposals);
for (int i = 0; i < num_proposals; ++i) {
output_scores[i] = input_scores[output_indices[i]];
}
}
} // namespace detection
} // namespace utils
} // namespace dragon
#ifdef USE_CUDA
#include <dragon/core/context_cuda.h>
#include <dragon/core/workspace.h>
#include <dragon/utils/device/common_cub.h>
#include <dragon/utils/device/common_thrust.h>
#include "detection_utils.h"
namespace dragon {
namespace utils {
namespace detection {
#define DIV_UP(m, n) ((m) / (n) + ((m) % (n) > 0))
#define NUM_THREADS 64
namespace {
template <typename T>
struct ThresholdFunctor {
ThresholdFunctor(float thresh) : thresh_(thresh) {}
inline __device__ bool operator()(
const thrust::tuple<int64_t, T>& key_val) const {
return thrust::get<1>(key_val) > thresh_;
}
float thresh_;
};
template <typename T>
__device__ bool _CheckIoU(const T* a, const T* b, const float thresh) {
const T x1 = max(a[0], b[0]);
const T y1 = max(a[1], b[1]);
const T x2 = min(a[2], b[2]);
const T y2 = min(a[3], b[3]);
const T width = max(T(0), x2 - x1 + 1);
const T height = max(T(0), y2 - y1 + 1);
const T inter = width * height;
const T Sa = (a[2] - a[0] + T(1)) * (a[3] - a[1] + T(1));
const T Sb = (b[2] - b[0] + T(1)) * (b[3] - b[1] + T(1));
return inter > thresh * (Sa + Sb - inter);
}
template <typename T>
__global__ void _NonMaxSuppression(
const int num_blocks,
const int num_boxes,
const T thresh,
const T* dev_boxes,
uint64_t* dev_mask) {
const int row_start = blockIdx.y;
const int col_start = blockIdx.x;
if (row_start > col_start) return;
const int row_size = min(num_boxes - row_start * NUM_THREADS, NUM_THREADS);
const int col_size = min(num_boxes - col_start * NUM_THREADS, NUM_THREADS);
__shared__ T block_boxes[NUM_THREADS * 4];
if (threadIdx.x < col_size) {
const int c1 = threadIdx.x * 4;
const int c2 = (col_start * NUM_THREADS + threadIdx.x) * 5;
block_boxes[c1] = dev_boxes[c2];
block_boxes[c1 + 1] = dev_boxes[c2 + 1];
block_boxes[c1 + 2] = dev_boxes[c2 + 2];
block_boxes[c1 + 3] = dev_boxes[c2 + 3];
}
__syncthreads();
if (threadIdx.x < row_size) {
const int index = row_start * NUM_THREADS + threadIdx.x;
const T* dev_box = dev_boxes + index * 5;
unsigned long long val = 0;
const int start = (row_start == col_start) ? (threadIdx.x + 1) : 0;
for (int i = start; i < col_size; ++i) {
if (_CheckIoU(dev_box, block_boxes + i * 4, thresh)) {
val |= 1ULL << i;
}
}
dev_mask[index * num_blocks + col_start] = val;
}
}
} // namespace
template <>
void SelectProposals<float, CUDAContext>(
const int count,
const float score_thresh,
const float* in_scores,
vector<float>& out_scores,
vector<int64_t>& out_indices,
CUDAContext* ctx) {
auto* in_indices = ctx->workspace()->template data<int64_t, CUDAContext>(
{count}, "data:1")[0];
auto iter = thrust::make_zip_iterator(
thrust::make_tuple(in_indices, const_cast<float*>(in_scores)));
auto policy = thrust::cuda::par.on(ctx->cuda_stream());
thrust::counting_iterator<int64_t> offset(0);
thrust::copy(policy, offset, offset + count, in_indices);
auto last = thrust::partition(
policy, iter, iter + count, ThresholdFunctor<float>(score_thresh));
size_t num_proposals = last - iter;
out_scores.resize(num_proposals);
out_indices.resize(num_proposals);
CUDA_CHECK(cudaMemcpyAsync(
out_scores.data(),
in_scores,
num_proposals * sizeof(float),
cudaMemcpyDeviceToHost,
ctx->cuda_stream()));
CUDA_CHECK(cudaMemcpyAsync(
out_indices.data(),
in_indices,
num_proposals * sizeof(int64_t),
cudaMemcpyDeviceToHost,
ctx->cuda_stream()));
ctx->FinishDeviceComputation();
}
template <>
void ApplyNMS<float, CUDAContext>(
const int num_boxes,
const int max_keeps,
const float thresh,
const float* boxes,
int64_t* keep_indices,
int& num_keep,
CUDAContext* ctx) {
const int num_blocks = DIV_UP(num_boxes, NUM_THREADS);
vector<uint64_t> mask_host(num_boxes * num_blocks);
auto* mask_dev = (uint64_t*)ctx->workspace()->data<CUDAContext>(
{mask_host.size() * sizeof(uint64_t)}, "data:1")[0];
_NonMaxSuppression<<<
dim3(num_blocks, num_blocks),
NUM_THREADS,
0,
ctx->cuda_stream()>>>(num_blocks, num_boxes, thresh, boxes, mask_dev);
CUDA_CHECK(cudaMemcpyAsync(
mask_host.data(),
mask_dev,
mask_host.size() * sizeof(uint64_t),
cudaMemcpyDeviceToHost,
ctx->cuda_stream()));
ctx->FinishDeviceComputation();
vector<uint64_t> dead_bit(num_blocks);
memset(&dead_bit[0], 0, sizeof(uint64_t) * num_blocks);
int num_selected = 0;
for (int i = 0; i < num_boxes; ++i) {
const int nblock = i / NUM_THREADS;
const int inblock = i % NUM_THREADS;
if (!(dead_bit[nblock] & (1ULL << inblock))) {
keep_indices[num_selected++] = i;
auto* mask_i = &mask_host[0] + i * num_blocks;
for (int j = nblock; j < num_blocks; ++j)
dead_bit[j] |= mask_i[j];
if (num_selected == max_keeps) break;
}
}
num_keep = num_selected;
}
} // namespace detection
} // namespace utils
} // namespace dragon
#endif // USE_CUDA
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef SEETADET_CXX_UTILS_DETECTION_UTILS_H_
#define SEETADET_CXX_UTILS_DETECTION_UTILS_H_
#include "dragon/core/common.h"
namespace dragon {
namespace utils {
namespace detection {
#define ROUND(x) ((int)((x) + (T)0.5))
/*!
* Functional API
*/
template <typename T>
inline void ArgPartition(
const int count,
const int kth,
const bool descend,
const T* v,
vec64_t& indices) {
indices.resize(count);
std::iota(indices.begin(), indices.end(), 0);
if (descend) {
std::nth_element(
indices.begin(),
indices.begin() + kth,
indices.end(),
[&v](int64_t lhs, int64_t rhs) { return v[lhs] > v[rhs]; });
} else {
std::nth_element(
indices.begin(),
indices.begin() + kth,
indices.end(),
[&v](int64_t lhs, int64_t rhs) { return v[lhs] < v[rhs]; });
}
}
/*!
* Box API
*/
template <typename T>
inline void BBoxTransform(
const T dx,
const T dy,
const T d_log_w,
const T d_log_h,
const T im_w,
const T im_h,
const T im_scale_h,
const T im_scale_w,
T* bbox) {
const T w = bbox[2] - bbox[0] + 1;
const T h = bbox[3] - bbox[1] + 1;
const T ctr_x = bbox[0] + (T)0.5 * w;
const T ctr_y = bbox[1] + (T)0.5 * h;
const T pred_ctr_x = dx * w + ctr_x;
const T pred_ctr_y = dy * h + ctr_y;
const T pred_w = exp(d_log_w) * w;
const T pred_h = exp(d_log_h) * h;
bbox[0] = pred_ctr_x - (T)0.5 * pred_w;
bbox[1] = pred_ctr_y - (T)0.5 * pred_h;
bbox[2] = pred_ctr_x + (T)0.5 * pred_w;
bbox[3] = pred_ctr_y + (T)0.5 * pred_h;
bbox[0] = std::max((T)0, std::min(bbox[0], im_w - 1)) / im_scale_w;
bbox[1] = std::max((T)0, std::min(bbox[1], im_h - 1)) / im_scale_h;
bbox[2] = std::max((T)0, std::min(bbox[2], im_w - 1)) / im_scale_w;
bbox[3] = std::max((T)0, std::min(bbox[3], im_h - 1)) / im_scale_h;
}
/*!
* Anchor API
*/
template <typename T>
inline void GenerateAnchors(
int base_size,
const int num_ratios,
const int num_scales,
const T* ratios,
const T* scales,
T* anchors) {
const T base_area = (T)(base_size * base_size);
const T center = (T)0.5 * (base_size - (T)1);
T* offset_anchors = anchors;
for (int i = 0; i < num_ratios; ++i) {
const T ratio_w = (T)ROUND(sqrt(base_area / ratios[i]));
const T ratio_h = (T)ROUND(ratio_w * ratios[i]);
for (int j = 0; j < num_scales; ++j) {
const T scale_w = (T)0.5 * (ratio_w * scales[j] - (T)1);
const T scale_h = (T)0.5 * (ratio_h * scales[j] - (T)1);
offset_anchors[0] = center - scale_w;
offset_anchors[1] = center - scale_h;
offset_anchors[2] = center + scale_w;
offset_anchors[3] = center + scale_h;
offset_anchors += 4;
}
}
}
template <typename T>
inline void GetShiftedAnchors(
const int num_proposals,
const int num_anchors,
const int feat_h,
const int feat_w,
const int stride,
const int stride_offset,
const T* base_anchors,
const int64_t* indices,
T* shifted_anchors) {
T x, y;
int idx_3d, a, h, w;
int idx_range = num_anchors * feat_h * feat_w;
for (int i = 0; i < num_proposals; ++i) {
idx_3d = (int)indices[i] - stride_offset;
if (idx_3d >= 0 && idx_3d < idx_range) {
w = idx_3d % feat_w;
h = (idx_3d / feat_w) % feat_h;
a = idx_3d / feat_w / feat_h;
x = (T)w * stride, y = (T)h * stride;
auto* A = base_anchors + a * 4;
auto* P = shifted_anchors + i * 5;
P[0] = x + A[0], P[1] = y + A[1];
P[2] = x + A[2], P[3] = y + A[3];
}
}
}
template <typename T>
inline void GetShiftedAnchors(
const int num_proposals,
const int num_classes,
const int num_anchors,
const int feat_h,
const int feat_w,
const int stride,
const int stride_offset,
const T* base_anchors,
const int64_t* indices,
T* shifted_anchors) {
T x, y;
int idx_4d, a, h, w;
int lr = num_classes * stride_offset;
int rr = num_classes * (num_anchors * feat_h * feat_w);
for (int i = 0; i < num_proposals; ++i) {
idx_4d = (int)indices[i] - lr;
if (idx_4d >= 0 && idx_4d < rr) {
idx_4d /= num_classes;
w = idx_4d % feat_w;
h = (idx_4d / feat_w) % feat_h;
a = idx_4d / feat_w / feat_h;
x = (T)w * stride, y = (T)h * stride;
auto* A = base_anchors + a * 4;
auto* P = shifted_anchors + i * 7 + 1;
P[0] = x + A[0], P[1] = y + A[1];
P[2] = x + A[2], P[3] = y + A[3];
}
}
}
/*!
* Proposal API
*/
template <typename T, class Context>
void SelectProposals(
const int count,
const float score_thresh,
const T* input_scores,
vector<T>& output_scores,
vector<int64_t>& output_indices,
Context* ctx);
template <typename T>
void GenerateProposals_v1(
const int K,
const int num_proposals,
const float im_h,
const float im_w,
const T* scores,
const T* deltas,
const int64_t* indices,
T* proposals) {
// Shifted anchors in format: [K, A, 4]
int64_t index, a, k;
const T* delta;
T* proposal = proposals;
T dx, dy, d_log_w, d_log_h;
for (int i = 0; i < num_proposals; ++i) {
index = indices[i];
a = index / K, k = index % K;
delta = deltas + k;
dx = delta[(a * 4 + 0) * K];
dy = delta[(a * 4 + 1) * K];
d_log_w = delta[(a * 4 + 2) * K];
d_log_h = delta[(a * 4 + 3) * K];
BBoxTransform(dx, dy, d_log_w, d_log_h, im_w, im_h, T(1), T(1), proposal);
proposal[4] = scores[index];
proposal += 5;
}
}
template <typename T>
void GenerateProposals(
const int num_candidates,
const int num_proposals,
const float im_h,
const float im_w,
const T* scores,
const T* deltas,
const int64_t* indices,
T* proposals) {
// Shifted anchors in format: [4, A, K]
int64_t index;
int64_t num_candidates_2x = 2 * num_candidates;
int64_t num_candidates_3x = 3 * num_candidates;
T* proposal = proposals;
T dx, dy, d_log_w, d_log_h;
for (int i = 0; i < num_proposals; ++i) {
index = indices[i];
dx = deltas[index];
dy = deltas[num_candidates + index];
d_log_w = deltas[num_candidates_2x + index];
d_log_h = deltas[num_candidates_3x + index];
BBoxTransform(dx, dy, d_log_w, d_log_h, im_w, im_h, T(1), T(1), proposal);
proposal[4] = scores[index];
proposal += 5;
}
}
template <typename T>
void GenerateDetections(
const int num_proposals,
const int num_boxes,
const int num_classes,
const int im_idx,
const float im_h,
const float im_w,
const float im_scale_h,
const float im_scale_w,
const T* scores,
const T* deltas,
const int64_t* indices,
T* detections) {
int64_t index, cls;
int64_t num_boxes_2x = 2 * num_boxes;
int64_t num_boxes_3x = 3 * num_boxes;
T* detection = detections;
float dx, dy, d_log_w, d_log_h;
for (int i = 0; i < num_proposals; ++i) {
cls = indices[i] % num_classes;
index = indices[i] / num_classes;
dx = deltas[index];
dy = deltas[num_boxes + index];
d_log_w = deltas[num_boxes_2x + index];
d_log_h = deltas[num_boxes_3x + index];
detection[0] = im_idx;
BBoxTransform(
dx,
dy,
d_log_w,
d_log_h,
im_w,
im_h,
im_scale_h,
im_scale_w,
detection + 1);
// detection[5] = scores[indices[i]];
detection[5] = scores[i];
detection[6] = cls + 1;
detection += 7;
}
}
template <typename T>
inline void
SortProposals(const int start, const int end, const int num_top, T* proposals) {
const T pivot_score = proposals[start * 5 + 4];
int left = start + 1, right = end;
while (left <= right) {
while (left <= end && proposals[left * 5 + 4] >= pivot_score)
++left;
while (right > start && proposals[right * 5 + 4] <= pivot_score)
--right;
if (left <= right) {
for (int i = 0; i < 5; ++i)
std::swap(proposals[left * 5 + i], proposals[right * 5 + i]);
++left;
--right;
}
}
if (right > start) {
for (int i = 0; i < 5; ++i)
std::swap(proposals[start * 5 + i], proposals[right * 5 + i]);
}
if (start < right - 1) SortProposals(start, right - 1, num_top, proposals);
if (right + 1 < num_top && right + 1 < end)
SortProposals(right + 1, end, num_top, proposals);
}
template <typename T>
inline void RetrieveRoIs(
const int num_rois,
const int roi_batch_ind,
const T* proposals,
const int64_t* roi_indices,
T* rois) {
for (int i = 0; i < num_rois; ++i) {
const T* proposal = proposals + roi_indices[i] * 5;
rois[i * 5 + 0] = (T)roi_batch_ind;
rois[i * 5 + 1] = proposal[0];
rois[i * 5 + 2] = proposal[1];
rois[i * 5 + 3] = proposal[2];
rois[i * 5 + 4] = proposal[3];
}
}
template <typename T>
inline int roi_level(
const int min_level,
const int max_level,
const int canonical_level,
const int canonical_scale,
T* roi) {
T w = roi[3] - roi[1] + 1;
T h = roi[4] - roi[2] + 1;
// Follow the level-assignment rule of the FPN paper (arXiv:1612.03144).
int level = canonical_level +
std::log2(std::max(std::sqrt(w * h), (T)1) / (T)canonical_scale);
return std::min(max_level, std::max(min_level, level));
}
template <typename T>
inline void CollectRoIs(
const int num_rois,
const int min_level,
const int max_level,
const int canonical_level,
const int canonical_scale,
const T* rois,
vector<vec64_t>& roi_bins) {
const T* roi = rois;
for (int i = 0; i < num_rois; ++i) {
int bin_idx =
roi_level(min_level, max_level, canonical_level, canonical_scale, roi);
bin_idx = std::max(bin_idx - min_level, 0);
roi_bins[bin_idx].push_back(i);
roi += 5;
}
}
template <typename T>
inline void DistributeRoIs(
const vector<vec64_t>& roi_bins,
const T* rois,
vector<T*> outputs) {
for (int i = 0; i < roi_bins.size(); i++) {
auto* y = outputs[i];
if (roi_bins[i].size() == 0) {
// Fake a tiny roi to avoid empty roi pooling
y[0] = 0, y[1] = 0, y[2] = 0, y[3] = 1, y[4] = 1;
} else {
for (int j = 0; j < roi_bins[i].size(); ++j) {
const T* roi = rois + roi_bins[i][j] * 5;
for (int k = 0; k < 5; ++k)
y[k] = roi[k];
y += 5;
}
}
}
}
/*!
* NMS API
*/
template <typename T, class Context>
void ApplyNMS(
const int num_boxes,
const int max_keeps,
const T thresh,
const T* boxes,
int64_t* keep_indices,
int& num_keep,
Context* ctx);
} // namespace detection
} // namespace utils
} // namespace dragon
#endif // SEETADET_CXX_UTILS_DETECTION_UTILS_H_
# distutils: language = c
# distutils: sources = ../common/maskApi.c
#**************************************************************************
# Microsoft COCO Toolbox. version 2.0
# Data, paper, and tutorials available at: http://mscoco.org/
# Code written by Piotr Dollar and Tsung-Yi Lin, 2015.
# Licensed under the Simplified BSD License [see coco/license.txt]
#**************************************************************************
__author__ = 'tsungyi'
import sys
PYTHON_VERSION = sys.version_info[0]
# import both Python-level and C-level symbols of Numpy
# the API uses Numpy to interface C and Python
import numpy as np
cimport numpy as np
from libc.stdlib cimport malloc, free
# initialize Numpy. must do.
np.import_array()
# import numpy C function
# we use PyArray_ENABLEFLAGS to make Numpy ndarray responsible for memory management
cdef extern from "numpy/arrayobject.h":
void PyArray_ENABLEFLAGS(np.ndarray arr, int flags)
# Declare the prototype of the C functions in MaskApi.h
cdef extern from "maskApi.h":
ctypedef unsigned int uint
ctypedef unsigned long siz
ctypedef unsigned char byte
ctypedef double* BB
ctypedef struct RLE:
siz h,
siz w,
siz m,
uint* cnts,
void rlesInit( RLE **R, siz n )
void rleEncode( RLE *R, const byte *M, siz h, siz w, siz n )
void rleDecode( const RLE *R, byte *mask, siz n )
void rleMerge( const RLE *R, RLE *M, siz n, int intersect )
void rleArea( const RLE *R, siz n, uint *a )
void rleIou( RLE *dt, RLE *gt, siz m, siz n, byte *iscrowd, double *o )
void bbIou( BB dt, BB gt, siz m, siz n, byte *iscrowd, double *o )
void rleToBbox( const RLE *R, BB bb, siz n )
void rleFrBbox( RLE *R, const BB bb, siz h, siz w, siz n )
void rleFrPoly( RLE *R, const double *xy, siz k, siz h, siz w )
char* rleToString( const RLE *R )
void rleFrString( RLE *R, char *s, siz h, siz w )
# python class to wrap RLE array in C
# the class handles the memory allocation and deallocation
cdef class RLEs:
cdef RLE *_R
cdef siz _n
def __cinit__(self, siz n =0):
rlesInit(&self._R, n)
self._n = n
# free the RLE array here
def __dealloc__(self):
if self._R is not NULL:
for i in range(self._n):
free(self._R[i].cnts)
free(self._R)
def __getattr__(self, key):
if key == 'n':
return self._n
raise AttributeError(key)
# python class to wrap Mask array in C
# the class handles the memory allocation and deallocation
cdef class Masks:
cdef byte *_mask
cdef siz _h
cdef siz _w
cdef siz _n
def __cinit__(self, h, w, n):
self._mask = <byte*> malloc(h*w*n* sizeof(byte))
self._h = h
self._w = w
self._n = n
# def __dealloc__(self):
# the memory management of _mask has been passed to np.ndarray
# it doesn't need to be freed here
# called when passing into np.array() and return an np.ndarray in column-major order
def __array__(self):
cdef np.npy_intp shape[1]
shape[0] = <np.npy_intp> self._h*self._w*self._n
# Create a 1D array, and reshape it to fortran/Matlab column-major array
ndarray = np.PyArray_SimpleNewFromData(1, shape, np.NPY_UINT8, self._mask).reshape((self._h, self._w, self._n), order='F')
# The _mask allocated by Masks is now handled by ndarray
PyArray_ENABLEFLAGS(ndarray, np.NPY_OWNDATA)
return ndarray
# internal conversion from Python RLEs object to compressed RLE format
def _toString(RLEs Rs):
cdef siz n = Rs.n
cdef bytes py_string
cdef char* c_string
objs = []
for i in range(n):
c_string = rleToString( <RLE*> &Rs._R[i] )
py_string = c_string
objs.append({
'size': [Rs._R[i].h, Rs._R[i].w],
'counts': py_string
})
free(c_string)
return objs
# internal conversion from compressed RLE format to Python RLEs object
def _frString(rleObjs):
cdef siz n = len(rleObjs)
Rs = RLEs(n)
cdef bytes py_string
cdef char* c_string
for i, obj in enumerate(rleObjs):
if PYTHON_VERSION == 2:
py_string = str(obj['counts']).encode('utf8')
elif PYTHON_VERSION == 3:
py_string = str.encode(obj['counts']) if type(obj['counts']) == str else obj['counts']
else:
raise Exception('Python version must be 2 or 3')
c_string = py_string
rleFrString( <RLE*> &Rs._R[i], <char*> c_string, obj['size'][0], obj['size'][1] )
return Rs
# encode mask to RLEs objects
# list of RLE string can be generated by RLEs member function
def encode(np.ndarray[np.uint8_t, ndim=3, mode='fortran'] mask):
h, w, n = mask.shape[0], mask.shape[1], mask.shape[2]
cdef RLEs Rs = RLEs(n)
rleEncode(Rs._R,<byte*>mask.data,h,w,n)
objs = _toString(Rs)
return objs
# decode mask from compressed list of RLE string or RLEs object
def decode(rleObjs):
cdef RLEs Rs = _frString(rleObjs)
h, w, n = Rs._R[0].h, Rs._R[0].w, Rs._n
masks = Masks(h, w, n)
rleDecode(<RLE*>Rs._R, masks._mask, n);
return np.array(masks)
def merge(rleObjs, intersect=0):
cdef RLEs Rs = _frString(rleObjs)
cdef RLEs R = RLEs(1)
rleMerge(<RLE*>Rs._R, <RLE*> R._R, <siz> Rs._n, intersect)
obj = _toString(R)[0]
return obj
def area(rleObjs):
cdef RLEs Rs = _frString(rleObjs)
cdef uint* _a = <uint*> malloc(Rs._n* sizeof(uint))
rleArea(Rs._R, Rs._n, _a)
cdef np.npy_intp shape[1]
shape[0] = <np.npy_intp> Rs._n
a = np.array((Rs._n, ), dtype=np.uint8)
a = np.PyArray_SimpleNewFromData(1, shape, np.NPY_UINT32, _a)
PyArray_ENABLEFLAGS(a, np.NPY_OWNDATA)
return a
# iou computation. support function overload (RLEs-RLEs and bbox-bbox).
def iou( dt, gt, pyiscrowd ):
def _preproc(objs):
if len(objs) == 0:
return objs
if type(objs) == np.ndarray:
if len(objs.shape) == 1:
objs = objs.reshape((objs[0], 1))
# check if it's Nx4 bbox
if not len(objs.shape) == 2 or not objs.shape[1] == 4:
raise Exception('numpy ndarray input is only for *bounding boxes* and should have Nx4 dimension')
objs = objs.astype(np.double)
elif type(objs) == list:
# check if list is in box format and convert it to np.ndarray
isbox = np.all(np.array([(len(obj)==4) and ((type(obj)==list) or (type(obj)==np.ndarray)) for obj in objs]))
isrle = np.all(np.array([type(obj) == dict for obj in objs]))
if isbox:
objs = np.array(objs, dtype=np.double)
if len(objs.shape) == 1:
objs = objs.reshape((1,objs.shape[0]))
elif isrle:
objs = _frString(objs)
else:
raise Exception('list input can be bounding box (Nx4) or RLEs ([RLE])')
else:
raise Exception('unrecognized type. The following type: RLEs (rle), np.ndarray (box), and list (box) are supported.')
return objs
def _rleIou(RLEs dt, RLEs gt, np.ndarray[np.uint8_t, ndim=1] iscrowd, siz m, siz n, np.ndarray[np.double_t, ndim=1] _iou):
rleIou( <RLE*> dt._R, <RLE*> gt._R, m, n, <byte*> iscrowd.data, <double*> _iou.data )
def _bbIou(np.ndarray[np.double_t, ndim=2] dt, np.ndarray[np.double_t, ndim=2] gt, np.ndarray[np.uint8_t, ndim=1] iscrowd, siz m, siz n, np.ndarray[np.double_t, ndim=1] _iou):
bbIou( <BB> dt.data, <BB> gt.data, m, n, <byte*> iscrowd.data, <double*>_iou.data )
def _len(obj):
cdef siz N = 0
if type(obj) == RLEs:
N = obj.n
elif len(obj)==0:
pass
elif type(obj) == np.ndarray:
N = obj.shape[0]
return N
# convert iscrowd to numpy array
cdef np.ndarray[np.uint8_t, ndim=1] iscrowd = np.array(pyiscrowd, dtype=np.uint8)
# simple type checking
cdef siz m, n
dt = _preproc(dt)
gt = _preproc(gt)
m = _len(dt)
n = _len(gt)
if m == 0 or n == 0:
return []
if not type(dt) == type(gt):
raise Exception('The dt and gt should have the same data type, either RLEs, list or np.ndarray')
# define local variables
cdef double* _iou = <double*> 0
cdef np.npy_intp shape[1]
# check type and assign iou function
if type(dt) == RLEs:
_iouFun = _rleIou
elif type(dt) == np.ndarray:
_iouFun = _bbIou
else:
raise Exception('input data type not allowed.')
_iou = <double*> malloc(m*n* sizeof(double))
iou = np.zeros((m*n, ), dtype=np.double)
shape[0] = <np.npy_intp> m*n
iou = np.PyArray_SimpleNewFromData(1, shape, np.NPY_DOUBLE, _iou)
PyArray_ENABLEFLAGS(iou, np.NPY_OWNDATA)
_iouFun(dt, gt, iscrowd, m, n, iou)
return iou.reshape((m,n), order='F')
def toBbox( rleObjs ):
cdef RLEs Rs = _frString(rleObjs)
cdef siz n = Rs.n
cdef BB _bb = <BB> malloc(4*n* sizeof(double))
rleToBbox( <const RLE*> Rs._R, _bb, n )
cdef np.npy_intp shape[1]
shape[0] = <np.npy_intp> 4*n
bb = np.array((1,4*n), dtype=np.double)
bb = np.PyArray_SimpleNewFromData(1, shape, np.NPY_DOUBLE, _bb).reshape((n, 4))
PyArray_ENABLEFLAGS(bb, np.NPY_OWNDATA)
return bb
def frBbox(np.ndarray[np.double_t, ndim=2] bb, siz h, siz w ):
cdef siz n = bb.shape[0]
Rs = RLEs(n)
rleFrBbox( <RLE*> Rs._R, <const BB> bb.data, h, w, n )
objs = _toString(Rs)
return objs
def frPoly( poly, siz h, siz w ):
cdef np.ndarray[np.double_t, ndim=1] np_poly
n = len(poly)
Rs = RLEs(n)
for i, p in enumerate(poly):
np_poly = np.array(p, dtype=np.double, order='F')
rleFrPoly( <RLE*>&Rs._R[i], <const double*> np_poly.data, int(len(p)/2), h, w )
objs = _toString(Rs)
return objs
def frUncompressedRLE(ucRles, siz h, siz w):
cdef np.ndarray[np.uint32_t, ndim=1] cnts
cdef RLE R
cdef uint *data
n = len(ucRles)
objs = []
for i in range(n):
Rs = RLEs(1)
cnts = np.array(ucRles[i]['counts'], dtype=np.uint32)
# time for malloc can be saved here but it's fine
data = <uint*> malloc(len(cnts)* sizeof(uint))
for j in range(len(cnts)):
data[j] = <uint> cnts[j]
R = RLE(ucRles[i]['size'][0], ucRles[i]['size'][1], len(cnts), <uint*> data)
Rs._R[0] = R
objs.append(_toString(Rs)[0])
return objs
def frPyObjects(pyobj, h, w):
# encode rle from a list of python objects
if type(pyobj) == np.ndarray:
objs = frBbox(pyobj, h, w)
elif type(pyobj) == list and len(pyobj[0]) == 4:
objs = frBbox(pyobj, h, w)
elif type(pyobj) == list and len(pyobj[0]) > 4:
objs = frPoly(pyobj, h, w)
elif type(pyobj) == list and type(pyobj[0]) == dict \
and 'counts' in pyobj[0] and 'size' in pyobj[0]:
objs = frUncompressedRLE(pyobj, h, w)
# encode rle from single python object
elif type(pyobj) == list and len(pyobj) == 4:
objs = frBbox([pyobj], h, w)[0]
elif type(pyobj) == list and len(pyobj) > 4:
objs = frPoly([pyobj], h, w)[0]
elif type(pyobj) == dict and 'counts' in pyobj and 'size' in pyobj:
objs = frUncompressedRLE([pyobj], h, w)[0]
else:
raise Exception('input type is not supported.')
return objs
/**************************************************************************
* Microsoft COCO Toolbox. version 2.0
* Data, paper, and tutorials available at: http://mscoco.org/
* Code written by Piotr Dollar and Tsung-Yi Lin, 2015.
* Licensed under the Simplified BSD License [see coco/license.txt]
**************************************************************************/
#include "maskApi.h"
#include <math.h>
#include <stdlib.h>
uint umin( uint a, uint b ) { return (a<b) ? a : b; }
uint umax( uint a, uint b ) { return (a>b) ? a : b; }
void rleInit( RLE *R, siz h, siz w, siz m, uint *cnts ) {
R->h=h; R->w=w; R->m=m; R->cnts=(m==0)?0:malloc(sizeof(uint)*m);
if(cnts) for(siz j=0; j<m; j++) R->cnts[j]=cnts[j];
}
void rleFree( RLE *R ) {
free(R->cnts); R->cnts=0;
}
void rlesInit( RLE **R, siz n ) {
*R = (RLE*) malloc(sizeof(RLE)*n);
for(siz i=0; i<n; i++) rleInit((*R)+i,0,0,0,0);
}
void rlesFree( RLE **R, siz n ) {
for(siz i=0; i<n; i++) rleFree((*R)+i); free(*R); *R=0;
}
void rleEncode( RLE *R, const byte *M, siz h, siz w, siz n ) {
siz i, j, k, a=w*h; uint c, *cnts; byte p;
cnts = malloc(sizeof(uint)*(a+1));
for(i=0; i<n; i++) {
const byte *T=M+a*i; k=0; p=0; c=0;
for(j=0; j<a; j++) { if(T[j]!=p) { cnts[k++]=c; c=0; p=T[j]; } c++; }
cnts[k++]=c; rleInit(R+i,h,w,k,cnts);
}
free(cnts);
}
void rleDecode( const RLE *R, byte *M, siz n ) {
for( siz i=0; i<n; i++ ) {
byte v=0; for( siz j=0; j<R[i].m; j++ ) {
for( siz k=0; k<R[i].cnts[j]; k++ ) *(M++)=v; v=!v; }}
}
void rleMerge( const RLE *R, RLE *M, siz n, bool intersect ) {
uint *cnts, c, ca, cb, cc, ct; bool v, va, vb, vp;
siz i, a, b, h=R[0].h, w=R[0].w, m=R[0].m; RLE A, B;
if(n==0) { rleInit(M,0,0,0,0); return; }
if(n==1) { rleInit(M,h,w,m,R[0].cnts); return; }
cnts = malloc(sizeof(uint)*(h*w+1));
for( a=0; a<m; a++ ) cnts[a]=R[0].cnts[a];
for( i=1; i<n; i++ ) {
B=R[i]; if(B.h!=h||B.w!=w) { h=w=m=0; break; }
rleInit(&A,h,w,m,cnts); ca=A.cnts[0]; cb=B.cnts[0];
v=va=vb=0; m=0; a=b=1; cc=0; ct=1;
while( ct>0 ) {
c=umin(ca,cb); cc+=c; ct=0;
ca-=c; if(!ca && a<A.m) { ca=A.cnts[a++]; va=!va; } ct+=ca;
cb-=c; if(!cb && b<B.m) { cb=B.cnts[b++]; vb=!vb; } ct+=cb;
vp=v; if(intersect) v=va&&vb; else v=va||vb;
if( v!=vp||ct==0 ) { cnts[m++]=cc; cc=0; }
}
rleFree(&A);
}
rleInit(M,h,w,m,cnts); free(cnts);
}
void rleArea( const RLE *R, siz n, uint *a ) {
for( siz i=0; i<n; i++ ) {
a[i]=0; for( siz j=1; j<R[i].m; j+=2 ) a[i]+=R[i].cnts[j]; }
}
void rleIou( RLE *dt, RLE *gt, siz m, siz n, byte *iscrowd, double *o ) {
siz g, d; BB db, gb; bool crowd;
db=malloc(sizeof(double)*m*4); rleToBbox(dt,db,m);
gb=malloc(sizeof(double)*n*4); rleToBbox(gt,gb,n);
bbIou(db,gb,m,n,iscrowd,o); free(db); free(gb);
for( g=0; g<n; g++ ) for( d=0; d<m; d++ ) if(o[g*m+d]>0) {
crowd=iscrowd!=NULL && iscrowd[g];
if(dt[d].h!=gt[g].h || dt[d].w!=gt[g].w) { o[g*m+d]=-1; continue; }
siz ka, kb, a, b; uint c, ca, cb, ct, i, u; bool va, vb;
ca=dt[d].cnts[0]; ka=dt[d].m; va=vb=0;
cb=gt[g].cnts[0]; kb=gt[g].m; a=b=1; i=u=0; ct=1;
while( ct>0 ) {
c=umin(ca,cb); if(va||vb) { u+=c; if(va&&vb) i+=c; } ct=0;
ca-=c; if(!ca && a<ka) { ca=dt[d].cnts[a++]; va=!va; } ct+=ca;
cb-=c; if(!cb && b<kb) { cb=gt[g].cnts[b++]; vb=!vb; } ct+=cb;
}
if(i==0) u=1; else if(crowd) rleArea(dt+d,1,&u);
o[g*m+d] = (double)i/(double)u;
}
}
void bbIou( BB dt, BB gt, siz m, siz n, byte *iscrowd, double *o ) {
double h, w, i, u, ga, da; siz g, d; bool crowd;
for( g=0; g<n; g++ ) {
BB G=gt+g*4; ga=G[2]*G[3]; crowd=iscrowd!=NULL && iscrowd[g];
for( d=0; d<m; d++ ) {
BB D=dt+d*4; da=D[2]*D[3]; o[g*m+d]=0;
w=fmin(D[2]+D[0],G[2]+G[0])-fmax(D[0],G[0]); if(w<=0) continue;
h=fmin(D[3]+D[1],G[3]+G[1])-fmax(D[1],G[1]); if(h<=0) continue;
i=w*h; u = crowd ? da : da+ga-i; o[g*m+d]=i/u;
}
}
}
void rleToBbox( const RLE *R, BB bb, siz n ) {
for( siz i=0; i<n; i++ ) {
uint h, w, x, y, xs, ys, xe, ye, cc, t; siz j, m;
h=(uint)R[i].h; w=(uint)R[i].w; m=R[i].m;
m=((siz)(m/2))*2; xs=w; ys=h; xe=ye=0; cc=0;
if(m==0) { bb[4*i+0]=bb[4*i+1]=bb[4*i+2]=bb[4*i+3]=0; continue; }
for( j=0; j<m; j++ ) {
cc+=R[i].cnts[j]; t=cc-j%2; y=t%h; x=(t-y)/h;
xs=umin(xs,x); xe=umax(xe,x); ys=umin(ys,y); ye=umax(ye,y);
}
bb[4*i+0]=xs; bb[4*i+2]=xe-xs+1;
bb[4*i+1]=ys; bb[4*i+3]=ye-ys+1;
}
}
void rleFrBbox( RLE *R, const BB bb, siz h, siz w, siz n ) {
for( siz i=0; i<n; i++ ) {
double xs=bb[4*i+0], xe=xs+bb[4*i+2];
double ys=bb[4*i+1], ye=ys+bb[4*i+3];
double xy[8] = {xs,ys,xs,ye,xe,ye,xe,ys};
rleFrPoly( R+i, xy, 4, h, w );
}
}
int uintCompare(const void *a, const void *b) {
uint c=*((uint*)a), d=*((uint*)b); return c>d?1:c<d?-1:0;
}
void rleFrPoly( RLE *R, const double *xy, siz k, siz h, siz w ) {
// upsample and get discrete points densely along entire boundary
siz j, m=0; double scale=5; int *x, *y, *u, *v; uint *a, *b;
x=malloc(sizeof(int)*(k+1)); y=malloc(sizeof(int)*(k+1));
for(j=0; j<k; j++) x[j]=(int)(scale*xy[j*2+0]+.5); x[k]=x[0];
for(j=0; j<k; j++) y[j]=(int)(scale*xy[j*2+1]+.5); y[k]=y[0];
for(j=0; j<k; j++) m+=umax(abs(x[j]-x[j+1]),abs(y[j]-y[j+1]))+1;
u=malloc(sizeof(int)*m); v=malloc(sizeof(int)*m); m=0;
for( j=0; j<k; j++ ) {
int xs=x[j], xe=x[j+1], ys=y[j], ye=y[j+1], dx, dy, t;
bool flip; double s; dx=abs(xe-xs); dy=abs(ys-ye);
flip = (dx>=dy && xs>xe) || (dx<dy && ys>ye);
if(flip) { t=xs; xs=xe; xe=t; t=ys; ys=ye; ye=t; }
s = dx>=dy ? (double)(ye-ys)/dx : (double)(xe-xs)/dy;
if(dx>=dy) for( int d=0; d<=dx; d++ ) {
t=flip?dx-d:d; u[m]=t+xs; v[m]=(int)(ys+s*t+.5); m++;
} else for( int d=0; d<=dy; d++ ) {
t=flip?dy-d:d; v[m]=t+ys; u[m]=(int)(xs+s*t+.5); m++;
}
}
// get points along y-boundary and downsample
free(x); free(y); k=m; m=0; double xd, yd;
x=malloc(sizeof(int)*k); y=malloc(sizeof(int)*k);
for( j=1; j<k; j++ ) if(u[j]!=u[j-1]) {
xd=(double)(u[j]<u[j-1]?u[j]:u[j]-1); xd=(xd+.5)/scale-.5;
if( floor(xd)!=xd || xd<0 || xd>w-1 ) continue;
yd=(double)(v[j]<v[j-1]?v[j]:v[j-1]); yd=(yd+.5)/scale-.5;
if(yd<0) yd=0; else if(yd>h) yd=h; yd=ceil(yd);
x[m]=(int) xd; y[m]=(int) yd; m++;
}
// compute rle encoding given y-boundary points
k=m; a=malloc(sizeof(uint)*(k+1));
for( j=0; j<k; j++ ) a[j]=(uint)(x[j]*(int)(h)+y[j]);
a[k++]=(uint)(h*w); free(u); free(v); free(x); free(y);
qsort(a,k,sizeof(uint),uintCompare); uint p=0;
for( j=0; j<k; j++ ) { uint t=a[j]; a[j]-=p; p=t; }
b=malloc(sizeof(uint)*k); j=m=0; b[m++]=a[j++];
while(j<k) if(a[j]>0) b[m++]=a[j++]; else {
j++; if(j<k) b[m-1]+=a[j++]; }
rleInit(R,h,w,m,b); free(a); free(b);
}
char* rleToString( const RLE *R ) {
// Similar to LEB128 but using 6 bits/char and ascii chars 48-111.
siz i, m=R->m, p=0; long x; bool more;
char *s=malloc(sizeof(char)*m*6);
for( i=0; i<m; i++ ) {
x=(long) R->cnts[i]; if(i>2) x-=(long) R->cnts[i-2]; more=1;
while( more ) {
char c=x & 0x1f; x >>= 5; more=(c & 0x10) ? x!=-1 : x!=0;
if(more) c |= 0x20; c+=48; s[p++]=c;
}
}
s[p]=0; return s;
}
void rleFrString( RLE *R, char *s, siz h, siz w ) {
siz m=0, p=0, k; long x; bool more; uint *cnts;
while( s[m] ) m++; cnts=malloc(sizeof(uint)*m); m=0;
while( s[p] ) {
x=0; k=0; more=1;
while( more ) {
char c=s[p]-48; x |= (c & 0x1f) << 5*k;
more = c & 0x20; p++; k++;
if(!more && (c & 0x10)) x |= -1 << 5*k;
}
if(m>2) x+=(long) cnts[m-2]; cnts[m++]=(uint) x;
}
rleInit(R,h,w,m,cnts); free(cnts);
}
/**************************************************************************
* Microsoft COCO Toolbox. version 2.0
* Data, paper, and tutorials available at: http://mscoco.org/
* Code written by Piotr Dollar and Tsung-Yi Lin, 2015.
* Licensed under the Simplified BSD License [see coco/license.txt]
**************************************************************************/
#pragma once
#include <stdbool.h>
typedef unsigned int uint;
typedef unsigned long siz;
typedef unsigned char byte;
typedef double* BB;
typedef struct { siz h, w, m; uint *cnts; } RLE;
// Initialize/destroy RLE.
void rleInit( RLE *R, siz h, siz w, siz m, uint *cnts );
void rleFree( RLE *R );
// Initialize/destroy RLE array.
void rlesInit( RLE **R, siz n );
void rlesFree( RLE **R, siz n );
// Encode binary masks using RLE.
void rleEncode( RLE *R, const byte *mask, siz h, siz w, siz n );
// Decode binary masks encoded via RLE.
void rleDecode( const RLE *R, byte *mask, siz n );
// Compute union or intersection of encoded masks.
void rleMerge( const RLE *R, RLE *M, siz n, bool intersect );
// Compute area of encoded masks.
void rleArea( const RLE *R, siz n, uint *a );
// Compute intersection over union between masks.
void rleIou( RLE *dt, RLE *gt, siz m, siz n, byte *iscrowd, double *o );
// Compute intersection over union between bounding boxes.
void bbIou( BB dt, BB gt, siz m, siz n, byte *iscrowd, double *o );
// Get bounding boxes surrounding encoded masks.
void rleToBbox( const RLE *R, BB bb, siz n );
// Convert bounding boxes to encoded masks.
void rleFrBbox( RLE *R, const BB bb, siz h, siz w, siz n );
// Convert polygon to encoded mask.
void rleFrPoly( RLE *R, const double *xy, siz k, siz h, siz w );
// Get compressed string representation of encoded mask.
char* rleToString( const RLE *R );
// Convert from compressed string representation of encoded mask.
void rleFrString( RLE *R, char *s, siz h, siz w );
@@ -8,7 +8,7 @@
 # <https://opensource.org/licenses/BSD-2-Clause>
 #
 # ------------------------------------------------------------
-"""Compile the cython extensions."""
+"""Build cython extensions."""
 from __future__ import absolute_import
 from __future__ import division
@@ -16,34 +16,25 @@ from __future__ import print_function
 from distutils.extension import Extension
 from distutils.core import setup
-import os
 from Cython.Distutils import build_ext
 import numpy as np
 ext_modules = [
     Extension(
-        'install.lib.utils.cython_bbox',
+        'seetadet.utils.bbox.cython_bbox',
         ['cython_bbox.pyx'],
         extra_compile_args=['-w'],
-        include_dirs=[np.get_include()]
+        include_dirs=[np.get_include()],
     ),
     Extension(
-        'install.lib.utils.cython_nms',
+        'seetadet.utils.nms.cython_nms',
        ['cython_nms.pyx'],
         extra_compile_args=['-w'],
-        include_dirs=[np.get_include()]
+        include_dirs=[np.get_include()],
     ),
-    Extension(
-        'install.lib.utils.pycocotools._mask',
-        ['maskApi.c', '_mask.pyx'],
-        include_dirs=[np.get_include(), os.path.dirname(os.path.abspath(__file__))],
-        extra_compile_args=['-w']
-    ),
 ]
-setup(
-    name='SeetaDet',
-    ext_modules=ext_modules,
-    cmdclass={'build_ext': build_ext},
-)
+setup(name='seetadet',
+      ext_modules=ext_modules,
+      cmdclass={'build_ext': build_ext})
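The extensions can be built in place with `python setup.py build_ext --inplace` (assuming Cython and NumPy are available), which drops the compiled modules into the `seetadet` package tree.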
# Datasets
## Introduction
This folder is kept for the record and JSON datasets.
Please prepare the datasets following the [documentation](../../scripts/datasets/README.md).
# Demo Images
## Introduction
This folder is kept for the demo images.
# Pretrained Models
## Introduction
This folder is kept for the pretrained models.
## ImageNet Pretrained Models
### Training settings
- ResNet models trained for 200 epochs follow the procedure in arXiv:1812.01187.
### ResNet
| Model | Lr sched | Acc@1 | Acc@5 | Source |
| :---: | :------: | :---: | :---: | :----: |
| [R-50](https://dragon.seetatech.com/download/seetadet/pretrained/R-50_in1k_cls90e.pkl) | 90e | 76.53 | 93.16 | Ours |
| [R-50](https://dragon.seetatech.com/download/seetadet/pretrained/R-50_in1k_cls200e.pkl) | 200e | 78.64 | 94.30 | Ours |
### MobileNet
| Model | Lr sched | Acc@1 | Acc@5 | Source |
| :---: | :------: | :---: | :---: | :----: |
| [MobileNetV2](https://dragon.seetatech.com/download/seetadet/pretrained/MobileNetV2_in1k_cls300e.pkl) | 300e | 71.88 | 90.29 | TorchVision |
| [MobileNetV3L](https://dragon.seetatech.com/download/seetadet/pretrained/MobileNetV3L_in1k_cls600e.pkl) | 600e | 74.04 | 91.34 | TorchVision |
### VGG
| Model | Lr sched | Acc@1 | Acc@5 | Source |
| :---: | :------: | :---: | :---: | :----: |
| [VGG-16-FCN](https://dragon.seetatech.com/download/seetadet/pretrained/VGG-16-FCN_in1k.pkl) | - | - | - | weiliu89 |
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Make record file for COCO dataset."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import shutil
from maker import make_record
from roidb import make_database
if __name__ == '__main__':
COCO_ROOT = '/data'
if not os.path.exists('build'):
os.makedirs('build')
# Encode masks to RLE bytes.
make_database('train', '2017', COCO_ROOT)
make_database('val', '2017', COCO_ROOT)
# coco_2017_train
make_record(
db_file='build/coco_2017_train.db.pkl',
record_file=os.path.join(COCO_ROOT, 'coco_2017_train'),
images_path=[os.path.join(COCO_ROOT, 'images/train2017')],
splits_path=[os.path.join(COCO_ROOT, 'splits')],
splits=['train2017'],
)
# coco_2017_val
make_record(
db_file='build/coco_2017_val.db.pkl',
record_file=os.path.join(COCO_ROOT, 'coco_2017_val'),
images_path=[os.path.join(COCO_ROOT, 'images/val2017')],
splits_path=[os.path.join(COCO_ROOT, 'splits')],
splits=['val2017'],
)
shutil.rmtree('build')
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
import os
import pickle
import time
import cv2
import dragon
import numpy as np
def make_example(image_file, objects, im_scale=None):
filename = os.path.split(image_file)[-1]
example = {'id': filename.split('.')[0], 'object': []}
if im_scale:
img = cv2.imread(image_file)
img = cv2.resize(
img, None,
fx=im_scale, fy=im_scale,
interpolation=cv2.INTER_LINEAR,
)
example['height'], example['width'], example['depth'] = img.shape
_, img = cv2.imencode('.jpg', img, [int(cv2.IMWRITE_JPEG_QUALITY), 95])
example['content'] = img.tobytes()
else:
with open(image_file, 'rb') as f:
img_bytes = bytes(f.read())
img = cv2.imdecode(np.frombuffer(img_bytes, 'uint8'), 3)
example['height'], example['width'], example['depth'] = img.shape
example['content'] = img_bytes
for obj in objects:
x1, y1, x2, y2 = obj['bbox']
example['object'].append({
'name': obj['name'],
'xmin': x1,
'ymin': y1,
'xmax': x2,
'ymax': y2,
'mask': obj['mask'],
'polygons': obj['polygons'],
'difficult': obj.get('crowd', 0),
})
return example
def make_record(
record_file,
images_path,
db_file,
splits_path,
splits,
ext='.jpg',
im_scale=None,
):
if os.path.exists(record_file):
raise ValueError('The record file already exists.')
os.makedirs(record_file)
if not isinstance(images_path, list):
images_path = [images_path]
if not isinstance(splits_path, list):
splits_path = [splits_path]
assert len(splits) == len(splits_path)
assert len(splits) == len(images_path)
if db_file is not None:
with open(db_file, 'rb') as f:
all_entries = pickle.load(f)
else:
all_entries = {}
print('Start Time:', time.strftime("%a, %d %b %Y %H:%M:%S", time.gmtime()))
writer = dragon.io.KPLRecordWriter(
path=record_file,
protocol={
'id': 'string',
'content': 'bytes',
'height': 'int64',
'width': 'int64',
'depth': 'int64',
'object': [{
'name': 'string',
'xmin': 'float64',
'ymin': 'float64',
'xmax': 'float64',
'ymax': 'float64',
'mask': 'bytes',
'polygons': [['float64']],
'difficult': 'int64',
}]
}
)
count, total_line = 0, 0
start_time = time.time()
for db_idx, split in enumerate(splits):
split_file = os.path.join(splits_path[db_idx], split + '.txt')
if not os.path.exists(split_file):
# Fall back to a JSON split file if the text file is missing.
split_file = os.path.join(splits_path[db_idx], split + '.json')
if not os.path.exists(split_file):
raise FileNotFoundError('Unable to find the split: ' + split)
with open(split_file, 'r') as f:
import json
images_info = json.load(f)
total_line += len(images_info['images'])
lines = []
for info in images_info['images']:
lines.append(os.path.splitext(info['file_name'])[0])
else:
with open(split_file, 'r') as f:
lines = f.readlines()
total_line += len(lines)
for line in lines:
count += 1
if count % 2000 == 0:
now_time = time.time()
print('{} / {} in {:.2f} sec'.format(
count, total_line, now_time - start_time))
filename = line.strip()
image_file = os.path.join(images_path[db_idx], filename + ext)
objects = all_entries.get(filename, [])
writer.write(make_example(image_file, objects, im_scale))
now_time = time.time()
print('{} / {} in {:.2f} sec'.format(count, total_line, now_time - start_time))
writer.close()
end_time = time.time()
data_size = os.path.getsize(record_file + '/root.data') * 1e-6
print('{} images take {:.2f} MB in {:.2f} sec.'
.format(total_line, data_size, end_time - start_time))
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import os
import os.path as osp
import pickle
from seetadet.utils.pycocotools import mask_utils
from seetadet.utils.pycocotools.coco import COCO
class COCOWrapper(object):
def __init__(self, image_set, year, data_dir):
self._year = year
self._image_set = image_set
self._data_path = data_dir
self.invalid_cnt = 0
self.ignore_cnt = 0
# Load COCO API, classes, class <-> id mappings
self._COCO = COCO(self._get_ann_file())
cats = self._COCO.loadCats(self._COCO.getCatIds())
self._classes = tuple(['__background__'] + [c['name'] for c in cats])
self._class_to_ind = dict(zip(self._classes, range(self.num_classes)))
self._ind_to_class = dict(zip(range(self.num_classes), self._classes))
self._class_to_cat_id = dict(zip([c['name'] for c in cats], self._COCO.getCatIds()))
self._cat_id_to_class_id = dict([(self._class_to_cat_id[cls], self._class_to_ind[cls])
for cls in self._classes[1:]])
self._data_name = {
# 5k ``val2014`` subset
'minival2014': 'val2014',
# ``val2014`` minus ``minival2014``
'valminusminival2014': 'val2014',
}.get(image_set + year, image_set + year)
self._image_index = self._load_image_set_index()
self._annotations = self._load_annotations()
def _get_ann_file(self):
prefix = 'instances' \
if self._image_set.find('test') == -1 \
else 'image_info'
return osp.join(
self._data_path,
'annotations',
prefix + '_' +
self._image_set +
self._year + '.json'
)
def _load_image_set_index(self):
"""Load image ids."""
image_ids = self._COCO.getImgIds()
return image_ids
def _load_annotations(self):
"""Load annotations."""
annotations = [self._load_coco_annotation(index)
for index in self._image_index]
return annotations
def image_path_from_index(self, index):
"""Construct an image path from the image's "index" identifier."""
# Example image path for index=119993:
# images/train2014/COCO_train2014_000000119993.jpg
# images/train2017/000000119993.jpg
filename = str(index).zfill(12) + '.jpg'
if '2014' in self._data_name:
filename = 'COCO_{}_{}'.format(self._data_name, filename)
image_path = osp.join(self._data_path, 'images',
self._data_name, filename)
assert osp.exists(image_path), \
'Path does not exist: {}'.format(image_path)
return image_path
def image_path_at(self, i):
"""Return the absolute path to image i in the image sequence."""
return self.image_path_from_index(self._image_index[i])
def annotation_at(self, i):
"""Return the absolute path to image i in the image sequence."""
return self._annotations[i]
def _load_coco_annotation(self, index):
"""Loads COCO bounding-box instance annotations."""
im_ann = self._COCO.loadImgs(index)[0]
width, height = im_ann['width'], im_ann['height']
ann_ids = self._COCO.getAnnIds(imgIds=index, iscrowd=None)
objects = self._COCO.loadAnns(ann_ids)
# Sanitize boxes -- some are invalid
valid_objects = []
mask, polygons = b'', []
for obj in objects:
x1 = float(max(0, obj['bbox'][0]))
y1 = float(max(0, obj['bbox'][1]))
x2 = float(min(width - 1, x1 + max(0, obj['bbox'][2] - 1)))
y2 = float(min(height - 1, y1 + max(0, obj['bbox'][3] - 1)))
if isinstance(obj['segmentation'], list):
# Valid polygons have >= 3 points, so require >= 6 coordinates.
for p in obj['segmentation']:
if len(p) < 6:
print('Removing invalid segmentation.')
polygons = [p for p in obj['segmentation'] if len(p) >= 6]
else:
# Crowd masks
# Some are encoded with height or width
# running out of the image bound
# Do not use them or decoding error is inevitable
mask = mask_utils.poly2bytes(obj['segmentation'], height, width)
if obj['area'] > 0 and x2 > x1 and y2 > y1:
obj['clean_bbox'] = [x1, y1, x2, y2]
class_id = self._cat_id_to_class_id[obj['category_id']]
valid_objects.append({
'bbox': [x1, y1, x2, y2],
'mask': mask,
'polygons': polygons,
'category_id': obj['category_id'],
'class_id': class_id,
'name': self._ind_to_class[class_id],
'crowd': obj['iscrowd'],
})
return height, width, valid_objects
@property
def num_images(self):
return len(self._image_index)
@property
def num_classes(self):
return len(self._classes)
def make_database(split, year, data_dir):
coco = COCOWrapper(split, year, data_dir)
print('Preparing to make split: {}, total {} images'
.format(split, coco.num_images))
if not osp.exists(osp.join(coco._data_path, 'splits')):
os.makedirs(osp.join(coco._data_path, 'splits'))
entries = collections.OrderedDict()
for i in range(coco.num_images):
filename = osp.basename(coco.image_path_at(i)).split('.')[0]
h, w, objects = coco.annotation_at(i)
entries[filename] = objects
with open(osp.join('build',
'coco_' + year + '_' + split +
'.db.pkl'), 'wb') as f:
pickle.dump(entries, f, pickle.HIGHEST_PROTOCOL)
with open(osp.join(coco._data_path, 'splits',
split + year + '.txt'), 'w') as f:
for i in range(coco.num_images):
filename = str(osp.basename(coco.image_path_at(i)).split('.')[0])
if i != coco.num_images - 1:
filename += '\n'
f.write(filename)
def merge_database(split, year, db_files):
entries = collections.OrderedDict()
data_path = os.path.dirname(db_files[0])
for db_file in db_files:
with open(db_file, 'rb') as f:
# Merge the loaded entries instead of overwriting the accumulator.
entries.update(pickle.load(f))
with open(osp.join(data_path,
'coco_' + year + '_' + split +
'.db.pkl'), 'wb') as f:
pickle.dump(entries, f, pickle.HIGHEST_PROTOCOL)
# Prepare Datasets
## Create Datasets for PASCAL VOC
We assume that the raw dataset has the following structure:
```
VOC<year>
|_ JPEGImages
| |_ <im-1-name>.jpg
| |_ ...
| |_ <im-N-name>.jpg
|_ Annotations
| |_ <im-1-name>.xml
| |_ ...
| |_ <im-N-name>.xml
|_ ImageSets
| |_ Main
| | |_ trainval.txt
| | |_ test.txt
| | |_ ...
```
Create the record and JSON datasets by:
```
python pascal_voc.py \
--rec /path/to/datasets/voc_trainval0712 \
--gt /path/to/datasets/voc_trainval0712.json \
--images /path/to/VOC2007/JPEGImages \
/path/to/VOC2012/JPEGImages \
--annotations /path/to/VOC2007/Annotations \
/path/to/VOC2012/Annotations \
--splits /path/to/VOC2007/ImageSets/Main/trainval.txt \
/path/to/VOC2012/ImageSets/Main/trainval.txt
```
## Create Datasets for COCO
We assume that the raw dataset has the following structure:
```
COCO
|_ images
| |_ train2017
| | |_ <im-1-name>.jpg
| | |_ ...
| | |_ <im-N-name>.jpg
|_ annotations
| |_ instances_train2017.json
| |_ ...
```
Create the record dataset by:
```
python coco.py \
--rec /path/to/datasets/coco_train2017 \
--images /path/to/COCO/images/train2017 \
--annotations /path/to/COCO/annotations/instances_train2017.json
```
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Prepare MS COCO datasets."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import os
import sys
import time
import dragon
from pycocotools.coco import COCO
from pycocotools.mask import frPyObjects
def parse_args():
"""Parse arguments."""
parser = argparse.ArgumentParser(
description='Prepare MS COCO datasets')
parser.add_argument(
'--rec',
default=None,
help='path to write record dataset')
parser.add_argument(
'--images',
nargs='+',
type=str,
default=None,
help='path of images folder')
parser.add_argument(
'--annotations',
nargs='+',
type=str,
default=None,
help='path of annotations folder')
parser.add_argument(
'--splits',
nargs='+',
type=str,
default=None,
help='path of split file')
if len(sys.argv) == 1:
parser.print_help()
sys.exit(1)
return parser.parse_args()
def make_example(img_id, img_file, cocoGt):
"""Return the record example."""
img_meta = cocoGt.imgs[img_id]
img_anns = cocoGt.loadAnns(cocoGt.getAnnIds(imgIds=[img_id]))
cat_id_to_cat = dict((v['id'], v['name'])
for v in cocoGt.cats.values())
with open(img_file, 'rb') as f:
img_bytes = bytes(f.read())
height, width = img_meta['height'], img_meta['width']
example = {'id': str(img_id), 'height': height, 'width': width,
'depth': 3, 'content': img_bytes, 'object': []}
for ann in img_anns:
x1 = float(max(0, ann['bbox'][0]))
y1 = float(max(0, ann['bbox'][1]))
x2 = float(min(width - 1, x1 + max(0, ann['bbox'][2] - 1)))
y2 = float(min(height - 1, y1 + max(0, ann['bbox'][3] - 1)))
mask, polygons = b'', []
segm = ann.get('segmentation', None)
if segm is not None and isinstance(segm, list):
# Valid polygons have >= 3 points, so require >= 6 coordinates.
for p in segm:
if len(p) < 6:
print('Removing invalid segmentation.')
polygons = [p for p in segm if len(p) >= 6]
elif segm is not None:
# Crowd masks.
# Some are encoded with wrong height or width.
# Do not use them or decoding error is inevitable.
rle = frPyObjects(ann['segmentation'], height, width)
assert isinstance(rle, dict)
mask = rle['counts']
example['object'].append({
'name': cat_id_to_cat[ann['category_id']],
'xmin': x1, 'ymin': y1, 'xmax': x2, 'ymax': y2,
'mask': mask, 'polygons': polygons,
'difficult': ann.get('iscrowd', 0)})
return example
def write_dataset(args):
assert len(args.images) == len(args.annotations)
if os.path.exists(args.rec):
raise ValueError('The record path already exists.')
os.makedirs(args.rec)
print('Write record dataset to {}'.format(args.rec))
writer = dragon.io.KPLRecordWriter(
path=args.rec,
protocol={
'id': 'string',
'content': 'bytes',
'height': 'int64',
'width': 'int64',
'depth': 'int64',
'object': [{
'name': 'string',
'xmin': 'float64',
'ymin': 'float64',
'xmax': 'float64',
'ymax': 'float64',
'mask': 'bytes',
'polygons': [['float64']],
'difficult': 'int64',
}]
}
)
# Scan all available entries.
print('Scan entries...')
entries, cocoGts = [], []
for ann_file in args.annotations:
cocoGts.append(COCO(ann_file))
if args.splits is not None:
assert len(args.splits) == len(args.images)
for i, split in enumerate(args.splits):
f = open(split, 'r')
for line in f.readlines():
filename = line.strip()
img_id = int(filename)
img_file = os.path.join(args.images[i], filename + '.jpg')
entries.append((img_id, img_file, cocoGts[i]))
f.close()
else:
for i, cocoGt in enumerate(cocoGts):
for info in cocoGt.imgs.values():
img_id = info['id']
img_file = os.path.join(args.images[i], info['file_name'])
entries.append((img_id, img_file, cocoGts[i]))
print('Start Time:', time.strftime("%a, %d %b %Y %H:%M:%S", time.gmtime()))
start_time = time.time()
for i, entry in enumerate(entries):
if i > 0 and i % 2000 == 0:
now_time = time.time()
print('{} / {} in {:.2f} sec'.format(
i, len(entries), now_time - start_time))
writer.write(make_example(*entry))
now_time = time.time()
print('{} / {} in {:.2f} sec'.format(
len(entries), len(entries), now_time - start_time))
writer.close()
end_time = time.time()
data_size = os.path.getsize(args.rec + '/root.data') * 1e-6
print('{} images take {:.2f} MB in {:.2f} sec.'
.format(len(entries), data_size, end_time - start_time))
if __name__ == '__main__':
args = parse_args()
if args.rec is not None:
write_dataset(args)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Prepare JSON datasets."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import json
import os
import sys
import dragon
def parse_args():
"""Parse arguments."""
parser = argparse.ArgumentParser(
description='Prepare JSON datasets')
parser.add_argument(
'--rec',
default=None,
help='path to read record')
parser.add_argument(
'--gt',
default=None,
help='path to write json ground-truth')
parser.add_argument(
'--categories',
nargs='+',
type=str,
default=None,
help='dataset object categories')
if len(sys.argv) == 1:
parser.print_help()
sys.exit(1)
return parser.parse_args()
def get_image_id(image_name):
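"""Return the integer id parsed from an image name, or the name itself.

e.g. 'COCO_val2014_000000119993' -> 119993; non-numeric names are
returned unchanged via the ValueError fallback.
"""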
image_id = image_name.split('_')[-1].split('.')[0]
try:
return int(image_id)
except ValueError:
return image_name
def write_dataset(args):
dataset = {'images': [], 'categories': [], 'annotations': []}
kpl_dataset = dragon.io.KPLRecordDataset(args.rec)
cat_to_cat_id = dict(zip(args.categories,
range(1, len(args.categories) + 1)))
print('Writing json dataset to {}'.format(args.gt))
for cat in args.categories:
dataset['categories'].append({
'name': cat, 'id': cat_to_cat_id[cat]})
for _ in range(len(kpl_dataset)):
example = kpl_dataset.get()
image_id = get_image_id(example['id'])
dataset['images'].append({
'id': image_id, 'height': example['height'],
'width': example['width']})
for obj in example['object']:
if 'x2' in obj:
x1, y1, x2, y2 = obj['x1'], obj['y1'], obj['x2'], obj['y2']
elif 'xmin' in obj:
x1, y1, x2, y2 = obj['xmin'], obj['ymin'], obj['xmax'], obj['ymax']
else:
x1, y1, x2, y2 = obj['bbox']
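# Convert corner boxes (x1, y1, x2, y2) to COCO-style (x, y, w, h) with inclusive extents.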
w, h = x2 - x1 + 1, y2 - y1 + 1
dataset['annotations'].append({
'id': str(len(dataset['annotations'])),
'bbox': [x1, y1, w, h],
'area': w * h,
'iscrowd': obj.get('difficult', 0),
'image_id': image_id,
'category_id': cat_to_cat_id[obj['name']]})
with open(args.gt, 'w') as f:
json.dump(dataset, f)
if __name__ == '__main__':
args = parse_args()
if args.rec is None or not os.path.exists(args.rec):
raise ValueError('Specify the prepared record dataset.')
if args.gt is None:
raise ValueError('Specify the path to write json dataset.')
write_dataset(args)
@@ -8,27 +8,67 @@
 # <https://opensource.org/licenses/BSD-2-Clause>
 #
 # ------------------------------------------------------------
+"""Prepare PASCAL VOC datasets."""
 from __future__ import absolute_import
 from __future__ import division
 from __future__ import print_function
+import argparse
 import os
+import sys
 import time
 import cv2
 import dragon
 import numpy as np
-import xml.etree.ElementTree as ET
+import xml.etree.ElementTree
-def make_example(image_file, xml_file):
-    tree = ET.parse(xml_file)
+def parse_args():
+    """Parse arguments."""
+    parser = argparse.ArgumentParser(
+        description='Prepare PASCAL VOC datasets')
+    parser.add_argument(
+        '--rec',
+        default=None,
+        help='path to write record dataset')
+    parser.add_argument(
+        '--gt',
+        default=None,
+        help='path to write json dataset')
+    parser.add_argument(
+        '--images',
+        nargs='+',
+        type=str,
+        default=None,
+        help='path of images folder')
+    parser.add_argument(
+        '--annotations',
+        nargs='+',
+        type=str,
+        default=None,
+        help='path of annotations folder')
+    parser.add_argument(
+        '--splits',
+        nargs='+',
+        type=str,
+        default=None,
+        help='path of split file')
+    if len(sys.argv) == 1:
+        parser.print_help()
+        sys.exit(1)
+    return parser.parse_args()
+def make_example(img_file, xml_file):
+    """Return the record example."""
+    tree = xml.etree.ElementTree.parse(xml_file)
     filename = os.path.split(xml_file)[-1]
-    objs = tree.findall('object')
+    objects = tree.findall('object')
     size = tree.find('size')
     example = {'id': filename.split('.')[0], 'object': []}
-    with open(image_file, 'rb') as f:
+    with open(img_file, 'rb') as f:
         img_bytes = bytes(f.read())
     if size is not None:
         example['height'] = int(size.find('height').text)
@@ -38,7 +78,7 @@ def make_example(image_file, xml_file):
     img = cv2.imdecode(np.frombuffer(img_bytes, 'uint8'), 3)
     example['height'], example['width'], example['depth'] = img.shape
     example['content'] = img_bytes
-    for ix, obj in enumerate(objs):
+    for obj in objects:
         bbox = obj.find('bndbox')
         is_diff = 0
         if obj.find('difficult') is not None:
@@ -49,35 +89,21 @@ def make_example(image_file, xml_file):
             'ymin': float(bbox.find('ymin').text),
             'xmax': float(bbox.find('xmax').text),
             'ymax': float(bbox.find('ymax').text),
-            'difficult': is_diff,
-        })
+            'difficult': is_diff})
     return example
-def make_record(
-    record_file,
-    images_path,
-    annotations_path,
-    splits_path,
-    splits
-):
-    if os.path.exists(record_file):
-        raise ValueError('The record file already exists.')
-    os.makedirs(record_file)
-    if not isinstance(images_path, list):
-        images_path = [images_path]
-    if not isinstance(annotations_path, list):
-        annotations_path = [annotations_path]
-    if not isinstance(splits_path, list):
-        splits_path = [splits_path]
-    assert len(splits) == len(splits_path)
-    assert len(splits) == len(images_path)
-    assert len(splits) == len(annotations_path)
+def write_dataset(args):
+    """Write the record dataset."""
+    assert len(args.splits) == len(args.images)
+    assert len(args.splits) == len(args.annotations)
+    if os.path.exists(args.rec):
+        raise ValueError('The record path already exists.')
+    os.makedirs(args.rec)
+    print('Write record dataset to {}'.format(args.rec))
     writer = dragon.io.KPLRecordWriter(
-        path=record_file,
+        path=args.rec,
         protocol={
             'id': 'string',
             'content': 'bytes',
@@ -95,36 +121,56 @@ def make_record(
         }
     )
-    # Scan all available entries
+    # Scan all available entries.
     print('Scan entries...')
     entries = []
-    for i, split in enumerate(splits):
-        split_file = os.path.join(splits_path[i], split + '.txt')
-        with open(split_file, 'r') as f:
+    for i, split in enumerate(args.splits):
+        with open(split, 'r') as f:
             lines = f.readlines()
         for line in lines:
             filename = line.strip()
-            img_file = os.path.join(images_path[i], filename + '.jpg')
-            ann_file = os.path.join(annotations_path[i], filename + '.xml')
+            img_file = os.path.join(args.images[i], filename + '.jpg')
+            ann_file = os.path.join(args.annotations[i], filename + '.xml')
             entries.append((img_file, ann_file))
-    # Parse and write into record file
+    # Parse and write into record file.
     print('Start Time:', time.strftime("%a, %d %b %Y %H:%M:%S", time.gmtime()))
     start_time = time.time()
-    for i, (img_file, ann_file) in enumerate(entries):
+    for i, (img_file, xml_file) in enumerate(entries):
         if i > 0 and i % 2000 == 0:
             now_time = time.time()
             print('{} / {} in {:.2f} sec'.format(
                 i, len(entries), now_time - start_time))
-        writer.write(make_example(img_file, ann_file))
+        writer.write(make_example(img_file, xml_file))
     now_time = time.time()
     print('{} / {} in {:.2f} sec'.format(
         len(entries), len(entries), now_time - start_time))
     writer.close()
     end_time = time.time()
-    data_size = os.path.getsize(record_file + '/root.data') * 1e-6
+    data_size = os.path.getsize(args.rec + '/root.data') * 1e-6
     print('{} images take {:.2f} MB in {:.2f} sec.'
           .format(len(entries), data_size, end_time - start_time))
+def write_json_dataset(args):
+    """Write the json dataset."""
+    categories = ['aeroplane', 'bicycle', 'bird', 'boat',
+                  'bottle', 'bus', 'car', 'cat', 'chair',
+                  'cow', 'diningtable', 'dog', 'horse',
+                  'motorbike', 'person', 'pottedplant',
+                  'sheep', 'sofa', 'train', 'tvmonitor']
+    import subprocess
+    script = os.path.dirname(os.path.abspath(__file__)) + '/json_dataset.py'
+    cmd = '{} {} '.format(sys.executable, script)
+    cmd += '--rec {} --gt {} '.format(args.rec, args.gt)
+    cmd += '--categories {} '.format(' '.join(categories))
+    return subprocess.call(cmd, shell=True)
+if __name__ == '__main__':
+    args = parse_args()
+    if args.rec is not None:
+        write_dataset(args)
+    if args.gt is not None:
+        write_json_dataset(args)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import time
import cv2
import dragon
import numpy as np
import xml.etree.ElementTree as ET
def make_example(image_file, xml_file):
tree = ET.parse(xml_file)
filename = os.path.split(xml_file)[-1]
objs = tree.findall('object')
example = {'id': filename.split('.')[0], 'object': []}
with open(image_file, 'rb') as f:
img_bytes = bytes(f.read())
img = cv2.imdecode(np.frombuffer(img_bytes, 'uint8'), 1)
example['height'], example['width'], example['depth'] = img.shape
example['content'] = img_bytes
for ix, obj in enumerate(objs):
bbox = obj.find('bndbox')
is_diff = 0
if obj.find('difficult') is not None:
is_diff = int(obj.find('difficult').text) == 1
example['object'].append({
'name': obj.find('name').text.strip(),
'x1': float(bbox.find('x1').text),
'y1': float(bbox.find('y1').text),
'x2': float(bbox.find('x2').text),
'y2': float(bbox.find('y2').text),
'x3': float(bbox.find('x3').text),
'y3': float(bbox.find('y3').text),
'x4': float(bbox.find('x4').text),
'y4': float(bbox.find('y4').text),
'difficult': is_diff,
})
return example
def make_record(
record_file,
images_path,
annotations_path,
splits_path,
splits
):
if os.path.exists(record_file):
raise ValueError('The record file already exists.')
os.makedirs(record_file)
if not isinstance(images_path, list):
images_path = [images_path]
if not isinstance(annotations_path, list):
annotations_path = [annotations_path]
if not isinstance(splits_path, list):
splits_path = [splits_path]
assert len(splits) == len(splits_path)
assert len(splits) == len(images_path)
assert len(splits) == len(annotations_path)
print('Start Time:', time.strftime("%a, %d %b %Y %H:%M:%S", time.gmtime()))
writer = dragon.io.KPLRecordWriter(
path=record_file,
protocol={
'id': 'string',
'content': 'bytes',
'height': 'int64',
'width': 'int64',
'depth': 'int64',
'object': [{
'name': 'string',
'x1': 'float64',
'y1': 'float64',
'x2': 'float64',
'y2': 'float64',
'x3': 'float64',
'y3': 'float64',
'x4': 'float64',
'y4': 'float64',
'difficult': 'int64',
}]
}
)
# Scan all available entries
print('Scan entries...')
entries = []
for i, split in enumerate(splits):
split_file = os.path.join(splits_path[i], split + '.txt')
with open(split_file, 'r') as f:
lines = f.readlines()
for line in lines:
filename = line.strip()
img_file = os.path.join(images_path[i], filename + '.jpg')
ann_file = os.path.join(annotations_path[i], filename + '.xml')
entries.append((img_file, ann_file))
# Parse and write into record file
print('Start Time:', time.strftime("%a, %d %b %Y %H:%M:%S", time.gmtime()))
start_time = time.time()
for i, (img_file, ann_file) in enumerate(entries):
if i > 0 and i % 2000 == 0:
now_time = time.time()
print('{} / {} in {:.2f} sec'.format(
i, len(entries), now_time - start_time))
writer.write(make_example(img_file, ann_file))
now_time = time.time()
print('{} / {} in {:.2f} sec'.format(
len(entries), len(entries), now_time - start_time))
writer.close()
end_time = time.time()
data_size = os.path.getsize(record_file + '/root.data') * 1e-6
print('{} images take {:.2f} MB in {:.2f} sec.'
.format(len(entries), data_size, end_time - start_time))
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Make record file for VOC dataset."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from os import path as osp
from maker import make_record
if __name__ == '__main__':
voc_root = '/data'
make_record(
record_file=osp.join(voc_root, 'voc_0712_trainval'),
images_path=[osp.join(voc_root, 'VOCdevkit2007/VOC2007/JPEGImages'),
osp.join(voc_root, 'VOCdevkit2012/VOC2012/JPEGImages')],
annotations_path=[osp.join(voc_root, 'VOCdevkit2007/VOC2007/Annotations'),
osp.join(voc_root, 'VOCdevkit2012/VOC2012/Annotations')],
splits_path=[osp.join(voc_root, 'VOCdevkit2007/VOC2007/ImageSets/Main'),
osp.join(voc_root, 'VOCdevkit2012/VOC2012/ImageSets/Main')],
splits=['trainval', 'trainval']
)
make_record(
record_file=osp.join(voc_root, 'voc_2007_test'),
images_path=osp.join(voc_root, 'VOCdevkit2007/VOC2007/JPEGImages'),
annotations_path=osp.join(voc_root, 'VOCdevkit2007/VOC2007/Annotations'),
splits_path=osp.join(voc_root, 'VOCdevkit2007/VOC2007/ImageSets/Main'),
splits=['test']
)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.core.config import cfg
class AnchorSampler(object):
"""Sample precomputed anchors asynchronously."""
def __init__(self):
self._rpn_target = None
self._retinanet_target = None
self._ssd_target = None
if 'rcnn' in cfg.MODEL.TYPE:
from seetadet.algo.faster_rcnn import anchor_target
self._rpn_target = anchor_target.AnchorTarget()
elif cfg.MODEL.TYPE == 'retinanet':
from seetadet.algo.retinanet import anchor_target
self._retinanet_target = anchor_target.AnchorTarget()
elif cfg.MODEL.TYPE == 'ssd':
from seetadet.algo.ssd import anchor_target
self._ssd_target = anchor_target.AnchorTarget()
def __call__(self, **inputs):
"""Return the sample anchors."""
if self._rpn_target:
fg_inds, bg_inds = \
self._rpn_target.sample_anchors(
gt_boxes=inputs['gt_boxes'],
im_info=inputs['im_info'],
)
return {'fg_inds': fg_inds, 'bg_inds': bg_inds}
if self._retinanet_target:
fg_inds, ignore_inds = \
self._retinanet_target.sample_anchors(
gt_boxes=inputs['gt_boxes'],
im_info=inputs['im_info'],
)
return {'fg_inds': fg_inds, 'bg_inds': ignore_inds}
if self._ssd_target:
fg_inds, neg_inds = \
self._ssd_target.sample_anchors(
gt_boxes=inputs['gt_boxes'],
)
return {'fg_inds': fg_inds, 'bg_inds': neg_inds}
return {}
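A minimal usage sketch (hypothetical inputs; assumes a config has been loaded so that `cfg.MODEL.TYPE` selects one of the branches above):

```python
import numpy as np
from seetadet.algo.common import AnchorSampler

# One hypothetical ground-truth box (x1, y1, x2, y2, class) and the
# (height, width, scale) info produced by the data transformer.
gt_boxes = np.array([[10., 20., 200., 180., 1.]], dtype='float32')
im_info = (600, 800, 1.0)

sampler = AnchorSampler()  # Picks the target module from cfg.MODEL.TYPE.
targets = sampler(gt_boxes=gt_boxes, im_info=im_info)
# For an rcnn-style model this yields {'fg_inds': ..., 'bg_inds': ...};
# an empty dict is returned when no sampler matches cfg.MODEL.TYPE.
```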
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import math
import numpy as np
import numpy.random as npr
from seetadet.algo.faster_rcnn import generate_anchors as anchor_util
from seetadet.algo.faster_rcnn import utils as rcnn_util
from seetadet.core.config import cfg
from seetadet.utils import boxes as box_util
from seetadet.utils.env import new_tensor
class AnchorTarget(object):
"""Assign ground-truth targets to anchors."""
def __init__(self):
super(AnchorTarget, self).__init__()
# Load the basic configs
self.scales = cfg.RPN.SCALES
self.strides = cfg.RPN.STRIDES
self.ratios = cfg.RPN.ASPECT_RATIOS
self.num_strides = len(self.strides)
# Generate base anchors
self.base_anchors = []
for i in range(self.num_strides):
self.base_anchors.append(
anchor_util.generate_anchors(
self.strides[i],
self.ratios,
np.array([self.scales[i]])
if self.num_strides > 1
else np.array(self.scales)))
# Plan the maximum shifted anchor layout
max_size = max(cfg.TRAIN.MAX_SIZE, max(cfg.TRAIN.SCALES))
if cfg.MODEL.COARSEST_STRIDE > 0:
stride = float(cfg.MODEL.COARSEST_STRIDE)
max_size = int(math.ceil(max_size / stride) * stride)
self.max_shapes = [[math.ceil(max_size / stride)] * 2
for stride in self.strides]
self.all_coords = rcnn_util.get_shifted_coords(
self.max_shapes, self.base_anchors)
self.all_anchors = rcnn_util.get_shifted_anchors(
self.max_shapes, self.base_anchors, self.strides)
def sample_anchors(self, gt_boxes, im_info, all_anchors=None):
if all_anchors is None:
all_anchors = self.all_anchors
# Only keep anchors inside the image
# to get higher quality proposals.
inds_inside = np.where(
(all_anchors[:, 0] >= 0) &
(all_anchors[:, 1] >= 0) &
(all_anchors[:, 2] < im_info[1]) &
(all_anchors[:, 3] < im_info[0]))[0]
anchors = all_anchors[inds_inside, :]
num_inside = len(inds_inside)
labels = np.empty((num_inside,), 'int32')
labels.fill(-1)
# Overlaps between the anchors and the gt boxes.
overlaps = box_util.bbox_overlaps(anchors, gt_boxes)
argmax_overlaps = overlaps.argmax(axis=1)
max_overlaps = overlaps[np.arange(num_inside), argmax_overlaps]
# Overlaps between the gt boxes and anchors with highest IoU.
gt_argmax_overlaps = overlaps.argmax(axis=0)
gt_max_overlaps = overlaps[gt_argmax_overlaps, np.arange(overlaps.shape[1])]
gt_argmax_overlaps = np.where(overlaps == gt_max_overlaps)[0]
# Foreground: for each gt, anchor with highest overlap.
labels[gt_argmax_overlaps] = 1
# Foreground: above threshold IoU.
labels[max_overlaps >= cfg.RPN.POSITIVE_OVERLAP] = 1
# Background: below threshold IoU.
labels[max_overlaps < cfg.RPN.NEGATIVE_OVERLAP] = 0
# If the thresholds left no foreground anchors, fall back to the best match for each gt.
fg_inds = np.where(labels == 1)[0]
if len(fg_inds) == 0:
labels[gt_argmax_overlaps] = 1
fg_inds = np.where(labels == 1)[0]
# Subsample positive labels if we have too many.
num_fg = int(cfg.RPN.FG_FRACTION * cfg.RPN.BATCH_SIZE)
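# e.g. with the common FG_FRACTION=0.5 and BATCH_SIZE=256, at most 128 anchors stay foreground.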
if len(fg_inds) > num_fg:
fg_inds = npr.choice(fg_inds, num_fg, False)
# Subsample negative labels if we have too many.
num_bg = cfg.RPN.BATCH_SIZE - len(fg_inds)
bg_inds = np.where(labels == 0)[0]
if len(bg_inds) > num_bg:
bg_inds = npr.choice(bg_inds, num_bg, False)
return inds_inside[fg_inds], inds_inside[bg_inds]
def __call__(self, **inputs):
num_images = cfg.TRAIN.IMS_PER_BATCH
shapes = [f.shape[-2:] for f in inputs['features']]
image_stride = sum(self.base_anchors[i].shape[0] * np.prod(shapes[i])
for i in range(len(inputs['features'])))
narrow_args = [self.all_coords, self.base_anchors, self.max_shapes, shapes]
outputs = collections.defaultdict(list)
for ix in range(num_images):
fg_inds = inputs['fg_inds'][ix]
bg_inds = inputs['bg_inds'][ix]
gt_boxes = inputs['gt_boxes'][ix]
# Narrow anchors to match the feature layout
anchors = self.all_anchors[fg_inds]
bg_inds = rcnn_util.narrow_anchors(*(narrow_args + [bg_inds]))
_, anchors = rcnn_util.narrow_anchors(*(narrow_args + [fg_inds, anchors]))
fg_inds = rcnn_util.narrow_anchors(*(narrow_args + [fg_inds]))
# Compute bbox targets
gt_assignment = box_util.bbox_overlaps(anchors, gt_boxes).argmax(axis=1)
bbox_targets = box_util.bbox_transform(anchors, gt_boxes[gt_assignment, :4])
outputs['bbox_anchors'].append(anchors)
outputs['bbox_targets'].append(bbox_targets)
# Compute sparse indices
fg_inds += ix * image_stride
bg_inds += ix * image_stride
outputs['cls_inds'].extend([fg_inds, bg_inds])
outputs['bbox_inds'].extend([fg_inds])
outputs['labels'].extend([np.ones_like(fg_inds, 'float32'),
np.zeros_like(bg_inds, 'float32')])
return {
'labels': new_tensor(
np.concatenate(outputs['labels'])),
'cls_inds': new_tensor(
np.concatenate(outputs['cls_inds'])),
'bbox_inds': new_tensor(
np.concatenate(outputs['bbox_inds'])),
'bbox_targets': new_tensor(
np.concatenate(outputs['bbox_targets']).astype('float32')),
'bbox_anchors': new_tensor(
np.concatenate(outputs['bbox_anchors']).astype('float32')),
}
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import multiprocessing as mp
import time
import threading
import queue
import dragon
import dragon.vm.torch as torch
import numpy as np
from seetadet.algo.faster_rcnn import data_transformer
from seetadet.core.config import cfg
from seetadet.datasets.factory import get_dataset
from seetadet.utils import blob as blob_util
from seetadet.utils import logger
class DataLoader(object):
"""Load mini-batches of data."""
def __init__(self):
super(DataLoader, self).__init__()
dataset = get_dataset(cfg.TRAIN.DATASET)
self.iterator = Iterator(**{
'dataset': dataset.cls,
'source': dataset.source,
'classes': dataset.classes,
'shuffle': cfg.TRAIN.USE_SHUFFLE,
'batch_size': cfg.TRAIN.IMS_PER_BATCH * 2,
'num_transformers': cfg.TRAIN.NUM_THREADS - 1,
})
self.iterator.start()
def __call__(self):
outputs = self.iterator.next()
if isinstance(outputs['image'], np.ndarray):
outputs['image'] = torch.from_numpy(outputs['image'])
return outputs
class Iterator(threading.Thread):
"""Iterator to return the batch of data."""
def __init__(self, **kwargs):
super(Iterator, self).__init__()
# Distributed settings
rank, group_size = 0, 1
process_group = dragon.distributed.get_group()
if process_group is not None and \
kwargs.get('phase', 'TRAIN') == 'TRAIN':
group_size = process_group.size
rank = dragon.distributed.get_rank(process_group)
# Configuration
self._batch_size = kwargs.get('batch_size', 2)
self._num_readers = kwargs.get('num_readers', 1)
self._num_transformers = kwargs.get('num_transformers', 3)
self.daemon = True
# Initialize queues
num_batches = self._num_readers
self._queue1 = mp.Queue(num_batches * self._batch_size)
self._queue2 = mp.Queue(num_batches * self._batch_size)
self._queue3 = queue.Queue(num_batches)
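# Pipeline: readers -> queue1 -> transformers -> queue2 -> this thread -> queue3 -> consumer.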
# Initialize readers
self._readers = []
for i in range(self._num_readers):
part_idx, num_parts = i, self._num_readers
num_parts *= group_size
part_idx += rank * self._num_readers
self._readers.append(dragon.io.DataReader(
part_idx=part_idx, num_parts=num_parts, **kwargs))
self._readers[i]._seed += part_idx
self._readers[i].q_out = self._queue1
self._readers[i].start()
time.sleep(0.1)
# Initialize transformers
self._transformers = []
for i in range(self._num_transformers):
p = data_transformer.DataTransformer(**kwargs)
p._seed += (i + rank * self._num_transformers)
p.q_in, p.q_out = self._queue1, self._queue2
p.start()
self._transformers.append(p)
time.sleep(0.1)
# Register cleanup callbacks
def cleanup():
def terminate(processes):
for p in processes:
p.terminate()
p.join()
terminate(self._transformers)
logger.info('Terminate DataTransformer.')
terminate(self._readers)
logger.info('Terminate DataReader.')
import atexit
atexit.register(cleanup)
def next(self):
"""Return the next batch of data."""
return self.__next__()
def run(self):
"""Main loop."""
num_images = cfg.TRAIN.IMS_PER_BATCH
num_batches = cfg.TRAIN.ASPECT_GROUPING
logger.info('Initialize prefetching batches...')
example_buffer = [self._queue2.get()
for _ in range(num_images * num_batches)]
next_examples = []
while True:
# Use cached buffer for next N examples
# Examples are sorted to simulate aspect grouping
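# e.g. with IMS_PER_BATCH=2 and ASPECT_GROUPING=64, 128 examples are buffered and re-sorted per refill.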
if len(next_examples) == 0:
next_examples = example_buffer
next_examples.sort(key=lambda d: d['aspect_ratio'])
example_buffer = []
# Prepare the next batch
outputs = collections.defaultdict(list)
for i in range(num_images):
example = next_examples.pop(0)
outputs['image'].append(example['image'])
outputs['gt_boxes'].append(example['boxes'])
outputs['im_info'].append(example['im_info'])
outputs['fg_inds'].append(example.get('fg_inds', None))
outputs['bg_inds'].append(example.get('bg_inds', None))
example_buffer.append(self._queue2.get())
outputs['image'] = blob_util.im_list_to_blob(
outputs['image'], coarsest_stride=cfg.MODEL.COARSEST_STRIDE)
# Send batch data to consumer
self._queue3.put(outputs)
def __iter__(self):
"""Return the iterator self."""
return self
def __next__(self):
"""Return the next batch of data."""
return self._queue3.get()
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import multiprocessing
import cv2
import numpy as np
import numpy.random as npr
from seetadet.algo import common as algo_common
from seetadet.core.config import cfg
from seetadet.datasets.example import Example
from seetadet.utils import boxes as box_util
from seetadet.utils import image as image_util
class DataTransformer(multiprocessing.Process):
"""DataTransformer."""
def __init__(self, **kwargs):
super(DataTransformer, self).__init__()
self._scales = cfg.TRAIN.SCALES
self._random_scales = cfg.TRAIN.RANDOM_SCALES
self._max_size = cfg.TRAIN.MAX_SIZE
self._seed = cfg.RNG_SEED
self._use_diff = cfg.TRAIN.USE_DIFF
self._use_flipped = cfg.TRAIN.USE_FLIPPED
self._use_distort = cfg.TRAIN.USE_COLOR_JITTER
self._classes = kwargs.get('classes', ('__background__',))
self._num_classes = len(self._classes)
self._class_to_ind = dict(zip(self._classes, range(self._num_classes)))
self._anchor_sampler = algo_common.AnchorSampler()
self.q_in = self.q_out = None
self.daemon = True
def get_boxes(self, example, im_scale, im_offset, flipped):
objects, num_objects = example.objects, 0
height, width = example.height, example.width
if not self._use_diff:
for obj in objects:
if obj.get('difficult', 0) == 0:
num_objects += 1
else:
num_objects = len(objects)
boxes = np.zeros((num_objects, 4), 'float32')
gt_classes = np.zeros((num_objects,), 'float32')
# Filter the difficult instances.
object_idx = 0
for obj in objects:
if not self._use_diff and obj.get('difficult', 0) > 0:
continue
bbox = obj['bbox']
boxes[object_idx, :] = [max(0, bbox[0]),
max(0, bbox[1]),
min(bbox[2], width - 1),
min(bbox[3], height - 1)]
gt_classes[object_idx] = self._class_to_ind[obj['name']]
object_idx += 1
# Flip the boxes if necessary.
if flipped:
boxes = box_util.flip_boxes(boxes, width)
# Scale the boxes to the detecting scale.
boxes *= im_scale
# Offset the boxes to align the cropping.
if im_offset is not None:
boxes[:, 0::2] += im_offset[1]
boxes[:, 1::2] += im_offset[0]
boxes[:, :] = np.minimum(
np.maximum(boxes[:, :], 0),
[im_offset[2][1] - 1, im_offset[2][0] - 1] * 2)
# Attach the classes.
gt_boxes = np.empty((num_objects, 5), dtype=np.float32)
gt_boxes[:, :4], gt_boxes[:, 4] = boxes, gt_classes
return gt_boxes
def get(self, example):
example = Example(example)
# Resize.
target_size = npr.choice(self._scales)
img, im_scale = image_util.resize_image_with_target_size(
example.image,
target_size=target_size,
max_size=self._max_size,
random_scales=self._random_scales,
)
# Flip.
flipped = False
if self._use_flipped and npr.randint(2) > 0:
img = img[:, ::-1]
flipped = True
# Crop or Pad.
im_offset = None
if self._max_size == 0:
img, im_offset = image_util.get_image_with_target_size(
img, target_size)
# Distort.
if self._use_distort:
img = image_util.distort_image(img)
# Boxes.
boxes = self.get_boxes(example, im_scale, im_offset, flipped)
# Standard outputs.
outputs = {'image': img,
'boxes': boxes,
'im_info': img.shape[:2] + (im_scale,)}
# Attach precomputed targets.
if len(boxes) > 0:
outputs.update(
self._anchor_sampler(
gt_boxes=boxes,
im_info=outputs['im_info']))
return outputs
def run(self):
# Disable the opencv threading.
cv2.setNumThreads(1)
# Fix the process-local random seed.
np.random.seed(self._seed)
# Main prefetch loop
while True:
outputs = self.get(self.q_in.get())
if len(outputs['boxes']) < 1:
continue # Ignore non-object image.
height, width = outputs['image'].shape[:2]
outputs['aspect_ratio'] = float(height) / float(width)
self.q_out.put(outputs)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# Codes are based on:
#
# <https://github.com/rbgirshick/py-faster-rcnn/blob/master/lib/rpn/generate_anchors.py>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
# Verify that we compute the same anchors as Shaoqing's matlab implementation:
#
# >> load output/rpn_cachedir/faster_rcnn_VOC2007_ZF_stage1_rpn/anchors.mat
# >> anchors
#
# anchors =
#
# -83 -39 100 56
# -175 -87 192 104
# -359 -183 376 200
# -55 -55 72 72
# -119 -119 136 136
# -247 -247 264 264
# -35 -79 52 96
# -79 -167 96 184
# -167 -343 184 360
# array([[ -83., -39., 100., 56.],
# [-175., -87., 192., 104.],
# [-359., -183., 376., 200.],
# [ -55., -55., 72., 72.],
# [-119., -119., 136., 136.],
# [-247., -247., 264., 264.],
# [ -35., -79., 52., 96.],
# [ -79., -167., 96., 184.],
# [-167., -343., 184., 360.]])
def generate_anchors(
base_size=16,
ratios=(0.5, 1, 2),
scales=2**np.arange(3, 6),
):
"""
Generate anchor (reference) windows by enumerating aspect ratios X
scales wrt a reference (0, 0, 15, 15) window.
"""
base_anchor = np.array([1, 1, base_size, base_size]) - 1
ratio_anchors = _ratio_enum(base_anchor, ratios)
anchors = np.vstack([_scale_enum(ratio_anchors[i, :], scales)
for i in range(ratio_anchors.shape[0])])
return anchors
def generate_anchors_v2(
stride=16,
ratios=(0.5, 1, 2),
sizes=(32, 64, 128, 256, 512),
):
"""
Generates a matrix of anchor boxes in (x1, y1, x2, y2) format. Anchors
are centered on stride / 2, have (approximate) sqrt areas of the specified
sizes, and aspect ratios as given.
"""
return generate_anchors(
base_size=stride,
ratios=ratios,
scales=np.array(sizes, dtype='float64') / stride,
)
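# e.g. generate_anchors_v2(stride=16, sizes=(128,)) is equivalent to
# generate_anchors(base_size=16, scales=np.array([8.])): one anchor per
# aspect ratio with sqrt(area) close to 128.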
def _whctrs(anchor):
"""Return width, height, x center, and y center for an anchor (window)."""
w = anchor[2] - anchor[0] + 1
h = anchor[3] - anchor[1] + 1
x_ctr = anchor[0] + 0.5 * (w - 1)
y_ctr = anchor[1] + 0.5 * (h - 1)
return w, h, x_ctr, y_ctr
def _mkanchors(ws, hs, x_ctr, y_ctr):
"""
Given a vector of widths (ws) and heights (hs) around a center
(x_ctr, y_ctr), output a set of anchors (windows).
"""
ws = ws[:, np.newaxis]
hs = hs[:, np.newaxis]
anchors = np.hstack((x_ctr - 0.5 * (ws - 1),
y_ctr - 0.5 * (hs - 1),
x_ctr + 0.5 * (ws - 1),
y_ctr + 0.5 * (hs - 1)))
return anchors
def _ratio_enum(anchor, ratios):
"""Enumerate a set of anchors for each aspect ratio wrt an anchor."""
w, h, x_ctr, y_ctr = _whctrs(anchor)
size = w * h
size_ratios = size / ratios
ws = np.round(np.sqrt(size_ratios))
hs = np.round(ws * ratios)
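# e.g. for the (0, 0, 15, 15) base anchor and ratios (0.5, 1, 2), this
# yields (w, h) pairs of (23, 12), (16, 16), and (11, 22).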
anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
return anchors
def _scale_enum(anchor, scales):
"""Enumerate a set of anchors for each scale wrt an anchor."""
w, h, x_ctr, y_ctr = _whctrs(anchor)
ws = w * scales
hs = h * scales
anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
return anchors
if __name__ == '__main__':
print(generate_anchors())
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import numpy as np
from seetadet.algo.faster_rcnn import generate_anchors as anchor_util
from seetadet.algo.faster_rcnn import utils as rcnn_util
from seetadet.core.config import cfg
from seetadet.utils import boxes as box_util
from seetadet.utils import nms
class Proposal(object):
"""Compute proposals by applying transformations anchors."""
def __init__(self):
super(Proposal, self).__init__()
# Load basic configs
self.scales = cfg.RPN.SCALES
self.strides = cfg.RPN.STRIDES
self.ratios = cfg.RPN.ASPECT_RATIOS
self.num_strides = len(self.strides)
self.defaults = collections.OrderedDict([
('rois', np.array([[-1, 0, 0, 1, 1]], 'float32'))])
self.bbox_transform_clip = \
np.log(max(cfg.TRAIN.MAX_SIZE,
max(cfg.TRAIN.SCALES)) / min(self.strides))
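# Clamp size deltas at log(max_size / min_stride) so exp(dw), exp(dh) stay finite and image-bounded.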
# Generate base anchors
self.base_anchors = []
for i in range(self.num_strides):
self.base_anchors.append(
anchor_util.generate_anchors(
self.strides[i],
self.ratios,
np.array([self.scales[i]])
if self.num_strides > 1
else np.array(self.scales)))
def __call__(self, **inputs):
num_images = cfg.TRAIN.IMS_PER_BATCH
pre_nms_top_n = cfg.TRAIN.RPN_PRE_NMS_TOP_N
post_nms_top_n = cfg.TRAIN.RPN_POST_NMS_TOP_N
nms_thresh = cfg.TRAIN.RPN_NMS_THRESH
# Get resources
shapes = [f.shape[-2:] for f in inputs['features']]
all_anchors = rcnn_util.get_shifted_anchors(
shapes, self.base_anchors, self.strides)
# Prepare for the outputs
batch_rois = []
cls_prob = inputs['cls_prob'].numpy()
# (?, 4, A * K) -> (?, A * K, 4)
bbox_pred = inputs['bbox_pred'].numpy()
bbox_pred = bbox_pred.transpose((0, 2, 1))
# Extract RoIs separately
for ix in range(num_images):
# [?, N] -> [? * N, 1]
scores = cls_prob[ix].reshape((-1, 1))
deltas = bbox_pred[ix]
im_info = inputs['im_info'][ix]
if pre_nms_top_n <= 0 or pre_nms_top_n >= len(scores):
order = np.argsort(-scores.squeeze())
else:
# Avoid sorting possibly large arrays; first partition to get the top K
# unsorted, then sort just those (~20x faster for 200k scores).
inds = np.argpartition(-scores.squeeze(), pre_nms_top_n)[:pre_nms_top_n]
order = np.argsort(-scores[inds].squeeze())
order = inds[order]
deltas = deltas[order]
anchors = all_anchors[order]
scores = scores[order]
# Convert anchors into proposals via bbox transformations
proposals = box_util.bbox_transform_inv(
anchors, deltas, clip=self.bbox_transform_clip)
# Clip predicted boxes to image
proposals = box_util.clip_tiled_boxes(proposals, im_info[:2])
# Apply nms (e.g. threshold = 0.7)
# Take after_nms_topN (e.g. 300)
# Return the top proposals (-> RoIs top)
keep = nms.gpu_nms(np.hstack((proposals, scores)), nms_thresh)
if post_nms_top_n > 0:
keep = keep[:post_nms_top_n]
proposals = proposals[keep, :]
# Attach RoIs with batch indices
batch_inds = np.empty((proposals.shape[0], 1), 'float32')
batch_inds.fill(ix)
rpn_rois = np.hstack((batch_inds, proposals.astype('float32', copy=False)))
batch_rois.append(rpn_rois)
# Merge RoIs into a blob
return np.concatenate(batch_rois, 0)
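# Added layout note: each row of the merged blob is
# (batch_idx, x1, y1, x2, y2), so consumers can recover per-image
# proposals from the first column, e.g.:
#   rois_i = rois[rois[:, 0].astype('int32') == i]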
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import numpy as np
import numpy.random as npr
from seetadet.algo.faster_rcnn import utils as rcnn_util
from seetadet.core.config import cfg
from seetadet.utils import boxes as box_util
from seetadet.utils.env import new_tensor
class ProposalTarget(object):
"""Assign ground-truth targets to proposals."""
def __init__(self):
super(ProposalTarget, self).__init__()
self.num_strides = len(cfg.RPN.STRIDES)
self.num_classes = len(cfg.MODEL.CLASSES)
self.defaults = collections.OrderedDict([
('rois', np.array([[-1, 0, 0, 1, 1]], 'float32')),
('labels', np.array([-1], 'int64')),
('bbox_targets', np.zeros((1, 4), 'float32')),
])
def __call__(self, **inputs):
num_images = cfg.TRAIN.IMS_PER_BATCH
# Proposal ROIs (0, x1, y1, x2, y2) coming from RPN
all_rois = inputs['rois']
# Prepare for the outputs
keys = self.defaults.keys()
blobs = {key: [] for key in keys}
# Generate targets separately
for ix in range(num_images):
# GT boxes (x1, y1, x2, y2, label)
gt_boxes = inputs['gt_boxes'][ix]
# Extract proposals for this image
rois = all_rois[np.where(all_rois[:, 0].astype('int32') == ix)[0]]
# Include ground-truth boxes in the set of candidate rois
inds = np.ones((gt_boxes.shape[0], 1), gt_boxes.dtype) * ix
rois = np.vstack((rois, np.hstack((inds, gt_boxes[:, :4]))))
# Sample a batch of RoIs for training
rois_per_image = cfg.FRCNN.BATCH_SIZE
fg_rois_per_image = np.round(cfg.FRCNN.FG_FRACTION * rois_per_image)
rcnn_util.map_returns_to_blobs(
sample_rois(rois,
gt_boxes,
rois_per_image,
fg_rois_per_image),
blobs, keys,
)
# Stack into continuous blobs
blobs = dict((k, np.concatenate(blobs[k])) for k in blobs.keys())
if self.num_strides > 1:
# Distribute RoIs into pyramids
min_lvl = cfg.FPN.ROI_MIN_LEVEL
max_lvl = cfg.FPN.ROI_MAX_LEVEL
num_levels = max_lvl - min_lvl + 1
levels = rcnn_util.map_rois_to_levels(blobs['rois'], min_lvl, max_lvl)
lvl_blobs = rcnn_util.map_blobs_by_levels(
blobs,
self.defaults,
[np.where(levels == (i + min_lvl))[0] for i in range(num_levels)],
)
blobs = dict((k, np.concatenate(lvl_blobs[k])) for k in blobs.keys())
rois_wide = [lvl_blobs['rois'][i] for i in range(num_levels)]
else:
# Return RoIs directly for specified stride
rois_wide = [blobs['rois']]
# Select the foreground RoIs only for bbox branch
fg_inds = np.where(blobs['labels'] > 0)[0]
cls_inds = np.arange(len(blobs['rois'])) * self.num_classes
return {
'rois': [new_tensor(rois) for rois in rois_wide],
'labels': new_tensor(blobs['labels']),
'bbox_inds': new_tensor(cls_inds[fg_inds] + blobs['labels'][fg_inds]),
'bbox_targets': new_tensor(blobs['bbox_targets'][fg_inds].astype('float32')),
'bbox_anchors': new_tensor(blobs['rois'][fg_inds, 1:].astype('float32')),
}
def sample_rois(all_rois, gt_boxes, num_rois, num_fg_rois):
"""Sample a batch of RoIs comprising foreground and background examples."""
overlaps = box_util.bbox_overlaps(all_rois[:, 1:5], gt_boxes[:, :4])
gt_assignment = overlaps.argmax(axis=1)
max_overlaps = overlaps.max(axis=1)
labels = gt_boxes[gt_assignment, 4].astype('int64')
# Select foreground RoIs as those with >= POSITIVE_OVERLAP
fg_thresh = cfg.FRCNN.POSITIVE_OVERLAP
fg_inds = np.where(max_overlaps >= fg_thresh)[0]
while fg_inds.size == 0:
fg_thresh -= 0.01
fg_inds = np.where(max_overlaps >= fg_thresh)[0]
# Sample foreground regions without replacement
fg_rois_per_this_image = int(min(num_fg_rois, fg_inds.size))
fg_inds = npr.choice(fg_inds, fg_rois_per_this_image, False)
# Select background RoIs as those within
# [NEGATIVE_OVERLAP_LO, NEGATIVE_OVERLAP_HI)
bg_inds = np.where((max_overlaps < cfg.FRCNN.NEGATIVE_OVERLAP_HI) &
(max_overlaps >= cfg.FRCNN.NEGATIVE_OVERLAP_LO))[0]
# Compute number of background RoIs to take from this image
bg_rois_per_this_image = num_rois - fg_rois_per_this_image
bg_rois_per_this_image = min(bg_rois_per_this_image, bg_inds.size)
# Sample background regions without replacement
if bg_inds.size > 0:
bg_inds = npr.choice(bg_inds, bg_rois_per_this_image, False)
# The indices that we're selecting (both fg and bg)
keep_inds = np.append(fg_inds, bg_inds)
# Select sampled values from various arrays
rois, labels = all_rois[keep_inds], labels[keep_inds]
# Clamp labels for the background RoIs to 0
labels[fg_rois_per_this_image:] = 0
# Compute the target from RoIs
outputs = [rois, labels]
outputs += [box_util.bbox_transform(
rois[:, 1:5],
gt_boxes[gt_assignment[keep_inds], :4],
cfg.BBOX_REG_WEIGHTS)]
return outputs
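# Added note: the return order [rois (R, 5), labels (R,),
# bbox_targets (R, 4)] matches ProposalTarget.defaults, which is what
# lets map_returns_to_blobs() pair values with blob keys positionally.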
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import types
import dragon.vm.torch as torch
import numpy as np
from seetadet.core.config import cfg
from seetadet.modeling.detector import new_detector
from seetadet.utils import blob as blob_util
from seetadet.utils import boxes as box_util
from seetadet.utils import image as image_util
from seetadet.utils import logger
from seetadet.utils import nms as nms_util
from seetadet.utils import time_util
def get_data(raw_images):
"""Return the test data."""
max_size = cfg.TEST.MAX_SIZE
images_wide = []
image_shapes_wide, image_scales_wide = [], []
for img in raw_images:
images, image_scales = image_util.scale_image(
img, scales=cfg.TEST.SCALES, max_size=max_size)
images_wide += images
image_scales_wide += image_scales
image_shapes_wide += [img.shape[:2] for img in images]
images = blob_util.im_list_to_blob(
images_wide, coarsest_stride=cfg.MODEL.COARSEST_STRIDE)
image_shapes = np.array(image_shapes_wide)
image_scales = np.array(image_scales_wide).reshape((len(images), -1))
images_info = np.hstack([image_shapes, image_scales]).astype('float32')
return images, images_info
def ims_detect(detector, raw_images, timer=None):
"""Detect images at single or multiple scales."""
images, images_info = get_data(raw_images)
timer.tic() if timer else timer
# Do forward.
inputs = {'image': torch.from_numpy(images),
'im_info': torch.from_numpy(images_info)}
if not hasattr(detector, 'script_forward'):
def script_forward(self, image, im_info):
return self.forward({'image': image, 'im_info': im_info})
detector.script_forward = torch.jit.trace(
func=types.MethodType(script_forward, detector),
example_inputs=[inputs['image'], inputs['im_info']],
)
outputs = detector.script_forward(inputs['image'], inputs['im_info'])
outputs = dict((k, outputs[k].numpy()) for k in outputs.keys())
# Decode results.
batch_pred = box_util.bbox_transform_inv(
outputs['rois'][:, 1:5],
outputs['bbox_pred'],
cfg.BBOX_REG_WEIGHTS)
results = [([], []) for _ in range(len(raw_images))]
for i in range(len(images)):
ii = i // len(cfg.TEST.SCALES)
inds = np.where(outputs['rois'][:, 0].astype(np.int32) == i)[0]
boxes = batch_pred[inds] / images_info[i][2]
boxes = box_util.clip_tiled_boxes(boxes, raw_images[ii].shape)
results[ii][0].append(outputs['cls_prob'][inds])
results[ii][1].append(boxes)
# Merge from multiple scales.
ret = [(np.vstack(s), np.vstack(b)) for s, b in results]
timer.toc() if timer else timer
return ret
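# Added note: per raw image, the merged pair is
#   scores: (N, num_classes) class probabilities
#   boxes:  (N, 4 * num_classes) class-wise box predictions,
# where N sums the proposals over all test scales of that image.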
def get_detections(outputs):
"""Return the categorical detections from outputs."""
scores, boxes = outputs
boxes_this_image = [[]]
empty_detections = np.zeros((0, 5), 'float32')
for j in range(1, len(cfg.MODEL.CLASSES)):
inds = np.where(scores[:, j] > cfg.TEST.SCORE_THRESH)[0]
if len(inds) == 0:
boxes_this_image.append(empty_detections)
continue
cls_scores = scores[inds, j]
cls_boxes = boxes[inds, j * 4:(j + 1) * 4]
cls_detections = np.hstack(
(cls_boxes, cls_scores[:, np.newaxis])) \
.astype(np.float32, copy=False)
if cfg.TEST.USE_SOFT_NMS:
keep = nms_util.soft_nms(
cls_detections,
thresh=cfg.TEST.NMS,
method=cfg.TEST.SOFT_NMS_METHOD,
sigma=cfg.TEST.SOFT_NMS_SIGMA,
)
else:
keep = nms_util.nms(
cls_detections,
thresh=cfg.TEST.NMS,
)
cls_detections = cls_detections[keep, :]
boxes_this_image.append(cls_detections)
return [boxes_this_image]
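# Added example of the returned structure (background kept at index 0):
#   boxes_this_image[0] == []  # background placeholder
#   boxes_this_image[j] is a (N_j, 5) array of [x1, y1, x2, y2, score]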
def test_net(weights, q_in, q_out, device, root_logger=True):
"""Test a network trained with Faster R-CNN algorithm."""
cfg.GPU_ID = device
logger.set_root_logger(root_logger)
detector = new_detector(device, weights)
timers = time_util.new_timers('im_detect_bbox', 'misc')
must_stop = False
while not must_stop:
indices, raw_images = [], []
for _ in range(cfg.TEST.IMS_PER_BATCH):
i, raw_image = q_in.get()
if i < 0:
must_stop = True
break
indices.append(i)
raw_images.append(raw_image)
if len(raw_images) == 0:
continue
# Detect on specific scales.
all_outputs = ims_detect(
detector=detector,
raw_images=raw_images,
timer=timers['im_detect_bbox'],
)
# Post-processing.
for i, outputs in enumerate(all_outputs):
with timers['misc'].tic_and_toc():
boxes_this_image, = get_detections(outputs)
q_out.put((
indices[i],
dict([('im_detect', timers['im_detect_bbox'].average_time),
('misc', timers['misc'].average_time)]),
dict([('boxes', boxes_this_image)]),
))
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import numpy as np
from seetadet.core.config import cfg
def get_shifted_coords(shapes, base_anchors):
"""Return the x-y coordinates of shifted anchors."""
xs, ys = [], []
for i in range(len(shapes)):
height, width = shapes[i]
x, y = np.arange(0, width), np.arange(0, height)
x, y = np.meshgrid(x, y)
# Add A anchors (A,) to cell K shifts (K,)
# to get shift coords (A, K)
xs.append(np.tile(x.flatten(), base_anchors[i].shape[0]))
ys.append(np.tile(y.flatten(), base_anchors[i].shape[0]))
return np.concatenate(xs), np.concatenate(ys)
def get_shifted_anchors(shapes, base_anchors, strides):
"""Return the shifted anchors on given shapes."""
anchors_to_pack = []
for i in range(len(shapes)):
height, width = shapes[i]
shift_x = np.arange(0, width) * strides[i]
shift_y = np.arange(0, height) * strides[i]
shift_x, shift_y = np.meshgrid(shift_x, shift_y)
shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
shift_x.ravel(), shift_y.ravel())).transpose()
# Add A anchors (A, 1, 4) to cell K shifts (1, K, 4)
# to get shift anchors (A, K, 4)
a = base_anchors[i].shape[0]
k = shifts.shape[0]
anchors = (base_anchors[i].reshape((a, 1, 4)) +
shifts.reshape((1, k, 4)))
anchors_to_pack.append(anchors.reshape((a * k, 4)))
return np.vstack(anchors_to_pack)
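# Added worked example (hypothetical numbers): for a 2x2 feature map with
# stride 16, the shifts enumerate cell origins (0,0), (16,0), (0,16),
# (16,16); broadcasting (A, 1, 4) + (1, 4, 4) gives (A, 4, 4), flattened
# anchor-major to (A * 4, 4), matching the np.tile() ordering used by
# get_shifted_coords() above.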
def narrow_anchors(
all_coords,
base_anchors,
max_shapes,
shapes,
inds,
remapping=None,
):
"""Return the valid shifted anchors on given shapes."""
x_coords, y_coords = all_coords
inds_wide, remapping_wide = [], []
offset = num = 0
for i in range(len(max_shapes)):
num += base_anchors[i].shape[0] * np.prod(max_shapes[i])
inds_inside = np.where((inds >= offset) & (inds < num))[0]
inds_wide.append(inds[inds_inside])
if remapping is not None:
remapping_wide.append(remapping[inds_inside])
offset = num
offset1 = offset2 = num1 = num2 = 0
for i in range(len(max_shapes)):
num1 += base_anchors[i].shape[0] * np.prod(max_shapes[i])
num2 += base_anchors[i].shape[0] * np.prod(shapes[i])
inds = inds_wide[i]
x, y = x_coords[inds], y_coords[inds]
a = ((inds - offset1) // max_shapes[i][1]) // max_shapes[i][0]
inds = (a * shapes[i][0] + y) * shapes[i][1] + x + offset2
inds_mask = np.where((x < shapes[i][1]) & (y < shapes[i][0]))[0]
inds_wide[i] = inds[inds_mask]
if remapping is not None:
remapping_wide[i] = remapping_wide[i][inds_mask]
offset1, offset2 = num1, num2
outputs = [np.concatenate(inds_wide)]
if remapping is not None:
outputs += [np.concatenate(remapping_wide)]
return outputs[0] if len(outputs) == 1 else outputs
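# Added note: this remaps flat anchor indices from the padded "max shape"
# layout (planned once from the maximum input layout) onto the actual
# per-batch feature shapes, dropping indices whose (x, y) fall outside the
# narrower maps. Anchors can thus be sampled in the data pipeline before
# the final padded batch size is known.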
def map_returns_to_blobs(returns, blobs, keys):
"""Map returns of image to blobs."""
for i, key in enumerate(keys):
blobs[key].append(returns[i])
def map_rois_to_levels(rois, k_min, k_max):
"""Map rois to fpn levels."""
if len(rois) == 0:
return []
ws = rois[:, 3] - rois[:, 1] + 1
hs = rois[:, 4] - rois[:, 2] + 1
s = np.sqrt(ws * hs)
s0 = cfg.FPN.ROI_CANONICAL_SCALE # default: 224
lvl0 = cfg.FPN.ROI_CANONICAL_LEVEL # default: 4
target_levels = np.floor(lvl0 + np.log2(s / s0 + 1e-6))
return np.clip(target_levels, k_min, k_max)
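# Added worked case of the assignment rule: with s0 = 224 and lvl0 = 4,
# an RoI of sqrt-area 224 maps to level 4, 448 to level 5, and 112 to
# level 3, before clipping to [k_min, k_max].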
def map_blobs_by_levels(blobs, defaults, lvl_inds):
"""Map blobs to outputs according to fpn indices."""
outputs = collections.defaultdict(list)
for inds in lvl_inds:
for key, blob in blobs.items():
outputs[key].append(
blob[inds]
if len(inds) > 0
else defaults[key])
return outputs
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import multiprocessing as mp
import time
import threading
import queue
import dragon
import dragon.vm.torch as torch
import numpy as np
from seetadet.algo.mask_rcnn import data_transformer
from seetadet.core.config import cfg
from seetadet.datasets.factory import get_dataset
from seetadet.utils import blob as blob_util
from seetadet.utils import logger
class DataLoader(object):
"""Provide mini-batches of data."""
def __init__(self):
super(DataLoader, self).__init__()
dataset = get_dataset(cfg.TRAIN.DATASET)
self.iterator = Iterator(**{
'dataset': dataset.cls,
'source': dataset.source,
'classes': dataset.classes,
'shuffle': cfg.TRAIN.USE_SHUFFLE,
'batch_size': cfg.TRAIN.IMS_PER_BATCH * 2,
'num_transformers': cfg.TRAIN.NUM_THREADS - 1,
})
self.iterator.start()
def __call__(self):
outputs = self.iterator.next()
if isinstance(outputs['image'], np.ndarray):
outputs['image'] = torch.from_numpy(outputs['image'])
return outputs
class Iterator(threading.Thread):
"""Iterator to return the batch of data."""
def __init__(self, **kwargs):
super(Iterator, self).__init__()
# Distributed settings
rank, group_size = 0, 1
process_group = dragon.distributed.get_group()
if process_group is not None and \
kwargs.get('phase', 'TRAIN') == 'TRAIN':
group_size = process_group.size
rank = dragon.distributed.get_rank(process_group)
# Configuration
self._batch_size = kwargs.get('batch_size', 2)
self._num_readers = kwargs.get('num_readers', 1)
self._num_transformers = kwargs.get('num_transformers', 3)
self.daemon = True
# Initialize queues
num_batches = self._num_readers
self._queue1 = mp.Queue(num_batches * self._batch_size)
self._queue2 = mp.Queue(num_batches * self._batch_size)
self._queue3 = queue.Queue(num_batches)
# Initialize readers
self._readers = []
for i in range(self._num_readers):
part_idx, num_parts = i, self._num_readers
num_parts *= group_size
part_idx += rank * self._num_readers
self._readers.append(dragon.io.DataReader(
part_idx=part_idx, num_parts=num_parts, **kwargs))
self._readers[i]._seed += part_idx
self._readers[i].q_out = self._queue1
self._readers[i].start()
time.sleep(0.1)
# Initialize transformers
self._transformers = []
for i in range(self._num_transformers):
p = data_transformer.DataTransformer(**kwargs)
p._seed += (i + rank * self._num_transformers)
p.q_in, p.q_out = self._queue1, self._queue2
p.start()
self._transformers.append(p)
time.sleep(0.1)
# Register cleanup callbacks
def cleanup():
def terminate(processes):
for p in processes:
p.terminate()
p.join()
terminate(self._transformers)
logger.info('Terminate DataTransformer.')
terminate(self._readers)
logger.info('Terminate DataReader.')
import atexit
atexit.register(cleanup)
def next(self):
"""Return the next batch of data."""
return self.__next__()
def run(self):
"""Main loop."""
num_images = cfg.TRAIN.IMS_PER_BATCH
num_batches = cfg.TRAIN.ASPECT_GROUPING
logger.info('Initialize prefetching batches...')
example_buffer = [self._queue2.get()
for _ in range(num_images * num_batches)]
next_examples = []
while True:
# Use cached buffer for next N examples
# Examples are sorted to simulate aspect grouping
if len(next_examples) == 0:
next_examples = example_buffer
next_examples.sort(key=lambda d: d['aspect_ratio'])
example_buffer = []
# Prepare the next batch
outputs = collections.defaultdict(list)
for i in range(num_images):
example = next_examples.pop(0)
outputs['image'].append(example['image'])
outputs['gt_boxes'].append(example['boxes'])
outputs['gt_segms'].append(example['segms'])
outputs['im_info'].append(example['im_info'])
outputs['fg_inds'].append(example.get('fg_inds', None))
outputs['bg_inds'].append(example.get('bg_inds', None))
example_buffer.append(self._queue2.get())
outputs['image'] = blob_util.im_list_to_blob(
outputs['image'], coarsest_stride=cfg.MODEL.COARSEST_STRIDE)
# Send batch data to consumer
self._queue3.put(outputs)
def __iter__(self):
"""Return the iterator self."""
return self
def __next__(self):
"""Return the next batch of data."""
return self._queue3.get()
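# Added pipeline summary: DataReader processes feed serialized examples
# into _queue1; DataTransformer processes emit training dicts on _queue2;
# this thread groups IMS_PER_BATCH examples (sorted by aspect ratio over
# ASPECT_GROUPING batches) and publishes batches on _queue3 for
# __next__() to consume.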
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import multiprocessing
import cv2
import numpy as np
import numpy.random as npr
from seetadet.algo import common as algo_common
from seetadet.core.config import cfg
from seetadet.datasets.example import Example
from seetadet.utils.pycocotools import mask_utils
from seetadet.utils import boxes as box_util
from seetadet.utils import image as image_util
class DataTransformer(multiprocessing.Process):
"""DataTransformer."""
def __init__(self, **kwargs):
super(DataTransformer, self).__init__()
self._scales = cfg.TRAIN.SCALES
self._random_scales = cfg.TRAIN.RANDOM_SCALES
self._max_size = cfg.TRAIN.MAX_SIZE
self._seed = cfg.RNG_SEED
self._use_diff = cfg.TRAIN.USE_DIFF
self._use_flipped = cfg.TRAIN.USE_FLIPPED
self._use_distort = cfg.TRAIN.USE_COLOR_JITTER
self._classes = kwargs.get('classes', ('__background__',))
self._num_classes = len(self._classes)
self._class_to_ind = dict(zip(self._classes, range(self._num_classes)))
self._anchor_sampler = algo_common.AnchorSampler()
self.q_in = self.q_out = None
self.daemon = True
def get_boxes_and_segms(self, example, im_scale, im_offset, flipped):
objects, num_objects = example.objects, 0
height, width = example.height, example.width
if not self._use_diff:
for obj in objects:
if obj.get('difficult', 0) == 0:
num_objects += 1
else:
num_objects = len(objects)
boxes, segms = np.zeros((num_objects, 4), 'float32'), []
gt_classes = np.zeros((num_objects,), 'float32')
segm_flags = np.ones((num_objects,), 'float32')
# Filter the difficult instances.
object_idx = 0
for obj in objects:
if not self._use_diff and obj.get('difficult', 0) > 0:
continue
bbox = obj['bbox']
boxes[object_idx, :] = [max(0, bbox[0]),
max(0, bbox[1]),
min(bbox[2], width - 1),
min(bbox[3], height - 1)]
if 'mask' in obj:
mask_img = mask_utils.bytes2img(obj['mask'], height, width)
segms.append(mask_img[:, ::-1] if flipped else mask_img)
elif 'polygons' in obj:
polygons = obj['polygons']
segms.append(box_util.flip_polygons(
polygons, width) if flipped else polygons)
else:
segms.append(None)
segm_flags[object_idx] = 0.
gt_classes[object_idx] = self._class_to_ind[obj['name']]
object_idx += 1
# Flip the boxes if necessary.
if flipped:
boxes = box_util.flip_boxes(boxes, width)
# Scale the boxes to the detecting scale.
boxes *= im_scale
# Offset the boxes to align the cropping.
if im_offset is not None:
if min(im_offset[:2]) < 0:
raise ValueError('RandomCrop with mask is not supported.')
# Attach the classes and mask flags.
gt_boxes = np.empty((num_objects, 6), dtype=np.float32)
gt_boxes[:, :4], gt_boxes[:, 4] = boxes, gt_classes
gt_boxes[:, 5] = segm_flags # Has segmentation or not.
return gt_boxes, segms
def get(self, example):
example = Example(example)
# Resize.
target_size = npr.choice(self._scales)
img, im_scale = image_util.resize_image_with_target_size(
example.image,
target_size=target_size,
max_size=self._max_size,
random_scales=self._random_scales,
)
# Flip.
flipped = False
if self._use_flipped and npr.randint(2) > 0:
img = img[:, ::-1]
flipped = True
# Crop or Pad.
im_offset = None
if self._max_size == 0:
img, im_offset = image_util.get_image_with_target_size(
img, target_size)
# Distort.
if self._use_distort:
img = image_util.distort_image(img)
# Boxes and segmentations.
boxes, segms = self.get_boxes_and_segms(example, im_scale, im_offset, flipped)
# Standard outputs.
outputs = {'image': img,
'boxes': boxes,
'segms': segms,
'im_info': img.shape[:2] + (im_scale,)}
# Attach precomputed targets.
if len(boxes) > 0:
outputs.update(
self._anchor_sampler(
gt_boxes=boxes,
im_info=outputs['im_info']))
return outputs
def run(self):
# Disable the opencv threading.
cv2.setNumThreads(1)
# Fix the process-local random seed.
np.random.seed(self._seed)
# Main prefetch loop
while True:
outputs = self.get(self.q_in.get())
if len(outputs['boxes']) < 1:
continue # Ignore non-object image.
height, width = outputs['image'].shape[:2]
outputs['aspect_ratio'] = float(height) / float(width)
self.q_out.put(outputs)
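# Added note on get() outputs: 'boxes' is (num_objects, 6) with columns
# (x1, y1, x2, y2, class, has_segm_flag); 'segms' holds one entry per
# object: a binary mask image, a polygon list, or None when the flag is 0.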
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import numpy as np
import numpy.random as npr
from seetadet.algo.faster_rcnn import utils as rcnn_util
from seetadet.core.config import cfg
from seetadet.utils import boxes as box_util
from seetadet.utils import mask as mask_util
from seetadet.utils.env import new_tensor
class ProposalTarget(object):
"""Assign proposals to ground-truth targets."""
def __init__(self):
super(ProposalTarget, self).__init__()
self.resolution = cfg.MRCNN.RESOLUTION
self.num_classes = len(cfg.MODEL.CLASSES)
self.defaults = collections.OrderedDict([
('rois', np.array([[-1, 0, 0, 1, 1]], 'float32')),
('labels', np.array([-1], 'int64')),
('bbox_targets', np.zeros((1, 4), 'float32')),
('mask_targets', -np.ones((1, self.resolution, self.resolution), 'float32')),
])
def __call__(self, **inputs):
num_images = cfg.TRAIN.IMS_PER_BATCH
# Proposal ROIs (0, x1, y1, x2, y2) coming from RPN
all_rois = inputs['rois']
# Prepare for the outputs
keys = self.defaults.keys()
blobs = {key: [] for key in keys}
# Generate targets separately
for ix in range(num_images):
# GT boxes (x1, y1, x2, y2, label)
gt_boxes = inputs['gt_boxes'][ix]
gt_segms = inputs['gt_segms'][ix]
# Extract proposals for this image
rois = all_rois[np.where(all_rois[:, 0].astype('int32') == ix)[0]]
# Include ground-truth boxes in the set of candidate rois
inds = np.ones((gt_boxes.shape[0], 1), gt_boxes.dtype) * ix
rois = np.vstack((rois, np.hstack((inds, gt_boxes[:, :4]))))
# Sample a batch of RoIs for training
rois_per_image = cfg.FRCNN.BATCH_SIZE
fg_rois_per_image = np.round(cfg.FRCNN.FG_FRACTION * rois_per_image)
rcnn_util.map_returns_to_blobs(
sample_rois(
rois,
gt_boxes,
gt_segms,
rois_per_image,
fg_rois_per_image,
inputs['im_info'][ix][2],
), blobs, keys,
)
# Stack into continuous blobs
blobs = dict((k, np.concatenate(blobs[k])) for k in blobs.keys())
# Distribute rois into pyramids
k_min = cfg.FPN.ROI_MIN_LEVEL
k_max = cfg.FPN.ROI_MAX_LEVEL
num_levels = k_max - k_min + 1
levels = rcnn_util.map_rois_to_levels(blobs['rois'], k_min, k_max)
lvl_blobs = rcnn_util.map_blobs_by_levels(
blobs,
self.defaults,
[np.where(levels == (i + k_min))[0] for i in range(num_levels)],
)
rois_wide = [lvl_blobs['rois'][i] for i in range(num_levels)]
mask_rois_wide, mask_labels_wide = [], []
# Select the foreground RoIs only for bbox/mask branch
for i in range(num_levels):
inds = np.where(lvl_blobs['labels'][i] > 0)[0]
if len(inds) > 0:
mask_rois_wide.append(lvl_blobs['rois'][i][inds])
mask_labels_wide.append(lvl_blobs['labels'][i][inds] - 1)
lvl_blobs['mask_targets'][i] = lvl_blobs['mask_targets'][i][inds]
else:
mask_rois_wide.append(self.defaults['rois'])
mask_labels_wide.append(np.array([0], 'int64'))
lvl_blobs['mask_targets'][i] = self.defaults['mask_targets']
blobs = dict((k, np.concatenate(lvl_blobs[k])) for k in blobs.keys())
mask_labels = np.concatenate(mask_labels_wide)
fg_inds = np.where(blobs['labels'] > 0)[0]
bbox_cls_inds = np.arange(len(blobs['rois'])) * self.num_classes
mask_cls_inds = np.arange(len(mask_labels)) * (self.num_classes - 1)
# Sample a proposal randomly to avoid memory issues when no foreground exists
if len(fg_inds) == 0:
fg_inds = np.random.randint(len(blobs['labels']), size=[1])
return {
'rois': [new_tensor(rois_wide[i]) for i in range(num_levels)],
'mask_rois': [new_tensor(mask_rois_wide[i]) for i in range(num_levels)],
'labels': new_tensor(blobs['labels']),
'bbox_inds': new_tensor(bbox_cls_inds[fg_inds] + blobs['labels'][fg_inds]),
'bbox_targets': new_tensor(blobs['bbox_targets'][fg_inds].astype('float32')),
'bbox_anchors': new_tensor(blobs['rois'][fg_inds, 1:].astype('float32')),
'mask_inds': new_tensor(mask_cls_inds + mask_labels),
'mask_targets': new_tensor(blobs['mask_targets']),
}
def compute_targets(
rois,
gt_boxes,
gt_labels,
fg_segms,
fg_segms_flag,
mask_size,
im_scale,
):
"""Compute the bounding-box regression targets."""
assert rois.shape[0] == gt_boxes.shape[0]
assert rois.shape[1] == 4
assert gt_boxes.shape[1] == 4
# Compute bbox regression targets
fg_inds = np.where(gt_labels > 0)[0]
bbox_targets = box_util.bbox_transform(rois, gt_boxes, cfg.BBOX_REG_WEIGHTS)
# Compute mask classification targets
mask_shape = [mask_size] * 2
mask_targets = -np.ones([len(rois)] + mask_shape, 'float32')
rois_ori = rois / im_scale
rois_ori_int = np.round(rois_ori).astype(int)
gt_boxes_ori_int = np.round(gt_boxes / im_scale).astype(int)
for i, fg_idx in enumerate(fg_inds):
if fg_segms_flag[i] > 0:
if isinstance(fg_segms[i], list):
target = mask_util.warp_mask_via_polygons(
fg_segms[i], rois_ori[i], mask_shape)
else:
target = mask_util.warp_mask_via_intersection(
fg_segms[i], rois_ori_int[i], gt_boxes_ori_int[i], mask_shape)
if target is not None:
mask_targets[fg_idx] = target.astype(mask_targets.dtype)
return bbox_targets, mask_targets
def sample_rois(
all_rois,
gt_boxes,
gt_segms,
num_rois,
num_fg_rois,
im_scale,
):
"""Sample a batch of RoIs comprising foreground and background examples."""
overlaps = box_util.bbox_overlaps(all_rois[:, 1:5], gt_boxes[:, :4])
gt_assignment = overlaps.argmax(axis=1)
max_overlaps = overlaps.max(axis=1)
labels = gt_boxes[gt_assignment, 4].astype('int64')
# Select foreground RoIs as those with >= FG_THRESH overlap
fg_inds = np.where(max_overlaps >= cfg.FRCNN.POSITIVE_OVERLAP)[0]
fg_rois_per_this_image = int(min(num_fg_rois, fg_inds.size))
# Sample foreground regions without replacement
if fg_inds.size > 0:
fg_inds = npr.choice(fg_inds, fg_rois_per_this_image, False)
# Select background RoIs as those within [BG_THRESH_LO, BG_THRESH_HI)
bg_inds = np.where((max_overlaps < cfg.FRCNN.NEGATIVE_OVERLAP_HI) &
(max_overlaps >= cfg.FRCNN.NEGATIVE_OVERLAP_LO))[0]
# Compute number of background RoIs to take from this image
bg_rois_per_this_image = num_rois - fg_rois_per_this_image
bg_rois_per_this_image = min(bg_rois_per_this_image, bg_inds.size)
# Sample background regions without replacement
if bg_inds.size > 0:
bg_inds = npr.choice(bg_inds, bg_rois_per_this_image, False)
# The indices that we're selecting (both fg and bg)
keep_inds = np.append(fg_inds, bg_inds)
# Select sampled values from various arrays
rois, labels = all_rois[keep_inds], labels[keep_inds]
# Clamp labels for the background RoIs to 0
labels[fg_rois_per_this_image:] = 0
# Compute the target from RoIs
outputs = [rois, labels]
outputs += compute_targets(
rois[:, 1:5],
gt_boxes[gt_assignment[keep_inds], :4],
labels,
[gt_segms[i] for i in gt_assignment[fg_inds]],
gt_boxes[gt_assignment[fg_inds], 5],
cfg.MRCNN.RESOLUTION,
im_scale,
)
return outputs
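# Added note: the return order [rois (R, 5), labels (R,),
# bbox_targets (R, 4), mask_targets (R, resolution, resolution)] matches
# ProposalTarget.defaults; background rows of mask_targets stay -1.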
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import types
import dragon.vm.torch as torch
import numpy as np
from seetadet.algo.faster_rcnn import utils as rcnn_util
from seetadet.core.config import cfg
from seetadet.modeling.detector import new_detector
from seetadet.utils import env
from seetadet.utils import blob as blob_util
from seetadet.utils import boxes as box_util
from seetadet.utils import image as image_util
from seetadet.utils import logger
from seetadet.utils import nms as nms_util
from seetadet.utils import time_util
def get_data(raw_images):
"""Return the test data."""
max_size = cfg.TEST.MAX_SIZE
images_wide = []
image_shapes_wide, image_scales_wide = [], []
for img in raw_images:
images, image_scales = image_util.scale_image(
img, scales=cfg.TEST.SCALES, max_size=max_size)
images_wide += images
image_scales_wide += image_scales
image_shapes_wide += [img.shape[:2] for img in images]
images = blob_util.im_list_to_blob(
images_wide, coarsest_stride=cfg.MODEL.COARSEST_STRIDE)
image_shapes = np.array(image_shapes_wide)
image_scales = np.array(image_scales_wide).reshape((len(images), -1))
images_info = np.hstack([image_shapes, image_scales]).astype('float32')
return images, images_info
def ims_detect(detector, raw_images, timer=None):
"""Detect a image, with single or multiple scales."""
images, images_info = get_data(raw_images)
timer.tic() if timer else timer
# Do forward
inputs = {'image': torch.from_numpy(images),
'im_info': torch.from_numpy(images_info)}
if not hasattr(detector, 'script_forward'):
def script_forward(self, image, im_info):
return self.forward({'image': image, 'im_info': im_info})
detector.script_forward = torch.jit.trace(
func=types.MethodType(script_forward, detector),
example_inputs=[inputs['image'], inputs['im_info']],
)
outputs = detector.script_forward(inputs['image'], inputs['im_info'])
outputs = dict((k, outputs[k].numpy()) for k in outputs.keys())
# Decode results
batch_pred = box_util.bbox_transform_inv(
outputs['rois'][:, 1:5],
outputs['bbox_pred'],
cfg.BBOX_REG_WEIGHTS)
results = [([], [], []) for _ in range(len(raw_images))]
for i in range(len(images)):
ii = i // len(cfg.TEST.SCALES)
inds = np.where(outputs['rois'][:, 0].astype(np.int32) == i)[0]
boxes = batch_pred[inds] / images_info[i, 2]
boxes = box_util.clip_tiled_boxes(boxes, raw_images[ii].shape)
results[ii][0].append(outputs['cls_prob'][inds])
results[ii][1].append(boxes)
results[ii][2].append(np.ones((len(inds), 1), 'int32') * i)
# Merge from multiple scales
ret = [(np.vstack(s), np.vstack(b),
np.vstack(i), images_info[:, 2]) for s, b, i in results]
timer.toc() if timer else timer
return ret
def mask_detect(detector, rois):
k_min = cfg.FPN.ROI_MIN_LEVEL
k_max = cfg.FPN.ROI_MAX_LEVEL
k = k_max - k_min + 1
levels = rcnn_util.map_rois_to_levels(rois, k_min, k_max)
level_inds = [np.where(levels == (i + k_min))[0] for i in range(k)]
fpn_rois = rcnn_util.map_blobs_by_levels(
{'rois': rois[:, :5]},
{'rois': np.array([[-1, 0, 0, 1, 1]], 'float32')},
level_inds)['rois']
with torch.no_grad():
mask_score = detector.rcnn.compute_mask_score(
rois=[env.new_tensor(r.astype('float32')) for r in fpn_rois])
nc, i = mask_score.shape[1], 0
mask_inds = {}
for inds in level_inds:
for idx in inds:
cls = int(rois[idx, 5])
mask_inds[idx] = (i * nc + cls)
i += 1
if len(inds) == 0:
i += 1
mask_inds = list(map(mask_inds.get, sorted(mask_inds)))
mask_inds = env.new_tensor(np.array(mask_inds, 'int64'))
with torch.no_grad():
mask_pred = mask_score.index_select((0, 1), mask_inds)
return detector.rcnn.sigmoid(mask_pred).numpy().copy()
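# Added index bookkeeping note: mask_score holds (num_classes - 1) class
# channels per RoI, so the flat index of (roi i, class c) is i * nc + c.
# A level with no detections still contributes its placeholder RoI, hence
# the extra `i += 1` for empty levels.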
def get_detections(outputs):
"""Return the categorical detections from outputs."""
scores, boxes, batch_inds, im_scales = outputs
rois_this_image = []
boxes_this_image = [[]]
empty_detections = np.zeros((0, 5), 'float32')
empty_rois = np.zeros((0, 6), 'float32')
for j in range(1, len(cfg.MODEL.CLASSES)):
inds = np.where(scores[:, j] > cfg.TEST.SCORE_THRESH)[0]
if len(inds) == 0:
boxes_this_image.append(empty_detections)
rois_this_image.append(empty_rois)
continue
cls_scores = scores[inds, j]
cls_boxes = boxes[inds, j * 4:(j + 1) * 4]
cls_batch_inds = batch_inds[inds]
cls_detections = np.hstack(
(cls_boxes, cls_scores[:, np.newaxis])) \
.astype(np.float32, copy=False)
if cfg.TEST.USE_SOFT_NMS:
keep = nms_util.soft_nms(
cls_detections,
thresh=cfg.TEST.NMS,
method=cfg.TEST.SOFT_NMS_METHOD,
sigma=cfg.TEST.SOFT_NMS_SIGMA,
)
else:
keep = nms_util.nms(
cls_detections,
thresh=cfg.TEST.NMS,
)
cls_detections = cls_detections[keep, :]
cls_batch_inds = cls_batch_inds[keep]
boxes_this_image.append(cls_detections)
rois_this_image.append(np.hstack((
cls_batch_inds,
cls_detections[:, :4] * im_scales[cls_batch_inds],
np.ones((len(keep), 1)) * (j - 1))))
return [boxes_this_image, rois_this_image]
def test_net(weights, q_in, q_out, device, root_logger=True):
"""Test a network trained with Mask R-CNN algorithm."""
cfg.GPU_ID = device
num_classes = len(cfg.MODEL.CLASSES)
logger.set_root_logger(root_logger)
detector = new_detector(device, weights)
timers = time_util.new_timers('im_detect_bbox', 'im_detect_mask', 'misc')
must_stop = False
while not must_stop:
# Wait inputs.
indices, raw_images = [], []
for _ in range(cfg.TEST.IMS_PER_BATCH):
i, raw_image = q_in.get()
if i < 0:
must_stop = True
break
indices.append(i)
raw_images.append(raw_image)
if len(raw_images) == 0:
continue
# Detect on specific scales.
all_outputs = ims_detect(
detector=detector,
raw_images=raw_images,
timer=timers['im_detect_bbox'],
)
# Post-processing.
for i, outputs in enumerate(all_outputs):
segms_this_image = [[]]
with timers['misc'].tic_and_toc():
boxes_this_image, rois_this_image = get_detections(outputs)
mask_rois = np.concatenate(rois_this_image)
if len(mask_rois) > 0:
k = 0
timers['im_detect_mask'].tic()
mask_pred = mask_detect(detector, mask_rois)
for j in range(1, num_classes):
num_pred = len(boxes_this_image[j])
cls_segms = mask_pred[k:k + num_pred]
segms_this_image.append(cls_segms)
k += num_pred
timers['im_detect_mask'].toc()
q_out.put((
indices[i],
dict([('im_detect', (timers['im_detect_bbox'].average_time +
timers['im_detect_mask'].average_time)),
('misc', timers['misc'].average_time)]),
dict([('boxes', boxes_this_image),
('masks', segms_this_image)]),
))
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import math
import numpy as np
from seetadet.algo.faster_rcnn import generate_anchors as anchor_util
from seetadet.algo.faster_rcnn import utils as rcnn_util
from seetadet.core.config import cfg
from seetadet.utils import boxes as box_util
from seetadet.utils.env import new_tensor
class AnchorTarget(object):
"""Assign ground-truth targets to anchors."""
def __init__(self):
super(AnchorTarget, self).__init__()
# Load the basic configs
k_max, k_min = cfg.FPN.RPN_MAX_LEVEL, cfg.FPN.RPN_MIN_LEVEL
scales_per_octave = cfg.RETINANET.SCALES_PER_OCTAVE
anchor_scale = cfg.RETINANET.ANCHOR_SCALE
self.strides = [2. ** lvl for lvl in range(k_min, k_max + 1)]
self.ratios = cfg.RETINANET.ASPECT_RATIOS
# Generate base anchors
self.base_anchors = []
for stride in self.strides:
sizes = [stride * anchor_scale *
(2 ** (octave / float(scales_per_octave)))
for octave in range(scales_per_octave)]
self.base_anchors.append(
anchor_util.generate_anchors_v2(
stride=stride,
ratios=self.ratios,
sizes=sizes))
# Plan the maximum anchor layout
max_size = max(cfg.TRAIN.MAX_SIZE, max(cfg.TRAIN.SCALES))
if cfg.MODEL.COARSEST_STRIDE > 0:
stride = float(cfg.MODEL.COARSEST_STRIDE)
max_size = int(math.ceil(max_size / stride) * stride)
self.max_shapes = [[math.ceil(max_size / stride)] * 2
for stride in self.strides]
self.all_coords = rcnn_util.get_shifted_coords(
self.max_shapes, self.base_anchors)
self.all_anchors = rcnn_util.get_shifted_anchors(
self.max_shapes, self.base_anchors, self.strides)
def sample_anchors(self, gt_boxes, im_info, all_anchors=None):
all_anchors = self.all_anchors \
if all_anchors is None else all_anchors
# Remove anchors lying outside the image extent
inds_inside = np.where((all_anchors[:, 0] < im_info[1]) &
(all_anchors[:, 1] < im_info[0]))[0]
anchors = all_anchors[inds_inside, :]
num_inside = len(anchors)
labels = np.empty((num_inside,), dtype='int32')
labels.fill(-1)
# Overlaps between the anchors and the gt boxes.
overlaps = box_util.bbox_overlaps(anchors, gt_boxes)
argmax_overlaps = overlaps.argmax(axis=1)
max_overlaps = overlaps[np.arange(num_inside), argmax_overlaps]
# Foreground: for each gt, anchor with highest overlap.
gt_argmax_overlaps = overlaps.argmax(axis=0)
gt_max_overlaps = overlaps[gt_argmax_overlaps, np.arange(overlaps.shape[1])]
gt_argmax_overlaps = np.where(overlaps == gt_max_overlaps)[0]
gt_assignment = argmax_overlaps[gt_argmax_overlaps]
labels[gt_argmax_overlaps] = gt_boxes[gt_assignment, 4]
# Foreground: above threshold IoU.
inds = max_overlaps >= cfg.RETINANET.POSITIVE_OVERLAP
gt_assignment = argmax_overlaps[inds]
labels[inds] = gt_boxes[gt_assignment, 4]
# Background: below threshold IoU.
labels[max_overlaps < cfg.RETINANET.NEGATIVE_OVERLAP] = 0
# Fall back to the per-gt argmax anchors if no foreground was assigned.
fg_inds = np.where(labels > 0)[0]
if len(fg_inds) == 0:
gt_assignment = argmax_overlaps[gt_argmax_overlaps]
labels[gt_argmax_overlaps] = gt_boxes[gt_assignment, 4]
fg_inds = np.where(labels > 0)[0]
# Return the ignored ("don't care") indices rather than the much larger
# background set (~100x faster for 200 background indices)
ignore_inds = np.where(labels < 0)[0]
return inds_inside[fg_inds], inds_inside[ignore_inds]
def __call__(self, **inputs):
num_images = cfg.TRAIN.IMS_PER_BATCH
shapes = [f.shape[-2:] for f in inputs['features']]
image_stride = sum(self.base_anchors[i].shape[0] * np.prod(shapes[i])
for i in range(len(inputs['features'])))
narrow_args = [self.all_coords, self.base_anchors, self.max_shapes, shapes]
outputs = collections.defaultdict(list)
# Label: ``1`` is positive, ``0`` is negative, ``-1`` is don't care
output_labels = np.zeros((num_images, image_stride,), 'int64')
for ix in range(num_images):
fg_inds = inputs['fg_inds'][ix]
ignore_inds = inputs['bg_inds'][ix]
gt_boxes = inputs['gt_boxes'][ix]
# Narrow anchors to match the feature layout
anchors = self.all_anchors[fg_inds]
ignore_inds = rcnn_util.narrow_anchors(*(narrow_args + [ignore_inds]))
_, anchors = rcnn_util.narrow_anchors(*(narrow_args + [fg_inds, anchors]))
fg_inds = rcnn_util.narrow_anchors(*(narrow_args + [fg_inds]))
# Compute bbox targets
gt_assignment = box_util.bbox_overlaps(anchors, gt_boxes).argmax(axis=1)
bbox_targets = box_util.bbox_transform(anchors, gt_boxes[gt_assignment, :4])
outputs['bbox_anchors'].append(anchors)
outputs['bbox_targets'].append(bbox_targets)
# Compute label assignments
output_labels[ix, ignore_inds] = -1
output_labels[ix, fg_inds] = gt_boxes[gt_assignment, 4]
# Compute sparse indices
fg_inds += ix * image_stride
outputs['bbox_inds'].extend([fg_inds])
return {
'labels': new_tensor(output_labels),
'bbox_inds': new_tensor(
np.concatenate(outputs['bbox_inds'])),
'bbox_targets': new_tensor(
np.concatenate(outputs['bbox_targets']).astype('float32')),
'bbox_anchors': new_tensor(
np.concatenate(outputs['bbox_anchors']).astype('float32')),
}
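# Added layout note: labels is a dense (num_images, image_stride) map,
# while regression targets are gathered sparsely; the flat index of
# anchor a in image ix is ix * image_stride + a, matching the
# `fg_inds += ix * image_stride` offset above.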
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import types
import dragon.vm.torch as torch
import numpy as np
from seetadet.core.config import cfg
from seetadet.modeling.detector import new_detector
from seetadet.utils import blob as blob_util
from seetadet.utils import image as image_util
from seetadet.utils import logger
from seetadet.utils import nms as nms_util
from seetadet.utils import time_util
def get_data(raw_images):
"""Return the test data."""
max_size = cfg.TEST.MAX_SIZE
if cfg.PIPELINE.TYPE.lower() == 'ssd':
max_size = 0 # Warped to a fixed size
images_wide = []
image_shapes_wide, image_scales_wide = [], []
for img in raw_images:
images, image_scales = image_util.scale_image(
img, scales=cfg.TEST.SCALES, max_size=max_size)
images_wide += images
image_scales_wide += image_scales
image_shapes_wide += [img.shape[:2] for img in images]
images = blob_util.im_list_to_blob(
images_wide, coarsest_stride=cfg.MODEL.COARSEST_STRIDE)
image_shapes = np.array(image_shapes_wide)
image_scales = np.array(image_scales_wide).reshape((len(images), -1))
images_info = np.hstack([image_shapes, image_scales]).astype('float32')
return images, images_info
def ims_detect(detector, raw_images, timer=None):
"""Detect images at single or multiple scales."""
images, images_info = get_data(raw_images)
timer.tic() if timer else timer
# Do Forward
inputs = {'image': torch.from_numpy(images),
'im_info': torch.from_numpy(images_info)}
if not hasattr(detector, 'script_forward'):
def script_forward(self, image, im_info):
return self.forward({'image': image, 'im_info': im_info})
detector.script_forward = torch.jit.trace(
func=types.MethodType(script_forward, detector),
example_inputs=[inputs['image'], inputs['im_info']],
)
outputs = detector.script_forward(inputs['image'], inputs['im_info'])
outputs = dict((k, outputs[k].numpy()) for k in outputs.keys())
# Decode results
detections = outputs['detections']
results = [[] for _ in range(len(raw_images))]
for i in range(len(images)):
inds = np.where(detections[:, 0].astype(np.int32) == i)[0]
results[i // len(cfg.TEST.SCALES)].append(detections[inds, 1:])
# Merge from multiple scales
ret = [np.vstack(d) for d in results]
timer.toc() if timer else timer
return ret
def get_detections(outputs):
"""Return the categorical detections from outputs."""
num_classes = len(cfg.MODEL.CLASSES)
boxes_this_image = [[]]
raw_detections = outputs
empty_detections = np.zeros((0, 5), 'float32')
for j in range(1, num_classes):
cls_indices = np.where(
raw_detections[:, 5].astype(np.int32) == j)[0]
if len(cls_indices) == 0:
boxes_this_image.append(empty_detections)
continue
cls_boxes = raw_detections[cls_indices, :4]
cls_scores = raw_detections[cls_indices, 4]
cls_detections = np.hstack((
cls_boxes, cls_scores[:, np.newaxis])) \
.astype(np.float32, copy=False)
if cfg.TEST.USE_SOFT_NMS:
keep = nms_util.soft_nms(
cls_detections,
thresh=cfg.TEST.NMS,
method=cfg.TEST.SOFT_NMS_METHOD,
sigma=cfg.TEST.SOFT_NMS_SIGMA,
)
else:
keep = nms_util.nms(
cls_detections,
thresh=cfg.TEST.NMS,
)
cls_detections = cls_detections[keep, :]
boxes_this_image.append(cls_detections)
return [boxes_this_image]
def test_net(weights, q_in, q_out, device, root_logger=True):
"""Test a network trained with RetinaNet algorithm."""
cfg.GPU_ID = device
logger.set_root_logger(root_logger)
detector = new_detector(device, weights)
timers = time_util.new_timers('im_detect_bbox', 'misc')
must_stop = False
while not must_stop:
# Wait inputs.
indices, raw_images = [], []
for _ in range(cfg.TEST.IMS_PER_BATCH):
i, raw_image = q_in.get()
if i < 0:
must_stop = True
break
indices.append(i)
raw_images.append(raw_image)
if len(raw_images) == 0:
continue
# Detect on specific scales.
all_outputs = ims_detect(detector, raw_images, timers['im_detect_bbox'])
# Post-processing.
for i, outputs in enumerate(all_outputs):
with timers['misc'].tic_and_toc():
boxes_this_image, = get_detections(outputs)
q_out.put((
indices[i],
dict([('im_detect', timers['im_detect_bbox'].average_time),
('misc', timers['misc'].average_time)]),
dict([('boxes', boxes_this_image)]),
))
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import math
import numpy as np
from seetadet.algo.ssd import generate_anchors as anchor_util
from seetadet.algo.ssd import utils as ssd_util
from seetadet.core.config import cfg
from seetadet.utils import boxes as box_util
from seetadet.utils.env import new_tensor
class AnchorTarget(object):
"""Assign ground-truth targets to anchors."""
def __init__(self):
super(AnchorTarget, self).__init__()
# Load the basic configs
self.strides = cfg.SSD.STRIDES
anchor_sizes = cfg.SSD.ANCHOR_SIZES
aspect_ratios = cfg.SSD.ASPECT_RATIOS
self.base_anchors = []
for i in range(len(anchor_sizes)):
ratios = aspect_ratios[i]
if not isinstance(ratios, (tuple, list)):
# All strides share the same ratios
ratios = aspect_ratios
self.base_anchors.append(
anchor_util.generate_anchors(
min_sizes=[anchor_sizes[i][0]],
max_sizes=[anchor_sizes[i][1]],
ratios=ratios))
# Plan the fixed anchor layout
max_size = cfg.TRAIN.SCALES[0]
if cfg.MODEL.COARSEST_STRIDE > 0:
stride = float(cfg.MODEL.COARSEST_STRIDE)
max_size = int(math.ceil(max_size / stride) * stride)
shapes = [[math.ceil(max_size / stride)] * 2
for stride in self.strides]
self.all_anchors = ssd_util.get_shifted_anchors(
shapes, self.base_anchors, self.strides)
def sample_anchors(self, gt_boxes, all_anchors=None):
anchors = self.all_anchors \
if all_anchors is None else all_anchors
num_anchors = len(anchors)
labels = np.empty((num_anchors,), dtype='int32')
labels.fill(-1)
# Overlaps between the anchors and the gt boxes.
overlaps = box_util.bbox_overlaps(anchors, gt_boxes)
argmax_overlaps = overlaps.argmax(axis=1)
max_overlaps = overlaps[np.arange(num_anchors), argmax_overlaps]
# Foreground: for each gt, anchor with highest overlap.
gt_argmax_overlaps = overlaps.argmax(axis=0)
gt_assignment = argmax_overlaps[gt_argmax_overlaps]
labels[gt_argmax_overlaps] = gt_boxes[gt_assignment, 4]
# Foreground: above threshold IoU.
inds = max_overlaps >= cfg.SSD.POSITIVE_OVERLAP
gt_assignment = argmax_overlaps[inds]
labels[inds] = gt_boxes[gt_assignment, 4]
fg_inds = np.where(labels > 0)[0]
# Negative: not matched and below threshold IoU.
neg_inds = np.where(labels <= 0)[0]
neg_overlaps = max_overlaps[neg_inds]
eligible_neg_inds = np.where(neg_overlaps < cfg.SSD.NEGATIVE_OVERLAP)[0]
neg_inds = neg_inds[eligible_neg_inds]
return fg_inds, neg_inds
def __call__(self, **inputs):
num_images = cfg.TRAIN.IMS_PER_BATCH
neg_pos_ratio = cfg.SSD.NEGATIVE_POSITIVE_RATIO
image_stride = self.all_anchors.shape[0]
cls_prob = inputs['cls_prob'].numpy()
outputs = collections.defaultdict(list)
# Label: ``1`` is positive, ``0`` is negative, ``-1`` is don't care
output_labels = np.empty((num_images, image_stride,), 'int64')
output_labels.fill(-1)
for ix in range(num_images):
fg_inds = inputs['fg_inds'][ix]
neg_inds = inputs['bg_inds'][ix]
gt_boxes = inputs['gt_boxes'][ix]
# Mining hard negatives as background.
num_pos, num_neg = len(fg_inds), len(neg_inds)
num_bg = min(int(num_pos * neg_pos_ratio), num_neg)
neg_loss = -np.log(np.maximum(
cls_prob[ix, neg_inds][np.arange(num_neg),
np.zeros((num_neg,), 'int32')],
np.finfo(float).eps))
bg_inds = neg_inds[np.argsort(-neg_loss)][:num_bg]
# Compute bbox targets.
anchors = self.all_anchors[fg_inds]
gt_assignment = box_util.bbox_overlaps(
anchors, gt_boxes).argmax(axis=1)
bbox_targets = box_util.bbox_transform(
anchors, gt_boxes[gt_assignment, :4],
cfg.BBOX_REG_WEIGHTS)
outputs['bbox_anchors'].append(anchors)
outputs['bbox_targets'].append(bbox_targets)
# Compute label assignments.
output_labels[ix, bg_inds] = 0
output_labels[ix, fg_inds] = gt_boxes[gt_assignment, 4]
# Compute sparse indices.
fg_inds += ix * image_stride
outputs['bbox_inds'].extend([fg_inds])
return {
'labels': new_tensor(output_labels),
'bbox_inds': new_tensor(
np.concatenate(outputs['bbox_inds'])),
'bbox_targets': new_tensor(
np.concatenate(outputs['bbox_targets']).astype('float32')),
'bbox_anchors': new_tensor(
np.concatenate(outputs['bbox_anchors']).astype('float32')),
}
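# Added worked sketch of the hard negative mining above (hypothetical
# numbers): with 20 positives and NEGATIVE_POSITIVE_RATIO = 3, the 60
# candidate negatives with the largest background loss -log(p_bg) become
# label 0; the remaining negatives stay -1 and are ignored.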
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import multiprocessing as mp
import time
import threading
import queue
import dragon
import dragon.vm.torch as torch
import numpy as np
from seetadet.algo.ssd import data_transformer
from seetadet.core.config import cfg
from seetadet.datasets.factory import get_dataset
from seetadet.utils import blob as blob_util
from seetadet.utils import logger
class DataLoader(object):
"""Provide mini-batches of data."""
def __init__(self):
super(DataLoader, self).__init__()
dataset = get_dataset(cfg.TRAIN.DATASET)
self.iterator = Iterator(**{
'dataset': dataset.cls,
'source': dataset.source,
'classes': dataset.classes,
'shuffle': cfg.TRAIN.USE_SHUFFLE,
'batch_size': cfg.TRAIN.IMS_PER_BATCH * 2,
'num_transformers': cfg.TRAIN.NUM_THREADS - 1,
})
self.iterator.start()
def __call__(self):
outputs = self.iterator.next()
if isinstance(outputs['image'], np.ndarray):
outputs['image'] = torch.from_numpy(outputs['image'])
return outputs
class Iterator(threading.Thread):
"""Iterator to return the batch of data."""
def __init__(self, **kwargs):
super(Iterator, self).__init__()
# Distributed settings
rank, group_size = 0, 1
process_group = dragon.distributed.get_group()
if process_group is not None and \
kwargs.get('phase', 'TRAIN') == 'TRAIN':
group_size = process_group.size
rank = dragon.distributed.get_rank(process_group)
# Configuration
self._batch_size = kwargs.get('batch_size', 8)
self._num_readers = kwargs.get('num_readers', 1)
self._num_transformers = kwargs.get('num_transformers', 3)
self.daemon = True
# Initialize queues
num_batches = self._num_readers
self._queue1 = mp.Queue(num_batches * self._batch_size)
self._queue2 = mp.Queue(num_batches * self._batch_size)
self._queue3 = queue.Queue(num_batches)
# Initialize readers
self._readers = []
for i in range(self._num_readers):
part_idx, num_parts = i, self._num_readers
num_parts *= group_size
part_idx += rank * self._num_readers
self._readers.append(dragon.io.DataReader(
part_idx=part_idx, num_parts=num_parts, **kwargs))
self._readers[i]._seed += part_idx
self._readers[i].q_out = self._queue1
self._readers[i].start()
time.sleep(0.1)
# Initialize transformers
self._transformers = []
for i in range(self._num_transformers):
p = data_transformer.DataTransformer(**kwargs)
p._seed += (i + rank * self._num_transformers)
p.q_in, p.q_out = self._queue1, self._queue2
p.start()
self._transformers.append(p)
time.sleep(0.1)
# Register cleanup callbacks
def cleanup():
def terminate(processes):
for p in processes:
p.terminate()
p.join()
terminate(self._transformers)
logger.info('Terminate DataTransformer.')
terminate(self._readers)
logger.info('Terminate DataReader.')
import atexit
atexit.register(cleanup)
def next(self):
"""Return the next batch of data."""
return self.__next__()
def run(self):
"""Main loop."""
num_images = cfg.TRAIN.IMS_PER_BATCH
num_batches = cfg.TRAIN.ASPECT_GROUPING
logger.info('Initialize prefetching batches...')
example_buffer = [self._queue2.get()
for _ in range(num_images * num_batches)]
next_examples = []
while True:
# Use cached buffer for next N examples
if len(next_examples) == 0:
next_examples = example_buffer
example_buffer = []
# Prepare the next batch
outputs = collections.defaultdict(list)
for i in range(num_images):
example = next_examples.pop(0)
outputs['image'].append(example['image'])
outputs['gt_boxes'].append(example['boxes'])
outputs['im_info'].append(example['im_info'])
outputs['fg_inds'].append(example.get('fg_inds', None))
outputs['bg_inds'].append(example.get('bg_inds', None))
example_buffer.append(self._queue2.get())
outputs['image'] = blob_util.im_list_to_blob(
outputs['image'], coarsest_stride=cfg.MODEL.COARSEST_STRIDE)
# Send batch data to consumer
self._queue3.put(outputs)
def __iter__(self):
"""Return the iterator self."""
return self
def __next__(self):
"""Return the next batch of data."""
return self._queue3.get()
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import multiprocessing
import cv2
import numpy as np
import numpy.random as npr
from seetadet.algo import common as algo_common
from seetadet.algo.ssd import transforms
from seetadet.core.config import cfg
from seetadet.datasets.example import Example
from seetadet.utils import boxes as box_util
class DataTransformer(multiprocessing.Process):
"""DataTransformer."""
def __init__(self, **kwargs):
super(DataTransformer, self).__init__()
self._scale = cfg.TRAIN.SCALES[0]
self._seed = cfg.RNG_SEED
self._use_diff = cfg.TRAIN.USE_DIFF
self._use_flipped = cfg.TRAIN.USE_FLIPPED
self._classes = kwargs.get('classes', ('__background__',))
self._num_classes = len(self._classes)
self._class_to_ind = dict(zip(self._classes, range(self._num_classes)))
self._anchor_sampler = algo_common.AnchorSampler()
self._apply_transform = transforms.Compose(transforms.Distort(),
transforms.Expand(),
transforms.Sample(),
transforms.Resize())
self.q_in = self.q_out = None
self.daemon = True
def get_boxes(self, example, flipped):
objects, num_objects = example.objects, 0
height, width = example.height, example.width
if not self._use_diff:
for obj in objects:
if obj.get('difficult', 0) == 0:
num_objects += 1
else:
num_objects = len(objects)
boxes = np.zeros((num_objects, 4), 'float32')
gt_classes = np.zeros((num_objects,), 'int32')
# Filter the difficult instances.
object_idx = 0
for obj in objects:
if not self._use_diff and obj.get('difficult', 0) > 0:
continue
bbox = obj['bbox']
boxes[object_idx, :] = [max(0, bbox[0]),
max(0, bbox[1]),
min(bbox[2], width - 1),
min(bbox[3], height - 1)]
gt_classes[object_idx] = self._class_to_ind[obj['name']]
object_idx += 1
# Flip the boxes if necessary.
if flipped:
boxes = box_util.flip_boxes(boxes, width)
# Normalize.
boxes[:, 0::2] /= width
boxes[:, 1::2] /= height
# Attach the classes.
gt_boxes = np.empty((num_objects, 5), dtype=np.float32)
gt_boxes[:, :4], gt_boxes[:, 4] = boxes, gt_classes
return gt_boxes
def get(self, example):
example = Example(example)
img = example.image
# Flip.
flipped = False
if self._use_flipped and npr.randint(2) > 0:
img = img[:, ::-1]
flipped = True
# Boxes.
boxes = self.get_boxes(example, flipped)
# Return early to avoid invalid transforms.
if len(boxes) == 0:
return {'boxes': boxes}
# Distort => Expand => Sample => Resize
img, boxes = self._apply_transform(img, boxes)
# Restore to the blob scale.
boxes[:, :4] *= self._scale
# Standard outputs.
outputs = {'image': img,
'boxes': boxes,
'im_info': img.shape[:2]}
# Attach precomputed targets.
if len(boxes) > 0:
outputs.update(
self._anchor_sampler(
gt_boxes=boxes,
im_info=outputs['im_info']))
return outputs
def run(self):
# Disable the opencv threading.
cv2.setNumThreads(1)
# Fix the process-local random seed.
np.random.seed(self._seed)
# Main prefetch loop
while True:
outputs = self.get(self.q_in.get())
if len(outputs['boxes']) < 1:
continue # Ignore non-object image.
self.q_out.put(outputs)
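# A self-contained sketch of the normalize-and-attach-class step in
# ``get_boxes`` above (toy values; flipping omitted).
if __name__ == '__main__':
    width, height = 640, 480
    toy_boxes = np.array([[32., 48., 320., 240.]], 'float32')
    toy_boxes[:, 0::2] /= width
    toy_boxes[:, 1::2] /= height
    toy_gt = np.empty((1, 5), 'float32')
    toy_gt[:, :4], toy_gt[:, 4] = toy_boxes, [1]
    print(toy_gt)  # [[0.05 0.1 0.5 0.5 1.]]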
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
def generate_anchors(min_sizes, max_sizes, ratios):
"""Generate anchors by enumerating aspect ratios and sizes."""
total_anchors = []
for idx, min_size in enumerate(min_sizes):
# Note that SSD assumes a center-format anchor: (x_ctr, y_ctr, w, h)
base_anchor = np.array([0, 0, min_size, min_size])
anchors = _ratio_enum(base_anchor, ratios, _mkanchors)
if len(max_sizes) > 0:
max_size = max_sizes[idx]
_anchors = anchors[0].reshape((1, 4))
_anchors = np.vstack([
_anchors,
_max_size_enum(
base_anchor,
min_size,
max_size,
_mkanchors,
)])
anchors = np.vstack([_anchors, anchors[1:]])
total_anchors.append(anchors)
return np.vstack(total_anchors)
def _whctrs(anchor):
"""Return width, height, x center, and y center for an anchor (window)."""
w, h = anchor[2], anchor[3]
x_ctr, y_ctr = anchor[0], anchor[1]
return w, h, x_ctr, y_ctr
def _mkanchors(ws, hs, x_ctr, y_ctr):
"""Given a vector of widths (ws) and heights (hs) around a center
(x_ctr, y_ctr), output a set of anchors (windows).
"""
ws, hs = ws[:, np.newaxis], hs[:, np.newaxis]
return np.hstack((
x_ctr - 0.5 * ws,
y_ctr - 0.5 * hs,
x_ctr + 0.5 * ws,
y_ctr + 0.5 * hs,
))
def _mkanchors_v2(ws, hs, x_ctr, y_ctr):
"""Given a vector of widths (ws) and heights (hs) around a center
(x_ctr, y_ctr), output a set of anchors (windows).
"""
ws, hs = ws[:, np.newaxis], hs[:, np.newaxis]
return np.hstack((0 * (ws) + x_ctr, 0 * (hs) + y_ctr, ws, hs))
def _ratio_enum(anchor, ratios, make_fn):
"""Enumerate a set of anchors for each aspect ratio wrt an anchor."""
w, h, x_ctr, y_ctr = _whctrs(anchor)
size = w * h
size_ratios = size / ratios
hs = np.round(np.sqrt(size_ratios))
ws = np.round(hs * ratios)
return make_fn(ws, hs, x_ctr, y_ctr)
def _max_size_enum(base_anchor, min_size, max_size, make_fn):
"""Enumerate a anchor for max_size wrt base_anchor."""
w, h, x_ctr, y_ctr = _whctrs(base_anchor)
ws = hs = np.sqrt([min_size * max_size])
return make_fn(ws, hs, x_ctr, y_ctr)
if __name__ == '__main__':
print(generate_anchors(min_sizes=[30], max_sizes=[60], ratios=[1]))
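    # The extra anchor added for a (min_size, max_size) pair has side
    # sqrt(min_size * max_size), i.e. sqrt(30 * 60) ~= 42.43 above.
    print(np.sqrt(30 * 60))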
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import types
import dragon.vm.torch as torch
import numpy as np
from seetadet.core.config import cfg
from seetadet.modeling.detector import new_detector
from seetadet.utils import blob as blob_util
from seetadet.utils import boxes as box_util
from seetadet.utils import image as image_util
from seetadet.utils import logger
from seetadet.utils import nms as nms_util
from seetadet.utils import time_util
def get_data(raw_images):
"""Return the test data."""
images_wide, image_scales_wide = [], []
for img in raw_images:
images, image_scales = image_util.scale_image(
img, scales=cfg.TEST.SCALES, max_size=0)
images_wide += images
image_scales_wide += image_scales
images_wide = blob_util.im_list_to_blob(
images_wide, coarsest_stride=cfg.MODEL.COARSEST_STRIDE)
return images_wide, image_scales_wide
def ims_detect(detector, raw_images, timer=None):
"""Detect images at single or multiple scales."""
images, image_scales = get_data(raw_images)
timer.tic() if timer else timer
# Do forward
inputs = {'image': torch.from_numpy(images)}
if not hasattr(detector, 'script_forward'):
def script_forward(self, image):
return self.forward({'image': image})
detector.script_forward = torch.jit.trace(
func=types.MethodType(script_forward, detector),
example_inputs=[inputs['image']],
)
outputs = detector.script_forward(inputs['image'])
timer.toc() if timer else timer
# Decode results
batch_pred = outputs['bbox_pred'].numpy()
batch_scores = outputs['cls_prob'].numpy()
results = [([], []) for _ in range(len(raw_images))]
for i in range(len(images)):
boxes = box_util.bbox_transform_inv(
outputs['prior_boxes'], batch_pred[i],
cfg.BBOX_REG_WEIGHTS)
boxes[:, 0::2] /= image_scales[i][1]
boxes[:, 1::2] /= image_scales[i][0]
boxes = box_util.clip_boxes(boxes, raw_images[i].shape)
results[i // len(cfg.TEST.SCALES)][0].append(batch_scores[i])
results[i // len(cfg.TEST.SCALES)][1].append(boxes)
# Merge from multiple scales
ret = [(np.vstack(s), np.vstack(b)) for s, b in results]
timer.toc() if timer else timer
return ret
def get_detections(outputs):
"""Return the categorical detections from outputs."""
scores, boxes = outputs
boxes_this_image = [[]]
for j in range(1, len(cfg.MODEL.CLASSES)):
inds = np.where(scores[:, j] > cfg.TEST.SCORE_THRESH)[0]
cls_scores = scores[inds, j]
cls_boxes = boxes[inds]
pre_nms_inds = np.argsort(-cls_scores)[:cfg.TEST.PRE_NMS_TOP_N]
cls_scores = cls_scores[pre_nms_inds]
cls_boxes = cls_boxes[pre_nms_inds]
cls_detections = np.hstack(
(cls_boxes, cls_scores[:, np.newaxis])) \
.astype(np.float32, copy=False)
if cfg.TEST.USE_SOFT_NMS:
keep = nms_util.soft_nms(
cls_detections,
thresh=cfg.TEST.NMS,
method=cfg.TEST.SOFT_NMS_METHOD,
sigma=cfg.TEST.SOFT_NMS_SIGMA,
)
else:
keep = nms_util.nms(
cls_detections,
thresh=cfg.TEST.NMS,
)
cls_detections = cls_detections[keep, :]
boxes_this_image.append(cls_detections)
return [boxes_this_image]
def test_net(weights, q_in, q_out, device, root_logger=True):
"""Test a network trained with SSD algorithm."""
cfg.GPU_ID = device
logger.set_root_logger(root_logger)
detector = new_detector(device, weights)
timers = time_util.new_timers('im_detect_bbox', 'misc')
must_stop = False
while not must_stop:
# Wait inputs.
indices, raw_images = [], []
for _ in range(cfg.TEST.IMS_PER_BATCH):
i, raw_image = q_in.get()
if i < 0:
must_stop = True
break
indices.append(i)
raw_images.append(raw_image)
if len(raw_images) == 0:
continue
# Detect on specific scales.
all_outputs = ims_detect(
detector=detector,
raw_images=raw_images,
timer=timers['im_detect_bbox'],
)
# Post-processing.
for i, outputs in enumerate(all_outputs):
with timers['misc'].tic_and_toc():
boxes_this_image, = get_detections(outputs)
q_out.put((
indices[i],
dict([('im_detect', timers['im_detect_bbox'].average_time),
('misc', timers['misc'].average_time)]),
dict([('boxes', boxes_this_image)]),
))
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
import numpy as np
import numpy.random as npr
import PIL.Image
import PIL.ImageEnhance
from seetadet.core.config import cfg
from seetadet.utils import boxes as box_util
from seetadet.utils import boxes_v2 as box_util_v2
from seetadet.utils import image as image_util
class Compose(object):
"""Compose the several transforms together."""
def __init__(self, *transforms):
self.transforms = transforms
def __call__(self, img, boxes):
for transform in self.transforms:
img, boxes = transform.apply(img, boxes)
return img, boxes
class Distort(object):
"""Distort the brightness, contrast and color of image."""
def __init__(self):
self._prob = 0.5 if cfg.TRAIN.USE_COLOR_JITTER else 0
def apply(self, img, boxes=None):
if self._prob > 0:
transforms = [PIL.ImageEnhance.Brightness,
PIL.ImageEnhance.Contrast,
PIL.ImageEnhance.Color]
npr.shuffle(transforms)
img = PIL.Image.fromarray(img)
for transform in transforms:
if npr.uniform() < self._prob:
img = transform(img)
img = img.enhance(1. + npr.uniform(-.4, .4))
img = np.array(img)
return img, boxes
class Expand(object):
"""Expand image to get smaller objects."""
def __init__(self):
self._max_ratio = 1. / cfg.TRAIN.RANDOM_SCALES[0]
self._expand_prob = 0.5 if self._max_ratio > 1 else 0
def apply(self, img, boxes=None):
prob = npr.uniform()
if prob > self._expand_prob:
return img, boxes
ratio = npr.uniform(1., self._max_ratio)
im_h, im_w = img.shape[:2]
expand_h, expand_w = int(im_h * ratio), int(im_w * ratio)
h_off = int(math.floor(npr.uniform(0., expand_h - im_h)))
w_off = int(math.floor(npr.uniform(0., expand_w - im_w)))
new_img = np.empty((expand_h, expand_w, 3), dtype=np.uint8)
new_img[:] = cfg.PIXEL_MEANS
new_img[h_off:h_off + im_h, w_off:w_off + im_w, :] = img
if boxes is not None:
new_boxes = boxes.astype(boxes.dtype, copy=True)
new_boxes[:, 0] = (boxes[:, 0] * im_w + w_off) / expand_w
new_boxes[:, 1] = (boxes[:, 1] * im_h + h_off) / expand_h
new_boxes[:, 2] = (boxes[:, 2] * im_w + w_off) / expand_w
new_boxes[:, 3] = (boxes[:, 3] * im_h + h_off) / expand_h
boxes = new_boxes
return new_img, boxes
class Resize(object):
"""Resize image."""
def __init__(self):
self._target_size = (cfg.TRAIN.SCALES[0],) * 2
def apply(self, img, boxes):
return image_util.resize_image(img, size=self._target_size), boxes
class Sample(object):
"""Crop image by sampling a region restricted by bounding boxes."""
def __init__(self):
min_scale, max_scale = \
cfg.PIPELINE.RANDOM_BBOX_CROP.SCALING
min_aspect_ratio, max_aspect_ratio = \
cfg.PIPELINE.RANDOM_BBOX_CROP.ASPECT_RATIO
self._samplers = [{'min_scale': 1.0,
'max_scale': 1.0,
'min_aspect_ratio': 1.0,
'max_aspect_ratio': 1.0,
'min_overlap': 0.0,
'max_overlap': 1.0,
'max_trials': 1,
'max_sample': 1}]
for min_overlap in cfg.PIPELINE.RANDOM_BBOX_CROP.THRESHOLDS:
self._samplers.append({'min_scale': min_scale,
'max_scale': max_scale,
'min_aspect_ratio': min_aspect_ratio,
'max_aspect_ratio': max_aspect_ratio,
'min_overlap': min_overlap,
'max_overlap': 1.0,
'max_trials': 10,
'max_sample': 1})
@classmethod
def _compute_overlaps(cls, rand_box, gt_boxes):
return box_util_v2.iou(np.expand_dims(rand_box, 0), gt_boxes[:, 0:4])
@classmethod
def _generate_sample(cls, sample_param):
min_scale = sample_param.get('min_scale', 1.)
max_scale = sample_param.get('max_scale', 1.)
scale = npr.uniform(min_scale, max_scale)
min_aspect_ratio = sample_param.get('min_aspect_ratio', 1.)
max_aspect_ratio = sample_param.get('max_aspect_ratio', 1.)
min_aspect_ratio = max(min_aspect_ratio, scale**2)
max_aspect_ratio = min(max_aspect_ratio, 1. / (scale**2))
aspect_ratio = npr.uniform(min_aspect_ratio, max_aspect_ratio)
bbox_w = scale * (aspect_ratio ** 0.5)
bbox_h = scale / (aspect_ratio ** 0.5)
w_off = npr.uniform(0., 1. - bbox_w)
h_off = npr.uniform(0., 1. - bbox_h)
return np.array([w_off, h_off, w_off + bbox_w, h_off + bbox_h])
@staticmethod
def _check_center(sample_box, gt_boxes):
ctr_x = (gt_boxes[:, 2] + gt_boxes[:, 0]) / 2.0
ctr_y = (gt_boxes[:, 3] + gt_boxes[:, 1]) / 2.0
# Keep the ground-truth box whose center is in the sample box
keep_indices = np.where((ctr_x >= sample_box[0]) & (ctr_x <= sample_box[2]) &
(ctr_y >= sample_box[1]) & (ctr_y <= sample_box[3]))[0]
return len(keep_indices) > 0
def _check_overlap(self, sample_box, gt_boxes, constraint):
min_overlap = constraint.get('min_overlap', None)
max_overlap = constraint.get('max_overlap', None)
if min_overlap is None and \
max_overlap is None:
return True
ovr = self._compute_overlaps(sample_box, gt_boxes).max()
if min_overlap is not None:
if ovr < min_overlap:
return False
if max_overlap is not None:
if ovr > max_overlap:
return False
return True
def _generate_batch_samples(self, gt_boxes):
sample_boxes = []
for sampler in self._samplers:
found = 0
for i in range(sampler['max_trials']):
if found >= sampler['max_sample']:
break
sample_box = self._generate_sample(sampler)
if sampler['min_overlap'] != 0. or \
sampler['max_overlap'] != 1.:
if not self._check_overlap(sample_box, gt_boxes, sampler):
continue
if not self._check_center(sample_box, gt_boxes):
continue
found += 1
sample_boxes.append(sample_box)
return sample_boxes
@classmethod
def _rand_crop(cls, im, rand_box, gt_boxes=None):
im_h, im_w = im.shape[:2]
w_off = int(rand_box[0] * im_w)
h_off = int(rand_box[1] * im_h)
crop_w = int((rand_box[2] - rand_box[0]) * im_w)
crop_h = int((rand_box[3] - rand_box[1]) * im_h)
new_im = im[h_off:h_off + crop_h, w_off:w_off + crop_w, :]
if gt_boxes is not None:
ctr_x = (gt_boxes[:, 2] + gt_boxes[:, 0]) / 2.0
ctr_y = (gt_boxes[:, 3] + gt_boxes[:, 1]) / 2.0
keep_indices = np.where((ctr_x >= rand_box[0]) & (ctr_x <= rand_box[2]) &
(ctr_y >= rand_box[1]) & (ctr_y <= rand_box[3]))[0]
gt_boxes = gt_boxes[keep_indices]
new_gt_boxes = gt_boxes.astype(gt_boxes.dtype, copy=True)
new_gt_boxes[:, 0] = (gt_boxes[:, 0] * im_w - w_off)
new_gt_boxes[:, 1] = (gt_boxes[:, 1] * im_h - h_off)
new_gt_boxes[:, 2] = (gt_boxes[:, 2] * im_w - w_off)
new_gt_boxes[:, 3] = (gt_boxes[:, 3] * im_h - h_off)
new_gt_boxes = box_util.clip_boxes(new_gt_boxes, (crop_h, crop_w))
new_gt_boxes[:, 0] = new_gt_boxes[:, 0] / crop_w
new_gt_boxes[:, 1] = new_gt_boxes[:, 1] / crop_h
new_gt_boxes[:, 2] = new_gt_boxes[:, 2] / crop_w
new_gt_boxes[:, 3] = new_gt_boxes[:, 3] / crop_h
return new_im, new_gt_boxes
return new_im, gt_boxes
def apply(self, img, boxes):
sample_boxes = self._generate_batch_samples(boxes)
if len(sample_boxes) > 0:
# Apply sampling if found at least one valid sample box
# Then randomly pick one
sample_idx = npr.randint(len(sample_boxes))
rand_box = sample_boxes[sample_idx]
img, boxes = self._rand_crop(img, rand_box, boxes)
return img, boxes
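# A hedged usage sketch (relies on the default cfg values above; the
# toy image and normalized boxes below are illustrative).
if __name__ == '__main__':
    transform = Compose(Distort(), Expand(), Sample(), Resize())
    toy_img = np.full((300, 300, 3), 128, dtype=np.uint8)
    # One normalized box: (x1, y1, x2, y2, class_index).
    toy_boxes = np.array([[0.25, 0.25, 0.75, 0.75, 1.]], 'float32')
    toy_img, toy_boxes = transform(toy_img, toy_boxes)
    print(toy_img.shape)  # (TRAIN.SCALES[0], TRAIN.SCALES[0], 3)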
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
def get_shifted_anchors(shapes, base_anchors, strides):
"""Return the shifted anchors on given shapes."""
anchors_to_pack = []
for i in range(len(shapes)):
height, width = shapes[i]
shift_x = (np.arange(0, width) + 0.5) * strides[i]
shift_y = (np.arange(0, height) + 0.5) * strides[i]
shift_x, shift_y = np.meshgrid(shift_x, shift_y)
shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
shift_x.ravel(), shift_y.ravel())).transpose()
# Add A anchors (1, A, 4) to K shifts (K, 1, 4)
# to get shifted anchors (K, A, 4) and reshape to (K * A, 4)
a = base_anchors[i].shape[0]
k = shifts.shape[0]
anchors = (base_anchors[i].reshape((1, a, 4)) +
shifts.reshape((1, k, 4)).transpose((1, 0, 2)))
anchors_to_pack.append(anchors.reshape((k * a, 4)))
return np.vstack(anchors_to_pack)
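# A tiny worked example (toy values): a 2x2 grid with stride 8 and a
# single 16x16 base anchor centered at the origin.
if __name__ == '__main__':
    base = [np.array([[-8., -8., 8., 8.]])]
    print(get_shifted_anchors([(2, 2)], base, [8]))
    # Anchor centers land at (4, 4), (12, 4), (4, 12), (12, 12).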
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Platform backend."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import ctypes
import importlib.machinery
import os
import types
from dragon.vm import torch
def load_library(library_prefix):
"""Load a shared library."""
loader_details = (importlib.machinery.ExtensionFileLoader,
importlib.machinery.EXTENSION_SUFFIXES)
library_prefix = os.path.abspath(library_prefix)
lib_dir, fullname = os.path.split(library_prefix)
finder = importlib.machinery.FileFinder(lib_dir, loader_details)
ext_specs = finder.find_spec(fullname)
if ext_specs is None:
raise ImportError('Could not find the pre-built library '
'for <%s>.' % library_prefix)
ctypes.cdll.LoadLibrary(ext_specs.origin)
def trace_module(module, name, func, example_inputs=None):
"""Trace the function and bound to module."""
setattr(module, name, torch.jit.trace(
func=types.MethodType(func, module),
example_inputs=example_inputs))
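# Hedged usage note: the inference code elsewhere in this commit traces
# a bound forward exactly once via this helper, e.g.
#
#     trace_module(model, 'run_inference', run_inference)
#
# after which ``model.run_inference(...)`` invokes the traced graph.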
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
from seetadet.utils.attrdict import AttrDict
cfg = __C = AttrDict()
###########################################
# #
# Pipeline Options #
# #
###########################################
__C.PIPELINE = AttrDict()
# The pipeline type
# Values supported as follows:
# - 'ssd'
# - 'rcnn'
# - 'default'
__C.PIPELINE.TYPE = 'default'
# RandomBBoxCrop
__C.PIPELINE.RANDOM_BBOX_CROP = AttrDict()
# - The range of scale for sampling regions
__C.PIPELINE.RANDOM_BBOX_CROP.SCALING = [0.3, 1.0]
# - The range of aspect ratio for sampling regions
__C.PIPELINE.RANDOM_BBOX_CROP.ASPECT_RATIO = [0.5, 2.0]
# - The minimum IoU to satisfy
__C.PIPELINE.RANDOM_BBOX_CROP.THRESHOLDS = [0.0, 0.1, 0.3, 0.5, 0.7, 0.9]
###########################################
# #
# Training Options #
# #
###########################################
__C.TRAIN = AttrDict()
# Initialize network with weights from this file
__C.TRAIN.WEIGHTS = ''
# Dataset to train
__C.TRAIN.DATASET = ''
# The number of threads to load train data
__C.TRAIN.NUM_THREADS = 4
# Scales to use during training (can list multiple scales)
# Each scale is the pixel size of an image's shortest side
__C.TRAIN.SCALES = (640,)
# Range to jitter the selected scale
__C.TRAIN.RANDOM_SCALES = [1., 1.]
# Max pixel size of the longest side of a scaled input image
__C.TRAIN.MAX_SIZE = 0
# Images to use per mini-batch
__C.TRAIN.IMS_PER_BATCH = 1
# The number of training batches to init for aspect grouping
__C.TRAIN.ASPECT_GROUPING = 64
# Use shuffled images during training?
__C.TRAIN.USE_SHUFFLE = True
# Use horizontally-flipped images during training?
__C.TRAIN.USE_FLIPPED = True
# Use the difficult (under occlusion) objects
__C.TRAIN.USE_DIFF = True
# If True, distort the brightness, contrast, and saturation
__C.TRAIN.USE_COLOR_JITTER = False
# NMS threshold used on RPN proposals
__C.TRAIN.RPN_NMS_THRESH = 0.7
# Number of top scoring boxes to keep before NMS to RPN proposals
__C.TRAIN.RPN_PRE_NMS_TOP_N = 12000
# Number of top scoring boxes to keep after NMS to RPN proposals
__C.TRAIN.RPN_POST_NMS_TOP_N = 2000
###########################################
# #
# Testing Options #
# #
###########################################
__C.TEST = AttrDict()
# Dataset to test
__C.TEST.DATASET = ''
# The test protocol for dataset
# Available protocols: 'voc2007', 'voc2010', 'coco'
__C.TEST.PROTOCOL = 'voc2007'
# Original json ground-truth file to use
__C.TEST.JSON_FILE = ''
# Scales to use during testing (can list multiple scales)
# Each scale is the pixel size of an image's shortest side
__C.TEST.SCALES = (640,)
# Max pixel size of the longest side of a scaled input image
__C.TEST.MAX_SIZE = 0
# Images to use per mini-batch
__C.TEST.IMS_PER_BATCH = 1
# The threshold for predicting boxes
__C.TEST.SCORE_THRESH = 0.05
# The threshold for predicting masks
__C.TEST.BINARY_THRESH = 0.5
# Number of top scoring boxes to keep before NMS to detections
__C.TEST.PRE_NMS_TOP_N = 300
# Overlap threshold used for NMS
__C.TEST.NMS = 0.3
# Use Soft-NMS instead of standard NMS?
# For the soft NMS overlap threshold, we simply use TEST.NMS
__C.TEST.USE_SOFT_NMS = False
__C.TEST.SOFT_NMS_METHOD = 'linear'
__C.TEST.SOFT_NMS_SIGMA = 0.5
# NMS threshold used on RPN proposals
__C.TEST.RPN_NMS_THRESH = 0.7
# Number of top scoring boxes to keep before NMS to RPN proposals
__C.TEST.RPN_PRE_NMS_TOP_N = 6000
# Number of top scoring boxes to keep after NMS to RPN proposals
__C.TEST.RPN_POST_NMS_TOP_N = 1000
# Number of top scoring boxes to keep before NMS to RetinaNet detections
__C.TEST.RETINANET_PRE_NMS_TOP_N = 3000
# Save detection results files if True
# If false, results files are cleaned up after evaluation
__C.TEST.COMPETITION_MODE = True
# Maximum number of detections to return per image
# 100 is based on the limit established for the COCO dataset
__C.TEST.DETECTIONS_PER_IM = 100
###########################################
# #
# Model Options #
# #
###########################################
__C.MODEL = AttrDict()
# The model type
# Values supported as follows:
# - 'faster_rcnn'
# - 'mask_rcnn'
# - 'retinanet'
# - 'ssd'
__C.MODEL.TYPE = ''
# The float precision for training and inference
# Values supported: 'FLOAT32', 'FLOAT16'
__C.MODEL.PRECISION = 'FLOAT32'
# The backbone
__C.MODEL.BACKBONE = ''
# The backbone normalization module
# Values supported: 'FrozenBN', 'BN'
__C.MODEL.BACKBONE_NORM = 'FrozenBN'
# The name for each object class
__C.MODEL.CLASSES = ['__background__']
# Freeze the gradients starting from convolution stage K
# The value of ``K`` is usually set to 2
__C.MODEL.FREEZE_AT = 2
# The variant of ReLU activation
# Values supported: 'ReLU', 'ReLU6'
__C.MODEL.RELU_VARIANT = 'ReLU'
# Setting of focal loss
__C.MODEL.FOCAL_LOSS_ALPHA = 0.25
__C.MODEL.FOCAL_LOSS_GAMMA = 2.0
# Stride of the coarsest feature level
# This is needed so the input can be padded properly
__C.MODEL.COARSEST_STRIDE = 32
###########################################
# #
# RPN Options #
# #
###########################################
__C.RPN = AttrDict()
# Total number of rpn training examples per image
__C.RPN.BATCH_SIZE = 256
# Target fraction of foreground examples per training batch
__C.RPN.FG_FRACTION = 0.5
# Strides for multiple rpn heads
__C.RPN.STRIDES = [4, 8, 16, 32, 64]
# Scales for multiple anchors
__C.RPN.SCALES = [8, 8, 8, 8, 8]
# RPN anchor aspect ratios
__C.RPN.ASPECT_RATIOS = [0.5, 1, 2]
# IoU overlap ratio for labeling an anchor as positive
# Anchors with >= iou overlap are labeled positive
__C.RPN.POSITIVE_OVERLAP = 0.7
# IoU overlap ratio for labeling an anchor as negative
# Anchors with < iou overlap are labeled negative
__C.RPN.NEGATIVE_OVERLAP = 0.3
# The optional loss for bbox regression
# Values supported: 'l1', 'smooth_l1'
__C.RPN.BBOX_REG_LOSS_TYPE = 'l1'
# Weight for bbox regression loss
__C.RPN.BBOX_REG_LOSS_WEIGHT = 1.0
###########################################
# #
# Retina-Net Options #
# #
###########################################
__C.RETINANET = AttrDict()
# Anchor aspect ratios to use
__C.RETINANET.ASPECT_RATIOS = (0.5, 1.0, 2.0)
# Anchor scales per octave
__C.RETINANET.SCALES_PER_OCTAVE = 3
# At each FPN level, we generate anchors based on their scale, aspect_ratio,
# stride of the level, and we multiply the resulting anchor by ANCHOR_SCALE
__C.RETINANET.ANCHOR_SCALE = 4
# Convolutions to use in the cls and bbox tower
# NOTE: this doesn't include the last conv for logits
__C.RETINANET.NUM_CONVS = 4
# IoU overlap ratio for labeling an anchor as positive
# Anchors with >= iou overlap are labeled positive
__C.RETINANET.POSITIVE_OVERLAP = 0.5
# IoU overlap ratio for labeling an anchor as negative
# Anchors with < iou overlap are labeled negative
__C.RETINANET.NEGATIVE_OVERLAP = 0.4
# The optional loss for bbox regression
# Values supported: 'l1', 'smooth_l1', 'giou'
__C.RETINANET.BBOX_REG_LOSS_TYPE = 'l1'
# Weight for bbox regression loss
__C.RETINANET.BBOX_REG_LOSS_WEIGHT = 1.0
###########################################
# #
# FPN Options #
# #
###########################################
__C.FPN = AttrDict()
# Channel dimension of the FPN feature levels
__C.FPN.DIM = 256
# Coarsest level of the FPN pyramid
__C.FPN.RPN_MAX_LEVEL = 6
# Finest level of the FPN pyramid
__C.FPN.RPN_MIN_LEVEL = 2
# Hyper-Parameters for the RoI-to-FPN level mapping heuristic
__C.FPN.ROI_CANONICAL_SCALE = 224
__C.FPN.ROI_CANONICAL_LEVEL = 4
# Coarsest level of the FPN pyramid
__C.FPN.ROI_MAX_LEVEL = 5
# Finest level of the FPN pyramid
__C.FPN.ROI_MIN_LEVEL = 2
###########################################
# #
# Fast R-CNN Options #
# #
###########################################
__C.FRCNN = AttrDict()
# Total number of training RoIs per image
__C.FRCNN.BATCH_SIZE = 128
# Target fraction of foreground RoIs per training batch
__C.FRCNN.FG_FRACTION = 0.25
# IoU overlap ratio for labeling a RoI as positive
# RoIs with >= iou overlap are labeled positive
__C.FRCNN.POSITIVE_OVERLAP = 0.5
# IoU overlap ratio for labeling a RoI as negative
# RoIs with iou overlap in [LO, HI) are labeled negative
__C.FRCNN.NEGATIVE_OVERLAP_HI = 0.5
__C.FRCNN.NEGATIVE_OVERLAP_LO = 0.0
# RoI transform function
# Values supported: 'RoIAlign', 'RoIPool'
__C.FRCNN.ROI_XFORM_METHOD = 'RoIAlign'
# RoI transform output resolution
__C.FRCNN.ROI_XFORM_RESOLUTION = 7
# Resampling window size for RoI transformation
__C.FRCNN.ROI_XFORM_SAMPLING_RATIO = 0
# Hidden layer dimension when using an MLP for the RoI box head
__C.FRCNN.MLP_HEAD_DIM = 1024
# The optional loss for bbox regression
# Values supported: 'l1', 'smooth_l1'
__C.FRCNN.BBOX_REG_LOSS_TYPE = 'l1'
# Weight for bbox regression loss
__C.FRCNN.BBOX_REG_LOSS_WEIGHT = 1.0
###########################################
# #
# Mask R-CNN Options #
# #
###########################################
__C.MRCNN = AttrDict()
# Resolution of mask predictions
__C.MRCNN.RESOLUTION = 28
# RoI transform function
# Values supported: 'RoIAlign', 'RoIPool'
__C.MRCNN.ROI_XFORM_METHOD = 'RoIAlign'
# RoI transform output resolution
__C.MRCNN.ROI_XFORM_RESOLUTION = 14
# Resampling window size for RoI transformation
__C.MRCNN.ROI_XFORM_SAMPLING_RATIO = 0
###########################################
# #
# SSD Options #
# #
###########################################
__C.SSD = AttrDict()
# Convolutions to use in the cls and bbox tower
# NOTE: this doesn't include the last conv for logits
__C.SSD.NUM_CONVS = 0
# Anchor aspect ratios to use
__C.SSD.ASPECT_RATIOS = []
# Strides for multiple ssd heads
__C.SSD.STRIDES = []
# Anchor sizes to use
__C.SSD.ANCHOR_SIZES = []
# IoU overlap ratio for labeling an anchor as positive
# Anchors with >= iou overlap are labeled positive
__C.SSD.POSITIVE_OVERLAP = 0.5
# IoU overlap ratio for labeling an anchor as negative
# Anchors with < iou overlap are labeled negative
__C.SSD.NEGATIVE_OVERLAP = 0.5
# The ratio to sample negative anchors as background
__C.SSD.NEGATIVE_POSITIVE_RATIO = 3.0
# The optional loss for bbox regression
# Values supported: 'l1', 'smooth_l1', 'giou'
__C.SSD.BBOX_REG_LOSS_TYPE = 'l1'
# Weight for bbox regression loss
__C.SSD.BBOX_REG_LOSS_WEIGHT = 1.0
###########################################
# #
# ResNet Options #
# #
###########################################
__C.RESNET = AttrDict()
# Number of groups to use
# 1 ==> ResNet; > 1 ==> ResNeXt
# ResNext 32x8d: NUM_GROUPS, WIDTH_PER_GROUP = 32, 8
# ResNext 64x4d: NUM_GROUPS, WIDTH_PER_GROUP = 64, 4
__C.RESNET.NUM_GROUPS = 1
# Baseline width of each group
__C.RESNET.WIDTH_PER_GROUP = 64
###########################################
# #
# Solver Options #
# #
###########################################
__C.SOLVER = AttrDict()
# The interval to display logs
__C.SOLVER.DISPLAY = 20
# The interval to snapshot a model
__C.SOLVER.SNAPSHOT_EVERY = 5000
# Prefix to yield the path: <prefix>_iter_XYZ.pkl
__C.SOLVER.SNAPSHOT_PREFIX = ''
# Optional scaling factor for total loss
# This option is helpful to scale the magnitude
# of gradients during FP16 training
__C.SOLVER.LOSS_SCALING = 1.0
# Maximum number of SGD iterations
__C.SOLVER.MAX_STEPS = 40000
# Base learning rate for the specified schedule
__C.SOLVER.BASE_LR = 0.001
# The uniform interval for LRScheduler
__C.SOLVER.DECAY_STEP = 1
# The custom intervals for LRScheduler
__C.SOLVER.DECAY_STEPS = []
# The decay factor for exponential LRScheduler
__C.SOLVER.DECAY_GAMMA = 0.1
# Warm up to ``BASE_LR`` over this number of steps
__C.SOLVER.WARM_UP_STEPS = 500
# Start the warm up from ``BASE_LR`` * ``FACTOR``
__C.SOLVER.WARM_UP_FACTOR = 0.333
# The type of LRScheduler
__C.SOLVER.LR_POLICY = 'steps_with_decay'
# Momentum to use with SGD
__C.SOLVER.MOMENTUM = 0.9
# L2 regularization for weight parameters
__C.SOLVER.WEIGHT_DECAY = 0.0001
# L2 regularization for legacy bias parameters
__C.SOLVER.WEIGHT_DECAY_BIAS = 0.0
# L2 norm factor for clipping gradients
__C.SOLVER.CLIP_NORM = 0.0
###########################################
# #
# Misc Options #
# #
###########################################
# Number of GPUs to use during training
__C.NUM_GPUS = 1
# Use NCCL for all-reduce, otherwise use CUDA-aware MPI
__C.USE_NCCL = True
# Hosts for Inter-Machine communication
__C.HOSTS = []
# Pixel stddev and mean values (BGR order)
__C.PIXEL_STDS = [1.0, 1.0, 1.0]
__C.PIXEL_MEANS = [103.53, 116.28, 123.675]
# Default weights on (dx, dy, dw, dh) for normalizing bbox regression targets
# These are empirically chosen to approximately lead to unit variance targets
__C.BBOX_REG_WEIGHTS = (10., 10., 5., 5.)
# Prior prob for the positives at the beginning of training.
# This is used to set the bias init for the logits layer
__C.PRIOR_PROB = 0.01
# For reproducibility
__C.RNG_SEED = 3
# Place outputs under an experiments directory
__C.EXP_DIR = ''
# Default GPU device index
__C.GPU_ID = 0
# Show detection visualizations
__C.VIS = False
# Write detection visualizations instead of showing
__C.VIS_ON_FILE = False
# Score threshold for visualization
__C.VIS_TH = 0.7
# Write summaries by TensorBoard
__C.ENABLE_TENSOR_BOARD = False
def cfg_from_file(filename):
"""Load a config file and merge it into the default options."""
import yaml
with open(filename, 'r') as f:
yaml_cfg = AttrDict(yaml.safe_load(f))
global __C
_merge_a_into_b(yaml_cfg, __C)
def cfg_from_list(cfg_list):
"""Set config keys via list (e.g., from command line)."""
from ast import literal_eval
assert len(cfg_list) % 2 == 0
for k, v in zip(cfg_list[0::2], cfg_list[1::2]):
key_list = k.split('.')
d = __C
for sub_key in key_list[:-1]:
assert sub_key in d
d = d[sub_key]
sub_key = key_list[-1]
assert sub_key in d
try:
value = literal_eval(v)
except: # noqa
# Handle the case when v is a string literal
value = v
if type(value) != type(d[sub_key]): # noqa
raise TypeError('Type {} does not match original type {}'
.format(type(value), type(d[sub_key])))
d[sub_key] = value
def _merge_a_into_b(a, b):
"""Merge config dictionary a into config dictionary b, clobbering the
options in b whenever they are also specified in a."""
if not isinstance(a, dict):
return
for k, v in a.items():
# a must specify keys that are in b
if k not in b:
raise KeyError('{} is not a valid config key'.format(k))
# The types must match, too
v = _check_and_coerce_cfg_value_type(v, b[k], k)
# Recursively merge dicts
if type(v) is AttrDict:
try:
_merge_a_into_b(a[k], b[k])
except: # noqa
print('Error under config key: {}'.format(k))
raise
else:
b[k] = v
def _check_and_coerce_cfg_value_type(value_a, value_b, key):
"""Check if the value type matched."""
type_a, type_b = type(value_a), type(value_b)
if type_a is type_b:
return value_a
if type_b is float and type_a is int:
return float(value_a)
# Exceptions: numpy arrays, strings, tuple<->list
if isinstance(value_b, np.ndarray):
value_a = np.array(value_a, dtype=value_b.dtype)
elif isinstance(value_a, tuple) and isinstance(value_b, list):
value_a = list(value_a)
elif isinstance(value_a, list) and isinstance(value_b, tuple):
value_a = tuple(value_a)
elif isinstance(value_a, dict) and isinstance(value_b, AttrDict):
value_a = AttrDict(value_a)
else:
raise ValueError(
'Type mismatch ({} vs. {}) with values ({} vs. {}) for config '
'key: {}'.format(type_b, type_a, value_b, value_a, key))
return value_a
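# A hedged usage sketch: override options from a flat key/value list
# (the values shown are illustrative).
if __name__ == '__main__':
    cfg_from_list(['TRAIN.IMS_PER_BATCH', '2', 'SOLVER.BASE_LR', '0.002'])
    print(cfg.TRAIN.IMS_PER_BATCH, cfg.SOLVER.BASE_LR)  # 2 0.002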
...@@ -8,9 +8,11 @@
 # <https://opensource.org/licenses/BSD-2-Clause>
 #
 # ------------------------------------------------------------
+"""Platform configurations."""
 from __future__ import absolute_import
 from __future__ import division
 from __future__ import print_function
-from seetadet.algo.common.anchor_sampler import AnchorSampler
+# Variables
+from seetadet.core.config.defaults import cfg  # noqa
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Default configurations."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.core.config.yacs import CfgNode
_C = cfg = CfgNode()
# ------------------------------------------------------------
# Augmentation options
# ------------------------------------------------------------
_C.AUG = CfgNode()
# The probability to distort the color
_C.AUG.COLOR_JITTER = 0.0
# The crop size
# Disable cropping always if crop size <= 0
_C.AUG.CROP_SIZE = 0
# ------------------------------------------------------------
# Training options
# ------------------------------------------------------------
_C.TRAIN = CfgNode()
# Initialize network with weights from this file
_C.TRAIN.WEIGHTS = ''
# The train dataset
_C.TRAIN.DATASET = ''
# The loader type for training
_C.TRAIN.LOADER = 'det_train'
# The number of workers to load train data
_C.TRAIN.NUM_WORKERS = 3
# Scales to use during training (can list multiple scales)
# Each scale is the pixel size of an image's shortest side
_C.TRAIN.SCALES = (640,)
# Range to jitter the image scales randomly
_C.TRAIN.SCALES_RANGE = (1.0, 1.0)
# Max pixel size of the longest side of a scaled input image
_C.TRAIN.MAX_SIZE = 1066
# Images to use per mini-batch
_C.TRAIN.IMS_PER_BATCH = 1
# Use the difficult (under occlusion) objects
_C.TRAIN.USE_DIFF = True
# ------------------------------------------------------------
# Testing options
# ------------------------------------------------------------
_C.TEST = CfgNode()
# The test dataset
_C.TEST.DATASET = ''
# The JSON format dataset with annotations for evaluation
_C.TEST.JSON_DATASET = ''
# The loader type for testing
_C.TEST.LOADER = 'det_test'
# The evaluator type for dataset
_C.TEST.EVALUATOR = ''
# Scales to use during testing (can list multiple scales)
# Each scale is the pixel size of an image's shortest side
_C.TEST.SCALES = (640,)
# Max pixel size of the longest side of a scaled input image
_C.TEST.MAX_SIZE = 1066
# Images to use per mini-batch
_C.TEST.IMS_PER_BATCH = 1
# The threshold for predicting boxes
_C.TEST.SCORE_THRESH = 0.05
# The threshold for predicting masks
_C.TEST.BINARY_THRESH = 0.5
# Overlap threshold used for NMS
_C.TEST.NMS_THRESH = 0.5
# Maximum number of detections to return per image
# 100 is based on the limit established for the COCO dataset
_C.TEST.DETECTIONS_PER_IM = 100
# ------------------------------------------------------------
# Model options
# ------------------------------------------------------------
_C.MODEL = CfgNode()
# The model type
_C.MODEL.TYPE = ''
# The compute precision
_C.MODEL.PRECISION = 'float32'
# The name for each object class
_C.MODEL.CLASSES = ['__background__']
# Pixel mean and stddev values for image normalization (BGR order)
_C.MODEL.PIXEL_MEAN = [103.53, 116.28, 123.675]
_C.MODEL.PIXEL_STD = [57.375, 57.12, 58.395]
# Focal loss parameters
_C.MODEL.FOCAL_LOSS_ALPHA = 0.25
_C.MODEL.FOCAL_LOSS_GAMMA = 2.0
# ------------------------------------------------------------
# Backbone options
# ------------------------------------------------------------
_C.BACKBONE = CfgNode()
# The backbone type
_C.BACKBONE.TYPE = ''
# The normalization in backbone modules
_C.BACKBONE.NORM = 'FrozenBN'
# Freeze backbone since the stage K
# The value of ``K`` is usually set to 2
_C.BACKBONE.FREEZE_AT = 2
# Stride of the coarsest feature
# This is needed so the input can be padded properly
_C.BACKBONE.COARSEST_STRIDE = 32
# ------------------------------------------------------------
# FPN options
# ------------------------------------------------------------
_C.FPN = CfgNode()
# Finest level of the FPN pyramid
_C.FPN.MIN_LEVEL = 3
# Coarsest level of the FPN pyramid
_C.FPN.MAX_LEVEL = 7
# The number of repeated fpn cells.
_C.FPN.NUM_CELLS = 1
# Channel dimension of the FPN feature levels
_C.FPN.DIM = 256
# The FPN conv module
_C.FPN.CONV = 'Conv2d'
# The fpn normalization module
_C.FPN.NORM = ''
# The fpn activation module
_C.FPN.ACTIVATION = ''
# The feature fusion method
# Values supported: 'sum', 'attn'
_C.FPN.FUSE_TYPE = 'sum'
# ------------------------------------------------------------
# Anchor generator options
# ------------------------------------------------------------
_C.ANCHOR_GENERATOR = CfgNode()
# The stride of each level
_C.ANCHOR_GENERATOR.STRIDES = [8, 16, 32, 64, 128]
# The anchor size of each stride
_C.ANCHOR_GENERATOR.SIZES = [[32], [64], [128], [256], [512]]
# The aspect ratios of each stride
_C.ANCHOR_GENERATOR.ASPECT_RATIOS = [[0.5, 1.0, 2.0]]
# ------------------------------------------------------------
# RPN options
# ------------------------------------------------------------
_C.RPN = CfgNode()
# Total number of rpn training anchors per image
_C.RPN.BATCH_SIZE = 256
# Fraction of foreground anchors per training batch
_C.RPN.POSITIVE_FRACTION = 0.5
# IoU overlap ratio for labeling an anchor as positive
# Anchors with >= iou overlap are labeled positive
_C.RPN.POSITIVE_OVERLAP = 0.7
# IoU overlap ratio for labeling an anchor as negative
# Anchors with < iou overlap are labeled negative
_C.RPN.NEGATIVE_OVERLAP = 0.3
# NMS threshold used on RPN proposals
_C.RPN.NMS_THRESH = 0.7
# Number of top scoring boxes to keep before NMS to RPN proposals
_C.RPN.PRE_NMS_TOP_N_TRAIN = 12000
_C.RPN.PRE_NMS_TOP_N_TEST = 6000
# Number of top scoring boxes to keep after NMS to RPN proposals
_C.RPN.POST_NMS_TOP_N_TRAIN = 2000
_C.RPN.POST_NMS_TOP_N_TEST = 1000
# The optional loss for bbox regression
_C.RPN.BBOX_REG_LOSS_TYPE = 'l1'
# Weight for bbox regression loss
_C.RPN.BBOX_REG_LOSS_WEIGHT = 1.0
# ------------------------------------------------------------
# RetinaNet options
# ------------------------------------------------------------
_C.RETINANET = CfgNode()
# Number of conv layers to stack in the head
_C.RETINANET.NUM_CONV = 4
# The head conv module
_C.RETINANET.CONV = 'Conv2d'
# The head normalization module
_C.RETINANET.NORM = ''
# The head activation module
_C.RETINANET.ACTIVATION = 'ReLU'
# IoU overlap ratio for labeling an anchor as positive
# Anchors with >= iou overlap are labeled positive
_C.RETINANET.POSITIVE_OVERLAP = 0.5
# IoU overlap ratio for labeling an anchor as negative
# Anchors with < iou overlap are labeled negative
_C.RETINANET.NEGATIVE_OVERLAP = 0.4
# Number of top scoring boxes to keep before NMS
_C.RETINANET.PRE_NMS_TOP_N = 6000
# The bbox regression loss type
_C.RETINANET.BBOX_REG_LOSS_TYPE = 'l1'
# The weight for bbox regression loss
_C.RETINANET.BBOX_REG_LOSS_WEIGHT = 1.0
# ------------------------------------------------------------
# FastRCNN options
# ------------------------------------------------------------
_C.FRCNN = CfgNode()
# Total number of training RoIs per image
_C.FRCNN.BATCH_SIZE = 512
# The finest level of RoI feature
_C.FRCNN.MIN_LEVEL = 2
# The coarsest level of RoI feature
_C.FRCNN.MAX_LEVEL = 5
# Fraction of foreground RoIs per training batch
_C.FRCNN.POSITIVE_FRACTION = 0.25
# IoU overlap ratio for labeling a RoI as positive
# RoIs with >= iou overlap are labeled positive
_C.FRCNN.POSITIVE_OVERLAP = 0.5
# IoU overlap ratio for labeling a RoI as negative
# RoIs with < iou overlap are labeled negative
_C.FRCNN.NEGATIVE_OVERLAP = 0.5
# RoI pooler type
_C.FRCNN.POOLER_TYPE = 'RoIAlign'
# The output size of the RoI pooler
_C.FRCNN.POOLER_RESOLUTION = 7
# The resampling window size of RoI pooler
_C.FRCNN.POOLER_SAMPLING_RATIO = 0
# The number of conv layers to stack in the head
_C.FRCNN.NUM_CONV = 0
# The number of fc layers to stack in the head
_C.FRCNN.NUM_FC = 2
# The hidden dimension of conv head
_C.FRCNN.CONV_HEAD_DIM = 256
# The hidden dimension of fc head
_C.FRCNN.FC_HEAD_DIM = 1024
# The head normalization module
_C.FRCNN.NORM = ''
# The bbox regression loss type
_C.FRCNN.BBOX_REG_LOSS_TYPE = 'l1'
# The weight for bbox regression loss
_C.FRCNN.BBOX_REG_LOSS_WEIGHT = 1.0
# The weights on (dx, dy, dw, dh) for normalizing bbox regression targets
_C.FRCNN.BBOX_REG_WEIGHTS = (10., 10., 5., 5.)
# ------------------------------------------------------------
# MaskRCNN options
# ------------------------------------------------------------
_C.MRCNN = CfgNode()
# RoI pooler type
_C.MRCNN.POOLER_TYPE = 'RoIAlign'
# The output size of the RoI pooler
_C.MRCNN.POOLER_RESOLUTION = 14
# The resampling window size of RoI pooler
_C.MRCNN.POOLER_SAMPLING_RATIO = 0
# The number of conv layers to stack in the head
_C.MRCNN.NUM_CONV = 4
# The hidden dimension of conv head
_C.MRCNN.CONV_HEAD_DIM = 256
# The head normalization module
_C.MRCNN.NORM = ''
# ------------------------------------------------------------
# SSD options
# ------------------------------------------------------------
_C.SSD = CfgNode()
# Number of conv layers to stack in the cls and bbox tower
_C.SSD.NUM_CONVS = 0
# The head conv module
_C.SSD.CONV = 'Conv2d'
# The head normalization module
_C.SSD.NORM = ''
# Fraction of foreground anchors per training batch
_C.SSD.POSITIVE_FRACTION = 0.25
# IoU overlap ratio for labeling an anchor as positive
# Anchors with >= iou overlap are labeled positive
_C.SSD.POSITIVE_OVERLAP = 0.5
# IoU overlap ratio for labeling an anchor as negative
# Anchors with < iou overlap are labeled negative
_C.SSD.NEGATIVE_OVERLAP = 0.5
# Number of top scoring boxes to keep before NMS
_C.SSD.PRE_NMS_TOP_N = 300
# The optional loss for bbox regression
# Values supported: 'l1', 'smooth_l1', 'giou'
_C.SSD.BBOX_REG_LOSS_TYPE = 'l1'
# Weight for bbox regression loss
_C.SSD.BBOX_REG_LOSS_WEIGHT = 1.0
# The weights on (dx, dy, dw, dh) for normalizing bbox regression targets
_C.SSD.BBOX_REG_WEIGHTS = (10., 10., 5., 5.)
# ------------------------------------------------------------
# Solver options
# ------------------------------------------------------------
_C.SOLVER = CfgNode()
# The interval to display logs
_C.SOLVER.DISPLAY = 20
# The interval to snapshot a model
_C.SOLVER.SNAPSHOT_EVERY = 5000
# Prefix to yield the path: <prefix>_iter_XYZ.pkl
_C.SOLVER.SNAPSHOT_PREFIX = ''
# Loss scaling factor for mixed precision training
_C.SOLVER.LOSS_SCALE = 1024.0
# Maximum number of SGD iterations
_C.SOLVER.MAX_STEPS = 40000
# Base learning rate for the specified scheduler
_C.SOLVER.BASE_LR = 0.001
# Minimal learning rate for the specified scheduler
_C.SOLVER.MIN_LR = 0.0
# The decay intervals for LRScheduler
_C.SOLVER.DECAY_STEPS = []
# The decay factor for exponential LRScheduler
_C.SOLVER.DECAY_GAMMA = 0.1
# Warm up to ``BASE_LR`` over this number of steps
_C.SOLVER.WARM_UP_STEPS = 1000
# Start the warm up from ``BASE_LR`` * ``FACTOR``
_C.SOLVER.WARM_UP_FACTOR = 0.1
# The type of optimizer
_C.SOLVER.OPTIMIZER = 'SGD'
# The type of lr scheduler
_C.SOLVER.LR_POLICY = 'steps_with_decay'
# The layer-wise lr decay
_C.SOLVER.LAYER_LR_DECAY = 1.0
# Momentum to use with SGD
_C.SOLVER.MOMENTUM = 0.9
# L2 regularization for weight parameters
_C.SOLVER.WEIGHT_DECAY = 0.0001
# L2 norm factor for clipping gradients
_C.SOLVER.CLIP_NORM = 0.0
# ------------------------------------------------------------
# Misc options
# ------------------------------------------------------------
# Number of GPUs for distributed training
_C.NUM_GPUS = 1
# Random seed for reproducibility
_C.RNG_SEED = 3
# Place outputs under an experiments directory
_C.EXP_DIR = ''
# Default GPU device index
_C.GPU_ID = 0
# Write summaries by TensorBoard
_C.ENABLE_TENSOR_BOARD = False
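# A hedged sketch: merge a tiny YAML override into the defaults above
# (the file content is illustrative; keys must already exist in _C).
if __name__ == '__main__':
    import tempfile
    with tempfile.NamedTemporaryFile('w', suffix='.yml', delete=False) as f:
        f.write('TRAIN:\n  IMS_PER_BATCH: 8\nSOLVER:\n  BASE_LR: 0.02\n')
    cfg.merge_from_file(f.name)
    print(cfg.TRAIN.IMS_PER_BATCH, cfg.SOLVER.BASE_LR)  # 8 0.02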
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# Codes are based on:
#
# <https://github.com/rbgirshick/yacs/blob/master/yacs/config.py>
#
# ------------------------------------------------------------
"""Yet Another Configuration System (YACS)."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import copy
import numpy as np
import yaml
class CfgNode(dict):
"""Node for configuration options."""
IMMUTABLE = '__immutable__'
def __init__(self, *args, **kwargs):
super(CfgNode, self).__init__(*args, **kwargs)
self.__dict__[CfgNode.IMMUTABLE] = False
def clone(self):
"""Recursively copy this CfgNode."""
return copy.deepcopy(self)
def freeze(self):
"""Make this CfgNode and all of its children immutable."""
self._immutable(True)
def is_frozen(self):
"""Return mutability."""
return self.__dict__[CfgNode.IMMUTABLE]
def merge_from_file(self, cfg_filename):
"""Load a yaml config file and merge it into this CfgNode."""
with open(cfg_filename, 'r') as f:
other_cfg = CfgNode(yaml.safe_load(f))
self.merge_from_other_cfg(other_cfg)
def merge_from_list(self, cfg_list):
"""Merge config (keys, values) in a list into this CfgNode."""
assert len(cfg_list) % 2 == 0
from ast import literal_eval
for k, v in zip(cfg_list[0::2], cfg_list[1::2]):
key_list = k.split('.')
d = self
for sub_key in key_list[:-1]:
assert sub_key in d
d = d[sub_key]
sub_key = key_list[-1]
assert sub_key in d
try:
value = literal_eval(v)
except: # noqa
# Handle the case when v is a string literal
value = v
if type(value) != type(d[sub_key]): # noqa
raise TypeError('Type {} does not match original type {}'
.format(type(value), type(d[sub_key])))
d[sub_key] = value
def merge_from_other_cfg(self, other_cfg):
"""Merge ``other_cfg`` into this CfgNode."""
_merge_a_into_b(other_cfg, self)
def _immutable(self, is_immutable):
"""Set immutability recursively to all nested CfgNode."""
self.__dict__[CfgNode.IMMUTABLE] = is_immutable
for v in self.__dict__.values():
if isinstance(v, CfgNode):
v._immutable(is_immutable)
for v in self.values():
if isinstance(v, CfgNode):
v._immutable(is_immutable)
def __getattr__(self, name):
if name in self.__dict__:
return self.__dict__[name]
elif name in self:
return self[name]
else:
raise AttributeError(name)
def __repr__(self):
return "{}({})".format(self.__class__.__name__,
super(CfgNode, self).__repr__())
def __setattr__(self, name, value):
if not self.__dict__[CfgNode.IMMUTABLE]:
if name in self.__dict__:
self.__dict__[name] = value
else:
self[name] = value
else:
raise AttributeError(
'Attempted to set "{}" to "{}", but CfgNode is immutable'
.format(name, value))
def __str__(self):
def _indent(s_, num_spaces):
s = s_.split("\n")
if len(s) == 1:
return s_
first = s.pop(0)
s = [(num_spaces * " ") + line for line in s]
s = "\n".join(s)
s = first + "\n" + s
return s
r = ""
s = []
for k, v in sorted(self.items()):
seperator = "\n" if isinstance(v, CfgNode) else " "
attr_str = "{}:{}{}".format(str(k), seperator, str(v))
attr_str = _indent(attr_str, 2)
s.append(attr_str)
r += "\n".join(s)
return r
def _merge_a_into_b(a, b):
"""Merge config dictionary a into config dictionary b, clobbering the
options in b whenever they are also specified in a."""
if not isinstance(a, dict):
return
for k, v in a.items():
# a must specify keys that are in b
if k not in b:
raise KeyError('{} is not a valid config key'.format(k))
# The types must match, too
v = _check_and_coerce_cfg_value_type(v, b[k], k)
# Recursively merge dicts
if type(v) is CfgNode:
try:
_merge_a_into_b(a[k], b[k])
except: # noqa
print('Error under config key: {}'.format(k))
raise
else:
b[k] = v
def _check_and_coerce_cfg_value_type(value_a, value_b, key):
"""Check if the value type matched."""
type_a, type_b = type(value_a), type(value_b)
if type_a is type_b:
return value_a
if type_b is float and type_a is int:
return float(value_a)
# Exceptions: numpy arrays, strings, tuple<->list
if isinstance(value_b, np.ndarray):
value_a = np.array(value_a, dtype=value_b.dtype)
elif isinstance(value_a, tuple) and isinstance(value_b, list):
value_a = list(value_a)
elif isinstance(value_a, list) and isinstance(value_b, tuple):
value_a = tuple(value_a)
elif isinstance(value_a, dict) and isinstance(value_b, CfgNode):
value_a = CfgNode(value_a)
else:
raise ValueError(
'Type mismatch ({} vs. {}) with values ({} vs. {}) for config '
'key: {}'.format(type_b, type_a, value_b, value_a, key))
return value_a
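if __name__ == '__main__':
    # A minimal sketch of merge and freeze semantics (toy values).
    node = CfgNode({'SOLVER': CfgNode({'BASE_LR': 0.001, 'MOMENTUM': 0.9})})
    node.merge_from_list(['SOLVER.BASE_LR', '0.02'])
    assert node.SOLVER.BASE_LR == 0.02
    node.freeze()
    assert node.is_frozen()  # attribute writes would now raise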
...@@ -8,6 +8,7 @@
 # <https://opensource.org/licenses/BSD-2-Clause>
 #
 # ------------------------------------------------------------
+"""Experiment coordinator."""
 from __future__ import absolute_import
 from __future__ import division
...@@ -20,16 +21,14 @@ import time
 import numpy as np
 from seetadet.core.config import cfg
-from seetadet.core.config import cfg_from_file
-from seetadet.utils import logger
+from seetadet.utils import logging
 class Coordinator(object):
     """Manage the unique experiments."""
     def __init__(self, cfg_file, exp_dir=None):
-        # Override the default configs
-        cfg_from_file(cfg_file)
+        cfg.merge_from_file(cfg_file)
        if cfg.EXP_DIR != '':
             exp_dir = cfg.EXP_DIR
         if exp_dir is None:
...@@ -43,7 +42,7 @@ class Coordinator(object):
             raise ValueError('Invalid experiment dir: ' + exp_dir)
         self.exp_dir = exp_dir
-    def _path_at(self, file, auto_create=True):
+    def path_at(self, file, auto_create=True):
         try:
             path = osp.abspath(osp.join(self.exp_dir, file))
             if auto_create and not osp.exists(path):
...@@ -54,20 +53,8 @@ class Coordinator(object):
                 os.makedirs(path)
             return path
-    def checkpoints_dir(self):
-        return self._path_at('checkpoints')
-    def exports_dir(self):
-        return self._path_at('exports')
-    def results_dir(self, checkpoint=None, output_dir=None):
-        if output_dir is not None:
-            return output_dir
-        path = osp.splitext(osp.basename(checkpoint))[0] if checkpoint else ''
-        return self._path_at(osp.join('results', path))
-    def checkpoint(self, step=None, last_idx=1, wait=False):
-        path = self.checkpoints_dir()
+    def get_checkpoint(self, step=None, last_idx=1, wait=False):
+        path = self.path_at('checkpoints')
         def locate(last_idx=None):
             files = os.listdir(path)
...@@ -91,7 +78,7 @@ class Coordinator(object):
         file, file_step = locate(last_idx)
         while file is None and wait:
-            logger.info('Wait for checkpoint at {}.'.format(step))
+            logging.info('Wait for checkpoint at {}.'.format(step))
             time.sleep(10)
             file, file_step = locate(last_idx)
         return file, file_step
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import dragon.vm.torch as torch
import numpy as np
from seetadet.core.backend import trace_module
from seetadet.core.config import cfg
from seetadet.utils.bbox import bbox_transform_inv
from seetadet.utils.bbox import clip_tiled_boxes
from seetadet.utils.blob import blob_vstack
from seetadet.utils.image import im_rescale
from seetadet.utils.nms import nms
def get_data(imgs):
"""Return the test data."""
im_batch, im_shapes, im_scales = [], [], []
for img in imgs:
scaled_imgs, scales = im_rescale(
img, scales=cfg.TEST.SCALES, max_size=cfg.TEST.MAX_SIZE)
im_batch += scaled_imgs
im_scales += scales
im_shapes += [x.shape[:2] for x in scaled_imgs]
im_batch = blob_vstack(
im_batch, fill_value=cfg.MODEL.PIXEL_MEAN,
align=(cfg.BACKBONE.COARSEST_STRIDE,) * 2)
im_shapes = np.array(im_shapes)
im_scales = np.array(im_scales).reshape((len(im_batch), -1))
im_info = np.hstack([im_shapes, im_scales]).astype('float32')
strides = [2 ** x for x in range(cfg.FPN.MIN_LEVEL, cfg.FPN.MAX_LEVEL + 1)]
strides = np.array(strides)[:, None]
grid_shapes = np.stack([im_batch.shape[1:3]] * len(strides))
grid_shapes = (grid_shapes - 1) // strides + 1
grid_info = grid_shapes.astype('int64')
return im_batch, im_info, grid_info
@torch.no_grad()
def im_detect(model, imgs):
"""Detect images."""
im_batch, im_info, grid_info = get_data(imgs)
model.timers['im_detect'].tic()
inputs = {'img': torch.from_numpy(im_batch),
'im_info': torch.from_numpy(im_info),
'grid_info': torch.from_numpy(grid_info)}
if not hasattr(model, 'run_inference'):
def run_inference(self, img, im_info, grid_info):
return self.forward({'img': img, 'im_info': im_info,
'grid_info': grid_info})
trace_module(model, 'run_inference', run_inference)
outputs = model.run_inference(inputs['img'], inputs['im_info'],
inputs['grid_info'])
outputs = dict((k, outputs[k].numpy()) for k in outputs.keys())
bbox_pred = bbox_transform_inv(
outputs['rois'][:, 1:5], outputs['bbox_pred'],
weights=cfg.FRCNN.BBOX_REG_WEIGHTS)
imgs_per_batch, num_scales = len(imgs), len(cfg.TEST.SCALES)
results = [([], []) for _ in range(imgs_per_batch)]
batch_inds = outputs['rois'][:, 0:1].astype('int32')
for i in range(imgs_per_batch * num_scales):
index = i // num_scales
inds = np.where(batch_inds == i)[0]
boxes = bbox_pred[inds] / im_info[i, 2]
boxes = clip_tiled_boxes(boxes, imgs[index].shape)
results[index][0].append(outputs['cls_score'][inds])
results[index][1].append(boxes)
results = [[np.vstack(x) for x in y] for y in results]
model.timers['im_detect'].toc(n=imgs_per_batch)
im_boxes = []
for scores, boxes in results:
with model.timers['misc'].tic_and_toc():
cls_boxes = get_cls_results(scores, boxes)
im_boxes.append(cls_boxes)
return [{'boxes': boxes} for boxes in im_boxes]
def get_cls_results(all_scores, all_boxes):
"""Return the categorical results."""
empty_boxes = np.zeros((0, 5), 'float32')
cls_boxes = [[]]
for j in range(1, len(cfg.MODEL.CLASSES)):
inds = np.where(all_scores[:, j] > cfg.TEST.SCORE_THRESH)[0]
if len(inds) == 0:
cls_boxes.append(empty_boxes)
continue
scores = all_scores[inds, j]
boxes = all_boxes[inds, j * 4:(j + 1) * 4]
dets = np.hstack((boxes, scores[:, np.newaxis]))
dets = dets.astype('float32', copy=False)
keep = nms(dets, cfg.TEST.NMS_THRESH)
cls_boxes.append(dets[keep, :])
return cls_boxes
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm import torch
import numpy as np
from seetadet.core.backend import trace_module
from seetadet.core.config import cfg
from seetadet.utils.bbox import bbox_transform_inv
from seetadet.utils.bbox import clip_tiled_boxes
from seetadet.utils.bbox import distribute_boxes
from seetadet.utils.blob import blob_vstack
from seetadet.utils.image import im_rescale
from seetadet.utils.nms import nms
def get_data(imgs):
"""Return the test data."""
im_batch, im_shapes, im_scales = [], [], []
for img in imgs:
scaled_imgs, scales = im_rescale(
img, scales=cfg.TEST.SCALES, max_size=cfg.TEST.MAX_SIZE)
im_batch += scaled_imgs
im_scales += scales
im_shapes += [x.shape[:2] for x in scaled_imgs]
im_batch = blob_vstack(
im_batch, fill_value=cfg.MODEL.PIXEL_MEAN,
align=(cfg.BACKBONE.COARSEST_STRIDE,) * 2)
im_shapes = np.array(im_shapes)
im_scales = np.array(im_scales).reshape((len(im_batch), -1))
im_info = np.hstack([im_shapes, im_scales]).astype('float32')
strides = [2 ** x for x in range(cfg.FPN.MIN_LEVEL, cfg.FPN.MAX_LEVEL + 1)]
strides = np.array(strides)[:, None]
grid_shapes = np.stack([im_batch.shape[1:3]] * len(strides))
grid_shapes = (grid_shapes - 1) // strides + 1
grid_info = grid_shapes.astype('int64')
return im_batch, im_info, grid_info
@torch.no_grad()
def im_detect(model, imgs):
"""Detect images."""
im_boxes, im_rois = im_detect_bbox(model, imgs)
im_rois = np.concatenate(sum(im_rois, []))
mask_pred = im_detect_mask(model, im_rois)
imgs_per_batch, num_scales = len(imgs), len(cfg.TEST.SCALES)
im_masks = [[] for _ in range(imgs_per_batch)]
batch_inds = im_rois[:, 0:1].astype('int32')
for i in range(imgs_per_batch * num_scales):
index = i // num_scales
inds = np.where(batch_inds == i)[0]
masks, labels = mask_pred[inds], im_rois[inds, 5]
num_classes = len(im_boxes[index])
for _ in range(num_classes - len(im_masks[index])):
im_masks[index].append([])
for j in range(1, num_classes):
im_masks[index][j].append(masks[np.where(labels == (j - 1))[0]])
if (i + 1) % num_scales == 0:
v = im_masks[index][j]
im_masks[index][j] = np.vstack(v) if len(v) > 1 else v[0]
return [{'boxes': boxes, 'masks': masks}
for boxes, masks in zip(im_boxes, im_masks)]
@torch.no_grad()
def im_detect_bbox(model, imgs):
"""Detect images at single or multiple scales."""
im_batch, im_info, grid_info = get_data(imgs)
model.timers['im_detect'].tic()
inputs = {'img': torch.from_numpy(im_batch),
'im_info': torch.from_numpy(im_info),
'grid_info': torch.from_numpy(grid_info)}
if not hasattr(model, 'run_inference'):
def run_inference(self, img, im_info, grid_info):
return self.forward({'img': img, 'im_info': im_info,
'grid_info': grid_info})
trace_module(model, 'run_inference', run_inference)
outputs = model.run_inference(inputs['img'], inputs['im_info'],
inputs['grid_info'])
outputs = dict((k, outputs[k].numpy()) for k in outputs.keys())
bbox_pred = bbox_transform_inv(
outputs['rois'][:, 1:5], outputs['bbox_pred'],
weights=cfg.FRCNN.BBOX_REG_WEIGHTS)
imgs_per_batch, num_scales = len(imgs), len(cfg.TEST.SCALES)
results = [([], [], []) for _ in range(imgs_per_batch)]
batch_inds = outputs['rois'][:, 0:1].astype('int32')
for i in range(imgs_per_batch * num_scales):
index = i // num_scales
inds = np.where(batch_inds == i)[0]
boxes = bbox_pred[inds] / im_info[i, 2]
boxes = clip_tiled_boxes(boxes, imgs[index].shape)
results[index][0].append(outputs['cls_score'][inds])
results[index][1].append(boxes)
results[index][2].append(batch_inds[inds])
results = [[np.vstack(x) for x in y] for y in results]
model.timers['im_detect'].toc(n=imgs_per_batch)
im_boxes, im_rois = [], []
for scores, boxes, batch_inds in results:
with model.timers['misc'].tic_and_toc():
cls_boxes, cls_rois = get_cls_results(
scores, boxes, batch_inds, im_info)
im_boxes.append(cls_boxes)
im_rois.append(cls_rois)
return im_boxes, im_rois
@torch.no_grad()
def im_detect_mask(model, im_rois):
lvl_min, lvl_max = cfg.FRCNN.MIN_LEVEL, cfg.FRCNN.MAX_LEVEL
roi_lvls = distribute_boxes(im_rois[:, 1:5], lvl_min, lvl_max)
roi_inds = [np.where(roi_lvls == (i + lvl_min))[0]
for i in range(lvl_max - lvl_min + 1)]
rois, labels = [], []
for inds in roi_inds:
rois.append(im_rois[inds, :5] if len(inds) > 0 else
np.array([[-1, 0, 0, 1, 1]], 'float32'))
labels.append(im_rois[inds, 5].astype('int64')
if len(inds) > 0 else np.array([-1], 'int64'))
model.timers['im_detect_mask'].tic()
model.outputs['rois'] = [model.to_tensor(x) for x in rois]
mask_pred = model.mask_head(model.outputs)['mask_pred']
num_rois, num_classes = mask_pred.shape[:2]
labels = np.concatenate(labels)
fg_inds = np.where(labels >= 0)[0]
strides = np.arange(num_rois) * num_classes
mask_inds = model.to_tensor(strides[fg_inds] + labels[fg_inds])
mask_pred = mask_pred.flatten_(0, 1)[mask_inds].numpy()
mask_pred = mask_pred[np.concatenate(roi_inds).argsort()].copy()
model.timers['im_detect_mask'].toc()
return mask_pred
def get_cls_results(all_scores, all_boxes, batch_inds, im_info):
"""Return the categorical results."""
empty_boxes = np.zeros((0, 5), 'float32')
empty_rois = np.zeros((0, 6), 'float32')
cls_boxes, cls_rois = [[]], []
for j in range(1, len(cfg.MODEL.CLASSES)):
inds = np.where(all_scores[:, j] > cfg.TEST.SCORE_THRESH)[0]
if len(inds) == 0:
cls_boxes.append(empty_boxes)
cls_rois.append(empty_rois)
continue
scores = all_scores[inds, j]
boxes = all_boxes[inds, j * 4:(j + 1) * 4]
dets = np.hstack((boxes, scores[:, np.newaxis]))
dets = dets.astype('float32', copy=False)
keep = nms(dets, cfg.TEST.NMS_THRESH)
batch_inds_keep = batch_inds[inds][keep]
cls_boxes.append(dets[keep, :])
cls_rois.append(np.hstack((
batch_inds_keep,
cls_boxes[-1][:, :4] * im_info[batch_inds_keep, 2],
np.ones((len(keep), 1)) * (j - 1))).astype('float32'))
return cls_boxes, cls_rois
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import dragon.vm.torch as torch
import numpy as np
from seetadet.core.backend import trace_module
from seetadet.core.config import cfg
from seetadet.utils.blob import blob_vstack
from seetadet.utils.image import im_rescale
from seetadet.utils.nms import nms
def get_data(imgs):
"""Return the test data."""
im_batch, im_shapes, im_scales = [], [], []
for img in imgs:
scaled_imgs, scales = im_rescale(
img, scales=cfg.TEST.SCALES, max_size=cfg.TEST.MAX_SIZE)
im_batch += scaled_imgs
im_scales += scales
im_shapes += [x.shape[:2] for x in scaled_imgs]
im_batch = blob_vstack(
im_batch, fill_value=cfg.MODEL.PIXEL_MEAN,
size=(cfg.AUG.CROP_SIZE,) * 2,
align=(cfg.BACKBONE.COARSEST_STRIDE,) * 2)
im_shapes = np.array(im_shapes)
im_scales = np.array(im_scales).reshape((len(im_batch), -1))
im_info = np.hstack([im_shapes, im_scales]).astype('float32')
strides = [2 ** x for x in range(cfg.FPN.MIN_LEVEL, cfg.FPN.MAX_LEVEL + 1)]
strides = np.array(strides)[:, None]
grid_shapes = np.stack([im_batch.shape[1:3]] * len(strides))
grid_shapes = (grid_shapes - 1) // strides + 1
grid_info = grid_shapes.astype('int64')
return im_batch, im_info, grid_info
@torch.no_grad()
def im_detect(model, imgs):
"""Detect images."""
im_batch, im_info, grid_info = get_data(imgs)
model.timers['im_detect'].tic()
inputs = {'img': torch.from_numpy(im_batch),
'im_info': torch.from_numpy(im_info),
'grid_info': torch.from_numpy(grid_info)}
if not hasattr(model, 'run_inference'):
def run_inference(self, img, im_info, grid_info):
return self.forward({'img': img, 'im_info': im_info,
'grid_info': grid_info})
trace_module(model, 'run_inference', run_inference)
outputs = model.run_inference(inputs['img'], inputs['im_info'],
inputs['grid_info'])
outputs = dict((k, outputs[k].numpy()) for k in outputs.keys())
imgs_per_batch, num_scales = len(imgs), len(cfg.TEST.SCALES)
results = [[] for _ in range(imgs_per_batch)]
batch_inds = outputs['dets'][:, 0:1].astype('int32')
for i in range(imgs_per_batch * num_scales):
index = i // num_scales
inds = np.where(batch_inds == i)[0]
results[index].append(outputs['dets'][inds, 1:])
for index in range(imgs_per_batch):
try:
results[index] = np.vstack(results[index])
except ValueError:
results[index] = results[index][0]
model.timers['im_detect'].toc(n=imgs_per_batch)
im_boxes = []
for dets in results:
with model.timers['misc'].tic_and_toc():
cls_boxes = get_cls_results(dets)
im_boxes.append(cls_boxes)
return [{'boxes': boxes} for boxes in im_boxes]
def get_cls_results(all_dets):
"""Return the categorical results."""
empty_boxes = np.zeros((0, 5), 'float32')
cls_boxes = [[]]
labels = all_dets[:, 5].astype('int32')
for j in range(1, len(cfg.MODEL.CLASSES)):
inds = np.where(labels == j)[0]
if len(inds) == 0:
cls_boxes.append(empty_boxes)
continue
dets = all_dets[inds, :5].astype('float32')
keep = nms(dets, cfg.TEST.NMS_THRESH)
cls_boxes.append(dets[keep, :])
return cls_boxes
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import dragon.vm.torch as torch
import numpy as np
from seetadet.core.backend import trace_module
from seetadet.core.config import cfg
from seetadet.utils.bbox import bbox_transform_inv
from seetadet.utils.bbox import clip_boxes
from seetadet.utils.blob import blob_vstack
from seetadet.utils.image import im_rescale
from seetadet.utils.nms import nms
def get_data(imgs):
"""Return the test data."""
im_batch, im_scales = [], []
for img in imgs:
scaled_imgs, scales = im_rescale(
img, scales=cfg.TEST.SCALES, keep_ratio=False)
im_batch += scaled_imgs
im_scales += scales
im_batch = blob_vstack(im_batch, fill_value=cfg.MODEL.PIXEL_MEAN)
return im_batch, im_scales
@torch.no_grad()
def im_detect(model, imgs):
"""Detect images."""
im_batch, im_scales = get_data(imgs)
model.timers['im_detect'].tic()
inputs = {'img': torch.from_numpy(im_batch)}
if not hasattr(model, 'run_inference'):
def run_inference(self, img):
return self.forward({'img': img})
trace_module(model, 'run_inference', run_inference,
example_inputs=[inputs['img']])
outputs = model.run_inference(inputs['img'])
outputs = dict((k, outputs[k].numpy()) for k in outputs.keys())
anchors = model.bbox_head.targets.generator.grid_anchors
imgs_per_batch, num_scales = len(imgs), len(cfg.TEST.SCALES)
results = [([], []) for _ in range(imgs_per_batch)]
for i in range(imgs_per_batch * num_scales):
index = i // num_scales
boxes = bbox_transform_inv(
anchors, outputs['bbox_pred'][i],
weights=cfg.SSD.BBOX_REG_WEIGHTS)
boxes[:, 0::2] /= im_scales[i][1]
boxes[:, 1::2] /= im_scales[i][0]
boxes = clip_boxes(boxes, imgs[index].shape)
results[index][0].append(outputs['cls_score'][i])
results[index][1].append(boxes)
results = [[np.vstack(x) for x in y] for y in results]
model.timers['im_detect'].toc(n=imgs_per_batch)
im_boxes = []
for scores, boxes in results:
with model.timers['misc'].tic_and_toc():
cls_boxes = get_cls_results(scores, boxes)
im_boxes.append(cls_boxes)
return [{'boxes': boxes} for boxes in im_boxes]
def get_cls_results(all_scores, all_boxes):
"""Return the categorical results."""
cls_boxes = [[]]
for j in range(1, len(cfg.MODEL.CLASSES)):
inds = np.where(all_scores[:, j] > cfg.TEST.SCORE_THRESH)[0]
scores, boxes = all_scores[inds, j], all_boxes[inds]
inds = np.argsort(-scores)[:cfg.SSD.PRE_NMS_TOP_N]
scores, boxes = scores[inds], boxes[inds]
dets = np.hstack((boxes, scores[:, np.newaxis]))
dets = dets.astype('float32', copy=False)
keep = nms(dets, cfg.TEST.NMS_THRESH)
cls_boxes.append(dets[keep, :])
return cls_boxes
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Registry class."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import functools
class Registry(object):
    """Registry class."""

    def __init__(self, name):
        self.name = name
        self.registry = collections.OrderedDict()

    def has(self, key):
        return key in self.registry

    def register(self, name, func=None, **kwargs):
        def decorated(inner_function):
            for key in (name if isinstance(
                    name, (tuple, list)) else [name]):
                self.registry[key] = \
                    functools.partial(inner_function, **kwargs)
            return inner_function
        if func is not None:
            return decorated(func)
        return decorated

    def get(self, name, default=None):
        if not self.has(name):
            if default is not None:
                return default
            raise KeyError("`%s` is not registered in <%s>."
                           % (name, self.name))
        return self.registry[name]

    def try_get(self, name):
        if self.has(name):
            return self.get(name)
        return None
backbone = Registry('backbone')
fusion_pass = Registry('fusion_pass')
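# A minimal usage sketch (illustrative only; "demo" and "build_plain"
# are hypothetical names, not part of the library):
if __name__ == '__main__':
    demo = Registry('demo')

    @demo.register('plain')
    def build_plain(scale=1):
        return 'plain-x%d' % scale

    assert demo.has('plain')
    assert demo.get('plain')() == 'plain-x1'
    assert demo.try_get('missing') is None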
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import datetime
import importlib
import multiprocessing
import numpy as np
from seetadet.core.config import cfg
from seetadet.utils import time_util
from seetadet.utils.vis import vis_one_image
def run_test_net(
checkpoint,
server,
devices,
read_every=1000,
log_every=100,
):
classes = server.classes
num_images = server.num_images
num_classes = server.num_classes
devices = devices if devices else [cfg.GPU_ID]
num_workers = len(devices)
read_stride = float(num_workers * cfg.TEST.IMS_PER_BATCH)
read_every = int(np.ceil(read_every / read_stride) * read_stride)
log_every = log_every if log_every > 0 else num_images
test_module = 'seetadet.algo.%s.test' % cfg.MODEL.TYPE
test_fn = getattr(importlib.import_module(test_module), 'test_net')
timers = time_util.new_timers('im_detect', 'misc')
vis_image_dict = {}
all_boxes = [[[] for _ in range(num_images)] for _ in range(num_classes)]
all_masks = [[[] for _ in range(num_images)] for _ in range(num_classes)]
queues = [multiprocessing.Queue() for _ in range(num_workers + 1)]
workers = [
multiprocessing.Process(
target=test_fn,
kwargs={
'weights': checkpoint,
'q_in': queues[i],
'q_out': queues[-1],
'device': devices[i],
'root_logger': i == 0,
}
) for i in range(num_workers)
]
for process in workers:
process.start()
num_sends = 0
for count in range(num_images):
if count >= num_sends:
num_to_send = min(read_every, num_images - num_sends)
for i in range(count, count + num_to_send):
image_id, raw_image = server.get_image()
queues[i % num_workers].put((i, raw_image))
if cfg.VIS or cfg.VIS_ON_FILE:
vis_image_dict[i] = (image_id, raw_image)
num_sends += num_to_send
if num_sends == num_images:
for i in range(num_workers):
queues[i].put((-1, None))
i, time_diffs, results = queues[-1].get()
        # Unpack the per-image results.
boxes_this_image = results['boxes']
masks_this_image = results.get('masks', None)
        # Disable mask collection if no masks were returned.
if masks_this_image is None:
all_masks = None
# Update time difference
for name, diff in time_diffs.items():
timers[name].add_diff(diff)
# Visualize the results if necessary
if cfg.VIS or cfg.VIS_ON_FILE:
image_id, raw_image = vis_image_dict[i]
vis_one_image(
raw_image,
classes,
boxes_this_image,
masks_this_image,
thresh=cfg.VIS_TH,
box_alpha=1.,
show_class=True,
filename=server.get_save_filename(image_id),
)
del vis_image_dict[i]
# Pack the results in the class-major order
for j in range(1, num_classes):
all_boxes[j][i] = boxes_this_image[j]
if all_masks is not None:
if j < len(masks_this_image):
all_masks[j][i] = masks_this_image[j]
# Limit to max_per_image detections *over all classes*
max_detections = cfg.TEST.DETECTIONS_PER_IM
if max_detections > 0:
scores = []
for j in range(1, num_classes):
if len(all_boxes[j][i]) < 1:
continue
scores.append(all_boxes[j][i][:, -1])
if len(scores) > 0:
scores = np.hstack(scores)
if len(scores) > max_detections:
thr = np.sort(scores)[-max_detections]
for j in range(1, num_classes):
keep = np.where(all_boxes[j][i][:, -1] >= thr)[0]
all_boxes[j][i] = all_boxes[j][i][keep, :]
if all_masks is not None:
all_masks[j][i] = all_masks[j][i][keep]
if (count + 1) % log_every == 0:
avg_total_time = np.sum([t.average_time for t in timers.values()])
eta_seconds = avg_total_time * (num_images - count - 1)
print('\rim_detect: {:d}/{:d} [{:.3f}s + {:.3f}s] (eta: {})'
.format(count + 1, num_images,
timers['im_detect'].average_time,
timers['misc'].average_time,
str(datetime.timedelta(seconds=int(eta_seconds)))),
end='')
print('\n\n>>> Evaluating detections\n')
server.evaluate_detections(all_boxes)
if all_masks is not None:
print('>>> Evaluating segmentations\n')
server.evaluate_segmentations(all_boxes, all_masks)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import multiprocessing as mp
import os
import cv2
import dragon
from seetadet.core.config import cfg
from seetadet.datasets.example import Example
from seetadet.datasets.factory import get_dataset
class _Server(object):
"""Base server class."""
def __init__(self, output_dir):
self.output_dir = output_dir
if cfg.VIS_ON_FILE:
self.vis_dir = os.path.join(self.output_dir, 'vis')
if not os.path.exists(self.vis_dir):
os.makedirs(self.vis_dir)
def evaluate_detections(self, all_boxes):
pass
def evaluate_segmentations(self, all_boxes, all_masks):
pass
def get_image(self):
pass
def get_save_filename(self, image_id, ext='.jpg'):
return os.path.join(self.vis_dir, image_id + ext) \
if cfg.VIS_ON_FILE else None
class EvaluateServer(_Server):
"""Server to evaluate network with ground-truth."""
def __init__(self, output_dir):
super(EvaluateServer, self).__init__(output_dir)
self.dataset = get_dataset(cfg.TEST.DATASET)
self.dataset.competition_mode(cfg.TEST.COMPETITION_MODE)
self.classes = self.dataset.classes
self.num_images = self.dataset.num_images
self.num_classes = self.dataset.num_classes
self.data_reader = dragon.io.DataReader(
dataset=self.dataset.cls, source=self.dataset.source)
self.data_reader.q_out = mp.Queue(cfg.TEST.IMS_PER_BATCH * 4)
self.data_reader.start()
self.gt_recs = collections.OrderedDict()
def get_image(self):
example = Example(self.data_reader.q_out.get())
image, image_id = example.image, example.id
self.gt_recs[image_id] = {
'height': example.height,
'width': example.width,
'objects': example.objects,
}
return image_id, image
def get_records(self):
if len(self.gt_recs) != self.num_images:
raise RuntimeError(
                'Loaded {} records, while {} are required.'
.format(len(self.gt_recs), self.num_images))
return self.gt_recs
def evaluate_detections(self, all_boxes):
if cfg.TEST.PROTOCOL == 'dump':
self.dataset.dump_detections(all_boxes, self.output_dir)
else:
self.dataset.evaluate_detections(
all_boxes,
self.get_records(),
self.output_dir,
)
def evaluate_segmentations(self, all_boxes, all_masks):
self.dataset.evaluate_segmentations(
all_boxes,
all_masks,
self.get_records(),
self.output_dir,
)
class InferServer(_Server):
"""Server to run inference."""
def __init__(self, output_dir):
super(InferServer, self).__init__(output_dir)
self.images_dir = cfg.TEST.DATASET
self.images = os.listdir(self.images_dir)
self.classes = cfg.MODEL.CLASSES
self.num_images = len(self.images)
self.num_classes = len(cfg.MODEL.CLASSES)
self.output_dir = output_dir
self.image_idx = 0
def get_image(self):
image_name = self.images[self.image_idx]
image_id = image_name.split('.')[0]
image = cv2.imread(os.path.join(self.images_dir, image_name))
self.image_idx = (self.image_idx + 1) % self.num_images
return image_id, image
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Testing engine."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import datetime
import importlib
import multiprocessing as mp
import time
import numpy as np
from seetadet.core.config import cfg
from seetadet.core.testing import test_server
from seetadet.models.build import build_detector
from seetadet.utils.vis import vis_one_image
from seetadet.utils import logging
from seetadet.utils import profiler
def filter_outputs(outputs, max_dets=100):
"""Limit the max number of detections."""
if max_dets <= 0:
return outputs
boxes = outputs.pop('boxes')
masks = outputs.pop('masks', None)
scores, num_classes = [], len(boxes)
for i in range(num_classes):
if len(boxes[i]) > 0:
scores.append(boxes[i][:, -1])
scores = np.hstack(scores) if len(scores) > 0 else []
if len(scores) > max_dets:
thr = np.sort(scores)[-max_dets]
for i in range(num_classes):
if len(boxes[i]) < 1:
continue
keep = np.where(boxes[i][:, -1] >= thr)[0]
boxes[i] = boxes[i][keep]
if masks is not None:
masks[i] = masks[i][keep]
outputs['boxes'] = boxes
outputs['masks'] = masks
return outputs
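# Example with toy data: under "max_dets=1", only the highest-scoring
# detection over all classes survives:
#
#   toy = {'boxes': [[], np.array([[0., 0., 10., 10., 0.9],
#                                  [0., 0., 5., 5., 0.3]], 'float32')]}
#   filter_outputs(toy, max_dets=1)  # keeps only the 0.9-score box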
def extend_results(index, collection, results):
"""Add image results to the collection."""
if results is None:
return
for _ in range(len(results) - len(collection)):
collection.append([])
for i in range(1, len(results)):
for _ in range(index - len(collection[i]) + 1):
collection[i].append([])
collection[i][index] = results[i]
def test_detector(
test_cfg,
weights,
queues,
device,
verbose=True,
batch_timeout=None,
):
"""Test a detector.
Parameters
----------
test_cfg : CfgNode
The cfg for testing.
weights : str
The path of model weights to load.
queues : Sequence[multiprocessing.Queue]
The input and output queue.
device : int
The index of computing device.
verbose : bool, optional, default=True
        Print the information or not.
batch_timeout : number, optional
        The time (in seconds) to wait until "IMS_PER_BATCH" images are collected.
"""
cfg.merge_from_other_cfg(test_cfg)
cfg.GPU_ID = device
cfg.freeze()
logging.set_root(verbose)
model = build_detector(device, weights)
module = 'seetadet.core.modules.%s' % cfg.MODEL.TYPE
module = importlib.import_module(module)
input_queue, output_queue = queues
imgs_per_batch = cfg.TEST.IMS_PER_BATCH
must_stop = False
while not must_stop:
indices, imgs = [], []
deadline, timeout = None, None
for i in range(imgs_per_batch):
if batch_timeout and i == 1:
deadline = time.monotonic() + batch_timeout
if batch_timeout and i >= 1:
timeout = deadline - time.monotonic()
try:
index, img = input_queue.get(timeout=timeout)
if index < 0:
must_stop = True
break
indices.append(index)
imgs.append(img)
except Exception:
pass
if len(imgs) == 0:
continue
results = module.im_detect(model, imgs)
time_diffs = dict((k, v.average_time) for k, v in model.timers.items())
time_diffs['im_detect'] += time_diffs.pop('im_detect_mask', 0.0)
for i, outputs in enumerate(results):
output_queue.put((indices[i], time_diffs, outputs))
def run_test(weights, output_dir, devices, read_every=100, vis_thresh=0):
"""Run a model testing.
Parameters
----------
weights : str
The path of model weights to load.
output_dir : str
The path to save results.
devices : Sequence[int]
The index of computing devices.
read_every : int, optional, default=100
Read every N images to distribute to devices.
vis_thresh : float, optional, default=0
The score threshold for visualization.
"""
server = test_server.EvaluateServer(output_dir)
devices = devices if devices else [cfg.GPU_ID]
num_devices = len(devices)
num_images = server.dataset.num_images
max_dets = cfg.TEST.DETECTIONS_PER_IM
read_stride = float(num_devices * cfg.TEST.IMS_PER_BATCH)
read_every = int(np.ceil(read_every / read_stride) * read_stride)
if vis_thresh > 0:
import matplotlib.pyplot as plt
plt.switch_backend('agg')
queues = [mp.Queue() for _ in range(num_devices + 1)]
actors = [mp.Process(
target=test_detector, kwargs={
'test_cfg': cfg,
'weights': weights,
'queues': [queues[i], queues[-1]],
'device': devices[i],
'verbose': i == 0}) for i in range(num_devices)]
for actor in actors:
actor.start()
timers = collections.defaultdict(profiler.Timer)
all_boxes, all_masks, vis_images = [], [], {}
for count in range(1, num_images + 1):
img_id, img = server.get_image()
queues[count % num_devices].put((count - 1, img))
if vis_thresh > 0:
filename = server.get_save_filename(img_id)
vis_images[count - 1] = (filename, img)
if count % read_every > 0 and count < num_images:
continue
if count == num_images:
for i in range(num_devices):
queues[i].put((-1, None))
for _ in range(((count - 1) % read_every + 1)):
index, time_diffs, outputs = queues[-1].get()
outputs = filter_outputs(outputs, max_dets)
extend_results(index, all_boxes, outputs['boxes'])
extend_results(index, all_masks, outputs.get('masks', None))
for name, diff in time_diffs.items():
timers[name].add_diff(diff)
if vis_thresh > 0:
filename, img = vis_images[index]
vis_one_image(img, server.dataset.classes,
outputs['boxes'],
outputs.get('masks', None),
thresh=vis_thresh,
filename=filename)
del vis_images[index]
avg_time = sum([t.average_time for t in timers.values()])
eta_seconds = avg_time * (num_images - count)
print('\rim_detect: {:d}/{:d} [{:.3f}s + {:.3f}s] (eta: {})'
.format(count, num_images,
timers['im_detect'].average_time,
timers['misc'].average_time,
str(datetime.timedelta(seconds=int(eta_seconds)))),
end='')
print('\nEvaluating detections...')
server.eval_bbox(all_boxes)
if len(all_masks) > 0:
print('Evaluating segmentations...')
server.eval_segm(all_boxes, all_masks)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Testing servers."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import base64
import collections
import multiprocessing
import os
import cv2
import numpy as np
try:
import flask
except ImportError:
flask = None
from seetadet.core.config import cfg
from seetadet.data.build import build_dataset
from seetadet.data.build import build_evaluator
from seetadet.data.build import build_loader_test
class BaseServer(object):
"""Base server class."""
def __init__(self, output_dir):
self.output_dir = output_dir
self.vis_dir = os.path.join(self.output_dir, 'vis')
def get_image(self):
"""Return the image."""
def get_save_filename(self, img_id, ext='.jpg'):
if not os.path.exists(self.vis_dir):
os.makedirs(self.vis_dir)
return os.path.join(self.vis_dir, img_id + ext)
class EvaluateServer(BaseServer):
"""Server to evaluate model with ground-truth."""
def __init__(self, output_dir):
super(EvaluateServer, self).__init__(output_dir)
self.loader = build_loader_test()
self.dataset = build_dataset(cfg.TEST.DATASET)
self.evaluator = build_evaluator()
self.next_inputs = []
self.metas = collections.OrderedDict()
def get_image(self):
if len(self.next_inputs) == 0:
inputs = self.loader()
for i, img_meta in enumerate(inputs['img_meta']):
self.next_inputs.append({
'img': inputs['img'][i],
'objects': inputs['objects'][i],
'id': img_meta['id'],
'height': img_meta['height'],
'width': img_meta['width']})
inputs = self.next_inputs.pop(0)
img_id, img = inputs.pop('id'), inputs.pop('img')
self.metas[img_id] = inputs
return img_id, img
def eval_bbox(self, boxes):
self.check_metas()
res_file = self.evaluator.write_bbox_results(
boxes, self.metas, self.output_dir)
self.evaluator.eval_bbox(res_file)
def eval_segm(self, boxes, masks):
self.check_metas()
res_file = self.evaluator.write_segm_results(
boxes, masks, self.metas, self.output_dir)
self.evaluator.eval_segm(res_file)
def check_metas(self):
if len(self.metas) != self.dataset.num_images:
raise RuntimeError(
'Mismatched number of metas and images. ({} vs. {}).'
                '\nCheck whether there are duplicate image ids.'
.format(len(self.metas), self.dataset.num_images))
if self.evaluator.cocoGt is None:
ann_file = self.evaluator.write_annotations(self.metas, self.output_dir)
self.evaluator.load_annotations(ann_file)
class WebServer(BaseServer, multiprocessing.Process):
"""Server for web serving."""
def __init__(self, output_dir):
BaseServer.__init__(self, output_dir)
multiprocessing.Process.__init__(self, daemon=True)
self.output_dir = output_dir
self.classes = cfg.MODEL.CLASSES
self.num_classes = len(cfg.MODEL.CLASSES)
self.img_id = multiprocessing.Value('i', 0)
def get_image(self):
try:
req = flask.request.get_json(force=True)
img_base64 = req['image']
except KeyError:
            err_msg = 'Missing "image" in request data.'
flask.abort(flask.Response(err_msg))
try:
img_base64 = img_base64.split(",")[-1]
img_bytes = base64.b64decode(img_base64)
except Exception as e:
            err_msg = 'Failed to decode base64 data. Detail: ' + str(e)
flask.abort(flask.Response(err_msg))
try:
img = np.frombuffer(img_bytes, 'uint8')
img = cv2.imdecode(img, cv2.IMREAD_COLOR)
except Exception as e:
            err_msg = 'Failed to decode image bytes. Detail: ' + str(e)
flask.abort(flask.Response(err_msg))
with self.img_id.get_lock():
self.img_id.value += 1
img_id = self.img_id.value
return img_id, img
class InferServer(BaseServer):
"""Server to run model inference."""
def __init__(self, output_dir):
super(InferServer, self).__init__(output_dir)
self.img_dir = cfg.TEST.DATASET
self.classes = cfg.MODEL.CLASSES
self.num_classes = len(cfg.MODEL.CLASSES)
self.imgs = os.listdir(self.img_dir)
self.num_images = len(self.imgs)
self.img_id = 0
def get_image(self):
img_file = self.imgs[self.img_id]
img = cv2.imread(os.path.join(self.img_dir, img_file))
self.img_id += 1
return self.img_id - 1, img
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import os
from dragon.vm import torch
from seetadet.core.config import cfg
from seetadet.solver.sgd import SGDSolver
from seetadet.utils import logger
from seetadet.utils import time_util
from seetadet.utils.stats import SmoothedValue
class SolverWrapper(object):
"""Sovler wrapper."""
def __init__(self, coordinator):
self.output_dir = coordinator.checkpoints_dir()
self.solver = SGDSolver()
self.detector = self.solver.detector
# Setup the detector
self.detector.load_weights(cfg.TRAIN.WEIGHTS)
self.detector.cuda(cfg.GPU_ID)
if cfg.MODEL.PRECISION.lower() == 'float16':
# Mixed precision training
self.detector.half()
# Plan the metrics
self.board = None
self.metrics = collections.OrderedDict()
if cfg.ENABLE_TENSOR_BOARD and logger.is_root():
try:
from dragon.tools.tensorboard import TensorBoard
log_dir = coordinator.exp_dir + '/logs'
self.board = TensorBoard(log_dir=log_dir)
except ImportError:
pass
def snapshot(self):
filename = (cfg.SOLVER.SNAPSHOT_PREFIX +
'_iter_{}.pkl'.format(self.solver.iter))
filename = os.path.join(self.output_dir, filename)
if logger.is_root() and not os.path.exists(filename):
torch.save(self.detector.state_dict(), filename)
logger.info('Wrote snapshot to: {:s}'.format(filename))
def add_metrics(self, stats):
for k, v in stats['loss'].items():
if k not in self.metrics:
self.metrics[k] = SmoothedValue(20)
self.metrics[k].add_value(v)
def send_metrics(self, stats):
if self.board is not None:
self.board.scalar_summary('lr', stats['lr'], stats['iter'])
self.board.scalar_summary('time', stats['time'], stats['iter'])
for k, v in self.metrics.items():
if k == 'total':
self.board.scalar_summary(
'total_loss', v.median(), stats['iter'])
else:
self.board.scalar_summary(k, v.median(), stats['iter'])
def step(self):
display = self.solver.iter % cfg.SOLVER.DISPLAY == 0
stats = self.solver.step()
self.add_metrics(stats)
if display:
logger.info(
'Iteration %d, lr = %.8f, loss = %f, time = %.2fs'
% (stats['iter'], stats['lr'],
self.metrics['total'].median(), stats['time']))
for k, v in self.metrics.items():
if k == 'total':
continue
logger.info(' ' * 10 + 'Train net output({}): {}'
.format(k, v.median()))
self.send_metrics(stats)
def train_model(self):
"""Network training loop."""
timer = time_util.Timer()
max_steps = cfg.SOLVER.MAX_STEPS
while self.solver.iter < max_steps:
with timer.tic_and_toc():
_, step = self.step(), self.solver.iter
if step % (10 * cfg.SOLVER.DISPLAY) == 0:
logger.info(time_util.get_progress_info(timer, step, max_steps))
if step % cfg.SOLVER.SNAPSHOT_EVERY == 0:
self.snapshot()
def train_net(coordinator, start_iter=0):
sw = SolverWrapper(coordinator)
sw.solver.iter = start_iter
logger.info('Solving...')
sw.train_model()
sw.snapshot()
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Build for training library."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm import torch
from seetadet.core.config import cfg
from seetadet.core.training import lr_scheduler
from seetadet.utils import logging
def build_optimizer(params, **kwargs):
"""Build the optimizer."""
args = {'lr': cfg.SOLVER.BASE_LR,
'weight_decay': cfg.SOLVER.WEIGHT_DECAY,
'clip_norm': cfg.SOLVER.CLIP_NORM,
'grad_scale': 1.0 / cfg.SOLVER.LOSS_SCALE}
optimizer = kwargs.pop('optimizer', cfg.SOLVER.OPTIMIZER)
if optimizer == 'SGD':
args['momentum'] = cfg.SOLVER.MOMENTUM
args.update(kwargs)
return getattr(torch.optim, optimizer)(params, **args)
def build_lr_scheduler(**kwargs):
"""Build the LR scheduler."""
args = {'lr_max': cfg.SOLVER.BASE_LR,
'lr_min': cfg.SOLVER.MIN_LR,
'warmup_steps': cfg.SOLVER.WARM_UP_STEPS,
'warmup_factor': cfg.SOLVER.WARM_UP_FACTOR}
policy = kwargs.pop('policy', cfg.SOLVER.LR_POLICY)
args.update(kwargs)
if policy == 'cosine_decay':
return lr_scheduler.CosineLR(
decay_step=(cfg.SOLVER.DECAY_STEPS or [1])[0],
max_steps=cfg.SOLVER.MAX_STEPS, **args)
elif policy == 'steps_with_decay':
return lr_scheduler.MultiStepLR(
decay_steps=cfg.SOLVER.DECAY_STEPS,
decay_gamma=cfg.SOLVER.DECAY_GAMMA, **args)
return lr_scheduler.ConstantLR(**args)
def build_tensorboard(log_dir):
"""Build the tensorboard."""
if cfg.ENABLE_TENSOR_BOARD and logging.is_root():
try:
from dragon.utils.tensorboard import TensorBoard
return TensorBoard(log_dir=log_dir)
except ImportError:
pass
return None
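# A minimal sketch (assuming a YAML config has already been merged into
# "cfg", and "model" is a hypothetical detector module):
#
#   optimizer = build_optimizer(model.parameters(), optimizer='SGD')
#   scheduler = build_lr_scheduler(policy='steps_with_decay')
#   for group in optimizer.param_groups:
#       group['lr'] = scheduler.get_lr()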
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""LearningRate schedulers."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
class ConstantLR(object):
"""Constant LR scheduler."""
def __init__(self, **kwargs):
self._lr_max = kwargs.pop('lr_max')
self._lr_min = kwargs.pop('lr_min', 0)
self._warmup_steps = kwargs.pop('warmup_steps', 0)
self._warmup_factor = kwargs.pop('warmup_factor', 0)
if kwargs:
raise ValueError('Unexpected arguments: ' + ','.join(v for v in kwargs))
self._step_count = 0
self._last_decay = 1.
def step(self):
self._step_count += 1
def get_lr(self):
if self._step_count < self._warmup_steps:
alpha = (self._step_count + 1.) / self._warmup_steps
return self._lr_max * (alpha + (1. - alpha) * self._warmup_factor)
return self._lr_min + (self._lr_max - self._lr_min) * self.get_decay()
def get_decay(self):
return self._last_decay
class CosineLR(ConstantLR):
"""LR scheduler with cosine decay."""
def __init__(self, lr_max, max_steps, lr_min=0, decay_step=1, **kwargs):
super(CosineLR, self).__init__(lr_max=lr_max, lr_min=lr_min, **kwargs)
self._decay_step = decay_step
self._max_steps = max_steps
def get_decay(self):
t = self._step_count - self._warmup_steps
t_max = self._max_steps - self._warmup_steps
if t > 0 and t % self._decay_step == 0:
self._last_decay = .5 * (1. + math.cos(math.pi * t / t_max))
return self._last_decay
class MultiStepLR(ConstantLR):
"""LR scheduler with multi-steps decay."""
def __init__(self, lr_max, decay_steps, decay_gamma, **kwargs):
super(MultiStepLR, self).__init__(lr_max=lr_max, **kwargs)
self._decay_steps = decay_steps
self._decay_gamma = decay_gamma
self._stage_count = 0
self._num_stages = len(decay_steps)
def get_decay(self):
if self._stage_count < self._num_stages:
k = self._decay_steps[self._stage_count]
while self._step_count >= k:
self._stage_count += 1
if self._stage_count >= self._num_stages:
break
k = self._decay_steps[self._stage_count]
self._last_decay = self._decay_gamma ** self._stage_count
return self._last_decay
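# A quick numeric self-check (standalone; the values are illustrative,
# not a recommended schedule):
if __name__ == '__main__':
    scheduler = MultiStepLR(lr_max=0.02, decay_steps=[60000, 80000],
                            decay_gamma=0.1, warmup_steps=500,
                            warmup_factor=1. / 3.)
    lrs = []
    for _ in range(3):
        lrs.append(scheduler.get_lr())
        scheduler.step()
    assert lrs[0] < lrs[1] < lrs[2] <= 0.02  # Linear warmup phase.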
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Training engine."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import functools
import os
from dragon.vm import torch
from seetadet.core.config import cfg
from seetadet.core.training.build import build_lr_scheduler
from seetadet.core.training.build import build_optimizer
from seetadet.core.training.build import build_tensorboard
from seetadet.core.training.utils import get_param_groups
from seetadet.data.build import build_loader_train
from seetadet.models.build import build_detector
from seetadet.utils import logging
from seetadet.utils import profiler
class Trainer(object):
"""Schedule the iterative model training."""
def __init__(self, coordinator):
# Build loader.
self.loader = build_loader_train()
# Build model.
self.model = build_detector(training=True)
self.model.load_weights(cfg.TRAIN.WEIGHTS)
self.model.cuda(cfg.GPU_ID)
if cfg.MODEL.PRECISION.lower() == 'float16':
self.model.half()
# Build optimizer.
self.loss_scale = cfg.SOLVER.LOSS_SCALE
param_groups_getter = get_param_groups
if cfg.SOLVER.LAYER_LR_DECAY < 1.0:
param_groups_getter = functools.partial(
param_groups_getter, lr_scale_getter=functools.partial(
self.model.get_lr_scale, decay=cfg.SOLVER.LAYER_LR_DECAY))
self.optimizer = build_optimizer(param_groups_getter(self.model))
self.scheduler = build_lr_scheduler()
# Build monitor.
self.coordinator = coordinator
self.metrics = collections.OrderedDict()
self.board = build_tensorboard(coordinator.path_at('logs'))
@property
def iter(self):
return self.scheduler._step_count
def snapshot(self):
"""Save the checkpoint of current iterative step."""
f = cfg.SOLVER.SNAPSHOT_PREFIX
f += '_iter_{}.pkl'.format(self.iter)
f = os.path.join(self.coordinator.path_at('checkpoints'), f)
if logging.is_root() and not os.path.exists(f):
torch.save(self.model.state_dict(), f, pickle_protocol=4)
logging.info('Wrote snapshot to: {:s}'.format(f))
def add_metrics(self, stats):
"""Add or update the metrics."""
for k, v in stats['metrics'].items():
if k not in self.metrics:
self.metrics[k] = profiler.SmoothedValue()
self.metrics[k].update(v)
def display_metrics(self, stats):
"""Send metrics to the monitor."""
logging.info('Iteration %d, lr = %.8f, time = %.2fs'
% (stats['iter'], stats['lr'], stats['time']))
for k, v in self.metrics.items():
if k == 'total_loss':
continue
logging.info(' ' * 4 + 'Train net output({}): {:.4f} ({:.4f})'
.format(k, stats['metrics'][k], v.average()))
if self.board is not None:
self.board.scalar_summary('lr', stats['lr'], stats['iter'])
self.board.scalar_summary('time', stats['time'], stats['iter'])
for k, v in self.metrics.items():
self.board.scalar_summary(k, v.average(), stats['iter'])
def step(self):
stats = {'iter': self.iter}
metrics = collections.defaultdict(float)
# Run forward.
timer = profiler.Timer().tic()
inputs = self.loader()
outputs, losses = self.model(inputs), []
for k, v in outputs.items():
if 'loss' in k:
losses.append(v)
loss_val = float(v)
metrics[k] += loss_val
metrics['total_loss'] += loss_val
# Run backward.
losses = sum(losses[1:], losses[0])
if self.loss_scale != 1.0:
losses *= self.loss_scale
losses.backward()
# Apply update.
stats['lr'] = self.scheduler.get_lr()
for group in self.optimizer.param_groups:
group['lr'] = stats['lr'] * group.get('lr_scale', 1.0)
self.optimizer.step()
self.scheduler.step()
stats['time'] = timer.toc()
stats['metrics'] = metrics
return stats
def train_model(self, start_iter=0):
"""Network training loop."""
timer = profiler.Timer()
max_steps = cfg.SOLVER.MAX_STEPS
display_every = cfg.SOLVER.DISPLAY
progress_every = 10 * display_every
snapshot_every = cfg.SOLVER.SNAPSHOT_EVERY
self.scheduler._step_count = start_iter
while self.iter < max_steps:
with timer.tic_and_toc():
stats = self.step()
self.add_metrics(stats)
if stats['iter'] % display_every == 0:
self.display_metrics(stats)
if self.iter % progress_every == 0:
logging.info(profiler.get_progress(timer, self.iter, max_steps))
if self.iter % snapshot_every == 0:
self.snapshot()
self.metrics.clear()
def run_train(coordinator, start_iter=0):
"""Start a network training task."""
trainer = Trainer(coordinator)
logging.info('Start training...')
trainer.train_model(start_iter)
trainer.snapshot()
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Training utilities."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
def count_params(module):
"""Return the number of parameters in MB."""
return sum([v.size().numel() for v in module.parameters()]) / 1e6
def freeze_module(module):
"""Freeze parameters of given module."""
module.eval()
for param in module.parameters():
param.requires_grad = False
def get_param_groups(module, lr_scale_getter=None):
"""Separate parameters into groups."""
groups = collections.OrderedDict()
for name, param in module.named_parameters():
if not param.requires_grad:
continue
attrs = collections.OrderedDict()
if lr_scale_getter:
attrs['lr_scale'] = lr_scale_getter(name)
no_weight_decay = not (name.endswith('weight') and param.dim() > 1)
no_weight_decay = getattr(param, 'no_weight_decay', no_weight_decay)
if no_weight_decay:
attrs['weight_decay'] = 0
group_name = '/'.join(['%s:%s' % (v[0], v[1]) for v in list(attrs.items())])
if group_name not in groups:
groups[group_name] = {'params': []}
groups[group_name].update(attrs)
groups[group_name]['params'].append(param)
return list(groups.values())
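# A minimal sketch (assuming a torch-style module; the Conv2d below
# stands in for any detector):
#
#   from dragon.vm import torch
#   conv = torch.nn.Conv2d(3, 64, kernel_size=3)
#   groups = get_param_groups(conv)
#   # The bias lands in a "weight_decay:0" group; the 2D+ weight keeps
#   # the decay configured on the optimizer.
#   optimizer = torch.optim.SGD(groups, lr=0.02, momentum=0.9)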
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

# Modules.
from seetadet.data import datasets
from seetadet.data import evaluators
from seetadet.data import pipelines
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Anchor generator for RPN head."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
class AnchorGenerator(object):
"""Generate anchors for bbox regression."""
def __init__(self, strides, sizes, aspect_ratios,
scales_per_octave=1):
self.strides = strides
self.sizes = _align_args(strides, sizes)
self.aspect_ratios = _align_args(strides, aspect_ratios)
for i in range(len(self.sizes)):
octave_sizes = []
for j in range(1, scales_per_octave):
scale = 2 ** (float(j) / scales_per_octave)
octave_sizes += [x * scale for x in self.sizes[i]]
self.sizes[i] += octave_sizes
self.scales = [[x / y for x in z] for y, z in zip(strides, self.sizes)]
self.cell_anchors = []
for i in range(len(strides)):
self.cell_anchors.append(generate_anchors_v2(
strides[i], self.aspect_ratios[i], self.sizes[i]))
self.grid_shapes = None
self.grid_anchors = None
self.grid_coords = None
def reset_grid(self, max_size):
"""Reset the grid."""
self.grid_shapes = [(int(np.ceil(max_size / x)),) * 2 for x in self.strides]
self.grid_coords = self.get_coords(self.grid_shapes)
self.grid_anchors = self.get_anchors(self.grid_shapes)
def num_cell_anchors(self, index=0):
"""Return number of cell anchors."""
return self.cell_anchors[index].shape[0]
def num_anchors(self, shapes):
"""Return the number of grid anchors."""
return sum(self.cell_anchors[i].shape[0] * np.prod(shapes[i])
for i in range(len(shapes)))
def get_coords(self, shapes):
"""Return the x-y coordinates of grid anchors."""
xs, ys = [], []
for i in range(len(shapes)):
height, width = shapes[i]
x, y = np.arange(0, width), np.arange(0, height)
x, y = np.meshgrid(x, y)
            # Tile the K cell coordinates once per anchor (A anchors)
            # to get flattened shift coords of length A * K.
xs.append(np.tile(x.flatten(), self.cell_anchors[i].shape[0]))
ys.append(np.tile(y.flatten(), self.cell_anchors[i].shape[0]))
return np.concatenate(xs), np.concatenate(ys)
def get_anchors(self, shapes):
"""Return the grid anchors."""
grid_anchors = []
for i in range(len(shapes)):
h, w = shapes[i]
shift_x = np.arange(0, w) * self.strides[i]
shift_y = np.arange(0, h) * self.strides[i]
shift_x, shift_y = np.meshgrid(shift_x, shift_y)
shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
shift_x.ravel(), shift_y.ravel())).transpose()
shifts = shifts.astype(self.cell_anchors[i].dtype)
# Add A anchors (A, 1, 4) to cell K shifts (1, K, 4)
# to get shift anchors (A, K, 4)
a, k = self.num_cell_anchors(i), shifts.shape[0]
anchors = (self.cell_anchors[i].reshape((a, 1, 4)) +
shifts.reshape((1, k, 4)))
grid_anchors.append(anchors.reshape((a * k, 4)))
return np.vstack(grid_anchors)
def narrow_anchors(self, shapes, inds, return_anchors=False):
"""Return the valid anchors on given shapes."""
max_shapes = self.grid_shapes
anchors = self.grid_anchors
x_coords, y_coords = self.grid_coords
offset1 = offset2 = num1 = num2 = 0
out_inds, out_anchors = [], []
for i in range(len(max_shapes)):
num1 += self.num_cell_anchors(i) * np.prod(max_shapes[i])
num2 += self.num_cell_anchors(i) * np.prod(shapes[i])
inds_keep = inds[np.where((inds >= offset1) & (inds < num1))[0]]
anchors_keep = anchors[inds_keep] if return_anchors else None
x, y = x_coords[inds_keep], y_coords[inds_keep]
z = ((inds_keep - offset1) // max_shapes[i][1]) // max_shapes[i][0]
keep = np.where((x < shapes[i][1]) & (y < shapes[i][0]))[0]
inds_keep = (z * shapes[i][0] + y) * shapes[i][1] + x + offset2
out_inds.append(inds_keep[keep])
out_anchors.append(anchors_keep[keep] if return_anchors else None)
offset1, offset2 = num1, num2
outputs = [np.concatenate(out_inds)]
if return_anchors:
outputs += [np.concatenate(out_anchors)]
return outputs[0] if len(outputs) == 1 else outputs
def generate_anchors(stride=16, ratios=(0.5, 1, 2), scales=2 ** np.arange(3, 6)):
"""Generate anchors by enumerating aspect ratios and scales."""
base_anchor = np.array([1, 1, stride, stride]) - 1
ratio_anchors = _ratio_enum(base_anchor, ratios)
anchors = np.vstack([_scale_enum(ratio_anchors[i, :], scales)
for i in range(ratio_anchors.shape[0])])
return anchors.astype('float32')
def generate_anchors_v2(stride=16, ratios=(0.5, 1, 2), sizes=(32, 64, 128, 256, 512)):
"""Generate anchors by enumerating aspect ratios and sizes."""
scales = np.array(sizes) / stride
return generate_anchors(stride, ratios, scales)
def _whctrs(anchor):
"""Return the xywh of an anchor."""
w = anchor[2] - anchor[0] + 1
h = anchor[3] - anchor[1] + 1
x_ctr = anchor[0] + 0.5 * (w - 1)
y_ctr = anchor[1] + 0.5 * (h - 1)
return w, h, x_ctr, y_ctr
def _mkanchors(ws, hs, x_ctr, y_ctr):
"""Return a sef of anchors by widths, heights and center."""
ws, hs = ws[:, np.newaxis], hs[:, np.newaxis]
return np.hstack((x_ctr - 0.5 * (ws - 1), y_ctr - 0.5 * (hs - 1),
x_ctr + 0.5 * (ws - 1), y_ctr + 0.5 * (hs - 1)))
def _ratio_enum(anchor, ratios):
"""Enumerate a set of anchors by aspect ratios."""
w, h, x_ctr, y_ctr = _whctrs(anchor)
ws = np.round(np.sqrt(w * h / ratios))
hs = np.round(ws * ratios)
return _mkanchors(ws, hs, x_ctr, y_ctr)
def _scale_enum(anchor, scales):
"""Enumerate a set of anchors by scales."""
w, h, x_ctr, y_ctr = _whctrs(anchor)
ws, hs = w * scales, h * scales
return _mkanchors(ws, hs, x_ctr, y_ctr)
def _align_args(strides, args):
"""Align the args to the strides."""
args = (args * len(strides)) if len(args) == 1 else args
assert len(args) == len(strides)
return [[x] if not isinstance(x, (tuple, list)) else x[:] for x in args]
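# A smoke test mirroring common FPN settings (illustrative values):
if __name__ == '__main__':
    anchor_generator = AnchorGenerator(
        strides=(4, 8, 16, 32, 64),
        sizes=((32,), (64,), (128,), (256,), (512,)),
        aspect_ratios=((0.5, 1, 2),))
    anchor_generator.reset_grid(max_size=512)
    num = anchor_generator.num_anchors(anchor_generator.grid_shapes)
    assert anchor_generator.grid_anchors.shape == (num, 4)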
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Anchor generator for SSD head."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
class AnchorGenerator(object):
"""Generate anchors for bbox regression."""
def __init__(self, strides, sizes, aspect_ratios):
self.strides = strides
self.sizes = _align_args(strides, sizes)
self.aspect_ratios = _align_args(strides, aspect_ratios)
self.scales = [[x / y for x in z] for y, z in zip(strides, self.sizes)]
self.cell_anchors = []
for i in range(len(strides)):
self.cell_anchors.append(generate_anchors(
self.aspect_ratios[i], self.sizes[i]))
self.grid_shapes = None
self.grid_anchors = None
def reset_grid(self, max_size):
"""Reset the grid."""
self.grid_shapes = [(int(np.ceil(max_size / x)),) * 2 for x in self.strides]
self.grid_anchors = self.get_anchors(self.grid_shapes)
def num_cell_anchors(self, index=0):
"""Return number of cell anchors."""
return self.cell_anchors[index].shape[0]
def get_anchors(self, shapes):
"""Return the grid anchors."""
grid_anchors = []
for i in range(len(shapes)):
h, w = shapes[i]
shift_x = (np.arange(0, w) + 0.5) * self.strides[i]
shift_y = (np.arange(0, h) + 0.5) * self.strides[i]
shift_x, shift_y = np.meshgrid(shift_x, shift_y)
shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
shift_x.ravel(), shift_y.ravel())).transpose()
shifts = shifts.astype(self.cell_anchors[i].dtype)
            # Add A anchors (1, A, 4) to cell K shifts (K, 1, 4)
# to get shift anchors (K, A, 4) and reshape to (K * A, 4)
a = self.cell_anchors[i].shape[0]
k = shifts.shape[0]
anchors = (self.cell_anchors[i].reshape((1, a, 4)) +
shifts.reshape((1, k, 4)).transpose((1, 0, 2)))
grid_anchors.append(anchors.reshape((k * a, 4)))
return np.vstack(grid_anchors)
def generate_anchors(ratios, sizes):
"""Generate anchors by enumerating aspect ratios and sizes."""
min_size, max_size = sizes
base_anchor = np.array([0, 0, min_size, min_size])
ratio_anchors = _ratio_enum(base_anchor, ratios)
size_anchors = _size_enum(base_anchor, min_size, max_size)
anchors = np.vstack([ratio_anchors[:1], size_anchors, ratio_anchors[1:]])
return anchors.astype('float32')
def _whctrs(anchor):
"""Return the xywh of an anchor."""
w, h = anchor[2], anchor[3]
x_ctr, y_ctr = anchor[0], anchor[1]
return w, h, x_ctr, y_ctr
def _mkanchors(ws, hs, x_ctr, y_ctr):
"""Return a sef of anchors by widths, heights and center."""
ws, hs = ws[:, np.newaxis], hs[:, np.newaxis]
return np.hstack((x_ctr - 0.5 * ws, y_ctr - 0.5 * hs,
x_ctr + 0.5 * ws, y_ctr + 0.5 * hs))
def _ratio_enum(anchor, ratios):
"""Enumerate a set of anchors by aspect ratios."""
w, h, x_ctr, y_ctr = _whctrs(anchor)
hs = np.round(np.sqrt(w * h / ratios))
ws = np.round(hs * ratios)
return _mkanchors(ws, hs, x_ctr, y_ctr)
def _size_enum(anchor, min_size, max_size):
"""Enumerate a anchor for size wrt base_anchor."""
_, _, x_ctr, y_ctr = _whctrs(anchor)
ws = hs = np.sqrt([min_size * max_size])
return _mkanchors(ws, hs, x_ctr, y_ctr)
def _align_args(strides, args):
"""Align the args to the strides."""
args = (args * len(strides)) if len(args) == 1 else args
assert len(args) == len(strides)
return [[x] if not isinstance(x, (tuple, list)) else x[:] for x in args]
if __name__ == '__main__':
anchor_generator = AnchorGenerator(
strides=(8, 16, 32, 64, 100, 300),
sizes=((30, 60), (60, 110), (110, 162),
(162, 213), (213, 264), (264, 315)),
aspect_ratios=((1, 2, 0.5),
(1, 2, 0.5, 3, 0.33),
(1, 2, 0.5, 3, 0.33),
(1, 2, 0.5, 3, 0.33),
(1, 2, 0.5),
(1, 2, 0.5)))
anchor_generator.reset_grid(max_size=300)
assert anchor_generator.grid_anchors.shape == (8732, 4)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Ground-truth assigners."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
from seetadet.utils.bbox import bbox_overlaps
class MaxIoUAssigner(object):
"""Assign ground-truth to boxes according to the IoU."""
def __init__(
self,
pos_iou_thr=0.5,
neg_iou_thr=0.5,
match_low_quality=True,
gt_max_assign_all=True,
):
"""Create a ``MaxIoUAssigner``.
Parameters
----------
pos_iou_thr : float, optional, default=0.5
The minimum IoU overlap to label positives.
neg_iou_thr : float, optional, default=0.5
The maximum IoU overlap to label negatives.
        match_low_quality : bool, optional, default=True
            Also match the best-overlapping box for each gt box or not.
        gt_max_assign_all : bool, optional, default=True
            Assign every box sharing a gt box's max overlap, not just one, or not.
"""
self.pos_iou_thr = pos_iou_thr
self.neg_iou_thr = neg_iou_thr
self.match_low_quality = match_low_quality
self.gt_max_assign_all = gt_max_assign_all
def assign(self, boxes, gt_boxes):
        # Initialize assignments with the ignore label "-1".
num_boxes = len(boxes)
labels = np.empty((num_boxes,), 'int8')
labels.fill(-1)
# Overlaps between the anchors and the gt boxes.
overlaps = bbox_overlaps(boxes, gt_boxes)
max_overlaps = overlaps.max(axis=1)
# Background: below threshold IoU.
labels[max_overlaps < self.neg_iou_thr] = 0
# Foreground: above threshold IoU.
labels[max_overlaps >= self.pos_iou_thr] = 1
# Foreground: for each gt, assign anchor(s) with highest overlap.
if self.match_low_quality:
if self.gt_max_assign_all:
gt_max_overlaps = overlaps.max(axis=0)
gt_argmax_overlaps = np.where(overlaps == gt_max_overlaps)[0]
else:
gt_argmax_overlaps = overlaps.argmax(axis=0)
labels[gt_argmax_overlaps] = 1
        # Return the assigned labels: "1" foreground, "0" background, "-1" ignored.
return labels
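if __name__ == '__main__':
    # A quick sanity check of the labeling rules above on toy boxes; this
    # assumes bbox_overlaps (imported above) returns pairwise IoU.
    boxes = np.array([[0, 0, 10, 10], [0, 0, 4, 4], [20, 20, 30, 30]], 'float32')
    gt_boxes = np.array([[0, 0, 10, 10]], 'float32')
    assigner = MaxIoUAssigner(pos_iou_thr=0.5, neg_iou_thr=0.5)
    # Box 0 matches its gt perfectly, box 1 is well below 0.5, box 2 is
    # disjoint, so the labels come out as foreground, background, background.
    print(assigner.assign(boxes, gt_boxes))  # [1 0 0]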
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Build for data."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.core.config import cfg
from seetadet.core.registry import Registry
LOADERS = Registry('loaders')
DATASETS = Registry('datasets')
EVALUATORS = Registry('evaluators')
ANCHOR_SAMPLERS = Registry('anchor_samplers')
def build_anchor_sampler():
return ANCHOR_SAMPLERS.try_get(cfg.MODEL.TYPE)()
def build_dataset(path):
"""Build the dataset."""
keys = path.split('://')
if len(keys) >= 2:
return DATASETS.get(keys[0])(keys[1])
return DATASETS.get('kpl')(path)
def build_loader_train(**kwargs):
"""Build the train loader."""
args = {'dataset': cfg.TRAIN.DATASET,
'batch_size': cfg.TRAIN.IMS_PER_BATCH,
'num_workers': cfg.TRAIN.NUM_WORKERS,
'shuffle': True, 'contiguous': True}
args.update(kwargs)
return LOADERS.get(cfg.TRAIN.LOADER)(**args)
def build_loader_test(**kwargs):
"""Build the test loader."""
args = {'dataset': cfg.TEST.DATASET,
'batch_size': cfg.TEST.IMS_PER_BATCH,
'shuffle': False, 'contiguous': False}
args.update(kwargs)
return LOADERS.get(cfg.TEST.LOADER)(**args)
def build_evaluator(**kwargs):
evaluator_type = cfg.TEST.EVALUATOR
if not evaluator_type:
return None
args = {'classes': cfg.MODEL.CLASSES}
if evaluator_type == 'voc2007':
args['use_07_metric'] = True
args.update(kwargs)
evaluator = EVALUATORS.get(evaluator_type)(**args)
ann_file = cfg.TEST.JSON_DATASET
if ann_file:
evaluator.load_annotations(ann_file)
return evaluator
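if __name__ == '__main__':
    # Sketch of the '<scheme>://' routing in build_dataset(); 'my_format'
    # is a hypothetical key, registered with the decorator form used by
    # the bundled datasets.
    @DATASETS.register('my_format')
    class MyDataset(object):
        def __init__(self, source):
            self.source = source
    dataset = build_dataset('my_format:///data/train')
    print(dataset.source)  # '/data/train'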
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Datasets."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
# Classes.
from seetadet.data.datasets.datum import AnnotatedDatum
# Modules.
from seetadet.data.datasets import kpl_dataset
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Base dataset."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.core.config import cfg
class Dataset(object):
"""Base dataset class."""
def __init__(self, source):
self.source = source
self.num_images = 0
self.classes = cfg.MODEL.CLASSES
self.num_classes = len(self.classes)
self.class_to_ind = dict(zip(self.classes, range(self.num_classes)))
@property
def type(self):
return type(self)
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Annotated datum."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import cv2
import numpy as np
class AnnotatedDatum(object):
    """Wrapper for annotated datum."""
    def __init__(self, example):
        self._example = example
        self._img = None
    @property
    def id(self):
        """Return the example id."""
        return self._example['id']
    @property
    def height(self):
        """Return the image height."""
        return self._example['height']
    @property
    def width(self):
        """Return the image width."""
        return self._example['width']
    @property
    def img(self):
        """Return the image array."""
        if self._img is None:
            img_bytes = np.frombuffer(self._example['content'], 'uint8')
            self._img = cv2.imdecode(img_bytes, cv2.IMREAD_COLOR)
        return self._img
    @property
    def objects(self):
        """Return the annotated objects."""
        objects = []
        for obj in self._example['object']:
            mask = obj.get('mask', None)
            polygons = obj.get('polygons', None)
            if 'x3' in obj:
                pass  # rotated-box handling elided between the diff hunks
            elif 'xmin' in obj:
                bbox = [obj['xmin'], obj['ymin'], obj['xmax'], obj['ymax']]
            else:
                bbox = obj['bbox']
            objects.append({'name': obj['name'],
                            'bbox': bbox,
                            'difficult': obj.get('difficult', 0)})
            if mask is not None and len(mask) > 0:
                objects[-1]['mask'] = mask
            elif polygons is not None and len(polygons) > 0:
                objects[-1]['polygons'] = [np.array(p) for p in polygons]
        return objects
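if __name__ == '__main__':
    # A hypothetical in-memory example in the dict layout expected above.
    toy_img = np.zeros((8, 8, 3), 'uint8')
    datum = AnnotatedDatum({
        'id': '000001', 'height': 8, 'width': 8,
        'content': cv2.imencode('.png', toy_img)[1].tobytes(),
        'object': [{'name': 'person', 'bbox': [1, 1, 6, 6]}]})
    assert datum.img.shape == (8, 8, 3)
    assert datum.objects[0]['difficult'] == 0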
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""KPLRecord dataset."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import dragon
from seetadet.data.build import DATASETS
from seetadet.data.datasets.dataset import Dataset
@DATASETS.register('kpl')
class KPLRecordDataset(Dataset):
def __init__(self, source):
super(KPLRecordDataset, self).__init__(source)
self.num_images = self.type(self.source).size
@property
def type(self):
return dragon.io.KPLRecordDataset
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Evaluators."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
# Modules.
from seetadet.data.evaluators import coco_evaluator
from seetadet.data.evaluators import voc_evaluator
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""COCO dataset evaluator."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import numpy as np
import prettytable
from pycocotools.cocoeval import COCOeval
from seetadet.data.build import EVALUATORS
from seetadet.data.evaluators.evaluator import Evaluator
@EVALUATORS.register('coco')
class COCOEvaluator(Evaluator):
"""Evaluator for MS COCO dataset."""
def __init__(self, classes):
super(COCOEvaluator, self).__init__(classes, COCOeval)
def print_eval_results(self, coco_eval):
def get_thr_ind(coco_eval, thr):
ind = np.where((coco_eval.params.iouThrs > thr - 1e-5) &
(coco_eval.params.iouThrs < thr + 1e-5))[0][0]
iou_thr = coco_eval.params.iouThrs[ind]
assert np.isclose(iou_thr, thr)
return ind
ind_lo = get_thr_ind(coco_eval, 0.5)
ind_hi = get_thr_ind(coco_eval, 0.95)
# Precision: (iou, recall, cls, area range, max dets)
# Recall: (iou, cls, area range, max dets)
# Area range index 0: all area ranges
# Max dets index 2: 100 per image
all_prec = coco_eval.eval['precision'][ind_lo:(ind_hi + 1), :, :, 0, 2]
all_recall = coco_eval.eval['recall'][ind_lo:(ind_hi + 1), :, 0, 2]
metrics = collections.OrderedDict([
('AP@[IoU=0.5:0.95]', []), ('AR@[IoU=0.5:0.95]', [])])
class_table = prettytable.PrettyTable()
for cls_ind, cls in enumerate(self.classes):
if cls == '__background__':
continue
ap = np.mean(all_prec[:, :, cls_ind - 1]) # (iou, recall, cls)
recall = np.mean(all_recall[:, cls_ind - 1]) # (iou, cls)
metrics['AP@[IoU=0.5:0.95]'].append(ap)
metrics['AR@[IoU=0.5:0.95]'].append(recall)
for k, v in metrics.items():
v = np.nan_to_num(v, nan=0)
class_table.add_column(k, np.round(v * 100, 2))
class_table.add_column('Class', self.classes[1:])
print('Per class results:\n' + class_table.get_string(), '\n')
print('Summary:')
coco_eval.summarize()
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Base evaluator."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import json
import os
import numpy as np
from pycocotools import mask as maskUtils
from pycocotools.coco import COCO
from seetadet.core.config import cfg
from seetadet.utils import logging
from seetadet.utils.mask import paste_masks
class Evaluator(object):
"""Evaluator using COCO json dataset format."""
def __init__(self, classes, eval_type=None):
self.classes = classes
self.num_classes = len(self.classes)
self.class_to_cat_id = dict(zip(self.classes, range(self.num_classes)))
self.eval_type = eval_type
self.cocoGt = None
self.binary_thresh = cfg.TEST.BINARY_THRESH
def load_annotations(self, ann_file=None):
"""Load annotations."""
self.cocoGt = COCO(ann_file)
if len(self.cocoGt.dataset) > 0:
self.class_to_cat_id = dict((v['name'], v['id'])
for v in self.cocoGt.cats.values())
def eval_bbox(self, res_file):
"""Evaluate bbox results."""
if len(self.cocoGt.dataset['annotations']) == 0:
logging.info('No annotations. Skip evaluation.')
return
cocoDt = self.cocoGt.loadRes(res_file)
coco_eval = self.eval_type(self.cocoGt, cocoDt, 'bbox')
coco_eval.evaluate()
coco_eval.accumulate()
self.print_eval_results(coco_eval)
def eval_segm(self, res_file):
"""Evaluate segmentation results."""
if len(self.cocoGt.dataset['annotations']) == 0:
logging.info('No annotations. Skip evaluation.')
return
cocoDt = self.cocoGt.loadRes(res_file)
coco_eval = self.eval_type(self.cocoGt, cocoDt, 'segm')
coco_eval.evaluate()
coco_eval.accumulate()
self.print_eval_results(coco_eval)
def print_eval_results(self, coco_eval):
"""Print the evaluation results."""
def bbox_results_one_category(self, boxes, metas, cat_id):
results = []
for i, img_id in enumerate(metas.keys()):
dets = boxes[i].astype('float64')
if len(dets) == 0:
continue
xs, ys = dets[:, 0], dets[:, 1]
ws, hs = dets[:, 2] - xs + 1, dets[:, 3] - ys + 1
scores = dets[:, -1]
results.extend([{
'image_id': self.get_image_id(img_id),
'category_id': cat_id,
'bbox': [xs[j], ys[j], ws[j], hs[j]],
'score': scores[j],
} for j in range(dets.shape[0])])
return results
def segm_results_one_category(self, boxes, masks, metas, cat_id):
results = []
for i, (img_id, meta) in enumerate(metas.items()):
dets = boxes[i]
if len(dets) == 0:
continue
img_size = (meta['height'], meta['width'])
segms = self.encode_masks(masks[i], dets[:, :4], img_size)
scores = dets[:, -1]
results.extend([{
'image_id': self.get_image_id(img_id),
'category_id': cat_id,
'segmentation': segms[j],
'score': float(scores[j]),
} for j in range(dets.shape[0])])
return results
def write_bbox_results(self, boxes, metas, output_dir):
results = []
for cls_ind, cls in enumerate(self.classes):
if cls == '__background__':
continue
print('Collecting {} results ({:d}/{:d})'
.format(cls, cls_ind, self.num_classes - 1))
results.extend(self.bbox_results_one_category(
boxes[cls_ind], metas, self.class_to_cat_id[cls]))
res_file = self.get_res_file(output_dir)
print('Writing results json to {}'.format(res_file))
with open(res_file, 'w') as f:
json.dump(results, f)
return res_file
def write_segm_results(self, boxes, masks, metas, output_dir):
results = []
for cls_ind, cls in enumerate(self.classes):
if cls == '__background__':
continue
print('Collecting {} results ({:d}/{:d})'
.format(cls, cls_ind, self.num_classes - 1))
results.extend(self.segm_results_one_category(
boxes[cls_ind], masks[cls_ind], metas,
self.class_to_cat_id[cls]))
res_file = self.get_res_file(output_dir, 'segm')
print('Writing results json to {}'.format(res_file))
with open(res_file, 'w') as fid:
json.dump(results, fid)
return res_file
def write_annotations(self, metas, output_dir):
dataset = {'images': [], 'categories': [], 'annotations': []}
for img_id, meta in metas.items():
dataset['images'].append({
'id': self.get_image_id(img_id),
'height': meta['height'], 'width': meta['width']})
for cls in self.classes:
if cls == '__background__':
continue
dataset['categories'].append({
'name': cls, 'id': self.class_to_cat_id[cls]})
for img_id, meta in metas.items():
img_size = (meta['height'], meta['width'])
for obj in meta['objects']:
x, y = obj['bbox'][0], obj['bbox'][1]
w, h = obj['bbox'][2] - x + 1, obj['bbox'][3] - y + 1
dataset['annotations'].append({
'id': str(len(dataset['annotations'])),
'bbox': [x, y, w, h],
'area': w * h,
'iscrowd': obj['difficult'],
'image_id': self.get_image_id(img_id),
'category_id': self.class_to_cat_id[obj['name']]})
if 'mask' in obj:
segm = {'size': img_size, 'counts': obj['mask']}
dataset['annotations'][-1]['segmentation'] = segm
elif 'polygons' in obj:
segm = []
for poly in obj['polygons']:
if isinstance(poly, np.ndarray):
poly = poly.tolist()
segm.append(poly)
dataset['annotations'][-1]['segmentation'] = segm
ann_file = self.get_ann_file(output_dir)
print('Writing annotations json to {}'.format(ann_file))
with open(ann_file, 'w') as f:
json.dump(dataset, f)
return ann_file
def encode_masks(self, masks, boxes, size):
segms = maskUtils.encode(paste_masks(
masks, boxes, size, thresh=self.binary_thresh))
for segm in segms:
segm['counts'] = segm['counts'].decode()
return segms
@staticmethod
def get_prefix(type='bbox'):
if type == 'bbox':
return 'detections'
elif type == 'segm':
return 'segmentations'
elif type == 'kpt':
return 'keypoints'
return ''
@staticmethod
def get_ann_file(output_dir):
filename = 'annotations.json'
if not os.path.exists(output_dir):
os.makedirs(output_dir)
return os.path.join(output_dir, filename)
@staticmethod
def get_res_file(output_dir, type='bbox'):
filename = Evaluator.get_prefix(type) + '.json'
if not os.path.exists(output_dir):
os.makedirs(output_dir)
return os.path.join(output_dir, filename)
@staticmethod
def get_image_id(image_name):
image_id = image_name.split('_')[-1].split('.')[0]
try:
return int(image_id)
except ValueError:
return image_name
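if __name__ == '__main__':
    # Sketch of the offline evaluation flow with one fake image and no
    # annotations (paths and names illustrative); eval_bbox() would skip
    # gracefully here since the annotation set is empty.
    metas = {'img_000001.jpg': {'height': 8, 'width': 8, 'objects': []}}
    evaluator = Evaluator(classes=('__background__', 'person'))
    ann_file = evaluator.write_annotations(metas, '/tmp/seetadet_eval')
    evaluator.load_annotations(ann_file)
    # A test loop would now call write_bbox_results() and eval_bbox().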
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Evaluation API on the Pascal VOC dataset."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import datetime
import itertools
import time
import numpy as np
from pycocotools import mask as maskUtils
def voc_ap(rec, prec, use_07_metric=False):
"""Compute VOC AP given precision and recall."""
if use_07_metric:
# 11 point metric.
ap = 0.
for t in np.arange(0., 1.1, 0.1):
if np.sum(rec >= t) == 0:
p = 0
else:
p = np.max(prec[rec >= t])
ap = ap + p / 11.
else:
# Correct AP calculation.
# First append sentinel values at the end.
mrec = np.concatenate(([0.], rec, [1.]))
mpre = np.concatenate(([0.], prec, [0.]))
# Compute the precision envelope.
for i in range(mpre.size - 1, 0, -1):
mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])
        # To calculate area under PR curve, look for points
        # where X axis (recall) changes value.
i = np.where(mrec[1:] != mrec[:-1])[0]
# And sum (\Delta recall) * prec
ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])
return ap
class VOCeval(object):
"""Interface for evaluating detection via COCO object."""
def __init__(self, cocoGt=None, cocoDt=None, iouType='bbox',
iouThrs=[0.5, 0.7], use_07_metric=False):
self.cocoGt = cocoGt
self.cocoDt = cocoDt
self.params = Params(iouType)
self.params.iouThrs = iouThrs
self.params.use_07_metric = use_07_metric
if cocoGt is not None:
self.params.imgIds = sorted(cocoGt.getImgIds())
self.params.catIds = sorted(cocoGt.getCatIds())
self.ious = {}
def _prepare(self):
p = self.params
gts = self.cocoGt.loadAnns(
self.cocoGt.getAnnIds(imgIds=p.imgIds, catIds=p.catIds))
dts = self.cocoDt.loadAnns(
self.cocoDt.getAnnIds(imgIds=p.imgIds, catIds=p.catIds))
for gt in gts:
            ignore = gt['ignore'] if 'ignore' in gt else 0
            gt['ignore'] = ignore or ('iscrowd' in gt and gt['iscrowd'])
self._gts = collections.defaultdict(list)
self._dts = collections.defaultdict(list)
for gt in gts:
self._gts[gt['image_id'], gt['category_id']].append(gt)
for dt in dts:
self._dts[dt['image_id'], dt['category_id']].append(dt)
self.eval = {}
def evaluate(self):
tic = time.time()
print('Running per image evaluation...')
p = self.params
print('Evaluate annotation type *{}*'.format(p.iouType))
p.imgIds = list(np.unique(p.imgIds))
p.catIds = list(np.unique(p.catIds))
self._prepare()
self.ious = {(imgId, catId): self.computeIoU(imgId, catId)
for imgId in p.imgIds for catId in p.catIds}
self.evalImgs = [self.evaluateImg(imgId, catId)
for catId in p.catIds for imgId in p.imgIds]
toc = time.time()
print('DONE (t={:0.2f}s).'.format(toc - tic))
def accumulate(self, p=None):
print('Accumulating evaluation results...')
tic = time.time()
if not self.evalImgs:
print('Please run evaluate() first')
if p is None:
p = self.params
print('VOC07 metric? ' + ('Yes' if p.use_07_metric else 'No'))
T, K, I = len(p.iouThrs), len(p.catIds), len(p.imgIds)
recall, ap = np.zeros((T, K)), np.zeros((T, K))
for k in range(K):
E = [self.evalImgs[k * I + i] for i in range(I)]
E = [e for e in E if e is not None]
if len(E) == 0:
continue
dtScores = np.concatenate([e['dtScores'] for e in E])
inds = np.argsort(-dtScores)
dtm = np.concatenate([e['dtMatches'] for e in E], axis=1)[:, inds]
dtIg = np.concatenate([e['dtIgnore'] for e in E], axis=1)[:, inds]
gtIg = np.concatenate([e['gtIgnore'] for e in E])
npig = np.count_nonzero(gtIg == 0)
if npig == 0:
continue
tps = np.logical_and(dtm, np.logical_not(dtIg))
fps = np.logical_and(np.logical_not(dtm), np.logical_not(dtIg))
            tp_sum = np.cumsum(tps, axis=1).astype('float64')
            fp_sum = np.cumsum(fps, axis=1).astype('float64')
for t, (tp, fp) in enumerate(zip(tp_sum, fp_sum)):
nd = len(tp)
rc = tp / npig
pr = tp / np.maximum(tp + fp, np.spacing(1))
recall[t, k] = rc[-1] if nd else 0
ap[t, k] = voc_ap(rc, pr, use_07_metric=p.use_07_metric)
self.eval = {'counts': [T, K],
'date': datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
'ap': ap, 'recall': recall}
toc = time.time()
print('DONE (t={:0.2f}s).'.format(toc - tic))
def computeIoU(self, imgId, catId):
p = self.params
gt = self._gts[imgId, catId]
dt = self._dts[imgId, catId]
if len(gt) == 0 and len(dt) == 0:
return []
inds = np.argsort([-d['score'] for d in dt], kind='mergesort')
dt = [dt[i] for i in inds]
if p.iouType == 'segm':
g = [g['segmentation'] for g in gt]
d = [d['segmentation'] for d in dt]
elif p.iouType == 'bbox':
g = [g['bbox'] for g in gt]
d = [d['bbox'] for d in dt]
else:
raise Exception('unknown iouType for iou computation')
iscrowd = [int(o['iscrowd']) for o in gt]
return maskUtils.iou(d, g, iscrowd)
def evaluateImg(self, imgId, catId):
p = self.params
gt = self._gts[imgId, catId]
dt = self._dts[imgId, catId]
if len(gt) == 0 and len(dt) == 0:
return None
for g in gt:
g['_ignore'] = g['ignore']
gtind = np.argsort([g['_ignore'] for g in gt], kind='mergesort')
gt = [gt[i] for i in gtind]
dtind = np.argsort([-d['score'] for d in dt], kind='mergesort')
dt = [dt[i] for i in dtind]
iscrowd = [int(o['iscrowd']) for o in gt]
ious = (self.ious[imgId, catId][:, gtind]
if len(self.ious[imgId, catId]) > 0 else self.ious[imgId, catId])
T, G, D = len(p.iouThrs), len(gt), len(dt)
gtm, dtm = np.zeros((T, G)), np.zeros((T, D))
gtIg, dtIg = np.array([g['_ignore'] for g in gt]), np.zeros((T, D))
for (tind, iou), (dind, d) in itertools.product(
enumerate(p.iouThrs), enumerate(dt)):
m = -1
for gind, g in enumerate(gt):
if gtm[tind, gind] > 0 and not iscrowd[gind]:
continue
if m > -1 and gtIg[m] == 0 and gtIg[gind] == 1:
break
if ious[dind, gind] <= iou:
continue
m = gind
if m == -1:
continue
dtIg[tind, dind] = gtIg[m]
dtm[tind, dind] = gt[m]['id']
gtm[tind, m] = d['id']
return {'image_id': imgId,
'category_id': catId,
'dtMatches': dtm,
'dtScores': [d['score'] for d in dt],
'gtIgnore': gtIg,
'dtIgnore': dtIg}
class Params(object):
"""Params for evaluation API."""
def setDetParams(self):
self.imgIds = []
self.catIds = []
self.iouThrs = [0.5]
self.use_07_metric = False
def __init__(self, iouType='segm'):
if iouType == 'segm' or iouType == 'bbox':
self.setDetParams()
else:
raise Exception('iouType not supported')
self.iouType = iouType
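if __name__ == '__main__':
    # A worked check of voc_ap() on a three-point PR curve.
    rec = np.array([0.5, 0.5, 1.0])
    prec = np.array([1.0, 0.5, 2.0 / 3.0])
    # Envelope mode: precision 1.0 up to recall 0.5, then 2/3 up to 1.0,
    # so AP = 0.5 * 1.0 + 0.5 * (2 / 3) ~= 0.8333.
    print(voc_ap(rec, prec))
    # 11-point mode samples recall at 0.0, 0.1, ..., 1.0:
    # (6 * 1.0 + 5 * 2 / 3) / 11 ~= 0.8485.
    print(voc_ap(rec, prec, use_07_metric=True))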
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""VOC dataset evaluator."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import functools
import numpy as np
import prettytable
from seetadet.data.build import EVALUATORS
from seetadet.data.evaluators.evaluator import Evaluator
from seetadet.data.evaluators.voc_eval import VOCeval
@EVALUATORS.register(['voc', 'voc2007', 'voc2010', 'voc2012'])
class VOCEvaluator(Evaluator):
"""Evaluator for Pascal VOC dataset."""
def __init__(self, classes, use_07_metric=False):
eval_type = functools.partial(
VOCeval, iouThrs=[0.5], use_07_metric=use_07_metric)
super(VOCEvaluator, self).__init__(classes, eval_type)
def print_eval_results(self, coco_eval):
metrics = collections.OrderedDict()
for cls_ind, cls in enumerate(self.classes):
if cls == '__background__':
continue
            for k, prefix in zip(('ap', 'recall'), ('AP', 'AR')):
                for i, iou in enumerate(coco_eval.params.iouThrs):
                    name = '%s@[IoU=%s]' % (prefix, str(iou))
v = coco_eval.eval[k][i, cls_ind]
if name not in metrics:
metrics[name] = []
metrics[name].append(v)
class_table = prettytable.PrettyTable()
summary_table = prettytable.PrettyTable()
for k, v in metrics.items():
v = np.nan_to_num(v, nan=0)
class_table.add_column(k, np.round(v * 100, 2))
summary_table.add_column(k, [np.round(np.mean(v) * 100, 2)])
class_table.add_column('Class', self.classes[1:])
print('Per class results:\n' + class_table.get_string(), '\n')
print('Summary:\n' + summary_table.get_string())
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import multiprocessing as mp
import time
import threading
import queue
import cv2
import dragon
import numpy as np
from seetadet.core.config import cfg
from seetadet.data.build import build_dataset
from seetadet.utils import logging
from seetadet.utils.blob import blob_vstack
class BalancedQueues(object):
"""Balanced queues."""
def __init__(self, base_queue, num=1):
self.queues = [base_queue]
self.queues += [mp.Queue(base_queue._maxsize) for _ in range(num - 1)]
self.index = 0
def put(self, obj, block=True, timeout=None):
q = self.queues[self.index]
q.put(obj, block=block, timeout=timeout)
self.index = (self.index + 1) % len(self.queues)
def get(self, block=True, timeout=None):
q = self.queues[self.index]
obj = q.get(block=block, timeout=timeout)
self.index = (self.index + 1) % len(self.queues)
return obj
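# A minimal round-robin check for BalancedQueues (illustrative):
#   q = BalancedQueues(mp.Queue(4), num=2)
#   for i in range(4):
#       q.put(i)                    # items alternate between the sub-queues
#   [q.get() for _ in range(4)]     # -> [0, 1, 2, 3], order preserved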
class DataWorkerBase(mp.Process):
"""Base class of data worker."""
def __init__(self):
super(DataWorkerBase, self).__init__(daemon=True)
self.seed = cfg.RNG_SEED
self.reader_queue = None
self.worker_queue = None
def get_outputs(self, inputs):
"""Return the transformed data."""
return inputs
def run(self):
# Disable the opencv threading.
cv2.setNumThreads(1)
# Fix the process-local random seed.
np.random.seed(self.seed)
        # Keep a rolling buffer of the next 4 examples;
        # mosaic augmentation consumes 4 examples at a time.
example_buffer = []
# Main prefetch loop.
while True:
while len(example_buffer) < 4:
example_buffer.append(self.reader_queue.get())
outputs = self.get_outputs(example_buffer)
if outputs is not None:
self.worker_queue.put(outputs)
class DataLoaderBase(threading.Thread):
"""Base class of data loader."""
def __init__(self, worker, **kwargs):
super(DataLoaderBase, self).__init__(daemon=True)
self.batch_size = kwargs.get('batch_size', 2)
self.num_readers = kwargs.get('num_readers', 1)
self.num_workers = kwargs.get('num_workers', 3)
self.queue_depth = kwargs.get('queue_depth', 2)
# Initialize distributed group.
rank, group_size = 0, 1
dist_group = dragon.distributed.get_group()
if dist_group is not None:
group_size = dist_group.size
rank = dragon.distributed.get_rank(dist_group)
# Build queues.
self.reader_queue = mp.Queue(self.queue_depth * self.batch_size)
self.worker_queue = mp.Queue(self.queue_depth * self.batch_size)
self.batch_queue = queue.Queue(self.queue_depth)
self.reader_queue = BalancedQueues(self.reader_queue, self.num_workers)
self.worker_queue = BalancedQueues(self.worker_queue, self.num_workers)
# Build readers.
self.readers = []
for i in range(self.num_readers):
part_idx, num_parts = i, self.num_readers
num_parts *= group_size
part_idx += rank * self.num_readers
self.readers.append(dragon.io.DataReader(**kwargs))
self.readers[i]._part_idx = part_idx
self.readers[i]._num_parts = num_parts
self.readers[i]._seed += part_idx
self.readers[i]._reader_queue = self.reader_queue
self.readers[i].start()
time.sleep(0.1)
# Build workers.
self.workers = []
for i in range(self.num_workers):
p = worker(**kwargs)
p.seed += (i + rank * self.num_workers)
p.reader_queue = self.reader_queue.queues[i]
p.worker_queue = self.worker_queue.queues[i]
p.start()
self.workers.append(p)
time.sleep(0.1)
# Register cleanup callbacks.
def cleanup():
def terminate(processes):
for p in processes:
p.terminate()
p.join()
terminate(self.workers)
terminate(self.readers)
import atexit
atexit.register(cleanup)
# Start batch prefetching.
self.start()
def next(self):
"""Return the next batch of data."""
return self.__next__()
def run(self):
"""Main loop."""
def __call__(self):
return self.next()
def __iter__(self):
"""Return the iterator self."""
return self
def __next__(self):
"""Return the next batch of data."""
return self.batch_queue.get()
class DataLoader(DataLoaderBase):
"""Loader to return the batch of data."""
def __init__(self, dataset, worker, **kwargs):
dataset = build_dataset(dataset)
self.contiguous = kwargs.get('contiguous', True)
self.prefetch_count = kwargs.get('prefetch_count', 50)
self.img_mean = cfg.MODEL.PIXEL_MEAN
self.img_align = (cfg.BACKBONE.COARSEST_STRIDE,) * 2
args = {'dataset': dataset.type,
'source': dataset.source,
'classes': dataset.classes,
'shuffle': kwargs.get('shuffle', True),
'batch_size': kwargs.get('batch_size', 1),
'num_workers': kwargs.get('num_workers', 1),
'stick_to_part': dataset.num_images < 100000}
super(DataLoader, self).__init__(worker, **args)
def run(self):
"""Main loop."""
logging.info('Prefetch batches...')
prev_inputs = [self.worker_queue.get()
for _ in range(self.prefetch_count * self.batch_size)]
next_inputs = []
while True:
                # Use cached buffer for the next N inputs.
if len(next_inputs) == 0:
next_inputs = prev_inputs
if 'aspect_ratio' in next_inputs[0]:
# Inputs are sorted to simulate aspect grouping.
next_inputs.sort(key=lambda d: d['aspect_ratio'][0])
prev_inputs = []
# Collect the next batch.
outputs = collections.defaultdict(list)
for _ in range(self.batch_size):
inputs = next_inputs.pop(0)
for k, v in inputs.items():
outputs[k].extend(v)
prev_inputs.append(self.worker_queue.get())
# Stack batch data.
if self.contiguous:
outputs['img'] = blob_vstack(
outputs['img'], fill_value=self.img_mean,
align=self.img_align)
# Send batch data to consumer.
self.batch_queue.put(outputs)
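# Sketch only: wiring a loader by hand. Assumes cfg.TRAIN.* points at a
# prepared KPLRecord dataset and "MyTrainWorker" is a hypothetical
# DataWorkerBase subclass like the workers registered in the pipelines:
#   loader = DataLoader('/data/train', worker=MyTrainWorker,
#                       batch_size=2, shuffle=True)
#   batch = next(loader)
#   batch['img'].shape  # (2, H, W, 3), padded to the coarsest stride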
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Data loading pipelines."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.core.config import cfg
from seetadet.data import transforms
from seetadet.data.build import LOADERS
from seetadet.data.build import build_anchor_sampler
from seetadet.data.datasets import AnnotatedDatum
from seetadet.data.loader import DataWorkerBase
from seetadet.data.loader import DataLoader
class DetTrainWorker(DataWorkerBase):
"""Worker that defines a generic train pipeline."""
def __init__(self, **kwargs):
super(DetTrainWorker, self).__init__()
self.parse_boxes = transforms.ParseBoxes()
self.resize = transforms.RandomResize(
scales=cfg.TRAIN.SCALES,
scales_range=cfg.TRAIN.SCALES_RANGE,
max_size=cfg.TRAIN.MAX_SIZE)
self.flip = transforms.RandomFlip()
self.crop = transforms.RandomCrop(crop_size=cfg.AUG.CROP_SIZE)
self.distort = transforms.ColorJitter(cfg.AUG.COLOR_JITTER)
self.anchor_sampler = build_anchor_sampler()
def get_outputs(self, inputs):
datum = AnnotatedDatum(inputs.pop(0))
img, boxes = datum.img, self.parse_boxes(datum)
img, boxes = self.resize(img, boxes)
img, boxes = self.flip(img, boxes)
img, boxes = self.crop(img, boxes)
if len(boxes) == 0:
return None
img = self.distort(img)
height, width = img.shape[:2]
im_scale = self.resize.im_scale
outputs = {'img': [img], 'gt_boxes': [boxes],
'im_info': [(height, width, im_scale)],
'aspect_ratio': [float(height) / float(width)]}
if self.anchor_sampler is not None:
anchor_info = self.anchor_sampler.sample(boxes, outputs['im_info'][0])
for k, v in anchor_info.items():
outputs[k] = [v]
return outputs
class MaskTrainWorker(DataWorkerBase):
"""Worker that defines a generic train pipeline."""
def __init__(self, **kwargs):
super(MaskTrainWorker, self).__init__()
self.parse_boxes = transforms.ParseBoxes()
self.parse_segms = transforms.ParseSegms()
self.resize = transforms.RandomResize(
scales=cfg.TRAIN.SCALES,
scales_range=cfg.TRAIN.SCALES_RANGE,
max_size=cfg.TRAIN.MAX_SIZE)
self.flip = transforms.RandomFlip()
self.distort = transforms.ColorJitter(cfg.AUG.COLOR_JITTER)
self.anchor_sampler = build_anchor_sampler()
def get_outputs(self, inputs):
datum = AnnotatedDatum(inputs.pop(0))
img, boxes = datum.img, self.parse_boxes(datum)
segms, width = self.parse_segms(datum), img.shape[1]
img, boxes = self.resize(img, boxes)
if len(boxes) == 0:
return None
img, boxes = self.flip(img, boxes)
segms = self.flip.apply_segms(segms, width)
img = self.distort(img)
height, width = img.shape[:2]
im_scale = self.resize.im_scale
outputs = {'img': [img], 'gt_boxes': [boxes], 'gt_segms': [segms],
'im_info': [(height, width, im_scale)],
'aspect_ratio': [float(height) / float(width)]}
anchor_info = self.anchor_sampler.sample(boxes, outputs['im_info'][0])
for k, v in anchor_info.items():
outputs[k] = [v]
return outputs
class SSDTrainWorker(DataWorkerBase):
"""DataTransformer."""
def __init__(self, **kwargs):
super(SSDTrainWorker, self).__init__()
self.parse_boxes = transforms.ParseBoxes()
self.paste = transforms.RandomPaste()
self.crop = transforms.RandomBBoxCrop()
self.resize = transforms.RandomResize(
scales=cfg.TRAIN.SCALES, keep_ratio=False)
self.flip = transforms.RandomFlip()
self.distort = transforms.ColorJitter(cfg.AUG.COLOR_JITTER)
self.anchor_sampler = build_anchor_sampler()
def get_outputs(self, inputs):
datum = AnnotatedDatum(inputs.pop(0))
img, boxes = datum.img, self.parse_boxes(datum)
boxes /= [(img.shape[1], img.shape[0]) * 2 + (1,)]
img, boxes = self.paste(img, boxes)
img, boxes = self.crop(img, boxes)
if len(boxes) == 0:
return None
img, _ = self.resize(img)
boxes[:, :4] *= img.shape[0]
img, boxes = self.flip(img, boxes)
img = self.distort(img)
outputs = {'img': [img], 'gt_boxes': [boxes],
'im_info': [img.shape[:2]]}
if self.anchor_sampler is not None:
anchor_info = self.anchor_sampler.sample(boxes, outputs['im_info'][0])
for k, v in anchor_info.items():
outputs[k] = [v]
return outputs
class DetTestWorker(DataWorkerBase):
"""Worker that defines a generic test pipeline."""
def __init__(self, **kwargs):
super(DetTestWorker, self).__init__()
def get_outputs(self, inputs):
datum = AnnotatedDatum(inputs.pop(0))
img, objects = datum.img, datum.objects
outputs = {'img': [img], 'objects': [objects],
'img_meta': [{'id': datum.id,
'height': datum.height,
'width': datum.width}]}
return outputs
LOADERS.register('det_train', DataLoader, worker=DetTrainWorker)
LOADERS.register('mask_train', DataLoader, worker=MaskTrainWorker)
LOADERS.register('ssd_train', DataLoader, worker=SSDTrainWorker)
LOADERS.register('det_test', DataLoader, worker=DetTestWorker)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import numpy as np
import numpy.random as npr
from seetadet.core.config import cfg
from seetadet.ops.normalization import to_tensor
from seetadet.utils.bbox import bbox_overlaps
from seetadet.utils.bbox import bbox_transform
from seetadet.utils.bbox import distribute_boxes
from seetadet.utils.mask import mask_from
class ProposalTargets(object):
"""Generate ground-truth targets for proposals."""
def __init__(self):
super(ProposalTargets, self).__init__()
self.num_classes = len(cfg.MODEL.CLASSES)
self.num_rois = cfg.FRCNN.BATCH_SIZE
self.num_fg_rois = round(cfg.FRCNN.POSITIVE_FRACTION * self.num_rois)
self.pos_iou_thr = cfg.FRCNN.POSITIVE_OVERLAP
self.neg_iou_thr = cfg.FRCNN.NEGATIVE_OVERLAP
self.bbox_reg_weights = cfg.FRCNN.BBOX_REG_WEIGHTS
self.mask_size = (cfg.MRCNN.POOLER_RESOLUTION * 2,) * 2
self.lvl_min, self.lvl_max = cfg.FRCNN.MIN_LEVEL, cfg.FRCNN.MAX_LEVEL
self.defaults = {'rois': np.array([[-1, 0, 0, 1, 1]], 'float32'),
'labels': np.array([-1], 'int64'),
'bbox_targets': np.zeros((1, 4), 'float32'),
'mask_targets': np.full((1,) + self.mask_size, -1, 'float32')}
def sample_rois(self, rois, gt_boxes):
"""Sample positive and negative RoIs."""
# Compute overlaps between RoIs and ground-truth boxes.
overlaps = bbox_overlaps(rois[:, 1:5], gt_boxes[:, :4])
max_overlaps = overlaps.max(axis=1)
# Assign with the ground-truth boxes taken the highest IoU.
gt_assignments = overlaps.argmax(axis=1)
labels = gt_boxes[gt_assignments, 4].astype('int64')
# Select foreground regions.
pos_iou_thr = self.pos_iou_thr
fg_inds = np.where(max_overlaps >= pos_iou_thr)[0]
while fg_inds.size == 0:
pos_iou_thr -= 0.01
fg_inds = np.where(max_overlaps >= pos_iou_thr)[0]
# Select background regions.
bg_inds = np.where(max_overlaps < self.neg_iou_thr)[0]
# Sample foreground regions without replacement.
num_fg_rois = int(min(self.num_fg_rois, fg_inds.size))
fg_inds = npr.choice(fg_inds, num_fg_rois, False)
# Sample background regions without replacement.
num_bg_rois = self.num_rois - num_fg_rois
num_bg_rois = min(num_bg_rois, bg_inds.size)
if bg_inds.size > 0:
bg_inds = npr.choice(bg_inds, num_bg_rois, False)
# Take values via sampled indices.
keep_inds = np.append(fg_inds, bg_inds)
rois, labels = rois[keep_inds], labels[keep_inds]
gt_assignments = gt_assignments[keep_inds]
# Reassign background regions.
labels[num_fg_rois:] = 0
return rois, labels, gt_assignments
def distribute_blobs(self, blobs, lvls):
"""Distribute blobs on given levels."""
outputs = collections.defaultdict(list)
lvl_inds = [np.where(lvls == (i + self.lvl_min))[0]
for i in range(self.lvl_max - self.lvl_min + 1)]
for inds in lvl_inds:
for key, blob in blobs.items():
outputs[key].append(blob[inds] if len(inds) > 0
else self.defaults[key])
return outputs
def get_bbox_targets(self, rois, boxes):
return bbox_transform(rois, boxes, weights=self.bbox_reg_weights)
def get_mask_targets(self, rois, segms, inds):
targets = np.full((len(rois),) + self.mask_size, -1, 'float32')
for i in inds:
if segms[i] is not None:
targets[i] = mask_from(segms[i], self.mask_size, rois[i])
return targets
def compute(self, **inputs):
"""Compute proposal targets."""
blobs = collections.defaultdict(list)
all_rois = inputs['rois']
batch_inds = all_rois[:, 0].astype('int32')
# Compute targets per image.
for i, gt_boxes in enumerate(inputs['gt_boxes']):
# Select proposals of this image.
rois = all_rois[np.where(batch_inds == i)[0]]
# Include ground-truth boxes in the set of candidates.
inds = np.ones((gt_boxes.shape[0], 1), gt_boxes.dtype) * i
rois = np.vstack((rois, np.hstack((inds, gt_boxes[:, :4]))))
# Sample a batch of RoIs for training.
rois, labels, gt_assignments = self.sample_rois(rois, gt_boxes)
# Fill blobs.
blobs['rois'].append(rois)
blobs['labels'].append(labels)
blobs['bbox_targets'].append(self.get_bbox_targets(
rois[:, 1:5], gt_boxes[gt_assignments, :4]))
if 'gt_segms' in inputs:
fg_inds = np.where(labels > 0)[0]
segms = [inputs['gt_segms'][i][j] for j in gt_assignments]
blobs['mask_targets'].append(self.get_mask_targets(
rois[:, 1:5] / inputs['im_info'][i][2], segms, fg_inds))
# Concat to get the contiguous blobs.
blobs = dict((k, np.concatenate(blobs[k])) for k in blobs.keys())
# Distribute blobs by the level of all ROIs.
lvls = distribute_boxes(blobs['rois'][:, 1:], self.lvl_min, self.lvl_max)
blobs = self.distribute_blobs(blobs, lvls)
# Add the targets using foreground ROIs only.
for lvl in range(self.lvl_max - self.lvl_min + 1):
inds = np.where(blobs['labels'][lvl] > 0)[0]
if len(inds) > 0:
blobs['fg_rois'].append(blobs['rois'][lvl][inds])
blobs['mask_labels'].append(blobs['labels'][lvl][inds] - 1)
if 'mask_targets' in blobs:
blobs['mask_targets'][lvl] = blobs['mask_targets'][lvl][inds]
else:
blobs['fg_rois'].append(self.defaults['rois'])
blobs['mask_labels'].append(np.array([0], 'int64'))
if 'mask_targets' in blobs:
blobs['mask_targets'][lvl] = self.defaults['mask_targets']
# Concat to get contiguous blobs along the levels.
rois, fg_rois = blobs['rois'], blobs['fg_rois']
blobs = dict((k, np.concatenate(blobs[k])) for k in blobs.keys())
# Compute class-specific strides.
bbox_strides = np.arange(len(blobs['rois'])) * self.num_classes
mask_strides = np.arange(len(blobs['fg_rois'])) * (self.num_classes - 1)
# Select the foreground RoIs for bbox targets.
fg_inds = np.where(blobs['labels'] > 0)[0]
if len(fg_inds) == 0:
            # Fall back to one random proposal so the gathers below are never empty.
fg_inds = npr.randint(len(blobs['labels']), size=[1])
outputs = {
'rois': [to_tensor(rois[i]) for i in range(len(rois))],
'fg_rois': [to_tensor(fg_rois[i]) for i in range(len(fg_rois))],
'labels': to_tensor(blobs['labels']),
'bbox_inds': to_tensor(bbox_strides[fg_inds] + blobs['labels'][fg_inds]),
'mask_inds': to_tensor(mask_strides + blobs['mask_labels']),
'bbox_targets': to_tensor(blobs['bbox_targets'][fg_inds]),
'bbox_anchors': to_tensor(blobs['rois'][fg_inds, 1:]),
}
if 'mask_targets' in blobs:
outputs['mask_targets'] = to_tensor(blobs['mask_targets'])
return outputs
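if __name__ == '__main__':
    # The fg/bg sampling rule of sample_rois() in miniature (thresholds
    # illustrative; the real values come from cfg.FRCNN.*).
    max_overlaps = np.array([0.9, 0.6, 0.3, 0.1])
    fg_inds = np.where(max_overlaps >= 0.5)[0]        # [0, 1]
    bg_inds = np.where(max_overlaps < 0.5)[0]         # [2, 3]
    num_fg = min(round(0.25 * 8), fg_inds.size)       # 25% of an 8-RoI batch
    fg_inds = npr.choice(fg_inds, num_fg, False)
    bg_inds = npr.choice(bg_inds, min(8 - num_fg, bg_inds.size), False)
    print(len(fg_inds), len(bg_inds))  # 2 2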
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import numpy as np
from seetadet.core.config import cfg
from seetadet.data.build import ANCHOR_SAMPLERS
from seetadet.data.anchors.rpn import AnchorGenerator
from seetadet.data.assigners import MaxIoUAssigner
from seetadet.ops.normalization import to_tensor
from seetadet.utils.bbox import bbox_overlaps
from seetadet.utils.bbox import bbox_transform
@ANCHOR_SAMPLERS.register('retinanet')
class AnchorTargets(object):
"""Generate ground-truth targets for anchors."""
def __init__(self):
super(AnchorTargets, self).__init__()
self.generator = AnchorGenerator(
strides=cfg.ANCHOR_GENERATOR.STRIDES,
sizes=cfg.ANCHOR_GENERATOR.SIZES,
aspect_ratios=cfg.ANCHOR_GENERATOR.ASPECT_RATIOS,
scales_per_octave=3)
self.assigner = MaxIoUAssigner(
pos_iou_thr=cfg.RETINANET.POSITIVE_OVERLAP,
neg_iou_thr=cfg.RETINANET.NEGATIVE_OVERLAP)
max_size = max(cfg.TRAIN.MAX_SIZE, max(cfg.TRAIN.SCALES))
if cfg.BACKBONE.COARSEST_STRIDE > 0:
stride = float(cfg.BACKBONE.COARSEST_STRIDE)
max_size = int(np.ceil(max_size / stride) * stride)
self.generator.reset_grid(max_size)
    def sample(self, gt_boxes, im_info):
        """Sample positive and ignored anchors."""
        # Keep anchors whose top-left corner falls inside the image.
        anchors = self.generator.grid_anchors
        inds_inside = np.where((anchors[:, 0] < im_info[1]) &
                               (anchors[:, 1] < im_info[0]))[0]
        anchors = anchors[inds_inside, :]
        # Assign ground-truth according to the IoU.
        labels = self.assigner.assign(anchors, gt_boxes)
        # Return only the foreground and ignored indices; everything else
        # defaults to background, which avoids materializing ~200k
        # background indices (roughly 100x faster).
        return {'fg_inds': inds_inside[np.where(labels > 0)[0]],
                'bg_inds': inds_inside[np.where(labels < 0)[0]]}
def compute(self, **inputs):
"""Compute anchor targets."""
shapes = [x[:2] for x in inputs['grid_info']]
num_images = len(inputs['gt_boxes'])
num_anchors = self.generator.num_anchors(shapes)
blobs = collections.defaultdict(list)
# "1" is positive, "0" is negative, "-1" is don't care.
labels = np.zeros((num_images, num_anchors), 'int64')
for i, gt_boxes in enumerate(inputs['gt_boxes']):
fg_inds = inputs['fg_inds'][i]
ignore_inds = inputs['bg_inds'][i]
# Narrow anchors to match the feature layout.
ignore_inds = self.generator.narrow_anchors(shapes, ignore_inds)
fg_inds, anchors = self.generator.narrow_anchors(shapes, fg_inds, True)
# Compute bbox targets.
gt_assignments = bbox_overlaps(anchors, gt_boxes).argmax(axis=1)
bbox_targets = bbox_transform(anchors, gt_boxes[gt_assignments, :4])
blobs['bbox_anchors'].append(anchors)
blobs['bbox_targets'].append(bbox_targets)
# Compute label assignments.
labels[i, ignore_inds] = -1
labels[i, fg_inds] = gt_boxes[gt_assignments, 4]
# Compute sparse indices.
fg_inds += i * num_anchors
blobs['bbox_inds'].extend([fg_inds])
return {
'labels': to_tensor(labels),
'bbox_inds': to_tensor(np.hstack(blobs['bbox_inds'])),
'bbox_targets': to_tensor(np.vstack(blobs['bbox_targets'])),
'bbox_anchors': to_tensor(np.vstack(blobs['bbox_anchors'])),
}
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Generate targets for RPN head."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import numpy as np
import numpy.random as npr
from seetadet.core.config import cfg
from seetadet.data.anchors.rpn import AnchorGenerator
from seetadet.data.assigners import MaxIoUAssigner
from seetadet.data.build import ANCHOR_SAMPLERS
from seetadet.ops.normalization import to_tensor
from seetadet.utils.bbox import bbox_overlaps
from seetadet.utils.bbox import bbox_transform
@ANCHOR_SAMPLERS.register(['faster_rcnn', 'mask_rcnn'])
class AnchorTargets(object):
"""Generate ground-truth targets for anchors."""
def __init__(self):
super(AnchorTargets, self).__init__()
self.generator = AnchorGenerator(
strides=cfg.ANCHOR_GENERATOR.STRIDES,
sizes=cfg.ANCHOR_GENERATOR.SIZES,
aspect_ratios=cfg.ANCHOR_GENERATOR.ASPECT_RATIOS)
self.assigner = MaxIoUAssigner(
pos_iou_thr=cfg.RPN.POSITIVE_OVERLAP,
neg_iou_thr=cfg.RPN.NEGATIVE_OVERLAP)
max_size = max(cfg.TRAIN.MAX_SIZE, max(cfg.TRAIN.SCALES))
if cfg.BACKBONE.COARSEST_STRIDE > 0:
stride = float(cfg.BACKBONE.COARSEST_STRIDE)
max_size = int(np.ceil(max_size / stride) * stride)
self.generator.reset_grid(max_size)
def sample(self, gt_boxes, im_info):
"""Sample positive and negative anchors."""
# Only keep anchors inside the image.
anchors = self.generator.grid_anchors
inds_inside = np.where((anchors[:, 0] >= 0) &
(anchors[:, 1] >= 0) &
(anchors[:, 2] < im_info[1]) &
(anchors[:, 3] < im_info[0]))[0]
anchors = anchors[inds_inside, :]
# Assign ground-truth according to the IoU.
labels = self.assigner.assign(anchors, gt_boxes)
# Sample positive labels if we have too many.
fg_inds = np.where(labels > 0)[0]
num_fg = int(cfg.RPN.POSITIVE_FRACTION * cfg.RPN.BATCH_SIZE)
if len(fg_inds) > num_fg:
fg_inds = npr.choice(fg_inds, num_fg, False)
# Sample negative labels if we have too many.
num_bg = cfg.RPN.BATCH_SIZE - len(fg_inds)
bg_inds = np.where(labels == 0)[0]
if len(bg_inds) > num_bg:
bg_inds = npr.choice(bg_inds, num_bg, False)
# Select foreground and background indices.
return {'fg_inds': inds_inside[fg_inds],
'bg_inds': inds_inside[bg_inds]}
def compute(self, **inputs):
"""Compute anchor targets."""
shapes = [x[:2] for x in inputs['grid_info']]
num_anchors = self.generator.num_anchors(shapes)
blobs = collections.defaultdict(list)
for i, gt_boxes in enumerate(inputs['gt_boxes']):
fg_inds = inputs['fg_inds'][i]
bg_inds = inputs['bg_inds'][i]
# Narrow anchors to match the feature layout.
bg_inds = self.generator.narrow_anchors(shapes, bg_inds)
fg_inds, anchors = self.generator.narrow_anchors(shapes, fg_inds, True)
# Compute bbox targets.
gt_assignments = bbox_overlaps(anchors, gt_boxes).argmax(axis=1)
bbox_targets = bbox_transform(anchors, gt_boxes[gt_assignments, :4])
blobs['bbox_anchors'].append(anchors)
blobs['bbox_targets'].append(bbox_targets)
# Compute sparse indices.
fg_inds += i * num_anchors
bg_inds += i * num_anchors
blobs['cls_inds'] += [fg_inds, bg_inds]
blobs['bbox_inds'] += [fg_inds]
blobs['labels'] += [np.ones_like(fg_inds, 'float32'),
np.zeros_like(bg_inds, 'float32')]
return {
'labels': to_tensor(np.hstack(blobs['labels'])),
'cls_inds': to_tensor(np.hstack(blobs['cls_inds'])),
'bbox_inds': to_tensor(np.hstack(blobs['bbox_inds'])),
'bbox_targets': to_tensor(np.vstack(blobs['bbox_targets'])),
'bbox_anchors': to_tensor(np.vstack(blobs['bbox_anchors'])),
}
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Generate targets for SSD head."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import numpy as np
from seetadet.core.config import cfg
from seetadet.data.build import ANCHOR_SAMPLERS
from seetadet.data.anchors.ssd import AnchorGenerator
from seetadet.data.assigners import MaxIoUAssigner
from seetadet.ops.normalization import to_tensor
from seetadet.utils.bbox import bbox_overlaps
from seetadet.utils.bbox import bbox_transform
@ANCHOR_SAMPLERS.register('ssd')
class AnchorTargets(object):
"""Generate ground-truth targets for anchors."""
def __init__(self):
super(AnchorTargets, self).__init__()
self.generator = AnchorGenerator(
strides=cfg.ANCHOR_GENERATOR.STRIDES,
sizes=cfg.ANCHOR_GENERATOR.SIZES,
aspect_ratios=cfg.ANCHOR_GENERATOR.ASPECT_RATIOS)
self.assigner = MaxIoUAssigner(
pos_iou_thr=cfg.SSD.POSITIVE_OVERLAP,
neg_iou_thr=cfg.SSD.NEGATIVE_OVERLAP,
gt_max_assign_all=False)
self.neg_pos_ratio = (1.0 / cfg.SSD.POSITIVE_FRACTION) - 1.0
max_size = cfg.ANCHOR_GENERATOR.STRIDES[-1]
self.generator.reset_grid(max_size)
def sample(self, gt_boxes, im_info):
"""Sample positive and negative anchors."""
anchors = self.generator.grid_anchors
# Assign ground-truth according to the IoU.
labels = self.assigner.assign(anchors, gt_boxes)
# Select positive and non-positive indices.
return {'fg_inds': np.where(labels > 0)[0],
'bg_inds': np.where(labels <= 0)[0]}
def compute(self, **inputs):
"""Compute anchor targets."""
num_images = len(inputs['gt_boxes'])
num_anchors = self.generator.grid_anchors.shape[0]
cls_score = inputs['cls_score'].numpy().astype('float32')
blobs = collections.defaultdict(list)
# "1" is positive, "0" is negative, "-1" is don't care
labels = np.full((num_images, num_anchors,), -1, 'int64')
for i, gt_boxes in enumerate(inputs['gt_boxes']):
fg_inds = pos_inds = inputs['fg_inds'][i]
neg_inds = inputs['bg_inds'][i]
# Mining hard negatives as background.
num_pos, num_neg = len(pos_inds), len(neg_inds)
num_bg = min(int(num_pos * self.neg_pos_ratio), num_neg)
neg_score = cls_score[i, neg_inds, 0]
bg_inds = neg_inds[np.argsort(neg_score)][:num_bg]
# Compute bbox targets.
anchors = self.generator.grid_anchors[fg_inds]
gt_assignments = bbox_overlaps(anchors, gt_boxes).argmax(axis=1)
bbox_targets = bbox_transform(anchors, gt_boxes[gt_assignments, :4],
weights=cfg.SSD.BBOX_REG_WEIGHTS)
blobs['bbox_anchors'].append(anchors)
blobs['bbox_targets'].append(bbox_targets)
# Compute label assignments.
labels[i, bg_inds] = 0
labels[i, fg_inds] = gt_boxes[gt_assignments, 4]
            # Compute sparse indices (copy rather than "+=", which would
            # mutate the "fg_inds" array stored in the inputs).
            fg_inds = fg_inds + i * num_anchors
blobs['bbox_inds'].extend([fg_inds])
return {
'labels': to_tensor(labels),
'bbox_inds': to_tensor(np.hstack(blobs['bbox_inds'])),
'bbox_targets': to_tensor(np.vstack(blobs['bbox_targets'])),
'bbox_anchors': to_tensor(np.vstack(blobs['bbox_anchors'])),
}
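if __name__ == '__main__':
    # Hard-negative mining in isolation: keep the negatives whose background
    # score is lowest, at 3 negatives per positive (POSITIVE_FRACTION = 0.25
    # gives neg_pos_ratio = 3). Toy numbers.
    pos_inds = np.array([0])
    neg_inds = np.array([2, 3, 4, 5, 6])
    bg_score = np.array([0.9, 0.2, 0.8, 0.1, 0.6])  # cls_score[i, neg_inds, 0]
    num_bg = min(int(len(pos_inds) * 3.0), len(neg_inds))
    print(neg_inds[np.argsort(bg_score)][:num_bg])  # [5 3 6]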
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import numpy.random as npr
from seetadet.core.config import cfg
from seetadet.utils.bbox import boxes_iou
from seetadet.utils.bbox import clip_boxes
from seetadet.utils.bbox import flip_boxes
from seetadet.utils.bbox import flip_polygons
from seetadet.utils.image import im_resize
from seetadet.utils.image import color_jitter
from seetadet.utils.mask import mask_from
class ParseBoxes(object):
"""Parse the ground-truth boxes."""
def __init__(self):
self.classes = cfg.MODEL.CLASSES
self.num_classes = len(self.classes)
self.class_indices = dict(zip(self.classes, range(self.num_classes)))
self.use_diff = cfg.TRAIN.USE_DIFF
def __call__(self, datum):
objects, num_objects = datum.objects, 0
height, width = datum.height, datum.width
if not self.use_diff:
for obj in objects:
if obj.get('difficult', 0) == 0:
num_objects += 1
else:
num_objects = len(objects)
boxes = np.zeros((num_objects, 4), 'float32')
classes = np.zeros((num_objects, 1), 'float32')
# Filter the difficult instances.
object_idx = 0
for obj in objects:
if not self.use_diff and obj.get('difficult', 0) > 0:
continue
bbox = obj['bbox']
boxes[object_idx, :] = [max(0, bbox[0]),
max(0, bbox[1]),
min(bbox[2], width - 1),
min(bbox[3], height - 1)]
classes[object_idx, :] = self.class_indices[obj['name']]
object_idx += 1
# Attach the classes.
cls_boxes = np.empty((len(boxes), 5), 'float32')
cls_boxes[:, :4], cls_boxes[:, 4:] = boxes, classes
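# Layout: each row is (x1, y1, x2, y2, class_index).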
return cls_boxes
class ParseSegms(object):
"""Parse the ground-truth segmentations."""
def __init__(self):
self.use_diff = cfg.TRAIN.USE_DIFF
def __call__(self, datum):
objects, num_objects = datum.objects, 0
height, width = datum.height, datum.width
if not self.use_diff:
for obj in objects:
if obj.get('difficult', 0) == 0:
num_objects += 1
else:
num_objects = len(objects)
segms = []
# Filter the difficult instances.
object_idx = 0
for obj in objects:
if not self.use_diff and obj.get('difficult', 0) > 0:
continue
if 'mask' in obj:
segms.append(mask_from(obj['mask'], (height, width)))
elif 'polygons' in obj:
segms.append(obj['polygons'])
else:
segms.append(None)
object_idx += 1
return segms
class RandomMosaic(object):
"""Copy images into a 4x4 grid canvas."""
def __init__(self, size=None):
if size is None:
size = cfg.AUG.MOSAIC_SIZE
self._prob = cfg.AUG.MOSAIC
self._pixel_mean = cfg.MODEL.PIXEL_MEAN
if isinstance(size, (tuple, list)):
self._size_h, self._size_w = size[:2]
else:
self._size_h = self._size_w = int(size)
self._out_h = self._size_h * 2
self._out_w = self._size_w * 2
self._enabled = self._out_h > 0 and self._out_w > 0
@property
def enabled(self):
return self._enabled and npr.rand() < self._prob
@staticmethod
def _coords_to_slice(coords):
x1, y1, x2, y2 = coords
return slice(y1, y2), slice(x1, x2)
def _get_coords(self, index, w, h, x_ctr, y_ctr):
if index == 0:
x1, y1 = max(x_ctr - w, 0), max(y_ctr - h, 0)
x2, y2 = x_ctr, y_ctr
coords = (w - (x2 - x1), h - (y2 - y1), w, h)
elif index == 1:
x1, y1 = x_ctr, max(y_ctr - h, 0)
x2, y2 = min(x_ctr + w, self._out_w), y_ctr
coords = (0, h - (y2 - y1), min(w, x2 - x1), h)
elif index == 2:
x1, y1 = max(x_ctr - w, 0), y_ctr
x2, y2 = x_ctr, min(self._out_h, y_ctr + h)
coords = (w - (x2 - x1), 0, w, min(y2 - y1, h))
else:
x1, y1 = x_ctr, y_ctr
x2, y2 = min(x_ctr + w, self._out_w), min(self._out_h, y_ctr + h)
coords = (0, 0, min(w, x2 - x1), min(y2 - y1, h))
out_coords = self._coords_to_slice((x1, y1, x2, y2))
coords = self._coords_to_slice(coords)
return out_coords, coords
def __call__(self, img_list, boxes_list):
out_shape = list(img_list[0].shape)
out_shape[:2] = (self._out_h, self._out_w)
y_ctr = int(npr.uniform(0.5 * self._size_h, 1.5 * self._size_h))
x_ctr = int(npr.uniform(0.5 * self._size_w, 1.5 * self._size_w))
out_img = np.empty(out_shape, img_list[0].dtype)
out_img[:], out_boxes = self._pixel_mean, []
for i in range(4):
img, boxes = img_list[i], boxes_list[i]
h, w = img.shape[:2]
im_scale = min(self._size_h / float(h), self._size_w / float(w))
img = im_resize(img, scale=im_scale)
h, w = img.shape[:2]
out_coords, coords = self._get_coords(i, w, h, x_ctr, y_ctr)
h_offset = out_coords[0].start - coords[0].start
w_offset = out_coords[1].start - coords[1].start
out_img[out_coords] = img[coords]
boxes[:, (0, 2)] = boxes[:, (0, 2)] * im_scale + w_offset
boxes[:, (1, 3)] = boxes[:, (1, 3)] * im_scale + h_offset
boxes = clip_boxes(boxes, out_img.shape)
valid_inds = (boxes[:, 2] > boxes[:, 0]) & (boxes[:, 3] > boxes[:, 1])
out_boxes.append(boxes[valid_inds])
out_boxes = np.vstack(out_boxes)
return out_img, out_boxes
class RandomFlip(object):
"""Flip the image randomly."""
def __init__(self, prob=0.5):
self.prob = prob
self.is_flipped = False
def apply_segms(self, segms, width):
for i, segm in enumerate(segms):
if not self.is_flipped or segm is None:
continue
if isinstance(segm, np.ndarray):
segm = segm[:, ::-1]
else:
segm = flip_polygons(segm, width)
segms[i] = segm
return segms
def __call__(self, img, boxes=None):
self.is_flipped = npr.rand() < self.prob
img = img[:, ::-1] if self.is_flipped else img
if boxes is not None and self.is_flipped:
boxes = flip_boxes(boxes, img.shape[1])
return img, boxes
class RandomResize(object):
"""Resize the image randomly."""
def __init__(
self,
scales=(640,),
scales_range=(1.0, 1.0),
max_size=1066,
keep_ratio=True,
):
self.scales = scales
self.scales_range = scales_range
self.max_size = max_size
self.keep_ratio = keep_ratio
self.im_scale = 1.0
self.im_scale_factor = 1.0
def __call__(self, img, boxes=None):
im_shape = img.shape
target_size = npr.choice(self.scales)
if self.keep_ratio:
# Scale along the shortest side.
max_size = max(self.max_size, target_size)
im_size_min = np.min(im_shape[:2])
im_size_max = np.max(im_shape[:2])
self.im_scale = float(target_size) / float(im_size_min)
# Prevent the biggest axis from being more than *MAX_SIZE*.
if np.round(self.im_scale * im_size_max) > max_size:
self.im_scale = float(max_size) / float(im_size_max)
# Apply the scale jitter to get a range of dynamic scales.
r = self.scales_range
self.im_scale_factor = r[0] + npr.rand() * (r[1] - r[0])
self.im_scale *= self.im_scale_factor
img = im_resize(img, scale=self.im_scale)
if boxes is not None:
boxes[:, :4] *= self.im_scale
else:
self.im_scale = (float(target_size) / float(im_shape[0]),
float(target_size) / float(im_shape[1]))
img = im_resize(img, size=target_size)
if boxes is not None:
boxes[:, (0, 2)] = boxes[:, (0, 2)] * self.im_scale[1]
boxes[:, (1, 3)] = boxes[:, (1, 3)] * self.im_scale[0]
return img, boxes
class RandomPaste(object):
"""Copy image into a larger canvas randomly."""
def __init__(self, prob=0.5):
self.ratio = 1. / cfg.TRAIN.SCALES_RANGE[0]
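# e.g., TRAIN.SCALES_RANGE[0] = 0.25 permits a canvas up to 4x the image size.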
self.prob = prob if self.ratio > 1 else 0
self.pixel_mean = cfg.MODEL.PIXEL_MEAN
def __call__(self, img, boxes):
if npr.rand() > self.prob:
return img, boxes
im_shape = list(img.shape)
h, w = im_shape[:2]
ratio = npr.uniform(1., self.ratio)
out_h, out_w = int(h * ratio), int(w * ratio)
y1 = int(np.floor(npr.uniform(0., out_h - h)))
x1 = int(np.floor(npr.uniform(0., out_w - w)))
im_shape[:2] = (out_h, out_w)
out_img = np.empty(im_shape, dtype=img.dtype)
out_img[:] = self.pixel_mean
out_img[y1:y1 + h, x1:x1 + w, :] = img
out_boxes = boxes.copy()
out_boxes[:, (0, 2)] = (boxes[:, (0, 2)] * w + x1) / out_w
out_boxes[:, (1, 3)] = (boxes[:, (1, 3)] * h + y1) / out_h
return out_img, out_boxes
class RandomCrop(object):
"""Crop the image randomly."""
def __init__(self, crop_size=512):
self.crop_size = crop_size
self.pixel_mean = cfg.MODEL.PIXEL_MEAN
def __call__(self, img, boxes):
if self.crop_size <= 0:
return img, boxes
im_shape = list(img.shape)
h, w = im_shape[:2]
out_h, out_w = (self.crop_size,) * 2
y1 = npr.randint(max(h - out_h, 0) + 1)
x1 = npr.randint(max(w - out_w, 0) + 1)
im_shape[:2] = (out_h, out_w)
out_img = np.empty(im_shape, dtype=img.dtype)
out_img[:] = self.pixel_mean
out_img[:h, :w] = img[y1:y1 + out_h, x1:x1 + out_w]
img = out_img
boxes[:, (0, 2)] -= x1
boxes[:, (1, 3)] -= y1
boxes = clip_boxes(boxes, img.shape)
valid_inds = (boxes[:, 2] > boxes[:, 0]) & (boxes[:, 3] > boxes[:, 1])
boxes = boxes[valid_inds]
return img, boxes
class ColorJitter(object):
"""Distort the brightness, contrast and color of image."""
def __init__(self, prob=0.5):
self.prob = prob
self.brightness_range = (0.875, 1.125)
self.contrast_range = (0.5, 1.5)
self.saturation_range = (0.5, 1.5)
def __call__(self, img):
brightness = contrast = saturation = None
if npr.rand() < self.prob:
brightness = self.brightness_range
if npr.rand() < self.prob:
contrast = self.contrast_range
if npr.rand() < self.prob:
saturation = self.saturation_range
return color_jitter(img, brightness=brightness,
contrast=contrast, saturation=saturation)
class RandomBBoxCrop(object):
"""Crop image by sampling a region restricted by bounding boxes."""
def __init__(self, scales_range=(0.3, 1.0), aspect_ratios_range=(0.5, 2.0),
overlaps=(0.0, 0.1, 0.3, 0.5, 0.7, 0.9)):
self.samplers = [{}]
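# The first (empty) sampler falls back to keeping the whole image.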
for ov in overlaps:
self.samplers.append({
'scales_range': scales_range,
'aspect_ratios_range': aspect_ratios_range,
'overlaps_range': (ov, 1.0), 'max_trials': 10})
@classmethod
def generate_sample(cls, param):
scales_range = param.get('scales_range', (1.0, 1.0))
aspect_ratios_range = param.get('aspect_ratios_range', (1.0, 1.0))
scale = npr.uniform(scales_range[0], scales_range[1])
min_aspect_ratio = max(aspect_ratios_range[0], scale**2)
max_aspect_ratio = min(aspect_ratios_range[1], 1. / (scale**2))
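# Clamping the ratio to [scale**2, 1/scale**2] keeps both bbox_w and bbox_h below 1.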
aspect_ratio = npr.uniform(min_aspect_ratio, max_aspect_ratio)
bbox_w = scale * (aspect_ratio ** 0.5)
bbox_h = scale / (aspect_ratio ** 0.5)
w_off = npr.uniform(0., 1. - bbox_w)
h_off = npr.uniform(0., 1. - bbox_h)
return np.array([w_off, h_off, w_off + bbox_w, h_off + bbox_h])
@staticmethod
def check_center(sample_box, boxes):
x_ctr = (boxes[:, 2] + boxes[:, 0]) / 2.0
y_ctr = (boxes[:, 3] + boxes[:, 1]) / 2.0
# Keep the ground-truth box whose center is in the sample box.
keep = np.where((x_ctr >= sample_box[0]) & (x_ctr <= sample_box[2]) &
(y_ctr >= sample_box[1]) & (y_ctr <= sample_box[3]))[0]
return len(keep) > 0
@staticmethod
def check_overlap(sample_box, boxes, param):
overlaps_range = param.get('overlaps_range', (0.0, 1.0))
if overlaps_range[0] == 0.0 and overlaps_range[1] == 1.0:
return True
ovmax = boxes_iou(sample_box[None, :], boxes[:, :4]).max()
if ovmax < overlaps_range[0] or ovmax > overlaps_range[1]:
return False
return True
def generate_batch_samples(self, boxes):
sample_boxes = []
for sampler in self.samplers:
found, max_trials = 0, sampler.get('max_trials', 1)
for _ in range(max_trials):
if found >= 1:
break
sample_box = self.generate_sample(sampler)
if not self.check_overlap(sample_box, boxes, sampler):
continue
if not self.check_center(sample_box, boxes):
continue
found += 1
sample_boxes.append(sample_box)
return sample_boxes
@classmethod
def crop(cls, img, crop_box, boxes=None):
h, w = img.shape[:2]
w_offset = int(crop_box[0] * w)
h_offset = int(crop_box[1] * h)
crop_w = int((crop_box[2] - crop_box[0]) * w)
crop_h = int((crop_box[3] - crop_box[1]) * h)
img = img[h_offset:h_offset + crop_h, w_offset:w_offset + crop_w]
if boxes is not None:
x_ctr = (boxes[:, 2] + boxes[:, 0]) / 2.0
y_ctr = (boxes[:, 3] + boxes[:, 1]) / 2.0
keep = np.where((x_ctr >= crop_box[0]) & (x_ctr <= crop_box[2]) &
(y_ctr >= crop_box[1]) & (y_ctr <= crop_box[3]))[0]
boxes = boxes[keep]
boxes[:, (0, 2)] = boxes[:, (0, 2)] * w - w_offset
boxes[:, (1, 3)] = boxes[:, (1, 3)] * h - h_offset
boxes = clip_boxes(boxes, (crop_h, crop_w))
boxes[:, (0, 2)] /= crop_w
boxes[:, (1, 3)] /= crop_h
return img, boxes
def __call__(self, img, boxes):
sample_boxes = self.generate_batch_samples(boxes)
if len(sample_boxes) > 0:
rand_box = sample_boxes[npr.randint(len(sample_boxes))]
img, boxes = self.crop(img, rand_box, boxes)
return img, boxes
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import json
import os
import sys
import numpy as np
from seetadet.core.config import cfg
from seetadet.utils import mask as mask_util
from seetadet.utils.pycocotools import mask as mask_tools
from seetadet.utils.pycocotools.coco import COCO
from seetadet.utils.pycocotools.cocoeval import COCOeval
class COCOEvaluator(object):
"""Evaluator for MS COCO dataset."""
def __init__(self, imdb, ann_file=None):
self.imdb = imdb
if ann_file is not None and os.path.exists(ann_file):
self.coco = COCO(ann_file)
cats = self.coco.loadCats(self.coco.getCatIds())
self.class_to_cat_id = dict(zip([c['name'] for c in cats],
self.coco.getCatIds()))
else:
self.coco = None
self.class_to_cat_id = None
def bbox_results_one_category(self, boxes, cat_id, gt_recs):
ix, results = 0, []
for image_name, rec in gt_recs.items():
detections = boxes[ix]
ix += 1
if isinstance(detections, list) and len(detections) == 0:
continue
detections = detections.astype('float64')
scores = detections[:, -1]
xs = detections[:, 0]
ys = detections[:, 1]
ws = detections[:, 2] - xs + 1
hs = detections[:, 3] - ys + 1
results.extend([{
'image_id': self.get_image_id(image_name),
'category_id': cat_id,
'bbox': [xs[k], ys[k], ws[k], hs[k]],
'score': scores[k],
} for k in range(detections.shape[0])])
return results
def do_bbox_eval(self, res_file):
coco_dt = self.coco.loadRes(res_file)
coco_eval = COCOeval(self.coco, coco_dt, 'bbox')
coco_eval.evaluate()
coco_eval.accumulate()
self.print_coco_eval_results(coco_eval)
def do_segm_eval(self, res_file):
coco_dt = self.coco.loadRes(res_file)
coco_eval = COCOeval(self.coco, coco_dt, 'segm')
coco_eval.evaluate()
coco_eval.accumulate()
self.print_coco_eval_results(coco_eval)
@staticmethod
def encode_masks(masks, boxes, im_h, im_w):
mask_image = mask_util.project_masks(
masks, boxes, im_h, im_w,
cfg.TEST.BINARY_THRESH)
return mask_tools.encode(mask_image)
@staticmethod
def get_prefix(type='bbox'):
if type == 'bbox':
return 'detections'
elif type == 'segm':
return 'segmentations'
elif type == 'kpt':
return 'keypoints'
return ''
@staticmethod
def get_annotations_file(results_folder, type='bbox'):
# experiments/model_id/annotations/[GT]detections.json
filename = '[GT]' + COCOEvaluator.get_prefix(type) + '.json'
if not os.path.exists(results_folder):
os.makedirs(results_folder)
return os.path.join(results_folder, filename)
@staticmethod
def get_image_id(image_name):
image_id = image_name.split('_')[-1].split('.')[0]
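# e.g., 'COCO_val2014_000000000139' -> 139; non-numeric names fall through unchanged.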
try:
return int(image_id)
except ValueError:
return image_name
def get_results_file(self, results_folder, type='bbox'):
# experiments/model_id/results/detections.json
filename = self.get_prefix(type) + self.imdb.comp_id + '.json'
if not os.path.exists(results_folder):
os.makedirs(results_folder)
return os.path.join(results_folder, filename)
def print_coco_eval_results(self, coco_eval, iou_thr=(0.5, 0.95)):
def get_thr_ind(coco_eval, thr):
ind = np.where((coco_eval.params.iouThrs > thr - 1e-5) &
(coco_eval.params.iouThrs < thr + 1e-5))[0][0]
iou_thr = coco_eval.params.iouThrs[ind]
assert np.isclose(iou_thr, thr)
return ind
ind_lo = get_thr_ind(coco_eval, iou_thr[0])
ind_hi = get_thr_ind(coco_eval, iou_thr[1])
# Precision has dims (iou, recall, cls, area range, max dets)
# Area range index 0: all area ranges
# Max dets index 2: 100 per image
precision_res = coco_eval.eval['precision']
precision = precision_res[ind_lo:(ind_hi + 1), :, :, 0, 2]
ap_default = np.mean(precision[precision > -1])
print('~~~~ Mean and per-category AP @ IoU=[{:.2f},{:.2f}] '
'~~~~'.format(iou_thr[0], iou_thr[1]))
print('{:.1f}'.format(100 * ap_default))
for cls_ind, cls in enumerate(self.imdb.classes):
if cls == '__background__':
continue
precision = precision_res[ind_lo:(ind_hi + 1), :, cls_ind - 1, 0, 2]
ap = np.mean(precision[precision > -1])
print('{:.1f}'.format(100 * ap))
print('~~~~ Summary metrics ~~~~')
coco_eval.summarize()
def segm_results_one_category(self, boxes, masks, cat_id, gt_recs):
def filter_boxes(dets):
boxes = dets[:, :4]
ws = boxes[:, 2] - boxes[:, 0]
hs = boxes[:, 3] - boxes[:, 1]
keep = np.where((ws >= 1) & (hs >= 1))[0]
return keep
results = []
ix = 0
for image_name, rec in gt_recs.items():
dets = boxes[ix].astype(np.float64)
msks = masks[ix]
ix += 1
keep = filter_boxes(dets)
im_h, im_w = rec['height'], rec['width']
if len(keep) == 0:
continue
scores = dets[:, -1]
mask_encode = self.encode_masks(
msks[keep], dets[keep, :4], im_h, im_w)
for k in range(dets[keep].shape[0]):
rle = mask_encode[k]
if sys.version_info >= (3, 0):
rle['counts'] = rle['counts'].decode()
results.append({
'image_id': self.get_image_id(image_name),
'category_id': cat_id,
'segmentation': rle,
'score': scores[k],
})
return results
def write_bbox_annotations(self, gt_recs, output_dir):
# Build images
dataset = {'images': []}
for image_name, rec in gt_recs.items():
dataset['images'].append({
'file_name': image_name + '.jpg',
'id': self.get_image_id(image_name),
'height': rec['height'], 'width': rec['width'],
})
# Build categories
dataset['categories'] = []
for cls in self.imdb.classes:
if cls == '__background__':
continue
dataset['categories'].append({
'name': cls,
'id': self.imdb.class_to_ind[cls],
})
# Build annotations
dataset['annotations'] = []
ann_id = 0
for image_name, rec in gt_recs.items():
for obj in rec['objects']:
x, y = obj['bbox'][0], obj['bbox'][1]
w, h = obj['bbox'][2] - x + 1, obj['bbox'][3] - y + 1
dataset['annotations'].append({
'id': str(ann_id),
'bbox': [x, y, w, h],
'area': w * h,
'iscrowd': obj['difficult'],
'image_id': self.get_image_id(image_name),
'category_id': self.imdb.class_to_ind[obj['name']],
})
ann_id += 1
ann_file = self.get_annotations_file(output_dir, 'bbox')
with open(ann_file, 'w') as f:
json.dump(dataset, f)
return ann_file
def write_bbox_results(self, all_boxes, gt_recs, output_dir):
filename = self.get_results_file(output_dir)
results = []
for cls_ind, cls in enumerate(self.imdb.classes):
if cls == '__background__':
continue
print('Collecting {} results ({:d}/{:d})'
.format(cls, cls_ind, self.imdb.num_classes - 1))
cat_id = self.class_to_cat_id[cls]
results.extend(self.bbox_results_one_category(
all_boxes[cls_ind], cat_id, gt_recs))
print('Writing results json to {}'.format(filename))
with open(filename, 'w') as fid:
json.dump(results, fid)
return filename
def write_segm_annotations(self, gt_recs, output_dir):
# Build images
dataset = {'images': []}
for image_name, rec in gt_recs.items():
dataset['images'].append({
'file_name': image_name + '.jpg',
'id': self.get_image_id(image_name),
'height': rec['height'], 'width': rec['width'],
})
# Build categories
dataset['categories'] = []
for cls in self.imdb.classes:
if cls == '__background__':
continue
dataset['categories'].append({
'name': cls,
'id': self.imdb.class_to_ind[cls],
})
# Build annotations
dataset['annotations'] = []
ann_id = 0
for image_name, rec in gt_recs.items():
mask_size = (rec['height'], rec['width'])
for obj in rec['objects']:
x, y = obj['bbox'][0], obj['bbox'][1]
w, h = obj['bbox'][2] - x + 1, obj['bbox'][3] - y + 1
if 'mask' in obj:
segm = {'size': mask_size, 'counts': obj['mask']}
if sys.version_info >= (3, 0):
segm['counts'] = segm['counts'].decode()
elif 'polygons' in obj:
segm = []
for poly in obj['polygons']:
if isinstance(poly, np.ndarray):
segm.append(poly.tolist())
else:
segm.append(poly)
else:
raise ValueError('Expected mask-rle or polygons.')
dataset['annotations'].append({
'id': str(ann_id),
'bbox': [x, y, w, h],
'area': w * h,
'segmentation': segm,
'iscrowd': obj['difficult'],
'image_id': self.get_image_id(image_name),
'category_id': self.imdb.class_to_ind[obj['name']],
})
ann_id += 1
ann_file = self.get_annotations_file(output_dir, 'segm')
with open(ann_file, 'w') as f:
json.dump(dataset, f)
return ann_file
def write_segm_results(self, all_boxes, all_masks, gt_recs, output_dir):
filename = self.get_results_file(output_dir, 'segm')
results = []
for cls_ind, cls in enumerate(self.imdb.classes):
if cls == '__background__':
continue
print('Collecting {} results ({:d}/{:d})'
.format(cls, cls_ind, self.imdb.num_classes - 1))
cat_id = self.class_to_cat_id[cls]
results.extend(self.segm_results_one_category(
all_boxes[cls_ind], all_masks[cls_ind], cat_id, gt_recs))
print('Writing results json to {}'.format(filename))
with open(filename, 'w') as fid:
json.dump(results, fid)
return filename
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import uuid
from seetadet.core.config import cfg
from seetadet.datasets.coco_evaluator import COCOEvaluator
from seetadet.datasets.voc_evaluator import VOCEvaluator
class Dataset(object):
"""The base dataset class."""
def __init__(self, source):
self._source = source
self._num_images = 0
self._classes = cfg.MODEL.CLASSES
self._class_to_ind = self._class_to_cat_id = \
dict(zip(self.classes, range(self.num_classes)))
self._salt = str(uuid.uuid4())
self.config = {'cleanup': True, 'use_salt': True}
@property
def classes(self):
return self._classes
@property
def class_to_ind(self):
return self._class_to_ind
@property
def cls(self):
return type(self)
@property
def comp_id(self):
return '_' + self._salt if self.config['use_salt'] else ''
@property
def num_classes(self):
return len(self._classes)
@property
def num_images(self):
return self._num_images
@property
def source(self):
return self._source
def competition_mode(self, on):
if on:
self.config['use_salt'] = False
self.config['cleanup'] = False
else:
self.config['use_salt'] = True
self.config['cleanup'] = True
def dump_detections(self, all_boxes, output_dir):
pass
def evaluate_detections(self, all_boxes, gt_recs, output_dir):
protocol = cfg.TEST.PROTOCOL
if 'voc' in protocol:
evaluator = VOCEvaluator(self)
evaluator.write_bbox_results(all_boxes, gt_recs, output_dir)
if '!' not in protocol:
for ovr in (0.5,):
evaluator.do_bbox_eval(
gt_recs,
output_dir,
iou=ovr,
use_07_metric='2007' in protocol,
)
elif 'coco' in protocol:
ann_file = cfg.TEST.JSON_FILE
evaluator = COCOEvaluator(self, ann_file)
if evaluator.coco is None:
ann_file = evaluator \
.write_bbox_annotations(
gt_recs, output_dir)
evaluator = COCOEvaluator(self, ann_file)
res_file = evaluator.write_bbox_results(
all_boxes, gt_recs, output_dir)
if '!' not in protocol:
evaluator.do_bbox_eval(res_file)
def evaluate_segmentations(self, all_boxes, all_masks, gt_recs, output_dir):
protocol = cfg.TEST.PROTOCOL
if 'voc' in protocol:
evaluator = VOCEvaluator(self)
evaluator.write_segm_results(all_boxes, all_masks, output_dir)
if '!' not in protocol:
for ovr in (0.5,):
evaluator.do_segm_eval(
gt_recs,
output_dir,
iou=ovr,
use_07_metric='2007' in protocol,
)
elif 'coco' in protocol:
ann_file = cfg.TEST.JSON_FILE
evaluator = COCOEvaluator(self, ann_file)
if evaluator.coco is None:
ann_file = evaluator \
.write_segm_annotations(
gt_recs, output_dir)
evaluator = COCOEvaluator(self, ann_file)
res_file = evaluator.write_segm_results(
all_boxes, all_masks, gt_recs, output_dir)
if '!' not in protocol:
evaluator.do_segm_eval(res_file)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
from seetadet.datasets import kpl_dataset
def get_dataset(name):
"""Get a dataset by name."""
keys = name.split('://')
if len(keys) >= 2:
cls, source = keys
if cls not in _GLOBAL_REGISTERED_DATASET:
raise KeyError('Unknown dataset: ' + cls)
return _GLOBAL_REGISTERED_DATASET[cls](source)
elif os.path.exists(name):
return _GLOBAL_REGISTERED_DATASET['default'](name)
else:
raise ValueError('Illegal dataset: ' + name)
def list_dataset():
"""List all registered dataset."""
return _GLOBAL_REGISTERED_DATASET.keys()
_GLOBAL_REGISTERED_DATASET = {
'default': lambda source:
kpl_dataset.KPLRecordDataset(source),
}
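# Example usage (hypothetical source path):
#   dataset = get_dataset('default:///data/train_record')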
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import dragon
from seetadet.core.config import cfg
from seetadet.datasets.dataset import Dataset
class KPLRecordDataset(Dataset):
def __init__(self, source):
super(KPLRecordDataset, self).__init__(source)
self._num_images = self.cls(self.source).size
@property
def cls(self):
return dragon.io.KPLRecordDataset
def dump_detections(self, all_boxes, output_dir):
dataset = self.cls(self.source)
for file in ('root.data', 'root.index', 'root.meta'):
file = os.path.join(output_dir, file)
if os.path.exists(file):
os.remove(file)
writer = dragon.io.KPLRecordWriter(output_dir, dataset.protocol)
for i in range(len(dataset)):
example = dataset.get()
example['object'] = []
for cls_ind, cls in enumerate(self.classes):
if cls == '__background__':
continue
detections = all_boxes[cls_ind][i]
if len(detections) == 0:
continue
for k in range(detections.shape[0]):
if detections[k, -1] < cfg.VIS_TH:
continue
example['object'].append({
'name': cls,
'xmin': float(detections[k][0]),
'ymin': float(detections[k][1]),
'xmax': float(detections[k][2]),
'ymax': float(detections[k][3]),
'difficult': 0,
})
writer.write(example)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# Codes are based on:
#
# <https://github.com/rbgirshick/py-faster-rcnn/blob/master/lib/datasets/voc_eval.py>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import cv2
import numpy as np
from seetadet.core.config import cfg
from seetadet.utils import boxes as box_util
from seetadet.utils.env import pickle
from seetadet.utils.mask import mask_overlap
from seetadet.utils.pycocotools import mask_utils
def voc_ap(rec, prec, use_07_metric=False):
"""Compute VOC AP given precision and recall."""
if use_07_metric:
# 11 point metric
ap = 0.
for t in np.arange(0., 1.1, 0.1):
if np.sum(rec >= t) == 0:
p = 0
else:
p = np.max(prec[rec >= t])
ap = ap + p / 11.
else:
# Correct AP calculation
# First append sentinel values at the end
mrec = np.concatenate(([0.], rec, [1.]))
mpre = np.concatenate(([0.], prec, [0.]))
# Compute the precision envelope
for i in range(mpre.size - 1, 0, -1):
mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])
# To calculate area under PR curve, look for points
# where X axis (recall) changes value
i = np.where(mrec[1:] != mrec[:-1])[0]
# And sum (\Delta recall) * prec
ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])
return ap
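# Quick sanity check with hypothetical values:
#   voc_ap(np.array([0.5, 1.0]), np.array([1.0, 0.5]))  # -> 0.75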
def voc_bbox_eval(
det_file,
gt_recs,
cls_name,
iou=0.5,
use_07_metric=False,
):
class_recs, n_pos = {}, 0
for image_name, rec in gt_recs.items():
objects = [obj for obj in rec['objects'] if obj['name'] == cls_name]
bbox = np.array([x['bbox'] for x in objects])
diff = np.array([x['difficult'] for x in objects]).astype(bool)
det = [False] * len(objects)
n_pos = n_pos + sum(~diff)
class_recs[image_name] = {'bbox': bbox, 'difficult': diff, 'det': det}
# Read detections
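# Each line reads "<image_id> <score> <x1> <y1> <x2> <y2>"
# (see VOCEvaluator.write_bbox_results).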
with open(det_file, 'r') as f:
lines = f.readlines()
splitlines = [x.strip().split(' ') for x in lines]
image_ids = [x[0] for x in splitlines]
confidence = np.array([float(x[1]) for x in splitlines])
BB = np.array([[float(z) for z in x[2:]] for x in splitlines])
# Avoid IndexError if detecting nothing
if len(BB) == 0:
return 0, 0, -1
# Sort by confidence
sorted_ind = np.argsort(-confidence)
BB = BB[sorted_ind, :]
image_ids = [image_ids[x] for x in sorted_ind]
# Go down detections and mark TPs and FPs
nd = len(image_ids)
tp, fp = np.zeros(nd), np.zeros(nd)
def compute_overlaps(bb, BBGT):
ixmin = np.maximum(BBGT[:, 0], bb[0])
iymin = np.maximum(BBGT[:, 1], bb[1])
ixmax = np.minimum(BBGT[:, 2], bb[2])
iymax = np.minimum(BBGT[:, 3], bb[3])
iw = np.maximum(ixmax - ixmin + 1., 0.)
ih = np.maximum(iymax - iymin + 1., 0.)
inters = iw * ih
uni = ((bb[2] - bb[0] + 1.) *
(bb[3] - bb[1] + 1.) +
(BBGT[:, 2] - BBGT[:, 0] + 1.) *
(BBGT[:, 3] - BBGT[:, 1] + 1.) - inters)
return inters / uni
for d in range(nd):
R = class_recs[image_ids[d]]
bb = BB[d, :].astype(float)
ov_max, j_max = -np.inf, 0
BBGT = R['bbox'].astype(float)
if BBGT.size > 0:
overlaps = compute_overlaps(bb, BBGT)
ov_max = np.max(overlaps)
j_max = np.argmax(overlaps)
if ov_max > iou:
if not R['difficult'][j_max]:
if not R['det'][j_max]:
tp[d] = 1.
R['det'][j_max] = 1
else:
fp[d] = 1.
else:
fp[d] = 1.
# Compute precision recall
fp = np.cumsum(fp)
tp = np.cumsum(tp)
rec = tp / float(n_pos)
# Avoid divide by zero in case the first detection matches a difficult ground truth.
prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)
ap = voc_ap(rec, prec, use_07_metric)
return rec, prec, ap
def voc_segm_eval(
det_file,
seg_file,
gt_recs,
cls_name,
iou=0.5,
use_07_metric=False,
):
# 0. Constants
M = cfg.MRCNN.RESOLUTION
binary_thresh = cfg.TEST.BINARY_THRESH
scale = (M + 2.) / M
padded_mask = np.zeros((M + 2, M + 2), dtype=np.float32)
# 1. Get bbox & mask ground truths
image_names, class_recs, n_pos = [], {}, 0
for image_name, rec in gt_recs.items():
objects = [obj for obj in rec['objects'] if obj['name'] == cls_name]
bbox = np.array([x['bbox'] for x in objects])
mask = np.array([
mask_utils.bytes2img(
x['mask'],
rec['height'],
rec['width']
) for x in objects]
)
difficult = np.array([x['difficult'] for x in objects]).astype(bool)
det = [False] * len(objects)
n_pos = n_pos + sum(~difficult)
class_recs[image_name] = {
'bbox': bbox,
'mask': mask,
'difficult': difficult,
'det': det
}
image_names.append(image_name)
# 2. Get predict pickle file for this class
with open(det_file, 'rb') as f:
boxes_pkl = pickle.load(f)
with open(seg_file, 'rb') as f:
masks_pkl = pickle.load(f)
# 3. Pre-compute number of total instances to allocate memory
num_images = len(gt_recs)
box_num = 0
for im_i in range(num_images):
box_num += len(boxes_pkl[im_i])
# Avoid IndexError if detecting nothing; the caller expects a scalar AP.
if box_num == 0:
return -1
# 4. Re-organize all the predicted boxes
new_boxes = np.zeros((box_num, 5))
new_masks = np.zeros((box_num, M, M))
new_images = []
cnt = 0
for image_ind in range(num_images):
boxes = boxes_pkl[image_ind]
masks = masks_pkl[image_ind]
num_instance = len(boxes)
for box_ind in range(num_instance):
new_boxes[cnt] = boxes[box_ind]
new_masks[cnt] = masks[box_ind]
new_images.append(image_names[image_ind])
cnt += 1
# 5. Rearrange boxes according to their scores
seg_scores = new_boxes[:, -1]
keep_inds = np.argsort(-seg_scores)
new_boxes = new_boxes[keep_inds, :]
new_masks = new_masks[keep_inds, :, :]
num_pred = new_boxes.shape[0]
# 6. Calculate t/f positive
fp = np.zeros((num_pred, 1))
tp = np.zeros((num_pred, 1))
ref_boxes = box_util.expand_boxes(new_boxes, scale)
ref_boxes = ref_boxes.astype(np.int32)
for i in range(num_pred):
image_name = new_images[keep_inds[i]]
if image_name not in class_recs:
print('Warning: {} does not exist in the ground-truths.'.format(image_name))
fp[i] = 1
continue
R = class_recs[image_name]
im_h = gt_recs[image_name]['height']
im_w = gt_recs[image_name]['width']
# Decode mask
ref_box = ref_boxes[i, :4]
mask = new_masks[i]
padded_mask[1:-1, 1:-1] = mask[:, :]
w = ref_box[2] - ref_box[0] + 1
h = ref_box[3] - ref_box[1] + 1
w = np.maximum(w, 1)
h = np.maximum(h, 1)
mask = cv2.resize(padded_mask, (w, h))
mask = np.array(mask > binary_thresh, dtype=np.uint8)
x1 = max(ref_box[0], 0)
y1 = max(ref_box[1], 0)
x2 = min(ref_box[2] + 1, im_w)
y2 = min(ref_box[3] + 1, im_h)
pred_mask = mask[(y1 - ref_box[1]): (y2 - ref_box[1]),
(x1 - ref_box[0]): (x2 - ref_box[0])]
# Calculate max region overlap
ovmax, jmax = -1, -1
for j in range(len(R['det'])):
gt_mask_bound = R['bbox'][j].astype(int)
pred_mask_bound = new_boxes[i, :4].astype(int)
crop_mask = R['mask'][j][gt_mask_bound[1]:gt_mask_bound[3] + 1,
gt_mask_bound[0]:gt_mask_bound[2] + 1]
ov = mask_overlap(gt_mask_bound,
pred_mask_bound,
crop_mask,
pred_mask)
if ov > ovmax:
ovmax = ov
jmax = j
if ovmax > iou:
if not R['difficult'][jmax]:
if not R['det'][jmax]:
tp[i] = 1.
R['det'][jmax] = 1
else:
fp[i] = 1.
else:
fp[i] = 1
# 7. Calculate precision
fp = np.cumsum(fp)
tp = np.cumsum(tp)
rec = tp / float(n_pos)
# Avoid divide by zero in case the first detection matches a difficult ground truth.
prec = tp / np.maximum(fp + tp, np.finfo(np.float64).eps)
ap = voc_ap(rec, prec, use_07_metric=use_07_metric)
return ap
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import numpy as np
from seetadet.datasets import voc_eval
from seetadet.utils.env import pickle
class VOCEvaluator(object):
"""Evaluator for PASCAL VOC dataset."""
def __init__(self, imdb):
self.imdb = imdb
def do_bbox_eval(
self,
gt_recs,
output_dir,
iou=0.5,
use_07_metric=True,
):
aps = []
print('~~~~~~ Evaluation IoU@%s ~~~~~~' % str(iou))
print('VOC07 metric? ' + ('Yes' if use_07_metric else 'No'))
for i, cls in enumerate(self.imdb.classes):
if cls == '__background__':
continue
det_file = self.get_results_file(output_dir).format(cls)
rec, prec, ap = \
voc_eval.voc_bbox_eval(
det_file,
gt_recs, cls,
iou=iou,
use_07_metric=use_07_metric,
)
if ap > 0:
aps += [ap]
print('AP for {} = {:.4f}'.format(cls, ap))
print('Mean AP = {:.4f}\n'.format(np.mean(aps)))
def do_segm_eval(
self,
gt_recs,
output_dir,
iou=0.5,
use_07_metric=True,
):
aps = []
print('~~~~~~ Evaluation IoU@%s ~~~~~~' % str(iou))
print('VOC07 metric? ' + ('Yes' if use_07_metric else 'No'))
for i, cls in enumerate(self.imdb.classes):
if cls == '__background__':
continue
segm_filename = self.get_results_file(output_dir, 'segm').format(cls)
bbox_filename = segm_filename.replace('segmentations', 'detections')
ap = voc_eval.voc_segm_eval(
bbox_filename,
segm_filename,
gt_recs, cls,
iou=iou,
use_07_metric=use_07_metric,
)
if ap > 0:
aps += [ap]
print('AP for {} = {:.4f}'.format(cls, ap))
print('Mean AP = {:.4f}\n'.format(np.mean(aps)))
@staticmethod
def get_prefix(type='bbox'):
if type == 'bbox':
return 'detections'
elif type == 'segm':
return 'segmentations'
elif type == 'kpt':
return 'keypoints'
return ''
def get_results_file(self, results_folder, type='bbox'):
# experiments/model_id/results/detections_<comp_id>_<class_name>.txt
if type == 'bbox':
filename = self.get_prefix(type) + self.imdb.comp_id + '_{:s}.txt'
elif type == 'segm':
filename = self.get_prefix(type) + self.imdb.comp_id + '_{:s}.pkl'
else:
raise ValueError('Type of results can be either bbox or segm.')
if not os.path.exists(results_folder):
os.makedirs(results_folder)
return os.path.join(results_folder, filename)
def write_bbox_results(self, all_boxes, gt_recs, output_dir):
for cls_ind, cls in enumerate(self.imdb.classes):
if cls == '__background__':
continue
print('Writing {} VOC format bbox results'.format(cls))
filename = self.get_results_file(output_dir).format(cls)
with open(filename, 'wt') as f:
ix = 0
for image_id, rec in gt_recs.items():
dets = all_boxes[cls_ind][ix]
ix += 1
if len(dets) == 0:
continue
for k in range(dets.shape[0]):
content = '{:s} {:.3f} {:.1f} {:.1f} {:.1f} {:.1f}' \
.format(image_id, dets[k, -1],
dets[k, 0] + 1, dets[k, 1] + 1,
dets[k, 2] + 1, dets[k, 3] + 1)
if dets.shape[1] == 6:
content += ' {:.2f}'.format(dets[k, 4])
f.write(content + '\n')
def write_segm_results(self, all_boxes, all_masks, output_dir):
for cls_ind, cls in enumerate(self.imdb.classes):
if cls == '__background__':
continue
print('Writing {} VOC format segm results'.format(cls))
segm_filename = self.get_results_file(output_dir, 'segm').format(cls)
bbox_filename = segm_filename.replace('segmentations', 'detections')
with open(bbox_filename, 'wb') as f:
pickle.dump(all_boxes[cls_ind], f, pickle.HIGHEST_PROTOCOL)
with open(segm_filename, 'wb') as f:
pickle.dump(all_masks[cls_ind], f, pickle.HIGHEST_PROTOCOL)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""AirNet backbone."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import dragon.vm.torch as torch
from seetadet.core import registry
from seetadet.core.config import cfg
from seetadet.modules import init
from seetadet.modules import nn
class ResBlock(nn.Module):
"""The resnet block."""
def __init__(self, dim_in, dim_out, stride=1, downsample=None):
super(ResBlock, self).__init__()
norm = cfg.MODEL.BACKBONE_NORM
self.conv1 = nn.Conv3x3(dim_in, dim_out, stride)
self.bn1 = nn.get_norm(norm, dim_out)
self.conv2 = nn.Conv3x3(dim_out, dim_out)
self.bn2 = nn.get_norm(norm, dim_out)
self.downsample = downsample
self.relu = nn.ReLU(inplace=True)
def forward(self, x):
identity = x
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.bn2(out)
if self.downsample is not None:
identity = self.downsample(x)
out += identity
out = self.relu(out)
return out
class InceptionBlock(nn.Module):
"""The inception block."""
def __init__(self, dim_in, dim_out):
super(InceptionBlock, self).__init__()
norm = cfg.MODEL.BACKBONE_NORM
self.conv1 = nn.Conv1x1(dim_in, dim_out)
self.bn1 = nn.get_norm(norm, dim_out)
self.conv2 = nn.Conv3x3(dim_out, dim_out // 2)
self.bn2 = nn.get_norm(norm, dim_out // 2)
self.conv3a = nn.Conv3x3(dim_out // 2, dim_out)
self.bn3a = nn.get_norm(norm, dim_out)
self.conv3b = nn.Conv3x3(dim_out, dim_out)
self.bn3b = nn.get_norm(norm, dim_out)
self.conv4 = nn.Conv3x3(dim_out * 3, dim_out)
self.bn4 = nn.get_norm(norm, dim_out)
self.relu = nn.ReLU(inplace=True)
def forward(self, x):
identity = x
out = self.conv1(x)
out_1x1 = self.bn1(out)
out_1x1 = self.relu(out_1x1)
out = self.conv2(out_1x1)
out = self.bn2(out)
out = self.relu(out)
out = self.conv3a(out)
out_3x3_a = self.bn3a(out)
out_3x3_a = self.relu(out_3x3_a)
out = self.conv3b(out_1x1)
out_3x3_b = self.bn3b(out)
out_3x3_b = self.relu(out_3x3_b)
out = torch.cat([out_1x1, out_3x3_a, out_3x3_b], 1)
out = self.conv4(out)
out = self.bn4(out)
out += identity
out = self.relu(out)
return out
class AirNet(nn.Module):
"""The airnet class."""
def __init__(self, model_cfg):
super(AirNet, self).__init__()
dim_in, dims, features = 64, [64, 128, 256, 384], []
self.conv1 = nn.Conv2d(3, 64, kernel_size=7,
stride=2, padding=3, bias=False)
self.bn1 = nn.get_norm(cfg.MODEL.BACKBONE_NORM, dim_in)
self.relu = nn.ReLU(inplace=True)
self.maxpool = nn.MaxPool2d(kernel_size=2, stride=2)
self.feature_dims = collections.OrderedDict(stem=64)
for i, v, dim_out in zip(range(4), model_cfg, dims):
stride = 1 if i == 0 else 2
downsample = nn.Sequential(
nn.Conv1x1(dim_in, dim_out, stride=stride),
nn.get_norm(cfg.MODEL.BACKBONE_NORM, dim_out),
)
features.append(ResBlock(dim_in, dim_out, stride, downsample))
for j in range(1, len(v)):
if v[j] == 'r':
features.append(ResBlock(dim_out, dim_out))
elif v[j] == 'i':
features.append(InceptionBlock(dim_out, dim_out))
else:
raise ValueError('Unknown block flag: ' + v[j])
setattr(self, 'layer%d' % (i + 1), nn.Sequential(*features[-len(v):]))
self.feature_dims[id(features[-1])] = dim_in = dim_out
self.features = features
self.reset_parameters()
def reset_parameters(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
init.kaiming_normal(m.weight, mode='fan_out')
def forward(self, x):
x = self.conv1(x)
x = self.bn1(x)
x = self.relu(x)
x = self.maxpool(x)
outputs = [None]
for layer in self.features:
x = layer(x)
if self.feature_dims.get(id(layer)):
outputs.append(x)
return outputs
def airnet(num_layers=5):
model_cfg = (('r', 'r'), ('r', 'i'), ('r', 'i'), ('r', 'i'))
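# 'r' -> ResBlock, 'i' -> InceptionBlock; each tuple configures one stage.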
return AirNet(model_cfg[:num_layers - 1])  # The stem counts as one layer.
registry.backbone.register('airnet', airnet)
registry.backbone.register('airnet_3b', airnet, num_layers=3)
registry.backbone.register('airnet_4b', airnet, num_layers=4)
registry.backbone.register('airnet_5b', airnet, num_layers=5)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Generic detector."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import importlib
import dragon.vm.torch as torch
from seetadet import modeling as models
from seetadet.core.config import cfg
from seetadet.core import registry
from seetadet.modules import nn
from seetadet.modules import utils as module_util
from seetadet.modules import vision
from seetadet.utils import logger
class Detector(nn.Module):
"""Organize the detection pipelines."""
def __init__(self):
super(Detector, self).__init__()
model_type = cfg.MODEL.TYPE
backbone = cfg.MODEL.BACKBONE.lower().split('.')
conv_body, conv_modules = backbone[0], backbone[1:]
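# e.g., cfg.MODEL.BACKBONE = 'resnet50.fpn' (name assumed for illustration)
# selects the 'resnet50' conv body plus an FPN feature enhancer.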
# DataLoader
self.data_loader = None
self.data_loader_cls = getattr(importlib.import_module(
'seetadet.algo.{}'.format(model_type)), 'DataLoader')
self.image_norm = vision.ImageNormalizer()
# FeatureExtractor
self.conv_body = registry.backbone.get(conv_body)()
feature_dims = list(self.conv_body.feature_dims.values())
# FeatureEnhancer
if 'fpn' in conv_modules:
self.fpn = models.FPN(feature_dims)
feature_dims = [self.fpn.feature_dim]
# DetectionHead
if 'rcnn' in model_type:
self.rpn = models.RPN(feature_dims[0])
if 'faster' in model_type:
self.rcnn = models.FastRCNN(feature_dims[0])
elif 'mask' in model_type:
self.rcnn = models.MaskRCNN(feature_dims[0])
else:
raise ValueError('Unsupported model: ' + model_type)
elif model_type == 'retinanet':
self.retinanet = models.RetinaNet(feature_dims[0])
elif model_type == 'ssd':
self.ssd = models.SSD(feature_dims)
else:
raise ValueError('Unsupported model: ' + model_type)
def load_weights(self, weights):
"""Load the state dict of this detector.
Note that the mismatched keys will be ignored.
Parameters
----------
weights : str
The path of the weights file.
"""
self.load_state_dict(torch.load(weights), strict=False)
def forward(self, inputs=None):
"""Compute the detection outputs.
Parameters
----------
inputs : dict, optional
The inputs.
Returns
-------
dict
The outputs.
"""
# Get the inputs
if inputs is None:
if self.data_loader is None:
self.data_loader = self.data_loader_cls()
inputs = self.data_loader()
# Extract features
image = self.image_norm(inputs['image'])
features = self.conv_body(image)
# Apply the FPN to enhance features if necessary
if hasattr(self, 'fpn'):
features = self.fpn(features)
# Collect detection outputs
outputs = collections.OrderedDict()
# Features -> RPN -> R-CNN
if hasattr(self, 'rpn'):
outputs.update(self.rpn(features=features, **inputs))
outputs.update(
self.rcnn(
features=features,
rpn_cls_score=outputs['rpn_cls_score'],
rpn_bbox_pred=outputs['rpn_bbox_pred'],
**inputs
)
)
# Features -> RetinaNet
if hasattr(self, 'retinanet'):
outputs.update(self.retinanet(features=features, **inputs))
# Features -> SSD
if hasattr(self, 'ssd'):
outputs.update(self.ssd(features=features, **inputs))
return outputs
def optimize_for_inference(self):
"""Optimize the graph for the inference."""
# Optimization #1: LayerFusion
fusions = set()
last_module = None
for module in self.modules():
pass_key, pass_fn = module_util \
.get_fusion_pass(last_module, module)
if pass_fn is not None:
fusions.add(pass_key)
pass_fn(last_module, module)
last_module = module
if len(fusions) > 0:
logger.info('Enable fusions: ' + ', '.join(fusions))
def new_detector(device, weights=None, training=False):
detector = Detector().cuda(device)
if weights is not None:
detector.load_weights(weights)
if not training:
detector.eval()
detector.optimize_for_inference()
# Enable fp16 inference if requested; this gives a modest
# speedup when TensorCores are available.
if cfg.MODEL.PRECISION.lower() == 'float16':
detector.half()
return detector
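# Example usage (hypothetical weights path):
#   detector = new_detector(0, weights='/path/to/model_final.pkl')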
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""EfficientNet backbone."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import functools
import math
from seetadet.core import registry
from seetadet.modeling.mobilenet_v3 import conv_triplet
from seetadet.modeling.mobilenet_v3 import conv_quintet
from seetadet.modeling.mobilenet_v3 import make_divisible
from seetadet.modules import init
from seetadet.modules import nn
class SqueezeExcite(nn.Module):
"""Squeeze-excite attention module."""
def __init__(self, dim_in, dim_squeeze, squeeze_ratio=0.25):
super(SqueezeExcite, self).__init__()
dim = int(dim_squeeze * squeeze_ratio)
self.layers = nn.Sequential(nn.AvgPool2d(-1, global_pooling=True),
nn.Conv2d(dim_in, dim, kernel_size=1),
nn.Swish(),
nn.Conv2d(dim, dim_in, kernel_size=1),
nn.Sigmoid(True))
def forward(self, x):
return x * self.layers(x)
class InvertedResidual(nn.Module):
"""Invert residual block."""
def __init__(
self,
dim_in,
dim_out,
kernel_size=3,
expand_ratio=3,
stride=1,
activation=None,
squeeze_excite=0,
):
super(InvertedResidual, self).__init__()
self.stride = stride
self.apply_residual = stride == 1 and dim_in == dim_out
self.dim = dim = int(round(dim_in * expand_ratio))
self.endpoint = None # Expansion feature
layers = []
if expand_ratio != 1:
layers.append(nn.Sequential(*conv_triplet(
dim_in, dim, activation=activation)))
expansion_transform = None
if squeeze_excite > 0:
expansion_transform = SqueezeExcite(dim, dim_in)
quintet = conv_quintet(dim, dim_out,
kernel_size=kernel_size,
stride=stride,
activation=activation,
expansion_transform=expansion_transform)
layers.append(nn.Sequential(*quintet[:3]))
layers.extend(quintet[3:])
self.conv = nn.Sequential(*layers)
def forward(self, x):
out = self.conv[0](x)
self.endpoint = out if self.stride == 2 else None
for layer in self.conv[1:]:
out = layer(out)
if self.apply_residual:
out += x
return out
class NASMobileNet(nn.Module):
"""NAS variant of mobilenet class."""
def __init__(self, arch, preset, width_mult=1.0, depth_mult=1.0):
super(NASMobileNet, self).__init__()
# Hand-craft configurations.
repeats, strides, out_channels, def_blocks = preset
assert sum(repeats) == len(arch), 'Bad architecture.'
self.feature_dims = collections.OrderedDict()
# Apply the width scaling.
out_channels = list(map(lambda x: make_divisible(x * width_mult),
out_channels))
# Apply the depth scaling.
repeated_arch = []
for i, repeat in enumerate(repeats):
idx_start = sum(repeats[:i])
indices = arch[idx_start: idx_start + repeat]
repeat = int(math.ceil(repeat * depth_mult))
repeated_arch += (indices + [indices[-1]] * (repeat - len(indices)))
arch = repeated_arch
# Stem.
features = [nn.Sequential(
*conv_triplet(
dim_in=3,
dim_out=out_channels[0],
kernel_size=3,
stride=2,
activation=nn.Swish(),
))]
# Blocks.
dim_in, stride_out = out_channels[0], 2
for repeat, dim_out, stride in \
zip(repeats, out_channels[1:], strides):
repeat = int(math.ceil(repeat * depth_mult))
stride_out *= stride
for i in range(repeat):
stride = stride if i == 0 else 1
idx = arch[len(features) - 1]
if def_blocks is None:
block = functools.partial(
InvertedResidual,
kernel_size=(idx // 100) % 10,
expand_ratio=int(idx / 1000.) / 10,
squeeze_excite=idx % 10)
else:
block = def_blocks[idx]
features.append(block(
dim_in, dim_out,
stride=stride,
activation=nn.Swish()))
dim_in = dim_out
if stride == 2:
self.feature_dims[id(features[-1])] = features[-1].dim
features.append(nn.Sequential(
*conv_triplet(
dim_in=dim_in,
dim_out=out_channels[-1],
kernel_size=1,
stride=1,
activation=nn.Swish())))
self.feature_dims[id(features[-1])] = out_channels[-1]
self.features = nn.Sequential(*features)
self.reset_parameters()
def reset_parameters(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
init.kaiming_normal(m.weight, mode='fan_out')
def forward(self, x):
outputs = []
for i, layer in enumerate(self.features):
x = layer(x)
if self.feature_dims.get(id(layer)):
outputs.append(getattr(layer, 'endpoint', x))
return outputs
class ModelSetting(object):
"""Hand-craft model setting."""
# Default NASBlocks definition.
# We use the following hash method:
# ef * 10000 + kernel_size * 100 + se * 1
# e.g., ef=4.0, ks=3, se=True, with index 40301
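# Decoding example: 60501 -> expand_ratio=6.0, kernel_size=5, squeeze_excite=1.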
DEFAULT_NAS_BLOCKS_DEF = None
EFFICIENT = (
[1, 2, 2, 3, 3, 4, 1],
[1, 2, 2, 2, 1, 2, 1],
[32, 16, 24, 40, 80, 112, 192, 320, 1280],
DEFAULT_NAS_BLOCKS_DEF,
)
def efficientnet(width_mult=1.0, depth_mult=1.0):
return NASMobileNet([10301,
60301, 60301,
60501, 60501,
60301, 60301, 60301,
60501, 60501, 60501,
60501, 60501, 60501, 60501,
60301],
preset=ModelSetting.EFFICIENT,
width_mult=width_mult,
depth_mult=depth_mult)
@registry.backbone.register('efficientnet_b0')
def efficientnet_b0():
return efficientnet(width_mult=1.0, depth_mult=1.0)
@registry.backbone.register('efficientnet_b1')
def efficientnet_b1():
return efficientnet(width_mult=1.0, depth_mult=1.1)
@registry.backbone.register('efficientnet_b2')
def efficientnet_b2():
return efficientnet(width_mult=1.1, depth_mult=1.2)
@registry.backbone.register('efficientnet_b3')
def efficientnet_b3():
return efficientnet(width_mult=1.2, depth_mult=1.4)
@registry.backbone.register('efficientnet_b4')
def efficientnet_b4():
return efficientnet(width_mult=1.4, depth_mult=1.8)
@registry.backbone.register('efficientnet_b5')
def efficientnet_b5():
return efficientnet(width_mult=1.6, depth_mult=2.2)
@registry.backbone.register('efficientnet_b6')
def efficientnet_b6():
return efficientnet(width_mult=1.8, depth_mult=2.6)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""FastRCNN head."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import functools
import dragon.vm.torch as torch
from seetadet.algo import faster_rcnn
from seetadet.core.config import cfg
from seetadet.modules import det
from seetadet.modules import init
from seetadet.modules import nn
from seetadet.modules import vision
class FastRCNN(nn.Module):
r"""Generate proposal regions for R-CNN series.
The pipeline is as follows:
... -> RoIs \ /-> cls_score -> cls_loss
-> RoIFeatureXform -> MLP
... -> Features / \-> bbox_pred -> bbox_loss
"""
def __init__(self, dim_in=256):
super(FastRCNN, self).__init__()
self.data = {}
self.roi_head_dim = dim_in * (cfg.FRCNN.ROI_XFORM_RESOLUTION ** 2)
self.fc6 = nn.Linear(self.roi_head_dim, cfg.FRCNN.MLP_HEAD_DIM)
self.fc7 = nn.Linear(cfg.FRCNN.MLP_HEAD_DIM, cfg.FRCNN.MLP_HEAD_DIM)
self.cls_score = nn.Linear(cfg.FRCNN.MLP_HEAD_DIM, len(cfg.MODEL.CLASSES))
self.bbox_pred = nn.Linear(cfg.FRCNN.MLP_HEAD_DIM, len(cfg.MODEL.CLASSES) * 4)
self.rpn_decoder = det.RPNDecoder()
self.proposal = faster_rcnn.Proposal()
self.proposal_target = faster_rcnn.ProposalTarget()
self.softmax = nn.Softmax(dim=1)
self.relu = nn.ReLU(inplace=True)
self.sigmoid = nn.Sigmoid()
self.box_roi_feature = functools.partial({
'RoIPool': vision.roi_pool,
'RoIAlign': vision.roi_align,
}[cfg.FRCNN.ROI_XFORM_METHOD],
size=cfg.FRCNN.ROI_XFORM_RESOLUTION,
sampling_ratio=cfg.FRCNN.ROI_XFORM_SAMPLING_RATIO)
self.cls_loss = nn.CrossEntropyLoss()
if cfg.FRCNN.BBOX_REG_LOSS_TYPE.lower() == 'l1':
self.bbox_loss = nn.L1Loss(reduction='sum')
else:
self.bbox_loss = nn.SmoothL1Loss(beta=1.0, reduction='sum')
# Compute spatial scales according to strides.
self.spatial_scales = [
1. / (2 ** lvl)
for lvl in range(
cfg.FPN.ROI_MIN_LEVEL,
cfg.FPN.ROI_MAX_LEVEL + 1
)]
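# e.g., ROI levels 2..5 give spatial scales 1/4, 1/8, 1/16, 1/32.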
self.reset_parameters()
def reset_parameters(self):
init.normal(self.cls_score.weight, std=0.01)
init.normal(self.bbox_pred.weight, std=0.001)
for name, param in self.named_parameters():
if 'bias' in name:
init.constant(param, 0)
def forward(self, **kwargs):
# Generate proposals.
proposal_fn = self.proposal \
if self.training else self.rpn_decoder
self.data = {
'rois': proposal_fn(
features=kwargs['features'],
cls_prob=self.sigmoid(kwargs['rpn_cls_score'].data),
bbox_pred=kwargs['rpn_bbox_pred'],
im_info=kwargs['im_info'],
)
}
# Generate targets from proposals.
if self.training:
self.data.update(
self.proposal_target(
rois=self.data['rois'],
gt_boxes=kwargs['gt_boxes'],
)
)
# Transform RoI features.
if len(self.data['rois']) > 1:
roi_features = \
torch.cat([
self.box_roi_feature(
kwargs['features'][i],
self.data['rois'][i],
spatial_scale,
) for i, spatial_scale in enumerate(self.spatial_scales)
], dim=0)
else:
roi_features = \
self.box_roi_feature(
kwargs['features'][0],
self.data['rois'][0],
1. / cfg.RPN.STRIDES[0],
)
# Apply a simple MLP.
roi_features = roi_features.view(-1, self.roi_head_dim)
roi_features = self.relu(self.fc6(roi_features))
roi_features = self.relu(self.fc7(roi_features))
# Compute logits and losses.
outputs = collections.OrderedDict()
cls_score = self.cls_score(roi_features).float()
outputs['bbox_pred'] = self.bbox_pred(roi_features).float()
if self.training:
# Compute rcnn losses.
bbox_pred = outputs['bbox_pred'].view(0, -1, 4) \
.index_select((0, 1), self.data['bbox_inds'])
batch_size = roi_features.size(0)
bbox_loss_weight = cfg.FRCNN.BBOX_REG_LOSS_WEIGHT
bbox_loss_weight /= float(batch_size)
outputs.update(collections.OrderedDict([
('cls_loss', self.cls_loss(
cls_score,
self.data['labels'])),
('bbox_loss', self.bbox_loss(
bbox_pred,
self.data['bbox_targets'],
self.data['bbox_anchors']) * bbox_loss_weight),
]))
else:
# Return the rois to decode the refine boxes.
if len(self.data['rois']) > 1:
outputs['rois'] = torch.cat(self.data['rois'], 0)
else:
outputs['rois'] = self.data['rois'][0]
# Return the classification prob.
outputs['cls_prob'] = self.softmax(cls_score)
return outputs
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""FPN feature enhancer."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.core.config import cfg
from seetadet.modules import init
from seetadet.modules import nn
HIGHEST_BACKBONE_LVL = 5 # E.g., "conv5"-like level
class FPN(nn.Module):
"""Feature Pyramid Networks to enhance input features."""
def __init__(self, feature_dims):
super(FPN, self).__init__()
self.C = nn.ModuleList()
self.P = nn.ModuleList()
self.feature_dim = dim = cfg.FPN.DIM
self.highest_backbone_lvl = min(cfg.FPN.RPN_MAX_LEVEL, HIGHEST_BACKBONE_LVL)
for lvl in range(cfg.FPN.RPN_MIN_LEVEL, self.highest_backbone_lvl + 1):
self.C.append(nn.Conv1x1(feature_dims[lvl - 1], dim, bias=True))
self.P.append(nn.Conv3x3(dim, dim, bias=True))
if 'rcnn' in cfg.MODEL.TYPE:
self.apply_func = self.apply_rcnn
self.maxpool = nn.MaxPool2d(kernel_size=1, stride=2)
else:
self.apply_func = self.apply_generic
self.relu = nn.ReLU(inplace=False)
for lvl in range(self.highest_backbone_lvl + 1, cfg.FPN.RPN_MAX_LEVEL + 1):
dim_in = feature_dims[-1] if lvl == self.highest_backbone_lvl + 1 else dim
self.P.append(nn.Conv3x3(dim_in, dim, stride=2, bias=True))
self.coarsest_stride = cfg.MODEL.COARSEST_STRIDE
self.reset_parameters()
def reset_parameters(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
init.xavier_uniform(m.weight)
init.constant(m.bias, 0)
def apply_rcnn(self, features):
fpn_input = self.C[-1](features[-1])
min_lvl, max_lvl = cfg.FPN.RPN_MIN_LEVEL, cfg.FPN.RPN_MAX_LEVEL
outputs = [self.P[self.highest_backbone_lvl - min_lvl](fpn_input)]
# Apply max pool for higher features.
for i in range(self.highest_backbone_lvl + 1, max_lvl + 1):
outputs.append(self.maxpool(outputs[-1]))
# Build pyramids between [MIN_LEVEL, HIGHEST_LEVEL]
for i in range(self.highest_backbone_lvl - 1, min_lvl - 1, -1):
lateral_output = self.C[i - min_lvl](features[i - 1])
if self.coarsest_stride > 0:
upscale_output = nn.upsample(fpn_input, scale_factor=2)
else:
upscale_output = nn.upsample(fpn_input, size=lateral_output.shape[2:])
fpn_input = lateral_output.__iadd__(upscale_output)
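# __iadd__ adds in place and returns the lateral tensor itself, so the
# merged map is reused as the input of the next (finer) pyramid level.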
outputs.insert(0, self.P[i - min_lvl](fpn_input))
return outputs
def apply_generic(self, features):
fpn_input = self.C[-1](features[-1])
min_lvl, max_lvl = cfg.FPN.RPN_MIN_LEVEL, cfg.FPN.RPN_MAX_LEVEL
outputs = [self.P[self.highest_backbone_lvl - min_lvl](fpn_input)]
# Add extra convolutions for higher features.
extra_input = features[-1]
for i in range(self.highest_backbone_lvl + 1, max_lvl + 1):
outputs.append(self.P[i - min_lvl](extra_input))
if i != max_lvl:
extra_input = self.relu(outputs[-1])
# Build pyramids between [MIN_LEVEL, HIGHEST_LEVEL]
for i in range(self.highest_backbone_lvl - 1, min_lvl - 1, -1):
lateral_output = self.C[i - min_lvl](features[i - 1])
if self.coarsest_stride > 0:
upscale_output = nn.upsample(fpn_input, scale_factor=2)
else:
upscale_output = nn.upsample(fpn_input, size=lateral_output.shape[2:])
fpn_input = lateral_output.__iadd__(upscale_output)
outputs.insert(0, self.P[i - min_lvl](fpn_input))
return outputs
def forward(self, features):
return self.apply_func(features)
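The two `apply_*` paths above implement the standard FPN top-down pathway: a 1x1 lateral convolution per backbone level, an upsample-and-add merge, and a 3x3 output convolution per pyramid level. Below is a minimal standalone sketch of that merge, written against vanilla PyTorch rather than `dragon.vm.torch` and the seetadet `cfg`; the channel counts and module lists are illustrative assumptions, not the library's API.

```python
import torch
import torch.nn.functional as F
from torch import nn

# Illustrative lateral (1x1) and output (3x3) convs for levels C2..C5.
laterals = [nn.Conv2d(c, 256, 1) for c in (256, 512, 1024, 2048)]
outputs = [nn.Conv2d(256, 256, 3, padding=1) for _ in range(4)]

def fpn_topdown(feats):
    """Merge C2..C5 (strides 4..32) into P2..P5, coarse to fine."""
    x = laterals[-1](feats[-1])
    pyramid = [outputs[-1](x)]
    for i in range(len(feats) - 2, -1, -1):
        lat = laterals[i](feats[i])
        # Upsample the coarser map and add it onto the lateral output.
        x = lat + F.interpolate(x, size=lat.shape[2:], mode='nearest')
        pyramid.insert(0, outputs[i](x))
    return pyramid

feats = [torch.randn(1, c, 64 // 2 ** i, 64 // 2 ** i)
         for i, c in enumerate((256, 512, 1024, 2048))]
print([tuple(p.shape) for p in fpn_topdown(feats)])  # all 256-channel maps
```

As in `apply_rcnn` above, extra levels beyond the highest backbone map would be produced by max pooling (R-CNN heads) or by strided 3x3 convolutions (single-stage heads).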
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""MaskRCNN head."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import functools
import dragon.vm.torch as torch
from seetadet.algo import mask_rcnn
from seetadet.core.config import cfg
from seetadet.modules import det
from seetadet.modules import init
from seetadet.modules import nn
from seetadet.modules import vision
class MaskRCNN(nn.Module):
r"""Generate mask regions for R-CNN series.
The pipeline is as follows:
... -> BoxRoIs \ /-> cls_score -> cls_loss
-> RoIFeatureXform -> MLP
... -> Features / \-> bbox_pred -> bbox_loss
... -> MaskRoIs \
-> RoIFeatureXform -> FCN -> mask_score -> mask_loss
... -> Features /
"""
def __init__(self, dim_in=256):
super(MaskRCNN, self).__init__()
self.data = {}
self.roi_head_dim = dim_in * (cfg.FRCNN.ROI_XFORM_RESOLUTION ** 2)
self.fc6 = nn.Linear(self.roi_head_dim, cfg.FRCNN.MLP_HEAD_DIM)
self.fc7 = nn.Linear(cfg.FRCNN.MLP_HEAD_DIM, cfg.FRCNN.MLP_HEAD_DIM)
self.fcn = nn.ModuleList([nn.Conv3x3(dim_in, dim_in, bias=True) for _ in range(4)])
self.fcn += [nn.ConvTranspose2d(dim_in, dim_in, 2, 2, 0)]
self.cls_score = nn.Linear(cfg.FRCNN.MLP_HEAD_DIM, len(cfg.MODEL.CLASSES))
self.bbox_pred = nn.Linear(cfg.FRCNN.MLP_HEAD_DIM, len(cfg.MODEL.CLASSES) * 4)
self.mask_score = nn.Conv1x1(dim_in, len(cfg.MODEL.CLASSES) - 1, bias=True)
self.rpn_decoder = det.RPNDecoder()
self.proposal = mask_rcnn.Proposal()
self.proposal_target = mask_rcnn.ProposalTarget()
self.sigmoid = nn.Sigmoid()
self.softmax = nn.Softmax(dim=1)
self.relu = nn.ReLU(True)
self.box_roi_feature = functools.partial({
'RoIPool': vision.roi_pool,
'RoIAlign': vision.roi_align,
}[cfg.FRCNN.ROI_XFORM_METHOD],
size=cfg.FRCNN.ROI_XFORM_RESOLUTION,
sampling_ratio=cfg.FRCNN.ROI_XFORM_SAMPLING_RATIO)
self.mask_roi_feature = functools.partial({
'RoIPool': vision.roi_pool,
'RoIAlign': vision.roi_align,
}[cfg.MRCNN.ROI_XFORM_METHOD],
size=cfg.MRCNN.ROI_XFORM_RESOLUTION,
sampling_ratio=cfg.MRCNN.ROI_XFORM_SAMPLING_RATIO)
self.cls_loss = nn.CrossEntropyLoss()
if cfg.FRCNN.BBOX_REG_LOSS_TYPE.lower() == 'l1':
self.bbox_loss = nn.L1Loss(reduction='sum')
else:
self.bbox_loss = nn.SmoothL1Loss(beta=1.0, reduction='sum')
self.mask_loss = nn.BCEWithLogitsLoss()
self.compute_mask_score = None
# Compute spatial scales according to strides.
self.spatial_scales = [
1. / (2 ** lvl)
for lvl in range(cfg.FPN.ROI_MIN_LEVEL,
cfg.FPN.ROI_MAX_LEVEL + 1)]
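# E.g., ROI levels 2..5 give spatial scales 1/4, 1/8, 1/16 and 1/32,
# one per FPN stride.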
self.reset_parameters()
def reset_parameters(self):
init.normal(self.cls_score.weight, std=0.01)
init.normal(self.bbox_pred.weight, std=0.001)
init.normal(self.mask_score.weight, std=0.001)
for m in self.fcn.modules():
if hasattr(m, 'weight'):
init.kaiming_normal(m.weight)
for name, param in self.named_parameters():
if 'bias' in name:
init.constant(param, 0)
def get_mask_score(self, features, rois):
roi_features = \
torch.cat([
self.mask_roi_feature(
features[i], rois[i], spatial_scale,
) for i, spatial_scale in enumerate(self.spatial_scales)
], dim=0)
for i in range(len(self.fcn)):
roi_features = self.relu(self.fcn[i](roi_features))
return self.mask_score(roi_features).float()
def forward(self, **kwargs):
# Generate proposals.
proposal_func = self.proposal \
if self.training else self.rpn_decoder
self.data = {
'rois': proposal_func(
features=kwargs['features'],
cls_prob=self.sigmoid(kwargs['rpn_cls_score'].data),
bbox_pred=kwargs['rpn_bbox_pred'],
im_info=kwargs['im_info'],
)
}
# Generate targets from proposals.
if self.training:
self.data.update(
self.proposal_target(
rois=self.data['rois'],
gt_boxes=kwargs['gt_boxes'],
gt_segms=kwargs['gt_segms'],
im_info=kwargs['im_info'],
)
)
# Transform RoI features.
roi_features = \
torch.cat([
self.box_roi_feature(
kwargs['features'][i],
self.data['rois'][i],
spatial_scale,
) for i, spatial_scale in enumerate(self.spatial_scales)
], dim=0)
# Apply a simple MLP.
roi_features = roi_features.view(-1, self.roi_head_dim)
roi_features = self.relu(self.fc6(roi_features))
roi_features = self.relu(self.fc7(roi_features))
# Compute logits and losses.
outputs = collections.OrderedDict()
cls_score = self.cls_score(roi_features).float()
outputs['bbox_pred'] = self.bbox_pred(roi_features).float()
if self.training:
# Compute the loss of the bbox branch.
bbox_pred = outputs['bbox_pred'].view(0, -1, 4) \
.index_select((0, 1), self.data['bbox_inds'])
batch_size = roi_features.size(0)
bbox_loss_weight = cfg.FRCNN.BBOX_REG_LOSS_WEIGHT
bbox_loss_weight /= float(batch_size)
outputs.update(collections.OrderedDict([
('cls_loss', self.cls_loss(
cls_score,
self.data['labels'])),
('bbox_loss', self.bbox_loss(
bbox_pred,
self.data['bbox_targets']) * bbox_loss_weight),
]))
# Compute the loss of the mask branch.
mask_score = self.get_mask_score(
kwargs['features'], self.data['mask_rois'])
mask_score = mask_score \
.index_select((0, 1), self.data['mask_inds'])
outputs['mask_loss'] = self.mask_loss(
mask_score, self.data['mask_targets'])
else:
# Return the RoIs used to decode the refined boxes.
if len(self.data['rois']) > 1:
outputs['rois'] = torch.cat(self.data['rois'], 0)
else:
outputs['rois'] = self.data['rois'][0]
# Return the classification prob.
outputs['cls_prob'] = self.softmax(cls_score)
# Set a callback to decode mask from refined RoIs.
self.compute_mask_score = functools.partial(
self.get_mask_score, features=kwargs['features'])
return outputs
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""MobileNetV2 backbone."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import functools
from seetadet.core import registry
from seetadet.core.config import cfg
from seetadet.modules import init
from seetadet.modules import nn
def make_divisible(v, divisor=8):
"""Return the divisible value."""
min_value = divisor
new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
if new_v < 0.9 * v:
new_v += divisor
return new_v
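A quick standalone check of the rounding rule (the function copy and the input values below are illustrative, not taken from any model config):

```python
# Standalone copy of make_divisible for illustration (divisor = 8).
def make_divisible(v, divisor=8):
    new_v = max(divisor, int(v + divisor / 2) // divisor * divisor)
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v

assert make_divisible(24) == 24        # exact multiples pass through
assert make_divisible(91) == 88        # rounds to nearest; 88 >= 0.9 * 91
assert make_divisible(20 * 1.1) == 24  # 22 rounds up to 24
assert make_divisible(36) == 40        # 36 + 4 crosses to the next multiple
```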
def conv_triplet(dim_in, dim_out, kernel_size=1, stride=1):
"""Return a convolution triplet."""
return [nn.Conv2d(dim_in, dim_out,
kernel_size=kernel_size,
stride=stride,
padding=kernel_size // 2,
bias=False),
nn.get_norm(cfg.MODEL.BACKBONE_NORM, dim_out),
nn.ReLU(True)]
def conv_quintet(dim_in, dim_out, kernel_size, stride):
"""Return a convolution quintet."""
return [nn.Conv2d(dim_in, dim_in,
kernel_size=kernel_size,
stride=stride,
padding=kernel_size // 2,
groups=dim_in,
bias=False),
nn.get_norm(cfg.MODEL.BACKBONE_NORM, dim_in),
nn.ReLU(True),
nn.Conv2d(dim_in, dim_out, kernel_size=1, bias=False),
nn.get_norm(cfg.MODEL.BACKBONE_NORM, dim_out)]
class InvertedResidual(nn.Module):
"""Invert residual block."""
def __init__(self, dim_in, dim_out, kernel_size=3, expand_ratio=3, stride=1):
super(InvertedResidual, self).__init__()
self.stride = stride
self.apply_residual = stride == 1 and dim_in == dim_out
self.dim = dim = int(round(dim_in * expand_ratio))
self.endpoint = None # Expansion feature
layers = []
if expand_ratio != 1:
layers.append(nn.Sequential(*conv_triplet(dim_in, dim)))
quintet = conv_quintet(dim, dim_out, kernel_size, stride)
layers.append(nn.Sequential(*quintet[:3]))
layers.extend(quintet[3:])
self.conv = nn.Sequential(*layers)
def forward(self, x):
out = self.conv[0](x)
self.endpoint = out if self.stride == 2 else None
for layer in self.conv[1:]:
out = layer(out)
if self.apply_residual:
out += x
return out
class NASMobileNet(nn.Module):
"""NAS variant of mobilenet class."""
def __init__(self, arch, preset, width_mult=1.0):
super(NASMobileNet, self).__init__()
# Hand-crafted configurations.
repeats, strides, out_channels, def_blocks = preset
assert sum(repeats) == len(arch), 'Bad architecture.'
self.feature_dims = collections.OrderedDict()
# Apply the width scaling.
out_channels = list(map(lambda x: make_divisible(x * width_mult),
out_channels))
# Stem.
features = [nn.Sequential(
*conv_triplet(
dim_in=3,
dim_out=out_channels[0],
kernel_size=3,
stride=2,
))]
# Blocks.
dim_in, dim_out = out_channels[:2]
features.append(InvertedResidual(dim_in, dim_out, 3, 1))
for repeat, dim_out, stride in \
zip(repeats, out_channels[2:], strides):
for i in range(repeat):
stride = stride if i == 0 else 1
block = def_blocks[arch[len(features) - 2]]
features.append(block(dim_in, dim_out, stride=stride))
dim_in = dim_out
if stride == 2:
self.feature_dims[id(features[-1])] = features[-1].dim
features.append(nn.Sequential(*conv_triplet(dim_in, out_channels[-1])))
self.feature_dims[id(features[-1])] = out_channels[-1]
self.features = nn.Sequential(*features)
self.reset_parameters()
def reset_parameters(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
init.kaiming_normal(m.weight, mode='fan_out')
def forward(self, x):
outputs = []
for layer in self.features:
x = layer(x)
if self.feature_dims.get(id(layer)):
outputs.append(getattr(layer, 'endpoint', x))
return outputs
class ModelSetting(object):
"""Hand-craft model setting."""
# Default NASBlocks definition.
# See ProxyLessNAS (arxiv.1812.00332) for details.
DEFAULT_NAS_BLOCKS_DEF = {
0: functools.partial(InvertedResidual, kernel_size=3, expand_ratio=3),
1: functools.partial(InvertedResidual, kernel_size=3, expand_ratio=6),
2: functools.partial(InvertedResidual, kernel_size=5, expand_ratio=3),
3: functools.partial(InvertedResidual, kernel_size=5, expand_ratio=6),
4: functools.partial(InvertedResidual, kernel_size=7, expand_ratio=3),
5: functools.partial(InvertedResidual, kernel_size=7, expand_ratio=6),
6: nn.Identity,
}
V2 = (
[2, 3, 4, 3, 3, 1],
[2, 2, 2, 1, 2, 1],
[32, 16, 24, 32, 64, 96, 160, 320, 1280],
DEFAULT_NAS_BLOCKS_DEF,
)
PROXYLESS_MOBILE = (
[4, 4, 4, 4, 4, 1],
[2, 2, 2, 1, 2, 1],
[32, 16, 32, 40, 80, 96, 192, 320, 1280],
DEFAULT_NAS_BLOCKS_DEF,
)
PROXYLESS_GPU = (
[4, 4, 4, 4, 4, 1],
[2, 2, 2, 1, 2, 1],
[40, 24, 32, 56, 112, 128, 256, 432, 1280],
DEFAULT_NAS_BLOCKS_DEF,
)
@registry.backbone.register('mobilenet_v2')
def mobilenet_v2():
return NASMobileNet([1, 1,
1, 1, 1,
1, 1, 1, 1,
1, 1, 1,
1, 1, 1,
1], ModelSetting.V2)
@registry.backbone.register('proxyless_mobile')
def proxyless_mobile():
return NASMobileNet([2, 0, 6, 6,
4, 0, 2, 2,
5, 2, 2, 2,
3, 2, 2, 2,
5, 5, 4, 4,
5], ModelSetting.PROXYLESS_MOBILE)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""MobileNetV3 backbone."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import functools
from seetadet.core import registry
from seetadet.core.config import cfg
from seetadet.modules import init
from seetadet.modules import nn
def make_divisible(v, divisor=8):
"""Return the divisible value."""
min_value = divisor
new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
if new_v < 0.9 * v:
new_v += divisor
return new_v
def conv_triplet(dim_in, dim_out, kernel_size=1, stride=1, activation=None):
"""Return a convolution triplet."""
return [nn.Conv2d(dim_in, dim_out,
kernel_size=kernel_size,
stride=stride,
padding=kernel_size // 2,
bias=False),
nn.get_norm(cfg.MODEL.BACKBONE_NORM, dim_out),
nn.ReLU(True) if activation is None else activation]
def conv_quintet(
dim_in,
dim_out,
kernel_size,
stride,
activation=None,
expansion_transform=None,
):
"""Return a convolution quintet."""
layers = [nn.Conv2d(dim_in, dim_in,
kernel_size=kernel_size,
stride=stride,
padding=kernel_size // 2,
groups=dim_in,
bias=False),
nn.get_norm(cfg.MODEL.BACKBONE_NORM, dim_in),
nn.ReLU(True) if activation is None else activation]
if expansion_transform is not None:
layers += [expansion_transform]
layers += [nn.Conv2d(dim_in, dim_out, kernel_size=1, bias=False),
nn.get_norm(cfg.MODEL.BACKBONE_NORM, dim_out)]
return layers
class SqueezeExcite(nn.Module):
"""Squeeze-excite attention module."""
def __init__(self, dim_in, squeeze_ratio=0.25):
super(SqueezeExcite, self).__init__()
dim = make_divisible(dim_in * squeeze_ratio)
self.layers = nn.Sequential(nn.AvgPool2d(-1, global_pooling=True),
nn.Conv2d(dim_in, dim, kernel_size=1),
nn.ReLU(True),
nn.Conv2d(dim, dim_in, kernel_size=1),
nn.Hardsigmoid(True))
def forward(self, x):
return x * self.layers(x)
class InvertedResidual(nn.Module):
"""Invert residual block."""
def __init__(
self,
dim_in,
dim_out,
kernel_size=3,
expand_ratio=3,
stride=1,
activation=None,
squeeze_excite=0,
):
super(InvertedResidual, self).__init__()
self.stride = stride
self.apply_residual = stride == 1 and dim_in == dim_out
self.dim = dim = int(round(dim_in * expand_ratio))
self.endpoint = None # Expansion feature
layers = []
if expand_ratio != 1:
layers.append(nn.Sequential(*conv_triplet(
dim_in, dim, activation=activation)))
expansion_transform = None
if squeeze_excite > 0:
expansion_transform = SqueezeExcite(dim)
quintet = conv_quintet(dim, dim_out,
kernel_size=kernel_size,
stride=stride,
activation=activation,
expansion_transform=expansion_transform)
layers.append(nn.Sequential(*quintet[:3]))
layers.extend(quintet[3:])
self.conv = nn.Sequential(*layers)
def forward(self, x):
out = self.conv[0](x)
self.endpoint = out if self.stride == 2 else None
for layer in self.conv[1:]:
out = layer(out)
if self.apply_residual:
out += x
return out
class NASMobileNet(nn.Module):
"""NAS variant of mobilenet class."""
def __init__(self, arch, preset, width_mult=1.0):
super(NASMobileNet, self).__init__()
# Hand-crafted configurations.
repeats, strides, out_channels, def_blocks = preset
assert sum(repeats) == len(arch), 'Bad architecture.'
self.feature_dims = collections.OrderedDict()
# Apply the width scaling.
out_channels = list(map(lambda x: make_divisible(x * width_mult),
out_channels))
# Stem.
features = [nn.Sequential(
*conv_triplet(
dim_in=3,
dim_out=out_channels[0],
kernel_size=3,
stride=2,
activation=nn.Hardswish(),
))]
# Blocks.
dim_in, stride_out = out_channels[0], 2
for repeat, dim_out, stride in \
zip(repeats, out_channels[1:], strides):
stride_out *= stride
for i in range(repeat):
stride = stride if i == 0 else 1
idx = arch[len(features) - 1]
if def_blocks is None:
block = functools.partial(
InvertedResidual,
kernel_size=(idx // 100) % 10,
expand_ratio=int(idx / 1000.) / 10,
squeeze_excite=idx % 10)
else:
block = def_blocks[idx]
features.append(block(
dim_in, dim_out,
stride=stride,
activation=nn.Hardswish()
if stride_out > 8 else nn.ReLU(True)))
dim_in = dim_out
if stride == 2:
self.feature_dims[id(features[-1])] = features[-1].dim
features.append(nn.Sequential(
*conv_triplet(
dim_in=dim_in,
dim_out=out_channels[-1],
kernel_size=1,
stride=1,
activation=nn.Hardswish())))
self.feature_dims[id(features[-1])] = out_channels[-1]
self.features = nn.Sequential(*features)
self.reset_parameters()
def reset_parameters(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
init.kaiming_normal(m.weight, mode='fan_out')
def forward(self, x):
outputs = []
for layer in self.features:
x = layer(x)
if self.feature_dims.get(id(layer)):
outputs.append(getattr(layer, 'endpoint', x))
return outputs
class ModelSetting(object):
"""Hand-craft model setting."""
# Default NASBlocks definition.
# We use the following hash method:
# ef * 10000 + kernel_size * 100 + se * 1
# e.g., ef=4.0, ks=3, se=True gives index 40301.
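# Decoding example: index 60501 (used in mobilenet_v3 below) unpacks to
#   expand_ratio   = int(60501 / 1000.) / 10 = 6.0
#   kernel_size    = (60501 // 100) % 10     = 5
#   squeeze_excite = 60501 % 10              = 1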
DEFAULT_NAS_BLOCKS_DEF = None
V3 = (
[1, 2, 3, 4, 2, 3],
[1, 2, 2, 2, 1, 2],
[16, 16, 24, 40, 80, 112, 160, 960],
DEFAULT_NAS_BLOCKS_DEF,
)
@registry.backbone.register('mobilenet_v3')
def mobilenet_v3():
return NASMobileNet([10300,
40300, 30300,
30501, 30501, 30501,
60300, 25300, 23300, 23300,
60301, 60301,
60501, 60501, 60501], ModelSetting.V3)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""ResNet backbone."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
from seetadet.core import registry
from seetadet.core.config import cfg
from seetadet.modules import nn
from seetadet.modules import init
from seetadet.utils import env
class BasicBlock(nn.Module):
"""The basic resnet block."""
expansion = 1
def __init__(self, dim_in, dim, stride=1, downsample=None):
super(BasicBlock, self).__init__()
norm = cfg.MODEL.BACKBONE_NORM
self.conv1 = nn.Conv3x3(dim_in, dim, stride)
self.bn1 = nn.get_norm(norm, dim)
self.relu = nn.ReLU(inplace=True)
self.conv2 = nn.Conv3x3(dim, dim)
self.bn2 = nn.get_norm(norm, dim)
self.downsample = downsample
def forward(self, x):
identity = x
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.bn2(out)
if self.downsample is not None:
identity = self.downsample(x)
out += identity
out = self.relu(out)
return out
class Bottleneck(nn.Module):
"""The bottleneck resnet block."""
expansion = 4
def __init__(self, dim_in, dim, stride=1, downsample=None):
super(Bottleneck, self).__init__()
groups = cfg.RESNET.NUM_GROUPS
width_per_group = cfg.RESNET.WIDTH_PER_GROUP
norm = cfg.MODEL.BACKBONE_NORM
width = int(dim * (width_per_group / 64.)) * groups
self.conv1 = nn.Conv1x1(dim_in, width)
self.bn1 = nn.get_norm(norm, width)
self.conv2 = nn.Conv3x3(width, width, stride=stride)
self.bn2 = nn.get_norm(norm, width)
self.conv3 = nn.Conv1x1(width, dim * self.expansion)
self.bn3 = nn.get_norm(norm, dim * self.expansion)
self.relu = nn.ReLU(inplace=True)
self.downsample = downsample
def forward(self, x):
identity = x
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.bn2(out)
out = self.relu(out)
out = self.conv3(out)
out = self.bn3(out)
if self.downsample is not None:
identity = self.downsample(x)
out += identity
out = self.relu(out)
return out
class ResNet(nn.Module):
"""The resnet class."""
def __init__(self, block, layers):
super(ResNet, self).__init__()
dim_in, dims, features = 64, [64, 128, 256, 512], []
self.conv1 = nn.Conv2d(3, dim_in, kernel_size=7,
stride=2, padding=3, bias=False)
self.bn1 = nn.get_norm(cfg.MODEL.BACKBONE_NORM, dim_in)
self.relu = nn.ReLU(inplace=True)
self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
self.feature_dims = collections.OrderedDict(stem=64)
for i, repeat, dim in zip(range(4), layers, dims):
stride = 1 if i == 0 else 2
downsample = None
if stride != 1 or dim_in != dim * block.expansion:
downsample = nn.Sequential(
nn.Conv1x1(dim_in, dim * block.expansion, stride=stride),
nn.get_norm(cfg.MODEL.BACKBONE_NORM, dim * block.expansion))
features.append(block(dim_in, dim, stride, downsample))
dim_in = dim * block.expansion
for j in range(repeat - 1):
features.append(block(dim_in, dim))
setattr(self, 'layer%d' % (i + 1), nn.Sequential(*features[-repeat:]))
self.feature_dims[id(features[-1])] = dim_in
self.features = features
self.last_outputs = None
self.reset_parameters()
def reset_parameters(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
init.kaiming_normal(m.weight, mode='fan_out')
if cfg.MODEL.FREEZE_AT > 0:
self.conv1.apply(env.freeze_module)
self.bn1.apply(env.freeze_module)
for i in range(cfg.MODEL.FREEZE_AT, 1, -1):
getattr(self, 'layer{}'.format(i - 1)).apply(env.freeze_module)
def forward(self, x):
x = self.conv1(x)
x = self.bn1(x)
x = self.relu(x)
x = self.maxpool(x)
outputs = [None]
for layer in self.features:
x = layer(x)
if self.feature_dims.get(id(layer)):
outputs.append(x)
if self.training:
self.last_outputs = outputs
return outputs
def resnet(depth):
if depth == 18:
layers = [2, 2, 2, 2]
elif depth == 34:
layers = [3, 4, 6, 3]
elif depth == 50:
layers = [3, 4, 6, 3]
elif depth == 101:
layers = [3, 4, 23, 3]
elif depth == 152:
layers = [3, 8, 36, 3]
elif depth == 200:
layers = [3, 24, 36, 3]
elif depth == 269:
layers = [3, 30, 48, 8]
else:
raise ValueError('Unsupported depth: %d' % depth)
block = Bottleneck if depth >= 50 else BasicBlock
return ResNet(block, layers)
registry.backbone.register(['res50', 'resnet50', 'resnet_50'],
func=resnet, depth=50)
registry.backbone.register(['res101', 'resnet101', 'resnet_101'],
func=resnet, depth=101)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""RetinaNet head."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import math
import dragon.vm.torch as torch
from seetadet.algo import retinanet
from seetadet.core.config import cfg
from seetadet.modules import det
from seetadet.modules import init
from seetadet.modules import nn
from seetadet.utils import stats
class RetinaNet(nn.Module):
def __init__(self, dim_in=256):
super(RetinaNet, self).__init__()
self.data = dict()
########################################
# RetinaNet outputs #
########################################
self.cls_conv = nn.ModuleList(
nn.Conv3x3(dim_in, dim_in, bias=True)
for _ in range(cfg.RETINANET.NUM_CONVS)
)
self.bbox_conv = nn.ModuleList(
nn.Conv3x3(dim_in, dim_in, bias=True)
for _ in range(cfg.RETINANET.NUM_CONVS)
)
self.cls_dim = len(cfg.MODEL.CLASSES) - 1
anchor_dim = (len(cfg.RETINANET.ASPECT_RATIOS) *
cfg.RETINANET.SCALES_PER_OCTAVE)
self.cls_score = nn.Conv3x3(dim_in, self.cls_dim * anchor_dim, bias=True)
self.bbox_pred = nn.Conv3x3(dim_in, 4 * anchor_dim, bias=True)
self.cls_prob = nn.Sigmoid(inplace=True)
self.relu = nn.ReLU(inplace=True)
self.decoder = det.RetinaNetDecoder()
########################################
# RetinaNet losses #
########################################
self.anchor_target = retinanet.AnchorTarget()
self.cls_loss = nn.SigmoidFocalLoss()
if cfg.RETINANET.BBOX_REG_LOSS_TYPE.lower() == 'l1':
self.bbox_loss = nn.L1Loss(reduction='sum')
elif cfg.RETINANET.BBOX_REG_LOSS_TYPE.lower() == 'giou':
self.bbox_loss = nn.GIoULoss(reduction='sum')
else:
self.bbox_loss = nn.SmoothL1Loss(beta=0.1, reduction='sum')
self.normalizer = stats.ExponentialMovingAverage(decay=0.9)
self.reset_parameters()
def reset_parameters(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
init.normal(m.weight, std=0.01)
init.constant(m.bias, 0)
# Bias prior initialization for Focal Loss.
# For details, see the official code:
# https://github.com/facebookresearch/Detectron
bias_init = -math.log((1 - cfg.PRIOR_PROB) / cfg.PRIOR_PROB)
self.cls_score.bias.fill_(bias_init)
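# E.g., PRIOR_PROB = 0.01 gives bias_init = -log(99) ~= -4.6, so every
# anchor starts with a foreground score of ~0.01 and easy background
# anchors do not swamp the focal loss early in training.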
def compute_outputs(self, features):
"""Compute RetinaNet logits."""
cls_score_wide, bbox_pred_wide = [], []
for j, feature in enumerate(features):
cls_input, bbox_input = feature, feature
for i in range(cfg.RETINANET.NUM_CONVS):
cls_input = self.relu(self.cls_conv[i](cls_input))
bbox_input = self.relu(self.bbox_conv[i](bbox_input))
cls_score_wide.append(self.cls_score(cls_input).view(0, self.cls_dim, -1))
bbox_pred_wide.append(self.bbox_pred(bbox_input).view(0, 4, -1))
if len(features) > 1:
return (torch.cat(cls_score_wide, dim=2),
torch.cat(bbox_pred_wide, dim=2))
else:
return cls_score_wide[0], bbox_pred_wide[0]
def compute_losses(self, **inputs):
"""Compute RetinaNet classification and regression loss."""
self.data = self.anchor_target(**inputs)
bbox_pred = inputs['bbox_pred'].permute(0, 2, 1) \
.index_select((0, 1), self.data['bbox_inds'])
self.normalizer.add_value(self.data['bbox_inds'].size(0))
cls_loss_weight = 1.0 / self.normalizer.running_average()
bbox_loss_weight = (cfg.RETINANET.BBOX_REG_LOSS_WEIGHT /
self.normalizer.running_average())
outputs = collections.OrderedDict([
('cls_loss', self.cls_loss(
inputs['cls_score'],
self.data['labels']) * cls_loss_weight),
('bbox_loss', self.bbox_loss(
bbox_pred,
self.data['bbox_targets'],
self.data['bbox_anchors']) * bbox_loss_weight)])
return outputs
def forward(self, **kwargs):
cls_score, bbox_pred = self.compute_outputs(kwargs['features'])
cls_score, bbox_pred = cls_score.float(), bbox_pred.float()
outputs = collections.OrderedDict([('bbox_pred', bbox_pred)])
if self.training:
outputs.update(
self.compute_losses(
features=kwargs['features'],
cls_score=cls_score,
bbox_pred=bbox_pred,
fg_inds=kwargs['fg_inds'],
bg_inds=kwargs['bg_inds'],
gt_boxes=kwargs['gt_boxes'],
)
)
else:
outputs['detections'] = self.decoder(
kwargs['features'],
self.cls_prob(cls_score).permute(0, 2, 1),
bbox_pred,
kwargs['im_info'],
)
return outputs
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""RPN head."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import dragon.vm.torch as torch
from seetadet.algo import faster_rcnn
from seetadet.core.config import cfg
from seetadet.modules import init
from seetadet.modules import nn
class RPN(nn.Module):
"""Region Proposal Networks for R-CNN series."""
def __init__(self, dim_in=256):
super(RPN, self).__init__()
self.data = {}
##################################
# RPN outputs #
##################################
num_anchors = len(cfg.RPN.ASPECT_RATIOS) * (
len(cfg.RPN.SCALES) if len(cfg.RPN.STRIDES) == 1 else 1)
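# E.g., a single-stride (C4) RPN places len(SCALES) * len(ASPECT_RATIOS)
# anchors per location, while FPN assigns one scale per pyramid level
# and keeps only len(ASPECT_RATIOS) anchors per location.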
self.output = nn.Conv3x3(dim_in, dim_in, bias=True)
self.cls_score = nn.Conv1x1(dim_in, num_anchors, bias=True)
self.bbox_pred = nn.Conv1x1(dim_in, num_anchors * 4, bias=True)
self.relu = nn.ReLU(inplace=True)
##################################
# RPN losses #
##################################
self.anchor_target = faster_rcnn.AnchorTarget()
self.cls_loss = nn.BCEWithLogitsLoss(reduction='mean')
if cfg.RPN.BBOX_REG_LOSS_TYPE.lower() == 'l1':
self.bbox_loss = nn.L1Loss(reduction='sum')
else:
self.bbox_loss = nn.SmoothL1Loss(beta=0.1, reduction='sum')
self.reset_parameters()
def reset_parameters(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
init.normal(m.weight, std=0.01)
init.constant(m.bias, 0)
def compute_outputs(self, features):
"""Compute the RPN logits."""
cls_score_wide, bbox_pred_wide = [], []
for i, feature in enumerate(features):
x = self.relu(self.output(feature))
cls_score_wide.append(self.cls_score(x).view(0, -1))
bbox_pred_wide.append(self.bbox_pred(x).view(0, 4, -1))
if len(features) > 1:
return (torch.cat(cls_score_wide, dim=1),
torch.cat(bbox_pred_wide, dim=2))
else:
return cls_score_wide[0], bbox_pred_wide[0]
def compute_losses(self, **inputs):
"""Compute the RPN classification loss and regression loss."""
self.data = self.anchor_target(**inputs)
cls_score = inputs['cls_score'] \
.index_select((0, 1), self.data['cls_inds'])
bbox_pred = inputs['bbox_pred'].permute(0, 2, 1) \
.index_select((0, 1), self.data['bbox_inds'])
batch_size = cfg.RPN.BATCH_SIZE * cfg.TRAIN.IMS_PER_BATCH
bbox_loss_weight = cfg.RPN.BBOX_REG_LOSS_WEIGHT / float(batch_size)
return collections.OrderedDict([
('rpn_cls_loss', self.cls_loss(
cls_score,
self.data['labels'])),
('rpn_bbox_loss', self.bbox_loss(
bbox_pred,
self.data['bbox_targets'],
self.data['bbox_anchors']) * bbox_loss_weight),
])
def forward(self, **kwargs):
cls_score, bbox_pred = \
self.compute_outputs(kwargs['features'])
outputs = collections.OrderedDict([
('rpn_cls_score', cls_score.float()),
('rpn_bbox_pred', bbox_pred.float()),
])
if self.training:
outputs.update(
self.compute_losses(
features=kwargs['features'],
cls_score=outputs['rpn_cls_score'],
bbox_pred=outputs['rpn_bbox_pred'],
fg_inds=kwargs['fg_inds'],
bg_inds=kwargs['bg_inds'],
gt_boxes=kwargs['gt_boxes'],
)
)
return outputs
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""SSD head."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import dragon.vm.torch as torch
from seetadet.algo import ssd
from seetadet.core.config import cfg
from seetadet.modules import init
from seetadet.modules import nn
from seetadet.utils import stats
class SSD(nn.Module):
def __init__(self, feature_dims):
super(SSD, self).__init__()
self.data = {}
########################################
# SSD outputs #
########################################
self.cls_conv = torch.nn.ModuleList(
nn.Conv3x3(feature_dims[0], feature_dims[0], bias=True)
for _ in range(cfg.SSD.NUM_CONVS))
self.bbox_conv = torch.nn.ModuleList(
nn.Conv3x3(feature_dims[0], feature_dims[0], bias=True)
for _ in range(cfg.SSD.NUM_CONVS))
self.cls_score = nn.ModuleList()
self.bbox_pred = nn.ModuleList()
self.softmax = nn.Softmax(dim=2)
self.relu = nn.ReLU(inplace=True)
self.box_dim = len(cfg.BBOX_REG_WEIGHTS)
if len(feature_dims) != len(cfg.SSD.STRIDES):
# FPN case, all strides share the same feature dim
feature_dims = [feature_dims[0]] * len(cfg.SSD.STRIDES)
for i, dim in enumerate(feature_dims):
ratios = cfg.SSD.ASPECT_RATIOS[i]
if not isinstance(ratios, (tuple, list)):
# Legacy case: all strides share the same ratios.
ratios = cfg.SSD.ASPECT_RATIOS
nc, na = len(cfg.MODEL.CLASSES), len(ratios) + 1
self.cls_score.append(nn.Conv3x3(dim, na * nc, bias=True))
self.bbox_pred.append(nn.Conv3x3(dim, na * self.box_dim, bias=True))
########################################
# SSD losses #
########################################
self.anchor_target = ssd.AnchorTarget()
self.cls_loss = nn.CrossEntropyLoss(reduction='sum')
if cfg.SSD.BBOX_REG_LOSS_TYPE.lower() == 'l1':
self.bbox_loss = nn.L1Loss(reduction='sum')
elif cfg.SSD.BBOX_REG_LOSS_TYPE.lower() == 'giou':
self.bbox_loss = nn.GIoULoss(
reduction='sum', delta_weights=cfg.BBOX_REG_WEIGHTS)
else:
self.bbox_loss = nn.SmoothL1Loss(beta=1.0, reduction='sum')
self.normalizer = stats.ExponentialMovingAverage(decay=0.9)
self.reset_parameters()
def reset_parameters(self):
if cfg.SSD.NUM_CONVS > 0:
for m in self.modules():
if isinstance(m, nn.Conv2d):
init.normal(m.weight, std=0.01)
init.constant(m.bias, 0)
else:
for m in self.modules():
if isinstance(m, nn.Conv2d):
init.normal(m.weight, std=0.001)
init.constant(m.bias, 0)
def compute_outputs(self, features):
"""Compute SSD logits."""
cls_score_wide, bbox_pred_wide = [], []
for i, feature in enumerate(features):
cls_input, bbox_input = feature, feature
for j in range(cfg.SSD.NUM_CONVS):
cls_input = self.relu(self.cls_conv[j](cls_input))
bbox_input = self.relu(self.bbox_conv[j](bbox_input))
cls_score_wide.append(
self.cls_score[i](cls_input)
.permute(0, 2, 3, 1).view(0, -1))
bbox_pred_wide.append(
self.bbox_pred[i](bbox_input)
.permute(0, 2, 3, 1).view(0, -1))
return (torch.cat(cls_score_wide, dim=1)
.view(0, -1, len(cfg.MODEL.CLASSES)),
torch.cat(bbox_pred_wide, dim=1)
.view(0, -1, self.box_dim))
def compute_losses(self, **inputs):
"""Compute tSSD classification and regression loss."""
self.data = self.anchor_target(**inputs)
bbox_pred = inputs['bbox_pred'] \
.index_select((0, 1), self.data['bbox_inds'])
self.normalizer.add_value(self.data['bbox_inds'].size(0))
cls_loss_weight = 1.0 / self.normalizer.running_average()
bbox_loss_weight = (cfg.SSD.BBOX_REG_LOSS_WEIGHT /
self.normalizer.running_average())
return collections.OrderedDict([
('cls_loss', self.cls_loss(
inputs['cls_score'].view(-1, len(cfg.MODEL.CLASSES)),
self.data['labels']) * cls_loss_weight),
('bbox_loss', self.bbox_loss(
bbox_pred,
self.data['bbox_targets'],
self.data['bbox_anchors']) * bbox_loss_weight)
])
def forward(self, **kwargs):
cls_score, bbox_pred = self.compute_outputs(kwargs['features'])
cls_score, bbox_pred = cls_score.float(), bbox_pred.float()
if cls_score.size(1) != self.anchor_target.all_anchors.shape[0]:
raise ValueError('Misalignment between default anchors and features.\n'
'Specify correct <SSD.STRIDES> to avoid this problem.')
outputs = collections.OrderedDict([
('bbox_pred', bbox_pred),
('prior_boxes', self.anchor_target.all_anchors),
])
if self.training:
outputs.update(
self.compute_losses(
cls_score=cls_score,
bbox_pred=bbox_pred,
cls_prob=self.softmax(cls_score.data),
fg_inds=kwargs['fg_inds'],
bg_inds=kwargs['bg_inds'],
gt_boxes=kwargs['gt_boxes'],
)
)
else:
outputs['cls_prob'] = self.softmax(cls_score)
return outputs
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""VGGNet backbone."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
from seetadet.core import registry
from seetadet.modules import init
from seetadet.modules import nn
class VGG(nn.Module):
"""The VGG net class."""
def __init__(self, model_cfg, extra_cfg=None):
super(VGG, self).__init__()
layers, features, dim_in = [], [], 3
self.feature_dims = collections.OrderedDict()
self.feature_norms = nn.ModuleList()
for v in model_cfg:
if v == 'M':
features.append(nn.Sequential(*layers))
if extra_cfg and len(features) == 5:
layers = [nn.MaxPool2d(kernel_size=3, padding=1)]
else:
layers = [nn.MaxPool2d(kernel_size=2, stride=2, ceil_mode=True)]
if len(features) > 1:
self.feature_dims[id(features[-1])] = dim_in
if extra_cfg and len(features) == 4:
self.feature_norms.append(nn.L2Normalize(dim_in, init=20.))
else:
conv2d = nn.Conv2d(dim_in, v, kernel_size=3, padding=1)
layers += [conv2d, nn.ReLU(inplace=True)]
dim_in = v
if extra_cfg:
lowest_lvl = id(features[3])
self.feature_dims = collections.OrderedDict(
[(lowest_lvl, self.feature_dims[lowest_lvl])])
layers += [nn.Conv2d(dim_in, 1024, kernel_size=3, padding=6, dilation=6)]
layers += [nn.ReLU(inplace=True)]
layers += [nn.Conv2d(1024, 1024, kernel_size=1)]
layers += [nn.ReLU(inplace=True)]
features.append(nn.Sequential(*layers))
self.feature_dims[id(features[-1])] = dim_in = 1024
for c, (k, s, p) in extra_cfg:
features.append(nn.Sequential(
nn.Conv2d(dim_in, c, kernel_size=1),
nn.ReLU(inplace=True),
nn.Conv2d(c, c * 2, kernel_size=k, stride=s, padding=p),
nn.ReLU(inplace=True),
))
self.feature_dims[id(features[-1])] = dim_in = c * 2
self.features = nn.Sequential(*features)
self.last_outputs = None
self.reset_parameters()
def reset_parameters(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
init.xavier_uniform(m.weight)
init.constant(m.bias, 0)
def forward(self, x):
outputs = []
for layer in self.features:
x = layer(x)
if self.feature_dims.get(id(layer)):
outputs.append(x)
for i, norm_layer in enumerate(self.feature_norms):
outputs[i] = norm_layer(outputs[i])
if self.training:
self.last_outputs = outputs
return outputs
def vgg16(extra_cfg=None):
model_cfg = [64, 64, 'M',
128, 128, 'M',
256, 256, 256, 'M',
512, 512, 512, 'M',
512, 512, 512, 'M']
return VGG(model_cfg, extra_cfg)
def vgg16_reduced(scale=300):
if scale == 300:
extra_cfg = [(256, (3, 2, 1)),
(128, (3, 2, 1)),
(128, (3, 1, 0)),
(128, (3, 1, 0))]
elif scale == 512:
extra_cfg = [(256, (3, 2, 1)),
(128, (3, 2, 1)),
(128, (3, 2, 1)),
(128, (3, 2, 1)),
(128, (4, 1, 1))]
else:
raise ValueError('Unsupported scale: {}'.format(scale))
return vgg16(extra_cfg)
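# Each extra_cfg entry reads (channels, (kernel_size, stride, padding)) and
# expands into a 1x1 reduction to `channels` followed by a kxk convolution
# that doubles it.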
registry.backbone.register('vgg16_reduced_300', vgg16_reduced, scale=300)
registry.backbone.register('vgg16_reduced_512', vgg16_reduced, scale=512)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Models."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
# Modules.
from seetadet.models import backbones
from seetadet.models import decoders
from seetadet.models import dense_heads
from seetadet.models import detectors
from seetadet.models import roi_heads
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Backbones."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
# Modules
from seetadet.models.backbones import airnet
from seetadet.models.backbones import bifpn
from seetadet.models.backbones import efficientnet
from seetadet.models.backbones import fpn
from seetadet.models.backbones import mobilenet_v2
from seetadet.models.backbones import mobilenet_v3
from seetadet.models.backbones import resnet
from seetadet.models.backbones import repvgg
from seetadet.models.backbones import vgg
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Vanilla FPN neck."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.models.build import BACKBONES
from seetadet.ops.conv import ConvNorm2d
@BACKBONES.register('fpn')
class FPN(nn.Module):
"""FPN to enhance input features."""
def __init__(self, in_dims):
super(FPN, self).__init__()
lateral_conv_module = functools.partial(
ConvNorm2d, norm_type=cfg.FPN.NORM)
output_conv_module = functools.partial(
ConvNorm2d, norm_type=cfg.FPN.NORM, conv_type=cfg.FPN.CONV)
self.dim = cfg.FPN.DIM
self.min_lvl = cfg.FPN.MIN_LEVEL
self.max_lvl = cfg.FPN.MAX_LEVEL
self.highest_lvl = min(self.max_lvl, len(in_dims))
self.coarsest_stride = cfg.BACKBONE.COARSEST_STRIDE
self.out_dims = [self.dim] * (self.max_lvl - self.min_lvl + 1)
self.lateral_conv = nn.ModuleList()
self.output_conv = nn.ModuleList()
for dim in in_dims[self.min_lvl - 1:self.highest_lvl + 1]:
self.lateral_conv += [lateral_conv_module(dim, self.dim, 1)]
self.output_conv += [output_conv_module(self.dim, self.dim, 3)]
if 'rcnn' not in cfg.MODEL.TYPE:
for lvl in range(self.highest_lvl + 1, self.max_lvl + 1):
dim = in_dims[-1] if lvl == self.highest_lvl + 1 else self.dim
self.output_conv += [output_conv_module(dim, self.dim, 3, stride=2)]
def forward(self, features):
features = features[self.min_lvl - 1:self.highest_lvl + 1]
laterals = [conv(x) for conv, x in zip(self.lateral_conv, features)]
for i in range(len(features) - 1, 0, -1):
y, x = laterals[i - 1], laterals[i]
scale = 2 if self.coarsest_stride > 1 else None
size = None if self.coarsest_stride > 1 else y.shape[2:]
y += nn.functional.interpolate(x, size, scale)
outputs = [conv(x) for conv, x in zip(self.output_conv, laterals)]
if len(self.output_conv) <= len(self.lateral_conv):
for _ in range(len(outputs), len(self.out_dims)):
outputs.append(nn.functional.max_pool2d(outputs[-1], 1, stride=2))
else:
outputs.append(self.output_conv[len(outputs)](features[-1]))
for i in range(len(outputs), len(self.out_dims)):
outputs.append(self.output_conv[i](nn.functional.relu(outputs[-1])))
return outputs
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""MobileNetV2 backbone."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.models.build import BACKBONES
from seetadet.ops.conv import ConvNorm2d
def make_divisible(v, divisor=8):
"""Return the divisible value."""
min_value = divisor
new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
if new_v < 0.9 * v:
new_v += divisor
return new_v
class InvertedResidual(nn.Module):
"""Invert residual block."""
def __init__(self, dim_in, dim_out, kernel_size=3, stride=1, expand_ratio=6):
super(InvertedResidual, self).__init__()
conv_module = functools.partial(
ConvNorm2d, norm_type=cfg.BACKBONE.NORM,
activation_type='ReLU6')
self.has_endpoint = stride == 2
self.apply_shortcut = stride == 1 and dim_in == dim_out
self.dim = dim = int(round(dim_in * expand_ratio))
self.conv1 = (conv_module(dim_in, dim, 1)
if expand_ratio > 1 else nn.Identity())
self.conv2 = conv_module(dim, dim, kernel_size, stride, groups=dim)
self.conv3 = conv_module(dim, dim_out, 1, activation_type='')
def forward(self, x):
shortcut = x
x = self.conv1(x)
if self.has_endpoint:
self.endpoint = x
x = self.conv2(x)
x = self.conv3(x)
if self.apply_shortcut:
return x.add_(shortcut)
return x
class MobileNetV2(nn.Module):
"""MobileNetV2 class."""
def __init__(self, depths, dims, strides, expand_ratios, width_mult=1.0):
super(MobileNetV2, self).__init__()
conv_module = functools.partial(
ConvNorm2d, norm_type=cfg.BACKBONE.NORM,
activation_type='ReLU6')
dims = list(map(lambda x: make_divisible(x * width_mult), dims))
self.conv1 = conv_module(3, dims[0], 3, 2)
dim_in, blocks = dims[0], []
self.out_indices, self.out_dims = [], []
for i, (depth, dim) in enumerate(zip(depths, dims[1:-1])):
for j in range(depth):
stride = strides[i] if j == 0 else 1
blocks.append(InvertedResidual(
dim_in, dim, stride=stride,
expand_ratio=expand_ratios[i]))
if blocks[-1].has_endpoint:
self.out_indices.append(len(blocks) - 1)
self.out_dims.append(blocks[-1].dim)
dim_in = dim
setattr(self, 'layer%d' % (i + 1), nn.Sequential(*blocks[-depth:]))
self.conv2 = conv_module(dim_in, dims[-1], 1)
self.blocks = blocks + [self.conv2]
self.out_dims.append(dims[-1])
self.out_indices.append(len(self.blocks) - 1)
def forward(self, x):
x = self.conv1(x)
outputs = []
for i, blk in enumerate(self.blocks):
x = blk(x)
if i in self.out_indices:
outputs.append(blk.__dict__.pop('endpoint', x))
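# pop() consumes the cached pre-downsample expansion feature (falling
# back to x), so the endpoint tensor is not retained across forward passes.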
return outputs
BACKBONES.register(
'mobilenet_v2', MobileNetV2,
dims=(32,) + (16, 24, 32, 64, 96, 160, 320) + (1280,),
depths=(1, 2, 3, 4, 3, 3, 1),
strides=(1, 2, 2, 2, 1, 2, 1),
expand_ratios=(1, 6, 6, 6, 6, 6, 6))
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""MobileNetV3 backbone."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.models.build import BACKBONES
from seetadet.ops.conv import ConvNorm2d
def make_divisible(v, divisor=8):
"""Return the divisible value."""
min_value = divisor
new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
if new_v < 0.9 * v:
new_v += divisor
return new_v
class SqueezeExcite(nn.Module):
"""Squeeze-and-Excitation block."""
def __init__(self, dim_in, dim):
super(SqueezeExcite, self).__init__()
self.conv1 = nn.Conv2d(dim_in, dim, 1)
self.conv2 = nn.Conv2d(dim, dim_in, 1)
self.activation1 = nn.ReLU(True)
self.activation2 = nn.Hardsigmoid(True)
def forward(self, x):
scale = x.mean((2, 3), keepdim=True)
scale = self.activation1(self.conv1(scale))
scale = self.activation2(self.conv2(scale))
return x * scale
class InvertedResidual(nn.Module):
"""Invert residual block."""
def __init__(
self,
dim_in,
dim_out,
kernel_size=3,
stride=1,
expand_ratio=3,
squeeze_ratio=1,
activation_type='ReLU',
):
super(InvertedResidual, self).__init__()
conv_module = functools.partial(
ConvNorm2d, norm_type=cfg.BACKBONE.NORM,
activation_type=activation_type)
self.has_endpoint = stride == 2
self.apply_shortcut = stride == 1 and dim_in == dim_out
self.dim = dim = int(round(dim_in * expand_ratio))
self.conv1 = (conv_module(dim_in, dim, 1)
if expand_ratio > 1 else nn.Identity())
self.conv2 = conv_module(dim, dim, kernel_size, stride, groups=dim)
self.se = (SqueezeExcite(dim, make_divisible(dim * squeeze_ratio))
if squeeze_ratio < 1 else nn.Identity())
self.conv3 = conv_module(dim, dim_out, 1, activation_type='')
def forward(self, x):
shortcut = x
x = self.conv1(x)
if self.has_endpoint:
self.endpoint = x
x = self.conv2(x)
x = self.se(x)
x = self.conv3(x)
if self.apply_shortcut:
return x.add_(shortcut)
return x
class MobileNetV3(nn.Module):
"""MobileNetV3 class."""
def __init__(self, depths, dims, kernel_sizes, strides,
expand_ratios, squeeze_ratios, width_mult=1.0):
super(MobileNetV3, self).__init__()
conv_module = functools.partial(
ConvNorm2d, norm_type=cfg.BACKBONE.NORM,
activation_type='Hardswish')
dims = list(map(lambda x: make_divisible(x * width_mult), dims))
self.conv1 = conv_module(3, dims[0], 3, 2)
dim_in, blocks, coarsest_stride = dims[0], [], 2
self.out_indices, self.out_dims = [], []
for i, (depth, dim) in enumerate(zip(depths, dims[1:])):
coarsest_stride *= strides[i]
layer_expand_ratios = expand_ratios[i]
if not isinstance(layer_expand_ratios, (tuple, list)):
layer_expand_ratios = [layer_expand_ratios]
layer_expand_ratios = list(layer_expand_ratios)
layer_expand_ratios += ([layer_expand_ratios[-1]] *
(depth - len(layer_expand_ratios)))
for j in range(depth):
blocks.append(InvertedResidual(
dim_in, dim,
kernel_size=kernel_sizes[i],
stride=strides[i] if j == 0 else 1,
expand_ratio=layer_expand_ratios[j],
squeeze_ratio=squeeze_ratios[i],
activation_type='Hardswish'
if coarsest_stride >= 16 else 'ReLU'))
if blocks[-1].has_endpoint:
self.out_indices.append(len(blocks) - 1)
self.out_dims.append(blocks[-1].dim)
dim_in = dim
setattr(self, 'layer%d' % (i + 1), nn.Sequential(*blocks[-depth:]))
self.conv2 = conv_module(dim_in, blocks[-1].dim, 1)
self.blocks = blocks + [self.conv2]
self.out_dims.append(blocks[-1].dim)
self.out_indices.append(len(self.blocks) - 1)
def forward(self, x):
x = self.conv1(x)
outputs = []
for i, blk in enumerate(self.blocks):
x = blk(x)
if i in self.out_indices:
outputs.append(blk.__dict__.pop('endpoint', x))
return outputs
BACKBONES.register(
'mobilenet_v3_large', MobileNetV3,
dims=(16,) + (16, 24, 40, 80, 112, 160),
depths=(1, 2, 3, 4, 2, 3),
kernel_sizes=(3, 3, 5, 3, 3, 5),
strides=(1, 2, 2, 2, 1, 2),
expand_ratios=(1, (4, 3), 3, (6, 2.5, 2.3, 2.3), 6, 6),
squeeze_ratios=(1, 1, 0.25, 1, 0.25, 0.25))
BACKBONES.register(
'mobilenet_v3_small', MobileNetV3,
dims=(16,) + (16, 24, 40, 48, 96),
depths=(1, 2, 3, 2, 3),
kernel_sizes=(3, 3, 5, 5, 5),
strides=(2, 2, 2, 1, 2),
expand_ratios=(1, (4.5, 88. / 24), (4, 6, 6), 3, 6),
squeeze_ratios=(0.25, 1, 0.25, 0.25, 0.25))
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""ResNet backbone."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import itertools
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.core.training.utils import freeze_module
from seetadet.models.build import BACKBONES
from seetadet.ops.build import build_norm
class BasicBlock(nn.Module):
"""The basic resnet block."""
expansion = 1
def __init__(self, dim_in, dim, stride=1, downsample=None):
super(BasicBlock, self).__init__()
self.conv1 = nn.Conv2d(dim_in, dim, 3, stride, padding=1, bias=False)
self.bn1 = build_norm(dim, cfg.BACKBONE.NORM)
self.relu = nn.ReLU(True)
self.conv2 = nn.Conv2d(dim, dim, 3, padding=1, bias=False)
self.bn2 = build_norm(dim, cfg.BACKBONE.NORM)
self.downsample = downsample
def forward(self, x):
shortcut = x
x = self.relu(self.bn1(self.conv1(x)))
x = self.bn2(self.conv2(x))
if self.downsample is not None:
shortcut = self.downsample(shortcut)
return self.relu(x.add_(shortcut))
class Bottleneck(nn.Module):
"""The bottleneck resnet block."""
expansion = 4
groups, width_per_group = 1, 64
def __init__(self, dim_in, dim, stride=1, downsample=None):
super(Bottleneck, self).__init__()
width = int(dim * (self.width_per_group / 64.)) * self.groups
self.conv1 = nn.Conv2d(dim_in, width, 1, bias=False)
self.bn1 = build_norm(width, cfg.BACKBONE.NORM)
self.conv2 = nn.Conv2d(width, width, 3, stride, padding=1, bias=False)
self.bn2 = build_norm(width, cfg.BACKBONE.NORM)
self.conv3 = nn.Conv2d(width, dim * self.expansion, 1, bias=False)
self.bn3 = build_norm(dim * self.expansion, cfg.BACKBONE.NORM)
self.relu = nn.ReLU(True)
self.downsample = downsample
def forward(self, x):
shortcut = x
x = self.relu(self.bn1(self.conv1(x)))
x = self.relu(self.bn2(self.conv2(x)))
x = self.bn3(self.conv3(x))
if self.downsample is not None:
shortcut = self.downsample(shortcut)
return self.relu(x.add_(shortcut))
class ResNet(nn.Module):
"""ResNet class."""
def __init__(self, block, depths):
super(ResNet, self).__init__()
dim_in, dims, blocks = 64, [64, 128, 256, 512], []
self.out_indices = [v - 1 for v in itertools.accumulate(depths)]
self.out_dims = [dim_in] + [v * block.expansion for v in dims]
self.conv1 = nn.Conv2d(3, dim_in, 7, 2, padding=3, bias=False)
self.bn1 = build_norm(dim_in, cfg.BACKBONE.NORM)
self.relu = nn.ReLU(True)
self.maxpool = nn.MaxPool2d(3, 2, padding=1)
for i, depth, dim in zip(range(4), depths, dims):
downsample, stride = None, 1 if i == 0 else 2
if stride != 1 or dim_in != dim * block.expansion:
downsample = nn.Sequential(
nn.Conv2d(dim_in, dim * block.expansion, 1, stride, bias=False),
build_norm(dim * block.expansion, cfg.BACKBONE.NORM))
blocks.append(block(dim_in, dim, stride, downsample))
dim_in = dim * block.expansion
for _ in range(depth - 1):
blocks.append(block(dim_in, dim))
setattr(self, 'layer%d' % (i + 1), nn.Sequential(*blocks[-depth:]))
self.blocks = blocks
self.reset_parameters()
def reset_parameters(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.kaiming_normal_(
m.weight, mode='fan_out', nonlinearity='relu')
num_freeze_stages = cfg.BACKBONE.FREEZE_AT
if num_freeze_stages > 0:
self.conv1.apply(freeze_module)
self.bn1.apply(freeze_module)
for i in range(num_freeze_stages - 1, 0, -1):
getattr(self, 'layer%d' % i).apply(freeze_module)
def forward(self, x):
x = self.relu(self.bn1(self.conv1(x)))
x = self.maxpool(x)
outputs = [None]
for i, blk in enumerate(self.blocks):
x = blk(x)
if i in self.out_indices:
outputs.append(x)
return outputs
BACKBONES.register('resnet18', func=ResNet, block=BasicBlock, depths=[2, 2, 2, 2])
BACKBONES.register('resnet34', func=ResNet, block=BasicBlock, depths=[3, 4, 6, 3])
BACKBONES.register('resnet50', func=ResNet, block=Bottleneck, depths=[3, 4, 6, 3])
BACKBONES.register('resnet101', func=ResNet, block=Bottleneck, depths=[3, 4, 23, 3])
BACKBONES.register('resnet152', func=ResNet, block=Bottleneck, depths=[3, 8, 36, 3])
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""VGGNet backbone."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import itertools
import functools
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.models.build import BACKBONES
from seetadet.ops.build import build_norm
from seetadet.ops.conv import ConvNorm2d
from seetadet.ops.normalization import L2Norm
class VGGBlock(nn.Module):
"""The VGG block."""
def __init__(self, dim_in, dim, downsample=None):
super(VGGBlock, self).__init__()
self.conv = nn.Conv2d(dim_in, dim, 3, padding=1,
bias=not cfg.BACKBONE.NORM)
self.bn = build_norm(dim, cfg.BACKBONE.NORM)
self.relu = nn.ReLU(True)
self.downsample = downsample
def forward(self, x):
if self.downsample is not None:
x = self.downsample(x)
return self.relu(self.bn(self.conv(x)))
class VGG(nn.Module):
"""VGGNet."""
def __init__(self, depths):
super(VGG, self).__init__()
dim_in, dims, blocks = 3, [64, 128, 256, 512, 512], []
self.out_indices = [v - 1 for v in itertools.accumulate(depths)][1:]
self.out_dims = dims[1:]
for i, (depth, dim) in enumerate(zip(depths, dims)):
downsample = nn.MaxPool2d(2, 2, ceil_mode=True) if i > 0 else None
blocks.append(VGGBlock(dim_in, dim, downsample))
for _ in range(depth - 1):
blocks.append(VGGBlock(dim, dim))
setattr(self, 'layer%d' % i, nn.Sequential(*blocks[-depth:]))
dim_in = dim
self.blocks = blocks
self.reset_parameters()
def reset_parameters(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.kaiming_normal_(
m.weight, mode='fan_out', nonlinearity='relu')
def forward(self, x):
outputs = []
for i, blk in enumerate(self.blocks):
x = blk(x)
if i in self.out_indices:
outputs.append(x)
return outputs
class VGGFCN(VGG):
"""Fully convolutional VGGNet in SSD."""
def __init__(self, depths):
super(VGGFCN, self).__init__(depths)
dim_in, out_index = self.out_dims[-1], self.out_indices[-1]
self.blocks.append(nn.Sequential(
nn.MaxPool2d(3, 1, padding=1),  # pool5: 3x3, stride 1 keeps spatial size
nn.Conv2d(dim_in, 1024, 3, padding=6, dilation=6),
nn.ReLU(True)))
self.blocks.append(nn.Sequential(nn.Conv2d(1024, 1024, 1), nn.ReLU(True)))
self.layer4.add_module(str(len(self.layer4)), self.blocks[-2])
self.layer4.add_module(str(len(self.layer4)), self.blocks[-1])
self.out_dims = [self.out_dims[-2], 1024] # conv4_3, fc7
self.out_indices = [self.out_indices[-2], out_index + 2] # 9, 14
self.norm = L2Norm(dim_in, init=20.0)
def forward(self, x):
outputs = super(VGGFCN, self).forward(x)
outputs[0] = self.norm(outputs[0])
return outputs
class SSDNeck(nn.Module):
"""Feature Pyramid Network."""
def __init__(self, in_dims, out_dims, kernel_sizes, strides, paddings):
super(SSDNeck, self).__init__()
self.out_dims = list(in_dims[-2:]) + list(out_dims)
dim_in, self.blocks = in_dims[-1], nn.ModuleList()
conv_module = functools.partial(
ConvNorm2d, conv_type=cfg.FPN.CONV,
norm_type=cfg.FPN.NORM, activation_type=cfg.FPN.ACTIVATION)
for dim, kernel_size, stride, padding in zip(
out_dims, kernel_sizes, strides, paddings):
self.blocks.append(conv_module(dim_in, dim // 2, 1))
self.blocks.append(conv_module(dim // 2, dim, kernel_size, stride, padding))
dim_in = dim
def forward(self, features):
x, outputs = features[-1], features[-2:]
for i, blk in enumerate(self.blocks):
x = blk(x)
if i % 2 > 0:
outputs.append(x)
return outputs
BACKBONES.register('vgg16', VGG, depths=(2, 2, 3, 3, 3))
BACKBONES.register('vgg16_fcn', VGGFCN, depths=(2, 2, 3, 3, 3))
BACKBONES.register(
'ssd300', SSDNeck,
out_dims=(512, 256, 256, 256),
kernel_sizes=(3, 3, 3, 3),
strides=(2, 2, 1, 1),
paddings=(1, 1, 0, 0))
BACKBONES.register(
'ssd512', SSDNeck,
out_dims=(512, 256, 256, 256, 256),
kernel_sizes=(3, 3, 3, 3, 4),
strides=(2, 2, 2, 2, 1),
paddings=(1, 1, 1, 1, 1))
BACKBONES.register(
'ssdlite', SSDNeck,
out_dims=(512, 256, 256, 128),
kernel_sizes=(3, 3, 3, 3),
strides=(2, 2, 2, 2),
paddings=(1, 1, 1, 1))
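
# Editor's sketch (illustration only): output dimensions of the `ssd300` neck
# above when paired with the `vgg16_fcn` backbone, whose `out_dims` are
# [512, 1024] (conv4_3 and fc7). The neck keeps the last two backbone maps and
# appends one map per extra 1x1/3x3 conv pair.
def _ssd300_neck_demo():  # hypothetical helper, illustration only
    neck = SSDNeck(in_dims=[512, 1024],
                   out_dims=(512, 256, 256, 256),
                   kernel_sizes=(3, 3, 3, 3),
                   strides=(2, 2, 1, 1),
                   paddings=(1, 1, 0, 0))
    assert neck.out_dims == [512, 1024, 512, 256, 256, 256]
    return neck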
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Build for models."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.core.registry import Registry
from seetadet.utils.profiler import Timer
BACKBONES = Registry('backbones')
DETECTORS = Registry('detectors')
def build_backbone():
"""Build the backbone."""
backbone_types = cfg.BACKBONE.TYPE.lower().split('.')
backbone = BACKBONES.get(backbone_types[0])()
backbone_dims = backbone.out_dims
neck = nn.Identity()
if len(backbone_types) > 1:
neck = BACKBONES.get(backbone_types[1])(backbone_dims)
else:
neck.out_dims = backbone_dims
return backbone, neck
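
# Editor's sketch (illustration only): `cfg.BACKBONE.TYPE` composes a backbone
# and an optional neck with a dotted name. For example, 'vgg16_fcn.ssd300'
# builds the `vgg16_fcn` backbone and feeds its `out_dims` into the `ssd300`
# neck, while a plain 'resnet50' leaves the neck as `nn.Identity()`.
def _build_backbone_demo():  # hypothetical helper, illustration only
    cfg.BACKBONE.TYPE = 'vgg16_fcn.ssd300'
    backbone, neck = build_backbone()
    return backbone, neck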
def build_detector(device=None, weights=None, training=False):
"""Create a detector instance.
Parameters
----------
device : int, optional
The index of compute device.
weights : str, optional
The path of weight file.
training : bool, optional, default=False
Whether to build the detector for training.
"""
model_cls = DETECTORS.get(cfg.MODEL.TYPE)
if model_cls is None:
    raise ValueError('Unknown detector: ' + cfg.MODEL.TYPE)
model = model_cls()
if weights is not None:
model.load_weights(weights, strict=True)
if device is not None:
model.cuda(device)
if not training:
model.eval()
model.optimize_for_inference()
model.timers = collections.defaultdict(Timer)
return model
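
# Editor's usage sketch (illustration only; assumes `cfg.MODEL.TYPE` names a
# registered detector and `weights_path` points to a valid checkpoint).
def _build_eval_detector_demo(weights_path):  # hypothetical helper
    # Weights loaded, moved to GPU 0, switched to eval mode, and fused
    # for inference by `optimize_for_inference()`.
    return build_detector(device=0, weights=weights_path, training=False)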
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""RetinaNet decoder."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm.torch import autograd
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.data.anchors.rpn import AnchorGenerator
class RetinaNetDecoder(nn.Module):
"""Decode predictions from retinanet."""
def __init__(self):
super(RetinaNetDecoder, self).__init__()
self.anchor_generator = AnchorGenerator(
strides=cfg.ANCHOR_GENERATOR.STRIDES,
sizes=cfg.ANCHOR_GENERATOR.SIZES,
aspect_ratios=cfg.ANCHOR_GENERATOR.ASPECT_RATIOS,
scales_per_octave=3)
self.pre_nms_top_n = cfg.RETINANET.PRE_NMS_TOP_N
self.score_thresh = float(cfg.TEST.SCORE_THRESH)
def forward(self, inputs):
input_tags = ['cls_score', 'bbox_pred', 'im_info', 'grid_info']
return autograd.Function.apply(
'RetinaNetDecoder',
inputs['cls_score'].device,
inputs=[inputs[k] for k in input_tags],
strides=self.anchor_generator.strides,
ratios=self.anchor_generator.aspect_ratios[0],
scales=self.anchor_generator.scales[0],
pre_nms_top_n=self.pre_nms_top_n,
score_thresh=self.score_thresh,
)
autograd.Function.register(
'RetinaNetDecoder', lambda **kwargs: {
'strides': kwargs.get('strides', []),
'ratios': kwargs.get('ratios', []),
'scales': kwargs.get('scales', []),
'pre_nms_top_n': kwargs.get('pre_nms_top_n', 1),
'score_thresh': kwargs.get('score_thresh', 0.),
'check_device': False,
})
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""RPN decoder."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm.torch import autograd
from dragon.vm.torch import nn
import numpy as np
from seetadet.core.config import cfg
from seetadet.data.anchors.rpn import AnchorGenerator
from seetadet.utils.bbox import bbox_transform_inv
from seetadet.utils.bbox import clip_boxes
from seetadet.utils.nms import gpu_nms
class RPNDecoder(nn.Module):
"""Generate proposal regions from RPN."""
def __init__(self):
super(RPNDecoder, self).__init__()
self.anchor_generator = AnchorGenerator(
strides=cfg.ANCHOR_GENERATOR.STRIDES,
sizes=cfg.ANCHOR_GENERATOR.SIZES,
aspect_ratios=cfg.ANCHOR_GENERATOR.ASPECT_RATIOS)
self.min_level = cfg.FRCNN.MIN_LEVEL
self.max_level = cfg.FRCNN.MAX_LEVEL
self.pre_nms_top_n = {True: cfg.RPN.PRE_NMS_TOP_N_TRAIN,
False: cfg.RPN.PRE_NMS_TOP_N_TEST}
self.post_nms_top_n = {True: cfg.RPN.POST_NMS_TOP_N_TRAIN,
False: cfg.RPN.POST_NMS_TOP_N_TEST}
self.nms_thresh = float(cfg.RPN.NMS_THRESH)
def decode_proposals(self, scores, deltas, anchors, im_info):
pre_nms_top_n = self.pre_nms_top_n[self.training]
post_nms_top_n = self.post_nms_top_n[self.training]
if pre_nms_top_n <= 0 or pre_nms_top_n >= len(scores):
order = np.argsort(-scores.squeeze())
else:
# Avoid sorting possibly large arrays: first partition to get the
# top K unsorted, then sort just those (~20x faster for 200k scores).
inds = np.argpartition(-scores.squeeze(), pre_nms_top_n)[:pre_nms_top_n]
order = np.argsort(-scores[inds].squeeze())
order = inds[order]
scores, deltas, anchors = scores[order], deltas[order], anchors[order]
# Convert anchors into proposals.
proposals = bbox_transform_inv(anchors, deltas)
proposals = clip_boxes(proposals, im_info[:2])
# Apply NMS.
keep = gpu_nms(np.hstack((proposals, scores)), self.nms_thresh)
keep = keep[:post_nms_top_n] if post_nms_top_n > 0 else keep
return proposals[keep, :].astype('float32', copy=False)
def forward_train(self, inputs):
shapes = [x[:2] for x in inputs['grid_info']]
anchors = self.anchor_generator.get_anchors(shapes)
cls_score = inputs['cls_score'].numpy()
bbox_pred = inputs['bbox_pred'].permute(0, 2, 1).numpy()
all_rois, batch_size = [], cls_score.shape[0]
for batch_ind in range(batch_size):
scores = cls_score[batch_ind].reshape((-1, 1))
deltas = bbox_pred[batch_ind]
im_info = inputs['im_info'][batch_ind]
proposals = self.decode_proposals(scores, deltas, anchors, im_info)
batch_inds = np.full((proposals.shape[0], 1), batch_ind, 'float32')
all_rois.append(np.hstack((batch_inds, proposals)))
return np.concatenate(all_rois)
def forward(self, inputs):
if self.training:
return self.forward_train(inputs)
input_tags = ['cls_score', 'bbox_pred', 'im_info', 'grid_info']
return autograd.Function.apply(
'RPNDecoder',
inputs['cls_score'].device,
inputs=[inputs[k] for k in input_tags],
outputs=[None] * (self.max_level - self.min_level + 1),
strides=self.anchor_generator.strides,
ratios=self.anchor_generator.aspect_ratios[0],
scales=self.anchor_generator.scales[0],
min_level=self.min_level,
max_level=self.max_level,
pre_nms_top_n=self.pre_nms_top_n[False],
post_nms_top_n=self.post_nms_top_n[False],
nms_thresh=self.nms_thresh,
)
autograd.Function.register(
'RPNDecoder', lambda **kwargs: {
'strides': kwargs.get('strides', []),
'ratios': kwargs.get('ratios', []),
'scales': kwargs.get('scales', []),
'pre_nms_top_n': kwargs.get('pre_nms_top_n', 6000),
'post_nms_top_n': kwargs.get('post_nms_top_n', 1000),
'nms_thresh': kwargs.get('nms_thresh', 0.7),
'min_level': kwargs.get('min_level', 2),
'max_level': kwargs.get('max_level', 5),
'check_device': False,
})
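
# Editor's sketch (assumes numpy; illustration only) -- the partition-then-sort
# trick used in `RPNDecoder.decode_proposals` above: `argpartition` isolates
# the top K in O(n), so only K items need the full O(k log k) sort afterwards.
def _topk_scores_demo():  # hypothetical helper, illustration only
    scores = np.random.rand(200000).astype('float32')
    k = 6000
    inds = np.argpartition(-scores, k)[:k]   # top K, in arbitrary order
    order = inds[np.argsort(-scores[inds])]  # sort only the K survivors
    ref = np.argsort(-scores)[:k]            # reference: full sort
    assert np.array_equal(scores[order], scores[ref])
    return order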
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""RetinaNet head."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
import math
from dragon.vm import torch
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.data.targets.retinanet import AnchorTargets
from seetadet.ops.build import build_activation
from seetadet.ops.build import build_loss
from seetadet.ops.build import build_norm
from seetadet.ops.conv import ConvNorm2d
from seetadet.ops.fusion import fuse_conv_bn
class RetinaNetHead(nn.Module):
"""RetinaNet head."""
def __init__(self, in_dims):
super(RetinaNetHead, self).__init__()
conv_module = functools.partial(
ConvNorm2d, dim_in=in_dims[0], dim_out=in_dims[0],
kernel_size=3, conv_type=cfg.RETINANET.CONV)
norm_module = functools.partial(build_norm, norm_type=cfg.RETINANET.NORM)
self.conv_module = conv_module
self.dim_cls = len(cfg.MODEL.CLASSES) - 1
self.cls_conv = nn.ModuleList(
conv_module() for _ in range(cfg.RETINANET.NUM_CONV))
self.bbox_conv = nn.ModuleList(
conv_module() for _ in range(cfg.RETINANET.NUM_CONV))
self.cls_norm = nn.ModuleList()
self.bbox_norm = nn.ModuleList()
for _ in range(len(self.cls_conv)):
self.cls_norm.append(nn.ModuleList())
self.bbox_norm.append(nn.ModuleList())
for _ in range(len(in_dims)):
self.cls_norm[-1].append(norm_module(in_dims[0]))
self.bbox_norm[-1].append(norm_module(in_dims[0]))
self.targets = AnchorTargets()
num_anchors = self.targets.generator.num_cell_anchors(0)
self.cls_score = conv_module(dim_out=self.dim_cls * num_anchors)
self.bbox_pred = conv_module(dim_out=4 * num_anchors)
self.activation = build_activation(cfg.RETINANET.ACTIVATION, inplace=True)
self.cls_loss = build_loss('sigmoid_focal')
self.bbox_loss = build_loss(cfg.RETINANET.BBOX_REG_LOSS_TYPE, beta=0.1)
self.reset_parameters()
def reset_parameters(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.normal_(m.weight, std=0.01)
# Bias prior initialization for focal loss.
for name, param in self.cls_score.named_parameters():
if name.endswith('bias'):
nn.init.constant_(param, -math.log((1 - 0.01) / 0.01))
def optimize_for_inference(self):
"""Optimize modules for inference."""
if hasattr(self.cls_norm[0][0], 'momentum'):
cls_conv = nn.ModuleList()
bbox_conv = nn.ModuleList()
for i in range(len(self.cls_norm)):
cls_conv.append(nn.ModuleList())
bbox_conv.append(nn.ModuleList())
cls_state = self.cls_conv[i].state_dict()
bbox_state = self.bbox_conv[i].state_dict()
for j in range(len(self.cls_norm[i])):
cls_conv[i].append(self.conv_module()._apply(
lambda t: t.to(self.cls_norm[i][j].weight.device)))
bbox_conv[i].append(self.conv_module()._apply(
lambda t: t.to(self.bbox_norm[i][j].weight.device)))
cls_conv[i][j].load_state_dict(cls_state)
bbox_conv[i][j].load_state_dict(bbox_state)
fuse_conv_bn(cls_conv[i][j][-1], self.cls_norm[i][j])
fuse_conv_bn(bbox_conv[i][j][-1], self.bbox_norm[i][j])
self._modules['cls_conv'] = cls_conv
self._modules['bbox_conv'] = bbox_conv
def get_outputs(self, inputs):
"""Return the outputs."""
features = list(inputs['features'])
cls_score, bbox_pred = [], []
for j, feature in enumerate(features):
cls_input, box_input = feature, feature
for i in range(len(self.cls_conv)):
if isinstance(self.cls_conv[i], nn.ModuleList):
cls_input = self.cls_conv[i][j](cls_input)
box_input = self.bbox_conv[i][j](box_input)
else:
cls_input = self.cls_conv[i](cls_input)
box_input = self.bbox_conv[i](box_input)
cls_input = self.activation(self.cls_norm[i][j](cls_input))
box_input = self.activation(self.bbox_norm[i][j](box_input))
cls_score.append(self.cls_score(cls_input).reshape_((0, self.dim_cls, -1)))
bbox_pred.append(self.bbox_pred(box_input).reshape_((0, 4, -1)))
cls_score = torch.cat(cls_score, 2) if len(features) > 1 else cls_score[0]
bbox_pred = torch.cat(bbox_pred, 2) if len(features) > 1 else bbox_pred[0]
return {'cls_score': cls_score, 'bbox_pred': bbox_pred}
def get_losses(self, inputs, targets):
"""Return the losses."""
bbox_pred = inputs['bbox_pred'].permute(0, 2, 1)
bbox_pred = bbox_pred.flatten_(0, 1)[targets['bbox_inds']]
cls_loss = self.cls_loss(inputs['cls_score'], targets['labels'])
bbox_loss = self.bbox_loss(bbox_pred, targets['bbox_targets'],
targets['bbox_anchors'])
normalizer = targets['bbox_inds'].size(0)
cls_loss_weight = 1.0 / normalizer
bbox_loss_weight = cfg.RETINANET.BBOX_REG_LOSS_WEIGHT / normalizer
cls_loss = cls_loss.mul_(cls_loss_weight)
bbox_loss = bbox_loss.mul_(bbox_loss_weight)
return {'cls_loss': cls_loss, 'bbox_loss': bbox_loss}
def forward(self, inputs):
outputs = self.get_outputs(inputs)
if self.training:
targets = self.targets.compute(**inputs)
logits = {'cls_score': outputs['cls_score'].float(),
'bbox_pred': outputs['bbox_pred'].float()}
return self.get_losses(logits, targets)
else:
cls_score = outputs['cls_score'].permute(0, 2, 1)
cls_score = nn.functional.sigmoid(cls_score, inplace=True)
return {'cls_score': cls_score.float(),
'bbox_pred': outputs['bbox_pred'].float()}
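
# Editor's sketch (illustration only): the bias prior used in
# `reset_parameters` above. Setting the classification bias to
# -log((1 - pi) / pi) with pi = 0.01 makes sigmoid(score) start at ~0.01
# for every anchor, so training is not swamped by easy background loss in
# the first iterations (Lin et al., "Focal Loss for Dense Object Detection").
def _focal_bias_prior_demo():  # hypothetical helper, illustration only
    pi = 0.01
    bias = -math.log((1 - pi) / pi)
    prob = 1. / (1. + math.exp(-bias))  # sigmoid(bias)
    assert abs(prob - pi) < 1e-12
    return bias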
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""RPN head."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm import torch
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.data.targets.rpn import AnchorTargets
from seetadet.ops.build import build_loss
class RPNHead(nn.Module):
"""RPN head."""
def __init__(self, in_dims):
super(RPNHead, self).__init__()
self.targets = AnchorTargets()
num_anchors = self.targets.generator.num_cell_anchors(0)
self.output_conv = nn.Conv2d(in_dims[0], in_dims[0], 3, padding=1)
self.cls_score = nn.Conv2d(in_dims[0], num_anchors, 1)
self.bbox_pred = nn.Conv2d(in_dims[0], num_anchors * 4, 1)
self.activation = nn.ReLU(inplace=True)
self.cls_loss = nn.BCEWithLogitsLoss(reduction='mean')
self.bbox_loss = build_loss(cfg.RPN.BBOX_REG_LOSS_TYPE, beta=0.1)
self.reset_parameters()
def reset_parameters(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.normal_(m.weight, std=0.01)
def get_outputs(self, inputs):
"""Return the outputs."""
features = list(inputs['features'])
cls_score, bbox_pred = [], []
for x in features:
x = self.activation(self.output_conv(x))
cls_score.append(self.cls_score(x).reshape_((0, -1)))
bbox_pred.append(self.bbox_pred(x).reshape_((0, 4, -1)))
cls_score = torch.cat(cls_score, 1) if len(features) > 1 else cls_score[0]
bbox_pred = torch.cat(bbox_pred, 2) if len(features) > 1 else bbox_pred[0]
return {'rpn_cls_score': cls_score, 'rpn_bbox_pred': bbox_pred}
def get_losses(self, inputs, targets):
"""Return the losses."""
bbox_pred = inputs['bbox_pred'].permute(0, 2, 1)
bbox_pred = bbox_pred.index_select((0, 1), targets['bbox_inds'])
cls_score = inputs['cls_score'].index_select((0, 1), targets['cls_inds'])
cls_loss = self.cls_loss(cls_score, targets['labels'])
bbox_loss = self.bbox_loss(bbox_pred, targets['bbox_targets'],
targets['bbox_anchors'])
normalizer = cfg.RPN.BATCH_SIZE * cfg.TRAIN.IMS_PER_BATCH
bbox_loss_weight = cfg.RPN.BBOX_REG_LOSS_WEIGHT / normalizer
bbox_loss = bbox_loss.mul_(bbox_loss_weight)
return {'rpn_cls_loss': cls_loss, 'rpn_bbox_loss': bbox_loss}
def forward(self, inputs):
outputs = self.get_outputs(inputs)
rpn_cls_score = outputs.pop('rpn_cls_score').float()
outputs['rpn_bbox_pred'] = outputs['rpn_bbox_pred'].float()
outputs['rpn_cls_score'] = nn.functional.sigmoid(rpn_cls_score.data)
if self.training:
targets = self.targets.compute(**inputs)
logits = {'cls_score': rpn_cls_score,
'bbox_pred': outputs['rpn_bbox_pred']}
outputs.update(self.get_losses(logits, targets))
return outputs
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""SSD head."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
from dragon.vm import torch
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.data.targets.ssd import AnchorTargets
from seetadet.ops.build import build_loss
from seetadet.ops.conv import ConvNorm2d
class SSDHead(nn.Module):
"""SSD head."""
def __init__(self, in_dims):
super(SSDHead, self).__init__()
self.targets = AnchorTargets()
self.cls_score = nn.ModuleList()
self.bbox_pred = nn.ModuleList()
self.num_classes = len(cfg.MODEL.CLASSES)
conv_module = nn.Conv2d
if cfg.FPN.CONV == 'SepConv2d':
conv_module = functools.partial(ConvNorm2d, conv_type='SepConv2d')
conv_module = functools.partial(conv_module, kernel_size=3, padding=1)
for i, dim in enumerate(in_dims):
num_anchors = self.targets.generator.num_cell_anchors(i)
self.cls_score.append(conv_module(dim, num_anchors * self.num_classes))
self.bbox_pred.append(conv_module(dim, num_anchors * 4))
self.cls_loss = nn.CrossEntropyLoss(ignore_index=-1, reduction='sum')
self.bbox_loss = build_loss(cfg.SSD.BBOX_REG_LOSS_TYPE)
def get_outputs(self, inputs):
"""Return the outputs."""
features = list(inputs['features'])
cls_score, bbox_pred = [], []
for i, x in enumerate(features):
cls_score.append(self.cls_score[i](x).permute(0, 2, 3, 1).flatten_(1))
bbox_pred.append(self.bbox_pred[i](x).permute(0, 2, 3, 1).flatten_(1))
cls_score = torch.cat(cls_score, 1) if len(features) > 1 else cls_score[0]
bbox_pred = torch.cat(bbox_pred, 1) if len(features) > 1 else bbox_pred[0]
cls_score = cls_score.reshape_((0, -1, self.num_classes))
bbox_pred = bbox_pred.reshape_((0, -1, 4))
return {'cls_score': cls_score, 'bbox_pred': bbox_pred}
def get_losses(self, inputs, targets):
"""Return the losses."""
cls_score = inputs['cls_score'].flatten_(0, 1)
bbox_pred = inputs['bbox_pred'].flatten_(0, 1)
bbox_pred = bbox_pred[targets['bbox_inds']]
cls_loss = self.cls_loss(cls_score, targets['labels'])
bbox_loss = self.bbox_loss(bbox_pred, targets['bbox_targets'],
targets['bbox_anchors'])
normalizer = targets['bbox_inds'].size(0)
cls_loss_weight = 1.0 / normalizer
bbox_loss_weight = cfg.SSD.BBOX_REG_LOSS_WEIGHT / normalizer
cls_loss = cls_loss.mul_(cls_loss_weight)
bbox_loss = bbox_loss.mul_(bbox_loss_weight)
return {'cls_loss': cls_loss, 'bbox_loss': bbox_loss}
def forward(self, inputs):
outputs = self.get_outputs(inputs)
cls_score = outputs['cls_score']
if self.training:
cls_score_data = nn.functional.softmax(cls_score.data, dim=2)
targets = self.targets.compute(cls_score=cls_score_data, **inputs)
logits = {'cls_score': cls_score.float(),
'bbox_pred': outputs['bbox_pred'].float()}
return self.get_losses(logits, targets)
else:
cls_score = nn.functional.softmax(cls_score, dim=2, inplace=True)
return {'cls_score': cls_score.float(),
'bbox_pred': outputs['bbox_pred'].float()}
@@ -8,24 +8,15 @@
 # <https://opensource.org/licenses/BSD-2-Clause>
 #
 # ------------------------------------------------------------
+"""Detectors."""
 from __future__ import absolute_import
 from __future__ import division
 from __future__ import print_function
-from seetadet.algo import faster_rcnn
-from seetadet.algo import ssd
-from seetadet.core.config import cfg
-
-
-class DataLoader(object):
-    """Provide mini-batches of data."""
-
-    def __new__(cls):
-        pipeline_type = cfg.PIPELINE.TYPE.lower()
-        if pipeline_type == 'default' or pipeline_type == 'rcnn':
-            return faster_rcnn.DataLoader()
-        elif pipeline_type == 'ssd':
-            return ssd.DataLoader()
-        else:
-            raise ValueError('Unsupported pipeline: ' + pipeline_type)
+# Classes.
+from seetadet.models.detectors.detector import Detector
+from seetadet.models.detectors.faster_rcnn import FasterRCNN
+from seetadet.models.detectors.mask_rcnn import MaskRCNN
+from seetadet.models.detectors.retinanet import RetinaNet
+from seetadet.models.detectors.ssd import SSD
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Base detector."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm import torch
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.models.build import build_backbone
from seetadet.ops.fusion import get_fusion
from seetadet.ops.normalization import ToTensor
from seetadet.utils import logging
class Detector(nn.Module):
"""Class to build and compute the detection pipelines."""
def __init__(self):
super(Detector, self).__init__()
self.to_tensor = ToTensor()
self.backbone, self.neck = build_backbone()
self.backbone_dims = self.neck.out_dims
def get_inputs(self, inputs):
"""Return the detection inputs.
Parameters
----------
inputs : dict
The inputs.
"""
inputs['img'] = self.to_tensor(inputs['img'], normalize=True)
return inputs
def get_features(self, inputs):
"""Return the detection features.
Parameters
----------
inputs : dict
The inputs.
"""
return self.neck(self.backbone(inputs['img']))
def get_outputs(self, inputs):
"""Return the detection outputs.
Parameters
----------
inputs : dict
The inputs.
"""
return inputs
def forward(self, inputs):
"""Define the computation performed at every call.
Parameters
----------
inputs : dict
The inputs.
"""
return self.get_outputs(inputs)
def load_weights(self, weights, strict=False):
"""Load the state dict of this detector.
Parameters
----------
weights : str
The path of the weights file.
"""
return self.load_state_dict(torch.load(weights), strict=strict)
def optimize_for_inference(self):
"""Optimize the graph for the inference."""
# Set precision.
precision = cfg.MODEL.PRECISION.lower()
self.half() if precision == 'float16' else self.float()
logging.info('Set precision: ' + precision)
# Fuse modules.
fusion_memo, last_module = set(), None
for module in self.modules():
if module is self:
continue
if hasattr(module, 'optimize_for_inference'):
module.optimize_for_inference()
fusion_memo.add(module.__class__.__name__)
continue
key, fn = get_fusion(last_module, module)
if fn is not None:
fusion_memo.add(key)
fn(last_module, module)
last_module = module
for key in fusion_memo:
logging.info('Fuse modules: ' + key)
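
# Editor's usage sketch (illustration only): the intended inference flow for
# a `Detector` subclass. `optimize_for_inference` applies both steps above:
# casting to `cfg.MODEL.PRECISION` and fusing adjacent modules.
def _prepare_for_inference(detector, weights_path):  # hypothetical helper
    detector.load_weights(weights_path, strict=True)
    detector.eval()
    detector.optimize_for_inference()
    return detector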
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Faster R-CNN detector."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.data.targets.rcnn import ProposalTargets
from seetadet.models.build import DETECTORS
from seetadet.models.decoders.rpn import RPNDecoder
from seetadet.models.dense_heads.rpn import RPNHead
from seetadet.models.detectors.detector import Detector
from seetadet.models.roi_heads.fast_rcnn import FastRCNNHead
@DETECTORS.register('faster_rcnn')
class FasterRCNN(Detector):
"""Faster R-CNN detector."""
def __init__(self):
super(FasterRCNN, self).__init__()
self.rpn_head = RPNHead(self.backbone_dims)
self.bbox_head = FastRCNNHead(self.backbone_dims)
self.rpn_decoder = RPNDecoder()
self.proposal_targets = ProposalTargets()
def get_outputs(self, inputs):
"""Return the detection outputs."""
inputs = self.get_inputs(inputs)
inputs['features'] = self.get_features(inputs)
inputs['grid_info'] = inputs.pop(
'grid_info', [x.shape[-2:] for x in inputs['features']])
outputs = self.rpn_head(inputs)
inputs['rois'] = self.rpn_decoder({
'cls_score': outputs.pop('rpn_cls_score'),
'bbox_pred': outputs.pop('rpn_bbox_pred'),
'im_info': inputs['im_info'],
'grid_info': inputs['grid_info']})
if self.training:
targets = self.proposal_targets.compute(**inputs)
inputs['rois'] = targets['rois']
outputs.update(self.bbox_head(inputs, targets))
else:
outputs.update(self.bbox_head(inputs))
return outputs
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Mask R-CNN detector."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.data.targets.rcnn import ProposalTargets
from seetadet.models.build import DETECTORS
from seetadet.models.decoders.rpn import RPNDecoder
from seetadet.models.dense_heads.rpn import RPNHead
from seetadet.models.detectors.detector import Detector
from seetadet.models.roi_heads.fast_rcnn import FastRCNNHead
from seetadet.models.roi_heads.mask_rcnn import MaskRCNNHead
@DETECTORS.register('mask_rcnn')
class MaskRCNN(Detector):
"""Mask R-CNN detector."""
def __init__(self):
super(MaskRCNN, self).__init__()
self.rpn_head = RPNHead(self.backbone_dims)
self.bbox_head = FastRCNNHead(self.backbone_dims)
self.mask_head = MaskRCNNHead(self.backbone_dims)
self.rpn_decoder = RPNDecoder()
self.proposal_targets = ProposalTargets()
def get_outputs(self, inputs):
"""Return the detection outputs."""
inputs = self.get_inputs(inputs)
inputs['features'] = self.get_features(inputs)
inputs['grid_info'] = inputs.pop(
'grid_info', [x.shape[-2:] for x in inputs['features']])
outputs = self.rpn_head(inputs)
inputs['rois'] = self.rpn_decoder({
'cls_score': outputs.pop('rpn_cls_score'),
'bbox_pred': outputs.pop('rpn_bbox_pred'),
'im_info': inputs['im_info'],
'grid_info': inputs['grid_info']})
if self.training:
targets = self.proposal_targets.compute(**inputs)
inputs['rois'] = targets.pop('rois')
outputs.update(self.bbox_head(inputs, targets))
inputs['rois'] = targets.pop('fg_rois')
outputs.update(self.mask_head(inputs, targets))
else:
outputs.update(self.bbox_head(inputs))
self.outputs = {'features': inputs['features']}
return outputs
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""RetinaNet detector."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.models.build import DETECTORS
from seetadet.models.decoders.retinanet import RetinaNetDecoder
from seetadet.models.dense_heads.retinanet import RetinaNetHead
from seetadet.models.detectors.detector import Detector
@DETECTORS.register('retinanet')
class RetinaNet(Detector):
"""RetinaNet detector."""
def __init__(self):
super(RetinaNet, self).__init__()
self.bbox_head = RetinaNetHead(self.backbone_dims)
self.bbox_decoder = RetinaNetDecoder()
def get_outputs(self, inputs):
"""Compute detection outputs."""
inputs = self.get_inputs(inputs)
inputs['features'] = self.get_features(inputs)
inputs['grid_info'] = inputs.pop(
'grid_info', [x.shape[-2:] for x in inputs['features']])
outputs = self.bbox_head(inputs)
if not self.training:
outputs['dets'] = self.bbox_decoder({
'cls_score': outputs.pop('cls_score'),
'bbox_pred': outputs.pop('bbox_pred'),
'im_info': inputs['im_info'],
'grid_info': inputs['grid_info']})
return outputs
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""SSD detector."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.models.build import DETECTORS
from seetadet.models.dense_heads.ssd import SSDHead
from seetadet.models.detectors.detector import Detector
@DETECTORS.register('ssd')
class SSD(Detector):
"""SSD detector."""
def __init__(self):
super(SSD, self).__init__()
self.bbox_head = SSDHead(self.backbone_dims)
def get_outputs(self, inputs):
"""Compute detection outputs."""
inputs = self.get_inputs(inputs)
inputs['features'] = self.get_features(inputs)
outputs = self.bbox_head(inputs)
return outputs
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Fast-RCNN head."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
from dragon.vm import torch
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.ops.build import build_loss
from seetadet.ops.conv import ConvNorm2d
from seetadet.ops.vision import RoIPooler
class FastRCNNHead(nn.Module):
"""Fast R-CNN head."""
def __init__(self, in_dims):
super(FastRCNNHead, self).__init__()
conv_module = functools.partial(
ConvNorm2d, norm_type=cfg.FRCNN.NORM,
kernel_size=3, activation_type='ReLU')
self.output_conv = nn.ModuleList()
self.output_fc = nn.ModuleList()
for i in range(cfg.FRCNN.NUM_CONV):
dim = in_dims[0] if i == 0 else cfg.FRCNN.CONV_HEAD_DIM
self.output_conv += [conv_module(dim, cfg.FRCNN.CONV_HEAD_DIM)]
for i in range(cfg.FRCNN.NUM_FC):
dim = in_dims[0] * cfg.FRCNN.POOLER_RESOLUTION ** 2
dim = dim if i == 0 else cfg.FRCNN.FC_HEAD_DIM
self.output_fc += [nn.Sequential(nn.Linear(dim, cfg.FRCNN.FC_HEAD_DIM),
nn.ReLU(inplace=True))]
self.cls_score = nn.Linear(cfg.FRCNN.FC_HEAD_DIM, len(cfg.MODEL.CLASSES))
self.bbox_pred = nn.Linear(cfg.FRCNN.FC_HEAD_DIM, len(cfg.MODEL.CLASSES) * 4)
self.pooler = RoIPooler(
pooler_type=cfg.FRCNN.POOLER_TYPE,
resolution=cfg.FRCNN.POOLER_RESOLUTION,
sampling_ratio=cfg.FRCNN.POOLER_SAMPLING_RATIO)
self.cls_loss = nn.CrossEntropyLoss(ignore_index=-1, reduction='mean')
self.bbox_loss = build_loss(cfg.FRCNN.BBOX_REG_LOSS_TYPE)
self.spatial_scales = [1. / (2 ** lvl) for lvl in range(
cfg.FRCNN.MIN_LEVEL, cfg.FRCNN.MAX_LEVEL + 1)]
self.reset_parameters()
def reset_parameters(self):
nn.init.normal_(self.cls_score.weight, std=0.01)
nn.init.normal_(self.bbox_pred.weight, std=0.001)
def get_outputs(self, inputs):
x = torch.cat([self.pooler(
inputs['features'][i], inputs['rois'][i],
spatial_scale=spatial_scale) for i, spatial_scale
in enumerate(self.spatial_scales)])
for layer in self.output_conv:
x = layer(x)
x = x.flatten_(1)
for layer in self.output_fc:
x = layer(x)
cls_score, bbox_pred = self.cls_score(x), self.bbox_pred(x)
return {'cls_score': cls_score, 'bbox_pred': bbox_pred}
def get_losses(self, inputs, targets):
bbox_pred = inputs['bbox_pred'].reshape_((0, -1, 4))
bbox_pred = bbox_pred.index_select((0, 1), targets['bbox_inds'])
cls_loss = self.cls_loss(inputs['cls_score'], targets['labels'])
bbox_loss = self.bbox_loss(bbox_pred, targets['bbox_targets'])
normalizer = cfg.FRCNN.BATCH_SIZE * cfg.TRAIN.IMS_PER_BATCH
bbox_loss_weight = cfg.FRCNN.BBOX_REG_LOSS_WEIGHT / normalizer
bbox_loss = bbox_loss.mul_(bbox_loss_weight)
return {'cls_loss': cls_loss, 'bbox_loss': bbox_loss}
def forward(self, inputs, targets=None):
outputs = self.get_outputs(inputs)
if self.training:
logits = {'cls_score': outputs['cls_score'].float(),
'bbox_pred': outputs['bbox_pred'].float()}
return self.get_losses(logits, targets)
else:
outputs['cls_score'] = nn.functional.softmax(
outputs['cls_score'], dim=1, inplace=True)
return {'rois': torch.cat(inputs['rois']),
'cls_score': outputs['cls_score'].float(),
'bbox_pred': outputs['bbox_pred'].float()}
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Mask R-CNN head."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
from dragon.vm import torch
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.ops.conv import ConvNorm2d
from seetadet.ops.vision import RoIPooler
class MaskRCNNHead(nn.Module):
"""Mask R-CNN head."""
def __init__(self, in_dims):
super(MaskRCNNHead, self).__init__()
self.dim = cfg.MRCNN.CONV_HEAD_DIM
conv_module = functools.partial(
ConvNorm2d, norm_type=cfg.MRCNN.NORM,
kernel_size=3, activation_type='ReLU')
self.output_conv = nn.ModuleList()
for i in range(cfg.MRCNN.NUM_CONV):
dim = in_dims[0] if i == 0 else self.dim
self.output_conv += [conv_module(dim, self.dim)]
self.output_conv += [nn.Sequential(
nn.ConvTranspose2d(self.dim, self.dim, 2, 2),
nn.ReLU(True))]
self.mask_pred = nn.Conv2d(self.dim, len(cfg.MODEL.CLASSES) - 1, 1)
self.pooler = RoIPooler(
pooler_type=cfg.MRCNN.POOLER_TYPE,
resolution=cfg.MRCNN.POOLER_RESOLUTION,
sampling_ratio=cfg.MRCNN.POOLER_SAMPLING_RATIO)
self.mask_loss = nn.BCEWithLogitsLoss(reduction='valid')
self.spatial_scales = [1. / (2 ** lvl) for lvl in range(
cfg.FRCNN.MIN_LEVEL, cfg.FRCNN.MAX_LEVEL + 1)]
self.reset_parameters()
def reset_parameters(self):
nn.init.normal_(self.mask_pred.weight, std=0.001)
def get_outputs(self, inputs):
x = torch.cat([self.pooler(
inputs['features'][i], inputs['rois'][i],
spatial_scale=spatial_scale) for i, spatial_scale
in enumerate(self.spatial_scales)])
for layer in self.output_conv:
x = layer(x)
mask_pred = self.mask_pred(x)
return {'mask_pred': mask_pred}
def get_losses(self, inputs, targets):
mask_pred = inputs['mask_pred']
mask_pred = mask_pred.index_select((0, 1), targets['mask_inds'])
mask_loss = self.mask_loss(mask_pred, targets['mask_targets'])
return {'mask_loss': mask_loss}
def forward(self, inputs, targets=None):
outputs = self.get_outputs(inputs)
if self.training:
logits = {'mask_pred': outputs['mask_pred'].float()}
return self.get_losses(logits, targets)
else:
outputs['mask_pred'] = nn.functional.sigmoid(
outputs['mask_pred'].float(), inplace=True)
return {'mask_pred': outputs['mask_pred']}
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Detection modules."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm.torch import nn
from dragon.vm.torch import autograd
from seetadet.core.config import cfg
class _NonMaxSuppression(autograd.Function):
"""Filter out boxes that have high IoU with selected ones."""
def __init__(self, key, dev, **kwargs):
super(_NonMaxSuppression, self).__init__(key, dev, **kwargs)
self.iou_threshold = kwargs.get('iou_threshold', 0.5)
def attributes(self):
return {
'op_type': 'NonMaxSuppression',
'arguments': {'iou_threshold': self.iou_threshold}
}
def forward(self, input):
return self.dispatch([input], [self.alloc()])
class _RetinaNetDecoder(autograd.Function):
"""Decode predictions from RetinaNet."""
def __init__(self, key, dev, **kwargs):
super(_RetinaNetDecoder, self).__init__(key, dev, **kwargs)
self.args = kwargs
def attributes(self):
return {
'op_type': 'RetinaNetDecoder',
'arguments': {
'strides': self.args['strides'],
'ratios': self.args['ratios'],
'scales': self.args['scales'],
'pre_nms_top_n': self.args['pre_nms_top_n'],
'score_thresh': self.args['score_thresh'],
}
}
def forward(self, features, cls_prob, bbox_pred, ims_info):
inputs = features + [cls_prob, bbox_pred, ims_info]
self._check_device(inputs[:-1]) # Skip <ims_info>
return self.dispatch(inputs, [self.alloc()], check_device=False)
class _RPNDecoder(autograd.Function):
"""Decode proposal regions from RPN."""
def __init__(self, key, dev, **kwargs):
super(_RPNDecoder, self).__init__(key, dev, **kwargs)
self.args = kwargs
def attributes(self):
return {
'op_type': 'RPNDecoder',
'arguments': {
'strides': self.args['strides'],
'ratios': self.args['ratios'],
'scales': self.args['scales'],
'pre_nms_top_n': self.args['pre_nms_top_n'],
'post_nms_top_n': self.args['post_nms_top_n'],
'nms_thresh': self.args['nms_thresh'],
'min_level': self.args['min_level'],
'max_level': self.args['max_level'],
'canonical_scale': self.args['canonical_scale'],
'canonical_level': self.args['canonical_level'],
}
}
def forward(self, features, cls_prob, bbox_pred, im_info):
inputs = features + [cls_prob, bbox_pred, im_info]
self._check_device(inputs[:-1]) # Skip <im_info>
num_outputs = self.args['max_level'] - self.args['min_level'] + 1
outputs = [self.alloc() for _ in range(num_outputs)]
return self.dispatch(inputs, outputs, check_device=False)
def decode_retinanet(
features,
cls_prob,
bbox_pred,
ims_info,
strides,
ratios,
scales,
pre_nms_top_n,
score_thresh,
):
return _RetinaNetDecoder \
.instantiate(
cls_prob.device,
strides=strides,
ratios=ratios,
scales=scales,
pre_nms_top_n=pre_nms_top_n,
score_thresh=score_thresh,
).apply(features, cls_prob, bbox_pred, ims_info)
def decode_rpn(
features,
cls_prob,
bbox_pred,
im_info,
strides,
ratios,
scales,
pre_nms_top_n,
post_nms_top_n,
nms_thresh,
min_level,
max_level,
canonical_scale,
canonical_level,
):
return _RPNDecoder \
.instantiate(
cls_prob.device,
strides=strides,
ratios=ratios,
scales=scales,
pre_nms_top_n=pre_nms_top_n,
post_nms_top_n=post_nms_top_n,
nms_thresh=nms_thresh,
min_level=min_level,
max_level=max_level,
canonical_scale=canonical_scale,
canonical_level=canonical_level,
).apply(features, cls_prob, bbox_pred, im_info)
def nms(input, iou_threshold=0.5):
return _NonMaxSuppression \
.instantiate(
input.device,
iou_threshold=iou_threshold,
).apply(input)
class RetinaNetDecoder(nn.Module):
"""Decode predictions from retinanet."""
def __init__(self):
super(RetinaNetDecoder, self).__init__()
k_max, k_min = cfg.FPN.RPN_MAX_LEVEL, cfg.FPN.RPN_MIN_LEVEL
scales_per_octave = cfg.RETINANET.SCALES_PER_OCTAVE
self.strides = [int(2. ** lvl) for lvl in range(k_min, k_max + 1)]
self.scales = [cfg.RETINANET.ANCHOR_SCALE *
(2 ** (octave / float(scales_per_octave)))
for octave in range(scales_per_octave)]
def forward(self, features, cls_prob, bbox_pred, ims_info):
return decode_retinanet(
features=features,
cls_prob=cls_prob,
bbox_pred=bbox_pred,
ims_info=ims_info,
strides=self.strides,
ratios=[float(e) for e in cfg.RETINANET.ASPECT_RATIOS],
scales=self.scales,
pre_nms_top_n=cfg.TEST.RETINANET_PRE_NMS_TOP_N,
score_thresh=float(cfg.TEST.SCORE_THRESH),
)
class RPNDecoder(nn.Module):
"""Generate proposal regions from RPN."""
def __init__(self):
super(RPNDecoder, self).__init__()
def forward(self, features, cls_prob, bbox_pred, im_info):
return decode_rpn(
features=features,
cls_prob=cls_prob,
bbox_pred=bbox_pred,
im_info=im_info,
strides=cfg.RPN.STRIDES,
ratios=[float(e) for e in cfg.RPN.ASPECT_RATIOS],
scales=[float(e) for e in cfg.RPN.SCALES],
pre_nms_top_n=cfg.TEST.RPN_PRE_NMS_TOP_N,
post_nms_top_n=cfg.TEST.RPN_POST_NMS_TOP_N,
nms_thresh=cfg.TEST.RPN_NMS_THRESH,
min_level=cfg.FPN.ROI_MIN_LEVEL,
max_level=cfg.FPN.ROI_MAX_LEVEL,
canonical_scale=cfg.FPN.ROI_CANONICAL_SCALE,
canonical_level=cfg.FPN.ROI_CANONICAL_LEVEL,
)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""NN modules."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import dragon
from dragon.vm import torch
from dragon.vm.torch import nn
from seetadet.core.config import cfg
class Conv1x1(object):
"""1x1 convolution."""
def __new__(cls, dim_in, dim_out, stride=1, bias=False):
return nn.Conv2d(
in_channels=dim_in,
out_channels=dim_out,
kernel_size=1,
stride=stride,
bias=bias,
)
class Conv3x3(object):
"""3x3 convolution."""
def __new__(
cls,
dim_in,
dim_out,
stride=1,
dilation=1,
groups=1,
bias=False
):
return nn.Conv2d(
in_channels=dim_in,
out_channels=dim_out,
kernel_size=3,
stride=stride,
padding=1 * dilation,
dilation=dilation,
groups=groups,
bias=bias,
)
class CrossEntropyLoss(object):
"""Cross entropy loss."""
def __new__(cls, reduction='valid'):
return nn.CrossEntropyLoss(
reduction=reduction, ignore_index=-1)
class FrozenBatchNorm2d(nn.Module):
"""BatchNorm2d where statistics and the affine parameters are fixed."""
def __init__(self, num_features, eps=1e-5, inplace=True):
super(FrozenBatchNorm2d, self).__init__()
self.num_features = num_features
self.eps = eps
self.inplace = inplace
self.register_buffer('weight', torch.ones(num_features))
self.register_buffer('bias', torch.zeros(num_features))
self.register_buffer('running_mean', torch.zeros(num_features))
self.register_buffer('running_var', torch.ones(num_features) - eps)
def extra_repr(self):
affine_str = '{num_features}, eps={eps}'.format(**self.__dict__)
inplace_str = ', inplace' if self.inplace else ''
return affine_str + inplace_str
def forward(self, input):
return torch.channel_affine(
input,
self.weight,
self.bias,
dim=1,
out=input if self.inplace else None,
)
def _load_from_state_dict(
self,
state_dict,
prefix,
strict,
missing_keys,
unexpected_keys,
error_msgs,
):
super(FrozenBatchNorm2d, self)._load_from_state_dict(
state_dict,
prefix,
strict,
missing_keys,
unexpected_keys,
error_msgs,
)
# Fuse the running stats into weight and bias.
# Note that this resets the stored stats to zero mean
# and unit variance once the fusion is done.
with torch.no_grad():
self.running_var.float_().add_(self.eps).sqrt_()
self.weight.float_().div_(self.running_var)
self.bias.float_().sub_(self.running_mean.float_() * self.weight)
self.running_mean.zero_()
self.running_var.one_().sub_(self.eps)
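
# Editor's sketch (assumes numpy; illustration only) -- verifies the folding
# done in `_load_from_state_dict` above: once the stats are absorbed, the
# plain channel-affine transform in `forward` reproduces frozen batch norm.
def _check_frozen_bn_folding():  # hypothetical helper, illustration only
    import numpy as np
    x = np.random.randn(16)
    w, b, mean, var, eps = 1.5, 0.2, 0.3, 4.0, 1e-5
    y_bn = (x - mean) / np.sqrt(var + eps) * w + b
    w_fused = w / np.sqrt(var + eps)
    b_fused = b - mean * w_fused
    assert np.allclose(y_bn, x * w_fused + b_fused)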
class GIoULoss(nn.Module):
"""GIoU loss."""
def __init__(self, reduction='sum', delta_weights=None):
super(GIoULoss, self).__init__()
self.reduction = reduction
self.delta_weights = delta_weights
# Store the detached tensors
self.data = {}
self.x1, self.y1, self.x2, self.y2 = None, None, None, None
def transform_inv(self, boxes, deltas, name=None):
widths = boxes[:, 2] - boxes[:, 0]
heights = boxes[:, 3] - boxes[:, 1]
ctr_x = boxes[:, 0] + 0.5 * widths
ctr_y = boxes[:, 1] + 0.5 * heights
if name is not None:
self.data[name + '/widths'] = widths
self.data[name + '/heights'] = heights
dx, dy, dw, dh = torch.chunk(deltas, chunks=4, dim=1)
if self.delta_weights is not None:
wx, wy, ww, wh = self.delta_weights
dx, dy, dw, dh = dx / wx, dy / wy, dw / ww, dh / wh
pred_ctr_x = dx * widths + ctr_x
pred_ctr_y = dy * heights + ctr_y
pred_w = torch.exp(dw) * widths
pred_h = torch.exp(dh) * heights
x1 = pred_ctr_x - 0.5 * pred_w
y1 = pred_ctr_y - 0.5 * pred_h
x2 = pred_ctr_x + 0.5 * pred_w
y2 = pred_ctr_y + 0.5 * pred_h
return x1, y1, x2, y2
def forward_impl(self, input, target, anchor):
x1, y1, x2, y2 = self.transform_inv(
anchor, input, name='logits')
self.x1, self.y1, self.x2, self.y2 = \
self.transform_inv(anchor, target)
# Compute the independent area
pred_area = (x2 - x1) * (y2 - y1)
target_area = (self.x2 - self.x1) * (self.y2 - self.y1)
# Compute the intersecting area
x1_inter = torch.maximum(x1, self.x1)
y1_inter = torch.maximum(y1, self.y1)
x2_inter = torch.minimum(x2, self.x2)
y2_inter = torch.minimum(y2, self.y2)
w_inter = torch.clamp(x2_inter - x1_inter, min=0)
h_inter = torch.clamp(y2_inter - y1_inter, min=0)
area_inter = w_inter * h_inter
# Compute the enclosing area
x1_enc = torch.minimum(x1, self.x1)
y1_enc = torch.minimum(y1, self.y1)
x2_enc = torch.maximum(x2, self.x2)
y2_enc = torch.maximum(y2, self.y2)
area_enc = (x2_enc - x1_enc) * (y2_enc - y1_enc) + 1.
# Compute the differentiable IoU metric
area_union = pred_area + target_area - area_inter
iou = area_inter / (area_union + 1.)
iou_metric = iou - (area_enc - area_union) / area_enc
# Compute the reduced loss
if self.reduction == 'sum':
return (1 - iou_metric).sum()
else:
return (1 - iou_metric).mean()
def forward(self, *inputs, **kwargs):
# Enter a new detaching scope
with dragon.eager_scope('${IOU_LOSS}'):
return self.forward_impl(*inputs, **kwargs)
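
# Editor's sketch (assumes numpy; illustration only) -- the GIoU metric as
# computed in `forward_impl` above, including its two `+ 1.` smoothing terms,
# which keep the metric strictly below 1 even for perfectly matched boxes.
def _giou_metric_demo():  # hypothetical helper, illustration only
    import numpy as np
    area_pred = area_target = area_inter = 100.  # two identical 10x10 boxes
    area_union = area_pred + area_target - area_inter
    area_enc = area_union + 1.  # enclosing area, smoothed as above
    iou = area_inter / (area_union + 1.)
    giou = iou - (area_enc - area_union) / area_enc
    assert np.isclose(giou, 99. / 101.)  # -> 1 only as the boxes grow large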
class Identity(nn.Module):
"""Pass input to the output."""
def __init__(self, *args, **kwargs):
super(Identity, self).__init__()
_, _ = args, kwargs
def forward(self, x):
return x
class L1Loss(nn.Module):
"""L1 loss."""
def __init__(self, reduction='sum'):
super(L1Loss, self).__init__()
self.reduction = reduction
def forward(self, input, target, *args):
return nn.functional.l1_loss(
input, target,
reduction=self.reduction,
)
class L2Normalize(nn.Module):
"""Normalize the input using L2 norm."""
def __init__(self, num_features, init=20.):
super(L2Normalize, self).__init__()
self.weight = nn.Parameter(torch.Tensor(num_features).fill_(init))
def forward(self, input):
out = nn.functional.normalize(input, p=2, dim=1, eps=1e-5)
out = torch.channel_affine(out, self.weight, dim=1)
return out
class ReLU(object):
"""The generic ReLU activation."""
def __new__(cls, inplace=False):
return getattr(torch.nn, cfg.MODEL.RELU_VARIANT)(inplace)
class SigmoidFocalLoss(object):
"""Sigmoid focal loss."""
def __new__(cls, reduction='sum'):
return nn.SigmoidFocalLoss(
alpha=cfg.MODEL.FOCAL_LOSS_ALPHA,
gamma=cfg.MODEL.FOCAL_LOSS_GAMMA,
negative_index=0, # Background index
reduction=reduction,
)
class SmoothL1Loss(nn.Module):
"""Smoothed l1 loss."""
def __init__(self, beta=1.0, reduction='sum'):
super(SmoothL1Loss, self).__init__()
self.beta = beta
self.reduction = reduction
def forward(self, input, target, *args):
return nn.functional.smooth_l1_loss(
input, target,
beta=self.beta,
reduction=self.reduction,
)
# Getters
def get_norm(norm, dim_in):
"""Return a normalization module."""
if isinstance(norm, str):
if len(norm) == 0:
return Identity()
norm = {'BN': BatchNorm2d,
'FrozenBN': FrozenBatchNorm2d}[norm]
return norm(dim_in)
# Aliases
AvgPool2d = nn.AvgPool2d
BatchNorm2d = nn.BatchNorm2d
BCEWithLogitsLoss = nn.BCEWithLogitsLoss
Conv2d = nn.Conv2d
ConvTranspose2d = nn.ConvTranspose2d
DepthwiseConv2d = nn.DepthwiseConv2d
DropBlock2d = nn.DropBlock2d
Hardsigmoid = nn.Hardsigmoid
Hardswish = nn.Hardswish
Linear = nn.Linear
MaxPool2d = nn.MaxPool2d
Module = nn.Module
ModuleList = nn.ModuleList
Sequential = nn.Sequential
Sigmoid = nn.Sigmoid
Softmax = nn.Softmax
Swish = nn.Swish
upsample = nn.functional.upsample
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Module utilities."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm import torch
from seetadet.core import registry
@registry.fusion_pass.register([
'Conv2d+BatchNorm2d',
'Conv2d+FrozenBatchNorm2d',
'DepthwiseConv2d+BatchNorm2d',
'DepthwiseConv2d+FrozenBatchNorm2d',
])
def layer_fusion_conv2d_and_bn2d(conv_module, bn_module):
"""Layer fusion between Conv2d and BatchNorm2d."""
if conv_module.bias is None:
with torch.no_grad():
delattr(conv_module, 'bias')
bn_module.forward = lambda x: x
t = torch.sqrt(bn_module.running_var + bn_module.eps)
t = bn_module.weight / t
conv_module.register_buffer(
'bias', bn_module.bias - t * bn_module.running_mean)
t = t.view(0, *([1] * (conv_module.weight.ndimension() - 1)))
if conv_module.weight.dtype == 'float16':
conv_module.bias.half_()
weight = conv_module.weight.float()
weight.mul_(t).half_()
conv_module.weight.copy_(weight)
else:
conv_module.weight.mul_(t)
def get_fusion_pass(*modules):
"""Return the fusion pass between modules."""
pass_key = '+'.join(m.__class__.__name__ for m in modules)
return pass_key, registry.fusion_pass.try_get(pass_key)
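
# Editor's sketch (assumes numpy; illustration only) -- the arithmetic behind
# the fusion above: with t = gamma / sqrt(var + eps), scaling the conv weight
# by t and setting bias = beta - t * mean makes the bare conv output equal to
# conv followed by batch norm (the scaling commutes with the linear conv).
def _check_conv_bn_arithmetic():  # hypothetical helper, illustration only
    import numpy as np
    conv_out = np.random.randn(32)  # a conv response, pre-normalization
    gamma, beta, mean, var, eps = 0.9, 0.1, 0.4, 2.0, 1e-5
    y_bn = (conv_out - mean) / np.sqrt(var + eps) * gamma + beta
    t = gamma / np.sqrt(var + eps)
    assert np.allclose(y_bn, conv_out * t + (beta - t * mean))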
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Vision modules."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
from dragon.vm import torch
from dragon.vm import torchvision
from dragon.vm.torch import nn
from seetadet.core.config import cfg
def roi_align(input, boxes, spatial_scale, size, **kwargs):
return torchvision.ops.roi_align(
input, boxes,
output_size=(size, size),
spatial_scale=spatial_scale,
sampling_ratio=kwargs.get('sampling_ratio', 0),
)
def roi_pool(input, boxes, spatial_scale, size, **kwargs):
_ = locals() # Unused
return torchvision.ops.roi_pool(
input, boxes,
output_size=(size, size),
spatial_scale=spatial_scale,
)
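
# Editor's usage sketch (illustration only): both wrappers above expect RoIs
# in torchvision's (K, 5) layout [batch_index, x1, y1, x2, y2] and return
# fixed-size crops, e.g. 7x7 bins from an FPN level with stride 16.
def _roi_align_demo(feature, rois):  # hypothetical helper, illustration only
    # feature: (N, C, H, W) tensor; rois: (K, 5) tensor.
    return roi_align(feature, rois, spatial_scale=1. / 16,
                     size=7, sampling_ratio=2)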
class ImageNormalizer(nn.Module):
"""Normalize the image to match the computation."""
def __init__(self):
super(ImageNormalizer, self).__init__()
self._device = torch.device('cpu')
self._dummy_buffer = torch.ones(1)
self._normalize_func = functools.partial(
torch.channel_normalize,
mean=cfg.PIXEL_MEANS,
std=cfg.PIXEL_STDS,
dim=1,
dims=(0, 3, 1, 2),
dtype=cfg.MODEL.PRECISION.lower(),
)
def _apply(self, fn):
fn(self._dummy_buffer)
def cpu(self):
self._device = torch.device('cpu')
def cuda(self, device=None):
self._device = torch.device('cuda', device)
def device(self):
return self._dummy_buffer.device
def forward(self, input):
if isinstance(input, torch.Tensor):
if input.shape[1] <= 3:
return input
cur_device = self.device()
if input._device != cur_device:
if cur_device.type == 'cpu':
input = input.cpu()
else:
input = input.cuda(cur_device.index)
return self._normalize_func(input)
@@ -8,7 +8,7 @@
 # <https://opensource.org/licenses/BSD-2-Clause>
 #
 # ------------------------------------------------------------
-"""Modules."""
+"""Operators."""
 from __future__ import absolute_import
 from __future__ import division
@@ -16,5 +16,5 @@ from __future__ import print_function
 import os
-from seetadet.utils import env
-env.load_library(os.path.join(os.path.dirname(__file__), '_C'))
+from seetadet.core.backend import load_library as _load_library
+_load_library(os.path.join(os.path.dirname(__file__), '_C'))
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Build for ops."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm.torch import nn
from seetadet.ops.loss import GIoULoss
from seetadet.ops.loss import L1Loss
from seetadet.ops.loss import SmoothL1Loss
from seetadet.ops.loss import SigmoidFocalLoss
from seetadet.ops.normalization import FrozenBatchNorm2d
def build_loss(loss_type, reduction='sum', **kwargs):
if isinstance(loss_type, str):
loss_type = loss_type.lower()
if loss_type != 'smooth_l1':
kwargs.pop('beta', None)
loss_type = {
'l1': L1Loss,
'smooth_l1': SmoothL1Loss,
'giou': GIoULoss,
'cross_entropy': nn.CrossEntropyLoss,
'sigmoid_focal': SigmoidFocalLoss,
}[loss_type]
return loss_type(reduction=reduction, **kwargs)
def build_norm(dim, norm_type):
"""Build the normalization module."""
if isinstance(norm_type, str):
if len(norm_type) == 0:
return nn.Identity()
norm_type = {
'BN': nn.BatchNorm2d,
'FrozenBN': FrozenBatchNorm2d,
'SyncBN': nn.SyncBatchNorm,
'GN': lambda c: nn.GroupNorm(32, c),
'Affine': lambda c: FrozenBatchNorm2d(c, affine=True),
}[norm_type]
return norm_type(dim)
def build_activation(activation_type, inplace=False):
"""Build the activation module."""
if isinstance(activation_type, str):
if len(activation_type) == 0:
return nn.Identity()
activation_type = getattr(nn, activation_type)
activation = activation_type()
activation.inplace = inplace
return activation
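# A minimal usage sketch (assuming a Dragon runtime; dims are hypothetical).
# An empty string yields nn.Identity(), so configs can disable these layers:
bn = build_norm(256, 'FrozenBN')  # frozen stats, no affine grads
gn = build_norm(256, 'GN')        # GroupNorm with 32 groups
act = build_activation('ReLU', inplace=True)
noop = build_activation('')       # nn.Identity()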
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Convolution ops."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm.torch import nn
from seetadet.ops.build import build_norm
class ConvNorm2d(nn.Sequential):
"""2d convolution followed by norm."""
def __init__(
self,
dim_in,
dim_out,
kernel_size,
stride=1,
padding=None,
dilation=1,
groups=1,
bias=True,
conv_type='Conv2d',
norm_type='',
activation_type='',
inplace=True,
):
super(ConvNorm2d, self).__init__()
if padding is None:
padding = kernel_size // 2
if conv_type == 'Conv2d':
layers = [nn.Conv2d(dim_in, dim_out,
kernel_size=kernel_size,
stride=stride,
padding=padding,
dilation=dilation,
groups=groups,
bias=bias and (not norm_type))]
elif conv_type == 'SepConv2d':
layers = [nn.Conv2d(dim_in, dim_in,
kernel_size=kernel_size,
stride=stride,
padding=padding,
dilation=dilation,
groups=dim_in,
bias=False),
nn.Conv2d(dim_in, dim_out,
kernel_size=1,
bias=bias and (not norm_type))]
else:
raise ValueError('Unknown conv type: ' + conv_type)
if norm_type:
layers += [build_norm(dim_out, norm_type)]
if activation_type:
layers += [getattr(nn, activation_type)()]
layers[-1].inplace = inplace
for i, layer in enumerate(layers):
self.add_module(str(i), layer)
self.reset_parameters()
def reset_parameters(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.kaiming_normal_(
m.weight, mode='fan_out', nonlinearity='relu')
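# A minimal usage sketch (assuming a Dragon runtime; dims are hypothetical).
# With a norm layer attached the conv bias is dropped, and 'SepConv2d'
# expands to a depthwise kxk followed by a pointwise 1x1 projection:
conv = ConvNorm2d(256, 256, kernel_size=3,
                  norm_type='FrozenBN', activation_type='ReLU')
sep_conv = ConvNorm2d(256, 512, 3, conv_type='SepConv2d', norm_type='BN')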
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Operator fusions."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm import torch
from seetadet.core.registry import Registry
# Registry of passes to fuse adjacent modules.
FUSIONS = Registry('fusions')
@FUSIONS.register([
'Conv2d+BatchNorm2d',
'Conv2d+FrozenBatchNorm2d',
'Conv2d+SyncBatchNorm',
'ConvTranspose2d+BatchNorm2d',
'ConvTranspose2d+FrozenBatchNorm2d',
'ConvTranspose2d+SyncBatchNorm',
'DepthwiseConv2d+BatchNorm2d',
'DepthwiseConv2d+FrozenBatchNorm2d',
'DepthwiseConv2d+SyncBatchNorm'])
def fuse_conv_bn(conv, bn):
"""Fuse Conv and BatchNorm."""
with torch.no_grad():
m = bn.running_mean
if conv.bias is not None:
m.sub_(conv.bias.float())
else:
delattr(conv, 'bias')
bn.forward = lambda x: x
t = bn.weight.div((bn.running_var + bn.eps).sqrt_())
conv._parameters['bias'] = bn.bias.sub(t * m)
t_conv_shape = [1, conv.out_channels] if conv.transposed else [0, 1]
t_conv_shape += [1] * len(conv.kernel_size)
if conv.weight.dtype == 'float16' and t.dtype == 'float32':
conv.bias.half_()
weight = conv.weight.float()
weight.mul_(t.reshape_(t_conv_shape)).half_()
conv.weight.copy_(weight)
else:
conv.weight.mul_(t.reshape_(t_conv_shape))
def get_fusion(*modules):
"""Return the fusion pass between modules."""
key = '+'.join(m.__class__.__name__ for m in modules)
return key, FUSIONS.try_get(key)
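# A standalone numpy sketch of the folding arithmetic above (hypothetical
# values): BN after a conv rescales the weights by t = gamma / sqrt(var + eps)
# and shifts the bias by beta - t * mean, leaving the output unchanged.
import numpy as np
rng = np.random.default_rng(0)
c, eps = 4, 1e-5
w, x = rng.standard_normal(c), rng.standard_normal(c)
gamma, beta = rng.standard_normal(c), rng.standard_normal(c)
mean, var = rng.standard_normal(c), rng.random(c) + 0.5
t = gamma / np.sqrt(var + eps)
y_bn = (w * x - mean) * t + beta            # conv output, then BN
y_fused = (w * t) * x + (beta - t * mean)   # folded conv
assert np.allclose(y_bn, y_fused)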
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Loss ops."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import dragon
from dragon.vm import torch
from dragon.vm.torch import nn
from seetadet.core.config import cfg
class GIoULoss(nn.Module):
"""GIoU loss."""
def __init__(self, reduction='sum', delta_weights=None):
super(GIoULoss, self).__init__()
self.reduction = reduction
self.delta_weights = delta_weights
def transform_inv(self, boxes, deltas):
widths = boxes[:, 2:3] - boxes[:, 0:1]
heights = boxes[:, 3:4] - boxes[:, 1:2]
ctr_x = boxes[:, 0:1] + 0.5 * widths
ctr_y = boxes[:, 1:2] + 0.5 * heights
dx, dy, dw, dh = torch.chunk(deltas, chunks=4, dim=1)
if self.delta_weights is not None:
wx, wy, ww, wh = self.delta_weights
dx, dy, dw, dh = dx / wx, dy / wy, dw / ww, dh / wh
pred_ctr_x = dx * widths + ctr_x
pred_ctr_y = dy * heights + ctr_y
pred_w = torch.exp(dw) * widths
pred_h = torch.exp(dh) * heights
x1 = pred_ctr_x - 0.5 * pred_w
y1 = pred_ctr_y - 0.5 * pred_h
x2 = pred_ctr_x + 0.5 * pred_w
y2 = pred_ctr_y + 0.5 * pred_h
return x1, y1, x2, y2
def forward_impl(self, input, target, anchor):
x1, y1, x2, y2 = self.transform_inv(anchor, input)
x1_, y1_, x2_, y2_ = self.transform_inv(anchor, target)
# Compute the independent area.
pred_area = (x2 - x1) * (y2 - y1)
target_area = (x2_ - x1_) * (y2_ - y1_)
# Compute the intersecting area.
x1_inter = torch.maximum(x1, x1_)
y1_inter = torch.maximum(y1, y1_)
x2_inter = torch.minimum(x2, x2_)
y2_inter = torch.minimum(y2, y2_)
w_inter = torch.clamp(x2_inter - x1_inter, min=0)
h_inter = torch.clamp(y2_inter - y1_inter, min=0)
area_inter = w_inter * h_inter
# Compute the enclosing area.
x1_enc = torch.minimum(x1, x1_)
y1_enc = torch.minimum(y1, y1_)
x2_enc = torch.maximum(x2, x2_)
y2_enc = torch.maximum(y2, y2_)
area_enc = (x2_enc - x1_enc) * (y2_enc - y1_enc) + 1.
# Compute the differentiable IoU metric.
area_union = pred_area + target_area - area_inter
iou = area_inter / (area_union + 1.)
iou_metric = iou - (area_enc - area_union) / area_enc
# Compute the reduced loss.
if self.reduction == 'sum':
return (1 - iou_metric).sum()
else:
return (1 - iou_metric).mean()
def forward(self, *inputs, **kwargs):
with dragon.variable_scope('IoULossVariable'):
return self.forward_impl(*inputs, **kwargs)
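# A standalone numpy sketch of the metric above (hypothetical boxes, and
# without the +1 stabilizers used in ``forward_impl``):
# GIoU = IoU - (enclosure - union) / enclosure.
import numpy as np
b1, b2 = np.array([0., 0., 4., 4.]), np.array([2., 2., 6., 6.])
inter = (max(0., min(b1[2], b2[2]) - max(b1[0], b2[0])) *
         max(0., min(b1[3], b2[3]) - max(b1[1], b2[1])))
union = ((b1[2] - b1[0]) * (b1[3] - b1[1]) +
         (b2[2] - b2[0]) * (b2[3] - b2[1]) - inter)
enc = ((max(b1[2], b2[2]) - min(b1[0], b2[0])) *
       (max(b1[3], b2[3]) - min(b1[1], b2[1])))
print(inter / union, inter / union - (enc - union) / enc)  # ~0.143, ~-0.079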
class L1Loss(nn.L1Loss):
"""L1 loss."""
def forward(self, input, target, *args):
return super(L1Loss, self).forward(input, target)
class SigmoidFocalLoss(nn.SigmoidFocalLoss):
"""Sigmoid focal loss."""
def __init__(self, reduction='sum'):
super(SigmoidFocalLoss, self).__init__(
alpha=cfg.MODEL.FOCAL_LOSS_ALPHA,
gamma=cfg.MODEL.FOCAL_LOSS_GAMMA,
start_index=1, # Foreground index
reduction=reduction)
class SmoothL1Loss(nn.SmoothL1Loss):
"""Smoothed l1 loss."""
def forward(self, input, target, *args):
return nn.functional.smooth_l1_loss(
input, target, beta=self.beta,
reduction=self.reduction)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""NN ops."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm import torch
from dragon.vm.torch import nn
class WeightedFusion(nn.Module):
"""Fuse inputs using the weighted sum."""
def __init__(self, num_inputs, fuse_type='sum', init=1.):
super(WeightedFusion, self).__init__()
self.fuse_type = fuse_type
if fuse_type == 'attn' or fuse_type == 'fast_attn':
self.weight = nn.Parameter(torch.Tensor(num_inputs).fill_(init))
elif fuse_type == 'sum':
self.weight = None
else:
raise ValueError('Unknown fuse type: ' + fuse_type)
def forward(self, inputs):
inputs = list(filter(lambda x: x is not None, inputs))
if self.fuse_type == 'attn':
weight = nn.functional.softmax(self.weight, 0)
inputs = [inputs[i] * weight[i] for i in range(len(inputs))]
elif self.fuse_type == 'fast_attn':
# NB: This implementation actually is "slow"
# due to the more kernels are launched.
weight = nn.functional.relu(self.weight)
weight = weight / (weight.sum(0) + 1e-4)
inputs = [inputs[i] * weight[i] for i in range(len(inputs))]
out = inputs[0]
for x in inputs[1:]:
out = out + x
return out
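# A minimal sketch of the fusion above (assuming a Dragon runtime; shapes are
# hypothetical). 'fast_attn' learns one non-negative weight per input and
# normalizes the weights to roughly sum to one; None inputs are skipped:
fuse = WeightedFusion(num_inputs=2, fuse_type='fast_attn')
p3, p4_up = torch.ones(1, 64, 32, 32), torch.ones(1, 64, 32, 32)
out = fuse([p3, p4_up, None])  # fuses the two live inputs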
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Normalization ops."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
from dragon.vm import torch
from dragon.vm.torch import nn
from seetadet.core.config import cfg
class FrozenBatchNorm2d(nn.Module):
"""BatchNorm2d where statistics or affine parameters are fixed."""
def __init__(self, num_features, eps=1e-5, affine=False, inplace=True):
super(FrozenBatchNorm2d, self).__init__()
self.num_features = num_features
self.eps = eps
self.affine = affine
self.inplace = inplace and (not affine)
if self.affine:
self.weight = torch.nn.Parameter(torch.ones(num_features))
self.bias = torch.nn.Parameter(torch.zeros(num_features))
else:
self.register_buffer('weight', torch.ones(num_features))
self.register_buffer('bias', torch.zeros(num_features))
self.register_buffer('running_mean', torch.zeros(num_features))
self.register_buffer('running_var', torch.ones(num_features) - eps)
def extra_repr(self):
affine_str = '{num_features}, eps={eps}, affine={affine}' \
.format(**self.__dict__)
inplace_str = ', inplace' if self.inplace else ''
return affine_str + inplace_str
def forward(self, input):
return nn.functional.affine(
input, self.weight, self.bias,
dim=1, out=input if self.inplace else None)
def _load_from_state_dict(
self,
state_dict,
prefix,
strict,
missing_keys,
unexpected_keys,
error_msgs,
):
super(FrozenBatchNorm2d, self)._load_from_state_dict(
state_dict,
prefix,
strict,
missing_keys,
unexpected_keys,
error_msgs,
)
# Fuse the running stats into weight and bias.
# Note that this resets the stored stats to
# zero mean and unit variance.
with torch.no_grad():
self.running_var.float_().add_(self.eps).sqrt_()
self.weight.float_().div_(self.running_var)
self.bias.float_().sub_(self.running_mean.float_() * self.weight)
self.running_mean.zero_()
self.running_var.one_().sub_(self.eps)
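# A standalone numpy sketch of the fold above (hypothetical values): after
# loading, ``weight`` and ``bias`` absorb the running stats, so the affine
# forward x * weight + bias reproduces the original frozen BN.
import numpy as np
eps = 1e-5
x = np.array([0.5, -1.0, 2.0])
w, b = np.array([1.2, 0.8, 1.0]), np.array([0.1, -0.2, 0.3])
mean, var = np.array([0.4, -0.1, 1.5]), np.array([0.9, 1.1, 0.25])
y_bn = (x - mean) / np.sqrt(var + eps) * w + b
std = np.sqrt(var + eps)
assert np.allclose(y_bn, x * (w / std) + (b - mean * (w / std)))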
class L2Norm(nn.Module):
"""Parameterized L2 normalize."""
def __init__(self, num_features, init=20., eps=1e-5):
super(L2Norm, self).__init__()
self.eps = eps
self.weight = nn.Parameter(torch.Tensor(num_features).fill_(init))
def forward(self, input):
out = nn.functional.normalize(input, p=2, dim=1, eps=self.eps)
return nn.functional.affine(out, self.weight, dim=1)
class ToTensor(nn.Module):
"""Convert input to tensor."""
def __init__(self):
super(ToTensor, self).__init__()
self.device = torch.device('cpu')
self.tensor = torch.ones(1)
self.normalize = functools.partial(
nn.functional.channel_norm,
mean=cfg.MODEL.PIXEL_MEAN,
std=cfg.MODEL.PIXEL_STD,
dim=1, dims=(0, 3, 1, 2),
dtype=cfg.MODEL.PRECISION.lower())
def _apply(self, fn):
fn(self.tensor)
def cpu(self):
self.device = torch.device('cpu')
def cuda(self, device=None):
self.device = torch.device('cuda', device)
def forward(self, input, normalize=False):
if input is None:
return input
if not isinstance(input, torch.Tensor):
input = torch.from_numpy(input)
input = input.to(self.tensor.device)
if normalize and not input.is_floating_point():
input = self.normalize(input)
return input
def to_tensor(input, device='cuda'):
"""Convert input to tensor."""
if input is None:
return input
if not isinstance(input, torch.Tensor):
input = torch.from_numpy(input)
device = torch.device(device, cfg.GPU_ID)
return input.to(device=device)
...@@ -13,16 +13,15 @@ from __future__ import absolute_import ...@@ -13,16 +13,15 @@ from __future__ import absolute_import
from __future__ import division from __future__ import division
from __future__ import print_function from __future__ import print_function
from dragon.vm.onnx.core import exporter
from dragon.vm.onnx.core import helper from dragon.vm.onnx.core import helper
from dragon.vm.onnx.core.exporters import utils as export_util
@exporter.register('RetinanetDecoder') @export_util.register('RetinaNetDecoder')
def retinanet_decoder_exporter(op_def, shape_dict, ws): def retinanet_decoder_exporter(op_def, context):
node, const_tensors = exporter.translate(**locals()) node, const_tensors = export_util.translate(**locals())
node.op_type = 'ATen' # Currently not supported in ai.onnx node.op_type = 'ATen' # Currently not supported in ai.onnx.
helper.add_attribute(node, 'op_type', 'RetinaNetDecoder') helper.add_attribute(node, 'op_type', 'RetinaNetDecoder')
for arg in op_def.arg: for arg in op_def.arg:
if arg.name == 'strides': if arg.name == 'strides':
helper.add_attribute(node, 'strides', arg.ints) helper.add_attribute(node, 'strides', arg.ints)
...@@ -34,16 +33,14 @@ def retinanet_decoder_exporter(op_def, shape_dict, ws): ...@@ -34,16 +33,14 @@ def retinanet_decoder_exporter(op_def, shape_dict, ws):
helper.add_attribute(node, 'pre_nms_top_n', arg.i) helper.add_attribute(node, 'pre_nms_top_n', arg.i)
elif arg.name == 'score_thresh': elif arg.name == 'score_thresh':
helper.add_attribute(node, 'score_thresh', arg.f) helper.add_attribute(node, 'score_thresh', arg.f)
return node, const_tensors return node, const_tensors
@exporter.register('RPNDecoder') @export_util.register('RPNDecoder')
def rpn_decoder_exporter(op_def, shape_dict, ws): def rpn_decoder_exporter(op_def, context):
node, const_tensors = exporter.translate(**locals()) node, const_tensors = export_util.translate(**locals())
node.op_type = 'ATen' # Currently not supported in ai.onnx node.op_type = 'ATen' # Currently not supported in ai.onnx.
helper.add_attribute(node, 'op_type', 'RPNDecoder') helper.add_attribute(node, 'op_type', 'RPNDecoder')
for arg in op_def.arg: for arg in op_def.arg:
if arg.name == 'strides': if arg.name == 'strides':
helper.add_attribute(node, 'strides', arg.ints) helper.add_attribute(node, 'strides', arg.ints)
...@@ -61,9 +58,4 @@ def rpn_decoder_exporter(op_def, shape_dict, ws): ...@@ -61,9 +58,4 @@ def rpn_decoder_exporter(op_def, shape_dict, ws):
helper.add_attribute(node, 'min_level', arg.i) helper.add_attribute(node, 'min_level', arg.i)
elif arg.name == 'max_level': elif arg.name == 'max_level':
helper.add_attribute(node, 'max_level', arg.i) helper.add_attribute(node, 'max_level', arg.i)
elif arg.name == 'canonical_scale':
helper.add_attribute(node, 'canonical_scale', arg.i)
elif arg.name == 'canonical_level':
helper.add_attribute(node, 'canonical_level', arg.i)
return node, const_tensors return node, const_tensors
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Vision ops."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm import torchvision
from dragon.vm.torch import nn
from dragon.vm.torch import autograd
class RoIPooler(nn.Module):
"""Resample RoI features into a fixed resolution."""
def __init__(self, pooler_type='RoIAlign', resolution=7, sampling_ratio=1.0):
super(RoIPooler, self).__init__()
self.pooler_type = pooler_type
self.resolution = resolution
self.sampling_ratio = sampling_ratio
def forward(self, input, boxes, spatial_scale=1.0):
if self.pooler_type == 'RoIPool':
return torchvision.ops.roi_pool(
input, boxes,
output_size=(self.resolution, self.resolution),
spatial_scale=spatial_scale)
elif self.pooler_type == 'RoIAlign':
return torchvision.ops.roi_align(
input, boxes,
output_size=(self.resolution, self.resolution),
spatial_scale=spatial_scale,
sampling_ratio=self.sampling_ratio)
else:
raise NotImplementedError
class NonMaxSuppression(object):
"""Filter out boxes that have high IoU with selected ones."""
@staticmethod
def apply(input, iou_threshold=0.5):
return autograd.Function.apply(
'NonMaxSuppression', input.device, [input],
iou_threshold=iou_threshold)
autograd.Function.register(
'NonMaxSuppression', lambda **kwargs: {
'iou_threshold': kwargs.get('iou_threshold', 0.5),
})
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
from seetadet.core.config import cfg
class _LRScheduler(object):
def __init__(
self,
lr_max,
lr_min=0.,
warmup_steps=0,
warmup_factor=0.,
):
self._step_count = 0
self._lr_max = lr_max
self._lr_min = lr_min
self._warmup_steps = warmup_steps
self._warmup_factor = warmup_factor
self._last_lr = self._lr_max
self._last_steps = self._warmup_steps
def step(self):
self._step_count += 1
def get_lr(self):
if self._step_count < self._warmup_steps:
alpha = (self._step_count + 1.) / self._warmup_steps
decay_factor = self._warmup_factor * (1 - alpha) + alpha
self._last_lr = self._lr_max * decay_factor
return self._last_lr
return self.schedule_impl()
def schedule_impl(self):
raise NotImplementedError
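# A standalone sketch of the warmup branch in ``get_lr`` above (hypothetical
# settings): the LR ramps linearly from lr_max * warmup_factor up to lr_max.
lr_max, warmup_steps, warmup_factor = 0.02, 4, 0.25
for step_count in range(warmup_steps):
    alpha = (step_count + 1.) / warmup_steps
    print(lr_max * (warmup_factor * (1 - alpha) + alpha))
# -> 0.00875, 0.0125, 0.01625, 0.02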
class CosineLR(_LRScheduler):
def __init__(
self,
lr_max,
lr_min,
decay_step,
max_steps,
warmup_steps=0,
warmup_factor=0.,
):
super(CosineLR, self).__init__(
lr_max=lr_max,
lr_min=lr_min,
warmup_steps=warmup_steps,
warmup_factor=warmup_factor,
)
self._decay_step = decay_step
self._max_steps = max_steps - warmup_steps
def schedule_impl(self):
step_count = self._step_count - self._last_steps
if step_count % self._decay_step == 0:
decay_factor = 0.5 * (1. + math.cos(
math.pi * step_count / self._max_steps))
self._last_lr = self._lr_min + \
(self._lr_max - self._lr_min) * decay_factor
return self._last_lr
class MultiStepLR(_LRScheduler):
def __init__(
self,
lr_max,
decay_steps,
decay_gamma,
warmup_steps=0,
warmup_factor=0.,
):
super(MultiStepLR, self).__init__(
lr_max=lr_max,
warmup_steps=warmup_steps,
warmup_factor=warmup_factor,
)
self._decay_steps = decay_steps
self._decay_gamma = decay_gamma
self._stage_count = 0
self._num_stages = len(self._decay_steps)
def schedule_impl(self):
if self._stage_count < self._num_stages:
k = self._decay_steps[self._stage_count]
while self._step_count >= k:
self._stage_count += 1
if self._stage_count >= self._num_stages:
break
k = self._decay_steps[self._stage_count]
self._last_lr = self._lr_max * (
self._decay_gamma ** self._stage_count)
return self._last_lr
class LinearCosineLR(_LRScheduler):
def __init__(
self,
lr_max,
lr_min,
decay_step,
max_steps,
warmup_steps=0,
warmup_factor=0.,
):
super(LinearCosineLR, self).__init__(
lr_max=lr_max,
lr_min=lr_min,
warmup_steps=warmup_steps,
warmup_factor=warmup_factor,
)
self._decay_step = decay_step
self._max_steps = max_steps - warmup_steps
def schedule_impl(self):
step_count = self._step_count - self._last_steps
if step_count % self._decay_step == 0:
linear_decay = 1. - float(step_count) / self._max_steps
cosine_decay = 0.5 * (1. + math.cos(
math.pi * step_count / self._max_steps))
decay_factor = linear_decay * cosine_decay
self._last_lr = \
self._lr_min + (self._lr_max - self._lr_min) * decay_factor
return self._last_lr
class StepLR(_LRScheduler):
def __init__(
self,
lr_max,
decay_step,
decay_gamma,
warmup_steps=0,
warmup_factor=0.,
):
super(StepLR, self).__init__(
lr_max=lr_max,
warmup_steps=warmup_steps,
warmup_factor=warmup_factor,
)
self._decay_step = decay_step
self._decay_gamma = decay_gamma
def schedule_impl(self):
step_count = self._step_count - self._last_steps
if step_count % self._decay_step == 0:
decay_factor = step_count // self._decay_step
self._last_lr = self._lr_max * (
self._decay_gamma ** decay_factor)
return self._last_lr
def get_scheduler():
lr_policy = cfg.SOLVER.LR_POLICY
if lr_policy == 'cosine_decay':
return CosineLR(
lr_max=cfg.SOLVER.BASE_LR,
lr_min=0.,
decay_step=cfg.SOLVER.DECAY_STEP,
max_steps=cfg.SOLVER.MAX_STEPS,
warmup_steps=cfg.SOLVER.WARM_UP_STEPS,
warmup_factor=cfg.SOLVER.WARM_UP_FACTOR,
)
elif lr_policy == 'linear_cosine_decay':
return LinearCosineLR(
lr_max=cfg.SOLVER.BASE_LR,
lr_min=0.,
decay_step=cfg.SOLVER.DECAY_STEP,
max_steps=cfg.SOLVER.MAX_STEPS,
warmup_steps=cfg.SOLVER.WARM_UP_STEPS,
warmup_factor=cfg.SOLVER.WARM_UP_FACTOR,
)
elif lr_policy == 'step':
return StepLR(
lr_max=cfg.SOLVER.BASE_LR,
decay_step=cfg.SOLVER.DECAY_STEP,
decay_gamma=cfg.SOLVER.DECAY_GAMMA,
warmup_steps=cfg.SOLVER.WARM_UP_STEPS,
warmup_factor=cfg.SOLVER.WARM_UP_FACTOR,
)
elif lr_policy == 'steps_with_decay':
return MultiStepLR(
lr_max=cfg.SOLVER.BASE_LR,
decay_steps=cfg.SOLVER.DECAY_STEPS,
decay_gamma=cfg.SOLVER.DECAY_GAMMA,
warmup_steps=cfg.SOLVER.WARM_UP_STEPS,
warmup_factor=cfg.SOLVER.WARM_UP_FACTOR,
)
else:
raise ValueError('Unknown lr policy: ' + lr_policy)
if __name__ == '__main__':
def extract_label(scheduler):
class_name = scheduler.__class__.__name__
label = class_name + '('
if class_name == 'CosineLR':
label += 'α=' + str(scheduler._decay_step)
elif class_name == 'LinearCosineLR':
label += 'α=' + str(scheduler._decay_step)
elif class_name == 'MultiStepLR':
label += 'α=' + str(scheduler._decay_steps) + ', '
label += 'γ=' + str(scheduler._decay_gamma)
elif class_name == 'StepLR':
label += 'α=' + str(scheduler._decay_step) + ', '
label += 'γ=' + str(scheduler._decay_gamma)
label += ')'
return label
vis = True
max_steps = 240
shared_args = {
'lr_max': 0.4,
'warmup_steps': 5,
'warmup_factor': 0.,
}
schedulers = [
StepLR(decay_step=1, decay_gamma=0.97, **shared_args),
MultiStepLR(decay_steps=[60, 120, 180], decay_gamma=0.1, **shared_args),
CosineLR(lr_min=0., decay_step=1, max_steps=max_steps, **shared_args),
LinearCosineLR(lr_min=0., decay_step=1, max_steps=max_steps, **shared_args),
]
for i in range(max_steps):
info = 'Step = %d\n' % i
for scheduler in schedulers:
if i == 0:
scheduler.lr_seq = []
info += ' * {}: {}\n'.format(
extract_label(scheduler),
scheduler.get_lr())
scheduler.lr_seq.append(scheduler.get_lr())
scheduler.step()
if not vis:
print(info)
if vis:
import matplotlib.pyplot as plt
plt.figure(1)
plt.title('Visualization of different LR Schedulers')
plt.xlabel('Step')
plt.ylabel('Learning Rate')
line = '-'
colors = ['r', 'g', 'b', 'c', 'm', 'y', 'k']
for i, scheduler in enumerate(schedulers):
plt.plot(
range(max_steps),
scheduler.lr_seq,
colors[i] + line,
linewidth=1.,
label=extract_label(scheduler),
)
plt.legend()
plt.grid(linestyle='--')
plt.show()
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import dragon.vm.torch as torch
from seetadet.core.config import cfg
from seetadet.modeling.detector import Detector
from seetadet.solver import lr_scheduler
from seetadet.utils import env
from seetadet.utils import time_util
class SGDSolver(object):
def __init__(self):
# Define the generic detector
self.detector = Detector()
# Define the optimizer and its arguments
self.optimizer = torch.optim.SGD(
env.get_param_groups(self.detector),
lr=cfg.SOLVER.BASE_LR,
momentum=cfg.SOLVER.MOMENTUM,
weight_decay=cfg.SOLVER.WEIGHT_DECAY,
clip_norm=float(cfg.SOLVER.CLIP_NORM),
scale=1.0 / cfg.SOLVER.LOSS_SCALING,
)
self.lr_scheduler = lr_scheduler.get_scheduler()
def step(self):
def add_loss(x, y):
return y if x is None else x + y
stats = {
'iter': self.iter,
'loss': {'total': 0.},
'time': time_util.Timer(),
}
with stats['time'].tic_and_toc():
# Forward pass
outputs = self.detector()
# Backward pass
total_loss = None
loss_scaling = cfg.SOLVER.LOSS_SCALING
for k, v in outputs.items():
if 'loss' in k:
if k not in stats['loss']:
stats['loss'][k] = 0.
total_loss = add_loss(total_loss, v)
stats['loss'][k] += float(v)
stats['loss']['total'] += float(total_loss)
if loss_scaling != 1.0:
total_loss *= loss_scaling
total_loss.backward()
# Apply Update
self.base_lr = self.lr_scheduler.get_lr()
self.optimizer.step()
self.lr_scheduler.step()
# Misc stats
stats['lr'] = self.base_lr
stats['time'] = stats['time'].total_time
return stats
@property
def base_lr(self):
return self.optimizer.param_groups[0]['lr']
@base_lr.setter
def base_lr(self, value):
for group in self.optimizer.param_groups:
group['lr'] = value
@property
def iter(self):
return self.lr_scheduler._step_count
@iter.setter
def iter(self, value):
self.lr_scheduler._step_count = value
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""A simple attribute dictionary used for representing configuration options."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
class AttrDict(dict):
IMMUTABLE = '__immutable__'
def __init__(self, *args, **kwargs):
super(AttrDict, self).__init__(*args, **kwargs)
self.__dict__[AttrDict.IMMUTABLE] = False
def __getattr__(self, name):
if name in self.__dict__:
return self.__dict__[name]
elif name in self:
return self[name]
else:
raise AttributeError(name)
def __setattr__(self, name, value):
if not self.__dict__[AttrDict.IMMUTABLE]:
if name in self.__dict__:
self.__dict__[name] = value
else:
self[name] = value
else:
raise AttributeError(
'Attempted to set "{}" to "{}", but AttrDict is immutable'
.format(name, value))
def immutable(self, is_immutable):
"""Set immutability to is_immutable and recursively apply the setting
to all nested AttrDicts.
"""
self.__dict__[AttrDict.IMMUTABLE] = is_immutable
# Recursively set immutable state
for v in self.__dict__.values():
if isinstance(v, AttrDict):
v.immutable(is_immutable)
for v in self.values():
if isinstance(v, AttrDict):
v.immutable(is_immutable)
def is_immutable(self):
return self.__dict__[AttrDict.IMMUTABLE]
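# A runnable standalone sketch (the keys are hypothetical): attribute and
# item access are interchangeable, and ``immutable(True)`` locks nested
# AttrDicts as well.
cfg = AttrDict(TRAIN=AttrDict(BATCH_SIZE=2))
cfg.TRAIN.BATCH_SIZE = 4  # OK while mutable
cfg.immutable(True)
# cfg.TRAIN.BATCH_SIZE = 8  would now raise AttributeError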
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Bounding-Box utilities."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.utils.bbox.helper import filter_boxes
from seetadet.utils.bbox.helper import flip_boxes
from seetadet.utils.bbox.helper import flip_polygons
from seetadet.utils.bbox.helper import clip_boxes
from seetadet.utils.bbox.helper import clip_tiled_boxes
from seetadet.utils.bbox.helper import distribute_boxes
from seetadet.utils.bbox.helper import rescale_boxes
from seetadet.utils.bbox.metrics import bbox_overlaps
from seetadet.utils.bbox.metrics import bbox_centerness
from seetadet.utils.bbox.metrics import boxes_iou
from seetadet.utils.bbox.transforms import bbox_transform
from seetadet.utils.bbox.transforms import bbox_transform_inv
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Helper functions for Bounding-Box."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
def clip_boxes(boxes, im_shape):
"""Clip the boxes."""
xmax, ymax = im_shape[1] - 1, im_shape[0] - 1
boxes[:, (0, 2)] = np.maximum(np.minimum(boxes[:, (0, 2)], xmax), 0)
boxes[:, (1, 3)] = np.maximum(np.minimum(boxes[:, (1, 3)], ymax), 0)
return boxes
def clip_tiled_boxes(boxes, im_shape):
"""Clip the tiled boxes."""
xmax, ymax = im_shape[1] - 1, im_shape[0] - 1
boxes[:, 0::4] = np.maximum(np.minimum(boxes[:, 0::4], xmax), 0)
boxes[:, 1::4] = np.maximum(np.minimum(boxes[:, 1::4], ymax), 0)
boxes[:, 2::4] = np.maximum(np.minimum(boxes[:, 2::4], xmax), 0)
boxes[:, 3::4] = np.maximum(np.minimum(boxes[:, 3::4], ymax), 0)
return boxes
def rescale_boxes(boxes, scale_factor=1.):
"""Rescale the boxes."""
w = (boxes[:, 2] - boxes[:, 0]) * 0.5 * scale_factor
h = (boxes[:, 3] - boxes[:, 1]) * 0.5 * scale_factor
x_ctr = (boxes[:, 2] + boxes[:, 0]) * 0.5
y_ctr = (boxes[:, 3] + boxes[:, 1]) * 0.5
boxes_rescaled = np.zeros(boxes.shape)
boxes_rescaled[:, 0], boxes_rescaled[:, 1] = x_ctr - w, y_ctr - h
boxes_rescaled[:, 2], boxes_rescaled[:, 3] = x_ctr + w, y_ctr + h
return boxes_rescaled
def flip_boxes(boxes, width):
"""Flip the boxes horizontally."""
boxes_flipped = boxes.copy()
boxes_flipped[:, 0] = width - boxes[:, 2] - 1
boxes_flipped[:, 2] = width - boxes[:, 0] - 1
return boxes_flipped
def flip_polygons(polygons, width):
"""Flip the polygons horizontally."""
for i, poly in enumerate(polygons):
poly_flipped = poly.copy()
poly_flipped[0::2] = width - poly[0::2] - 1
polygons[i] = poly_flipped
return polygons
def filter_boxes(boxes, min_size):
"""Remove all boxes with any side smaller than min size."""
ws = boxes[:, 2] - boxes[:, 0] + 1
hs = boxes[:, 3] - boxes[:, 1] + 1
keep = np.where((ws >= min_size) & (hs >= min_size))[0]
return keep
def distribute_boxes(boxes, lvl_min, lvl_max):
"""Return the fpn level of boxes."""
if len(boxes) == 0:
return []
ws = boxes[:, 2] - boxes[:, 0] + 1
hs = boxes[:, 3] - boxes[:, 1] + 1
s = np.sqrt(ws * hs)
s0 = 224  # canonical scale
lvl0 = 4  # canonical level
lvls = np.floor(lvl0 + np.log2(s / s0 + 1e-6))
return np.clip(lvls, lvl_min, lvl_max)
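# A standalone sketch of the assignment above (hypothetical boxes): a box of
# canonical size 224 maps to level 4 and each doubling of scale adds one
# level, clipped to [lvl_min, lvl_max].
boxes = np.array([[0., 0., 111., 111.],   # ~112px -> level 3
                  [0., 0., 223., 223.],   # ~224px -> level 4
                  [0., 0., 447., 447.]])  # ~448px -> level 5
print(distribute_boxes(boxes, 2, 5))  # [3. 4. 5.]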
...@@ -8,60 +8,83 @@ ...@@ -8,60 +8,83 @@
# <https://opensource.org/licenses/BSD-2-Clause> # <https://opensource.org/licenses/BSD-2-Clause>
# #
# ------------------------------------------------------------ # ------------------------------------------------------------
"""Box utilities for normalized coordinates.""" """Bounding-Box metrics."""
from __future__ import absolute_import from __future__ import absolute_import
from __future__ import division from __future__ import division
from __future__ import print_function from __future__ import print_function
from seetadet.utils.bbox import cython_bbox
import numpy as np import numpy as np
def bbox_overlaps(boxes1, boxes2):
"""Compute the overlaps between two group of boxes."""
boxes1 = np.ascontiguousarray(boxes1, dtype=np.float64)
boxes2 = np.ascontiguousarray(boxes2, dtype=np.float64)
return cython_bbox.bbox_overlaps(boxes1, boxes2)
def bbox_centerness(boxes1, boxes2):
"""Compute centerness of the boxes to ground-truth."""
ctr_x = (boxes1[:, 2] + boxes1[:, 0]) / 2
ctr_y = (boxes1[:, 3] + boxes1[:, 1]) / 2
l = ctr_x - boxes2[:, 0]
t = ctr_y - boxes2[:, 1]
r = boxes2[:, 2] - ctr_x
b = boxes2[:, 3] - ctr_y
centerness = ((np.minimum(l, r) / np.maximum(l, r)) *
(np.minimum(t, b) / np.maximum(t, b)))
min_dist = np.stack([l, t, r, b], axis=1).min(axis=1)
keep_inds = np.where(min_dist > 0.01)[0]
discard_inds = np.where(min_dist <= 0.01)[0]
centerness[keep_inds] = np.sqrt(centerness[keep_inds])
centerness[discard_inds] = -1
return centerness, keep_inds, discard_inds
def boxes_area(boxes): def boxes_area(boxes):
"""Compute the area of input boxes.""" """Compute the area of input boxes."""
w = (boxes[:, 2] - boxes[:, 0]) w = (boxes[:, 2] - boxes[:, 0])
h = (boxes[:, 3] - boxes[:, 1]) h = (boxes[:, 3] - boxes[:, 1])
area = w * h return w * h
assert np.all(area >= 0), 'Negative areas founds'
return area
def intersection(box1, box2): def boxes_intersection(boxes1, boxes2):
"""Compute intersection between boxes.""" """Compute intersection between boxes."""
[y_min1, x_min1, y_max1, x_max1] = np.split(box1, 4, axis=1) [y_min1, x_min1, y_max1, x_max1] = np.split(boxes1, 4, axis=1)
[y_min2, x_min2, y_max2, x_max2] = np.split(box2, 4, axis=1) [y_min2, x_min2, y_max2, x_max2] = np.split(boxes2, 4, axis=1)
all_pairs_min_ymax = np.minimum(y_max1, np.transpose(y_max2)) all_pairs_min_ymax = np.minimum(y_max1, np.transpose(y_max2))
all_pairs_max_ymin = np.maximum(y_min1, np.transpose(y_min2)) all_pairs_max_ymin = np.maximum(y_min1, np.transpose(y_min2))
all_pairs_min_xmax = np.minimum(x_max1, np.transpose(x_max2)) all_pairs_min_xmax = np.minimum(x_max1, np.transpose(x_max2))
all_pairs_max_xmin = np.maximum(x_min1, np.transpose(x_min2)) all_pairs_max_xmin = np.maximum(x_min1, np.transpose(x_min2))
inter_heights = np.maximum( inter_heights = np.maximum(np.zeros(all_pairs_max_ymin.shape),
np.zeros(all_pairs_max_ymin.shape),
all_pairs_min_ymax - all_pairs_max_ymin) all_pairs_min_ymax - all_pairs_max_ymin)
inter_widths = np.maximum( inter_widths = np.maximum(np.zeros(all_pairs_max_xmin.shape),
np.zeros(all_pairs_max_xmin.shape),
all_pairs_min_xmax - all_pairs_max_xmin) all_pairs_min_xmax - all_pairs_max_xmin)
return inter_heights * inter_widths return inter_heights * inter_widths
def ioa1(box1, box2): def boxes_ioa1(boxes1, boxes2):
"""Compute intersection-over-area1 between boxes.""" """Compute intersection-over-area1 between boxes."""
inter = intersection(box1, box2) inter = boxes_intersection(boxes1, boxes2)
area = np.expand_dims(boxes_area(box1), axis=1) area = np.expand_dims(boxes_area(boxes1), axis=1)
return inter / area return inter / area
def ioa2(box1, box2): def boxes_ioa2(boxes1, boxes2):
"""Compute intersection-over-area2 between boxes.""" """Compute intersection-over-area2 between boxes."""
inter = intersection(box1, box2) inter = boxes_intersection(boxes1, boxes2)
area = np.expand_dims(boxes_area(box2), axis=0) area = np.expand_dims(boxes_area(boxes2), axis=0)
return inter / area return inter / area
def iou(box1, box2): def boxes_iou(boxes1, boxes2):
"""Compute intersection-over-union between boxes.""" """Compute intersection-over-union between boxes."""
inter = intersection(box1, box2) inter = boxes_intersection(boxes1, boxes2)
area1 = boxes_area(box1) area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])
area2 = boxes_area(box2) area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])
union = (np.expand_dims(area1, axis=1) + union = (np.expand_dims(area1, axis=1) +
np.expand_dims(area2, axis=0) - inter) np.expand_dims(area2, axis=0) - inter)
return inter / union return inter / union
...@@ -8,7 +8,7 @@ ...@@ -8,7 +8,7 @@
# <https://opensource.org/licenses/BSD-2-Clause> # <https://opensource.org/licenses/BSD-2-Clause>
# #
# ------------------------------------------------------------ # ------------------------------------------------------------
"""Box utilities for original coordinates.""" """Bounding-Box transforms."""
from __future__ import absolute_import from __future__ import absolute_import
from __future__ import division from __future__ import division
...@@ -16,15 +16,7 @@ from __future__ import print_function ...@@ -16,15 +16,7 @@ from __future__ import print_function
import numpy as np import numpy as np
from seetadet.utils import cython_bbox _DEFAULT_SCALE_CLIP = np.log(1333.0 / 4.0)
def bbox_overlaps(boxes1, boxes2):
"""Compute the overlaps between two group of boxes."""
return cython_bbox.bbox_overlaps(
np.ascontiguousarray(boxes1, dtype=np.float),
np.ascontiguousarray(boxes2, dtype=np.float),
)
def bbox_transform(ex_rois, gt_rois, weights=(1., 1., 1., 1.)): def bbox_transform(ex_rois, gt_rois, weights=(1., 1., 1., 1.)):
...@@ -33,137 +25,41 @@ def bbox_transform(ex_rois, gt_rois, weights=(1., 1., 1., 1.)): ...@@ -33,137 +25,41 @@ def bbox_transform(ex_rois, gt_rois, weights=(1., 1., 1., 1.)):
ex_heights = ex_rois[:, 3] - ex_rois[:, 1] + 1. ex_heights = ex_rois[:, 3] - ex_rois[:, 1] + 1.
ex_ctr_x = ex_rois[:, 0] + 0.5 * ex_widths ex_ctr_x = ex_rois[:, 0] + 0.5 * ex_widths
ex_ctr_y = ex_rois[:, 1] + 0.5 * ex_heights ex_ctr_y = ex_rois[:, 1] + 0.5 * ex_heights
gt_widths = gt_rois[:, 2] - gt_rois[:, 0] + 1. gt_widths = gt_rois[:, 2] - gt_rois[:, 0] + 1.
gt_heights = gt_rois[:, 3] - gt_rois[:, 1] + 1. gt_heights = gt_rois[:, 3] - gt_rois[:, 1] + 1.
gt_ctr_x = gt_rois[:, 0] + 0.5 * gt_widths gt_ctr_x = gt_rois[:, 0] + 0.5 * gt_widths
gt_ctr_y = gt_rois[:, 1] + 0.5 * gt_heights gt_ctr_y = gt_rois[:, 1] + 0.5 * gt_heights
wx, wy, ww, wh = weights wx, wy, ww, wh = weights
targets = [wx * (gt_ctr_x - ex_ctr_x) / ex_widths] targets = [wx * (gt_ctr_x - ex_ctr_x) / ex_widths]
targets += [wy * (gt_ctr_y - ex_ctr_y) / ex_heights] targets += [wy * (gt_ctr_y - ex_ctr_y) / ex_heights]
targets += [ww * np.log(gt_widths / ex_widths)] targets += [ww * np.log(gt_widths / ex_widths)]
targets += [wh * np.log(gt_heights / ex_heights)] targets += [wh * np.log(gt_heights / ex_heights)]
return np.vstack(targets).transpose() return np.vstack(targets).transpose()
def bbox_centerness(ex_rois, gt_rois): def bbox_transform_inv(boxes, deltas, weights=(1., 1., 1., 1.)):
"""Compute centerness of the boxes to ground-truth."""
ex_ctr_x = (ex_rois[:, 2] + ex_rois[:, 0]) / 2
ex_ctr_y = (ex_rois[:, 3] + ex_rois[:, 1]) / 2
l = ex_ctr_x - gt_rois[:, 0]
t = ex_ctr_y - gt_rois[:, 1]
r = gt_rois[:, 2] - ex_ctr_x
b = gt_rois[:, 3] - ex_ctr_y
centerness = \
(np.minimum(l, r) / np.maximum(l, r)) * \
(np.minimum(t, b) / np.maximum(t, b))
min_dist = np.stack([l, t, r, b], axis=1).min(axis=1)
keep_inds = np.where(min_dist > 0.01)[0]
discard_inds = np.where(min_dist <= 0.01)[0]
centerness[keep_inds] = np.sqrt(centerness[keep_inds])
centerness[discard_inds] = -1
return centerness, keep_inds, discard_inds
def bbox_transform_inv(boxes, deltas, weights=(1., 1., 1., 1.), clip=None):
"""Decode the final boxes according to the deltas.""" """Decode the final boxes according to the deltas."""
if boxes.shape[0] == 0: if boxes.shape[0] == 0:
return np.zeros((0, deltas.shape[1]), dtype=deltas.dtype) return np.zeros((0, deltas.shape[1]), dtype=deltas.dtype)
boxes = boxes.astype(deltas.dtype, copy=False) boxes = boxes.astype(deltas.dtype, copy=False)
widths = boxes[:, 2] - boxes[:, 0] + 1. widths = boxes[:, 2] - boxes[:, 0] + 1.
heights = boxes[:, 3] - boxes[:, 1] + 1. heights = boxes[:, 3] - boxes[:, 1] + 1.
ctr_x = boxes[:, 0] + 0.5 * widths ctr_x = boxes[:, 0] + 0.5 * widths
ctr_y = boxes[:, 1] + 0.5 * heights ctr_y = boxes[:, 1] + 0.5 * heights
wx, wy, ww, wh = weights wx, wy, ww, wh = weights
dx = deltas[:, 0::4] / wx dx = deltas[:, 0::4] / wx
dy = deltas[:, 1::4] / wy dy = deltas[:, 1::4] / wy
dw = deltas[:, 2::4] / ww dw = deltas[:, 2::4] / ww
dh = deltas[:, 3::4] / wh dh = deltas[:, 3::4] / wh
dw = np.minimum(dw, _DEFAULT_SCALE_CLIP)
# Heuristically clip height and width deltas dh = np.minimum(dh, _DEFAULT_SCALE_CLIP)
# to avoid too large value in np.exp(...)
if clip is not None:
dw = np.minimum(dw, clip)
dh = np.minimum(dh, clip)
pred_ctr_x = dx * widths[:, np.newaxis] + ctr_x[:, np.newaxis] pred_ctr_x = dx * widths[:, np.newaxis] + ctr_x[:, np.newaxis]
pred_ctr_y = dy * heights[:, np.newaxis] + ctr_y[:, np.newaxis] pred_ctr_y = dy * heights[:, np.newaxis] + ctr_y[:, np.newaxis]
pred_w = np.exp(dw) * widths[:, np.newaxis] pred_w = np.exp(dw) * widths[:, np.newaxis]
pred_h = np.exp(dh) * heights[:, np.newaxis] pred_h = np.exp(dh) * heights[:, np.newaxis]
pred_boxes = np.zeros(deltas.shape, dtype=deltas.dtype) pred_boxes = np.zeros(deltas.shape, dtype=deltas.dtype)
pred_boxes[:, 0::4] = pred_ctr_x - 0.5 * pred_w # x1 pred_boxes[:, 0::4] = pred_ctr_x - 0.5 * pred_w
pred_boxes[:, 1::4] = pred_ctr_y - 0.5 * pred_h # y1 pred_boxes[:, 1::4] = pred_ctr_y - 0.5 * pred_h
pred_boxes[:, 2::4] = pred_ctr_x + 0.5 * pred_w - 1 # x2 pred_boxes[:, 2::4] = pred_ctr_x + 0.5 * pred_w - 1
pred_boxes[:, 3::4] = pred_ctr_y + 0.5 * pred_h - 1 # y2 pred_boxes[:, 3::4] = pred_ctr_y + 0.5 * pred_h - 1
return pred_boxes return pred_boxes
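# A standalone round-trip sketch (hypothetical boxes): encoding ground-truth
# boxes against anchors with ``bbox_transform`` and decoding the deltas with
# ``bbox_transform_inv`` recovers the ground truth.
anchors = np.array([[10., 10., 50., 60.]])
gt_boxes = np.array([[12., 8., 48., 66.]])
deltas = bbox_transform(anchors, gt_boxes)
assert np.allclose(bbox_transform_inv(anchors, deltas), gt_boxes)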
def clip_boxes(boxes, im_shape):
# x1 >= 0
boxes[:, 0] = np.maximum(np.minimum(boxes[:, 0], im_shape[1] - 1), 0)
# y1 >= 0
boxes[:, 1] = np.maximum(np.minimum(boxes[:, 1], im_shape[0] - 1), 0)
# x2 < im_shape[1]
boxes[:, 2] = np.maximum(np.minimum(boxes[:, 2], im_shape[1] - 1), 0)
# y2 < im_shape[0]
boxes[:, 3] = np.maximum(np.minimum(boxes[:, 3], im_shape[0] - 1), 0)
return boxes
def clip_tiled_boxes(boxes, im_shape):
# x1 >= 0
boxes[:, 0::4] = np.maximum(np.minimum(boxes[:, 0::4], im_shape[1] - 1), 0)
# y1 >= 0
boxes[:, 1::4] = np.maximum(np.minimum(boxes[:, 1::4], im_shape[0] - 1), 0)
# x2 < im_shape[1]
boxes[:, 2::4] = np.maximum(np.minimum(boxes[:, 2::4], im_shape[1] - 1), 0)
# y2 < im_shape[0]
boxes[:, 3::4] = np.maximum(np.minimum(boxes[:, 3::4], im_shape[0] - 1), 0)
return boxes
def expand_boxes(boxes, scale):
"""Expand an array of boxes by a given scale."""
w_half = (boxes[:, 2] - boxes[:, 0]) * .5
h_half = (boxes[:, 3] - boxes[:, 1]) * .5
x_c = (boxes[:, 2] + boxes[:, 0]) * .5
y_c = (boxes[:, 3] + boxes[:, 1]) * .5
w_half *= scale
h_half *= scale
boxes_exp = np.zeros(boxes.shape)
boxes_exp[:, 0] = x_c - w_half
boxes_exp[:, 2] = x_c + w_half
boxes_exp[:, 1] = y_c - h_half
boxes_exp[:, 3] = y_c + h_half
return boxes_exp
def flip_boxes(boxes, width):
"""Flip the boxes horizontally."""
boxes_flipped = boxes.copy()
boxes_flipped[:, 0] = width - boxes[:, 2] - 1
boxes_flipped[:, 2] = width - boxes[:, 0] - 1
return boxes_flipped
def flip_polygons(polygons, width):
"""Flip the polygons horizontally."""
for i, poly in enumerate(polygons):
poly_flipped = poly.copy()
poly_flipped[0::2] = width - poly[0::2] - 1
polygons[i] = poly_flipped
return polygons
def filter_boxes(boxes, min_size):
"""Remove all boxes with any side smaller than min size."""
ws = boxes[:, 2] - boxes[:, 0] + 1
hs = boxes[:, 3] - boxes[:, 1] + 1
keep = np.where((ws >= min_size) & (hs >= min_size))[0]
return keep
...@@ -7,11 +7,8 @@ ...@@ -7,11 +7,8 @@
# #
# <https://opensource.org/licenses/BSD-2-Clause> # <https://opensource.org/licenses/BSD-2-Clause>
# #
# Codes are based on:
#
# <https://github.com/rbgirshick/py-faster-rcnn/blob/master/lib/utils/blob.py>
#
# ------------------------------------------------------------ # ------------------------------------------------------------
"""Blob utilities."""
from __future__ import absolute_import from __future__ import absolute_import
from __future__ import division from __future__ import division
...@@ -19,26 +16,29 @@ from __future__ import print_function ...@@ -19,26 +16,29 @@ from __future__ import print_function
import numpy as np import numpy as np
from seetadet.core.config import cfg
def im_list_to_blob(ims, coarsest_stride=0): def blob_vstack(arrays, fill_value=None, dtype=None, size=None, align=None):
"""Convert a list of images into a network input.""" """Stack arrays in sequence vertically."""
blob_dtype = 'uint8' if ims[0].dtype == 'uint8' else 'float32' if fill_value is None:
max_shape = np.array([im.shape for im in ims]).max(axis=0) return np.vstack(arrays)
if coarsest_stride > 0: # Compute the max stack shape.
stride = coarsest_stride max_shape = np.max(np.stack([arr.shape for arr in arrays]), 0)
max_shape[0] = int(np.ceil(max_shape[0] / stride) * stride) if size is not None and min(size) > 0:
max_shape[1] = int(np.ceil(max_shape[1] / stride) * stride) max_shape[:len(size)] = size
if align is not None and min(align) > 0:
align_size = np.ceil(max_shape[:len(align)] / align)
max_shape[:len(align)] = align_size.astype('int64') * align
blob_shape = (len(ims), max_shape[0], max_shape[1], 3) # Fill output with the given value.
blob = np.empty(blob_shape, blob_dtype) output_dtype = dtype or arrays[0].dtype
blob[:] = cfg.PIXEL_MEANS output_shape = [len(arrays)] + list(max_shape)
output = np.empty(output_shape, output_dtype)
output[:] = fill_value
for i, im in enumerate(ims): # Copy arrays.
if im.dtype == 'uint16': for i, arr in enumerate(arrays):
im = im.astype(blob_dtype) / 256. copy_slices = (slice(0, d) for d in arr.shape)
blob[i, :im.shape[0], :im.shape[1], :] = im output[(i,) + tuple(copy_slices)] = arr
return blob return output
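# A standalone sketch of ``blob_vstack`` above (hypothetical images): two HWC
# arrays of different sizes are padded with the fill value into one batch
# whose spatial dims are rounded up to a multiple of the alignment.
ims = [np.ones((30, 40, 3), 'uint8'), np.ones((50, 20, 3), 'uint8')]
blob = blob_vstack(ims, fill_value=127, align=(32, 32, 1))
print(blob.shape)  # (2, 64, 64, 3): max shape (50, 40, 3) aligned up to 32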
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import importlib.machinery
import os
import dragon
import numpy as np
from dragon.core.util import six
from dragon.vm import torch
from seetadet.core.config import cfg
def freeze_module(module):
"""Freeze parameters of given module.
Parameters
----------
module : dragon.vm.torch.nn.Module
The module whose parameters will be frozen.
"""
for param in list(module._parameters.keys()):
module._parameters[param].requires_grad = False
module._buffers[param] = module._parameters[param]
del module._parameters[param]
def get_param_groups(module):
"""Separate parameters for different weight decay.
Parameters
----------
module : dragon.vm.torch.nn.Module
The module to collect parameters from.
Returns
-------
Sequence[ParamGroup]
The parameter groups.
"""
param_groups = [
{'params': [], 'weight_decay': cfg.SOLVER.WEIGHT_DECAY},
{'params': [], 'weight_decay': 0.},
{'params': [], 'weight_decay': cfg.SOLVER.WEIGHT_DECAY_BIAS},
]
legacy_biases = set()
for name, param in module.named_parameters():
if name.endswith('weight') and param.dim() > 1:
legacy_biases.add(name[:-6] + 'bias')
for name, param in module.named_parameters():
gi = 0 if 'weight' in name and param.dim() > 1 else 1
if gi > 0 and name in legacy_biases:
gi = 2
param_groups[gi]['params'].append(param)
return list(filter(lambda g: len(g['params']) > 0, param_groups))
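# A minimal sketch of the grouping rule above (assuming a Dragon runtime;
# ``detector`` stands for any built module): weights with dim > 1 take
# SOLVER.WEIGHT_DECAY, biases paired with such a weight take
# SOLVER.WEIGHT_DECAY_BIAS, and the rest get no decay, e.g.
#
#   groups = get_param_groups(detector)
#   optim = torch.optim.SGD(groups, lr=cfg.SOLVER.BASE_LR, momentum=0.9)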
def load_library(library_prefix):
"""Load a shared library.
Parameters
----------
library_prefix : str
The path prefix of the library.
"""
loader_details = (
importlib.machinery.ExtensionFileLoader,
importlib.machinery.EXTENSION_SUFFIXES)
library_prefix = os.path.abspath(library_prefix)
lib_dir, fullname = os.path.split(library_prefix)
finder = importlib.machinery.FileFinder(lib_dir, loader_details)
ext_specs = finder.find_spec(fullname)
if ext_specs is None:
raise ImportError(
'Could not find the pre-built library '
'for <%s>.' % library_prefix)
dragon.load_library(ext_specs.origin)
def new_tensor(data, enforce_cpu=False):
"""Create a new tensor from the data.
Parameters
----------
data : array_like
The data value.
enforce_cpu : bool, optional, default=False
**True** to enforce the cpu storage.
Returns
-------
dragon.vm.torch.Tensor
The tensor holding the data.
"""
if data is None:
return data
if isinstance(data, np.ndarray):
tensor = torch.from_numpy(data)
elif isinstance(data, torch.Tensor):
tensor = data
else:
tensor = torch.tensor(data)
if not enforce_cpu:
tensor = tensor.cuda(cfg.GPU_ID)
return tensor
# Aliases
pickle = six.moves.pickle
...@@ -8,111 +8,74 @@ ...@@ -8,111 +8,74 @@
# <https://opensource.org/licenses/BSD-2-Clause> # <https://opensource.org/licenses/BSD-2-Clause>
# #
# ------------------------------------------------------------ # ------------------------------------------------------------
"""Image utilities."""
from __future__ import absolute_import from __future__ import absolute_import
from __future__ import division from __future__ import division
from __future__ import print_function from __future__ import print_function
import numpy as np import numpy as np
import numpy.random as npr
import PIL.Image import PIL.Image
import PIL.ImageEnhance import PIL.ImageEnhance
from seetadet.core.config import cfg
def im_resize(img, size=None, scale=None, mode='linear'):
def distort_image(img): """Resize image by the scale or size."""
"""Distort the brightness, contrast and color of an image."""
img = PIL.Image.fromarray(img)
transforms = [PIL.ImageEnhance.Brightness,
PIL.ImageEnhance.Contrast,
PIL.ImageEnhance.Color]
npr.shuffle(transforms)
for transform in transforms:
if npr.uniform() < 0.5:
img = transform(img)
img = img.enhance(1. + np.random.uniform(-.4, .4))
return np.array(img)
def get_image_with_target_size(img, target_size, no_offset=False):
"""Crop or pad an image with the target size."""
im_shape = list(img.shape)
if not isinstance(target_size, (tuple, list)):
target_size = [target_size, target_size]
h_diff = target_size[0] - im_shape[0]
w_diff = target_size[1] - im_shape[1]
def get_param(diff, crop, no_offset):
diff = max(-diff if crop else diff, 0)
return 0 if no_offset else npr.randint(diff + 1)
offset_crop_w = get_param(w_diff, True, no_offset)
offset_crop_h = get_param(h_diff, True, no_offset)
im_shape[:2] = target_size
new_img = np.empty(im_shape, dtype=img.dtype)
new_img[:] = cfg.PIXEL_MEANS
new_img[:img.shape[0], :img.shape[1]] = \
img[offset_crop_h:offset_crop_h + target_size[0],
offset_crop_w:offset_crop_w + target_size[1]]
offset_w = -offset_crop_w
offset_h = -offset_crop_h
return new_img, (offset_h, offset_w, target_size)
def resize_image(img, fx=1.0, fy=1.0, size=None):
"""Resize an image."""
if size is None: if size is None:
size = (int(img.shape[1] * fx), int(img.shape[0] * fy)) if not isinstance(scale, (tuple, list)):
scale = (scale, scale)
h, w = img.shape[:2]
size = int(h * scale[0] + .5), int(w * scale[1] + .5)
else: else:
if not isinstance(size, (tuple, list)): if not isinstance(size, (tuple, list)):
size = (size, size) size = (size, size)
mode = {'linear': PIL.Image.BILINEAR,
'nearest': PIL.Image.NEAREST}[mode]
img = PIL.Image.fromarray(img) img = PIL.Image.fromarray(img)
return np.array(img.resize(size, PIL.Image.BILINEAR)) return np.array(img.resize(size[::-1], mode))
def resize_image_with_target_size( def im_rescale(img, scales, max_size=0, keep_ratio=True):
img, """Rescale image to match the detecting scales."""
target_size,
max_size=0,
random_scales=(1.0, 1.0),
):
"""Resize an image with the target size."""
im_shape = img.shape im_shape = img.shape
max_size = max_size if max_size > 0 else target_size img_list, img_scales = [], []
# Scale along the shortest side if keep_ratio:
im_size_min = np.min(im_shape[:2]) size_min = np.min(im_shape[:2])
im_size_max = np.max(im_shape[:2]) size_max = np.max(im_shape[:2])
im_scale = float(target_size) / float(im_size_min)
# Prevent the biggest axis from being more than MAX_SIZE
if np.round(im_scale * im_size_max) > max_size:
im_scale = float(max_size) / float(im_size_max)
# Apply the scale jitter to get a range of dynamic scales
r = random_scales
jitter = r[0] + npr.rand() * (r[1] - r[0])
im_scale *= jitter
return resize_image(img, im_scale, im_scale), im_scale
def scale_image(img, scales, max_size=0):
"""Resize image to match the detecting scales."""
processed_images, image_scales = [], []
if max_size > 0:
im_size_min = np.min(img.shape[:2])
im_size_max = np.max(img.shape[:2])
for target_size in scales: for target_size in scales:
im_scale = float(target_size) / float(im_size_min) im_scale = float(target_size) / float(size_min)
if np.round(im_scale * im_size_max) > cfg.TEST.MAX_SIZE: target_size_max = max_size if max_size > 0 else target_size
im_scale = float(cfg.TEST.MAX_SIZE) / float(im_size_max) if np.round(im_scale * size_max) > target_size_max:
processed_images.append(resize_image(img, im_scale, im_scale)) im_scale = float(target_size_max) / float(size_max)
image_scales.append(im_scale) img_list.append(im_resize(img, scale=im_scale))
img_scales.append((im_scale, im_scale))
else: else:
for target_size in scales: for target_size in scales:
fy = float(target_size) / img.shape[0] h_scale = float(target_size) / im_shape[0]
fx = float(target_size) / img.shape[1] w_scale = float(target_size) / im_shape[1]
processed_images.append(resize_image(img, size=target_size)) img_list.append(im_resize(img, size=target_size))
image_scales.append([fy, fx]) img_scales.append((h_scale, w_scale))
return processed_images, image_scales return img_list, img_scales
def color_jitter(img, brightness=None, contrast=None, saturation=None):
"""Distort the color of image."""
def add_transform(transforms, type, range):
if range is not None:
if not isinstance(range, (tuple, list)):
range = (1. - range, 1. + range)
transforms.append((type, range))
transforms = []
contrast_first = np.random.rand() < 0.5
add_transform(transforms, PIL.ImageEnhance.Brightness, brightness)
if contrast_first:
add_transform(transforms, PIL.ImageEnhance.Contrast, contrast)
add_transform(transforms, PIL.ImageEnhance.Color, saturation)
if not contrast_first:
add_transform(transforms, PIL.ImageEnhance.Contrast, contrast)
for transform, jitter_range in transforms:
if isinstance(img, np.ndarray):
img = PIL.Image.fromarray(img)
img = transform(img)
img = img.enhance(np.random.uniform(*jitter_range))
return np.asarray(img)
...@@ -7,11 +7,8 @@ ...@@ -7,11 +7,8 @@
# #
# <https://opensource.org/licenses/BSD-2-Clause> # <https://opensource.org/licenses/BSD-2-Clause>
# #
# Codes are based on:
#
# <https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/platform/tf_logging.py>
#
# ------------------------------------------------------------ # ------------------------------------------------------------
"""Logging utilities."""
from __future__ import absolute_import from __future__ import absolute_import
from __future__ import division from __future__ import division
...@@ -25,7 +22,6 @@ import threading ...@@ -25,7 +22,6 @@ import threading
_logger = None _logger = None
_is_root = True
_logger_lock = threading.Lock() _logger_lock = threading.Lock()
...@@ -38,11 +34,12 @@ def get_logger(): ...@@ -38,11 +34,12 @@ def get_logger():
try: try:
if _logger: if _logger:
return _logger return _logger
logger = _logging.getLogger('SeetaDet') logger = _logging.getLogger('seetadet')
logger.setLevel('INFO') logger.setLevel('INFO')
logger.propagate = False logger.propagate = False
logger._is_root = True
if True: if True:
# Determine whether we are in an interactive environment # Determine whether we are in an interactive environment.
_interactive = False _interactive = False
try: try:
# This is only defined in interactive shells. # This is only defined in interactive shells.
...@@ -108,14 +105,24 @@ def get_verbosity(): ...@@ -108,14 +105,24 @@ def get_verbosity():
def set_verbosity(v): def set_verbosity(v):
"""Sets the threshold for what messages will be logged.""" """Set the threshold for what messages will be logged."""
get_logger().setLevel(v) get_logger().setLevel(v)
def set_root_logger(is_root=True): def set_formatter(fmt=None, datefmt=None):
global _is_root """Set the formatter."""
_is_root = is_root handler = _logging.StreamHandler(_sys.stderr)
handler.setFormatter(_logging.Formatter(fmt, datefmt))
logger = get_logger()
logger.removeHandler(logger.handlers[0])
logger.addHandler(handler)
def set_root(is_root=True):
"""Set logger to the root."""
get_logger()._is_root = is_root
def is_root(): def is_root():
return _is_root """Return logger is the root."""
return get_logger()._is_root
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Mask utilities with boxes."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import copy
import cv2
import numpy as np
import PIL.Image
from seetadet.utils.pycocotools import mask as mask_tools
from seetadet.utils import boxes as box_util
def warp_mask_via_intersection(mask, box1, box2, size):
"""Warp mask via intersection."""
x1 = max(box1[0], box2[0])
y1 = max(box1[1], box2[1])
x2 = min(box1[2], min(box2[2], mask.shape[1] - 1))
y2 = min(box1[3], min(box2[3], mask.shape[0] - 1))
if x1 > x2 or y1 > y2:
return None
w = x2 - x1 + 1
h = y2 - y1 + 1
ex_start_y = y1 - box1[1]
ex_start_x = x1 - box1[0]
inter_mask = mask[y1:y2 + 1, x1:x2 + 1]
target_h = box1[3] - box1[1] + 1
target_w = box1[2] - box1[0] + 1
warped_mask = np.zeros((target_h, target_w), dtype='uint8')
warped_mask[ex_start_y:ex_start_y + h,
ex_start_x:ex_start_x + w] = inter_mask
if not isinstance(size, (tuple, list)):
size = (size, size)
mask = PIL.Image.fromarray(warped_mask)
mask = mask.resize((size[1], size[0]), PIL.Image.NEAREST)
return np.array(mask)
def warp_mask_via_polygons(polygons, box, size):
"""Warp mask via polygons."""
w, h = box[2] - box[0], box[3] - box[1]
if not isinstance(size, (tuple, list)):
size = (size, size)
ratio_h = size[0] / max(h, 0.1)
ratio_w = size[1] / max(w, 0.1)
polygons = copy.deepcopy(polygons)
for p in polygons:
p[0::2] = p[0::2] - box[0]
p[1::2] = p[1::2] - box[1]
if ratio_h == ratio_w:
for p in polygons:
p *= ratio_h
else:
for p in polygons:
p[0::2] *= ratio_w
p[1::2] *= ratio_h
rle_objs = mask_tools.frPyObjects(polygons, size[0], size[1])
rle_objs = [mask_tools.merge(rle_objs)]
return mask_tools.decode(rle_objs)[:, :, 0]
def mask_overlap(box1, box2, mask1, mask2):
"""Compute the overlap of two masks."""
x1 = max(box1[0], box2[0])
y1 = max(box1[1], box2[1])
x2 = min(box1[2], box2[2])
y2 = min(box1[3], box2[3])
if x1 > x2 or y1 > y2:
return 0
w = x2 - x1 + 1
h = y2 - y1 + 1
# Get masks in the intersection part
start_ya = y1 - box1[1]
start_xa = x1 - box1[0]
inter_mask_a = mask1[start_ya: start_ya + h, start_xa:start_xa + w]
start_yb = y1 - box2[1]
start_xb = x1 - box2[0]
inter_mask_b = mask2[start_yb: start_yb + h, start_xb:start_xb + w]
assert inter_mask_a.shape == inter_mask_b.shape
inter = np.logical_and(inter_mask_b, inter_mask_a).sum()
union = mask1.sum() + mask2.sum() - inter
if union < 1.:
return 0.
return float(inter) / float(union)
def project_masks(
masks,
boxes,
height,
width,
thresh=0.5,
data_format='HWC',
data_order='F',
):
"""Project the predicting masks to a image.
Parameters
----------
masks : numpy.ndarray
The masks packed in (C, H, W) format.
boxes : numpy.ndarray
        The predicted bounding boxes.
height : int
The height of image.
width : int
The width of image.
thresh : float, optional, default=0.5
        The threshold to binarize the floating-point masks.
data_format : {'HWC', 'CHW'}, optional
The data format of output image.
data_order : {'F', 'C'}, optional
        The Fortran-style or C-style memory order.
Returns
-------
numpy.ndarray
The output image.
"""
num_pred = boxes.shape[0]
assert masks.shape[0] == num_pred
mask_shape = [height, width]
if data_format == 'HWC':
mask_shape += [num_pred]
elif data_format == 'CHW':
mask_shape = [num_pred] + mask_shape
else:
raise ValueError('Unknown data format', data_format)
mask_image = np.zeros(mask_shape, 'uint8', data_order)
size = masks[0].shape[0]
scale = (size + 2.) / size
ref_boxes = box_util.expand_boxes(boxes, scale)
ref_boxes = ref_boxes.astype(np.int32)
padded_mask = np.zeros((size + 2, size + 2), 'float32')
for i in range(num_pred):
ref_box = ref_boxes[i, :4]
mask = masks[i]
padded_mask[1:-1, 1:-1] = mask[:, :]
w = ref_box[2] - ref_box[0] + 1
h = ref_box[3] - ref_box[1] + 1
w = np.maximum(w, 1)
h = np.maximum(h, 1)
mask = cv2.resize(padded_mask, (w, h))
mask = np.array(mask >= thresh, 'uint8')
x1 = max(ref_box[0], 0)
y1 = max(ref_box[1], 0)
x2 = min(ref_box[2] + 1, width)
y2 = min(ref_box[3] + 1, height)
if data_format == 'HWC':
mask_image[y1:y2, x1:x2, i] = \
mask[(y1 - ref_box[1]):(y2 - ref_box[1]),
(x1 - ref_box[0]):(x2 - ref_box[0])]
elif data_format == 'CHW':
mask_image[i, y1:y2, x1:x2] = \
mask[(y1 - ref_box[1]):(y2 - ref_box[1]),
(x1 - ref_box[0]):(x2 - ref_box[0])]
return mask_image
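# A minimal usage sketch (hypothetical values): two 28x28 soft masks are
# pasted into a 100x100 canvas, one channel per detection.
#
#   masks = np.random.rand(2, 28, 28).astype('float32')
#   boxes = np.array([[10, 10, 50, 60], [30, 20, 80, 90]], 'float32')
#   img = project_masks(masks, boxes, height=100, width=100)
#   assert img.shape == (100, 100, 2)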
@@ -8,10 +8,12 @@
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Mask utilities."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from seetadet.utils.mask.helper import mask_from
from seetadet.utils.mask.helper import paste_masks
from seetadet.utils.mask.metrics import mask_overlap
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Helper functions for Mask."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import copy
import cv2
import numpy as np
from pycocotools.mask import decode
from pycocotools.mask import merge
from pycocotools.mask import frPyObjects
from seetadet.utils.bbox import rescale_boxes
from seetadet.utils.image import im_resize
def mask_from_buffer(buffer, size):
"""Return a binary mask from the buffer."""
if not isinstance(size, (tuple, list)):
size = (size, size)
rles = [{'counts': buffer, 'size': size}]
mask = decode(rles)
if mask.shape[2] != 1:
raise ValueError('Mask contains {} instances. '
'Merge them before compressing.'
.format(mask.shape[2]))
return mask[:, :, 0]
def mask_from_polygons(polygons, size, box=None):
"""Return a binary mask from the polygons."""
if not isinstance(size, (tuple, list)):
size = (size, size)
if box is not None:
polygons = copy.deepcopy(polygons)
w, h = box[2] - box[0], box[3] - box[1]
ratio_h = size[0] / max(h, 0.1)
ratio_w = size[1] / max(w, 0.1)
for p in polygons:
p[0::2] = p[0::2] - box[0]
p[1::2] = p[1::2] - box[1]
if ratio_h == ratio_w:
for p in polygons:
p *= ratio_h
else:
for p in polygons:
p[0::2] *= ratio_w
p[1::2] *= ratio_h
rles = frPyObjects(polygons, size[0], size[1])
return decode([merge(rles)])[:, :, 0]
def mask_from_bitmap(bitmap, size, box=None):
"""Return a binary mask from the bitmap."""
if not isinstance(size, (tuple, list)):
size = (size, size)
if box is not None:
box = np.round(box).astype('int64')
bitmap = bitmap[box[1]:box[3] + 1, box[0]:box[2] + 1]
return im_resize(bitmap, size, mode='nearest')
def mask_from(segm, size, box=None):
"""Return a binary mask from the segmentation object."""
if segm is None:
return None
elif isinstance(segm, list):
return mask_from_polygons(segm, size, box)
elif isinstance(segm, np.ndarray):
return mask_from_bitmap(segm, size, box)
elif isinstance(segm, bytes):
        # RLE buffers are stored at full size, so the box crop does not apply.
        return mask_from_buffer(segm, size)
else:
        raise TypeError('Unknown segmentation type: ' + str(type(segm)))
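# A minimal usage sketch (hypothetical values): a triangle polygon is
# rasterized into a 28x28 crop of the given box.
#
#   poly = [np.array([10., 10., 40., 10., 40., 40.])]
#   m = mask_from(poly, 28, box=np.array([8., 8., 44., 44.]))
#   assert m.shape == (28, 28)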
def paste_masks(masks, boxes, img_size, thresh=0.5, data_order='F'):
"""Paste masks on an image."""
num_boxes = boxes.shape[0]
assert masks.shape[0] == num_boxes
img_shape = list(img_size) + [num_boxes]
output = np.zeros(img_shape, 'uint8', data_order)
size = masks[0].shape[0]
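    # Pad the mask with one pixel on each side before resizing, and
    # rescale the boxes by the same (size + 2) / size factor, so that
    # the resize does not clip mask values at the box border.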
scale_factor = (size + 2.) / size
boxes = rescale_boxes(boxes, scale_factor).astype(np.int32)
padded_mask = np.zeros((size + 2, size + 2), 'float32')
for i in range(num_boxes):
box, mask = boxes[i, :4], masks[i]
padded_mask[1:-1, 1:-1] = mask[:, :]
w = max(box[2] - box[0] + 1, 1)
h = max(box[3] - box[1] + 1, 1)
mask = cv2.resize(padded_mask, (w, h))
mask = np.array(mask >= thresh, 'uint8')
x1, y1 = max(box[0], 0), max(box[1], 0)
x2, y2 = min(box[2] + 1, img_size[1]), min(box[3] + 1, img_size[0])
mask = mask[y1 - box[1]:y2 - box[1], x1 - box[0]:x2 - box[0]]
output[y1:y2, x1:x2, i] = mask
return output
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Mask metrics."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
def mask_overlap(box1, box2, mask1, mask2):
"""Compute the overlap of two masks."""
x1 = max(box1[0], box2[0])
y1 = max(box1[1], box2[1])
x2 = min(box1[2], box2[2])
y2 = min(box1[3], box2[3])
if x1 > x2 or y1 > y2:
return 0
w = x2 - x1 + 1
h = y2 - y1 + 1
# Get masks in the intersection part.
start_ya = y1 - box1[1]
start_xa = x1 - box1[0]
inter_mask_a = mask1[start_ya: start_ya + h, start_xa:start_xa + w]
start_yb = y1 - box2[1]
start_xb = x1 - box2[0]
inter_mask_b = mask2[start_yb: start_yb + h, start_xb:start_xb + w]
assert inter_mask_a.shape == inter_mask_b.shape
inter = np.logical_and(inter_mask_b, inter_mask_a).sum()
union = mask1.sum() + mask2.sum() - inter
if union < 1.:
return 0.
return float(inter) / float(union)
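# A minimal sanity check (hypothetical values): on identical boxes, two
# 2x2 masks that share one of three foreground pixels give IoU = 1/3.
#
#   m1 = np.array([[1, 1], [0, 0]], 'uint8')
#   m2 = np.array([[1, 0], [1, 0]], 'uint8')
#   box = [0, 0, 1, 1]
#   assert abs(mask_overlap(box, box, m1, m2) - 1. / 3.) < 1e-6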
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.modules import det
from seetadet.utils import env
try:
from seetadet.utils.cython_nms import cpu_nms
from seetadet.utils.cython_nms import cpu_soft_nms
except ImportError:
cpu_nms = cpu_soft_nms = print
def gpu_nms(detections, thresh):
"""Filter out the detections using GPU-NMS."""
if detections.shape[0] == 0:
return []
scores = detections[:, 4]
order = scores.argsort()[::-1]
sorted_detections = env.new_tensor(detections[order, :])
keep = det.nms(sorted_detections, iou_threshold=thresh).numpy()
return order[keep]
def nms(detections, thresh):
"""Filter out the detections using NMS."""
if detections.shape[0] == 0:
return []
if cpu_nms is print:
raise ImportError('Failed to load <cython_nms> library.')
return cpu_nms(detections, thresh)
def soft_nms(
detections,
thresh,
method='linear',
sigma=0.5,
score_thresh=0.001,
):
"""Filter out the detections using Soft-NMS."""
if detections.shape[0] == 0:
return []
if cpu_soft_nms is print:
raise ImportError('Failed to load <cython_nms> library.')
methods = {'hard': 0, 'linear': 1, 'gaussian': 2}
if method not in methods:
raise ValueError('Unknown soft nms method:', method)
return cpu_soft_nms(
detections,
thresh,
methods[method],
sigma,
score_thresh,
)
@@ -8,25 +8,12 @@
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Non-Maximum Suppression utilities."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from seetadet.utils.nms.nms_impl import gpu_nms
from seetadet.utils.nms.nms_impl import nms
from seetadet.utils.nms.nms_impl import soft_nms
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Implementations of Non-Maximum Suppression."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.ops.normalization import to_tensor
from seetadet.ops.vision import NonMaxSuppression
try:
from seetadet.utils.nms.cython_nms import cpu_nms
from seetadet.utils.nms.cython_nms import cpu_soft_nms
except ImportError:
cpu_nms = cpu_soft_nms = print
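# NOTE: ``print`` is a harmless sentinel so that importing this module
# never fails; ``nms`` and ``soft_nms`` below raise an ImportError when
# the compiled <cython_nms> extension is actually required.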
def gpu_nms(dets, thresh):
"""Filter out the dets using GPU-NMS."""
if dets.shape[0] == 0:
return []
scores = dets[:, 4]
order = scores.argsort()[::-1]
sorted_dets = to_tensor(dets[order, :])
keep = NonMaxSuppression.apply(sorted_dets, iou_threshold=thresh)
return order[keep.numpy()]
def nms(dets, thresh):
"""Filter out the dets using NMS."""
if dets.shape[0] == 0:
return []
if cpu_nms is print:
raise ImportError('Failed to load <cython_nms> library.')
return cpu_nms(dets, thresh)
def soft_nms(dets, thresh, method='linear', sigma=0.5, score_thresh=0.001):
"""Filter out the dets using Soft-NMS."""
if dets.shape[0] == 0:
return []
if cpu_soft_nms is print:
raise ImportError('Failed to load <cython_nms> library.')
methods = {'hard': 0, 'linear': 1, 'gaussian': 2}
if method not in methods:
raise ValueError('Unknown soft nms method: ' + method)
return cpu_soft_nms(dets, thresh, methods[method], sigma, score_thresh)
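# A minimal usage sketch (hypothetical detections in
# [x1, y1, x2, y2, score] format):
#
#   dets = np.array([[10, 10, 50, 50, 0.9],
#                    [12, 12, 52, 52, 0.8],
#                    [100, 100, 150, 150, 0.7]], 'float32')
#   keep = nms(dets, 0.5)  # suppresses the second, overlapping box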
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
import operator
from dragon.vm import torch
from seetadet.modules import nn
def conv_flops(m, inputs, output):
"""Hook to compute flops for a convolution."""
_ = locals() # Unused
k_dim = functools.reduce(operator.mul, m.kernel_size)
out_dim = functools.reduce(operator.mul, output.shape[2:])
out_c, in_c = m.weight.shape[:2]
m.__params__ = (k_dim * in_c + (1 if m.bias else 0)) * out_c
m.__flops__ = m.__params__ * out_dim
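    # For example (hypothetical layer): a 3x3 conv with in_c=64,
    # out_c=128 and a 56x56 output map gives
    # (9 * 64 + 1) * 128 = 73856 weights per output position and
    # 73856 * 3136 ~= 231.6 MFLOPs for the layer.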
def register_flops(module):
"""Register hooks to collect flops info."""
if not hasattr(module, '__flops__'):
module.__flops__ = 0.
for m in module.modules():
if isinstance(m, nn.Conv2d):
m.register_forward_hook(conv_flops)
def collect_flops(module, normalizer=1e6):
"""Collect flops from the last forward."""
total_flops = 0.0
for m in module.modules():
if hasattr(m, '__flops__'):
total_flops += m.__flops__
m.__flops__ = 0.0
return total_flops / normalizer
def benchmark_flops(module, normalizer=1e6):
"""Return the flops by running benchmark once."""
register_flops(module)
collect_flops(module)
original_training = module.training
if original_training:
module.eval()
with torch.no_grad():
module.benchmark()
if original_training:
module.train()
return collect_flops(module, normalizer)
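# A minimal usage sketch (hypothetical model), assuming ``model`` is a
# module that implements the ``benchmark()`` method used above:
#
#   model = build_detector()          # hypothetical factory
#   mflops = benchmark_flops(model)   # conv FLOPs in millions
#   print('conv FLOPs: {:.1f}M'.format(mflops))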
@@ -8,12 +8,12 @@
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Profiler utilities."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from seetadet.utils.profiler.stats import SmoothedValue
from seetadet.utils.profiler.timer import Timer
from seetadet.utils.profiler.timer import get_progress
@@ -8,6 +8,7 @@
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Trackable statistics."""
from __future__ import absolute_import
from __future__ import division
@@ -18,32 +19,30 @@ import numpy as np


class SmoothedValue(object):
    """Track values and provide smoothed report."""

    def __init__(self, window_size=None):
        self.deque = collections.deque(maxlen=window_size)
        self.total = 0.0
        self.count = 0

    def update(self, value):
        self.deque.append(value)
        self.count += 1
        self.total += value

    def mean(self):
        return np.mean(self.deque)

    def median(self):
        return np.median(self.deque)

    def average(self):
        return self.total / self.count


class ExponentialMovingAverage(object):
    """Track values and provide EMA report."""

    def __init__(self, decay=0.9):
        self.value = None
@@ -51,7 +50,7 @@ class ExponentialMovingAverage(object):
        self.total = 0.0
        self.count = 0

    def update(self, value):
        if self.value is None:
            self.value = value
        else:
...
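# A minimal usage sketch:
#
#   v = SmoothedValue(window_size=20)
#   for loss in (0.9, 0.8, 0.7):
#       v.update(loss)
#   v.median()    # median over the sliding window
#   v.average()   # mean over all updates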
@@ -8,6 +8,7 @@
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Timing functions."""
from __future__ import absolute_import
from __future__ import division
@@ -19,7 +20,7 @@ import time


class Timer(object):
    """Simple timer."""

    def __init__(self):
        self.total_time = 0.
@@ -28,74 +29,32 @@ class Timer(object):
        self.diff = 0.
        self.average_time = 0.

    def add_diff(self, diff, n=1, average=True):
        self.total_time += diff
        self.calls += n
        self.average_time = self.total_time / self.calls
        return self.average_time if average else self.diff

    @contextlib.contextmanager
    def tic_and_toc(self, n=1, average=True):
        try:
            yield self.tic()
        finally:
            self.toc(n, average)

    def tic(self):
        self.start_time = time.time()
        return self

    def toc(self, n=1, average=True):
        self.diff = time.time() - self.start_time
        return self.add_diff(self.diff, n, average)


def get_progress(timer, step, max_steps):
    """Return the progress information."""
    eta_seconds = timer.average_time * (max_steps - step)
    eta = str(datetime.timedelta(seconds=int(eta_seconds)))
    progress = (step + 1.) / max_steps
    return ('< PROGRESS: {:.2%} | SPEED: {:.3f}s / iter | ETA: {} >'
            .format(progress, timer.average_time, eta))
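# A minimal usage sketch (hypothetical training loop):
#
#   timer = Timer()
#   for step in range(max_steps):
#       with timer.tic_and_toc():
#           train_one_step()   # hypothetical
#       print(get_progress(timer, step, max_steps))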
__author__ = 'tylin'
__version__ = '2.0'
# Interface for accessing the Microsoft COCO dataset.
# Microsoft COCO is a large image dataset designed for object detection,
# segmentation, and caption generation. pycocotools is a Python API that
# assists in loading, parsing and visualizing the annotations in COCO.
# Please visit http://mscoco.org/ for more information on COCO, including
# for the data, paper, and tutorials. The exact format of the annotations
# is also described on the COCO website. For example usage of the pycocotools
# please see pycocotools_demo.ipynb. In addition to this API, please download both
# the COCO images and annotations in order to run the demo.
# An alternative to using the API is to load the annotations directly
# into a Python dictionary.
# Using the API provides additional utility functions. Note that this API
# supports both *instance* and *caption* annotations. In the case of
# captions not all functions are defined (e.g. categories are undefined).
# The following API functions are defined:
# COCO - COCO api class that loads COCO annotation file and prepare data structures.
# decodeMask - Decode binary mask M encoded via run-length encoding.
# encodeMask - Encode binary mask M using run-length encoding.
# getAnnIds - Get ann ids that satisfy given filter conditions.
# getCatIds - Get cat ids that satisfy given filter conditions.
# getImgIds - Get img ids that satisfy given filter conditions.
# loadAnns - Load anns with the specified ids.
# loadCats - Load cats with the specified ids.
# loadImgs - Load imgs with the specified ids.
# annToMask - Convert segmentation in an annotation to binary mask.
# showAnns - Display the specified annotations.
# loadRes - Load algorithm results and create API for accessing them.
# download - Download COCO images from mscoco.org server.
# Throughout the API "ann"=annotation, "cat"=category, and "img"=image.
# Help on each function can be accessed by: "help COCO>function".
# See also COCO>decodeMask,
# COCO>encodeMask, COCO>getAnnIds, COCO>getCatIds,
# COCO>getImgIds, COCO>loadAnns, COCO>loadCats,
# COCO>loadImgs, COCO>annToMask, COCO>showAnns
# Microsoft COCO Toolbox. version 2.0
# Data, paper, and tutorials available at: http://mscoco.org/
# Code written by Piotr Dollar and Tsung-Yi Lin, 2014.
# Licensed under the Simplified BSD License [see bsd.txt]
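# A minimal usage sketch (hypothetical annotation path):
#
#   coco = COCO('annotations/instances_val2017.json')
#   cat_ids = coco.getCatIds(catNms=['person'])
#   img_ids = coco.getImgIds(catIds=cat_ids)
#   anns = coco.loadAnns(coco.getAnnIds(imgIds=img_ids[:1], catIds=cat_ids))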
import json
import time
import matplotlib.pyplot as plt
from matplotlib.collections import PatchCollection
from matplotlib.patches import Polygon
import numpy as np
import copy
import itertools
from . import mask as maskUtils
import os
from collections import defaultdict
import sys
PYTHON_VERSION = sys.version_info[0]
if PYTHON_VERSION == 2:
from urllib import urlretrieve
elif PYTHON_VERSION == 3:
from urllib.request import urlretrieve
def _isArrayLike(obj):
return hasattr(obj, '__iter__') and hasattr(obj, '__len__')
class COCO:
def __init__(self, annotation_file=None):
"""
Constructor of Microsoft COCO helper class for reading and visualizing annotations.
:param annotation_file (str): location of annotation file
:param image_folder (str): location to the folder that hosts images.
:return:
"""
# load dataset
self.dataset,self.anns,self.cats,self.imgs = dict(),dict(),dict(),dict()
self.imgToAnns, self.catToImgs = defaultdict(list), defaultdict(list)
        if annotation_file is not None:
print('loading annotations into memory...')
tic = time.time()
dataset = json.load(open(annotation_file, 'r'))
assert type(dataset)==dict, 'annotation file format {} not supported'.format(type(dataset))
print('Done (t={:0.2f}s)'.format(time.time()- tic))
self.dataset = dataset
self.createIndex()
def createIndex(self):
# create index
print('creating index...')
anns, cats, imgs = {}, {}, {}
imgToAnns,catToImgs = defaultdict(list),defaultdict(list)
if 'annotations' in self.dataset:
for ann in self.dataset['annotations']:
imgToAnns[ann['image_id']].append(ann)
anns[ann['id']] = ann
if 'images' in self.dataset:
for img in self.dataset['images']:
imgs[img['id']] = img
if 'categories' in self.dataset:
for cat in self.dataset['categories']:
cats[cat['id']] = cat
if 'annotations' in self.dataset and 'categories' in self.dataset:
for ann in self.dataset['annotations']:
catToImgs[ann['category_id']].append(ann['image_id'])
print('index created!')
# create class members
self.anns = anns
self.imgToAnns = imgToAnns
self.catToImgs = catToImgs
self.imgs = imgs
self.cats = cats
def info(self):
"""
Print information about the annotation file.
:return:
"""
for key, value in self.dataset['info'].items():
print('{}: {}'.format(key, value))
def getAnnIds(self, imgIds=[], catIds=[], areaRng=[], iscrowd=None):
"""
Get ann ids that satisfy given filter conditions. default skips that filter
:param imgIds (int array) : get anns for given imgs
catIds (int array) : get anns for given cats
areaRng (float array) : get anns for given area range (e.g. [0 inf])
iscrowd (boolean) : get anns for given crowd label (False or True)
:return: ids (int array) : integer array of ann ids
"""
imgIds = imgIds if _isArrayLike(imgIds) else [imgIds]
catIds = catIds if _isArrayLike(catIds) else [catIds]
if len(imgIds) == len(catIds) == len(areaRng) == 0:
anns = self.dataset['annotations']
else:
if not len(imgIds) == 0:
lists = [self.imgToAnns[imgId] for imgId in imgIds if imgId in self.imgToAnns]
anns = list(itertools.chain.from_iterable(lists))
else:
anns = self.dataset['annotations']
anns = anns if len(catIds) == 0 else [ann for ann in anns if ann['category_id'] in catIds]
anns = anns if len(areaRng) == 0 else [ann for ann in anns if ann['area'] > areaRng[0] and ann['area'] < areaRng[1]]
        if iscrowd is not None:
ids = [ann['id'] for ann in anns if ann['iscrowd'] == iscrowd]
else:
ids = [ann['id'] for ann in anns]
return ids
def getCatIds(self, catNms=[], supNms=[], catIds=[]):
"""
filtering parameters. default skips that filter.
:param catNms (str array) : get cats for given cat names
:param supNms (str array) : get cats for given supercategory names
:param catIds (int array) : get cats for given cat ids
:return: ids (int array) : integer array of cat ids
"""
catNms = catNms if _isArrayLike(catNms) else [catNms]
supNms = supNms if _isArrayLike(supNms) else [supNms]
catIds = catIds if _isArrayLike(catIds) else [catIds]
if len(catNms) == len(supNms) == len(catIds) == 0:
cats = self.dataset['categories']
else:
cats = self.dataset['categories']
cats = cats if len(catNms) == 0 else [cat for cat in cats if cat['name'] in catNms]
cats = cats if len(supNms) == 0 else [cat for cat in cats if cat['supercategory'] in supNms]
cats = cats if len(catIds) == 0 else [cat for cat in cats if cat['id'] in catIds]
ids = [cat['id'] for cat in cats]
return ids
def getImgIds(self, imgIds=[], catIds=[]):
'''
Get img ids that satisfy given filter conditions.
:param imgIds (int array) : get imgs for given ids
:param catIds (int array) : get imgs with all given cats
:return: ids (int array) : integer array of img ids
'''
imgIds = imgIds if _isArrayLike(imgIds) else [imgIds]
catIds = catIds if _isArrayLike(catIds) else [catIds]
if len(imgIds) == len(catIds) == 0:
ids = self.imgs.keys()
else:
ids = set(imgIds)
for i, catId in enumerate(catIds):
if i == 0 and len(ids) == 0:
ids = set(self.catToImgs[catId])
else:
ids &= set(self.catToImgs[catId])
return list(ids)
def loadAnns(self, ids=[]):
"""
Load anns with the specified ids.
:param ids (int array) : integer ids specifying anns
:return: anns (object array) : loaded ann objects
"""
if _isArrayLike(ids):
return [self.anns[id] for id in ids]
elif type(ids) == int:
return [self.anns[ids]]
def loadCats(self, ids=[]):
"""
Load cats with the specified ids.
:param ids (int array) : integer ids specifying cats
:return: cats (object array) : loaded cat objects
"""
if _isArrayLike(ids):
return [self.cats[id] for id in ids]
elif type(ids) == int:
return [self.cats[ids]]
def loadImgs(self, ids=[]):
"""
        Load imgs with the specified ids.
:param ids (int array) : integer ids specifying img
:return: imgs (object array) : loaded img objects
"""
if _isArrayLike(ids):
return [self.imgs[id] for id in ids]
elif type(ids) == int:
return [self.imgs[ids]]
def showAnns(self, anns):
"""
Display the specified annotations.
:param anns (array of object): annotations to display
:return: None
"""
if len(anns) == 0:
return 0
if 'segmentation' in anns[0] or 'keypoints' in anns[0]:
datasetType = 'instances'
elif 'caption' in anns[0]:
datasetType = 'captions'
else:
raise Exception('datasetType not supported')
if datasetType == 'instances':
ax = plt.gca()
ax.set_autoscale_on(False)
polygons = []
color = []
for ann in anns:
c = (np.random.random((1, 3))*0.6+0.4).tolist()[0]
if 'segmentation' in ann:
if type(ann['segmentation']) == list:
# polygon
for seg in ann['segmentation']:
poly = np.array(seg).reshape((int(len(seg)/2), 2))
polygons.append(Polygon(poly))
color.append(c)
else:
# mask
t = self.imgs[ann['image_id']]
if type(ann['segmentation']['counts']) == list:
rle = maskUtils.frPyObjects([ann['segmentation']], t['height'], t['width'])
else:
rle = [ann['segmentation']]
m = maskUtils.decode(rle)
img = np.ones( (m.shape[0], m.shape[1], 3) )
if ann['iscrowd'] == 1:
color_mask = np.array([2.0,166.0,101.0])/255
if ann['iscrowd'] == 0:
color_mask = np.random.random((1, 3)).tolist()[0]
for i in range(3):
img[:,:,i] = color_mask[i]
ax.imshow(np.dstack( (img, m*0.5) ))
if 'keypoints' in ann and type(ann['keypoints']) == list:
# turn skeleton into zero-based index
sks = np.array(self.loadCats(ann['category_id'])[0]['skeleton'])-1
kp = np.array(ann['keypoints'])
x = kp[0::3]
y = kp[1::3]
v = kp[2::3]
for sk in sks:
if np.all(v[sk]>0):
plt.plot(x[sk],y[sk], linewidth=3, color=c)
plt.plot(x[v>0], y[v>0],'o',markersize=8, markerfacecolor=c, markeredgecolor='k',markeredgewidth=2)
plt.plot(x[v>1], y[v>1],'o',markersize=8, markerfacecolor=c, markeredgecolor=c, markeredgewidth=2)
p = PatchCollection(polygons, facecolor=color, linewidths=0, alpha=0.4)
ax.add_collection(p)
p = PatchCollection(polygons, facecolor='none', edgecolors=color, linewidths=2)
ax.add_collection(p)
elif datasetType == 'captions':
for ann in anns:
print(ann['caption'])
def loadRes(self, resFile):
"""
Load result file and return a result api object.
:param resFile (str) : file name of result file
:return: res (obj) : result api object
"""
res = COCO()
res.dataset['images'] = [img for img in self.dataset['images']]
print('Loading and preparing results...')
tic = time.time()
        if type(resFile) == str or (PYTHON_VERSION == 2 and type(resFile) == unicode):
anns = json.load(open(resFile))
elif type(resFile) == np.ndarray:
anns = self.loadNumpyAnnotations(resFile)
else:
anns = resFile
        assert type(anns) == list, 'results is not an array of objects'
annsImgIds = [ann['image_id'] for ann in anns]
assert set(annsImgIds) == (set(annsImgIds) & set(self.getImgIds())), \
'Results do not correspond to current coco set'
if 'caption' in anns[0]:
imgIds = set([img['id'] for img in res.dataset['images']]) & set([ann['image_id'] for ann in anns])
res.dataset['images'] = [img for img in res.dataset['images'] if img['id'] in imgIds]
for id, ann in enumerate(anns):
ann['id'] = id+1
elif 'bbox' in anns[0] and not anns[0]['bbox'] == []:
res.dataset['categories'] = copy.deepcopy(self.dataset['categories'])
for id, ann in enumerate(anns):
bb = ann['bbox']
x1, x2, y1, y2 = [bb[0], bb[0]+bb[2], bb[1], bb[1]+bb[3]]
if not 'segmentation' in ann:
ann['segmentation'] = [[x1, y1, x1, y2, x2, y2, x2, y1]]
ann['area'] = bb[2]*bb[3]
ann['id'] = id+1
ann['iscrowd'] = 0
elif 'segmentation' in anns[0]:
res.dataset['categories'] = copy.deepcopy(self.dataset['categories'])
for id, ann in enumerate(anns):
# now only support compressed RLE format as segmentation results
ann['area'] = maskUtils.area([ann['segmentation']])[0]
if not 'bbox' in ann:
ann['bbox'] = maskUtils.toBbox([ann['segmentation']])[0]
ann['id'] = id+1
ann['iscrowd'] = 0
elif 'keypoints' in anns[0]:
res.dataset['categories'] = copy.deepcopy(self.dataset['categories'])
for id, ann in enumerate(anns):
s = ann['keypoints']
x = s[0::3]
y = s[1::3]
x0,x1,y0,y1 = np.min(x), np.max(x), np.min(y), np.max(y)
ann['area'] = (x1-x0)*(y1-y0)
ann['id'] = id + 1
ann['bbox'] = [x0,y0,x1-x0,y1-y0]
print('DONE (t={:0.2f}s)'.format(time.time()- tic))
res.dataset['annotations'] = anns
res.createIndex()
return res
def download(self, tarDir = None, imgIds = [] ):
'''
Download COCO images from mscoco.org server.
:param tarDir (str): COCO results directory name
imgIds (list): images to be downloaded
:return:
'''
if tarDir is None:
print('Please specify target directory')
return -1
if len(imgIds) == 0:
imgs = self.imgs.values()
else:
imgs = self.loadImgs(imgIds)
N = len(imgs)
if not os.path.exists(tarDir):
os.makedirs(tarDir)
for i, img in enumerate(imgs):
tic = time.time()
fname = os.path.join(tarDir, img['file_name'])
if not os.path.exists(fname):
urlretrieve(img['coco_url'], fname)
print('downloaded {}/{} images (t={:0.1f}s)'.format(i, N, time.time()- tic))
def loadNumpyAnnotations(self, data):
"""
Convert result data from a numpy array [Nx7] where each row contains {imageID,x1,y1,w,h,score,class}
:param data (numpy.ndarray)
:return: annotations (python nested list)
"""
print('Converting ndarray to lists...')
assert(type(data) == np.ndarray)
print(data.shape)
assert(data.shape[1] == 7)
N = data.shape[0]
ann = []
for i in range(N):
if i % 1000000 == 0:
print('{}/{}'.format(i,N))
ann += [{
'image_id' : int(data[i, 0]),
'bbox' : [ data[i, 1], data[i, 2], data[i, 3], data[i, 4] ],
'score' : data[i, 5],
'category_id': int(data[i, 6]),
}]
return ann
def annToRLE(self, ann):
"""
Convert annotation which can be polygons, uncompressed RLE to RLE.
:return: binary mask (numpy 2D array)
"""
t = self.imgs[ann['image_id']]
h, w = t['height'], t['width']
segm = ann['segmentation']
if type(segm) == list:
# polygon -- a single object might consist of multiple parts
# we merge all parts into one mask rle code
rles = maskUtils.frPyObjects(segm, h, w)
rle = maskUtils.merge(rles)
elif type(segm['counts']) == list:
# uncompressed RLE
rle = maskUtils.frPyObjects(segm, h, w)
else:
# rle
rle = ann['segmentation']
return rle
def annToMask(self, ann):
"""
Convert annotation which can be polygons, uncompressed RLE, or RLE to binary mask.
:return: binary mask (numpy 2D array)
"""
rle = self.annToRLE(ann)
m = maskUtils.decode(rle)
return m
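    # For example (continuing the hypothetical sketch at the top of this
    # file), an annotation in any of the three segmentation formats
    # decodes to a full-size binary mask:
    #
    #   m = coco.annToMask(anns[0])   # uint8 array of shape (height, width)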
__author__ = 'tsungyi'
import numpy as np
import datetime
import time
from collections import defaultdict
from . import mask as maskUtils
import copy
class COCOeval:
# Interface for evaluating detection on the Microsoft COCO dataset.
#
# The usage for CocoEval is as follows:
# cocoGt=..., cocoDt=... # load dataset and results
# E = CocoEval(cocoGt,cocoDt); # initialize CocoEval object
# E.params.recThrs = ...; # set parameters as desired
# E.evaluate(); # run per image evaluation
# E.accumulate(); # accumulate per image results
# E.summarize(); # display summary metrics of results
# For example usage see evalDemo.m and http://mscoco.org/.
#
# The evaluation parameters are as follows (defaults in brackets):
# imgIds - [all] N img ids to use for evaluation
# catIds - [all] K cat ids to use for evaluation
# iouThrs - [.5:.05:.95] T=10 IoU thresholds for evaluation
# recThrs - [0:.01:1] R=101 recall thresholds for evaluation
# areaRng - [...] A=4 object area ranges for evaluation
# maxDets - [1 10 100] M=3 thresholds on max detections per image
# iouType - ['segm'] set iouType to 'segm', 'bbox' or 'keypoints'
# iouType replaced the now DEPRECATED useSegm parameter.
# useCats - [1] if true use category labels for evaluation
# Note: if useCats=0 category labels are ignored as in proposal scoring.
# Note: multiple areaRngs [Ax2] and maxDets [Mx1] can be specified.
#
# evaluate(): evaluates detections on every image and every category and
# concats the results into the "evalImgs" with fields:
# dtIds - [1xD] id for each of the D detections (dt)
# gtIds - [1xG] id for each of the G ground truths (gt)
# dtMatches - [TxD] matching gt id at each IoU or 0
# gtMatches - [TxG] matching dt id at each IoU or 0
# dtScores - [1xD] confidence of each dt
# gtIgnore - [1xG] ignore flag for each gt
# dtIgnore - [TxD] ignore flag for each dt at each IoU
#
# accumulate(): accumulates the per-image, per-category evaluation
# results in "evalImgs" into the dictionary "eval" with fields:
# params - parameters used for evaluation
# date - date evaluation was performed
# counts - [T,R,K,A,M] parameter dimensions (see above)
# precision - [TxRxKxAxM] precision for every evaluation setting
# recall - [TxKxAxM] max recall for every evaluation setting
# Note: precision and recall==-1 for settings with no gt objects.
#
# See also coco, mask, pycocoDemo, pycocoEvalDemo
#
# Microsoft COCO Toolbox. version 2.0
# Data, paper, and tutorials available at: http://mscoco.org/
# Code written by Piotr Dollar and Tsung-Yi Lin, 2015.
# Licensed under the Simplified BSD License [see coco/license.txt]
def __init__(self, cocoGt=None, cocoDt=None, iouType='segm'):
'''
Initialize CocoEval using coco APIs for gt and dt
:param cocoGt: coco object with ground truth annotations
:param cocoDt: coco object with detection results
:return: None
'''
if not iouType:
print('iouType not specified. use default iouType segm')
self.cocoGt = cocoGt # ground truth COCO API
self.cocoDt = cocoDt # detections COCO API
self.params = {} # evaluation parameters
self.evalImgs = defaultdict(list) # per-image per-category evaluation results [KxAxI] elements
self.eval = {} # accumulated evaluation results
self._gts = defaultdict(list) # gt for evaluation
self._dts = defaultdict(list) # dt for evaluation
self.params = Params(iouType=iouType) # parameters
self._paramsEval = {} # parameters for evaluation
self.stats = [] # result summarization
self.ious = {} # ious between all gts and dts
if not cocoGt is None:
self.params.imgIds = sorted(cocoGt.getImgIds())
self.params.catIds = sorted(cocoGt.getCatIds())
def _prepare(self):
'''
Prepare ._gts and ._dts for evaluation based on params
:return: None
'''
def _toMask(anns, coco):
# modify ann['segmentation'] by reference
for ann in anns:
rle = coco.annToRLE(ann)
ann['segmentation'] = rle
p = self.params
if p.useCats:
gts=self.cocoGt.loadAnns(self.cocoGt.getAnnIds(imgIds=p.imgIds, catIds=p.catIds))
dts=self.cocoDt.loadAnns(self.cocoDt.getAnnIds(imgIds=p.imgIds, catIds=p.catIds))
else:
gts=self.cocoGt.loadAnns(self.cocoGt.getAnnIds(imgIds=p.imgIds))
dts=self.cocoDt.loadAnns(self.cocoDt.getAnnIds(imgIds=p.imgIds))
# convert ground truth to mask if iouType == 'segm'
if p.iouType == 'segm':
_toMask(gts, self.cocoGt)
_toMask(dts, self.cocoDt)
# set ignore flag
for gt in gts:
gt['ignore'] = gt['ignore'] if 'ignore' in gt else 0
gt['ignore'] = 'iscrowd' in gt and gt['iscrowd']
if p.iouType == 'keypoints':
gt['ignore'] = (gt['num_keypoints'] == 0) or gt['ignore']
self._gts = defaultdict(list) # gt for evaluation
self._dts = defaultdict(list) # dt for evaluation
for gt in gts:
self._gts[gt['image_id'], gt['category_id']].append(gt)
for dt in dts:
self._dts[dt['image_id'], dt['category_id']].append(dt)
self.evalImgs = defaultdict(list) # per-image per-category evaluation results
self.eval = {} # accumulated evaluation results
def evaluate(self):
'''
Run per image evaluation on given images and store results (a list of dict) in self.evalImgs
:return: None
'''
tic = time.time()
print('Running per image evaluation...')
p = self.params
# add backward compatibility if useSegm is specified in params
if not p.useSegm is None:
p.iouType = 'segm' if p.useSegm == 1 else 'bbox'
print('useSegm (deprecated) is not None. Running {} evaluation'.format(p.iouType))
print('Evaluate annotation type *{}*'.format(p.iouType))
p.imgIds = list(np.unique(p.imgIds))
if p.useCats:
p.catIds = list(np.unique(p.catIds))
p.maxDets = sorted(p.maxDets)
self.params=p
self._prepare()
# loop through images, area range, max detection number
catIds = p.catIds if p.useCats else [-1]
if p.iouType == 'segm' or p.iouType == 'bbox':
computeIoU = self.computeIoU
elif p.iouType == 'keypoints':
computeIoU = self.computeOks
self.ious = {(imgId, catId): computeIoU(imgId, catId) \
for imgId in p.imgIds
for catId in catIds}
evaluateImg = self.evaluateImg
maxDet = p.maxDets[-1]
self.evalImgs = [evaluateImg(imgId, catId, areaRng, maxDet)
for catId in catIds
for areaRng in p.areaRng
for imgId in p.imgIds
]
self._paramsEval = copy.deepcopy(self.params)
toc = time.time()
print('DONE (t={:0.2f}s).'.format(toc-tic))
def computeIoU(self, imgId, catId):
p = self.params
if p.useCats:
gt = self._gts[imgId,catId]
dt = self._dts[imgId,catId]
else:
gt = [_ for cId in p.catIds for _ in self._gts[imgId,cId]]
dt = [_ for cId in p.catIds for _ in self._dts[imgId,cId]]
if len(gt) == 0 and len(dt) ==0:
return []
inds = np.argsort([-d['score'] for d in dt], kind='mergesort')
dt = [dt[i] for i in inds]
if len(dt) > p.maxDets[-1]:
dt=dt[0:p.maxDets[-1]]
if p.iouType == 'segm':
g = [g['segmentation'] for g in gt]
d = [d['segmentation'] for d in dt]
elif p.iouType == 'bbox':
g = [g['bbox'] for g in gt]
d = [d['bbox'] for d in dt]
else:
raise Exception('unknown iouType for iou computation')
# compute iou between each dt and gt region
iscrowd = [int(o['iscrowd']) for o in gt]
ious = maskUtils.iou(d,g,iscrowd)
return ious
def computeOks(self, imgId, catId):
p = self.params
        # dimension here should be Nxm
gts = self._gts[imgId, catId]
dts = self._dts[imgId, catId]
inds = np.argsort([-d['score'] for d in dts], kind='mergesort')
dts = [dts[i] for i in inds]
if len(dts) > p.maxDets[-1]:
dts = dts[0:p.maxDets[-1]]
# if len(gts) == 0 and len(dts) == 0:
if len(gts) == 0 or len(dts) == 0:
return []
ious = np.zeros((len(dts), len(gts)))
sigmas = np.array([.26, .25, .25, .35, .35, .79, .79, .72, .72, .62,.62, 1.07, 1.07, .87, .87, .89, .89])/10.0
vars = (sigmas * 2)**2
k = len(sigmas)
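        # OKS between a detection and a ground truth is
        #   mean_i exp(-d_i^2 / (2 * area * (2 * sigma_i)^2))
        # over the visible keypoints i, where d_i is the distance between
        # the i-th predicted and ground-truth keypoints.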
# compute oks between each detection and ground truth object
for j, gt in enumerate(gts):
            # create bounds for ignore regions (double the gt bbox)
g = np.array(gt['keypoints'])
xg = g[0::3]; yg = g[1::3]; vg = g[2::3]
k1 = np.count_nonzero(vg > 0)
bb = gt['bbox']
x0 = bb[0] - bb[2]; x1 = bb[0] + bb[2] * 2
y0 = bb[1] - bb[3]; y1 = bb[1] + bb[3] * 2
for i, dt in enumerate(dts):
d = np.array(dt['keypoints'])
xd = d[0::3]; yd = d[1::3]
if k1>0:
# measure the per-keypoint distance if keypoints visible
dx = xd - xg
dy = yd - yg
else:
# measure minimum distance to keypoints in (x0,y0) & (x1,y1)
z = np.zeros((k))
dx = np.max((z, x0-xd),axis=0)+np.max((z, xd-x1),axis=0)
dy = np.max((z, y0-yd),axis=0)+np.max((z, yd-y1),axis=0)
e = (dx**2 + dy**2) / vars / (gt['area']+np.spacing(1)) / 2
if k1 > 0:
e=e[vg > 0]
ious[i, j] = np.sum(np.exp(-e)) / e.shape[0]
return ious
def evaluateImg(self, imgId, catId, aRng, maxDet):
'''
perform evaluation for single category and image
:return: dict (single image results)
'''
p = self.params
if p.useCats:
gt = self._gts[imgId,catId]
dt = self._dts[imgId,catId]
else:
gt = [_ for cId in p.catIds for _ in self._gts[imgId,cId]]
dt = [_ for cId in p.catIds for _ in self._dts[imgId,cId]]
if len(gt) == 0 and len(dt) ==0:
return None
for g in gt:
if g['ignore'] or (g['area']<aRng[0] or g['area']>aRng[1]):
g['_ignore'] = 1
else:
g['_ignore'] = 0
# sort dt highest score first, sort gt ignore last
gtind = np.argsort([g['_ignore'] for g in gt], kind='mergesort')
gt = [gt[i] for i in gtind]
dtind = np.argsort([-d['score'] for d in dt], kind='mergesort')
dt = [dt[i] for i in dtind[0:maxDet]]
iscrowd = [int(o['iscrowd']) for o in gt]
# load computed ious
ious = self.ious[imgId, catId][:, gtind] if len(self.ious[imgId, catId]) > 0 else self.ious[imgId, catId]
T = len(p.iouThrs)
G = len(gt)
D = len(dt)
gtm = np.zeros((T,G))
dtm = np.zeros((T,D))
gtIg = np.array([g['_ignore'] for g in gt])
dtIg = np.zeros((T,D))
if not len(ious)==0:
for tind, t in enumerate(p.iouThrs):
for dind, d in enumerate(dt):
# information about best match so far (m=-1 -> unmatched)
iou = min([t,1-1e-10])
m = -1
for gind, g in enumerate(gt):
# if this gt already matched, and not a crowd, continue
if gtm[tind,gind]>0 and not iscrowd[gind]:
continue
# if dt matched to reg gt, and on ignore gt, stop
if m>-1 and gtIg[m]==0 and gtIg[gind]==1:
break
# continue to next gt unless better match made
if ious[dind,gind] < iou:
continue
# if match successful and best so far, store appropriately
iou=ious[dind,gind]
m=gind
# if match made store id of match for both dt and gt
if m ==-1:
continue
dtIg[tind,dind] = gtIg[m]
dtm[tind,dind] = gt[m]['id']
gtm[tind,m] = d['id']
# set unmatched detections outside of area range to ignore
a = np.array([d['area']<aRng[0] or d['area']>aRng[1] for d in dt]).reshape((1, len(dt)))
dtIg = np.logical_or(dtIg, np.logical_and(dtm==0, np.repeat(a,T,0)))
# store results for given image and category
return {
'image_id': imgId,
'category_id': catId,
'aRng': aRng,
'maxDet': maxDet,
'dtIds': [d['id'] for d in dt],
'gtIds': [g['id'] for g in gt],
'dtMatches': dtm,
'gtMatches': gtm,
'dtScores': [d['score'] for d in dt],
'gtIgnore': gtIg,
'dtIgnore': dtIg,
}
def accumulate(self, p = None):
'''
Accumulate per image evaluation results and store the result in self.eval
:param p: input params for evaluation
:return: None
'''
print('Accumulating evaluation results...')
tic = time.time()
if not self.evalImgs:
print('Please run evaluate() first')
# allows input customized parameters
if p is None:
p = self.params
p.catIds = p.catIds if p.useCats == 1 else [-1]
T = len(p.iouThrs)
R = len(p.recThrs)
K = len(p.catIds) if p.useCats else 1
A = len(p.areaRng)
M = len(p.maxDets)
precision = -np.ones((T,R,K,A,M)) # -1 for the precision of absent categories
recall = -np.ones((T,K,A,M))
scores = -np.ones((T,R,K,A,M))
# create dictionary for future indexing
_pe = self._paramsEval
catIds = _pe.catIds if _pe.useCats else [-1]
setK = set(catIds)
setA = set(map(tuple, _pe.areaRng))
setM = set(_pe.maxDets)
setI = set(_pe.imgIds)
# get inds to evaluate
k_list = [n for n, k in enumerate(p.catIds) if k in setK]
m_list = [m for n, m in enumerate(p.maxDets) if m in setM]
a_list = [n for n, a in enumerate(map(lambda x: tuple(x), p.areaRng)) if a in setA]
i_list = [n for n, i in enumerate(p.imgIds) if i in setI]
I0 = len(_pe.imgIds)
A0 = len(_pe.areaRng)
# retrieve E at each category, area range, and max number of detections
for k, k0 in enumerate(k_list):
Nk = k0*A0*I0
for a, a0 in enumerate(a_list):
Na = a0*I0
for m, maxDet in enumerate(m_list):
E = [self.evalImgs[Nk + Na + i] for i in i_list]
E = [e for e in E if not e is None]
if len(E) == 0:
continue
dtScores = np.concatenate([e['dtScores'][0:maxDet] for e in E])
# different sorting method generates slightly different results.
# mergesort is used to be consistent as Matlab implementation.
inds = np.argsort(-dtScores, kind='mergesort')
dtScoresSorted = dtScores[inds]
dtm = np.concatenate([e['dtMatches'][:,0:maxDet] for e in E], axis=1)[:,inds]
dtIg = np.concatenate([e['dtIgnore'][:,0:maxDet] for e in E], axis=1)[:,inds]
gtIg = np.concatenate([e['gtIgnore'] for e in E])
npig = np.count_nonzero(gtIg==0 )
if npig == 0:
continue
tps = np.logical_and( dtm, np.logical_not(dtIg) )
fps = np.logical_and(np.logical_not(dtm), np.logical_not(dtIg) )
                    tp_sum = np.cumsum(tps, axis=1).astype(dtype=np.float64)
                    fp_sum = np.cumsum(fps, axis=1).astype(dtype=np.float64)
for t, (tp, fp) in enumerate(zip(tp_sum, fp_sum)):
tp = np.array(tp)
fp = np.array(fp)
nd = len(tp)
rc = tp / npig
pr = tp / (fp+tp+np.spacing(1))
fn = npig - tp
tn = nd - tp - fp - fn
q = np.zeros((R,))
ss = np.zeros((R,))
if nd:
recall[t,k,a,m] = rc[-1]
else:
recall[t,k,a,m] = 0
# numpy is slow without cython optimization for accessing elements
                        # using python lists gives a significant speed improvement
pr = pr.tolist(); q = q.tolist()
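                        # Make precision non-increasing from right to left,
                        # i.e., replace each value with the max precision at
                        # any higher recall (the interpolated-precision
                        # envelope used by the COCO metric).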
for i in range(nd-1, 0, -1):
if pr[i] > pr[i-1]:
pr[i-1] = pr[i]
inds = np.searchsorted(rc, p.recThrs, side='left')
try:
for ri, pi in enumerate(inds):
q[ri] = pr[pi]
ss[ri] = dtScoresSorted[pi]
except:
pass
precision[t,:,k,a,m] = np.array(q)
scores[t,:,k,a,m] = np.array(ss)
self.eval = {
'params': p,
'counts': [T, R, K, A, M],
'date': datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
'precision': precision,
'recall': recall,
'scores': scores,
}
toc = time.time()
print('DONE (t={:0.2f}s).'.format( toc-tic))
def summarize(self):
'''
Compute and display summary metrics for evaluation results.
        Note this function can *only* be applied on the default parameter setting
'''
def _summarize( ap=1, iouThr=None, areaRng='all', maxDets=100 ):
p = self.params
iStr = ' {:<18} {} @[ IoU={:<9} | area={:>6s} | maxDets={:>3d} ] = {:0.3f}'
titleStr = 'Average Precision' if ap == 1 else 'Average Recall'
typeStr = '(AP)' if ap==1 else '(AR)'
iouStr = '{:0.2f}:{:0.2f}'.format(p.iouThrs[0], p.iouThrs[-1]) \
if iouThr is None else '{:0.2f}'.format(iouThr)
aind = [i for i, aRng in enumerate(p.areaRngLbl) if aRng == areaRng]
mind = [i for i, mDet in enumerate(p.maxDets) if mDet == maxDets]
if ap == 1:
# dimension of precision: [TxRxKxAxM]
s = self.eval['precision']
# IoU
if iouThr is not None:
t = np.where(iouThr == p.iouThrs)[0]
s = s[t]
s = s[:,:,:,aind,mind]
else:
# dimension of recall: [TxKxAxM]
s = self.eval['recall']
if iouThr is not None:
t = np.where(iouThr == p.iouThrs)[0]
s = s[t]
s = s[:,:,aind,mind]
if len(s[s>-1])==0:
mean_s = -1
else:
mean_s = np.mean(s[s>-1])
print(iStr.format(titleStr, typeStr, iouStr, areaRng, maxDets, mean_s))
return mean_s
def _summarizeDets():
stats = np.zeros((12,))
stats[0] = _summarize(1)
stats[1] = _summarize(1, iouThr=.5, maxDets=self.params.maxDets[2])
stats[2] = _summarize(1, iouThr=.75, maxDets=self.params.maxDets[2])
stats[3] = _summarize(1, areaRng='small', maxDets=self.params.maxDets[2])
stats[4] = _summarize(1, areaRng='medium', maxDets=self.params.maxDets[2])
stats[5] = _summarize(1, areaRng='large', maxDets=self.params.maxDets[2])
stats[6] = _summarize(0, maxDets=self.params.maxDets[0])
stats[7] = _summarize(0, maxDets=self.params.maxDets[1])
stats[8] = _summarize(0, maxDets=self.params.maxDets[2])
stats[9] = _summarize(0, areaRng='small', maxDets=self.params.maxDets[2])
stats[10] = _summarize(0, areaRng='medium', maxDets=self.params.maxDets[2])
stats[11] = _summarize(0, areaRng='large', maxDets=self.params.maxDets[2])
return stats
def _summarizeKps():
stats = np.zeros((10,))
stats[0] = _summarize(1, maxDets=20)
stats[1] = _summarize(1, maxDets=20, iouThr=.5)
stats[2] = _summarize(1, maxDets=20, iouThr=.75)
stats[3] = _summarize(1, maxDets=20, areaRng='medium')
stats[4] = _summarize(1, maxDets=20, areaRng='large')
stats[5] = _summarize(0, maxDets=20)
stats[6] = _summarize(0, maxDets=20, iouThr=.5)
stats[7] = _summarize(0, maxDets=20, iouThr=.75)
stats[8] = _summarize(0, maxDets=20, areaRng='medium')
stats[9] = _summarize(0, maxDets=20, areaRng='large')
return stats
if not self.eval:
raise Exception('Please run accumulate() first')
iouType = self.params.iouType
if iouType == 'segm' or iouType == 'bbox':
summarize = _summarizeDets
elif iouType == 'keypoints':
summarize = _summarizeKps
self.stats = summarize()
def prs(self):
def _summarize(iouThr=None, areaRng='all', maxDets=100):
p = self.params
iStr = '[ IoU={:<9} | area={:>6} | maxDets={:>3} ]'
iouStr = '%0.2f:%0.2f' % (p.iouThrs[0], p.iouThrs[-1]) if iouThr is None else '%0.2f' % (iouThr)
areaStr = areaRng
maxDetsStr = '%d' % (maxDets)
aind = [i for i, aRng in enumerate(['all', 'small', 'medium', 'large']) if aRng == areaRng]
mind = [i for i, mDet in enumerate([1, 10, 100]) if mDet == maxDets]
prec = self.eval['precision']
if iouThr is not None:
t = np.where(iouThr == p.iouThrs)[0]
prec = prec[t]
prec = prec[:, :, :, aind, mind]
# [iou, rec, cls, 1] -> [rec]
prec = prec.mean(0).mean(1).flatten()
return iStr.format(iouStr, areaStr, maxDetsStr), prec
if not self.eval:
raise Exception('Please run accumulate() first')
prs = []
prs.append(_summarize()) # 0.5:0.95, all
prs.append(_summarize(iouThr=.5)) # 0.5, all
prs.append(_summarize(iouThr=.75)) # 0.75, all
prs.append(_summarize(areaRng='small')) # 0.5:0.95, small
prs.append(_summarize(iouThr=.5, areaRng='small')) # 0.5, small
prs.append(_summarize(iouThr=.75, areaRng='small')) # 0.75, small
prs.append(_summarize(areaRng='medium')) # 0.5:0.95, medium
prs.append(_summarize(iouThr=.5, areaRng='medium')) # 0.5, medium
prs.append(_summarize(iouThr=.75, areaRng='medium')) # 0.75, medium
prs.append(_summarize(areaRng='large')) # 0.5:0.95, large
prs.append(_summarize(iouThr=.5, areaRng='large')) # 0.5, large
prs.append(_summarize(iouThr=.75, areaRng='large')) # 0.75, large
return dict(prs)
def __str__(self):
self.summarize()
class Params:
'''
Params for coco evaluation api
'''
def setDetParams(self):
self.imgIds = []
self.catIds = []
# np.arange causes trouble. the data point on arange is slightly larger than the true value
self.iouThrs = np.linspace(.5, 0.95, int(np.round((0.95 - .5) / .05)) + 1, endpoint=True)
self.recThrs = np.linspace(.0, 1.00, int(np.round((1.00 - .0) / .01)) + 1, endpoint=True)
self.maxDets = [1, 10, 100]
self.areaRng = [[0 ** 2, 1e5 ** 2], [0 ** 2, 32 ** 2], [32 ** 2, 96 ** 2], [96 ** 2, 1e5 ** 2]]
self.areaRngLbl = ['all', 'small', 'medium', 'large']
self.useCats = 1
def setKpParams(self):
self.imgIds = []
self.catIds = []
# np.arange causes trouble. the data point on arange is slightly larger than the true value
self.iouThrs = np.linspace(.5, 0.95, int(np.round((0.95 - .5) / .05)) + 1, endpoint=True)
self.recThrs = np.linspace(.0, 1.00, int(np.round((1.00 - .0) / .01)) + 1, endpoint=True)
self.maxDets = [20]
self.areaRng = [[0 ** 2, 1e5 ** 2], [32 ** 2, 96 ** 2], [96 ** 2, 1e5 ** 2]]
self.areaRngLbl = ['all', 'medium', 'large']
self.useCats = 1
def __init__(self, iouType='segm'):
if iouType == 'segm' or iouType == 'bbox':
self.setDetParams()
elif iouType == 'keypoints':
self.setKpParams()
else:
raise Exception('iouType not supported')
self.iouType = iouType
# useSegm is deprecated
self.useSegm = None
\ No newline at end of file
__author__ = 'tsungyi'
import seetadet.utils.pycocotools._mask as _mask
# Interface for manipulating masks stored in RLE format.
#
# RLE is a simple yet efficient format for storing binary masks. RLE
# first divides a vector (or vectorized image) into a series of piecewise
# constant regions and then for each piece simply stores the length of
# that piece. For example, given M=[0 0 1 1 1 0 1] the RLE counts would
# be [2 3 1 1], or for M=[1 1 1 1 1 1 0] the counts would be [0 6 1]
# (note that the odd counts are always the numbers of zeros). Instead of
# storing the counts directly, additional compression is achieved with a
# variable bitrate representation based on a common scheme called LEB128.
#
# Compression is greatest given large piecewise constant regions.
# Specifically, the size of the RLE is proportional to the number of
# *boundaries* in M (or for an image the number of boundaries in the y
# direction). Assuming fairly simple shapes, the RLE representation is
# O(sqrt(n)) where n is number of pixels in the object. Hence space usage
# is substantially lower, especially for large simple objects (large n).
#
# Many common operations on masks can be computed directly using the RLE
# (without need for decoding). This includes computations such as area,
# union, intersection, etc. All of these operations are linear in the
# size of the RLE, in other words they are O(sqrt(n)) where n is the area
# of the object. Computing these operations on the original mask is O(n).
# Thus, using the RLE can result in substantial computational savings.
#
# The following API functions are defined:
# encode - Encode binary masks using RLE.
# decode - Decode binary masks encoded via RLE.
# merge - Compute union or intersection of encoded masks.
# iou - Compute intersection over union between masks.
# area - Compute area of encoded masks.
# toBbox - Get bounding boxes surrounding encoded masks.
# frPyObjects - Convert polygon, bbox, and uncompressed RLE to encoded RLE mask.
#
# Usage:
# Rs = encode( masks )
# masks = decode( Rs )
# R = merge( Rs, intersect=false )
# o = iou( dt, gt, iscrowd )
# a = area( Rs )
# bbs = toBbox( Rs )
# Rs = frPyObjects( [pyObjects], h, w )
#
# In the API the following formats are used:
# Rs - [dict] Run-length encoding of binary masks
# R - dict Run-length encoding of binary mask
# masks - [hxwxn] Binary mask(s) (must have type np.ndarray(dtype=uint8) in column-major order)
# iscrowd - [nx1] list of np.ndarray. 1 indicates corresponding gt image has crowd region to ignore
# bbs - [nx4] Bounding box(es) stored as [x y w h]
# poly - Polygon stored as [[x1 y1 x2 y2...],[x1 y1 ...],...] (2D list)
# dt,gt - May be either bounding boxes or encoded masks
# Both poly and bbs are 0-indexed (bbox=[0 0 1 1] encloses first pixel).
#
# Finally, a note about the intersection over union (iou) computation.
# The standard iou of a ground truth (gt) and detected (dt) object is
# iou(gt,dt) = area(intersect(gt,dt)) / area(union(gt,dt))
# For "crowd" regions, we use a modified criteria. If a gt object is
# marked as "iscrowd", we allow a dt to match any subregion of the gt.
# Choosing gt' in the crowd gt that best matches the dt can be done using
# gt'=intersect(dt,gt). Since by definition union(gt',dt)=dt, computing
# iou(gt,dt,iscrowd) = iou(gt',dt) = area(intersect(gt,dt)) / area(dt)
# For crowd gt regions we use this modified criteria above for the iou.
#
# To compile run "python setup.py build_ext --inplace"
# Please do not contact us for help with compiling.
#
# Microsoft COCO Toolbox. version 2.0
# Data, paper, and tutorials available at: http://mscoco.org/
# Code written by Piotr Dollar and Tsung-Yi Lin, 2015.
# Licensed under the Simplified BSD License [see coco/license.txt]
encode = _mask.encode
decode = _mask.decode
iou = _mask.iou
merge = _mask.merge
area = _mask.area
toBbox = _mask.toBbox
frPyObjects = _mask.frPyObjects
\ No newline at end of file
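To make the RLE API above concrete, here is a minimal round trip; a sketch that assumes the compiled `_mask` extension is importable, with masks given as uint8 arrays in column-major order as the format notes require:

```python
import numpy as np
from seetadet.utils.pycocotools import mask as mask_tools

# Two 4x4 binary masks, Fortran (column-major) order as required above.
masks = np.asfortranarray(np.zeros((4, 4, 2), dtype=np.uint8))
masks[1:3, 1:3, 0] = 1  # a 2x2 square
masks[0:2, 0:2, 1] = 1  # an overlapping 2x2 square

rles = mask_tools.encode(masks)            # [dict] one RLE per mask
print(mask_tools.area(rles))               # [4. 4.]
print(mask_tools.toBbox(rles))             # [[1. 1. 2. 2.], [0. 0. 2. 2.]]
print(mask_tools.iou(rles, rles, [0, 0]))  # pairwise IoU, no crowd regions
merged = mask_tools.merge(rles)            # union of both masks
recovered = mask_tools.decode(rles)        # back to a (4, 4, 2) uint8 array
```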
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
from seetadet.utils.pycocotools import mask as mask_tools
from seetadet.utils.pycocotools.mask import frPyObjects
def poly2rle(poly, height, width):
"""Convert polygon(s) into encoded rle.
The polygon(s) may be stored in one of the following formats:
1. Polygon with uncompressed RLE:
{'size': (h, w), 'counts': [1, 2, ...]}
2. Polygons with number of coordinates > 4:
[[x1, y1, x2, y2, x3, y3, ...], [x1, y1, x2, y2, x3, y3, ...]]
3. Polygons with uncompressed RLE:
[{'size': (h, w), 'counts': [1, 2, ...]}]
COCO uses **2** and **1** to annotate instances and crowd objects.
The output rle(s) will be:
{'size': (h, w), 'counts': 'abc...'} or [{'size': (h, w), 'counts': 'abc...'}]
Parameters
----------
poly : Union[List, Dict]
The input polygons.
height : int
The height of image.
width : int
The width of image.
Returns
-------
Union[List, Dict]
The encoded rle or a sequence of rles.
Notes
-----
COCODataset uses **2** and **1** to annotate instances and crowd objects.
"""
return frPyObjects(poly, height, width)
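For instance, a single polygon in format **2** converts to one compressed RLE per polygon; a small illustrative sketch:

```python
# A 4x4 square polygon on an 8x8 canvas (format 2 above; coordinates
# are illustrative).
poly = [[1, 1, 5, 1, 5, 5, 1, 5]]  # [x1, y1, x2, y2, ...]
rles = poly2rle(poly, height=8, width=8)
# -> [{'size': [8, 8], 'counts': b'...'}]
```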
def poly2bytes(poly, height, width):
"""Convert polygon(s) into encoded mask bytes.
The polygon(s) may be stored in one of the following formats:
1. Polygon with uncompressed RLE:
{'size': (h, w), 'counts': [1, 2, ...]}
2. Polygons with number of coordinates > 4:
[[x1, y1, x2, y2, x3, y3, ...], [x1, y1, x2, y2, x3, y3, ...]]
3. Polygons with uncompressed RLE:
[{'size': (h, w), 'counts': [1, 2, ...]}]
If the number of polygons >= 2, we will merge them into a single mask.
Parameters
----------
poly : Union[List, Dict]
The input polygons.
height : int
The height of image.
width : int
The width of image.
Returns
-------
bytes
The mask bytes.
Notes
-----
COCODataset uses **2** and **1** to annotate instances and crowd objects.
"""
rle_objects = poly2rle(poly, height, width)
if isinstance(rle_objects, list):
if len(rle_objects) == 1:
return rle_objects[0]['counts']
rle_objects = mask_tools.merge(rle_objects)
return rle_objects['counts']
def bytes2img(data, height, width):
"""Decode the RLE mask bytes to a 2d image.
Parameters
----------
data : bytes
The encoded bytes.
height : int
The height of image.
width : int
The width of image.
Returns
-------
numpy.ndarray
The mask image.
"""
rle_objects = [{'counts': data, 'size': [height, width]}]
mask_image = mask_tools.decode(rle_objects)
if mask_image.shape[2] != 1:
raise ValueError(
'{} instances are found in data.\n'
'Merge them before compressing.'
.format(mask_image.shape[2]))
return mask_image[:, :, 0]
def img2bytes(data):
"""Compress a 2d mask image to RLE bytes.
Parameters
----------
data : numpy.ndarray
The image to compress.
Returns
-------
bytes
The encoded bytes.
"""
if len(data.shape) == 3:
raise ValueError(
'{} instances are found in data.\n'
'Merge them before compressing.'
.format(data.shape[2])
)
elif len(data.shape) != 2:
raise ValueError('Excepted a 2d mask.')
rle_objects = mask_tools.encode(
np.array(np.stack([data], 2), order='F'))
return rle_objects[0]['counts']
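The three helpers above compose into a simple round trip; a minimal sketch with an illustrative polygon and canvas size:

```python
# Polygon -> RLE bytes -> 2d mask image -> RLE bytes again.
data = poly2bytes([[1, 1, 5, 1, 5, 5, 1, 5]], height=8, width=8)
mask_img = bytes2img(data, height=8, width=8)  # (8, 8) uint8 array
assert img2bytes(mask_img) == data             # RLE encoding is canonical
```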
...@@ -8,10 +8,11 @@ ...@@ -8,10 +8,11 @@
# <https://opensource.org/licenses/BSD-2-Clause> # <https://opensource.org/licenses/BSD-2-Clause>
# #
# ------------------------------------------------------------ # ------------------------------------------------------------
"""Visualization utilities."""
from __future__ import absolute_import from __future__ import absolute_import
from __future__ import division from __future__ import division
from __future__ import print_function from __future__ import print_function
from seetadet.algo.retinanet.anchor_target import AnchorTarget from seetadet.utils.vis.colormap import colormap
from seetadet.algo.retinanet.data_loader import DataLoader from seetadet.utils.vis.visualizer import vis_one_image
...@@ -17,7 +17,6 @@ ...@@ -17,7 +17,6 @@
# <https://github.com/facebookresearch/Detectron/blob/master/detectron/utils/colormap.py> # <https://github.com/facebookresearch/Detectron/blob/master/detectron/utils/colormap.py>
# #
############################################################################## ##############################################################################
"""An awesome colormap for really neat visualizations.""" """An awesome colormap for really neat visualizations."""
from __future__ import absolute_import from __future__ import absolute_import
...@@ -29,8 +28,7 @@ import numpy as np ...@@ -29,8 +28,7 @@ import numpy as np
def colormap(rgb=False): def colormap(rgb=False):
color_list = np.array( color_list = np.array([
[
0.000, 0.447, 0.741, 0.000, 0.447, 0.741,
0.850, 0.325, 0.098, 0.850, 0.325, 0.098,
0.929, 0.694, 0.125, 0.929, 0.694, 0.125,
...@@ -109,9 +107,7 @@ def colormap(rgb=False): ...@@ -109,9 +107,7 @@ def colormap(rgb=False):
0.571, 0.571, 0.571, 0.571, 0.571, 0.571,
0.714, 0.714, 0.714, 0.714, 0.714, 0.714,
0.857, 0.857, 0.857, 0.857, 0.857, 0.857,
1.000, 1.000, 1.000 1.000, 1.000, 1.000]).astype(np.float32)
]
).astype(np.float32)
color_list = color_list.reshape((-1, 3)) * 255 color_list = color_list.reshape((-1, 3)) * 255
if not rgb: if not rgb:
color_list = color_list[:, ::-1] color_list = color_list[:, ::-1]
......
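As a usage note for the refactored function, a short sketch:

```python
# colormap() returns an (N, 3) float32 array of 0-255 colors, BGR by
# default; pass rgb=True for RGB order (e.g. for matplotlib).
colors = colormap(rgb=True)
first = colors[0] / 255  # -> [0.000, 0.447, 0.741], normalized
```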
...@@ -29,18 +29,12 @@ import matplotlib.pyplot as plt ...@@ -29,18 +29,12 @@ import matplotlib.pyplot as plt
from matplotlib.patches import Polygon from matplotlib.patches import Polygon
import numpy as np import numpy as np
from seetadet.utils.colormap import colormap from seetadet.utils.mask import paste_masks
from seetadet.utils.boxes import expand_boxes from seetadet.utils.vis.colormap import colormap
plt.rcParams['pdf.fonttype'] = 42 # For editing in Adobe Illustrator plt.rcParams['pdf.fonttype'] = 42 # For editing in Adobe Illustrator
_GRAY = (218, 227, 218)
_GREEN = (18, 127, 15)
_WHITE = (255, 255, 255)
def kp_connections(keypoints): def kp_connections(keypoints):
kp_lines = [ kp_lines = [
[keypoints.index('left_eye'), keypoints.index('right_eye')], [keypoints.index('left_eye'), keypoints.index('right_eye')],
...@@ -72,7 +66,6 @@ def convert_from_cls_format(cls_boxes, cls_segms, cls_keyps, class_names): ...@@ -72,7 +66,6 @@ def convert_from_cls_format(cls_boxes, cls_segms, cls_keyps, class_names):
box_list.append(cls_boxes[j]) box_list.append(cls_boxes[j])
if cls_segms is not None: if cls_segms is not None:
segm_list.append(cls_segms[j]) segm_list.append(cls_segms[j])
if len(box_list) > 0: if len(box_list) > 0:
boxes = np.concatenate(box_list) boxes = np.concatenate(box_list)
else: else:
...@@ -85,7 +78,6 @@ def convert_from_cls_format(cls_boxes, cls_segms, cls_keyps, class_names): ...@@ -85,7 +78,6 @@ def convert_from_cls_format(cls_boxes, cls_segms, cls_keyps, class_names):
keyps = [k for klist in cls_keyps for k in klist] keyps = [k for klist in cls_keyps for k in klist]
else: else:
keyps = None keyps = None
classes = [] classes = []
for j in range(len(cls_boxes)): for j in range(len(cls_boxes)):
classes += [j] * len(cls_boxes[j]) classes += [j] * len(cls_boxes[j])
...@@ -111,7 +103,7 @@ def get_bbox_contours(rotated_box): ...@@ -111,7 +103,7 @@ def get_bbox_contours(rotated_box):
r21 = point_rotate((x2, y2), (cx, cy), radian) r21 = point_rotate((x2, y2), (cx, cy), radian)
r22 = point_rotate((x2, y1), (cx, cy), radian) r22 = point_rotate((x2, y1), (cx, cy), radian)
quad = np.array([r11, r12, r21, r22, r11]) quad = np.array([r11, r12, r21, r22, r11])
# Main direction # Main direction.
mside = max(w, h) / 2 mside = max(w, h) / 2
x_end = mside * np.cos(radian) x_end = mside * np.cos(radian)
y_end = mside * np.sin(radian) y_end = mside * np.sin(radian)
...@@ -119,34 +111,8 @@ def get_bbox_contours(rotated_box): ...@@ -119,34 +111,8 @@ def get_bbox_contours(rotated_box):
return quad, main_direction return quad, main_direction
def get_mask(boxes, segms, im_shape, mask_thresh=0.5):
i, masks = 0, np.zeros(list(im_shape) + [len(boxes)], 'uint8')
for det, msk in zip(boxes, segms):
M = msk.shape[0]
scale = (M + 2.) / M
ref_box = expand_boxes(np.array([det[:4]]), scale)[0]
ref_box = ref_box.astype(np.int32)
padded_mask = np.zeros((M + 2, M + 2), 'float32')
padded_mask[1:-1, 1:-1] = msk[:, :]
w = ref_box[2] - ref_box[0] + 1
h = ref_box[3] - ref_box[1] + 1
w = np.maximum(w, 1)
h = np.maximum(h, 1)
mask = cv2.resize(padded_mask, (w, h))
mask = np.array(mask > mask_thresh, 'uint8')
x1 = max(ref_box[0], 0)
y1 = max(ref_box[1], 0)
x2 = min(ref_box[2] + 1, im_shape[1])
y2 = min(ref_box[3] + 1, im_shape[0])
masks[y1: y2, x1: x2, i] = mask[
(y1 - ref_box[1]): (y2 - ref_box[1]),
(x1 - ref_box[0]): (x2 - ref_box[0])]
i += 1
return masks
def vis_one_image( def vis_one_image(
im, img,
class_names, class_names,
boxes, boxes,
segms=None, segms=None,
...@@ -154,7 +120,7 @@ def vis_one_image( ...@@ -154,7 +120,7 @@ def vis_one_image(
thresh=0.9, thresh=0.9,
kp_thresh=2, kp_thresh=2,
dpi=100, dpi=100,
box_alpha=0., box_alpha=1.,
show_class=True, show_class=True,
show_rotated=False, show_rotated=False,
filename=None, filename=None,
...@@ -162,27 +128,22 @@ def vis_one_image( ...@@ -162,27 +128,22 @@ def vis_one_image(
"""Visual debugging of detections.""" """Visual debugging of detections."""
boxes, segms, keypoints, classes = \ boxes, segms, keypoints, classes = \
convert_from_cls_format(boxes, segms, keypoints, class_names) convert_from_cls_format(boxes, segms, keypoints, class_names)
if boxes is None or boxes.shape[0] == 0 or max(boxes[:, -1]) < thresh:
if boxes is None \
or boxes.shape[0] == 0 or \
max(boxes[:, -1]) < thresh:
return return
img, masks = img[:, :, ::-1], None
im, masks = im[:, :, ::-1], None
if segms is not None and len(segms) > 0: if segms is not None and len(segms) > 0:
masks = get_mask(boxes, segms, im.shape[:2]) masks = paste_masks(segms, boxes, img.shape[0], img.shape[1],
thresh=0.5, data_order='C')
color_list = colormap(rgb=True) / 255 color_list = colormap(rgb=True) / 255
fig = plt.figure(frameon=False) fig = plt.figure(frameon=False)
fig.set_size_inches(im.shape[1] / dpi, im.shape[0] / dpi) fig.set_size_inches(img.shape[1] / dpi, img.shape[0] / dpi)
ax = plt.Axes(fig, [0., 0., 1., 1.]) ax = plt.Axes(fig, [0., 0., 1., 1.])
ax.axis('off') ax.axis('off')
fig.add_axes(ax) fig.add_axes(ax)
ax.imshow(im) ax.imshow(img)
# Display in largest to smallest order to reduce occlusion # Display in largest to smallest order to reduce occlusion.
if boxes.shape[1] == 5: if boxes.shape[1] == 5:
areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1]) areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
elif boxes.shape[1] == 6: elif boxes.shape[1] == 6:
...@@ -193,83 +154,47 @@ def vis_one_image( ...@@ -193,83 +154,47 @@ def vis_one_image(
mask_color_id = 0 mask_color_id = 0
for i in sorted_inds: for i in sorted_inds:
bbox = boxes[i, :-1] bbox, score = boxes[i, :-1], boxes[i, -1]
score = boxes[i, -1]
if score < thresh: if score < thresh:
continue continue
# Draw box.
# Show box
if bbox.size == 4 and not show_rotated: if bbox.size == 4 and not show_rotated:
ax.add_patch( ax.add_patch(plt.Rectangle(
plt.Rectangle( (bbox[0], bbox[1]), bbox[2] - bbox[0], bbox[3] - bbox[1],
(bbox[0], bbox[1]), fill=False, edgecolor='g', linewidth=1., alpha=box_alpha))
bbox[2] - bbox[0], # Draw class.
bbox[3] - bbox[1],
fill=False,
edgecolor='g',
linewidth=1.,
alpha=box_alpha,
)
)
# Show class
if show_class: if show_class:
ax.text( ax.text(bbox[0], bbox[1] - 2,
bbox[0], bbox[1] - 2,
get_class_string(class_names[classes[i]], score), get_class_string(class_names[classes[i]], score),
fontsize=11, fontsize=11, family='serif', color='white',
family='serif', bbox=dict(facecolor='g', alpha=0.4, pad=0, edgecolor='none'))
bbox=dict(facecolor='g', alpha=0.4, pad=0, edgecolor='none'), # Draw mask.
color='white',
)
# Show mask
if segms is not None and len(segms) > i: if segms is not None and len(segms) > i:
img = np.ones(im.shape) color_img = np.ones(img.shape)
color_mask = color_list[mask_color_id % len(color_list), 0:3] color_mask = color_list[mask_color_id % len(color_list), 0:3]
mask_color_id += 1 mask_color_id += 1
w_ratio = .4 w_ratio = .4
for c in range(3): for c in range(3):
color_mask[c] = color_mask[c] * (1 - w_ratio) + w_ratio color_mask[c] = color_mask[c] * (1 - w_ratio) + w_ratio
for c in range(3): for c in range(3):
img[:, :, c] = color_mask[c] color_img[:, :, c] = color_mask[c]
e = masks[:, :, i] e = masks[:, :, i]
results = cv2.findContours(e.copy(), cv2.RETR_CCOMP, cv2.CHAIN_APPROX_NONE)
results = cv2.findContours(
e.copy(),
cv2.RETR_CCOMP,
cv2.CHAIN_APPROX_NONE,
)
contours = results[0] if len(results) == 2 else results[1] contours = results[0] if len(results) == 2 else results[1]
if show_rotated and len(contours) > 1: if show_rotated and len(contours) > 1:
contours = [max(contours, key=cv2.contourArea)] contours = [max(contours, key=cv2.contourArea)]
for c in contours: for c in contours:
if show_rotated: if show_rotated:
rect = cv2.minAreaRect(c) rect = cv2.minAreaRect(c)
ax.add_patch( ax.add_patch(Polygon(cv2.boxPoints(rect), fill=False,
Polygon( edgecolor='g', linewidth=1., alpha=box_alpha))
cv2.boxPoints(rect), ax.add_patch(Polygon(c.reshape((-1, 2)),
fill=False, fill=True, facecolor=color_mask, edgecolor='w',
edgecolor='g', linewidth=1.2, alpha=0.5))
linewidth=1., # Save or show.
alpha=box_alpha,
)
)
ax.add_patch(Polygon(
c.reshape((-1, 2)),
fill=True,
facecolor=color_mask,
edgecolor='w',
linewidth=1.2,
alpha=0.5,
))
if filename is not None: if filename is not None:
fig.savefig(filename, dpi=dpi) fig.savefig(filename, dpi=dpi)
plt.close('all') plt.close('all')
else: else:
plt.imshow(im) plt.imshow(img)
plt.show() plt.show()
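Tying the refactor together, a hedged sketch of a single-image call; the per-class box layout mirrors what `convert_from_cls_format` consumes, and the import path assumes the new `seetadet.utils.vis` package exports shown earlier in this diff:

```python
import cv2
import numpy as np
from seetadet.utils.vis import vis_one_image  # assumed package export

img = cv2.imread('demo.jpg')  # BGR; vis_one_image flips to RGB internally
class_names = ['__background__', 'person']
cls_boxes = [np.zeros((0, 5), 'float32'),  # background: no detections
             np.array([[40., 40., 200., 300., 0.97]], 'float32')]
vis_one_image(img, class_names, cls_boxes, thresh=0.9,
              filename='demo_vis.png')
```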
...@@ -15,94 +15,112 @@ from __future__ import print_function ...@@ -15,94 +15,112 @@ from __future__ import print_function
import os import os
import shutil import shutil
import setuptools
import setuptools.command.install
import sys
import subprocess import subprocess
import sys
import setuptools
import setuptools.command.build_py
import setuptools.command.install
# Read the current version info version = git_version = None
with open('version.txt', 'r') as f: if os.path.exists('version.txt'):
with open('version.txt', 'r') as f:
version = f.read().strip() version = f.read().strip()
try: if os.path.exists('.git'):
try:
git_version = subprocess.check_output( git_version = subprocess.check_output(
['git', 'rev-parse', 'HEAD'], cwd='./').decode('ascii').strip() ['git', 'rev-parse', 'HEAD'], cwd='./')
except (OSError, subprocess.CalledProcessError): git_version = git_version.decode('ascii').strip()
git_version = None except (OSError, subprocess.CalledProcessError):
pass
def clean():
"""Remove the work directories."""
if os.path.exists('build'):
shutil.rmtree('build')
if os.path.exists('seeta_det.egg-info'):
shutil.rmtree('seeta_det.egg-info')
def configure(): def build_extensions(parallel=4):
"""Prepare the package files.""" """Prepare the package files."""
# Compile cxx sources # Compile cxx sources.
py_exec = sys.executable py_exec = sys.executable
if subprocess.call( if subprocess.call(
'cd csrc/cxx && ' 'cd csrc/cxx && '
'{} setup.py build_ext -b ../ --no-python-abi-suffix=0 -j 4 &&' '{} setup.py build_ext -b ../../ -f --no-python-abi-suffix=0 -j {} &&'
'{} setup.py clean'.format(py_exec, py_exec), shell=True '{} setup.py clean'.format(py_exec, parallel, py_exec), shell=True,
) > 0: ) > 0:
raise RuntimeError('Failed to build the cxx sources.') raise RuntimeError('Failed to build the cxx sources.')
# Compile pyx sources # Compile pyx sources.
if subprocess.call( if subprocess.call(
'cd csrc/pyx && ' 'cd csrc/pyx && '
'{} setup.py build_ext -b ../ --cython-c-in-temp -j 4 &&' '{} setup.py build_ext -b ../../ -f --cython-c-in-temp -j {} &&'
'{} setup.py clean'.format(py_exec, py_exec), shell=True, '{} setup.py clean'.format(py_exec, parallel, py_exec), shell=True,
) > 0: ) > 0:
raise RuntimeError('Failed to build the pyx sources.') raise RuntimeError('Failed to build the pyx sources.')
# Copy the pre-built libraries
for root, _, files in os.walk('csrc/install'):
root = root[len('csrc/install/'):]
for file in files:
src = os.path.join(root, file)
dest = src.replace('lib', 'seetadet')
if os.path.exists(dest):
os.remove(dest)
shutil.copy(os.path.join('csrc/install', src), dest)
shutil.rmtree('csrc/install')
# Write the version file.
with open('seetadet/version.py', 'w') as f:
f.write("from __future__ import absolute_import\n"
"from __future__ import division\n"
"from __future__ import print_function\n\n"
"version = '{}'\n"
"git_version = '{}'\n".format(version, git_version))
class install(setuptools.command.install.install): def clean_builds():
"""Old-style command to prevent from installing egg.""" for path in ['build', 'seeta_det.egg-info']:
if os.path.exists(path):
shutil.rmtree(path)
def run(self):
setuptools.command.install.install.run(self)
def find_packages(top):
def find_packages():
"""Return the python sources installed to package.""" """Return the python sources installed to package."""
packages = [] packages = []
for root, _, files in os.walk('seetadet'): for root, _, _ in os.walk(top):
if os.path.exists(os.path.join(root, '__init__.py')): if os.path.exists(os.path.join(root, '__init__.py')):
packages.append(root) packages.append(root)
return packages return packages
def find_package_data(): def find_package_data(top):
"""Return the external data installed to package.""" """Return the external data installed to package."""
libraries = [] headers, libraries = [], []
for root, _, files in os.walk('seetadet'): if sys.platform == 'win32':
root = root[len('seetadet/'):] dylib_suffix = '.pyd'
elif sys.platform == 'darwin':
dylib_suffix = '.dylib'
else:
dylib_suffix = '.so'
for root, _, files in os.walk(top):
root = root[len(top + '/'):]
for file in files: for file in files:
if file.endswith('.so') or file.endswith('.pyd'): if file.endswith(dylib_suffix):
libraries.append(os.path.join(root, file)) libraries.append(os.path.join(root, file))
return libraries return headers + libraries
class BuildPyCommand(setuptools.command.build_py.build_py):
"""Enhanced 'build_py' command."""
def build_packages(self):
clean_builds()
with open('seetadet/version.py', 'w') as f:
f.write("from __future__ import absolute_import\n"
"from __future__ import division\n"
"from __future__ import print_function\n\n"
"version = '{}'\n"
"git_version = '{}'\n".format(version, git_version))
super(BuildPyCommand, self).build_packages()
def build_package_data(self):
parallel = 4
for k in ('build', 'install'):
v = self.get_finalized_command(k).parallel
parallel = max(parallel, (int(v) if v else v) or 1)
build_extensions(parallel=parallel)
self.package_data = {'seetadet': find_package_data('seetadet')}
super(BuildPyCommand, self).build_package_data()
class InstallCommand(setuptools.command.install.install):
"""Enhanced 'install' command."""
user_options = setuptools.command.install.install.user_options
user_options += [('parallel=', 'j', "number of parallel build jobs")]
def initialize_options(self):
self.parallel = None
super(InstallCommand, self).initialize_options()
self.old_and_unmanageable = True
configure()
setuptools.setup( setuptools.setup(
name='seeta-det', name='seeta-det',
version=version, version=version,
...@@ -110,10 +128,10 @@ setuptools.setup( ...@@ -110,10 +128,10 @@ setuptools.setup(
url='https://gitlab.seetatech.com/seetaresearch/seetadet', url='https://gitlab.seetatech.com/seetaresearch/seetadet',
author='SeetaTech', author='SeetaTech',
license='BSD 2-Clause', license='BSD 2-Clause',
packages=find_packages(), packages=find_packages('seetadet'),
package_data={'seetadet': find_package_data()},
package_dir={'seetadet': 'seetadet'}, package_dir={'seetadet': 'seetadet'},
cmdclass={'install': install}, cmdclass={'build_py': BuildPyCommand, 'install': InstallCommand},
install_requires=['opencv-python', 'pillow', 'pycocotools', 'prettytable'],
classifiers=[ classifiers=[
'Development Status :: 5 - Production/Stable', 'Development Status :: 5 - Production/Stable',
'Intended Audience :: Developers', 'Intended Audience :: Developers',
...@@ -125,7 +143,6 @@ setuptools.setup( ...@@ -125,7 +143,6 @@ setuptools.setup(
'Programming Language :: Python :: 3 :: Only', 'Programming Language :: Python :: 3 :: Only',
'Topic :: Scientific/Engineering', 'Topic :: Scientific/Engineering',
'Topic :: Scientific/Engineering :: Mathematics', 'Topic :: Scientific/Engineering :: Mathematics',
'Topic :: Scientific/Engineering :: Artificial Intelligence', 'Topic :: Scientific/Engineering :: Artificial Intelligence'],
],
) )
clean() clean_builds()
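The `-j/--parallel` option registered on the install command above is picked up by `build_package_data` and forwarded to `build_extensions`; an invocation sketch:

```python
import subprocess
# Equivalent to `python setup.py install -j 8` from the repo root; the
# job count reaches build_extensions(parallel=8) via BuildPyCommand.
subprocess.check_call(['python', 'setup.py', 'install', '-j', '8'])
```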
...@@ -8,7 +8,7 @@ ...@@ -8,7 +8,7 @@
# <https://opensource.org/licenses/BSD-2-Clause> # <https://opensource.org/licenses/BSD-2-Clause>
# #
# ------------------------------------------------------------ # ------------------------------------------------------------
"""Train a detection network with mpi utilities.""" """Train a detection network."""
from __future__ import absolute_import from __future__ import absolute_import
from __future__ import division from __future__ import division
...@@ -22,15 +22,15 @@ import numpy ...@@ -22,15 +22,15 @@ import numpy
from seetadet.core.config import cfg from seetadet.core.config import cfg
from seetadet.core.coordinator import Coordinator from seetadet.core.coordinator import Coordinator
from seetadet.core.train import train_net from seetadet.core.training import train_engine
from seetadet.datasets.factory import get_dataset from seetadet.data.build import build_dataset
from seetadet.utils import logger from seetadet.utils import logging
def parse_args(): def parse_args():
"""Parse arguments.""" """Parse arguments."""
parser = argparse.ArgumentParser( parser = argparse.ArgumentParser(
description='Train a detection network with mpi utilities') description='Train a detection network')
parser.add_argument( parser.add_argument(
'--cfg', '--cfg',
dest='cfg_file', dest='cfg_file',
...@@ -50,11 +50,11 @@ if __name__ == '__main__': ...@@ -50,11 +50,11 @@ if __name__ == '__main__':
args = parse_args() args = parse_args()
coordinator = Coordinator(args.cfg_file, exp_dir=args.exp_dir) coordinator = Coordinator(args.cfg_file, exp_dir=args.exp_dir)
checkpoint, start_iter = coordinator.checkpoint() checkpoint, start_iter = coordinator.get_checkpoint()
if checkpoint is not None: if checkpoint is not None:
cfg.TRAIN.WEIGHTS = checkpoint cfg.TRAIN.WEIGHTS = checkpoint
# Setup the distributed environment # Setup the distributed environment.
world_rank = dragon.distributed.get_rank() world_rank = dragon.distributed.get_rank()
world_size = dragon.distributed.get_world_size() world_size = dragon.distributed.get_world_size()
if cfg.NUM_GPUS != world_size: if cfg.NUM_GPUS != world_size:
...@@ -62,26 +62,25 @@ if __name__ == '__main__': ...@@ -62,26 +62,25 @@ if __name__ == '__main__':
'Expected starting of {} processes, got {}.' 'Expected starting of {} processes, got {}.'
.format(cfg.NUM_GPUS, world_size)) .format(cfg.NUM_GPUS, world_size))
# Setup the logging modules # Setup the logging modules.
logger.set_root_logger(world_rank == 0) logging.set_root_logger(world_rank == 0)
# Select the GPU depending on the rank of process # Select the GPU depending on the rank of process.
cfg.GPU_ID = [i for i in range(cfg.NUM_GPUS)][world_rank] cfg.GPU_ID = [i for i in range(cfg.NUM_GPUS)][world_rank]
# Fix the random seed for reproducibility # Fix the random seed for reproducibility.
numpy.random.seed(cfg.RNG_SEED) numpy.random.seed(cfg.RNG_SEED + world_rank)
dragon.random.set_seed(cfg.RNG_SEED) dragon.random.set_seed(cfg.RNG_SEED)
# Inspect the dataset # Inspect the dataset.
dataset = get_dataset(cfg.TRAIN.DATASET) dataset = build_dataset(cfg.TRAIN.DATASET)
logger.info('Dataset({}): {} images will be used to train.' logging.info('Dataset({}): {} images will be used to train.'
.format(cfg.TRAIN.DATASET, dataset.num_images)) .format(cfg.TRAIN.DATASET, dataset.num_images))
# Ready to train the network # Run training.
logger.info('Output will be saved to `{:s}`' logging.info('Checkpoints will be saved to `{:s}`'
.format(coordinator.checkpoints_dir())) .format(coordinator.path_at('checkpoints')))
with dragon.distributed.new_group( with dragon.distributed.new_group(
ranks=[i for i in range(cfg.NUM_GPUS)], ranks=[i for i in range(cfg.NUM_GPUS)],
backend='NCCL' if cfg.USE_NCCL else 'MPI',
verbose=True).as_default(): verbose=True).as_default():
train_net(coordinator, start_iter) train_engine.run_train(coordinator, start_iter)
...@@ -14,17 +14,18 @@ from __future__ import absolute_import ...@@ -14,17 +14,18 @@ from __future__ import absolute_import
from __future__ import division from __future__ import division
from __future__ import print_function from __future__ import print_function
import argparse
import os
import sys import sys
import argparse
import dragon.vm.torch as torch import dragon.vm.torch as torch
import pprint import numpy as np
from seetadet import onnx as _
from seetadet.core.config import cfg from seetadet.core.config import cfg
from seetadet.core.coordinator import Coordinator from seetadet.core.coordinator import Coordinator
from seetadet.modeling.detector import new_detector from seetadet.models.build import build_detector
from seetadet.utils import logger from seetadet.ops import onnx as _ # noqa
from seetadet.utils import logging
def parse_args(): def parse_args():
...@@ -41,16 +42,25 @@ def parse_args(): ...@@ -41,16 +42,25 @@ def parse_args():
default='', default='',
help='experiment dir') help='experiment dir')
parser.add_argument( parser.add_argument(
'--model_dir',
default='',
help='model dir')
parser.add_argument(
'--gpu',
type=int,
default=0,
help='index of GPU to use')
parser.add_argument(
'--iter', '--iter',
type=int, type=int,
default=None, default=None,
help='iteration step of exporting checkpoint') help='checkpoint of given step')
parser.add_argument( parser.add_argument(
'--input_shape', '--input_shape',
nargs='+', nargs='+',
type=int, type=int,
default=(1, 224, 224, 3), default=(1, 512, 512, 3),
help='spec of input shape') help='input image shape')
parser.add_argument( parser.add_argument(
'--opset', '--opset',
type=int, type=int,
...@@ -67,33 +77,50 @@ def parse_args(): ...@@ -67,33 +77,50 @@ def parse_args():
return parser.parse_args() return parser.parse_args()
if __name__ == '__main__': def find_weights(args, coordinator):
args = parse_args() """Return the weights for exporting."""
logger.info('Called with args:\n' + str(args)) weights_list = []
if args.model_dir:
for file in os.listdir(args.model_dir):
if not file.endswith('.pkl'):
continue
weights_list.append(os.path.join(args.model_dir, file))
if args.iter is not None:
checkpoint, _ = coordinator.get_checkpoint(args.iter, wait=True)
weights_list.append(checkpoint)
return weights_list
coordinator = Coordinator(args.cfg_file, exp_dir=args.exp_dir) def get_dummy_inputs(args):
logger.info('Using config:\n' + pprint.pformat(cfg)) n, h, w, c = args.input_shape
im_batch = torch.zeros(n, h, w, c, dtype='uint8')
im_info = torch.tensor([[h, w, 1., 1.] for _ in range(n)], dtype='float32')
strides = [2 ** x for x in range(cfg.FPN.MIN_LEVEL, cfg.FPN.MAX_LEVEL + 1)]
strides = np.array(strides)[:, None]
grid_shapes = np.stack([[h, w]] * len(strides))
grid_shapes = (grid_shapes - 1) // strides + 1
grid_info = torch.tensor(grid_shapes, dtype='int64')
return {'img': im_batch, 'im_info': im_info, 'grid_info': grid_info}
# Load the checkpoint and test engine
checkpoint, _ = coordinator.checkpoint(args.iter)
if checkpoint is None:
raise RuntimeError(
'The checkpoint of step {} does not exist.'
.format(args.iter))
# Ready to export the network if __name__ == '__main__':
logger.info('Exporting model will be saved to `{:s}`' args = parse_args()
.format(coordinator.exports_dir())) logging.info('Called with args:\n' + str(args))
detector = new_detector(cfg.GPU_ID, checkpoint)
data = torch.zeros(*args.input_shape, dtype='uint8') coordinator = Coordinator(args.cfg_file, args.exp_dir or args.model_dir)
ims_info = torch.zeros(args.input_shape[0], 3, dtype='float32') logging.info('Using config:\n' + str(cfg))
# Run exporting.
weights = find_weights(args, coordinator)[0]
weights_name = os.path.splitext(os.path.basename(weights))[0]
output_dir = args.model_dir or coordinator.path_at('exports')
logging.info('Exports will be saved to ' + output_dir)
detector = build_detector(args.gpu, weights)
inputs = get_dummay_inputs(args)
torch.onnx.export( torch.onnx.export(
model=detector, model=detector,
args={'data': data, 'ims_info': ims_info}, args=inputs,
f=checkpoint.replace('checkpoints', 'exports') f=os.path.join(output_dir, weights_name + '.onnx'),
.replace('pkl', 'onnx'),
verbose=True, verbose=True,
opset_version=args.opset, opset_version=args.opset,
enable_onnx_checker=args.check_model, enable_onnx_checker=args.check_model,
......
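After export, the graph can be sanity-checked outside the repo; a sketch using onnxruntime (not a repository dependency; the file name is illustrative):

```python
import onnxruntime as ort

sess = ort.InferenceSession('model_final.onnx',
                            providers=['CPUExecutionProvider'])
# Expect the dummy-input names used above: 'img', 'im_info', 'grid_info'.
print([i.name for i in sess.get_inputs()])
```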
...@@ -8,35 +8,33 @@ ...@@ -8,35 +8,33 @@
# <https://opensource.org/licenses/BSD-2-Clause> # <https://opensource.org/licenses/BSD-2-Clause>
# #
# ------------------------------------------------------------ # ------------------------------------------------------------
"""Deploy a detection network for serving.""" """Serve a detection network."""
from __future__ import absolute_import from __future__ import absolute_import
from __future__ import division from __future__ import division
from __future__ import print_function from __future__ import print_function
import base64 import argparse
import importlib import collections
import os import os
import threading import multiprocessing as mp
import time
import argparse
import cv2
import dragon
import flask import flask
import kpl_helper
import numpy as np import numpy as np
import pprint
from seetadet.core.config import cfg from seetadet.core.config import cfg
from seetadet.core.coordinator import Coordinator from seetadet.core.coordinator import Coordinator
from seetadet.modeling.detector import new_detector from seetadet.core.testing import test_engine
from seetadet.utils import logger from seetadet.core.testing import test_server
from seetadet.utils import logging
from seetadet.utils import profiler
def parse_args(): def parse_args():
"""Parse arguments.""" """Parse arguments."""
parser = argparse.ArgumentParser( parser = argparse.ArgumentParser(
description='Deploy a detection network for serving') description='Serve a detection network')
parser.add_argument( parser.add_argument(
'--cfg', '--cfg',
dest='cfg_file', dest='cfg_file',
...@@ -47,14 +45,40 @@ def parse_args(): ...@@ -47,14 +45,40 @@ def parse_args():
default='', default='',
help='experiment dir') help='experiment dir')
parser.add_argument( parser.add_argument(
'--iter',
type=int,
default=None,
help='iteration of checkpoint')
parser.add_argument(
'--model_dir', '--model_dir',
default='', default='',
help='final model dir') help='model dir')
parser.add_argument( parser.add_argument(
'--iter', '--score_thresh',
type=float,
default=0.6,
help='score threshold for inference')
parser.add_argument(
'--batch_timeout',
type=float,
default=1,
help='timeout to wait for a batch')
parser.add_argument(
'--queue_size',
type=int,
default=512,
help='size of the memory queue')
parser.add_argument(
'--gpu',
nargs='+',
type=int, type=int,
default=None, default=None,
help='test checkpoint of given step') help='index of GPUs to use')
parser.add_argument(
'--processes',
type=int,
default=1,
help='number of flask processes')
parser.add_argument( parser.add_argument(
'--port', '--port',
type=int, type=int,
...@@ -63,101 +87,129 @@ def parse_args(): ...@@ -63,101 +87,129 @@ def parse_args():
return parser.parse_args() return parser.parse_args()
def get_image(base64_str): class WebServer(test_server.WebServer):
try: """Server to run web serving."""
image_bytes = base64.b64decode(base64_str)
img = np.frombuffer(image_bytes, np.uint8)
img = cv2.imdecode(img, cv2.IMREAD_COLOR)
return img
except Exception as e:
logger.info('Decode base64 image failed. detail: ' + str(e))
return None
def __init__(self, output_dir, output_queue, output_dict,
score_thresh=0.6, perf_every=100):
super(WebServer, self).__init__(output_dir)
self.output_queue = output_queue
self.output_dict = output_dict
self.score_thresh = score_thresh
self.perf_every = perf_every
self.max_dets = cfg.TEST.DETECTIONS_PER_IM
def get_objects(boxes_this_image): def make_objects(self, outputs):
boxes = outputs.pop('boxes')
objects = [] objects = []
for j, name in enumerate(cfg.MODEL.CLASSES): for j, name in enumerate(self.classes):
if name == '__background__': if name == '__background__':
continue continue
detections = boxes_this_image[j] inds = np.where(boxes[j][:, 4] > self.score_thresh)[0]
return_inds = np.where(detections[:, 4] > cfg.VIS_TH)[0] for box in boxes[j][inds]:
for det in detections[return_inds]: objects.append({'bbox': box[:4].astype(int).tolist(),
objects.append({ 'score': float(box[4]), 'class': name})
'score': float(det[4]),
'name': name,
'xmin': int(det[0]),
'ymin': int(det[1]),
'xmax': int(det[2]),
'ymax': int(det[3])
})
logger.info('Detect objects: ' + str(objects))
return objects return objects
@staticmethod
class Wrapper(object): def get_objects(retry_time=0.005):
"""Inference wrapper."""
def __init__(self, args):
if args.model_dir:
Coordinator(args.cfg_file, exp_dir=args.model_dir)
checkpoint = os.path.join(args.model_dir, 'model_final.pkl')
else:
coordinator = Coordinator(args.cfg_file, exp_dir=args.exp_dir)
checkpoint, _ = coordinator.checkpoint(args.iter, wait=False)
logger.info('Load model from: ' + checkpoint)
self.test_module = 'seetadet.algo.%s.test' % cfg.MODEL.TYPE
self.test_module = importlib.import_module(self.test_module)
self.detector = new_detector(cfg.GPU_ID, checkpoint)
self.lock = threading.RLock()
def do_inference(self, img):
compute_fn = getattr(self.test_module, 'ims_detect')
process_fn = getattr(self.test_module, 'get_detections')
try: try:
self.lock.acquire() req = flask.request.get_json(force=True)
outputs = compute_fn(self.detector, [img])[0] img_id = req['image_id']
finally: except KeyError:
self.lock.release() err_msg, img_id = 'Not found "image_id" in data.', ''
outputs = process_fn(outputs) flask.abort(flask.Response(err_msg))
return outputs[0] while img_id not in output_dict:
time.sleep(retry_time)
return img_id, output_dict.pop(img_id)
def run(self):
"""Main loop to make the detection objects."""
timers = collections.defaultdict(profiler.Timer)
count = 0
while True:
count += 1
img_id, time_diffs, outputs = self.output_queue.get()
outputs = test_engine.filter_outputs(outputs, self.max_dets)
for name, diff in time_diffs.items():
timers[name].add_diff(diff)
self.output_dict[img_id] = self.make_objects(outputs)
if count % self.perf_every == 0:
logging.info('im_detect: {:d} [{:.3f}s + {:.3f}s]'
.format(count, timers['im_detect'].average_time,
timers['misc'].average_time))
def find_weights(args, coordinator):
"""Return the weights for testing."""
weights_list = []
if args.model_dir:
for file in os.listdir(args.model_dir):
if not file.endswith('.pkl'):
continue
weights_list.append(os.path.join(args.model_dir, file))
elif args.iter is not None:
checkpoint, _ = coordinator.get_checkpoint(args.iter, wait=True)
weights_list.append(checkpoint)
return weights_list[0]
if __name__ == '__main__': if __name__ == '__main__':
os.environ['FLASK_ENV'] = 'production' os.environ['FLASK_ENV'] = 'production'
logging.set_formatter("%(asctime)s %(levelname)s %(message)s")
args = parse_args() args = parse_args()
logger.info('Called with args:\n' + str(args)) logging.info('Called with args:\n' + str(args))
logger.info('Using config:\n' + pprint.pformat(cfg))
coordinator = Coordinator(args.cfg_file, args.exp_dir or args.model_dir)
logging.info('Using config:\n' + str(cfg))
# Build actors.
weights = find_weights(args, coordinator)
devices = args.gpu if args.gpu else [cfg.GPU_ID]
num_devices = len(devices)
queues = [mp.Queue(args.queue_size) for _ in range(num_devices + 1)]
actors = [mp.Process(
target=test_engine.test_detector, kwargs={
'test_cfg': cfg,
'weights': weights,
'queues': [queues[i], queues[-1]],
'device': devices[i],
'verbose': i == 0,
'batch_timeout': args.batch_timeout}) for i in range(num_devices)]
for actor in actors:
actor.start()
# Build server.
server_manager = mp.Manager()
output_dict = server_manager.dict()
server = WebServer(
output_dir='./',
output_queue=queues[-1],
output_dict=output_dict,
score_thresh=args.score_thresh)
server.start()
# Build app.
app = flask.Flask('SeetaDet') app = flask.Flask('SeetaDet')
workspace = dragon.Workspace() logging._logging.getLogger('werkzeug').setLevel('ERROR')
with workspace.as_default(): debug_objects = os.environ.get('FLASK_DEBUG', False)
wrapper = Wrapper(args)
@app.route("/upload", methods=['POST'])
@app.route("/", methods=['POST']) def upload():
def infer(): img_id, img = server.get_image()
try: queues[img_id % num_devices].put((img_id, img))
req = flask.request.get_json(force=True) return flask.jsonify({'image_id': img_id})
base64_str = req['base64_image']
except KeyError: @app.route("/get", methods=['POST'])
print('Not found base64 image.') def get():
return flask.abort(400) img_id, objects = server.get_objects(retry_time=0.005)
response = kpl_helper.deploy.RectangleBoxObjectDetectionResponse(0, 0, 0) msg = 'ImageId = %d, #Detects = %d' % (img_id, len(objects))
base64_str = base64_str.split(",")[-1] if debug_objects:
img = get_image(base64_str) msg += (('\n * ' if len(objects) > 0 else '') +
if not isinstance(img, np.ndarray): ('\n * '.join(str(obj) for obj in objects)))
return flask.jsonify(response.dumps()) logging.info(msg)
response.height, response.width, response.depth = img.shape return flask.jsonify({'objects': objects})
with workspace.as_default():
detections = wrapper.do_inference(img) app.run(host="0.0.0.0", port=args.port,
objects = get_objects(detections) threaded=args.processes == 1, processes=args.processes)
for obj in objects:
response.add_object(obj['name'],
obj['xmin'],
obj['ymin'],
obj['xmax'],
obj['ymax'],
obj['score'])
return flask.jsonify(response.dumps())
app.run(host="0.0.0.0", port=args.port)
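A hedged client sketch for the two endpoints above; the exact `/upload` payload depends on `test_server.WebServer.get_image`, which is outside this diff, and the port is illustrative:

```python
import requests  # client-side sketch; not a repository dependency

with open('demo.jpg', 'rb') as f:
    rsp = requests.post('http://127.0.0.1:5050/upload', data=f.read())
img_id = rsp.json()['image_id']

rsp = requests.post('http://127.0.0.1:5050/get', json={'image_id': img_id})
print(rsp.json()['objects'])  # [{'bbox': [...], 'score': ..., 'class': ...}]
```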
...@@ -14,18 +14,15 @@ from __future__ import absolute_import ...@@ -14,18 +14,15 @@ from __future__ import absolute_import
from __future__ import division from __future__ import division
from __future__ import print_function from __future__ import print_function
import argparse
import os import os
import sys import sys
import argparse
import pprint
from seetadet.core import test_engine
from seetadet.core import test_server
from seetadet.core.coordinator import Coordinator
from seetadet.core.config import cfg from seetadet.core.config import cfg
from seetadet.datasets.factory import get_dataset from seetadet.core.coordinator import Coordinator
from seetadet.utils import logger from seetadet.core.testing import test_engine
from seetadet.data.build import build_dataset
from seetadet.utils import logging
def parse_args(): def parse_args():
...@@ -44,90 +41,96 @@ def parse_args(): ...@@ -44,90 +41,96 @@ def parse_args():
parser.add_argument( parser.add_argument(
'--model_dir', '--model_dir',
default='', default='',
help='final model dir') help='model dir')
parser.add_argument( parser.add_argument(
'--gpus', '--gpu',
nargs='+', nargs='+',
type=int, type=int,
default=None, default=None,
help='index of GPUs to use') help='index of GPUs to use')
parser.add_argument( parser.add_argument(
'--iter', '--iter',
nargs='+',
type=int, type=int,
default=None, default=None,
help='test checkpoint of given step') help='iteration step of checkpoints')
parser.add_argument( parser.add_argument(
'--last', '--last',
type=int, type=int,
default=1, default=1,
help='test n last checkpoints') help='last N checkpoints')
parser.add_argument( parser.add_argument(
'--read_every', '--read_every',
type=int, type=int,
default=1000, default=100,
help='read every-n images for testing') help='read every-n images for testing')
parser.add_argument( parser.add_argument(
'--log_every', '--vis',
type=int, type=float,
default=100, default=0,
help='display testing progress every-n images') help='score threshold for visualization')
parser.add_argument( parser.add_argument(
'--dump', '--dump',
action='store_true', action='store_true',
help='dump the result back to record or not') help='dump the result back to record')
parser.add_argument( parser.add_argument(
'--wait', '--wait',
action='store_true', action='store_true',
help='wait the checkpoint or not') help='wait the checkpoint or not')
parser.add_argument(
'--precision',
default='',
help='compute precision')
if len(sys.argv) == 1: if len(sys.argv) == 1:
parser.print_help() parser.print_help()
sys.exit(1) sys.exit(1)
return parser.parse_args() return parser.parse_args()
def find_weights(args, coordinator):
"""Return the weights for testing."""
weights_list = []
if args.model_dir:
for file in os.listdir(args.model_dir):
if not file.endswith('.pkl'):
continue
weights_list.append(os.path.join(args.model_dir, file))
return weights_list
if args.iter is not None:
for iter in args.iter:
checkpoint, _ = coordinator.get_checkpoint(iter, wait=True)
weights_list.append(checkpoint)
return weights_list
for i in range(1, args.last + 1):
checkpoint, _ = coordinator.get_checkpoint(last_idx=i)
if checkpoint is None:
break
weights_list.append(checkpoint)
return weights_list
if __name__ == '__main__': if __name__ == '__main__':
args = parse_args() args = parse_args()
logger.info('Called with args:\n' + str(args)) logging.info('Called with args:\n' + str(args))
coordinator = Coordinator(args.cfg_file, args.exp_dir or args.model_dir) coordinator = Coordinator(args.cfg_file, args.exp_dir or args.model_dir)
logger.info('Using config:\n' + pprint.pformat(cfg)) cfg.MODEL.PRECISION = args.precision or cfg.MODEL.PRECISION
logging.info('Using config:\n' + str(cfg))
# Inspect the dataset # Inspect dataset.
dataset = get_dataset(cfg.TEST.DATASET) dataset = build_dataset(cfg.TEST.DATASET)
cfg.TEST.PROTOCOL = 'dump' if args.dump else cfg.TEST.PROTOCOL logging.info('Dataset({}): {} images will be used to test.'
logger.info('Dataset({}): {} images will be used to test.'
.format(cfg.TEST.DATASET, dataset.num_images)) .format(cfg.TEST.DATASET, dataset.num_images))
# Inspect the checkpoints # Run testing.
test_checkpoints = [] for weights in find_weights(args, coordinator):
if args.model_dir: weights_name = os.path.splitext(os.path.basename(weights))[0]
for file in os.listdir(args.model_dir): output_dir = coordinator.path_at('results/' + weights_name)
if file.endswith('.pkl'): logging.info('Results will be saved to ' + output_dir)
test_checkpoints.append(os.path.join(args.model_dir, file)) test_engine.run_test(
else: weights=weights,
if args.iter is not None: output_dir=output_dir,
checkpoint, _ = coordinator.checkpoint(args.iter, wait=True) devices=args.gpu,
test_checkpoints.append(checkpoint)
else:
i = 1
while True:
checkpoint, _ = coordinator.checkpoint(last_idx=i)
if checkpoint is not None:
test_checkpoints.append(checkpoint)
i += 1
if args.last is not None and i > args.last:
break
else:
break
for checkpoint in test_checkpoints:
# Create the server and run the test
output_dir = coordinator.results_dir(checkpoint)
logger.info('Results will be saved to ' + output_dir)
test_engine.run_test_net(
checkpoint=checkpoint,
server=test_server.EvaluateServer(output_dir),
devices=args.gpus,
read_every=args.read_every, read_every=args.read_every,
log_every=args.log_every, vis_thresh=args.vis,
) )
...@@ -14,19 +14,18 @@ from __future__ import absolute_import ...@@ -14,19 +14,18 @@ from __future__ import absolute_import
from __future__ import division from __future__ import division
from __future__ import print_function from __future__ import print_function
import argparse
import os import os
import sys import sys
import argparse
import dragon import dragon
import numpy import numpy
import pprint
from seetadet.core.config import cfg from seetadet.core.config import cfg
from seetadet.core.coordinator import Coordinator from seetadet.core.coordinator import Coordinator
from seetadet.core.train import train_net from seetadet.core.training import train_engine
from seetadet.datasets.factory import get_dataset from seetadet.data.build import build_dataset
from seetadet.utils import logger from seetadet.utils import logging
def parse_args(): def parse_args():
...@@ -48,51 +47,42 @@ def parse_args(): ...@@ -48,51 +47,42 @@ def parse_args():
return parser.parse_args() return parser.parse_args()
def mpi_train(cfg_file, exp_dir): def run_distributed(args, coordinator):
"""Call mpi to train models on multiple GPUs. """Run distributed training."""
Parameters
----------
cfg_file : str
The path of the cfg file.
exp_dir : str
The existing experiment dir.
"""
import subprocess import subprocess
args = 'mpirun --allow-run-as-root -n {} --bind-to none '.format(cfg.NUM_GPUS) cmd = 'mpirun --allow-run-as-root -n {} --bind-to none '.format(cfg.NUM_GPUS)
args += '{} {} '.format(sys.executable, 'mpi_train.py') cmd += '{} {} '.format(sys.executable, 'distributed/train.py')
args += '--cfg {} --exp_dir {} '.format(os.path.abspath(cfg_file), exp_dir) cmd += '--cfg {} '.format(os.path.abspath(args.cfg_file))
return subprocess.call(args, shell=True) cmd += '--exp_dir {}'.format(coordinator.exp_dir)
return subprocess.call(cmd, shell=True)
if __name__ == '__main__': if __name__ == '__main__':
args = parse_args() args = parse_args()
logger.info('Called with args:\n' + str(args)) logging.info('Called with args:\n' + str(args))
coordinator = Coordinator(args.cfg_file, args.exp_dir) coordinator = Coordinator(args.cfg_file, args.exp_dir)
logger.info('Using config:\n' + pprint.pformat(cfg)) logging.info('Using config:\n' + str(cfg))
if cfg.NUM_GPUS > 1: if cfg.NUM_GPUS > 1:
# Dispatch the MPI to start a multi-nodes task # Run a distributed task.
coordinator.checkpoints_dir() run_distributed(args, coordinator)
mpi_train(args.cfg_file, coordinator.exp_dir)
else: else:
# Resume training? # Resume training?
checkpoint, start_iter = coordinator.checkpoint() checkpoint, start_iter = coordinator.get_checkpoint()
if checkpoint is not None: if checkpoint is not None:
cfg.TRAIN.WEIGHTS = checkpoint cfg.TRAIN.WEIGHTS = checkpoint
# Fix the random seed for reproducibility # Fix the random seed for reproducibility.
numpy.random.seed(cfg.RNG_SEED) numpy.random.seed(cfg.RNG_SEED)
dragon.random.set_seed(cfg.RNG_SEED) dragon.random.set_seed(cfg.RNG_SEED)
# Inspect the dataset # Inspect the dataset.
dataset = get_dataset(cfg.TRAIN.DATASET) dataset = build_dataset(cfg.TRAIN.DATASET)
logger.info('Dataset({}): {} images will be used to train.' logging.info('Dataset({}): {} images will be used to train.'
.format(cfg.TRAIN.DATASET, dataset.num_images)) .format(cfg.TRAIN.DATASET, dataset.num_images))
# Ready to train the network # Run training.
logger.info('Output will be saved to `{:s}`' logging.info('Checkpoints will be saved to `{:s}`'
.format(coordinator.checkpoints_dir())) .format(coordinator.path_at('checkpoints')))
train_net(coordinator, start_iter) train_engine.run_train(coordinator, start_iter)