Commit ca4313d9 by Ting PAN

Update version to 0.6.0a

1 parent efc0106a
Showing with 9307 additions and 13443 deletions
@@ -9,4 +9,3 @@ ignore = E741, # ambiguous variable name
W504, # line break after binary operator
# module imported but unused
per-file-ignores = __init__.py: F401
-exclude = seetadet/utils/pycocotools
@@ -2,25 +2,9 @@
## Introduction
-### ImageNet Pretrained Models
-#### ResNet Models
-- [R-50.pkl](https://dragon.seetatech.com/download/models/seetadet/imagenet/R-50.pkl)
-- [R-101.pkl](https://dragon.seetatech.com/download/models/seetadet/imagenet/R-101.pkl)
-#### VGG Models
-- [VGG16.SSD.pkl](https://dragon.seetatech.com/download/models/seetadet/imagenet/VGG16.SSD.pkl)
-#### MobileNet Models
-- [MobileNetV2.pkl](https://dragon.seetatech.com/download/models/seetadet/imagenet/MobileNetV2.pkl)
-- [ProxylessMobile.pkl](https://dragon.seetatech.com/download/models/seetadet/imagenet/ProxylessMobile.pkl)
-#### AirNet Models
-- [AirNet.pkl](https://dragon.seetatech.com/download/models/seetadet/imagenet/AirNet.pkl)
+### Pretrained Models
+Please refer to [Pretrained Models](data/pretrained/README.md) for details.
## Baselines
...
@@ -7,10 +7,6 @@ while the style of codes is torch.
The torch-style codes help us to simplify the hierarchical pipeline of modern detection.
-## Requirements
-seeta-dragon >= 0.3.0.dev20201024
## Installation
### Build From Source
@@ -57,35 +53,23 @@ python test.py --cfg <MODEL_YAML> --exp_dir <EXP_DIR> --iter <ITERATION>
```
Or
+### Export a detection model to ONNX
```bash
cd tools
-python test_all.py --cfg <MODEL_YAML> --exp_dir <EXP_DIR> --last 1
+python export.py --cfg <MODEL_YAML> --exp_dir <EXP_DIR> --iter <ITERATION>
```
-### Export a detection model to ONNX
+### Serve a detection model
```bash
cd tools
-python export.py --cfg <MODEL_YAML> --exp_dir <EXP_DIR> --iter <ITERATION>
+python serve.py --cfg <MODEL_YAML> --model_dir <MODEL_DIR>
```
## Benchmark and Model Zoo
Results and models are available in the [Model Zoo](MODEL_ZOO.md).
-### Supported Backbones
-- [ResNet](MODEL_ZOO.md#resnet-models)
-- [VGG](MODEL_ZOO.md#vgg-models)
-- [MobileNet](MODEL_ZOO.md#mobilenet-models)
-- [AirNet](MODEL_ZOO.md#airnet-models)
-### Supported Algorithms
-- [Faster R-CNN](configs/faster_rcnn)
-- [Mask R-CNN](configs/mask_rcnn)
-- [SSD](configs/ssd)
-- [RetinaNet](configs/retinanet)
## License
[BSD 2-Clause license](LICENSE)
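Once `export.py` has written an ONNX graph, it can be smoke-tested outside the framework. A minimal sketch with ONNX Runtime; the file name, input layout, and shapes below are assumptions for illustration, not part of this commit:
```python
# Minimal smoke test for an exported detector (assumptions: onnxruntime is
# installed and the graph takes a single NCHW float32 image input).
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx")
name = sess.get_inputs()[0].name          # query the input name from the graph
dummy = np.zeros((1, 3, 800, 1333), dtype=np.float32)
outputs = sess.run(None, {name: dummy})   # run all graph outputs
print([tuple(o.shape) for o in outputs])
```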
@@ -14,13 +14,7 @@
## COCO Object Detection Baselines
-| Model | Lr sched | Infer time (s/im) | box AP | Download |
-| :---: | :------: | :---------------: | :----: | :------: |
-| [R-50-FPN-800](coco_faster_rcnn_R-50-FPN_800_1x.yml) | 1x | 0.046 | 38.3 | [model](https://dragon.seetatech.com/download/models/seetadet/faster_rcnn/coco_faster_rcnn_R-50-FPN_800_1x/model_final.pkl) |
-| [R-50-FPN-800](coco_faster_rcnn_R-50-FPN_800_2x.yml) | 2x | 0.046 | 39.7 | [model](https://dragon.seetatech.com/download/models/seetadet/faster_rcnn/coco_faster_rcnn_R-50-FPN_800_2x/model_final.pkl) |
+| Model | Lr sched | Infer time (fps) | box AP | Download |
+| :---: | :------: | :--------------: | :----: | :-----: |
+| [R-50-FPN](coco_faster_rcnn_R_50_FPN_1x.yml) | 1x | 27.78 | 38.4 | [model](https://dragon.seetatech.com/download/seetadet/faster_rcnn/coco_faster_rcnn_R_50_FPN_1x/model_adb024b6.pkl) &#124; [log]() |
+| [R-50-FPN](coco_faster_rcnn_R_50_FPN_2x.yml) | 2x | 27.78 | 39.8 | [model](https://dragon.seetatech.com/download/seetadet/faster_rcnn/coco_faster_rcnn_R_50_FPN_2x/model_9a8c9ae5.pkl) &#124; [log]() |
-## Pascal VOC Object Detection Baselines
-| Model | Infer time (s/im) | AP@0.5 | Download |
-| :---: | :---------------: | :----: | :------: |
-| [R-50-FPN-640](voc_faster_rcnn_R-50-FPN_640.yml) | 0.030 | 80.8 | [model](https://dragon.seetatech.com/download/models/seetadet/faster_rcnn/voc_faster_rcnn_R-50-FPN_640_1x/model_final.pkl) |
NUM_GPUS: 8
-PIXEL_STDS: [57.375, 57.12, 58.395]
-PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
  TYPE: 'faster_rcnn'
-  BACKBONE: 'resnet50.fpn'
+  PRECISION: 'float16'
  CLASSES: ['__background__',
            'person', 'bicycle', 'car', 'motorcycle', 'airplane',
            'bus', 'train', 'truck', 'boat', 'traffic light',
@@ -19,27 +17,30 @@ MODEL:
            'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
            'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
            'teddy bear', 'hair drier', 'toothbrush']
+  BACKBONE:
+    TYPE: 'resnet50.fpn'
+  FPN:
+    MIN_LEVEL: 2
+    MAX_LEVEL: 6
+  ANCHOR_GENERATOR:
+    STRIDES: [4, 8, 16, 32, 64]
SOLVER:
  BASE_LR: 0.02
  DECAY_STEPS: [60000, 80000]
  MAX_STEPS: 90000
  SNAPSHOT_EVERY: 5000
-  SNAPSHOT_PREFIX: 'coco_faster_rcnn_R-50-FPN_800_1x'
+  SNAPSHOT_PREFIX: 'coco_faster_rcnn_R_50_FPN'
-FRCNN:
-  BATCH_SIZE: 512
-  ROI_XFORM_RESOLUTION: 7
TRAIN:
-  WEIGHTS: '/model/R-50.pkl'
+  WEIGHTS: '../data/pretrained/R-50_in1k_cls90e.pkl'
-  DATASET: '/data/coco_2017_train'
+  DATASET: '../data/datasets/coco_train2017'
  IMS_PER_BATCH: 2
  SCALES: [640, 672, 704, 736, 768, 800]
  MAX_SIZE: 1333
  USE_DIFF: False # Do not use crowd objects
TEST:
-  DATASET: '/data/coco_2017_val'
+  DATASET: '../data/datasets/coco_val2017'
-  JSON_FILE: '/data/instances_val2017.json'
+  JSON_DATASET: '../data/datasets/coco_instances_val2017.json'
-  PROTOCOL: 'coco'
+  EVALUATOR: 'coco'
  IMS_PER_BATCH: 1
  SCALES: [800]
  MAX_SIZE: 1333
-  NMS: 0.5
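A note on the "1x" schedule in this config: with NUM_GPUS: 8 and IMS_PER_BATCH: 2, each step consumes 16 images, so MAX_STEPS: 90000 covers roughly 12 epochs of COCO train2017. A quick check (the ~118k image count is an assumption about the split, not read from the config):
```python
# Rough epoch count implied by the schedule above (assuming COCO
# train2017 has ~118287 images; batch = NUM_GPUS * IMS_PER_BATCH).
num_gpus, ims_per_batch, max_steps = 8, 2, 90000
epochs = max_steps * num_gpus * ims_per_batch / 118287
print(f"{epochs:.1f} epochs")  # ~12.2; the "2x" config below doubles this
```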
NUM_GPUS: 8
-PIXEL_STDS: [57.375, 57.12, 58.395]
-PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
  TYPE: 'faster_rcnn'
-  BACKBONE: 'resnet50.fpn'
+  PRECISION: 'float16'
  CLASSES: ['__background__',
            'person', 'bicycle', 'car', 'motorcycle', 'airplane',
            'bus', 'train', 'truck', 'boat', 'traffic light',
@@ -19,27 +17,30 @@ MODEL:
            'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
            'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
            'teddy bear', 'hair drier', 'toothbrush']
+  BACKBONE:
+    TYPE: 'resnet50.fpn'
+  FPN:
+    MIN_LEVEL: 2
+    MAX_LEVEL: 6
+  ANCHOR_GENERATOR:
+    STRIDES: [4, 8, 16, 32, 64]
SOLVER:
  BASE_LR: 0.02
  DECAY_STEPS: [120000, 160000]
  MAX_STEPS: 180000
  SNAPSHOT_EVERY: 5000
-  SNAPSHOT_PREFIX: 'coco_faster_rcnn_R-50-FPN_800_2x'
+  SNAPSHOT_PREFIX: 'coco_faster_rcnn_R-50-FPN_800'
-FRCNN:
-  BATCH_SIZE: 512
-  ROI_XFORM_RESOLUTION: 7
TRAIN:
-  WEIGHTS: '/model/R-50.pkl'
+  WEIGHTS: '../data/pretrained/R-50_in1k_cls90e.pkl'
-  DATASET: '/data/coco_2017_train'
+  DATASET: '../data/datasets/coco_train2017'
  IMS_PER_BATCH: 2
  SCALES: [640, 672, 704, 736, 768, 800]
  MAX_SIZE: 1333
  USE_DIFF: False # Do not use crowd objects
TEST:
-  DATASET: '/data/coco_2017_val'
+  DATASET: '../data/datasets/coco_val2017'
-  JSON_FILE: '/data/instances_val2017.json'
+  JSON_DATASET: '../data/datasets/coco_instances_val2017.json'
-  PROTOCOL: 'coco'
+  EVALUATOR: 'coco'
  IMS_PER_BATCH: 1
  SCALES: [800]
  MAX_SIZE: 1333
-  NMS: 0.5
NUM_GPUS: 1
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: 'faster_rcnn'
BACKBONE: 'resnet50.fpn'
CLASSES: ['__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
FRCNN:
BATCH_SIZE: 128
ROI_XFORM_RESOLUTION: 7
SOLVER:
BASE_LR: 0.002
DECAY_STEPS: [80000, 100000]
MAX_STEPS: 120000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'voc_faster_rcnn_R-50-FPN_640'
TRAIN:
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/voc_0712_trainval'
IMS_PER_BATCH: 2
SCALES: [480, 512, 544, 576, 608, 640]
MAX_SIZE: 1066
USE_COLOR_JITTER: True
TEST:
DATASET: '/data/voc_2007_test'
PROTOCOL: 'voc2007'
IMS_PER_BATCH: 1
SCALES: [640]
MAX_SIZE: 1066
NMS: 0.45
@@ -14,7 +14,7 @@
## COCO Instance Segmentation Baselines
-| Model | Lr sched | Infer time (s/im) | box AP | mask AP | Download |
+| Model | Lr sched | Infer time (fps) | box AP | mask AP | Download |
| :---: | :------: | :---------------: | :----: | :-----: | :------: |
-| [R-50-FPN-800](coco_mask_rcnn_R-50-FPN_800_1x.yml) | 1x | 0.056 | 39.2 | 34.8 | [model](https://dragon.seetatech.com/download/models/seetadet/mask_rcnn/coco_mask_rcnn_R-50-FPN_800_1x/model_final.pkl) |
-| [R-50-FPN-800](coco_mask_rcnn_R-50-FPN_800_2x.yml) | 2x | 0.056 | 41.4 | 36.5 | [model](https://dragon.seetatech.com/download/models/seetadet/mask_rcnn/coco_mask_rcnn_R-50-FPN_800_2x/model_final.pkl) |
+| [R-50-FPN](coco_mask_rcnn_R_50_FPN_1x.yml) | 1x | 22.22 | 39.2 | 35.1 | [model](https://dragon.seetatech.com/download/seetadet/mask_rcnn/coco_mask_rcnn_R_50_FPN_1x/model_90266029.pkl) &#124; [log]() |
+| [R-50-FPN](coco_mask_rcnn_R_50_FPN_2x.yml) | 2x | 22.22 | 41.4 | 36.7 | [model](https://dragon.seetatech.com/download/seetadet/mask_rcnn/coco_mask_rcnn_R_50_FPN_2x/model_4ace9d05.pkl) &#124; [log]() |
NUM_GPUS: 8
-PIXEL_STDS: [57.375, 57.12, 58.395]
-PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
-  TYPE: 'mask_rcnn'
+  TYPE: mask_rcnn
-  BACKBONE: 'resnet50.fpn'
+  PRECISION: 'float16'
  CLASSES: ['__background__',
            'person', 'bicycle', 'car', 'motorcycle', 'airplane',
            'bus', 'train', 'truck', 'boat', 'traffic light',
@@ -19,28 +17,31 @@ MODEL:
            'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
            'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
            'teddy bear', 'hair drier', 'toothbrush']
+  BACKBONE:
+    TYPE: 'resnet50.fpn'
+  FPN:
+    MIN_LEVEL: 2
+    MAX_LEVEL: 6
+  ANCHOR_GENERATOR:
+    STRIDES: [4, 8, 16, 32, 64]
SOLVER:
  BASE_LR: 0.02
  DECAY_STEPS: [60000, 80000]
  MAX_STEPS: 90000
  SNAPSHOT_EVERY: 5000
-  SNAPSHOT_PREFIX: 'coco_mask_rcnn_R-50-FPN_800_1x'
+  SNAPSHOT_PREFIX: 'coco_mask_rcnn_R-50-FPN'
-FRCNN:
-  BATCH_SIZE: 512
-  ROI_XFORM_RESOLUTION: 7
-MRCNN:
-  ROI_XFORM_RESOLUTION: 14
TRAIN:
-  WEIGHTS: '/model/R-50.pkl'
+  WEIGHTS: '../data/pretrained/R-50_in1k_cls90e.pkl'
-  DATASET: '/data/coco_2017_train'
+  DATASET: '../data/datasets/coco_train2017'
+  LOADER: 'mask_train'
  IMS_PER_BATCH: 2
  SCALES: [640, 672, 704, 736, 768, 800]
  MAX_SIZE: 1333
  USE_DIFF: False # Do not use crowd objects
TEST:
-  DATASET: '/data/coco_2017_val'
+  DATASET: '../data/datasets/coco_val2017'
-  JSON_FILE: '/data/instances_val2017.json'
+  JSON_DATASET: '../data/datasets/coco_instances_val2017.json'
-  PROTOCOL: 'coco'
+  EVALUATOR: 'coco'
-  IMS_PER_BATCH: 1
  SCALES: [800]
  MAX_SIZE: 1333
-  NMS: 0.5
NUM_GPUS: 8
-PIXEL_STDS: [57.375, 57.12, 58.395]
-PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
-  TYPE: 'mask_rcnn'
+  TYPE: mask_rcnn
-  BACKBONE: 'resnet50.fpn'
+  PRECISION: 'float16'
  CLASSES: ['__background__',
            'person', 'bicycle', 'car', 'motorcycle', 'airplane',
            'bus', 'train', 'truck', 'boat', 'traffic light',
@@ -19,28 +17,31 @@ MODEL:
            'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
            'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
            'teddy bear', 'hair drier', 'toothbrush']
+  BACKBONE:
+    TYPE: 'resnet50.fpn'
+  FPN:
+    MIN_LEVEL: 2
+    MAX_LEVEL: 6
+  ANCHOR_GENERATOR:
+    STRIDES: [4, 8, 16, 32, 64]
SOLVER:
  BASE_LR: 0.02
  DECAY_STEPS: [120000, 160000]
  MAX_STEPS: 180000
  SNAPSHOT_EVERY: 5000
-  SNAPSHOT_PREFIX: 'coco_mask_rcnn_R-50-FPN_800_2x'
+  SNAPSHOT_PREFIX: 'coco_mask_rcnn_R_50_FPN'
-FRCNN:
-  BATCH_SIZE: 512
-  ROI_XFORM_RESOLUTION: 7
-MRCNN:
-  ROI_XFORM_RESOLUTION: 14
TRAIN:
-  WEIGHTS: '/model/R-50.pkl'
+  WEIGHTS: '../data/pretrained/R-50_in1k_cls90e.pkl'
-  DATASET: '/data/coco_2017_train'
+  DATASET: '../data/datasets/coco_train2017'
+  LOADER: 'mask_train'
  IMS_PER_BATCH: 2
  SCALES: [640, 672, 704, 736, 768, 800]
  MAX_SIZE: 1333
  USE_DIFF: False # Do not use crowd objects
TEST:
-  DATASET: '/data/coco_2017_val'
+  DATASET: '../data/datasets/coco_val2017'
-  JSON_FILE: '/data/instances_val2017.json'
+  JSON_DATASET: '../data/datasets/coco_instances_val2017.json'
-  PROTOCOL: 'coco'
+  EVALUATOR: 'coco'
-  IMS_PER_BATCH: 1
  SCALES: [800]
  MAX_SIZE: 1333
-  NMS: 0.5
@@ -12,16 +12,7 @@
## COCO Object Detection Baselines
-| Model | Lr sched | Infer time (s/im) | box AP | Download |
-| :---: | :------: | :---------------: | :----: | :------: |
-| [R-50-FPN-416](coco_retinanet_R-50-FPN_416_6x.yml) | 6x | 0.019 | 34.4 | [model](https://dragon.seetatech.com/download/models/seetadet/retinanet/coco_retinanet_R-50-FPN_416_6x/model_final.pkl) |
-| [R-50-FPN-512](coco_retinanet_R-50-FPN_512_6x.yml) | 6x | 0.022 | 36.4 | [model](https://dragon.seetatech.com/download/models/seetadet/retinanet/coco_retinanet_R-50-FPN_512_6x/model_final.pkl) |
-| [R-50-FPN-800](coco_retinanet_R-50-FPN_800_1x.yml) | 1x | 0.051 | 37.4 | [model](https://dragon.seetatech.com/download/models/seetadet/retinanet/coco_retinanet_R-50-FPN_800_1x/model_final.pkl) |
-| [R-50-FPN-800](coco_retinanet_R-50-FPN_800_2x.yml) | 2x | 0.051 | 39.1 | [model](https://dragon.seetatech.com/download/models/seetadet/retinanet/coco_retinanet_R-50-FPN_800_2x/model_final.pkl) |
-## Pascal VOC Object Detection Baselines
-| Model | Infer time (s/im) | AP@0.5 | Download |
-| :---: | :---------------: | :----: | :------: |
-| [R-50-FPN-416](voc_retinanet_R-50-FPN_416.yml) | 0.015 | 82.3 | [model](https://dragon.seetatech.com/download/models/seetadet/retinanet/voc_retinanet_R-50-FPN_416/model_final.pkl) |
-| [R-50-FPN-512](voc_retinanet_R-50-FPN_512.yml) | 0.017 | 83.0 | [model](https://dragon.seetatech.com/download/models/seetadet/retinanet/voc_retinanet_R-50-FPN_512/model_final.pkl) |
+| Model | Lr sched | Infer time (fps) | box AP | Download |
+| :---: | :------: | :--------------: | :----: | :------: |
+| [R-50-FPN](coco_retinanet_R_50_FPN_1x.yml) | 1x | 23.3 | 37.4 | [model](https://dragon.seetatech.com/download/seetadet/retinanet/coco_retinanet_R_50_FPN_1x/model_01a4d35f.pkl) &#124; [log]() |
+| [R-50-FPN](coco_retinanet_R_50_FPN_2x.yml) | 2x | 23.3 | 39.0 | [model](https://dragon.seetatech.com/download/seetadet/retinanet/coco_retinanet_R_50_FPN_2x/model_7e81f3ad.pkl) &#124; [log]() |
NUM_GPUS: 8
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: 'retinanet'
BACKBONE: 'resnet50.fpn'
CLASSES: ['__background__',
'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
'fire hydrant', 'stop sign', 'parking meter', 'bench',
'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant',
'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife',
'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli',
'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop',
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
FPN:
RPN_MIN_LEVEL: 3
RPN_MAX_LEVEL: 7
SOLVER:
BASE_LR: 0.01
DECAY_STEPS: [90000, 120000]
MAX_STEPS: 135000
SNAPSHOT_EVERY: 2500
SNAPSHOT_PREFIX: 'coco_retinanet_R-50-FPN_416_6x'
PIPELINE:
TYPE: 'ssd'
TRAIN:
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/coco_2017_train'
IMS_PER_BATCH: 8
SCALES: [416]
USE_DIFF: False # Do not use crowd objects
TEST:
DATASET: '/data/coco_2017_val'
JSON_FILE: '/data/instances_val2017.json'
PROTOCOL: 'coco'
IMS_PER_BATCH: 1
SCALES: [416]
NMS: 0.5
NUM_GPUS: 8
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: 'retinanet'
BACKBONE: 'resnet50.fpn'
CLASSES: ['__background__',
'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
'fire hydrant', 'stop sign', 'parking meter', 'bench',
'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant',
'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife',
'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli',
'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop',
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
FPN:
RPN_MIN_LEVEL: 3
RPN_MAX_LEVEL: 7
SOLVER:
BASE_LR: 0.01
DECAY_STEPS: [90000, 120000]
MAX_STEPS: 135000
SNAPSHOT_EVERY: 2500
SNAPSHOT_PREFIX: 'coco_retinanet_R-50-FPN_512_6x'
PIPELINE:
TYPE: 'ssd'
TRAIN:
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/coco_2017_train'
IMS_PER_BATCH: 8
SCALES: [512]
USE_DIFF: False # Do not use crowd objects
TEST:
DATASET: '/data/coco_2017_val'
JSON_FILE: '/data/instances_val2017.json'
PROTOCOL: 'coco'
IMS_PER_BATCH: 1
SCALES: [512]
NMS: 0.5
NUM_GPUS: 8
-PIXEL_STDS: [57.375, 57.12, 58.395]
-PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
  TYPE: 'retinanet'
-  BACKBONE: 'resnet50.fpn'
+  PRECISION: 'float16'
  CLASSES: ['__background__',
            'person', 'bicycle', 'car', 'motorcycle', 'airplane',
            'bus', 'train', 'truck', 'boat', 'traffic light',
@@ -19,27 +17,25 @@ MODEL:
            'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
            'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
            'teddy bear', 'hair drier', 'toothbrush']
-  FPN:
-    RPN_MIN_LEVEL: 3
-    RPN_MAX_LEVEL: 7
+  BACKBONE:
+    TYPE: 'resnet50.fpn'
SOLVER:
  BASE_LR: 0.01
  DECAY_STEPS: [60000, 80000]
  MAX_STEPS: 90000
  SNAPSHOT_EVERY: 5000
-  SNAPSHOT_PREFIX: 'coco_retinanet_R-50-FPN_800_1x'
+  SNAPSHOT_PREFIX: 'coco_retinanet_R-50-FPN_800'
TRAIN:
-  WEIGHTS: '/model/R-50.pkl'
+  WEIGHTS: '../data/pretrained/R-50_in1k_cls90e.pkl'
-  DATASET: '/data/coco_2017_train'
+  DATASET: '../data/datasets/coco_train2017'
  IMS_PER_BATCH: 2
  SCALES: [640, 672, 704, 736, 768, 800]
  MAX_SIZE: 1333
  USE_DIFF: False # Do not use crowd objects
TEST:
-  DATASET: '/data/coco_2017_val'
+  DATASET: '../data/datasets/coco_val2017'
-  JSON_FILE: '/data/instances_val2017.json'
+  JSON_DATASET: '../data/datasets/coco_instances_val2017.json'
-  PROTOCOL: 'coco'
+  EVALUATOR: 'coco'
  IMS_PER_BATCH: 1
  SCALES: [800]
  MAX_SIZE: 1333
-  NMS: 0.5
NUM_GPUS: 8
-PIXEL_STDS: [57.375, 57.12, 58.395]
-PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
  TYPE: 'retinanet'
-  BACKBONE: 'resnet50.fpn'
+  PRECISION: 'float16'
  CLASSES: ['__background__',
            'person', 'bicycle', 'car', 'motorcycle', 'airplane',
            'bus', 'train', 'truck', 'boat', 'traffic light',
@@ -19,27 +17,25 @@ MODEL:
            'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
            'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
            'teddy bear', 'hair drier', 'toothbrush']
-  FPN:
-    RPN_MIN_LEVEL: 3
-    RPN_MAX_LEVEL: 7
+  BACKBONE:
+    TYPE: 'resnet50.fpn'
SOLVER:
  BASE_LR: 0.01
  DECAY_STEPS: [120000, 160000]
  MAX_STEPS: 180000
  SNAPSHOT_EVERY: 5000
-  SNAPSHOT_PREFIX: 'coco_retinanet_R-50-FPN_800_2x'
+  SNAPSHOT_PREFIX: 'coco_retinanet_R-50-FPN_800'
TRAIN:
-  WEIGHTS: '/model/R-50.pkl'
+  WEIGHTS: '../data/pretrained/R-50_in1k_cls90e.pkl'
-  DATASET: '/data/coco_2017_train'
+  DATASET: '../data/datasets/coco_train2017'
  IMS_PER_BATCH: 2
  SCALES: [640, 672, 704, 736, 768, 800]
  MAX_SIZE: 1333
  USE_DIFF: False # Do not use crowd objects
TEST:
-  DATASET: '/data/coco_2017_val'
+  DATASET: '../data/datasets/coco_val2017'
-  JSON_FILE: '/data/instances_val2017.json'
+  JSON_DATASET: '../data/datasets/coco_instances_val2017.json'
-  PROTOCOL: 'coco'
+  EVALUATOR: 'coco'
  IMS_PER_BATCH: 1
  SCALES: [800]
  MAX_SIZE: 1333
-  NMS: 0.5
NUM_GPUS: 1
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: 'retinanet'
BACKBONE: 'resnet50.fpn'
CLASSES: ['__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
FPN:
RPN_MIN_LEVEL: 3
RPN_MAX_LEVEL: 7
RETINANET:
NUM_CONVS: 2
SOLVER:
BASE_LR: 0.01
DECAY_STEPS: [80000, 100000]
MAX_STEPS: 120000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'voc_retinanet_R-50-FPN_416'
PIPELINE:
TYPE: 'ssd'
TRAIN:
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/voc_0712_trainval'
IMS_PER_BATCH: 16
SCALES: [416]
RANDOM_SCALES: [0.25, 1.0]
USE_COLOR_JITTER: True
TEST:
DATASET: '/data/voc_2007_test'
PROTOCOL: 'voc2007'
IMS_PER_BATCH: 1
SCALES: [416]
NMS: 0.45
RETINANET_PRE_NMS_TOP_N: 1000
NUM_GPUS: 2
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: 'retinanet'
BACKBONE: 'resnet50.fpn'
CLASSES: ['__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
FPN:
RPN_MIN_LEVEL: 3
RPN_MAX_LEVEL: 7
RETINANET:
NUM_CONVS: 2
SOLVER:
BASE_LR: 0.01
DECAY_STEPS: [80000, 100000]
MAX_STEPS: 120000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'voc_retinanet_R-50-FPN_512'
PIPELINE:
TYPE: 'ssd'
TRAIN:
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/voc_0712_trainval'
IMS_PER_BATCH: 8
SCALES: [512]
RANDOM_SCALES: [0.25, 1.0]
USE_COLOR_JITTER: True
TEST:
DATASET: '/data/voc_2007_test'
PROTOCOL: 'voc2007'
IMS_PER_BATCH: 1
SCALES: [512]
NMS: 0.45
RETINANET_PRE_NMS_TOP_N: 1000
@@ -12,7 +12,9 @@
## Pascal VOC Object Detection Baselines
-| Model | Infer time (s/im) | AP@0.5 | Download |
-| :---: | :---------------: | :----: | :------: |
-| [VGG-16-300](voc_ssd_VGG-16_300.yml) | 0.012 | 78.3 | [model](https://dragon.seetatech.com/download/models/seetadet/ssd/voc_ssd_VGG-16_300/model_final.pkl) |
-| [VGG-16-512](voc_ssd_VGG-16_512.yml) | 0.021 | 80.1 | [model](https://dragon.seetatech.com/download/models/seetadet/ssd/voc_ssd_VGG-16_512/model_final.pkl) |
+| Model | Lr sched | Infer time (fps) | AP@0.5 | Download |
+| :---: | :----: | :--------------: | :----: | :------: |
+| [VGG-16-SSD300](voc_ssd300_VGG_16_120e.yml) | 120e | 100.0 | 78.3 | [model](https://dragon.seetatech.com/download/seetadet/ssd/voc_ssd300_VGG-16_120e/model_54664312.pkl) &#124; [log](https://dragon.seetatech.com/download/seetadet/ssd/voc_ssd300_VGG-16_120e/logs.json) |
+| [VGG-16-SSD512](voc_ssd512_VGG_16_120e.yml) | 120e | 71.4 | 80.6 | [model](https://dragon.seetatech.com/download/seetadet/ssd/voc_ssd512_VGG-16_120e/model_e332ebfe.pkl) &#124; [log](https://dragon.seetatech.com/download/seetadet/ssd/voc_ssd512_VGG-16_120e/logs.json) |
+| [MobileNetV2-SSDLite](voc_ssdlite_MobileNetV2_300e.yml) | 300e | 76.9 | 71.6 | [model](https://dragon.seetatech.com/download/seetadet/ssd/voc_ssdlite_MobileNetV2_300e/model_da31ebe7.pkl) &#124; [log](https://dragon.seetatech.com/download/seetadet/ssd/voc_ssdlite_MobileNetV2_300e/logs.json) |
+| [MobileNetV3L-SSDLite](voc_ssdlite_MobileNetV3L_300e.yml) | 300e | 66.7 | 72.6 | [model](https://dragon.seetatech.com/download/seetadet/ssd/voc_ssdlite_MobileNetV3L_300e/model_43b33a97.pkl) &#124; [log](https://dragon.seetatech.com/download/seetadet/ssd/voc_ssdlite_MobileNetV3L_300e/logs.json) |
NUM_GPUS: 1
-PIXEL_STDS: [1.0, 1.0, 1.0]
-PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
  TYPE: 'ssd'
-  BACKBONE: 'vgg16_reduced_300'
+  PRECISION: 'float16'
-  COARSEST_STRIDE: 0
  CLASSES: ['__background__',
            'aeroplane', 'bicycle', 'bird', 'boat',
            'bottle', 'bus', 'car', 'cat', 'chair',
            'cow', 'diningtable', 'dog', 'horse',
            'motorbike', 'person', 'pottedplant',
            'sheep', 'sofa', 'train', 'tvmonitor']
-SSD:
+  BACKBONE:
+    TYPE: 'vgg16_fcn.ssd300'
+    NORM: ''
+    COARSEST_STRIDE: 300
+  FPN:
+    ACTIVATION: 'ReLU'
+  ANCHOR_GENERATOR:
    STRIDES: [8, 16, 32, 64, 100, 300]
-  ANCHOR_SIZES: [[30, 60],
-                 [60, 110],
-                 [110, 162],
-                 [162, 213],
-                 [213, 264],
-                 [264, 315]]
+    SIZES: [[30, 60], [60, 110], [110, 162],
+            [162, 213], [213, 264], [264, 315]]
    ASPECT_RATIOS: [[1, 2, 0.5],
                    [1, 2, 0.5, 3, 0.33],
                    [1, 2, 0.5, 3, 0.33],
@@ -31,18 +30,21 @@ SOLVER:
  DECAY_STEPS: [80000, 100000]
  MAX_STEPS: 120000
  SNAPSHOT_EVERY: 5000
-  SNAPSHOT_PREFIX: 'voc_ssd_VGG-16_300'
+  SNAPSHOT_PREFIX: 'voc_ssd300_VGG_16'
+AUG:
+  COLOR_JITTER: 0.5
TRAIN:
-  WEIGHTS: '/model/VGG16.SSD.pkl'
+  WEIGHTS: '../data/pretrained/VGG-16-FCN_in1k.pkl'
-  DATASET: '/data/voc_0712_trainval'
+  DATASET: '../data/datasets/voc_trainval0712'
  IMS_PER_BATCH: 16
  SCALES: [300]
-  RANDOM_SCALES: [0.25, 1.0]
+  SCALES_RANGE: [0.25, 1.0]
-  USE_COLOR_JITTER: True
+  LOADER: 'ssd_train'
TEST:
-  DATASET: '/data/voc_2007_test'
+  DATASET: '../data/datasets/voc_test2007'
-  PROTOCOL: 'voc2007'
+  JSON_DATASET: '../data/datasets/voc_test2007.json'
+  EVALUATOR: 'voc2007'
  IMS_PER_BATCH: 1
  SCALES: [300]
-  NMS: 0.45
+  NMS_THRESH: 0.45
  SCORE_THRESH: 0.01
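The SIZES list in this config chains [min, max] anchor pairs so that each layer's max is the next layer's min, stepping roughly 17% of the 300-pixel input per level. A quick inspection of the values (this only checks the config numbers; it is not the anchor generator itself):
```python
# The [min, max] anchor sizes from the SSD300 config above: adjacent
# layers share endpoints, with ~51 px (~17% of 300) between thresholds.
sizes = [[30, 60], [60, 110], [110, 162], [162, 213], [213, 264], [264, 315]]
assert all(a[1] == b[0] for a, b in zip(sizes, sizes[1:]))  # chained pairs
print([b[0] - a[0] for a, b in zip(sizes, sizes[1:])])      # [30, 50, 52, 51, 51]
```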
-NUM_GPUS: 2
+NUM_GPUS: 1
-PIXEL_STDS: [1.0, 1.0, 1.0]
-PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
  TYPE: 'ssd'
-  BACKBONE: 'vgg16_reduced_512'
+  PRECISION: 'float16'
  CLASSES: ['__background__',
            'aeroplane', 'bicycle', 'bird', 'boat',
            'bottle', 'bus', 'car', 'cat', 'chair',
            'cow', 'diningtable', 'dog', 'horse',
            'motorbike', 'person', 'pottedplant',
            'sheep', 'sofa', 'train', 'tvmonitor']
-SSD:
+  BACKBONE:
+    TYPE: 'vgg16_fcn.ssd512'
+    NORM: ''
+    COARSEST_STRIDE: 512
+  FPN:
+    ACTIVATION: 'ReLU'
+  ANCHOR_GENERATOR:
    STRIDES: [8, 16, 32, 64, 128, 256, 512]
-  ANCHOR_SIZES: [[35.84, 76.8],
+    SIZES: [[35.84, 76.8],
            [76.8, 153.6],
            [153.6, 230.4],
            [230.4, 307.2],
@@ -32,18 +36,21 @@ SOLVER:
  DECAY_STEPS: [80000, 100000]
  MAX_STEPS: 120000
  SNAPSHOT_EVERY: 5000
-  SNAPSHOT_PREFIX: 'voc_ssd_VGG-16_512'
+  SNAPSHOT_PREFIX: 'voc_ssd512_VGG_16'
+AUG:
+  COLOR_JITTER: 0.5
TRAIN:
-  WEIGHTS: '/model/VGG16.SSD.pkl'
+  WEIGHTS: '../data/pretrained/VGG-16-FCN_in1k.pkl'
-  DATASET: '/data/voc_0712_trainval'
+  DATASET: '../data/datasets/voc_trainval0712'
-  IMS_PER_BATCH: 8
+  IMS_PER_BATCH: 16
  SCALES: [512]
-  RANDOM_SCALES: [0.25, 1.0]
+  SCALES_RANGE: [0.25, 1.0]
-  USE_COLOR_JITTER: True
+  LOADER: 'ssd_train'
TEST:
-  DATASET: '/data/voc_2007_test'
+  DATASET: '../data/datasets/voc_test2007'
-  PROTOCOL: 'voc2007'
+  JSON_DATASET: '../data/datasets/voc_test2007.json'
+  EVALUATOR: 'voc2007'
  IMS_PER_BATCH: 1
  SCALES: [512]
-  NMS: 0.45
+  NMS_THRESH: 0.45
  SCORE_THRESH: 0.01
NUM_GPUS: 1
MODEL:
TYPE: 'ssd'
PRECISION: 'float16'
CLASSES: ['__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
BACKBONE:
TYPE: 'mobilenet_v2.ssdlite'
NORM: 'BN'
FPN:
CONV: 'SepConv2d'
NORM: 'BN'
ACTIVATION: 'ReLU6'
ANCHOR_GENERATOR:
STRIDES: [16, 32, 64, 107, 160, 320]
SIZES: [[48, 100], [100, 150], [150, 202],
[202, 253], [253, 304], [304, 320]]
ASPECT_RATIOS: [[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33]]
SOLVER:
BASE_LR: 0.04
WEIGHT_DECAY: 0.00004
DECAY_STEPS: [50000, 62500]
MAX_STEPS: 75000
SNAPSHOT_EVERY: 1250
SNAPSHOT_PREFIX: 'voc_ssdlite_MobileNetV2'
AUG:
COLOR_JITTER: 0.5
TRAIN:
WEIGHTS: '../data/pretrained/MobileNetV2_in1k_cls300e.pkl'
DATASET: '../data/datasets/voc_trainval0712'
IMS_PER_BATCH: 64
SCALES: [320]
SCALES_RANGE: [0.25, 1.0]
LOADER: 'ssd_train'
NUM_WORKERS: 12
TEST:
DATASET: '../data/datasets/voc_test2007'
JSON_DATASET: '../data/datasets/voc_test2007.json'
EVALUATOR: 'voc2007'
IMS_PER_BATCH: 1
SCALES: [320]
NMS_THRESH: 0.45
SCORE_THRESH: 0.01
NUM_GPUS: 1
MODEL:
TYPE: 'ssd'
PRECISION: 'float16'
CLASSES: ['__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
BACKBONE:
TYPE: 'mobilenet_v3_large.ssdlite'
NORM: 'BN'
FPN:
CONV: 'SepConv2d'
NORM: 'BN'
ACTIVATION: 'ReLU'
ANCHOR_GENERATOR:
STRIDES: [16, 32, 64, 107, 160, 320]
SIZES: [[48, 100], [100, 150], [150, 202],
[202, 253], [253, 304], [304, 320]]
ASPECT_RATIOS: [[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33]]
SOLVER:
BASE_LR: 0.04
WEIGHT_DECAY: 0.00004
DECAY_STEPS: [50000, 62500]
MAX_STEPS: 75000
SNAPSHOT_EVERY: 1250
SNAPSHOT_PREFIX: 'voc_ssdlite_MobileNetV3L'
AUG:
COLOR_JITTER: 0.5
TRAIN:
WEIGHTS: '../data/pretrained/MobileNetV3L_in1k_cls600e.pkl'
DATASET: '../data/datasets/voc_trainval0712'
IMS_PER_BATCH: 64
SCALES: [320]
SCALES_RANGE: [0.25, 1.0]
LOADER: 'ssd_train'
NUM_WORKERS: 12
TEST:
DATASET: '../data/datasets/voc_test2007'
JSON_DATASET: '../data/datasets/voc_test2007.json'
EVALUATOR: 'voc2007'
IMS_PER_BATCH: 1
SCALES: [320]
NMS_THRESH: 0.45
SCORE_THRESH: 0.01
#include "nms_op.h" #include "../operators/nms_op.h"
#include "../utils/detection_utils.h" #include "../utils/detection.h"
namespace dragon { namespace dragon {
template <class Context> template <class Context>
template <typename T> template <typename T>
void NonMaxSuppressionOp<Context>::DoRunWithType() { void NonMaxSuppressionOp<Context>::DoRunWithType() {
int num_selected; auto &X = Input(0), *Y = Output(0);
utils::detection::ApplyNMS( CHECK(X.ndim() == 2 && X.dim(1) == 5)
Output(0)->count(), << "\nThe dimensions of boxes should be (num_boxes, 5).";
Output(0)->count(), detection::ApplyNMS(
X.dim(0),
X.dim(0),
iou_threshold_, iou_threshold_,
Input(0).template mutable_data<T, Context>(), X.template mutable_data<T, Context>(),
Output(0)->template mutable_data<int64_t, CPUContext>(), out_indices_,
num_selected,
ctx()); ctx());
Output(0)->Reshape({num_selected}); Y->template CopyFrom<int64_t>(out_indices_);
}
template <class Context>
void NonMaxSuppressionOp<Context>::RunOnDevice() {
CHECK(Input(0).ndim() == 2 && Input(0).dim(1) == 5)
<< "\nThe dimensions of boxes should be (num_boxes, 5).";
Output(0)->Reshape({Input(0).dim(0)});
DispatchHelper<TensorTypes<float>>::Call(this, Input(0));
} }
DEPLOY_CPU_OPERATOR(NonMaxSuppression); DEPLOY_CPU_OPERATOR(NonMaxSuppression);
......
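For intuition about the contract above, here is a NumPy reference of the same greedy IoU-threshold NMS over (num_boxes, 5) rows; a sketch assuming [x1, y1, x2, y2, score] rows with the +1 box-area convention, not the operator's actual kernel:
```python
import numpy as np

def nms(boxes, iou_threshold=0.5):
    """Greedy NMS over rows of (x1, y1, x2, y2, score); returns kept indices."""
    x1, y1, x2, y2, scores = boxes.T
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]          # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the top box with all remaining boxes.
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        iou = w * h / (areas[i] + areas[order[1:]] - w * h)
        order = order[1:][iou <= iou_threshold]  # drop overlapping boxes
    return np.array(keep, dtype=np.int64)
```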
@@ -10,8 +10,8 @@
 * ------------------------------------------------------------
 */
-#ifndef SEETADET_CXX_OPERATORS_NMS_OP_H_
-#define SEETADET_CXX_OPERATORS_NMS_OP_H_
+#ifndef DRAGON_EXTENSION_OPERATORS_NMS_OP_H_
+#define DRAGON_EXTENSION_OPERATORS_NMS_OP_H_
#include "dragon/core/operator.h"
@@ -25,15 +25,18 @@ class NonMaxSuppressionOp final : public Operator<Context> {
        iou_threshold_(OP_SINGLE_ARG(float, "iou_threshold", 0.5f)) {}
  USE_OPERATOR_FUNCTIONS;
-  void RunOnDevice() override;
+  void RunOnDevice() override {
+    DispatchHelper<dtypes::TypesBase<float>>::Call(this, Input(0));
+  }
  template <typename T>
  void DoRunWithType();
 protected:
  float iou_threshold_;
+  vector<int64_t> out_indices_;
};
} // namespace dragon
-#endif // SEETADET_CXX_OPERATORS_NMS_OP_H_
+#endif // DRAGON_EXTENSION_OPERATORS_NMS_OP_H_
-#include <dragon/utils/math_functions.h>
-#include "../utils/detection_utils.h"
-#include "retinanet_decoder_op.h"
+#include "../operators/retinanet_decoder_op.h"
+#include "../utils/detection.h"
namespace dragon {
template <class Context>
template <typename T>
void RetinaNetDecoderOp<Context>::DoRunWithType() {
-  using BT = float; // DType of BBox
-  using BC = CPUContext; // Context of BBox
-  int total_proposals = 0;
-  auto* batch_scores = Input(SCORES).template data<T, Context>();
-  auto* batch_deltas = Input(DELTAS).template data<T, BC>();
-  auto* im_info = Input(IMAGE_INFO).template data<BT, BC>();
-  auto* all_proposals = Output(0)->template mutable_data<BT, BC>();
-  for (int im_idx = 0; im_idx < num_images_; ++im_idx) {
-    BT im_h = im_info[0];
-    BT im_w = im_info[1];
-    BT im_scale_h = im_info[2];
-    BT im_scale_w = im_info[2];
-    if (Input(IMAGE_INFO).dim(1) == 4) im_scale_w = im_info[3];
-    CHECK_EQ(strides_.size(), InputSize() - 3)
-        << "\nGiven " << strides_.size() << " strides "
-        << "and " << InputSize() - 3 << " features";
-    // Select the top-k candidates as proposals
-    auto num_boxes = Input(SCORES).dim(1);
-    auto num_classes = Input(SCORES).dim(2);
-    utils::detection::SelectProposals(
-        Input(SCORES).count(1),
-        score_thr_,
-        batch_scores + im_idx * Input(SCORES).stride(0),
-        roi_scores_,
-        roi_indices_,
-        ctx());
-    auto num_candidates = (int)roi_scores_.size();
-    auto num_proposals = std::min(num_candidates, (int)pre_nms_topn_);
-    utils::detection::ArgPartition(
-        num_candidates, num_proposals, true, roi_scores_.data(), indices_);
-    scores_.resize(indices_.size());
-    for (int i = 0; i < num_proposals; ++i) {
-      scores_[i] = roi_scores_[indices_[i]];
-      indices_[i] = roi_indices_[indices_[i]];
-    }
-    // Decode proposals via anchors
-    int stride_offset = 0;
-    for (int i = 0; i < strides_.size(); i++) {
-      auto feature_h = Input(i).dim(2);
-      auto feature_w = Input(i).dim(3);
-      auto K = feature_h * feature_w;
-      auto A = int(ratios_.size() * scales_.size());
-      anchors_.resize((size_t)(A * 4));
-      utils::detection::GenerateAnchors(
-          strides_[i],
-          (int)ratios_.size(),
-          (int)scales_.size(),
-          ratios_.data(),
-          scales_.data(),
-          anchors_.data());
-      utils::detection::GetShiftedAnchors(
-          num_proposals,
-          num_classes,
-          A,
-          feature_h,
-          feature_w,
-          strides_[i],
-          stride_offset,
-          anchors_.data(),
-          indices_.data(),
-          all_proposals);
-      stride_offset += (A * K);
-    }
-    utils::detection::GenerateDetections(
-        num_proposals,
-        num_boxes,
-        num_classes,
-        im_idx,
-        im_h,
-        im_w,
-        im_scale_h,
-        im_scale_w,
-        scores_.data(),
-        batch_deltas + im_idx * Input(DELTAS).stride(0),
-        indices_.data(),
-        all_proposals);
-    total_proposals += num_proposals;
-    all_proposals += (num_proposals * 7);
-    im_info += Input(IMAGE_INFO).dim(1);
-  }
-  Output(0)->Reshape({total_proposals, 7});
-}
-template <class Context>
-void RetinaNetDecoderOp<Context>::RunOnDevice() {
-  num_images_ = Input(0).dim(0);
-  CHECK_EQ(Input(-1).dim(0), num_images_)
-      << "\nExcepted " << num_images_ << " groups info, got "
-      << Input(-1).dim(0) << ".";
-  Output(0)->Reshape({num_images_ * pre_nms_topn_, 7});
-  DispatchHelper<TensorTypes<float>>::Call(this, Input(SCORES));
-}
+  auto num_images = Input(SCORES).dim(0);
+  auto num_anchors = Input(SCORES).dim(1);
+  auto num_classes = Input(SCORES).dim(2);
+  auto num_scores = num_anchors * num_classes;
+  auto num_cell_anchors = int64_t(ratios_.size() * scales_.size());
+  // Generate anchors.
+  CHECK_EQ(Input(GRID_INFO).dim(0), int64_t(strides_.size()))
+      << "\nProvide " << Input(GRID_INFO).dim(0) << " grids for "
+      << strides_.size() << " strides.";
+  cell_anchors_.resize(strides_.size());
+  vector<detection::GridArgs<int64_t>> grid_args(strides_.size());
+  for (int i = 0; i < strides_.size(); ++i) {
+    grid_args[i].stride = strides_[i];
+    auto& anchors = cell_anchors_[i];
+    if (int64_t(anchors.size()) == num_cell_anchors * 4) continue;
+    anchors.resize(num_cell_anchors * 4);
+    detection::GenerateAnchors(
+        strides_[i],
+        int64_t(ratios_.size()),
+        int64_t(scales_.size()),
+        ratios_.data(),
+        scales_.data(),
+        anchors.data());
+  }
+  // Set grid arguments.
+  auto* grid_info = Input(GRID_INFO).template data<int64_t, CPUContext>();
+  detection::SetGridArgs(num_anchors, num_cell_anchors, grid_info, grid_args);
+  // Decode detections.
+  auto* scores = Input(SCORES).template data<T, Context>();
+  auto* deltas = Input(DELTAS).template data<T, CPUContext>();
+  auto* im_info = Input(IM_INFO).template data<float, CPUContext>();
+  auto* Y = Output(0)->Reshape({num_images * pre_nms_topn_, 7});
+  auto* dets = Y->template mutable_data<float, CPUContext>();
+  int64_t size_dets = 0;
+  for (int batch_ind = 0; batch_ind < num_images; ++batch_ind) {
+    detection::ImageArgs<int64_t> im_args(im_info + batch_ind * 4);
+    im_args.batch_ind = batch_ind;
+    detection::SelectProposals(
+        num_scores,
+        pre_nms_topn_,
+        score_thresh_,
+        scores + batch_ind * num_scores,
+        scores_,
+        indices_,
+        ctx());
+    auto* offset_dets = dets + size_dets * 7;
+    auto num_dets = int64_t(indices_.size());
+    size_dets += num_dets;
+    for (int i = 0; i < strides_.size(); ++i) {
+      detection::GetAnchors(
+          num_dets,
+          num_cell_anchors,
+          num_classes,
+          grid_args[i],
+          cell_anchors_[i].data(),
+          indices_.data(),
+          offset_dets);
+    }
+    detection::DecodeDetections(
+        num_dets,
+        num_anchors,
+        num_classes,
+        im_args,
+        scores_.data(),
+        deltas + batch_ind * Input(DELTAS).stride(0),
+        indices_.data(),
+        offset_dets);
+  }
+  // Shrink to the correct dimensions.
+  Y->Reshape({size_dets, 7});
+}
DEPLOY_CPU_OPERATOR(RetinaNetDecoder);
@@ -109,7 +88,7 @@ DEPLOY_CPU_OPERATOR(RetinaNetDecoder);
DEPLOY_CUDA_OPERATOR(RetinaNetDecoder);
#endif
-OPERATOR_SCHEMA(RetinaNetDecoder).NumInputs(3, INT_MAX).NumOutputs(1, INT_MAX);
+OPERATOR_SCHEMA(RetinaNetDecoder).NumInputs(4).NumOutputs(1);
NO_GRADIENT(RetinaNetDecoder);
...
@@ -10,8 +10,8 @@
 * ------------------------------------------------------------
 */
-#ifndef SEETADET_CXX_OPERATORS_RETINANET_DECODER_OP_H_
-#define SEETADET_CXX_OPERATORS_RETINANET_DECODER_OP_H_
+#ifndef DRAGON_EXTENSION_OPERATORS_RETINANET_DECODER_OP_H_
+#define DRAGON_EXTENSION_OPERATORS_RETINANET_DECODER_OP_H_
#include "dragon/core/operator.h"
@@ -26,24 +26,29 @@ class RetinaNetDecoderOp final : public Operator<Context> {
        ratios_(OP_REPEATED_ARG(float, "ratios")),
        scales_(OP_REPEATED_ARG(float, "scales")),
        pre_nms_topn_(OP_SINGLE_ARG(int64_t, "pre_nms_top_n", 6000)),
-        score_thr_(OP_SINGLE_ARG(float, "score_thresh", 0.05f)) {}
+        score_thresh_(OP_SINGLE_ARG(float, "score_thresh", 0.05f)) {}
  USE_OPERATOR_FUNCTIONS;
-  void RunOnDevice() override;
+  void RunOnDevice() override {
+    DispatchHelper<dtypes::TypesBase<float>>::Call(this, Input(SCORES));
+  }
  template <typename T>
  void DoRunWithType();
-  enum INPUT_TAGS { SCORES = -3, DELTAS = -2, IMAGE_INFO = -1 };
+  enum INPUT_TAGS { SCORES = 0, DELTAS = 1, IM_INFO = 2, GRID_INFO = 3 };
 protected:
-  float score_thr_;
-  vec64_t strides_, indices_, roi_indices_;
-  vector<float> ratios_, scales_, anchors_;
-  vector<float> scores_, roi_scores_;
-  int64_t num_images_, pre_nms_topn_;
+  float score_thresh_;
+  vector<int64_t> strides_;
+  vector<float> ratios_, scales_;
+  int64_t pre_nms_topn_;
+  vector<float> scores_;
+  vector<int64_t> indices_;
+  vector<vector<float>> cell_anchors_;
};
} // namespace dragon
-#endif // SEETADET_CXX_OPERATORS_RETINANET_DECODER_OP_H_
+#endif // DRAGON_EXTENSION_OPERATORS_RETINANET_DECODER_OP_H_
-#include <dragon/utils/math_functions.h>
-#include "../utils/detection_utils.h"
-#include "rpn_decoder_op.h"
+#include "../operators/rpn_decoder_op.h"
+#include "../utils/detection.h"
namespace dragon {
template <class Context>
template <typename T>
void RPNDecoderOp<Context>::DoRunWithType() {
-  using BT = float; // DType of BBox
-  using BC = CPUContext; // Context of BBox
-  int feat_h, feat_w, K, A;
-  int total_rois = 0, num_rois;
-  int num_candidates, num_proposals;
-  auto* batch_scores = Input(SCORES).template data<T, BC>();
-  auto* batch_deltas = Input(DELTAS).template data<T, BC>();
-  auto* im_info = Input(IMAGE_INFO).template data<BT, BC>();
-  auto* all_rois = Output(0)->template mutable_data<BT, BC>();
-  for (int im_idx = 0; im_idx < num_images_; ++im_idx) {
-    const BT im_h = im_info[0];
-    const BT im_w = im_info[1];
-    auto* scores = batch_scores + im_idx * Input(SCORES).stride(0);
-    auto* deltas = batch_deltas + im_idx * Input(DELTAS).stride(0);
-    CHECK_EQ(strides_.size(), InputSize() - 3)
-        << "\nGiven " << strides_.size() << " strides "
-        << "and " << InputSize() - 3 << " feature inputs";
-    CHECK_EQ(strides_.size(), scales_.size())
-        << "\nGiven " << strides_.size() << " strides "
-        << "and " << scales_.size() << " scales";
-    // Select the top-k candidates as proposals
-    num_candidates = Input(SCORES).dim(1);
-    num_proposals = std::min(num_candidates, (int)pre_nms_top_n_);
-    utils::math::ArgPartition(
-        num_candidates, num_proposals, true, scores, indices_);
-    // Decode the candidates
-    int stride_offset = 0;
-    proposals_.Reshape({num_proposals, 5});
-    auto* proposals = proposals_.template mutable_data<BT, BC>();
-    for (int i = 0; i < strides_.size(); i++) {
-      feat_h = Input(i).dim(2);
-      feat_w = Input(i).dim(3);
-      K = feat_h * feat_w;
-      A = (int)ratios_.size();
-      anchors_.resize((size_t)(A * 4));
-      utils::detection::GenerateAnchors(
-          strides_[i],
-          (int)ratios_.size(),
-          1,
-          ratios_.data(),
-          scales_.data(),
-          anchors_.data());
-      utils::detection::GetShiftedAnchors(
-          num_proposals,
-          A,
-          feat_h,
-          feat_w,
-          strides_[i],
-          stride_offset,
-          anchors_.data(),
-          indices_.data(),
-          proposals);
-      stride_offset += (A * K);
-    }
-    utils::detection::GenerateProposals(
-        num_candidates,
-        num_proposals,
-        im_h,
-        im_w,
-        scores,
-        deltas,
-        &indices_[0],
-        proposals);
-    // Sort, NMS and Retrieve
-    utils::detection::SortProposals(
-        0, num_proposals - 1, num_proposals, proposals);
-    utils::detection::ApplyNMS(
-        num_proposals,
-        post_nms_top_n_,
-        nms_thr_,
-        proposals_.template mutable_data<BT, Context>(),
-        roi_indices_.data(),
-        num_rois,
-        ctx());
-    utils::detection::RetrieveRoIs(
-        num_rois, im_idx, proposals, roi_indices_.data(), all_rois);
-    total_rois += num_rois;
-    all_rois += (num_rois * 5);
-    im_info += Input(IMAGE_INFO).dim(1);
-  }
-  Output(0)->Reshape({total_rois, 5});
-  // Distribute rois into K bins
-  if (OutputSize() > 1) {
-    CHECK_EQ(max_level_ - min_level_ + 1, OutputSize())
-        << "\nExcepted " << OutputSize() << " outputs for levels "
-        << "between [" << min_level_ << ", " << max_level_ << "].";
-    vector<BT*> ys(OutputSize());
-    vector<vec64_t> bins(OutputSize());
-    Tensor RoIs;
-    RoIs.ReshapeLike(*Output(0));
-    auto* rois = RoIs.template mutable_data<BT, BC>();
-    ctx()->template Copy<BT, BC, BC>(
-        Output(0)->count(), rois, Output(0)->template data<BT, BC>());
-    utils::detection::CollectRoIs(
-        total_rois,
-        min_level_,
-        max_level_,
-        canonical_level_,
-        canonical_scale_,
-        rois,
-        bins);
-    for (int i = 0; i < OutputSize(); i++) {
-      Output(i)->Reshape({std::max((int)bins[i].size(), 1), 5});
-      ys[i] = Output(i)->template mutable_data<BT, BC>();
-    }
-    utils::detection::DistributeRoIs(bins, rois, ys);
-  }
-}
-template <class Context>
-void RPNDecoderOp<Context>::RunOnDevice() {
-  num_images_ = Input(0).dim(0);
-  CHECK_EQ(Input(IMAGE_INFO).dim(0), num_images_)
-      << "\nExcepted " << num_images_ << " groups info, got "
-      << Input(IMAGE_INFO).dim(0) << ".";
-  roi_indices_.resize(post_nms_top_n_);
-  Output(0)->Reshape({num_images_ * post_nms_top_n_, 5});
-  DispatchHelper<TensorTypes<float>>::Call(this, Input(SCORES));
-}
+  auto num_images = Input(SCORES).dim(0);
+  auto num_anchors = Input(SCORES).dim(1);
+  auto num_cell_anchors = int64_t(ratios_.size() * scales_.size());
+  // Generate anchors.
+  CHECK_EQ(Input(GRID_INFO).dim(0), int64_t(strides_.size()))
+      << "\nProvide " << Input(GRID_INFO).dim(0) << " grids for "
+      << strides_.size() << " strides.";
+  cell_anchors_.resize(strides_.size());
+  vector<detection::GridArgs<int64_t>> grid_args(strides_.size());
+  for (int i = 0; i < strides_.size(); ++i) {
+    grid_args[i].stride = strides_[i];
+    auto& anchors = cell_anchors_[i];
+    if (int64_t(anchors.size()) == num_cell_anchors * 4) continue;
+    anchors.resize(num_cell_anchors * 4);
+    detection::GenerateAnchors(
+        strides_[i],
+        int64_t(ratios_.size()),
+        int64_t(scales_.size()),
+        ratios_.data(),
+        scales_.data(),
+        anchors.data());
+  }
+  // Set grid arguments.
+  auto* grid_info = Input(GRID_INFO).template data<int64_t, CPUContext>();
+  detection::SetGridArgs(num_anchors, num_cell_anchors, grid_info, grid_args);
+  // Decode proposals.
+  auto* scores = Input(SCORES).template data<T, CPUContext>();
+  auto* deltas = Input(DELTAS).template data<T, CPUContext>();
+  auto* im_info = Input(IM_INFO).template data<float, CPUContext>();
+  auto* Y = Output("Y")->Reshape({num_images * pre_nms_topn_, 5});
+  auto* proposals = Y->template mutable_data<float, CPUContext>();
+  vector<int64_t> size_proposals({0});
+  for (int batch_ind = 0; batch_ind < num_images; ++batch_ind) {
+    detection::ImageArgs<int64_t> im_args(im_info + batch_ind * 4);
+    im_args.batch_ind = batch_ind;
+    detection::SelectProposals(
+        num_anchors,
+        pre_nms_topn_,
+        score_thresh_,
+        scores + batch_ind * num_anchors,
+        scores_,
+        indices_,
+        (CPUContext*)nullptr); // Faster.
+    auto* offset_proposals = proposals + size_proposals.back() * 5;
+    auto num_proposals = int64_t(indices_.size());
+    size_proposals.push_back(size_proposals.back() + num_proposals);
+    for (int i = 0; i < strides_.size(); ++i) {
+      detection::GetAnchors(
+          num_proposals,
+          num_cell_anchors,
+          grid_args[i],
+          cell_anchors_[i].data(),
+          indices_.data(),
+          offset_proposals);
+    }
+    detection::DecodeProposals(
+        num_proposals,
+        num_anchors,
+        im_args,
+        scores_.data(),
+        deltas + batch_ind * Input(DELTAS).stride(0),
+        indices_.data(),
+        offset_proposals);
+    detection::SortBoxes<T, detection::Box5d<T>>(
+        num_proposals, offset_proposals);
+  }
+  // Apply NMS.
+  auto* proposals_v2 = Y->template data<float, Context>();
+  int64_t size_rois = 0;
+  for (int batch_ind = 0; batch_ind < num_images; ++batch_ind) {
+    auto offset = size_proposals[batch_ind];
+    auto num_proposals = size_proposals[batch_ind + 1] - offset;
+    detection::ApplyNMS(
+        num_proposals,
+        post_nms_topn_,
+        nms_thresh_,
+        proposals_v2 + offset * 5,
+        nms_indices_,
+        ctx());
+    num_proposals = int64_t(nms_indices_.size());
+    for (int i = 0; i < num_proposals; ++i) {
+      scores_[size_rois] = batch_ind;
+      indices_[size_rois++] = nms_indices_[i] + offset;
+    }
+  }
+  // Apply Histogram.
+  detection::ApplyHistogram(
+      size_rois,
+      min_level_,
+      max_level_,
+      canonical_level_,
+      canonical_scale_,
+      proposals,
+      scores_.data(),
+      indices_.data(),
+      output_rois_);
+  // Copy to outputs.
+  for (int i = 0; i < OutputSize(); ++i) {
+    const auto& rois = output_rois_[i];
+    vector<int64_t> dims({int64_t(rois.size()) / 5, 5});
+    auto* Yi = Output(i)->Reshape(dims);
+    std::memcpy(
+        Yi->template mutable_data<T, CPUContext>(),
+        rois.data(),
+        sizeof(T) * rois.size());
+  }
+}
DEPLOY_CPU_OPERATOR(RPNDecoder);
#ifdef USE_CUDA
DEPLOY_CUDA_OPERATOR(RPNDecoder);
#endif
-OPERATOR_SCHEMA(RPNDecoder).NumInputs(3, INT_MAX).NumOutputs(1, INT_MAX);
+OPERATOR_SCHEMA(RPNDecoder).NumInputs(4).NumOutputs(1, INT_MAX);
NO_GRADIENT(RPNDecoder);
...
@@ -10,8 +10,8 @@
 * ------------------------------------------------------------
 */
-#ifndef SEETADET_CXX_OPERATORS_RPN_DECODER_OP_H_
-#define SEETADET_CXX_OPERATORS_RPN_DECODER_OP_H_
+#ifndef DRAGON_EXTENSION_OPERATORS_RPN_DECODER_OP_H_
+#define DRAGON_EXTENSION_OPERATORS_RPN_DECODER_OP_H_
#include "dragon/core/operator.h"
@@ -25,32 +25,39 @@ class RPNDecoderOp final : public Operator<Context> {
        strides_(OP_REPEATED_ARG(int64_t, "strides")),
        ratios_(OP_REPEATED_ARG(float, "ratios")),
        scales_(OP_REPEATED_ARG(float, "scales")),
-        pre_nms_top_n_(OP_SINGLE_ARG(int64_t, "pre_nms_top_n", 6000)),
-        post_nms_top_n_(OP_SINGLE_ARG(int64_t, "post_nms_top_n", 1000)),
-        nms_thr_(OP_SINGLE_ARG(float, "nms_thresh", 0.7f)),
+        pre_nms_topn_(OP_SINGLE_ARG(int64_t, "pre_nms_top_n", 6000)),
+        post_nms_topn_(OP_SINGLE_ARG(int64_t, "post_nms_top_n", 1000)),
+        nms_thresh_(OP_SINGLE_ARG(float, "nms_thresh", 0.7f)),
+        score_thresh_(OP_SINGLE_ARG(float, "score_thresh", 0.f)),
        min_level_(OP_SINGLE_ARG(int64_t, "min_level", 2)),
        max_level_(OP_SINGLE_ARG(int64_t, "max_level", 5)),
        canonical_level_(OP_SINGLE_ARG(int64_t, "canonical_level", 4)),
        canonical_scale_(OP_SINGLE_ARG(int64_t, "canonical_scale", 224)) {}
  USE_OPERATOR_FUNCTIONS;
-  void RunOnDevice() override;
+  void RunOnDevice() override {
+    DispatchHelper<dtypes::TypesBase<float>>::Call(this, Input(SCORES));
+  }
  template <typename T>
  void DoRunWithType();
-  enum INPUT_TAGS { SCORES = -3, DELTAS = -2, IMAGE_INFO = -1 };
+  enum INPUT_TAGS { SCORES = 0, DELTAS = 1, IM_INFO = 2, GRID_INFO = 3 };
 protected:
-  float nms_thr_;
-  vec64_t strides_, indices_, roi_indices_;
-  vector<float> ratios_, scales_, scores_, anchors_;
-  int64_t pre_nms_top_n_, post_nms_top_n_;
-  int64_t num_images_, min_level_, max_level_;
+  float nms_thresh_, score_thresh_;
+  vector<int64_t> strides_;
+  vector<float> ratios_, scales_;
+  int64_t min_level_, max_level_;
+  int64_t pre_nms_topn_, post_nms_topn_;
  int64_t canonical_level_, canonical_scale_;
-  Tensor proposals_;
+  vector<float> scores_;
+  vector<int64_t> indices_, nms_indices_;
+  vector<vector<float>> cell_anchors_;
+  vector<vector<float>> output_rois_;
};
} // namespace dragon
-#endif // SEETADET_CXX_OPERATORS_RPN_DECODER_OP_H_
+#endif // DRAGON_EXTENSION_OPERATORS_RPN_DECODER_OP_H_
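The min_level/max_level/canonical_level/canonical_scale arguments control how decoded RoIs are spread across the FPN outputs (the ApplyHistogram step in the .cc file). A sketch of the standard FPN assignment rule these defaults correspond to; assuming the operator follows the FPN paper's formula:
```python
import math

# FPN paper rule: k = floor(k0 + log2(sqrt(w * h) / s0)), clipped to the
# level range. Defaults mirror the header above: k0=4 (canonical_level),
# s0=224 (canonical_scale), levels [min_level, max_level] = [2, 5].
def fpn_level(w, h, k0=4, s0=224, k_min=2, k_max=5):
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / s0))
    return min(max(k, k_min), k_max)

print(fpn_level(224, 224))  # 4: a canonical-scale RoI stays on level 4
print(fpn_level(112, 112))  # 3: half the scale drops one level
```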
@@ -8,7 +8,7 @@
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
-"""Build cxx sources."""
+"""Build cpp extensions."""
from __future__ import absolute_import
from __future__ import division
@@ -16,7 +16,7 @@ from __future__ import print_function
import glob
-from dragon.tools import cpp_extension
+from dragon.utils import cpp_extension
from setuptools import setup
Extension = cpp_extension.CppExtension
@@ -32,23 +32,18 @@ def find_sources(*dirs):
    sources = []
    for path in dirs:
        for ext_suffix in ext_suffixes:
-            sources += glob.glob(
-                path + '/*' + ext_suffix,
-                recursive=True,
-            )
+            sources += glob.glob(path + '/*' + ext_suffix, recursive=True)
    return sources
ext_modules = [
    Extension(
-        name='install.lib.modules._C',
+        name='seetadet.ops._C',
        sources=find_sources('**'),
        define_macros=[('THRUST_IGNORE_CUB_VERSION_CHECK', None)],
    ),
]
-setup(
-    name='SeetaDet',
-    ext_modules=ext_modules,
-    cmdclass={'build_ext': cpp_extension.BuildExtension},
-)
+setup(name='seetadet',
+      ext_modules=ext_modules,
+      cmdclass={'build_ext': cpp_extension.BuildExtension})
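With the `BuildExtension` cmdclass above, compiling follows the standard setuptools flow, e.g. `python setup.py build_ext --inplace` (a typical invocation, assuming a working dragon toolchain), after which the compiled module is importable as `seetadet.ops._C`.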
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_H_
#include "../utils/detection/anchors.h"
#include "../utils/detection/bbox.h"
#include "../utils/detection/nms.h"
#include "../utils/detection/proposals.h"
#include "../utils/detection/types.h"
#endif // DRAGON_EXTENSION_UTILS_DETECTION_H_
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_ANCHORS_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_ANCHORS_H_
#include "../../utils/detection/types.h"
namespace dragon {
namespace detection {
/*!
* Anchor Functions.
*/
template <typename IndexT>
inline void SetGridArgs(
const int num_anchors,
const int num_cell_anchors,
const IndexT* grid_info,
vector<GridArgs<IndexT>>& grid_args) {
IndexT grid_offset = 0;
for (int i = 0; i < grid_args.size(); ++i, grid_info += 2) {
auto& args = grid_args[i];
args.h = grid_info[0];
args.w = grid_info[1];
args.offset = grid_offset;
grid_offset += num_cell_anchors * args.h * args.w;
}
std::stringstream ss;
if (grid_offset != num_anchors) {
ss << "Mismatched number of anchors. (Excepted ";
ss << num_anchors << ", Got " << grid_offset << ")";
for (int i = 0; i < grid_args.size(); ++i) {
ss << "\nGrid #" << i << ": "
<< "A=" << num_cell_anchors << ", H=" << grid_args[i].h
<< ", W=" << grid_args[i].w << "\n";
}
}
if (!ss.str().empty()) LOG(FATAL) << ss.str();
}
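// Illustrative example (values assumed): with num_cell_anchors = 3 and two
// grids of (h, w) = (100, 152) and (50, 76), the offsets become 0 and
// 3 * 100 * 152 = 45600, so num_anchors must equal 45600 + 3 * 50 * 76 =
// 57000; any other total triggers the LOG(FATAL) above.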
template <typename T>
inline void GenerateAnchors(
const int stride,
const int num_ratios,
const int num_scales,
const T* ratios,
const T* scales,
T* anchors) {
T* offset_anchors = anchors;
const T area = T(stride * stride);
const T ctr = T(0.5) * T(stride - 1);
for (int i = 0; i < num_ratios; ++i) {
const T ratio_w = std::round(std::sqrt(area / ratios[i]));
const T ratio_h = std::round(ratio_w * ratios[i]);
for (int j = 0; j < num_scales; ++j) {
const T w_half = T(0.5) * (ratio_w * scales[j] - T(1));
const T h_half = T(0.5) * (ratio_h * scales[j] - T(1));
offset_anchors[0] = ctr - w_half;
offset_anchors[1] = ctr - h_half;
offset_anchors[2] = ctr + w_half;
offset_anchors[3] = ctr + h_half;
offset_anchors += 4;
}
}
}
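// Worked example (illustrative): stride = 16, ratio = 0.5, scale = 8 gives
// area = 256, ratio_w = round(sqrt(256 / 0.5)) = 23, ratio_h = round(11.5) = 12,
// i.e. a 184 x 96 anchor centered at 7.5: (-84, -40, 99, 55), matching the
// classic py-faster-rcnn generate_anchors output.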
template <typename T>
inline void GetAnchors(
const int num_anchors,
const int num_cell_anchors,
const GridArgs<int64_t>& args,
const T* cell_anchors,
const int64_t* indices,
T* anchors) {
const int64_t index_min = args.offset;
const int64_t index_max = num_cell_anchors * args.h * args.w;
for (int i = 0; i < num_anchors; ++i) {
auto index = indices[i] - index_min;
if (index >= 0 && index < index_max) {
const auto w = index % args.w;
index /= args.w;
const auto h = index % args.h;
index /= args.h;
const auto shift_x = T(w * args.stride);
const auto shift_y = T(h * args.stride);
auto* offset_anchors = anchors + i * 5;
const auto* offset_cell_anchors = cell_anchors + index * 4;
offset_anchors[0] = shift_x + offset_cell_anchors[0];
offset_anchors[1] = shift_y + offset_cell_anchors[1];
offset_anchors[2] = shift_x + offset_cell_anchors[2];
offset_anchors[3] = shift_y + offset_cell_anchors[3];
}
}
}
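// Index layout note: each index addresses the flattened (A, H, W) grid of
// one level, shifted by args.offset. E.g. with A = 3, H = 2, W = 4, a local
// index of 13 decodes to w = 1, h = 1, a = 1; indices outside
// [offset, offset + A * H * W) belong to another level and are skipped.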
template <typename T>
inline void GetAnchors(
const int num_anchors,
const int num_cell_anchors,
const int num_classes,
const GridArgs<int64_t>& args,
const T* cell_anchors,
const int64_t* indices,
T* anchors) {
const int64_t index_min = num_classes * args.offset;
const int64_t index_max = num_classes * (num_cell_anchors * args.h * args.w);
for (int i = 0; i < num_anchors; ++i) {
auto index = indices[i] - index_min;
if (index >= 0 && index < index_max) {
index /= num_classes;
const auto w = index % args.w;
index /= args.w;
const auto h = index % args.h;
index /= args.h;
const auto shift_x = T(w * args.stride);
const auto shift_y = T(h * args.stride);
auto* offset_anchors = anchors + i * 7 + 1;
const auto* offset_cell_anchors = cell_anchors + index * 4;
offset_anchors[0] = shift_x + offset_cell_anchors[0];
offset_anchors[1] = shift_y + offset_cell_anchors[1];
offset_anchors[2] = shift_x + offset_cell_anchors[2];
offset_anchors[3] = shift_y + offset_cell_anchors[3];
}
}
}
} // namespace detection
} // namespace dragon
#endif // DRAGON_EXTENSION_UTILS_DETECTION_ANCHORS_H_
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_BBOX_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_BBOX_H_
#include "../../utils/detection/types.h"
#if defined(__CUDACC__)
#define HOSTDEVICE_DECL inline __host__ __device__
#else
#define HOSTDEVICE_DECL inline
#endif
namespace dragon {
namespace detection {
/*
* BBox Functions.
*/
template <typename T, class BoxT>
inline void SortBoxes(const int N, T* data, bool descend = true) {
auto* boxes = reinterpret_cast<BoxT*>(data);
std::sort(boxes, boxes + N, [descend](BoxT lhs, BoxT rhs) {
return descend ? (lhs.score > rhs.score) : (lhs.score < rhs.score);
});
}
/*
* BBox Utilities.
*/
namespace utils {
template <typename T>
HOSTDEVICE_DECL bool CheckIoU(const T thresh, const T* a, const T* b) {
#if defined(__CUDACC__)
const T x1 = max(a[0], b[0]);
const T y1 = max(a[1], b[1]);
const T x2 = min(a[2], b[2]);
const T y2 = min(a[3], b[3]);
const T width = max(T(0), x2 - x1 + T(1));
const T height = max(T(0), y2 - y1 + T(1));
#else
const T x1 = std::max(a[0], b[0]);
const T y1 = std::max(a[1], b[1]);
const T x2 = std::min(a[2], b[2]);
const T y2 = std::min(a[3], b[3]);
const T width = std::max(T(0), x2 - x1 + T(1));
const T height = std::max(T(0), y2 - y1 + T(1));
#endif
const T inter = width * height;
const T Sa = (a[2] - a[0] + T(1)) * (a[3] - a[1] + T(1));
const T Sb = (b[2] - b[0] + T(1)) * (b[3] - b[1] + T(1));
return inter > thresh * (Sa + Sb - inter);
}
template <typename T>
inline void BBoxTransform(
const T dx,
const T dy,
const T dw,
const T dh,
const T im_w,
const T im_h,
const T im_scale_h,
const T im_scale_w,
T* bbox) {
const T w = bbox[2] - bbox[0] + 1;
const T h = bbox[3] - bbox[1] + 1;
const T ctr_x = bbox[0] + T(0.5) * w;
const T ctr_y = bbox[1] + T(0.5) * h;
const T pred_ctr_x = dx * w + ctr_x;
const T pred_ctr_y = dy * h + ctr_y;
const T pred_w = std::exp(dw) * w;
const T pred_h = std::exp(dh) * h;
const T x1 = pred_ctr_x - T(0.5) * pred_w;
const T y1 = pred_ctr_y - T(0.5) * pred_h;
const T x2 = pred_ctr_x + T(0.5) * pred_w;
const T y2 = pred_ctr_y + T(0.5) * pred_h;
bbox[0] = std::max(T(0), std::min(x1, im_w - T(1))) / im_scale_w;
bbox[1] = std::max(T(0), std::min(y1, im_h - T(1))) / im_scale_h;
bbox[2] = std::max(T(0), std::min(x2, im_w - T(1))) / im_scale_w;
bbox[3] = std::max(T(0), std::min(y2, im_h - T(1))) / im_scale_h;
}
template <typename T>
inline int GetBBoxLevel(
const int lvl_min,
const int lvl_max,
const int lvl0,
const int s0,
T* bbox) {
const T w = bbox[2] - bbox[0] + 1;
const T h = bbox[3] - bbox[1] + 1;
const T s = std::sqrt(w * h);
const int lvl = lvl0 + std::log2(s / s0 + T(1e-6));
return std::min(std::max(lvl, lvl_min), lvl_max);
}
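// Worked example (illustrative): with lvl0 = 4 and s0 = 224 (the canonical
// FPN settings), a 112 x 112 box gives lvl = 4 + log2(112 / 224) = 3 before
// clamping to [lvl_min, lvl_max]; this is Eq. (1) of the FPN paper
// (arXiv:1612.03144).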
} // namespace utils
} // namespace detection
} // namespace dragon
#undef HOSTDEVICE_DECL
#endif // DRAGON_EXTENSION_UTILS_DETECTION_BBOX_H_
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_ITERATOR_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_ITERATOR_H_
#include <dragon/core/common.h>
namespace dragon {
namespace detection {
template <typename MapT>
class KeyValueMapIterator
: public std::iterator<std::input_iterator_tag, MapT> {
public:
typedef KeyValueMapIterator self_type;
typedef ptrdiff_t difference_type;
typedef MapT value_type;
typedef MapT& reference;
KeyValueMapIterator(
typename MapT::key_type* key_ptr,
typename MapT::value_type* value_ptr)
: key_ptr_(key_ptr), value_ptr_(value_ptr) {}
self_type operator++(int) {
self_type ret = *this;
key_ptr_++;
value_ptr_++;
return ret;
}
self_type operator++() {
key_ptr_++;
value_ptr_++;
return *this;
}
self_type operator--() {
key_ptr_--;
value_ptr_--;
return *this;
}
self_type operator--(int) {
self_type ret = *this;
key_ptr_--;
value_ptr_--;
return ret;
}
reference operator*() const {
if (map_.key_ptr != key_ptr_) {
map_.key_ptr = key_ptr_;
map_.value_ptr = value_ptr_;
}
return map_;
}
self_type operator+(difference_type n) const {
return self_type(key_ptr_ + n, value_ptr_ + n);
}
self_type& operator+=(difference_type n) {
key_ptr_ += n;
value_ptr_ += n;
return *this;
}
self_type operator-(difference_type n) const {
return self_type(key_ptr_ - n, value_ptr_ - n);
}
self_type& operator-=(difference_type n) {
key_ptr_ -= n;
value_ptr_ -= n;
return *this;
}
difference_type operator-(self_type other) const {
return key_ptr_ - other.key_ptr_;
}
bool operator<(const self_type& rhs) const {
return key_ptr_ < rhs.key_ptr_;
}
bool operator<=(const self_type& rhs) const {
return key_ptr_ <= rhs.key_ptr_;
}
bool operator==(const self_type& rhs) const {
return key_ptr_ == rhs.key_ptr_;
}
bool operator!=(const self_type& rhs) const {
return key_ptr_ != rhs.key_ptr_;
}
private:
mutable MapT map_;
typename MapT::key_type* key_ptr_;
typename MapT::value_type* value_ptr_;
};
} // namespace detection
} // namespace dragon
#endif // DRAGON_EXTENSION_UTILS_DETECTION_ITERATOR_H_
#include <dragon/core/context.h>
#include "../../utils/detection/bbox.h"
#include "../../utils/detection/nms.h"
namespace dragon {
namespace detection {
template <>
void ApplyNMS<float, CPUContext>(
const int N,
const int K,
const float thresh,
const float* boxes,
vector<int64_t>& indices,
CPUContext* ctx) {
int num_selected = 0;
indices.resize(K);
vector<char> is_dead(N, 0);
for (int i = 0; i < N; ++i) {
if (is_dead[i]) continue;
indices[num_selected++] = i;
if (num_selected >= K) break;
for (int j = i + 1; j < N; ++j) {
if (is_dead[j]) continue;
if (!utils::CheckIoU(thresh, &boxes[i * 5], &boxes[j * 5])) continue;
is_dead[j] = 1;
}
}
indices.resize(num_selected);
}
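// Note: the standard greedy NMS sweep. It assumes boxes are pre-sorted by
// descending score (see SortBoxes in bbox.h), keeps at most K survivors,
// and runs in O(N^2) time in the worst case.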
} // namespace detection
} // namespace dragon
#include <dragon/core/context_cuda.h>
#include <dragon/core/workspace.h>
#include "../../utils/detection/bbox.h"
#include "../../utils/detection/nms.h"
#include "../../utils/detection/utils.h"
namespace dragon {
namespace detection {
namespace {
#define NUM_THREADS 64
template <typename T>
__global__ void _NonMaxSuppression(
const int N,
const T thresh,
const T* boxes,
uint64_t* mask) {
const int row_start = blockIdx.y;
const int col_start = blockIdx.x;
if (row_start > col_start) return;
const int row_size = min(N - row_start * NUM_THREADS, NUM_THREADS);
const int col_size = min(N - col_start * NUM_THREADS, NUM_THREADS);
__shared__ T block_boxes[NUM_THREADS * 4];
if (threadIdx.x < col_size) {
auto* offset_block_boxes = block_boxes + threadIdx.x * 4;
auto* offset_boxes = boxes + (col_start * NUM_THREADS + threadIdx.x) * 5;
#pragma unroll
for (int i = 0; i < 4; ++i) {
*(offset_block_boxes++) = *(offset_boxes++);
}
}
__syncthreads();
if (threadIdx.x < row_size) {
const int index = row_start * NUM_THREADS + threadIdx.x;
const T* offset_boxes = boxes + index * 5;
unsigned long long val = 0;
const int start = (row_start == col_start) ? (threadIdx.x + 1) : 0;
for (int i = start; i < col_size; ++i) {
if (utils::CheckIoU(thresh, offset_boxes, block_boxes + i * 4)) {
val |= 1ULL << i;
}
}
mask[index * gridDim.x + col_start] = val;
}
}
} // namespace
template <>
void ApplyNMS<float, CUDAContext>(
const int N,
const int K,
const float thresh,
const float* boxes,
vector<int64_t>& indices,
CUDAContext* ctx) {
const auto num_blocks = utils::DivUp(N, NUM_THREADS);
vector<uint64_t> mask_host(N * num_blocks);
auto* mask_dev = (uint64_t*)ctx->workspace()->data<CUDAContext>(
mask_host.size() * sizeof(uint64_t), "BufferKernel");
_NonMaxSuppression<<<
dim3(num_blocks, num_blocks),
NUM_THREADS,
0,
ctx->cuda_stream()>>>(N, thresh, boxes, mask_dev);
CUDA_CHECK(cudaMemcpyAsync(
mask_host.data(),
mask_dev,
mask_host.size() * sizeof(uint64_t),
cudaMemcpyDeviceToHost,
ctx->cuda_stream()));
ctx->FinishDeviceComputation();
vector<uint64_t> is_dead(num_blocks);
memset(&is_dead[0], 0, sizeof(uint64_t) * num_blocks);
int num_selected = 0;
indices.resize(K);
for (int i = 0; i < N; ++i) {
const int nblock = i / NUM_THREADS;
const int inblock = i % NUM_THREADS;
if (!(is_dead[nblock] & (1ULL << inblock))) {
indices[num_selected++] = i;
if (num_selected >= K) break;
auto* offset_mask = &mask_host[0] + i * num_blocks;
for (int j = nblock; j < num_blocks; ++j) {
is_dead[j] |= offset_mask[j];
}
}
}
indices.resize(num_selected);
}
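// Note: the kernel tiles the N boxes into 64-wide blocks; for each box i it
// writes one 64-bit word per column block whose set bits mark the boxes
// that i suppresses. The greedy sweep over this bitmask runs on the host.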
} // namespace detection
} // namespace dragon
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_NMS_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_NMS_H_
#include "../../utils/detection/types.h"
namespace dragon {
namespace detection {
template <typename T, class Context>
void ApplyNMS(
const int N,
const int K,
const T thresh,
const T* boxes,
vector<int64_t>& indices,
Context* ctx);
} // namespace detection
} // namespace dragon
#endif // DRAGON_EXTENSION_UTILS_DETECTION_NMS_H_
#include <dragon/core/context.h>
#include "../../utils/detection/proposals.h"
namespace dragon {
namespace detection {
namespace {
template <typename KeyT, typename ValueT>
inline void
ArgPartition(const int N, const int K, const ValueT* values, KeyT* keys) {
std::nth_element(keys, keys + K, keys + N, [&values](KeyT lhs, KeyT rhs) {
return values[lhs] > values[rhs];
});
}
} // namespace
template <>
void SelectProposals<float, CPUContext>(
const int N,
const int K,
const float thresh,
const float* scores,
vector<float>& out_scores,
vector<int64_t>& out_indices,
CPUContext* ctx) {
int num_selected = 0;
out_indices.resize(N);
if (thresh > 0.f) {
for (int i = 0; i < N; ++i) {
if (scores[i] > thresh) {
out_indices[num_selected++] = i;
}
}
} else {
num_selected = N;
std::iota(out_indices.begin(), out_indices.end(), 0);
}
if (num_selected > K) {
ArgPartition(num_selected, K, scores, out_indices.data());
out_scores.resize(K);
out_indices.resize(K);
for (int i = 0; i < K; ++i) {
out_scores[i] = scores[out_indices[i]];
}
} else {
out_scores.resize(num_selected);
out_indices.resize(num_selected);
for (int i = 0; i < num_selected; ++i) {
out_scores[i] = scores[out_indices[i]];
}
}
}
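// Note: scores above thresh are kept (all N when thresh <= 0), then
// ArgPartition applies std::nth_element for an O(N) average-time top-K
// cut; the surviving K entries are not returned in score order.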
} // namespace detection
} // namespace dragon
#include <dragon/core/context_cuda.h>
#include <dragon/core/workspace.h>
#include <dragon/utils/device/common_thrust.h>
#include "../../utils/detection/iterator.h"
#include "../../utils/detection/proposals.h"
namespace dragon {
namespace detection {
namespace {
template <typename KeyT, typename ValueT>
struct ThresholdFunctor {
ThresholdFunctor(ValueT thresh) : thresh_(thresh) {}
inline __device__ bool operator()(
const thrust::tuple<KeyT, ValueT>& kv) const {
return thrust::get<1>(kv) > thresh_;
}
ValueT thresh_;
};
template <typename IterT>
inline void ArgPartition(const int N, const int K, IterT data) {
std::nth_element(
data,
data + K,
data + N,
[](const typename IterT::value_type& lhs,
const typename IterT::value_type& rhs) {
return *lhs.value_ptr > *rhs.value_ptr;
});
}
} // namespace
template <>
void SelectProposals<float, CUDAContext>(
const int N,
const int K,
const float thresh,
const float* scores,
vector<float>& out_scores,
vector<int64_t>& out_indices,
CUDAContext* ctx) {
int num_selected = N;
int64_t* indices = nullptr;
if (thresh > 0.f) {
indices = ctx->workspace()->data<int64_t, CUDAContext>(N, "BufferKernel");
auto policy = thrust::cuda::par.on(ctx->cuda_stream());
auto functor = ThresholdFunctor<int64_t, float>(thresh);
thrust::sequence(policy, indices, indices + N);
auto kv = thrust::make_tuple(indices, const_cast<float*>(scores));
auto first = thrust::make_zip_iterator(kv);
auto last = thrust::partition(policy, first, first + N, functor);
num_selected = last - first;
}
out_scores.resize(num_selected);
out_indices.resize(num_selected);
CUDA_CHECK(cudaMemcpyAsync(
out_scores.data(),
scores,
num_selected * sizeof(float),
cudaMemcpyDeviceToHost,
ctx->cuda_stream()));
if (thresh > 0.f) {
CUDA_CHECK(cudaMemcpyAsync(
out_indices.data(),
indices,
num_selected * sizeof(int64_t),
cudaMemcpyDeviceToHost,
ctx->cuda_stream()));
} else {
std::iota(out_indices.begin(), out_indices.end(), 0);
}
ctx->FinishDeviceComputation();
if (num_selected > K) {
auto iter = KeyValueMapIterator<KeyValueMap<int64_t, float>>(
out_indices.data(), out_scores.data());
ArgPartition(num_selected, K, iter);
out_scores.resize(K);
out_indices.resize(K);
}
}
} // namespace detection
} // namespace dragon
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_PROPOSALS_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_PROPOSALS_H_
#include "../../utils/detection/bbox.h"
#include "../../utils/detection/types.h"
namespace dragon {
namespace detection {
template <typename T, class Context>
void SelectProposals(
const int N,
const int K,
const float thresh,
const T* input_scores,
vector<T>& output_scores,
vector<int64_t>& output_indices,
Context* ctx);
template <typename T>
void DecodeProposals(
const int num_proposals,
const int num_anchors,
const ImageArgs<int64_t>& im_args,
const T* scores,
const T* deltas,
const int64_t* indices,
T* proposals) {
T* offset_proposals = proposals;
const T* offset_dx = deltas;
const T* offset_dy = deltas + num_anchors;
const T* offset_dw = deltas + num_anchors * 2;
const T* offset_dh = deltas + num_anchors * 3;
for (int i = 0; i < num_proposals; ++i) {
const auto index = indices[i];
utils::BBoxTransform(
offset_dx[index],
offset_dy[index],
offset_dw[index],
offset_dh[index],
T(im_args.w),
T(im_args.h),
T(1),
T(1),
offset_proposals);
offset_proposals[4] = scores[i];
offset_proposals += 5;
}
}
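// Layout note: deltas are planar, i.e. shaped (4, num_anchors) with
// contiguous dx / dy / dw / dh planes, and each output proposal is a
// 5-tuple (x1, y1, x2, y2, score) clipped to the image extent.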
template <typename T>
void DecodeDetections(
const int num_dets,
const int num_anchors,
const int num_classes,
const ImageArgs<int64_t>& im_args,
const T* scores,
const T* deltas,
const int64_t* indices,
T* dets) {
T* offset_dets = dets;
const T* offset_dx = deltas;
const T* offset_dy = deltas + num_anchors;
const T* offset_dw = deltas + num_anchors * 2;
const T* offset_dh = deltas + num_anchors * 3;
for (int i = 0; i < num_dets; ++i) {
const auto index = indices[i] / num_classes;
utils::BBoxTransform(
offset_dx[index],
offset_dy[index],
offset_dw[index],
offset_dh[index],
T(im_args.w),
T(im_args.h),
T(im_args.scale_h),
T(im_args.scale_w),
offset_dets + 1);
offset_dets[0] = T(im_args.batch_ind);
offset_dets[5] = scores[i];
offset_dets[6] = T(indices[i] % num_classes + 1);
offset_dets += 7;
}
}
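// Layout note: each output detection is a 7-tuple
// (batch_ind, x1, y1, x2, y2, score, class). Indices enumerate the
// flattened (num_anchors, num_classes) scores, and boxes are mapped back
// to the original image by dividing out im_scale_h / im_scale_w.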
template <typename T>
inline void ApplyHistogram(
const int N,
const int lvl_min,
const int lvl_max,
const int lvl0,
const int s0,
const T* boxes,
const T* batch_indices,
const int64_t* box_indices,
vector<vector<T>>& output_rois) {
vector<int> bin_indices(N);
vector<int> bin_count(lvl_max - lvl_min + 1, 0);
for (int i = 0; i < N; ++i) {
const T* offset_boxes = boxes + box_indices[i] * 5;
auto lvl = utils::GetBBoxLevel(lvl_min, lvl_max, lvl0, s0, offset_boxes);
bin_indices[i] = lvl - lvl_min;
bin_count[lvl - lvl_min]++;
}
output_rois.resize(lvl_max - lvl_min + 1);
for (int i = 0; i < output_rois.size(); ++i) {
auto& rois = output_rois[i];
rois.resize(std::max(bin_count[i], 1) * 5);
if (bin_count[i] == 0) rois[0] = T(-1); // Ignored.
}
for (int i = 0; i < N; ++i) {
const T* offset_boxes = boxes + box_indices[i] * 5;
const auto bin_index = bin_indices[i];
const auto roi_index = --bin_count[bin_index];
auto& rois = output_rois[bin_index];
T* offset_rois = rois.data() + roi_index * 5;
offset_rois[0] = batch_indices[i];
offset_rois[1] = offset_boxes[0];
offset_rois[2] = offset_boxes[1];
offset_rois[3] = offset_boxes[2];
offset_rois[4] = offset_boxes[3];
}
}
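// Note: a two-pass counting sort over FPN levels. The second pass fills
// each level bucket back-to-front via --bin_count; a level with no RoIs
// gets one placeholder row with batch index -1 so downstream RoI ops
// still see a non-empty input.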
} // namespace detection
} // namespace dragon
#endif // DRAGON_EXTENSION_UTILS_DETECTION_PROPOSALS_H_
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_TYPES_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_TYPES_H_
#include <dragon/core/common.h>
namespace dragon {
namespace detection {
template <typename T>
struct Box4d {
T x1, y1, x2, y2;
};
template <typename T>
struct Box5d {
T x1, y1, x2, y2, score;
};
template <typename IndexT>
struct ImageArgs {
ImageArgs(const float* im_info) {
h = im_info[0], w = im_info[1];
scale_h = im_info[2], scale_w = im_info[3];
}
IndexT batch_ind, h, w;
float scale_h, scale_w;
};
template <typename IndexT>
struct GridArgs {
IndexT h, w, stride, offset;
};
template <typename KeyT, typename ValueT>
struct KeyValueMap {
typedef KeyT key_type;
typedef ValueT value_type;
friend void swap(KeyValueMap& x, KeyValueMap& y) {
std::swap(*x.key_ptr, *y.key_ptr);
std::swap(*x.value_ptr, *y.value_ptr);
}
KeyT* key_ptr = nullptr;
ValueT* value_ptr = nullptr;
};
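// Note: KeyValueMap pairs with KeyValueMapIterator (iterator.h) so that
// std algorithms such as std::nth_element can reorder two parallel arrays
// (e.g. indices and scores) in lockstep: the ADL-found swap above
// exchanges the pointed-to key and value together.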
} // namespace detection
} // namespace dragon
#endif // DRAGON_EXTENSION_UTILS_DETECTION_TYPES_H_
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_UTILS_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_UTILS_H_
namespace dragon {
namespace detection {
/*
* Detection Utilities.
*/
namespace utils {
template <typename T>
inline T DivUp(const T a, const T b) {
return (a + b - T(1)) / b;
}
} // namespace utils
} // namespace detection
} // namespace dragon
#endif // DRAGON_EXTENSION_UTILS_DETECTION_UTILS_H_
#include "detection_utils.h"
#include <dragon/core/context.h>
namespace dragon {
namespace utils {
namespace detection {
template <typename T>
T IoU(const T A[], const T B[]) {
if (A[0] > B[2] || A[1] > B[3] || A[2] < B[0] || A[3] < B[1]) return 0;
const T x1 = std::max(A[0], B[0]);
const T y1 = std::max(A[1], B[1]);
const T x2 = std::min(A[2], B[2]);
const T y2 = std::min(A[3], B[3]);
const T width = std::max((T)0, x2 - x1 + 1);
const T height = std::max((T)0, y2 - y1 + 1);
const T area = width * height;
const T A_area = (A[2] - A[0] + 1) * (A[3] - A[1] + 1);
const T B_area = (B[2] - B[0] + 1) * (B[3] - B[1] + 1);
return area / (A_area + B_area - area);
}
template <>
void ApplyNMS<float, CPUContext>(
const int num_boxes,
const int max_keeps,
const float thresh,
const float* boxes,
int64_t* keep_indices,
int& num_keep,
CPUContext* ctx) {
int count = 0;
std::vector<char> is_dead(num_boxes);
for (int i = 0; i < num_boxes; ++i)
is_dead[i] = 0;
for (int i = 0; i < num_boxes; ++i) {
if (is_dead[i]) continue;
keep_indices[count++] = i;
if (count == max_keeps) break;
for (int j = i + 1; j < num_boxes; ++j)
if (!is_dead[j] && IoU(&boxes[i * 5], &boxes[j * 5]) > thresh) {
is_dead[j] = 1;
}
}
num_keep = count;
}
template <>
void SelectProposals<float, CPUContext>(
const int count,
const float score_thresh,
const float* input_scores,
vector<float>& output_scores,
vector<int64_t>& output_indices,
CPUContext* ctx) {
int num_proposals = 0;
for (int i = 0; i < count; ++i) {
if (input_scores[i] > score_thresh) {
output_indices[num_proposals++] = i;
}
}
output_scores.resize(num_proposals);
for (int i = 0; i < num_proposals; ++i) {
output_scores[i] = input_scores[output_indices[i]];
}
}
} // namespace detection
} // namespace utils
} // namespace dragon
#ifdef USE_CUDA
#include <dragon/core/context_cuda.h>
#include <dragon/core/workspace.h>
#include <dragon/utils/device/common_cub.h>
#include <dragon/utils/device/common_thrust.h>
#include "detection_utils.h"
namespace dragon {
namespace utils {
namespace detection {
#define DIV_UP(m, n) ((m) / (n) + ((m) % (n) > 0))
#define NUM_THREADS 64
namespace {
template <typename T>
struct ThresholdFunctor {
ThresholdFunctor(float thresh) : thresh_(thresh) {}
inline __device__ bool operator()(
const thrust::tuple<int64_t, T>& key_val) const {
return thrust::get<1>(key_val) > thresh_;
}
float thresh_;
};
template <typename T>
__device__ bool _CheckIoU(const T* a, const T* b, const float thresh) {
const T x1 = max(a[0], b[0]);
const T y1 = max(a[1], b[1]);
const T x2 = min(a[2], b[2]);
const T y2 = min(a[3], b[3]);
const T width = max(T(0), x2 - x1 + 1);
const T height = max(T(0), y2 - y1 + 1);
const T inter = width * height;
const T Sa = (a[2] - a[0] + T(1)) * (a[3] - a[1] + T(1));
const T Sb = (b[2] - b[0] + T(1)) * (b[3] - b[1] + T(1));
return inter > thresh * (Sa + Sb - inter);
}
template <typename T>
__global__ void _NonMaxSuppression(
const int num_blocks,
const int num_boxes,
const T thresh,
const T* dev_boxes,
uint64_t* dev_mask) {
const int row_start = blockIdx.y;
const int col_start = blockIdx.x;
if (row_start > col_start) return;
const int row_size = min(num_boxes - row_start * NUM_THREADS, NUM_THREADS);
const int col_size = min(num_boxes - col_start * NUM_THREADS, NUM_THREADS);
__shared__ T block_boxes[NUM_THREADS * 4];
if (threadIdx.x < col_size) {
const int c1 = threadIdx.x * 4;
const int c2 = (col_start * NUM_THREADS + threadIdx.x) * 5;
block_boxes[c1] = dev_boxes[c2];
block_boxes[c1 + 1] = dev_boxes[c2 + 1];
block_boxes[c1 + 2] = dev_boxes[c2 + 2];
block_boxes[c1 + 3] = dev_boxes[c2 + 3];
}
__syncthreads();
if (threadIdx.x < row_size) {
const int index = row_start * NUM_THREADS + threadIdx.x;
const T* dev_box = dev_boxes + index * 5;
unsigned long long val = 0;
const int start = (row_start == col_start) ? (threadIdx.x + 1) : 0;
for (int i = start; i < col_size; ++i) {
if (_CheckIoU(dev_box, block_boxes + i * 4, thresh)) {
val |= 1ULL << i;
}
}
dev_mask[index * num_blocks + col_start] = val;
}
}
} // namespace
template <>
void SelectProposals<float, CUDAContext>(
const int count,
const float score_thresh,
const float* in_scores,
vector<float>& out_scores,
vector<int64_t>& out_indices,
CUDAContext* ctx) {
auto* in_indices = ctx->workspace()->template data<int64_t, CUDAContext>(
{count}, "data:1")[0];
auto iter = thrust::make_zip_iterator(
thrust::make_tuple(in_indices, const_cast<float*>(in_scores)));
auto policy = thrust::cuda::par.on(ctx->cuda_stream());
thrust::counting_iterator<int64_t> offset(0);
thrust::copy(policy, offset, offset + count, in_indices);
auto last = thrust::partition(
policy, iter, iter + count, ThresholdFunctor<float>(score_thresh));
size_t num_proposals = last - iter;
out_scores.resize(num_proposals);
out_indices.resize(num_proposals);
CUDA_CHECK(cudaMemcpyAsync(
out_scores.data(),
in_scores,
num_proposals * sizeof(float),
cudaMemcpyDeviceToHost,
ctx->cuda_stream()));
CUDA_CHECK(cudaMemcpyAsync(
out_indices.data(),
in_indices,
num_proposals * sizeof(int64_t),
cudaMemcpyDeviceToHost,
ctx->cuda_stream()));
ctx->FinishDeviceComputation();
}
template <>
void ApplyNMS<float, CUDAContext>(
const int num_boxes,
const int max_keeps,
const float thresh,
const float* boxes,
int64_t* keep_indices,
int& num_keep,
CUDAContext* ctx) {
const int num_blocks = DIV_UP(num_boxes, NUM_THREADS);
vector<uint64_t> mask_host(num_boxes * num_blocks);
auto* mask_dev = (uint64_t*)ctx->workspace()->data<CUDAContext>(
{mask_host.size() * sizeof(uint64_t)}, "data:1")[0];
_NonMaxSuppression<<<
dim3(num_blocks, num_blocks),
NUM_THREADS,
0,
ctx->cuda_stream()>>>(num_blocks, num_boxes, thresh, boxes, mask_dev);
CUDA_CHECK(cudaMemcpyAsync(
mask_host.data(),
mask_dev,
mask_host.size() * sizeof(uint64_t),
cudaMemcpyDeviceToHost,
ctx->cuda_stream()));
ctx->FinishDeviceComputation();
vector<uint64_t> dead_bit(num_blocks);
memset(&dead_bit[0], 0, sizeof(uint64_t) * num_blocks);
int num_selected = 0;
for (int i = 0; i < num_boxes; ++i) {
const int nblock = i / NUM_THREADS;
const int inblock = i % NUM_THREADS;
if (!(dead_bit[nblock] & (1ULL << inblock))) {
keep_indices[num_selected++] = i;
auto* mask_i = &mask_host[0] + i * num_blocks;
for (int j = nblock; j < num_blocks; ++j)
dead_bit[j] |= mask_i[j];
if (num_selected == max_keeps) break;
}
}
num_keep = num_selected;
}
} // namespace detection
} // namespace utils
} // namespace dragon
#endif // USE_CUDA
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef SEETADET_CXX_UTILS_DETECTION_UTILS_H_
#define SEETADET_CXX_UTILS_DETECTION_UTILS_H_
#include "dragon/core/common.h"
namespace dragon {
namespace utils {
namespace detection {
#define ROUND(x) ((int)((x) + (T)0.5))
/*!
* Functional API
*/
template <typename T>
inline void ArgPartition(
const int count,
const int kth,
const bool descend,
const T* v,
vec64_t& indices) {
indices.resize(count);
std::iota(indices.begin(), indices.end(), 0);
if (descend) {
std::nth_element(
indices.begin(),
indices.begin() + kth,
indices.end(),
[&v](int64_t lhs, int64_t rhs) { return v[lhs] > v[rhs]; });
} else {
std::nth_element(
indices.begin(),
indices.begin() + kth,
indices.end(),
[&v](int64_t lhs, int64_t rhs) { return v[lhs] < v[rhs]; });
}
}
/*!
* Box API
*/
template <typename T>
inline void BBoxTransform(
const T dx,
const T dy,
const T d_log_w,
const T d_log_h,
const T im_w,
const T im_h,
const T im_scale_h,
const T im_scale_w,
T* bbox) {
const T w = bbox[2] - bbox[0] + 1;
const T h = bbox[3] - bbox[1] + 1;
const T ctr_x = bbox[0] + (T)0.5 * w;
const T ctr_y = bbox[1] + (T)0.5 * h;
const T pred_ctr_x = dx * w + ctr_x;
const T pred_ctr_y = dy * h + ctr_y;
const T pred_w = exp(d_log_w) * w;
const T pred_h = exp(d_log_h) * h;
bbox[0] = pred_ctr_x - (T)0.5 * pred_w;
bbox[1] = pred_ctr_y - (T)0.5 * pred_h;
bbox[2] = pred_ctr_x + (T)0.5 * pred_w;
bbox[3] = pred_ctr_y + (T)0.5 * pred_h;
bbox[0] = std::max((T)0, std::min(bbox[0], im_w - 1)) / im_scale_w;
bbox[1] = std::max((T)0, std::min(bbox[1], im_h - 1)) / im_scale_h;
bbox[2] = std::max((T)0, std::min(bbox[2], im_w - 1)) / im_scale_w;
bbox[3] = std::max((T)0, std::min(bbox[3], im_h - 1)) / im_scale_h;
}
/*!
* Anchor API
*/
template <typename T>
inline void GenerateAnchors(
int base_size,
const int num_ratios,
const int num_scales,
const T* ratios,
const T* scales,
T* anchors) {
const T base_area = (T)(base_size * base_size);
const T center = (T)0.5 * (base_size - (T)1);
T* offset_anchors = anchors;
for (int i = 0; i < num_ratios; ++i) {
const T ratio_w = (T)ROUND(sqrt(base_area / ratios[i]));
const T ratio_h = (T)ROUND(ratio_w * ratios[i]);
for (int j = 0; j < num_scales; ++j) {
const T scale_w = (T)0.5 * (ratio_w * scales[j] - (T)1);
const T scale_h = (T)0.5 * (ratio_h * scales[j] - (T)1);
offset_anchors[0] = center - scale_w;
offset_anchors[1] = center - scale_h;
offset_anchors[2] = center + scale_w;
offset_anchors[3] = center + scale_h;
offset_anchors += 4;
}
}
}
template <typename T>
inline void GetShiftedAnchors(
const int num_proposals,
const int num_anchors,
const int feat_h,
const int feat_w,
const int stride,
const int stride_offset,
const T* base_anchors,
const int64_t* indices,
T* shifted_anchors) {
T x, y;
int idx_3d, a, h, w;
int idx_range = num_anchors * feat_h * feat_w;
for (int i = 0; i < num_proposals; ++i) {
idx_3d = (int)indices[i] - stride_offset;
if (idx_3d >= 0 && idx_3d < idx_range) {
w = idx_3d % feat_w;
h = (idx_3d / feat_w) % feat_h;
a = idx_3d / feat_w / feat_h;
x = (T)w * stride, y = (T)h * stride;
auto* A = base_anchors + a * 4;
auto* P = shifted_anchors + i * 5;
P[0] = x + A[0], P[1] = y + A[1];
P[2] = x + A[2], P[3] = y + A[3];
}
}
}
template <typename T>
inline void GetShiftedAnchors(
const int num_proposals,
const int num_classes,
const int num_anchors,
const int feat_h,
const int feat_w,
const int stride,
const int stride_offset,
const T* base_anchors,
const int64_t* indices,
T* shifted_anchors) {
T x, y;
int idx_4d, a, h, w;
int lr = num_classes * stride_offset;
int rr = num_classes * (num_anchors * feat_h * feat_w);
for (int i = 0; i < num_proposals; ++i) {
idx_4d = (int)indices[i] - lr;
if (idx_4d >= 0 && idx_4d < rr) {
idx_4d /= num_classes;
w = idx_4d % feat_w;
h = (idx_4d / feat_w) % feat_h;
a = idx_4d / feat_w / feat_h;
x = (T)w * stride, y = (T)h * stride;
auto* A = base_anchors + a * 4;
auto* P = shifted_anchors + i * 7 + 1;
P[0] = x + A[0], P[1] = y + A[1];
P[2] = x + A[2], P[3] = y + A[3];
}
}
}
/*!
* Proposal API
*/
template <typename T, class Context>
void SelectProposals(
const int count,
const float score_thresh,
const T* input_scores,
vector<T>& output_scores,
vector<int64_t>& output_indices,
Context* ctx);
template <typename T>
void GenerateProposals_v1(
const int K,
const int num_proposals,
const float im_h,
const float im_w,
const T* scores,
const T* deltas,
const int64_t* indices,
T* proposals) {
// Shifted anchors in format: [K, A, 4]
int64_t index, a, k;
const T* delta;
T* proposal = proposals;
T dx, dy, d_log_w, d_log_h;
for (int i = 0; i < num_proposals; ++i) {
index = indices[i];
a = index / K, k = index % K;
delta = deltas + k;
dx = delta[(a * 4 + 0) * K];
dy = delta[(a * 4 + 1) * K];
d_log_w = delta[(a * 4 + 2) * K];
d_log_h = delta[(a * 4 + 3) * K];
BBoxTransform(dx, dy, d_log_w, d_log_h, im_w, im_h, T(1), T(1), proposal);
proposal[4] = scores[index];
proposal += 5;
}
}
template <typename T>
void GenerateProposals(
const int num_candidates,
const int num_proposals,
const float im_h,
const float im_w,
const T* scores,
const T* deltas,
const int64_t* indices,
T* proposals) {
// Shifted anchors in format: [4, A, K]
int64_t index;
int64_t num_candidates_2x = 2 * num_candidates;
int64_t num_candidates_3x = 3 * num_candidates;
T* proposal = proposals;
T dx, dy, d_log_w, d_log_h;
for (int i = 0; i < num_proposals; ++i) {
index = indices[i];
dx = deltas[index];
dy = deltas[num_candidates + index];
d_log_w = deltas[num_candidates_2x + index];
d_log_h = deltas[num_candidates_3x + index];
BBoxTransform(dx, dy, d_log_w, d_log_h, im_w, im_h, T(1), T(1), proposal);
proposal[4] = scores[index];
proposal += 5;
}
}
template <typename T>
void GenerateDetections(
const int num_proposals,
const int num_boxes,
const int num_classes,
const int im_idx,
const float im_h,
const float im_w,
const float im_scale_h,
const float im_scale_w,
const T* scores,
const T* deltas,
const int64_t* indices,
T* detections) {
int64_t index, cls;
int64_t num_boxes_2x = 2 * num_boxes;
int64_t num_boxes_3x = 3 * num_boxes;
T* detection = detections;
float dx, dy, d_log_w, d_log_h;
for (int i = 0; i < num_proposals; ++i) {
cls = indices[i] % num_classes;
index = indices[i] / num_classes;
dx = deltas[index];
dy = deltas[num_boxes + index];
d_log_w = deltas[num_boxes_2x + index];
d_log_h = deltas[num_boxes_3x + index];
detection[0] = im_idx;
BBoxTransform(
dx,
dy,
d_log_w,
d_log_h,
im_w,
im_h,
im_scale_h,
im_scale_w,
detection + 1);
// detection[5] = scores[indices[i]];
detection[5] = scores[i];
detection[6] = cls + 1;
detection += 7;
}
}
template <typename T>
inline void
SortProposals(const int start, const int end, const int num_top, T* proposals) {
const T pivot_score = proposals[start * 5 + 4];
int left = start + 1, right = end;
while (left <= right) {
while (left <= end && proposals[left * 5 + 4] >= pivot_score)
++left;
while (right > start && proposals[right * 5 + 4] <= pivot_score)
--right;
if (left <= right) {
for (int i = 0; i < 5; ++i)
std::swap(proposals[left * 5 + i], proposals[right * 5 + i]);
++left;
--right;
}
}
if (right > start) {
for (int i = 0; i < 5; ++i)
std::swap(proposals[start * 5 + i], proposals[right * 5 + i]);
}
if (start < right - 1) SortProposals(start, right - 1, num_top, proposals);
if (right + 1 < num_top && right + 1 < end)
SortProposals(right + 1, end, num_top, proposals);
}
template <typename T>
inline void RetrieveRoIs(
const int num_rois,
const int roi_batch_ind,
const T* proposals,
const int64_t* roi_indices,
T* rois) {
for (int i = 0; i < num_rois; ++i) {
const T* proposal = proposals + roi_indices[i] * 5;
rois[i * 5 + 0] = (T)roi_batch_ind;
rois[i * 5 + 1] = proposal[0];
rois[i * 5 + 2] = proposal[1];
rois[i * 5 + 3] = proposal[2];
rois[i * 5 + 4] = proposal[3];
}
}
template <typename T>
inline int roi_level(
const int min_level,
const int max_level,
const int canonical_level,
const int canonical_scale,
T* roi) {
T w = roi[3] - roi[1] + 1;
T h = roi[4] - roi[2] + 1;
// Follow the level-assignment rule of the FPN paper (arXiv:1612.03144).
int level = canonical_level +
std::log2(std::max(std::sqrt(w * h), (T)1) / (T)canonical_scale);
return std::min(max_level, std::max(min_level, level));
}
template <typename T>
inline void CollectRoIs(
const int num_rois,
const int min_level,
const int max_level,
const int canonical_level,
const int canonical_scale,
const T* rois,
vector<vec64_t>& roi_bins) {
const T* roi = rois;
for (int i = 0; i < num_rois; ++i) {
int bin_idx =
roi_level(min_level, max_level, canonical_level, canonical_scale, roi);
bin_idx = std::max(bin_idx - min_level, 0);
roi_bins[bin_idx].push_back(i);
roi += 5;
}
}
template <typename T>
inline void DistributeRoIs(
const vector<vec64_t>& roi_bins,
const T* rois,
vector<T*> outputs) {
for (int i = 0; i < roi_bins.size(); i++) {
auto* y = outputs[i];
if (roi_bins[i].size() == 0) {
// Fake a tiny roi to avoid empty roi pooling
y[0] = 0, y[1] = 0, y[2] = 0, y[3] = 1, y[4] = 1;
} else {
for (int j = 0; j < roi_bins[i].size(); ++j) {
const T* roi = rois + roi_bins[i][j] * 5;
for (int k = 0; k < 5; ++k)
y[k] = roi[k];
y += 5;
}
}
}
}
/*!
* NMS API
*/
template <typename T, class Context>
void ApplyNMS(
const int num_boxes,
const int max_keeps,
const T thresh,
const T* boxes,
int64_t* keep_indices,
int& num_keep,
Context* ctx);
} // namespace detection
} // namespace utils
} // namespace dragon
#endif // SEETADET_CXX_UTILS_DETECTION_UTILS_H_
# distutils: language = c
# distutils: sources = ../common/maskApi.c
#**************************************************************************
# Microsoft COCO Toolbox. version 2.0
# Data, paper, and tutorials available at: http://mscoco.org/
# Code written by Piotr Dollar and Tsung-Yi Lin, 2015.
# Licensed under the Simplified BSD License [see coco/license.txt]
#**************************************************************************
__author__ = 'tsungyi'
import sys
PYTHON_VERSION = sys.version_info[0]
# import both Python-level and C-level symbols of Numpy
# the API uses Numpy to interface C and Python
import numpy as np
cimport numpy as np
from libc.stdlib cimport malloc, free
# initialize Numpy. must do.
np.import_array()
# import numpy C function
# we use PyArray_ENABLEFLAGS to make Numpy ndarray responsible for memory management
cdef extern from "numpy/arrayobject.h":
void PyArray_ENABLEFLAGS(np.ndarray arr, int flags)
# Declare the prototype of the C functions in MaskApi.h
cdef extern from "maskApi.h":
ctypedef unsigned int uint
ctypedef unsigned long siz
ctypedef unsigned char byte
ctypedef double* BB
ctypedef struct RLE:
siz h,
siz w,
siz m,
uint* cnts,
void rlesInit( RLE **R, siz n )
void rleEncode( RLE *R, const byte *M, siz h, siz w, siz n )
void rleDecode( const RLE *R, byte *mask, siz n )
void rleMerge( const RLE *R, RLE *M, siz n, int intersect )
void rleArea( const RLE *R, siz n, uint *a )
void rleIou( RLE *dt, RLE *gt, siz m, siz n, byte *iscrowd, double *o )
void bbIou( BB dt, BB gt, siz m, siz n, byte *iscrowd, double *o )
void rleToBbox( const RLE *R, BB bb, siz n )
void rleFrBbox( RLE *R, const BB bb, siz h, siz w, siz n )
void rleFrPoly( RLE *R, const double *xy, siz k, siz h, siz w )
char* rleToString( const RLE *R )
void rleFrString( RLE *R, char *s, siz h, siz w )
# python class to wrap RLE array in C
# the class handles the memory allocation and deallocation
cdef class RLEs:
cdef RLE *_R
cdef siz _n
def __cinit__(self, siz n =0):
rlesInit(&self._R, n)
self._n = n
# free the RLE array here
def __dealloc__(self):
if self._R is not NULL:
for i in range(self._n):
free(self._R[i].cnts)
free(self._R)
def __getattr__(self, key):
if key == 'n':
return self._n
raise AttributeError(key)
# python class to wrap Mask array in C
# the class handles the memory allocation and deallocation
cdef class Masks:
cdef byte *_mask
cdef siz _h
cdef siz _w
cdef siz _n
def __cinit__(self, h, w, n):
self._mask = <byte*> malloc(h*w*n* sizeof(byte))
self._h = h
self._w = w
self._n = n
# def __dealloc__(self):
# the memory management of _mask has been passed to np.ndarray
# it doesn't need to be freed here
# called when passing into np.array() and return an np.ndarray in column-major order
def __array__(self):
cdef np.npy_intp shape[1]
shape[0] = <np.npy_intp> self._h*self._w*self._n
# Create a 1D array, and reshape it to fortran/Matlab column-major array
ndarray = np.PyArray_SimpleNewFromData(1, shape, np.NPY_UINT8, self._mask).reshape((self._h, self._w, self._n), order='F')
# The _mask allocated by Masks is now handled by ndarray
PyArray_ENABLEFLAGS(ndarray, np.NPY_OWNDATA)
return ndarray
# internal conversion from Python RLEs object to compressed RLE format
def _toString(RLEs Rs):
cdef siz n = Rs.n
cdef bytes py_string
cdef char* c_string
objs = []
for i in range(n):
c_string = rleToString( <RLE*> &Rs._R[i] )
py_string = c_string
objs.append({
'size': [Rs._R[i].h, Rs._R[i].w],
'counts': py_string
})
free(c_string)
return objs
# internal conversion from compressed RLE format to Python RLEs object
def _frString(rleObjs):
cdef siz n = len(rleObjs)
Rs = RLEs(n)
cdef bytes py_string
cdef char* c_string
for i, obj in enumerate(rleObjs):
if PYTHON_VERSION == 2:
py_string = str(obj['counts']).encode('utf8')
elif PYTHON_VERSION == 3:
py_string = str.encode(obj['counts']) if type(obj['counts']) == str else obj['counts']
else:
raise Exception('Python version must be 2 or 3')
c_string = py_string
rleFrString( <RLE*> &Rs._R[i], <char*> c_string, obj['size'][0], obj['size'][1] )
return Rs
# encode mask to RLEs objects
# list of RLE string can be generated by RLEs member function
def encode(np.ndarray[np.uint8_t, ndim=3, mode='fortran'] mask):
h, w, n = mask.shape[0], mask.shape[1], mask.shape[2]
cdef RLEs Rs = RLEs(n)
rleEncode(Rs._R,<byte*>mask.data,h,w,n)
objs = _toString(Rs)
return objs
# decode mask from compressed list of RLE string or RLEs object
def decode(rleObjs):
cdef RLEs Rs = _frString(rleObjs)
h, w, n = Rs._R[0].h, Rs._R[0].w, Rs._n
masks = Masks(h, w, n)
rleDecode(<RLE*>Rs._R, masks._mask, n);
return np.array(masks)
def merge(rleObjs, intersect=0):
cdef RLEs Rs = _frString(rleObjs)
cdef RLEs R = RLEs(1)
rleMerge(<RLE*>Rs._R, <RLE*> R._R, <siz> Rs._n, intersect)
obj = _toString(R)[0]
return obj
def area(rleObjs):
cdef RLEs Rs = _frString(rleObjs)
cdef uint* _a = <uint*> malloc(Rs._n* sizeof(uint))
rleArea(Rs._R, Rs._n, _a)
cdef np.npy_intp shape[1]
shape[0] = <np.npy_intp> Rs._n
a = np.array((Rs._n, ), dtype=np.uint8)
a = np.PyArray_SimpleNewFromData(1, shape, np.NPY_UINT32, _a)
PyArray_ENABLEFLAGS(a, np.NPY_OWNDATA)
return a
# iou computation. support function overload (RLEs-RLEs and bbox-bbox).
def iou( dt, gt, pyiscrowd ):
def _preproc(objs):
if len(objs) == 0:
return objs
if type(objs) == np.ndarray:
if len(objs.shape) == 1:
objs = objs.reshape((objs[0], 1))
# check if it's Nx4 bbox
if not len(objs.shape) == 2 or not objs.shape[1] == 4:
raise Exception('numpy ndarray input is only for *bounding boxes* and should have Nx4 dimension')
objs = objs.astype(np.double)
elif type(objs) == list:
# check if list is in box format and convert it to np.ndarray
isbox = np.all(np.array([(len(obj)==4) and ((type(obj)==list) or (type(obj)==np.ndarray)) for obj in objs]))
isrle = np.all(np.array([type(obj) == dict for obj in objs]))
if isbox:
objs = np.array(objs, dtype=np.double)
if len(objs.shape) == 1:
objs = objs.reshape((1,objs.shape[0]))
elif isrle:
objs = _frString(objs)
else:
raise Exception('list input can be bounding box (Nx4) or RLEs ([RLE])')
else:
raise Exception('unrecognized type. The following type: RLEs (rle), np.ndarray (box), and list (box) are supported.')
return objs
def _rleIou(RLEs dt, RLEs gt, np.ndarray[np.uint8_t, ndim=1] iscrowd, siz m, siz n, np.ndarray[np.double_t, ndim=1] _iou):
rleIou( <RLE*> dt._R, <RLE*> gt._R, m, n, <byte*> iscrowd.data, <double*> _iou.data )
def _bbIou(np.ndarray[np.double_t, ndim=2] dt, np.ndarray[np.double_t, ndim=2] gt, np.ndarray[np.uint8_t, ndim=1] iscrowd, siz m, siz n, np.ndarray[np.double_t, ndim=1] _iou):
bbIou( <BB> dt.data, <BB> gt.data, m, n, <byte*> iscrowd.data, <double*>_iou.data )
def _len(obj):
cdef siz N = 0
if type(obj) == RLEs:
N = obj.n
elif len(obj)==0:
pass
elif type(obj) == np.ndarray:
N = obj.shape[0]
return N
# convert iscrowd to numpy array
cdef np.ndarray[np.uint8_t, ndim=1] iscrowd = np.array(pyiscrowd, dtype=np.uint8)
# simple type checking
cdef siz m, n
dt = _preproc(dt)
gt = _preproc(gt)
m = _len(dt)
n = _len(gt)
if m == 0 or n == 0:
return []
if not type(dt) == type(gt):
raise Exception('The dt and gt should have the same data type, either RLEs, list or np.ndarray')
# define local variables
cdef double* _iou = <double*> 0
cdef np.npy_intp shape[1]
# check type and assign iou function
if type(dt) == RLEs:
_iouFun = _rleIou
elif type(dt) == np.ndarray:
_iouFun = _bbIou
else:
raise Exception('input data type not allowed.')
_iou = <double*> malloc(m*n* sizeof(double))
iou = np.zeros((m*n, ), dtype=np.double)
shape[0] = <np.npy_intp> m*n
iou = np.PyArray_SimpleNewFromData(1, shape, np.NPY_DOUBLE, _iou)
PyArray_ENABLEFLAGS(iou, np.NPY_OWNDATA)
_iouFun(dt, gt, iscrowd, m, n, iou)
return iou.reshape((m,n), order='F')
def toBbox( rleObjs ):
cdef RLEs Rs = _frString(rleObjs)
cdef siz n = Rs.n
cdef BB _bb = <BB> malloc(4*n* sizeof(double))
rleToBbox( <const RLE*> Rs._R, _bb, n )
cdef np.npy_intp shape[1]
shape[0] = <np.npy_intp> 4*n
bb = np.array((1,4*n), dtype=np.double)
bb = np.PyArray_SimpleNewFromData(1, shape, np.NPY_DOUBLE, _bb).reshape((n, 4))
PyArray_ENABLEFLAGS(bb, np.NPY_OWNDATA)
return bb
def frBbox(np.ndarray[np.double_t, ndim=2] bb, siz h, siz w ):
cdef siz n = bb.shape[0]
Rs = RLEs(n)
rleFrBbox( <RLE*> Rs._R, <const BB> bb.data, h, w, n )
objs = _toString(Rs)
return objs
def frPoly( poly, siz h, siz w ):
cdef np.ndarray[np.double_t, ndim=1] np_poly
n = len(poly)
Rs = RLEs(n)
for i, p in enumerate(poly):
np_poly = np.array(p, dtype=np.double, order='F')
rleFrPoly( <RLE*>&Rs._R[i], <const double*> np_poly.data, int(len(p)/2), h, w )
objs = _toString(Rs)
return objs
def frUncompressedRLE(ucRles, siz h, siz w):
cdef np.ndarray[np.uint32_t, ndim=1] cnts
cdef RLE R
cdef uint *data
n = len(ucRles)
objs = []
for i in range(n):
Rs = RLEs(1)
cnts = np.array(ucRles[i]['counts'], dtype=np.uint32)
# time for malloc can be saved here but it's fine
data = <uint*> malloc(len(cnts)* sizeof(uint))
for j in range(len(cnts)):
data[j] = <uint> cnts[j]
R = RLE(ucRles[i]['size'][0], ucRles[i]['size'][1], len(cnts), <uint*> data)
Rs._R[0] = R
objs.append(_toString(Rs)[0])
return objs
def frPyObjects(pyobj, h, w):
# encode rle from a list of python objects
if type(pyobj) == np.ndarray:
objs = frBbox(pyobj, h, w)
elif type(pyobj) == list and len(pyobj[0]) == 4:
objs = frBbox(pyobj, h, w)
elif type(pyobj) == list and len(pyobj[0]) > 4:
objs = frPoly(pyobj, h, w)
elif type(pyobj) == list and type(pyobj[0]) == dict \
and 'counts' in pyobj[0] and 'size' in pyobj[0]:
objs = frUncompressedRLE(pyobj, h, w)
# encode rle from single python object
elif type(pyobj) == list and len(pyobj) == 4:
objs = frBbox([pyobj], h, w)[0]
elif type(pyobj) == list and len(pyobj) > 4:
objs = frPoly([pyobj], h, w)[0]
elif type(pyobj) == dict and 'counts' in pyobj and 'size' in pyobj:
objs = frUncompressedRLE([pyobj], h, w)[0]
else:
raise Exception('input type is not supported.')
return objs
/**************************************************************************
* Microsoft COCO Toolbox. version 2.0
* Data, paper, and tutorials available at: http://mscoco.org/
* Code written by Piotr Dollar and Tsung-Yi Lin, 2015.
* Licensed under the Simplified BSD License [see coco/license.txt]
**************************************************************************/
#include "maskApi.h"
#include <math.h>
#include <stdlib.h>
uint umin( uint a, uint b ) { return (a<b) ? a : b; }
uint umax( uint a, uint b ) { return (a>b) ? a : b; }
void rleInit( RLE *R, siz h, siz w, siz m, uint *cnts ) {
R->h=h; R->w=w; R->m=m; R->cnts=(m==0)?0:malloc(sizeof(uint)*m);
if(cnts) for(siz j=0; j<m; j++) R->cnts[j]=cnts[j];
}
void rleFree( RLE *R ) {
free(R->cnts); R->cnts=0;
}
void rlesInit( RLE **R, siz n ) {
*R = (RLE*) malloc(sizeof(RLE)*n);
for(siz i=0; i<n; i++) rleInit((*R)+i,0,0,0,0);
}
void rlesFree( RLE **R, siz n ) {
for(siz i=0; i<n; i++) rleFree((*R)+i); free(*R); *R=0;
}
void rleEncode( RLE *R, const byte *M, siz h, siz w, siz n ) {
siz i, j, k, a=w*h; uint c, *cnts; byte p;
cnts = malloc(sizeof(uint)*(a+1));
for(i=0; i<n; i++) {
const byte *T=M+a*i; k=0; p=0; c=0;
for(j=0; j<a; j++) { if(T[j]!=p) { cnts[k++]=c; c=0; p=T[j]; } c++; }
cnts[k++]=c; rleInit(R+i,h,w,k,cnts);
}
free(cnts);
}
void rleDecode( const RLE *R, byte *M, siz n ) {
for( siz i=0; i<n; i++ ) {
byte v=0; for( siz j=0; j<R[i].m; j++ ) {
for( siz k=0; k<R[i].cnts[j]; k++ ) *(M++)=v; v=!v; }}
}
void rleMerge( const RLE *R, RLE *M, siz n, bool intersect ) {
uint *cnts, c, ca, cb, cc, ct; bool v, va, vb, vp;
siz i, a, b, h=R[0].h, w=R[0].w, m=R[0].m; RLE A, B;
if(n==0) { rleInit(M,0,0,0,0); return; }
if(n==1) { rleInit(M,h,w,m,R[0].cnts); return; }
cnts = malloc(sizeof(uint)*(h*w+1));
for( a=0; a<m; a++ ) cnts[a]=R[0].cnts[a];
for( i=1; i<n; i++ ) {
B=R[i]; if(B.h!=h||B.w!=w) { h=w=m=0; break; }
rleInit(&A,h,w,m,cnts); ca=A.cnts[0]; cb=B.cnts[0];
v=va=vb=0; m=0; a=b=1; cc=0; ct=1;
while( ct>0 ) {
c=umin(ca,cb); cc+=c; ct=0;
ca-=c; if(!ca && a<A.m) { ca=A.cnts[a++]; va=!va; } ct+=ca;
cb-=c; if(!cb && b<B.m) { cb=B.cnts[b++]; vb=!vb; } ct+=cb;
vp=v; if(intersect) v=va&&vb; else v=va||vb;
if( v!=vp||ct==0 ) { cnts[m++]=cc; cc=0; }
}
rleFree(&A);
}
rleInit(M,h,w,m,cnts); free(cnts);
}
void rleArea( const RLE *R, siz n, uint *a ) {
for( siz i=0; i<n; i++ ) {
a[i]=0; for( siz j=1; j<R[i].m; j+=2 ) a[i]+=R[i].cnts[j]; }
}
void rleIou( RLE *dt, RLE *gt, siz m, siz n, byte *iscrowd, double *o ) {
siz g, d; BB db, gb; bool crowd;
db=malloc(sizeof(double)*m*4); rleToBbox(dt,db,m);
gb=malloc(sizeof(double)*n*4); rleToBbox(gt,gb,n);
bbIou(db,gb,m,n,iscrowd,o); free(db); free(gb);
for( g=0; g<n; g++ ) for( d=0; d<m; d++ ) if(o[g*m+d]>0) {
crowd=iscrowd!=NULL && iscrowd[g];
if(dt[d].h!=gt[g].h || dt[d].w!=gt[g].w) { o[g*m+d]=-1; continue; }
siz ka, kb, a, b; uint c, ca, cb, ct, i, u; bool va, vb;
ca=dt[d].cnts[0]; ka=dt[d].m; va=vb=0;
cb=gt[g].cnts[0]; kb=gt[g].m; a=b=1; i=u=0; ct=1;
while( ct>0 ) {
c=umin(ca,cb); if(va||vb) { u+=c; if(va&&vb) i+=c; } ct=0;
ca-=c; if(!ca && a<ka) { ca=dt[d].cnts[a++]; va=!va; } ct+=ca;
cb-=c; if(!cb && b<kb) { cb=gt[g].cnts[b++]; vb=!vb; } ct+=cb;
}
if(i==0) u=1; else if(crowd) rleArea(dt+d,1,&u);
o[g*m+d] = (double)i/(double)u;
}
}
void bbIou( BB dt, BB gt, siz m, siz n, byte *iscrowd, double *o ) {
double h, w, i, u, ga, da; siz g, d; bool crowd;
for( g=0; g<n; g++ ) {
BB G=gt+g*4; ga=G[2]*G[3]; crowd=iscrowd!=NULL && iscrowd[g];
for( d=0; d<m; d++ ) {
BB D=dt+d*4; da=D[2]*D[3]; o[g*m+d]=0;
w=fmin(D[2]+D[0],G[2]+G[0])-fmax(D[0],G[0]); if(w<=0) continue;
h=fmin(D[3]+D[1],G[3]+G[1])-fmax(D[1],G[1]); if(h<=0) continue;
i=w*h; u = crowd ? da : da+ga-i; o[g*m+d]=i/u;
}
}
}
void rleToBbox( const RLE *R, BB bb, siz n ) {
for( siz i=0; i<n; i++ ) {
uint h, w, x, y, xs, ys, xe, ye, cc, t; siz j, m;
h=(uint)R[i].h; w=(uint)R[i].w; m=R[i].m;
m=((siz)(m/2))*2; xs=w; ys=h; xe=ye=0; cc=0;
if(m==0) { bb[4*i+0]=bb[4*i+1]=bb[4*i+2]=bb[4*i+3]=0; continue; }
for( j=0; j<m; j++ ) {
cc+=R[i].cnts[j]; t=cc-j%2; y=t%h; x=(t-y)/h;
xs=umin(xs,x); xe=umax(xe,x); ys=umin(ys,y); ye=umax(ye,y);
}
bb[4*i+0]=xs; bb[4*i+2]=xe-xs+1;
bb[4*i+1]=ys; bb[4*i+3]=ye-ys+1;
}
}
void rleFrBbox( RLE *R, const BB bb, siz h, siz w, siz n ) {
for( siz i=0; i<n; i++ ) {
double xs=bb[4*i+0], xe=xs+bb[4*i+2];
double ys=bb[4*i+1], ye=ys+bb[4*i+3];
double xy[8] = {xs,ys,xs,ye,xe,ye,xe,ys};
rleFrPoly( R+i, xy, 4, h, w );
}
}
int uintCompare(const void *a, const void *b) {
uint c=*((uint*)a), d=*((uint*)b); return c>d?1:c<d?-1:0;
}
void rleFrPoly( RLE *R, const double *xy, siz k, siz h, siz w ) {
// upsample and get discrete points densely along entire boundary
siz j, m=0; double scale=5; int *x, *y, *u, *v; uint *a, *b;
x=malloc(sizeof(int)*(k+1)); y=malloc(sizeof(int)*(k+1));
for(j=0; j<k; j++) x[j]=(int)(scale*xy[j*2+0]+.5); x[k]=x[0];
for(j=0; j<k; j++) y[j]=(int)(scale*xy[j*2+1]+.5); y[k]=y[0];
for(j=0; j<k; j++) m+=umax(abs(x[j]-x[j+1]),abs(y[j]-y[j+1]))+1;
u=malloc(sizeof(int)*m); v=malloc(sizeof(int)*m); m=0;
for( j=0; j<k; j++ ) {
int xs=x[j], xe=x[j+1], ys=y[j], ye=y[j+1], dx, dy, t;
bool flip; double s; dx=abs(xe-xs); dy=abs(ys-ye);
flip = (dx>=dy && xs>xe) || (dx<dy && ys>ye);
if(flip) { t=xs; xs=xe; xe=t; t=ys; ys=ye; ye=t; }
s = dx>=dy ? (double)(ye-ys)/dx : (double)(xe-xs)/dy;
if(dx>=dy) for( int d=0; d<=dx; d++ ) {
t=flip?dx-d:d; u[m]=t+xs; v[m]=(int)(ys+s*t+.5); m++;
} else for( int d=0; d<=dy; d++ ) {
t=flip?dy-d:d; v[m]=t+ys; u[m]=(int)(xs+s*t+.5); m++;
}
}
// get points along y-boundary and downsample
free(x); free(y); k=m; m=0; double xd, yd;
x=malloc(sizeof(int)*k); y=malloc(sizeof(int)*k);
for( j=1; j<k; j++ ) if(u[j]!=u[j-1]) {
xd=(double)(u[j]<u[j-1]?u[j]:u[j]-1); xd=(xd+.5)/scale-.5;
if( floor(xd)!=xd || xd<0 || xd>w-1 ) continue;
yd=(double)(v[j]<v[j-1]?v[j]:v[j-1]); yd=(yd+.5)/scale-.5;
if(yd<0) yd=0; else if(yd>h) yd=h; yd=ceil(yd);
x[m]=(int) xd; y[m]=(int) yd; m++;
}
// compute rle encoding given y-boundary points
k=m; a=malloc(sizeof(uint)*(k+1));
for( j=0; j<k; j++ ) a[j]=(uint)(x[j]*(int)(h)+y[j]);
a[k++]=(uint)(h*w); free(u); free(v); free(x); free(y);
qsort(a,k,sizeof(uint),uintCompare); uint p=0;
for( j=0; j<k; j++ ) { uint t=a[j]; a[j]-=p; p=t; }
b=malloc(sizeof(uint)*k); j=m=0; b[m++]=a[j++];
while(j<k) if(a[j]>0) b[m++]=a[j++]; else {
j++; if(j<k) b[m-1]+=a[j++]; }
rleInit(R,h,w,m,b); free(a); free(b);
}
char* rleToString( const RLE *R ) {
// Similar to LEB128 but using 6 bits/char and ascii chars 48-111.
siz i, m=R->m, p=0; long x; bool more;
char *s=malloc(sizeof(char)*m*6);
for( i=0; i<m; i++ ) {
x=(long) R->cnts[i]; if(i>2) x-=(long) R->cnts[i-2]; more=1;
while( more ) {
char c=x & 0x1f; x >>= 5; more=(c & 0x10) ? x!=-1 : x!=0;
if(more) c |= 0x20; c+=48; s[p++]=c;
}
}
s[p]=0; return s;
}
void rleFrString( RLE *R, char *s, siz h, siz w ) {
siz m=0, p=0, k; long x; bool more; uint *cnts;
while( s[m] ) m++; cnts=malloc(sizeof(uint)*m); m=0;
while( s[p] ) {
x=0; k=0; more=1;
while( more ) {
char c=s[p]-48; x |= (c & 0x1f) << 5*k;
more = c & 0x20; p++; k++;
if(!more && (c & 0x10)) x |= -1 << 5*k;
}
if(m>2) x+=(long) cnts[m-2]; cnts[m++]=(uint) x;
}
rleInit(R,h,w,m,cnts); free(cnts);
}
/**************************************************************************
* Microsoft COCO Toolbox. version 2.0
* Data, paper, and tutorials available at: http://mscoco.org/
* Code written by Piotr Dollar and Tsung-Yi Lin, 2015.
* Licensed under the Simplified BSD License [see coco/license.txt]
**************************************************************************/
#pragma once
#include <stdbool.h>
typedef unsigned int uint;
typedef unsigned long siz;
typedef unsigned char byte;
typedef double* BB;
typedef struct { siz h, w, m; uint *cnts; } RLE;
// Initialize/destroy RLE.
void rleInit( RLE *R, siz h, siz w, siz m, uint *cnts );
void rleFree( RLE *R );
// Initialize/destroy RLE array.
void rlesInit( RLE **R, siz n );
void rlesFree( RLE **R, siz n );
// Encode binary masks using RLE.
void rleEncode( RLE *R, const byte *mask, siz h, siz w, siz n );
// Decode binary masks encoded via RLE.
void rleDecode( const RLE *R, byte *mask, siz n );
// Compute union or intersection of encoded masks.
void rleMerge( const RLE *R, RLE *M, siz n, bool intersect );
// Compute area of encoded masks.
void rleArea( const RLE *R, siz n, uint *a );
// Compute intersection over union between masks.
void rleIou( RLE *dt, RLE *gt, siz m, siz n, byte *iscrowd, double *o );
// Compute intersection over union between bounding boxes.
void bbIou( BB dt, BB gt, siz m, siz n, byte *iscrowd, double *o );
// Get bounding boxes surrounding encoded masks.
void rleToBbox( const RLE *R, BB bb, siz n );
// Convert bounding boxes to encoded masks.
void rleFrBbox( RLE *R, const BB bb, siz h, siz w, siz n );
// Convert polygon to encoded mask.
void rleFrPoly( RLE *R, const double *xy, siz k, siz h, siz w );
// Get compressed string representation of encoded mask.
char* rleToString( const RLE *R );
// Convert from compressed string representation of encoded mask.
void rleFrString( RLE *R, char *s, siz h, siz w );
@@ -8,7 +8,7 @@
 # <https://opensource.org/licenses/BSD-2-Clause>
 #
 # ------------------------------------------------------------
-"""Compile the cython extensions."""
+"""Build cython extensions."""
 from __future__ import absolute_import
 from __future__ import division
@@ -16,34 +16,25 @@ from __future__ import print_function
 from distutils.extension import Extension
 from distutils.core import setup
-import os
 from Cython.Distutils import build_ext
 import numpy as np
 ext_modules = [
     Extension(
-        'install.lib.utils.cython_bbox',
+        'seetadet.utils.bbox.cython_bbox',
         ['cython_bbox.pyx'],
         extra_compile_args=['-w'],
-        include_dirs=[np.get_include()]
+        include_dirs=[np.get_include()],
     ),
     Extension(
-        'install.lib.utils.cython_nms',
+        'seetadet.utils.nms.cython_nms',
        ['cython_nms.pyx'],
         extra_compile_args=['-w'],
-        include_dirs=[np.get_include()]
+        include_dirs=[np.get_include()],
     ),
-    Extension(
-        'install.lib.utils.pycocotools._mask',
-        ['maskApi.c', '_mask.pyx'],
-        include_dirs=[np.get_include(), os.path.dirname(os.path.abspath(__file__))],
-        extra_compile_args=['-w']
-    ),
 ]
-setup(
-    name='SeetaDet',
-    ext_modules=ext_modules,
-    cmdclass={'build_ext': build_ext},
-)
+setup(name='seetadet',
+      ext_modules=ext_modules,
+      cmdclass={'build_ext': build_ext})
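The extensions can be built in place with `python setup.py build_ext --inplace` (assuming Cython and NumPy are available), which drops the compiled modules into the `seetadet` package tree.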
# Datasets
## Introduction
This folder is kept for the record and JSON datasets.
Please prepare the datasets following the [documentation](../../scripts/datasets/README.md).
# Demo Images
## Introduction
This folder is kept for the demo images.
# Pretrained Models
## Introduction
This folder is kept for the pretrained models.
## ImageNet Pretrained Models
### Training settings
- ResNet models trained for 200 epochs follow the procedure in arXiv:1812.01187.
### ResNet
| Model | Lr sched | Acc@1 | Acc@5 | Source |
| :---: | :------: | :---: | :---: | :----: |
| [R-50](https://dragon.seetatech.com/download/seetadet/pretrained/R-50_in1k_cls90e.pkl) | 90e | 76.53 | 93.16 | Ours |
| [R-50](https://dragon.seetatech.com/download/seetadet/pretrained/R-50_in1k_cls200e.pkl) | 200e | 78.64 | 94.30 | Ours |
### MobileNet
| Model | Lr sched | Acc@1 | Acc@5 | Source |
| :---: | :------: | :---: | :---: | :----: |
| [MobileNetV2](https://dragon.seetatech.com/download/seetadet/pretrained/MobileNetV2_in1k_cls300e.pkl) | 300e | 71.88 | 90.29 | TorchVision |
| [MobileNetV3L](https://dragon.seetatech.com/download/seetadet/pretrained/MobileNetV3L_in1k_cls600e.pkl) | 600e | 74.04 | 91.34 | TorchVision |
### VGG
| Model | Lr sched | Acc@1 | Acc@5 | Source |
| :---: | :------: | :---: | :---: | :----: |
| [VGG-16-FCN](https://dragon.seetatech.com/download/seetadet/pretrained/VGG-16-FCN_in1k.pkl) | - | - | - | weiliu89 |
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Make record file for COCO dataset."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import shutil
from maker import make_record
from roidb import make_database
if __name__ == '__main__':
COCO_ROOT = '/data'
if not os.path.exists('build'):
os.makedirs('build')
# Encode masks to RLE bytes.
make_database('train', '2017', COCO_ROOT)
make_database('val', '2017', COCO_ROOT)
# coco_2017_train
make_record(
db_file='build/coco_2017_train.db.pkl',
record_file=os.path.join(COCO_ROOT, 'coco_2017_train'),
images_path=[os.path.join(COCO_ROOT, 'images/train2017')],
splits_path=[os.path.join(COCO_ROOT, 'splits')],
splits=['train2017'],
)
# coco_2017_val
make_record(
db_file='build/coco_2017_val.db.pkl',
record_file=os.path.join(COCO_ROOT, 'coco_2017_val'),
images_path=[os.path.join(COCO_ROOT, 'images/val2017')],
splits_path=[os.path.join(COCO_ROOT, 'splits')],
splits=['val2017'],
)
shutil.rmtree('build')
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
import os
import pickle
import time
import cv2
import dragon
import numpy as np
def make_example(image_file, objects, im_scale=None):
filename = os.path.split(image_file)[-1]
example = {'id': filename.split('.')[0], 'object': []}
if im_scale:
img = cv2.imread(image_file)
img = cv2.resize(
img, None,
fx=im_scale, fy=im_scale,
interpolation=cv2.INTER_LINEAR,
)
example['height'], example['width'], example['depth'] = img.shape
_, img = cv2.imencode('.jpg', img, [int(cv2.IMWRITE_JPEG_QUALITY), 95])
example['content'] = img.tobytes()
else:
with open(image_file, 'rb') as f:
img_bytes = bytes(f.read())
img = cv2.imdecode(np.frombuffer(img_bytes, 'uint8'), 3)
example['height'], example['width'], example['depth'] = img.shape
example['content'] = img_bytes
for obj in objects:
x1, y1, x2, y2 = obj['bbox']
example['object'].append({
'name': obj['name'],
'xmin': x1,
'ymin': y1,
'xmax': x2,
'ymax': y2,
'mask': obj['mask'],
'polygons': obj['polygons'],
'difficult': obj.get('crowd', 0),
})
return example
def make_record(
record_file,
images_path,
db_file,
splits_path,
splits,
ext='.jpg',
im_scale=None,
):
if os.path.exists(record_file):
raise ValueError('The record file already exists.')
os.makedirs(record_file)
if not isinstance(images_path, list):
images_path = [images_path]
if not isinstance(splits_path, list):
splits_path = [splits_path]
assert len(splits) == len(splits_path)
assert len(splits) == len(images_path)
if db_file is not None:
with open(db_file, 'rb') as f:
all_entries = pickle.load(f)
else:
all_entries = {}
print('Start Time:', time.strftime("%a, %d %b %Y %H:%M:%S", time.gmtime()))
writer = dragon.io.KPLRecordWriter(
path=record_file,
protocol={
'id': 'string',
'content': 'bytes',
'height': 'int64',
'width': 'int64',
'depth': 'int64',
'object': [{
'name': 'string',
'xmin': 'float64',
'ymin': 'float64',
'xmax': 'float64',
'ymax': 'float64',
'mask': 'bytes',
'polygons': [['float64']],
'difficult': 'int64',
}]
}
)
count, total_line = 0, 0
start_time = time.time()
for db_idx, split in enumerate(splits):
split_file = os.path.join(splits_path[db_idx], split + '.txt')
if not os.path.exists(split_file):
# Fall back to a JSON split file if the text file is missing.
split_file = os.path.join(splits_path[db_idx], split + '.json')
if not os.path.exists(split_file):
raise FileNotFoundError('Unable to find the split: ' + split)
with open(split_file, 'r') as f:
import json
images_info = json.load(f)
total_line += len(images_info['images'])
lines = []
for info in images_info['images']:
lines.append(os.path.splitext(info['file_name'])[0])
else:
with open(split_file, 'r') as f:
lines = f.readlines()
total_line += len(lines)
for line in lines:
count += 1
if count % 2000 == 0:
now_time = time.time()
print('{} / {} in {:.2f} sec'.format(
count, total_line, now_time - start_time))
filename = line.strip()
image_file = os.path.join(images_path[db_idx], filename + ext)
objects = all_entries.get(filename, [])
writer.write(make_example(image_file, objects, im_scale))
now_time = time.time()
print('{} / {} in {:.2f} sec'.format(count, total_line, now_time - start_time))
writer.close()
end_time = time.time()
data_size = os.path.getsize(record_file + '/root.data') * 1e-6
print('{} images take {:.2f} MB in {:.2f} sec.'
.format(total_line, data_size, end_time - start_time))
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import os
import os.path as osp
import pickle
from seetadet.utils.pycocotools import mask_utils
from seetadet.utils.pycocotools.coco import COCO
class COCOWrapper(object):
def __init__(self, image_set, year, data_dir):
self._year = year
self._image_set = image_set
self._data_path = data_dir
self.invalid_cnt = 0
self.ignore_cnt = 0
# Load COCO API, classes, class <-> id mappings
self._COCO = COCO(self._get_ann_file())
cats = self._COCO.loadCats(self._COCO.getCatIds())
self._classes = tuple(['__background__'] + [c['name'] for c in cats])
self._class_to_ind = dict(zip(self._classes, range(self.num_classes)))
self._ind_to_class = dict(zip(range(self.num_classes), self._classes))
self._class_to_cat_id = dict(zip([c['name'] for c in cats], self._COCO.getCatIds()))
self._cat_id_to_class_id = dict([(self._class_to_cat_id[cls], self._class_to_ind[cls])
for cls in self._classes[1:]])
self._data_name = {
# 5k ``val2014`` subset
'minival2014': 'val2014',
# ``val2014`` minus ``minival2014``
'valminusminival2014': 'val2014',
}.get(image_set + year, image_set + year)
self._image_index = self._load_image_set_index()
self._annotations = self._load_annotations()
def _get_ann_file(self):
prefix = 'instances' \
if self._image_set.find('test') == -1 \
else 'image_info'
return osp.join(
self._data_path,
'annotations',
prefix + '_' +
self._image_set +
self._year + '.json'
)
def _load_image_set_index(self):
"""Load image ids."""
image_ids = self._COCO.getImgIds()
return image_ids
def _load_annotations(self):
"""Load annotations."""
annotations = [self._load_coco_annotation(index)
for index in self._image_index]
return annotations
def image_path_from_index(self, index):
"""Construct an image path from the image's "index" identifier."""
# Example image path for index=119993:
# images/train2014/COCO_train2014_000000119993.jpg
# images/train2017/000000119993.jpg
filename = str(index).zfill(12) + '.jpg'
if '2014' in self._data_name:
filename = 'COCO_{}_{}'.format(self._data_name, filename)
image_path = osp.join(self._data_path, 'images',
self._data_name, filename)
assert osp.exists(image_path), \
'Path does not exist: {}'.format(image_path)
return image_path
def image_path_at(self, i):
"""Return the absolute path to image i in the image sequence."""
return self.image_path_from_index(self._image_index[i])
def annotation_at(self, i):
"""Return the absolute path to image i in the image sequence."""
return self._annotations[i]
def _load_coco_annotation(self, index):
"""Loads COCO bounding-box instance annotations."""
im_ann = self._COCO.loadImgs(index)[0]
width, height = im_ann['width'], im_ann['height']
ann_ids = self._COCO.getAnnIds(imgIds=index, iscrowd=None)
objects = self._COCO.loadAnns(ann_ids)
# Sanitize boxes -- some are invalid
valid_objects = []
mask, polygons = b'', []
for obj in objects:
x1 = float(max(0, obj['bbox'][0]))
y1 = float(max(0, obj['bbox'][1]))
x2 = float(min(width - 1, x1 + max(0, obj['bbox'][2] - 1)))
y2 = float(min(height - 1, y1 + max(0, obj['bbox'][3] - 1)))
if isinstance(obj['segmentation'], list):
# Valid polygons have >= 3 points, so require >= 6 coordinates.
for p in obj['segmentation']:
if len(p) < 6:
print('Removing invalid segmentation.')
polygons = [p for p in obj['segmentation'] if len(p) >= 6]
else:
# Crowd masks
# Some are encoded with height or width
# running out of the image bound
# Do not use them or decoding error is inevitable
mask = mask_utils.poly2bytes(obj['segmentation'], height, width)
if obj['area'] > 0 and x2 > x1 and y2 > y1:
obj['clean_bbox'] = [x1, y1, x2, y2]
class_id = self._cat_id_to_class_id[obj['category_id']]
valid_objects.append({
'bbox': [x1, y1, x2, y2],
'mask': mask,
'polygons': polygons,
'category_id': obj['category_id'],
'class_id': class_id,
'name': self._ind_to_class[class_id],
'crowd': obj['iscrowd'],
})
return height, width, valid_objects
@property
def num_images(self):
return len(self._image_index)
@property
def num_classes(self):
return len(self._classes)
def make_database(split, year, data_dir):
coco = COCOWrapper(split, year, data_dir)
print('Preparing to make split: {}, total {} images'
.format(split, coco.num_images))
if not osp.exists(osp.join(coco._data_path, 'splits')):
os.makedirs(osp.join(coco._data_path, 'splits'))
entries = collections.OrderedDict()
for i in range(coco.num_images):
filename = osp.basename(coco.image_path_at(i)).split('.')[0]
h, w, objects = coco.annotation_at(i)
entries[filename] = objects
with open(osp.join('build',
'coco_' + year + '_' + split +
'.db.pkl'), 'wb') as f:
pickle.dump(entries, f, pickle.HIGHEST_PROTOCOL)
with open(osp.join(coco._data_path, 'splits',
split + year + '.txt'), 'w') as f:
for i in range(coco.num_images):
filename = str(osp.basename(coco.image_path_at(i)).split('.')[0])
if i != coco.num_images - 1:
filename += '\n'
f.write(filename)
def merge_database(split, year, db_files):
entries = collections.OrderedDict()
data_path = os.path.dirname(db_files[0])
for db_file in db_files:
with open(db_file, 'rb') as f:
# Merge the loaded entries instead of overwriting the accumulator.
entries.update(pickle.load(f))
with open(osp.join(data_path,
'coco_' + year + '_' + split +
'.db.pkl'), 'wb') as f:
pickle.dump(entries, f, pickle.HIGHEST_PROTOCOL)
# Prepare Datasets
## Create Datasets for PASCAL VOC
We assume that the raw dataset has the following structure:
```
VOC<year>
|_ JPEGImages
| |_ <im-1-name>.jpg
| |_ ...
| |_ <im-N-name>.jpg
|_ Annotations
| |_ <im-1-name>.xml
| |_ ...
| |_ <im-N-name>.xml
|_ ImageSets
| |_ Main
| | |_ trainval.txt
| | |_ test.txt
| | |_ ...
```
Create the record and JSON datasets by:
```
python pascal_voc.py \
--rec /path/to/datasets/voc_trainval0712 \
--gt /path/to/datasets/voc_trainval0712.json \
--images /path/to/VOC2007/JPEGImages \
/path/to/VOC2012/JPEGImages \
--annotations /path/to/VOC2007/Annotations \
/path/to/VOC2012/Annotations \
--splits /path/to/VOC2007/ImageSets/Main/trainval.txt \
/path/to/VOC2012/ImageSets/Main/trainval.txt
```
## Create Datasets for COCO
We assume that the raw dataset has the following structure:
```
COCO
|_ images
| |_ train2017
| | |_ <im-1-name>.jpg
| | |_ ...
| | |_ <im-N-name>.jpg
|_ annotations
| |_ instances_train2017.json
| |_ ...
```
Create the record dataset by:
```
python coco.py \
--rec /path/to/datasets/coco_train2017 \
--images /path/to/COCO/images/train2017 \
--annotations /path/to/COCO/annotations/instances_train2017.json
```
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Prepare MS COCO datasets."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import os
import sys
import time
import dragon
from pycocotools.coco import COCO
from pycocotools.mask import frPyObjects
def parse_args():
"""Parse arguments."""
parser = argparse.ArgumentParser(
description='Prepare MS COCO datasets')
parser.add_argument(
'--rec',
default=None,
help='path to write record dataset')
parser.add_argument(
'--images',
nargs='+',
type=str,
default=None,
help='path of images folder')
parser.add_argument(
'--annotations',
nargs='+',
type=str,
default=None,
help='path of annotations folder')
parser.add_argument(
'--splits',
nargs='+',
type=str,
default=None,
help='path of split file')
if len(sys.argv) == 1:
parser.print_help()
sys.exit(1)
return parser.parse_args()
def make_example(img_id, img_file, cocoGt):
"""Return the record example."""
img_meta = cocoGt.imgs[img_id]
img_anns = cocoGt.loadAnns(cocoGt.getAnnIds(imgIds=[img_id]))
cat_id_to_cat = dict((v['id'], v['name'])
for v in cocoGt.cats.values())
with open(img_file, 'rb') as f:
img_bytes = bytes(f.read())
height, width = img_meta['height'], img_meta['width']
example = {'id': str(img_id), 'height': height, 'width': width,
'depth': 3, 'content': img_bytes, 'object': []}
for ann in img_anns:
x1 = float(max(0, ann['bbox'][0]))
y1 = float(max(0, ann['bbox'][1]))
x2 = float(min(width - 1, x1 + max(0, ann['bbox'][2] - 1)))
y2 = float(min(height - 1, y1 + max(0, ann['bbox'][3] - 1)))
mask, polygons = b'', []
segm = ann.get('segmentation', None)
if segm is not None and isinstance(segm, list):
# Valid polygons have >= 3 points, so require >= 6 coordinates.
for p in segm:
if len(p) < 6:
print('Removing invalid segmentation.')
polygons = [p for p in segm if len(p) >= 6]
elif segm is not None:
# Crowd masks.
# Some are encoded with wrong height or width.
# Do not use them or decoding error is inevitable.
rle = frPyObjects(ann['segmentation'], height, width)
assert isinstance(rle, dict)
mask = rle['counts']
example['object'].append({
'name': cat_id_to_cat[ann['category_id']],
'xmin': x1, 'ymin': y1, 'xmax': x2, 'ymax': y2,
'mask': mask, 'polygons': polygons,
'difficult': ann.get('iscrowd', 0)})
return example
def write_dataset(args):
assert len(args.images) == len(args.annotations)
if os.path.exists(args.rec):
raise ValueError('The record path already exists.')
os.makedirs(args.rec)
print('Write record dataset to {}'.format(args.rec))
writer = dragon.io.KPLRecordWriter(
path=args.rec,
protocol={
'id': 'string',
'content': 'bytes',
'height': 'int64',
'width': 'int64',
'depth': 'int64',
'object': [{
'name': 'string',
'xmin': 'float64',
'ymin': 'float64',
'xmax': 'float64',
'ymax': 'float64',
'mask': 'bytes',
'polygons': [['float64']],
'difficult': 'int64',
}]
}
)
# Scan all available entries.
print('Scan entries...')
entries, cocoGts = [], []
for ann_file in args.annotations:
cocoGts.append(COCO(ann_file))
if args.splits is not None:
assert len(args.splits) == len(args.images)
for i, split in enumerate(args.splits):
f = open(split, 'r')
for line in f.readlines():
filename = line.strip()
img_id = int(filename)
img_file = os.path.join(args.images[i], filename + '.jpg')
entries.append((img_id, img_file, cocoGts[i]))
f.close()
else:
for i, cocoGt in enumerate(cocoGts):
for info in cocoGt.imgs.values():
img_id = info['id']
img_file = os.path.join(args.images[i], info['file_name'])
entries.append((img_id, img_file, cocoGts[i]))
print('Start Time:', time.strftime("%a, %d %b %Y %H:%M:%S", time.gmtime()))
start_time = time.time()
for i, entry in enumerate(entries):
if i > 0 and i % 2000 == 0:
now_time = time.time()
print('{} / {} in {:.2f} sec'.format(
i, len(entries), now_time - start_time))
writer.write(make_example(*entry))
now_time = time.time()
print('{} / {} in {:.2f} sec'.format(
len(entries), len(entries), now_time - start_time))
writer.close()
end_time = time.time()
data_size = os.path.getsize(args.rec + '/root.data') * 1e-6
print('{} images take {:.2f} MB in {:.2f} sec.'
.format(len(entries), data_size, end_time - start_time))
if __name__ == '__main__':
args = parse_args()
if args.rec is not None:
write_dataset(args)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Prepare JSON datasets."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import json
import os
import sys
import dragon
def parse_args():
"""Parse arguments."""
parser = argparse.ArgumentParser(
description='Prepare JSON datasets')
parser.add_argument(
'--rec',
default=None,
help='path to read record')
parser.add_argument(
'--gt',
default=None,
help='path to write json ground-truth')
parser.add_argument(
'--categories',
nargs='+',
type=str,
default=None,
help='dataset object categories')
if len(sys.argv) == 1:
parser.print_help()
sys.exit(1)
return parser.parse_args()
def get_image_id(image_name):
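"""Return the integer id parsed from an image name, or the name itself.

e.g. 'COCO_val2014_000000119993' -> 119993; non-numeric names are
returned unchanged via the ValueError fallback.
"""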
image_id = image_name.split('_')[-1].split('.')[0]
try:
return int(image_id)
except ValueError:
return image_name
def write_dataset(args):
dataset = {'images': [], 'categories': [], 'annotations': []}
kpl_dataset = dragon.io.KPLRecordDataset(args.rec)
cat_to_cat_id = dict(zip(args.categories,
range(1, len(args.categories) + 1)))
print('Writing json dataset to {}'.format(args.gt))
for cat in args.categories:
dataset['categories'].append({
'name': cat, 'id': cat_to_cat_id[cat]})
for _ in range(len(kpl_dataset)):
example = kpl_dataset.get()
image_id = get_image_id(example['id'])
dataset['images'].append({
'id': image_id, 'height': example['height'],
'width': example['width']})
for obj in example['object']:
if 'x2' in obj:
x1, y1, x2, y2 = obj['x1'], obj['y1'], obj['x2'], obj['y2']
elif 'xmin' in obj:
x1, y1, x2, y2 = obj['xmin'], obj['ymin'], obj['xmax'], obj['ymax']
else:
x1, y1, x2, y2 = obj['bbox']
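# Convert corner boxes (x1, y1, x2, y2) to COCO-style (x, y, w, h) with inclusive extents.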
w, h = x2 - x1 + 1, y2 - y1 + 1
dataset['annotations'].append({
'id': str(len(dataset['annotations'])),
'bbox': [x1, y1, w, h],
'area': w * h,
'iscrowd': obj.get('difficult', 0),
'image_id': image_id,
'category_id': cat_to_cat_id[obj['name']]})
with open(args.gt, 'w') as f:
json.dump(dataset, f)
if __name__ == '__main__':
args = parse_args()
if args.rec is None or not os.path.exists(args.rec):
raise ValueError('Specify the prepared record dataset.')
if args.gt is None:
raise ValueError('Specify the path to write json dataset.')
write_dataset(args)
@@ -8,27 +8,67 @@
 # <https://opensource.org/licenses/BSD-2-Clause>
 #
 # ------------------------------------------------------------
+"""Prepare PASCAL VOC datasets."""
 from __future__ import absolute_import
 from __future__ import division
 from __future__ import print_function
+import argparse
 import os
+import sys
 import time
 import cv2
 import dragon
 import numpy as np
-import xml.etree.ElementTree as ET
+import xml.etree.ElementTree
-def make_example(image_file, xml_file):
-    tree = ET.parse(xml_file)
+def parse_args():
+    """Parse arguments."""
+    parser = argparse.ArgumentParser(
+        description='Prepare PASCAL VOC datasets')
+    parser.add_argument(
+        '--rec',
+        default=None,
+        help='path to write record dataset')
+    parser.add_argument(
+        '--gt',
+        default=None,
+        help='path to write json dataset')
+    parser.add_argument(
+        '--images',
+        nargs='+',
+        type=str,
+        default=None,
+        help='path of images folder')
+    parser.add_argument(
+        '--annotations',
+        nargs='+',
+        type=str,
+        default=None,
+        help='path of annotations folder')
+    parser.add_argument(
+        '--splits',
+        nargs='+',
+        type=str,
+        default=None,
+        help='path of split file')
+    if len(sys.argv) == 1:
+        parser.print_help()
+        sys.exit(1)
+    return parser.parse_args()
+def make_example(img_file, xml_file):
+    """Return the record example."""
+    tree = xml.etree.ElementTree.parse(xml_file)
     filename = os.path.split(xml_file)[-1]
-    objs = tree.findall('object')
+    objects = tree.findall('object')
     size = tree.find('size')
     example = {'id': filename.split('.')[0], 'object': []}
-    with open(image_file, 'rb') as f:
+    with open(img_file, 'rb') as f:
         img_bytes = bytes(f.read())
     if size is not None:
         example['height'] = int(size.find('height').text)
@@ -38,7 +78,7 @@ def make_example(image_file, xml_file):
     img = cv2.imdecode(np.frombuffer(img_bytes, 'uint8'), 3)
     example['height'], example['width'], example['depth'] = img.shape
     example['content'] = img_bytes
-    for ix, obj in enumerate(objs):
+    for obj in objects:
         bbox = obj.find('bndbox')
         is_diff = 0
         if obj.find('difficult') is not None:
@@ -49,35 +89,21 @@ def make_example(image_file, xml_file):
             'ymin': float(bbox.find('ymin').text),
             'xmax': float(bbox.find('xmax').text),
             'ymax': float(bbox.find('ymax').text),
-            'difficult': is_diff,
-        })
+            'difficult': is_diff})
     return example
-def make_record(
-    record_file,
-    images_path,
-    annotations_path,
-    splits_path,
-    splits
-):
-    if os.path.exists(record_file):
-        raise ValueError('The record file already exists.')
-    os.makedirs(record_file)
-    if not isinstance(images_path, list):
-        images_path = [images_path]
-    if not isinstance(annotations_path, list):
-        annotations_path = [annotations_path]
-    if not isinstance(splits_path, list):
-        splits_path = [splits_path]
-    assert len(splits) == len(splits_path)
-    assert len(splits) == len(images_path)
-    assert len(splits) == len(annotations_path)
+def write_dataset(args):
+    """Write the record dataset."""
+    assert len(args.splits) == len(args.images)
+    assert len(args.splits) == len(args.annotations)
+    if os.path.exists(args.rec):
+        raise ValueError('The record path already exists.')
+    os.makedirs(args.rec)
+    print('Write record dataset to {}'.format(args.rec))
     writer = dragon.io.KPLRecordWriter(
-        path=record_file,
+        path=args.rec,
         protocol={
             'id': 'string',
             'content': 'bytes',
@@ -95,36 +121,56 @@ def make_record(
         }
     )
-    # Scan all available entries
+    # Scan all available entries.
     print('Scan entries...')
     entries = []
-    for i, split in enumerate(splits):
-        split_file = os.path.join(splits_path[i], split + '.txt')
-        with open(split_file, 'r') as f:
+    for i, split in enumerate(args.splits):
+        with open(split, 'r') as f:
             lines = f.readlines()
         for line in lines:
             filename = line.strip()
-            img_file = os.path.join(images_path[i], filename + '.jpg')
-            ann_file = os.path.join(annotations_path[i], filename + '.xml')
+            img_file = os.path.join(args.images[i], filename + '.jpg')
+            ann_file = os.path.join(args.annotations[i], filename + '.xml')
             entries.append((img_file, ann_file))
-    # Parse and write into record file
+    # Parse and write into record file.
     print('Start Time:', time.strftime("%a, %d %b %Y %H:%M:%S", time.gmtime()))
     start_time = time.time()
-    for i, (img_file, ann_file) in enumerate(entries):
+    for i, (img_file, xml_file) in enumerate(entries):
         if i > 0 and i % 2000 == 0:
             now_time = time.time()
             print('{} / {} in {:.2f} sec'.format(
                 i, len(entries), now_time - start_time))
-        writer.write(make_example(img_file, ann_file))
+        writer.write(make_example(img_file, xml_file))
     now_time = time.time()
     print('{} / {} in {:.2f} sec'.format(
         len(entries), len(entries), now_time - start_time))
     writer.close()
     end_time = time.time()
-    data_size = os.path.getsize(record_file + '/root.data') * 1e-6
+    data_size = os.path.getsize(args.rec + '/root.data') * 1e-6
     print('{} images take {:.2f} MB in {:.2f} sec.'
           .format(len(entries), data_size, end_time - start_time))
+def write_json_dataset(args):
+    """Write the json dataset."""
+    categories = ['aeroplane', 'bicycle', 'bird', 'boat',
+                  'bottle', 'bus', 'car', 'cat', 'chair',
+                  'cow', 'diningtable', 'dog', 'horse',
+                  'motorbike', 'person', 'pottedplant',
+                  'sheep', 'sofa', 'train', 'tvmonitor']
+    import subprocess
+    script = os.path.dirname(os.path.abspath(__file__)) + '/json_dataset.py'
+    cmd = '{} {} '.format(sys.executable, script)
+    cmd += '--rec {} --gt {} '.format(args.rec, args.gt)
+    cmd += '--categories {} '.format(' '.join(categories))
+    return subprocess.call(cmd, shell=True)
+if __name__ == '__main__':
+    args = parse_args()
+    if args.rec is not None:
+        write_dataset(args)
+    if args.gt is not None:
+        write_json_dataset(args)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import time
import cv2
import dragon
import numpy as np
import xml.etree.ElementTree as ET
def make_example(image_file, xml_file):
tree = ET.parse(xml_file)
filename = os.path.split(xml_file)[-1]
objs = tree.findall('object')
example = {'id': filename.split('.')[0], 'object': []}
with open(image_file, 'rb') as f:
img_bytes = bytes(f.read())
img = cv2.imdecode(np.frombuffer(img_bytes, 'uint8'), 1)
example['height'], example['width'], example['depth'] = img.shape
example['content'] = img_bytes
for ix, obj in enumerate(objs):
bbox = obj.find('bndbox')
is_diff = 0
if obj.find('difficult') is not None:
is_diff = int(obj.find('difficult').text) == 1
example['object'].append({
'name': obj.find('name').text.strip(),
'x1': float(bbox.find('x1').text),
'y1': float(bbox.find('y1').text),
'x2': float(bbox.find('x2').text),
'y2': float(bbox.find('y2').text),
'x3': float(bbox.find('x3').text),
'y3': float(bbox.find('y3').text),
'x4': float(bbox.find('x4').text),
'y4': float(bbox.find('y4').text),
'difficult': is_diff,
})
return example
def make_record(
record_file,
images_path,
annotations_path,
splits_path,
splits
):
if os.path.exists(record_file):
raise ValueError('The record file already exists.')
os.makedirs(record_file)
if not isinstance(images_path, list):
images_path = [images_path]
if not isinstance(annotations_path, list):
annotations_path = [annotations_path]
if not isinstance(splits_path, list):
splits_path = [splits_path]
assert len(splits) == len(splits_path)
assert len(splits) == len(images_path)
assert len(splits) == len(annotations_path)
print('Start Time:', time.strftime("%a, %d %b %Y %H:%M:%S", time.gmtime()))
writer = dragon.io.KPLRecordWriter(
path=record_file,
protocol={
'id': 'string',
'content': 'bytes',
'height': 'int64',
'width': 'int64',
'depth': 'int64',
'object': [{
'name': 'string',
'x1': 'float64',
'y1': 'float64',
'x2': 'float64',
'y2': 'float64',
'x3': 'float64',
'y3': 'float64',
'x4': 'float64',
'y4': 'float64',
'difficult': 'int64',
}]
}
)
# Scan all available entries
print('Scan entries...')
entries = []
for i, split in enumerate(splits):
split_file = os.path.join(splits_path[i], split + '.txt')
with open(split_file, 'r') as f:
lines = f.readlines()
for line in lines:
filename = line.strip()
img_file = os.path.join(images_path[i], filename + '.jpg')
ann_file = os.path.join(annotations_path[i], filename + '.xml')
entries.append((img_file, ann_file))
# Parse and write into record file
print('Start Time:', time.strftime("%a, %d %b %Y %H:%M:%S", time.gmtime()))
start_time = time.time()
for i, (img_file, ann_file) in enumerate(entries):
if i > 0 and i % 2000 == 0:
now_time = time.time()
print('{} / {} in {:.2f} sec'.format(
i, len(entries), now_time - start_time))
writer.write(make_example(img_file, ann_file))
now_time = time.time()
print('{} / {} in {:.2f} sec'.format(
len(entries), len(entries), now_time - start_time))
writer.close()
end_time = time.time()
data_size = os.path.getsize(record_file + '/root.data') * 1e-6
print('{} images take {:.2f} MB in {:.2f} sec.'
.format(len(entries), data_size, end_time - start_time))
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Make record file for VOC dataset."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from os import path as osp
from maker import make_record
if __name__ == '__main__':
voc_root = '/data'
make_record(
record_file=osp.join(voc_root, 'voc_0712_trainval'),
images_path=[osp.join(voc_root, 'VOCdevkit2007/VOC2007/JPEGImages'),
osp.join(voc_root, 'VOCdevkit2012/VOC2012/JPEGImages')],
annotations_path=[osp.join(voc_root, 'VOCdevkit2007/VOC2007/Annotations'),
osp.join(voc_root, 'VOCdevkit2012/VOC2012/Annotations')],
splits_path=[osp.join(voc_root, 'VOCdevkit2007/VOC2007/ImageSets/Main'),
osp.join(voc_root, 'VOCdevkit2012/VOC2012/ImageSets/Main')],
splits=['trainval', 'trainval']
)
make_record(
record_file=osp.join(voc_root, 'voc_2007_test'),
images_path=osp.join(voc_root, 'VOCdevkit2007/VOC2007/JPEGImages'),
annotations_path=osp.join(voc_root, 'VOCdevkit2007/VOC2007/Annotations'),
splits_path=osp.join(voc_root, 'VOCdevkit2007/VOC2007/ImageSets/Main'),
splits=['test']
)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.core.config import cfg
class AnchorSampler(object):
"""Sample precomputed anchors asynchronously."""
def __init__(self):
self._rpn_target = None
self._retinanet_target = None
self._ssd_target = None
if 'rcnn' in cfg.MODEL.TYPE:
from seetadet.algo.faster_rcnn import anchor_target
self._rpn_target = anchor_target.AnchorTarget()
elif cfg.MODEL.TYPE == 'retinanet':
from seetadet.algo.retinanet import anchor_target
self._retinanet_target = anchor_target.AnchorTarget()
elif cfg.MODEL.TYPE == 'ssd':
from seetadet.algo.ssd import anchor_target
self._ssd_target = anchor_target.AnchorTarget()
def __call__(self, **inputs):
"""Return the sample anchors."""
if self._rpn_target:
fg_inds, bg_inds = \
self._rpn_target.sample_anchors(
gt_boxes=inputs['gt_boxes'],
im_info=inputs['im_info'],
)
return {'fg_inds': fg_inds, 'bg_inds': bg_inds}
if self._retinanet_target:
fg_inds, ignore_inds = \
self._retinanet_target.sample_anchors(
gt_boxes=inputs['gt_boxes'],
im_info=inputs['im_info'],
)
return {'fg_inds': fg_inds, 'bg_inds': ignore_inds}
if self._ssd_target:
fg_inds, neg_inds = \
self._ssd_target.sample_anchors(
gt_boxes=inputs['gt_boxes'],
)
return {'fg_inds': fg_inds, 'bg_inds': neg_inds}
return {}
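A minimal usage sketch (hypothetical inputs; assumes a config has been loaded so that `cfg.MODEL.TYPE` selects one of the branches above):

```python
import numpy as np
from seetadet.algo.common import AnchorSampler

# One hypothetical ground-truth box (x1, y1, x2, y2, class) and the
# (height, width, scale) info produced by the data transformer.
gt_boxes = np.array([[10., 20., 200., 180., 1.]], dtype='float32')
im_info = (600, 800, 1.0)

sampler = AnchorSampler()  # Picks the target module from cfg.MODEL.TYPE.
targets = sampler(gt_boxes=gt_boxes, im_info=im_info)
# For an rcnn-style model this yields {'fg_inds': ..., 'bg_inds': ...};
# an empty dict is returned when no sampler matches cfg.MODEL.TYPE.
```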
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import math
import numpy as np
import numpy.random as npr
from seetadet.algo.faster_rcnn import generate_anchors as anchor_util
from seetadet.algo.faster_rcnn import utils as rcnn_util
from seetadet.core.config import cfg
from seetadet.utils import boxes as box_util
from seetadet.utils.env import new_tensor
class AnchorTarget(object):
"""Assign ground-truth targets to anchors."""
def __init__(self):
super(AnchorTarget, self).__init__()
# Load the basic configs
self.scales = cfg.RPN.SCALES
self.strides = cfg.RPN.STRIDES
self.ratios = cfg.RPN.ASPECT_RATIOS
self.num_strides = len(self.strides)
# Generate base anchors
self.base_anchors = []
for i in range(self.num_strides):
self.base_anchors.append(
anchor_util.generate_anchors(
self.strides[i],
self.ratios,
np.array([self.scales[i]])
if self.num_strides > 1
else np.array(self.scales)))
# Plan the maximum shifted anchor layout
max_size = max(cfg.TRAIN.MAX_SIZE, max(cfg.TRAIN.SCALES))
if cfg.MODEL.COARSEST_STRIDE > 0:
stride = float(cfg.MODEL.COARSEST_STRIDE)
max_size = int(math.ceil(max_size / stride) * stride)
self.max_shapes = [[math.ceil(max_size / stride)] * 2
for stride in self.strides]
self.all_coords = rcnn_util.get_shifted_coords(
self.max_shapes, self.base_anchors)
self.all_anchors = rcnn_util.get_shifted_anchors(
self.max_shapes, self.base_anchors, self.strides)
def sample_anchors(self, gt_boxes, im_info, all_anchors=None):
if all_anchors is None:
all_anchors = self.all_anchors
# Only keep anchors inside the image
# to get higher quality proposals.
inds_inside = np.where(
(all_anchors[:, 0] >= 0) &
(all_anchors[:, 1] >= 0) &
(all_anchors[:, 2] < im_info[1]) &
(all_anchors[:, 3] < im_info[0]))[0]
anchors = all_anchors[inds_inside, :]
num_inside = len(inds_inside)
labels = np.empty((num_inside,), 'int32')
labels.fill(-1)
# Overlaps between the anchors and the gt boxes.
overlaps = box_util.bbox_overlaps(anchors, gt_boxes)
argmax_overlaps = overlaps.argmax(axis=1)
max_overlaps = overlaps[np.arange(num_inside), argmax_overlaps]
# Overlaps between the gt boxes and anchors with highest IoU.
gt_argmax_overlaps = overlaps.argmax(axis=0)
gt_max_overlaps = overlaps[gt_argmax_overlaps, np.arange(overlaps.shape[1])]
gt_argmax_overlaps = np.where(overlaps == gt_max_overlaps)[0]
# Foreground: for each gt, anchor with highest overlap.
labels[gt_argmax_overlaps] = 1
# Foreground: above threshold IoU.
labels[max_overlaps >= cfg.RPN.POSITIVE_OVERLAP] = 1
# Background: below threshold IoU.
labels[max_overlaps < cfg.RPN.NEGATIVE_OVERLAP] = 0
# If the thresholds left no foreground anchors, fall back to the best match for each gt.
fg_inds = np.where(labels == 1)[0]
if len(fg_inds) == 0:
labels[gt_argmax_overlaps] = 1
fg_inds = np.where(labels == 1)[0]
# Subsample positive labels if we have too many.
num_fg = int(cfg.RPN.FG_FRACTION * cfg.RPN.BATCH_SIZE)
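# e.g. with the common FG_FRACTION=0.5 and BATCH_SIZE=256, at most 128 anchors stay foreground.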
if len(fg_inds) > num_fg:
fg_inds = npr.choice(fg_inds, num_fg, False)
# Subsample negative labels if we have too many.
num_bg = cfg.RPN.BATCH_SIZE - len(fg_inds)
bg_inds = np.where(labels == 0)[0]
if len(bg_inds) > num_bg:
bg_inds = npr.choice(bg_inds, num_bg, False)
return inds_inside[fg_inds], inds_inside[bg_inds]
def __call__(self, **inputs):
num_images = cfg.TRAIN.IMS_PER_BATCH
shapes = [f.shape[-2:] for f in inputs['features']]
image_stride = sum(self.base_anchors[i].shape[0] * np.prod(shapes[i])
for i in range(len(inputs['features'])))
narrow_args = [self.all_coords, self.base_anchors, self.max_shapes, shapes]
outputs = collections.defaultdict(list)
for ix in range(num_images):
fg_inds = inputs['fg_inds'][ix]
bg_inds = inputs['bg_inds'][ix]
gt_boxes = inputs['gt_boxes'][ix]
# Narrow anchors to match the feature layout
anchors = self.all_anchors[fg_inds]
bg_inds = rcnn_util.narrow_anchors(*(narrow_args + [bg_inds]))
_, anchors = rcnn_util.narrow_anchors(*(narrow_args + [fg_inds, anchors]))
fg_inds = rcnn_util.narrow_anchors(*(narrow_args + [fg_inds]))
# Compute bbox targets
gt_assignment = box_util.bbox_overlaps(anchors, gt_boxes).argmax(axis=1)
bbox_targets = box_util.bbox_transform(anchors, gt_boxes[gt_assignment, :4])
outputs['bbox_anchors'].append(anchors)
outputs['bbox_targets'].append(bbox_targets)
# Compute sparse indices
fg_inds += ix * image_stride
bg_inds += ix * image_stride
outputs['cls_inds'].extend([fg_inds, bg_inds])
outputs['bbox_inds'].extend([fg_inds])
outputs['labels'].extend([np.ones_like(fg_inds, 'float32'),
np.zeros_like(bg_inds, 'float32')])
return {
'labels': new_tensor(
np.concatenate(outputs['labels'])),
'cls_inds': new_tensor(
np.concatenate(outputs['cls_inds'])),
'bbox_inds': new_tensor(
np.concatenate(outputs['bbox_inds'])),
'bbox_targets': new_tensor(
np.concatenate(outputs['bbox_targets']).astype('float32')),
'bbox_anchors': new_tensor(
np.concatenate(outputs['bbox_anchors']).astype('float32')),
}
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import multiprocessing as mp
import time
import threading
import queue
import dragon
import dragon.vm.torch as torch
import numpy as np
from seetadet.algo.faster_rcnn import data_transformer
from seetadet.core.config import cfg
from seetadet.datasets.factory import get_dataset
from seetadet.utils import blob as blob_util
from seetadet.utils import logger
class DataLoader(object):
"""Load mini-batches of data."""
def __init__(self):
super(DataLoader, self).__init__()
dataset = get_dataset(cfg.TRAIN.DATASET)
self.iterator = Iterator(**{
'dataset': dataset.cls,
'source': dataset.source,
'classes': dataset.classes,
'shuffle': cfg.TRAIN.USE_SHUFFLE,
'batch_size': cfg.TRAIN.IMS_PER_BATCH * 2,
'num_transformers': cfg.TRAIN.NUM_THREADS - 1,
})
self.iterator.start()
def __call__(self):
outputs = self.iterator.next()
if isinstance(outputs['image'], np.ndarray):
outputs['image'] = torch.from_numpy(outputs['image'])
return outputs
class Iterator(threading.Thread):
"""Iterator to return the batch of data."""
def __init__(self, **kwargs):
super(Iterator, self).__init__()
# Distributed settings
rank, group_size = 0, 1
process_group = dragon.distributed.get_group()
if process_group is not None and \
kwargs.get('phase', 'TRAIN') == 'TRAIN':
group_size = process_group.size
rank = dragon.distributed.get_rank(process_group)
# Configuration
self._batch_size = kwargs.get('batch_size', 2)
self._num_readers = kwargs.get('num_readers', 1)
self._num_transformers = kwargs.get('num_transformers', 3)
self.daemon = True
# Initialize queues
num_batches = self._num_readers
self._queue1 = mp.Queue(num_batches * self._batch_size)
self._queue2 = mp.Queue(num_batches * self._batch_size)
self._queue3 = queue.Queue(num_batches)
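# Pipeline: readers -> queue1 -> transformers -> queue2 -> this thread -> queue3 -> consumer.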
# Initialize readers
self._readers = []
for i in range(self._num_readers):
part_idx, num_parts = i, self._num_readers
num_parts *= group_size
part_idx += rank * self._num_readers
self._readers.append(dragon.io.DataReader(
part_idx=part_idx, num_parts=num_parts, **kwargs))
self._readers[i]._seed += part_idx
self._readers[i].q_out = self._queue1
self._readers[i].start()
time.sleep(0.1)
# Initialize transformers
self._transformers = []
for i in range(self._num_transformers):
p = data_transformer.DataTransformer(**kwargs)
p._seed += (i + rank * self._num_transformers)
p.q_in, p.q_out = self._queue1, self._queue2
p.start()
self._transformers.append(p)
time.sleep(0.1)
# Register cleanup callbacks
def cleanup():
def terminate(processes):
for p in processes:
p.terminate()
p.join()
terminate(self._transformers)
logger.info('Terminate DataTransformer.')
terminate(self._readers)
logger.info('Terminate DataReader.')
import atexit
atexit.register(cleanup)
def next(self):
"""Return the next batch of data."""
return self.__next__()
def run(self):
"""Main loop."""
num_images = cfg.TRAIN.IMS_PER_BATCH
num_batches = cfg.TRAIN.ASPECT_GROUPING
logger.info('Initialize prefetching batches...')
example_buffer = [self._queue2.get()
for _ in range(num_images * num_batches)]
next_examples = []
while True:
# Use cached buffer for next N examples
# Examples are sorted to simulate aspect grouping
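# e.g. with IMS_PER_BATCH=2 and ASPECT_GROUPING=64, 128 examples are buffered and re-sorted per refill.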
if len(next_examples) == 0:
next_examples = example_buffer
next_examples.sort(key=lambda d: d['aspect_ratio'])
example_buffer = []
# Prepare the next batch
outputs = collections.defaultdict(list)
for i in range(num_images):
example = next_examples.pop(0)
outputs['image'].append(example['image'])
outputs['gt_boxes'].append(example['boxes'])
outputs['im_info'].append(example['im_info'])
outputs['fg_inds'].append(example.get('fg_inds', None))
outputs['bg_inds'].append(example.get('bg_inds', None))
example_buffer.append(self._queue2.get())
outputs['image'] = blob_util.im_list_to_blob(
outputs['image'], coarsest_stride=cfg.MODEL.COARSEST_STRIDE)
# Send batch data to consumer
self._queue3.put(outputs)
def __iter__(self):
"""Return the iterator self."""
return self
def __next__(self):
"""Return the next batch of data."""
return self._queue3.get()
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import multiprocessing
import cv2
import numpy as np
import numpy.random as npr
from seetadet.algo import common as algo_common
from seetadet.core.config import cfg
from seetadet.datasets.example import Example
from seetadet.utils import boxes as box_util
from seetadet.utils import image as image_util
class DataTransformer(multiprocessing.Process):
"""DataTransformer."""
def __init__(self, **kwargs):
super(DataTransformer, self).__init__()
self._scales = cfg.TRAIN.SCALES
self._random_scales = cfg.TRAIN.RANDOM_SCALES
self._max_size = cfg.TRAIN.MAX_SIZE
self._seed = cfg.RNG_SEED
self._use_diff = cfg.TRAIN.USE_DIFF
self._use_flipped = cfg.TRAIN.USE_FLIPPED
self._use_distort = cfg.TRAIN.USE_COLOR_JITTER
self._classes = kwargs.get('classes', ('__background__',))
self._num_classes = len(self._classes)
self._class_to_ind = dict(zip(self._classes, range(self._num_classes)))
self._anchor_sampler = algo_common.AnchorSampler()
self.q_in = self.q_out = None
self.daemon = True
def get_boxes(self, example, im_scale, im_offset, flipped):
objects, num_objects = example.objects, 0
height, width = example.height, example.width
if not self._use_diff:
for obj in objects:
if obj.get('difficult', 0) == 0:
num_objects += 1
else:
num_objects = len(objects)
boxes = np.zeros((num_objects, 4), 'float32')
gt_classes = np.zeros((num_objects,), 'float32')
# Filter the difficult instances.
object_idx = 0
for obj in objects:
if not self._use_diff and obj.get('difficult', 0) > 0:
continue
bbox = obj['bbox']
boxes[object_idx, :] = [max(0, bbox[0]),
max(0, bbox[1]),
min(bbox[2], width - 1),
min(bbox[3], height - 1)]
gt_classes[object_idx] = self._class_to_ind[obj['name']]
object_idx += 1
# Flip the boxes if necessary.
if flipped:
boxes = box_util.flip_boxes(boxes, width)
# Scale the boxes to the detecting scale.
boxes *= im_scale
# Offset the boxes to align the cropping.
if im_offset is not None:
boxes[:, 0::2] += im_offset[1]
boxes[:, 1::2] += im_offset[0]
boxes[:, :] = np.minimum(
np.maximum(boxes[:, :], 0),
[im_offset[2][1] - 1, im_offset[2][0] - 1] * 2)
# Attach the classes.
gt_boxes = np.empty((num_objects, 5), dtype=np.float32)
gt_boxes[:, :4], gt_boxes[:, 4] = boxes, gt_classes
return gt_boxes
def get(self, example):
example = Example(example)
# Resize.
target_size = npr.choice(self._scales)
img, im_scale = image_util.resize_image_with_target_size(
example.image,
target_size=target_size,
max_size=self._max_size,
random_scales=self._random_scales,
)
# Flip.
flipped = False
if self._use_flipped and npr.randint(2) > 0:
img = img[:, ::-1]
flipped = True
# Crop or Pad.
im_offset = None
if self._max_size == 0:
img, im_offset = image_util.get_image_with_target_size(
img, target_size)
# Distort.
if self._use_distort:
img = image_util.distort_image(img)
# Boxes.
boxes = self.get_boxes(example, im_scale, im_offset, flipped)
# Standard outputs.
outputs = {'image': img,
'boxes': boxes,
'im_info': img.shape[:2] + (im_scale,)}
# Attach precomputed targets.
if len(boxes) > 0:
outputs.update(
self._anchor_sampler(
gt_boxes=boxes,
im_info=outputs['im_info']))
return outputs
def run(self):
# Disable the opencv threading.
cv2.setNumThreads(1)
# Fix the process-local random seed.
np.random.seed(self._seed)
# Main prefetch loop
while True:
outputs = self.get(self.q_in.get())
if len(outputs['boxes']) < 1:
continue # Ignore non-object image.
height, width = outputs['image'].shape[:2]
outputs['aspect_ratio'] = float(height) / float(width)
self.q_out.put(outputs)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# Codes are based on:
#
# <https://github.com/rbgirshick/py-faster-rcnn/blob/master/lib/rpn/generate_anchors.py>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
# Verify that we compute the same anchors as Shaoqing's matlab implementation:
#
# >> load output/rpn_cachedir/faster_rcnn_VOC2007_ZF_stage1_rpn/anchors.mat
# >> anchors
#
# anchors =
#
# -83 -39 100 56
# -175 -87 192 104
# -359 -183 376 200
# -55 -55 72 72
# -119 -119 136 136
# -247 -247 264 264
# -35 -79 52 96
# -79 -167 96 184
# -167 -343 184 360
# array([[ -83., -39., 100., 56.],
# [-175., -87., 192., 104.],
# [-359., -183., 376., 200.],
# [ -55., -55., 72., 72.],
# [-119., -119., 136., 136.],
# [-247., -247., 264., 264.],
# [ -35., -79., 52., 96.],
# [ -79., -167., 96., 184.],
# [-167., -343., 184., 360.]])
def generate_anchors(
base_size=16,
ratios=(0.5, 1, 2),
scales=2**np.arange(3, 6),
):
"""
Generate anchor (reference) windows by enumerating aspect ratios X
scales wrt a reference (0, 0, 15, 15) window.
"""
base_anchor = np.array([1, 1, base_size, base_size]) - 1
ratio_anchors = _ratio_enum(base_anchor, ratios)
anchors = np.vstack([_scale_enum(ratio_anchors[i, :], scales)
for i in range(ratio_anchors.shape[0])])
return anchors
def generate_anchors_v2(
stride=16,
ratios=(0.5, 1, 2),
sizes=(32, 64, 128, 256, 512),
):
"""
Generates a matrix of anchor boxes in (x1, y1, x2, y2) format. Anchors
are centered on stride / 2, have (approximate) sqrt areas of the specified
sizes, and aspect ratios as given.
"""
return generate_anchors(
base_size=stride,
ratios=ratios,
scales=np.array(sizes, dtype='float64') / stride,
)
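# e.g. generate_anchors_v2(stride=16, sizes=(128,)) is equivalent to
# generate_anchors(base_size=16, scales=np.array([8.])): one anchor per
# aspect ratio with sqrt(area) close to 128.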
def _whctrs(anchor):
"""Return width, height, x center, and y center for an anchor (window)."""
w = anchor[2] - anchor[0] + 1
h = anchor[3] - anchor[1] + 1
x_ctr = anchor[0] + 0.5 * (w - 1)
y_ctr = anchor[1] + 0.5 * (h - 1)
return w, h, x_ctr, y_ctr
def _mkanchors(ws, hs, x_ctr, y_ctr):
"""
Given a vector of widths (ws) and heights (hs) around a center
(x_ctr, y_ctr), output a set of anchors (windows).
"""
ws = ws[:, np.newaxis]
hs = hs[:, np.newaxis]
anchors = np.hstack((x_ctr - 0.5 * (ws - 1),
y_ctr - 0.5 * (hs - 1),
x_ctr + 0.5 * (ws - 1),
y_ctr + 0.5 * (hs - 1)))
return anchors
def _ratio_enum(anchor, ratios):
"""Enumerate a set of anchors for each aspect ratio wrt an anchor."""
w, h, x_ctr, y_ctr = _whctrs(anchor)
size = w * h
size_ratios = size / ratios
ws = np.round(np.sqrt(size_ratios))
hs = np.round(ws * ratios)
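# e.g. for the (0, 0, 15, 15) base anchor and ratios (0.5, 1, 2), this
# yields (w, h) pairs of (23, 12), (16, 16), and (11, 22).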
anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
return anchors
def _scale_enum(anchor, scales):
"""Enumerate a set of anchors for each scale wrt an anchor."""
w, h, x_ctr, y_ctr = _whctrs(anchor)
ws = w * scales
hs = h * scales
anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
return anchors
if __name__ == '__main__':
print(generate_anchors())
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import numpy as np
from seetadet.algo.faster_rcnn import generate_anchors as anchor_util
from seetadet.algo.faster_rcnn import utils as rcnn_util
from seetadet.core.config import cfg
from seetadet.utils import boxes as box_util
from seetadet.utils import nms
class Proposal(object):
"""Compute proposals by applying transformations anchors."""
def __init__(self):
super(Proposal, self).__init__()
# Load basic configs
self.scales = cfg.RPN.SCALES
self.strides = cfg.RPN.STRIDES
self.ratios = cfg.RPN.ASPECT_RATIOS
self.num_strides = len(self.strides)
self.defaults = collections.OrderedDict([
('rois', np.array([[-1, 0, 0, 1, 1]], 'float32'))])
self.bbox_transform_clip = \
np.log(max(cfg.TRAIN.MAX_SIZE,
max(cfg.TRAIN.SCALES)) / min(self.strides))
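# Clamp size deltas at log(max_size / min_stride) so exp(dw), exp(dh) stay finite and image-bounded.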
# Generate base anchors
self.base_anchors = []
for i in range(self.num_strides):
self.base_anchors.append(
anchor_util.generate_anchors(
self.strides[i],
self.ratios,
np.array([self.scales[i]])
if self.num_strides > 1
else np.array(self.scales)))
def __call__(self, **inputs):
num_images = cfg.TRAIN.IMS_PER_BATCH
pre_nms_top_n = cfg.TRAIN.RPN_PRE_NMS_TOP_N
post_nms_top_n = cfg.TRAIN.RPN_POST_NMS_TOP_N
nms_thresh = cfg.TRAIN.RPN_NMS_THRESH
# Get resources
shapes = [f.shape[-2:] for f in inputs['features']]
all_anchors = rcnn_util.get_shifted_anchors(
shapes, self.base_anchors, self.strides)
# Prepare for the outputs
batch_rois = []
cls_prob = inputs['cls_prob'].numpy()
# (?, 4, A * K) -> (?, A * K, 4)
bbox_pred = inputs['bbox_pred'].numpy()
bbox_pred = bbox_pred.transpose((0, 2, 1))
# Extract RoIs separately
for ix in range(num_images):
# [?, N] -> [? * N, 1]
scores = cls_prob[ix].reshape((-1, 1))
deltas = bbox_pred[ix]
im_info = inputs['im_info'][ix]
if pre_nms_top_n <= 0 or pre_nms_top_n >= len(scores):
order = np.argsort(-scores.squeeze())
else:
# Avoid sorting possibly large arrays; first partition to get the top K
# unsorted, then sort just those (~20x faster for 200k scores).
inds = np.argpartition(-scores.squeeze(), pre_nms_top_n)[:pre_nms_top_n]
order = np.argsort(-scores[inds].squeeze())
order = inds[order]
deltas = deltas[order]
anchors = all_anchors[order]
scores = scores[order]
# Convert anchors into proposals via bbox transformations
proposals = box_util.bbox_transform_inv(
anchors, deltas, clip=self.bbox_transform_clip)
# Clip predicted boxes to image
proposals = box_util.clip_tiled_boxes(proposals, im_info[:2])
# Apply nms (e.g. threshold = 0.7)
# Take after_nms_topN (e.g. 300)
# Return the top proposals (-> RoIs top)
keep = nms.gpu_nms(np.hstack((proposals, scores)), nms_thresh)
if post_nms_top_n > 0:
keep = keep[:post_nms_top_n]
proposals = proposals[keep, :]
# Attach RoIs with batch indices
batch_inds = np.empty((proposals.shape[0], 1), 'float32')
batch_inds.fill(ix)
rpn_rois = np.hstack((batch_inds, proposals.astype('float32', copy=False)))
batch_rois.append(rpn_rois)
# Merge RoIs into a blob
return np.concatenate(batch_rois, 0)
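# Added layout note: each row of the merged blob is
# (batch_idx, x1, y1, x2, y2), so consumers can recover per-image
# proposals from the first column, e.g.:
#   rois_i = rois[rois[:, 0].astype('int32') == i]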
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import numpy as np
import numpy.random as npr
from seetadet.algo.faster_rcnn import utils as rcnn_util
from seetadet.core.config import cfg
from seetadet.utils import boxes as box_util
from seetadet.utils.env import new_tensor
class ProposalTarget(object):
"""Assign ground-truth targets to proposals."""
def __init__(self):
super(ProposalTarget, self).__init__()
self.num_strides = len(cfg.RPN.STRIDES)
self.num_classes = len(cfg.MODEL.CLASSES)
self.defaults = collections.OrderedDict([
('rois', np.array([[-1, 0, 0, 1, 1]], 'float32')),
('labels', np.array([-1], 'int64')),
('bbox_targets', np.zeros((1, 4), 'float32')),
])
def __call__(self, **inputs):
num_images = cfg.TRAIN.IMS_PER_BATCH
# Proposal ROIs (0, x1, y1, x2, y2) coming from RPN
all_rois = inputs['rois']
# Prepare for the outputs
keys = self.defaults.keys()
blobs = {key: [] for key in keys}
# Generate targets separately
for ix in range(num_images):
# GT boxes (x1, y1, x2, y2, label)
gt_boxes = inputs['gt_boxes'][ix]
# Extract proposals for this image
rois = all_rois[np.where(all_rois[:, 0].astype('int32') == ix)[0]]
# Include ground-truth boxes in the set of candidate rois
inds = np.ones((gt_boxes.shape[0], 1), gt_boxes.dtype) * ix
rois = np.vstack((rois, np.hstack((inds, gt_boxes[:, :4]))))
# Sample a batch of RoIs for training
rois_per_image = cfg.FRCNN.BATCH_SIZE
fg_rois_per_image = np.round(cfg.FRCNN.FG_FRACTION * rois_per_image)
rcnn_util.map_returns_to_blobs(
sample_rois(rois,
gt_boxes,
rois_per_image,
fg_rois_per_image),
blobs, keys,
)
# Stack into continuous blobs
blobs = dict((k, np.concatenate(blobs[k])) for k in blobs.keys())
if self.num_strides > 1:
# Distribute RoIs into pyramids
min_lvl = cfg.FPN.ROI_MIN_LEVEL
max_lvl = cfg.FPN.ROI_MAX_LEVEL
num_levels = max_lvl - min_lvl + 1
levels = rcnn_util.map_rois_to_levels(blobs['rois'], min_lvl, max_lvl)
lvl_blobs = rcnn_util.map_blobs_by_levels(
blobs,
self.defaults,
[np.where(levels == (i + min_lvl))[0] for i in range(num_levels)],
)
blobs = dict((k, np.concatenate(lvl_blobs[k])) for k in blobs.keys())
rois_wide = [lvl_blobs['rois'][i] for i in range(num_levels)]
else:
# Return RoIs directly for specified stride
rois_wide = [blobs['rois']]
# Select the foreground RoIs only for bbox branch
fg_inds = np.where(blobs['labels'] > 0)[0]
cls_inds = np.arange(len(blobs['rois'])) * self.num_classes
return {
'rois': [new_tensor(rois) for rois in rois_wide],
'labels': new_tensor(blobs['labels']),
'bbox_inds': new_tensor(cls_inds[fg_inds] + blobs['labels'][fg_inds]),
'bbox_targets': new_tensor(blobs['bbox_targets'][fg_inds].astype('float32')),
'bbox_anchors': new_tensor(blobs['rois'][fg_inds, 1:].astype('float32')),
}
def sample_rois(all_rois, gt_boxes, num_rois, num_fg_rois):
"""Sample a batch of RoIs comprising foreground and background examples."""
overlaps = box_util.bbox_overlaps(all_rois[:, 1:5], gt_boxes[:, :4])
gt_assignment = overlaps.argmax(axis=1)
max_overlaps = overlaps.max(axis=1)
labels = gt_boxes[gt_assignment, 4].astype('int64')
# Select foreground RoIs as those with >= POSITIVE_OVERLAP
fg_thresh = cfg.FRCNN.POSITIVE_OVERLAP
fg_inds = np.where(max_overlaps >= fg_thresh)[0]
while fg_inds.size == 0:
fg_thresh -= 0.01
fg_inds = np.where(max_overlaps >= fg_thresh)[0]
# Sample foreground regions without replacement
fg_rois_per_this_image = int(min(num_fg_rois, fg_inds.size))
fg_inds = npr.choice(fg_inds, fg_rois_per_this_image, False)
# Select background RoIs as those within
# [NEGATIVE_OVERLAP_LO, NEGATIVE_OVERLAP_HI)
bg_inds = np.where((max_overlaps < cfg.FRCNN.NEGATIVE_OVERLAP_HI) &
(max_overlaps >= cfg.FRCNN.NEGATIVE_OVERLAP_LO))[0]
# Compute number of background RoIs to take from this image
bg_rois_per_this_image = num_rois - fg_rois_per_this_image
bg_rois_per_this_image = min(bg_rois_per_this_image, bg_inds.size)
# Sample background regions without replacement
if bg_inds.size > 0:
bg_inds = npr.choice(bg_inds, bg_rois_per_this_image, False)
# The indices that we're selecting (both fg and bg)
keep_inds = np.append(fg_inds, bg_inds)
# Select sampled values from various arrays
rois, labels = all_rois[keep_inds], labels[keep_inds]
# Clamp labels for the background RoIs to 0
labels[fg_rois_per_this_image:] = 0
# Compute the target from RoIs
outputs = [rois, labels]
outputs += [box_util.bbox_transform(
rois[:, 1:5],
gt_boxes[gt_assignment[keep_inds], :4],
cfg.BBOX_REG_WEIGHTS)]
return outputs
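# Added note: the return order [rois (R, 5), labels (R,),
# bbox_targets (R, 4)] matches ProposalTarget.defaults, which is what
# lets map_returns_to_blobs() pair values with blob keys positionally.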
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import types
import dragon.vm.torch as torch
import numpy as np
from seetadet.core.config import cfg
from seetadet.modeling.detector import new_detector
from seetadet.utils import blob as blob_util
from seetadet.utils import boxes as box_util
from seetadet.utils import image as image_util
from seetadet.utils import logger
from seetadet.utils import nms as nms_util
from seetadet.utils import time_util
def get_data(raw_images):
"""Return the test data."""
max_size = cfg.TEST.MAX_SIZE
images_wide = []
image_shapes_wide, image_scales_wide = [], []
for img in raw_images:
images, image_scales = image_util.scale_image(
img, scales=cfg.TEST.SCALES, max_size=max_size)
images_wide += images
image_scales_wide += image_scales
image_shapes_wide += [img.shape[:2] for img in images]
images = blob_util.im_list_to_blob(
images_wide, coarsest_stride=cfg.MODEL.COARSEST_STRIDE)
image_shapes = np.array(image_shapes_wide)
image_scales = np.array(image_scales_wide).reshape((len(images), -1))
images_info = np.hstack([image_shapes, image_scales]).astype('float32')
return images, images_info
def ims_detect(detector, raw_images, timer=None):
"""Detect images at single or multiple scales."""
images, images_info = get_data(raw_images)
timer.tic() if timer else timer
# Do forward.
inputs = {'image': torch.from_numpy(images),
'im_info': torch.from_numpy(images_info)}
if not hasattr(detector, 'script_forward'):
def script_forward(self, image, im_info):
return self.forward({'image': image, 'im_info': im_info})
detector.script_forward = torch.jit.trace(
func=types.MethodType(script_forward, detector),
example_inputs=[inputs['image'], inputs['im_info']],
)
outputs = detector.script_forward(inputs['image'], inputs['im_info'])
outputs = dict((k, outputs[k].numpy()) for k in outputs.keys())
# Decode results.
batch_pred = box_util.bbox_transform_inv(
outputs['rois'][:, 1:5],
outputs['bbox_pred'],
cfg.BBOX_REG_WEIGHTS)
results = [([], []) for _ in range(len(raw_images))]
for i in range(len(images)):
ii = i // len(cfg.TEST.SCALES)
inds = np.where(outputs['rois'][:, 0].astype(np.int32) == i)[0]
boxes = batch_pred[inds] / images_info[i][2]
boxes = box_util.clip_tiled_boxes(boxes, raw_images[ii].shape)
results[ii][0].append(outputs['cls_prob'][inds])
results[ii][1].append(boxes)
# Merge from multiple scales.
ret = [(np.vstack(s), np.vstack(b)) for s, b in results]
timer.toc() if timer else timer
return ret
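# Added note: per raw image, the merged pair is
#   scores: (N, num_classes) class probabilities
#   boxes:  (N, 4 * num_classes) class-wise box predictions,
# where N sums the proposals over all test scales of that image.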
def get_detections(outputs):
"""Return the categorical detections from outputs."""
scores, boxes = outputs
boxes_this_image = [[]]
empty_detections = np.zeros((0, 5), 'float32')
for j in range(1, len(cfg.MODEL.CLASSES)):
inds = np.where(scores[:, j] > cfg.TEST.SCORE_THRESH)[0]
if len(inds) == 0:
boxes_this_image.append(empty_detections)
continue
cls_scores = scores[inds, j]
cls_boxes = boxes[inds, j * 4:(j + 1) * 4]
cls_detections = np.hstack(
(cls_boxes, cls_scores[:, np.newaxis])) \
.astype(np.float32, copy=False)
if cfg.TEST.USE_SOFT_NMS:
keep = nms_util.soft_nms(
cls_detections,
thresh=cfg.TEST.NMS,
method=cfg.TEST.SOFT_NMS_METHOD,
sigma=cfg.TEST.SOFT_NMS_SIGMA,
)
else:
keep = nms_util.nms(
cls_detections,
thresh=cfg.TEST.NMS,
)
cls_detections = cls_detections[keep, :]
boxes_this_image.append(cls_detections)
return [boxes_this_image]
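# Added example of the returned structure (background kept at index 0):
#   boxes_this_image[0] == []  # background placeholder
#   boxes_this_image[j] is a (N_j, 5) array of [x1, y1, x2, y2, score]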
def test_net(weights, q_in, q_out, device, root_logger=True):
"""Test a network trained with Faster R-CNN algorithm."""
cfg.GPU_ID = device
logger.set_root_logger(root_logger)
detector = new_detector(device, weights)
timers = time_util.new_timers('im_detect_bbox', 'misc')
must_stop = False
while not must_stop:
indices, raw_images = [], []
for _ in range(cfg.TEST.IMS_PER_BATCH):
i, raw_image = q_in.get()
if i < 0:
must_stop = True
break
indices.append(i)
raw_images.append(raw_image)
if len(raw_images) == 0:
continue
# Detect on specific scales.
all_outputs = ims_detect(
detector=detector,
raw_images=raw_images,
timer=timers['im_detect_bbox'],
)
# Post-processing.
for i, outputs in enumerate(all_outputs):
with timers['misc'].tic_and_toc():
boxes_this_image, = get_detections(outputs)
q_out.put((
indices[i],
dict([('im_detect', timers['im_detect_bbox'].average_time),
('misc', timers['misc'].average_time)]),
dict([('boxes', boxes_this_image)]),
))
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import numpy as np
from seetadet.core.config import cfg
def get_shifted_coords(shapes, base_anchors):
"""Return the x-y coordinates of shifted anchors."""
xs, ys = [], []
for i in range(len(shapes)):
height, width = shapes[i]
x, y = np.arange(0, width), np.arange(0, height)
x, y = np.meshgrid(x, y)
# Add A anchors (A,) to cell K shifts (K,)
# to get shift coords (A, K)
xs.append(np.tile(x.flatten(), base_anchors[i].shape[0]))
ys.append(np.tile(y.flatten(), base_anchors[i].shape[0]))
return np.concatenate(xs), np.concatenate(ys)
def get_shifted_anchors(shapes, base_anchors, strides):
"""Return the shifted anchors on given shapes."""
anchors_to_pack = []
for i in range(len(shapes)):
height, width = shapes[i]
shift_x = np.arange(0, width) * strides[i]
shift_y = np.arange(0, height) * strides[i]
shift_x, shift_y = np.meshgrid(shift_x, shift_y)
shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
shift_x.ravel(), shift_y.ravel())).transpose()
# Add A anchors (A, 1, 4) to cell K shifts (1, K, 4)
# to get shift anchors (A, K, 4)
a = base_anchors[i].shape[0]
k = shifts.shape[0]
anchors = (base_anchors[i].reshape((a, 1, 4)) +
shifts.reshape((1, k, 4)))
anchors_to_pack.append(anchors.reshape((a * k, 4)))
return np.vstack(anchors_to_pack)
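# Added worked example (hypothetical numbers): for a 2x2 feature map with
# stride 16, the shifts enumerate cell origins (0,0), (16,0), (0,16),
# (16,16); broadcasting (A, 1, 4) + (1, 4, 4) gives (A, 4, 4), flattened
# anchor-major to (A * 4, 4), matching the np.tile() ordering used by
# get_shifted_coords() above.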
def narrow_anchors(
all_coords,
base_anchors,
max_shapes,
shapes,
inds,
remapping=None,
):
"""Return the valid shifted anchors on given shapes."""
x_coords, y_coords = all_coords
inds_wide, remapping_wide = [], []
offset = num = 0
for i in range(len(max_shapes)):
num += base_anchors[i].shape[0] * np.prod(max_shapes[i])
inds_inside = np.where((inds >= offset) & (inds < num))[0]
inds_wide.append(inds[inds_inside])
if remapping is not None:
remapping_wide.append(remapping[inds_inside])
offset = num
offset1 = offset2 = num1 = num2 = 0
for i in range(len(max_shapes)):
num1 += base_anchors[i].shape[0] * np.prod(max_shapes[i])
num2 += base_anchors[i].shape[0] * np.prod(shapes[i])
inds = inds_wide[i]
x, y = x_coords[inds], y_coords[inds]
a = ((inds - offset1) // max_shapes[i][1]) // max_shapes[i][0]
inds = (a * shapes[i][0] + y) * shapes[i][1] + x + offset2
inds_mask = np.where((x < shapes[i][1]) & (y < shapes[i][0]))[0]
inds_wide[i] = inds[inds_mask]
if remapping is not None:
remapping_wide[i] = remapping_wide[i][inds_mask]
offset1, offset2 = num1, num2
outputs = [np.concatenate(inds_wide)]
if remapping is not None:
outputs += [np.concatenate(remapping_wide)]
return outputs[0] if len(outputs) == 1 else outputs
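# Added note: this remaps flat anchor indices from the padded "max shape"
# layout (planned once from the maximum input layout) onto the actual
# per-batch feature shapes, dropping indices whose (x, y) fall outside the
# narrower maps. Anchors can thus be sampled in the data pipeline before
# the final padded batch size is known.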
def map_returns_to_blobs(returns, blobs, keys):
"""Map returns of image to blobs."""
for i, key in enumerate(keys):
blobs[key].append(returns[i])
def map_rois_to_levels(rois, k_min, k_max):
"""Map rois to fpn levels."""
if len(rois) == 0:
return []
ws = rois[:, 3] - rois[:, 1] + 1
hs = rois[:, 4] - rois[:, 2] + 1
s = np.sqrt(ws * hs)
s0 = cfg.FPN.ROI_CANONICAL_SCALE # default: 224
lvl0 = cfg.FPN.ROI_CANONICAL_LEVEL # default: 4
target_levels = np.floor(lvl0 + np.log2(s / s0 + 1e-6))
return np.clip(target_levels, k_min, k_max)
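# Added worked case of the assignment rule: with s0 = 224 and lvl0 = 4,
# an RoI of sqrt-area 224 maps to level 4, 448 to level 5, and 112 to
# level 3, before clipping to [k_min, k_max].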
def map_blobs_by_levels(blobs, defaults, lvl_inds):
"""Map blobs to outputs according to fpn indices."""
outputs = collections.defaultdict(list)
for inds in lvl_inds:
for key, blob in blobs.items():
outputs[key].append(
blob[inds]
if len(inds) > 0
else defaults[key])
return outputs
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import multiprocessing as mp
import time
import threading
import queue
import dragon
import dragon.vm.torch as torch
import numpy as np
from seetadet.algo.mask_rcnn import data_transformer
from seetadet.core.config import cfg
from seetadet.datasets.factory import get_dataset
from seetadet.utils import blob as blob_util
from seetadet.utils import logger
class DataLoader(object):
"""Provide mini-batches of data."""
def __init__(self):
super(DataLoader, self).__init__()
dataset = get_dataset(cfg.TRAIN.DATASET)
self.iterator = Iterator(**{
'dataset': dataset.cls,
'source': dataset.source,
'classes': dataset.classes,
'shuffle': cfg.TRAIN.USE_SHUFFLE,
'batch_size': cfg.TRAIN.IMS_PER_BATCH * 2,
'num_transformers': cfg.TRAIN.NUM_THREADS - 1,
})
self.iterator.start()
def __call__(self):
outputs = self.iterator.next()
if isinstance(outputs['image'], np.ndarray):
outputs['image'] = torch.from_numpy(outputs['image'])
return outputs
class Iterator(threading.Thread):
"""Iterator to return the batch of data."""
def __init__(self, **kwargs):
super(Iterator, self).__init__()
# Distributed settings
rank, group_size = 0, 1
process_group = dragon.distributed.get_group()
if process_group is not None and \
kwargs.get('phase', 'TRAIN') == 'TRAIN':
group_size = process_group.size
rank = dragon.distributed.get_rank(process_group)
# Configuration
self._batch_size = kwargs.get('batch_size', 2)
self._num_readers = kwargs.get('num_readers', 1)
self._num_transformers = kwargs.get('num_transformers', 3)
self.daemon = True
# Initialize queues
num_batches = self._num_readers
self._queue1 = mp.Queue(num_batches * self._batch_size)
self._queue2 = mp.Queue(num_batches * self._batch_size)
self._queue3 = queue.Queue(num_batches)
# Initialize readers
self._readers = []
for i in range(self._num_readers):
part_idx, num_parts = i, self._num_readers
num_parts *= group_size
part_idx += rank * self._num_readers
self._readers.append(dragon.io.DataReader(
part_idx=part_idx, num_parts=num_parts, **kwargs))
self._readers[i]._seed += part_idx
self._readers[i].q_out = self._queue1
self._readers[i].start()
time.sleep(0.1)
# Initialize transformers
self._transformers = []
for i in range(self._num_transformers):
p = data_transformer.DataTransformer(**kwargs)
p._seed += (i + rank * self._num_transformers)
p.q_in, p.q_out = self._queue1, self._queue2
p.start()
self._transformers.append(p)
time.sleep(0.1)
# Register cleanup callbacks
def cleanup():
def terminate(processes):
for p in processes:
p.terminate()
p.join()
terminate(self._transformers)
logger.info('Terminate DataTransformer.')
terminate(self._readers)
logger.info('Terminate DataReader.')
import atexit
atexit.register(cleanup)
def next(self):
"""Return the next batch of data."""
return self.__next__()
def run(self):
"""Main loop."""
num_images = cfg.TRAIN.IMS_PER_BATCH
num_batches = cfg.TRAIN.ASPECT_GROUPING
logger.info('Initialize prefetching batches...')
example_buffer = [self._queue2.get()
for _ in range(num_images * num_batches)]
next_examples = []
while True:
# Use cached buffer for next N examples
# Examples are sorted to simulate aspect grouping
if len(next_examples) == 0:
next_examples = example_buffer
next_examples.sort(key=lambda d: d['aspect_ratio'])
example_buffer = []
# Prepare the next batch
outputs = collections.defaultdict(list)
for i in range(num_images):
example = next_examples.pop(0)
outputs['image'].append(example['image'])
outputs['gt_boxes'].append(example['boxes'])
outputs['gt_segms'].append(example['segms'])
outputs['im_info'].append(example['im_info'])
outputs['fg_inds'].append(example.get('fg_inds', None))
outputs['bg_inds'].append(example.get('bg_inds', None))
example_buffer.append(self._queue2.get())
outputs['image'] = blob_util.im_list_to_blob(
outputs['image'], coarsest_stride=cfg.MODEL.COARSEST_STRIDE)
# Send batch data to consumer
self._queue3.put(outputs)
def __iter__(self):
"""Return the iterator self."""
return self
def __next__(self):
"""Return the next batch of data."""
return self._queue3.get()
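# Added pipeline summary: DataReader processes feed serialized examples
# into _queue1; DataTransformer processes emit training dicts on _queue2;
# this thread groups IMS_PER_BATCH examples (sorted by aspect ratio over
# ASPECT_GROUPING batches) and publishes batches on _queue3 for
# __next__() to consume.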
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import multiprocessing
import cv2
import numpy as np
import numpy.random as npr
from seetadet.algo import common as algo_common
from seetadet.core.config import cfg
from seetadet.datasets.example import Example
from seetadet.utils.pycocotools import mask_utils
from seetadet.utils import boxes as box_util
from seetadet.utils import image as image_util
class DataTransformer(multiprocessing.Process):
"""DataTransformer."""
def __init__(self, **kwargs):
super(DataTransformer, self).__init__()
self._scales = cfg.TRAIN.SCALES
self._random_scales = cfg.TRAIN.RANDOM_SCALES
self._max_size = cfg.TRAIN.MAX_SIZE
self._seed = cfg.RNG_SEED
self._use_diff = cfg.TRAIN.USE_DIFF
self._use_flipped = cfg.TRAIN.USE_FLIPPED
self._use_distort = cfg.TRAIN.USE_COLOR_JITTER
self._classes = kwargs.get('classes', ('__background__',))
self._num_classes = len(self._classes)
self._class_to_ind = dict(zip(self._classes, range(self._num_classes)))
self._anchor_sampler = algo_common.AnchorSampler()
self.q_in = self.q_out = None
self.daemon = True
def get_boxes_and_segms(self, example, im_scale, im_offset, flipped):
objects, num_objects = example.objects, 0
height, width = example.height, example.width
if not self._use_diff:
for obj in objects:
if obj.get('difficult', 0) == 0:
num_objects += 1
else:
num_objects = len(objects)
boxes, segms = np.zeros((num_objects, 4), 'float32'), []
gt_classes = np.zeros((num_objects,), 'float32')
segm_flags = np.ones((num_objects,), 'float32')
# Filter the difficult instances.
object_idx = 0
for obj in objects:
if not self._use_diff and obj.get('difficult', 0) > 0:
continue
bbox = obj['bbox']
boxes[object_idx, :] = [max(0, bbox[0]),
max(0, bbox[1]),
min(bbox[2], width - 1),
min(bbox[3], height - 1)]
if 'mask' in obj:
mask_img = mask_utils.bytes2img(obj['mask'], height, width)
segms.append(mask_img[:, ::-1] if flipped else mask_img)
elif 'polygons' in obj:
polygons = obj['polygons']
segms.append(box_util.flip_polygons(
polygons, width) if flipped else polygons)
else:
segms.append(None)
segm_flags[object_idx] = 0.
gt_classes[object_idx] = self._class_to_ind[obj['name']]
object_idx += 1
# Flip the boxes if necessary.
if flipped:
boxes = box_util.flip_boxes(boxes, width)
# Scale the boxes to the detecting scale.
boxes *= im_scale
# Offset the boxes to align the cropping.
if im_offset is not None:
if min(im_offset[:2]) < 0:
raise ValueError('RandomCrop with mask is not supported.')
# Attach the classes and mask flags.
gt_boxes = np.empty((num_objects, 6), dtype=np.float32)
gt_boxes[:, :4], gt_boxes[:, 4] = boxes, gt_classes
gt_boxes[:, 5] = segm_flags # Has segmentation or not.
return gt_boxes, segms
def get(self, example):
example = Example(example)
# Resize.
target_size = npr.choice(self._scales)
img, im_scale = image_util.resize_image_with_target_size(
example.image,
target_size=target_size,
max_size=self._max_size,
random_scales=self._random_scales,
)
# Flip.
flipped = False
if self._use_flipped and npr.randint(2) > 0:
img = img[:, ::-1]
flipped = True
# Crop or Pad.
im_offset = None
if self._max_size == 0:
img, im_offset = image_util.get_image_with_target_size(
img, target_size)
# Distort.
if self._use_distort:
img = image_util.distort_image(img)
# Boxes and segmentations.
boxes, segms = self.get_boxes_and_segms(example, im_scale, im_offset, flipped)
# Standard outputs.
outputs = {'image': img,
'boxes': boxes,
'segms': segms,
'im_info': img.shape[:2] + (im_scale,)}
# Attach precomputed targets.
if len(boxes) > 0:
outputs.update(
self._anchor_sampler(
gt_boxes=boxes,
im_info=outputs['im_info']))
return outputs
def run(self):
# Disable the opencv threading.
cv2.setNumThreads(1)
# Fix the process-local random seed.
np.random.seed(self._seed)
# Main prefetch loop
while True:
outputs = self.get(self.q_in.get())
if len(outputs['boxes']) < 1:
continue # Ignore non-object image.
height, width = outputs['image'].shape[:2]
outputs['aspect_ratio'] = float(height) / float(width)
self.q_out.put(outputs)
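# Added note on get() outputs: 'boxes' is (num_objects, 6) with columns
# (x1, y1, x2, y2, class, has_segm_flag); 'segms' holds one entry per
# object: a binary mask image, a polygon list, or None when the flag is 0.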
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import numpy as np
import numpy.random as npr
from seetadet.algo.faster_rcnn import utils as rcnn_util
from seetadet.core.config import cfg
from seetadet.utils import boxes as box_util
from seetadet.utils import mask as mask_util
from seetadet.utils.env import new_tensor
class ProposalTarget(object):
"""Assign proposals to ground-truth targets."""
def __init__(self):
super(ProposalTarget, self).__init__()
self.resolution = cfg.MRCNN.RESOLUTION
self.num_classes = len(cfg.MODEL.CLASSES)
self.defaults = collections.OrderedDict([
('rois', np.array([[-1, 0, 0, 1, 1]], 'float32')),
('labels', np.array([-1], 'int64')),
('bbox_targets', np.zeros((1, 4), 'float32')),
('mask_targets', -np.ones((1, self.resolution, self.resolution), 'float32')),
])
def __call__(self, **inputs):
num_images = cfg.TRAIN.IMS_PER_BATCH
# Proposal ROIs (0, x1, y1, x2, y2) coming from RPN
all_rois = inputs['rois']
# Prepare for the outputs
keys = self.defaults.keys()
blobs = {key: [] for key in keys}
# Generate targets separately
for ix in range(num_images):
# GT boxes (x1, y1, x2, y2, label)
gt_boxes = inputs['gt_boxes'][ix]
gt_segms = inputs['gt_segms'][ix]
# Extract proposals for this image
rois = all_rois[np.where(all_rois[:, 0].astype('int32') == ix)[0]]
# Include ground-truth boxes in the set of candidate rois
inds = np.ones((gt_boxes.shape[0], 1), gt_boxes.dtype) * ix
rois = np.vstack((rois, np.hstack((inds, gt_boxes[:, :4]))))
# Sample a batch of RoIs for training
rois_per_image = cfg.FRCNN.BATCH_SIZE
fg_rois_per_image = np.round(cfg.FRCNN.FG_FRACTION * rois_per_image)
rcnn_util.map_returns_to_blobs(
sample_rois(
rois,
gt_boxes,
gt_segms,
rois_per_image,
fg_rois_per_image,
inputs['im_info'][ix][2],
), blobs, keys,
)
# Stack into continuous blobs
blobs = dict((k, np.concatenate(blobs[k])) for k in blobs.keys())
# Distribute rois into pyramids
k_min = cfg.FPN.ROI_MIN_LEVEL
k_max = cfg.FPN.ROI_MAX_LEVEL
num_levels = k_max - k_min + 1
levels = rcnn_util.map_rois_to_levels(blobs['rois'], k_min, k_max)
lvl_blobs = rcnn_util.map_blobs_by_levels(
blobs,
self.defaults,
[np.where(levels == (i + k_min))[0] for i in range(num_levels)],
)
rois_wide = [lvl_blobs['rois'][i] for i in range(num_levels)]
mask_rois_wide, mask_labels_wide = [], []
# Select the foreground RoIs only for bbox/mask branch
for i in range(num_levels):
inds = np.where(lvl_blobs['labels'][i] > 0)[0]
if len(inds) > 0:
mask_rois_wide.append(lvl_blobs['rois'][i][inds])
mask_labels_wide.append(lvl_blobs['labels'][i][inds] - 1)
lvl_blobs['mask_targets'][i] = lvl_blobs['mask_targets'][i][inds]
else:
mask_rois_wide.append(self.defaults['rois'])
mask_labels_wide.append(np.array([0], 'int64'))
lvl_blobs['mask_targets'][i] = self.defaults['mask_targets']
blobs = dict((k, np.concatenate(lvl_blobs[k])) for k in blobs.keys())
mask_labels = np.concatenate(mask_labels_wide)
fg_inds = np.where(blobs['labels'] > 0)[0]
bbox_cls_inds = np.arange(len(blobs['rois'])) * self.num_classes
mask_cls_inds = np.arange(len(mask_labels)) * (self.num_classes - 1)
# Sample a proposal randomly to avoid memory issues when no foreground exists
if len(fg_inds) == 0:
fg_inds = np.random.randint(len(blobs['labels']), size=[1])
return {
'rois': [new_tensor(rois_wide[i]) for i in range(num_levels)],
'mask_rois': [new_tensor(mask_rois_wide[i]) for i in range(num_levels)],
'labels': new_tensor(blobs['labels']),
'bbox_inds': new_tensor(bbox_cls_inds[fg_inds] + blobs['labels'][fg_inds]),
'bbox_targets': new_tensor(blobs['bbox_targets'][fg_inds].astype('float32')),
'bbox_anchors': new_tensor(blobs['rois'][fg_inds, 1:].astype('float32')),
'mask_inds': new_tensor(mask_cls_inds + mask_labels),
'mask_targets': new_tensor(blobs['mask_targets']),
}
def compute_targets(
rois,
gt_boxes,
gt_labels,
fg_segms,
fg_segms_flag,
mask_size,
im_scale,
):
"""Compute the bounding-box regression targets."""
assert rois.shape[0] == gt_boxes.shape[0]
assert rois.shape[1] == 4
assert gt_boxes.shape[1] == 4
# Compute bbox regression targets
fg_inds = np.where(gt_labels > 0)[0]
bbox_targets = box_util.bbox_transform(rois, gt_boxes, cfg.BBOX_REG_WEIGHTS)
# Compute mask classification targets
mask_shape = [mask_size] * 2
mask_targets = -np.ones([len(rois)] + mask_shape, 'float32')
rois_ori = rois / im_scale
rois_ori_int = np.round(rois_ori).astype(int)
gt_boxes_ori_int = np.round(gt_boxes / im_scale).astype(int)
for i, fg_idx in enumerate(fg_inds):
if fg_segms_flag[i] > 0:
if isinstance(fg_segms[i], list):
target = mask_util.warp_mask_via_polygons(
fg_segms[i], rois_ori[i], mask_shape)
else:
target = mask_util.warp_mask_via_intersection(
fg_segms[i], rois_ori_int[i], gt_boxes_ori_int[i], mask_shape)
if target is not None:
mask_targets[fg_idx] = target.astype(mask_targets.dtype)
return bbox_targets, mask_targets
def sample_rois(
all_rois,
gt_boxes,
gt_segms,
num_rois,
num_fg_rois,
im_scale,
):
"""Sample a batch of RoIs comprising foreground and background examples."""
overlaps = box_util.bbox_overlaps(all_rois[:, 1:5], gt_boxes[:, :4])
gt_assignment = overlaps.argmax(axis=1)
max_overlaps = overlaps.max(axis=1)
labels = gt_boxes[gt_assignment, 4].astype('int64')
# Select foreground RoIs as those with >= FG_THRESH overlap
fg_inds = np.where(max_overlaps >= cfg.FRCNN.POSITIVE_OVERLAP)[0]
fg_rois_per_this_image = int(min(num_fg_rois, fg_inds.size))
# Sample foreground regions without replacement
if fg_inds.size > 0:
fg_inds = npr.choice(fg_inds, fg_rois_per_this_image, False)
# Select background RoIs as those within [BG_THRESH_LO, BG_THRESH_HI)
bg_inds = np.where((max_overlaps < cfg.FRCNN.NEGATIVE_OVERLAP_HI) &
(max_overlaps >= cfg.FRCNN.NEGATIVE_OVERLAP_LO))[0]
# Compute number of background RoIs to take from this image
bg_rois_per_this_image = num_rois - fg_rois_per_this_image
bg_rois_per_this_image = min(bg_rois_per_this_image, bg_inds.size)
# Sample background regions without replacement
if bg_inds.size > 0:
bg_inds = npr.choice(bg_inds, bg_rois_per_this_image, False)
# The indices that we're selecting (both fg and bg)
keep_inds = np.append(fg_inds, bg_inds)
# Select sampled values from various arrays
rois, labels = all_rois[keep_inds], labels[keep_inds]
# Clamp labels for the background RoIs to 0
labels[fg_rois_per_this_image:] = 0
# Compute the target from RoIs
outputs = [rois, labels]
outputs += compute_targets(
rois[:, 1:5],
gt_boxes[gt_assignment[keep_inds], :4],
labels,
[gt_segms[i] for i in gt_assignment[fg_inds]],
gt_boxes[gt_assignment[fg_inds], 5],
cfg.MRCNN.RESOLUTION,
im_scale,
)
return outputs
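# Added note: the return order [rois (R, 5), labels (R,),
# bbox_targets (R, 4), mask_targets (R, resolution, resolution)] matches
# ProposalTarget.defaults; background rows of mask_targets stay -1.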
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import types
import dragon.vm.torch as torch
import numpy as np
from seetadet.algo.faster_rcnn import utils as rcnn_util
from seetadet.core.config import cfg
from seetadet.modeling.detector import new_detector
from seetadet.utils import env
from seetadet.utils import blob as blob_util
from seetadet.utils import boxes as box_util
from seetadet.utils import image as image_util
from seetadet.utils import logger
from seetadet.utils import nms as nms_util
from seetadet.utils import time_util
def get_data(raw_images):
"""Return the test data."""
max_size = cfg.TEST.MAX_SIZE
images_wide = []
image_shapes_wide, image_scales_wide = [], []
for img in raw_images:
images, image_scales = image_util.scale_image(
img, scales=cfg.TEST.SCALES, max_size=max_size)
images_wide += images
image_scales_wide += image_scales
image_shapes_wide += [img.shape[:2] for img in images]
images = blob_util.im_list_to_blob(
images_wide, coarsest_stride=cfg.MODEL.COARSEST_STRIDE)
image_shapes = np.array(image_shapes_wide)
image_scales = np.array(image_scales_wide).reshape((len(images), -1))
images_info = np.hstack([image_shapes, image_scales]).astype('float32')
return images, images_info
def ims_detect(detector, raw_images, timer=None):
"""Detect a image, with single or multiple scales."""
images, images_info = get_data(raw_images)
timer.tic() if timer else timer
# Do forward
inputs = {'image': torch.from_numpy(images),
'im_info': torch.from_numpy(images_info)}
if not hasattr(detector, 'script_forward'):
def script_forward(self, image, im_info):
return self.forward({'image': image, 'im_info': im_info})
detector.script_forward = torch.jit.trace(
func=types.MethodType(script_forward, detector),
example_inputs=[inputs['image'], inputs['im_info']],
)
outputs = detector.script_forward(inputs['image'], inputs['im_info'])
outputs = dict((k, outputs[k].numpy()) for k in outputs.keys())
# Decode results
batch_pred = box_util.bbox_transform_inv(
outputs['rois'][:, 1:5],
outputs['bbox_pred'],
cfg.BBOX_REG_WEIGHTS)
results = [([], [], []) for _ in range(len(raw_images))]
for i in range(len(images)):
ii = i // len(cfg.TEST.SCALES)
inds = np.where(outputs['rois'][:, 0].astype(np.int32) == i)[0]
boxes = batch_pred[inds] / images_info[i, 2]
boxes = box_util.clip_tiled_boxes(boxes, raw_images[ii].shape)
results[ii][0].append(outputs['cls_prob'][inds])
results[ii][1].append(boxes)
results[ii][2].append(np.ones((len(inds), 1), 'int32') * i)
# Merge from multiple scales
ret = [(np.vstack(s), np.vstack(b),
np.vstack(i), images_info[:, 2]) for s, b, i in results]
timer.toc() if timer else timer
return ret
def mask_detect(detector, rois):
k_min = cfg.FPN.ROI_MIN_LEVEL
k_max = cfg.FPN.ROI_MAX_LEVEL
k = k_max - k_min + 1
levels = rcnn_util.map_rois_to_levels(rois, k_min, k_max)
level_inds = [np.where(levels == (i + k_min))[0] for i in range(k)]
fpn_rois = rcnn_util.map_blobs_by_levels(
{'rois': rois[:, :5]},
{'rois': np.array([[-1, 0, 0, 1, 1]], 'float32')},
level_inds)['rois']
with torch.no_grad():
mask_score = detector.rcnn.compute_mask_score(
rois=[env.new_tensor(r.astype('float32')) for r in fpn_rois])
nc, i = mask_score.shape[1], 0
mask_inds = {}
for inds in level_inds:
for idx in inds:
cls = int(rois[idx, 5])
mask_inds[idx] = (i * nc + cls)
i += 1
if len(inds) == 0:
i += 1
mask_inds = list(map(mask_inds.get, sorted(mask_inds)))
mask_inds = env.new_tensor(np.array(mask_inds, 'int64'))
with torch.no_grad():
mask_pred = mask_score.index_select((0, 1), mask_inds)
return detector.rcnn.sigmoid(mask_pred).numpy().copy()
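# Added index bookkeeping note: mask_score holds (num_classes - 1) class
# channels per RoI, so the flat index of (roi i, class c) is i * nc + c.
# A level with no detections still contributes its placeholder RoI, hence
# the extra `i += 1` for empty levels.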
def get_detections(outputs):
"""Return the categorical detections from outputs."""
scores, boxes, batch_inds, im_scales = outputs
rois_this_image = []
boxes_this_image = [[]]
empty_detections = np.zeros((0, 5), 'float32')
empty_rois = np.zeros((0, 6), 'float32')
for j in range(1, len(cfg.MODEL.CLASSES)):
inds = np.where(scores[:, j] > cfg.TEST.SCORE_THRESH)[0]
if len(inds) == 0:
boxes_this_image.append(empty_detections)
rois_this_image.append(empty_rois)
continue
cls_scores = scores[inds, j]
cls_boxes = boxes[inds, j * 4:(j + 1) * 4]
cls_batch_inds = batch_inds[inds]
cls_detections = np.hstack(
(cls_boxes, cls_scores[:, np.newaxis])) \
.astype(np.float32, copy=False)
if cfg.TEST.USE_SOFT_NMS:
keep = nms_util.soft_nms(
cls_detections,
thresh=cfg.TEST.NMS,
method=cfg.TEST.SOFT_NMS_METHOD,
sigma=cfg.TEST.SOFT_NMS_SIGMA,
)
else:
keep = nms_util.nms(
cls_detections,
thresh=cfg.TEST.NMS,
)
cls_detections = cls_detections[keep, :]
cls_batch_inds = cls_batch_inds[keep]
boxes_this_image.append(cls_detections)
rois_this_image.append(np.hstack((
cls_batch_inds,
cls_detections[:, :4] * im_scales[cls_batch_inds],
np.ones((len(keep), 1)) * (j - 1))))
return [boxes_this_image, rois_this_image]
def test_net(weights, q_in, q_out, device, root_logger=True):
"""Test a network trained with Mask R-CNN algorithm."""
cfg.GPU_ID = device
num_classes = len(cfg.MODEL.CLASSES)
logger.set_root_logger(root_logger)
detector = new_detector(device, weights)
timers = time_util.new_timers('im_detect_bbox', 'im_detect_mask', 'misc')
must_stop = False
while not must_stop:
# Wait inputs.
indices, raw_images = [], []
for _ in range(cfg.TEST.IMS_PER_BATCH):
i, raw_image = q_in.get()
if i < 0:
must_stop = True
break
indices.append(i)
raw_images.append(raw_image)
if len(raw_images) == 0:
continue
# Detect on specific scales.
all_outputs = ims_detect(
detector=detector,
raw_images=raw_images,
timer=timers['im_detect_bbox'],
)
# Post-processing.
for i, outputs in enumerate(all_outputs):
segms_this_image = [[]]
with timers['misc'].tic_and_toc():
boxes_this_image, rois_this_image = get_detections(outputs)
mask_rois = np.concatenate(rois_this_image)
if len(mask_rois) > 0:
k = 0
timers['im_detect_mask'].tic()
mask_pred = mask_detect(detector, mask_rois)
for j in range(1, num_classes):
num_pred = len(boxes_this_image[j])
cls_segms = mask_pred[k:k + num_pred]
segms_this_image.append(cls_segms)
k += num_pred
timers['im_detect_mask'].toc()
q_out.put((
indices[i],
dict([('im_detect', (timers['im_detect_bbox'].average_time +
timers['im_detect_mask'].average_time)),
('misc', timers['misc'].average_time)]),
dict([('boxes', boxes_this_image),
('masks', segms_this_image)]),
))
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import math
import numpy as np
from seetadet.algo.faster_rcnn import generate_anchors as anchor_util
from seetadet.algo.faster_rcnn import utils as rcnn_util
from seetadet.core.config import cfg
from seetadet.utils import boxes as box_util
from seetadet.utils.env import new_tensor
class AnchorTarget(object):
"""Assign ground-truth targets to anchors."""
def __init__(self):
super(AnchorTarget, self).__init__()
# Load the basic configs
k_max, k_min = cfg.FPN.RPN_MAX_LEVEL, cfg.FPN.RPN_MIN_LEVEL
scales_per_octave = cfg.RETINANET.SCALES_PER_OCTAVE
anchor_scale = cfg.RETINANET.ANCHOR_SCALE
self.strides = [2. ** lvl for lvl in range(k_min, k_max + 1)]
self.ratios = cfg.RETINANET.ASPECT_RATIOS
# Generate base anchors
self.base_anchors = []
for stride in self.strides:
sizes = [stride * anchor_scale *
(2 ** (octave / float(scales_per_octave)))
for octave in range(scales_per_octave)]
self.base_anchors.append(
anchor_util.generate_anchors_v2(
stride=stride,
ratios=self.ratios,
sizes=sizes))
# Plan the maximum anchor layout
max_size = max(cfg.TRAIN.MAX_SIZE, max(cfg.TRAIN.SCALES))
if cfg.MODEL.COARSEST_STRIDE > 0:
stride = float(cfg.MODEL.COARSEST_STRIDE)
max_size = int(math.ceil(max_size / stride) * stride)
self.max_shapes = [[math.ceil(max_size / stride)] * 2
for stride in self.strides]
self.all_coords = rcnn_util.get_shifted_coords(
self.max_shapes, self.base_anchors)
self.all_anchors = rcnn_util.get_shifted_anchors(
self.max_shapes, self.base_anchors, self.strides)
def sample_anchors(self, gt_boxes, im_info, all_anchors=None):
all_anchors = self.all_anchors \
if all_anchors is None else all_anchors
# Remove anchors lying outside the image extent
inds_inside = np.where((all_anchors[:, 0] < im_info[1]) &
(all_anchors[:, 1] < im_info[0]))[0]
anchors = all_anchors[inds_inside, :]
num_inside = len(anchors)
labels = np.empty((num_inside,), dtype='int32')
labels.fill(-1)
# Overlaps between the anchors and the gt boxes.
overlaps = box_util.bbox_overlaps(anchors, gt_boxes)
argmax_overlaps = overlaps.argmax(axis=1)
max_overlaps = overlaps[np.arange(num_inside), argmax_overlaps]
# Foreground: for each gt, anchor with highest overlap.
gt_argmax_overlaps = overlaps.argmax(axis=0)
gt_max_overlaps = overlaps[gt_argmax_overlaps, np.arange(overlaps.shape[1])]
gt_argmax_overlaps = np.where(overlaps == gt_max_overlaps)[0]
gt_assignment = argmax_overlaps[gt_argmax_overlaps]
labels[gt_argmax_overlaps] = gt_boxes[gt_assignment, 4]
# Foreground: above threshold IoU.
inds = max_overlaps >= cfg.RETINANET.POSITIVE_OVERLAP
gt_assignment = argmax_overlaps[inds]
labels[inds] = gt_boxes[gt_assignment, 4]
# Background: below threshold IoU.
labels[max_overlaps < cfg.RETINANET.NEGATIVE_OVERLAP] = 0
# Fall back to the per-gt argmax anchors if no foreground was assigned.
fg_inds = np.where(labels > 0)[0]
if len(fg_inds) == 0:
gt_assignment = argmax_overlaps[gt_argmax_overlaps]
labels[gt_argmax_overlaps] = gt_boxes[gt_assignment, 4]
fg_inds = np.where(labels > 0)[0]
# Return the ignored ("don't care") indices rather than the much larger
# background set (~100x faster for 200 background indices)
ignore_inds = np.where(labels < 0)[0]
return inds_inside[fg_inds], inds_inside[ignore_inds]
def __call__(self, **inputs):
num_images = cfg.TRAIN.IMS_PER_BATCH
shapes = [f.shape[-2:] for f in inputs['features']]
image_stride = sum(self.base_anchors[i].shape[0] * np.prod(shapes[i])
for i in range(len(inputs['features'])))
narrow_args = [self.all_coords, self.base_anchors, self.max_shapes, shapes]
outputs = collections.defaultdict(list)
# Label: ``1`` is positive, ``0`` is negative, ``-1`` is don't care
output_labels = np.zeros((num_images, image_stride,), 'int64')
for ix in range(num_images):
fg_inds = inputs['fg_inds'][ix]
ignore_inds = inputs['bg_inds'][ix]
gt_boxes = inputs['gt_boxes'][ix]
# Narrow anchors to match the feature layout
anchors = self.all_anchors[fg_inds]
ignore_inds = rcnn_util.narrow_anchors(*(narrow_args + [ignore_inds]))
_, anchors = rcnn_util.narrow_anchors(*(narrow_args + [fg_inds, anchors]))
fg_inds = rcnn_util.narrow_anchors(*(narrow_args + [fg_inds]))
# Compute bbox targets
gt_assignment = box_util.bbox_overlaps(anchors, gt_boxes).argmax(axis=1)
bbox_targets = box_util.bbox_transform(anchors, gt_boxes[gt_assignment, :4])
outputs['bbox_anchors'].append(anchors)
outputs['bbox_targets'].append(bbox_targets)
# Compute label assignments
output_labels[ix, ignore_inds] = -1
output_labels[ix, fg_inds] = gt_boxes[gt_assignment, 4]
# Compute sparse indices
fg_inds += ix * image_stride
outputs['bbox_inds'].extend([fg_inds])
return {
'labels': new_tensor(output_labels),
'bbox_inds': new_tensor(
np.concatenate(outputs['bbox_inds'])),
'bbox_targets': new_tensor(
np.concatenate(outputs['bbox_targets']).astype('float32')),
'bbox_anchors': new_tensor(
np.concatenate(outputs['bbox_anchors']).astype('float32')),
}
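# Added layout note: labels is a dense (num_images, image_stride) map,
# while regression targets are gathered sparsely; the flat index of
# anchor a in image ix is ix * image_stride + a, matching the
# `fg_inds += ix * image_stride` offset above.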
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import types
import dragon.vm.torch as torch
import numpy as np
from seetadet.core.config import cfg
from seetadet.modeling.detector import new_detector
from seetadet.utils import blob as blob_util
from seetadet.utils import image as image_util
from seetadet.utils import logger
from seetadet.utils import nms as nms_util
from seetadet.utils import time_util
def get_data(raw_images):
"""Return the test data."""
max_size = cfg.TEST.MAX_SIZE
if cfg.PIPELINE.TYPE.lower() == 'ssd':
max_size = 0 # Warped to a fixed size
images_wide = []
image_shapes_wide, image_scales_wide = [], []
for img in raw_images:
images, image_scales = image_util.scale_image(
img, scales=cfg.TEST.SCALES, max_size=max_size)
images_wide += images
image_scales_wide += image_scales
image_shapes_wide += [img.shape[:2] for img in images]
images = blob_util.im_list_to_blob(
images_wide, coarsest_stride=cfg.MODEL.COARSEST_STRIDE)
image_shapes = np.array(image_shapes_wide)
image_scales = np.array(image_scales_wide).reshape((len(images), -1))
images_info = np.hstack([image_shapes, image_scales]).astype('float32')
return images, images_info
def ims_detect(detector, raw_images, timer=None):
"""Detect images at single or multiple scales."""
images, images_info = get_data(raw_images)
timer.tic() if timer else timer
# Do Forward
inputs = {'image': torch.from_numpy(images),
'im_info': torch.from_numpy(images_info)}
if not hasattr(detector, 'script_forward'):
def script_forward(self, image, im_info):
return self.forward({'image': image, 'im_info': im_info})
detector.script_forward = torch.jit.trace(
func=types.MethodType(script_forward, detector),
example_inputs=[inputs['image'], inputs['im_info']],
)
outputs = detector.script_forward(inputs['image'], inputs['im_info'])
outputs = dict((k, outputs[k].numpy()) for k in outputs.keys())
# Decode results
detections = outputs['detections']
results = [[] for _ in range(len(raw_images))]
for i in range(len(images)):
inds = np.where(detections[:, 0].astype(np.int32) == i)[0]
results[i // len(cfg.TEST.SCALES)].append(detections[inds, 1:])
# Merge from multiple scales
ret = [np.vstack(d) for d in results]
timer.toc() if timer else timer
return ret
def get_detections(outputs):
"""Return the categorical detections from outputs."""
num_classes = len(cfg.MODEL.CLASSES)
boxes_this_image = [[]]
raw_detections = outputs
empty_detections = np.zeros((0, 5), 'float32')
for j in range(1, num_classes):
cls_indices = np.where(
raw_detections[:, 5].astype(np.int32) == j)[0]
if len(cls_indices) == 0:
boxes_this_image.append(empty_detections)
continue
cls_boxes = raw_detections[cls_indices, :4]
cls_scores = raw_detections[cls_indices, 4]
cls_detections = np.hstack((
cls_boxes, cls_scores[:, np.newaxis])) \
.astype(np.float32, copy=False)
if cfg.TEST.USE_SOFT_NMS:
keep = nms_util.soft_nms(
cls_detections,
thresh=cfg.TEST.NMS,
method=cfg.TEST.SOFT_NMS_METHOD,
sigma=cfg.TEST.SOFT_NMS_SIGMA,
)
else:
keep = nms_util.nms(
cls_detections,
thresh=cfg.TEST.NMS,
)
cls_detections = cls_detections[keep, :]
boxes_this_image.append(cls_detections)
return [boxes_this_image]
def test_net(weights, q_in, q_out, device, root_logger=True):
"""Test a network trained with RetinaNet algorithm."""
cfg.GPU_ID = device
logger.set_root_logger(root_logger)
detector = new_detector(device, weights)
timers = time_util.new_timers('im_detect_bbox', 'misc')
must_stop = False
while not must_stop:
# Wait inputs.
indices, raw_images = [], []
for _ in range(cfg.TEST.IMS_PER_BATCH):
i, raw_image = q_in.get()
if i < 0:
must_stop = True
break
indices.append(i)
raw_images.append(raw_image)
if len(raw_images) == 0:
continue
# Detect on specific scales.
all_outputs = ims_detect(detector, raw_images, timers['im_detect_bbox'])
# Post-processing.
for i, outputs in enumerate(all_outputs):
with timers['misc'].tic_and_toc():
boxes_this_image, = get_detections(outputs)
q_out.put((
indices[i],
dict([('im_detect', timers['im_detect_bbox'].average_time),
('misc', timers['misc'].average_time)]),
dict([('boxes', boxes_this_image)]),
))
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import math
import numpy as np
from seetadet.algo.ssd import generate_anchors as anchor_util
from seetadet.algo.ssd import utils as ssd_util
from seetadet.core.config import cfg
from seetadet.utils import boxes as box_util
from seetadet.utils.env import new_tensor
class AnchorTarget(object):
"""Assign ground-truth targets to anchors."""
def __init__(self):
super(AnchorTarget, self).__init__()
# Load the basic configs
self.strides = cfg.SSD.STRIDES
anchor_sizes = cfg.SSD.ANCHOR_SIZES
aspect_ratios = cfg.SSD.ASPECT_RATIOS
self.base_anchors = []
for i in range(len(anchor_sizes)):
ratios = aspect_ratios[i]
if not isinstance(ratios, (tuple, list)):
# All strides share the same ratios
ratios = aspect_ratios
self.base_anchors.append(
anchor_util.generate_anchors(
min_sizes=[anchor_sizes[i][0]],
max_sizes=[anchor_sizes[i][1]],
ratios=ratios))
# Plan the fixed anchor layout
max_size = cfg.TRAIN.SCALES[0]
if cfg.MODEL.COARSEST_STRIDE > 0:
stride = float(cfg.MODEL.COARSEST_STRIDE)
max_size = int(math.ceil(max_size / stride) * stride)
shapes = [[math.ceil(max_size / stride)] * 2
for stride in self.strides]
self.all_anchors = ssd_util.get_shifted_anchors(
shapes, self.base_anchors, self.strides)
def sample_anchors(self, gt_boxes, all_anchors=None):
anchors = self.all_anchors \
if all_anchors is None else all_anchors
num_anchors = len(anchors)
labels = np.empty((num_anchors,), dtype='int32')
labels.fill(-1)
# Overlaps between the anchors and the gt boxes.
overlaps = box_util.bbox_overlaps(anchors, gt_boxes)
argmax_overlaps = overlaps.argmax(axis=1)
max_overlaps = overlaps[np.arange(num_anchors), argmax_overlaps]
# Foreground: for each gt, anchor with highest overlap.
gt_argmax_overlaps = overlaps.argmax(axis=0)
gt_assignment = argmax_overlaps[gt_argmax_overlaps]
labels[gt_argmax_overlaps] = gt_boxes[gt_assignment, 4]
# Foreground: above threshold IoU.
inds = max_overlaps >= cfg.SSD.POSITIVE_OVERLAP
gt_assignment = argmax_overlaps[inds]
labels[inds] = gt_boxes[gt_assignment, 4]
fg_inds = np.where(labels > 0)[0]
# Negative: not matched and below threshold IoU.
neg_inds = np.where(labels <= 0)[0]
neg_overlaps = max_overlaps[neg_inds]
eligible_neg_inds = np.where(neg_overlaps < cfg.SSD.NEGATIVE_OVERLAP)[0]
neg_inds = neg_inds[eligible_neg_inds]
return fg_inds, neg_inds
def __call__(self, **inputs):
num_images = cfg.TRAIN.IMS_PER_BATCH
neg_pos_ratio = cfg.SSD.NEGATIVE_POSITIVE_RATIO
image_stride = self.all_anchors.shape[0]
cls_prob = inputs['cls_prob'].numpy()
outputs = collections.defaultdict(list)
# Label: ``1`` is positive, ``0`` is negative, ``-1`` is don't care
output_labels = np.empty((num_images, image_stride,), 'int64')
output_labels.fill(-1)
for ix in range(num_images):
fg_inds = inputs['fg_inds'][ix]
neg_inds = inputs['bg_inds'][ix]
gt_boxes = inputs['gt_boxes'][ix]
# Mining hard negatives as background.
num_pos, num_neg = len(fg_inds), len(neg_inds)
num_bg = min(int(num_pos * neg_pos_ratio), num_neg)
neg_loss = -np.log(np.maximum(
cls_prob[ix, neg_inds][np.arange(num_neg),
np.zeros((num_neg,), 'int32')],
np.finfo(float).eps))
bg_inds = neg_inds[np.argsort(-neg_loss)][:num_bg]
# Compute bbox targets.
anchors = self.all_anchors[fg_inds]
gt_assignment = box_util.bbox_overlaps(
anchors, gt_boxes).argmax(axis=1)
bbox_targets = box_util.bbox_transform(
anchors, gt_boxes[gt_assignment, :4],
cfg.BBOX_REG_WEIGHTS)
outputs['bbox_anchors'].append(anchors)
outputs['bbox_targets'].append(bbox_targets)
# Compute label assignments.
output_labels[ix, bg_inds] = 0
output_labels[ix, fg_inds] = gt_boxes[gt_assignment, 4]
# Compute sparse indices.
fg_inds += ix * image_stride
outputs['bbox_inds'].extend([fg_inds])
return {
'labels': new_tensor(output_labels),
'bbox_inds': new_tensor(
np.concatenate(outputs['bbox_inds'])),
'bbox_targets': new_tensor(
np.concatenate(outputs['bbox_targets']).astype('float32')),
'bbox_anchors': new_tensor(
np.concatenate(outputs['bbox_anchors']).astype('float32')),
}
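# Added worked sketch of the hard negative mining above (hypothetical
# numbers): with 20 positives and NEGATIVE_POSITIVE_RATIO = 3, the 60
# candidate negatives with the largest background loss -log(p_bg) become
# label 0; the remaining negatives stay -1 and are ignored.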
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import multiprocessing as mp
import time
import threading
import queue
import dragon
import dragon.vm.torch as torch
import numpy as np
from seetadet.algo.ssd import data_transformer
from seetadet.core.config import cfg
from seetadet.datasets.factory import get_dataset
from seetadet.utils import blob as blob_util
from seetadet.utils import logger
class DataLoader(object):
"""Provide mini-batches of data."""
def __init__(self):
super(DataLoader, self).__init__()
dataset = get_dataset(cfg.TRAIN.DATASET)
self.iterator = Iterator(**{
'dataset': dataset.cls,
'source': dataset.source,
'classes': dataset.classes,
'shuffle': cfg.TRAIN.USE_SHUFFLE,
'batch_size': cfg.TRAIN.IMS_PER_BATCH * 2,
'num_transformers': cfg.TRAIN.NUM_THREADS - 1,
})
self.iterator.start()
def __call__(self):
outputs = self.iterator.next()
if isinstance(outputs['image'], np.ndarray):
outputs['image'] = torch.from_numpy(outputs['image'])
return outputs
class Iterator(threading.Thread):
"""Iterator to return the batch of data."""
def __init__(self, **kwargs):
super(Iterator, self).__init__()
# Distributed settings
rank, group_size = 0, 1
process_group = dragon.distributed.get_group()
if process_group is not None and \
kwargs.get('phase', 'TRAIN') == 'TRAIN':
group_size = process_group.size
rank = dragon.distributed.get_rank(process_group)
# Configuration
self._batch_size = kwargs.get('batch_size', 8)
self._num_readers = kwargs.get('num_readers', 1)
self._num_transformers = kwargs.get('num_transformers', 3)
self.daemon = True
# Initialize queues
num_batches = self._num_readers
self._queue1 = mp.Queue(num_batches * self._batch_size)
self._queue2 = mp.Queue(num_batches * self._batch_size)
self._queue3 = queue.Queue(num_batches)
# Initialize readers
self._readers = []
for i in range(self._num_readers):
part_idx, num_parts = i, self._num_readers
num_parts *= group_size
part_idx += rank * self._num_readers
self._readers.append(dragon.io.DataReader(
part_idx=part_idx, num_parts=num_parts, **kwargs))
self._readers[i]._seed += part_idx
self._readers[i].q_out = self._queue1
self._readers[i].start()
time.sleep(0.1)
# Initialize transformers
self._transformers = []
for i in range(self._num_transformers):
p = data_transformer.DataTransformer(**kwargs)
p._seed += (i + rank * self._num_transformers)
p.q_in, p.q_out = self._queue1, self._queue2
p.start()
self._transformers.append(p)
time.sleep(0.1)
# Register cleanup callbacks
def cleanup():
def terminate(processes):
for p in processes:
p.terminate()
p.join()
terminate(self._transformers)
logger.info('Terminate DataTransformer.')
terminate(self._readers)
logger.info('Terminate DataReader.')
import atexit
atexit.register(cleanup)
def next(self):
"""Return the next batch of data."""
return self.__next__()
def run(self):
"""Main loop."""
num_images = cfg.TRAIN.IMS_PER_BATCH
num_batches = cfg.TRAIN.ASPECT_GROUPING
logger.info('Initialize prefetching batches...')
example_buffer = [self._queue2.get()
for _ in range(num_images * num_batches)]
next_examples = []
while True:
# Use cached buffer for next N examples
if len(next_examples) == 0:
next_examples = example_buffer
example_buffer = []
# Prepare the next batch
outputs = collections.defaultdict(list)
for i in range(num_images):
example = next_examples.pop(0)
outputs['image'].append(example['image'])
outputs['gt_boxes'].append(example['boxes'])
outputs['im_info'].append(example['im_info'])
outputs['fg_inds'].append(example.get('fg_inds', None))
outputs['bg_inds'].append(example.get('bg_inds', None))
example_buffer.append(self._queue2.get())
outputs['image'] = blob_util.im_list_to_blob(
outputs['image'], coarsest_stride=cfg.MODEL.COARSEST_STRIDE)
# Send batch data to consumer
self._queue3.put(outputs)
def __iter__(self):
"""Return the iterator self."""
return self
def __next__(self):
"""Return the next batch of data."""
return self._queue3.get()
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import multiprocessing
import cv2
import numpy as np
import numpy.random as npr
from seetadet.algo import common as algo_common
from seetadet.algo.ssd import transforms
from seetadet.core.config import cfg
from seetadet.datasets.example import Example
from seetadet.utils import boxes as box_util
class DataTransformer(multiprocessing.Process):
"""DataTransformer."""
def __init__(self, **kwargs):
super(DataTransformer, self).__init__()
self._scale = cfg.TRAIN.SCALES[0]
self._seed = cfg.RNG_SEED
self._use_diff = cfg.TRAIN.USE_DIFF
self._use_flipped = cfg.TRAIN.USE_FLIPPED
self._classes = kwargs.get('classes', ('__background__',))
self._num_classes = len(self._classes)
self._class_to_ind = dict(zip(self._classes, range(self._num_classes)))
self._anchor_sampler = algo_common.AnchorSampler()
self._apply_transform = transforms.Compose(transforms.Distort(),
transforms.Expand(),
transforms.Sample(),
transforms.Resize())
self.q_in = self.q_out = None
self.daemon = True
def get_boxes(self, example, flipped):
objects, num_objects = example.objects, 0
height, width = example.height, example.width
if not self._use_diff:
for obj in objects:
if obj.get('difficult', 0) == 0:
num_objects += 1
else:
num_objects = len(objects)
boxes = np.zeros((num_objects, 4), 'float32')
gt_classes = np.zeros((num_objects,), 'int32')
# Filter the difficult instances.
object_idx = 0
for obj in objects:
if not self._use_diff and obj.get('difficult', 0) > 0:
continue
bbox = obj['bbox']
boxes[object_idx, :] = [max(0, bbox[0]),
max(0, bbox[1]),
min(bbox[2], width - 1),
min(bbox[3], height - 1)]
gt_classes[object_idx] = self._class_to_ind[obj['name']]
object_idx += 1
# Flip the boxes if necessary.
if flipped:
boxes = box_util.flip_boxes(boxes, width)
# Normalize.
boxes[:, 0::2] /= width
boxes[:, 1::2] /= height
# Attach the classes.
gt_boxes = np.empty((num_objects, 5), dtype=np.float32)
gt_boxes[:, :4], gt_boxes[:, 4] = boxes, gt_classes
return gt_boxes
def get(self, example):
example = Example(example)
img = example.image
# Flip.
flipped = False
if self._use_flipped and npr.randint(2) > 0:
img = img[:, ::-1]
flipped = True
# Boxes.
boxes = self.get_boxes(example, flipped)
# Return early to avoid invalid transforms.
if len(boxes) == 0:
return {'boxes': boxes}
# Distort => Expand => Sample => Resize
img, boxes = self._apply_transform(img, boxes)
# Restore to the blob scale.
boxes[:, :4] *= self._scale
# Standard outputs.
outputs = {'image': img,
'boxes': boxes,
'im_info': img.shape[:2]}
# Attach precomputed targets.
if len(boxes) > 0:
outputs.update(
self._anchor_sampler(
gt_boxes=boxes,
im_info=outputs['im_info']))
return outputs
def run(self):
# Disable the opencv threading.
cv2.setNumThreads(1)
# Fix the process-local random seed.
np.random.seed(self._seed)
# Main prefetch loop
while True:
outputs = self.get(self.q_in.get())
if len(outputs['boxes']) < 1:
continue # Ignore non-object image.
self.q_out.put(outputs)
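# A self-contained sketch of the normalize-and-attach-class step in
# ``get_boxes`` above (toy values; flipping omitted).
if __name__ == '__main__':
    width, height = 640, 480
    toy_boxes = np.array([[32., 48., 320., 240.]], 'float32')
    toy_boxes[:, 0::2] /= width
    toy_boxes[:, 1::2] /= height
    toy_gt = np.empty((1, 5), 'float32')
    toy_gt[:, :4], toy_gt[:, 4] = toy_boxes, [1]
    print(toy_gt)  # [[0.05 0.1 0.5 0.5 1.]]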
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
def generate_anchors(min_sizes, max_sizes, ratios):
"""Generate anchors by enumerating aspect ratios and sizes."""
total_anchors = []
for idx, min_size in enumerate(min_sizes):
# Note that SSD assumes a center-format anchor: (x_ctr, y_ctr, w, h)
base_anchor = np.array([0, 0, min_size, min_size])
anchors = _ratio_enum(base_anchor, ratios, _mkanchors)
if len(max_sizes) > 0:
max_size = max_sizes[idx]
_anchors = anchors[0].reshape((1, 4))
_anchors = np.vstack([
_anchors,
_max_size_enum(
base_anchor,
min_size,
max_size,
_mkanchors,
)])
anchors = np.vstack([_anchors, anchors[1:]])
total_anchors.append(anchors)
return np.vstack(total_anchors)
def _whctrs(anchor):
"""Return width, height, x center, and y center for an anchor (window)."""
w, h = anchor[2], anchor[3]
x_ctr, y_ctr = anchor[0], anchor[1]
return w, h, x_ctr, y_ctr
def _mkanchors(ws, hs, x_ctr, y_ctr):
"""Given a vector of widths (ws) and heights (hs) around a center
(x_ctr, y_ctr), output a set of anchors (windows).
"""
ws, hs = ws[:, np.newaxis], hs[:, np.newaxis]
return np.hstack((
x_ctr - 0.5 * ws,
y_ctr - 0.5 * hs,
x_ctr + 0.5 * ws,
y_ctr + 0.5 * hs,
))
def _mkanchors_v2(ws, hs, x_ctr, y_ctr):
"""Given a vector of widths (ws) and heights (hs) around a center
(x_ctr, y_ctr), output a set of anchors (windows).
"""
ws, hs = ws[:, np.newaxis], hs[:, np.newaxis]
return np.hstack((0 * (ws) + x_ctr, 0 * (hs) + y_ctr, ws, hs))
def _ratio_enum(anchor, ratios, make_fn):
"""Enumerate a set of anchors for each aspect ratio wrt an anchor."""
w, h, x_ctr, y_ctr = _whctrs(anchor)
size = w * h
size_ratios = size / ratios
hs = np.round(np.sqrt(size_ratios))
ws = np.round(hs * ratios)
return make_fn(ws, hs, x_ctr, y_ctr)
def _max_size_enum(base_anchor, min_size, max_size, make_fn):
"""Enumerate a anchor for max_size wrt base_anchor."""
w, h, x_ctr, y_ctr = _whctrs(base_anchor)
ws = hs = np.sqrt([min_size * max_size])
return make_fn(ws, hs, x_ctr, y_ctr)
if __name__ == '__main__':
print(generate_anchors(min_sizes=[30], max_sizes=[60], ratios=[1]))
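    # The extra anchor added for a (min_size, max_size) pair has side
    # sqrt(min_size * max_size), i.e. sqrt(30 * 60) ~= 42.43 above.
    print(np.sqrt(30 * 60))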
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import types
import dragon.vm.torch as torch
import numpy as np
from seetadet.core.config import cfg
from seetadet.modeling.detector import new_detector
from seetadet.utils import blob as blob_util
from seetadet.utils import boxes as box_util
from seetadet.utils import image as image_util
from seetadet.utils import logger
from seetadet.utils import nms as nms_util
from seetadet.utils import time_util
def get_data(raw_images):
"""Return the test data."""
images_wide, image_scales_wide = [], []
for img in raw_images:
images, image_scales = image_util.scale_image(
img, scales=cfg.TEST.SCALES, max_size=0)
images_wide += images
image_scales_wide += image_scales
images_wide = blob_util.im_list_to_blob(
images_wide, coarsest_stride=cfg.MODEL.COARSEST_STRIDE)
return images_wide, image_scales_wide
def ims_detect(detector, raw_images, timer=None):
"""Detect images at single or multiple scales."""
images, image_scales = get_data(raw_images)
timer.tic() if timer else timer
# Do forward
inputs = {'image': torch.from_numpy(images)}
if not hasattr(detector, 'script_forward'):
def script_forward(self, image):
return self.forward({'image': image})
detector.script_forward = torch.jit.trace(
func=types.MethodType(script_forward, detector),
example_inputs=[inputs['image']],
)
outputs = detector.script_forward(inputs['image'])
timer.toc() if timer else timer
# Decode results
batch_pred = outputs['bbox_pred'].numpy()
batch_scores = outputs['cls_prob'].numpy()
results = [([], []) for _ in range(len(raw_images))]
for i in range(len(images)):
boxes = box_util.bbox_transform_inv(
outputs['prior_boxes'], batch_pred[i],
cfg.BBOX_REG_WEIGHTS)
boxes[:, 0::2] /= image_scales[i][1]
boxes[:, 1::2] /= image_scales[i][0]
boxes = box_util.clip_boxes(boxes, raw_images[i].shape)
results[i // len(cfg.TEST.SCALES)][0].append(batch_scores[i])
results[i // len(cfg.TEST.SCALES)][1].append(boxes)
# Merge from multiple scales
ret = [(np.vstack(s), np.vstack(b)) for s, b in results]
timer.toc() if timer else timer
return ret
def get_detections(outputs):
"""Return the categorical detections from outputs."""
scores, boxes = outputs
boxes_this_image = [[]]
for j in range(1, len(cfg.MODEL.CLASSES)):
inds = np.where(scores[:, j] > cfg.TEST.SCORE_THRESH)[0]
cls_scores = scores[inds, j]
cls_boxes = boxes[inds]
pre_nms_inds = np.argsort(-cls_scores)[:cfg.TEST.PRE_NMS_TOP_N]
cls_scores = cls_scores[pre_nms_inds]
cls_boxes = cls_boxes[pre_nms_inds]
cls_detections = np.hstack(
(cls_boxes, cls_scores[:, np.newaxis])) \
.astype(np.float32, copy=False)
if cfg.TEST.USE_SOFT_NMS:
keep = nms_util.soft_nms(
cls_detections,
thresh=cfg.TEST.NMS,
method=cfg.TEST.SOFT_NMS_METHOD,
sigma=cfg.TEST.SOFT_NMS_SIGMA,
)
else:
keep = nms_util.nms(
cls_detections,
thresh=cfg.TEST.NMS,
)
cls_detections = cls_detections[keep, :]
boxes_this_image.append(cls_detections)
return [boxes_this_image]
def test_net(weights, q_in, q_out, device, root_logger=True):
"""Test a network trained with SSD algorithm."""
cfg.GPU_ID = device
logger.set_root_logger(root_logger)
detector = new_detector(device, weights)
timers = time_util.new_timers('im_detect_bbox', 'misc')
must_stop = False
while not must_stop:
# Wait inputs.
indices, raw_images = [], []
for _ in range(cfg.TEST.IMS_PER_BATCH):
i, raw_image = q_in.get()
if i < 0:
must_stop = True
break
indices.append(i)
raw_images.append(raw_image)
if len(raw_images) == 0:
continue
# Detect on specific scales.
all_outputs = ims_detect(
detector=detector,
raw_images=raw_images,
timer=timers['im_detect_bbox'],
)
# Post-processing.
for i, outputs in enumerate(all_outputs):
with timers['misc'].tic_and_toc():
boxes_this_image, = get_detections(outputs)
q_out.put((
indices[i],
dict([('im_detect', timers['im_detect_bbox'].average_time),
('misc', timers['misc'].average_time)]),
dict([('boxes', boxes_this_image)]),
))
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
import numpy as np
import numpy.random as npr
import PIL.Image
import PIL.ImageEnhance
from seetadet.core.config import cfg
from seetadet.utils import boxes as box_util
from seetadet.utils import boxes_v2 as box_util_v2
from seetadet.utils import image as image_util
class Compose(object):
"""Compose the several transforms together."""
def __init__(self, *transforms):
self.transforms = transforms
def __call__(self, img, boxes):
for transform in self.transforms:
img, boxes = transform.apply(img, boxes)
return img, boxes
class Distort(object):
"""Distort the brightness, contrast and color of image."""
def __init__(self):
self._prob = 0.5 if cfg.TRAIN.USE_COLOR_JITTER else 0
def apply(self, img, boxes=None):
if self._prob > 0:
transforms = [PIL.ImageEnhance.Brightness,
PIL.ImageEnhance.Contrast,
PIL.ImageEnhance.Color]
npr.shuffle(transforms)
img = PIL.Image.fromarray(img)
for transform in transforms:
if npr.uniform() < self._prob:
img = transform(img)
img = img.enhance(1. + npr.uniform(-.4, .4))
img = np.array(img)
return img, boxes
class Expand(object):
"""Expand image to get smaller objects."""
def __init__(self):
self._max_ratio = 1. / cfg.TRAIN.RANDOM_SCALES[0]
self._expand_prob = 0.5 if self._max_ratio > 1 else 0
def apply(self, img, boxes=None):
prob = npr.uniform()
if prob > self._expand_prob:
return img, boxes
ratio = npr.uniform(1., self._max_ratio)
im_h, im_w = img.shape[:2]
expand_h, expand_w = int(im_h * ratio), int(im_w * ratio)
h_off = int(math.floor(npr.uniform(0., expand_h - im_h)))
w_off = int(math.floor(npr.uniform(0., expand_w - im_w)))
new_img = np.empty((expand_h, expand_w, 3), dtype=np.uint8)
new_img[:] = cfg.PIXEL_MEANS
new_img[h_off:h_off + im_h, w_off:w_off + im_w, :] = img
if boxes is not None:
new_boxes = boxes.astype(boxes.dtype, copy=True)
new_boxes[:, 0] = (boxes[:, 0] * im_w + w_off) / expand_w
new_boxes[:, 1] = (boxes[:, 1] * im_h + h_off) / expand_h
new_boxes[:, 2] = (boxes[:, 2] * im_w + w_off) / expand_w
new_boxes[:, 3] = (boxes[:, 3] * im_h + h_off) / expand_h
boxes = new_boxes
return new_img, boxes
class Resize(object):
"""Resize image."""
def __init__(self):
self._target_size = (cfg.TRAIN.SCALES[0],) * 2
def apply(self, img, boxes):
return image_util.resize_image(img, size=self._target_size), boxes
class Sample(object):
"""Crop image by sampling a region restricted by bounding boxes."""
def __init__(self):
min_scale, max_scale = \
cfg.PIPELINE.RANDOM_BBOX_CROP.SCALING
min_aspect_ratio, max_aspect_ratio = \
cfg.PIPELINE.RANDOM_BBOX_CROP.ASPECT_RATIO
self._samplers = [{'min_scale': 1.0,
'max_scale': 1.0,
'min_aspect_ratio': 1.0,
'max_aspect_ratio': 1.0,
'min_overlap': 0.0,
'max_overlap': 1.0,
'max_trials': 1,
'max_sample': 1}]
for min_overlap in cfg.PIPELINE.RANDOM_BBOX_CROP.THRESHOLDS:
self._samplers.append({'min_scale': min_scale,
'max_scale': max_scale,
'min_aspect_ratio': min_aspect_ratio,
'max_aspect_ratio': max_aspect_ratio,
'min_overlap': min_overlap,
'max_overlap': 1.0,
'max_trials': 10,
'max_sample': 1})
@classmethod
def _compute_overlaps(cls, rand_box, gt_boxes):
return box_util_v2.iou(np.expand_dims(rand_box, 0), gt_boxes[:, 0:4])
@classmethod
def _generate_sample(cls, sample_param):
min_scale = sample_param.get('min_scale', 1.)
max_scale = sample_param.get('max_scale', 1.)
scale = npr.uniform(min_scale, max_scale)
min_aspect_ratio = sample_param.get('min_aspect_ratio', 1.)
max_aspect_ratio = sample_param.get('max_aspect_ratio', 1.)
min_aspect_ratio = max(min_aspect_ratio, scale**2)
max_aspect_ratio = min(max_aspect_ratio, 1. / (scale**2))
aspect_ratio = npr.uniform(min_aspect_ratio, max_aspect_ratio)
bbox_w = scale * (aspect_ratio ** 0.5)
bbox_h = scale / (aspect_ratio ** 0.5)
w_off = npr.uniform(0., 1. - bbox_w)
h_off = npr.uniform(0., 1. - bbox_h)
return np.array([w_off, h_off, w_off + bbox_w, h_off + bbox_h])
@staticmethod
def _check_center(sample_box, gt_boxes):
ctr_x = (gt_boxes[:, 2] + gt_boxes[:, 0]) / 2.0
ctr_y = (gt_boxes[:, 3] + gt_boxes[:, 1]) / 2.0
# Keep the ground-truth box whose center is in the sample box
keep_indices = np.where((ctr_x >= sample_box[0]) & (ctr_x <= sample_box[2]) &
(ctr_y >= sample_box[1]) & (ctr_y <= sample_box[3]))[0]
return len(keep_indices) > 0
def _check_overlap(self, sample_box, gt_boxes, constraint):
min_overlap = constraint.get('min_overlap', None)
max_overlap = constraint.get('max_overlap', None)
if min_overlap is None and \
max_overlap is None:
return True
ovr = self._compute_overlaps(sample_box, gt_boxes).max()
if min_overlap is not None:
if ovr < min_overlap:
return False
if max_overlap is not None:
if ovr > max_overlap:
return False
return True
def _generate_batch_samples(self, gt_boxes):
sample_boxes = []
for sampler in self._samplers:
found = 0
for i in range(sampler['max_trials']):
if found >= sampler['max_sample']:
break
sample_box = self._generate_sample(sampler)
if sampler['min_overlap'] != 0. or \
sampler['max_overlap'] != 1.:
if not self._check_overlap(sample_box, gt_boxes, sampler):
continue
if not self._check_center(sample_box, gt_boxes):
continue
found += 1
sample_boxes.append(sample_box)
return sample_boxes
@classmethod
def _rand_crop(cls, im, rand_box, gt_boxes=None):
im_h, im_w = im.shape[:2]
w_off = int(rand_box[0] * im_w)
h_off = int(rand_box[1] * im_h)
crop_w = int((rand_box[2] - rand_box[0]) * im_w)
crop_h = int((rand_box[3] - rand_box[1]) * im_h)
new_im = im[h_off:h_off + crop_h, w_off:w_off + crop_w, :]
if gt_boxes is not None:
ctr_x = (gt_boxes[:, 2] + gt_boxes[:, 0]) / 2.0
ctr_y = (gt_boxes[:, 3] + gt_boxes[:, 1]) / 2.0
keep_indices = np.where((ctr_x >= rand_box[0]) & (ctr_x <= rand_box[2]) &
(ctr_y >= rand_box[1]) & (ctr_y <= rand_box[3]))[0]
gt_boxes = gt_boxes[keep_indices]
new_gt_boxes = gt_boxes.astype(gt_boxes.dtype, copy=True)
new_gt_boxes[:, 0] = (gt_boxes[:, 0] * im_w - w_off)
new_gt_boxes[:, 1] = (gt_boxes[:, 1] * im_h - h_off)
new_gt_boxes[:, 2] = (gt_boxes[:, 2] * im_w - w_off)
new_gt_boxes[:, 3] = (gt_boxes[:, 3] * im_h - h_off)
new_gt_boxes = box_util.clip_boxes(new_gt_boxes, (crop_h, crop_w))
new_gt_boxes[:, 0] = new_gt_boxes[:, 0] / crop_w
new_gt_boxes[:, 1] = new_gt_boxes[:, 1] / crop_h
new_gt_boxes[:, 2] = new_gt_boxes[:, 2] / crop_w
new_gt_boxes[:, 3] = new_gt_boxes[:, 3] / crop_h
return new_im, new_gt_boxes
return new_im, gt_boxes
def apply(self, img, boxes):
sample_boxes = self._generate_batch_samples(boxes)
if len(sample_boxes) > 0:
# Apply sampling if found at least one valid sample box
# Then randomly pick one
sample_idx = npr.randint(len(sample_boxes))
rand_box = sample_boxes[sample_idx]
img, boxes = self._rand_crop(img, rand_box, boxes)
return img, boxes
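# A hedged usage sketch (relies on the default cfg values above; the
# toy image and normalized boxes below are illustrative).
if __name__ == '__main__':
    transform = Compose(Distort(), Expand(), Sample(), Resize())
    toy_img = np.full((300, 300, 3), 128, dtype=np.uint8)
    # One normalized box: (x1, y1, x2, y2, class_index).
    toy_boxes = np.array([[0.25, 0.25, 0.75, 0.75, 1.]], 'float32')
    toy_img, toy_boxes = transform(toy_img, toy_boxes)
    print(toy_img.shape)  # (TRAIN.SCALES[0], TRAIN.SCALES[0], 3)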
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
def get_shifted_anchors(shapes, base_anchors, strides):
"""Return the shifted anchors on given shapes."""
anchors_to_pack = []
for i in range(len(shapes)):
height, width = shapes[i]
shift_x = (np.arange(0, width) + 0.5) * strides[i]
shift_y = (np.arange(0, height) + 0.5) * strides[i]
shift_x, shift_y = np.meshgrid(shift_x, shift_y)
shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
shift_x.ravel(), shift_y.ravel())).transpose()
# Add A anchors (1, A, 4) to K shifts (K, 1, 4)
# to get shifted anchors (K, A, 4) and reshape to (K * A, 4)
a = base_anchors[i].shape[0]
k = shifts.shape[0]
anchors = (base_anchors[i].reshape((1, a, 4)) +
shifts.reshape((1, k, 4)).transpose((1, 0, 2)))
anchors_to_pack.append(anchors.reshape((k * a, 4)))
return np.vstack(anchors_to_pack)
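# A tiny worked example (toy values): a 2x2 grid with stride 8 and a
# single 16x16 base anchor centered at the origin.
if __name__ == '__main__':
    base = [np.array([[-8., -8., 8., 8.]])]
    print(get_shifted_anchors([(2, 2)], base, [8]))
    # Anchor centers land at (4, 4), (12, 4), (4, 12), (12, 12).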
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Platform backend."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import ctypes
import importlib.machinery
import os
import types
from dragon.vm import torch
def load_library(library_prefix):
"""Load a shared library."""
loader_details = (importlib.machinery.ExtensionFileLoader,
importlib.machinery.EXTENSION_SUFFIXES)
library_prefix = os.path.abspath(library_prefix)
lib_dir, fullname = os.path.split(library_prefix)
finder = importlib.machinery.FileFinder(lib_dir, loader_details)
ext_specs = finder.find_spec(fullname)
if ext_specs is None:
raise ImportError('Could not find the pre-built library '
'for <%s>.' % library_prefix)
ctypes.cdll.LoadLibrary(ext_specs.origin)
def trace_module(module, name, func, example_inputs=None):
"""Trace the function and bound to module."""
setattr(module, name, torch.jit.trace(
func=types.MethodType(func, module),
example_inputs=example_inputs))
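# Hedged usage note: the inference code elsewhere in this commit traces
# a bound forward exactly once via this helper, e.g.
#
#     trace_module(model, 'run_inference', run_inference)
#
# after which ``model.run_inference(...)`` invokes the traced graph.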
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
from seetadet.utils.attrdict import AttrDict
cfg = __C = AttrDict()
###########################################
# #
# Pipeline Options #
# #
###########################################
__C.PIPELINE = AttrDict()
# The pipeline type
# Values supported as follows:
# - 'ssd'
# - 'rcnn'
# - 'default'
__C.PIPELINE.TYPE = 'default'
# RandomBBoxCrop
__C.PIPELINE.RANDOM_BBOX_CROP = AttrDict()
# - The range of scale for sampling regions
__C.PIPELINE.RANDOM_BBOX_CROP.SCALING = [0.3, 1.0]
# - The range of aspect ratio for sampling regions
__C.PIPELINE.RANDOM_BBOX_CROP.ASPECT_RATIO = [0.5, 2.0]
# - The minimum IoU to satisfy
__C.PIPELINE.RANDOM_BBOX_CROP.THRESHOLDS = [0.0, 0.1, 0.3, 0.5, 0.7, 0.9]
###########################################
# #
# Training Options #
# #
###########################################
__C.TRAIN = AttrDict()
# Initialize network with weights from this file
__C.TRAIN.WEIGHTS = ''
# Dataset to train
__C.TRAIN.DATASET = ''
# The number of threads to load train data
__C.TRAIN.NUM_THREADS = 4
# Scales to use during training (can list multiple scales)
# Each scale is the pixel size of an image's shortest side
__C.TRAIN.SCALES = (640,)
# Range to jitter the selected scale
__C.TRAIN.RANDOM_SCALES = [1., 1.]
# Max pixel size of the longest side of a scaled input image
__C.TRAIN.MAX_SIZE = 0
# Images to use per mini-batch
__C.TRAIN.IMS_PER_BATCH = 1
# The number of training batches to init for aspect grouping
__C.TRAIN.ASPECT_GROUPING = 64
# Use shuffled images during training?
__C.TRAIN.USE_SHUFFLE = True
# Use horizontally-flipped images during training?
__C.TRAIN.USE_FLIPPED = True
# Use the difficult (under occlusion) objects
__C.TRAIN.USE_DIFF = True
# If True, distort the brightness, contrast, and saturation
__C.TRAIN.USE_COLOR_JITTER = False
# NMS threshold used on RPN proposals
__C.TRAIN.RPN_NMS_THRESH = 0.7
# Number of top scoring boxes to keep before NMS to RPN proposals
__C.TRAIN.RPN_PRE_NMS_TOP_N = 12000
# Number of top scoring boxes to keep after NMS to RPN proposals
__C.TRAIN.RPN_POST_NMS_TOP_N = 2000
###########################################
# #
# Testing Options #
# #
###########################################
__C.TEST = AttrDict()
# Dataset to test
__C.TEST.DATASET = ''
# The test protocol for dataset
# Available protocols: 'voc2007', 'voc2010', 'coco'
__C.TEST.PROTOCOL = 'voc2007'
# Original json ground-truth file to use
__C.TEST.JSON_FILE = ''
# Scales to use during testing (can list multiple scales)
# Each scale is the pixel size of an image's shortest side
__C.TEST.SCALES = (640,)
# Max pixel size of the longest side of a scaled input image
__C.TEST.MAX_SIZE = 0
# Images to use per mini-batch
__C.TEST.IMS_PER_BATCH = 1
# The threshold for predicting boxes
__C.TEST.SCORE_THRESH = 0.05
# The threshold for predicting masks
__C.TEST.BINARY_THRESH = 0.5
# Number of top scoring boxes to keep before NMS to detections
__C.TEST.PRE_NMS_TOP_N = 300
# Overlap threshold used for NMS
__C.TEST.NMS = 0.3
# Use Soft-NMS instead of standard NMS?
# For the soft NMS overlap threshold, we simply use TEST.NMS
__C.TEST.USE_SOFT_NMS = False
__C.TEST.SOFT_NMS_METHOD = 'linear'
__C.TEST.SOFT_NMS_SIGMA = 0.5
# NMS threshold used on RPN proposals
__C.TEST.RPN_NMS_THRESH = 0.7
# Number of top scoring boxes to keep before NMS to RPN proposals
__C.TEST.RPN_PRE_NMS_TOP_N = 6000
# Number of top scoring boxes to keep after NMS to RPN proposals
__C.TEST.RPN_POST_NMS_TOP_N = 1000
# Number of top scoring boxes to keep before NMS to RetinaNet detections
__C.TEST.RETINANET_PRE_NMS_TOP_N = 3000
# Save detection results files if True
# If false, results files are cleaned up after evaluation
__C.TEST.COMPETITION_MODE = True
# Maximum number of detections to return per image
# 100 is based on the limit established for the COCO dataset
__C.TEST.DETECTIONS_PER_IM = 100
###########################################
# #
# Model Options #
# #
###########################################
__C.MODEL = AttrDict()
# The model type
# Values supported as follows:
# - 'faster_rcnn'
# - 'mask_rcnn'
# - 'retinanet'
# - 'ssd'
__C.MODEL.TYPE = ''
# The float precision for training and inference
# Values supported: 'FLOAT32', 'FLOAT16'
__C.MODEL.PRECISION = 'FLOAT32'
# The backbone
__C.MODEL.BACKBONE = ''
# The backbone normalization module
# Values supported: 'FrozenBN', 'BN'
__C.MODEL.BACKBONE_NORM = 'FrozenBN'
# The name for each object class
__C.MODEL.CLASSES = ['__background__']
# Freeze the gradients starting from convolution stage K
# The value of ``K`` is usually set to 2
__C.MODEL.FREEZE_AT = 2
# The variant of ReLU activation
# Values supported: 'ReLU', 'ReLU6'
__C.MODEL.RELU_VARIANT = 'ReLU'
# Setting of focal loss
__C.MODEL.FOCAL_LOSS_ALPHA = 0.25
__C.MODEL.FOCAL_LOSS_GAMMA = 2.0
# Stride of the coarsest feature level
# This is needed so the input can be padded properly
__C.MODEL.COARSEST_STRIDE = 32
###########################################
# #
# RPN Options #
# #
###########################################
__C.RPN = AttrDict()
# Total number of rpn training examples per image
__C.RPN.BATCH_SIZE = 256
# Target fraction of foreground examples per training batch
__C.RPN.FG_FRACTION = 0.5
# Strides for multiple rpn heads
__C.RPN.STRIDES = [4, 8, 16, 32, 64]
# Scales for multiple anchors
__C.RPN.SCALES = [8, 8, 8, 8, 8]
# RPN anchor aspect ratios
__C.RPN.ASPECT_RATIOS = [0.5, 1, 2]
# IoU overlap ratio for labeling an anchor as positive
# Anchors with >= iou overlap are labeled positive
__C.RPN.POSITIVE_OVERLAP = 0.7
# IoU overlap ratio for labeling an anchor as negative
# Anchors with < iou overlap are labeled negative
__C.RPN.NEGATIVE_OVERLAP = 0.3
# The optional loss for bbox regression
# Values supported: 'l1', 'smooth_l1'
__C.RPN.BBOX_REG_LOSS_TYPE = 'l1'
# Weight for bbox regression loss
__C.RPN.BBOX_REG_LOSS_WEIGHT = 1.0
###########################################
# #
# Retina-Net Options #
# #
###########################################
__C.RETINANET = AttrDict()
# Anchor aspect ratios to use
__C.RETINANET.ASPECT_RATIOS = (0.5, 1.0, 2.0)
# Anchor scales per octave
__C.RETINANET.SCALES_PER_OCTAVE = 3
# At each FPN level, we generate anchors based on their scale, aspect_ratio,
# stride of the level, and we multiply the resulting anchor by ANCHOR_SCALE
__C.RETINANET.ANCHOR_SCALE = 4
# Convolutions to use in the cls and bbox tower
# NOTE: this doesn't include the last conv for logits
__C.RETINANET.NUM_CONVS = 4
# IoU overlap ratio for labeling an anchor as positive
# Anchors with >= iou overlap are labeled positive
__C.RETINANET.POSITIVE_OVERLAP = 0.5
# IoU overlap ratio for labeling an anchor as negative
# Anchors with < iou overlap are labeled negative
__C.RETINANET.NEGATIVE_OVERLAP = 0.4
# The optional loss for bbox regression
# Values supported: 'l1', 'smooth_l1', 'giou'
__C.RETINANET.BBOX_REG_LOSS_TYPE = 'l1'
# Weight for bbox regression loss
__C.RETINANET.BBOX_REG_LOSS_WEIGHT = 1.0
###########################################
# #
# FPN Options #
# #
###########################################
__C.FPN = AttrDict()
# Channel dimension of the FPN feature levels
__C.FPN.DIM = 256
# Coarsest level of the FPN pyramid
__C.FPN.RPN_MAX_LEVEL = 6
# Finest level of the FPN pyramid
__C.FPN.RPN_MIN_LEVEL = 2
# Hyper-Parameters for the RoI-to-FPN level mapping heuristic
__C.FPN.ROI_CANONICAL_SCALE = 224
__C.FPN.ROI_CANONICAL_LEVEL = 4
# Coarsest level of the FPN pyramid
__C.FPN.ROI_MAX_LEVEL = 5
# Finest level of the FPN pyramid
__C.FPN.ROI_MIN_LEVEL = 2
###########################################
# #
# Fast R-CNN Options #
# #
###########################################
__C.FRCNN = AttrDict()
# Total number of training RoIs per image
__C.FRCNN.BATCH_SIZE = 128
# Target fraction of foreground RoIs per training batch
__C.FRCNN.FG_FRACTION = 0.25
# IoU overlap ratio for labeling a RoI as positive
# RoIs with >= iou overlap are labeled positive
__C.FRCNN.POSITIVE_OVERLAP = 0.5
# IoU overlap ratio for labeling a RoI as negative
# RoIs with iou overlap in [LO, HI) are labeled negative
__C.FRCNN.NEGATIVE_OVERLAP_HI = 0.5
__C.FRCNN.NEGATIVE_OVERLAP_LO = 0.0
# RoI transform function
# Values supported: 'RoIAlign', 'RoIPool'
__C.FRCNN.ROI_XFORM_METHOD = 'RoIAlign'
# RoI transform output resolution
__C.FRCNN.ROI_XFORM_RESOLUTION = 7
# Resampling window size for RoI transformation
__C.FRCNN.ROI_XFORM_SAMPLING_RATIO = 0
# Hidden layer dimension when using an MLP for the RoI box head
__C.FRCNN.MLP_HEAD_DIM = 1024
# The optional loss for bbox regression
# Values supported: 'l1', 'smooth_l1'
__C.FRCNN.BBOX_REG_LOSS_TYPE = 'l1'
# Weight for bbox regression loss
__C.FRCNN.BBOX_REG_LOSS_WEIGHT = 1.0
###########################################
# #
# Mask R-CNN Options #
# #
###########################################
__C.MRCNN = AttrDict()
# Resolution of mask predictions
__C.MRCNN.RESOLUTION = 28
# RoI transform function
# Values supported: 'RoIAlign', 'RoIPool'
__C.MRCNN.ROI_XFORM_METHOD = 'RoIAlign'
# RoI transform output resolution
__C.MRCNN.ROI_XFORM_RESOLUTION = 14
# Resampling window size for RoI transformation
__C.MRCNN.ROI_XFORM_SAMPLING_RATIO = 0
###########################################
# #
# SSD Options #
# #
###########################################
__C.SSD = AttrDict()
# Convolutions to use in the cls and bbox tower
# NOTE: this doesn't include the last conv for logits
__C.SSD.NUM_CONVS = 0
# Anchor aspect ratios to use
__C.SSD.ASPECT_RATIOS = []
# Strides for multiple ssd heads
__C.SSD.STRIDES = []
# Anchor sizes to use
__C.SSD.ANCHOR_SIZES = []
# IoU overlap ratio for labeling an anchor as positive
# Anchors with >= iou overlap are labeled positive
__C.SSD.POSITIVE_OVERLAP = 0.5
# IoU overlap ratio for labeling an anchor as negative
# Anchors with < iou overlap are labeled negative
__C.SSD.NEGATIVE_OVERLAP = 0.5
# The ratio to sample negative anchors as background
__C.SSD.NEGATIVE_POSITIVE_RATIO = 3.0
# The optional loss for bbox regression
# Values supported: 'l1', 'smooth_l1', 'giou'
__C.SSD.BBOX_REG_LOSS_TYPE = 'l1'
# Weight for bbox regression loss
__C.SSD.BBOX_REG_LOSS_WEIGHT = 1.0
###########################################
# #
# ResNet Options #
# #
###########################################
__C.RESNET = AttrDict()
# Number of groups to use
# 1 ==> ResNet; > 1 ==> ResNeXt
# ResNext 32x8d: NUM_GROUPS, WIDTH_PER_GROUP = 32, 8
# ResNext 64x4d: NUM_GROUPS, WIDTH_PER_GROUP = 64, 4
__C.RESNET.NUM_GROUPS = 1
# Baseline width of each group
__C.RESNET.WIDTH_PER_GROUP = 64
###########################################
# #
# Solver Options #
# #
###########################################
__C.SOLVER = AttrDict()
# The interval to display logs
__C.SOLVER.DISPLAY = 20
# The interval to snapshot a model
__C.SOLVER.SNAPSHOT_EVERY = 5000
# Prefix to yield the path: <prefix>_iter_XYZ.pkl
__C.SOLVER.SNAPSHOT_PREFIX = ''
# Optional scaling factor for total loss
# This option is helpful to scale the magnitude
# of gradients during FP16 training
__C.SOLVER.LOSS_SCALING = 1.0
# Maximum number of SGD iterations
__C.SOLVER.MAX_STEPS = 40000
# Base learning rate for the specified schedule
__C.SOLVER.BASE_LR = 0.001
# The uniform interval for LRScheduler
__C.SOLVER.DECAY_STEP = 1
# The custom intervals for LRScheduler
__C.SOLVER.DECAY_STEPS = []
# The decay factor for exponential LRScheduler
__C.SOLVER.DECAY_GAMMA = 0.1
# Warm up to ``BASE_LR`` over this number of steps
__C.SOLVER.WARM_UP_STEPS = 500
# Start the warm up from ``BASE_LR`` * ``FACTOR``
__C.SOLVER.WARM_UP_FACTOR = 0.333
# The type of LRScheduler
__C.SOLVER.LR_POLICY = 'steps_with_decay'
# Momentum to use with SGD
__C.SOLVER.MOMENTUM = 0.9
# L2 regularization for weight parameters
__C.SOLVER.WEIGHT_DECAY = 0.0001
# L2 regularization for legacy bias parameters
__C.SOLVER.WEIGHT_DECAY_BIAS = 0.0
# L2 norm factor for clipping gradients
__C.SOLVER.CLIP_NORM = 0.0
###########################################
# #
# Misc Options #
# #
###########################################
# Number of GPUs to use during training
__C.NUM_GPUS = 1
# Use NCCL for all-reduce, otherwise use CUDA-aware MPI
__C.USE_NCCL = True
# Hosts for Inter-Machine communication
__C.HOSTS = []
# Pixel stddev and mean values (BGR order)
__C.PIXEL_STDS = [1.0, 1.0, 1.0]
__C.PIXEL_MEANS = [103.53, 116.28, 123.675]
# Default weights on (dx, dy, dw, dh) for normalizing bbox regression targets
# These are empirically chosen to approximately lead to unit variance targets
__C.BBOX_REG_WEIGHTS = (10., 10., 5., 5.)
# Prior prob for the positives at the beginning of training.
# This is used to set the bias init for the logits layer
__C.PRIOR_PROB = 0.01
# For reproducibility
__C.RNG_SEED = 3
# Place outputs under an experiments directory
__C.EXP_DIR = ''
# Default GPU device index
__C.GPU_ID = 0
# Show detection visualizations
__C.VIS = False
# Write detection visualizations instead of showing
__C.VIS_ON_FILE = False
# Score threshold for visualization
__C.VIS_TH = 0.7
# Write summaries by TensorBoard
__C.ENABLE_TENSOR_BOARD = False
def cfg_from_file(filename):
"""Load a config file and merge it into the default options."""
import yaml
with open(filename, 'r') as f:
yaml_cfg = AttrDict(yaml.safe_load(f))
global __C
_merge_a_into_b(yaml_cfg, __C)
def cfg_from_list(cfg_list):
"""Set config keys via list (e.g., from command line)."""
from ast import literal_eval
assert len(cfg_list) % 2 == 0
for k, v in zip(cfg_list[0::2], cfg_list[1::2]):
key_list = k.split('.')
d = __C
for sub_key in key_list[:-1]:
assert sub_key in d
d = d[sub_key]
sub_key = key_list[-1]
assert sub_key in d
try:
value = literal_eval(v)
except: # noqa
# Handle the case when v is a string literal
value = v
if type(value) != type(d[sub_key]): # noqa
raise TypeError('Type {} does not match original type {}'
.format(type(value), type(d[sub_key])))
d[sub_key] = value
def _merge_a_into_b(a, b):
"""Merge config dictionary a into config dictionary b, clobbering the
options in b whenever they are also specified in a."""
if not isinstance(a, dict):
return
for k, v in a.items():
# a must specify keys that are in b
if k not in b:
raise KeyError('{} is not a valid config key'.format(k))
# The types must match, too
v = _check_and_coerce_cfg_value_type(v, b[k], k)
# Recursively merge dicts
if type(v) is AttrDict:
try:
_merge_a_into_b(a[k], b[k])
except: # noqa
print('Error under config key: {}'.format(k))
raise
else:
b[k] = v
def _check_and_coerce_cfg_value_type(value_a, value_b, key):
"""Check if the value type matched."""
type_a, type_b = type(value_a), type(value_b)
if type_a is type_b:
return value_a
if type_b is float and type_a is int:
return float(value_a)
# Exceptions: numpy arrays, strings, tuple<->list
if isinstance(value_b, np.ndarray):
value_a = np.array(value_a, dtype=value_b.dtype)
elif isinstance(value_a, tuple) and isinstance(value_b, list):
value_a = list(value_a)
elif isinstance(value_a, list) and isinstance(value_b, tuple):
value_a = tuple(value_a)
elif isinstance(value_a, dict) and isinstance(value_b, AttrDict):
value_a = AttrDict(value_a)
else:
raise ValueError(
'Type mismatch ({} vs. {}) with values ({} vs. {}) for config '
'key: {}'.format(type_b, type_a, value_b, value_a, key))
return value_a
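# A hedged usage sketch: override options from a flat key/value list
# (the values shown are illustrative).
if __name__ == '__main__':
    cfg_from_list(['TRAIN.IMS_PER_BATCH', '2', 'SOLVER.BASE_LR', '0.002'])
    print(cfg.TRAIN.IMS_PER_BATCH, cfg.SOLVER.BASE_LR)  # 2 0.002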
...@@ -8,9 +8,11 @@
 # <https://opensource.org/licenses/BSD-2-Clause>
 #
 # ------------------------------------------------------------
+"""Platform configurations."""
 from __future__ import absolute_import
 from __future__ import division
 from __future__ import print_function
-from seetadet.algo.common.anchor_sampler import AnchorSampler
+# Variables
+from seetadet.core.config.defaults import cfg  # noqa
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Default configurations."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.core.config.yacs import CfgNode
_C = cfg = CfgNode()
# ------------------------------------------------------------
# Augmentation options
# ------------------------------------------------------------
_C.AUG = CfgNode()
# The probability to distort the color
_C.AUG.COLOR_JITTER = 0.0
# The crop size
# Disable cropping always if crop size <= 0
_C.AUG.CROP_SIZE = 0
# ------------------------------------------------------------
# Training options
# ------------------------------------------------------------
_C.TRAIN = CfgNode()
# Initialize network with weights from this file
_C.TRAIN.WEIGHTS = ''
# The train dataset
_C.TRAIN.DATASET = ''
# The loader type for training
_C.TRAIN.LOADER = 'det_train'
# The number of workers to load train data
_C.TRAIN.NUM_WORKERS = 3
# Scales to use during training (can list multiple scales)
# Each scale is the pixel size of an image's shortest side
_C.TRAIN.SCALES = (640,)
# Range to jitter the image scales randomly
_C.TRAIN.SCALES_RANGE = (1.0, 1.0)
# Max pixel size of the longest side of a scaled input image
_C.TRAIN.MAX_SIZE = 1066
# Images to use per mini-batch
_C.TRAIN.IMS_PER_BATCH = 1
# Use the difficult (under occlusion) objects
_C.TRAIN.USE_DIFF = True
# ------------------------------------------------------------
# Testing options
# ------------------------------------------------------------
_C.TEST = CfgNode()
# The test dataset
_C.TEST.DATASET = ''
# The JSON format dataset with annotations for evaluation
_C.TEST.JSON_DATASET = ''
# The loader type for testing
_C.TEST.LOADER = 'det_test'
# The evaluator type for dataset
_C.TEST.EVALUATOR = ''
# Scales to use during testing (can list multiple scales)
# Each scale is the pixel size of an image's shortest side
_C.TEST.SCALES = (640,)
# Max pixel size of the longest side of a scaled input image
_C.TEST.MAX_SIZE = 1066
# Images to use per mini-batch
_C.TEST.IMS_PER_BATCH = 1
# The threshold for predicting boxes
_C.TEST.SCORE_THRESH = 0.05
# The threshold for predicting masks
_C.TEST.BINARY_THRESH = 0.5
# Overlap threshold used for NMS
_C.TEST.NMS_THRESH = 0.5
# Maximum number of detections to return per image
# 100 is based on the limit established for the COCO dataset
_C.TEST.DETECTIONS_PER_IM = 100
# ------------------------------------------------------------
# Model options
# ------------------------------------------------------------
_C.MODEL = CfgNode()
# The model type
_C.MODEL.TYPE = ''
# The compute precision
_C.MODEL.PRECISION = 'float32'
# The name for each object class
_C.MODEL.CLASSES = ['__background__']
# Pixel mean and stddev values for image normalization (BGR order)
_C.MODEL.PIXEL_MEAN = [103.53, 116.28, 123.675]
_C.MODEL.PIXEL_STD = [57.375, 57.12, 58.395]
# Focal loss parameters
_C.MODEL.FOCAL_LOSS_ALPHA = 0.25
_C.MODEL.FOCAL_LOSS_GAMMA = 2.0
# ------------------------------------------------------------
# Backbone options
# ------------------------------------------------------------
_C.BACKBONE = CfgNode()
# The backbone type
_C.BACKBONE.TYPE = ''
# The normalization in backbone modules
_C.BACKBONE.NORM = 'FrozenBN'
# Freeze backbone since the stage K
# The value of ``K`` is usually set to 2
_C.BACKBONE.FREEZE_AT = 2
# Stride of the coarsest feature
# This is needed so the input can be padded properly
_C.BACKBONE.COARSEST_STRIDE = 32
# ------------------------------------------------------------
# FPN options
# ------------------------------------------------------------
_C.FPN = CfgNode()
# Finest level of the FPN pyramid
_C.FPN.MIN_LEVEL = 3
# Coarsest level of the FPN pyramid
_C.FPN.MAX_LEVEL = 7
# The number of repeated fpn cells.
_C.FPN.NUM_CELLS = 1
# Channel dimension of the FPN feature levels
_C.FPN.DIM = 256
# The FPN conv module
_C.FPN.CONV = 'Conv2d'
# The fpn normalization module
_C.FPN.NORM = ''
# The fpn activation module
_C.FPN.ACTIVATION = ''
# The feature fusion method
# Values supported: 'sum', 'attn'
_C.FPN.FUSE_TYPE = 'sum'
# ------------------------------------------------------------
# Anchor generator options
# ------------------------------------------------------------
_C.ANCHOR_GENERATOR = CfgNode()
# The stride of each level
_C.ANCHOR_GENERATOR.STRIDES = [8, 16, 32, 64, 128]
# The anchor size of each stride
_C.ANCHOR_GENERATOR.SIZES = [[32], [64], [128], [256], [512]]
# The aspect ratios of each stride
_C.ANCHOR_GENERATOR.ASPECT_RATIOS = [[0.5, 1.0, 2.0]]
# ------------------------------------------------------------
# RPN options
# ------------------------------------------------------------
_C.RPN = CfgNode()
# Total number of rpn training anchors per image
_C.RPN.BATCH_SIZE = 256
# Fraction of foreground anchors per training batch
_C.RPN.POSITIVE_FRACTION = 0.5
# IoU overlap ratio for labeling an anchor as positive
# Anchors with >= iou overlap are labeled positive
_C.RPN.POSITIVE_OVERLAP = 0.7
# IoU overlap ratio for labeling an anchor as negative
# Anchors with < iou overlap are labeled negative
_C.RPN.NEGATIVE_OVERLAP = 0.3
# NMS threshold used on RPN proposals
_C.RPN.NMS_THRESH = 0.7
# Number of top scoring boxes to keep before NMS to RPN proposals
_C.RPN.PRE_NMS_TOP_N_TRAIN = 12000
_C.RPN.PRE_NMS_TOP_N_TEST = 6000
# Number of top scoring boxes to keep after NMS to RPN proposals
_C.RPN.POST_NMS_TOP_N_TRAIN = 2000
_C.RPN.POST_NMS_TOP_N_TEST = 1000
# The optional loss for bbox regression
_C.RPN.BBOX_REG_LOSS_TYPE = 'l1'
# Weight for bbox regression loss
_C.RPN.BBOX_REG_LOSS_WEIGHT = 1.0
# ------------------------------------------------------------
# RetinaNet options
# ------------------------------------------------------------
_C.RETINANET = CfgNode()
# Number of conv layers to stack in the head
_C.RETINANET.NUM_CONV = 4
# The head conv module
_C.RETINANET.CONV = 'Conv2d'
# The head normalization module
_C.RETINANET.NORM = ''
# The head activation module
_C.RETINANET.ACTIVATION = 'ReLU'
# IoU overlap ratio for labeling an anchor as positive
# Anchors with >= iou overlap are labeled positive
_C.RETINANET.POSITIVE_OVERLAP = 0.5
# IoU overlap ratio for labeling an anchor as negative
# Anchors with < iou overlap are labeled negative
_C.RETINANET.NEGATIVE_OVERLAP = 0.4
# Number of top scoring boxes to keep before NMS
_C.RETINANET.PRE_NMS_TOP_N = 6000
# The bbox regression loss type
_C.RETINANET.BBOX_REG_LOSS_TYPE = 'l1'
# The weight for bbox regression loss
_C.RETINANET.BBOX_REG_LOSS_WEIGHT = 1.0
# ------------------------------------------------------------
# FastRCNN options
# ------------------------------------------------------------
_C.FRCNN = CfgNode()
# Total number of training RoIs per image
_C.FRCNN.BATCH_SIZE = 512
# The finest level of RoI feature
_C.FRCNN.MIN_LEVEL = 2
# The coarsest level of RoI feature
_C.FRCNN.MAX_LEVEL = 5
# Fraction of foreground RoIs per training batch
_C.FRCNN.POSITIVE_FRACTION = 0.25
# IoU overlap ratio for labeling a RoI as positive
# RoIs with >= iou overlap are labeled positive
_C.FRCNN.POSITIVE_OVERLAP = 0.5
# IoU overlap ratio for labeling a RoI as negative
# RoIs with < iou overlap are labeled negative
_C.FRCNN.NEGATIVE_OVERLAP = 0.5
# RoI pooler type
_C.FRCNN.POOLER_TYPE = 'RoIAlign'
# The output size of the RoI pooler
_C.FRCNN.POOLER_RESOLUTION = 7
# The resampling window size of RoI pooler
_C.FRCNN.POOLER_SAMPLING_RATIO = 0
# The number of conv layers to stack in the head
_C.FRCNN.NUM_CONV = 0
# The number of fc layers to stack in the head
_C.FRCNN.NUM_FC = 2
# The hidden dimension of conv head
_C.FRCNN.CONV_HEAD_DIM = 256
# The hidden dimension of fc head
_C.FRCNN.FC_HEAD_DIM = 1024
# The head normalization module
_C.FRCNN.NORM = ''
# The bbox regression loss type
_C.FRCNN.BBOX_REG_LOSS_TYPE = 'l1'
# The weight for bbox regression loss
_C.FRCNN.BBOX_REG_LOSS_WEIGHT = 1.0
# The weights on (dx, dy, dw, dh) for normalizing bbox regression targets
_C.FRCNN.BBOX_REG_WEIGHTS = (10., 10., 5., 5.)
# ------------------------------------------------------------
# MaskRCNN options
# ------------------------------------------------------------
_C.MRCNN = CfgNode()
# RoI pooler type
_C.MRCNN.POOLER_TYPE = 'RoIAlign'
# The output size of the RoI pooler
_C.MRCNN.POOLER_RESOLUTION = 14
# The resampling window size of RoI pooler
_C.MRCNN.POOLER_SAMPLING_RATIO = 0
# The number of conv layers to stack in the head
_C.MRCNN.NUM_CONV = 4
# The hidden dimension of conv head
_C.MRCNN.CONV_HEAD_DIM = 256
# The head normalization module
_C.MRCNN.NORM = ''
# ------------------------------------------------------------
# SSD options
# ------------------------------------------------------------
_C.SSD = CfgNode()
# Number of conv layers to stack in the cls and bbox tower
_C.SSD.NUM_CONVS = 0
# The head conv module
_C.SSD.CONV = 'Conv2d'
# The head normalization module
_C.SSD.NORM = ''
# Fraction of foreground anchors per training batch
_C.SSD.POSITIVE_FRACTION = 0.25
# IoU overlap ratio for labeling an anchor as positive
# Anchors with >= iou overlap are labeled positive
_C.SSD.POSITIVE_OVERLAP = 0.5
# IoU overlap ratio for labeling an anchor as negative
# Anchors with < iou overlap are labeled negative
_C.SSD.NEGATIVE_OVERLAP = 0.5
# Number of top scoring boxes to keep before NMS
_C.SSD.PRE_NMS_TOP_N = 300
# The optional loss for bbox regression
# Values supported: 'l1', 'smooth_l1', 'giou'
_C.SSD.BBOX_REG_LOSS_TYPE = 'l1'
# Weight for bbox regression loss
_C.SSD.BBOX_REG_LOSS_WEIGHT = 1.0
# The weights on (dx, dy, dw, dh) for normalizing bbox regression targets
_C.SSD.BBOX_REG_WEIGHTS = (10., 10., 5., 5.)
# ------------------------------------------------------------
# Solver options
# ------------------------------------------------------------
_C.SOLVER = CfgNode()
# The interval to display logs
_C.SOLVER.DISPLAY = 20
# The interval to snapshot a model
_C.SOLVER.SNAPSHOT_EVERY = 5000
# Prefix to yield the path: <prefix>_iter_XYZ.pkl
_C.SOLVER.SNAPSHOT_PREFIX = ''
# Loss scaling factor for mixed precision training
_C.SOLVER.LOSS_SCALE = 1024.0
# Maximum number of SGD iterations
_C.SOLVER.MAX_STEPS = 40000
# Base learning rate for the specified scheduler
_C.SOLVER.BASE_LR = 0.001
# Minimal learning rate for the specified scheduler
_C.SOLVER.MIN_LR = 0.0
# The decay intervals for LRScheduler
_C.SOLVER.DECAY_STEPS = []
# The decay factor for exponential LRScheduler
_C.SOLVER.DECAY_GAMMA = 0.1
# Warm up to ``BASE_LR`` over this number of steps
_C.SOLVER.WARM_UP_STEPS = 1000
# Start the warm up from ``BASE_LR`` * ``FACTOR``
_C.SOLVER.WARM_UP_FACTOR = 0.1
# The type of optimizer
_C.SOLVER.OPTIMIZER = 'SGD'
# The type of lr scheduler
_C.SOLVER.LR_POLICY = 'steps_with_decay'
# The layer-wise lr decay
_C.SOLVER.LAYER_LR_DECAY = 1.0
# Momentum to use with SGD
_C.SOLVER.MOMENTUM = 0.9
# L2 regularization for weight parameters
_C.SOLVER.WEIGHT_DECAY = 0.0001
# L2 norm factor for clipping gradients
_C.SOLVER.CLIP_NORM = 0.0
# ------------------------------------------------------------
# Misc options
# ------------------------------------------------------------
# Number of GPUs for distributed training
_C.NUM_GPUS = 1
# Random seed for reproducibility
_C.RNG_SEED = 3
# Place outputs under an experiments directory
_C.EXP_DIR = ''
# Default GPU device index
_C.GPU_ID = 0
# Write summaries by TensorBoard
_C.ENABLE_TENSOR_BOARD = False
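# A hedged sketch: merge a tiny YAML override into the defaults above
# (the file content is illustrative; keys must already exist in _C).
if __name__ == '__main__':
    import tempfile
    with tempfile.NamedTemporaryFile('w', suffix='.yml', delete=False) as f:
        f.write('TRAIN:\n  IMS_PER_BATCH: 8\nSOLVER:\n  BASE_LR: 0.02\n')
    cfg.merge_from_file(f.name)
    print(cfg.TRAIN.IMS_PER_BATCH, cfg.SOLVER.BASE_LR)  # 8 0.02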
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# Codes are based on:
#
# <https://github.com/rbgirshick/yacs/blob/master/yacs/config.py>
#
# ------------------------------------------------------------
"""Yet Another Configuration System (YACS)."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import copy
import numpy as np
import yaml
class CfgNode(dict):
"""Node for configuration options."""
IMMUTABLE = '__immutable__'
def __init__(self, *args, **kwargs):
super(CfgNode, self).__init__(*args, **kwargs)
self.__dict__[CfgNode.IMMUTABLE] = False
def clone(self):
"""Recursively copy this CfgNode."""
return copy.deepcopy(self)
def freeze(self):
"""Make this CfgNode and all of its children immutable."""
self._immutable(True)
def is_frozen(self):
"""Return mutability."""
return self.__dict__[CfgNode.IMMUTABLE]
def merge_from_file(self, cfg_filename):
"""Load a yaml config file and merge it into this CfgNode."""
with open(cfg_filename, 'r') as f:
other_cfg = CfgNode(yaml.safe_load(f))
self.merge_from_other_cfg(other_cfg)
def merge_from_list(self, cfg_list):
"""Merge config (keys, values) in a list into this CfgNode."""
assert len(cfg_list) % 2 == 0
from ast import literal_eval
for k, v in zip(cfg_list[0::2], cfg_list[1::2]):
key_list = k.split('.')
d = self
for sub_key in key_list[:-1]:
assert sub_key in d
d = d[sub_key]
sub_key = key_list[-1]
assert sub_key in d
try:
value = literal_eval(v)
except: # noqa
# Handle the case when v is a string literal
value = v
if type(value) != type(d[sub_key]): # noqa
raise TypeError('Type {} does not match original type {}'
.format(type(value), type(d[sub_key])))
d[sub_key] = value
def merge_from_other_cfg(self, other_cfg):
"""Merge ``other_cfg`` into this CfgNode."""
_merge_a_into_b(other_cfg, self)
def _immutable(self, is_immutable):
"""Set immutability recursively to all nested CfgNode."""
self.__dict__[CfgNode.IMMUTABLE] = is_immutable
for v in self.__dict__.values():
if isinstance(v, CfgNode):
v._immutable(is_immutable)
for v in self.values():
if isinstance(v, CfgNode):
v._immutable(is_immutable)
def __getattr__(self, name):
if name in self.__dict__:
return self.__dict__[name]
elif name in self:
return self[name]
else:
raise AttributeError(name)
def __repr__(self):
return "{}({})".format(self.__class__.__name__,
super(CfgNode, self).__repr__())
def __setattr__(self, name, value):
if not self.__dict__[CfgNode.IMMUTABLE]:
if name in self.__dict__:
self.__dict__[name] = value
else:
self[name] = value
else:
raise AttributeError(
'Attempted to set "{}" to "{}", but CfgNode is immutable'
.format(name, value))
def __str__(self):
def _indent(s_, num_spaces):
s = s_.split("\n")
if len(s) == 1:
return s_
first = s.pop(0)
s = [(num_spaces * " ") + line for line in s]
s = "\n".join(s)
s = first + "\n" + s
return s
r = ""
s = []
for k, v in sorted(self.items()):
seperator = "\n" if isinstance(v, CfgNode) else " "
attr_str = "{}:{}{}".format(str(k), seperator, str(v))
attr_str = _indent(attr_str, 2)
s.append(attr_str)
r += "\n".join(s)
return r
def _merge_a_into_b(a, b):
"""Merge config dictionary a into config dictionary b, clobbering the
options in b whenever they are also specified in a."""
if not isinstance(a, dict):
return
for k, v in a.items():
# a must specify keys that are in b
if k not in b:
raise KeyError('{} is not a valid config key'.format(k))
# The types must match, too
v = _check_and_coerce_cfg_value_type(v, b[k], k)
# Recursively merge dicts
if type(v) is CfgNode:
try:
_merge_a_into_b(a[k], b[k])
except: # noqa
print('Error under config key: {}'.format(k))
raise
else:
b[k] = v
def _check_and_coerce_cfg_value_type(value_a, value_b, key):
"""Check if the value type matched."""
type_a, type_b = type(value_a), type(value_b)
if type_a is type_b:
return value_a
if type_b is float and type_a is int:
return float(value_a)
# Exceptions: numpy arrays, strings, tuple<->list
if isinstance(value_b, np.ndarray):
value_a = np.array(value_a, dtype=value_b.dtype)
elif isinstance(value_a, tuple) and isinstance(value_b, list):
value_a = list(value_a)
elif isinstance(value_a, list) and isinstance(value_b, tuple):
value_a = tuple(value_a)
elif isinstance(value_a, dict) and isinstance(value_b, CfgNode):
value_a = CfgNode(value_a)
else:
raise ValueError(
'Type mismatch ({} vs. {}) with values ({} vs. {}) for config '
'key: {}'.format(type_b, type_a, value_b, value_a, key))
return value_a
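if __name__ == '__main__':
    # A minimal sketch of merge and freeze semantics (toy values).
    node = CfgNode({'SOLVER': CfgNode({'BASE_LR': 0.001, 'MOMENTUM': 0.9})})
    node.merge_from_list(['SOLVER.BASE_LR', '0.02'])
    assert node.SOLVER.BASE_LR == 0.02
    node.freeze()
    assert node.is_frozen()  # attribute writes would now raise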
...@@ -8,6 +8,7 @@
 # <https://opensource.org/licenses/BSD-2-Clause>
 #
 # ------------------------------------------------------------
+"""Experiment coordinator."""
 from __future__ import absolute_import
 from __future__ import division
...@@ -20,16 +21,14 @@ import time
 import numpy as np
 from seetadet.core.config import cfg
-from seetadet.core.config import cfg_from_file
-from seetadet.utils import logger
+from seetadet.utils import logging
 class Coordinator(object):
     """Manage the unique experiments."""
     def __init__(self, cfg_file, exp_dir=None):
-        # Override the default configs
-        cfg_from_file(cfg_file)
+        cfg.merge_from_file(cfg_file)
        if cfg.EXP_DIR != '':
             exp_dir = cfg.EXP_DIR
         if exp_dir is None:
...@@ -43,7 +42,7 @@ class Coordinator(object):
             raise ValueError('Invalid experiment dir: ' + exp_dir)
         self.exp_dir = exp_dir
-    def _path_at(self, file, auto_create=True):
+    def path_at(self, file, auto_create=True):
         try:
             path = osp.abspath(osp.join(self.exp_dir, file))
             if auto_create and not osp.exists(path):
...@@ -54,20 +53,8 @@ class Coordinator(object):
                 os.makedirs(path)
             return path
-    def checkpoints_dir(self):
-        return self._path_at('checkpoints')
-    def exports_dir(self):
-        return self._path_at('exports')
-    def results_dir(self, checkpoint=None, output_dir=None):
-        if output_dir is not None:
-            return output_dir
-        path = osp.splitext(osp.basename(checkpoint))[0] if checkpoint else ''
-        return self._path_at(osp.join('results', path))
-    def checkpoint(self, step=None, last_idx=1, wait=False):
-        path = self.checkpoints_dir()
+    def get_checkpoint(self, step=None, last_idx=1, wait=False):
+        path = self.path_at('checkpoints')
         def locate(last_idx=None):
             files = os.listdir(path)
...@@ -91,7 +78,7 @@ class Coordinator(object):
         file, file_step = locate(last_idx)
         while file is None and wait:
-            logger.info('Wait for checkpoint at {}.'.format(step))
+            logging.info('Wait for checkpoint at {}.'.format(step))
             time.sleep(10)
             file, file_step = locate(last_idx)
         return file, file_step
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import dragon.vm.torch as torch
import numpy as np
from seetadet.core.backend import trace_module
from seetadet.core.config import cfg
from seetadet.utils.bbox import bbox_transform_inv
from seetadet.utils.bbox import clip_tiled_boxes
from seetadet.utils.blob import blob_vstack
from seetadet.utils.image import im_rescale
from seetadet.utils.nms import nms
def get_data(imgs):
"""Return the test data."""
im_batch, im_shapes, im_scales = [], [], []
for img in imgs:
scaled_imgs, scales = im_rescale(
img, scales=cfg.TEST.SCALES, max_size=cfg.TEST.MAX_SIZE)
im_batch += scaled_imgs
im_scales += scales
im_shapes += [x.shape[:2] for x in scaled_imgs]
im_batch = blob_vstack(
im_batch, fill_value=cfg.MODEL.PIXEL_MEAN,
align=(cfg.BACKBONE.COARSEST_STRIDE,) * 2)
im_shapes = np.array(im_shapes)
im_scales = np.array(im_scales).reshape((len(im_batch), -1))
im_info = np.hstack([im_shapes, im_scales]).astype('float32')
strides = [2 ** x for x in range(cfg.FPN.MIN_LEVEL, cfg.FPN.MAX_LEVEL + 1)]
strides = np.array(strides)[:, None]
grid_shapes = np.stack([im_batch.shape[1:3]] * len(strides))
grid_shapes = (grid_shapes - 1) // strides + 1
grid_info = grid_shapes.astype('int64')
return im_batch, im_info, grid_info
@torch.no_grad()
def im_detect(model, imgs):
"""Detect images."""
im_batch, im_info, grid_info = get_data(imgs)
model.timers['im_detect'].tic()
inputs = {'img': torch.from_numpy(im_batch),
'im_info': torch.from_numpy(im_info),
'grid_info': torch.from_numpy(grid_info)}
if not hasattr(model, 'run_inference'):
def run_inference(self, img, im_info, grid_info):
return self.forward({'img': img, 'im_info': im_info,
'grid_info': grid_info})
trace_module(model, 'run_inference', run_inference)
outputs = model.run_inference(inputs['img'], inputs['im_info'],
inputs['grid_info'])
outputs = dict((k, outputs[k].numpy()) for k in outputs.keys())
bbox_pred = bbox_transform_inv(
outputs['rois'][:, 1:5], outputs['bbox_pred'],
weights=cfg.FRCNN.BBOX_REG_WEIGHTS)
imgs_per_batch, num_scales = len(imgs), len(cfg.TEST.SCALES)
results = [([], []) for _ in range(imgs_per_batch)]
batch_inds = outputs['rois'][:, 0:1].astype('int32')
for i in range(imgs_per_batch * num_scales):
index = i // num_scales
inds = np.where(batch_inds == i)[0]
boxes = bbox_pred[inds] / im_info[i, 2]
boxes = clip_tiled_boxes(boxes, imgs[index].shape)
results[index][0].append(outputs['cls_score'][inds])
results[index][1].append(boxes)
results = [[np.vstack(x) for x in y] for y in results]
model.timers['im_detect'].toc(n=imgs_per_batch)
im_boxes = []
for scores, boxes in results:
with model.timers['misc'].tic_and_toc():
cls_boxes = get_cls_results(scores, boxes)
im_boxes.append(cls_boxes)
return [{'boxes': boxes} for boxes in im_boxes]
def get_cls_results(all_scores, all_boxes):
"""Return the categorical results."""
empty_boxes = np.zeros((0, 5), 'float32')
cls_boxes = [[]]
for j in range(1, len(cfg.MODEL.CLASSES)):
inds = np.where(all_scores[:, j] > cfg.TEST.SCORE_THRESH)[0]
if len(inds) == 0:
cls_boxes.append(empty_boxes)
continue
scores = all_scores[inds, j]
boxes = all_boxes[inds, j * 4:(j + 1) * 4]
dets = np.hstack((boxes, scores[:, np.newaxis]))
dets = dets.astype('float32', copy=False)
keep = nms(dets, cfg.TEST.NMS_THRESH)
cls_boxes.append(dets[keep, :])
return cls_boxes
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm import torch
import numpy as np
from seetadet.core.backend import trace_module
from seetadet.core.config import cfg
from seetadet.utils.bbox import bbox_transform_inv
from seetadet.utils.bbox import clip_tiled_boxes
from seetadet.utils.bbox import distribute_boxes
from seetadet.utils.blob import blob_vstack
from seetadet.utils.image import im_rescale
from seetadet.utils.nms import nms
def get_data(imgs):
"""Return the test data."""
im_batch, im_shapes, im_scales = [], [], []
for img in imgs:
scaled_imgs, scales = im_rescale(
img, scales=cfg.TEST.SCALES, max_size=cfg.TEST.MAX_SIZE)
im_batch += scaled_imgs
im_scales += scales
im_shapes += [x.shape[:2] for x in scaled_imgs]
im_batch = blob_vstack(
im_batch, fill_value=cfg.MODEL.PIXEL_MEAN,
align=(cfg.BACKBONE.COARSEST_STRIDE,) * 2)
im_shapes = np.array(im_shapes)
im_scales = np.array(im_scales).reshape((len(im_batch), -1))
im_info = np.hstack([im_shapes, im_scales]).astype('float32')
strides = [2 ** x for x in range(cfg.FPN.MIN_LEVEL, cfg.FPN.MAX_LEVEL + 1)]
strides = np.array(strides)[:, None]
grid_shapes = np.stack([im_batch.shape[1:3]] * len(strides))
grid_shapes = (grid_shapes - 1) // strides + 1
grid_info = grid_shapes.astype('int64')
return im_batch, im_info, grid_info
@torch.no_grad()
def im_detect(model, imgs):
"""Detect images."""
im_boxes, im_rois = im_detect_bbox(model, imgs)
im_rois = np.concatenate(sum(im_rois, []))
mask_pred = im_detect_mask(model, im_rois)
imgs_per_batch, num_scales = len(imgs), len(cfg.TEST.SCALES)
im_masks = [[] for _ in range(imgs_per_batch)]
batch_inds = im_rois[:, 0:1].astype('int32')
for i in range(imgs_per_batch * num_scales):
index = i // num_scales
inds = np.where(batch_inds == i)[0]
masks, labels = mask_pred[inds], im_rois[inds, 5]
num_classes = len(im_boxes[index])
for _ in range(num_classes - len(im_masks[index])):
im_masks[index].append([])
for j in range(1, num_classes):
im_masks[index][j].append(masks[np.where(labels == (j - 1))[0]])
if (i + 1) % num_scales == 0:
v = im_masks[index][j]
im_masks[index][j] = np.vstack(v) if len(v) > 1 else v[0]
return [{'boxes': boxes, 'masks': masks}
for boxes, masks in zip(im_boxes, im_masks)]
@torch.no_grad()
def im_detect_bbox(model, imgs):
"""Detect images at single or multiple scales."""
im_batch, im_info, grid_info = get_data(imgs)
model.timers['im_detect'].tic()
inputs = {'img': torch.from_numpy(im_batch),
'im_info': torch.from_numpy(im_info),
'grid_info': torch.from_numpy(grid_info)}
if not hasattr(model, 'run_inference'):
def run_inference(self, img, im_info, grid_info):
return self.forward({'img': img, 'im_info': im_info,
'grid_info': grid_info})
trace_module(model, 'run_inference', run_inference)
outputs = model.run_inference(inputs['img'], inputs['im_info'],
inputs['grid_info'])
outputs = dict((k, outputs[k].numpy()) for k in outputs.keys())
bbox_pred = bbox_transform_inv(
outputs['rois'][:, 1:5], outputs['bbox_pred'],
weights=cfg.FRCNN.BBOX_REG_WEIGHTS)
imgs_per_batch, num_scales = len(imgs), len(cfg.TEST.SCALES)
results = [([], [], []) for _ in range(imgs_per_batch)]
batch_inds = outputs['rois'][:, 0:1].astype('int32')
for i in range(imgs_per_batch * num_scales):
index = i // num_scales
inds = np.where(batch_inds == i)[0]
boxes = bbox_pred[inds] / im_info[i, 2]
boxes = clip_tiled_boxes(boxes, imgs[index].shape)
results[index][0].append(outputs['cls_score'][inds])
results[index][1].append(boxes)
results[index][2].append(batch_inds[inds])
results = [[np.vstack(x) for x in y] for y in results]
model.timers['im_detect'].toc(n=imgs_per_batch)
im_boxes, im_rois = [], []
for scores, boxes, batch_inds in results:
with model.timers['misc'].tic_and_toc():
cls_boxes, cls_rois = get_cls_results(
scores, boxes, batch_inds, im_info)
im_boxes.append(cls_boxes)
im_rois.append(cls_rois)
return im_boxes, im_rois
@torch.no_grad()
def im_detect_mask(model, im_rois):
lvl_min, lvl_max = cfg.FRCNN.MIN_LEVEL, cfg.FRCNN.MAX_LEVEL
roi_lvls = distribute_boxes(im_rois[:, 1:5], lvl_min, lvl_max)
roi_inds = [np.where(roi_lvls == (i + lvl_min))[0]
for i in range(lvl_max - lvl_min + 1)]
rois, labels = [], []
for inds in roi_inds:
rois.append(im_rois[inds, :5] if len(inds) > 0 else
np.array([[-1, 0, 0, 1, 1]], 'float32'))
labels.append(im_rois[inds, 5].astype('int64')
if len(inds) > 0 else np.array([-1], 'int64'))
model.timers['im_detect_mask'].tic()
model.outputs['rois'] = [model.to_tensor(x) for x in rois]
mask_pred = model.mask_head(model.outputs)['mask_pred']
num_rois, num_classes = mask_pred.shape[:2]
labels = np.concatenate(labels)
fg_inds = np.where(labels >= 0)[0]
strides = np.arange(num_rois) * num_classes
mask_inds = model.to_tensor(strides[fg_inds] + labels[fg_inds])
mask_pred = mask_pred.flatten_(0, 1)[mask_inds].numpy()
mask_pred = mask_pred[np.concatenate(roi_inds).argsort()].copy()
model.timers['im_detect_mask'].toc()
return mask_pred
def get_cls_results(all_scores, all_boxes, batch_inds, im_info):
"""Return the categorical results."""
empty_boxes = np.zeros((0, 5), 'float32')
empty_rois = np.zeros((0, 6), 'float32')
cls_boxes, cls_rois = [[]], []
for j in range(1, len(cfg.MODEL.CLASSES)):
inds = np.where(all_scores[:, j] > cfg.TEST.SCORE_THRESH)[0]
if len(inds) == 0:
cls_boxes.append(empty_boxes)
cls_rois.append(empty_rois)
continue
scores = all_scores[inds, j]
boxes = all_boxes[inds, j * 4:(j + 1) * 4]
dets = np.hstack((boxes, scores[:, np.newaxis]))
dets = dets.astype('float32', copy=False)
keep = nms(dets, cfg.TEST.NMS_THRESH)
batch_inds_keep = batch_inds[inds][keep]
cls_boxes.append(dets[keep, :])
cls_rois.append(np.hstack((
batch_inds_keep,
cls_boxes[-1][:, :4] * im_info[batch_inds_keep, 2],
np.ones((len(keep), 1)) * (j - 1))).astype('float32'))
return cls_boxes, cls_rois
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import dragon.vm.torch as torch
import numpy as np
from seetadet.core.backend import trace_module
from seetadet.core.config import cfg
from seetadet.utils.blob import blob_vstack
from seetadet.utils.image import im_rescale
from seetadet.utils.nms import nms
def get_data(imgs):
"""Return the test data."""
im_batch, im_shapes, im_scales = [], [], []
for img in imgs:
scaled_imgs, scales = im_rescale(
img, scales=cfg.TEST.SCALES, max_size=cfg.TEST.MAX_SIZE)
im_batch += scaled_imgs
im_scales += scales
im_shapes += [x.shape[:2] for x in scaled_imgs]
im_batch = blob_vstack(
im_batch, fill_value=cfg.MODEL.PIXEL_MEAN,
size=(cfg.AUG.CROP_SIZE,) * 2,
align=(cfg.BACKBONE.COARSEST_STRIDE,) * 2)
im_shapes = np.array(im_shapes)
im_scales = np.array(im_scales).reshape((len(im_batch), -1))
im_info = np.hstack([im_shapes, im_scales]).astype('float32')
strides = [2 ** x for x in range(cfg.FPN.MIN_LEVEL, cfg.FPN.MAX_LEVEL + 1)]
strides = np.array(strides)[:, None]
grid_shapes = np.stack([im_batch.shape[1:3]] * len(strides))
grid_shapes = (grid_shapes - 1) // strides + 1
grid_info = grid_shapes.astype('int64')
return im_batch, im_info, grid_info
@torch.no_grad()
def im_detect(model, imgs):
"""Detect images."""
im_batch, im_info, grid_info = get_data(imgs)
model.timers['im_detect'].tic()
inputs = {'img': torch.from_numpy(im_batch),
'im_info': torch.from_numpy(im_info),
'grid_info': torch.from_numpy(grid_info)}
if not hasattr(model, 'run_inference'):
def run_inference(self, img, im_info, grid_info):
return self.forward({'img': img, 'im_info': im_info,
'grid_info': grid_info})
trace_module(model, 'run_inference', run_inference)
outputs = model.run_inference(inputs['img'], inputs['im_info'],
inputs['grid_info'])
outputs = dict((k, outputs[k].numpy()) for k in outputs.keys())
imgs_per_batch, num_scales = len(imgs), len(cfg.TEST.SCALES)
results = [[] for _ in range(imgs_per_batch)]
batch_inds = outputs['dets'][:, 0:1].astype('int32')
for i in range(imgs_per_batch * num_scales):
index = i // num_scales
inds = np.where(batch_inds == i)[0]
results[index].append(outputs['dets'][inds, 1:])
for index in range(imgs_per_batch):
try:
results[index] = np.vstack(results[index])
except ValueError:
results[index] = results[index][0]
model.timers['im_detect'].toc(n=imgs_per_batch)
im_boxes = []
for dets in results:
with model.timers['misc'].tic_and_toc():
cls_boxes = get_cls_results(dets)
im_boxes.append(cls_boxes)
return [{'boxes': boxes} for boxes in im_boxes]
def get_cls_results(all_dets):
"""Return the categorical results."""
empty_boxes = np.zeros((0, 5), 'float32')
cls_boxes = [[]]
labels = all_dets[:, 5].astype('int32')
for j in range(1, len(cfg.MODEL.CLASSES)):
inds = np.where(labels == j)[0]
if len(inds) == 0:
cls_boxes.append(empty_boxes)
continue
dets = all_dets[inds, :5].astype('float32')
keep = nms(dets, cfg.TEST.NMS_THRESH)
cls_boxes.append(dets[keep, :])
return cls_boxes
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import dragon.vm.torch as torch
import numpy as np
from seetadet.core.backend import trace_module
from seetadet.core.config import cfg
from seetadet.utils.bbox import bbox_transform_inv
from seetadet.utils.bbox import clip_boxes
from seetadet.utils.blob import blob_vstack
from seetadet.utils.image import im_rescale
from seetadet.utils.nms import nms
def get_data(imgs):
"""Return the test data."""
im_batch, im_scales = [], []
for img in imgs:
scaled_imgs, scales = im_rescale(
img, scales=cfg.TEST.SCALES, keep_ratio=False)
im_batch += scaled_imgs
im_scales += scales
im_batch = blob_vstack(im_batch, fill_value=cfg.MODEL.PIXEL_MEAN)
return im_batch, im_scales
@torch.no_grad()
def im_detect(model, imgs):
"""Detect images."""
im_batch, im_scales = get_data(imgs)
model.timers['im_detect'].tic()
inputs = {'img': torch.from_numpy(im_batch)}
if not hasattr(model, 'run_inference'):
def run_inference(self, img):
return self.forward({'img': img})
trace_module(model, 'run_inference', run_inference,
example_inputs=[inputs['img']])
outputs = model.run_inference(inputs['img'])
outputs = dict((k, outputs[k].numpy()) for k in outputs.keys())
anchors = model.bbox_head.targets.generator.grid_anchors
imgs_per_batch, num_scales = len(imgs), len(cfg.TEST.SCALES)
results = [([], []) for _ in range(imgs_per_batch)]
for i in range(imgs_per_batch * num_scales):
index = i // num_scales
boxes = bbox_transform_inv(
anchors, outputs['bbox_pred'][i],
weights=cfg.SSD.BBOX_REG_WEIGHTS)
boxes[:, 0::2] /= im_scales[i][1]
boxes[:, 1::2] /= im_scales[i][0]
boxes = clip_boxes(boxes, imgs[index].shape)
results[index][0].append(outputs['cls_score'][i])
results[index][1].append(boxes)
results = [[np.vstack(x) for x in y] for y in results]
model.timers['im_detect'].toc(n=imgs_per_batch)
im_boxes = []
for scores, boxes in results:
with model.timers['misc'].tic_and_toc():
cls_boxes = get_cls_results(scores, boxes)
im_boxes.append(cls_boxes)
return [{'boxes': boxes} for boxes in im_boxes]
def get_cls_results(all_scores, all_boxes):
"""Return the categorical results."""
cls_boxes = [[]]
for j in range(1, len(cfg.MODEL.CLASSES)):
inds = np.where(all_scores[:, j] > cfg.TEST.SCORE_THRESH)[0]
scores, boxes = all_scores[inds, j], all_boxes[inds]
inds = np.argsort(-scores)[:cfg.SSD.PRE_NMS_TOP_N]
scores, boxes = scores[inds], boxes[inds]
dets = np.hstack((boxes, scores[:, np.newaxis]))
dets = dets.astype('float32', copy=False)
keep = nms(dets, cfg.TEST.NMS_THRESH)
cls_boxes.append(dets[keep, :])
return cls_boxes
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Registry class."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import functools
class Registry(object):
    """Registry class."""

    def __init__(self, name):
        self.name = name
        self.registry = collections.OrderedDict()

    def has(self, key):
        return key in self.registry

    def register(self, name, func=None, **kwargs):
        def decorated(inner_function):
            for key in (name if isinstance(
                    name, (tuple, list)) else [name]):
                self.registry[key] = \
                    functools.partial(inner_function, **kwargs)
            return inner_function
        if func is not None:
            return decorated(func)
        return decorated

    def get(self, name, default=None):
        if not self.has(name):
            if default is not None:
                return default
            raise KeyError("`%s` is not registered in <%s>."
                           % (name, self.name))
        return self.registry[name]

    def try_get(self, name):
        if self.has(name):
            return self.get(name)
        return None
backbone = Registry('backbone')
fusion_pass = Registry('fusion_pass')
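# A minimal usage sketch (illustrative only; "demo" and "build_plain"
# are hypothetical names, not part of the library):
if __name__ == '__main__':
    demo = Registry('demo')

    @demo.register('plain')
    def build_plain(scale=1):
        return 'plain-x%d' % scale

    assert demo.has('plain')
    assert demo.get('plain')() == 'plain-x1'
    assert demo.try_get('missing') is None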
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import datetime
import importlib
import multiprocessing
import numpy as np
from seetadet.core.config import cfg
from seetadet.utils import time_util
from seetadet.utils.vis import vis_one_image
def run_test_net(
checkpoint,
server,
devices,
read_every=1000,
log_every=100,
):
classes = server.classes
num_images = server.num_images
num_classes = server.num_classes
devices = devices if devices else [cfg.GPU_ID]
num_workers = len(devices)
read_stride = float(num_workers * cfg.TEST.IMS_PER_BATCH)
read_every = int(np.ceil(read_every / read_stride) * read_stride)
log_every = log_every if log_every > 0 else num_images
test_module = 'seetadet.algo.%s.test' % cfg.MODEL.TYPE
test_fn = getattr(importlib.import_module(test_module), 'test_net')
timers = time_util.new_timers('im_detect', 'misc')
vis_image_dict = {}
all_boxes = [[[] for _ in range(num_images)] for _ in range(num_classes)]
all_masks = [[[] for _ in range(num_images)] for _ in range(num_classes)]
queues = [multiprocessing.Queue() for _ in range(num_workers + 1)]
workers = [
multiprocessing.Process(
target=test_fn,
kwargs={
'weights': checkpoint,
'q_in': queues[i],
'q_out': queues[-1],
'device': devices[i],
'root_logger': i == 0,
}
) for i in range(num_workers)
]
for process in workers:
process.start()
num_sends = 0
for count in range(num_images):
if count >= num_sends:
num_to_send = min(read_every, num_images - num_sends)
for i in range(count, count + num_to_send):
image_id, raw_image = server.get_image()
queues[i % num_workers].put((i, raw_image))
if cfg.VIS or cfg.VIS_ON_FILE:
vis_image_dict[i] = (image_id, raw_image)
num_sends += num_to_send
if num_sends == num_images:
for i in range(num_workers):
queues[i].put((-1, None))
i, time_diffs, results = queues[-1].get()
        # Unpack the per-image results.
boxes_this_image = results['boxes']
masks_this_image = results.get('masks', None)
        # Disable mask collection if no masks were returned.
if masks_this_image is None:
all_masks = None
# Update time difference
for name, diff in time_diffs.items():
timers[name].add_diff(diff)
# Visualize the results if necessary
if cfg.VIS or cfg.VIS_ON_FILE:
image_id, raw_image = vis_image_dict[i]
vis_one_image(
raw_image,
classes,
boxes_this_image,
masks_this_image,
thresh=cfg.VIS_TH,
box_alpha=1.,
show_class=True,
filename=server.get_save_filename(image_id),
)
del vis_image_dict[i]
# Pack the results in the class-major order
for j in range(1, num_classes):
all_boxes[j][i] = boxes_this_image[j]
if all_masks is not None:
if j < len(masks_this_image):
all_masks[j][i] = masks_this_image[j]
# Limit to max_per_image detections *over all classes*
max_detections = cfg.TEST.DETECTIONS_PER_IM
if max_detections > 0:
scores = []
for j in range(1, num_classes):
if len(all_boxes[j][i]) < 1:
continue
scores.append(all_boxes[j][i][:, -1])
if len(scores) > 0:
scores = np.hstack(scores)
if len(scores) > max_detections:
thr = np.sort(scores)[-max_detections]
for j in range(1, num_classes):
keep = np.where(all_boxes[j][i][:, -1] >= thr)[0]
all_boxes[j][i] = all_boxes[j][i][keep, :]
if all_masks is not None:
all_masks[j][i] = all_masks[j][i][keep]
if (count + 1) % log_every == 0:
avg_total_time = np.sum([t.average_time for t in timers.values()])
eta_seconds = avg_total_time * (num_images - count - 1)
print('\rim_detect: {:d}/{:d} [{:.3f}s + {:.3f}s] (eta: {})'
.format(count + 1, num_images,
timers['im_detect'].average_time,
timers['misc'].average_time,
str(datetime.timedelta(seconds=int(eta_seconds)))),
end='')
print('\n\n>>> Evaluating detections\n')
server.evaluate_detections(all_boxes)
if all_masks is not None:
print('>>> Evaluating segmentations\n')
server.evaluate_segmentations(all_boxes, all_masks)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import multiprocessing as mp
import os
import cv2
import dragon
from seetadet.core.config import cfg
from seetadet.datasets.example import Example
from seetadet.datasets.factory import get_dataset
class _Server(object):
"""Base server class."""
def __init__(self, output_dir):
self.output_dir = output_dir
if cfg.VIS_ON_FILE:
self.vis_dir = os.path.join(self.output_dir, 'vis')
if not os.path.exists(self.vis_dir):
os.makedirs(self.vis_dir)
def evaluate_detections(self, all_boxes):
pass
def evaluate_segmentations(self, all_boxes, all_masks):
pass
def get_image(self):
pass
def get_save_filename(self, image_id, ext='.jpg'):
return os.path.join(self.vis_dir, image_id + ext) \
if cfg.VIS_ON_FILE else None
class EvaluateServer(_Server):
"""Server to evaluate network with ground-truth."""
def __init__(self, output_dir):
super(EvaluateServer, self).__init__(output_dir)
self.dataset = get_dataset(cfg.TEST.DATASET)
self.dataset.competition_mode(cfg.TEST.COMPETITION_MODE)
self.classes = self.dataset.classes
self.num_images = self.dataset.num_images
self.num_classes = self.dataset.num_classes
self.data_reader = dragon.io.DataReader(
dataset=self.dataset.cls, source=self.dataset.source)
self.data_reader.q_out = mp.Queue(cfg.TEST.IMS_PER_BATCH * 4)
self.data_reader.start()
self.gt_recs = collections.OrderedDict()
def get_image(self):
example = Example(self.data_reader.q_out.get())
image, image_id = example.image, example.id
self.gt_recs[image_id] = {
'height': example.height,
'width': example.width,
'objects': example.objects,
}
return image_id, image
def get_records(self):
if len(self.gt_recs) != self.num_images:
raise RuntimeError(
                'Loaded {} records, while {} are required.'
.format(len(self.gt_recs), self.num_images))
return self.gt_recs
def evaluate_detections(self, all_boxes):
if cfg.TEST.PROTOCOL == 'dump':
self.dataset.dump_detections(all_boxes, self.output_dir)
else:
self.dataset.evaluate_detections(
all_boxes,
self.get_records(),
self.output_dir,
)
def evaluate_segmentations(self, all_boxes, all_masks):
self.dataset.evaluate_segmentations(
all_boxes,
all_masks,
self.get_records(),
self.output_dir,
)
class InferServer(_Server):
"""Server to run inference."""
def __init__(self, output_dir):
super(InferServer, self).__init__(output_dir)
self.images_dir = cfg.TEST.DATASET
self.images = os.listdir(self.images_dir)
self.classes = cfg.MODEL.CLASSES
self.num_images = len(self.images)
self.num_classes = len(cfg.MODEL.CLASSES)
self.output_dir = output_dir
self.image_idx = 0
def get_image(self):
image_name = self.images[self.image_idx]
image_id = image_name.split('.')[0]
image = cv2.imread(os.path.join(self.images_dir, image_name))
self.image_idx = (self.image_idx + 1) % self.num_images
return image_id, image
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Testing engine."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import datetime
import importlib
import multiprocessing as mp
import time
import numpy as np
from seetadet.core.config import cfg
from seetadet.core.testing import test_server
from seetadet.models.build import build_detector
from seetadet.utils.vis import vis_one_image
from seetadet.utils import logging
from seetadet.utils import profiler
def filter_outputs(outputs, max_dets=100):
"""Limit the max number of detections."""
if max_dets <= 0:
return outputs
boxes = outputs.pop('boxes')
masks = outputs.pop('masks', None)
scores, num_classes = [], len(boxes)
for i in range(num_classes):
if len(boxes[i]) > 0:
scores.append(boxes[i][:, -1])
scores = np.hstack(scores) if len(scores) > 0 else []
if len(scores) > max_dets:
thr = np.sort(scores)[-max_dets]
for i in range(num_classes):
if len(boxes[i]) < 1:
continue
keep = np.where(boxes[i][:, -1] >= thr)[0]
boxes[i] = boxes[i][keep]
if masks is not None:
masks[i] = masks[i][keep]
outputs['boxes'] = boxes
outputs['masks'] = masks
return outputs
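# Example with toy data: under "max_dets=1", only the highest-scoring
# detection over all classes survives:
#
#   toy = {'boxes': [[], np.array([[0., 0., 10., 10., 0.9],
#                                  [0., 0., 5., 5., 0.3]], 'float32')]}
#   filter_outputs(toy, max_dets=1)  # keeps only the 0.9-score box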
def extend_results(index, collection, results):
"""Add image results to the collection."""
if results is None:
return
for _ in range(len(results) - len(collection)):
collection.append([])
for i in range(1, len(results)):
for _ in range(index - len(collection[i]) + 1):
collection[i].append([])
collection[i][index] = results[i]
def test_detector(
test_cfg,
weights,
queues,
device,
verbose=True,
batch_timeout=None,
):
"""Test a detector.
Parameters
----------
test_cfg : CfgNode
The cfg for testing.
weights : str
The path of model weights to load.
queues : Sequence[multiprocessing.Queue]
The input and output queue.
device : int
The index of computing device.
verbose : bool, optional, default=True
        Print the information or not.
batch_timeout : number, optional
        The time (in seconds) to wait until "IMS_PER_BATCH" images are collected.
"""
cfg.merge_from_other_cfg(test_cfg)
cfg.GPU_ID = device
cfg.freeze()
logging.set_root(verbose)
model = build_detector(device, weights)
module = 'seetadet.core.modules.%s' % cfg.MODEL.TYPE
module = importlib.import_module(module)
input_queue, output_queue = queues
imgs_per_batch = cfg.TEST.IMS_PER_BATCH
must_stop = False
while not must_stop:
indices, imgs = [], []
deadline, timeout = None, None
for i in range(imgs_per_batch):
if batch_timeout and i == 1:
deadline = time.monotonic() + batch_timeout
if batch_timeout and i >= 1:
timeout = deadline - time.monotonic()
try:
index, img = input_queue.get(timeout=timeout)
if index < 0:
must_stop = True
break
indices.append(index)
imgs.append(img)
except Exception:
pass
if len(imgs) == 0:
continue
results = module.im_detect(model, imgs)
time_diffs = dict((k, v.average_time) for k, v in model.timers.items())
time_diffs['im_detect'] += time_diffs.pop('im_detect_mask', 0.0)
for i, outputs in enumerate(results):
output_queue.put((indices[i], time_diffs, outputs))
def run_test(weights, output_dir, devices, read_every=100, vis_thresh=0):
"""Run a model testing.
Parameters
----------
weights : str
The path of model weights to load.
output_dir : str
The path to save results.
devices : Sequence[int]
The index of computing devices.
read_every : int, optional, default=100
Read every N images to distribute to devices.
vis_thresh : float, optional, default=0
The score threshold for visualization.
"""
server = test_server.EvaluateServer(output_dir)
devices = devices if devices else [cfg.GPU_ID]
num_devices = len(devices)
num_images = server.dataset.num_images
max_dets = cfg.TEST.DETECTIONS_PER_IM
read_stride = float(num_devices * cfg.TEST.IMS_PER_BATCH)
read_every = int(np.ceil(read_every / read_stride) * read_stride)
if vis_thresh > 0:
import matplotlib.pyplot as plt
plt.switch_backend('agg')
queues = [mp.Queue() for _ in range(num_devices + 1)]
actors = [mp.Process(
target=test_detector, kwargs={
'test_cfg': cfg,
'weights': weights,
'queues': [queues[i], queues[-1]],
'device': devices[i],
'verbose': i == 0}) for i in range(num_devices)]
for actor in actors:
actor.start()
timers = collections.defaultdict(profiler.Timer)
all_boxes, all_masks, vis_images = [], [], {}
for count in range(1, num_images + 1):
img_id, img = server.get_image()
queues[count % num_devices].put((count - 1, img))
if vis_thresh > 0:
filename = server.get_save_filename(img_id)
vis_images[count - 1] = (filename, img)
if count % read_every > 0 and count < num_images:
continue
if count == num_images:
for i in range(num_devices):
queues[i].put((-1, None))
for _ in range(((count - 1) % read_every + 1)):
index, time_diffs, outputs = queues[-1].get()
outputs = filter_outputs(outputs, max_dets)
extend_results(index, all_boxes, outputs['boxes'])
extend_results(index, all_masks, outputs.get('masks', None))
for name, diff in time_diffs.items():
timers[name].add_diff(diff)
if vis_thresh > 0:
filename, img = vis_images[index]
vis_one_image(img, server.dataset.classes,
outputs['boxes'],
outputs.get('masks', None),
thresh=vis_thresh,
filename=filename)
del vis_images[index]
avg_time = sum([t.average_time for t in timers.values()])
eta_seconds = avg_time * (num_images - count)
print('\rim_detect: {:d}/{:d} [{:.3f}s + {:.3f}s] (eta: {})'
.format(count, num_images,
timers['im_detect'].average_time,
timers['misc'].average_time,
str(datetime.timedelta(seconds=int(eta_seconds)))),
end='')
print('\nEvaluating detections...')
server.eval_bbox(all_boxes)
if len(all_masks) > 0:
print('Evaluating segmentations...')
server.eval_segm(all_boxes, all_masks)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Testing servers."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import base64
import collections
import multiprocessing
import os
import cv2
import numpy as np
try:
import flask
except ImportError:
flask = None
from seetadet.core.config import cfg
from seetadet.data.build import build_dataset
from seetadet.data.build import build_evaluator
from seetadet.data.build import build_loader_test
class BaseServer(object):
"""Base server class."""
def __init__(self, output_dir):
self.output_dir = output_dir
self.vis_dir = os.path.join(self.output_dir, 'vis')
def get_image(self):
"""Return the image."""
def get_save_filename(self, img_id, ext='.jpg'):
if not os.path.exists(self.vis_dir):
os.makedirs(self.vis_dir)
return os.path.join(self.vis_dir, img_id + ext)
class EvaluateServer(BaseServer):
"""Server to evaluate model with ground-truth."""
def __init__(self, output_dir):
super(EvaluateServer, self).__init__(output_dir)
self.loader = build_loader_test()
self.dataset = build_dataset(cfg.TEST.DATASET)
self.evaluator = build_evaluator()
self.next_inputs = []
self.metas = collections.OrderedDict()
def get_image(self):
if len(self.next_inputs) == 0:
inputs = self.loader()
for i, img_meta in enumerate(inputs['img_meta']):
self.next_inputs.append({
'img': inputs['img'][i],
'objects': inputs['objects'][i],
'id': img_meta['id'],
'height': img_meta['height'],
'width': img_meta['width']})
inputs = self.next_inputs.pop(0)
img_id, img = inputs.pop('id'), inputs.pop('img')
self.metas[img_id] = inputs
return img_id, img
def eval_bbox(self, boxes):
self.check_metas()
res_file = self.evaluator.write_bbox_results(
boxes, self.metas, self.output_dir)
self.evaluator.eval_bbox(res_file)
def eval_segm(self, boxes, masks):
self.check_metas()
res_file = self.evaluator.write_segm_results(
boxes, masks, self.metas, self.output_dir)
self.evaluator.eval_segm(res_file)
def check_metas(self):
if len(self.metas) != self.dataset.num_images:
raise RuntimeError(
'Mismatched number of metas and images. ({} vs. {}).'
                '\nCheck whether there are duplicate image ids.'
.format(len(self.metas), self.dataset.num_images))
if self.evaluator.cocoGt is None:
ann_file = self.evaluator.write_annotations(self.metas, self.output_dir)
self.evaluator.load_annotations(ann_file)
class WebServer(BaseServer, multiprocessing.Process):
"""Server for web serving."""
def __init__(self, output_dir):
BaseServer.__init__(self, output_dir)
multiprocessing.Process.__init__(self, daemon=True)
self.output_dir = output_dir
self.classes = cfg.MODEL.CLASSES
self.num_classes = len(cfg.MODEL.CLASSES)
self.img_id = multiprocessing.Value('i', 0)
def get_image(self):
try:
req = flask.request.get_json(force=True)
img_base64 = req['image']
except KeyError:
            err_msg = 'Missing "image" in request data.'
flask.abort(flask.Response(err_msg))
try:
img_base64 = img_base64.split(",")[-1]
img_bytes = base64.b64decode(img_base64)
except Exception as e:
            err_msg = 'Failed to decode base64 data. Detail: ' + str(e)
flask.abort(flask.Response(err_msg))
try:
img = np.frombuffer(img_bytes, 'uint8')
img = cv2.imdecode(img, cv2.IMREAD_COLOR)
except Exception as e:
            err_msg = 'Failed to decode image bytes. Detail: ' + str(e)
flask.abort(flask.Response(err_msg))
with self.img_id.get_lock():
self.img_id.value += 1
img_id = self.img_id.value
return img_id, img
class InferServer(BaseServer):
"""Server to run model inference."""
def __init__(self, output_dir):
super(InferServer, self).__init__(output_dir)
self.img_dir = cfg.TEST.DATASET
self.classes = cfg.MODEL.CLASSES
self.num_classes = len(cfg.MODEL.CLASSES)
self.imgs = os.listdir(self.img_dir)
self.num_images = len(self.imgs)
self.img_id = 0
def get_image(self):
img_file = self.imgs[self.img_id]
img = cv2.imread(os.path.join(self.img_dir, img_file))
self.img_id += 1
return self.img_id - 1, img
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import os
from dragon.vm import torch
from seetadet.core.config import cfg
from seetadet.solver.sgd import SGDSolver
from seetadet.utils import logger
from seetadet.utils import time_util
from seetadet.utils.stats import SmoothedValue
class SolverWrapper(object):
"""Sovler wrapper."""
def __init__(self, coordinator):
self.output_dir = coordinator.checkpoints_dir()
self.solver = SGDSolver()
self.detector = self.solver.detector
# Setup the detector
self.detector.load_weights(cfg.TRAIN.WEIGHTS)
self.detector.cuda(cfg.GPU_ID)
if cfg.MODEL.PRECISION.lower() == 'float16':
# Mixed precision training
self.detector.half()
# Plan the metrics
self.board = None
self.metrics = collections.OrderedDict()
if cfg.ENABLE_TENSOR_BOARD and logger.is_root():
try:
from dragon.tools.tensorboard import TensorBoard
log_dir = coordinator.exp_dir + '/logs'
self.board = TensorBoard(log_dir=log_dir)
except ImportError:
pass
def snapshot(self):
filename = (cfg.SOLVER.SNAPSHOT_PREFIX +
'_iter_{}.pkl'.format(self.solver.iter))
filename = os.path.join(self.output_dir, filename)
if logger.is_root() and not os.path.exists(filename):
torch.save(self.detector.state_dict(), filename)
logger.info('Wrote snapshot to: {:s}'.format(filename))
def add_metrics(self, stats):
for k, v in stats['loss'].items():
if k not in self.metrics:
self.metrics[k] = SmoothedValue(20)
self.metrics[k].add_value(v)
def send_metrics(self, stats):
if self.board is not None:
self.board.scalar_summary('lr', stats['lr'], stats['iter'])
self.board.scalar_summary('time', stats['time'], stats['iter'])
for k, v in self.metrics.items():
if k == 'total':
self.board.scalar_summary(
'total_loss', v.median(), stats['iter'])
else:
self.board.scalar_summary(k, v.median(), stats['iter'])
def step(self):
display = self.solver.iter % cfg.SOLVER.DISPLAY == 0
stats = self.solver.step()
self.add_metrics(stats)
if display:
logger.info(
'Iteration %d, lr = %.8f, loss = %f, time = %.2fs'
% (stats['iter'], stats['lr'],
self.metrics['total'].median(), stats['time']))
for k, v in self.metrics.items():
if k == 'total':
continue
logger.info(' ' * 10 + 'Train net output({}): {}'
.format(k, v.median()))
self.send_metrics(stats)
def train_model(self):
"""Network training loop."""
timer = time_util.Timer()
max_steps = cfg.SOLVER.MAX_STEPS
while self.solver.iter < max_steps:
with timer.tic_and_toc():
_, step = self.step(), self.solver.iter
if step % (10 * cfg.SOLVER.DISPLAY) == 0:
logger.info(time_util.get_progress_info(timer, step, max_steps))
if step % cfg.SOLVER.SNAPSHOT_EVERY == 0:
self.snapshot()
def train_net(coordinator, start_iter=0):
sw = SolverWrapper(coordinator)
sw.solver.iter = start_iter
logger.info('Solving...')
sw.train_model()
sw.snapshot()
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Build for training library."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm import torch
from seetadet.core.config import cfg
from seetadet.core.training import lr_scheduler
from seetadet.utils import logging
def build_optimizer(params, **kwargs):
"""Build the optimizer."""
args = {'lr': cfg.SOLVER.BASE_LR,
'weight_decay': cfg.SOLVER.WEIGHT_DECAY,
'clip_norm': cfg.SOLVER.CLIP_NORM,
'grad_scale': 1.0 / cfg.SOLVER.LOSS_SCALE}
optimizer = kwargs.pop('optimizer', cfg.SOLVER.OPTIMIZER)
if optimizer == 'SGD':
args['momentum'] = cfg.SOLVER.MOMENTUM
args.update(kwargs)
return getattr(torch.optim, optimizer)(params, **args)
def build_lr_scheduler(**kwargs):
"""Build the LR scheduler."""
args = {'lr_max': cfg.SOLVER.BASE_LR,
'lr_min': cfg.SOLVER.MIN_LR,
'warmup_steps': cfg.SOLVER.WARM_UP_STEPS,
'warmup_factor': cfg.SOLVER.WARM_UP_FACTOR}
policy = kwargs.pop('policy', cfg.SOLVER.LR_POLICY)
args.update(kwargs)
if policy == 'cosine_decay':
return lr_scheduler.CosineLR(
decay_step=(cfg.SOLVER.DECAY_STEPS or [1])[0],
max_steps=cfg.SOLVER.MAX_STEPS, **args)
elif policy == 'steps_with_decay':
return lr_scheduler.MultiStepLR(
decay_steps=cfg.SOLVER.DECAY_STEPS,
decay_gamma=cfg.SOLVER.DECAY_GAMMA, **args)
return lr_scheduler.ConstantLR(**args)
def build_tensorboard(log_dir):
"""Build the tensorboard."""
if cfg.ENABLE_TENSOR_BOARD and logging.is_root():
try:
from dragon.utils.tensorboard import TensorBoard
return TensorBoard(log_dir=log_dir)
except ImportError:
pass
return None
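# A minimal sketch (assuming a YAML config has already been merged into
# "cfg", and "model" is a hypothetical detector module):
#
#   optimizer = build_optimizer(model.parameters(), optimizer='SGD')
#   scheduler = build_lr_scheduler(policy='steps_with_decay')
#   for group in optimizer.param_groups:
#       group['lr'] = scheduler.get_lr()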
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""LearningRate schedulers."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
class ConstantLR(object):
"""Constant LR scheduler."""
def __init__(self, **kwargs):
self._lr_max = kwargs.pop('lr_max')
self._lr_min = kwargs.pop('lr_min', 0)
self._warmup_steps = kwargs.pop('warmup_steps', 0)
self._warmup_factor = kwargs.pop('warmup_factor', 0)
if kwargs:
raise ValueError('Unexpected arguments: ' + ','.join(v for v in kwargs))
self._step_count = 0
self._last_decay = 1.
def step(self):
self._step_count += 1
def get_lr(self):
if self._step_count < self._warmup_steps:
alpha = (self._step_count + 1.) / self._warmup_steps
return self._lr_max * (alpha + (1. - alpha) * self._warmup_factor)
return self._lr_min + (self._lr_max - self._lr_min) * self.get_decay()
def get_decay(self):
return self._last_decay
class CosineLR(ConstantLR):
"""LR scheduler with cosine decay."""
def __init__(self, lr_max, max_steps, lr_min=0, decay_step=1, **kwargs):
super(CosineLR, self).__init__(lr_max=lr_max, lr_min=lr_min, **kwargs)
self._decay_step = decay_step
self._max_steps = max_steps
def get_decay(self):
t = self._step_count - self._warmup_steps
t_max = self._max_steps - self._warmup_steps
if t > 0 and t % self._decay_step == 0:
self._last_decay = .5 * (1. + math.cos(math.pi * t / t_max))
return self._last_decay
class MultiStepLR(ConstantLR):
"""LR scheduler with multi-steps decay."""
def __init__(self, lr_max, decay_steps, decay_gamma, **kwargs):
super(MultiStepLR, self).__init__(lr_max=lr_max, **kwargs)
self._decay_steps = decay_steps
self._decay_gamma = decay_gamma
self._stage_count = 0
self._num_stages = len(decay_steps)
def get_decay(self):
if self._stage_count < self._num_stages:
k = self._decay_steps[self._stage_count]
while self._step_count >= k:
self._stage_count += 1
if self._stage_count >= self._num_stages:
break
k = self._decay_steps[self._stage_count]
self._last_decay = self._decay_gamma ** self._stage_count
return self._last_decay
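# A quick numeric self-check (standalone; the values are illustrative,
# not a recommended schedule):
if __name__ == '__main__':
    scheduler = MultiStepLR(lr_max=0.02, decay_steps=[60000, 80000],
                            decay_gamma=0.1, warmup_steps=500,
                            warmup_factor=1. / 3.)
    lrs = []
    for _ in range(3):
        lrs.append(scheduler.get_lr())
        scheduler.step()
    assert lrs[0] < lrs[1] < lrs[2] <= 0.02  # Linear warmup phase.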
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Training engine."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import functools
import os
from dragon.vm import torch
from seetadet.core.config import cfg
from seetadet.core.training.build import build_lr_scheduler
from seetadet.core.training.build import build_optimizer
from seetadet.core.training.build import build_tensorboard
from seetadet.core.training.utils import get_param_groups
from seetadet.data.build import build_loader_train
from seetadet.models.build import build_detector
from seetadet.utils import logging
from seetadet.utils import profiler
class Trainer(object):
"""Schedule the iterative model training."""
def __init__(self, coordinator):
# Build loader.
self.loader = build_loader_train()
# Build model.
self.model = build_detector(training=True)
self.model.load_weights(cfg.TRAIN.WEIGHTS)
self.model.cuda(cfg.GPU_ID)
if cfg.MODEL.PRECISION.lower() == 'float16':
self.model.half()
# Build optimizer.
self.loss_scale = cfg.SOLVER.LOSS_SCALE
param_groups_getter = get_param_groups
if cfg.SOLVER.LAYER_LR_DECAY < 1.0:
param_groups_getter = functools.partial(
param_groups_getter, lr_scale_getter=functools.partial(
self.model.get_lr_scale, decay=cfg.SOLVER.LAYER_LR_DECAY))
self.optimizer = build_optimizer(param_groups_getter(self.model))
self.scheduler = build_lr_scheduler()
# Build monitor.
self.coordinator = coordinator
self.metrics = collections.OrderedDict()
self.board = build_tensorboard(coordinator.path_at('logs'))
@property
def iter(self):
return self.scheduler._step_count
def snapshot(self):
"""Save the checkpoint of current iterative step."""
f = cfg.SOLVER.SNAPSHOT_PREFIX
f += '_iter_{}.pkl'.format(self.iter)
f = os.path.join(self.coordinator.path_at('checkpoints'), f)
if logging.is_root() and not os.path.exists(f):
torch.save(self.model.state_dict(), f, pickle_protocol=4)
logging.info('Wrote snapshot to: {:s}'.format(f))
def add_metrics(self, stats):
"""Add or update the metrics."""
for k, v in stats['metrics'].items():
if k not in self.metrics:
self.metrics[k] = profiler.SmoothedValue()
self.metrics[k].update(v)
def display_metrics(self, stats):
"""Send metrics to the monitor."""
logging.info('Iteration %d, lr = %.8f, time = %.2fs'
% (stats['iter'], stats['lr'], stats['time']))
for k, v in self.metrics.items():
if k == 'total_loss':
continue
logging.info(' ' * 4 + 'Train net output({}): {:.4f} ({:.4f})'
.format(k, stats['metrics'][k], v.average()))
if self.board is not None:
self.board.scalar_summary('lr', stats['lr'], stats['iter'])
self.board.scalar_summary('time', stats['time'], stats['iter'])
for k, v in self.metrics.items():
self.board.scalar_summary(k, v.average(), stats['iter'])
def step(self):
stats = {'iter': self.iter}
metrics = collections.defaultdict(float)
# Run forward.
timer = profiler.Timer().tic()
inputs = self.loader()
outputs, losses = self.model(inputs), []
for k, v in outputs.items():
if 'loss' in k:
losses.append(v)
loss_val = float(v)
metrics[k] += loss_val
metrics['total_loss'] += loss_val
# Run backward.
losses = sum(losses[1:], losses[0])
if self.loss_scale != 1.0:
losses *= self.loss_scale
losses.backward()
# Apply update.
stats['lr'] = self.scheduler.get_lr()
for group in self.optimizer.param_groups:
group['lr'] = stats['lr'] * group.get('lr_scale', 1.0)
self.optimizer.step()
self.scheduler.step()
stats['time'] = timer.toc()
stats['metrics'] = metrics
return stats
def train_model(self, start_iter=0):
"""Network training loop."""
timer = profiler.Timer()
max_steps = cfg.SOLVER.MAX_STEPS
display_every = cfg.SOLVER.DISPLAY
progress_every = 10 * display_every
snapshot_every = cfg.SOLVER.SNAPSHOT_EVERY
self.scheduler._step_count = start_iter
while self.iter < max_steps:
with timer.tic_and_toc():
stats = self.step()
self.add_metrics(stats)
if stats['iter'] % display_every == 0:
self.display_metrics(stats)
if self.iter % progress_every == 0:
logging.info(profiler.get_progress(timer, self.iter, max_steps))
if self.iter % snapshot_every == 0:
self.snapshot()
self.metrics.clear()
def run_train(coordinator, start_iter=0):
"""Start a network training task."""
trainer = Trainer(coordinator)
logging.info('Start training...')
trainer.train_model(start_iter)
trainer.snapshot()
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Training utilities."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
def count_params(module):
"""Return the number of parameters in MB."""
return sum([v.size().numel() for v in module.parameters()]) / 1e6
def freeze_module(module):
"""Freeze parameters of given module."""
module.eval()
for param in module.parameters():
param.requires_grad = False
def get_param_groups(module, lr_scale_getter=None):
"""Separate parameters into groups."""
groups = collections.OrderedDict()
for name, param in module.named_parameters():
if not param.requires_grad:
continue
attrs = collections.OrderedDict()
if lr_scale_getter:
attrs['lr_scale'] = lr_scale_getter(name)
no_weight_decay = not (name.endswith('weight') and param.dim() > 1)
no_weight_decay = getattr(param, 'no_weight_decay', no_weight_decay)
if no_weight_decay:
attrs['weight_decay'] = 0
group_name = '/'.join(['%s:%s' % (v[0], v[1]) for v in list(attrs.items())])
if group_name not in groups:
groups[group_name] = {'params': []}
groups[group_name].update(attrs)
groups[group_name]['params'].append(param)
return list(groups.values())
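# A minimal sketch (assuming a torch-style module; the Conv2d below
# stands in for any detector):
#
#   from dragon.vm import torch
#   conv = torch.nn.Conv2d(3, 64, kernel_size=3)
#   groups = get_param_groups(conv)
#   # The bias lands in a "weight_decay:0" group; the 2D+ weight keeps
#   # the decay configured on the optimizer.
#   optimizer = torch.optim.SGD(groups, lr=0.02, momentum=0.9)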
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

# Modules.
from seetadet.data import datasets
from seetadet.data import evaluators
from seetadet.data import pipelines
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Anchor generator for RPN head."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
class AnchorGenerator(object):
"""Generate anchors for bbox regression."""
def __init__(self, strides, sizes, aspect_ratios,
scales_per_octave=1):
self.strides = strides
self.sizes = _align_args(strides, sizes)
self.aspect_ratios = _align_args(strides, aspect_ratios)
for i in range(len(self.sizes)):
octave_sizes = []
for j in range(1, scales_per_octave):
scale = 2 ** (float(j) / scales_per_octave)
octave_sizes += [x * scale for x in self.sizes[i]]
self.sizes[i] += octave_sizes
self.scales = [[x / y for x in z] for y, z in zip(strides, self.sizes)]
self.cell_anchors = []
for i in range(len(strides)):
self.cell_anchors.append(generate_anchors_v2(
strides[i], self.aspect_ratios[i], self.sizes[i]))
self.grid_shapes = None
self.grid_anchors = None
self.grid_coords = None
def reset_grid(self, max_size):
"""Reset the grid."""
self.grid_shapes = [(int(np.ceil(max_size / x)),) * 2 for x in self.strides]
self.grid_coords = self.get_coords(self.grid_shapes)
self.grid_anchors = self.get_anchors(self.grid_shapes)
def num_cell_anchors(self, index=0):
"""Return number of cell anchors."""
return self.cell_anchors[index].shape[0]
def num_anchors(self, shapes):
"""Return the number of grid anchors."""
return sum(self.cell_anchors[i].shape[0] * np.prod(shapes[i])
for i in range(len(shapes)))
def get_coords(self, shapes):
"""Return the x-y coordinates of grid anchors."""
xs, ys = [], []
for i in range(len(shapes)):
height, width = shapes[i]
x, y = np.arange(0, width), np.arange(0, height)
x, y = np.meshgrid(x, y)
            # Tile the K cell coordinates once per anchor (A anchors)
            # to get flattened shift coords of length A * K.
xs.append(np.tile(x.flatten(), self.cell_anchors[i].shape[0]))
ys.append(np.tile(y.flatten(), self.cell_anchors[i].shape[0]))
return np.concatenate(xs), np.concatenate(ys)
def get_anchors(self, shapes):
"""Return the grid anchors."""
grid_anchors = []
for i in range(len(shapes)):
h, w = shapes[i]
shift_x = np.arange(0, w) * self.strides[i]
shift_y = np.arange(0, h) * self.strides[i]
shift_x, shift_y = np.meshgrid(shift_x, shift_y)
shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
shift_x.ravel(), shift_y.ravel())).transpose()
shifts = shifts.astype(self.cell_anchors[i].dtype)
# Add A anchors (A, 1, 4) to cell K shifts (1, K, 4)
# to get shift anchors (A, K, 4)
a, k = self.num_cell_anchors(i), shifts.shape[0]
anchors = (self.cell_anchors[i].reshape((a, 1, 4)) +
shifts.reshape((1, k, 4)))
grid_anchors.append(anchors.reshape((a * k, 4)))
return np.vstack(grid_anchors)
def narrow_anchors(self, shapes, inds, return_anchors=False):
"""Return the valid anchors on given shapes."""
max_shapes = self.grid_shapes
anchors = self.grid_anchors
x_coords, y_coords = self.grid_coords
offset1 = offset2 = num1 = num2 = 0
out_inds, out_anchors = [], []
for i in range(len(max_shapes)):
num1 += self.num_cell_anchors(i) * np.prod(max_shapes[i])
num2 += self.num_cell_anchors(i) * np.prod(shapes[i])
inds_keep = inds[np.where((inds >= offset1) & (inds < num1))[0]]
anchors_keep = anchors[inds_keep] if return_anchors else None
x, y = x_coords[inds_keep], y_coords[inds_keep]
z = ((inds_keep - offset1) // max_shapes[i][1]) // max_shapes[i][0]
keep = np.where((x < shapes[i][1]) & (y < shapes[i][0]))[0]
inds_keep = (z * shapes[i][0] + y) * shapes[i][1] + x + offset2
out_inds.append(inds_keep[keep])
out_anchors.append(anchors_keep[keep] if return_anchors else None)
offset1, offset2 = num1, num2
outputs = [np.concatenate(out_inds)]
if return_anchors:
outputs += [np.concatenate(out_anchors)]
return outputs[0] if len(outputs) == 1 else outputs
def generate_anchors(stride=16, ratios=(0.5, 1, 2), scales=2 ** np.arange(3, 6)):
"""Generate anchors by enumerating aspect ratios and scales."""
base_anchor = np.array([1, 1, stride, stride]) - 1
ratio_anchors = _ratio_enum(base_anchor, ratios)
anchors = np.vstack([_scale_enum(ratio_anchors[i, :], scales)
for i in range(ratio_anchors.shape[0])])
return anchors.astype('float32')
def generate_anchors_v2(stride=16, ratios=(0.5, 1, 2), sizes=(32, 64, 128, 256, 512)):
"""Generate anchors by enumerating aspect ratios and sizes."""
scales = np.array(sizes) / stride
return generate_anchors(stride, ratios, scales)
def _whctrs(anchor):
"""Return the xywh of an anchor."""
w = anchor[2] - anchor[0] + 1
h = anchor[3] - anchor[1] + 1
x_ctr = anchor[0] + 0.5 * (w - 1)
y_ctr = anchor[1] + 0.5 * (h - 1)
return w, h, x_ctr, y_ctr
def _mkanchors(ws, hs, x_ctr, y_ctr):
"""Return a sef of anchors by widths, heights and center."""
ws, hs = ws[:, np.newaxis], hs[:, np.newaxis]
return np.hstack((x_ctr - 0.5 * (ws - 1), y_ctr - 0.5 * (hs - 1),
x_ctr + 0.5 * (ws - 1), y_ctr + 0.5 * (hs - 1)))
def _ratio_enum(anchor, ratios):
"""Enumerate a set of anchors by aspect ratios."""
w, h, x_ctr, y_ctr = _whctrs(anchor)
ws = np.round(np.sqrt(w * h / ratios))
hs = np.round(ws * ratios)
return _mkanchors(ws, hs, x_ctr, y_ctr)
def _scale_enum(anchor, scales):
"""Enumerate a set of anchors by scales."""
w, h, x_ctr, y_ctr = _whctrs(anchor)
ws, hs = w * scales, h * scales
return _mkanchors(ws, hs, x_ctr, y_ctr)
def _align_args(strides, args):
"""Align the args to the strides."""
args = (args * len(strides)) if len(args) == 1 else args
assert len(args) == len(strides)
return [[x] if not isinstance(x, (tuple, list)) else x[:] for x in args]
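# A smoke test mirroring common FPN settings (illustrative values):
if __name__ == '__main__':
    anchor_generator = AnchorGenerator(
        strides=(4, 8, 16, 32, 64),
        sizes=((32,), (64,), (128,), (256,), (512,)),
        aspect_ratios=((0.5, 1, 2),))
    anchor_generator.reset_grid(max_size=512)
    num = anchor_generator.num_anchors(anchor_generator.grid_shapes)
    assert anchor_generator.grid_anchors.shape == (num, 4)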
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Anchor generator for SSD head."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
class AnchorGenerator(object):
"""Generate anchors for bbox regression."""
def __init__(self, strides, sizes, aspect_ratios):
self.strides = strides
self.sizes = _align_args(strides, sizes)
self.aspect_ratios = _align_args(strides, aspect_ratios)
self.scales = [[x / y for x in z] for y, z in zip(strides, self.sizes)]
self.cell_anchors = []
for i in range(len(strides)):
self.cell_anchors.append(generate_anchors(
self.aspect_ratios[i], self.sizes[i]))
self.grid_shapes = None
self.grid_anchors = None
def reset_grid(self, max_size):
"""Reset the grid."""
self.grid_shapes = [(int(np.ceil(max_size / x)),) * 2 for x in self.strides]
self.grid_anchors = self.get_anchors(self.grid_shapes)
def num_cell_anchors(self, index=0):
"""Return number of cell anchors."""
return self.cell_anchors[index].shape[0]
def get_anchors(self, shapes):
"""Return the grid anchors."""
grid_anchors = []
for i in range(len(shapes)):
h, w = shapes[i]
shift_x = (np.arange(0, w) + 0.5) * self.strides[i]
shift_y = (np.arange(0, h) + 0.5) * self.strides[i]
shift_x, shift_y = np.meshgrid(shift_x, shift_y)
shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
shift_x.ravel(), shift_y.ravel())).transpose()
shifts = shifts.astype(self.cell_anchors[i].dtype)
            # Add A anchors (1, A, 4) to cell K shifts (K, 1, 4)
# to get shift anchors (K, A, 4) and reshape to (K * A, 4)
a = self.cell_anchors[i].shape[0]
k = shifts.shape[0]
anchors = (self.cell_anchors[i].reshape((1, a, 4)) +
shifts.reshape((1, k, 4)).transpose((1, 0, 2)))
grid_anchors.append(anchors.reshape((k * a, 4)))
return np.vstack(grid_anchors)
def generate_anchors(ratios, sizes):
"""Generate anchors by enumerating aspect ratios and sizes."""
min_size, max_size = sizes
base_anchor = np.array([0, 0, min_size, min_size])
ratio_anchors = _ratio_enum(base_anchor, ratios)
size_anchors = _size_enum(base_anchor, min_size, max_size)
anchors = np.vstack([ratio_anchors[:1], size_anchors, ratio_anchors[1:]])
return anchors.astype('float32')
def _whctrs(anchor):
"""Return the xywh of an anchor."""
w, h = anchor[2], anchor[3]
x_ctr, y_ctr = anchor[0], anchor[1]
return w, h, x_ctr, y_ctr
def _mkanchors(ws, hs, x_ctr, y_ctr):
"""Return a sef of anchors by widths, heights and center."""
ws, hs = ws[:, np.newaxis], hs[:, np.newaxis]
return np.hstack((x_ctr - 0.5 * ws, y_ctr - 0.5 * hs,
x_ctr + 0.5 * ws, y_ctr + 0.5 * hs))
def _ratio_enum(anchor, ratios):
"""Enumerate a set of anchors by aspect ratios."""
w, h, x_ctr, y_ctr = _whctrs(anchor)
hs = np.round(np.sqrt(w * h / ratios))
ws = np.round(hs * ratios)
return _mkanchors(ws, hs, x_ctr, y_ctr)
def _size_enum(anchor, min_size, max_size):
"""Enumerate a anchor for size wrt base_anchor."""
_, _, x_ctr, y_ctr = _whctrs(anchor)
ws = hs = np.sqrt([min_size * max_size])
return _mkanchors(ws, hs, x_ctr, y_ctr)
def _align_args(strides, args):
"""Align the args to the strides."""
args = (args * len(strides)) if len(args) == 1 else args
assert len(args) == len(strides)
return [[x] if not isinstance(x, (tuple, list)) else x[:] for x in args]
if __name__ == '__main__':
anchor_generator = AnchorGenerator(
strides=(8, 16, 32, 64, 100, 300),
sizes=((30, 60), (60, 110), (110, 162),
(162, 213), (213, 264), (264, 315)),
aspect_ratios=((1, 2, 0.5),
(1, 2, 0.5, 3, 0.33),
(1, 2, 0.5, 3, 0.33),
(1, 2, 0.5, 3, 0.33),
(1, 2, 0.5),
(1, 2, 0.5)))
anchor_generator.reset_grid(max_size=300)
assert anchor_generator.grid_anchors.shape == (8732, 4)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Ground-truth assigners."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
from seetadet.utils.bbox import bbox_overlaps
class MaxIoUAssigner(object):
"""Assign ground-truth to boxes according to the IoU."""
def __init__(
self,
pos_iou_thr=0.5,
neg_iou_thr=0.5,
match_low_quality=True,
gt_max_assign_all=True,
):
"""Create a ``MaxIoUAssigner``.
Parameters
----------
pos_iou_thr : float, optional, default=0.5
The minimum IoU overlap to label positives.
neg_iou_thr : float, optional, default=0.5
The maximum IoU overlap to label negatives.
        match_low_quality : bool, optional, default=True
            Also match the best-overlapping box for each gt box or not.
        gt_max_assign_all : bool, optional, default=True
            Assign every box sharing a gt box's max overlap, not just one, or not.
"""
self.pos_iou_thr = pos_iou_thr
self.neg_iou_thr = neg_iou_thr
self.match_low_quality = match_low_quality
self.gt_max_assign_all = gt_max_assign_all
def assign(self, boxes, gt_boxes):
        # Initialize assignments with the ignore label "-1".
num_boxes = len(boxes)
labels = np.empty((num_boxes,), 'int8')
labels.fill(-1)
# Overlaps between the anchors and the gt boxes.
overlaps = bbox_overlaps(boxes, gt_boxes)
max_overlaps = overlaps.max(axis=1)
# Background: below threshold IoU.
labels[max_overlaps < self.neg_iou_thr] = 0
# Foreground: above threshold IoU.
labels[max_overlaps >= self.pos_iou_thr] = 1
# Foreground: for each gt, assign anchor(s) with highest overlap.
if self.match_low_quality:
if self.gt_max_assign_all:
gt_max_overlaps = overlaps.max(axis=0)
gt_argmax_overlaps = np.where(overlaps == gt_max_overlaps)[0]
else:
gt_argmax_overlaps = overlaps.argmax(axis=0)
labels[gt_argmax_overlaps] = 1
        # Return the assigned labels: "1" foreground, "0" background, "-1" ignored.
return labels
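if __name__ == '__main__':
    # A quick sanity check of the labeling rules above on toy boxes; this
    # assumes bbox_overlaps (imported above) returns pairwise IoU.
    boxes = np.array([[0, 0, 10, 10], [0, 0, 4, 4], [20, 20, 30, 30]], 'float32')
    gt_boxes = np.array([[0, 0, 10, 10]], 'float32')
    assigner = MaxIoUAssigner(pos_iou_thr=0.5, neg_iou_thr=0.5)
    # Box 0 matches its gt perfectly, box 1 is well below 0.5, box 2 is
    # disjoint, so the labels come out as foreground, background, background.
    print(assigner.assign(boxes, gt_boxes))  # [1 0 0]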
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Build for data."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.core.config import cfg
from seetadet.core.registry import Registry
LOADERS = Registry('loaders')
DATASETS = Registry('datasets')
EVALUATORS = Registry('evaluators')
ANCHOR_SAMPLERS = Registry('anchor_samplers')
def build_anchor_sampler():
return ANCHOR_SAMPLERS.try_get(cfg.MODEL.TYPE)()
def build_dataset(path):
"""Build the dataset."""
keys = path.split('://')
if len(keys) >= 2:
return DATASETS.get(keys[0])(keys[1])
return DATASETS.get('kpl')(path)
def build_loader_train(**kwargs):
"""Build the train loader."""
args = {'dataset': cfg.TRAIN.DATASET,
'batch_size': cfg.TRAIN.IMS_PER_BATCH,
'num_workers': cfg.TRAIN.NUM_WORKERS,
'shuffle': True, 'contiguous': True}
args.update(kwargs)
return LOADERS.get(cfg.TRAIN.LOADER)(**args)
def build_loader_test(**kwargs):
"""Build the test loader."""
args = {'dataset': cfg.TEST.DATASET,
'batch_size': cfg.TEST.IMS_PER_BATCH,
'shuffle': False, 'contiguous': False}
args.update(kwargs)
return LOADERS.get(cfg.TEST.LOADER)(**args)
def build_evaluator(**kwargs):
evaluator_type = cfg.TEST.EVALUATOR
if not evaluator_type:
return None
args = {'classes': cfg.MODEL.CLASSES}
if evaluator_type == 'voc2007':
args['use_07_metric'] = True
args.update(kwargs)
evaluator = EVALUATORS.get(evaluator_type)(**args)
ann_file = cfg.TEST.JSON_DATASET
if ann_file:
evaluator.load_annotations(ann_file)
return evaluator
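if __name__ == '__main__':
    # Sketch of the '<scheme>://' routing in build_dataset(); 'my_format'
    # is a hypothetical key, registered with the decorator form used by
    # the bundled datasets.
    @DATASETS.register('my_format')
    class MyDataset(object):
        def __init__(self, source):
            self.source = source
    dataset = build_dataset('my_format:///data/train')
    print(dataset.source)  # '/data/train'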
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Datasets."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
# Classes.
from seetadet.data.datasets.datum import AnnotatedDatum
# Modules.
from seetadet.data.datasets import kpl_dataset
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Base dataset."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.core.config import cfg
class Dataset(object):
"""Base dataset class."""
def __init__(self, source):
self.source = source
self.num_images = 0
self.classes = cfg.MODEL.CLASSES
self.num_classes = len(self.classes)
self.class_to_ind = dict(zip(self.classes, range(self.num_classes)))
@property
def type(self):
return type(self)
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Annotated datum."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import cv2
import numpy as np
class AnnotatedDatum(object):
    """Wrapper for annotated datum."""
    def __init__(self, example):
        self._example = example
        self._img = None
    @property
    def id(self):
        """Return the example id."""
        return self._example['id']
    @property
    def height(self):
        """Return the image height."""
        return self._example['height']
    @property
    def width(self):
        """Return the image width."""
        return self._example['width']
    @property
    def img(self):
        """Return the image array."""
        if self._img is None:
            img_bytes = np.frombuffer(self._example['content'], 'uint8')
            self._img = cv2.imdecode(img_bytes, cv2.IMREAD_COLOR)
        return self._img
    @property
    def objects(self):
        """Return the annotated objects."""
        objects = []
        for obj in self._example['object']:
            mask = obj.get('mask', None)
            polygons = obj.get('polygons', None)
            if 'x3' in obj:
                pass  # rotated-box handling elided between the diff hunks
            elif 'xmin' in obj:
                bbox = [obj['xmin'], obj['ymin'], obj['xmax'], obj['ymax']]
            else:
                bbox = obj['bbox']
            objects.append({'name': obj['name'],
                            'bbox': bbox,
                            'difficult': obj.get('difficult', 0)})
            if mask is not None and len(mask) > 0:
                objects[-1]['mask'] = mask
            elif polygons is not None and len(polygons) > 0:
                objects[-1]['polygons'] = [np.array(p) for p in polygons]
        return objects
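if __name__ == '__main__':
    # A hypothetical in-memory example in the dict layout expected above.
    toy_img = np.zeros((8, 8, 3), 'uint8')
    datum = AnnotatedDatum({
        'id': '000001', 'height': 8, 'width': 8,
        'content': cv2.imencode('.png', toy_img)[1].tobytes(),
        'object': [{'name': 'person', 'bbox': [1, 1, 6, 6]}]})
    assert datum.img.shape == (8, 8, 3)
    assert datum.objects[0]['difficult'] == 0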
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""KPLRecord dataset."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import dragon
from seetadet.data.build import DATASETS
from seetadet.data.datasets.dataset import Dataset
@DATASETS.register('kpl')
class KPLRecordDataset(Dataset):
def __init__(self, source):
super(KPLRecordDataset, self).__init__(source)
self.num_images = self.type(self.source).size
@property
def type(self):
return dragon.io.KPLRecordDataset
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Evaluators."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
# Modules.
from seetadet.data.evaluators import coco_evaluator
from seetadet.data.evaluators import voc_evaluator
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""COCO dataset evaluator."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import numpy as np
import prettytable
from pycocotools.cocoeval import COCOeval
from seetadet.data.build import EVALUATORS
from seetadet.data.evaluators.evaluator import Evaluator
@EVALUATORS.register('coco')
class COCOEvaluator(Evaluator):
"""Evaluator for MS COCO dataset."""
def __init__(self, classes):
super(COCOEvaluator, self).__init__(classes, COCOeval)
def print_eval_results(self, coco_eval):
def get_thr_ind(coco_eval, thr):
ind = np.where((coco_eval.params.iouThrs > thr - 1e-5) &
(coco_eval.params.iouThrs < thr + 1e-5))[0][0]
iou_thr = coco_eval.params.iouThrs[ind]
assert np.isclose(iou_thr, thr)
return ind
ind_lo = get_thr_ind(coco_eval, 0.5)
ind_hi = get_thr_ind(coco_eval, 0.95)
# Precision: (iou, recall, cls, area range, max dets)
# Recall: (iou, cls, area range, max dets)
# Area range index 0: all area ranges
# Max dets index 2: 100 per image
all_prec = coco_eval.eval['precision'][ind_lo:(ind_hi + 1), :, :, 0, 2]
all_recall = coco_eval.eval['recall'][ind_lo:(ind_hi + 1), :, 0, 2]
metrics = collections.OrderedDict([
('AP@[IoU=0.5:0.95]', []), ('AR@[IoU=0.5:0.95]', [])])
class_table = prettytable.PrettyTable()
for cls_ind, cls in enumerate(self.classes):
if cls == '__background__':
continue
ap = np.mean(all_prec[:, :, cls_ind - 1]) # (iou, recall, cls)
recall = np.mean(all_recall[:, cls_ind - 1]) # (iou, cls)
metrics['AP@[IoU=0.5:0.95]'].append(ap)
metrics['AR@[IoU=0.5:0.95]'].append(recall)
for k, v in metrics.items():
v = np.nan_to_num(v, nan=0)
class_table.add_column(k, np.round(v * 100, 2))
class_table.add_column('Class', self.classes[1:])
print('Per class results:\n' + class_table.get_string(), '\n')
print('Summary:')
coco_eval.summarize()
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Base evaluator."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import json
import os
import numpy as np
from pycocotools import mask as maskUtils
from pycocotools.coco import COCO
from seetadet.core.config import cfg
from seetadet.utils import logging
from seetadet.utils.mask import paste_masks
class Evaluator(object):
"""Evaluator using COCO json dataset format."""
def __init__(self, classes, eval_type=None):
self.classes = classes
self.num_classes = len(self.classes)
self.class_to_cat_id = dict(zip(self.classes, range(self.num_classes)))
self.eval_type = eval_type
self.cocoGt = None
self.binary_thresh = cfg.TEST.BINARY_THRESH
def load_annotations(self, ann_file=None):
"""Load annotations."""
self.cocoGt = COCO(ann_file)
if len(self.cocoGt.dataset) > 0:
self.class_to_cat_id = dict((v['name'], v['id'])
for v in self.cocoGt.cats.values())
def eval_bbox(self, res_file):
"""Evaluate bbox results."""
if len(self.cocoGt.dataset['annotations']) == 0:
logging.info('No annotations. Skip evaluation.')
return
cocoDt = self.cocoGt.loadRes(res_file)
coco_eval = self.eval_type(self.cocoGt, cocoDt, 'bbox')
coco_eval.evaluate()
coco_eval.accumulate()
self.print_eval_results(coco_eval)
def eval_segm(self, res_file):
"""Evaluate segmentation results."""
if len(self.cocoGt.dataset['annotations']) == 0:
logging.info('No annotations. Skip evaluation.')
return
cocoDt = self.cocoGt.loadRes(res_file)
coco_eval = self.eval_type(self.cocoGt, cocoDt, 'segm')
coco_eval.evaluate()
coco_eval.accumulate()
self.print_eval_results(coco_eval)
def print_eval_results(self, coco_eval):
"""Print the evaluation results."""
def bbox_results_one_category(self, boxes, metas, cat_id):
results = []
for i, img_id in enumerate(metas.keys()):
dets = boxes[i].astype('float64')
if len(dets) == 0:
continue
xs, ys = dets[:, 0], dets[:, 1]
ws, hs = dets[:, 2] - xs + 1, dets[:, 3] - ys + 1
scores = dets[:, -1]
results.extend([{
'image_id': self.get_image_id(img_id),
'category_id': cat_id,
'bbox': [xs[j], ys[j], ws[j], hs[j]],
'score': scores[j],
} for j in range(dets.shape[0])])
return results
def segm_results_one_category(self, boxes, masks, metas, cat_id):
results = []
for i, (img_id, meta) in enumerate(metas.items()):
dets = boxes[i]
if len(dets) == 0:
continue
img_size = (meta['height'], meta['width'])
segms = self.encode_masks(masks[i], dets[:, :4], img_size)
scores = dets[:, -1]
results.extend([{
'image_id': self.get_image_id(img_id),
'category_id': cat_id,
'segmentation': segms[j],
'score': float(scores[j]),
} for j in range(dets.shape[0])])
return results
def write_bbox_results(self, boxes, metas, output_dir):
results = []
for cls_ind, cls in enumerate(self.classes):
if cls == '__background__':
continue
print('Collecting {} results ({:d}/{:d})'
.format(cls, cls_ind, self.num_classes - 1))
results.extend(self.bbox_results_one_category(
boxes[cls_ind], metas, self.class_to_cat_id[cls]))
res_file = self.get_res_file(output_dir)
print('Writing results json to {}'.format(res_file))
with open(res_file, 'w') as f:
json.dump(results, f)
return res_file
def write_segm_results(self, boxes, masks, metas, output_dir):
results = []
for cls_ind, cls in enumerate(self.classes):
if cls == '__background__':
continue
print('Collecting {} results ({:d}/{:d})'
.format(cls, cls_ind, self.num_classes - 1))
results.extend(self.segm_results_one_category(
boxes[cls_ind], masks[cls_ind], metas,
self.class_to_cat_id[cls]))
res_file = self.get_res_file(output_dir, 'segm')
print('Writing results json to {}'.format(res_file))
with open(res_file, 'w') as fid:
json.dump(results, fid)
return res_file
def write_annotations(self, metas, output_dir):
dataset = {'images': [], 'categories': [], 'annotations': []}
for img_id, meta in metas.items():
dataset['images'].append({
'id': self.get_image_id(img_id),
'height': meta['height'], 'width': meta['width']})
for cls in self.classes:
if cls == '__background__':
continue
dataset['categories'].append({
'name': cls, 'id': self.class_to_cat_id[cls]})
for img_id, meta in metas.items():
img_size = (meta['height'], meta['width'])
for obj in meta['objects']:
x, y = obj['bbox'][0], obj['bbox'][1]
w, h = obj['bbox'][2] - x + 1, obj['bbox'][3] - y + 1
dataset['annotations'].append({
'id': str(len(dataset['annotations'])),
'bbox': [x, y, w, h],
'area': w * h,
'iscrowd': obj['difficult'],
'image_id': self.get_image_id(img_id),
'category_id': self.class_to_cat_id[obj['name']]})
if 'mask' in obj:
segm = {'size': img_size, 'counts': obj['mask']}
dataset['annotations'][-1]['segmentation'] = segm
elif 'polygons' in obj:
segm = []
for poly in obj['polygons']:
if isinstance(poly, np.ndarray):
poly = poly.tolist()
segm.append(poly)
dataset['annotations'][-1]['segmentation'] = segm
ann_file = self.get_ann_file(output_dir)
print('Writing annotations json to {}'.format(ann_file))
with open(ann_file, 'w') as f:
json.dump(dataset, f)
return ann_file
def encode_masks(self, masks, boxes, size):
segms = maskUtils.encode(paste_masks(
masks, boxes, size, thresh=self.binary_thresh))
for segm in segms:
segm['counts'] = segm['counts'].decode()
return segms
@staticmethod
def get_prefix(type='bbox'):
if type == 'bbox':
return 'detections'
elif type == 'segm':
return 'segmentations'
elif type == 'kpt':
return 'keypoints'
return ''
@staticmethod
def get_ann_file(output_dir):
filename = 'annotations.json'
if not os.path.exists(output_dir):
os.makedirs(output_dir)
return os.path.join(output_dir, filename)
@staticmethod
def get_res_file(output_dir, type='bbox'):
filename = Evaluator.get_prefix(type) + '.json'
if not os.path.exists(output_dir):
os.makedirs(output_dir)
return os.path.join(output_dir, filename)
@staticmethod
def get_image_id(image_name):
image_id = image_name.split('_')[-1].split('.')[0]
try:
return int(image_id)
except ValueError:
return image_name
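if __name__ == '__main__':
    # Sketch of the offline evaluation flow with one fake image and no
    # annotations (paths and names illustrative); eval_bbox() would skip
    # gracefully here since the annotation set is empty.
    metas = {'img_000001.jpg': {'height': 8, 'width': 8, 'objects': []}}
    evaluator = Evaluator(classes=('__background__', 'person'))
    ann_file = evaluator.write_annotations(metas, '/tmp/seetadet_eval')
    evaluator.load_annotations(ann_file)
    # A test loop would now call write_bbox_results() and eval_bbox().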
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Evaluation API on the Pascal VOC dataset."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import datetime
import itertools
import time
import numpy as np
from pycocotools import mask as maskUtils
def voc_ap(rec, prec, use_07_metric=False):
"""Compute VOC AP given precision and recall."""
if use_07_metric:
# 11 point metric.
ap = 0.
for t in np.arange(0., 1.1, 0.1):
if np.sum(rec >= t) == 0:
p = 0
else:
p = np.max(prec[rec >= t])
ap = ap + p / 11.
else:
# Correct AP calculation.
# First append sentinel values at the end.
mrec = np.concatenate(([0.], rec, [1.]))
mpre = np.concatenate(([0.], prec, [0.]))
# Compute the precision envelope.
for i in range(mpre.size - 1, 0, -1):
mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])
        # To calculate area under PR curve, look for points
        # where X axis (recall) changes value.
i = np.where(mrec[1:] != mrec[:-1])[0]
# And sum (\Delta recall) * prec
ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])
return ap
class VOCeval(object):
"""Interface for evaluating detection via COCO object."""
def __init__(self, cocoGt=None, cocoDt=None, iouType='bbox',
iouThrs=[0.5, 0.7], use_07_metric=False):
self.cocoGt = cocoGt
self.cocoDt = cocoDt
self.params = Params(iouType)
self.params.iouThrs = iouThrs
self.params.use_07_metric = use_07_metric
if cocoGt is not None:
self.params.imgIds = sorted(cocoGt.getImgIds())
self.params.catIds = sorted(cocoGt.getCatIds())
self.ious = {}
def _prepare(self):
p = self.params
gts = self.cocoGt.loadAnns(
self.cocoGt.getAnnIds(imgIds=p.imgIds, catIds=p.catIds))
dts = self.cocoDt.loadAnns(
self.cocoDt.getAnnIds(imgIds=p.imgIds, catIds=p.catIds))
for gt in gts:
            ignore = gt['ignore'] if 'ignore' in gt else 0
            gt['ignore'] = ignore or ('iscrowd' in gt and gt['iscrowd'])
self._gts = collections.defaultdict(list)
self._dts = collections.defaultdict(list)
for gt in gts:
self._gts[gt['image_id'], gt['category_id']].append(gt)
for dt in dts:
self._dts[dt['image_id'], dt['category_id']].append(dt)
self.eval = {}
def evaluate(self):
tic = time.time()
print('Running per image evaluation...')
p = self.params
print('Evaluate annotation type *{}*'.format(p.iouType))
p.imgIds = list(np.unique(p.imgIds))
p.catIds = list(np.unique(p.catIds))
self._prepare()
self.ious = {(imgId, catId): self.computeIoU(imgId, catId)
for imgId in p.imgIds for catId in p.catIds}
self.evalImgs = [self.evaluateImg(imgId, catId)
for catId in p.catIds for imgId in p.imgIds]
toc = time.time()
print('DONE (t={:0.2f}s).'.format(toc - tic))
def accumulate(self, p=None):
print('Accumulating evaluation results...')
tic = time.time()
if not self.evalImgs:
print('Please run evaluate() first')
if p is None:
p = self.params
print('VOC07 metric? ' + ('Yes' if p.use_07_metric else 'No'))
T, K, I = len(p.iouThrs), len(p.catIds), len(p.imgIds)
recall, ap = np.zeros((T, K)), np.zeros((T, K))
for k in range(K):
E = [self.evalImgs[k * I + i] for i in range(I)]
E = [e for e in E if e is not None]
if len(E) == 0:
continue
dtScores = np.concatenate([e['dtScores'] for e in E])
inds = np.argsort(-dtScores)
dtm = np.concatenate([e['dtMatches'] for e in E], axis=1)[:, inds]
dtIg = np.concatenate([e['dtIgnore'] for e in E], axis=1)[:, inds]
gtIg = np.concatenate([e['gtIgnore'] for e in E])
npig = np.count_nonzero(gtIg == 0)
if npig == 0:
continue
tps = np.logical_and(dtm, np.logical_not(dtIg))
fps = np.logical_and(np.logical_not(dtm), np.logical_not(dtIg))
            tp_sum = np.cumsum(tps, axis=1).astype('float64')
            fp_sum = np.cumsum(fps, axis=1).astype('float64')
for t, (tp, fp) in enumerate(zip(tp_sum, fp_sum)):
nd = len(tp)
rc = tp / npig
pr = tp / np.maximum(tp + fp, np.spacing(1))
recall[t, k] = rc[-1] if nd else 0
ap[t, k] = voc_ap(rc, pr, use_07_metric=p.use_07_metric)
self.eval = {'counts': [T, K],
'date': datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
'ap': ap, 'recall': recall}
toc = time.time()
print('DONE (t={:0.2f}s).'.format(toc - tic))
def computeIoU(self, imgId, catId):
p = self.params
gt = self._gts[imgId, catId]
dt = self._dts[imgId, catId]
if len(gt) == 0 and len(dt) == 0:
return []
inds = np.argsort([-d['score'] for d in dt], kind='mergesort')
dt = [dt[i] for i in inds]
if p.iouType == 'segm':
g = [g['segmentation'] for g in gt]
d = [d['segmentation'] for d in dt]
elif p.iouType == 'bbox':
g = [g['bbox'] for g in gt]
d = [d['bbox'] for d in dt]
else:
raise Exception('unknown iouType for iou computation')
iscrowd = [int(o['iscrowd']) for o in gt]
return maskUtils.iou(d, g, iscrowd)
def evaluateImg(self, imgId, catId):
p = self.params
gt = self._gts[imgId, catId]
dt = self._dts[imgId, catId]
if len(gt) == 0 and len(dt) == 0:
return None
for g in gt:
g['_ignore'] = g['ignore']
gtind = np.argsort([g['_ignore'] for g in gt], kind='mergesort')
gt = [gt[i] for i in gtind]
dtind = np.argsort([-d['score'] for d in dt], kind='mergesort')
dt = [dt[i] for i in dtind]
iscrowd = [int(o['iscrowd']) for o in gt]
ious = (self.ious[imgId, catId][:, gtind]
if len(self.ious[imgId, catId]) > 0 else self.ious[imgId, catId])
T, G, D = len(p.iouThrs), len(gt), len(dt)
gtm, dtm = np.zeros((T, G)), np.zeros((T, D))
gtIg, dtIg = np.array([g['_ignore'] for g in gt]), np.zeros((T, D))
for (tind, iou), (dind, d) in itertools.product(
enumerate(p.iouThrs), enumerate(dt)):
m = -1
for gind, g in enumerate(gt):
if gtm[tind, gind] > 0 and not iscrowd[gind]:
continue
if m > -1 and gtIg[m] == 0 and gtIg[gind] == 1:
break
if ious[dind, gind] <= iou:
continue
m = gind
if m == -1:
continue
dtIg[tind, dind] = gtIg[m]
dtm[tind, dind] = gt[m]['id']
gtm[tind, m] = d['id']
return {'image_id': imgId,
'category_id': catId,
'dtMatches': dtm,
'dtScores': [d['score'] for d in dt],
'gtIgnore': gtIg,
'dtIgnore': dtIg}
class Params(object):
"""Params for evaluation API."""
def setDetParams(self):
self.imgIds = []
self.catIds = []
self.iouThrs = [0.5]
self.use_07_metric = False
def __init__(self, iouType='segm'):
if iouType == 'segm' or iouType == 'bbox':
self.setDetParams()
else:
raise Exception('iouType not supported')
self.iouType = iouType
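if __name__ == '__main__':
    # A worked check of voc_ap() on a three-point PR curve.
    rec = np.array([0.5, 0.5, 1.0])
    prec = np.array([1.0, 0.5, 2.0 / 3.0])
    # Envelope mode: precision 1.0 up to recall 0.5, then 2/3 up to 1.0,
    # so AP = 0.5 * 1.0 + 0.5 * (2 / 3) ~= 0.8333.
    print(voc_ap(rec, prec))
    # 11-point mode samples recall at 0.0, 0.1, ..., 1.0:
    # (6 * 1.0 + 5 * 2 / 3) / 11 ~= 0.8485.
    print(voc_ap(rec, prec, use_07_metric=True))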
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""VOC dataset evaluator."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import functools
import numpy as np
import prettytable
from seetadet.data.build import EVALUATORS
from seetadet.data.evaluators.evaluator import Evaluator
from seetadet.data.evaluators.voc_eval import VOCeval
@EVALUATORS.register(['voc', 'voc2007', 'voc2010', 'voc2012'])
class VOCEvaluator(Evaluator):
"""Evaluator for Pascal VOC dataset."""
def __init__(self, classes, use_07_metric=False):
eval_type = functools.partial(
VOCeval, iouThrs=[0.5], use_07_metric=use_07_metric)
super(VOCEvaluator, self).__init__(classes, eval_type)
def print_eval_results(self, coco_eval):
metrics = collections.OrderedDict()
for cls_ind, cls in enumerate(self.classes):
if cls == '__background__':
continue
            for k, prefix in zip(('ap', 'recall'), ('AP', 'AR')):
                for i, iou in enumerate(coco_eval.params.iouThrs):
                    name = '%s@[IoU=%s]' % (prefix, str(iou))
v = coco_eval.eval[k][i, cls_ind]
if name not in metrics:
metrics[name] = []
metrics[name].append(v)
class_table = prettytable.PrettyTable()
summary_table = prettytable.PrettyTable()
for k, v in metrics.items():
v = np.nan_to_num(v, nan=0)
class_table.add_column(k, np.round(v * 100, 2))
summary_table.add_column(k, [np.round(np.mean(v) * 100, 2)])
class_table.add_column('Class', self.classes[1:])
print('Per class results:\n' + class_table.get_string(), '\n')
print('Summary:\n' + summary_table.get_string())
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import multiprocessing as mp
import time
import threading
import queue
import cv2
import dragon
import numpy as np
from seetadet.core.config import cfg
from seetadet.data.build import build_dataset
from seetadet.utils import logging
from seetadet.utils.blob import blob_vstack
class BalancedQueues(object):
"""Balanced queues."""
def __init__(self, base_queue, num=1):
self.queues = [base_queue]
self.queues += [mp.Queue(base_queue._maxsize) for _ in range(num - 1)]
self.index = 0
def put(self, obj, block=True, timeout=None):
q = self.queues[self.index]
q.put(obj, block=block, timeout=timeout)
self.index = (self.index + 1) % len(self.queues)
def get(self, block=True, timeout=None):
q = self.queues[self.index]
obj = q.get(block=block, timeout=timeout)
self.index = (self.index + 1) % len(self.queues)
return obj
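# A minimal round-robin check for BalancedQueues (illustrative):
#   q = BalancedQueues(mp.Queue(4), num=2)
#   for i in range(4):
#       q.put(i)                    # items alternate between the sub-queues
#   [q.get() for _ in range(4)]     # -> [0, 1, 2, 3], order preserved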
class DataWorkerBase(mp.Process):
"""Base class of data worker."""
def __init__(self):
super(DataWorkerBase, self).__init__(daemon=True)
self.seed = cfg.RNG_SEED
self.reader_queue = None
self.worker_queue = None
def get_outputs(self, inputs):
"""Return the transformed data."""
return inputs
def run(self):
# Disable the opencv threading.
cv2.setNumThreads(1)
# Fix the process-local random seed.
np.random.seed(self.seed)
        # Keep a rolling buffer of the next 4 examples;
        # mosaic augmentation consumes 4 examples at a time.
example_buffer = []
# Main prefetch loop.
while True:
while len(example_buffer) < 4:
example_buffer.append(self.reader_queue.get())
outputs = self.get_outputs(example_buffer)
if outputs is not None:
self.worker_queue.put(outputs)
class DataLoaderBase(threading.Thread):
"""Base class of data loader."""
def __init__(self, worker, **kwargs):
super(DataLoaderBase, self).__init__(daemon=True)
self.batch_size = kwargs.get('batch_size', 2)
self.num_readers = kwargs.get('num_readers', 1)
self.num_workers = kwargs.get('num_workers', 3)
self.queue_depth = kwargs.get('queue_depth', 2)
# Initialize distributed group.
rank, group_size = 0, 1
dist_group = dragon.distributed.get_group()
if dist_group is not None:
group_size = dist_group.size
rank = dragon.distributed.get_rank(dist_group)
# Build queues.
self.reader_queue = mp.Queue(self.queue_depth * self.batch_size)
self.worker_queue = mp.Queue(self.queue_depth * self.batch_size)
self.batch_queue = queue.Queue(self.queue_depth)
self.reader_queue = BalancedQueues(self.reader_queue, self.num_workers)
self.worker_queue = BalancedQueues(self.worker_queue, self.num_workers)
# Build readers.
self.readers = []
for i in range(self.num_readers):
part_idx, num_parts = i, self.num_readers
num_parts *= group_size
part_idx += rank * self.num_readers
self.readers.append(dragon.io.DataReader(**kwargs))
self.readers[i]._part_idx = part_idx
self.readers[i]._num_parts = num_parts
self.readers[i]._seed += part_idx
self.readers[i]._reader_queue = self.reader_queue
self.readers[i].start()
time.sleep(0.1)
# Build workers.
self.workers = []
for i in range(self.num_workers):
p = worker(**kwargs)
p.seed += (i + rank * self.num_workers)
p.reader_queue = self.reader_queue.queues[i]
p.worker_queue = self.worker_queue.queues[i]
p.start()
self.workers.append(p)
time.sleep(0.1)
# Register cleanup callbacks.
def cleanup():
def terminate(processes):
for p in processes:
p.terminate()
p.join()
terminate(self.workers)
terminate(self.readers)
import atexit
atexit.register(cleanup)
# Start batch prefetching.
self.start()
def next(self):
"""Return the next batch of data."""
return self.__next__()
def run(self):
"""Main loop."""
def __call__(self):
return self.next()
def __iter__(self):
"""Return the iterator self."""
return self
def __next__(self):
"""Return the next batch of data."""
return self.batch_queue.get()
class DataLoader(DataLoaderBase):
"""Loader to return the batch of data."""
def __init__(self, dataset, worker, **kwargs):
dataset = build_dataset(dataset)
self.contiguous = kwargs.get('contiguous', True)
self.prefetch_count = kwargs.get('prefetch_count', 50)
self.img_mean = cfg.MODEL.PIXEL_MEAN
self.img_align = (cfg.BACKBONE.COARSEST_STRIDE,) * 2
args = {'dataset': dataset.type,
'source': dataset.source,
'classes': dataset.classes,
'shuffle': kwargs.get('shuffle', True),
'batch_size': kwargs.get('batch_size', 1),
'num_workers': kwargs.get('num_workers', 1),
'stick_to_part': dataset.num_images < 100000}
super(DataLoader, self).__init__(worker, **args)
def run(self):
"""Main loop."""
logging.info('Prefetch batches...')
prev_inputs = [self.worker_queue.get()
for _ in range(self.prefetch_count * self.batch_size)]
next_inputs = []
while True:
                # Use cached buffer for the next N inputs.
if len(next_inputs) == 0:
next_inputs = prev_inputs
if 'aspect_ratio' in next_inputs[0]:
# Inputs are sorted to simulate aspect grouping.
next_inputs.sort(key=lambda d: d['aspect_ratio'][0])
prev_inputs = []
# Collect the next batch.
outputs = collections.defaultdict(list)
for _ in range(self.batch_size):
inputs = next_inputs.pop(0)
for k, v in inputs.items():
outputs[k].extend(v)
prev_inputs.append(self.worker_queue.get())
# Stack batch data.
if self.contiguous:
outputs['img'] = blob_vstack(
outputs['img'], fill_value=self.img_mean,
align=self.img_align)
# Send batch data to consumer.
self.batch_queue.put(outputs)
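# Sketch only: wiring a loader by hand. Assumes cfg.TRAIN.* points at a
# prepared KPLRecord dataset and "MyTrainWorker" is a hypothetical
# DataWorkerBase subclass like the workers registered in the pipelines:
#   loader = DataLoader('/data/train', worker=MyTrainWorker,
#                       batch_size=2, shuffle=True)
#   batch = next(loader)
#   batch['img'].shape  # (2, H, W, 3), padded to the coarsest stride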
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Data loading pipelines."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.core.config import cfg
from seetadet.data import transforms
from seetadet.data.build import LOADERS
from seetadet.data.build import build_anchor_sampler
from seetadet.data.datasets import AnnotatedDatum
from seetadet.data.loader import DataWorkerBase
from seetadet.data.loader import DataLoader
class DetTrainWorker(DataWorkerBase):
"""Worker that defines a generic train pipeline."""
def __init__(self, **kwargs):
super(DetTrainWorker, self).__init__()
self.parse_boxes = transforms.ParseBoxes()
self.resize = transforms.RandomResize(
scales=cfg.TRAIN.SCALES,
scales_range=cfg.TRAIN.SCALES_RANGE,
max_size=cfg.TRAIN.MAX_SIZE)
self.flip = transforms.RandomFlip()
self.crop = transforms.RandomCrop(crop_size=cfg.AUG.CROP_SIZE)
self.distort = transforms.ColorJitter(cfg.AUG.COLOR_JITTER)
self.anchor_sampler = build_anchor_sampler()
def get_outputs(self, inputs):
datum = AnnotatedDatum(inputs.pop(0))
img, boxes = datum.img, self.parse_boxes(datum)
img, boxes = self.resize(img, boxes)
img, boxes = self.flip(img, boxes)
img, boxes = self.crop(img, boxes)
if len(boxes) == 0:
return None
img = self.distort(img)
height, width = img.shape[:2]
im_scale = self.resize.im_scale
outputs = {'img': [img], 'gt_boxes': [boxes],
'im_info': [(height, width, im_scale)],
'aspect_ratio': [float(height) / float(width)]}
if self.anchor_sampler is not None:
anchor_info = self.anchor_sampler.sample(boxes, outputs['im_info'][0])
for k, v in anchor_info.items():
outputs[k] = [v]
return outputs
class MaskTrainWorker(DataWorkerBase):
"""Worker that defines a generic train pipeline."""
def __init__(self, **kwargs):
super(MaskTrainWorker, self).__init__()
self.parse_boxes = transforms.ParseBoxes()
self.parse_segms = transforms.ParseSegms()
self.resize = transforms.RandomResize(
scales=cfg.TRAIN.SCALES,
scales_range=cfg.TRAIN.SCALES_RANGE,
max_size=cfg.TRAIN.MAX_SIZE)
self.flip = transforms.RandomFlip()
self.distort = transforms.ColorJitter(cfg.AUG.COLOR_JITTER)
self.anchor_sampler = build_anchor_sampler()
def get_outputs(self, inputs):
datum = AnnotatedDatum(inputs.pop(0))
img, boxes = datum.img, self.parse_boxes(datum)
segms, width = self.parse_segms(datum), img.shape[1]
img, boxes = self.resize(img, boxes)
if len(boxes) == 0:
return None
img, boxes = self.flip(img, boxes)
segms = self.flip.apply_segms(segms, width)
img = self.distort(img)
height, width = img.shape[:2]
im_scale = self.resize.im_scale
outputs = {'img': [img], 'gt_boxes': [boxes], 'gt_segms': [segms],
'im_info': [(height, width, im_scale)],
'aspect_ratio': [float(height) / float(width)]}
anchor_info = self.anchor_sampler.sample(boxes, outputs['im_info'][0])
for k, v in anchor_info.items():
outputs[k] = [v]
return outputs
class SSDTrainWorker(DataWorkerBase):
"""DataTransformer."""
def __init__(self, **kwargs):
super(SSDTrainWorker, self).__init__()
self.parse_boxes = transforms.ParseBoxes()
self.paste = transforms.RandomPaste()
self.crop = transforms.RandomBBoxCrop()
self.resize = transforms.RandomResize(
scales=cfg.TRAIN.SCALES, keep_ratio=False)
self.flip = transforms.RandomFlip()
self.distort = transforms.ColorJitter(cfg.AUG.COLOR_JITTER)
self.anchor_sampler = build_anchor_sampler()
def get_outputs(self, inputs):
datum = AnnotatedDatum(inputs.pop(0))
img, boxes = datum.img, self.parse_boxes(datum)
boxes /= [(img.shape[1], img.shape[0]) * 2 + (1,)]
img, boxes = self.paste(img, boxes)
img, boxes = self.crop(img, boxes)
if len(boxes) == 0:
return None
img, _ = self.resize(img)
boxes[:, :4] *= img.shape[0]
img, boxes = self.flip(img, boxes)
img = self.distort(img)
outputs = {'img': [img], 'gt_boxes': [boxes],
'im_info': [img.shape[:2]]}
if self.anchor_sampler is not None:
anchor_info = self.anchor_sampler.sample(boxes, outputs['im_info'][0])
for k, v in anchor_info.items():
outputs[k] = [v]
return outputs
class DetTestWorker(DataWorkerBase):
"""Worker that defines a generic test pipeline."""
def __init__(self, **kwargs):
super(DetTestWorker, self).__init__()
def get_outputs(self, inputs):
datum = AnnotatedDatum(inputs.pop(0))
img, objects = datum.img, datum.objects
outputs = {'img': [img], 'objects': [objects],
'img_meta': [{'id': datum.id,
'height': datum.height,
'width': datum.width}]}
return outputs
LOADERS.register('det_train', DataLoader, worker=DetTrainWorker)
LOADERS.register('mask_train', DataLoader, worker=MaskTrainWorker)
LOADERS.register('ssd_train', DataLoader, worker=SSDTrainWorker)
LOADERS.register('det_test', DataLoader, worker=DetTestWorker)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import numpy as np
import numpy.random as npr
from seetadet.core.config import cfg
from seetadet.ops.normalization import to_tensor
from seetadet.utils.bbox import bbox_overlaps
from seetadet.utils.bbox import bbox_transform
from seetadet.utils.bbox import distribute_boxes
from seetadet.utils.mask import mask_from
class ProposalTargets(object):
"""Generate ground-truth targets for proposals."""
def __init__(self):
super(ProposalTargets, self).__init__()
self.num_classes = len(cfg.MODEL.CLASSES)
self.num_rois = cfg.FRCNN.BATCH_SIZE
self.num_fg_rois = round(cfg.FRCNN.POSITIVE_FRACTION * self.num_rois)
self.pos_iou_thr = cfg.FRCNN.POSITIVE_OVERLAP
self.neg_iou_thr = cfg.FRCNN.NEGATIVE_OVERLAP
self.bbox_reg_weights = cfg.FRCNN.BBOX_REG_WEIGHTS
self.mask_size = (cfg.MRCNN.POOLER_RESOLUTION * 2,) * 2
self.lvl_min, self.lvl_max = cfg.FRCNN.MIN_LEVEL, cfg.FRCNN.MAX_LEVEL
self.defaults = {'rois': np.array([[-1, 0, 0, 1, 1]], 'float32'),
'labels': np.array([-1], 'int64'),
'bbox_targets': np.zeros((1, 4), 'float32'),
'mask_targets': np.full((1,) + self.mask_size, -1, 'float32')}
def sample_rois(self, rois, gt_boxes):
"""Sample positive and negative RoIs."""
# Compute overlaps between RoIs and ground-truth boxes.
overlaps = bbox_overlaps(rois[:, 1:5], gt_boxes[:, :4])
max_overlaps = overlaps.max(axis=1)
# Assign with the ground-truth boxes taken the highest IoU.
gt_assignments = overlaps.argmax(axis=1)
labels = gt_boxes[gt_assignments, 4].astype('int64')
# Select foreground regions.
pos_iou_thr = self.pos_iou_thr
fg_inds = np.where(max_overlaps >= pos_iou_thr)[0]
while fg_inds.size == 0:
pos_iou_thr -= 0.01
fg_inds = np.where(max_overlaps >= pos_iou_thr)[0]
# Select background regions.
bg_inds = np.where(max_overlaps < self.neg_iou_thr)[0]
# Sample foreground regions without replacement.
num_fg_rois = int(min(self.num_fg_rois, fg_inds.size))
fg_inds = npr.choice(fg_inds, num_fg_rois, False)
# Sample background regions without replacement.
num_bg_rois = self.num_rois - num_fg_rois
num_bg_rois = min(num_bg_rois, bg_inds.size)
if bg_inds.size > 0:
bg_inds = npr.choice(bg_inds, num_bg_rois, False)
# Take values via sampled indices.
keep_inds = np.append(fg_inds, bg_inds)
rois, labels = rois[keep_inds], labels[keep_inds]
gt_assignments = gt_assignments[keep_inds]
# Reassign background regions.
labels[num_fg_rois:] = 0
return rois, labels, gt_assignments
def distribute_blobs(self, blobs, lvls):
"""Distribute blobs on given levels."""
outputs = collections.defaultdict(list)
lvl_inds = [np.where(lvls == (i + self.lvl_min))[0]
for i in range(self.lvl_max - self.lvl_min + 1)]
for inds in lvl_inds:
for key, blob in blobs.items():
outputs[key].append(blob[inds] if len(inds) > 0
else self.defaults[key])
return outputs
def get_bbox_targets(self, rois, boxes):
return bbox_transform(rois, boxes, weights=self.bbox_reg_weights)
def get_mask_targets(self, rois, segms, inds):
targets = np.full((len(rois),) + self.mask_size, -1, 'float32')
for i in inds:
if segms[i] is not None:
targets[i] = mask_from(segms[i], self.mask_size, rois[i])
return targets
def compute(self, **inputs):
"""Compute proposal targets."""
blobs = collections.defaultdict(list)
all_rois = inputs['rois']
batch_inds = all_rois[:, 0].astype('int32')
# Compute targets per image.
for i, gt_boxes in enumerate(inputs['gt_boxes']):
# Select proposals of this image.
rois = all_rois[np.where(batch_inds == i)[0]]
# Include ground-truth boxes in the set of candidates.
inds = np.ones((gt_boxes.shape[0], 1), gt_boxes.dtype) * i
rois = np.vstack((rois, np.hstack((inds, gt_boxes[:, :4]))))
# Sample a batch of RoIs for training.
rois, labels, gt_assignments = self.sample_rois(rois, gt_boxes)
# Fill blobs.
blobs['rois'].append(rois)
blobs['labels'].append(labels)
blobs['bbox_targets'].append(self.get_bbox_targets(
rois[:, 1:5], gt_boxes[gt_assignments, :4]))
if 'gt_segms' in inputs:
fg_inds = np.where(labels > 0)[0]
segms = [inputs['gt_segms'][i][j] for j in gt_assignments]
blobs['mask_targets'].append(self.get_mask_targets(
rois[:, 1:5] / inputs['im_info'][i][2], segms, fg_inds))
# Concat to get the contiguous blobs.
blobs = dict((k, np.concatenate(blobs[k])) for k in blobs.keys())
# Distribute blobs by the level of all ROIs.
lvls = distribute_boxes(blobs['rois'][:, 1:], self.lvl_min, self.lvl_max)
blobs = self.distribute_blobs(blobs, lvls)
# Add the targets using foreground ROIs only.
for lvl in range(self.lvl_max - self.lvl_min + 1):
inds = np.where(blobs['labels'][lvl] > 0)[0]
if len(inds) > 0:
blobs['fg_rois'].append(blobs['rois'][lvl][inds])
blobs['mask_labels'].append(blobs['labels'][lvl][inds] - 1)
if 'mask_targets' in blobs:
blobs['mask_targets'][lvl] = blobs['mask_targets'][lvl][inds]
else:
blobs['fg_rois'].append(self.defaults['rois'])
blobs['mask_labels'].append(np.array([0], 'int64'))
if 'mask_targets' in blobs:
blobs['mask_targets'][lvl] = self.defaults['mask_targets']
# Concat to get contiguous blobs along the levels.
rois, fg_rois = blobs['rois'], blobs['fg_rois']
blobs = dict((k, np.concatenate(blobs[k])) for k in blobs.keys())
# Compute class-specific strides.
bbox_strides = np.arange(len(blobs['rois'])) * self.num_classes
mask_strides = np.arange(len(blobs['fg_rois'])) * (self.num_classes - 1)
# Select the foreground RoIs for bbox targets.
fg_inds = np.where(blobs['labels'] > 0)[0]
if len(fg_inds) == 0:
            # Fall back to one random proposal so the gathers below are never empty.
fg_inds = npr.randint(len(blobs['labels']), size=[1])
outputs = {
'rois': [to_tensor(rois[i]) for i in range(len(rois))],
'fg_rois': [to_tensor(fg_rois[i]) for i in range(len(fg_rois))],
'labels': to_tensor(blobs['labels']),
'bbox_inds': to_tensor(bbox_strides[fg_inds] + blobs['labels'][fg_inds]),
'mask_inds': to_tensor(mask_strides + blobs['mask_labels']),
'bbox_targets': to_tensor(blobs['bbox_targets'][fg_inds]),
'bbox_anchors': to_tensor(blobs['rois'][fg_inds, 1:]),
}
if 'mask_targets' in blobs:
outputs['mask_targets'] = to_tensor(blobs['mask_targets'])
return outputs
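if __name__ == '__main__':
    # The fg/bg sampling rule of sample_rois() in miniature (thresholds
    # illustrative; the real values come from cfg.FRCNN.*).
    max_overlaps = np.array([0.9, 0.6, 0.3, 0.1])
    fg_inds = np.where(max_overlaps >= 0.5)[0]        # [0, 1]
    bg_inds = np.where(max_overlaps < 0.5)[0]         # [2, 3]
    num_fg = min(round(0.25 * 8), fg_inds.size)       # 25% of an 8-RoI batch
    fg_inds = npr.choice(fg_inds, num_fg, False)
    bg_inds = npr.choice(bg_inds, min(8 - num_fg, bg_inds.size), False)
    print(len(fg_inds), len(bg_inds))  # 2 2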
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import numpy as np
from seetadet.core.config import cfg
from seetadet.data.build import ANCHOR_SAMPLERS
from seetadet.data.anchors.rpn import AnchorGenerator
from seetadet.data.assigners import MaxIoUAssigner
from seetadet.ops.normalization import to_tensor
from seetadet.utils.bbox import bbox_overlaps
from seetadet.utils.bbox import bbox_transform
@ANCHOR_SAMPLERS.register('retinanet')
class AnchorTargets(object):
"""Generate ground-truth targets for anchors."""
def __init__(self):
super(AnchorTargets, self).__init__()
self.generator = AnchorGenerator(
strides=cfg.ANCHOR_GENERATOR.STRIDES,
sizes=cfg.ANCHOR_GENERATOR.SIZES,
aspect_ratios=cfg.ANCHOR_GENERATOR.ASPECT_RATIOS,
scales_per_octave=3)
self.assigner = MaxIoUAssigner(
pos_iou_thr=cfg.RETINANET.POSITIVE_OVERLAP,
neg_iou_thr=cfg.RETINANET.NEGATIVE_OVERLAP)
max_size = max(cfg.TRAIN.MAX_SIZE, max(cfg.TRAIN.SCALES))
if cfg.BACKBONE.COARSEST_STRIDE > 0:
stride = float(cfg.BACKBONE.COARSEST_STRIDE)
max_size = int(np.ceil(max_size / stride) * stride)
self.generator.reset_grid(max_size)
    def sample(self, gt_boxes, im_info):
        """Sample positive and ignored anchors."""
        # Keep anchors whose top-left corner falls inside the image.
        anchors = self.generator.grid_anchors
        inds_inside = np.where((anchors[:, 0] < im_info[1]) &
                               (anchors[:, 1] < im_info[0]))[0]
        anchors = anchors[inds_inside, :]
        # Assign ground-truth according to the IoU.
        labels = self.assigner.assign(anchors, gt_boxes)
        # Return only the foreground and ignored indices; everything else
        # defaults to background, which avoids materializing ~200k
        # background indices (roughly 100x faster).
        return {'fg_inds': inds_inside[np.where(labels > 0)[0]],
                'bg_inds': inds_inside[np.where(labels < 0)[0]]}
def compute(self, **inputs):
"""Compute anchor targets."""
shapes = [x[:2] for x in inputs['grid_info']]
num_images = len(inputs['gt_boxes'])
num_anchors = self.generator.num_anchors(shapes)
blobs = collections.defaultdict(list)
# "1" is positive, "0" is negative, "-1" is don't care.
labels = np.zeros((num_images, num_anchors), 'int64')
for i, gt_boxes in enumerate(inputs['gt_boxes']):
fg_inds = inputs['fg_inds'][i]
ignore_inds = inputs['bg_inds'][i]
# Narrow anchors to match the feature layout.
ignore_inds = self.generator.narrow_anchors(shapes, ignore_inds)
fg_inds, anchors = self.generator.narrow_anchors(shapes, fg_inds, True)
# Compute bbox targets.
gt_assignments = bbox_overlaps(anchors, gt_boxes).argmax(axis=1)
bbox_targets = bbox_transform(anchors, gt_boxes[gt_assignments, :4])
blobs['bbox_anchors'].append(anchors)
blobs['bbox_targets'].append(bbox_targets)
# Compute label assignments.
labels[i, ignore_inds] = -1
labels[i, fg_inds] = gt_boxes[gt_assignments, 4]
# Compute sparse indices.
fg_inds += i * num_anchors
blobs['bbox_inds'].extend([fg_inds])
return {
'labels': to_tensor(labels),
'bbox_inds': to_tensor(np.hstack(blobs['bbox_inds'])),
'bbox_targets': to_tensor(np.vstack(blobs['bbox_targets'])),
'bbox_anchors': to_tensor(np.vstack(blobs['bbox_anchors'])),
}
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Generate targets for RPN head."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import numpy as np
import numpy.random as npr
from seetadet.core.config import cfg
from seetadet.data.anchors.rpn import AnchorGenerator
from seetadet.data.assigners import MaxIoUAssigner
from seetadet.data.build import ANCHOR_SAMPLERS
from seetadet.ops.normalization import to_tensor
from seetadet.utils.bbox import bbox_overlaps
from seetadet.utils.bbox import bbox_transform
@ANCHOR_SAMPLERS.register(['faster_rcnn', 'mask_rcnn'])
class AnchorTargets(object):
"""Generate ground-truth targets for anchors."""
def __init__(self):
super(AnchorTargets, self).__init__()
self.generator = AnchorGenerator(
strides=cfg.ANCHOR_GENERATOR.STRIDES,
sizes=cfg.ANCHOR_GENERATOR.SIZES,
aspect_ratios=cfg.ANCHOR_GENERATOR.ASPECT_RATIOS)
self.assigner = MaxIoUAssigner(
pos_iou_thr=cfg.RPN.POSITIVE_OVERLAP,
neg_iou_thr=cfg.RPN.NEGATIVE_OVERLAP)
max_size = max(cfg.TRAIN.MAX_SIZE, max(cfg.TRAIN.SCALES))
if cfg.BACKBONE.COARSEST_STRIDE > 0:
stride = float(cfg.BACKBONE.COARSEST_STRIDE)
max_size = int(np.ceil(max_size / stride) * stride)
self.generator.reset_grid(max_size)
def sample(self, gt_boxes, im_info):
"""Sample positive and negative anchors."""
# Only keep anchors inside the image.
anchors = self.generator.grid_anchors
inds_inside = np.where((anchors[:, 0] >= 0) &
(anchors[:, 1] >= 0) &
(anchors[:, 2] < im_info[1]) &
(anchors[:, 3] < im_info[0]))[0]
anchors = anchors[inds_inside, :]
# Assign ground-truth according to the IoU.
labels = self.assigner.assign(anchors, gt_boxes)
# Sample positive labels if we have too many.
fg_inds = np.where(labels > 0)[0]
num_fg = int(cfg.RPN.POSITIVE_FRACTION * cfg.RPN.BATCH_SIZE)
if len(fg_inds) > num_fg:
fg_inds = npr.choice(fg_inds, num_fg, False)
# Sample negative labels if we have too many.
num_bg = cfg.RPN.BATCH_SIZE - len(fg_inds)
bg_inds = np.where(labels == 0)[0]
if len(bg_inds) > num_bg:
bg_inds = npr.choice(bg_inds, num_bg, False)
# Select foreground and background indices.
return {'fg_inds': inds_inside[fg_inds],
'bg_inds': inds_inside[bg_inds]}
def compute(self, **inputs):
"""Compute anchor targets."""
shapes = [x[:2] for x in inputs['grid_info']]
num_anchors = self.generator.num_anchors(shapes)
blobs = collections.defaultdict(list)
for i, gt_boxes in enumerate(inputs['gt_boxes']):
fg_inds = inputs['fg_inds'][i]
bg_inds = inputs['bg_inds'][i]
# Narrow anchors to match the feature layout.
bg_inds = self.generator.narrow_anchors(shapes, bg_inds)
fg_inds, anchors = self.generator.narrow_anchors(shapes, fg_inds, True)
# Compute bbox targets.
gt_assignments = bbox_overlaps(anchors, gt_boxes).argmax(axis=1)
bbox_targets = bbox_transform(anchors, gt_boxes[gt_assignments, :4])
blobs['bbox_anchors'].append(anchors)
blobs['bbox_targets'].append(bbox_targets)
# Compute sparse indices.
fg_inds += i * num_anchors
bg_inds += i * num_anchors
blobs['cls_inds'] += [fg_inds, bg_inds]
blobs['bbox_inds'] += [fg_inds]
blobs['labels'] += [np.ones_like(fg_inds, 'float32'),
np.zeros_like(bg_inds, 'float32')]
return {
'labels': to_tensor(np.hstack(blobs['labels'])),
'cls_inds': to_tensor(np.hstack(blobs['cls_inds'])),
'bbox_inds': to_tensor(np.hstack(blobs['bbox_inds'])),
'bbox_targets': to_tensor(np.vstack(blobs['bbox_targets'])),
'bbox_anchors': to_tensor(np.vstack(blobs['bbox_anchors'])),
}
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Generate targets for SSD head."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import numpy as np
from seetadet.core.config import cfg
from seetadet.data.build import ANCHOR_SAMPLERS
from seetadet.data.anchors.ssd import AnchorGenerator
from seetadet.data.assigners import MaxIoUAssigner
from seetadet.ops.normalization import to_tensor
from seetadet.utils.bbox import bbox_overlaps
from seetadet.utils.bbox import bbox_transform
@ANCHOR_SAMPLERS.register('ssd')
class AnchorTargets(object):
"""Generate ground-truth targets for anchors."""
def __init__(self):
super(AnchorTargets, self).__init__()
self.generator = AnchorGenerator(
strides=cfg.ANCHOR_GENERATOR.STRIDES,
sizes=cfg.ANCHOR_GENERATOR.SIZES,
aspect_ratios=cfg.ANCHOR_GENERATOR.ASPECT_RATIOS)
self.assigner = MaxIoUAssigner(
pos_iou_thr=cfg.SSD.POSITIVE_OVERLAP,
neg_iou_thr=cfg.SSD.NEGATIVE_OVERLAP,
gt_max_assign_all=False)
self.neg_pos_ratio = (1.0 / cfg.SSD.POSITIVE_FRACTION) - 1.0
max_size = cfg.ANCHOR_GENERATOR.STRIDES[-1]
self.generator.reset_grid(max_size)
def sample(self, gt_boxes, im_info):
"""Sample positive and negative anchors."""
anchors = self.generator.grid_anchors
# Assign ground-truth according to the IoU.
labels = self.assigner.assign(anchors, gt_boxes)
# Select positive and non-positive indices.
return {'fg_inds': np.where(labels > 0)[0],
'bg_inds': np.where(labels <= 0)[0]}
def compute(self, **inputs):
"""Compute anchor targets."""
num_images = len(inputs['gt_boxes'])
num_anchors = self.generator.grid_anchors.shape[0]
cls_score = inputs['cls_score'].numpy().astype('float32')
blobs = collections.defaultdict(list)
# "1" is positive, "0" is negative, "-1" is don't care
labels = np.full((num_images, num_anchors,), -1, 'int64')
for i, gt_boxes in enumerate(inputs['gt_boxes']):
fg_inds = pos_inds = inputs['fg_inds'][i]
neg_inds = inputs['bg_inds'][i]
# Mining hard negatives as background.
num_pos, num_neg = len(pos_inds), len(neg_inds)
num_bg = min(int(num_pos * self.neg_pos_ratio), num_neg)
neg_score = cls_score[i, neg_inds, 0]
bg_inds = neg_inds[np.argsort(neg_score)][:num_bg]
# Compute bbox targets.
anchors = self.generator.grid_anchors[fg_inds]
gt_assignments = bbox_overlaps(anchors, gt_boxes).argmax(axis=1)
bbox_targets = bbox_transform(anchors, gt_boxes[gt_assignments, :4],
weights=cfg.SSD.BBOX_REG_WEIGHTS)
blobs['bbox_anchors'].append(anchors)
blobs['bbox_targets'].append(bbox_targets)
# Compute label assignments.
labels[i, bg_inds] = 0
labels[i, fg_inds] = gt_boxes[gt_assignments, 4]
            # Compute sparse indices (copy rather than "+=", which would
            # mutate the "fg_inds" array stored in the inputs).
            fg_inds = fg_inds + i * num_anchors
blobs['bbox_inds'].extend([fg_inds])
return {
'labels': to_tensor(labels),
'bbox_inds': to_tensor(np.hstack(blobs['bbox_inds'])),
'bbox_targets': to_tensor(np.vstack(blobs['bbox_targets'])),
'bbox_anchors': to_tensor(np.vstack(blobs['bbox_anchors'])),
}
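if __name__ == '__main__':
    # Hard-negative mining in isolation: keep the negatives whose background
    # score is lowest, at 3 negatives per positive (POSITIVE_FRACTION = 0.25
    # gives neg_pos_ratio = 3). Toy numbers.
    pos_inds = np.array([0])
    neg_inds = np.array([2, 3, 4, 5, 6])
    bg_score = np.array([0.9, 0.2, 0.8, 0.1, 0.6])  # cls_score[i, neg_inds, 0]
    num_bg = min(int(len(pos_inds) * 3.0), len(neg_inds))
    print(neg_inds[np.argsort(bg_score)][:num_bg])  # [5 3 6]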
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import numpy.random as npr
from seetadet.core.config import cfg
from seetadet.utils.bbox import boxes_iou
from seetadet.utils.bbox import clip_boxes
from seetadet.utils.bbox import flip_boxes
from seetadet.utils.bbox import flip_polygons
from seetadet.utils.image import im_resize
from seetadet.utils.image import color_jitter
from seetadet.utils.mask import mask_from
class ParseBoxes(object):
"""Parse the ground-truth boxes."""
def __init__(self):
self.classes = cfg.MODEL.CLASSES
self.num_classes = len(self.classes)
self.class_indices = dict(zip(self.classes, range(self.num_classes)))
self.use_diff = cfg.TRAIN.USE_DIFF
def __call__(self, datum):
objects, num_objects = datum.objects, 0
height, width = datum.height, datum.width
if not self.use_diff:
for obj in objects:
if obj.get('difficult', 0) == 0:
num_objects += 1
else:
num_objects = len(objects)
boxes = np.zeros((num_objects, 4), 'float32')
classes = np.zeros((num_objects, 1), 'float32')
# Filter the difficult instances.
object_idx = 0
for obj in objects:
if not self.use_diff and obj.get('difficult', 0) > 0:
continue
bbox = obj['bbox']
boxes[object_idx, :] = [max(0, bbox[0]),
max(0, bbox[1]),
min(bbox[2], width - 1),
min(bbox[3], height - 1)]
classes[object_idx, :] = self.class_indices[obj['name']]
object_idx += 1
# Attach the classes.
cls_boxes = np.empty((len(boxes), 5), 'float32')
cls_boxes[:, :4], cls_boxes[:, 4:] = boxes, classes
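# Layout: each row is (x1, y1, x2, y2, class_index).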
return cls_boxes
class ParseSegms(object):
"""Parse the ground-truth segmentations."""
def __init__(self):
self.use_diff = cfg.TRAIN.USE_DIFF
def __call__(self, datum):
objects, num_objects = datum.objects, 0
height, width = datum.height, datum.width
if not self.use_diff:
for obj in objects:
if obj.get('difficult', 0) == 0:
num_objects += 1
else:
num_objects = len(objects)
segms = []
# Filter the difficult instances.
object_idx = 0
for obj in objects:
if not self.use_diff and obj.get('difficult', 0) > 0:
continue
if 'mask' in obj:
segms.append(mask_from(obj['mask'], (height, width)))
elif 'polygons' in obj:
segms.append(obj['polygons'])
else:
segms.append(None)
object_idx += 1
return segms
class RandomMosaic(object):
"""Copy images into a 4x4 grid canvas."""
def __init__(self, size=None):
if size is None:
size = cfg.AUG.MOSAIC_SIZE
self._prob = cfg.AUG.MOSAIC
self._pixel_mean = cfg.MODEL.PIXEL_MEAN
if isinstance(size, (tuple, list)):
self._size_h, self._size_w = size[:2]
else:
self._size_h = self._size_w = int(size)
self._out_h = self._size_h * 2
self._out_w = self._size_w * 2
self._enabled = self._out_h > 0 and self._out_w > 0
@property
def enabled(self):
return self._enabled and npr.rand() < self._prob
@staticmethod
def _coords_to_slice(coords):
x1, y1, x2, y2 = coords
return slice(y1, y2), slice(x1, x2)
def _get_coords(self, index, w, h, x_ctr, y_ctr):
if index == 0:
x1, y1 = max(x_ctr - w, 0), max(y_ctr - h, 0)
x2, y2 = x_ctr, y_ctr
coords = (w - (x2 - x1), h - (y2 - y1), w, h)
elif index == 1:
x1, y1 = x_ctr, max(y_ctr - h, 0)
x2, y2 = min(x_ctr + w, self._out_w), y_ctr
coords = (0, h - (y2 - y1), min(w, x2 - x1), h)
elif index == 2:
x1, y1 = max(x_ctr - w, 0), y_ctr
x2, y2 = x_ctr, min(self._out_h, y_ctr + h)
coords = (w - (x2 - x1), 0, w, min(y2 - y1, h))
else:
x1, y1 = x_ctr, y_ctr
x2, y2 = min(x_ctr + w, self._out_w), min(self._out_h, y_ctr + h)
coords = (0, 0, min(w, x2 - x1), min(y2 - y1, h))
out_coords = self._coords_to_slice((x1, y1, x2, y2))
coords = self._coords_to_slice(coords)
return out_coords, coords
def __call__(self, img_list, boxes_list):
out_shape = list(img_list[0].shape)
out_shape[:2] = (self._out_h, self._out_w)
y_ctr = int(npr.uniform(0.5 * self._size_h, 1.5 * self._size_h))
x_ctr = int(npr.uniform(0.5 * self._size_w, 1.5 * self._size_w))
out_img = np.empty(out_shape, img_list[0].dtype)
out_img[:], out_boxes = self._pixel_mean, []
for i in range(4):
img, boxes = img_list[i], boxes_list[i]
h, w = img.shape[:2]
im_scale = min(self._size_h / float(h), self._size_w / float(w))
img = im_resize(img, scale=im_scale)
h, w = img.shape[:2]
out_coords, coords = self._get_coords(i, w, h, x_ctr, y_ctr)
h_offset = out_coords[0].start - coords[0].start
w_offset = out_coords[1].start - coords[1].start
out_img[out_coords] = img[coords]
boxes[:, (0, 2)] = boxes[:, (0, 2)] * im_scale + w_offset
boxes[:, (1, 3)] = boxes[:, (1, 3)] * im_scale + h_offset
boxes = clip_boxes(boxes, out_img.shape)
valid_inds = (boxes[:, 2] > boxes[:, 0]) & (boxes[:, 3] > boxes[:, 1])
out_boxes.append(boxes[valid_inds])
out_boxes = np.vstack(out_boxes)
return out_img, out_boxes
class RandomFlip(object):
"""Flip the image randomly."""
def __init__(self, prob=0.5):
self.prob = prob
self.is_flipped = False
def apply_segms(self, segms, width):
for i, segm in enumerate(segms):
if not self.is_flipped or segm is None:
continue
if isinstance(segm, np.ndarray):
segm = segm[:, ::-1]
else:
segm = flip_polygons(segm, width)
segms[i] = segm
return segms
def __call__(self, img, boxes=None):
self.is_flipped = npr.rand() < self.prob
img = img[:, ::-1] if self.is_flipped else img
if boxes is not None and self.is_flipped:
boxes = flip_boxes(boxes, img.shape[1])
return img, boxes
class RandomResize(object):
"""Resize the image randomly."""
def __init__(
self,
scales=(640,),
scales_range=(1.0, 1.0),
max_size=1066,
keep_ratio=True,
):
self.scales = scales
self.scales_range = scales_range
self.max_size = max_size
self.keep_ratio = keep_ratio
self.im_scale = 1.0
self.im_scale_factor = 1.0
def __call__(self, img, boxes=None):
im_shape = img.shape
target_size = npr.choice(self.scales)
if self.keep_ratio:
# Scale along the shortest side.
max_size = max(self.max_size, target_size)
im_size_min = np.min(im_shape[:2])
im_size_max = np.max(im_shape[:2])
self.im_scale = float(target_size) / float(im_size_min)
# Prevent the biggest axis from being more than *MAX_SIZE*.
if np.round(self.im_scale * im_size_max) > max_size:
self.im_scale = float(max_size) / float(im_size_max)
# Apply the scale jitter to get a range of dynamic scales.
r = self.scales_range
self.im_scale_factor = r[0] + npr.rand() * (r[1] - r[0])
self.im_scale *= self.im_scale_factor
img = im_resize(img, scale=self.im_scale)
if boxes is not None:
boxes[:, :4] *= self.im_scale
else:
self.im_scale = (float(target_size) / float(im_shape[0]),
float(target_size) / float(im_shape[1]))
img = im_resize(img, size=target_size)
if boxes is not None:
boxes[:, (0, 2)] = boxes[:, (0, 2)] * self.im_scale[1]
boxes[:, (1, 3)] = boxes[:, (1, 3)] * self.im_scale[0]
return img, boxes
class RandomPaste(object):
"""Copy image into a larger canvas randomly."""
def __init__(self, prob=0.5):
self.ratio = 1. / cfg.TRAIN.SCALES_RANGE[0]
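# e.g., TRAIN.SCALES_RANGE[0] = 0.25 permits a canvas up to 4x the image size.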
self.prob = prob if self.ratio > 1 else 0
self.pixel_mean = cfg.MODEL.PIXEL_MEAN
def __call__(self, img, boxes):
if npr.rand() > self.prob:
return img, boxes
im_shape = list(img.shape)
h, w = im_shape[:2]
ratio = npr.uniform(1., self.ratio)
out_h, out_w = int(h * ratio), int(w * ratio)
y1 = int(np.floor(npr.uniform(0., out_h - h)))
x1 = int(np.floor(npr.uniform(0., out_w - w)))
im_shape[:2] = (out_h, out_w)
out_img = np.empty(im_shape, dtype=img.dtype)
out_img[:] = self.pixel_mean
out_img[y1:y1 + h, x1:x1 + w, :] = img
out_boxes = boxes.copy()
out_boxes[:, (0, 2)] = (boxes[:, (0, 2)] * w + x1) / out_w
out_boxes[:, (1, 3)] = (boxes[:, (1, 3)] * h + y1) / out_h
return out_img, out_boxes
class RandomCrop(object):
"""Crop the image randomly."""
def __init__(self, crop_size=512):
self.crop_size = crop_size
self.pixel_mean = cfg.MODEL.PIXEL_MEAN
def __call__(self, img, boxes):
if self.crop_size <= 0:
return img, boxes
im_shape = list(img.shape)
h, w = im_shape[:2]
out_h, out_w = (self.crop_size,) * 2
y1 = npr.randint(max(h - out_h, 0) + 1)
x1 = npr.randint(max(w - out_w, 0) + 1)
im_shape[:2] = (out_h, out_w)
out_img = np.empty(im_shape, dtype=img.dtype)
out_img[:] = self.pixel_mean
out_img[:h, :w] = img[y1:y1 + out_h, x1:x1 + out_w]
img = out_img
boxes[:, (0, 2)] -= x1
boxes[:, (1, 3)] -= y1
boxes = clip_boxes(boxes, img.shape)
valid_inds = (boxes[:, 2] > boxes[:, 0]) & (boxes[:, 3] > boxes[:, 1])
boxes = boxes[valid_inds]
return img, boxes
class ColorJitter(object):
"""Distort the brightness, contrast and color of image."""
def __init__(self, prob=0.5):
self.prob = prob
self.brightness_range = (0.875, 1.125)
self.contrast_range = (0.5, 1.5)
self.saturation_range = (0.5, 1.5)
def __call__(self, img):
brightness = contrast = saturation = None
if npr.rand() < self.prob:
brightness = self.brightness_range
if npr.rand() < self.prob:
contrast = self.contrast_range
if npr.rand() < self.prob:
saturation = self.saturation_range
return color_jitter(img, brightness=brightness,
contrast=contrast, saturation=saturation)
class RandomBBoxCrop(object):
"""Crop image by sampling a region restricted by bounding boxes."""
def __init__(self, scales_range=(0.3, 1.0), aspect_ratios_range=(0.5, 2.0),
overlaps=(0.0, 0.1, 0.3, 0.5, 0.7, 0.9)):
self.samplers = [{}]
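# The first (empty) sampler falls back to keeping the whole image.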
for ov in overlaps:
self.samplers.append({
'scales_range': scales_range,
'aspect_ratios_range': aspect_ratios_range,
'overlaps_range': (ov, 1.0), 'max_trials': 10})
@classmethod
def generate_sample(cls, param):
scales_range = param.get('scales_range', (1.0, 1.0))
aspect_ratios_range = param.get('aspect_ratios_range', (1.0, 1.0))
scale = npr.uniform(scales_range[0], scales_range[1])
min_aspect_ratio = max(aspect_ratios_range[0], scale**2)
max_aspect_ratio = min(aspect_ratios_range[1], 1. / (scale**2))
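# Clamping the ratio to [scale**2, 1/scale**2] keeps both bbox_w and bbox_h below 1.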
aspect_ratio = npr.uniform(min_aspect_ratio, max_aspect_ratio)
bbox_w = scale * (aspect_ratio ** 0.5)
bbox_h = scale / (aspect_ratio ** 0.5)
w_off = npr.uniform(0., 1. - bbox_w)
h_off = npr.uniform(0., 1. - bbox_h)
return np.array([w_off, h_off, w_off + bbox_w, h_off + bbox_h])
@staticmethod
def check_center(sample_box, boxes):
x_ctr = (boxes[:, 2] + boxes[:, 0]) / 2.0
y_ctr = (boxes[:, 3] + boxes[:, 1]) / 2.0
# Keep the ground-truth box whose center is in the sample box.
keep = np.where((x_ctr >= sample_box[0]) & (x_ctr <= sample_box[2]) &
(y_ctr >= sample_box[1]) & (y_ctr <= sample_box[3]))[0]
return len(keep) > 0
@staticmethod
def check_overlap(sample_box, boxes, param):
overlaps_range = param.get('overlaps_range', (0.0, 1.0))
if overlaps_range[0] == 0.0 and overlaps_range[1] == 1.0:
return True
ovmax = boxes_iou(sample_box[None, :], boxes[:, :4]).max()
if ovmax < overlaps_range[0] or ovmax > overlaps_range[1]:
return False
return True
def generate_batch_samples(self, boxes):
sample_boxes = []
for sampler in self.samplers:
found, max_trials = 0, sampler.get('max_trials', 1)
for _ in range(max_trials):
if found >= 1:
break
sample_box = self.generate_sample(sampler)
if not self.check_overlap(sample_box, boxes, sampler):
continue
if not self.check_center(sample_box, boxes):
continue
found += 1
sample_boxes.append(sample_box)
return sample_boxes
@classmethod
def crop(cls, img, crop_box, boxes=None):
h, w = img.shape[:2]
w_offset = int(crop_box[0] * w)
h_offset = int(crop_box[1] * h)
crop_w = int((crop_box[2] - crop_box[0]) * w)
crop_h = int((crop_box[3] - crop_box[1]) * h)
img = img[h_offset:h_offset + crop_h, w_offset:w_offset + crop_w]
if boxes is not None:
x_ctr = (boxes[:, 2] + boxes[:, 0]) / 2.0
y_ctr = (boxes[:, 3] + boxes[:, 1]) / 2.0
keep = np.where((x_ctr >= crop_box[0]) & (x_ctr <= crop_box[2]) &
(y_ctr >= crop_box[1]) & (y_ctr <= crop_box[3]))[0]
boxes = boxes[keep]
boxes[:, (0, 2)] = boxes[:, (0, 2)] * w - w_offset
boxes[:, (1, 3)] = boxes[:, (1, 3)] * h - h_offset
boxes = clip_boxes(boxes, (crop_h, crop_w))
boxes[:, (0, 2)] /= crop_w
boxes[:, (1, 3)] /= crop_h
return img, boxes
def __call__(self, img, boxes):
sample_boxes = self.generate_batch_samples(boxes)
if len(sample_boxes) > 0:
rand_box = sample_boxes[npr.randint(len(sample_boxes))]
img, boxes = self.crop(img, rand_box, boxes)
return img, boxes
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import json
import os
import sys
import numpy as np
from seetadet.core.config import cfg
from seetadet.utils import mask as mask_util
from seetadet.utils.pycocotools import mask as mask_tools
from seetadet.utils.pycocotools.coco import COCO
from seetadet.utils.pycocotools.cocoeval import COCOeval
class COCOEvaluator(object):
"""Evaluator for MS COCO dataset."""
def __init__(self, imdb, ann_file=None):
self.imdb = imdb
if ann_file is not None and os.path.exists(ann_file):
self.coco = COCO(ann_file)
cats = self.coco.loadCats(self.coco.getCatIds())
self.class_to_cat_id = dict(zip([c['name'] for c in cats],
self.coco.getCatIds()))
else:
self.coco = None
self.class_to_cat_id = None
def bbox_results_one_category(self, boxes, cat_id, gt_recs):
ix, results = 0, []
for image_name, rec in gt_recs.items():
detections = boxes[ix]
ix += 1
if isinstance(detections, list) and len(detections) == 0:
continue
detections = detections.astype('float64')
scores = detections[:, -1]
xs = detections[:, 0]
ys = detections[:, 1]
ws = detections[:, 2] - xs + 1
hs = detections[:, 3] - ys + 1
results.extend([{
'image_id': self.get_image_id(image_name),
'category_id': cat_id,
'bbox': [xs[k], ys[k], ws[k], hs[k]],
'score': scores[k],
} for k in range(detections.shape[0])])
return results
def do_bbox_eval(self, res_file):
coco_dt = self.coco.loadRes(res_file)
coco_eval = COCOeval(self.coco, coco_dt, 'bbox')
coco_eval.evaluate()
coco_eval.accumulate()
self.print_coco_eval_results(coco_eval)
def do_segm_eval(self, res_file):
coco_dt = self.coco.loadRes(res_file)
coco_eval = COCOeval(self.coco, coco_dt, 'segm')
coco_eval.evaluate()
coco_eval.accumulate()
self.print_coco_eval_results(coco_eval)
@staticmethod
def encode_masks(masks, boxes, im_h, im_w):
mask_image = mask_util.project_masks(
masks, boxes, im_h, im_w,
cfg.TEST.BINARY_THRESH)
return mask_tools.encode(mask_image)
@staticmethod
def get_prefix(type='bbox'):
if type == 'bbox':
return 'detections'
elif type == 'segm':
return 'segmentations'
elif type == 'kpt':
return 'keypoints'
return ''
@staticmethod
def get_annotations_file(results_folder, type='bbox'):
# experiments/model_id/annotations/[GT]detections.json
filename = '[GT]' + COCOEvaluator.get_prefix(type) + '.json'
if not os.path.exists(results_folder):
os.makedirs(results_folder)
return os.path.join(results_folder, filename)
@staticmethod
def get_image_id(image_name):
image_id = image_name.split('_')[-1].split('.')[0]
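# e.g., 'COCO_val2014_000000000139' -> 139; non-numeric names fall through unchanged.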
try:
return int(image_id)
except ValueError:
return image_name
def get_results_file(self, results_folder, type='bbox'):
# experiments/model_id/results/detections.json
filename = self.get_prefix(type) + self.imdb.comp_id + '.json'
if not os.path.exists(results_folder):
os.makedirs(results_folder)
return os.path.join(results_folder, filename)
def print_coco_eval_results(self, coco_eval, iou_thr=(0.5, 0.95)):
def get_thr_ind(coco_eval, thr):
ind = np.where((coco_eval.params.iouThrs > thr - 1e-5) &
(coco_eval.params.iouThrs < thr + 1e-5))[0][0]
iou_thr = coco_eval.params.iouThrs[ind]
assert np.isclose(iou_thr, thr)
return ind
ind_lo = get_thr_ind(coco_eval, iou_thr[0])
ind_hi = get_thr_ind(coco_eval, iou_thr[1])
# Precision has dims (iou, recall, cls, area range, max dets)
# Area range index 0: all area ranges
# Max dets index 2: 100 per image
precision_res = coco_eval.eval['precision']
precision = precision_res[ind_lo:(ind_hi + 1), :, :, 0, 2]
ap_default = np.mean(precision[precision > -1])
print('~~~~ Mean and per-category AP @ IoU=[{:.2f},{:.2f}] '
'~~~~'.format(iou_thr[0], iou_thr[1]))
print('{:.1f}'.format(100 * ap_default))
for cls_ind, cls in enumerate(self.imdb.classes):
if cls == '__background__':
continue
precision = precision_res[ind_lo:(ind_hi + 1), :, cls_ind - 1, 0, 2]
ap = np.mean(precision[precision > -1])
print('{:.1f}'.format(100 * ap))
print('~~~~ Summary metrics ~~~~')
coco_eval.summarize()
def segm_results_one_category(self, boxes, masks, cat_id, gt_recs):
def filter_boxes(dets):
boxes = dets[:, :4]
ws = boxes[:, 2] - boxes[:, 0]
hs = boxes[:, 3] - boxes[:, 1]
keep = np.where((ws >= 1) & (hs >= 1))[0]
return keep
results = []
ix = 0
for image_name, rec in gt_recs.items():
dets = boxes[ix].astype(np.float64)
msks = masks[ix]
ix += 1
keep = filter_boxes(dets)
im_h, im_w = rec['height'], rec['width']
if len(keep) == 0:
continue
scores = dets[:, -1]
mask_encode = self.encode_masks(
msks[keep], dets[keep, :4], im_h, im_w)
for k in range(dets[keep].shape[0]):
rle = mask_encode[k]
if sys.version_info >= (3, 0):
rle['counts'] = rle['counts'].decode()
results.append({
'image_id': self.get_image_id(image_name),
'category_id': cat_id,
'segmentation': rle,
'score': scores[k],
})
return results
def write_bbox_annotations(self, gt_recs, output_dir):
# Build images
dataset = {'images': []}
for image_name, rec in gt_recs.items():
dataset['images'].append({
'file_name': image_name + '.jpg',
'id': self.get_image_id(image_name),
'height': rec['height'], 'width': rec['width'],
})
# Build categories
dataset['categories'] = []
for cls in self.imdb.classes:
if cls == '__background__':
continue
dataset['categories'].append({
'name': cls,
'id': self.imdb.class_to_ind[cls],
})
# Build annotations
dataset['annotations'] = []
ann_id = 0
for image_name, rec in gt_recs.items():
for obj in rec['objects']:
x, y = obj['bbox'][0], obj['bbox'][1]
w, h = obj['bbox'][2] - x + 1, obj['bbox'][3] - y + 1
dataset['annotations'].append({
'id': str(ann_id),
'bbox': [x, y, w, h],
'area': w * h,
'iscrowd': obj['difficult'],
'image_id': self.get_image_id(image_name),
'category_id': self.imdb.class_to_ind[obj['name']],
})
ann_id += 1
ann_file = self.get_annotations_file(output_dir, 'bbox')
with open(ann_file, 'w') as f:
json.dump(dataset, f)
return ann_file
def write_bbox_results(self, all_boxes, gt_recs, output_dir):
filename = self.get_results_file(output_dir)
results = []
for cls_ind, cls in enumerate(self.imdb.classes):
if cls == '__background__':
continue
print('Collecting {} results ({:d}/{:d})'
.format(cls, cls_ind, self.imdb.num_classes - 1))
cat_id = self.class_to_cat_id[cls]
results.extend(self.bbox_results_one_category(
all_boxes[cls_ind], cat_id, gt_recs))
print('Writing results json to {}'.format(filename))
with open(filename, 'w') as fid:
json.dump(results, fid)
return filename
def write_segm_annotations(self, gt_recs, output_dir):
# Build images
dataset = {'images': []}
for image_name, rec in gt_recs.items():
dataset['images'].append({
'file_name': image_name + '.jpg',
'id': self.get_image_id(image_name),
'height': rec['height'], 'width': rec['width'],
})
# Build categories
dataset['categories'] = []
for cls in self.imdb.classes:
if cls == '__background__':
continue
dataset['categories'].append({
'name': cls,
'id': self.imdb.class_to_ind[cls],
})
# Build annotations
dataset['annotations'] = []
ann_id = 0
for image_name, rec in gt_recs.items():
mask_size = (rec['height'], rec['width'])
for obj in rec['objects']:
x, y = obj['bbox'][0], obj['bbox'][1]
w, h = obj['bbox'][2] - x + 1, obj['bbox'][3] - y + 1
if 'mask' in obj:
segm = {'size': mask_size, 'counts': obj['mask']}
if sys.version_info >= (3, 0):
segm['counts'] = segm['counts'].decode()
elif 'polygons' in obj:
segm = []
for poly in obj['polygons']:
if isinstance(poly, np.ndarray):
segm.append(poly.tolist())
else:
segm.append(poly)
else:
raise ValueError('Expected mask-rle or polygons.')
dataset['annotations'].append({
'id': str(ann_id),
'bbox': [x, y, w, h],
'area': w * h,
'segmentation': segm,
'iscrowd': obj['difficult'],
'image_id': self.get_image_id(image_name),
'category_id': self.imdb.class_to_ind[obj['name']],
})
ann_id += 1
ann_file = self.get_annotations_file(output_dir, 'segm')
with open(ann_file, 'w') as f:
json.dump(dataset, f)
return ann_file
def write_segm_results(self, all_boxes, all_masks, gt_recs, output_dir):
filename = self.get_results_file(output_dir, 'segm')
results = []
for cls_ind, cls in enumerate(self.imdb.classes):
if cls == '__background__':
continue
print('Collecting {} results ({:d}/{:d})'
.format(cls, cls_ind, self.imdb.num_classes - 1))
cat_id = self.class_to_cat_id[cls]
results.extend(self.segm_results_one_category(
all_boxes[cls_ind], all_masks[cls_ind], cat_id, gt_recs))
print('Writing results json to {}'.format(filename))
with open(filename, 'w') as fid:
json.dump(results, fid)
return filename
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import uuid
from seetadet.core.config import cfg
from seetadet.datasets.coco_evaluator import COCOEvaluator
from seetadet.datasets.voc_evaluator import VOCEvaluator
class Dataset(object):
"""The base dataset class."""
def __init__(self, source):
self._source = source
self._num_images = 0
self._classes = cfg.MODEL.CLASSES
self._class_to_ind = self._class_to_cat_id = \
dict(zip(self.classes, range(self.num_classes)))
self._salt = str(uuid.uuid4())
self.config = {'cleanup': True, 'use_salt': True}
@property
def classes(self):
return self._classes
@property
def class_to_ind(self):
return self._class_to_ind
@property
def cls(self):
return type(self)
@property
def comp_id(self):
return '_' + self._salt if self.config['use_salt'] else ''
@property
def num_classes(self):
return len(self._classes)
@property
def num_images(self):
return self._num_images
@property
def source(self):
return self._source
def competition_mode(self, on):
if on:
self.config['use_salt'] = False
self.config['cleanup'] = False
else:
self.config['use_salt'] = True
self.config['cleanup'] = True
def dump_detections(self, all_boxes, output_dir):
pass
def evaluate_detections(self, all_boxes, gt_recs, output_dir):
protocol = cfg.TEST.PROTOCOL
if 'voc' in protocol:
evaluator = VOCEvaluator(self)
evaluator.write_bbox_results(all_boxes, gt_recs, output_dir)
if '!' not in protocol:
for ovr in (0.5,):
evaluator.do_bbox_eval(
gt_recs,
output_dir,
iou=ovr,
use_07_metric='2007' in protocol,
)
elif 'coco' in protocol:
ann_file = cfg.TEST.JSON_FILE
evaluator = COCOEvaluator(self, ann_file)
if evaluator.coco is None:
ann_file = evaluator \
.write_bbox_annotations(
gt_recs, output_dir)
evaluator = COCOEvaluator(self, ann_file)
res_file = evaluator.write_bbox_results(
all_boxes, gt_recs, output_dir)
if '!' not in protocol:
evaluator.do_bbox_eval(res_file)
def evaluate_segmentations(self, all_boxes, all_masks, gt_recs, output_dir):
protocol = cfg.TEST.PROTOCOL
if 'voc' in protocol:
evaluator = VOCEvaluator(self)
evaluator.write_segm_results(all_boxes, all_masks, output_dir)
if '!' not in protocol:
for ovr in (0.5,):
evaluator.do_segm_eval(
gt_recs,
output_dir,
iou=ovr,
use_07_metric='2007' in protocol,
)
elif 'coco' in protocol:
ann_file = cfg.TEST.JSON_FILE
evaluator = COCOEvaluator(self, ann_file)
if evaluator.coco is None:
ann_file = evaluator \
.write_segm_annotations(
gt_recs, output_dir)
evaluator = COCOEvaluator(self, ann_file)
res_file = evaluator.write_segm_results(
all_boxes, all_masks, gt_recs, output_dir)
if '!' not in protocol:
evaluator.do_segm_eval(res_file)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
from seetadet.datasets import kpl_dataset
def get_dataset(name):
"""Get a dataset by name."""
keys = name.split('://')
if len(keys) >= 2:
cls, source = keys
if cls not in _GLOBAL_REGISTERED_DATASET:
raise KeyError('Unknown dataset: ' + cls)
return _GLOBAL_REGISTERED_DATASET[cls](source)
elif os.path.exists(name):
return _GLOBAL_REGISTERED_DATASET['default'](name)
else:
raise ValueError('Illegal dataset: ' + name)
def list_dataset():
"""List all registered dataset."""
return _GLOBAL_REGISTERED_DATASET.keys()
_GLOBAL_REGISTERED_DATASET = {
'default': lambda source:
kpl_dataset.KPLRecordDataset(source),
}
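# Example usage (hypothetical source path):
#   dataset = get_dataset('default:///data/train_record')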
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import dragon
from seetadet.core.config import cfg
from seetadet.datasets.dataset import Dataset
class KPLRecordDataset(Dataset):
def __init__(self, source):
super(KPLRecordDataset, self).__init__(source)
self._num_images = self.cls(self.source).size
@property
def cls(self):
return dragon.io.KPLRecordDataset
def dump_detections(self, all_boxes, output_dir):
dataset = self.cls(self.source)
for file in ('root.data', 'root.index', 'root.meta'):
file = os.path.join(output_dir, file)
if os.path.exists(file):
os.remove(file)
writer = dragon.io.KPLRecordWriter(output_dir, dataset.protocol)
for i in range(len(dataset)):
example = dataset.get()
example['object'] = []
for cls_ind, cls in enumerate(self.classes):
if cls == '__background__':
continue
detections = all_boxes[cls_ind][i]
if len(detections) == 0:
continue
for k in range(detections.shape[0]):
if detections[k, -1] < cfg.VIS_TH:
continue
example['object'].append({
'name': cls,
'xmin': float(detections[k][0]),
'ymin': float(detections[k][1]),
'xmax': float(detections[k][2]),
'ymax': float(detections[k][3]),
'difficult': 0,
})
writer.write(example)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# Codes are based on:
#
# <https://github.com/rbgirshick/py-faster-rcnn/blob/master/lib/datasets/voc_eval.py>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import cv2
import numpy as np
from seetadet.core.config import cfg
from seetadet.utils import boxes as box_util
from seetadet.utils.env import pickle
from seetadet.utils.mask import mask_overlap
from seetadet.utils.pycocotools import mask_utils
def voc_ap(rec, prec, use_07_metric=False):
"""Compute VOC AP given precision and recall."""
if use_07_metric:
# 11 point metric
ap = 0.
for t in np.arange(0., 1.1, 0.1):
if np.sum(rec >= t) == 0:
p = 0
else:
p = np.max(prec[rec >= t])
ap = ap + p / 11.
else:
# Correct AP calculation
# First append sentinel values at the end
mrec = np.concatenate(([0.], rec, [1.]))
mpre = np.concatenate(([0.], prec, [0.]))
# Compute the precision envelope
for i in range(mpre.size - 1, 0, -1):
mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])
# To calculate area under PR curve, look for points
# where X axis (recall) changes value
i = np.where(mrec[1:] != mrec[:-1])[0]
# And sum (\Delta recall) * prec
ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])
return ap
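# Quick sanity check with hypothetical values:
#   voc_ap(np.array([0.5, 1.0]), np.array([1.0, 0.5]))  # -> 0.75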
def voc_bbox_eval(
det_file,
gt_recs,
cls_name,
iou=0.5,
use_07_metric=False,
):
class_recs, n_pos = {}, 0
for image_name, rec in gt_recs.items():
objects = [obj for obj in rec['objects'] if obj['name'] == cls_name]
bbox = np.array([x['bbox'] for x in objects])
diff = np.array([x['difficult'] for x in objects]).astype(bool)
det = [False] * len(objects)
n_pos = n_pos + sum(~diff)
class_recs[image_name] = {'bbox': bbox, 'difficult': diff, 'det': det}
# Read detections
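# Each line reads "<image_id> <score> <x1> <y1> <x2> <y2>"
# (see VOCEvaluator.write_bbox_results).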
with open(det_file, 'r') as f:
lines = f.readlines()
splitlines = [x.strip().split(' ') for x in lines]
image_ids = [x[0] for x in splitlines]
confidence = np.array([float(x[1]) for x in splitlines])
BB = np.array([[float(z) for z in x[2:]] for x in splitlines])
# Avoid IndexError if detecting nothing
if len(BB) == 0:
return 0, 0, -1
# Sort by confidence
sorted_ind = np.argsort(-confidence)
BB = BB[sorted_ind, :]
image_ids = [image_ids[x] for x in sorted_ind]
# Go down detections and mark TPs and FPs
nd = len(image_ids)
tp, fp = np.zeros(nd), np.zeros(nd)
def compute_overlaps(bb, BBGT):
ixmin = np.maximum(BBGT[:, 0], bb[0])
iymin = np.maximum(BBGT[:, 1], bb[1])
ixmax = np.minimum(BBGT[:, 2], bb[2])
iymax = np.minimum(BBGT[:, 3], bb[3])
iw = np.maximum(ixmax - ixmin + 1., 0.)
ih = np.maximum(iymax - iymin + 1., 0.)
inters = iw * ih
uni = ((bb[2] - bb[0] + 1.) *
(bb[3] - bb[1] + 1.) +
(BBGT[:, 2] - BBGT[:, 0] + 1.) *
(BBGT[:, 3] - BBGT[:, 1] + 1.) - inters)
return inters / uni
for d in range(nd):
R = class_recs[image_ids[d]]
bb = BB[d, :].astype(float)
ov_max, j_max = -np.inf, 0
BBGT = R['bbox'].astype(float)
if BBGT.size > 0:
overlaps = compute_overlaps(bb, BBGT)
ov_max = np.max(overlaps)
j_max = np.argmax(overlaps)
if ov_max > iou:
if not R['difficult'][j_max]:
if not R['det'][j_max]:
tp[d] = 1.
R['det'][j_max] = 1
else:
fp[d] = 1.
else:
fp[d] = 1.
# Compute precision recall
fp = np.cumsum(fp)
tp = np.cumsum(tp)
rec = tp / float(n_pos)
# Avoid divide by zero in case the first detection matches a difficult ground truth.
prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)
ap = voc_ap(rec, prec, use_07_metric)
return rec, prec, ap
def voc_segm_eval(
det_file,
seg_file,
gt_recs,
cls_name,
iou=0.5,
use_07_metric=False,
):
# 0. Constants
M = cfg.MRCNN.RESOLUTION
binary_thresh = cfg.TEST.BINARY_THRESH
scale = (M + 2.) / M
padded_mask = np.zeros((M + 2, M + 2), dtype=np.float32)
# 1. Get bbox & mask ground truths
image_names, class_recs, n_pos = [], {}, 0
for image_name, rec in gt_recs.items():
objects = [obj for obj in rec['objects'] if obj['name'] == cls_name]
bbox = np.array([x['bbox'] for x in objects])
mask = np.array([
mask_utils.bytes2img(
x['mask'],
rec['height'],
rec['width']
) for x in objects]
)
difficult = np.array([x['difficult'] for x in objects]).astype(bool)
det = [False] * len(objects)
n_pos = n_pos + sum(~difficult)
class_recs[image_name] = {
'bbox': bbox,
'mask': mask,
'difficult': difficult,
'det': det
}
image_names.append(image_name)
# 2. Get predict pickle file for this class
with open(det_file, 'rb') as f:
boxes_pkl = pickle.load(f)
with open(seg_file, 'rb') as f:
masks_pkl = pickle.load(f)
# 3. Pre-compute number of total instances to allocate memory
num_images = len(gt_recs)
box_num = 0
for im_i in range(num_images):
box_num += len(boxes_pkl[im_i])
# Avoid IndexError if detecting nothing; the caller expects a scalar AP.
if box_num == 0:
return -1
# 4. Re-organize all the predicted boxes
new_boxes = np.zeros((box_num, 5))
new_masks = np.zeros((box_num, M, M))
new_images = []
cnt = 0
for image_ind in range(num_images):
boxes = boxes_pkl[image_ind]
masks = masks_pkl[image_ind]
num_instance = len(boxes)
for box_ind in range(num_instance):
new_boxes[cnt] = boxes[box_ind]
new_masks[cnt] = masks[box_ind]
new_images.append(image_names[image_ind])
cnt += 1
# 5. Rearrange boxes according to their scores
seg_scores = new_boxes[:, -1]
keep_inds = np.argsort(-seg_scores)
new_boxes = new_boxes[keep_inds, :]
new_masks = new_masks[keep_inds, :, :]
num_pred = new_boxes.shape[0]
# 6. Calculate t/f positive
fp = np.zeros((num_pred, 1))
tp = np.zeros((num_pred, 1))
ref_boxes = box_util.expand_boxes(new_boxes, scale)
ref_boxes = ref_boxes.astype(np.int32)
for i in range(num_pred):
image_name = new_images[keep_inds[i]]
if image_name not in class_recs:
print('Warning: {} does not exist in the ground-truths.'.format(image_name))
fp[i] = 1
continue
R = class_recs[image_name]
im_h = gt_recs[image_name]['height']
im_w = gt_recs[image_name]['width']
# Decode mask
ref_box = ref_boxes[i, :4]
mask = new_masks[i]
padded_mask[1:-1, 1:-1] = mask[:, :]
w = ref_box[2] - ref_box[0] + 1
h = ref_box[3] - ref_box[1] + 1
w = np.maximum(w, 1)
h = np.maximum(h, 1)
mask = cv2.resize(padded_mask, (w, h))
mask = np.array(mask > binary_thresh, dtype=np.uint8)
x1 = max(ref_box[0], 0)
y1 = max(ref_box[1], 0)
x2 = min(ref_box[2] + 1, im_w)
y2 = min(ref_box[3] + 1, im_h)
pred_mask = mask[(y1 - ref_box[1]): (y2 - ref_box[1]),
(x1 - ref_box[0]): (x2 - ref_box[0])]
# Calculate max region overlap
ovmax, jmax = -1, -1
for j in range(len(R['det'])):
gt_mask_bound = R['bbox'][j].astype(int)
pred_mask_bound = new_boxes[i, :4].astype(int)
crop_mask = R['mask'][j][gt_mask_bound[1]:gt_mask_bound[3] + 1,
gt_mask_bound[0]:gt_mask_bound[2] + 1]
ov = mask_overlap(gt_mask_bound,
pred_mask_bound,
crop_mask,
pred_mask)
if ov > ovmax:
ovmax = ov
jmax = j
if ovmax > iou:
if not R['difficult'][jmax]:
if not R['det'][jmax]:
tp[i] = 1.
R['det'][jmax] = 1
else:
fp[i] = 1.
else:
fp[i] = 1
# 7. Calculate precision
fp = np.cumsum(fp)
tp = np.cumsum(tp)
rec = tp / float(n_pos)
# Avoid divide by zero in case the first detection matches a difficult ground truth.
prec = tp / np.maximum(fp + tp, np.finfo(np.float64).eps)
ap = voc_ap(rec, prec, use_07_metric=use_07_metric)
return ap
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import numpy as np
from seetadet.datasets import voc_eval
from seetadet.utils.env import pickle
class VOCEvaluator(object):
"""Evaluator for PASCAL VOC dataset."""
def __init__(self, imdb):
self.imdb = imdb
def do_bbox_eval(
self,
gt_recs,
output_dir,
iou=0.5,
use_07_metric=True,
):
aps = []
print('~~~~~~ Evaluation IoU@%s ~~~~~~' % str(iou))
print('VOC07 metric? ' + ('Yes' if use_07_metric else 'No'))
for i, cls in enumerate(self.imdb.classes):
if cls == '__background__':
continue
det_file = self.get_results_file(output_dir).format(cls)
rec, prec, ap = \
voc_eval.voc_bbox_eval(
det_file,
gt_recs, cls,
iou=iou,
use_07_metric=use_07_metric,
)
if ap > 0:
aps += [ap]
print('AP for {} = {:.4f}'.format(cls, ap))
print('Mean AP = {:.4f}\n'.format(np.mean(aps)))
def do_segm_eval(
self,
gt_recs,
output_dir,
iou=0.5,
use_07_metric=True,
):
aps = []
print('~~~~~~ Evaluation IoU@%s ~~~~~~' % str(iou))
print('VOC07 metric? ' + ('Yes' if use_07_metric else 'No'))
for i, cls in enumerate(self.imdb.classes):
if cls == '__background__':
continue
segm_filename = self.get_results_file(output_dir, 'segm').format(cls)
bbox_filename = segm_filename.replace('segmentations', 'detections')
ap = voc_eval.voc_segm_eval(
bbox_filename,
segm_filename,
gt_recs, cls,
iou=iou,
use_07_metric=use_07_metric,
)
if ap > 0:
aps += [ap]
print('AP for {} = {:.4f}'.format(cls, ap))
print('Mean AP = {:.4f}\n'.format(np.mean(aps)))
@staticmethod
def get_prefix(type='bbox'):
if type == 'bbox':
return 'detections'
elif type == 'segm':
return 'segmentations'
elif type == 'kpt':
return 'keypoints'
return ''
def get_results_file(self, results_folder, type='bbox'):
# experiments/model_id/results/detections_<comp_id>_<class_name>.txt
if type == 'bbox':
filename = self.get_prefix(type) + self.imdb.comp_id + '_{:s}.txt'
elif type == 'segm':
filename = self.get_prefix(type) + self.imdb.comp_id + '_{:s}.pkl'
else:
raise ValueError('Type of results can be either bbox or segm.')
if not os.path.exists(results_folder):
os.makedirs(results_folder)
return os.path.join(results_folder, filename)
def write_bbox_results(self, all_boxes, gt_recs, output_dir):
for cls_ind, cls in enumerate(self.imdb.classes):
if cls == '__background__':
continue
print('Writing {} VOC format bbox results'.format(cls))
filename = self.get_results_file(output_dir).format(cls)
with open(filename, 'wt') as f:
ix = 0
for image_id, rec in gt_recs.items():
dets = all_boxes[cls_ind][ix]
ix += 1
if len(dets) == 0:
continue
for k in range(dets.shape[0]):
content = '{:s} {:.3f} {:.1f} {:.1f} {:.1f} {:.1f}' \
.format(image_id, dets[k, -1],
dets[k, 0] + 1, dets[k, 1] + 1,
dets[k, 2] + 1, dets[k, 3] + 1)
if dets.shape[1] == 6:
content += ' {:.2f}'.format(dets[k, 4])
f.write(content + '\n')
def write_segm_results(self, all_boxes, all_masks, output_dir):
for cls_ind, cls in enumerate(self.imdb.classes):
if cls == '__background__':
continue
print('Writing {} VOC format segm results'.format(cls))
segm_filename = self.get_results_file(output_dir, 'segm').format(cls)
bbox_filename = segm_filename.replace('segmentations', 'detections')
with open(bbox_filename, 'wb') as f:
pickle.dump(all_boxes[cls_ind], f, pickle.HIGHEST_PROTOCOL)
with open(segm_filename, 'wb') as f:
pickle.dump(all_masks[cls_ind], f, pickle.HIGHEST_PROTOCOL)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""AirNet backbone."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import dragon.vm.torch as torch
from seetadet.core import registry
from seetadet.core.config import cfg
from seetadet.modules import init
from seetadet.modules import nn
class ResBlock(nn.Module):
"""The resnet block."""
def __init__(self, dim_in, dim_out, stride=1, downsample=None):
super(ResBlock, self).__init__()
norm = cfg.MODEL.BACKBONE_NORM
self.conv1 = nn.Conv3x3(dim_in, dim_out, stride)
self.bn1 = nn.get_norm(norm, dim_out)
self.conv2 = nn.Conv3x3(dim_out, dim_out)
self.bn2 = nn.get_norm(norm, dim_out)
self.downsample = downsample
self.relu = nn.ReLU(inplace=True)
def forward(self, x):
identity = x
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.bn2(out)
if self.downsample is not None:
identity = self.downsample(x)
out += identity
out = self.relu(out)
return out
class InceptionBlock(nn.Module):
"""The inception block."""
def __init__(self, dim_in, dim_out):
super(InceptionBlock, self).__init__()
norm = cfg.MODEL.BACKBONE_NORM
self.conv1 = nn.Conv1x1(dim_in, dim_out)
self.bn1 = nn.get_norm(norm, dim_out)
self.conv2 = nn.Conv3x3(dim_out, dim_out // 2)
self.bn2 = nn.get_norm(norm, dim_out // 2)
self.conv3a = nn.Conv3x3(dim_out // 2, dim_out)
self.bn3a = nn.get_norm(norm, dim_out)
self.conv3b = nn.Conv3x3(dim_out, dim_out)
self.bn3b = nn.get_norm(norm, dim_out)
self.conv4 = nn.Conv3x3(dim_out * 3, dim_out)
self.bn4 = nn.get_norm(norm, dim_out)
self.relu = nn.ReLU(inplace=True)
def forward(self, x):
identity = x
out = self.conv1(x)
out_1x1 = self.bn1(out)
out_1x1 = self.relu(out_1x1)
out = self.conv2(out_1x1)
out = self.bn2(out)
out = self.relu(out)
out = self.conv3a(out)
out_3x3_a = self.bn3a(out)
out_3x3_a = self.relu(out_3x3_a)
out = self.conv3b(out_1x1)
out_3x3_b = self.bn3b(out)
out_3x3_b = self.relu(out_3x3_b)
out = torch.cat([out_1x1, out_3x3_a, out_3x3_b], 1)
out = self.conv4(out)
out = self.bn4(out)
out += identity
out = self.relu(out)
return out
class AirNet(nn.Module):
"""The airnet class."""
def __init__(self, model_cfg):
super(AirNet, self).__init__()
dim_in, dims, features = 64, [64, 128, 256, 384], []
self.conv1 = nn.Conv2d(3, 64, kernel_size=7,
stride=2, padding=3, bias=False)
self.bn1 = nn.get_norm(cfg.MODEL.BACKBONE_NORM, dim_in)
self.relu = nn.ReLU(inplace=True)
self.maxpool = nn.MaxPool2d(kernel_size=2, stride=2)
self.feature_dims = collections.OrderedDict(stem=64)
for i, v, dim_out in zip(range(4), model_cfg, dims):
stride = 1 if i == 0 else 2
downsample = nn.Sequential(
nn.Conv1x1(dim_in, dim_out, stride=stride),
nn.get_norm(cfg.MODEL.BACKBONE_NORM, dim_out),
)
features.append(ResBlock(dim_in, dim_out, stride, downsample))
for j in range(1, len(v)):
if v[j] == 'r':
features.append(ResBlock(dim_out, dim_out))
elif v[j] == 'i':
features.append(InceptionBlock(dim_out, dim_out))
else:
raise ValueError('Unknown block flag: ' + v[j])
setattr(self, 'layer%d' % (i + 1), nn.Sequential(*features[-len(v):]))
self.feature_dims[id(features[-1])] = dim_in = dim_out
self.features = features
self.reset_parameters()
def reset_parameters(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
init.kaiming_normal(m.weight, mode='fan_out')
def forward(self, x):
x = self.conv1(x)
x = self.bn1(x)
x = self.relu(x)
x = self.maxpool(x)
outputs = [None]
for layer in self.features:
x = layer(x)
if self.feature_dims.get(id(layer)):
outputs.append(x)
return outputs
def airnet(num_layers=5):
model_cfg = (('r', 'r'), ('r', 'i'), ('r', 'i'), ('r', 'i'))
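# 'r' -> ResBlock, 'i' -> InceptionBlock; each tuple configures one stage.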
return AirNet(model_cfg[:num_layers - 1])  # The stem counts as one layer.
registry.backbone.register('airnet', airnet)
registry.backbone.register('airnet_3b', airnet, num_layers=3)
registry.backbone.register('airnet_4b', airnet, num_layers=4)
registry.backbone.register('airnet_5b', airnet, num_layers=5)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Generic detector."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import importlib
import dragon.vm.torch as torch
from seetadet import modeling as models
from seetadet.core.config import cfg
from seetadet.core import registry
from seetadet.modules import nn
from seetadet.modules import utils as module_util
from seetadet.modules import vision
from seetadet.utils import logger
class Detector(nn.Module):
"""Organize the detection pipelines."""
def __init__(self):
super(Detector, self).__init__()
model_type = cfg.MODEL.TYPE
backbone = cfg.MODEL.BACKBONE.lower().split('.')
conv_body, conv_modules = backbone[0], backbone[1:]
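# e.g., cfg.MODEL.BACKBONE = 'resnet50.fpn' (name assumed for illustration)
# selects the 'resnet50' conv body plus an FPN feature enhancer.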
# DataLoader
self.data_loader = None
self.data_loader_cls = getattr(importlib.import_module(
'seetadet.algo.{}'.format(model_type)), 'DataLoader')
self.image_norm = vision.ImageNormalizer()
# FeatureExtractor
self.conv_body = registry.backbone.get(conv_body)()
feature_dims = list(self.conv_body.feature_dims.values())
# FeatureEnhancer
if 'fpn' in conv_modules:
self.fpn = models.FPN(feature_dims)
feature_dims = [self.fpn.feature_dim]
# DetectionHead
if 'rcnn' in model_type:
self.rpn = models.RPN(feature_dims[0])
if 'faster' in model_type:
self.rcnn = models.FastRCNN(feature_dims[0])
elif 'mask' in model_type:
self.rcnn = models.MaskRCNN(feature_dims[0])
else:
raise ValueError('Unsupported model: ' + model_type)
elif model_type == 'retinanet':
self.retinanet = models.RetinaNet(feature_dims[0])
elif model_type == 'ssd':
self.ssd = models.SSD(feature_dims)
else:
raise ValueError('Unsupported model: ' + model_type)
def load_weights(self, weights):
"""Load the state dict of this detector.
Note that the mismatched keys will be ignored.
Parameters
----------
weights : str
The path of the weights file.
"""
self.load_state_dict(torch.load(weights), strict=False)
def forward(self, inputs=None):
"""Compute the detection outputs.
Parameters
----------
inputs : dict, optional
The inputs.
Returns
-------
dict
The outputs.
"""
# Get the inputs
if inputs is None:
if self.data_loader is None:
self.data_loader = self.data_loader_cls()
inputs = self.data_loader()
# Extract features
image = self.image_norm(inputs['image'])
features = self.conv_body(image)
# Apply the FPN to enhance features if necessary
if hasattr(self, 'fpn'):
features = self.fpn(features)
# Collect detection outputs
outputs = collections.OrderedDict()
# Features -> RPN -> R-CNN
if hasattr(self, 'rpn'):
outputs.update(self.rpn(features=features, **inputs))
outputs.update(
self.rcnn(
features=features,
rpn_cls_score=outputs['rpn_cls_score'],
rpn_bbox_pred=outputs['rpn_bbox_pred'],
**inputs
)
)
# Features -> RetinaNet
if hasattr(self, 'retinanet'):
outputs.update(self.retinanet(features=features, **inputs))
# Features -> SSD
if hasattr(self, 'ssd'):
outputs.update(self.ssd(features=features, **inputs))
return outputs
def optimize_for_inference(self):
"""Optimize the graph for the inference."""
# Optimization #1: LayerFusion
fusions = set()
last_module = None
for module in self.modules():
pass_key, pass_fn = module_util \
.get_fusion_pass(last_module, module)
if pass_fn is not None:
fusions.add(pass_key)
pass_fn(last_module, module)
last_module = module
if len(fusions) > 0:
logger.info('Enable fusions: ' + ', '.join(fusions))
def new_detector(device, weights=None, training=False):
detector = Detector().cuda(device)
if weights is not None:
detector.load_weights(weights)
if not training:
detector.eval()
detector.optimize_for_inference()
# Enable fp16 inference if requested; this gives a modest
# speedup when TensorCores are available.
if cfg.MODEL.PRECISION.lower() == 'float16':
detector.half()
return detector
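# Example usage (hypothetical weights path):
#   detector = new_detector(0, weights='/path/to/model_final.pkl')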
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""EfficientNet backbone."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import functools
import math
from seetadet.core import registry
from seetadet.modeling.mobilenet_v3 import conv_triplet
from seetadet.modeling.mobilenet_v3 import conv_quintet
from seetadet.modeling.mobilenet_v3 import make_divisible
from seetadet.modules import init
from seetadet.modules import nn
class SqueezeExcite(nn.Module):
"""Squeeze-excite attention module."""
def __init__(self, dim_in, dim_squeeze, squeeze_ratio=0.25):
super(SqueezeExcite, self).__init__()
dim = int(dim_squeeze * squeeze_ratio)
self.layers = nn.Sequential(nn.AvgPool2d(-1, global_pooling=True),
nn.Conv2d(dim_in, dim, kernel_size=1),
nn.Swish(),
nn.Conv2d(dim, dim_in, kernel_size=1),
nn.Sigmoid(True))
def forward(self, x):
return x * self.layers(x)
class InvertedResidual(nn.Module):
"""Invert residual block."""
def __init__(
self,
dim_in,
dim_out,
kernel_size=3,
expand_ratio=3,
stride=1,
activation=None,
squeeze_excite=0,
):
super(InvertedResidual, self).__init__()
self.stride = stride
self.apply_residual = stride == 1 and dim_in == dim_out
self.dim = dim = int(round(dim_in * expand_ratio))
self.endpoint = None # Expansion feature
layers = []
if expand_ratio != 1:
layers.append(nn.Sequential(*conv_triplet(
dim_in, dim, activation=activation)))
expansion_transform = None
if squeeze_excite > 0:
expansion_transform = SqueezeExcite(dim, dim_in)
quintet = conv_quintet(dim, dim_out,
kernel_size=kernel_size,
stride=stride,
activation=activation,
expansion_transform=expansion_transform)
layers.append(nn.Sequential(*quintet[:3]))
layers.extend(quintet[3:])
self.conv = nn.Sequential(*layers)
def forward(self, x):
out = self.conv[0](x)
self.endpoint = out if self.stride == 2 else None
for layer in self.conv[1:]:
out = layer(out)
if self.apply_residual:
out += x
return out
class NASMobileNet(nn.Module):
"""NAS variant of mobilenet class."""
def __init__(self, arch, preset, width_mult=1.0, depth_mult=1.0):
super(NASMobileNet, self).__init__()
# Hand-craft configurations.
repeats, strides, out_channels, def_blocks = preset
assert sum(repeats) == len(arch), 'Bad architecture.'
self.feature_dims = collections.OrderedDict()
# Apply the width scaling.
out_channels = list(map(lambda x: make_divisible(x * width_mult),
out_channels))
# Apply the depth scaling.
repeated_arch = []
for i, repeat in enumerate(repeats):
idx_start = sum(repeats[:i])
indices = arch[idx_start: idx_start + repeat]
repeat = int(math.ceil(repeat * depth_mult))
repeated_arch += (indices + [indices[-1]] * (repeat - len(indices)))
arch = repeated_arch
# Stem.
features = [nn.Sequential(
*conv_triplet(
dim_in=3,
dim_out=out_channels[0],
kernel_size=3,
stride=2,
activation=nn.Swish(),
))]
# Blocks.
dim_in, stride_out = out_channels[0], 2
for repeat, dim_out, stride in \
zip(repeats, out_channels[1:], strides):
repeat = int(math.ceil(repeat * depth_mult))
stride_out *= stride
for i in range(repeat):
stride = stride if i == 0 else 1
idx = arch[len(features) - 1]
if def_blocks is None:
block = functools.partial(
InvertedResidual,
kernel_size=(idx // 100) % 10,
expand_ratio=int(idx / 1000.) / 10,
squeeze_excite=idx % 10)
else:
block = def_blocks[idx]
features.append(block(
dim_in, dim_out,
stride=stride,
activation=nn.Swish()))
dim_in = dim_out
if stride == 2:
self.feature_dims[id(features[-1])] = features[-1].dim
features.append(nn.Sequential(
*conv_triplet(
dim_in=dim_in,
dim_out=out_channels[-1],
kernel_size=1,
stride=1,
activation=nn.Swish())))
self.feature_dims[id(features[-1])] = out_channels[-1]
self.features = nn.Sequential(*features)
self.reset_parameters()
def reset_parameters(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
init.kaiming_normal(m.weight, mode='fan_out')
def forward(self, x):
outputs = []
for i, layer in enumerate(self.features):
x = layer(x)
if self.feature_dims.get(id(layer)):
outputs.append(getattr(layer, 'endpoint', x))
return outputs
class ModelSetting(object):
"""Hand-craft model setting."""
# Default NASBlocks definition.
# We use the following hash method:
# ef * 10000 + kernel_size * 100 + se * 1
# e.g., ef=4.0, ks=3, se=True, with index 40301
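# Decoding example: 60501 -> expand_ratio=6.0, kernel_size=5, squeeze_excite=1.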
DEFAULT_NAS_BLOCKS_DEF = None
EFFICIENT = (
[1, 2, 2, 3, 3, 4, 1],
[1, 2, 2, 2, 1, 2, 1],
[32, 16, 24, 40, 80, 112, 192, 320, 1280],
DEFAULT_NAS_BLOCKS_DEF,
)
def efficientnet(width_mult=1.0, depth_mult=1.0):
return NASMobileNet([10301,
60301, 60301,
60501, 60501,
60301, 60301, 60301,
60501, 60501, 60501,
60501, 60501, 60501, 60501,
60301],
preset=ModelSetting.EFFICIENT,
width_mult=width_mult,
depth_mult=depth_mult)
@registry.backbone.register('efficientnet_b0')
def efficientnet_b0():
return efficientnet(width_mult=1.0, depth_mult=1.0)
@registry.backbone.register('efficientnet_b1')
def efficientnet_b1():
return efficientnet(width_mult=1.0, depth_mult=1.1)
@registry.backbone.register('efficientnet_b2')
def efficientnet_b2():
return efficientnet(width_mult=1.1, depth_mult=1.2)
@registry.backbone.register('efficientnet_b3')
def efficientnet_b3():
return efficientnet(width_mult=1.2, depth_mult=1.4)
@registry.backbone.register('efficientnet_b4')
def efficientnet_b4():
return efficientnet(width_mult=1.4, depth_mult=1.8)
@registry.backbone.register('efficientnet_b5')
def efficientnet_b5():
return efficientnet(width_mult=1.6, depth_mult=2.2)
@registry.backbone.register('efficientnet_b6')
def efficientnet_b6():
return efficientnet(width_mult=1.8, depth_mult=2.6)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""FastRCNN head."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import functools
import dragon.vm.torch as torch
from seetadet.algo import faster_rcnn
from seetadet.core.config import cfg
from seetadet.modules import det
from seetadet.modules import init
from seetadet.modules import nn
from seetadet.modules import vision
class FastRCNN(nn.Module):
r"""Generate proposal regions for R-CNN series.
The pipeline is as follows:
... -> RoIs \ /-> cls_score -> cls_loss
-> RoIFeatureXform -> MLP
... -> Features / \-> bbox_pred -> bbox_loss
"""
def __init__(self, dim_in=256):
super(FastRCNN, self).__init__()
self.data = {}
self.roi_head_dim = dim_in * (cfg.FRCNN.ROI_XFORM_RESOLUTION ** 2)
self.fc6 = nn.Linear(self.roi_head_dim, cfg.FRCNN.MLP_HEAD_DIM)
self.fc7 = nn.Linear(cfg.FRCNN.MLP_HEAD_DIM, cfg.FRCNN.MLP_HEAD_DIM)
self.cls_score = nn.Linear(cfg.FRCNN.MLP_HEAD_DIM, len(cfg.MODEL.CLASSES))
self.bbox_pred = nn.Linear(cfg.FRCNN.MLP_HEAD_DIM, len(cfg.MODEL.CLASSES) * 4)
self.rpn_decoder = det.RPNDecoder()
self.proposal = faster_rcnn.Proposal()
self.proposal_target = faster_rcnn.ProposalTarget()
self.softmax = nn.Softmax(dim=1)
self.relu = nn.ReLU(inplace=True)
self.sigmoid = nn.Sigmoid()
self.box_roi_feature = functools.partial({
'RoIPool': vision.roi_pool,
'RoIAlign': vision.roi_align,
}[cfg.FRCNN.ROI_XFORM_METHOD],
size=cfg.FRCNN.ROI_XFORM_RESOLUTION,
sampling_ratio=cfg.FRCNN.ROI_XFORM_SAMPLING_RATIO)
self.cls_loss = nn.CrossEntropyLoss()
if cfg.FRCNN.BBOX_REG_LOSS_TYPE.lower() == 'l1':
self.bbox_loss = nn.L1Loss(reduction='sum')
else:
self.bbox_loss = nn.SmoothL1Loss(beta=1.0, reduction='sum')
# Compute spatial scales according to strides.
self.spatial_scales = [
1. / (2 ** lvl)
for lvl in range(
cfg.FPN.ROI_MIN_LEVEL,
cfg.FPN.ROI_MAX_LEVEL + 1
)]
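# e.g., ROI levels 2..5 give spatial scales 1/4, 1/8, 1/16, 1/32.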
self.reset_parameters()
def reset_parameters(self):
init.normal(self.cls_score.weight, std=0.01)
init.normal(self.bbox_pred.weight, std=0.001)
for name, param in self.named_parameters():
if 'bias' in name:
init.constant(param, 0)
def forward(self, **kwargs):
# Generate proposals.
proposal_fn = self.proposal \
if self.training else self.rpn_decoder
self.data = {
'rois': proposal_fn(
features=kwargs['features'],
cls_prob=self.sigmoid(kwargs['rpn_cls_score'].data),
bbox_pred=kwargs['rpn_bbox_pred'],
im_info=kwargs['im_info'],
)
}
# Generate targets from proposals.
if self.training:
self.data.update(
self.proposal_target(
rois=self.data['rois'],
gt_boxes=kwargs['gt_boxes'],
)
)
# Transform RoI features.
if len(self.data['rois']) > 1:
roi_features = \
torch.cat([
self.box_roi_feature(
kwargs['features'][i],
self.data['rois'][i],
spatial_scale,
) for i, spatial_scale in enumerate(self.spatial_scales)
], dim=0)
else:
roi_features = \
self.box_roi_feature(
kwargs['features'][0],
self.data['rois'][0],
1. / cfg.RPN.STRIDES[0],
)
# Apply a simple MLP.
roi_features = roi_features.view(-1, self.roi_head_dim)
roi_features = self.relu(self.fc6(roi_features))
roi_features = self.relu(self.fc7(roi_features))
# Compute logits and losses.
outputs = collections.OrderedDict()
cls_score = self.cls_score(roi_features).float()
outputs['bbox_pred'] = self.bbox_pred(roi_features).float()
if self.training:
# Compute rcnn losses.
bbox_pred = outputs['bbox_pred'].view(0, -1, 4) \
.index_select((0, 1), self.data['bbox_inds'])
batch_size = roi_features.size(0)
bbox_loss_weight = cfg.FRCNN.BBOX_REG_LOSS_WEIGHT
bbox_loss_weight /= float(batch_size)
outputs.update(collections.OrderedDict([
('cls_loss', self.cls_loss(
cls_score,
self.data['labels'])),
('bbox_loss', self.bbox_loss(
bbox_pred,
self.data['bbox_targets'],
self.data['bbox_anchors']) * bbox_loss_weight),
]))
else:
# Return the rois to decode the refine boxes.
if len(self.data['rois']) > 1:
outputs['rois'] = torch.cat(self.data['rois'], 0)
else:
outputs['rois'] = self.data['rois'][0]
# Return the classification prob.
outputs['cls_prob'] = self.softmax(cls_score)
return outputs
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""FPN feature enhancer."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.core.config import cfg
from seetadet.modules import init
from seetadet.modules import nn
HIGHEST_BACKBONE_LVL = 5 # E.g., "conv5"-like level
class FPN(nn.Module):
"""Feature Pyramid Networks to enhance input features."""
def __init__(self, feature_dims):
super(FPN, self).__init__()
self.C = nn.ModuleList()
self.P = nn.ModuleList()
self.feature_dim = dim = cfg.FPN.DIM
self.highest_backbone_lvl = min(cfg.FPN.RPN_MAX_LEVEL, HIGHEST_BACKBONE_LVL)
for lvl in range(cfg.FPN.RPN_MIN_LEVEL, self.highest_backbone_lvl + 1):
self.C.append(nn.Conv1x1(feature_dims[lvl - 1], dim, bias=True))
self.P.append(nn.Conv3x3(dim, dim, bias=True))
if 'rcnn' in cfg.MODEL.TYPE:
self.apply_func = self.apply_rcnn
self.maxpool = nn.MaxPool2d(kernel_size=1, stride=2)
else:
self.apply_func = self.apply_generic
self.relu = nn.ReLU(inplace=False)
for lvl in range(self.highest_backbone_lvl + 1, cfg.FPN.RPN_MAX_LEVEL + 1):
dim_in = feature_dims[-1] if lvl == self.highest_backbone_lvl + 1 else dim
self.P.append(nn.Conv3x3(dim_in, dim, stride=2, bias=True))
self.coarsest_stride = cfg.MODEL.COARSEST_STRIDE
self.reset_parameters()
def reset_parameters(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
init.xavier_uniform(m.weight)
init.constant(m.bias, 0)
def apply_rcnn(self, features):
fpn_input = self.C[-1](features[-1])
min_lvl, max_lvl = cfg.FPN.RPN_MIN_LEVEL, cfg.FPN.RPN_MAX_LEVEL
outputs = [self.P[self.highest_backbone_lvl - min_lvl](fpn_input)]
# Apply max pool for higher features.
for i in range(self.highest_backbone_lvl + 1, max_lvl + 1):
outputs.append(self.maxpool(outputs[-1]))
# Build pyramids between [MIN_LEVEL, HIGHEST_LEVEL]
for i in range(self.highest_backbone_lvl - 1, min_lvl - 1, -1):
lateral_output = self.C[i - min_lvl](features[i - 1])
if self.coarsest_stride > 0:
upscale_output = nn.upsample(fpn_input, scale_factor=2)
else:
upscale_output = nn.upsample(fpn_input, size=lateral_output.shape[2:])
fpn_input = lateral_output.__iadd__(upscale_output)
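# __iadd__ adds in place and returns the lateral tensor itself, so the
# merged map is reused as the input of the next (finer) pyramid level.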
outputs.insert(0, self.P[i - min_lvl](fpn_input))
return outputs
def apply_generic(self, features):
fpn_input = self.C[-1](features[-1])
min_lvl, max_lvl = cfg.FPN.RPN_MIN_LEVEL, cfg.FPN.RPN_MAX_LEVEL
outputs = [self.P[self.highest_backbone_lvl - min_lvl](fpn_input)]
# Add extra convolutions for higher features.
extra_input = features[-1]
for i in range(self.highest_backbone_lvl + 1, max_lvl + 1):
outputs.append(self.P[i - min_lvl](extra_input))
if i != max_lvl:
extra_input = self.relu(outputs[-1])
# Build pyramids between [MIN_LEVEL, HIGHEST_LEVEL]
for i in range(self.highest_backbone_lvl - 1, min_lvl - 1, -1):
lateral_output = self.C[i - min_lvl](features[i - 1])
if self.coarsest_stride > 0:
upscale_output = nn.upsample(fpn_input, scale_factor=2)
else:
upscale_output = nn.upsample(fpn_input, size=lateral_output.shape[2:])
fpn_input = lateral_output.__iadd__(upscale_output)
outputs.insert(0, self.P[i - min_lvl](fpn_input))
return outputs
def forward(self, features):
return self.apply_func(features)
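The two `apply_*` paths above implement the standard FPN top-down pathway: a 1x1 lateral convolution per backbone level, an upsample-and-add merge, and a 3x3 output convolution per pyramid level. Below is a minimal standalone sketch of that merge, written against vanilla PyTorch rather than `dragon.vm.torch` and the seetadet `cfg`; the channel counts and module lists are illustrative assumptions, not the library's API.

```python
import torch
import torch.nn.functional as F
from torch import nn

# Illustrative lateral (1x1) and output (3x3) convs for levels C2..C5.
laterals = [nn.Conv2d(c, 256, 1) for c in (256, 512, 1024, 2048)]
outputs = [nn.Conv2d(256, 256, 3, padding=1) for _ in range(4)]

def fpn_topdown(feats):
    """Merge C2..C5 (strides 4..32) into P2..P5, coarse to fine."""
    x = laterals[-1](feats[-1])
    pyramid = [outputs[-1](x)]
    for i in range(len(feats) - 2, -1, -1):
        lat = laterals[i](feats[i])
        # Upsample the coarser map and add it onto the lateral output.
        x = lat + F.interpolate(x, size=lat.shape[2:], mode='nearest')
        pyramid.insert(0, outputs[i](x))
    return pyramid

feats = [torch.randn(1, c, 64 // 2 ** i, 64 // 2 ** i)
         for i, c in enumerate((256, 512, 1024, 2048))]
print([tuple(p.shape) for p in fpn_topdown(feats)])  # all 256-channel maps
```

As in `apply_rcnn` above, extra levels beyond the highest backbone map would be produced by max pooling (R-CNN heads) or by strided 3x3 convolutions (single-stage heads).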
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""MaskRCNN head."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import functools
import dragon.vm.torch as torch
from seetadet.algo import mask_rcnn
from seetadet.core.config import cfg
from seetadet.modules import det
from seetadet.modules import init
from seetadet.modules import nn
from seetadet.modules import vision
class MaskRCNN(nn.Module):
r"""Generate mask regions for R-CNN series.
The pipeline is as follows:
... -> BoxRoIs \ /-> cls_score -> cls_loss
-> RoIFeatureXform -> MLP
... -> Features / \-> bbox_pred -> bbox_loss
... -> MaskRoIs \
-> RoIFeatureXform -> FCN -> mask_score -> mask_loss
... -> Features /
"""
def __init__(self, dim_in=256):
super(MaskRCNN, self).__init__()
self.data = {}
self.roi_head_dim = dim_in * (cfg.FRCNN.ROI_XFORM_RESOLUTION ** 2)
self.fc6 = nn.Linear(self.roi_head_dim, cfg.FRCNN.MLP_HEAD_DIM)
self.fc7 = nn.Linear(cfg.FRCNN.MLP_HEAD_DIM, cfg.FRCNN.MLP_HEAD_DIM)
self.fcn = nn.ModuleList([nn.Conv3x3(dim_in, dim_in, bias=True) for _ in range(4)])
self.fcn += [nn.ConvTranspose2d(dim_in, dim_in, 2, 2, 0)]
self.cls_score = nn.Linear(cfg.FRCNN.MLP_HEAD_DIM, len(cfg.MODEL.CLASSES))
self.bbox_pred = nn.Linear(cfg.FRCNN.MLP_HEAD_DIM, len(cfg.MODEL.CLASSES) * 4)
self.mask_score = nn.Conv1x1(dim_in, len(cfg.MODEL.CLASSES) - 1, bias=True)
self.rpn_decoder = det.RPNDecoder()
self.proposal = mask_rcnn.Proposal()
self.proposal_target = mask_rcnn.ProposalTarget()
self.sigmoid = nn.Sigmoid()
self.softmax = nn.Softmax(dim=1)
self.relu = nn.ReLU(True)
self.box_roi_feature = functools.partial({
'RoIPool': vision.roi_pool,
'RoIAlign': vision.roi_align,
}[cfg.FRCNN.ROI_XFORM_METHOD],
size=cfg.FRCNN.ROI_XFORM_RESOLUTION,
sampling_ratio=cfg.FRCNN.ROI_XFORM_SAMPLING_RATIO)
self.mask_roi_feature = functools.partial({
'RoIPool': vision.roi_pool,
'RoIAlign': vision.roi_align,
}[cfg.MRCNN.ROI_XFORM_METHOD],
size=cfg.MRCNN.ROI_XFORM_RESOLUTION,
sampling_ratio=cfg.MRCNN.ROI_XFORM_SAMPLING_RATIO)
self.cls_loss = nn.CrossEntropyLoss()
if cfg.FRCNN.BBOX_REG_LOSS_TYPE.lower() == 'l1':
self.bbox_loss = nn.L1Loss(reduction='sum')
else:
self.bbox_loss = nn.SmoothL1Loss(beta=1.0, reduction='sum')
self.mask_loss = nn.BCEWithLogitsLoss()
self.compute_mask_score = None
# Compute spatial scales according to strides.
self.spatial_scales = [
1. / (2 ** lvl)
for lvl in range(cfg.FPN.ROI_MIN_LEVEL,
cfg.FPN.ROI_MAX_LEVEL + 1)]
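# E.g., ROI levels 2..5 give spatial scales 1/4, 1/8, 1/16 and 1/32,
# one per FPN stride.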
self.reset_parameters()
def reset_parameters(self):
init.normal(self.cls_score.weight, std=0.01)
init.normal(self.bbox_pred.weight, std=0.001)
init.normal(self.mask_score.weight, std=0.001)
for m in self.fcn.modules():
if hasattr(m, 'weight'):
init.kaiming_normal(m.weight)
for name, param in self.named_parameters():
if 'bias' in name:
init.constant(param, 0)
def get_mask_score(self, features, rois):
roi_features = \
torch.cat([
self.mask_roi_feature(
features[i], rois[i], spatial_scale,
) for i, spatial_scale in enumerate(self.spatial_scales)
], dim=0)
for i in range(len(self.fcn)):
roi_features = self.relu(self.fcn[i](roi_features))
return self.mask_score(roi_features).float()
def forward(self, **kwargs):
# Generate proposals.
proposal_func = self.proposal \
if self.training else self.rpn_decoder
self.data = {
'rois': proposal_func(
features=kwargs['features'],
cls_prob=self.sigmoid(kwargs['rpn_cls_score'].data),
bbox_pred=kwargs['rpn_bbox_pred'],
im_info=kwargs['im_info'],
)
}
# Generate targets from proposals.
if self.training:
self.data.update(
self.proposal_target(
rois=self.data['rois'],
gt_boxes=kwargs['gt_boxes'],
gt_segms=kwargs['gt_segms'],
im_info=kwargs['im_info'],
)
)
# Transform RoI features.
roi_features = \
torch.cat([
self.box_roi_feature(
kwargs['features'][i],
self.data['rois'][i],
spatial_scale,
) for i, spatial_scale in enumerate(self.spatial_scales)
], dim=0)
# Apply a simple MLP.
roi_features = roi_features.view(-1, self.roi_head_dim)
roi_features = self.relu(self.fc6(roi_features))
roi_features = self.relu(self.fc7(roi_features))
# Compute logits and losses.
outputs = collections.OrderedDict()
cls_score = self.cls_score(roi_features).float()
outputs['bbox_pred'] = self.bbox_pred(roi_features).float()
if self.training:
# Compute the loss of the bbox branch.
bbox_pred = outputs['bbox_pred'].view(0, -1, 4) \
.index_select((0, 1), self.data['bbox_inds'])
batch_size = roi_features.size(0)
bbox_loss_weight = cfg.FRCNN.BBOX_REG_LOSS_WEIGHT
bbox_loss_weight /= float(batch_size)
outputs.update(collections.OrderedDict([
('cls_loss', self.cls_loss(
cls_score,
self.data['labels'])),
('bbox_loss', self.bbox_loss(
bbox_pred,
self.data['bbox_targets']) * bbox_loss_weight),
]))
# Compute the loss of the mask branch.
mask_score = self.get_mask_score(
kwargs['features'], self.data['mask_rois'])
mask_score = mask_score \
.index_select((0, 1), self.data['mask_inds'])
outputs['mask_loss'] = self.mask_loss(
mask_score, self.data['mask_targets'])
else:
# Return the RoIs used to decode the refined boxes.
if len(self.data['rois']) > 1:
outputs['rois'] = torch.cat(self.data['rois'], 0)
else:
outputs['rois'] = self.data['rois'][0]
# Return the classification prob.
outputs['cls_prob'] = self.softmax(cls_score)
# Set a callback to decode mask from refined RoIs.
self.compute_mask_score = functools.partial(
self.get_mask_score, features=kwargs['features'])
return outputs
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""MobileNetV2 backbone."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import functools
from seetadet.core import registry
from seetadet.core.config import cfg
from seetadet.modules import init
from seetadet.modules import nn
def make_divisible(v, divisor=8):
"""Return the divisible value."""
min_value = divisor
new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
if new_v < 0.9 * v:
new_v += divisor
return new_v
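A quick standalone check of the rounding rule (the function copy and the input values below are illustrative, not taken from any model config):

```python
# Standalone copy of make_divisible for illustration (divisor = 8).
def make_divisible(v, divisor=8):
    new_v = max(divisor, int(v + divisor / 2) // divisor * divisor)
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v

assert make_divisible(24) == 24        # exact multiples pass through
assert make_divisible(91) == 88        # rounds to nearest; 88 >= 0.9 * 91
assert make_divisible(20 * 1.1) == 24  # 22 rounds up to 24
assert make_divisible(36) == 40        # 36 + 4 crosses to the next multiple
```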
def conv_triplet(dim_in, dim_out, kernel_size=1, stride=1):
"""Return a convolution triplet."""
return [nn.Conv2d(dim_in, dim_out,
kernel_size=kernel_size,
stride=stride,
padding=kernel_size // 2,
bias=False),
nn.get_norm(cfg.MODEL.BACKBONE_NORM, dim_out),
nn.ReLU(True)]
def conv_quintet(dim_in, dim_out, kernel_size, stride):
"""Return a convolution quintet."""
return [nn.Conv2d(dim_in, dim_in,
kernel_size=kernel_size,
stride=stride,
padding=kernel_size // 2,
groups=dim_in,
bias=False),
nn.get_norm(cfg.MODEL.BACKBONE_NORM, dim_in),
nn.ReLU(True),
nn.Conv2d(dim_in, dim_out, kernel_size=1, bias=False),
nn.get_norm(cfg.MODEL.BACKBONE_NORM, dim_out)]
class InvertedResidual(nn.Module):
"""Invert residual block."""
def __init__(self, dim_in, dim_out, kernel_size=3, expand_ratio=3, stride=1):
super(InvertedResidual, self).__init__()
self.stride = stride
self.apply_residual = stride == 1 and dim_in == dim_out
self.dim = dim = int(round(dim_in * expand_ratio))
self.endpoint = None # Expansion feature
layers = []
if expand_ratio != 1:
layers.append(nn.Sequential(*conv_triplet(dim_in, dim)))
quintet = conv_quintet(dim, dim_out, kernel_size, stride)
layers.append(nn.Sequential(*quintet[:3]))
layers.extend(quintet[3:])
self.conv = nn.Sequential(*layers)
def forward(self, x):
out = self.conv[0](x)
self.endpoint = out if self.stride == 2 else None
for layer in self.conv[1:]:
out = layer(out)
if self.apply_residual:
out += x
return out
class NASMobileNet(nn.Module):
"""NAS variant of mobilenet class."""
def __init__(self, arch, preset, width_mult=1.0):
super(NASMobileNet, self).__init__()
# Hand-crafted configurations.
repeats, strides, out_channels, def_blocks = preset
assert sum(repeats) == len(arch), 'Bad architecture.'
self.feature_dims = collections.OrderedDict()
# Apply the width scaling.
out_channels = list(map(lambda x: make_divisible(x * width_mult),
out_channels))
# Stem.
features = [nn.Sequential(
*conv_triplet(
dim_in=3,
dim_out=out_channels[0],
kernel_size=3,
stride=2,
))]
# Blocks.
dim_in, dim_out = out_channels[:2]
features.append(InvertedResidual(dim_in, dim_out, 3, 1))
for repeat, dim_out, stride in \
zip(repeats, out_channels[2:], strides):
for i in range(repeat):
stride = stride if i == 0 else 1
block = def_blocks[arch[len(features) - 2]]
features.append(block(dim_in, dim_out, stride=stride))
dim_in = dim_out
if stride == 2:
self.feature_dims[id(features[-1])] = features[-1].dim
features.append(nn.Sequential(*conv_triplet(dim_in, out_channels[-1])))
self.feature_dims[id(features[-1])] = out_channels[-1]
self.features = nn.Sequential(*features)
self.reset_parameters()
def reset_parameters(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
init.kaiming_normal(m.weight, mode='fan_out')
def forward(self, x):
outputs = []
for layer in self.features:
x = layer(x)
if self.feature_dims.get(id(layer)):
outputs.append(getattr(layer, 'endpoint', x))
return outputs
class ModelSetting(object):
"""Hand-craft model setting."""
# Default NASBlocks definition.
# See ProxyLessNAS (arxiv.1812.00332) for details.
DEFAULT_NAS_BLOCKS_DEF = {
0: functools.partial(InvertedResidual, kernel_size=3, expand_ratio=3),
1: functools.partial(InvertedResidual, kernel_size=3, expand_ratio=6),
2: functools.partial(InvertedResidual, kernel_size=5, expand_ratio=3),
3: functools.partial(InvertedResidual, kernel_size=5, expand_ratio=6),
4: functools.partial(InvertedResidual, kernel_size=7, expand_ratio=3),
5: functools.partial(InvertedResidual, kernel_size=7, expand_ratio=6),
6: nn.Identity,
}
V2 = (
[2, 3, 4, 3, 3, 1],
[2, 2, 2, 1, 2, 1],
[32, 16, 24, 32, 64, 96, 160, 320, 1280],
DEFAULT_NAS_BLOCKS_DEF,
)
PROXYLESS_MOBILE = (
[4, 4, 4, 4, 4, 1],
[2, 2, 2, 1, 2, 1],
[32, 16, 32, 40, 80, 96, 192, 320, 1280],
DEFAULT_NAS_BLOCKS_DEF,
)
PROXYLESS_GPU = (
[4, 4, 4, 4, 4, 1],
[2, 2, 2, 1, 2, 1],
[40, 24, 32, 56, 112, 128, 256, 432, 1280],
DEFAULT_NAS_BLOCKS_DEF,
)
@registry.backbone.register('mobilenet_v2')
def mobilenet_v2():
return NASMobileNet([1, 1,
1, 1, 1,
1, 1, 1, 1,
1, 1, 1,
1, 1, 1,
1], ModelSetting.V2)
@registry.backbone.register('proxyless_mobile')
def proxyless_mobile():
return NASMobileNet([2, 0, 6, 6,
4, 0, 2, 2,
5, 2, 2, 2,
3, 2, 2, 2,
5, 5, 4, 4,
5], ModelSetting.PROXYLESS_MOBILE)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""MobileNetV3 backbone."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import functools
from seetadet.core import registry
from seetadet.core.config import cfg
from seetadet.modules import init
from seetadet.modules import nn
def make_divisible(v, divisor=8):
"""Return the divisible value."""
min_value = divisor
new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
if new_v < 0.9 * v:
new_v += divisor
return new_v
def conv_triplet(dim_in, dim_out, kernel_size=1, stride=1, activation=None):
"""Return a convolution triplet."""
return [nn.Conv2d(dim_in, dim_out,
kernel_size=kernel_size,
stride=stride,
padding=kernel_size // 2,
bias=False),
nn.get_norm(cfg.MODEL.BACKBONE_NORM, dim_out),
nn.ReLU(True) if activation is None else activation]
def conv_quintet(
dim_in,
dim_out,
kernel_size,
stride,
activation=None,
expansion_transform=None,
):
"""Return a convolution quintet."""
layers = [nn.Conv2d(dim_in, dim_in,
kernel_size=kernel_size,
stride=stride,
padding=kernel_size // 2,
groups=dim_in,
bias=False),
nn.get_norm(cfg.MODEL.BACKBONE_NORM, dim_in),
nn.ReLU(True) if activation is None else activation]
if expansion_transform is not None:
layers += [expansion_transform]
layers += [nn.Conv2d(dim_in, dim_out, kernel_size=1, bias=False),
nn.get_norm(cfg.MODEL.BACKBONE_NORM, dim_out)]
return layers
class SqueezeExcite(nn.Module):
"""Squeeze-excite attention module."""
def __init__(self, dim_in, squeeze_ratio=0.25):
super(SqueezeExcite, self).__init__()
dim = make_divisible(dim_in * squeeze_ratio)
self.layers = nn.Sequential(nn.AvgPool2d(-1, global_pooling=True),
nn.Conv2d(dim_in, dim, kernel_size=1),
nn.ReLU(True),
nn.Conv2d(dim, dim_in, kernel_size=1),
nn.Hardsigmoid(True))
def forward(self, x):
return x * self.layers(x)
class InvertedResidual(nn.Module):
"""Invert residual block."""
def __init__(
self,
dim_in,
dim_out,
kernel_size=3,
expand_ratio=3,
stride=1,
activation=None,
squeeze_excite=0,
):
super(InvertedResidual, self).__init__()
self.stride = stride
self.apply_residual = stride == 1 and dim_in == dim_out
self.dim = dim = int(round(dim_in * expand_ratio))
self.endpoint = None # Expansion feature
layers = []
if expand_ratio != 1:
layers.append(nn.Sequential(*conv_triplet(
dim_in, dim, activation=activation)))
expansion_transform = None
if squeeze_excite > 0:
expansion_transform = SqueezeExcite(dim)
quintet = conv_quintet(dim, dim_out,
kernel_size=kernel_size,
stride=stride,
activation=activation,
expansion_transform=expansion_transform)
layers.append(nn.Sequential(*quintet[:3]))
layers.extend(quintet[3:])
self.conv = nn.Sequential(*layers)
def forward(self, x):
out = self.conv[0](x)
self.endpoint = out if self.stride == 2 else None
for layer in self.conv[1:]:
out = layer(out)
if self.apply_residual:
out += x
return out
class NASMobileNet(nn.Module):
"""NAS variant of mobilenet class."""
def __init__(self, arch, preset, width_mult=1.0):
super(NASMobileNet, self).__init__()
# Hand-crafted configurations.
repeats, strides, out_channels, def_blocks = preset
assert sum(repeats) == len(arch), 'Bad architecture.'
self.feature_dims = collections.OrderedDict()
# Apply the width scaling.
out_channels = list(map(lambda x: make_divisible(x * width_mult),
out_channels))
# Stem.
features = [nn.Sequential(
*conv_triplet(
dim_in=3,
dim_out=out_channels[0],
kernel_size=3,
stride=2,
activation=nn.Hardswish(),
))]
# Blocks.
dim_in, stride_out = out_channels[0], 2
for repeat, dim_out, stride in \
zip(repeats, out_channels[1:], strides):
stride_out *= stride
for i in range(repeat):
stride = stride if i == 0 else 1
idx = arch[len(features) - 1]
if def_blocks is None:
block = functools.partial(
InvertedResidual,
kernel_size=(idx // 100) % 10,
expand_ratio=int(idx / 1000.) / 10,
squeeze_excite=idx % 10)
else:
block = def_blocks[idx]
features.append(block(
dim_in, dim_out,
stride=stride,
activation=nn.Hardswish()
if stride_out > 8 else nn.ReLU(True)))
dim_in = dim_out
if stride == 2:
self.feature_dims[id(features[-1])] = features[-1].dim
features.append(nn.Sequential(
*conv_triplet(
dim_in=dim_in,
dim_out=out_channels[-1],
kernel_size=1,
stride=1,
activation=nn.Hardswish())))
self.feature_dims[id(features[-1])] = out_channels[-1]
self.features = nn.Sequential(*features)
self.reset_parameters()
def reset_parameters(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
init.kaiming_normal(m.weight, mode='fan_out')
def forward(self, x):
outputs = []
for layer in self.features:
x = layer(x)
if self.feature_dims.get(id(layer)):
outputs.append(getattr(layer, 'endpoint', x))
return outputs
class ModelSetting(object):
"""Hand-craft model setting."""
# Default NASBlocks definition.
# We use the following hash method:
# ef * 10000 + kernel_size * 100 + se * 1
# e.g., ef=4.0, ks=3, se=True gives index 40301.
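# Decoding example: index 60501 (used in mobilenet_v3 below) unpacks to
#   expand_ratio   = int(60501 / 1000.) / 10 = 6.0
#   kernel_size    = (60501 // 100) % 10     = 5
#   squeeze_excite = 60501 % 10              = 1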
DEFAULT_NAS_BLOCKS_DEF = None
V3 = (
[1, 2, 3, 4, 2, 3],
[1, 2, 2, 2, 1, 2],
[16, 16, 24, 40, 80, 112, 160, 960],
DEFAULT_NAS_BLOCKS_DEF,
)
@registry.backbone.register('mobilenet_v3')
def mobilenet_v3():
return NASMobileNet([10300,
40300, 30300,
30501, 30501, 30501,
60300, 25300, 23300, 23300,
60301, 60301,
60501, 60501, 60501], ModelSetting.V3)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""ResNet backbone."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
from seetadet.core import registry
from seetadet.core.config import cfg
from seetadet.modules import nn
from seetadet.modules import init
from seetadet.utils import env
class BasicBlock(nn.Module):
"""The basic resnet block."""
expansion = 1
def __init__(self, dim_in, dim, stride=1, downsample=None):
super(BasicBlock, self).__init__()
norm = cfg.MODEL.BACKBONE_NORM
self.conv1 = nn.Conv3x3(dim_in, dim, stride)
self.bn1 = nn.get_norm(norm, dim)
self.relu = nn.ReLU(inplace=True)
self.conv2 = nn.Conv3x3(dim, dim)
self.bn2 = nn.get_norm(norm, dim)
self.downsample = downsample
def forward(self, x):
identity = x
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.bn2(out)
if self.downsample is not None:
identity = self.downsample(x)
out += identity
out = self.relu(out)
return out
class Bottleneck(nn.Module):
"""The bottleneck resnet block."""
expansion = 4
def __init__(self, dim_in, dim, stride=1, downsample=None):
super(Bottleneck, self).__init__()
groups = cfg.RESNET.NUM_GROUPS
width_per_group = cfg.RESNET.WIDTH_PER_GROUP
norm = cfg.MODEL.BACKBONE_NORM
width = int(dim * (width_per_group / 64.)) * groups
self.conv1 = nn.Conv1x1(dim_in, width)
self.bn1 = nn.get_norm(norm, width)
self.conv2 = nn.Conv3x3(width, width, stride=stride)
self.bn2 = nn.get_norm(norm, width)
self.conv3 = nn.Conv1x1(width, dim * self.expansion)
self.bn3 = nn.get_norm(norm, dim * self.expansion)
self.relu = nn.ReLU(inplace=True)
self.downsample = downsample
def forward(self, x):
identity = x
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.bn2(out)
out = self.relu(out)
out = self.conv3(out)
out = self.bn3(out)
if self.downsample is not None:
identity = self.downsample(x)
out += identity
out = self.relu(out)
return out
class ResNet(nn.Module):
"""The resnet class."""
def __init__(self, block, layers):
super(ResNet, self).__init__()
dim_in, dims, features = 64, [64, 128, 256, 512], []
self.conv1 = nn.Conv2d(3, dim_in, kernel_size=7,
stride=2, padding=3, bias=False)
self.bn1 = nn.get_norm(cfg.MODEL.BACKBONE_NORM, dim_in)
self.relu = nn.ReLU(inplace=True)
self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
self.feature_dims = collections.OrderedDict(stem=64)
for i, repeat, dim in zip(range(4), layers, dims):
stride = 1 if i == 0 else 2
downsample = None
if stride != 1 or dim_in != dim * block.expansion:
downsample = nn.Sequential(
nn.Conv1x1(dim_in, dim * block.expansion, stride=stride),
nn.get_norm(cfg.MODEL.BACKBONE_NORM, dim * block.expansion))
features.append(block(dim_in, dim, stride, downsample))
dim_in = dim * block.expansion
for j in range(repeat - 1):
features.append(block(dim_in, dim))
setattr(self, 'layer%d' % (i + 1), nn.Sequential(*features[-repeat:]))
self.feature_dims[id(features[-1])] = dim_in
self.features = features
self.last_outputs = None
self.reset_parameters()
def reset_parameters(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
init.kaiming_normal(m.weight, mode='fan_out')
if cfg.MODEL.FREEZE_AT > 0:
self.conv1.apply(env.freeze_module)
self.bn1.apply(env.freeze_module)
for i in range(cfg.MODEL.FREEZE_AT, 1, -1):
getattr(self, 'layer{}'.format(i - 1)).apply(env.freeze_module)
def forward(self, x):
x = self.conv1(x)
x = self.bn1(x)
x = self.relu(x)
x = self.maxpool(x)
outputs = [None]
for layer in self.features:
x = layer(x)
if self.feature_dims.get(id(layer)):
outputs.append(x)
if self.training:
self.last_outputs = outputs
return outputs
def resnet(depth):
if depth == 18:
layers = [2, 2, 2, 2]
elif depth == 34:
layers = [3, 4, 6, 3]
elif depth == 50:
layers = [3, 4, 6, 3]
elif depth == 101:
layers = [3, 4, 23, 3]
elif depth == 152:
layers = [3, 8, 36, 3]
elif depth == 200:
layers = [3, 24, 36, 3]
elif depth == 269:
layers = [3, 30, 48, 8]
else:
raise ValueError('Unsupported depth: %d' % depth)
block = Bottleneck if depth >= 50 else BasicBlock
return ResNet(block, layers)
registry.backbone.register(['res50', 'resnet50', 'resnet_50'],
func=resnet, depth=50)
registry.backbone.register(['res101', 'resnet101', 'resnet_101'],
func=resnet, depth=101)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""RetinaNet head."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import math
import dragon.vm.torch as torch
from seetadet.algo import retinanet
from seetadet.core.config import cfg
from seetadet.modules import det
from seetadet.modules import init
from seetadet.modules import nn
from seetadet.utils import stats
class RetinaNet(nn.Module):
def __init__(self, dim_in=256):
super(RetinaNet, self).__init__()
self.data = dict()
########################################
# RetinaNet outputs #
########################################
self.cls_conv = nn.ModuleList(
nn.Conv3x3(dim_in, dim_in, bias=True)
for _ in range(cfg.RETINANET.NUM_CONVS)
)
self.bbox_conv = nn.ModuleList(
nn.Conv3x3(dim_in, dim_in, bias=True)
for _ in range(cfg.RETINANET.NUM_CONVS)
)
self.cls_dim = len(cfg.MODEL.CLASSES) - 1
anchor_dim = (len(cfg.RETINANET.ASPECT_RATIOS) *
cfg.RETINANET.SCALES_PER_OCTAVE)
self.cls_score = nn.Conv3x3(dim_in, self.cls_dim * anchor_dim, bias=True)
self.bbox_pred = nn.Conv3x3(dim_in, 4 * anchor_dim, bias=True)
self.cls_prob = nn.Sigmoid(inplace=True)
self.relu = nn.ReLU(inplace=True)
self.decoder = det.RetinaNetDecoder()
########################################
# RetinaNet losses #
########################################
self.anchor_target = retinanet.AnchorTarget()
self.cls_loss = nn.SigmoidFocalLoss()
if cfg.RETINANET.BBOX_REG_LOSS_TYPE.lower() == 'l1':
self.bbox_loss = nn.L1Loss(reduction='sum')
elif cfg.RETINANET.BBOX_REG_LOSS_TYPE.lower() == 'giou':
self.bbox_loss = nn.GIoULoss(reduction='sum')
else:
self.bbox_loss = nn.SmoothL1Loss(beta=0.1, reduction='sum')
self.normalizer = stats.ExponentialMovingAverage(decay=0.9)
self.reset_parameters()
def reset_parameters(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
init.normal(m.weight, std=0.01)
init.constant(m.bias, 0)
# Bias prior initialization for Focal Loss.
# For details, see the official code:
# https://github.com/facebookresearch/Detectron
bias_init = -math.log((1 - cfg.PRIOR_PROB) / cfg.PRIOR_PROB)
self.cls_score.bias.fill_(bias_init)
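# E.g., PRIOR_PROB = 0.01 gives bias_init = -log(99) ~= -4.6, so every
# anchor starts with a foreground score of ~0.01 and easy background
# anchors do not swamp the focal loss early in training.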
def compute_outputs(self, features):
"""Compute RetinaNet logits."""
cls_score_wide, bbox_pred_wide = [], []
for j, feature in enumerate(features):
cls_input, bbox_input = feature, feature
for i in range(cfg.RETINANET.NUM_CONVS):
cls_input = self.relu(self.cls_conv[i](cls_input))
bbox_input = self.relu(self.bbox_conv[i](bbox_input))
cls_score_wide.append(self.cls_score(cls_input).view(0, self.cls_dim, -1))
bbox_pred_wide.append(self.bbox_pred(bbox_input).view(0, 4, -1))
if len(features) > 1:
return (torch.cat(cls_score_wide, dim=2),
torch.cat(bbox_pred_wide, dim=2))
else:
return cls_score_wide[0], bbox_pred_wide[0]
def compute_losses(self, **inputs):
"""Compute RetinaNet classification and regression loss."""
self.data = self.anchor_target(**inputs)
bbox_pred = inputs['bbox_pred'].permute(0, 2, 1) \
.index_select((0, 1), self.data['bbox_inds'])
self.normalizer.add_value(self.data['bbox_inds'].size(0))
cls_loss_weight = 1.0 / self.normalizer.running_average()
bbox_loss_weight = (cfg.RETINANET.BBOX_REG_LOSS_WEIGHT /
self.normalizer.running_average())
outputs = collections.OrderedDict([
('cls_loss', self.cls_loss(
inputs['cls_score'],
self.data['labels']) * cls_loss_weight),
('bbox_loss', self.bbox_loss(
bbox_pred,
self.data['bbox_targets'],
self.data['bbox_anchors']) * bbox_loss_weight)])
return outputs
def forward(self, **kwargs):
cls_score, bbox_pred = self.compute_outputs(kwargs['features'])
cls_score, bbox_pred = cls_score.float(), bbox_pred.float()
outputs = collections.OrderedDict([('bbox_pred', bbox_pred)])
if self.training:
outputs.update(
self.compute_losses(
features=kwargs['features'],
cls_score=cls_score,
bbox_pred=bbox_pred,
fg_inds=kwargs['fg_inds'],
bg_inds=kwargs['bg_inds'],
gt_boxes=kwargs['gt_boxes'],
)
)
else:
outputs['detections'] = self.decoder(
kwargs['features'],
self.cls_prob(cls_score).permute(0, 2, 1),
bbox_pred,
kwargs['im_info'],
)
return outputs
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""RPN head."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import dragon.vm.torch as torch
from seetadet.algo import faster_rcnn
from seetadet.core.config import cfg
from seetadet.modules import init
from seetadet.modules import nn
class RPN(nn.Module):
"""Region Proposal Networks for R-CNN series."""
def __init__(self, dim_in=256):
super(RPN, self).__init__()
self.data = {}
##################################
# RPN outputs #
##################################
num_anchors = len(cfg.RPN.ASPECT_RATIOS) * (
len(cfg.RPN.SCALES) if len(cfg.RPN.STRIDES) == 1 else 1)
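# E.g., a single-stride (C4) RPN places len(SCALES) * len(ASPECT_RATIOS)
# anchors per location, while FPN assigns one scale per pyramid level
# and keeps only len(ASPECT_RATIOS) anchors per location.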
self.output = nn.Conv3x3(dim_in, dim_in, bias=True)
self.cls_score = nn.Conv1x1(dim_in, num_anchors, bias=True)
self.bbox_pred = nn.Conv1x1(dim_in, num_anchors * 4, bias=True)
self.relu = nn.ReLU(inplace=True)
##################################
# RPN losses #
##################################
self.anchor_target = faster_rcnn.AnchorTarget()
self.cls_loss = nn.BCEWithLogitsLoss(reduction='mean')
if cfg.RPN.BBOX_REG_LOSS_TYPE.lower() == 'l1':
self.bbox_loss = nn.L1Loss(reduction='sum')
else:
self.bbox_loss = nn.SmoothL1Loss(beta=0.1, reduction='sum')
self.reset_parameters()
def reset_parameters(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
init.normal(m.weight, std=0.01)
init.constant(m.bias, 0)
def compute_outputs(self, features):
"""Compute the RPN logits."""
cls_score_wide, bbox_pred_wide = [], []
for i, feature in enumerate(features):
x = self.relu(self.output(feature))
cls_score_wide.append(self.cls_score(x).view(0, -1))
bbox_pred_wide.append(self.bbox_pred(x).view(0, 4, -1))
if len(features) > 1:
return (torch.cat(cls_score_wide, dim=1),
torch.cat(bbox_pred_wide, dim=2))
else:
return cls_score_wide[0], bbox_pred_wide[0]
def compute_losses(self, **inputs):
"""Compute the RPN classification loss and regression loss."""
self.data = self.anchor_target(**inputs)
cls_score = inputs['cls_score'] \
.index_select((0, 1), self.data['cls_inds'])
bbox_pred = inputs['bbox_pred'].permute(0, 2, 1) \
.index_select((0, 1), self.data['bbox_inds'])
batch_size = cfg.RPN.BATCH_SIZE * cfg.TRAIN.IMS_PER_BATCH
bbox_loss_weight = cfg.RPN.BBOX_REG_LOSS_WEIGHT / float(batch_size)
return collections.OrderedDict([
('rpn_cls_loss', self.cls_loss(
cls_score,
self.data['labels'])),
('rpn_bbox_loss', self.bbox_loss(
bbox_pred,
self.data['bbox_targets'],
self.data['bbox_anchors']) * bbox_loss_weight),
])
def forward(self, **kwargs):
cls_score, bbox_pred = \
self.compute_outputs(kwargs['features'])
outputs = collections.OrderedDict([
('rpn_cls_score', cls_score.float()),
('rpn_bbox_pred', bbox_pred.float()),
])
if self.training:
outputs.update(
self.compute_losses(
features=kwargs['features'],
cls_score=outputs['rpn_cls_score'],
bbox_pred=outputs['rpn_bbox_pred'],
fg_inds=kwargs['fg_inds'],
bg_inds=kwargs['bg_inds'],
gt_boxes=kwargs['gt_boxes'],
)
)
return outputs
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""SSD head."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import dragon.vm.torch as torch
from seetadet.algo import ssd
from seetadet.core.config import cfg
from seetadet.modules import init
from seetadet.modules import nn
from seetadet.utils import stats
class SSD(nn.Module):
def __init__(self, feature_dims):
super(SSD, self).__init__()
self.data = {}
########################################
# SSD outputs #
########################################
self.cls_conv = torch.nn.ModuleList(
nn.Conv3x3(feature_dims[0], feature_dims[0], bias=True)
for _ in range(cfg.SSD.NUM_CONVS))
self.bbox_conv = torch.nn.ModuleList(
nn.Conv3x3(feature_dims[0], feature_dims[0], bias=True)
for _ in range(cfg.SSD.NUM_CONVS))
self.cls_score = nn.ModuleList()
self.bbox_pred = nn.ModuleList()
self.softmax = nn.Softmax(dim=2)
self.relu = nn.ReLU(inplace=True)
self.box_dim = len(cfg.BBOX_REG_WEIGHTS)
if len(feature_dims) != len(cfg.SSD.STRIDES):
# FPN case, all strides share the same feature dim
feature_dims = [feature_dims[0]] * len(cfg.SSD.STRIDES)
for i, dim in enumerate(feature_dims):
ratios = cfg.SSD.ASPECT_RATIOS[i]
if not isinstance(ratios, (tuple, list)):
# Legacy case: all strides share the same ratios.
ratios = cfg.SSD.ASPECT_RATIOS
nc, na = len(cfg.MODEL.CLASSES), len(ratios) + 1
self.cls_score.append(nn.Conv3x3(dim, na * nc, bias=True))
self.bbox_pred.append(nn.Conv3x3(dim, na * self.box_dim, bias=True))
########################################
# SSD losses #
########################################
self.anchor_target = ssd.AnchorTarget()
self.cls_loss = nn.CrossEntropyLoss(reduction='sum')
if cfg.SSD.BBOX_REG_LOSS_TYPE.lower() == 'l1':
self.bbox_loss = nn.L1Loss(reduction='sum')
elif cfg.SSD.BBOX_REG_LOSS_TYPE.lower() == 'giou':
self.bbox_loss = nn.GIoULoss(
reduction='sum', delta_weights=cfg.BBOX_REG_WEIGHTS)
else:
self.bbox_loss = nn.SmoothL1Loss(beta=1.0, reduction='sum')
self.normalizer = stats.ExponentialMovingAverage(decay=0.9)
self.reset_parameters()
def reset_parameters(self):
if cfg.SSD.NUM_CONVS > 0:
for m in self.modules():
if isinstance(m, nn.Conv2d):
init.normal(m.weight, std=0.01)
init.constant(m.bias, 0)
else:
for m in self.modules():
if isinstance(m, nn.Conv2d):
init.normal(m.weight, std=0.001)
init.constant(m.bias, 0)
def compute_outputs(self, features):
"""Compute SSD logits."""
cls_score_wide, bbox_pred_wide = [], []
for i, feature in enumerate(features):
cls_input, bbox_input = feature, feature
for j in range(cfg.SSD.NUM_CONVS):
cls_input = self.relu(self.cls_conv[j](cls_input))
bbox_input = self.relu(self.bbox_conv[j](bbox_input))
cls_score_wide.append(
self.cls_score[i](cls_input)
.permute(0, 2, 3, 1).view(0, -1))
bbox_pred_wide.append(
self.bbox_pred[i](bbox_input)
.permute(0, 2, 3, 1).view(0, -1))
return (torch.cat(cls_score_wide, dim=1)
.view(0, -1, len(cfg.MODEL.CLASSES)),
torch.cat(bbox_pred_wide, dim=1)
.view(0, -1, self.box_dim))
def compute_losses(self, **inputs):
"""Compute tSSD classification and regression loss."""
self.data = self.anchor_target(**inputs)
bbox_pred = inputs['bbox_pred'] \
.index_select((0, 1), self.data['bbox_inds'])
self.normalizer.add_value(self.data['bbox_inds'].size(0))
cls_loss_weight = 1.0 / self.normalizer.running_average()
bbox_loss_weight = (cfg.SSD.BBOX_REG_LOSS_WEIGHT /
self.normalizer.running_average())
return collections.OrderedDict([
('cls_loss', self.cls_loss(
inputs['cls_score'].view(-1, len(cfg.MODEL.CLASSES)),
self.data['labels']) * cls_loss_weight),
('bbox_loss', self.bbox_loss(
bbox_pred,
self.data['bbox_targets'],
self.data['bbox_anchors']) * bbox_loss_weight)
])
def forward(self, **kwargs):
cls_score, bbox_pred = self.compute_outputs(kwargs['features'])
cls_score, bbox_pred = cls_score.float(), bbox_pred.float()
if cls_score.size(1) != self.anchor_target.all_anchors.shape[0]:
raise ValueError('Misalignment between default anchors and features.\n'
'Specify correct <SSD.STRIDES> to avoid this problem.')
outputs = collections.OrderedDict([
('bbox_pred', bbox_pred),
('prior_boxes', self.anchor_target.all_anchors),
])
if self.training:
outputs.update(
self.compute_losses(
cls_score=cls_score,
bbox_pred=bbox_pred,
cls_prob=self.softmax(cls_score.data),
fg_inds=kwargs['fg_inds'],
bg_inds=kwargs['bg_inds'],
gt_boxes=kwargs['gt_boxes'],
)
)
else:
outputs['cls_prob'] = self.softmax(cls_score)
return outputs
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""VGGNet backbone."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
from seetadet.core import registry
from seetadet.modules import init
from seetadet.modules import nn
class VGG(nn.Module):
"""The VGG net class."""
def __init__(self, model_cfg, extra_cfg=None):
super(VGG, self).__init__()
layers, features, dim_in = [], [], 3
self.feature_dims = collections.OrderedDict()
self.feature_norms = nn.ModuleList()
for v in model_cfg:
if v == 'M':
features.append(nn.Sequential(*layers))
if extra_cfg and len(features) == 5:
layers = [nn.MaxPool2d(kernel_size=3, padding=1)]
else:
layers = [nn.MaxPool2d(kernel_size=2, stride=2, ceil_mode=True)]
if len(features) > 1:
self.feature_dims[id(features[-1])] = dim_in
if extra_cfg and len(features) == 4:
self.feature_norms.append(nn.L2Normalize(dim_in, init=20.))
else:
conv2d = nn.Conv2d(dim_in, v, kernel_size=3, padding=1)
layers += [conv2d, nn.ReLU(inplace=True)]
dim_in = v
if extra_cfg:
lowest_lvl = id(features[3])
self.feature_dims = collections.OrderedDict(
[(lowest_lvl, self.feature_dims[lowest_lvl])])
layers += [nn.Conv2d(dim_in, 1024, kernel_size=3, padding=6, dilation=6)]
layers += [nn.ReLU(inplace=True)]
layers += [nn.Conv2d(1024, 1024, kernel_size=1)]
layers += [nn.ReLU(inplace=True)]
features.append(nn.Sequential(*layers))
self.feature_dims[id(features[-1])] = dim_in = 1024
for c, (k, s, p) in extra_cfg:
features.append(nn.Sequential(
nn.Conv2d(dim_in, c, kernel_size=1),
nn.ReLU(inplace=True),
nn.Conv2d(c, c * 2, kernel_size=k, stride=s, padding=p),
nn.ReLU(inplace=True),
))
self.feature_dims[id(features[-1])] = dim_in = c * 2
self.features = nn.Sequential(*features)
self.last_outputs = None
self.reset_parameters()
def reset_parameters(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
init.xavier_uniform(m.weight)
init.constant(m.bias, 0)
def forward(self, x):
outputs = []
for layer in self.features:
x = layer(x)
if self.feature_dims.get(id(layer)):
outputs.append(x)
for i, norm_layer in enumerate(self.feature_norms):
outputs[i] = norm_layer(outputs[i])
if self.training:
self.last_outputs = outputs
return outputs
def vgg16(extra_cfg=None):
model_cfg = [64, 64, 'M',
128, 128, 'M',
256, 256, 256, 'M',
512, 512, 512, 'M',
512, 512, 512, 'M']
return VGG(model_cfg, extra_cfg)
def vgg16_reduced(scale=300):
if scale == 300:
extra_cfg = [(256, (3, 2, 1)),
(128, (3, 2, 1)),
(128, (3, 1, 0)),
(128, (3, 1, 0))]
elif scale == 512:
extra_cfg = [(256, (3, 2, 1)),
(128, (3, 2, 1)),
(128, (3, 2, 1)),
(128, (3, 2, 1)),
(128, (4, 1, 1))]
else:
raise ValueError('Unsupported scale: {}'.format(scale))
return vgg16(extra_cfg)
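# Each extra_cfg entry reads (channels, (kernel_size, stride, padding)) and
# expands into a 1x1 reduction to `channels` followed by a kxk convolution
# that doubles it.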
registry.backbone.register('vgg16_reduced_300', vgg16_reduced, scale=300)
registry.backbone.register('vgg16_reduced_512', vgg16_reduced, scale=512)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Models."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
# Modules.
from seetadet.models import backbones
from seetadet.models import decoders
from seetadet.models import dense_heads
from seetadet.models import detectors
from seetadet.models import roi_heads
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Backbones."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
# Modules
from seetadet.models.backbones import airnet
from seetadet.models.backbones import bifpn
from seetadet.models.backbones import efficientnet
from seetadet.models.backbones import fpn
from seetadet.models.backbones import mobilenet_v2
from seetadet.models.backbones import mobilenet_v3
from seetadet.models.backbones import resnet
from seetadet.models.backbones import repvgg
from seetadet.models.backbones import vgg
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Vanilla FPN neck."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.models.build import BACKBONES
from seetadet.ops.conv import ConvNorm2d
@BACKBONES.register('fpn')
class FPN(nn.Module):
"""FPN to enhance input features."""
def __init__(self, in_dims):
super(FPN, self).__init__()
lateral_conv_module = functools.partial(
ConvNorm2d, norm_type=cfg.FPN.NORM)
output_conv_module = functools.partial(
ConvNorm2d, norm_type=cfg.FPN.NORM, conv_type=cfg.FPN.CONV)
self.dim = cfg.FPN.DIM
self.min_lvl = cfg.FPN.MIN_LEVEL
self.max_lvl = cfg.FPN.MAX_LEVEL
self.highest_lvl = min(self.max_lvl, len(in_dims))
self.coarsest_stride = cfg.BACKBONE.COARSEST_STRIDE
self.out_dims = [self.dim] * (self.max_lvl - self.min_lvl + 1)
self.lateral_conv = nn.ModuleList()
self.output_conv = nn.ModuleList()
for dim in in_dims[self.min_lvl - 1:self.highest_lvl + 1]:
self.lateral_conv += [lateral_conv_module(dim, self.dim, 1)]
self.output_conv += [output_conv_module(self.dim, self.dim, 3)]
if 'rcnn' not in cfg.MODEL.TYPE:
for lvl in range(self.highest_lvl + 1, self.max_lvl + 1):
dim = in_dims[-1] if lvl == self.highest_lvl + 1 else self.dim
self.output_conv += [output_conv_module(dim, self.dim, 3, stride=2)]
def forward(self, features):
features = features[self.min_lvl - 1:self.highest_lvl + 1]
laterals = [conv(x) for conv, x in zip(self.lateral_conv, features)]
for i in range(len(features) - 1, 0, -1):
y, x = laterals[i - 1], laterals[i]
scale = 2 if self.coarsest_stride > 1 else None
size = None if self.coarsest_stride > 1 else y.shape[2:]
y += nn.functional.interpolate(x, size, scale)
outputs = [conv(x) for conv, x in zip(self.output_conv, laterals)]
if len(self.output_conv) <= len(self.lateral_conv):
for _ in range(len(outputs), len(self.out_dims)):
outputs.append(nn.functional.max_pool2d(outputs[-1], 1, stride=2))
else:
outputs.append(self.output_conv[len(outputs)](features[-1]))
for i in range(len(outputs), len(self.out_dims)):
outputs.append(self.output_conv[i](nn.functional.relu(outputs[-1])))
return outputs
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""MobileNetV2 backbone."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.models.build import BACKBONES
from seetadet.ops.conv import ConvNorm2d
def make_divisible(v, divisor=8):
"""Return the divisible value."""
min_value = divisor
new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
if new_v < 0.9 * v:
new_v += divisor
return new_v
class InvertedResidual(nn.Module):
"""Invert residual block."""
def __init__(self, dim_in, dim_out, kernel_size=3, stride=1, expand_ratio=6):
super(InvertedResidual, self).__init__()
conv_module = functools.partial(
ConvNorm2d, norm_type=cfg.BACKBONE.NORM,
activation_type='ReLU6')
self.has_endpoint = stride == 2
self.apply_shortcut = stride == 1 and dim_in == dim_out
self.dim = dim = int(round(dim_in * expand_ratio))
self.conv1 = (conv_module(dim_in, dim, 1)
if expand_ratio > 1 else nn.Identity())
self.conv2 = conv_module(dim, dim, kernel_size, stride, groups=dim)
self.conv3 = conv_module(dim, dim_out, 1, activation_type='')
def forward(self, x):
shortcut = x
x = self.conv1(x)
if self.has_endpoint:
self.endpoint = x
x = self.conv2(x)
x = self.conv3(x)
if self.apply_shortcut:
return x.add_(shortcut)
return x
class MobileNetV2(nn.Module):
"""MobileNetV2 class."""
def __init__(self, depths, dims, strides, expand_ratios, width_mult=1.0):
super(MobileNetV2, self).__init__()
conv_module = functools.partial(
ConvNorm2d, norm_type=cfg.BACKBONE.NORM,
activation_type='ReLU6')
dims = list(map(lambda x: make_divisible(x * width_mult), dims))
self.conv1 = conv_module(3, dims[0], 3, 2)
dim_in, blocks = dims[0], []
self.out_indices, self.out_dims = [], []
for i, (depth, dim) in enumerate(zip(depths, dims[1:-1])):
for j in range(depth):
stride = strides[i] if j == 0 else 1
blocks.append(InvertedResidual(
dim_in, dim, stride=stride,
expand_ratio=expand_ratios[i]))
if blocks[-1].has_endpoint:
self.out_indices.append(len(blocks) - 1)
self.out_dims.append(blocks[-1].dim)
dim_in = dim
setattr(self, 'layer%d' % (i + 1), nn.Sequential(*blocks[-depth:]))
self.conv2 = conv_module(dim_in, dims[-1], 1)
self.blocks = blocks + [self.conv2]
self.out_dims.append(dims[-1])
self.out_indices.append(len(self.blocks) - 1)
def forward(self, x):
x = self.conv1(x)
outputs = []
for i, blk in enumerate(self.blocks):
x = blk(x)
if i in self.out_indices:
outputs.append(blk.__dict__.pop('endpoint', x))
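# pop() consumes the cached pre-downsample expansion feature (falling
# back to x), so the endpoint tensor is not retained across forward passes.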
return outputs
BACKBONES.register(
'mobilenet_v2', MobileNetV2,
dims=(32,) + (16, 24, 32, 64, 96, 160, 320) + (1280,),
depths=(1, 2, 3, 4, 3, 3, 1),
strides=(1, 2, 2, 2, 1, 2, 1),
expand_ratios=(1, 6, 6, 6, 6, 6, 6))
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""MobileNetV3 backbone."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.models.build import BACKBONES
from seetadet.ops.conv import ConvNorm2d
def make_divisible(v, divisor=8):
"""Return the divisible value."""
min_value = divisor
new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
if new_v < 0.9 * v:
new_v += divisor
return new_v
class SqueezeExcite(nn.Module):
"""Squeeze-and-Excitation block."""
def __init__(self, dim_in, dim):
super(SqueezeExcite, self).__init__()
self.conv1 = nn.Conv2d(dim_in, dim, 1)
self.conv2 = nn.Conv2d(dim, dim_in, 1)
self.activation1 = nn.ReLU(True)
self.activation2 = nn.Hardsigmoid(True)
def forward(self, x):
scale = x.mean((2, 3), keepdim=True)
scale = self.activation1(self.conv1(scale))
scale = self.activation2(self.conv2(scale))
return x * scale
class InvertedResidual(nn.Module):
"""Invert residual block."""
def __init__(
self,
dim_in,
dim_out,
kernel_size=3,
stride=1,
expand_ratio=3,
squeeze_ratio=1,
activation_type='ReLU',
):
super(InvertedResidual, self).__init__()
conv_module = functools.partial(
ConvNorm2d, norm_type=cfg.BACKBONE.NORM,
activation_type=activation_type)
self.has_endpoint = stride == 2
self.apply_shortcut = stride == 1 and dim_in == dim_out
self.dim = dim = int(round(dim_in * expand_ratio))
self.conv1 = (conv_module(dim_in, dim, 1)
if expand_ratio > 1 else nn.Identity())
self.conv2 = conv_module(dim, dim, kernel_size, stride, groups=dim)
self.se = (SqueezeExcite(dim, make_divisible(dim * squeeze_ratio))
if squeeze_ratio < 1 else nn.Identity())
self.conv3 = conv_module(dim, dim_out, 1, activation_type='')
def forward(self, x):
shortcut = x
x = self.conv1(x)
if self.has_endpoint:
self.endpoint = x
x = self.conv2(x)
x = self.se(x)
x = self.conv3(x)
if self.apply_shortcut:
return x.add_(shortcut)
return x
class MobileNetV3(nn.Module):
"""MobileNetV3 class."""
def __init__(self, depths, dims, kernel_sizes, strides,
expand_ratios, squeeze_ratios, width_mult=1.0):
super(MobileNetV3, self).__init__()
conv_module = functools.partial(
ConvNorm2d, norm_type=cfg.BACKBONE.NORM,
activation_type='Hardswish')
dims = list(map(lambda x: make_divisible(x * width_mult), dims))
self.conv1 = conv_module(3, dims[0], 3, 2)
dim_in, blocks, coarsest_stride = dims[0], [], 2
self.out_indices, self.out_dims = [], []
for i, (depth, dim) in enumerate(zip(depths, dims[1:])):
coarsest_stride *= strides[i]
layer_expand_ratios = expand_ratios[i]
if not isinstance(layer_expand_ratios, (tuple, list)):
layer_expand_ratios = [layer_expand_ratios]
layer_expand_ratios = list(layer_expand_ratios)
layer_expand_ratios += ([layer_expand_ratios[-1]] *
(depth - len(layer_expand_ratios)))
for j in range(depth):
blocks.append(InvertedResidual(
dim_in, dim,
kernel_size=kernel_sizes[i],
stride=strides[i] if j == 0 else 1,
expand_ratio=layer_expand_ratios[j],
squeeze_ratio=squeeze_ratios[i],
activation_type='Hardswish'
if coarsest_stride >= 16 else 'ReLU'))
if blocks[-1].has_endpoint:
self.out_indices.append(len(blocks) - 1)
self.out_dims.append(blocks[-1].dim)
dim_in = dim
setattr(self, 'layer%d' % (i + 1), nn.Sequential(*blocks[-depth:]))
self.conv2 = conv_module(dim_in, blocks[-1].dim, 1)
self.blocks = blocks + [self.conv2]
self.out_dims.append(blocks[-1].dim)
self.out_indices.append(len(self.blocks) - 1)
def forward(self, x):
x = self.conv1(x)
outputs = []
for i, blk in enumerate(self.blocks):
x = blk(x)
if i in self.out_indices:
outputs.append(blk.__dict__.pop('endpoint', x))
return outputs
BACKBONES.register(
'mobilenet_v3_large', MobileNetV3,
dims=(16,) + (16, 24, 40, 80, 112, 160),
depths=(1, 2, 3, 4, 2, 3),
kernel_sizes=(3, 3, 5, 3, 3, 5),
strides=(1, 2, 2, 2, 1, 2),
expand_ratios=(1, (4, 3), 3, (6, 2.5, 2.3, 2.3), 6, 6),
squeeze_ratios=(1, 1, 0.25, 1, 0.25, 0.25))
BACKBONES.register(
'mobilenet_v3_small', MobileNetV3,
dims=(16,) + (16, 24, 40, 48, 96),
depths=(1, 2, 3, 2, 3),
kernel_sizes=(3, 3, 5, 5, 5),
strides=(2, 2, 2, 1, 2),
expand_ratios=(1, (4.5, 88. / 24), (4, 6, 6), 3, 6),
squeeze_ratios=(0.25, 1, 0.25, 0.25, 0.25))
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""ResNet backbone."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import itertools
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.core.training.utils import freeze_module
from seetadet.models.build import BACKBONES
from seetadet.ops.build import build_norm
class BasicBlock(nn.Module):
"""The basic resnet block."""
expansion = 1
def __init__(self, dim_in, dim, stride=1, downsample=None):
super(BasicBlock, self).__init__()
self.conv1 = nn.Conv2d(dim_in, dim, 3, stride, padding=1, bias=False)
self.bn1 = build_norm(dim, cfg.BACKBONE.NORM)
self.relu = nn.ReLU(True)
self.conv2 = nn.Conv2d(dim, dim, 3, padding=1, bias=False)
self.bn2 = build_norm(dim, cfg.BACKBONE.NORM)
self.downsample = downsample
def forward(self, x):
shortcut = x
x = self.relu(self.bn1(self.conv1(x)))
x = self.bn2(self.conv2(x))
if self.downsample is not None:
shortcut = self.downsample(shortcut)
return self.relu(x.add_(shortcut))
class Bottleneck(nn.Module):
"""The bottleneck resnet block."""
expansion = 4
groups, width_per_group = 1, 64
def __init__(self, dim_in, dim, stride=1, downsample=None):
super(Bottleneck, self).__init__()
width = int(dim * (self.width_per_group / 64.)) * self.groups
self.conv1 = nn.Conv2d(dim_in, width, 1, bias=False)
self.bn1 = build_norm(width, cfg.BACKBONE.NORM)
self.conv2 = nn.Conv2d(width, width, 3, stride, padding=1, bias=False)
self.bn2 = build_norm(width, cfg.BACKBONE.NORM)
self.conv3 = nn.Conv2d(width, dim * self.expansion, 1, bias=False)
self.bn3 = build_norm(dim * self.expansion, cfg.BACKBONE.NORM)
self.relu = nn.ReLU(True)
self.downsample = downsample
def forward(self, x):
shortcut = x
x = self.relu(self.bn1(self.conv1(x)))
x = self.relu(self.bn2(self.conv2(x)))
x = self.bn3(self.conv3(x))
if self.downsample is not None:
shortcut = self.downsample(shortcut)
return self.relu(x.add_(shortcut))
class ResNet(nn.Module):
"""ResNet class."""
def __init__(self, block, depths):
super(ResNet, self).__init__()
dim_in, dims, blocks = 64, [64, 128, 256, 512], []
self.out_indices = [v - 1 for v in itertools.accumulate(depths)]
self.out_dims = [dim_in] + [v * block.expansion for v in dims]
self.conv1 = nn.Conv2d(3, dim_in, 7, 2, padding=3, bias=False)
self.bn1 = build_norm(dim_in, cfg.BACKBONE.NORM)
self.relu = nn.ReLU(True)
self.maxpool = nn.MaxPool2d(3, 2, padding=1)
for i, depth, dim in zip(range(4), depths, dims):
downsample, stride = None, 1 if i == 0 else 2
if stride != 1 or dim_in != dim * block.expansion:
downsample = nn.Sequential(
nn.Conv2d(dim_in, dim * block.expansion, 1, stride, bias=False),
build_norm(dim * block.expansion, cfg.BACKBONE.NORM))
blocks.append(block(dim_in, dim, stride, downsample))
dim_in = dim * block.expansion
for _ in range(depth - 1):
blocks.append(block(dim_in, dim))
setattr(self, 'layer%d' % (i + 1), nn.Sequential(*blocks[-depth:]))
self.blocks = blocks
self.reset_parameters()
def reset_parameters(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.kaiming_normal_(
m.weight, mode='fan_out', nonlinearity='relu')
num_freeze_stages = cfg.BACKBONE.FREEZE_AT
if num_freeze_stages > 0:
self.conv1.apply(freeze_module)
self.bn1.apply(freeze_module)
for i in range(num_freeze_stages - 1, 0, -1):
getattr(self, 'layer%d' % i).apply(freeze_module)
def forward(self, x):
x = self.relu(self.bn1(self.conv1(x)))
x = self.maxpool(x)
outputs = [None]
for i, blk in enumerate(self.blocks):
x = blk(x)
if i in self.out_indices:
outputs.append(x)
return outputs
BACKBONES.register('resnet18', func=ResNet, block=BasicBlock, depths=[2, 2, 2, 2])
BACKBONES.register('resnet34', func=ResNet, block=BasicBlock, depths=[3, 4, 6, 3])
BACKBONES.register('resnet50', func=ResNet, block=Bottleneck, depths=[3, 4, 6, 3])
BACKBONES.register('resnet101', func=ResNet, block=Bottleneck, depths=[3, 4, 23, 3])
BACKBONES.register('resnet152', func=ResNet, block=Bottleneck, depths=[3, 8, 36, 3])
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""VGGNet backbone."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import itertools
import functools
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.models.build import BACKBONES
from seetadet.ops.build import build_norm
from seetadet.ops.conv import ConvNorm2d
from seetadet.ops.normalization import L2Norm
class VGGBlock(nn.Module):
"""The VGG block."""
def __init__(self, dim_in, dim, downsample=None):
super(VGGBlock, self).__init__()
self.conv = nn.Conv2d(dim_in, dim, 3, padding=1,
bias=not cfg.BACKBONE.NORM)
self.bn = build_norm(dim, cfg.BACKBONE.NORM)
self.relu = nn.ReLU(True)
self.downsample = downsample
def forward(self, x):
if self.downsample is not None:
x = self.downsample(x)
return self.relu(self.bn(self.conv(x)))
class VGG(nn.Module):
"""VGGNet."""
def __init__(self, depths):
super(VGG, self).__init__()
dim_in, dims, blocks = 3, [64, 128, 256, 512, 512], []
self.out_indices = [v - 1 for v in itertools.accumulate(depths)][1:]
self.out_dims = dims[1:]
for i, (depth, dim) in enumerate(zip(depths, dims)):
downsample = nn.MaxPool2d(2, 2, ceil_mode=True) if i > 0 else None
blocks.append(VGGBlock(dim_in, dim, downsample))
for _ in range(depth - 1):
blocks.append(VGGBlock(dim, dim))
setattr(self, 'layer%d' % i, nn.Sequential(*blocks[-depth:]))
dim_in = dim
self.blocks = blocks
self.reset_parameters()
def reset_parameters(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.kaiming_normal_(
m.weight, mode='fan_out', nonlinearity='relu')
def forward(self, x):
outputs = []
for i, blk in enumerate(self.blocks):
x = blk(x)
if i in self.out_indices:
outputs.append(x)
return outputs
class VGGFCN(VGG):
"""Fully convolutional VGGNet in SSD."""
def __init__(self, depths):
super(VGGFCN, self).__init__(depths)
dim_in, out_index = self.out_dims[-1], self.out_indices[-1]
self.blocks.append(nn.Sequential(
nn.MaxPool2d(3, 1, padding=1),  # pool5: 3x3, stride 1 keeps spatial size
nn.Conv2d(dim_in, 1024, 3, padding=6, dilation=6),
nn.ReLU(True)))
self.blocks.append(nn.Sequential(nn.Conv2d(1024, 1024, 1), nn.ReLU(True)))
self.layer4.add_module(str(len(self.layer4)), self.blocks[-2])
self.layer4.add_module(str(len(self.layer4)), self.blocks[-1])
self.out_dims = [self.out_dims[-2], 1024] # conv4_3, fc7
self.out_indices = [self.out_indices[-2], out_index + 2] # 9, 14
self.norm = L2Norm(dim_in, init=20.0)
def forward(self, x):
outputs = super(VGGFCN, self).forward(x)
outputs[0] = self.norm(outputs[0])
return outputs
class SSDNeck(nn.Module):
"""Feature Pyramid Network."""
def __init__(self, in_dims, out_dims, kernel_sizes, strides, paddings):
super(SSDNeck, self).__init__()
self.out_dims = list(in_dims[-2:]) + list(out_dims)
dim_in, self.blocks = in_dims[-1], nn.ModuleList()
conv_module = functools.partial(
ConvNorm2d, conv_type=cfg.FPN.CONV,
norm_type=cfg.FPN.NORM, activation_type=cfg.FPN.ACTIVATION)
for dim, kernel_size, stride, padding in zip(
out_dims, kernel_sizes, strides, paddings):
self.blocks.append(conv_module(dim_in, dim // 2, 1))
self.blocks.append(conv_module(dim // 2, dim, kernel_size, stride, padding))
dim_in = dim
def forward(self, features):
x, outputs = features[-1], features[-2:]
for i, blk in enumerate(self.blocks):
x = blk(x)
if i % 2 > 0:
outputs.append(x)
return outputs
BACKBONES.register('vgg16', VGG, depths=(2, 2, 3, 3, 3))
BACKBONES.register('vgg16_fcn', VGGFCN, depths=(2, 2, 3, 3, 3))
BACKBONES.register(
'ssd300', SSDNeck,
out_dims=(512, 256, 256, 256),
kernel_sizes=(3, 3, 3, 3),
strides=(2, 2, 1, 1),
paddings=(1, 1, 0, 0))
BACKBONES.register(
'ssd512', SSDNeck,
out_dims=(512, 256, 256, 256, 256),
kernel_sizes=(3, 3, 3, 3, 4),
strides=(2, 2, 2, 2, 1),
paddings=(1, 1, 1, 1, 1))
BACKBONES.register(
'ssdlite', SSDNeck,
out_dims=(512, 256, 256, 128),
kernel_sizes=(3, 3, 3, 3),
strides=(2, 2, 2, 2),
paddings=(1, 1, 1, 1))
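
# Editor's sketch (illustration only): output dimensions of the `ssd300` neck
# above when paired with the `vgg16_fcn` backbone, whose `out_dims` are
# [512, 1024] (conv4_3 and fc7). The neck keeps the last two backbone maps and
# appends one map per extra 1x1/3x3 conv pair.
def _ssd300_neck_demo():  # hypothetical helper, illustration only
    neck = SSDNeck(in_dims=[512, 1024],
                   out_dims=(512, 256, 256, 256),
                   kernel_sizes=(3, 3, 3, 3),
                   strides=(2, 2, 1, 1),
                   paddings=(1, 1, 0, 0))
    assert neck.out_dims == [512, 1024, 512, 256, 256, 256]
    return neck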
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Build for models."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.core.registry import Registry
from seetadet.utils.profiler import Timer
BACKBONES = Registry('backbones')
DETECTORS = Registry('detectors')
def build_backbone():
"""Build the backbone."""
backbone_types = cfg.BACKBONE.TYPE.lower().split('.')
backbone = BACKBONES.get(backbone_types[0])()
backbone_dims = backbone.out_dims
neck = nn.Identity()
if len(backbone_types) > 1:
neck = BACKBONES.get(backbone_types[1])(backbone_dims)
else:
neck.out_dims = backbone_dims
return backbone, neck
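
# Editor's sketch (illustration only): `cfg.BACKBONE.TYPE` composes a backbone
# and an optional neck with a dotted name. For example, 'vgg16_fcn.ssd300'
# builds the `vgg16_fcn` backbone and feeds its `out_dims` into the `ssd300`
# neck, while a plain 'resnet50' leaves the neck as `nn.Identity()`.
def _build_backbone_demo():  # hypothetical helper, illustration only
    cfg.BACKBONE.TYPE = 'vgg16_fcn.ssd300'
    backbone, neck = build_backbone()
    return backbone, neck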
def build_detector(device=None, weights=None, training=False):
"""Create a detector instance.
Parameters
----------
device : int, optional
The index of compute device.
weights : str, optional
The path of weight file.
training : bool, optional, default=False
Whether to build the detector for training.
"""
model_cls = DETECTORS.get(cfg.MODEL.TYPE)
if model_cls is None:
    raise ValueError('Unknown detector: ' + cfg.MODEL.TYPE)
model = model_cls()
if weights is not None:
model.load_weights(weights, strict=True)
if device is not None:
model.cuda(device)
if not training:
model.eval()
model.optimize_for_inference()
model.timers = collections.defaultdict(Timer)
return model
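
# Editor's usage sketch (illustration only; assumes `cfg.MODEL.TYPE` names a
# registered detector and `weights_path` points to a valid checkpoint).
def _build_eval_detector_demo(weights_path):  # hypothetical helper
    # Weights loaded, moved to GPU 0, switched to eval mode, and fused
    # for inference by `optimize_for_inference()`.
    return build_detector(device=0, weights=weights_path, training=False)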
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""RetinaNet decoder."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm.torch import autograd
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.data.anchors.rpn import AnchorGenerator
class RetinaNetDecoder(nn.Module):
"""Decode predictions from retinanet."""
def __init__(self):
super(RetinaNetDecoder, self).__init__()
self.anchor_generator = AnchorGenerator(
strides=cfg.ANCHOR_GENERATOR.STRIDES,
sizes=cfg.ANCHOR_GENERATOR.SIZES,
aspect_ratios=cfg.ANCHOR_GENERATOR.ASPECT_RATIOS,
scales_per_octave=3)
self.pre_nms_top_n = cfg.RETINANET.PRE_NMS_TOP_N
self.score_thresh = float(cfg.TEST.SCORE_THRESH)
def forward(self, inputs):
input_tags = ['cls_score', 'bbox_pred', 'im_info', 'grid_info']
return autograd.Function.apply(
'RetinaNetDecoder',
inputs['cls_score'].device,
inputs=[inputs[k] for k in input_tags],
strides=self.anchor_generator.strides,
ratios=self.anchor_generator.aspect_ratios[0],
scales=self.anchor_generator.scales[0],
pre_nms_top_n=self.pre_nms_top_n,
score_thresh=self.score_thresh,
)
autograd.Function.register(
'RetinaNetDecoder', lambda **kwargs: {
'strides': kwargs.get('strides', []),
'ratios': kwargs.get('ratios', []),
'scales': kwargs.get('scales', []),
'pre_nms_top_n': kwargs.get('pre_nms_top_n', 1),
'score_thresh': kwargs.get('score_thresh', 0.),
'check_device': False,
})
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""RPN decoder."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm.torch import autograd
from dragon.vm.torch import nn
import numpy as np
from seetadet.core.config import cfg
from seetadet.data.anchors.rpn import AnchorGenerator
from seetadet.utils.bbox import bbox_transform_inv
from seetadet.utils.bbox import clip_boxes
from seetadet.utils.nms import gpu_nms
class RPNDecoder(nn.Module):
"""Generate proposal regions from RPN."""
def __init__(self):
super(RPNDecoder, self).__init__()
self.anchor_generator = AnchorGenerator(
strides=cfg.ANCHOR_GENERATOR.STRIDES,
sizes=cfg.ANCHOR_GENERATOR.SIZES,
aspect_ratios=cfg.ANCHOR_GENERATOR.ASPECT_RATIOS)
self.min_level = cfg.FRCNN.MIN_LEVEL
self.max_level = cfg.FRCNN.MAX_LEVEL
self.pre_nms_top_n = {True: cfg.RPN.PRE_NMS_TOP_N_TRAIN,
False: cfg.RPN.PRE_NMS_TOP_N_TEST}
self.post_nms_top_n = {True: cfg.RPN.POST_NMS_TOP_N_TRAIN,
False: cfg.RPN.POST_NMS_TOP_N_TEST}
self.nms_thresh = float(cfg.RPN.NMS_THRESH)
def decode_proposals(self, scores, deltas, anchors, im_info):
pre_nms_top_n = self.pre_nms_top_n[self.training]
post_nms_top_n = self.post_nms_top_n[self.training]
if pre_nms_top_n <= 0 or pre_nms_top_n >= len(scores):
order = np.argsort(-scores.squeeze())
else:
# Avoid sorting possibly large arrays: first partition to get the
# top K unsorted, then sort just those (~20x faster for 200k scores).
inds = np.argpartition(-scores.squeeze(), pre_nms_top_n)[:pre_nms_top_n]
order = np.argsort(-scores[inds].squeeze())
order = inds[order]
scores, deltas, anchors = scores[order], deltas[order], anchors[order]
# Convert anchors into proposals.
proposals = bbox_transform_inv(anchors, deltas)
proposals = clip_boxes(proposals, im_info[:2])
# Apply NMS.
keep = gpu_nms(np.hstack((proposals, scores)), self.nms_thresh)
keep = keep[:post_nms_top_n] if post_nms_top_n > 0 else keep
return proposals[keep, :].astype('float32', copy=False)
def forward_train(self, inputs):
shapes = [x[:2] for x in inputs['grid_info']]
anchors = self.anchor_generator.get_anchors(shapes)
cls_score = inputs['cls_score'].numpy()
bbox_pred = inputs['bbox_pred'].permute(0, 2, 1).numpy()
all_rois, batch_size = [], cls_score.shape[0]
for batch_ind in range(batch_size):
scores = cls_score[batch_ind].reshape((-1, 1))
deltas = bbox_pred[batch_ind]
im_info = inputs['im_info'][batch_ind]
proposals = self.decode_proposals(scores, deltas, anchors, im_info)
batch_inds = np.full((proposals.shape[0], 1), batch_ind, 'float32')
all_rois.append(np.hstack((batch_inds, proposals)))
return np.concatenate(all_rois)
def forward(self, inputs):
if self.training:
return self.forward_train(inputs)
input_tags = ['cls_score', 'bbox_pred', 'im_info', 'grid_info']
return autograd.Function.apply(
'RPNDecoder',
inputs['cls_score'].device,
inputs=[inputs[k] for k in input_tags],
outputs=[None] * (self.max_level - self.min_level + 1),
strides=self.anchor_generator.strides,
ratios=self.anchor_generator.aspect_ratios[0],
scales=self.anchor_generator.scales[0],
min_level=self.min_level,
max_level=self.max_level,
pre_nms_top_n=self.pre_nms_top_n[False],
post_nms_top_n=self.post_nms_top_n[False],
nms_thresh=self.nms_thresh,
)
autograd.Function.register(
'RPNDecoder', lambda **kwargs: {
'strides': kwargs.get('strides', []),
'ratios': kwargs.get('ratios', []),
'scales': kwargs.get('scales', []),
'pre_nms_top_n': kwargs.get('pre_nms_top_n', 6000),
'post_nms_top_n': kwargs.get('post_nms_top_n', 1000),
'nms_thresh': kwargs.get('nms_thresh', 0.7),
'min_level': kwargs.get('min_level', 2),
'max_level': kwargs.get('max_level', 5),
'check_device': False,
})
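
# Editor's sketch (assumes numpy; illustration only) -- the partition-then-sort
# trick used in `RPNDecoder.decode_proposals` above: `argpartition` isolates
# the top K in O(n), so only K items need the full O(k log k) sort afterwards.
def _topk_scores_demo():  # hypothetical helper, illustration only
    scores = np.random.rand(200000).astype('float32')
    k = 6000
    inds = np.argpartition(-scores, k)[:k]   # top K, in arbitrary order
    order = inds[np.argsort(-scores[inds])]  # sort only the K survivors
    ref = np.argsort(-scores)[:k]            # reference: full sort
    assert np.array_equal(scores[order], scores[ref])
    return order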
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""RetinaNet head."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
import math
from dragon.vm import torch
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.data.targets.retinanet import AnchorTargets
from seetadet.ops.build import build_activation
from seetadet.ops.build import build_loss
from seetadet.ops.build import build_norm
from seetadet.ops.conv import ConvNorm2d
from seetadet.ops.fusion import fuse_conv_bn
class RetinaNetHead(nn.Module):
"""RetinaNet head."""
def __init__(self, in_dims):
super(RetinaNetHead, self).__init__()
conv_module = functools.partial(
ConvNorm2d, dim_in=in_dims[0], dim_out=in_dims[0],
kernel_size=3, conv_type=cfg.RETINANET.CONV)
norm_module = functools.partial(build_norm, norm_type=cfg.RETINANET.NORM)
self.conv_module = conv_module
self.dim_cls = len(cfg.MODEL.CLASSES) - 1
self.cls_conv = nn.ModuleList(
conv_module() for _ in range(cfg.RETINANET.NUM_CONV))
self.bbox_conv = nn.ModuleList(
conv_module() for _ in range(cfg.RETINANET.NUM_CONV))
self.cls_norm = nn.ModuleList()
self.bbox_norm = nn.ModuleList()
for _ in range(len(self.cls_conv)):
self.cls_norm.append(nn.ModuleList())
self.bbox_norm.append(nn.ModuleList())
for _ in range(len(in_dims)):
self.cls_norm[-1].append(norm_module(in_dims[0]))
self.bbox_norm[-1].append(norm_module(in_dims[0]))
self.targets = AnchorTargets()
num_anchors = self.targets.generator.num_cell_anchors(0)
self.cls_score = conv_module(dim_out=self.dim_cls * num_anchors)
self.bbox_pred = conv_module(dim_out=4 * num_anchors)
self.activation = build_activation(cfg.RETINANET.ACTIVATION, inplace=True)
self.cls_loss = build_loss('sigmoid_focal')
self.bbox_loss = build_loss(cfg.RETINANET.BBOX_REG_LOSS_TYPE, beta=0.1)
self.reset_parameters()
def reset_parameters(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.normal_(m.weight, std=0.01)
# Bias prior initialization for focal loss.
for name, param in self.cls_score.named_parameters():
if name.endswith('bias'):
nn.init.constant_(param, -math.log((1 - 0.01) / 0.01))
def optimize_for_inference(self):
"""Optimize modules for inference."""
if hasattr(self.cls_norm[0][0], 'momentum'):
cls_conv = nn.ModuleList()
bbox_conv = nn.ModuleList()
for i in range(len(self.cls_norm)):
cls_conv.append(nn.ModuleList())
bbox_conv.append(nn.ModuleList())
cls_state = self.cls_conv[i].state_dict()
bbox_state = self.bbox_conv[i].state_dict()
for j in range(len(self.cls_norm[i])):
cls_conv[i].append(self.conv_module()._apply(
lambda t: t.to(self.cls_norm[i][j].weight.device)))
bbox_conv[i].append(self.conv_module()._apply(
lambda t: t.to(self.bbox_norm[i][j].weight.device)))
cls_conv[i][j].load_state_dict(cls_state)
bbox_conv[i][j].load_state_dict(bbox_state)
fuse_conv_bn(cls_conv[i][j][-1], self.cls_norm[i][j])
fuse_conv_bn(bbox_conv[i][j][-1], self.bbox_norm[i][j])
self._modules['cls_conv'] = cls_conv
self._modules['bbox_conv'] = bbox_conv
def get_outputs(self, inputs):
"""Return the outputs."""
features = list(inputs['features'])
cls_score, bbox_pred = [], []
for j, feature in enumerate(features):
cls_input, box_input = feature, feature
for i in range(len(self.cls_conv)):
if isinstance(self.cls_conv[i], nn.ModuleList):
cls_input = self.cls_conv[i][j](cls_input)
box_input = self.bbox_conv[i][j](box_input)
else:
cls_input = self.cls_conv[i](cls_input)
box_input = self.bbox_conv[i](box_input)
cls_input = self.activation(self.cls_norm[i][j](cls_input))
box_input = self.activation(self.bbox_norm[i][j](box_input))
cls_score.append(self.cls_score(cls_input).reshape_((0, self.dim_cls, -1)))
bbox_pred.append(self.bbox_pred(box_input).reshape_((0, 4, -1)))
cls_score = torch.cat(cls_score, 2) if len(features) > 1 else cls_score[0]
bbox_pred = torch.cat(bbox_pred, 2) if len(features) > 1 else bbox_pred[0]
return {'cls_score': cls_score, 'bbox_pred': bbox_pred}
def get_losses(self, inputs, targets):
"""Return the losses."""
bbox_pred = inputs['bbox_pred'].permute(0, 2, 1)
bbox_pred = bbox_pred.flatten_(0, 1)[targets['bbox_inds']]
cls_loss = self.cls_loss(inputs['cls_score'], targets['labels'])
bbox_loss = self.bbox_loss(bbox_pred, targets['bbox_targets'],
targets['bbox_anchors'])
normalizer = targets['bbox_inds'].size(0)
cls_loss_weight = 1.0 / normalizer
bbox_loss_weight = cfg.RETINANET.BBOX_REG_LOSS_WEIGHT / normalizer
cls_loss = cls_loss.mul_(cls_loss_weight)
bbox_loss = bbox_loss.mul_(bbox_loss_weight)
return {'cls_loss': cls_loss, 'bbox_loss': bbox_loss}
def forward(self, inputs):
outputs = self.get_outputs(inputs)
if self.training:
targets = self.targets.compute(**inputs)
logits = {'cls_score': outputs['cls_score'].float(),
'bbox_pred': outputs['bbox_pred'].float()}
return self.get_losses(logits, targets)
else:
cls_score = outputs['cls_score'].permute(0, 2, 1)
cls_score = nn.functional.sigmoid(cls_score, inplace=True)
return {'cls_score': cls_score.float(),
'bbox_pred': outputs['bbox_pred'].float()}
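
# Editor's sketch (illustration only): the bias prior used in
# `reset_parameters` above. Setting the classification bias to
# -log((1 - pi) / pi) with pi = 0.01 makes sigmoid(score) start at ~0.01
# for every anchor, so training is not swamped by easy background loss in
# the first iterations (Lin et al., "Focal Loss for Dense Object Detection").
def _focal_bias_prior_demo():  # hypothetical helper, illustration only
    pi = 0.01
    bias = -math.log((1 - pi) / pi)
    prob = 1. / (1. + math.exp(-bias))  # sigmoid(bias)
    assert abs(prob - pi) < 1e-12
    return bias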
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""RPN head."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm import torch
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.data.targets.rpn import AnchorTargets
from seetadet.ops.build import build_loss
class RPNHead(nn.Module):
"""RPN head."""
def __init__(self, in_dims):
super(RPNHead, self).__init__()
self.targets = AnchorTargets()
num_anchors = self.targets.generator.num_cell_anchors(0)
self.output_conv = nn.Conv2d(in_dims[0], in_dims[0], 3, padding=1)
self.cls_score = nn.Conv2d(in_dims[0], num_anchors, 1)
self.bbox_pred = nn.Conv2d(in_dims[0], num_anchors * 4, 1)
self.activation = nn.ReLU(inplace=True)
self.cls_loss = nn.BCEWithLogitsLoss(reduction='mean')
self.bbox_loss = build_loss(cfg.RPN.BBOX_REG_LOSS_TYPE, beta=0.1)
self.reset_parameters()
def reset_parameters(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.normal_(m.weight, std=0.01)
def get_outputs(self, inputs):
"""Return the outputs."""
features = list(inputs['features'])
cls_score, bbox_pred = [], []
for x in features:
x = self.activation(self.output_conv(x))
cls_score.append(self.cls_score(x).reshape_((0, -1)))
bbox_pred.append(self.bbox_pred(x).reshape_((0, 4, -1)))
cls_score = torch.cat(cls_score, 1) if len(features) > 1 else cls_score[0]
bbox_pred = torch.cat(bbox_pred, 2) if len(features) > 1 else bbox_pred[0]
return {'rpn_cls_score': cls_score, 'rpn_bbox_pred': bbox_pred}
def get_losses(self, inputs, targets):
"""Return the losses."""
bbox_pred = inputs['bbox_pred'].permute(0, 2, 1)
bbox_pred = bbox_pred.index_select((0, 1), targets['bbox_inds'])
cls_score = inputs['cls_score'].index_select((0, 1), targets['cls_inds'])
cls_loss = self.cls_loss(cls_score, targets['labels'])
bbox_loss = self.bbox_loss(bbox_pred, targets['bbox_targets'],
targets['bbox_anchors'])
normalizer = cfg.RPN.BATCH_SIZE * cfg.TRAIN.IMS_PER_BATCH
bbox_loss_weight = cfg.RPN.BBOX_REG_LOSS_WEIGHT / normalizer
bbox_loss = bbox_loss.mul_(bbox_loss_weight)
return {'rpn_cls_loss': cls_loss, 'rpn_bbox_loss': bbox_loss}
def forward(self, inputs):
outputs = self.get_outputs(inputs)
rpn_cls_score = outputs.pop('rpn_cls_score').float()
outputs['rpn_bbox_pred'] = outputs['rpn_bbox_pred'].float()
outputs['rpn_cls_score'] = nn.functional.sigmoid(rpn_cls_score.data)
if self.training:
targets = self.targets.compute(**inputs)
logits = {'cls_score': rpn_cls_score,
'bbox_pred': outputs['rpn_bbox_pred']}
outputs.update(self.get_losses(logits, targets))
return outputs
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""SSD head."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
from dragon.vm import torch
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.data.targets.ssd import AnchorTargets
from seetadet.ops.build import build_loss
from seetadet.ops.conv import ConvNorm2d
class SSDHead(nn.Module):
"""SSD head."""
def __init__(self, in_dims):
super(SSDHead, self).__init__()
self.targets = AnchorTargets()
self.cls_score = nn.ModuleList()
self.bbox_pred = nn.ModuleList()
self.num_classes = len(cfg.MODEL.CLASSES)
conv_module = nn.Conv2d
if cfg.FPN.CONV == 'SepConv2d':
conv_module = functools.partial(ConvNorm2d, conv_type='SepConv2d')
conv_module = functools.partial(conv_module, kernel_size=3, padding=1)
for i, dim in enumerate(in_dims):
num_anchors = self.targets.generator.num_cell_anchors(i)
self.cls_score.append(conv_module(dim, num_anchors * self.num_classes))
self.bbox_pred.append(conv_module(dim, num_anchors * 4))
self.cls_loss = nn.CrossEntropyLoss(ignore_index=-1, reduction='sum')
self.bbox_loss = build_loss(cfg.SSD.BBOX_REG_LOSS_TYPE)
def get_outputs(self, inputs):
"""Return the outputs."""
features = list(inputs['features'])
cls_score, bbox_pred = [], []
for i, x in enumerate(features):
cls_score.append(self.cls_score[i](x).permute(0, 2, 3, 1).flatten_(1))
bbox_pred.append(self.bbox_pred[i](x).permute(0, 2, 3, 1).flatten_(1))
cls_score = torch.cat(cls_score, 1) if len(features) > 1 else cls_score[0]
bbox_pred = torch.cat(bbox_pred, 1) if len(features) > 1 else bbox_pred[0]
cls_score = cls_score.reshape_((0, -1, self.num_classes))
bbox_pred = bbox_pred.reshape_((0, -1, 4))
return {'cls_score': cls_score, 'bbox_pred': bbox_pred}
def get_losses(self, inputs, targets):
"""Return the losses."""
cls_score = inputs['cls_score'].flatten_(0, 1)
bbox_pred = inputs['bbox_pred'].flatten_(0, 1)
bbox_pred = bbox_pred[targets['bbox_inds']]
cls_loss = self.cls_loss(cls_score, targets['labels'])
bbox_loss = self.bbox_loss(bbox_pred, targets['bbox_targets'],
targets['bbox_anchors'])
normalizer = targets['bbox_inds'].size(0)
cls_loss_weight = 1.0 / normalizer
bbox_loss_weight = cfg.SSD.BBOX_REG_LOSS_WEIGHT / normalizer
cls_loss = cls_loss.mul_(cls_loss_weight)
bbox_loss = bbox_loss.mul_(bbox_loss_weight)
return {'cls_loss': cls_loss, 'bbox_loss': bbox_loss}
def forward(self, inputs):
outputs = self.get_outputs(inputs)
cls_score = outputs['cls_score']
if self.training:
cls_score_data = nn.functional.softmax(cls_score.data, dim=2)
targets = self.targets.compute(cls_score=cls_score_data, **inputs)
logits = {'cls_score': cls_score.float(),
'bbox_pred': outputs['bbox_pred'].float()}
return self.get_losses(logits, targets)
else:
cls_score = nn.functional.softmax(cls_score, dim=2, inplace=True)
return {'cls_score': cls_score.float(),
'bbox_pred': outputs['bbox_pred'].float()}
@@ -8,24 +8,15 @@
 # <https://opensource.org/licenses/BSD-2-Clause>
 #
 # ------------------------------------------------------------
+"""Detectors."""
 from __future__ import absolute_import
 from __future__ import division
 from __future__ import print_function
-from seetadet.algo import faster_rcnn
-from seetadet.algo import ssd
-from seetadet.core.config import cfg
-
-
-class DataLoader(object):
-    """Provide mini-batches of data."""
-
-    def __new__(cls):
-        pipeline_type = cfg.PIPELINE.TYPE.lower()
-        if pipeline_type == 'default' or pipeline_type == 'rcnn':
-            return faster_rcnn.DataLoader()
-        elif pipeline_type == 'ssd':
-            return ssd.DataLoader()
-        else:
-            raise ValueError('Unsupported pipeline: ' + pipeline_type)
+# Classes.
+from seetadet.models.detectors.detector import Detector
+from seetadet.models.detectors.faster_rcnn import FasterRCNN
+from seetadet.models.detectors.mask_rcnn import MaskRCNN
+from seetadet.models.detectors.retinanet import RetinaNet
+from seetadet.models.detectors.ssd import SSD
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Base detector."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm import torch
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.models.build import build_backbone
from seetadet.ops.fusion import get_fusion
from seetadet.ops.normalization import ToTensor
from seetadet.utils import logging
class Detector(nn.Module):
"""Class to build and compute the detection pipelines."""
def __init__(self):
super(Detector, self).__init__()
self.to_tensor = ToTensor()
self.backbone, self.neck = build_backbone()
self.backbone_dims = self.neck.out_dims
def get_inputs(self, inputs):
"""Return the detection inputs.
Parameters
----------
inputs : dict
The inputs.
"""
inputs['img'] = self.to_tensor(inputs['img'], normalize=True)
return inputs
def get_features(self, inputs):
"""Return the detection features.
Parameters
----------
inputs : dict
The inputs.
"""
return self.neck(self.backbone(inputs['img']))
def get_outputs(self, inputs):
"""Return the detection outputs.
Parameters
----------
inputs : dict
The inputs.
"""
return inputs
def forward(self, inputs):
"""Define the computation performed at every call.
Parameters
----------
inputs : dict
The inputs.
"""
return self.get_outputs(inputs)
def load_weights(self, weights, strict=False):
"""Load the state dict of this detector.
Parameters
----------
weights : str
The path of the weights file.
"""
return self.load_state_dict(torch.load(weights), strict=strict)
def optimize_for_inference(self):
"""Optimize the graph for the inference."""
# Set precision.
precision = cfg.MODEL.PRECISION.lower()
self.half() if precision == 'float16' else self.float()
logging.info('Set precision: ' + precision)
# Fuse modules.
fusion_memo, last_module = set(), None
for module in self.modules():
if module is self:
continue
if hasattr(module, 'optimize_for_inference'):
module.optimize_for_inference()
fusion_memo.add(module.__class__.__name__)
continue
key, fn = get_fusion(last_module, module)
if fn is not None:
fusion_memo.add(key)
fn(last_module, module)
last_module = module
for key in fusion_memo:
logging.info('Fuse modules: ' + key)
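
# Editor's usage sketch (illustration only): the intended inference flow for
# a `Detector` subclass. `optimize_for_inference` applies both steps above:
# casting to `cfg.MODEL.PRECISION` and fusing adjacent modules.
def _prepare_for_inference(detector, weights_path):  # hypothetical helper
    detector.load_weights(weights_path, strict=True)
    detector.eval()
    detector.optimize_for_inference()
    return detector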
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Faster R-CNN detector."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.data.targets.rcnn import ProposalTargets
from seetadet.models.build import DETECTORS
from seetadet.models.decoders.rpn import RPNDecoder
from seetadet.models.dense_heads.rpn import RPNHead
from seetadet.models.detectors.detector import Detector
from seetadet.models.roi_heads.fast_rcnn import FastRCNNHead
@DETECTORS.register('faster_rcnn')
class FasterRCNN(Detector):
"""Faster R-CNN detector."""
def __init__(self):
super(FasterRCNN, self).__init__()
self.rpn_head = RPNHead(self.backbone_dims)
self.bbox_head = FastRCNNHead(self.backbone_dims)
self.rpn_decoder = RPNDecoder()
self.proposal_targets = ProposalTargets()
def get_outputs(self, inputs):
"""Return the detection outputs."""
inputs = self.get_inputs(inputs)
inputs['features'] = self.get_features(inputs)
inputs['grid_info'] = inputs.pop(
'grid_info', [x.shape[-2:] for x in inputs['features']])
outputs = self.rpn_head(inputs)
inputs['rois'] = self.rpn_decoder({
'cls_score': outputs.pop('rpn_cls_score'),
'bbox_pred': outputs.pop('rpn_bbox_pred'),
'im_info': inputs['im_info'],
'grid_info': inputs['grid_info']})
if self.training:
targets = self.proposal_targets.compute(**inputs)
inputs['rois'] = targets['rois']
outputs.update(self.bbox_head(inputs, targets))
else:
outputs.update(self.bbox_head(inputs))
return outputs
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Mask R-CNN detector."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.data.targets.rcnn import ProposalTargets
from seetadet.models.build import DETECTORS
from seetadet.models.decoders.rpn import RPNDecoder
from seetadet.models.dense_heads.rpn import RPNHead
from seetadet.models.detectors.detector import Detector
from seetadet.models.roi_heads.fast_rcnn import FastRCNNHead
from seetadet.models.roi_heads.mask_rcnn import MaskRCNNHead
@DETECTORS.register('mask_rcnn')
class MaskRCNN(Detector):
"""Mask R-CNN detector."""
def __init__(self):
super(MaskRCNN, self).__init__()
self.rpn_head = RPNHead(self.backbone_dims)
self.bbox_head = FastRCNNHead(self.backbone_dims)
self.mask_head = MaskRCNNHead(self.backbone_dims)
self.rpn_decoder = RPNDecoder()
self.proposal_targets = ProposalTargets()
def get_outputs(self, inputs):
"""Return the detection outputs."""
inputs = self.get_inputs(inputs)
inputs['features'] = self.get_features(inputs)
inputs['grid_info'] = inputs.pop(
'grid_info', [x.shape[-2:] for x in inputs['features']])
outputs = self.rpn_head(inputs)
inputs['rois'] = self.rpn_decoder({
'cls_score': outputs.pop('rpn_cls_score'),
'bbox_pred': outputs.pop('rpn_bbox_pred'),
'im_info': inputs['im_info'],
'grid_info': inputs['grid_info']})
if self.training:
targets = self.proposal_targets.compute(**inputs)
inputs['rois'] = targets.pop('rois')
outputs.update(self.bbox_head(inputs, targets))
inputs['rois'] = targets.pop('fg_rois')
outputs.update(self.mask_head(inputs, targets))
else:
outputs.update(self.bbox_head(inputs))
self.outputs = {'features': inputs['features']}
return outputs
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""RetinaNet detector."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.models.build import DETECTORS
from seetadet.models.decoders.retinanet import RetinaNetDecoder
from seetadet.models.dense_heads.retinanet import RetinaNetHead
from seetadet.models.detectors.detector import Detector
@DETECTORS.register('retinanet')
class RetinaNet(Detector):
"""RetinaNet detector."""
def __init__(self):
super(RetinaNet, self).__init__()
self.bbox_head = RetinaNetHead(self.backbone_dims)
self.bbox_decoder = RetinaNetDecoder()
def get_outputs(self, inputs):
"""Compute detection outputs."""
inputs = self.get_inputs(inputs)
inputs['features'] = self.get_features(inputs)
inputs['grid_info'] = inputs.pop(
'grid_info', [x.shape[-2:] for x in inputs['features']])
outputs = self.bbox_head(inputs)
if not self.training:
outputs['dets'] = self.bbox_decoder({
'cls_score': outputs.pop('cls_score'),
'bbox_pred': outputs.pop('bbox_pred'),
'im_info': inputs['im_info'],
'grid_info': inputs['grid_info']})
return outputs
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""SSD detector."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.models.build import DETECTORS
from seetadet.models.dense_heads.ssd import SSDHead
from seetadet.models.detectors.detector import Detector
@DETECTORS.register('ssd')
class SSD(Detector):
"""SSD detector."""
def __init__(self):
super(SSD, self).__init__()
self.bbox_head = SSDHead(self.backbone_dims)
def get_outputs(self, inputs):
"""Compute detection outputs."""
inputs = self.get_inputs(inputs)
inputs['features'] = self.get_features(inputs)
outputs = self.bbox_head(inputs)
return outputs
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Fast-RCNN head."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
from dragon.vm import torch
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.ops.build import build_loss
from seetadet.ops.conv import ConvNorm2d
from seetadet.ops.vision import RoIPooler
class FastRCNNHead(nn.Module):
"""Fast R-CNN head."""
def __init__(self, in_dims):
super(FastRCNNHead, self).__init__()
conv_module = functools.partial(
ConvNorm2d, norm_type=cfg.FRCNN.NORM,
kernel_size=3, activation_type='ReLU')
self.output_conv = nn.ModuleList()
self.output_fc = nn.ModuleList()
for i in range(cfg.FRCNN.NUM_CONV):
dim = in_dims[0] if i == 0 else cfg.FRCNN.CONV_HEAD_DIM
self.output_conv += [conv_module(dim, cfg.FRCNN.CONV_HEAD_DIM)]
for i in range(cfg.FRCNN.NUM_FC):
dim = in_dims[0] * cfg.FRCNN.POOLER_RESOLUTION ** 2
dim = dim if i == 0 else cfg.FRCNN.FC_HEAD_DIM
self.output_fc += [nn.Sequential(nn.Linear(dim, cfg.FRCNN.FC_HEAD_DIM),
nn.ReLU(inplace=True))]
self.cls_score = nn.Linear(cfg.FRCNN.FC_HEAD_DIM, len(cfg.MODEL.CLASSES))
self.bbox_pred = nn.Linear(cfg.FRCNN.FC_HEAD_DIM, len(cfg.MODEL.CLASSES) * 4)
self.pooler = RoIPooler(
pooler_type=cfg.FRCNN.POOLER_TYPE,
resolution=cfg.FRCNN.POOLER_RESOLUTION,
sampling_ratio=cfg.FRCNN.POOLER_SAMPLING_RATIO)
self.cls_loss = nn.CrossEntropyLoss(ignore_index=-1, reduction='mean')
self.bbox_loss = build_loss(cfg.FRCNN.BBOX_REG_LOSS_TYPE)
self.spatial_scales = [1. / (2 ** lvl) for lvl in range(
cfg.FRCNN.MIN_LEVEL, cfg.FRCNN.MAX_LEVEL + 1)]
self.reset_parameters()
def reset_parameters(self):
nn.init.normal_(self.cls_score.weight, std=0.01)
nn.init.normal_(self.bbox_pred.weight, std=0.001)
def get_outputs(self, inputs):
x = torch.cat([self.pooler(
inputs['features'][i], inputs['rois'][i],
spatial_scale=spatial_scale) for i, spatial_scale
in enumerate(self.spatial_scales)])
for layer in self.output_conv:
x = layer(x)
x = x.flatten_(1)
for layer in self.output_fc:
x = layer(x)
cls_score, bbox_pred = self.cls_score(x), self.bbox_pred(x)
return {'cls_score': cls_score, 'bbox_pred': bbox_pred}
def get_losses(self, inputs, targets):
bbox_pred = inputs['bbox_pred'].reshape_((0, -1, 4))
bbox_pred = bbox_pred.index_select((0, 1), targets['bbox_inds'])
cls_loss = self.cls_loss(inputs['cls_score'], targets['labels'])
bbox_loss = self.bbox_loss(bbox_pred, targets['bbox_targets'])
normalizer = cfg.FRCNN.BATCH_SIZE * cfg.TRAIN.IMS_PER_BATCH
bbox_loss_weight = cfg.FRCNN.BBOX_REG_LOSS_WEIGHT / normalizer
bbox_loss = bbox_loss.mul_(bbox_loss_weight)
return {'cls_loss': cls_loss, 'bbox_loss': bbox_loss}
def forward(self, inputs, targets=None):
outputs = self.get_outputs(inputs)
if self.training:
logits = {'cls_score': outputs['cls_score'].float(),
'bbox_pred': outputs['bbox_pred'].float()}
return self.get_losses(logits, targets)
else:
outputs['cls_score'] = nn.functional.softmax(
outputs['cls_score'], dim=1, inplace=True)
return {'rois': torch.cat(inputs['rois']),
'cls_score': outputs['cls_score'].float(),
'bbox_pred': outputs['bbox_pred'].float()}
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Mask R-CNN head."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
from dragon.vm import torch
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.ops.conv import ConvNorm2d
from seetadet.ops.vision import RoIPooler
class MaskRCNNHead(nn.Module):
"""Mask R-CNN head."""
def __init__(self, in_dims):
super(MaskRCNNHead, self).__init__()
self.dim = cfg.MRCNN.CONV_HEAD_DIM
conv_module = functools.partial(
ConvNorm2d, norm_type=cfg.MRCNN.NORM,
kernel_size=3, activation_type='ReLU')
self.output_conv = nn.ModuleList()
for i in range(cfg.MRCNN.NUM_CONV):
dim = in_dims[0] if i == 0 else self.dim
self.output_conv += [conv_module(dim, self.dim)]
self.output_conv += [nn.Sequential(
nn.ConvTranspose2d(self.dim, self.dim, 2, 2),
nn.ReLU(True))]
self.mask_pred = nn.Conv2d(self.dim, len(cfg.MODEL.CLASSES) - 1, 1)
self.pooler = RoIPooler(
pooler_type=cfg.MRCNN.POOLER_TYPE,
resolution=cfg.MRCNN.POOLER_RESOLUTION,
sampling_ratio=cfg.MRCNN.POOLER_SAMPLING_RATIO)
self.mask_loss = nn.BCEWithLogitsLoss(reduction='valid')
self.spatial_scales = [1. / (2 ** lvl) for lvl in range(
cfg.FRCNN.MIN_LEVEL, cfg.FRCNN.MAX_LEVEL + 1)]
self.reset_parameters()
def reset_parameters(self):
nn.init.normal_(self.mask_pred.weight, std=0.001)
def get_outputs(self, inputs):
x = torch.cat([self.pooler(
inputs['features'][i], inputs['rois'][i],
spatial_scale=spatial_scale) for i, spatial_scale
in enumerate(self.spatial_scales)])
for layer in self.output_conv:
x = layer(x)
mask_pred = self.mask_pred(x)
return {'mask_pred': mask_pred}
def get_losses(self, inputs, targets):
mask_pred = inputs['mask_pred']
mask_pred = mask_pred.index_select((0, 1), targets['mask_inds'])
mask_loss = self.mask_loss(mask_pred, targets['mask_targets'])
return {'mask_loss': mask_loss}
def forward(self, inputs, targets=None):
outputs = self.get_outputs(inputs)
if self.training:
logits = {'mask_pred': outputs['mask_pred'].float()}
return self.get_losses(logits, targets)
else:
outputs['mask_pred'] = nn.functional.sigmoid(
outputs['mask_pred'].float(), inplace=True)
return {'mask_pred': outputs['mask_pred']}
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Detection modules."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm.torch import nn
from dragon.vm.torch import autograd
from seetadet.core.config import cfg
class _NonMaxSuppression(autograd.Function):
"""Filter out boxes that have high IoU with selected ones."""
def __init__(self, key, dev, **kwargs):
super(_NonMaxSuppression, self).__init__(key, dev, **kwargs)
self.iou_threshold = kwargs.get('iou_threshold', 0.5)
def attributes(self):
return {
'op_type': 'NonMaxSuppression',
'arguments': {'iou_threshold': self.iou_threshold}
}
def forward(self, input):
return self.dispatch([input], [self.alloc()])
class _RetinaNetDecoder(autograd.Function):
"""Decode predictions from RetinaNet."""
def __init__(self, key, dev, **kwargs):
super(_RetinaNetDecoder, self).__init__(key, dev, **kwargs)
self.args = kwargs
def attributes(self):
return {
'op_type': 'RetinaNetDecoder',
'arguments': {
'strides': self.args['strides'],
'ratios': self.args['ratios'],
'scales': self.args['scales'],
'pre_nms_top_n': self.args['pre_nms_top_n'],
'score_thresh': self.args['score_thresh'],
}
}
def forward(self, features, cls_prob, bbox_pred, ims_info):
inputs = features + [cls_prob, bbox_pred, ims_info]
self._check_device(inputs[:-1]) # Skip <ims_info>
return self.dispatch(inputs, [self.alloc()], check_device=False)
class _RPNDecoder(autograd.Function):
"""Decode proposal regions from RPN."""
def __init__(self, key, dev, **kwargs):
super(_RPNDecoder, self).__init__(key, dev, **kwargs)
self.args = kwargs
def attributes(self):
return {
'op_type': 'RPNDecoder',
'arguments': {
'strides': self.args['strides'],
'ratios': self.args['ratios'],
'scales': self.args['scales'],
'pre_nms_top_n': self.args['pre_nms_top_n'],
'post_nms_top_n': self.args['post_nms_top_n'],
'nms_thresh': self.args['nms_thresh'],
'min_level': self.args['min_level'],
'max_level': self.args['max_level'],
'canonical_scale': self.args['canonical_scale'],
'canonical_level': self.args['canonical_level'],
}
}
def forward(self, features, cls_prob, bbox_pred, im_info):
inputs = features + [cls_prob, bbox_pred, im_info]
self._check_device(inputs[:-1]) # Skip <im_info>
num_outputs = self.args['max_level'] - self.args['min_level'] + 1
outputs = [self.alloc() for _ in range(num_outputs)]
return self.dispatch(inputs, outputs, check_device=False)
def decode_retinanet(
features,
cls_prob,
bbox_pred,
ims_info,
strides,
ratios,
scales,
pre_nms_top_n,
score_thresh,
):
return _RetinaNetDecoder \
.instantiate(
cls_prob.device,
strides=strides,
ratios=ratios,
scales=scales,
pre_nms_top_n=pre_nms_top_n,
score_thresh=score_thresh,
).apply(features, cls_prob, bbox_pred, ims_info)
def decode_rpn(
features,
cls_prob,
bbox_pred,
im_info,
strides,
ratios,
scales,
pre_nms_top_n,
post_nms_top_n,
nms_thresh,
min_level,
max_level,
canonical_scale,
canonical_level,
):
return _RPNDecoder \
.instantiate(
cls_prob.device,
strides=strides,
ratios=ratios,
scales=scales,
pre_nms_top_n=pre_nms_top_n,
post_nms_top_n=post_nms_top_n,
nms_thresh=nms_thresh,
min_level=min_level,
max_level=max_level,
canonical_scale=canonical_scale,
canonical_level=canonical_level,
).apply(features, cls_prob, bbox_pred, im_info)
def nms(input, iou_threshold=0.5):
return _NonMaxSuppression \
.instantiate(
input.device,
iou_threshold=iou_threshold,
).apply(input)
class RetinaNetDecoder(nn.Module):
"""Decode predictions from retinanet."""
def __init__(self):
super(RetinaNetDecoder, self).__init__()
k_max, k_min = cfg.FPN.RPN_MAX_LEVEL, cfg.FPN.RPN_MIN_LEVEL
scales_per_octave = cfg.RETINANET.SCALES_PER_OCTAVE
self.strides = [int(2. ** lvl) for lvl in range(k_min, k_max + 1)]
self.scales = [cfg.RETINANET.ANCHOR_SCALE *
(2 ** (octave / float(scales_per_octave)))
for octave in range(scales_per_octave)]
def forward(self, features, cls_prob, bbox_pred, ims_info):
return decode_retinanet(
features=features,
cls_prob=cls_prob,
bbox_pred=bbox_pred,
ims_info=ims_info,
strides=self.strides,
ratios=[float(e) for e in cfg.RETINANET.ASPECT_RATIOS],
scales=self.scales,
pre_nms_top_n=cfg.TEST.RETINANET_PRE_NMS_TOP_N,
score_thresh=float(cfg.TEST.SCORE_THRESH),
)
class RPNDecoder(nn.Module):
"""Generate proposal regions from RPN."""
def __init__(self):
super(RPNDecoder, self).__init__()
def forward(self, features, cls_prob, bbox_pred, im_info):
return decode_rpn(
features=features,
cls_prob=cls_prob,
bbox_pred=bbox_pred,
im_info=im_info,
strides=cfg.RPN.STRIDES,
ratios=[float(e) for e in cfg.RPN.ASPECT_RATIOS],
scales=[float(e) for e in cfg.RPN.SCALES],
pre_nms_top_n=cfg.TEST.RPN_PRE_NMS_TOP_N,
post_nms_top_n=cfg.TEST.RPN_POST_NMS_TOP_N,
nms_thresh=cfg.TEST.RPN_NMS_THRESH,
min_level=cfg.FPN.ROI_MIN_LEVEL,
max_level=cfg.FPN.ROI_MAX_LEVEL,
canonical_scale=cfg.FPN.ROI_CANONICAL_SCALE,
canonical_level=cfg.FPN.ROI_CANONICAL_LEVEL,
)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""NN modules."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import dragon
from dragon.vm import torch
from dragon.vm.torch import nn
from seetadet.core.config import cfg
class Conv1x1(object):
"""1x1 convolution."""
def __new__(cls, dim_in, dim_out, stride=1, bias=False):
return nn.Conv2d(
in_channels=dim_in,
out_channels=dim_out,
kernel_size=1,
stride=stride,
bias=bias,
)
class Conv3x3(object):
"""3x3 convolution."""
def __new__(
cls,
dim_in,
dim_out,
stride=1,
dilation=1,
groups=1,
bias=False
):
return nn.Conv2d(
in_channels=dim_in,
out_channels=dim_out,
kernel_size=3,
stride=stride,
padding=1 * dilation,
dilation=dilation,
groups=groups,
bias=bias,
)
class CrossEntropyLoss(object):
"""Cross entropy loss."""
def __new__(cls, reduction='valid'):
return nn.CrossEntropyLoss(
reduction=reduction, ignore_index=-1)
class FrozenBatchNorm2d(nn.Module):
"""BatchNorm2d where statistics and the affine parameters are fixed."""
def __init__(self, num_features, eps=1e-5, inplace=True):
super(FrozenBatchNorm2d, self).__init__()
self.num_features = num_features
self.eps = eps
self.inplace = inplace
self.register_buffer('weight', torch.ones(num_features))
self.register_buffer('bias', torch.zeros(num_features))
self.register_buffer('running_mean', torch.zeros(num_features))
self.register_buffer('running_var', torch.ones(num_features) - eps)
def extra_repr(self):
affine_str = '{num_features}, eps={eps}'.format(**self.__dict__)
inplace_str = ', inplace' if self.inplace else ''
return affine_str + inplace_str
def forward(self, input):
return torch.channel_affine(
input,
self.weight,
self.bias,
dim=1,
out=input if self.inplace else None,
)
def _load_from_state_dict(
self,
state_dict,
prefix,
strict,
missing_keys,
unexpected_keys,
error_msgs,
):
super(FrozenBatchNorm2d, self)._load_from_state_dict(
state_dict,
prefix,
strict,
missing_keys,
unexpected_keys,
error_msgs,
)
# Fuse the running stats into weight and bias.
# Note that this resets the stored stats to zero mean
# and unit variance once the fusion is done.
with torch.no_grad():
self.running_var.float_().add_(self.eps).sqrt_()
self.weight.float_().div_(self.running_var)
self.bias.float_().sub_(self.running_mean.float_() * self.weight)
self.running_mean.zero_()
self.running_var.one_().sub_(self.eps)
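
# Editor's sketch (assumes numpy; illustration only) -- verifies the folding
# done in `_load_from_state_dict` above: once the stats are absorbed, the
# plain channel-affine transform in `forward` reproduces frozen batch norm.
def _check_frozen_bn_folding():  # hypothetical helper, illustration only
    import numpy as np
    x = np.random.randn(16)
    w, b, mean, var, eps = 1.5, 0.2, 0.3, 4.0, 1e-5
    y_bn = (x - mean) / np.sqrt(var + eps) * w + b
    w_fused = w / np.sqrt(var + eps)
    b_fused = b - mean * w_fused
    assert np.allclose(y_bn, x * w_fused + b_fused)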
class GIoULoss(nn.Module):
"""GIoU loss."""
def __init__(self, reduction='sum', delta_weights=None):
super(GIoULoss, self).__init__()
self.reduction = reduction
self.delta_weights = delta_weights
# Store the detached tensors
self.data = {}
self.x1, self.y1, self.x2, self.y2 = None, None, None, None
def transform_inv(self, boxes, deltas, name=None):
widths = boxes[:, 2] - boxes[:, 0]
heights = boxes[:, 3] - boxes[:, 1]
ctr_x = boxes[:, 0] + 0.5 * widths
ctr_y = boxes[:, 1] + 0.5 * heights
if name is not None:
self.data[name + '/widths'] = widths
self.data[name + '/heights'] = heights
dx, dy, dw, dh = torch.chunk(deltas, chunks=4, dim=1)
if self.delta_weights is not None:
wx, wy, ww, wh = self.delta_weights
dx, dy, dw, dh = dx / wx, dy / wy, dw / ww, dh / wh
pred_ctr_x = dx * widths + ctr_x
pred_ctr_y = dy * heights + ctr_y
pred_w = torch.exp(dw) * widths
pred_h = torch.exp(dh) * heights
x1 = pred_ctr_x - 0.5 * pred_w
y1 = pred_ctr_y - 0.5 * pred_h
x2 = pred_ctr_x + 0.5 * pred_w
y2 = pred_ctr_y + 0.5 * pred_h
return x1, y1, x2, y2
def forward_impl(self, input, target, anchor):
x1, y1, x2, y2 = self.transform_inv(
anchor, input, name='logits')
self.x1, self.y1, self.x2, self.y2 = \
self.transform_inv(anchor, target)
# Compute the independent area
pred_area = (x2 - x1) * (y2 - y1)
target_area = (self.x2 - self.x1) * (self.y2 - self.y1)
# Compute the intersecting area
x1_inter = torch.maximum(x1, self.x1)
y1_inter = torch.maximum(y1, self.y1)
x2_inter = torch.minimum(x2, self.x2)
y2_inter = torch.minimum(y2, self.y2)
w_inter = torch.clamp(x2_inter - x1_inter, min=0)
h_inter = torch.clamp(y2_inter - y1_inter, min=0)
area_inter = w_inter * h_inter
# Compute the enclosing area
x1_enc = torch.minimum(x1, self.x1)
y1_enc = torch.minimum(y1, self.y1)
x2_enc = torch.maximum(x2, self.x2)
y2_enc = torch.maximum(y2, self.y2)
area_enc = (x2_enc - x1_enc) * (y2_enc - y1_enc) + 1.
# Compute the differentiable IoU metric
area_union = pred_area + target_area - area_inter
iou = area_inter / (area_union + 1.)
iou_metric = iou - (area_enc - area_union) / area_enc
# Compute the reduced loss
if self.reduction == 'sum':
return (1 - iou_metric).sum()
else:
return (1 - iou_metric).mean()
def forward(self, *inputs, **kwargs):
# Enter a new detaching scope
with dragon.eager_scope('${IOU_LOSS}'):
return self.forward_impl(*inputs, **kwargs)
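
# Editor's sketch (assumes numpy; illustration only) -- the GIoU metric as
# computed in `forward_impl` above, including its two `+ 1.` smoothing terms,
# which keep the metric strictly below 1 even for perfectly matched boxes.
def _giou_metric_demo():  # hypothetical helper, illustration only
    import numpy as np
    area_pred = area_target = area_inter = 100.  # two identical 10x10 boxes
    area_union = area_pred + area_target - area_inter
    area_enc = area_union + 1.  # enclosing area, smoothed as above
    iou = area_inter / (area_union + 1.)
    giou = iou - (area_enc - area_union) / area_enc
    assert np.isclose(giou, 99. / 101.)  # -> 1 only as the boxes grow large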
class Identity(nn.Module):
"""Pass input to the output."""
def __init__(self, *args, **kwargs):
super(Identity, self).__init__()
_, _ = args, kwargs
def forward(self, x):
return x
class L1Loss(nn.Module):
"""L1 loss."""
def __init__(self, reduction='sum'):
super(L1Loss, self).__init__()
self.reduction = reduction
def forward(self, input, target, *args):
return nn.functional.l1_loss(
input, target,
reduction=self.reduction,
)
class L2Normalize(nn.Module):
"""Normalize the input using L2 norm."""
def __init__(self, num_features, init=20.):
super(L2Normalize, self).__init__()
self.weight = nn.Parameter(torch.Tensor(num_features).fill_(init))
def forward(self, input):
out = nn.functional.normalize(input, p=2, dim=1, eps=1e-5)
out = torch.channel_affine(out, self.weight, dim=1)
return out
class ReLU(object):
"""The generic ReLU activation."""
def __new__(cls, inplace=False):
return getattr(torch.nn, cfg.MODEL.RELU_VARIANT)(inplace)
class SigmoidFocalLoss(object):
"""Sigmoid focal loss."""
def __new__(cls, reduction='sum'):
return nn.SigmoidFocalLoss(
alpha=cfg.MODEL.FOCAL_LOSS_ALPHA,
gamma=cfg.MODEL.FOCAL_LOSS_GAMMA,
negative_index=0, # Background index
reduction=reduction,
)
class SmoothL1Loss(nn.Module):
"""Smoothed l1 loss."""
def __init__(self, beta=1.0, reduction='sum'):
super(SmoothL1Loss, self).__init__()
self.beta = beta
self.reduction = reduction
def forward(self, input, target, *args):
return nn.functional.smooth_l1_loss(
input, target,
beta=self.beta,
reduction=self.reduction,
)
# Getters
def get_norm(norm, dim_in):
"""Return a normalization module."""
if isinstance(norm, str):
if len(norm) == 0:
return Identity()
norm = {'BN': BatchNorm2d,
'FrozenBN': FrozenBatchNorm2d}[norm]
return norm(dim_in)
# Aliases
AvgPool2d = nn.AvgPool2d
BatchNorm2d = nn.BatchNorm2d
BCEWithLogitsLoss = nn.BCEWithLogitsLoss
Conv2d = nn.Conv2d
ConvTranspose2d = nn.ConvTranspose2d
DepthwiseConv2d = nn.DepthwiseConv2d
DropBlock2d = nn.DropBlock2d
Hardsigmoid = nn.Hardsigmoid
Hardswish = nn.Hardswish
Linear = nn.Linear
MaxPool2d = nn.MaxPool2d
Module = nn.Module
ModuleList = nn.ModuleList
Sequential = nn.Sequential
Sigmoid = nn.Sigmoid
Softmax = nn.Softmax
Swish = nn.Swish
upsample = nn.functional.upsample
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Module utilities."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm import torch
from seetadet.core import registry
@registry.fusion_pass.register([
'Conv2d+BatchNorm2d',
'Conv2d+FrozenBatchNorm2d',
'DepthwiseConv2d+BatchNorm2d',
'DepthwiseConv2d+FrozenBatchNorm2d',
])
def layer_fusion_conv2d_and_bn2d(conv_module, bn_module):
"""Layer fusion between Conv2d and BatchNorm2d."""
if conv_module.bias is None:
with torch.no_grad():
delattr(conv_module, 'bias')
bn_module.forward = lambda x: x
t = torch.sqrt(bn_module.running_var + bn_module.eps)
t = bn_module.weight / t
conv_module.register_buffer(
'bias', bn_module.bias - t * bn_module.running_mean)
t = t.view(0, *([1] * (conv_module.weight.ndimension() - 1)))
if conv_module.weight.dtype == 'float16':
conv_module.bias.half_()
weight = conv_module.weight.float()
weight.mul_(t).half_()
conv_module.weight.copy_(weight)
else:
conv_module.weight.mul_(t)
def get_fusion_pass(*modules):
"""Return the fusion pass between modules."""
pass_key = '+'.join(m.__class__.__name__ for m in modules)
return pass_key, registry.fusion_pass.try_get(pass_key)
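
# Editor's sketch (assumes numpy; illustration only) -- the arithmetic behind
# the fusion above: with t = gamma / sqrt(var + eps), scaling the conv weight
# by t and setting bias = beta - t * mean makes the bare conv output equal to
# conv followed by batch norm (the scaling commutes with the linear conv).
def _check_conv_bn_arithmetic():  # hypothetical helper, illustration only
    import numpy as np
    conv_out = np.random.randn(32)  # a conv response, pre-normalization
    gamma, beta, mean, var, eps = 0.9, 0.1, 0.4, 2.0, 1e-5
    y_bn = (conv_out - mean) / np.sqrt(var + eps) * gamma + beta
    t = gamma / np.sqrt(var + eps)
    assert np.allclose(y_bn, conv_out * t + (beta - t * mean))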
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Vision modules."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
from dragon.vm import torch
from dragon.vm import torchvision
from dragon.vm.torch import nn
from seetadet.core.config import cfg
def roi_align(input, boxes, spatial_scale, size, **kwargs):
return torchvision.ops.roi_align(
input, boxes,
output_size=(size, size),
spatial_scale=spatial_scale,
sampling_ratio=kwargs.get('sampling_ratio', 0),
)
def roi_pool(input, boxes, spatial_scale, size, **kwargs):
_ = locals() # Unused
return torchvision.ops.roi_pool(
input, boxes,
output_size=(size, size),
spatial_scale=spatial_scale,
)
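
# Editor's usage sketch (illustration only): both wrappers above expect RoIs
# in torchvision's (K, 5) layout [batch_index, x1, y1, x2, y2] and return
# fixed-size crops, e.g. 7x7 bins from an FPN level with stride 16.
def _roi_align_demo(feature, rois):  # hypothetical helper, illustration only
    # feature: (N, C, H, W) tensor; rois: (K, 5) tensor.
    return roi_align(feature, rois, spatial_scale=1. / 16,
                     size=7, sampling_ratio=2)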
class ImageNormalizer(nn.Module):
"""Normalize the image to match the computation."""
def __init__(self):
super(ImageNormalizer, self).__init__()
self._device = torch.device('cpu')
self._dummy_buffer = torch.ones(1)
self._normalize_func = functools.partial(
torch.channel_normalize,
mean=cfg.PIXEL_MEANS,
std=cfg.PIXEL_STDS,
dim=1,
dims=(0, 3, 1, 2),
dtype=cfg.MODEL.PRECISION.lower(),
)
def _apply(self, fn):
fn(self._dummy_buffer)
def cpu(self):
self._device = torch.device('cpu')
def cuda(self, device=None):
self._device = torch.device('cuda', device)
def device(self):
return self._dummy_buffer.device
def forward(self, input):
if isinstance(input, torch.Tensor):
if input.shape[1] <= 3:
return input
cur_device = self.device()
if input._device != cur_device:
if cur_device.type == 'cpu':
input = input.cpu()
else:
input = input.cuda(cur_device.index)
return self._normalize_func(input)
@@ -8,7 +8,7 @@
 # <https://opensource.org/licenses/BSD-2-Clause>
 #
 # ------------------------------------------------------------
-"""Modules."""
+"""Operators."""
 from __future__ import absolute_import
 from __future__ import division
@@ -16,5 +16,5 @@ from __future__ import print_function
 import os
-from seetadet.utils import env
-env.load_library(os.path.join(os.path.dirname(__file__), '_C'))
+from seetadet.core.backend import load_library as _load_library
+_load_library(os.path.join(os.path.dirname(__file__), '_C'))
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Build for ops."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm.torch import nn
from seetadet.ops.loss import GIoULoss
from seetadet.ops.loss import L1Loss
from seetadet.ops.loss import SmoothL1Loss
from seetadet.ops.loss import SigmoidFocalLoss
from seetadet.ops.normalization import FrozenBatchNorm2d
def build_loss(loss_type, reduction='sum', **kwargs):
if isinstance(loss_type, str):
loss_type = loss_type.lower()
if loss_type != 'smooth_l1':
kwargs.pop('beta', None)
loss_type = {
'l1': L1Loss,
'smooth_l1': SmoothL1Loss,
'giou': GIoULoss,
'cross_entropy': nn.CrossEntropyLoss,
'sigmoid_focal': SigmoidFocalLoss,
}[loss_type]
return loss_type(reduction=reduction, **kwargs)
def build_norm(dim, norm_type):
"""Build the normalization module."""
if isinstance(norm_type, str):
if len(norm_type) == 0:
return nn.Identity()
norm_type = {
'BN': nn.BatchNorm2d,
'FrozenBN': FrozenBatchNorm2d,
'SyncBN': nn.SyncBatchNorm,
'GN': lambda c: nn.GroupNorm(32, c),
'Affine': lambda c: FrozenBatchNorm2d(c, affine=True),
}[norm_type]
return norm_type(dim)
def build_activation(activation_type, inplace=False):
"""Build the activation module."""
if isinstance(activation_type, str):
if len(activation_type) == 0:
return nn.Identity()
activation_type = getattr(nn, activation_type)
activation = activation_type()
activation.inplace = inplace
return activation
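# A minimal usage sketch (assuming a Dragon runtime; dims are hypothetical).
# An empty string yields nn.Identity(), so configs can disable these layers:
bn = build_norm(256, 'FrozenBN')  # frozen stats, no affine grads
gn = build_norm(256, 'GN')        # GroupNorm with 32 groups
act = build_activation('ReLU', inplace=True)
noop = build_activation('')       # nn.Identity()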
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Convolution ops."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm.torch import nn
from seetadet.ops.build import build_norm
class ConvNorm2d(nn.Sequential):
"""2d convolution followed by norm."""
def __init__(
self,
dim_in,
dim_out,
kernel_size,
stride=1,
padding=None,
dilation=1,
groups=1,
bias=True,
conv_type='Conv2d',
norm_type='',
activation_type='',
inplace=True,
):
super(ConvNorm2d, self).__init__()
if padding is None:
padding = kernel_size // 2
if conv_type == 'Conv2d':
layers = [nn.Conv2d(dim_in, dim_out,
kernel_size=kernel_size,
stride=stride,
padding=padding,
dilation=dilation,
groups=groups,
bias=bias and (not norm_type))]
elif conv_type == 'SepConv2d':
layers = [nn.Conv2d(dim_in, dim_in,
kernel_size=kernel_size,
stride=stride,
padding=padding,
dilation=dilation,
groups=dim_in,
bias=False),
nn.Conv2d(dim_in, dim_out,
kernel_size=1,
bias=bias and (not norm_type))]
else:
raise ValueError('Unknown conv type: ' + conv_type)
if norm_type:
layers += [build_norm(dim_out, norm_type)]
if activation_type:
layers += [getattr(nn, activation_type)()]
layers[-1].inplace = inplace
for i, layer in enumerate(layers):
self.add_module(str(i), layer)
self.reset_parameters()
def reset_parameters(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.kaiming_normal_(
m.weight, mode='fan_out', nonlinearity='relu')
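# A minimal usage sketch (assuming a Dragon runtime; dims are hypothetical).
# With a norm layer attached the conv bias is dropped, and 'SepConv2d'
# expands to a depthwise kxk followed by a pointwise 1x1 projection:
conv = ConvNorm2d(256, 256, kernel_size=3,
                  norm_type='FrozenBN', activation_type='ReLU')
sep_conv = ConvNorm2d(256, 512, 3, conv_type='SepConv2d', norm_type='BN')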
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Operator fusions."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm import torch
from seetadet.core.registry import Registry
# Registry of passes to fuse adjacent modules.
FUSIONS = Registry('fusions')
@FUSIONS.register([
'Conv2d+BatchNorm2d',
'Conv2d+FrozenBatchNorm2d',
'Conv2d+SyncBatchNorm',
'ConvTranspose2d+BatchNorm2d',
'ConvTranspose2d+FrozenBatchNorm2d',
'ConvTranspose2d+SyncBatchNorm',
'DepthwiseConv2d+BatchNorm2d',
'DepthwiseConv2d+FrozenBatchNorm2d',
'DepthwiseConv2d+SyncBatchNorm'])
def fuse_conv_bn(conv, bn):
"""Fuse Conv and BatchNorm."""
with torch.no_grad():
m = bn.running_mean
if conv.bias is not None:
m.sub_(conv.bias.float())
else:
delattr(conv, 'bias')
bn.forward = lambda x: x
t = bn.weight.div((bn.running_var + bn.eps).sqrt_())
conv._parameters['bias'] = bn.bias.sub(t * m)
t_conv_shape = [1, conv.out_channels] if conv.transposed else [0, 1]
t_conv_shape += [1] * len(conv.kernel_size)
if conv.weight.dtype == 'float16' and t.dtype == 'float32':
conv.bias.half_()
weight = conv.weight.float()
weight.mul_(t.reshape_(t_conv_shape)).half_()
conv.weight.copy_(weight)
else:
conv.weight.mul_(t.reshape_(t_conv_shape))
def get_fusion(*modules):
"""Return the fusion pass between modules."""
key = '+'.join(m.__class__.__name__ for m in modules)
return key, FUSIONS.try_get(key)
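# A standalone numpy sketch of the folding arithmetic above (hypothetical
# values): BN after a conv rescales the weights by t = gamma / sqrt(var + eps)
# and shifts the bias by beta - t * mean, leaving the output unchanged.
import numpy as np
rng = np.random.default_rng(0)
c, eps = 4, 1e-5
w, x = rng.standard_normal(c), rng.standard_normal(c)
gamma, beta = rng.standard_normal(c), rng.standard_normal(c)
mean, var = rng.standard_normal(c), rng.random(c) + 0.5
t = gamma / np.sqrt(var + eps)
y_bn = (w * x - mean) * t + beta            # conv output, then BN
y_fused = (w * t) * x + (beta - t * mean)   # folded conv
assert np.allclose(y_bn, y_fused)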
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Loss ops."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import dragon
from dragon.vm import torch
from dragon.vm.torch import nn
from seetadet.core.config import cfg
class GIoULoss(nn.Module):
"""GIoU loss."""
def __init__(self, reduction='sum', delta_weights=None):
super(GIoULoss, self).__init__()
self.reduction = reduction
self.delta_weights = delta_weights
def transform_inv(self, boxes, deltas):
widths = boxes[:, 2:3] - boxes[:, 0:1]
heights = boxes[:, 3:4] - boxes[:, 1:2]
ctr_x = boxes[:, 0:1] + 0.5 * widths
ctr_y = boxes[:, 1:2] + 0.5 * heights
dx, dy, dw, dh = torch.chunk(deltas, chunks=4, dim=1)
if self.delta_weights is not None:
wx, wy, ww, wh = self.delta_weights
dx, dy, dw, dh = dx / wx, dy / wy, dw / ww, dh / wh
pred_ctr_x = dx * widths + ctr_x
pred_ctr_y = dy * heights + ctr_y
pred_w = torch.exp(dw) * widths
pred_h = torch.exp(dh) * heights
x1 = pred_ctr_x - 0.5 * pred_w
y1 = pred_ctr_y - 0.5 * pred_h
x2 = pred_ctr_x + 0.5 * pred_w
y2 = pred_ctr_y + 0.5 * pred_h
return x1, y1, x2, y2
def forward_impl(self, input, target, anchor):
x1, y1, x2, y2 = self.transform_inv(anchor, input)
x1_, y1_, x2_, y2_ = self.transform_inv(anchor, target)
# Compute the independent area.
pred_area = (x2 - x1) * (y2 - y1)
target_area = (x2_ - x1_) * (y2_ - y1_)
# Compute the intersecting area.
x1_inter = torch.maximum(x1, x1_)
y1_inter = torch.maximum(y1, y1_)
x2_inter = torch.minimum(x2, x2_)
y2_inter = torch.minimum(y2, y2_)
w_inter = torch.clamp(x2_inter - x1_inter, min=0)
h_inter = torch.clamp(y2_inter - y1_inter, min=0)
area_inter = w_inter * h_inter
# Compute the enclosing area.
x1_enc = torch.minimum(x1, x1_)
y1_enc = torch.minimum(y1, y1_)
x2_enc = torch.maximum(x2, x2_)
y2_enc = torch.maximum(y2, y2_)
area_enc = (x2_enc - x1_enc) * (y2_enc - y1_enc) + 1.
# Compute the differentiable IoU metric.
area_union = pred_area + target_area - area_inter
iou = area_inter / (area_union + 1.)
iou_metric = iou - (area_enc - area_union) / area_enc
# Compute the reduced loss.
if self.reduction == 'sum':
return (1 - iou_metric).sum()
else:
return (1 - iou_metric).mean()
def forward(self, *inputs, **kwargs):
with dragon.variable_scope('IoULossVariable'):
return self.forward_impl(*inputs, **kwargs)
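# A standalone numpy sketch of the metric above (hypothetical boxes, and
# without the +1 stabilizers used in ``forward_impl``):
# GIoU = IoU - (enclosure - union) / enclosure.
import numpy as np
b1, b2 = np.array([0., 0., 4., 4.]), np.array([2., 2., 6., 6.])
inter = (max(0., min(b1[2], b2[2]) - max(b1[0], b2[0])) *
         max(0., min(b1[3], b2[3]) - max(b1[1], b2[1])))
union = ((b1[2] - b1[0]) * (b1[3] - b1[1]) +
         (b2[2] - b2[0]) * (b2[3] - b2[1]) - inter)
enc = ((max(b1[2], b2[2]) - min(b1[0], b2[0])) *
       (max(b1[3], b2[3]) - min(b1[1], b2[1])))
print(inter / union, inter / union - (enc - union) / enc)  # ~0.143, ~-0.079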
class L1Loss(nn.L1Loss):
"""L1 loss."""
def forward(self, input, target, *args):
return super(L1Loss, self).forward(input, target)
class SigmoidFocalLoss(nn.SigmoidFocalLoss):
"""Sigmoid focal loss."""
def __init__(self, reduction='sum'):
super(SigmoidFocalLoss, self).__init__(
alpha=cfg.MODEL.FOCAL_LOSS_ALPHA,
gamma=cfg.MODEL.FOCAL_LOSS_GAMMA,
start_index=1, # Foreground index
reduction=reduction)
class SmoothL1Loss(nn.SmoothL1Loss):
"""Smoothed l1 loss."""
def forward(self, input, target, *args):
return nn.functional.smooth_l1_loss(
input, target, beta=self.beta,
reduction=self.reduction)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""NN ops."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm import torch
from dragon.vm.torch import nn
class WeightedFusion(nn.Module):
"""Fuse inputs using the weighted sum."""
def __init__(self, num_inputs, fuse_type='sum', init=1.):
super(WeightedFusion, self).__init__()
self.fuse_type = fuse_type
if fuse_type == 'attn' or fuse_type == 'fast_attn':
self.weight = nn.Parameter(torch.Tensor(num_inputs).fill_(init))
elif fuse_type == 'sum':
self.weight = None
else:
raise ValueError('Unknown fuse type: ' + fuse_type)
def forward(self, inputs):
inputs = list(filter(lambda x: x is not None, inputs))
if self.fuse_type == 'attn':
weight = nn.functional.softmax(self.weight, 0)
inputs = [inputs[i] * weight[i] for i in range(len(inputs))]
elif self.fuse_type == 'fast_attn':
# NB: This implementation actually is "slow"
# due to the more kernels are launched.
weight = nn.functional.relu(self.weight)
weight = weight / (weight.sum(0) + 1e-4)
inputs = [inputs[i] * weight[i] for i in range(len(inputs))]
out = inputs[0]
for x in inputs[1:]:
out = out + x
return out
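# A minimal sketch of the fusion above (assuming a Dragon runtime; shapes are
# hypothetical). 'fast_attn' learns one non-negative weight per input and
# normalizes the weights to roughly sum to one; None inputs are skipped:
fuse = WeightedFusion(num_inputs=2, fuse_type='fast_attn')
p3, p4_up = torch.ones(1, 64, 32, 32), torch.ones(1, 64, 32, 32)
out = fuse([p3, p4_up, None])  # fuses the two live inputs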
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Normalization ops."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
from dragon.vm import torch
from dragon.vm.torch import nn
from seetadet.core.config import cfg
class FrozenBatchNorm2d(nn.Module):
"""BatchNorm2d where statistics or affine parameters are fixed."""
def __init__(self, num_features, eps=1e-5, affine=False, inplace=True):
super(FrozenBatchNorm2d, self).__init__()
self.num_features = num_features
self.eps = eps
self.affine = affine
self.inplace = inplace and (not affine)
if self.affine:
self.weight = torch.nn.Parameter(torch.ones(num_features))
self.bias = torch.nn.Parameter(torch.zeros(num_features))
else:
self.register_buffer('weight', torch.ones(num_features))
self.register_buffer('bias', torch.zeros(num_features))
self.register_buffer('running_mean', torch.zeros(num_features))
self.register_buffer('running_var', torch.ones(num_features) - eps)
def extra_repr(self):
affine_str = '{num_features}, eps={eps}, affine={affine}' \
.format(**self.__dict__)
inplace_str = ', inplace' if self.inplace else ''
return affine_str + inplace_str
def forward(self, input):
return nn.functional.affine(
input, self.weight, self.bias,
dim=1, out=input if self.inplace else None)
def _load_from_state_dict(
self,
state_dict,
prefix,
strict,
missing_keys,
unexpected_keys,
error_msgs,
):
super(FrozenBatchNorm2d, self)._load_from_state_dict(
state_dict,
prefix,
strict,
missing_keys,
unexpected_keys,
error_msgs,
)
# Fuse the running stats into weight and bias.
# Note that this resets the stored stats to
# zero mean and unit variance.
with torch.no_grad():
self.running_var.float_().add_(self.eps).sqrt_()
self.weight.float_().div_(self.running_var)
self.bias.float_().sub_(self.running_mean.float_() * self.weight)
self.running_mean.zero_()
self.running_var.one_().sub_(self.eps)
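# A standalone numpy sketch of the fold above (hypothetical values): after
# loading, ``weight`` and ``bias`` absorb the running stats, so the affine
# forward x * weight + bias reproduces the original frozen BN.
import numpy as np
eps = 1e-5
x = np.array([0.5, -1.0, 2.0])
w, b = np.array([1.2, 0.8, 1.0]), np.array([0.1, -0.2, 0.3])
mean, var = np.array([0.4, -0.1, 1.5]), np.array([0.9, 1.1, 0.25])
y_bn = (x - mean) / np.sqrt(var + eps) * w + b
std = np.sqrt(var + eps)
assert np.allclose(y_bn, x * (w / std) + (b - mean * (w / std)))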
class L2Norm(nn.Module):
"""Parameterized L2 normalize."""
def __init__(self, num_features, init=20., eps=1e-5):
super(L2Norm, self).__init__()
self.eps = eps
self.weight = nn.Parameter(torch.Tensor(num_features).fill_(init))
def forward(self, input):
out = nn.functional.normalize(input, p=2, dim=1, eps=self.eps)
return nn.functional.affine(out, self.weight, dim=1)
class ToTensor(nn.Module):
"""Convert input to tensor."""
def __init__(self):
super(ToTensor, self).__init__()
self.device = torch.device('cpu')
self.tensor = torch.ones(1)
self.normalize = functools.partial(
nn.functional.channel_norm,
mean=cfg.MODEL.PIXEL_MEAN,
std=cfg.MODEL.PIXEL_STD,
dim=1, dims=(0, 3, 1, 2),
dtype=cfg.MODEL.PRECISION.lower())
def _apply(self, fn):
fn(self.tensor)
def cpu(self):
self.device = torch.device('cpu')
def cuda(self, device=None):
self.device = torch.device('cuda', device)
def forward(self, input, normalize=False):
if input is None:
return input
if not isinstance(input, torch.Tensor):
input = torch.from_numpy(input)
input = input.to(self.tensor.device)
if normalize and not input.is_floating_point():
input = self.normalize(input)
return input
def to_tensor(input, device='cuda'):
"""Convert input to tensor."""
if input is None:
return input
if not isinstance(input, torch.Tensor):
input = torch.from_numpy(input)
device = torch.device(device, cfg.GPU_ID)
return input.to(device=device)
...@@ -13,16 +13,15 @@ from __future__ import absolute_import ...@@ -13,16 +13,15 @@ from __future__ import absolute_import
from __future__ import division from __future__ import division
from __future__ import print_function from __future__ import print_function
from dragon.vm.onnx.core import exporter
from dragon.vm.onnx.core import helper from dragon.vm.onnx.core import helper
from dragon.vm.onnx.core.exporters import utils as export_util
@exporter.register('RetinanetDecoder') @export_util.register('RetinaNetDecoder')
def retinanet_decoder_exporter(op_def, shape_dict, ws): def retinanet_decoder_exporter(op_def, context):
node, const_tensors = exporter.translate(**locals()) node, const_tensors = export_util.translate(**locals())
node.op_type = 'ATen' # Currently not supported in ai.onnx node.op_type = 'ATen' # Currently not supported in ai.onnx.
helper.add_attribute(node, 'op_type', 'RetinaNetDecoder') helper.add_attribute(node, 'op_type', 'RetinaNetDecoder')
for arg in op_def.arg: for arg in op_def.arg:
if arg.name == 'strides': if arg.name == 'strides':
helper.add_attribute(node, 'strides', arg.ints) helper.add_attribute(node, 'strides', arg.ints)
...@@ -34,16 +33,14 @@ def retinanet_decoder_exporter(op_def, shape_dict, ws): ...@@ -34,16 +33,14 @@ def retinanet_decoder_exporter(op_def, shape_dict, ws):
helper.add_attribute(node, 'pre_nms_top_n', arg.i) helper.add_attribute(node, 'pre_nms_top_n', arg.i)
elif arg.name == 'score_thresh': elif arg.name == 'score_thresh':
helper.add_attribute(node, 'score_thresh', arg.f) helper.add_attribute(node, 'score_thresh', arg.f)
return node, const_tensors return node, const_tensors
@exporter.register('RPNDecoder') @export_util.register('RPNDecoder')
def rpn_decoder_exporter(op_def, shape_dict, ws): def rpn_decoder_exporter(op_def, context):
node, const_tensors = exporter.translate(**locals()) node, const_tensors = export_util.translate(**locals())
node.op_type = 'ATen' # Currently not supported in ai.onnx node.op_type = 'ATen' # Currently not supported in ai.onnx.
helper.add_attribute(node, 'op_type', 'RPNDecoder') helper.add_attribute(node, 'op_type', 'RPNDecoder')
for arg in op_def.arg: for arg in op_def.arg:
if arg.name == 'strides': if arg.name == 'strides':
helper.add_attribute(node, 'strides', arg.ints) helper.add_attribute(node, 'strides', arg.ints)
...@@ -61,9 +58,4 @@ def rpn_decoder_exporter(op_def, shape_dict, ws): ...@@ -61,9 +58,4 @@ def rpn_decoder_exporter(op_def, shape_dict, ws):
helper.add_attribute(node, 'min_level', arg.i) helper.add_attribute(node, 'min_level', arg.i)
elif arg.name == 'max_level': elif arg.name == 'max_level':
helper.add_attribute(node, 'max_level', arg.i) helper.add_attribute(node, 'max_level', arg.i)
elif arg.name == 'canonical_scale':
helper.add_attribute(node, 'canonical_scale', arg.i)
elif arg.name == 'canonical_level':
helper.add_attribute(node, 'canonical_level', arg.i)
return node, const_tensors return node, const_tensors
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Vision ops."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm import torchvision
from dragon.vm.torch import nn
from dragon.vm.torch import autograd
class RoIPooler(nn.Module):
"""Resample RoI features into a fixed resolution."""
def __init__(self, pooler_type='RoIAlign', resolution=7, sampling_ratio=1.0):
super(RoIPooler, self).__init__()
self.pooler_type = pooler_type
self.resolution = resolution
self.sampling_ratio = sampling_ratio
def forward(self, input, boxes, spatial_scale=1.0):
if self.pooler_type == 'RoIPool':
return torchvision.ops.roi_pool(
input, boxes,
output_size=(self.resolution, self.resolution),
spatial_scale=spatial_scale)
elif self.pooler_type == 'RoIAlign':
return torchvision.ops.roi_align(
input, boxes,
output_size=(self.resolution, self.resolution),
spatial_scale=spatial_scale,
sampling_ratio=self.sampling_ratio)
else:
raise NotImplementedError
class NonMaxSuppression(object):
"""Filter out boxes that have high IoU with selected ones."""
@staticmethod
def apply(input, iou_threshold=0.5):
return autograd.Function.apply(
'NonMaxSuppression', input.device, [input],
iou_threshold=iou_threshold)
autograd.Function.register(
'NonMaxSuppression', lambda **kwargs: {
'iou_threshold': kwargs.get('iou_threshold', 0.5),
})
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
from seetadet.core.config import cfg
class _LRScheduler(object):
def __init__(
self,
lr_max,
lr_min=0.,
warmup_steps=0,
warmup_factor=0.,
):
self._step_count = 0
self._lr_max = lr_max
self._lr_min = lr_min
self._warmup_steps = warmup_steps
self._warmup_factor = warmup_factor
self._last_lr = self._lr_max
self._last_steps = self._warmup_steps
def step(self):
self._step_count += 1
def get_lr(self):
if self._step_count < self._warmup_steps:
alpha = (self._step_count + 1.) / self._warmup_steps
decay_factor = self._warmup_factor * (1 - alpha) + alpha
self._last_lr = self._lr_max * decay_factor
return self._last_lr
return self.schedule_impl()
def schedule_impl(self):
raise NotImplementedError
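# A standalone sketch of the warmup branch in ``get_lr`` above (hypothetical
# settings): the LR ramps linearly from lr_max * warmup_factor up to lr_max.
lr_max, warmup_steps, warmup_factor = 0.02, 4, 0.25
for step_count in range(warmup_steps):
    alpha = (step_count + 1.) / warmup_steps
    print(lr_max * (warmup_factor * (1 - alpha) + alpha))
# -> 0.00875, 0.0125, 0.01625, 0.02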
class CosineLR(_LRScheduler):
def __init__(
self,
lr_max,
lr_min,
decay_step,
max_steps,
warmup_steps=0,
warmup_factor=0.,
):
super(CosineLR, self).__init__(
lr_max=lr_max,
lr_min=lr_min,
warmup_steps=warmup_steps,
warmup_factor=warmup_factor,
)
self._decay_step = decay_step
self._max_steps = max_steps - warmup_steps
def schedule_impl(self):
step_count = self._step_count - self._last_steps
if step_count % self._decay_step == 0:
decay_factor = 0.5 * (1. + math.cos(
math.pi * step_count / self._max_steps))
self._last_lr = self._lr_min + \
(self._lr_max - self._lr_min) * decay_factor
return self._last_lr
class MultiStepLR(_LRScheduler):
def __init__(
self,
lr_max,
decay_steps,
decay_gamma,
warmup_steps=0,
warmup_factor=0.,
):
super(MultiStepLR, self).__init__(
lr_max=lr_max,
warmup_steps=warmup_steps,
warmup_factor=warmup_factor,
)
self._decay_steps = decay_steps
self._decay_gamma = decay_gamma
self._stage_count = 0
self._num_stages = len(self._decay_steps)
def schedule_impl(self):
if self._stage_count < self._num_stages:
k = self._decay_steps[self._stage_count]
while self._step_count >= k:
self._stage_count += 1
if self._stage_count >= self._num_stages:
break
k = self._decay_steps[self._stage_count]
self._last_lr = self._lr_max * (
self._decay_gamma ** self._stage_count)
return self._last_lr
class LinearCosineLR(_LRScheduler):
def __init__(
self,
lr_max,
lr_min,
decay_step,
max_steps,
warmup_steps=0,
warmup_factor=0.,
):
super(LinearCosineLR, self).__init__(
lr_max=lr_max,
lr_min=lr_min,
warmup_steps=warmup_steps,
warmup_factor=warmup_factor,
)
self._decay_step = decay_step
self._max_steps = max_steps - warmup_steps
def schedule_impl(self):
step_count = self._step_count - self._last_steps
if step_count % self._decay_step == 0:
linear_decay = 1. - float(step_count) / self._max_steps
cosine_decay = 0.5 * (1. + math.cos(
math.pi * step_count / self._max_steps))
decay_factor = linear_decay * cosine_decay
self._last_lr = \
self._lr_min + (self._lr_max - self._lr_min) * decay_factor
return self._last_lr
class StepLR(_LRScheduler):
def __init__(
self,
lr_max,
decay_step,
decay_gamma,
warmup_steps=0,
warmup_factor=0.,
):
super(StepLR, self).__init__(
lr_max=lr_max,
warmup_steps=warmup_steps,
warmup_factor=warmup_factor,
)
self._decay_step = decay_step
self._decay_gamma = decay_gamma
def schedule_impl(self):
step_count = self._step_count - self._last_steps
if step_count % self._decay_step == 0:
decay_factor = step_count // self._decay_step
self._last_lr = self._lr_max * (
self._decay_gamma ** decay_factor)
return self._last_lr
def get_scheduler():
lr_policy = cfg.SOLVER.LR_POLICY
if lr_policy == 'cosine_decay':
return CosineLR(
lr_max=cfg.SOLVER.BASE_LR,
lr_min=0.,
decay_step=cfg.SOLVER.DECAY_STEP,
max_steps=cfg.SOLVER.MAX_STEPS,
warmup_steps=cfg.SOLVER.WARM_UP_STEPS,
warmup_factor=cfg.SOLVER.WARM_UP_FACTOR,
)
elif lr_policy == 'linear_cosine_decay':
return LinearCosineLR(
lr_max=cfg.SOLVER.BASE_LR,
lr_min=0.,
decay_step=cfg.SOLVER.DECAY_STEP,
max_steps=cfg.SOLVER.MAX_STEPS,
warmup_steps=cfg.SOLVER.WARM_UP_STEPS,
warmup_factor=cfg.SOLVER.WARM_UP_FACTOR,
)
elif lr_policy == 'step':
return StepLR(
lr_max=cfg.SOLVER.BASE_LR,
decay_step=cfg.SOLVER.DECAY_STEP,
decay_gamma=cfg.SOLVER.DECAY_GAMMA,
warmup_steps=cfg.SOLVER.WARM_UP_STEPS,
warmup_factor=cfg.SOLVER.WARM_UP_FACTOR,
)
elif lr_policy == 'steps_with_decay':
return MultiStepLR(
lr_max=cfg.SOLVER.BASE_LR,
decay_steps=cfg.SOLVER.DECAY_STEPS,
decay_gamma=cfg.SOLVER.DECAY_GAMMA,
warmup_steps=cfg.SOLVER.WARM_UP_STEPS,
warmup_factor=cfg.SOLVER.WARM_UP_FACTOR,
)
else:
raise ValueError('Unknown lr policy: ' + lr_policy)
if __name__ == '__main__':
def extract_label(scheduler):
class_name = scheduler.__class__.__name__
label = class_name + '('
if class_name == 'CosineLR':
label += 'α=' + str(scheduler._decay_step)
elif class_name == 'LinearCosineLR':
label += 'α=' + str(scheduler._decay_step)
elif class_name == 'MultiStepLR':
label += 'α=' + str(scheduler._decay_steps) + ', '
label += 'γ=' + str(scheduler._decay_gamma)
elif class_name == 'StepLR':
label += 'α=' + str(scheduler._decay_step) + ', '
label += 'γ=' + str(scheduler._decay_gamma)
label += ')'
return label
vis = True
max_steps = 240
shared_args = {
'lr_max': 0.4,
'warmup_steps': 5,
'warmup_factor': 0.,
}
schedulers = [
StepLR(decay_step=1, decay_gamma=0.97, **shared_args),
MultiStepLR(decay_steps=[60, 120, 180], decay_gamma=0.1, **shared_args),
CosineLR(lr_min=0., decay_step=1, max_steps=max_steps, **shared_args),
LinearCosineLR(lr_min=0., decay_step=1, max_steps=max_steps, **shared_args),
]
for i in range(max_steps):
info = 'Step = %d\n' % i
for scheduler in schedulers:
if i == 0:
scheduler.lr_seq = []
info += ' * {}: {}\n'.format(
extract_label(scheduler),
scheduler.get_lr())
scheduler.lr_seq.append(scheduler.get_lr())
scheduler.step()
if not vis:
print(info)
if vis:
import matplotlib.pyplot as plt
plt.figure(1)
plt.title('Visualization of different LR Schedulers')
plt.xlabel('Step')
plt.ylabel('Learning Rate')
line = '-'
colors = ['r', 'g', 'b', 'c', 'm', 'y', 'k']
for i, scheduler in enumerate(schedulers):
plt.plot(
range(max_steps),
scheduler.lr_seq,
colors[i] + line,
linewidth=1.,
label=extract_label(scheduler),
)
plt.legend()
plt.grid(linestyle='--')
plt.show()
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import dragon.vm.torch as torch
from seetadet.core.config import cfg
from seetadet.modeling.detector import Detector
from seetadet.solver import lr_scheduler
from seetadet.utils import env
from seetadet.utils import time_util
class SGDSolver(object):
def __init__(self):
# Define the generic detector
self.detector = Detector()
# Define the optimizer and its arguments
self.optimizer = torch.optim.SGD(
env.get_param_groups(self.detector),
lr=cfg.SOLVER.BASE_LR,
momentum=cfg.SOLVER.MOMENTUM,
weight_decay=cfg.SOLVER.WEIGHT_DECAY,
clip_norm=float(cfg.SOLVER.CLIP_NORM),
scale=1.0 / cfg.SOLVER.LOSS_SCALING,
)
self.lr_scheduler = lr_scheduler.get_scheduler()
def step(self):
def add_loss(x, y):
return y if x is None else x + y
stats = {
'iter': self.iter,
'loss': {'total': 0.},
'time': time_util.Timer(),
}
with stats['time'].tic_and_toc():
# Forward pass
outputs = self.detector()
# Backward pass
total_loss = None
loss_scaling = cfg.SOLVER.LOSS_SCALING
for k, v in outputs.items():
if 'loss' in k:
if k not in stats['loss']:
stats['loss'][k] = 0.
total_loss = add_loss(total_loss, v)
stats['loss'][k] += float(v)
stats['loss']['total'] += float(total_loss)
if loss_scaling != 1.0:
total_loss *= loss_scaling
total_loss.backward()
# Apply Update
self.base_lr = self.lr_scheduler.get_lr()
self.optimizer.step()
self.lr_scheduler.step()
# Misc stats
stats['lr'] = self.base_lr
stats['time'] = stats['time'].total_time
return stats
@property
def base_lr(self):
return self.optimizer.param_groups[0]['lr']
@base_lr.setter
def base_lr(self, value):
for group in self.optimizer.param_groups:
group['lr'] = value
@property
def iter(self):
return self.lr_scheduler._step_count
@iter.setter
def iter(self, value):
self.lr_scheduler._step_count = value
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
"""A simple attribute dictionary used for representing configuration options."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
class AttrDict(dict):
IMMUTABLE = '__immutable__'
def __init__(self, *args, **kwargs):
super(AttrDict, self).__init__(*args, **kwargs)
self.__dict__[AttrDict.IMMUTABLE] = False
def __getattr__(self, name):
if name in self.__dict__:
return self.__dict__[name]
elif name in self:
return self[name]
else:
raise AttributeError(name)
def __setattr__(self, name, value):
if not self.__dict__[AttrDict.IMMUTABLE]:
if name in self.__dict__:
self.__dict__[name] = value
else:
self[name] = value
else:
raise AttributeError(
'Attempted to set "{}" to "{}", but AttrDict is immutable'
.format(name, value))
def immutable(self, is_immutable):
"""Set immutability to is_immutable and recursively apply the setting
to all nested AttrDicts.
"""
self.__dict__[AttrDict.IMMUTABLE] = is_immutable
# Recursively set immutable state
for v in self.__dict__.values():
if isinstance(v, AttrDict):
v.immutable(is_immutable)
for v in self.values():
if isinstance(v, AttrDict):
v.immutable(is_immutable)
def is_immutable(self):
return self.__dict__[AttrDict.IMMUTABLE]
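# A runnable standalone sketch (the keys are hypothetical): attribute and
# item access are interchangeable, and ``immutable(True)`` locks nested
# AttrDicts as well.
cfg = AttrDict(TRAIN=AttrDict(BATCH_SIZE=2))
cfg.TRAIN.BATCH_SIZE = 4  # OK while mutable
cfg.immutable(True)
# cfg.TRAIN.BATCH_SIZE = 8  would now raise AttributeError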
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Bounding-Box utilities."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.utils.bbox.helper import filter_boxes
from seetadet.utils.bbox.helper import flip_boxes
from seetadet.utils.bbox.helper import flip_polygons
from seetadet.utils.bbox.helper import clip_boxes
from seetadet.utils.bbox.helper import clip_tiled_boxes
from seetadet.utils.bbox.helper import distribute_boxes
from seetadet.utils.bbox.helper import rescale_boxes
from seetadet.utils.bbox.metrics import bbox_overlaps
from seetadet.utils.bbox.metrics import bbox_centerness
from seetadet.utils.bbox.metrics import boxes_iou
from seetadet.utils.bbox.transforms import bbox_transform
from seetadet.utils.bbox.transforms import bbox_transform_inv
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Helper functions for Bounding-Box."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
def clip_boxes(boxes, im_shape):
"""Clip the boxes."""
xmax, ymax = im_shape[1] - 1, im_shape[0] - 1
boxes[:, (0, 2)] = np.maximum(np.minimum(boxes[:, (0, 2)], xmax), 0)
boxes[:, (1, 3)] = np.maximum(np.minimum(boxes[:, (1, 3)], ymax), 0)
return boxes
def clip_tiled_boxes(boxes, im_shape):
"""Clip the tiled boxes."""
xmax, ymax = im_shape[1] - 1, im_shape[0] - 1
boxes[:, 0::4] = np.maximum(np.minimum(boxes[:, 0::4], xmax), 0)
boxes[:, 1::4] = np.maximum(np.minimum(boxes[:, 1::4], ymax), 0)
boxes[:, 2::4] = np.maximum(np.minimum(boxes[:, 2::4], xmax), 0)
boxes[:, 3::4] = np.maximum(np.minimum(boxes[:, 3::4], ymax), 0)
return boxes
def rescale_boxes(boxes, scale_factor=1.):
"""Rescale the boxes."""
w = (boxes[:, 2] - boxes[:, 0]) * 0.5 * scale_factor
h = (boxes[:, 3] - boxes[:, 1]) * 0.5 * scale_factor
x_ctr = (boxes[:, 2] + boxes[:, 0]) * 0.5
y_ctr = (boxes[:, 3] + boxes[:, 1]) * 0.5
boxes_rescaled = np.zeros(boxes.shape)
boxes_rescaled[:, 0], boxes_rescaled[:, 1] = x_ctr - w, y_ctr - h
boxes_rescaled[:, 2], boxes_rescaled[:, 3] = x_ctr + w, y_ctr + h
return boxes_rescaled
def flip_boxes(boxes, width):
"""Flip the boxes horizontally."""
boxes_flipped = boxes.copy()
boxes_flipped[:, 0] = width - boxes[:, 2] - 1
boxes_flipped[:, 2] = width - boxes[:, 0] - 1
return boxes_flipped
def flip_polygons(polygons, width):
"""Flip the polygons horizontally."""
for i, poly in enumerate(polygons):
poly_flipped = poly.copy()
poly_flipped[0::2] = width - poly[0::2] - 1
polygons[i] = poly_flipped
return polygons
def filter_boxes(boxes, min_size):
"""Remove all boxes with any side smaller than min size."""
ws = boxes[:, 2] - boxes[:, 0] + 1
hs = boxes[:, 3] - boxes[:, 1] + 1
keep = np.where((ws >= min_size) & (hs >= min_size))[0]
return keep
def distribute_boxes(boxes, lvl_min, lvl_max):
"""Return the fpn level of boxes."""
if len(boxes) == 0:
return []
ws = boxes[:, 2] - boxes[:, 0] + 1
hs = boxes[:, 3] - boxes[:, 1] + 1
s = np.sqrt(ws * hs)
s0 = 224  # canonical scale
lvl0 = 4  # canonical level
lvls = np.floor(lvl0 + np.log2(s / s0 + 1e-6))
return np.clip(lvls, lvl_min, lvl_max)
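# A standalone sketch of the assignment above (hypothetical boxes): a box of
# canonical size 224 maps to level 4 and each doubling of scale adds one
# level, clipped to [lvl_min, lvl_max].
boxes = np.array([[0., 0., 111., 111.],   # ~112px -> level 3
                  [0., 0., 223., 223.],   # ~224px -> level 4
                  [0., 0., 447., 447.]])  # ~448px -> level 5
print(distribute_boxes(boxes, 2, 5))  # [3. 4. 5.]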
...@@ -8,60 +8,83 @@ ...@@ -8,60 +8,83 @@
# <https://opensource.org/licenses/BSD-2-Clause> # <https://opensource.org/licenses/BSD-2-Clause>
# #
# ------------------------------------------------------------ # ------------------------------------------------------------
"""Box utilities for normalized coordinates.""" """Bounding-Box metrics."""
from __future__ import absolute_import from __future__ import absolute_import
from __future__ import division from __future__ import division
from __future__ import print_function from __future__ import print_function
from seetadet.utils.bbox import cython_bbox
import numpy as np import numpy as np
def bbox_overlaps(boxes1, boxes2):
"""Compute the overlaps between two group of boxes."""
boxes1 = np.ascontiguousarray(boxes1, dtype=np.float64)
boxes2 = np.ascontiguousarray(boxes2, dtype=np.float64)
return cython_bbox.bbox_overlaps(boxes1, boxes2)
def bbox_centerness(boxes1, boxes2):
"""Compute centerness of the boxes to ground-truth."""
ctr_x = (boxes1[:, 2] + boxes1[:, 0]) / 2
ctr_y = (boxes1[:, 3] + boxes1[:, 1]) / 2
l = ctr_x - boxes2[:, 0]
t = ctr_y - boxes2[:, 1]
r = boxes2[:, 2] - ctr_x
b = boxes2[:, 3] - ctr_y
centerness = ((np.minimum(l, r) / np.maximum(l, r)) *
(np.minimum(t, b) / np.maximum(t, b)))
min_dist = np.stack([l, t, r, b], axis=1).min(axis=1)
keep_inds = np.where(min_dist > 0.01)[0]
discard_inds = np.where(min_dist <= 0.01)[0]
centerness[keep_inds] = np.sqrt(centerness[keep_inds])
centerness[discard_inds] = -1
return centerness, keep_inds, discard_inds
def boxes_area(boxes): def boxes_area(boxes):
"""Compute the area of input boxes.""" """Compute the area of input boxes."""
w = (boxes[:, 2] - boxes[:, 0]) w = (boxes[:, 2] - boxes[:, 0])
h = (boxes[:, 3] - boxes[:, 1]) h = (boxes[:, 3] - boxes[:, 1])
area = w * h return w * h
assert np.all(area >= 0), 'Negative areas founds'
return area
def intersection(box1, box2): def boxes_intersection(boxes1, boxes2):
"""Compute intersection between boxes.""" """Compute intersection between boxes."""
[y_min1, x_min1, y_max1, x_max1] = np.split(box1, 4, axis=1) [y_min1, x_min1, y_max1, x_max1] = np.split(boxes1, 4, axis=1)
[y_min2, x_min2, y_max2, x_max2] = np.split(box2, 4, axis=1) [y_min2, x_min2, y_max2, x_max2] = np.split(boxes2, 4, axis=1)
all_pairs_min_ymax = np.minimum(y_max1, np.transpose(y_max2)) all_pairs_min_ymax = np.minimum(y_max1, np.transpose(y_max2))
all_pairs_max_ymin = np.maximum(y_min1, np.transpose(y_min2)) all_pairs_max_ymin = np.maximum(y_min1, np.transpose(y_min2))
all_pairs_min_xmax = np.minimum(x_max1, np.transpose(x_max2)) all_pairs_min_xmax = np.minimum(x_max1, np.transpose(x_max2))
all_pairs_max_xmin = np.maximum(x_min1, np.transpose(x_min2)) all_pairs_max_xmin = np.maximum(x_min1, np.transpose(x_min2))
inter_heights = np.maximum( inter_heights = np.maximum(np.zeros(all_pairs_max_ymin.shape),
np.zeros(all_pairs_max_ymin.shape),
all_pairs_min_ymax - all_pairs_max_ymin) all_pairs_min_ymax - all_pairs_max_ymin)
inter_widths = np.maximum( inter_widths = np.maximum(np.zeros(all_pairs_max_xmin.shape),
np.zeros(all_pairs_max_xmin.shape),
all_pairs_min_xmax - all_pairs_max_xmin) all_pairs_min_xmax - all_pairs_max_xmin)
return inter_heights * inter_widths return inter_heights * inter_widths
def ioa1(box1, box2): def boxes_ioa1(boxes1, boxes2):
"""Compute intersection-over-area1 between boxes.""" """Compute intersection-over-area1 between boxes."""
inter = intersection(box1, box2) inter = boxes_intersection(boxes1, boxes2)
area = np.expand_dims(boxes_area(box1), axis=1) area = np.expand_dims(boxes_area(boxes1), axis=1)
return inter / area return inter / area
def ioa2(box1, box2): def boxes_ioa2(boxes1, boxes2):
"""Compute intersection-over-area2 between boxes.""" """Compute intersection-over-area2 between boxes."""
inter = intersection(box1, box2) inter = boxes_intersection(boxes1, boxes2)
area = np.expand_dims(boxes_area(box2), axis=0) area = np.expand_dims(boxes_area(boxes2), axis=0)
return inter / area return inter / area
def iou(box1, box2): def boxes_iou(boxes1, boxes2):
"""Compute intersection-over-union between boxes.""" """Compute intersection-over-union between boxes."""
inter = intersection(box1, box2) inter = boxes_intersection(boxes1, boxes2)
area1 = boxes_area(box1) area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])
area2 = boxes_area(box2) area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])
union = (np.expand_dims(area1, axis=1) + union = (np.expand_dims(area1, axis=1) +
np.expand_dims(area2, axis=0) - inter) np.expand_dims(area2, axis=0) - inter)
return inter / union return inter / union
...@@ -8,7 +8,7 @@ ...@@ -8,7 +8,7 @@
# <https://opensource.org/licenses/BSD-2-Clause> # <https://opensource.org/licenses/BSD-2-Clause>
# #
# ------------------------------------------------------------ # ------------------------------------------------------------
"""Box utilities for original coordinates.""" """Bounding-Box transforms."""
from __future__ import absolute_import from __future__ import absolute_import
from __future__ import division from __future__ import division
...@@ -16,15 +16,7 @@ from __future__ import print_function ...@@ -16,15 +16,7 @@ from __future__ import print_function
import numpy as np import numpy as np
from seetadet.utils import cython_bbox _DEFAULT_SCALE_CLIP = np.log(1333.0 / 4.0)
def bbox_overlaps(boxes1, boxes2):
"""Compute the overlaps between two group of boxes."""
return cython_bbox.bbox_overlaps(
np.ascontiguousarray(boxes1, dtype=np.float),
np.ascontiguousarray(boxes2, dtype=np.float),
)
def bbox_transform(ex_rois, gt_rois, weights=(1., 1., 1., 1.)): def bbox_transform(ex_rois, gt_rois, weights=(1., 1., 1., 1.)):
...@@ -33,137 +25,41 @@ def bbox_transform(ex_rois, gt_rois, weights=(1., 1., 1., 1.)): ...@@ -33,137 +25,41 @@ def bbox_transform(ex_rois, gt_rois, weights=(1., 1., 1., 1.)):
ex_heights = ex_rois[:, 3] - ex_rois[:, 1] + 1. ex_heights = ex_rois[:, 3] - ex_rois[:, 1] + 1.
ex_ctr_x = ex_rois[:, 0] + 0.5 * ex_widths ex_ctr_x = ex_rois[:, 0] + 0.5 * ex_widths
ex_ctr_y = ex_rois[:, 1] + 0.5 * ex_heights ex_ctr_y = ex_rois[:, 1] + 0.5 * ex_heights
gt_widths = gt_rois[:, 2] - gt_rois[:, 0] + 1. gt_widths = gt_rois[:, 2] - gt_rois[:, 0] + 1.
gt_heights = gt_rois[:, 3] - gt_rois[:, 1] + 1. gt_heights = gt_rois[:, 3] - gt_rois[:, 1] + 1.
gt_ctr_x = gt_rois[:, 0] + 0.5 * gt_widths gt_ctr_x = gt_rois[:, 0] + 0.5 * gt_widths
gt_ctr_y = gt_rois[:, 1] + 0.5 * gt_heights gt_ctr_y = gt_rois[:, 1] + 0.5 * gt_heights
wx, wy, ww, wh = weights wx, wy, ww, wh = weights
targets = [wx * (gt_ctr_x - ex_ctr_x) / ex_widths] targets = [wx * (gt_ctr_x - ex_ctr_x) / ex_widths]
targets += [wy * (gt_ctr_y - ex_ctr_y) / ex_heights] targets += [wy * (gt_ctr_y - ex_ctr_y) / ex_heights]
targets += [ww * np.log(gt_widths / ex_widths)] targets += [ww * np.log(gt_widths / ex_widths)]
targets += [wh * np.log(gt_heights / ex_heights)] targets += [wh * np.log(gt_heights / ex_heights)]
return np.vstack(targets).transpose() return np.vstack(targets).transpose()
def bbox_centerness(ex_rois, gt_rois): def bbox_transform_inv(boxes, deltas, weights=(1., 1., 1., 1.)):
"""Compute centerness of the boxes to ground-truth."""
ex_ctr_x = (ex_rois[:, 2] + ex_rois[:, 0]) / 2
ex_ctr_y = (ex_rois[:, 3] + ex_rois[:, 1]) / 2
l = ex_ctr_x - gt_rois[:, 0]
t = ex_ctr_y - gt_rois[:, 1]
r = gt_rois[:, 2] - ex_ctr_x
b = gt_rois[:, 3] - ex_ctr_y
centerness = \
(np.minimum(l, r) / np.maximum(l, r)) * \
(np.minimum(t, b) / np.maximum(t, b))
min_dist = np.stack([l, t, r, b], axis=1).min(axis=1)
keep_inds = np.where(min_dist > 0.01)[0]
discard_inds = np.where(min_dist <= 0.01)[0]
centerness[keep_inds] = np.sqrt(centerness[keep_inds])
centerness[discard_inds] = -1
return centerness, keep_inds, discard_inds
def bbox_transform_inv(boxes, deltas, weights=(1., 1., 1., 1.), clip=None):
"""Decode the final boxes according to the deltas.""" """Decode the final boxes according to the deltas."""
if boxes.shape[0] == 0: if boxes.shape[0] == 0:
return np.zeros((0, deltas.shape[1]), dtype=deltas.dtype) return np.zeros((0, deltas.shape[1]), dtype=deltas.dtype)
boxes = boxes.astype(deltas.dtype, copy=False) boxes = boxes.astype(deltas.dtype, copy=False)
widths = boxes[:, 2] - boxes[:, 0] + 1. widths = boxes[:, 2] - boxes[:, 0] + 1.
heights = boxes[:, 3] - boxes[:, 1] + 1. heights = boxes[:, 3] - boxes[:, 1] + 1.
ctr_x = boxes[:, 0] + 0.5 * widths ctr_x = boxes[:, 0] + 0.5 * widths
ctr_y = boxes[:, 1] + 0.5 * heights ctr_y = boxes[:, 1] + 0.5 * heights
wx, wy, ww, wh = weights wx, wy, ww, wh = weights
dx = deltas[:, 0::4] / wx dx = deltas[:, 0::4] / wx
dy = deltas[:, 1::4] / wy dy = deltas[:, 1::4] / wy
dw = deltas[:, 2::4] / ww dw = deltas[:, 2::4] / ww
dh = deltas[:, 3::4] / wh dh = deltas[:, 3::4] / wh
dw = np.minimum(dw, _DEFAULT_SCALE_CLIP)
# Heuristically clip height and width deltas dh = np.minimum(dh, _DEFAULT_SCALE_CLIP)
# to avoid too large value in np.exp(...)
if clip is not None:
dw = np.minimum(dw, clip)
dh = np.minimum(dh, clip)
pred_ctr_x = dx * widths[:, np.newaxis] + ctr_x[:, np.newaxis] pred_ctr_x = dx * widths[:, np.newaxis] + ctr_x[:, np.newaxis]
pred_ctr_y = dy * heights[:, np.newaxis] + ctr_y[:, np.newaxis] pred_ctr_y = dy * heights[:, np.newaxis] + ctr_y[:, np.newaxis]
pred_w = np.exp(dw) * widths[:, np.newaxis] pred_w = np.exp(dw) * widths[:, np.newaxis]
pred_h = np.exp(dh) * heights[:, np.newaxis] pred_h = np.exp(dh) * heights[:, np.newaxis]
pred_boxes = np.zeros(deltas.shape, dtype=deltas.dtype) pred_boxes = np.zeros(deltas.shape, dtype=deltas.dtype)
pred_boxes[:, 0::4] = pred_ctr_x - 0.5 * pred_w # x1 pred_boxes[:, 0::4] = pred_ctr_x - 0.5 * pred_w
pred_boxes[:, 1::4] = pred_ctr_y - 0.5 * pred_h # y1 pred_boxes[:, 1::4] = pred_ctr_y - 0.5 * pred_h
pred_boxes[:, 2::4] = pred_ctr_x + 0.5 * pred_w - 1 # x2 pred_boxes[:, 2::4] = pred_ctr_x + 0.5 * pred_w - 1
pred_boxes[:, 3::4] = pred_ctr_y + 0.5 * pred_h - 1 # y2 pred_boxes[:, 3::4] = pred_ctr_y + 0.5 * pred_h - 1
return pred_boxes return pred_boxes
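# A standalone round-trip sketch (hypothetical boxes): encoding ground-truth
# boxes against anchors with ``bbox_transform`` and decoding the deltas with
# ``bbox_transform_inv`` recovers the ground truth.
anchors = np.array([[10., 10., 50., 60.]])
gt_boxes = np.array([[12., 8., 48., 66.]])
deltas = bbox_transform(anchors, gt_boxes)
assert np.allclose(bbox_transform_inv(anchors, deltas), gt_boxes)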
def clip_boxes(boxes, im_shape):
# x1 >= 0
boxes[:, 0] = np.maximum(np.minimum(boxes[:, 0], im_shape[1] - 1), 0)
# y1 >= 0
boxes[:, 1] = np.maximum(np.minimum(boxes[:, 1], im_shape[0] - 1), 0)
# x2 < im_shape[1]
boxes[:, 2] = np.maximum(np.minimum(boxes[:, 2], im_shape[1] - 1), 0)
# y2 < im_shape[0]
boxes[:, 3] = np.maximum(np.minimum(boxes[:, 3], im_shape[0] - 1), 0)
return boxes
def clip_tiled_boxes(boxes, im_shape):
# x1 >= 0
boxes[:, 0::4] = np.maximum(np.minimum(boxes[:, 0::4], im_shape[1] - 1), 0)
# y1 >= 0
boxes[:, 1::4] = np.maximum(np.minimum(boxes[:, 1::4], im_shape[0] - 1), 0)
# x2 < im_shape[1]
boxes[:, 2::4] = np.maximum(np.minimum(boxes[:, 2::4], im_shape[1] - 1), 0)
# y2 < im_shape[0]
boxes[:, 3::4] = np.maximum(np.minimum(boxes[:, 3::4], im_shape[0] - 1), 0)
return boxes
def expand_boxes(boxes, scale):
"""Expand an array of boxes by a given scale."""
w_half = (boxes[:, 2] - boxes[:, 0]) * .5
h_half = (boxes[:, 3] - boxes[:, 1]) * .5
x_c = (boxes[:, 2] + boxes[:, 0]) * .5
y_c = (boxes[:, 3] + boxes[:, 1]) * .5
w_half *= scale
h_half *= scale
boxes_exp = np.zeros(boxes.shape)
boxes_exp[:, 0] = x_c - w_half
boxes_exp[:, 2] = x_c + w_half
boxes_exp[:, 1] = y_c - h_half
boxes_exp[:, 3] = y_c + h_half
return boxes_exp
def flip_boxes(boxes, width):
"""Flip the boxes horizontally."""
boxes_flipped = boxes.copy()
boxes_flipped[:, 0] = width - boxes[:, 2] - 1
boxes_flipped[:, 2] = width - boxes[:, 0] - 1
return boxes_flipped
def flip_polygons(polygons, width):
"""Flip the polygons horizontally."""
for i, poly in enumerate(polygons):
poly_flipped = poly.copy()
poly_flipped[0::2] = width - poly[0::2] - 1
polygons[i] = poly_flipped
return polygons
def filter_boxes(boxes, min_size):
"""Remove all boxes with any side smaller than min size."""
ws = boxes[:, 2] - boxes[:, 0] + 1
hs = boxes[:, 3] - boxes[:, 1] + 1
keep = np.where((ws >= min_size) & (hs >= min_size))[0]
return keep
...@@ -7,11 +7,8 @@ ...@@ -7,11 +7,8 @@
# #
# <https://opensource.org/licenses/BSD-2-Clause> # <https://opensource.org/licenses/BSD-2-Clause>
# #
# Codes are based on:
#
# <https://github.com/rbgirshick/py-faster-rcnn/blob/master/lib/utils/blob.py>
#
# ------------------------------------------------------------ # ------------------------------------------------------------
"""Blob utilities."""
from __future__ import absolute_import from __future__ import absolute_import
from __future__ import division from __future__ import division
...@@ -19,26 +16,29 @@ from __future__ import print_function ...@@ -19,26 +16,29 @@ from __future__ import print_function
import numpy as np import numpy as np
from seetadet.core.config import cfg
def im_list_to_blob(ims, coarsest_stride=0): def blob_vstack(arrays, fill_value=None, dtype=None, size=None, align=None):
"""Convert a list of images into a network input.""" """Stack arrays in sequence vertically."""
blob_dtype = 'uint8' if ims[0].dtype == 'uint8' else 'float32' if fill_value is None:
max_shape = np.array([im.shape for im in ims]).max(axis=0) return np.vstack(arrays)
if coarsest_stride > 0: # Compute the max stack shape.
stride = coarsest_stride max_shape = np.max(np.stack([arr.shape for arr in arrays]), 0)
max_shape[0] = int(np.ceil(max_shape[0] / stride) * stride) if size is not None and min(size) > 0:
max_shape[1] = int(np.ceil(max_shape[1] / stride) * stride) max_shape[:len(size)] = size
if align is not None and min(align) > 0:
align_size = np.ceil(max_shape[:len(align)] / align)
max_shape[:len(align)] = align_size.astype('int64') * align
blob_shape = (len(ims), max_shape[0], max_shape[1], 3) # Fill output with the given value.
blob = np.empty(blob_shape, blob_dtype) output_dtype = dtype or arrays[0].dtype
blob[:] = cfg.PIXEL_MEANS output_shape = [len(arrays)] + list(max_shape)
output = np.empty(output_shape, output_dtype)
output[:] = fill_value
for i, im in enumerate(ims): # Copy arrays.
if im.dtype == 'uint16': for i, arr in enumerate(arrays):
im = im.astype(blob_dtype) / 256. copy_slices = (slice(0, d) for d in arr.shape)
blob[i, :im.shape[0], :im.shape[1], :] = im output[(i,) + tuple(copy_slices)] = arr
return blob return output
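# A standalone sketch of ``blob_vstack`` above (hypothetical images): two HWC
# arrays of different sizes are padded with the fill value into one batch
# whose spatial dims are rounded up to a multiple of the alignment.
ims = [np.ones((30, 40, 3), 'uint8'), np.ones((50, 20, 3), 'uint8')]
blob = blob_vstack(ims, fill_value=127, align=(32, 32, 1))
print(blob.shape)  # (2, 64, 64, 3): max shape (50, 40, 3) aligned up to 32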
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import importlib.machinery
import os
import dragon
import numpy as np
from dragon.core.util import six
from dragon.vm import torch
from seetadet.core.config import cfg
def freeze_module(module):
"""Freeze parameters of given module.
Parameters
----------
module : dragon.vm.torch.nn.Module
The module whose parameters will be frozen.
"""
for param in list(module._parameters.keys()):
module._parameters[param].requires_grad = False
module._buffers[param] = module._parameters[param]
del module._parameters[param]
def get_param_groups(module):
"""Separate parameters for different weight decay.
Parameters
----------
module : dragon.vm.torch.nn.Module
The module to collect parameters from.
Returns
-------
Sequence[ParamGroup]
The parameter groups.
"""
param_groups = [
{'params': [], 'weight_decay': cfg.SOLVER.WEIGHT_DECAY},
{'params': [], 'weight_decay': 0.},
{'params': [], 'weight_decay': cfg.SOLVER.WEIGHT_DECAY_BIAS},
]
legacy_biases = set()
for name, param in module.named_parameters():
if name.endswith('weight') and param.dim() > 1:
legacy_biases.add(name[:-6] + 'bias')
for name, param in module.named_parameters():
gi = 0 if 'weight' in name and param.dim() > 1 else 1
if gi > 0 and name in legacy_biases:
gi = 2
param_groups[gi]['params'].append(param)
return list(filter(lambda g: len(g['params']) > 0, param_groups))
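# A minimal sketch of the grouping rule above (assuming a Dragon runtime;
# ``detector`` stands for any built module): weights with dim > 1 take
# SOLVER.WEIGHT_DECAY, biases paired with such a weight take
# SOLVER.WEIGHT_DECAY_BIAS, and the rest get no decay, e.g.
#
#   groups = get_param_groups(detector)
#   optim = torch.optim.SGD(groups, lr=cfg.SOLVER.BASE_LR, momentum=0.9)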
def load_library(library_prefix):
"""Load a shared library.
Parameters
----------
library_prefix : str
The path prefix of the library.
"""
loader_details = (
importlib.machinery.ExtensionFileLoader,
importlib.machinery.EXTENSION_SUFFIXES)
library_prefix = os.path.abspath(library_prefix)
lib_dir, fullname = os.path.split(library_prefix)
finder = importlib.machinery.FileFinder(lib_dir, loader_details)
ext_specs = finder.find_spec(fullname)
if ext_specs is None:
raise ImportError(
'Could not find the pre-built library '
'for <%s>.' % library_prefix)
dragon.load_library(ext_specs.origin)
def new_tensor(data, enforce_cpu=False):
"""Create a new tensor from the data.
Parameters
----------
data : array_like
The data value.
enforce_cpu : bool, optional, default=False
**True** to enforce the cpu storage.
Returns
-------
dragon.vm.torch.Tensor
The tensor holding the data.
"""
if data is None:
return data
if isinstance(data, np.ndarray):
tensor = torch.from_numpy(data)
elif isinstance(data, torch.Tensor):
tensor = data
else:
tensor = torch.tensor(data)
if not enforce_cpu:
tensor = tensor.cuda(cfg.GPU_ID)
return tensor
# Aliases
pickle = six.moves.pickle
...@@ -8,111 +8,74 @@ ...@@ -8,111 +8,74 @@
# <https://opensource.org/licenses/BSD-2-Clause> # <https://opensource.org/licenses/BSD-2-Clause>
# #
# ------------------------------------------------------------ # ------------------------------------------------------------
"""Image utilities."""
from __future__ import absolute_import from __future__ import absolute_import
from __future__ import division from __future__ import division
from __future__ import print_function from __future__ import print_function
import numpy as np import numpy as np
import numpy.random as npr
import PIL.Image import PIL.Image
import PIL.ImageEnhance import PIL.ImageEnhance
from seetadet.core.config import cfg
def im_resize(img, size=None, scale=None, mode='linear'):
def distort_image(img): """Resize image by the scale or size."""
"""Distort the brightness, contrast and color of an image."""
img = PIL.Image.fromarray(img)
transforms = [PIL.ImageEnhance.Brightness,
PIL.ImageEnhance.Contrast,
PIL.ImageEnhance.Color]
npr.shuffle(transforms)
for transform in transforms:
if npr.uniform() < 0.5:
img = transform(img)
img = img.enhance(1. + np.random.uniform(-.4, .4))
return np.array(img)
def get_image_with_target_size(img, target_size, no_offset=False):
"""Crop or pad an image with the target size."""
im_shape = list(img.shape)
if not isinstance(target_size, (tuple, list)):
target_size = [target_size, target_size]
h_diff = target_size[0] - im_shape[0]
w_diff = target_size[1] - im_shape[1]
def get_param(diff, crop, no_offset):
diff = max(-diff if crop else diff, 0)
return 0 if no_offset else npr.randint(diff + 1)
offset_crop_w = get_param(w_diff, True, no_offset)
offset_crop_h = get_param(h_diff, True, no_offset)
im_shape[:2] = target_size
new_img = np.empty(im_shape, dtype=img.dtype)
new_img[:] = cfg.PIXEL_MEANS
new_img[:img.shape[0], :img.shape[1]] = \
img[offset_crop_h:offset_crop_h + target_size[0],
offset_crop_w:offset_crop_w + target_size[1]]
offset_w = -offset_crop_w
offset_h = -offset_crop_h
return new_img, (offset_h, offset_w, target_size)
def resize_image(img, fx=1.0, fy=1.0, size=None):
"""Resize an image."""
if size is None: if size is None:
size = (int(img.shape[1] * fx), int(img.shape[0] * fy)) if not isinstance(scale, (tuple, list)):
scale = (scale, scale)
h, w = img.shape[:2]
size = int(h * scale[0] + .5), int(w * scale[1] + .5)
else: else:
if not isinstance(size, (tuple, list)): if not isinstance(size, (tuple, list)):
size = (size, size) size = (size, size)
mode = {'linear': PIL.Image.BILINEAR,
'nearest': PIL.Image.NEAREST}[mode]
img = PIL.Image.fromarray(img) img = PIL.Image.fromarray(img)
return np.array(img.resize(size, PIL.Image.BILINEAR)) return np.array(img.resize(size[::-1], mode))
def resize_image_with_target_size( def im_rescale(img, scales, max_size=0, keep_ratio=True):
img, """Rescale image to match the detecting scales."""
target_size,
max_size=0,
random_scales=(1.0, 1.0),
):
"""Resize an image with the target size."""
im_shape = img.shape im_shape = img.shape
max_size = max_size if max_size > 0 else target_size img_list, img_scales = [], []
# Scale along the shortest side if keep_ratio:
im_size_min = np.min(im_shape[:2]) size_min = np.min(im_shape[:2])
im_size_max = np.max(im_shape[:2]) size_max = np.max(im_shape[:2])
im_scale = float(target_size) / float(im_size_min)
# Prevent the biggest axis from being more than MAX_SIZE
if np.round(im_scale * im_size_max) > max_size:
im_scale = float(max_size) / float(im_size_max)
# Apply the scale jitter to get a range of dynamic scales
r = random_scales
jitter = r[0] + npr.rand() * (r[1] - r[0])
im_scale *= jitter
return resize_image(img, im_scale, im_scale), im_scale
def scale_image(img, scales, max_size=0):
"""Resize image to match the detecting scales."""
processed_images, image_scales = [], []
if max_size > 0:
im_size_min = np.min(img.shape[:2])
im_size_max = np.max(img.shape[:2])
for target_size in scales: for target_size in scales:
im_scale = float(target_size) / float(im_size_min) im_scale = float(target_size) / float(size_min)
if np.round(im_scale * im_size_max) > cfg.TEST.MAX_SIZE: target_size_max = max_size if max_size > 0 else target_size
im_scale = float(cfg.TEST.MAX_SIZE) / float(im_size_max) if np.round(im_scale * size_max) > target_size_max:
processed_images.append(resize_image(img, im_scale, im_scale)) im_scale = float(target_size_max) / float(size_max)
image_scales.append(im_scale) img_list.append(im_resize(img, scale=im_scale))
img_scales.append((im_scale, im_scale))
else: else:
for target_size in scales: for target_size in scales:
fy = float(target_size) / img.shape[0] h_scale = float(target_size) / im_shape[0]
fx = float(target_size) / img.shape[1] w_scale = float(target_size) / im_shape[1]
processed_images.append(resize_image(img, size=target_size)) img_list.append(im_resize(img, size=target_size))
image_scales.append([fy, fx]) img_scales.append((h_scale, w_scale))
return processed_images, image_scales return img_list, img_scales
def color_jitter(img, brightness=None, contrast=None, saturation=None):
"""Distort the color of image."""
def add_transform(transforms, type, range):
if range is not None:
if not isinstance(range, (tuple, list)):
range = (1. - range, 1. + range)
transforms.append((type, range))
transforms = []
contrast_first = np.random.rand() < 0.5
add_transform(transforms, PIL.ImageEnhance.Brightness, brightness)
if contrast_first:
add_transform(transforms, PIL.ImageEnhance.Contrast, contrast)
add_transform(transforms, PIL.ImageEnhance.Color, saturation)
if not contrast_first:
add_transform(transforms, PIL.ImageEnhance.Contrast, contrast)
for transform, jitter_range in transforms:
if isinstance(img, np.ndarray):
img = PIL.Image.fromarray(img)
img = transform(img)
img = img.enhance(np.random.uniform(*jitter_range))
return np.asarray(img)
...@@ -7,11 +7,8 @@ ...@@ -7,11 +7,8 @@
# #
# <https://opensource.org/licenses/BSD-2-Clause> # <https://opensource.org/licenses/BSD-2-Clause>
# #
# Codes are based on:
#
# <https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/platform/tf_logging.py>
#
# ------------------------------------------------------------ # ------------------------------------------------------------
"""Logging utilities."""
from __future__ import absolute_import from __future__ import absolute_import
from __future__ import division from __future__ import division
...@@ -25,7 +22,6 @@ import threading ...@@ -25,7 +22,6 @@ import threading
_logger = None _logger = None
_is_root = True
_logger_lock = threading.Lock() _logger_lock = threading.Lock()
...@@ -38,11 +34,12 @@ def get_logger(): ...@@ -38,11 +34,12 @@ def get_logger():
try: try:
if _logger: if _logger:
return _logger return _logger
logger = _logging.getLogger('SeetaDet') logger = _logging.getLogger('seetadet')
logger.setLevel('INFO') logger.setLevel('INFO')
logger.propagate = False logger.propagate = False
logger._is_root = True
if True: if True:
# Determine whether we are in an interactive environment # Determine whether we are in an interactive environment.
_interactive = False _interactive = False
try: try:
# This is only defined in interactive shells. # This is only defined in interactive shells.
...@@ -108,14 +105,24 @@ def get_verbosity(): ...@@ -108,14 +105,24 @@ def get_verbosity():
def set_verbosity(v): def set_verbosity(v):
"""Sets the threshold for what messages will be logged.""" """Set the threshold for what messages will be logged."""
get_logger().setLevel(v) get_logger().setLevel(v)
def set_root_logger(is_root=True): def set_formatter(fmt=None, datefmt=None):
global _is_root """Set the formatter."""
_is_root = is_root handler = _logging.StreamHandler(_sys.stderr)
handler.setFormatter(_logging.Formatter(fmt, datefmt))
logger = get_logger()
logger.removeHandler(logger.handlers[0])
logger.addHandler(handler)
def set_root(is_root=True):
"""Set logger to the root."""
get_logger()._is_root = is_root
def is_root(): def is_root():
return _is_root """Return logger is the root."""
return get_logger()._is_root
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Mask utilities with boxes."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import copy
import cv2
import numpy as np
import PIL.Image
from seetadet.utils.pycocotools import mask as mask_tools
from seetadet.utils import boxes as box_util
def warp_mask_via_intersection(mask, box1, box2, size):
"""Warp mask via intersection."""
x1 = max(box1[0], box2[0])
y1 = max(box1[1], box2[1])
x2 = min(box1[2], min(box2[2], mask.shape[1] - 1))
y2 = min(box1[3], min(box2[3], mask.shape[0] - 1))
if x1 > x2 or y1 > y2:
return None
w = x2 - x1 + 1
h = y2 - y1 + 1
ex_start_y = y1 - box1[1]
ex_start_x = x1 - box1[0]
inter_mask = mask[y1:y2 + 1, x1:x2 + 1]
target_h = box1[3] - box1[1] + 1
target_w = box1[2] - box1[0] + 1
warped_mask = np.zeros((target_h, target_w), dtype='uint8')
warped_mask[ex_start_y:ex_start_y + h,
ex_start_x:ex_start_x + w] = inter_mask
if not isinstance(size, (tuple, list)):
size = (size, size)
mask = PIL.Image.fromarray(warped_mask)
mask = mask.resize((size[1], size[0]), PIL.Image.NEAREST)
return np.array(mask)
def warp_mask_via_polygons(polygons, box, size):
"""Warp mask via polygons."""
w, h = box[2] - box[0], box[3] - box[1]
if not isinstance(size, (tuple, list)):
size = (size, size)
ratio_h = size[0] / max(h, 0.1)
ratio_w = size[1] / max(w, 0.1)
polygons = copy.deepcopy(polygons)
for p in polygons:
p[0::2] = p[0::2] - box[0]
p[1::2] = p[1::2] - box[1]
if ratio_h == ratio_w:
for p in polygons:
p *= ratio_h
else:
for p in polygons:
p[0::2] *= ratio_w
p[1::2] *= ratio_h
rle_objs = mask_tools.frPyObjects(polygons, size[0], size[1])
rle_objs = [mask_tools.merge(rle_objs)]
return mask_tools.decode(rle_objs)[:, :, 0]
def mask_overlap(box1, box2, mask1, mask2):
"""Compute the overlap of two masks."""
x1 = max(box1[0], box2[0])
y1 = max(box1[1], box2[1])
x2 = min(box1[2], box2[2])
y2 = min(box1[3], box2[3])
if x1 > x2 or y1 > y2:
return 0
w = x2 - x1 + 1
h = y2 - y1 + 1
# Get masks in the intersection part
start_ya = y1 - box1[1]
start_xa = x1 - box1[0]
inter_mask_a = mask1[start_ya: start_ya + h, start_xa:start_xa + w]
start_yb = y1 - box2[1]
start_xb = x1 - box2[0]
inter_mask_b = mask2[start_yb: start_yb + h, start_xb:start_xb + w]
assert inter_mask_a.shape == inter_mask_b.shape
inter = np.logical_and(inter_mask_b, inter_mask_a).sum()
union = mask1.sum() + mask2.sum() - inter
if union < 1.:
return 0.
return float(inter) / float(union)
def project_masks(
masks,
boxes,
height,
width,
thresh=0.5,
data_format='HWC',
data_order='F',
):
"""Project the predicting masks to a image.
Parameters
----------
masks : numpy.ndarray
The masks packed in (C, H, W) format.
boxes : numpy.ndarray
        The predicted bounding boxes.
height : int
The height of image.
width : int
The width of image.
thresh : float, optional, default=0.5
        The threshold to binarize the floating-point masks.
data_format : {'HWC', 'CHW'}, optional
The data format of output image.
data_order : {'F', 'C'}, optional
        The Fortran-style or C-style memory order.
Returns
-------
numpy.ndarray
The output image.
"""
num_pred = boxes.shape[0]
assert masks.shape[0] == num_pred
mask_shape = [height, width]
if data_format == 'HWC':
mask_shape += [num_pred]
elif data_format == 'CHW':
mask_shape = [num_pred] + mask_shape
else:
raise ValueError('Unknown data format', data_format)
mask_image = np.zeros(mask_shape, 'uint8', data_order)
size = masks[0].shape[0]
scale = (size + 2.) / size
ref_boxes = box_util.expand_boxes(boxes, scale)
ref_boxes = ref_boxes.astype(np.int32)
padded_mask = np.zeros((size + 2, size + 2), 'float32')
for i in range(num_pred):
ref_box = ref_boxes[i, :4]
mask = masks[i]
padded_mask[1:-1, 1:-1] = mask[:, :]
w = ref_box[2] - ref_box[0] + 1
h = ref_box[3] - ref_box[1] + 1
w = np.maximum(w, 1)
h = np.maximum(h, 1)
mask = cv2.resize(padded_mask, (w, h))
mask = np.array(mask >= thresh, 'uint8')
x1 = max(ref_box[0], 0)
y1 = max(ref_box[1], 0)
x2 = min(ref_box[2] + 1, width)
y2 = min(ref_box[3] + 1, height)
if data_format == 'HWC':
mask_image[y1:y2, x1:x2, i] = \
mask[(y1 - ref_box[1]):(y2 - ref_box[1]),
(x1 - ref_box[0]):(x2 - ref_box[0])]
elif data_format == 'CHW':
mask_image[i, y1:y2, x1:x2] = \
mask[(y1 - ref_box[1]):(y2 - ref_box[1]),
(x1 - ref_box[0]):(x2 - ref_box[0])]
return mask_image
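# A minimal usage sketch (hypothetical values): two 28x28 soft masks are
# pasted into a 100x100 canvas, one channel per detection.
#
#   masks = np.random.rand(2, 28, 28).astype('float32')
#   boxes = np.array([[10, 10, 50, 60], [30, 20, 80, 90]], 'float32')
#   img = project_masks(masks, boxes, height=100, width=100)
#   assert img.shape == (100, 100, 2)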
@@ -8,10 +8,12 @@
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Mask utilities."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from seetadet.utils.mask.helper import mask_from
from seetadet.utils.mask.helper import paste_masks
from seetadet.utils.mask.metrics import mask_overlap
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Helper functions for Mask."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import copy
import cv2
import numpy as np
from pycocotools.mask import decode
from pycocotools.mask import merge
from pycocotools.mask import frPyObjects
from seetadet.utils.bbox import rescale_boxes
from seetadet.utils.image import im_resize
def mask_from_buffer(buffer, size):
"""Return a binary mask from the buffer."""
if not isinstance(size, (tuple, list)):
size = (size, size)
rles = [{'counts': buffer, 'size': size}]
mask = decode(rles)
if mask.shape[2] != 1:
raise ValueError('Mask contains {} instances. '
'Merge them before compressing.'
.format(mask.shape[2]))
return mask[:, :, 0]
def mask_from_polygons(polygons, size, box=None):
"""Return a binary mask from the polygons."""
if not isinstance(size, (tuple, list)):
size = (size, size)
if box is not None:
polygons = copy.deepcopy(polygons)
w, h = box[2] - box[0], box[3] - box[1]
ratio_h = size[0] / max(h, 0.1)
ratio_w = size[1] / max(w, 0.1)
for p in polygons:
p[0::2] = p[0::2] - box[0]
p[1::2] = p[1::2] - box[1]
if ratio_h == ratio_w:
for p in polygons:
p *= ratio_h
else:
for p in polygons:
p[0::2] *= ratio_w
p[1::2] *= ratio_h
rles = frPyObjects(polygons, size[0], size[1])
return decode([merge(rles)])[:, :, 0]
def mask_from_bitmap(bitmap, size, box=None):
"""Return a binary mask from the bitmap."""
if not isinstance(size, (tuple, list)):
size = (size, size)
if box is not None:
box = np.round(box).astype('int64')
bitmap = bitmap[box[1]:box[3] + 1, box[0]:box[2] + 1]
return im_resize(bitmap, size, mode='nearest')
def mask_from(segm, size, box=None):
"""Return a binary mask from the segmentation object."""
if segm is None:
return None
elif isinstance(segm, list):
return mask_from_polygons(segm, size, box)
elif isinstance(segm, np.ndarray):
return mask_from_bitmap(segm, size, box)
elif isinstance(segm, bytes):
        # RLE buffers are stored at full size, so the box crop does not apply.
        return mask_from_buffer(segm, size)
else:
        raise TypeError('Unknown segmentation type: ' + str(type(segm)))
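# A minimal usage sketch (hypothetical values): a triangle polygon is
# rasterized into a 28x28 crop of the given box.
#
#   poly = [np.array([10., 10., 40., 10., 40., 40.])]
#   m = mask_from(poly, 28, box=np.array([8., 8., 44., 44.]))
#   assert m.shape == (28, 28)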
def paste_masks(masks, boxes, img_size, thresh=0.5, data_order='F'):
"""Paste masks on an image."""
num_boxes = boxes.shape[0]
assert masks.shape[0] == num_boxes
img_shape = list(img_size) + [num_boxes]
output = np.zeros(img_shape, 'uint8', data_order)
size = masks[0].shape[0]
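    # Pad the mask with one pixel on each side before resizing, and
    # rescale the boxes by the same (size + 2) / size factor, so that
    # the resize does not clip mask values at the box border.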
scale_factor = (size + 2.) / size
boxes = rescale_boxes(boxes, scale_factor).astype(np.int32)
padded_mask = np.zeros((size + 2, size + 2), 'float32')
for i in range(num_boxes):
box, mask = boxes[i, :4], masks[i]
padded_mask[1:-1, 1:-1] = mask[:, :]
w = max(box[2] - box[0] + 1, 1)
h = max(box[3] - box[1] + 1, 1)
mask = cv2.resize(padded_mask, (w, h))
mask = np.array(mask >= thresh, 'uint8')
x1, y1 = max(box[0], 0), max(box[1], 0)
x2, y2 = min(box[2] + 1, img_size[1]), min(box[3] + 1, img_size[0])
mask = mask[y1 - box[1]:y2 - box[1], x1 - box[0]:x2 - box[0]]
output[y1:y2, x1:x2, i] = mask
return output
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Mask metrics."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
def mask_overlap(box1, box2, mask1, mask2):
"""Compute the overlap of two masks."""
x1 = max(box1[0], box2[0])
y1 = max(box1[1], box2[1])
x2 = min(box1[2], box2[2])
y2 = min(box1[3], box2[3])
if x1 > x2 or y1 > y2:
return 0
w = x2 - x1 + 1
h = y2 - y1 + 1
# Get masks in the intersection part.
start_ya = y1 - box1[1]
start_xa = x1 - box1[0]
inter_mask_a = mask1[start_ya: start_ya + h, start_xa:start_xa + w]
start_yb = y1 - box2[1]
start_xb = x1 - box2[0]
inter_mask_b = mask2[start_yb: start_yb + h, start_xb:start_xb + w]
assert inter_mask_a.shape == inter_mask_b.shape
inter = np.logical_and(inter_mask_b, inter_mask_a).sum()
union = mask1.sum() + mask2.sum() - inter
if union < 1.:
return 0.
return float(inter) / float(union)
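# A minimal sanity check (hypothetical values): on identical boxes, two
# 2x2 masks that share one of three foreground pixels give IoU = 1/3.
#
#   m1 = np.array([[1, 1], [0, 0]], 'uint8')
#   m2 = np.array([[1, 0], [1, 0]], 'uint8')
#   box = [0, 0, 1, 1]
#   assert abs(mask_overlap(box, box, m1, m2) - 1. / 3.) < 1e-6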
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.modules import det
from seetadet.utils import env
try:
from seetadet.utils.cython_nms import cpu_nms
from seetadet.utils.cython_nms import cpu_soft_nms
except ImportError:
cpu_nms = cpu_soft_nms = print
def gpu_nms(detections, thresh):
"""Filter out the detections using GPU-NMS."""
if detections.shape[0] == 0:
return []
scores = detections[:, 4]
order = scores.argsort()[::-1]
sorted_detections = env.new_tensor(detections[order, :])
keep = det.nms(sorted_detections, iou_threshold=thresh).numpy()
return order[keep]
def nms(detections, thresh):
"""Filter out the detections using NMS."""
if detections.shape[0] == 0:
return []
if cpu_nms is print:
raise ImportError('Failed to load <cython_nms> library.')
return cpu_nms(detections, thresh)
def soft_nms(
detections,
thresh,
method='linear',
sigma=0.5,
score_thresh=0.001,
):
"""Filter out the detections using Soft-NMS."""
if detections.shape[0] == 0:
return []
if cpu_soft_nms is print:
raise ImportError('Failed to load <cython_nms> library.')
methods = {'hard': 0, 'linear': 1, 'gaussian': 2}
if method not in methods:
raise ValueError('Unknown soft nms method:', method)
return cpu_soft_nms(
detections,
thresh,
methods[method],
sigma,
score_thresh,
)
@@ -8,25 +8,12 @@
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Non-Maximum Suppression utilities."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from seetadet.utils.nms.nms_impl import gpu_nms
from seetadet.utils.nms.nms_impl import nms
from seetadet.utils.nms.nms_impl import soft_nms
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Implementations of Non-Maximum Suppression."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.ops.normalization import to_tensor
from seetadet.ops.vision import NonMaxSuppression
try:
from seetadet.utils.nms.cython_nms import cpu_nms
from seetadet.utils.nms.cython_nms import cpu_soft_nms
except ImportError:
cpu_nms = cpu_soft_nms = print
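# NOTE: ``print`` is a harmless sentinel so that importing this module
# never fails; ``nms`` and ``soft_nms`` below raise an ImportError when
# the compiled <cython_nms> extension is actually required.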
def gpu_nms(dets, thresh):
"""Filter out the dets using GPU-NMS."""
if dets.shape[0] == 0:
return []
scores = dets[:, 4]
order = scores.argsort()[::-1]
sorted_dets = to_tensor(dets[order, :])
keep = NonMaxSuppression.apply(sorted_dets, iou_threshold=thresh)
return order[keep.numpy()]
def nms(dets, thresh):
"""Filter out the dets using NMS."""
if dets.shape[0] == 0:
return []
if cpu_nms is print:
raise ImportError('Failed to load <cython_nms> library.')
return cpu_nms(dets, thresh)
def soft_nms(dets, thresh, method='linear', sigma=0.5, score_thresh=0.001):
"""Filter out the dets using Soft-NMS."""
if dets.shape[0] == 0:
return []
if cpu_soft_nms is print:
raise ImportError('Failed to load <cython_nms> library.')
methods = {'hard': 0, 'linear': 1, 'gaussian': 2}
if method not in methods:
raise ValueError('Unknown soft nms method: ' + method)
return cpu_soft_nms(dets, thresh, methods[method], sigma, score_thresh)
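# A minimal usage sketch (hypothetical detections in
# [x1, y1, x2, y2, score] format):
#
#   dets = np.array([[10, 10, 50, 50, 0.9],
#                    [12, 12, 52, 52, 0.8],
#                    [100, 100, 150, 150, 0.7]], 'float32')
#   keep = nms(dets, 0.5)  # suppresses the second, overlapping box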
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
import operator
from dragon.vm import torch
from seetadet.modules import nn
def conv_flops(m, inputs, output):
"""Hook to compute flops for a convolution."""
_ = locals() # Unused
k_dim = functools.reduce(operator.mul, m.kernel_size)
out_dim = functools.reduce(operator.mul, output.shape[2:])
out_c, in_c = m.weight.shape[:2]
m.__params__ = (k_dim * in_c + (1 if m.bias else 0)) * out_c
m.__flops__ = m.__params__ * out_dim
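    # For example (hypothetical layer): a 3x3 conv with in_c=64,
    # out_c=128 and a 56x56 output map gives
    # (9 * 64 + 1) * 128 = 73856 weights per output position and
    # 73856 * 3136 ~= 231.6 MFLOPs for the layer.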
def register_flops(module):
"""Register hooks to collect flops info."""
if not hasattr(module, '__flops__'):
module.__flops__ = 0.
for m in module.modules():
if isinstance(m, nn.Conv2d):
m.register_forward_hook(conv_flops)
def collect_flops(module, normalizer=1e6):
"""Collect flops from the last forward."""
total_flops = 0.0
for m in module.modules():
if hasattr(m, '__flops__'):
total_flops += m.__flops__
m.__flops__ = 0.0
return total_flops / normalizer
def benchmark_flops(module, normalizer=1e6):
"""Return the flops by running benchmark once."""
register_flops(module)
collect_flops(module)
original_training = module.training
if original_training:
module.eval()
with torch.no_grad():
module.benchmark()
if original_training:
module.train()
return collect_flops(module, normalizer)
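# A minimal usage sketch (hypothetical model), assuming ``model`` is a
# module that implements the ``benchmark()`` method used above:
#
#   model = build_detector()          # hypothetical factory
#   mflops = benchmark_flops(model)   # conv FLOPs in millions
#   print('conv FLOPs: {:.1f}M'.format(mflops))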
@@ -8,12 +8,12 @@
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Profiler utilities."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from seetadet.utils.profiler.stats import SmoothedValue
from seetadet.utils.profiler.timer import Timer
from seetadet.utils.profiler.timer import get_progress
@@ -8,6 +8,7 @@
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Trackable statistics."""
from __future__ import absolute_import
from __future__ import division
@@ -18,32 +19,30 @@ import numpy as np


class SmoothedValue(object):
    """Track values and provide smoothed report."""

    def __init__(self, window_size=None):
        self.deque = collections.deque(maxlen=window_size)
        self.total = 0.0
        self.count = 0

    def update(self, value):
        self.deque.append(value)
        self.count += 1
        self.total += value

    def mean(self):
        return np.mean(self.deque)

    def median(self):
        return np.median(self.deque)

    def average(self):
        return self.total / self.count


class ExponentialMovingAverage(object):
    """Track values and provide EMA report."""

    def __init__(self, decay=0.9):
        self.value = None
@@ -51,7 +50,7 @@ class ExponentialMovingAverage(object):
        self.total = 0.0
        self.count = 0

    def update(self, value):
        if self.value is None:
            self.value = value
        else:
...
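# A minimal usage sketch:
#
#   v = SmoothedValue(window_size=20)
#   for loss in (0.9, 0.8, 0.7):
#       v.update(loss)
#   v.median()    # median over the sliding window
#   v.average()   # mean over all updates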
@@ -8,6 +8,7 @@
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Timing functions."""
from __future__ import absolute_import
from __future__ import division
@@ -19,7 +20,7 @@ import time


class Timer(object):
    """Simple timer."""

    def __init__(self):
        self.total_time = 0.
@@ -28,74 +29,32 @@ class Timer(object):
        self.diff = 0.
        self.average_time = 0.

    def add_diff(self, diff, n=1, average=True):
        self.total_time += diff
        self.calls += n
        self.average_time = self.total_time / self.calls
        return self.average_time if average else self.diff

    @contextlib.contextmanager
    def tic_and_toc(self, n=1, average=True):
        try:
            yield self.tic()
        finally:
            self.toc(n, average)

    def tic(self):
        self.start_time = time.time()
        return self

    def toc(self, n=1, average=True):
        self.diff = time.time() - self.start_time
        return self.add_diff(self.diff, n, average)


def get_progress(timer, step, max_steps):
    """Return the progress information."""
    eta_seconds = timer.average_time * (max_steps - step)
    eta = str(datetime.timedelta(seconds=int(eta_seconds)))
    progress = (step + 1.) / max_steps
    return ('< PROGRESS: {:.2%} | SPEED: {:.3f}s / iter | ETA: {} >'
            .format(progress, timer.average_time, eta))
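# A minimal usage sketch (hypothetical training loop):
#
#   timer = Timer()
#   for step in range(max_steps):
#       with timer.tic_and_toc():
#           train_one_step()   # hypothetical
#       print(get_progress(timer, step, max_steps))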
__author__ = 'tylin'
__version__ = '2.0'
# Interface for accessing the Microsoft COCO dataset.
# Microsoft COCO is a large image dataset designed for object detection,
# segmentation, and caption generation. pycocotools is a Python API that
# assists in loading, parsing and visualizing the annotations in COCO.
# Please visit http://mscoco.org/ for more information on COCO, including
# for the data, paper, and tutorials. The exact format of the annotations
# is also described on the COCO website. For example usage of the pycocotools
# please see pycocotools_demo.ipynb. In addition to this API, please download both
# the COCO images and annotations in order to run the demo.
# An alternative to using the API is to load the annotations directly
# into a Python dictionary.
# Using the API provides additional utility functions. Note that this API
# supports both *instance* and *caption* annotations. In the case of
# captions not all functions are defined (e.g. categories are undefined).
# The following API functions are defined:
# COCO - COCO api class that loads COCO annotation file and prepare data structures.
# decodeMask - Decode binary mask M encoded via run-length encoding.
# encodeMask - Encode binary mask M using run-length encoding.
# getAnnIds - Get ann ids that satisfy given filter conditions.
# getCatIds - Get cat ids that satisfy given filter conditions.
# getImgIds - Get img ids that satisfy given filter conditions.
# loadAnns - Load anns with the specified ids.
# loadCats - Load cats with the specified ids.
# loadImgs - Load imgs with the specified ids.
# annToMask - Convert segmentation in an annotation to binary mask.
# showAnns - Display the specified annotations.
# loadRes - Load algorithm results and create API for accessing them.
# download - Download COCO images from mscoco.org server.
# Throughout the API "ann"=annotation, "cat"=category, and "img"=image.
# Help on each function can be accessed by: "help COCO>function".
# See also COCO>decodeMask,
# COCO>encodeMask, COCO>getAnnIds, COCO>getCatIds,
# COCO>getImgIds, COCO>loadAnns, COCO>loadCats,
# COCO>loadImgs, COCO>annToMask, COCO>showAnns
# Microsoft COCO Toolbox. version 2.0
# Data, paper, and tutorials available at: http://mscoco.org/
# Code written by Piotr Dollar and Tsung-Yi Lin, 2014.
# Licensed under the Simplified BSD License [see bsd.txt]
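# A minimal usage sketch (hypothetical annotation path):
#
#   coco = COCO('annotations/instances_val2017.json')
#   cat_ids = coco.getCatIds(catNms=['person'])
#   img_ids = coco.getImgIds(catIds=cat_ids)
#   anns = coco.loadAnns(coco.getAnnIds(imgIds=img_ids[:1], catIds=cat_ids))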
import json
import time
import matplotlib.pyplot as plt
from matplotlib.collections import PatchCollection
from matplotlib.patches import Polygon
import numpy as np
import copy
import itertools
from . import mask as maskUtils
import os
from collections import defaultdict
import sys
PYTHON_VERSION = sys.version_info[0]
if PYTHON_VERSION == 2:
from urllib import urlretrieve
elif PYTHON_VERSION == 3:
from urllib.request import urlretrieve
def _isArrayLike(obj):
return hasattr(obj, '__iter__') and hasattr(obj, '__len__')
class COCO:
def __init__(self, annotation_file=None):
"""
Constructor of Microsoft COCO helper class for reading and visualizing annotations.
:param annotation_file (str): location of annotation file
:param image_folder (str): location to the folder that hosts images.
:return:
"""
# load dataset
self.dataset,self.anns,self.cats,self.imgs = dict(),dict(),dict(),dict()
self.imgToAnns, self.catToImgs = defaultdict(list), defaultdict(list)
        if annotation_file is not None:
print('loading annotations into memory...')
tic = time.time()
dataset = json.load(open(annotation_file, 'r'))
assert type(dataset)==dict, 'annotation file format {} not supported'.format(type(dataset))
print('Done (t={:0.2f}s)'.format(time.time()- tic))
self.dataset = dataset
self.createIndex()
def createIndex(self):
# create index
print('creating index...')
anns, cats, imgs = {}, {}, {}
imgToAnns,catToImgs = defaultdict(list),defaultdict(list)
if 'annotations' in self.dataset:
for ann in self.dataset['annotations']:
imgToAnns[ann['image_id']].append(ann)
anns[ann['id']] = ann
if 'images' in self.dataset:
for img in self.dataset['images']:
imgs[img['id']] = img
if 'categories' in self.dataset:
for cat in self.dataset['categories']:
cats[cat['id']] = cat
if 'annotations' in self.dataset and 'categories' in self.dataset:
for ann in self.dataset['annotations']:
catToImgs[ann['category_id']].append(ann['image_id'])
print('index created!')
# create class members
self.anns = anns
self.imgToAnns = imgToAnns
self.catToImgs = catToImgs
self.imgs = imgs
self.cats = cats
def info(self):
"""
Print information about the annotation file.
:return:
"""
for key, value in self.dataset['info'].items():
print('{}: {}'.format(key, value))
def getAnnIds(self, imgIds=[], catIds=[], areaRng=[], iscrowd=None):
"""
Get ann ids that satisfy given filter conditions. default skips that filter
:param imgIds (int array) : get anns for given imgs
catIds (int array) : get anns for given cats
areaRng (float array) : get anns for given area range (e.g. [0 inf])
iscrowd (boolean) : get anns for given crowd label (False or True)
:return: ids (int array) : integer array of ann ids
"""
imgIds = imgIds if _isArrayLike(imgIds) else [imgIds]
catIds = catIds if _isArrayLike(catIds) else [catIds]
if len(imgIds) == len(catIds) == len(areaRng) == 0:
anns = self.dataset['annotations']
else:
if not len(imgIds) == 0:
lists = [self.imgToAnns[imgId] for imgId in imgIds if imgId in self.imgToAnns]
anns = list(itertools.chain.from_iterable(lists))
else:
anns = self.dataset['annotations']
anns = anns if len(catIds) == 0 else [ann for ann in anns if ann['category_id'] in catIds]
anns = anns if len(areaRng) == 0 else [ann for ann in anns if ann['area'] > areaRng[0] and ann['area'] < areaRng[1]]
        if iscrowd is not None:
ids = [ann['id'] for ann in anns if ann['iscrowd'] == iscrowd]
else:
ids = [ann['id'] for ann in anns]
return ids
def getCatIds(self, catNms=[], supNms=[], catIds=[]):
"""
filtering parameters. default skips that filter.
:param catNms (str array) : get cats for given cat names
:param supNms (str array) : get cats for given supercategory names
:param catIds (int array) : get cats for given cat ids
:return: ids (int array) : integer array of cat ids
"""
catNms = catNms if _isArrayLike(catNms) else [catNms]
supNms = supNms if _isArrayLike(supNms) else [supNms]
catIds = catIds if _isArrayLike(catIds) else [catIds]
if len(catNms) == len(supNms) == len(catIds) == 0:
cats = self.dataset['categories']
else:
cats = self.dataset['categories']
cats = cats if len(catNms) == 0 else [cat for cat in cats if cat['name'] in catNms]
cats = cats if len(supNms) == 0 else [cat for cat in cats if cat['supercategory'] in supNms]
cats = cats if len(catIds) == 0 else [cat for cat in cats if cat['id'] in catIds]
ids = [cat['id'] for cat in cats]
return ids
def getImgIds(self, imgIds=[], catIds=[]):
'''
Get img ids that satisfy given filter conditions.
:param imgIds (int array) : get imgs for given ids
:param catIds (int array) : get imgs with all given cats
:return: ids (int array) : integer array of img ids
'''
imgIds = imgIds if _isArrayLike(imgIds) else [imgIds]
catIds = catIds if _isArrayLike(catIds) else [catIds]
if len(imgIds) == len(catIds) == 0:
ids = self.imgs.keys()
else:
ids = set(imgIds)
for i, catId in enumerate(catIds):
if i == 0 and len(ids) == 0:
ids = set(self.catToImgs[catId])
else:
ids &= set(self.catToImgs[catId])
return list(ids)
def loadAnns(self, ids=[]):
"""
Load anns with the specified ids.
:param ids (int array) : integer ids specifying anns
:return: anns (object array) : loaded ann objects
"""
if _isArrayLike(ids):
return [self.anns[id] for id in ids]
elif type(ids) == int:
return [self.anns[ids]]
def loadCats(self, ids=[]):
"""
Load cats with the specified ids.
:param ids (int array) : integer ids specifying cats
:return: cats (object array) : loaded cat objects
"""
if _isArrayLike(ids):
return [self.cats[id] for id in ids]
elif type(ids) == int:
return [self.cats[ids]]
def loadImgs(self, ids=[]):
"""
        Load imgs with the specified ids.
:param ids (int array) : integer ids specifying img
:return: imgs (object array) : loaded img objects
"""
if _isArrayLike(ids):
return [self.imgs[id] for id in ids]
elif type(ids) == int:
return [self.imgs[ids]]
def showAnns(self, anns):
"""
Display the specified annotations.
:param anns (array of object): annotations to display
:return: None
"""
if len(anns) == 0:
return 0
if 'segmentation' in anns[0] or 'keypoints' in anns[0]:
datasetType = 'instances'
elif 'caption' in anns[0]:
datasetType = 'captions'
else:
raise Exception('datasetType not supported')
if datasetType == 'instances':
ax = plt.gca()
ax.set_autoscale_on(False)
polygons = []
color = []
for ann in anns:
c = (np.random.random((1, 3))*0.6+0.4).tolist()[0]
if 'segmentation' in ann:
if type(ann['segmentation']) == list:
# polygon
for seg in ann['segmentation']:
poly = np.array(seg).reshape((int(len(seg)/2), 2))
polygons.append(Polygon(poly))
color.append(c)
else:
# mask
t = self.imgs[ann['image_id']]
if type(ann['segmentation']['counts']) == list:
rle = maskUtils.frPyObjects([ann['segmentation']], t['height'], t['width'])
else:
rle = [ann['segmentation']]
m = maskUtils.decode(rle)
img = np.ones( (m.shape[0], m.shape[1], 3) )
if ann['iscrowd'] == 1:
color_mask = np.array([2.0,166.0,101.0])/255
if ann['iscrowd'] == 0:
color_mask = np.random.random((1, 3)).tolist()[0]
for i in range(3):
img[:,:,i] = color_mask[i]
ax.imshow(np.dstack( (img, m*0.5) ))
if 'keypoints' in ann and type(ann['keypoints']) == list:
# turn skeleton into zero-based index
sks = np.array(self.loadCats(ann['category_id'])[0]['skeleton'])-1
kp = np.array(ann['keypoints'])
x = kp[0::3]
y = kp[1::3]
v = kp[2::3]
for sk in sks:
if np.all(v[sk]>0):
plt.plot(x[sk],y[sk], linewidth=3, color=c)
plt.plot(x[v>0], y[v>0],'o',markersize=8, markerfacecolor=c, markeredgecolor='k',markeredgewidth=2)
plt.plot(x[v>1], y[v>1],'o',markersize=8, markerfacecolor=c, markeredgecolor=c, markeredgewidth=2)
p = PatchCollection(polygons, facecolor=color, linewidths=0, alpha=0.4)
ax.add_collection(p)
p = PatchCollection(polygons, facecolor='none', edgecolors=color, linewidths=2)
ax.add_collection(p)
elif datasetType == 'captions':
for ann in anns:
print(ann['caption'])
def loadRes(self, resFile):
"""
Load result file and return a result api object.
:param resFile (str) : file name of result file
:return: res (obj) : result api object
"""
res = COCO()
res.dataset['images'] = [img for img in self.dataset['images']]
print('Loading and preparing results...')
tic = time.time()
        if type(resFile) == str or (PYTHON_VERSION == 2 and type(resFile) == unicode):
anns = json.load(open(resFile))
elif type(resFile) == np.ndarray:
anns = self.loadNumpyAnnotations(resFile)
else:
anns = resFile
        assert type(anns) == list, 'results is not an array of objects'
annsImgIds = [ann['image_id'] for ann in anns]
assert set(annsImgIds) == (set(annsImgIds) & set(self.getImgIds())), \
'Results do not correspond to current coco set'
if 'caption' in anns[0]:
imgIds = set([img['id'] for img in res.dataset['images']]) & set([ann['image_id'] for ann in anns])
res.dataset['images'] = [img for img in res.dataset['images'] if img['id'] in imgIds]
for id, ann in enumerate(anns):
ann['id'] = id+1
elif 'bbox' in anns[0] and not anns[0]['bbox'] == []:
res.dataset['categories'] = copy.deepcopy(self.dataset['categories'])
for id, ann in enumerate(anns):
bb = ann['bbox']
x1, x2, y1, y2 = [bb[0], bb[0]+bb[2], bb[1], bb[1]+bb[3]]
if not 'segmentation' in ann:
ann['segmentation'] = [[x1, y1, x1, y2, x2, y2, x2, y1]]
ann['area'] = bb[2]*bb[3]
ann['id'] = id+1
ann['iscrowd'] = 0
elif 'segmentation' in anns[0]:
res.dataset['categories'] = copy.deepcopy(self.dataset['categories'])
for id, ann in enumerate(anns):
# now only support compressed RLE format as segmentation results
ann['area'] = maskUtils.area([ann['segmentation']])[0]
if not 'bbox' in ann:
ann['bbox'] = maskUtils.toBbox([ann['segmentation']])[0]
ann['id'] = id+1
ann['iscrowd'] = 0
elif 'keypoints' in anns[0]:
res.dataset['categories'] = copy.deepcopy(self.dataset['categories'])
for id, ann in enumerate(anns):
s = ann['keypoints']
x = s[0::3]
y = s[1::3]
x0,x1,y0,y1 = np.min(x), np.max(x), np.min(y), np.max(y)
ann['area'] = (x1-x0)*(y1-y0)
ann['id'] = id + 1
ann['bbox'] = [x0,y0,x1-x0,y1-y0]
print('DONE (t={:0.2f}s)'.format(time.time()- tic))
res.dataset['annotations'] = anns
res.createIndex()
return res
def download(self, tarDir = None, imgIds = [] ):
'''
Download COCO images from mscoco.org server.
:param tarDir (str): COCO results directory name
imgIds (list): images to be downloaded
:return:
'''
if tarDir is None:
print('Please specify target directory')
return -1
if len(imgIds) == 0:
imgs = self.imgs.values()
else:
imgs = self.loadImgs(imgIds)
N = len(imgs)
if not os.path.exists(tarDir):
os.makedirs(tarDir)
for i, img in enumerate(imgs):
tic = time.time()
fname = os.path.join(tarDir, img['file_name'])
if not os.path.exists(fname):
urlretrieve(img['coco_url'], fname)
print('downloaded {}/{} images (t={:0.1f}s)'.format(i, N, time.time()- tic))
def loadNumpyAnnotations(self, data):
"""
Convert result data from a numpy array [Nx7] where each row contains {imageID,x1,y1,w,h,score,class}
:param data (numpy.ndarray)
:return: annotations (python nested list)
"""
print('Converting ndarray to lists...')
assert(type(data) == np.ndarray)
print(data.shape)
assert(data.shape[1] == 7)
N = data.shape[0]
ann = []
for i in range(N):
if i % 1000000 == 0:
print('{}/{}'.format(i,N))
ann += [{
'image_id' : int(data[i, 0]),
'bbox' : [ data[i, 1], data[i, 2], data[i, 3], data[i, 4] ],
'score' : data[i, 5],
'category_id': int(data[i, 6]),
}]
return ann
def annToRLE(self, ann):
"""
Convert annotation which can be polygons, uncompressed RLE to RLE.
:return: binary mask (numpy 2D array)
"""
t = self.imgs[ann['image_id']]
h, w = t['height'], t['width']
segm = ann['segmentation']
if type(segm) == list:
# polygon -- a single object might consist of multiple parts
# we merge all parts into one mask rle code
rles = maskUtils.frPyObjects(segm, h, w)
rle = maskUtils.merge(rles)
elif type(segm['counts']) == list:
# uncompressed RLE
rle = maskUtils.frPyObjects(segm, h, w)
else:
# rle
rle = ann['segmentation']
return rle
def annToMask(self, ann):
"""
Convert annotation which can be polygons, uncompressed RLE, or RLE to binary mask.
:return: binary mask (numpy 2D array)
"""
rle = self.annToRLE(ann)
m = maskUtils.decode(rle)
return m
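    # For example (continuing the hypothetical sketch at the top of this
    # file), an annotation in any of the three segmentation formats
    # decodes to a full-size binary mask:
    #
    #   m = coco.annToMask(anns[0])   # uint8 array of shape (height, width)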
__author__ = 'tsungyi'
import numpy as np
import datetime
import time
from collections import defaultdict
from . import mask as maskUtils
import copy
class COCOeval:
# Interface for evaluating detection on the Microsoft COCO dataset.
#
# The usage for CocoEval is as follows:
# cocoGt=..., cocoDt=... # load dataset and results
# E = CocoEval(cocoGt,cocoDt); # initialize CocoEval object
# E.params.recThrs = ...; # set parameters as desired
# E.evaluate(); # run per image evaluation
# E.accumulate(); # accumulate per image results
# E.summarize(); # display summary metrics of results
# For example usage see evalDemo.m and http://mscoco.org/.
#
# The evaluation parameters are as follows (defaults in brackets):
# imgIds - [all] N img ids to use for evaluation
# catIds - [all] K cat ids to use for evaluation
# iouThrs - [.5:.05:.95] T=10 IoU thresholds for evaluation
# recThrs - [0:.01:1] R=101 recall thresholds for evaluation
# areaRng - [...] A=4 object area ranges for evaluation
# maxDets - [1 10 100] M=3 thresholds on max detections per image
# iouType - ['segm'] set iouType to 'segm', 'bbox' or 'keypoints'
# iouType replaced the now DEPRECATED useSegm parameter.
# useCats - [1] if true use category labels for evaluation
# Note: if useCats=0 category labels are ignored as in proposal scoring.
# Note: multiple areaRngs [Ax2] and maxDets [Mx1] can be specified.
#
# evaluate(): evaluates detections on every image and every category and
# concats the results into the "evalImgs" with fields:
# dtIds - [1xD] id for each of the D detections (dt)
# gtIds - [1xG] id for each of the G ground truths (gt)
# dtMatches - [TxD] matching gt id at each IoU or 0
# gtMatches - [TxG] matching dt id at each IoU or 0
# dtScores - [1xD] confidence of each dt
# gtIgnore - [1xG] ignore flag for each gt
# dtIgnore - [TxD] ignore flag for each dt at each IoU
#
# accumulate(): accumulates the per-image, per-category evaluation
# results in "evalImgs" into the dictionary "eval" with fields:
# params - parameters used for evaluation
# date - date evaluation was performed
# counts - [T,R,K,A,M] parameter dimensions (see above)
# precision - [TxRxKxAxM] precision for every evaluation setting
# recall - [TxKxAxM] max recall for every evaluation setting
# Note: precision and recall==-1 for settings with no gt objects.
#
# See also coco, mask, pycocoDemo, pycocoEvalDemo
#
# Microsoft COCO Toolbox. version 2.0
# Data, paper, and tutorials available at: http://mscoco.org/
# Code written by Piotr Dollar and Tsung-Yi Lin, 2015.
# Licensed under the Simplified BSD License [see coco/license.txt]
def __init__(self, cocoGt=None, cocoDt=None, iouType='segm'):
'''
Initialize CocoEval using coco APIs for gt and dt
:param cocoGt: coco object with ground truth annotations
:param cocoDt: coco object with detection results
:return: None
'''
if not iouType:
print('iouType not specified. use default iouType segm')
self.cocoGt = cocoGt # ground truth COCO API
self.cocoDt = cocoDt # detections COCO API
self.params = {} # evaluation parameters
self.evalImgs = defaultdict(list) # per-image per-category evaluation results [KxAxI] elements
self.eval = {} # accumulated evaluation results
self._gts = defaultdict(list) # gt for evaluation
self._dts = defaultdict(list) # dt for evaluation
self.params = Params(iouType=iouType) # parameters
self._paramsEval = {} # parameters for evaluation
self.stats = [] # result summarization
self.ious = {} # ious between all gts and dts
if not cocoGt is None:
self.params.imgIds = sorted(cocoGt.getImgIds())
self.params.catIds = sorted(cocoGt.getCatIds())
def _prepare(self):
'''
Prepare ._gts and ._dts for evaluation based on params
:return: None
'''
def _toMask(anns, coco):
# modify ann['segmentation'] by reference
for ann in anns:
rle = coco.annToRLE(ann)
ann['segmentation'] = rle
p = self.params
if p.useCats:
gts=self.cocoGt.loadAnns(self.cocoGt.getAnnIds(imgIds=p.imgIds, catIds=p.catIds))
dts=self.cocoDt.loadAnns(self.cocoDt.getAnnIds(imgIds=p.imgIds, catIds=p.catIds))
else:
gts=self.cocoGt.loadAnns(self.cocoGt.getAnnIds(imgIds=p.imgIds))
dts=self.cocoDt.loadAnns(self.cocoDt.getAnnIds(imgIds=p.imgIds))
# convert ground truth to mask if iouType == 'segm'
if p.iouType == 'segm':
_toMask(gts, self.cocoGt)
_toMask(dts, self.cocoDt)
# set ignore flag
for gt in gts:
gt['ignore'] = gt['ignore'] if 'ignore' in gt else 0
gt['ignore'] = 'iscrowd' in gt and gt['iscrowd']
if p.iouType == 'keypoints':
gt['ignore'] = (gt['num_keypoints'] == 0) or gt['ignore']
self._gts = defaultdict(list) # gt for evaluation
self._dts = defaultdict(list) # dt for evaluation
for gt in gts:
self._gts[gt['image_id'], gt['category_id']].append(gt)
for dt in dts:
self._dts[dt['image_id'], dt['category_id']].append(dt)
self.evalImgs = defaultdict(list) # per-image per-category evaluation results
self.eval = {} # accumulated evaluation results
def evaluate(self):
'''
Run per image evaluation on given images and store results (a list of dict) in self.evalImgs
:return: None
'''
tic = time.time()
print('Running per image evaluation...')
p = self.params
# add backward compatibility if useSegm is specified in params
if not p.useSegm is None:
p.iouType = 'segm' if p.useSegm == 1 else 'bbox'
print('useSegm (deprecated) is not None. Running {} evaluation'.format(p.iouType))
print('Evaluate annotation type *{}*'.format(p.iouType))
p.imgIds = list(np.unique(p.imgIds))
if p.useCats:
p.catIds = list(np.unique(p.catIds))
p.maxDets = sorted(p.maxDets)
self.params=p
self._prepare()
# loop through images, area range, max detection number
catIds = p.catIds if p.useCats else [-1]
if p.iouType == 'segm' or p.iouType == 'bbox':
computeIoU = self.computeIoU
elif p.iouType == 'keypoints':
computeIoU = self.computeOks
self.ious = {(imgId, catId): computeIoU(imgId, catId) \
for imgId in p.imgIds
for catId in catIds}
evaluateImg = self.evaluateImg
maxDet = p.maxDets[-1]
self.evalImgs = [evaluateImg(imgId, catId, areaRng, maxDet)
for catId in catIds
for areaRng in p.areaRng
for imgId in p.imgIds
]
self._paramsEval = copy.deepcopy(self.params)
toc = time.time()
print('DONE (t={:0.2f}s).'.format(toc-tic))
def computeIoU(self, imgId, catId):
p = self.params
if p.useCats:
gt = self._gts[imgId,catId]
dt = self._dts[imgId,catId]
else:
gt = [_ for cId in p.catIds for _ in self._gts[imgId,cId]]
dt = [_ for cId in p.catIds for _ in self._dts[imgId,cId]]
if len(gt) == 0 and len(dt) ==0:
return []
inds = np.argsort([-d['score'] for d in dt], kind='mergesort')
dt = [dt[i] for i in inds]
if len(dt) > p.maxDets[-1]:
dt=dt[0:p.maxDets[-1]]
if p.iouType == 'segm':
g = [g['segmentation'] for g in gt]
d = [d['segmentation'] for d in dt]
elif p.iouType == 'bbox':
g = [g['bbox'] for g in gt]
d = [d['bbox'] for d in dt]
else:
raise Exception('unknown iouType for iou computation')
# compute iou between each dt and gt region
iscrowd = [int(o['iscrowd']) for o in gt]
ious = maskUtils.iou(d,g,iscrowd)
return ious
def computeOks(self, imgId, catId):
p = self.params
        # dimension here should be Nxm
gts = self._gts[imgId, catId]
dts = self._dts[imgId, catId]
inds = np.argsort([-d['score'] for d in dts], kind='mergesort')
dts = [dts[i] for i in inds]
if len(dts) > p.maxDets[-1]:
dts = dts[0:p.maxDets[-1]]
# if len(gts) == 0 and len(dts) == 0:
if len(gts) == 0 or len(dts) == 0:
return []
ious = np.zeros((len(dts), len(gts)))
sigmas = np.array([.26, .25, .25, .35, .35, .79, .79, .72, .72, .62,.62, 1.07, 1.07, .87, .87, .89, .89])/10.0
vars = (sigmas * 2)**2
k = len(sigmas)
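        # OKS between a detection and a ground truth is
        #   mean_i exp(-d_i^2 / (2 * area * (2 * sigma_i)^2))
        # over the visible keypoints i, where d_i is the distance between
        # the i-th predicted and ground-truth keypoints.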
# compute oks between each detection and ground truth object
for j, gt in enumerate(gts):
            # create bounds for ignore regions (double the gt bbox)
g = np.array(gt['keypoints'])
xg = g[0::3]; yg = g[1::3]; vg = g[2::3]
k1 = np.count_nonzero(vg > 0)
bb = gt['bbox']
x0 = bb[0] - bb[2]; x1 = bb[0] + bb[2] * 2
y0 = bb[1] - bb[3]; y1 = bb[1] + bb[3] * 2
for i, dt in enumerate(dts):
d = np.array(dt['keypoints'])
xd = d[0::3]; yd = d[1::3]
if k1>0:
# measure the per-keypoint distance if keypoints visible
dx = xd - xg
dy = yd - yg
else:
# measure minimum distance to keypoints in (x0,y0) & (x1,y1)
z = np.zeros((k))
dx = np.max((z, x0-xd),axis=0)+np.max((z, xd-x1),axis=0)
dy = np.max((z, y0-yd),axis=0)+np.max((z, yd-y1),axis=0)
e = (dx**2 + dy**2) / vars / (gt['area']+np.spacing(1)) / 2
if k1 > 0:
e=e[vg > 0]
ious[i, j] = np.sum(np.exp(-e)) / e.shape[0]
return ious
def evaluateImg(self, imgId, catId, aRng, maxDet):
'''
perform evaluation for single category and image
:return: dict (single image results)
'''
p = self.params
if p.useCats:
gt = self._gts[imgId,catId]
dt = self._dts[imgId,catId]
else:
gt = [_ for cId in p.catIds for _ in self._gts[imgId,cId]]
dt = [_ for cId in p.catIds for _ in self._dts[imgId,cId]]
if len(gt) == 0 and len(dt) ==0:
return None
for g in gt:
if g['ignore'] or (g['area']<aRng[0] or g['area']>aRng[1]):
g['_ignore'] = 1
else:
g['_ignore'] = 0
# sort dt highest score first, sort gt ignore last
gtind = np.argsort([g['_ignore'] for g in gt], kind='mergesort')
gt = [gt[i] for i in gtind]
dtind = np.argsort([-d['score'] for d in dt], kind='mergesort')
dt = [dt[i] for i in dtind[0:maxDet]]
iscrowd = [int(o['iscrowd']) for o in gt]
# load computed ious
ious = self.ious[imgId, catId][:, gtind] if len(self.ious[imgId, catId]) > 0 else self.ious[imgId, catId]
T = len(p.iouThrs)
G = len(gt)
D = len(dt)
gtm = np.zeros((T,G))
dtm = np.zeros((T,D))
gtIg = np.array([g['_ignore'] for g in gt])
dtIg = np.zeros((T,D))
if not len(ious)==0:
for tind, t in enumerate(p.iouThrs):
for dind, d in enumerate(dt):
# information about best match so far (m=-1 -> unmatched)
iou = min([t,1-1e-10])
m = -1
for gind, g in enumerate(gt):
# if this gt already matched, and not a crowd, continue
if gtm[tind,gind]>0 and not iscrowd[gind]:
continue
# if dt matched to reg gt, and on ignore gt, stop
if m>-1 and gtIg[m]==0 and gtIg[gind]==1:
break
# continue to next gt unless better match made
if ious[dind,gind] < iou:
continue
# if match successful and best so far, store appropriately
iou=ious[dind,gind]
m=gind
# if match made store id of match for both dt and gt
if m ==-1:
continue
dtIg[tind,dind] = gtIg[m]
dtm[tind,dind] = gt[m]['id']
gtm[tind,m] = d['id']
# set unmatched detections outside of area range to ignore
a = np.array([d['area']<aRng[0] or d['area']>aRng[1] for d in dt]).reshape((1, len(dt)))
dtIg = np.logical_or(dtIg, np.logical_and(dtm==0, np.repeat(a,T,0)))
# store results for given image and category
return {
'image_id': imgId,
'category_id': catId,
'aRng': aRng,
'maxDet': maxDet,
'dtIds': [d['id'] for d in dt],
'gtIds': [g['id'] for g in gt],
'dtMatches': dtm,
'gtMatches': gtm,
'dtScores': [d['score'] for d in dt],
'gtIgnore': gtIg,
'dtIgnore': dtIg,
}
def accumulate(self, p = None):
'''
Accumulate per image evaluation results and store the result in self.eval
:param p: input params for evaluation
:return: None
'''
print('Accumulating evaluation results...')
tic = time.time()
if not self.evalImgs:
print('Please run evaluate() first')
# allows input customized parameters
if p is None:
p = self.params
p.catIds = p.catIds if p.useCats == 1 else [-1]
T = len(p.iouThrs)
R = len(p.recThrs)
K = len(p.catIds) if p.useCats else 1
A = len(p.areaRng)
M = len(p.maxDets)
precision = -np.ones((T,R,K,A,M)) # -1 for the precision of absent categories
recall = -np.ones((T,K,A,M))
scores = -np.ones((T,R,K,A,M))
# create dictionary for future indexing
_pe = self._paramsEval
catIds = _pe.catIds if _pe.useCats else [-1]
setK = set(catIds)
setA = set(map(tuple, _pe.areaRng))
setM = set(_pe.maxDets)
setI = set(_pe.imgIds)
# get inds to evaluate
k_list = [n for n, k in enumerate(p.catIds) if k in setK]
m_list = [m for n, m in enumerate(p.maxDets) if m in setM]
a_list = [n for n, a in enumerate(map(lambda x: tuple(x), p.areaRng)) if a in setA]
i_list = [n for n, i in enumerate(p.imgIds) if i in setI]
I0 = len(_pe.imgIds)
A0 = len(_pe.areaRng)
# retrieve E at each category, area range, and max number of detections
for k, k0 in enumerate(k_list):
Nk = k0*A0*I0
for a, a0 in enumerate(a_list):
Na = a0*I0
for m, maxDet in enumerate(m_list):
E = [self.evalImgs[Nk + Na + i] for i in i_list]
E = [e for e in E if not e is None]
if len(E) == 0:
continue
dtScores = np.concatenate([e['dtScores'][0:maxDet] for e in E])
# different sorting method generates slightly different results.
# mergesort is used to be consistent as Matlab implementation.
inds = np.argsort(-dtScores, kind='mergesort')
dtScoresSorted = dtScores[inds]
dtm = np.concatenate([e['dtMatches'][:,0:maxDet] for e in E], axis=1)[:,inds]
dtIg = np.concatenate([e['dtIgnore'][:,0:maxDet] for e in E], axis=1)[:,inds]
gtIg = np.concatenate([e['gtIgnore'] for e in E])
npig = np.count_nonzero(gtIg==0 )
if npig == 0:
continue
tps = np.logical_and( dtm, np.logical_not(dtIg) )
fps = np.logical_and(np.logical_not(dtm), np.logical_not(dtIg) )
                    tp_sum = np.cumsum(tps, axis=1).astype(dtype=np.float64)
                    fp_sum = np.cumsum(fps, axis=1).astype(dtype=np.float64)
for t, (tp, fp) in enumerate(zip(tp_sum, fp_sum)):
tp = np.array(tp)
fp = np.array(fp)
nd = len(tp)
rc = tp / npig
pr = tp / (fp+tp+np.spacing(1))
fn = npig - tp
tn = nd - tp - fp - fn
q = np.zeros((R,))
ss = np.zeros((R,))
if nd:
recall[t,k,a,m] = rc[-1]
else:
recall[t,k,a,m] = 0
# numpy is slow without cython optimization for accessing elements
                        # using python lists gives a significant speed improvement
pr = pr.tolist(); q = q.tolist()
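                        # Make precision non-increasing from right to left,
                        # i.e., replace each value with the max precision at
                        # any higher recall (the interpolated-precision
                        # envelope used by the COCO metric).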
for i in range(nd-1, 0, -1):
if pr[i] > pr[i-1]:
pr[i-1] = pr[i]
inds = np.searchsorted(rc, p.recThrs, side='left')
try:
for ri, pi in enumerate(inds):
q[ri] = pr[pi]
ss[ri] = dtScoresSorted[pi]
except:
pass
precision[t,:,k,a,m] = np.array(q)
scores[t,:,k,a,m] = np.array(ss)
self.eval = {
'params': p,
'counts': [T, R, K, A, M],
'date': datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
'precision': precision,
'recall': recall,
'scores': scores,
}
toc = time.time()
print('DONE (t={:0.2f}s).'.format( toc-tic))
def summarize(self):
'''
Compute and display summary metrics for evaluation results.
        Note this function can *only* be applied on the default parameter setting
'''
def _summarize( ap=1, iouThr=None, areaRng='all', maxDets=100 ):
p = self.params
iStr = ' {:<18} {} @[ IoU={:<9} | area={:>6s} | maxDets={:>3d} ] = {:0.3f}'
titleStr = 'Average Precision' if ap == 1 else 'Average Recall'
typeStr = '(AP)' if ap==1 else '(AR)'
iouStr = '{:0.2f}:{:0.2f}'.format(p.iouThrs[0], p.iouThrs[-1]) \
if iouThr is None else '{:0.2f}'.format(iouThr)
aind = [i for i, aRng in enumerate(p.areaRngLbl) if aRng == areaRng]
mind = [i for i, mDet in enumerate(p.maxDets) if mDet == maxDets]
if ap == 1:
# dimension of precision: [TxRxKxAxM]
s = self.eval['precision']
# IoU
if iouThr is not None:
t = np.where(iouThr == p.iouThrs)[0]
s = s[t]
s = s[:,:,:,aind,mind]
else:
# dimension of recall: [TxKxAxM]
s = self.eval['recall']
if iouThr is not None:
t = np.where(iouThr == p.iouThrs)[0]
s = s[t]
s = s[:,:,aind,mind]
if len(s[s>-1])==0:
mean_s = -1
else:
mean_s = np.mean(s[s>-1])
print(iStr.format(titleStr, typeStr, iouStr, areaRng, maxDets, mean_s))
return mean_s
def _summarizeDets():
stats = np.zeros((12,))
stats[0] = _summarize(1)
stats[1] = _summarize(1, iouThr=.5, maxDets=self.params.maxDets[2])
stats[2] = _summarize(1, iouThr=.75, maxDets=self.params.maxDets[2])
stats[3] = _summarize(1, areaRng='small', maxDets=self.params.maxDets[2])
stats[4] = _summarize(1, areaRng='medium', maxDets=self.params.maxDets[2])
stats[5] = _summarize(1, areaRng='large', maxDets=self.params.maxDets[2])
stats[6] = _summarize(0, maxDets=self.params.maxDets[0])
stats[7] = _summarize(0, maxDets=self.params.maxDets[1])
stats[8] = _summarize(0, maxDets=self.params.maxDets[2])
stats[9] = _summarize(0, areaRng='small', maxDets=self.params.maxDets[2])
stats[10] = _summarize(0, areaRng='medium', maxDets=self.params.maxDets[2])
stats[11] = _summarize(0, areaRng='large', maxDets=self.params.maxDets[2])
return stats
def _summarizeKps():
stats = np.zeros((10,))
stats[0] = _summarize(1, maxDets=20)
stats[1] = _summarize(1, maxDets=20, iouThr=.5)
stats[2] = _summarize(1, maxDets=20, iouThr=.75)
stats[3] = _summarize(1, maxDets=20, areaRng='medium')
stats[4] = _summarize(1, maxDets=20, areaRng='large')
stats[5] = _summarize(0, maxDets=20)
stats[6] = _summarize(0, maxDets=20, iouThr=.5)
stats[7] = _summarize(0, maxDets=20, iouThr=.75)
stats[8] = _summarize(0, maxDets=20, areaRng='medium')
stats[9] = _summarize(0, maxDets=20, areaRng='large')
return stats
if not self.eval:
raise Exception('Please run accumulate() first')
iouType = self.params.iouType
if iouType == 'segm' or iouType == 'bbox':
summarize = _summarizeDets
elif iouType == 'keypoints':
summarize = _summarizeKps
self.stats = summarize()
def prs(self):
def _summarize(iouThr=None, areaRng='all', maxDets=100):
p = self.params
iStr = '[ IoU={:<9} | area={:>6} | maxDets={:>3} ]'
iouStr = '%0.2f:%0.2f' % (p.iouThrs[0], p.iouThrs[-1]) if iouThr is None else '%0.2f' % (iouThr)
areaStr = areaRng
maxDetsStr = '%d' % (maxDets)
aind = [i for i, aRng in enumerate(['all', 'small', 'medium', 'large']) if aRng == areaRng]
mind = [i for i, mDet in enumerate([1, 10, 100]) if mDet == maxDets]
prec = self.eval['precision']
if iouThr is not None:
t = np.where(iouThr == p.iouThrs)[0]
prec = prec[t]
prec = prec[:, :, :, aind, mind]
# [iou, rec, cls, 1] -> [rec]
prec = prec.mean(0).mean(1).flatten()
return iStr.format(iouStr, areaStr, maxDetsStr), prec
if not self.eval:
raise Exception('Please run accumulate() first')
prs = []
prs.append(_summarize()) # 0.5:0.95, all
prs.append(_summarize(iouThr=.5)) # 0.5, all
prs.append(_summarize(iouThr=.75)) # 0.75, all
prs.append(_summarize(areaRng='small')) # 0.5:0.95, small
prs.append(_summarize(iouThr=.5, areaRng='small')) # 0.5, small
prs.append(_summarize(iouThr=.75, areaRng='small')) # 0.75, small
prs.append(_summarize(areaRng='medium')) # 0.5:0.95, medium
prs.append(_summarize(iouThr=.5, areaRng='medium')) # 0.5, medium
prs.append(_summarize(iouThr=.75, areaRng='medium')) # 0.75, medium
prs.append(_summarize(areaRng='large')) # 0.5:0.95, large
prs.append(_summarize(iouThr=.5, areaRng='large')) # 0.5, large
prs.append(_summarize(iouThr=.75, areaRng='large')) # 0.75, large
return dict(prs)
def __str__(self):
self.summarize()
class Params:
'''
Params for coco evaluation api
'''
def setDetParams(self):
self.imgIds = []
self.catIds = []
# np.arange causes trouble. the data point on arange is slightly larger than the true value
self.iouThrs = np.linspace(.5, 0.95, int(np.round((0.95 - .5) / .05)) + 1, endpoint=True)
self.recThrs = np.linspace(.0, 1.00, int(np.round((1.00 - .0) / .01)) + 1, endpoint=True)
self.maxDets = [1, 10, 100]
self.areaRng = [[0 ** 2, 1e5 ** 2], [0 ** 2, 32 ** 2], [32 ** 2, 96 ** 2], [96 ** 2, 1e5 ** 2]]
self.areaRngLbl = ['all', 'small', 'medium', 'large']
self.useCats = 1
def setKpParams(self):
self.imgIds = []
self.catIds = []
# np.arange causes trouble. the data point on arange is slightly larger than the true value
self.iouThrs = np.linspace(.5, 0.95, int(np.round((0.95 - .5) / .05)) + 1, endpoint=True)
self.recThrs = np.linspace(.0, 1.00, int(np.round((1.00 - .0) / .01)) + 1, endpoint=True)
self.maxDets = [20]
self.areaRng = [[0 ** 2, 1e5 ** 2], [32 ** 2, 96 ** 2], [96 ** 2, 1e5 ** 2]]
self.areaRngLbl = ['all', 'medium', 'large']
self.useCats = 1
def __init__(self, iouType='segm'):
if iouType == 'segm' or iouType == 'bbox':
self.setDetParams()
elif iouType == 'keypoints':
self.setKpParams()
else:
raise Exception('iouType not supported')
self.iouType = iouType
# useSegm is deprecated
self.useSegm = None
\ No newline at end of file
__author__ = 'tsungyi'
import seetadet.utils.pycocotools._mask as _mask
# Interface for manipulating masks stored in RLE format.
#
# RLE is a simple yet efficient format for storing binary masks. RLE
# first divides a vector (or vectorized image) into a series of piecewise
# constant regions and then for each piece simply stores the length of
# that piece. For example, given M=[0 0 1 1 1 0 1] the RLE counts would
# be [2 3 1 1], or for M=[1 1 1 1 1 1 0] the counts would be [0 6 1]
# (note that the odd counts are always the numbers of zeros). Instead of
# storing the counts directly, additional compression is achieved with a
# variable bitrate representation based on a common scheme called LEB128.
#
# Compression is greatest given large piecewise constant regions.
# Specifically, the size of the RLE is proportional to the number of
# *boundaries* in M (or for an image the number of boundaries in the y
# direction). Assuming fairly simple shapes, the RLE representation is
# O(sqrt(n)) where n is number of pixels in the object. Hence space usage
# is substantially lower, especially for large simple objects (large n).
#
# Many common operations on masks can be computed directly using the RLE
# (without need for decoding). This includes computations such as area,
# union, intersection, etc. All of these operations are linear in the
# size of the RLE, in other words they are O(sqrt(n)) where n is the area
# of the object. Computing these operations on the original mask is O(n).
# Thus, using the RLE can result in substantial computational savings.
#
# The following API functions are defined:
# encode - Encode binary masks using RLE.
# decode - Decode binary masks encoded via RLE.
# merge - Compute union or intersection of encoded masks.
# iou - Compute intersection over union between masks.
# area - Compute area of encoded masks.
# toBbox - Get bounding boxes surrounding encoded masks.
# frPyObjects - Convert polygon, bbox, and uncompressed RLE to encoded RLE mask.
#
# Usage:
# Rs = encode( masks )
# masks = decode( Rs )
# R = merge( Rs, intersect=false )
# o = iou( dt, gt, iscrowd )
# a = area( Rs )
# bbs = toBbox( Rs )
# Rs = frPyObjects( [pyObjects], h, w )
#
# In the API the following formats are used:
# Rs - [dict] Run-length encoding of binary masks
# R - dict Run-length encoding of binary mask
# masks - [hxwxn] Binary mask(s) (must have type np.ndarray(dtype=uint8) in column-major order)
# iscrowd - [nx1] list of np.ndarray. 1 indicates corresponding gt image has crowd region to ignore
# bbs - [nx4] Bounding box(es) stored as [x y w h]
# poly - Polygon stored as [[x1 y1 x2 y2...],[x1 y1 ...],...] (2D list)
# dt,gt - May be either bounding boxes or encoded masks
# Both poly and bbs are 0-indexed (bbox=[0 0 1 1] encloses first pixel).
#
# Finally, a note about the intersection over union (iou) computation.
# The standard iou of a ground truth (gt) and detected (dt) object is
# iou(gt,dt) = area(intersect(gt,dt)) / area(union(gt,dt))
# For "crowd" regions, we use a modified criteria. If a gt object is
# marked as "iscrowd", we allow a dt to match any subregion of the gt.
# Choosing gt' in the crowd gt that best matches the dt can be done using
# gt'=intersect(dt,gt). Since by definition union(gt',dt)=dt, computing
# iou(gt,dt,iscrowd) = iou(gt',dt) = area(intersect(gt,dt)) / area(dt)
# For crowd gt regions we use this modified criteria above for the iou.
#
# To compile run "python setup.py build_ext --inplace"
# Please do not contact us for help with compiling.
#
# Microsoft COCO Toolbox. version 2.0
# Data, paper, and tutorials available at: http://mscoco.org/
# Code written by Piotr Dollar and Tsung-Yi Lin, 2015.
# Licensed under the Simplified BSD License [see coco/license.txt]
encode = _mask.encode
decode = _mask.decode
iou = _mask.iou
merge = _mask.merge
area = _mask.area
toBbox = _mask.toBbox
frPyObjects = _mask.frPyObjects
\ No newline at end of file
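To make the RLE API above concrete, here is a minimal round trip; a sketch that assumes the compiled `_mask` extension is importable, with masks given as uint8 arrays in column-major order as the format notes require:

```python
import numpy as np
from seetadet.utils.pycocotools import mask as mask_tools

# Two 4x4 binary masks, Fortran (column-major) order as required above.
masks = np.asfortranarray(np.zeros((4, 4, 2), dtype=np.uint8))
masks[1:3, 1:3, 0] = 1  # a 2x2 square
masks[0:2, 0:2, 1] = 1  # an overlapping 2x2 square

rles = mask_tools.encode(masks)            # [dict] one RLE per mask
print(mask_tools.area(rles))               # [4. 4.]
print(mask_tools.toBbox(rles))             # [[1. 1. 2. 2.], [0. 0. 2. 2.]]
print(mask_tools.iou(rles, rles, [0, 0]))  # pairwise IoU, no crowd regions
merged = mask_tools.merge(rles)            # union of both masks
recovered = mask_tools.decode(rles)        # back to a (4, 4, 2) uint8 array
```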
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
from seetadet.utils.pycocotools import mask as mask_tools
from seetadet.utils.pycocotools.mask import frPyObjects
def poly2rle(poly, height, width):
"""Convert polygon(s) into encoded rle.
The polygon(s) may be stored in one of the following formats:
1. Polygon with uncompressed RLE:
{'size': (h, w), 'counts': [1, 2, ...]}
2. Polygons with number of coordinates > 4:
[[x1, y1, x2, y2, x3, y3, ...], [x1, y1, x2, y2, x3, y3, ...]]
3. Polygons with uncompressed RLE:
[{'size': (h, w), 'counts': [1, 2, ...]}]
COCO uses **2** and **1** to annotate instances and crowd objects.
The output rle(s) will be:
{'size': (h, w), 'counts': 'abc...'} or [{'size': (h, w), 'counts': 'abc...'}]
Parameters
----------
poly : Union[List, Dict]
The input polygons.
height : int
The height of image.
width : int
The width of image.
Returns
-------
Union[List, Dict]
The encoded rle or a sequence of rles.
Notes
-----
COCODataset uses **2** and **1** to annotate instances and crowd objects.
"""
return frPyObjects(poly, height, width)
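For instance, a single polygon in format **2** converts to one compressed RLE per polygon; a small illustrative sketch:

```python
# A 4x4 square polygon on an 8x8 canvas (format 2 above; coordinates
# are illustrative).
poly = [[1, 1, 5, 1, 5, 5, 1, 5]]  # [x1, y1, x2, y2, ...]
rles = poly2rle(poly, height=8, width=8)
# -> [{'size': [8, 8], 'counts': b'...'}]
```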
def poly2bytes(poly, height, width):
"""Convert polygon(s) into encoded mask bytes.
The polygon(s) may be stored in one of the following formats:
1. Polygon with uncompressed RLE:
{'size': (h, w), 'counts': [1, 2, ...]}
2. Polygons with number of coordinates > 4:
[[x1, y1, x2, y2, x3, y3, ...], [x1, y1, x2, y2, x3, y3, ...]]
3. Polygons with uncompressed RLE:
[{'size': (h, w), 'counts': [1, 2, ...]}]
If the number of polygons >= 2, we will merge them into a single mask.
Parameters
----------
poly : Union[List, Dict]
The input polygons.
height : int
The height of image.
width : int
The width of image.
Returns
-------
bytes
The mask bytes.
Notes
-----
COCODataset uses **2** and **1** to annotate instances and crowd objects.
"""
rle_objects = poly2rle(poly, height, width)
if isinstance(rle_objects, list):
if len(rle_objects) == 1:
return rle_objects[0]['counts']
rle_objects = mask_tools.merge(rle_objects)
return rle_objects['counts']
def bytes2img(data, height, width):
"""Decode the RLE mask bytes to a 2d image.
Parameters
----------
data : bytes
The encoded bytes.
height : int
The height of image.
width : int
The width of image.
Returns
-------
numpy.ndarray
The mask image.
"""
rle_objects = [{'counts': data, 'size': [height, width]}]
mask_image = mask_tools.decode(rle_objects)
if mask_image.shape[2] != 1:
raise ValueError(
'{} instances are found in data.\n'
'Merge them before compressing.'
.format(mask_image.shape[2]))
return mask_image[:, :, 0]
def img2bytes(data):
"""Compress a 2d mask image to RLE bytes.
Parameters
----------
data : numpy.ndarray
The image to compress.
Returns
-------
bytes
The encoded bytes.
"""
if len(data.shape) == 3:
raise ValueError(
'{} instances are found in data.\n'
'Merge them before compressing.'
.format(data.shape[2])
)
elif len(data.shape) != 2:
raise ValueError('Excepted a 2d mask.')
rle_objects = mask_tools.encode(
np.array(np.stack([data], 2), order='F'))
return rle_objects[0]['counts']
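The three helpers above compose into a simple round trip; a minimal sketch with an illustrative polygon and canvas size:

```python
# Polygon -> RLE bytes -> 2d mask image -> RLE bytes again.
data = poly2bytes([[1, 1, 5, 1, 5, 5, 1, 5]], height=8, width=8)
mask_img = bytes2img(data, height=8, width=8)  # (8, 8) uint8 array
assert img2bytes(mask_img) == data             # RLE encoding is canonical
```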
...@@ -8,10 +8,11 @@ ...@@ -8,10 +8,11 @@
# <https://opensource.org/licenses/BSD-2-Clause> # <https://opensource.org/licenses/BSD-2-Clause>
# #
# ------------------------------------------------------------ # ------------------------------------------------------------
"""Visualization utilities."""
from __future__ import absolute_import from __future__ import absolute_import
from __future__ import division from __future__ import division
from __future__ import print_function from __future__ import print_function
from seetadet.algo.retinanet.anchor_target import AnchorTarget from seetadet.utils.vis.colormap import colormap
from seetadet.algo.retinanet.data_loader import DataLoader from seetadet.utils.vis.visualizer import vis_one_image
...@@ -17,7 +17,6 @@ ...@@ -17,7 +17,6 @@
# <https://github.com/facebookresearch/Detectron/blob/master/detectron/utils/colormap.py> # <https://github.com/facebookresearch/Detectron/blob/master/detectron/utils/colormap.py>
# #
############################################################################## ##############################################################################
"""An awesome colormap for really neat visualizations.""" """An awesome colormap for really neat visualizations."""
from __future__ import absolute_import from __future__ import absolute_import
...@@ -29,8 +28,7 @@ import numpy as np ...@@ -29,8 +28,7 @@ import numpy as np
def colormap(rgb=False): def colormap(rgb=False):
color_list = np.array( color_list = np.array([
[
0.000, 0.447, 0.741, 0.000, 0.447, 0.741,
0.850, 0.325, 0.098, 0.850, 0.325, 0.098,
0.929, 0.694, 0.125, 0.929, 0.694, 0.125,
...@@ -109,9 +107,7 @@ def colormap(rgb=False): ...@@ -109,9 +107,7 @@ def colormap(rgb=False):
0.571, 0.571, 0.571, 0.571, 0.571, 0.571,
0.714, 0.714, 0.714, 0.714, 0.714, 0.714,
0.857, 0.857, 0.857, 0.857, 0.857, 0.857,
1.000, 1.000, 1.000 1.000, 1.000, 1.000]).astype(np.float32)
]
).astype(np.float32)
color_list = color_list.reshape((-1, 3)) * 255 color_list = color_list.reshape((-1, 3)) * 255
if not rgb: if not rgb:
color_list = color_list[:, ::-1] color_list = color_list[:, ::-1]
......
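As a usage note for the refactored function, a short sketch:

```python
# colormap() returns an (N, 3) float32 array of 0-255 colors, BGR by
# default; pass rgb=True for RGB order (e.g. for matplotlib).
colors = colormap(rgb=True)
first = colors[0] / 255  # -> [0.000, 0.447, 0.741], normalized
```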
...@@ -29,18 +29,12 @@ import matplotlib.pyplot as plt ...@@ -29,18 +29,12 @@ import matplotlib.pyplot as plt
from matplotlib.patches import Polygon from matplotlib.patches import Polygon
import numpy as np import numpy as np
from seetadet.utils.colormap import colormap from seetadet.utils.mask import paste_masks
from seetadet.utils.boxes import expand_boxes from seetadet.utils.vis.colormap import colormap
plt.rcParams['pdf.fonttype'] = 42 # For editing in Adobe Illustrator plt.rcParams['pdf.fonttype'] = 42 # For editing in Adobe Illustrator
_GRAY = (218, 227, 218)
_GREEN = (18, 127, 15)
_WHITE = (255, 255, 255)
def kp_connections(keypoints): def kp_connections(keypoints):
kp_lines = [ kp_lines = [
[keypoints.index('left_eye'), keypoints.index('right_eye')], [keypoints.index('left_eye'), keypoints.index('right_eye')],
...@@ -72,7 +66,6 @@ def convert_from_cls_format(cls_boxes, cls_segms, cls_keyps, class_names): ...@@ -72,7 +66,6 @@ def convert_from_cls_format(cls_boxes, cls_segms, cls_keyps, class_names):
box_list.append(cls_boxes[j]) box_list.append(cls_boxes[j])
if cls_segms is not None: if cls_segms is not None:
segm_list.append(cls_segms[j]) segm_list.append(cls_segms[j])
if len(box_list) > 0: if len(box_list) > 0:
boxes = np.concatenate(box_list) boxes = np.concatenate(box_list)
else: else:
...@@ -85,7 +78,6 @@ def convert_from_cls_format(cls_boxes, cls_segms, cls_keyps, class_names): ...@@ -85,7 +78,6 @@ def convert_from_cls_format(cls_boxes, cls_segms, cls_keyps, class_names):
keyps = [k for klist in cls_keyps for k in klist] keyps = [k for klist in cls_keyps for k in klist]
else: else:
keyps = None keyps = None
classes = [] classes = []
for j in range(len(cls_boxes)): for j in range(len(cls_boxes)):
classes += [j] * len(cls_boxes[j]) classes += [j] * len(cls_boxes[j])
...@@ -111,7 +103,7 @@ def get_bbox_contours(rotated_box): ...@@ -111,7 +103,7 @@ def get_bbox_contours(rotated_box):
r21 = point_rotate((x2, y2), (cx, cy), radian) r21 = point_rotate((x2, y2), (cx, cy), radian)
r22 = point_rotate((x2, y1), (cx, cy), radian) r22 = point_rotate((x2, y1), (cx, cy), radian)
quad = np.array([r11, r12, r21, r22, r11]) quad = np.array([r11, r12, r21, r22, r11])
# Main direction # Main direction.
mside = max(w, h) / 2 mside = max(w, h) / 2
x_end = mside * np.cos(radian) x_end = mside * np.cos(radian)
y_end = mside * np.sin(radian) y_end = mside * np.sin(radian)
...@@ -119,34 +111,8 @@ def get_bbox_contours(rotated_box): ...@@ -119,34 +111,8 @@ def get_bbox_contours(rotated_box):
return quad, main_direction return quad, main_direction
def get_mask(boxes, segms, im_shape, mask_thresh=0.5):
i, masks = 0, np.zeros(list(im_shape) + [len(boxes)], 'uint8')
for det, msk in zip(boxes, segms):
M = msk.shape[0]
scale = (M + 2.) / M
ref_box = expand_boxes(np.array([det[:4]]), scale)[0]
ref_box = ref_box.astype(np.int32)
padded_mask = np.zeros((M + 2, M + 2), 'float32')
padded_mask[1:-1, 1:-1] = msk[:, :]
w = ref_box[2] - ref_box[0] + 1
h = ref_box[3] - ref_box[1] + 1
w = np.maximum(w, 1)
h = np.maximum(h, 1)
mask = cv2.resize(padded_mask, (w, h))
mask = np.array(mask > mask_thresh, 'uint8')
x1 = max(ref_box[0], 0)
y1 = max(ref_box[1], 0)
x2 = min(ref_box[2] + 1, im_shape[1])
y2 = min(ref_box[3] + 1, im_shape[0])
masks[y1: y2, x1: x2, i] = mask[
(y1 - ref_box[1]): (y2 - ref_box[1]),
(x1 - ref_box[0]): (x2 - ref_box[0])]
i += 1
return masks
def vis_one_image( def vis_one_image(
im, img,
class_names, class_names,
boxes, boxes,
segms=None, segms=None,
...@@ -154,7 +120,7 @@ def vis_one_image( ...@@ -154,7 +120,7 @@ def vis_one_image(
thresh=0.9, thresh=0.9,
kp_thresh=2, kp_thresh=2,
dpi=100, dpi=100,
box_alpha=0., box_alpha=1.,
show_class=True, show_class=True,
show_rotated=False, show_rotated=False,
filename=None, filename=None,
...@@ -162,27 +128,22 @@ def vis_one_image( ...@@ -162,27 +128,22 @@ def vis_one_image(
"""Visual debugging of detections.""" """Visual debugging of detections."""
boxes, segms, keypoints, classes = \ boxes, segms, keypoints, classes = \
convert_from_cls_format(boxes, segms, keypoints, class_names) convert_from_cls_format(boxes, segms, keypoints, class_names)
if boxes is None or boxes.shape[0] == 0 or max(boxes[:, -1]) < thresh:
if boxes is None \
or boxes.shape[0] == 0 or \
max(boxes[:, -1]) < thresh:
return return
img, masks = img[:, :, ::-1], None
im, masks = im[:, :, ::-1], None
if segms is not None and len(segms) > 0: if segms is not None and len(segms) > 0:
masks = get_mask(boxes, segms, im.shape[:2]) masks = paste_masks(segms, boxes, img.shape[0], img.shape[1],
thresh=0.5, data_order='C')
color_list = colormap(rgb=True) / 255 color_list = colormap(rgb=True) / 255
fig = plt.figure(frameon=False) fig = plt.figure(frameon=False)
fig.set_size_inches(im.shape[1] / dpi, im.shape[0] / dpi) fig.set_size_inches(img.shape[1] / dpi, img.shape[0] / dpi)
ax = plt.Axes(fig, [0., 0., 1., 1.]) ax = plt.Axes(fig, [0., 0., 1., 1.])
ax.axis('off') ax.axis('off')
fig.add_axes(ax) fig.add_axes(ax)
ax.imshow(im) ax.imshow(img)
# Display in largest to smallest order to reduce occlusion # Display in largest to smallest order to reduce occlusion.
if boxes.shape[1] == 5: if boxes.shape[1] == 5:
areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1]) areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
elif boxes.shape[1] == 6: elif boxes.shape[1] == 6:
...@@ -193,83 +154,47 @@ def vis_one_image( ...@@ -193,83 +154,47 @@ def vis_one_image(
mask_color_id = 0 mask_color_id = 0
for i in sorted_inds: for i in sorted_inds:
bbox = boxes[i, :-1] bbox, score = boxes[i, :-1], boxes[i, -1]
score = boxes[i, -1]
if score < thresh: if score < thresh:
continue continue
# Draw box.
# Show box
if bbox.size == 4 and not show_rotated: if bbox.size == 4 and not show_rotated:
ax.add_patch( ax.add_patch(plt.Rectangle(
plt.Rectangle( (bbox[0], bbox[1]), bbox[2] - bbox[0], bbox[3] - bbox[1],
(bbox[0], bbox[1]), fill=False, edgecolor='g', linewidth=1., alpha=box_alpha))
bbox[2] - bbox[0], # Draw class.
bbox[3] - bbox[1],
fill=False,
edgecolor='g',
linewidth=1.,
alpha=box_alpha,
)
)
# Show class
if show_class: if show_class:
ax.text( ax.text(bbox[0], bbox[1] - 2,
bbox[0], bbox[1] - 2,
get_class_string(class_names[classes[i]], score), get_class_string(class_names[classes[i]], score),
fontsize=11, fontsize=11, family='serif', color='white',
family='serif', bbox=dict(facecolor='g', alpha=0.4, pad=0, edgecolor='none'))
bbox=dict(facecolor='g', alpha=0.4, pad=0, edgecolor='none'), # Draw mask.
color='white',
)
# Show mask
if segms is not None and len(segms) > i: if segms is not None and len(segms) > i:
img = np.ones(im.shape) color_img = np.ones(img.shape)
color_mask = color_list[mask_color_id % len(color_list), 0:3] color_mask = color_list[mask_color_id % len(color_list), 0:3]
mask_color_id += 1 mask_color_id += 1
w_ratio = .4 w_ratio = .4
for c in range(3): for c in range(3):
color_mask[c] = color_mask[c] * (1 - w_ratio) + w_ratio color_mask[c] = color_mask[c] * (1 - w_ratio) + w_ratio
for c in range(3): for c in range(3):
img[:, :, c] = color_mask[c] color_img[:, :, c] = color_mask[c]
e = masks[:, :, i] e = masks[:, :, i]
results = cv2.findContours(e.copy(), cv2.RETR_CCOMP, cv2.CHAIN_APPROX_NONE)
results = cv2.findContours(
e.copy(),
cv2.RETR_CCOMP,
cv2.CHAIN_APPROX_NONE,
)
contours = results[0] if len(results) == 2 else results[1] contours = results[0] if len(results) == 2 else results[1]
if show_rotated and len(contours) > 1: if show_rotated and len(contours) > 1:
contours = [max(contours, key=cv2.contourArea)] contours = [max(contours, key=cv2.contourArea)]
for c in contours: for c in contours:
if show_rotated: if show_rotated:
rect = cv2.minAreaRect(c) rect = cv2.minAreaRect(c)
ax.add_patch( ax.add_patch(Polygon(cv2.boxPoints(rect), fill=False,
Polygon( edgecolor='g', linewidth=1., alpha=box_alpha))
cv2.boxPoints(rect), ax.add_patch(Polygon(c.reshape((-1, 2)),
fill=False, fill=True, facecolor=color_mask, edgecolor='w',
edgecolor='g', linewidth=1.2, alpha=0.5))
linewidth=1., # Save or show.
alpha=box_alpha,
)
)
ax.add_patch(Polygon(
c.reshape((-1, 2)),
fill=True,
facecolor=color_mask,
edgecolor='w',
linewidth=1.2,
alpha=0.5,
))
if filename is not None: if filename is not None:
fig.savefig(filename, dpi=dpi) fig.savefig(filename, dpi=dpi)
plt.close('all') plt.close('all')
else: else:
plt.imshow(im) plt.imshow(img)
plt.show() plt.show()
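Tying the refactor together, a hedged sketch of a single-image call; the per-class box layout mirrors what `convert_from_cls_format` consumes, and the import path assumes the new `seetadet.utils.vis` package exports shown earlier in this diff:

```python
import cv2
import numpy as np
from seetadet.utils.vis import vis_one_image  # assumed package export

img = cv2.imread('demo.jpg')  # BGR; vis_one_image flips to RGB internally
class_names = ['__background__', 'person']
cls_boxes = [np.zeros((0, 5), 'float32'),  # background: no detections
             np.array([[40., 40., 200., 300., 0.97]], 'float32')]
vis_one_image(img, class_names, cls_boxes, thresh=0.9,
              filename='demo_vis.png')
```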
...@@ -15,94 +15,112 @@ from __future__ import print_function ...@@ -15,94 +15,112 @@ from __future__ import print_function
import os import os
import shutil import shutil
import setuptools
import setuptools.command.install
import sys
import subprocess import subprocess
import sys
import setuptools
import setuptools.command.build_py
import setuptools.command.install
# Read the current version info version = git_version = None
with open('version.txt', 'r') as f: if os.path.exists('version.txt'):
with open('version.txt', 'r') as f:
version = f.read().strip() version = f.read().strip()
try: if os.path.exists('.git'):
try:
git_version = subprocess.check_output( git_version = subprocess.check_output(
['git', 'rev-parse', 'HEAD'], cwd='./').decode('ascii').strip() ['git', 'rev-parse', 'HEAD'], cwd='./')
except (OSError, subprocess.CalledProcessError): git_version = git_version.decode('ascii').strip()
git_version = None except (OSError, subprocess.CalledProcessError):
pass
def clean():
"""Remove the work directories."""
if os.path.exists('build'):
shutil.rmtree('build')
if os.path.exists('seeta_det.egg-info'):
shutil.rmtree('seeta_det.egg-info')
def configure(): def build_extensions(parallel=4):
"""Prepare the package files.""" """Prepare the package files."""
# Compile cxx sources # Compile cxx sources.
py_exec = sys.executable py_exec = sys.executable
if subprocess.call( if subprocess.call(
'cd csrc/cxx && ' 'cd csrc/cxx && '
'{} setup.py build_ext -b ../ --no-python-abi-suffix=0 -j 4 &&' '{} setup.py build_ext -b ../../ -f --no-python-abi-suffix=0 -j {} &&'
'{} setup.py clean'.format(py_exec, py_exec), shell=True '{} setup.py clean'.format(py_exec, parallel, py_exec), shell=True,
) > 0: ) > 0:
raise RuntimeError('Failed to build the cxx sources.') raise RuntimeError('Failed to build the cxx sources.')
# Compile pyx sources # Compile pyx sources.
if subprocess.call( if subprocess.call(
'cd csrc/pyx && ' 'cd csrc/pyx && '
'{} setup.py build_ext -b ../ --cython-c-in-temp -j 4 &&' '{} setup.py build_ext -b ../../ -f --cython-c-in-temp -j {} &&'
'{} setup.py clean'.format(py_exec, py_exec), shell=True, '{} setup.py clean'.format(py_exec, parallel, py_exec), shell=True,
) > 0: ) > 0:
raise RuntimeError('Failed to build the pyx sources.') raise RuntimeError('Failed to build the pyx sources.')
# Copy the pre-built libraries
for root, _, files in os.walk('csrc/install'):
root = root[len('csrc/install/'):]
for file in files:
src = os.path.join(root, file)
dest = src.replace('lib', 'seetadet')
if os.path.exists(dest):
os.remove(dest)
shutil.copy(os.path.join('csrc/install', src), dest)
shutil.rmtree('csrc/install')
# Write the version file.
with open('seetadet/version.py', 'w') as f:
f.write("from __future__ import absolute_import\n"
"from __future__ import division\n"
"from __future__ import print_function\n\n"
"version = '{}'\n"
"git_version = '{}'\n".format(version, git_version))
class install(setuptools.command.install.install): def clean_builds():
"""Old-style command to prevent from installing egg.""" for path in ['build', 'seeta_det.egg-info']:
if os.path.exists(path):
shutil.rmtree(path)
def run(self):
setuptools.command.install.install.run(self)
def find_packages(top):
def find_packages():
"""Return the python sources installed to package.""" """Return the python sources installed to package."""
packages = [] packages = []
for root, _, files in os.walk('seetadet'): for root, _, _ in os.walk(top):
if os.path.exists(os.path.join(root, '__init__.py')): if os.path.exists(os.path.join(root, '__init__.py')):
packages.append(root) packages.append(root)
return packages return packages
def find_package_data(): def find_package_data(top):
"""Return the external data installed to package.""" """Return the external data installed to package."""
libraries = [] headers, libraries = [], []
for root, _, files in os.walk('seetadet'): if sys.platform == 'win32':
root = root[len('seetadet/'):] dylib_suffix = '.pyd'
elif sys.platform == 'darwin':
dylib_suffix = '.dylib'
else:
dylib_suffix = '.so'
for root, _, files in os.walk(top):
root = root[len(top + '/'):]
for file in files: for file in files:
if file.endswith('.so') or file.endswith('.pyd'): if file.endswith(dylib_suffix):
libraries.append(os.path.join(root, file)) libraries.append(os.path.join(root, file))
return libraries return headers + libraries
class BuildPyCommand(setuptools.command.build_py.build_py):
"""Enhanced 'build_py' command."""
def build_packages(self):
clean_builds()
with open('seetadet/version.py', 'w') as f:
f.write("from __future__ import absolute_import\n"
"from __future__ import division\n"
"from __future__ import print_function\n\n"
"version = '{}'\n"
"git_version = '{}'\n".format(version, git_version))
super(BuildPyCommand, self).build_packages()
def build_package_data(self):
parallel = 4
for k in ('build', 'install'):
v = self.get_finalized_command(k).parallel
parallel = max(parallel, (int(v) if v else v) or 1)
build_extensions(parallel=parallel)
self.package_data = {'seetadet': find_package_data('seetadet')}
super(BuildPyCommand, self).build_package_data()
class InstallCommand(setuptools.command.install.install):
"""Enhanced 'install' command."""
user_options = setuptools.command.install.install.user_options
user_options += [('parallel=', 'j', "number of parallel build jobs")]
def initialize_options(self):
self.parallel = None
super(InstallCommand, self).initialize_options()
self.old_and_unmanageable = True
configure()
setuptools.setup( setuptools.setup(
name='seeta-det', name='seeta-det',
version=version, version=version,
...@@ -110,10 +128,10 @@ setuptools.setup( ...@@ -110,10 +128,10 @@ setuptools.setup(
url='https://gitlab.seetatech.com/seetaresearch/seetadet', url='https://gitlab.seetatech.com/seetaresearch/seetadet',
author='SeetaTech', author='SeetaTech',
license='BSD 2-Clause', license='BSD 2-Clause',
packages=find_packages(), packages=find_packages('seetadet'),
package_data={'seetadet': find_package_data()},
package_dir={'seetadet': 'seetadet'}, package_dir={'seetadet': 'seetadet'},
cmdclass={'install': install}, cmdclass={'build_py': BuildPyCommand, 'install': InstallCommand},
install_requires=['opencv-python', 'pillow', 'pycocotools', 'prettytable'],
classifiers=[ classifiers=[
'Development Status :: 5 - Production/Stable', 'Development Status :: 5 - Production/Stable',
'Intended Audience :: Developers', 'Intended Audience :: Developers',
...@@ -125,7 +143,6 @@ setuptools.setup( ...@@ -125,7 +143,6 @@ setuptools.setup(
'Programming Language :: Python :: 3 :: Only', 'Programming Language :: Python :: 3 :: Only',
'Topic :: Scientific/Engineering', 'Topic :: Scientific/Engineering',
'Topic :: Scientific/Engineering :: Mathematics', 'Topic :: Scientific/Engineering :: Mathematics',
'Topic :: Scientific/Engineering :: Artificial Intelligence', 'Topic :: Scientific/Engineering :: Artificial Intelligence'],
],
) )
clean() clean_builds()
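The `-j/--parallel` option registered on the install command above is picked up by `build_package_data` and forwarded to `build_extensions`; an invocation sketch:

```python
import subprocess
# Equivalent to `python setup.py install -j 8` from the repo root; the
# job count reaches build_extensions(parallel=8) via BuildPyCommand.
subprocess.check_call(['python', 'setup.py', 'install', '-j', '8'])
```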
...@@ -8,7 +8,7 @@ ...@@ -8,7 +8,7 @@
# <https://opensource.org/licenses/BSD-2-Clause> # <https://opensource.org/licenses/BSD-2-Clause>
# #
# ------------------------------------------------------------ # ------------------------------------------------------------
"""Train a detection network with mpi utilities.""" """Train a detection network."""
from __future__ import absolute_import from __future__ import absolute_import
from __future__ import division from __future__ import division
...@@ -22,15 +22,15 @@ import numpy ...@@ -22,15 +22,15 @@ import numpy
from seetadet.core.config import cfg from seetadet.core.config import cfg
from seetadet.core.coordinator import Coordinator from seetadet.core.coordinator import Coordinator
from seetadet.core.train import train_net from seetadet.core.training import train_engine
from seetadet.datasets.factory import get_dataset from seetadet.data.build import build_dataset
from seetadet.utils import logger from seetadet.utils import logging
def parse_args(): def parse_args():
"""Parse arguments.""" """Parse arguments."""
parser = argparse.ArgumentParser( parser = argparse.ArgumentParser(
description='Train a detection network with mpi utilities') description='Train a detection network')
parser.add_argument( parser.add_argument(
'--cfg', '--cfg',
dest='cfg_file', dest='cfg_file',
...@@ -50,11 +50,11 @@ if __name__ == '__main__': ...@@ -50,11 +50,11 @@ if __name__ == '__main__':
args = parse_args() args = parse_args()
coordinator = Coordinator(args.cfg_file, exp_dir=args.exp_dir) coordinator = Coordinator(args.cfg_file, exp_dir=args.exp_dir)
checkpoint, start_iter = coordinator.checkpoint() checkpoint, start_iter = coordinator.get_checkpoint()
if checkpoint is not None: if checkpoint is not None:
cfg.TRAIN.WEIGHTS = checkpoint cfg.TRAIN.WEIGHTS = checkpoint
# Setup the distributed environment # Setup the distributed environment.
world_rank = dragon.distributed.get_rank() world_rank = dragon.distributed.get_rank()
world_size = dragon.distributed.get_world_size() world_size = dragon.distributed.get_world_size()
if cfg.NUM_GPUS != world_size: if cfg.NUM_GPUS != world_size:
...@@ -62,26 +62,25 @@ if __name__ == '__main__': ...@@ -62,26 +62,25 @@ if __name__ == '__main__':
'Expected starting of {} processes, got {}.' 'Expected starting of {} processes, got {}.'
.format(cfg.NUM_GPUS, world_size)) .format(cfg.NUM_GPUS, world_size))
# Setup the logging modules # Setup the logging modules.
logger.set_root_logger(world_rank == 0) logging.set_root_logger(world_rank == 0)
# Select the GPU depending on the rank of process # Select the GPU depending on the rank of process.
cfg.GPU_ID = [i for i in range(cfg.NUM_GPUS)][world_rank] cfg.GPU_ID = [i for i in range(cfg.NUM_GPUS)][world_rank]
# Fix the random seed for reproducibility # Fix the random seed for reproducibility.
numpy.random.seed(cfg.RNG_SEED) numpy.random.seed(cfg.RNG_SEED + world_rank)
dragon.random.set_seed(cfg.RNG_SEED) dragon.random.set_seed(cfg.RNG_SEED)
# Inspect the dataset # Inspect the dataset.
dataset = get_dataset(cfg.TRAIN.DATASET) dataset = build_dataset(cfg.TRAIN.DATASET)
logger.info('Dataset({}): {} images will be used to train.' logging.info('Dataset({}): {} images will be used to train.'
.format(cfg.TRAIN.DATASET, dataset.num_images)) .format(cfg.TRAIN.DATASET, dataset.num_images))
# Ready to train the network # Run training.
logger.info('Output will be saved to `{:s}`' logging.info('Checkpoints will be saved to `{:s}`'
.format(coordinator.checkpoints_dir())) .format(coordinator.path_at('checkpoints')))
with dragon.distributed.new_group( with dragon.distributed.new_group(
ranks=[i for i in range(cfg.NUM_GPUS)], ranks=[i for i in range(cfg.NUM_GPUS)],
backend='NCCL' if cfg.USE_NCCL else 'MPI',
verbose=True).as_default(): verbose=True).as_default():
train_net(coordinator, start_iter) train_engine.run_train(coordinator, start_iter)
...@@ -14,17 +14,18 @@ from __future__ import absolute_import ...@@ -14,17 +14,18 @@ from __future__ import absolute_import
from __future__ import division from __future__ import division
from __future__ import print_function from __future__ import print_function
import argparse
import os
import sys import sys
import argparse
import dragon.vm.torch as torch import dragon.vm.torch as torch
import pprint import numpy as np
from seetadet import onnx as _
from seetadet.core.config import cfg from seetadet.core.config import cfg
from seetadet.core.coordinator import Coordinator from seetadet.core.coordinator import Coordinator
from seetadet.modeling.detector import new_detector from seetadet.models.build import build_detector
from seetadet.utils import logger from seetadet.ops import onnx as _ # noqa
from seetadet.utils import logging
def parse_args(): def parse_args():
...@@ -41,16 +42,25 @@ def parse_args(): ...@@ -41,16 +42,25 @@ def parse_args():
default='', default='',
help='experiment dir') help='experiment dir')
parser.add_argument( parser.add_argument(
'--model_dir',
default='',
help='model dir')
parser.add_argument(
'--gpu',
type=int,
default=0,
help='index of GPU to use')
parser.add_argument(
'--iter', '--iter',
type=int, type=int,
default=None, default=None,
help='iteration step of exporting checkpoint') help='checkpoint of given step')
parser.add_argument( parser.add_argument(
'--input_shape', '--input_shape',
nargs='+', nargs='+',
type=int, type=int,
default=(1, 224, 224, 3), default=(1, 512, 512, 3),
help='spec of input shape') help='input image shape')
parser.add_argument( parser.add_argument(
'--opset', '--opset',
type=int, type=int,
...@@ -67,33 +77,50 @@ def parse_args(): ...@@ -67,33 +77,50 @@ def parse_args():
return parser.parse_args() return parser.parse_args()
if __name__ == '__main__': def find_weights(args, coordinator):
args = parse_args() """Return the weights for exporting."""
logger.info('Called with args:\n' + str(args)) weights_list = []
if args.model_dir:
for file in os.listdir(args.model_dir):
if not file.endswith('.pkl'):
continue
weights_list.append(os.path.join(args.model_dir, file))
if args.iter is not None:
checkpoint, _ = coordinator.get_checkpoint(args.iter, wait=True)
weights_list.append(checkpoint)
return weights_list
coordinator = Coordinator(args.cfg_file, exp_dir=args.exp_dir) def get_dummy_inputs(args):
logger.info('Using config:\n' + pprint.pformat(cfg)) n, h, w, c = args.input_shape
im_batch = torch.zeros(n, h, w, c, dtype='uint8')
im_info = torch.tensor([[h, w, 1., 1.] for _ in range(n)], dtype='float32')
strides = [2 ** x for x in range(cfg.FPN.MIN_LEVEL, cfg.FPN.MAX_LEVEL + 1)]
strides = np.array(strides)[:, None]
grid_shapes = np.stack([[h, w]] * len(strides))
grid_shapes = (grid_shapes - 1) // strides + 1
grid_info = torch.tensor(grid_shapes, dtype='int64')
return {'img': im_batch, 'im_info': im_info, 'grid_info': grid_info}
# Load the checkpoint and test engine
checkpoint, _ = coordinator.checkpoint(args.iter)
if checkpoint is None:
raise RuntimeError(
'The checkpoint of step {} does not exist.'
.format(args.iter))
# Ready to export the network if __name__ == '__main__':
logger.info('Exporting model will be saved to `{:s}`' args = parse_args()
.format(coordinator.exports_dir())) logging.info('Called with args:\n' + str(args))
detector = new_detector(cfg.GPU_ID, checkpoint)
data = torch.zeros(*args.input_shape, dtype='uint8') coordinator = Coordinator(args.cfg_file, args.exp_dir or args.model_dir)
ims_info = torch.zeros(args.input_shape[0], 3, dtype='float32') logging.info('Using config:\n' + str(cfg))
# Run exporting.
weights = find_weights(args, coordinator)[0]
weights_name = os.path.splitext(os.path.basename(weights))[0]
output_dir = args.model_dir or coordinator.path_at('exports')
logging.info('Exports will be saved to ' + output_dir)
detector = build_detector(args.gpu, weights)
inputs = get_dummay_inputs(args)
torch.onnx.export( torch.onnx.export(
model=detector, model=detector,
args={'data': data, 'ims_info': ims_info}, args=inputs,
f=checkpoint.replace('checkpoints', 'exports') f=os.path.join(output_dir, weights_name + '.onnx'),
.replace('pkl', 'onnx'),
verbose=True, verbose=True,
opset_version=args.opset, opset_version=args.opset,
enable_onnx_checker=args.check_model, enable_onnx_checker=args.check_model,
......
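After export, the graph can be sanity-checked outside the repo; a sketch using onnxruntime (not a repository dependency; the file name is illustrative):

```python
import onnxruntime as ort

sess = ort.InferenceSession('model_final.onnx',
                            providers=['CPUExecutionProvider'])
# Expect the dummy-input names used above: 'img', 'im_info', 'grid_info'.
print([i.name for i in sess.get_inputs()])
```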
...@@ -8,35 +8,33 @@ ...@@ -8,35 +8,33 @@
# <https://opensource.org/licenses/BSD-2-Clause> # <https://opensource.org/licenses/BSD-2-Clause>
# #
# ------------------------------------------------------------ # ------------------------------------------------------------
"""Deploy a detection network for serving.""" """Serve a detection network."""
from __future__ import absolute_import from __future__ import absolute_import
from __future__ import division from __future__ import division
from __future__ import print_function from __future__ import print_function
import base64 import argparse
import importlib import collections
import os import os
import threading import multiprocessing as mp
import time
import argparse
import cv2
import dragon
import flask import flask
import kpl_helper
import numpy as np import numpy as np
import pprint
from seetadet.core.config import cfg from seetadet.core.config import cfg
from seetadet.core.coordinator import Coordinator from seetadet.core.coordinator import Coordinator
from seetadet.modeling.detector import new_detector from seetadet.core.testing import test_engine
from seetadet.utils import logger from seetadet.core.testing import test_server
from seetadet.utils import logging
from seetadet.utils import profiler
def parse_args(): def parse_args():
"""Parse arguments.""" """Parse arguments."""
parser = argparse.ArgumentParser( parser = argparse.ArgumentParser(
description='Deploy a detection network for serving') description='Serve a detection network')
parser.add_argument( parser.add_argument(
'--cfg', '--cfg',
dest='cfg_file', dest='cfg_file',
...@@ -47,14 +45,40 @@ def parse_args(): ...@@ -47,14 +45,40 @@ def parse_args():
default='', default='',
help='experiment dir') help='experiment dir')
parser.add_argument( parser.add_argument(
'--iter',
type=int,
default=None,
help='iteration of checkpoint')
parser.add_argument(
'--model_dir', '--model_dir',
default='', default='',
help='final model dir') help='model dir')
parser.add_argument( parser.add_argument(
'--iter', '--score_thresh',
type=float,
default=0.6,
help='score threshold for inference')
parser.add_argument(
'--batch_timeout',
type=float,
default=1,
help='timeout to wait for a batch')
parser.add_argument(
'--queue_size',
type=int,
default=512,
help='size of the memory queue')
parser.add_argument(
'--gpu',
nargs='+',
type=int, type=int,
default=None, default=None,
help='test checkpoint of given step') help='index of GPUs to use')
parser.add_argument(
'--processes',
type=int,
default=1,
help='number of flask processes')
parser.add_argument( parser.add_argument(
'--port', '--port',
type=int, type=int,
...@@ -63,101 +87,129 @@ def parse_args(): ...@@ -63,101 +87,129 @@ def parse_args():
return parser.parse_args() return parser.parse_args()
def get_image(base64_str): class WebServer(test_server.WebServer):
try: """Server to run web serving."""
image_bytes = base64.b64decode(base64_str)
img = np.frombuffer(image_bytes, np.uint8)
img = cv2.imdecode(img, cv2.IMREAD_COLOR)
return img
except Exception as e:
logger.info('Decode base64 image failed. detail: ' + str(e))
return None
def __init__(self, output_dir, output_queue, output_dict,
score_thresh=0.6, perf_every=100):
super(WebServer, self).__init__(output_dir)
self.output_queue = output_queue
self.output_dict = output_dict
self.score_thresh = score_thresh
self.perf_every = perf_every
self.max_dets = cfg.TEST.DETECTIONS_PER_IM
def get_objects(boxes_this_image): def make_objects(self, outputs):
boxes = outputs.pop('boxes')
objects = [] objects = []
for j, name in enumerate(cfg.MODEL.CLASSES): for j, name in enumerate(self.classes):
if name == '__background__': if name == '__background__':
continue continue
detections = boxes_this_image[j] inds = np.where(boxes[j][:, 4] > self.score_thresh)[0]
return_inds = np.where(detections[:, 4] > cfg.VIS_TH)[0] for box in boxes[j][inds]:
for det in detections[return_inds]: objects.append({'bbox': box[:4].astype(int).tolist(),
objects.append({ 'score': float(box[4]), 'class': name})
'score': float(det[4]),
'name': name,
'xmin': int(det[0]),
'ymin': int(det[1]),
'xmax': int(det[2]),
'ymax': int(det[3])
})
logger.info('Detect objects: ' + str(objects))
return objects return objects
@staticmethod
class Wrapper(object): def get_objects(retry_time=0.005):
"""Inference wrapper."""
def __init__(self, args):
if args.model_dir:
Coordinator(args.cfg_file, exp_dir=args.model_dir)
checkpoint = os.path.join(args.model_dir, 'model_final.pkl')
else:
coordinator = Coordinator(args.cfg_file, exp_dir=args.exp_dir)
checkpoint, _ = coordinator.checkpoint(args.iter, wait=False)
logger.info('Load model from: ' + checkpoint)
self.test_module = 'seetadet.algo.%s.test' % cfg.MODEL.TYPE
self.test_module = importlib.import_module(self.test_module)
self.detector = new_detector(cfg.GPU_ID, checkpoint)
self.lock = threading.RLock()
def do_inference(self, img):
compute_fn = getattr(self.test_module, 'ims_detect')
process_fn = getattr(self.test_module, 'get_detections')
try: try:
self.lock.acquire() req = flask.request.get_json(force=True)
outputs = compute_fn(self.detector, [img])[0] img_id = req['image_id']
finally: except KeyError:
self.lock.release() err_msg, img_id = 'Not found "image_id" in data.', ''
outputs = process_fn(outputs) flask.abort(flask.Response(err_msg))
return outputs[0] while img_id not in output_dict:
time.sleep(retry_time)
return img_id, output_dict.pop(img_id)
def run(self):
"""Main loop to make the detection objects."""
timers = collections.defaultdict(profiler.Timer)
count = 0
while True:
count += 1
img_id, time_diffs, outputs = self.output_queue.get()
outputs = test_engine.filter_outputs(outputs, self.max_dets)
for name, diff in time_diffs.items():
timers[name].add_diff(diff)
self.output_dict[img_id] = self.make_objects(outputs)
if count % self.perf_every == 0:
logging.info('im_detect: {:d} [{:.3f}s + {:.3f}s]'
.format(count, timers['im_detect'].average_time,
timers['misc'].average_time))
def find_weights(args, coordinator):
"""Return the weights for testing."""
weights_list = []
if args.model_dir:
for file in os.listdir(args.model_dir):
if not file.endswith('.pkl'):
continue
weights_list.append(os.path.join(args.model_dir, file))
elif args.iter is not None:
checkpoint, _ = coordinator.get_checkpoint(args.iter, wait=True)
weights_list.append(checkpoint)
return weights_list[0]
if __name__ == '__main__': if __name__ == '__main__':
os.environ['FLASK_ENV'] = 'production' os.environ['FLASK_ENV'] = 'production'
logging.set_formatter("%(asctime)s %(levelname)s %(message)s")
args = parse_args() args = parse_args()
logger.info('Called with args:\n' + str(args)) logging.info('Called with args:\n' + str(args))
logger.info('Using config:\n' + pprint.pformat(cfg))
coordinator = Coordinator(args.cfg_file, args.exp_dir or args.model_dir)
logging.info('Using config:\n' + str(cfg))
# Build actors.
weights = find_weights(args, coordinator)
devices = args.gpu if args.gpu else [cfg.GPU_ID]
num_devices = len(devices)
queues = [mp.Queue(args.queue_size) for _ in range(num_devices + 1)]
actors = [mp.Process(
target=test_engine.test_detector, kwargs={
'test_cfg': cfg,
'weights': weights,
'queues': [queues[i], queues[-1]],
'device': devices[i],
'verbose': i == 0,
'batch_timeout': args.batch_timeout}) for i in range(num_devices)]
for actor in actors:
actor.start()
# Build server.
server_manager = mp.Manager()
output_dict = server_manager.dict()
server = WebServer(
output_dir='./',
output_queue=queues[-1],
output_dict=output_dict,
score_thresh=args.score_thresh)
server.start()
# Build app.
app = flask.Flask('SeetaDet') app = flask.Flask('SeetaDet')
workspace = dragon.Workspace() logging._logging.getLogger('werkzeug').setLevel('ERROR')
with workspace.as_default(): debug_objects = os.environ.get('FLASK_DEBUG', False)
wrapper = Wrapper(args)
@app.route("/upload", methods=['POST'])
@app.route("/", methods=['POST']) def upload():
def infer(): img_id, img = server.get_image()
try: queues[img_id % num_devices].put((img_id, img))
req = flask.request.get_json(force=True) return flask.jsonify({'image_id': img_id})
base64_str = req['base64_image']
except KeyError: @app.route("/get", methods=['POST'])
print('Not found base64 image.') def get():
return flask.abort(400) img_id, objects = server.get_objects(retry_time=0.005)
response = kpl_helper.deploy.RectangleBoxObjectDetectionResponse(0, 0, 0) msg = 'ImageId = %d, #Detects = %d' % (img_id, len(objects))
base64_str = base64_str.split(",")[-1] if debug_objects:
img = get_image(base64_str) msg += (('\n * ' if len(objects) > 0 else '') +
if not isinstance(img, np.ndarray): ('\n * '.join(str(obj) for obj in objects)))
return flask.jsonify(response.dumps()) logging.info(msg)
response.height, response.width, response.depth = img.shape return flask.jsonify({'objects': objects})
with workspace.as_default():
detections = wrapper.do_inference(img) app.run(host="0.0.0.0", port=args.port,
objects = get_objects(detections) threaded=args.processes == 1, processes=args.processes)
for obj in objects:
response.add_object(obj['name'],
obj['xmin'],
obj['ymin'],
obj['xmax'],
obj['ymax'],
obj['score'])
return flask.jsonify(response.dumps())
app.run(host="0.0.0.0", port=args.port)
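A hedged client sketch for the two endpoints above; the exact `/upload` payload depends on `test_server.WebServer.get_image`, which is outside this diff, and the port is illustrative:

```python
import requests  # client-side sketch; not a repository dependency

with open('demo.jpg', 'rb') as f:
    rsp = requests.post('http://127.0.0.1:5050/upload', data=f.read())
img_id = rsp.json()['image_id']

rsp = requests.post('http://127.0.0.1:5050/get', json={'image_id': img_id})
print(rsp.json()['objects'])  # [{'bbox': [...], 'score': ..., 'class': ...}]
```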
...@@ -14,18 +14,15 @@ from __future__ import absolute_import ...@@ -14,18 +14,15 @@ from __future__ import absolute_import
from __future__ import division from __future__ import division
from __future__ import print_function from __future__ import print_function
import argparse
import os import os
import sys import sys
import argparse
import pprint
from seetadet.core import test_engine
from seetadet.core import test_server
from seetadet.core.coordinator import Coordinator
from seetadet.core.config import cfg from seetadet.core.config import cfg
from seetadet.datasets.factory import get_dataset from seetadet.core.coordinator import Coordinator
from seetadet.utils import logger from seetadet.core.testing import test_engine
from seetadet.data.build import build_dataset
from seetadet.utils import logging
def parse_args(): def parse_args():
...@@ -44,90 +41,96 @@ def parse_args(): ...@@ -44,90 +41,96 @@ def parse_args():
parser.add_argument( parser.add_argument(
'--model_dir', '--model_dir',
default='', default='',
help='final model dir') help='model dir')
parser.add_argument( parser.add_argument(
'--gpus', '--gpu',
nargs='+', nargs='+',
type=int, type=int,
default=None, default=None,
help='index of GPUs to use') help='index of GPUs to use')
parser.add_argument( parser.add_argument(
'--iter', '--iter',
nargs='+',
type=int, type=int,
default=None, default=None,
help='test checkpoint of given step') help='iteration step of checkpoints')
parser.add_argument( parser.add_argument(
'--last', '--last',
type=int, type=int,
default=1, default=1,
help='test n last checkpoints') help='last N checkpoints')
parser.add_argument( parser.add_argument(
'--read_every', '--read_every',
type=int, type=int,
default=1000, default=100,
help='read every-n images for testing') help='read every-n images for testing')
parser.add_argument( parser.add_argument(
'--log_every', '--vis',
type=int, type=float,
default=100, default=0,
help='display testing progress every-n images') help='score threshold for visualization')
parser.add_argument( parser.add_argument(
'--dump', '--dump',
action='store_true', action='store_true',
help='dump the result back to record or not') help='dump the result back to record')
parser.add_argument( parser.add_argument(
'--wait', '--wait',
action='store_true', action='store_true',
help='wait the checkpoint or not') help='wait the checkpoint or not')
parser.add_argument(
'--precision',
default='',
help='compute precision')
if len(sys.argv) == 1: if len(sys.argv) == 1:
parser.print_help() parser.print_help()
sys.exit(1) sys.exit(1)
return parser.parse_args() return parser.parse_args()
def find_weights(args, coordinator):
"""Return the weights for testing."""
weights_list = []
if args.model_dir:
for file in os.listdir(args.model_dir):
if not file.endswith('.pkl'):
continue
weights_list.append(os.path.join(args.model_dir, file))
return weights_list
if args.iter is not None:
for iter in args.iter:
checkpoint, _ = coordinator.get_checkpoint(iter, wait=True)
weights_list.append(checkpoint)
return weights_list
for i in range(1, args.last + 1):
checkpoint, _ = coordinator.get_checkpoint(last_idx=i)
if checkpoint is None:
break
weights_list.append(checkpoint)
return weights_list
if __name__ == '__main__': if __name__ == '__main__':
args = parse_args() args = parse_args()
logger.info('Called with args:\n' + str(args)) logging.info('Called with args:\n' + str(args))
coordinator = Coordinator(args.cfg_file, args.exp_dir or args.model_dir) coordinator = Coordinator(args.cfg_file, args.exp_dir or args.model_dir)
logger.info('Using config:\n' + pprint.pformat(cfg)) cfg.MODEL.PRECISION = args.precision or cfg.MODEL.PRECISION
logging.info('Using config:\n' + str(cfg))
# Inspect the dataset # Inspect dataset.
dataset = get_dataset(cfg.TEST.DATASET) dataset = build_dataset(cfg.TEST.DATASET)
cfg.TEST.PROTOCOL = 'dump' if args.dump else cfg.TEST.PROTOCOL logging.info('Dataset({}): {} images will be used to test.'
logger.info('Dataset({}): {} images will be used to test.'
.format(cfg.TEST.DATASET, dataset.num_images)) .format(cfg.TEST.DATASET, dataset.num_images))
# Inspect the checkpoints # Run testing.
test_checkpoints = [] for weights in find_weights(args, coordinator):
if args.model_dir: weights_name = os.path.splitext(os.path.basename(weights))[0]
for file in os.listdir(args.model_dir): output_dir = coordinator.path_at('results/' + weights_name)
if file.endswith('.pkl'): logging.info('Results will be saved to ' + output_dir)
test_checkpoints.append(os.path.join(args.model_dir, file)) test_engine.run_test(
else: weights=weights,
if args.iter is not None: output_dir=output_dir,
checkpoint, _ = coordinator.checkpoint(args.iter, wait=True) devices=args.gpu,
test_checkpoints.append(checkpoint)
else:
i = 1
while True:
checkpoint, _ = coordinator.checkpoint(last_idx=i)
if checkpoint is not None:
test_checkpoints.append(checkpoint)
i += 1
if args.last is not None and i > args.last:
break
else:
break
for checkpoint in test_checkpoints:
# Create the server and run the test
output_dir = coordinator.results_dir(checkpoint)
logger.info('Results will be saved to ' + output_dir)
test_engine.run_test_net(
checkpoint=checkpoint,
server=test_server.EvaluateServer(output_dir),
devices=args.gpus,
read_every=args.read_every, read_every=args.read_every,
log_every=args.log_every, vis_thresh=args.vis,
) )
...@@ -14,19 +14,18 @@ from __future__ import absolute_import ...@@ -14,19 +14,18 @@ from __future__ import absolute_import
from __future__ import division from __future__ import division
from __future__ import print_function from __future__ import print_function
import argparse
import os import os
import sys import sys
import argparse
import dragon import dragon
import numpy import numpy
import pprint
from seetadet.core.config import cfg from seetadet.core.config import cfg
from seetadet.core.coordinator import Coordinator from seetadet.core.coordinator import Coordinator
from seetadet.core.train import train_net from seetadet.core.training import train_engine
from seetadet.datasets.factory import get_dataset from seetadet.data.build import build_dataset
from seetadet.utils import logger from seetadet.utils import logging
def parse_args(): def parse_args():
...@@ -48,51 +47,42 @@ def parse_args(): ...@@ -48,51 +47,42 @@ def parse_args():
return parser.parse_args() return parser.parse_args()
def mpi_train(cfg_file, exp_dir): def run_distributed(args, coordinator):
"""Call mpi to train models on multiple GPUs. """Run distributed training."""
Parameters
----------
cfg_file : str
The path of the cfg file.
exp_dir : str
The existing experiment dir.
"""
import subprocess import subprocess
args = 'mpirun --allow-run-as-root -n {} --bind-to none '.format(cfg.NUM_GPUS) cmd = 'mpirun --allow-run-as-root -n {} --bind-to none '.format(cfg.NUM_GPUS)
args += '{} {} '.format(sys.executable, 'mpi_train.py') cmd += '{} {} '.format(sys.executable, 'distributed/train.py')
args += '--cfg {} --exp_dir {} '.format(os.path.abspath(cfg_file), exp_dir) cmd += '--cfg {} '.format(os.path.abspath(args.cfg_file))
return subprocess.call(args, shell=True) cmd += '--exp_dir {}'.format(coordinator.exp_dir)
return subprocess.call(cmd, shell=True)
if __name__ == '__main__': if __name__ == '__main__':
args = parse_args() args = parse_args()
logger.info('Called with args:\n' + str(args)) logging.info('Called with args:\n' + str(args))
coordinator = Coordinator(args.cfg_file, args.exp_dir) coordinator = Coordinator(args.cfg_file, args.exp_dir)
logger.info('Using config:\n' + pprint.pformat(cfg)) logging.info('Using config:\n' + str(cfg))
if cfg.NUM_GPUS > 1: if cfg.NUM_GPUS > 1:
# Dispatch the MPI to start a multi-nodes task # Run a distributed task.
coordinator.checkpoints_dir() run_distributed(args, coordinator)
mpi_train(args.cfg_file, coordinator.exp_dir)
else: else:
# Resume training? # Resume training?
checkpoint, start_iter = coordinator.checkpoint() checkpoint, start_iter = coordinator.get_checkpoint()
if checkpoint is not None: if checkpoint is not None:
cfg.TRAIN.WEIGHTS = checkpoint cfg.TRAIN.WEIGHTS = checkpoint
# Fix the random seed for reproducibility # Fix the random seed for reproducibility.
numpy.random.seed(cfg.RNG_SEED) numpy.random.seed(cfg.RNG_SEED)
dragon.random.set_seed(cfg.RNG_SEED) dragon.random.set_seed(cfg.RNG_SEED)
# Inspect the dataset # Inspect the dataset.
dataset = get_dataset(cfg.TRAIN.DATASET) dataset = build_dataset(cfg.TRAIN.DATASET)
logger.info('Dataset({}): {} images will be used to train.' logging.info('Dataset({}): {} images will be used to train.'
.format(cfg.TRAIN.DATASET, dataset.num_images)) .format(cfg.TRAIN.DATASET, dataset.num_images))
# Ready to train the network # Run training.
logger.info('Output will be saved to `{:s}`' logging.info('Checkpoints will be saved to `{:s}`'
.format(coordinator.checkpoints_dir())) .format(coordinator.path_at('checkpoints')))
train_net(coordinator, start_iter) train_engine.run_train(coordinator, start_iter)