Commit ca4313d9 by Ting PAN

Update version to 0.6.0a

1 parent efc0106a
Showing with 1940 additions and 2455 deletions
......@@ -9,4 +9,3 @@ ignore = E741, # ambiguous variable name
W504, # line break after binary operator
# module imported but unused
per-file-ignores = __init__.py: F401
exclude = seetadet/utils/pycocotools
......@@ -2,25 +2,9 @@
## Introduction
### ImageNet Pretrained Models
### Pretrained Models
#### ResNet Models
- [R-50.pkl](https://dragon.seetatech.com/download/models/seetadet/imagenet/R-50.pkl)
- [R-101.pkl](https://dragon.seetatech.com/download/models/seetadet/imagenet/R-101.pkl)
#### VGG Models
- [VGG16.SSD.pkl](https://dragon.seetatech.com/download/models/seetadet/imagenet/VGG16.SSD.pkl)
#### MobileNet Models
- [MobileNetV2.pkl](https://dragon.seetatech.com/download/models/seetadet/imagenet/MobileNetV2.pkl)
- [ProxylessMobile.pkl](https://dragon.seetatech.com/download/models/seetadet/imagenet/ProxylessMobile.pkl)
#### AirNet Models
- [AirNet.pkl](https://dragon.seetatech.com/download/models/seetadet/imagenet/AirNet.pkl)
Please refer to [Pretrained Models](data/pretrained/README.md) for details.
## Baselines
......
......@@ -7,10 +7,6 @@ while the style of codes is torch.
The torch-style codes help us to simplify the hierarchical pipeline of modern detection.
## Requirements
seeta-dragon >= 0.3.0.dev20201024
## Installation
### Build From Source
......@@ -57,35 +53,23 @@ python test.py --cfg <MODEL_YAML> --exp_dir <EXP_DIR> --iter <ITERATION>
```
Or
### Export a detection model to ONNX
```bash
cd tools
python test_all.py --cfg <MODEL_YAML> --exp_dir <EXP_DIR> --last 1
python export.py --cfg <MODEL_YAML> --exp_dir <EXP_DIR> --iter <ITERATION>
```
### Export a detection model to ONNX
### Serve a detection model
```bash
cd tools
python export.py --cfg <MODEL_YAML> --exp_dir <EXP_DIR> --iter <ITERATION>
python serve.py --cfg <MODEL_YAML> --model_dir <MODEL_DIR>
```
## Benchmark and Model Zoo
Results and models are available in the [Model Zoo](MODEL_ZOO.md).
### Supported Backbones
- [ResNet](MODEL_ZOO.md#resnet-models)
- [VGG](MODEL_ZOO.md#vgg-models)
- [MobileNet](MODEL_ZOO.md#mobilenet-models)
- [AirNet](MODEL_ZOO.md#airnet-models)
### Supported Algorithms
- [Faster R-CNN](configs/faster_rcnn)
- [Mask R-CNN](configs/mask_rcnn)
- [SSD](configs/ssd)
- [RetinaNet](configs/retinanet)
## License
[BSD 2-Clause license](LICENSE)
......@@ -14,13 +14,7 @@
## COCO Object Detection Baselines
| Model | Lr sched | Infer time (s/im) | box AP | Download |
| :---: | :------: | :---------------: | :----: | :------: |
| [R-50-FPN-800](coco_faster_rcnn_R-50-FPN_800_1x.yml) | 1x | 0.046 | 38.3 | [model](https://dragon.seetatech.com/download/models/seetadet/faster_rcnn/coco_faster_rcnn_R-50-FPN_800_1x/model_final.pkl) |
| [R-50-FPN-800](coco_faster_rcnn_R-50-FPN_800_2x.yml) | 2x | 0.046 | 39.7 | [model](https://dragon.seetatech.com/download/models/seetadet/faster_rcnn/coco_faster_rcnn_R-50-FPN_800_2x/model_final.pkl) |
## Pascal VOC Object Detection Baselines
| Model | Infer time (s/im) | AP@0.5 | Download |
| :---: | :---------------: | :----: | :------: |
| [R-50-FPN-640](voc_faster_rcnn_R-50-FPN_640.yml) | 0.030 | 80.8 | [model](https://dragon.seetatech.com/download/models/seetadet/faster_rcnn/voc_faster_rcnn_R-50-FPN_640_1x/model_final.pkl) |
| Model | Lr sched | Infer time (fps) | box AP | Download |
| :---: | :------: | :--------------: | :----: | :-----: |
| [R-50-FPN](coco_faster_rcnn_R_50_FPN_1x.yml) | 1x | 27.78 | 38.4 | [model](https://dragon.seetatech.com/download/seetadet/faster_rcnn/coco_faster_rcnn_R_50_FPN_1x/model_adb024b6.pkl) &#124; [log]() |
| [R-50-FPN](coco_faster_rcnn_R_50_FPN_2x.yml) | 2x | 27.78 | 39.8 | [model](https://dragon.seetatech.com/download/seetadet/faster_rcnn/coco_faster_rcnn_R_50_FPN_2x/model_9a8c9ae5.pkl) &#124; [log]() |
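
The updated table reports inference speed in frames per second, while the removed rows used seconds per image, so the two columns cannot be compared directly. A minimal sketch of the conversion, assuming the two units are plain reciprocals (the helper names are hypothetical and not part of seetadet):

```python
def seconds_per_image_to_fps(seconds_per_image: float) -> float:
    """Convert a per-image latency in seconds to frames per second."""
    if seconds_per_image <= 0:
        raise ValueError("latency must be positive")
    return 1.0 / seconds_per_image


def fps_to_seconds_per_image(fps: float) -> float:
    """Convert frames per second back to a per-image latency in seconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1.0 / fps


# Values taken from the old (s/im) and new (fps) Faster R-CNN rows.
old_fps = seconds_per_image_to_fps(0.046)
new_latency = fps_to_seconds_per_image(27.78)
print(f"0.046 s/im is about {old_fps:.2f} fps")
print(f"27.78 fps is about {new_latency:.3f} s/im")
```

The commit does not say whether both measurements were taken on the same hardware, so the converted numbers only put the columns in the same unit.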
NUM_GPUS: 8
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: 'faster_rcnn'
BACKBONE: 'resnet50.fpn'
PRECISION: 'float16'
CLASSES: ['__background__',
'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
......@@ -19,27 +17,30 @@ MODEL:
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
BACKBONE:
TYPE: 'resnet50.fpn'
FPN:
MIN_LEVEL: 2
MAX_LEVEL: 6
ANCHOR_GENERATOR:
STRIDES: [4, 8, 16, 32, 64]
SOLVER:
BASE_LR: 0.02
DECAY_STEPS: [60000, 80000]
MAX_STEPS: 90000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'coco_faster_rcnn_R-50-FPN_800_1x'
FRCNN:
BATCH_SIZE: 512
ROI_XFORM_RESOLUTION: 7
SNAPSHOT_PREFIX: 'coco_faster_rcnn_R_50_FPN'
TRAIN:
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/coco_2017_train'
WEIGHTS: '../data/pretrained/R-50_in1k_cls90e.pkl'
DATASET: '../data/datasets/coco_train2017'
IMS_PER_BATCH: 2
SCALES: [640, 672, 704, 736, 768, 800]
MAX_SIZE: 1333
USE_DIFF: False # Do not use crowd objects
TEST:
DATASET: '/data/coco_2017_val'
JSON_FILE: '/data/instances_val2017.json'
PROTOCOL: 'coco'
DATASET: '../data/datasets/coco_val2017'
JSON_DATASET: '../data/datasets/coco_instances_val2017.json'
EVALUATOR: 'coco'
IMS_PER_BATCH: 1
SCALES: [800]
MAX_SIZE: 1333
NMS: 0.5
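
The `SOLVER` block above describes a step schedule through `BASE_LR`, `DECAY_STEPS` and `MAX_STEPS`. A rough sketch of how such a schedule could be evaluated follows. The decay factor of 0.1, the absence of warmup, and the choice to apply the decay at the boundary step itself are all assumptions; none of them appear in this diff.

```python
import bisect


def step_lr(step, base_lr=0.02, decay_steps=(60000, 80000), gamma=0.1):
    """Return the learning rate at `step` for a multi-step schedule.

    `gamma` is an assumed decay factor; the config shown does not state it.
    """
    # Count how many decay boundaries have been reached (boundary inclusive).
    num_decays = bisect.bisect_right(list(decay_steps), step)
    return base_lr * gamma ** num_decays


for s in (0, 59999, 60000, 85000):
    print(s, step_lr(s))
```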
NUM_GPUS: 8
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: 'faster_rcnn'
BACKBONE: 'resnet50.fpn'
PRECISION: 'float16'
CLASSES: ['__background__',
'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
......@@ -19,27 +17,30 @@ MODEL:
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
BACKBONE:
TYPE: 'resnet50.fpn'
FPN:
MIN_LEVEL: 2
MAX_LEVEL: 6
ANCHOR_GENERATOR:
STRIDES: [4, 8, 16, 32, 64]
SOLVER:
BASE_LR: 0.02
DECAY_STEPS: [120000, 160000]
MAX_STEPS: 180000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'coco_faster_rcnn_R-50-FPN_800_2x'
FRCNN:
BATCH_SIZE: 512
ROI_XFORM_RESOLUTION: 7
SNAPSHOT_PREFIX: 'coco_faster_rcnn_R-50-FPN_800'
TRAIN:
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/coco_2017_train'
WEIGHTS: '../data/pretrained/R-50_in1k_cls90e.pkl'
DATASET: '../data/datasets/coco_train2017'
IMS_PER_BATCH: 2
SCALES: [640, 672, 704, 736, 768, 800]
MAX_SIZE: 1333
USE_DIFF: False # Do not use crowd objects
TEST:
DATASET: '/data/coco_2017_val'
JSON_FILE: '/data/instances_val2017.json'
PROTOCOL: 'coco'
DATASET: '../data/datasets/coco_val2017'
JSON_DATASET: '../data/datasets/coco_instances_val2017.json'
EVALUATOR: 'coco'
IMS_PER_BATCH: 1
SCALES: [800]
MAX_SIZE: 1333
NMS: 0.5
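
The "1x" and "2x" labels can be related to dataset passes with a little arithmetic. The sketch below assumes `IMS_PER_BATCH` is counted per GPU (so the global batch is `NUM_GPUS * IMS_PER_BATCH`) and uses the full COCO train2017 image count of 118,287; both are assumptions about how the numbers combine, not statements taken from this commit.

```python
COCO_TRAIN2017_IMAGES = 118287  # full train2017 split, including unannotated images


def approx_epochs(max_steps, num_gpus, ims_per_gpu, dataset_size):
    """Estimate how many passes over the dataset a schedule makes."""
    images_seen = max_steps * num_gpus * ims_per_gpu
    return images_seen / dataset_size


epochs_1x = approx_epochs(90000, 8, 2, COCO_TRAIN2017_IMAGES)
epochs_2x = approx_epochs(180000, 8, 2, COCO_TRAIN2017_IMAGES)
print(f"1x: about {epochs_1x:.2f} epochs, 2x: about {epochs_2x:.2f} epochs")
```

Under these assumptions the 1x schedule corresponds to roughly 12 epochs and the 2x schedule to roughly 24.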
NUM_GPUS: 1
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: 'faster_rcnn'
BACKBONE: 'resnet50.fpn'
CLASSES: ['__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
FRCNN:
BATCH_SIZE: 128
ROI_XFORM_RESOLUTION: 7
SOLVER:
BASE_LR: 0.002
DECAY_STEPS: [80000, 100000]
MAX_STEPS: 120000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'voc_faster_rcnn_R-50-FPN_640'
TRAIN:
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/voc_0712_trainval'
IMS_PER_BATCH: 2
SCALES: [480, 512, 544, 576, 608, 640]
MAX_SIZE: 1066
USE_COLOR_JITTER: True
TEST:
DATASET: '/data/voc_2007_test'
PROTOCOL: 'voc2007'
IMS_PER_BATCH: 1
SCALES: [640]
MAX_SIZE: 1066
NMS: 0.45
......@@ -14,7 +14,7 @@
## COCO Instance Segmentation Baselines
| Model | Lr sched | Infer time (s/im) | box AP | mask AP | Download |
| Model | Lr sched | Infer time (fps) | box AP | mask AP | Download |
| :---: | :------: | :---------------: | :----: | :-----: | :------: |
| [R-50-FPN-800](coco_mask_rcnn_R-50-FPN_800_1x.yml) | 1x | 0.056 | 39.2 | 34.8 | [model](https://dragon.seetatech.com/download/models/seetadet/mask_rcnn/coco_mask_rcnn_R-50-FPN_800_1x/model_final.pkl) |
| [R-50-FPN-800](coco_mask_rcnn_R-50-FPN_800_2x.yml) | 2x | 0.056 | 41.4 | 36.5 | [model](https://dragon.seetatech.com/download/models/seetadet/mask_rcnn/coco_mask_rcnn_R-50-FPN_800_2x/model_final.pkl) |
| [R-50-FPN](coco_mask_rcnn_R_50_FPN_1x.yml) | 1x | 22.22 | 39.2 | 35.1 | [model](https://dragon.seetatech.com/download/seetadet/mask_rcnn/coco_mask_rcnn_R_50_FPN_1x/model_90266029.pkl) &#124; [log]() |
| [R-50-FPN](coco_mask_rcnn_R_50_FPN_2x.yml) | 2x | 22.22 | 41.4 | 36.7 | [model](https://dragon.seetatech.com/download/seetadet/mask_rcnn/coco_mask_rcnn_R_50_FPN_2x/model_4ace9d05.pkl) &#124; [log]() |
NUM_GPUS: 8
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: 'mask_rcnn'
BACKBONE: 'resnet50.fpn'
TYPE: mask_rcnn
PRECISION: 'float16'
CLASSES: ['__background__',
'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
......@@ -19,28 +17,31 @@ MODEL:
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
BACKBONE:
TYPE: 'resnet50.fpn'
FPN:
MIN_LEVEL: 2
MAX_LEVEL: 6
ANCHOR_GENERATOR:
STRIDES: [4, 8, 16, 32, 64]
SOLVER:
BASE_LR: 0.02
DECAY_STEPS: [60000, 80000]
MAX_STEPS: 90000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'coco_mask_rcnn_R-50-FPN_800_1x'
FRCNN:
BATCH_SIZE: 512
ROI_XFORM_RESOLUTION: 7
MRCNN:
ROI_XFORM_RESOLUTION: 14
SNAPSHOT_PREFIX: 'coco_mask_rcnn_R-50-FPN'
TRAIN:
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/coco_2017_train'
WEIGHTS: '../data/pretrained/R-50_in1k_cls90e.pkl'
DATASET: '../data/datasets/coco_train2017'
LOADER: 'mask_train'
IMS_PER_BATCH: 2
SCALES: [640, 672, 704, 736, 768, 800]
MAX_SIZE: 1333
USE_DIFF: False # Do not use crowd objects
TEST:
DATASET: '/data/coco_2017_val'
JSON_FILE: '/data/instances_val2017.json'
PROTOCOL: 'coco'
DATASET: '../data/datasets/coco_val2017'
JSON_DATASET: '../data/datasets/coco_instances_val2017.json'
EVALUATOR: 'coco'
IMS_PER_BATCH: 1
SCALES: [800]
MAX_SIZE: 1333
NMS: 0.5
NUM_GPUS: 8
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: 'mask_rcnn'
BACKBONE: 'resnet50.fpn'
TYPE: mask_rcnn
PRECISION: 'float16'
CLASSES: ['__background__',
'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
......@@ -19,28 +17,31 @@ MODEL:
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
BACKBONE:
TYPE: 'resnet50.fpn'
FPN:
MIN_LEVEL: 2
MAX_LEVEL: 6
ANCHOR_GENERATOR:
STRIDES: [4, 8, 16, 32, 64]
SOLVER:
BASE_LR: 0.02
DECAY_STEPS: [120000, 160000]
MAX_STEPS: 180000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'coco_mask_rcnn_R-50-FPN_800_2x'
FRCNN:
BATCH_SIZE: 512
ROI_XFORM_RESOLUTION: 7
MRCNN:
ROI_XFORM_RESOLUTION: 14
SNAPSHOT_PREFIX: 'coco_mask_rcnn_R_50_FPN'
TRAIN:
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/coco_2017_train'
WEIGHTS: '../data/pretrained/R-50_in1k_cls90e.pkl'
DATASET: '../data/datasets/coco_train2017'
LOADER: 'mask_train'
IMS_PER_BATCH: 2
SCALES: [640, 672, 704, 736, 768, 800]
MAX_SIZE: 1333
USE_DIFF: False # Do not use crowd objects
TEST:
DATASET: '/data/coco_2017_val'
JSON_FILE: '/data/instances_val2017.json'
PROTOCOL: 'coco'
DATASET: '../data/datasets/coco_val2017'
JSON_DATASET: '../data/datasets/coco_instances_val2017.json'
EVALUATOR: 'coco'
IMS_PER_BATCH: 1
SCALES: [800]
MAX_SIZE: 1333
NMS: 0.5
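
The new `FPN` and `ANCHOR_GENERATOR` entries are consistent with each other: pyramid level `l` has stride `2 ** l`, so levels 2 through 6 give the listed `STRIDES`. A small sketch of that relationship (the function name is hypothetical):

```python
def fpn_strides(min_level, max_level):
    """Return the stride of each pyramid level, assuming stride = 2 ** level."""
    return [2 ** level for level in range(min_level, max_level + 1)]


# MIN_LEVEL: 2 and MAX_LEVEL: 6 from the Faster/Mask R-CNN configs.
print(fpn_strides(2, 6))
# Levels 3 to 7, as in the removed RetinaNet RPN_MIN_LEVEL/RPN_MAX_LEVEL keys.
print(fpn_strides(3, 7))
```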
......@@ -12,16 +12,7 @@
## COCO Object Detection Baselines
| Model | Lr sched | Infer time (s/im) | box AP | Download |
| :---: | :------: | :---------------: | :----: | :------: |
| [R-50-FPN-416](coco_retinanet_R-50-FPN_416_6x.yml) | 6x | 0.019 | 34.4 | [model](https://dragon.seetatech.com/download/models/seetadet/retinanet/coco_retinanet_R-50-FPN_416_6x/model_final.pkl) |
| [R-50-FPN-512](coco_retinanet_R-50-FPN_512_6x.yml) | 6x | 0.022 | 36.4 | [model](https://dragon.seetatech.com/download/models/seetadet/retinanet/coco_retinanet_R-50-FPN_512_6x/model_final.pkl) |
| [R-50-FPN-800](coco_retinanet_R-50-FPN_800_1x.yml) | 1x | 0.051 | 37.4 | [model](https://dragon.seetatech.com/download/models/seetadet/retinanet/coco_retinanet_R-50-FPN_800_1x/model_final.pkl) |
| [R-50-FPN-800](coco_retinanet_R-50-FPN_800_2x.yml) | 2x | 0.051 | 39.1 | [model](https://dragon.seetatech.com/download/models/seetadet/retinanet/coco_retinanet_R-50-FPN_800_2x/model_final.pkl) |
## Pascal VOC Object Detection Baselines
| Model | Infer time (s/im) | AP@0.5 | Download |
| :---: | :---------------: | :----: | :------: |
| [R-50-FPN-416](voc_retinanet_R-50-FPN_416.yml) | 0.015 | 82.3 | [model](https://dragon.seetatech.com/download/models/seetadet/retinanet/voc_retinanet_R-50-FPN_416/model_final.pkl) |
| [R-50-FPN-512](voc_retinanet_R-50-FPN_512.yml) | 0.017 | 83.0 | [model](https://dragon.seetatech.com/download/models/seetadet/retinanet/voc_retinanet_R-50-FPN_512/model_final.pkl) |
| Model | Lr sched | Infer time (fps) | box AP | Download |
| :---: | :------: | :--------------: | :----: | :------: |
| [R-50-FPN](coco_retinanet_R_50_FPN_1x.yml) | 1x | 23.3 | 37.4 | [model](https://dragon.seetatech.com/download/seetadet/retinanet/coco_retinanet_R_50_FPN_1x/model_01a4d35f.pkl) &#124; [log]() |
| [R-50-FPN](coco_retinanet_R_50_FPN_2x.yml) | 2x | 23.3 | 39.0 | [model](https://dragon.seetatech.com/download/seetadet/retinanet/coco_retinanet_R_50_FPN_2x/model_7e81f3ad.pkl) &#124; [log]() |
NUM_GPUS: 8
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: 'retinanet'
BACKBONE: 'resnet50.fpn'
CLASSES: ['__background__',
'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
'fire hydrant', 'stop sign', 'parking meter', 'bench',
'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant',
'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife',
'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli',
'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop',
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
FPN:
RPN_MIN_LEVEL: 3
RPN_MAX_LEVEL: 7
SOLVER:
BASE_LR: 0.01
DECAY_STEPS: [90000, 120000]
MAX_STEPS: 135000
SNAPSHOT_EVERY: 2500
SNAPSHOT_PREFIX: 'coco_retinanet_R-50-FPN_416_6x'
PIPELINE:
TYPE: 'ssd'
TRAIN:
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/coco_2017_train'
IMS_PER_BATCH: 8
SCALES: [416]
USE_DIFF: False # Do not use crowd objects
TEST:
DATASET: '/data/coco_2017_val'
JSON_FILE: '/data/instances_val2017.json'
PROTOCOL: 'coco'
IMS_PER_BATCH: 1
SCALES: [416]
NMS: 0.5
NUM_GPUS: 8
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: 'retinanet'
BACKBONE: 'resnet50.fpn'
CLASSES: ['__background__',
'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
'fire hydrant', 'stop sign', 'parking meter', 'bench',
'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant',
'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife',
'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli',
'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop',
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
FPN:
RPN_MIN_LEVEL: 3
RPN_MAX_LEVEL: 7
SOLVER:
BASE_LR: 0.01
DECAY_STEPS: [90000, 120000]
MAX_STEPS: 135000
SNAPSHOT_EVERY: 2500
SNAPSHOT_PREFIX: 'coco_retinanet_R-50-FPN_512_6x'
PIPELINE:
TYPE: 'ssd'
TRAIN:
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/coco_2017_train'
IMS_PER_BATCH: 8
SCALES: [512]
USE_DIFF: False # Do not use crowd objects
TEST:
DATASET: '/data/coco_2017_val'
JSON_FILE: '/data/instances_val2017.json'
PROTOCOL: 'coco'
IMS_PER_BATCH: 1
SCALES: [512]
NMS: 0.5
NUM_GPUS: 8
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: 'retinanet'
BACKBONE: 'resnet50.fpn'
PRECISION: 'float16'
CLASSES: ['__background__',
'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
......@@ -19,27 +17,25 @@ MODEL:
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
FPN:
RPN_MIN_LEVEL: 3
RPN_MAX_LEVEL: 7
BACKBONE:
TYPE: 'resnet50.fpn'
SOLVER:
BASE_LR: 0.01
DECAY_STEPS: [60000, 80000]
MAX_STEPS: 90000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'coco_retinanet_R-50-FPN_800_1x'
SNAPSHOT_PREFIX: 'coco_retinanet_R-50-FPN_800'
TRAIN:
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/coco_2017_train'
WEIGHTS: '../data/pretrained/R-50_in1k_cls90e.pkl'
DATASET: '../data/datasets/coco_train2017'
IMS_PER_BATCH: 2
SCALES: [640, 672, 704, 736, 768, 800]
MAX_SIZE: 1333
USE_DIFF: False # Do not use crowd objects
TEST:
DATASET: '/data/coco_2017_val'
JSON_FILE: '/data/instances_val2017.json'
PROTOCOL: 'coco'
DATASET: '../data/datasets/coco_val2017'
JSON_DATASET: '../data/datasets/coco_instances_val2017.json'
EVALUATOR: 'coco'
IMS_PER_BATCH: 1
SCALES: [800]
MAX_SIZE: 1333
NMS: 0.5
NUM_GPUS: 8
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: 'retinanet'
BACKBONE: 'resnet50.fpn'
PRECISION: 'float16'
CLASSES: ['__background__',
'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
......@@ -19,27 +17,25 @@ MODEL:
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
FPN:
RPN_MIN_LEVEL: 3
RPN_MAX_LEVEL: 7
BACKBONE:
TYPE: 'resnet50.fpn'
SOLVER:
BASE_LR: 0.01
DECAY_STEPS: [120000, 160000]
MAX_STEPS: 180000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'coco_retinanet_R-50-FPN_800_2x'
SNAPSHOT_PREFIX: 'coco_retinanet_R-50-FPN_800'
TRAIN:
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/coco_2017_train'
WEIGHTS: '../data/pretrained/R-50_in1k_cls90e.pkl'
DATASET: '../data/datasets/coco_train2017'
IMS_PER_BATCH: 2
SCALES: [640, 672, 704, 736, 768, 800]
MAX_SIZE: 1333
USE_DIFF: False # Do not use crowd objects
TEST:
DATASET: '/data/coco_2017_val'
JSON_FILE: '/data/instances_val2017.json'
PROTOCOL: 'coco'
DATASET: '../data/datasets/coco_val2017'
JSON_DATASET: '../data/datasets/coco_instances_val2017.json'
EVALUATOR: 'coco'
IMS_PER_BATCH: 1
SCALES: [800]
MAX_SIZE: 1333
NMS: 0.5
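
The `PIXEL_MEANS` and `PIXEL_STDS` values that appear in these configs match the usual ImageNet RGB statistics scaled to the 0-255 range and listed in reverse order, which suggests BGR input. A sketch of per-pixel normalization under that assumption (how seetadet applies these values internally is not shown in this diff):

```python
PIXEL_MEANS = (103.53, 116.28, 123.675)
PIXEL_STDS = (57.375, 57.12, 58.395)

# Common ImageNet statistics in RGB order on a 0-1 scale.
IMAGENET_RGB_MEAN = (0.485, 0.456, 0.406)
IMAGENET_RGB_STD = (0.229, 0.224, 0.225)


def normalize_pixel(bgr):
    """Normalize one (B, G, R) pixel given in the 0-255 range."""
    return tuple((v - m) / s for v, m, s in zip(bgr, PIXEL_MEANS, PIXEL_STDS))


# Scaling the RGB statistics by 255 and reversing them reproduces the config values.
derived_means = tuple(255.0 * m for m in reversed(IMAGENET_RGB_MEAN))
derived_stds = tuple(255.0 * s for s in reversed(IMAGENET_RGB_STD))
print(derived_means, derived_stds)
print(normalize_pixel((128.0, 128.0, 128.0)))
```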
NUM_GPUS: 1
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: 'retinanet'
BACKBONE: 'resnet50.fpn'
CLASSES: ['__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
FPN:
RPN_MIN_LEVEL: 3
RPN_MAX_LEVEL: 7
RETINANET:
NUM_CONVS: 2
SOLVER:
BASE_LR: 0.01
DECAY_STEPS: [80000, 100000]
MAX_STEPS: 120000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'voc_retinanet_R-50-FPN_416'
PIPELINE:
TYPE: 'ssd'
TRAIN:
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/voc_0712_trainval'
IMS_PER_BATCH: 16
SCALES: [416]
RANDOM_SCALES: [0.25, 1.0]
USE_COLOR_JITTER: True
TEST:
DATASET: '/data/voc_2007_test'
PROTOCOL: 'voc2007'
IMS_PER_BATCH: 1
SCALES: [416]
NMS: 0.45
RETINANET_PRE_NMS_TOP_N: 1000
NUM_GPUS: 2
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: 'retinanet'
BACKBONE: 'resnet50.fpn'
CLASSES: ['__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
FPN:
RPN_MIN_LEVEL: 3
RPN_MAX_LEVEL: 7
RETINANET:
NUM_CONVS: 2
SOLVER:
BASE_LR: 0.01
DECAY_STEPS: [80000, 100000]
MAX_STEPS: 120000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'voc_retinanet_R-50-FPN_512'
PIPELINE:
TYPE: 'ssd'
TRAIN:
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/voc_0712_trainval'
IMS_PER_BATCH: 8
SCALES: [512]
RANDOM_SCALES: [0.25, 1.0]
USE_COLOR_JITTER: True
TEST:
DATASET: '/data/voc_2007_test'
PROTOCOL: 'voc2007'
IMS_PER_BATCH: 1
SCALES: [512]
NMS: 0.45
RETINANET_PRE_NMS_TOP_N: 1000
......@@ -12,7 +12,9 @@
## Pascal VOC Object Detection Baselines
| Model | Infer time (s/im) | AP@0.5 | Download |
| :---: | :---------------: | :----: | :------: |
| [VGG-16-300](voc_ssd_VGG-16_300.yml) | 0.012 | 78.3 | [model](https://dragon.seetatech.com/download/models/seetadet/ssd/voc_ssd_VGG-16_300/model_final.pkl) |
| [VGG-16-512](voc_ssd_VGG-16_512.yml) | 0.021 | 80.1 | [model](https://dragon.seetatech.com/download/models/seetadet/ssd/voc_ssd_VGG-16_512/model_final.pkl) |
| Model | Lr sched | Infer time (fps) | AP@0.5 | Download |
| :---: | :----: | :--------------: | :----: | :------: |
| [VGG-16-SSD300](voc_ssd300_VGG_16_120e.yml) | 120e | 100.0 | 78.3 | [model](https://dragon.seetatech.com/download/seetadet/ssd/voc_ssd300_VGG-16_120e/model_54664312.pkl) &#124; [log](https://dragon.seetatech.com/download/seetadet/ssd/voc_ssd300_VGG-16_120e/logs.json) |
| [VGG-16-SSD512](voc_ssd512_VGG_16_120e.yml) | 120e | 71.4 | 80.6 | [model](https://dragon.seetatech.com/download/seetadet/ssd/voc_ssd512_VGG-16_120e/model_e332ebfe.pkl) &#124; [log](https://dragon.seetatech.com/download/seetadet/ssd/voc_ssd512_VGG-16_120e/logs.json) |
| [MobileNetV2-SSDLite](voc_ssdlite_MobileNetV2_300e.yml) | 300e | 76.9 | 71.6 | [model](https://dragon.seetatech.com/download/seetadet/ssd/voc_ssdlite_MobileNetV2_300e/model_da31ebe7.pkl) &#124; [log](https://dragon.seetatech.com/download/seetadet/ssd/voc_ssdlite_MobileNetV2_300e/logs.json) |
| [MobileNetV3L-SSDLite](voc_ssdlite_MobileNetV3L_300e.yml) | 300e | 66.7 | 72.6 | [model](https://dragon.seetatech.com/download/seetadet/ssd/voc_ssdlite_MobileNetV3L_300e/model_43b33a97.pkl) &#124; [log](https://dragon.seetatech.com/download/seetadet/ssd/voc_ssdlite_MobileNetV3L_300e/logs.json) |
NUM_GPUS: 1
PIXEL_STDS: [1.0, 1.0, 1.0]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: 'ssd'
BACKBONE: 'vgg16_reduced_300'
COARSEST_STRIDE: 0
PRECISION: 'float16'
CLASSES: ['__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
SSD:
BACKBONE:
TYPE: 'vgg16_fcn.ssd300'
NORM: ''
COARSEST_STRIDE: 300
FPN:
ACTIVATION: 'ReLU'
ANCHOR_GENERATOR:
STRIDES: [8, 16, 32, 64, 100, 300]
ANCHOR_SIZES: [[30, 60],
[60, 110],
[110, 162],
[162, 213],
[213, 264],
[264, 315]]
SIZES: [[30, 60], [60, 110],[110, 162],
[162, 213], [213, 264], [264, 315]]
ASPECT_RATIOS: [[1, 2, 0.5],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
......@@ -31,18 +30,21 @@ SOLVER:
DECAY_STEPS: [80000, 100000]
MAX_STEPS: 120000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'voc_ssd_VGG-16_300'
SNAPSHOT_PREFIX: 'voc_ssd300_VGG_16'
AUG:
COLOR_JITTER: 0.5
TRAIN:
WEIGHTS: '/model/VGG16.SSD.pkl'
DATASET: '/data/voc_0712_trainval'
WEIGHTS: '../data/pretrained/VGG-16-FCN_in1k.pkl'
DATASET: '../data/datasets/voc_trainval0712'
IMS_PER_BATCH: 16
SCALES: [300]
RANDOM_SCALES: [0.25, 1.0]
USE_COLOR_JITTER: True
SCALES_RANGE: [0.25, 1.0]
LOADER: 'ssd_train'
TEST:
DATASET: '/data/voc_2007_test'
PROTOCOL: 'voc2007'
DATASET: '../data/datasets/voc_test2007'
JSON_DATASET: '../data/datasets/voc_test2007.json'
EVALUATOR: 'voc2007'
IMS_PER_BATCH: 1
SCALES: [300]
NMS: 0.45
NMS_THRESH: 0.45
SCORE_THRESH: 0.01
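
Each `SIZES` pair together with its `ASPECT_RATIOS` row determines the anchors placed at every cell of one feature level. The sketch below follows the prior-box convention of the original SSD paper: a square of the minimum size, a square of the geometric mean of the two sizes, and one box per non-unit aspect ratio. Whether seetadet's anchor generator uses exactly this rule is an assumption; the function name is hypothetical.

```python
import math


def ssd_cell_anchors(min_size, max_size, aspect_ratios):
    """Return (width, height) pairs for one cell, SSD-paper style."""
    anchors = []
    for ratio in aspect_ratios:
        if ratio == 1:
            # Two squares: the minimum size and the geometric mean of min and max.
            anchors.append((min_size, min_size))
            mean_size = math.sqrt(min_size * max_size)
            anchors.append((mean_size, mean_size))
        else:
            # Keep the area near min_size ** 2 while changing the shape.
            anchors.append((min_size * math.sqrt(ratio), min_size / math.sqrt(ratio)))
    return anchors


# First and second levels of the SSD300 config shown above.
print(len(ssd_cell_anchors(30, 60, [1, 2, 0.5])))
print(len(ssd_cell_anchors(60, 110, [1, 2, 0.5, 3, 0.33])))
```

Under this convention a level with ratios `[1, 2, 0.5]` has four anchors per cell and a level with `[1, 2, 0.5, 3, 0.33]` has six.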
NUM_GPUS: 2
PIXEL_STDS: [1.0, 1.0, 1.0]
PIXEL_MEANS: [103.53, 116.28, 123.675]
NUM_GPUS: 1
MODEL:
TYPE: 'ssd'
BACKBONE: 'vgg16_reduced_512'
PRECISION: 'float16'
CLASSES: ['__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
SSD:
BACKBONE:
TYPE: 'vgg16_fcn.ssd512'
NORM: ''
COARSEST_STRIDE: 512
FPN:
ACTIVATION: 'ReLU'
ANCHOR_GENERATOR:
STRIDES: [8, 16, 32, 64, 128, 256, 512]
ANCHOR_SIZES: [[35.84, 76.8],
[76.8, 153.6],
[153.6, 230.4],
[230.4, 307.2],
[307.2, 384.0],
[384.0, 460.8],
[460.8, 537.6]]
SIZES: [[35.84, 76.8],
[76.8, 153.6],
[153.6, 230.4],
[230.4, 307.2],
[307.2, 384.0],
[384.0, 460.8],
[460.8, 537.6]]
ASPECT_RATIOS: [[1, 2, 0.5],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
......@@ -32,18 +36,21 @@ SOLVER:
DECAY_STEPS: [80000, 100000]
MAX_STEPS: 120000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'voc_ssd_VGG-16_512'
SNAPSHOT_PREFIX: 'voc_ssd512_VGG_16'
AUG:
COLOR_JITTER: 0.5
TRAIN:
WEIGHTS: '/model/VGG16.SSD.pkl'
DATASET: '/data/voc_0712_trainval'
IMS_PER_BATCH: 8
WEIGHTS: '../data/pretrained/VGG-16-FCN_in1k.pkl'
DATASET: '../data/datasets/voc_trainval0712'
IMS_PER_BATCH: 16
SCALES: [512]
RANDOM_SCALES: [0.25, 1.0]
USE_COLOR_JITTER: True
SCALES_RANGE: [0.25, 1.0]
LOADER: 'ssd_train'
TEST:
DATASET: '/data/voc_2007_test'
PROTOCOL: 'voc2007'
DATASET: '../data/datasets/voc_test2007'
JSON_DATASET: '../data/datasets/voc_test2007.json'
EVALUATOR: 'voc2007'
IMS_PER_BATCH: 1
SCALES: [512]
NMS: 0.45
NMS_THRESH: 0.45
SCORE_THRESH: 0.01
NUM_GPUS: 1
MODEL:
TYPE: 'ssd'
PRECISION: 'float16'
CLASSES: ['__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
BACKBONE:
TYPE: 'mobilenet_v2.ssdlite'
NORM: 'BN'
FPN:
CONV: 'SepConv2d'
NORM: 'BN'
ACTIVATION: 'ReLU6'
ANCHOR_GENERATOR:
STRIDES: [16, 32, 64, 107, 160, 320]
SIZES: [[48, 100], [100, 150],[150, 202],
[202, 253], [253, 304], [304, 320]]
ASPECT_RATIOS: [[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33]]
SOLVER:
BASE_LR: 0.04
WEIGHT_DECAY: 0.00004
DECAY_STEPS: [50000, 62500]
MAX_STEPS: 75000
SNAPSHOT_EVERY: 1250
SNAPSHOT_PREFIX: 'voc_ssdlite_MobileNetV2'
AUG:
COLOR_JITTER: 0.5
TRAIN:
WEIGHTS: '../data/pretrained/MobileNetV2_in1k_cls300e.pkl'
DATASET: '../data/datasets/voc_trainval0712'
IMS_PER_BATCH: 64
SCALES: [320]
SCALES_RANGE: [0.25, 1.0]
LOADER: 'ssd_train'
NUM_WORKERS: 12
TEST:
DATASET: '../data/datasets/voc_test2007'
JSON_DATASET: '../data/datasets/voc_test2007.json'
EVALUATOR: 'voc2007'
IMS_PER_BATCH: 1
SCALES: [320]
NMS_THRESH: 0.45
SCORE_THRESH: 0.01
NUM_GPUS: 1
MODEL:
TYPE: 'ssd'
PRECISION: 'float16'
CLASSES: ['__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
BACKBONE:
TYPE: 'mobilenet_v3_large.ssdlite'
NORM: 'BN'
FPN:
CONV: 'SepConv2d'
NORM: 'BN'
ACTIVATION: 'ReLU'
ANCHOR_GENERATOR:
STRIDES: [16, 32, 64, 107, 160, 320]
SIZES: [[48, 100], [100, 150],[150, 202],
[202, 253], [253, 304], [304, 320]]
ASPECT_RATIOS: [[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33]]
SOLVER:
BASE_LR: 0.04
WEIGHT_DECAY: 0.00004
DECAY_STEPS: [50000, 62500]
MAX_STEPS: 75000
SNAPSHOT_EVERY: 1250
SNAPSHOT_PREFIX: 'voc_ssdlite_MobileNetV3L'
AUG:
COLOR_JITTER: 0.5
TRAIN:
WEIGHTS: '../data/pretrained/MobileNetV3L_in1k_cls600e.pkl'
DATASET: '../data/datasets/voc_trainval0712'
IMS_PER_BATCH: 64
SCALES: [320]
SCALES_RANGE: [0.25, 1.0]
LOADER: 'ssd_train'
NUM_WORKERS: 12
TEST:
DATASET: '../data/datasets/voc_test2007'
JSON_DATASET: '../data/datasets/voc_test2007.json'
EVALUATOR: 'voc2007'
IMS_PER_BATCH: 1
SCALES: [320]
NMS_THRESH: 0.45
SCORE_THRESH: 0.01
#include "nms_op.h"
#include "../utils/detection_utils.h"
#include "../operators/nms_op.h"
#include "../utils/detection.h"
namespace dragon {
template <class Context>
template <typename T>
void NonMaxSuppressionOp<Context>::DoRunWithType() {
int num_selected;
utils::detection::ApplyNMS(
Output(0)->count(),
Output(0)->count(),
auto &X = Input(0), *Y = Output(0);
CHECK(X.ndim() == 2 && X.dim(1) == 5)
<< "\nThe dimensions of boxes should be (num_boxes, 5).";
detection::ApplyNMS(
X.dim(0),
X.dim(0),
iou_threshold_,
Input(0).template mutable_data<T, Context>(),
Output(0)->template mutable_data<int64_t, CPUContext>(),
num_selected,
X.template mutable_data<T, Context>(),
out_indices_,
ctx());
Output(0)->Reshape({num_selected});
}
template <class Context>
void NonMaxSuppressionOp<Context>::RunOnDevice() {
CHECK(Input(0).ndim() == 2 && Input(0).dim(1) == 5)
<< "\nThe dimensions of boxes should be (num_boxes, 5).";
Output(0)->Reshape({Input(0).dim(0)});
DispatchHelper<TensorTypes<float>>::Call(this, Input(0));
Y->template CopyFrom<int64_t>(out_indices_);
}
DEPLOY_CPU_OPERATOR(NonMaxSuppression);
......
......@@ -10,8 +10,8 @@
* ------------------------------------------------------------
*/
#ifndef SEETADET_CXX_OPERATORS_NMS_OP_H_
#define SEETADET_CXX_OPERATORS_NMS_OP_H_
#ifndef DRAGON_EXTENSION_OPERATORS_NMS_OP_H_
#define DRAGON_EXTENSION_OPERATORS_NMS_OP_H_
#include "dragon/core/operator.h"
......@@ -25,15 +25,18 @@ class NonMaxSuppressionOp final : public Operator<Context> {
iou_threshold_(OP_SINGLE_ARG(float, "iou_threshold", 0.5f)) {}
USE_OPERATOR_FUNCTIONS;
void RunOnDevice() override;
void RunOnDevice() override {
DispatchHelper<dtypes::TypesBase<float>>::Call(this, Input(0));
}
template <typename T>
void DoRunWithType();
protected:
float iou_threshold_;
vector<int64_t> out_indices_;
};
} // namespace dragon
#endif // SEETADET_CXX_OPERATORS_NMS_OP_H_
#endif // DRAGON_EXTENSION_OPERATORS_NMS_OP_H_
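
For readers unfamiliar with what `NonMaxSuppressionOp` computes, here is a plain-Python sketch of greedy non-maximum suppression over a `(num_boxes, 5)` input. It assumes each row is `(x1, y1, x2, y2, score)` and sorts by score itself; the column layout, the sorting step and the exact overlap convention used by `detection::ApplyNMS` are assumptions, since the diff only shows the shape check and the `iou_threshold` argument.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2, ...) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: boxes[i][4], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


example_boxes = [
    (0.0, 0.0, 10.0, 10.0, 0.9),
    (1.0, 1.0, 11.0, 11.0, 0.8),
    (20.0, 20.0, 30.0, 30.0, 0.7),
]
print(nms(example_boxes, iou_threshold=0.5))
```

This quadratic version is only meant to illustrate the selection rule; the operator in the diff delegates the work to a device-specific kernel.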
#include <dragon/utils/math_functions.h>
#include "../utils/detection_utils.h"
#include "retinanet_decoder_op.h"
#include "../operators/retinanet_decoder_op.h"
#include "../utils/detection.h"
namespace dragon {
template <class Context>
template <typename T>
void RetinaNetDecoderOp<Context>::DoRunWithType() {
using BT = float; // DType of BBox
using BC = CPUContext; // Context of BBox
int total_proposals = 0;
auto num_images = Input(SCORES).dim(0);
auto num_anchors = Input(SCORES).dim(1);
auto num_classes = Input(SCORES).dim(2);
auto num_scores = num_anchors * num_classes;
auto num_cell_anchors = int64_t(ratios_.size() * scales_.size());
// Generate anchors.
CHECK_EQ(Input(GRID_INFO).dim(0), int64_t(strides_.size()))
<< "\nProvide " << Input(GRID_INFO).dim(0) << " grids for "
<< strides_.size() << " strides.";
cell_anchors_.resize(strides_.size());
vector<detection::GridArgs<int64_t>> grid_args(strides_.size());
for (int i = 0; i < strides_.size(); ++i) {
grid_args[i].stride = strides_[i];
auto& anchors = cell_anchors_[i];
if (int64_t(anchors.size()) == num_cell_anchors * 4) continue;
anchors.resize(num_cell_anchors * 4);
detection::GenerateAnchors(
strides_[i],
int64_t(ratios_.size()),
int64_t(scales_.size()),
ratios_.data(),
scales_.data(),
anchors.data());
}
// Set grid arguments.
auto* grid_info = Input(GRID_INFO).template data<int64_t, CPUContext>();
detection::SetGridArgs(num_anchors, num_cell_anchors, grid_info, grid_args);
auto* batch_scores = Input(SCORES).template data<T, Context>();
auto* batch_deltas = Input(DELTAS).template data<T, BC>();
auto* im_info = Input(IMAGE_INFO).template data<BT, BC>();
auto* all_proposals = Output(0)->template mutable_data<BT, BC>();
// Decode detections.
auto* scores = Input(SCORES).template data<T, Context>();
auto* deltas = Input(DELTAS).template data<T, CPUContext>();
auto* im_info = Input(IM_INFO).template data<float, CPUContext>();
auto* Y = Output(0)->Reshape({num_images * pre_nms_topn_, 7});
auto* dets = Y->template mutable_data<float, CPUContext>();
int64_t size_dets = 0;
for (int im_idx = 0; im_idx < num_images_; ++im_idx) {
BT im_h = im_info[0];
BT im_w = im_info[1];
BT im_scale_h = im_info[2];
BT im_scale_w = im_info[2];
if (Input(IMAGE_INFO).dim(1) == 4) im_scale_w = im_info[3];
CHECK_EQ(strides_.size(), InputSize() - 3)
<< "\nGiven " << strides_.size() << " strides "
<< "and " << InputSize() - 3 << " features";
// Select the top-k candidates as proposals
auto num_boxes = Input(SCORES).dim(1);
auto num_classes = Input(SCORES).dim(2);
utils::detection::SelectProposals(
Input(SCORES).count(1),
score_thr_,
batch_scores + im_idx * Input(SCORES).stride(0),
roi_scores_,
roi_indices_,
for (int batch_ind = 0; batch_ind < num_images; ++batch_ind) {
detection::ImageArgs<int64_t> im_args(im_info + batch_ind * 4);
im_args.batch_ind = batch_ind;
detection::SelectProposals(
num_scores,
pre_nms_topn_,
score_thresh_,
scores + batch_ind * num_scores,
scores_,
indices_,
ctx());
auto num_candidates = (int)roi_scores_.size();
auto num_proposals = std::min(num_candidates, (int)pre_nms_topn_);
utils::detection::ArgPartition(
num_candidates, num_proposals, true, roi_scores_.data(), indices_);
scores_.resize(indices_.size());
for (int i = 0; i < num_proposals; ++i) {
scores_[i] = roi_scores_[indices_[i]];
indices_[i] = roi_indices_[indices_[i]];
}
// Decode proposals via anchors
int stride_offset = 0;
for (int i = 0; i < strides_.size(); i++) {
auto feature_h = Input(i).dim(2);
auto feature_w = Input(i).dim(3);
auto K = feature_h * feature_w;
auto A = int(ratios_.size() * scales_.size());
anchors_.resize((size_t)(A * 4));
utils::detection::GenerateAnchors(
strides_[i],
(int)ratios_.size(),
(int)scales_.size(),
ratios_.data(),
scales_.data(),
anchors_.data());
utils::detection::GetShiftedAnchors(
num_proposals,
auto* offset_dets = dets + size_dets * 7;
auto num_dets = int64_t(indices_.size());
size_dets += num_dets;
for (int i = 0; i < strides_.size(); ++i) {
detection::GetAnchors(
num_dets,
num_cell_anchors,
num_classes,
A,
feature_h,
feature_w,
strides_[i],
stride_offset,
anchors_.data(),
grid_args[i],
cell_anchors_[i].data(),
indices_.data(),
all_proposals);
stride_offset += (A * K);
offset_dets);
}
utils::detection::GenerateDetections(
num_proposals,
num_boxes,
detection::DecodeDetections(
num_dets,
num_anchors,
num_classes,
im_idx,
im_h,
im_w,
im_scale_h,
im_scale_w,
im_args,
scores_.data(),
batch_deltas + im_idx * Input(DELTAS).stride(0),
deltas + batch_ind * Input(DELTAS).stride(0),
indices_.data(),
all_proposals);
total_proposals += num_proposals;
all_proposals += (num_proposals * 7);
im_info += Input(IMAGE_INFO).dim(1);
offset_dets);
}
Output(0)->Reshape({total_proposals, 7});
}
template <class Context>
void RetinaNetDecoderOp<Context>::RunOnDevice() {
num_images_ = Input(0).dim(0);
CHECK_EQ(Input(-1).dim(0), num_images_)
<< "\nExpected " << num_images_ << " groups info, got "
<< Input(-1).dim(0) << ".";
Output(0)->Reshape({num_images_ * pre_nms_topn_, 7});
DispatchHelper<TensorTypes<float>>::Call(this, Input(SCORES));
// Shrink to the correct dimensions.
Y->Reshape({size_dets, 7});
}
DEPLOY_CPU_OPERATOR(RetinaNetDecoder);
......@@ -109,7 +88,7 @@ DEPLOY_CPU_OPERATOR(RetinaNetDecoder);
DEPLOY_CUDA_OPERATOR(RetinaNetDecoder);
#endif
OPERATOR_SCHEMA(RetinaNetDecoder).NumInputs(3, INT_MAX).NumOutputs(1, INT_MAX);
OPERATOR_SCHEMA(RetinaNetDecoder).NumInputs(4).NumOutputs(1);
NO_GRADIENT(RetinaNetDecoder);
......
......@@ -10,8 +10,8 @@
* ------------------------------------------------------------
*/
#ifndef SEETADET_CXX_OPERATORS_RETINANET_DECODER_OP_H_
#define SEETADET_CXX_OPERATORS_RETINANET_DECODER_OP_H_
#ifndef DRAGON_EXTENSION_OPERATORS_RETINANET_DECODER_OP_H_
#define DRAGON_EXTENSION_OPERATORS_RETINANET_DECODER_OP_H_
#include "dragon/core/operator.h"
......@@ -26,24 +26,29 @@ class RetinaNetDecoderOp final : public Operator<Context> {
ratios_(OP_REPEATED_ARG(float, "ratios")),
scales_(OP_REPEATED_ARG(float, "scales")),
pre_nms_topn_(OP_SINGLE_ARG(int64_t, "pre_nms_top_n", 6000)),
score_thr_(OP_SINGLE_ARG(float, "score_thresh", 0.05f)) {}
score_thresh_(OP_SINGLE_ARG(float, "score_thresh", 0.05f)) {}
USE_OPERATOR_FUNCTIONS;
void RunOnDevice() override;
void RunOnDevice() override {
DispatchHelper<dtypes::TypesBase<float>>::Call(this, Input(SCORES));
}
template <typename T>
void DoRunWithType();
enum INPUT_TAGS { SCORES = -3, DELTAS = -2, IMAGE_INFO = -1 };
enum INPUT_TAGS { SCORES = 0, DELTAS = 1, IM_INFO = 2, GRID_INFO = 3 };
protected:
float score_thr_;
vec64_t strides_, indices_, roi_indices_;
vector<float> ratios_, scales_, anchors_;
vector<float> scores_, roi_scores_;
int64_t num_images_, pre_nms_topn_;
float score_thresh_;
vector<int64_t> strides_;
vector<float> ratios_, scales_;
int64_t pre_nms_topn_;
vector<float> scores_;
vector<int64_t> indices_;
vector<vector<float>> cell_anchors_;
};
} // namespace dragon
#endif // SEETADET_CXX_OPERATORS_RETINANET_DECODER_OP_H_
#endif // DRAGON_EXTENSION_OPERATORS_RETINANET_DECODER_OP_H_
#include <dragon/utils/math_functions.h>
#include "../utils/detection_utils.h"
#include "rpn_decoder_op.h"
#include "../operators/rpn_decoder_op.h"
#include "../utils/detection.h"
namespace dragon {
template <class Context>
template <typename T>
void RPNDecoderOp<Context>::DoRunWithType() {
using BT = float; // DType of BBox
using BC = CPUContext; // Context of BBox
int feat_h, feat_w, K, A;
int total_rois = 0, num_rois;
int num_candidates, num_proposals;
auto* batch_scores = Input(SCORES).template data<T, BC>();
auto* batch_deltas = Input(DELTAS).template data<T, BC>();
auto* im_info = Input(IMAGE_INFO).template data<BT, BC>();
auto* all_rois = Output(0)->template mutable_data<BT, BC>();
auto num_images = Input(SCORES).dim(0);
auto num_anchors = Input(SCORES).dim(1);
auto num_cell_anchors = int64_t(ratios_.size() * scales_.size());
// Generate anchors.
CHECK_EQ(Input(GRID_INFO).dim(0), int64_t(strides_.size()))
<< "\nProvide " << Input(GRID_INFO).dim(0) << " grids for "
<< strides_.size() << " strides.";
cell_anchors_.resize(strides_.size());
vector<detection::GridArgs<int64_t>> grid_args(strides_.size());
for (int i = 0; i < strides_.size(); ++i) {
grid_args[i].stride = strides_[i];
auto& anchors = cell_anchors_[i];
if (int64_t(anchors.size()) == num_cell_anchors * 4) continue;
anchors.resize(num_cell_anchors * 4);
detection::GenerateAnchors(
strides_[i],
int64_t(ratios_.size()),
int64_t(scales_.size()),
ratios_.data(),
scales_.data(),
anchors.data());
}
for (int im_idx = 0; im_idx < num_images_; ++im_idx) {
const BT im_h = im_info[0];
const BT im_w = im_info[1];
auto* scores = batch_scores + im_idx * Input(SCORES).stride(0);
auto* deltas = batch_deltas + im_idx * Input(DELTAS).stride(0);
CHECK_EQ(strides_.size(), InputSize() - 3)
<< "\nGiven " << strides_.size() << " strides "
<< "and " << InputSize() - 3 << " feature inputs";
CHECK_EQ(strides_.size(), scales_.size())
<< "\nGiven " << strides_.size() << " strides "
<< "and " << scales_.size() << " scales";
// Select the top-k candidates as proposals
num_candidates = Input(SCORES).dim(1);
num_proposals = std::min(num_candidates, (int)pre_nms_top_n_);
utils::math::ArgPartition(
num_candidates, num_proposals, true, scores, indices_);
// Decode the candidates
int stride_offset = 0;
proposals_.Reshape({num_proposals, 5});
auto* proposals = proposals_.template mutable_data<BT, BC>();
for (int i = 0; i < strides_.size(); i++) {
feat_h = Input(i).dim(2);
feat_w = Input(i).dim(3);
K = feat_h * feat_w;
A = (int)ratios_.size();
anchors_.resize((size_t)(A * 4));
utils::detection::GenerateAnchors(
strides_[i],
(int)ratios_.size(),
1,
ratios_.data(),
scales_.data(),
anchors_.data());
utils::detection::GetShiftedAnchors(
// Set grid arguments.
auto* grid_info = Input(GRID_INFO).template data<int64_t, CPUContext>();
detection::SetGridArgs(num_anchors, num_cell_anchors, grid_info, grid_args);
// Decode proposals.
auto* scores = Input(SCORES).template data<T, CPUContext>();
auto* deltas = Input(DELTAS).template data<T, CPUContext>();
auto* im_info = Input(IM_INFO).template data<float, CPUContext>();
auto* Y = Output("Y")->Reshape({num_images * pre_nms_topn_, 5});
auto* proposals = Y->template mutable_data<float, CPUContext>();
vector<int64_t> size_proposals({0});
for (int batch_ind = 0; batch_ind < num_images; ++batch_ind) {
detection::ImageArgs<int64_t> im_args(im_info + batch_ind * 4);
im_args.batch_ind = batch_ind;
detection::SelectProposals(
num_anchors,
pre_nms_topn_,
score_thresh_,
scores + batch_ind * num_anchors,
scores_,
indices_,
(CPUContext*)nullptr); // Faster.
auto* offset_proposals = proposals + size_proposals.back() * 5;
auto num_proposals = int64_t(indices_.size());
size_proposals.push_back(size_proposals.back() + num_proposals);
for (int i = 0; i < strides_.size(); ++i) {
detection::GetAnchors(
num_proposals,
A,
feat_h,
feat_w,
strides_[i],
stride_offset,
anchors_.data(),
num_cell_anchors,
grid_args[i],
cell_anchors_[i].data(),
indices_.data(),
proposals);
stride_offset += (A * K);
offset_proposals);
}
utils::detection::GenerateProposals(
num_candidates,
num_proposals,
im_h,
im_w,
scores,
deltas,
&indices_[0],
proposals);
// Sort, NMS and Retrieve
utils::detection::SortProposals(
0, num_proposals - 1, num_proposals, proposals);
utils::detection::ApplyNMS(
detection::DecodeProposals(
num_proposals,
post_nms_top_n_,
nms_thr_,
proposals_.template mutable_data<BT, Context>(),
roi_indices_.data(),
num_rois,
ctx());
utils::detection::RetrieveRoIs(
num_rois, im_idx, proposals, roi_indices_.data(), all_rois);
total_rois += num_rois;
all_rois += (num_rois * 5);
im_info += Input(IMAGE_INFO).dim(1);
num_anchors,
im_args,
scores_.data(),
deltas + batch_ind * Input(DELTAS).stride(0),
indices_.data(),
offset_proposals);
detection::SortBoxes<T, detection::Box5d<T>>(
num_proposals, offset_proposals);
}
Output(0)->Reshape({total_rois, 5});
// Distribute rois into K bins
if (OutputSize() > 1) {
CHECK_EQ(max_level_ - min_level_ + 1, OutputSize())
<< "\nExpected " << OutputSize() << " outputs for levels "
<< "between [" << min_level_ << ", " << max_level_ << "].";
vector<BT*> ys(OutputSize());
vector<vec64_t> bins(OutputSize());
Tensor RoIs;
RoIs.ReshapeLike(*Output(0));
auto* rois = RoIs.template mutable_data<BT, BC>();
ctx()->template Copy<BT, BC, BC>(
Output(0)->count(), rois, Output(0)->template data<BT, BC>());
utils::detection::CollectRoIs(
total_rois,
min_level_,
max_level_,
canonical_level_,
canonical_scale_,
rois,
bins);
for (int i = 0; i < OutputSize(); i++) {
Output(i)->Reshape({std::max((int)bins[i].size(), 1), 5});
ys[i] = Output(i)->template mutable_data<BT, BC>();
// Apply NMS.
auto* proposals_v2 = Y->template data<float, Context>();
int64_t size_rois = 0;
for (int batch_ind = 0; batch_ind < num_images; ++batch_ind) {
auto offset = size_proposals[batch_ind];
auto num_proposals = size_proposals[batch_ind + 1] - offset;
detection::ApplyNMS(
num_proposals,
post_nms_topn_,
nms_thresh_,
proposals_v2 + offset * 5,
nms_indices_,
ctx());
num_proposals = int64_t(nms_indices_.size());
for (int i = 0; i < num_proposals; ++i) {
scores_[size_rois] = batch_ind;
indices_[size_rois++] = nms_indices_[i] + offset;
}
utils::detection::DistributeRoIs(bins, rois, ys);
}
}
template <class Context>
void RPNDecoderOp<Context>::RunOnDevice() {
num_images_ = Input(0).dim(0);
CHECK_EQ(Input(IMAGE_INFO).dim(0), num_images_)
<< "\nExpected " << num_images_ << " groups info, got "
<< Input(IMAGE_INFO).dim(0) << ".";
roi_indices_.resize(post_nms_top_n_);
Output(0)->Reshape({num_images_ * post_nms_top_n_, 5});
DispatchHelper<TensorTypes<float>>::Call(this, Input(SCORES));
// Apply Histogram.
detection::ApplyHistogram(
size_rois,
min_level_,
max_level_,
canonical_level_,
canonical_scale_,
proposals,
scores_.data(),
indices_.data(),
output_rois_);
// Copy to outputs.
for (int i = 0; i < OutputSize(); ++i) {
const auto& rois = output_rois_[i];
vector<int64_t> dims({int64_t(rois.size()) / 5, 5});
auto* Yi = Output(i)->Reshape(dims);
std::memcpy(
Yi->template mutable_data<T, CPUContext>(),
rois.data(),
sizeof(T) * rois.size());
}
}
DEPLOY_CPU_OPERATOR(RPNDecoder);
......@@ -143,7 +126,7 @@ DEPLOY_CPU_OPERATOR(RPNDecoder);
DEPLOY_CUDA_OPERATOR(RPNDecoder);
#endif
OPERATOR_SCHEMA(RPNDecoder).NumInputs(3, INT_MAX).NumOutputs(1, INT_MAX);
OPERATOR_SCHEMA(RPNDecoder).NumInputs(4).NumOutputs(1, INT_MAX);
NO_GRADIENT(RPNDecoder);
......
......@@ -10,8 +10,8 @@
* ------------------------------------------------------------
*/
#ifndef SEETADET_CXX_OPERATORS_RPN_DECODER_OP_H_
#define SEETADET_CXX_OPERATORS_RPN_DECODER_OP_H_
#ifndef DRAGON_EXTENSION_OPERATORS_RPN_DECODER_OP_H_
#define DRAGON_EXTENSION_OPERATORS_RPN_DECODER_OP_H_
#include "dragon/core/operator.h"
......@@ -25,32 +25,39 @@ class RPNDecoderOp final : public Operator<Context> {
strides_(OP_REPEATED_ARG(int64_t, "strides")),
ratios_(OP_REPEATED_ARG(float, "ratios")),
scales_(OP_REPEATED_ARG(float, "scales")),
pre_nms_top_n_(OP_SINGLE_ARG(int64_t, "pre_nms_top_n", 6000)),
post_nms_top_n_(OP_SINGLE_ARG(int64_t, "post_nms_top_n", 1000)),
nms_thr_(OP_SINGLE_ARG(float, "nms_thresh", 0.7f)),
pre_nms_topn_(OP_SINGLE_ARG(int64_t, "pre_nms_top_n", 6000)),
post_nms_topn_(OP_SINGLE_ARG(int64_t, "post_nms_top_n", 1000)),
nms_thresh_(OP_SINGLE_ARG(float, "nms_thresh", 0.7f)),
score_thresh_(OP_SINGLE_ARG(float, "score_thresh", 0.f)),
min_level_(OP_SINGLE_ARG(int64_t, "min_level", 2)),
max_level_(OP_SINGLE_ARG(int64_t, "max_level", 5)),
canonical_level_(OP_SINGLE_ARG(int64_t, "canonical_level", 4)),
canonical_scale_(OP_SINGLE_ARG(int64_t, "canonical_scale", 224)) {}
USE_OPERATOR_FUNCTIONS;
void RunOnDevice() override;
void RunOnDevice() override {
DispatchHelper<dtypes::TypesBase<float>>::Call(this, Input(SCORES));
}
template <typename T>
void DoRunWithType();
enum INPUT_TAGS { SCORES = -3, DELTAS = -2, IMAGE_INFO = -1 };
enum INPUT_TAGS { SCORES = 0, DELTAS = 1, IM_INFO = 2, GRID_INFO = 3 };
protected:
float nms_thr_;
vec64_t strides_, indices_, roi_indices_;
vector<float> ratios_, scales_, scores_, anchors_;
int64_t pre_nms_top_n_, post_nms_top_n_;
int64_t num_images_, min_level_, max_level_;
float nms_thresh_, score_thresh_;
vector<int64_t> strides_;
vector<float> ratios_, scales_;
int64_t min_level_, max_level_;
int64_t pre_nms_topn_, post_nms_topn_;
int64_t canonical_level_, canonical_scale_;
Tensor proposals_;
vector<float> scores_;
vector<int64_t> indices_, nms_indices_;
vector<vector<float>> cell_anchors_;
vector<vector<float>> output_rois_;
};
} // namespace dragon
#endif // SEETADET_CXX_OPERATORS_RPN_DECODER_OP_H_
#endif // DRAGON_EXTENSION_OPERATORS_RPN_DECODER_OP_H_
......@@ -8,7 +8,7 @@
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Build cxx sources."""
"""Build cpp extensions."""
from __future__ import absolute_import
from __future__ import division
......@@ -16,7 +16,7 @@ from __future__ import print_function
import glob
from dragon.tools import cpp_extension
from dragon.utils import cpp_extension
from setuptools import setup
Extension = cpp_extension.CppExtension
......@@ -32,23 +32,18 @@ def find_sources(*dirs):
sources = []
for path in dirs:
for ext_suffix in ext_suffixes:
sources += glob.glob(
path + '/*' + ext_suffix,
recursive=True,
)
sources += glob.glob(path + '/*' + ext_suffix, recursive=True)
return sources
ext_modules = [
Extension(
name='install.lib.modules._C',
name='seetadet.ops._C',
sources=find_sources('**'),
define_macros=[('THRUST_IGNORE_CUB_VERSION_CHECK', None)],
),
]
setup(
name='SeetaDet',
ext_modules=ext_modules,
cmdclass={'build_ext': cpp_extension.BuildExtension},
)
setup(name='seetadet',
ext_modules=ext_modules,
cmdclass={'build_ext': cpp_extension.BuildExtension})
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_H_
#include "../utils/detection/anchors.h"
#include "../utils/detection/bbox.h"
#include "../utils/detection/nms.h"
#include "../utils/detection/proposals.h"
#include "../utils/detection/types.h"
#endif // DRAGON_EXTENSION_UTILS_DETECTION_H_
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_ANCHORS_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_ANCHORS_H_
#include "../../utils/detection/types.h"
namespace dragon {
namespace detection {
/*!
* Anchor Functions.
*/
template <typename IndexT>
inline void SetGridArgs(
const int num_anchors,
const int num_cell_anchors,
const IndexT* grid_info,
vector<GridArgs<IndexT>>& grid_args) {
IndexT grid_offset = 0;
for (int i = 0; i < grid_args.size(); ++i, grid_info += 2) {
auto& args = grid_args[i];
args.h = grid_info[0];
args.w = grid_info[1];
args.offset = grid_offset;
grid_offset += num_cell_anchors * args.h * args.w;
}
std::stringstream ss;
if (grid_offset != num_anchors) {
ss << "Mismatched number of anchors. (Expected ";
ss << num_anchors << ", Got " << grid_offset << ")";
for (int i = 0; i < grid_args.size(); ++i) {
ss << "\nGrid #" << i << ": "
<< "A=" << num_cell_anchors << ", H=" << grid_args[i].h
<< ", W=" << grid_args[i].w << "\n";
}
}
if (!ss.str().empty()) LOG(FATAL) << ss.str();
}
template <typename T>
inline void GenerateAnchors(
const int stride,
const int num_ratios,
const int num_scales,
const T* ratios,
const T* scales,
T* anchors) {
T* offset_anchors = anchors;
const T area = T(stride * stride);
const T ctr = T(0.5) * T(stride - 1);
for (int i = 0; i < num_ratios; ++i) {
const T ratio_w = std::round(std::sqrt(area / ratios[i]));
const T ratio_h = std::round(ratio_w * ratios[i]);
for (int j = 0; j < num_scales; ++j) {
const T w_half = T(0.5) * (ratio_w * scales[j] - T(1));
const T h_half = T(0.5) * (ratio_h * scales[j] - T(1));
offset_anchors[0] = ctr - w_half;
offset_anchors[1] = ctr - h_half;
offset_anchors[2] = ctr + w_half;
offset_anchors[3] = ctr + h_half;
offset_anchors += 4;
}
}
}
template <typename T>
inline void GetAnchors(
const int num_anchors,
const int num_cell_anchors,
const GridArgs<int64_t>& args,
const T* cell_anchors,
const int64_t* indices,
T* anchors) {
const int64_t index_min = args.offset;
const int64_t index_max = num_cell_anchors * args.h * args.w;
for (int i = 0; i < num_anchors; ++i) {
auto index = indices[i] - index_min;
if (index >= 0 && index < index_max) {
const auto w = index % args.w;
index /= args.w;
const auto h = index % args.h;
index /= args.h;
const auto shift_x = T(w * args.stride);
const auto shift_y = T(h * args.stride);
auto* offset_anchors = anchors + i * 5;
const auto* offset_cell_anchors = cell_anchors + index * 4;
offset_anchors[0] = shift_x + offset_cell_anchors[0];
offset_anchors[1] = shift_y + offset_cell_anchors[1];
offset_anchors[2] = shift_x + offset_cell_anchors[2];
offset_anchors[3] = shift_y + offset_cell_anchors[3];
}
}
}
template <typename T>
inline void GetAnchors(
const int num_anchors,
const int num_cell_anchors,
const int num_classes,
const GridArgs<int64_t>& args,
const T* cell_anchors,
const int64_t* indices,
T* anchors) {
const int64_t index_min = num_classes * args.offset;
const int64_t index_max = num_classes * (num_cell_anchors * args.h * args.w);
for (int i = 0; i < num_anchors; ++i) {
auto index = indices[i] - index_min;
if (index >= 0 && index < index_max) {
index /= num_classes;
const auto w = index % args.w;
index /= args.w;
const auto h = index % args.h;
index /= args.h;
const auto shift_x = T(w * args.stride);
const auto shift_y = T(h * args.stride);
auto* offset_anchors = anchors + i * 7 + 1;
const auto* offset_cell_anchors = cell_anchors + index * 4;
offset_anchors[0] = shift_x + offset_cell_anchors[0];
offset_anchors[1] = shift_y + offset_cell_anchors[1];
offset_anchors[2] = shift_x + offset_cell_anchors[2];
offset_anchors[3] = shift_y + offset_cell_anchors[3];
}
}
}
} // namespace detection
} // namespace dragon
#endif // DRAGON_EXTENSION_UTILS_DETECTION_ANCHORS_H_
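The anchor helpers in this header can be illustrated with a minimal standalone sketch. It assumes the same arithmetic as `GenerateAnchors` and the same (cell anchor, h, w) index layout as the first `GetAnchors` overload. The names `Grid`, `MakeCellAnchors`, and `ShiftedAnchor` are hypothetical and not part of the extension.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the per-level grid description.
struct Grid {
  int64_t h, w, stride, offset;
};

// Builds the (x1, y1, x2, y2) cell anchors for one stride,
// following the same rounding steps as GenerateAnchors.
std::vector<float> MakeCellAnchors(
    int stride,
    const std::vector<float>& ratios,
    const std::vector<float>& scales) {
  std::vector<float> anchors;
  const float area = float(stride * stride);
  const float ctr = 0.5f * float(stride - 1);
  for (float ratio : ratios) {
    const float ratio_w = std::round(std::sqrt(area / ratio));
    const float ratio_h = std::round(ratio_w * ratio);
    for (float scale : scales) {
      const float w_half = 0.5f * (ratio_w * scale - 1.f);
      const float h_half = 0.5f * (ratio_h * scale - 1.f);
      anchors.insert(
          anchors.end(),
          {ctr - w_half, ctr - h_half, ctr + w_half, ctr + h_half});
    }
  }
  return anchors;
}

// Decodes a flat anchor index into (cell anchor, h, w) and
// returns the cell anchor shifted to that grid position.
std::vector<float> ShiftedAnchor(
    int64_t index,
    const Grid& grid,
    const std::vector<float>& cell_anchors) {
  index -= grid.offset;
  const int64_t num_cell_anchors = int64_t(cell_anchors.size() / 4);
  assert(index >= 0 && index < num_cell_anchors * grid.h * grid.w);
  const int64_t w = index % grid.w;
  index /= grid.w;
  const int64_t h = index % grid.h;
  index /= grid.h; // What remains is the cell anchor id.
  const float shift_x = float(w * grid.stride);
  const float shift_y = float(h * grid.stride);
  const float* a = cell_anchors.data() + index * 4;
  return {shift_x + a[0], shift_y + a[1], shift_x + a[2], shift_y + a[3]};
}
```

For a stride of 8 with a single ratio of 1 and a single scale of 8, the cell anchor is centered at 3.5 with a half extent of 31.5. Shifting it to column 1, row 1 of a grid adds one stride to every coordinate.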
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_BBOX_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_BBOX_H_
#include "../../utils/detection/types.h"
#if defined(__CUDACC__)
#define HOSTDEVICE_DECL inline __host__ __device__
#else
#define HOSTDEVICE_DECL inline
#endif
namespace dragon {
namespace detection {
/*
* BBox Functions.
*/
template <typename T, class BoxT>
inline void SortBoxes(const int N, T* data, bool descend = true) {
auto* boxes = reinterpret_cast<BoxT*>(data);
std::sort(boxes, boxes + N, [descend](BoxT lhs, BoxT rhs) {
return descend ? (lhs.score > rhs.score) : (lhs.score < rhs.score);
});
}
/*
* BBox Utilities.
*/
namespace utils {
template <typename T>
HOSTDEVICE_DECL bool CheckIoU(const T thresh, const T* a, const T* b) {
#if defined(__CUDACC__)
const T x1 = max(a[0], b[0]);
const T y1 = max(a[1], b[1]);
const T x2 = min(a[2], b[2]);
const T y2 = min(a[3], b[3]);
const T width = max(T(0), x2 - x1 + T(1));
const T height = max(T(0), y2 - y1 + T(1));
#else
const T x1 = std::max(a[0], b[0]);
const T y1 = std::max(a[1], b[1]);
const T x2 = std::min(a[2], b[2]);
const T y2 = std::min(a[3], b[3]);
const T width = std::max(T(0), x2 - x1 + T(1));
const T height = std::max(T(0), y2 - y1 + T(1));
#endif
const T inter = width * height;
const T Sa = (a[2] - a[0] + T(1)) * (a[3] - a[1] + T(1));
const T Sb = (b[2] - b[0] + T(1)) * (b[3] - b[1] + T(1));
return inter > thresh * (Sa + Sb - inter);
}
template <typename T>
inline void BBoxTransform(
const T dx,
const T dy,
const T dw,
const T dh,
const T im_w,
const T im_h,
const T im_scale_h,
const T im_scale_w,
T* bbox) {
const T w = bbox[2] - bbox[0] + 1;
const T h = bbox[3] - bbox[1] + 1;
const T ctr_x = bbox[0] + T(0.5) * w;
const T ctr_y = bbox[1] + T(0.5) * h;
const T pred_ctr_x = dx * w + ctr_x;
const T pred_ctr_y = dy * h + ctr_y;
const T pred_w = std::exp(dw) * w;
const T pred_h = std::exp(dh) * h;
const T x1 = pred_ctr_x - T(0.5) * pred_w;
const T y1 = pred_ctr_y - T(0.5) * pred_h;
const T x2 = pred_ctr_x + T(0.5) * pred_w;
const T y2 = pred_ctr_y + T(0.5) * pred_h;
bbox[0] = std::max(T(0), std::min(x1, im_w - T(1))) / im_scale_w;
bbox[1] = std::max(T(0), std::min(y1, im_h - T(1))) / im_scale_h;
bbox[2] = std::max(T(0), std::min(x2, im_w - T(1))) / im_scale_w;
bbox[3] = std::max(T(0), std::min(y2, im_h - T(1))) / im_scale_h;
}
template <typename T>
inline int GetBBoxLevel(
const int lvl_min,
const int lvl_max,
const int lvl0,
const int s0,
T* bbox) {
const T w = bbox[2] - bbox[0] + 1;
const T h = bbox[3] - bbox[1] + 1;
const T s = std::sqrt(w * h);
const int lvl = lvl0 + std::log2(s / s0 + T(1e-6));
return std::min(std::max(lvl, lvl_min), lvl_max);
}
} // namespace utils
} // namespace detection
} // namespace dragon
#undef HOSTDEVICE_DECL
#endif // DRAGON_EXTENSION_UTILS_DETECTION_BBOX_H_
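The two box utilities above can be exercised in isolation. The following is a minimal sketch assuming float boxes stored as (x1, y1, x2, y2) with the inclusive-pixel (+1) convention used by `CheckIoU` and `GetBBoxLevel`. The names `OverlapsMoreThan` and `BoxLevel` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Returns true when IoU(a, b) > thresh, without dividing:
// inter > thresh * union is the same comparison.
inline bool OverlapsMoreThan(float thresh, const float* a, const float* b) {
  const float x1 = std::max(a[0], b[0]);
  const float y1 = std::max(a[1], b[1]);
  const float x2 = std::min(a[2], b[2]);
  const float y2 = std::min(a[3], b[3]);
  const float width = std::max(0.f, x2 - x1 + 1.f);
  const float height = std::max(0.f, y2 - y1 + 1.f);
  const float inter = width * height;
  const float area_a = (a[2] - a[0] + 1.f) * (a[3] - a[1] + 1.f);
  const float area_b = (b[2] - b[0] + 1.f) * (b[3] - b[1] + 1.f);
  return inter > thresh * (area_a + area_b - inter);
}

// Maps a box to a pyramid level: lvl0 + log2(sqrt(w * h) / s0),
// truncated to an integer and clamped to [lvl_min, lvl_max].
inline int BoxLevel(
    int lvl_min, int lvl_max, int lvl0, int s0, const float* box) {
  const float w = box[2] - box[0] + 1.f;
  const float h = box[3] - box[1] + 1.f;
  const float s = std::sqrt(w * h);
  const int lvl = int(lvl0 + std::log2(s / s0 + 1e-6f));
  return std::min(std::max(lvl, lvl_min), lvl_max);
}
```

With the defaults shown in the RPN decoder (levels 2 to 5, canonical level 4, canonical scale 224), a 224x224 box stays on level 4, a 112x112 box drops to level 3, and very small or very large boxes are clamped to the ends of the range.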
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_ITERATOR_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_ITERATOR_H_
#include <dragon/core/common.h>
namespace dragon {
namespace detection {
template <typename MapT>
class KeyValueMapIterator
: public std::iterator<std::input_iterator_tag, MapT> {
public:
typedef KeyValueMapIterator self_type;
typedef ptrdiff_t difference_type;
typedef MapT value_type;
typedef MapT& reference;
KeyValueMapIterator(
typename MapT::key_type* key_ptr,
typename MapT::value_type* value_ptr)
: key_ptr_(key_ptr), value_ptr_(value_ptr) {}
self_type operator++(int) {
self_type ret = *this;
key_ptr_++;
value_ptr_++;
return ret;
}
self_type operator++() {
key_ptr_++;
value_ptr_++;
return *this;
}
self_type operator--() {
key_ptr_--;
value_ptr_--;
return *this;
}
self_type operator--(int) {
self_type ret = *this;
key_ptr_--;
value_ptr_--;
return ret;
}
reference operator*() const {
if (map_.key_ptr != key_ptr_) {
map_.key_ptr = key_ptr_;
map_.value_ptr = value_ptr_;
}
return map_;
}
self_type operator+(difference_type n) const {
return self_type(key_ptr_ + n, value_ptr_ + n);
}
self_type& operator+=(difference_type n) {
key_ptr_ += n;
value_ptr_ += n;
return *this;
}
self_type operator-(difference_type n) const {
return self_type(key_ptr_ - n, value_ptr_ - n);
}
self_type& operator-=(difference_type n) {
key_ptr_ -= n;
value_ptr_ -= n;
return *this;
}
difference_type operator-(self_type other) const {
return key_ptr_ - other.key_ptr_;
}
bool operator<(const self_type& rhs) const {
return key_ptr_ < rhs.key_ptr_;
}
bool operator<=(const self_type& rhs) const {
return key_ptr_ <= rhs.key_ptr_;
}
bool operator==(const self_type& rhs) const {
return key_ptr_ == rhs.key_ptr_;
}
bool operator!=(const self_type& rhs) const {
return key_ptr_ != rhs.key_ptr_;
}
private:
mutable MapT map_;
typename MapT::key_type* key_ptr_;
typename MapT::value_type* value_ptr_;
};
} // namespace detection
} // namespace dragon
#endif // DRAGON_EXTENSION_UTILS_DETECTION_ITERATOR_H_
#include <dragon/core/context.h>
#include "../../utils/detection/bbox.h"
#include "../../utils/detection/nms.h"
namespace dragon {
namespace detection {
template <>
void ApplyNMS<float, CPUContext>(
const int N,
const int K,
const float thresh,
const float* boxes,
vector<int64_t>& indices,
CPUContext* ctx) {
int num_selected = 0;
indices.resize(K);
vector<char> is_dead(N, 0);
for (int i = 0; i < N; ++i) {
if (is_dead[i]) continue;
indices[num_selected++] = i;
if (num_selected >= K) break;
for (int j = i + 1; j < N; ++j) {
if (is_dead[j]) continue;
if (!utils::CheckIoU(thresh, &boxes[i * 5], &boxes[j * 5])) continue;
is_dead[j] = 1;
}
}
indices.resize(num_selected);
}
} // namespace detection
} // namespace dragon
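The CPU specialization above is a greedy suppression loop. Below is a minimal standalone sketch of the same idea, assuming rows of (x1, y1, x2, y2, score) already sorted by descending score. The names `IoU` and `GreedyNMS` are hypothetical, and the sketch compares an explicit IoU ratio instead of calling `utils::CheckIoU`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// IoU of two (x1, y1, x2, y2) boxes with the inclusive-pixel (+1) convention.
inline float IoU(const float* a, const float* b) {
  const float w =
      std::max(0.f, std::min(a[2], b[2]) - std::max(a[0], b[0]) + 1.f);
  const float h =
      std::max(0.f, std::min(a[3], b[3]) - std::max(a[1], b[1]) + 1.f);
  const float inter = w * h;
  const float area_a = (a[2] - a[0] + 1.f) * (a[3] - a[1] + 1.f);
  const float area_b = (b[2] - b[0] + 1.f) * (b[3] - b[1] + 1.f);
  return inter / (area_a + area_b - inter);
}

// Keeps at most max_keep boxes; each kept box suppresses every
// later box whose IoU with it exceeds thresh.
std::vector<int64_t> GreedyNMS(
    const std::vector<float>& boxes, int max_keep, float thresh) {
  const int n = int(boxes.size() / 5);
  std::vector<int64_t> keep;
  std::vector<char> suppressed(n, 0);
  for (int i = 0; i < n && int(keep.size()) < max_keep; ++i) {
    if (suppressed[i]) continue;
    keep.push_back(i);
    for (int j = i + 1; j < n; ++j) {
      if (suppressed[j]) continue;
      if (IoU(&boxes[i * 5], &boxes[j * 5]) > thresh) suppressed[j] = 1;
    }
  }
  return keep;
}
```

Because the input is sorted by score, the first surviving box in each overlapping cluster is always the highest scoring one, which is why a single forward pass is enough.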
#include <dragon/core/context_cuda.h>
#include <dragon/core/workspace.h>
#include "../../utils/detection/bbox.h"
#include "../../utils/detection/nms.h"
#include "../../utils/detection/utils.h"
namespace dragon {
namespace detection {
namespace {
#define NUM_THREADS 64
template <typename T>
__global__ void _NonMaxSuppression(
const int N,
const T thresh,
const T* boxes,
uint64_t* mask) {
const int row_start = blockIdx.y;
const int col_start = blockIdx.x;
if (row_start > col_start) return;
const int row_size = min(N - row_start * NUM_THREADS, NUM_THREADS);
const int col_size = min(N - col_start * NUM_THREADS, NUM_THREADS);
__shared__ T block_boxes[NUM_THREADS * 4];
if (threadIdx.x < col_size) {
auto* offset_block_boxes = block_boxes + threadIdx.x * 4;
auto* offset_boxes = boxes + (col_start * NUM_THREADS + threadIdx.x) * 5;
#pragma unroll
for (int i = 0; i < 4; ++i) {
*(offset_block_boxes++) = *(offset_boxes++);
}
}
__syncthreads();
if (threadIdx.x < row_size) {
const int index = row_start * NUM_THREADS + threadIdx.x;
const T* offset_boxes = boxes + index * 5;
unsigned long long val = 0;
const int start = (row_start == col_start) ? (threadIdx.x + 1) : 0;
for (int i = start; i < col_size; ++i) {
if (utils::CheckIoU(thresh, offset_boxes, block_boxes + i * 4)) {
val |= 1ULL << i;
}
}
mask[index * gridDim.x + col_start] = val;
}
}
} // namespace
template <>
void ApplyNMS<float, CUDAContext>(
const int N,
const int K,
const float thresh,
const float* boxes,
vector<int64_t>& indices,
CUDAContext* ctx) {
const auto num_blocks = utils::DivUp(N, NUM_THREADS);
vector<uint64_t> mask_host(N * num_blocks);
auto* mask_dev = (uint64_t*)ctx->workspace()->data<CUDAContext>(
mask_host.size() * sizeof(uint64_t), "BufferKernel");
_NonMaxSuppression<<<
dim3(num_blocks, num_blocks),
NUM_THREADS,
0,
ctx->cuda_stream()>>>(N, thresh, boxes, mask_dev);
CUDA_CHECK(cudaMemcpyAsync(
mask_host.data(),
mask_dev,
mask_host.size() * sizeof(uint64_t),
cudaMemcpyDeviceToHost,
ctx->cuda_stream()));
ctx->FinishDeviceComputation();
vector<uint64_t> is_dead(num_blocks);
memset(&is_dead[0], 0, sizeof(uint64_t) * num_blocks);
int num_selected = 0;
indices.resize(K);
for (int i = 0; i < N; ++i) {
const int nblock = i / NUM_THREADS;
const int inblock = i % NUM_THREADS;
if (!(is_dead[nblock] & (1ULL << inblock))) {
indices[num_selected++] = i;
if (num_selected >= K) break;
auto* offset_mask = &mask_host[0] + i * num_blocks;
for (int j = nblock; j < num_blocks; ++j) {
is_dead[j] |= offset_mask[j];
}
}
}
indices.resize(num_selected);
}
} // namespace detection
} // namespace dragon
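The host-side half of the CUDA specialization only needs the bitmask layout, so it can be sketched in plain C++. This sketch assumes the mask has already been filled, with bit `b` of word `mask[i * num_blocks + j]` set when box `i` suppresses box `j * 64 + b`. The names `kBlock` and `SweepMask` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr int kBlock = 64; // Boxes covered by one 64-bit mask word.

// Sweeps the boxes in score order and accumulates suppression bits.
// The inner loop starts at the block containing box i, so words for
// earlier blocks (which the kernel never writes) are never read.
std::vector<int64_t> SweepMask(
    int n, int max_keep, const std::vector<uint64_t>& mask) {
  const int num_blocks = (n + kBlock - 1) / kBlock;
  assert(int64_t(mask.size()) == int64_t(n) * num_blocks);
  std::vector<uint64_t> dead(num_blocks, 0);
  std::vector<int64_t> keep;
  for (int i = 0; i < n && int(keep.size()) < max_keep; ++i) {
    if (dead[i / kBlock] & (1ULL << (i % kBlock))) continue;
    keep.push_back(i);
    const uint64_t* row = mask.data() + int64_t(i) * num_blocks;
    for (int j = i / kBlock; j < num_blocks; ++j) dead[j] |= row[j];
  }
  return keep;
}
```

Packing 64 pairwise results into one word turns the suppression update for a kept box into a handful of OR operations per row, instead of one comparison per remaining box.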
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_NMS_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_NMS_H_
#include "../../utils/detection/types.h"
namespace dragon {
namespace detection {
template <typename T, class Context>
void ApplyNMS(
const int N,
const int K,
const T thresh,
const T* boxes,
vector<int64_t>& indices,
Context* ctx);
} // namespace detection
} // namespace dragon
#endif // DRAGON_EXTENSION_UTILS_DETECTION_NMS_H_
#include <dragon/core/context.h>
#include "../../utils/detection/proposals.h"
namespace dragon {
namespace detection {
namespace {
template <typename KeyT, typename ValueT>
inline void
ArgPartition(const int N, const int K, const ValueT* values, KeyT* keys) {
std::nth_element(keys, keys + K, keys + N, [&values](KeyT lhs, KeyT rhs) {
return values[lhs] > values[rhs];
});
}
} // namespace
template <>
void SelectProposals<float, CPUContext>(
const int N,
const int K,
const float thresh,
const float* scores,
vector<float>& out_scores,
vector<int64_t>& out_indices,
CPUContext* ctx) {
int num_selected = 0;
out_indices.resize(N);
if (thresh > 0.f) {
for (int i = 0; i < N; ++i) {
if (scores[i] > thresh) {
out_indices[num_selected++] = i;
}
}
} else {
num_selected = N;
std::iota(out_indices.begin(), out_indices.end(), 0);
}
if (num_selected > K) {
ArgPartition(num_selected, K, scores, out_indices.data());
out_scores.resize(K);
out_indices.resize(K);
for (int i = 0; i < K; ++i) {
out_scores[i] = scores[out_indices[i]];
}
} else {
out_scores.resize(num_selected);
out_indices.resize(num_selected);
for (int i = 0; i < num_selected; ++i) {
out_scores[i] = scores[out_indices[i]];
}
}
}
} // namespace detection
} // namespace dragon
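The CPU selection path above combines a score threshold with a partial partition. A minimal standalone sketch of that combination follows, assuming the scores live in a plain `std::vector<float>`. The name `SelectTopK` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Returns the indices of up to k scores above thresh. A non-positive
// thresh keeps every candidate. The result is not sorted:
// std::nth_element only guarantees that the first k entries are the
// k best, in unspecified order.
std::vector<int64_t> SelectTopK(
    const std::vector<float>& scores, int k, float thresh) {
  std::vector<int64_t> indices;
  for (int64_t i = 0; i < int64_t(scores.size()); ++i) {
    if (thresh <= 0.f || scores[i] > thresh) indices.push_back(i);
  }
  if (int64_t(indices.size()) > k) {
    std::nth_element(
        indices.begin(),
        indices.begin() + k,
        indices.end(),
        [&scores](int64_t lhs, int64_t rhs) {
          return scores[lhs] > scores[rhs];
        });
    indices.resize(k);
  }
  return indices;
}
```

Partitioning indices instead of sorting them keeps the average cost linear in the number of candidates, which matters when the candidate count is far larger than `k`. Any ordering that a later stage needs, such as the score sort before NMS, has to be applied separately.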
#include <dragon/core/context_cuda.h>
#include <dragon/core/workspace.h>
#include <dragon/utils/device/common_thrust.h>
#include "../../utils/detection/iterator.h"
#include "../../utils/detection/proposals.h"
namespace dragon {
namespace detection {
namespace {
template <typename KeyT, typename ValueT>
struct ThresholdFunctor {
ThresholdFunctor(ValueT thresh) : thresh_(thresh) {}
inline __device__ bool operator()(
const thrust::tuple<KeyT, ValueT>& kv) const {
return thrust::get<1>(kv) > thresh_;
}
ValueT thresh_;
};
template <typename IterT>
inline void ArgPartition(const int N, const int K, IterT data) {
std::nth_element(
data,
data + K,
data + N,
[](const typename IterT::value_type& lhs,
const typename IterT::value_type& rhs) {
return *lhs.value_ptr > *rhs.value_ptr;
});
}
} // namespace
template <>
void SelectProposals<float, CUDAContext>(
const int N,
const int K,
const float thresh,
const float* scores,
vector<float>& out_scores,
vector<int64_t>& out_indices,
CUDAContext* ctx) {
int num_selected = N;
int64_t* indices = nullptr;
if (thresh > 0.f) {
indices = ctx->workspace()->data<int64_t, CUDAContext>(N, "BufferKernel");
auto policy = thrust::cuda::par.on(ctx->cuda_stream());
auto functor = ThresholdFunctor<int64_t, float>(thresh);
thrust::sequence(policy, indices, indices + N);
auto kv = thrust::make_tuple(indices, const_cast<float*>(scores));
auto first = thrust::make_zip_iterator(kv);
auto last = thrust::partition(policy, first, first + N, functor);
num_selected = last - first;
}
out_scores.resize(num_selected);
out_indices.resize(num_selected);
CUDA_CHECK(cudaMemcpyAsync(
out_scores.data(),
scores,
num_selected * sizeof(float),
cudaMemcpyDeviceToHost,
ctx->cuda_stream()));
if (thresh > 0.f) {
CUDA_CHECK(cudaMemcpyAsync(
out_indices.data(),
indices,
num_selected * sizeof(int64_t),
cudaMemcpyDeviceToHost,
ctx->cuda_stream()));
} else {
std::iota(out_indices.begin(), out_indices.end(), 0);
}
ctx->FinishDeviceComputation();
if (num_selected > K) {
auto iter = KeyValueMapIterator<KeyValueMap<int64_t, float>>(
out_indices.data(), out_scores.data());
ArgPartition(num_selected, K, iter);
out_scores.resize(K);
out_indices.resize(K);
}
}
} // namespace detection
} // namespace dragon
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_PROPOSALS_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_PROPOSALS_H_
#include "../../utils/detection/bbox.h"
#include "../../utils/detection/types.h"
namespace dragon {
namespace detection {
template <typename T, class Context>
void SelectProposals(
const int N,
const int K,
const float thresh,
const T* input_scores,
vector<T>& output_scores,
vector<int64_t>& output_indices,
Context* ctx);
template <typename T>
void DecodeProposals(
const int num_proposals,
const int num_anchors,
const ImageArgs<int64_t>& im_args,
const T* scores,
const T* deltas,
const int64_t* indices,
T* proposals) {
T* offset_proposals = proposals;
const T* offset_dx = deltas;
const T* offset_dy = deltas + num_anchors;
const T* offset_dw = deltas + num_anchors * 2;
const T* offset_dh = deltas + num_anchors * 3;
for (int i = 0; i < num_proposals; ++i) {
const auto index = indices[i];
utils::BBoxTransform(
offset_dx[index],
offset_dy[index],
offset_dw[index],
offset_dh[index],
T(im_args.w),
T(im_args.h),
T(1),
T(1),
offset_proposals);
offset_proposals[4] = scores[i];
offset_proposals += 5;
}
}
template <typename T>
void DecodeDetections(
const int num_dets,
const int num_anchors,
const int num_classes,
const ImageArgs<int64_t>& im_args,
const T* scores,
const T* deltas,
const int64_t* indices,
T* dets) {
T* offset_dets = dets;
const T* offset_dx = deltas;
const T* offset_dy = deltas + num_anchors;
const T* offset_dw = deltas + num_anchors * 2;
const T* offset_dh = deltas + num_anchors * 3;
for (int i = 0; i < num_dets; ++i) {
const auto index = indices[i] / num_classes;
utils::BBoxTransform(
offset_dx[index],
offset_dy[index],
offset_dw[index],
offset_dh[index],
T(im_args.w),
T(im_args.h),
T(im_args.scale_h),
T(im_args.scale_w),
offset_dets + 1);
offset_dets[0] = T(im_args.batch_ind);
offset_dets[5] = scores[i];
offset_dets[6] = T(indices[i] % num_classes + 1);
offset_dets += 7;
}
}
template <typename T>
inline void ApplyHistogram(
const int N,
const int lvl_min,
const int lvl_max,
const int lvl0,
const int s0,
const T* boxes,
const T* batch_indices,
const int64_t* box_indices,
vector<vector<T>>& output_rois) {
vector<int> bin_indices(N);
vector<int> bin_count(lvl_max - lvl_min + 1, 0);
for (int i = 0; i < N; ++i) {
const T* offset_boxes = boxes + box_indices[i] * 5;
auto lvl = utils::GetBBoxLevel(lvl_min, lvl_max, lvl0, s0, offset_boxes);
bin_indices[i] = lvl - lvl_min;
bin_count[lvl - lvl_min]++;
}
output_rois.resize(lvl_max - lvl_min + 1);
for (size_t i = 0; i < output_rois.size(); ++i) {
auto& rois = output_rois[i];
rois.resize(std::max(bin_count[i], 1) * 5);
if (bin_count[i] == 0) rois[0] = T(-1); // Ignored.
}
for (int i = 0; i < N; ++i) {
const T* offset_boxes = boxes + box_indices[i] * 5;
const auto bin_index = bin_indices[i];
const auto roi_index = --bin_count[bin_index];
auto& rois = output_rois[bin_index];
T* offset_rois = rois.data() + roi_index * 5;
offset_rois[0] = batch_indices[i];
offset_rois[1] = offset_boxes[0];
offset_rois[2] = offset_boxes[1];
offset_rois[3] = offset_boxes[2];
offset_rois[4] = offset_boxes[3];
}
}
} // namespace detection
} // namespace dragon
#endif // DRAGON_EXTENSION_UTILS_DETECTION_PROPOSALS_H_
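The binning done by `ApplyHistogram` can be sketched in Python as below. This is an illustration under stated assumptions, not the project's implementation: `apply_histogram` and `level_fn` are hypothetical names, and `level_fn` stands in for `utils::GetBBoxLevel`, whose definition is not part of this header.

```python
def apply_histogram(boxes, batch_indices, box_indices, lvl_min, lvl_max, level_fn):
    """Bucket RoIs by pyramid level.

    `level_fn` is a hypothetical stand-in for utils::GetBBoxLevel and must
    return a level in [lvl_min, lvl_max] for an (x1, y1, x2, y2, score) box.
    """
    bins = [[] for _ in range(lvl_max - lvl_min + 1)]
    for i, box_index in enumerate(box_indices):
        box = boxes[box_index]
        roi = [batch_indices[i], box[0], box[1], box[2], box[3]]
        bins[level_fn(box) - lvl_min].append(roi)
    output_rois = []
    for rois in bins:
        if rois:
            # The C++ loop fills each bin back to front, so reverse the order.
            output_rois.append(rois[::-1])
        else:
            # Empty levels keep one placeholder RoI with batch index -1.
            output_rois.append([[-1, 0, 0, 0, 0]])
    return output_rois


boxes = [[0, 0, 10, 10, 0.9], [0, 0, 100, 100, 0.8], [5, 5, 20, 20, 0.7]]
rois = apply_histogram(
    boxes, [0, 0, 1], [0, 1, 2], 2, 4,
    level_fn=lambda b: 2 if b[2] - b[0] < 50 else 4)
```

Every level always yields at least one row, which keeps downstream shapes non-empty even when a level receives no boxes.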
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_TYPES_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_TYPES_H_
#include <dragon/core/common.h>
namespace dragon {
namespace detection {
template <typename T>
struct Box4d {
T x1, y1, x2, y2;
};
template <typename T>
struct Box5d {
T x1, y1, x2, y2, score;
};
template <typename IndexT>
struct ImageArgs {
ImageArgs(const float* im_info) {
h = im_info[0], w = im_info[1];
scale_h = im_info[2], scale_w = im_info[3];
}
IndexT batch_ind, h, w;
float scale_h, scale_w;
};
template <typename IndexT>
struct GridArgs {
IndexT h, w, stride, offset;
};
template <typename KeyT, typename ValueT>
struct KeyValueMap {
typedef KeyT key_type;
typedef ValueT value_type;
friend void swap(KeyValueMap& x, KeyValueMap& y) {
std::swap(*x.key_ptr, *y.key_ptr);
std::swap(*x.value_ptr, *y.value_ptr);
}
KeyT* key_ptr = nullptr;
ValueT* value_ptr = nullptr;
};
} // namespace detection
} // namespace dragon
#endif // DRAGON_EXTENSION_UTILS_DETECTION_TYPES_H_
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_UTILS_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_UTILS_H_
namespace dragon {
namespace detection {
/*
* Detection Utilities.
*/
namespace utils {
template <typename T>
inline T DivUp(const T a, const T b) {
return (a + b - T(1)) / b;
}
} // namespace utils
} // namespace detection
} // namespace dragon
#endif // DRAGON_EXTENSION_UTILS_DETECTION_UTILS_H_
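`DivUp` is a ceiling division for positive integers, typically used to count how many fixed-size blocks cover N items. A small Python sketch (the name `div_up` is only illustrative):

```python
def div_up(a, b):
    """Ceiling division for positive integers, mirroring utils::DivUp."""
    return (a + b - 1) // b


# Number of 64-wide blocks needed to cover 130 boxes.
num_blocks = div_up(130, 64)
```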
#include "detection_utils.h"
#include <dragon/core/context.h>
namespace dragon {
namespace utils {
namespace detection {
template <typename T>
T IoU(const T A[], const T B[]) {
if (A[0] > B[2] || A[1] > B[3] || A[2] < B[0] || A[3] < B[1]) return 0;
const T x1 = std::max(A[0], B[0]);
const T y1 = std::max(A[1], B[1]);
const T x2 = std::min(A[2], B[2]);
const T y2 = std::min(A[3], B[3]);
const T width = std::max((T)0, x2 - x1 + 1);
const T height = std::max((T)0, y2 - y1 + 1);
const T area = width * height;
const T A_area = (A[2] - A[0] + 1) * (A[3] - A[1] + 1);
const T B_area = (B[2] - B[0] + 1) * (B[3] - B[1] + 1);
return area / (A_area + B_area - area);
}
template <>
void ApplyNMS<float, CPUContext>(
const int num_boxes,
const int max_keeps,
const float thresh,
const float* boxes,
int64_t* keep_indices,
int& num_keep,
CPUContext* ctx) {
int count = 0;
std::vector<char> is_dead(num_boxes);
for (int i = 0; i < num_boxes; ++i)
is_dead[i] = 0;
for (int i = 0; i < num_boxes; ++i) {
if (is_dead[i]) continue;
keep_indices[count++] = i;
if (count == max_keeps) break;
for (int j = i + 1; j < num_boxes; ++j)
if (!is_dead[j] && IoU(&boxes[i * 5], &boxes[j * 5]) > thresh) {
is_dead[j] = 1;
}
}
num_keep = count;
}
template <>
void SelectProposals<float, CPUContext>(
const int count,
const float score_thresh,
const float* input_scores,
vector<float>& output_scores,
vector<int64_t>& output_indices,
CPUContext* ctx) {
int num_proposals = 0;
for (int i = 0; i < count; ++i) {
if (input_scores[i] > score_thresh) {
output_indices[num_proposals++] = i;
}
}
output_scores.resize(num_proposals);
for (int i = 0; i < num_proposals; ++i) {
output_scores[i] = input_scores[output_indices[i]];
}
}
} // namespace detection
} // namespace utils
} // namespace dragon
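The greedy suppression in `ApplyNMS` can be sketched in Python as follows. The sketch assumes the boxes are already sorted by descending score (the C++ code does not sort them either) and uses the same inclusive `+ 1` pixel convention; the function names are illustrative.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2, ...) boxes with inclusive pixel coordinates."""
    iw = min(a[2], b[2]) - max(a[0], b[0]) + 1
    ih = min(a[3], b[3]) - max(a[1], b[1]) + 1
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)


def nms(boxes, thresh, max_keeps):
    """Greedy NMS; assumes `boxes` is sorted by descending score."""
    keep = []
    dead = [False] * len(boxes)
    for i in range(len(boxes)):
        if dead[i]:
            continue
        keep.append(i)
        if len(keep) == max_keeps:
            break
        # Suppress every later box that overlaps the kept one too much.
        for j in range(i + 1, len(boxes)):
            if not dead[j] and iou(boxes[i], boxes[j]) > thresh:
                dead[j] = True
    return keep


boxes = [[0, 0, 9, 9, 0.9], [1, 1, 10, 10, 0.8], [20, 20, 29, 29, 0.7]]
keep = nms(boxes, thresh=0.5, max_keeps=100)
```

Here the second box overlaps the first with an IoU of 81/119, so it is suppressed at a threshold of 0.5.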
#ifdef USE_CUDA
#include <dragon/core/context_cuda.h>
#include <dragon/core/workspace.h>
#include <dragon/utils/device/common_cub.h>
#include <dragon/utils/device/common_thrust.h>
#include "detection_utils.h"
namespace dragon {
namespace utils {
namespace detection {
#define DIV_UP(m, n) ((m) / (n) + ((m) % (n) > 0))
#define NUM_THREADS 64
namespace {
template <typename T>
struct ThresholdFunctor {
ThresholdFunctor(float thresh) : thresh_(thresh) {}
inline __device__ bool operator()(
const thrust::tuple<int64_t, T>& key_val) const {
return thrust::get<1>(key_val) > thresh_;
}
float thresh_;
};
template <typename T>
__device__ bool _CheckIoU(const T* a, const T* b, const float thresh) {
const T x1 = max(a[0], b[0]);
const T y1 = max(a[1], b[1]);
const T x2 = min(a[2], b[2]);
const T y2 = min(a[3], b[3]);
const T width = max(T(0), x2 - x1 + 1);
const T height = max(T(0), y2 - y1 + 1);
const T inter = width * height;
const T Sa = (a[2] - a[0] + T(1)) * (a[3] - a[1] + T(1));
const T Sb = (b[2] - b[0] + T(1)) * (b[3] - b[1] + T(1));
return inter > thresh * (Sa + Sb - inter);
}
template <typename T>
__global__ void _NonMaxSuppression(
const int num_blocks,
const int num_boxes,
const T thresh,
const T* dev_boxes,
uint64_t* dev_mask) {
const int row_start = blockIdx.y;
const int col_start = blockIdx.x;
if (row_start > col_start) return;
const int row_size = min(num_boxes - row_start * NUM_THREADS, NUM_THREADS);
const int col_size = min(num_boxes - col_start * NUM_THREADS, NUM_THREADS);
__shared__ T block_boxes[NUM_THREADS * 4];
if (threadIdx.x < col_size) {
const int c1 = threadIdx.x * 4;
const int c2 = (col_start * NUM_THREADS + threadIdx.x) * 5;
block_boxes[c1] = dev_boxes[c2];
block_boxes[c1 + 1] = dev_boxes[c2 + 1];
block_boxes[c1 + 2] = dev_boxes[c2 + 2];
block_boxes[c1 + 3] = dev_boxes[c2 + 3];
}
__syncthreads();
if (threadIdx.x < row_size) {
const int index = row_start * NUM_THREADS + threadIdx.x;
const T* dev_box = dev_boxes + index * 5;
unsigned long long val = 0;
const int start = (row_start == col_start) ? (threadIdx.x + 1) : 0;
for (int i = start; i < col_size; ++i) {
if (_CheckIoU(dev_box, block_boxes + i * 4, thresh)) {
val |= 1ULL << i;
}
}
dev_mask[index * num_blocks + col_start] = val;
}
}
} // namespace
template <>
void SelectProposals<float, CUDAContext>(
const int count,
const float score_thresh,
const float* in_scores,
vector<float>& out_scores,
vector<int64_t>& out_indices,
CUDAContext* ctx) {
auto* in_indices = ctx->workspace()->template data<int64_t, CUDAContext>(
{count}, "data:1")[0];
auto iter = thrust::make_zip_iterator(
thrust::make_tuple(in_indices, const_cast<float*>(in_scores)));
auto policy = thrust::cuda::par.on(ctx->cuda_stream());
thrust::counting_iterator<int64_t> offset(0);
thrust::copy(policy, offset, offset + count, in_indices);
auto last = thrust::partition(
policy, iter, iter + count, ThresholdFunctor<float>(score_thresh));
size_t num_proposals = last - iter;
out_scores.resize(num_proposals);
out_indices.resize(num_proposals);
CUDA_CHECK(cudaMemcpyAsync(
out_scores.data(),
in_scores,
num_proposals * sizeof(float),
cudaMemcpyDeviceToHost,
ctx->cuda_stream()));
CUDA_CHECK(cudaMemcpyAsync(
out_indices.data(),
in_indices,
num_proposals * sizeof(int64_t),
cudaMemcpyDeviceToHost,
ctx->cuda_stream()));
ctx->FinishDeviceComputation();
}
template <>
void ApplyNMS<float, CUDAContext>(
const int num_boxes,
const int max_keeps,
const float thresh,
const float* boxes,
int64_t* keep_indices,
int& num_keep,
CUDAContext* ctx) {
const int num_blocks = DIV_UP(num_boxes, NUM_THREADS);
vector<uint64_t> mask_host(num_boxes * num_blocks);
auto* mask_dev = (uint64_t*)ctx->workspace()->data<CUDAContext>(
{mask_host.size() * sizeof(uint64_t)}, "data:1")[0];
_NonMaxSuppression<<<
dim3(num_blocks, num_blocks),
NUM_THREADS,
0,
ctx->cuda_stream()>>>(num_blocks, num_boxes, thresh, boxes, mask_dev);
CUDA_CHECK(cudaMemcpyAsync(
mask_host.data(),
mask_dev,
mask_host.size() * sizeof(uint64_t),
cudaMemcpyDeviceToHost,
ctx->cuda_stream()));
ctx->FinishDeviceComputation();
vector<uint64_t> dead_bit(num_blocks);
memset(&dead_bit[0], 0, sizeof(uint64_t) * num_blocks);
int num_selected = 0;
for (int i = 0; i < num_boxes; ++i) {
const int nblock = i / NUM_THREADS;
const int inblock = i % NUM_THREADS;
if (!(dead_bit[nblock] & (1ULL << inblock))) {
keep_indices[num_selected++] = i;
auto* mask_i = &mask_host[0] + i * num_blocks;
for (int j = nblock; j < num_blocks; ++j)
dead_bit[j] |= mask_i[j];
if (num_selected == max_keeps) break;
}
}
num_keep = num_selected;
}
} // namespace detection
} // namespace utils
} // namespace dragon
#endif // USE_CUDA
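The CUDA `ApplyNMS` splits the work in two: the kernel records, for each box, a bitmask of later boxes that overlap it, and the host then walks the boxes once while OR-ing the masks of kept boxes into a running "dead" bitset. A sequential Python sketch of that idea is shown below. It is not a port of the kernel: the names are hypothetical, the pairwise loop replaces the block-parallel computation, and boxes are assumed to be sorted by descending score.

```python
BLOCK = 64  # Bits per mask word, matching NUM_THREADS in the kernel.


def iou(a, b):
    """IoU of two (x1, y1, x2, y2, ...) boxes with inclusive pixel coordinates."""
    iw = min(a[2], b[2]) - max(a[0], b[0]) + 1
    ih = min(a[3], b[3]) - max(a[1], b[1]) + 1
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)


def build_mask(boxes, thresh):
    """mask[i][w] has bit b set when box w * BLOCK + b (> i) overlaps box i."""
    n = len(boxes)
    num_blocks = (n + BLOCK - 1) // BLOCK
    mask = [[0] * num_blocks for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if iou(boxes[i], boxes[j]) > thresh:
                mask[i][j // BLOCK] |= 1 << (j % BLOCK)
    return mask


def reduce_mask(mask, max_keeps):
    """Host-side pass: keep a box unless an earlier kept box marked it dead."""
    n = len(mask)
    num_blocks = (n + BLOCK - 1) // BLOCK
    dead = [0] * num_blocks
    keep = []
    for i in range(n):
        word, bit = divmod(i, BLOCK)
        if not (dead[word] >> bit) & 1:
            keep.append(i)
            for j in range(word, num_blocks):
                dead[j] |= mask[i][j]
            if len(keep) == max_keeps:
                break
    return keep


boxes = [[0, 0, 9, 9, 1.0]] * 70  # 70 identical boxes spanning two mask words
keep = reduce_mask(build_mask(boxes, 0.5), max_keeps=100)
```

Using 70 boxes exercises the case where a kept box in the first 64-bit word suppresses boxes recorded in the second word.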
/**************************************************************************
* Microsoft COCO Toolbox. version 2.0
* Data, paper, and tutorials available at: http://mscoco.org/
* Code written by Piotr Dollar and Tsung-Yi Lin, 2015.
* Licensed under the Simplified BSD License [see coco/license.txt]
**************************************************************************/
#include "maskApi.h"
#include <math.h>
#include <stdlib.h>
uint umin( uint a, uint b ) { return (a<b) ? a : b; }
uint umax( uint a, uint b ) { return (a>b) ? a : b; }
void rleInit( RLE *R, siz h, siz w, siz m, uint *cnts ) {
R->h=h; R->w=w; R->m=m; R->cnts=(m==0)?0:malloc(sizeof(uint)*m);
if(cnts) for(siz j=0; j<m; j++) R->cnts[j]=cnts[j];
}
void rleFree( RLE *R ) {
free(R->cnts); R->cnts=0;
}
void rlesInit( RLE **R, siz n ) {
*R = (RLE*) malloc(sizeof(RLE)*n);
for(siz i=0; i<n; i++) rleInit((*R)+i,0,0,0,0);
}
void rlesFree( RLE **R, siz n ) {
for(siz i=0; i<n; i++) rleFree((*R)+i); free(*R); *R=0;
}
void rleEncode( RLE *R, const byte *M, siz h, siz w, siz n ) {
siz i, j, k, a=w*h; uint c, *cnts; byte p;
cnts = malloc(sizeof(uint)*(a+1));
for(i=0; i<n; i++) {
const byte *T=M+a*i; k=0; p=0; c=0;
for(j=0; j<a; j++) { if(T[j]!=p) { cnts[k++]=c; c=0; p=T[j]; } c++; }
cnts[k++]=c; rleInit(R+i,h,w,k,cnts);
}
free(cnts);
}
void rleDecode( const RLE *R, byte *M, siz n ) {
for( siz i=0; i<n; i++ ) {
byte v=0; for( siz j=0; j<R[i].m; j++ ) {
for( siz k=0; k<R[i].cnts[j]; k++ ) *(M++)=v; v=!v; }}
}
void rleMerge( const RLE *R, RLE *M, siz n, bool intersect ) {
uint *cnts, c, ca, cb, cc, ct; bool v, va, vb, vp;
siz i, a, b, h=R[0].h, w=R[0].w, m=R[0].m; RLE A, B;
if(n==0) { rleInit(M,0,0,0,0); return; }
if(n==1) { rleInit(M,h,w,m,R[0].cnts); return; }
cnts = malloc(sizeof(uint)*(h*w+1));
for( a=0; a<m; a++ ) cnts[a]=R[0].cnts[a];
for( i=1; i<n; i++ ) {
B=R[i]; if(B.h!=h||B.w!=w) { h=w=m=0; break; }
rleInit(&A,h,w,m,cnts); ca=A.cnts[0]; cb=B.cnts[0];
v=va=vb=0; m=0; a=b=1; cc=0; ct=1;
while( ct>0 ) {
c=umin(ca,cb); cc+=c; ct=0;
ca-=c; if(!ca && a<A.m) { ca=A.cnts[a++]; va=!va; } ct+=ca;
cb-=c; if(!cb && b<B.m) { cb=B.cnts[b++]; vb=!vb; } ct+=cb;
vp=v; if(intersect) v=va&&vb; else v=va||vb;
if( v!=vp||ct==0 ) { cnts[m++]=cc; cc=0; }
}
rleFree(&A);
}
rleInit(M,h,w,m,cnts); free(cnts);
}
void rleArea( const RLE *R, siz n, uint *a ) {
for( siz i=0; i<n; i++ ) {
a[i]=0; for( siz j=1; j<R[i].m; j+=2 ) a[i]+=R[i].cnts[j]; }
}
void rleIou( RLE *dt, RLE *gt, siz m, siz n, byte *iscrowd, double *o ) {
siz g, d; BB db, gb; bool crowd;
db=malloc(sizeof(double)*m*4); rleToBbox(dt,db,m);
gb=malloc(sizeof(double)*n*4); rleToBbox(gt,gb,n);
bbIou(db,gb,m,n,iscrowd,o); free(db); free(gb);
for( g=0; g<n; g++ ) for( d=0; d<m; d++ ) if(o[g*m+d]>0) {
crowd=iscrowd!=NULL && iscrowd[g];
if(dt[d].h!=gt[g].h || dt[d].w!=gt[g].w) { o[g*m+d]=-1; continue; }
siz ka, kb, a, b; uint c, ca, cb, ct, i, u; bool va, vb;
ca=dt[d].cnts[0]; ka=dt[d].m; va=vb=0;
cb=gt[g].cnts[0]; kb=gt[g].m; a=b=1; i=u=0; ct=1;
while( ct>0 ) {
c=umin(ca,cb); if(va||vb) { u+=c; if(va&&vb) i+=c; } ct=0;
ca-=c; if(!ca && a<ka) { ca=dt[d].cnts[a++]; va=!va; } ct+=ca;
cb-=c; if(!cb && b<kb) { cb=gt[g].cnts[b++]; vb=!vb; } ct+=cb;
}
if(i==0) u=1; else if(crowd) rleArea(dt+d,1,&u);
o[g*m+d] = (double)i/(double)u;
}
}
void bbIou( BB dt, BB gt, siz m, siz n, byte *iscrowd, double *o ) {
double h, w, i, u, ga, da; siz g, d; bool crowd;
for( g=0; g<n; g++ ) {
BB G=gt+g*4; ga=G[2]*G[3]; crowd=iscrowd!=NULL && iscrowd[g];
for( d=0; d<m; d++ ) {
BB D=dt+d*4; da=D[2]*D[3]; o[g*m+d]=0;
w=fmin(D[2]+D[0],G[2]+G[0])-fmax(D[0],G[0]); if(w<=0) continue;
h=fmin(D[3]+D[1],G[3]+G[1])-fmax(D[1],G[1]); if(h<=0) continue;
i=w*h; u = crowd ? da : da+ga-i; o[g*m+d]=i/u;
}
}
}
void rleToBbox( const RLE *R, BB bb, siz n ) {
for( siz i=0; i<n; i++ ) {
uint h, w, x, y, xs, ys, xe, ye, cc, t; siz j, m;
h=(uint)R[i].h; w=(uint)R[i].w; m=R[i].m;
m=((siz)(m/2))*2; xs=w; ys=h; xe=ye=0; cc=0;
if(m==0) { bb[4*i+0]=bb[4*i+1]=bb[4*i+2]=bb[4*i+3]=0; continue; }
for( j=0; j<m; j++ ) {
cc+=R[i].cnts[j]; t=cc-j%2; y=t%h; x=(t-y)/h;
xs=umin(xs,x); xe=umax(xe,x); ys=umin(ys,y); ye=umax(ye,y);
}
bb[4*i+0]=xs; bb[4*i+2]=xe-xs+1;
bb[4*i+1]=ys; bb[4*i+3]=ye-ys+1;
}
}
void rleFrBbox( RLE *R, const BB bb, siz h, siz w, siz n ) {
for( siz i=0; i<n; i++ ) {
double xs=bb[4*i+0], xe=xs+bb[4*i+2];
double ys=bb[4*i+1], ye=ys+bb[4*i+3];
double xy[8] = {xs,ys,xs,ye,xe,ye,xe,ys};
rleFrPoly( R+i, xy, 4, h, w );
}
}
int uintCompare(const void *a, const void *b) {
uint c=*((uint*)a), d=*((uint*)b); return c>d?1:c<d?-1:0;
}
void rleFrPoly( RLE *R, const double *xy, siz k, siz h, siz w ) {
// upsample and get discrete points densely along entire boundary
siz j, m=0; double scale=5; int *x, *y, *u, *v; uint *a, *b;
x=malloc(sizeof(int)*(k+1)); y=malloc(sizeof(int)*(k+1));
for(j=0; j<k; j++) x[j]=(int)(scale*xy[j*2+0]+.5); x[k]=x[0];
for(j=0; j<k; j++) y[j]=(int)(scale*xy[j*2+1]+.5); y[k]=y[0];
for(j=0; j<k; j++) m+=umax(abs(x[j]-x[j+1]),abs(y[j]-y[j+1]))+1;
u=malloc(sizeof(int)*m); v=malloc(sizeof(int)*m); m=0;
for( j=0; j<k; j++ ) {
int xs=x[j], xe=x[j+1], ys=y[j], ye=y[j+1], dx, dy, t;
bool flip; double s; dx=abs(xe-xs); dy=abs(ys-ye);
flip = (dx>=dy && xs>xe) || (dx<dy && ys>ye);
if(flip) { t=xs; xs=xe; xe=t; t=ys; ys=ye; ye=t; }
s = dx>=dy ? (double)(ye-ys)/dx : (double)(xe-xs)/dy;
if(dx>=dy) for( int d=0; d<=dx; d++ ) {
t=flip?dx-d:d; u[m]=t+xs; v[m]=(int)(ys+s*t+.5); m++;
} else for( int d=0; d<=dy; d++ ) {
t=flip?dy-d:d; v[m]=t+ys; u[m]=(int)(xs+s*t+.5); m++;
}
}
// get points along y-boundary and downsample
free(x); free(y); k=m; m=0; double xd, yd;
x=malloc(sizeof(int)*k); y=malloc(sizeof(int)*k);
for( j=1; j<k; j++ ) if(u[j]!=u[j-1]) {
xd=(double)(u[j]<u[j-1]?u[j]:u[j]-1); xd=(xd+.5)/scale-.5;
if( floor(xd)!=xd || xd<0 || xd>w-1 ) continue;
yd=(double)(v[j]<v[j-1]?v[j]:v[j-1]); yd=(yd+.5)/scale-.5;
if(yd<0) yd=0; else if(yd>h) yd=h; yd=ceil(yd);
x[m]=(int) xd; y[m]=(int) yd; m++;
}
// compute rle encoding given y-boundary points
k=m; a=malloc(sizeof(uint)*(k+1));
for( j=0; j<k; j++ ) a[j]=(uint)(x[j]*(int)(h)+y[j]);
a[k++]=(uint)(h*w); free(u); free(v); free(x); free(y);
qsort(a,k,sizeof(uint),uintCompare); uint p=0;
for( j=0; j<k; j++ ) { uint t=a[j]; a[j]-=p; p=t; }
b=malloc(sizeof(uint)*k); j=m=0; b[m++]=a[j++];
while(j<k) if(a[j]>0) b[m++]=a[j++]; else {
j++; if(j<k) b[m-1]+=a[j++]; }
rleInit(R,h,w,m,b); free(a); free(b);
}
char* rleToString( const RLE *R ) {
// Similar to LEB128 but using 6 bits/char and ascii chars 48-111.
siz i, m=R->m, p=0; long x; bool more;
char *s=malloc(sizeof(char)*m*6);
for( i=0; i<m; i++ ) {
x=(long) R->cnts[i]; if(i>2) x-=(long) R->cnts[i-2]; more=1;
while( more ) {
char c=x & 0x1f; x >>= 5; more=(c & 0x10) ? x!=-1 : x!=0;
if(more) c |= 0x20; c+=48; s[p++]=c;
}
}
s[p]=0; return s;
}
void rleFrString( RLE *R, char *s, siz h, siz w ) {
siz m=0, p=0, k; long x; bool more; uint *cnts;
while( s[m] ) m++; cnts=malloc(sizeof(uint)*m); m=0;
while( s[p] ) {
x=0; k=0; more=1;
while( more ) {
char c=s[p]-48; x |= (c & 0x1f) << 5*k;
more = c & 0x20; p++; k++;
if(!more && (c & 0x10)) x |= -1 << 5*k;
}
if(m>2) x+=(long) cnts[m-2]; cnts[m++]=(uint) x;
}
rleInit(R,h,w,m,cnts); free(cnts);
}
/**************************************************************************
* Microsoft COCO Toolbox. version 2.0
* Data, paper, and tutorials available at: http://mscoco.org/
* Code written by Piotr Dollar and Tsung-Yi Lin, 2015.
* Licensed under the Simplified BSD License [see coco/license.txt]
**************************************************************************/
#pragma once
#include <stdbool.h>
typedef unsigned int uint;
typedef unsigned long siz;
typedef unsigned char byte;
typedef double* BB;
typedef struct { siz h, w, m; uint *cnts; } RLE;
// Initialize/destroy RLE.
void rleInit( RLE *R, siz h, siz w, siz m, uint *cnts );
void rleFree( RLE *R );
// Initialize/destroy RLE array.
void rlesInit( RLE **R, siz n );
void rlesFree( RLE **R, siz n );
// Encode binary masks using RLE.
void rleEncode( RLE *R, const byte *mask, siz h, siz w, siz n );
// Decode binary masks encoded via RLE.
void rleDecode( const RLE *R, byte *mask, siz n );
// Compute union or intersection of encoded masks.
void rleMerge( const RLE *R, RLE *M, siz n, bool intersect );
// Compute area of encoded masks.
void rleArea( const RLE *R, siz n, uint *a );
// Compute intersection over union between masks.
void rleIou( RLE *dt, RLE *gt, siz m, siz n, byte *iscrowd, double *o );
// Compute intersection over union between bounding boxes.
void bbIou( BB dt, BB gt, siz m, siz n, byte *iscrowd, double *o );
// Get bounding boxes surrounding encoded masks.
void rleToBbox( const RLE *R, BB bb, siz n );
// Convert bounding boxes to encoded masks.
void rleFrBbox( RLE *R, const BB bb, siz h, siz w, siz n );
// Convert polygon to encoded mask.
void rleFrPoly( RLE *R, const double *xy, siz k, siz h, siz w );
// Get compressed string representation of encoded mask.
char* rleToString( const RLE *R );
// Convert from compressed string representation of encoded mask.
void rleFrString( RLE *R, char *s, siz h, siz w );
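The run-length scheme used by `rleEncode`, `rleDecode` and `rleArea` stores alternating run lengths that always start with a run of zeros (possibly of length 0). A minimal Python sketch over a flat 0/1 sequence is given below; the function names are illustrative, and the column-major flattening of the 2-D mask is left to the caller.

```python
def rle_encode(mask):
    """Encode a flat 0/1 sequence as alternating run lengths, zeros first."""
    counts, prev, run = [], 0, 0
    for v in mask:
        if v != prev:
            counts.append(run)
            run, prev = 0, v
        run += 1
    counts.append(run)
    return counts


def rle_decode(counts):
    """Expand alternating run lengths back into a flat 0/1 list."""
    out, v = [], 0
    for c in counts:
        out.extend([v] * c)
        v = 1 - v
    return out


def rle_area(counts):
    """Foreground area: the sum of the odd-indexed (ones) runs."""
    return sum(counts[1::2])


counts = rle_encode([0, 0, 1, 1, 1, 0])
```

A mask that begins with foreground gets a leading zero-length run, which keeps the "zeros first" convention intact.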
......@@ -8,7 +8,7 @@
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Compile the cython extensions."""
"""Build cython extensions."""
from __future__ import absolute_import
from __future__ import division
......@@ -16,34 +16,25 @@ from __future__ import print_function
from distutils.extension import Extension
from distutils.core import setup
import os
from Cython.Distutils import build_ext
import numpy as np
ext_modules = [
Extension(
'install.lib.utils.cython_bbox',
'seetadet.utils.bbox.cython_bbox',
['cython_bbox.pyx'],
extra_compile_args=['-w'],
include_dirs=[np.get_include()]
include_dirs=[np.get_include()],
),
Extension(
'install.lib.utils.cython_nms',
'seetadet.utils.nms.cython_nms',
['cython_nms.pyx'],
extra_compile_args=['-w'],
include_dirs=[np.get_include()]
),
Extension(
'install.lib.utils.pycocotools._mask',
['maskApi.c', '_mask.pyx'],
include_dirs=[np.get_include(), os.path.dirname(os.path.abspath(__file__))],
extra_compile_args=['-w']
include_dirs=[np.get_include()],
),
]
setup(
name='SeetaDet',
ext_modules=ext_modules,
cmdclass={'build_ext': build_ext},
)
setup(name='seetadet',
ext_modules=ext_modules,
cmdclass={'build_ext': build_ext})
# Datasets
## Introduction
This folder is kept for the record and json datasets.
Please prepare the datasets following the [documentation](../../scripts/datasets/README.md).
# Demo Images
## Introduction
This folder is kept for the demo images.
# Pretrained Models
## Introduction
This folder is kept for the pretrained models.
## ImageNet Pretrained Models
### Training settings
- ResNet models trained for 200 epochs follow the procedure in arXiv:1812.01187.
### ResNet
| Model | Lr sched | Acc@1 | Acc@5 | Source |
| :---: | :------: | :---: | :---: | :----: |
| [R-50](https://dragon.seetatech.com/download/seetadet/pretrained/R-50_in1k_cls90e.pkl) | 90e | 76.53 | 93.16 | Ours |
| [R-50](https://dragon.seetatech.com/download/seetadet/pretrained/R-50_in1k_cls200e.pkl) | 200e | 78.64 | 94.30 | Ours |
### MobileNet
| Model | Lr sched | Acc@1 | Acc@5 | Source |
| :---: | :------: | :---: | :---: | :----: |
| [MobileNetV2](https://dragon.seetatech.com/download/seetadet/pretrained/MobileNetV2_in1k_cls300e.pkl) | 300e | 71.88 | 90.29 | TorchVision |
| [MobileNetV3L](https://dragon.seetatech.com/download/seetadet/pretrained/MobileNetV3L_in1k_cls600e.pkl) | 600e | 74.04 | 91.34 | TorchVision |
### VGG
| Model | Lr sched | Acc@1 | Acc@5 | Source |
| :---: | :------: | :---: | :---: | :----: |
| [VGG-16-FCN](https://dragon.seetatech.com/download/seetadet/pretrained/VGG-16-FCN_in1k.pkl) | - | - | - | weiliu89 |
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Make record file for COCO dataset."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import shutil
from maker import make_record
from roidb import make_database
if __name__ == '__main__':
    COCO_ROOT = '/data'
    # Encode masks to RLE bytes
    if not os.path.exists('build'):
        os.makedirs('build')
    make_database('train', '2017', COCO_ROOT)
    make_database('val', '2017', COCO_ROOT)
    # coco_2017_train
    make_record(
        db_file='build/coco_2017_train.db.pkl',
        record_file=os.path.join(COCO_ROOT, 'coco_2017_train'),
        images_path=[os.path.join(COCO_ROOT, 'images/train2017')],
        splits_path=[os.path.join(COCO_ROOT, 'splits')],
        splits=['train2017'],
    )
    # coco_2017_val
    make_record(
        db_file='build/coco_2017_val.db.pkl',
        record_file=os.path.join(COCO_ROOT, 'coco_2017_val'),
        images_path=[os.path.join(COCO_ROOT, 'images/val2017')],
        splits_path=[os.path.join(COCO_ROOT, 'splits')],
        splits=['val2017'],
    )
    shutil.rmtree('build')
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
import os
import pickle
import time
import cv2
import dragon
import numpy as np
def make_example(image_file, objects, im_scale=None):
    filename = os.path.split(image_file)[-1]
    example = {'id': filename.split('.')[0], 'object': []}
    if im_scale:
        # Rescale the image and re-encode it as JPEG.
        img = cv2.imread(image_file)
        img = cv2.resize(
            img, None,
            fx=im_scale, fy=im_scale,
            interpolation=cv2.INTER_LINEAR,
        )
        example['height'], example['width'], example['depth'] = img.shape
        _, img = cv2.imencode('.jpg', img, [int(cv2.IMWRITE_JPEG_QUALITY), 95])
        # ``ndarray.tostring`` is deprecated in NumPy; ``tobytes`` is equivalent.
        example['content'] = img.tobytes()
    else:
        # Store the original encoded bytes; decode only to read the shape.
        with open(image_file, 'rb') as f:
            img_bytes = bytes(f.read())
        img = cv2.imdecode(np.frombuffer(img_bytes, 'uint8'), 3)
        example['height'], example['width'], example['depth'] = img.shape
        example['content'] = img_bytes
    for obj in objects:
        x1, y1, x2, y2 = obj['bbox']
        example['object'].append({
            'name': obj['name'],
            'xmin': x1,
            'ymin': y1,
            'xmax': x2,
            'ymax': y2,
            'mask': obj['mask'],
            'polygons': obj['polygons'],
            'difficult': obj.get('crowd', 0),
        })
    return example
def make_record(
    record_file,
    images_path,
    db_file,
    splits_path,
    splits,
    ext='.jpg',
    im_scale=None,
):
    if os.path.exists(record_file):
        raise ValueError('The record file already exists.')
    os.makedirs(record_file)
    if not isinstance(images_path, list):
        images_path = [images_path]
    if not isinstance(splits_path, list):
        splits_path = [splits_path]
    assert len(splits) == len(splits_path)
    assert len(splits) == len(images_path)
    if db_file is not None:
        with open(db_file, 'rb') as f:
            all_entries = pickle.load(f)
    else:
        all_entries = {}
    print('Start Time:', time.strftime("%a, %d %b %Y %H:%M:%S", time.gmtime()))
    writer = dragon.io.KPLRecordWriter(
        path=record_file,
        protocol={
            'id': 'string',
            'content': 'bytes',
            'height': 'int64',
            'width': 'int64',
            'depth': 'int64',
            'object': [{
                'name': 'string',
                'xmin': 'float64',
                'ymin': 'float64',
                'xmax': 'float64',
                'ymax': 'float64',
                'mask': 'bytes',
                'polygons': [['float64']],
                'difficult': 'int64',
            }]
        }
    )
    count, total_line = 0, 0
    start_time = time.time()
    for db_idx, split in enumerate(splits):
        split_file = os.path.join(splits_path[db_idx], split + '.txt')
        if not os.path.exists(split_file):
            # Fall back to a split provided in JSON format
            split_file = os.path.join(splits_path[db_idx], split + '.json')
            if not os.path.exists(split_file):
                raise FileNotFoundError('Unable to find the split: ' + split)
            with open(split_file, 'r') as f:
                import json
                images_info = json.load(f)
            # Accumulate so that the total covers every split
            total_line += len(images_info['images'])
            lines = []
            for info in images_info['images']:
                lines.append(os.path.splitext(info['file_name'])[0])
        else:
            with open(split_file, 'r') as f:
                lines = f.readlines()
            total_line += len(lines)
        for line in lines:
            count += 1
            if count % 2000 == 0:
                now_time = time.time()
                print('{} / {} in {:.2f} sec'.format(
                    count, total_line, now_time - start_time))
            filename = line.strip()
            image_file = os.path.join(images_path[db_idx], filename + ext)
            objects = all_entries[filename] if filename in all_entries else {}
            writer.write(make_example(image_file, objects, im_scale))
    now_time = time.time()
    print('{} / {} in {:.2f} sec'.format(count, total_line, now_time - start_time))
    writer.close()
    end_time = time.time()
    data_size = os.path.getsize(record_file + '/root.data') * 1e-6
    print('{} images take {:.2f} MB in {:.2f} sec.'
          .format(total_line, data_size, end_time - start_time))
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import os
import os.path as osp
import pickle
from seetadet.utils.pycocotools import mask_utils
from seetadet.utils.pycocotools.coco import COCO
class COCOWrapper(object):
def __init__(self, image_set, year, data_dir):
self._year = year
self._image_set = image_set
self._data_path = osp.join(data_dir)
self.invalid_cnt = 0
self.ignore_cnt = 0
# Load COCO API, classes, class <-> id mappings
self._COCO = COCO(self._get_ann_file())
cats = self._COCO.loadCats(self._COCO.getCatIds())
self._classes = tuple(['__background__'] + [c['name'] for c in cats])
self._class_to_ind = dict(zip(self._classes, range(self.num_classes)))
self._ind_to_class = dict(zip(range(self.num_classes), self._classes))
self._class_to_cat_id = dict(zip([c['name'] for c in cats], self._COCO.getCatIds()))
self._cat_id_to_class_id = dict([(self._class_to_cat_id[cls], self._class_to_ind[cls])
for cls in self._classes[1:]])
self._data_name = {
# 5k ``val2014`` subset
'minival2014': 'val2014',
# ``val2014`` minus ``minival2014``
'valminusminival2014': 'val2014',
}.get(image_set + year, image_set + year)
self._image_index = self._load_image_set_index()
self._annotations = self._load_annotations()
def _get_ann_file(self):
prefix = 'instances' \
if self._image_set.find('test') == -1 \
else 'image_info'
return osp.join(
self._data_path,
'annotations',
prefix + '_' +
self._image_set +
self._year + '.json'
)
def _load_image_set_index(self):
"""Load image ids."""
image_ids = self._COCO.getImgIds()
return image_ids
def _load_annotations(self):
"""Load annotations."""
annotations = [self._load_coco_annotation(index)
for index in self._image_index]
return annotations
def image_path_from_index(self, index):
"""Construct an image path from the image's "index" identifier."""
# Example image path for index=119993:
# images/train2014/COCO_train2014_000000119993.jpg
# images/train2017/000000119993.jpg
filename = str(index).zfill(12) + '.jpg'
if '2014' in self._data_name:
filename = 'COCO_{}_{}'.format(self._data_name, filename)
image_path = osp.join(self._data_path, 'images',
self._data_name, filename)
assert osp.exists(image_path), \
'Path does not exist: {}'.format(image_path)
return image_path
def image_path_at(self, i):
"""Return the absolute path to image i in the image sequence."""
return self.image_path_from_index(self._image_index[i])
def annotation_at(self, i):
"""Return the annotation of image i in the image sequence."""
return self._annotations[i]
def _load_coco_annotation(self, index):
"""Loads COCO bounding-box instance annotations."""
im_ann = self._COCO.loadImgs(index)[0]
width, height = im_ann['width'], im_ann['height']
ann_ids = self._COCO.getAnnIds(imgIds=index, iscrowd=None)
objects = self._COCO.loadAnns(ann_ids)
# Sanitize boxes -- some are invalid
valid_objects = []
mask, polygons = b'', []
for obj in objects:
x1 = float(max(0, obj['bbox'][0]))
y1 = float(max(0, obj['bbox'][1]))
x2 = float(min(width - 1, x1 + max(0, obj['bbox'][2] - 1)))
y2 = float(min(height - 1, y1 + max(0, obj['bbox'][3] - 1)))
if isinstance(obj['segmentation'], list):
for p in obj['segmentation']:
if len(p) < 6:
print('Remove Invalid segm.')
# Valid polygons have >= 3 points, so require >= 6 coordinates
polygons = [p for p in obj['segmentation'] if len(p) >= 6]
else:
# Crowd masks
# Some are encoded with height or width
# running out of the image bound
# Do not use them or decoding error is inevitable
mask = mask_utils.poly2bytes(obj['segmentation'], height, width)
if obj['area'] > 0 and x2 > x1 and y2 > y1:
obj['clean_bbox'] = [x1, y1, x2, y2]
valid_objects.append({
'bbox': [x1, y1, x2, y2],
'mask': mask,
'polygons': polygons,
'category_id': obj['category_id'],
'class_id': self._cat_id_to_class_id[obj['category_id']],
'crowd': obj['iscrowd'],
})
valid_objects[-1]['name'] = \
self._ind_to_class[valid_objects[-1]['class_id']]
return height, width, valid_objects
@property
def num_images(self):
return len(self._image_index)
@property
def num_classes(self):
return len(self._classes)
def make_database(split, year, data_dir):
coco = COCOWrapper(split, year, data_dir)
print('Preparing to make split: {}, total {} images'
.format(split, coco.num_images))
if not osp.exists(osp.join(coco._data_path, 'splits')):
os.makedirs(osp.join(coco._data_path, 'splits'))
entries = collections.OrderedDict()
for i in range(coco.num_images):
filename = osp.basename(coco.image_path_at(i)).split('.')[0]
h, w, objects = coco.annotation_at(i)
entries[filename] = objects
with open(osp.join('build',
'coco_' + year + '_' + split +
'.db.pkl'), 'wb') as f:
pickle.dump(entries, f, pickle.HIGHEST_PROTOCOL)
with open(osp.join(coco._data_path, 'splits',
split + year + '.txt'), 'w') as f:
for i in range(coco.num_images):
filename = str(osp.basename(coco.image_path_at(i)).split('.')[0])
if i != coco.num_images - 1:
filename += '\n'
f.write(filename)
def merge_database(split, year, db_files):
entries = collections.OrderedDict()
data_path = os.path.dirname(db_files[0])
for db_file in db_files:
with open(db_file, 'rb') as f:
entries.update(pickle.load(f))
with open(osp.join(data_path,
'coco_' + year + '_' + split +
'.db.pkl'), 'wb') as f:
pickle.dump(entries, f, pickle.HIGHEST_PROTOCOL)
# Prepare Datasets
## Create Datasets for PASCAL VOC
We assume that the raw dataset has the following structure:
```
VOC<year>
|_ JPEGImages
| |_ <im-1-name>.jpg
| |_ ...
| |_ <im-N-name>.jpg
|_ Annotations
| |_ <im-1-name>.xml
| |_ ...
| |_ <im-N-name>.xml
|_ ImageSets
| |_ Main
| | |_ trainval.txt
| | |_ test.txt
| | |_ ...
```
Create the record and JSON datasets with:
```bash
python pascal_voc.py \
--rec /path/to/datasets/voc_trainval0712 \
--gt /path/to/datasets/voc_trainval0712.json \
--images /path/to/VOC2007/JPEGImages \
/path/to/VOC2012/JPEGImages \
--annotations /path/to/VOC2007/Annotations \
/path/to/VOC2012/Annotations \
--splits /path/to/VOC2007/ImageSets/Main/trainval.txt \
/path/to/VOC2012/ImageSets/Main/trainval.txt
```
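As a rough illustration of how the `--images`, `--annotations`, and `--splits` arguments relate, the sketch below pairs each name listed in a split file with an image path and an XML path. The helper name `scan_entries` is hypothetical and the blank-line handling is an assumption; it mirrors the idea of the script rather than reproducing it.

```python
import os


def scan_entries(split_lines, images_dir, annotations_dir):
    """Pair every name in a split listing with its image and XML path."""
    entries = []
    for line in split_lines:
        name = line.strip()
        if not name:
            continue  # Assumption: blank lines are skipped.
        entries.append((os.path.join(images_dir, name + '.jpg'),
                        os.path.join(annotations_dir, name + '.xml')))
    return entries


# Two names as they would appear in ImageSets/Main/trainval.txt.
entries = scan_entries(['000005\n', '000007\n'], 'JPEGImages', 'Annotations')
print(entries[0])
```

Each folder passed to `--images` and `--annotations` is matched by position with the split file passed to `--splits`, so the three lists should have the same length.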
## Create Datasets for COCO
We assume that the raw dataset has the following structure:
```
COCO
|_ images
| |_ train2017
| | |_ <im-1-name>.jpg
| | |_ ...
| | |_ <im-N-name>.jpg
|_ annotations
| |_ instances_train2017.json
| |_ ...
```
Create the record dataset with:
```bash
python coco.py \
--rec /path/to/datasets/coco_train2017 \
--images /path/to/COCO/images/train2017 \
--annotations /path/to/COCO/annotations/instances_train2017.json
```
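COCO annotations store each box as `[x, y, width, height]`, while the record stores clipped corner coordinates. The sketch below shows that conversion as it appears in the preparation script; the function name `xywh_to_xyxy` is hypothetical and is used here only for illustration.

```python
def xywh_to_xyxy(bbox, width, height):
    """Convert a COCO [x, y, w, h] box to inclusive corners clipped to the image."""
    x, y, w, h = bbox
    x1 = float(max(0, x))
    y1 = float(max(0, y))
    # The far corner is inclusive, hence the "- 1" on the box extent.
    x2 = float(min(width - 1, x1 + max(0, w - 1)))
    y2 = float(min(height - 1, y1 + max(0, h - 1)))
    return [x1, y1, x2, y2]


print(xywh_to_xyxy([10, 20, 30, 40], width=100, height=50))
# [10.0, 20.0, 39.0, 49.0]
```

In this example the bottom edge is clipped from 59 to 49 because the image is only 50 pixels high.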
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Prepare MS COCO datasets."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import os
import sys
import time
import dragon
from pycocotools.coco import COCO
from pycocotools.mask import frPyObjects
def parse_args():
"""Parse arguments."""
parser = argparse.ArgumentParser(
description='Prepare MS COCO datasets')
parser.add_argument(
'--rec',
default=None,
help='path to write record dataset')
parser.add_argument(
'--images',
nargs='+',
type=str,
default=None,
help='path of images folder')
parser.add_argument(
'--annotations',
nargs='+',
type=str,
default=None,
help='path of annotations file')
parser.add_argument(
'--splits',
nargs='+',
type=str,
default=None,
help='path of split file')
if len(sys.argv) == 1:
parser.print_help()
sys.exit(1)
return parser.parse_args()
def make_example(img_id, img_file, cocoGt):
"""Return the record example."""
img_meta = cocoGt.imgs[img_id]
img_anns = cocoGt.loadAnns(cocoGt.getAnnIds(imgIds=[img_id]))
cat_id_to_cat = dict((v['id'], v['name'])
for v in cocoGt.cats.values())
with open(img_file, 'rb') as f:
img_bytes = bytes(f.read())
height, width = img_meta['height'], img_meta['width']
example = {'id': str(img_id), 'height': height, 'width': width,
'depth': 3, 'content': img_bytes, 'object': []}
for ann in img_anns:
x1 = float(max(0, ann['bbox'][0]))
y1 = float(max(0, ann['bbox'][1]))
x2 = float(min(width - 1, x1 + max(0, ann['bbox'][2] - 1)))
y2 = float(min(height - 1, y1 + max(0, ann['bbox'][3] - 1)))
mask, polygons = b'', []
segm = ann.get('segmentation', None)
if segm is not None and isinstance(segm, list):
for p in ann['segmentation']:
if len(p) < 6:
print('Remove Invalid segm.')
# Valid polygons have >= 3 points, so require >= 6 coordinates
polygons = [p for p in ann['segmentation'] if len(p) >= 6]
elif segm is not None:
# Crowd masks.
# Some are encoded with wrong height or width.
# Do not use them or decoding error is inevitable.
rle = frPyObjects(ann['segmentation'], height, width)
assert isinstance(rle, dict)
mask = rle['counts']
example['object'].append({
'name': cat_id_to_cat[ann['category_id']],
'xmin': x1, 'ymin': y1, 'xmax': x2, 'ymax': y2,
'mask': mask, 'polygons': polygons,
'difficult': ann.get('iscrowd', 0)})
return example
def write_dataset(args):
assert len(args.images) == len(args.annotations)
if os.path.exists(args.rec):
raise ValueError('The record path already exists.')
os.makedirs(args.rec)
print('Write record dataset to {}'.format(args.rec))
writer = dragon.io.KPLRecordWriter(
path=args.rec,
protocol={
'id': 'string',
'content': 'bytes',
'height': 'int64',
'width': 'int64',
'depth': 'int64',
'object': [{
'name': 'string',
'xmin': 'float64',
'ymin': 'float64',
'xmax': 'float64',
'ymax': 'float64',
'mask': 'bytes',
'polygons': [['float64']],
'difficult': 'int64',
}]
}
)
# Scan all available entries.
print('Scan entries...')
entries, cocoGts = [], []
for ann_file in args.annotations:
cocoGts.append(COCO(ann_file))
if args.splits is not None:
assert len(args.splits) == len(args.images)
for i, split in enumerate(args.splits):
f = open(split, 'r')
for line in f.readlines():
filename = line.strip()
img_id = int(filename)
img_file = os.path.join(args.images[i], filename + '.jpg')
entries.append((img_id, img_file, cocoGts[i]))
f.close()
else:
for i, cocoGt in enumerate(cocoGts):
for info in cocoGt.imgs.values():
img_id = info['id']
img_file = os.path.join(args.images[i], info['file_name'])
entries.append((img_id, img_file, cocoGts[i]))
print('Start Time:', time.strftime("%a, %d %b %Y %H:%M:%S", time.gmtime()))
start_time = time.time()
for i, entry in enumerate(entries):
if i > 0 and i % 2000 == 0:
now_time = time.time()
print('{} / {} in {:.2f} sec'.format(
i, len(entries), now_time - start_time))
writer.write(make_example(*entry))
now_time = time.time()
print('{} / {} in {:.2f} sec'.format(
len(entries), len(entries), now_time - start_time))
writer.close()
end_time = time.time()
data_size = os.path.getsize(args.rec + '/root.data') * 1e-6
print('{} images take {:.2f} MB in {:.2f} sec.'
.format(len(entries), data_size, end_time - start_time))
if __name__ == '__main__':
args = parse_args()
if args.rec is not None:
write_dataset(args)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Prepare JSON datasets."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import json
import os
import sys
import dragon
def parse_args():
"""Parse arguments."""
parser = argparse.ArgumentParser(
description='Prepare JSON datasets')
parser.add_argument(
'--rec',
default=None,
help='path to read record')
parser.add_argument(
'--gt',
default=None,
help='path to write json ground-truth')
parser.add_argument(
'--categories',
nargs='+',
type=str,
default=None,
help='dataset object categories')
if len(sys.argv) == 1:
parser.print_help()
sys.exit(1)
return parser.parse_args()
def get_image_id(image_name):
image_id = image_name.split('_')[-1].split('.')[0]
try:
return int(image_id)
except ValueError:
return image_name
def write_dataset(args):
dataset = {'images': [], 'categories': [], 'annotations': []}
kpl_dataset = dragon.io.KPLRecordDataset(args.rec)
cat_to_cat_id = dict(zip(args.categories,
range(1, len(args.categories) + 1)))
print('Writing json dataset to {}'.format(args.gt))
for cat in args.categories:
dataset['categories'].append({
'name': cat, 'id': cat_to_cat_id[cat]})
for _ in range(len(kpl_dataset)):
example = kpl_dataset.get()
image_id = get_image_id(example['id'])
dataset['images'].append({
'id': image_id, 'height': example['height'],
'width': example['width']})
for obj in example['object']:
if 'x2' in obj:
x1, y1, x2, y2 = obj['x1'], obj['y1'], obj['x2'], obj['y2']
elif 'xmin' in obj:
x1, y1, x2, y2 = obj['xmin'], obj['ymin'], obj['xmax'], obj['ymax']
else:
x1, y1, x2, y2 = obj['bbox']
w, h = x2 - x1 + 1, y2 - y1 + 1
dataset['annotations'].append({
'id': str(len(dataset['annotations'])),
'bbox': [x1, y1, w, h],
'area': w * h,
'iscrowd': obj.get('difficult', 0),
'image_id': image_id,
'category_id': cat_to_cat_id[obj['name']]})
with open(args.gt, 'w') as f:
json.dump(dataset, f)
if __name__ == '__main__':
args = parse_args()
if args.rec is None or not os.path.exists(args.rec):
raise ValueError('Specify the prepared record dataset.')
if args.gt is None:
raise ValueError('Specify the path to write json dataset.')
write_dataset(args)
......@@ -8,27 +8,67 @@
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Prepare PASCAL VOC datasets."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import os
import sys
import time
import cv2
import dragon
import numpy as np
import xml.etree.ElementTree as ET
import xml.etree.ElementTree
def make_example(image_file, xml_file):
tree = ET.parse(xml_file)
def parse_args():
"""Parse arguments."""
parser = argparse.ArgumentParser(
description='Prepare PASCAL VOC datasets')
parser.add_argument(
'--rec',
default=None,
help='path to write record dataset')
parser.add_argument(
'--gt',
default=None,
help='path to write json dataset')
parser.add_argument(
'--images',
nargs='+',
type=str,
default=None,
help='path of images folder')
parser.add_argument(
'--annotations',
nargs='+',
type=str,
default=None,
help='path of annotations folder')
parser.add_argument(
'--splits',
nargs='+',
type=str,
default=None,
help='path of split file')
if len(sys.argv) == 1:
parser.print_help()
sys.exit(1)
return parser.parse_args()
def make_example(img_file, xml_file):
"""Return the record example."""
tree = xml.etree.ElementTree.parse(xml_file)
filename = os.path.split(xml_file)[-1]
objs = tree.findall('object')
objects = tree.findall('object')
size = tree.find('size')
example = {'id': filename.split('.')[0], 'object': []}
with open(image_file, 'rb') as f:
with open(img_file, 'rb') as f:
img_bytes = bytes(f.read())
if size is not None:
example['height'] = int(size.find('height').text)
......@@ -38,7 +78,7 @@ def make_example(image_file, xml_file):
img = cv2.imdecode(np.frombuffer(img_bytes, 'uint8'), 3)
example['height'], example['width'], example['depth'] = img.shape
example['content'] = img_bytes
for ix, obj in enumerate(objs):
for obj in objects:
bbox = obj.find('bndbox')
is_diff = 0
if obj.find('difficult') is not None:
......@@ -49,35 +89,21 @@ def make_example(image_file, xml_file):
'ymin': float(bbox.find('ymin').text),
'xmax': float(bbox.find('xmax').text),
'ymax': float(bbox.find('ymax').text),
'difficult': is_diff,
})
'difficult': is_diff})
return example
def make_record(
record_file,
images_path,
annotations_path,
splits_path,
splits
):
if os.path.exists(record_file):
raise ValueError('The record file already exists.')
os.makedirs(record_file)
if not isinstance(images_path, list):
images_path = [images_path]
if not isinstance(annotations_path, list):
annotations_path = [annotations_path]
if not isinstance(splits_path, list):
splits_path = [splits_path]
assert len(splits) == len(splits_path)
assert len(splits) == len(images_path)
assert len(splits) == len(annotations_path)
def write_dataset(args):
"""Write the record dataset."""
assert len(args.splits) == len(args.images)
assert len(args.splits) == len(args.annotations)
if os.path.exists(args.rec):
raise ValueError('The record path already exists.')
os.makedirs(args.rec)
print('Write record dataset to {}'.format(args.rec))
writer = dragon.io.KPLRecordWriter(
path=record_file,
path=args.rec,
protocol={
'id': 'string',
'content': 'bytes',
......@@ -95,36 +121,56 @@ def make_record(
}
)
# Scan all available entries
# Scan all available entries.
print('Scan entries...')
entries = []
for i, split in enumerate(splits):
split_file = os.path.join(splits_path[i], split + '.txt')
with open(split_file, 'r') as f:
for i, split in enumerate(args.splits):
with open(split, 'r') as f:
lines = f.readlines()
for line in lines:
filename = line.strip()
img_file = os.path.join(images_path[i], filename + '.jpg')
ann_file = os.path.join(annotations_path[i], filename + '.xml')
img_file = os.path.join(args.images[i], filename + '.jpg')
ann_file = os.path.join(args.annotations[i], filename + '.xml')
entries.append((img_file, ann_file))
# Parse and write into record file
# Parse and write into record file.
print('Start Time:', time.strftime("%a, %d %b %Y %H:%M:%S", time.gmtime()))
start_time = time.time()
for i, (img_file, ann_file) in enumerate(entries):
for i, (img_file, xml_file) in enumerate(entries):
if i > 0 and i % 2000 == 0:
now_time = time.time()
print('{} / {} in {:.2f} sec'.format(
i, len(entries), now_time - start_time))
writer.write(make_example(img_file, ann_file))
writer.write(make_example(img_file, xml_file))
now_time = time.time()
print('{} / {} in {:.2f} sec'.format(
len(entries), len(entries), now_time - start_time))
writer.close()
end_time = time.time()
data_size = os.path.getsize(record_file + '/root.data') * 1e-6
data_size = os.path.getsize(args.rec + '/root.data') * 1e-6
print('{} images take {:.2f} MB in {:.2f} sec.'
.format(len(entries), data_size, end_time - start_time))
def write_json_dataset(args):
"""Write the json dataset."""
categories = ['aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
import subprocess
script = os.path.dirname(os.path.abspath(__file__)) + '/json_dataset.py'
cmd = '{} {} '.format(sys.executable, script)
cmd += '--rec {} --gt {} '.format(args.rec, args.gt)
cmd += '--categories {} '.format(' '.join(categories))
return subprocess.call(cmd, shell=True)
if __name__ == '__main__':
args = parse_args()
if args.rec is not None:
write_dataset(args)
if args.gt is not None:
write_json_dataset(args)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import time
import cv2
import dragon
import numpy as np
import xml.etree.ElementTree as ET
def make_example(image_file, xml_file):
tree = ET.parse(xml_file)
filename = os.path.split(xml_file)[-1]
objs = tree.findall('object')
example = {'id': filename.split('.')[0], 'object': []}
with open(image_file, 'rb') as f:
img_bytes = bytes(f.read())
img = cv2.imdecode(np.frombuffer(img_bytes, 'uint8'), 1)
example['height'], example['width'], example['depth'] = img.shape
example['content'] = img_bytes
for ix, obj in enumerate(objs):
bbox = obj.find('bndbox')
is_diff = 0
if obj.find('difficult') is not None:
is_diff = int(obj.find('difficult').text) == 1
example['object'].append({
'name': obj.find('name').text.strip(),
'x1': float(bbox.find('x1').text),
'y1': float(bbox.find('y1').text),
'x2': float(bbox.find('x2').text),
'y2': float(bbox.find('y2').text),
'x3': float(bbox.find('x3').text),
'y3': float(bbox.find('y3').text),
'x4': float(bbox.find('x4').text),
'y4': float(bbox.find('y4').text),
'difficult': is_diff,
})
return example
def make_record(
record_file,
images_path,
annotations_path,
splits_path,
splits
):
if os.path.exists(record_file):
raise ValueError('The record file already exists.')
os.makedirs(record_file)
if not isinstance(images_path, list):
images_path = [images_path]
if not isinstance(annotations_path, list):
annotations_path = [annotations_path]
if not isinstance(splits_path, list):
splits_path = [splits_path]
assert len(splits) == len(splits_path)
assert len(splits) == len(images_path)
assert len(splits) == len(annotations_path)
print('Start Time:', time.strftime("%a, %d %b %Y %H:%M:%S", time.gmtime()))
writer = dragon.io.KPLRecordWriter(
path=record_file,
protocol={
'id': 'string',
'content': 'bytes',
'height': 'int64',
'width': 'int64',
'depth': 'int64',
'object': [{
'name': 'string',
'x1': 'float64',
'y1': 'float64',
'x2': 'float64',
'y2': 'float64',
'x3': 'float64',
'y3': 'float64',
'x4': 'float64',
'y4': 'float64',
'difficult': 'int64',
}]
}
)
# Scan all available entries
print('Scan entries...')
entries = []
for i, split in enumerate(splits):
split_file = os.path.join(splits_path[i], split + '.txt')
with open(split_file, 'r') as f:
lines = f.readlines()
for line in lines:
filename = line.strip()
img_file = os.path.join(images_path[i], filename + '.jpg')
ann_file = os.path.join(annotations_path[i], filename + '.xml')
entries.append((img_file, ann_file))
# Parse and write into record file
print('Start Time:', time.strftime("%a, %d %b %Y %H:%M:%S", time.gmtime()))
start_time = time.time()
for i, (img_file, ann_file) in enumerate(entries):
if i > 0 and i % 2000 == 0:
now_time = time.time()
print('{} / {} in {:.2f} sec'.format(
i, len(entries), now_time - start_time))
writer.write(make_example(img_file, ann_file))
now_time = time.time()
print('{} / {} in {:.2f} sec'.format(
len(entries), len(entries), now_time - start_time))
writer.close()
end_time = time.time()
data_size = os.path.getsize(record_file + '/root.data') * 1e-6
print('{} images take {:.2f} MB in {:.2f} sec.'
.format(len(entries), data_size, end_time - start_time))
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Make record file for VOC dataset."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from os import path as osp
from maker import make_record
if __name__ == '__main__':
voc_root = '/data'
make_record(
record_file=osp.join(voc_root, 'voc_0712_trainval'),
images_path=[osp.join(voc_root, 'VOCdevkit2007/VOC2007/JPEGImages'),
osp.join(voc_root, 'VOCdevkit2012/VOC2012/JPEGImages')],
annotations_path=[osp.join(voc_root, 'VOCdevkit2007/VOC2007/Annotations'),
osp.join(voc_root, 'VOCdevkit2012/VOC2012/Annotations')],
splits_path=[osp.join(voc_root, 'VOCdevkit2007/VOC2007/ImageSets/Main'),
osp.join(voc_root, 'VOCdevkit2012/VOC2012/ImageSets/Main')],
splits=['trainval', 'trainval']
)
make_record(
record_file=osp.join(voc_root, 'voc_2007_test'),
images_path=osp.join(voc_root, 'VOCdevkit2007/VOC2007/JPEGImages'),
annotations_path=osp.join(voc_root, 'VOCdevkit2007/VOC2007/Annotations'),
splits_path=osp.join(voc_root, 'VOCdevkit2007/VOC2007/ImageSets/Main'),
splits=['test']
)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.core.config import cfg
class AnchorSampler(object):
"""Sample precomputed anchors asynchronously."""
def __init__(self):
self._rpn_target = None
self._retinanet_target = None
self._ssd_target = None
if 'rcnn' in cfg.MODEL.TYPE:
from seetadet.algo.faster_rcnn import anchor_target
self._rpn_target = anchor_target.AnchorTarget()
elif cfg.MODEL.TYPE == 'retinanet':
from seetadet.algo.retinanet import anchor_target
self._retinanet_target = anchor_target.AnchorTarget()
elif cfg.MODEL.TYPE == 'ssd':
from seetadet.algo.ssd import anchor_target
self._ssd_target = anchor_target.AnchorTarget()
def __call__(self, **inputs):
"""Return the sampled anchors."""
if self._rpn_target:
fg_inds, bg_inds = \
self._rpn_target.sample_anchors(
gt_boxes=inputs['gt_boxes'],
im_info=inputs['im_info'],
)
return {'fg_inds': fg_inds, 'bg_inds': bg_inds}
if self._retinanet_target:
fg_inds, ignore_inds = \
self._retinanet_target.sample_anchors(
gt_boxes=inputs['gt_boxes'],
im_info=inputs['im_info'],
)
return {'fg_inds': fg_inds, 'bg_inds': ignore_inds}
if self._ssd_target:
fg_inds, neg_inds = \
self._ssd_target.sample_anchors(
gt_boxes=inputs['gt_boxes'],
)
return {'fg_inds': fg_inds, 'bg_inds': neg_inds}
return {}
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import math
import numpy as np
import numpy.random as npr
from seetadet.algo.faster_rcnn import generate_anchors as anchor_util
from seetadet.algo.faster_rcnn import utils as rcnn_util
from seetadet.core.config import cfg
from seetadet.utils import boxes as box_util
from seetadet.utils.env import new_tensor
class AnchorTarget(object):
"""Assign ground-truth targets to anchors."""
def __init__(self):
super(AnchorTarget, self).__init__()
# Load the basic configs
self.scales = cfg.RPN.SCALES
self.strides = cfg.RPN.STRIDES
self.ratios = cfg.RPN.ASPECT_RATIOS
self.num_strides = len(self.strides)
# Generate base anchors
self.base_anchors = []
for i in range(self.num_strides):
self.base_anchors.append(
anchor_util.generate_anchors(
self.strides[i],
self.ratios,
np.array([self.scales[i]])
if self.num_strides > 1
else np.array(self.scales)))
# Plan the maximum shifted anchor layout
max_size = max(cfg.TRAIN.MAX_SIZE, max(cfg.TRAIN.SCALES))
if cfg.MODEL.COARSEST_STRIDE > 0:
stride = float(cfg.MODEL.COARSEST_STRIDE)
max_size = int(math.ceil(max_size / stride) * stride)
self.max_shapes = [[math.ceil(max_size / stride)] * 2
for stride in self.strides]
self.all_coords = rcnn_util.get_shifted_coords(
self.max_shapes, self.base_anchors)
self.all_anchors = rcnn_util.get_shifted_anchors(
self.max_shapes, self.base_anchors, self.strides)
def sample_anchors(self, gt_boxes, im_info, all_anchors=None):
if all_anchors is None:
all_anchors = self.all_anchors
# Only keep anchors inside the image
# to get higher quality proposals.
inds_inside = np.where(
(all_anchors[:, 0] >= 0) &
(all_anchors[:, 1] >= 0) &
(all_anchors[:, 2] < im_info[1]) &
(all_anchors[:, 3] < im_info[0]))[0]
anchors = all_anchors[inds_inside, :]
num_inside = len(inds_inside)
labels = np.empty((num_inside,), 'int32')
labels.fill(-1)
# Overlaps between the anchors and the gt boxes.
overlaps = box_util.bbox_overlaps(anchors, gt_boxes)
argmax_overlaps = overlaps.argmax(axis=1)
max_overlaps = overlaps[np.arange(num_inside), argmax_overlaps]
# Overlaps between the gt boxes and anchors with highest IoU.
gt_argmax_overlaps = overlaps.argmax(axis=0)
gt_max_overlaps = overlaps[gt_argmax_overlaps, np.arange(overlaps.shape[1])]
gt_argmax_overlaps = np.where(overlaps == gt_max_overlaps)[0]
# Foreground: for each gt, anchor with highest overlap.
labels[gt_argmax_overlaps] = 1
# Foreground: above threshold IoU.
labels[max_overlaps >= cfg.RPN.POSITIVE_OVERLAP] = 1
# Background: below threshold IoU.
labels[max_overlaps < cfg.RPN.NEGATIVE_OVERLAP] = 0
# If the background threshold left no foreground, restore the best anchor of each gt.
fg_inds = np.where(labels == 1)[0]
if len(fg_inds) == 0:
labels[gt_argmax_overlaps] = 1
fg_inds = np.where(labels == 1)[0]
# Subsample positive labels if we have too many.
num_fg = int(cfg.RPN.FG_FRACTION * cfg.RPN.BATCH_SIZE)
if len(fg_inds) > num_fg:
fg_inds = npr.choice(fg_inds, num_fg, False)
# Subsample negative labels if we have too many.
num_bg = cfg.RPN.BATCH_SIZE - len(fg_inds)
bg_inds = np.where(labels == 0)[0]
if len(bg_inds) > num_bg:
bg_inds = npr.choice(bg_inds, num_bg, False)
return inds_inside[fg_inds], inds_inside[bg_inds]
def __call__(self, **inputs):
num_images = cfg.TRAIN.IMS_PER_BATCH
shapes = [f.shape[-2:] for f in inputs['features']]
image_stride = sum(self.base_anchors[i].shape[0] * np.prod(shapes[i])
for i in range(len(inputs['features'])))
narrow_args = [self.all_coords, self.base_anchors, self.max_shapes, shapes]
outputs = collections.defaultdict(list)
for ix in range(num_images):
fg_inds = inputs['fg_inds'][ix]
bg_inds = inputs['bg_inds'][ix]
gt_boxes = inputs['gt_boxes'][ix]
# Narrow anchors to match the feature layout
anchors = self.all_anchors[fg_inds]
bg_inds = rcnn_util.narrow_anchors(*(narrow_args + [bg_inds]))
_, anchors = rcnn_util.narrow_anchors(*(narrow_args + [fg_inds, anchors]))
fg_inds = rcnn_util.narrow_anchors(*(narrow_args + [fg_inds]))
# Compute bbox targets
gt_assignment = box_util.bbox_overlaps(anchors, gt_boxes).argmax(axis=1)
bbox_targets = box_util.bbox_transform(anchors, gt_boxes[gt_assignment, :4])
outputs['bbox_anchors'].append(anchors)
outputs['bbox_targets'].append(bbox_targets)
# Compute sparse indices
fg_inds += ix * image_stride
bg_inds += ix * image_stride
outputs['cls_inds'].extend([fg_inds, bg_inds])
outputs['bbox_inds'].extend([fg_inds])
outputs['labels'].extend([np.ones_like(fg_inds, 'float32'),
np.zeros_like(bg_inds, 'float32')])
return {
'labels': new_tensor(
np.concatenate(outputs['labels'])),
'cls_inds': new_tensor(
np.concatenate(outputs['cls_inds'])),
'bbox_inds': new_tensor(
np.concatenate(outputs['bbox_inds'])),
'bbox_targets': new_tensor(
np.concatenate(outputs['bbox_targets']).astype('float32')),
'bbox_anchors': new_tensor(
np.concatenate(outputs['bbox_anchors']).astype('float32')),
}
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import multiprocessing as mp
import time
import threading
import queue
import dragon
import dragon.vm.torch as torch
import numpy as np
from seetadet.algo.faster_rcnn import data_transformer
from seetadet.core.config import cfg
from seetadet.datasets.factory import get_dataset
from seetadet.utils import blob as blob_util
from seetadet.utils import logger
class DataLoader(object):
"""Load mini-batches of data."""
def __init__(self):
super(DataLoader, self).__init__()
dataset = get_dataset(cfg.TRAIN.DATASET)
self.iterator = Iterator(**{
'dataset': dataset.cls,
'source': dataset.source,
'classes': dataset.classes,
'shuffle': cfg.TRAIN.USE_SHUFFLE,
'batch_size': cfg.TRAIN.IMS_PER_BATCH * 2,
'num_transformers': cfg.TRAIN.NUM_THREADS - 1,
})
self.iterator.start()
def __call__(self):
outputs = self.iterator.next()
if isinstance(outputs['image'], np.ndarray):
outputs['image'] = torch.from_numpy(outputs['image'])
return outputs
class Iterator(threading.Thread):
    """Iterator to return the batch of data."""

    def __init__(self, **kwargs):
        super(Iterator, self).__init__()
        # Distributed settings
        rank, group_size = 0, 1
        process_group = dragon.distributed.get_group()
        if process_group is not None and \
                kwargs.get('phase', 'TRAIN') == 'TRAIN':
            group_size = process_group.size
            rank = dragon.distributed.get_rank(process_group)

        # Configuration
        self._batch_size = kwargs.get('batch_size', 2)
        self._num_readers = kwargs.get('num_readers', 1)
        self._num_transformers = kwargs.get('num_transformers', 3)
        self.daemon = True

        # Initialize queues
        num_batches = self._num_readers
        self._queue1 = mp.Queue(num_batches * self._batch_size)
        self._queue2 = mp.Queue(num_batches * self._batch_size)
        self._queue3 = queue.Queue(num_batches)

        # Initialize readers
        self._readers = []
        for i in range(self._num_readers):
            part_idx, num_parts = i, self._num_readers
            num_parts *= group_size
            part_idx += rank * self._num_readers
            self._readers.append(dragon.io.DataReader(
                part_idx=part_idx, num_parts=num_parts, **kwargs))
            self._readers[i]._seed += part_idx
            self._readers[i].q_out = self._queue1
            self._readers[i].start()
            time.sleep(0.1)

        # Initialize transformers
        self._transformers = []
        for i in range(self._num_transformers):
            p = data_transformer.DataTransformer(**kwargs)
            p._seed += (i + rank * self._num_transformers)
            p.q_in, p.q_out = self._queue1, self._queue2
            p.start()
            self._transformers.append(p)
            time.sleep(0.1)

        # Register cleanup callbacks
        def cleanup():
            def terminate(processes):
                for p in processes:
                    p.terminate()
                    p.join()
            terminate(self._transformers)
            logger.info('Terminate DataTransformer.')
            terminate(self._readers)
            logger.info('Terminate DataReader.')
        import atexit
        atexit.register(cleanup)
    def next(self):
        """Return the next batch of data."""
        return self.__next__()

    def run(self):
        """Main loop."""
        num_images = cfg.TRAIN.IMS_PER_BATCH
        num_batches = cfg.TRAIN.ASPECT_GROUPING
        logger.info('Initialize prefetching batches...')
        example_buffer = [self._queue2.get()
                          for _ in range(num_images * num_batches)]
        next_examples = []
        while True:
            # Use cached buffer for next N examples
            # Examples are sorted to simulate aspect grouping
            if len(next_examples) == 0:
                next_examples = example_buffer
                next_examples.sort(key=lambda d: d['aspect_ratio'])
                example_buffer = []
            # Prepare the next batch
            outputs = collections.defaultdict(list)
            for i in range(num_images):
                example = next_examples.pop(0)
                outputs['image'].append(example['image'])
                outputs['gt_boxes'].append(example['boxes'])
                outputs['im_info'].append(example['im_info'])
                outputs['fg_inds'].append(example.get('fg_inds', None))
                outputs['bg_inds'].append(example.get('bg_inds', None))
                example_buffer.append(self._queue2.get())
            outputs['image'] = blob_util.im_list_to_blob(
                outputs['image'], coarsest_stride=cfg.MODEL.COARSEST_STRIDE)
            # Send batch data to consumer
            self._queue3.put(outputs)

    def __iter__(self):
        """Return the iterator self."""
        return self

    def __next__(self):
        """Return the next batch of data."""
        return self._queue3.get()
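
The batching loop in `Iterator.run` can be sketched without queues or processes. The following is a minimal illustration, not the project's code: `grouped_batches` is a hypothetical helper, a plain Python iterator stands in for the transformer output queue, and the loop stops when the source is exhausted (the real loop runs forever). It shows how a buffer of `ims_per_batch * aspect_grouping` examples is sorted by aspect ratio and consumed batch by batch while the next buffer is refilled.

```python
import collections
import itertools


def grouped_batches(examples, ims_per_batch=2, aspect_grouping=2):
    """Yield batches drawn from a buffer sorted by aspect ratio.

    `examples` is any iterable of dicts with an 'aspect_ratio' key.
    It is an assumption standing in for the transformer output queue.
    """
    source = iter(examples)
    buffer_size = ims_per_batch * aspect_grouping
    example_buffer = list(itertools.islice(source, buffer_size))
    next_examples = []
    while True:
        if len(next_examples) == 0:
            # Swap in the refilled buffer and sort it so that
            # neighbouring examples have similar aspect ratios.
            next_examples = example_buffer
            next_examples.sort(key=lambda d: d['aspect_ratio'])
            example_buffer = []
        if len(next_examples) < ims_per_batch:
            return  # Source exhausted: drop the incomplete tail.
        batch = collections.defaultdict(list)
        for _ in range(ims_per_batch):
            example = next_examples.pop(0)
            batch['aspect_ratio'].append(example['aspect_ratio'])
            # Refill the buffer one example per consumed example.
            refill = next(source, None)
            if refill is not None:
                example_buffer.append(refill)
        yield dict(batch)


ratios = [1.5, 0.5, 1.4, 0.6, 0.7, 1.3, 1.6, 0.4]
batches = list(grouped_batches({'aspect_ratio': r} for r in ratios))
grouped = [b['aspect_ratio'] for b in batches]
```

Each batch pairs examples with similar aspect ratios, which limits the padding needed when the images are stacked into one blob.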
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import multiprocessing
import cv2
import numpy as np
import numpy.random as npr
from seetadet.algo import common as algo_common
from seetadet.core.config import cfg
from seetadet.datasets.example import Example
from seetadet.utils import boxes as box_util
from seetadet.utils import image as image_util
class DataTransformer(multiprocessing.Process):
    """DataTransformer."""

    def __init__(self, **kwargs):
        super(DataTransformer, self).__init__()
        self._scales = cfg.TRAIN.SCALES
        self._random_scales = cfg.TRAIN.RANDOM_SCALES
        self._max_size = cfg.TRAIN.MAX_SIZE
        self._seed = cfg.RNG_SEED
        self._use_diff = cfg.TRAIN.USE_DIFF
        self._use_flipped = cfg.TRAIN.USE_FLIPPED
        self._use_distort = cfg.TRAIN.USE_COLOR_JITTER
        self._classes = kwargs.get('classes', ('__background__',))
        self._num_classes = len(self._classes)
        self._class_to_ind = dict(zip(self._classes, range(self._num_classes)))
        self._anchor_sampler = algo_common.AnchorSampler()
        self.q_in = self.q_out = None
        self.daemon = True
    def get_boxes(self, example, im_scale, im_offset, flipped):
        objects, num_objects = example.objects, 0
        height, width = example.height, example.width
        if not self._use_diff:
            for obj in objects:
                if obj.get('difficult', 0) == 0:
                    num_objects += 1
        else:
            num_objects = len(objects)
        boxes = np.zeros((num_objects, 4), 'float32')
        gt_classes = np.zeros((num_objects,), 'float32')
        # Filter the difficult instances.
        object_idx = 0
        for obj in objects:
            if not self._use_diff and obj.get('difficult', 0) > 0:
                continue
            bbox = obj['bbox']
            boxes[object_idx, :] = [max(0, bbox[0]),
                                    max(0, bbox[1]),
                                    min(bbox[2], width - 1),
                                    min(bbox[3], height - 1)]
            gt_classes[object_idx] = self._class_to_ind[obj['name']]
            object_idx += 1
        # Flip the boxes if necessary.
        if flipped:
            boxes = box_util.flip_boxes(boxes, width)
        # Scale the boxes to the detecting scale.
        boxes *= im_scale
        # Offset the boxes to align the cropping.
        if im_offset is not None:
            boxes[:, 0::2] += im_offset[1]
            boxes[:, 1::2] += im_offset[0]
            boxes[:, :] = np.minimum(
                np.maximum(boxes[:, :], 0),
                [im_offset[2][1] - 1, im_offset[2][0] - 1] * 2)
        # Attach the classes.
        gt_boxes = np.empty((num_objects, 5), dtype=np.float32)
        gt_boxes[:, :4], gt_boxes[:, 4] = boxes, gt_classes
        return gt_boxes
    def get(self, example):
        example = Example(example)
        # Resize.
        target_size = npr.choice(self._scales)
        img, im_scale = image_util.resize_image_with_target_size(
            example.image,
            target_size=target_size,
            max_size=self._max_size,
            random_scales=self._random_scales,
        )
        # Flip.
        flipped = False
        if self._use_flipped and npr.randint(2) > 0:
            img = img[:, ::-1]
            flipped = True
        # Crop or Pad.
        im_offset = None
        if self._max_size == 0:
            img, im_offset = image_util.get_image_with_target_size(
                img, target_size)
        # Distort.
        if self._use_distort:
            img = image_util.distort_image(img)
        # Boxes.
        boxes = self.get_boxes(example, im_scale, im_offset, flipped)
        # Standard outputs.
        outputs = {'image': img,
                   'boxes': boxes,
                   'im_info': img.shape[:2] + (im_scale,)}
        # Attach precomputed targets.
        if len(boxes) > 0:
            outputs.update(
                self._anchor_sampler(
                    gt_boxes=boxes,
                    im_info=outputs['im_info']))
        return outputs
    def run(self):
        # Disable the opencv threading.
        cv2.setNumThreads(1)
        # Fix the process-local random seed.
        np.random.seed(self._seed)
        # Main prefetch loop
        while True:
            outputs = self.get(self.q_in.get())
            if len(outputs['boxes']) < 1:
                continue  # Ignore non-object image.
            height, width = outputs['image'].shape[:2]
            outputs['aspect_ratio'] = float(height) / float(width)
            self.q_out.put(outputs)
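
The box handling in `get_boxes` (clip to the image, optionally flip, then scale) can be illustrated in isolation. This is a rough sketch under stated assumptions: `transform_boxes` is a hypothetical helper, and the `flip_boxes` shown here uses the common `width - x - 1` mirroring convention, which is assumed rather than taken from `seetadet.utils.boxes`. The crop/pad offset step is left out.

```python
import numpy as np


def flip_boxes(boxes, width):
    """Mirror (x1, y1, x2, y2) boxes horizontally (assumed convention)."""
    flipped = boxes.copy()
    flipped[:, 0] = width - boxes[:, 2] - 1
    flipped[:, 2] = width - boxes[:, 0] - 1
    return flipped


def transform_boxes(boxes, width, height, im_scale, flipped=False):
    """Clip boxes to the image, flip them if requested, then scale."""
    boxes = np.array(boxes, dtype=np.float32)
    # Clip the corners to the original image bounds.
    boxes[:, :2] = np.maximum(boxes[:, :2], 0)
    boxes[:, 2] = np.minimum(boxes[:, 2], width - 1)
    boxes[:, 3] = np.minimum(boxes[:, 3], height - 1)
    # Flip in original image coordinates, before scaling.
    if flipped:
        boxes = flip_boxes(boxes, width)
    # Scale the boxes to the resized image.
    return boxes * im_scale


# One box in a 100x80 image, flipped and resized by a factor of 2.
out = transform_boxes([[10, 20, 50, 60]], width=100, height=80,
                      im_scale=2.0, flipped=True)
```

The flip is applied before scaling because `width` refers to the original image, not the resized one.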
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# Codes are based on:
#
# <https://github.com/rbgirshick/py-faster-rcnn/blob/master/lib/rpn/generate_anchors.py>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
# Verify that we compute the same anchors as Shaoqing's matlab implementation:
#
# >> load output/rpn_cachedir/faster_rcnn_VOC2007_ZF_stage1_rpn/anchors.mat
# >> anchors
#
# anchors =
#
# -83 -39 100 56
# -175 -87 192 104
# -359 -183 376 200
# -55 -55 72 72
# -119 -119 136 136
# -247 -247 264 264
# -35 -79 52 96
# -79 -167 96 184
# -167 -343 184 360
# array([[ -83., -39., 100., 56.],
# [-175., -87., 192., 104.],
# [-359., -183., 376., 200.],
# [ -55., -55., 72., 72.],
# [-119., -119., 136., 136.],
# [-247., -247., 264., 264.],
# [ -35., -79., 52., 96.],
# [ -79., -167., 96., 184.],
# [-167., -343., 184., 360.]])
def generate_anchors(
    base_size=16,
    ratios=(0.5, 1, 2),
    scales=2**np.arange(3, 6),
):
    """
    Generate anchor (reference) windows by enumerating aspect ratios X
    scales wrt a reference (0, 0, 15, 15) window.
    """
    base_anchor = np.array([1, 1, base_size, base_size]) - 1
    ratio_anchors = _ratio_enum(base_anchor, ratios)
    anchors = np.vstack([_scale_enum(ratio_anchors[i, :], scales)
                         for i in range(ratio_anchors.shape[0])])
    return anchors
def generate_anchors_v2(
    stride=16,
    ratios=(0.5, 1, 2),
    sizes=(32, 64, 128, 256, 512),
):
    """
    Generates a matrix of anchor boxes in (x1, y1, x2, y2) format. Anchors
    are centered on stride / 2, have (approximate) sqrt areas of the specified
    sizes, and aspect ratios as given.
    """
    # ``np.float`` was removed in NumPy 1.24; use ``np.float64`` instead.
    return generate_anchors(
        base_size=stride,
        ratios=ratios,
        scales=np.array(sizes, dtype=np.float64) / stride,
    )
def _whctrs(anchor):
    """Return width, height, x center, and y center for an anchor (window)."""
    w = anchor[2] - anchor[0] + 1
    h = anchor[3] - anchor[1] + 1
    x_ctr = anchor[0] + 0.5 * (w - 1)
    y_ctr = anchor[1] + 0.5 * (h - 1)
    return w, h, x_ctr, y_ctr


def _mkanchors(ws, hs, x_ctr, y_ctr):
    """
    Given a vector of widths (ws) and heights (hs) around a center
    (x_ctr, y_ctr), output a set of anchors (windows).
    """
    ws = ws[:, np.newaxis]
    hs = hs[:, np.newaxis]
    anchors = np.hstack((x_ctr - 0.5 * (ws - 1),
                         y_ctr - 0.5 * (hs - 1),
                         x_ctr + 0.5 * (ws - 1),
                         y_ctr + 0.5 * (hs - 1)))
    return anchors


def _ratio_enum(anchor, ratios):
    """Enumerate a set of anchors for each aspect ratio wrt an anchor."""
    w, h, x_ctr, y_ctr = _whctrs(anchor)
    size = w * h
    size_ratios = size / ratios
    ws = np.round(np.sqrt(size_ratios))
    hs = np.round(ws * ratios)
    anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
    return anchors


def _scale_enum(anchor, scales):
    """Enumerate a set of anchors for each scale wrt an anchor."""
    w, h, x_ctr, y_ctr = _whctrs(anchor)
    ws = w * scales
    hs = h * scales
    anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
    return anchors


if __name__ == '__main__':
    print(generate_anchors())
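
The ratio-then-scale enumeration can also be written as a single vectorized computation. The sketch below is an illustrative restatement, not part of the commit: `anchors_sketch` is a hypothetical name, and it assumes the same rounding and centering rules as the helpers above. With 0-based coordinates the values appear to come out one pixel lower on every coordinate than the MATLAB listing in the header comment, which uses 1-based coordinates.

```python
import numpy as np


def anchors_sketch(base_size=16, ratios=(0.5, 1, 2), scales=(8, 16, 32)):
    """Enumerate ratios x scales around the center of a base window."""
    ratios = np.asarray(ratios, dtype=np.float64)
    scales = np.asarray(scales, dtype=np.float64)
    # Center of the (0, 0, base_size - 1, base_size - 1) window.
    ctr = 0.5 * (base_size - 1)
    # Ratio enumeration: keep the area roughly constant.
    ws = np.round(np.sqrt(base_size * base_size / ratios))
    hs = np.round(ws * ratios)
    # Scale enumeration: each ratio anchor is paired with every scale.
    ws = (ws[:, np.newaxis] * scales[np.newaxis, :]).reshape(-1)
    hs = (hs[:, np.newaxis] * scales[np.newaxis, :]).reshape(-1)
    return np.stack([ctr - 0.5 * (ws - 1),
                     ctr - 0.5 * (hs - 1),
                     ctr + 0.5 * (ws - 1),
                     ctr + 0.5 * (hs - 1)], axis=1)


anchors = anchors_sketch()
```

The rows are ordered ratio-major, so the first three rows share the 0.5 ratio and differ only in scale.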