Commit 9d12d142 by Ting PAN

Add Model Zoo

1 parent d240a4fd
Showing with 1877 additions and 1614 deletions
[flake8]
max-line-length = 120
ignore = E741, # ambiguous variable name
F403, # ‘from module import *’ used; unable to detect undefined names
F405, # name may be undefined, or defined from star imports: module
F811, # redefinition of unused name from line N
F821, # undefined name
W503, # line break before binary operator
W504, # line break after binary operator
# module imported but unused
per-file-ignores = __init__.py: F401
exclude = seetadet/utils/pycocotools
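A quick way to exercise these rules locally is to run flake8 from the repository root. The sketch below assumes flake8 is installed and that the block above lives in the repository's flake8 configuration (e.g. setup.cfg or a .flake8 file), so the settings are picked up automatically:
```bash
# Install the linter and lint the package with the settings above.
# max-line-length, ignore and per-file-ignores are read from the config file;
# seetadet/utils/pycocotools is excluded as configured.
pip install flake8
flake8 seetadet
```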
...@@ -43,8 +43,13 @@ __pycache__
# VSCode files
.vscode
# IDEA files
.idea
# OSX dir files
.DS_Store
# Android files
.gradle
*.iml
local.properties
------------------------------------------------------------------------
The list of most significant changes made over time in SeetaDet.
SeetaDet 0.4.3 (20200724)
Dragon Minimum Required (Version 0.3.0.dev20200723)
Changes:
- Adapt to the latest dragon preview version.
Preview Features:
- None
Bugs fixed:
- None
------------------------------------------------------------------------
SeetaDet 0.4.2 (20200707)
Dragon Minimum Required (Version 0.3.0.dev20200707)
Changes:
- Adapt to the latest dragon preview version.
Preview Features:
- None
Bugs fixed:
- None
------------------------------------------------------------------------
SeetaDet 0.4.1 (20200421)
Dragon Minimum Required (Version 0.3.0.dev20200421)
Changes:
- Queue testing images on demand instead of reading them all at once.
Preview Features:
- None
Bugs fixed:
- None
------------------------------------------------------------------------
SeetaDet 0.4.0 (20200408)
Dragon Minimum Required (Version 0.3.0.dev20200408)
Changes:
Preview Features:
- Optimize the code structure.
- DALI support for SSD, RetinaNet, and Faster R-CNN.
- Use KPLRecord instead of SeetaRecord.
Bugs fixed:
- Fix the frozen Affine issue.
------------------------------------------------------------------------
SeetaDet 0.3.0 (20191121)
Dragon Minimum Required (Version 0.3.0.dev20191121)
Changes:
Preview Features:
- New algorithm: Mask R-CNN.
- Add MobileNet (V2 and NAS) as backbones.
- Refactor the testing module; multi-GPU testing is supported.
Bugs fixed:
- Remove rotated boxes; use Mask R-CNN instead.
------------------------------------------------------------------------
SeetaDet 0.2.3 (20191101)
Dragon Minimum Required (Version 0.3.0.dev20191021)
Changes:
Preview Features:
- Refactor the API of rotated boxes.
- Simplify the solver by adding LRScheduler.
- Change the ``ITER`` naming to ``STEP``.
Bugs fixed:
- None
------------------------------------------------------------------------
SeetaDet 0.2.2 (20191021)
Dragon Minimum Required (Version 0.3.0.dev20191021)
Changes:
Preview Features:
- Add the dumping of detection results.
Bugs fixed:
- None
------------------------------------------------------------------------
SeetaDet 0.2.1 (20191017)
Dragon Minimum Required (Version 0.3.0.dev20191017)
Changes:
Preview Features:
- Rotated boxes and FPN support for SSD.
- Freeze the graph to speed up inference.
Bugs fixed:
- None
------------------------------------------------------------------------
SeetaDet 0.2.0 (20190929)
Dragon Minimum Required (Version 0.3.0.dev20190929)
Changes:
Preview Features:
- Use SeetaRecord instead of LMDB.
- Flatten the implementation of layers.
Bugs fixed:
- None
------------------------------------------------------------------------
SeetaDet 0.1.2 (20190723)
Dragon Minimum Required (Version 0.3.0.0)
Changes:
Preview Features:
- Change to the PEP8 code style.
- Adapt to the new Dragon API.
Bugs fixed:
- None
------------------------------------------------------------------------
SeetaDet 0.1.1 (20190409)
Dragon Minimum Required (Version 0.3.0.0)
Changes:
Preview Features:
- Add RandomCrop/RandomPad for ScaleJittering.
- Add ResNet18/ResNet34/AirNet for R-CNN and RetinaNet.
- Use the C++ implemented decoder for RetinaNet instead.
Bugs fixed:
- None
------------------------------------------------------------------------
SeetaDet 0.1.0 (20190314)
Dragon Minimum Required (Version 0.3.0.0)
Changes:
Preview Features:
- Init repository.
Bugs fixed:
- None
Copyright (c) 2017, SeetaTech, Co.,Ltd. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# Benchmark and Model Zoo
## Introduction
### ImageNet Pretrained Models
#### ResNet Models
- [R-50.pkl](https://dragon.seetatech.com/download/models/seetadet/imagenet/R-50.pkl)
- [R-101.pkl](https://dragon.seetatech.com/download/models/seetadet/imagenet/R-101.pkl)
#### VGG Models
- [VGG16.SSD.pkl](https://dragon.seetatech.com/download/models/seetadet/imagenet/VGG16.SSD.pkl)
#### MobileNet Models
- [MobileNetV2.pkl](https://dragon.seetatech.com/download/models/seetadet/imagenet/MobileNetV2.pkl)
- [ProxylessMobile.pkl](https://dragon.seetatech.com/download/models/seetadet/imagenet/ProxylessMobile.pkl)
#### AirNet Models
- [AirNet.pkl](https://dragon.seetatech.com/download/models/seetadet/imagenet/AirNet.pkl)
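The YAML baselines below reference these weights through ``TRAIN.WEIGHTS`` (for example ``'/model/R-50.pkl'``). A minimal sketch for fetching a backbone into that location, assuming ``/model`` is writable on your machine:
```bash
# Download the ImageNet ResNet-50 weights to the path the example configs expect.
mkdir -p /model
wget -P /model https://dragon.seetatech.com/download/models/seetadet/imagenet/R-50.pkl
```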
## Baselines
### Faster R-CNN
Please refer to [Faster R-CNN](configs/faster_rcnn) for details.
### Mask R-CNN
Please refer to [Mask R-CNN](configs/mask_rcnn) for details.
### RetinaNet
Please refer to [RetinaNet](configs/retinanet) for details.
### SSD
Please refer to [SSD](configs/ssd) for details.
# SeetaDet

SeetaDet is a platform implementing popular object detection algorithms.
This repository is based on [seeta-dragon](https://github.com/seetaresearch/dragon),
while the code style follows PyTorch.
The torch-style code helps us simplify the hierarchical pipeline of modern detection.

## Requirements

seeta-dragon >= 0.3.0.dev20201014

## Installation

### Build From Source

If you prefer to develop modules as well as run experiments,
the following command builds the package without installing it to ***site-packages***:

```bash
cd seetadet && python setup.py build
```

### Install From Source

Clone this repository to local disk and install:

```bash
cd seetadet && python setup.py install
```

### Install From Git

You can also install it from the remote repository:

...@@ -45,16 +40,16 @@ pip install git+https://gitlab.seetatech.com/seetaresearch/seetadet.git@master

## Quick Start

### Train a detection model

```bash
cd tools
python train.py --cfg <MODEL_YAML>
```

We have provided default YAML examples in [configs](configs).
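For example, a single-node run of the COCO R-50-FPN 1x Faster R-CNN baseline could be launched as below; this is a sketch that assumes the dataset records and the pretrained backbone already sit at the paths written in that config:
```bash
# Train with one of the provided YAML files; snapshots are written according
# to SOLVER.SNAPSHOT_PREFIX and SOLVER.SNAPSHOT_EVERY in the config.
cd tools
python train.py --cfg ../configs/faster_rcnn/coco_faster_rcnn_R-50-FPN_800_1x.yml
```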
### Test a detection model

```bash
cd tools
...@@ -64,42 +59,33 @@ Or

```bash
cd tools
python test_all.py --cfg <MODEL_YAML> --exp_dir <EXP_DIR> --last 1
```

### Export a detection model to ONNX

```bash
cd tools
python export.py --cfg <MODEL_YAML> --exp_dir <EXP_DIR> --iter <ITERATION>
```
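Concretely, exporting the final snapshot of the 1x COCO Faster R-CNN run above might look like the following; ``<EXP_DIR>`` is the experiment directory produced by training, and 90000 is simply the ``MAX_STEPS`` value of that config:
```bash
# Export the checkpoint written at step 90000 to an ONNX model.
cd tools
python export.py \
    --cfg ../configs/faster_rcnn/coco_faster_rcnn_R-50-FPN_800_1x.yml \
    --exp_dir <EXP_DIR> \
    --iter 90000
```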
## Benchmark and Model Zoo

Results and models are available in the [Model Zoo](MODEL_ZOO.md).

### Supported Backbones

- [ResNet](MODEL_ZOO.md#resnet-models)
- [VGG](MODEL_ZOO.md#vgg-models)
- [MobileNet](MODEL_ZOO.md#mobilenet-models)
- [AirNet](MODEL_ZOO.md#airnet-models)

### Supported Algorithms

- [Faster R-CNN](configs/faster_rcnn)
- [Mask R-CNN](configs/mask_rcnn)
- [SSD](configs/ssd)
- [RetinaNet](configs/retinanet)

## License

[BSD 2-Clause license](LICENSE)
# Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
## Introduction
```
@article{Ren_2017,
title={Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
publisher={Institute of Electrical and Electronics Engineers (IEEE)},
author={Ren, Shaoqing and He, Kaiming and Girshick, Ross and Sun, Jian},
year={2017},
month={Jun},
}
```
## COCO Object Detection Baselines
| Model | Lr sched | Infer time (s/im) | box AP | Download |
| :---: | :------: | :---------------: | :----: | :------: |
| [R-50-FPN-800](coco_faster_rcnn_R-50-FPN_800_1x.yml) | 1x | 0.046 | 38.3 | [model](https://dragon.seetatech.com/download/models/seetadet/faster_rcnn/coco_faster_rcnn_R-50-FPN_800_1x/model_final.pkl) |
| [R-50-FPN-800](coco_faster_rcnn_R-50-FPN_800_2x.yml) | 2x | 0.046 | 39.7 | [model](https://dragon.seetatech.com/download/models/seetadet/faster_rcnn/coco_faster_rcnn_R-50-FPN_800_2x/model_final.pkl) |
## Pascal VOC Object Detection Baselines
| Model | Infer time (s/im) | AP@0.5 | Download |
| :---: | :---------------: | :----: | :------: |
| [R-50-FPN-640](voc_faster_rcnn_R-50-FPN_640.yml) | 0.030 | 80.8 | [model](https://dragon.seetatech.com/download/models/seetadet/faster_rcnn/voc_faster_rcnn_R-50-FPN_640_1x/model_final.pkl) |
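To evaluate a released checkpoint without retraining, one option is to place ``model_final.pkl`` into an experiment directory and point ``test_all.py`` at it; this is a hedged sketch (only the download URL and the ``test_all.py`` flags come from this repository, the directory layout is an assumption):
```bash
# Fetch the released VOC Faster R-CNN weights and evaluate them.
mkdir -p <EXP_DIR>
wget -P <EXP_DIR> https://dragon.seetatech.com/download/models/seetadet/faster_rcnn/voc_faster_rcnn_R-50-FPN_640_1x/model_final.pkl
cd tools
python test_all.py --cfg ../configs/faster_rcnn/voc_faster_rcnn_R-50-FPN_640.yml --exp_dir <EXP_DIR>
```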
NUM_GPUS: 8
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: faster_rcnn
BACKBONE: resnet50.fpn
CLASSES: ['__background__',
'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
...@@ -19,30 +19,28 @@ MODEL:
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
SOLVER:
BASE_LR: 0.02
LR_POLICY: steps_with_decay
DECAY_STEPS: [60000, 80000]
MAX_STEPS: 90000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: coco_faster_rcnn_R-50-FPN_800_1x
FRCNN:
BATCH_SIZE: 512
ROI_XFORM_RESOLUTION: 7
TRAIN:
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/coco_2014_trainval35k'
IMS_PER_BATCH: 2
SCALES: [640, 672, 704, 736, 768, 800]
MAX_SIZE: 1333
USE_DIFF: False  # Do not use crowd objects
TEST:
DATASET: '/data/coco_2014_minival'
JSON_FILE: '/data/instances_minival2014.json'
PROTOCOL: 'coco'
IMS_PER_BATCH: 1
SCALES: [800]
MAX_SIZE: 1333
NMS: 0.5
NUM_GPUS: 8
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: faster_rcnn
BACKBONE: resnet50.fpn
CLASSES: ['__background__',
'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
...@@ -19,29 +19,28 @@ MODEL:
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
SOLVER:
BASE_LR: 0.02
LR_POLICY: steps_with_decay
DECAY_STEPS: [120000, 160000]
MAX_STEPS: 180000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: coco_faster_rcnn_R-50-FPN_800_2x
FRCNN:
BATCH_SIZE: 512
ROI_XFORM_RESOLUTION: 7
TRAIN:
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/coco_2014_trainval35k'
IMS_PER_BATCH: 2
SCALES: [640, 672, 704, 736, 768, 800]
MAX_SIZE: 1333
USE_DIFF: False  # Do not use crowd objects
TEST:
DATASET: '/data/coco_2014_minival'
JSON_FILE: '/data/instances_minival2014.json'
PROTOCOL: 'coco'
IMS_PER_BATCH: 1
SCALES: [800]
MAX_SIZE: 1333
NMS: 0.5
NUM_GPUS: 1
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: faster_rcnn
BACKBONE: resnet50.fpn
...@@ -10,27 +10,26 @@ MODEL:
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
FRCNN:
BATCH_SIZE: 128
ROI_XFORM_RESOLUTION: 7
SOLVER:
BASE_LR: 0.002
DECAY_STEPS: [80000, 100000]
MAX_STEPS: 120000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: voc_faster_rcnn_R-50-FPN_640
TRAIN:
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/voc_0712_trainval'
IMS_PER_BATCH: 2
SCALES: [480, 512, 544, 576, 608, 640]
MAX_SIZE: 1066
USE_COLOR_JITTER: True
TEST:
DATASET: '/data/voc_2007_test'
PROTOCOL: 'voc2007'
IMS_PER_BATCH: 1
SCALES: [640]
MAX_SIZE: 1066
NMS: 0.45
NUM_GPUS: 1
VIS: False
ENABLE_TENSOR_BOARD: False
MODEL:
TYPE: faster_rcnn
BACKBONE: vgg16.c4
CLASSES: ['__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
NUM_CLASSES: 21
SOLVER:
BASE_LR: 0.001
WEIGHT_DECAY: 0.0005
DECAY_STEPS: [100000, 140000]
MAX_STEPS: 140000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: voc_faster_rcnn
RPN:
STRIDES: [16]
SCALES: [8, 16, 32] # RField: [128, 256, 512]
ASPECT_RATIOS: [0.5, 1.0, 2.0]
FRCNN:
ROI_XFORM_METHOD: RoIPool
ROI_XFORM_RESOLUTION: 7
MLP_HEAD_DIM: 4096
TRAIN:
WEIGHTS: '/model/VGG16.RCNN.pth'
DATASET: '/data/voc_0712_trainval'
IMS_PER_BATCH: 2
BATCH_SIZE: 128
SCALES: [600]
MAX_SIZE: 1000
RPN_MIN_SIZE: 16
TEST:
DATASET: '/data/voc_2007_test'
PROTOCOL: 'voc2007' # 'voc2007', 'voc2010', 'coco'
SCALES: [600]
MAX_SIZE: 1000
RPN_MIN_SIZE: 16
NMS: 0.45
RPN_POST_NMS_TOP_N: 300
\ No newline at end of file
# Mask R-CNN
## Introduction
```
@article{He_2017,
title={Mask R-CNN},
journal={2017 IEEE International Conference on Computer Vision (ICCV)},
publisher={IEEE},
author={He, Kaiming and Gkioxari, Georgia and Dollar, Piotr and Girshick, Ross},
year={2017},
month={Oct}
}
```
## COCO Instance Segmentation Baselines
| Model | Lr sched | Infer time (s/im) | box AP | mask AP | Download |
| :---: | :------: | :---------------: | :----: | :-----: | :------: |
| [R-50-FPN-800](coco_mask_rcnn_R-50-FPN_800_1x.yml) | 1x | 0.056 | 39.2 | 34.8 | [model](https://dragon.seetatech.com/download/models/seetadet/mask_rcnn/coco_mask_rcnn_R-50-FPN_800_1x/model_final.pkl) |
| [R-50-FPN-800](coco_mask_rcnn_R-50-FPN_800_2x.yml) | 2x | 0.056 | 41.4 | 36.5 | [model](https://dragon.seetatech.com/download/models/seetadet/mask_rcnn/coco_mask_rcnn_R-50-FPN_800_2x/model_final.pkl) |
NUM_GPUS: 8
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: mask_rcnn
BACKBONE: resnet50.fpn
CLASSES: ['__background__',
'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
...@@ -19,25 +19,22 @@ MODEL:
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
SOLVER:
BASE_LR: 0.02
DECAY_STEPS: [60000, 80000]
MAX_STEPS: 90000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: coco_mask_rcnn_R-50-FPN_800_1x
FRCNN:
BATCH_SIZE: 512
ROI_XFORM_RESOLUTION: 7
MRCNN:
ROI_XFORM_RESOLUTION: 14
TRAIN:
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/coco_2014_trainval35k'
IMS_PER_BATCH: 2
SCALES: [640, 672, 704, 736, 768, 800]
MAX_SIZE: 1333
USE_DIFF: False  # Do not use crowd objects
TEST:
...@@ -47,5 +44,3 @@ TEST:
SCALES: [800]
MAX_SIZE: 1333
NMS: 0.5
NUM_GPUS: 8
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: mask_rcnn
BACKBONE: resnet50.fpn
CLASSES: ['__background__',
'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
...@@ -19,25 +19,22 @@ MODEL:
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
SOLVER:
BASE_LR: 0.02
DECAY_STEPS: [120000, 160000]
MAX_STEPS: 180000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: coco_mask_rcnn_R-50-FPN_800_2x
FRCNN:
BATCH_SIZE: 512
ROI_XFORM_RESOLUTION: 7
MRCNN:
ROI_XFORM_RESOLUTION: 14
TRAIN:
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/coco_2014_trainval35k'
IMS_PER_BATCH: 2
SCALES: [640, 672, 704, 736, 768, 800]
MAX_SIZE: 1333
USE_DIFF: False  # Do not use crowd objects
TEST:
...@@ -47,4 +44,3 @@ TEST:
SCALES: [800]
MAX_SIZE: 1333
NMS: 0.5
# Focal Loss for Dense Object Detection
## Introduction
```
@inproceedings{lin2017focal,
title={Focal loss for dense object detection},
author={Lin, Tsung-Yi and Goyal, Priya and Girshick, Ross and He, Kaiming and Doll{\'a}r, Piotr},
booktitle={Proceedings of the IEEE international conference on computer vision},
year={2017}
}
```
## COCO Object Detection Baselines
| Model | Lr sched | Infer time (s/im) | box AP | Download |
| :---: | :------: | :---------------: | :----: | :------: |
| [R-50-FPN-416](coco_retinanet_R-50-FPN_416_6x.yml) | 6x | 0.019 | 34.4 | [model](https://dragon.seetatech.com/download/models/seetadet/retinanet/coco_retinanet_R-50-FPN_416_6x/model_final.pkl) |
| [R-50-FPN-512](coco_retinanet_R-50-FPN_512_6x.yml) | 6x | 0.022 | 36.4 | [model](https://dragon.seetatech.com/download/models/seetadet/retinanet/coco_retinanet_R-50-FPN_512_6x/model_final.pkl) |
| [R-50-FPN-800](coco_retinanet_R-50-FPN_800_1x.yml) | 1x | 0.051 | 37.4 | [model](https://dragon.seetatech.com/download/models/seetadet/retinanet/coco_retinanet_R-50-FPN_800_1x/model_final.pkl) |
| [R-50-FPN-800](coco_retinanet_R-50-FPN_800_2x.yml) | 2x | 0.051 | 39.1 | [model](https://dragon.seetatech.com/download/models/seetadet/retinanet/coco_retinanet_R-50-FPN_800_2x/model_final.pkl) |
## Pascal VOC Object Detection Baselines
| Model | Infer time (s/im) | AP@0.5 | Download |
| :---: | :---------------: | :----: | :------: |
| [R-50-FPN-416](voc_retinanet_R-50-FPN_416.yml) | 0.015 | 82.3 | [model](https://dragon.seetatech.com/download/models/seetadet/retinanet/voc_retinanet_R-50-FPN_416/model_final.pkl) |
| [R-50-FPN-512](voc_retinanet_R-50-FPN_512.yml) | 0.017 | 83.0 | [model](https://dragon.seetatech.com/download/models/seetadet/retinanet/voc_retinanet_R-50-FPN_512/model_final.pkl) |
NUM_GPUS: 8
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: retinanet
BACKBONE: resnet50.fpn
CLASSES: ['__background__',
'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
'fire hydrant', 'stop sign', 'parking meter', 'bench',
'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant',
'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife',
'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli',
'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop',
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
FPN:
RPN_MIN_LEVEL: 3
RPN_MAX_LEVEL: 7
SOLVER:
BASE_LR: 0.01
LR_POLICY: steps_with_decay
DECAY_STEPS: [90000, 120000]
MAX_STEPS: 135000
SNAPSHOT_EVERY: 2500
SNAPSHOT_PREFIX: coco_retinanet_R-50-FPN_416_6x
PIPELINE:
TYPE: 'ssd'
TRAIN:
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/coco_2014_trainval35k'
IMS_PER_BATCH: 8
SCALES: [416]
USE_DIFF: False # Do not use crowd objects
TEST:
DATASET: '/data/coco_2014_minival'
JSON_FILE: '/data/instances_minival2014.json'
PROTOCOL: 'coco'
IMS_PER_BATCH: 1
SCALES: [416]
NMS: 0.5
NUM_GPUS: 8
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: retinanet
BACKBONE: resnet50.fpn
CLASSES: ['__background__',
'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
'fire hydrant', 'stop sign', 'parking meter', 'bench',
'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant',
'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife',
'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli',
'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop',
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
FPN:
RPN_MIN_LEVEL: 3
RPN_MAX_LEVEL: 7
SOLVER:
BASE_LR: 0.01
LR_POLICY: steps_with_decay
DECAY_STEPS: [90000, 120000]
MAX_STEPS: 135000
SNAPSHOT_EVERY: 2500
SNAPSHOT_PREFIX: coco_retinanet_R-50-FPN_512_6x
PIPELINE:
TYPE: 'ssd'
TRAIN:
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/coco_2014_trainval35k'
IMS_PER_BATCH: 8
SCALES: [512]
USE_DIFF: False # Do not use crowd objects
TEST:
DATASET: '/data/coco_2014_minival'
JSON_FILE: '/data/instances_minival2014.json'
PROTOCOL: 'coco'
IMS_PER_BATCH: 1
SCALES: [512]
NMS: 0.5
NUM_GPUS: 8
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: retinanet
BACKBONE: resnet50.fpn
CLASSES: ['__background__',
'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
'fire hydrant', 'stop sign', 'parking meter', 'bench',
'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant',
'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife',
'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli',
'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop',
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
FPN:
RPN_MIN_LEVEL: 3
RPN_MAX_LEVEL: 7
SOLVER:
BASE_LR: 0.01
LR_POLICY: steps_with_decay
DECAY_STEPS: [60000, 80000]
MAX_STEPS: 90000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: coco_retinanet_R-50-FPN_800_1x
TRAIN:
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/coco_2014_trainval35k'
IMS_PER_BATCH: 2
SCALES: [640, 672, 704, 736, 768, 800]
MAX_SIZE: 1333
USE_DIFF: False # Do not use crowd objects
TEST:
DATASET: '/data/coco_2014_minival'
JSON_FILE: '/data/instances_minival2014.json'
PROTOCOL: 'coco'
IMS_PER_BATCH: 1
SCALES: [800]
MAX_SIZE: 1333
NMS: 0.5
NUM_GPUS: 8
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: retinanet
BACKBONE: resnet50.fpn
...@@ -19,28 +19,28 @@ MODEL:
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
FPN:
RPN_MIN_LEVEL: 3
RPN_MAX_LEVEL: 7
SOLVER:
BASE_LR: 0.01
LR_POLICY: steps_with_decay
DECAY_STEPS: [120000, 160000]
MAX_STEPS: 180000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: coco_retinanet_R-50-FPN_800_2x
TRAIN:
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/coco_2014_trainval35k'
IMS_PER_BATCH: 2
SCALES: [640, 672, 704, 736, 768, 800]
MAX_SIZE: 1333
USE_DIFF: False  # Do not use crowd objects
TEST:
DATASET: '/data/coco_2014_minival'
JSON_FILE: '/data/instances_minival2014.json'
PROTOCOL: 'coco'
IMS_PER_BATCH: 1
SCALES: [800]
MAX_SIZE: 1333
NMS: 0.5
NUM_GPUS: 1
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: retinanet
BACKBONE: resnet50.fpn
CLASSES: ['__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
FPN:
RPN_MIN_LEVEL: 3
RPN_MAX_LEVEL: 7
RETINANET:
NUM_CONVS: 2
SOLVER:
BASE_LR: 0.01
DECAY_STEPS: [80000, 100000]
MAX_STEPS: 120000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: voc_retinanet_R-50-FPN_416
PIPELINE:
TYPE: 'ssd'
TRAIN:
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/voc_0712_trainval'
IMS_PER_BATCH: 16
SCALES: [416]
RANDOM_SCALES: [0.25, 1.0]
USE_COLOR_JITTER: True
TEST:
DATASET: '/data/voc_2007_test'
PROTOCOL: 'voc2007'
IMS_PER_BATCH: 1
SCALES: [416]
NMS: 0.45
RETINANET_PRE_NMS_TOP_N: 1000
NUM_GPUS: 2
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: retinanet
BACKBONE: resnet50.fpn
CLASSES: ['__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
FPN:
RPN_MIN_LEVEL: 3
RPN_MAX_LEVEL: 7
RETINANET:
NUM_CONVS: 2
SOLVER:
BASE_LR: 0.01
DECAY_STEPS: [80000, 100000]
MAX_STEPS: 120000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: voc_retinanet_R-50-FPN_512
PIPELINE:
TYPE: 'ssd'
TRAIN:
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/voc_0712_trainval'
IMS_PER_BATCH: 8
SCALES: [512]
RANDOM_SCALES: [0.25, 1.0]
USE_COLOR_JITTER: True
TEST:
DATASET: '/data/voc_2007_test'
PROTOCOL: 'voc2007'
IMS_PER_BATCH: 1
SCALES: [512]
NMS: 0.45
RETINANET_PRE_NMS_TOP_N: 1000
# SSD: Single Shot MultiBox Detector
## Introduction
```
@article{Liu_2016,
title={SSD: Single Shot MultiBox Detector},
journal={ECCV},
author={Liu, Wei and Anguelov, Dragomir and Erhan, Dumitru and Szegedy, Christian and Reed, Scott and Fu, Cheng-Yang and Berg, Alexander C.},
year={2016},
}
```
## Pascal VOC Object Detection Baselines
| Model | Infer time (s/im) | AP@0.5 | Download |
| :---: | :---------------: | :----: | :------: |
| [VGG-16-300](voc_ssd_VGG-16_300.yml) | 0.012 | 78.3 | [model](https://dragon.seetatech.com/download/models/seetadet/ssd/voc_ssd_VGG-16_300/model_final.pkl) |
| [VGG-16-512](voc_ssd_VGG-16_512.yml) | 0.021 | 80.1 | [model](https://dragon.seetatech.com/download/models/seetadet/ssd/voc_ssd_VGG-16_512/model_final.pkl) |
NUM_GPUS: 1
VIS: False
ENABLE_TENSOR_BOARD: False
MODEL:
TYPE: ssd
BACKBONE: airnet.fpn
CLASSES: ['__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
NUM_CLASSES: 21
SOLVER:
BASE_LR: 0.001
DECAY_STEPS: [80000, 100000, 120000]
MAX_STEPS: 120000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: voc_ssd_320
FPN:
RPN_MIN_LEVEL: 3
RPN_MAX_LEVEL: 8
SSD:
NUM_CONVS: 2
MULTIBOX:
STRIDES: [8, 16, 32, 64, 100, 300]
MIN_SIZES: [30, 60, 110, 162, 213, 264]
MAX_SIZES: [60, 110, 162, 213, 264, 315]
ASPECT_RATIOS: [
[1, 2, 0.5],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5],
[1, 2, 0.5],
]
TRAIN:
WEIGHTS: '/model/AirNet.Affine.pth'
DATASET: '/data/voc_0712_trainval'
IMS_PER_BATCH: 32
SCALES: [320]
RANDOM_SCALES: [0.25, 1.00]
USE_COLOR_JITTER: True
TEST:
DATASET: '/data/voc_2007_test'
PROTOCOL: 'voc2007' # 'voc2007', 'voc2010', 'coco'
IMS_PER_BATCH: 8
SCALES: [320]
NMS_TOP_K: 400
NMS: 0.45
SCORE_THRESH: 0.01
DETECTIONS_PER_IM: 200
\ No newline at end of file
NUM_GPUS: 1
VIS: False
ENABLE_TENSOR_BOARD: False
MODEL:
TYPE: ssd
BACKBONE: resnet50.fpn
CLASSES: ['__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
NUM_CLASSES: 21
FPN:
RPN_MIN_LEVEL: 3
RPN_MAX_LEVEL: 8
SOLVER:
BASE_LR: 0.001
DECAY_STEPS: [80000, 100000, 120000]
MAX_STEPS: 120000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: voc_ssd_320
SSD:
NUM_CONVS: 2
MULTIBOX:
STRIDES: [8, 16, 32, 64, 100, 300]
MIN_SIZES: [30, 60, 110, 162, 213, 264]
MAX_SIZES: [60, 110, 162, 213, 264, 315]
ASPECT_RATIOS: [
[1, 2, 0.5],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5],
[1, 2, 0.5]
]
TRAIN:
WEIGHTS: '/model/R-50.Affine.pth'
DATASET: '/data/voc_0712_trainval'
IMS_PER_BATCH: 32
SCALES: [320]
RANDOM_SCALES: [0.25, 1.00]
USE_COLOR_JITTER: True
TEST:
DATASET: '/data/voc_2007_test'
PROTOCOL: 'voc2007' # 'voc2007', 'voc2010', 'coco'
IMS_PER_BATCH: 8
SCALES: [320]
NMS_TOP_K: 400
NMS: 0.45
SCORE_THRESH: 0.01
DETECTIONS_PER_IM: 200
NUM_GPUS: 1
PIXEL_STDS: [1.0, 1.0, 1.0]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: ssd
BACKBONE: vgg16_reduced_300
COARSEST_STRIDE: 0
CLASSES: ['__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
SSD:
STRIDES: [8, 16, 32, 64, 100, 300]
ANCHOR_SIZES: [[30, 60],
[60, 110],
[110, 162],
[162, 213],
[213, 264],
[264, 315]]
ASPECT_RATIOS: [[1, 2, 0.5],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5],
[1, 2, 0.5]]
SOLVER:
BASE_LR: 0.001
WEIGHT_DECAY: 0.0005
DECAY_STEPS: [80000, 100000]
MAX_STEPS: 120000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: voc_ssd_VGG-16_300
TRAIN:
WEIGHTS: '/model/VGG16.SSD.pkl'
DATASET: '/data/voc_0712_trainval'
IMS_PER_BATCH: 16
SCALES: [300]
RANDOM_SCALES: [0.25, 1.0]
USE_COLOR_JITTER: True
TEST:
DATASET: '/data/voc_2007_test'
PROTOCOL: 'voc2007'
IMS_PER_BATCH: 1
SCALES: [300]
NMS: 0.45
SCORE_THRESH: 0.01
NUM_GPUS: 2
PIXEL_STDS: [1.0, 1.0, 1.0]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: ssd
BACKBONE: vgg16_reduced_512
CLASSES: ['__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
SSD:
STRIDES: [8, 16, 32, 64, 128, 256, 512]
ANCHOR_SIZES: [[35.84, 76.8],
[76.8, 153.6],
[153.6, 230.4],
[230.4, 307.2],
[307.2, 384.0],
[384.0, 460.8],
[460.8, 537.6]]
ASPECT_RATIOS: [[1, 2, 0.5],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5],
[1, 2, 0.5]]
SOLVER:
BASE_LR: 0.001
WEIGHT_DECAY: 0.0005
DECAY_STEPS: [80000, 100000]
MAX_STEPS: 120000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: voc_ssd_VGG-16_512
TRAIN:
WEIGHTS: '/model/VGG16.SSD.pkl'
DATASET: '/data/voc_0712_trainval'
IMS_PER_BATCH: 8
SCALES: [512]
RANDOM_SCALES: [0.25, 1.0]
USE_COLOR_JITTER: True
TEST:
DATASET: '/data/voc_2007_test'
PROTOCOL: 'voc2007'
IMS_PER_BATCH: 1
SCALES: [512]
NMS: 0.45
SCORE_THRESH: 0.01
...@@ -7,7 +7,6 @@ template <class Context>
template <typename T>
void NonMaxSuppressionOp<Context>::DoRunWithType() {
  int num_selected;
  utils::detection::ApplyNMS(
      Output(0)->count(),
      Output(0)->count(),
...@@ -16,7 +15,6 @@ void NonMaxSuppressionOp<Context>::DoRunWithType() {
      Output(0)->template mutable_data<int64_t, CPUContext>(),
      num_selected,
      ctx());
  Output(0)->Reshape({num_selected});
}
...@@ -24,14 +22,13 @@ template <class Context>
void NonMaxSuppressionOp<Context>::RunOnDevice() {
  CHECK(Input(0).ndim() == 2 && Input(0).dim(1) == 5)
      << "\nThe dimensions of boxes should be (num_boxes, 5).";
  Output(0)->Reshape({Input(0).dim(0)});
  DispatchHelper<TensorTypes<float>>::Call(this, Input(0));
}

DEPLOY_CPU_OPERATOR(NonMaxSuppression);
#ifdef USE_CUDA
DEPLOY_CUDA_OPERATOR(NonMaxSuppression);
#endif

OPERATOR_SCHEMA(NonMaxSuppression).NumInputs(1).NumOutputs(1);
......
...@@ -22,7 +22,7 @@ class NonMaxSuppressionOp final : public Operator<Context> {
 public:
  NonMaxSuppressionOp(const OperatorDef& def, Workspace* ws)
      : Operator<Context>(def, ws),
        iou_threshold_(OP_SINGLE_ARG(float, "iou_threshold", 0.5f)) {}
  USE_OPERATOR_FUNCTIONS;

  void RunOnDevice() override;
......
...@@ -10,50 +10,48 @@ template <typename T>
void RetinaNetDecoderOp<Context>::DoRunWithType() {
  using BT = float;  // DType of BBox
  using BC = CPUContext;  // Context of BBox
  int total_proposals = 0;
  auto* batch_scores = Input(SCORES).template data<T, Context>();
  auto* batch_deltas = Input(DELTAS).template data<T, BC>();
  auto* im_info = Input(IMAGE_INFO).template data<BT, BC>();
  auto* all_proposals = Output(0)->template mutable_data<BT, BC>();
  for (int im_idx = 0; im_idx < num_images_; ++im_idx) {
    BT im_h = im_info[0];
    BT im_w = im_info[1];
    BT im_scale_h = im_info[2];
    BT im_scale_w = im_info[2];
    if (Input(IMAGE_INFO).dim(1) == 4) im_scale_w = im_info[3];
    CHECK_EQ(strides_.size(), InputSize() - 3)
        << "\nGiven " << strides_.size() << " strides "
        << "and " << InputSize() - 3 << " features";
    // Select the top-k candidates as proposals
    auto num_boxes = Input(SCORES).dim(1);
    auto num_classes = Input(SCORES).dim(2);
    utils::detection::SelectProposals(
        Input(SCORES).count(1),
        score_thr_,
        batch_scores + im_idx * Input(SCORES).stride(0),
        roi_scores_,
        roi_indices_,
        ctx());
    auto num_candidates = (int)roi_scores_.size();
    auto num_proposals = std::min(num_candidates, (int)pre_nms_topn_);
    utils::detection::ArgPartition(
        num_candidates, num_proposals, true, roi_scores_.data(), indices_);
    scores_.resize(indices_.size());
    for (int i = 0; i < num_proposals; ++i) {
      scores_[i] = roi_scores_[indices_[i]];
      indices_[i] = roi_indices_[indices_[i]];
    }
    // Decode proposals via anchors
    int stride_offset = 0;
    for (int i = 0; i < strides_.size(); i++) {
      auto feature_h = Input(i).dim(2);
      auto feature_w = Input(i).dim(3);
      auto K = feature_h * feature_w;
      auto A = int(ratios_.size() * scales_.size());
      anchors_.resize((size_t)(A * 4));
      utils::detection::GenerateAnchors(
          strides_[i],
...@@ -62,35 +60,35 @@ void RetinaNetDecoderOp<Context>::DoRunWithType() {
          ratios_.data(),
          scales_.data(),
          anchors_.data());
      utils::detection::GetShiftedAnchors(
          num_proposals,
          num_classes,
          A,
          feature_h,
          feature_w,
          strides_[i],
          stride_offset,
          anchors_.data(),
          indices_.data(),
          all_proposals);
      stride_offset += (A * K);
    }
    utils::detection::GenerateDetections(
        num_proposals,
        num_boxes,
        num_classes,
        im_idx,
        im_h,
        im_w,
        im_scale_h,
        im_scale_w,
        scores_.data(),
        batch_deltas + im_idx * Input(DELTAS).stride(0),
        indices_.data(),
        all_proposals);
    total_proposals += num_proposals;
    all_proposals += (num_proposals * 7);
    im_info += Input(IMAGE_INFO).dim(1);
  }
  Output(0)->Reshape({total_proposals, 7});
...@@ -99,20 +97,20 @@ void RetinaNetDecoderOp<Context>::DoRunWithType() {
template <class Context>
void RetinaNetDecoderOp<Context>::RunOnDevice() {
  num_images_ = Input(0).dim(0);
  CHECK_EQ(Input(-1).dim(0), num_images_)
      << "\nExcepted " << num_images_ << " groups info, got "
      << Input(-1).dim(0) << ".";
  Output(0)->Reshape({num_images_ * pre_nms_topn_, 7});
  DispatchHelper<TensorTypes<float>>::Call(this, Input(SCORES));
}

DEPLOY_CPU_OPERATOR(RetinaNetDecoder);
#ifdef USE_CUDA
DEPLOY_CUDA_OPERATOR(RetinaNetDecoder);
#endif

OPERATOR_SCHEMA(RetinaNetDecoder).NumInputs(3, INT_MAX).NumOutputs(1, INT_MAX);
NO_GRADIENT(RetinaNetDecoder);

}  // namespace dragon
...@@ -22,11 +22,11 @@ class RetinaNetDecoderOp final : public Operator<Context> {
 public:
  RetinaNetDecoderOp(const OperatorDef& def, Workspace* ws)
      : Operator<Context>(def, ws),
        strides_(OP_REPEATED_ARG(int64_t, "strides")),
        ratios_(OP_REPEATED_ARG(float, "ratios")),
        scales_(OP_REPEATED_ARG(float, "scales")),
        pre_nms_topn_(OP_SINGLE_ARG(int64_t, "pre_nms_top_n", 6000)),
        score_thr_(OP_SINGLE_ARG(float, "score_thresh", 0.05f)) {}
  USE_OPERATOR_FUNCTIONS;

  void RunOnDevice() override;
...@@ -34,10 +34,13 @@ class RetinaNetDecoderOp final : public Operator<Context> {
  template <typename T>
  void DoRunWithType();

  enum INPUT_TAGS { SCORES = -3, DELTAS = -2, IMAGE_INFO = -1 };

 protected:
  float score_thr_;
  vec64_t strides_, indices_, roi_indices_;
  vector<float> ratios_, scales_, anchors_;
  vector<float> scores_, roi_scores_;
  int64_t num_images_, pre_nms_topn_;
};
......
...@@ -15,153 +15,81 @@ void RPNDecoderOp<Context>::DoRunWithType() { ...@@ -15,153 +15,81 @@ void RPNDecoderOp<Context>::DoRunWithType() {
int total_rois = 0, num_rois; int total_rois = 0, num_rois;
int num_candidates, num_proposals; int num_candidates, num_proposals;
auto* batch_scores = Input(-3).template data<T, BC>(); auto* batch_scores = Input(SCORES).template data<T, BC>();
auto* batch_deltas = Input(-2).template data<T, BC>(); auto* batch_deltas = Input(DELTAS).template data<T, BC>();
auto* im_info = Input(-1).template data<BT, BC>(); auto* im_info = Input(IMAGE_INFO).template data<BT, BC>();
auto* y = Output(0)->template mutable_data<BT, BC>(); auto* all_rois = Output(0)->template mutable_data<BT, BC>();
for (int n = 0; n < num_images_; ++n) { for (int im_idx = 0; im_idx < num_images_; ++im_idx) {
const BT im_h = im_info[0]; const BT im_h = im_info[0];
const BT im_w = im_info[1]; const BT im_w = im_info[1];
const BT scale = im_info[2]; auto* scores = batch_scores + im_idx * Input(SCORES).stride(0);
const BT min_box_h = min_size_ * scale; auto* deltas = batch_deltas + im_idx * Input(DELTAS).stride(0);
const BT min_box_w = min_size_ * scale; CHECK_EQ(strides_.size(), InputSize() - 3)
auto* scores = batch_scores + n * Input(-3).stride(0); << "\nGiven " << strides_.size() << " strides "
auto* deltas = batch_deltas + n * Input(-2).stride(0); << "and " << InputSize() - 3 << " feature inputs";
if (strides_.size() == 1) { CHECK_EQ(strides_.size(), scales_.size())
// Case 1: single stride << "\nGiven " << strides_.size() << " strides "
feat_h = Input(0).dim(2); << "and " << scales_.size() << " scales";
feat_w = Input(0).dim(3); // Select the top-k candidates as proposals
num_candidates = Input(SCORES).dim(1);
num_proposals = std::min(num_candidates, (int)pre_nms_top_n_);
utils::math::ArgPartition(
num_candidates, num_proposals, true, scores, indices_);
// Decode the candidates
int stride_offset = 0;
proposals_.Reshape({num_proposals, 5});
auto* proposals = proposals_.template mutable_data<BT, BC>();
for (int i = 0; i < strides_.size(); i++) {
feat_h = Input(i).dim(2);
feat_w = Input(i).dim(3);
K = feat_h * feat_w; K = feat_h * feat_w;
A = int(ratios_.size() * scales_.size()); A = (int)ratios_.size();
// Select the Top-K candidates as proposals
num_candidates = A * K;
num_proposals = std::min(num_candidates, (int)pre_nms_topn_);
utils::math::ArgPartition(
num_candidates, num_proposals, true, scores, indices_);
// Decode the candidates
anchors_.resize((size_t)(A * 4)); anchors_.resize((size_t)(A * 4));
proposals_.Reshape({num_proposals, 5});
utils::detection::GenerateAnchors( utils::detection::GenerateAnchors(
strides_[0], strides_[i],
(int)ratios_.size(), (int)ratios_.size(),
(int)scales_.size(), 1,
ratios_.data(), ratios_.data(),
scales_.data(), scales_.data(),
anchors_.data()); anchors_.data());
utils::detection::GenerateGridAnchors( utils::detection::GetShiftedAnchors(
num_proposals, num_proposals,
A, A,
feat_h, feat_h,
feat_w, feat_w,
strides_[0], strides_[i],
0, stride_offset,
anchors_.data(), anchors_.data(),
indices_.data(), indices_.data(),
proposals_.template mutable_data<BT, BC>());
utils::detection::GenerateSSProposals(
K,
num_proposals,
im_h,
im_w,
min_box_h,
min_box_w,
scores,
deltas,
indices_.data(),
proposals_.template mutable_data<BT, BC>());
// Sort, NMS and Retrieve
utils::detection::SortProposals(
0,
num_proposals - 1,
num_proposals,
proposals_.template mutable_data<BT, BC>());
utils::detection::ApplyNMS(
num_proposals,
post_nms_topn_,
nms_thr_,
proposals_.template mutable_data<BT, Context>(),
roi_indices_.data(),
num_rois,
ctx());
utils::detection::RetrieveRoIs(
num_rois,
n,
proposals_.template data<BT, BC>(),
roi_indices_.data(),
y);
} else if (strides_.size() > 1) {
// Case 2: multiple strides
CHECK_EQ(strides_.size(), InputSize() - 3)
<< "\nGiven " << strides_.size() << " strides "
<< "and " << InputSize() - 3 << " feature inputs";
CHECK_EQ(strides_.size(), scales_.size())
<< "\nGiven " << strides_.size() << " strides "
<< "and " << scales_.size() << " scales";
// Select the top-k candidates as proposals
num_candidates = Input(-3).dim(1);
num_proposals = std::min(num_candidates, (int)pre_nms_topn_);
utils::math::ArgPartition(
num_candidates, num_proposals, true, scores, indices_);
// Decode the candidates
int base_offset = 0;
proposals_.Reshape({num_proposals, 5});
auto* proposals = proposals_.template mutable_data<BT, BC>();
for (int i = 0; i < strides_.size(); i++) {
feat_h = Input(i).dim(2);
feat_w = Input(i).dim(3);
K = feat_h * feat_w;
A = (int)ratios_.size();
anchors_.resize((size_t)(A * 4));
utils::detection::GenerateAnchors(
strides_[i],
(int)ratios_.size(),
1,
ratios_.data(),
scales_.data(),
anchors_.data());
utils::detection::GenerateGridAnchors(
num_proposals,
A,
feat_h,
feat_w,
strides_[i],
base_offset,
anchors_.data(),
indices_.data(),
proposals);
base_offset += (A * K);
}
utils::detection::GenerateMSProposals(
num_candidates,
num_proposals,
im_h,
im_w,
min_box_h,
min_box_w,
scores,
deltas,
&indices_[0],
proposals); proposals);
// Sort, NMS and Retrieve stride_offset += (A * K);
utils::detection::SortProposals(
0, num_proposals - 1, num_proposals, proposals);
utils::detection::ApplyNMS(
num_proposals,
post_nms_topn_,
nms_thr_,
proposals_.template mutable_data<BT, Context>(),
roi_indices_.data(),
num_rois,
ctx());
utils::detection::RetrieveRoIs(
num_rois, n, proposals, roi_indices_.data(), y);
} else {
LOG(FATAL) << "Expected at least one stride for proposals.";
} }
utils::detection::GenerateProposals(
num_candidates,
num_proposals,
im_h,
im_w,
scores,
deltas,
&indices_[0],
proposals);
// Sort, NMS and Retrieve
utils::detection::SortProposals(
0, num_proposals - 1, num_proposals, proposals);
utils::detection::ApplyNMS(
num_proposals,
post_nms_top_n_,
nms_thr_,
proposals_.template mutable_data<BT, Context>(),
roi_indices_.data(),
num_rois,
ctx());
utils::detection::RetrieveRoIs(
num_rois, im_idx, proposals, roi_indices_.data(), all_rois);
total_rois += num_rois; total_rois += num_rois;
y += (num_rois * 5); all_rois += (num_rois * 5);
im_info += Input(-1).dim(1); im_info += Input(IMAGE_INFO).dim(1);
} }
Output(0)->Reshape({total_rois, 5}); Output(0)->Reshape({total_rois, 5});
...@@ -202,22 +130,21 @@ void RPNDecoderOp<Context>::DoRunWithType() { ...@@ -202,22 +130,21 @@ void RPNDecoderOp<Context>::DoRunWithType() {
template <class Context> template <class Context>
void RPNDecoderOp<Context>::RunOnDevice() { void RPNDecoderOp<Context>::RunOnDevice() {
num_images_ = Input(0).dim(0); num_images_ = Input(0).dim(0);
CHECK_EQ(Input(IMAGE_INFO).dim(0), num_images_)
CHECK_EQ(Input(-1).dim(0), num_images_)
<< "\nExcepted " << num_images_ << " groups info, got " << "\nExcepted " << num_images_ << " groups info, got "
<< Input(-1).dim(0) << "."; << Input(IMAGE_INFO).dim(0) << ".";
roi_indices_.resize(post_nms_top_n_);
roi_indices_.resize(post_nms_topn_); Output(0)->Reshape({num_images_ * post_nms_top_n_, 5});
Output(0)->Reshape({num_images_ * post_nms_topn_, 5}); DispatchHelper<TensorTypes<float>>::Call(this, Input(SCORES));
DispatchHelper<TensorTypes<float>>::Call(this, Input(-3));
} }
DEPLOY_CPU(RPNDecoder); DEPLOY_CPU_OPERATOR(RPNDecoder);
#ifdef USE_CUDA #ifdef USE_CUDA
DEPLOY_CUDA(RPNDecoder); DEPLOY_CUDA_OPERATOR(RPNDecoder);
#endif #endif
OPERATOR_SCHEMA(RPNDecoder).NumInputs(3, INT_MAX).NumOutputs(1, INT_MAX); OPERATOR_SCHEMA(RPNDecoder).NumInputs(3, INT_MAX).NumOutputs(1, INT_MAX);
NO_GRADIENT(RPNDecoder);
} // namespace dragon } // namespace dragon
...@@ -22,17 +22,16 @@ class RPNDecoderOp final : public Operator<Context> { ...@@ -22,17 +22,16 @@ class RPNDecoderOp final : public Operator<Context> {
public: public:
RPNDecoderOp(const OperatorDef& def, Workspace* ws) RPNDecoderOp(const OperatorDef& def, Workspace* ws)
: Operator<Context>(def, ws), : Operator<Context>(def, ws),
strides_(OpArgs<int64_t>("strides")), strides_(OP_REPEATED_ARG(int64_t, "strides")),
ratios_(OpArgs<float>("ratios")), ratios_(OP_REPEATED_ARG(float, "ratios")),
scales_(OpArgs<float>("scales")), scales_(OP_REPEATED_ARG(float, "scales")),
pre_nms_topn_(OpArg<int64_t>("pre_nms_top_n", 6000)), pre_nms_top_n_(OP_SINGLE_ARG(int64_t, "pre_nms_top_n", 6000)),
post_nms_topn_(OpArg<int64_t>("post_nms_top_n", 300)), post_nms_top_n_(OP_SINGLE_ARG(int64_t, "post_nms_top_n", 1000)),
nms_thr_(OpArg<float>("nms_thresh", 0.7f)), nms_thr_(OP_SINGLE_ARG(float, "nms_thresh", 0.7f)),
min_size_(OpArg<int64_t>("min_size", 16)), min_level_(OP_SINGLE_ARG(int64_t, "min_level", 2)),
min_level_(OpArg<int64_t>("min_level", 2)), max_level_(OP_SINGLE_ARG(int64_t, "max_level", 5)),
max_level_(OpArg<int64_t>("max_level", 5)), canonical_level_(OP_SINGLE_ARG(int64_t, "canonical_level", 4)),
canonical_level_(OpArg<int64_t>("canonical_level", 4)), canonical_scale_(OP_SINGLE_ARG(int64_t, "canonical_scale", 224)) {}
canonical_scale_(OpArg<int64_t>("canonical_scale", 224)) {}
USE_OPERATOR_FUNCTIONS; USE_OPERATOR_FUNCTIONS;
void RunOnDevice() override; void RunOnDevice() override;
...@@ -40,11 +39,13 @@ class RPNDecoderOp final : public Operator<Context> { ...@@ -40,11 +39,13 @@ class RPNDecoderOp final : public Operator<Context> {
template <typename T> template <typename T>
void DoRunWithType(); void DoRunWithType();
enum INPUT_TAGS { SCORES = -3, DELTAS = -2, IMAGE_INFO = -1 };
protected: protected:
float nms_thr_; float nms_thr_;
vec64_t strides_, indices_, roi_indices_; vec64_t strides_, indices_, roi_indices_;
vector<float> ratios_, scales_, scores_, anchors_; vector<float> ratios_, scales_, scores_, anchors_;
int64_t min_size_, pre_nms_topn_, post_nms_topn_; int64_t pre_nms_top_n_, post_nms_top_n_;
int64_t num_images_, min_level_, max_level_; int64_t num_images_, min_level_, max_level_;
int64_t canonical_level_, canonical_scale_; int64_t canonical_level_, canonical_scale_;
Tensor proposals_; Tensor proposals_;
......
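The new INPUT_TAGS enum leans on negative input indexing, so the last three inputs are addressed the same way no matter how many feature maps precede them. A tiny illustrative Python sketch (the list names are made up):

inputs = ['feat_p2', 'feat_p3', 'feat_p4', 'scores', 'deltas', 'im_info']
SCORES, DELTAS, IMAGE_INFO = -3, -2, -1
# The last three entries are always reachable at the same negative offsets.
print(inputs[SCORES], inputs[DELTAS], inputs[IMAGE_INFO])  # scores deltas im_info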
...@@ -8,7 +8,6 @@ ...@@ -8,7 +8,6 @@
# <https://opensource.org/licenses/BSD-2-Clause> # <https://opensource.org/licenses/BSD-2-Clause>
# #
# ------------------------------------------------------------ # ------------------------------------------------------------
"""Build cxx sources.""" """Build cxx sources."""
from __future__ import absolute_import from __future__ import absolute_import
...@@ -16,14 +15,14 @@ from __future__ import division ...@@ -16,14 +15,14 @@ from __future__ import division
from __future__ import print_function from __future__ import print_function
import glob import glob
from distutils.core import setup
from dragon.tools import cpp_extension from dragon.tools import cpp_extension
if cpp_extension.CUDA_HOME is not None and \ from setuptools import setup
cpp_extension._cuda.is_available():
Extension = cpp_extension.CUDAExtension Extension = cpp_extension.CppExtension
else: if cpp_extension.CUDA_HOME is not None:
Extension = cpp_extension.CppExtension if cpp_extension._cuda.is_available():
Extension = cpp_extension.CUDAExtension
def find_sources(*dirs): def find_sources(*dirs):
...@@ -44,11 +43,12 @@ ext_modules = [ ...@@ -44,11 +43,12 @@ ext_modules = [
Extension( Extension(
name='install.lib.modules._C', name='install.lib.modules._C',
sources=find_sources('**'), sources=find_sources('**'),
define_macros=[('THRUST_IGNORE_CUB_VERSION_CHECK', None)],
), ),
] ]
setup( setup(
name='SeetaDet', name='SeetaDet',
ext_modules=ext_modules, ext_modules=ext_modules,
cmdclass={'build_ext': cpp_extension.BuildExtension} cmdclass={'build_ext': cpp_extension.BuildExtension},
) )
...@@ -47,6 +47,26 @@ void ApplyNMS<float, CPUContext>( ...@@ -47,6 +47,26 @@ void ApplyNMS<float, CPUContext>(
num_keep = count; num_keep = count;
} }
template <>
void SelectProposals<float, CPUContext>(
const int count,
const float score_thresh,
const float* input_scores,
vector<float>& output_scores,
vector<int64_t>& output_indices,
CPUContext* ctx) {
int num_proposals = 0;
output_indices.resize(count);  // Presize so indices can be written in place.
for (int i = 0; i < count; ++i) {
if (input_scores[i] > score_thresh) {
output_indices[num_proposals++] = i;
}
}
output_scores.resize(num_proposals);
output_indices.resize(num_proposals);  // Shrink to the kept candidates.
for (int i = 0; i < num_proposals; ++i) {
output_scores[i] = input_scores[output_indices[i]];
}
}
} // namespace detection } // namespace detection
} // namespace utils } // namespace utils
......
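A rough NumPy equivalent of the SelectProposals CPU path above, assuming scores arrive as a flat float array; the function name here is illustrative only:

import numpy as np

def select_proposals(scores, score_thresh):
    # Keep only candidates whose score exceeds the threshold and remember
    # where they came from, mirroring the CPU SelectProposals above.
    indices = np.nonzero(scores > score_thresh)[0]
    return scores[indices], indices

scores = np.array([0.10, 0.92, 0.05, 0.71], dtype=np.float32)
kept_scores, kept_indices = select_proposals(scores, 0.5)
print(kept_scores, kept_indices)  # [0.92 0.71] [1 3]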
#ifdef USE_CUDA #ifdef USE_CUDA
#include <dragon/core/context_cuda.h> #include <dragon/core/context_cuda.h>
#include <dragon/core/workspace.h>
#include <dragon/utils/device/common_cub.h>
#include <dragon/utils/device/common_thrust.h>
#include "detection_utils.h" #include "detection_utils.h"
namespace dragon { namespace dragon {
...@@ -15,6 +18,16 @@ namespace detection { ...@@ -15,6 +18,16 @@ namespace detection {
namespace { namespace {
template <typename T> template <typename T>
struct ThresholdFunctor {
ThresholdFunctor(float thresh) : thresh_(thresh) {}
inline __device__ bool operator()(
const thrust::tuple<int64_t, T>& key_val) const {
return thrust::get<1>(key_val) > thresh_;
}
float thresh_;
};
template <typename T>
__device__ bool _CheckIoU(const T* a, const T* b, const float thresh) { __device__ bool _CheckIoU(const T* a, const T* b, const float thresh) {
const T x1 = max(a[0], b[0]); const T x1 = max(a[0], b[0]);
const T y1 = max(a[1], b[1]); const T y1 = max(a[1], b[1]);
...@@ -72,6 +85,41 @@ __global__ void _NonMaxSuppression( ...@@ -72,6 +85,41 @@ __global__ void _NonMaxSuppression(
} // namespace } // namespace
template <> template <>
void SelectProposals<float, CUDAContext>(
const int count,
const float score_thresh,
const float* in_scores,
vector<float>& out_scores,
vector<int64_t>& out_indices,
CUDAContext* ctx) {
auto* in_indices = ctx->workspace()->template data<int64_t, CUDAContext>(
{count}, "data:1")[0];
auto iter = thrust::make_zip_iterator(
thrust::make_tuple(in_indices, const_cast<float*>(in_scores)));
auto policy = thrust::cuda::par.on(ctx->cuda_stream());
thrust::counting_iterator<int64_t> offset(0);
thrust::copy(policy, offset, offset + count, in_indices);
auto last = thrust::partition(
policy, iter, iter + count, ThresholdFunctor<float>(score_thresh));
size_t num_proposals = last - iter;
out_scores.resize(num_proposals);
out_indices.resize(num_proposals);
CUDA_CHECK(cudaMemcpyAsync(
out_scores.data(),
in_scores,
num_proposals * sizeof(float),
cudaMemcpyDeviceToHost,
ctx->cuda_stream()));
CUDA_CHECK(cudaMemcpyAsync(
out_indices.data(),
in_indices,
num_proposals * sizeof(int64_t),
cudaMemcpyDeviceToHost,
ctx->cuda_stream()));
ctx->FinishDeviceComputation();
}
template <>
void ApplyNMS<float, CUDAContext>( void ApplyNMS<float, CUDAContext>(
const int num_boxes, const int num_boxes,
const int max_keeps, const int max_keeps,
...@@ -83,7 +131,8 @@ void ApplyNMS<float, CUDAContext>( ...@@ -83,7 +131,8 @@ void ApplyNMS<float, CUDAContext>(
const int num_blocks = DIV_UP(num_boxes, NUM_THREADS); const int num_blocks = DIV_UP(num_boxes, NUM_THREADS);
vector<uint64_t> mask_host(num_boxes * num_blocks); vector<uint64_t> mask_host(num_boxes * num_blocks);
auto* mask_dev = (uint64_t*)ctx->New(mask_host.size() * sizeof(uint64_t)); auto* mask_dev = (uint64_t*)ctx->workspace()->data<CUDAContext>(
{mask_host.size() * sizeof(uint64_t)}, "data:1")[0];
_NonMaxSuppression<<< _NonMaxSuppression<<<
dim3(num_blocks, num_blocks), dim3(num_blocks, num_blocks),
...@@ -115,9 +164,7 @@ void ApplyNMS<float, CUDAContext>( ...@@ -115,9 +164,7 @@ void ApplyNMS<float, CUDAContext>(
if (num_selected == max_keeps) break; if (num_selected == max_keeps) break;
} }
} }
num_keep = num_selected; num_keep = num_selected;
ctx->Delete(mask_dev);
} }
} // namespace detection } // namespace detection
......
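For reference, a plain NumPy greedy NMS with the same semantics as ApplyNMS (keep the highest-scoring box, drop everything overlapping it above the threshold, stop at max_keeps). This is a readable sketch, not the bitmask kernel used above:

import numpy as np

def greedy_nms(boxes, scores, nms_thresh, max_keeps):
    # boxes: (N, 4) as (x1, y1, x2, y2); scores: (N,).
    order = np.argsort(-scores)
    areas = (boxes[:, 2] - boxes[:, 0] + 1) * (boxes[:, 3] - boxes[:, 1] + 1)
    keep = []
    while order.size > 0 and len(keep) < max_keeps:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0, xx2 - xx1 + 1) * np.maximum(0, yy2 - yy1 + 1)
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou <= nms_thresh]
    return np.array(keep)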
...@@ -24,45 +24,37 @@ namespace detection { ...@@ -24,45 +24,37 @@ namespace detection {
#define ROUND(x) ((int)((x) + (T)0.5)) #define ROUND(x) ((int)((x) + (T)0.5))
/*! /*!
* Box API * Functional API
*/ */
template <typename T> template <typename T>
inline int FilterBoxes( inline void ArgPartition(
const T dx, const int count,
const T dy, const int kth,
const T d_log_w, const bool descend,
const T d_log_h, const T* v,
const T im_w, vec64_t& indices) {
const T im_h, indices.resize(count);
const T min_box_w, std::iota(indices.begin(), indices.end(), 0);
const T min_box_h, if (descend) {
T* bbox) { std::nth_element(
const T w = bbox[2] - bbox[0] + 1; indices.begin(),
const T h = bbox[3] - bbox[1] + 1; indices.begin() + kth,
const T ctr_x = bbox[0] + (T)0.5 * w; indices.end(),
const T ctr_y = bbox[1] + (T)0.5 * h; [&v](int64_t lhs, int64_t rhs) { return v[lhs] > v[rhs]; });
} else {
const T pred_ctr_x = dx * w + ctr_x; std::nth_element(
const T pred_ctr_y = dy * h + ctr_y; indices.begin(),
const T pred_w = exp(d_log_w) * w; indices.begin() + kth,
const T pred_h = exp(d_log_h) * h; indices.end(),
[&v](int64_t lhs, int64_t rhs) { return v[lhs] < v[rhs]; });
bbox[0] = pred_ctr_x - (T)0.5 * pred_w; }
bbox[1] = pred_ctr_y - (T)0.5 * pred_h;
bbox[2] = pred_ctr_x + (T)0.5 * pred_w;
bbox[3] = pred_ctr_y + (T)0.5 * pred_h;
bbox[0] = std::max((T)0, std::min(bbox[0], im_w - 1));
bbox[1] = std::max((T)0, std::min(bbox[1], im_h - 1));
bbox[2] = std::max((T)0, std::min(bbox[2], im_w - 1));
bbox[3] = std::max((T)0, std::min(bbox[3], im_h - 1));
const T bbox_w = bbox[2] - bbox[0] + 1;
const T bbox_h = bbox[3] - bbox[1] + 1;
return (bbox_w >= min_box_w) * (bbox_h >= min_box_h);
} }
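The new ArgPartition helper is a thin wrapper over std::nth_element; np.argpartition yields the same partial ordering of indices, for example:

import numpy as np

scores = np.array([0.2, 0.9, 0.4, 0.8, 0.1], dtype=np.float32)
kth = 2
# Indices of the two largest scores, in no particular order,
# matching ArgPartition(count, kth, descend=True, ...).
top_idx = np.argpartition(-scores, kth)[:kth]
print(sorted(top_idx.tolist()))  # [1, 3]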
/*!
* Box API
*/
template <typename T> template <typename T>
inline void BBoxTransform( inline void BBoxTransform(
const T dx, const T dx,
...@@ -126,28 +118,28 @@ inline void GenerateAnchors( ...@@ -126,28 +118,28 @@ inline void GenerateAnchors(
} }
template <typename T> template <typename T>
inline void GenerateGridAnchors( inline void GetShiftedAnchors(
const int num_proposals, const int num_proposals,
const int num_anchors, const int num_anchors,
const int feat_h, const int feat_h,
const int feat_w, const int feat_w,
const int stride, const int stride,
const int base_offset, const int stride_offset,
const T* anchors, const T* base_anchors,
const int64_t* indices, const int64_t* indices,
T* proposals) { T* shifted_anchors) {
T x, y; T x, y;
int idx_3d, a, h, w; int idx_3d, a, h, w;
int idx_range = num_anchors * feat_h * feat_w; int idx_range = num_anchors * feat_h * feat_w;
for (int i = 0; i < num_proposals; ++i) { for (int i = 0; i < num_proposals; ++i) {
idx_3d = (int)indices[i] - base_offset; idx_3d = (int)indices[i] - stride_offset;
if (idx_3d >= 0 && idx_3d < idx_range) { if (idx_3d >= 0 && idx_3d < idx_range) {
w = idx_3d % feat_w; w = idx_3d % feat_w;
h = (idx_3d / feat_w) % feat_h; h = (idx_3d / feat_w) % feat_h;
a = idx_3d / feat_w / feat_h; a = idx_3d / feat_w / feat_h;
x = (T)w * stride, y = (T)h * stride; x = (T)w * stride, y = (T)h * stride;
auto* A = anchors + a * 4; auto* A = base_anchors + a * 4;
auto* P = proposals + i * 5; auto* P = shifted_anchors + i * 5;
P[0] = x + A[0], P[1] = y + A[1]; P[0] = x + A[0], P[1] = y + A[1];
P[2] = x + A[2], P[3] = y + A[3]; P[2] = x + A[2], P[3] = y + A[3];
} }
...@@ -155,20 +147,20 @@ inline void GenerateGridAnchors( ...@@ -155,20 +147,20 @@ inline void GenerateGridAnchors(
} }
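The renamed GetShiftedAnchors only materializes the grid cells hit by the selected indices; a dense NumPy version of the same shift, shown purely for illustration (the operator's internal anchor/cell ordering may differ), looks like:

import numpy as np

def dense_shifted_anchors(base_anchors, feat_h, feat_w, stride):
    # Tile the A base anchors over every (h, w) cell of the feature map,
    # shifting each by stride-spaced offsets.
    shift_x = np.arange(feat_w) * stride
    shift_y = np.arange(feat_h) * stride
    sx, sy = np.meshgrid(shift_x, shift_y)
    shifts = np.stack([sx.ravel(), sy.ravel(), sx.ravel(), sy.ravel()], axis=1)
    # (K, 1, 4) + (1, A, 4) -> (K * A, 4)
    return (shifts[:, None, :] + base_anchors[None, :, :]).reshape(-1, 4)

base = np.array([[-8., -8., 8., 8.]])
print(dense_shifted_anchors(base, feat_h=2, feat_w=2, stride=16).shape)  # (4, 4)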
template <typename T> template <typename T>
inline void GenerateGridAnchors( inline void GetShiftedAnchors(
const int num_proposals, const int num_proposals,
const int num_classes, const int num_classes,
const int num_anchors, const int num_anchors,
const int feat_h, const int feat_h,
const int feat_w, const int feat_w,
const int stride, const int stride,
const int base_offset, const int stride_offset,
const T* anchors, const T* base_anchors,
const int64_t* indices, const int64_t* indices,
T* proposals) { T* shifted_anchors) {
T x, y; T x, y;
int idx_4d, a, h, w; int idx_4d, a, h, w;
int lr = num_classes * base_offset; int lr = num_classes * stride_offset;
int rr = num_classes * (num_anchors * feat_h * feat_w); int rr = num_classes * (num_anchors * feat_h * feat_w);
for (int i = 0; i < num_proposals; ++i) { for (int i = 0; i < num_proposals; ++i) {
idx_4d = (int)indices[i] - lr; idx_4d = (int)indices[i] - lr;
...@@ -178,8 +170,8 @@ inline void GenerateGridAnchors( ...@@ -178,8 +170,8 @@ inline void GenerateGridAnchors(
h = (idx_4d / feat_w) % feat_h; h = (idx_4d / feat_w) % feat_h;
a = idx_4d / feat_w / feat_h; a = idx_4d / feat_w / feat_h;
x = (T)w * stride, y = (T)h * stride; x = (T)w * stride, y = (T)h * stride;
auto* A = anchors + a * 4; auto* A = base_anchors + a * 4;
auto* P = proposals + i * 7 + 1; auto* P = shifted_anchors + i * 7 + 1;
P[0] = x + A[0], P[1] = y + A[1]; P[0] = x + A[0], P[1] = y + A[1];
P[2] = x + A[2], P[3] = y + A[3]; P[2] = x + A[2], P[3] = y + A[3];
} }
...@@ -190,22 +182,30 @@ inline void GenerateGridAnchors( ...@@ -190,22 +182,30 @@ inline void GenerateGridAnchors(
* Proposal API * Proposal API
*/ */
template <typename T, class Context>
void SelectProposals(
const int count,
const float score_thresh,
const T* input_scores,
vector<T>& output_scores,
vector<int64_t>& output_indices,
Context* ctx);
template <typename T> template <typename T>
void GenerateSSProposals( void GenerateProposals_v1(
const int K, const int K,
const int num_proposals, const int num_proposals,
const float im_h, const float im_h,
const float im_w, const float im_w,
const float min_box_h,
const float min_box_w,
const T* scores, const T* scores,
const T* deltas, const T* deltas,
const int64_t* indices, const int64_t* indices,
T* proposals) { T* proposals) {
// Shifted anchors in format: [K, A, 4]
int64_t index, a, k; int64_t index, a, k;
const float* delta; const T* delta;
float* proposal = proposals; T* proposal = proposals;
float dx, dy, d_log_w, d_log_h; T dx, dy, d_log_w, d_log_h;
for (int i = 0; i < num_proposals; ++i) { for (int i = 0; i < num_proposals; ++i) {
index = indices[i]; index = indices[i];
a = index / K, k = index % K; a = index / K, k = index % K;
...@@ -214,61 +214,42 @@ void GenerateSSProposals( ...@@ -214,61 +214,42 @@ void GenerateSSProposals(
dy = delta[(a * 4 + 1) * K]; dy = delta[(a * 4 + 1) * K];
d_log_w = delta[(a * 4 + 2) * K]; d_log_w = delta[(a * 4 + 2) * K];
d_log_h = delta[(a * 4 + 3) * K]; d_log_h = delta[(a * 4 + 3) * K];
proposal[4] = FilterBoxes( BBoxTransform(dx, dy, d_log_w, d_log_h, im_w, im_h, T(1), T(1), proposal);
dx, proposal[4] = scores[index];
dy,
d_log_w,
d_log_h,
im_w,
im_h,
min_box_w,
min_box_h,
proposal) *
scores[index];
proposal += 5; proposal += 5;
} }
} }
template <typename T> template <typename T>
void GenerateMSProposals( void GenerateProposals(
const int num_candidates, const int num_candidates,
const int num_proposals, const int num_proposals,
const float im_h, const float im_h,
const float im_w, const float im_w,
const float min_box_h,
const float min_box_w,
const T* scores, const T* scores,
const T* deltas, const T* deltas,
const int64_t* indices, const int64_t* indices,
T* proposals) { T* proposals) {
// Shifted anchors in format: [4, A, K]
int64_t index; int64_t index;
int64_t num_candidates_2x = 2 * num_candidates; int64_t num_candidates_2x = 2 * num_candidates;
int64_t num_candidates_3x = 3 * num_candidates; int64_t num_candidates_3x = 3 * num_candidates;
float* proposal = proposals; T* proposal = proposals;
float dx, dy, d_log_w, d_log_h; T dx, dy, d_log_w, d_log_h;
for (int i = 0; i < num_proposals; ++i) { for (int i = 0; i < num_proposals; ++i) {
index = indices[i]; index = indices[i];
dx = deltas[index]; dx = deltas[index];
dy = deltas[num_candidates + index]; dy = deltas[num_candidates + index];
d_log_w = deltas[num_candidates_2x + index]; d_log_w = deltas[num_candidates_2x + index];
d_log_h = deltas[num_candidates_3x + index]; d_log_h = deltas[num_candidates_3x + index];
proposal[4] = FilterBoxes( BBoxTransform(dx, dy, d_log_w, d_log_h, im_w, im_h, T(1), T(1), proposal);
dx, proposal[4] = scores[index];
dy,
d_log_w,
d_log_h,
im_w,
im_h,
min_box_w,
min_box_h,
proposal) *
scores[index];
proposal += 5; proposal += 5;
} }
} }
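GenerateProposals now defers the box math entirely to BBoxTransform and drops the min-size filtering. Assuming unit image scales, as the T(1), T(1) arguments above suggest, the decoding is the usual delta parameterization; a NumPy sketch:

import numpy as np

def bbox_transform_inv(anchors, deltas, im_h, im_w):
    # Decode (dx, dy, d_log_w, d_log_h) against anchors, then clip to the image.
    w = anchors[:, 2] - anchors[:, 0] + 1.0
    h = anchors[:, 3] - anchors[:, 1] + 1.0
    ctr_x = anchors[:, 0] + 0.5 * w
    ctr_y = anchors[:, 1] + 0.5 * h
    pred_ctr_x = deltas[:, 0] * w + ctr_x
    pred_ctr_y = deltas[:, 1] * h + ctr_y
    pred_w = np.exp(deltas[:, 2]) * w
    pred_h = np.exp(deltas[:, 3]) * h
    boxes = np.stack([pred_ctr_x - 0.5 * pred_w, pred_ctr_y - 0.5 * pred_h,
                      pred_ctr_x + 0.5 * pred_w, pred_ctr_y + 0.5 * pred_h], axis=1)
    boxes[:, 0::2] = boxes[:, 0::2].clip(0, im_w - 1)
    boxes[:, 1::2] = boxes[:, 1::2].clip(0, im_h - 1)
    return boxes

anchors = np.array([[0., 0., 15., 15.]])
deltas = np.zeros((1, 4))
print(bbox_transform_inv(anchors, deltas, im_h=600, im_w=800))  # anchor is unchanged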
template <typename T> template <typename T>
void GenerateMCProposals( void GenerateDetections(
const int num_proposals, const int num_proposals,
const int num_boxes, const int num_boxes,
const int num_classes, const int num_classes,
...@@ -280,11 +261,11 @@ void GenerateMCProposals( ...@@ -280,11 +261,11 @@ void GenerateMCProposals(
const T* scores, const T* scores,
const T* deltas, const T* deltas,
const int64_t* indices, const int64_t* indices,
T* proposals) { T* detections) {
int64_t index, cls; int64_t index, cls;
int64_t num_boxes_2x = 2 * num_boxes; int64_t num_boxes_2x = 2 * num_boxes;
int64_t num_boxes_3x = 3 * num_boxes; int64_t num_boxes_3x = 3 * num_boxes;
float* proposal = proposals; T* detection = detections;
float dx, dy, d_log_w, d_log_h; float dx, dy, d_log_w, d_log_h;
for (int i = 0; i < num_proposals; ++i) { for (int i = 0; i < num_proposals; ++i) {
cls = indices[i] % num_classes; cls = indices[i] % num_classes;
...@@ -293,7 +274,7 @@ void GenerateMCProposals( ...@@ -293,7 +274,7 @@ void GenerateMCProposals(
dy = deltas[num_boxes + index]; dy = deltas[num_boxes + index];
d_log_w = deltas[num_boxes_2x + index]; d_log_w = deltas[num_boxes_2x + index];
d_log_h = deltas[num_boxes_3x + index]; d_log_h = deltas[num_boxes_3x + index];
proposal[0] = im_idx; detection[0] = im_idx;
BBoxTransform( BBoxTransform(
dx, dx,
dy, dy,
...@@ -303,10 +284,11 @@ void GenerateMCProposals( ...@@ -303,10 +284,11 @@ void GenerateMCProposals(
im_h, im_h,
im_scale_h, im_scale_h,
im_scale_w, im_scale_w,
proposal + 1); detection + 1);
proposal[5] = scores[indices[i]]; // detection[5] = scores[indices[i]];
proposal[6] = cls + 1; detection[5] = scores[i];
proposal += 7; detection[6] = cls + 1;
detection += 7;
} }
} }
......
...@@ -8,7 +8,6 @@ ...@@ -8,7 +8,6 @@
# <https://opensource.org/licenses/BSD-2-Clause> # <https://opensource.org/licenses/BSD-2-Clause>
# #
# ------------------------------------------------------------ # ------------------------------------------------------------
"""Compile the cython extensions.""" """Compile the cython extensions."""
from __future__ import absolute_import from __future__ import absolute_import
...@@ -36,7 +35,7 @@ ext_modules = [ ...@@ -36,7 +35,7 @@ ext_modules = [
include_dirs=[np.get_include()] include_dirs=[np.get_include()]
), ),
Extension( Extension(
'install.lib.pycocotools._mask', 'install.lib.utils.pycocotools._mask',
['maskApi.c', '_mask.pyx'], ['maskApi.c', '_mask.pyx'],
include_dirs=[np.get_include(), os.path.dirname(os.path.abspath(__file__))], include_dirs=[np.get_include(), os.path.dirname(os.path.abspath(__file__))],
extra_compile_args=['-w'] extra_compile_args=['-w']
......
...@@ -8,7 +8,6 @@ ...@@ -8,7 +8,6 @@
# <https://opensource.org/licenses/BSD-2-Clause> # <https://opensource.org/licenses/BSD-2-Clause>
# #
# ------------------------------------------------------------ # ------------------------------------------------------------
"""Make record file for COCO dataset.""" """Make record file for COCO dataset."""
from __future__ import absolute_import from __future__ import absolute_import
...@@ -27,14 +26,12 @@ if __name__ == '__main__': ...@@ -27,14 +26,12 @@ if __name__ == '__main__':
# Encode masks to RLE bytes # Encode masks to RLE bytes
if not os.path.exists('build'): if not os.path.exists('build'):
os.makedirs('build') os.makedirs('build')
make_mask('train', '2014', COCO_ROOT) make_mask('train', '2014', COCO_ROOT)
make_mask('valminusminival', '2014', COCO_ROOT) make_mask('valminusminival', '2014', COCO_ROOT)
make_mask('minival', '2014', COCO_ROOT) make_mask('minival', '2014', COCO_ROOT)
merge_mask('trainval35k', '2014', [ merge_mask('trainval35k', '2014', ['build/coco_2014_train_mask.pkl',
'build/coco_2014_train_mask.pkl', 'build/coco_2014_valminusminival_mask.pkl'])
'build/coco_2014_valminusminival_mask.pkl']
)
# coco_2014_trainval35k # coco_2014_trainval35k
make_record( make_record(
......
...@@ -10,17 +10,13 @@ ...@@ -10,17 +10,13 @@
# ------------------------------------------------------------ # ------------------------------------------------------------
import os import os
import pickle
import time import time
import cv2 import cv2
import dragon import dragon
import numpy as np import numpy as np
try:
import cPickle
except:
import pickle as cPickle
def make_example(image_file, mask_objects, im_scale=None): def make_example(image_file, mask_objects, im_scale=None):
filename = os.path.split(image_file)[-1] filename = os.path.split(image_file)[-1]
...@@ -52,6 +48,7 @@ def make_example(image_file, mask_objects, im_scale=None): ...@@ -52,6 +48,7 @@ def make_example(image_file, mask_objects, im_scale=None):
'xmax': x2, 'xmax': x2,
'ymax': y2, 'ymax': y2,
'mask': obj['mask'], 'mask': obj['mask'],
'polygons': obj['polygons'],
'difficult': obj.get('crowd', 0), 'difficult': obj.get('crowd', 0),
}) })
...@@ -80,7 +77,7 @@ def make_record( ...@@ -80,7 +77,7 @@ def make_record(
if mask_file is not None: if mask_file is not None:
with open(mask_file, 'rb') as f: with open(mask_file, 'rb') as f:
all_masks = cPickle.load(f) all_masks = pickle.load(f)
else: else:
all_masks = {} all_masks = {}
...@@ -101,6 +98,7 @@ def make_record( ...@@ -101,6 +98,7 @@ def make_record(
'xmax': 'float64', 'xmax': 'float64',
'ymax': 'float64', 'ymax': 'float64',
'mask': 'bytes', 'mask': 'bytes',
'polygons': [['float64']],
'difficult': 'int64', 'difficult': 'int64',
}] }]
} }
...@@ -111,10 +109,22 @@ def make_record( ...@@ -111,10 +109,22 @@ def make_record(
for db_idx, split in enumerate(splits): for db_idx, split in enumerate(splits):
split_file = os.path.join(splits_path[db_idx], split + '.txt') split_file = os.path.join(splits_path[db_idx], split + '.txt')
assert os.path.exists(split_file) if not os.path.exists(split_file):
with open(split_file, 'r') as f: # Fall back to the split provided in JSON format
lines = f.readlines() split_file = os.path.join(splits_path[db_idx], split + '.json')
total_line += len(lines) if not os.path.exists(split_file):
raise FileNotFoundError('Unable to find the split: ' + split)
with open(split_file, 'r') as f:
import json
images_info = json.load(f)
total_line += len(images_info['images'])
lines = []
for info in images_info['images']:
lines.append(os.path.splitext(info['file_name'])[0])
else:
with open(split_file, 'r') as f:
lines = f.readlines()
total_line += len(lines)
for line in lines: for line in lines:
count += 1 count += 1
if count % 2000 == 0: if count % 2000 == 0:
...@@ -123,10 +133,8 @@ def make_record( ...@@ -123,10 +133,8 @@ def make_record(
count, total_line, now_time - start_time)) count, total_line, now_time - start_time))
filename = line.strip() filename = line.strip()
image_file = os.path.join(images_path[db_idx], filename + ext) image_file = os.path.join(images_path[db_idx], filename + ext)
mask_objects = all_masks[filename] if filename in all_masks else None mask_objects = all_masks[filename] if filename in all_masks else {}
if mask_objects is None: writer.write(make_example(image_file, mask_objects, im_scale))
raise ValueError('The image({}) takes invalid mask settings.'.format(filename))
writer.write( make_example(image_file, mask_objects, im_scale))
now_time = time.time() now_time = time.time()
print('{} / {} in {:.2f} sec'.format(count, total_line, now_time - start_time)) print('{} / {} in {:.2f} sec'.format(count, total_line, now_time - start_time))
......
...@@ -9,19 +9,17 @@ ...@@ -9,19 +9,17 @@
# #
# ------------------------------------------------------------ # ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import os import os
import sys
import os.path as osp import os.path as osp
from collections import OrderedDict import pickle
try:
import cPickle
except:
import pickle as cPickle
sys.path.insert(0, '../..') from seetadet.utils.pycocotools import mask_utils
from seetadet.pycocotools.coco import COCO from seetadet.utils.pycocotools.coco import COCO
from seetadet.pycocotools import mask_utils
class COCOWrapper(object): class COCOWrapper(object):
...@@ -31,7 +29,7 @@ class COCOWrapper(object): ...@@ -31,7 +29,7 @@ class COCOWrapper(object):
self._data_path = osp.join(data_dir) self._data_path = osp.join(data_dir)
self.invalid_cnt = 0 self.invalid_cnt = 0
self.ignore_cnt = 0 self.ignore_cnt = 0
# Load COCO API, classes, class <-> id mappings # Load COCO API, classes, class <-> id mappings
self._COCO = COCO(self._get_ann_file()) self._COCO = COCO(self._get_ann_file())
cats = self._COCO.loadCats(self._COCO.getCatIds()) cats = self._COCO.loadCats(self._COCO.getCatIds())
...@@ -39,9 +37,8 @@ class COCOWrapper(object): ...@@ -39,9 +37,8 @@ class COCOWrapper(object):
self._class_to_ind = dict(zip(self._classes, range(self.num_classes))) self._class_to_ind = dict(zip(self._classes, range(self.num_classes)))
self._ind_to_class = dict(zip(range(self.num_classes), self._classes)) self._ind_to_class = dict(zip(range(self.num_classes), self._classes))
self._class_to_cat_id = dict(zip([c['name'] for c in cats], self._COCO.getCatIds())) self._class_to_cat_id = dict(zip([c['name'] for c in cats], self._COCO.getCatIds()))
self._cat_id_to_class_id = dict([(self._class_to_cat_id[cls], self._cat_id_to_class_id = dict([(self._class_to_cat_id[cls], self._class_to_ind[cls])
self._class_to_ind[cls]) for cls in self._classes[1:]])
for cls in self._classes[1:]])
self._data_name = { self._data_name = {
# 5k ``val2014`` subset # 5k ``val2014`` subset
'minival2014': 'val2014', 'minival2014': 'val2014',
...@@ -56,10 +53,10 @@ class COCOWrapper(object): ...@@ -56,10 +53,10 @@ class COCOWrapper(object):
if self._image_set.find('test') == -1 \ if self._image_set.find('test') == -1 \
else 'image_info' else 'image_info'
return osp.join( return osp.join(
self._data_path, self._data_path,
'annotations', 'annotations',
prefix + '_' + prefix + '_' +
self._image_set + self._image_set +
self._year + '.json' self._year + '.json'
) )
...@@ -107,31 +104,32 @@ class COCOWrapper(object): ...@@ -107,31 +104,32 @@ class COCOWrapper(object):
y1 = float(max(0, obj['bbox'][1])) y1 = float(max(0, obj['bbox'][1]))
x2 = float(min(width - 1, x1 + max(0, obj['bbox'][2] - 1))) x2 = float(min(width - 1, x1 + max(0, obj['bbox'][2] - 1)))
y2 = float(min(height - 1, y1 + max(0, obj['bbox'][3] - 1))) y2 = float(min(height - 1, y1 + max(0, obj['bbox'][3] - 1)))
mask, polygons = b'', []
if isinstance(obj['segmentation'], list): if isinstance(obj['segmentation'], list):
for p in obj['segmentation']: for p in obj['segmentation']:
if len(p) < 6: if len(p) < 6:
print('Remove Invalid segm.') print('Remove Invalid segm.')
# Valid polygons have >= 3 points, so require >= 6 coordinates # Valid polygons have >= 3 points, so require >= 6 coordinates
poly = [p for p in obj['segmentation'] if len(p) >= 6] polygons = [p for p in obj['segmentation'] if len(p) >= 6]
mask_bytes = mask_utils.poly2bytes(poly, height, width) # mask_bytes = mask_utils.poly2bytes(poly, height, width)
else: else:
# Crowd masks # Crowd masks
# Some are encoded with height or width # Some are encoded with height or width
# running out of the image bound # running out of the image bound
# Do not use them or decoding error is inevitable # Do not use them or decoding error is inevitable
mask_bytes = mask_utils.poly2bytes(obj['segmentation'], height, width) mask = mask_utils.poly2bytes(obj['segmentation'], height, width)
if obj['area'] > 0 and x2 > x1 and y2 > y1: if obj['area'] > 0 and x2 > x1 and y2 > y1:
obj['clean_bbox'] = [x1, y1, x2, y2] obj['clean_bbox'] = [x1, y1, x2, y2]
valid_objects.append({ valid_objects.append({
'bbox': [x1, y1, x2, y2], 'bbox': [x1, y1, x2, y2],
'mask': mask_bytes, 'mask': mask,
'polygons': polygons,
'category_id': obj['category_id'], 'category_id': obj['category_id'],
'class_id': self._cat_id_to_class_id[obj['category_id']], 'class_id': self._cat_id_to_class_id[obj['category_id']],
'crowd': obj['iscrowd'], 'crowd': obj['iscrowd'],
}) })
valid_objects[-1]['name'] = \ valid_objects[-1]['name'] = \
self._ind_to_class[valid_objects[-1]['class_id']] self._ind_to_class[valid_objects[-1]['class_id']]
return height, width, valid_objects return height, width, valid_objects
@property @property
...@@ -150,31 +148,35 @@ def make_mask(split, year, data_dir): ...@@ -150,31 +148,35 @@ def make_mask(split, year, data_dir):
if not osp.exists(osp.join(coco._data_path, 'splits')): if not osp.exists(osp.join(coco._data_path, 'splits')):
os.makedirs(osp.join(coco._data_path, 'splits')) os.makedirs(osp.join(coco._data_path, 'splits'))
gt_recs = OrderedDict() gt_recs = collections.OrderedDict()
for i in range(coco.num_images): for i in range(coco.num_images):
filename = (coco.image_path_at(i).split('/')[-1]).split('.')[0] filename = osp.basename(coco.image_path_at(i)).split('.')[0]
h, w, objects = coco.annotation_at(i) h, w, objects = coco.annotation_at(i)
gt_recs[filename] = objects gt_recs[filename] = objects
with open(osp.join('build', 'coco_' + year + '_' + split + '_mask.pkl'), 'wb') as f: with open(osp.join('build',
cPickle.dump(gt_recs, f, cPickle.HIGHEST_PROTOCOL) 'coco_' + year +
'_' + split + '_mask.pkl'), 'wb') as f:
pickle.dump(gt_recs, f, pickle.HIGHEST_PROTOCOL)
with open(osp.join(coco._data_path, 'splits', split + '.txt'), 'w') as f: with open(osp.join(coco._data_path, 'splits', split + '.txt'), 'w') as f:
for i in range(coco.num_images): for i in range(coco.num_images):
filename = (coco.image_path_at(i).split('/')[-1]).split('.')[0] filename = str(osp.basename(coco.image_path_at(i)).split('.')[0])
if i != coco.num_images - 1: if i != coco.num_images - 1:
filename += '\n' filename += '\n'
f.write(filename) f.write(filename)
def merge_mask(split, year, mask_files): def merge_mask(split, year, mask_files):
gt_recs = OrderedDict() gt_recs = collections.OrderedDict()
data_path = os.path.dirname(mask_files[0]) data_path = os.path.dirname(mask_files[0])
for mask_file in mask_files: for mask_file in mask_files:
with open(mask_file, 'rb') as f: with open(mask_file, 'rb') as f:
recs = cPickle.load(f) recs = pickle.load(f)
gt_recs.update(recs) gt_recs.update(recs)
with open(osp.join(data_path, 'coco_' + year + '_' + split + '_mask.pkl'), 'wb') as f: with open(osp.join(data_path,
cPickle.dump(gt_recs, f, cPickle.HIGHEST_PROTOCOL) 'coco_' + year +
'_' + split + '_mask.pkl'), 'wb') as f:
pickle.dump(gt_recs, f, pickle.HIGHEST_PROTOCOL)
...@@ -132,4 +132,3 @@ def make_record( ...@@ -132,4 +132,3 @@ def make_record(
data_size = os.path.getsize(record_file + '/root.data') * 1e-6 data_size = os.path.getsize(record_file + '/root.data') * 1e-6
print('{} images take {:.2f} MB in {:.2f} sec.' print('{} images take {:.2f} MB in {:.2f} sec.'
.format(len(entries), data_size, end_time - start_time)) .format(len(entries), data_size, end_time - start_time))
...@@ -8,7 +8,6 @@ ...@@ -8,7 +8,6 @@
# <https://opensource.org/licenses/BSD-2-Clause> # <https://opensource.org/licenses/BSD-2-Clause>
# #
# ------------------------------------------------------------ # ------------------------------------------------------------
"""Make record file for VOC dataset.""" """Make record file for VOC dataset."""
from __future__ import absolute_import from __future__ import absolute_import
...@@ -29,7 +28,7 @@ if __name__ == '__main__': ...@@ -29,7 +28,7 @@ if __name__ == '__main__':
annotations_path=[osp.join(voc_root, 'VOCdevkit2007/VOC2007/Annotations'), annotations_path=[osp.join(voc_root, 'VOCdevkit2007/VOC2007/Annotations'),
osp.join(voc_root, 'VOCdevkit2012/VOC2012/Annotations')], osp.join(voc_root, 'VOCdevkit2012/VOC2012/Annotations')],
splits_path=[osp.join(voc_root, 'VOCdevkit2007/VOC2007/ImageSets/Main'), splits_path=[osp.join(voc_root, 'VOCdevkit2007/VOC2007/ImageSets/Main'),
osp.join(voc_root, 'VOCdevkit2012/VOC2012/ImageSets/Main')], osp.join(voc_root, 'VOCdevkit2012/VOC2012/ImageSets/Main')],
splits=['trainval', 'trainval'] splits=['trainval', 'trainval']
) )
......
...@@ -8,3 +8,11 @@ ...@@ -8,3 +8,11 @@
# <https://opensource.org/licenses/BSD-2-Clause> # <https://opensource.org/licenses/BSD-2-Clause>
# #
# ------------------------------------------------------------ # ------------------------------------------------------------
"""A platform implementing popular object detection algorithms."""
from __future__ import absolute_import as _absolute_import
from __future__ import division as _division
from __future__ import print_function as _print_function
# Version
from seetadet.version import version as __version__
...@@ -8,3 +8,9 @@ ...@@ -8,3 +8,9 @@
# <https://opensource.org/licenses/BSD-2-Clause> # <https://opensource.org/licenses/BSD-2-Clause>
# #
# ------------------------------------------------------------ # ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.algo.common.anchor_sampler import AnchorSampler
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.core.config import cfg
class AnchorSampler(object):
"""Sample precomputed anchors asynchronously."""
def __init__(self):
self._rpn_target = None
self._retinanet_target = None
self._ssd_target = None
if 'rcnn' in cfg.MODEL.TYPE:
from seetadet.algo.faster_rcnn import anchor_target
self._rpn_target = anchor_target.AnchorTarget()
elif cfg.MODEL.TYPE == 'retinanet':
from seetadet.algo.retinanet import anchor_target
self._retinanet_target = anchor_target.AnchorTarget()
elif cfg.MODEL.TYPE == 'ssd':
from seetadet.algo.ssd import anchor_target
self._ssd_target = anchor_target.AnchorTarget()
def __call__(self, **inputs):
"""Return the sample anchors."""
if self._rpn_target:
fg_inds, bg_inds = \
self._rpn_target.sample_anchors(
gt_boxes=inputs['gt_boxes'],
im_info=inputs['im_info'],
)
return {'fg_inds': fg_inds, 'bg_inds': bg_inds}
if self._retinanet_target:
fg_inds, ignore_inds = \
self._retinanet_target.sample_anchors(
gt_boxes=inputs['gt_boxes'],
im_info=inputs['im_info'],
)
return {'fg_inds': fg_inds, 'bg_inds': ignore_inds}
if self._ssd_target:
fg_inds, neg_inds = \
self._ssd_target.sample_anchors(
gt_boxes=inputs['gt_boxes'],
)
return {'fg_inds': fg_inds, 'bg_inds': neg_inds}
return {}
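A rough usage sketch of the new AnchorSampler; it assumes an 'rcnn'-style cfg.MODEL.TYPE and the gt_boxes/im_info layout produced by the transformer further below, so take it as orientation rather than a verified snippet:

import numpy as np
from seetadet.algo.common.anchor_sampler import AnchorSampler

sampler = AnchorSampler()  # picks the target module from cfg.MODEL.TYPE
gt_boxes = np.array([[10., 20., 120., 180., 1.]], dtype=np.float32)  # x1, y1, x2, y2, class
im_info = (600, 800, 1.0)  # height, width, scale
targets = sampler(gt_boxes=gt_boxes, im_info=im_info)
# e.g. {'fg_inds': ..., 'bg_inds': ...} for RPN; an empty dict if no target module matched.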
...@@ -17,7 +17,3 @@ from seetadet.algo.faster_rcnn.anchor_target import AnchorTarget ...@@ -17,7 +17,3 @@ from seetadet.algo.faster_rcnn.anchor_target import AnchorTarget
from seetadet.algo.faster_rcnn.data_loader import DataLoader from seetadet.algo.faster_rcnn.data_loader import DataLoader
from seetadet.algo.faster_rcnn.proposal import Proposal from seetadet.algo.faster_rcnn.proposal import Proposal
from seetadet.algo.faster_rcnn.proposal_target import ProposalTarget from seetadet.algo.faster_rcnn.proposal_target import ProposalTarget
from seetadet.algo.faster_rcnn.utils import generate_grid_anchors
from seetadet.algo.faster_rcnn.utils import map_blobs_by_levels
from seetadet.algo.faster_rcnn.utils import map_rois_to_levels
from seetadet.algo.faster_rcnn.utils import map_returns_to_blobs
...@@ -13,8 +13,11 @@ from __future__ import absolute_import ...@@ -13,8 +13,11 @@ from __future__ import absolute_import
from __future__ import division from __future__ import division
from __future__ import print_function from __future__ import print_function
import collections
import multiprocessing as mp import multiprocessing as mp
import time import time
import threading
import queue
import dragon import dragon
import dragon.vm.torch as torch import dragon.vm.torch as torch
...@@ -23,8 +26,8 @@ import numpy as np ...@@ -23,8 +26,8 @@ import numpy as np
from seetadet.algo.faster_rcnn import data_transformer from seetadet.algo.faster_rcnn import data_transformer
from seetadet.core.config import cfg from seetadet.core.config import cfg
from seetadet.datasets.factory import get_dataset from seetadet.datasets.factory import get_dataset
from seetadet.utils import blob as blob_util
from seetadet.utils import logger from seetadet.utils import logger
from seetadet.utils.blob import im_list_to_blob
class DataLoader(object): class DataLoader(object):
...@@ -33,28 +36,24 @@ class DataLoader(object): ...@@ -33,28 +36,24 @@ class DataLoader(object):
def __init__(self): def __init__(self):
super(DataLoader, self).__init__() super(DataLoader, self).__init__()
dataset = get_dataset(cfg.TRAIN.DATASET) dataset = get_dataset(cfg.TRAIN.DATASET)
if cfg.USE_DALI: self.iterator = Iterator(**{
from seetadet.dali import rcnn_pipeline as pipe 'dataset': dataset.cls,
self.iterator = pipe.new_iterator(dataset.source) 'source': dataset.source,
else: 'classes': dataset.classes,
self.iterator = Iterator(**{ 'shuffle': cfg.TRAIN.USE_SHUFFLE,
'dataset': dataset.cls, 'batch_size': cfg.TRAIN.IMS_PER_BATCH * 2,
'source': dataset.source, 'num_transformers': cfg.TRAIN.NUM_THREADS - 1,
'classes': dataset.classes, })
'shuffle': cfg.TRAIN.USE_SHUFFLE, self.iterator.start()
'num_chunks': cfg.TRAIN.SHUFFLE_CHUNKS,
'batch_size': cfg.TRAIN.IMS_PER_BATCH * 2,
'num_transformers': cfg.TRAIN.NUM_THREADS - 1,
})
def __call__(self): def __call__(self):
outputs = self.iterator.next() outputs = self.iterator.next()
if isinstance(outputs['data'], np.ndarray): if isinstance(outputs['image'], np.ndarray):
outputs['data'] = torch.from_numpy(outputs['data']) outputs['image'] = torch.from_numpy(outputs['image'])
return outputs return outputs
class Iterator(mp.Process): class Iterator(threading.Thread):
"""Iterator to return the batch of data.""" """Iterator to return the batch of data."""
def __init__(self, **kwargs): def __init__(self, **kwargs):
...@@ -68,17 +67,16 @@ class Iterator(mp.Process): ...@@ -68,17 +67,16 @@ class Iterator(mp.Process):
rank = dragon.distributed.get_rank(process_group) rank = dragon.distributed.get_rank(process_group)
# Configuration # Configuration
self._prefetch = kwargs.get('prefetch', 5)
self._batch_size = kwargs.get('batch_size', 2) self._batch_size = kwargs.get('batch_size', 2)
self._num_readers = kwargs.get('num_readers', 1) self._num_readers = kwargs.get('num_readers', 1)
self._num_transformers = kwargs.get('num_transformers', 3) self._num_transformers = kwargs.get('num_transformers', 3)
self.daemon = True self.daemon = True
# Initialize queues # Initialize queues
num_batches = self._prefetch * self._num_readers num_batches = self._num_readers
self.q_in = mp.Queue(num_batches * self._batch_size) self._queue1 = mp.Queue(num_batches * self._batch_size)
self.q1_out = mp.Queue(num_batches * self._batch_size) self._queue2 = mp.Queue(num_batches * self._batch_size)
self.q2_out = mp.Queue(num_batches * self._batch_size) self._queue3 = queue.Queue(num_batches)
# Initialize readers # Initialize readers
self._readers = [] self._readers = []
...@@ -89,7 +87,7 @@ class Iterator(mp.Process): ...@@ -89,7 +87,7 @@ class Iterator(mp.Process):
self._readers.append(dragon.io.DataReader( self._readers.append(dragon.io.DataReader(
part_idx=part_idx, num_parts=num_parts, **kwargs)) part_idx=part_idx, num_parts=num_parts, **kwargs))
self._readers[i]._seed += part_idx self._readers[i]._seed += part_idx
self._readers[i].q_out = self.q_in self._readers[i].q_out = self._queue1
self._readers[i].start() self._readers[i].start()
time.sleep(0.1) time.sleep(0.1)
...@@ -98,8 +96,7 @@ class Iterator(mp.Process): ...@@ -98,8 +96,7 @@ class Iterator(mp.Process):
for i in range(self._num_transformers): for i in range(self._num_transformers):
p = data_transformer.DataTransformer(**kwargs) p = data_transformer.DataTransformer(**kwargs)
p._seed += (i + rank * self._num_transformers) p._seed += (i + rank * self._num_transformers)
p.q_in = self.q_in p.q_in, p.q_out = self._queue1, self._queue2
p.q1_out, p.q2_out = self.q1_out, self.q2_out
p.start() p.start()
self._transformers.append(p) self._transformers.append(p)
time.sleep(0.1) time.sleep(0.1)
...@@ -122,35 +119,43 @@ class Iterator(mp.Process): ...@@ -122,35 +119,43 @@ class Iterator(mp.Process):
"""Return the next batch of data.""" """Return the next batch of data."""
return self.__next__() return self.__next__()
def run(self):
"""Main loop."""
num_images = cfg.TRAIN.IMS_PER_BATCH
num_batches = cfg.TRAIN.ASPECT_GROUPING
logger.info('Initialize prefetching batches...')
example_buffer = [self._queue2.get()
for _ in range(num_images * num_batches)]
next_examples = []
while True:
# Use cached buffer for next N examples
# Examples are sorted to simulate aspect grouping
if len(next_examples) == 0:
next_examples = example_buffer
next_examples.sort(key=lambda d: d['aspect_ratio'])
example_buffer = []
# Prepare the next batch
outputs = collections.defaultdict(list)
for i in range(num_images):
example = next_examples.pop(0)
outputs['image'].append(example['image'])
outputs['gt_boxes'].append(example['boxes'])
outputs['im_info'].append(example['im_info'])
outputs['fg_inds'].append(example.get('fg_inds', None))
outputs['bg_inds'].append(example.get('bg_inds', None))
example_buffer.append(self._queue2.get())
outputs['image'] = blob_util.im_list_to_blob(
outputs['image'], coarsest_stride=cfg.MODEL.COARSEST_STRIDE)
# Send batch data to consumer
self._queue3.put(outputs)
def __iter__(self): def __iter__(self):
"""Return the iterator self.""" """Return the iterator self."""
return self return self
def __next__(self): def __next__(self):
"""Return the next batch of data.""" """Return the next batch of data."""
q_out = None return self._queue3.get()
# Two queues to implement aspect-grouping
# This is necessary to reduce the gpu memory
# from fetching a huge square batch blob
while q_out is None:
if self.q1_out.qsize() >= cfg.TRAIN.IMS_PER_BATCH:
q_out = self.q1_out
elif self.q2_out.qsize() >= cfg.TRAIN.IMS_PER_BATCH:
q_out = self.q2_out
self.q1_out, self.q2_out = self.q2_out, self.q1_out
images, images_info, boxes_to_pack = [], [], []
for i in range(cfg.TRAIN.IMS_PER_BATCH):
image, image_scale, boxes = q_out.get()
images.append(image)
images_info.append(list(image.shape[:2]) + [image_scale])
gt_boxes = np.zeros((boxes.shape[0], boxes.shape[1] + 1), 'float32')
gt_boxes[:, :boxes.shape[1]], gt_boxes[:, -1] = boxes, i
boxes_to_pack.append(gt_boxes)
return {
'data': im_list_to_blob(images),
'ims_info': np.array(images_info, dtype=np.float32),
'gt_boxes': np.concatenate(boxes_to_pack),
}
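The rewritten Iterator replaces the two-queue trick with a sorted example buffer. Stripped of the queue plumbing, the grouping idea is simply the following toy sketch:

import numpy as np

def group_by_aspect_ratio(examples, ims_per_batch):
    # Sort by aspect ratio so each batch holds similarly shaped images,
    # which keeps the zero-padded blob from im_list_to_blob small.
    examples = sorted(examples, key=lambda d: d['aspect_ratio'])
    return [examples[i:i + ims_per_batch]
            for i in range(0, len(examples), ims_per_batch)]

buffer = [{'aspect_ratio': float(r)} for r in np.random.uniform(0.5, 2.0, size=8)]
for batch in group_by_aspect_ratio(buffer, ims_per_batch=2):
    print([round(d['aspect_ratio'], 2) for d in batch])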
...@@ -15,109 +15,122 @@ from __future__ import print_function ...@@ -15,109 +15,122 @@ from __future__ import print_function
import multiprocessing import multiprocessing
import cv2
import numpy as np import numpy as np
import numpy.random as npr
from seetadet.algo import common as algo_common
from seetadet.core.config import cfg from seetadet.core.config import cfg
from seetadet.datasets.example import Example from seetadet.datasets.example import Example
from seetadet.utils import boxes as box_util from seetadet.utils import boxes as box_util
from seetadet.utils.blob import prep_im_for_blob from seetadet.utils import image as image_util
class DataTransformer(multiprocessing.Process): class DataTransformer(multiprocessing.Process):
def __init__(self, **kwargs): def __init__(self, **kwargs):
super(DataTransformer, self).__init__() super(DataTransformer, self).__init__()
self._scales = cfg.TRAIN.SCALES self._scales = cfg.TRAIN.SCALES
self._random_scales = cfg.TRAIN.RANDOM_SCALES
self._max_size = cfg.TRAIN.MAX_SIZE self._max_size = cfg.TRAIN.MAX_SIZE
self._seed = cfg.RNG_SEED self._seed = cfg.RNG_SEED
self._use_flipped = cfg.TRAIN.USE_FLIPPED
self._use_diff = cfg.TRAIN.USE_DIFF self._use_diff = cfg.TRAIN.USE_DIFF
self._use_flipped = cfg.TRAIN.USE_FLIPPED
self._use_distort = cfg.TRAIN.USE_COLOR_JITTER
self._classes = kwargs.get('classes', ('__background__',)) self._classes = kwargs.get('classes', ('__background__',))
self._num_classes = len(self._classes) self._num_classes = len(self._classes)
self._class_to_ind = dict(zip(self._classes, range(self._num_classes))) self._class_to_ind = dict(zip(self._classes, range(self._num_classes)))
self.q_in = self.q1_out = self.q2_out = None self._anchor_sampler = algo_common.AnchorSampler()
self.q_in = self.q_out = None
self.daemon = True self.daemon = True
def make_roi_dict(self, example, im_scale, apply_flip=False): def get_boxes(self, example, im_scale):
objects, n_objects = example.objects, 0 objects, num_objects = example.objects, 0
height, width = example.height, example.width height, width = example.height, example.width
if not self._use_diff: if not self._use_diff:
for obj in objects: for obj in objects:
if obj.get('difficult', 0) == 0: if obj.get('difficult', 0) == 0:
n_objects += 1 num_objects += 1
else: else:
n_objects = len(objects) num_objects = len(objects)
roi_dict = { boxes = np.zeros((num_objects, 4), 'float32')
'boxes': np.zeros((n_objects, 4), 'float32'), gt_classes = np.zeros((num_objects,), 'float32')
'gt_classes': np.zeros((n_objects,), 'int32'),
}
# Filter the difficult instances # Filter the difficult instances
object_idx = 0 object_idx = 0
for obj in objects: for obj in objects:
if not self._use_diff and \ if not self._use_diff and obj.get('difficult', 0) > 0:
obj.get('difficult', 0) > 0:
continue continue
bbox = obj['bbox'] bbox = obj['bbox']
roi_dict['boxes'][object_idx, :] = [ boxes[object_idx, :] = [max(0, bbox[0]),
max(0, bbox[0]), max(0, bbox[1]),
max(0, bbox[1]), min(bbox[2], width - 1),
min(bbox[2], width - 1), min(bbox[3], height - 1)]
min(bbox[3], height - 1), gt_classes[object_idx] = self._class_to_ind[obj['name']]
]
roi_dict['gt_classes'][object_idx] = \
self._class_to_ind[obj['name']]
object_idx += 1 object_idx += 1
# Flip the boxes if necessary
if apply_flip:
roi_dict['boxes'] = \
box_util.flip_boxes(
roi_dict['boxes'],
width,
)
# Scale the boxes to the detecting scale # Scale the boxes to the detecting scale
roi_dict['boxes'] *= im_scale boxes *= im_scale
# Attach the classes
gt_boxes = np.empty((num_objects, 5), dtype=np.float32)
gt_boxes[:, :4], gt_boxes[:, 4] = boxes, gt_classes
return roi_dict return gt_boxes
def get(self, example): def get(self, example):
example = Example(example) example = Example(example)
img = example.image
# Scale # Resize
target_size = self._scales[np.random.randint(len(self._scales))] img, im_scale = image_util.resize_image_with_target_size(
img, im_scale = prep_im_for_blob(img, target_size, self._max_size) example.image,
target_size=npr.choice(self._scales),
max_size=self._max_size,
random_scales=self._random_scales,
)
# Flip # Flip
apply_flip = False flipped = False
if self._use_flipped: if self._use_flipped and npr.randint(2) > 0:
if np.random.randint(2) > 0: img = img[:, ::-1]
img = img[:, ::-1] flipped = True
apply_flip = True
# Distort
if self._use_distort:
img = image_util.distort_image(img)
# Boxes
boxes = self.get_boxes(example, im_scale)
# Flip the boxes if necessary
if flipped:
boxes = box_util.flip_boxes(boxes, img.shape[1])
# Example -> RoIDict # Standard outputs.
roi_dict = self.make_roi_dict(example, im_scale, apply_flip) outputs = {'image': img,
'boxes': boxes,
'im_info': img.shape[:2] + (im_scale,)}
# Post-Process for gt boxes # Attach precomputed targets.
# Shape like: [num_objects, {x1, y1, x2, y2, cls}] if len(boxes) > 0:
gt_boxes = np.empty((len(roi_dict['gt_classes']), 5), dtype=np.float32) outputs.update(
gt_boxes[:, :4], gt_boxes[:, 4] = roi_dict['boxes'], roi_dict['gt_classes'] self._anchor_sampler(
gt_boxes=boxes,
im_info=outputs['im_info']))
return img, im_scale, gt_boxes return outputs
def run(self): def run(self):
# Fix the process-local random seed # Disable the opencv threading.
cv2.setNumThreads(1)
# Fix the process-local random seed.
np.random.seed(self._seed) np.random.seed(self._seed)
# Main prefetch loop # Main prefetch loop
while True: while True:
outputs = self.get(self.q_in.get()) outputs = self.get(self.q_in.get())
if len(outputs[2]) < 1: if len(outputs['boxes']) < 1:
continue # Ignore the non-object image continue # Ignore non-object image.
aspect_ratio = float(outputs[0].shape[0]) / outputs[0].shape[1] height, width = outputs['image'].shape[:2]
if aspect_ratio > 1.: outputs['aspect_ratio'] = float(height) / float(width)
self.q1_out.put(outputs) self.q_out.put(outputs)
else:
self.q2_out.put(outputs)
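resize_image_with_target_size itself is not shown in this diff; one common rule it presumably follows (scale the short side to the target, cap the long side at max_size) is sketched here as an assumption, ignoring the random_scales jitter:

def resize_scale(height, width, target_size, max_size):
    # Scale so the shorter side hits target_size, unless that would push
    # the longer side past max_size, in which case the longer side wins.
    im_min, im_max = min(height, width), max(height, width)
    scale = float(target_size) / im_min
    if round(scale * im_max) > max_size:
        scale = float(max_size) / im_max
    return scale

print(resize_scale(480, 640, target_size=600, max_size=1000))  # 1.25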
...@@ -17,8 +17,8 @@ import collections ...@@ -17,8 +17,8 @@ import collections
import numpy as np import numpy as np
from seetadet.algo.faster_rcnn.generate_anchors import generate_anchors from seetadet.algo.faster_rcnn import generate_anchors as anchor_util
from seetadet.algo.faster_rcnn.utils import generate_grid_anchors from seetadet.algo.faster_rcnn import utils as rcnn_util
from seetadet.core.config import cfg from seetadet.core.config import cfg
from seetadet.utils import boxes as box_util from seetadet.utils import boxes as box_util
from seetadet.utils import nms from seetadet.utils import nms
...@@ -29,59 +29,50 @@ class Proposal(object): ...@@ -29,59 +29,50 @@ class Proposal(object):
def __init__(self): def __init__(self):
super(Proposal, self).__init__() super(Proposal, self).__init__()
# Load the basic configs # Load basic configs
self.scales = cfg.RPN.SCALES self.scales = cfg.RPN.SCALES
self.strides = cfg.RPN.STRIDES self.strides = cfg.RPN.STRIDES
self.ratios = cfg.RPN.ASPECT_RATIOS self.ratios = cfg.RPN.ASPECT_RATIOS
self.num_strides = len(self.strides) self.num_strides = len(self.strides)
self.defaults = collections.OrderedDict([ self.defaults = collections.OrderedDict([
('rois', np.array([[-1, 0, 0, 1, 1]], 'float32')), ('rois', np.array([[-1, 0, 0, 1, 1]], 'float32'))])
]) self.bbox_transform_clip = \
np.log(cfg.TRAIN.MAX_SIZE / min(self.strides))
# Generate base anchors # Generate base anchors
self.base_anchors = [] self.base_anchors = []
for i in range(self.num_strides): for i in range(self.num_strides):
self.base_anchors.append( self.base_anchors.append(
generate_anchors( anchor_util.generate_anchors(
self.strides[i], self.strides[i],
self.ratios, self.ratios,
np.array([self.scales[i]]) np.array([self.scales[i]])
if self.num_strides > 1 if self.num_strides > 1
else np.array(self.scales) else np.array(self.scales)))
)
)
def __call__(self, features, cls_prob, bbox_pred, ims_info): def __call__(self, **inputs):
num_images = cfg.TRAIN.IMS_PER_BATCH
pre_nms_top_n = cfg.TRAIN.RPN_PRE_NMS_TOP_N pre_nms_top_n = cfg.TRAIN.RPN_PRE_NMS_TOP_N
post_nms_top_n = cfg.TRAIN.RPN_POST_NMS_TOP_N post_nms_top_n = cfg.TRAIN.RPN_POST_NMS_TOP_N
nms_thresh = cfg.TRAIN.RPN_NMS_THRESH nms_thresh = cfg.TRAIN.RPN_NMS_THRESH
min_size = cfg.TRAIN.RPN_MIN_SIZE
# Get resources # Get resources
num_images = ims_info.shape[0] shapes = [f.shape[-2:] for f in inputs['features']]
grid_shapes = [f.shape[-2:] for f in features] all_anchors = rcnn_util.get_shifted_anchors(
all_anchors = generate_grid_anchors( shapes, self.base_anchors, self.strides)
grid_shapes, self.base_anchors, self.strides)
# Prepare for the outputs # Prepare for the outputs
batch_rois = [] batch_rois = []
cls_prob = cls_prob.numpy() cls_prob = inputs['cls_prob'].numpy()
bbox_pred = bbox_pred.numpy() # (?, 4, A * K) -> (?, A * K, 4)
if self.num_strides > 1: bbox_pred = inputs['bbox_pred'].numpy()
# (?, 4, A * K) -> (?, A * K, 4) bbox_pred = bbox_pred.transpose((0, 2, 1))
bbox_pred = bbox_pred.transpose((0, 2, 1))
else:
# (?, A * 4, H, W) -> (?, H, W, A * 4)
cls_prob = cls_prob.transpose((0, 2, 3, 1))
bbox_pred = bbox_pred.transpose((0, 2, 3, 1))
# Extract RoIs separately # Extract RoIs separately
for ix in range(num_images): for ix in range(num_images):
# [?, N] -> [? * N, 1] # [?, N] -> [? * N, 1]
scores = cls_prob[ix].reshape((-1, 1)) scores = cls_prob[ix].reshape((-1, 1))
if self.num_strides > 1: deltas = bbox_pred[ix]
deltas = bbox_pred[ix] im_info = inputs['im_info'][ix]
else:
deltas = bbox_pred[ix].reshape((-1, 4))
if pre_nms_top_n <= 0 or pre_nms_top_n >= len(scores): if pre_nms_top_n <= 0 or pre_nms_top_n >= len(scores):
order = np.argsort(-scores.squeeze()) order = np.argsort(-scores.squeeze())
...@@ -97,15 +88,11 @@ class Proposal(object): ...@@ -97,15 +88,11 @@ class Proposal(object):
scores = scores[order] scores = scores[order]
# Convert anchors into proposals via bbox transformations # Convert anchors into proposals via bbox transformations
proposals = box_util.bbox_transform_inv(anchors, deltas) proposals = box_util.bbox_transform_inv(
anchors, deltas, clip=self.bbox_transform_clip)
# Clip predicted boxes to image # Clip predicted boxes to image
proposals = box_util.clip_tiled_boxes(proposals, ims_info[ix, :2]) proposals = box_util.clip_tiled_boxes(proposals, im_info[:2])
# Remove predicted boxes with either height or width < threshold
keep = box_util.filter_boxes(proposals, min_size * ims_info[ix, 2])
proposals = proposals[keep, :]
scores = scores[keep]
# Apply nms (e.g. threshold = 0.7) # Apply nms (e.g. threshold = 0.7)
# Take after_nms_topN (e.g. 300) # Take after_nms_topN (e.g. 300)
......
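The new bbox_transform_clip bounds the predicted log-size deltas before exponentiation, so a single wild regression output cannot blow a proposal up beyond the largest possible image side. A small numeric illustration with made-up stand-ins for the cfg values:

import numpy as np

max_size, min_stride = 1000, 4           # illustrative stand-ins for the cfg values
clip = np.log(max_size / min_stride)     # as in bbox_transform_clip above
d_log_w = np.array([0.3, 2.0, 9.0])
print(np.exp(d_log_w))                   # unclipped: the last entry explodes (~8103x)
print(np.exp(np.minimum(d_log_w, clip))) # clipped: at most max_size / min_stride = 250x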
...@@ -30,19 +30,17 @@ class ProposalTarget(object): ...@@ -30,19 +30,17 @@ class ProposalTarget(object):
def __init__(self): def __init__(self):
super(ProposalTarget, self).__init__() super(ProposalTarget, self).__init__()
self.num_strides = len(cfg.RPN.STRIDES) self.num_strides = len(cfg.RPN.STRIDES)
self.num_classes = cfg.MODEL.NUM_CLASSES self.num_classes = len(cfg.MODEL.CLASSES)
self.defaults = collections.OrderedDict([ self.defaults = collections.OrderedDict([
('rois', np.array([[-1, 0, 0, 1, 1]], 'float32')), ('rois', np.array([[-1, 0, 0, 1, 1]], 'float32')),
('labels', np.array([-1], 'int64')), ('labels', np.array([-1], 'int64')),
('bbox_targets', np.zeros((1, 4), 'float32')), ('bbox_targets', np.zeros((1, 4), 'float32')),
]) ])
def __call__(self, rpn_rois, gt_boxes): def __call__(self, **inputs):
num_images = cfg.TRAIN.IMS_PER_BATCH num_images = cfg.TRAIN.IMS_PER_BATCH
# Proposal ROIs (0, x1, y1, x2, y2) coming from RPN # Proposal ROIs (0, x1, y1, x2, y2) coming from RPN
all_rois = rpn_rois all_rois = inputs['rois']
# GT boxes (x1, y1, x2, y2, label)
gt_boxes_wide = box_util.dismantle_boxes(gt_boxes, num_images)
# Prepare for the outputs # Prepare for the outputs
keys = self.defaults.keys() keys = self.defaults.keys()
...@@ -50,22 +48,22 @@ class ProposalTarget(object): ...@@ -50,22 +48,22 @@ class ProposalTarget(object):
# Generate targets separately # Generate targets separately
for ix in range(num_images): for ix in range(num_images):
gt_boxes = gt_boxes_wide[ix] # GT boxes (x1, y1, x2, y2, label)
gt_boxes = inputs['gt_boxes'][ix]
# Extract proposals for this image # Extract proposals for this image
rois = all_rois[np.where(all_rois[:, 0].astype('int32') == ix)[0]] rois = all_rois[np.where(all_rois[:, 0].astype('int32') == ix)[0]]
# Include ground-truth boxes in the set of candidate rois # Include ground-truth boxes in the set of candidate rois
inds = np.ones((gt_boxes.shape[0], 1), gt_boxes.dtype) * ix inds = np.ones((gt_boxes.shape[0], 1), gt_boxes.dtype) * ix
rois = np.vstack((rois, np.hstack((inds, gt_boxes[:, :4])))) rois = np.vstack((rois, np.hstack((inds, gt_boxes[:, :4]))))
# Sample a batch of RoIs for training # Sample a batch of RoIs for training
rois_per_image = cfg.TRAIN.BATCH_SIZE rois_per_image = cfg.FRCNN.BATCH_SIZE
fg_rois_per_image = np.round(cfg.TRAIN.FG_FRACTION * rois_per_image) fg_rois_per_image = np.round(cfg.FRCNN.FG_FRACTION * rois_per_image)
rcnn_util.map_returns_to_blobs( rcnn_util.map_returns_to_blobs(
sample_rois( sample_rois(rois,
rois, gt_boxes,
gt_boxes, rois_per_image,
rois_per_image, fg_rois_per_image),
fg_rois_per_image, blobs, keys,
), blobs, keys,
) )
# Stack into continuous blobs # Stack into continuous blobs
...@@ -95,7 +93,7 @@ class ProposalTarget(object): ...@@ -95,7 +93,7 @@ class ProposalTarget(object):
return { return {
'rois': [new_tensor(rois) for rois in rois_wide], 'rois': [new_tensor(rois) for rois in rois_wide],
'labels': new_tensor(blobs['labels']), 'labels': new_tensor(blobs['labels']),
'bbox_indices': new_tensor(cls_inds[fg_inds] + blobs['labels'][fg_inds]), 'bbox_inds': new_tensor(cls_inds[fg_inds] + blobs['labels'][fg_inds]),
'bbox_targets': new_tensor(blobs['bbox_targets'][fg_inds].astype('float32')), 'bbox_targets': new_tensor(blobs['bbox_targets'][fg_inds].astype('float32')),
'bbox_anchors': new_tensor(blobs['rois'][fg_inds, 1:].astype('float32')), 'bbox_anchors': new_tensor(blobs['rois'][fg_inds, 1:].astype('float32')),
} }
...@@ -108,8 +106,8 @@ def sample_rois(all_rois, gt_boxes, num_rois, num_fg_rois): ...@@ -108,8 +106,8 @@ def sample_rois(all_rois, gt_boxes, num_rois, num_fg_rois):
max_overlaps = overlaps.max(axis=1) max_overlaps = overlaps.max(axis=1)
labels = gt_boxes[gt_assignment, 4].astype('int64') labels = gt_boxes[gt_assignment, 4].astype('int64')
# Select foreground RoIs as those with >= FG_THRESH overlap # Select foreground RoIs as those with >= POSITIVE_OVERLAP
fg_thresh = cfg.TRAIN.FG_THRESH fg_thresh = cfg.FRCNN.POSITIVE_OVERLAP
fg_inds = np.where(max_overlaps >= fg_thresh)[0] fg_inds = np.where(max_overlaps >= fg_thresh)[0]
while fg_inds.size == 0: while fg_inds.size == 0:
fg_thresh -= 0.01 fg_thresh -= 0.01
...@@ -119,9 +117,10 @@ def sample_rois(all_rois, gt_boxes, num_rois, num_fg_rois): ...@@ -119,9 +117,10 @@ def sample_rois(all_rois, gt_boxes, num_rois, num_fg_rois):
fg_rois_per_this_image = int(min(num_fg_rois, fg_inds.size)) fg_rois_per_this_image = int(min(num_fg_rois, fg_inds.size))
fg_inds = npr.choice(fg_inds, fg_rois_per_this_image, False) fg_inds = npr.choice(fg_inds, fg_rois_per_this_image, False)
# Select background RoIs as those within [BG_THRESH_LO, BG_THRESH_HI) # Select background RoIs as those within
bg_inds = np.where((max_overlaps < cfg.TRAIN.BG_THRESH_HI) & # [NEGATIVE_OVERLAP_LO, NEGATIVE_OVERLAP_HI)
(max_overlaps >= cfg.TRAIN.BG_THRESH_LO))[0] bg_inds = np.where((max_overlaps < cfg.FRCNN.NEGATIVE_OVERLAP_HI) &
(max_overlaps >= cfg.FRCNN.NEGATIVE_OVERLAP_LO))[0]
# Compute number of background RoIs to take from this image # Compute number of background RoIs to take from this image
bg_rois_per_this_image = num_rois - fg_rois_per_this_image bg_rois_per_this_image = num_rois - fg_rois_per_this_image
bg_rois_per_this_image = min(bg_rois_per_this_image, bg_inds.size) bg_rois_per_this_image = min(bg_rois_per_this_image, bg_inds.size)
...@@ -129,7 +128,7 @@ def sample_rois(all_rois, gt_boxes, num_rois, num_fg_rois): ...@@ -129,7 +128,7 @@ def sample_rois(all_rois, gt_boxes, num_rois, num_fg_rois):
if bg_inds.size > 0: if bg_inds.size > 0:
bg_inds = npr.choice(bg_inds, bg_rois_per_this_image, False) bg_inds = npr.choice(bg_inds, bg_rois_per_this_image, False)
# The indices that we're selecting (both fg and bg) # The selected indices (both fg and bg)
keep_inds = np.append(fg_inds, bg_inds) keep_inds = np.append(fg_inds, bg_inds)
# Select sampled values from various arrays # Select sampled values from various arrays
rois, labels = all_rois[keep_inds], labels[keep_inds] rois, labels = all_rois[keep_inds], labels[keep_inds]
...@@ -137,12 +136,9 @@ def sample_rois(all_rois, gt_boxes, num_rois, num_fg_rois): ...@@ -137,12 +136,9 @@ def sample_rois(all_rois, gt_boxes, num_rois, num_fg_rois):
labels[fg_rois_per_this_image:] = 0 labels[fg_rois_per_this_image:] = 0
# Compute the target from RoIs # Compute the target from RoIs
return [ outputs = [rois, labels]
rois, outputs += [box_util.bbox_transform(
labels, rois[:, 1:5],
box_util.bbox_transform( gt_boxes[gt_assignment[keep_inds], :4],
rois[:, 1:5], cfg.BBOX_REG_WEIGHTS)]
gt_boxes[gt_assignment[keep_inds], :4], return outputs
cfg.BBOX_REG_WEIGHTS,
)
]
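sample_rois hands its sampled (roi, matched gt) pairs to box_util.bbox_transform to produce regression targets. A minimal sketch of the usual encoding (the inverse of the proposal decoding), assuming cfg.BBOX_REG_WEIGHTS is a 4-tuple of per-coordinate weights; encode_boxes is an illustrative name only.

import numpy as np

def encode_boxes(ex_rois, gt_rois, weights=(1.0, 1.0, 1.0, 1.0)):
    """Return (dx, dy, dw, dh) targets that map ex_rois onto gt_rois."""
    ex_w = ex_rois[:, 2] - ex_rois[:, 0] + 1.0
    ex_h = ex_rois[:, 3] - ex_rois[:, 1] + 1.0
    ex_cx = ex_rois[:, 0] + 0.5 * ex_w
    ex_cy = ex_rois[:, 1] + 0.5 * ex_h
    gt_w = gt_rois[:, 2] - gt_rois[:, 0] + 1.0
    gt_h = gt_rois[:, 3] - gt_rois[:, 1] + 1.0
    gt_cx = gt_rois[:, 0] + 0.5 * gt_w
    gt_cy = gt_rois[:, 1] + 0.5 * gt_h
    wx, wy, ww, wh = weights
    return np.stack([wx * (gt_cx - ex_cx) / ex_w,
                     wy * (gt_cy - ex_cy) / ex_h,
                     ww * np.log(gt_w / ex_w),
                     wh * np.log(gt_h / ex_h)], axis=1).astype('float32')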
...@@ -20,97 +20,131 @@ import numpy as np ...@@ -20,97 +20,131 @@ import numpy as np
from seetadet.core.config import cfg from seetadet.core.config import cfg
from seetadet.modeling.detector import new_detector from seetadet.modeling.detector import new_detector
from seetadet.utils import blob as blob_util
from seetadet.utils import boxes as box_util from seetadet.utils import boxes as box_util
from seetadet.utils import image as image_util
from seetadet.utils import logger
from seetadet.utils import nms as nms_util from seetadet.utils import nms as nms_util
from seetadet.utils import time_util from seetadet.utils import time_util
from seetadet.utils.blob import im_list_to_blob
from seetadet.utils.image import scale_image
def im_detect(detector, raw_image): def get_data(raw_images):
"""Detect a image, with single or multiple scales.""" """Return the test data."""
ims, ims_scale = scale_image(raw_image) max_size = cfg.TEST.MAX_SIZE
images_wide = []
# Prepare blobs image_shapes_wide, image_scales_wide = [], []
data = im_list_to_blob(ims) for img in raw_images:
ims_info = np.array([list(data.shape[1:3]) + [im_scale] images, image_scales = image_util.scale_image(
for im_scale in ims_scale], dtype=np.float32) img, scales=cfg.TEST.SCALES, max_size=max_size)
images_wide += images
# Do Forward image_scales_wide += image_scales
data = torch.from_numpy(data) image_shapes_wide += [img.shape[:2] for img in images]
ims_info = torch.from_numpy(ims_info) images = blob_util.im_list_to_blob(
images_wide, coarsest_stride=cfg.MODEL.COARSEST_STRIDE)
image_shapes = np.array(image_shapes_wide)
image_scales = np.array(image_scales_wide).reshape((len(images), -1))
images_info = np.hstack([image_shapes, image_scales]).astype('float32')
return images, images_info
def ims_detect(detector, raw_images, timer=None):
"""Detect images at single or multiple scales."""
images, images_info = get_data(raw_images)
timer.tic() if timer else timer
# Do forward
inputs = {'image': torch.from_numpy(images),
'im_info': torch.from_numpy(images_info)}
if not hasattr(detector, 'script_forward'): if not hasattr(detector, 'script_forward'):
def script_forward(self, data, ims_info): def script_forward(self, image, im_info):
return self.forward({'data': data, 'ims_info': ims_info}) return self.forward({'image': image, 'im_info': im_info})
detector.script_forward = torch.jit.trace( detector.script_forward = torch.jit.trace(
func=types.MethodType(script_forward, detector), func=types.MethodType(script_forward, detector),
example_inputs=[data, ims_info], example_inputs=[inputs['image'], inputs['im_info']],
) )
outputs = detector.script_forward(inputs['image'], inputs['im_info'])
outputs = detector.script_forward(data, ims_info)
outputs = dict((k, outputs[k].numpy()) for k in outputs.keys()) outputs = dict((k, outputs[k].numpy()) for k in outputs.keys())
# Decode results # Decode results
all_scores, all_boxes = [], [] batch_pred = box_util.bbox_transform_inv(
pred_boxes = box_util.bbox_transform_inv(
outputs['rois'][:, 1:5], outputs['rois'][:, 1:5],
outputs['bbox_pred'], outputs['bbox_pred'],
cfg.BBOX_REG_WEIGHTS, cfg.BBOX_REG_WEIGHTS)
) results = [([], []) for _ in range(len(raw_images))]
for i in range(len(images)):
for i in range(len(ims)): ii = i // len(cfg.TEST.SCALES)
inds = np.where(outputs['rois'][:, 0].astype(np.int32) == i)[0] inds = np.where(outputs['rois'][:, 0].astype(np.int32) == i)[0]
boxes = pred_boxes[inds] / ims_scale[i] boxes = batch_pred[inds] / images_info[i][2]
all_scores.append(outputs['cls_prob'][inds]) boxes = box_util.clip_tiled_boxes(boxes, raw_images[ii].shape)
all_boxes.append(box_util.clip_tiled_boxes(boxes, raw_image.shape)) results[ii][0].append(outputs['cls_prob'][inds])
results[ii][1].append(boxes)
return np.vstack(all_scores), np.vstack(all_boxes)
# Merge from multiple scales
ret = [(np.vstack(s), np.vstack(b)) for s, b in results]
def test_net(weights, num_classes, q_in, q_out, device): timer.toc() if timer else timer
num_classes, cfg.GPU_ID = num_classes, device return ret
def test_net(weights, q_in, q_out, device, root_logger=True):
"""Test a network trained with Faster R-CNN algorithm."""
cfg.GPU_ID = device
num_classes = len(cfg.MODEL.CLASSES)
logger.set_root_logger(root_logger)
detector = new_detector(device, weights) detector = new_detector(device, weights)
_t = time_util.new_timers('im_detect', 'misc') must_stop = False
timers = time_util.new_timers('im_detect_bbox', 'misc')
empty_detections = np.zeros((0, 5), 'float32')
while True: while True:
i, raw_image = q_in.get() if must_stop:
if i < 0:
break break
indices, raw_images = [], []
boxes_this_image = [[]] for _ in range(cfg.TEST.IMS_PER_BATCH):
i, raw_image = q_in.get()
with _t['im_detect'].tic_and_toc(): if i < 0:
scores, boxes = im_detect(detector, raw_image) must_stop = True
break
_t['misc'].tic() indices.append(i)
for j in range(1, num_classes): raw_images.append(raw_image)
inds = np.where(scores[:, j] > cfg.TEST.SCORE_THRESH)[0]
cls_scores = scores[inds, j] if len(raw_images) == 0:
cls_boxes = boxes[inds, j * 4:(j + 1) * 4] continue
cls_detections = np.hstack(
(cls_boxes, cls_scores[:, np.newaxis]) results = ims_detect(detector, raw_images, timers['im_detect_bbox'])
).astype(np.float32, copy=False)
if cfg.TEST.USE_SOFT_NMS: for i, (scores, boxes) in enumerate(results):
keep = nms_util.soft_nms( timers['misc'].tic()
cls_detections, boxes_this_image = [[]]
thresh=cfg.TEST.NMS, for j in range(1, num_classes):
method=cfg.TEST.SOFT_NMS_METHOD, inds = np.where(scores[:, j] > cfg.TEST.SCORE_THRESH)[0]
sigma=cfg.TEST.SOFT_NMS_SIGMA, if len(inds) == 0:
) boxes_this_image.append(empty_detections)
else: continue
keep = nms_util.nms( cls_scores = scores[inds, j]
cls_detections, cls_boxes = boxes[inds, j * 4:(j + 1) * 4]
thresh=cfg.TEST.NMS, cls_detections = np.hstack(
) (cls_boxes, cls_scores[:, np.newaxis])) \
cls_detections = cls_detections[keep, :] .astype(np.float32, copy=False)
boxes_this_image.append(cls_detections) if cfg.TEST.USE_SOFT_NMS:
_t['misc'].toc() keep = nms_util.soft_nms(
cls_detections,
q_out.put(( thresh=cfg.TEST.NMS,
i, method=cfg.TEST.SOFT_NMS_METHOD,
dict([('im_detect', _t['im_detect'].average_time), sigma=cfg.TEST.SOFT_NMS_SIGMA,
('misc', _t['misc'].average_time)]), )
dict([('boxes', boxes_this_image)]), else:
)) keep = nms_util.nms(
cls_detections,
thresh=cfg.TEST.NMS,
)
cls_detections = cls_detections[keep, :]
boxes_this_image.append(cls_detections)
timers['misc'].toc()
q_out.put((
indices[i],
dict([('im_detect', timers['im_detect_bbox'].average_time),
('misc', timers['misc'].average_time)]),
dict([('boxes', boxes_this_image)]),
))
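nms_util.nms is used above as a black box with a single IoU threshold. For orientation, here is a plain-NumPy sketch of the greedy suppression such a hard-NMS routine conventionally performs; this is illustrative, not the repository's optimized implementation.

import numpy as np

def greedy_nms(dets, thresh=0.7):
    """Keep (x1, y1, x2, y2, score) rows whose IoU with any higher-scoring
    kept box does not exceed ``thresh``."""
    x1, y1, x2, y2, scores = (dets[:, i] for i in range(5))
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = (np.maximum(0.0, xx2 - xx1 + 1) *
                 np.maximum(0.0, yy2 - yy1 + 1))
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[np.where(iou <= thresh)[0] + 1]
    return keep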
...@@ -19,43 +19,78 @@ import numpy as np ...@@ -19,43 +19,78 @@ import numpy as np
from seetadet.core.config import cfg from seetadet.core.config import cfg
def generate_grid_anchors(grid_shapes, base_anchors, strides): def get_shifted_coords(shapes, base_anchors):
num_strides = len(strides) """Return the x-y coordinates of shifted anchors."""
if len(grid_shapes) != num_strides: xs, ys = [], []
raise ValueError( for i in range(len(shapes)):
'Given %d grids for %d strides.' height, width = shapes[i]
% (len(grid_shapes), num_strides) x, y = np.arange(0, width), np.arange(0, height)
) x, y = np.meshgrid(x, y)
# Generate proposals from shifted anchors # Add A anchors (A,) to cell K shifts (K,)
# to get shift coords (A, K)
xs.append(np.tile(x.flatten(), base_anchors[i].shape[0]))
ys.append(np.tile(y.flatten(), base_anchors[i].shape[0]))
return np.concatenate(xs), np.concatenate(ys)
def get_shifted_anchors(shapes, base_anchors, strides):
"""Return the shifted anchors on given shapes."""
anchors_to_pack = [] anchors_to_pack = []
for i in range(len(grid_shapes)): for i in range(len(shapes)):
height, width = grid_shapes[i] height, width = shapes[i]
shift_x = np.arange(0, width) * strides[i] shift_x = np.arange(0, width) * strides[i]
shift_y = np.arange(0, height) * strides[i] shift_y = np.arange(0, height) * strides[i]
shift_x, shift_y = np.meshgrid(shift_x, shift_y) shift_x, shift_y = np.meshgrid(shift_x, shift_y)
shifts = np.vstack((shift_x.ravel(), shift_y.ravel(), shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
shift_x.ravel(), shift_y.ravel())).transpose() shift_x.ravel(), shift_y.ravel())).transpose()
# Add a anchors (1, a, 4) to # Add A anchors (A, 1, 4) to cell K shifts (1, K, 4)
# cell k shifts (k, 1, 4) to get # to get shift anchors (A, K, 4)
# shift anchors (k, a, 4)
# Reshape to (k * a, 4) shifted anchors
a = base_anchors[i].shape[0] a = base_anchors[i].shape[0]
k = shifts.shape[0] k = shifts.shape[0]
anchors = (base_anchors[i].reshape((1, a, 4)) + anchors = (base_anchors[i].reshape((a, 1, 4)) +
shifts.reshape((1, k, 4)).transpose((1, 0, 2))) shifts.reshape((1, k, 4)))
if num_strides > 1: anchors_to_pack.append(anchors.reshape((a * k, 4)))
# Transpose from (K, A, 4) to (A, K, 4)
# We will pack it with other strides to
# match the data format of (N, C, H, W)
anchors = anchors.transpose((1, 0, 2))
anchors = anchors.reshape((a * k, 4))
anchors_to_pack.append(anchors)
else:
# Original order of Faster R-CNN
return anchors.reshape((k * a, 4))
return np.vstack(anchors_to_pack) return np.vstack(anchors_to_pack)
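The (A, 1, 4) + (1, K, 4) broadcast in get_shifted_anchors can be sanity-checked with a toy layout; the numbers below are made up and only illustrate how A base anchors and K = H * W cell shifts expand to A * K shifted anchors.

import numpy as np

base = np.array([[-8, -8, 8, 8],
                 [-16, -8, 16, 8],
                 [-8, -16, 8, 16]], 'float32')        # A = 3 base anchors
h, w, stride = 2, 3, 16                               # K = h * w = 6 cells
sx, sy = np.meshgrid(np.arange(w) * stride, np.arange(h) * stride)
shifts = np.vstack((sx.ravel(), sy.ravel(),
                    sx.ravel(), sy.ravel())).transpose()  # (K, 4)
anchors = base.reshape((3, 1, 4)) + shifts.reshape((1, 6, 4))
print(anchors.reshape((-1, 4)).shape)                 # (18, 4) == (A * K, 4)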
def narrow_anchors(
all_coords,
base_anchors,
max_shapes,
shapes,
inds,
remapping=None,
):
"""Return the valid shifted anchors on given shapes."""
x_coords, y_coords = all_coords
inds_wide, remapping_wide = [], []
offset = num = 0
for i in range(len(max_shapes)):
num += base_anchors[i].shape[0] * np.prod(max_shapes[i])
inds_inside = np.where((inds >= offset) & (inds < num))[0]
inds_wide.append(inds[inds_inside])
if remapping is not None:
remapping_wide.append(remapping[inds_inside])
offset = num
offset1 = offset2 = num1 = num2 = 0
for i in range(len(max_shapes)):
num1 += base_anchors[i].shape[0] * np.prod(max_shapes[i])
num2 += base_anchors[i].shape[0] * np.prod(shapes[i])
inds = inds_wide[i]
x, y = x_coords[inds], y_coords[inds]
a = ((inds - offset1) // max_shapes[i][1]) // max_shapes[i][0]
inds = (a * shapes[i][0] + y) * shapes[i][1] + x + offset2
inds_mask = np.where((x < shapes[i][1]) & (y < shapes[i][0]))[0]
inds_wide[i] = inds[inds_mask]
if remapping is not None:
remapping_wide[i] = remapping_wide[i][inds_mask]
offset1, offset2 = num1, num2
outputs = [np.concatenate(inds_wide)]
if remapping is not None:
outputs += [np.concatenate(remapping_wide)]
return outputs[0] if len(outputs) == 1 else outputs
def map_returns_to_blobs(returns, blobs, keys): def map_returns_to_blobs(returns, blobs, keys):
"""Map returns of image to blobs.""" """Map returns of image to blobs."""
for i, key in enumerate(keys): for i, key in enumerate(keys):
...@@ -83,6 +118,5 @@ def map_blobs_by_levels(blobs, defaults, lvl_inds): ...@@ -83,6 +118,5 @@ def map_blobs_by_levels(blobs, defaults, lvl_inds):
outputs[key].append( outputs[key].append(
blob[inds] blob[inds]
if len(inds) > 0 if len(inds) > 0
else defaults[key] else defaults[key])
)
return outputs return outputs
...@@ -13,8 +13,11 @@ from __future__ import absolute_import ...@@ -13,8 +13,11 @@ from __future__ import absolute_import
from __future__ import division from __future__ import division
from __future__ import print_function from __future__ import print_function
import collections
import multiprocessing as mp import multiprocessing as mp
import time import time
import threading
import queue
import dragon import dragon
import dragon.vm.torch as torch import dragon.vm.torch as torch
...@@ -23,9 +26,8 @@ import numpy as np ...@@ -23,9 +26,8 @@ import numpy as np
from seetadet.algo.mask_rcnn import data_transformer from seetadet.algo.mask_rcnn import data_transformer
from seetadet.core.config import cfg from seetadet.core.config import cfg
from seetadet.datasets.factory import get_dataset from seetadet.datasets.factory import get_dataset
from seetadet.utils import blob as blob_util
from seetadet.utils import logger from seetadet.utils import logger
from seetadet.utils.blob import im_list_to_blob
from seetadet.utils.blob import mask_list_to_blob
class DataLoader(object): class DataLoader(object):
...@@ -39,19 +41,19 @@ class DataLoader(object): ...@@ -39,19 +41,19 @@ class DataLoader(object):
'source': dataset.source, 'source': dataset.source,
'classes': dataset.classes, 'classes': dataset.classes,
'shuffle': cfg.TRAIN.USE_SHUFFLE, 'shuffle': cfg.TRAIN.USE_SHUFFLE,
'num_chunks': cfg.TRAIN.SHUFFLE_CHUNKS,
'batch_size': cfg.TRAIN.IMS_PER_BATCH * 2, 'batch_size': cfg.TRAIN.IMS_PER_BATCH * 2,
'num_transformers': cfg.TRAIN.NUM_THREADS - 1, 'num_transformers': cfg.TRAIN.NUM_THREADS - 1,
}) })
self.iterator.start()
def __call__(self): def __call__(self):
outputs = self.iterator.next() outputs = self.iterator.next()
if isinstance(outputs['data'], np.ndarray): if isinstance(outputs['image'], np.ndarray):
outputs['data'] = torch.from_numpy(outputs['data']) outputs['image'] = torch.from_numpy(outputs['image'])
return outputs return outputs
class Iterator(mp.Process): class Iterator(threading.Thread):
"""Iterator to return the batch of data.""" """Iterator to return the batch of data."""
def __init__(self, **kwargs): def __init__(self, **kwargs):
...@@ -65,17 +67,16 @@ class Iterator(mp.Process): ...@@ -65,17 +67,16 @@ class Iterator(mp.Process):
rank = dragon.distributed.get_rank(process_group) rank = dragon.distributed.get_rank(process_group)
# Configuration # Configuration
self._prefetch = kwargs.get('prefetch', 5)
self._batch_size = kwargs.get('batch_size', 2) self._batch_size = kwargs.get('batch_size', 2)
self._num_readers = kwargs.get('num_readers', 1) self._num_readers = kwargs.get('num_readers', 1)
self._num_transformers = kwargs.get('num_transformers', 3) self._num_transformers = kwargs.get('num_transformers', 3)
self.daemon = True self.daemon = True
# Initialize queues # Initialize queues
num_batches = self._prefetch * self._num_readers num_batches = self._num_readers
self.q_in = mp.Queue(num_batches * self._batch_size) self._queue1 = mp.Queue(num_batches * self._batch_size)
self.q1_out = mp.Queue(num_batches * self._batch_size) self._queue2 = mp.Queue(num_batches * self._batch_size)
self.q2_out = mp.Queue(num_batches * self._batch_size) self._queue3 = queue.Queue(num_batches)
# Initialize readers # Initialize readers
self._readers = [] self._readers = []
...@@ -86,7 +87,7 @@ class Iterator(mp.Process): ...@@ -86,7 +87,7 @@ class Iterator(mp.Process):
self._readers.append(dragon.io.DataReader( self._readers.append(dragon.io.DataReader(
part_idx=part_idx, num_parts=num_parts, **kwargs)) part_idx=part_idx, num_parts=num_parts, **kwargs))
self._readers[i]._seed += part_idx self._readers[i]._seed += part_idx
self._readers[i].q_out = self.q_in self._readers[i].q_out = self._queue1
self._readers[i].start() self._readers[i].start()
time.sleep(0.1) time.sleep(0.1)
...@@ -95,8 +96,7 @@ class Iterator(mp.Process): ...@@ -95,8 +96,7 @@ class Iterator(mp.Process):
for i in range(self._num_transformers): for i in range(self._num_transformers):
p = data_transformer.DataTransformer(**kwargs) p = data_transformer.DataTransformer(**kwargs)
p._seed += (i + rank * self._num_transformers) p._seed += (i + rank * self._num_transformers)
p.q_in = self.q_in p.q_in, p.q_out = self._queue1, self._queue2
p.q1_out, p.q2_out = self.q1_out, self.q2_out
p.start() p.start()
self._transformers.append(p) self._transformers.append(p)
time.sleep(0.1) time.sleep(0.1)
...@@ -119,38 +119,44 @@ class Iterator(mp.Process): ...@@ -119,38 +119,44 @@ class Iterator(mp.Process):
"""Return the next batch of data.""" """Return the next batch of data."""
return self.__next__() return self.__next__()
def run(self):
"""Main loop."""
num_images = cfg.TRAIN.IMS_PER_BATCH
num_batches = cfg.TRAIN.ASPECT_GROUPING
logger.info('Initialize prefetching batches...')
example_buffer = [self._queue2.get()
for _ in range(num_images * num_batches)]
next_examples = []
while True:
# Use cached buffer for next N examples
# Examples are sorted to simulate aspect grouping
if len(next_examples) == 0:
next_examples = example_buffer
next_examples.sort(key=lambda d: d['aspect_ratio'])
example_buffer = []
# Prepare the next batch
outputs = collections.defaultdict(list)
for i in range(num_images):
example = next_examples.pop(0)
outputs['image'].append(example['image'])
outputs['gt_boxes'].append(example['boxes'])
outputs['gt_segms'].append(example['segms'])
outputs['im_info'].append(example['im_info'])
outputs['fg_inds'].append(example.get('fg_inds', None))
outputs['bg_inds'].append(example.get('bg_inds', None))
example_buffer.append(self._queue2.get())
outputs['image'] = blob_util.im_list_to_blob(
outputs['image'], coarsest_stride=cfg.MODEL.COARSEST_STRIDE)
# Send batch data to consumer
self._queue3.put(outputs)
def __iter__(self): def __iter__(self):
"""Return the iterator self.""" """Return the iterator self."""
return self return self
def __next__(self): def __next__(self):
"""Return the next batch of data.""" """Return the next batch of data."""
q_out = None return self._queue3.get()
# Two queues to implement aspect-grouping
# This is necessary to reduce the gpu memory
# from fetching a huge square batch blob
while q_out is None:
if self.q1_out.qsize() >= cfg.TRAIN.IMS_PER_BATCH:
q_out = self.q1_out
elif self.q2_out.qsize() >= cfg.TRAIN.IMS_PER_BATCH:
q_out = self.q2_out
self.q1_out, self.q2_out = self.q2_out, self.q1_out
images, images_info = [], []
boxes_to_pack, masks_to_pack = [], []
for i in range(cfg.TRAIN.IMS_PER_BATCH):
image, image_scale, boxes, masks = q_out.get()
images.append(image)
images_info.append(list(image.shape[:2]) + [image_scale])
gt_boxes = np.zeros((boxes.shape[0], boxes.shape[1] + 1), 'float32')
gt_boxes[:, :boxes.shape[1]], gt_boxes[:, -1] = boxes, i
boxes_to_pack.append(gt_boxes)
masks_to_pack.append(masks)
return {
'data': im_list_to_blob(images),
'ims_info': np.array(images_info, 'float32'),
'gt_boxes': np.concatenate(boxes_to_pack),
'gt_masks': mask_list_to_blob(masks_to_pack),
}
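The rewritten Iterator.run above buffers examples and sorts them by aspect ratio before cutting batches, replacing the old two-queue scheme. A small illustrative sketch of that grouping idea; group_by_aspect and the toy buffer are made up, not part of the repository.

def group_by_aspect(examples, ims_per_batch):
    """Sort a buffered list of examples by aspect ratio and cut it into
    batches, so each batch mixes images of similar shape (less padding)."""
    examples = sorted(examples, key=lambda d: d['aspect_ratio'])
    return [examples[i:i + ims_per_batch]
            for i in range(0, len(examples), ims_per_batch)]

buffer = [{'aspect_ratio': 1.5}, {'aspect_ratio': 0.7},
          {'aspect_ratio': 0.8}, {'aspect_ratio': 1.4}]
for batch in group_by_aspect(buffer, 2):
    print([d['aspect_ratio'] for d in batch])  # [0.7, 0.8] then [1.4, 1.5]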
...@@ -15,134 +15,136 @@ from __future__ import print_function ...@@ -15,134 +15,136 @@ from __future__ import print_function
import multiprocessing import multiprocessing
import cv2
import numpy as np import numpy as np
import numpy.random as npr
from seetadet.algo import common as algo_common
from seetadet.core.config import cfg from seetadet.core.config import cfg
from seetadet.datasets.example import Example from seetadet.datasets.example import Example
from seetadet.pycocotools import mask_utils from seetadet.utils.pycocotools import mask_utils
from seetadet.utils import boxes as box_util from seetadet.utils import boxes as box_util
from seetadet.utils.blob import prep_im_for_blob from seetadet.utils import image as image_util
class DataTransformer(multiprocessing.Process): class DataTransformer(multiprocessing.Process):
def __init__(self, **kwargs): def __init__(self, **kwargs):
super(DataTransformer, self).__init__() super(DataTransformer, self).__init__()
self._scales = cfg.TRAIN.SCALES self._scales = cfg.TRAIN.SCALES
self._random_scales = cfg.TRAIN.RANDOM_SCALES
self._max_size = cfg.TRAIN.MAX_SIZE self._max_size = cfg.TRAIN.MAX_SIZE
self._seed = cfg.RNG_SEED self._seed = cfg.RNG_SEED
self._use_flipped = cfg.TRAIN.USE_FLIPPED
self._use_diff = cfg.TRAIN.USE_DIFF self._use_diff = cfg.TRAIN.USE_DIFF
self._use_flipped = cfg.TRAIN.USE_FLIPPED
self._use_distort = cfg.TRAIN.USE_COLOR_JITTER
self._classes = kwargs.get('classes', ('__background__',)) self._classes = kwargs.get('classes', ('__background__',))
self._num_classes = len(self._classes) self._num_classes = len(self._classes)
self._class_to_ind = dict(zip(self._classes, range(self._num_classes))) self._class_to_ind = dict(zip(self._classes, range(self._num_classes)))
self.q_in = self.q1_out = self.q2_out = None self._anchor_sampler = algo_common.AnchorSampler()
self.q_in = self.q_out = None
self.daemon = True self.daemon = True
def make_roi_dict(self, example, im_scale, apply_flip=False): def get_boxes_and_segms(self, example, im_scale, flipped):
objects, n_objects = example.objects, 0 objects, num_objects = example.objects, 0
height, width = example.height, example.width height, width = example.height, example.width
if not self._use_diff: if not self._use_diff:
for obj in objects: for obj in objects:
if obj.get('difficult', 0) == 0: if obj.get('difficult', 0) == 0:
n_objects += 1 num_objects += 1
else: else:
n_objects = len(objects) num_objects = len(objects)
roi_dict = { boxes, segms = np.zeros((num_objects, 4), 'float32'), []
'boxes': np.zeros((n_objects, 4), 'float32'), gt_classes = np.zeros((num_objects,), 'float32')
'masks': np.empty((n_objects, height, width), 'uint8'), segm_flags = np.ones((num_objects,), 'float32')
'gt_classes': np.zeros((n_objects, 1), 'int32'),
'mask_flags': np.ones((n_objects, 1), 'float32'),
}
# Filter the difficult instances # Filter the difficult instances.
object_idx = 0 object_idx = 0
for obj in objects: for obj in objects:
if not self._use_diff and \ if not self._use_diff and obj.get('difficult', 0) > 0:
obj.get('difficult', 0) > 0:
continue continue
bbox, mask = obj['bbox'], obj['mask'] bbox = obj['bbox']
roi_dict['boxes'][object_idx, :] = [ boxes[object_idx, :] = [max(0, bbox[0]),
max(0, bbox[0]), max(0, bbox[1]),
max(0, bbox[1]), min(bbox[2], width - 1),
min(bbox[2], width - 1), min(bbox[3], height - 1)]
min(bbox[3], height - 1), if 'mask' in obj:
] mask_img = mask_utils.bytes2img(obj['mask'], height, width)
if mask is not None: segms.append(mask_img[:, ::-1] if flipped else mask_img)
roi_dict['masks'][object_idx] = ( elif 'polygons' in obj:
mask_utils.bytes2img( polygons = obj['polygons']
obj['mask'], segms.append(box_util.flip_polygons(
height, polygons, width) if flipped else polygons)
width,
))
else: else:
roi_dict['mask_flags'][object_idx] = 0. segms.append(None)
roi_dict['gt_classes'][object_idx] = \ segm_flags[object_idx] = 0.
self._class_to_ind[obj['name']] gt_classes[object_idx] = self._class_to_ind[obj['name']]
object_idx += 1 object_idx += 1
# Flip the boxes if necessary # Scale the boxes to the detecting scale.
if apply_flip: boxes *= im_scale
roi_dict['boxes'] = \
box_util.flip_boxes(
roi_dict['boxes'],
width,
)
# Scale the boxes to the detecting scale # Attach the classes and mask flags.
roi_dict['boxes'] *= im_scale gt_boxes = np.empty((num_objects, 6), dtype=np.float32)
gt_boxes[:, :4], gt_boxes[:, 4] = boxes, gt_classes
gt_boxes[:, 5] = segm_flags # Has segmentation or not.
return roi_dict return gt_boxes, segms
def get(self, example): def get(self, example):
example = Example(example) example = Example(example)
img = example.image
# Scale
target_size = self._scales[np.random.randint(len(self._scales))]
img, im_scale = prep_im_for_blob(img, target_size, self._max_size)
# Flip
apply_flip = False
if self._use_flipped:
if np.random.randint(2) > 0:
img = img[:, ::-1]
apply_flip = True
# Example -> RoIDict
roi_dict = self.make_roi_dict(example, im_scale, apply_flip)
# Post-Process for gt boxes
# Shape like: [num_objects, {x1, y1, x2, y2, cls, flag}]
gt_boxes = \
np.concatenate([
roi_dict['boxes'],
roi_dict['gt_classes'],
roi_dict['mask_flags']
], axis=1)
# Post-Process for gt masks
# Shape like: [num_objects, im_h, im_w]
if gt_boxes.shape[0] > 0:
gt_masks = roi_dict['masks']
if apply_flip:
gt_masks = gt_masks[:, :, ::-1]
else:
gt_masks = None
return img, im_scale, gt_boxes, gt_masks # Resize.
img, im_scale = image_util.resize_image_with_target_size(
example.image,
target_size=npr.choice(self._scales),
max_size=self._max_size,
random_scales=self._random_scales,
)
# Flip.
flipped = False
if self._use_flipped and npr.randint(2) > 0:
img = img[:, ::-1]
flipped = True
# Distort.
if self._use_distort:
img = image_util.distort_image(img)
# Boxes and segmentations.
boxes, segms = self.get_boxes_and_segms(example, im_scale, flipped)
# Flip the boxes if necessary.
if flipped:
boxes = box_util.flip_boxes(boxes, img.shape[1])
# Standard outputs.
outputs = {'image': img,
'boxes': boxes,
'segms': segms,
'im_info': img.shape[:2] + (im_scale,)}
# Attach precomputed targets.
if len(boxes) > 0:
outputs.update(
self._anchor_sampler(
gt_boxes=boxes,
im_info=outputs['im_info']))
return outputs
def run(self): def run(self):
# Fix the process-local random seed # Disable the opencv threading.
cv2.setNumThreads(1)
# Fix the process-local random seed.
np.random.seed(self._seed) np.random.seed(self._seed)
# Main prefetch loop # Main prefetch loop
while True: while True:
outputs = self.get(self.q_in.get()) outputs = self.get(self.q_in.get())
if len(outputs[2]) < 1: if len(outputs['boxes']) < 1:
continue # Ignore the non-object image continue # Ignore non-object image.
aspect_ratio = float(outputs[0].shape[0]) / outputs[0].shape[1] height, width = outputs['image'].shape[:2]
if aspect_ratio > 1.: outputs['aspect_ratio'] = float(height) / float(width)
self.q1_out.put(outputs) self.q_out.put(outputs)
else:
self.q2_out.put(outputs)
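box_util.flip_boxes is applied above after a horizontal image flip. A minimal sketch of the usual mirror transform, assuming the same inclusive pixel coordinates used elsewhere in this code; the flip_boxes below is illustrative, not the repository implementation.

import numpy as np

def flip_boxes(boxes, width):
    """Mirror (x1, y1, x2, y2) boxes horizontally in an image of ``width``."""
    flipped = boxes.copy()
    flipped[:, 0] = width - boxes[:, 2] - 1  # new x1 comes from old x2
    flipped[:, 2] = width - boxes[:, 0] - 1  # new x2 comes from old x1
    return flipped

boxes = np.array([[10., 20., 30., 40.]])
print(flip_boxes(boxes, 100))  # [[69. 20. 89. 40.]]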
...@@ -31,7 +31,7 @@ class ProposalTarget(object): ...@@ -31,7 +31,7 @@ class ProposalTarget(object):
def __init__(self): def __init__(self):
super(ProposalTarget, self).__init__() super(ProposalTarget, self).__init__()
self.resolution = cfg.MRCNN.RESOLUTION self.resolution = cfg.MRCNN.RESOLUTION
self.num_classes = cfg.MODEL.NUM_CLASSES self.num_classes = len(cfg.MODEL.CLASSES)
self.defaults = collections.OrderedDict([ self.defaults = collections.OrderedDict([
('rois', np.array([[-1, 0, 0, 1, 1]], 'float32')), ('rois', np.array([[-1, 0, 0, 1, 1]], 'float32')),
('labels', np.array([-1], 'int64')), ('labels', np.array([-1], 'int64')),
...@@ -39,18 +39,10 @@ class ProposalTarget(object): ...@@ -39,18 +39,10 @@ class ProposalTarget(object):
('mask_targets', -np.ones((1, self.resolution, self.resolution), 'float32')), ('mask_targets', -np.ones((1, self.resolution, self.resolution), 'float32')),
]) ])
def __call__(self, rpn_rois, gt_boxes, gt_masks, ims_info): def __call__(self, **inputs):
num_images = cfg.TRAIN.IMS_PER_BATCH num_images = cfg.TRAIN.IMS_PER_BATCH
# Proposal ROIs (0, x1, y1, x2, y2) coming from RPN # Proposal ROIs (0, x1, y1, x2, y2) coming from RPN
all_rois = rpn_rois all_rois = inputs['rois']
# GT boxes (x1, y1, x2, y2, label)
# GT masks (num_objects, im_h, im_w)
gt_boxes_wide, gt_masks_wide = \
mask_util.dismantle_masks(
gt_boxes,
gt_masks,
num_images,
)
# Prepare for the outputs # Prepare for the outputs
keys = self.defaults.keys() keys = self.defaults.keys()
...@@ -58,24 +50,25 @@ class ProposalTarget(object): ...@@ -58,24 +50,25 @@ class ProposalTarget(object):
# Generate targets separately # Generate targets separately
for ix in range(num_images): for ix in range(num_images):
gt_boxes = gt_boxes_wide[ix] # GT boxes (x1, y1, x2, y2, label)
gt_masks = gt_masks_wide[ix] gt_boxes = inputs['gt_boxes'][ix]
gt_segms = inputs['gt_segms'][ix]
# Extract proposals for this image # Extract proposals for this image
rois = all_rois[np.where(all_rois[:, 0].astype('int32') == ix)[0]] rois = all_rois[np.where(all_rois[:, 0].astype('int32') == ix)[0]]
# Include ground-truth boxes in the set of candidate rois # Include ground-truth boxes in the set of candidate rois
inds = np.ones((gt_boxes.shape[0], 1), gt_boxes.dtype) * ix inds = np.ones((gt_boxes.shape[0], 1), gt_boxes.dtype) * ix
rois = np.vstack((rois, np.hstack((inds, gt_boxes[:, :4])))) rois = np.vstack((rois, np.hstack((inds, gt_boxes[:, :4]))))
# Sample a batch of RoIs for training # Sample a batch of RoIs for training
rois_per_image = cfg.TRAIN.BATCH_SIZE rois_per_image = cfg.FRCNN.BATCH_SIZE
fg_rois_per_image = np.round(cfg.TRAIN.FG_FRACTION * rois_per_image) fg_rois_per_image = np.round(cfg.FRCNN.FG_FRACTION * rois_per_image)
rcnn_util.map_returns_to_blobs( rcnn_util.map_returns_to_blobs(
sample_rois( sample_rois(
rois, rois,
gt_boxes, gt_boxes,
gt_masks, gt_segms,
rois_per_image, rois_per_image,
fg_rois_per_image, fg_rois_per_image,
ims_info[ix][2], inputs['im_info'][ix][2],
), blobs, keys, ), blobs, keys,
) )
...@@ -122,10 +115,10 @@ class ProposalTarget(object): ...@@ -122,10 +115,10 @@ class ProposalTarget(object):
'rois': [new_tensor(rois_wide[i]) for i in range(num_levels)], 'rois': [new_tensor(rois_wide[i]) for i in range(num_levels)],
'mask_rois': [new_tensor(mask_rois_wide[i]) for i in range(num_levels)], 'mask_rois': [new_tensor(mask_rois_wide[i]) for i in range(num_levels)],
'labels': new_tensor(blobs['labels']), 'labels': new_tensor(blobs['labels']),
'bbox_indices': new_tensor(bbox_cls_inds[fg_inds] + blobs['labels'][fg_inds]), 'bbox_inds': new_tensor(bbox_cls_inds[fg_inds] + blobs['labels'][fg_inds]),
'bbox_targets': new_tensor(blobs['bbox_targets'][fg_inds].astype('float32')), 'bbox_targets': new_tensor(blobs['bbox_targets'][fg_inds].astype('float32')),
'bbox_anchors': new_tensor(blobs['rois'][fg_inds, 1:].astype('float32')), 'bbox_anchors': new_tensor(blobs['rois'][fg_inds, 1:].astype('float32')),
'mask_indices': new_tensor(mask_cls_inds + mask_labels), 'mask_inds': new_tensor(mask_cls_inds + mask_labels),
'mask_targets': new_tensor(blobs['mask_targets']), 'mask_targets': new_tensor(blobs['mask_targets']),
} }
...@@ -134,7 +127,7 @@ def compute_targets( ...@@ -134,7 +127,7 @@ def compute_targets(
ex_rois, ex_rois,
gt_rois, gt_rois,
gt_labels, gt_labels,
gt_masks, gt_segms,
mask_flags, mask_flags,
mask_size, mask_size,
im_scale, im_scale,
...@@ -150,29 +143,25 @@ def compute_targets( ...@@ -150,29 +143,25 @@ def compute_targets(
# Compute mask classification targets # Compute mask classification targets
mask_shape = [mask_size] * 2 mask_shape = [mask_size] * 2
ex_rois_ori = np.round(ex_rois / im_scale).astype(int) ex_rois_ori = np.round(ex_rois / im_scale).astype(int)
gt_rois_ori = np.round(gt_rois / im_scale).astype(int)
mask_targets = -np.ones([len(gt_labels)] + mask_shape, 'float32') mask_targets = -np.ones([len(gt_labels)] + mask_shape, 'float32')
for i in fg_inds: for i in fg_inds:
if mask_flags[i] > 0: if mask_flags[i] > 0:
box_mask = \ if isinstance(gt_segms[i], list):
mask_util.intersect_box_mask( ret = mask_util.warp_mask_via_polygons(
ex_rois_ori[i], gt_segms[i], ex_rois_ori[i], mask_shape)
gt_rois_ori[i], else:
gt_masks[i], gt_rois_ori = np.round(gt_rois / im_scale).astype(int)
) ret = mask_util.warp_mask_via_intersection(
if box_mask is not None: gt_segms[i], ex_rois_ori[i], gt_rois_ori[i], mask_shape)
mask_targets[i] = \ if ret is not None:
mask_util.resize_mask( mask_targets[i] = ret.astype('float32')
mask=box_mask,
size=mask_shape,
)
return bbox_targets, mask_targets return bbox_targets, mask_targets
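compute_targets delegates the mask warping to mask_util helpers (warp_mask_via_polygons / warp_mask_via_intersection). Conceptually, for a full-image binary mask the per-RoI target is the mask cropped to the rounded image-space RoI and resized to RESOLUTION x RESOLUTION. A hedged OpenCV sketch; crop_and_resize_mask is an illustrative name, and nearest-neighbour interpolation is assumed to keep the target binary.

import cv2
import numpy as np

def crop_and_resize_mask(mask, roi, size):
    """Crop a full-image uint8 mask to an integer (x1, y1, x2, y2) RoI and
    resize it to (size, size) for use as a mask target."""
    x1, y1, x2, y2 = roi
    crop = mask[y1:y2 + 1, x1:x2 + 1]
    return cv2.resize(crop, (size, size), interpolation=cv2.INTER_NEAREST)

mask = np.zeros((480, 640), 'uint8')
mask[100:200, 150:300] = 1
target = crop_and_resize_mask(mask, (140, 90, 310, 210), 28)
print(target.shape)  # (28, 28)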
def sample_rois( def sample_rois(
all_rois, all_rois,
gt_boxes, gt_boxes,
gt_masks, gt_segms,
num_rois, num_rois,
num_fg_rois, num_fg_rois,
im_scale, im_scale,
...@@ -184,15 +173,15 @@ def sample_rois( ...@@ -184,15 +173,15 @@ def sample_rois(
labels = gt_boxes[gt_assignment, 4].astype('int64') labels = gt_boxes[gt_assignment, 4].astype('int64')
# Select foreground RoIs as those with >= FG_THRESH overlap # Select foreground RoIs as those with >= FG_THRESH overlap
fg_inds = np.where(max_overlaps >= cfg.TRAIN.FG_THRESH)[0] fg_inds = np.where(max_overlaps >= cfg.FRCNN.POSITIVE_OVERLAP)[0]
fg_rois_per_this_image = int(min(num_fg_rois, fg_inds.size)) fg_rois_per_this_image = int(min(num_fg_rois, fg_inds.size))
# Sample foreground regions without replacement # Sample foreground regions without replacement
if fg_inds.size > 0: if fg_inds.size > 0:
fg_inds = npr.choice(fg_inds, fg_rois_per_this_image, False) fg_inds = npr.choice(fg_inds, fg_rois_per_this_image, False)
# Select background RoIs as those within [BG_THRESH_LO, BG_THRESH_HI) # Select background RoIs as those within [BG_THRESH_LO, BG_THRESH_HI)
bg_inds = np.where((max_overlaps < cfg.TRAIN.BG_THRESH_HI) & bg_inds = np.where((max_overlaps < cfg.FRCNN.NEGATIVE_OVERLAP_HI) &
(max_overlaps >= cfg.TRAIN.BG_THRESH_LO))[0] (max_overlaps >= cfg.FRCNN.NEGATIVE_OVERLAP_LO))[0]
# Compute number of background RoIs to take from this image # Compute number of background RoIs to take from this image
bg_rois_per_this_image = num_rois - fg_rois_per_this_image bg_rois_per_this_image = num_rois - fg_rois_per_this_image
bg_rois_per_this_image = min(bg_rois_per_this_image, bg_inds.size) bg_rois_per_this_image = min(bg_rois_per_this_image, bg_inds.size)
...@@ -213,7 +202,7 @@ def sample_rois( ...@@ -213,7 +202,7 @@ def sample_rois(
rois[:, 1:5], rois[:, 1:5],
gt_boxes[gt_assignment[keep_inds], :4], gt_boxes[gt_assignment[keep_inds], :4],
labels, labels,
gt_masks[gt_assignment[fg_inds]], [gt_segms[i] for i in gt_assignment[fg_inds]],
gt_boxes[gt_assignment[fg_inds], 5], gt_boxes[gt_assignment[fg_inds], 5],
cfg.MRCNN.RESOLUTION, cfg.MRCNN.RESOLUTION,
im_scale, im_scale,
......
...@@ -13,13 +13,15 @@ from __future__ import absolute_import ...@@ -13,13 +13,15 @@ from __future__ import absolute_import
from __future__ import division from __future__ import division
from __future__ import print_function from __future__ import print_function
import collections
import math
import numpy as np import numpy as np
from seetadet.algo.faster_rcnn import generate_anchors as anchor_util
from seetadet.algo.faster_rcnn import utils as rcnn_util
from seetadet.core.config import cfg from seetadet.core.config import cfg
from seetadet.algo.faster_rcnn.generate_anchors import generate_anchors_v2
from seetadet.algo.faster_rcnn.utils import generate_grid_anchors
from seetadet.utils import boxes as box_util from seetadet.utils import boxes as box_util
from seetadet.utils import logger
from seetadet.utils.env import new_tensor from seetadet.utils.env import new_tensor
...@@ -41,95 +43,113 @@ class AnchorTarget(object): ...@@ -41,95 +43,113 @@ class AnchorTarget(object):
(2 ** (octave / float(scales_per_octave))) (2 ** (octave / float(scales_per_octave)))
for octave in range(scales_per_octave)] for octave in range(scales_per_octave)]
self.base_anchors.append( self.base_anchors.append(
generate_anchors_v2( anchor_util.generate_anchors_v2(
stride=stride, stride=stride,
ratios=self.ratios, ratios=self.ratios,
sizes=sizes, sizes=sizes))
)) # Plan the maximum anchor layout
# Store the cached grid anchors max_size = cfg.TRAIN.MAX_SIZE
self.last_grid_shapes = None if max_size == 0:
self.last_grid_anchors = None max_size = cfg.TRAIN.SCALES[0]
if cfg.MODEL.COARSEST_STRIDE > 0:
stride = float(cfg.MODEL.COARSEST_STRIDE)
max_size = int(math.ceil(max_size / stride) * stride)
self.max_shapes = [[math.ceil(max_size / stride)] * 2
for stride in self.strides]
self.all_coords = rcnn_util.get_shifted_coords(
self.max_shapes, self.base_anchors)
self.all_anchors = rcnn_util.get_shifted_anchors(
self.max_shapes, self.base_anchors, self.strides)
def sample_anchors(self, gt_boxes, im_info, all_anchors=None):
all_anchors = self.all_anchors \
if all_anchors is None else all_anchors
# Remove anchors that fall outside the image
inds_inside = np.where((all_anchors[:, 0] < im_info[1]) &
(all_anchors[:, 1] < im_info[0]))[0]
anchors = all_anchors[inds_inside, :]
num_inside = len(anchors)
labels = np.empty((num_inside,), dtype='int32')
labels.fill(-1)
# Overlaps between the anchors and the gt boxes.
overlaps = box_util.bbox_overlaps(anchors, gt_boxes)
argmax_overlaps = overlaps.argmax(axis=1)
max_overlaps = overlaps[np.arange(num_inside), argmax_overlaps]
# Foreground: for each gt, anchor with highest overlap.
gt_argmax_overlaps = overlaps.argmax(axis=0)
gt_max_overlaps = overlaps[gt_argmax_overlaps, np.arange(overlaps.shape[1])]
gt_argmax_overlaps = np.where(overlaps == gt_max_overlaps)[0]
gt_assignment = argmax_overlaps[gt_argmax_overlaps]
labels[gt_argmax_overlaps] = gt_boxes[gt_assignment, 4]
# Foreground: above threshold IoU.
inds = max_overlaps >= cfg.RETINANET.POSITIVE_OVERLAP
gt_assignment = argmax_overlaps[inds]
labels[inds] = gt_boxes[gt_assignment, 4]
# Background: below threshold IoU.
labels[max_overlaps < cfg.RETINANET.NEGATIVE_OVERLAP] = 0
# Retract the background clamping if no foreground was found.
fg_inds = np.where(labels > 0)[0]
if len(fg_inds) == 0:
gt_assignment = argmax_overlaps[gt_argmax_overlaps]
labels[gt_argmax_overlaps] = gt_boxes[gt_assignment, 4]
fg_inds = np.where(labels > 0)[0]
# Select ignore labels to avoid too many negatives
# (~100x faster for 200 background indices)
ignore_inds = np.where(labels < 0)[0]
return inds_inside[fg_inds], inds_inside[ignore_inds]
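Most of the assignment logic in sample_anchors rests on box_util.bbox_overlaps. For orientation, a minimal NumPy sketch of the pairwise IoU matrix (N anchors by M ground-truth boxes) such a helper conventionally returns; pairwise_iou is an illustrative name, not the repository function.

import numpy as np

def pairwise_iou(anchors, gt_boxes):
    """Return an (N, M) IoU matrix between (x1, y1, x2, y2) boxes."""
    area1 = ((anchors[:, 2] - anchors[:, 0] + 1) *
             (anchors[:, 3] - anchors[:, 1] + 1))
    area2 = ((gt_boxes[:, 2] - gt_boxes[:, 0] + 1) *
             (gt_boxes[:, 3] - gt_boxes[:, 1] + 1))
    x1 = np.maximum(anchors[:, None, 0], gt_boxes[None, :, 0])
    y1 = np.maximum(anchors[:, None, 1], gt_boxes[None, :, 1])
    x2 = np.minimum(anchors[:, None, 2], gt_boxes[None, :, 2])
    y2 = np.minimum(anchors[:, None, 3], gt_boxes[None, :, 3])
    inter = (np.maximum(0, x2 - x1 + 1) *
             np.maximum(0, y2 - y1 + 1))
    return inter / (area1[:, None] + area2[None, :] - inter)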
def __call__(self, features, gt_boxes): def __call__(self, **inputs):
num_images = cfg.TRAIN.IMS_PER_BATCH num_images = cfg.TRAIN.IMS_PER_BATCH
gt_boxes_wide = box_util.dismantle_boxes(gt_boxes, num_images) shapes = [f.shape[-2:] for f in inputs['features']]
image_stride = sum(self.base_anchors[i].shape[0] * np.prod(shapes[i])
if len(gt_boxes_wide) != num_images: for i in range(len(inputs['features'])))
logger.fatal(
'Input {} images, got {} slices of gt boxes.'
.format(num_images, len(gt_boxes_wide))
)
# Generate grid anchors from base
grid_shapes = [f.shape[-2:] for f in features]
if grid_shapes == self.last_grid_shapes:
all_anchors = self.last_grid_anchors
else:
self.last_grid_shapes = grid_shapes
self.last_grid_anchors = all_anchors = \
generate_grid_anchors(
grid_shapes,
self.base_anchors,
self.strides,
)
num_anchors = all_anchors.shape[0]
# Label: ``1`` is positive, ``0`` is negative, ``-1`` is don't care narrow_args = [self.all_coords, self.base_anchors, self.max_shapes, shapes]
labels_wide = -np.ones((num_images, num_anchors,), 'int64') outputs = collections.defaultdict(list)
bbox_indices_wide, bbox_anchors_wide, bbox_targets_wide = [], [], []
# Different from R-CNN, all anchors will be used # Label: ``1`` is positive, ``0`` is negative, ``-1`` is don't care
inds_inside, anchors = np.arange(num_anchors), all_anchors output_labels = np.zeros((num_images, image_stride,), 'int64')
num_inside = len(inds_inside)
for ix in range(num_images): for ix in range(num_images):
# GT boxes (x1, y1, x2, y2, label) fg_inds = inputs['fg_inds'][ix]
gt_boxes = gt_boxes_wide[ix] ignore_inds = inputs['bg_inds'][ix]
gt_boxes = inputs['gt_boxes'][ix]
# label: 1 is positive, 0 is negative, -1 is don't care
labels = np.empty((num_inside,), dtype='int64') # Narrow anchors to match the feature layout
labels.fill(-1) anchors = self.all_anchors[fg_inds]
ignore_inds = rcnn_util.narrow_anchors(*(narrow_args + [ignore_inds]))
# Overlaps between the anchors and the gt boxes _, anchors = rcnn_util.narrow_anchors(*(narrow_args + [fg_inds, anchors]))
overlaps = box_util.bbox_overlaps(anchors, gt_boxes) fg_inds = rcnn_util.narrow_anchors(*(narrow_args + [fg_inds]))
argmax_overlaps = overlaps.argmax(1)
max_overlaps = overlaps[np.arange(num_inside), argmax_overlaps] # Compute bbox targets
gt_assignment = box_util.bbox_overlaps(anchors, gt_boxes).argmax(axis=1)
# Foreground: for each gt, anchor with highest overlap bbox_targets = box_util.bbox_transform(anchors, gt_boxes[gt_assignment, :4])
gt_argmax_overlaps = overlaps.argmax(0) outputs['bbox_anchors'].append(anchors)
gt_max_overlaps = overlaps[gt_argmax_overlaps, np.arange(overlaps.shape[1])] outputs['bbox_targets'].append(bbox_targets)
gt_argmax_overlaps = np.where(overlaps == gt_max_overlaps)[0]
gt_inds = argmax_overlaps[gt_argmax_overlaps] # Compute label assignments
labels[gt_argmax_overlaps] = gt_boxes[gt_inds, 4] output_labels[ix, ignore_inds] = -1
output_labels[ix, fg_inds] = gt_boxes[gt_assignment, 4]
# Foreground: above threshold IoU
inds = max_overlaps >= cfg.RETINANET.POSITIVE_OVERLAP # Compute sparse indices
gt_inds = argmax_overlaps[inds] fg_inds += ix * image_stride
labels[inds] = gt_boxes[gt_inds, 4] outputs['bbox_inds'].extend([fg_inds])
fg_inds = np.where(labels > 0)[0]
# Background: below threshold IoU
labels[max_overlaps < cfg.RETINANET.NEGATIVE_OVERLAP] = 0
# Retract the clamping if we don't have one
if len(fg_inds) == 0:
gt_inds = argmax_overlaps[gt_argmax_overlaps]
labels[gt_argmax_overlaps] = gt_boxes[gt_inds, 4]
fg_inds = np.where(labels > 0)[0]
labels_wide[ix, inds_inside] = labels
bbox_anchors_wide.append(anchors[fg_inds])
bbox_indices_wide.append(fg_inds + (num_anchors * ix))
bbox_targets_wide.append(
box_util.bbox_transform(
anchors[fg_inds],
gt_boxes[argmax_overlaps[fg_inds], :4],
)
)
return { return {
'labels': new_tensor(labels_wide), 'labels': new_tensor(output_labels),
'bbox_indices': new_tensor(np.concatenate(bbox_indices_wide)), 'bbox_inds': new_tensor(
'bbox_anchors': new_tensor(np.concatenate(bbox_anchors_wide).astype('float32')), np.concatenate(outputs['bbox_inds'])),
'bbox_targets': new_tensor(np.concatenate(bbox_targets_wide).astype('float32')), 'bbox_targets': new_tensor(
np.concatenate(outputs['bbox_targets']).astype('float32')),
'bbox_anchors': new_tensor(
np.concatenate(outputs['bbox_anchors']).astype('float32')),
} }
...@@ -22,7 +22,10 @@ class DataLoader(object): ...@@ -22,7 +22,10 @@ class DataLoader(object):
"""Provide mini-batches of data.""" """Provide mini-batches of data."""
def __new__(cls): def __new__(cls):
if cfg.TRAIN.MAX_SIZE > 0: pipeline_type = cfg.PIPELINE.TYPE.lower()
if pipeline_type == 'default' or pipeline_type == 'rcnn':
return faster_rcnn.DataLoader() return faster_rcnn.DataLoader()
else: elif pipeline_type == 'ssd':
return ssd.DataLoader() return ssd.DataLoader()
else:
raise ValueError('Unsupported pipeline: ' + pipeline_type)
...@@ -20,60 +20,79 @@ import numpy as np ...@@ -20,60 +20,79 @@ import numpy as np
from seetadet.core.config import cfg from seetadet.core.config import cfg
from seetadet.modeling.detector import new_detector from seetadet.modeling.detector import new_detector
from seetadet.utils import blob as blob_util
from seetadet.utils import image as image_util
from seetadet.utils import logger
from seetadet.utils import nms as nms_util from seetadet.utils import nms as nms_util
from seetadet.utils import time_util from seetadet.utils import time_util
from seetadet.utils.blob import im_list_to_blob
from seetadet.utils.image import scale_image
def ims_detect(detector, raw_images): def get_data(raw_images):
"""Detect images, with single or multiple scales.""" """Return the test data."""
ims, ims_scale = [], [] max_size = cfg.TEST.MAX_SIZE
for i in range(len(raw_images)): if cfg.PIPELINE.TYPE.lower() == 'ssd':
im, im_scale = scale_image(raw_images[i]) max_size = 0 # Warped to a fixed size
ims += im images_wide = []
ims_scale += im_scale image_shapes_wide, image_scales_wide = [], []
for img in raw_images:
num_scales = len(ims_scale) // len(raw_images) images, image_scales = image_util.scale_image(
ims_shape = np.array([im.shape[:2] for im in ims]) img, scales=cfg.TEST.SCALES, max_size=max_size)
ims_scale = np.array(ims_scale).reshape((len(ims), -1)) images_wide += images
image_scales_wide += image_scales
# Prepare blobs image_shapes_wide += [img.shape[:2] for img in images]
data = im_list_to_blob(ims) images = blob_util.im_list_to_blob(
ims_info = np.hstack([ims_shape, ims_scale]).astype('float32') images_wide, coarsest_stride=cfg.MODEL.COARSEST_STRIDE)
image_shapes = np.array(image_shapes_wide)
image_scales = np.array(image_scales_wide).reshape((len(images), -1))
images_info = np.hstack([image_shapes, image_scales]).astype('float32')
return images, images_info
def ims_detect(detector, raw_images, timer=None):
"""Detect images at single or multiple scales."""
images, images_info = get_data(raw_images)
timer.tic() if timer else timer
# Do Forward # Do Forward
data = torch.from_numpy(data) inputs = {'image': torch.from_numpy(images),
ims_info = torch.from_numpy(ims_info) 'im_info': torch.from_numpy(images_info)}
# with torch.no_grad():
# outputs = detector.forward(inputs)
if not hasattr(detector, 'script_forward'): if not hasattr(detector, 'script_forward'):
def script_forward(self, data, ims_info): def script_forward(self, image, im_info):
return self.forward({'data': data, 'ims_info': ims_info}) return self.forward({'image': image, 'im_info': im_info})
detector.script_forward = torch.jit.trace( detector.script_forward = torch.jit.trace(
func=types.MethodType(script_forward, detector), func=types.MethodType(script_forward, detector),
example_inputs=[data, ims_info], example_inputs=[inputs['image'], inputs['im_info']],
) )
outputs = detector.script_forward(data, ims_info) outputs = detector.script_forward(inputs['image'], inputs['im_info'])
outputs = dict((k, outputs[k].numpy()) for k in outputs.keys()) outputs = dict((k, outputs[k].numpy()) for k in outputs.keys())
timer.toc() if timer else timer
# Unpack results
results = outputs['detections'] # Decode results
detections = [[] for _ in range(len(raw_images))] detections = outputs['detections']
results = [[] for _ in range(len(raw_images))]
for i in range(len(ims)): for i in range(len(images)):
inds = np.where(results[:, 0].astype(np.int32) == i)[0] inds = np.where(detections[:, 0].astype(np.int32) == i)[0]
detections[i // num_scales].append(results[inds, 1:]) results[i // len(cfg.TEST.SCALES)].append(detections[inds, 1:])
return [np.vstack(detections[i]) for i in range(len(raw_images))] # Merge from multiple scales
ret = [np.vstack(d) for d in results]
timer.toc() if timer else timer
def test_net(weights, num_classes, q_in, q_out, device): return ret
num_classes, cfg.GPU_ID = num_classes, device
def test_net(weights, q_in, q_out, device, root_logger=True):
"""Test a network trained with RetinaNet algorithm."""
cfg.GPU_ID = device
num_classes = len(cfg.MODEL.CLASSES)
logger.set_root_logger(root_logger)
detector = new_detector(device, weights) detector = new_detector(device, weights)
must_stop = False must_stop = False
_t = time_util.new_timers('im_detect', 'misc') timers = time_util.new_timers('im_detect_bbox', 'misc')
empty_detections = np.zeros((0, 5), 'float32')
while True: while True:
if must_stop: if must_stop:
...@@ -91,17 +110,19 @@ def test_net(weights, num_classes, q_in, q_out, device): ...@@ -91,17 +110,19 @@ def test_net(weights, num_classes, q_in, q_out, device):
continue continue
# Run detecting on specific scales # Run detecting on specific scales
with _t['im_detect'].tic_and_toc(): results = ims_detect(detector, raw_images, timers['im_detect_bbox'])
results = ims_detect(detector, raw_images)
# Post-Processing # Post-processing
for i, detections in enumerate(results): for i, detections in enumerate(results):
_t['misc'].tic() timers['misc'].tic()
boxes_this_image = [[]] boxes_this_image = [[]]
# {x1, y1, x2, y2, score, cls} # Detection format: (x1, y1, x2, y2, score, cls)
detections = np.array(detections) detections = np.array(detections)
for j in range(1, num_classes): for j in range(1, num_classes):
cls_indices = np.where(detections[:, 5].astype(np.int32) == j)[0] cls_indices = np.where(detections[:, 5].astype(np.int32) == j)[0]
if len(cls_indices) == 0:
boxes_this_image.append(empty_detections)
continue
cls_boxes = detections[cls_indices, :4] cls_boxes = detections[cls_indices, :4]
cls_scores = detections[cls_indices, 4] cls_scores = detections[cls_indices, 4]
cls_detections = np.hstack(( cls_detections = np.hstack((
...@@ -121,11 +142,11 @@ def test_net(weights, num_classes, q_in, q_out, device): ...@@ -121,11 +142,11 @@ def test_net(weights, num_classes, q_in, q_out, device):
) )
cls_detections = cls_detections[keep, :] cls_detections = cls_detections[keep, :]
boxes_this_image.append(cls_detections) boxes_this_image.append(cls_detections)
_t['misc'].toc() timers['misc'].toc()
q_out.put(( q_out.put((
indices[i], indices[i],
dict([('im_detect', _t['im_detect'].average_time), dict([('im_detect', timers['im_detect_bbox'].average_time),
('misc', _t['misc'].average_time)]), ('misc', timers['misc'].average_time)]),
dict([('boxes', boxes_this_image)]), dict([('boxes', boxes_this_image)]),
)) ))
...@@ -14,7 +14,4 @@ from __future__ import division ...@@ -14,7 +14,4 @@ from __future__ import division
from __future__ import print_function from __future__ import print_function
from seetadet.algo.ssd.data_loader import DataLoader from seetadet.algo.ssd.data_loader import DataLoader
from seetadet.algo.ssd.hard_mining import HardMining from seetadet.algo.ssd.anchor_target import AnchorTarget
from seetadet.algo.ssd.multibox import MultiBoxMatch
from seetadet.algo.ssd.multibox import MultiBoxTarget
from seetadet.algo.ssd.priorbox import PriorBox
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import math
import numpy as np
from seetadet.algo.ssd import generate_anchors as anchor_util
from seetadet.algo.ssd import utils as ssd_util
from seetadet.core.config import cfg
from seetadet.utils import boxes as box_util
from seetadet.utils.env import new_tensor
class AnchorTarget(object):
"""Assign ground-truth targets to anchors."""
def __init__(self):
super(AnchorTarget, self).__init__()
# Load the basic configs
self.strides = cfg.SSD.STRIDES
anchor_sizes = cfg.SSD.ANCHOR_SIZES
aspect_ratios = cfg.SSD.ASPECT_RATIOS
self.base_anchors = []
for i in range(len(anchor_sizes)):
ratios = aspect_ratios[i]
if not isinstance(ratios, (tuple, list)):
# All strides share the same ratios
ratios = aspect_ratios
self.base_anchors.append(
anchor_util.generate_anchors(
min_sizes=[anchor_sizes[i][0]],
max_sizes=[anchor_sizes[i][1]],
ratios=ratios))
# Plan the fixed anchor layout
max_size = cfg.TRAIN.SCALES[0]
if cfg.MODEL.COARSEST_STRIDE > 0:
stride = float(cfg.MODEL.COARSEST_STRIDE)
max_size = int(math.ceil(max_size / stride) * stride)
shapes = [[math.ceil(max_size / stride)] * 2
for stride in self.strides]
self.all_anchors = ssd_util.get_shifted_anchors(
shapes, self.base_anchors, self.strides)
def sample_anchors(self, gt_boxes, all_anchors=None):
anchors = self.all_anchors \
if all_anchors is None else all_anchors
num_anchors = len(anchors)
labels = np.empty((num_anchors,), dtype='int32')
labels.fill(-1)
# Overlaps between the anchors and the gt boxes.
overlaps = box_util.bbox_overlaps(anchors, gt_boxes)
argmax_overlaps = overlaps.argmax(axis=1)
max_overlaps = overlaps[np.arange(num_anchors), argmax_overlaps]
# Foreground: for each gt, anchor with highest overlap.
gt_argmax_overlaps = overlaps.argmax(axis=0)
gt_assignment = argmax_overlaps[gt_argmax_overlaps]
labels[gt_argmax_overlaps] = gt_boxes[gt_assignment, 4]
# Foreground: above threshold IoU.
inds = max_overlaps >= cfg.SSD.POSITIVE_OVERLAP
gt_assignment = argmax_overlaps[inds]
labels[inds] = gt_boxes[gt_assignment, 4]
fg_inds = np.where(labels > 0)[0]
# Negative: not matched and below threshold IoU.
neg_inds = np.where(labels <= 0)[0]
neg_overlaps = max_overlaps[neg_inds]
eligible_neg_inds = np.where(neg_overlaps < cfg.SSD.NEGATIVE_OVERLAP)[0]
neg_inds = neg_inds[eligible_neg_inds]
return fg_inds, neg_inds
def __call__(self, **inputs):
num_images = cfg.TRAIN.IMS_PER_BATCH
neg_pos_ratio = cfg.SSD.NEGATIVE_POSITIVE_RATIO
image_stride = self.all_anchors.shape[0]
cls_prob = inputs['cls_prob'].numpy()
outputs = collections.defaultdict(list)
# Label: ``> 0`` is positive (class index), ``0`` is negative, ``-1`` is don't care
output_labels = np.empty((num_images, image_stride,), 'int64')
output_labels.fill(-1)
for ix in range(num_images):
fg_inds = inputs['fg_inds'][ix]
neg_inds = inputs['bg_inds'][ix]
gt_boxes = inputs['gt_boxes'][ix]
# Mining hard negatives as background.
num_pos, num_neg = len(fg_inds), len(neg_inds)
num_bg = min(int(num_pos * neg_pos_ratio), num_neg)
neg_loss = -np.log(np.maximum(
cls_prob[ix, neg_inds][np.arange(num_neg),
np.zeros((num_neg,), 'int32')],
np.finfo(float).eps))
bg_inds = neg_inds[np.argsort(-neg_loss)][:num_bg]
# Compute bbox targets.
anchors = self.all_anchors[fg_inds]
gt_assignment = box_util.bbox_overlaps(
anchors, gt_boxes).argmax(axis=1)
bbox_targets = box_util.bbox_transform(
anchors, gt_boxes[gt_assignment, :4],
cfg.BBOX_REG_WEIGHTS)
outputs['bbox_anchors'].append(anchors)
outputs['bbox_targets'].append(bbox_targets)
# Compute label assignments.
output_labels[ix, bg_inds] = 0
output_labels[ix, fg_inds] = gt_boxes[gt_assignment, 4]
# Compute sparse indices.
fg_inds += ix * image_stride
outputs['bbox_inds'].extend([fg_inds])
return {
'labels': new_tensor(output_labels),
'bbox_inds': new_tensor(
np.concatenate(outputs['bbox_inds'])),
'bbox_targets': new_tensor(
np.concatenate(outputs['bbox_targets']).astype('float32')),
'bbox_anchors': new_tensor(
np.concatenate(outputs['bbox_anchors']).astype('float32')),
}
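The assignment rule implemented by `AnchorTarget` can be summarized as follows: an anchor becomes foreground when its best IoU against any ground-truth box reaches `cfg.SSD.POSITIVE_OVERLAP` (and each ground-truth box keeps its best anchor regardless), candidate negatives are the unmatched anchors below `cfg.SSD.NEGATIVE_OVERLAP`, and the final backgrounds are the hardest negatives, ranked by `-log` of the predicted background probability and capped at `cfg.SSD.NEGATIVE_POSITIVE_RATIO` times the number of positives. A self-contained sketch of that logic follows; the `iou` helper and the 0.5/0.5/3 defaults are illustrative stand-ins for `box_util.bbox_overlaps` and the `cfg.SSD.*` settings.

import numpy as np

def iou(anchors, gt_boxes):
    """Pairwise IoU between (N, 4) anchors and (K, 4) ground-truth boxes."""
    x1 = np.maximum(anchors[:, None, 0], gt_boxes[None, :, 0])
    y1 = np.maximum(anchors[:, None, 1], gt_boxes[None, :, 1])
    x2 = np.minimum(anchors[:, None, 2], gt_boxes[None, :, 2])
    y2 = np.minimum(anchors[:, None, 3], gt_boxes[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    area_g = (gt_boxes[:, 2] - gt_boxes[:, 0]) * (gt_boxes[:, 3] - gt_boxes[:, 1])
    return inter / (area_a[:, None] + area_g[None, :] - inter)

def match_and_mine(anchors, gt_boxes, bg_prob,
                   pos_thresh=0.5, neg_thresh=0.5, neg_pos_ratio=3):
    """Label anchors against (K, 5) gt boxes of [x1, y1, x2, y2, class]."""
    overlaps = iou(anchors, gt_boxes[:, :4])
    argmax = overlaps.argmax(axis=1)
    max_overlaps = overlaps[np.arange(len(anchors)), argmax]
    labels = np.full((len(anchors),), -1, 'int64')
    # Each gt box keeps its best-overlapping anchor as foreground.
    labels[overlaps.argmax(axis=0)] = gt_boxes[:, 4]
    # Anchors above the positive threshold take the class of their best gt.
    fg = max_overlaps >= pos_thresh
    labels[fg] = gt_boxes[argmax[fg], 4]
    fg_inds = np.where(labels > 0)[0]
    # Candidate negatives: unmatched anchors below the negative threshold.
    neg_inds = np.where((labels <= 0) & (max_overlaps < neg_thresh))[0]
    # Hard-negative mining: keep the negatives the classifier is least
    # confident about, up to neg_pos_ratio * num_positives.
    num_bg = min(int(len(fg_inds) * neg_pos_ratio), len(neg_inds))
    neg_loss = -np.log(np.maximum(bg_prob[neg_inds], np.finfo(float).eps))
    bg_inds = neg_inds[np.argsort(-neg_loss)][:num_bg]
    labels[bg_inds] = 0
    return labels, fg_inds, bg_inds

In the class above, the same quantities are split across `sample_anchors()`, which produces the fg/bg index lists, and `__call__()`, which performs the hard-negative selection and the bbox-target encoding per image.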
...@@ -13,8 +13,11 @@ from __future__ import absolute_import ...@@ -13,8 +13,11 @@ from __future__ import absolute_import
from __future__ import division from __future__ import division
from __future__ import print_function from __future__ import print_function
import collections
import multiprocessing as mp import multiprocessing as mp
import time import time
import threading
import queue
import dragon import dragon
import dragon.vm.torch as torch import dragon.vm.torch as torch
...@@ -23,6 +26,7 @@ import numpy as np ...@@ -23,6 +26,7 @@ import numpy as np
from seetadet.algo.ssd import data_transformer from seetadet.algo.ssd import data_transformer
from seetadet.core.config import cfg from seetadet.core.config import cfg
from seetadet.datasets.factory import get_dataset from seetadet.datasets.factory import get_dataset
from seetadet.utils import blob as blob_util
from seetadet.utils import logger from seetadet.utils import logger
...@@ -32,28 +36,24 @@ class DataLoader(object): ...@@ -32,28 +36,24 @@ class DataLoader(object):
def __init__(self): def __init__(self):
super(DataLoader, self).__init__() super(DataLoader, self).__init__()
dataset = get_dataset(cfg.TRAIN.DATASET) dataset = get_dataset(cfg.TRAIN.DATASET)
if cfg.USE_DALI: self.iterator = Iterator(**{
from seetadet.dali import ssd_pipeline as pipe 'dataset': dataset.cls,
self.iterator = pipe.new_iterator(dataset.source) 'source': dataset.source,
else: 'classes': dataset.classes,
self.iterator = Iterator(**{ 'shuffle': cfg.TRAIN.USE_SHUFFLE,
'dataset': dataset.cls, 'batch_size': cfg.TRAIN.IMS_PER_BATCH * 2,
'source': dataset.source, 'num_transformers': cfg.TRAIN.NUM_THREADS - 1,
'classes': dataset.classes, })
'shuffle': cfg.TRAIN.USE_SHUFFLE, self.iterator.start()
'num_chunks': cfg.TRAIN.SHUFFLE_CHUNKS,
'batch_size': cfg.TRAIN.IMS_PER_BATCH * 2,
'num_transformers': cfg.TRAIN.NUM_THREADS - 1,
})
def __call__(self): def __call__(self):
outputs = self.iterator.next() outputs = self.iterator.next()
if isinstance(outputs['data'], np.ndarray): if isinstance(outputs['image'], np.ndarray):
outputs['data'] = torch.from_numpy(outputs['data']) outputs['image'] = torch.from_numpy(outputs['image'])
return outputs return outputs
class Iterator(object): class Iterator(threading.Thread):
"""Iterator to return the batch of data.""" """Iterator to return the batch of data."""
def __init__(self, **kwargs): def __init__(self, **kwargs):
...@@ -67,15 +67,16 @@ class Iterator(object): ...@@ -67,15 +67,16 @@ class Iterator(object):
rank = dragon.distributed.get_rank(process_group) rank = dragon.distributed.get_rank(process_group)
# Configuration # Configuration
self._prefetch = kwargs.get('prefetch', 5) self._batch_size = kwargs.get('batch_size', 8)
self._batch_size = kwargs.get('batch_size', 32)
self._num_readers = kwargs.get('num_readers', 1) self._num_readers = kwargs.get('num_readers', 1)
self._num_transformers = kwargs.get('num_transformers', 3) self._num_transformers = kwargs.get('num_transformers', 3)
self.daemon = True
# Initialize queues # Initialize queues
num_batches = self._prefetch * self._num_readers num_batches = self._num_readers
self.q_in = mp.Queue(num_batches * self._batch_size) self._queue1 = mp.Queue(num_batches * self._batch_size)
self.q_out = mp.Queue(num_batches * self._batch_size) self._queue2 = mp.Queue(num_batches * self._batch_size)
self._queue3 = queue.Queue(num_batches)
# Initialize readers # Initialize readers
self._readers = [] self._readers = []
...@@ -86,7 +87,7 @@ class Iterator(object): ...@@ -86,7 +87,7 @@ class Iterator(object):
self._readers.append(dragon.io.DataReader( self._readers.append(dragon.io.DataReader(
part_idx=part_idx, num_parts=num_parts, **kwargs)) part_idx=part_idx, num_parts=num_parts, **kwargs))
self._readers[i]._seed += part_idx self._readers[i]._seed += part_idx
self._readers[i].q_out = self.q_in self._readers[i].q_out = self._queue1
self._readers[i].start() self._readers[i].start()
time.sleep(0.1) time.sleep(0.1)
...@@ -95,7 +96,7 @@ class Iterator(object): ...@@ -95,7 +96,7 @@ class Iterator(object):
for i in range(self._num_transformers): for i in range(self._num_transformers):
p = data_transformer.DataTransformer(**kwargs) p = data_transformer.DataTransformer(**kwargs)
p._seed += (i + rank * self._num_transformers) p._seed += (i + rank * self._num_transformers)
p.q_in, p.q_out = self.q_in, self.q_out p.q_in, p.q_out = self._queue1, self._queue2
p.start() p.start()
self._transformers.append(p) self._transformers.append(p)
time.sleep(0.1) time.sleep(0.1)
...@@ -118,26 +119,41 @@ class Iterator(object): ...@@ -118,26 +119,41 @@ class Iterator(object):
"""Return the next batch of data.""" """Return the next batch of data."""
return self.__next__() return self.__next__()
def run(self):
"""Main loop."""
num_images = cfg.TRAIN.IMS_PER_BATCH
num_batches = cfg.TRAIN.ASPECT_GROUPING
logger.info('Initialize prefetching batches...')
example_buffer = [self._queue2.get()
for _ in range(num_images * num_batches)]
next_examples = []
while True:
# Use cached buffer for next N examples
if len(next_examples) == 0:
next_examples = example_buffer
example_buffer = []
# Prepare the next batch
outputs = collections.defaultdict(list)
for i in range(num_images):
example = next_examples.pop(0)
outputs['image'].append(example['image'])
outputs['gt_boxes'].append(example['boxes'])
outputs['im_info'].append(example['im_info'])
outputs['fg_inds'].append(example.get('fg_inds', None))
outputs['bg_inds'].append(example.get('bg_inds', None))
example_buffer.append(self._queue2.get())
outputs['image'] = blob_util.im_list_to_blob(
outputs['image'], coarsest_stride=cfg.MODEL.COARSEST_STRIDE)
# Send batch data to consumer
self._queue3.put(outputs)
def __iter__(self): def __iter__(self):
"""Return the iterator self.""" """Return the iterator self."""
return self return self
def __next__(self): def __next__(self):
"""Return the next batch of data.""" """Return the next batch of data."""
n = cfg.TRAIN.IMS_PER_BATCH return self._queue3.get()
h = w = cfg.TRAIN.SCALES[0]
boxes_to_pack = []
image, boxes = self.q_out.get()
images = np.zeros((n, h, w, 3), image.dtype)
for i in range(n):
images[i] = image
gt_boxes = np.zeros((boxes.shape[0], boxes.shape[1] + 1), 'float32')
gt_boxes[:, :boxes.shape[1]], gt_boxes[:, -1] = boxes, i
boxes_to_pack.append(gt_boxes)
if i != (cfg.TRAIN.IMS_PER_BATCH - 1):
image, boxes = self.q_out.get()
boxes_to_pack = np.concatenate(boxes_to_pack)
return {'data': images, 'gt_boxes': boxes_to_pack}
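The reworked loader replaces the ad-hoc batching in `__next__` with a three-stage pipeline: reader processes fill `_queue1`, transformer processes fill `_queue2` with single examples, and the `Iterator` thread's `run()` groups them into image batches on `_queue3` for `__next__` to consume. The pattern itself can be sketched with the standard library alone; the class name, queue sizes, and toy producer below are illustrative, not the project's API.

import queue
import threading

class BatchCollector(threading.Thread):
    """Group single examples from an input queue into fixed-size batches."""

    def __init__(self, example_queue, batch_size, max_batches=4):
        super(BatchCollector, self).__init__(daemon=True)
        self.example_queue = example_queue
        self.batch_queue = queue.Queue(max_batches)
        self.batch_size = batch_size

    def run(self):
        while True:
            batch = [self.example_queue.get() for _ in range(self.batch_size)]
            self.batch_queue.put(batch)

    def next(self):
        return self.batch_queue.get()

# Toy usage: a producer thread stands in for the DataReader/DataTransformer
# processes that feed the example queue in the real loader.
examples = queue.Queue(maxsize=64)
collector = BatchCollector(examples, batch_size=2)
collector.start()

def produce():
    for i in range(8):
        examples.put({'image': i, 'boxes': []})

threading.Thread(target=produce, daemon=True).start()
print(collector.next())  # -> a list of 2 example dicts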
...@@ -14,8 +14,12 @@ from __future__ import division ...@@ -14,8 +14,12 @@ from __future__ import division
from __future__ import print_function from __future__ import print_function
import multiprocessing import multiprocessing
import cv2
import numpy as np import numpy as np
import numpy.random as npr
from seetadet.algo import common as algo_common
from seetadet.algo.ssd import transforms from seetadet.algo.ssd import transforms
from seetadet.core.config import cfg from seetadet.core.config import cfg
from seetadet.datasets.example import Example from seetadet.datasets.example import Example
...@@ -27,108 +31,95 @@ class DataTransformer(multiprocessing.Process): ...@@ -27,108 +31,95 @@ class DataTransformer(multiprocessing.Process):
super(DataTransformer, self).__init__() super(DataTransformer, self).__init__()
self._scale = cfg.TRAIN.SCALES[0] self._scale = cfg.TRAIN.SCALES[0]
self._seed = cfg.RNG_SEED self._seed = cfg.RNG_SEED
self._mirror = cfg.TRAIN.USE_FLIPPED
self._use_diff = cfg.TRAIN.USE_DIFF self._use_diff = cfg.TRAIN.USE_DIFF
self._use_flipped = cfg.TRAIN.USE_FLIPPED
self._classes = kwargs.get('classes', ('__background__',)) self._classes = kwargs.get('classes', ('__background__',))
self._num_classes = len(self._classes) self._num_classes = len(self._classes)
self._class_to_ind = dict(zip(self._classes, range(self._num_classes))) self._class_to_ind = dict(zip(self._classes, range(self._num_classes)))
self.augment_image = \ self._anchor_sampler = algo_common.AnchorSampler()
transforms.Compose( self._apply_transform = transforms.Compose(transforms.Distort(),
transforms.Distort(), # Color augmentation transforms.Expand(),
transforms.Expand(), # Expand and padding transforms.Sample(),
transforms.Sample(), # Sample a patch randomly transforms.Resize())
transforms.Resize(), # Resize to a fixed scale
)
self.q_in = self.q_out = None self.q_in = self.q_out = None
self.daemon = True self.daemon = True
def make_roi_dict(self, example, apply_flip=False): def get_boxes(self, example):
objects, n_objects = example.objects, 0 objects, num_objects = example.objects, 0
height, width = example.height, example.width height, width = example.height, example.width
if not self._use_diff: if not self._use_diff:
for obj in objects: for obj in objects:
if obj.get('difficult', 0) == 0: if obj.get('difficult', 0) == 0:
n_objects += 1 num_objects += 1
else: else:
n_objects = len(objects) num_objects = len(objects)
roi_dict = { boxes = np.zeros((num_objects, 4), 'float32')
'boxes': np.zeros((n_objects, 4), 'float32'), gt_classes = np.zeros((num_objects,), 'int32')
'gt_classes': np.zeros((n_objects,), 'int32'),
}
# Filter the difficult instances # Filter the difficult instances.
object_idx = 0 object_idx = 0
for obj in objects: for obj in objects:
if not self._use_diff and \ if not self._use_diff and obj.get('difficult', 0) > 0:
obj.get('difficult', 0) > 0:
continue continue
bbox = obj['bbox'] bbox = obj['bbox']
roi_dict['boxes'][object_idx, :] = [ boxes[object_idx, :] = [max(0, bbox[0]),
max(0, bbox[0]), max(0, bbox[1]),
max(0, bbox[1]), min(bbox[2], width - 1),
min(bbox[2], width - 1), min(bbox[3], height - 1)]
min(bbox[3], height - 1), gt_classes[object_idx] = self._class_to_ind[obj['name']]
]
roi_dict['gt_classes'][object_idx] = \
self._class_to_ind[obj['name']]
object_idx += 1 object_idx += 1
if apply_flip: # Normalize.
roi_dict['boxes'] = \ boxes[:, 0::2] /= width
box_util.flip_boxes( boxes[:, 1::2] /= height
roi_dict['boxes'],
width,
)
# Normalize to unit sizes # Attach the classes.
roi_dict['boxes'][:, 0::2] /= width gt_boxes = np.empty((num_objects, 5), dtype=np.float32)
roi_dict['boxes'][:, 1::2] /= height gt_boxes[:, :4], gt_boxes[:, 4] = boxes, gt_classes
return roi_dict return gt_boxes
def get(self, example): def get(self, example):
example = Example(example) example = Example(example)
img = example.image
# Flip
apply_flip = False
if self._mirror:
if np.random.randint(2) > 0:
img = img[:, ::-1]
apply_flip = True
# Example -> RoIDict # Boxes.
roi_dict = self.make_roi_dict(example, apply_flip) boxes = self.get_boxes(example)
if len(boxes) == 0:
return {'boxes': boxes}
# Post-Process for gt boxes # Distort => Expand => Sample => Resize
# Shape like: [num_objects, {x1, y1, x2, y2, cls}] img, boxes = self._apply_transform(example.image, boxes)
gt_boxes = np.empty((roi_dict['gt_classes'].size, 5), 'float32')
gt_boxes[:, :4], gt_boxes[:, 4] = roi_dict['boxes'], roi_dict['gt_classes']
if len(gt_boxes) == 0: # Restore to the blob scale.
# Ignore the non-object image boxes[:, :4] *= self._scale
return img, gt_boxes
# Distort => Expand => Sample => Resize # Flip.
img, gt_boxes = self.augment_image(img, gt_boxes) if self._use_flipped and npr.randint(2) > 0:
img = img[:, ::-1]
boxes = box_util.flip_boxes(boxes, img.shape[1])
# Restore to the blob scale # Standard outputs.
gt_boxes[:, :4] *= self._scale outputs = {'image': img, 'boxes': boxes, 'im_info': img.shape[:2]}
# Post-Process for image # Attach precomputed targets.
if img.dtype == 'uint16': if len(boxes) > 0:
img = img.astype('float32') / 256. outputs.update(
self._anchor_sampler(
gt_boxes=boxes,
im_info=outputs['im_info']))
return img, gt_boxes return outputs
def run(self): def run(self):
# Fix the process-local random seed # Disable the opencv threading.
cv2.setNumThreads(1)
# Fix the process-local random seed.
np.random.seed(self._seed) np.random.seed(self._seed)
# Main prefetch loop # Main prefetch loop
while True: while True:
outputs = self.get(self.q_in.get()) outputs = self.get(self.q_in.get())
if len(outputs[1]) < 1: if len(outputs['boxes']) < 1:
continue # Ignore the non-object image continue # Ignore non-object image.
self.q_out.put(outputs) self.q_out.put(outputs)
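The `transforms.Compose(Distort(), Expand(), Sample(), Resize())` chain used above relies on each transform being a callable that maps `(image, boxes)` to `(image, boxes)`. A minimal sketch of that composition pattern, with a made-up `HorizontalFlip` transform standing in for the repo's SSD transforms and boxes reduced to their normalized coordinate columns:

import numpy as np

class Compose(object):
    """Chain (image, boxes) -> (image, boxes) transforms."""

    def __init__(self, *transforms):
        self.transforms = transforms

    def __call__(self, image, boxes):
        for transform in self.transforms:
            image, boxes = transform(image, boxes)
        return image, boxes

class HorizontalFlip(object):
    """Mirror the image and the normalized [x1, y1, x2, y2] boxes."""

    def __call__(self, image, boxes):
        image = image[:, ::-1]
        boxes = boxes.copy()
        boxes[:, [0, 2]] = 1.0 - boxes[:, [2, 0]]
        return image, boxes

# Toy usage on an 8x8 image with one normalized box.
image = np.zeros((8, 8, 3), 'uint8')
boxes = np.array([[0.1, 0.2, 0.4, 0.6]], 'float32')
pipeline = Compose(HorizontalFlip())
image, boxes = pipeline(image, boxes)
print(boxes)  # -> [[0.6, 0.2, 0.9, 0.6]]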