Commit 9d12d142 by Ting PAN

Add Model Zoo

1 parent d240a4fd
Showing with 1877 additions and 1614 deletions
[flake8]
max-line-length = 120
ignore = E741, # ambiguous variable name
F403, # 'from module import *' used; unable to detect undefined names
F405, # name may be undefined, or defined from star imports: module
F811, # redefinition of unused name from line N
F821, # undefined name
W503, # line break before binary operator
W504, # line break after binary operator
# module imported but unused
per-file-ignores = __init__.py: F401
exclude = seetadet/utils/pycocotools
......@@ -43,8 +43,13 @@ __pycache__
# VSCode files
.vscode
# PyCharm files
# IDEA files
.idea
# OSX dir files
.DS_Store
# Android files
.gradle
*.iml
local.properties
------------------------------------------------------------------------
The list of most significant changes made over time in SeetaDet.
SeetaDet 0.4.3 (20200724)
Dragon Minimum Required (Version 0.3.0.dev20200723)
Changes:
- Adapt to the latest dragon preview version.
Preview Features:
- None
Bugs fixed:
- None
------------------------------------------------------------------------
SeetaDet 0.4.2 (20200707)
Dragon Minimum Required (Version 0.3.0.dev20200707)
Changes:
- Adapt to the latest dragon preview version.
Preview Features:
- None
Bugs fixed:
- None
------------------------------------------------------------------------
SeetaDet 0.4.1 (20200421)
Dragon Minimum Required (Version 0.3.0.dev20200421)
Changes:
- Queue the testing images instead of reading them all at once.
Preview Features:
- None
Bugs fixed:
- None
------------------------------------------------------------------------
SeetaDet 0.4.0 (20200408)
Dragon Minimum Required (Version 0.3.0.dev20200408)
Changes:
Preview Features:
- Optimize the code structure.
- DALI support for SSD, RetinaNet, and Faster-RCNN.
- Use KPLRecord instead of SeetaRecord.
Bugs fixed:
- Fix the frozen Affine issue.
------------------------------------------------------------------------
SeetaDet 0.3.0 (20191121)
Dragon Minimum Required (Version 0.3.0.dev20191121)
Changes:
Preview Features:
- New algorithm: Mask R-CNN.
- Add MobileNet (V2 and NAS) as backbones.
- Refactor the testing module; multi-GPU testing is supported.
Bugs fixed:
- Remove rotated boxes; use Mask R-CNN instead.
------------------------------------------------------------------------
SeetaDet 0.2.3 (20191101)
Dragon Minimum Required (Version 0.3.0.dev20191021)
Changes:
Preview Features:
- Refactor the API of rotated boxes.
- Simplify the solver by adding LRScheduler.
- Change the ``ITER`` naming to ``STEP``.
Bugs fixed:
- None
------------------------------------------------------------------------
SeetaDet 0.2.2 (20191021)
Dragon Minimum Required (Version 0.3.0.dev20191021)
Changes:
Preview Features:
- Add the dumping of detection results.
Bugs fixed:
- None
------------------------------------------------------------------------
SeetaDet 0.2.1 (20191017)
Dragon Minimum Required (Version 0.3.0.dev20191017)
Changes:
Preview Features:
- Rotated boxes and FPN support for SSD.
- Freeze the graph to speed up inference.
Bugs fixed:
- None
------------------------------------------------------------------------
SeetaDet 0.2.0 (20190929)
Dragon Minimum Required (Version 0.3.0.dev20190929)
Changes:
Preview Features:
- Use SeetaRecord instead of LMDB.
- Flatten the implementation of layers.
Bugs fixed:
- None
------------------------------------------------------------------------
SeetaDet 0.1.2 (20190723)
Dragon Minimum Required (Version 0.3.0.0)
Changes:
Preview Features:
- Change to the PEP8 code style.
- Adapt to the new Dragon API.
Bugs fixed:
- None
------------------------------------------------------------------------
SeetaDet 0.1.1 (20190409)
Dragon Minimum Required (Version 0.3.0.0)
Changes:
Preview Features:
- Add RandomCrop/RandomPad for ScaleJittering.
- Add ResNet18/ResNet34/AirNet for R-CNN and RetinaNet.
- Use the C++-implemented decoder for RetinaNet instead.
Bugs fixed:
- None
------------------------------------------------------------------------
SeetaDet 0.1.0 (20190314)
Dragon Minimum Required (Version 0.3.0.0)
Changes:
Preview Features:
- Init repository.
Bugs fixed:
- None
Copyright (c) 2017, SeetaTech, Co.,Ltd. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# Benchmark and Model Zoo
## Introduction
### ImageNet Pretrained Models
#### ResNet Models
- [R-50.pkl](https://dragon.seetatech.com/download/models/seetadet/imagenet/R-50.pkl)
- [R-101.pkl](https://dragon.seetatech.com/download/models/seetadet/imagenet/R-101.pkl)
#### VGG Models
- [VGG16.SSD.pkl](https://dragon.seetatech.com/download/models/seetadet/imagenet/VGG16.SSD.pkl)
#### MobileNet Models
- [MobileNetV2.pkl](https://dragon.seetatech.com/download/models/seetadet/imagenet/MobileNetV2.pkl)
- [ProxylessMobile.pkl](https://dragon.seetatech.com/download/models/seetadet/imagenet/ProxylessMobile.pkl)
#### AirNet Models
- [AirNet.pkl](https://dragon.seetatech.com/download/models/seetadet/imagenet/AirNet.pkl)
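The checkpoints above are plain file downloads. Below is a minimal sketch for fetching one of them into the ``/model`` directory that the example configs reference via ``TRAIN.WEIGHTS`` (the helper name and target directory are illustrative assumptions, not part of SeetaDet):

```python
# Hypothetical helper: download a pretrained backbone listed above so a
# config's TRAIN.WEIGHTS (e.g. '/model/R-50.pkl') can point at it.
import os
import urllib.request

MODEL_URL = "https://dragon.seetatech.com/download/models/seetadet/imagenet/R-50.pkl"

def fetch_backbone(url=MODEL_URL, out_dir="/model"):
    """Download a checkpoint into ``out_dir`` if it is not already present."""
    os.makedirs(out_dir, exist_ok=True)
    out_path = os.path.join(out_dir, os.path.basename(url))
    if not os.path.exists(out_path):
        urllib.request.urlretrieve(url, out_path)
    return out_path

print(fetch_backbone())  # -> /model/R-50.pkl
```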
## Baselines
### Faster R-CNN
Please refer to [Faster R-CNN](configs/faster_rcnn) for details.
### Mask R-CNN
Please refer to [Mask R-CNN](configs/mask_rcnn) for details.
### RetinaNet
Please refer to [RetinaNet](configs/retinanet) for details.
### SSD
Please refer to [SSD](configs/ssd) for details.
## SeetaDet
# SeetaDet
## WHAT's SeetaDet?
SeetaDet is a platform implementing popular object detection algorithms.
SeetaDet is a platform implementing popular object detection algorithms,
including R-CNN series, SSD, and RetinaNet.
We have achieved the same or higher performance than the baselines reported in the original papers.
This repository is based on [Dragon](https://github.com/seetaresearch/dragon),
while the code follows the PyTorch style.
This repository is based on [seeta-dragon](https://github.com/seetaresearch/dragon),
while the code follows the torch style.
The torch-style code helps us simplify the hierarchical pipeline of modern detection.
## Requirements
seeta-dragon >= 0.3.0.dev20200723
seeta-dragon >= 0.3.0.dev20201014
## Installation
#### Build From Source
### Build From Source
If you prefer to develop modules as well as run experiments,
the following commands will build but not install to ***site-packages***:
```bash
cd SeetaDet && python setup.py build
cd seetadet && python setup.py build
```
#### Install From Source
### Install From Source
Clone this repository to your local disk and install it:
```bash
cd SeetaDet && python setup.py install
cd seetadet && python setup.py install
```
#### Install From Git
### Install From Git
You can also install it from the remote repository:
......@@ -45,16 +40,16 @@ pip install git+https://gitlab.seetatech.com/seetaresearch/seetadet.git@master
## Quick Start
#### Train a detection model
### Train a detection model
```bash
cd tools
python train.py --cfg <MODEL_YAML>
```
We have provided the default YAML examples in ``seetadet/configs``.
We have provided the default YAML examples in [configs](configs).
#### Test a detection model
### Test a detection model
```bash
cd tools
......@@ -64,42 +59,33 @@ Or
```bash
cd tools
python test_all.py --cfg <MODEL_YAML> --exp_dir <EXP_DIR>
python test_all.py --cfg <MODEL_YAML> --exp_dir <EXP_DIR> --last 1
```
#### Export a detection model to ONNX
### Export a detection model to ONNX
```bash
cd tools
python export.py --cfg <MODEL_YAML> --exp_dir <EXP_DIR> --iter <ITERATION>
```
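The exported file can then be loaded with any ONNX runtime. Below is a minimal sketch using ``onnxruntime`` (not a SeetaDet dependency; the model path, input layout, and single-image-input assumption must be adjusted to match what ``export.py`` actually emits):

```python
# Sketch: inspect and run an exported ONNX detection model.
# Assumption: the graph takes one NCHW image tensor; add entries to the
# feed dict if the exported graph has extra inputs (e.g. image info).
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
for inp in sess.get_inputs():
    print(inp.name, inp.shape, inp.type)  # what the exported graph expects

# Dummy input; real preprocessing must follow the training config's
# PIXEL_MEANS / PIXEL_STDS and scale settings.
image = np.zeros((1, 3, 800, 1333), dtype=np.float32)
outputs = sess.run(None, {sess.get_inputs()[0].name: image})
print([out.shape for out in outputs])
```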
## Resources
#### Pre-trained ImageNet models
| Model | Usage |
| :------: | :------: |
| [VGG16.SSD](https://dragon.seetatech.com/download/models/seetadet/imagenet/VGG16.SSD.pth)| SSD |
| [VGG16.RCNN](https://dragon.seetatech.com/download/models/seetadet/imagenet/VGG16.RCNN.pth)| R-CNN |
| [R-18.Affine](https://dragon.seetatech.com/download/models/seetadet/imagenet/R-18.Affine.pth)| R-CNN, RetinaNet, SSD |
| [R-34.Affine](https://dragon.seetatech.com/download/models/seetadet/imagenet/R-34.Affine.pth)| R-CNN, RetinaNet, SSD |
| [R-50.Affine](https://dragon.seetatech.com/download/models/seetadet/imagenet/R-50.Affine.pth)| R-CNN, RetinaNet, SSD |
| [R-101.Affine](https://dragon.seetatech.com/download/models/seetadet/imagenet/R-101.Affine.pth)| R-CNN, RetinaNet, SSD |
| [AirNet.Affine](https://dragon.seetatech.com/download/models/seetadet/imagenet/AirNet.Affine.pth)| R-CNN, RetinaNet, SSD |
## References
[1] [Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks](https://arxiv.org/abs/1506.01497). Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. NIPS, 2015.
## Benchmark and Model Zoo
[2] [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385). Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. CVPR, 2016.
Results and models are available in the [Model Zoo](MODEL_ZOO.md).
[3] [SSD: Single Shot MultiBox Detector](https://arxiv.org/abs/1512.02325). Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg. ECCV, 2016.
### Supported Backbones
[4] [Feature Pyramid Networks for Object Detection](https://arxiv.org/abs/1612.03144). Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. CVPR, 2017.
- [ResNet](MODEL_ZOO.md#resnet-models)
- [VGG](MODEL_ZOO.md#vgg-models)
- [MobileNet](MODEL_ZOO.md#mobilenet-models)
- [AirNet](MODEL_ZOO.md#airnet-models)
[5] [Focal Loss for Dense Object Detection](https://arxiv.org/abs/1708.02002). Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. ICCV, 2017.
### Supported Algorithms
[6] [Mask R-CNN](https://arxiv.org/abs/1703.06870). Kaiming He, Georgia Gkioxari, Piotr Dollár and Ross Girshick. ICCV, 2017.
- [Faster R-CNN](configs/faster_rcnn)
- [Mask R-CNN](configs/mask_rcnn)
- [SSD](configs/ssd)
- [RetinaNet](configs/retinanet)
[7] [Detectron](https://github.com/facebookresearch/Detectron). Ross Girshick, Ilija Radosavovic, Georgia Gkioxari, Piotr Dollár and Kaiming He. 2018.
## License
[BSD 2-Clause license](LICENSE)
# Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
## Introduction
```
@article{Ren_2017,
title={Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
publisher={Institute of Electrical and Electronics Engineers (IEEE)},
author={Ren, Shaoqing and He, Kaiming and Girshick, Ross and Sun, Jian},
year={2017},
month={Jun},
}
```
## COCO Object Detection Baselines
| Model | Lr sched | Infer time (s/im) | box AP | Download |
| :---: | :------: | :---------------: | :----: | :------: |
| [R-50-FPN-800](coco_faster_rcnn_R-50-FPN_800_1x.yml) | 1x | 0.046 | 38.3 | [model](https://dragon.seetatech.com/download/models/seetadet/faster_rcnn/coco_faster_rcnn_R-50-FPN_800_1x/model_final.pkl) |
| [R-50-FPN-800](coco_faster_rcnn_R-50-FPN_800_2x.yml) | 2x | 0.046 | 39.7 | [model](https://dragon.seetatech.com/download/models/seetadet/faster_rcnn/coco_faster_rcnn_R-50-FPN_800_2x/model_final.pkl) |
## Pascal VOC Object Detection Baselines
| Model | Infer time (s/im) | AP@0.5 | Download |
| :---: | :---------------: | :----: | :------: |
| [R-50-FPN-640](voc_faster_rcnn_R-50-FPN_640.yml) | 0.030 | 80.8 | [model](https://dragon.seetatech.com/download/models/seetadet/faster_rcnn/voc_faster_rcnn_R-50-FPN_640_1x/model_final.pkl) |
NUM_GPUS: 8
VIS: False
ENABLE_TENSOR_BOARD: False
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: faster_rcnn
BACKBONE: resnet101.fpn
BACKBONE: resnet50.fpn
CLASSES: ['__background__',
'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
......@@ -19,30 +19,28 @@ MODEL:
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
NUM_CLASSES: 81
SOLVER:
BASE_LR: 0.02
LR_POLICY: steps_with_decay
DECAY_STEPS: [60000, 80000]
MAX_STEPS: 90000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: coco_faster_rcnn
SNAPSHOT_PREFIX: coco_faster_rcnn_R-50-FPN_800_1x
FRCNN:
ROI_XFORM_METHOD: RoIAlign
BATCH_SIZE: 512
ROI_XFORM_RESOLUTION: 7
TRAIN:
WEIGHTS: '/model/R-101.Affine.pth'
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/coco_2014_trainval35k'
IMS_PER_BATCH: 2
BATCH_SIZE: 512
SCALES: [800]
SCALES: [640, 672, 704, 736, 768, 800]
MAX_SIZE: 1333
USE_DIFF: False # Do not use crowd objects
TEST:
DATASET: '/data/coco_2014_minival'
JSON_FILE: '/data/instances_minival2014.json'
PROTOCOL: 'coco'
IMS_PER_BATCH: 1
SCALES: [800]
MAX_SIZE: 1333
NMS: 0.5
RPN_POST_NMS_TOP_N: 1000
NUM_GPUS: 8
VIS: False
ENABLE_TENSOR_BOARD: False
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: faster_rcnn
BACKBONE: resnet101.fpn
BACKBONE: resnet50.fpn
CLASSES: ['__background__',
'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
......@@ -19,29 +19,28 @@ MODEL:
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
NUM_CLASSES: 81
SOLVER:
BASE_LR: 0.02
LR_POLICY: steps_with_decay
DECAY_STEPS: [120000, 160000]
MAX_STEPS: 180000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: coco_faster_rcnn
SNAPSHOT_PREFIX: coco_faster_rcnn_R-50-FPN_800_2x
FRCNN:
ROI_XFORM_METHOD: RoIAlign
BATCH_SIZE: 512
ROI_XFORM_RESOLUTION: 7
TRAIN:
WEIGHTS: '/model/R-101.Affine.pth'
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/coco_2014_trainval35k'
IMS_PER_BATCH: 2
BATCH_SIZE: 512
SCALES: [800]
SCALES: [640, 672, 704, 736, 768, 800]
MAX_SIZE: 1333
USE_DIFF: False # Do not use crowd objects
TEST:
DATASET: '/data/coco_2014_minival'
JSON_FILE: '/data/instances_minival2014.json'
PROTOCOL: 'coco'
IMS_PER_BATCH: 1
SCALES: [800]
MAX_SIZE: 1333
NMS: 0.5
RPN_POST_NMS_TOP_N: 1000
NUM_GPUS: 1
VIS: False
ENABLE_TENSOR_BOARD: False
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: faster_rcnn
BACKBONE: resnet50.fpn
......@@ -10,27 +10,26 @@ MODEL:
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
NUM_CLASSES: 21
FRCNN:
BATCH_SIZE: 128
ROI_XFORM_RESOLUTION: 7
SOLVER:
BASE_LR: 0.002
DECAY_STEPS: [100000, 140000]
MAX_STEPS: 140000
DECAY_STEPS: [80000, 100000]
MAX_STEPS: 120000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: voc_faster_rcnn
FRCNN:
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 7
SNAPSHOT_PREFIX: voc_faster_rcnn_R-50-FPN_640
TRAIN:
WEIGHTS: '/model/R-50.Affine.pth'
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/voc_0712_trainval'
IMS_PER_BATCH: 2
BATCH_SIZE: 128
SCALES: [600]
MAX_SIZE: 1000
SCALES: [480, 512, 544, 576, 608, 640]
MAX_SIZE: 1066
USE_COLOR_JITTER: True
TEST:
DATASET: '/data/voc_2007_test'
PROTOCOL: 'voc2007' # 'voc2007', 'voc2010', 'coco'
SCALES: [600]
MAX_SIZE: 1000
PROTOCOL: 'voc2007'
IMS_PER_BATCH: 1
SCALES: [640]
MAX_SIZE: 1066
NMS: 0.45
RPN_POST_NMS_TOP_N: 1000
\ No newline at end of file
NUM_GPUS: 1
VIS: False
ENABLE_TENSOR_BOARD: False
MODEL:
TYPE: faster_rcnn
BACKBONE: vgg16.c4
CLASSES: ['__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
NUM_CLASSES: 21
SOLVER:
BASE_LR: 0.001
WEIGHT_DECAY: 0.0005
DECAY_STEPS: [100000, 140000]
MAX_STEPS: 140000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: voc_faster_rcnn
RPN:
STRIDES: [16]
SCALES: [8, 16, 32] # RField: [128, 256, 512]
ASPECT_RATIOS: [0.5, 1.0, 2.0]
FRCNN:
ROI_XFORM_METHOD: RoIPool
ROI_XFORM_RESOLUTION: 7
MLP_HEAD_DIM: 4096
TRAIN:
WEIGHTS: '/model/VGG16.RCNN.pth'
DATASET: '/data/voc_0712_trainval'
IMS_PER_BATCH: 2
BATCH_SIZE: 128
SCALES: [600]
MAX_SIZE: 1000
RPN_MIN_SIZE: 16
TEST:
DATASET: '/data/voc_2007_test'
PROTOCOL: 'voc2007' # 'voc2007', 'voc2010', 'coco'
SCALES: [600]
MAX_SIZE: 1000
RPN_MIN_SIZE: 16
NMS: 0.45
RPN_POST_NMS_TOP_N: 300
\ No newline at end of file
# Mask R-CNN
## Introduction
```
@article{He_2017,
title={Mask R-CNN},
journal={2017 IEEE International Conference on Computer Vision (ICCV)},
publisher={IEEE},
author={He, Kaiming and Gkioxari, Georgia and Dollar, Piotr and Girshick, Ross},
year={2017},
month={Oct}
}
```
## COCO Instance Segmentation Baselines
| Model | Lr sched | Infer time (s/im) | box AP | mask AP | Download |
| :---: | :------: | :---------------: | :----: | :-----: | :------: |
| [R-50-FPN-800](coco_mask_rcnn_R-50-FPN_800_1x.yml) | 1x | 0.056 | 39.2 | 34.8 | [model](https://dragon.seetatech.com/download/models/seetadet/mask_rcnn/coco_mask_rcnn_R-50-FPN_800_1x/model_final.pkl) |
| [R-50-FPN-800](coco_mask_rcnn_R-50-FPN_800_2x.yml) | 2x | 0.056 | 41.4 | 36.5 | [model](https://dragon.seetatech.com/download/models/seetadet/mask_rcnn/coco_mask_rcnn_R-50-FPN_800_2x/model_final.pkl) |
NUM_GPUS: 8
VIS: False
ENABLE_TENSOR_BOARD: False
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: mask_rcnn
BACKBONE: resnet101.fpn
BACKBONE: resnet50.fpn
CLASSES: ['__background__',
'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
......@@ -19,25 +19,22 @@ MODEL:
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
NUM_CLASSES: 81
SOLVER:
BASE_LR: 0.02
DECAY_STEPS: [60000, 80000]
MAX_STEPS: 90000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: coco_mask_rcnn
SNAPSHOT_PREFIX: coco_mask_rcnn_R-50-FPN_800_1x
FRCNN:
ROI_XFORM_METHOD: RoIAlign
BATCH_SIZE: 512
ROI_XFORM_RESOLUTION: 7
MRCNN:
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 14
TRAIN:
WEIGHTS: '/model/R-101.Affine.pth'
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/coco_2014_trainval35k'
IMS_PER_BATCH: 2
BATCH_SIZE: 512
SCALES: [800]
SCALES: [640, 672, 704, 736, 768, 800]
MAX_SIZE: 1333
USE_DIFF: False # Do not use crowd objects
TEST:
......@@ -47,5 +44,3 @@ TEST:
SCALES: [800]
MAX_SIZE: 1333
NMS: 0.5
RPN_POST_NMS_TOP_N: 1000
NUM_GPUS: 8
VIS: False
ENABLE_TENSOR_BOARD: False
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: mask_rcnn
BACKBONE: resnet101.fpn
BACKBONE: resnet50.fpn
CLASSES: ['__background__',
'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
......@@ -19,25 +19,22 @@ MODEL:
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
NUM_CLASSES: 81
SOLVER:
BASE_LR: 0.02
DECAY_STEPS: [120000, 160000]
MAX_STEPS: 180000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: coco_mask_rcnn
SNAPSHOT_PREFIX: coco_mask_rcnn_R-50-FPN_800_2x
FRCNN:
ROI_XFORM_METHOD: RoIAlign
BATCH_SIZE: 512
ROI_XFORM_RESOLUTION: 7
MRCNN:
ROI_XFORM_METHOD: RoIAlign
ROI_XFORM_RESOLUTION: 14
TRAIN:
WEIGHTS: '/model/R-101.Affine.pth'
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/coco_2014_trainval35k'
IMS_PER_BATCH: 2
BATCH_SIZE: 512
SCALES: [800]
SCALES: [640, 672, 704, 736, 768, 800]
MAX_SIZE: 1333
USE_DIFF: False # Do not use crowd objects
TEST:
......@@ -47,4 +44,3 @@ TEST:
SCALES: [800]
MAX_SIZE: 1333
NMS: 0.5
RPN_POST_NMS_TOP_N: 1000
# Focal Loss for Dense Object Detection
## Introduction
```
@inproceedings{lin2017focal,
title={Focal loss for dense object detection},
author={Lin, Tsung-Yi and Goyal, Priya and Girshick, Ross and He, Kaiming and Doll{\'a}r, Piotr},
booktitle={Proceedings of the IEEE international conference on computer vision},
year={2017}
}
```
## COCO Object Detection Baselines
| Model | Lr sched | Infer time (s/im) | box AP | Download |
| :---: | :------: | :---------------: | :----: | :------: |
| [R-50-FPN-416](coco_retinanet_R-50-FPN_416_6x.yml) | 6x | 0.019 | 34.4 | [model](https://dragon.seetatech.com/download/models/seetadet/retinanet/coco_retinanet_R-50-FPN_416_6x/model_final.pkl) |
| [R-50-FPN-512](coco_retinanet_R-50-FPN_512_6x.yml) | 6x | 0.022 | 36.4 | [model](https://dragon.seetatech.com/download/models/seetadet/retinanet/coco_retinanet_R-50-FPN_512_6x/model_final.pkl) |
| [R-50-FPN-800](coco_retinanet_R-50-FPN_800_1x.yml) | 1x | 0.051 | 37.4 | [model](https://dragon.seetatech.com/download/models/seetadet/retinanet/coco_retinanet_R-50-FPN_800_1x/model_final.pkl) |
| [R-50-FPN-800](coco_retinanet_R-50-FPN_800_2x.yml) | 2x | 0.051 | 39.1 | [model](https://dragon.seetatech.com/download/models/seetadet/retinanet/coco_retinanet_R-50-FPN_800_2x/model_final.pkl) |
## Pascal VOC Object Detection Baselines
| Model | Infer time (s/im) | AP@0.5 | Download |
| :---: | :---------------: | :----: | :------: |
| [R-50-FPN-416](voc_retinanet_R-50-FPN_416.yml) | 0.015 | 82.3 | [model](https://dragon.seetatech.com/download/models/seetadet/retinanet/voc_retinanet_R-50-FPN_416/model_final.pkl) |
| [R-50-FPN-512](voc_retinanet_R-50-FPN_512.yml) | 0.017 | 83.0 | [model](https://dragon.seetatech.com/download/models/seetadet/retinanet/voc_retinanet_R-50-FPN_512/model_final.pkl) |
NUM_GPUS: 8
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: retinanet
BACKBONE: resnet50.fpn
CLASSES: ['__background__',
'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
'fire hydrant', 'stop sign', 'parking meter', 'bench',
'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant',
'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife',
'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli',
'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop',
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
FPN:
RPN_MIN_LEVEL: 3
RPN_MAX_LEVEL: 7
SOLVER:
BASE_LR: 0.01
LR_POLICY: steps_with_decay
DECAY_STEPS: [90000, 120000]
MAX_STEPS: 135000
SNAPSHOT_EVERY: 2500
SNAPSHOT_PREFIX: coco_retinanet_R-50-FPN_416_6x
PIPELINE:
TYPE: 'ssd'
TRAIN:
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/coco_2014_trainval35k'
IMS_PER_BATCH: 8
SCALES: [416]
USE_DIFF: False # Do not use crowd objects
TEST:
DATASET: '/data/coco_2014_minival'
JSON_FILE: '/data/instances_minival2014.json'
PROTOCOL: 'coco'
IMS_PER_BATCH: 1
SCALES: [416]
NMS: 0.5
NUM_GPUS: 8
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: retinanet
BACKBONE: resnet50.fpn
CLASSES: ['__background__',
'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
'fire hydrant', 'stop sign', 'parking meter', 'bench',
'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant',
'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife',
'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli',
'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop',
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
FPN:
RPN_MIN_LEVEL: 3
RPN_MAX_LEVEL: 7
SOLVER:
BASE_LR: 0.01
LR_POLICY: steps_with_decay
DECAY_STEPS: [90000, 120000]
MAX_STEPS: 135000
SNAPSHOT_EVERY: 2500
SNAPSHOT_PREFIX: coco_retinanet_R-50-FPN_512_6x
PIPELINE:
TYPE: 'ssd'
TRAIN:
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/coco_2014_trainval35k'
IMS_PER_BATCH: 8
SCALES: [512]
USE_DIFF: False # Do not use crowd objects
TEST:
DATASET: '/data/coco_2014_minival'
JSON_FILE: '/data/instances_minival2014.json'
PROTOCOL: 'coco'
IMS_PER_BATCH: 1
SCALES: [512]
NMS: 0.5
NUM_GPUS: 8
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: retinanet
BACKBONE: resnet50.fpn
CLASSES: ['__background__',
'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
'fire hydrant', 'stop sign', 'parking meter', 'bench',
'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant',
'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife',
'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli',
'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop',
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
FPN:
RPN_MIN_LEVEL: 3
RPN_MAX_LEVEL: 7
SOLVER:
BASE_LR: 0.01
LR_POLICY: steps_with_decay
DECAY_STEPS: [60000, 80000]
MAX_STEPS: 90000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: coco_retinanet_R-50-FPN_800_1x
TRAIN:
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/coco_2014_trainval35k'
IMS_PER_BATCH: 2
SCALES: [640, 672, 704, 736, 768, 800]
MAX_SIZE: 1333
USE_DIFF: False # Do not use crowd objects
TEST:
DATASET: '/data/coco_2014_minival'
JSON_FILE: '/data/instances_minival2014.json'
PROTOCOL: 'coco'
IMS_PER_BATCH: 1
SCALES: [800]
MAX_SIZE: 1333
NMS: 0.5
NUM_GPUS: 4
VIS: False
ENABLE_TENSOR_BOARD: False
NUM_GPUS: 8
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: retinanet
BACKBONE: resnet50.fpn
......@@ -19,28 +19,28 @@ MODEL:
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
NUM_CLASSES: 81
FPN:
RPN_MIN_LEVEL: 3
RPN_MAX_LEVEL: 7
SOLVER:
BASE_LR: 0.01
LR_POLICY: steps_with_decay
DECAY_STEPS: [120000, 160000]
MAX_STEPS: 180000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: coco_retinanet_416
FPN:
RPN_MIN_LEVEL: 3
RPN_MAX_LEVEL: 7
SNAPSHOT_PREFIX: coco_retinanet_R-50-FPN_800_2x
TRAIN:
WEIGHTS: '/model/R-50.Affine.pth'
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/coco_2014_trainval35k'
IMS_PER_BATCH: 16
SCALES: [416]
RANDOM_SCALES: [0.25, 1.0]
USE_DIFF: False # Do not use crowd objects
USE_COLOR_JITTER: False
IMS_PER_BATCH: 2
SCALES: [640, 672, 704, 736, 768, 800]
MAX_SIZE: 1333
USE_DIFF: False # Do not use crowd objects
TEST:
DATASET: '/data/coco_2014_minival'
JSON_FILE: '/data/instances_minival2014.json'
PROTOCOL: 'coco'
IMS_PER_BATCH: 1
SCALES: [416]
NMS: 0.5
\ No newline at end of file
SCALES: [800]
MAX_SIZE: 1333
NMS: 0.5
NUM_GPUS: 1
VIS: False
VIS_ON_FILE: False
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: retinanet
BACKBONE: airnet.fpn
BACKBONE: resnet50.fpn
CLASSES: ['__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
NUM_CLASSES: 21
SOLVER:
BASE_LR: 0.01
DECAY_STEPS: [40000, 50000, 60000]
MAX_STEPS: 60000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: voc_retinanet_320
FPN:
RPN_MIN_LEVEL: 3
RPN_MAX_LEVEL: 7
RETINANET:
NUM_CONVS: 2
SOLVER:
BASE_LR: 0.01
DECAY_STEPS: [80000, 100000]
MAX_STEPS: 120000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: voc_retinanet_R-50-FPN_416
PIPELINE:
TYPE: 'ssd'
TRAIN:
WEIGHTS: '/model/AirNet.Affine.pth'
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/voc_0712_trainval'
IMS_PER_BATCH: 32
SCALES: [320]
IMS_PER_BATCH: 16
SCALES: [416]
RANDOM_SCALES: [0.25, 1.0]
USE_COLOR_JITTER: True
TEST:
DATASET: '/data/voc_2007_test'
PROTOCOL: 'voc2007' # 'voc2007', 'voc2010', 'coco'
PROTOCOL: 'voc2007'
IMS_PER_BATCH: 1
SCALES: [320]
NMS: 0.45
\ No newline at end of file
SCALES: [416]
NMS: 0.45
RETINANET_PRE_NMS_TOP_N: 1000
NUM_GPUS: 1
VIS: False
VIS_ON_FILE: False
NUM_GPUS: 2
PIXEL_STDS: [57.375, 57.12, 58.395]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: retinanet
BACKBONE: resnet34.fpn
BACKBONE: resnet50.fpn
CLASSES: ['__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
NUM_CLASSES: 21
SOLVER:
BASE_LR: 0.01
DECAY_STEPS: [40000, 50000, 60000]
WARM_UP_STEPS: 2000
MAX_STEPS: 60000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: voc_retinanet_320
FPN:
RPN_MIN_LEVEL: 3
RPN_MAX_LEVEL: 7
RETINANET:
NUM_CONVS: 2
SOLVER:
BASE_LR: 0.01
DECAY_STEPS: [80000, 100000]
MAX_STEPS: 120000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: voc_retinanet_R-50-FPN_512
PIPELINE:
TYPE: 'ssd'
TRAIN:
WEIGHTS: '/model/R-50.Affine.pth'
WEIGHTS: '/model/R-50.pkl'
DATASET: '/data/voc_0712_trainval'
IMS_PER_BATCH: 32
SCALES: [320]
RANDOM_SCALES: [0.25, 2.0]
IMS_PER_BATCH: 8
SCALES: [512]
RANDOM_SCALES: [0.25, 1.0]
USE_COLOR_JITTER: True
TEST:
DATASET: '/data/voc_2007_test'
PROTOCOL: 'voc2007' # 'voc2007', 'voc2010', 'coco'
PROTOCOL: 'voc2007'
IMS_PER_BATCH: 1
SCALES: [320]
NMS: 0.45
\ No newline at end of file
SCALES: [512]
NMS: 0.45
RETINANET_PRE_NMS_TOP_N: 1000
# SSD: Single Shot MultiBox Detector
## Introduction
```
@article{Liu_2016,
title={SSD: Single Shot MultiBox Detector},
journal={ECCV},
author={Liu, Wei and Anguelov, Dragomir and Erhan, Dumitru and Szegedy, Christian and Reed, Scott and Fu, Cheng-Yang and Berg, Alexander C.},
year={2016},
}
```
## Pascal VOC Object Detection Baselines
| Model | Infer time (s/im) | AP@0.5 | Download |
| :---: | :---------------: | :----: | :------: |
| [VGG-16-300](voc_ssd_VGG-16_300.yml) | 0.012 | 78.3 | [model](https://dragon.seetatech.com/download/models/seetadet/ssd/voc_ssd_VGG-16_300/model_final.pkl) |
| [VGG-16-512](voc_ssd_VGG-16_512.yml) | 0.021 | 80.1 | [model](https://dragon.seetatech.com/download/models/seetadet/ssd/voc_ssd_VGG-16_512/model_final.pkl) |
NUM_GPUS: 1
VIS: False
ENABLE_TENSOR_BOARD: False
MODEL:
TYPE: ssd
BACKBONE: airnet.fpn
CLASSES: ['__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
NUM_CLASSES: 21
SOLVER:
BASE_LR: 0.001
DECAY_STEPS: [80000, 100000, 120000]
MAX_STEPS: 120000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: voc_ssd_320
FPN:
RPN_MIN_LEVEL: 3
RPN_MAX_LEVEL: 8
SSD:
NUM_CONVS: 2
MULTIBOX:
STRIDES: [8, 16, 32, 64, 100, 300]
MIN_SIZES: [30, 60, 110, 162, 213, 264]
MAX_SIZES: [60, 110, 162, 213, 264, 315]
ASPECT_RATIOS: [
[1, 2, 0.5],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5],
[1, 2, 0.5],
]
TRAIN:
WEIGHTS: '/model/AirNet.Affine.pth'
DATASET: '/data/voc_0712_trainval'
IMS_PER_BATCH: 32
SCALES: [320]
RANDOM_SCALES: [0.25, 1.00]
USE_COLOR_JITTER: True
TEST:
DATASET: '/data/voc_2007_test'
PROTOCOL: 'voc2007' # 'voc2007', 'voc2010', 'coco'
IMS_PER_BATCH: 8
SCALES: [320]
NMS_TOP_K: 400
NMS: 0.45
SCORE_THRESH: 0.01
DETECTIONS_PER_IM: 200
\ No newline at end of file
NUM_GPUS: 1
VIS: False
ENABLE_TENSOR_BOARD: False
MODEL:
TYPE: ssd
BACKBONE: resnet50.fpn
CLASSES: ['__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
NUM_CLASSES: 21
FPN:
RPN_MIN_LEVEL: 3
RPN_MAX_LEVEL: 8
SOLVER:
BASE_LR: 0.001
DECAY_STEPS: [80000, 100000, 120000]
MAX_STEPS: 120000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: voc_ssd_320
SSD:
NUM_CONVS: 2
MULTIBOX:
STRIDES: [8, 16, 32, 64, 100, 300]
MIN_SIZES: [30, 60, 110, 162, 213, 264]
MAX_SIZES: [60, 110, 162, 213, 264, 315]
ASPECT_RATIOS: [
[1, 2, 0.5],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5],
[1, 2, 0.5]
]
TRAIN:
WEIGHTS: '/model/R-50.Affine.pth'
DATASET: '/data/voc_0712_trainval'
IMS_PER_BATCH: 32
SCALES: [320]
RANDOM_SCALES: [0.25, 1.00]
USE_COLOR_JITTER: True
TEST:
DATASET: '/data/voc_2007_test'
PROTOCOL: 'voc2007' # 'voc2007', 'voc2010', 'coco'
IMS_PER_BATCH: 8
SCALES: [320]
NMS_TOP_K: 400
NMS: 0.45
SCORE_THRESH: 0.01
DETECTIONS_PER_IM: 200
NUM_GPUS: 1
VIS: False
ENABLE_TENSOR_BOARD: False
PIXEL_STDS: [1.0, 1.0, 1.0]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: ssd
BACKBONE: vgg16_reduced_300.mbox
FREEZE_AT: 0
BACKBONE: vgg16_reduced_300
COARSEST_STRIDE: 0
CLASSES: ['__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
NUM_CLASSES: 21
SSD:
STRIDES: [8, 16, 32, 64, 100, 300]
ANCHOR_SIZES: [[30, 60],
[60, 110],
[110, 162],
[162, 213],
[213, 264],
[264, 315]]
ASPECT_RATIOS: [[1, 2, 0.5],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5],
[1, 2, 0.5]]
SOLVER:
BASE_LR: 0.001
WEIGHT_DECAY: 0.0005
DECAY_STEPS: [80000, 100000, 120000]
DECAY_STEPS: [80000, 100000]
MAX_STEPS: 120000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: voc_ssd_300
SSD:
MULTIBOX:
STRIDES: [8, 16, 32, 64, 100, 300]
MIN_SIZES: [30, 60, 110, 162, 213, 264]
MAX_SIZES: [60, 110, 162, 213, 264, 315]
ASPECT_RATIOS: [
[1, 2, 0.5],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5],
[1, 2, 0.5]
]
SNAPSHOT_PREFIX: voc_ssd_VGG-16_300
TRAIN:
WEIGHTS: '/model/VGG16.SSD.pth'
WEIGHTS: '/model/VGG16.SSD.pkl'
DATASET: '/data/voc_0712_trainval'
IMS_PER_BATCH: 32
IMS_PER_BATCH: 16
SCALES: [300]
RANDOM_SCALES: [0.25, 1.00]
RANDOM_SCALES: [0.25, 1.0]
USE_COLOR_JITTER: True
TEST:
DATASET: '/data/voc_2007_test'
PROTOCOL: 'voc2007' # 'voc2007', 'voc2010', 'coco'
IMS_PER_BATCH: 8
PROTOCOL: 'voc2007'
IMS_PER_BATCH: 1
SCALES: [300]
NMS_TOP_K: 400
NMS: 0.45
SCORE_THRESH: 0.01
DETECTIONS_PER_IM: 200
NUM_GPUS: 2
PIXEL_STDS: [1.0, 1.0, 1.0]
PIXEL_MEANS: [103.53, 116.28, 123.675]
MODEL:
TYPE: ssd
BACKBONE: vgg16_reduced_512
CLASSES: ['__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
SSD:
STRIDES: [8, 16, 32, 64, 128, 256, 512]
ANCHOR_SIZES: [[35.84, 76.8],
[76.8, 153.6],
[153.6, 230.4],
[230.4, 307.2],
[307.2, 384.0],
[384.0, 460.8],
[460.8, 537.6]]
ASPECT_RATIOS: [[1, 2, 0.5],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5],
[1, 2, 0.5]]
SOLVER:
BASE_LR: 0.001
WEIGHT_DECAY: 0.0005
DECAY_STEPS: [80000, 100000]
MAX_STEPS: 120000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: voc_ssd_VGG-16_512
TRAIN:
WEIGHTS: '/model/VGG16.SSD.pkl'
DATASET: '/data/voc_0712_trainval'
IMS_PER_BATCH: 8
SCALES: [512]
RANDOM_SCALES: [0.25, 1.0]
USE_COLOR_JITTER: True
TEST:
DATASET: '/data/voc_2007_test'
PROTOCOL: 'voc2007'
IMS_PER_BATCH: 1
SCALES: [512]
NMS: 0.45
SCORE_THRESH: 0.01
......@@ -7,7 +7,6 @@ template <class Context>
template <typename T>
void NonMaxSuppressionOp<Context>::DoRunWithType() {
int num_selected;
utils::detection::ApplyNMS(
Output(0)->count(),
Output(0)->count(),
......@@ -16,7 +15,6 @@ void NonMaxSuppressionOp<Context>::DoRunWithType() {
Output(0)->template mutable_data<int64_t, CPUContext>(),
num_selected,
ctx());
Output(0)->Reshape({num_selected});
}
......@@ -24,14 +22,13 @@ template <class Context>
void NonMaxSuppressionOp<Context>::RunOnDevice() {
CHECK(Input(0).ndim() == 2 && Input(0).dim(1) == 5)
<< "\nThe dimensions of boxes should be (num_boxes, 5).";
Output(0)->Reshape({Input(0).dim(0)});
DispatchHelper<TensorTypes<float>>::Call(this, Input(0));
}
DEPLOY_CPU(NonMaxSuppression);
DEPLOY_CPU_OPERATOR(NonMaxSuppression);
#ifdef USE_CUDA
DEPLOY_CUDA(NonMaxSuppression);
DEPLOY_CUDA_OPERATOR(NonMaxSuppression);
#endif
OPERATOR_SCHEMA(NonMaxSuppression).NumInputs(1).NumOutputs(1);
......
......@@ -22,7 +22,7 @@ class NonMaxSuppressionOp final : public Operator<Context> {
public:
NonMaxSuppressionOp(const OperatorDef& def, Workspace* ws)
: Operator<Context>(def, ws),
iou_threshold_(OpArg<float>("iou_threshold", 0.5f)) {}
iou_threshold_(OP_SINGLE_ARG(float, "iou_threshold", 0.5f)) {}
USE_OPERATOR_FUNCTIONS;
void RunOnDevice() override;
......
......@@ -10,50 +10,48 @@ template <typename T>
void RetinaNetDecoderOp<Context>::DoRunWithType() {
using BT = float; // DType of BBox
using BC = CPUContext; // Context of BBox
int feat_h, feat_w;
int C = Input(-3).dim(2), A, K;
int total_proposals = 0;
int num_candidates, num_boxes, num_proposals;
auto* batch_scores = Input(-3).template data<T, BC>();
auto* batch_deltas = Input(-2).template data<T, BC>();
auto* im_info = Input(-1).template data<BT, BC>();
auto* y = Output(0)->template mutable_data<BT, BC>();
auto* batch_scores = Input(SCORES).template data<T, Context>();
auto* batch_deltas = Input(DELTAS).template data<T, BC>();
auto* im_info = Input(IMAGE_INFO).template data<BT, BC>();
auto* all_proposals = Output(0)->template mutable_data<BT, BC>();
for (int n = 0; n < num_images_; ++n) {
for (int im_idx = 0; im_idx < num_images_; ++im_idx) {
BT im_h = im_info[0];
BT im_w = im_info[1];
BT im_scale_h = im_info[2];
BT im_scale_w = im_info[2];
if (Input(-1).dim(1) == 4) im_scale_w = im_info[3];
auto* scores = batch_scores + n * Input(-3).stride(0);
auto* deltas = batch_deltas + n * Input(-2).stride(0);
if (Input(IMAGE_INFO).dim(1) == 4) im_scale_w = im_info[3];
CHECK_EQ(strides_.size(), InputSize() - 3)
<< "\nGiven " << strides_.size() << " strides "
<< "and " << InputSize() - 3 << " features";
// Select the top-k candidates as proposals
num_boxes = Input(-3).dim(1);
num_candidates = Input(-3).count(1);
roi_indices_.resize(num_candidates);
num_candidates = 0;
for (int i = 0; i < roi_indices_.size(); ++i)
if (scores[i] > score_thr_) roi_indices_[num_candidates++] = i;
scores_.resize(num_candidates);
for (int i = 0; i < num_candidates; ++i)
scores_[i] = scores[roi_indices_[i]];
num_proposals = std::min(num_candidates, (int)pre_nms_topn_);
utils::math::ArgPartition(
num_candidates, num_proposals, true, scores_.data(), indices_);
for (int i = 0; i < num_proposals; ++i)
auto num_boxes = Input(SCORES).dim(1);
auto num_classes = Input(SCORES).dim(2);
utils::detection::SelectProposals(
Input(SCORES).count(1),
score_thr_,
batch_scores + im_idx * Input(SCORES).stride(0),
roi_scores_,
roi_indices_,
ctx());
auto num_candidates = (int)roi_scores_.size();
auto num_proposals = std::min(num_candidates, (int)pre_nms_topn_);
utils::detection::ArgPartition(
num_candidates, num_proposals, true, roi_scores_.data(), indices_);
scores_.resize(indices_.size());
for (int i = 0; i < num_proposals; ++i) {
scores_[i] = roi_scores_[indices_[i]];
indices_[i] = roi_indices_[indices_[i]];
// Decode the candidates
int base_offset = 0;
}
// Decode proposals via anchors
int stride_offset = 0;
for (int i = 0; i < strides_.size(); i++) {
feat_h = Input(i).dim(2);
feat_w = Input(i).dim(3);
K = feat_h * feat_w;
A = int(ratios_.size() * scales_.size());
auto feature_h = Input(i).dim(2);
auto feature_w = Input(i).dim(3);
auto K = feature_h * feature_w;
auto A = int(ratios_.size() * scales_.size());
anchors_.resize((size_t)(A * 4));
utils::detection::GenerateAnchors(
strides_[i],
......@@ -62,35 +60,35 @@ void RetinaNetDecoderOp<Context>::DoRunWithType() {
ratios_.data(),
scales_.data(),
anchors_.data());
utils::detection::GenerateGridAnchors(
utils::detection::GetShiftedAnchors(
num_proposals,
C,
num_classes,
A,
feat_h,
feat_w,
feature_h,
feature_w,
strides_[i],
base_offset,
stride_offset,
anchors_.data(),
indices_.data(),
y);
base_offset += (A * K);
all_proposals);
stride_offset += (A * K);
}
utils::detection::GenerateMCProposals(
utils::detection::GenerateDetections(
num_proposals,
num_boxes,
C,
n,
num_classes,
im_idx,
im_h,
im_w,
im_scale_h,
im_scale_w,
scores,
deltas,
scores_.data(),
batch_deltas + im_idx * Input(DELTAS).stride(0),
indices_.data(),
y);
all_proposals);
total_proposals += num_proposals;
y += (num_proposals * 7);
im_info += Input(-1).dim(1);
all_proposals += (num_proposals * 7);
im_info += Input(IMAGE_INFO).dim(1);
}
Output(0)->Reshape({total_proposals, 7});
......@@ -99,20 +97,20 @@ void RetinaNetDecoderOp<Context>::DoRunWithType() {
template <class Context>
void RetinaNetDecoderOp<Context>::RunOnDevice() {
num_images_ = Input(0).dim(0);
CHECK_EQ(Input(-1).dim(0), num_images_)
    << "\nExpected " << num_images_ << " groups info, got "
<< Input(-1).dim(0) << ".";
Output(0)->Reshape({num_images_ * pre_nms_topn_, 7});
DispatchHelper<TensorTypes<float>>::Call(this, Input(-3));
DispatchHelper<TensorTypes<float>>::Call(this, Input(SCORES));
}
DEPLOY_CPU(RetinaNetDecoder);
DEPLOY_CPU_OPERATOR(RetinaNetDecoder);
#ifdef USE_CUDA
DEPLOY_CUDA(RetinaNetDecoder);
DEPLOY_CUDA_OPERATOR(RetinaNetDecoder);
#endif
OPERATOR_SCHEMA(RetinaNetDecoder).NumInputs(3, INT_MAX).NumOutputs(1, INT_MAX);
NO_GRADIENT(RetinaNetDecoder);
} // namespace dragon
......@@ -22,11 +22,11 @@ class RetinaNetDecoderOp final : public Operator<Context> {
public:
RetinaNetDecoderOp(const OperatorDef& def, Workspace* ws)
: Operator<Context>(def, ws),
strides_(OpArgs<int64_t>("strides")),
ratios_(OpArgs<float>("ratios")),
scales_(OpArgs<float>("scales")),
pre_nms_topn_(OpArg<int64_t>("pre_nms_top_n", 6000)),
score_thr_(OpArg<float>("score_thresh", 0.05f)) {}
strides_(OP_REPEATED_ARG(int64_t, "strides")),
ratios_(OP_REPEATED_ARG(float, "ratios")),
scales_(OP_REPEATED_ARG(float, "scales")),
pre_nms_topn_(OP_SINGLE_ARG(int64_t, "pre_nms_top_n", 6000)),
score_thr_(OP_SINGLE_ARG(float, "score_thresh", 0.05f)) {}
USE_OPERATOR_FUNCTIONS;
void RunOnDevice() override;
......@@ -34,10 +34,13 @@ class RetinaNetDecoderOp final : public Operator<Context> {
template <typename T>
void DoRunWithType();
enum INPUT_TAGS { SCORES = -3, DELTAS = -2, IMAGE_INFO = -1 };
protected:
float score_thr_;
vec64_t strides_, indices_, roi_indices_;
vector<float> ratios_, scales_, scores_, anchors_;
vector<float> ratios_, scales_, anchors_;
vector<float> scores_, roi_scores_;
int64_t num_images_, pre_nms_topn_;
};
......
......@@ -15,153 +15,81 @@ void RPNDecoderOp<Context>::DoRunWithType() {
int total_rois = 0, num_rois;
int num_candidates, num_proposals;
auto* batch_scores = Input(-3).template data<T, BC>();
auto* batch_deltas = Input(-2).template data<T, BC>();
auto* im_info = Input(-1).template data<BT, BC>();
auto* y = Output(0)->template mutable_data<BT, BC>();
auto* batch_scores = Input(SCORES).template data<T, BC>();
auto* batch_deltas = Input(DELTAS).template data<T, BC>();
auto* im_info = Input(IMAGE_INFO).template data<BT, BC>();
auto* all_rois = Output(0)->template mutable_data<BT, BC>();
for (int n = 0; n < num_images_; ++n) {
for (int im_idx = 0; im_idx < num_images_; ++im_idx) {
const BT im_h = im_info[0];
const BT im_w = im_info[1];
const BT scale = im_info[2];
const BT min_box_h = min_size_ * scale;
const BT min_box_w = min_size_ * scale;
auto* scores = batch_scores + n * Input(-3).stride(0);
auto* deltas = batch_deltas + n * Input(-2).stride(0);
if (strides_.size() == 1) {
// Case 1: single stride
feat_h = Input(0).dim(2);
feat_w = Input(0).dim(3);
auto* scores = batch_scores + im_idx * Input(SCORES).stride(0);
auto* deltas = batch_deltas + im_idx * Input(DELTAS).stride(0);
CHECK_EQ(strides_.size(), InputSize() - 3)
<< "\nGiven " << strides_.size() << " strides "
<< "and " << InputSize() - 3 << " feature inputs";
CHECK_EQ(strides_.size(), scales_.size())
<< "\nGiven " << strides_.size() << " strides "
<< "and " << scales_.size() << " scales";
// Select the top-k candidates as proposals
num_candidates = Input(SCORES).dim(1);
num_proposals = std::min(num_candidates, (int)pre_nms_top_n_);
utils::math::ArgPartition(
num_candidates, num_proposals, true, scores, indices_);
// Decode the candidates
int stride_offset = 0;
proposals_.Reshape({num_proposals, 5});
auto* proposals = proposals_.template mutable_data<BT, BC>();
for (int i = 0; i < strides_.size(); i++) {
feat_h = Input(i).dim(2);
feat_w = Input(i).dim(3);
K = feat_h * feat_w;
A = int(ratios_.size() * scales_.size());
// Select the Top-K candidates as proposals
num_candidates = A * K;
num_proposals = std::min(num_candidates, (int)pre_nms_topn_);
utils::math::ArgPartition(
num_candidates, num_proposals, true, scores, indices_);
// Decode the candidates
A = (int)ratios_.size();
anchors_.resize((size_t)(A * 4));
proposals_.Reshape({num_proposals, 5});
utils::detection::GenerateAnchors(
strides_[0],
strides_[i],
(int)ratios_.size(),
(int)scales_.size(),
1,
ratios_.data(),
scales_.data(),
anchors_.data());
utils::detection::GenerateGridAnchors(
utils::detection::GetShiftedAnchors(
num_proposals,
A,
feat_h,
feat_w,
strides_[0],
0,
strides_[i],
stride_offset,
anchors_.data(),
indices_.data(),
proposals_.template mutable_data<BT, BC>());
utils::detection::GenerateSSProposals(
K,
num_proposals,
im_h,
im_w,
min_box_h,
min_box_w,
scores,
deltas,
indices_.data(),
proposals_.template mutable_data<BT, BC>());
// Sort, NMS and Retrieve
utils::detection::SortProposals(
0,
num_proposals - 1,
num_proposals,
proposals_.template mutable_data<BT, BC>());
utils::detection::ApplyNMS(
num_proposals,
post_nms_topn_,
nms_thr_,
proposals_.template mutable_data<BT, Context>(),
roi_indices_.data(),
num_rois,
ctx());
utils::detection::RetrieveRoIs(
num_rois,
n,
proposals_.template data<BT, BC>(),
roi_indices_.data(),
y);
} else if (strides_.size() > 1) {
// Case 2: multiple strides
CHECK_EQ(strides_.size(), InputSize() - 3)
<< "\nGiven " << strides_.size() << " strides "
<< "and " << InputSize() - 3 << " feature inputs";
CHECK_EQ(strides_.size(), scales_.size())
<< "\nGiven " << strides_.size() << " strides "
<< "and " << scales_.size() << " scales";
// Select the top-k candidates as proposals
num_candidates = Input(-3).dim(1);
num_proposals = std::min(num_candidates, (int)pre_nms_topn_);
utils::math::ArgPartition(
num_candidates, num_proposals, true, scores, indices_);
// Decode the candidates
int base_offset = 0;
proposals_.Reshape({num_proposals, 5});
auto* proposals = proposals_.template mutable_data<BT, BC>();
for (int i = 0; i < strides_.size(); i++) {
feat_h = Input(i).dim(2);
feat_w = Input(i).dim(3);
K = feat_h * feat_w;
A = (int)ratios_.size();
anchors_.resize((size_t)(A * 4));
utils::detection::GenerateAnchors(
strides_[i],
(int)ratios_.size(),
1,
ratios_.data(),
scales_.data(),
anchors_.data());
utils::detection::GenerateGridAnchors(
num_proposals,
A,
feat_h,
feat_w,
strides_[i],
base_offset,
anchors_.data(),
indices_.data(),
proposals);
base_offset += (A * K);
}
utils::detection::GenerateMSProposals(
num_candidates,
num_proposals,
im_h,
im_w,
min_box_h,
min_box_w,
scores,
deltas,
&indices_[0],
proposals);
// Sort, NMS and Retrieve
utils::detection::SortProposals(
0, num_proposals - 1, num_proposals, proposals);
utils::detection::ApplyNMS(
num_proposals,
post_nms_topn_,
nms_thr_,
proposals_.template mutable_data<BT, Context>(),
roi_indices_.data(),
num_rois,
ctx());
utils::detection::RetrieveRoIs(
num_rois, n, proposals, roi_indices_.data(), y);
} else {
LOG(FATAL) << "Excepted at least one stride for proposals.";
stride_offset += (A * K);
}
utils::detection::GenerateProposals(
num_candidates,
num_proposals,
im_h,
im_w,
scores,
deltas,
&indices_[0],
proposals);
// Sort, NMS and Retrieve
utils::detection::SortProposals(
0, num_proposals - 1, num_proposals, proposals);
utils::detection::ApplyNMS(
num_proposals,
post_nms_top_n_,
nms_thr_,
proposals_.template mutable_data<BT, Context>(),
roi_indices_.data(),
num_rois,
ctx());
utils::detection::RetrieveRoIs(
num_rois, im_idx, proposals, roi_indices_.data(), all_rois);
total_rois += num_rois;
y += (num_rois * 5);
im_info += Input(-1).dim(1);
all_rois += (num_rois * 5);
im_info += Input(IMAGE_INFO).dim(1);
}
Output(0)->Reshape({total_rois, 5});
......@@ -202,22 +130,21 @@ void RPNDecoderOp<Context>::DoRunWithType() {
template <class Context>
void RPNDecoderOp<Context>::RunOnDevice() {
num_images_ = Input(0).dim(0);
CHECK_EQ(Input(-1).dim(0), num_images_)
CHECK_EQ(Input(IMAGE_INFO).dim(0), num_images_)
    << "\nExpected " << num_images_ << " groups info, got "
<< Input(-1).dim(0) << ".";
roi_indices_.resize(post_nms_topn_);
Output(0)->Reshape({num_images_ * post_nms_topn_, 5});
DispatchHelper<TensorTypes<float>>::Call(this, Input(-3));
<< Input(IMAGE_INFO).dim(0) << ".";
roi_indices_.resize(post_nms_top_n_);
Output(0)->Reshape({num_images_ * post_nms_top_n_, 5});
DispatchHelper<TensorTypes<float>>::Call(this, Input(SCORES));
}
DEPLOY_CPU(RPNDecoder);
DEPLOY_CPU_OPERATOR(RPNDecoder);
#ifdef USE_CUDA
DEPLOY_CUDA(RPNDecoder);
DEPLOY_CUDA_OPERATOR(RPNDecoder);
#endif
OPERATOR_SCHEMA(RPNDecoder).NumInputs(3, INT_MAX).NumOutputs(1, INT_MAX);
NO_GRADIENT(RPNDecoder);
} // namespace dragon
......@@ -22,17 +22,16 @@ class RPNDecoderOp final : public Operator<Context> {
public:
RPNDecoderOp(const OperatorDef& def, Workspace* ws)
: Operator<Context>(def, ws),
strides_(OpArgs<int64_t>("strides")),
ratios_(OpArgs<float>("ratios")),
scales_(OpArgs<float>("scales")),
pre_nms_topn_(OpArg<int64_t>("pre_nms_top_n", 6000)),
post_nms_topn_(OpArg<int64_t>("post_nms_top_n", 300)),
nms_thr_(OpArg<float>("nms_thresh", 0.7f)),
min_size_(OpArg<int64_t>("min_size", 16)),
min_level_(OpArg<int64_t>("min_level", 2)),
max_level_(OpArg<int64_t>("max_level", 5)),
canonical_level_(OpArg<int64_t>("canonical_level", 4)),
canonical_scale_(OpArg<int64_t>("canonical_scale", 224)) {}
strides_(OP_REPEATED_ARG(int64_t, "strides")),
ratios_(OP_REPEATED_ARG(float, "ratios")),
scales_(OP_REPEATED_ARG(float, "scales")),
pre_nms_top_n_(OP_SINGLE_ARG(int64_t, "pre_nms_top_n", 6000)),
post_nms_top_n_(OP_SINGLE_ARG(int64_t, "post_nms_top_n", 1000)),
nms_thr_(OP_SINGLE_ARG(float, "nms_thresh", 0.7f)),
min_level_(OP_SINGLE_ARG(int64_t, "min_level", 2)),
max_level_(OP_SINGLE_ARG(int64_t, "max_level", 5)),
canonical_level_(OP_SINGLE_ARG(int64_t, "canonical_level", 4)),
canonical_scale_(OP_SINGLE_ARG(int64_t, "canonical_scale", 224)) {}
USE_OPERATOR_FUNCTIONS;
void RunOnDevice() override;
......@@ -40,11 +39,13 @@ class RPNDecoderOp final : public Operator<Context> {
template <typename T>
void DoRunWithType();
enum INPUT_TAGS { SCORES = -3, DELTAS = -2, IMAGE_INFO = -1 };
protected:
float nms_thr_;
vec64_t strides_, indices_, roi_indices_;
vector<float> ratios_, scales_, scores_, anchors_;
int64_t min_size_, pre_nms_topn_, post_nms_topn_;
int64_t pre_nms_top_n_, post_nms_top_n_;
int64_t num_images_, min_level_, max_level_;
int64_t canonical_level_, canonical_scale_;
Tensor proposals_;
......
......@@ -8,7 +8,6 @@
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Build cxx sources."""
from __future__ import absolute_import
......@@ -16,14 +15,14 @@ from __future__ import division
from __future__ import print_function
import glob
from distutils.core import setup
from dragon.tools import cpp_extension
if cpp_extension.CUDA_HOME is not None and \
cpp_extension._cuda.is_available():
Extension = cpp_extension.CUDAExtension
else:
Extension = cpp_extension.CppExtension
from setuptools import setup
Extension = cpp_extension.CppExtension
if cpp_extension.CUDA_HOME is not None:
if cpp_extension._cuda.is_available():
Extension = cpp_extension.CUDAExtension
def find_sources(*dirs):
......@@ -44,11 +43,12 @@ ext_modules = [
Extension(
name='install.lib.modules._C',
sources=find_sources('**'),
define_macros=[('THRUST_IGNORE_CUB_VERSION_CHECK', None)],
),
]
setup(
name='SeetaDet',
ext_modules=ext_modules,
cmdclass={'build_ext': cpp_extension.BuildExtension}
cmdclass={'build_ext': cpp_extension.BuildExtension},
)
......@@ -47,6 +47,26 @@ void ApplyNMS<float, CPUContext>(
num_keep = count;
}
template <>
void SelectProposals<float, CPUContext>(
const int count,
const float score_thresh,
const float* input_scores,
vector<float>& output_scores,
vector<int64_t>& output_indices,
CPUContext* ctx) {
int num_proposals = 0;
for (int i = 0; i < count; ++i) {
if (input_scores[i] > score_thresh) {
output_indices[num_proposals++] = i;
}
}
output_scores.resize(num_proposals);
for (int i = 0; i < num_proposals; ++i) {
output_scores[i] = input_scores[output_indices[i]];
}
}
} // namespace detection
} // namespace utils
......
#ifdef USE_CUDA
#include <dragon/core/context_cuda.h>
#include <dragon/core/workspace.h>
#include <dragon/utils/device/common_cub.h>
#include <dragon/utils/device/common_thrust.h>
#include "detection_utils.h"
namespace dragon {
......@@ -15,6 +18,16 @@ namespace detection {
namespace {
template <typename T>
struct ThresholdFunctor {
ThresholdFunctor(float thresh) : thresh_(thresh) {}
inline __device__ bool operator()(
const thrust::tuple<int64_t, T>& key_val) const {
return thrust::get<1>(key_val) > thresh_;
}
float thresh_;
};
template <typename T>
__device__ bool _CheckIoU(const T* a, const T* b, const float thresh) {
const T x1 = max(a[0], b[0]);
const T y1 = max(a[1], b[1]);
......@@ -72,6 +85,41 @@ __global__ void _NonMaxSuppression(
} // namespace
template <>
void SelectProposals<float, CUDAContext>(
const int count,
const float score_thresh,
const float* in_scores,
vector<float>& out_scores,
vector<int64_t>& out_indices,
CUDAContext* ctx) {
auto* in_indices = ctx->workspace()->template data<int64_t, CUDAContext>(
{count}, "data:1")[0];
auto iter = thrust::make_zip_iterator(
thrust::make_tuple(in_indices, const_cast<float*>(in_scores)));
auto policy = thrust::cuda::par.on(ctx->cuda_stream());
thrust::counting_iterator<int64_t> offset(0);
thrust::copy(policy, offset, offset + count, in_indices);
auto last = thrust::partition(
policy, iter, iter + count, ThresholdFunctor<float>(score_thresh));
size_t num_proposals = last - iter;
out_scores.resize(num_proposals);
out_indices.resize(num_proposals);
CUDA_CHECK(cudaMemcpyAsync(
out_scores.data(),
in_scores,
num_proposals * sizeof(float),
cudaMemcpyDeviceToHost,
ctx->cuda_stream()));
CUDA_CHECK(cudaMemcpyAsync(
out_indices.data(),
in_indices,
num_proposals * sizeof(int64_t),
cudaMemcpyDeviceToHost,
ctx->cuda_stream()));
ctx->FinishDeviceComputation();
}
template <>
void ApplyNMS<float, CUDAContext>(
const int num_boxes,
const int max_keeps,
......@@ -83,7 +131,8 @@ void ApplyNMS<float, CUDAContext>(
const int num_blocks = DIV_UP(num_boxes, NUM_THREADS);
vector<uint64_t> mask_host(num_boxes * num_blocks);
auto* mask_dev = (uint64_t*)ctx->New(mask_host.size() * sizeof(uint64_t));
auto* mask_dev = (uint64_t*)ctx->workspace()->data<CUDAContext>(
{mask_host.size() * sizeof(uint64_t)}, "data:1")[0];
_NonMaxSuppression<<<
dim3(num_blocks, num_blocks),
......@@ -115,9 +164,7 @@ void ApplyNMS<float, CUDAContext>(
if (num_selected == max_keeps) break;
}
}
num_keep = num_selected;
ctx->Delete(mask_dev);
}
} // namespace detection
......
......@@ -24,45 +24,37 @@ namespace detection {
#define ROUND(x) ((int)((x) + (T)0.5))
/*!
* Box API
* Functional API
*/
template <typename T>
inline int FilterBoxes(
const T dx,
const T dy,
const T d_log_w,
const T d_log_h,
const T im_w,
const T im_h,
const T min_box_w,
const T min_box_h,
T* bbox) {
const T w = bbox[2] - bbox[0] + 1;
const T h = bbox[3] - bbox[1] + 1;
const T ctr_x = bbox[0] + (T)0.5 * w;
const T ctr_y = bbox[1] + (T)0.5 * h;
const T pred_ctr_x = dx * w + ctr_x;
const T pred_ctr_y = dy * h + ctr_y;
const T pred_w = exp(d_log_w) * w;
const T pred_h = exp(d_log_h) * h;
bbox[0] = pred_ctr_x - (T)0.5 * pred_w;
bbox[1] = pred_ctr_y - (T)0.5 * pred_h;
bbox[2] = pred_ctr_x + (T)0.5 * pred_w;
bbox[3] = pred_ctr_y + (T)0.5 * pred_h;
bbox[0] = std::max((T)0, std::min(bbox[0], im_w - 1));
bbox[1] = std::max((T)0, std::min(bbox[1], im_h - 1));
bbox[2] = std::max((T)0, std::min(bbox[2], im_w - 1));
bbox[3] = std::max((T)0, std::min(bbox[3], im_h - 1));
const T bbox_w = bbox[2] - bbox[0] + 1;
const T bbox_h = bbox[3] - bbox[1] + 1;
return (bbox_w >= min_box_w) * (bbox_h >= min_box_h);
inline void ArgPartition(
const int count,
const int kth,
const bool descend,
const T* v,
vec64_t& indices) {
indices.resize(count);
std::iota(indices.begin(), indices.end(), 0);
if (descend) {
std::nth_element(
indices.begin(),
indices.begin() + kth,
indices.end(),
[&v](int64_t lhs, int64_t rhs) { return v[lhs] > v[rhs]; });
} else {
std::nth_element(
indices.begin(),
indices.begin() + kth,
indices.end(),
[&v](int64_t lhs, int64_t rhs) { return v[lhs] < v[rhs]; });
}
}
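ArgPartition mirrors numpy.argpartition: it places the top-k (or bottom-k) indices at the front without fully sorting. A hedged NumPy sketch of the same selection, returning only the first kth indices:

import numpy as np

def arg_partition(values, kth, descend=True):
    """Return indices of the kth largest (or smallest) values, unsorted."""
    v = np.asarray(values)
    order = -v if descend else v
    return np.argpartition(order, kth)[:kth]

# arg_partition([0.2, 0.9, 0.5, 0.1], kth=2) -> indices of the two highest scores ({1, 2})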
/*!
* Box API
*/
template <typename T>
inline void BBoxTransform(
const T dx,
......@@ -126,28 +118,28 @@ inline void GenerateAnchors(
}
template <typename T>
inline void GenerateGridAnchors(
inline void GetShiftedAnchors(
const int num_proposals,
const int num_anchors,
const int feat_h,
const int feat_w,
const int stride,
const int base_offset,
const T* anchors,
const int stride_offset,
const T* base_anchors,
const int64_t* indices,
T* proposals) {
T* shifted_anchors) {
T x, y;
int idx_3d, a, h, w;
int idx_range = num_anchors * feat_h * feat_w;
for (int i = 0; i < num_proposals; ++i) {
idx_3d = (int)indices[i] - base_offset;
idx_3d = (int)indices[i] - stride_offset;
if (idx_3d >= 0 && idx_3d < idx_range) {
w = idx_3d % feat_w;
h = (idx_3d / feat_w) % feat_h;
a = idx_3d / feat_w / feat_h;
x = (T)w * stride, y = (T)h * stride;
auto* A = anchors + a * 4;
auto* P = proposals + i * 5;
auto* A = base_anchors + a * 4;
auto* P = shifted_anchors + i * 5;
P[0] = x + A[0], P[1] = y + A[1];
P[2] = x + A[2], P[3] = y + A[3];
}
......@@ -155,20 +147,20 @@ inline void GenerateGridAnchors(
}
template <typename T>
inline void GenerateGridAnchors(
inline void GetShiftedAnchors(
const int num_proposals,
const int num_classes,
const int num_anchors,
const int feat_h,
const int feat_w,
const int stride,
const int base_offset,
const T* anchors,
const int stride_offset,
const T* base_anchors,
const int64_t* indices,
T* proposals) {
T* shifted_anchors) {
T x, y;
int idx_4d, a, h, w;
int lr = num_classes * base_offset;
int lr = num_classes * stride_offset;
int rr = num_classes * (num_anchors * feat_h * feat_w);
for (int i = 0; i < num_proposals; ++i) {
idx_4d = (int)indices[i] - lr;
......@@ -178,8 +170,8 @@ inline void GenerateGridAnchors(
h = (idx_4d / feat_w) % feat_h;
a = idx_4d / feat_w / feat_h;
x = (T)w * stride, y = (T)h * stride;
auto* A = anchors + a * 4;
auto* P = proposals + i * 7 + 1;
auto* A = base_anchors + a * 4;
auto* P = shifted_anchors + i * 7 + 1;
P[0] = x + A[0], P[1] = y + A[1];
P[2] = x + A[2], P[3] = y + A[3];
}
......@@ -190,22 +182,30 @@ inline void GenerateGridAnchors(
* Proposal API
*/
template <typename T, class Context>
void SelectProposals(
const int count,
const float score_thresh,
const T* input_scores,
vector<T>& output_scores,
vector<int64_t>& output_indices,
Context* ctx);
template <typename T>
void GenerateSSProposals(
void GenerateProposals_v1(
const int K,
const int num_proposals,
const float im_h,
const float im_w,
const float min_box_h,
const float min_box_w,
const T* scores,
const T* deltas,
const int64_t* indices,
T* proposals) {
// Shifted anchors in format: [K, A, 4]
int64_t index, a, k;
const float* delta;
float* proposal = proposals;
float dx, dy, d_log_w, d_log_h;
const T* delta;
T* proposal = proposals;
T dx, dy, d_log_w, d_log_h;
for (int i = 0; i < num_proposals; ++i) {
index = indices[i];
a = index / K, k = index % K;
......@@ -214,61 +214,42 @@ void GenerateSSProposals(
dy = delta[(a * 4 + 1) * K];
d_log_w = delta[(a * 4 + 2) * K];
d_log_h = delta[(a * 4 + 3) * K];
proposal[4] = FilterBoxes(
dx,
dy,
d_log_w,
d_log_h,
im_w,
im_h,
min_box_w,
min_box_h,
proposal) *
scores[index];
BBoxTransform(dx, dy, d_log_w, d_log_h, im_w, im_h, T(1), T(1), proposal);
proposal[4] = scores[index];
proposal += 5;
}
}
template <typename T>
void GenerateMSProposals(
void GenerateProposals(
const int num_candidates,
const int num_proposals,
const float im_h,
const float im_w,
const float min_box_h,
const float min_box_w,
const T* scores,
const T* deltas,
const int64_t* indices,
T* proposals) {
// Shifted anchors in format: [4, A, K]
int64_t index;
int64_t num_candidates_2x = 2 * num_candidates;
int64_t num_candidates_3x = 3 * num_candidates;
float* proposal = proposals;
float dx, dy, d_log_w, d_log_h;
T* proposal = proposals;
T dx, dy, d_log_w, d_log_h;
for (int i = 0; i < num_proposals; ++i) {
index = indices[i];
dx = deltas[index];
dy = deltas[num_candidates + index];
d_log_w = deltas[num_candidates_2x + index];
d_log_h = deltas[num_candidates_3x + index];
proposal[4] = FilterBoxes(
dx,
dy,
d_log_w,
d_log_h,
im_w,
im_h,
min_box_w,
min_box_h,
proposal) *
scores[index];
BBoxTransform(dx, dy, d_log_w, d_log_h, im_w, im_h, T(1), T(1), proposal);
proposal[4] = scores[index];
proposal += 5;
}
}
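BBoxTransform applies the standard Faster R-CNN box decoding. A NumPy sketch for a single anchor, following the same (dx, dy, d_log_w, d_log_h) parameterization and image clipping used by the removed FilterBoxes above:

import numpy as np

def decode_box(box, dx, dy, d_log_w, d_log_h, im_w, im_h):
    """Decode one (x1, y1, x2, y2) box with the standard R-CNN deltas."""
    w, h = box[2] - box[0] + 1, box[3] - box[1] + 1
    ctr_x, ctr_y = box[0] + 0.5 * w, box[1] + 0.5 * h
    pred_ctr_x, pred_ctr_y = dx * w + ctr_x, dy * h + ctr_y
    pred_w, pred_h = np.exp(d_log_w) * w, np.exp(d_log_h) * h
    x1, y1 = pred_ctr_x - 0.5 * pred_w, pred_ctr_y - 0.5 * pred_h
    x2, y2 = pred_ctr_x + 0.5 * pred_w, pred_ctr_y + 0.5 * pred_h
    return [np.clip(x1, 0, im_w - 1), np.clip(y1, 0, im_h - 1),
            np.clip(x2, 0, im_w - 1), np.clip(y2, 0, im_h - 1)]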
template <typename T>
void GenerateMCProposals(
void GenerateDetections(
const int num_proposals,
const int num_boxes,
const int num_classes,
......@@ -280,11 +261,11 @@ void GenerateMCProposals(
const T* scores,
const T* deltas,
const int64_t* indices,
T* proposals) {
T* detections) {
int64_t index, cls;
int64_t num_boxes_2x = 2 * num_boxes;
int64_t num_boxes_3x = 3 * num_boxes;
float* proposal = proposals;
T* detection = detections;
float dx, dy, d_log_w, d_log_h;
for (int i = 0; i < num_proposals; ++i) {
cls = indices[i] % num_classes;
......@@ -293,7 +274,7 @@ void GenerateMCProposals(
dy = deltas[num_boxes + index];
d_log_w = deltas[num_boxes_2x + index];
d_log_h = deltas[num_boxes_3x + index];
proposal[0] = im_idx;
detection[0] = im_idx;
BBoxTransform(
dx,
dy,
......@@ -303,10 +284,11 @@ void GenerateMCProposals(
im_h,
im_scale_h,
im_scale_w,
proposal + 1);
proposal[5] = scores[indices[i]];
proposal[6] = cls + 1;
proposal += 7;
detection + 1);
// `scores` is already gathered along `indices`, so index by position.
detection[5] = scores[i];
detection[6] = cls + 1;
detection += 7;
}
}
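Each row written here packs [im_idx, x1, y1, x2, y2, score, cls + 1]. A small NumPy sketch of unpacking such detections per image (hypothetical values):

import numpy as np

# Rows: [im_idx, x1, y1, x2, y2, score, cls]
detections = np.array([[0, 10, 10, 50, 60, 0.9, 1],
                       [1, 5, 5, 30, 40, 0.8, 3]], 'float32')
for ix in range(2):
    dets = detections[detections[:, 0] == ix]
    boxes, scores, classes = dets[:, 1:5], dets[:, 5], dets[:, 6].astype(int)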
......
......@@ -8,7 +8,6 @@
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Compile the cython extensions."""
from __future__ import absolute_import
......@@ -36,7 +35,7 @@ ext_modules = [
include_dirs=[np.get_include()]
),
Extension(
'install.lib.pycocotools._mask',
'install.lib.utils.pycocotools._mask',
['maskApi.c', '_mask.pyx'],
include_dirs=[np.get_include(), os.path.dirname(os.path.abspath(__file__))],
extra_compile_args=['-w']
......
......@@ -8,7 +8,6 @@
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Make record file for COCO dataset."""
from __future__ import absolute_import
......@@ -27,14 +26,12 @@ if __name__ == '__main__':
# Encode masks to RLE bytes
if not os.path.exists('build'):
os.makedirs('build')
os.makedirs('build')
make_mask('train', '2014', COCO_ROOT)
make_mask('valminusminival', '2014', COCO_ROOT)
make_mask('minival', '2014', COCO_ROOT)
merge_mask('trainval35k', '2014', [
'build/coco_2014_train_mask.pkl',
'build/coco_2014_valminusminival_mask.pkl']
)
merge_mask('trainval35k', '2014', ['build/coco_2014_train_mask.pkl',
'build/coco_2014_valminusminival_mask.pkl'])
# coco_2014_trainval35k
make_record(
......
......@@ -10,17 +10,13 @@
# ------------------------------------------------------------
import os
import pickle
import time
import cv2
import dragon
import numpy as np
try:
import cPickle
except:
import pickle as cPickle
def make_example(image_file, mask_objects, im_scale=None):
filename = os.path.split(image_file)[-1]
......@@ -52,6 +48,7 @@ def make_example(image_file, mask_objects, im_scale=None):
'xmax': x2,
'ymax': y2,
'mask': obj['mask'],
'polygons': obj['polygons'],
'difficult': obj.get('crowd', 0),
})
......@@ -80,7 +77,7 @@ def make_record(
if mask_file is not None:
with open(mask_file, 'rb') as f:
all_masks = cPickle.load(f)
all_masks = pickle.load(f)
else:
all_masks = {}
......@@ -101,6 +98,7 @@ def make_record(
'xmax': 'float64',
'ymax': 'float64',
'mask': 'bytes',
'polygons': [['float64']],
'difficult': 'int64',
}]
}
......@@ -111,10 +109,22 @@ def make_record(
for db_idx, split in enumerate(splits):
split_file = os.path.join(splits_path[db_idx], split + '.txt')
assert os.path.exists(split_file)
with open(split_file, 'r') as f:
lines = f.readlines()
total_line += len(lines)
if not os.path.exists(split_file):
# Fall back to a JSON-format split file if the text file is missing
split_file = os.path.join(splits_path[db_idx], split + '.json')
if not os.path.exists(split_file):
raise FileNotFoundError('Unable to find the split: ' + split)
with open(split_file, 'r') as f:
import json
images_info = json.load(f)
total_line = len(images_info['images'])
lines = []
for info in images_info['images']:
lines.append(os.path.splitext(info['file_name'])[0])
else:
with open(split_file, 'r') as f:
lines = f.readlines()
total_line += len(lines)
for line in lines:
count += 1
if count % 2000 == 0:
......@@ -123,10 +133,8 @@ def make_record(
count, total_line, now_time - start_time))
filename = line.strip()
image_file = os.path.join(images_path[db_idx], filename + ext)
mask_objects = all_masks[filename] if filename in all_masks else None
if mask_objects is None:
raise ValueError('The image({}) takes invalid mask settings.'.format(filename))
writer.write( make_example(image_file, mask_objects, im_scale))
mask_objects = all_masks[filename] if filename in all_masks else {}
writer.write(make_example(image_file, mask_objects, im_scale))
now_time = time.time()
print('{} / {} in {:.2f} sec'.format(count, total_line, now_time - start_time))
......
......@@ -9,19 +9,17 @@
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import os
import sys
import os.path as osp
from collections import OrderedDict
try:
import cPickle
except:
import pickle as cPickle
import pickle
sys.path.insert(0, '../..')
from seetadet.pycocotools.coco import COCO
from seetadet.pycocotools import mask_utils
from seetadet.utils.pycocotools import mask_utils
from seetadet.utils.pycocotools.coco import COCO
class COCOWrapper(object):
......@@ -31,7 +29,7 @@ class COCOWrapper(object):
self._data_path = osp.join(data_dir)
self.invalid_cnt = 0
self.ignore_cnt = 0
# Load COCO API, classes, class <-> id mappings
self._COCO = COCO(self._get_ann_file())
cats = self._COCO.loadCats(self._COCO.getCatIds())
......@@ -39,9 +37,8 @@ class COCOWrapper(object):
self._class_to_ind = dict(zip(self._classes, range(self.num_classes)))
self._ind_to_class = dict(zip(range(self.num_classes), self._classes))
self._class_to_cat_id = dict(zip([c['name'] for c in cats], self._COCO.getCatIds()))
self._cat_id_to_class_id = dict([(self._class_to_cat_id[cls],
self._class_to_ind[cls])
for cls in self._classes[1:]])
self._cat_id_to_class_id = dict([(self._class_to_cat_id[cls], self._class_to_ind[cls])
for cls in self._classes[1:]])
self._data_name = {
# 5k ``val2014`` subset
'minival2014': 'val2014',
......@@ -56,10 +53,10 @@ class COCOWrapper(object):
if self._image_set.find('test') == -1 \
else 'image_info'
return osp.join(
self._data_path,
self._data_path,
'annotations',
prefix + '_' +
self._image_set +
prefix + '_' +
self._image_set +
self._year + '.json'
)
......@@ -107,31 +104,32 @@ class COCOWrapper(object):
y1 = float(max(0, obj['bbox'][1]))
x2 = float(min(width - 1, x1 + max(0, obj['bbox'][2] - 1)))
y2 = float(min(height - 1, y1 + max(0, obj['bbox'][3] - 1)))
mask, polygons = b'', []
if isinstance(obj['segmentation'], list):
for p in obj['segmentation']:
if len(p) < 6:
print('Remove Invalid segm.')
# Valid polygons have >= 3 points, so require >= 6 coordinates
poly = [p for p in obj['segmentation'] if len(p) >= 6]
mask_bytes = mask_utils.poly2bytes(poly, height, width)
polygons = [p for p in obj['segmentation'] if len(p) >= 6]
# mask_bytes = mask_utils.poly2bytes(poly, height, width)
else:
# Crowd masks
# Some are encoded with a height or width
# running outside the image bounds.
# Do not use them later, or a decoding error is inevitable.
mask_bytes = mask_utils.poly2bytes(obj['segmentation'], height, width)
mask = mask_utils.poly2bytes(obj['segmentation'], height, width)
if obj['area'] > 0 and x2 > x1 and y2 > y1:
obj['clean_bbox'] = [x1, y1, x2, y2]
valid_objects.append({
'bbox': [x1, y1, x2, y2],
'mask': mask_bytes,
'mask': mask,
'polygons': polygons,
'category_id': obj['category_id'],
'class_id': self._cat_id_to_class_id[obj['category_id']],
'crowd': obj['iscrowd'],
})
valid_objects[-1]['name'] = \
self._ind_to_class[valid_objects[-1]['class_id']]
return height, width, valid_objects
@property
......@@ -150,31 +148,35 @@ def make_mask(split, year, data_dir):
if not osp.exists(osp.join(coco._data_path, 'splits')):
os.makedirs(osp.join(coco._data_path, 'splits'))
gt_recs = OrderedDict()
gt_recs = collections.OrderedDict()
for i in range(coco.num_images):
filename = (coco.image_path_at(i).split('/')[-1]).split('.')[0]
filename = osp.basename(coco.image_path_at(i)).split('.')[0]
h, w, objects = coco.annotation_at(i)
gt_recs[filename] = objects
with open(osp.join('build', 'coco_' + year + '_' + split + '_mask.pkl'), 'wb') as f:
cPickle.dump(gt_recs, f, cPickle.HIGHEST_PROTOCOL)
with open(osp.join('build',
'coco_' + year +
'_' + split + '_mask.pkl'), 'wb') as f:
pickle.dump(gt_recs, f, pickle.HIGHEST_PROTOCOL)
with open(osp.join(coco._data_path, 'splits', split + '.txt'), 'w') as f:
for i in range(coco.num_images):
filename = (coco.image_path_at(i).split('/')[-1]).split('.')[0]
filename = str(osp.basename(coco.image_path_at(i)).split('.')[0])
if i != coco.num_images - 1:
filename += '\n'
f.write(filename)
def merge_mask(split, year, mask_files):
gt_recs = OrderedDict()
gt_recs = collections.OrderedDict()
data_path = os.path.dirname(mask_files[0])
for mask_file in mask_files:
with open(mask_file, 'rb') as f:
recs = cPickle.load(f)
recs = pickle.load(f)
gt_recs.update(recs)
with open(osp.join(data_path, 'coco_' + year + '_' + split + '_mask.pkl'), 'wb') as f:
cPickle.dump(gt_recs, f, cPickle.HIGHEST_PROTOCOL)
with open(osp.join(data_path,
'coco_' + year +
'_' + split + '_mask.pkl'), 'wb') as f:
pickle.dump(gt_recs, f, pickle.HIGHEST_PROTOCOL)
......@@ -132,4 +132,3 @@ def make_record(
data_size = os.path.getsize(record_file + '/root.data') * 1e-6
print('{} images take {:.2f} MB in {:.2f} sec.'
.format(len(entries), data_size, end_time - start_time))
......@@ -8,7 +8,6 @@
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Make record file for VOC dataset."""
from __future__ import absolute_import
......@@ -29,7 +28,7 @@ if __name__ == '__main__':
annotations_path=[osp.join(voc_root, 'VOCdevkit2007/VOC2007/Annotations'),
osp.join(voc_root, 'VOCdevkit2012/VOC2012/Annotations')],
splits_path=[osp.join(voc_root, 'VOCdevkit2007/VOC2007/ImageSets/Main'),
osp.join(voc_root, 'VOCdevkit2012/VOC2012/ImageSets/Main')],
osp.join(voc_root, 'VOCdevkit2012/VOC2012/ImageSets/Main')],
splits=['trainval', 'trainval']
)
......
......@@ -8,3 +8,11 @@
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""A platform implementing popular object detection algorithms."""
from __future__ import absolute_import as _absolute_import
from __future__ import division as _division
from __future__ import print_function as _print_function
# Version
from seetadet.version import version as __version__
......@@ -8,3 +8,9 @@
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.algo.common.anchor_sampler import AnchorSampler
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.core.config import cfg
class AnchorSampler(object):
"""Sample precomputed anchors asynchronously."""
def __init__(self):
self._rpn_target = None
self._retinanet_target = None
self._ssd_target = None
if 'rcnn' in cfg.MODEL.TYPE:
from seetadet.algo.faster_rcnn import anchor_target
self._rpn_target = anchor_target.AnchorTarget()
elif cfg.MODEL.TYPE == 'retinanet':
from seetadet.algo.retinanet import anchor_target
self._retinanet_target = anchor_target.AnchorTarget()
elif cfg.MODEL.TYPE == 'ssd':
from seetadet.algo.ssd import anchor_target
self._ssd_target = anchor_target.AnchorTarget()
def __call__(self, **inputs):
"""Return the sample anchors."""
if self._rpn_target:
fg_inds, bg_inds = \
self._rpn_target.sample_anchors(
gt_boxes=inputs['gt_boxes'],
im_info=inputs['im_info'],
)
return {'fg_inds': fg_inds, 'bg_inds': bg_inds}
if self._retinanet_target:
fg_inds, ignore_inds = \
self._retinanet_target.sample_anchors(
gt_boxes=inputs['gt_boxes'],
im_info=inputs['im_info'],
)
return {'fg_inds': fg_inds, 'bg_inds': ignore_inds}
if self._ssd_target:
fg_inds, neg_inds = \
self._ssd_target.sample_anchors(
gt_boxes=inputs['gt_boxes'],
)
return {'fg_inds': fg_inds, 'bg_inds': neg_inds}
return {}
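A minimal usage sketch of the sampler, assuming an already-loaded configuration with a valid MODEL.TYPE; the gt_boxes layout ([x1, y1, x2, y2, cls]) and the (h, w, scale) im_info follow the data transformers below:

import numpy as np
from seetadet.algo.common import AnchorSampler

sampler = AnchorSampler()
gt_boxes = np.array([[10., 20., 120., 160., 1.]], 'float32')
outputs = sampler(gt_boxes=gt_boxes, im_info=(600, 800, 1.0))
# -> {} when no target is configured, otherwise {'fg_inds': ..., 'bg_inds': ...}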
......@@ -17,7 +17,3 @@ from seetadet.algo.faster_rcnn.anchor_target import AnchorTarget
from seetadet.algo.faster_rcnn.data_loader import DataLoader
from seetadet.algo.faster_rcnn.proposal import Proposal
from seetadet.algo.faster_rcnn.proposal_target import ProposalTarget
from seetadet.algo.faster_rcnn.utils import generate_grid_anchors
from seetadet.algo.faster_rcnn.utils import map_blobs_by_levels
from seetadet.algo.faster_rcnn.utils import map_rois_to_levels
from seetadet.algo.faster_rcnn.utils import map_returns_to_blobs
......@@ -13,8 +13,11 @@ from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import multiprocessing as mp
import time
import threading
import queue
import dragon
import dragon.vm.torch as torch
......@@ -23,8 +26,8 @@ import numpy as np
from seetadet.algo.faster_rcnn import data_transformer
from seetadet.core.config import cfg
from seetadet.datasets.factory import get_dataset
from seetadet.utils import blob as blob_util
from seetadet.utils import logger
from seetadet.utils.blob import im_list_to_blob
class DataLoader(object):
......@@ -33,28 +36,24 @@ class DataLoader(object):
def __init__(self):
super(DataLoader, self).__init__()
dataset = get_dataset(cfg.TRAIN.DATASET)
if cfg.USE_DALI:
from seetadet.dali import rcnn_pipeline as pipe
self.iterator = pipe.new_iterator(dataset.source)
else:
self.iterator = Iterator(**{
'dataset': dataset.cls,
'source': dataset.source,
'classes': dataset.classes,
'shuffle': cfg.TRAIN.USE_SHUFFLE,
'num_chunks': cfg.TRAIN.SHUFFLE_CHUNKS,
'batch_size': cfg.TRAIN.IMS_PER_BATCH * 2,
'num_transformers': cfg.TRAIN.NUM_THREADS - 1,
})
self.iterator = Iterator(**{
'dataset': dataset.cls,
'source': dataset.source,
'classes': dataset.classes,
'shuffle': cfg.TRAIN.USE_SHUFFLE,
'batch_size': cfg.TRAIN.IMS_PER_BATCH * 2,
'num_transformers': cfg.TRAIN.NUM_THREADS - 1,
})
self.iterator.start()
def __call__(self):
outputs = self.iterator.next()
if isinstance(outputs['data'], np.ndarray):
outputs['data'] = torch.from_numpy(outputs['data'])
if isinstance(outputs['image'], np.ndarray):
outputs['image'] = torch.from_numpy(outputs['image'])
return outputs
class Iterator(mp.Process):
class Iterator(threading.Thread):
"""Iterator to return the batch of data."""
def __init__(self, **kwargs):
......@@ -68,17 +67,16 @@ class Iterator(mp.Process):
rank = dragon.distributed.get_rank(process_group)
# Configuration
self._prefetch = kwargs.get('prefetch', 5)
self._batch_size = kwargs.get('batch_size', 2)
self._num_readers = kwargs.get('num_readers', 1)
self._num_transformers = kwargs.get('num_transformers', 3)
self.daemon = True
# Initialize queues
num_batches = self._prefetch * self._num_readers
self.q_in = mp.Queue(num_batches * self._batch_size)
self.q1_out = mp.Queue(num_batches * self._batch_size)
self.q2_out = mp.Queue(num_batches * self._batch_size)
num_batches = self._num_readers
self._queue1 = mp.Queue(num_batches * self._batch_size)
self._queue2 = mp.Queue(num_batches * self._batch_size)
self._queue3 = queue.Queue(num_batches)
# Initialize readers
self._readers = []
......@@ -89,7 +87,7 @@ class Iterator(mp.Process):
self._readers.append(dragon.io.DataReader(
part_idx=part_idx, num_parts=num_parts, **kwargs))
self._readers[i]._seed += part_idx
self._readers[i].q_out = self.q_in
self._readers[i].q_out = self._queue1
self._readers[i].start()
time.sleep(0.1)
......@@ -98,8 +96,7 @@ class Iterator(mp.Process):
for i in range(self._num_transformers):
p = data_transformer.DataTransformer(**kwargs)
p._seed += (i + rank * self._num_transformers)
p.q_in = self.q_in
p.q1_out, p.q2_out = self.q1_out, self.q2_out
p.q_in, p.q_out = self._queue1, self._queue2
p.start()
self._transformers.append(p)
time.sleep(0.1)
......@@ -122,35 +119,43 @@ class Iterator(mp.Process):
"""Return the next batch of data."""
return self.__next__()
def run(self):
"""Main loop."""
num_images = cfg.TRAIN.IMS_PER_BATCH
num_batches = cfg.TRAIN.ASPECT_GROUPING
logger.info('Initializing prefetch batches...')
example_buffer = [self._queue2.get()
for _ in range(num_images * num_batches)]
next_examples = []
while True:
# Use cached buffer for next N examples
# Examples are sorted to simulate aspect grouping
if len(next_examples) == 0:
next_examples = example_buffer
next_examples.sort(key=lambda d: d['aspect_ratio'])
example_buffer = []
# Prepare the next batch
outputs = collections.defaultdict(list)
for i in range(num_images):
example = next_examples.pop(0)
outputs['image'].append(example['image'])
outputs['gt_boxes'].append(example['boxes'])
outputs['im_info'].append(example['im_info'])
outputs['fg_inds'].append(example.get('fg_inds', None))
outputs['bg_inds'].append(example.get('bg_inds', None))
example_buffer.append(self._queue2.get())
outputs['image'] = blob_util.im_list_to_blob(
outputs['image'], coarsest_stride=cfg.MODEL.COARSEST_STRIDE)
# Send batch data to consumer
self._queue3.put(outputs)
def __iter__(self):
"""Return the iterator self."""
return self
def __next__(self):
"""Return the next batch of data."""
q_out = None
# Two queues to implement aspect-grouping
# This is necessary to reduce the gpu memory
# from fetching a huge square batch blob
while q_out is None:
if self.q1_out.qsize() >= cfg.TRAIN.IMS_PER_BATCH:
q_out = self.q1_out
elif self.q2_out.qsize() >= cfg.TRAIN.IMS_PER_BATCH:
q_out = self.q2_out
self.q1_out, self.q2_out = self.q2_out, self.q1_out
images, images_info, boxes_to_pack = [], [], []
for i in range(cfg.TRAIN.IMS_PER_BATCH):
image, image_scale, boxes = q_out.get()
images.append(image)
images_info.append(list(image.shape[:2]) + [image_scale])
gt_boxes = np.zeros((boxes.shape[0], boxes.shape[1] + 1), 'float32')
gt_boxes[:, :boxes.shape[1]], gt_boxes[:, -1] = boxes, i
boxes_to_pack.append(gt_boxes)
return {
'data': im_list_to_blob(images),
'ims_info': np.array(images_info, dtype=np.float32),
'gt_boxes': np.concatenate(boxes_to_pack),
}
return self._queue3.get()
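The run() loop above replaces the old two-queue scheme: it keeps a buffer of IMS_PER_BATCH * ASPECT_GROUPING transformed examples, sorts it by aspect ratio, and pops images in order, so each padded batch blob stays close to its images' shapes. A standalone sketch of that grouping idea (hypothetical example dicts):

def group_by_aspect(examples, ims_per_batch):
    """Sort once, then slice into batches of similar aspect ratios."""
    examples = sorted(examples, key=lambda d: d['aspect_ratio'])
    return [examples[i:i + ims_per_batch]
            for i in range(0, len(examples), ims_per_batch)]

# 4 examples with ratios [1.5, 0.7, 0.8, 1.4] and ims_per_batch=2
# -> batches with ratios [0.7, 0.8] and [1.4, 1.5]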
......@@ -15,109 +15,122 @@ from __future__ import print_function
import multiprocessing
import cv2
import numpy as np
import numpy.random as npr
from seetadet.algo import common as algo_common
from seetadet.core.config import cfg
from seetadet.datasets.example import Example
from seetadet.utils import boxes as box_util
from seetadet.utils.blob import prep_im_for_blob
from seetadet.utils import image as image_util
class DataTransformer(multiprocessing.Process):
def __init__(self, **kwargs):
super(DataTransformer, self).__init__()
self._scales = cfg.TRAIN.SCALES
self._random_scales = cfg.TRAIN.RANDOM_SCALES
self._max_size = cfg.TRAIN.MAX_SIZE
self._seed = cfg.RNG_SEED
self._use_flipped = cfg.TRAIN.USE_FLIPPED
self._use_diff = cfg.TRAIN.USE_DIFF
self._use_flipped = cfg.TRAIN.USE_FLIPPED
self._use_distort = cfg.TRAIN.USE_COLOR_JITTER
self._classes = kwargs.get('classes', ('__background__',))
self._num_classes = len(self._classes)
self._class_to_ind = dict(zip(self._classes, range(self._num_classes)))
self.q_in = self.q1_out = self.q2_out = None
self._anchor_sampler = algo_common.AnchorSampler()
self.q_in = self.q_out = None
self.daemon = True
def make_roi_dict(self, example, im_scale, apply_flip=False):
objects, n_objects = example.objects, 0
def get_boxes(self, example, im_scale):
objects, num_objects = example.objects, 0
height, width = example.height, example.width
if not self._use_diff:
for obj in objects:
if obj.get('difficult', 0) == 0:
n_objects += 1
num_objects += 1
else:
n_objects = len(objects)
num_objects = len(objects)
roi_dict = {
'boxes': np.zeros((n_objects, 4), 'float32'),
'gt_classes': np.zeros((n_objects,), 'int32'),
}
boxes = np.zeros((num_objects, 4), 'float32')
gt_classes = np.zeros((num_objects,), 'float32')
# Filter the difficult instances
object_idx = 0
for obj in objects:
if not self._use_diff and \
obj.get('difficult', 0) > 0:
if not self._use_diff and obj.get('difficult', 0) > 0:
continue
bbox = obj['bbox']
roi_dict['boxes'][object_idx, :] = [
max(0, bbox[0]),
max(0, bbox[1]),
min(bbox[2], width - 1),
min(bbox[3], height - 1),
]
roi_dict['gt_classes'][object_idx] = \
self._class_to_ind[obj['name']]
boxes[object_idx, :] = [max(0, bbox[0]),
max(0, bbox[1]),
min(bbox[2], width - 1),
min(bbox[3], height - 1)]
gt_classes[object_idx] = self._class_to_ind[obj['name']]
object_idx += 1
# Flip the boxes if necessary
if apply_flip:
roi_dict['boxes'] = \
box_util.flip_boxes(
roi_dict['boxes'],
width,
)
# Scale the boxes to the detecting scale
roi_dict['boxes'] *= im_scale
boxes *= im_scale
# Attach the classes
gt_boxes = np.empty((num_objects, 5), dtype=np.float32)
gt_boxes[:, :4], gt_boxes[:, 4] = boxes, gt_classes
return roi_dict
return gt_boxes
def get(self, example):
example = Example(example)
img = example.image
# Scale
target_size = self._scales[np.random.randint(len(self._scales))]
img, im_scale = prep_im_for_blob(img, target_size, self._max_size)
# Resize
img, im_scale = image_util.resize_image_with_target_size(
example.image,
target_size=npr.choice(self._scales),
max_size=self._max_size,
random_scales=self._random_scales,
)
# Flip
apply_flip = False
if self._use_flipped:
if np.random.randint(2) > 0:
img = img[:, ::-1]
apply_flip = True
flipped = False
if self._use_flipped and npr.randint(2) > 0:
img = img[:, ::-1]
flipped = True
# Distort
if self._use_distort:
img = image_util.distort_image(img)
# Boxes
boxes = self.get_boxes(example, im_scale)
# Flip the boxes if necessary
if flipped:
boxes = box_util.flip_boxes(boxes, img.shape[1])
# Example -> RoIDict
roi_dict = self.make_roi_dict(example, im_scale, apply_flip)
# Standard outputs.
outputs = {'image': img,
'boxes': boxes,
'im_info': img.shape[:2] + (im_scale,)}
# Post-Process for gt boxes
# Shape like: [num_objects, {x1, y1, x2, y2, cls}]
gt_boxes = np.empty((len(roi_dict['gt_classes']), 5), dtype=np.float32)
gt_boxes[:, :4], gt_boxes[:, 4] = roi_dict['boxes'], roi_dict['gt_classes']
# Attach precomputed targets.
if len(boxes) > 0:
outputs.update(
self._anchor_sampler(
gt_boxes=boxes,
im_info=outputs['im_info']))
return img, im_scale, gt_boxes
return outputs
def run(self):
# Fix the process-local random seed
# Disable the opencv threading.
cv2.setNumThreads(1)
# Fix the process-local random seed.
np.random.seed(self._seed)
# Main prefetch loop
while True:
outputs = self.get(self.q_in.get())
if len(outputs[2]) < 1:
continue # Ignore the non-object image
aspect_ratio = float(outputs[0].shape[0]) / outputs[0].shape[1]
if aspect_ratio > 1.:
self.q1_out.put(outputs)
else:
self.q2_out.put(outputs)
if len(outputs['boxes']) < 1:
continue # Ignore non-object image.
height, width = outputs['image'].shape[:2]
outputs['aspect_ratio'] = float(height) / float(width)
self.q_out.put(outputs)
......@@ -17,8 +17,8 @@ import collections
import numpy as np
from seetadet.algo.faster_rcnn.generate_anchors import generate_anchors
from seetadet.algo.faster_rcnn.utils import generate_grid_anchors
from seetadet.algo.faster_rcnn import generate_anchors as anchor_util
from seetadet.algo.faster_rcnn import utils as rcnn_util
from seetadet.core.config import cfg
from seetadet.utils import boxes as box_util
from seetadet.utils import nms
......@@ -29,59 +29,50 @@ class Proposal(object):
def __init__(self):
super(Proposal, self).__init__()
# Load the basic configs
# Load basic configs
self.scales = cfg.RPN.SCALES
self.strides = cfg.RPN.STRIDES
self.ratios = cfg.RPN.ASPECT_RATIOS
self.num_strides = len(self.strides)
self.defaults = collections.OrderedDict([
('rois', np.array([[-1, 0, 0, 1, 1]], 'float32')),
])
('rois', np.array([[-1, 0, 0, 1, 1]], 'float32'))])
self.bbox_transform_clip = \
np.log(cfg.TRAIN.MAX_SIZE / min(self.strides))
# Generate base anchors
self.base_anchors = []
for i in range(self.num_strides):
self.base_anchors.append(
generate_anchors(
anchor_util.generate_anchors(
self.strides[i],
self.ratios,
np.array([self.scales[i]])
if self.num_strides > 1
else np.array(self.scales)
)
)
else np.array(self.scales)))
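A small sanity sketch of the base-anchor setup, assuming the classic generate_anchors behavior of one (x1, y1, x2, y2) anchor per ratio/scale pair; the FPN-style branch passes a single scale per stride:

import numpy as np
from seetadet.algo.faster_rcnn import generate_anchors as anchor_util

strides, scales, ratios = (8, 16, 32), (8, 8, 8), (0.5, 1.0, 2.0)  # hypothetical config
base_anchors = [anchor_util.generate_anchors(s, ratios, np.array([scales[i]]))
                for i, s in enumerate(strides)]
# Each entry is expected to be a (len(ratios) * 1, 4) array.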
def __call__(self, features, cls_prob, bbox_pred, ims_info):
def __call__(self, **inputs):
num_images = cfg.TRAIN.IMS_PER_BATCH
pre_nms_top_n = cfg.TRAIN.RPN_PRE_NMS_TOP_N
post_nms_top_n = cfg.TRAIN.RPN_POST_NMS_TOP_N
nms_thresh = cfg.TRAIN.RPN_NMS_THRESH
min_size = cfg.TRAIN.RPN_MIN_SIZE
# Get resources
num_images = ims_info.shape[0]
grid_shapes = [f.shape[-2:] for f in features]
all_anchors = generate_grid_anchors(
grid_shapes, self.base_anchors, self.strides)
shapes = [f.shape[-2:] for f in inputs['features']]
all_anchors = rcnn_util.get_shifted_anchors(
shapes, self.base_anchors, self.strides)
# Prepare for the outputs
batch_rois = []
cls_prob = cls_prob.numpy()
bbox_pred = bbox_pred.numpy()
if self.num_strides > 1:
# (?, 4, A * K) -> (?, A * K, 4)
bbox_pred = bbox_pred.transpose((0, 2, 1))
else:
# (?, A * 4, H, W) -> (?, H, W, A * 4)
cls_prob = cls_prob.transpose((0, 2, 3, 1))
bbox_pred = bbox_pred.transpose((0, 2, 3, 1))
cls_prob = inputs['cls_prob'].numpy()
# (?, 4, A * K) -> (?, A * K, 4)
bbox_pred = inputs['bbox_pred'].numpy()
bbox_pred = bbox_pred.transpose((0, 2, 1))
# Extract RoIs separately
for ix in range(num_images):
# [?, N] -> [? * N, 1]
scores = cls_prob[ix].reshape((-1, 1))
if self.num_strides > 1:
deltas = bbox_pred[ix]
else:
deltas = bbox_pred[ix].reshape((-1, 4))
deltas = bbox_pred[ix]
im_info = inputs['im_info'][ix]
if pre_nms_top_n <= 0 or pre_nms_top_n >= len(scores):
order = np.argsort(-scores.squeeze())
......@@ -97,15 +88,11 @@ class Proposal(object):
scores = scores[order]
# Convert anchors into proposals via bbox transformations
proposals = box_util.bbox_transform_inv(anchors, deltas)
proposals = box_util.bbox_transform_inv(
anchors, deltas, clip=self.bbox_transform_clip)
# Clip predicted boxes to image
proposals = box_util.clip_tiled_boxes(proposals, ims_info[ix, :2])
# Remove predicted boxes with either height or width < threshold
keep = box_util.filter_boxes(proposals, min_size * ims_info[ix, 2])
proposals = proposals[keep, :]
scores = scores[keep]
proposals = box_util.clip_tiled_boxes(proposals, im_info[:2])
# Apply nms (e.g. threshold = 0.7)
# Take after_nms_topN (e.g. 300)
......
......@@ -30,19 +30,17 @@ class ProposalTarget(object):
def __init__(self):
super(ProposalTarget, self).__init__()
self.num_strides = len(cfg.RPN.STRIDES)
self.num_classes = cfg.MODEL.NUM_CLASSES
self.num_classes = len(cfg.MODEL.CLASSES)
self.defaults = collections.OrderedDict([
('rois', np.array([[-1, 0, 0, 1, 1]], 'float32')),
('labels', np.array([-1], 'int64')),
('bbox_targets', np.zeros((1, 4), 'float32')),
])
def __call__(self, rpn_rois, gt_boxes):
def __call__(self, **inputs):
num_images = cfg.TRAIN.IMS_PER_BATCH
# Proposal ROIs (0, x1, y1, x2, y2) coming from RPN
all_rois = rpn_rois
# GT boxes (x1, y1, x2, y2, label)
gt_boxes_wide = box_util.dismantle_boxes(gt_boxes, num_images)
all_rois = inputs['rois']
# Prepare for the outputs
keys = self.defaults.keys()
......@@ -50,22 +48,22 @@ class ProposalTarget(object):
# Generate targets separately
for ix in range(num_images):
gt_boxes = gt_boxes_wide[ix]
# GT boxes (x1, y1, x2, y2, label)
gt_boxes = inputs['gt_boxes'][ix]
# Extract proposals for this image
rois = all_rois[np.where(all_rois[:, 0].astype('int32') == ix)[0]]
# Include ground-truth boxes in the set of candidate rois
inds = np.ones((gt_boxes.shape[0], 1), gt_boxes.dtype) * ix
rois = np.vstack((rois, np.hstack((inds, gt_boxes[:, :4]))))
# Sample a batch of RoIs for training
rois_per_image = cfg.TRAIN.BATCH_SIZE
fg_rois_per_image = np.round(cfg.TRAIN.FG_FRACTION * rois_per_image)
rois_per_image = cfg.FRCNN.BATCH_SIZE
fg_rois_per_image = np.round(cfg.FRCNN.FG_FRACTION * rois_per_image)
rcnn_util.map_returns_to_blobs(
sample_rois(
rois,
gt_boxes,
rois_per_image,
fg_rois_per_image,
), blobs, keys,
sample_rois(rois,
gt_boxes,
rois_per_image,
fg_rois_per_image),
blobs, keys,
)
# Stack into continuous blobs
......@@ -95,7 +93,7 @@ class ProposalTarget(object):
return {
'rois': [new_tensor(rois) for rois in rois_wide],
'labels': new_tensor(blobs['labels']),
'bbox_indices': new_tensor(cls_inds[fg_inds] + blobs['labels'][fg_inds]),
'bbox_inds': new_tensor(cls_inds[fg_inds] + blobs['labels'][fg_inds]),
'bbox_targets': new_tensor(blobs['bbox_targets'][fg_inds].astype('float32')),
'bbox_anchors': new_tensor(blobs['rois'][fg_inds, 1:].astype('float32')),
}
......@@ -108,8 +106,8 @@ def sample_rois(all_rois, gt_boxes, num_rois, num_fg_rois):
max_overlaps = overlaps.max(axis=1)
labels = gt_boxes[gt_assignment, 4].astype('int64')
# Select foreground RoIs as those with >= FG_THRESH overlap
fg_thresh = cfg.TRAIN.FG_THRESH
# Select foreground RoIs as those with >= POSITIVE_OVERLAP
fg_thresh = cfg.FRCNN.POSITIVE_OVERLAP
fg_inds = np.where(max_overlaps >= fg_thresh)[0]
while fg_inds.size == 0:
fg_thresh -= 0.01
......@@ -119,9 +117,10 @@ def sample_rois(all_rois, gt_boxes, num_rois, num_fg_rois):
fg_rois_per_this_image = int(min(num_fg_rois, fg_inds.size))
fg_inds = npr.choice(fg_inds, fg_rois_per_this_image, False)
# Select background RoIs as those within [BG_THRESH_LO, BG_THRESH_HI)
bg_inds = np.where((max_overlaps < cfg.TRAIN.BG_THRESH_HI) &
(max_overlaps >= cfg.TRAIN.BG_THRESH_LO))[0]
# Select background RoIs as those within
# [NEGATIVE_OVERLAP_LO, NEGATIVE_OVERLAP_HI)
bg_inds = np.where((max_overlaps < cfg.FRCNN.NEGATIVE_OVERLAP_HI) &
(max_overlaps >= cfg.FRCNN.NEGATIVE_OVERLAP_LO))[0]
# Compute number of background RoIs to take from this image
bg_rois_per_this_image = num_rois - fg_rois_per_this_image
bg_rois_per_this_image = min(bg_rois_per_this_image, bg_inds.size)
......@@ -129,7 +128,7 @@ def sample_rois(all_rois, gt_boxes, num_rois, num_fg_rois):
if bg_inds.size > 0:
bg_inds = npr.choice(bg_inds, bg_rois_per_this_image, False)
# The indices that we're selecting (both fg and bg)
# The selecting indices (both fg and bg)
keep_inds = np.append(fg_inds, bg_inds)
# Select sampled values from various arrays
rois, labels = all_rois[keep_inds], labels[keep_inds]
......@@ -137,12 +136,9 @@ def sample_rois(all_rois, gt_boxes, num_rois, num_fg_rois):
labels[fg_rois_per_this_image:] = 0
# Compute the target from RoIs
return [
rois,
labels,
box_util.bbox_transform(
rois[:, 1:5],
gt_boxes[gt_assignment[keep_inds], :4],
cfg.BBOX_REG_WEIGHTS,
)
]
outputs = [rois, labels]
outputs += [box_util.bbox_transform(
rois[:, 1:5],
gt_boxes[gt_assignment[keep_inds], :4],
cfg.BBOX_REG_WEIGHTS)]
return outputs
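For context, a NumPy sketch of the standard R-CNN target encoding that box_util.bbox_transform is assumed to implement (the inverse of the proposal decoding), for a single RoI/GT pair with unit regression weights:

import numpy as np

def encode_box(ex, gt):
    """Encode one GT box against one RoI as (dx, dy, d_log_w, d_log_h)."""
    ex_w, ex_h = ex[2] - ex[0] + 1.0, ex[3] - ex[1] + 1.0
    gt_w, gt_h = gt[2] - gt[0] + 1.0, gt[3] - gt[1] + 1.0
    ex_ctr_x, ex_ctr_y = ex[0] + 0.5 * ex_w, ex[1] + 0.5 * ex_h
    gt_ctr_x, gt_ctr_y = gt[0] + 0.5 * gt_w, gt[1] + 0.5 * gt_h
    return np.array([(gt_ctr_x - ex_ctr_x) / ex_w,
                     (gt_ctr_y - ex_ctr_y) / ex_h,
                     np.log(gt_w / ex_w),
                     np.log(gt_h / ex_h)], 'float32')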
......@@ -20,97 +20,131 @@ import numpy as np
from seetadet.core.config import cfg
from seetadet.modeling.detector import new_detector
from seetadet.utils import blob as blob_util
from seetadet.utils import boxes as box_util
from seetadet.utils import image as image_util
from seetadet.utils import logger
from seetadet.utils import nms as nms_util
from seetadet.utils import time_util
from seetadet.utils.blob import im_list_to_blob
from seetadet.utils.image import scale_image
def im_detect(detector, raw_image):
"""Detect a image, with single or multiple scales."""
ims, ims_scale = scale_image(raw_image)
# Prepare blobs
data = im_list_to_blob(ims)
ims_info = np.array([list(data.shape[1:3]) + [im_scale]
for im_scale in ims_scale], dtype=np.float32)
# Do Forward
data = torch.from_numpy(data)
ims_info = torch.from_numpy(ims_info)
def get_data(raw_images):
"""Return the test data."""
max_size = cfg.TEST.MAX_SIZE
images_wide = []
image_shapes_wide, image_scales_wide = [], []
for img in raw_images:
images, image_scales = image_util.scale_image(
img, scales=cfg.TEST.SCALES, max_size=max_size)
images_wide += images
image_scales_wide += image_scales
image_shapes_wide += [img.shape[:2] for img in images]
images = blob_util.im_list_to_blob(
images_wide, coarsest_stride=cfg.MODEL.COARSEST_STRIDE)
image_shapes = np.array(image_shapes_wide)
image_scales = np.array(image_scales_wide).reshape((len(images), -1))
images_info = np.hstack([image_shapes, image_scales]).astype('float32')
return images, images_info
def ims_detect(detector, raw_images, timer=None):
"""Detect images at single or multiple scales."""
images, images_info = get_data(raw_images)
if timer:
timer.tic()
# Do forward
inputs = {'image': torch.from_numpy(images),
'im_info': torch.from_numpy(images_info)}
if not hasattr(detector, 'script_forward'):
def script_forward(self, data, ims_info):
return self.forward({'data': data, 'ims_info': ims_info})
def script_forward(self, image, im_info):
return self.forward({'image': image, 'im_info': im_info})
detector.script_forward = torch.jit.trace(
func=types.MethodType(script_forward, detector),
example_inputs=[data, ims_info],
example_inputs=[inputs['image'], inputs['im_info']],
)
outputs = detector.script_forward(data, ims_info)
outputs = detector.script_forward(inputs['image'], inputs['im_info'])
outputs = dict((k, outputs[k].numpy()) for k in outputs.keys())
# Decode results
all_scores, all_boxes = [], []
pred_boxes = box_util.bbox_transform_inv(
batch_pred = box_util.bbox_transform_inv(
outputs['rois'][:, 1:5],
outputs['bbox_pred'],
cfg.BBOX_REG_WEIGHTS,
)
for i in range(len(ims)):
cfg.BBOX_REG_WEIGHTS)
results = [([], []) for _ in range(len(raw_images))]
for i in range(len(images)):
ii = i // len(cfg.TEST.SCALES)
inds = np.where(outputs['rois'][:, 0].astype(np.int32) == i)[0]
boxes = pred_boxes[inds] / ims_scale[i]
all_scores.append(outputs['cls_prob'][inds])
all_boxes.append(box_util.clip_tiled_boxes(boxes, raw_image.shape))
return np.vstack(all_scores), np.vstack(all_boxes)
def test_net(weights, num_classes, q_in, q_out, device):
num_classes, cfg.GPU_ID = num_classes, device
boxes = batch_pred[inds] / images_info[i][2]
boxes = box_util.clip_tiled_boxes(boxes, raw_images[ii].shape)
results[ii][0].append(outputs['cls_prob'][inds])
results[ii][1].append(boxes)
# Merge from multiple scales
ret = [(np.vstack(s), np.vstack(b)) for s, b in results]
if timer:
timer.toc()
return ret
def test_net(weights, q_in, q_out, device, root_logger=True):
"""Test a network trained with Faster R-CNN algorithm."""
cfg.GPU_ID = device
num_classes = len(cfg.MODEL.CLASSES)
logger.set_root_logger(root_logger)
detector = new_detector(device, weights)
_t = time_util.new_timers('im_detect', 'misc')
must_stop = False
timers = time_util.new_timers('im_detect_bbox', 'misc')
empty_detections = np.zeros((0, 5), 'float32')
while True:
i, raw_image = q_in.get()
if i < 0:
if must_stop:
break
boxes_this_image = [[]]
with _t['im_detect'].tic_and_toc():
scores, boxes = im_detect(detector, raw_image)
_t['misc'].tic()
for j in range(1, num_classes):
inds = np.where(scores[:, j] > cfg.TEST.SCORE_THRESH)[0]
cls_scores = scores[inds, j]
cls_boxes = boxes[inds, j * 4:(j + 1) * 4]
cls_detections = np.hstack(
(cls_boxes, cls_scores[:, np.newaxis])
).astype(np.float32, copy=False)
if cfg.TEST.USE_SOFT_NMS:
keep = nms_util.soft_nms(
cls_detections,
thresh=cfg.TEST.NMS,
method=cfg.TEST.SOFT_NMS_METHOD,
sigma=cfg.TEST.SOFT_NMS_SIGMA,
)
else:
keep = nms_util.nms(
cls_detections,
thresh=cfg.TEST.NMS,
)
cls_detections = cls_detections[keep, :]
boxes_this_image.append(cls_detections)
_t['misc'].toc()
q_out.put((
i,
dict([('im_detect', _t['im_detect'].average_time),
('misc', _t['misc'].average_time)]),
dict([('boxes', boxes_this_image)]),
))
indices, raw_images = [], []
for _ in range(cfg.TEST.IMS_PER_BATCH):
i, raw_image = q_in.get()
if i < 0:
must_stop = True
break
indices.append(i)
raw_images.append(raw_image)
if len(raw_images) == 0:
continue
results = ims_detect(detector, raw_images, timers['im_detect_bbox'])
for i, (scores, boxes) in enumerate(results):
timers['misc'].tic()
boxes_this_image = [[]]
for j in range(1, num_classes):
inds = np.where(scores[:, j] > cfg.TEST.SCORE_THRESH)[0]
if len(inds) == 0:
boxes_this_image.append(empty_detections)
continue
cls_scores = scores[inds, j]
cls_boxes = boxes[inds, j * 4:(j + 1) * 4]
cls_detections = np.hstack(
(cls_boxes, cls_scores[:, np.newaxis])) \
.astype(np.float32, copy=False)
if cfg.TEST.USE_SOFT_NMS:
keep = nms_util.soft_nms(
cls_detections,
thresh=cfg.TEST.NMS,
method=cfg.TEST.SOFT_NMS_METHOD,
sigma=cfg.TEST.SOFT_NMS_SIGMA,
)
else:
keep = nms_util.nms(
cls_detections,
thresh=cfg.TEST.NMS,
)
cls_detections = cls_detections[keep, :]
boxes_this_image.append(cls_detections)
timers['misc'].toc()
q_out.put((
indices[i],
dict([('im_detect', timers['im_detect_bbox'].average_time),
('misc', timers['misc'].average_time)]),
dict([('boxes', boxes_this_image)]),
))
......@@ -19,43 +19,78 @@ import numpy as np
from seetadet.core.config import cfg
def generate_grid_anchors(grid_shapes, base_anchors, strides):
num_strides = len(strides)
if len(grid_shapes) != num_strides:
raise ValueError(
'Given %d grids for %d strides.'
% (len(grid_shapes), num_strides)
)
# Generate proposals from shifted anchors
def get_shifted_coords(shapes, base_anchors):
"""Return the x-y coordinates of shifted anchors."""
xs, ys = [], []
for i in range(len(shapes)):
height, width = shapes[i]
x, y = np.arange(0, width), np.arange(0, height)
x, y = np.meshgrid(x, y)
# Tile the K cell coordinates once per anchor (A times)
# to get (A * K,) shifted coordinates
xs.append(np.tile(x.flatten(), base_anchors[i].shape[0]))
ys.append(np.tile(y.flatten(), base_anchors[i].shape[0]))
return np.concatenate(xs), np.concatenate(ys)
def get_shifted_anchors(shapes, base_anchors, strides):
"""Return the shifted anchors on given shapes."""
anchors_to_pack = []
for i in range(len(grid_shapes)):
height, width = grid_shapes[i]
for i in range(len(shapes)):
height, width = shapes[i]
shift_x = np.arange(0, width) * strides[i]
shift_y = np.arange(0, height) * strides[i]
shift_x, shift_y = np.meshgrid(shift_x, shift_y)
shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
shift_x.ravel(), shift_y.ravel())).transpose()
# Add a anchors (1, a, 4) to
# cell k shifts (k, 1, 4) to get
# shift anchors (k, a, 4)
# Reshape to (k * a, 4) shifted anchors
# Add A anchors (A, 1, 4) to cell K shifts (1, K, 4)
# to get shift anchors (A, K, 4)
a = base_anchors[i].shape[0]
k = shifts.shape[0]
anchors = (base_anchors[i].reshape((1, a, 4)) +
shifts.reshape((1, k, 4)).transpose((1, 0, 2)))
if num_strides > 1:
# Transpose from (K, A, 4) to (A, K, 4)
# We will pack it with other strides to
# match the data format of (N, C, H, W)
anchors = anchors.transpose((1, 0, 2))
anchors = anchors.reshape((a * k, 4))
anchors_to_pack.append(anchors)
else:
# Original order of Faster R-CNN
return anchors.reshape((k * a, 4))
anchors = (base_anchors[i].reshape((a, 1, 4)) +
shifts.reshape((1, k, 4)))
anchors_to_pack.append(anchors.reshape((a * k, 4)))
return np.vstack(anchors_to_pack)
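A tiny worked example of the shifting above, assuming one base anchor, a 2x2 feature map, and stride 16:

import numpy as np

base = np.array([[-8., -8., 8., 8.]])            # one base anchor (A = 1)
shifted = get_shifted_anchors([(2, 2)], [base], [16])
# shifted.shape == (4, 4); rows are the base anchor moved to
# (0, 0), (16, 0), (0, 16) and (16, 16).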
def narrow_anchors(
all_coords,
base_anchors,
max_shapes,
shapes,
inds,
remapping=None,
):
"""Return the valid shifted anchors on given shapes."""
x_coords, y_coords = all_coords
inds_wide, remapping_wide = [], []
offset = num = 0
for i in range(len(max_shapes)):
num += base_anchors[i].shape[0] * np.prod(max_shapes[i])
inds_inside = np.where((inds >= offset) & (inds < num))[0]
inds_wide.append(inds[inds_inside])
if remapping is not None:
remapping_wide.append(remapping[inds_inside])
offset = num
offset1 = offset2 = num1 = num2 = 0
for i in range(len(max_shapes)):
num1 += base_anchors[i].shape[0] * np.prod(max_shapes[i])
num2 += base_anchors[i].shape[0] * np.prod(shapes[i])
inds = inds_wide[i]
x, y = x_coords[inds], y_coords[inds]
a = ((inds - offset1) // max_shapes[i][1]) // max_shapes[i][0]
inds = (a * shapes[i][0] + y) * shapes[i][1] + x + offset2
inds_mask = np.where((x < shapes[i][1]) & (y < shapes[i][0]))[0]
inds_wide[i] = inds[inds_mask]
if remapping is not None:
remapping_wide[i] = remapping_wide[i][inds_mask]
offset1, offset2 = num1, num2
outputs = [np.concatenate(inds_wide)]
if remapping is not None:
outputs += [np.concatenate(remapping_wide)]
return outputs[0] if len(outputs) == 1 else outputs
def map_returns_to_blobs(returns, blobs, keys):
"""Map returns of image to blobs."""
for i, key in enumerate(keys):
......@@ -83,6 +118,5 @@ def map_blobs_by_levels(blobs, defaults, lvl_inds):
outputs[key].append(
blob[inds]
if len(inds) > 0
else defaults[key]
)
else defaults[key])
return outputs
......@@ -13,8 +13,11 @@ from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import multiprocessing as mp
import time
import threading
import queue
import dragon
import dragon.vm.torch as torch
......@@ -23,9 +26,8 @@ import numpy as np
from seetadet.algo.mask_rcnn import data_transformer
from seetadet.core.config import cfg
from seetadet.datasets.factory import get_dataset
from seetadet.utils import blob as blob_util
from seetadet.utils import logger
from seetadet.utils.blob import im_list_to_blob
from seetadet.utils.blob import mask_list_to_blob
class DataLoader(object):
......@@ -39,19 +41,19 @@ class DataLoader(object):
'source': dataset.source,
'classes': dataset.classes,
'shuffle': cfg.TRAIN.USE_SHUFFLE,
'num_chunks': cfg.TRAIN.SHUFFLE_CHUNKS,
'batch_size': cfg.TRAIN.IMS_PER_BATCH * 2,
'num_transformers': cfg.TRAIN.NUM_THREADS - 1,
})
self.iterator.start()
def __call__(self):
outputs = self.iterator.next()
if isinstance(outputs['data'], np.ndarray):
outputs['data'] = torch.from_numpy(outputs['data'])
if isinstance(outputs['image'], np.ndarray):
outputs['image'] = torch.from_numpy(outputs['image'])
return outputs
class Iterator(mp.Process):
class Iterator(threading.Thread):
"""Iterator to return the batch of data."""
def __init__(self, **kwargs):
......@@ -65,17 +67,16 @@ class Iterator(mp.Process):
rank = dragon.distributed.get_rank(process_group)
# Configuration
self._prefetch = kwargs.get('prefetch', 5)
self._batch_size = kwargs.get('batch_size', 2)
self._num_readers = kwargs.get('num_readers', 1)
self._num_transformers = kwargs.get('num_transformers', 3)
self.daemon = True
# Initialize queues
num_batches = self._prefetch * self._num_readers
self.q_in = mp.Queue(num_batches * self._batch_size)
self.q1_out = mp.Queue(num_batches * self._batch_size)
self.q2_out = mp.Queue(num_batches * self._batch_size)
num_batches = self._num_readers
self._queue1 = mp.Queue(num_batches * self._batch_size)
self._queue2 = mp.Queue(num_batches * self._batch_size)
self._queue3 = queue.Queue(num_batches)
# Initialize readers
self._readers = []
......@@ -86,7 +87,7 @@ class Iterator(mp.Process):
self._readers.append(dragon.io.DataReader(
part_idx=part_idx, num_parts=num_parts, **kwargs))
self._readers[i]._seed += part_idx
self._readers[i].q_out = self.q_in
self._readers[i].q_out = self._queue1
self._readers[i].start()
time.sleep(0.1)
......@@ -95,8 +96,7 @@ class Iterator(mp.Process):
for i in range(self._num_transformers):
p = data_transformer.DataTransformer(**kwargs)
p._seed += (i + rank * self._num_transformers)
p.q_in = self.q_in
p.q1_out, p.q2_out = self.q1_out, self.q2_out
p.q_in, p.q_out = self._queue1, self._queue2
p.start()
self._transformers.append(p)
time.sleep(0.1)
......@@ -119,38 +119,44 @@ class Iterator(mp.Process):
"""Return the next batch of data."""
return self.__next__()
def run(self):
"""Main loop."""
num_images = cfg.TRAIN.IMS_PER_BATCH
num_batches = cfg.TRAIN.ASPECT_GROUPING
logger.info('Initializing prefetch batches...')
example_buffer = [self._queue2.get()
for _ in range(num_images * num_batches)]
next_examples = []
while True:
# Use cached buffer for next N examples
# Examples are sorted to simulate aspect grouping
if len(next_examples) == 0:
next_examples = example_buffer
next_examples.sort(key=lambda d: d['aspect_ratio'])
example_buffer = []
# Prepare the next batch
outputs = collections.defaultdict(list)
for i in range(num_images):
example = next_examples.pop(0)
outputs['image'].append(example['image'])
outputs['gt_boxes'].append(example['boxes'])
outputs['gt_segms'].append(example['segms'])
outputs['im_info'].append(example['im_info'])
outputs['fg_inds'].append(example.get('fg_inds', None))
outputs['bg_inds'].append(example.get('bg_inds', None))
example_buffer.append(self._queue2.get())
outputs['image'] = blob_util.im_list_to_blob(
outputs['image'], coarsest_stride=cfg.MODEL.COARSEST_STRIDE)
# Send batch data to consumer
self._queue3.put(outputs)
def __iter__(self):
"""Return the iterator self."""
return self
def __next__(self):
"""Return the next batch of data."""
q_out = None
# Two queues to implement aspect-grouping
# This is necessary to reduce the gpu memory
# from fetching a huge square batch blob
while q_out is None:
if self.q1_out.qsize() >= cfg.TRAIN.IMS_PER_BATCH:
q_out = self.q1_out
elif self.q2_out.qsize() >= cfg.TRAIN.IMS_PER_BATCH:
q_out = self.q2_out
self.q1_out, self.q2_out = self.q2_out, self.q1_out
images, images_info = [], []
boxes_to_pack, masks_to_pack = [], []
for i in range(cfg.TRAIN.IMS_PER_BATCH):
image, image_scale, boxes, masks = q_out.get()
images.append(image)
images_info.append(list(image.shape[:2]) + [image_scale])
gt_boxes = np.zeros((boxes.shape[0], boxes.shape[1] + 1), 'float32')
gt_boxes[:, :boxes.shape[1]], gt_boxes[:, -1] = boxes, i
boxes_to_pack.append(gt_boxes)
masks_to_pack.append(masks)
return {
'data': im_list_to_blob(images),
'ims_info': np.array(images_info, 'float32'),
'gt_boxes': np.concatenate(boxes_to_pack),
'gt_masks': mask_list_to_blob(masks_to_pack),
}
return self._queue3.get()
......@@ -15,134 +15,136 @@ from __future__ import print_function
import multiprocessing
import cv2
import numpy as np
import numpy.random as npr
from seetadet.algo import common as algo_common
from seetadet.core.config import cfg
from seetadet.datasets.example import Example
from seetadet.pycocotools import mask_utils
from seetadet.utils.pycocotools import mask_utils
from seetadet.utils import boxes as box_util
from seetadet.utils.blob import prep_im_for_blob
from seetadet.utils import image as image_util
class DataTransformer(multiprocessing.Process):
def __init__(self, **kwargs):
super(DataTransformer, self).__init__()
self._scales = cfg.TRAIN.SCALES
self._random_scales = cfg.TRAIN.RANDOM_SCALES
self._max_size = cfg.TRAIN.MAX_SIZE
self._seed = cfg.RNG_SEED
self._use_flipped = cfg.TRAIN.USE_FLIPPED
self._use_diff = cfg.TRAIN.USE_DIFF
self._use_flipped = cfg.TRAIN.USE_FLIPPED
self._use_distort = cfg.TRAIN.USE_COLOR_JITTER
self._classes = kwargs.get('classes', ('__background__',))
self._num_classes = len(self._classes)
self._class_to_ind = dict(zip(self._classes, range(self._num_classes)))
self.q_in = self.q1_out = self.q2_out = None
self._anchor_sampler = algo_common.AnchorSampler()
self.q_in = self.q_out = None
self.daemon = True
def make_roi_dict(self, example, im_scale, apply_flip=False):
objects, n_objects = example.objects, 0
def get_boxes_and_segms(self, example, im_scale, flipped):
objects, num_objects = example.objects, 0
height, width = example.height, example.width
if not self._use_diff:
for obj in objects:
if obj.get('difficult', 0) == 0:
n_objects += 1
num_objects += 1
else:
n_objects = len(objects)
num_objects = len(objects)
roi_dict = {
'boxes': np.zeros((n_objects, 4), 'float32'),
'masks': np.empty((n_objects, height, width), 'uint8'),
'gt_classes': np.zeros((n_objects, 1), 'int32'),
'mask_flags': np.ones((n_objects, 1), 'float32'),
}
boxes, segms = np.zeros((num_objects, 4), 'float32'), []
gt_classes = np.zeros((num_objects,), 'float32')
segm_flags = np.ones((num_objects,), 'float32')
# Filter the difficult instances
# Filter the difficult instances.
object_idx = 0
for obj in objects:
if not self._use_diff and \
obj.get('difficult', 0) > 0:
if not self._use_diff and obj.get('difficult', 0) > 0:
continue
bbox, mask = obj['bbox'], obj['mask']
roi_dict['boxes'][object_idx, :] = [
max(0, bbox[0]),
max(0, bbox[1]),
min(bbox[2], width - 1),
min(bbox[3], height - 1),
]
if mask is not None:
roi_dict['masks'][object_idx] = (
mask_utils.bytes2img(
obj['mask'],
height,
width,
))
bbox = obj['bbox']
boxes[object_idx, :] = [max(0, bbox[0]),
max(0, bbox[1]),
min(bbox[2], width - 1),
min(bbox[3], height - 1)]
if 'mask' in obj:
mask_img = mask_utils.bytes2img(obj['mask'], height, width)
segms.append(mask_img[:, ::-1] if flipped else mask_img)
elif 'polygons' in obj:
polygons = obj['polygons']
segms.append(box_util.flip_polygons(
polygons, width) if flipped else polygons)
else:
roi_dict['mask_flags'][object_idx] = 0.
roi_dict['gt_classes'][object_idx] = \
self._class_to_ind[obj['name']]
segms.append(None)
segm_flags[object_idx] = 0.
gt_classes[object_idx] = self._class_to_ind[obj['name']]
object_idx += 1
# Flip the boxes if necessary
if apply_flip:
roi_dict['boxes'] = \
box_util.flip_boxes(
roi_dict['boxes'],
width,
)
# Scale the boxes to the detecting scale.
boxes *= im_scale
# Scale the boxes to the detecting scale
roi_dict['boxes'] *= im_scale
# Attach the classes and mask flags.
gt_boxes = np.empty((num_objects, 6), dtype=np.float32)
gt_boxes[:, :4], gt_boxes[:, 4] = boxes, gt_classes
gt_boxes[:, 5] = segm_flags # Has segmentation or not.
return roi_dict
return gt_boxes, segms
def get(self, example):
example = Example(example)
img = example.image
# Scale
target_size = self._scales[np.random.randint(len(self._scales))]
img, im_scale = prep_im_for_blob(img, target_size, self._max_size)
# Flip
apply_flip = False
if self._use_flipped:
if np.random.randint(2) > 0:
img = img[:, ::-1]
apply_flip = True
# Example -> RoIDict
roi_dict = self.make_roi_dict(example, im_scale, apply_flip)
# Post-Process for gt boxes
# Shape like: [num_objects, {x1, y1, x2, y2, cls, flag}]
gt_boxes = \
np.concatenate([
roi_dict['boxes'],
roi_dict['gt_classes'],
roi_dict['mask_flags']
], axis=1)
# Post-Process for gt masks
# Shape like: [num_objects, im_h, im_w]
if gt_boxes.shape[0] > 0:
gt_masks = roi_dict['masks']
if apply_flip:
gt_masks = gt_masks[:, :, ::-1]
else:
gt_masks = None
return img, im_scale, gt_boxes, gt_masks
# Resize.
img, im_scale = image_util.resize_image_with_target_size(
example.image,
target_size=npr.choice(self._scales),
max_size=self._max_size,
random_scales=self._random_scales,
)
# Flip.
flipped = False
if self._use_flipped and npr.randint(2) > 0:
img = img[:, ::-1]
flipped = True
# Distort.
if self._use_distort:
img = image_util.distort_image(img)
# Boxes and segmentations.
boxes, segms = self.get_boxes_and_segms(example, im_scale, flipped)
# Flip the boxes if necessary.
if flipped:
boxes = box_util.flip_boxes(boxes, img.shape[1])
# Standard outputs.
outputs = {'image': img,
'boxes': boxes,
'segms': segms,
'im_info': img.shape[:2] + (im_scale,)}
# Attach precomputed targets.
if len(boxes) > 0:
outputs.update(
self._anchor_sampler(
gt_boxes=boxes,
im_info=outputs['im_info']))
return outputs
def run(self):
# Fix the process-local random seed
# Disable the opencv threading.
cv2.setNumThreads(1)
# Fix the process-local random seed.
np.random.seed(self._seed)
# Main prefetch loop
while True:
outputs = self.get(self.q_in.get())
if len(outputs[2]) < 1:
continue # Ignore the non-object image
aspect_ratio = float(outputs[0].shape[0]) / outputs[0].shape[1]
if aspect_ratio > 1.:
self.q1_out.put(outputs)
else:
self.q2_out.put(outputs)
if len(outputs['boxes']) < 1:
continue # Ignore non-object image.
height, width = outputs['image'].shape[:2]
outputs['aspect_ratio'] = float(height) / float(width)
self.q_out.put(outputs)
......@@ -31,7 +31,7 @@ class ProposalTarget(object):
def __init__(self):
super(ProposalTarget, self).__init__()
self.resolution = cfg.MRCNN.RESOLUTION
self.num_classes = cfg.MODEL.NUM_CLASSES
self.num_classes = len(cfg.MODEL.CLASSES)
self.defaults = collections.OrderedDict([
('rois', np.array([[-1, 0, 0, 1, 1]], 'float32')),
('labels', np.array([-1], 'int64')),
......@@ -39,18 +39,10 @@ class ProposalTarget(object):
('mask_targets', -np.ones((1, self.resolution, self.resolution), 'float32')),
])
def __call__(self, rpn_rois, gt_boxes, gt_masks, ims_info):
def __call__(self, **inputs):
num_images = cfg.TRAIN.IMS_PER_BATCH
# Proposal ROIs (0, x1, y1, x2, y2) coming from RPN
all_rois = rpn_rois
# GT boxes (x1, y1, x2, y2, label)
# GT masks (num_objects, im_h, im_w)
gt_boxes_wide, gt_masks_wide = \
mask_util.dismantle_masks(
gt_boxes,
gt_masks,
num_images,
)
all_rois = inputs['rois']
# Prepare the outputs
keys = self.defaults.keys()
......@@ -58,24 +50,25 @@ class ProposalTarget(object):
# Generate targets separately
for ix in range(num_images):
gt_boxes = gt_boxes_wide[ix]
gt_masks = gt_masks_wide[ix]
# GT boxes (x1, y1, x2, y2, label)
gt_boxes = inputs['gt_boxes'][ix]
gt_segms = inputs['gt_segms'][ix]
# Extract proposals for this image
rois = all_rois[np.where(all_rois[:, 0].astype('int32') == ix)[0]]
# Include ground-truth boxes in the set of candidate rois
inds = np.ones((gt_boxes.shape[0], 1), gt_boxes.dtype) * ix
rois = np.vstack((rois, np.hstack((inds, gt_boxes[:, :4]))))
# Sample a batch of RoIs for training
rois_per_image = cfg.TRAIN.BATCH_SIZE
fg_rois_per_image = np.round(cfg.TRAIN.FG_FRACTION * rois_per_image)
rois_per_image = cfg.FRCNN.BATCH_SIZE
fg_rois_per_image = np.round(cfg.FRCNN.FG_FRACTION * rois_per_image)
rcnn_util.map_returns_to_blobs(
sample_rois(
rois,
gt_boxes,
gt_masks,
gt_segms,
rois_per_image,
fg_rois_per_image,
ims_info[ix][2],
inputs['im_info'][ix][2],
), blobs, keys,
)
......@@ -122,10 +115,10 @@ class ProposalTarget(object):
'rois': [new_tensor(rois_wide[i]) for i in range(num_levels)],
'mask_rois': [new_tensor(mask_rois_wide[i]) for i in range(num_levels)],
'labels': new_tensor(blobs['labels']),
'bbox_indices': new_tensor(bbox_cls_inds[fg_inds] + blobs['labels'][fg_inds]),
'bbox_inds': new_tensor(bbox_cls_inds[fg_inds] + blobs['labels'][fg_inds]),
'bbox_targets': new_tensor(blobs['bbox_targets'][fg_inds].astype('float32')),
'bbox_anchors': new_tensor(blobs['rois'][fg_inds, 1:].astype('float32')),
'mask_indices': new_tensor(mask_cls_inds + mask_labels),
'mask_inds': new_tensor(mask_cls_inds + mask_labels),
'mask_targets': new_tensor(blobs['mask_targets']),
}
......@@ -134,7 +127,7 @@ def compute_targets(
ex_rois,
gt_rois,
gt_labels,
gt_masks,
gt_segms,
mask_flags,
mask_size,
im_scale,
......@@ -150,29 +143,25 @@ def compute_targets(
# Compute mask classification targets
mask_shape = [mask_size] * 2
ex_rois_ori = np.round(ex_rois / im_scale).astype(int)
gt_rois_ori = np.round(gt_rois / im_scale).astype(int)
mask_targets = -np.ones([len(gt_labels)] + mask_shape, 'float32')
for i in fg_inds:
if mask_flags[i] > 0:
box_mask = \
mask_util.intersect_box_mask(
ex_rois_ori[i],
gt_rois_ori[i],
gt_masks[i],
)
if box_mask is not None:
mask_targets[i] = \
mask_util.resize_mask(
mask=box_mask,
size=mask_shape,
)
if isinstance(gt_segms[i], list):
ret = mask_util.warp_mask_via_polygons(
gt_segms[i], ex_rois_ori[i], mask_shape)
else:
gt_rois_ori = np.round(gt_rois / im_scale).astype(int)
ret = mask_util.warp_mask_via_intersection(
gt_segms[i], ex_rois_ori[i], gt_rois_ori[i], mask_shape)
if ret is not None:
mask_targets[i] = ret.astype('float32')
return bbox_targets, mask_targets
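# A minimal sketch of what warping a polygon segmentation into a fixed-size
# mask target for one RoI roughly involves, as mask_util.warp_mask_via_polygons
# is used above; the scaling and rasterization details here are assumptions.
import cv2
import numpy as np

def warp_polygons_to_mask_sketch(polygons, roi, mask_size):
    """Rasterize polygons (image coordinates) into a mask_size x mask_size target."""
    x1, y1, x2, y2 = roi
    w, h = max(x2 - x1 + 1, 1), max(y2 - y1 + 1, 1)
    mask = np.zeros((mask_size, mask_size), 'uint8')
    for poly in polygons:
        pts = np.asarray(poly, 'float32').reshape(-1, 2)
        # Shift the polygon into the RoI frame, then scale to the target resolution.
        pts[:, 0] = (pts[:, 0] - x1) * mask_size / w
        pts[:, 1] = (pts[:, 1] - y1) * mask_size / h
        cv2.fillPoly(mask, [pts.round().astype(np.int32)], 1)
    return mask

# Example: one polygon covering part of a 100x100 RoI, warped to 28x28.
poly = [[10, 10, 60, 10, 60, 109, 10, 109]]
target = warp_polygons_to_mask_sketch(poly, roi=(10, 10, 109, 109), mask_size=28)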
def sample_rois(
all_rois,
gt_boxes,
gt_masks,
gt_segms,
num_rois,
num_fg_rois,
im_scale,
......@@ -184,15 +173,15 @@ def sample_rois(
labels = gt_boxes[gt_assignment, 4].astype('int64')
# Select foreground RoIs as those with >= FG_THRESH overlap
fg_inds = np.where(max_overlaps >= cfg.TRAIN.FG_THRESH)[0]
fg_inds = np.where(max_overlaps >= cfg.FRCNN.POSITIVE_OVERLAP)[0]
fg_rois_per_this_image = int(min(num_fg_rois, fg_inds.size))
# Sample foreground regions without replacement
if fg_inds.size > 0:
fg_inds = npr.choice(fg_inds, fg_rois_per_this_image, False)
# Select background RoIs as those within [BG_THRESH_LO, BG_THRESH_HI)
bg_inds = np.where((max_overlaps < cfg.TRAIN.BG_THRESH_HI) &
(max_overlaps >= cfg.TRAIN.BG_THRESH_LO))[0]
bg_inds = np.where((max_overlaps < cfg.FRCNN.NEGATIVE_OVERLAP_HI) &
(max_overlaps >= cfg.FRCNN.NEGATIVE_OVERLAP_LO))[0]
# Compute number of background RoIs to take from this image
bg_rois_per_this_image = num_rois - fg_rois_per_this_image
bg_rois_per_this_image = min(bg_rois_per_this_image, bg_inds.size)
......@@ -213,7 +202,7 @@ def sample_rois(
rois[:, 1:5],
gt_boxes[gt_assignment[keep_inds], :4],
labels,
gt_masks[gt_assignment[fg_inds]],
[gt_segms[i] for i in gt_assignment[fg_inds]],
gt_boxes[gt_assignment[fg_inds], 5],
cfg.MRCNN.RESOLUTION,
im_scale,
......
......@@ -13,13 +13,15 @@ from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import math
import numpy as np
from seetadet.algo.faster_rcnn import generate_anchors as anchor_util
from seetadet.algo.faster_rcnn import utils as rcnn_util
from seetadet.core.config import cfg
from seetadet.algo.faster_rcnn.generate_anchors import generate_anchors_v2
from seetadet.algo.faster_rcnn.utils import generate_grid_anchors
from seetadet.utils import boxes as box_util
from seetadet.utils import logger
from seetadet.utils.env import new_tensor
......@@ -41,95 +43,113 @@ class AnchorTarget(object):
(2 ** (octave / float(scales_per_octave)))
for octave in range(scales_per_octave)]
self.base_anchors.append(
generate_anchors_v2(
anchor_util.generate_anchors_v2(
stride=stride,
ratios=self.ratios,
sizes=sizes,
))
# Store the cached grid anchors
self.last_grid_shapes = None
self.last_grid_anchors = None
sizes=sizes))
# Plan the maximum anchor layout
max_size = cfg.TRAIN.MAX_SIZE
if max_size == 0:
max_size = cfg.TRAIN.SCALES[0]
if cfg.MODEL.COARSEST_STRIDE > 0:
stride = float(cfg.MODEL.COARSEST_STRIDE)
max_size = int(math.ceil(max_size / stride) * stride)
self.max_shapes = [[math.ceil(max_size / stride)] * 2
for stride in self.strides]
self.all_coords = rcnn_util.get_shifted_coords(
self.max_shapes, self.base_anchors)
self.all_anchors = rcnn_util.get_shifted_anchors(
self.max_shapes, self.base_anchors, self.strides)
def sample_anchors(self, gt_boxes, im_info, all_anchors=None):
all_anchors = self.all_anchors \
if all_anchors is None else all_anchors
# Remove anchors falling outside the image
inds_inside = np.where((all_anchors[:, 0] < im_info[1]) &
(all_anchors[:, 1] < im_info[0]))[0]
anchors = all_anchors[inds_inside, :]
num_inside = len(anchors)
labels = np.empty((num_inside,), dtype='int32')
labels.fill(-1)
# Overlaps between the anchors and the gt boxes.
overlaps = box_util.bbox_overlaps(anchors, gt_boxes)
argmax_overlaps = overlaps.argmax(axis=1)
max_overlaps = overlaps[np.arange(num_inside), argmax_overlaps]
# Foreground: for each gt, anchor with highest overlap.
gt_argmax_overlaps = overlaps.argmax(axis=0)
gt_max_overlaps = overlaps[gt_argmax_overlaps, np.arange(overlaps.shape[1])]
gt_argmax_overlaps = np.where(overlaps == gt_max_overlaps)[0]
gt_assignment = argmax_overlaps[gt_argmax_overlaps]
labels[gt_argmax_overlaps] = gt_boxes[gt_assignment, 4]
# Foreground: above threshold IoU.
inds = max_overlaps >= cfg.RETINANET.POSITIVE_OVERLAP
gt_assignment = argmax_overlaps[inds]
labels[inds] = gt_boxes[gt_assignment, 4]
# Background: below threshold IoU.
labels[max_overlaps < cfg.RETINANET.NEGATIVE_OVERLAP] = 0
# Fall back to the best-matching anchors if no foreground was found.
fg_inds = np.where(labels > 0)[0]
if len(fg_inds) == 0:
gt_assignment = argmax_overlaps[gt_argmax_overlaps]
labels[gt_argmax_overlaps] = gt_boxes[gt_assignment, 4]
fg_inds = np.where(labels > 0)[0]
# Select the ignored labels to avoid keeping too many negatives
# (~100x faster for 200 background indices)
ignore_inds = np.where(labels < 0)[0]
return inds_inside[fg_inds], inds_inside[ignore_inds]
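# A minimal sketch of the pairwise IoU used by sample_anchors above, matching
# how box_util.bbox_overlaps is called; the real helper may be implemented
# differently (e.g. in Cython).
import numpy as np

def bbox_overlaps_sketch(anchors, gt_boxes):
    """Pairwise IoU between anchors (N, 4) and gt boxes (M, >=4) in (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = np.split(anchors[:, :4], 4, axis=1)
    gx1, gy1, gx2, gy2 = np.split(gt_boxes[:, :4], 4, axis=1)
    # Intersection extents, clamped at zero (pixel-inclusive convention).
    iw = np.maximum(np.minimum(ax2, gx2.T) - np.maximum(ax1, gx1.T) + 1, 0)
    ih = np.maximum(np.minimum(ay2, gy2.T) - np.maximum(ay1, gy1.T) + 1, 0)
    inter = iw * ih
    anchor_areas = (ax2 - ax1 + 1) * (ay2 - ay1 + 1)
    gt_areas = (gx2 - gx1 + 1) * (gy2 - gy1 + 1)
    return inter / (anchor_areas + gt_areas.T - inter)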
def __call__(self, features, gt_boxes):
def __call__(self, **inputs):
num_images = cfg.TRAIN.IMS_PER_BATCH
gt_boxes_wide = box_util.dismantle_boxes(gt_boxes, num_images)
if len(gt_boxes_wide) != num_images:
logger.fatal(
'Input {} images, got {} slices of gt boxes.'
.format(num_images, len(gt_boxes_wide))
)
# Generate grid anchors from base
grid_shapes = [f.shape[-2:] for f in features]
if grid_shapes == self.last_grid_shapes:
all_anchors = self.last_grid_anchors
else:
self.last_grid_shapes = grid_shapes
self.last_grid_anchors = all_anchors = \
generate_grid_anchors(
grid_shapes,
self.base_anchors,
self.strides,
)
num_anchors = all_anchors.shape[0]
shapes = [f.shape[-2:] for f in inputs['features']]
image_stride = sum(self.base_anchors[i].shape[0] * np.prod(shapes[i])
for i in range(len(inputs['features'])))
# Label: ``1`` is positive, ``0`` is negative, ``-1`` is don't care
labels_wide = -np.ones((num_images, num_anchors,), 'int64')
bbox_indices_wide, bbox_anchors_wide, bbox_targets_wide = [], [], []
narrow_args = [self.all_coords, self.base_anchors, self.max_shapes, shapes]
outputs = collections.defaultdict(list)
# Unlike R-CNN, all anchors will be used
inds_inside, anchors = np.arange(num_anchors), all_anchors
num_inside = len(inds_inside)
# Label: ``1`` is positive, ``0`` is negative, ``-1`` is don't care
output_labels = np.zeros((num_images, image_stride,), 'int64')
for ix in range(num_images):
# GT boxes (x1, y1, x2, y2, label)
gt_boxes = gt_boxes_wide[ix]
# label: 1 is positive, 0 is negative, -1 is don't care
labels = np.empty((num_inside,), dtype='int64')
labels.fill(-1)
# Overlaps between the anchors and the gt boxes
overlaps = box_util.bbox_overlaps(anchors, gt_boxes)
argmax_overlaps = overlaps.argmax(1)
max_overlaps = overlaps[np.arange(num_inside), argmax_overlaps]
# Foreground: for each gt, anchor with highest overlap
gt_argmax_overlaps = overlaps.argmax(0)
gt_max_overlaps = overlaps[gt_argmax_overlaps, np.arange(overlaps.shape[1])]
gt_argmax_overlaps = np.where(overlaps == gt_max_overlaps)[0]
gt_inds = argmax_overlaps[gt_argmax_overlaps]
labels[gt_argmax_overlaps] = gt_boxes[gt_inds, 4]
# Foreground: above threshold IoU
inds = max_overlaps >= cfg.RETINANET.POSITIVE_OVERLAP
gt_inds = argmax_overlaps[inds]
labels[inds] = gt_boxes[gt_inds, 4]
fg_inds = np.where(labels > 0)[0]
# Background: below threshold IoU
labels[max_overlaps < cfg.RETINANET.NEGATIVE_OVERLAP] = 0
# Fall back to the best-matching anchors if no foreground was found
if len(fg_inds) == 0:
gt_inds = argmax_overlaps[gt_argmax_overlaps]
labels[gt_argmax_overlaps] = gt_boxes[gt_inds, 4]
fg_inds = np.where(labels > 0)[0]
labels_wide[ix, inds_inside] = labels
bbox_anchors_wide.append(anchors[fg_inds])
bbox_indices_wide.append(fg_inds + (num_anchors * ix))
bbox_targets_wide.append(
box_util.bbox_transform(
anchors[fg_inds],
gt_boxes[argmax_overlaps[fg_inds], :4],
)
)
fg_inds = inputs['fg_inds'][ix]
ignore_inds = inputs['bg_inds'][ix]
gt_boxes = inputs['gt_boxes'][ix]
# Narrow anchors to match the feature layout
anchors = self.all_anchors[fg_inds]
ignore_inds = rcnn_util.narrow_anchors(*(narrow_args + [ignore_inds]))
_, anchors = rcnn_util.narrow_anchors(*(narrow_args + [fg_inds, anchors]))
fg_inds = rcnn_util.narrow_anchors(*(narrow_args + [fg_inds]))
# Compute bbox targets
gt_assignment = box_util.bbox_overlaps(anchors, gt_boxes).argmax(axis=1)
bbox_targets = box_util.bbox_transform(anchors, gt_boxes[gt_assignment, :4])
outputs['bbox_anchors'].append(anchors)
outputs['bbox_targets'].append(bbox_targets)
# Compute label assignments
output_labels[ix, ignore_inds] = -1
output_labels[ix, fg_inds] = gt_boxes[gt_assignment, 4]
# Compute sparse indices
fg_inds += ix * image_stride
outputs['bbox_inds'].extend([fg_inds])
return {
'labels': new_tensor(labels_wide),
'bbox_indices': new_tensor(np.concatenate(bbox_indices_wide)),
'bbox_anchors': new_tensor(np.concatenate(bbox_anchors_wide).astype('float32')),
'bbox_targets': new_tensor(np.concatenate(bbox_targets_wide).astype('float32')),
'labels': new_tensor(output_labels),
'bbox_inds': new_tensor(
np.concatenate(outputs['bbox_inds'])),
'bbox_targets': new_tensor(
np.concatenate(outputs['bbox_targets']).astype('float32')),
'bbox_anchors': new_tensor(
np.concatenate(outputs['bbox_anchors']).astype('float32')),
}
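# A minimal sketch of the standard (dx, dy, dw, dh) encoding that
# box_util.bbox_transform is expected to produce for 'bbox_targets' above;
# the real helper may additionally apply regression weights.
import numpy as np

def bbox_transform_sketch(anchors, gt_boxes):
    """Encode gt boxes as deltas relative to their matched anchors."""
    aw = anchors[:, 2] - anchors[:, 0] + 1
    ah = anchors[:, 3] - anchors[:, 1] + 1
    acx, acy = anchors[:, 0] + 0.5 * aw, anchors[:, 1] + 0.5 * ah
    gw = gt_boxes[:, 2] - gt_boxes[:, 0] + 1
    gh = gt_boxes[:, 3] - gt_boxes[:, 1] + 1
    gcx, gcy = gt_boxes[:, 0] + 0.5 * gw, gt_boxes[:, 1] + 0.5 * gh
    # Center offsets are normalized by the anchor size; scales are log-ratios.
    dx, dy = (gcx - acx) / aw, (gcy - acy) / ah
    dw, dh = np.log(gw / aw), np.log(gh / ah)
    return np.stack([dx, dy, dw, dh], axis=1)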
......@@ -22,7 +22,10 @@ class DataLoader(object):
"""Provide mini-batches of data."""
def __new__(cls):
if cfg.TRAIN.MAX_SIZE > 0:
pipeline_type = cfg.PIPELINE.TYPE.lower()
if pipeline_type == 'default' or pipeline_type == 'rcnn':
return faster_rcnn.DataLoader()
else:
elif pipeline_type == 'ssd':
return ssd.DataLoader()
else:
raise ValueError('Unsupported pipeline: ' + pipeline_type)
......@@ -20,60 +20,79 @@ import numpy as np
from seetadet.core.config import cfg
from seetadet.modeling.detector import new_detector
from seetadet.utils import blob as blob_util
from seetadet.utils import image as image_util
from seetadet.utils import logger
from seetadet.utils import nms as nms_util
from seetadet.utils import time_util
from seetadet.utils.blob import im_list_to_blob
from seetadet.utils.image import scale_image
def ims_detect(detector, raw_images):
"""Detect images, with single or multiple scales."""
ims, ims_scale = [], []
for i in range(len(raw_images)):
im, im_scale = scale_image(raw_images[i])
ims += im
ims_scale += im_scale
num_scales = len(ims_scale) // len(raw_images)
ims_shape = np.array([im.shape[:2] for im in ims])
ims_scale = np.array(ims_scale).reshape((len(ims), -1))
# Prepare blobs
data = im_list_to_blob(ims)
ims_info = np.hstack([ims_shape, ims_scale]).astype('float32')
def get_data(raw_images):
"""Return the test data."""
max_size = cfg.TEST.MAX_SIZE
if cfg.PIPELINE.TYPE.lower() == 'ssd':
max_size = 0 # Warped to a fixed size
images_wide = []
image_shapes_wide, image_scales_wide = [], []
for img in raw_images:
images, image_scales = image_util.scale_image(
img, scales=cfg.TEST.SCALES, max_size=max_size)
images_wide += images
image_scales_wide += image_scales
image_shapes_wide += [img.shape[:2] for img in images]
images = blob_util.im_list_to_blob(
images_wide, coarsest_stride=cfg.MODEL.COARSEST_STRIDE)
image_shapes = np.array(image_shapes_wide)
image_scales = np.array(image_scales_wide).reshape((len(images), -1))
images_info = np.hstack([image_shapes, image_scales]).astype('float32')
return images, images_info
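# A minimal sketch of packing variably-sized images into one zero-padded blob
# aligned to MODEL.COARSEST_STRIDE, which is what blob_util.im_list_to_blob is
# assumed to do above; the real helper may differ.
import numpy as np

def im_list_to_blob_sketch(images, coarsest_stride=0):
    """Pack HWC images into a single (N, H_max, W_max, C) float32 blob."""
    max_h = max(img.shape[0] for img in images)
    max_w = max(img.shape[1] for img in images)
    if coarsest_stride > 0:
        # Round the padded size up so every pyramid stride divides it evenly.
        max_h = int(np.ceil(max_h / coarsest_stride) * coarsest_stride)
        max_w = int(np.ceil(max_w / coarsest_stride) * coarsest_stride)
    blob = np.zeros((len(images), max_h, max_w, images[0].shape[2]), 'float32')
    for i, img in enumerate(images):
        blob[i, :img.shape[0], :img.shape[1]] = img
    return blob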
def ims_detect(detector, raw_images, timer=None):
"""Detect images at single or multiple scales."""
images, images_info = get_data(raw_images)
timer.tic() if timer else timer
# Do Forward
data = torch.from_numpy(data)
ims_info = torch.from_numpy(ims_info)
inputs = {'image': torch.from_numpy(images),
'im_info': torch.from_numpy(images_info)}
# with torch.no_grad():
# outputs = detector.forward(inputs)
if not hasattr(detector, 'script_forward'):
def script_forward(self, data, ims_info):
return self.forward({'data': data, 'ims_info': ims_info})
def script_forward(self, image, im_info):
return self.forward({'image': image, 'im_info': im_info})
detector.script_forward = torch.jit.trace(
func=types.MethodType(script_forward, detector),
example_inputs=[data, ims_info],
example_inputs=[inputs['image'], inputs['im_info']],
)
outputs = detector.script_forward(data, ims_info)
outputs = detector.script_forward(inputs['image'], inputs['im_info'])
outputs = dict((k, outputs[k].numpy()) for k in outputs.keys())
# Unpack results
results = outputs['detections']
detections = [[] for _ in range(len(raw_images))]
for i in range(len(ims)):
inds = np.where(results[:, 0].astype(np.int32) == i)[0]
detections[i // num_scales].append(results[inds, 1:])
return [np.vstack(detections[i]) for i in range(len(raw_images))]
def test_net(weights, num_classes, q_in, q_out, device):
num_classes, cfg.GPU_ID = num_classes, device
timer.toc() if timer else timer
# Decode results
detections = outputs['detections']
results = [[] for _ in range(len(raw_images))]
for i in range(len(images)):
inds = np.where(detections[:, 0].astype(np.int32) == i)[0]
results[i // len(cfg.TEST.SCALES)].append(detections[inds, 1:])
# Merge from multiple scales
ret = [np.vstack(d) for d in results]
timer.toc() if timer else timer
return ret
def test_net(weights, q_in, q_out, device, root_logger=True):
"""Test a network trained with RetinaNet algorithm."""
cfg.GPU_ID = device
num_classes = len(cfg.MODEL.CLASSES)
logger.set_root_logger(root_logger)
detector = new_detector(device, weights)
must_stop = False
_t = time_util.new_timers('im_detect', 'misc')
timers = time_util.new_timers('im_detect_bbox', 'misc')
empty_detections = np.zeros((0, 5), 'float32')
while True:
if must_stop:
......@@ -91,17 +110,19 @@ def test_net(weights, num_classes, q_in, q_out, device):
continue
# Run detecting on specific scales
with _t['im_detect'].tic_and_toc():
results = ims_detect(detector, raw_images)
results = ims_detect(detector, raw_images, timers['im_detect_bbox'])
# Post-Processing
# Post-processing
for i, detections in enumerate(results):
_t['misc'].tic()
timers['misc'].tic()
boxes_this_image = [[]]
# {x1, y1, x2, y2, score, cls}
# Detection format: (x1, y1, x2, y2, score, cls)
detections = np.array(detections)
for j in range(1, num_classes):
cls_indices = np.where(detections[:, 5].astype(np.int32) == j)[0]
if len(cls_indices) == 0:
boxes_this_image.append(empty_detections)
continue
cls_boxes = detections[cls_indices, :4]
cls_scores = detections[cls_indices, 4]
cls_detections = np.hstack((
......@@ -121,11 +142,11 @@ def test_net(weights, num_classes, q_in, q_out, device):
)
cls_detections = cls_detections[keep, :]
boxes_this_image.append(cls_detections)
_t['misc'].toc()
timers['misc'].toc()
q_out.put((
indices[i],
dict([('im_detect', _t['im_detect'].average_time),
('misc', _t['misc'].average_time)]),
dict([('im_detect', timers['im_detect_bbox'].average_time),
('misc', timers['misc'].average_time)]),
dict([('boxes', boxes_this_image)]),
))
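# A minimal sketch of the greedy NMS applied to cls_detections in the loop
# above; the actual nms_util call may use a faster or soft-NMS variant.
import numpy as np

def nms_sketch(detections, thresh):
    """Greedy NMS over (x1, y1, x2, y2, score) rows; returns the kept indices."""
    x1, y1, x2, y2, scores = detections.T
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Suppress the remaining boxes that overlap the current best too much.
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        w = np.maximum(0., xx2 - xx1 + 1)
        h = np.maximum(0., yy2 - yy1 + 1)
        iou = w * h / (areas[i] + areas[order[1:]] - w * h)
        order = order[np.where(iou <= thresh)[0] + 1]
    return keep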
......@@ -14,7 +14,4 @@ from __future__ import division
from __future__ import print_function
from seetadet.algo.ssd.data_loader import DataLoader
from seetadet.algo.ssd.hard_mining import HardMining
from seetadet.algo.ssd.multibox import MultiBoxMatch
from seetadet.algo.ssd.multibox import MultiBoxTarget
from seetadet.algo.ssd.priorbox import PriorBox
from seetadet.algo.ssd.anchor_target import AnchorTarget
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import math
import numpy as np
from seetadet.algo.ssd import generate_anchors as anchor_util
from seetadet.algo.ssd import utils as ssd_util
from seetadet.core.config import cfg
from seetadet.utils import boxes as box_util
from seetadet.utils.env import new_tensor
class AnchorTarget(object):
"""Assign ground-truth targets to anchors."""
def __init__(self):
super(AnchorTarget, self).__init__()
# Load the basic configs
self.strides = cfg.SSD.STRIDES
anchor_sizes = cfg.SSD.ANCHOR_SIZES
aspect_ratios = cfg.SSD.ASPECT_RATIOS
self.base_anchors = []
for i in range(len(anchor_sizes)):
ratios = aspect_ratios[i]
if not isinstance(ratios, (tuple, list)):
# All strides share the same ratios
ratios = aspect_ratios
self.base_anchors.append(
anchor_util.generate_anchors(
min_sizes=[anchor_sizes[i][0]],
max_sizes=[anchor_sizes[i][1]],
ratios=ratios))
# Plan the fixed anchor layout
max_size = cfg.TRAIN.SCALES[0]
if cfg.MODEL.COARSEST_STRIDE > 0:
stride = float(cfg.MODEL.COARSEST_STRIDE)
max_size = int(math.ceil(max_size / stride) * stride)
shapes = [[math.ceil(max_size / stride)] * 2
for stride in self.strides]
self.all_anchors = ssd_util.get_shifted_anchors(
shapes, self.base_anchors, self.strides)
def sample_anchors(self, gt_boxes, all_anchors=None):
anchors = self.all_anchors \
if all_anchors is None else all_anchors
num_anchors = len(anchors)
labels = np.empty((num_anchors,), dtype='int32')
labels.fill(-1)
# Overlaps between the anchors and the gt boxes.
overlaps = box_util.bbox_overlaps(anchors, gt_boxes)
argmax_overlaps = overlaps.argmax(axis=1)
max_overlaps = overlaps[np.arange(num_anchors), argmax_overlaps]
# Foreground: for each gt, anchor with highest overlap.
gt_argmax_overlaps = overlaps.argmax(axis=0)
gt_assignment = argmax_overlaps[gt_argmax_overlaps]
labels[gt_argmax_overlaps] = gt_boxes[gt_assignment, 4]
# Foreground: above threshold IoU.
inds = max_overlaps >= cfg.SSD.POSITIVE_OVERLAP
gt_assignment = argmax_overlaps[inds]
labels[inds] = gt_boxes[gt_assignment, 4]
fg_inds = np.where(labels > 0)[0]
# Negative: not matched and below threshold IoU.
neg_inds = np.where(labels <= 0)[0]
neg_overlaps = max_overlaps[neg_inds]
eligible_neg_inds = np.where(neg_overlaps < cfg.SSD.NEGATIVE_OVERLAP)[0]
neg_inds = neg_inds[eligible_neg_inds]
return fg_inds, neg_inds
def __call__(self, **inputs):
num_images = cfg.TRAIN.IMS_PER_BATCH
neg_pos_ratio = cfg.SSD.NEGATIVE_POSITIVE_RATIO
image_stride = self.all_anchors.shape[0]
cls_prob = inputs['cls_prob'].numpy()
outputs = collections.defaultdict(list)
# Label: ``1`` is positive, ``0`` is negative, ``-1`` is don't care
output_labels = np.empty((num_images, image_stride,), 'int64')
output_labels.fill(-1)
for ix in range(num_images):
fg_inds = inputs['fg_inds'][ix]
neg_inds = inputs['bg_inds'][ix]
gt_boxes = inputs['gt_boxes'][ix]
# Mine hard negatives as background.
num_pos, num_neg = len(fg_inds), len(neg_inds)
num_bg = min(int(num_pos * neg_pos_ratio), num_neg)
neg_loss = -np.log(np.maximum(
cls_prob[ix, neg_inds][np.arange(num_neg),
np.zeros((num_neg,), 'int32')],
np.finfo(float).eps))
bg_inds = neg_inds[np.argsort(-neg_loss)][:num_bg]
# Compute bbox targets.
anchors = self.all_anchors[fg_inds]
gt_assignment = box_util.bbox_overlaps(
anchors, gt_boxes).argmax(axis=1)
bbox_targets = box_util.bbox_transform(
anchors, gt_boxes[gt_assignment, :4],
cfg.BBOX_REG_WEIGHTS)
outputs['bbox_anchors'].append(anchors)
outputs['bbox_targets'].append(bbox_targets)
# Compute label assignments.
output_labels[ix, bg_inds] = 0
output_labels[ix, fg_inds] = gt_boxes[gt_assignment, 4]
# Compute sparse indices.
fg_inds += ix * image_stride
outputs['bbox_inds'].extend([fg_inds])
return {
'labels': new_tensor(output_labels),
'bbox_inds': new_tensor(
np.concatenate(outputs['bbox_inds'])),
'bbox_targets': new_tensor(
np.concatenate(outputs['bbox_targets']).astype('float32')),
'bbox_anchors': new_tensor(
np.concatenate(outputs['bbox_anchors']).astype('float32')),
}
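# A minimal sketch of the hard-negative mining rule used above: keep the
# NEGATIVE_POSITIVE_RATIO highest-loss negatives, where loss is taken from the
# predicted background probability; the helper name and eps are assumptions.
import numpy as np

def mine_hard_negatives_sketch(bg_probs, num_pos, neg_pos_ratio=3.0, eps=1e-12):
    """Pick the hardest negatives: lowest background probability, highest loss."""
    neg_loss = -np.log(np.maximum(bg_probs, eps))
    num_bg = min(int(num_pos * neg_pos_ratio), len(bg_probs))
    return np.argsort(-neg_loss)[:num_bg]

# Example: with 2 positives and ratio 3, keep the 6 negatives whose predicted
# background probability is lowest.
probs = np.array([0.9, 0.1, 0.5, 0.05, 0.8, 0.3, 0.99, 0.2])
print(mine_hard_negatives_sketch(probs, num_pos=2))  # [3 1 7 5 2 4]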
......@@ -13,8 +13,11 @@ from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import multiprocessing as mp
import time
import threading
import queue
import dragon
import dragon.vm.torch as torch
......@@ -23,6 +26,7 @@ import numpy as np
from seetadet.algo.ssd import data_transformer
from seetadet.core.config import cfg
from seetadet.datasets.factory import get_dataset
from seetadet.utils import blob as blob_util
from seetadet.utils import logger
......@@ -32,28 +36,24 @@ class DataLoader(object):
def __init__(self):
super(DataLoader, self).__init__()
dataset = get_dataset(cfg.TRAIN.DATASET)
if cfg.USE_DALI:
from seetadet.dali import ssd_pipeline as pipe
self.iterator = pipe.new_iterator(dataset.source)
else:
self.iterator = Iterator(**{
'dataset': dataset.cls,
'source': dataset.source,
'classes': dataset.classes,
'shuffle': cfg.TRAIN.USE_SHUFFLE,
'num_chunks': cfg.TRAIN.SHUFFLE_CHUNKS,
'batch_size': cfg.TRAIN.IMS_PER_BATCH * 2,
'num_transformers': cfg.TRAIN.NUM_THREADS - 1,
})
self.iterator = Iterator(**{
'dataset': dataset.cls,
'source': dataset.source,
'classes': dataset.classes,
'shuffle': cfg.TRAIN.USE_SHUFFLE,
'batch_size': cfg.TRAIN.IMS_PER_BATCH * 2,
'num_transformers': cfg.TRAIN.NUM_THREADS - 1,
})
self.iterator.start()
def __call__(self):
outputs = self.iterator.next()
if isinstance(outputs['data'], np.ndarray):
outputs['data'] = torch.from_numpy(outputs['data'])
if isinstance(outputs['image'], np.ndarray):
outputs['image'] = torch.from_numpy(outputs['image'])
return outputs
class Iterator(object):
class Iterator(threading.Thread):
"""Iterator to return the batch of data."""
def __init__(self, **kwargs):
......@@ -67,15 +67,16 @@ class Iterator(object):
rank = dragon.distributed.get_rank(process_group)
# Configuration
self._prefetch = kwargs.get('prefetch', 5)
self._batch_size = kwargs.get('batch_size', 32)
self._batch_size = kwargs.get('batch_size', 8)
self._num_readers = kwargs.get('num_readers', 1)
self._num_transformers = kwargs.get('num_transformers', 3)
self.daemon = True
# Initialize queues
num_batches = self._prefetch * self._num_readers
self.q_in = mp.Queue(num_batches * self._batch_size)
self.q_out = mp.Queue(num_batches * self._batch_size)
num_batches = self._num_readers
self._queue1 = mp.Queue(num_batches * self._batch_size)
self._queue2 = mp.Queue(num_batches * self._batch_size)
self._queue3 = queue.Queue(num_batches)
# Initialize readers
self._readers = []
......@@ -86,7 +87,7 @@ class Iterator(object):
self._readers.append(dragon.io.DataReader(
part_idx=part_idx, num_parts=num_parts, **kwargs))
self._readers[i]._seed += part_idx
self._readers[i].q_out = self.q_in
self._readers[i].q_out = self._queue1
self._readers[i].start()
time.sleep(0.1)
......@@ -95,7 +96,7 @@ class Iterator(object):
for i in range(self._num_transformers):
p = data_transformer.DataTransformer(**kwargs)
p._seed += (i + rank * self._num_transformers)
p.q_in, p.q_out = self.q_in, self.q_out
p.q_in, p.q_out = self._queue1, self._queue2
p.start()
self._transformers.append(p)
time.sleep(0.1)
......@@ -118,26 +119,41 @@ class Iterator(object):
"""Return the next batch of data."""
return self.__next__()
def run(self):
"""Main loop."""
num_images = cfg.TRAIN.IMS_PER_BATCH
num_batches = cfg.TRAIN.ASPECT_GROUPING
logger.info('Initialize prefetching batches...')
example_buffer = [self._queue2.get()
for _ in range(num_images * num_batches)]
next_examples = []
while True:
# Use cached buffer for next N examples
if len(next_examples) == 0:
next_examples = example_buffer
example_buffer = []
# Prepare the next batch
outputs = collections.defaultdict(list)
for i in range(num_images):
example = next_examples.pop(0)
outputs['image'].append(example['image'])
outputs['gt_boxes'].append(example['boxes'])
outputs['im_info'].append(example['im_info'])
outputs['fg_inds'].append(example.get('fg_inds', None))
outputs['bg_inds'].append(example.get('bg_inds', None))
example_buffer.append(self._queue2.get())
outputs['image'] = blob_util.im_list_to_blob(
outputs['image'], coarsest_stride=cfg.MODEL.COARSEST_STRIDE)
# Send batch data to consumer
self._queue3.put(outputs)
def __iter__(self):
"""Return the iterator self."""
return self
def __next__(self):
"""Return the next batch of data."""
n = cfg.TRAIN.IMS_PER_BATCH
h = w = cfg.TRAIN.SCALES[0]
boxes_to_pack = []
image, boxes = self.q_out.get()
images = np.zeros((n, h, w, 3), image.dtype)
for i in range(n):
images[i] = image
gt_boxes = np.zeros((boxes.shape[0], boxes.shape[1] + 1), 'float32')
gt_boxes[:, :boxes.shape[1]], gt_boxes[:, -1] = boxes, i
boxes_to_pack.append(gt_boxes)
if i != (cfg.TRAIN.IMS_PER_BATCH - 1):
image, boxes = self.q_out.get()
boxes_to_pack = np.concatenate(boxes_to_pack)
return {'data': images, 'gt_boxes': boxes_to_pack}
return self._queue3.get()
......@@ -14,8 +14,12 @@ from __future__ import division
from __future__ import print_function
import multiprocessing
import cv2
import numpy as np
import numpy.random as npr
from seetadet.algo import common as algo_common
from seetadet.algo.ssd import transforms
from seetadet.core.config import cfg
from seetadet.datasets.example import Example
......@@ -27,108 +31,95 @@ class DataTransformer(multiprocessing.Process):
super(DataTransformer, self).__init__()
self._scale = cfg.TRAIN.SCALES[0]
self._seed = cfg.RNG_SEED
self._mirror = cfg.TRAIN.USE_FLIPPED
self._use_diff = cfg.TRAIN.USE_DIFF
self._use_flipped = cfg.TRAIN.USE_FLIPPED
self._classes = kwargs.get('classes', ('__background__',))
self._num_classes = len(self._classes)
self._class_to_ind = dict(zip(self._classes, range(self._num_classes)))
self.augment_image = \
transforms.Compose(
transforms.Distort(), # Color augmentation
transforms.Expand(), # Expand and padding
transforms.Sample(), # Sample a patch randomly
transforms.Resize(), # Resize to a fixed scale
)
self._anchor_sampler = algo_common.AnchorSampler()
self._apply_transform = transforms.Compose(transforms.Distort(),
transforms.Expand(),
transforms.Sample(),
transforms.Resize())
self.q_in = self.q_out = None
self.daemon = True
def make_roi_dict(self, example, apply_flip=False):
objects, n_objects = example.objects, 0
def get_boxes(self, example):
objects, num_objects = example.objects, 0
height, width = example.height, example.width
if not self._use_diff:
for obj in objects:
if obj.get('difficult', 0) == 0:
n_objects += 1
num_objects += 1
else:
n_objects = len(objects)
num_objects = len(objects)
roi_dict = {
'boxes': np.zeros((n_objects, 4), 'float32'),
'gt_classes': np.zeros((n_objects,), 'int32'),
}
boxes = np.zeros((num_objects, 4), 'float32')
gt_classes = np.zeros((num_objects,), 'int32')
# Filter the difficult instances
# Filter the difficult instances.
object_idx = 0
for obj in objects:
if not self._use_diff and \
obj.get('difficult', 0) > 0:
if not self._use_diff and obj.get('difficult', 0) > 0:
continue
bbox = obj['bbox']
roi_dict['boxes'][object_idx, :] = [
max(0, bbox[0]),
max(0, bbox[1]),
min(bbox[2], width - 1),
min(bbox[3], height - 1),
]
roi_dict['gt_classes'][object_idx] = \
self._class_to_ind[obj['name']]
boxes[object_idx, :] = [max(0, bbox[0]),
max(0, bbox[1]),
min(bbox[2], width - 1),
min(bbox[3], height - 1)]
gt_classes[object_idx] = self._class_to_ind[obj['name']]
object_idx += 1
if apply_flip:
roi_dict['boxes'] = \
box_util.flip_boxes(
roi_dict['boxes'],
width,
)
# Normalize.
boxes[:, 0::2] /= width
boxes[:, 1::2] /= height
# Normalize to unit sizes
roi_dict['boxes'][:, 0::2] /= width
roi_dict['boxes'][:, 1::2] /= height
# Attach the classes.
gt_boxes = np.empty((num_objects, 5), dtype=np.float32)
gt_boxes[:, :4], gt_boxes[:, 4] = boxes, gt_classes
return roi_dict
return gt_boxes
def get(self, example):
example = Example(example)
img = example.image
# Flip
apply_flip = False
if self._mirror:
if np.random.randint(2) > 0:
img = img[:, ::-1]
apply_flip = True
# Example -> RoIDict
roi_dict = self.make_roi_dict(example, apply_flip)
# Boxes.
boxes = self.get_boxes(example)
if len(boxes) == 0:
return {'boxes': boxes}
# Post-Process for gt boxes
# Shape like: [num_objects, {x1, y1, x2, y2, cls}]
gt_boxes = np.empty((roi_dict['gt_classes'].size, 5), 'float32')
gt_boxes[:, :4], gt_boxes[:, 4] = roi_dict['boxes'], roi_dict['gt_classes']
# Distort => Expand => Sample => Resize
img, boxes = self._apply_transform(example.image, boxes)
if len(gt_boxes) == 0:
# Ignore the non-object image
return img, gt_boxes
# Restore to the blob scale.
boxes[:, :4] *= self._scale
# Distort => Expand => Sample => Resize
img, gt_boxes = self.augment_image(img, gt_boxes)
# Flip.
if self._use_flipped and npr.randint(2) > 0:
img = img[:, ::-1]
boxes = box_util.flip_boxes(boxes, img.shape[1])
# Restore to the blob scale
gt_boxes[:, :4] *= self._scale
# Standard outputs.
outputs = {'image': img, 'boxes': boxes, 'im_info': img.shape[:2]}
# Post-Process for image
if img.dtype == 'uint16':
img = img.astype('float32') / 256.
# Attach precomputed targets.
if len(boxes) > 0:
outputs.update(
self._anchor_sampler(
gt_boxes=boxes,
im_info=outputs['im_info']))
return img, gt_boxes
return outputs
def run(self):
# Fix the process-local random seed
# Disable the opencv threading.
cv2.setNumThreads(1)
# Fix the process-local random seed.
np.random.seed(self._seed)
# Main prefetch loop
while True:
outputs = self.get(self.q_in.get())
if len(outputs[1]) < 1:
continue # Ignore the non-object image
if len(outputs['boxes']) < 1:
continue # Ignore non-object image.
self.q_out.put(outputs)