Commit d0fa332c by Ting PAN

Select pybind11 to expose the C++ API

1 parent 1d03e8e2
Showing with 1363 additions and 1377 deletions
@@ -10,3 +10,6 @@
 [submodule "ThirdParty/cub"]
     path = ThirdParty/cub
     url = https://github.com/NVlabs/cub
+[submodule "ThirdParty/pybind11"]
+    path = ThirdParty/pybind11
+    url = https://github.com/pybind/pybind11
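With the submodule in place, the pybind11 headers become available to the build (see the CMake hunk below). For orientation, here is a minimal sketch of what exposing a C++ function through pybind11 looks like; the module name libdragon and the bound function are illustrative assumptions, not code from this commit:

#include <pybind11/pybind11.h>
#include <string>

namespace py = pybind11;

// Hypothetical C++ function standing in for the real Dragon bindings.
int CreateTensor(const std::string& name) { return 0; }

// PYBIND11_MODULE generates the Python module entry point at import time.
PYBIND11_MODULE(libdragon, m) {
    m.doc() = "Dragon C++ API (sketch)";
    m.def("CreateTensor", &CreateTensor, py::arg("name"),
          "Create a tensor in the default workspace (illustrative).");
}

Compared with a hand-written CPython wrapper, pybind11 derives the argument conversions from the C++ signature itself, which keeps the binding layer close to the C++ API it exposes.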
 ------------------------------------------------------------------------
 The list of most significant changes made over time in Dragon.
-Dragon 0.3.0.0 (20190110)
+Dragon 0.3.0.0 (20190309)
 DRAGON_VERSION == 3000
 Changes (w.r.t. Dragon 0.2.2.13):
@@ -24,6 +24,8 @@ Preview Features:
 - Use ``Eigen`` as the default cpu math library instead of ``OpenBLAS``.
+- Use ``PyBind11`` as the default python module exporter.
 - Integer data types support for common operators,
   see the documentation for more detailed information.
@@ -32,6 +34,8 @@ Preview Features:
   which unifies the naming of static and dynamic computation graphs.
+- The behavior of accumulating gradients has been canceled.
 Bugs fixed:
......
@@ -8,23 +8,22 @@
 Quick Reference
 ---------------
 ===============================  =============================================================================
 List                             Brief
 ===============================  =============================================================================
 `EnableCPU`_                     Enable CPU mode globally.
-`IsCUDADriverSufficient`_        Is the CUDA driver sufficient?
 `EnableCUDA`_                    Enable CUDA mode globally.
 `SetRandomSeed`_                 Set the global random seed.
 `GetRandomSeed`_                 Get the global random seed.
 `SetGPU`_                        Set the global id of GPU.
 `GetGPU`_                        Get the global id of GPU.
-`SetDebugMode`_                  Enable Debug mode globally.
+`SetGraphOptimizationLevel`_     Set the default level of graph optimization.
 `LogMetaGraph`_                  Enable to log the meta graph globally.
 `LogOptimizedGraph`_             Enable to log the optimized graph globally.
 `ExportMetaGraph`_               Enable to export all runnable meta graphs into text files.
 `SetLoggingLevel`_               Set the minimum level of Logging.
 `SetLoggingFile`_                Redirect the logging into the specific file.
 ===============================  =============================================================================

 API Reference
 -------------
@@ -33,13 +32,12 @@ API Reference
     :members:

 .. _EnableCPU: #dragon.config.EnableCPU
-.. _IsCUDADriverSufficient: #dragon.config.IsCUDADriverSufficient
 .. _EnableCUDA: #dragon.config.EnableCUDA
 .. _SetRandomSeed: #dragon.config.SetRandomSeed
 .. _GetRandomSeed: #dragon.config.GetRandomSeed
 .. _SetGPU: #dragon.config.SetGPU
 .. _GetGPU: #dragon.config.GetGPU
-.. _SetDebugMode: #dragon.config.SetDebugMode
+.. _SetGraphOptimizationLevel: #dragon.config.SetGraphOptimizationLevel
 .. _LogMetaGraph: #dragon.config.LogMetaGraph
 .. _LogOptimizedGraph: #dragon.config.LogOptimizedGraph
 .. _ExportMetaGraph: #dragon.config.ExportMetaGraph
......
@@ -27,6 +27,7 @@ C++ Binding Wrapper
    core/workspace
    core/tensor_utils
    core/mpi
+   core/cuda
    core/gradient_maker

 ==============================  =======================================================================
@@ -34,11 +35,13 @@ List Brief
 ==============================  =======================================================================
 `dragon.core.workspace`_        The interfaces of Workspace, mostly wrappers of C++.
 `dragon.core.gradient_maker`_   The generator of GradientOps.
-`dragon.core.tensor_utils`_     The Tensor utilities.
-`dragon.core.mpi`_              The MPI utilities.
+`dragon.core.tensor_utils`_     List some extended Tensor C++ API.
+`dragon.core.mpi`_              List some useful MPI C++ API.
+`dragon.core.cuda`_             List some useful CUDA C++ API.
 ==============================  =======================================================================

 .. _dragon.core.mpi: core/mpi.html
+.. _dragon.core.cuda: core/cuda.html
 .. _dragon.core.scope: core/scope.html
 .. _dragon.core.tensor: core/tensor.html
 .. _dragon.core.tensor_utils: core/tensor_utils.html
......
+===========
+:mod:`CUDA`
+===========
+
+.. toctree::
+   :hidden:
+
+Quick Reference
+---------------
+
+==============================  =============================================================================
+List                            Brief
+==============================  =============================================================================
+`IsCUDADriverSufficient`_       Is the CUDA driver sufficient?
+`GetDevice`_                    Get the current active cuda device.
+`SynchronizeStream`_            Synchronize the specified cuda stream.
+==============================  =============================================================================
+
+.. automodule:: dragon.core.cuda
+    :members:
+
+.. _IsCUDADriverSufficient: #dragon.core.cuda.IsCUDADriverSufficient
+.. _GetDevice: #dragon.core.cuda.GetDevice
+.. _SynchronizeStream: #dragon.core.cuda.SynchronizeStream
\ No newline at end of file
@@ -16,10 +16,9 @@ List Brief
 `FromPyArray`_                  Create a Tensor from an existing Array.
 `SetPyArray`_                   Set a Tensor from an existing Array.
 `ToPyArray`_                    Create an Array from an existing Tensor.
-`ToPyArrayEx`_                  Create a const Array from an existing Tensor.
+`GetStorage`_                   Get the storage of an existing Tensor.
 `ToCPUTensor`_                  Switch the storage of an existing Tensor to cpu memory.
 `ToCUDATensor`_                 Switch the storage of an existing Tensor to cuda memory.
-`GetTensorInfo`_                Get the info of an existing Tensor.
 ==============================  =============================================================================

 API Reference
@@ -33,7 +32,6 @@ API Reference
 .. _FromPyArray: #dragon.core.tensor_utils.FromPyArray
 .. _SetPyArray: #dragon.core.tensor_utils.SetPyArray
 .. _ToPyArray: #dragon.core.tensor_utils.ToPyArray
-.. _ToPyArrayEx: #dragon.core.tensor_utils.ToPyArrayEx
+.. _GetStorage: #dragon.core.tensor_utils.GetStorage
 .. _ToCPUTensor: #dragon.core.tensor_utils.ToCPUTensor
 .. _ToCUDATensor: #dragon.core.tensor_utils.ToCUDATensor
-.. _GetTensorInfo: #dragon.core.tensor_utils.GetTensorInfo
\ No newline at end of file
@@ -14,7 +14,7 @@ List Brief
 `HasTensor`_                    Query whether the tensor is registered in the current workspace.
 `CreateFiller`_                 Create the filler in the backend.
 `GetTensorName`_                Query the name represented in the current workspace.
-`RenameTensor`_                 Rename a tensor in the current workspace.
+`SetTensorAlias`_               Bind an alias to an existing tensor.
 `FeedTensor`_                   Feed the values to the given tensor.
 `FetchTensor`_                  Fetch the values of the given tensor.
 `ResetTensor`_                  Reset the memory of the given tensor.
@@ -27,7 +27,7 @@ Operator
 ==============================  =============================================================================
 List                            Brief
 ==============================  =============================================================================
-`RunOperator`_                  Create and Run the operator in the VM backend.
+`RunOperator`_                  Run the operator in the VM backend.
 ==============================  =============================================================================
@@ -39,7 +39,6 @@ List Brief
 ==============================  =============================================================================
 `CreateGraph`_                  Create the graph in the backend.
 `RunGraph`_                     Run the specific graph.
-`RunGraphEx`_                   Run the graph from the meta definition.
 ==============================  =============================================================================

 Misc
@@ -73,14 +72,13 @@ API Reference
 .. _CreateGraph: #dragon.core.workspace.CreateGraph
 .. _HasTensor: #dragon.core.workspace.HasTensor
 .. _GetTensorName: #dragon.core.workspace.GetTensorName
-.. _RenameTensor: #dragon.core.workspace.RenameTensor
+.. _SetTensorAlias: #dragon.core.workspace.SetTensorAlias
 .. _CreateFiller: #dragon.core.workspace.CreateFiller
 .. _FetchTensor: #dragon.core.workspace.FetchTensor
 .. _FeedTensor: #dragon.core.workspace.FeedTensor
 .. _ResetTensor: #dragon.core.workspace.ResetTensor
 .. _RunOperator: #dragon.core.workspace.RunOperator
 .. _RunGraph: #dragon.core.workspace.RunGraph
-.. _RunGraphEx: #dragon.core.workspace.RunGraphEx
 .. _Snapshot: #dragon.core.workspace.Snapshot
 .. _Restore: #dragon.core.workspace.Restore
 .. _LogMetaGraph: #dragon.core.workspace.LogMetaGraph
......
@@ -42,7 +42,6 @@ List Brief
 `NNResize`_          Resize the image with the *Nearest-Neighbor* method.
 `BilinearResize`_    Resize the image with the *Bi-Linear* method.
 `BiasAdd`_           Add the bias across channels to a *NCHW* or *NHWC* input.
-`DenseConcat`_       Memory-efficient concatenation for DenseNet. `[Huang et.al, 2017] <http://arxiv.org/abs/1608.06993>`_.
 `DropBlock2d`_       Randomly drop the outputs according to the spatial blocks. `[Ghiasi et.al, 2018] <https://arxiv.org/abs/1810.12890>`_.
 ===================  ======================================================================
@@ -113,7 +112,9 @@ List Brief
 `Eltwise`_           Element-wise Sum or Product of an arbitrary number of inputs.
 `Affine`_            Calculate *Y = Ax + b* along the given range of axes.
 `GramMatrix`_        Calculate the gram matrix. `[Gatys et.al, 2016] <https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Gatys_Image_Style_Transfer_CVPR_2016_paper.pdf>`_.
-`Moments`_           Compute the mean and variance of inputs along the given axes.
+`Moments`_           Calculate the mean and variance of inputs along the given axes.
+`Accumulate`_        Calculate *y = alpha * x + beta * y*.
+`MovingAverage`_     Calculate *y = (1 - decay) * x + decay * y*.
 ==================   ======================================================================

 Normalization
@@ -174,12 +175,11 @@ Misc
 =================    ======================================================================
 List                 Brief
 =================    ======================================================================
-`AsType`_            Cast the data type of inputs to a specific one.
+`Cast`_              Cast the data type of inputs to a specific one.
 `Run`_               Run a custom operator. (Without GradientFlow)
 `Template`_          Run a custom operator. (With GradientFlow)
 `Accuracy`_          Calculate the Top-K accuracy.
 `StopGradient`_      Return the identity of input with truncated gradient flow.
-`MovingAverage`_     Calculate the moving average.
 =================    ======================================================================

 Contrib
@@ -268,6 +268,8 @@ List Brief
 .. _Affine: operators/arithmetic.html#dragon.operators.arithmetic.Affine
 .. _GramMatrix: operators/arithmetic.html#dragon.operators.arithmetic.GramMatrix
 .. _Moments: operators/arithmetic.html#dragon.operators.arithmetic.Moments
+.. _Accumulate: operators/arithmetic.html#dragon.operators.arithmetic.Accumulate
+.. _MovingAverage: operators/arithmetic.html#dragon.operators.arithmetic.MovingAverage
 .. _BatchNorm: operators/norm.html#dragon.operators.norm.BatchNorm
 .. _GroupNorm: operators/norm.html#dragon.operators.norm.GroupNorm
@@ -304,12 +306,11 @@ List Brief
 .. _Less: operators/control_flow.html#dragon.operators.control_flow.Less
 .. _Greater: operators/control_flow.html#dragon.operators.control_flow.Greater
-.. _AsType: operators/misc.html#dragon.operators.misc.AsType
+.. _Cast: operators/misc.html#dragon.operators.misc.Cast
 .. _Run: operators/misc.html#dragon.operators.misc.Run
 .. _Template: operators/misc.html#dragon.operators.misc.Template
 .. _Accuracy: operators/misc.html#dragon.operators.misc.Accuracy
 .. _StopGradient: operators/misc.html#dragon.operators.misc.StopGradient
-.. _MovingAverage: operators/misc.html#dragon.operators.misc.MovingAverage
 .. _Proposal: operators/contrib/rcnn.html#dragon.operators.contrib.rcnn.ops.Proposal
......
@@ -19,16 +19,12 @@ ToolBox
    :hidden:

    tools/db
-   tools/im2db
-   tools/summary_writer
    tools/tensorboard

 ====================  ====================================================================================
 List                  Brief
 ====================  ====================================================================================
 `LMDB`_               A wrapper of LMDB package.
-`IM2DB`_              Make the sequential database for images.
-`SummaryWriter`_      Write summaries for DragonBoard.
 `TensorBoard`_        Write summaries for TensorBoard.
 ====================  ====================================================================================
@@ -38,8 +34,5 @@ List Brief
     <p style="text-indent:1.5em; font-size: 18px; max-width: 830px;">

 .. _pip: https://pypi.python.org/pypi/pip
 .. _LMDB: tools/db.html
-.. _IM2DB: tools/im2db.html
-.. _SummaryWriter: tools/summary_writer.html
 .. _TensorBoard: tools/tensorboard.html
-====================
-:mod:`SummaryWriter`
-====================
-
-.. toctree::
-   :hidden:
-
-Quick Reference
----------------
-
-====================  =============================================================================
-List                  Brief
-====================  =============================================================================
-`ScalarSummary`_      Write scalar summary.
-====================  =============================================================================
-
-API Reference
--------------
-
-.. currentmodule:: dragon.tools.summary_writer
-.. autoclass:: ScalarSummary
-    :members:
-
-    .. automethod:: __init__
-
-.. _ScalarSummary: #dragon.tools.summary_writer.ScalarSummary
\ No newline at end of file
@@ -2,40 +2,30 @@
 :mod:`dragon.utils`
 ===================

-Wrapper
--------
+Vision
+------

 .. toctree::
    :hidden:

+   utils/vision/database
    utils/vision/data_batch
-
-===================================  =====================================================================
-List                                 Brief
-===================================  =====================================================================
-`dragon.utils.vision.data_batch`_    Efficient Batch data provider based on `LMDB`_.
-===================================  =====================================================================
-
-Component
----------
-
-.. toctree::
-   :hidden:
-
    utils/vision/data_reader
    utils/vision/data_transformer
    utils/vision/blob_fetcher

 =========================================  =====================================================================
 List                                       Brief
 =========================================  =====================================================================
+`dragon.utils.vision.im2db`_               Make the sequential database for images.
+`dragon.utils.vision.data_batch`_          Efficient Batch data provider based on `LMDB`_.
 `dragon.utils.vision.data_reader`_         Queue encoded string from `LMDB`_.
 `dragon.utils.vision.data_transformer`_    Queue transformed images from `DataReader`_.
 `dragon.utils.vision.blob_fetcher`_        Queue blobs from `DataTransformer`_.
 =========================================  =====================================================================

 .. _LMDB: http://lmdb.readthedocs.io/en/release
+.. _dragon.utils.vision.im2db: utils/vision/database.html
 .. _DataReader: utils/vision/data_reader.html#dragon.utils.vision.data_reader
 .. _DataTransformer: utils/vision/data_transformer.html#dragon.utils.vision.data_transformer
 .. _dragon.utils.vision.data_batch: utils/vision/data_batch.html
......
-============
-:mod:`IM2DB`
-============
+===============
+:mod:`Database`
+===============

 .. toctree::
    :hidden:
@@ -19,8 +19,8 @@ List Brief
 API Reference
 -------------

-.. automodule:: dragon.tools.im2db
+.. automodule:: dragon.utils.vision.im2db
     :members:

-.. _resize_image: #dragon.tools.im2db.resize_image
-.. _make_db: #dragon.tools.im2db.make_db
+.. _resize_image: #dragon.utils.vision.im2db.resize_image
+.. _make_db: #dragon.utils.vision.im2db.make_db
\ No newline at end of file
@@ -20,20 +20,23 @@ VirtualBox
    vm/caffe
    vm/theano
+   vm/torch

 ====================  ====================================================================================
 List                  Brief
 ====================  ====================================================================================
 `Theano`_             **Theano** is an inception of the modern deep learning frameworks.
 `Caffe`_              **Caffe** is one of the most famous deep learning frameworks for Computer Vision.
+`PyTorch`_            **PyTorch** provides straightforward operations for research prototyping.
 ====================  ====================================================================================

 .. |para| raw:: html

     <p style="text-indent:1.5em; font-size: 18px; max-width: 830px;">

 .. _TinyDragon: ../index.html#tinydragon
 .. _Theano: vm/theano.html
 .. _Caffe: vm/caffe.html
+.. _PyTorch: vm/torch.html
 .. _TensorFlow: ../index.html#tensorflow
@@ -66,7 +66,6 @@ List Brief
 `AddLayer`_           The extended implementation of ``EltwiseLayer``.
 `ConcatLayer`_        The implementation of ``ConcatLayer``.
 `SliceLayer`_         The implementation of ``SliceLayer``.
-`DenseConcatLayer`_   The implementation for `DenseNet`_.
 `CropLayer`_          The implementation of ``CropLayer``.
 `ReshapeLayer`_       The implementation of ``ReshapeLayer``.
 `PermuteLayer`_       The implementation of ``PermuteLayer``.
@@ -180,7 +179,6 @@ API Reference
 .. _AddLayer: #dragon.vm.caffe.layers.common.AddLayer
 .. _ConcatLayer: #dragon.vm.caffe.layers.common.ConcatLayer
 .. _SliceLayer: #dragon.vm.caffe.layers.common.SliceLayer
-.. _DenseConcatLayer: #dragon.vm.caffe.layers.common.DenseConcatLayer
 .. _CropLayer: #dragon.vm.caffe.layers.common.CropLayer
 .. _ReshapeLayer: #dragon.vm.caffe.layers.common.ReshapeLayer
 .. _PermuteLayer: #dragon.vm.caffe.layers.common.PermuteLayer
@@ -210,12 +208,10 @@ API Reference
 .. _MPIBroadcastLayer: #dragon.vm.caffe.layers.mpi.MPIBroadcastLayer
 .. _MPIGatherLayer: #dragon.vm.caffe.layers.mpi.MPIGatherLayer

 .. _Layer.Setup: #dragon.vm.caffe.layer.Layer.Setup
 .. _Layer.Fill: #dragon.vm.caffe.layer.Layer.Fill
 .. _LMDB: http://lmdb.readthedocs.io/en/release
-.. _DenseNet: http://arxiv.org/abs/1608.06993
 .. _LayerSetUp(layer.hpp, L91): https://github.com/BVLC/caffe/blob/effcdb0b62410b2a6a54f18f23cf90733a115673/include/caffe/layer.hpp#L91
 .. _DataParameter.source: https://github.com/BVLC/caffe/blob/effcdb0b62410b2a6a54f18f23cf90733a115673/src/caffe/proto/caffe.proto#L647
 .. _DataParameter.prefetch: https://github.com/BVLC/caffe/blob/effcdb0b62410b2a6a54f18f23cf90733a115673/src/caffe/proto/caffe.proto#L672
......
+============
+:mod:`Torch`
+============
+
+Abstraction
+-----------
+
+|para| `PyTorch`_ provides straightforward operations for research prototyping.
+
+|para| We are aware that **Dragon** is a graph-based framework with strict naming
+for tensors, operators, and workspaces, while `Torch`_ is not.
+A simple way to bridge their differences is **JIT**, which traces the anonymous expressions
+and dispatches a series of executions to the backend. In that case, **AutoGrad** is just
+a trick (remember the *Chain Rule*).
+
+|para| Rewriting the GC (*Garbage Collection*) is crucial in this role,
+as costly deconstruction of memories and operators must be avoided.
+We can either persist an Operator (i.e., a **Module**),
+or reuse several memories in turn (i.e., a **MemoryPool**), by naming them formally.
+
+|para| We are still working hard to cover the original PyTorch operators;
+however, a bunch of extended operators from many other frameworks can be used.
+Our **PyTorch** will be unique and more powerful than the official one.
+
+Related Work
+------------
+
+|paratitle| **Proto-based Intermediate Representation**
+
+|para| In recent years, several powerful frameworks have chosen ProtocolBuffer to
+describe operators with various arguments, including `Caffe`_, `Caffe2`_, `TensorFlow`_, and `ONNX`_.
+The most important reason is that these descriptors can be easily serialized and sent to the backend.
+With the help of the **Factory Pattern**, we have an elegant way to dispatch the executions
+instead of calling them imperatively. This approach is also known as **Declarative Programming**.
+
+|para| Attaching the IR (Intermediate Representation) brings the following advantages:
+
+* Traceable pipelines, much helpful for visualizing and debugging.
+* Deterministic executions, so detailed optimization can be applied.
+* Efficient deployments, as the data flows are well organized.
+
+|para| The good news is that we can reduce the overhead of the IR to below 5% of computation time,
+which means the dynamic graph can work as fast as the static graph while retaining the flexibility.
+
+|paratitle| **Caffe2**
+
+|para| We have noticed that some developers discouraged **Declarative Programming** in 2017 and
+early 2018, due to the counter-intuitive building of computation graphs. Actually, `Caffe2`_ has
+provided operator-wise execution (a.k.a. *workspace.RunOperator()*) since 2016. In other words,
+**Imperative Programming** is a subset of **Declarative Programming**, if we process the
+declaration implicitly. This mechanism is sometimes called **JIT**.
+
+Architectures
+-------------
+
+.. toctree::
+   :hidden:
+
+.. _Torch: http://torch.ch
+.. _PyTorch: https://pytorch.org
+.. _Caffe: http://caffe.berkeleyvision.org
+.. _Caffe2: http://caffe2.ai
+.. _TensorFlow: https://www.tensorflow.org
+.. _ONNX: https://onnx.ai
+
+.. |nbsp| raw:: html
+
+    &nbsp
+
+.. |br| raw:: html
+
+    <br />
+
+.. |paratitle| raw:: html
+
+    <p style="font-size: 20px">
+
+.. |sectitle| raw:: html
+
+    <p style="text-indent:1em; font-size: 18px">
+
+.. |para| raw:: html
+
+    <p style="text-indent:1.5em; font-size: 18px; max-width: 830px;">
+
+.. |context| raw:: html
+
+    <p style="font-size: 18px; max-width: 830px;">
@@ -97,6 +97,7 @@ include_directories(${PROJECT_SOURCE_DIR}/src)
 if (BUILD_PYTHON_API)
     include_directories(${PYTHON_INCLUDE_DIRS})
     include_directories(${NUMPY_INCLUDE_DIR})
+    include_directories(${THIRD_PARTY_DIR}/pybind11/include)
 endif()
 if (WITH_CUDA)
     include_directories(${CUDA_INCLUDE_DIRS})
......
@@ -38,7 +38,7 @@ class CPUContext {
     void SwitchToDevice() {}

     /*! \brief Switch to the device with the given stream */
-    void SwitchToDevice(int stream_id) {}
+    void SwitchToDevice(const int stream_id) {}

     /*! \brief Synchronize the dispatched operations */
     void FinishDeviceCompution() {}
@@ -106,6 +106,9 @@ class CPUContext {
     /*! \brief Return the device id */
     int device_id() const { return 0; }

+    /*! \brief Return the stream id */
+    int stream_id() const { return 0; }
+
     /*! \brief Set the stream id */
     void set_stream_id(int stream_id) {}
......
@@ -32,6 +32,7 @@ class CNRTObject;
 class CNMLContext {
  public:
+    /*! \brief Default Constructor */
     CNMLContext(const DeviceOption& option)
         : device_id_(option.device_id()),
           random_seed_(option.has_random_seed() ?
@@ -39,34 +40,43 @@ class CNMLContext {
         CHECK_EQ(option.device_type(), PROTO_CNML);
     }

+    /*! \brief Constructor with the specified device id */
     CNMLContext(const int device_id = 0)
         : device_id_(device_id),
           random_seed_(DEFAULT_RNG_SEED) {}

+    /*! \brief Switch to the device with the given stream */
     void SwitchToDevice(int stream_id);

-    inline void SwitchToDevice() { SwitchToDevice(1); }
+    /*! \brief Switch to the device of this context */
+    inline void SwitchToDevice() { SwitchToDevice(0); }

+    /*! \brief Synchronize the dispatched operations */
     void FinishDeviceCompution();

+    /*! \brief Malloc the memory */
     static void* New(size_t nbytes);

+    /*! \brief Zero-Reset the memory */
     static void Memset(
         size_t nbytes,
         void* ptr);

+    /*! \brief Zero-Reset the memory asynchronously */
     inline void MemsetAsync(
         size_t nbytes,
         void* ptr) {
         Memset(nbytes, ptr);
     }

+    /*! \brief Copy the memory */
     template<class DstContext, class SrcContext>
     static void Memcpy(
         size_t nbytes,
         void* dst,
         const void* src);

+    /*! \brief Copy the memory with given type asynchronously */
     template<class DstContext, class SrcContext>
     inline void MemcpyAsync(
         size_t nbytes,
@@ -75,23 +85,33 @@ class CNMLContext {
         Memcpy<DstContext, SrcContext>(dst, src, nbytes);
     }

+    /*! \brief Free the memory */
     static void Delete(void* data);

-    inline int device_id() const { return device_id_; }
+    /*! \brief Return the device id */
+    int device_id() const { return device_id_; }

-    inline void set_stream_id(int stream_id) { stream_id_ = stream_id; }
+    /*! \brief Return the stream id */
+    int stream_id() const { return stream_id_; }
+
+    /*! \brief Set the stream id */
+    void set_stream_id(int stream_id) { stream_id_ = stream_id; }

-    inline cnrtStream_t cnrt_stream() {
+    /*! \brief Return the internal cnrt stream */
+    cnrtStream_t cnrt_stream() {
         return cnrt_stream(device_id_, stream_id_);
     }

+    /*! \brief Return the specified cnrt stream */
     static cnrtStream_t cnrt_stream(
         int device_id,
         int stream_id);

+    /*! \brief Return the global context locker */
     static std::mutex& mutex() { static std::mutex m; return m; }

-    static CNRTObject* cuda_object();
+    /*! \brief Return the thread local cnrt object */
+    static CNRTObject* cnrt_object();

  private:
     int device_id_, stream_id_ = 1, random_seed_;
......
@@ -80,11 +80,16 @@ class CUDAObject {
         } return dev_streams[stream_id];
     }

-    /*! \brief Return the default cuda stream */
+    /*! \brief Return the default cuda stream of current device */
     cudaStream_t GetDefaultStream() {
         return GetStream(CUDA_GET_DEVICE(), 0);
     }

+    /*! \brief Return the default cuda stream of given device */
+    cudaStream_t GetDefaultStream(int device_id) {
+        return GetStream(device_id, 0);
+    }
+
     /*! \brief Return the specified cublas handle */
     cublasHandle_t GetCuBLASHandle(int device_id, int stream_id) {
         vector<cublasHandle_t>& dev_handles = cublas_handles[device_id];
@@ -141,13 +146,13 @@ class CUDAContext {
           random_seed_(DEFAULT_RNG_SEED) {}

     /*! \brief Switch to the device with the given stream */
-    void SwitchToDevice(int stream_id) {
+    void SwitchToDevice(const int stream_id) {
         CUDA_CHECK(cudaSetDevice(device_id_));
         stream_id_ = stream_id;
     }

     /*! \brief Switch to the device of this context */
-    void SwitchToDevice() { SwitchToDevice(1); }
+    void SwitchToDevice() { SwitchToDevice(0); }

     /*! \brief Synchronize the dispatched operations */
     void FinishDeviceCompution() {
@@ -191,8 +196,19 @@ class CUDAContext {
         size_t nbytes,
         void* dst,
         const void* src) {
+        MemcpyEx<DstContext, SrcContext>(
+            nbytes, dst, src, active_device_id());
+    }
+
+    /*! \brief Copy the memory [Extended] */
+    template<class DstContext, class SrcContext>
+    static void MemcpyEx(
+        size_t nbytes,
+        void* dst,
+        const void* src,
+        int device_id) {
         cudaStream_t stream = CUDAContext::
-            cuda_object()->GetDefaultStream();
+            cuda_object()->GetDefaultStream(device_id);
         CUDA_CHECK(cudaMemcpyAsync(dst, src, nbytes,
             cudaMemcpyDefault, stream));
         cudaError_t error = SynchronizeStream(stream);
@@ -230,9 +246,15 @@ class CUDAContext {
         return cudaGetLastError();
     }

-    /*! \brief Return the device id */
+    /*! \brief Return the device id of this context */
     int device_id() const { return device_id_; }

+    /*! \brief Return the active device id of current thread */
+    static int active_device_id() { return CUDA_GET_DEVICE(); }
+
+    /*! \brief Return the stream id */
+    int stream_id() const { return stream_id_; }
+
     /*! \brief Set the stream id */
     void set_stream_id(int stream_id) { stream_id_ = stream_id; }
@@ -292,85 +314,48 @@ class CUDAContext {
     }

  private:
-    int device_id_, stream_id_ = 1, random_seed_;
+    int device_id_, stream_id_ = 0, random_seed_;
     unique_ptr<std::mt19937> rand_generator_;
     curandGenerator_t curand_generator_ = nullptr;
 };

-template <class Context>
-class CUDAClosure {
- public:
-    /*! \brief Default Constructor */
-    CUDAClosure() {}
-
-    /*! \brief Constructor with the given context */
-    explicit CUDAClosure(Context* ctx): ctx_(ctx) {}
-
-    /*! \brief Synchronize the dispatched operations */
-    void Sync() {
-        for (auto stream_id : active_streams_) {
-            cudaStreamSynchronize(cuda_object_
-                .GetStream(ctx_->device_id(), stream_id));
-            cudaError_t error = cudaGetLastError();
-            CHECK_EQ(error, cudaSuccess)
-                << "\nCUDA Error: " << cudaGetErrorString(error);
-        }
-        active_streams_.clear();
-    }
-
-    /*! \brief Return the specified cuda stream */
-    cudaStream_t cuda_stream(int stream_id) {
-        active_streams_.push_back(stream_id);
-        return cuda_object_.GetStream(
-            ctx_->device_id(), stream_id);
-    }
-
-    /*! \brief Return the specified cublas handle */
-    cublasHandle_t cublas_handle(int stream_id) {
-        active_streams_.push_back(stream_id);
-        return cuda_object_.GetCuBLASHandle(
-            ctx_->device_id(), stream_id);
-    }
-
-    /*! \brief Return the specified cudnn handle */
-#ifdef WITH_CUDNN
-    cudnnHandle_t cudnn_handle(int stream_id) {
-        active_streams_.push_back(stream_id);
-        return cuda_object_.GetCuDNNHandle(
-            ctx_->device_id(), stream_id);
-    }
-#endif
-
- protected:
-    Context* ctx_;
-    CUDAObject cuda_object_;
-    vector<int> active_streams_;
-};
-
 #else  // WITH_CUDA

 class CUDAContext {
  public:
+    /*! \brief Default Constructor */
     CUDAContext(const DeviceOption& option) { CUDA_NOT_COMPILED; }
+
+    /*! \brief Constructor with the specified device id */
     CUDAContext(const int device_id = 0) { CUDA_NOT_COMPILED; }

-    void SwitchToDevice() { CUDA_NOT_COMPILED; }
+    /*! \brief Switch to the device with the given stream */
     void SwitchToDevice(int stream_id) { CUDA_NOT_COMPILED; }

+    /*! \brief Switch to the device of this context */
+    void SwitchToDevice() { CUDA_NOT_COMPILED; }
+
+    /*! \brief Synchronize the dispatched operations */
     void FinishDeviceCompution() { CUDA_NOT_COMPILED; }

+    /*! \brief Malloc the memory */
+    static void* New(size_t nbytes) { CUDA_NOT_COMPILED; }
+
+    /*! \brief Zero-Reset the memory */
     static void Memset(
         size_t nbytes,
         void* ptr) {
         CUDA_NOT_COMPILED;
     }

+    /*! \brief Zero-Reset the memory asynchronously */
     void MemsetAsync(
         size_t nbytes,
         void* ptr) {
         CUDA_NOT_COMPILED;
     }

+    /*! \brief Copy the memory */
     template<class DstContext, class SrcContext>
     static void Memcpy(
         size_t nbytes,
@@ -379,6 +364,17 @@ class CUDAContext {
         CUDA_NOT_COMPILED;
     }

+    /*! \brief Copy the memory [Extended] */
+    template<class DstContext, class SrcContext>
+    static void MemcpyEx(
+        size_t nbytes,
+        void* dst,
+        const void* src,
+        int device_id) {
+        CUDA_NOT_COMPILED;
+    }
+
+    /*! \brief Copy the memory asynchronously */
     template<class DstContext, class SrcContext>
     void MemcpyAsync(
         size_t nbytes,
@@ -387,7 +383,16 @@ class CUDAContext {
         CUDA_NOT_COMPILED;
     }

+    /*! \brief Return the device id */
     int device_id() const { return 0; }

+    /*! \brief Return the active device id of current thread */
+    static int active_device_id() { return 0; }
+
+    /*! \brief Return the stream id */
+    int stream_id() const { return 0; }
+
+    /*! \brief Set the stream id */
     void set_stream_id(int stream_id) {}
 };
......
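A short usage sketch of the extended copy introduced above; the buffers and device id are illustrative, and only the MemcpyEx signature comes from this header:

// Copy a host buffer to a buffer that lives on GPU 1 while the calling
// thread may be active on another device: the extra device_id argument
// selects GPU 1's default stream, which the old Memcpy could not do.
size_t nbytes = 1024 * sizeof(float);
void* dev_ptr = nullptr;         // assumed: device buffer on GPU 1
const void* host_ptr = nullptr;  // assumed: host buffer
dragon::CUDAContext::MemcpyEx<dragon::CUDAContext, dragon::CPUContext>(
    nbytes, dev_ptr, host_ptr, /* device_id = */ 1);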
@@ -20,80 +20,69 @@ namespace dragon {
 class GraphBase {
  public:
-    struct Node {
-        vector<string> parents;
-        vector<string> childs;
-        int op_idx = -1;
-        OperatorDef op_def;
-    };
-
+    /*! \brief Default constructor */
     GraphBase(
         const GraphDef& meta_graph,
         Workspace* ws);

+    /*! \brief Default deconstructor */
     virtual ~GraphBase() {}

-    GraphDef BuildUpdateOps(const GraphDef& input_def);
-
+    /*! \brief Create a graph from the optimized def */
     virtual bool Create(
         const GraphDef& optimized_graph,
         Workspace* ws) = 0;

+    /*! \brief Run the graph once synchronously */
     virtual bool Run(
         const string& include,
         const string& exclude,
-        const int stream_id = 1) = 0;
+        int stream_id = 0) = 0;

+    /*! \brief Return the name of this graph */
     string name() const { return name_; }

  protected:
+    /*! \brief Store the name and running phase */
     string name_, phase_;
+
+    /*! \brief Store the defined arguments */
     Map<string, Argument> args_;
+
+    /*! \brief Store the parent workspace */
     Workspace* ws_;
 };

 class Graph : public GraphBase {
  public:
+    /*! \brief Default constructor */
     Graph(const GraphDef& meta_graph, Workspace* ws);

+    /*! \brief Default deconstructor */
     virtual ~Graph() { for (auto* op : ops_) delete op; }

+    /*! \brief Create a graph from the optimized def */
     bool Create(
         const GraphDef& optimized_graph,
         Workspace* ws) override;

+    /*! \brief Run the graph once synchronously */
     bool Run(
         const string& include,
         const string& exclude,
-        const int stream_id = 1) override;
+        int stream_id = 0) override;

-    GraphDef Prune(const GraphDef& meta_graph);
-    GraphDef Share(const GraphDef& optimized_graph);
-    void ShareGrads(GraphDef& optimized_graph);
-    GraphDef BuildUpdateOps(const GraphDef& meta_graph);
-    void RecomputingAware(
-        const GraphDef& optimized_graph,
-        Workspace* ws);
-
+    /*! \brief Return the parent workspace */
     Workspace* ws() const { return ws_; }

  protected:
-    void ForwardShareDyeing(
-        const string& u,
-        const string& ancestor);
-    void ForwardPruneDyeing(
-        const string& u,
-        const string& leaf,
-        const vector<string>& path);
-    void BackwardPruneDyeing(string v);
-
+    /*! \brief Store the internal operators */
     vector<OperatorBase*> ops_;
-    Map<string, Node> dag_;
-    Map<string, bool> visited_, colored_;
-    Map<string, string> renamed_;
-    Set<string> targets_;
 };

+/*! \brief Create a graph from the raw def */
 GraphBase* NewGraph(
     const GraphDef& meta_graph,
     Workspace* ws);
......
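For orientation, the trimmed-down interface above is driven roughly like this; a sketch only, assuming a GraphDef and a Workspace built elsewhere:

// meta_graph is a dragon::GraphDef, ws a dragon::Workspace*.
dragon::GraphBase* graph = dragon::NewGraph(meta_graph, ws);
// Run all operators (empty include/exclude rules) on stream 0,
// the new default stream id (previously 1).
graph->Run("", "");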
@@ -19,14 +19,19 @@ namespace dragon {
 class GraphGradientMaker {
  public:
-    GraphGradientMaker(): cur_op_idx_(0) {}
+    GraphGradientMaker()
+        : cur_op_idx_(0) {}

     void Make(
-        const GraphDef& forward_def,
+        const vector<OperatorDef*>& forward_def,
         const vector<string>& targets,
         GraphDef& new_def);

-    void Share(const string& grads_prefix, GraphDef& graph);
+    void Make(
+        const GraphDef& forward_def,
+        GraphDef& backward_def);
+
+    void Share(GraphDef& graph);

     void SetTerms(const Map<string, string>& terms) { terms_ = terms; }
     void SetOperatorPrefix(const string& prefix) { op_prefix_ = prefix; }
......
+/*!
+ * Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
+ *
+ * Licensed under the BSD 2-Clause License.
+ * You should have received a copy of the BSD 2-Clause License
+ * along with the software. If not, See,
+ *
+ *    <https://opensource.org/licenses/BSD-2-Clause>
+ *
+ * ------------------------------------------------------------
+ */
+
+#ifndef DRAGON_CORE_GRAPH_OPTIMIZER_H_
+#define DRAGON_CORE_GRAPH_OPTIMIZER_H_
+
+#include "core/common.h"
+
+namespace dragon {
+
+class Workspace;
+
+class GraphOptimizer {
+ public:
+    /*! \brief The simple node structure */
+    struct Node {
+        vector<string> parents;
+        vector<string> childs;
+        int op_idx = -1;
+        OperatorDef op_def;
+    };
+
+    /*! \brief Default constructor */
+    GraphOptimizer(Workspace* ws) : ws_(ws) {}
+
+    /*! \brief Prune the redundant nodes (-O1) */
+    GraphDef PruneNodes(const GraphDef& input_def);
+
+    /*! \brief Add the inplace for outputs (-O2) */
+    GraphDef AddInplace(const GraphDef& input_def);
+
+    /*! \brief Plan the recomputing for inputs (-O3) */
+    GraphDef MirrorStage(
+        const GraphDef& input_def,
+        Map< string, vector<int> >& op_indices);
+
+    /*! \brief Allocate the buffer for outputs (-O3) */
+    GraphDef SimulateGC(const GraphDef& input_def);
+
+ protected:
+    /*! \brief Traverse from input gradients to dye the nodes */
+    void ForwardPruneTraversal(
+        const string& u,
+        const string& leaf,
+        const vector<string>& path);
+
+    /*! \brief Traverse from targets to dye the nodes */
+    void BackwardPruneTraversal(const string& v);
+
+    /*! \brief Traverse from inputs to find the available inplace chain */
+    void InplaceTraversal(
+        const string& u,
+        const string& ancestor);
+
+    /*! \brief Store the workspace of parent graph */
+    Workspace* ws_;
+
+    /*! \brief Store the DAG */
+    Map<string, Node> dag_;
+
+    /*! \brief Store the traversal flags */
+    Map<string, bool> visited_, colored_;
+
+    /*! \brief Store the inplace relations */
+    Map<string, string> renamed_;
+};
+
+}  // namespace dragon
+
+#endif  // DRAGON_CORE_GRAPH_OPTIMIZER_H_
\ No newline at end of file
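The comments above map each pass to an optimization level (compare SetGraphOptimizationLevel in the config docs). A plausible driver chaining them by level could look like the following; the Optimize helper is hypothetical, only the pass names come from this header, and MirrorStage is omitted since it also returns the recomputing plan:

#include "core/graph_optimizer.h"

// Hypothetical helper: apply the passes declared above up to a given level.
dragon::GraphDef Optimize(
    const dragon::GraphDef& def, dragon::Workspace* ws, int level) {
    dragon::GraphOptimizer optimizer(ws);
    dragon::GraphDef g = def;
    if (level >= 1) g = optimizer.PruneNodes(g);  // -O1: drop redundant nodes
    if (level >= 2) g = optimizer.AddInplace(g);  // -O2: in-place outputs
    if (level >= 3) g = optimizer.SimulateGC(g);  // -O3: buffer simulation
    return g;
}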
@@ -35,8 +35,6 @@ class MixedMemory {
         STATE_AT_CUDA,
         /*! \brief Memory could be modified by CNMLContext last time */
         STATE_AT_CNML,
-        /*! \brief Memory should be copied to another device next time */
-        SWITCHED,
         /*! \brief Host and Device now hold the same contents */
         SYNCED,
     } State;
@@ -46,7 +44,7 @@ class MixedMemory {
           cuda_ptr_(nullptr), cnml_ptr_(nullptr) {}

     /*! \brief Constructor with the known meta and size */
-    MixedMemory(const TypeMeta& meta, const size_t nbytes)
+    MixedMemory(const TypeMeta& meta, size_t nbytes)
         : meta_(meta), nbytes_(nbytes), cpu_ptr_(nullptr),
           cuda_ptr_(nullptr), cnml_ptr_(nullptr) {}
@@ -54,19 +52,19 @@ class MixedMemory {
     ~MixedMemory();

     /*! \brief Return the const data pointer on CPUContext */
-    const void* cpu_data();
+    const void* cpu_data(size_t nbytes = 0);

     /*! \brief Return the const data pointer on CUDAContext */
-    const void* cuda_data();
+    const void* cuda_data(size_t nbytes = 0);

     /*! \brief Return the const data pointer on CNMLContext */
     const void* cnml_data();

     /*! \brief Return the mutable data pointer on CPUContext */
-    void* mutable_cpu_data();
+    void* mutable_cpu_data(size_t nbytes = 0);

     /*! \brief Return the mutable data pointer on CUDAContext */
-    void* mutable_cuda_data();
+    void* mutable_cuda_data(size_t nbytes = 0);

     /*! \brief Return the mutable data pointer on CNMLContext */
     void* mutable_cnml_data();
@@ -85,11 +83,11 @@ class MixedMemory {
     /*! \brief Set the cpu data pointer from external context */
     void set_cpu_data(void* cpu_ptr, size_t nbytes);

-    /*! \brief Switch to the device set by Context before */
-    void SwitchToDevice();
-
     /*! \brief Switch to the specified device */
+    void SwitchToDevice(int device_id);
+
+    /*! \brief Switch to the specified cuda device */
     void SwitchToCUDADevice(int device_id);

     /*! \brief Return the total bytes of this memory */
@@ -110,14 +108,17 @@ class MixedMemory {
     /*! \brief Set the storage order */
     void set_order(StorageOrder order) { order_ = order; }

+    /*! \brief Return the device id of the memory on device */
+    int device_id() const { return ptr_device_; }
+
     /*! \brief Return a string to describe the internal structure */
     const Map<string, string> info() const;

     /*! \brief Control the state machine to CPUContext */
-    void ToCPU();
+    void ToCPU(size_t nbytes = 0);

     /*! \brief Control the state machine to CUDAContext */
-    void ToCUDA();
+    void ToCUDA(size_t nbytes = 0);

  private:
     /*! \brief The type meta to call the deconstructor */
@@ -137,7 +138,7 @@ class MixedMemory {
     /*! \brief Whether this memory owns the cpu data pointer */
     int own_cpu_ptr_ = 1;

     /*! \brief Store the device id for some data pointers */
     int ptr_device_ = 0;
......
...@@ -30,10 +30,10 @@ class Workspace; ...@@ -30,10 +30,10 @@ class Workspace;
class OperatorBase { class OperatorBase {
public: public:
/*! Default constructor */ /*! \brief Default constructor */
OperatorBase(const OperatorDef& def, Workspace* ws); OperatorBase(const OperatorDef& def, Workspace* ws);
/*! Default deconstructor */ /*! \brief Default deconstructor */
virtual ~OperatorBase() {} virtual ~OperatorBase() {}
/*! \brief Return the specified input tensor */ /*! \brief Return the specified input tensor */
...@@ -49,19 +49,13 @@ class OperatorBase { ...@@ -49,19 +49,13 @@ class OperatorBase {
int OutputSize() { return (int)outputs_.size(); } int OutputSize() { return (int)outputs_.size(); }
/*! \brief Modify this operator according to the given def */ /*! \brief Modify this operator according to the given def */
void MutableOp(const OperatorDef& def); void UpdateFrom(const OperatorDef& def);
/*! \brief Modify this operator according to the given properties */
void MutableOp(
const vector<string>& inputs,
const vector<string>& outputs,
const string& anchor);
/*! \brief Switch the internal running phase */ /*! \brief Switch the internal running phase */
void SwitchToPhase(const string& phase) { phase_ = phase; } void SwitchToPhase(const string& phase) { phase_ = phase; }
/*! \brief Run this operator on the specified stream */ /*! \brief Run this operator on the specified stream */
virtual void Run(int stream_id = 1) { NOT_IMPLEMENTED; } virtual void Run(int stream_id = 0) { NOT_IMPLEMENTED; }
/*! \brief Fuse this operator into the specified graph */ /*! \brief Fuse this operator into the specified graph */
virtual void Fusion(void* graph) { NOT_IMPLEMENTED; } virtual void Fusion(void* graph) { NOT_IMPLEMENTED; }
...@@ -100,14 +94,14 @@ class OperatorBase { ...@@ -100,14 +94,14 @@ class OperatorBase {
/*! \brief Return the specified argument */ /*! \brief Return the specified argument */
const Argument& arg(const string& name) { return *(args_[name]); } const Argument& arg(const string& name) { return *(args_[name]); }
typedef Map<string, vector<OperatorBase*> > RecomputeMap; typedef Map<string, vector<OperatorBase*> > SubGraph;
/*! \brief Return the recomputing map of this operator */ /*! \brief Return the recomputing subgraph of this operator */
RecomputeMap& recompute_map() { return recompute_map_; } SubGraph& subgraph() { return subgraph_; }
/*! \brief Set the given recomputing map */ /*! \brief Set the given recomputing subgraph */
void set_recompute_map(RecomputeMap recompute_map) { void set_subgraph(SubGraph subgraph) {
recompute_map_ = recompute_map; subgraph_ = subgraph;
} }
/*! \brief Return the stored operator def */ /*! \brief Return the stored operator def */
...@@ -129,7 +123,7 @@ class OperatorBase { ...@@ -129,7 +123,7 @@ class OperatorBase {
protected: protected:
string phase_, anchor_; string phase_, anchor_;
Map<std::string, const Argument*> args_; Map<std::string, const Argument*> args_;
Map<string, vector<OperatorBase*> > recompute_map_; SubGraph subgraph_;
vector<Tensor*> inputs_, outputs_; vector<Tensor*> inputs_, outputs_;
OperatorDef def_; OperatorDef def_;
Workspace* ws_; Workspace* ws_;
...@@ -138,50 +132,66 @@ class OperatorBase { ...@@ -138,50 +132,66 @@ class OperatorBase {
template <class Context> template <class Context>
class Operator : public OperatorBase { class Operator : public OperatorBase {
public: public:
/*! \brief Default constructor */
Operator(const OperatorDef& def, Workspace* ws) Operator(const OperatorDef& def, Workspace* ws)
: OperatorBase(def, ws), ctx_(def.device_option()), : OperatorBase(def, ws), ctx_(def.device_option()),
allow_recompute_(OperatorBase::Arg<bool>( allow_recomputing_(OperatorBase::Arg<bool>(
"recomputing_aware", false)), "allow_recomputing", false)),
do_sync_(OperatorBase::Arg<bool>( do_sync_(OperatorBase::Arg<bool>(
"do_sync", true)) { "do_sync", false)) {
allow_run_ = true; allow_run_ = true;
allow_run_ &= _MPICheck(); allow_run_ &= MPICheck();
allow_run_ &= (!(OutputSize() == 1 && allow_run_ &= (!(OutputSize() == 1 &&
Output(0)->name() == "ignore")); Output(0)->name() == "ignore"));
} }
void Run(int stream_id = 1) final { /*! \brief Run this operator on the specified stream */
void Run(int stream_id = 0) final {
if (!allow_run_) return; if (!allow_run_) return;
if (allow_recompute_) MakeResource(); if (allow_recomputing_) PrepareResource();
ctx()->SwitchToDevice(stream_id); ctx()->SwitchToDevice(stream_id);
MemorySwitch(); MemorySwitch();
RunOnDevice(); RunOnDevice();
if (do_sync_) ctx()->FinishDeviceCompution(); if (do_sync_ || stream_id > 0) {
if (allow_recompute_) CleanResource(); // Stream 0 is synchronized separately at scheduled points
ctx()->FinishDeviceCompution();
}
if (allow_recomputing_) ReleaseResource();
} }
virtual void ElimateCorruption(); /*! \brief Prepare the content of inputs */
virtual void MakeResource(); virtual void PrepareResource();
virtual void CleanResource();
/*! \brief Release the ownership of inputs */
virtual void ReleaseResource();
/*! \brief Coordinate the context of inputs and outputs */
virtual void MemorySwitch() { virtual void MemorySwitch() {
for (auto* I : inputs_) for (auto* e : inputs_)
if(I->name() != "ignore") I->SwitchToDevice(); if(e->name() != "ignore")
for (auto* O : outputs_) e->SwitchToDevice(ctx()->device_id());
if(O->name() != "ignore") O->SwitchToDevice(); for (auto* e : outputs_)
if(e->name() != "ignore")
e->SwitchToDevice(ctx()->device_id());
} }
/*! \brief Implement the detailed execution */
virtual void RunOnDevice() = 0; virtual void RunOnDevice() = 0;
/*! \brief Return the internal context */
Context* ctx() { return &ctx_; } Context* ctx() { return &ctx_; }
/*! \brief Whether this operator can be ignored */
bool AllowRun() { return allow_run_; } bool AllowRun() { return allow_run_; }
protected: protected:
/*! \brief Store the internal context */
Context ctx_; Context ctx_;
bool allow_run_, allow_recompute_, do_sync_; bool allow_run_, allow_recomputing_, do_sync_;
private: private:
bool _MPICheck() { /*! \brief Check the MPI conditions */
bool MPICheck() {
#ifndef WITH_MPI #ifndef WITH_MPI
return true; return true;
#else #else
...@@ -197,7 +207,13 @@ class Operator : public OperatorBase { ...@@ -197,7 +207,13 @@ class Operator : public OperatorBase {
} }
}; };
OperatorBase* CreateOperator(const OperatorDef& def, Workspace* ws); /*! \brief Create a new operator from the raw def */
OperatorBase* NewOperator(
const OperatorDef& def,
Workspace* ws);
/*! Macros */
#define USE_SIMPLE_CTOR_DTOR(name) \ #define USE_SIMPLE_CTOR_DTOR(name) \
name(const OperatorDef& def, Workspace* ws) \ name(const OperatorDef& def, Workspace* ws) \
...@@ -350,7 +366,9 @@ DECLARE_REGISTRY( ...@@ -350,7 +366,9 @@ DECLARE_REGISTRY(
<< "\nExcepted the size of " << #argument \ << "\nExcepted the size of " << #argument \
<< " > " << idx << ". (Got " \ << " > " << idx << ". (Got " \
<< argument##_desc.size() << ")."; \ << argument##_desc.size() << ")."; \
Tensor* argument##_tensor = ws()->GetTensor(argument##_desc[idx]); \ Tensor* argument##_tensor = ws()->GetTensor( \
str::replace_first(argument##_desc[idx], \
"${ANCHOR}", anchor())); \
CHECK(argument##_tensor->IsType<type>()) \ CHECK(argument##_tensor->IsType<type>()) \
<< "\nThe type of " << #argument << " should be " << #type << "."; \ << "\nThe type of " << #argument << " should be " << #type << "."; \
CHECK_EQ(argument##_tensor->count(), 1) \ CHECK_EQ(argument##_tensor->count(), 1) \
......
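The run pipeline that Operator<Context>::Run wires together above, summarized as a sketch; op would come from NewOperator, def and ws are assumed to exist, and the stream id follows the new default of 0:

    // Order of calls inside Run(stream_id), per the definition above:
    //   PrepareResource();                  // only if "allow_recomputing"
    //   ctx()->SwitchToDevice(stream_id);
    //   MemorySwitch();                     // inputs/outputs -> ctx()->device_id()
    //   RunOnDevice();
    //   ctx()->FinishDeviceCompution();     // if do_sync_ or stream_id > 0
    //   ReleaseResource();                  // only if "allow_recomputing"
    OperatorBase* op = NewOperator(def, ws); // def/ws assumed to exist
    op->Run(0);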
...@@ -46,10 +46,17 @@ class GradientMakerBase { ...@@ -46,10 +46,17 @@ class GradientMakerBase {
virtual Gradient Make() { virtual Gradient Make() {
vector<OperatorDef> new_defs = MakeDefs(); vector<OperatorDef> new_defs = MakeDefs();
Argument anchor; if (def.has_uid()) {
anchor.set_name("anchor"); anchor.set_s(def.name()); // Attach the anchor to the name if having UID
for (int i = 0; i < new_defs.size(); i++) for (int i = 0; i < new_defs.size(); i++)
new_defs[i].add_arg()->CopyFrom(anchor); new_defs[i].set_name(def.name());
} else {
// Otherwise, just put it into the arguments
Argument anchor;
anchor.set_name("anchor"); anchor.set_s(def.name());
for (int i = 0; i < new_defs.size(); i++)
new_defs[i].add_arg()->CopyFrom(anchor);
}
return Gradient(new_defs, g_inputs_, DefaultValues()); return Gradient(new_defs, g_inputs_, DefaultValues());
}; };
...@@ -117,10 +124,10 @@ class NoGradient : public GradientMakerBase { ...@@ -117,10 +124,10 @@ class NoGradient : public GradientMakerBase {
class SimpleGradientMaker final : public GradientMakerBase { class SimpleGradientMaker final : public GradientMakerBase {
public: public:
/*! /*!
* <SimpleMaker> * <SimpleMaker>
* *
* Inputs: X1, X2, ..., Xn, dY * Inputs: X1, X2, ..., Xn, dY
* Outputs: dX1, dX2, ..., dXn * Outputs: dX1, dX2, ..., dXn
* *
*/ */
GRADIENT_MAKER_CTOR(SimpleGradientMaker); GRADIENT_MAKER_CTOR(SimpleGradientMaker);
...@@ -141,12 +148,12 @@ class SimpleGradientMaker final : public GradientMakerBase { ...@@ -141,12 +148,12 @@ class SimpleGradientMaker final : public GradientMakerBase {
class InplaceGradientMaker final : public GradientMakerBase { class InplaceGradientMaker final : public GradientMakerBase {
public: public:
/*! /*!
* <InplaceMaker> * <InplaceMaker>
* *
* Inputs: Y, dY * Inputs: Y, dY
* Outputs: dX * Outputs: dX
* *
*/ */
GRADIENT_MAKER_CTOR(InplaceGradientMaker); GRADIENT_MAKER_CTOR(InplaceGradientMaker);
vector<OperatorDef> MakeDefs() override { vector<OperatorDef> MakeDefs() override {
return SingleDef( return SingleDef(
......
...@@ -80,7 +80,7 @@ class Tensor { ...@@ -80,7 +80,7 @@ class Tensor {
int ndim() const { return (int)dims_.size(); } int ndim() const { return (int)dims_.size(); }
/*! \brief Return the dimension of given axis */ /*! \brief Return the dimension of given axis */
int64_t dim(const int64_t i) const{ return dims_[axis(i)]; } int64_t dim(int64_t i) const{ return dims_[axis(i)]; }
/*! \brief Return all the dimensions */ /*! \brief Return all the dimensions */
const vector<int64_t>& dims() const { return dims_; } const vector<int64_t>& dims() const { return dims_; }
...@@ -95,7 +95,7 @@ class Tensor { ...@@ -95,7 +95,7 @@ class Tensor {
size_t capacity() const { return capacity_; } size_t capacity() const { return capacity_; }
/*! \brief Return the number of elements along the [start, end) axes */ /*! \brief Return the number of elements along the [start, end) axes */
int64_t count(const int64_t start, const int64_t end) const { int64_t count(int64_t start, int64_t end) const {
int64_t nelements = 1; int64_t nelements = 1;
for (int64_t i = start; i < end; i++) nelements *= dim(i); for (int64_t i = start; i < end; i++) nelements *= dim(i);
return nelements; return nelements;
...@@ -105,10 +105,10 @@ class Tensor { ...@@ -105,10 +105,10 @@ class Tensor {
int64_t count() const { return (int64_t)size_; } int64_t count() const { return (int64_t)size_; }
/*! \brief Return the number of elements from the start axis */ /*! \brief Return the number of elements from the start axis */
int64_t count(const int64_t start) const { return count(start, ndim()); } int64_t count(int64_t start) const { return count(start, ndim()); }
/*! \brief Return the stride of given axis */ /*! \brief Return the stride of given axis */
int64_t stride(const int64_t i) const { return strides_[axis(i)]; } int64_t stride(int64_t i) const { return strides_[axis(i)]; }
/*! \brief Return all the strides */ /*! \brief Return all the strides */
const vector<int64_t>& strides() const { return strides_; } const vector<int64_t>& strides() const { return strides_; }
...@@ -128,11 +128,11 @@ class Tensor { ...@@ -128,11 +128,11 @@ class Tensor {
/*! \brief Return a string to describe the dimensions of this tensor */ /*! \brief Return a string to describe the dimensions of this tensor */
string DimString() const { return DimString(dims_); } string DimString() const { return DimString(dims_); }
/*! \brief Whether the memory of this tensor is unstable */ /*! \brief Return the version of this tensor */
bool is_corrupted() const { return is_corrupted_; } int version() const { return version_; }
/*! \brief Mark the internal memory to be unstable */ /*! \brief Set the version of this tensor */
void Corrupt() { is_corrupted_ = true; } void set_version(int version) { version_ = version; }
/*! \brief Whether this tensor holds a valid memory */ /*! \brief Whether this tensor holds a valid memory */
bool has_memory() const { return memory_ || ex_memory_ != nullptr; } bool has_memory() const { return memory_ || ex_memory_ != nullptr; }
...@@ -152,10 +152,10 @@ class Tensor { ...@@ -152,10 +152,10 @@ class Tensor {
return memory()->state(); return memory()->state();
} }
/*! \brief Switch the memory to device set by Context before */ /*! \brief Switch the memory to the specific device */
void SwitchToDevice() { void SwitchToDevice(int device_id) {
MixedMemory* mem = memory(); MixedMemory* mem = memory();
if (mem) mem->SwitchToDevice(); if (mem) mem->SwitchToDevice(device_id);
} }
/*! \brief Return the type meta of this tensor */ /*! \brief Return the type meta of this tensor */
...@@ -177,10 +177,10 @@ class Tensor { ...@@ -177,10 +177,10 @@ class Tensor {
} else { } else {
if (TypeMeta::Id<Context>() == if (TypeMeta::Id<Context>() ==
TypeMeta::Id<CPUContext>()) { TypeMeta::Id<CPUContext>()) {
*data_ptr = mem->mutable_cpu_data(); *data_ptr = mem->mutable_cpu_data(nbytes());
} else if (TypeMeta::Id<Context>() == } else if (TypeMeta::Id<Context>() ==
TypeMeta::Id<CUDAContext>()) { TypeMeta::Id<CUDAContext>()) {
*data_ptr = mem->mutable_cuda_data(); *data_ptr = mem->mutable_cuda_data(nbytes());
} else if (TypeMeta::Id<Context>() == } else if (TypeMeta::Id<Context>() ==
TypeMeta::Id<CNMLContext>()) { TypeMeta::Id<CNMLContext>()) {
*data_ptr = mem->mutable_cnml_data(); *data_ptr = mem->mutable_cnml_data();
...@@ -198,10 +198,10 @@ class Tensor { ...@@ -198,10 +198,10 @@ class Tensor {
CHECK(mem) << "\nMemory access before allowcating."; CHECK(mem) << "\nMemory access before allowcating.";
if (TypeMeta::Id<Context>() == if (TypeMeta::Id<Context>() ==
TypeMeta::Id<CPUContext>()) { TypeMeta::Id<CPUContext>()) {
return mem->cpu_data(); return mem->cpu_data(nbytes());
} else if (TypeMeta::Id<Context>() == } else if (TypeMeta::Id<Context>() ==
TypeMeta::Id<CUDAContext>()) { TypeMeta::Id<CUDAContext>()) {
return mem->cuda_data(); return mem->cuda_data(nbytes());
} else if (TypeMeta::Id<Context>() == } else if (TypeMeta::Id<Context>() ==
TypeMeta::Id<CNMLContext>()) { TypeMeta::Id<CNMLContext>()) {
return mem->cnml_data(); return mem->cnml_data();
...@@ -258,10 +258,18 @@ class Tensor { ...@@ -258,10 +258,18 @@ class Tensor {
T* mutable_data() { T* mutable_data() {
void* data_ptr; void* data_ptr;
mutable_data_ptr<Context>(&data_ptr); mutable_data_ptr<Context>(&data_ptr);
if (data_ptr && meta_ == TypeMeta::Make<T>()) if (data_ptr) {
return static_cast<T*>(data_ptr); auto meta = TypeMeta::Make<T>();
return static_cast<T*>( if (meta_ == meta) {
raw_mutable_data<Context>(TypeMeta::Make<T>())); return static_cast<T*>(data_ptr);
} else if (capacity_ >=
size_ * meta.itemsize()) {
meta_ = meta;
return static_cast<T*>(data_ptr);
}
}
return static_cast<T*>(raw_mutable_data
<Context>(TypeMeta::Make<T>()));
} }
/*! \brief Get the typed const data pointer */ /*! \brief Get the typed const data pointer */
...@@ -325,6 +333,9 @@ class Tensor { ...@@ -325,6 +333,9 @@ class Tensor {
/*! \brief Store the size and capacity */ /*! \brief Store the size and capacity */
size_t size_ = 0, capacity_ = 0; size_t size_ = 0, capacity_ = 0;
/*! \brief Store the version for shared tensor */
int version_ = -1;
/*! \brief Store the dimensions and strides */ /*! \brief Store the dimensions and strides */
vector<int64_t> dims_, strides_; vector<int64_t> dims_, strides_;
...@@ -335,7 +346,7 @@ class Tensor { ...@@ -335,7 +346,7 @@ class Tensor {
MixedMemory* ex_memory_ = nullptr; MixedMemory* ex_memory_ = nullptr;
/*! \brief External memory indicators */ /*! \brief External memory indicators */
bool is_corrupted_ = false, is_shared_ = false, own_mem_ = true; bool is_shared_ = false, own_mem_ = true;
}; };
} // namespace dragon } // namespace dragon
......
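A shape-bookkeeping sketch for the accessors above, assuming a contiguous row-major tensor and that axis(i) normalizes negative indices (an assumption, since axis() is not shown here):

    // For a contiguous Tensor t with dims {2, 3, 4}:
    //   t.count()     == 24        t.count(1)   == 12   (3 * 4)
    //   t.count(1, 3) == 12        t.stride(0)  == 12
    //   t.dim(-1)     == 4         (wrapped by axis(i), assumed)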
...@@ -52,12 +52,12 @@ class TypeMeta { ...@@ -52,12 +52,12 @@ class TypeMeta {
return *this; return *this;
} }
bool operator == (const TypeMeta& other) const { bool operator == (const TypeMeta& other) const {
return (id_ == other.id_); return (id_ == other.id_);
} }
bool operator != (const TypeMeta& other) const { bool operator != (const TypeMeta& other) const {
return (id_ != other.id_); return (id_ != other.id_);
} }
const TypeId& id() const { return id_; } const TypeId& id() const { return id_; }
...@@ -69,8 +69,8 @@ class TypeMeta { ...@@ -69,8 +69,8 @@ class TypeMeta {
template <typename T> template <typename T>
static TypeId Id() { static TypeId Id() {
// return T's id // Return T's id
// using an intptr_t as hash key // Using an intptr_t as hash key
return TypeRegister<T>::id(); return TypeRegister<T>::id();
} }
...@@ -78,7 +78,7 @@ class TypeMeta { ...@@ -78,7 +78,7 @@ class TypeMeta {
static size_t Itemsize() { return sizeof(T); } static size_t Itemsize() { return sizeof(T); }
template <typename T> template <typename T>
bool Match() const { return (id_ == Id<T>()); } bool Match() const { return (id_ == Id<T>()); }
template <typename T> template <typename T>
static void Ctor(void* ptr, size_t n) { static void Ctor(void* ptr, size_t n) {
......
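A minimal sketch of the id-based checks TypeMeta supports, using only Make, Match, Itemsize, and the comparison operators shown above:

    auto meta = dragon::TypeMeta::Make<float>();
    bool is_float = meta.Match<float>();                      // true: ids match
    bool differs  = (meta != dragon::TypeMeta::Make<int>());  // true
    size_t bytes  = dragon::TypeMeta::Itemsize<float>();      // == sizeof(float)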
...@@ -19,14 +19,12 @@ ...@@ -19,14 +19,12 @@
namespace dragon { namespace dragon {
#define WORKSPACE_MAX_CORRUPTED_SIZE 2
class Workspace { class Workspace {
public: public:
typedef Map<string, Map<string, int64_t> > DummyNameMap; typedef Map<string, Map<string, int64_t> > DummyNameMap;
typedef Map<string, unique_ptr<Tensor> > TensorMap; typedef Map<string, unique_ptr<Tensor> > TensorMap;
typedef Map<string, string> TensorProxyMap; typedef Map<string, string> TensorAliasMap;
typedef Map<string, TensorFillerProto> TensorFillerMap; typedef Map<string, TensorFillerProto> TensorFillerMap;
typedef Map<string, unique_ptr<OperatorBase> > OperatorMap; typedef Map<string, unique_ptr<OperatorBase> > OperatorMap;
...@@ -73,7 +71,7 @@ class Workspace { ...@@ -73,7 +71,7 @@ class Workspace {
/* \brief Whether the specified filler is in this workspace */ /* \brief Whether the specified filler is in this workspace */
bool HasFiller(const string& name, bool use_remote = true) const; bool HasFiller(const string& name, bool use_remote = true) const;
/*! \brief Create the specified filler */ /*! \brief Create the specified filler */
void CreateFiller(const TensorFillerProto filler); void CreateFiller(const TensorFillerProto filler);
...@@ -107,19 +105,15 @@ class Workspace { ...@@ -107,19 +105,15 @@ class Workspace {
return Tcaches; return Tcaches;
} }
/*! \brief Create a persistent operator in this workspace */ /*! \brief Create an operator in this workspace */
void CreatePersistentOp(const OperatorDef& def); OperatorBase* CreateOperator(const OperatorDef& def);
/*! \brief Run the specified persistent operator */ /*! \brief Run the specified persistent operator */
void RunPersistentOp(
const string& key,
const string& anchor,
const vector<string>& inputs,
const vector<string>& outputs);
/*! \brief Try to run the operator in an adaptive mode */
void RunOperator(const OperatorDef& def); void RunOperator(const OperatorDef& def);
/*! \brief Try to run the operator in an adaptive mode */
void RunOperatorOnce(const OperatorDef& def);
/*! \brief Create a Graph in this workspace */ /*! \brief Create a Graph in this workspace */
GraphBase* CreateGraph(const GraphDef& def); GraphBase* CreateGraph(const GraphDef& def);
...@@ -128,13 +122,13 @@ class Workspace { ...@@ -128,13 +122,13 @@ class Workspace {
const string& graph_name, const string& graph_name,
const string& include, const string& include,
const string& exclude, const string& exclude,
const int stream_id = 1); int stream_id = 0);
/*! \brief Return all the stored graph names */ /*! \brief Return all the stored graph names */
vector<string> GetGraphs() const; vector<string> GetGraphs() const;
/* \brief Set a proxy name for the tensor */ /* \brief Set an alias for the tensor */
bool SetTensorProxy(const string& key, const string& proxy); bool SetTensorAlias(const string& name, const string& alias);
/* \brief Return a unique dummy name within this workspace */ /* \brief Return a unique dummy name within this workspace */
string GetDummyName( string GetDummyName(
...@@ -157,7 +151,7 @@ class Workspace { ...@@ -157,7 +151,7 @@ class Workspace {
TensorFillerMap tensor_filler_map_; TensorFillerMap tensor_filler_map_;
/*! \brief Store the proxy name of tensors */ /*! \brief Store the proxy name of tensors */
TensorProxyMap tensor_proxy_map_; TensorAliasMap tensor_alias_map_;
/*! \brief Store the registered operators for dynamic graph */ /*! \brief Store the registered operators for dynamic graph */
OperatorMap operator_map_; OperatorMap operator_map_;
......
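A sketch of the renamed alias mechanism, given a Workspace* ws; the tensor names are illustrative, while CreateTensor/GetTensor are the workspace calls used elsewhere in this commit:

    ws->CreateTensor("data");              // canonical tensor
    ws->SetTensorAlias("data", "images");  // second name for the same storage
    Tensor* t = ws->GetTensor("images");   // resolves through tensor_alias_map_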
...@@ -99,6 +99,6 @@ class CuDNNSoftmaxGradientOp final : public Operator<Context> { ...@@ -99,6 +99,6 @@ class CuDNNSoftmaxGradientOp final : public Operator<Context> {
#endif // WITH_CUDNN #endif // WITH_CUDNN
} } // namespace dragon
#endif // DRAGON_OPERATORS_ACTIVATION_SOFTMAX_OP_H_ #endif // DRAGON_OPERATORS_ACTIVATION_SOFTMAX_OP_H_
\ No newline at end of file
...@@ -10,29 +10,29 @@ ...@@ -10,29 +10,29 @@
* ------------------------------------------------------------ * ------------------------------------------------------------
*/ */
#ifndef DRAGON_OPERATORS_UPDATE_MOVING_AVERAGE_OP_H_ #ifndef DRAGON_OPERATORS_ARITHMETIC_ACCUMULATE_OP_H_
#define DRAGON_OPERATORS_UPDATE_MOVING_AVERAGE_OP_H_ #define DRAGON_OPERATORS_ARITHMETIC_ACCUMULATE_OP_H_
#include "core/operator.h" #include "core/operator.h"
namespace dragon { namespace dragon {
template <class Context> template <class Context>
class MovingAverageOp final : public Operator<Context> { class AccumulateOp final : public Operator<Context> {
public: public:
MovingAverageOp(const OperatorDef& def, Workspace* ws) AccumulateOp(const OperatorDef& def, Workspace* ws)
: Operator<Context>(def, ws), : Operator<Context>(def, ws),
decay(OperatorBase::Arg<float>("decay", 1.f)) {} alpha(OperatorBase::Arg<float>("alpha", 1.f)),
beta(OperatorBase::Arg<float>("beta", 1.f)) {}
USE_OPERATOR_FUNCTIONS; USE_OPERATOR_FUNCTIONS;
void RunOnDevice() override; void RunOnDevice() override;
template <typename T> void RunWithType(); template <typename T> void RunWithType(Tensor* X, Tensor* Y);
protected: protected:
float decay; float alpha, beta;
}; };
} // namespace dragon } // namespace dragon
#endif // DRAGON_OPERATORS_ARITHMETIC_ACCUMULATE_OP_H_
#endif // DRAGON_OPERATORS_UPDATE_MOVING_AVERAGE_OP_H_ \ No newline at end of file
\ No newline at end of file
...@@ -46,12 +46,12 @@ class AffineGradientOp final : public Operator<Context> { ...@@ -46,12 +46,12 @@ class AffineGradientOp final : public Operator<Context> {
void RunOnDevice() override; void RunOnDevice() override;
template <typename T> void BiasRunWithType(); template <typename T> void BiasRunWithType();
template <typename T> void ScaleRunWithType(); template <typename T> void ScaleRunWithType();
template <typename T> void ComputeScaleGradient(T* dYxX, T* dA);
template <typename T> void RunWithType(); template <typename T> void RunWithType();
protected: protected:
int64_t axis, num_axes; int64_t axis, num_axes;
int64_t outer_dim, inner_dim, scale_dim, sum_dim, dim; int64_t outer_dim, inner_dim, scale_dim, sum_dim, dim;
Tensor sum_result;
}; };
#ifdef WITH_CUDNN #ifdef WITH_CUDNN
...@@ -125,18 +125,12 @@ public: ...@@ -125,18 +125,12 @@ public:
template <typename DT, typename CT> template <typename DT, typename CT>
void ComputeScaleGradient(DT* dYxX, DT* dA); void ComputeScaleGradient(DT* dYxX, DT* dA);
template <typename DT, typename CT>
void ComputeBiasGradient(const DT* dY, DT* dB);
template <typename T> void ComputeScaleGradient_v2(T* dYxX, T* dA); template <typename T> void ComputeScaleGradient_v2(T* dYxX, T* dA);
template <typename T> void ComputeBiasGradient_v2(const T* dY, T* dB);
template <typename DT, typename CT> void RunWithType(); template <typename DT, typename CT> void RunWithType();
protected: protected:
USE_CUDNN_AFFINE_FUCNTIONS; USE_CUDNN_AFFINE_FUCNTIONS;
int64_t outer_dim, inner_dim, scale_dim, dim, sum_dim; int64_t outer_dim, inner_dim, scale_dim, dim, sum_dim;
Tensor sum_result;
}; };
#endif #endif
......
...@@ -10,36 +10,33 @@ ...@@ -10,36 +10,33 @@
* ------------------------------------------------------------ * ------------------------------------------------------------
*/ */
#ifndef DRAGON_OPERATORS_VISION_DENSE_CONCAT_OP_H_ #ifndef DRAGON_OPERATORS_ARITHMETIC_SQRT_OP_H_
#define DRAGON_OPERATORS_VISION_DENSE_CONCAT_OP_H_ #define DRAGON_OPERATORS_ARITHMETIC_SQRT_OP_H_
#include "operators/ndarray/concat_op.h" #include "core/operator.h"
namespace dragon { namespace dragon {
template <class Context> template <class Context>
class DenseConcatOp final : public ConcatOp<Context> { class SqrtOp final : public Operator<Context> {
public: public:
DenseConcatOp(const OperatorDef& def, Workspace* ws) USE_SIMPLE_CTOR_DTOR(SqrtOp);
: ConcatOp<Context>(def, ws) {}
USE_OPERATOR_FUNCTIONS; USE_OPERATOR_FUNCTIONS;
void RunOnDevice() override;
template <typename T> void RunWithType();
}; };
template <class Context> template <class Context>
class DenseConcatGradientOp final : public ConcatGradientOp<Context> { class SqrtGradientOp final : public Operator<Context> {
public: public:
DenseConcatGradientOp(const OperatorDef& def, Workspace* ws) USE_SIMPLE_CTOR_DTOR(SqrtGradientOp);
: ConcatGradientOp<Context>(def, ws),
growth_rate(OperatorBase::Arg<int64_t>("growth_rate", 0)) {}
USE_OPERATOR_FUNCTIONS; USE_OPERATOR_FUNCTIONS;
void ElimateCorruption() override; void RunOnDevice() override;
template <typename T> void RestoreX1(); template <typename T> void RunWithType();
protected:
int64_t growth_rate;
}; };
} // namespace dragon } // namespace dragon
#endif // DRAGON_OPERATORS_VISION_DENSE_CONCAT_OP_H_ #endif // DRAGON_OPERATORS_ARITHMETIC_SQRT_OP_H_
\ No newline at end of file \ No newline at end of file
...@@ -19,7 +19,7 @@ namespace dragon { ...@@ -19,7 +19,7 @@ namespace dragon {
template <class Context> template <class Context>
class SquareOp final : public Operator<Context> { class SquareOp final : public Operator<Context> {
public: public:
USE_SIMPLE_CTOR_DTOR(SquareOp); USE_SIMPLE_CTOR_DTOR(SquareOp);
USE_OPERATOR_FUNCTIONS; USE_OPERATOR_FUNCTIONS;
...@@ -29,7 +29,7 @@ public: ...@@ -29,7 +29,7 @@ public:
template <class Context> template <class Context>
class SquareGradientOp final : public Operator<Context> { class SquareGradientOp final : public Operator<Context> {
public: public:
USE_SIMPLE_CTOR_DTOR(SquareGradientOp); USE_SIMPLE_CTOR_DTOR(SquareGradientOp);
USE_OPERATOR_FUNCTIONS; USE_OPERATOR_FUNCTIONS;
......
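Reference CPU forms of the two elementwise gradients above, written as standalone sketches rather than the library's actual kernels:

    // Sqrt:   y = sqrt(x)  =>  dx = 0.5 * dy / y
    void SqrtGrad(int n, const float* dy, const float* y, float* dx) {
        for (int i = 0; i < n; ++i) dx[i] = 0.5f * dy[i] / y[i];
    }
    // Square: y = x * x    =>  dx = 2 * x * dy
    void SquareGrad(int n, const float* dy, const float* x, float* dx) {
        for (int i = 0; i < n; ++i) dx[i] = 2.f * x[i] * dy[i];
    }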
...@@ -37,7 +37,7 @@ class SigmoidFocalLossOp ...@@ -37,7 +37,7 @@ class SigmoidFocalLossOp
USE_OPERATOR_FUNCTIONS; USE_OPERATOR_FUNCTIONS;
void RunOnDevice() override; void RunOnDevice() override;
template <typename T> void RunWithType(); template <typename Tx, typename Ty> void RunWithType();
protected: protected:
float alpha, gamma, pos_alpha, neg_alpha; float alpha, gamma, pos_alpha, neg_alpha;
...@@ -66,7 +66,7 @@ class SigmoidFocalLossGradientOp ...@@ -66,7 +66,7 @@ class SigmoidFocalLossGradientOp
USE_OPERATOR_FUNCTIONS; USE_OPERATOR_FUNCTIONS;
void RunOnDevice() override; void RunOnDevice() override;
template <typename T> void RunWithType(); template <typename Tx, typename Ty> void RunWithType();
protected: protected:
float alpha, gamma, pos_alpha, neg_alpha; float alpha, gamma, pos_alpha, neg_alpha;
......
...@@ -37,7 +37,7 @@ class SoftmaxFocalLossOp ...@@ -37,7 +37,7 @@ class SoftmaxFocalLossOp
USE_OPERATOR_FUNCTIONS; USE_OPERATOR_FUNCTIONS;
void RunOnDevice() override; void RunOnDevice() override;
template <typename T> void RunWithType(); template <typename Tx, typename Ty> void RunWithType();
protected: protected:
float alpha, gamma, pos_alpha, neg_alpha; float alpha, gamma, pos_alpha, neg_alpha;
...@@ -66,7 +66,7 @@ class SoftmaxFocalLossGradientOp ...@@ -66,7 +66,7 @@ class SoftmaxFocalLossGradientOp
USE_OPERATOR_FUNCTIONS; USE_OPERATOR_FUNCTIONS;
void RunOnDevice() override; void RunOnDevice() override;
template <typename T> void RunWithType(); template <typename Tx, typename Ty> void RunWithType();
protected: protected:
float alpha, gamma, pos_alpha, neg_alpha; float alpha, gamma, pos_alpha, neg_alpha;
......
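For reference, the per-element focal loss these classes compute has the standard form FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). A standalone sketch of the sigmoid variant (not the library's kernel), with pos_alpha/neg_alpha selecting alpha_t by label sign:

    #include <algorithm>
    #include <cmath>

    float SigmoidFocalLoss1(float logit, float target, float pos_alpha,
                            float neg_alpha, float gamma) {
        float p  = 1.f / (1.f + std::exp(-logit));   // sigmoid probability
        float pt = target > 0.f ? p : 1.f - p;       // prob of the true class
        float at = target > 0.f ? pos_alpha : neg_alpha;
        return -at * std::pow(1.f - pt, gamma)
                   * std::log(std::max(pt, 1e-12f)); // clamp to avoid log(0)
    }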
...@@ -10,29 +10,41 @@ ...@@ -10,29 +10,41 @@
* ------------------------------------------------------------ * ------------------------------------------------------------
*/ */
#ifndef DRAGON_OPERATORS_MISC_ASTYPE_OP_H_ #ifndef DRAGON_OPERATORS_MISC_CAST_OP_H_
#define DRAGON_OPERATORS_MISC_ASTYPE_OP_H_ #define DRAGON_OPERATORS_MISC_CAST_OP_H_
#include "core/operator.h" #include "core/operator.h"
namespace dragon { namespace dragon {
template <class Context> template <class Context>
class AsTypeOp final : public Operator<Context> { class CastOp final : public Operator<Context> {
public: public:
AsTypeOp(const OperatorDef& def, Workspace* ws) CastOp(const OperatorDef& def, Workspace* ws)
: Operator<Context>(def, ws), : Operator<Context>(def, ws),
dtype(OperatorBase::Arg<string>("dtype", "float32")), dtype(OperatorBase::Arg<string>("dtype", "float32")),
inplace(OperatorBase::Arg<bool>("inplace", false)) {} inplace(OperatorBase::Arg<bool>("inplace", false)) {}
USE_OPERATOR_FUNCTIONS; USE_OPERATOR_FUNCTIONS;
void RunOnDevice() override; void RunOnDevice() override;
protected: protected:
string dtype; string dtype;
bool inplace; bool inplace;
}; };
template <class Context>
class CastGradientOp final : public Operator<Context> {
public:
USE_SIMPLE_CTOR_DTOR(CastGradientOp);
USE_OPERATOR_FUNCTIONS;
void RunOnDevice() override;
protected:
string dtype;
};
} // namespace dragon } // namespace dragon
#endif // DRAGON_OPERATORS_MISC_ASTYPE_OP_H_ #endif // DRAGON_OPERATORS_MISC_CAST_OP_H_
\ No newline at end of file \ No newline at end of file
...@@ -128,7 +128,7 @@ public: ...@@ -128,7 +128,7 @@ public:
template <class Context> template <class Context>
class TruncatedNormalOp final : public InitializeOp<Context> { class TruncatedNormalOp final : public InitializeOp<Context> {
public: public:
TruncatedNormalOp(const OperatorDef& def, Workspace* ws) TruncatedNormalOp(const OperatorDef& def, Workspace* ws)
: InitializeOp<Context>(def, ws) { : InitializeOp<Context>(def, ws) {
this->filler_proto.set_type("truncated_normal"); this->filler_proto.set_type("truncated_normal");
......
...@@ -25,8 +25,7 @@ class AdamUpdateOp final : public UpdateOpBase<Context> { ...@@ -25,8 +25,7 @@ class AdamUpdateOp final : public UpdateOpBase<Context> {
USE_OPERATOR_FUNCTIONS; USE_OPERATOR_FUNCTIONS;
USE_UPDATER_FUNCTIONS(Context); USE_UPDATER_FUNCTIONS(Context);
void ComputeRunWithFloat32() override; void ComputeUpdates(Tensor* dX) override;
void ComputeRunWithFloat16() override;
protected: protected:
int t; float lr, beta1, beta2, eps; int t; float lr, beta1, beta2, eps;
......
...@@ -75,7 +75,6 @@ class CollectiveUpdateOp final : public Operator<Context> { ...@@ -75,7 +75,6 @@ class CollectiveUpdateOp final : public Operator<Context> {
#ifdef WITH_NCCL #ifdef WITH_NCCL
ncclComm_t nccl_comm; ncclComm_t nccl_comm;
CUDAClosure<Context> closure;
#endif #endif
}; };
......
...@@ -25,8 +25,7 @@ class NesterovUpdateOp final : public UpdateOpBase<Context> { ...@@ -25,8 +25,7 @@ class NesterovUpdateOp final : public UpdateOpBase<Context> {
USE_OPERATOR_FUNCTIONS; USE_OPERATOR_FUNCTIONS;
USE_UPDATER_FUNCTIONS(Context); USE_UPDATER_FUNCTIONS(Context);
void ComputeRunWithFloat32() override; void ComputeUpdates(Tensor* dX) override;
void ComputeRunWithFloat16() override;
protected: protected:
float lr, momentum; float lr, momentum;
......
...@@ -25,8 +25,7 @@ class RMSPropUpdateOp final : public UpdateOpBase<Context> { ...@@ -25,8 +25,7 @@ class RMSPropUpdateOp final : public UpdateOpBase<Context> {
USE_OPERATOR_FUNCTIONS; USE_OPERATOR_FUNCTIONS;
USE_UPDATER_FUNCTIONS(Context); USE_UPDATER_FUNCTIONS(Context);
void ComputeRunWithFloat32() override; void ComputeUpdates(Tensor* dX) override;
void ComputeRunWithFloat16() override;
protected: protected:
float lr, decay, eps; float lr, decay, eps;
......
...@@ -26,8 +26,7 @@ class SGDUpdateOp final : public UpdateOpBase<Context> { ...@@ -26,8 +26,7 @@ class SGDUpdateOp final : public UpdateOpBase<Context> {
USE_OPERATOR_FUNCTIONS; USE_OPERATOR_FUNCTIONS;
USE_UPDATER_FUNCTIONS(Context); USE_UPDATER_FUNCTIONS(Context);
void ComputeRunWithFloat32() override; void ComputeUpdates(Tensor* dX) override;
void ComputeRunWithFloat16() override;
protected: protected:
float old_lr, lr, momentum, correction; float old_lr, lr, momentum, correction;
......
...@@ -24,29 +24,29 @@ class UpdateOpBase : public Operator<Context> { ...@@ -24,29 +24,29 @@ class UpdateOpBase : public Operator<Context> {
: Operator<Context>(def, ws), : Operator<Context>(def, ws),
lr_mult(OperatorBase::Arg<float>("lr_mult", 1.f)), lr_mult(OperatorBase::Arg<float>("lr_mult", 1.f)),
decay_mult(OperatorBase::Arg<float>("decay_mult", 1.f)), decay_mult(OperatorBase::Arg<float>("decay_mult", 1.f)),
slot(OperatorBase::Arg<string>("slot", "")), slot(OperatorBase::Arg<string>("slot", "")) {
zero_grad(OperatorBase::Arg<bool>("zero_grad", true)) {
CHECK(!slot.empty()) << "\nA non-empty slot is required"; CHECK(!slot.empty()) << "\nA non-empty slot is required";
} }
USE_OPERATOR_FUNCTIONS; USE_OPERATOR_FUNCTIONS;
string Slot() { return slot + "/" + Output(0)->name(); }
float Param(const string& name) const; float Param(const string& name) const;
string Slot();
void RunOnDevice() override; template <typename T>
template <typename T> void PreprocessRunWithType(); void ProcessGradients(Tensor* dX, Tensor* X);
virtual void ComputeRunWithFloat32() = 0; virtual void ComputeUpdates(Tensor* dX) = 0;
virtual void ComputeRunWithFloat16() = 0;
void UpdateRunWithFloat32(); template <typename T>
void UpdateRunWithFloat16(); void ApplyUpdates(Tensor* dX, Tensor* X);
void RunOnDevice() override;
protected: protected:
float lr_mult, decay_mult; float lr_mult, decay_mult;
float l2_decay, clip_thresh, scale_factor; float l2_decay, clip_thresh, scale_factor;
string slot; string slot;
bool zero_grad;
}; };
#define USE_UPDATER_FUNCTIONS(context) \ #define USE_UPDATER_FUNCTIONS(context) \
......
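The refactor above funnels every updater through ProcessGradients -> ComputeUpdates -> ApplyUpdates. A reference SGD-with-momentum ComputeUpdates body as a sketch, where h is the slot buffer and u the pending update; this is not the library's kernel:

    void SGDComputeUpdates(int n, float lr, float momentum,
                           const float* g, float* h, float* u) {
        for (int i = 0; i < n; ++i) {
            h[i] = momentum * h[i] + lr * g[i];  // history/slot update
            u[i] = h[i];                         // value later applied to X
        }
    }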
...@@ -88,6 +88,7 @@ class CuDNNConv2dOp final : public Conv2dOp<Context> { ...@@ -88,6 +88,7 @@ class CuDNNConv2dOp final : public Conv2dOp<Context> {
} }
void RunOnDevice() override; void RunOnDevice() override;
void SetConvDescFromInputs();
template <typename T> void ResetDesc(); template <typename T> void ResetDesc();
template <typename T> void RunWithType(); template <typename T> void RunWithType();
...@@ -101,7 +102,7 @@ class CuDNNConv2dOp final : public Conv2dOp<Context> { ...@@ -101,7 +102,7 @@ class CuDNNConv2dOp final : public Conv2dOp<Context> {
cudnnFilterDescriptor_t filter_desc; cudnnFilterDescriptor_t filter_desc;
size_t fwd_data_size; size_t fwd_data_size;
int64_t cudnn_group; int64_t cudnn_group;
vector<int64_t> input_dims; vector<int64_t> input_dims, filter_dims;
bool enable_tensor_core; bool enable_tensor_core;
}; };
...@@ -142,6 +143,7 @@ class CuDNNConv2dGradientOp final : public Conv2dGradientOp<Context> { ...@@ -142,6 +143,7 @@ class CuDNNConv2dGradientOp final : public Conv2dGradientOp<Context> {
} }
void RunOnDevice() override; void RunOnDevice() override;
void SetConvDescFromInputs();
template <typename T> void ResetDesc(); template <typename T> void ResetDesc();
template <typename T> void RunWithType(); template <typename T> void RunWithType();
...@@ -156,7 +158,7 @@ class CuDNNConv2dGradientOp final : public Conv2dGradientOp<Context> { ...@@ -156,7 +158,7 @@ class CuDNNConv2dGradientOp final : public Conv2dGradientOp<Context> {
cudnnFilterDescriptor_t filter_desc; cudnnFilterDescriptor_t filter_desc;
size_t bwd_filter_size, bwd_data_size; size_t bwd_filter_size, bwd_data_size;
int64_t cudnn_group; int64_t cudnn_group;
vector<int64_t> input_dims; vector<int64_t> input_dims, filter_dims;
bool enable_tensor_core; bool enable_tensor_core;
}; };
......
...@@ -20,10 +20,10 @@ namespace dragon { ...@@ -20,10 +20,10 @@ namespace dragon {
template <class Context> template <class Context>
class ConvTranspose2dOp : public ConvOpBase<Context> { class ConvTranspose2dOp : public ConvOpBase<Context> {
public: public:
ConvTranspose2dOp(const OperatorDef& def, Workspace* ws) ConvTranspose2dOp(const OperatorDef& def, Workspace* ws)
: ConvOpBase<Context>(def, ws) { : ConvOpBase<Context>(def, ws) {
this->num_spatial_axes = 2; this->num_spatial_axes = 2;
Setup(); Setup();
} }
USE_OPERATOR_FUNCTIONS; USE_OPERATOR_FUNCTIONS;
USE_CONVOLUTION_FUNCTIONS; USE_CONVOLUTION_FUNCTIONS;
...@@ -95,6 +95,7 @@ class CuDNNConvTranspose2dOp final ...@@ -95,6 +95,7 @@ class CuDNNConvTranspose2dOp final
} }
void RunOnDevice() override; void RunOnDevice() override;
void SetConvDescFromInputs();
template <typename T> void ResetDesc(); template <typename T> void ResetDesc();
template <typename T> void RunWithType(); template <typename T> void RunWithType();
...@@ -108,7 +109,7 @@ class CuDNNConvTranspose2dOp final ...@@ -108,7 +109,7 @@ class CuDNNConvTranspose2dOp final
cudnnFilterDescriptor_t filter_desc; cudnnFilterDescriptor_t filter_desc;
size_t fwd_data_size; size_t fwd_data_size;
int64_t cudnn_group; int64_t cudnn_group;
vector<int64_t> input_dims; vector<int64_t> output_dims, filter_dims;
bool enable_tensor_core; bool enable_tensor_core;
}; };
...@@ -152,6 +153,7 @@ public: ...@@ -152,6 +153,7 @@ public:
} }
void RunOnDevice() override; void RunOnDevice() override;
void SetConvDescFromInputs();
template <typename T> void ResetDesc(); template <typename T> void ResetDesc();
template <typename T> void RunWithType(); template <typename T> void RunWithType();
...@@ -166,7 +168,7 @@ public: ...@@ -166,7 +168,7 @@ public:
cudnnFilterDescriptor_t filter_desc; cudnnFilterDescriptor_t filter_desc;
size_t bwd_filter_size, bwd_data_size; size_t bwd_filter_size, bwd_data_size;
int64_t cudnn_group; int64_t cudnn_group;
vector<int64_t> input_dims; vector<int64_t> output_dims, filter_dims;
bool enable_tensor_core; bool enable_tensor_core;
}; };
......
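A sketch of the shape guard the new input_dims/filter_dims (or output_dims) members imply for the four cuDNN classes above: descriptors are rebuilt only when a cached shape changes. The exact guard is an assumption:

    // Inside RunWithType(), before dispatching (illustrative):
    if (input_dims != Input(0).dims() || filter_dims != Input(1).dims()) {
        input_dims  = Input(0).dims();
        filter_dims = Input(1).dims();
        ResetDesc<T>();  // re-create cudnn tensor/filter/conv descriptors
    }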
...@@ -55,6 +55,7 @@ class NNResizeGradientOp final : public Operator<Context> { ...@@ -55,6 +55,7 @@ class NNResizeGradientOp final : public Operator<Context> {
USE_OPERATOR_FUNCTIONS; USE_OPERATOR_FUNCTIONS;
void RunOnDevice() override; void RunOnDevice() override;
void RunWithFloat16();
template <typename T> void RunWithType(); template <typename T> void RunWithType();
protected: protected:
......
...@@ -26,7 +26,7 @@ class Pool2dOp : public Operator<Context> { ...@@ -26,7 +26,7 @@ class Pool2dOp : public Operator<Context> {
data_format(OperatorBase::Arg<string>("data_format", "NCHW")), data_format(OperatorBase::Arg<string>("data_format", "NCHW")),
padding(OperatorBase::Arg<string>("padding", "VALID")), padding(OperatorBase::Arg<string>("padding", "VALID")),
global_pooling(OperatorBase::Arg<bool>("global_pooling", false)), global_pooling(OperatorBase::Arg<bool>("global_pooling", false)),
ceil_mode(OperatorBase::Arg<bool>("ceil", true)) { ceil_mode(OperatorBase::Arg<bool>("ceil_mode", true)) {
auto ks = OperatorBase::Args<int64_t>("kernel_shape"); auto ks = OperatorBase::Args<int64_t>("kernel_shape");
auto s = OperatorBase::Args<int64_t>("strides"); auto s = OperatorBase::Args<int64_t>("strides");
auto p = OperatorBase::Args<int64_t>("pads"); auto p = OperatorBase::Args<int64_t>("pads");
...@@ -68,7 +68,7 @@ class Pool2dGradientOp : public Operator<Context> { ...@@ -68,7 +68,7 @@ class Pool2dGradientOp : public Operator<Context> {
data_format(OperatorBase::Arg<string>("data_format", "NCHW")), data_format(OperatorBase::Arg<string>("data_format", "NCHW")),
padding(OperatorBase::Arg<string>("padding", "VALID")), padding(OperatorBase::Arg<string>("padding", "VALID")),
global_pooling(OperatorBase::Arg<bool>("global_pooling", false)), global_pooling(OperatorBase::Arg<bool>("global_pooling", false)),
ceil_mode(OperatorBase::Arg<bool>("ceil", true)) { ceil_mode(OperatorBase::Arg<bool>("ceil_mode", true)) {
auto ks = OperatorBase::Args<int64_t>("kernel_shape"); auto ks = OperatorBase::Args<int64_t>("kernel_shape");
auto s = OperatorBase::Args<int64_t>("strides"); auto s = OperatorBase::Args<int64_t>("strides");
auto p = OperatorBase::Args<int64_t>("pads"); auto p = OperatorBase::Args<int64_t>("pads");
......
...@@ -54,6 +54,7 @@ class ROIAlignGradientOp final : public Operator<Context> { ...@@ -54,6 +54,7 @@ class ROIAlignGradientOp final : public Operator<Context> {
USE_OPERATOR_FUNCTIONS; USE_OPERATOR_FUNCTIONS;
void RunOnDevice() override; void RunOnDevice() override;
void RunWithFloat16();
template <typename T> void RunWithType(); template <typename T> void RunWithType();
protected: protected:
......
...@@ -49,6 +49,7 @@ class ROIPoolGradientOp final : public Operator<Context> { ...@@ -49,6 +49,7 @@ class ROIPoolGradientOp final : public Operator<Context> {
USE_OPERATOR_FUNCTIONS; USE_OPERATOR_FUNCTIONS;
void RunOnDevice() override; void RunOnDevice() override;
void RunWithFloat16();
template <typename T> void RunWithType(); template <typename T> void RunWithType();
protected: protected:
......
...@@ -12,7 +12,7 @@ namespace dragon { ...@@ -12,7 +12,7 @@ namespace dragon {
template <typename T> template <typename T>
using BlockReduce = cub::BlockReduce<T, CUDA_THREADS>; using BlockReduce = cub::BlockReduce<T, CUDA_THREADS>;
} } // namespace dragon
#endif // WITH_CUDA #endif // WITH_CUDA
......
...@@ -102,7 +102,7 @@ template <typename T, class Context> ...@@ -102,7 +102,7 @@ template <typename T, class Context>
void Set( void Set(
const int n, const int n,
const T alpha, const T alpha,
T* x, T* y,
Context* ctx); Context* ctx);
template <typename T, class Context> template <typename T, class Context>
...@@ -122,6 +122,15 @@ void Axpy( ...@@ -122,6 +122,15 @@ void Axpy(
Context* ctx); Context* ctx);
template<typename T, class Context> template<typename T, class Context>
void Axpby(
const int n,
const float alpha,
const T* x,
const float beta,
T* y,
Context* ctx);
template<typename T, class Context>
void AddScalar( void AddScalar(
const int n, const int n,
const float alpha, const float alpha,
...@@ -141,17 +150,8 @@ void AddScalar( ...@@ -141,17 +150,8 @@ void AddScalar(
template<typename T, class Context> template<typename T, class Context>
void InvStd( void InvStd(
const int n, const int n,
float eps, const float eps,
const T* x,
T* y,
Context* ctx);
template<typename T, class Context>
void Axpby(
const int n,
float alpha,
const T* x, const T* x,
float beta,
T* y, T* y,
Context* ctx); Context* ctx);
......
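Reference semantics for the relocated BLAS-like helpers above, as a plain CPU sketch (the real implementations dispatch per Context):

    // Axpby: y = alpha * x + beta * y   (Axpy is the beta == 1 case)
    void AxpbyRef(int n, float alpha, const float* x,
                  float beta, float* y) {
        for (int i = 0; i < n; ++i) y[i] = alpha * x[i] + beta * y[i];
    }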
...@@ -378,8 +378,8 @@ void NLLLoss( ...@@ -378,8 +378,8 @@ void NLLLoss(
const Tx* log_prob, const Tx* log_prob,
const Ty* labels, const Ty* labels,
const int* ignores, const int* ignores,
float* losses, Tx* losses,
float* flags, int* flags,
Context* ctx); Context* ctx);
template <typename Tx, typename Ty, class Context> template <typename Tx, typename Ty, class Context>
...@@ -392,7 +392,7 @@ void NLLLossGrad( ...@@ -392,7 +392,7 @@ void NLLLossGrad(
const Ty* labels, const Ty* labels,
const int* ignores, const int* ignores,
Tx* dx, Tx* dx,
float* flags, int* flags,
Context* ctx); Context* ctx);
/*! loss.sigmoid_ce_loss */ /*! loss.sigmoid_ce_loss */
...@@ -403,7 +403,7 @@ void SigmoidCrossEntropy( ...@@ -403,7 +403,7 @@ void SigmoidCrossEntropy(
const T* logits, const T* logits,
const T* targets, const T* targets,
T* losses, T* losses,
T* flags, int* flags,
Context* ctx); Context* ctx);
template <typename T, class Context> template <typename T, class Context>
...@@ -412,12 +412,12 @@ void SigmoidCrossEntropyGrad( ...@@ -412,12 +412,12 @@ void SigmoidCrossEntropyGrad(
const T* logits, const T* logits,
const T* targets, const T* targets,
T* dlogits, T* dlogits,
T* flags, int* flags,
Context* ctx); Context* ctx);
/*! loss.sigmoid_focal_loss */ /*! loss.sigmoid_focal_loss */
template <typename T, class Context> template <typename Tx, typename Ty, class Context>
void SigmoidFocalLoss( void SigmoidFocalLoss(
const int outer_dim, const int outer_dim,
const int axis_dim, const int axis_dim,
...@@ -426,13 +426,13 @@ void SigmoidFocalLoss( ...@@ -426,13 +426,13 @@ void SigmoidFocalLoss(
const float neg_alpha, const float neg_alpha,
const float gamma, const float gamma,
const int neg_id, const int neg_id,
const float* logits, const Tx* logits,
const float* targets, const Ty* targets,
float* losses, Tx* losses,
float* flags, int* flags,
Context* ctx); Context* ctx);
template <typename T, class Context> template <typename Tx, typename Ty, class Context>
void SigmoidFocalLossGrad( void SigmoidFocalLossGrad(
const int outer_dim, const int outer_dim,
const int axis_dim, const int axis_dim,
...@@ -441,10 +441,10 @@ void SigmoidFocalLossGrad( ...@@ -441,10 +441,10 @@ void SigmoidFocalLossGrad(
const float neg_alpha, const float neg_alpha,
const float gamma, const float gamma,
const int neg_id, const int neg_id,
const float* logits, const Tx* logits,
const float* targets, const Ty* targets,
float* dlogits, Tx* dlogits,
float* flags, int* flags,
Context* ctx); Context* ctx);
/*! loss.smooth_l1_loss */ /*! loss.smooth_l1_loss */
...@@ -477,7 +477,7 @@ void SoftmaxCrossEntropy( ...@@ -477,7 +477,7 @@ void SoftmaxCrossEntropy(
/*! loss.softmax_focal_loss */ /*! loss.softmax_focal_loss */
template <typename T, class Context> template <typename Tx, typename Ty, class Context>
void SoftmaxFocalLoss( void SoftmaxFocalLoss(
const int outer_dim, const int outer_dim,
const int axis_dim, const int axis_dim,
...@@ -487,14 +487,14 @@ void SoftmaxFocalLoss( ...@@ -487,14 +487,14 @@ void SoftmaxFocalLoss(
const float neg_alpha, const float neg_alpha,
const float gamma, const float gamma,
const int neg_id, const int neg_id,
const T* prob, const Tx* prob,
const T* labels, const Ty* labels,
const int* ignores, const int* ignores,
T* losses, Tx* losses,
T* flags, int* flags,
Context* ctx); Context* ctx);
template <typename T, class Context> template <typename Tx, typename Ty, class Context>
void SoftmaxFocalLossGrad( void SoftmaxFocalLossGrad(
const int outer_dim, const int outer_dim,
const int axis_dim, const int axis_dim,
...@@ -504,11 +504,11 @@ void SoftmaxFocalLossGrad( ...@@ -504,11 +504,11 @@ void SoftmaxFocalLossGrad(
const float neg_alpha, const float neg_alpha,
const float gamma, const float gamma,
const int neg_id, const int neg_id,
const T* prob, const Tx* prob,
const T* labels, const Ty* labels,
const int* ignores, const int* ignores,
T* dx, Tx* dx,
T* flags, int* flags,
Context* ctx); Context* ctx);
/*! loss.sparse_softmax_cross_entropy */ /*! loss.sparse_softmax_cross_entropy */
...@@ -522,8 +522,8 @@ void SparseSoftmaxCrossEntropy( ...@@ -522,8 +522,8 @@ void SparseSoftmaxCrossEntropy(
const Tx* prob, const Tx* prob,
const Ty* labels, const Ty* labels,
const int* ignores, const int* ignores,
float* losses, Tx* losses,
float* flags, int* flags,
Context* ctx); Context* ctx);
template <typename Tx, typename Ty, class Context> template <typename Tx, typename Ty, class Context>
...@@ -536,7 +536,7 @@ void SparseSoftmaxCrossEntropyGrad( ...@@ -536,7 +536,7 @@ void SparseSoftmaxCrossEntropyGrad(
const Ty* labels, const Ty* labels,
const int* ignores, const int* ignores,
Tx* dx, Tx* dx,
float* flags, int* flags,
Context* ctx); Context* ctx);
/*! misc.astype */ /*! misc.astype */
...@@ -548,6 +548,16 @@ void TypeA2B( ...@@ -548,6 +548,16 @@ void TypeA2B(
Tb* b, Tb* b,
Context* ctx); Context* ctx);
/*! misc.gradient */
template <typename T, class Context>
void GradientTwoSum(
const int count,
const T* dy1,
const T* dy2,
T* dx,
Context* ctx);
/*! misc.image_data */ /*! misc.image_data */
template <typename Tx, typename Ty, class Context> template <typename Tx, typename Ty, class Context>
...@@ -976,11 +986,18 @@ void SGDUpdate( ...@@ -976,11 +986,18 @@ void SGDUpdate(
/*! update.op_base */ /*! update.op_base */
template <typename T, class Context> template <typename T, class Context>
void MixedPrecisionL2Decay(
const int count,
const float alpha,
const T* w,
float* dx,
Context* ctx);
template <typename T, class Context>
void MixedPrecisionUpdate( void MixedPrecisionUpdate(
const int count, const int count,
const float* updates, const float* updates,
T* w, T* w,
T* g,
Context* ctx); Context* ctx);
/*! vision.bias_add */ /*! vision.bias_add */
......
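A reference CPU form of the new GradientTwoSum kernel declared above; whether it accumulates into dx or overwrites it is not shown, so accumulation is assumed here:

    void GradientTwoSumRef(int n, const float* dy1,
                           const float* dy2, float* dx) {
        for (int i = 0; i < n; ++i) dx[i] += dy1[i] + dy2[i];  // assumed +=
    }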
...@@ -37,6 +37,20 @@ inline std::vector<std::string> split( ...@@ -37,6 +37,20 @@ inline std::vector<std::string> split(
return ret; return ret;
} }
inline std::string replace_first(
const std::string& str,
const std::string& pattern,
const std::string& excepted) {
size_t pos = 0;
if ((pos = str.find(pattern)) != std::string::npos) {
std::string ret(str);
ret.replace(pos, pattern.size(), excepted);
return ret;
} else {
return str;
}
}
} // namespace str } // namespace str
} // namespace dragon } // namespace dragon
......
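Usage sketch: replace_first is what the argument-fetching macro earlier uses to expand "${ANCHOR}" in tensor descriptors; the descriptor and anchor below are illustrative:

    std::string desc = "/share/${ANCHOR}/starts";
    std::string key = dragon::str::replace_first(desc, "${ANCHOR}", "Crop_3");
    // key == "/share/Crop_3/starts"; only the first match is replaced.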
...@@ -269,7 +269,7 @@ void LoadONNXModel( ...@@ -269,7 +269,7 @@ void LoadONNXModel(
* * * *
* * * * * * * * * * * * * * * * * * * * */ * * * * * * * * * * * * * * * * * * * * */
void SetLogLevel(const std::string& level) { void SetLoggingLevel(const std::string& level) {
SetLogDestination(StrToLogSeverity(level)); SetLogDestination(StrToLogSeverity(level));
} }
......
...@@ -97,7 +97,7 @@ DRAGON_API std::string CreateGraph( ...@@ -97,7 +97,7 @@ DRAGON_API std::string CreateGraph(
DRAGON_API void RunGraph( DRAGON_API void RunGraph(
const std::string& graph_name, const std::string& graph_name,
Workspace_t ws, Workspace_t ws,
const int stream_id = 1); int stream_id = 0);
/* * * * * * * * * * * * * * * * * * * * * /* * * * * * * * * * * * * * * * * * * * *
* * * *
...@@ -156,7 +156,7 @@ DRAGON_API void LoadONNXModel( ...@@ -156,7 +156,7 @@ DRAGON_API void LoadONNXModel(
* * * *
* * * * * * * * * * * * * * * * * * * * */ * * * * * * * * * * * * * * * * * * * * */
DRAGON_API void SetLogLevel(const std::string& level); DRAGON_API void SetLoggingLevel(const std::string& level);
} // namespace dragon } // namespace dragon
......
...@@ -19,95 +19,45 @@ namespace dragon { ...@@ -19,95 +19,45 @@ namespace dragon {
namespace python { namespace python {
PyObject* CreateGradientDefsCC(PyObject* self, PyObject* args) { void AddGradientMethods(pybind11::module& m) {
PyObject* def_string = nullptr; m.def("CreateGradientDefs", [](
PyObject* py_g_outputs = nullptr; const string& forward_def,
if (!PyArg_ParseTuple(args, "SO!", const vector<string>& g_outputs) {
&def_string, &PyList_Type, &py_g_outputs)) { OperatorDef def;
PyErr_SetString(PyExc_ValueError, if (!def.ParseFromString(forward_def))
"Excepted a serialized string of OperatorDef " LOG(FATAL) << "Failed to parse the OperatorDef.";
"and a list containing outputs of this GradientOp."); if (!GradientRegistry()->Has(def.type()))
return nullptr; LOG(FATAL) << def.type() << "Op has no gradients.";
} Gradient grad = MakeGradientForOp(def, g_outputs);
OperatorDef def; vector<pybind11::bytes> grad_ops;
if (!def.ParseFromString(PyBytes_AsStringEx(def_string))) { for (const auto& e : grad.ops)
PyErr_SetString(PyExc_ValueError, grad_ops.push_back(e.SerializeAsString());
"Failed to parse the OperatorDef."); return std::tuple<
return nullptr; vector<pybind11::bytes>, vector<string>, vector<float>
} >(grad_ops, grad.g_inputs, grad.defaults);
if (!GradientRegistry()->Has(def.type())) { });
PyErr_SetString(PyExc_KeyError,
"This Operator does not register GradientOp.");
return nullptr;
}
vector<string> g_outputs;
PyList_AsVecString(py_g_outputs, g_outputs, "ignore");
Gradient grad = MakeGradientForOp(def, g_outputs);
PyObject* g_ops = PyList_New(grad.ops.size());
PyObject* g_input = PyList_New(grad.g_inputs.size());
PyObject* g_defaults = PyList_New(grad.defaults.size());
for (int i = 0; i < grad.ops.size(); i++) {
PyObject* e = String_AsPyBytes(grad.ops[i].SerializeAsString());
SetPyList(g_ops, i, e);
}
for (int i = 0; i < grad.g_inputs.size(); i++) {
PyObject* e = String_AsPyUnicode(grad.g_inputs[i]);
SetPyList(g_input, i, e);
}
for (int i = 0; i < grad.defaults.size(); i++) {
PyObject* e = PyFloat_FromDouble(grad.defaults[i]);
SetPyList(g_defaults, i, e);
}
PyObject* pack = PyTuple_Pack(3, g_ops, g_input, g_defaults);
Py_XDECREF(g_ops);
Py_XDECREF(g_input);
Py_XDECREF(g_defaults);
return pack;
}
PyObject* RunGradientFlowCC(PyObject* self, PyObject* args) { m.def("FlowGradients", [](
PyObject* py_fp_ops, *py_targets; const vector<OperatorDef*>& forward_ops,
PyObject* py_input_grads, *py_ignore_grads; const vector<string>& targets,
PyObject* py_share_grads, *py_export_graph; const vector<string>& input_grads,
if (!PyArg_ParseTuple(args, "OOOOOO", const vector<string>& ignore_grads,
&py_fp_ops, &py_targets, const bool is_sharing,
&py_input_grads, &py_ignore_grads, const bool verbose) {
&py_share_grads, &py_export_graph)) { // Make => Optimize => Run
PyErr_SetString(PyExc_ValueError, GraphDef backward_ops;
"Excepted a list of serialized input ops, targets, " GraphGradientMaker maker;
"input grads, ignore grads and whehter to share grads or log graph."); for (auto& grad : input_grads) maker.AddExternalGrad(grad);
return nullptr; for (auto& grad : ignore_grads) maker.AddIgnoreGrad(grad);
} maker.Make(forward_ops, targets, backward_ops);
// Make -> Optm -> Run if (is_sharing) maker.Share(backward_ops);
vector<string> targets, input_grads, ignore_grads; pybind11::gil_scoped_release g;
PyList_AsVecString(py_targets, targets, ""); for (auto& op : backward_ops.op()) {
PyList_AsVecString(py_input_grads, input_grads, ""); if (verbose) std::cout << op.DebugString() << std::endl;
PyList_AsVecString(py_ignore_grads, ignore_grads, ""); if (op.has_uid()) ws()->RunOperator(op);
GraphDef fp_ops, bp_ops; else ws()->RunOperatorOnce(op);
if (!fp_ops.ParseFromString(PyBytes_AsStringEx(py_fp_ops))) { }
PyErr_SetString(PyExc_RuntimeError, });
"Failed to parse the GraphDef of forward ops.");
return nullptr;
}
GraphGradientMaker maker;
for (auto& grad : input_grads) maker.AddExternalGrad(grad);
for (auto& grad : ignore_grads) maker.AddIgnoreGrad(grad);
maker.Make(fp_ops, targets, bp_ops);
bool share_grads = PyObject_IsTrue(py_share_grads) ? true : false;
bool export_graph = PyObject_IsTrue(py_export_graph) ? true : false;
if (share_grads) maker.Share("/share/buffer/grads", bp_ops);
if (export_graph) {
Tensor* tensor = ws()->CreateTensor(
"/graph_def/dynamic/gradient_flow")->Reshape({ 1 });
string* data = tensor->mutable_data<string, CPUContext>();
data[0] = bp_ops.SerializeAsString();
tensor = ws()->CreateTensor(
"/graph_def/dynamic/forward_flow")->Reshape({ 1 });
data = tensor->mutable_data<string, CPUContext>();
data[0] = fp_ops.SerializeAsString();
}
for (auto& op : bp_ops.op()) ws()->RunOperator(op);
Py_RETURN_TRUE;
} }
} // namespace python } // namespace python
......
...@@ -19,15 +19,10 @@ namespace dragon { ...@@ -19,15 +19,10 @@ namespace dragon {
namespace python { namespace python {
inline PyObject* SetLogLevelCC(PyObject* self, PyObject* args) { void AddConfigMethods(pybind11::module& m) {
char* cname; m.def("SetLoggingLevel", [](const string& level) {
if (!PyArg_ParseTuple(args, "s", &cname)) { SetLogDestination(StrToLogSeverity(level));
PyErr_SetString(PyExc_ValueError, });
"Excepted the logging level.");
return nullptr;
}
SetLogDestination(StrToLogSeverity(string(cname)));
Py_RETURN_TRUE;
} }
} // namespace python } // namespace python
......
...@@ -19,15 +19,34 @@ namespace python { ...@@ -19,15 +19,34 @@ namespace python {
#include "py_dragon.h" #include "py_dragon.h"
inline PyObject* IsCUDADriverSufficientCC(PyObject* self, PyObject* args) { void AddCUDAMethods(pybind11::module& m) {
m.def("IsCUDADriverSufficient", []() {
#ifdef WITH_CUDA #ifdef WITH_CUDA
int count; int count;
cudaError_t err = cudaGetDeviceCount(&count); cudaError_t err = cudaGetDeviceCount(&count);
if (err == cudaErrorInsufficientDriver) return PyBool_FromLong(0); if (err == cudaErrorInsufficientDriver) return false;
return PyBool_FromLong(1); return true;
#else #else
return PyBool_FromLong(0); return false;
#endif #endif
});
m.def("cudaGetDevice", []() {
return CUDAContext::active_device_id();
});
m.def("cudaStreamSynchronize", [](
int device_id, int stream_id) {
#ifdef WITH_CUDA
if (device_id < 0) device_id =
CUDAContext::active_device_id();
cudaStreamSynchronize(CUDAContext::cuda_object()
->GetStream(device_id, stream_id));
cudaError_t error = cudaGetLastError();
CHECK_EQ(error, cudaSuccess)
<< "\nCUDA Error: " << cudaGetErrorString(error);
#endif
});
} }
} // namespace python } // namespace python
......
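The binding pattern repeated in these files, reduced to a self-contained pybind11 sketch; the module and function names are illustrative:

    #include <pybind11/pybind11.h>
    namespace py = pybind11;

    PYBIND11_MODULE(example, m) {
        m.def("Synchronize", []() {
            py::gil_scoped_release g;  // let other Python threads run
            /* blocking device work would go here */
        });
    }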
...@@ -13,8 +13,9 @@ ...@@ -13,8 +13,9 @@
#ifndef DRAGON_PYTHON_PY_DRAGON_H_ #ifndef DRAGON_PYTHON_PY_DRAGON_H_
#define DRAGON_PYTHON_PY_DRAGON_H_ #define DRAGON_PYTHON_PY_DRAGON_H_
#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
#include "py_types.h" #include "py_types.h"
#include "py_macros.h"
#include "core/common.h" #include "core/common.h"
#include "core/registry.h" #include "core/registry.h"
#include "core/context.h" #include "core/context.h"
...@@ -25,6 +26,9 @@ ...@@ -25,6 +26,9 @@
#include "core/workspace.h" #include "core/workspace.h"
#include "utils/caffemodel.h" #include "utils/caffemodel.h"
#include <pybind11/stl.h>
#include <pybind11/pybind11.h>
namespace dragon { namespace dragon {
namespace python { namespace python {
...@@ -32,83 +36,80 @@ namespace python { ...@@ -32,83 +36,80 @@ namespace python {
class TensorFetcherBase { class TensorFetcherBase {
public: public:
virtual ~TensorFetcherBase() {} virtual ~TensorFetcherBase() {}
virtual PyObject* Fetch(const Tensor& tensor) = 0; virtual pybind11::object Fetch(const Tensor& tensor) = 0;
}; };
class TensorFeederBase { class TensorFeederBase {
public: public:
virtual ~TensorFeederBase() {} virtual ~TensorFeederBase() {}
virtual PyObject* Feed( virtual void Feed(
const DeviceOption& option, const DeviceOption& option,
PyArrayObject* array, PyArrayObject* array,
Tensor* tensor) = 0; Tensor* tensor) = 0;
}; };
DECLARE_TYPED_REGISTRY(TensorFetcherRegistry, TypeId, TensorFetcherBase); DECLARE_TYPED_REGISTRY(TensorFetcherRegistry, TypeId, TensorFetcherBase);
#define REGISTER_TENSOR_FETCHER(type, ...) \ #define REGISTER_TENSOR_FETCHER(type, ...) \
REGISTER_TYPED_CLASS(TensorFetcherRegistry, type, __VA_ARGS__) REGISTER_TYPED_CLASS(TensorFetcherRegistry, type, __VA_ARGS__)
inline TensorFetcherBase* CreateFetcher(TypeId type) { inline TensorFetcherBase* CreateFetcher(TypeId type) {
return TensorFetcherRegistry()->Create(type); return TensorFetcherRegistry()->Create(type);
} }
DECLARE_TYPED_REGISTRY(TensorFeederRegistry, TypeId, TensorFeederBase); DECLARE_TYPED_REGISTRY(TensorFeederRegistry, TypeId, TensorFeederBase);
#define REGISTER_TENSOR_FEEDER(type, ...) \ #define REGISTER_TENSOR_FEEDER(type, ...) \
REGISTER_TYPED_CLASS(TensorFeederRegistry, type, __VA_ARGS__) REGISTER_TYPED_CLASS(TensorFeederRegistry, type, __VA_ARGS__)
class NumpyFetcher : public TensorFetcherBase { class NumpyFetcher : public TensorFetcherBase {
public: public:
PyObject* Fetch(const Tensor& tensor) override { pybind11::object Fetch(const Tensor& tensor) override {
CHECK_GT(tensor.count(), 0); CHECK_GT(tensor.count(), 0);
vector<npy_intp> npy_dims; vector<npy_intp> npy_dims;
for (const auto dim : tensor.dims()) npy_dims.push_back(dim); for (const auto dim : tensor.dims()) npy_dims.push_back(dim);
int npy_type = TypeMetaToNPY(tensor.meta()); int npy_type = TypeMetaToNPY(tensor.meta());
if (npy_type == -1) { if (npy_type == -1) {
string s = "The data type of Tensor(" + LOG(FATAL) << "The data type of Tensor(" +
tensor.name() + ") is unknown. Have you solved it ?"; tensor.name() + ") is unknown. Have you solved it ?";
PyErr_SetString(PyExc_RuntimeError, s.c_str());
return nullptr;
} }
CHECK(tensor.memory()) << "\nIllegal memory access.";
// Create a empty array with the same shape // Create a empty array with the same shape
PyObject* array = PyArray_SimpleNew( PyObject* array = PyArray_SimpleNew(
tensor.ndim(), npy_dims.data(), npy_type); tensor.ndim(), npy_dims.data(), npy_type);
// Copy the tensor data to the numpy array // Copy the tensor data to the numpy array
if (tensor.memory_state() == MixedMemory::STATE_AT_CUDA) { if (tensor.memory_state() == MixedMemory::STATE_AT_CUDA) {
CUDAContext::Memcpy<CPUContext, CUDAContext>(tensor.nbytes(), CUDAContext::MemcpyEx<CPUContext, CUDAContext>(tensor.nbytes(),
PyArray_DATA(reinterpret_cast<PyArrayObject*>(array)), PyArray_DATA(reinterpret_cast<PyArrayObject*>(array)),
tensor.raw_data<CUDAContext>()); tensor.raw_data<CUDAContext>(),
tensor.memory()->device_id());
} else { } else {
CPUContext::Memcpy<CPUContext, CPUContext>(tensor.nbytes(), CPUContext::Memcpy<CPUContext, CPUContext>(tensor.nbytes(),
PyArray_DATA(reinterpret_cast<PyArrayObject*>(array)), PyArray_DATA(reinterpret_cast<PyArrayObject*>(array)),
tensor.raw_data<CPUContext>()); tensor.raw_data<CPUContext>());
} }
return array; return pybind11::reinterpret_steal<pybind11::object>(array);
} }
}; };
class StringFetcher : public TensorFetcherBase { class StringFetcher : public TensorFetcherBase {
public: public:
PyObject* Fetch(const Tensor& tensor) override { pybind11::object Fetch(const Tensor& tensor) override {
CHECK_GT(tensor.count(), 0); CHECK_EQ(tensor.count(), 1);
return String_AsPyBytes(*tensor.data<string, CPUContext>()); return pybind11::bytes(tensor.data<string, CPUContext>()[0]);
} }
}; };
class NumpyFeeder : public TensorFeederBase { class NumpyFeeder : public TensorFeederBase {
public: public:
PyObject* Feed( void Feed(
const DeviceOption& option, const DeviceOption& option,
PyArrayObject* original_array, PyArrayObject* original_array,
Tensor* tensor) override { Tensor* tensor) override {
PyArrayObject* array = PyArray_GETCONTIGUOUS(original_array); PyArrayObject* array = PyArray_GETCONTIGUOUS(original_array);
const TypeMeta& meta = TypeNPYToMeta(PyArray_TYPE(array)); const TypeMeta& meta = TypeNPYToMeta(PyArray_TYPE(array));
if (meta.id() == 0) { if (meta.id() == 0) LOG(FATAL) << "Unsupported data type.";
PyErr_SetString(PyExc_TypeError, "Unsupported data type."); tensor->SetMeta(meta);
return nullptr;
}
if (meta.id() != tensor->meta().id() && tensor->meta().id() != 0)
LOG(WARNING) << "Feed Tensor(" << tensor->name() << ")"
<< " with different data type from original one.";
int ndim = PyArray_NDIM(array); int ndim = PyArray_NDIM(array);
npy_intp* npy_dims = PyArray_DIMS(array); npy_intp* npy_dims = PyArray_DIMS(array);
vector<int64_t> dims; vector<int64_t> dims;
...@@ -116,21 +117,22 @@ class NumpyFeeder : public TensorFeederBase { ...@@ -116,21 +117,22 @@ class NumpyFeeder : public TensorFeederBase {
tensor->Reshape(dims); tensor->Reshape(dims);
if (option.device_type() == PROTO_CUDA) { if (option.device_type() == PROTO_CUDA) {
#ifdef WITH_CUDA #ifdef WITH_CUDA
CUDAContext context(option); CUDAContext::MemcpyEx<CUDAContext, CPUContext>(
context.SwitchToDevice(); tensor->nbytes(),
auto* data = tensor->raw_mutable_data<CUDAContext>(meta); tensor->raw_mutable_data<CUDAContext>(),
context.Memcpy<CUDAContext, CPUContext>(tensor->nbytes(), static_cast<void*>(PyArray_DATA(array)),
data, static_cast<void*>(PyArray_DATA(array))); option.device_id());
#else #else
LOG(FATAL) << "CUDA was not compiled."; LOG(FATAL) << "CUDA was not compiled.";
#endif #endif
} else { } else {
auto* data = tensor->raw_mutable_data<CPUContext>(meta); auto* data = tensor->raw_mutable_data<CPUContext>();
CPUContext::Memcpy<CPUContext, CPUContext>(tensor->nbytes(), CPUContext::Memcpy<CPUContext, CPUContext>(
data, static_cast<void*>(PyArray_DATA(array))); tensor->nbytes(),
tensor->raw_mutable_data<CPUContext>(),
static_cast<void*>(PyArray_DATA(array)));
} }
Py_XDECREF(array); Py_XDECREF(array);
Py_RETURN_TRUE;
} }
}; };
......
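NumpyFeeder and NumpyFetcher are what a Python round-trip ultimately hits. A hedged sketch, assuming the workspace module wraps them as FeedTensor/FetchTensor (names not shown in this hunk):

    import numpy as np
    import dragon.core.workspace as workspace

    # Feeding routes through NumpyFeeder (host->device copy under CUDA)
    workspace.FeedTensor('x', np.ones((2, 3), dtype=np.float32))
    # Fetching routes through NumpyFetcher and returns a new ndarray
    print(workspace.FetchTensor('x'))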
...@@ -19,66 +19,41 @@ namespace dragon { ...@@ -19,66 +19,41 @@ namespace dragon {
namespace python { namespace python {
inline PyObject* CreateGraphCC(PyObject* self, PyObject* args) { void AddGraphMethods(pybind11::module& m) {
PyObject* graph_str, *verbose; /*! \brief Create a graph from the serialized def */
if (!PyArg_ParseTuple(args, "S|O", &graph_str, &verbose)) { m.def("CreateGraph", [](
PyErr_SetString(PyExc_ValueError, const string& serialized,
"Excepted a serialized string of GraphDef."); const bool verbose) {
return nullptr; GraphDef graph_def;
} if (!graph_def.ParseFromString(serialized))
if (verbose == nullptr) verbose = Py_False; LOG(FATAL) << "Failed to parse the GraphDef.";
auto* graph = ws()->CreateGraph(graph_def);
GraphDef graph_def; if (verbose) {
if (!graph_def.ParseFromString(PyBytes_AsStringEx(graph_str))) { // It is not a good design to print the debug string
PyErr_SetString(PyExc_RuntimeError,
"Failed to parse the GraphDef.");
return nullptr;
}
auto* graph = ws()->CreateGraph(graph_def);
if (!graph) {
PyErr_SetString(PyExc_RuntimeError,
"Failed to create the Graph.");
return nullptr;
} else {
// It is not a good design to print the debug string
if (PyObject_IsTrue(verbose) ? true : false) {
auto* graph_tensor = ws()->CreateTensor( auto* graph_tensor = ws()->CreateTensor(
"/graph_def/optimized/" + graph->name()); "/graph_def/optimized/" + graph->name());
if (graph_tensor->count() > 0) { if (graph_tensor->count() > 0) {
auto* data = graph_tensor->mutable_data<string, CPUContext>(); auto* data = graph_tensor->mutable_data<string, CPUContext>();
std::cout << data[0] << std::endl; std::cout << data[0] << std::endl;
} }
}
}
// Return the graph name may be different from the def
// We will make a unique dummy name on creating the graph
return String_AsPyUnicode(graph->name());
}
inline PyObject* RunGraphCC(PyObject* self, PyObject* args) {
char* cname, *include, *exclude;
if (!PyArg_ParseTuple(args, "sss",
&cname, &include, &exclude)) {
PyErr_SetString(PyExc_ValueError,
"Excepted the graph name, include and exclude rules.");
return nullptr;
}
ws()->RunGraph(
string(cname),
string(include),
string(exclude)
);
Py_RETURN_TRUE;
}
inline PyObject* GraphsCC(PyObject* self, PyObject* args) { }
vector<string> graphs = ws()->GetGraphs(); // The returned graph name may differ from the def;
PyObject* list = PyList_New(graphs.size()); // a unique dummy name is made when creating the graph
for (int i = 0; i < graphs.size(); i++) return graph->name();
CHECK_EQ(PyList_SetItem(list, i, String_AsPyUnicode(graphs[i])), 0); });
return list;
/*! \brief Run an existing graph */
m.def("RunGraph", [](
const string& name,
const string& include,
const string& exclude) {
pybind11::gil_scoped_release g;
ws()->RunGraph(name, include, exclude);
});
/*! \brief List all of the existing graphs */
m.def("Graphs", []() { ws()->GetGraphs(); });
} }
} // namespace python } // namespace python
......
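A usage sketch of the two bindings, assuming GraphDef is generated into dragon_pb2; note that CreateGraph returns the (possibly uniquified) graph name, which is what RunGraph expects:

    import dragon.import_c_api as C
    from dragon.proto import dragon_pb2 as pb

    graph_def = pb.GraphDef()
    graph_def.name = 'demo'
    graph_name = C.CreateGraph(graph_def.SerializeToString(), False)
    C.RunGraph(graph_name, '', '')  # empty include/exclude rules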
...@@ -19,48 +19,42 @@ namespace dragon { ...@@ -19,48 +19,42 @@ namespace dragon {
namespace python { namespace python {
inline PyObject* SnapshotCC(PyObject* self, PyObject* args) { void AddIOMethods(pybind11::module& m) {
char* path; int format; m.def("Snapshot", [](
PyObject* names; vector<Tensor*> tensors; const string& filename,
if (!PyArg_ParseTuple(args, "sOi", &path, &names, &format)) { vector<string>& names,
PyErr_SetString(PyExc_ValueError, const int format) {
"Excepted the model path, tensors, and data format."); vector<Tensor*> tensors;
return nullptr; switch (format) {
} case 0: // Pickle
switch (format) { LOG(FATAL) << "Format depends on Pickle. "
case 0: // Pickle "Can't be used in C++.";
PyErr_SetString(PyExc_NotImplementedError, break;
"Format depends on Pickle. Can't be used in C++."); case 1: // CaffeModel
break; for (const auto& e : names)
case 1: // CaffeModel tensors.emplace_back(ws()->GetTensor(e));
for (int i = 0; i < PyList_Size(names); i++) SavaCaffeModel(filename, tensors);
tensors.push_back(ws()->GetTensor( break;
PyString_AsString(PyList_GetItem(names, i)))); default:
SavaCaffeModel(path, tensors); LOG(FATAL) << "Unknwon format, code: " << format;
break; }
default: LOG(FATAL) << "Unknwon format, code: " << format; });
}
Py_RETURN_TRUE;
}
inline PyObject* RestoreCC(PyObject* self, PyObject* args) { m.def("Restore", [](
char* path; int format; const string& filename,
if (!PyArg_ParseTuple(args, "si", &path, &format)) { const int format) {
PyErr_SetString(PyExc_ValueError, switch (format) {
"Excepted the model path and data format."); case 0: // Pickle
return nullptr; LOG(FATAL) << "Format depends on Pickle. "
} "Can't be used in C++.";
switch (format) { break;
case 0: // Pickle case 1: // CaffeModel
PyErr_SetString(PyExc_NotImplementedError, LoadCaffeModel(filename, ws());
"Format depends on Pickle. Can't be used in C++."); break;
break; default:
case 1: // CaffeModel LOG(FATAL) << "Unknwon format, code: " << format;
LoadCaffeModel(path, ws()); }
break; });
default: LOG(FATAL) << "Unknwon format, code: " << format; LOG(FATAL) << "Unknown format, code: " << format;
}
Py_RETURN_TRUE;
} }
} // namespace python } // namespace python
......
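A sketch of the C++-side checkpointing, with a hypothetical path and tensor names; only format 1 (CaffeModel) is usable here, since format 0 requires Python's pickle:

    import dragon.import_c_api as C

    C.Snapshot('/tmp/net.caffemodel', ['conv1/weight', 'conv1/bias'], 1)
    C.Restore('/tmp/net.caffemodel', 1)  # loads back into the workspace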
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_PYTHON_PY_MACROS_H_
#define DRAGON_PYTHON_PY_MACROS_H_
#include <string>
#include <sstream>
#include <Python.h>
#include <numpy/arrayobject.h>
namespace dragon {
namespace python {
#ifdef WITH_PYTHON3
#define PyInt_FromLong PyLong_FromLong
#define _PyInt_AsInt _PyLong_AsInt
#define PyString_AsString PyUnicode_AsUTF8
#endif
/*!
* ------------------------------------------------------------
*
* <Having Fun with PyString>
*
* For Python3, Get/Return PyUnicode for regular string.
* For Python3, Get/Return PyBytes for google-protobuf.
* For Python2, Get/Return PyBytes only.
*
* ------------------------------------------------------------
*/
#define PyBytes_AsStringEx(pystring) \
std::string(PyBytes_AsString(pystring), PyBytes_Size(pystring))
// Return string to Python
inline PyObject* String_AsPyBytes(const std::string& cstring) {
return PyBytes_FromStringAndSize(cstring.c_str(), cstring.size());
}
inline PyObject* String_AsPyUnicode(const std::string& cstring) {
#ifdef WITH_PYTHON3
return PyUnicode_FromStringAndSize(cstring.c_str(), cstring.size());
#else
return PyBytes_FromStringAndSize(cstring.c_str(), cstring.size());
#endif
}
// Macros
#define PyList_AsVecString(plist, vs, defaults) \
for (int i = 0; i < PyList_Size(plist); i++) { \
PyObject* e = PyList_GetItem(plist, i); \
if (e == Py_None) vs.emplace_back(defaults); \
else vs.push_back(PyString_AsString(PyObject_Str(e))); \
}
#define SetPyList(plist, ix, e) \
PyList_SetItem(plist, ix, e)
#define SetPyDictS2S(object, key, value) \
PyDict_SetItemString(object, key, Py_BuildValue("s", value))
#define SetPyDictS2I(object, key, value) \
PyDict_SetItemString(object, key, Py_BuildValue("i", value))
// Misc
template <typename T>
inline void MakeStringInternal(std::stringstream& ss, const T& t) { ss << t; }
template <typename T,typename ... Args>
inline void MakeStringInternal(std::stringstream& ss, const T& t, const Args& ... args) {
MakeStringInternal(ss, t);
MakeStringInternal(ss, args...);
}
template <typename ... Args>
std::string MakeString(const Args&... args) {
std::stringstream ss;
MakeStringInternal(ss, args...);
return std::string(ss.str());
}
inline void PrErr_SetString(PyObject* type, const std::string& str) {
PyErr_SetString(type, str.c_str());
}
} // namespace python
} // namespace dragon
#endif // DRAGON_PYTHON_PY_MACROS_H_
\ No newline at end of file
...@@ -15,125 +15,126 @@ ...@@ -15,125 +15,126 @@
#include "py_dragon.h" #include "py_dragon.h"
namespace dragon {
namespace python {
#ifdef WITH_MPI #ifdef WITH_MPI
#include <mpi.h> #include <mpi.h>
#endif
inline PyObject* MPIInitCC(PyObject* self, PyObject* args) { namespace dragon {
int thread_type;
MPI_Init_thread(NULL, NULL, MPI_THREAD_MULTIPLE, &thread_type);
CHECK_EQ(thread_type, MPI_THREAD_MULTIPLE)
<< "\nRequire to enable <MPI_THREAD_MULTIPLE> support.";
Py_RETURN_TRUE;
}
inline PyObject* MPIFinalizeCC(PyObject* self, PyObject* args) { namespace python {
MPI_Finalize();
Py_RETURN_TRUE;
}
inline PyObject* MPIRankCC(PyObject* self, PyObject* args) { void AddMPIMethods(pybind11::module& m) {
int world_rank; m.def("MPIInit", []() {
MPI_Comm_rank(MPI_COMM_WORLD, &world_rank); #ifdef WITH_MPI
return PyInt_FromLong(world_rank); // Enabling multi-threading for Python is usually meaningless,
} // but we still keep this interface here
int thread_type;
char* mt_is_required = nullptr;
mt_is_required = getenv("DRAGON_MPI_THREADS_ENABLE");
if (mt_is_required != nullptr && string(mt_is_required) == "1") {
MPI_Init_thread(NULL, NULL, MPI_THREAD_MULTIPLE, &thread_type);
CHECK_EQ(thread_type, MPI_THREAD_MULTIPLE)
<< "\nRequire to enable <MPI_THREAD_MULTIPLE> support.";
} else {
MPI_Init_thread(NULL, NULL, MPI_THREAD_SINGLE, &thread_type);
}
#else
LOG(FATAL) << "MPI was not compiled.";
#endif
});
inline PyObject* MPISizeCC(PyObject* self, PyObject* args) { m.def("MPIRank", []() {
int world_size; #ifdef WITH_MPI
MPI_Comm_size(MPI_COMM_WORLD, &world_size); int world_rank;
return PyInt_FromLong(world_size); MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
} return world_rank;
#else
LOG(FATAL) << "MPI was not compiled.";
#endif
});
inline PyObject* MPICreateGroupCC(PyObject* self, PyObject* args) { m.def("MPISize", []() {
PyObject *incl, *excl, *ret; #ifdef WITH_MPI
int local_root, world_size; int world_size;
if (!PyArg_ParseTuple(args, "iOO", &local_root, &incl, &excl)) { MPI_Comm_size(MPI_COMM_WORLD, &world_size);
PyErr_SetString(PyExc_ValueError, return world_size;
"Excepted the local root, include and exclued list."); #else
return nullptr; LOG(FATAL) << "MPI was not compiled.";
} #endif
MPI_Group world_group, local_group; });
MPI_Comm local_comm;
int err_code; m.def("MPICreateGroup", [](
MPI_Comm_group(MPI_COMM_WORLD, &world_group); const int local_root,
MPI_Comm_size(MPI_COMM_WORLD, &world_size); const vector<int>& incl,
set<int> all_ranks; const vector<int>& excl) {
for (int i = 0; i < world_size; i++) all_ranks.insert(i); #ifdef WITH_MPI
local_group = world_group; int world_size;
MPI_Group world_group, local_group;
// Check inclue ranks MPI_Comm local_comm;
int size = (int)PyList_Size(incl); int err_code;
if (size > 0) { MPI_Comm_group(MPI_COMM_WORLD, &world_group);
all_ranks.clear(); MPI_Comm_size(MPI_COMM_WORLD, &world_size);
unique_ptr<int> incl_ranks(new int[size]);
int* ranks = incl_ranks.get(); set<int> all_ranks;
for (int i = 0; i < size; i++) { for (int i = 0; i < world_size; i++) all_ranks.insert(i);
ranks[i] = _PyInt_AsInt(PyList_GetItem(incl, i)); local_group = world_group;
all_ranks.insert(ranks[i]);
} // Check include ranks
err_code = MPI_Group_incl(world_group, size, ranks, &local_group); if (!incl.empty()) {
CHECK(err_code == MPI_SUCCESS) << "\nFail to create mpi group."; all_ranks.clear();
} for (auto e : incl) all_ranks.insert(e);
err_code = MPI_Group_incl(world_group,
// Check exclude ranks (int)incl.size(), incl.data(), &local_group);
size = (int)PyList_Size(excl); CHECK(err_code == MPI_SUCCESS)
if (size > 0) { << "\nFailed to create the MPI group.";
all_ranks.clear(); Set<int> tmp;
unique_ptr<int> excl_ranks(new int[size]);
int* ranks = excl_ranks.get();
for (int i = 0; i < size; i++) {
ranks[i] = _PyInt_AsInt(PyList_GetItem(excl, i));
tmp.insert(ranks[i]);
} }
for (int i = 0; i < world_size; i++)
if (!tmp.count(i)) all_ranks.insert(i);
err_code = MPI_Group_excl(world_group, size, ranks, &local_group);
CHECK(err_code == MPI_SUCCESS) << "Fail to create mpi group.";
}
err_code = MPI_Comm_create(MPI_COMM_WORLD, local_group, &local_comm); // Check exclude ranks
CHECK(err_code == MPI_SUCCESS) << "Fail to create mpi group."; if (!excl.empty()) {
all_ranks.clear(); Set<int> tmp;
for (auto e : excl) tmp.insert(e);
for (int i = 0; i < world_size; i++)
if (!tmp.count(i)) all_ranks.insert(i);
err_code = MPI_Group_excl(world_group,
(int)excl.size(), excl.data(), &local_group);
CHECK(err_code == MPI_SUCCESS)
<< "\nFail to create MPI Group.";
}
if (local_comm != MPI_COMM_NULL) { err_code = MPI_Comm_create(MPI_COMM_WORLD, local_group, &local_comm);
int world_rank, local_size; CHECK(err_code == MPI_SUCCESS) << "\nFailed to create the MPI group.";
MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
if (world_rank == local_root) { if (local_comm != MPI_COMM_NULL) {
MPI_Comm_size(local_comm, &local_size); int world_rank, local_size;
std::stringstream ss; MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
ss << "Rank[" << world_rank << "]: " if (world_rank == local_root) {
<< "Create a mpi group of " << local_size << " members"; MPI_Comm_size(local_comm, &local_size);
ss << "\nGroup: ["; std::stringstream ss;
for (auto rank : all_ranks) { ss << "Rank[" << world_rank << "]: "
if (rank != local_root) ss << rank << ", "; << "Create an MPI group of " << local_size << " members";
else ss << rank << "*, "; ss << "\nGroup: [";
for (auto rank : all_ranks) {
if (rank != local_root) ss << rank << ", ";
else ss << rank << "*, ";
}
string log_info = ss.str(); log_info[log_info.size() - 2] = ']';
LOG(INFO) << log_info;
} }
string log_info = ss.str(); log_info[log_info.size() - 2] = ']';
LOG(INFO) << log_info;
} }
} return vector<long>({ (long)local_comm, (long)local_group });
ret = PyList_New(2); #else
PyList_SetItem(ret, 0, PyInt_FromLong((long)local_comm)); LOG(FATAL) << "MPI was not compiled.";
PyList_SetItem(ret, 1, PyInt_FromLong((long)local_group)); #endif
return ret; });
}
#else // WITH_MPI
#define MPI_NOT_IMPLEMENTED \
LOG(FATAL) << "MPI was not compiled."; \
Py_RETURN_TRUE
inline PyObject* MPIInitCC(PyObject* self, PyObject* args) { MPI_NOT_IMPLEMENTED; } m.def("MPIFinalize", []() {
inline PyObject* MPIFinalizeCC(PyObject* self, PyObject* args) { MPI_NOT_IMPLEMENTED; } #ifdef WITH_MPI
inline PyObject* MPIRankCC(PyObject* self, PyObject* args) { MPI_NOT_IMPLEMENTED; } MPI_Finalize();
inline PyObject* MPISizeCC(PyObject* self, PyObject* args) { MPI_NOT_IMPLEMENTED; } #else
inline PyObject* MPICreateGroupCC(PyObject* self, PyObject* args) { MPI_NOT_IMPLEMENTED; } LOG(FATAL) << "MPI was not compiled.";
#endif
#endif // WITH_MPI });
}
} // namespace python } // namespace python
......
/*! /*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd. * Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
* *
* Licensed under the BSD 2-Clause License. * Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License * You should have received a copy of the BSD 2-Clause License
* along with the Xpensource.org/licenses/BSD-2-Clause> * along with the software. If not, See,
* *
* ------------------------------------------------------------ * <https://opensource.org/licenses/BSD-2-Clause>
*/ *
* ------------------------------------------------------------
*/
#ifndef DRAGON_PYTHON_PY_ONNX_H_ #ifndef DRAGON_PYTHON_PY_ONNX_H_
#define DRAGON_PYTHON_PY_ONNX_H_ #define DRAGON_PYTHON_PY_ONNX_H_
...@@ -19,21 +21,18 @@ namespace dragon { ...@@ -19,21 +21,18 @@ namespace dragon {
namespace python { namespace python {
inline PyObject* ImportONNXModelCC(PyObject* self, PyObject* args) { void AddONNXMethods(pybind11::module& m) {
char* model_path; m.def("ImportONNXModel", [](
if (!PyArg_ParseTuple(args, "s", &model_path)) { const string& model_path) {
PyErr_SetString(PyExc_ValueError, GraphDef init_graph, pred_graph;
"Excepted the model path."); onnx::ONNXBackend onnx_backend;
return nullptr; onnx_backend.Prepare(model_path, &init_graph, &pred_graph);
} // Serializing to Python is intractable
GraphDef init_graph, pred_graph; // We should apply the initializer immediately
onnx::ONNXBackend onnx_backend; ws()->CreateGraph(init_graph);
onnx_backend.Prepare(model_path, &init_graph, &pred_graph); ws()->RunGraph(init_graph.name(), "", "");
// Serializing to Python is intractable return pybind11::bytes(pred_graph.SerializeAsString());
// We should apply the initializer immediately });
ws()->CreateGraph(init_graph);
ws()->RunGraph(init_graph.name(), "", "");
return String_AsPyBytes(pred_graph.SerializeAsString());
} }
} // namespace python } // namespace python
......
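A sketch of importing an ONNX model, with a hypothetical path; the initializer graph is already run inside the binding, so only the serialized predict graph comes back:

    import dragon.import_c_api as C
    from dragon.proto import dragon_pb2 as pb

    serialized = C.ImportONNXModel('/path/to/model.onnx')
    pred_graph = pb.GraphDef()
    pred_graph.ParseFromString(serialized)
    graph_name = C.CreateGraph(serialized, False)  # ready to run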
...@@ -19,91 +19,38 @@ namespace dragon { ...@@ -19,91 +19,38 @@ namespace dragon {
namespace python { namespace python {
inline PyObject* RegisteredOperatorsCC(PyObject* self, PyObject* args) { void AddOperatorMethods(pybind11::module& m) {
set<string> all_keys; /*! \brief Return all the registered operators */
for (const auto& name : CPUOperatorRegistry()->keys()) all_keys.insert(name); m.def("RegisteredOperators", []() { return CPUOperatorRegistry()->keys(); });
PyObject* list = PyList_New(all_keys.size());
int idx = 0; /*! \brief Return all the operators without gradients */
for (const string& name : all_keys) m.def("NoGradientOperators", []() { return NoGradientRegistry()->keys(); });
CHECK_EQ(PyList_SetItem(list, idx++, String_AsPyUnicode(name)), 0);
return list; /*! \brief Run an operator from the def reference */
} m.def("RunOperator", [](
OperatorDef* def,
inline PyObject* NoGradientOperatorsCC(PyObject* self, PyObject* args) { const bool verbose) {
set<string> all_keys; pybind11::gil_scoped_release g;
for (const auto& name : NoGradientRegistry()->keys()) all_keys.insert(name); if (verbose) {
PyObject* list = PyList_New(all_keys.size()); // It is not a good design to print the debug string
int idx = 0; std::cout << def->DebugString() << std::endl;
for (const string& name : all_keys) }
CHECK_EQ(PyList_SetItem(list, idx++, String_AsPyUnicode(name)), 0); ws()->RunOperator(*def);
return list; });
}
/*! \brief Run an operator from the serialized def */
inline PyObject* RunOperatorCC(PyObject* self, PyObject* args) { m.def("RunOperator", [](
PyObject* op_str; const string& serialized,
if (!PyArg_ParseTuple(args, "S", &op_str)) { const bool verbose) {
PyErr_SetString(PyExc_ValueError, OperatorDef def;
"Excepted a serialized string of OperatorDef."); CHECK(def.ParseFromString(serialized));
return nullptr; pybind11::gil_scoped_release g;
} if (verbose) {
OperatorDef op_def; // It is not a good design to print the debug string
if (!op_def.ParseFromString(PyBytes_AsStringEx(op_str))) { std::cout << def.DebugString() << std::endl;
PyErr_SetString(PyExc_RuntimeError, }
"Failed to parse the OperatorDef."); ws()->RunOperatorOnce(def);
return nullptr; });
}
ws()->RunOperator(op_def);
Py_RETURN_TRUE;
}
inline PyObject* RunOperatorsCC(PyObject* self, PyObject* args) {
PyObject* py_ops;
if (!PyArg_ParseTuple(args, "O", &py_ops)) {
PyErr_SetString(PyExc_ValueError,
"Excepted a list of serialized string of OperatorDef.");
return nullptr;
}
OperatorDef op_def;
for (int i = 0; i < PyList_Size(py_ops); i++) {
PyObject* op_str = PyList_GetItem(py_ops, i);
CHECK(op_def.ParseFromString(PyBytes_AsStringEx(op_str)));
ws()->RunOperator(op_def);
}
Py_RETURN_TRUE;
}
inline PyObject* CreatePersistentOpCC(PyObject* self, PyObject* args) {
PyObject* op_str;
if (!PyArg_ParseTuple(args, "S", &op_str)) {
PyErr_SetString(PyExc_ValueError,
"Excepted a serialized string of OperatorDef.");
return nullptr;
}
OperatorDef op_def;
if (!op_def.ParseFromString(PyBytes_AsStringEx(op_str))) {
PyErr_SetString(PyExc_RuntimeError,
"Failed to parse the OperatorDef.");
return nullptr;
}
ws()->CreatePersistentOp(op_def);
Py_RETURN_TRUE;
}
inline PyObject* RunPersistentOpCC(PyObject* self, PyObject* args) {
char* key, *anchor;
PyObject* py_inputs, *py_outputs;
if (!PyArg_ParseTuple(args, "ssOO",
&key, &anchor, &py_inputs, &py_outputs)) {
PyErr_SetString(PyExc_ValueError,
"Excepted a persistent key, anchor, "
"list of inputs and outputs.");
return nullptr;
}
vector<string> inputs, outputs;
PyList_AsVecString(py_inputs, inputs, "");
PyList_AsVecString(py_outputs, outputs, "");
ws()->RunPersistentOp(key, anchor, inputs, outputs);
Py_RETURN_TRUE;
} }
} // namespace python } // namespace python
......
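A sketch pairing the serialized overload with the MakeOperatorDef helper updated later in this commit; the op type and tensor names are illustrative:

    import dragon.import_c_api as C
    from dragon.core.proto_utils import MakeOperatorDef

    op = MakeOperatorDef('Relu', inputs=['x'], outputs=['y'])
    C.RunOperator(op.SerializeToString(), False)  # GIL released while running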
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_PYTHON_PY_PROTO_H_
#define DRAGON_PYTHON_PY_PROTO_H_
#include "py_dragon.h"
namespace dragon {
namespace python {
void AddProtoMethods(pybind11::module& m) {
/*! \brief Extended C-Style OperatorDef */
pybind11::class_<OperatorDef>(m, "OperatorDef")
.def(pybind11::init())
.def("CopyFrom", [](
OperatorDef* self,
OperatorDef* other) {
self->CopyFrom(*other);
}).def("ParseFrom", [](
OperatorDef* self,
const string& serialized) {
self->ParseFromString(serialized);
}).def("SerializeAs", [](
OperatorDef* self) {
return pybind11::bytes(self->SerializeAsString());
}).def("add_input", [](
OperatorDef* self,
const string& input) {
self->add_input(input);
}).def("add_output", [](
OperatorDef* self,
const string& output) {
self->add_output(output);
}).def_property("name",
[](OperatorDef* self) {
return self->name(); },
[](OperatorDef* self, const string& name) {
self->set_name(name);
}).def_property("type",
[](OperatorDef* self) {
return self->type(); },
[](OperatorDef* self, const string& type) {
self->set_type(type);
}).def_property("input",
[](OperatorDef* self) -> vector<string> {
return { self->input().begin(), self->input().end() }; },
[](OperatorDef* self, const vector<string>& input) {
*(self->mutable_input()) = { input.begin(), input.end() };
}).def_property("output",
[](OperatorDef* self) -> vector<string> {
return{ self->output().begin(), self->output().end() }; },
[](OperatorDef* self, const vector<string>& output) {
*(self->mutable_output()) = { output.begin(), output.end() };
});
m.def("TestOperatorDefs", [](vector<OperatorDef*> defs) {
for (auto* def : defs) {
std::cout << def->DebugString() << std::endl;
}
});
}
} // namespace python
} // namespace dragon
#endif  // DRAGON_PYTHON_PY_PROTO_H_
\ No newline at end of file
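A sketch of driving the bound class from Python; this is essentially what MakeCXXOperatorDef (added later in this commit) automates:

    import dragon.import_c_api as C

    op = C.OperatorDef()
    op.type = 'Relu'
    op.add_input('x')
    op.add_output('y')
    # The def-reference overload of RunOperator avoids reserializing
    C.RunOperator(op, False)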
...@@ -13,6 +13,7 @@ ...@@ -13,6 +13,7 @@
#ifndef DRAGON_PYTHON_PY_TYPES_H_ #ifndef DRAGON_PYTHON_PY_TYPES_H_
#define DRAGON_PYTHON_PY_TYPES_H_ #define DRAGON_PYTHON_PY_TYPES_H_
#include <string>
#include <numpy/arrayobject.h> #include <numpy/arrayobject.h>
#include "core/types.h" #include "core/types.h"
...@@ -31,6 +32,7 @@ inline const int TypeMetaToNPY(const TypeMeta& meta) { ...@@ -31,6 +32,7 @@ inline const int TypeMetaToNPY(const TypeMeta& meta) {
{ TypeMeta::Id<float16>(), NPY_FLOAT16 }, { TypeMeta::Id<float16>(), NPY_FLOAT16 },
{ TypeMeta::Id<float>(), NPY_FLOAT32 }, { TypeMeta::Id<float>(), NPY_FLOAT32 },
{ TypeMeta::Id<double>(), NPY_FLOAT64 }, { TypeMeta::Id<double>(), NPY_FLOAT64 },
{ TypeMeta::Id<std::string>(), NPY_OBJECT },
}; };
return m2npy_type_map.count(meta.id()) ? m2npy_type_map[meta.id()] : -1; return m2npy_type_map.count(meta.id()) ? m2npy_type_map[meta.id()] : -1;
} }
...@@ -45,6 +47,8 @@ inline const TypeMeta& TypeNPYToMeta(int npy_type) { ...@@ -45,6 +47,8 @@ inline const TypeMeta& TypeNPYToMeta(int npy_type) {
{ NPY_FLOAT16, TypeMeta::Make<float16>() }, { NPY_FLOAT16, TypeMeta::Make<float16>() },
{ NPY_FLOAT32, TypeMeta::Make<float>() }, { NPY_FLOAT32, TypeMeta::Make<float>() },
{ NPY_FLOAT64, TypeMeta::Make<double>() }, { NPY_FLOAT64, TypeMeta::Make<double>() },
{ NPY_UNICODE, TypeMeta::Make<std::string>() },
{ NPY_STRING, TypeMeta::Make<std::string>() },
}; };
static TypeMeta unknown_type; static TypeMeta unknown_type;
return npy2m_type_map.count(npy_type) ? return npy2m_type_map.count(npy_type) ?
......
...@@ -24,6 +24,7 @@ from dragon.core.tensor import Tensor ...@@ -24,6 +24,7 @@ from dragon.core.tensor import Tensor
import dragon.core.workspace as workspace import dragon.core.workspace as workspace
import dragon.core.tensor_utils as tensor_utils import dragon.core.tensor_utils as tensor_utils
import dragon.core.mpi as mpi import dragon.core.mpi as mpi
import dragon.core.cuda as cuda
import dragon.memonger as memonger import dragon.memonger as memonger
# Operators # Operators
......
...@@ -23,7 +23,7 @@ option = {} ...@@ -23,7 +23,7 @@ option = {}
# The current device, 'CPU', 'CUDA' or 'CNML' # The current device, 'CPU', 'CUDA' or 'CNML'
option['device'] = 'CPU' option['device'] = 'CPU'
# The device id # The device index
option['device_id'] = 0 option['device_id'] = 0
# Whether to use cuDNN if possible # Whether to use cuDNN if possible
...@@ -32,8 +32,8 @@ option['use_cudnn'] = False ...@@ -32,8 +32,8 @@ option['use_cudnn'] = False
# The global random seed # The global random seed
option['random_seed'] = 3 option['random_seed'] = 3
# Disable the memonger if true # Set the level of graph optimization
option['debug_mode'] = False option['graph_optimization_level'] = 3
# Whether to share grads # Whether to share grads
option['share_grads'] = True option['share_grads'] = True
...@@ -76,29 +76,13 @@ def EnableCPU(): ...@@ -76,29 +76,13 @@ def EnableCPU():
option['device'] = 'CPU' option['device'] = 'CPU'
def IsCUDADriverSufficient():
"""Is CUDADriver sufficient?
Returns
-------
boolean
``True`` if your device(s) support CUDA otherwise ``False``.
References
----------
The wrapper of ``IsCUDADriverSufficientCC``.
"""
return C.IsCUDADriverSufficientCC()
def EnableCUDA(gpu_id=0, use_cudnn=True): def EnableCUDA(gpu_id=0, use_cudnn=True):
"""Enable NVIDIA's CUDA mode globally. """Enable NVIDIA's CUDA mode globally.
Parameters Parameters
---------- ----------
gpu_id : int gpu_id : int
The id of GPU to use. The index of GPU to use.
use_cudnn : boolean use_cudnn : boolean
Whether to use cuDNN if available. Whether to use cuDNN if available.
...@@ -119,7 +103,7 @@ def EnableCNML(mlu_id=0): ...@@ -119,7 +103,7 @@ def EnableCNML(mlu_id=0):
Parameters Parameters
---------- ----------
device_id : int device_id : int
The id of MLU to use. The index of MLU to use.
Returns Returns
------- -------
...@@ -161,12 +145,12 @@ def GetRandomSeed(): ...@@ -161,12 +145,12 @@ def GetRandomSeed():
def SetGPU(id): def SetGPU(id):
"""Set the global id GPU. """Set the global index GPU.
Parameters Parameters
---------- ----------
id : int id : int
The id of GPU to use. The index of GPU to use.
Returns Returns
------- -------
...@@ -178,26 +162,26 @@ def SetGPU(id): ...@@ -178,26 +162,26 @@ def SetGPU(id):
def GetGPU(): def GetGPU():
"""Get the global id of GPU. """Get the global index of GPU.
Returns Returns
------- -------
int int
The global id of GPU. The global index of GPU.
""" """
return option['device_id'] return option['device_id']
def SetDebugMode(enabled=True): def SetGraphType(graph_type=''):
"""Enable Debug mode globally. """Set the graph type.
It will disable all memory sharing optimizations. If empty, the default DAG graph will be used.
Parameters Parameters
---------- ----------
enabled : boolean graph_type : str
Whether to enable debug mode. The graph type.
Returns Returns
------- -------
...@@ -205,18 +189,28 @@ def SetDebugMode(enabled=True): ...@@ -205,18 +189,28 @@ def SetDebugMode(enabled=True):
""" """
global option global option
option['debug_mode'] = enabled option['graph_type'] = graph_type
def SetGraphType(graph_type=''): def SetGraphOptimizationLevel(level=3):
"""Set the graph type. """Set the default level of graph optimization.
If empty, the default DAG graph will be used. We have predefined four levels:
-O0(level=0): Do nothing.
-O1(level=1): Prune the redundant nodes.
-O2(level=2): Apply the in-place optimization to outputs.
Note that the graph will no longer be a DAG.
-O3(level=3): Allocate the buffer for outputs.
This level is memory-efficient, but debugging becomes non-trivial.
Parameters Parameters
---------- ----------
graph_type : str level : {0, 1, 2, 3}, optional, default=3
The graph type. The level, see the documentation for details.
Returns Returns
------- -------
...@@ -224,7 +218,7 @@ def SetGraphType(graph_type=''): ...@@ -224,7 +218,7 @@ def SetGraphType(graph_type=''):
""" """
global option global option
option['graph_type'] = graph_type option['graph_optimization_level'] = level
def LogMetaGraph(enabled=True): def LogMetaGraph(enabled=True):
...@@ -301,7 +295,7 @@ def SetLoggingLevel(level): ...@@ -301,7 +295,7 @@ def SetLoggingLevel(level):
The default level is *INFO*. The default level is *INFO*.
""" """
C.SetLogLevelCC(level) C.SetLoggingLevel(level)
logging.set_verbosity({ logging.set_verbosity({
'DEBUG': logging.DEBUG, 'DEBUG': logging.DEBUG,
'INFO': logging.INFO, 'INFO': logging.INFO,
......
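A usage sketch of the new switch; -O1 keeps the graph a debuggable DAG, while the default -O3 trades debuggability for memory:

    import dragon.config as cfg

    cfg.SetGraphOptimizationLevel(1)  # prune redundant nodes only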
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""List some useful CUDA C++ API."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import dragon.import_c_api as C
def IsCUDADriverSufficient():
"""Is cuda driver sufficient?
Returns
-------
boolean
``True`` if your device(s) support CUDA otherwise ``False``.
"""
return C.IsCUDADriverSufficient()
def GetDevice():
"""Get the current active cuda device.
Returns
-------
int
The device index.
"""
return C.cudaGetDevice()
def SynchronizeStream(device_id=None, stream_id=0):
"""Synchronize the specified cuda stream.
If ``device_id`` is *None*, the current active device will be selected.
Parameters
----------
device_id : int or None
The device index.
stream_id : int
The stream index.
"""
return C.cudaStreamSynchronize(
device_id if device_id is not None else -1, stream_id)
\ No newline at end of file
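A usage sketch of the new module; device_id=None maps to -1, which the C++ binding resolves to the active device:

    import dragon.core.cuda as cuda

    if cuda.IsCUDADriverSufficient():
        print('Device:', cuda.GetDevice())
        cuda.SynchronizeStream(device_id=None, stream_id=0)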
...@@ -49,9 +49,9 @@ class GraphGradientMaker(object): ...@@ -49,9 +49,9 @@ class GraphGradientMaker(object):
Parameters Parameters
---------- ----------
forward_op : dragon_pb2.OperatorDef forward_op : OperatorDef
The OperatorDef of ``ForwardOp``. The OperatorDef of ``ForwardOp``.
g_outputs : list of str or list of None g_outputs : list of str
The inputs of ``BackwardOp`` (Precomputed grads). The inputs of ``BackwardOp`` (Precomputed grads).
name : str, optional name : str, optional
The optional operator name. The optional operator name.
...@@ -61,13 +61,9 @@ class GraphGradientMaker(object): ...@@ -61,13 +61,9 @@ class GraphGradientMaker(object):
tuple tuple
The OpDef, outputs and defaults of ``BackwardOp``. The OpDef, outputs and defaults of ``BackwardOp``.
References
----------
The wrapper of ``CreateGradientDefsCC``.
""" """
g_ops, g_inputs, defaults = \ g_ops, g_inputs, defaults = C.CreateGradientDefs(
C.CreateGradientDefsCC(forward_op.SerializeToString(), g_outputs) forward_op.SerializeToString(), g_outputs)
for idx, g_op in enumerate(g_ops): for idx, g_op in enumerate(g_ops):
new_def = pb.OperatorDef() new_def = pb.OperatorDef()
new_def.ParseFromString(g_op) new_def.ParseFromString(g_op)
...@@ -80,13 +76,13 @@ class GraphGradientMaker(object): ...@@ -80,13 +76,13 @@ class GraphGradientMaker(object):
Parameters Parameters
---------- ----------
forward_op : dragon_pb2.OperatorDef forward_op : OperatorDef
The OperatorDef of ``ForwardOp``. The OperatorDef of ``ForwardOp``.
inputs_to_grads : dict inputs_to_grads : dict
The dict of <input, g_input>. The dict of <input, g_input>.
blacklist : set of str blacklist : set of str
The set of ``NoGradient`` tensors. The set of ``NoGradient`` tensors.
targets : list of str targets : sequence of str
The solving targets. The solving targets.
Returns Returns
...@@ -123,7 +119,7 @@ class GraphGradientMaker(object): ...@@ -123,7 +119,7 @@ class GraphGradientMaker(object):
Parameters Parameters
---------- ----------
forward_ops : list of dragon_pb2.OperatorDef forward_ops : sequence of OperatorDef
The operators of ``ForwardOp``. The operators of ``ForwardOp``.
targets : sequence of str targets : sequence of str
The solving targets. The solving targets.
...@@ -168,12 +164,12 @@ class GraphGradientMaker(object): ...@@ -168,12 +164,12 @@ class GraphGradientMaker(object):
is_skip, gen_grads = \ is_skip, gen_grads = \
cls.CheckGrad(forward_op, inputs_to_grads, blacklist, targets) cls.CheckGrad(forward_op, inputs_to_grads, blacklist, targets)
# Missing grads are represented as ``None`` # Missing grads are represented as ``'ignore'``
g_outputs = list(inputs_to_grads.get(name, None) for name in forward_op.output) g_outputs = list(inputs_to_grads.get(name, 'ignore') for name in forward_op.output)
g_ops, g_inputs, defaults = cls.CreateGrad(forward_op, g_outputs) g_ops, g_inputs, defaults = cls.CreateGrad(forward_op, g_outputs)
# Append ops # Append ops
if not is_skip: if not is_skip:
# --> GenOp # GradientGenerateOp
if len(gen_grads) > 0: if len(gen_grads) > 0:
op_inputs = []; op_outputs = []; values = [] op_inputs = []; op_outputs = []; values = []
for item in gen_grads: for item in gen_grads:
...@@ -185,7 +181,7 @@ class GraphGradientMaker(object): ...@@ -185,7 +181,7 @@ class GraphGradientMaker(object):
if forward_op.HasField('device_option'): if forward_op.HasField('device_option'):
gen_op.device_option.CopyFrom(forward_op.device_option) gen_op.device_option.CopyFrom(forward_op.device_option)
backward_ops.append(gen_op) backward_ops.append(gen_op)
# --> GradOp # GradientOp
for g_op in g_ops: for g_op in g_ops:
g_op.name = OperatorHelper.get_name() if auto_names else 'runtime' g_op.name = OperatorHelper.get_name() if auto_names else 'runtime'
backward_ops.append(g_op) backward_ops.append(g_op)
......
...@@ -33,7 +33,7 @@ class OperatorHelper(object): ...@@ -33,7 +33,7 @@ class OperatorHelper(object):
# Input(0) => Output(0), shape and data type unchanged. # Input(0) => Output(0), shape and data type unchanged.
'Relu', 'PRelu', 'Elu', 'SElu', 'Sigmoid', 'Tanh', 'Dropout', 'Softmax', 'Relu', 'PRelu', 'Elu', 'SElu', 'Sigmoid', 'Tanh', 'Dropout', 'Softmax',
'Add', 'Sub', 'Mul', 'Div', 'Clip', 'Log', 'Exp', 'Pow', 'Square', 'Sqrt', 'Add', 'Sub', 'Mul', 'Div', 'Clip', 'Log', 'Exp', 'Pow', 'Square', 'Sqrt',
'Affine', 'Copy', 'Compare', 'StopGradient', 'MovingAverage', 'MPIBroadcast', 'Accumulate', 'Affine', 'Copy', 'Compare', 'StopGradient', 'MPIBroadcast',
'BatchNorm', 'GroupNorm', 'L2Norm', 'LRN', 'BiasAdd', 'DropBlock2d', 'BatchNorm', 'GroupNorm', 'L2Norm', 'LRN', 'BiasAdd', 'DropBlock2d',
) )
...@@ -885,10 +885,6 @@ class OperatorHelper(object): ...@@ -885,10 +885,6 @@ class OperatorHelper(object):
def _apply_BilinearResize(cls, arguments, inputs, outputs): def _apply_BilinearResize(cls, arguments, inputs, outputs):
return cls._apply_NNResize(arguments, inputs, outputs) return cls._apply_NNResize(arguments, inputs, outputs)
@classmethod
def _apply_DenseConcat(cls, arguments, inputs, outputs):
return cls._apply_Concat(arguments, inputs, outputs)
class GradientHelper(object): class GradientHelper(object):
"""A helper to store the known gradient relations. """A helper to store the known gradient relations.
......
...@@ -43,8 +43,9 @@ def get_logger(): ...@@ -43,8 +43,9 @@ def get_logger():
logger = _logging.getLogger('dragon') logger = _logging.getLogger('dragon')
logger.setLevel(INFO) logger.setLevel(INFO)
logger.propagate = False
if not _logging.getLogger().handlers: if True:
# Determine whether we are in an interactive environment # Determine whether we are in an interactive environment
_interactive = False _interactive = False
try: try:
......
...@@ -9,31 +9,15 @@ ...@@ -9,31 +9,15 @@
# #
# ------------------------------------------------------------ # ------------------------------------------------------------
"""List some useful MPI C++ API."""
from __future__ import absolute_import from __future__ import absolute_import
from __future__ import division from __future__ import division
from __future__ import print_function from __future__ import print_function
import numpy as np
import dragon.import_c_api as C import dragon.import_c_api as C
__all__ = [
'Init',
'Is_Init',
'Rank',
'Size',
'CreateGroup',
'Snapshot',
'AllowSnapshot',
'Parallel',
'AllowParallel',
'SetParallelMode',
'GetParallelMode',
'Finalize',
]
_GLOBAL_MPI_IS_INIT = False _GLOBAL_MPI_IS_INIT = False
_GLOBAL_MPI_SNAPSHOT_RANKS = [] _GLOBAL_MPI_SNAPSHOT_RANKS = []
_GLOBAL_MPI_PARALLEL_GROUPS = [] _GLOBAL_MPI_PARALLEL_GROUPS = []
...@@ -55,12 +39,8 @@ def Init(): ...@@ -55,12 +39,8 @@ def Init():
----- -----
This function can only be called once. This function can only be called once.
References
----------
The wrapper of ``MPIInitCC``
""" """
C.MPIInitCC() C.MPIInit()
global _GLOBAL_MPI_IS_INIT global _GLOBAL_MPI_IS_INIT
global _GLOBAL_MPI_SNAPSHOT_RANKS global _GLOBAL_MPI_SNAPSHOT_RANKS
_GLOBAL_MPI_IS_INIT = True _GLOBAL_MPI_IS_INIT = True
...@@ -86,13 +66,9 @@ def Rank(): ...@@ -86,13 +66,9 @@ def Rank():
int int
The world rank. The world rank.
References
----------
The wrapper of ``MPIRankCC``.
""" """
_check_init() _check_init()
return C.MPIRankCC() return C.MPIRank()
def Size(): def Size():
...@@ -103,13 +79,9 @@ def Size(): ...@@ -103,13 +79,9 @@ def Size():
int int
The world size. The world size.
References
----------
The wrapper of ``MPISizeCC``.
""" """
_check_init() _check_init()
return C.MPISizeCC() return C.MPISize()
def CreateGroup(root=0, incl=[], excl=[]): def CreateGroup(root=0, incl=[], excl=[]):
...@@ -129,14 +101,9 @@ def CreateGroup(root=0, incl=[], excl=[]): ...@@ -129,14 +101,9 @@ def CreateGroup(root=0, incl=[], excl=[]):
tuple tuple
The local comm and group id. The local comm and group id.
References
----------
The wrapper of ``MPICreateGroupCC``.
""" """
_check_init() _check_init()
comm, group = C.MPICreateGroupCC(root, incl, excl) return C.MPICreateGroup(root, incl, excl)
return np.int64(comm), np.int64(group)
def Snapshot(incl): def Snapshot(incl):
...@@ -193,6 +160,7 @@ def AllowSnapshot(): ...@@ -193,6 +160,7 @@ def AllowSnapshot():
Returns Returns
------- -------
boolean boolean
""" """
return Rank() in _GLOBAL_MPI_SNAPSHOT_RANKS return Rank() in _GLOBAL_MPI_SNAPSHOT_RANKS
...@@ -212,12 +180,12 @@ def AllowParallel(): ...@@ -212,12 +180,12 @@ def AllowParallel():
def SetParallelMode(mode): def SetParallelMode(mode):
"""Set the mode of data parallelism. """Set the communication mode of data parallelism.
Parameters Parameters
---------- ----------
mode : str mode : {'MPI', 'NCCL'}, optional
The mode, ``MPI``, ``NCCL`` or ``MIXED``. The communication mode.
Returns Returns
------- -------
...@@ -228,20 +196,18 @@ def SetParallelMode(mode): ...@@ -228,20 +196,18 @@ def SetParallelMode(mode):
The default mode is ``MPI``. The default mode is ``MPI``.
""" """
assert mode == 'MPI' or \ assert mode == 'MPI' or mode == 'NCCL'
mode == 'NCCL' or \
mode == 'MIXED'
global _GLOBAL_MPI_PARALLEL_MODE global _GLOBAL_MPI_PARALLEL_MODE
_GLOBAL_MPI_PARALLEL_MODE = mode _GLOBAL_MPI_PARALLEL_MODE = mode
def GetParallelMode(): def GetParallelMode():
"""Get the current mode of data parallelism. """Get the current communication mode of data parallelism.
Returns Returns
------- -------
str str : {'MPI', 'NCCL'}
The mode, ``MPI``, ``NCCL`` or ``MIXED``. The communication mode.
""" """
return _GLOBAL_MPI_PARALLEL_MODE return _GLOBAL_MPI_PARALLEL_MODE
...@@ -260,4 +226,4 @@ def Finalize(): ...@@ -260,4 +226,4 @@ def Finalize():
""" """
_check_init() _check_init()
C.MPIFinalizeCC() C.MPIFinalize()
\ No newline at end of file \ No newline at end of file
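A usage sketch of the renamed wrappers, to be launched under mpirun; MPICreateGroup returns a (comm, group) pair of handles:

    import dragon.core.mpi as mpi

    mpi.Init()
    rank, size = mpi.Rank(), mpi.Size()
    comm, group = mpi.CreateGroup(root=0, incl=list(range(size)))
    mpi.SetParallelMode('NCCL')  # 'MIXED' is no longer accepted
    mpi.Finalize()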
...@@ -21,6 +21,7 @@ import numpy as np ...@@ -21,6 +21,7 @@ import numpy as np
from google.protobuf.message import Message from google.protobuf.message import Message
import dragon.config as cfg import dragon.config as cfg
import dragon.import_c_api as C
from dragon.proto import dragon_pb2 as pb from dragon.proto import dragon_pb2 as pb
from dragon.core.scope import get_default_device from dragon.core.scope import get_default_device
...@@ -50,14 +51,15 @@ else: ...@@ -50,14 +51,15 @@ else:
argument.name = key argument.name = key
if type(value) is float: argument.f = value if type(value) is float: argument.f = value
elif type(value) in (bool, int, long, np.int64) : argument.i = value elif type(value) in (bool, int, long, np.int64) : argument.i = value
elif type(value) in (str, unicode): argument.s = value elif type(value) is str: argument.s = value
elif type(value) is unicode: argument.s = str(value)
elif isinstance(value, Message): argument.s = value.SerializeToString() elif isinstance(value, Message): argument.s = value.SerializeToString()
elif all(type(v) is float for v in value): argument.floats.extend(value) elif all(type(v) is float for v in value): argument.floats.extend(value)
elif all(type(v) is int for v in value): argument.ints.extend(value) elif all(type(v) is int for v in value): argument.ints.extend(value)
elif all(type(v) is long for v in value): argument.ints.extend(value) elif all(type(v) is long for v in value): argument.ints.extend(value)
elif all(type(v) is str for v in value): argument.strings.extend(value) elif all(type(v) is str for v in value): argument.strings.extend(value)
elif all(type(v) is unicode or type(v) is str for v in value): elif all(type(v) is unicode for v in value):
argument.strings.extend(value) argument.strings.extend([str(v) for v in value])
elif all(isinstance(v, Message) for v in value): elif all(isinstance(v, Message) for v in value):
argument.strings.extend([v.SerializeToString() for v in value]) argument.strings.extend([v.SerializeToString() for v in value])
else: else:
...@@ -67,8 +69,10 @@ else: ...@@ -67,8 +69,10 @@ else:
return argument return argument
def MakeOperatorDef(op_type, inputs, outputs, name='', def MakeOperatorDef(
device_option=None, arg=None, engine=None, **kwargs): op_type, inputs=(), outputs=(),
name='', uid=None, device_option=None,
arg=None, engine=None, **kwargs):
operator = pb.OperatorDef() operator = pb.OperatorDef()
operator.type = op_type operator.type = op_type
operator.name = name operator.name = name
...@@ -81,22 +85,29 @@ def MakeOperatorDef(op_type, inputs, outputs, name='', ...@@ -81,22 +85,29 @@ def MakeOperatorDef(op_type, inputs, outputs, name='',
if 'random_seed' in kwargs: if 'random_seed' in kwargs:
operator.device_option.random_seed = kwargs['random_seed'] operator.device_option.random_seed = kwargs['random_seed']
del kwargs['random_seed'] del kwargs['random_seed']
if arg is not None: if uid is not None: operator.uid = uid
operator.arg.extend(arg) if arg is not None: operator.arg.extend(arg)
for k,v in kwargs.items(): for k,v in kwargs.items():
if v is None: continue if v is None: continue
operator.arg.add().CopyFrom(MakeArgument(k,v)) operator.arg.add().CopyFrom(MakeArgument(k,v))
return operator return operator
def MutableOperatorDef(meta_def, inputs, outputs): def MakeCXXOperatorDef(
op = pb.OperatorDef(); op.CopyFrom(meta_def) op_type, inputs=(), outputs=(),
op.ClearField('input'); op.input.extend(inputs) name='', uid=None, device_option=None,
op.ClearField('output'); op.output.extend(outputs) arg=None, engine=None, **kwargs):
return op c_def = C.OperatorDef()
py_def = MakeOperatorDef(
op_type, inputs, outputs, name, uid,
device_option, arg, engine, **kwargs)
c_def.ParseFrom(py_def.SerializeToString())
return c_def
def MakeDeviceOption(device_type, device_id, engine=None, rng_seed=None): def MakeDeviceOption(
device_type, device_id,
engine=None, rng_seed=None):
option = pb.DeviceOption() option = pb.DeviceOption()
option.device_type = device_type option.device_type = device_type
option.device_id = device_id option.device_id = device_id
...@@ -121,7 +132,9 @@ for i in range(_PREDEFINED_DEVICE_LIMITS): ...@@ -121,7 +132,9 @@ for i in range(_PREDEFINED_DEVICE_LIMITS):
MakeDeviceOption(identify, i, 'CUDNN') MakeDeviceOption(identify, i, 'CUDNN')
def GetDeviceOption(device_type, device_id=0, engine=None, rng_seed=None): def GetDeviceOption(
device_type, device_id=0,
engine=None, rng_seed=None):
ctx = (device_type, device_id, engine if engine else '') ctx = (device_type, device_id, engine if engine else '')
option = _PREDEFINED_DEVICE_OPTION_DICT[ctx] option = _PREDEFINED_DEVICE_OPTION_DICT[ctx]
if rng_seed is not None: if rng_seed is not None:
......
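A sketch of the updated helper; extra keyword arguments become typed Arguments via MakeArgument, and the argument names here are illustrative:

    from dragon.core.proto_utils import MakeOperatorDef

    op = MakeOperatorDef(
        'Conv2d', inputs=['x', 'w'], outputs=['y'],
        kernel_shape=[3, 3], strides=[1, 1])  # -> Argument.ints
    print(op)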
...@@ -88,11 +88,11 @@ class WorkspaceScope(object): ...@@ -88,11 +88,11 @@ class WorkspaceScope(object):
self.prev = 'default' self.prev = 'default'
def __enter__(self): def __enter__(self):
self.prev = C.CurrentWorkspaceCC() self.prev = C.CurrentWorkspace()
C.SwitchWorkspaceCC(self.ws, True) C.SwitchWorkspace(self.ws, True)
def __exit__(self, type, value, traceback): def __exit__(self, type, value, traceback):
C.SwitchWorkspaceCC(self.prev, True) C.SwitchWorkspace(self.prev, True)
_GLOBAL_TENSOR_STACK = _ThreadLocalStack() _GLOBAL_TENSOR_STACK = _ThreadLocalStack()
......
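A usage sketch of the scope, assuming the class is importable from dragon.core.workspace (adjust the import to wherever WorkspaceScope actually lives):

    import dragon.core.workspace as workspace

    with workspace.WorkspaceScope('my_ws'):
        pass  # tensors touched here live in the 'my_ws' workspace
    # The previous workspace is restored by __exit__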
...@@ -355,10 +355,9 @@ class Tensor(object): ...@@ -355,10 +355,9 @@ class Tensor(object):
""" """
if inplace: if inplace:
return Tensor.CreateOperator( return Tensor.CreateOperator(
'AsType', [], existing_outputs=[self], dtype=dtype) 'Cast', [], existing_outputs=[self], dtype=dtype)
else: else:
return Tensor.CreateOperator( return Tensor.CreateOperator('Cast', self, dtype=dtype)
'AsType', self, dtype=dtype)
@property @property
def extra_targets(self): def extra_targets(self):
......
...@@ -9,6 +9,8 @@ ...@@ -9,6 +9,8 @@
# #
# ------------------------------------------------------------ # ------------------------------------------------------------
"""List some extended Tensor C++ API."""
from __future__ import absolute_import from __future__ import absolute_import
from __future__ import division from __future__ import division
from __future__ import print_function from __future__ import print_function
...@@ -23,21 +25,7 @@ from dragon.core.tensor import Tensor ...@@ -23,21 +25,7 @@ from dragon.core.tensor import Tensor
from dragon.core.proto_utils import GetDeviceOption from dragon.core.proto_utils import GetDeviceOption
__all__ = [ def FromShape(shape, dtype='float32', name=None):
'FromShape',
'SetShape',
'FromTensor',
'FromPyArray',
'SetPyArray',
'ToPyArray',
'ToPyArrayEx',
'ToCPUTensor',
'ToCUDATensor',
'GetTensorInfo',
]
def FromShape(shape, dtype='float32', ctx=None, name=None):
"""Create a Tensor from the shape. """Create a Tensor from the shape.
If specifying an existing tensor with a larger shape,
...@@ -49,8 +37,6 @@ def FromShape(shape, dtype='float32', ctx=None, name=None): ...@@ -49,8 +37,6 @@ def FromShape(shape, dtype='float32', ctx=None, name=None):
The shape info. The shape info.
dtype : str dtype : str
The data type. The data type.
ctx : dragon_pb2.DeviceOption
The context info.
name : str, optional name : str, optional
The optional tensor name. The optional tensor name.
...@@ -59,19 +45,14 @@ def FromShape(shape, dtype='float32', ctx=None, name=None): ...@@ -59,19 +45,14 @@ def FromShape(shape, dtype='float32', ctx=None, name=None):
Tensor Tensor
The tensor with the specific shape. The tensor with the specific shape.
References
----------
The wrapper of ``TensorFromShapeCC``.
""" """
tensor = _try_get_tensor(name) tensor = _try_get_tensor(name)
tensor.shape = list(shape)
if not isinstance(shape, (tuple, list)): if not isinstance(shape, (tuple, list)):
raise TypeError('The shape should be a tuple or list.') raise TypeError('The shape should be a tuple or list.')
if ctx is None: ctx = GetDeviceOption('CPU') C.TensorFromShape(
C.TensorFromShapeCC(
_stringify_tensor(tensor), _stringify_tensor(tensor),
list(shape), dtype, list(shape), dtype)
_stringify_proto(ctx))
return tensor return tensor
...@@ -91,12 +72,8 @@ def SetShape(tensor, shape, dtype='float32'): ...@@ -91,12 +72,8 @@ def SetShape(tensor, shape, dtype='float32'):
------- -------
None None
References
----------
The wrapper of ``TensorFromShapeCC``.
""" """
C.TensorFromShapeCC(_stringify_tensor(tensor), shape, dtype) C.TensorFromShape(_stringify_tensor(tensor), shape, dtype)
def FromTensor(src, src_ctx=None, name=None, ctx=None): def FromTensor(src, src_ctx=None, name=None, ctx=None):
...@@ -109,11 +86,11 @@ def FromTensor(src, src_ctx=None, name=None, ctx=None): ...@@ -109,11 +86,11 @@ def FromTensor(src, src_ctx=None, name=None, ctx=None):
---------- ----------
src : Tensor or str src : Tensor or str
The source tensor. The source tensor.
src_ctx : dragon_pb2.DeviceOption src_ctx : DeviceOption
The context of source tensor. The context of source tensor.
name : str name : str
The optional tensor name for destination tensor. The optional tensor name for destination tensor.
ctx : dragon_pb2.DeviceOption ctx : DeviceOption
The context for destination tensor. The context for destination tensor.
Returns Returns
...@@ -121,17 +98,13 @@ def FromTensor(src, src_ctx=None, name=None, ctx=None): ...@@ -121,17 +98,13 @@ def FromTensor(src, src_ctx=None, name=None, ctx=None):
Tensor Tensor
The tensor with the same data as source. The tensor with the same data as source.
References
----------
The wrapper of ``TensorFromTensorCC``.
""" """
tensor = _try_get_tensor(name) tensor = _try_get_tensor(name)
if src_ctx is None: src_ctx = GetDeviceOption('CPU') if src_ctx is None: src_ctx = GetDeviceOption('CPU')
if ctx is None: ctx = GetDeviceOption('CPU') if ctx is None: ctx = GetDeviceOption('CPU')
C.TensorFromTensorCC( C.TensorFromTensor(
_stringify_tensor(tensor), _stringify_tensor(src), _stringify_tensor(tensor), _stringify_tensor(src),
_stringify_proto(ctx), _stringify_proto(src_ctx)) _stringify_proto(ctx), _stringify_proto(src_ctx))
return tensor return tensor
...@@ -155,15 +128,11 @@ def FromPyArray(array, name=None): ...@@ -155,15 +128,11 @@ def FromPyArray(array, name=None):
Tensor Tensor
The tensor sharing the memory with original array. The tensor sharing the memory with original array.
References
----------
The wrapper of ``TensorFromPyArrayCC``.
""" """
tensor = _try_get_tensor(name) tensor = _try_get_tensor(name)
if not isinstance(array, np.ndarray): if not isinstance(array, np.ndarray):
raise TypeError('The given nd-array should be numpy.ndarray.') raise TypeError('The given nd-array should be numpy.ndarray.')
C.TensorFromPyArrayCC(_stringify_tensor(tensor), array) C.TensorFromPyArray(_stringify_tensor(tensor), array)
return tensor return tensor
...@@ -188,154 +157,58 @@ def SetPyArray(tensor, array): ...@@ -188,154 +157,58 @@ def SetPyArray(tensor, array):
The wrapper of ``TensorFromPyArrayCC``. The wrapper of ``TensorFromPyArrayCC``.
""" """
C.TensorFromPyArrayCC(_stringify_tensor(tensor), array) C.TensorFromPyArray(_stringify_tensor(tensor), array)
def ToPyArray(tensor): def ToPyArray(tensor, readonly=False):
"""Create a Array from a existing Tensor. """Create a Array from a existing Tensor.
Note that memory of Array are ``zero-copied``. Note that memory of Array are *zero-copied*.
Parameters Parameters
---------- ----------
tensor : Tensor or str tensor : Tensor or str
The input tensor. The input tensor.
readonly : boolean
Whether to sync the contents with device.
Returns Returns
------- -------
numpy.ndarray numpy.ndarray
The array sharing the memory with original tensor. The array sharing the memory with original tensor.
References
----------
The wrapper of ``TensorToPyArrayCC``.
"""
return C.TensorToPyArrayCC(_stringify_tensor(tensor))
def ToPyArrayEx(tensor):
"""Create a const Array from a existing Tensor.
Note that memory of Array are ``zero-copied`` and ``const``.
Parameters
----------
tensor : Tensor or str
The input tensor.
Returns
-------
numpy.ndarray
The array sharing the memory with the original tensor.
References
----------
The wrapper of ``TensorToPyArrayExCC``.
"""
return C.TensorToPyArrayExCC(_stringify_tensor(tensor))
def ToCPUTensor(tensor):
"""Switch the storage of a existing Tensor on cpu memory.
Parameters
----------
tensor : Tensor or str
The input tensor.
Returns
-------
None
References
----------
The wrapper of ``ToCPUTensorCC``.
""" """
return C.ToCPUTensorCC(_stringify_tensor(tensor)) return C.TensorToPyArray(_stringify_tensor(tensor), readonly)
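A sketch of the round trip with the new readonly flag (module path assumed; the exact sync semantics of readonly are an assumption from the docstring above):

>>> import numpy as np
>>> import dragon.core.tensor_utils as tensor_utils  # assumed module path
>>> t = tensor_utils.FromPyArray(np.ones((2, 3), 'float32'), name='buffer')
>>> arr = tensor_utils.ToPyArray(t)                    # mutable zero-copy view
>>> arr_ro = tensor_utils.ToPyArray(t, readonly=True)  # view that skips the sync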
def ToCUDATensor(tensor, device=0): def GetStorage(tensor):
"""Switch the storage of a existing Tensor on cuda memory. """Get the storage of a existing Tensor.
Parameters Parameters
---------- ----------
tensor : Tensor or str tensor : Tensor or str
The input tensor. The input tensor.
device : int
The id of the device to use.
Returns Returns
------- -------
None TensorStorage
The backend storage of the tensor.
References
----------
The wrapper of ``ToCUDATensorCC``.
""" """
return C.ToCUDATensorCC(_stringify_tensor(tensor), device) tensor = _stringify_tensor(tensor)
if not dg.workspace.HasTensor(tensor): return None
return C.GetTensor(tensor)
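Since GetStorage checks the workspace first, a sketch like the following (names assumed) distinguishes existing tensors from unknown ones:

>>> import numpy as np
>>> import dragon.core.tensor_utils as tensor_utils  # assumed module path
>>> t = tensor_utils.FromPyArray(np.ones((2,), 'float32'), name='buffer')
>>> tensor_utils.GetStorage('buffer') is None   # a TensorStorage is returned
False
>>> tensor_utils.GetStorage('missing') is None  # absent tensors yield None
True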
def GetTensorInfo(tensor, stream=1):
"""Get the info of a existing Tensor.
The string info contains following fields:
stream #1: ``dtype``, ``from_numpy``, ``init``, ``mem``, ``mem_at``, ``device_id``
stream #2: ``shape``
stream #3: #1 + #2
Parameters
----------
tensor : Tensor or str
The input tensor.
stream : int
The stream id.
Returns
-------
dict
The info.
References
----------
The wrapper of ``GetTensorInfoCC``.
"""
if not dg.workspace.HasTensor(_stringify_tensor(tensor)): return None
info = C.GetTensorInfoCC(_stringify_tensor(tensor), stream)
info['mem'] = []
if 'CPU' in info:
info['mem'].append('CPU'); info['device_id'] = 0
if 'CUDA' in info:
info['mem'].append('CUDA'); info['device_id'] = int(info['CUDA'])
if 'CNML' in info:
info['mem'].append('CNML'); info['device_id'] = int(info['CNML'])
info['init'] = len(info['mem']) > 0
return info
def _stringify_proto(obj): def _stringify_proto(obj):
"""Try to stringify a proto-buffer structure.""" """Try to stringify a proto-buffer structure."""
if obj is str: return obj return obj.SerializeToString()
elif isinstance(obj, Message): return obj.SerializeToString()
else: raise TypeError('Object can not be serialized as a string.')
def _stringify_tensor(obj): def _stringify_tensor(obj):
"""Try to stringify a tensor.""" """Try to stringify a tensor."""
if hasattr(obj, 'name'): return obj.name if hasattr(obj, 'name'): return obj.name
else: else: return str(obj)
try:
obj = str(obj)
except Exception as e:
raise TypeError('Object can not be used as a tensor. Error: {0}'.format(str(e)))
return obj
def _try_get_tensor(name=None): def _try_get_tensor(name=None):
......
...@@ -33,8 +33,8 @@ except ImportError as e: ...@@ -33,8 +33,8 @@ except ImportError as e:
sys.exit(1) sys.exit(1)
REGISTERED_OPERATORS = set(s for s in RegisteredOperatorsCC()) REGISTERED_OPERATORS = set(s for s in RegisteredOperators())
NO_GRADIENT_OPERATORS = set(s for s in NoGradientOperatorsCC()) NO_GRADIENT_OPERATORS = set(s for s in NoGradientOperators())
atexit.register(OnModuleExitCC) atexit.register(OnModuleExit)
\ No newline at end of file \ No newline at end of file
...@@ -100,8 +100,8 @@ class ArgumentHelper(object): ...@@ -100,8 +100,8 @@ class ArgumentHelper(object):
arguments[name] = None arguments[name] = None
arguments[name + '_desc'] = property.name arguments[name + '_desc'] = property.name
return arguments return arguments
extra_kwargs = {'gen_desc_{}'.format(name): Generator} kwargs.update({'gen_desc_{}'.format(name): Generator})
return op_func(*args, **kwargs, **extra_kwargs) return op_func(*args, **kwargs)
return Impl return Impl
return Decorator return Decorator
...@@ -138,8 +138,8 @@ class ArgumentHelper(object): ...@@ -138,8 +138,8 @@ class ArgumentHelper(object):
else: else:
arguments[desc_name] = properties arguments[desc_name] = properties
return arguments return arguments
extra_kwargs = {'gen_desc_{}'.format(name): Generator} kwargs.update({'gen_desc_{}'.format(name): Generator})
return op_func(*args, **kwargs, **extra_kwargs) return op_func(*args, **kwargs)
return Impl return Impl
return Decorator return Decorator
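The move from a second ``**`` unpacking to ``kwargs.update(...)`` matters because ``op_func(*args, **kwargs, **extra_kwargs)`` is a SyntaxError before Python 3.5. A minimal standalone sketch of the merged-dict pattern (all names hypothetical):

# Fold the generated descriptor into the caller's kwargs, then dispatch once.
def op_func(**kwargs):
    return sorted(kwargs)

kwargs = {'shape': (2, 3)}
kwargs.update({'gen_desc_shape': lambda arguments: arguments})  # hypothetical generator
print(op_func(**kwargs))  # ['gen_desc_shape', 'shape']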
......
...@@ -140,11 +140,13 @@ def Minimum(inputs, **kwargs): ...@@ -140,11 +140,13 @@ def Minimum(inputs, **kwargs):
@OpSchema.Inputs(1) @OpSchema.Inputs(1)
def Moments(inputs, axes=None, keep_dims=False, **kwargs): def Moments(inputs, axes=None, keep_dims=False, **kwargs):
"""Compute the mean and variance of inputs along the given axes. """Calculate the mean and variance of inputs along the given axes.
The data type of the moments is typically *float32*, The data type of the moments is typically *float32*,
except for *float64* inputs (which yield *float64* moments). except for *float64* inputs (which yield *float64* moments).
If ``axes`` is *None*, a Scalar will be returned.
**Type Constraints**: (*int8*, *uint8*, *int32*, *int64*, *float16*, *float32*, *float64*) **Type Constraints**: (*int8*, *uint8*, *int32*, *int64*, *float16*, *float32*, *float64*)
Parameters Parameters
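A hedged usage sketch (assuming the op is exported as ``dragon.ops.Moments``, as the aggregation module later in this diff suggests, and that the two outputs unpack as mean and variance):

>>> import dragon as dg
>>> x = dg.Tensor('x', dtype='float32').Variable()
>>> mean, var = dg.ops.Moments(x, axes=[0], keep_dims=True)
>>> mu, sigma2 = dg.ops.Moments(x)  # axes=None reduces to a Scalar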
...@@ -206,9 +208,9 @@ def Matmul(inputs, transA=False, transB=False, **kwargs): ...@@ -206,9 +208,9 @@ def Matmul(inputs, transA=False, transB=False, **kwargs):
---------- ----------
inputs : sequence of Tensor inputs : sequence of Tensor
The inputs, A and B. The inputs, A and B.
transA : bool transA : bool, optional, default=False
Whether to transpose A. Whether to transpose A.
transB : bool transB : bool, optional, default=False
Whether to transpose B. Whether to transpose B.
Returns Returns
...@@ -234,9 +236,9 @@ def Dot(inputs, transA=False, transB=False, **kwargs): ...@@ -234,9 +236,9 @@ def Dot(inputs, transA=False, transB=False, **kwargs):
---------- ----------
inputs : sequence of Tensor inputs : sequence of Tensor
The inputs, A and B. The inputs, A and B.
transA : bool transA : bool, optional, default=False
Whether to transpose A. Whether to transpose A.
transB : bool transB : bool, optional, default=False
Whether to transpose B. Whether to transpose B.
Returns Returns
...@@ -262,9 +264,9 @@ def FullyConnected(inputs, num_output, axis=1, transW=True, **kwargs): ...@@ -262,9 +264,9 @@ def FullyConnected(inputs, num_output, axis=1, transW=True, **kwargs):
The inputs, representing [X, W] + [b]. The inputs, representing [X, W] + [b].
num_output : int num_output : int
The output dim. The output dim.
axis : int, optional axis : int, optional, default=1
The start axis to calculate, can be negative. The start axis to calculate, can be negative.
transW : bool, optional transW : bool, optional, default=True
Whether to transpose the W. Whether to transpose the W.
Returns Returns
...@@ -346,7 +348,7 @@ def Exp(inputs, **kwargs): ...@@ -346,7 +348,7 @@ def Exp(inputs, **kwargs):
@OpSchema.Inputs(1) @OpSchema.Inputs(1)
def Pow(inputs, power, shift=None, scale=None, **kwargs): def Pow(inputs, power, shift=0., scale=1., **kwargs):
"""Calculate the power of input. """Calculate the power of input.
Formulation: |power_function| Formulation: |power_function|
...@@ -357,11 +359,11 @@ def Pow(inputs, power, shift=None, scale=None, **kwargs): ...@@ -357,11 +359,11 @@ def Pow(inputs, power, shift=None, scale=None, **kwargs):
---------- ----------
inputs : Tensor inputs : Tensor
The input tensor. The input tensor.
power : float power : float, required
The power factor. The power factor.
shift : float, optional shift : float, optional, default=0.
The shift magnitude. The shift magnitude.
scale : float, optional scale : float, optional, default=1.
The scale factor. The scale factor.
Returns Returns
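A sketch assuming |power_function| denotes the usual (shift + scale * x) ** power form and the op is exported as ``dragon.ops.Pow``:

>>> import dragon as dg
>>> x = dg.Tensor('x', dtype='float32').Variable()
>>> y = dg.ops.Pow(x, power=2., shift=1., scale=0.5)  # assumed: (1 + 0.5 * x) ** 2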
...@@ -414,7 +416,7 @@ def Sqrt(inputs, **kwargs): ...@@ -414,7 +416,7 @@ def Sqrt(inputs, **kwargs):
The sqrt result. The sqrt result.
""" """
return Tensor.CreateOperator('Pow', power=0.5, **ParseArgs(locals())) return Tensor.CreateOperator('Sqrt', **ParseArgs(locals()))
@OpSchema.Inputs(2, 3) @OpSchema.Inputs(2, 3)
...@@ -433,9 +435,9 @@ def Affine(inputs, axis=1, num_axes=1, **kwargs): ...@@ -433,9 +435,9 @@ def Affine(inputs, axis=1, num_axes=1, **kwargs):
---------- ----------
inputs : sequence of Tensor inputs : sequence of Tensor
The inputs, representing [x, A] + [b]. The inputs, representing [x, A] + [b].
axis : int, optional axis : int, optional, default=1
The start axis to scale, can be negative. The start axis to scale, can be negative.
num_axes : int, optional num_axes : int, optional, default=1
The number of axes to scale. The number of axes to scale.
Returns Returns
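A sketch of the [x, A] + [b] calling convention (tensor names and the ``dragon.ops.Affine`` export path are assumptions):

>>> import dragon as dg
>>> x = dg.Tensor('x', dtype='float32').Variable()
>>> A = dg.Tensor('A', dtype='float32').Variable()
>>> b = dg.Tensor('b', dtype='float32').Variable()
>>> y = dg.ops.Affine([x, A, b], axis=1, num_axes=1)  # scale by A, shift by b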
...@@ -459,7 +461,7 @@ def GramMatrix(inputs, axis=1, **kwargs): ...@@ -459,7 +461,7 @@ def GramMatrix(inputs, axis=1, **kwargs):
---------- ----------
inputs : Tensor inputs : Tensor
The input tensor. The input tensor.
axis : int, optional axis : int, optional, default=1
The start axis to calculate. The start axis to calculate.
Returns Returns
...@@ -469,3 +471,48 @@ def GramMatrix(inputs, axis=1, **kwargs): ...@@ -469,3 +471,48 @@ def GramMatrix(inputs, axis=1, **kwargs):
""" """
return Tensor.CreateOperator('GramMatrix', **ParseArgs(locals())) return Tensor.CreateOperator('GramMatrix', **ParseArgs(locals()))
@OpSchema.Inputs(1, INT_MAX)
def Accumulate(inputs, alpha=1., beta=1., **kwargs):
"""Calculate *y = alpha * x + beta * y*
**Type Constraints**: (*int8*, *uint8*, *int32*, *int64*, *float16*, *float32*, *float64*)
Parameters
----------
inputs : sequence of Tensor
The inputs, i.e., the *x*.
alpha : float, optional, default=1.
The alpha value.
beta : float, optional, default=1.
The beta value.
Returns
-------
sequence of Tensor
The outputs, i.e., the *y*.
"""
return Tensor.CreateOperator('Accumulate', **ParseArgs(locals()))
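A sketch (export path assumed); with the defaults alpha=1 and beta=1 the op accumulates a running sum into the outputs:

>>> import dragon as dg
>>> x = dg.Tensor('x', dtype='float32').Variable()
>>> y = dg.ops.Accumulate([x], alpha=1., beta=1.)  # y = x + y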
@OpSchema.Inputs(1, INT_MAX)
def MovingAverage(inputs, decay, **kwargs):
"""Calculate the *y = (1 - decay) * x + decay * y*
**Type Constraints**: (*int8*, *uint8*, *int32*, *int64*, *float16*, *float32*, *float64*)
Parameters
----------
inputs : sequence of Tensor
The inputs, i.e., the *x*.
decay : float, required
The decay factor.
Returns
-------
sequence of Tensor
The outputs, i.e., the *y*.
"""
return Accumulate(inputs, 1 - decay, decay, **kwargs)
\ No newline at end of file
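Because MovingAverage forwards to Accumulate with alpha = 1 - decay and beta = decay, the two calls below compute the same update (export paths assumed):

>>> import dragon as dg
>>> x = dg.Tensor('x', dtype='float32').Variable()
>>> y1 = dg.ops.MovingAverage([x], decay=0.9)
>>> y2 = dg.ops.Accumulate([x], alpha=0.1, beta=0.9)  # identical update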
...@@ -17,7 +17,7 @@ from . import * ...@@ -17,7 +17,7 @@ from . import *
@OpSchema.Inputs(1) @OpSchema.Inputs(1)
def AsType(inputs, dtype='float32', inplace=False, **kwargs): def Cast(inputs, dtype='float32', inplace=False, **kwargs):
"""Cast the data type of inputs to a specific one. """Cast the data type of inputs to a specific one.
If ``inplace`` is ``True``, cast ``self`` instead of returning a new one. If ``inplace`` is ``True``, cast ``self`` instead of returning a new one.
...@@ -41,7 +41,7 @@ def AsType(inputs, dtype='float32', inplace=False, **kwargs): ...@@ -41,7 +41,7 @@ def AsType(inputs, dtype='float32', inplace=False, **kwargs):
Examples Examples
-------- --------
>>> x = Tensor('x', dtype='float32').Variable() >>> x = Tensor('x', dtype='float32').Variable()
>>> y = AsType(x, 'int32') >>> y = Cast(x, 'int32')
>>> z = x.astype('int64') >>> z = x.astype('int64')
>>> xx = x.astype('float64', inplace=True) >>> xx = x.astype('float64', inplace=True)
>>> print(x.name, xx.name) >>> print(x.name, xx.name)
...@@ -53,7 +53,7 @@ def AsType(inputs, dtype='float32', inplace=False, **kwargs): ...@@ -53,7 +53,7 @@ def AsType(inputs, dtype='float32', inplace=False, **kwargs):
arguments['inputs'] = [] arguments['inputs'] = []
arguments['existing_outputs'] = [inputs] arguments['existing_outputs'] = [inputs]
return Tensor.CreateOperator('AsType', **arguments) return Tensor.CreateOperator('Cast', **arguments)
def Run(inputs, module, op, param_str='', num_outputs=1, **kwargs): def Run(inputs, module, op, param_str='', num_outputs=1, **kwargs):
...@@ -173,28 +173,4 @@ def StopGradient(inputs, **kwargs): ...@@ -173,28 +173,4 @@ def StopGradient(inputs, **kwargs):
An identity of the input. An identity of the input.
""" """
return Tensor.CreateOperator('StopGradient', **ParseArgs(locals())) return Tensor.CreateOperator('StopGradient', **ParseArgs(locals()))
\ No newline at end of file
@OpSchema.Inputs(1)
def MovingAverage(inputs, decay, **kwargs):
"""Calculate the moving average.
**Type Constraints**: (*int8*, *uint8*, *int32*, *int64*, *float16*, *float32*, *float64*)
Parameters
----------
inputs : Tensor
The values to calculate moving average.
decay : float
The decay factor.
Returns
-------
Tensor
The output tensor, i.e., ``variable``, calculated as:
|moving_average_function|
"""
return Tensor.CreateOperator('MovingAverage', **ParseArgs(locals()))
\ No newline at end of file
...@@ -740,7 +740,6 @@ def Shape(inputs, **kwargs): ...@@ -740,7 +740,6 @@ def Shape(inputs, **kwargs):
return Tensor.CreateOperator('Shape', **ParseArgs(locals())) return Tensor.CreateOperator('Shape', **ParseArgs(locals()))
@OpSchema.Inputs(0)
@ArgumentHelper.Desc('start') @ArgumentHelper.Desc('start')
@ArgumentHelper.Desc('stop') @ArgumentHelper.Desc('stop')
@ArgumentHelper.Desc('step') @ArgumentHelper.Desc('step')
......
...@@ -62,7 +62,7 @@ def Conv2d( ...@@ -62,7 +62,7 @@ def Conv2d(
The dilation multiple(s) of convolution. The dilation multiple(s) of convolution.
group : int, optional, default=1 group : int, optional, default=1
The group size of convolution. The group size of convolution.
padding : {'VALID', 'SAME, 'SAME_UPPER', 'SAME_LOWER'}, optional padding : {'VALID', 'SAME', 'SAME_UPPER', 'SAME_LOWER'}, optional
The padding algorithm. The padding algorithm.
data_format : {'NCHW', 'NHWC'}, optional data_format : {'NCHW', 'NHWC'}, optional
The data_format. The data_format.
...@@ -119,7 +119,7 @@ def DepthwiseConv2d( ...@@ -119,7 +119,7 @@ def DepthwiseConv2d(
The stride(s) of convolution. The stride(s) of convolution.
pads : sequence of int, optional, default=0 pads : sequence of int, optional, default=0
The zero padding size(s) of convolution. The zero padding size(s) of convolution.
padding : {'VALID', 'SAME, 'SAME_UPPER', 'SAME_LOWER'}, optional padding : {'VALID', 'SAME', 'SAME_UPPER', 'SAME_LOWER'}, optional
The padding algorithm. The padding algorithm.
data_format : {'NCHW', 'NHWC'}, optional data_format : {'NCHW', 'NHWC'}, optional
The data_format. The data_format.
...@@ -183,7 +183,7 @@ def ConvTranspose2d( ...@@ -183,7 +183,7 @@ def ConvTranspose2d(
The padding value added to one side (right) of the output. The padding value added to one side (right) of the output.
output_shape : sequence of (int, Tensor), optional output_shape : sequence of (int, Tensor), optional
The deterministic output shape for **SAME** padding. The deterministic output shape for **SAME** padding.
padding : {'VALID', 'SAME, 'SAME_UPPER', 'SAME_LOWER'}, optional padding : {'VALID', 'SAME', 'SAME_UPPER', 'SAME_LOWER'}, optional
The padding algorithm. The padding algorithm.
data_format : {'NCHW', 'NHWC'}, optional data_format : {'NCHW', 'NHWC'}, optional
The data_format. The data_format.
...@@ -224,7 +224,7 @@ def ConvTranspose2d( ...@@ -224,7 +224,7 @@ def ConvTranspose2d(
@OpSchema.Inputs(1) @OpSchema.Inputs(1)
def Pool2d( def Pool2d(
inputs, kernel_shape, strides, pads=0, padding='VALID', ceil=True, inputs, kernel_shape, strides, pads=0, padding='VALID', ceil_mode=True,
mode='MAX', data_format='NCHW', global_pooling=False, **kwargs): mode='MAX', data_format='NCHW', global_pooling=False, **kwargs):
"""2D Pooling, MAX or AVG. """2D Pooling, MAX or AVG.
...@@ -248,9 +248,9 @@ def Pool2d( ...@@ -248,9 +248,9 @@ def Pool2d(
The stride(s) of pooling. The stride(s) of pooling.
pads : sequence of int, optional, default=0 pads : sequence of int, optional, default=0
The zero padding size(s) of pooling. The zero padding size(s) of pooling.
padding : {'VALID', 'SAME, 'SAME_UPPER', 'SAME_LOWER'}, optional padding : {'VALID', 'SAME', 'SAME_UPPER', 'SAME_LOWER'}, optional
The padding algorithm. The padding algorithm.
ceil : bool, optional ceil_mode : bool, optional, default=True
Whether to ceil the boundary. Whether to ceil the boundary.
mode : {'MAX', 'AVG'}, optional mode : {'MAX', 'AVG'}, optional
The pooling mode. The pooling mode.
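A sketch of the renamed argument (export path assumed); call sites passing ``ceil`` must switch to ``ceil_mode``:

>>> import dragon as dg
>>> x = dg.Tensor('x', dtype='float32').Variable()
>>> y = dg.ops.Pool2d(
...     x, kernel_shape=[3, 3], strides=[2, 2], pads=[1, 1],
...     padding='VALID', ceil_mode=False, mode='MAX')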
...@@ -505,48 +505,6 @@ def BiasAdd(inputs, data_format='NCHW', **kwargs): ...@@ -505,48 +505,6 @@ def BiasAdd(inputs, data_format='NCHW', **kwargs):
return Tensor.CreateOperator('BiasAdd', **arguments) return Tensor.CreateOperator('BiasAdd', **arguments)
@OpSchema.Inputs(2)
def DenseConcat(inputs, growth_rate=0, axis=1, **kwargs):
"""Memory-efficient concatenation for DenseNet `[Huang et.al, 2017] <http://arxiv.org/abs/1608.06993>`_.
This operator is forked from ``Concat``.
The memory optimization requires the following settings:
1. Set the ``growth_rate``; the value must be larger than ``0``.
2. Set the ``mirror_stage`` to True.
Parameters
----------
inputs : sequence of Tensor
The inputs, representing A (old) and B (new) respectively.
growth_rate : int, optional, default=0
The growth rate.
axis : int, optional
The axis to concatenate.
mirror_stage : bool, optional
Whether to share input A for output C. Default is ``False``.
Returns
-------
Tensor
The concatenated tensor, represents C.
Examples
--------
>>> A = Tensor().Variable()
>>> B = Tensor().Variable()
>>> C = DenseConcat([A, B], axis=1) # Simple concatenation
>>> import dragon.memonger as opt
>>> C = opt.Drop(DenseConcat, [A, B], axis=1) # Memory-efficient concatenation
>>> D = DenseConcat([A, B], axis=1, mirror_stage=True) # Memory-efficient concatenation, equivalent
"""
return Tensor.CreateOperator('DenseConcat', **ParseArgs(locals()))
@OpSchema.Inputs(1) @OpSchema.Inputs(1)
@ArgumentHelper.Desc('keep_prob', as_target=False) @ArgumentHelper.Desc('keep_prob', as_target=False)
def DropBlock2d( def DropBlock2d(
......
...@@ -52,7 +52,6 @@ LRN = vision_ops.LRN ...@@ -52,7 +52,6 @@ LRN = vision_ops.LRN
NNResize = vision_ops.NNResize NNResize = vision_ops.NNResize
BilinearResize = vision_ops.BilinearResize BilinearResize = vision_ops.BilinearResize
BiasAdd = vision_ops.BiasAdd BiasAdd = vision_ops.BiasAdd
DenseConcat = vision_ops.DenseConcat
DropBlock2d = vision_ops.DropBlock2d DropBlock2d = vision_ops.DropBlock2d
# Recurrent # Recurrent
...@@ -104,6 +103,8 @@ FullyConnected = math_ops.FullyConnected ...@@ -104,6 +103,8 @@ FullyConnected = math_ops.FullyConnected
Eltwise = math_ops.Eltwise Eltwise = math_ops.Eltwise
Affine = math_ops.Affine Affine = math_ops.Affine
GramMatrix = math_ops.GramMatrix GramMatrix = math_ops.GramMatrix
Accumulate = math_ops.Accumulate
MovingAverage = math_ops.MovingAverage
# Normalization # Normalization
BatchNorm = norm_ops.BatchNorm BatchNorm = norm_ops.BatchNorm
...@@ -137,19 +138,18 @@ Squeeze = array_ops.Squeeze ...@@ -137,19 +138,18 @@ Squeeze = array_ops.Squeeze
Shape = array_ops.Shape Shape = array_ops.Shape
Arange = array_ops.Arange Arange = array_ops.Arange
# ControlFlow # Control Flow
Copy = control_flow_ops.Copy Copy = control_flow_ops.Copy
Equal = control_flow_ops.Equal Equal = control_flow_ops.Equal
Less = control_flow_ops.Less Less = control_flow_ops.Less
Greater = control_flow_ops.Greater Greater = control_flow_ops.Greater
# Misc # Misc
Cast = AsType = misc_ops.AsType Cast = AsType = misc_ops.Cast
Run = misc_ops.Run Run = misc_ops.Run
Template = misc_ops.Template Template = misc_ops.Template
Accuracy = misc_ops.Accuracy Accuracy = misc_ops.Accuracy
StopGradient = misc_ops.StopGradient StopGradient = misc_ops.StopGradient
MovingAverage = misc_ops.MovingAverage
# MPI # MPI
MPIBroadcast = mpi_ops.MPIBroadcast MPIBroadcast = mpi_ops.MPIBroadcast
......