Commit d0fa332c by Ting PAN

Select pybind11 to expose the C++ API

1 parent 1d03e8e2
Showing with 1363 additions and 1377 deletions
......@@ -10,3 +10,6 @@
[submodule "ThirdParty/cub"]
path = ThirdParty/cub
url = https://github.com/NVlabs/cub
[submodule "ThirdParty/pybind11"]
path = ThirdParty/pybind11
url = https://github.com/pybind/pybind11
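For context, exposing the C++ API through pybind11 amounts to declaring the Python-facing classes and functions directly against the C++ objects. A minimal, self-contained sketch of that binding style (the module, class, and function names here are illustrative only, not Dragon's actual bindings):

    #include <pybind11/pybind11.h>
    #include <pybind11/stl.h>
    #include <string>
    #include <vector>

    namespace py = pybind11;

    // A toy stand-in for a C++ object we want to expose to Python.
    struct ToyTensor {
      std::vector<int64_t> dims;
      int64_t count() const {
        int64_t n = 1;
        for (auto d : dims) n *= d;
        return n;
      }
    };

    PYBIND11_MODULE(toy_backend, m) {
      m.doc() = "Illustrative pybind11 module";
      py::class_<ToyTensor>(m, "ToyTensor")
          .def(py::init<>())
          .def_readwrite("dims", &ToyTensor::dims)
          .def("count", &ToyTensor::count);
      m.def("echo", [](const std::string& s) { return s; },
            "Return the argument unchanged.");
    }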
------------------------------------------------------------------------
The list of most significant changes made over time in Dragon.
Dragon 0.3.0.0 (20190110)
Dragon 0.3.0.0 (20190309)
DRAGON_VERSION == 3000
Changes (w.r.t. Dragon 0.2.2.13):
......@@ -24,6 +24,8 @@ Preview Features:
- Use ``Eigen`` as the default cpu math library instead of ``OpenBLAS``.
- Use ``PyBind11`` as the default python module exporter.
- Support for integer data types in common operators,
see the documentation for more detailed information.
......@@ -32,6 +34,8 @@ Preview Features:
which unifies the naming of static and dynamic computation graph.
- The behavior of accumulating gradients has been canceled.
Bugs fixed:
......
......@@ -8,23 +8,22 @@
Quick Reference
---------------
========================== =============================================================================
List Brief
========================== =============================================================================
`EnableCPU`_ Enable CPU mode globally.
`IsCUDADriverSufficient`_ Whether the CUDA driver is sufficient.
`EnableCUDA`_ Enable CUDA mode globally.
`SetRandomSeed`_ Set the global random seed.
`GetRandomSeed`_ Get the global random seed.
`SetGPU`_ Set the global id of GPU.
`GetGPU`_ Get the global id of GPU.
`SetDebugMode`_ Enable Debug mode globally.
`LogMetaGraph`_ Enable logging the meta graph globally.
`LogOptimizedGraph`_ Enable logging the optimized graph globally.
`ExportMetaGraph`_ Enable exporting all runnable meta graphs into text files.
`SetLoggingLevel`_ Set the minimum level of logging.
`SetLoggingFile`_ Redirect the logging to a specific file.
========================== =============================================================================
=============================== =============================================================================
List Brief
=============================== =============================================================================
`EnableCPU`_ Enable CPU mode globally.
`EnableCUDA`_ Enable CUDA mode globally.
`SetRandomSeed`_ Set the global random seed.
`GetRandomSeed`_ Get the global random seed.
`SetGPU`_ Set the global id of GPU.
`GetGPU`_ Get the global id of GPU.
`SetGraphOptimizationLevel`_ Set the default level of graph optimization.
`LogMetaGraph`_ Enable logging the meta graph globally.
`LogOptimizedGraph`_ Enable logging the optimized graph globally.
`ExportMetaGraph`_ Enable exporting all runnable meta graphs into text files.
`SetLoggingLevel`_ Set the minimum level of logging.
`SetLoggingFile`_ Redirect the logging to a specific file.
=============================== =============================================================================
API Reference
-------------
......@@ -33,13 +32,12 @@ API Reference
:members:
.. _EnableCPU: #dragon.config.EnableCPU
.. _IsCUDADriverSufficient: #dragon.config.IsCUDADriverSufficient
.. _EnableCUDA: #dragon.config.EnableCUDA
.. _SetRandomSeed: #dragon.config.SetRandomSeed
.. _GetRandomSeed: #dragon.config.GetRandomSeed
.. _SetGPU: #dragon.config.SetGPU
.. _GetGPU: #dragon.config.GetGPU
.. _SetDebugMode: #dragon.config.SetDebugMode
.. _SetGraphOptimizationLevel: #dragon.config.SetGraphOptimizationLevel
.. _LogMetaGraph: #dragon.config.LogMetaGraph
.. _LogOptimizedGraph: #dragon.config.LogOptimizedGraph
.. _ExportMetaGraph: #dragon.config.ExportMetaGraph
......
......@@ -27,6 +27,7 @@ C++ Binding Wrapper
core/workspace
core/tensor_utils
core/mpi
core/cuda
core/gradient_maker
============================== =======================================================================
......@@ -34,11 +35,13 @@ List Brief
============================== =======================================================================
`dragon.core.workspace`_ The interfaces of Workspace, mostly wrappers of the C++ API.
`dragon.core.gradient_maker`_ The generator of GradientOps.
`dragon.core.tensor_utils`_ The Tensor utilities.
`dragon.core.mpi`_ The MPI utilities.
`dragon.core.tensor_utils`_ List some extended Tensor C++ API.
`dragon.core.mpi`_ List some useful MPI C++ API.
`dragon.core.cuda`_ List some useful CUDA C++ API.
============================== =======================================================================
.. _dragon.core.mpi: core/mpi.html
.. _dragon.core.cuda: core/cuda.html
.. _dragon.core.scope: core/scope.html
.. _dragon.core.tensor: core/tensor.html
.. _dragon.core.tensor_utils: core/tensor_utils.html
......
===========
:mod:`CUDA`
===========
.. toctree::
:hidden:
Quick Reference
---------------
============================== =============================================================================
List Brief
============================== =============================================================================
`IsCUDADriverSufficient`_ Whether the CUDA driver is sufficient.
`GetDevice`_ Get the current active cuda device.
`SynchronizeStream`_ Synchronize the specified cuda stream.
============================== =============================================================================
.. automodule:: dragon.core.cuda
:members:
.. _IsCUDADriverSufficient: #dragon.core.cuda.IsCUDADriverSufficient
.. _GetDevice: #dragon.core.cuda.GetDevice
.. _SynchronizeStream: #dragon.core.cuda.SynchronizeStream
\ No newline at end of file
......@@ -16,10 +16,9 @@ List Brief
`FromPyArray`_ Create a Tensor from an existing Array.
`SetPyArray`_ Set a Tensor from an existing Array.
`ToPyArray`_ Create an Array from an existing Tensor.
`ToPyArrayEx`_ Create a const Array from an existing Tensor.
`GetStorage`_ Get the storage of an existing Tensor.
`ToCPUTensor`_ Switch the storage of an existing Tensor to cpu memory.
`ToCUDATensor`_ Switch the storage of an existing Tensor to cuda memory.
`GetTensorInfo`_ Get the info of an existing Tensor.
============================== =============================================================================
API Reference
......@@ -33,7 +32,6 @@ API Reference
.. _FromPyArray: #dragon.core.tensor_utils.FromPyArray
.. _SetPyArray: #dragon.core.tensor_utils.SetPyArray
.. _ToPyArray: #dragon.core.tensor_utils.ToPyArray
.. _ToPyArrayEx: #dragon.core.tensor_utils.ToPyArrayEx
.. _GetStorage: #dragon.core.tensor_utils.GetStorage
.. _ToCPUTensor: #dragon.core.tensor_utils.ToCPUTensor
.. _ToCUDATensor: #dragon.core.tensor_utils.ToCUDATensor
.. _GetTensorInfo: #dragon.core.tensor_utils.GetTensorInfo
\ No newline at end of file
.. _ToCUDATensor: #dragon.core.tensor_utils.ToCUDATensor
\ No newline at end of file
......@@ -14,7 +14,7 @@ List Brief
`HasTensor`_ Query whether the tensor has been registered in the current workspace.
`CreateFiller`_ Create the filler in the backend.
`GetTensorName`_ Query the name represented in the current workspace.
`RenameTensor`_ Rename a tensor in the current workspace.
`SetTensorAlias`_ Bind an alias to an existing tensor.
`FeedTensor`_ Feed the values to the given tensor.
`FetchTensor`_ Fetch the values of given tensor.
`ResetTensor`_ Reset the memory of given tensor.
......@@ -27,7 +27,7 @@ Operator
============================== =============================================================================
List Brief
============================== =============================================================================
`RunOperator`_ Create and Run the operator in the VM backend.
`RunOperator`_ Run the operator in the VM backend.
============================== =============================================================================
......@@ -39,7 +39,6 @@ List Brief
============================== =============================================================================
`CreateGraph`_ Create the graph in the backend.
`RunGraph`_ Run the specific graph.
`RunGraphEx`_ Run the graph from the meta definition.
============================== =============================================================================
Misc
......@@ -73,14 +72,13 @@ API Reference
.. _CreateGraph: #dragon.core.workspace.CreateGraph
.. _HasTensor: #dragon.core.workspace.HasTensor
.. _GetTensorName: #dragon.core.workspace.GetTensorName
.. _RenameTensor: #dragon.core.workspace.RenameTensor
.. _SetTensorAlias: #dragon.core.workspace.SetTensorAlias
.. _CreateFiller: #dragon.core.workspace.CreateFiller
.. _FetchTensor: #dragon.core.workspace.FetchTensor
.. _FeedTensor: #dragon.core.workspace.FeedTensor
.. _ResetTensor: #dragon.core.workspace.ResetTensor
.. _RunOperator: #dragon.core.workspace.RunOperator
.. _RunGraph: #dragon.core.workspace.RunGraph
.. _RunGraphEx: #dragon.core.workspace.RunGraphEx
.. _Snapshot: #dragon.core.workspace.Snapshot
.. _Restore: #dragon.core.workspace.Restore
.. _LogMetaGraph: #dragon.core.workspace.LogMetaGraph
......
......@@ -42,7 +42,6 @@ List Brief
`NNResize`_ Resize the image with *Nearest-Neighbor* method.
`BilinearResize`_ Resize the image with *Bi-Linear* method.
`BiasAdd`_ Add the bias across channels to a *NCHW* or *NHWC* input.
`DenseConcat`_ Memory-efficient concatenation for DenseNet. `[Huang et.al, 2017] <http://arxiv.org/abs/1608.06993>`_.
`DropBlock2d`_ Randomly drop the outputs according to the spatial blocks. `[Ghiasi et.al, 2018] <https://arxiv.org/abs/1810.12890>`_.
=================== ======================================================================
......@@ -113,7 +112,9 @@ List Brief
`Eltwise`_ Element-wise Sum or Product the arbitrary number of inputs.
`Affine`_ Calculate *Y = Ax + b* along the given range of axes.
`GramMatrix`_ Calculate the gram matrix. `[Gatys et.al, 2016] <https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Gatys_Image_Style_Transfer_CVPR_2016_paper.pdf>`_.
`Moments`_ Compute the mean and variance of inputs along the given axes.
`Moments`_ Calculate the mean and variance of inputs along the given axes.
`Accumulate`_ Calculate *y = alpha * x + beta * y*.
`MovingAverage`_ Calculate *y = (1 - decay) * x + decay * y*.
================== ======================================================================
Normalization
......@@ -174,12 +175,11 @@ Misc
================= ======================================================================
List Brief
================= ======================================================================
`AsType`_ Cast the data type of inputs to a specific one.
`Cast`_ Cast the data type of inputs to a specific one.
`Run`_ Run a custom operator. (Without GradientFlow)
`Template`_ Run a custom operator. (With GradientFlow)
`Accuracy`_ Calculate the Top-K accuracy.
`StopGradient`_ Return the identity of input with truncated gradient flow.
`MovingAverage`_ Calculate the moving average.
================= ======================================================================
Contrib
......@@ -268,6 +268,8 @@ List Brief
.. _Affine: operators/arithmetic.html#dragon.operators.arithmetic.Affine
.. _GramMatrix: operators/arithmetic.html#dragon.operators.arithmetic.GramMatrix
.. _Moments: operators/arithmetic.html#dragon.operators.arithmetic.Moments
.. _Accumulate: operators/arithmetic.html#dragon.operators.arithmetic.Accumulate
.. _MovingAverage: operators/arithmetic.html#dragon.operators.arithmetic.MovingAverage
.. _BatchNorm: operators/norm.html#dragon.operators.norm.BatchNorm
.. _GroupNorm: operators/norm.html#dragon.operators.norm.GroupNorm
......@@ -304,12 +306,11 @@ List Brief
.. _Less: operators/control_flow.html#dragon.operators.control_flow.Less
.. _Greater: operators/control_flow.html#dragon.operators.control_flow.Greater
.. _AsType: operators/misc.html#dragon.operators.misc.AsType
.. _Cast: operators/misc.html#dragon.operators.misc.Cast
.. _Run: operators/misc.html#dragon.operators.misc.Run
.. _Template: operators/misc.html#dragon.operators.misc.Template
.. _Accuracy: operators/misc.html#dragon.operators.misc.Accuracy
.. _StopGradient: operators/misc.html#dragon.operators.misc.StopGradient
.. _MovingAverage: operators/misc.html#dragon.operators.misc.MovingAverage
.. _Proposal: operators/contrib/rcnn.html#dragon.operators.contrib.rcnn.ops.Proposal
......
......@@ -19,16 +19,12 @@ ToolBox
:hidden:
tools/db
tools/im2db
tools/summary_writer
tools/tensorboard
==================== ====================================================================================
List Brief
==================== ====================================================================================
`LMDB`_ A wrapper of LMDB package.
`IM2DB`_ Make the sequential database for images.
`SummaryWriter`_ Write summaries for DragonBoard.
`TensorBoard`_ Write summaries for TensorBoard.
==================== ====================================================================================
......@@ -38,8 +34,5 @@ List Brief
<p style="text-indent:1.5em; font-size: 18px; max-width: 830px;">
.. _pip: https://pypi.python.org/pypi/pip
.. _LMDB: tools/db.html
.. _IM2DB: tools/im2db.html
.. _SummaryWriter: tools/summary_writer.html
.. _TensorBoard: tools/tensorboard.html
====================
:mod:`SummaryWriter`
====================
.. toctree::
:hidden:
Quick Reference
---------------
==================== =============================================================================
List Brief
==================== =============================================================================
`ScalarSummary`_ Write scalar summary.
==================== =============================================================================
API Reference
-------------
.. currentmodule:: dragon.tools.summary_writer
.. autoclass:: ScalarSummary
:members:
.. automethod:: __init__
.. _ScalarSummary: #dragon.tools.summary_writer.ScalarSummary
\ No newline at end of file
......@@ -2,40 +2,30 @@
:mod:`dragon.utils`
===================
Wrapper
-------
Vision
------
.. toctree::
:hidden:
utils/vision/database
utils/vision/data_batch
=================================== =====================================================================
List Brief
=================================== =====================================================================
`dragon.utils.vision.data_batch`_ Efficient Batch data provider based on `LMDB`_.
=================================== =====================================================================
Component
---------
.. toctree::
:hidden:
utils/vision/data_reader
utils/vision/data_transformer
utils/vision/blob_fetcher
========================================== =====================================================================
List Brief
========================================== =====================================================================
`dragon.utils.vision.data_reader`_ Queue encoded strings from `LMDB`_.
`dragon.utils.vision.data_transformer`_ Queue transformed images from `DataReader`_.
`dragon.utils.vision.blob_fetcher`_ Queue blobs from `DataTransformer`_.
========================================== =====================================================================
========================================= =====================================================================
List Brief
========================================= =====================================================================
`dragon.utils.vision.im2db`_ Make the sequential database for images.
`dragon.utils.vision.data_batch`_ Efficient Batch data provider based on `LMDB`_.
`dragon.utils.vision.data_reader`_ Queue encoded strings from `LMDB`_.
`dragon.utils.vision.data_transformer`_ Queue transformed images from `DataReader`_.
`dragon.utils.vision.blob_fetcher`_ Queue blobs from `DataTransformer`_.
========================================= =====================================================================
.. _LMDB: http://lmdb.readthedocs.io/en/release
.. _dragon.utils.vision.im2db: utils/vision/database.html
.. _DataReader: utils/vision/data_reader.html#dragon.utils.vision.data_reader
.. _DataTransformer: utils/vision/data_transformer.html#dragon.utils.vision.data_transformer
.. _dragon.utils.vision.data_batch: utils/vision/data_batch.html
......
============
:mod:`IM2DB`
============
===============
:mod:`Database`
===============
.. toctree::
:hidden:
......@@ -19,8 +19,8 @@ List Brief
API Reference
-------------
.. automodule:: dragon.tools.im2db
.. automodule:: dragon.utils.vision.im2db
:members:
.. _resize_image: #dragon.tools.im2db.resize_image
.. _make_db: #dragon.tools.im2db.make_db
\ No newline at end of file
.. _resize_image: #dragon.utils.vision.im2db.resize_image
.. _make_db: #dragon.utils.vision.im2db.make_db
\ No newline at end of file
......@@ -20,20 +20,23 @@ VirtualBox
vm/caffe
vm/theano
vm/torch
==================== ====================================================================================
List Brief
==================== ====================================================================================
`Theano`_ **Theano** is the inception of modern deep learning frameworks.
`Caffe`_ **Caffe** is one of the most famous deep learning frameworks for Computer Vision.
`PyTorch`_ **PyTorch** provides straightforward operations for research prototyping.
==================== ====================================================================================
.. |para| raw:: html
<p style="text-indent:1.5em; font-size: 18px; max-width: 830px;">
.. _TinyDragon: ../index.html#tinydragon
.. _Theano: vm/theano.html
.. _Caffe: vm/caffe.html
.. _PyTorch: vm/torch.html
.. _TensorFlow: ../index.html#tensorflow
......@@ -66,7 +66,6 @@ List Brief
`AddLayer`_ The extended implementation of ``EltwiseLayer``.
`ConcatLayer`_ The implementation of ``ConcatLayer``.
`SliceLayer`_ The implementation of ``SliceLayer``.
`DenseConcatLayer`_ The implementation for `DenseNet`_.
`CropLayer`_ The implementation of ``CropLayer``.
`ReshapeLayer`_ The implementation of ``ReshapeLayer``.
`PermuteLayer`_ The implementation of ``PermuteLayer``.
......@@ -180,7 +179,6 @@ API Reference
.. _AddLayer: #dragon.vm.caffe.layers.common.AddLayer
.. _ConcatLayer: #dragon.vm.caffe.layers.common.ConcatLayer
.. _SliceLayer: #dragon.vm.caffe.layers.common.SliceLayer
.. _DenseConcatLayer: #dragon.vm.caffe.layers.common.DenseConcatLayer
.. _CropLayer: #dragon.vm.caffe.layers.common.CropLayer
.. _ReshapeLayer: #dragon.vm.caffe.layers.common.ReshapeLayer
.. _PermuteLayer: #dragon.vm.caffe.layers.common.PermuteLayer
......@@ -210,12 +208,10 @@ API Reference
.. _MPIBroadcastLayer: #dragon.vm.caffe.layers.mpi.MPIBroadcastLayer
.. _MPIGatherLayer: #dragon.vm.caffe.layers.mpi.MPIGatherLayer
.. _Layer.Setup: #dragon.vm.caffe.layer.Layer.Setup
.. _Layer.Fill: #dragon.vm.caffe.layer.Layer.Fill
.. _LMDB: http://lmdb.readthedocs.io/en/release
.. _DenseNet: http://arxiv.org/abs/1608.06993
.. _LayerSetUp(layer.hpp, L91): https://github.com/BVLC/caffe/blob/effcdb0b62410b2a6a54f18f23cf90733a115673/include/caffe/layer.hpp#L91
.. _DataParameter.source: https://github.com/BVLC/caffe/blob/effcdb0b62410b2a6a54f18f23cf90733a115673/src/caffe/proto/caffe.proto#L647
.. _DataParameter.prefetch: https://github.com/BVLC/caffe/blob/effcdb0b62410b2a6a54f18f23cf90733a115673/src/caffe/proto/caffe.proto#L672
......
============
:mod:`Torch`
============
Abstraction
-----------
|para| `PyTorch`_ provides straightforward operations for research prototyping.
|para| We are aware that **Dragon** is a graph-based framework with strict naming
for tensors, operators, and workspaces, while `Torch`_ is not.
A simple way to bridge their differences is **JIT**, which traces the anonymous expressions
and indicates a series of executions to the backend. In that case, **AutoGrad** is just a trick (remember the *Chain Rule*).
|para| Rewriting the GC (*Garbage Collection*) is crucial in this role,
as the costly destruction of memories and operators must be avoided.
We can either persist an Operator (i.e. a **Module**),
or reuse memories in turn (i.e. a **MemoryPool**), once they are named formally.
|para| We are still working hard to cover the original PyTorch operators;
however, a number of extended operators from many other frameworks can already be used.
Our **PyTorch** will be unique and more powerful than the official one.
Related Work
------------
|paratitle| **Proto-based Intermediate Representation**
|para| In recent years, several powerful frameworks have chosen Protocol Buffers to
describe operators and their arguments, including `Caffe`_, `Caffe2`_, `TensorFlow`_, and `ONNX`_.
The most important reason is that these descriptors can be easily serialized and sent to the backend.
With the help of the **Factory Pattern**, we have an elegant way to dispatch the executions without
calling them imperatively. This approach is also known as **Declarative Programming**.
|para| Attaching the IR (Intermediate Representation) brings the following advantages:
* Traceable pipelines, which are very helpful for visualizing and debugging.
* Deterministic executions, so detailed optimizations can be applied.
* Efficient deployments, as the data flows are well organized.
|para| The good news is that we can reduce the overhead of the IR to below 5% of the computation time,
which means the dynamic graph can work as fast as the static graph while retaining the flexibility.
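To make this concrete, the following generic C++ sketch (not Dragon's actual OperatorDef, registry, or workspace types) shows a proto-like descriptor being routed through a factory; the same mechanism serves both the declarative style and the operator-wise, imperative style discussed below:

    #include <functional>
    #include <iostream>
    #include <map>
    #include <memory>
    #include <string>
    #include <vector>

    // A stand-in for a proto-style operator descriptor; in practice this
    // would be a serializable message such as an OperatorDef.
    struct OpDesc { std::string type; std::string name; };

    struct Op {
      virtual ~Op() = default;
      virtual void Run() = 0;
    };
    struct ReluOp : Op { void Run() override { std::cout << "run Relu\n"; } };

    // The factory: map the type string of a descriptor to a creator.
    using Creator = std::function<std::unique_ptr<Op>(const OpDesc&)>;
    std::map<std::string, Creator>& Registry() {
      static std::map<std::string, Creator> registry;
      return registry;
    }
    std::unique_ptr<Op> Create(const OpDesc& desc) {
      return Registry().at(desc.type)(desc);
    }

    int main() {
      Registry()["Relu"] = [](const OpDesc&) { return std::make_unique<ReluOp>(); };
      // Declarative style: collect the descriptors first, dispatch them later.
      std::vector<OpDesc> graph = {{"Relu", "relu/1"}, {"Relu", "relu/2"}};
      for (const auto& desc : graph) Create(desc)->Run();
      // Imperative style: the same dispatch, performed immediately per descriptor.
      Create({"Relu", "relu/3"})->Run();
    }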
|paratitle| **Caffe2**
|para| We have noticed that some developers discouraged **Declarative Programming** in 2017 and early 2018,
due to the counter-intuitive building of the computation graph. Actually, `Caffe2`_ has provided operator-wise execution
(a.k.a. *workspace.RunOperator()*) since 2016. In other words, **Imperative Programming** is a subset of **Declarative Programming**,
if we process the declaration implicitly. This mechanism is sometimes called **JIT**.
Architectures
-------------
.. toctree::
:hidden:
.. _Torch: http://torch.ch
.. _PyTorch: https://pytorch.org
.. _Caffe: http://caffe.berkeleyvision.org
.. _Caffe2: http://caffe2.ai
.. _TensorFlow: https://www.tensorflow.org
.. _ONNX: https://onnx.ai
.. |nbsp| raw:: html
&nbsp
.. |br| raw:: html
<br />
.. |paratitle| raw:: html
<p style="font-size: 20px">
.. |sectitle| raw:: html
<p style="text-indent:1em; font-size: 18px">
.. |para| raw:: html
<p style="text-indent:1.5em; font-size: 18px; max-width: 830px;">
.. |context| raw:: html
<p style="font-size: 18px; max-width: 830px;">
......@@ -97,6 +97,7 @@ include_directories(${PROJECT_SOURCE_DIR}/src)
if (BUILD_PYTHON_API)
include_directories(${PYTHON_INCLUDE_DIRS})
include_directories(${NUMPY_INCLUDE_DIR})
include_directories(${THIRD_PARTY_DIR}/pybind11/include)
endif()
if (WITH_CUDA)
include_directories(${CUDA_INCLUDE_DIRS})
......
......@@ -38,7 +38,7 @@ class CPUContext {
void SwitchToDevice() {}
/*! \brief Switch to the device with the given stream */
void SwitchToDevice(int stream_id) {}
void SwitchToDevice(const int stream_id) {}
/*! \brief Synchronize the dispatched operations */
void FinishDeviceCompution() {}
......@@ -106,6 +106,9 @@ class CPUContext {
/*! \brief Return the device id */
int device_id() const { return 0; }
/*! \brief Return the stream id */
int stream_id() const { return 0; }
/*! \brief Set the stream id */
void set_stream_id(int stream_id) {}
......
......@@ -32,6 +32,7 @@ class CNRTObject;
class CNMLContext {
public:
/*! \brief Default Constructor */
CNMLContext(const DeviceOption& option)
: device_id_(option.device_id()),
random_seed_(option.has_random_seed() ?
......@@ -39,34 +40,43 @@ class CNMLContext {
CHECK_EQ(option.device_type(), PROTO_CNML);
}
/*! \brief Constructor with the specified device id */
CNMLContext(const int device_id = 0)
: device_id_(device_id),
random_seed_(DEFAULT_RNG_SEED) {}
/*! \brief Switch to the device with the given stream */
void SwitchToDevice(int stream_id);
inline void SwitchToDevice() { SwitchToDevice(1); }
/*! \brief Switch to the device of this context */
inline void SwitchToDevice() { SwitchToDevice(0); }
/*! \brief Synchronize the dispatched operations */
void FinishDeviceCompution();
/*! \brief Malloc the memory */
static void* New(size_t nbytes);
/*! \brief Zero-Reset the memory */
static void Memset(
size_t nbytes,
void* ptr);
/*! \brief Zero-Reset the memory asynchronously */
inline void MemsetAsync(
size_t nbytes,
void* ptr) {
Memset(nbytes, ptr);
}
/*! \brief Copy the memory */
template<class DstContext, class SrcContext>
static void Memcpy(
size_t nbytes,
void* dst,
const void* src);
/*! \brief Copy the memory with given type asynchronously */
template<class DstContext, class SrcContext>
inline void MemcpyAsync(
size_t nbytes,
......@@ -75,23 +85,33 @@ class CNMLContext {
Memcpy<DstContext, SrcContext>(dst, src, nbytes);
}
/*! \brief Free the memory */
static void Delete(void* data);
inline int device_id() const { return device_id_; }
/*! \brief Return the device id */
int device_id() const { return device_id_; }
inline void set_stream_id(int stream_id) { stream_id_ = stream_id; }
/*! \brief Return the stream id */
int stream_id() const { return stream_id_; }
/*! \brief Set the stream id */
void set_stream_id(int stream_id) { stream_id_ = stream_id; }
inline cnrtStream_t cnrt_stream() {
/*! \brief Return the internal cnrt stream */
cnrtStream_t cnrt_stream() {
return cnrt_stream(device_id_, stream_id_);
}
/*! \brief Return the specified cnrt stream */
static cnrtStream_t cnrt_stream(
int device_id,
int stream_id);
/*! \brief Return the global context locker */
static std::mutex& mutex() { static std::mutex m; return m; }
static CNRTObject* cuda_object();
/*! \brief Return the thread local cnrt object */
static CNRTObject* cnrt_object();
private:
int device_id_, stream_id_ = 1, random_seed_;
......
......@@ -80,11 +80,16 @@ class CUDAObject {
} return dev_streams[stream_id];
}
/*! \brief Return the default cuda stream */
/*! \brief Return the default cuda stream of current device */
cudaStream_t GetDefaultStream() {
return GetStream(CUDA_GET_DEVICE(), 0);
}
/*! \brief Return the default cuda stream of given device */
cudaStream_t GetDefaultStream(int device_id) {
return GetStream(device_id, 0);
}
/*! \brief Return the specified cublas handle */
cublasHandle_t GetCuBLASHandle(int device_id, int stream_id) {
vector<cublasHandle_t>& dev_handles = cublas_handles[device_id];
......@@ -141,13 +146,13 @@ class CUDAContext {
random_seed_(DEFAULT_RNG_SEED) {}
/*! \brief Switch to the device with the given stream */
void SwitchToDevice(int stream_id) {
void SwitchToDevice(const int stream_id) {
CUDA_CHECK(cudaSetDevice(device_id_));
stream_id_ = stream_id;
}
/*! \brief Switch to the device of this context */
void SwitchToDevice() { SwitchToDevice(1); }
void SwitchToDevice() { SwitchToDevice(0); }
/*! \brief Synchronize the dispatched operations */
void FinishDeviceCompution() {
......@@ -191,8 +196,19 @@ class CUDAContext {
size_t nbytes,
void* dst,
const void* src) {
MemcpyEx<DstContext, SrcContext>(
nbytes, dst, src, active_device_id());
}
/*! \brief Copy the memory [Extended] */
template<class DstContext, class SrcContext>
static void MemcpyEx(
size_t nbytes,
void* dst,
const void* src,
int device_id) {
cudaStream_t stream = CUDAContext::
cuda_object()->GetDefaultStream();
cuda_object()->GetDefaultStream(device_id);
CUDA_CHECK(cudaMemcpyAsync(dst, src, nbytes,
cudaMemcpyDefault, stream));
cudaError_t error = SynchronizeStream(stream);
......@@ -230,9 +246,15 @@ class CUDAContext {
return cudaGetLastError();
}
/*! \brief Return the device id */
/*! \brief Return the device id of this context */
int device_id() const { return device_id_; }
/*! \brief Return the active device id of current thread */
static int active_device_id() { return CUDA_GET_DEVICE(); }
/*! \brief Return the stream id */
int stream_id() const { return stream_id_; }
/*! \brief Set the stream id */
void set_stream_id(int stream_id) { stream_id_ = stream_id; }
......@@ -292,85 +314,48 @@ class CUDAContext {
}
private:
int device_id_, stream_id_ = 1, random_seed_;
int device_id_, stream_id_ = 0, random_seed_;
unique_ptr<std::mt19937> rand_generator_;
curandGenerator_t curand_generator_ = nullptr;
};
template <class Context>
class CUDAClosure {
public:
/*! \brief Default Constructor */
CUDAClosure() {}
/*! \brief Constructor with the given context */
explicit CUDAClosure(Context* ctx): ctx_(ctx) {}
/*! \brief Synchronize the dispatched operations */
void Sync() {
for (auto stream_id : active_streams_) {
cudaStreamSynchronize(cuda_object_
.GetStream(ctx_->device_id(), stream_id));
cudaError_t error = cudaGetLastError();
CHECK_EQ(error, cudaSuccess)
<< "\nCUDA Error: " << cudaGetErrorString(error);
}
active_streams_.clear();
}
/*! \brief Return the specified cuda stream */
cudaStream_t cuda_stream(int stream_id) {
active_streams_.push_back(stream_id);
return cuda_object_.GetStream(
ctx_->device_id(), stream_id);
}
/*! \brief Return the specified cublas handle */
cublasHandle_t cublas_handle(int stream_id) {
active_streams_.push_back(stream_id);
return cuda_object_.GetCuBLASHandle(
ctx_->device_id(), stream_id);
}
/*! \brief Return the specified cudnn handle */
#ifdef WITH_CUDNN
cudnnHandle_t cudnn_handle(int stream_id) {
active_streams_.push_back(stream_id);
return cuda_object_.GetCuDNNHandle(
ctx_->device_id(), stream_id);
}
#endif
protected:
Context* ctx_;
CUDAObject cuda_object_;
vector<int> active_streams_;
};
#else // WITH_CUDA
class CUDAContext {
public:
/*! \brief Default Constructor */
CUDAContext(const DeviceOption& option) { CUDA_NOT_COMPILED; }
/*! \brief Constructor with the specified device id */
CUDAContext(const int device_id = 0) { CUDA_NOT_COMPILED; }
void SwitchToDevice() { CUDA_NOT_COMPILED; }
/*! \brief Switch to the device with the given stream */
void SwitchToDevice(int stream_id) { CUDA_NOT_COMPILED; }
/*! \brief Switch to the device of this context */
void SwitchToDevice() { CUDA_NOT_COMPILED; }
/*! \brief Synchronize the dispatched operations */
void FinishDeviceCompution() { CUDA_NOT_COMPILED; }
/*! \brief Malloc the memory */
static void* New(size_t nbytes) { CUDA_NOT_COMPILED; }
/*! \brief Zero-Reset the memory */
static void Memset(
size_t nbytes,
void* ptr) {
CUDA_NOT_COMPILED;
}
/*! \brief Zero-Reset the memory asynchronously */
void MemsetAsync(
size_t nbytes,
void* ptr) {
CUDA_NOT_COMPILED;
}
/*! \brief Copy the memory */
template<class DstContext, class SrcContext>
static void Memcpy(
size_t nbytes,
......@@ -379,6 +364,17 @@ class CUDAContext {
CUDA_NOT_COMPILED;
}
/*! \brief Copy the memory [Extended] */
template<class DstContext, class SrcContext>
static void MemcpyEx(
size_t nbytes,
void* dst,
const void* src,
int device_id) {
CUDA_NOT_COMPILED;
}
/*! \brief Copy the memory asynchronously */
template<class DstContext, class SrcContext>
void MemcpyAsync(
size_t nbytes,
......@@ -387,7 +383,16 @@ class CUDAContext {
CUDA_NOT_COMPILED;
}
/*! \brief Return the device id */
int device_id() const { return 0; }
/*! \brief Return the active device id of current thread */
static int active_device_id() { return 0; }
/*! \brief Return the stream id */
int stream_id() const { return 0; }
/*! \brief Set the stream id */
void set_stream_id(int stream_id) {}
};
......
......@@ -20,80 +20,69 @@ namespace dragon {
class GraphBase {
public:
struct Node {
vector<string> parents;
vector<string> childs;
int op_idx = -1;
OperatorDef op_def;
};
/*! \brief Default constructor */
GraphBase(
const GraphDef& meta_graph,
Workspace* ws);
/*! \brief Default deconstructor */
virtual ~GraphBase() {}
GraphDef BuildUpdateOps(const GraphDef& input_def);
/*! \brief Create a graph from the optimized def */
virtual bool Create(
const GraphDef& optimized_graph,
Workspace* ws) = 0;
/*! \brief Run the graph once synchronously */
virtual bool Run(
const string& include,
const string& exclude,
const int stream_id = 1) = 0;
int stream_id = 0) = 0;
/*! \brief Return the name of this graph */
string name() const { return name_; }
protected:
/*! \brief Store the name and running phase */
string name_, phase_;
/*! \brief Store the defined arguments */
Map<string, Argument> args_;
/*! \brief Store the parent workspace */
Workspace* ws_;
};
class Graph : public GraphBase {
public:
/*! \brief Default constructor */
Graph(const GraphDef& meta_graph, Workspace* ws);
/*! \brief Default deconstructor */
virtual ~Graph() { for (auto* op : ops_) delete op; }
/*! \brief Create a graph from the optimized def */
bool Create(
const GraphDef& optimized_graph,
Workspace* ws) override;
/*! \brief Run the graph once synchronously */
bool Run(
const string& include,
const string& exclude,
const int stream_id = 1) override;
GraphDef Prune(const GraphDef& meta_graph);
GraphDef Share(const GraphDef& optimized_graph);
void ShareGrads(GraphDef& optimized_graph);
GraphDef BuildUpdateOps(const GraphDef& meta_graph);
void RecomputingAware(
const GraphDef& optimized_graph,
Workspace* ws);
int stream_id = 0) override;
/*! \brief Return the parent workspace */
Workspace* ws() const { return ws_; }
protected:
void ForwardShareDyeing(
const string& u,
const string& ancestor);
void ForwardPruneDyeing(
const string& u,
const string& leaf,
const vector<string>& path);
void BackwardPruneDyeing(string v);
/*! \brief Store the internal operators */
vector<OperatorBase*> ops_;
Map<string, Node> dag_;
Map<string, bool> visited_, colored_;
Map<string, string> renamed_;
Set<string> targets_;
};
/*! \brief Create a graph from the raw def */
GraphBase* NewGraph(
const GraphDef& meta_graph,
Workspace* ws);
......
......@@ -19,14 +19,19 @@ namespace dragon {
class GraphGradientMaker {
public:
GraphGradientMaker(): cur_op_idx_(0) {}
GraphGradientMaker()
: cur_op_idx_(0) {}
void Make(
const GraphDef& forward_def,
const vector<string>& targets,
GraphDef& new_def);
const vector<OperatorDef*>& forward_def,
const vector<string>& targets,
GraphDef& new_def);
void Share(const string& grads_prefix, GraphDef& graph);
void Make(
const GraphDef& forward_def,
GraphDef& backward_def);
void Share(GraphDef& graph);
void SetTerms(const Map<string, string>& terms) { terms_ = terms; }
void SetOperatorPrefix(const string& prefix) { op_prefix_ = prefix; }
......
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_CORE_GRAPH_OPTIMIZER_H_
#define DRAGON_CORE_GRAPH_OPTIMIZER_H_
#include "core/common.h"
namespace dragon {
class Workspace;
class GraphOptimizer {
public:
/*! \brief The simple node structure */
struct Node {
vector<string> parents;
vector<string> childs;
int op_idx = -1;
OperatorDef op_def;
};
/*! \brief Default constructor */
GraphOptimizer(Workspace* ws) : ws_(ws) {}
/*! \brief Prune the redundant nodes (-O1) */
GraphDef PruneNodes(const GraphDef& input_def);
/*! \brief Add the inplace for outputs (-O2) */
GraphDef AddInplace(const GraphDef& input_def);
/*! \brief Plan the recomputing for inputs (-O3) */
GraphDef MirrorStage(
const GraphDef& input_def,
Map< string, vector<int> >& op_indices);
/*! \brief Allocate the buffer for outputs (-O3) */
GraphDef SimulateGC(const GraphDef& input_def);
protected:
/*! \brief Traverse from input gradients to dye the nodes */
void ForwardPruneTraversal(
const string& u,
const string& leaf,
const vector<string>& path);
/*! \brief Traverse from targets to dye the nodes */
void BackwardPruneTraversal(const string& v);
/*! \brief Traverse from inputs to find the available inplace chain */
void InplaceTraversal(
const string& u,
const string& ancestor);
/* \brief Store the workspace of parent graph */
Workspace* ws_;
/* \brief Store the DAG */
Map<string, Node> dag_;
/* \brief Store the traversal flags */
Map<string, bool> visited_, colored_;
/* \brief Store the inplace relations */
Map<string, string> renamed_;
};
} // namespace dragon
#endif // DRAGON_CORE_GRAPH_OPTIMIZER_H_
\ No newline at end of file
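The passes above correspond to the optimization levels mentioned in the new ``SetGraphOptimizationLevel`` config entry. A hypothetical driver sketching how the passes might be chained by level (the actual wiring inside ``Graph`` is not part of this diff, and the function below is illustrative only):

    // Hypothetical driver: apply the optimizer passes cumulatively by level.
    GraphDef ApplyOptimizations(
        const GraphDef&             input_def,
        Workspace*                  ws,
        int                         level,
        Map<string, vector<int> >&  op_indices) {
      GraphOptimizer optimizer(ws);
      GraphDef def = input_def;
      if (level >= 1) def = optimizer.PruneNodes(def);   // -O1
      if (level >= 2) def = optimizer.AddInplace(def);   // -O2
      if (level >= 3) {                                  // -O3
        def = optimizer.MirrorStage(def, op_indices);
        def = optimizer.SimulateGC(def);
      }
      return def;
    }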
......@@ -35,8 +35,6 @@ class MixedMemory {
STATE_AT_CUDA,
/*! \brief Memory could be modified by CNMLContext last time */
STATE_AT_CNML,
/*! \brief Memory should be copied to another device next time */
SWITCHED,
/*! \brief Host and Device now hold the same contents */
SYNCED,
} State;
......@@ -46,7 +44,7 @@ class MixedMemory {
cuda_ptr_(nullptr), cnml_ptr_(nullptr) {}
/*! \brief Constructor with the known meta and size */
MixedMemory(const TypeMeta& meta, const size_t nbytes)
MixedMemory(const TypeMeta& meta, size_t nbytes)
: meta_(meta), nbytes_(nbytes), cpu_ptr_(nullptr),
cuda_ptr_(nullptr), cnml_ptr_(nullptr) {}
......@@ -54,19 +52,19 @@ class MixedMemory {
~MixedMemory();
/*! \brief Return the const data pointer on CPUContext */
const void* cpu_data();
const void* cpu_data(size_t nbytes = 0);
/*! \brief Return the const data pointer on CUDAContext */
const void* cuda_data();
const void* cuda_data(size_t nbytes = 0);
/*! \brief Return the const data pointer on CNMLContext */
const void* cnml_data();
/*! \brief Return the mutable data pointer on CPUContext */
void* mutable_cpu_data();
void* mutable_cpu_data(size_t nbytes = 0);
/*! \brief Return the mutable data pointer on CUDAContext */
void* mutable_cuda_data();
void* mutable_cuda_data(size_t nbytes = 0);
/*! \brief Return the mutable data pointer on CNMLContext */
void* mutable_cnml_data();
......@@ -85,11 +83,11 @@ class MixedMemory {
/*! \brief Set the cpu data pointer from external context */
void set_cpu_data(void* cpu_ptr, size_t nbytes);
/*! \brief Switch to the device set by Context before */
void SwitchToDevice();
/*! \brief Switch to the specified device */
void SwitchToDevice(int device_id);
/*! \brief Switch to the specified cuda device */
void SwitchToCUDADevice(int device_id);
/*! \brief Return the total bytes of this memory */
......@@ -110,14 +108,17 @@ class MixedMemory {
/*! \brief Set the storage order */
void set_order(StorageOrder order) { order_ = order; }
/*! \brief Return the device id of the memory on device */
int device_id() const { return ptr_device_; }
/*! \brief Return a string to describe the internal structure */
const Map<string, string> info() const;
/*! \brief Control the state machine to CPUContext */
void ToCPU();
void ToCPU(size_t nbytes = 0);
/*! \brief Control the state machine to CUDAContext */
void ToCUDA();
void ToCUDA(size_t nbytes = 0);
private:
/*! \brief The type meta to call the deconstructor */
......@@ -137,7 +138,7 @@ class MixedMemory {
/*! \brief Whether this memory owns the cpu data pointer */
int own_cpu_ptr_ = 1;
/*! \brief Store the device id for some data pointers */
int ptr_device_ = 0;
......
......@@ -30,10 +30,10 @@ class Workspace;
class OperatorBase {
public:
/*! Default constructor */
/*! \brief Default constructor */
OperatorBase(const OperatorDef& def, Workspace* ws);
/*! Default deconstructor */
/*! \brief Default deconstructor */
virtual ~OperatorBase() {}
/*! \brief Return the specified input tensor */
......@@ -49,19 +49,13 @@ class OperatorBase {
int OutputSize() { return (int)outputs_.size(); }
/*! \brief Modify this operator according to the given def */
void MutableOp(const OperatorDef& def);
/*! \brief Modify this operator according to the given properties */
void MutableOp(
const vector<string>& inputs,
const vector<string>& outputs,
const string& anchor);
void UpdateFrom(const OperatorDef& def);
/*! \brief Switch the internal running phase */
void SwitchToPhase(const string& phase) { phase_ = phase; }
/*! \brief Run this operator on the specified stream */
virtual void Run(int stream_id = 1) { NOT_IMPLEMENTED; }
virtual void Run(int stream_id = 0) { NOT_IMPLEMENTED; }
/*! \brief Fusion this operator into the specified graph */
virtual void Fusion(void* graph) { NOT_IMPLEMENTED; }
......@@ -100,14 +94,14 @@ class OperatorBase {
/*! \brief Return the specified argument */
const Argument& arg(const string& name) { return *(args_[name]); }
typedef Map<string, vector<OperatorBase*> > RecomputeMap;
typedef Map<string, vector<OperatorBase*> > SubGraph;
/*! \brief Return the recomputing map of this operator */
RecomputeMap& recompute_map() { return recompute_map_; }
/*! \brief Return the recomputing subgraph of this operator */
SubGraph& subgraph() { return subgraph_; }
/*! \brief Set the given recomputing map */
void set_recompute_map(RecomputeMap recompute_map) {
recompute_map_ = recompute_map;
/*! \brief Set the given recomputing subgraph */
void set_subgraph(SubGraph subgraph) {
subgraph_ = subgraph;
}
/*! \brief Return the stored operator def */
......@@ -129,7 +123,7 @@ class OperatorBase {
protected:
string phase_, anchor_;
Map<std::string, const Argument*> args_;
Map<string, vector<OperatorBase*> > recompute_map_;
SubGraph subgraph_;
vector<Tensor*> inputs_, outputs_;
OperatorDef def_;
Workspace* ws_;
......@@ -138,50 +132,66 @@ class OperatorBase {
template <class Context>
class Operator : public OperatorBase {
public:
/*! \brief Default constructor */
Operator(const OperatorDef& def, Workspace* ws)
: OperatorBase(def, ws), ctx_(def.device_option()),
allow_recompute_(OperatorBase::Arg<bool>(
"recomputing_aware", false)),
allow_recomputing_(OperatorBase::Arg<bool>(
"allow_recomputing", false)),
do_sync_(OperatorBase::Arg<bool>(
"do_sync", true)) {
"do_sync", false)) {
allow_run_ = true;
allow_run_ &= _MPICheck();
allow_run_ &= MPICheck();
allow_run_ &= (!(OutputSize() == 1 &&
Output(0)->name() == "ignore"));
}
void Run(int stream_id = 1) final {
/*! \brief Run this operator on the specified stream */
void Run(int stream_id = 0) final {
if (!allow_run_) return;
if (allow_recompute_) MakeResource();
if (allow_recomputing_) PrepareResource();
ctx()->SwitchToDevice(stream_id);
MemorySwitch();
RunOnDevice();
if (do_sync_) ctx()->FinishDeviceCompution();
if (allow_recompute_) CleanResource();
if (do_sync_ || stream_id > 0) {
// We will sync the stream 0 at the specific time
ctx()->FinishDeviceCompution();
}
if (allow_recomputing_) ReleaseResource();
}
virtual void ElimateCorruption();
virtual void MakeResource();
virtual void CleanResource();
/*! \brief Prepare the content of inputs */
virtual void PrepareResource();
/*! \brief Release the ownership of inputs */
virtual void ReleaseResource();
/*! \brief Coordinate the context of inputs and outputs */
virtual void MemorySwitch() {
for (auto* I : inputs_)
if(I->name() != "ignore") I->SwitchToDevice();
for (auto* O : outputs_)
if(O->name() != "ignore") O->SwitchToDevice();
for (auto* e : inputs_)
if(e->name() != "ignore")
e->SwitchToDevice(ctx()->device_id());
for (auto* e : outputs_)
if(e->name() != "ignore")
e->SwitchToDevice(ctx()->device_id());
}
/*! \brief Implement the detailed execution */
virtual void RunOnDevice() = 0;
/*! \brief Return the internal context */
Context* ctx() { return &ctx_; }
/*! \brief Whether this operator can be ignored */
bool AllowRun() { return allow_run_; }
protected:
/*! \brief Store the internal context */
Context ctx_;
bool allow_run_, allow_recompute_, do_sync_;
bool allow_run_, allow_recomputing_, do_sync_;
private:
bool _MPICheck() {
/*! \brief Check the MPI conditions */
bool MPICheck() {
#ifndef WITH_MPI
return true;
#else
......@@ -197,7 +207,13 @@ class Operator : public OperatorBase {
}
};
OperatorBase* CreateOperator(const OperatorDef& def, Workspace* ws);
/*! \brief New a operator from the raw def */
OperatorBase* NewOperator(
const OperatorDef& def,
Workspace* ws);
/*! Macros */
#define USE_SIMPLE_CTOR_DTOR(name) \
name(const OperatorDef& def, Workspace* ws) \
......@@ -350,7 +366,9 @@ DECLARE_REGISTRY(
<< "\nExcepted the size of " << #argument \
<< " > " << idx << ". (Got " \
<< argument##_desc.size() << ")."; \
Tensor* argument##_tensor = ws()->GetTensor(argument##_desc[idx]); \
Tensor* argument##_tensor = ws()->GetTensor( \
str::replace_first(argument##_desc[idx], \
"${ANCHOR}", anchor())); \
CHECK(argument##_tensor->IsType<type>()) \
<< "\nThe type of " << #argument << " should be " << #type << "."; \
CHECK_EQ(argument##_tensor->count(), 1) \
......
......@@ -46,10 +46,17 @@ class GradientMakerBase {
virtual Gradient Make() {
vector<OperatorDef> new_defs = MakeDefs();
Argument anchor;
anchor.set_name("anchor"); anchor.set_s(def.name());
for (int i = 0; i < new_defs.size(); i++)
new_defs[i].add_arg()->CopyFrom(anchor);
if (def.has_uid()) {
// Attach the anchor to the name if having UID
for (int i = 0; i < new_defs.size(); i++)
new_defs[i].set_name(def.name());
} else {
// Otherwise, just put it into the arguments
Argument anchor;
anchor.set_name("anchor"); anchor.set_s(def.name());
for (int i = 0; i < new_defs.size(); i++)
new_defs[i].add_arg()->CopyFrom(anchor);
}
return Gradient(new_defs, g_inputs_, DefaultValues());
};
......@@ -117,10 +124,10 @@ class NoGradient : public GradientMakerBase {
class SimpleGradientMaker final : public GradientMakerBase {
public:
/*!
* <SimpleMaker>
* <SimpleMaker>
*
* Inputs: X1, X2, ..., Xn, dY
* Outputs: dX1, dX2, ..., dXn
* Inputs: X1, X2, ..., Xn, dY
* Outputs: dX1, dX2, ..., dXn
*
*/
GRADIENT_MAKER_CTOR(SimpleGradientMaker);
......@@ -141,12 +148,12 @@ class SimpleGradientMaker final : public GradientMakerBase {
class InplaceGradientMaker final : public GradientMakerBase {
public:
/*!
* <InplaceMaker>
*
* Inputs: Y, dY
* Outputs: dX
*
*/
* <InplaceMaker>
*
* Inputs: Y, dY
* Outputs: dX
*
*/
GRADIENT_MAKER_CTOR(InplaceGradientMaker);
vector<OperatorDef> MakeDefs() override {
return SingleDef(
......
......@@ -80,7 +80,7 @@ class Tensor {
int ndim() const { return (int)dims_.size(); }
/*! \brief Return the dimension of given axis */
int64_t dim(const int64_t i) const{ return dims_[axis(i)]; }
int64_t dim(int64_t i) const{ return dims_[axis(i)]; }
/*! \brief Return all the dimensions */
const vector<int64_t>& dims() const { return dims_; }
......@@ -95,7 +95,7 @@ class Tensor {
size_t capacity() const { return capacity_; }
/*! \brief Return the number of elements along the [start, end) axes */
int64_t count(const int64_t start, const int64_t end) const {
int64_t count(int64_t start, int64_t end) const {
int64_t nelements = 1;
for (int64_t i = start; i < end; i++) nelements *= dim(i);
return nelements;
......@@ -105,10 +105,10 @@ class Tensor {
int64_t count() const { return (int64_t)size_; }
/*! \brief Return the number of elements from the start axis */
int64_t count(const int64_t start) const { return count(start, ndim()); }
int64_t count(int64_t start) const { return count(start, ndim()); }
/*! \brief Return the stride of given axis */
int64_t stride(const int64_t i) const { return strides_[axis(i)]; }
int64_t stride(int64_t i) const { return strides_[axis(i)]; }
/*! \brief Return all the strides */
const vector<int64_t>& strides() const { return strides_; }
......@@ -128,11 +128,11 @@ class Tensor {
/*! \brief Return a string to describe the dimensions of this tensor */
string DimString() const { return DimString(dims_); }
/*! \brief Whether the memory of this tensor is unstable */
bool is_corrupted() const { return is_corrupted_; }
/*! \brief Return the version of this tensor */
int version() const { return version_; }
/*! \brief Mark the internal memory to be unstable */
void Corrupt() { is_corrupted_ = true; }
/*! \brief Set the version of this tensor */
void set_version(int version) { version_ = version; }
/*! \brief Whether this tensor holds a valid memory */
bool has_memory() const { return memory_ || ex_memory_ != nullptr; }
......@@ -152,10 +152,10 @@ class Tensor {
return memory()->state();
}
/*! \brief Switch the memory to device set by Context before */
void SwitchToDevice() {
/*! \brief Switch the memory to the specific device */
void SwitchToDevice(int device_id) {
MixedMemory* mem = memory();
if (mem) mem->SwitchToDevice();
if (mem) mem->SwitchToDevice(device_id);
}
/*! \brief Return the type meta of this tensor */
......@@ -177,10 +177,10 @@ class Tensor {
} else {
if (TypeMeta::Id<Context>() ==
TypeMeta::Id<CPUContext>()) {
*data_ptr = mem->mutable_cpu_data();
*data_ptr = mem->mutable_cpu_data(nbytes());
} else if (TypeMeta::Id<Context>() ==
TypeMeta::Id<CUDAContext>()) {
*data_ptr = mem->mutable_cuda_data();
*data_ptr = mem->mutable_cuda_data(nbytes());
} else if (TypeMeta::Id<Context>() ==
TypeMeta::Id<CNMLContext>()) {
*data_ptr = mem->mutable_cnml_data();
......@@ -198,10 +198,10 @@ class Tensor {
CHECK(mem) << "\nMemory access before allocating.";
if (TypeMeta::Id<Context>() ==
TypeMeta::Id<CPUContext>()) {
return mem->cpu_data();
return mem->cpu_data(nbytes());
} else if (TypeMeta::Id<Context>() ==
TypeMeta::Id<CUDAContext>()) {
return mem->cuda_data();
return mem->cuda_data(nbytes());
} else if (TypeMeta::Id<Context>() ==
TypeMeta::Id<CNMLContext>()) {
return mem->cnml_data();
......@@ -258,10 +258,18 @@ class Tensor {
T* mutable_data() {
void* data_ptr;
mutable_data_ptr<Context>(&data_ptr);
if (data_ptr && meta_ == TypeMeta::Make<T>())
return static_cast<T*>(data_ptr);
return static_cast<T*>(
raw_mutable_data<Context>(TypeMeta::Make<T>()));
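// Reuse the existing storage when the requested type already matches, or when
// the allocated capacity can still hold the elements reinterpreted as the new
// type; only fall back to a fresh raw allocation otherwise.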
if (data_ptr) {
auto meta = TypeMeta::Make<T>();
if (meta_ == meta) {
return static_cast<T*>(data_ptr);
} else if (capacity_ >=
size_ * meta.itemsize()) {
meta_ = meta;
return static_cast<T*>(data_ptr);
}
}
return static_cast<T*>(raw_mutable_data
<Context>(TypeMeta::Make<T>()));
}
/*! \brief Get the typed const data pointer */
......@@ -325,6 +333,9 @@ class Tensor {
/*! \brief Store the size and capacity */
size_t size_ = 0, capacity_ = 0;
/*! \brief Store the version for shared tensor */
int version_ = -1;
/*! \brief Store the dimensions and strides */
vector<int64_t> dims_, strides_;
......@@ -335,7 +346,7 @@ class Tensor {
MixedMemory* ex_memory_ = nullptr;
/*! \brief External memory indicators */
bool is_corrupted_ = false, is_shared_ = false, own_mem_ = true;
bool is_shared_ = false, own_mem_ = true;
};
} // namespace dragon
......
......@@ -52,12 +52,12 @@ class TypeMeta {
return *this;
}
bool operator == (const TypeMeta& other) const {
return (id_ == other.id_);
bool operator == (const TypeMeta& other) const {
return (id_ == other.id_);
}
bool operator != (const TypeMeta& other) const {
return (id_ != other.id_);
bool operator != (const TypeMeta& other) const {
return (id_ != other.id_);
}
const TypeId& id() const { return id_; }
......@@ -69,8 +69,8 @@ class TypeMeta {
template <typename T>
static TypeId Id() {
// return T's id
// using a intptr_t as hash key
// Return T's id
// Using an intptr_t as hash key
return TypeRegister<T>::id();
}
......@@ -78,7 +78,7 @@ class TypeMeta {
static size_t Itemsize() { return sizeof(T); }
template <typename T>
bool Match() const { return (id_ == Id<T>()); }
bool Match() const { return (id_ == Id<T>()); }
template <typename T>
static void Ctor(void* ptr, size_t n) {
......
......@@ -19,14 +19,12 @@
namespace dragon {
#define WORKSPACE_MAX_CORRUPTED_SIZE 2
class Workspace {
public:
typedef Map<string, Map<string, int64_t> > DummyNameMap;
typedef Map<string, unique_ptr<Tensor> > TensorMap;
typedef Map<string, string> TensorProxyMap;
typedef Map<string, string> TensorAliasMap;
typedef Map<string, TensorFillerProto> TensorFillerMap;
typedef Map<string, unique_ptr<OperatorBase> > OperatorMap;
......@@ -73,7 +71,7 @@ class Workspace {
/* \brief Whether the specified filler is in this workspace */
bool HasFiller(const string& name, bool use_remote = true) const;
/*! \brief Create the specified filler */
void CreateFiller(const TensorFillerProto filler);
......@@ -107,19 +105,15 @@ class Workspace {
return Tcaches;
}
/*! \brief Creathe a persistent operator in this workspace */
void CreatePersistentOp(const OperatorDef& def);
/*! \brief Create a operator in this workspace */
OperatorBase* CreateOperator(const OperatorDef& def);
/*! \brief Run the specified persistent operator */
void RunPersistentOp(
const string& key,
const string& anchor,
const vector<string>& inputs,
const vector<string>& outputs);
/*! \brief Try to run the operator in an adaptive mode */
void RunOperator(const OperatorDef& def);
/*! \brief Try to run the operator in an adaptive mode */
void RunOperatorOnce(const OperatorDef& def);
/*! \brief Create a Graph in this workspace */
GraphBase* CreateGraph(const GraphDef& def);
......@@ -128,13 +122,13 @@ class Workspace {
const string& graph_name,
const string& include,
const string& exclude,
const int stream_id = 1);
int stream_id = 0);
/*! \brief Return all the stored graph names */
vector<string> GetGraphs() const;
/* \brief Set a proxy name for the tensor */
bool SetTensorProxy(const string& key, const string& proxy);
/* \brief Set an alias for the tensor */
bool SetTensorAlias(const string& name, const string& alias);
/* \brief Return a unique dummy name within this workspace */
string GetDummyName(
......@@ -157,7 +151,7 @@ class Workspace {
TensorFillerMap tensor_filler_map_;
/*! \brief Store the proxy name of tensors */
TensorProxyMap tensor_proxy_map_;
TensorAliasMap tensor_alias_map_;
/*! \brief Store the registered operators for dynamic graph */
OperatorMap operator_map_;
......
......@@ -99,6 +99,6 @@ class CuDNNSoftmaxGradientOp final : public Operator<Context> {
#endif // WITH_CUDNN
}
} // namespace dragon
#endif // DRAGON_OPERATORS_ACTIVATION_SOFTMAX_OP_H_
\ No newline at end of file
......@@ -10,29 +10,29 @@
* ------------------------------------------------------------
*/
#ifndef DRAGON_OPERATORS_UPDATE_MOVING_AVERAGE_OP_H_
#define DRAGON_OPERATORS_UPDATE_MOVING_AVERAGE_OP_H_
#ifndef DRAGON_OPERATORS_ARITHMETIC_ACCUMULATE_OP_H_
#define DRAGON_OPERATORS_ARITHMETIC_ACCUMULATE_OP_H_
#include "core/operator.h"
namespace dragon {
template <class Context>
class MovingAverageOp final : public Operator<Context> {
class AccumulateOp final : public Operator<Context> {
public:
MovingAverageOp(const OperatorDef& def, Workspace* ws)
AccumulateOp(const OperatorDef& def, Workspace* ws)
: Operator<Context>(def, ws),
decay(OperatorBase::Arg<float>("decay", 1.f)) {}
alpha(OperatorBase::Arg<float>("alpha", 1.f)),
beta(OperatorBase::Arg<float>("beta", 1.f)) {}
USE_OPERATOR_FUNCTIONS;
void RunOnDevice() override;
template <typename T> void RunWithType();
template <typename T> void RunWithType(Tensor* X, Tensor* Y);
protected:
float decay;
float alpha, beta;
};
} // namespace dragon
} // namespace dragon
#endif // DRAGON_OPERATORS_UPDATE_MOVING_AVERAGE_OP_H_
\ No newline at end of file
#endif // DRAGON_OPERATORS_ARITHMETIC_ACCUMULATE_OP_H_
\ No newline at end of file
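For reference, the old ``MovingAverage`` behaviour is a special case of the new ``Accumulate`` formula: *y = alpha * x + beta * y* reduces to the moving average when *alpha = 1 - decay* and *beta = decay*. A quick numeric check with decay = 0.9, x = 2, y = 10:

    alpha = 1 - 0.9 = 0.1,  beta = 0.9
    y_new = 0.1 * 2 + 0.9 * 10 = 0.2 + 9.0 = 9.2

which matches *(1 - decay) * x + decay * y = 0.1 * 2 + 0.9 * 10 = 9.2*.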
......@@ -46,12 +46,12 @@ class AffineGradientOp final : public Operator<Context> {
void RunOnDevice() override;
template <typename T> void BiasRunWithType();
template <typename T> void ScaleRunWithType();
template <typename T> void ComputeScaleGradient(T* dYxX, T* dA);
template <typename T> void RunWithType();
protected:
int64_t axis, num_axes;
int64_t outer_dim, inner_dim, scale_dim, sum_dim, dim;
Tensor sum_result;
};
#ifdef WITH_CUDNN
......@@ -125,18 +125,12 @@ public:
template <typename DT, typename CT>
void ComputeScaleGradient(DT* dYxX, DT* dA);
template <typename DT, typename CT>
void ComputeBiasGradient(const DT* dY, DT* dB);
template <typename T> void ComputeScaleGradient_v2(T* dYxX, T* dA);
template <typename T> void ComputeBiasGradient_v2(const T* dY, T* dB);
template <typename DT, typename CT> void RunWithType();
protected:
USE_CUDNN_AFFINE_FUCNTIONS;
int64_t outer_dim, inner_dim, scale_dim, dim, sum_dim;
Tensor sum_result;
};
#endif
......
......@@ -10,36 +10,33 @@
* ------------------------------------------------------------
*/
#ifndef DRAGON_OPERATORS_VISION_DENSE_CONCAT_OP_H_
#define DRAGON_OPERATORS_VISION_DENSE_CONCAT_OP_H_
#ifndef DRAGON_OPERATORS_ARITHMETIC_SQRT_OP_H_
#define DRAGON_OPERATORS_ARITHMETIC_SQRT_OP_H_
#include "operators/ndarray/concat_op.h"
#include "core/operator.h"
namespace dragon {
template <class Context>
class DenseConcatOp final : public ConcatOp<Context> {
class SqrtOp final : public Operator<Context> {
public:
DenseConcatOp(const OperatorDef& def, Workspace* ws)
: ConcatOp<Context>(def, ws) {}
USE_SIMPLE_CTOR_DTOR(SqrtOp);
USE_OPERATOR_FUNCTIONS;
void RunOnDevice() override;
template <typename T> void RunWithType();
};
template <class Context>
class DenseConcatGradientOp final : public ConcatGradientOp<Context> {
class SqrtGradientOp final : public Operator<Context> {
public:
DenseConcatGradientOp(const OperatorDef& def, Workspace* ws)
: ConcatGradientOp<Context>(def, ws),
growth_rate(OperatorBase::Arg<int64_t>("growth_rate", 0)) {}
USE_SIMPLE_CTOR_DTOR(SqrtGradientOp);
USE_OPERATOR_FUNCTIONS;
void ElimateCorruption() override;
template <typename T> void RestoreX1();
protected:
int64_t growth_rate;
void RunOnDevice() override;
template <typename T> void RunWithType();
};
} // namespace dragon
#endif // DRAGON_OPERATORS_VISION_DENSE_CONCAT_OP_H_
\ No newline at end of file
#endif // DRAGON_OPERATORS_ARITHMETIC_SQRT_OP_H_
\ No newline at end of file
......@@ -19,7 +19,7 @@ namespace dragon {
template <class Context>
class SquareOp final : public Operator<Context> {
public:
public:
USE_SIMPLE_CTOR_DTOR(SquareOp);
USE_OPERATOR_FUNCTIONS;
......@@ -29,7 +29,7 @@ public:
template <class Context>
class SquareGradientOp final : public Operator<Context> {
public:
public:
USE_SIMPLE_CTOR_DTOR(SquareGradientOp);
USE_OPERATOR_FUNCTIONS;
......
......@@ -37,7 +37,7 @@ class SigmoidFocalLossOp
USE_OPERATOR_FUNCTIONS;
void RunOnDevice() override;
template <typename T> void RunWithType();
template <typename Tx, typename Ty> void RunWithType();
protected:
float alpha, gamma, pos_alpha, neg_alpha;
......@@ -66,7 +66,7 @@ class SigmoidFocalLossGradientOp
USE_OPERATOR_FUNCTIONS;
void RunOnDevice() override;
template <typename T> void RunWithType();
template <typename Tx, typename Ty> void RunWithType();
protected:
float alpha, gamma, pos_alpha, neg_alpha;
......
......@@ -37,7 +37,7 @@ class SoftmaxFocalLossOp
USE_OPERATOR_FUNCTIONS;
void RunOnDevice() override;
template <typename T> void RunWithType();
template <typename Tx, typename Ty> void RunWithType();
protected:
float alpha, gamma, pos_alpha, neg_alpha;
......@@ -66,7 +66,7 @@ class SoftmaxFocalLossGradientOp
USE_OPERATOR_FUNCTIONS;
void RunOnDevice() override;
template <typename T> void RunWithType();
template <typename Tx, typename Ty> void RunWithType();
protected:
float alpha, gamma, pos_alpha, neg_alpha;
......
......@@ -10,29 +10,41 @@
* ------------------------------------------------------------
*/
#ifndef DRAGON_OPERATORS_MISC_ASTYPE_OP_H_
#define DRAGON_OPERATORS_MISC_ASTYPE_OP_H_
#ifndef DRAGON_OPERATORS_MISC_CAST_OP_H_
#define DRAGON_OPERATORS_MISC_CAST_OP_H_
#include "core/operator.h"
namespace dragon {
template <class Context>
class AsTypeOp final : public Operator<Context> {
class CastOp final : public Operator<Context> {
public:
AsTypeOp(const OperatorDef& def, Workspace* ws)
: Operator<Context>(def, ws),
dtype(OperatorBase::Arg<string>("dtype", "float32")),
inplace(OperatorBase::Arg<bool>("inplace", false)) {}
USE_OPERATOR_FUNCTIONS;
CastOp(const OperatorDef& def, Workspace* ws)
: Operator<Context>(def, ws),
dtype(OperatorBase::Arg<string>("dtype", "float32")),
inplace(OperatorBase::Arg<bool>("inplace", false)) {}
USE_OPERATOR_FUNCTIONS;
void RunOnDevice() override;
void RunOnDevice() override;
protected:
string dtype;
bool inplace;
};
template <class Context>
class CastGradientOp final : public Operator<Context> {
public:
USE_SIMPLE_CTOR_DTOR(CastGradientOp);
USE_OPERATOR_FUNCTIONS;
void RunOnDevice() override;
protected:
string dtype;
};
} // namespace dragon
#endif // DRAGON_OPERATORS_MISC_ASTYPE_OP_H_
\ No newline at end of file
#endif // DRAGON_OPERATORS_MISC_CAST_OP_H_
\ No newline at end of file
......@@ -128,7 +128,7 @@ public:
template <class Context>
class TruncatedNormalOp final : public InitializeOp<Context> {
public:
public:
TruncatedNormalOp(const OperatorDef& def, Workspace* ws)
: InitializeOp<Context>(def, ws) {
this->filler_proto.set_type("truncated_normal");
......
......@@ -25,8 +25,7 @@ class AdamUpdateOp final : public UpdateOpBase<Context> {
USE_OPERATOR_FUNCTIONS;
USE_UPDATER_FUNCTIONS(Context);
void ComputeRunWithFloat32() override;
void ComputeRunWithFloat16() override;
void ComputeUpdates(Tensor* dX) override;
protected:
int t; float lr, beta1, beta2, eps;
......
......@@ -75,7 +75,6 @@ class CollectiveUpdateOp final : public Operator<Context> {
#ifdef WITH_NCCL
ncclComm_t nccl_comm;
CUDAClosure<Context> closure;
#endif
};
......
......@@ -25,8 +25,7 @@ class NesterovUpdateOp final : public UpdateOpBase<Context> {
USE_OPERATOR_FUNCTIONS;
USE_UPDATER_FUNCTIONS(Context);
void ComputeRunWithFloat32() override;
void ComputeRunWithFloat16() override;
void ComputeUpdates(Tensor* dX) override;
protected:
float lr, momentum;
......
......@@ -25,8 +25,7 @@ class RMSPropUpdateOp final : public UpdateOpBase<Context> {
USE_OPERATOR_FUNCTIONS;
USE_UPDATER_FUNCTIONS(Context);
void ComputeRunWithFloat32() override;
void ComputeRunWithFloat16() override;
void ComputeUpdates(Tensor* dX) override;
protected:
float lr, decay, eps;
......
......@@ -26,8 +26,7 @@ class SGDUpdateOp final : public UpdateOpBase<Context> {
USE_OPERATOR_FUNCTIONS;
USE_UPDATER_FUNCTIONS(Context);
void ComputeRunWithFloat32() override;
void ComputeRunWithFloat16() override;
void ComputeUpdates(Tensor* dX) override;
protected:
float old_lr, lr, momentum, correction;
......
......@@ -24,29 +24,29 @@ class UpdateOpBase : public Operator<Context> {
: Operator<Context>(def, ws),
lr_mult(OperatorBase::Arg<float>("lr_mult", 1.f)),
decay_mult(OperatorBase::Arg<float>("decay_mult", 1.f)),
slot(OperatorBase::Arg<string>("slot", "")),
zero_grad(OperatorBase::Arg<bool>("zero_grad", true)) {
slot(OperatorBase::Arg<string>("slot", "")) {
CHECK(!slot.empty()) << "\nRequired a non-empty slot";
}
USE_OPERATOR_FUNCTIONS;
string Slot() { return slot + "/" + Output(0)->name(); }
float Param(const string& name) const;
string Slot();
void RunOnDevice() override;
template <typename T> void PreprocessRunWithType();
template <typename T>
void ProcessGradients(Tensor* dX, Tensor* X);
virtual void ComputeRunWithFloat32() = 0;
virtual void ComputeRunWithFloat16() = 0;
virtual void ComputeUpdates(Tensor* dX) = 0;
void UpdateRunWithFloat32();
void UpdateRunWithFloat16();
template <typename T>
void ApplyUpdates(Tensor* dX, Tensor* X);
void RunOnDevice() override;
protected:
float lr_mult, decay_mult;
float l2_decay, clip_thresh, scale_factor;
string slot;
bool zero_grad;
};
#define USE_UPDATER_FUNCTIONS(context) \
......
......@@ -88,6 +88,7 @@ class CuDNNConv2dOp final : public Conv2dOp<Context> {
}
void RunOnDevice() override;
void SetConvDescFromInputs();
template <typename T> void ResetDesc();
template <typename T> void RunWithType();
......@@ -101,7 +102,7 @@ class CuDNNConv2dOp final : public Conv2dOp<Context> {
cudnnFilterDescriptor_t filter_desc;
size_t fwd_data_size;
int64_t cudnn_group;
vector<int64_t> input_dims;
vector<int64_t> input_dims, filter_dims;
bool enable_tensor_core;
};
......@@ -142,6 +143,7 @@ class CuDNNConv2dGradientOp final : public Conv2dGradientOp<Context> {
}
void RunOnDevice() override;
void SetConvDescFromInputs();
template <typename T> void ResetDesc();
template <typename T> void RunWithType();
......@@ -156,7 +158,7 @@ class CuDNNConv2dGradientOp final : public Conv2dGradientOp<Context> {
cudnnFilterDescriptor_t filter_desc;
size_t bwd_filter_size, bwd_data_size;
int64_t cudnn_group;
vector<int64_t> input_dims;
vector<int64_t> input_dims, filter_dims;
bool enable_tensor_core;
};
......
......@@ -20,10 +20,10 @@ namespace dragon {
template <class Context>
class ConvTranspose2dOp : public ConvOpBase<Context> {
public:
ConvTranspose2dOp(const OperatorDef& def, Workspace* ws)
ConvTranspose2dOp(const OperatorDef& def, Workspace* ws)
: ConvOpBase<Context>(def, ws) {
this->num_spatial_axes = 2;
Setup();
Setup();
}
USE_OPERATOR_FUNCTIONS;
USE_CONVOLUTION_FUNCTIONS;
......@@ -95,6 +95,7 @@ class CuDNNConvTranspose2dOp final
}
void RunOnDevice() override;
void SetConvDescFromInputs();
template <typename T> void ResetDesc();
template <typename T> void RunWithType();
......@@ -108,7 +109,7 @@ class CuDNNConvTranspose2dOp final
cudnnFilterDescriptor_t filter_desc;
size_t fwd_data_size;
int64_t cudnn_group;
vector<int64_t> input_dims;
vector<int64_t> output_dims, filter_dims;
bool enable_tensor_core;
};
......@@ -152,6 +153,7 @@ public:
}
void RunOnDevice() override;
void SetConvDescFromInputs();
template <typename T> void ResetDesc();
template <typename T> void RunWithType();
......@@ -166,7 +168,7 @@ public:
cudnnFilterDescriptor_t filter_desc;
size_t bwd_filter_size, bwd_data_size;
int64_t cudnn_group;
vector<int64_t> input_dims;
vector<int64_t> output_dims, filter_dims;
bool enable_tensor_core;
};
......
......@@ -55,6 +55,7 @@ class NNResizeGradientOp final : public Operator<Context> {
USE_OPERATOR_FUNCTIONS;
void RunOnDevice() override;
void RunWithFloat16();
template <typename T> void RunWithType();
protected:
......
......@@ -26,7 +26,7 @@ class Pool2dOp : public Operator<Context> {
data_format(OperatorBase::Arg<string>("data_format", "NCHW")),
padding(OperatorBase::Arg<string>("padding", "VALID")),
global_pooling(OperatorBase::Arg<bool>("global_pooling", false)),
ceil_mode(OperatorBase::Arg<bool>("ceil", true)) {
ceil_mode(OperatorBase::Arg<bool>("ceil_mode", true)) {
auto ks = OperatorBase::Args<int64_t>("kernel_shape");
auto s = OperatorBase::Args<int64_t>("strides");
auto p = OperatorBase::Args<int64_t>("pads");
......@@ -68,7 +68,7 @@ class Pool2dGradientOp : public Operator<Context> {
data_format(OperatorBase::Arg<string>("data_format", "NCHW")),
padding(OperatorBase::Arg<string>("padding", "VALID")),
global_pooling(OperatorBase::Arg<bool>("global_pooling", false)),
ceil_mode(OperatorBase::Arg<bool>("ceil", true)) {
ceil_mode(OperatorBase::Arg<bool>("ceil_mode", true)) {
auto ks = OperatorBase::Args<int64_t>("kernel_shape");
auto s = OperatorBase::Args<int64_t>("strides");
auto p = OperatorBase::Args<int64_t>("pads");
......
......@@ -54,6 +54,7 @@ class ROIAlignGradientOp final : public Operator<Context> {
USE_OPERATOR_FUNCTIONS;
void RunOnDevice() override;
void RunWithFloat16();
template <typename T> void RunWithType();
protected:
......
......@@ -49,6 +49,7 @@ class ROIPoolGradientOp final : public Operator<Context> {
USE_OPERATOR_FUNCTIONS;
void RunOnDevice() override;
void RunWithFloat16();
template <typename T> void RunWithType();
protected:
......
......@@ -12,7 +12,7 @@ namespace dragon {
template <typename T>
using BlockReduce = cub::BlockReduce<T, CUDA_THREADS>;
}
} // namespace dragon
#endif // WITH_CUDA
......
......@@ -102,7 +102,7 @@ template <typename T, class Context>
void Set(
const int n,
const T alpha,
T* x,
T* y,
Context* ctx);
template <typename T, class Context>
......@@ -122,6 +122,15 @@ void Axpy(
Context* ctx);
template<typename T, class Context>
void Axpby(
const int n,
const float alpha,
const T* x,
const float beta,
T* y,
Context* ctx);
template<typename T, class Context>
void AddScalar(
const int n,
const float alpha,
......@@ -141,17 +150,8 @@ void AddScalar(
template<typename T, class Context>
void InvStd(
const int n,
float eps,
const T* x,
T* y,
Context* ctx);
template<typename T, class Context>
void Axpby(
const int n,
float alpha,
const float eps,
const T* x,
float beta,
T* y,
Context* ctx);
......
......@@ -378,8 +378,8 @@ void NLLLoss(
const Tx* log_prob,
const Ty* labels,
const int* ignores,
float* losses,
float* flags,
Tx* losses,
int* flags,
Context* ctx);
template <typename Tx, typename Ty, class Context>
......@@ -392,7 +392,7 @@ void NLLLossGrad(
const Ty* labels,
const int* ignores,
Tx* dx,
float* flags,
int* flags,
Context* ctx);
/*! loss.sigmoid_ce_loss */
......@@ -403,7 +403,7 @@ void SigmoidCrossEntropy(
const T* logits,
const T* targets,
T* losses,
T* flags,
int* flags,
Context* ctx);
template <typename T, class Context>
......@@ -412,12 +412,12 @@ void SigmoidCrossEntropyGrad(
const T* logits,
const T* targets,
T* dlogits,
T* flags,
int* flags,
Context* ctx);
/*! loss.sigmoid_focal_loss */
template <typename T, class Context>
template <typename Tx, typename Ty, class Context>
void SigmoidFocalLoss(
const int outer_dim,
const int axis_dim,
......@@ -426,13 +426,13 @@ void SigmoidFocalLoss(
const float neg_alpha,
const float gamma,
const int neg_id,
const float* logits,
const float* targets,
float* losses,
float* flags,
const Tx* logits,
const Ty* targets,
Tx* losses,
int* flags,
Context* ctx);
template <typename T, class Context>
template <typename Tx, typename Ty, class Context>
void SigmoidFocalLossGrad(
const int outer_dim,
const int axis_dim,
......@@ -441,10 +441,10 @@ void SigmoidFocalLossGrad(
const float neg_alpha,
const float gamma,
const int neg_id,
const float* logits,
const float* targets,
float* dlogits,
float* flags,
const Tx* logits,
const Ty* targets,
Tx* dlogits,
int* flags,
Context* ctx);
/*! loss.smooth_l1_loss */
......@@ -477,7 +477,7 @@ void SoftmaxCrossEntropy(
/*! loss.softmax_focal_loss */
template <typename T, class Context>
template <typename Tx, typename Ty, class Context>
void SoftmaxFocalLoss(
const int outer_dim,
const int axis_dim,
......@@ -487,14 +487,14 @@ void SoftmaxFocalLoss(
const float neg_alpha,
const float gamma,
const int neg_id,
const T* prob,
const T* labels,
const Tx* prob,
const Ty* labels,
const int* ignores,
T* losses,
T* flags,
Tx* losses,
int* flags,
Context* ctx);
template <typename T, class Context>
template <typename Tx, typename Ty, class Context>
void SoftmaxFocalLossGrad(
const int outer_dim,
const int axis_dim,
......@@ -504,11 +504,11 @@ void SoftmaxFocalLossGrad(
const float neg_alpha,
const float gamma,
const int neg_id,
const T* prob,
const T* labels,
const Tx* prob,
const Ty* labels,
const int* ignores,
T* dx,
T* flags,
Tx* dx,
int* flags,
Context* ctx);
/*! loss.sparse_softmax_cross_entropy */
......@@ -522,8 +522,8 @@ void SparseSoftmaxCrossEntropy(
const Tx* prob,
const Ty* labels,
const int* ignores,
float* losses,
float* flags,
Tx* losses,
int* flags,
Context* ctx);
template <typename Tx, typename Ty, class Context>
......@@ -536,7 +536,7 @@ void SparseSoftmaxCrossEntropyGrad(
const Ty* labels,
const int* ignores,
Tx* dx,
float* flags,
int* flags,
Context* ctx);
/*! misc.astype */
......@@ -548,6 +548,16 @@ void TypeA2B(
Tb* b,
Context* ctx);
/*! misc.gradient */
template <typename T, class Context>
void GradientTwoSum(
const int count,
const T* dy1,
const T* dy2,
T* dx,
Context* ctx);
/*! misc.image_data */
template <typename Tx, typename Ty, class Context>
......@@ -976,11 +986,18 @@ void SGDUpdate(
/*! update.op_base */
template <typename T, class Context>
void MixedPrecisionL2Decay(
const int count,
const float alpha,
const T* w,
float* dx,
Context* ctx);
template <typename T, class Context>
void MixedPrecisionUpdate(
const int count,
const float* updates,
T* w,
T* g,
Context* ctx);
/*! vision.bias_add */
......
......@@ -37,6 +37,20 @@ inline std::vector<std::string> split(
return ret;
}
inline std::string replace_first(
const std::string& str,
const std::string& pattern,
const std::string& excepted) {
size_t pos = 0;
if ((pos = str.find(pattern)) != std::string::npos) {
std::string ret(str);
ret.replace(pos, pattern.size(), excepted);
return ret;
} else {
return str;
}
}
} // namespace str
} // namespace dragon
......
......@@ -269,7 +269,7 @@ void LoadONNXModel(
* *
* * * * * * * * * * * * * * * * * * * * */
void SetLogLevel(const std::string& level) {
void SetLoggingLevel(const std::string& level) {
SetLogDestination(StrToLogSeverity(level));
}
......
......@@ -97,7 +97,7 @@ DRAGON_API std::string CreateGraph(
DRAGON_API void RunGraph(
const std::string& graph_name,
Workspace_t ws,
const int stream_id = 1);
int stream_id = 0);
/* * * * * * * * * * * * * * * * * * * * *
* *
......@@ -156,7 +156,7 @@ DRAGON_API void LoadONNXModel(
* *
* * * * * * * * * * * * * * * * * * * * */
DRAGON_API void SetLogLevel(const std::string& level);
DRAGON_API void SetLoggingLevel(const std::string& level);
} // namespace dragon
......
......@@ -19,95 +19,45 @@ namespace dragon {
namespace python {
PyObject* CreateGradientDefsCC(PyObject* self, PyObject* args) {
PyObject* def_string = nullptr;
PyObject* py_g_outputs = nullptr;
if (!PyArg_ParseTuple(args, "SO!",
&def_string, &PyList_Type, &py_g_outputs)) {
PyErr_SetString(PyExc_ValueError,
"Excepted a serialized string of OperatorDef "
"and a list containing outputs of this GradientOp.");
return nullptr;
}
OperatorDef def;
if (!def.ParseFromString(PyBytes_AsStringEx(def_string))) {
PyErr_SetString(PyExc_ValueError,
"Failed to parse the OperatorDef.");
return nullptr;
}
if (!GradientRegistry()->Has(def.type())) {
PyErr_SetString(PyExc_KeyError,
"This Operator does not register GradientOp.");
return nullptr;
}
vector<string> g_outputs;
PyList_AsVecString(py_g_outputs, g_outputs, "ignore");
Gradient grad = MakeGradientForOp(def, g_outputs);
PyObject* g_ops = PyList_New(grad.ops.size());
PyObject* g_input = PyList_New(grad.g_inputs.size());
PyObject* g_defaults = PyList_New(grad.defaults.size());
for (int i = 0; i < grad.ops.size(); i++) {
PyObject* e = String_AsPyBytes(grad.ops[i].SerializeAsString());
SetPyList(g_ops, i, e);
}
for (int i = 0; i < grad.g_inputs.size(); i++) {
PyObject* e = String_AsPyUnicode(grad.g_inputs[i]);
SetPyList(g_input, i, e);
}
for (int i = 0; i < grad.defaults.size(); i++) {
PyObject* e = PyFloat_FromDouble(grad.defaults[i]);
SetPyList(g_defaults, i, e);
}
PyObject* pack = PyTuple_Pack(3, g_ops, g_input, g_defaults);
Py_XDECREF(g_ops);
Py_XDECREF(g_input);
Py_XDECREF(g_defaults);
return pack;
}
void AddGradientMethods(pybind11::module& m) {
m.def("CreateGradientDefs", [](
const string& forward_def,
const vector<string>& g_outputs) {
OperatorDef def;
if (!def.ParseFromString(forward_def))
LOG(FATAL) << "Failed to parse the OperatorDef.";
if (!GradientRegistry()->Has(def.type()))
LOG(FATAL) << def.type() << "Op has no gradients.";
Gradient grad = MakeGradientForOp(def, g_outputs);
vector<pybind11::bytes> grad_ops;
for (const auto& e : grad.ops)
grad_ops.push_back(e.SerializeAsString());
return std::tuple<
vector<pybind11::bytes>, vector<string>, vector<float>
>(grad_ops, grad.g_inputs, grad.defaults);
});
PyObject* RunGradientFlowCC(PyObject* self, PyObject* args) {
PyObject* py_fp_ops, *py_targets;
PyObject* py_input_grads, *py_ignore_grads;
PyObject* py_share_grads, *py_export_graph;
if (!PyArg_ParseTuple(args, "OOOOOO",
&py_fp_ops, &py_targets,
&py_input_grads, &py_ignore_grads,
&py_share_grads, &py_export_graph)) {
PyErr_SetString(PyExc_ValueError,
"Excepted a list of serialized input ops, targets, "
"input grads, ignore grads and whehter to share grads or log graph.");
return nullptr;
}
// Make -> Optm -> Run
vector<string> targets, input_grads, ignore_grads;
PyList_AsVecString(py_targets, targets, "");
PyList_AsVecString(py_input_grads, input_grads, "");
PyList_AsVecString(py_ignore_grads, ignore_grads, "");
GraphDef fp_ops, bp_ops;
if (!fp_ops.ParseFromString(PyBytes_AsStringEx(py_fp_ops))) {
PyErr_SetString(PyExc_RuntimeError,
"Failed to parse the GraphDef of forward ops.");
return nullptr;
}
GraphGradientMaker maker;
for (auto& grad : input_grads) maker.AddExternalGrad(grad);
for (auto& grad : ignore_grads) maker.AddIgnoreGrad(grad);
maker.Make(fp_ops, targets, bp_ops);
bool share_grads = PyObject_IsTrue(py_share_grads) ? true : false;
bool export_graph = PyObject_IsTrue(py_export_graph) ? true : false;
if (share_grads) maker.Share("/share/buffer/grads", bp_ops);
if (export_graph) {
Tensor* tensor = ws()->CreateTensor(
"/graph_def/dynamic/gradient_flow")->Reshape({ 1 });
string* data = tensor->mutable_data<string, CPUContext>();
data[0] = bp_ops.SerializeAsString();
tensor = ws()->CreateTensor(
"/graph_def/dynamic/forward_flow")->Reshape({ 1 });
data = tensor->mutable_data<string, CPUContext>();
data[0] = fp_ops.SerializeAsString();
}
for (auto& op : bp_ops.op()) ws()->RunOperator(op);
Py_RETURN_TRUE;
m.def("FlowGradients", [](
const vector<OperatorDef*>& forward_ops,
const vector<string>& targets,
const vector<string>& input_grads,
const vector<string>& ignore_grads,
const bool is_sharing,
const bool verbose) {
// Make => Optimize => Run
GraphDef backward_ops;
GraphGradientMaker maker;
for (auto& grad : input_grads) maker.AddExternalGrad(grad);
for (auto& grad : ignore_grads) maker.AddIgnoreGrad(grad);
maker.Make(forward_ops, targets, backward_ops);
if (is_sharing) maker.Share(backward_ops);
pybind11::gil_scoped_release g;
for (auto& op : backward_ops.op()) {
if (verbose) std::cout << op.DebugString() << std::endl;
if (op.has_uid()) ws()->RunOperator(op);
else ws()->RunOperatorOnce(op);
}
});
}
} // namespace python
......
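Note: the new bindings take and return native Python objects instead of hand-parsed PyObject tuples. A minimal sketch of driving `CreateGradientDefs` from Python, mirroring the wrapper call in GraphGradientMaker further below (the op and gradient names are assumptions):

    import dragon.import_c_api as C
    from dragon.proto import dragon_pb2 as pb

    forward_op = pb.OperatorDef()
    forward_op.type = 'Relu'  # an op type assumed to have a registered gradient
    forward_op.input.extend(['x']); forward_op.output.extend(['y'])
    grad_ops, grad_inputs, defaults = C.CreateGradientDefs(
        forward_op.SerializeToString(), ['y_grad'])  # 'y_grad' is an assumed grad name
    for serialized in grad_ops:
        backward_op = pb.OperatorDef()
        backward_op.ParseFromString(serialized)      # each entry is a serialized BackwardOp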
......@@ -19,15 +19,10 @@ namespace dragon {
namespace python {
inline PyObject* SetLogLevelCC(PyObject* self, PyObject* args) {
char* cname;
if (!PyArg_ParseTuple(args, "s", &cname)) {
PyErr_SetString(PyExc_ValueError,
"Excepted the logging level.");
return nullptr;
}
SetLogDestination(StrToLogSeverity(string(cname)));
Py_RETURN_TRUE;
void AddConfigMethods(pybind11::module& m) {
m.def("SetLoggingLevel", [](const string& level) {
SetLogDestination(StrToLogSeverity(level));
});
}
} // namespace python
......
......@@ -19,15 +19,34 @@ namespace python {
#include "py_dragon.h"
inline PyObject* IsCUDADriverSufficientCC(PyObject* self, PyObject* args) {
void AddCUDAMethods(pybind11::module& m) {
m.def("IsCUDADriverSufficient", []() {
#ifdef WITH_CUDA
int count;
cudaError_t err = cudaGetDeviceCount(&count);
if (err == cudaErrorInsufficientDriver) return PyBool_FromLong(0);
return PyBool_FromLong(1);
int count;
cudaError_t err = cudaGetDeviceCount(&count);
if (err == cudaErrorInsufficientDriver) return false;
return true;
#else
return PyBool_FromLong(0);
return false;
#endif
});
m.def("cudaGetDevice", []() {
return CUDAContext::active_device_id();
});
m.def("cudaStreamSynchronize", [](
int device_id, int stream_id) {
#ifdef WITH_CUDA
if (device_id < 0) device_id =
CUDAContext::active_device_id();
cudaStreamSynchronize(CUDAContext::cuda_object()
->GetStream(device_id, stream_id));
cudaError_t error = cudaGetLastError();
CHECK_EQ(error, cudaSuccess)
<< "\nCUDA Error: " << cudaGetErrorString(error);
#endif
});
}
} // namespace python
......
......@@ -13,8 +13,9 @@
#ifndef DRAGON_PYTHON_PY_DRAGON_H_
#define DRAGON_PYTHON_PY_DRAGON_H_
#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
#include "py_types.h"
#include "py_macros.h"
#include "core/common.h"
#include "core/registry.h"
#include "core/context.h"
......@@ -25,6 +26,9 @@
#include "core/workspace.h"
#include "utils/caffemodel.h"
#include <pybind11/stl.h>
#include <pybind11/pybind11.h>
namespace dragon {
namespace python {
......@@ -32,83 +36,80 @@ namespace python {
class TensorFetcherBase {
public:
virtual ~TensorFetcherBase() {}
virtual PyObject* Fetch(const Tensor& tensor) = 0;
virtual pybind11::object Fetch(const Tensor& tensor) = 0;
};
class TensorFeederBase {
public:
virtual ~TensorFeederBase() {}
virtual PyObject* Feed(
virtual void Feed(
const DeviceOption& option,
PyArrayObject* array,
Tensor* tensor) = 0;
};
DECLARE_TYPED_REGISTRY(TensorFetcherRegistry, TypeId, TensorFetcherBase);
#define REGISTER_TENSOR_FETCHER(type, ...) \
REGISTER_TYPED_CLASS(TensorFetcherRegistry, type, __VA_ARGS__)
inline TensorFetcherBase* CreateFetcher(TypeId type) {
return TensorFetcherRegistry()->Create(type);
return TensorFetcherRegistry()->Create(type);
}
DECLARE_TYPED_REGISTRY(TensorFeederRegistry, TypeId, TensorFeederBase);
#define REGISTER_TENSOR_FEEDER(type, ...) \
REGISTER_TYPED_CLASS(TensorFeederRegistry, type, __VA_ARGS__)
class NumpyFetcher : public TensorFetcherBase {
public:
PyObject* Fetch(const Tensor& tensor) override {
pybind11::object Fetch(const Tensor& tensor) override {
CHECK_GT(tensor.count(), 0);
vector<npy_intp> npy_dims;
for (const auto dim : tensor.dims()) npy_dims.push_back(dim);
int npy_type = TypeMetaToNPY(tensor.meta());
if (npy_type == -1) {
string s = "The data type of Tensor(" +
LOG(FATAL) << "The data type of Tensor(" +
tensor.name() + ") is unknown. Have you solved it ?";
PyErr_SetString(PyExc_RuntimeError, s.c_str());
return nullptr;
}
CHECK(tensor.memory()) << "\nIllegal memory access.";
// Create an empty array with the same shape
PyObject* array = PyArray_SimpleNew(
tensor.ndim(), npy_dims.data(), npy_type);
// Copy the tensor data to the numpy array
if (tensor.memory_state() == MixedMemory::STATE_AT_CUDA) {
CUDAContext::Memcpy<CPUContext, CUDAContext>(tensor.nbytes(),
PyArray_DATA(reinterpret_cast<PyArrayObject*>(array)),
tensor.raw_data<CUDAContext>());
CUDAContext::MemcpyEx<CPUContext, CUDAContext>(tensor.nbytes(),
PyArray_DATA(reinterpret_cast<PyArrayObject*>(array)),
tensor.raw_data<CUDAContext>(),
tensor.memory()->device_id());
} else {
CPUContext::Memcpy<CPUContext, CPUContext>(tensor.nbytes(),
PyArray_DATA(reinterpret_cast<PyArrayObject*>(array)),
tensor.raw_data<CPUContext>());
}
return array;
return pybind11::reinterpret_steal<pybind11::object>(array);
}
};
class StringFetcher : public TensorFetcherBase {
public:
PyObject* Fetch(const Tensor& tensor) override {
CHECK_GT(tensor.count(), 0);
return String_AsPyBytes(*tensor.data<string, CPUContext>());
pybind11::object Fetch(const Tensor& tensor) override {
CHECK_EQ(tensor.count(), 1);
return pybind11::bytes(tensor.data<string, CPUContext>()[0]);
}
};
class NumpyFeeder : public TensorFeederBase {
public:
PyObject* Feed(
void Feed(
const DeviceOption& option,
PyArrayObject* original_array,
Tensor* tensor) override {
PyArrayObject* array = PyArray_GETCONTIGUOUS(original_array);
const TypeMeta& meta = TypeNPYToMeta(PyArray_TYPE(array));
if (meta.id() == 0) {
PyErr_SetString(PyExc_TypeError, "Unsupported data type.");
return nullptr;
}
if (meta.id() != tensor->meta().id() && tensor->meta().id() != 0)
LOG(WARNING) << "Feed Tensor(" << tensor->name() << ")"
<< " with different data type from original one.";
if (meta.id() == 0) LOG(FATAL) << "Unsupported data type.";
tensor->SetMeta(meta);
int ndim = PyArray_NDIM(array);
npy_intp* npy_dims = PyArray_DIMS(array);
vector<int64_t> dims;
......@@ -116,21 +117,22 @@ class NumpyFeeder : public TensorFeederBase {
tensor->Reshape(dims);
if (option.device_type() == PROTO_CUDA) {
#ifdef WITH_CUDA
CUDAContext context(option);
context.SwitchToDevice();
auto* data = tensor->raw_mutable_data<CUDAContext>(meta);
context.Memcpy<CUDAContext, CPUContext>(tensor->nbytes(),
data, static_cast<void*>(PyArray_DATA(array)));
#else
CUDAContext::MemcpyEx<CUDAContext, CPUContext>(
tensor->nbytes(),
tensor->raw_mutable_data<CUDAContext>(),
static_cast<void*>(PyArray_DATA(array)),
option.device_id());
#else
LOG(FATAL) << "CUDA was not compiled.";
#endif
} else {
auto* data = tensor->raw_mutable_data<CPUContext>(meta);
CPUContext::Memcpy<CPUContext, CPUContext>(tensor->nbytes(),
data, static_cast<void*>(PyArray_DATA(array)));
auto* data = tensor->raw_mutable_data<CPUContext>();
CPUContext::Memcpy<CPUContext, CPUContext>(
tensor->nbytes(),
tensor->raw_mutable_data<CPUContext>(),
static_cast<void*>(PyArray_DATA(array)));
}
Py_XDECREF(array);
Py_RETURN_TRUE;
}
};
......
......@@ -19,66 +19,41 @@ namespace dragon {
namespace python {
inline PyObject* CreateGraphCC(PyObject* self, PyObject* args) {
PyObject* graph_str, *verbose;
if (!PyArg_ParseTuple(args, "S|O", &graph_str, &verbose)) {
PyErr_SetString(PyExc_ValueError,
"Excepted a serialized string of GraphDef.");
return nullptr;
}
if (verbose == nullptr) verbose = Py_False;
GraphDef graph_def;
if (!graph_def.ParseFromString(PyBytes_AsStringEx(graph_str))) {
PyErr_SetString(PyExc_RuntimeError,
"Failed to parse the GraphDef.");
return nullptr;
}
auto* graph = ws()->CreateGraph(graph_def);
if (!graph) {
PyErr_SetString(PyExc_RuntimeError,
"Failed to create the Graph.");
return nullptr;
} else {
// It is not a good design to print the debug string
if (PyObject_IsTrue(verbose) ? true : false) {
void AddGraphMethods(pybind11::module& m) {
/*! \brief Create a graph from the serialized def */
m.def("CreateGraph", [](
const string& serialized,
const bool verbose) {
GraphDef graph_def;
if (!graph_def.ParseFromString(serialized))
LOG(FATAL) << "Failed to parse the GraphDef.";
auto* graph = ws()->CreateGraph(graph_def);
if (verbose) {
// It is not a good design to print the debug string
auto* graph_tensor = ws()->CreateTensor(
"/graph_def/optimized/" + graph->name());
if (graph_tensor->count() > 0) {
auto* data = graph_tensor->mutable_data<string, CPUContext>();
std::cout << data[0] << std::endl;
}
}
}
// Return the graph name may be different from the def
// We will make a unique dummy name on creating the graph
return String_AsPyUnicode(graph->name());
}
inline PyObject* RunGraphCC(PyObject* self, PyObject* args) {
char* cname, *include, *exclude;
if (!PyArg_ParseTuple(args, "sss",
&cname, &include, &exclude)) {
PyErr_SetString(PyExc_ValueError,
"Excepted the graph name, include and exclude rules.");
return nullptr;
}
ws()->RunGraph(
string(cname),
string(include),
string(exclude)
);
Py_RETURN_TRUE;
}
inline PyObject* GraphsCC(PyObject* self, PyObject* args) {
vector<string> graphs = ws()->GetGraphs();
PyObject* list = PyList_New(graphs.size());
for (int i = 0; i < graphs.size(); i++)
CHECK_EQ(PyList_SetItem(list, i, String_AsPyUnicode(graphs[i])), 0);
return list;
}
// The returned graph name may differ from the one in the def,
// as a unique dummy name is made when creating the graph
return graph->name();
});
/*! \brief Run an existing graph */
m.def("RunGraph", [](
const string& name,
const string& include,
const string& exclude) {
pybind11::gil_scoped_release g;
ws()->RunGraph(name, include, exclude);
});
/*! \brief List all of the existing graphs */
m.def("Graphs", []() { ws()->GetGraphs(); });
}
} // namespace python
......
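Note: a hedged sketch of the Python-side calls these graph bindings expect (the GraphDef contents are assumed to be filled in elsewhere):

    import dragon.import_c_api as C
    from dragon.proto import dragon_pb2 as pb

    graph_def = pb.GraphDef()      # assume ops and targets were filled in elsewhere
    graph_def.name = 'demo'
    graph_name = C.CreateGraph(graph_def.SerializeToString(), False)  # verbose=False
    C.RunGraph(graph_name, '', '')  # include/exclude rules left empty
    print(C.Graphs())               # names of the graphs known to the workspace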
......@@ -19,48 +19,42 @@ namespace dragon {
namespace python {
inline PyObject* SnapshotCC(PyObject* self, PyObject* args) {
char* path; int format;
PyObject* names; vector<Tensor*> tensors;
if (!PyArg_ParseTuple(args, "sOi", &path, &names, &format)) {
PyErr_SetString(PyExc_ValueError,
"Excepted the model path, tensors, and data format.");
return nullptr;
}
switch (format) {
case 0: // Pickle
PyErr_SetString(PyExc_NotImplementedError,
"Format depends on Pickle. Can't be used in C++.");
break;
case 1: // CaffeModel
for (int i = 0; i < PyList_Size(names); i++)
tensors.push_back(ws()->GetTensor(
PyString_AsString(PyList_GetItem(names, i))));
SavaCaffeModel(path, tensors);
break;
default: LOG(FATAL) << "Unknwon format, code: " << format;
}
Py_RETURN_TRUE;
}
void AddIOMethods(pybind11::module& m) {
m.def("Snapshot", [](
const string& filename,
vector<string>& names,
const int format) {
vector<Tensor*> tensors;
switch (format) {
case 0: // Pickle
LOG(FATAL) << "Format depends on Pickle. "
"Can't be used in C++.";
break;
case 1: // CaffeModel
for (const auto& e : names)
tensors.emplace_back(ws()->GetTensor(e));
SavaCaffeModel(filename, tensors);
break;
default:
LOG(FATAL) << "Unknwon format, code: " << format;
}
});
inline PyObject* RestoreCC(PyObject* self, PyObject* args) {
char* path; int format;
if (!PyArg_ParseTuple(args, "si", &path, &format)) {
PyErr_SetString(PyExc_ValueError,
"Excepted the model path and data format.");
return nullptr;
}
switch (format) {
case 0: // Pickle
PyErr_SetString(PyExc_NotImplementedError,
"Format depends on Pickle. Can't be used in C++.");
break;
case 1: // CaffeModel
LoadCaffeModel(path, ws());
break;
default: LOG(FATAL) << "Unknwon format, code: " << format;
}
Py_RETURN_TRUE;
m.def("Restore", [](
const string& filename,
const int format) {
switch (format) {
case 0: // Pickle
LOG(FATAL) << "Format depends on Pickle. "
"Can't be used in C++.";
break;
case 1: // CaffeModel
LoadCaffeModel(filename, ws());
break;
default:
LOG(FATAL) << "Unknwon format, code: " << format;
}
});
}
} // namespace python
......
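Note: a minimal usage sketch of the snapshot/restore bindings; the file name and tensor names are illustrative, and format code 1 selects the CaffeModel path shown above (0, Pickle, stays Python-only):

    import dragon.import_c_api as C

    C.Snapshot('net.caffemodel', ['conv1/weights', 'conv1/bias'], 1)  # save via CaffeModel
    C.Restore('net.caffemodel', 1)                                    # load it back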
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_PYTHON_PY_MACROS_H_
#define DRAGON_PYTHON_PY_MACROS_H_
#include <string>
#include <sstream>
#include <Python.h>
#include <numpy/arrayobject.h>
namespace dragon {
namespace python {
#ifdef WITH_PYTHON3
#define PyInt_FromLong PyLong_FromLong
#define _PyInt_AsInt _PyLong_AsInt
#define PyString_AsString PyUnicode_AsUTF8
#endif
/*!
* ------------------------------------------------------------
*
* <Having Fun with PyString>
*
* For Python3, Get/Return PyUnicode for regular string.
* For Python3, Get/Return PyBytes for google-protobuf.
* For Python2, Get/Return PyBytes only.
*
* ------------------------------------------------------------
*/
#define PyBytes_AsStringEx(pystring) \
std::string(PyBytes_AsString(pystring), PyBytes_Size(pystring))
// Return string to Python
inline PyObject* String_AsPyBytes(const std::string& cstring) {
return PyBytes_FromStringAndSize(cstring.c_str(), cstring.size());
}
inline PyObject* String_AsPyUnicode(const std::string& cstring) {
#ifdef WITH_PYTHON3
return PyUnicode_FromStringAndSize(cstring.c_str(), cstring.size());
#else
return PyBytes_FromStringAndSize(cstring.c_str(), cstring.size());
#endif
}
// Macros
#define PyList_AsVecString(plist, vs, defaults) \
for (int i = 0; i < PyList_Size(plist); i++) { \
PyObject* e = PyList_GetItem(plist, i); \
if (e == Py_None) vs.emplace_back(defaults); \
else vs.push_back(PyString_AsString(PyObject_Str(e))); \
}
#define SetPyList(plist, ix, e) \
PyList_SetItem(plist, ix, e)
#define SetPyDictS2S(object, key, value) \
PyDict_SetItemString(object, key, Py_BuildValue("s", value))
#define SetPyDictS2I(object, key, value) \
PyDict_SetItemString(object, key, Py_BuildValue("i", value))
// Misc
template <typename T>
inline void MakeStringInternal(std::stringstream& ss, const T& t) { ss << t; }
template <typename T,typename ... Args>
inline void MakeStringInternal(std::stringstream& ss, const T& t, const Args& ... args) {
MakeStringInternal(ss, t);
MakeStringInternal(ss, args...);
}
template <typename ... Args>
std::string MakeString(const Args&... args) {
std::stringstream ss;
MakeStringInternal(ss, args...);
return std::string(ss.str());
}
inline void PrErr_SetString(PyObject* type, const std::string& str) {
PyErr_SetString(type, str.c_str());
}
} // namespace python
} // namespace dragon
#endif // DRAGON_PYTHON_PY_MACROS_H_
\ No newline at end of file
......@@ -15,125 +15,126 @@
#include "py_dragon.h"
namespace dragon {
namespace python {
#ifdef WITH_MPI
#include <mpi.h>
#endif
inline PyObject* MPIInitCC(PyObject* self, PyObject* args) {
int thread_type;
MPI_Init_thread(NULL, NULL, MPI_THREAD_MULTIPLE, &thread_type);
CHECK_EQ(thread_type, MPI_THREAD_MULTIPLE)
<< "\nRequire to enable <MPI_THREAD_MULTIPLE> support.";
Py_RETURN_TRUE;
}
namespace dragon {
inline PyObject* MPIFinalizeCC(PyObject* self, PyObject* args) {
MPI_Finalize();
Py_RETURN_TRUE;
}
namespace python {
inline PyObject* MPIRankCC(PyObject* self, PyObject* args) {
int world_rank;
MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
return PyInt_FromLong(world_rank);
}
void AddMPIMethods(pybind11::module& m) {
m.def("MPIInit", []() {
#ifdef WITH_MPI
// Enabling multi-threading for Python is generally meaningless,
// but we still keep this interface here
int thread_type;
char* mt_is_required = nullptr;
mt_is_required = getenv("DRAGON_MPI_THREADS_ENABLE");
if (mt_is_required != nullptr && string(mt_is_required) == "1") {
MPI_Init_thread(NULL, NULL, MPI_THREAD_MULTIPLE, &thread_type);
CHECK_EQ(thread_type, MPI_THREAD_MULTIPLE)
<< "\nRequire to enable <MPI_THREAD_MULTIPLE> support.";
} else {
MPI_Init_thread(NULL, NULL, MPI_THREAD_SINGLE, &thread_type);
}
#else
LOG(FATAL) << "MPI was not compiled.";
#endif
});
inline PyObject* MPISizeCC(PyObject* self, PyObject* args) {
int world_size;
MPI_Comm_size(MPI_COMM_WORLD, &world_size);
return PyInt_FromLong(world_size);
}
m.def("MPIRank", []() {
#ifdef WITH_MPI
int world_rank;
MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
return world_rank;
#else
LOG(FATAL) << "MPI was not compiled.";
#endif
});
inline PyObject* MPICreateGroupCC(PyObject* self, PyObject* args) {
PyObject *incl, *excl, *ret;
int local_root, world_size;
if (!PyArg_ParseTuple(args, "iOO", &local_root, &incl, &excl)) {
PyErr_SetString(PyExc_ValueError,
"Excepted the local root, include and exclued list.");
return nullptr;
}
MPI_Group world_group, local_group;
MPI_Comm local_comm;
int err_code;
MPI_Comm_group(MPI_COMM_WORLD, &world_group);
MPI_Comm_size(MPI_COMM_WORLD, &world_size);
set<int> all_ranks;
for (int i = 0; i < world_size; i++) all_ranks.insert(i);
local_group = world_group;
// Check inclue ranks
int size = (int)PyList_Size(incl);
if (size > 0) {
all_ranks.clear();
unique_ptr<int> incl_ranks(new int[size]);
int* ranks = incl_ranks.get();
for (int i = 0; i < size; i++) {
ranks[i] = _PyInt_AsInt(PyList_GetItem(incl, i));
all_ranks.insert(ranks[i]);
}
err_code = MPI_Group_incl(world_group, size, ranks, &local_group);
CHECK(err_code == MPI_SUCCESS) << "\nFail to create mpi group.";
}
// Check exclude ranks
size = (int)PyList_Size(excl);
if (size > 0) {
all_ranks.clear(); Set<int> tmp;
unique_ptr<int> excl_ranks(new int[size]);
int* ranks = excl_ranks.get();
for (int i = 0; i < size; i++) {
ranks[i] = _PyInt_AsInt(PyList_GetItem(excl, i));
tmp.insert(ranks[i]);
m.def("MPISize", []() {
#ifdef WITH_MPI
int world_size;
MPI_Comm_size(MPI_COMM_WORLD, &world_size);
return world_size;
#else
LOG(FATAL) << "MPI was not compiled.";
#endif
});
m.def("MPICreateGroup", [](
const int local_root,
const vector<int>& incl,
const vector<int>& excl) {
#ifdef WITH_MPI
int world_size;
MPI_Group world_group, local_group;
MPI_Comm local_comm;
int err_code;
MPI_Comm_group(MPI_COMM_WORLD, &world_group);
MPI_Comm_size(MPI_COMM_WORLD, &world_size);
set<int> all_ranks;
for (int i = 0; i < world_size; i++) all_ranks.insert(i);
local_group = world_group;
// Check include ranks
if (!incl.empty()) {
all_ranks.clear();
for (auto e : incl) all_ranks.insert(e);
err_code = MPI_Group_incl(world_group,
(int)incl.size(), incl.data(), &local_group);
CHECK(err_code == MPI_SUCCESS)
<< "\nFail to create MPI Group.";
}
for (int i = 0; i < world_size; i++)
if (!tmp.count(i)) all_ranks.insert(i);
err_code = MPI_Group_excl(world_group, size, ranks, &local_group);
CHECK(err_code == MPI_SUCCESS) << "Fail to create mpi group.";
}
err_code = MPI_Comm_create(MPI_COMM_WORLD, local_group, &local_comm);
CHECK(err_code == MPI_SUCCESS) << "Fail to create mpi group.";
// Check exclude ranks
if (!excl.empty()) {
all_ranks.clear(); Set<int> tmp;
for (auto e : excl) tmp.insert(e);
for (int i = 0; i < world_size; i++)
if (!tmp.count(i)) all_ranks.insert(i);
err_code = MPI_Group_excl(world_group,
(int)excl.size(), excl.data(), &local_group);
CHECK(err_code == MPI_SUCCESS)
<< "\nFail to create MPI Group.";
}
if (local_comm != MPI_COMM_NULL) {
int world_rank, local_size;
MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
if (world_rank == local_root) {
MPI_Comm_size(local_comm, &local_size);
std::stringstream ss;
ss << "Rank[" << world_rank << "]: "
<< "Create a mpi group of " << local_size << " members";
ss << "\nGroup: [";
for (auto rank : all_ranks) {
if (rank != local_root) ss << rank << ", ";
else ss << rank << "*, ";
err_code = MPI_Comm_create(MPI_COMM_WORLD, local_group, &local_comm);
CHECK(err_code == MPI_SUCCESS) << "\nFail to create MPI Group.";
if (local_comm != MPI_COMM_NULL) {
int world_rank, local_size;
MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
if (world_rank == local_root) {
MPI_Comm_size(local_comm, &local_size);
std::stringstream ss;
ss << "Rank[" << world_rank << "]: "
<< "Create a mpi group of " << local_size << " members";
ss << "\nGroup: [";
for (auto rank : all_ranks) {
if (rank != local_root) ss << rank << ", ";
else ss << rank << "*, ";
}
string log_info = ss.str(); log_info[log_info.size() - 2] = ']';
LOG(INFO) << log_info;
}
string log_info = ss.str(); log_info[log_info.size() - 2] = ']';
LOG(INFO) << log_info;
}
}
ret = PyList_New(2);
PyList_SetItem(ret, 0, PyInt_FromLong((long)local_comm));
PyList_SetItem(ret, 1, PyInt_FromLong((long)local_group));
return ret;
}
#else // WITH_MPI
#define MPI_NOT_IMPLEMENTED \
LOG(FATAL) << "MPI was not compiled."; \
Py_RETURN_TRUE
return vector<long>({ (long)local_comm, (long)local_group });
#else
LOG(FATAL) << "MPI was not compiled.";
#endif
});
inline PyObject* MPIInitCC(PyObject* self, PyObject* args) { MPI_NOT_IMPLEMENTED; }
inline PyObject* MPIFinalizeCC(PyObject* self, PyObject* args) { MPI_NOT_IMPLEMENTED; }
inline PyObject* MPIRankCC(PyObject* self, PyObject* args) { MPI_NOT_IMPLEMENTED; }
inline PyObject* MPISizeCC(PyObject* self, PyObject* args) { MPI_NOT_IMPLEMENTED; }
inline PyObject* MPICreateGroupCC(PyObject* self, PyObject* args) { MPI_NOT_IMPLEMENTED; }
#endif // WITH_MPI
m.def("MPIFinalize", []() {
#ifdef WITH_MPI
MPI_Finalize();
#else
LOG(FATAL) << "MPI was not compiled.";
#endif
});
}
} // namespace python
......
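Note: a hedged sketch of calling the MPI bindings from Python (root/group arguments are illustrative; a build with WITH_MPI is assumed):

    import dragon.import_c_api as C

    C.MPIInit()
    rank, size = C.MPIRank(), C.MPISize()
    comm, group = C.MPICreateGroup(0, [], [])  # root=0, empty include/exclude lists
    C.MPIFinalize()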
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the Xpensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_PYTHON_PY_ONNX_H_
#define DRAGON_PYTHON_PY_ONNX_H_
......@@ -19,21 +21,18 @@ namespace dragon {
namespace python {
inline PyObject* ImportONNXModelCC(PyObject* self, PyObject* args) {
char* model_path;
if (!PyArg_ParseTuple(args, "s", &model_path)) {
PyErr_SetString(PyExc_ValueError,
"Excepted the model path.");
return nullptr;
}
GraphDef init_graph, pred_graph;
onnx::ONNXBackend onnx_backend;
onnx_backend.Prepare(model_path, &init_graph, &pred_graph);
// Serializing to Python is intractable
// We should apply the initializer immediately
ws()->CreateGraph(init_graph);
ws()->RunGraph(init_graph.name(), "", "");
return String_AsPyBytes(pred_graph.SerializeAsString());
void AddONNXMethods(pybind11::module& m) {
m.def("ImportONNXModel", [](
const string& model_path) {
GraphDef init_graph, pred_graph;
onnx::ONNXBackend onnx_backend;
onnx_backend.Prepare(model_path, &init_graph, &pred_graph);
// Serializing to Python is intractable
// We should apply the initializer immediately
ws()->CreateGraph(init_graph);
ws()->RunGraph(init_graph.name(), "", "");
return pybind11::bytes(pred_graph.SerializeAsString());
});
}
} // namespace python
......
......@@ -19,91 +19,38 @@ namespace dragon {
namespace python {
inline PyObject* RegisteredOperatorsCC(PyObject* self, PyObject* args) {
set<string> all_keys;
for (const auto& name : CPUOperatorRegistry()->keys()) all_keys.insert(name);
PyObject* list = PyList_New(all_keys.size());
int idx = 0;
for (const string& name : all_keys)
CHECK_EQ(PyList_SetItem(list, idx++, String_AsPyUnicode(name)), 0);
return list;
}
inline PyObject* NoGradientOperatorsCC(PyObject* self, PyObject* args) {
set<string> all_keys;
for (const auto& name : NoGradientRegistry()->keys()) all_keys.insert(name);
PyObject* list = PyList_New(all_keys.size());
int idx = 0;
for (const string& name : all_keys)
CHECK_EQ(PyList_SetItem(list, idx++, String_AsPyUnicode(name)), 0);
return list;
}
inline PyObject* RunOperatorCC(PyObject* self, PyObject* args) {
PyObject* op_str;
if (!PyArg_ParseTuple(args, "S", &op_str)) {
PyErr_SetString(PyExc_ValueError,
"Excepted a serialized string of OperatorDef.");
return nullptr;
}
OperatorDef op_def;
if (!op_def.ParseFromString(PyBytes_AsStringEx(op_str))) {
PyErr_SetString(PyExc_RuntimeError,
"Failed to parse the OperatorDef.");
return nullptr;
}
ws()->RunOperator(op_def);
Py_RETURN_TRUE;
}
inline PyObject* RunOperatorsCC(PyObject* self, PyObject* args) {
PyObject* py_ops;
if (!PyArg_ParseTuple(args, "O", &py_ops)) {
PyErr_SetString(PyExc_ValueError,
"Excepted a list of serialized string of OperatorDef.");
return nullptr;
}
OperatorDef op_def;
for (int i = 0; i < PyList_Size(py_ops); i++) {
PyObject* op_str = PyList_GetItem(py_ops, i);
CHECK(op_def.ParseFromString(PyBytes_AsStringEx(op_str)));
ws()->RunOperator(op_def);
}
Py_RETURN_TRUE;
}
inline PyObject* CreatePersistentOpCC(PyObject* self, PyObject* args) {
PyObject* op_str;
if (!PyArg_ParseTuple(args, "S", &op_str)) {
PyErr_SetString(PyExc_ValueError,
"Excepted a serialized string of OperatorDef.");
return nullptr;
}
OperatorDef op_def;
if (!op_def.ParseFromString(PyBytes_AsStringEx(op_str))) {
PyErr_SetString(PyExc_RuntimeError,
"Failed to parse the OperatorDef.");
return nullptr;
}
ws()->CreatePersistentOp(op_def);
Py_RETURN_TRUE;
}
inline PyObject* RunPersistentOpCC(PyObject* self, PyObject* args) {
char* key, *anchor;
PyObject* py_inputs, *py_outputs;
if (!PyArg_ParseTuple(args, "ssOO",
&key, &anchor, &py_inputs, &py_outputs)) {
PyErr_SetString(PyExc_ValueError,
"Excepted a persistent key, anchor, "
"list of inputs and outputs.");
return nullptr;
}
vector<string> inputs, outputs;
PyList_AsVecString(py_inputs, inputs, "");
PyList_AsVecString(py_outputs, outputs, "");
ws()->RunPersistentOp(key, anchor, inputs, outputs);
Py_RETURN_TRUE;
void AddOperatorMethods(pybind11::module& m) {
/*! \brief Return all the registered operators */
m.def("RegisteredOperators", []() { return CPUOperatorRegistry()->keys(); });
/*! \brief Return all the operators without gradients */
m.def("NoGradientOperators", []() { return NoGradientRegistry()->keys(); });
/*! \brief Run an operator from the def reference */
m.def("RunOperator", [](
OperatorDef* def,
const bool verbose) {
pybind11::gil_scoped_release g;
if (verbose) {
// It is not a good design to print the debug string
std::cout << def->DebugString() << std::endl;
}
ws()->RunOperator(*def);
});
/*! \brief Run an operator from the serialized def */
m.def("RunOperator", [](
const string& serialized,
const bool verbose) {
OperatorDef def;
CHECK(def.ParseFromString(serialized));
pybind11::gil_scoped_release g;
if (verbose) {
// It is not a good design to print the debug string
std::cout << def.DebugString() << std::endl;
}
ws()->RunOperatorOnce(def);
});
}
} // namespace python
......
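Note: a minimal sketch of the serialized-def overload of RunOperator; the op type and tensor names are assumptions:

    import dragon.import_c_api as C
    from dragon.proto import dragon_pb2 as pb

    op = pb.OperatorDef()
    op.type, op.name = 'Relu', 'relu1'
    op.input.extend(['x']); op.output.extend(['y'])
    C.RunOperator(op.SerializeToString(), False)  # verbose=False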
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_PYTHON_PY_PROTO_H_
#define DRAGON_PYTHON_PY_PROTO_H_
#include "py_dragon.h"
namespace dragon {
namespace python {
void AddProtoMethods(pybind11::module& m) {
/*! \brief Extended C-Style OperatorDef */
pybind11::class_<OperatorDef>(m, "OperatorDef")
.def(pybind11::init())
.def("CopyFrom", [](
OperatorDef* self,
OperatorDef* other) {
self->CopyFrom(*other);
}).def("ParseFrom", [](
OperatorDef* self,
const string& serialized) {
self->ParseFromString(serialized);
}).def("SerializeAs", [](
OperatorDef* self) {
return pybind11::bytes(self->SerializeAsString());
}).def("add_input", [](
OperatorDef* self,
const string& input) {
self->add_input(input);
}).def("add_output", [](
OperatorDef* self,
const string& output) {
self->add_output(output);
}).def_property("name",
[](OperatorDef* self) {
return self->name(); },
[](OperatorDef* self, const string& name) {
self->set_name(name);
}).def_property("type",
[](OperatorDef* self) {
return self->type(); },
[](OperatorDef* self, const string& type) {
self->set_type(type);
}).def_property("input",
[](OperatorDef* self) -> vector<string> {
return { self->input().begin(), self->input().end() }; },
[](OperatorDef* self, const vector<string>& input) {
*(self->mutable_input()) = { input.begin(), input.end() };
}).def_property("output",
[](OperatorDef* self) -> vector<string> {
return{ self->output().begin(), self->output().end() }; },
[](OperatorDef* self, const vector<string>& output) {
*(self->mutable_output()) = { output.begin(), output.end() };
});
m.def("TestOperatorDefs", [](vector<OperatorDef*> defs) {
for (auto* def : defs) {
std::cout << def->DebugString() << std::endl;
}
});
}
} // namespace python
} // namespace dragon
#endif // DRAGON_PYTHON_PY_PROTO_H_
\ No newline at end of file
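Note: a hedged sketch of building the exposed C-style OperatorDef directly from Python (the module alias and names are assumptions):

    import dragon.import_c_api as C

    op = C.OperatorDef()
    op.type, op.name = 'Relu', 'relu1'
    op.add_input('x'); op.add_output('y')
    serialized = op.SerializeAs()  # bytes, as returned by the binding above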
......@@ -13,6 +13,7 @@
#ifndef DRAGON_PYTHON_PY_TYPES_H_
#define DRAGON_PYTHON_PY_TYPES_H_
#include <string>
#include <numpy/arrayobject.h>
#include "core/types.h"
......@@ -31,6 +32,7 @@ inline const int TypeMetaToNPY(const TypeMeta& meta) {
{ TypeMeta::Id<float16>(), NPY_FLOAT16 },
{ TypeMeta::Id<float>(), NPY_FLOAT32 },
{ TypeMeta::Id<double>(), NPY_FLOAT64 },
{ TypeMeta::Id<std::string>(), NPY_OBJECT },
};
return m2npy_type_map.count(meta.id()) ? m2npy_type_map[meta.id()] : -1;
}
......@@ -45,6 +47,8 @@ inline const TypeMeta& TypeNPYToMeta(int npy_type) {
{ NPY_FLOAT16, TypeMeta::Make<float16>() },
{ NPY_FLOAT32, TypeMeta::Make<float>() },
{ NPY_FLOAT64, TypeMeta::Make<double>() },
{ NPY_UNICODE, TypeMeta::Make<std::string>() },
{ NPY_STRING, TypeMeta::Make<std::string>() },
};
static TypeMeta unknown_type;
return npy2m_type_map.count(npy_type) ?
......
......@@ -24,6 +24,7 @@ from dragon.core.tensor import Tensor
import dragon.core.workspace as workspace
import dragon.core.tensor_utils as tensor_utils
import dragon.core.mpi as mpi
import dragon.core.cuda as cuda
import dragon.memonger as memonger
# Operators
......
......@@ -23,7 +23,7 @@ option = {}
# The current device, 'CPU', 'CUDA' or 'CNML'
option['device'] = 'CPU'
# The device id
# The device index
option['device_id'] = 0
# Whether to use cuDNN if possible
......@@ -32,8 +32,8 @@ option['use_cudnn'] = False
# The global random seed
option['random_seed'] = 3
# Disable the memonger if true
option['debug_mode'] = False
# Set the level of graph optimization
option['graph_optimization_level'] = 3
# Whether to share grads
option['share_grads'] = True
......@@ -76,29 +76,13 @@ def EnableCPU():
option['device'] = 'CPU'
def IsCUDADriverSufficient():
"""Is CUDADriver sufficient?
Returns
-------
boolean
``True`` if your device(s) support CUDA otherwise ``False``.
References
----------
The wrapper of ``IsCUDADriverSufficientCC``.
"""
return C.IsCUDADriverSufficientCC()
def EnableCUDA(gpu_id=0, use_cudnn=True):
"""Enable NVIDIA's CUDA mode globally.
Parameters
----------
gpu_id : int
The id of GPU to use.
The index of GPU to use.
use_cudnn : boolean
Whether to use cuDNN if available.
......@@ -119,7 +103,7 @@ def EnableCNML(mlu_id=0):
Parameters
----------
device_id : int
The id of MLU to use.
The index of MLU to use.
Returns
-------
......@@ -161,12 +145,12 @@ def GetRandomSeed():
def SetGPU(id):
"""Set the global id GPU.
"""Set the global index GPU.
Parameters
----------
id : int
The id of GPU to use.
The index of GPU to use.
Returns
-------
......@@ -178,26 +162,26 @@ def SetGPU(id):
def GetGPU():
"""Get the global id of GPU.
"""Get the global index of GPU.
Returns
-------
int
The global id of GPU.
The global index of GPU.
"""
return option['device_id']
def SetDebugMode(enabled=True):
"""Enable Debug mode globally.
def SetGraphType(graph_type=''):
"""Set the graph type.
It will disable all memory sharing optimizations.
If empty, the default DAG graph will be used.
Parameters
----------
enabled : boolean
Whether to enable debug mode.
graph_type : str
The graph type.
Returns
-------
......@@ -205,18 +189,28 @@ def SetDebugMode(enabled=True):
"""
global option
option['debug_mode'] = enabled
option['graph_type'] = graph_type
def SetGraphType(graph_type=''):
"""Set the graph type.
def SetGraphOptimizationLevel(level=3):
"""Set the default level of graph optimization.
If empty, the default DAG graph will be used.
Four optimization levels are predefined:
-O0 (level=0): Do nothing.
-O1 (level=1): Prune the redundant nodes.
-O2 (level=2): Compute outputs in place where possible.
Note that the graph will no longer be a DAG.
-O3 (level=3): Allocate shared buffers for the outputs.
This level is memory-efficient, but debugging becomes non-trivial.
Parameters
----------
graph_type : str
The graph type.
level : {0, 1, 2, 3}, optional, default=3
The level, see the documentation for details.
Returns
-------
......@@ -224,7 +218,7 @@ def SetGraphType(graph_type=''):
"""
global option
option['graph_type'] = graph_type
option['graph_optimization_level'] = level
def LogMetaGraph(enabled=True):
......@@ -301,7 +295,7 @@ def SetLoggingLevel(level):
The default level is *INFO*.
"""
C.SetLogLevelCC(level)
C.SetLoggingLevel(level)
logging.set_verbosity({
'DEBUG': logging.DEBUG,
'INFO': logging.INFO,
......
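Note: a short usage sketch of the new optimization-level switch (the device index is illustrative):

    import dragon.config as cfg

    cfg.EnableCUDA(0)                 # use GPU 0, with cuDNN if available
    cfg.SetGraphOptimizationLevel(1)  # -O1: keep the graph easy to debug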
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""List some useful CUDA C++ API."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import dragon.import_c_api as C
def IsCUDADriverSufficient():
"""Is cuda driver sufficient?
Returns
-------
boolean
``True`` if your device(s) support CUDA otherwise ``False``.
"""
return C.IsCUDADriverSufficient()
def GetDevice():
"""Get the current active cuda device.
Returns
-------
int
The device index.
"""
return C.cudaGetDevice()
def SynchronizeStream(device_id=None, stream_id=0):
"""Synchronize the specified cuda stream.
If ``device_id`` is *None*, the current active device will be selected.
Parameters
----------
device_id : int, optional
The device index.
stream_id : int, optional
The stream index.
Returns
-------
None
"""
return C.cudaStreamSynchronize(
device_id if device_id is not None else -1, stream_id)
\ No newline at end of file
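Note: a minimal usage sketch of the new dragon.core.cuda wrappers:

    import dragon.core.cuda as cuda

    if cuda.IsCUDADriverSufficient():
        print('Active device:', cuda.GetDevice())
        cuda.SynchronizeStream(device_id=None, stream_id=0)  # sync the default stream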
......@@ -49,9 +49,9 @@ class GraphGradientMaker(object):
Parameters
----------
forward_op : dragon_pb2.OperatorDef
forward_op : OperatorDef
The OperatorDef of ``ForwardOp``.
g_outputs : list of str or list of None
g_outputs : list of str
The inputs of ``BackwardOp`` (Precomputed grads).
name : str, optional
The optional operator name.
......@@ -61,13 +61,9 @@ class GraphGradientMaker(object):
tuple
The OpDef, outputs and defaults of ``BackwardOp``.
References
----------
The wrapper of ``CreateGradientDefsCC``.
"""
g_ops, g_inputs, defaults = \
C.CreateGradientDefsCC(forward_op.SerializeToString(), g_outputs)
g_ops, g_inputs, defaults = C.CreateGradientDefs(
forward_op.SerializeToString(), g_outputs)
for idx, g_op in enumerate(g_ops):
new_def = pb.OperatorDef()
new_def.ParseFromString(g_op)
......@@ -80,13 +76,13 @@ class GraphGradientMaker(object):
Parameters
----------
forward_op : dragon_pb2.OperatorDef
forward_op : OperatorDef
The OperatorDef of ``ForwardOp``.
inputs_to_grads : dict
The dict of <input, g_input>.
blacklist : set of str
The set of ``NoGradient`` tensors.
targets : list of str
targets : sequence of str
The solving targets.
Returns
......@@ -123,7 +119,7 @@ class GraphGradientMaker(object):
Parameters
----------
forward_ops : list of dragon_pb2.OperatorDef
forward_ops : sequence of OperatorDef
The operators of ``ForwardOp``.
targets : sequence of str
The solving targets.
......@@ -168,12 +164,12 @@ class GraphGradientMaker(object):
is_skip, gen_grads = \
cls.CheckGrad(forward_op, inputs_to_grads, blacklist, targets)
# Missing grads are represented as ``None``
g_outputs = list(inputs_to_grads.get(name, None) for name in forward_op.output)
g_outputs = list(inputs_to_grads.get(name, 'ignore') for name in forward_op.output)
g_ops, g_inputs, defaults = cls.CreateGrad(forward_op, g_outputs)
# Append ops
if not is_skip:
# --> GenOp
# GradientGenerateOp
if len(gen_grads) > 0:
op_inputs = []; op_outputs = []; values = []
for item in gen_grads:
......@@ -185,7 +181,7 @@ class GraphGradientMaker(object):
if forward_op.HasField('device_option'):
gen_op.device_option.CopyFrom(forward_op.device_option)
backward_ops.append(gen_op)
# --> GradOp
# GradientOp
for g_op in g_ops:
g_op.name = OperatorHelper.get_name() if auto_names else 'runtime'
backward_ops.append(g_op)
......
......@@ -33,7 +33,7 @@ class OperatorHelper(object):
# Input(0) => Output(0), shape and data type unchanged.
'Relu', 'PRelu', 'Elu', 'SElu', 'Sigmoid', 'Tanh', 'Dropout', 'Softmax',
'Add', 'Sub', 'Mul', 'Div', 'Clip', 'Log', 'Exp', 'Pow', 'Square', 'Sqrt',
'Affine', 'Copy', 'Compare', 'StopGradient', 'MovingAverage', 'MPIBroadcast',
'Accumulate', 'Affine', 'Copy', 'Compare', 'StopGradient', 'MPIBroadcast',
'BatchNorm', 'GroupNorm', 'L2Norm', 'LRN', 'BiasAdd', 'DropBlock2d',
)
......@@ -885,10 +885,6 @@ class OperatorHelper(object):
def _apply_BilinearResize(cls, arguments, inputs, outputs):
return cls._apply_NNResize(arguments, inputs, outputs)
@classmethod
def _apply_DenseConcat(cls, arguments, inputs, outputs):
return cls._apply_Concat(arguments, inputs, outputs)
class GradientHelper(object):
"""A helper to store the known gradient relations.
......
......@@ -43,8 +43,9 @@ def get_logger():
logger = _logging.getLogger('dragon')
logger.setLevel(INFO)
logger.propagate = False
if not _logging.getLogger().handlers:
if True:
# Determine whether we are in an interactive environment
_interactive = False
try:
......
......@@ -9,31 +9,15 @@
#
# ------------------------------------------------------------
"""List some useful MPI C++ API."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import dragon.import_c_api as C
__all__ = [
'Init',
'Is_Init',
'Rank',
'Size',
'CreateGroup',
'Snapshot',
'AllowSnapshot',
'Parallel',
'AllowParallel',
'SetParallelMode',
'GetParallelMode',
'Finalize',
]
_GLOBAL_MPI_IS_INIT = False
_GLOBAL_MPI_SNAPSHOT_RANKS = []
_GLOBAL_MPI_PARALLEL_GROUPS = []
......@@ -55,12 +39,8 @@ def Init():
-----
This function can only be called once.
References
----------
The wrapper of ``MPIInitCC``
"""
C.MPIInitCC()
C.MPIInit()
global _GLOBAL_MPI_IS_INIT
global _GLOBAL_MPI_SNAPSHOT_RANKS
_GLOBAL_MPI_IS_INIT = True
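# A minimal lifecycle sketch using the wrappers defined in this module
# (Init/Rank/Size/Finalize); purely illustrative:
Init()                                     # may only be called once per process
print('rank %d of %d' % (Rank(), Size()))  # world rank and size
Finalize()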
......@@ -86,13 +66,9 @@ def Rank():
int
The world rank.
References
----------
The wrapper of ``MPIRankCC``.
"""
_check_init()
return C.MPIRankCC()
return C.MPIRank()
def Size():
......@@ -103,13 +79,9 @@ def Size():
int
The world size.
References
----------
The wrapper of ``MPISizeCC``.
"""
_check_init()
return C.MPISizeCC()
return C.MPISize()
def CreateGroup(root=0, incl=[], excl=[]):
......@@ -129,14 +101,9 @@ def CreateGroup(root=0, incl=[], excl=[]):
tuple
The local common and group id.
References
----------
The wrapper of ``MPICreateGroupCC``.
"""
_check_init()
comm, group = C.MPICreateGroupCC(root, incl, excl)
return np.int64(comm), np.int64(group)
return C.MPICreateGroup(root, incl, excl)
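# A minimal usage sketch: a group of ranks {0, 1, 2, 3} rooted at rank 0;
# the return value is the (comm, group) pair described above:
comm, group = CreateGroup(root=0, incl=[0, 1, 2, 3])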
def Snapshot(incl):
......@@ -193,6 +160,7 @@ def AllowSnapshot():
Returns
-------
boolean
"""
return Rank() in _GLOBAL_MPI_SNAPSHOT_RANKS
......@@ -212,12 +180,12 @@ def AllowParallel():
def SetParallelMode(mode):
"""Set the mode of data parallelism.
"""Set the communication mode of data parallelism.
Parameters
----------
mode : str
The mode, ``MPI``, ``NCCL`` or ``MIXED``.
mode : {'MPI', 'NCCL'}, optional
The communication mode.
Returns
-------
......@@ -228,20 +196,18 @@ def SetParallelMode(mode):
The default mode is ``MPI``.
"""
assert mode == 'MPI' or \
mode == 'NCCL' or \
mode == 'MIXED'
assert mode == 'MPI' or mode == 'NCCL'
global _GLOBAL_MPI_PARALLEL_MODE
_GLOBAL_MPI_PARALLEL_MODE = mode
def GetParallelMode():
"""Get the current mode of data parallelism.
"""Get the current communication mode of data parallelism.
Returns
-------
str
The mode, ``MPI``, ``NCCL`` or ``MIXED``.
str : {'MPI', 'NCCL'}
The communication mode.
"""
return _GLOBAL_MPI_PARALLEL_MODE
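# A minimal usage sketch; 'MPI' is the default and 'NCCL' is the only other
# accepted value after this change:
SetParallelMode('NCCL')
assert GetParallelMode() == 'NCCL'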
......@@ -260,4 +226,4 @@ def Finalize():
"""
_check_init()
C.MPIFinalizeCC()
\ No newline at end of file
C.MPIFinalize()
\ No newline at end of file
......@@ -21,6 +21,7 @@ import numpy as np
from google.protobuf.message import Message
import dragon.config as cfg
import dragon.import_c_api as C
from dragon.proto import dragon_pb2 as pb
from dragon.core.scope import get_default_device
......@@ -50,14 +51,15 @@ else:
argument.name = key
if type(value) is float: argument.f = value
elif type(value) in (bool, int, long, np.int64) : argument.i = value
elif type(value) in (str, unicode): argument.s = value
elif type(value) is str: argument.s = value
elif type(value) is unicode: argument.s = str(value)
elif isinstance(value, Message): argument.s = value.SerializeToString()
elif all(type(v) is float for v in value): argument.floats.extend(value)
elif all(type(v) is int for v in value): argument.ints.extend(value)
elif all(type(v) is long for v in value): argument.ints.extend(value)
elif all(type(v) is str for v in value): argument.strings.extend(value)
elif all(type(v) is unicode or type(v) is str for v in value):
argument.strings.extend(value)
elif all(type(v) is unicode for v in value):
argument.strings.extend([str(v) for v in value])
elif all(isinstance(v, Message) for v in value):
argument.strings.extend([v.SerializeToString() for v in value])
else:
......@@ -67,8 +69,10 @@ else:
return argument
def MakeOperatorDef(op_type, inputs, outputs, name='',
device_option=None, arg=None, engine=None, **kwargs):
def MakeOperatorDef(
op_type, inputs=(), outputs=(),
name='', uid=None, device_option=None,
arg=None, engine=None, **kwargs):
operator = pb.OperatorDef()
operator.type = op_type
operator.name = name
......@@ -81,22 +85,29 @@ def MakeOperatorDef(op_type, inputs, outputs, name='',
if 'random_seed' in kwargs:
operator.device_option.random_seed = kwargs['random_seed']
del kwargs['random_seed']
if arg is not None:
operator.arg.extend(arg)
if uid is not None: operator.uid = uid
if arg is not None: operator.arg.extend(arg)
for k,v in kwargs.items():
if v is None: continue
operator.arg.add().CopyFrom(MakeArgument(k,v))
return operator
def MutableOperatorDef(meta_def, inputs, outputs):
op = pb.OperatorDef(); op.CopyFrom(meta_def)
op.ClearField('input'); op.input.extend(inputs)
op.ClearField('output'); op.output.extend(outputs)
return op
def MakeCXXOperatorDef(
op_type, inputs=(), outputs=(),
name='', uid=None, device_option=None,
arg=None, engine=None, **kwargs):
c_def = C.OperatorDef()
py_def = MakeOperatorDef(
op_type, inputs, outputs, name, uid,
device_option, arg, engine, **kwargs)
c_def.ParseFrom(py_def.SerializeToString())
return c_def
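# A minimal usage sketch of the two builders above; the op type and extra
# keyword arguments are illustrative only (kwargs are packed into protobuf
# Arguments via MakeArgument):
py_def = MakeOperatorDef(
    'Add', inputs=['a', 'b'], outputs=['c'], name='add_1')
cxx_def = MakeCXXOperatorDef(
    'Add', inputs=['a', 'b'], outputs=['c'], name='add_1')  # parsed into C.OperatorDef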
def MakeDeviceOption(device_type, device_id, engine=None, rng_seed=None):
def MakeDeviceOption(
device_type, device_id,
engine=None, rng_seed=None):
option = pb.DeviceOption()
option.device_type = device_type
option.device_id = device_id
......@@ -121,7 +132,9 @@ for i in range(_PREDEFINED_DEVICE_LIMITS):
MakeDeviceOption(identify, i, 'CUDNN')
def GetDeviceOption(device_type, device_id=0, engine=None, rng_seed=None):
def GetDeviceOption(
device_type, device_id=0,
engine=None, rng_seed=None):
ctx = (device_type, device_id, engine if engine else '')
option = _PREDEFINED_DEVICE_OPTION_DICT[ctx]
if rng_seed is not None:
......
......@@ -88,11 +88,11 @@ class WorkspaceScope(object):
self.prev = 'default'
def __enter__(self):
self.prev = C.CurrentWorkspaceCC()
C.SwitchWorkspaceCC(self.ws, True)
self.prev = C.CurrentWorkspace()
C.SwitchWorkspace(self.ws, True)
def __exit__(self, type, value, traceback):
C.SwitchWorkspaceCC(self.prev, True)
C.SwitchWorkspace(self.prev, True)
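# A minimal usage sketch of the scope above, assuming the constructor takes
# the workspace name; the previous workspace is restored on exit:
with WorkspaceScope('my_workspace'):
    pass  # ops and tensors created here run in the 'my_workspace' workspace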
_GLOBAL_TENSOR_STACK = _ThreadLocalStack()
......
......@@ -355,10 +355,9 @@ class Tensor(object):
"""
if inplace:
return Tensor.CreateOperator(
'AsType', [], existing_outputs=[self], dtype=dtype)
'Cast', [], existing_outputs=[self], dtype=dtype)
else:
return Tensor.CreateOperator(
'AsType', self, dtype=dtype)
return Tensor.CreateOperator('Cast', self, dtype=dtype)
@property
def extra_targets(self):
......
......@@ -9,6 +9,8 @@
#
# ------------------------------------------------------------
"""List some extended Tensor C++ API."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
......@@ -23,21 +25,7 @@ from dragon.core.tensor import Tensor
from dragon.core.proto_utils import GetDeviceOption
__all__ = [
'FromShape',
'SetShape',
'FromTensor',
'FromPyArray',
'SetPyArray',
'ToPyArray',
'ToPyArrayEx',
'ToCPUTensor',
'ToCUDATensor',
'GetTensorInfo',
]
def FromShape(shape, dtype='float32', ctx=None, name=None):
def FromShape(shape, dtype='float32', name=None):
"""Create a Tensor from the shape.
If specifying an existing tensor with a larger shape,
......@@ -49,8 +37,6 @@ def FromShape(shape, dtype='float32', ctx=None, name=None):
The shape info.
dtype : str
The data type.
ctx : dragon_pb2.DeviceOption
The context info.
name : str, optional
The optional tensor name.
......@@ -59,19 +45,14 @@ def FromShape(shape, dtype='float32', ctx=None, name=None):
Tensor
The tensor with the specific shape.
References
----------
The wrapper of ``TensorFromShapeCC``.
"""
tensor = _try_get_tensor(name)
tensor.shape = list(shape)
if not isinstance(shape, (tuple, list)):
raise TypeError('The shape should be a tuple or list.')
if ctx is None: ctx = GetDeviceOption('CPU')
C.TensorFromShapeCC(
C.TensorFromShape(
_stringify_tensor(tensor),
list(shape), dtype,
_stringify_proto(ctx))
list(shape), dtype)
return tensor
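# A minimal usage sketch of the function above (names as defined in this
# file); note the device-context argument has been dropped from the signature:
x = FromShape((2, 3), dtype='float32', name='x')
print(x.shape)  # [2, 3]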
......@@ -91,12 +72,8 @@ def SetShape(tensor, shape, dtype='float32'):
-------
None
References
----------
The wrapper of ``TensorFromShapeCC``.
"""
C.TensorFromShapeCC(_stringify_tensor(tensor), shape, dtype)
C.TensorFromShape(_stringify_tensor(tensor), shape, dtype)
def FromTensor(src, src_ctx=None, name=None, ctx=None):
......@@ -109,11 +86,11 @@ def FromTensor(src, src_ctx=None, name=None, ctx=None):
----------
src : Tensor or str
The source tensor or its name.
src_ctx : dragon_pb2.DeviceOption
src_ctx : DeviceOption
The context of source tensor.
name : str
The optional tensor name for destination tensor.
ctx : dragon_pb2.DeviceOption
ctx : DeviceOption
The context for destination tensor.
Returns
......@@ -121,17 +98,13 @@ def FromTensor(src, src_ctx=None, name=None, ctx=None):
Tensor
The tensor with the same data as source.
References
----------
The wrapper of ``TensorFromTensorCC``.
"""
tensor = _try_get_tensor(name)
if src_ctx is None: src_ctx = GetDeviceOption('CPU')
if ctx is None: ctx = GetDeviceOption('CPU')
C.TensorFromTensorCC(
C.TensorFromTensor(
_stringify_tensor(tensor), _stringify_tensor(src),
_stringify_proto(ctx), _stringify_proto(src_ctx))
_stringify_proto(ctx), _stringify_proto(src_ctx))
return tensor
......@@ -155,15 +128,11 @@ def FromPyArray(array, name=None):
Tensor
The tensor sharing the memory with original array.
References
----------
The wrapper of ``TensorFromPyArrayCC``.
"""
tensor = _try_get_tensor(name)
if not isinstance(array, np.ndarray):
raise TypeError('The given nd-array should be numpy.ndarray.')
C.TensorFromPyArrayCC(_stringify_tensor(tensor), array)
C.TensorFromPyArray(_stringify_tensor(tensor), array)
return tensor
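# A minimal usage sketch of the function above; the created tensor shares
# memory with the given numpy array:
import numpy as np

data = np.ones((2, 3), dtype='float32')
x = FromPyArray(data, name='x')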
......@@ -188,154 +157,58 @@ def SetPyArray(tensor, array):
The wrapper of ``TensorFromPyArrayCC``.
"""
C.TensorFromPyArrayCC(_stringify_tensor(tensor), array)
C.TensorFromPyArray(_stringify_tensor(tensor), array)
def ToPyArray(tensor):
def ToPyArray(tensor, readonly=False):
"""Create a Array from a existing Tensor.
Note that memory of Array are ``zero-copied``.
Note that the memory of the Array is *zero-copied*.
Parameters
----------
tensor : Tensor or str
The input tensor.
readonly : boolean, optional, default=False
Whether to sync the contents with the device.
Returns
-------
numpy.ndarray
The array sharing the memory with original tensor.
References
----------
The wrapper of ``TensorToPyArrayCC``.
"""
return C.TensorToPyArrayCC(_stringify_tensor(tensor))
def ToPyArrayEx(tensor):
"""Create a const Array from a existing Tensor.
Note that memory of Array are ``zero-copied`` and ``const``.
Parameters
----------
tensor : Tensor or str
The input tensor.
Returns
-------
numpy.ndarray
The array sharing the memory with original tensor.
References
----------
The wrapper of ``TensorToPyArrayExCC``.
"""
return C.TensorToPyArrayExCC(_stringify_tensor(tensor))
def ToCPUTensor(tensor):
"""Switch the storage of a existing Tensor on cpu memory.
Parameters
----------
tensor : Tensor or str
The input tensor.
Returns
-------
None
References
----------
The wrapper of ``ToCPUTensorCC``.
"""
return C.ToCPUTensorCC(_stringify_tensor(tensor))
return C.TensorToPyArray(_stringify_tensor(tensor), readonly)
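# A minimal usage sketch of the function above; the returned array is
# zero-copied, and ``readonly=True`` is assumed to stand in for the removed
# ``ToPyArrayEx`` (const view):
x = FromShape((2, 3), dtype='float32', name='x')
arr = ToPyArray(x)                    # writable, zero-copied view
arr_ro = ToPyArray(x, readonly=True)  # const view, per the old ToPyArrayEx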
def ToCUDATensor(tensor, device=0):
"""Switch the storage of a existing Tensor on cuda memory.
def GetStorage(tensor):
"""Get the storage of a existing Tensor.
Parameters
----------
tensor : Tensor or str
The input tensor.
device : int
The id of the device to use.
Returns
-------
None
References
----------
The wrapper of ``ToCUDATensorCC``.
TensorStorage
The storage of the backend.
"""
return C.ToCUDATensorCC(_stringify_tensor(tensor), device)
def GetTensorInfo(tensor, stream=1):
"""Get the info of a existing Tensor.
The string info contains following fields:
stream #1: ``dtype``, ``from_numpy``, ``init``, ``mem``, ``mem_at``, ``device_id``
stream #2: ``shape``
stream #3: #1 + #2
Parameters
----------
tensor : Tensor or str
The input tensor.
stream : int
The stream id.
Returns
-------
dict
The info.
References
----------
The wrapper of ``GetTensorInfoCC``.
"""
if not dg.workspace.HasTensor(_stringify_tensor(tensor)): return None
info = C.GetTensorInfoCC(_stringify_tensor(tensor), stream)
info['mem'] = []
if 'CPU' in info:
info['mem'].append('CPU'); info['device_id'] = 0
if 'CUDA' in info:
info['mem'].append('CUDA'); info['device_id'] = int(info['CUDA'])
if 'CNML' in info:
info['mem'].append('CNML'); info['device_id'] = int(info['CNML'])
info['init'] = len(info['mem']) > 0
return info
tensor = _stringify_tensor(tensor)
if not dg.workspace.HasTensor(tensor): return None
return C.GetTensor(tensor)
def _stringify_proto(obj):
"""Try to stringify a proto-buffer structure."""
if obj is str: return obj
elif isinstance(obj, Message): return obj.SerializeToString()
else: raise TypeError('Object can not be serialized as a string.')
return obj.SerializeToString()
def _stringify_tensor(obj):
"""Try to stringify a tensor."""
if hasattr(obj, 'name'): return obj.name
else:
try:
obj = str(obj)
except Exception as e:
raise TypeError('Object can bot be used as a tensor. Error: {0}'.format(str(e)))
return obj
else: return str(obj)
def _try_get_tensor(name=None):
......
......@@ -33,8 +33,8 @@ except ImportError as e:
sys.exit(1)
REGISTERED_OPERATORS = set(s for s in RegisteredOperatorsCC())
NO_GRADIENT_OPERATORS = set(s for s in NoGradientOperatorsCC())
REGISTERED_OPERATORS = set(s for s in RegisteredOperators())
NO_GRADIENT_OPERATORS = set(s for s in NoGradientOperators())
atexit.register(OnModuleExitCC)
\ No newline at end of file
atexit.register(OnModuleExit)
\ No newline at end of file
......@@ -100,8 +100,8 @@ class ArgumentHelper(object):
arguments[name] = None
arguments[name + '_desc'] = property.name
return arguments
extra_kwargs = {'gen_desc_{}'.format(name): Generator}
return op_func(*args, **kwargs, **extra_kwargs)
kwargs.update({'gen_desc_{}'.format(name): Generator})
return op_func(*args, **kwargs)
return Impl
return Decorator
......@@ -138,8 +138,8 @@ class ArgumentHelper(object):
else:
arguments[desc_name] = properties
return arguments
extra_kwargs = {'gen_desc_{}'.format(name): Generator}
return op_func(*args, **kwargs, **extra_kwargs)
kwargs.update({'gen_desc_{}'.format(name): Generator})
return op_func(*args, **kwargs)
return Impl
return Decorator
......
......@@ -140,11 +140,13 @@ def Minimum(inputs, **kwargs):
@OpSchema.Inputs(1)
def Moments(inputs, axes=None, keep_dims=False, **kwargs):
"""Compute the mean and variance of inputs along the given axes.
"""Calculate the mean and variance of inputs along the given axes.
The data type of the moments is *float32* typically,
unless the inputs are *float64*, in which case *float64* moments are returned.
If ``axes`` is *None*, a Scalar will be returned.
**Type Constraints**: (*int8*, *uint8*, *int32*, *int64*, *float16*, *float32*, *float64*)
Parameters
......@@ -206,9 +208,9 @@ def Matmul(inputs, transA=False, transB=False, **kwargs):
----------
inputs : sequence of Tensor
The inputs, A and B.
transA : bool
transA : bool, optional, default=False
Whether to transpose A.
transB : bool
transB : bool, optional, default=False
Whether to transpose B.
Returns
......@@ -234,9 +236,9 @@ def Dot(inputs, transA=False, transB=False, **kwargs):
----------
inputs : sequence of Tensor
The inputs, A and B.
transA : bool
transA : bool, optional, default=False
Whether to transpose A.
transB : bool
transB : bool, optional, default=False
Whether to transpose B.
Returns
......@@ -262,9 +264,9 @@ def FullyConnected(inputs, num_output, axis=1, transW=True, **kwargs):
The inputs, represent [X, W] + [b].
num_output : int
The output dim.
axis : int, optional
axis : int, optional, default=1
The start axis to calculate, can be negative.
transW : bool, optional
transW : bool, optional, default=True
Whether to transpose the W.
Returns
......@@ -346,7 +348,7 @@ def Exp(inputs, **kwargs):
@OpSchema.Inputs(1)
def Pow(inputs, power, shift=None, scale=None, **kwargs):
def Pow(inputs, power, shift=0., scale=1., **kwargs):
"""Calculate the power of input.
Formulation: |power_function|
......@@ -357,11 +359,11 @@ def Pow(inputs, power, shift=None, scale=None, **kwargs):
----------
inputs : Tensor
The input tensor.
power : float
power : float, required
The power factor.
shift : float, optional
shift : float, optional, default=0.
The shift magnitude.
scale : float, optional
scale : float, optional, default=1.
The scale factor.
Returns
......@@ -414,7 +416,7 @@ def Sqrt(inputs, **kwargs):
The sqrt result.
"""
return Tensor.CreateOperator('Pow', power=0.5, **ParseArgs(locals()))
return Tensor.CreateOperator('Sqrt', **ParseArgs(locals()))
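# A minimal usage sketch of the two ops above (names as defined in this
# module); the power/scale/shift arguments follow the parameters documented
# above and the values are illustrative only:
x = Tensor('x', dtype='float32').Variable()
y = Pow(x, power=2., scale=3., shift=1.)
z = Sqrt(x)  # now a dedicated 'Sqrt' op instead of Pow(power=0.5)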
@OpSchema.Inputs(2, 3)
......@@ -433,9 +435,9 @@ def Affine(inputs, axis=1, num_axes=1, **kwargs):
----------
inputs : sequence of Tensor
The inputs, represent [x, A] + [b].
axis : int, optional
axis : int, optional, default=1
The start axis to scale, can be negative.
num_axes : int, optional
num_axes : int, optional, default=1
The number of axes to scale.
Returns
......@@ -459,7 +461,7 @@ def GramMatrix(inputs, axis=1, **kwargs):
----------
inputs : Tensor
The input tensor.
axis : int, optional
axis : int, optional, default=1
The start axis to calculate.
Returns
......@@ -469,3 +471,48 @@ def GramMatrix(inputs, axis=1, **kwargs):
"""
return Tensor.CreateOperator('GramMatrix', **ParseArgs(locals()))
@OpSchema.Inputs(1, INT_MAX)
def Accumulate(inputs, alpha=1., beta=1., **kwargs):
"""Calculate *y = alpha * x + beta * y*
**Type Constraints**: (*int8*, *uint8*, *int32*, *int64*, *float16*, *float32*, *float64*)
Parameters
----------
inputs : sequence of Tensor
The inputs, i.e., the *x*.
alpha : float, optional, default=1.
The alpha value.
beta : float, optional, default=1.
The beta value.
Returns
-------
sequence of Tensor
The outputs, i.e., the *y*.
"""
return Tensor.CreateOperator('Accumulate', **ParseArgs(locals()))
@OpSchema.Inputs(1, INT_MAX)
def MovingAverage(inputs, decay, **kwargs):
"""Calculate the *y = (1 - decay) * x + decay * y*
**Type Constraints**: (*int8*, *uint8*, *int32*, *int64*, *float16*, *float32*, *float64*)
Parameters
----------
inputs : sequence of Tensor
The inputs, i.e., the *x*.
decay : float, required
The decay factor.
Returns
-------
sequence of Tensor
The outputs, i.e., the *y*.
"""
return Accumulate(inputs, 1 - decay, decay, **kwargs)
\ No newline at end of file
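# A minimal usage sketch relating the two ops above; MovingAverage simply
# forwards to Accumulate with alpha = 1 - decay and beta = decay:
x = Tensor('x', dtype='float32').Variable()
y = Accumulate([x], alpha=0.1, beta=0.9)  # y = 0.1 * x + 0.9 * y
z = MovingAverage([x], decay=0.9)         # same formulation via Accumulate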
......@@ -17,7 +17,7 @@ from . import *
@OpSchema.Inputs(1)
def AsType(inputs, dtype='float32', inplace=False, **kwargs):
def Cast(inputs, dtype='float32', inplace=False, **kwargs):
"""Cast the data type of inputs to a specific one.
If ``inplace`` is ``True``, cast ``self`` instead of returning a new one.
......@@ -41,7 +41,7 @@ def AsType(inputs, dtype='float32', inplace=False, **kwargs):
Examples
--------
>>> x = Tensor('x', dtype='float32').Variable()
>>> y = AsType(x, 'int32')
>>> y = Cast(x, 'int32')
>>> z = x.astype('int64')
>>> xx = x.astype('float64', inplace=True)
>>> print(x.name, xx.name)
......@@ -53,7 +53,7 @@ def AsType(inputs, dtype='float32', inplace=False, **kwargs):
arguments['inputs'] = []
arguments['existing_outputs'] = [inputs]
return Tensor.CreateOperator('AsType', **arguments)
return Tensor.CreateOperator('Cast', **arguments)
def Run(inputs, module, op, param_str='', num_outputs=1, **kwargs):
......@@ -173,28 +173,4 @@ def StopGradient(inputs, **kwargs):
An identity of the input.
"""
return Tensor.CreateOperator('StopGradient', **ParseArgs(locals()))
@OpSchema.Inputs(1)
def MovingAverage(inputs, decay, **kwargs):
"""Calculate the moving average.
**Type Constraints**: (*int8*, *uint8*, *int32*, *int64*, *float16*, *float32*, *float64*)
Parameters
----------
inputs : Tensor
The values to calculate moving average.
decay : float
The decay factor.
Returns
-------
Tensor
The output tensor, i.e., ``variable``, calculated as:
|moving_average_function|
"""
return Tensor.CreateOperator('MovingAverage', **ParseArgs(locals()))
\ No newline at end of file
return Tensor.CreateOperator('StopGradient', **ParseArgs(locals()))
\ No newline at end of file
......@@ -740,7 +740,6 @@ def Shape(inputs, **kwargs):
return Tensor.CreateOperator('Shape', **ParseArgs(locals()))
@OpSchema.Inputs(0)
@ArgumentHelper.Desc('start')
@ArgumentHelper.Desc('stop')
@ArgumentHelper.Desc('step')
......
......@@ -62,7 +62,7 @@ def Conv2d(
The dilation multiple(s) of convolution.
group : int, optional, default=1
The group size of convolution.
padding : {'VALID', 'SAME, 'SAME_UPPER', 'SAME_LOWER'}, optional
padding : {'VALID', 'SAME', 'SAME_UPPER', 'SAME_LOWER'}, optional
The padding algorithm.
data_format : {'NCHW', 'NHWC'}, optional
The data_format.
......@@ -119,7 +119,7 @@ def DepthwiseConv2d(
The stride(s) of convolution.
pads : sequence of int, optional, default=0
The zero padding size(s) of convolution.
padding : {'VALID', 'SAME, 'SAME_UPPER', 'SAME_LOWER'}, optional
padding : {'VALID', 'SAME', 'SAME_UPPER', 'SAME_LOWER'}, optional
The padding algorithm.
data_format : {'NCHW', 'NHWC'}, optional
The data_format.
......@@ -183,7 +183,7 @@ def ConvTranspose2d(
The padding value added to one side (right) of the output.
output_shape : sequence of (int, Tensor), optional
The deterministic output shape for **SAME** padding.
padding : {'VALID', 'SAME, 'SAME_UPPER', 'SAME_LOWER'}, optional
padding : {'VALID', 'SAME', 'SAME_UPPER', 'SAME_LOWER'}, optional
The padding algorithm.
data_format : {'NCHW', 'NHWC'}, optional
The data_format.
......@@ -224,7 +224,7 @@ def ConvTranspose2d(
@OpSchema.Inputs(1)
def Pool2d(
inputs, kernel_shape, strides, pads=0, padding='VALID', ceil=True,
inputs, kernel_shape, strides, pads=0, padding='VALID', ceil_mode=True,
mode='MAX', data_format='NCHW', global_pooling=False, **kwargs):
"""2D Pooling, MAX or AVG.
......@@ -248,9 +248,9 @@ def Pool2d(
The stride(s) of pooling.
pads : sequence of int, optional, default=0
The zero padding size(s) of pooling.
padding : {'VALID', 'SAME, 'SAME_UPPER', 'SAME_LOWER'}, optional
padding : {'VALID', 'SAME', 'SAME_UPPER', 'SAME_LOWER'}, optional
The padding algorithm.
ceil : bool, optional
ceil_mode : bool, optional, default=True
Whether to ceil the boundary.
mode : {'MAX', 'AVG'}, optional
The pooling mode.
......@@ -505,48 +505,6 @@ def BiasAdd(inputs, data_format='NCHW', **kwargs):
return Tensor.CreateOperator('BiasAdd', **arguments)
@OpSchema.Inputs(2)
def DenseConcat(inputs, growth_rate=0, axis=1, **kwargs):
"""Memory-efficient concatenation for DenseNet `[Huang et.al, 2017] <http://arxiv.org/abs/1608.06993>`_.
This operator is forked from ``Concat``.
The memory optimization requires the following settings:
1. Set the ``growth_rate``, the value must larger than ``0``.
2. Set the ``mirror_stage`` to True.
Parameters
----------
inputs : sequence of Tensor
The inputs, represent A(old) and B(new) respectively.
growth_rate : int, optional, default=0
The growth rate.
axis : int, optional
The axis to concatenate.
mirror_stage : bool, optional
Whether to share input A for output C. Default is ``False``.
Returns
-------
Tensor
The concatenated tensor, represents C.
Examples
--------
>>> A = Tensor().Variable()
>>> B = Tensor().Variable()
>>> C = DenseConcat([A, B], axis=1) # Simple concatenation
>>> import dragon.memonger as opt
>>> C = opt.Drop(DenseConcat, [A, B], axis=1) # Memory-efficient concatenation
>>> D = DenseConcat([A, B], axis=1, mirror_stage=True) # Memory-efficient concatenation, equivalent
"""
return Tensor.CreateOperator('DenseConcat', **ParseArgs(locals()))
@OpSchema.Inputs(1)
@ArgumentHelper.Desc('keep_prob', as_target=False)
def DropBlock2d(
......
......@@ -52,7 +52,6 @@ LRN = vision_ops.LRN
NNResize = vision_ops.NNResize
BilinearResize = vision_ops.BilinearResize
BiasAdd = vision_ops.BiasAdd
DenseConcat = vision_ops.DenseConcat
DropBlock2d = vision_ops.DropBlock2d
# Recurrent
......@@ -104,6 +103,8 @@ FullyConnected = math_ops.FullyConnected
Eltwise = math_ops.Eltwise
Affine = math_ops.Affine
GramMatrix = math_ops.GramMatrix
Accumulate = math_ops.Accumulate
MovingAverage = math_ops.MovingAverage
# Normalization
BatchNorm = norm_ops.BatchNorm
......@@ -137,19 +138,18 @@ Squeeze = array_ops.Squeeze
Shape = array_ops.Shape
Arange = array_ops.Arange
# ControlFlow
# Control Flow
Copy = control_flow_ops.Copy
Equal = control_flow_ops.Equal
Less = control_flow_ops.Less
Grater = control_flow_ops.Greater
# Misc
Cast = AsType = misc_ops.AsType
Cast = AsType = misc_ops.Cast
Run = misc_ops.Run
Template = misc_ops.Template
Accuracy = misc_ops.Accuracy
StopGradient = misc_ops.StopGradient
MovingAverage = misc_ops.MovingAverage
# MPI
MPIBroadcast = mpi_ops.MPIBroadcast
......