Custom operators

ONNX Runtime provides options to run custom operators that are not official ONNX operators. The contrib ops domain contains some common non-official ops; however, adding operators there is not recommended, because doing so increases the binary size of the core runtime package.

Register a custom operator
CUDA custom ops

Register a custom operator

A new op can be registered with ONNX Runtime using the Custom Operator API (onnxruntime_c_api.h):

Create an OrtCustomOpDomain with the domain name used by the custom ops
Create an OrtCustomOp structure for each op and add them to the OrtCustomOpDomain with OrtCustomOpDomain_Add
Call OrtAddCustomOpDomain to add the custom domain of ops to the session options

Examples

C++ helper API: custom ops MyCustomOp and SliceCustomOp use the C++ helper API (onnxruntime_cxx_api.h). The test file also demonstrates an option to compile the custom ops into a shared library to be used to run a model via the C++ API.
Custom op shared library: sample custom op shared library containing two custom kernels
Custom op shared library with Python API: testRegisterCustomOpsLibrary uses the Python API to register a shared library with custom op kernels. Currently, the only Execution Providers (EPs) supported for custom ops registered this way are the CUDA and CPU EPs.
E2E example: Export and run a PyTorch model with custom op
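As a rough illustration of the Python route mentioned above, the sketch below loads a custom op shared library into the session options before creating the session. The function name create_session_with_custom_ops and the file paths are hypothetical; the early path check is an addition for clearer error messages and is not something ONNX Runtime requires.

```python
from pathlib import Path


def create_session_with_custom_ops(model_path, library_path,
                                   providers=("CPUExecutionProvider",)):
    """Create an InferenceSession whose options load a custom op shared library.

    Hypothetical helper for illustration; not part of onnxruntime itself.
    """
    library = Path(library_path)
    if not library.is_file():
        # Fail early with a clear message rather than a loader error later on.
        raise FileNotFoundError(f"custom op library not found: {library}")

    # Imported here so the path check above works even without onnxruntime.
    import onnxruntime as ort

    session_options = ort.SessionOptions()
    # Loads the shared library and registers the custom ops it exports.
    session_options.register_custom_ops_library(str(library))

    return ort.InferenceSession(
        str(model_path),
        sess_options=session_options,
        providers=list(providers),
    )
```

A call might look like create_session_with_custom_ops("model.onnx", "./libcustom_op_library.so"), where both file names are placeholders for your own model and library.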

CUDA custom ops

When a model is run on GPU, onnxruntime inserts a MemcpyToHost op before a CPU custom op and a MemcpyFromHost op after it, so that the tensors are accessible to the op for the duration of the call. No extra effort is required from the custom op developer in this case.

When using CUDA custom ops, ORT's CUDA kernels and the custom CUDA kernels must all use the same CUDA compute stream to stay synchronized. To achieve this, you can first create a CUDA stream and pass it to the underlying Session via SessionOptions (use the OrtCUDAProviderOptions struct). ORT's CUDA kernels will then use that stream, and if the custom CUDA kernels are launched on the same stream, synchronization is handled implicitly.
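The paragraph above describes the C/C++ route. As a minimal sketch of the Python counterpart, the helper below builds a providers list that passes an existing stream to the CUDA EP. It assumes the installed onnxruntime-gpu version accepts the user_compute_stream provider option; the function name cuda_providers_with_stream is hypothetical.

```python
def cuda_providers_with_stream(stream_handle, device_id=0):
    """Build a providers list asking the CUDA EP to run on an existing stream.

    stream_handle is the integer address of a cudaStream_t created elsewhere.
    Hypothetical helper; assumes the CUDA EP accepts "user_compute_stream".
    """
    if stream_handle < 0:
        raise ValueError("stream_handle must be a non-negative integer address")

    cuda_options = {
        "device_id": device_id,
        # The stream address is passed as a string.
        "user_compute_stream": str(stream_handle),
    }
    # The CPU EP is listed second as a fallback for ops the CUDA EP cannot run.
    return [("CUDAExecutionProvider", cuda_options), "CPUExecutionProvider"]
```

With PyTorch, for example, the handle could come from torch.cuda.current_stream().cuda_stream, and the returned list would be passed as the providers argument of InferenceSession. The custom CUDA kernels still have to be launched on that same stream for the implicit synchronization to hold.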

For a sample, see how the aforementioned MyCustomOp is launched and how the Session using this custom op is created.
