ONNX Runtime C# API
ONNX Runtime provides a C# .NET binding for running inference on ONNX models on any of the .NET Standard platforms.
Contents
- Supported Versions
- Builds
- API Reference
- Reuse input/output tensor buffers
- Running on GPU (Optional)
- Samples
Supported Versions
.NET Standard 1.1
Builds
| Artifact | Description | Supported Platforms |
|---|---|---|
| Microsoft.ML.OnnxRuntime | CPU (Release) | Windows, Linux, Mac, X64, X86 (Windows-only), ARM64 (Windows-only)…more details: compatibility |
| Microsoft.ML.OnnxRuntime.Gpu | GPU - CUDA (Release) | Windows, Linux, Mac, X64…more details: compatibility |
| Microsoft.ML.OnnxRuntime.DirectML | GPU - DirectML (Release) | Windows 10 1709+ |
| ort-nightly | CPU, GPU (Dev) | Same as Release versions |
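For example, the CPU package can be added to a .NET project with the .NET CLI (substitute the artifact name from the table above for the GPU or DirectML builds):

dotnet add package Microsoft.ML.OnnxRuntime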
API Reference
Reuse input/output tensor buffers
In some scenarios, you may want to reuse input/output tensors. This often happens when you want to chain two models (i.e., feed the output of one as input to another) or when you want to speed up repeated inference runs.
Chaining: Feed model A’s output(s) as input(s) to model B
InferenceSession session1, session2;  // let's say 2 sessions are initialized
Tensor<float> t1;                     // let's say data is fed into the Tensor objects
var inputs1 = new List<NamedOnnxValue>()
{
    NamedOnnxValue.CreateFromTensor<float>("name1", t1)
};

// session1 inference
using (var outputs1 = session1.Run(inputs1))
{
    // get intermediate value
    var input2 = outputs1.First();

    // modify the name of the ONNX value
    input2.Name = "name2";

    // create input list for session2
    var inputs2 = new List<NamedOnnxValue>() { input2 };

    // session2 inference
    using (var results = session2.Run(inputs2))
    {
        // manipulate the results
    }
}
Multiple inference runs with fixed sized input(s) and output(s)
If the model has fixed-sized inputs and outputs of numeric tensors, you can use “FixedBufferOnnxValue” to speed up inference. With “FixedBufferOnnxValue”, the container objects only need to be allocated/disposed once across multiple InferenceSession.Run() calls. This avoids some overhead, which may be beneficial for smaller models where this time is noticeable in the overall run time.
An example can be found at TestReusingFixedBufferOnnxValueNonStringTypeMultiInferences(); a simplified sketch of the same pattern is shown below.
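The following is a minimal sketch of the pattern (the model path, tensor names, and shapes are placeholder assumptions; refer to the linked test for a complete, authoritative example):

int runCount = 100; // number of repeated inference runs

// Allocate the underlying buffers once; the names and shapes below are
// placeholders for the actual model's inputs/outputs.
var inputTensor = new DenseTensor<float>(new[] { 1, 3, 224, 224 });
var outputTensor = new DenseTensor<float>(new[] { 1, 1000 });

using (var session = new InferenceSession("model.onnx"))
using (var fixedInput = FixedBufferOnnxValue.CreateFromTensor(inputTensor))
using (var fixedOutput = FixedBufferOnnxValue.CreateFromTensor(outputTensor))
{
    for (int i = 0; i < runCount; i++)
    {
        // refresh inputTensor's buffer in place for each run...
        session.Run(new[] { "input_name" }, new[] { fixedInput },
                    new[] { "output_name" }, new[] { fixedOutput });
        // ...the results of this run are written into outputTensor's buffer
    }
}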
Running on GPU (Optional)
If using the GPU package, simply use the appropriate SessionOptions when creating an InferenceSession.
int gpuDeviceId = 0; // The GPU device ID to execute on
var session = new InferenceSession("model.onnx", SessionOptions.MakeSessionOptionWithCudaProvider(gpuDeviceId));
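Both SessionOptions and InferenceSession wrap native resources, so in application code you would typically dispose them; a sketch of the same GPU setup with using blocks:

int gpuDeviceId = 0; // The GPU device ID to execute on
using (var cudaSessionOptions = SessionOptions.MakeSessionOptionWithCudaProvider(gpuDeviceId))
using (var session = new InferenceSession("model.onnx", cudaSessionOptions))
{
    // run inference as usual; CUDA-eligible operators execute on the selected GPU
}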