ONNX Runtime C# API
ONNX Runtime provides a C# .NET binding for running inference on ONNX models on any of the .NET Standard platforms.
Contents
- Supported Versions
- Builds
- API Reference
- Reuse input/output tensor buffers
- Running on GPU (Optional)
- Samples
Supported Versions
.NET Standard 1.1
Builds
| Artifact | Description | Supported Platforms |
|---|---|---|
| Microsoft.ML.OnnxRuntime | CPU (Release) | Windows, Linux, Mac, X64, X86 (Windows-only), ARM64 (Windows-only)…more details: compatibility |
| Microsoft.ML.OnnxRuntime.Gpu | GPU - CUDA (Release) | Windows, Linux, Mac, X64…more details: compatibility |
| Microsoft.ML.OnnxRuntime.DirectML | GPU - DirectML (Release) | Windows 10 1709+ |
| ort-nightly | CPU, GPU (Dev) | Same as Release versions |
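For example, the CPU package can be added to a .NET project with the .NET CLI (substitute the artifact name from the table above for the GPU or DirectML builds):

dotnet add package Microsoft.ML.OnnxRuntime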
API Reference
Reuse input/output tensor buffers
In some scenarios, you may want to reuse input/output tensors. This often happens when you want to chain two models (i.e., feed the output of one as input to another) or when you want to speed up repeated inference runs.
Chaining: Feed model A’s output(s) as input(s) to model B
InferenceSession session1, session2;  // let's say 2 sessions are initialized
Tensor<float> t1;                     // let's say data is fed into the Tensor objects
var inputs1 = new List<NamedOnnxValue>()
{
    NamedOnnxValue.CreateFromTensor<float>("name1", t1)
};

// session1 inference
using (var outputs1 = session1.Run(inputs1))
{
    // get intermediate value
    var input2 = outputs1.First();

    // modify the name of the ONNX value
    input2.Name = "name2";

    // create input list for session2
    var inputs2 = new List<NamedOnnxValue>() { input2 };

    // session2 inference
    using (var results = session2.Run(inputs2))
    {
        // manipulate the results
    }
}
Multiple inference runs with fixed sized input(s) and output(s)
If the model has fixed-sized inputs and outputs of numeric tensors, you can use “FixedBufferOnnxValue” to speed up inference. With “FixedBufferOnnxValue”, the container objects only need to be allocated/disposed once across multiple InferenceSession.Run() calls. This avoids some overhead, which may be beneficial for smaller models where this time is noticeable in the overall run time.
An example can be found at TestReusingFixedBufferOnnxValueNonStringTypeMultiInferences(); a simplified sketch of the same pattern is shown below.
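The following is a minimal sketch of the pattern (the model path, tensor names, and shapes are placeholder assumptions; refer to the linked test for a complete, authoritative example):

int runCount = 100; // number of repeated inference runs

// Allocate the underlying buffers once; the names and shapes below are
// placeholders for the actual model's inputs/outputs.
var inputTensor = new DenseTensor<float>(new[] { 1, 3, 224, 224 });
var outputTensor = new DenseTensor<float>(new[] { 1, 1000 });

using (var session = new InferenceSession("model.onnx"))
using (var fixedInput = FixedBufferOnnxValue.CreateFromTensor(inputTensor))
using (var fixedOutput = FixedBufferOnnxValue.CreateFromTensor(outputTensor))
{
    for (int i = 0; i < runCount; i++)
    {
        // refresh inputTensor's buffer in place for each run...
        session.Run(new[] { "input_name" }, new[] { fixedInput },
                    new[] { "output_name" }, new[] { fixedOutput });
        // ...the results of this run are written into outputTensor's buffer
    }
}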
Running on GPU (Optional)
If using the GPU package, simply use the appropriate SessionOptions when creating an InferenceSession.
int gpuDeviceId = 0; // The GPU device ID to execute on
var session = new InferenceSession("model.onnx", SessionOptions.MakeSessionOptionWithCudaProvider(gpuDeviceId));
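Both SessionOptions and InferenceSession wrap native resources, so in application code you would typically dispose them; a sketch of the same GPU setup with using blocks:

int gpuDeviceId = 0; // The GPU device ID to execute on
using (var cudaSessionOptions = SessionOptions.MakeSessionOptionWithCudaProvider(gpuDeviceId))
using (var session = new InferenceSession("model.onnx", cudaSessionOptions))
{
    // run inference as usual; CUDA-eligible operators execute on the selected GPU
}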