
How to develop a mobile application with ONNX Runtime

ONNX Runtime gives you a variety of options to add machine learning to your mobile application. This page outlines the general flow through the development process. You can also check out the tutorials in this section.

ONNX Runtime mobile application development flow

Steps to build for mobile platforms

  1. Which ONNX Runtime mobile library should I use?

    We publish the following ONNX Runtime mobile libraries:

    • Android C/C++
    • Android Java
    • iOS C/C++
    • iOS Objective-C
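
    For example, the Android Java package is published to Maven Central and can be added to an Android project with a Gradle dependency. A minimal sketch using the Gradle Kotlin DSL (the version number is illustrative; use the latest ONNX Runtime release):

    ```kotlin
    // app/build.gradle.kts
    dependencies {
        // Pre-built ONNX Runtime package for Android (Java/Kotlin bindings).
        implementation("com.microsoft.onnxruntime:onnxruntime-android:1.16.0")
    }
    ```
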
  2. Which machine learning model does my application use?

    You need to understand your mobile app’s scenario and get an ONNX model that is appropriate for that scenario. For example, does the app classify images, detect objects in a video stream, summarize or predict text, or perform numerical prediction?

    ONNX models can be obtained from the ONNX Model Zoo, converted from PyTorch or TensorFlow models, or sourced from many other places.

    Once you have sourced or converted the model into ONNX format, a further step is required to optimize it for mobile deployment: convert the model to ORT format, which reduces the model binary size, speeds up initialization, and lowers peak memory usage.
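
    The conversion is done with the onnxruntime Python package, for example by running `python -m onnxruntime.tools.convert_onnx_models_to_ort <path to .onnx model>`, which writes a `.ort` format model alongside the original. The exact options vary between releases, so check the tool's `--help` output.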

  3. How do I bootstrap my app development?

    If you are starting from scratch, bootstrap your mobile application with the standard tooling for your target platform, for example Xcode for iOS or Android Studio for Android. Then:

    a. Add the ONNX Runtime dependency
    b. Consume the onnxruntime API in your application
    c. Add pre- and post-processing appropriate to your application and model
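
    A minimal sketch of steps (b) and (c) in Kotlin, using the ONNX Runtime Java API (package ai.onnxruntime) exposed by the Android package. The model file, input shape and output handling are illustrative placeholders for an image classification model; substitute the values for your own model.

    ```kotlin
    import ai.onnxruntime.OnnxTensor
    import ai.onnxruntime.OrtEnvironment
    import java.nio.FloatBuffer

    // Run a single inference pass. On Android the model bytes would typically be
    // read from the app's assets, e.g. context.assets.open("model.ort").readBytes().
    fun runInference(modelBytes: ByteArray, input: FloatArray): FloatArray {
        val env = OrtEnvironment.getEnvironment()

        // b. Create an inference session from the ORT format model.
        return env.createSession(modelBytes).use { session ->
            // c. Pre-processing: shape the raw input into the layout the model
            //    expects (a single 1x3x224x224 image is used as an example).
            val shape = longArrayOf(1, 3, 224, 224)
            OnnxTensor.createTensor(env, FloatBuffer.wrap(input), shape).use { tensor ->
                val inputName = session.inputNames.first()

                // Run the model and read back the first output tensor.
                session.run(mapOf(inputName to tensor)).use { results ->
                    val scores = (results[0].value as Array<FloatArray>)[0]

                    // c. Post-processing (e.g. softmax / argmax over the scores)
                    //    would go here for a classification model.
                    scores
                }
            }
        }
    }
    ```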

  4. How do I optimize my application?

    The execution environment on mobile devices has limited memory and storage. It is therefore essential that any AI execution library is optimized to consume minimal resources in terms of disk footprint, memory, and network usage, for both the model and the runtime binary.

    ONNX Runtime Mobile uses the ORT model format, which makes it possible to create a custom ONNX Runtime build that minimizes binary size and reduces memory usage for client-side inference. The ORT format model is generated from the regular ONNX model using the onnxruntime Python package. The custom build achieves this primarily by including only the operators and types your models require, and by trimming dependencies to match your needs.
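
    The conversion script described in step 2 can also generate a configuration file listing the operators and types your models require; that file can then be supplied to the ONNX Runtime build script's reduced-operator options (for example `--include_ops_by_config`) when producing a custom build. The exact options vary by release, so check the custom build documentation.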

