Faster and Lighter Model Inference with ONNX Runtime from Cloud to Client

Created by Mudabir Qamar Ansari, Modified on Sun, 13 Dec, 2020 at 12:44 PM by Mudabir Qamar Ansari

ONNX Runtime is a high-performance inference and training engine for machine learning models. It has been widely adopted across Microsoft products, including Bing, Office 365, and Azure Cognitive Services, achieving an average 2.9x inference speedup. ONNX Runtime quantization and ONNX Runtime Mobile now extend this further, accelerating model inference while shrinking both model size and runtime footprint. ONNX Runtime continues to evolve not only for cloud-based inference but also for on-device inference.


For more tips like this, check out the working remotely playlist at www.youtube.com/FoetronAcademy. 


Also, if you need any further assistance then you can raise a support ticket (https://cloud.foetron.com/) and get it addressed.
