Supernova - Accelerating Machine Learning Inference
As machine learning becomes widely used in enterprises and models are trained on big data, inference services go to production either in the cloud or on the edge.
On the edge
- Edge devices have limited resources, space and power supply
- Edge servers cost much more than edge devices
- Hardware accelerators on the edge are heterogeneous in architecture and vary in interface and performance
In the cloud
- The accelerator market is dominated by Nvidia GPUs
- Alternatives include AMD GPUs, Intel Habana Goya, Intel/Altera FPGAs, AWS Inferentia, Xilinx FPGAs, etc.
- No common inference interface spans cloud and edge
- Tying a deployment to specific hardware accelerators or a specific cloud creates new vendor lock-in
Project Supernova aims to build a common machine learning inference service framework by enabling machine learning inference accelerators across edge endpoint devices, edge systems, and clouds, with or without hardware accelerators.
- Microservice-based architecture with a RESTful API
- Support for heterogeneous system architectures from leading vendors
- Support for accelerator compilers that compile models to native code
- Neutral to ML training framework file formats
- Runs on both edge devices and in the cloud
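A minimal sketch of what a client request against such a RESTful inference microservice might look like. The endpoint path, port, and JSON field names here are illustrative assumptions, not Supernova's actual API.

```python
import json
from urllib import request

# Hypothetical endpoint for a Supernova-style inference microservice;
# the host, port, path, and model name are assumptions for illustration.
INFER_URL = "http://localhost:8080/v1/models/resnet50:predict"

def build_inference_request(inputs, params=None):
    """Assemble a JSON inference payload (illustrative schema)."""
    body = {"instances": inputs}
    if params:
        body["parameters"] = params
    return json.dumps(body).encode("utf-8")

def infer(url, payload):
    """POST the payload and decode the JSON response."""
    req = request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    payload = build_inference_request([[0.1, 0.2, 0.3]])
    # result = infer(INFER_URL, payload)  # requires a running service
```

Because the service speaks plain HTTP/JSON, the same client code works whether the backend happens to be a CPU, a GPU, or a VPU.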
- Hardware CPU support:
- x86-64, ARM64
- Hardware accelerator support:
- Intel VPU, Google Edge TPU, Nvidia GPU
- Inference toolkit support: OpenVINO, TensorRT & TensorFlow Lite
- Training framework data formats: TensorFlow, Caffe, ONNX, MXNet
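The support matrix above implies a dispatch step: pick an inference toolkit based on the accelerator detected at runtime. A sketch of that idea follows; the mapping and function names are assumptions for illustration, not Supernova's actual dispatch logic.

```python
# Illustrative accelerator-to-toolkit mapping, mirroring the support
# matrix above; keys and the fallback choice are assumptions.
BACKENDS = {
    "intel_vpu": "OpenVINO",
    "x86_cpu": "OpenVINO",
    "nvidia_gpu": "TensorRT",
    "edge_tpu": "TensorFlow Lite",
    "arm_cpu": "TensorFlow Lite",
}

def select_backend(accelerator: str) -> str:
    """Pick an inference toolkit for the detected hardware,
    falling back to a CPU-friendly path when the device is unknown."""
    return BACKENDS.get(accelerator, "TensorFlow Lite")
```

Keeping this choice behind one function is what lets the same REST front end serve heterogeneous hardware without the client knowing which accelerator answered.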
Supernova targets common computing platforms, from resource-constrained edge systems to PCs and servers, anywhere Linux/Docker can be deployed.