Virtualized High Performance Computing Toolkit
High Performance Computing (HPC) is the use of parallel-processing techniques to solve complex computational problems. HPC systems deliver sustained performance through the concurrent use of distributed computing resources, and they are typically used for solving advanced scientific and engineering problems, such as computational fluid dynamics, bioinformatics, molecular dynamics, weather modeling, and deep learning with neural networks.
Because of their extreme performance demands, HPC workloads often have much more intensive resource requirements than workloads found in the typical enterprise. For example, HPC commonly leverages hardware accelerators, such as GPUs and FPGAs, for compute, as well as RDMA interconnects, all of which require special vSphere configurations.
This toolkit is intended to facilitate managing the lifecycle of these special configurations by leveraging vSphere APIs. It also includes features that help vSphere administrators perform some common vSphere tasks that are related to creating such high-performing environments, such as VM cloning, setting Latency Sensitivity, and sizing vCPUs, memory, etc.
Feature Highlights:
- Configure PCIe devices in DirectPath I/O mode, such as GPGPU, FPGA and RDMA interconnects
- Configure NVIDIA vGPU
- Configure RDMA SR-IOV (Single Root I/O Virtualization)
- Configure PVRDMA (Paravirtualized RDMA)
- Easy creation and destruction of virtual HPC clusters using cluster configuration files
- Perform common vSphere tasks, such as cloning VMs, configuring vCPUs, memory, reservations, shares, Latency Sensitivity, Distributed Virtual Switch/Standard Virtual Switch, network adapters and network configurations
There are two major functions in this toolkit, described below.
Configuration of vHPC Environments
Using this toolkit, we can easily apply the following operations to a single VM or a list of VMs:
- Configure PCIe devices in DirectPath I/O mode, such as GPU, FPGA and RDMA interconnects
- Configure NVIDIA vGPU
- Configure RDMA SR-IOV (Single Root I/O Virtualization)
- Configure PVRDMA (Paravirtualized RDMA)
- Perform common vSphere tasks, such as cloning VMs, configuring vCPUs, memory, reservations, shares, Latency Sensitivity, Distributed Virtual Switch/Standard Virtual Switch, network adapters and network configurations
For example, cloning four VMs based on a template named vhpc_clone with specified CPU and memory customization and adding NVIDIA vGPU with vGPU profile grid_p100-4q into each VM can be done with two commands:
vhpc_toolkit> clone --template vhpc_clone --datacenter HPC_Datacenter --cluster COMPUTE_GPU_Cluster --datastore COMPUTE01_vsanDatastore --memory 8 --cpu 8 --file VM-file
vhpc_toolkit> vgpu --add --profile grid_p100-4q --file VM-file
where VM-file is the name of a file containing a list of VMs, one per line.
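The VM list file can be written by hand, but for more than a few VMs it may be convenient to generate it. The following is a minimal Python sketch, not part of the toolkit: the helper name write_vm_file and the VM naming scheme (vhpc-vm-01, vhpc-vm-02, ...) are assumptions made for illustration. It only relies on the format stated above, one VM name per line.

```python
from pathlib import Path


def write_vm_file(path, prefix, count):
    """Write `count` VM names to `path`, one per line, and return the names.

    The prefix and zero-padded numbering are a hypothetical naming scheme;
    any names work as long as there is exactly one per line.
    """
    names = [f"{prefix}{i:02d}" for i in range(1, count + 1)]
    Path(path).write_text("\n".join(names) + "\n")
    return names


# Example (commented out so that importing this sketch writes nothing):
# write_vm_file("VM-file", "vhpc-vm-", 4)
```

Calling write_vm_file("VM-file", "vhpc-vm-", 4) would produce a four-line file that can then be passed to the --file option in the two commands above.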
vHPC Cluster Creation and Destruction using a Configuration File
This function helps vSphere administrators create and destroy virtual HPC clusters using a cluster configuration file as input. For example, creating a cluster based on the cluster configuration file cluster.conf:
vhpc_toolkit> cluster --create --file cluster.conf
The cluster configuration file lets you define an HPC/ML cluster whose VMs carry the special attributes described above, such as accelerator, RDMA, and resource settings. For more details, see the project README.md.
Extensibility
The toolkit is also built with extensibility in mind, so operations that are not currently supported can be added easily.
Requirements:
- OS for using this toolkit: Linux or Mac
- vSphere >=6.5
- Python >=3
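The client-side items in this list (operating system and Python version) can be checked before installing the toolkit. The following is a minimal sketch and not part of the toolkit: the function name unmet_requirements is hypothetical, and the vSphere version is not checked because that would require connecting to a vCenter or ESXi host.

```python
import platform
import sys

# platform.system() reports "Darwin" on a Mac.
SUPPORTED_SYSTEMS = ("Linux", "Darwin")
MIN_PYTHON = (3, 0)


def unmet_requirements(system, python_version):
    """Return a list of human-readable problems; empty if both checks pass."""
    problems = []
    if system not in SUPPORTED_SYSTEMS:
        problems.append(f"unsupported OS: {system}")
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append("Python 3 or later is required")
    return problems


# Check the machine this script runs on.
print(unmet_requirements(platform.system(), sys.version_info) or "client requirements met")
```

Passing the system name and version as arguments, rather than reading them inside the function, keeps the check easy to test with values from another machine.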