Virtualized High Performance Computing Toolkit

This is an Open Source fling. Find out about the VMware Open Source initiative at https://www.vmware.com/opensource.html

High Performance Computing (HPC) is the use of parallel-processing techniques to solve complex computational problems. HPC systems deliver sustained performance through the concurrent use of distributed computing resources, and they are typically used to solve advanced scientific and engineering problems, such as computational fluid dynamics, bioinformatics, molecular dynamics, weather modeling and deep learning with neural networks.

Because of their extreme performance demands, HPC workloads often have far more intensive resource requirements than typical enterprise workloads. For example, HPC commonly leverages hardware accelerators, such as GPUs and FPGAs, for compute, as well as RDMA interconnects, which require special vSphere configurations.

This toolkit is intended to facilitate managing the lifecycle of these special configurations by leveraging vSphere APIs. It also includes features that help vSphere administrators perform some common vSphere tasks that are related to creating such high-performing environments, such as VM cloning, setting Latency Sensitivity, and sizing vCPUs, memory, etc.

Feature Highlights:

  • Configure PCIe devices in DirectPath I/O mode, such as GPGPU, FPGA and RDMA interconnects
  • Configure NVIDIA vGPU
  • Configure RDMA SR-IOV (Single Root I/O Virtualization)
  • Configure PVRDMA (Paravirtualized RDMA)
  • Easy creation and destruction of virtual HPC clusters using cluster configuration files
  • Perform common vSphere tasks, such as cloning VMs, configuring vCPUs, memory, reservations, shares, Latency Sensitivity, Distributed Virtual Switch/Standard Virtual Switch, network adapters and network configurations

How Does vHPC Toolkit Work?

There are two major functions in this toolkit, described below.

Configuration of vHPC Environments

Using this toolkit, we can easily apply the following operations to a single VM or a list of VMs:

  • Configure PCIe devices in DirectPath I/O mode, such as GPU, FPGA and RDMA interconnects
  • Configure NVIDIA vGPU
  • Configure RDMA SR-IOV (Single Root I/O Virtualization)
  • Configure PVRDMA (Paravirtualized RDMA)
  • Perform common vSphere tasks, such as cloning VMs, configuring vCPUs, memory, reservations, shares, Latency Sensitivity, Distributed Virtual Switch/Standard Virtual Switch, network adapters and network configurations

For example, two commands are enough to clone four VMs from a template named vhpc_clone with customized CPU and memory settings and then add an NVIDIA vGPU with the vGPU profile grid_p100-4q to each VM:

vhpc_toolkit> clone --template vhpc_clone --datacenter HPC_Datacenter --cluster COMPUTE_GPU_Cluster --datastore COMPUTE01_vsanDatastore --memory 8 --cpu 8 --file VM-file

vhpc_toolkit> vgpu --add --profile grid_p100-4q --file VM-file

where VM-file is the name of a file containing a list of VMs, one per line.
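
Since VM-file is a plain text file with one VM name per line, it can be generated with a few lines of Python. The following is a minimal sketch, not part of the toolkit: the helper name write_vm_file and the naming scheme vhpc-vm-01, vhpc-vm-02, ... are hypothetical and chosen only for illustration.

```python
from pathlib import Path


def write_vm_file(path, prefix, count):
    """Write `count` VM names to `path`, one per line, and return the names.

    The "<prefix>-NN" naming scheme is an assumption made for this sketch;
    any set of VM names works as long as each name is on its own line.
    """
    names = ["{}-{:02d}".format(prefix, i) for i in range(1, count + 1)]
    # One VM name per line, with a trailing newline at the end of the file.
    Path(path).write_text("\n".join(names) + "\n")
    return names
```

Calling write_vm_file("VM-file", "vhpc-vm", 4) would produce a file listing four VM names, which could then be passed to the two commands above through --file.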

vHPC Cluster Creation and Destruction using a Configuration File

This function helps vSphere administrators create and destroy virtual HPC clusters using a cluster configuration file as input. For example, to create a cluster from the cluster configuration file cluster.conf:

vhpc_toolkit> cluster --create --file cluster.conf

The cluster configuration file allows you to define an HPC/ML cluster whose VMs carry the special attributes described above. For more details, see the project README.md.

Extensibility

The toolkit is also built with extensibility in mind. Operations that are not yet supported can be added easily.

Requirements:
  • OS for using this toolkit: Linux or Mac
  • vSphere >=6.5
  • Python >=3
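
The two local requirements listed above (operating system and Python version) can be checked before the toolkit is installed. The following is a minimal sketch under stated assumptions: the function name unmet_local_requirements is hypothetical and not part of the toolkit, and the vSphere version is not checked because it cannot be determined from the local machine alone.

```python
import platform
import sys


def unmet_local_requirements(system=None, version=None):
    """Return a list of unmet local requirements (empty if all are met).

    `system` and `version` default to the values of the running interpreter;
    they are parameters only so the check can be exercised with other values.
    """
    system = platform.system() if system is None else system
    version = tuple(sys.version_info[:2]) if version is None else version
    problems = []
    # platform.system() reports macOS as "Darwin".
    if system not in ("Linux", "Darwin"):
        problems.append("unsupported OS: {}".format(system))
    if version < (3, 0):
        problems.append(
            "Python 3 or later required, found {}.{}".format(version[0], version[1])
        )
    return problems
```

An empty list means the local machine satisfies the operating system and Python requirements; each entry otherwise names one unmet requirement.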