NUMA Observer

version 1.0 — August 19, 2021

Release Date: July 27, 2021

Summary

VMs running SAP HANA, other large databases, and other critical applications have large resource footprints and often have strict latency requirements. For such large, latency-sensitive VMs, setting core and NUMA node affinities is recommended.

Although admins may configure large critical VMs with affinities to unique logical cores or NUMA nodes, maintenance and HA events can break this one-to-one mapping. For example, an HA event moves VMs to other hosts with spare capacity, and those hosts may already be running VMs affined to the same cores or sockets. As a result, multiple VMs end up constrained to, and scheduled on, the same set of logical cores. These overlapping affinities can cause CPU contention, non-local (remote) memory allocation, or both.

The NUMA Observer Fling scans your VM inventory, identifies VMs with overlapping core or NUMA affinities, and generates alerts. The Fling also collects statistics on remote memory usage and CPU starvation of critical VMs and raises alerts when they exceed the configured thresholds.
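
At its core, detecting overlapping affinities is a pairwise set intersection per host. The Java sketch below is not the Fling's implementation; it assumes each VM's affinity is already available as a set of logical core IDs, and the VmAffinity class, findOverlaps method, and VM names are hypothetical. The same comparison works unchanged for sets of NUMA node IDs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class AffinityOverlap {

    /** Hypothetical model of one VM on a host and the logical cores it is affined to. */
    static final class VmAffinity {
        final String name;
        final Set<Integer> cores;

        VmAffinity(String name, Integer... cores) {
            this.name = name;
            this.cores = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(cores)));
        }
    }

    /**
     * Compares every pair of VMs on one host and reports the pairs whose core
     * affinity sets intersect. VMs without an affinity (empty set) are skipped.
     */
    static List<String> findOverlaps(List<VmAffinity> vms) {
        List<String> overlaps = new ArrayList<>();
        for (int i = 0; i < vms.size(); i++) {
            for (int j = i + 1; j < vms.size(); j++) {
                VmAffinity a = vms.get(i);
                VmAffinity b = vms.get(j);
                if (a.cores.isEmpty() || b.cores.isEmpty()) {
                    continue;
                }
                // TreeSet keeps the shared core IDs sorted for a stable message.
                Set<Integer> shared = new TreeSet<>(a.cores);
                shared.retainAll(b.cores);
                if (!shared.isEmpty()) {
                    overlaps.add(a.name + " and " + b.name + " share cores " + shared);
                }
            }
        }
        return overlaps;
    }

    public static void main(String[] args) {
        // Hypothetical host after an HA event: hana-b was placed on a host where
        // hana-a is already affined to cores 0-3.
        List<VmAffinity> host = Arrays.asList(
                new VmAffinity("hana-a", 0, 1, 2, 3),
                new VmAffinity("hana-b", 2, 3, 4, 5),
                new VmAffinity("app-c", 8, 9),
                new VmAffinity("web-d")); // no affinity configured
        for (String overlap : findOverlaps(host)) {
            System.out.println(overlap);
        }
    }
}
```

A simple all-pairs loop is used because a single host runs relatively few VMs; grouping by host first keeps each comparison small.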

Requirements
  • Java 8
  • ESXi
  • vCenter

Instructions

Running the tool: Everything is packaged in a single jar; launch it with java -jar <jar name>. Once the tool is running, open its UI by pointing a browser to localhost:8443.

Configuration: The tool can be configured from the UI or with a config.properties file placed in the same directory as the jar. The configuration parameters are:

  • runMode - VC or ESX
  • ip1 - VC or ESX host IP address
  • user1 - VC or ESX username
  • pass1 - VC or ESX password
  • ip2, user2, pass2, and so on - additional credential sets; in ESX mode, multiple hosts can be listed this way
  • numThreads - concurrency of the tool (number of threads)
  • maxRemoteMemGB - remote NUMA memory usage threshold for generating alerts (GB)
  • maxRemoteMemPercent - remote NUMA memory usage threshold for generating alerts (%)
  • maxCpuReadyTime - average CPU ready time threshold for generating alerts (ms)
  • minHanaMemGB - configured-memory threshold used to decide whether a VM is a HANA VM (GB)
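
To show how these parameters fit together, here is a minimal Java sketch, not the Fling's own code. It assumes config.properties uses standard java.util.Properties key=value syntax and that credential sets are numbered consecutively from 1. It also assumes that only VMs whose configured memory is at least minHanaMemGB are checked, and that the remote-memory percentage is relative to the VM's configured memory. The NumaObserverConfig class, the alertsFor method, and all example values are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class NumaObserverConfig {

    /** One ipN/userN/passN set: a vCenter (VC mode) or an ESXi host (ESX mode). */
    static final class Credential {
        final String ip;
        final String user;
        final String pass;

        Credential(String ip, String user, String pass) {
            this.ip = ip;
            this.user = user;
            this.pass = pass;
        }
    }

    final String runMode;
    final List<Credential> credentials = new ArrayList<>();
    final int numThreads;
    final double maxRemoteMemGB;
    final double maxRemoteMemPercent;
    final double maxCpuReadyTime;
    final double minHanaMemGB;

    NumaObserverConfig(Properties p) {
        runMode = required(p, "runMode");
        // Collect ip1/user1/pass1, ip2/user2/pass2, ... until the first missing ipN.
        for (int i = 1; p.getProperty("ip" + i) != null; i++) {
            credentials.add(new Credential(required(p, "ip" + i),
                    required(p, "user" + i), required(p, "pass" + i)));
        }
        numThreads = Integer.parseInt(required(p, "numThreads"));
        maxRemoteMemGB = Double.parseDouble(required(p, "maxRemoteMemGB"));
        maxRemoteMemPercent = Double.parseDouble(required(p, "maxRemoteMemPercent"));
        maxCpuReadyTime = Double.parseDouble(required(p, "maxCpuReadyTime"));
        minHanaMemGB = Double.parseDouble(required(p, "minHanaMemGB"));
    }

    private static String required(Properties p, String key) {
        String value = p.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException("missing property: " + key);
        }
        return value.trim();
    }

    /** Assumption: only HANA-sized VMs are checked; percent is of configured memory. */
    List<String> alertsFor(String vm, double configuredMemGB, double remoteMemGB, double avgCpuReadyMs) {
        List<String> alerts = new ArrayList<>();
        if (configuredMemGB < minHanaMemGB) {
            return alerts; // below minHanaMemGB: not treated as a HANA VM
        }
        double remotePercent = 100.0 * remoteMemGB / configuredMemGB;
        if (remoteMemGB > maxRemoteMemGB) {
            alerts.add(vm + ": remote memory " + remoteMemGB + " GB > " + maxRemoteMemGB + " GB");
        }
        if (remotePercent > maxRemoteMemPercent) {
            alerts.add(vm + ": remote memory " + remotePercent + "% > " + maxRemoteMemPercent + "%");
        }
        if (avgCpuReadyMs > maxCpuReadyTime) {
            alerts.add(vm + ": CPU ready " + avgCpuReadyMs + " ms > " + maxCpuReadyTime + " ms");
        }
        return alerts;
    }

    public static void main(String[] args) throws IOException {
        // Example values only; a real run would read config.properties from disk.
        String example = String.join("\n",
                "runMode=ESX",
                "ip1=192.0.2.11", "user1=root", "pass1=changeme1",
                "ip2=192.0.2.12", "user2=root", "pass2=changeme2",
                "numThreads=4", "maxRemoteMemGB=8", "maxRemoteMemPercent=10",
                "maxCpuReadyTime=500", "minHanaMemGB=128");
        Properties p = new Properties();
        p.load(new StringReader(example));
        NumaObserverConfig cfg = new NumaObserverConfig(p);
        System.out.println(cfg.runMode + " mode, " + cfg.credentials.size() + " host(s)");
        for (String alert : cfg.alertsFor("hana-a", 256, 20, 300)) {
            System.out.println(alert);
        }
    }
}
```

In this sketch a VM can trip the absolute (GB) and relative (%) remote-memory thresholds independently, which is one plausible reading of why both parameters exist.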