
NUMA Observer

version 1.0 — August 19, 2021


Release Date: July 27, 2021

Summary

VMs that run SAP HANA, other large databases, and critical applications have large resource footprints and often have strict latency requirements. For such VMs, setting core and NUMA node affinities is recommended when low latency is important.

While admins may configure large critical VMs with affinities to unique logical cores or NUMA nodes, maintenance and HA events can change this unique mapping. An HA event migrates VMs to other hosts with spare capacity, and those hosts may already be running VMs affined to the same cores or sockets. Multiple VMs are then constrained/scheduled to the same set of logical cores. These overlapping affinities may result in CPU contention and/or non-local memory allocation.
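The overlap condition described above can be illustrated with a minimal Python sketch. It is not the Fling's implementation: the record layout (`name`, `host`, `cores`) and the function name `find_overlaps` are hypothetical, and the sketch only assumes that each VM's affinity is known as a set of logical core IDs on a given host.

```python
from itertools import combinations


def find_overlaps(vms):
    """Return (host, vm_a, vm_b, shared_cores) for VMs on the same host
    whose affinity sets intersect.

    Each entry of `vms` is a dict with hypothetical keys:
    'name' (str), 'host' (str), 'cores' (set of logical core IDs).
    """
    # Group VMs by host, since affinities only collide on the same host.
    by_host = {}
    for vm in vms:
        by_host.setdefault(vm["host"], []).append(vm)

    overlaps = []
    for host, group in by_host.items():
        # Compare every pair of VMs on this host.
        for a, b in combinations(group, 2):
            shared = a["cores"] & b["cores"]
            if shared:
                overlaps.append((host, a["name"], b["name"], sorted(shared)))
    return overlaps


# Illustrative inventory: hana-2 landed on esx-a after a failover and
# shares cores 8-15 with hana-1.
example_vms = [
    {"name": "hana-1", "host": "esx-a", "cores": set(range(0, 16))},
    {"name": "hana-2", "host": "esx-a", "cores": set(range(8, 24))},
    {"name": "hana-3", "host": "esx-b", "cores": set(range(0, 16))},
]

for host, vm_a, vm_b, shared in find_overlaps(example_vms):
    print(f"{host}: {vm_a} and {vm_b} share cores {shared}")
```

hana-3 uses the same core IDs as hana-1 but runs on a different host, so it is not reported.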

The NUMA Observer Fling scans your VM inventory, identifies VMs with overlapping core/NUMA affinities, and generates alerts. It also collects statistics on remote memory usage and CPU starvation of critical VMs and raises alerts on them.





Requirements
  • Java 8
  • ESXi
  • vCenter
Instructions
Running the tool: 
* Everything is packaged in a single jar; launch it with java -jar name_of_the_jar.
* Once launched, the tool's UI can be opened in a browser at localhost:8443.

Note:
* When connecting from NUMA Observer to VC/ESX instances that present a self-signed certificate, use the "-sslNoVerify" command line argument.
* Run java -jar NUMA-Observer-version.jar -h for usage instructions.
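For scripted launches, the command line above can be assembled programmatically. The following Python sketch is only an illustration: the helper name `build_command` is hypothetical, and the only arguments it uses are the ones mentioned in the notes above (-sslNoVerify and -h).

```python
def build_command(jar_path, ssl_no_verify=False, show_help=False):
    """Build the argument list for launching the jar.

    Only the two arguments named on this page are supported here.
    """
    cmd = ["java", "-jar", jar_path]
    if ssl_no_verify:
        # Needed when the VC/ESX instance presents a self-signed certificate.
        cmd.append("-sslNoVerify")
    if show_help:
        cmd.append("-h")
    return cmd


# Print the command; pass the list to subprocess.run(cmd) to actually launch it.
print(" ".join(build_command("NUMA-Observer-version.jar", ssl_no_verify=True)))
```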

Configuration: 
The tool can be configured from the UI or through a config.json file placed in the same directory as the jar.

The following sample config.json is provided:

{
  "runMode": "VC", 
  "credentials": [
    {
      "ip": "xxx",
      "pwd": "xxx",
      "user": "xxxx"
    }
  ],
  "numThreads": 64,
  "minMonsterVmMemGB": 1,
  "maxRemoteMemGB": 1,
  "maxRemoteMemPercent": 1,
  "maxCpuReadyTime": 100,

  "definedMonsterVmTags": {
    "HANA" : true
  },

  "customMonsterVmTagList": [
    {
      "label": "Cassandra",
      "matchCriteria": {
        "vmNamePattern": "cas-[0-9]+-*"
      }
    },
    {
      "label": "Oracle",
      "matchCriteria": {
        "vmNamePattern": "orc-*"
      }
    },
    {
      "label": "HANA_Simulated",
      "matchCriteria": {
        "vmNamePattern": "[Hh]ana.*"
      }
    },
    {
      "label": "HANA_Template",
      "matchCriteria": {
        "vmNamePattern": "HanaTemplate"
      }
    }
  ]
}
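Each customMonsterVmTagList entry in the sample pairs a label with a vmNamePattern. This page does not say how the tool evaluates the pattern, so the Python sketch below rests on two assumptions: the patterns are regular expressions, and they are matched from the start of the VM name. The helper name `labels_for` is hypothetical.

```python
import re

# Copied from customMonsterVmTagList in the sample config.json.
CUSTOM_TAGS = [
    {"label": "Cassandra", "matchCriteria": {"vmNamePattern": "cas-[0-9]+-*"}},
    {"label": "Oracle", "matchCriteria": {"vmNamePattern": "orc-*"}},
    {"label": "HANA_Simulated", "matchCriteria": {"vmNamePattern": "[Hh]ana.*"}},
    {"label": "HANA_Template", "matchCriteria": {"vmNamePattern": "HanaTemplate"}},
]


def labels_for(vm_name, tag_list=CUSTOM_TAGS):
    """Return every label whose pattern matches the start of vm_name."""
    labels = []
    for tag in tag_list:
        pattern = tag["matchCriteria"]["vmNamePattern"]
        # Assumption: prefix match. re.match anchors at the start only.
        if re.match(pattern, vm_name):
            labels.append(tag["label"])
    return labels


for name in ["cas-12-node", "orc-prod", "HanaTemplate", "web-01"]:
    print(name, labels_for(name))
```

Under these assumptions a VM named HanaTemplate receives both HANA labels. If the tool requires the whole name to match instead, a pattern such as "orc-*" would only match "orc" followed by hyphens, so the result depends on which semantics the tool actually uses.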


* runMode - Run mode; either VC or ESX
* credentials - List of credentials (one entry for VC mode, one or more for ESX mode)
* numThreads - Number of concurrent threads the tool uses
* maxRemoteMemGB - NUMA remote memory usage threshold for generating alerts (GB)
* maxRemoteMemPercent - NUMA remote memory usage threshold for generating alerts (%)
* maxCpuReadyTime - Average CPU ready time threshold for generating alerts (ms)
* minMonsterVmMemGB - Minimum configured memory for a VM to be considered a HANA VM (GB)
* definedMonsterVmTags - Pre-defined tags (do not modify this)
* customMonsterVmTagList - List of custom defined tags
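The threshold settings can be read as simple comparisons against per-VM statistics. The Python sketch below uses the key names from the sample config.json; everything else is an assumption. The statistics fields (`mem_gb`, `remote_mem_gb`, `cpu_ready_ms`) are hypothetical, the remote memory percentage is taken relative to the VM's configured memory, VMs below the minimum memory size are skipped, and an alert is raised when any single threshold is exceeded. The tool's actual logic may differ.

```python
def check_vm(stats, config):
    """Return the names of the config thresholds this VM exceeds."""
    mem_gb = stats["mem_gb"]
    # Assumption: VMs smaller than minMonsterVmMemGB are not monitored.
    if mem_gb <= 0 or mem_gb < config["minMonsterVmMemGB"]:
        return []

    exceeded = []
    if stats["remote_mem_gb"] > config["maxRemoteMemGB"]:
        exceeded.append("maxRemoteMemGB")

    # Assumption: percentage is remote memory over configured memory.
    remote_pct = 100.0 * stats["remote_mem_gb"] / mem_gb
    if remote_pct > config["maxRemoteMemPercent"]:
        exceeded.append("maxRemoteMemPercent")

    if stats["cpu_ready_ms"] > config["maxCpuReadyTime"]:
        exceeded.append("maxCpuReadyTime")
    return exceeded


# Threshold values taken from the sample config.json.
sample_config = {
    "minMonsterVmMemGB": 1,
    "maxRemoteMemGB": 1,
    "maxRemoteMemPercent": 1,
    "maxCpuReadyTime": 100,
}

# Illustrative statistics for one VM: 8 GB of its 512 GB is remote.
vm_stats = {"mem_gb": 512, "remote_mem_gb": 8, "cpu_ready_ms": 40}
print(check_vm(vm_stats, sample_config))
```

With these illustrative numbers, 8 GB of remote memory exceeds both the 1 GB and the 1% thresholds, while the 40 ms ready time stays under the 100 ms limit.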