Mjolnir : Automation Library for VMware Mangle

version 1.0 — April 27, 2022

Summary

Mjolnir is a Python utility package for performing fault injections on remote hosts. It is a lightweight Python wrapper that calls the VMware Mangle REST APIs in the backend to inject or remediate a fault on a host machine.


Mangle is an open-source product developed by VMware that enables end users to run chaos engineering experiments. It can be used through both its UI and its API (learn more in the Mangle Documentation). However, integrating Mangle into an automation framework or development pipeline is tedious, because end users have to write integration code specific to their framework. With Mjolnir, end users can perform any fault injection in their framework in just 3 steps.


This approach has the following benefits:
  1. We can thoroughly test our applications and systems in our automation.
  2. We can better identify the nature and cause of production failures.
  3. We can prepare for the unexpected.

This utility is platform independent (tested on Linux and Windows) and is distributed as a Python wheel package, which can be installed with Python's native pip3 command.

Requirements

  • Python 3.6 or above
  • The paramiko, requests and urllib3 Python packages installed
  • A deployed and installed Mangle server (Deployment Guide)
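
The Python-side requirements above can be checked before installing the wheel. The following is a minimal sketch, not part of Mjolnir: `missing_requirements` is a hypothetical helper name, and it only verifies the interpreter version and that the three packages are importable. It does not check that a Mangle server is reachable.

```python
import importlib.util
import sys

# Values taken from the requirements list above.
REQUIRED_PACKAGES = ("paramiko", "requests", "urllib3")
MIN_PYTHON = (3, 6)


def missing_requirements(packages=REQUIRED_PACKAGES, min_python=MIN_PYTHON):
    """Return a list of human-readable problems; an empty list means all checks passed.

    Hypothetical helper for illustration only; Mjolnir does not ship it.
    """
    problems = []
    # sys.version_info compares element-wise against a plain tuple.
    if sys.version_info < min_python:
        problems.append("Python {}.{} or above is required".format(*min_python))
    for name in packages:
        # find_spec returns None when a top-level package cannot be found.
        if importlib.util.find_spec(name) is None:
            problems.append("missing package: " + name)
    return problems
```

Calling `missing_requirements()` inside the target environment and printing the returned list shows what still has to be installed with pip.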

Instructions

Installation instructions:

  • Make sure the requirements are met.
  • Download the Mjolnir Python package (.whl) from flings.vmware.com.
  • It's recommended to use a Python virtual environment for the wheel package installation. (Learn more about Python venv here: https://docs.python.org/3.7/library/venv.html?highlight=pyvenv)
    1. Create a Python virtual environment using the 'venv' module (use Python 3.6 or above)
    2. Activate the Python virtual environment
    3. Check the pip version in the virtual environment
      # Create virtual environment
      $ python3.6 -m venv mjolnir-env
      $ ls mjolnir-env/
      bin  include  lib  lib64  pyvenv.cfg  share
      
      # Activate mjolnir virtual environment
      $ source mjolnir-env/bin/activate
      
      # Check pip version in virtual environment
      (mjolnir-env) $ pip --version
      pip 9.0.1 from /root/mjolnir-env/lib/python3.6/site-packages (python 3.6)
      (mjolnir-env) $
                  
                


  • Install the wheel package using Python's native pip (make sure the pip version points to Python 3.6 or above)
        
        # Install Mjolnir wheel package
        (mjolnir-env) $ pip3 install vmware_mjolnir-1.0.19649893-py2.py3-none-any.whl
        
    



Mjolnir Usage:

  • Once the package is installed, follow the steps below to inject and remediate faults.
    1. Import the mjolnir package
    2. Configure the Mangle server
    3. Inject faults on the desired machines
    4. Remediate the injected faults

  • Sample snippet:
        
        import logging
        import sys
        import time
        # import mjolnir package
        import vmware.mjolnir as mjolnir

        # Upper-case names (MANGLE_IP, MANGLE_USERNAME, PASSWORD, PORT, HOST_IP, USERNAME) are placeholders for your own values
        logging.root.setLevel(logging.INFO)
        logger = logging.getLogger()
        logger.addHandler(logging.StreamHandler(sys.stdout))
        logger.info("configuring mangle server")
        mjolnir.configure_mangle_server(MANGLE_IP, MANGLE_USERNAME, PASSWORD, PORT)
        # Specify host machines on which faults are to be injected
        machines = [{'ip': HOST_IP, 'username': USERNAME, 'password': PASSWORD, 'ssh_port': PORT}]
        logger.info("injecting faults in the above machines")
        task_id_lst = mjolnir.inject_generic_fault(machines, fault_type="INFRA", fault_sub_type="CPU", cpuload=60, timeout=120)
        # Placeholder for your code (that you want to run while the machine has the fault)
        time.sleep(50)
        logger.info("clearing faults based on the task ids returned")
        mjolnir.clear_faults(task_id_lst)
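
In the sample snippet above, the fault is cleared only if the code between injection and remediation finishes without raising. One way to make the cleanup unconditional is a small context manager. This is a sketch under assumptions: `injected_fault` is a hypothetical helper, not part of Mjolnir, and it takes the inject and clear callables as arguments so that it does not depend on any Mjolnir behavior beyond what the snippet shows (inject returns task ids, clear accepts them).

```python
import contextlib


@contextlib.contextmanager
def injected_fault(inject, clear, *args, **kwargs):
    """Inject a fault, hand the task ids to the caller, and always clear afterwards.

    inject and clear are passed in by the caller, for example
    mjolnir.inject_generic_fault and mjolnir.clear_faults.
    """
    task_ids = inject(*args, **kwargs)
    try:
        yield task_ids
    finally:
        # Runs even when the body of the with-block raises.
        clear(task_ids)


# Intended usage (requires Mjolnir and a configured Mangle server):
#
# with injected_fault(mjolnir.inject_generic_fault, mjolnir.clear_faults,
#                     machines, fault_type="INFRA", fault_sub_type="CPU",
#                     cpuload=60, timeout=120) as task_ids:
#     ...  # code to run while the machine has the fault
```

This pattern only makes sense for faults that support remediation. The fault list on this page notes that Clear Fault is not supported for the FILE HANDLER LEAK and KERNEL PANIC faults.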

  • INFRA Fault types:

    1. MEMORY Fault
                      
      task_id = mjolnir.inject_generic_fault(machines, fault_type="INFRA", fault_sub_type="MEMORY", memoryload=60, timeout=120)
                      
                  
    2. DISKIO Fault
                      
      # iosize: (bytes) To write in blocks of 5 KB to the disk of the specified VSMS, specify the IO size as 5120 (5 KB = 5120 bytes).
      #         The maximum supported value is 5 MB (5 MB = 5120 * 1024 bytes).
      # target_dir: specific directory location or partition to write to for simulating the DISK IO
      
      task_id = mjolnir.inject_generic_fault(machines, fault_type="INFRA", fault_sub_type="DISKIO", iosize=5120, target_dir="/config", timeout=120)
                      
                  
    3. KILL PROCESS Fault
                      
      # pid
      task_id = mjolnir.inject_generic_fault(machines, fault_type="INFRA", fault_sub_type="PROCESSKILL", process_id=5161)
      
      # process descriptor
      task_id = mjolnir.inject_generic_fault(machines, fault_type="INFRA", fault_sub_type="PROCESSKILL", process_descriptor="com.vmware.nsx.cbm.Main")
                      
                  
    4. STOP SERVICE Fault
                      
      # INFO:: enables graceful shutdown of any process that is running on the specified VSMS using the appropriate stop commands
      task_id = mjolnir.inject_generic_fault(machines, fault_type="INFRA", fault_sub_type="STOPSVC", svc_name="corfu-server", timeout=120)
                      
                  
    5. FILE HANDLER LEAK Fault
                      
      # INFO:: enables you to simulate conditions where a program requests a handle to a resource but does not release it when the resource is no longer in use.
      #        This condition, if left over extended periods of time, will lead to "Too many open file handles" errors and will cause performance degradation or crashes.
      
      task_id = mjolnir.inject_generic_fault(machines, fault_type="INFRA", fault_sub_type="FILEHANDLERLEAK", timeout=120)
      
      # NOTE:
      # a) Clear Fault (Remediation) is not supported for this fault.
                      
                  
    6. DISK SPACE Fault
                      
      # diskload:   80 to simulate a Disk usage of 80% of the total disk size or space allocated for a partition
      # target_dir: specific directory location or partition to write to for simulating the DISK FAULT
      
      task_id = mjolnir.inject_generic_fault(machines, fault_type="INFRA", fault_sub_type="DISKSPACE", diskload=80, target_dir="/config", timeout=120)
                      
                  
    7. KERNEL PANIC Fault
                      
      # INFO:: simulates conditions where the operating system abruptly stops to prevent further damage, security breaches or data corruption
      
      task_id = mjolnir.inject_generic_fault(machines, fault_type="INFRA", fault_sub_type="KERNELPANIC", timeout=120)
      
      # NOTE:
      # a) Clear Fault (Remediation) is not supported for this fault.
      # b) After injecting the kernel panic fault, the end user has to power the machine back on manually.
                      
                  
    8. CLOCK SKEW Fault
                      
      # INFO:: simulates conditions where the endpoint time is distorted and doesn't align with the standard NTP time.
      #        The skew can be in 'seconds', 'minutes', 'hours' or 'days' as specified at the time of running the fault.
      
      # clock_skew_oper:: PAST
      task_id = mjolnir.inject_generic_fault(machines, fault_type="INFRA", fault_sub_type="CLOCKSKEW", clock_skew_oper="PAST", seconds=120, minutes=0, hours=0, days=0, timeout=120)
      
      # clock_skew_oper:: FUTURE
      task_id = mjolnir.inject_generic_fault(machines, fault_type="INFRA", fault_sub_type="CLOCKSKEW", clock_skew_oper="FUTURE", seconds=60, minutes=0, hours=0, days=0, timeout=120)
                      
                  
    9. NETWORK PARTITION Fault
                      
      # hosts: host IP or a list of host IPs to which the endpoint should lose network connectivity due to network partition
      task_id = mjolnir.inject_fault(topology, [topology.testbed.vsms[0]], fault_type="INFRA", fault_sub_type="NWPARTITION", hosts=[topology.testbed.vsms[1].ip, topology.testbed.vsms[2].ip], timeout=120)
                      
                  
    10. NETWORK: PACKET DELAY Fault
                      
      # latency: (milliseconds) e.g. 150 to simulate a packet delay of 150 ms on a particular network interface of VSMS
      # nicname: could be eth0, eth1, br0 etc. depending on which adapter you want to target for the fault
      
      task_id = mjolnir.inject_generic_fault(machines, fault_type="INFRA", fault_sub_type="NETWORK", nw_fault_type='DELAY', latency=150, nicname='eth0', timeout=120)
                      
                  
    11. NETWORK: PACKET DUPLICATE Fault
                      
      # percentage: (%) value to specify what percentage of the packets should be duplicated
      #             e.g. 10 to simulate a packet duplication of 10 percent on a particular network interface of VSM
      # nicname: could be eth0, eth1, br0 etc. depending on which adapter you want to target for the fault
      
      task_id = mjolnir.inject_generic_fault(machines, fault_type="INFRA", fault_sub_type="NETWORK", nw_fault_type='DUPLICATE', percentage=10, nicname='eth0', timeout=120)
                      
                  
    12. NETWORK: PACKET LOSS Fault
                      
      # percentage: (%) value to specify what percentage of the packets should be dropped
      #             e.g. 10 to simulate a packet drop of 10 percent on a particular network interface of VSM
      # nicname: could be eth0, eth1, br0 etc. depending on which adapter you want to target for the fault
      
      task_id = mjolnir.inject_generic_fault(machines, fault_type="INFRA", fault_sub_type="NETWORK", nw_fault_type='LOSS', percentage=10, nicname='eth0', timeout=120)
                      
                  
    13. NETWORK: PACKET CORRUPT Fault
                      
      # percentage: (%) value to specify what percentage of the packets should be corrupted
      #             e.g. 10 to simulate a packet corruption of 10 percent on a particular network interface of VSM
      # nicname: could be eth0, eth1, br0 etc. depending on which adapter you want to target for the fault
      
      task_id = mjolnir.inject_generic_fault(machines, fault_type="INFRA", fault_sub_type="NETWORK", nw_fault_type='CORRUPT', percentage=10, nicname='eth0', timeout=120)
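
Several of the faults listed above take numeric parameters with stated units or limits: `iosize` is in bytes with a 5 MB maximum, and the four NETWORK faults take either `latency` (DELAY) or `percentage` (DUPLICATE, LOSS, CORRUPT). The sketch below shows one way to build these arguments before calling `inject_generic_fault`. Both helper names are hypothetical and not part of Mjolnir, and the 0 to 100 range check on `percentage` is an assumption rather than a documented limit.

```python
# 5 MB, the maximum iosize stated in the DISKIO fault description.
MAX_IOSIZE_BYTES = 5120 * 1024


def kb_to_iosize(kilobytes):
    """Convert a block size in KB to the byte count expected by iosize."""
    size = kilobytes * 1024
    if not 0 < size <= MAX_IOSIZE_BYTES:
        raise ValueError("iosize must be between 1 and {} bytes".format(MAX_IOSIZE_BYTES))
    return size


def network_fault_kwargs(nw_fault_type, nicname, value, timeout=120):
    """Build keyword arguments for one of the NETWORK faults (hypothetical helper)."""
    if nw_fault_type == "DELAY":
        extra = {"latency": value}  # milliseconds
    elif nw_fault_type in ("DUPLICATE", "LOSS", "CORRUPT"):
        # Assumption: a percentage is only meaningful between 0 and 100.
        if not 0 <= value <= 100:
            raise ValueError("percentage must be between 0 and 100")
        extra = {"percentage": value}
    else:
        raise ValueError("unknown nw_fault_type: {}".format(nw_fault_type))
    kwargs = {
        "fault_type": "INFRA",
        "fault_sub_type": "NETWORK",
        "nw_fault_type": nw_fault_type,
        "nicname": nicname,
        "timeout": timeout,
    }
    kwargs.update(extra)
    return kwargs


# Intended usage (requires Mjolnir and a configured Mangle server):
# task_id = mjolnir.inject_generic_fault(machines, **network_fault_kwargs("LOSS", "eth0", 10))
```

Keeping the unit conversion and range checks in one place means an invalid value fails locally, before a request is sent to the Mangle server.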
                      
                  

Changelog

Changed instruction 2 image
