Summary
Mjolnir is a Python utility package for performing fault injection on remote hosts. It is a lightweight Python wrapper that uses the VMware Mangle REST APIs as its backend to inject or remediate a fault on a host machine.
Mangle is an open-source product developed by VMware that enables end users to run chaos engineering experiments. It can be used through both a UI and an API (learn more in the Mangle Documentation). However, integrating Mangle into an automation framework or development pipeline is tedious, because end users have to write integration code specific to their framework. With Mjolnir, end users can perform any fault injection in their framework in just 3 steps.
This approach brings the following benefits:
- We can thoroughly test our applications and systems in our automation.
- We can better identify the nature and cause of production failures.
- We can prepare for the unexpected.
This utility is platform independent (tested on Linux and Windows) and is distributed as a Python wheel package that can be installed easily using Python's native pip3 command.
Requirements
- Python 3.6 or above
- The paramiko, requests and urllib3 Python packages must be installed
- A deployed and installed Mangle server (Deployment Guide)
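The Python-side requirements above can be checked before installing the wheel. The following is a minimal sketch using only the standard library; the function name `missing_requirements` is hypothetical and not part of Mjolnir, and it does not check the Mangle server.

```python
import importlib.util
import sys

# Packages and minimum Python version listed in the Requirements section.
REQUIRED_PACKAGES = ("paramiko", "requests", "urllib3")
MIN_PYTHON = (3, 6)


def missing_requirements(packages=REQUIRED_PACKAGES, min_python=MIN_PYTHON):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append("Python %d.%d or above is required" % min_python)
    for name in packages:
        # find_spec returns None when a top-level package cannot be found.
        if importlib.util.find_spec(name) is None:
            problems.append("missing package: " + name)
    return problems


if __name__ == "__main__":
    for problem in missing_requirements():
        print(problem)
```

Running the script inside the target environment prints one line per unmet requirement and nothing when all of them are met.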
Instructions
Installation instructions:
- Make sure the requirements are met.
- Download the Mjolnir Python package (.whl) from flings.vmware.com.
- It is recommended to use a Python virtual environment for the wheel package installation. (Learn more about Python venv here: https://docs.python.org/3.7/library/venv.html?highlight=pyvenv)
- Create a Python virtual environment using the 'venv' module (use Python 3.6 or above)
- Activate the Python virtual environment
# Create virtual environment
$ python3.6 -m venv mjolnir-env
$ ls mjolnir-env/
bin include lib lib64 pyvenv.cfg share
# Activate mjolnir virtual environment
$ source mjolnir-env/bin/activate
# Check pip version in virtual environment
(mjolnir-env) $ pip --version
pip 9.0.1 from /root/mjolnir-env/lib/python3.6/site-packages (python 3.6)
(mjolnir-env) $
- Install the wheel package using Python's native pip (ensure the pip version points to Python 3.6 or above)
# Install Mjolnir wheel package
(mjolnir-env) $ pip3 install vmware_mjolnir-1.0.19649893-py2.py3-none-any.whl
Mjolnir Usage:
- Once the package is installed, follow the steps below to inject and remediate faults.
- Import the mjolnir package
- Configure the Mangle server
- Inject faults on the desired machines
- Remediate the injected faults
- Sample snippet:
import logging
import sys
import time

# import mjolnir package
import vmware.mjolnir as mjolnir

logging.root.setLevel(logging.INFO)
logger = logging.getLogger()
logger.addHandler(logging.StreamHandler(sys.stdout))

logger.info("configuring mangle server")
mjolnir.configure_mangle_server(MANGLE_IP, MANGLE_USERNAME, PASSWORD, PORT)

# Specify host machines on which faults to be injected
machines = [{'ip': HOST_IP, 'username': USERNAME, 'password': PASSWORD, 'ssh_port': PORT}]

logger.info("injecting faults in the above machines")
task_id_lst = mjolnir.inject_generic_fault(machines, fault_type="INFRA", fault_sub_type="CPU", cpuload=60, timeout=120)

# Place holder for your code (that you want to perform when machine has fault)
time.sleep(50)

logger.info("clearing faults based on the taskids returned")
mjolnir.clear_faults(task_id_lst)
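In the sample, the fault is never cleared if the code between injection and remediation raises an exception. One possible way to guard against that is a small context manager. This is only a sketch: `injected_fault` is a hypothetical helper, not part of Mjolnir, and it assumes that the inject callable returns the task ids that the clear callable accepts, as `task_id_lst` in the sample suggests. The stand-in functions exist only so the sketch runs without a Mangle server.

```python
from contextlib import contextmanager


@contextmanager
def injected_fault(inject, clear, machines, **fault_kwargs):
    """Inject a fault on entry and clear it on exit, even if the body raises.

    `inject` and `clear` are callables shaped like the ones in the sample,
    for example mjolnir.inject_generic_fault and mjolnir.clear_faults.
    """
    task_ids = inject(machines, **fault_kwargs)
    try:
        yield task_ids
    finally:
        clear(task_ids)


if __name__ == "__main__":
    # Hypothetical stand-ins for the Mjolnir calls.
    def fake_inject(machines, **kwargs):
        return ["task-%d" % i for i, _ in enumerate(machines)]

    def fake_clear(task_ids):
        print("cleared", task_ids)

    hosts = [{"ip": "192.0.2.10", "username": "root",
              "password": "secret", "ssh_port": 22}]
    with injected_fault(fake_inject, fake_clear, hosts,
                        fault_type="INFRA", fault_sub_type="CPU",
                        cpuload=60, timeout=120) as ids:
        print("fault active:", ids)
```

With the real package, the stand-ins would be replaced by `mjolnir.inject_generic_fault` and `mjolnir.clear_faults`. The wrapper is not useful for fault types that do not support remediation.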
- INFRA Fault types:
- MEMORY Fault
task_id = mjolnir.inject_generic_fault(machines, fault_type="INFRA", fault_sub_type="MEMORY", memoryload=60, timeout=120)
- DISKIO Fault
# iosize: (bytes) To write in blocks of 5 KB to the disk of the specified VSMS, specify the IO size as 5120 (5 KB = 5120 bytes). Max supported value is 5 MB (5 MB = 5120 * 1024 bytes)
# target_dir: specific directory location or partition to write to for simulating the DISK IO
task_id = mjolnir.inject_generic_fault(machines, fault_type="INFRA", fault_sub_type="DISKIO", iosize=5120, target_dir="/config", timeout=120)
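The byte arithmetic for `iosize` can be wrapped in a small helper. This is a sketch only: `kb_to_iosize` is a hypothetical name, and the lower bound of 1 byte is an assumption. Only the 5 MB maximum comes from the comment above.

```python
MAX_IOSIZE_BYTES = 5120 * 1024  # 5 MB, the maximum stated above


def kb_to_iosize(kilobytes):
    """Convert a block size in KB to bytes, rejecting values above 5 MB."""
    size = int(kilobytes * 1024)
    if size <= 0 or size > MAX_IOSIZE_BYTES:
        raise ValueError("iosize must be between 1 byte and 5 MB, got %d bytes" % size)
    return size


print(kb_to_iosize(5))     # 5120
print(kb_to_iosize(5120))  # 5242880
```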
- KILL PROCESS Fault
# pid
task_id = mjolnir.inject_generic_fault(machines, fault_type="INFRA", fault_sub_type="PROCESSKILL", process_id=5161)
# process descriptor
task_id = mjolnir.inject_generic_fault(machines, fault_type="INFRA", fault_sub_type="PROCESSKILL", process_descriptor="com.vmware.nsx.cbm.Main")
- STOP SERVICE Fault
# INFO:: enables graceful shutdown of any process that is running on the specified VSMS using the appropriate stop commands
task_id = mjolnir.inject_generic_fault(machines, fault_type="INFRA", fault_sub_type="STOPSVC", svc_name="corfu-server", timeout=120)
- FILE HANDLER LEAK Fault
# INFO:: enables you to simulate conditions where a program requests a handle to a resource but does not release it when the resource is no longer in use. If this condition persists over an extended period of time, it leads to "Too many open file handles" errors and causes performance degradation or crashes
task_id = mjolnir.inject_generic_fault(machines, fault_type="INFRA", fault_sub_type="FILEHANDLERLEAK", timeout=120)
# NOTE:
# a) Clear Fault (Remediation) is not supported for this fault.
- DISK SPACE Fault
# diskload: 80 to simulate a Disk usage of 80% of the total disk size or space allocated for a partition
# target_dir: specific directory location or partition to write to for simulating the DISK FAULT
task_id = mjolnir.inject_generic_fault(machines, fault_type="INFRA", fault_sub_type="DISKSPACE", diskload=80, target_dir="/config", timeout=120)
- KERNEL PANIC Fault
# INFO:: simulates conditions where the operating system abruptly stops to prevent further damage, security breaches or data corruption
task_id = mjolnir.inject_generic_fault(machines, fault_type="INFRA", fault_sub_type="KERNELPANIC", timeout=120)
# NOTE:
# a) Clear Fault (Remediation) is not supported for this fault.
# b) After injecting the kernel panic fault, the end user needs to power the machine back on manually.
- CLOCK SKEW Fault
# INFO:: simulates conditions where the endpoint time is distorted and doesn't align with the standard NTP time. The skew can be in 'seconds', 'minutes', 'hours' or 'days' as specified at the time of running the fault
# clock_skew_oper:: PAST
task_id = mjolnir.inject_generic_fault(machines, fault_type="INFRA", fault_sub_type="CLOCKSKEW", clock_skew_oper="PAST", seconds=120, minutes=0, hours=0, days=0, timeout=120)
# clock_skew_oper:: FUTURE
task_id = mjolnir.inject_generic_fault(machines, fault_type="INFRA", fault_sub_type="CLOCKSKEW", clock_skew_oper="FUTURE", seconds=60, minutes=0, hours=0, days=0, timeout=120)
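The four skew components can be derived from a single `datetime.timedelta` rather than being filled in by hand. This is a sketch: `clock_skew_kwargs` is a hypothetical helper, not part of Mjolnir. It normalizes the value (120 seconds becomes 2 minutes), whereas the example above passes `seconds=120` directly. Whether Mangle treats the two forms identically is an assumption.

```python
from datetime import timedelta


def clock_skew_kwargs(delta, direction):
    """Split a timedelta into the days/hours/minutes/seconds keywords shown above."""
    if direction not in ("PAST", "FUTURE"):
        raise ValueError("direction must be 'PAST' or 'FUTURE'")
    total = int(delta.total_seconds())
    if total <= 0:
        raise ValueError("skew must be positive")
    days, rest = divmod(total, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return {"clock_skew_oper": direction, "seconds": seconds,
            "minutes": minutes, "hours": hours, "days": days}


print(clock_skew_kwargs(timedelta(hours=1, minutes=30), "PAST"))
```

The returned dictionary could then be unpacked into the call, for example `inject_generic_fault(machines, fault_type="INFRA", fault_sub_type="CLOCKSKEW", timeout=120, **kwargs)`.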
- NETWORK PARTITION Fault
# hosts: host IP or a list of host IPs to which the endpoint should lose network connectivity due to network partition
task_id = mjolnir.inject_fault(topology, [topology.testbed.vsms[0]], fault_type="INFRA", fault_sub_type="NWPARTITION", hosts=[topology.testbed.vsms[1].ip, topology.testbed.vsms[2].ip], timeout=120)
- NETWORK: PACKET DELAY Fault
# latency: (millisecond) simulate a packet delay of 150ms on a particular network interface of VSMS
# nic-name: could be eth0, eth1, br0 etc depending on what adapter you would want to target for the fault
task_id = mjolnir.inject_generic_fault(machines, fault_type="INFRA", fault_sub_type="NETWORK", nw_fault_type='DELAY', latency=150, nicname='eth0', timeout=120)
- NETWORK: PACKET DUPLICATE Fault
# percentage: (%) percentage of packets to duplicate, e.g. 10 to simulate 10 percent packet duplication on a particular network interface of VSM
# nic-name: could be eth0, eth1, br0 etc depending on what adapter you would want to target for the fault
task_id = mjolnir.inject_generic_fault(machines, fault_type="INFRA", fault_sub_type="NETWORK", nw_fault_type='DUPLICATE', percentage=10, nicname='eth0', timeout=120)
- NETWORK: PACKET LOSS Fault
# percentage: (%) percentage of packets to drop, e.g. 10 to simulate 10 percent packet loss on a particular network interface of VSM
# nic-name: could be eth0, eth1, br0 etc depending on what adapter you would want to target for the fault
task_id = mjolnir.inject_generic_fault(machines, fault_type="INFRA", fault_sub_type="NETWORK", nw_fault_type='LOSS', percentage=10, nicname='eth0', timeout=120)
- NETWORK: PACKET CORRUPT Fault
# percentage: (%) percentage of packets to corrupt, e.g. 10 to simulate 10 percent packet corruption on a particular network interface of VSM
# nic-name: could be eth0, eth1, br0 etc depending on what adapter you would want to target for the fault
task_id = mjolnir.inject_generic_fault(machines, fault_type="INFRA", fault_sub_type="NETWORK", nw_fault_type='CORRUPT', percentage=10, nicname='eth0', timeout=120)
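The four NETWORK faults differ only in `nw_fault_type` and in whether they take `latency` or `percentage`, so the keyword arguments can be built in one place. This is a sketch: `network_fault_kwargs` is a hypothetical helper, not part of Mjolnir, and the range checks (positive latency, percentage between 0 and 100) are assumptions rather than documented limits.

```python
# Extra keyword taken by each nw_fault_type, per the examples above.
NETWORK_FAULT_PARAM = {
    "DELAY": "latency",         # milliseconds
    "DUPLICATE": "percentage",
    "LOSS": "percentage",
    "CORRUPT": "percentage",
}


def network_fault_kwargs(nw_fault_type, value, nicname="eth0", timeout=120):
    """Build the keyword arguments for a NETWORK fault of the given type."""
    param = NETWORK_FAULT_PARAM.get(nw_fault_type)
    if param is None:
        raise ValueError("unknown nw_fault_type: %r" % (nw_fault_type,))
    if param == "latency" and value <= 0:
        raise ValueError("latency must be positive (assumed limit)")
    if param == "percentage" and not 0 < value <= 100:
        raise ValueError("percentage must be in (0, 100] (assumed limit)")
    return {
        "fault_type": "INFRA",
        "fault_sub_type": "NETWORK",
        "nw_fault_type": nw_fault_type,
        param: value,
        "nicname": nicname,
        "timeout": timeout,
    }


print(network_fault_kwargs("DELAY", 150))
print(network_fault_kwargs("LOSS", 10, nicname="eth1"))
```

The result can be unpacked into the call, for example `inject_generic_fault(machines, **network_fault_kwargs("LOSS", 10))`.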