Summary
Storage Simulator Using Cellular Automata is loosely based on the principles of cellular automata (CA) to model the performance characteristics of the data path in a vSAN cluster. In general, CA can be used to model and study any complex system in which a number of elements operate in parallel with short-range relationships and, as a whole, exhibit emergent behavior. When simulating a storage stack, we are modelling the transmission of data blocks across a network of hardware resources communicating with each other through various interconnects. These include processors, caches, DRAM, SSDs, HDDs, PCIe links, Ethernet links, etc.
When modelling an IO request such as a read or write, the vSAN software stack applies various functions as the data block moves through this network. These functions include data replication, parity calculation, checksum, encryption, compression, etc. Some of these can lead to IO amplification.
This Fling implements a standalone vSAN simulation utility that helps developers determine the ideal speed-of-light (SOL) performance of a given cluster. It can serve as a starting point to rapidly iterate on ideas and features by making small changes to the simulator and quantifying their potential performance impact. It can also be used by customers and partners to identify potential bottlenecks in their deployments under various types of workloads.
Requirements
The simulator depends on the following packages:
- Python 3.7 or higher
- Graphviz 2.4 or higher
Instructions
Installation
After downloading the package, run the following commands:
- unzip Cell-Sim-download.zip
- cd cell-simulator
Usage
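A typical invocation looks like the following (the positional argument shown here is an assumption based on this description; run the script with ‘--help’ to see the actual syntax and the supported platforms):

    python3 cell_sim.py superMicro-af8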
This runs the simulator for the specified platform, ‘superMicro-af8’, for a pre-determined duration on a pre-determined workload. The results of the run are captured in graphical form in a PDF file. The output below is from a 2-node Supermicro all-flash cluster running 100% random reads. The textual output provides cluster-level stats and aggregated stats for each hardware resource involved in the data path. The graphical layout shows each resource in the data path and its relationship to its neighbors via interconnects.
Supermicro AF-8:2U: servers = 2, cache_disks = 4, cap_disks = 12, R/W mix 100/0 @ 4096
2.747469824 GB/s (2.747469824 R / 0.000000000 W)
670768.999999133 IOPS (670768.999999133 R / 0.000000000 W)

    Resource                      IOPS    GB/s   Busy %   Max Busy %
    SERVER                           0    0.00     0.00         0.00
    HUB 12x10GbE                     0    0.00     0.00         0.00
    CPU 24xXeon@2.3            1006130    5.59     4.14         5.74
    CACHE-DISK                       0    0.00     0.00         0.00
    CAP-DISK                    670769    2.75    66.54       100.00
    DRAM 4xDDR4-2666           1520178    6.23     1.82         2.16
    CPU -> DRAM 4xDDR4-2666    1520178    6.23     1.82         2.16
    CPU -> CPU 2xUPI-10.4GT     492416    2.20     2.65         4.33
    CPU -> SSD 4xPCIe3               0    0.00     0.00         0.00
    SSD -> CPU 4xPCIe3               0    0.00     0.00         0.00
    NIC 1x10GbE                      0    0.00     0.00         0.00
    CPU -> NIC 8xPCIe3          335361    1.46     5.17         6.40
    NIC -> CPU 8xPCIe3          335361    1.46     5.17         6.96
    NIC -> HUB 1x10GbE          335361    1.46    29.59        36.64
    HUB -> NIC 1x10GbE          335361    1.46    29.59        39.81
    IOCTRL 8xSAS3-12Gbs              0    0.00     0.00         0.00
    CPU -> IOCTRL 8xPCIe3            0    0.04     0.39         0.39
    IOCTRL -> CPU 8xPCIe3       670769    2.75    19.35        19.38
    EXP 24xSAS3-12Gbs                0    0.00     0.00         0.00
    IOCTRL -> EXP 8xSAS3-12Gbs       0    0.04     0.18         0.18
    EXP -> IOCTRL 8xSAS3-12Gbs  670769    2.75    11.45        11.47
    EXP -> DISK 1xSAS3-12Gbs         0    0.04     0.24         0.36
    DISK -> EXP 1xSAS3-12Gbs    670769    2.75    15.26        22.94
Custom Runs
Input arguments to the simulator, including platform type, number of servers in the cluster, type of CPU and SSD, workload type, duration of the simulation, etc., are specified via json files. These files can be modified, and additional files can be added, to test different configurations. For example, after the initial installation, we see the following directory layout:
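(The listing below is a sketch reconstructed from the files and directories named in this description; the actual package may contain additional component directories, e.g. for CPUs, NICs, and disks.)

    cell-simulator/
        cell_sim.py
        system/
            superMicro-af8.json
            superMicro-af8e.json
        server/
            sys-2029u-e2crt.json
        iops.json
        out.json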
Each directory contains one or more json file(s) describing a customizable property of the simulator. For example, the system directory contains two files, superMicro-af8.json and superMicro-af8e.json, which describe the SuperMicro All-Flash models from the vSAN ReadyNode portal. These are the two platforms that show up in the ‘--help’ menu of the simulator. Looking inside one of these files, we see several key-value pairs, each describing some part of that platform. In the above case, the physical specification of each server is described by server:sys-2029u-e2crt.json.
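For illustration, that key-value pair would appear in superMicro-af8.json roughly as follows (other keys describing the platform are omitted here because their names are not given in this description):

    {
        "server": "sys-2029u-e2crt.json"
    }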
Looking further into the server directory, the sys-2029u-e2crt.json file describes the server specification. Some aspects of the server, including the CPU, NIC, etc., are further specified in other files, where the key specifies the name of the directory and the value specifies the name of the json file containing the specification. The simulation can be run on other platform types by adding json files describing the physical specification of those platforms.
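For example, an entry in sys-2029u-e2crt.json following that key/value convention might look like this (the filenames below are hypothetical; only the directory-key/file-value convention is described above):

    {
        "cpu": "xeon-2.3ghz.json",
        "nic": "10gbe.json"
    }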
Just like the physical specification, the workload characteristics, IO pattern, and output format are specified in separate json files. For example, the vanilla installation contains parameters describing a simulation run against a RAID1 object with two mirrors (hft:1) and checksum verification enabled (crc:1). There are 32 objects involved in the simulation, and the simulation is run for a duration of ‘1’ sec. Note that the duration is not wall-clock time: time is measured in terms of the amount of data processed by each resource and the time it takes to process that data. For example, if an SSD is specified at 550MB/sec for sequential reads and the simulation needs to run for 1 sec, roughly 550MB worth of read operations would be simulated, assuming the SSD is the slowest resource in the cluster. Other supported object types include RAID5 and RAID6.
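A plausible sketch of these object parameters (the hft and crc keys are named above; the other key names are assumptions):

    {
        "hft": 1,
        "crc": 1,
        "objects": 32,
        "duration": 1
    }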
Apart from the object type, the workload type is specified in the iops.json file; in this case we are running 100% 4KB random reads. The out.json file specifies the format of the output file. Other supported formats include: svg, png, jpeg, json, ps.
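Hypothetical contents consistent with this description (all key names here are assumptions):

    iops.json:
    {
        "read_pct": 100,
        "random_pct": 100,
        "block_size": 4096
    }

    out.json:
    {
        "format": "pdf"
    }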
Any changes to the data path, including feature additions or alternate ways of connecting the hardware resources, require changes to the simulator itself. Most of the simulator logic resides in cell_sim.py.
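As a rough conceptual illustration of the CA-style model described in the Summary (this is a minimal sketch, not the Fling's actual implementation in cell_sim.py), each resource can be thought of as a cell that moves a bounded amount of data per simulated tick and hands blocks to its downstream neighbor:

    # Conceptual sketch of a CA-style storage data path; NOT the Fling's
    # actual code. Each resource is a cell with a per-tick byte budget
    # that forwards completed blocks to the next cell in the path.
    from collections import deque

    class Resource:
        def __init__(self, name, bytes_per_tick):
            self.name = name
            self.bytes_per_tick = bytes_per_tick  # bandwidth per tick
            self.queue = deque()                  # blocks waiting here
            self.downstream = None                # next cell in the path
            self.bytes_moved = 0

        def tick(self):
            # Move as many queued blocks as this tick's budget allows.
            budget = self.bytes_per_tick
            while self.queue and self.queue[0] <= budget:
                block = self.queue.popleft()
                budget -= block
                self.bytes_moved += block
                if self.downstream:
                    self.downstream.queue.append(block)

    # Build a toy path NIC -> CPU -> SSD with illustrative budgets.
    nic = Resource("NIC", 1_250_000)
    cpu = Resource("CPU", 6_000_000)
    ssd = Resource("SSD", 550_000)
    nic.downstream, cpu.downstream = cpu, ssd

    # Inject 256 4KB read requests and run the automaton for 1000 ticks.
    for _ in range(256):
        nic.queue.append(4096)
    for _ in range(1000):
        for cell in (ssd, cpu, nic):  # update sinks first so blocks drain
            cell.tick()

    for cell in (nic, cpu, ssd):
        print(cell.name, cell.bytes_moved, "bytes moved")

In this toy model the slowest cell on the path bounds end-to-end throughput, which is the same intuition behind the simulated duration described above.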