Feb 23, 2020

Hi,

I want to run two tests: one within the limits and one beyond them.
The cluster consists of an all-flash vSAN with 6 hosts (each with 2 disk groups of 370 GB cache and 6 TB capacity per disk group).

Can you recommend two test configs I should run, and what results should I look for/compare?

Thanks

Feb 23, 2020

It's difficult to make a recommendation because this depends on what you consider to be the limit. Latency comes to mind, but this could also be something else, like a minimum throughput per VM.

You can start with Easy Run, then run the same test profiles with additional VMs and/or VMDKs per VM, and/or increase the size of the VMDKs, until you reach your target "beyond the limit".

Feb 26, 2020

OK, so I will turn off Easy Run, select 100ws-4k-70rdpct (everything else at default), start increasing the number of VMs, and monitor latency and throughput.

How do I know if the result is good or not? Say I run the test with 96 VMs: what latency and throughput range should I expect to see?

Feb 26, 2020

We can't give specific numbers because there are too many variables that can affect performance.

As a very, very broad rule of thumb for small block workloads, we expect latency under 5 ms and a 95th percentile latency no more than 2-3 times the average latency. If you are seeing 100 ms and/or a 95th percentile latency that is 10-20 times the average, you are either having issues or pushing the system beyond its resources.

Your best approach to understanding your environment is to start with a small workload, incrementally increase the workload(s), and observe the trend for latency, throughput, and IOPS. Keep in mind that throughput, IOPS, and latency are (somewhat) inversely related (i.e. optimizing one comes at the cost of the others) and any system has finite resources (i.e. pushing a system beyond what it is capable of won't yield more, just impact the other stats).

Feb 20, 2020

Hi,

Any chance of allowing post-deployment configuration of the worker VMs? I would like to add an NVMe controller to the worker VMs after deployment, but it looks like even after making the required changes, when you restart the test and choose "Keep VMs", HCIBench deletes them.
Is there any workaround to allow testing on modified worker VMs?

Thanks

Feb 21, 2020

Thanks for the suggestion, but it failed.
Upon inspection, this is what I have in the log file:

0 hosts/
1 resourcePool [Resources]: cpu 396.32/396.32/normal, mem 1336.66/1336.66/normal
networks: Management Network-80f72eee-083e-44ab-9148-3ed6a44e025a = VM Network
xxxxxx.local
Line 64: OVF hardware element 'ResourceType' with instance ID '4': No support for the virtual hardware device type '20'.

What I did:

1 - Deploy the original OVF
2 - Upgrade the VM hardware compatibility of the VM to be able to add an NVMe controller (this is a requirement)
3 - Export the OVF and replace the original files on the destination

I am testing performance with vPMEMDisks and would like to use an NVMe controller rather than the paravirtual SCSI controller.
This is my use case.

Thanks !

Feb 24, 2020

I sorted it out.
1. Deploy the original OVF.
2. Just upgrade the hardware version to version 15+ (6.7 U2 or later).
3. Export the VM and replace perf-photon-hcibench.ovf and disk-0.vmdk in /opt/output/vm-template/.

Below are some code changes that need to be made in HCIBench; find the files at https://github.com/cleeistaken/hcibench_miscellaneous (a shell sketch follows below):
1. Replace the file ~/rvc/lib/rvc/modules/device.rb
2. Replace the file ~/rvc/lib/rvc/modules/vsantest/perf.rb
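
A minimal shell sketch of those two replacements, run on the HCIBench appliance. It assumes git is available on the appliance and that the repo keeps both .rb files at its top level; check the repo layout and adjust the paths if it differs.

# Back up the stock RVC modules before overwriting them
cp ~/rvc/lib/rvc/modules/device.rb ~/rvc/lib/rvc/modules/device.rb.bak
cp ~/rvc/lib/rvc/modules/vsantest/perf.rb ~/rvc/lib/rvc/modules/vsantest/perf.rb.bak
# Fetch the modified files and drop them into place
git clone https://github.com/cleeistaken/hcibench_miscellaneous /tmp/hcibench_misc
cp /tmp/hcibench_misc/device.rb ~/rvc/lib/rvc/modules/device.rb
cp /tmp/hcibench_misc/perf.rb ~/rvc/lib/rvc/modules/vsantest/perf.rb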

Now the deployed guest VMs will have virtual NVMe controllers/disks configured automatically. Please make sure to deploy this only on a cluster that supports VM hardware version 15+ (6.7 U2 or later).

Another thing to keep in mind: after making this change, you need to update the parameter file.

To do so:
1. Use HCIBench to configure a vdbench or fio parameter file.
2. Start testing. The test will not go through, since the vdisk definition in the param file is different: in the param file the vdisks are defined as /dev/sda, /dev/sdb, etc., but your guest VMs will have /dev/nvme0n1, /dev/nvme1n3, or so.
3. Modify the param file generated by HCIBench in /opt/automation/fio-param-files or /opt/automation/vdbench-param-files by replacing /dev/sda, /dev/sdb, etc. with /dev/nvme0n1 and so on (see the sed sketch below).
4. Save the param file and go ahead with testing.
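
A minimal shell sketch of step 3, assuming two data disks that enumerate as /dev/nvme0n1 and /dev/nvme0n2 (run lsblk inside a worker VM to confirm how your namespaces actually enumerate); the param file name here is hypothetical, use whatever HCIBench generated:

cd /opt/automation/vdbench-param-files
# Back up the generated file, then rewrite the SCSI device names to NVMe ones
cp vdb-2vmdk-100ws-4k-70rdpct vdb-2vmdk-100ws-4k-70rdpct.bak
sed -i -e 's|/dev/sda|/dev/nvme0n1|g' -e 's|/dev/sdb|/dev/nvme0n2|g' vdb-2vmdk-100ws-4k-70rdpct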

Feb 26, 2020

Thanks Chen,

I am going to try it out and let you know about the outcome.
Being able to use NVMe controllers is a really important feature, and maybe in the future the tool could offer an option to choose the controller type for the worker VMs.

Feb 21, 2020

I'm thinking rbvmomi (the SDK used for guest VM deployment) may not support virtual NVMe controllers yet. Let me dig into this.
Do you want all the disks created under an NVMe controller rather than PVSCSI?

Mar 11, 2020

Hi Chen,

I have now tested the procedure and followed all the steps.
The VMs are now created with an NVMe controller, but there is an issue: they do not boot.
If I log in to the console I can see an error that says "Boot Failed".
I think it may be related to the NVMe controller on the guest OS disk.
I am going to replace it with the original controller and see if that helps.

Thanks

PS: Please delete the other post - it was posted outside this thread by mistake.

Mar 11, 2020

So the controller for the guest OS has not changed.
The VMs simply do not boot. The upgraded VM (hardware version 15+) can boot, but when I export it and overwrite the original disk-0.vmdk and perf-photon-hcibench.ovf files, the deployed VMs simply cannot see the OS/boot disk.

Feb 10, 2020

Hi, I am trying to use HCIBench on my vSphere 6.7 environment against vSAN. I was able to test with a smaller disk capacity of 10 GB across 5 VMs. When I increased to 10 VMs and a 50 GB disk capacity, validation succeeded without any errors, but when testing performance the cursor below was stuck for almost 24 hours and I had to cancel the process. Is there any limitation on the disk space that can be used?

Feb 10, 2020

Where was it hanging: VMDK prep, testing, or deployment?

Feb 10, 2020

Hi all,

New issue in 2.3.1 that didn't exist in 2.2.1.
It appears the docker internal network can conflict if your management IP is on a 172.17 network.

In my case my management IP is 172.17.1.108/24 (based on our lab networks).
When I attempt to validate the config I get the following error:

2020-02-10 03:58:36 -0800: Validating Vdbench binary and the workload profiles...
2020-02-10 03:58:36 -0800: ------------------------------------------------------------------------------
2020-02-10 03:58:36 -0800: There are interfaces that conflict with the internal docker network:
Interface docker0 (172.17.0.1/16) contains the network on interface eth0 (172.17.1.108/24)
2020-02-10 03:58:36 -0800: --

Logging into the HCIBench appliance and checking the IPs, I can see that the docker interface overlaps with our internal lab networks.

root@photon-HCIBench [ ~ ]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:50:56:81:d9:83 brd ff:ff:ff:ff:ff:ff
inet 172.17.1.108/24 brd 172.17.1.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::250:56ff:fe81:d983/64 scope link
valid_lft forever preferred_lft forever
4: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 02:42:c5:ae:38:7f brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
inet6 fe80::42:c5ff:feae:387f/64 scope link
valid_lft forever preferred_lft forever

This issue does not occur in the previous version of HCIBench (2.2.1).
We may need a drop-down or OVA/OVF property allowing us to change the docker network IP settings.
A lot of lab networks use the 172.17 subnet ranges.

Feb 10, 2020

Here is the same output from HCIBench 2.2.1, which has no issue with validation:

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:50:56:81:1c:98 brd ff:ff:ff:ff:ff:ff
inet 172.17.1.108/24 brd 172.17.1.255 scope global eth0
valid_lft forever preferred_lft forever
4: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 02:42:ac:80:23:6d brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever

So the network appears the same in 2.2.1, but possibly there is a new validation pre-check in 2.3.1 for the docker network that wasn't there in 2.2.1.

There is no overlap between my lab network and the docker network.
The VM network is a 172.31 network range, so they don't overlap either.

Is this pre-check for the docker network needed if you are using a different VM network range altogether?

Feb 10, 2020

The docker network is 172.17.0.0/16, so the range is 172.17.0.1 to 172.17.255.254. The eth0 IP above (172.17.1.108/24) conflicts with the docker IP range. If you look at the routing table there will be overlap, so we flag the (potential) problem.

If you use 172.31.0.0/16 (or any other subnet) for the management network you should be fine, provided you don't use it for the worker network (i.e. eth1).
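
To make the overlap concrete: the kernel installs a connected route for each interface, and the eth0 /24 falls entirely inside the docker0 /16. This is an illustrative sketch reconstructed from the interface listing earlier in the thread, not captured output:

ip route show
# 172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
# 172.17.1.0/24 dev eth0 proto kernel scope link src 172.17.1.108

Every address in 172.17.1.0/24 also matches 172.17.0.0/16, so container addresses handed out by docker can collide with hosts on the management LAN, and that is exactly what the validator flags.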

Feb 10, 2020

In our lab we are all assigned our own 172.17.x/24 subnets, which are routable. The other IP ranges we have are non-routable (172.18.x/24 and 172.19.x/24), so I could give the HCIBench management interface a non-routable IP, but then I couldn't get at the UI.

Is there a setting we can use to ignore that validation check?
Giving the internal docker network 65k+ IP addresses is significant.

We can continue to use version 2.2.1, as it doesn't have the validation checks, but ideally we'd like to keep our testing platforms in sync with VMware best practices.

Feb 10, 2020

Here are the steps to reconfigure the docker subnet:

1. SSH into HCIBench and create the config file: vi /etc/docker/daemon.json
2. Add the following lines to the file and save it (here we change the docker host IP to 192.168.5.1; please make sure you are not using 192.168 as the guest VM IP prefix in that case):
{
"default-address-pools":
[
{"base":"192.168.5.0/24","size":24}
]
}
3. Restart docker: systemctl restart docker
4. Change the data sources in Grafana:
go to http://HCIBench_IP:3000/datasources (login admin/vmware) and edit both the Graphite and InfluxDB data sources, changing the IP from 172.17.0.1 to 192.168.5.1. Save and test each data source, and then you should be good. (A quick verification sketch follows.)
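
A quick check that the change took effect, assuming the 192.168.5.0/24 pool from step 2:

# docker0 should now sit on the new pool instead of 172.17.0.1/16
ip addr show docker0 | grep 'inet '
# and the default bridge network should report the new subnet
docker network inspect bridge -f '{{range .IPAM.Config}}{{.Subnet}}{{end}}'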

Feb 13, 2020

Thanks, changing the docker internal network fixes the issue.
You do need to create the /etc/docker/daemon.json file, as it is not there by default, but once you enter the settings and restart docker you can see in 'ifconfig' that the IP has changed.

This workaround works for me.

Feb 10, 2020

Should this pre-check be on eth1 only? I don't see any need for it to check the docker network against eth0; they are primary static networks.