Jul 22, 2021

Hi Chen/Charles (and the other devs)!

First of all, I want to thank you for this GREAT tool. It helps me accurately test the performance of vSAN building blocks in an automated, user-friendly way.
It really shows the power of what Flings can be. This should become an official, fully supported VMware tool and be developed even further!

I am currently doing tests on our new VxRail vSAN building blocks. Since I am also going to do vSAN resilience tests (host failures etc.), I have provisioned quite a lot of data for my tests (around 66% of the datastore's raw capacity is filled).
Because I'm using the Prepare Virtual Disks option with RANDOM (compression is enabled on the cluster), this step took hours to complete. Once it finished, the test ran.

But then on consecutive test runs, it destroys my VMs (and all the hard work of the RANDOM-written data on the datastore :( ) and deploys new ones. I'm unsure why this happens. I have verified that:

- I'm using the exact same HCIBench config as the initial test
- I'm using the exact same Vdbench config as the initial test
- I've tried "Prepare Virtual Disk before Testing" set to both NONE and RANDOM for consecutive tests. Which setting is correct here if you re-use the VMs and already did the RANDOM writes during the initial deployment?

The vm-health-check.log shows:

2021-07-21 23:58:43 +0000: Verifying If Folder Exists...
2021-07-21 23:58:43 +0000: Folder Verified...
2021-07-21 23:58:43 +0000: Moving all vms to the current folder
2021-07-21 23:58:44 +0000: no matches for "temp/*"
2021-07-21 23:58:44 +0000: There are 100 VMs in the Folder, 100 out of 100 will be used
2021-07-21 23:58:45 +0000: no matches for "temp/*"
MoveIntoFolder temp: running
MoveIntoFolder temp: success
2021-07-21 23:58:45 +0000: [ERROR] Not enough proper VMs in vSAN_R1PAFHYP0306
2021-07-21 23:58:45 +0000: [ERROR] Existing VMs not Compatible
2021-07-21 23:58:45 +0000: ABORT: VMs are not existing or Existing VMs are not Compatible
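
Since the abort complains that the existing VMs are "not Compatible", my assumption is that HCIBench compares the existing workers (number and size of data disks, CPU/memory, networks) against the new test config. If it helps, this is the kind of check I can run against one of the existing workers with govc (assuming govc is pointed at the vCenter; the VM name is just a placeholder for my deployment's naming):

  # list the virtual disks of one existing worker to compare against the new config
  govc device.ls -vm '<existing-worker-vm>' | grep -i disk
  # show the resource summary (CPU, memory, host, networks) for the same worker
  govc vm.info -r '<existing-worker-vm>'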

Here is my prevalidation check (data anonymized):

--- XXXXXX.insim.biz ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 58ms
rtt min/avg/max/mdev = 0.460/0.491/0.544/0.039 ms
2021-07-21 11:16:29 +0000: VC IP and Credential Validated
2021-07-21 11:16:29 +0000: Validating Datacenter XX-VCF-XX-XXX...
2021-07-21 11:16:29 +0000: Datacenter XX-VCF-XX-XXX Validated
2021-07-21 11:16:29 +0000: Validating Cluster XXXXXXXXX...
2021-07-21 11:16:29 +0000: Cluster XXXXXXXXX Validated
2021-07-21 11:16:29 +0000: Cluster XXXXXXXXX has DRS mode: fullyAutomated
2021-07-21 11:16:29 +0000: Validating If Any Hosts in Cluster XXXXXXXXX for deployment is in Maintainance Mode...
2021-07-21 11:16:33 +0000: All the Hosts in Cluster XXXXXXXXX are not in Maitainance Mode
2021-07-21 11:16:33 +0000: Validating Network LS-AF-P-S-HCIBench_DI01_Workers...
2021-07-21 11:16:33 +0000: ------------------------------------------------------------------------------
2021-07-21 11:16:33 +0000: Found 4 LS-AF-P-S-HCIBench_DI01_Workers
2021-07-21 11:16:33 +0000: ------------------------------------------------------------------------------

The only "strange" thing I see here is that it finds 4 networks. However that is expected, as this is an NSX-T overlay backed segment which spans multiple VDSes (we have 1 NSX VDS per cluster), so this shouldn't be an issue.

Do you have any suggestions on how to fix this? For my use cases, I really need to re-use my VMs across consecutive tests.

Which logs should I forward to you for further troubleshooting? I hope you can help me out with this one.

P.S. I've also sent you an e-mail so that I can attach the logs.

Thanks and regards!

Viresh

Jul 16, 2021

Toward the end of a test validation I am getting the following error:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xbcae4b]

Jul 12, 2021

Using version 2.6.0, I deployed 10 guest VMs with DHCP enabled; they got the IPs ["192.168.0.2", "192.168.0.3", "192.168.0.4", "192.168.0.5", "192.168.0.6", "192.168.0.7", "192.168.0.8", "192.168.0.9", "192.168.0.10", "192.168.0.11"].

Some VMs respond to ping:
PING 192.168.0.6 (192.168.0.6) 56(84) bytes of data.
64 bytes from 192.168.0.6: icmp_seq=1 ttl=64 time=0.386 ms
64 bytes from 192.168.0.6: icmp_seq=2 ttl=64 time=0.200 ms
64 bytes from 192.168.0.6: icmp_seq=3 ttl=64 time=0.205 ms
64 bytes from 192.168.0.6: icmp_seq=4 ttl=64 time=0.178 ms
64 bytes from 192.168.0.6: icmp_seq=5 ttl=64 time=0.207 ms

--- 192.168.0.6 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 81ms

but some VMs are unreachable:

PING 192.168.0.3 (192.168.0.3) 56(84) bytes of data.
From 192.168.0.1 icmp_seq=1 Destination Host Unreachable
From 192.168.0.1 icmp_seq=2 Destination Host Unreachable
From 192.168.0.1 icmp_seq=3 Destination Host Unreachable
From 192.168.0.1 icmp_seq=4 Destination Host Unreachable
From 192.168.0.1 icmp_seq=5 Destination Host Unreachable

--- 192.168.0.3 ping statistics ---
5 packets transmitted, 0 received, +5 errors, 100% packet loss, time 101ms
pipe 4
Not able to ping VMs ["hci-fio-datastore-180789-0-2"], try another time...
Can't Ping VMs ["hci-fio-datastore-180789-0-2"] by their IPs ["192.168.0.3"]

I tried in a different vCenter several times and always see this issue.
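
In case it helps, this quick sweep from the HCIBench appliance shows which workers answer (plain bash; the IP list is just the one above):

  # ping each assigned worker IP once and report reachability
  for ip in 192.168.0.{2..11}; do
    if ping -c 1 -W 1 "$ip" >/dev/null 2>&1; then
      echo "$ip reachable"
    else
      echo "$ip UNREACHABLE"
    fi
  done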

Jul 12, 2021

Hi Zhong, you can try looking at the user guide for some good pointers on network troubleshooting. Here are my top questions:

- If you are using standard vSwitches, are all the ports tagged with the same/correct VLAN?
- Are all the physical switch trunk ports tagged with the same/correct VLAN?
- Are the unpingable VMs always on the same hosts? If you take a VM with an IP that you can ping and vMotion it to a host with a VM that you can't ping, does it cease to be reachable? (See the sketch after this list.)
- If you are using NSX you can create an overlay backed segment for the workers. It only needs to be L2.
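
If you want to script that vMotion experiment, something along these lines should work with govc (all names are placeholders; note that DRS in fully automated mode may move the VM back afterwards):

  # move a worker that currently answers ping onto a host whose workers don't
  govc vm.migrate -host '/<datacenter>/host/<cluster>/<target-esxi-host>' '<pingable-worker-vm>'
  # then re-test reachability from the HCIBench appliance
  ping -c 3 <that-worker-ip>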

If you still can't locate the issue, send us an email and we can set up a call to help you out.

Jul 14, 2021

Looks like an environment issue; it works in the newly deployed environment.

Jul 15, 2021

Okay, that's great! Reach out if you need anything else.

Jul 13, 2021

The environment has been re-installed; I'll retry and send logs. I used a VDS; the unpingable VMs were on the same host, but some VMs on that host are pingable.

Jul 12, 2021

Could you check whether the VMs that HCIBench can ping are on the same ESXi host as HCIBench?
This is usually caused by the ESXi hosts not being able to talk to each other over the VLAN you specified.
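
One quick way to map workers to hosts is with govc (assuming it is available and the workers use the default hci-fio-* naming shown above):

  # print the name and current ESXi host of every fio worker VM
  govc find / -type m -name 'hci-fio-*' | while read -r vm; do
    govc vm.info "$vm" | grep -E 'Name:|Host:'
  done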

Jul 13, 2021

The environment has been re-installed; I'll retry and send logs. The hosts can ping each other over the VLAN.

Jul 06, 2021

Hi,

First of all, thanks for this Fling, it's very useful and user-friendly!

I used version 2.6.0 in easy mode with FIO, and looking at the graphs I can see that each run has a duration of 1h30m.
If I understood the FIO config correctly, we have 30 minutes of ramp-up and then 1 hour of benchmark with logging. Looking at the FIO manual, it states that ramp_time is: "fio will run the specified workload for this amount of time before logging any performance numbers."

What I don't understand, then, is why latencies increase a lot once the ramp_time is over. If it's just the logging of performance numbers, it shouldn't have a big impact on the benchmark, should it?

Thanks in advance for your input

Tristan

Jul 07, 2021

fio doesn't report latency during the ramp-up; it only reports it during the testing phase.
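
For illustration, the split looks roughly like this in plain fio terms (a minimal sketch, not the exact job HCIBench generates; the workload parameters are placeholders):

  # 30 min of warm-up that is executed but not logged, then 60 min that is measured
  fio --name=ramp-demo --ioengine=libaio --direct=1 \
      --rw=randrw --bs=4k --iodepth=8 --numjobs=4 --size=10G \
      --time_based --ramp_time=1800 --runtime=3600

With time_based set, the 1800 s ramp runs in addition to the 3600 s measured window, which matches the 1h30m you see per run; latency and IOPS are only collected during the last 3600 s.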