Nov 18, 2020

Hi - I am running HCIBench and ran an Easy Run test. The configuration validation passed and the test completed successfully, but when I check the results they are incomplete and I am unable to see the full output. Below is what I see in the result file. Can you help with this issue?

Datastore = XXXXX
Resource Usage:
CPU USAGE = 31.99%
RAM USAGE = 8.62%
vSAN PCPU USAGE = 9.45%

Nov 16, 2020

Hi guys,
I have one question: why do I get "Can't find any vSAN Datastores for test"?
My host nodes are connected to vSAN.

The message is as follows
2020-11-16 19:10:05 +0000: Network VLAN-100 (IT-TEST) is accessible from all the hosts of IDC-Cluster
2020-11-16 19:10:05 +0000: Validating Type of Network VLAN-100 (IT-TEST)...
2020-11-16 19:10:06 +0000: Network VLAN-100 (IT-TEST) Type is DistributedVirtualPortgroup
2020-11-16 19:10:07 +0000: Datastore vsanDatastore Validated
2020-11-16 19:10:07 +0000: Checking Datastore vsanDatastore type...
2020-11-16 19:10:07 +0000: Datastore vsanDatastore type is vsan
2020-11-16 19:10:07 +0000: Getting Datastore vsanDatastore id...
2020-11-16 19:10:09 +0000: Checking If Datastore vsanDatastore is accessible from any of the hosts of IDC-Cluster...
2020-11-16 19:10:10 +0000: Datastore vsanDatastore is accessible from host 10.99.254.182
2020-11-16 19:10:10 +0000: Datastore vsanDatastore is accessible from host 10.99.254.185
2020-11-16 19:10:10 +0000: Datastore vsanDatastore is accessible from host 10.99.254.186
2020-11-16 19:10:10 +0000: Datastore vsanDatastore is accessible from host 10.99.254.187
2020-11-16 19:10:10 +0000: Datastore vsanDatastore is accessible from host 10.99.254.181
2020-11-16 19:10:10 +0000: Datastore vsanDatastore is accessible from host 10.99.254.183
2020-11-16 19:10:10 +0000: Datastore vsanDatastore is accessible from host 10.99.254.184
2020-11-16 19:10:11 +0000: ------------------------------------------------------------------------------
2020-11-16 19:10:11 +0000: Can't find any vSAN Datastores for test
2020-11-16 19:10:11 +0000: ------------------------------------------------------------------------------

Nov 16, 2020

Could you SSH into HCIBench and try the following steps:
1. irb
2. load '/opt/automation/lib/rvc-util.rb'
3. _get_vsandatastore_in_cluster

Please paste the output here for further investigation.

Nov 16, 2020

root@photon-HCIBench [ ~ ]# irb
irb(main):001:0> load '/opt/automation/lib/rvc-util.rb'
=> true
irb(main):002:0> _get_vsandatastore_in_cluster
=> {}
irb(main):003:0>

Nov 16, 2020

One more thing you can try, after SSHing into HCIBench:
1. rvc VC_IP
2. cd into /VC_IP/DATACENTER_NAME/computers/CLUSTER_NAME and run
3. vsantest.perf.find_vsan_datastore . # don't miss the dot
4. paste me the output

Nov 16, 2020

Hi Chen Wei:

The following is the output:

/10.99.254.161/Datacenter-TF-F5/computers/IDC-Cluster> vsantest.perf.find_vsan_datastore .
NoMethodError: undefined method `delete' for nil:NilClass
/opt/vmware/rvc/lib/rvc/modules/vsantest/perf.rb:988:in `block in find_vsan_datastore'
/opt/vmware/rvc/lib/rvc/modules/vsantest/perf.rb:986:in `each'
/opt/vmware/rvc/lib/rvc/modules/vsantest/perf.rb:986:in `find_vsan_datastore'
/opt/vmware/rvc/lib/rvc/command.rb:42:in `invoke'
/opt/vmware/rvc/lib/rvc/shell.rb:129:in `eval_command'
/opt/vmware/rvc/lib/rvc/shell.rb:73:in `eval_input'
/opt/vmware/rvc/bin/rvc:185:in `<main>'
/10.99.254.161/Datacenter-TF-F5/computers/IDC-Cluster>

Nov 17, 2020

Hmmm, it seems like it's not reading containerId.
What VC/ESXi version are you testing?
Also, could you insert this line:
puts x.info.props
at line 988 of /root/rvc/lib/rvc/modules/vsantest/perf.rb and run the rvc command above again?

Nov 17, 2020

Hi Chen Wei:

Regarding "what VC/ESXi version are you testing?": the version is 6.5.0.

Is the following correct? At line 988 of /root/rvc/lib/rvc/modules/vsantest/perf.rb I now have:
ds_container_id = x.info.props

/10.99.254.161/Datacenter-TF-F5/computers/IDC-Cluster> vsantest.perf.find_vsan_datastore .
{"vsanDatastore"=>{"capacity"=>"21908", "freeSpace"=>"21093", "local"=>false}}

When I start a test and export xxx.xxx-fio.xls, the file shows only column headers like 'Sheet Number', 'Throughput(MB)', 'Read Latency(ms)', etc., but there are no values.

Thank you

Nov 17, 2020

No, you should insert
puts x.info.props
as a line of its own there. Could you try again? Also, please send an email to vsanperformance@vmware.com; I'd like to schedule a Zoom call with you to help.
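In other words, the difference is between assigning the value and printing it (Ruby):

# What was added at line 988 (assigns the properties to a variable, so nothing gets printed):
ds_container_id = x.info.props
# What should be added instead, as a standalone line above the existing code (prints the datastore properties for debugging):
puts x.info.props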

Nov 16, 2020

What version of HCIBench are you using?

Nov 16, 2020

The version is 2.5.1.

Is there anything I should confirm?
Thanks for your assistance.

Nov 10, 2020

Hello Team,

I am able to use HCIBench fine; however, I notice that when the VMs get deployed, they come with SCSI controllers (SCSI controller 0 with LSI Logic Parallel, and SCSI controllers 1 and 2 with VMware Paravirtual). I want an NVMe controller to be added when I deploy the VMs. I am using an all-flash vSAN, and I want this so that I don't have the protocol overhead of translating SCSI to NVMe.

Reading through some comments, it seems like it is possible to get this to work, but with some Ruby code replacements and some other steps. Unfortunately, I couldn't attempt this myself because the links for the replacement files no longer work.

If I could get some help so that when I deploy the VMs, they have NVMe controllers, that would be very helpful. Please let me know if you need additional information from me. Thanks and hope to hear from someone soon.

Nov 10, 2020

Could you tell me a bit more about why you would need a vNVMe controller for the guest VMs? From a perf perspective, it would not make any difference on vSAN compared to pvscsi.

Nov 10, 2020

From my understanding, when I create my all-flash vSAN, all it guarantees is that it is using an all-SSD setup as opposed to an HDD and SSD mix. However, it does not tell me whether the devices inside the guest show up as "sd" devices (sda, sdb, etc.) vs. "nvme" devices (nvme1n1, nvme2n1, etc.).

Since I know that my test setup has all NVMe SSDs, and not non-NVMe SSDs, there has to be a SCSI-to-NVMe translation layer somewhere in the stack for the underlying NVMe drives to carry out what the benchmark (fio / vdbench) asks of them. If my VM is always going through a SCSI controller, this translation layer adds time spent purely on translation, as opposed to directly generating and executing NVMe commands. I believe this makes a difference in the number of IOPS my cluster can achieve.

https://www.nvmexpress.org/wp-content/uploads/NVM-Express-SCSI-Translation-Reference-1_1-Gold.pdf

Section 1-1 of the Introduction of this PDF from the nvmexpress.org website shows how the NVM Express SCSI translation layer sits between the kernel OS storage stack and the NVM Express driver, and ultimately the NVM Express device.

Ultimately, I want to eliminate this translation layer by having the VM use an NVMe controller directly. By eliminating that mapping, I believe it would save time and let the benchmark report an IOPS number closer to what the device itself is capable of when no translation is necessary.

Please let me know if this is possible to do and how I would go about doing it, if it is. Thank you.

Nov 10, 2020

Virtual NVMe has nothing to do with vSAN because vSAN doesn't tier between regular SSD and NVMe storage.
vSAN LSOM treats the I/O from vSCSI or vNVMe the same way, and it doesn't matter whether your cache tier is non-NVMe or NVMe SSDs.
Please tell me your HCIBench version and I can take a look.
But again, I would recommend not bothering if you are running vSAN; it would not make any perf difference between pvscsi and vNVMe.

Nov 10, 2020

Assume I have NVMe for cache and NVMe for data. In that case, all of my I/O from the CPU to disk uses the NVMe protocol. If my VM writes to the host via PVSCSI, there MUST be a protocol translation somewhere in the stack.

We have been able to show that a specific NVMe drive, under raw fio testing, is about 40% faster than a specific high-end SAS drive (not on vSAN, just fio). However, when I use the same two drives as cache in an all-flash vSAN configuration, the SAS drive is about 10% faster with PVSCSI VMs.

It is pretty clear that SOMEWHERE in the stack a protocol translation is happening from PVSCSI (which I assume uses a variant of the SAS/SCSI protocol?) to NVMe. Now, it may be that vSAN has to translate the commands to something else and back, and that NVMe VM -> vSAN -> NVMe SSD is slower, but I highly doubt it.

We have some additional testing to do to prove out our config, but I believe that using NVMe VM disks in an all-NVMe SSD drive configuration would eliminate the protocol translation issue.

Nov 10, 2020

Because for vSAN, we found the perf would be capped somewhere else.
But I agree it would be nice to investigate. Could you tell me your HCIBench version so I can see whether there's a fast approach to do so?

Nov 10, 2020

We were using 2.4, but working on 2.5.1 right now.

Going to run a quick test to confirm that NVMe VM + all-NVMe disk outperforms PVSCSI + all-NVMe disk.

I would like to put a button on HCIBench to allow the VMs to use NVMe disks instead of PVSCSI disks. This would also require a higher VM compatibility (hardware version) than the OVA currently uses, but that's an easy fix. We are working on that code now. If we create this, would it be useful for us to submit it back to this project?

Where is the perf capped? Just wondering

Nov 10, 2020

Re: the code, of course! You can always fork and commit it back here:
https://github.com/cwei44/HCIBench

In theory, vNVMe would drive more IOPS since it provides a larger queue depth with more queues, but on vSAN, if you are driving very high IOPS with a large amount of OIO, you would most likely find it's capped by the vSAN CPU world for now.

Nov 12, 2020

Is there a parameter that can be edited so that we can adjust the vSAN CPU world? I tried looking through the advanced system settings for a particular host in the cluster, but didn't see any key there that alluded to adjusting this. If we could adjust this number up relative to the host PCPUs, I think it could allow NVMe drives to perform much better in HCIBench fio runs.

Nov 10, 2020

No, we don't have any documentation for rvc. If you want to add that functionality:
1. upgrade to 2.5.1
2. look at the following scripts to start with (see the sketch after this list for the kind of change involved):
/opt/automation/lib/deploy-vms.rb
/root/rvc/lib/rvc/modules/vsantest/perf.rb (search for pvscsi)
/root/rvc/lib/rvc/modules/device.rb
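For illustration only: rvc is built on RbVmomi, so wherever the deploy code builds a ParaVirtual SCSI controller device spec, a vNVMe controller spec could be built in roughly the same shape. This is a sketch, not the actual HCIBench code; the keys and bus numbers are illustrative, and VirtualNVMEController requires VM hardware version 13 (vSphere 6.5) or later.

require 'rbvmomi'

# Existing style: add a ParaVirtual SCSI controller to the VM config spec
pvscsi_spec = RbVmomi::VIM.VirtualDeviceConfigSpec(
  :operation => :add,
  :device    => RbVmomi::VIM.ParaVirtualSCSIController(
    :key => 1000, :busNumber => 0, :sharedBus => :noSharing
  )
)

# vNVMe alternative: same shape, different controller type
nvme_spec = RbVmomi::VIM.VirtualDeviceConfigSpec(
  :operation => :add,
  :device    => RbVmomi::VIM.VirtualNVMEController(:key => 1000, :busNumber => 0)
)

# Either spec would go into the :deviceChange array of the VM's
# VirtualMachineConfigSpec, and any VirtualDisk added for the data disks
# would then reference :controllerKey => 1000.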

Nov 11, 2020

One other question: is there any other HCIBench documentation around the workflow? Once we confirm that using NVMe virtual disks is faster for all-NVMe configurations (we should confirm that later today), we will start modifying the code.

We would like to add a button, near Easy Run, that enables NVMe disks on the VMs. The code seems to start with HTML, then uses JavaScript to parse, then calls bash scripts, which call Ruby. It also uses global variables, and we are not entirely sure where those are instantiated. The data path is a little complex. Is there any documentation on how this is set up?

Nov 12, 2020

I would recommend starting with the three files I mentioned. I understand you want to add a toggle on the UI to make it happen; I can handle that part as long as the backend implementation is built.
We could add one more param to /opt/automation/perf-conf.yaml to allow users to do so.
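For example (a sketch only; 'vm_controller_type' is a hypothetical key name, not an existing option), the backend could read such a param the same way the other settings in perf-conf.yaml are read:

require 'yaml'

conf = YAML.load_file('/opt/automation/perf-conf.yaml')
# Hypothetical new key; falls back to the current pvscsi behavior if it is absent
controller_type = conf['vm_controller_type'] || 'pvscsi'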

Nov 10, 2020

Ah! That makes sense, because I would expect the physical NVMe disks to be faster than they are.

I believe the limit we are hitting is actually the vSAN world CPU. The problem is that we are having to use the CPU dedicated to vSAN for protocol translation, which eats up... probably half of the world CPU? If we can use vNVMe disks for the VMs, that would eliminate the need for the protocol translation and unlock significant performance gains.

Nov 12, 2020

No, there are no such params exposed to the end user.

Nov 08, 2020

Hi,

We are planning to run some benchmarking tests on VMware Tanzu containers. Does HCIBench currently support running on containers? If not, do you have any recommendations?

Thanks,
Mubeerr

Nov 09, 2020

Hi Team,

Any update on this?

Nov 09, 2020

Currently HCIBench does not support container based testing. I haven't done any testing in this area so I can't make any recommendations. There are third party articles that may shed some light on the topic. Searching for "benchmark kubernetes storage" has a few promising leads.