Comment thread started by Adil Rahman on HCIBench

Full comments
Nov 10, 2020

Hello Team,

I am able to use HCIBench fine; however, I notice that when the VMs get deployed, they come up with SCSI controllers (SCSI controller 0 with LSI Logic Parallel, SCSI controllers 1 and 2 with VMware Paravirtual). I want an NVMe controller to be added when I deploy the VMs. I am using an all-flash vSAN, and I want this so that there is no protocol overhead from translating SCSI to NVMe.

Reading through some comments, it seems like it is possible to get this to work, but with some Ruby code replacements and some other steps. Unfortunately I couldn't attempt this myself because the links for the replacement files no longer work.

If I could get some help so that when I deploy the VMs, they have NVMe controllers, that would be very helpful. Please let me know if you need additional information from me. Thanks and hope to hear from someone soon.

Nov 10, 2020

Could you tell us a bit more about why you need a vNVMe controller for the guest VMs? From a perf perspective, on vSAN it would not make any difference compared to PVSCSI.

Nov 10, 2020

From my understanding, when I create my all-flash vSAN, all it guarantees me is that it is using an all-SSD setup as opposed to a mix of HDD and SSD. However, it does not tell me whether the devices inside show up as "sd" devices (sda, sdb, etc.) or "nvme" devices (nvme1n1, nvme2n1, etc.).
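To make the naming distinction concrete: inside a Linux guest, the block-device names reveal which driver stack is in use. A minimal Ruby sketch (the device names below are illustrative examples, not output from any particular system):

```ruby
# Classify Linux block-device names by the driver stack they come from.
# "sd*" devices go through the SCSI layer (including PVSCSI controllers);
# "nvme*" devices use the native NVMe driver.
def classify_block_devices(names)
  names.group_by do |name|
    case name
    when /\Anvme\d+n\d+\z/ then :nvme   # e.g. nvme0n1 - native NVMe namespace
    when /\Asd[a-z]+\z/    then :scsi   # e.g. sda - SCSI/SAS/PVSCSI path
    else :other
    end
  end
end

# On a live guest you could feed it Dir.children("/sys/block") instead.
devices = %w[sda sdb nvme0n1 nvme1n1]
puts classify_block_devices(devices).inspect
```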

Since I know that my test setup has all NVMe SSDs, and not non-NVMe SSDs, there would have to be a SCSI-to-NVMe translation layer in the stack for the underlying NVMe drives to carry out what the benchmark (fio/vdbench) actually asks of them. If my VM is always going through a SCSI controller, this translation layer wastes time simply translating commands instead of directly generating and executing NVMe commands. I believe this would make a difference in the number of IOPS I see my cluster achieve.

Section 1-1 of the Introduction of this PDF from nvmexpress.org's website shows where the NVM Express SCSI translation layer sits: between the kernel OS storage stack and the NVM Express driver, and ultimately the NVM Express device.

Ultimately I want to eliminate this translation layer by having the VM use an NVMe controller directly. By eliminating that mapping, I believe it would save time and let the benchmark report a more accurate IOPS number for what the device itself is capable of when no translation is necessary.

Please let me know if this is possible to do and how I would go about doing it, if it is. Thank you.

Nov 10, 2020

Virtual NVMe has nothing to do with vSAN, because vSAN doesn't tier between regular SSD and NVMe storage.
vSAN LSOM treats the I/O from vSCSI or vNVMe the same way, and it doesn't matter whether your cache tier is non-NVMe or NVMe SSDs.
Please tell me your HCIBench version; I can take a look.
But again, I would recommend not bothering if running vSAN; it would not make any perf difference from PVSCSI to vNVMe.

Nov 10, 2020

Assume I have NVMe for cache and NVMe for data. In that case, all of my I/O from the CPU to disk uses the NVMe protocol. If my VM writes to the host via PVSCSI, there MUST be a protocol translation somewhere in the stack.

We have been able to show that a specific NVMe drive, under raw FIO testing, is about 40% faster than a specific high-end SAS drive (not on vSAN, just FIO). However, when I use the same two drives as cache in vSAN in an all-flash configuration, the SAS drive is faster by about 10% with PVSCSI VMs.

It is pretty clear that SOMEWHERE in the stack a protocol translation is happening from PVSCSI (which I assume uses some flavor of the SCSI protocol?) to NVMe. Now, it may be that vSAN has to translate the commands to something else and back, and that NVMe VM -> vSAN -> NVMe SSD is slower, but I highly doubt it.

We have some additional testing to do to prove out our config, but I believe that using NVMe VM disks in an all-NVMe SSD configuration would eliminate the protocol-translation issue.

Nov 10, 2020

Because for vSAN, we found the perf would be capped somewhere else.
But I agree, it would be nice to investigate. Could you tell me your HCIBench version so I can see whether there's a fast approach to doing this?

Nov 10, 2020

We were using 2.4, but working on 2.5.1 right now.

Going to run a quick test to confirm that an NVMe VM + all-NVMe disks outperforms PVSCSI + all-NVMe disks.

I would like to put a button in HCIBench to allow the VMs to use NVMe disks instead of PVSCSI disks. This would also require a higher compatibility mode than is currently available in the OVA file, but that's an easy fix. We are working on that code now. If we create this, would it be useful for us to submit it back to this project?

Where is the perf capped? Just wondering.

Nov 10, 2020

Re: code, of course! You can always fork and commit it back here:

In theory, vNVMe would drive more IOPS since it provides larger queue depths with more queues, but on vSAN, if you are driving very high IOPS with a large amount of OIO, you would most likely find it capped by the vSAN CPU worlds for now.
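The queue argument can be made concrete with a back-of-envelope calculation. The figures below are illustrative assumptions for the sake of the arithmetic, not VMware-published limits: a single PVSCSI adapter with a per-device queue depth of 64, versus a vNVMe controller exposing multiple queue pairs:

```ruby
# Theoretical outstanding-I/O (OIO) ceiling is queues x depth-per-queue.
# All numbers here are assumed, illustrative values - check your own
# platform's documented limits before drawing conclusions.
def max_outstanding_io(queues:, depth_per_queue:)
  queues * depth_per_queue
end

pvscsi = max_outstanding_io(queues: 1,  depth_per_queue: 64)   # assumed default
vnvme  = max_outstanding_io(queues: 16, depth_per_queue: 256)  # assumed config

puts "pvscsi OIO ceiling: #{pvscsi}"
puts "vnvme  OIO ceiling: #{vnvme}"
```

Even with modest assumed numbers, the multi-queue design gives vNVMe a far larger theoretical OIO ceiling, which is why the cap tends to move elsewhere (e.g. vSAN CPU worlds) rather than the virtual controller.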

Nov 12, 2020

Is there a parameter that can be edited to adjust the vSAN CPU worlds? I looked through the advanced system settings for a particular host in the cluster, but didn't see any key there that alluded to adjusting this. If we could scale this number up relative to the host's pCPUs, I think it could allow the NVMe drives to perform much better in the HCIBench FIO results.

Nov 10, 2020

No, we don't have any docs for RVC. If you want to add that functionality:
1. Upgrade to 2.5.1.
2. Look at the following script to start with:
/root/rvc/lib/rvc/modules/vsantest/perf.rb (search for pvscsi)
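For orientation, the vSphere API models controllers as virtual devices added through a VirtualDeviceConfigSpec during VM reconfiguration, which is the general shape of change perf.rb would need where it currently sets up PVSCSI. A hedged Ruby sketch of that idea, with plain hashes standing in for the real RbVmomi VIM objects (the `VirtualNVMEController` device type comes from the vSphere API; the key and busNumber values here are assumptions):

```ruby
# Hypothetical sketch: build a device-config-spec entry that adds a virtual
# NVMe controller, analogous to how a PVSCSI controller spec would be built.
# Plain hashes stand in for RbVmomi VIM objects (e.g. VIM.VirtualNVMEController);
# the temporary negative key and bus number are illustrative assumptions.
def nvme_controller_spec(bus_number: 0, key: -101)
  {
    operation: :add,                          # add a new device to the VM
    device: {
      deviceType: "VirtualNVMEController",    # vSphere API device class name
      key: key,                               # placeholder key, resolved by vCenter
      busNumber: bus_number                   # NVMe bus to attach disks to
    }
  }
end

spec = nvme_controller_spec(bus_number: 0)
puts spec[:device][:deviceType]
```

In the real script, the disks would then reference this controller's key via their controllerKey, the same way they currently reference the PVSCSI controllers.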

Nov 11, 2020

One other question: is there any other HCIBench documentation around the workflow? Once we confirm that using NVMe virtual disks is faster for all-NVMe configurations (we should confirm that later today), we will start modifying the code.

We would like to add a button, near Easy Run, that enables NVMe disks on the VMs. The code seems to start with HTML, then uses JavaScript to parse, then calls bash scripts, which call Ruby. It also uses global variables, and we are not entirely sure where those are instantiated. The data path is a little complex. Is there any documentation on how this is set up?

Nov 12, 2020

I would recommend starting with the three files I mentioned. I understand you want to add a toggle on the UI to make it happen; I can handle that part as long as the backend implementation is built.
We could add one more param to /opt/automation/perf-conf.yaml to allow the user to do so.
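As a sketch of what that could look like on the backend: a boolean key in perf-conf.yaml that the Ruby side reads to pick the controller type. The `nvme_controller` key name is an assumption for illustration only; the real parameter name would be whatever the maintainers choose:

```ruby
require "yaml"

# Hypothetical sketch: a controller toggle surfaced through
# /opt/automation/perf-conf.yaml. The key name "nvme_controller"
# is an assumed, illustrative name - not an existing HCIBench param.
conf_fragment = <<~YAML
  nvme_controller: true
YAML

conf = YAML.safe_load(conf_fragment)
controller_type = conf["nvme_controller"] ? "nvme" : "pvscsi"
puts controller_type
```

The deployment script would then branch on that value when it builds the VM's device specs, defaulting to the existing PVSCSI path when the key is absent or false.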

Nov 10, 2020

AH! That makes sense, because I would expect the physical NVMe disks to be faster than they are.

I believe the limit we are hitting is actually the vSAN world CPU. The problem is that the CPU dedicated to vSAN is being spent on protocol translation, which eats up ... probably half of the world CPU? If we can use vNVMe disks for the VMs, that would eliminate the need for protocol translation and unlock significant performance gains.

Nov 12, 2020

No, there are no such params exposed to the end-user.