Oct 28, 2016

I wanted to ask if it would be possible to have an additional option to switch the SCSI controller in the Photon VMs from "LSI Logic Parallel" to "VMware Paravirtual". This could increase the VM queue depth and make it possible to simulate a few heavy-writing VMs with many CPUs and a large number of outstanding I/Os. The current design is great for tests with many VMs, but the load I would like to simulate is several heavy-writing VMs, ideally with the option of more than one Paravirtual controller per VM. I hope that would help to better utilize the vSAN backend.

Thanks, and keep up the good work!

Oct 28, 2016

The LSI Logic Parallel controller is now used only for the OS VMDK; all the other data disks are created on a Paravirtual SCSI controller.

Oct 25, 2016

Just want to say thanks, this is great stuff

Oct 25, 2016

Glad you like it :-)

Oct 25, 2016

I generated the Vdbench parameter file with 10 VMDKs, but by accident ran the test with the optional parameter "Number of Data Disk" set to 1.

Got:
13:52:49.181 Raw device 'sd=sd10,lun=/dev/sdj' does not exist, or no permissions.
13:52:49.182 Raw device 'sd=sd7,lun=/dev/sdg' does not exist, or no permissions.

Is it meant to be like that? It seems the Number of Data Disk setting on the config page does not override the Vdbench parameter file.

Oct 25, 2016

"It seems the Number of Data Disk in the config page is not overwriting the Vdbench parameter file."

Correct.
The "Number of Data Disk" on the main page should be equal to or greater than the number on the parameter-file generation page.
It looks like you generated a Vdbench profile with 10 VMDKs, but the VM only has 1 VMDK configured.
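
To make the mismatch concrete (device letters are an assumption and may differ in your guest): with "Number of Data Disk" set to 1, only the first data device exists inside the client VM, so generated entries along the lines of

sd=sd7,lun=/dev/sdg,openflags=o_direct,hitarea=0,range=(0,100),threads=2
sd=sd10,lun=/dev/sdj,openflags=o_direct,hitarea=0,range=(0,100),threads=2

point at devices that were never created, which is exactly what produces the "does not exist, or no permissions" messages above.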

Oct 19, 2016

Hello-

I am able to kick off testing and I can see resources ramp up in vCenter (so I know it is working as intended), but in the GUI of the HCI tool it just hangs at "I/O Test Started".
I waited this out overnight and it still appears to be stuck at "I/O Test Started".
The VC log shows "VMs ready for perf runs" as the last entry.

test-status.log
Deployment Started
Verifying If VMs are Accessible
Deployment Successfully Finished
I/O Test Started

I am seeing some errors on the console of one of the vdbench-vc-VSA VMs.

Out of memory: Kill Process
Failed to start journal service

Any ideas?

Oct 21, 2016

To me, this sounds like the client VM running out of RAM. To increase the CPU/RAM of the client VM, you can edit the file /opt/output/vm-template/perf-photon-vdbench.ovf. For CPU, edit lines 38 and 39 to change the number of vCPUs from 4 to more:
<rasd:VirtualQuantity>4</rasd:VirtualQuantity>
<vmw:CoresPerSocket ovf:required="false">4</vmw:CoresPerSocket>

For memory, edit line 47 to change the RAM in MB from 4096 to more (e.g., 8192):
<rasd:VirtualQuantity>4096</rasd:VirtualQuantity>

Then your client VM will have more CPU/RAM, which should help resolve this issue.
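
For example, a sketch of what those edited lines could look like after bumping the client VM to 8 vCPUs and 8192 MB of RAM (the exact values are your choice):

<rasd:VirtualQuantity>8</rasd:VirtualQuantity>
<vmw:CoresPerSocket ovf:required="false">8</vmw:CoresPerSocket>
...
<rasd:VirtualQuantity>8192</rasd:VirtualQuantity>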

Oct 19, 2016

That appears to be similar to what I've experienced. If a vdbench client VM has some issue (in your case, out of memory), the all-in-one-testing.rb process will never exit, so you have to kill it manually. Running cleanup_vm.rb will clear the VMs so you can start again (see the sketch at the end of this comment).

I'm actually in that very state myself. I just ran a 20-minute test which did finish (I saw the IOPS for 20 minutes), but after one hour HCIBench is still attempting to grab the output files from the vdbench clients (io-test-<mytestname>.log is still being updated).

I tried to get to the console of the client VMs, but I just see some unusual kernel messages and it won't let me log in (EXT4-fs error (device sdk2), so it looks like the virtual disk got messed up). It's as if somebody was rapidly writing to disk, lol.

I've run this same test 3 times today and the first 2 worked, so it's quite intermittent.

FWIW, I believe I saw a similar issue (out of memory) when I increased the thread count dramatically (to 120) in the vdbench param file, so it might be worth posting your parm file. Again, I was not able to reproduce that one.
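
A rough sketch of the recovery steps from the HCIBench appliance shell (the location of cleanup_vm.rb is an assumption; adjust the path to wherever it lives on your appliance):

# stop the stuck controller process
pkill -f all-in-one-testing.rb
# remove the deployed client VMs so the next run starts clean
ruby cleanup_vm.rb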

Oct 19, 2016

Thanks Simon, that's really appreciated; I can kill that job now. I will manually purge the VMs and try again. My parm file:

*SD: Storage Definition
*WD: Workload Definition
*RD: Run Definition
*
sd=sd1,lun=/dev/sdb,openflags=o_direct,hitarea=0,range=(0,100),threads=2
sd=sd2,lun=/dev/sdc,openflags=o_direct,hitarea=0,range=(0,100),threads=2
sd=sd3,lun=/dev/sdd,openflags=o_direct,hitarea=0,range=(0,100),threads=2
sd=sd4,lun=/dev/sde,openflags=o_direct,hitarea=0,range=(0,100),threads=2
sd=sd5,lun=/dev/sdf,openflags=o_direct,hitarea=0,range=(0,100),threads=2
sd=sd6,lun=/dev/sdg,openflags=o_direct,hitarea=0,range=(0,100),threads=2
sd=sd7,lun=/dev/sdh,openflags=o_direct,hitarea=0,range=(0,100),threads=2
sd=sd8,lun=/dev/sdi,openflags=o_direct,hitarea=0,range=(0,100),threads=2
sd=sd9,lun=/dev/sdj,openflags=o_direct,hitarea=0,range=(0,100),threads=2
sd=sd10,lun=/dev/sdk,openflags=o_direct,hitarea=0,range=(0,100),threads=2
wd=wd1,sd=(sd1,sd2,sd3,sd4,sd5,sd6,sd7,sd8,sd9,sd10),xfersize=(8192,100),rdpct=70,seekpct=100
rd=run1,wd=wd1,iorate=max,elapsed=3600,warmup=1800

* 10 raw disks, 100% random, 70% read of 8k blocks at unlimited rate

Oct 20, 2016

I used your param file last night and the test ran. This morning, I also noticed it was still running :). I actually ended up rebooting the HCIBench VM to clean it out for my next run. I think the code to stop the tests has some issues; it's better in this version than the last, for sure.

The only thing of concern I see in your param file is the number of threads, which should at least equal the number of physical disks. I typically set it to at least 10, but I did run your param file with 2 threads.
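
For example, taking the first sd line from your file and simply raising threads from 2 to 10:

sd=sd1,lun=/dev/sdb,openflags=o_direct,hitarea=0,range=(0,100),threads=10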

Oct 21, 2016

Hi Simon,

What did you see on the console of the HCIBench client VMs before you rebooted them? I suspect the client VMs were running out of resources and some of them might have been hanging, so the SSH processes between the controller and the client VMs never ended. That's why the test appeared to still be running.

Oct 20, 2016

Thanks for your reply, Simon. I modified that and got it working, sort of. Say I deploy 8 VMs; on a good run it only shows stats for half of the VMs. Also, my results.html is blank. Are you seeing anything similar?

Oct 20, 2016

Are you looking in io-test-<param-file-name>.log for errors?

I think if one of the tests fails the results will not be rolled up, so you have to get this to work completely. Errors in the io-test log appear to make everything else go bad, including the test not ending.

Can you get it to work with fewer VMs and fewer disks? Just get a complete run with something smaller.

Are you seeing any data in /opt/output/results?
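
A quick way to check both from the HCIBench appliance shell (a sketch only; the directory the io-test log is written to may differ on your setup):

# watch the per-test log for vdbench errors as the run progresses
tail -f io-test-<param-file-name>.log
# see whether any results have been collected so far
ls -lR /opt/output/results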