Clustering, EMC, Storage, Virtualization, VMware, VPLEX, vSphere

VMware HA demo using vMSC with EMC VPLEX Metro

That’s a mouth full of abbreviations for a title, isn’t it?

So, let me give you some background info. VMware introduced something called the vSphere Metro Storage Cluster, and Duncan Epping talks about this feature here.

What the vMSC allows us to do, is to create a regular stretched vSphere cluster, but now also stretch out the storage between the two clusters. This can be done in two ways (to quote from Duncan’s article):

I want to briefly explain the concept of a metro / stretched cluster, which can be carved up in to two different type of solutions. The first solution is where a synchronous copy of your datastore is available on the other site, this mirror copy will be read-only. In other words there is a read-write copy in Datacenter-A and a read-only copy in Datacenter-B. This means that your VMs in Datacenter-B located on this datastore will do I/O on Datacenter-A since the read-write copy of the datastore is in Datacenter-A. The second solution is which EMC calls “write anywhere”. In this case VMs always write locally. The key point here is that each of the LUNs / datastores has a “preferred site” defined, this is also sometimes referred to as “site bias”. In other words, if anything happens to the link in between then the storage system on the preferred site for a given datastore will be the only one left who can read-write access it.

The last scenario described here is something that obviously can cause some issues. EMC tried to address this by introducing the “independent 3rd party”, in form of the VPLEX Witness. Some documentation states that this witness should run in a 3rd site, but I would recommend to run this in a separate failure domain.

In essence, we have created the following setup:

© VMware

Awesome stuff, because we can do new things that weren’t quite possible before. Since VPLEX is one of the key storage virtualization solutions from EMC that allow us to perform an active/active disk access, we can perform a vMotion between the two sites, and due to the nature of VPLEX, we also perform a sort of storage vMotion on the underlying disks. That, without you having to shut down the VM to do both things at the same time. Pretty neat!

Now, as Chad describes here, a new disk connectivity state was introduced with vSphere 5, called “Permanent Device Loss” or PDL. This was a great feature to communicate to your infrastructure that a target was intentionally removed. You could unmount the disk, and remove the paths to your target in a proper way.

It was also useful to indicate an unexpected loss of your target, indicating that your cluster is in a partitioned state. The problem here was that a PDL state and VMware HA didn’t work so well together. When you had an APD notification, HA didn’t “kill” your VM, and your virtual machine would usually continue to respond to pings, but that was about it.

Then along came vSphere 5 Update 1, which allows us to set a flag on each of the hosts inside our cluster, and set a different flag for our HA cluster. Now, we can actually use HA and see terminate the VMs and have it restart the virtual machines on the hosts in our cluster that still have access to their datastores in their respective preferred sites.

I’ve created a short (ok, 8 minutes) video that show exactly this scenario. You’ll get a quick view of the VPLEX setup. You’ll see the Brocade switches that will change from a config with the normal full zoneset, being switched to a zoneset that will disable the inter-switch links between both VPLEX clusters. And you’ll see the settings inside of my vSphere lab setup, with the behavior of the hosts and virtual machines.

Since I’m quite new to creating videos like this, I hope the output is acceptable, and the video is clear enough. If you have any questions, feedback or would like to see more, please leave me a comment and I’ll see what I can do. 🙂


Just a quick modification to my post, since it wasn’t actually VM-HA (or VM monitoring) responding to the PDL event, but HA terminating the VM when running in to the PDL state, as Duncan pointed out to me on Twitter. Sorry for any confusion I may have caused!

Virtualization, VMware, vSphere

Changing a forgotten ESXi 5 root password

It shouldn’t happen, but most folks I’ve spoken to have run in to this at some point in time. You are trying to log on to your ESXi host, and for some reason your root password isn’t working anymore.

The official stance that VMware has taken on this can be found in knowledge base article 1317898, and at the time of writing, it states the following:

ESXi 3.5, ESXi 4.x, and ESXi 5.0

Reinstalling the ESXi host is the only supported way to reset a password on ESXi. Any other method may lead to a host failure or an unsupported configuration due to the complex nature of the ESXi architecture. ESXi does not have a service console and as such traditional Linux methods of resetting a password, such as single-user mode do not apply.


So, after searching a bit and combining infos from several folks, I’ve found a way to reset the password, but you should note that this is not officially supported by VMware!


First off, I would recommend you empty your host of any running virtual machines, and put it in to maintenance mode. Next up, inside your vSphere Client, go to the “Home” screen, and select “Host Profiles”, or just press “Ctrl + Shift + P”. Once you are there, create a new profile from an existing host, and select the host that has the unknown password, and give it a name that you can remember.

Next up, edit the newly created profile and open up the “Security Configuration” section. From there, select the “Administrator Password” option, and in the right hand drop down menu, select “Configure a fixed administrator password”.

Now, you can set a new password, but please be careful about one thing. You need to set the password with a certain complexity level. For the exact details, have a look at VMware knowledge base entry 1012033, which states that the default password complexity policy that is set with PAM has the following default:

password requisite /lib/security/$ISA/pam_passwdqc.so retry=3 min=8,8,8,7,6which actually means the following:

  • retry=3: A user is allowed 3 attempts to enter a sufficient password.
  • N0=12: Passwords containing characters from one character class must be at least twelve characters long.
    example: chars1234567
  • N1=10: Passwords containing characters from two character classes must be at least ten characters long.
    example: CHars12345
  • N2=8: Passphrases must contain words that are each at least eight characters long.
    example: software
  • N3=8: Passwords containing characters from all four character classes must be at least eight characters long.
    example: CHars12!
  • N4=7: Passwords containing characters from all four character classes must be at least seven characters long.
    example: CHars1!
  • Example: password requisite /lib/security/$ISA/pam_passwdqc.so retry=3 min= 12,10,8,8,7

If you don’t actually follow these rules, and try to apply the profile, you will get a pretty cryptic error message that in my case just stated the following:
Authentication token manipulation errorWhich isn’t that helpful.

Once you have set the new password, you just need to select the profile you have created, and you need to attach your host to this profile. Once that is done, go to the “Hosts and Clusters” view (again, click “Ctrl + Shift + H” to jump there immediately), and right click your host. Select “Host Profile” from the menu, and from there click “Apply profile”.

Now, if SSH was disabled by the applied profile, enable it again by going to the “Configuration” tab, selecting “Security Profile” and going to the properties of the “Services” part. You can start the SSH service from there. Now you can log on using the newly assigned password and that was that.

But…! There’s always a but, isn’t there? This change only works as long as you keep the host profile, or as long as the host stays withing your vCenter. So, what can you do to make the change permanent? Simple, you log on via SSH, change the password with the “passwd” command and then run the auto-backup.sh script from /sbin.

Also, if you would like to work around the password complexity policy, you can modify the following file:
/etc/pam.d/passwd to reflect your own policy. If you want to do this, create a backup of the file, modify it to reflect your own policy. After that, change the password and run the auto-backup.sh script.

Again, these last steps are not recommended by me or VMware, and this will impact the security of your system, so be extremely cautious of your changes! I’m just trying to document the steps so you might have it a bit easier should this situation occur.

EMC, Storage, VAAI, VMware, VNX, vSphere

“My VAAI is Better Than Yours” – The file side of things

EMC VNXI have to admit it. I stole, or rather “borrowed”, part of this title from a blog post of a colleague of mine, Erik Zandboer. He just now published a post on the mindset behind VAAI, and what the actual effect is on the array itself, and on your vSphere infrastructure.

VAAI was already available in vSphere 4.1, and with the switch to vSphere 5 some new features were introduced, which means that as of this release, we now have the following situation:

Block: File:
HW accelerated Zeroing NFS – Full Copy
HW accelerated Copy NFS – Extended Statistics
HW accelerated Locking NFS – Space Reservation

Some folks will say that I left out Thin Provision Stun, which is true. And while it does help to resolve some issues, I left it out because I don’t really view it as a hardware offload, which is what I’m trying to focus on.

I took the hardware in our lab, – a EMC VNX 5300 -, for a spin in our vSphere 5 setup to show the same thing as Erik showed in his blog, but instead showing off some of the File / NFS accelerations.

To get the VNX to actually support NAS VAAI offloading and get the result you expected, you need to meet the following prerequisites:

  • vSphere 5 – You need vSphere 5 installed with an Enterprise or Enterprise Plus license
  • VNX OE for File 7.0.35.x – You need your VNX Operating Environment for File to be at least at version VNX OE 7.0.35.x or newer
  • NFSv3 – The offloads only work on NFSv3-based datastores
  • The vSphere NFS VAAI offload plugin which is referenced here

If all those prerequisites are met, you should normally be able to go in to your vSphere Client and see Hardware Acceleration as Supported:

You could also enable SSH for your ESXi host, – do this by going to the individual host, click on the “Configuration” tab, select “Security Profile” and start the SSH service -, and check the support from the command line. For block devices you could enter the following command:

esxcli storage core device vaai status getand get back a result that shows you the NAAID, the VAAI plugin name, and the primitives with their support state. By using the following command:
esxcli storage core device list you get a similar output, but again this only works for block devices, and won’t really help you when checking the support for NFS. I haven’t found any way so far to get a reliable statement back via SSH, but I’ll try to continue looking and update this post if I find something.

In case of the VNX, we can actually check on the array itself to see if we are using the primitives, so I’m actually showing you the output from the array itself, using the following command on the VNX:

server_stats server_2 -monitor nfs.v3.vstorage -type accu -i 1
First off, I went back in to my ESXi host and went in to the NFSv3 datastore that was hosting my virtual machine. In this case, a Windows 2008 server, running an SAP Enterprise Portal, and I used the vmkfstools to create a clone:

vmkfstools -i GI-C-SAP-EPBW.vmdk CLONE-GI-C-SAP-EPBW.vmdkand I set off a snap using a similar command:
vmkfstools -i GI-C-SAP-EPBW.vmdk CLONE-GI-C-SAP-EPBW.vmdk. All the while, I had the VNX command that I posted before running in a different window. The output from the VNX was showing that we are actually using the VAAI NFS offloading functions:

server_2 NFS VAAI op VAAI Op Calls VAAI Op Total uSecs VAAI VAAI Op
Timestamp Op Max Average
uSecs uSec/Op
09:07:14
09:07:15
09:07:16
09:07:17
09:07:18 vaaiFastClone 1 0 0 0
vaaiVxAttrs 3 0 1 0
vaaiRegister 5 0 0 0
09:07:19
.......
09:08:27 vaaiOffloadStatus 1 0 0 0
vaaiVxAttrs 7 1 1 0
vaaiRegister 10 0 0 0
09:08:28
09:08:29
09:08:30
09:08:31
09:08:32 vaaiOffloadStatus 2 0 0 0
server_2 NFS VAAI op VAAI Op Calls VAAI Op Total uSecs VAAI VAAI Op
Summary Op Max Average
uSecs uSec/Op
Minimum vaaiFullClone 0 0 83308 -
vaaiFastClone 0 0 0 0
vaaiOffloadStatus 0 0 0 0
vaaiOffloadAbort 0 0 0 -
vaaiVxAttrs 0 0 1 0
vaaiReserveSpace 0 0 0 -
vaaiRegister 0 0 0 0
Average vaaiFullClone 0 0 83308 -
vaaiFastClone 1 0 0 0
vaaiOffloadStatus 0 0 0 0
vaaiOffloadAbort 0 0 0 -
vaaiVxAttrs 3 0 1 0
vaaiReserveSpace 0 0 0 -
vaaiRegister 5 0 0 0
Maximum vaaiFullClone 0 0 83308 -
vaaiFastClone 1 0 0 0
vaaiOffloadStatus 2 0 0 0
vaaiOffloadAbort 0 0 0 -
vaaiVxAttrs 7 1 1 0
vaaiReserveSpace 0 0 0 -
vaaiRegister 10 0 0 0
(sorry for the formatting, I couldn’t get it to show the way it should).

Once the files are created, use a:
vmkfstools --extendedstat GI-C-SAP-EPBW.vmdk on the source file, or on the snap or clone to actually display the extended statistics. The “Capacity bytes” show the allocated space for the virtual disk, the “Used bytes” displays the blocks used for the virtual disk which in case of our snapshot is the fast clone and it’s parent. The “Unshared bytes” displays the usage of the actual fast clone itself without the parent.

I should point out that the offload did speed up my full clone operation, but it was “only” in the range of 20%. That isn’t a great deal, but using both esxtop and the vSphere Client performance graphs showed that the ESXi server was busy doing what it is supposed to do: virtualizing my resources! And that’s the most important thing, isn’t it?

Virtualization, VMware, vSphere

vSphere 5 introduction – Links

Hey folks,

since there is a lot going on surrounding the VMware launch of vSphere 5 today, I thought I’d just start a little page with links to the various blog posts and press announcements around this event.

So, here goes:

GestaltIT, VAAI, vCloud Director, Virtualization, VMware, VMworld, vSphere

vSphere 5, it’s here! What’s new?

It’s here, it’s here. 😉

VMware just announced their new version of vSphere 5, and as you have probably found out, general availability is targeted toward August this year. There is a whole bunch, and I mean a whole bunch, of new stuff coming out, and everyone knows what we can expect at VMworld this year.

Let me be clear that this post is in no way trying to sum up all the new things that are introduced with vSphere 5, but this is mean to give you a quick and easy to consume overview of some of the major new features.

Key stuff that is new or has changed in vSphere 5:

  • Virtual hardware limits. We can now address 32 virtual CPUs and a maximum of 1TB of RAM (note that virtual machine hardware type 8 is required). We see people running larger and larger workloads, and are seeing more and more people moving their tier 1 applications to a virtualized environent. Anyone who has tried to virtualize a large database or business warehouse system will know what I mean.

    One word of caution though. Even though we can now create very large installations, be careful. This is not a sensible size for all applications, and you should check on an application specific basis if you really need something this big, and are able to leverage all of the resources it offers.

  • VMFS version 5. With the updated version of the VMFS there are some modifications being made. For one, you no longer need to use extents to create volumes over 2TB in size, and they have added support for physical RDMs that are over 2TB.
  • The service console is missing. Well, it's not really missing, but there is no more service console, due to the fact that you will now only find ESXi as the hypervisor. Although some people might be missing some things without the traditional ESX service console, this does offer some advantages like having only a single vSphere package, hardened security and less patches. But this should probably be one of the changes that almost everyone has seen coming, so I'm not going to go in to the depths on the pros and cons of this choice.
  • VAAI has again been enhanced. With vSphere 5, there are enhancements for both block and file based storage:
    • for block:
      • Thin provision stun has been added, which is basically an option to get feedback when a thinly provisioned LUN is full. You will now get a message back from the array, and the affected guests are “stunned”. This allows the admin to add some more free space to the LUN, after which the guests can resume normal operation.
      • Space reclaim is the second feature that has been added. Now, one caveat is that this hardware offload is dependent of VMFS version 5. Anything prior to that won’t do the job. If that prerequisite is in place, any blocks that are freed up by VMFS operations, things like VM deletion or snapshot deletion, will now return their blocks to the pool of free blocks.
    • For file:
      • You can now use NFS full copy. Somewhat similar to the block version, copying of files can now be offloaded to the array, which of course should speed up things like clone creation.
      • Extended stats adds the ability to get the extended information from files. Information about actual space allocation or the fact if the file is deduplicated can now be retrieved.
      • We can now use space reservation, to actually pre-initialize a disk and allocate all of the required space right off the bat.
  • VMware has redesigned HA. The new architecture should help people who want to work with streched clusters.

    Basically, VMware has moved away from underlying EMC Autostart based construct to an entirely new model. The HA agent is now called the FDM, and one of the nodes in the cluster will now take on the role of master. All of the remaining nodes in the cluster are slaves to this master, which means that we are no longer using the primary/secondary concept that was common with the previous version of HA. During normal operation, we should only see one master node per cluster.

    Benefits of the new construct are that we are no longer that susceptible to DNS issues. Also, VMware has added additional communication paths, -we can now also leverage so called “Heartbeat datastores”-, that will aid us in the detection of failures. And, as a bonus VMware has also added support for IPv6.

    Since the entire HA stack has been rewritten, there are a lot of changes coming, and I’m planning on getting down to the nitty gritty in a future post, and I’m sure that my friend Duncan will also be explaining this in great detail on his blog.

  • VASA, or “vSphere Storage API for Storage Awareness” is basically a way for the storage array to actually tell vSphere what it can do, or what it is currently doing. Imagine getting feedback if your storage is cable of VAAI. Or something more simple like telling vSphere what RAID level a datastore has. Sounds sensible right? Now combine that with the new Storage DRS in vSphere 5, and you get a pretty good picture of what VASA can help you with.
  • Storage DRS. The DRS feature in vSphere is already pretty well known, and it’s something that I see in use a lot at customer sites.

    Well, now you can also use DRS for your storage. To enable this feature, you create a so called “datastore cluster”, which is in essence nothing more than several datastores combined. Now, when you create a new VM, it is placed inside of a datastore cluster, and storage DRS balances everything out based on some key criteria like space utilization and I/O latencies. More to follow in a different post.

Now, this is by no means a complete overview, and I’ll be going in to these an other new features in upcoming posts. And I don’t want to flood you with information that can also be found on plenty of other blogs out there, but this should give you a good start. Look back for the things mentioned up here, but also for things like the added support for software based FCoE initiators, APD / PDL, the vSphere storage appliance, the new SRM 5 or vCloud Director 1.5.

GestaltIT, vCloud Director, VMware, vSphere

Shorts: VMware vCloud Director not displaying the web portal

A colleague of mine approached me today with a question on our vCloud Director environment. He tried to log in to the vCloud Director portal, and was unable to log in, because there was no page being displayed at all.

After checking if I was able to ping the interface, I logged on to the machine to see if there were any obvious errors. The vCloud Director daemon was still running and so was the database, but a netstat did not show any listeners on the vCloud Director IP. So, after going over the vCloud Director log files, there was a pretty obvious error in the vcloud-container-info.log:

ORA-28001: the password has expired

So, you now stop your vCloud Director daemon and switch to your Oracle user to see what was going on inside of the DB:
sqlplus "/ as sysdba"

Now, list all the users to see if they have an expired password:
select username,ACCOUNT_STATUS,EXPIRY_DATE from dba_users;

Or display just the specific user:
select username,ACCOUNT_STATUS,EXPIRY_DATE from dba_users where username='VCLOUD';

And guess what came up:
USERNAME ACCOUNT_STATUS EXPIRY_DA
-------- -------------- ---------
VCLOUD EXPIRED 17-MAR-11

Expired is something that you don’t want to see for a user that is being used actively. So, let’s set the password again and unlock the user:
alter user VCLOUD identified by replace_this_with_your_password;
alter user VCLOUD account unlock;

So, once that is done, let’s check one more time:
SQL> select username,ACCOUNT_STATUS,EXPIRY_DATE from dba_users where username='VCLOUD';
USERNAME ACCOUNT_STATUS EXPIRY_DA
-------- -------------- ---------
VCLOUD OPEN 26-SEP-11

Now, start your vCloud Director daemon again, and in the log file you should no longer see the error, and the web interface should be working normally again.

Update – April 11th 2011:

One of my other colleagues actually ran in to the same issue and found my blog post. He gave me some feedback asking if I would not be able to add how to find the sqlplus binaries since not everyone is a Linux master, so here goes:

Normally, if Oracle is installed on Linux, it is one of the prerequisites to set the environment variables. Basically this means that you tune your system to allow Oracle to run on it. You perform tasks like telling the system how much shared memory to use, you set semaphores and create a seperate user under which the Oracle installation runs.

Part of these tasks usually also means setting the path to the Oracle binaries for this user I just mentioned. Now, in some situations, your database is already installed, but you don’t know as what user or in what directory. This is not necessarily an issue. Just use the “ps” command to list all processes from all users. Use something like:

ps -efor
ps auxf

and look for the Oracle processes. At the start of the line you should see as which user these processes are running.

Once you have identified the user, switch to said user, using the following command:

su - user_name
Obviously, replace the user_name with the actual username. The “su” (or “switch user” if you will) is a command to actually switch to a different user. The dash or minus sign that is appended after the “su” command, makes “su” pass the environment along unchanged, as if you were actually logged in as the specified user.

The benefit of adding the dash, is that the user environment is set correspondingly, meaning that your path for the Oracle user is also set. This in turn means, that you normally don’t have to worry about the exact path to the Oracle installation. Normally you can just enter “sqlplus” in the way described above, and you should be set.

Should you still not be able to find sqlplus, you can try using the “find” command to search for sqlplus. Try using something like this:

find / -name sqlplus
This actually tells the find command to start searching in the “/” or root-directory for files with sqlplus in their names. Depending on your Linux release, you could also change the “-name” option to “-iname”, which changes the search to ignore the case in your search. This way, your search would also return a result if the binary would be called SQLplus (most Unices and Linux installations are case sensitive).

Once you have found your sqlplus binary, just enter the full path to the binary and you should be set.

If you have any other feedback, just let me know folks, and I’ll be more than happy to append it to my post.

GestaltIT, Performance, Storage, VAAI, Virtualization, VMware, vSphere

What is VAAI, and how does it add spice to my life as a VMware admin?

EMC EBC Cork
I spent some days in Cork, Ireland this week presenting to a customer. Besides the fact that I’m now almost two months in to my new job, and I’m loving every part of it, there is one part that is extremely cool about my job.

I get to talk to customers about very cool and new technology that can help them get their job done! And while it’s in the heart of every techno loving geek to get caught up in bits and bytes, I’ve noticed one thing very quickly. The technology is usually not the part that is limiting the customer from doing new things.

Everybody know about that last part. Sometimes you will actually run in to a problem, where some new piece of kit is wreaking havoc and we can’t seem to put our finger on what the problem is. But most of the time, we get caught up in entirely different problems altogether. Things like processes, certifications (think of ISO, SOX, ITIL), compliance, security or just something “simple” as people who don’t want to learn something new or feel threatened about their role that might be changing.

And this is where technology comes in again. I had the ability to talk about several things to this customer, but one of the key points was that technology should help make my life easier. One of the cool new things that will actually help me in that area was a topic that was part of my presentation.

Some of the VMware admins already know about this technology, and I would say that most of the folks that read blogs have already heard about it in some form. But when talking to people at conventions or in customer briefings, I get to introduce folks over and over to a new technology called VAAI (vStorage API for Array Integration), and I want to explain again in this blog post what it is, and how it might be able to help you.

So where does it come from?

Well, you might think that it is something new. And you would be wrong. VAAI was introduced as a part of the vStorage API during VMworld 2008, even though the release of the VAAI functionality to the customers was part of the vSphere 4.1 update (4.1 Enterprise and Enterprise Plus). But VAAI isn’t the entire vStorage API, since that consists of a family of APIs:

  • vStorage API for Site Recovery Manager
  • vStorage API for Data Protection
  • vStorage API for Multipathing
  • vStorage API for Array Integration

Now, the “only API” that was added with the update from vSphere 4.0 to vSphere 4.1 was the last API, called VAAI. I haven’t seen any of the roadmaps yet that contain more info about future vStorage APIs, but personally I would expect to see even more functionality coming in the future.

And how does VAAI make my life easier?

If you read back a couple of lines, you will notice that I said that technology should make my life easier. Well, with VAAI this is actually the case. Basically what VAAI allows you to do is offload operations on data to something that was made to do just that: the array. And it does that at the ESX storage stack.

As an admin, you don’t want your ESX(i) machines to be busy copying blocks or creating clones. You don’t want your network being clogged up with storage vMotion traffic. You want your host to be busy with compute operations and with the management of your memory, and that’s about it. You want as much reserve as you can on your machine, because that allows you to leverage virtualization more effectively!

So, this is where VAAI comes in. Using the API that was created by VMware, you can now use a set of SCSI commands:

  • ATS: This command helps you out with hardware assisted locking, meaning that you don’t have to lock an entire LUN anymore but can now just lock the blocks that are allocated to the VMDK. This can be of benefit, for example when you have multiple machines on the same datastore and would like to create a clone.
  • XSET: This one is also called “full copy” and is used to copy data and/or create clones, avoiding that all data is sent back and forth to your host. After all, why would your host need the data if everything is stored on the array already?
  • WRITE-SAME: This is one that is also know as “bulk zero” and will come in handy when you create the VM. The array takes care of writing zeroes on your thin and thick VMDKs, and helps out at creation time for eager zeroed thick (EZT) guests.

Sounds great, but how do I notice this in reality?

Well, I’ve seen several scenarios where for example during a storage vMotion, you would see a reduction in CPU utilization of 20% or even more. In the other scenarios, you normally should also see a reduction in the time it takes to complete an operation, and the resources that are allocated to perform such an operation (usually CPU).

Does that mean that VAAI always reduces my CPU usage? Well, in a sense: yes. You won’t always notice a CPU reduction, but one of the key criteria is that with VAAI enabled, all of the SCSI operations mentioned above should always perform faster then without VAAI enabled. That means that even when you don’t see a reduction in CPU usage (which is normally the case), you will see that since the operations are faster, you get your CPU power back more quickly.

Ok, so what do I need, how do I enable it, and what are the caveats?

Let’s start off with the caveats, because some of these are easy to overlook:

  • The source and destination VMFS volumes have different block sizes
  • The source file type is RDM and the destination file type is non-RDM (regular file)
  • The source VMDK type is eagerzeroedthick and the destination VMDK type is thin
  • The source or destination VMDK is any sort of sparse or hosted format
  • The logical address and/or transfer length in the requested operation are not aligned to the minimum alignment required by the storage device (all datastores created with the vSphere Client are aligned automatically)
  • The VMFS has multiple LUNs/extents and they are all on different arrays

Or short and simple: “Make sure your source and target are the same”.

Key criteria to use VAAI are the use of vSphere 4.1 and an array that supports VAAI. If you have those two prerequisites set up you should be set to go. And if you want to be certain you are leveraging VAAI, check these things:

  • In the vSphere Client inventory panel, select the host
  • Click the Configuration tab, and click Advanced Settings under Software
  • Check that these options are set to 1 (enabled):
    • DataMover/HardwareAcceleratedMove
    • DataMover/HardwareAcceleratedInit
    • VMFS3/HardwareAcceleratedLocking

Note that these are enabled by default. And if you need more info, please make sure that you check out the following VMware knowledge base article: >1021976.

Also, one last word on this. I really feel that this is a technology that will make your life as a VMware admin easier, so talk to your storage admins (if that person isn’t you in the first case) or your storage vendor and ask if their arrays support VAAI. If not, ask them when they will support it. Not because it’s cool technology, but because it’s cool technology that makes your job easier.

And, if you have any questions or comments, please hit me up in the remarks. I would love to see your opinions on this.

Update: 2010-11-30
VMware guru and Yellow Bricks mastermind Duncan Epping was kind enough to point me to a post of his from earlier this week, that went in to more detail on some of the upcoming features. Make sure you check it out right here.