The Asymmetrical Logical Unit Access (ALUA) mode on CLARiiON

3 02 2010

I’ve noticed that I have been getting a lot of search engine hits relating to the various features, specifications and problems on the EMC CLARiiON array. One of the searches was related to a feature that has been around for a bit. It was actually introduced in 2001, but in order to give a full explanation I’m just going to start at the beginning.

DetourThe beginning is actually somewhere in 1979 when the founder of Seagate Technology, Alan Shugart, created the Shugart Associates Systems Interface (SASI). This was the early predecessor of SCSI and had a very rudimentary set of capabilities. Only few commands were supported and speeds were limited to 1.5 Mb/s. In 1981, Shugart Associates was able to convince the NCR corporation to team up and thereby convincing ANSI to set up a technical committee to standardize the interface. This was realized in1982 and known as the “X3T9.2 technical committee” and resulted in the name being changed to SCSI.

The committee published their first interface standard in 1986, but would grow on to become the group known now as “International Committee for Information Technology Standards” or INCITS and that is actually responsible for many of the standards used by storage devices such as T10 (SCSI), T11 (Fibre Channel) and T13 (ATA).

Now, in July 2001 the second revision of the SCSI Primary Commands (SPC-2) was published, and this included a feature called Asymmetrical Logical Unit Access mode or in short ALUA mode, and some changes were made in the newer revisions of the primary command set.

Are you with me so far? Good.

On Logical Unit Numbers

Since you came here to read this article I will just assume that I don’t have to explain the concept of a LUN. But what I might need to explain is that it’s common to have multiple connections to a LUN in environments that are concerned with the availability of their disks. Depending on the fabric and the amount of fibre channel cards you have connected you can have multiple paths to the same lun. And if you have multiple paths you might as well use them, right? It’s no good having the additional bandwidth lying around and then not using it.

Since you have multiple paths to the same disk, you need a tool that will somehow merge these paths and tell your operating system that this is the same disk. This tool might even help you achieve a higher throughput since it can balance the reads and writes over all of the paths.

As you might already have guessed there are multiple implementations of this, usually called Multipathing I/O, MPIO or just plainly Multipath, and you will be able to find a solution natively or as an additional piece of software for most modern operating systems.

What might be less obvious is that the connection to these LUNs don’t have to behave in the same way. Depending on what you are connecting to, you have several states for that connection. Or to draw the analogy to the CX4, some paths are active and some paths are passive.

Normally a path to a CLARiiON is considered active when we are connected to the service processor that is currently serving you the LUN. CLARiiON arrays are so called “active/passive” arrays, meaning that only one service processor is in charge of a LUN, and the secondary service processor is just waiting for a signal to take over the ownership in case of a failure. The array will normally receive a signal that tells it to switch from one service processor to the other one. This routine is called a “trespass” and happens so fast that you usually don’t really notice such a failover.

When we go back to the host, the connection state will be shown as active for that connection that is routed to the active service processor, and something like “standby” or “passive” for the connection that goes to the service processor that is not serving you that LUN. Also, since you have multiple connections, it’s not unlikely that the different paths can also have other properties that are different. Things like bandwith (you may have added a faster HBA later) or latency can be different. Due to the characteristics, the target ports might need to indicate how efficient a path is. And if a failure should occur, the link status might change, causing a path to go offline.

You can check the the status of a path to a LUN by asking the port on the storage array, the so called “target port”. For example, you can check the access characteristics of a path by sending the following SCSI command:

  • REPORT TARGET PORT GROUPS (RTPG)

Similar commands exist to actually set the state of a target port.

So where does ALUA come in?

What the ALUA interface does is allow an initiator (your server or the HBA in your server) to discover target port groups. Simply put, a group of ports that provide a common failover behavior for your LUN(s). By using the SCSI INQUIRY response, we find out to what standard the LUN adheres, if the LUN provides symmetric or asymmetric access, and if the LUN uses explicit or implicit failover.

To put it more simply, ALUA allows me to reach my LUN via the active and the inactive service processor. Oversimplified this just means that all traffic that is directed to the non-active service processor will be routed internally to the active service processor.

On a CLARiiON that is using ALUA mode this will result in the host seeing paths that are in an optimal state, and paths that are in an non-optimal state. The optimal path is the path to the active storage processor and is ready to perform I/O and will give you the best performance, and the non-optimal path is also ready to perform I/O but won’t give you the best performance since you are taking a detour.

The ALUA mode is available on CX-3 and CX-4, but the results you get can vary between both arrays. For example if you want to use ALUA with your vSphere installation you will need to use the CX-4 with FLARE 26 or newer and change the failover mode to “4”. Once you have changed the failover mode you will see a slightly different trespass behavior since you can now either manually initiate a trespass (explicit) or the array itself can perform a trespass once it’s noticed that the non-optimal path has received 128,000 or more I/Os than the optimal path (implicit).

Depending on which software you use – PowerPath or for example the native solution – you will find that ALUA is supported or not. You can take a look at Primus ID: emc187614 in Powerlink to get more details on supported configurations. Please note that you need a valid Powerlink account to access that Primus entry.


Actions

Information

9 responses

14 06 2010
ashwani duhan

Hi this link was useful to have a basic understanding of ALUA.
cheers!

16 06 2010
Bas Raayman

Thanks for the compliment, glad to hear it helped you.

26 07 2011
enealDC

Thanks a ton for explaining this. I was trying to understand ALUA

1 08 2011
Seven

Great!

16 10 2011
Greg

Good post, thanks. Spelling note: it’s asymmetrical with one s, not ass-ymmetrical. One s, 2 m’s.

21 06 2012
Gautam

nice post…so muc helpfull

4 07 2012
Andrew

Hi thanks very much for this. Another question, where could i find more details on A/A, A/A-A, A/P-F, A/P-G

29 03 2013
Eugene

I cant find any mention about ALUA in the SPC-2 standard (Revision 20
18 July 2001) As I see ALUA is only introduced in SPC-3. Please point me to the part of SPC-2 where you see ALUA, if I’m wrong.

17 03 2014
Ami Z

Thanks , very clear article !

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s




Follow

Get every new post delivered to your Inbox.

Join 3,443 other followers

%d bloggers like this: