VMAX VSA: IT’S ALIVE!!!!!!!!!!!!!!!!

31 08 2011

So folks, here’s a shameless copy of a blog post from one of the guys on my team. Dave was just brilliant and actually created a virtual storage appliance of the EMC VMAX. I think that’s downright awesome, and I wanted to help him get attention for what he did, so I asked him if I could copy his blog post, which is what you will find here:

As the title suggests there is indeed a Symmetrix VMAX VSA. I have been working on this project since shortly after EMC World. As I look back through my emails, I received the code on 6/3/11 and I have been working on it in almost all of my free time since then.

Now finally it will make its public debut this week at VMworld 2011 as part of the EMC Interactive Demo booth on the show floor. As part of its grand unveiling I thought I would tell you a little about what makes it work.

Now, to make a few things clear up front: this is a science project, I cannot distribute it, but it does “work”. As part of the lab (I will publish the guide) the student actually provisions an iSCSI disk from the VSA to an ESXi 5.0 host.

One of the first things I noticed about the code when trying to virtualize it: it’s HUGE. There are 2 parts to the VSA.

1. The Service Processor (SP). In a physical VMAX this is the 1U server that is racked in the system bay. It has a special image of Windows XP and contains all of the proprietary software used to manage a VMAX. If you own a VMAX this is what you will see EMC field service personnel using when they come to work on your system. This is NOT accessible by an end user, as it requires special RSA credentials that change weekly (one reason we can’t distribute it). Its specs are 2 vCPUs, 2GB of RAM and about 10GB of disk space.

2. Enginuity. This is the Operating Environment of the Symmetrix. For the purposes of this VSA it runs in a SuSE Enterprise Linux 11 VM. One of the big deals with the VMAX was that Enginuity was ported from a PowerPC CPU to an Intel x86 based architecture. Without this change this VSA would never exist. Now this VM is big, so big, as a matter of fact, that I had to use an RC build of vSphere 5 in order to even get it to work. I was finally able to scale it down a bit, but at one point it was using 32 vCPUs, 92GB of RAM and about 250GB of disk space.

Obviously one of the challenges for using this in a lab is that I needed it to use fewer resources. In the beginning this VMAX was a Single Engine model, which means it had 16 “slices” running. Each director has 4 DA (backend) directors and 4 FA (front end) directors. I quickly found this was the biggest reason I needed so much memory and CPU. After working with one developer, Chakib (who totally rocks, by the way), we were able to scale this down to 1 FA and 1 DA per director. One interesting side note: when I was going down this path I asked Chakib what kind of VM he was using to test this. His reply was, “I am not using this in a VM, I have a physical Linux box with 200GB of RAM”. So I clearly had some work to do. But in its current state it uses 8 vCPUs and “ONLY” 48GB of RAM, which is still pretty darn big, but a lot better than it was when we started.

The networking requirements are pretty simple. The SP needs 1 public NIC so that we can use its management tools, and 2 internal NICs, which are used for internal communication with the directors. In our case that’s the Linux VM. The Linux VM needs the 2 internal NICs and 1 NIC on which to present an iSCSI target. Then we put our ESXi host’s VMkernel NIC on the same vSwitch so it can use the iSCSI target provided by the VSA.

So that’s all great you say, but what actually works? That’s a good question.

What works is using Standard Devices, and very small ones today. One of the things I was told when I was given the code was that this WON’T and CAN’T do any I/O, which obviously proved to be a bit of an issue. Chakib really worked his butt off to get me something that does I/O. So this is not like the Celerra UBER VSA by @lynxbat, where you can run a VM off of it. We hope we can do that one day. Thin Pools work to the extent that you can create them and put devices in a pool, but when you present one to a host it will not work. This kept me from using the VSI SPM plugin for vSphere as part of my lab, but hey, we always have next year! The really neat part to me is that the internal tools (SymmWin) that run on the SP fully work. It’s like having an actual VMAX, but without all the fuss of getting a few 50A power drops. As an ex-customer this to me is the coolest part: I got to put on my own BIN files and use Inlines (an internal tool used to talk directly to the hardware). As a total nerd this thing is a dream come true.

So what’s next?

Well a lot of that depends on YOU! Since this is a total science project we need to show those in Symmetrix Engineering this is worth putting their time and money into. I need everyone here at VMworld this week to come try this thing, give me feedback, leave comments here, and if you aren’t at the show, express your desire for us to continue working on it. If no one is interested this will ultimately die on the vine. Please fill out this form so we can show how many of you all would like to see this project continue.

I have to give special thanks to Chad Sakac (@sakacc) and Chris Horn (@horn_Chris) for getting me involved in this project and letting me run with it, and for all of the support they gave me during this process.

Here is a link to the lab guide being used this week at VMworld. Take a look and let me know what you think!

VMAX Lab Guide

Big thanks to Matt Cowger (@mcowger), Scott Lowe (@scott_lowe), and Tee Glasgow (@teeglasgow) for their help with the lab guide. Also to Rick Scherer (@rick_vmwaretips) for the blog help.





Shorts: Trouble with symapi_db.bin causing erratic behavior

26 05 2010

Usually when you are connected to an EMC Symmetrix array you will install the Solutions Enabler package on your system. Solutions Enabler is both a set of tools to help you manage your Symmetrix arrays and an API. It creates a small database that records which Symmetrix arrays are connected to the host you are running the software on: the so-called SYMAPI database, which you will find as a file on your system called “symapi_db.bin”.

Under a normal situation you will run a discover process to initially scan and fill the database with entries. To do that you can issue the command:

symcfg discover

This will start the scan operation. Depending on the number of arrays and the configuration of those arrays, a scan can take anywhere from just under a minute up to several minutes. Once the file has been created you could try opening it and searching for strings inside it, and you will find a lot of information about devices, device paths, disk IDs and lots more.

Now, in some situations after your array configuration has changed, it is useful to refresh the database file. Under normal circumstances this should all be easily done and without any issues.

However, in some cases your database file might have problems that do not manifest themselves in any obvious way. I have seen cases where new devices would simply not show up. Other examples are error messages about disks that cannot be reached because of access control list errors.

If you happen to have some erratic behavior on one of your hosts, you might want to try one thing before creating a service request in Powerlink. You might want to try creating a copy of your database, removing it and then performing a new discover. Some steps to help you do just that:

  • Create a backup of your device and/or composite groups using the symdg/symcg commands.
  • Rename your old symapi_db.bin to something else.
  • Issue a “symcfg discover” to create a new symapi_db.bin
  • Import your device and/or composite groups from the backup file(s) you created.
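The "rename" step in the list above can be sketched in Python. This is only a minimal sketch under assumptions: the function name `set_aside_symapi_db` and the timestamped `.bak` naming are made up for illustration, and the location of symapi_db.bin differs per platform, so the path has to be supplied by you. The group backup and import, as well as the discover itself, are still done with the Solutions Enabler commands mentioned above.

```python
from datetime import datetime
from pathlib import Path


def set_aside_symapi_db(db_path, now=None):
    """Rename the existing database file so a new discover can recreate it.

    Returns the path of the renamed copy. The old file is never deleted,
    so it can be moved back if the fresh database does not help.
    The name and the backup naming scheme are illustrative assumptions.
    """
    db_path = Path(db_path)
    if not db_path.is_file():
        raise FileNotFoundError(f"no database file at {db_path}")

    # Timestamp the backup name so repeated runs never clobber each other.
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    backup_path = db_path.with_name(f"{db_path.name}.{stamp}.bak")
    if backup_path.exists():
        raise FileExistsError(f"refusing to overwrite {backup_path}")

    db_path.rename(backup_path)
    return backup_path
```

Call it with the full path to your own symapi_db.bin after you have backed up your groups, then run the discover by hand. Keeping the renamed file around means you can always go back to the old database.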

This won’t help you in all situations, but it helped me solve several cases where we were seeing erratic behavior on our hosts, and it might do the trick for you.





The thing about metas, SRDF/S and performance

8 01 2010

It’s not very common knowledge, but there is actually a link between the I/O performance you see on your server and the number of metas you configured when using SRDF/S.

I do a lot of stuff in our company and I tend to get pulled in to performance escalations, usually because I know my way around most modern operating systems and I know a bit about storage and about our applications and databases. Usually the problems all boil down to a common set of issues, and perhaps one day I will post a catalog of common performance troubleshooting tips here, but I wanted to use this post to write about something that was new to me and that I thought might be of use to you.

We have a customer with a large installation on Linux that was seeing performance issues in its average dialog response time. Now, for those who don’t know what a dialog response time is, it is the time it takes an SAP system to display a screen of information, process any data entered or requested there by the database and output the next screen with the requested information. It doesn’t include any time needed for network traffic or the time taken up by the front-end systems.

The strange thing was that the database reported fairly good response times and an excellent cache hit ratio, but also reported that any waits were produced by the disks it used. When we looked at the Symmetrix box behind it we could not see any heavy usage on the disks, and it appeared to be mostly “picking its nose”.

After a long time we got the suggestion that perhaps the SRDF/S mirroring was to blame for this delay. We decided to change to an RDF mode called “Adaptive Copy Write Pending” or ACWP and did indeed see a performance improvement, even though the database and storage box didn’t seem to show the same improvement that was seen in the dialog response time.

Then, someone asked a fairly simple question:

“How many meta members do you use for your LUNs?”

Now, the first thought with a question like that is usually along the lines of the number of spindles, short stroking and similar stuff. Until he said that the number of meta members also influences the performance when using SRDF/S. And that’s where it gets interesting, and I’m going to try and explain why.

To do that let’s first take a closer look at how SRDF works. SRDF/S usually gives you longer write response times. This is because you write to the first storage box, copy everything over to the second box, receive an acknowledgement from the second box and only then respond back to say that the write was ok. You have to take things like propagation delay and RDF write times into account.

Now, you also need to consider that when you are using the synchronous mode, you can only have 1 outstanding write I/O per hyper. That means that if your meta consists of 4 hyper volumes you get 4 outstanding write I/Os. If you create your meta out of more hyper volumes you also increase the maximum number of outstanding write I/Os, which gives you higher sustained write rates if your workload is spread evenly.

So, let’s say for example you have a host that is doing 8 KB write I/Os to a meta consisting of 2 hypers. The remote site is about 12 miles away and you have a write service time of 2 ms. Since there are 1000 ms in one second, each hyper can do roughly 500 IOPS, because you divide the 1000 ms by the service time of 2 ms: 1000 ms / 2 ms = 500

Now, with 2 hypers in your meta you would have roughly 8 MB/sec:
2 (hypers) x 500 IOPS x 8 KB = 8000 KB/sec.

And you can also see that if we increase the number of hypers, we also increase the maximum value. This is mostly true for random writes, and the behavior will be slightly different for sequential loads since these use a stripe size of 960 KB. And don’t forget that this is a cache to cache value since we are talking about the data being transferred between the Symmetrixes. We won’t receive a write commit until we get a write acknowledgement from the second storage array.
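To play with the numbers yourself, the arithmetic above can be put into a few lines of Python. This is just a rough sketch of the rule of thumb from this post (one outstanding write per hyper, evenly spread random writes); the function name `srdf_s_write_ceiling` is made up for illustration and the result is a ceiling, not a measurement.

```python
def srdf_s_write_ceiling(hypers, service_time_ms, io_size_kb):
    """Return (iops, kb_per_sec) for one meta under the rule of thumb above.

    Assumes one outstanding write I/O per hyper and a workload that is
    spread evenly across all hypers of the meta.
    """
    if hypers < 1 or service_time_ms <= 0 or io_size_kb <= 0:
        raise ValueError("hypers, service time and I/O size must be positive")

    # One write at a time per hyper: 1000 ms divided by the service time.
    iops_per_hyper = 1000.0 / service_time_ms
    total_iops = hypers * iops_per_hyper
    return total_iops, total_iops * io_size_kb


if __name__ == "__main__":
    # Same example as in the text (2 hypers), then more meta members.
    for hypers in (2, 4, 8):
        iops, kb_per_sec = srdf_s_write_ceiling(hypers, 2.0, 8)
        print(f"{hypers} hypers: {iops:.0f} IOPS, "
              f"about {kb_per_sec / 1000:.0f} MB/sec")
```

With 2 hypers, a 2 ms service time and 8 KB writes this gives the same 1000 IOPS and roughly 8 MB/sec as the example, and every doubling of the number of hypers doubles the ceiling.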

So, what we will be doing next are two things. We will be increasing the number of hypers for the metas that our customer is using. Besides that we will also be upgrading our Enginuity since we expect a slightly different caching behavior.

I’ll try to update this post once we have changed the values, just to give you a feel for the difference it made (or perhaps did not make), and I hope this information is useful for anyone facing similar problems.







