Setting up the EMC Symmetrix adapter on the vCenter Operations Manager 5 vApp

10 07 2012

I present a lot on vCenter Operations Manager, a pretty neat monitoring tool from VMware. I like this tool a lot: getting started with it is easy enough, you have a plethora of features once you dive deeper into it, and the best part? If you use the "big" version (Enterprise or Enterprise Plus, that is), you can even monitor your applications and non-virtualized infrastructure. To monitor things that go beyond your virtual machines, you can install so-called "adapters". In a nutshell, such an adapter is nothing more than a piece of software that tells vCenter Operations how to connect to things, and how to interpret the results it gets back. Now, EMC has created such an adapter for their VMAX and Symmetrix storage arrays, and has created a document that tells you how to set up and configure the adapter. That way, you can get loads of information from your storage system inside of vCenter Operations. Great stuff, right?

Yeah, OK, maybe not so great. The biggest problem is that the documentation seems to have been created for the normal installable version of vCenter Operations. However, VMware has also created a version in the form of an appliance, a so-called vApp. You download the files, deploy the vApp, enter the IP addresses of both virtual machines that are contained in the vApp, and away you go. Wonderfully easy to install, and besides certain limits in scalability, it offers pretty much the same functionality as the normal installer. This is where the problem starts if you want to use the EMC Symmetrix adapter.

You can find almost all adapters on the Integrien FTP site, and there's a folder containing all the files you need to get started with the Symmetrix adapter right here. My teammate Matt Cowger actually wrote a nice blog post on how to configure and set up the Symmetrix adapter. This works like a charm, except for one tiny thing that you will run into when using the vCenter Operations vApp. When you go to create an adapter instance, you need to give it a name, indicate if you want to auto-discover everything, and input a path to the "EMC Symmetrix Main Input Folder". This is the folder where you actually archive all of the performance and configuration data from your storage system. The documentation tells you that this should be:

  • If the main input folder is on a remote Windows machine, you must share the folder before you add the adapter instance. Do not map the main input folder. Windows services do not work with mapped drives.
  • If the main input folder is on a remote Linux machine, you must mount the folder to the Collector server before you add the adapter instance.

The problem being: if you actually have your Solutions Enabler host running on Windows, you need to input a UNC path in the format of \\servername\sharename. But the virtual machines inside the vApp do not come with any access methods for Windows shares. You won't find any tools like mount.cifs or smbclient, or even have the option to specify smbfs as the type of file system to mount. And that means what? Well, you have two options to overcome this situation. You can either install Services for Unix/Services for NFS on your Windows host and set up an NFS share on your Windows machine, or you can migrate your Solutions Enabler host to a Linux machine and set up everything there.

OK, so how do I configure this stuff under Linux? Glad you asked. You can follow some of the steps from the post that Matt created, but I'm going to write them down here anyway so you will have one page with all the steps you need. I'm going to assume that you have already set up your Linux machine, and that you have installed the Solutions Enabler package. Go into the daemon options file under /usr/emc/API/symapi/config and add the following lines at the end of the file, then make sure you save your changes (create a backup of the original first; this is always a good idea):

storstpd:dmn_run_spa = disable
storstpd:dmn_run_smc = disable
storstpd:dmn_run_ttp = enable
storstpd:dmn_run_ttp_on_sp = disable
storstpd:dmn_run_rtc = disable
storstpd:ttp_collection_interval = 5
storstpd:ttp_rdflnk_metrics = enable
storstpd:ttp_se_tcp_metrics = enable
storstpd:ttp_se_nw_metrics = enable
storstpd:ttp_se_nwi_metrics = enable
storstpd:ttp_dev_metrics = disable
storstpd:ttp_disk_metrics = disable
storstpd:ttp_dgdev_metrics = enable
storstpd:ttp_re_sg_metrics = enable
storstpd:ttp_re_nwc_metrics = enable
storstpd:use_compression = enable
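If you want a quick one-liner for that backup step, something like this works (the daemon_options file name is an assumption based on the usual Solutions Enabler layout; verify the exact file name on your install):

cp /usr/emc/API/symapi/config/daemon_options /usr/emc/API/symapi/config/daemon_options.bak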

Next, restart the storstpd daemon:

/opt/emc/SYMCLI/bin/stordaemon shutdown storstpd
/opt/emc/SYMCLI/bin/stordaemon start storstpd

Check if the daemon is up and running again by issuing the following command. The first line should show the Daemon State as “Running”:

/opt/emc/SYMCLI/bin/stordaemon show storstpd
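If you just want that state line, you can filter the output with plain grep:

/opt/emc/SYMCLI/bin/stordaemon show storstpd | grep -i "daemon state"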

Now, since the Analytics VM will actually be collecting the information from the adapter, it needs to be able to access the files on your Solutions Enabler host. The Analytics VM runs the collection process as a user called "admin", so we need to consider something: the admin user on the vCenter Operations appliance runs with a user ID (UID) of 1000 and a group ID (GID) of 1003. That means we should either install Solutions Enabler using a user with the same UID and GID, or map some things so that the admin user can actually access the files later on. In order to export the directory with the required files for the Symmetrix adapter, we will add the following line to /etc/exports:

/usr/emc/API/symapi/stp *(rw,insecure,all_squash,anonuid=0,anongid=0)

Obviously, this isn't the best you can do from a security perspective, so feel free to change these options as needed for your environment. Basically, what we are doing here is this (a locked-down variant follows the list):

  • The * means that all IP addresses have access. You can change this to, for example, the IP of the Analytics VM.
  • rw means that the export is created with read and write access.
  • insecure means that clients can use non-reserved ports.
  • all_squash means that all users get mapped to the anonymous user account.
  • anonuid=0 means that the anonymous user ID will get mapped to user ID 0. Be careful, since this is the root account!
  • anongid=0 means that the anonymous group ID will get mapped to group ID 0. Again, this is the root group!
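For instance, a somewhat safer export line would restrict access to the Analytics VM and map the anonymous user straight to the appliance's admin IDs. A minimal sketch (10.10.10.20 is just a placeholder for your Analytics VM's IP address):

/usr/emc/API/symapi/stp 10.10.10.20(rw,insecure,all_squash,anonuid=1000,anongid=1003)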

If you did install your Solutions Enabler as a different user, make sure that you map anonuid and anongid to the respective numerical IDs, to allow access to the files we are going to export. Now, we simply restart the NFS server, or have it re-read its configuration should it already be online, using:

/etc/init.d/nfsserver restart

or

exportfs -ra

We can check if the export is working, using the following command:

showmount -e localhost
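You can also run the same check from the Analytics VM against your Solutions Enabler host to confirm the export is reachable over the network; 10.10.10.10 is the placeholder Solutions Enabler IP also used in the fstab example below:

showmount -e 10.10.10.10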

Now, we create a scheduled job to archive the Solutions Enabler data. To do that, add the following line to your crontab (via crontab -e):

2-57/5 * * * * /opt/emc/SYMCLI/bin/stordaemon action storstpd -cmd archive

This will cause the job to start at 2 minutes past the hour and run at 5-minute intervals. Check under /usr/emc/API/symapi/stp/ttp to see if you have a new directory. Normally this directory should be named after the serial number of your storage array, and contain compressed files with the information the Symmetrix adapter will need.

The final thing to do right now is to log on to the Analytics VM and create a folder where we will mount the required files, for example a directory called /media/VMAX. Once you have created the directory, edit /etc/fstab to contain the following line:

10.10.10.10:/usr/emc/API/symapi/stp /media/VMAX nfs rw,lock 0 0

Make sure you change the IP address to match that of your Solutions Enabler host, and then mount the directory using the following command:

mount /media/VMAX
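To confirm the mount is active and that the archive directory shows up (it will appear once the cron job has run at least once):

mount | grep VMAX
ls -l /media/VMAX/ttp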

If you don't have a firewall blocking communication, you should now be able to traverse the subdirectories and access the files. Finally, you can configure the adapter and input the directory you just mounted as the "EMC Symmetrix Main Input Folder". In the text field, simply enter the following as the path:

/media/VMAX/ttp

If you test the adapter now, you should see it come back successfully, and after giving it a bit of time, start working with the data you are now importing from your VMAX/Symmetrix system. 🙂





VMAX VSA: IT’S ALIVE!!!!!!!!!!!!!!!!

31 08 2011

So folks, here’s a shameless copy of a blog post from one of the guys on my team. Dave was just brilliant and actually created a virtual storage appliance of the EMC VMAX. I think that’s downright awesome, and I wanted to help him get attention for what he did, so I asked him if I could copy his blog post, which is what you will find here:


As the title suggests there is indeed a Symmetrix VMAX VSA. I have been working on this project since shortly after EMC World. As I look back through my emails, I received the code on 6/3/11 and I have been working on it in almost all of my free time since then.

Now finally it will make its public debut this week at VMworld 2011 as part of the EMC Interactive Demo booth on the show floor. As part of its grand unveiling I thought I would tell you a little about what makes it work.

Now, to make a few things clear up front: this is a science project, I cannot distribute it, and it does "work". As part of the lab (I will publish the guide), the student actually provisions an iSCSI disk from the VSA to an ESXi 5.0 host.

One of the first things I noticed about the code when trying to virtualize it: it's HUGE. There are 2 parts to the VSA.

1. The Service Processor (SP). In a physical VMAX this is the 1U server that is racked in the system bay. It runs a special image of Windows XP and contains all of the proprietary software used to manage a VMAX. If you own a VMAX, this is what you will see EMC field service personnel using when they come to work on your system. This is NOT accessible by an end-user, as it requires special RSA credentials that change weekly (one reason we can't distribute it). Its specs are 2 vCPUs, 2 GB of RAM, and about 10 GB of disk space.

2. Enginuity. This is the Operating Environment of the Symmetrix. For the purposes of this VSA it runs in a SuSE Enterprise Linux 11 VM. One of the big deals with the VMAX was that Enginuity was ported from a PowerPC CPU to an Intel x86-based architecture; without this change, this VSA would never exist. Now, this VM is big. So big, as a matter of fact, that I had to use an RC build of vSphere 5 in order to even get it to work. I was finally able to scale it down a bit, but at one point it was using 32 vCPUs, 92 GB of RAM, and about 250 GB of disk space.

Obviously, one of the challenges of using this in a lab is that I needed it to use fewer resources. In the beginning this VMAX was a Single Engine model, which means it had 16 "slices" running: each director has 4 DA (back-end) directors and 4 FA (front-end) directors. I quickly found this was the biggest reason I needed so much memory and CPU. After working with one of the developers, Chakib (who totally rocks, by the way), we were able to scale this down to 1 FA and 1 DA per director. One interesting side note: when I was going down this path, I asked Chakib what kind of VM he was using to test this. His reply was, "I am not using this in a VM, I have a physical Linux box with 200GB of RAM". So I clearly had some work to do. But in its current state it uses 8 vCPUs and "ONLY" 48 GB of RAM, which is still pretty darn big, but a lot better than it was when we started.

The networking requirements are pretty simple: the SP needs 1 public NIC, so that we can use its management tools, and 2 internal NICs that are used for internal communication with the directors; in our case that's the Linux VM. The Linux VM needs the 2 internal NICs plus 1 NIC on which to present an iSCSI target. Then we put our ESXi host's VMkernel NIC on the same vSwitch so it can use the iSCSI target provided by the VSA.

So that’s all great you say, but what actually works? That’s a good question.

What works today is using Standard Devices, and very small ones at that. One of the things I was told when I was given the code was that this WON'T and CAN'T do any I/O, which obviously proved to be a bit of an issue. Chakib really worked his butt off to get me something that does I/O. So this is not like the Celerra UBER VSA by @lynxbat, where you can run a VM off of it; we hope we can do that one day. Thin Pools work to the extent that you can create them and put devices in a pool, but when you present it to a host it will not work. This kept me from using the VSI SPM plugin for vSphere as part of my lab. Hey, we always have next year! The really neat part to me is that the internal tools (SymmWin) that run on the SP fully work. It's like having an actual VMAX, but without all the fuss of getting a few 50A power drops. As an ex-customer, this to me is the coolest part: I got to put on my own BIN files and use Inlines (an internal tool used to talk directly to the hardware). As a total nerd, this thing is a dream come true.

So what’s next?

Well, a lot of that depends on YOU! Since this is a total science project, we need to show those in Symmetrix Engineering that this is worth putting their time and money into. I need everyone here at VMworld this week to come try this thing, give me feedback, and leave comments here; if you aren't at the show, express your desire for us to continue working on it. If no one is interested, this will ultimately die on the vine. Please fill out this form so we can show how many of you would like to see this project continue.

I have to give special thanks to Chad Sakac (@sakacc) and Chris Horn (@horn_Chris) for getting me involved in this project and letting me run with it, and also for all of the support they gave me during this process.

Here is a link to the lab guide being used this week at VMworld. Take a look and let me know what you think!

VMAX Lab Guide

Big thanks to Matt Cowger (@mcowger), Scott Lowe (@scott_lowe), and Tee Glasgow (@teeglasgow) for their help with the lab guide. Also to Rick Scherer (@rick_vmwaretips) for the blog help.





Shorts: Trouble with symapi_db.bin causing erratic behavior

26 05 2010

Usually when you are connected to an EMC Symmetrix array, you will install the Solutions Enabler package on your system. Solutions Enabler is basically both a set of tools to help you manage your Symmetrix arrays and an API. It creates a small database that records which Symmetrix arrays are connected to the host you are running the software on: the so-called SYMAPI database, which you will find as a file on your system called "symapi_db.bin".

Under a normal situation you will run a discover process to initially scan and fill the database with entries. To do that you can issue the command:

symcfg discover

This will start the scan operation; depending on the number of arrays and the configuration of those arrays, a scan can take anywhere from just under a minute up to several minutes. Once the file has been created, you could try opening it and searching for strings inside of it, and you will find a lot of information about devices, device paths, disk IDs and lots more.
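If you want to peek inside, something like this works (the path assumes the default Linux database location; it differs on other platforms):

strings /var/symapi/db/symapi_db.bin | less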

Now, in some situations after your array configuration has changed, it is useful to refresh the database file. Under normal circumstances this should all be easily done and without any issues.

However, in some cases your database file might be facing problems without manifesting them in any obvious way. I have seen cases where new devices would simply not show up; other examples are error messages about disks that cannot be reached because of access control list errors.

If you happen to see some erratic behavior on one of your hosts, you might want to try one thing before creating a service request in Powerlink: create a copy of your database, remove it, and then perform a new discover. Some steps to help you do just that (a command sketch follows the list):

  • Create a backup of your device and/or composite groups using the symdg/symcg commands.
  • Rename your old symapi_db.bin to something else.
  • Issue a “symcfg discover” to create a new symapi_db.bin.
  • Import your device and/or composite groups from the backup file(s) you created.
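A minimal sketch of those steps for a single device group; the group name "mydg", the backup file name, and the default Linux database path are all assumptions, and composite groups work analogously with symcg:

# back up the device group definition
symdg export mydg -f /tmp/mydg_backup.txt
# move the old database out of the way
mv /var/symapi/db/symapi_db.bin /var/symapi/db/symapi_db.bin.old
# build a fresh database
symcfg discover
# restore the device group from the backup
symdg import mydg -f /tmp/mydg_backup.txt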

This won't help you in all situations, but it helped me solve several cases where we were seeing erratic behavior on our hosts, and it might do the trick for you.





The thing about metas, SRDF/S and performance

8 01 2010

It's not very common knowledge, but there is actually a link between the I/O performance you see on your server and the number of meta members you configure when using SRDF/S.

I do a lot of stuff in our company, and I tend to get pulled into performance escalations, usually because I know my way around most modern operating systems and know a bit about storage and about our applications and databases. Usually the problems all boil down to a common set of issues, and perhaps one day I will post a catalog of common performance troubleshooting tips here, but I wanted to use this post to write about something that was new to me, and I thought it might be of use to you.

We have a customer with a large installation on Linux that was seeing performance issues in their average dialog response time. Now, for those who don't know what a dialog response time is: it is the time it takes an SAP system to display a screen of information, have any data entered or requested there processed by the database, and output the next screen with the requested information. It doesn't include any time needed for network traffic or the time taken up by the front-end systems.

The strange thing was that the database reported fairly good response times and an excellent cache hit ratio, but also reported that any waits were produced by the disks it used. When we looked at the Symmetrix box behind it, we could not see any heavy usage on the disks, and it reported to be mostly "picking its nose".

After a long time we got the suggestion that perhaps the SRDF/S mirroring was to blame for this delay. We decided to change to an RDF mode called "Adaptive Copy Write Pending" (ACWP) and did indeed see a performance improvement, even though the database and storage box didn't seem to show the same improvement that was seen in the dialog response time.

Then, someone asked a fairly simple question:

“How many meta members do you use for your LUNs?”

Now, the first thought with a question like that is usually along the lines of the number of spindles, short stroking and similar stuff. Until he said that the number of meta members also influences performance when using SRDF/S. And that's where it gets interesting, and I'm going to try and explain why.

To do that, let's first take a closer look at how SRDF works. SRDF/S usually gives you longer write response times. This is because you write to the first storage box, copy everything over to the second box, receive an acknowledgement from the second box, and only then respond back to say that the write was OK. You have to take things like propagation delay and RDF write times into account.

Now, you also need to consider that when you are using the synchronous mode, you can only have 1 outstanding write I/O per hyper. That means that if your meta consists of 4 hyper volumes, you get 4 outstanding write I/Os. If you create your meta out of more hyper volumes, you also increase the maximum number of outstanding write I/Os, which allows higher sustained write rates if your workload is spread evenly.

So, let's say for example you have a host that is doing 8 KB write I/Os to a meta consisting of 2 hypers. The remote site is about 12 miles away and you have a write service time of 2 ms. Since you have 1000 ms in one second, each hyper can do roughly 500 IOPS: you divide the 1000 ms by the service time of 2 ms, and 1000 ms / 2 ms = 500.

Now, with 2 hypers in your meta you would roughly have around 8 MB/sec:
2 (hypers) x 500 IOPS x 8 KB = 8000 KB/sec, or roughly 8 MB/sec.

And you can also see that if we increase the number of hypers, we also increase that maximum: with 8 hypers, the same math gives 8 x 500 IOPS x 8 KB, or roughly 32 MB/sec. This is mostly true for random writes; the behavior will be slightly different for sequential loads, since these use a stripe size of 960 KB. And don't forget that this is a cache-to-cache value, since we are talking about the data being transferred between the Symmetrixes: we won't receive a write commit until we get a write acknowledgement from the second storage array.

So, what we will be doing next are two things. We will be increasing the number of hypers for the metas that our customer is using. Besides that, we will also be upgrading our Enginuity, since we expect a slightly different caching behavior.

I'll try to update this post once we have changed the values, just to give you a feel for the difference it made (or perhaps did not make), and I hope this information is useful for anyone facing similar problems.







