GestaltIT, Performance, Storage, Tiering

“Storage tiering is dying.” But purple unicorns exist.

Chris Mellor over at the Register put an interview online with NetApp CEO Tom Georgens.

To quote from the Register piece:

He is dismissive of multi-level tiering, saying: “The simple fact of the matter is, tiering is a way to manage migration of data between Fibre Channel-based systems and serial ATA based systems.”

He goes further: “Frankly I think the entire concept of tiering is dying.”

Now, for those who are not familiar with the concept of tiering: it basically means moving data between faster and slower media in the background. Classically, tiering is something every organization is already doing. You consider the value of the information and, based on that, you decide whether this data should be instantly accessible from your more expensive hardware. Even at home you will see this: as the value of data decreases, you archive it to media with different performance characteristics, like a USB archiving disk, or by burning it to a DVD.

For companies, the more interesting part of tiering comes with automation. To put it simply, you want your data to be available on a fast drive when you need it, while it can remain on slower drives when you don't require it at that moment. Each vendor has its own specific implementation of how it tiers storage, but you will find this kind of technology from almost any of them.
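To make the idea of automated tiering concrete, here is a minimal sketch in Python. The tier names and the age thresholds are entirely made up for illustration; real arrays use far more sophisticated, vendor-specific policies based on actual access statistics:

```python
from datetime import datetime, timedelta

def pick_tier(last_access: datetime, now: datetime) -> str:
    """Choose a storage tier based on how recently the data was accessed.
    The day thresholds here are invented purely for illustration."""
    age = now - last_access
    if age < timedelta(days=1):
        return "flash"          # hot data on the fastest (most expensive) tier
    if age < timedelta(days=30):
        return "fibre_channel"  # warm data on mid-tier disks
    return "sata"               # cold data on cheap, slow disks

now = datetime(2010, 1, 1)
print(pick_tier(now - timedelta(hours=2), now))  # flash
print(pick_tier(now - timedelta(days=90), now))  # sata
```

The point is not the thresholds but the automation: the decision happens in the background, without anyone manually copying files to a slower disk.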

Apparently, NetApp has a different definition of tiering, since according to its CEO tiering is limited to the “migration of data between Fibre Channel-based systems and serial ATA based systems”. And this is where I heartily disagree with him. I purposely picked the example of home users, who are also using different tiers, and it’s no different for any of the storage vendors.

The major difference? They remove the layer of Fibre Channel drives in between the flash and SATA drives. They still tier their data to the medium that fits best. They will try to do that automatically (and hopefully succeed in doing so), but just don’t call it tiering anymore.

Like all vendors, NetApp is trying to remove the Fibre Channel drive layer, and I am convinced this will become possible as soon as flash drive prices become comparable to those of regular Fibre Channel drives, and tiering is automated to the point that any actions performed are transparent to the connected system.

But if NetApp doesn’t want to call it tiering, that’s fine by me; I just hope they don’t honestly expect customers to fall for it. The rest of the world will continue to call it tiering, while NetApp tries to sell you a purple unicorn that moves data around disk types as if by magic.

DMX, EMC, Enginuity, Performance, Storage, Symmetrix

The thing about metas, SRDF/S and performance

It’s not very common knowledge, but there is actually a link between the I/O performance you see on your server and the number of meta members you configure when using SRDF/S.

I do a lot of stuff in our company and I tend to get pulled in to performance escalations, usually because I know my way around most modern operating systems and know a bit about storage and about our applications and databases. The problems usually boil down to a common set of issues, and perhaps one day I will post a catalog of common performance troubleshooting tips here, but I wanted to use this post to write about something that was new to me, and I thought it might be of use to you.

We have a customer with a large SAP installation on Linux that was seeing performance issues in its average dialog response time. Now, for those who don’t know what a dialog response time is: it is the time it takes an SAP system to display a screen of information, process any data entered or requested there through the database, and output the next screen with the requested information. It doesn’t include any time needed for network traffic or the time taken up by the front-end systems.

The strange thing was that the database reported fairly good response times and an excellent cache hit ratio, but also reported that what waits there were came from the disks it used. When we looked at the Symmetrix box behind it we could not see any heavy usage on the disks, and it reported to be mostly “picking its nose”.

After a long time we got the suggestion that perhaps the SRDF/S mirroring was to blame for this delay. We decided to change to an RDF mode called “Adaptive Copy Write Pending” (ACWP) and did indeed see a performance improvement, even though the database and the storage box didn’t seem to show the same improvement that was visible in the dialog response time.

Then, someone asked a fairly simple question:

“How many meta members do you use for your LUNs?”

Now, the first thought with a question like that usually runs along the lines of the number of spindles, short stroking and similar stuff. Until he said that the number of meta members also influences performance when using SRDF/S. And that’s where it gets interesting, so I’m going to try and explain why.

To do that, let’s first take a closer look at how SRDF works. SRDF/S usually gives you longer write response times. This is because you write to the first storage box, copy everything over to the second box, receive an acknowledgement from the second box, and only then report back to the host that the write completed. You have to take things like propagation delay and RDF write times into account.

Now, you also need to consider that when you are using synchronous mode, you can only have 1 outstanding write I/O per hyper. That means that if your meta consists of 4 hyper volumes, you get 4 outstanding write I/Os. If you build your meta out of more hyper volumes, you increase the maximum number of outstanding write I/Os, which means higher sustained write rates if your workload is spread evenly.

So, let’s say for example you have a host that is doing 8 KB write I/Os to a meta consisting of 2 hypers. The remote site is about 12 miles away and you have a write service time of 2 ms. Since there are 1000 ms in one second, each hyper can do roughly 500 IOPS, because you divide the 1000 ms by the service time of 2 ms: 1000 ms / 2 ms = 500.

Now, with 2 hypers in your meta you would get roughly 8 MB/s:
2 (hypers) x 500 IOPS x 8 KB = 8,000 KB/s.

And you can also see that if we increase the number of hypers, we also increase the maximum throughput. This is mostly true for random writes; the behavior will be slightly different for sequential loads, since these use a stripe size of 960 KB. And don’t forget that this is a cache-to-cache value, since we are talking about the data being transferred between the Symmetrixes. We won’t receive a write commit until we get a write acknowledgement from the second storage array.
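The arithmetic above can be sketched in a few lines of Python. This is a back-of-the-envelope model for random writes only, built on the one-outstanding-write-per-hyper rule described above; the function name is mine, and the model ignores the sequential-load and caching subtleties just mentioned:

```python
def srdfs_max_write_throughput(hypers: int, service_time_ms: float,
                               io_size_kb: float) -> float:
    """Rough ceiling on random write throughput (in KB/s) under SRDF/S.

    With one outstanding write per hyper, each hyper can complete at
    most 1000 / service_time_ms writes per second, so the ceiling
    scales linearly with the number of hypers in the meta."""
    iops_per_hyper = 1000.0 / service_time_ms
    return hypers * iops_per_hyper * io_size_kb

# The example from the post: 2 hypers, 2 ms RDF write time, 8 KB writes.
print(srdfs_max_write_throughput(2, 2.0, 8) / 1000)  # 8.0 (MB/s)
# Doubling the hyper count doubles the ceiling:
print(srdfs_max_write_throughput(4, 2.0, 8) / 1000)  # 16.0 (MB/s)
```

This also makes it obvious why adding meta members helps: the service time per write doesn’t change, but the number of writes in flight does.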

So, what we will be doing next are two things. We will be increasing the number of hypers for the metas our customer is using. Besides that, we will also be upgrading our Enginuity code, since we expect slightly different caching behavior.

I’ll try to update this post once we have changed the values, just to give you a feel for the difference it made (or perhaps did not make), and I hope this information is useful for anyone facing similar problems.