VMworld 2012 – Call for voting and a jiffy?!

30 05 2012

The Twitter world has been slightly abuzz. The reason? Well, a couple of weeks ago people were allowed to submit session proposals on VMworld.com. Basically, the call for papers is a way for folks to say “Hey, this is a cool idea for a session I have. This is what I would like to talk about.” You submitted that on the site, and a first selection was made from the submissions before they were put online.

What do you need to do now? Well, you need to vote! If you go to VMworld.com you can click on the “Call for Papers Public Voting” link, and then cast a vote for the sessions you would like to see at VMworld. The only thing you need is a registered account at VMworld.com, and if you don’t have an account, you can create one here.

Once you are on the site, just browse through the sessions, and click on the thumb symbol in front of the session to cast your vote. It’s as easy as that, and you can vote for all the sessions that seem interesting to you (and others).

And while you are browsing, why not also take a quick look at session number 1665? This was submitted by my colleague Jonas Rosland and myself, and is titled:

Automagically Set-up Your Private Cloud Lab Environment: From Empty Box to Infrastructure as a Service in a Jiffy!

After casting your vote, it should look like this:

In the session, we will cover setting up a fully automated vCloud Director deployment in your lab environment. We start off with an empty server and teach you how to automate the installation of a full cloud infrastructure with ESXi, vCenter, vCloud Director and vApps, combined with the power of vCenter Orchestrator. And with all of this combined, you’ll be done in a jiffy!

If you think it would be interesting, we are both thankful for your vote! :)





vSphere 5, it’s here! What’s new?

12 07 2011

It’s here, it’s here. ;)

VMware just announced their new version of vSphere 5, and as you have probably found out, general availability is targeted toward August this year. There is a whole bunch, and I mean a whole bunch, of new stuff coming out, and everyone knows what we can expect at VMworld this year.

Let me be clear that this post is in no way trying to sum up all the new things that are introduced with vSphere 5. It is meant to give you a quick and easy-to-consume overview of some of the major new features.

Key stuff that is new or has changed in vSphere 5:

  • Virtual hardware limits. We can now address 32 virtual CPUs and a maximum of 1TB of RAM (note that virtual machine hardware type 8 is required). We see people running larger and larger workloads, and are seeing more and more people moving their tier 1 applications to a virtualized environment. Anyone who has tried to virtualize a large database or business warehouse system will know what I mean.

    One word of caution though. Even though we can now create very large installations, be careful. This is not a sensible size for all applications, and you should check on an application specific basis if you really need something this big, and are able to leverage all of the resources it offers.

  • VMFS version 5. With the updated version of the VMFS there are some modifications being made. For one, you no longer need to use extents to create volumes over 2TB in size, and they have added support for physical RDMs that are over 2TB.
  • The service console is missing. Well, it's not really missing, but there is no more service console, due to the fact that you will now only find ESXi as the hypervisor. Although some people might be missing some things without the traditional ESX service console, this does offer some advantages like having only a single vSphere package, hardened security and fewer patches. But this should probably be one of the changes that almost everyone has seen coming, so I'm not going to go in to the depths on the pros and cons of this choice.
  • VAAI has again been enhanced. With vSphere 5, there are enhancements for both block and file based storage:
    • for block:
      • Thin provision stun has been added, which is basically an option to get feedback when a thinly provisioned LUN is full. You will now get a message back from the array, and the affected guests are “stunned”. This allows the admin to add some more free space to the LUN, after which the guests can resume normal operation.
      • Space reclaim is the second feature that has been added. Now, one caveat is that this hardware offload is dependent on VMFS version 5. Anything prior to that won’t do the job. If that prerequisite is in place, any blocks that are freed up by VMFS operations, things like VM deletion or snapshot deletion, will now be returned to the pool of free blocks.
    • For file:
      • You can now use NFS full copy. Somewhat similar to the block version, copying of files can now be offloaded to the array, which of course should speed up things like clone creation.
      • Extended stats adds the ability to get extended information about files. Information about the actual space allocation, or whether the file is deduplicated, can now be retrieved.
      • We can now use space reservation, to actually pre-initialize a disk and allocate all of the required space right off the bat.
  • VMware has redesigned HA. The new architecture should help people who want to work with stretched clusters.

    Basically, VMware has moved away from the underlying EMC Autostart based construct to an entirely new model. The HA agent is now called the FDM, and one of the nodes in the cluster will now take on the role of master. All of the remaining nodes in the cluster are slaves to this master, which means that we are no longer using the primary/secondary concept that was common with the previous version of HA. During normal operation, we should only see one master node per cluster.

    Benefits of the new construct are that we are no longer that susceptible to DNS issues. Also, VMware has added additional communication paths (we can now also leverage so-called “heartbeat datastores”) that will aid us in the detection of failures. And, as a bonus, VMware has also added support for IPv6.

    Since the entire HA stack has been rewritten, there are a lot of changes coming, and I’m planning on getting down to the nitty gritty in a future post, and I’m sure that my friend Duncan will also be explaining this in great detail on his blog.

  • VASA, or “vSphere Storage API for Storage Awareness” is basically a way for the storage array to actually tell vSphere what it can do, or what it is currently doing. Imagine getting feedback if your storage is capable of VAAI. Or something simpler, like telling vSphere what RAID level a datastore has. Sounds sensible, right? Now combine that with the new Storage DRS in vSphere 5, and you get a pretty good picture of what VASA can help you with.
  • Storage DRS. The DRS feature in vSphere is already pretty well known, and it’s something that I see in use a lot at customer sites.

    Well, now you can also use DRS for your storage. To enable this feature, you create a so called “datastore cluster”, which is in essence nothing more than several datastores combined. Now, when you create a new VM, it is placed inside of a datastore cluster, and storage DRS balances everything out based on some key criteria like space utilization and I/O latencies. More to follow in a different post.

Now, this is by no means a complete overview, and I’ll be going in to these and other new features in upcoming posts. And I don’t want to flood you with information that can also be found on plenty of other blogs out there, but this should give you a good start. Look back for the things mentioned up here, but also for things like the added support for software based FCoE initiators, APD / PDL, the vSphere storage appliance, the new SRM 5 or vCloud Director 1.5.





Shorts: VMware vCloud Director not displaying the web portal

30 03 2011

A colleague of mine approached me today with a question on our vCloud Director environment. He tried to log in to the vCloud Director portal, and was unable to log in, because there was no page being displayed at all.

After checking if I was able to ping the interface, I logged on to the machine to see if there were any obvious errors. The vCloud Director daemon was still running and so was the database, but a netstat did not show any listeners on the vCloud Director IP. So, after going over the vCloud Director log files, there was a pretty obvious error in the vcloud-container-info.log:

ORA-28001: the password has expired
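If you want to scan a log for Oracle errors instead of reading through it by hand, a few lines of Python will do. This is only a minimal sketch: the function name and the sample log lines are made up for illustration, and the only assumption about the log format is that Oracle errors show up as "ORA-" followed by five digits, as in the message above.

```python
import re

# Oracle error codes look like "ORA-" followed by five digits, e.g. ORA-28001.
ORA_PATTERN = re.compile(r"ORA-\d{5}")


def find_ora_errors(lines):
    """Return (line_number, code, text) for every Oracle error code found."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for code in ORA_PATTERN.findall(line):
            hits.append((number, code, line.strip()))
    return hits


if __name__ == "__main__":
    # Hypothetical log excerpt; real vcloud-container-info.log lines look different.
    sample = [
        "INFO  | starting application",
        "ERROR | could not connect: ORA-28001: the password has expired",
    ]
    for number, code, text in find_ora_errors(sample):
        print(f"line {number}: {code} -> {text}")
```

To use it on a real file, pass the open file object (or a list of its lines) to find_ora_errors.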

So, you now stop your vCloud Director daemon and switch to your Oracle user to see what was going on inside of the DB:
sqlplus "/ as sysdba"

Now, list all the users to see if they have an expired password:
select username,ACCOUNT_STATUS,EXPIRY_DATE from dba_users;

Or display just the specific user:
select username,ACCOUNT_STATUS,EXPIRY_DATE from dba_users where username='VCLOUD';

And guess what came up:
USERNAME ACCOUNT_STATUS EXPIRY_DA
-------- -------------- ---------
VCLOUD EXPIRED 17-MAR-11

Expired is something that you don’t want to see for a user that is being used actively. So, let’s set the password again and unlock the user:
alter user VCLOUD identified by replace_this_with_your_password;
alter user VCLOUD account unlock;

So, once that is done, let’s check one more time:
SQL> select username,ACCOUNT_STATUS,EXPIRY_DATE from dba_users where username='VCLOUD';
USERNAME ACCOUNT_STATUS EXPIRY_DA
-------- -------------- ---------
VCLOUD OPEN 26-SEP-11

Now, start your vCloud Director daemon again, and in the log file you should no longer see the error, and the web interface should be working normally again.

Update – April 11th 2011:

One of my other colleagues actually ran in to the same issue and found my blog post. He gave me some feedback asking if I would not be able to add how to find the sqlplus binaries since not everyone is a Linux master, so here goes:

Normally, if Oracle is installed on Linux, it is one of the prerequisites to set the environment variables. Basically this means that you tune your system to allow Oracle to run on it. You perform tasks like telling the system how much shared memory to use, you set semaphores and create a separate user under which the Oracle installation runs.

Part of these tasks usually also means setting the path to the Oracle binaries for this user I just mentioned. Now, in some situations, your database is already installed, but you don’t know as what user or in what directory. This is not necessarily an issue. Just use the “ps” command to list all processes from all users. Use something like:

ps -ef
or
ps auxf

and look for the Oracle processes. At the start of the line you should see as which user these processes are running.
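If the process list is long, you can let a small script pick out the owner for you. The sketch below assumes that the database shows up with a background process named ora_pmon_ followed by the instance name, which is the usual naming on Linux, and that the user is in the first column, as it is for both ps variants above. The function name and the sample output are hypothetical.

```python
def oracle_process_owners(ps_output):
    """Return the set of users that own an Oracle PMON background process."""
    owners = set()
    for line in ps_output.splitlines():
        fields = line.split()
        # The first column is the user; the command is somewhere in the rest.
        if fields and any(f.startswith("ora_pmon_") for f in fields[1:]):
            owners.add(fields[0])
    return owners


if __name__ == "__main__":
    # Made-up "ps -ef" output, only here to show the expected shape.
    sample = (
        "UID        PID  PPID  C STIME TTY          TIME CMD\n"
        "oracle    2301     1  0 09:12 ?        00:00:01 ora_pmon_ORCL\n"
        "root      2400     1  0 09:13 ?        00:00:00 /usr/sbin/sshd\n"
    )
    print(oracle_process_owners(sample))
```

Feed it the text that ps prints, and whatever comes back is a candidate for the user to switch to.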

Once you have identified the user, switch to said user, using the following command:

su - user_name
Obviously, replace the user_name with the actual username. The “su” (or “switch user” if you will) is a command to actually switch to a different user. The dash or minus sign that is appended after the “su” command makes “su” start a login shell, so the environment is set up as if you had actually logged in as the specified user. Without the dash, you mostly keep the environment of the user you came from.

The benefit of adding the dash, is that the user environment is set correspondingly, meaning that your path for the Oracle user is also set. This in turn means, that you normally don’t have to worry about the exact path to the Oracle installation. Normally you can just enter “sqlplus” in the way described above, and you should be set.

Should you still not be able to find sqlplus, you can try using the “find” command to search for sqlplus. Try using something like this:

find / -name sqlplus
This actually tells the find command to start searching in the “/” or root-directory for files that are named sqlplus. Depending on your Linux release, you could also change the “-name” option to “-iname”, which changes the search to ignore the case in your search. This way, your search would also return a result if the binary were called SQLplus (most Unices and Linux installations are case sensitive).

Once you have found your sqlplus binary, just enter the full path to the binary and you should be set.
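The same two steps (check the path first, then search the disk) can also be sketched in Python. Both function names are my own invention. Note that the tree search, just like the find command above, only looks at the file name and does not check whether the file is actually executable.

```python
import os
import shutil


def search_tree(name, search_root):
    """Walk search_root and return the first file called name, or None."""
    # os.walk silently skips directories it is not allowed to read.
    for directory, _subdirs, files in os.walk(search_root):
        if name in files:
            return os.path.join(directory, name)
    return None


def locate_sqlplus(search_root="/"):
    """Return a path to sqlplus: the PATH is tried first, then the file system."""
    # This is what normally succeeds right after "su - user_name".
    on_path = shutil.which("sqlplus")
    if on_path:
        return on_path
    # Fall back to something similar to "find / -name sqlplus".
    return search_tree("sqlplus", search_root)
```

Calling locate_sqlplus() without arguments searches from the root directory, which can take a while on a large system, so pass a narrower starting directory if you have a hunch where Oracle lives.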

If you have any other feedback, just let me know folks, and I’ll be more than happy to append it to my post.





What is VAAI, and how does it add spice to my life as a VMware admin?

27 11 2010


I spent some days in Cork, Ireland this week presenting to a customer. Besides the fact that I’m now almost two months in to my new job, and I’m loving every part of it, there is one part that is extremely cool about my job.

I get to talk to customers about very cool and new technology that can help them get their job done! And while it’s in the heart of every techno loving geek to get caught up in bits and bytes, I’ve noticed one thing very quickly. The technology is usually not the part that is limiting the customer from doing new things.

Everybody knows about that last part. Sometimes you will actually run in to a problem, where some new piece of kit is wreaking havoc and we can’t seem to put our finger on what the problem is. But most of the time, we get caught up in entirely different problems altogether. Things like processes, certifications (think of ISO, SOX, ITIL), compliance, security or just something as “simple” as people who don’t want to learn something new or feel threatened because their role might be changing.

And this is where technology comes in again. I had the ability to talk about several things to this customer, but one of the key points was that technology should help make my life easier. One of the cool new things that will actually help me in that area was a topic that was part of my presentation.

Some of the VMware admins already know about this technology, and I would say that most of the folks that read blogs have already heard about it in some form. But when talking to people at conventions or in customer briefings, I get to introduce folks over and over to a new technology called VAAI (vStorage API for Array Integration), and I want to explain again in this blog post what it is, and how it might be able to help you.

So where does it come from?

Well, you might think that it is something new. And you would be wrong. VAAI was introduced as a part of the vStorage API during VMworld 2008, even though the release of the VAAI functionality to the customers was part of the vSphere 4.1 update (4.1 Enterprise and Enterprise Plus). But VAAI isn’t the entire vStorage API, since that consists of a family of APIs:

  • vStorage API for Site Recovery Manager
  • vStorage API for Data Protection
  • vStorage API for Multipathing
  • vStorage API for Array Integration

Now, the “only API” that was added with the update from vSphere 4.0 to vSphere 4.1 was the last API, called VAAI. I haven’t seen any of the roadmaps yet that contain more info about future vStorage APIs, but personally I would expect to see even more functionality coming in the future.

And how does VAAI make my life easier?

If you read back a couple of lines, you will notice that I said that technology should make my life easier. Well, with VAAI this is actually the case. Basically what VAAI allows you to do is offload operations on data to something that was made to do just that: the array. And it does that at the ESX storage stack.

As an admin, you don’t want your ESX(i) machines to be busy copying blocks or creating clones. You don’t want your network being clogged up with storage vMotion traffic. You want your host to be busy with compute operations and with the management of your memory, and that’s about it. You want as much reserve as you can on your machine, because that allows you to leverage virtualization more effectively!

So, this is where VAAI comes in. Using the API that was created by VMware, you can now use a set of SCSI commands:

  • ATS: This command helps you out with hardware assisted locking, meaning that you don’t have to lock an entire LUN anymore but can now just lock the blocks that are allocated to the VMDK. This can be of benefit, for example when you have multiple machines on the same datastore and would like to create a clone.
  • XCOPY: This one is also called “full copy” and is used to copy data and/or create clones, without all of the data being sent back and forth to your host. After all, why would your host need the data if everything is stored on the array already?
  • WRITE-SAME: This one is also known as “bulk zero” and will come in handy when you create the VM. The array takes care of writing zeroes on your thin and thick VMDKs, and helps out at creation time for eager zeroed thick (EZT) guests.

Sounds great, but how do I notice this in reality?

Well, I’ve seen several scenarios where for example during a storage vMotion, you would see a reduction in CPU utilization of 20% or even more. In the other scenarios, you normally should also see a reduction in the time it takes to complete an operation, and the resources that are allocated to perform such an operation (usually CPU).

Does that mean that VAAI always reduces my CPU usage? Well, in a sense: yes. You won’t always notice a CPU reduction, but one of the key criteria is that with VAAI enabled, all of the SCSI operations mentioned above should always perform faster than without VAAI enabled. That means that even when you don’t see a reduction in CPU usage (normally you will), the operations finish sooner, so you get your CPU power back more quickly.

Ok, so what do I need, how do I enable it, and what are the caveats?

Let’s start off with the caveats, because some of these are easy to overlook. The hardware offload will not be used when:

  • The source and destination VMFS volumes have different block sizes
  • The source file type is RDM and the destination file type is non-RDM (regular file)
  • The source VMDK type is eagerzeroedthick and the destination VMDK type is thin
  • The source or destination VMDK is any sort of sparse or hosted format
  • The logical address and/or transfer length in the requested operation are not aligned to the minimum alignment required by the storage device (all datastores created with the vSphere Client are aligned automatically)
  • The VMFS has multiple LUNs/extents and they are all on different arrays

Or short and simple: “Make sure your source and target are the same”.

Key criteria to use VAAI are the use of vSphere 4.1 and an array that supports VAAI. If you have those two prerequisites set up you should be set to go. And if you want to be certain you are leveraging VAAI, check these things:

  • In the vSphere Client inventory panel, select the host
  • Click the Configuration tab, and click Advanced Settings under Software
  • Check that these options are set to 1 (enabled):
    • DataMover/HardwareAcceleratedMove
    • DataMover/HardwareAcceleratedInit
    • VMFS3/HardwareAcceleratedLocking

Note that these are enabled by default. And if you need more info, please make sure that you check out the following VMware knowledge base article: 1021976.
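If you have a lot of hosts to check, you could collect the values per host and let a small script point out the odd ones. The sketch below only covers the comparison. How you get the values off the host is left out on purpose, and the function name and the dictionary layout are assumptions of mine, not something that comes from VMware.

```python
# The three advanced settings listed above; a value of 1 means enabled.
VAAI_OPTIONS = (
    "DataMover/HardwareAcceleratedMove",
    "DataMover/HardwareAcceleratedInit",
    "VMFS3/HardwareAcceleratedLocking",
)


def disabled_vaai_options(settings):
    """Return the VAAI options that are missing or not set to 1."""
    return [name for name in VAAI_OPTIONS if int(settings.get(name, 0)) != 1]


if __name__ == "__main__":
    # Hypothetical values for a single host.
    host_settings = {
        "DataMover/HardwareAcceleratedMove": 1,
        "DataMover/HardwareAcceleratedInit": 1,
        "VMFS3/HardwareAcceleratedLocking": 0,
    }
    print(disabled_vaai_options(host_settings))
```

An empty list means all three options are enabled on that host.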

Also, one last word on this. I really feel that this is a technology that will make your life as a VMware admin easier, so talk to your storage admins (if that person isn’t you in the first place) or your storage vendor and ask if their arrays support VAAI. If not, ask them when they will support it. Not because it’s cool technology, but because it’s cool technology that makes your job easier.

And, if you have any questions or comments, please hit me up in the remarks. I would love to see your opinions on this.

Update: 2010-11-30
VMware guru and Yellow Bricks mastermind Duncan Epping was kind enough to point me to a post of his from earlier this week, that went in to more detail on some of the upcoming features. Make sure you check it out right here.





Shorts: VMware vCloud Director installation tips

29 10 2010

So folks, I helped a colleague install the VMware vCloud Director. In case you are not aware of what the vCloud Director is I can give you a very rough description.

Think about how you deploy virtual machines. Usually you will deploy one machine at a time, which is a good thing if you only need one server. But usually in larger environments, you will find that applications or application systems are not based on a single server. You will find larger environments that consist of multiple servers that will segregate functions, so for example, your landscape could consist of a DB server, an application server, and one or more proxies that provide access to your application servers.

If you are lucky, the folks installing everything will only request one virtual machine at a time. Usually that isn’t the case though. Now, this is where vCloud Director comes in. This will allow you to roll out a set of virtual machines at a time as a landscape. But it doesn’t stop there, since you can do a lot more: you can pool things like storage and networks, and you get a tight integration with vShield to secure your environment. But this should give you a very rough idea of what you can do with the vCloud Director. For a more comprehensive overview, take a look at Duncan’s post here.

Anyway, let’s dig in to the technical part.

There are plenty of blog posts that cover how to set up the CentOS installation, so I won’t cover that at great length. If you are looking for that info, take a peek here. If you want to install the Oracle DB on CentOS, take a look here to see how it’s done.

Here are some tips that might come in useful during the install:

  • Use the full path to the keytool. There is a slight difference between /usr/bin/keytool, /usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre/bin/keytool and /opt/vmware/cloud-director/jre/bin/keytool. Be sure to use one of those, and if the commands to create and import your self-signed certificates are not working for some reason be sure to try a different one.

If you simply created a database and only browsed through the installation guide, you might have a hard time once you install the binary. Basically you run the “dbca” tool to create an empty database. If you by any chance forget to create the database files and run the installation binary (or the vCD configuration tool for that matter), you will receive an error while running the .sql database initialization scripts under /opt/vmware/cloud-director/db/oracle. The error message will tell you that there was an error creating the database.

Well, if only you had read the installation guide properly. Basically what you do is start up the database:

sqlplus "/ as sysdba"
startup

Make sure that the path you use in the “create tablespace” command actually exists. If it doesn’t, you need to perform “mkdir $ORACLE_HOME/oradata” first. Then create the tablespaces and corresponding files:

Create Tablespace CLOUD_DATA datafile '$ORACLE_HOME/oradata/cloud_data01.dbf' size 1000M autoextend on;
Create Tablespace CLOUD_INDX datafile '$ORACLE_HOME/oradata/cloud_indx01.dbf' size 500M autoextend on;

Now create a separate user that we will give rights on the database. The password for the user is the thing you type after “identified by”:

create user vcloud identified by vcloud default tablespace CLOUD_DATA;

Make sure that you give the user the correct rights to perform all the DB operations:

grant CONNECT, RESOURCE, CREATE TRIGGER, CREATE TYPE, CREATE VIEW, CREATE MATERIALIZED VIEW, CREATE PROCEDURE, CREATE SEQUENCE, EXECUTE ANY PROCEDURE to vcloud;

Now run the setup script, or run the configure script and you should be set to go.





Virtualization makes me say “Who cares about your hardware or operating system?!”

30 07 2010

When you come to think about it, people who work in the IT sector are all slightly nuts. We all work in an area that is notorious for trying to make itself not needed. When we find repetitive tasks, we try to automate them. When we have a feeling that we can improve something, we do just that. And by doing that, we try to remove ourselves from the equation where we possibly can. In a sense, we try to make ourselves invisible to the people working with our infrastructure, because a happy customer is one that doesn’t even notice that we are there or did something to allow him to work.

Traditional IT shops were loaded with departments that were responsible for storage, for networking, for operating systems and loads more. The one thing that each department has in common? They tried to make everything as easy and smooth as possible. Usually you will find loads of scripts that perform common tasks, automated installations and processes that intend to remove the effort from the admins.

In comes a new technology that allows me to automate even more, that removes the hassle of choosing the right hardware. That helps me reduce downtimes because of (un)planned maintenance. It also helps me reduce worrying about operating system drivers and stuff like that. It’s a new technology that people refer to as server virtualization. It’s wonderful and helps me automate yet another layer.

All of the people who are in to tech will now say “cool! This can help me make my life easier”, and your customer will thank you because it’s an additional service you can offer, and it helps your customer work. But the next question your customer is going to ask you is probably going to be something along the lines of “Why can’t I virtualize the rest?”, or perhaps even “Why can’t I virtualize my application?”. And you know what? Your customer is absolutely right. Companies like VMware are already sensing this, as can be read in an interview at GigaOM.

The real question your customer is asking is more along the lines of “Who cares about your hardware or operating system?!”. And as much as it pains me to say it (being a person who loves technology), it’s a valid question. When it comes to true virtualization, why should it bother me if I am running on Windows, Unix, Mac or Linux? Who cares if there is an array in the background that uses “one point twenty-one jiggawatts” to transport my synchronously mirrored historic data back to the future?

In the long run, I as a customer don’t really care about either software or hardware. As a customer I only care about getting the job done, in a way that I expected to, and preferably as cheap as possible with the availability I need. In an ideal world, the people and the infrastructure in the back are invisible, because that means they did a good job, and I’m not stuck wondering what my application runs on.

This is the direction we are working towards in IT. It’s nothing new, and the concept of doing this in a centralized/decentralized fashion seems to change from decade to decade, but the only thing that remained a constant was that your customer only cared about getting the job done. So, it’s up to us. Let’s get the job done and try to automate the heck out of it. Let’s remove ourselves from the equation, because time that your customer spends talking to you is time spent not using his application.





How do you define high availability and disaster recovery?

7 07 2010

A while back I was on a call with someone who asked me the difference between high availability (HA) and disaster recovery (DR), saying that there are so many different solutions out there and that a lot of people seem to use the terminology but are unable to explain anything more about these two descriptions. So, here’s an attempt to demystify things.

First of all, let’s take a look at the individual terms:

High Availability:

According to Wikipedia, you can define availability in the following ways:

The degree to which a system, subsystem, or equipment is operable and in a committable state at the start of a mission, when the mission is called for at an unknown, i.e., a random, time. Simply put, availability is the proportion of time a system is in a functioning condition.

The ratio of (a) the total time a functional unit is capable of being used during a given interval to (b) the length of the interval.

And most online dictionaries seem to have a similar definition of availability. When we are talking about HA, we imply that we want the functioning condition of your system to be increased.

Going by the above you will also notice that there is no fixed definition of the availability. Simply put, it would mean that you need to put your own definition in place when talking about HA. You need to define what HA means in your environment. I’ve had customers that needed HA and defined this as the system having a certain amount of uptime, which is one way to measure it.

On the other hand you would be hard pressed if you were able to work with your system, but the data that you were working with was corrupted because one of your power users made an error during a copy job and wrote an older data set in the wrong spot. This would mean that your system is in itself available. You can log on to it, you can work with it, but the output you are going to get will be wrong.

To me, such a scenario would mean that your system isn’t available. After all, it’s not about everything being online. It’s about using a system in the way you would expect it to work. But when you ask most people in IT about availability, the first thing you will likely hear is something related to uptime or downtime. So, my tip to you is once again:

Define what “available” means to you and your organization/customer!

Disaster Recovery:

Let’s do the same thing as before and let’s turn to some general definitions. Wikipedia defines disaster the following way:

A disaster is a perceived tragedy, being either a natural calamity or man-made catastrophe. It is a hazard which has come to fruition. A hazard, in turn, is a situation which poses a level of threat to life, health, property, or that may deleteriously affect society or an environment.

And recovery is defined the following way (when it comes to health):

Healing, or Cure, the process of recovering from an injury or illness.

So, in a nutshell this is about bouncing back to your feet once a disaster strikes. Now again, it’s important to define what you would call a disaster, but at least there seems to be some sort of common understanding that anything that would get you back up and running after an entire site goes down, usually falls under the label of a DR solution.

It all boils down to definitions!

When you talk to other companies or vendors about HA and/or DR, you will soon notice that most have a different understanding of what HA and DR are. Your main focus should be to have a clear definition for yourself. Try to find out the importance and value of your solution and base your requirements on that. Ask yourself simple questions like for example:

  • What is the maximum downtime I can cope with before I need to start working again? 8 hours per year? 1 hour per year? 4 hours per month? What are my RPO and RTO?
  • How do I handle planned maintenance? Can I bring everything down or do I need to distribute my maintenance across independent entities?
  • Can I afford the loss of any data at all? Can I afford the partial loss of data?
  • What if we see a city-wide power outage? Do I need a failover site, or are all my users in the same spot and won’t be able to work anyway?

Questions like these will help you realize that not everything you have running has the same value. Your development system with 6000 people working on it worldwide might need better protection than your productive system that is only being used by 500 people spread through the Baltic region.
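To put some numbers on the downtime question, you can turn a downtime budget in to an availability figure using the ratio from the definition further up: the time the system can be used, divided by the length of the interval. The snippet below is just a back-of-the-envelope sketch. The function name is made up, and it assumes a year of 365 days and treats “4 hours per month” as 48 hours per year.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def availability_percent(downtime_hours, period_hours=HOURS_PER_YEAR):
    """Availability in percent: usable time divided by the length of the interval."""
    if not 0 <= downtime_hours <= period_hours:
        raise ValueError("downtime must lie between 0 and the length of the period")
    return 100.0 * (period_hours - downtime_hours) / period_hours


if __name__ == "__main__":
    budgets = {
        "8 hours per year": 8,
        "1 hour per year": 1,
        "4 hours per month": 4 * 12,
    }
    for label, hours in budgets.items():
        print(f"{label}: {availability_percent(hours):.3f}% available")
```

With these assumptions, the three budgets from the list work out to roughly 99.91%, 99.99% and 99.45%. The percentages look close together, but the difference between the first and the last is forty hours of downtime a year, which is exactly why you need to define what you actually require.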

Or in short:

Knowing what kind of protection you need is key. Fact is that both HA and DR solutions never come cheap. If you need the certainty that your solution is available and able to recover from a disaster, you will notice that the price tag will quickly skyrocket. Which is another reason to make sure that you know exactly what kind of protection you need, and creating that definition is the most important starting point. Once you have your own definition, make sure that you communicate those definitions and requirements so that all parties are on the same page. It should make your life a little easier in the end.





This vendor is locking me in!

27 04 2010

Or so I’m told. Not just once or twice, but it’s something that is written down at least once each time a vendor introduces something new or when a revision of an existing product is rolled out.

Now, you could say that this is the pot calling the kettle black and I would agree with you. It’s a thing I mentioned in my UCS post, and also in my recent post on the stack wars. And today a tweet from @Zaxstor got me thinking about it some more. I asked the following on Twitter:

I hear this argument about vendor lock in all of the time. Open question: How do I avoid a vendor lock in? By going heterogeneous?

Because, when you think about it, we are all subject to vendor lock-in all of the time. As soon as I decide to purchase my new mobile phone, I am usually tied to either the phone manufacturer or the carrier that I use. Sometimes I am even tied to both; you just need to think about the iPhone as an example of this kind of lock-in.

The same goes for the car I drive. When I buy it from the dealer, I get an excellent package that is guaranteed to work. Until I take it to an inspection with a garage that is not part of the authorized network. My car will still drive, and will probably work great, but I no longer have a large part of the guarantees that came with it when I bought it, and would have been intact if I had taken it to an authorized dealer.

Now, I know my analogy is slightly flawed here since we are talking about things that work on a different scale and use entirely different technologies, but what I am trying to say is that we make decisions that lock us in with a certain vendor on an almost daily basis. Apparently the guys in and around the data center just like to talk about that problem a bit more.

One remark was made however by fellow blogger Dimitris Krekoukias and confirmed by several others:

“It’s not how you get in to the lock, but how you get out of it.”

And I do think that this is probably the key, but fortunately we have some help there from the competition. But it’s not all down to the others! All vendors are guilty of trying to sell something. It’s not their fault, it’s just something that “comes with the territory”. They will try to pitch you their product and make your head dizzy with what this new product can do. It’s all good, and it’s all grand according to them.

And yes, it is truly grand what this shiny new toy can do, but the question is whether you really need it. Try to ask what kind of value a feature will offer in your specific setup. Try to judge if you really need this feature, and ask yourself what you are going to do if the feature proves to be less useful than expected.

Remember that not all is lost if you do lock yourself in with that vendor. Usually others will be quick to follow with new features and this is where the help from the competition comes in. Take the example with the mobile phone. Even if you will not receive any help from your current provider, you can bet that the provider that now also offers the same package will try to help you to become his customer. If NetApp is not providing you with an option to migrate out of that storage array, you can bet your pants that Hitachi will try and help you migrate to their arrays.

Now, I’m not saying that this is the best solution. Exchanging solutions usually also comes with a loss of knowledge and of investments that were made. But it’s on you to factor that in before you take the plunge. In the end, the lock that you have with your current vendor might be hard and expensive to break, but it’s rarely a mission impossible.


P.S. Just as a side note, I’m not saying NetApp will not allow or help you to migrate out of an array, I’m just using these names as an example. Replace them with any vendor you like.

P.P.S. Being part of the discussion, fellow blogger Storagebod posted something quite similar; be sure to read it here.





My take on the stack wars

26 04 2010

As some of you might have read, the stack wars have started. One of the bigger coalitions announced in November 2009 was that between VMware, Cisco and EMC, aptly named VCE. Hitachi Data Systems announced something similar and partnered up with Microsoft, but left everyone puzzled about the partner that will be providing the networking technology in its stack. Companies like IBM have been able to provide customers with a complete solution stack for some time now, and IBM will be sure to tell its customers that they did so and offered the management tools in the form of anything branded Tivoli. To me, IBM’s main weakness is not so much the stack that they offer as the sheer number of solutions and the lack of one tool to manage it all, let alone getting an overview of all possible combinations.

So, what is this thing called the stack?

Actually, the stack is just that, a stack. A stack of what, you say? A stack of solutions, bound together by one or more management tools, offered to you as a happy meal that allows you to run the desired workloads on this stack. Or to put things more simply and quote from the Gestalt IT stack wars post:

  • Standard hardware configurations are specified for ease of purchasing and support
  • The hardware stack includes blade servers, integrated I/O technology, Ethernet networking for connectivity, and SAN or NAS storage
  • Unifying software is included to manage the hardware components in one interface
  • A joint services organization is available to help in selection, architecture, and deployment
  • Higher-level software, from the virtualization hypervisor through application platforms, will be included as well

Until now, we have usually seen a standardized form of hardware, including storage and connectivity. Vendors mix that up with one or multiple management tools and tend to target some form of virtualization. Finally a service offering is included to allow the customer to get service and support from one source.

This strategy has its advantages.

Compatibility is one of my favorite ones. You no longer need to work through compatibility guides that are 1400 pages long and will burn you for installing a firmware version that was just one digit off and is now no longer supported in combination with one of your favorite storage arrays. You no longer have to juggle different release notes from your business warehouse provider, your hardware provider, your storage and network provider, your operating system and tomorrow’s weather forecast. Trying to find the lowest common denominator through all of this is still something magical. It’s actually a form of dark magic that usually means working long hours to find out if your configuration is even supported by all the vendors you are dealing with.

This is no longer the case with these stacks. Usually they are built for a purpose or workload, and you have one central source where you get your support from. This source will tell you that you need at least firmware version X.Y on these parts to be eligible for support, and you are pretty much set after that. And because you are working with a federated solution and received management tools for the entire stack, your admins can pretty much manage everything from this one console or GUI and be done with it. Or, if you don’t want to do that, you can use the service offering and have it done for you.

So far so good, right?

Yes, but things get more complicated from here on. For one, there is a major problem, and that is flexibility. One of the bigger concerns came up during the Gestalt IT tech field day vBlock session at Cisco. With the vBlock, I have a fixed configuration, and it will run smoothly and within certain performance boundaries as long as I stick to the specifications. The vBlock session gave a quite obvious example: if I add more RAM to a server blade than is specified, I no longer have a vBlock and basically no longer have the advantages stated previously.

Solution stacks force me to think about the future. I might be an Oracle shop now as far as my database goes, and Oracle will run fine on my newly purchased stack. But what if I want to switch to Microsoft SQL Server in 3 years, because Mr. Ellison decided that he needs a new yacht and I no longer want to use Oracle? Is my stack also certified to run a different SQL server, or am I then outside of my stack boundaries and have I lost my single service source or the guaranteed workload it could hold?

What about updates for features that are important to me as a single customer? Or what about the fact that these solution stacks work great for new landscapes, or in a highly homogeneous environment? But what about those other Cisco switches that I would love to manage from the tools that are offered within my vBlock, but are outside of the vBlock scope, even if they are the same models?

What about something as simple as a “stack lock-in”? I don’t really have a vendor lock-in, since only very few companies have the option of offering everything first hand. Microsoft doesn’t make server blades, Cisco doesn’t make SAN storage, and that list goes on and on. But with my choice of stack, I am now locked in to a set of vendors. I certainly have some tools to migrate in to that stack, but migrating out is an entirely different story.

The trend is the stack, it’s as simple as that. But for how long?

We can see the trend clearly. Every vendor seems to be working on a stack offering. I’m still missing Fujitsu as a big hardware vendor in this area, but I am absolutely certain we will see something coming from them. Smaller companies will probably offer part of their portfolio under some sort of OEM license or perhaps features will just be re-branded. And if they are successful enough, they will most likely be swallowed by the bigger vendors at some point.

But as with everything in IT, this is just a trend. Anyone who has been in the business longer than me can probably confirm this. We started out with centralized systems, then moved towards a decentralized environment. Now we are on the move again, centralizing everything.

I’m actually much more interested to see how long this trend will continue. I am certain that we will be seeing some more companies offer a complete solution stack, or joining in coalitions to offer said stack. I still think that Oracle was one of the first that pointed in this direction, but they were not the first to offer the complete stack.

So, how do you think this is going to continue? Do you agree with us? What companies do you think are likely to be swallowed, or will we see more coalitions from smaller companies? What are your takes on the advantages and disadvantages?

I’m curious to hear your take on this so let me know. I’m looking forward to what you have to say!





Gestalt IT Tech Field Day – On Cisco and UCS

14 04 2010

There are a couple of words that are high on my list as being the buzzwords for 2010. The previous year brought us things like “green computing”, but the new hip seems to be “federation” and “unification”. And let’s not forget the one that seems to last longer than just one year: the problem-solving term “cloud”.

Last Friday (April 9th), I and the rest of the Gestalt IT tech field day delegates were invited by Cisco to get a briefing on Cisco’s Unified Computing System or in short “UCS”. Basically this is Cisco’s view that builds on the notion that we are currently viewing a server as being tied to the application, instead of seeing the server as a resource that allows us to run that application.

Anyone in marketing will know that the next question being asked is “What is your suggestion to change all that?”, and Cisco’s marketing department didn’t disappoint us and tried to answer that question for us. The key, in their opinion, is using a system consisting of building blocks that allow me to give customers a solution stack.

As the trend can be spotted to go towards commodity hardware, Cisco is following suit by using industry standard servers that are equipped with Intel Xeon processors. Other key elements are a virtualization of services, a focus on automated provisioning and unification of the fabric by means of FCoE.

What this basically means is that you order building blocks from Cisco in the form of blade servers, blade chassis, fabric interconnects and virtual adapters. But instead of connecting this stuff up and expanding my connectivity like I do in a standard scenario, I wire my hardware depending on the bandwidth requirements and that’s pretty much it. Once I am done with that, I can assign virtual interfaces as I need them on a per-blade basis, which in turn removes the hassle of plugging in physical adapters and cabling all that stuff up. In a sense it reminded me of the take that Xsigo offered with their I/O director, but with the difference that Cisco uses FCoE instead of InfiniBand, and with Cisco you add the I/O virtualization to a more complete management stack.

The management stack

This is in my opinion the key difference. I can bolt together my own pieces of hardware and use the Xsigo I/O director in combination with VMware and have a similar set-up, but I will be missing out on one important element. A central management utility.

This UCS unified management offers me some advantages that I have not seen from other vendors. I can now tie the properties to the resources that I want, meaning that I can set up properties tied to a blade, but can also tie them to the VM or application running on that blade in the form of service profiles. Things like MAC, WWN or QoS profiles are defined inside of these service profiles in an XML format and then applied to my resources as I see fit.
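To illustrate the idea of a service profile, here is a small sketch of how such a profile could be modeled and rendered as XML. Please note that this is purely hypothetical: the element names, the `ServiceProfile` class and the `apply_profile` helper are my own inventions for illustration and do not reflect the actual UCS schema or any Cisco API.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class ServiceProfile:
    """Hypothetical profile holding the identity that travels with a workload."""
    name: str
    mac: str
    wwn: str
    qos: str

    def to_xml(self) -> str:
        # Element names are made up for this sketch, not the real UCS format.
        root = ET.Element("serviceProfile", name=self.name)
        ET.SubElement(root, "mac").text = self.mac
        ET.SubElement(root, "wwn").text = self.wwn
        ET.SubElement(root, "qos").text = self.qos
        return ET.tostring(root, encoding="unicode")


def apply_profile(profile: ServiceProfile, blade: str) -> dict:
    """Pretend to apply a profile to a blade by pairing the two together."""
    return {"blade": blade, "profile": profile.to_xml()}


profile = ServiceProfile(
    name="web-frontend",
    mac="00:25:B5:00:00:01",
    wwn="20:00:00:25:B5:00:00:01",
    qos="gold",
)

# The same identity can be assigned to a different blade without recabling.
first = apply_profile(profile, "chassis-1/blade-3")
second = apply_profile(profile, "chassis-2/blade-1")
print(first["profile"])
```

The point of the sketch is that the MAC, WWN and QoS settings live in the profile and not in the hardware, so moving a workload means assigning the same profile to another blade.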

Sounds good, but…..?

There is always a but, that’s something that is almost impossible to avoid. Even though Cisco offers a solution that seems to offer some technical advantages, there are some potential drawbacks.

  • Vendor lock in:
    This is something that is quite easy to see. The benefit of getting everything from one vendor also means that my experience is only as good as the vendor’s support is in case of trouble. The same thing applies when ordering new hardware and there are unexpected problems somewhere in the ordering/delivery chain.
  • The price tag:
    Cisco is not known to be cheap. Some would even say that Cisco is very expensive, and it will all boil down to one thing. Is the investment that I need to make for a UCS solution going to give me the return on investment? And is it going to do that anytime soon? Sure it can reduce my management overhead and complexity, sure it can lower my operational expense, but I want to see something in return for the money I gave Cisco and preferably today, not tomorrow.
  • Interoperability with my existing environment:
    This sort of stuff works great when you are lucky enough to create something new. A new landscape, a new data center or something along those lines. The truth is that we will usually end up adding something new to our existing environment. It’s great that I can manage all of my UCS stack with one management interface. But what about the other stuff? What if I already have other Cisco switches that are not connected to this new UCS landscape? Can I manage those using the built-in UCS features? Or is this another thing that my admins have to learn?
  • The fact that UCS is unified does not mean that my company is:
    In smaller companies, you have a couple of sysadmins that do everything. They install hardware, configure the operating system, upload firewall policies to their routers and zone some new storage. So far so good, I’ll give them my new UCS gear and they usually know what goes where and will get going. Now I end up in the enterprise segment, where I talk to one department to change my kernel parameters, a different one to configure my switch port to auto-negotiate, and a third one that will check whether the WWN of my fibre-channel HBA matches the one configured on the storage side. Now I need to get all of them together to work on creating the service profiles, although not all will be able to work outside of their knowledge silo. The alternative would be to create a completely new team that just does UCS, but do I want that?

Besides the things that are fairly obvious and not necessarily Cisco’s fault, I think that Cisco was actually one of the first companies to go this way and one of the first to show an actual example of a federated and consolidated solution. Because that is what this is all about, it’s not about offering a piece of hardware, it’s about offering a solution. Initiatives like VCE and VCN only show us that Cisco is moving forward and is actually pushing towards offering complete solution stacks.

My opinion? I like it. I think Cisco have delivered something that is a usable showcase, and although unfortunately I have not been able to actually test it so far, I do really like the potential it offers and the way it was designed. If I ever get the chance to do some testing on a complete UCS stack, I’ll be sure to let you know more, but until then I at least hope that this post has made things a bit clearer and removed some of the questions you might have. And if that’s not the case, leave a comment and I will be sure to ask some more questions on your behalf.

Disclaimer:
The sponsors are each paying their share for this non-profit event. We, the delegates, are not paid to attend. Most of us will take some days off from our regular job to attend. What is paid for us is the flight, something to eat and the stay at a hotel. However as stated in the above post, we are not forced to write about anything that happens during the event, or to only write positive things.







