Nutanix, Performance, SAP, Uncategorized

RDMA on Nutanix AHV and the discomfort of heterogeneous environments

While setting up our new environment for SAP HANA validation work, I spent some time in the data center and ran into a few caveats that I figured I would share.

To set the stage, I am working with a Lenovo HX Nutanix cluster. Two of the nodes are HX-7820 appliances with 4x Intel 8180M CPUs, 3TB RAM, NVMe, SSD and, among other things, two Mellanox CX-4 dual port NICs. The other two nodes are HX-7821 appliances with pretty much the same configuration, except that these systems have 6TB of RAM. The idea is to give this cluster as much performance as we can, and to do that we decided to switch on Remote Direct Memory Access, or RDMA for short.

Now, switching on RDMA isn’t that hard. Nutanix has added support for RDMA with AOS version 5.5, and according to our “one-click” mantra, it is as simple as going into our Prism web interface, clicking the gear symbol, going to “Network Configuration” and from the “Internal Interface” tab enable RDMA and put in the info about the subnet and VLAN you want to use as well as the priority number. On the switch side, you don’t need anything extremely complicated. On our Mellanox switch we did the following (note that you’d normally need to disable flow control on each port, but this is the default on Mellanox switches):

interface vlan 4000
dcb priority-flow-control enable force
dcb priority-flow-control priority 3 enable
interface ethernet 1/29/1 dcb priority-flow-control mode on force
interface ethernet 1/29/2 dcb priority-flow-control mode on force
interface ethernet 1/29/3 dcb priority-flow-control mode on force
interface ethernet 1/29/4 dcb priority-flow-control mode on force

With all of that in place, you would normally expect to see a small progress bar and that is it. RDMA set up and working.

Except that it wasn’t quite as easy in our scenario…

You see, one of the current caveats is that when you image a Nutanix host with AHV, we pass through the entire PCI device, in this case the NIC, to the controller VM (cVM). The benefit is that the cVM now has exclusive access to the PCI device. The issue that arises is that we currently cannot pass through a single port, which isn’t ideal in the case of a NIC that has multiple ports. Add on top of that the fact that we don’t give you the choice of which port to use for RDMA, and the situation becomes slightly muddied.

So, first off: we essentially do nothing more than check whether we have an RDMA capable NIC, and we pass through the first one that we find during the imaging process. In a normal situation, this will always be the RDMA capable NIC in the PCI slot with the lowest slot number. It will also normally be the first NIC port that we find. This means that if you have, for example, a non-RDMA capable Intel NIC in PCI slot 4, and two dual port RDMA capable cards in slots 5 and 6, your designated RDMA interface is going to be the first port on the card in slot 5.
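To make that selection rule a bit more tangible, here is a small sketch of it in shell. It is only an illustration of the "lowest slot, first port" rule described above, not the code the imaging process actually runs, and the "slot:port:rdma" input format is something I made up for the example.

```shell
# Input on stdin: one NIC port per line as "slot:port:rdma" (rdma is yes or no).
# This format is hypothetical and exists only for this example.
# Output: "slot:port" of the port that would end up as the RDMA interface.
pick_rdma_port() {
    awk -F: '$3 == "yes"' |          # keep RDMA capable ports only
        sort -t: -k1,1n -k2,2n |     # lowest slot first, then lowest port
        head -n 1 |
        cut -d: -f1,2
}
```

Feeding it the example from the text (a non-RDMA card in slot 4 and dual port RDMA cards in slots 5 and 6) should return slot 5, port 1.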

Since you might want to see what MAC-address is being used, you can check from the cVM by running the ifconfig command against the rdma0 interface. Note that this interface by default will exist, but isn’t online, so it will not show up if you just run an ifconfig command without parameters:

ifconfig rdma0
rdma0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 fe80::ee0d:9aff:fed9:1322  prefixlen 64  scopeid 0x20
        ether ec:0d:9a:d9:13:22  txqueuelen 1000  (Ethernet)
        RX packets 71951  bytes 6204115 (5.9 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 477  bytes 79033 (77.1 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

To double check that you have the correct interface connected to your switch ports, my tip would be to access the lights-out management interface (IMM, iLO, iDRAC, IPMI, etc.) and check your PCI devices from there. Usually these will tell you the MAC addresses of the interfaces on the various PCI devices. Make sure you are connected to the right physical NIC and switch port.
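If you want to compare the MAC address against what the lights-out interface reports without squinting at the full output, a minimal sketch like the following pulls out just the address. It assumes the newer ifconfig output format shown above, where the MAC address follows the word "ether"; the function name is my own.

```shell
# Read `ifconfig <interface>` output on stdin and print only the MAC address.
# Assumes the output format shown above ("ether <mac> ...").
mac_from_ifconfig() {
    awk '$1 == "ether" { print $2; exit }'
}
```

On the cVM you would use it as: ifconfig rdma0 | mac_from_ifconfig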

The next topic that might come up is the fact that we will automatically disable c-states on the AHV host in the process of enabling RDMA. This is all done in the background, and again normally will be done automatically. In our case, since we added a couple of new nodes to the cluster, the BIOS settings were not the same across the cluster. The result of that was that on the AHV hosts, the HX-7820 nodes had the following file available that contained a value of “1”:

/sys/devices/system/cpu/cpu*/cpuidle/state[3-4]/disable

Due to the BIOS settings that were different on the HX-7821 hosts that we added, this file and the cpuidle (sub-)directories didn’t exist on those hosts. While the RDMA script tried to disable c-states 3 and 4 on the hosts, this was only successful on two out of the four nodes in the cluster. Upon comparing the BIOS settings we noticed some deviations in available settings due to differences in versions, and differences in some of the settings as they were delivered to us (MWAIT for example). After modifying the settings to match the other systems, the directories were now available and we could apply the c-state settings to all systems.
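A quick way to spot this kind of mismatch before enabling RDMA is to look at those files on every AHV host. The following is a minimal sketch that assumes the sysfs path shown above; the function name and the optional root argument (there only so you can point it at a test directory) are my own.

```shell
# Print the value of the cpuidle "disable" file for c-states 3 and 4 on every CPU.
# Returns non-zero if no such files exist, which is what we saw on the new nodes.
check_cstates() {
    local root="${1:-/sys/devices/system/cpu}"
    local found=0 f
    for f in "$root"/cpu*/cpuidle/state[3-4]/disable; do
        [ -e "$f" ] || continue      # glob did not match anything
        found=1
        printf '%s=%s\n' "${f#"$root"/}" "$(cat "$f")"
    done
    if [ "$found" -eq 0 ]; then
        echo "no cpuidle state3/state4 entries under $root, check the BIOS settings"
        return 1
    fi
}
```

Running check_cstates on each host and comparing the output should show you right away which nodes are missing the directories.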

While we obviously have some work to do to add some more resiliency and flexibility to the way we enable RDMA, and it doesn’t hurt to have an operational procedure to ensure settings are the same on all systems before going online with them, I just want to emphasize one thing:

One click on the Nutanix platform works beautifully when all systems are the same.

There are, however, quite a few caveats that come into play when you work with a heterogeneous environment/setup:

  • Double check your settings at the BIOS level. Make them uniform as much as you can, but be aware of the fact that sometimes certain settings or options might not even be available or configurable anymore.
  • Plan your physical layout. Try not to mix a different number of adapters per host.
  • Create a physical design that can assist the people cabling with what to plug where to ensure consistency.
  • You can’t always avoid making changes to a production system, but if at all possible, have a similar smaller cluster for the purpose of quality assurance.
  • If you are working in a setup with a variety of systems things will hopefully work as designed but might not. Log tickets where possible, and provide info that goes a bit further than “it doesn’t work”. 😉

Oh, and one more thing. Plan extra time. The “quick” change of cables and enabling of RDMA ended up in spending 4 hours in the data center working through all of this. And that is with myself being pretty familiar with all of this. If you are new to this, again if at all possible, take your time to work through this, versus doing this on the fly and running into issues when you are supposed to be going live. 🙂

Uncategorized

Using a noVNC branch to connect to your Supermicro iKVM

Do you know how everyone just loves client side Java? Yeah, exactly. That’s why a lot of Nutanix customers will be quite happy with the IPMI firmware update for our latest systems. You will no longer have to rely on Java to use the lights-out management; you can simply use HTML5 to manage your systems.

Unfortunately, this update isn’t available for older systems. After cursing at Java again in my home lab, I decided to see if there was a way around it. Fortunately, I noticed that on GitHub, a developer called “kelleyk” posted a port of noVNC that adds support for ATEN iKVM, which is used in quite a few Supermicro servers.

So, after a bit of fiddling, I managed to get this to work on my Mac. First things first, download the fork from here: https://github.com/kelleyk/noVNC

Next up, make sure you have the Xcode command line tools installed. If not, just run:

xcode-select --install

Also, since we will need a web socket bridge to forward requests to your IPMI interface (which encrypts traffic), make sure you have a certificate named self.pem, or generate one by issuing:

openssl req -new -x509 -days 365 -nodes -out self.pem -keyout self.pem
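The command above will prompt you for the certificate details. If you would rather not answer those questions, a sketch like the one below passes the subject on the command line instead. The CN value of "localhost" and the function name are assumptions on my part; use whatever fits your setup.

```shell
# Generate a self-signed certificate plus key in a single PEM file
# without the interactive prompts. "/CN=localhost" is just a placeholder.
make_self_pem() {
    local out="${1:-self.pem}"
    openssl req -new -x509 -days 365 -nodes \
        -subj "/CN=localhost" \
        -out "$out" -keyout "$out"
}
```

Calling make_self_pem without arguments writes self.pem to the current directory, just like the original command.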

Once that is done, the rest is simple. Open a terminal and go to the directory where you copied the noVNC fork, and go to the “utils” directory. From there, run the launch.sh shell script, and provide it with the IP-address of your IPMI interface and use the default VNC port as the port number:

./launch.sh --vnc 192.168.10.10:5900

This will launch the script, and give you a link that you can open in your browser:

Using local websockify at ~/noVNC-bmc-support/utils/websockify/run
Starting webserver and WebSockets proxy on port 6080
WebSocket server settings:
 - Listen on :6080
 - Flash security policy server
 - Web server. Web root: /Users/basraayman/Downloads/noVNC-bmc-support
 - SSL/TLS support
 - proxying from :6080 to 192.168.10.10:5900

Navigate to this URL:

 http://Bass-MacBook-Pro-Retina.local:6080/vnc.html?host=Bass-MacBook-Pro-Retina.local&port=6080

Follow the link in your browser, leave the values as they are, and in the password field, input your IPMI username and password separated by a colon, so in the following format (note that ADMIN is both the default username and password in this case):

ADMIN:ADMIN


Once that is done, click the connect button, and you are now able to connect to the lights-out interface using your browser, no java needed. And while this might not be super ideal (no forwarding of iso images and such), it should make day to day administration a bit easier.


So, give it a whirl, and let me know if this works for you in the comments. 🙂

Uncategorized

[Shorts] Using the Nutanix Docker Machine Driver

Nutanix announced version 4.7 of AOS, basically the operating system for the controller VMs. One of the new things is the so-called Acropolis Container Services, which basically allow the Nutanix cluster to act in such a way that you can, for example, use the docker-machine command (with a corresponding driver) to create new Docker instances that deploy automatically on your cluster.

While we are still waiting for the Docker Machine Driver to be posted to our portal, I’ve dug up an internal version of the driver, and decided to test it and post my experiences.

Right now it’s relatively simple to get started. You need your Nutanix cluster running AHV and some working credentials for the cluster. Also, you need a system that has Docker Machine installed.

Once you have that, download the Docker machine driver for your platform from Nutanix (once it is posted that is). We should be offering a driver for OSX, Windows and Linux, so you should pretty much be covered.

On OSX the procedure to get started is relatively simple. Copy the file you downloaded, and if it has an .osx extension in the filename, remove that extension. Modify the file to allow it to be executed (chmod +x), and then move it to the directory that contains your docker-machine executable/binary.
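Those steps can be sketched as a small shell function. The file name docker-machine-driver-nutanix.osx in the usage line is an assumption (it follows the usual naming convention for Docker Machine drivers), so adjust it to whatever the download is actually called.

```shell
# Copy a downloaded driver next to docker-machine, dropping the .osx
# extension and making the result executable.
install_driver() {
    local src="$1" dest_dir="$2"
    local name
    name="$(basename "$src" .osx)"   # strips ".osx" only if it is present
    cp "$src" "$dest_dir/$name"
    chmod +x "$dest_dir/$name"
}
```

For example: install_driver ~/Downloads/docker-machine-driver-nutanix.osx "$(dirname "$(command -v docker-machine)")"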

Now, you can start docker-machine and call the Nutanix driver, and that will give you a couple of additional options:

docker-machine create --driver nutanix 

Usage: docker-machine create [OPTIONS] [arg...]
 Create a machine

Description:
 Run 'docker-machine create --driver name' to include the create flags 
 for that driver in the help text.

Options:
 .... 
 --nutanix-endpoint Nutanix management endpoint ip address/FQDN [$NUTANIX_ENDPOINT]
 --nutanix-password Nutanix management password [$NUTANIX_PASSWORD]
 --nutanix-username Nutanix management username [$NUTANIX_USERNAME]
 --nutanix-vm-cores "1" Number of cores per VCPU of the VM to be created [$NUTANIX_VM_CORES]
 --nutanix-vm-cpus "1" Number of VCPUs of the VM to be created [$NUTANIX_VM_CPUS]
 --nutanix-vm-image [--nutanix-vm-image option --nutanix-vm-image option] The name of the VM disks to clone from, for the newly created VM
 --nutanix-vm-mem "1024" Memory in MB of the VM to be created [$NUTANIX_VM_MEM]
 --nutanix-vm-network [--nutanix-vm-network option --nutanix-vm-network option] The name of the network to attach to the newly created VM
 ....

Now, to create for example a VM based on an ISO image that is stored in your image configuration, you simply use one command:

docker-machine create --driver nutanix --nutanix-username 'docker-deployment-user' --nutanix-password 'P@ssw0rd' --nutanix-endpoint '10.0.0.50:9440'  --nutanix-vm-network production --nutanix-vm-image CentOS-7 Docker-CentOS

And you can obviously also add things like VM memory and CPU information. Once you hit enter, the driver connects to the Nutanix cluster and tells it what to do.
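As a sketch of what that looks like with CPU and memory added, the function below only prints the command instead of running it, so you can check it first. The flags are the ones from the help output above; the default values are arbitrary examples, and I am assuming that the endpoint, username and password are supplied through the NUTANIX_ENDPOINT, NUTANIX_USERNAME and NUTANIX_PASSWORD environment variables that the help output lists.

```shell
# Print (do not run) a docker-machine create command with vCPUs and memory set.
# Credentials are expected in $NUTANIX_ENDPOINT, $NUTANIX_USERNAME, $NUTANIX_PASSWORD.
nutanix_create_cmd() {
    local name="$1" cpus="${2:-2}" mem_mb="${3:-4096}"
    printf 'docker-machine create --driver nutanix --nutanix-vm-cpus %s --nutanix-vm-mem %s --nutanix-vm-network production --nutanix-vm-image CentOS-7 %s\n' \
        "$cpus" "$mem_mb" "$name"
}
```

Running nutanix_create_cmd Docker-CentOS 4 8192 prints the command for a VM with 4 vCPUs and 8GB of memory, which you can then copy and run.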

The result?

Screen Shot 2016-07-07 at 14.04.14

And you can manage your systems using docker-machine like you are used to. Now there is obviously more you can do, but I just wanted to give you a quick way to get started. So, have fun playing with it. 🙂

KVM, Nutanix, Storage, Uncategorized, Virtualization, VMware

Nutanix OS 4.0 – Prism Central

One of the features that has been announced for Nutanix OS 4.0 (also called NOS), is something called Prism Central.

So what does Prism Central do? Well, perhaps things are more obvious if we speak about the internal name we once used. It was referenced as our Multi-Cluster UI, and that is exactly what it is. Instead of having to open multiple tabs in your browser and switching between tabs to actually manage your Nutanix clusters, you can now open one tab, register multiple clusters, and manage them all from one interface, or get a basic overview of what is going on across all clusters.

First things first: Disclaimer – Keep in mind this is based on an early code version, and things will most likely change before you can download the software.

I spoke to our developers, and received a version to play with, so I’ll walk you through the process. Prism Central comes as an OVF, and you simply deploy this VM in your infrastructure. The requirements for the VM are the following (again, this might change):

8GB RAM
2 vCPUs
260GB disk space

With that configuration, you can monitor 100 nodes while we assume that you can go up to 100 VMs per node.

With that said, the installation itself is quite easy. We deploy the OVF from vCenter:

Prism Central – OVF Deployment

We give the VM a name:

Prism Central – OVF Deployment – Naming

And follow the normal steps for any OVF. Things like selecting a resource pool, datastore, and then selecting the disk format and network mapping. You will only need one interface, but I’d recommend deploying the Prism Central VM in the same network as your controller VMs. Once that is done, you click on “Finish” and wait for the VM to deploy:

Prism Central – OVF Deployment – Finished

Now, my assumption is that we will be changing to the OVA format to make deployment a bit easier. In this version, I still had to configure the IP addresses manually (no DHCP in my network), and deploying from an OVA should make that a breeze, but I will outline the steps I used here anyway.

After connecting to the vSphere console of the VM, we log on to the console using “nutanix” as the user and “nutanix/4u” as the password. Then, you simply edit the file /etc/sysconfig/network-scripts/ifcfg-eth0 and input the IP-address you would like to use. In my case it looks like this:

DEVICE="eth0"
NM_CONTROLLED="no"
ONBOOT="yes"
BOOTPROTO="none"
IPADDR="10.64.20.110"
NETMASK="255.255.255.0"
GATEWAY="10.64.20.1"

Simply save the file and restart your network services, and you should now be able to access the machine using your favorite ssh client. Now there is one thing left to do (and again, I’m assuming this should no longer be there in a final release, just trying to be complete):
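If you need to do this more than once, the file can also be generated. This is a minimal sketch that writes the same settings as shown above; the function name is my own, and the restart command in the note below assumes a CentOS 6 style system, which may not match the final release.

```shell
# Write a static IP configuration for eth0 to the given file.
write_ifcfg() {
    local file="$1" ip="$2" netmask="$3" gateway="$4"
    cat > "$file" <<EOF
DEVICE="eth0"
NM_CONTROLLED="no"
ONBOOT="yes"
BOOTPROTO="none"
IPADDR="$ip"
NETMASK="$netmask"
GATEWAY="$gateway"
EOF
}
```

For my setup that would be write_ifcfg /etc/sysconfig/network-scripts/ifcfg-eth0 10.64.20.110 255.255.255.0 10.64.20.1 (run as root), followed by something like "service network restart".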

cluster -f --cluster_function_list="multicluster" -s ip_of_your_prism_central create

Which should result in something like this:
nutanix@NTNX-10-64-20-110-A-CVM:~$ cluster -f --cluster_function_list="multicluster" -s 10.64.20.110 create
2014-04-17 05:50:37 INFO cluster:1469 Executing action create on SVMs 10.64.20.110
2014-04-17 05:50:37 INFO cluster:593 Discovered node:
ip: 10.64.20.110
rackable_unit_serial: 10-64-20-110
node_position: A
node_uuid: ed763914-2c16-4aff-9b6b-d4ea962af9fe

2014-04-17 05:50:37 INFO cluster:632 Configuring Zeus mapping ({u'10.64.20.110': 1}) on SVM node 10.64.20.110
2014-04-17 05:50:37 INFO cluster:650 Creating cluster with SVMs: 10.64.20.110
2014-04-17 05:50:37 INFO cluster:654 Disable fault tolerance for 1-node cluster
2014-04-17 05:50:39 INFO cluster:687 Waiting for services to start
Waiting on 10.64.20.110 (Up, ZeusLeader) to start: ConnectionSplicer Medusa DynamicRingChanger Pithos Prism AlertManager Arithmos SysStatCollector
Waiting on 10.64.20.110 (Up, ZeusLeader) to start: ConnectionSplicer Medusa DynamicRingChanger Pithos Prism AlertManager Arithmos SysStatCollector
Waiting on 10.64.20.110 (Up, ZeusLeader) to start: DynamicRingChanger Pithos Prism AlertManager Arithmos SysStatCollector
...
...
...
Waiting on 10.64.20.110 (Up, ZeusLeader) to start: DynamicRingChanger Pithos Prism AlertManager Arithmos SysStatCollector
Waiting on 10.64.20.110 (Up, ZeusLeader) to start: AlertManager Arithmos SysStatCollector
Waiting on 10.64.20.110 (Up, ZeusLeader) to start:
The state of the cluster: start
Lockdown mode: Enabled

CVM: 10.64.20.110 Up, ZeusLeader
Zeus UP [14429, 14442, 14443, 14447, 14453, 14466]
Scavenger UP [14660, 14675, 14676, 14793]
ConnectionSplicer UP [14690, 14703]
Medusa UP [14760, 14775, 14776, 14780, 14940]
DynamicRingChanger UP [15946, 15973, 15974, 15986]
Pithos UP [15950, 15980, 15981, 15994]
Prism UP [15969, 15995, 15996, 16004]
AlertManager UP [16019, 16049, 16051, 16079, 16102]
Arithmos UP [16029, 16080, 16081, 16099]
SysStatCollector UP [16041, 16092, 16093, 16178]
2014-04-17 05:51:07 INFO cluster:1531 Success!

And voila! You can now log on to your instance of Prism Central!

Prism Central

As you can see, it looks quite the same as the regular 4.0 version, except that if you click on the top left “Prism Central” text, a menu will fold out on the left hand side. But, since we want to monitor a cluster, let’s go ahead and register a cluster.

To do so, just connect to your NOS 4.0 cluster, and click on the small gear symbol on the top right corner, and select “Prism Central Registration”. There, fill out the Prism Central IP, the username and password for Prism Central, and click on “Save”

Prism Central – Registration

If all goes well, the cluster registers, and you will see an event in your Prism Central stating that a user has been added (we support single sign on in Prism Central), and that a cluster has been added to Multicluster. And, you should now be able to see the new cluster that was registered in Prism Central:

Prism Central – Cluster registered

To now manage that cluster, simply click on Prism Central on the top left, and then select the cluster from the list on the left hand side:

Prism Central – Cluster selection

From there on, you can manage the cluster just like you would in your regular interface. My colleague Suda Srinivasan was kind enough to create a video that walks you through the interface:

So, that’s it for now. If you have any questions, feel free to let me know.

Uncategorized

How I flunked my VCAP5-DCD / How do I speak design?

Man With Computer by graur razvan ionut - Image courtesy of freedigitalphotos.net

If you are reading this title and are wondering how I managed to flunk that test: first, thank you for your confidence in me. Secondly, there was one basic thing that was missing from my preparation:

rest

During VMworld in Barcelona (and also during VMworld in the US), there was a 50% discount for people trying to obtain their VCP, or one of the VCAP certifications. Since I thought I would save the company some money, I went ahead and scheduled the test during the conference, and boy was that a mistake.

While there are certain key elements to preparing for a test like the VCAP5-DCD (some of which I’ll go in to a bit further down in this post), there are basics that you won’t be able to get around.

I was actually working during the conference, at the VMworld Hands-on Labs. If that wasn’t enough, working for a vendor at a conference usually also means that you have more appointments. Combine that with meeting up with customers during the parties, going to dinner with folks, and all of the other stuff surrounding the conference, and you will end up just being tired at a certain point.

I made the major mistake of overestimating how fit I would be, which meant that I was actually starting to nod off about 2 hours into my certification. 4 hours is a long time to sit an exam, and folks like Jason Boche have described several tips on how to prepare for the exam. Unfortunately, I wasn’t fit enough and only scored 290 points, which means I missed the 300 point passing grade. So my top tip? Make sure to rest up and be fit for the exam!

There are plenty of resources describing the exam itself, and you can find some useful tips on blogs like thesaffageek.co.uk or the vBrownbags, but trust me, being well rested before the exam is one of the key things. All in all, I think the exam is very fair. You will encounter Visio like parts of the test where you are designing a solution or mapping out dependencies, there are drag and drop parts to it where you will drag boxes with keywords or design parts to their counterparts. And finally, there are the normal multiple choice parts.

You will be reading for 3.5 hours, up to 4 hours for non-native English speakers, and that means that there is just a massive amount of text to get through. Also, if you are not designing environments on an everyday basis, there is a thing that will bite you in the proverbial rear end, and that is being comfortable with 4 major categories that you will encounter in almost every question in the test.

¿Habla design?

Yes, Habla is Spanish, and when it comes to the 4 major categories, if you don’t use them every day, this may come off as Spanish to you. Here they are:

  • Requirement
  • Risk
  • Assumption
  • Constraint

Seems easy enough, doesn’t it?

“I require you to wear your seatbelt. If you don’t, you risk your life when getting into an accident. That’s because, if you don’t, I’m assuming you’ll fly out the window when you get into a crash. And yes, wearing the seatbelt will constrain your movement in the driver’s seat, but who would want you moving all over your car while driving anyway?”

If you put it like that, most people will comprehend what is meant. Then, when you talk to a customer, things get more vague, and quite a few people I’ve spoken to have the biggest problem distinguishing between a requirement, an assumption and a constraint. If a sentence actually starts with “I assume that…” or “I require you to…”, things are relatively simple.

I’ve been trying to bulk up on the definitions that are used by VMware in their certifications, and what was of help to me was for example the “Designing VMware Infrastructure” training videos by Trainsignal.

Scott Lowe actually takes you through the various steps in creating a logical and physical design. He gives you a head start on things to consider when you are actually designing (including some tools you can use, like mind mapping software), and he goes over the design terminology.

The latter part is actually what I personally think is missing for a lot of people. They know what the technical limitations are, and will be able to look them up. They’ll be able to get their head wrapped around the physical design, and there are a lot of smart folks out there that grasp how things interact and can give a holistic overview.

But then you get to the actual lingo, and that’s where some small things may help you make it click up in your head. 🙂

An example I found very striking was the notion that an assumption is always coming from the view of the architect, not the business:

Assumptions vs. constraints – ©Trainsignal

The business will be able to tell you what the Vendor is that you are going to use for your network gear (there’s a constraint for ya), and tell you that they want to have an availability of 99.99% for their HR application (there’s your requirement). But you may need to assume that the bandwidth that is available to you for replication, won’t be shared by environments that were out of scope for your design (and that could also be a risk). It is something that you might be able to eliminate by asking further questions, but it could be an assumption for a final design.

Since this is a topic that I’m dealing with in preparation for my re-take of the VCAP5-DCD, I’ll be posting some updates here with the resources I’ve used, and just put some things out there for you to take a stab at, and then I’ll see if I’m any good at design, and I’ll find out where I can improve. Also, if you think I already botched it in the example I gave here, let me know and leave a comment. It will help me prepare better.

Fusion, Uncategorized, Virtualization, VMware

VMware releases Fusion 5

Some folks on Twitter already spotted it: It seems like along with the release of VMware Workstation 9, the guys and girls over at VMware also released Fusion 5.

And with this new release come some expected things. As always, it’s bigger, better and faster. It adds support for some of the new operating systems that are out there: for example, there is full support for Mountain Lion, and support was added for the new MacBook Pro with Retina display. Windows 8 support is finally built in, and they didn’t only think about the Redmond users, but also added 3D desktop support for selected Linux distributions. One really cool feature (if your Mac is new enough to support it) is that you can use AirPlay to display your VM on a TV using an Apple TV.

VMware claims that there are over 70 new features in the new version of Fusion, but I wasn’t able to locate the entire list as of yet. If I do find that list, I’ll be sure to add it to my post. In the meantime, here is the marketing shortlist:

  • Optimized for OS X Mountain Lion
  • Optimized for Windows 8
  • Designed for latest Macs
  • Faster performance
  • Retina Display Optimization
  • USB 3 support
  • Enhanced battery management
  • Enhanced UI
  • 1-click snapshot
  • Linux 3D
  • Embedded Learning Center

In the meantime, if you purchased your copy of VMware Fusion 4 from July 25th through September 30th, 2012, you are covered by the VMware Fusion “Technology Guarantee Program”, and you are eligible for a complimentary electronic upgrade to VMware Fusion 5. You can find out all about that here: http://www.vmware.com/support/product-support/fusion/faq/licensing.html?cd=4&hl=en&ct=clnk

If you want to test out this new version, go right ahead and download a trial here: http://www.vmware.com/go/tryfusion, or to immediately purchase the new version, follow this link: http://www.vmware.com/go/buyfusion. Be sure to do a quick Google search on discount codes, since there are bound to be some upgrade offers out there. 🙂

Cisco, UCS, Uncategorized

Shorts: How to reboot a Cisco UCS 6100 series fabric interconnect

I actually had this written down in my notes somewhere, and today a colleague of mine called me because he was having some issues. Since it’s not necessarily obvious, and others might be having the same problem, I thought I’d post a quick note here.

To reset the fabric interconnect on a Cisco UCS 6100 series, connect via your serial cable or use PuTTY to ssh into your fabric, and use the following commands:

connect local-mgmt [enter]
reboot [enter]

Obviously you leave out the space after the command, and the [enter] should be replaced by you actually pressing the enter key. Not sure if this is of help to folks out there, but I figured it’s more useful out here than in my private notes.

Update – July 10th:

Dan gave a good hint in the comments that you might want to check if the fabric interconnect you are trying to reboot is the primary or the subordinate, so here goes.

Connect to your fabric interconnect just like you did in the top example, and enter the local management:

connect local-mgmt

Now, check to see the state of the fabric interconnect you have attached to:

show cluster extended-state

This will give you a bunch of output, but there is one part that you are interested in more than anything else:
B: UP, SUBORDINATE
A: UP, PRIMARY

B: memb state UP, lead state SUBORDINATE, mgmt services state: UP
A: memb state UP, lead state PRIMARY, mgmt services state: UP
heartbeat state PRIMARY_OK

At the prompt, you can see (and hopefully you already knew to begin with) which node or member you are connected to. If the node you want to reboot is currently the primary, first hand the lead over to the other node, so that the node you are rebooting becomes subordinate:

cluster lead a

or

cluster lead b

Where a or b is the node that should take over as primary; the other node then becomes subordinate. After that you can reboot the respective member.
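If you have saved the output of show cluster extended-state to a file (for example from your terminal log), a small sketch like this picks out which member is currently the primary. It assumes the short "A: UP, PRIMARY" summary lines shown above and runs on your workstation, not on the fabric interconnect itself; the function name is my own.

```shell
# Read saved "show cluster extended-state" output on stdin and print
# the letter (A or B) of the member that is currently PRIMARY.
ucs_primary() {
    awk '/^[AB]: UP, PRIMARY/ { sub(":", "", $1); print $1; exit }'
}
```

Something like ucs_primary < extended-state.txt would then print A for the output shown above.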

Uncategorized, Virtualization, VMware, vSphere

Shorts: What is it about cpuid.corespersocket on vSphere?

Time for another short! The Google searches leading to this blog show quite a few people looking for the cpuid.corespersocket setting. In this short I’ll try to explain what this setting is for and how it behaves. So, let’s dig right in!

The cpuid.corespersocket setting

In a sense, you would assume that the name of the setting says it all. And in a sense, it does. In the physical world, you will have a number of sockets on your motherboard. This number of sockets is normally also the number of physical CPUs that you have on said motherboard (at least in an ideal world), and each CPU will have one or more cores on it.

Wikipedia describes this in the following way:

One can describe it as an integrated circuit to which two or more individual processors (called cores in this sense) have been attached.

…..

The amount of performance gained by the use of a multi-core processor depends very much on the software algorithms and implementation. In particular, the possible gains are limited by the fraction of the software that can be parallelized to run on multiple cores simultaneously; this effect is described by Amdahl’s law. In the best case, so-called embarrassingly parallel problems may realize speedup factors near the number of cores, or beyond even that if the problem is split up enough to fit within each processor’s or core’s cache(s) due to the fact that the much slower main memory system is avoided.

Now, this sounds quite good, but some of you may ask what kind of influence this has on your virtualized systems. The most obvious answer would be “none at all”. This is because by default your virtualized system will see the cores as physical CPUs and be done with it.

So, now you are probably wondering why VMware would even distinguish between cores and sockets. The answer is quite simple: it’s due to licensing. Not so much by VMware, but by the software or operating system that you would like to virtualize. You see, some of that software is licensed per core, and some is licensed by the number of sockets (some even combine the two).

So how do I use it?

As with all things computer related… It depends. When you are using ESX 3.5 you have no chance of using it. With ESX 4, you can actually use this feature, but it is not officially supported (someone please point me in the right direction if this is incorrect). And starting with ESX 4.1 the setting is now officially supported, and even documented in the VMware Knowledge Base as KB Article: 1010184.

Simply put, you can now create a virtual machine with, for example, 4 vCPUs and set cpuid.corespersocket to 2. This will make your operating system assume that you have two CPUs, and that each CPU has two cores. If you create a machine with 8 vCPUs and again select a cpuid.corespersocket of 2, your operating system will report 4 dual-core CPUs.

You can actually set this value by either going this route:

  1. Power off the virtual machine.
  2. Right-click on the virtual machine and click Edit Settings.
  3. Click Hardware and select CPUs.
  4. Choose the number of virtual processors.
  5. Click the Options tab.
  6. Click General, in the Advanced options section.
  7. Click Configuration Parameters.
  8. Include cpuid.coresPerSocket in the Name column.
  9. Enter a value ( try 2, 4, or 8 ) in the Value column.

    Note: This must hold:

    #VCPUs for the VM / cpuid.coresPerSocket = An integer

    That is, the number of vCPUs must be divisible by cpuid.coresPerSocket. So if your virtual machine is created with 8 vCPUs, coresPerSocket can only be 1, 2, 4, or 8.

    The virtual machine now appears to the operating system as having multi-core CPUs with the number of cores per CPU given by the value that you provided in step 9.

  10. Click OK.
  11. Power on the virtual machine.
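The divisibility rule from the note above is easy to check up front. Here is a minimal sketch that takes the number of vCPUs and the cpuid.coresPerSocket value, and prints the number of sockets the guest would see; the function name is my own.

```shell
# Print the number of sockets the guest will see, or fail if the
# number of vCPUs is not divisible by coresPerSocket.
sockets_for() {
    local vcpus="$1" cores="$2"
    if [ "$cores" -le 0 ] || [ $((vcpus % cores)) -ne 0 ]; then
        echo "invalid: $vcpus vCPUs is not divisible by coresPerSocket=$cores" >&2
        return 1
    fi
    echo $((vcpus / cores))
}
```

So sockets_for 8 2 prints 4 (four dual-core CPUs), while sockets_for 8 3 fails because 8 is not divisible by 3.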

If the setting isn’t shown, for example for those who want to experiment with it under ESX 4.0, you can create the values in the following way:

  1. Power off the virtual machine.
  2. Right-click on the virtual machine and click Edit Settings.
  3. Click the Options tab.
  4. Click General, under the Advanced options section.
  5. Click Configuration Parameters.
  6. Click Add Row.
  7. Enter “cpuid.coresPerSocket” in the Name column.
  8. Enter a value ( try 2, 4, or 8 ) in the Value column.
  9. Click OK.
  10. Power on the virtual machine.

To check if your settings actually worked, you can use the Sysinternals tool called Coreinfo on your Windows systems, and on Linux you can perform a simple “cat /proc/cpuinfo” to see what the guest reports.
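On Linux you can also let the shell do the counting for you. The sketch below assumes an x86 guest, where /proc/cpuinfo contains "physical id" and "cpu cores" lines; the function name and the optional file argument are my own.

```shell
# Summarize sockets and cores per socket as reported in /proc/cpuinfo.
# Assumes the x86 layout with "physical id" and "cpu cores" fields.
cpu_topology() {
    local file="${1:-/proc/cpuinfo}"
    awk -F': *' '
        /^physical id/ { sockets[$2] = 1 }   # one entry per distinct socket
        /^cpu cores/   { cores = $2 }
        END {
            n = 0
            for (s in sockets) n++
            printf "sockets=%d cores_per_socket=%d\n", n, cores
        }' "$file"
}
```

For a VM with 4 vCPUs and cpuid.coresPerSocket set to 2, I would expect this to report two sockets with two cores each.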