vCenter 5.5 Resource Exhaustion Detected

Following an upgrade of vCenter Server from 5.0 to 5.5, the vCenter service intermittently stopped and we began to see a number of resource exhaustion events:

Event ID: 2004
Source: Resource-Exhaustion-Detect
Windows successfully diagnosed a low virtual memory condition. The following programs consumed the most virtual memory: java.exe (2116) consumed 10149273600 bytes, java.exe (2088) consumed 4624416768 bytes, and vpxd.exe (14440) consumed 4113379328 bytes.
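To put the byte counts in that event into more familiar units, here is a minimal Python sketch that converts them to GiB and adds them up. The figures are copied from the event text; the variable and function names are only illustrative.

```python
# Byte counts copied from the Event ID 2004 message.
consumers = {
    "java.exe (2116)": 10149273600,
    "java.exe (2088)": 4624416768,
    "vpxd.exe (14440)": 4113379328,
}

GIB = 1024 ** 3  # bytes per GiB


def to_gib(num_bytes):
    """Convert a byte count to GiB."""
    return num_bytes / GIB


total_gib = to_gib(sum(consumers.values()))

for name, size in consumers.items():
    print(f"{name}: {to_gib(size):.2f} GiB")
print(f"Top three combined: {total_gib:.2f} GiB")
```

The three processes together account for roughly 17.6 GiB of virtual memory, with the two java.exe processes making up most of it.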

We increased the allocated memory (from 12GB to 24GB) and the page file (from 4GB to 6GB) but continued to experience problems. I then came across a similar issue in the VMware Communities, in which user Sateesh_vcloud documented the standard JVM heap settings:

Default values for vCenter server installation:

[Table of default values not preserved]

Our vCenter inventory was approximately 100 hosts and 2000 virtual machines, but we had selected the large inventory size for all services during the upgrade. The JVM heap allocated for each service was therefore likely larger than we required. Sateesh_vcloud also documented the locations of the configuration files for each service:

Single Sign On:
C:\Program Files\VMware\Infrastructure\SSOServer\conf\wrapper.conf
Set wrapper.java.additional.9=”-Xmx” (default: “1024M”) to “256M”
Set wrapper.java.additional.14=”-XX:MaxPermSize=” (default: “512M”) to “128M” (or half of the Xmx value chosen before)

Inventory Service:
C:\Program Files\VMware\Infrastructure\Inventory Service\conf\wrapper.conf
Set wrapper.java.maxmemory (default: “3072”) to “384” (MB)

Tomcat:
C:\Program Files\VMware\Infrastructure\tomcat\conf\wrapper.conf
Set wrapper.java.additional.9=”-Xmx” (default: “1024M”) to “512M” – “768M”
Set wrapper.java.additional.14=”-XX:MaxPermSize” (default: “256M”) to half of the Xmx value chosen before

Web Client:
C:\Program Files\VMware\Infrastructure\vSphereWebClient\server\bin\service\conf\wrapper.conf
Set wrapper.java.initmemory (default: “1024m”) to “256m”
Set wrapper.java.maxmemory (default: “1024m”) to “384m”

Log Browser:
C:\Program Files\VMware\Infrastructure\vSphereWebClient\logbrowser\conf\wrapper.conf
Set wrapper.java.maxmemory (default: “512”) to “256” (MB)

Profile Driven Storage:
C:\Program Files\VMware\Infrastructure\Profile-Driven Storage\conf\wrapper.conf
Set wrapper.java.initmemory (default: “256”) to “128” (MB)
Set wrapper.java.maxmemory (default: “1024”) to “384” (MB)

Orchestrator:
C:\Program Files\VMware\Infrastructure\Orchestrator\app-server\bin\wrapper.conf
Set wrapper.java.additional.3=-Xmn (default: “768m”) to “256m”
Set wrapper.java.initmemory (default: “2048”) to “384” (MB)
Set wrapper.java.maxmemory (default: “2048”) to “512” (MB)

I updated the heap size in the Inventory Service configuration file to 6144MB (previously 1288MB) and restarted the service. Since then we have not had a recurrence of the resource exhaustion and the vCenter service has been stable.
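For anyone who prefers to script this kind of wrapper.conf change rather than edit the file by hand, the following Python sketch rewrites a single key=value line. It assumes the setting changed was wrapper.java.maxmemory, which is the Inventory Service key listed above. The helper name and the sample file contents are hypothetical, and it works on text in memory rather than touching a real file.

```python
import re


def set_wrapper_property(conf_text, key, value):
    """Return conf_text with the line `key=...` rewritten to `key=value`.

    Only uncommented lines that begin with the key are changed.
    Raises KeyError if the key is not present.
    """
    pattern = re.compile(
        rf"^([ \t]*{re.escape(key)}[ \t]*=)[^\r\n]*", re.MULTILINE
    )
    new_text, count = pattern.subn(
        lambda match: match.group(1) + str(value), conf_text
    )
    if count == 0:
        raise KeyError(key)
    return new_text


# Hypothetical excerpt, not copied from a real Inventory Service wrapper.conf.
sample = (
    "# Inventory Service wrapper settings\n"
    "wrapper.java.initmemory=1024\n"
    "wrapper.java.maxmemory=1288\n"
)

updated = set_wrapper_property(sample, "wrapper.java.maxmemory", 6144)
print(updated)
```

Take a copy of the original file before writing any change back, and restart the service afterwards so the new value is picked up.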

vCenter Service Failing to Start

I recently upgraded vCenter from 5.0 U3 to 5.5 U2. The upgrade went smoothly and vCenter ran fine until our standard monthly Windows patch window, when we found the primary vCenter service would not start.

I initially flagged the issue with our database operations team and asked them to health check the SQL database for vCenter.

However, I continued investigating, and on checking the vpxd.log file I found:

[VpxdReverseProxy] Failed to create http proxy: An attempt was made to access a socket in a way forbidden by its access permissions.

This led me to a VMware knowledge base article listing troubleshooting steps for the vCenter service. Step four of this article suggested verifying the ports required by vCenter. Running ‘netstat -bano’ I found that port 80 appeared to be in use by process ID 4. Via Task Manager I found that process ID 4 was owned by System, which was not a conclusive identifier, but it did rule out some potential suspects.
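On a busy server the netstat output can be long, so a small filter helps. The Python sketch below picks out the PIDs listening on a given TCP port from the text produced by netstat -ano (the same switches without -b, so each socket stays on one line). The function name is illustrative and the sample output is hypothetical, not captured from the server in question.

```python
def listeners_on_port(netstat_output, port):
    """Return the set of PIDs with a TCP socket LISTENING on `port`.

    Expects the text produced by `netstat -ano` on Windows.
    """
    pids = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows have five fields:
        # Proto, Local Address, Foreign Address, State, PID
        if len(fields) != 5 or fields[0] != "TCP" or fields[3] != "LISTENING":
            continue
        local_port = fields[1].rsplit(":", 1)[-1]
        if local_port == str(port):
            pids.add(int(fields[4]))
    return pids


# Hypothetical sample output.
sample = """
  Proto  Local Address          Foreign Address        State           PID
  TCP    0.0.0.0:80             0.0.0.0:0              LISTENING       4
  TCP    0.0.0.0:443            0.0.0.0:0              LISTENING       14440
  TCP    [::]:80                [::]:0                 LISTENING       4
"""

print(listeners_on_port(sample, 80))
```

A result containing PID 4 points at the System process, as in this case, which means the port is held by a kernel-level HTTP listener rather than an ordinary application.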

Looking at the knowledge base article again, it lists some services to specifically check for –

‘If another application, such as Microsoft Internet Information Server (IIS) (also known as Web Server (IIS) on Windows 2008 Enterprise), Routing and Remote Access Service (RAS), World Wide Web Publishing Services (W3SVC), Windows Remote Management service (WS-Management) or the Citrix Licensing Support service are utilizing any of the ports, vCenter Server cannot start.’

Reviewing the services running on the server, I found the Windows Remote Management service. I stopped the service and then retried starting vCenter, which was successful. I was then able to restart the Windows Remote Management service and vCenter continued to run.

I subsequently found a blog called The World According to Gabe that detailed a permanent solution.

Recording the key steps here for my own future reference:

If, when you run winrm get winrm/config | find /I "http", you find that WinRM is listening on port 80 by default, run the following command:

winrm set winrm/config/listener?Address=*+Transport=HTTP @{Port="8888"}

If you want WinRM to listen on a different port, just change the “8888” to whatever port you wish, without breaking the formatting.

If you find that WinRM is not listening on port 80 by default, but is still grabbing the port, run the following command:

winrm set winrm/config/service @{EnableCompatibilityHttpListener="false"}

Later still I found another VMware knowledge base article specific to the Windows Remote Management service.

Dell Customised Image

My preference is to use vendor customised ESXi images where possible. They include vendor specific drivers, which may be required to utilise the hardware properly, and they help ensure consistency across your ESXi fleet.

Unfortunately Dell have managed to introduce a frustrating admin headache into their customised images.

It seems they changed the names of some of their customised VIBs between releases but kept the same files within those VIBs, which causes the upgrade process to bomb out.

The remediation for this issue is to either manually uninstall the VIBs via the command line before performing the upgrade, or upgrade using the standard VMware files. Neither option is very enticing. Uninstalling the VIBs is time consuming across multiple hosts. Using the standard VMware files means you may miss out on a Dell update.

The issue is described in a VMware KB article and on the Dell community site.

CBT Bug

I came across a Reddit post the other day pointing out a bug in Changed Block Tracking (CBT) which warrants further investigation.

A basic explanation of the bug is that if a VM has CBT enabled and its capacity is subsequently expanded by 128GB or more (either via a one-off expansion or multiple smaller expansions over time) then CBT can no longer be relied upon for accuracy. Any backup utilising CBT must be treated as suspect.

To remediate the issue you need to disable and then re-enable CBT on the VM.
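As a rough way of applying the rule as described above, the Python sketch below totals the expansions made to a disk since CBT was enabled and flags it once the total reaches 128GB. It follows this post's summary of the bug rather than the exact conditions in the VMware KB, and the function name and sample figures are hypothetical.

```python
CBT_THRESHOLD_GB = 128  # growth figure quoted in the description above


def cbt_needs_reset(expansions_gb):
    """Return True once total growth since CBT was enabled reaches 128GB.

    expansions_gb lists the size of each expansion, in GB, made to the
    disk since CBT was enabled.
    """
    return sum(expansions_gb) >= CBT_THRESHOLD_GB


# Hypothetical disks.
print(cbt_needs_reset([150]))         # a single one-off expansion
print(cbt_needs_reset([40, 40, 50]))  # several smaller expansions
print(cbt_needs_reset([50]))          # well under the threshold
```

Any disk flagged this way is a candidate for having CBT disabled and re-enabled before its next backup is trusted.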

As you would expect there is a VMware KB explaining the situation in more detail.

The most comprehensive coverage I have found is on the educationalcentre.co.uk blog.

Transparent Page Sharing – Disabled by Default

A recent post on the VMware Security & Compliance Blog announces some changes coming to the Transparent Page Sharing (TPS) feature of vSphere.

1. TPS is going to be disabled by default in the next update to ESXi.

2. There will be additional options for enabling and adding security to TPS.

Given the introduction of large pages, the effectiveness of TPS has been somewhat limited; therefore the impact should be minimal for environments that do not over-allocate memory. There is a VMware Knowledge Base article that explains this quite well.

In addition, the issue has been blogged by Frank Denneman, Josh Odgers, Michael Webster and Duncan Epping, among others.

Finally, I thought Duncan Epping’s note on his post about Project Fork was interesting: ‘The disabling of TPS will be overruled per VMFork group. So the parent and childs belonging to the same group will be able to leverage TPS and share pages.’

vSphere Blogs

There are numerous fantastic blogs that many brilliant engineers have created that cover vSphere and related topics. It is quite amazing the wealth of information contained in these sites.

A few of my favourites are:

Long White Virtual Clouds

Kiwi Michael Webster provides great insight and has particularly good pieces on certificate management, clustering, and storage.

TheSaffaGeek
Especially appreciate the certification related articles. Comments on the different certification study resources and exams and the interviews with individuals holding the VCDX are great.

Frank Denneman and Duncan Epping
These two literally wrote the book on vSphere clustering services and each have great sites.

Chad Sakac
Good articles explaining new technologies in the storage space.

Nick Marshall
Listing of vBrownBag episodes related to each certification is a handy feature of Nick’s site.

 

VCAP-DCD – Passed!

A couple of weeks ago I passed the VCAP-DCD exam. It may be the most challenging exam I have sat since university. I felt the questions were fair; however, there are a large number of them and it is important to manage your time so that you get through them all. My advice to anyone sitting the exam would be to mark off the design questions as you complete them so that you know how much time you need to keep available.
In terms of study materials, there is a wealth of material available. I would recommend the VMware design course as it covers the core sections of the exam quite well. It is a course that encourages group discussion, which is valuable for design: a different perspective can lead to interesting design choices that you may not have considered. I would also recommend reviewing the vSphere best practices documents, in particular the network document, which steps through some of the design decisions you may need to make. Finally, I would recommend the vBrownBag series of podcasts as a fantastic resource and contribution to the community.

Host Profile Configuration – Answer Files

When creating new version 5.0 Host Profiles you may quickly find yourself looking at a compliance failure error stating:

Expected user input parameters missing. Check answer file for host. 

This error relates to a new feature introduced to Host Profiles with vSphere 5.0, Answer Files.

Answer Files are designed to allow administrators to pre-configure answers to the questions required by deployment via Host Profiles, for example host specific IP address details.

The vSphere 5 Documentation Center provides specific details related to configuration of Answer Files.

To quickly resolve the Host Profile compliance failure you should right click the host and select ‘Update answer file’.

You will then be prompted to enter the host specific details required for the Host Profile. This step does not apply the Host Profile or cause any change to the host.

Finally run the Host Profile compliance check again to check for any other failures.

Host Profile Configuration – Misc.HeartbeatPanicTimeout

Creating new Host Profiles recently I found vSphere 5.0 introduced a number of new settings and I ran into a few that caused failures for compliance checks against various hosts in the cluster.

The high level process I followed was:

  1. Created a new Host Profile based on first host in the cluster
  2. Attached second host in the cluster
  3. Ran Check Compliance
  4. First error states ‘Option Misc.HeartbeatPanicTimeout doesn’t match the specified criteria’

In the Host Profile configuration this setting is found under “Advanced configuration option”, and in this case it was set to 60.

This setting is described in the VMware article ‘Advanced configuration options for VMware High Availability in vSphere 5.x’ (http://kb.vmware.com/kb/2033250):

heartbeat

My interpretation of this information is that in an HA enabled cluster all hosts will have a Misc.HeartbeatPanicTimeout setting of 60 seconds assuming a default configuration. Checking the host I found the setting was 14.

After further digging I found that hosts which had been upgraded from ESXi 4.1 to 5.0 had a setting of 60, while hosts built as ESXi 5.0 had a setting of 14.

Best practice would dictate using the latest standard of 14; however, I could find very little documentation on this, so there may be some confusion.

I expect a number of admins would find themselves in this position of mixed settings as it would be fairly common to upgrade existing hosts in a cluster and add new hosts at the new version.
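The mixed 60 and 14 values described above are easy to spot once the per-host settings have been collected. The Python sketch below compares each host's Misc.HeartbeatPanicTimeout against the value held in the Host Profile. The host names and values are hypothetical, and how the values are gathered from the hosts is left out.

```python
def hosts_not_matching(profile_value, host_values):
    """Return the names of hosts whose setting differs from the profile value."""
    return sorted(
        host for host, value in host_values.items() if value != profile_value
    )


# Hypothetical cluster: two hosts upgraded from ESXi 4.1, two built as 5.0.
heartbeat_panic_timeout = {
    "esx01": 60,  # upgraded from ESXi 4.1
    "esx02": 60,  # upgraded from ESXi 4.1
    "esx03": 14,  # built as ESXi 5.0
    "esx04": 14,  # built as ESXi 5.0
}

# The profile was captured from an upgraded host, so it expects 60.
print(hosts_not_matching(60, heartbeat_panic_timeout))
```

Whichever value is chosen as the standard, the hosts returned are the ones that will fail the compliance check.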

 

Virtual Machine MAC Addresses

While attending a vSphere Design Workshop, a question arose regarding when a virtual machine MAC address may be changed. It was suggested that a virtual machine’s MAC address should never change following initial creation of the VM. However, there appear to be some circumstances under which a virtual machine may be configured with a new MAC address. Some are obvious, such as when a new network adapter is added and receives a new MAC address. Others are not as obvious.

In vSphere 5.x, performing a vMotion or Storage vMotion does not result in a change to the MAC address of a running VM. However, the documentation suggests that moving a virtual machine to a different location on the same host may result in a network interface’s MAC address changing. The vSphere Documentation Center for 5.0 states:

After the MAC address has been generated, it does not change unless the virtual machine is moved to a different location, for example, to a different path on the same server. The MAC address in the configuration file of the virtual machine is saved. All MAC addresses that have been assigned to network adapters of running and suspended virtual machines on a given physical machine are tracked.

In addition, vSphere 5.x does not track the MAC addresses assigned to virtual machines that are powered off. Therefore, when a powered off virtual machine is powered on, it is possible that its MAC address will have been assigned to another virtual machine. In this situation the virtual machine being powered on will be assigned a new MAC address. The vSphere Documentation Center for 5.0 states:

The MAC address of a powered off virtual machine is not checked against those of running or suspended virtual machines. It is possible that when a virtual machine is powered on again, it can acquire a different MAC address. This acquisition is caused by a conflict with a virtual machine that was powered on when this virtual machine was powered off.

Usually MAC addresses are not of great concern to a vSphere administrator. However, some software requires a static, consistent MAC address. In these cases it is possible to manually configure the MAC address so that it is known and unlikely to change.

A manual MAC address may be configured via the vSphere client:

1. Log in to the vSphere Client and select the virtual machine from the inventory panel.

2. Click the Summary tab and click Edit Settings.

3. Select the network adapter from the Hardware list.

4. In the MAC Address group, select Manual.

5. Enter the static MAC address, and click OK.

For vSphere 5.1 there are some specific requirements that have been helpfully documented by Cormac Hogan in his article ‘Heads Up! Valid Static Address Ranges in vSphere 5.1‘. These requirements appear to have been relaxed in vSphere 5.5 as the vSphere 5.5 Documentation Center states:

…all unique manually generated addresses are supported.
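Before entering a static address it is worth sanity checking it. The Python sketch below validates the format of a MAC address and reports whether it falls inside 00:50:56:00:00:00 to 00:50:56:3F:FF:FF. That range is an assumption brought in from outside this post: it is the block VMware has traditionally reserved for manually assigned addresses, and it is not taken from the articles quoted above. The function names are illustrative.

```python
import re

# Six colon-separated pairs of hex digits, e.g. 00:50:56:1a:2b:3c
MAC_PATTERN = re.compile(r"[0-9A-Fa-f]{2}(:[0-9A-Fa-f]{2}){5}")


def parse_mac(mac):
    """Return the six octets of a colon-separated MAC address as integers."""
    if not MAC_PATTERN.fullmatch(mac):
        raise ValueError(f"not a valid MAC address: {mac!r}")
    return [int(octet, 16) for octet in mac.split(":")]


def in_vmware_manual_range(mac):
    """True if mac lies in 00:50:56:00:00:00 to 00:50:56:3F:FF:FF (assumed range)."""
    octets = parse_mac(mac)
    return octets[:3] == [0x00, 0x50, 0x56] and octets[3] <= 0x3F


print(in_vmware_manual_range("00:50:56:1a:2b:3c"))
print(in_vmware_manual_range("00:50:56:40:00:00"))
```

On releases that accept any unique address, the range check is optional and only the format check matters.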