Friday, July 11, 2014

SharePoint 2013: Start-CacheHost : Cannot start service AppFabricCachingService on computer

Problem

You have two AppFabric hosts in your SharePoint Server 2013 farm and want to add a third. After adding this host, you find that the new host's status is DOWN. You first try starting the service with the usual Start-CacheHost command, but this fails. When you start it through the Services control panel, it does start, but then stops after a few minutes. You then check the CacheClusterConfiguration file and see all three hosts there, configured correctly. But when you check the health stats (Get-CacheClusterHealth), only the two original hosts are shown.

Solution
  • After adding a new cache host, stop and then start the CacheCluster.
Notes
  • Thanks to Marco van Wieren and his post for providing the clue needed for solving this.
  • Here's the process I went through to add the instance and troubleshoot:
    1. I first added the instance by executing the following script on the target new AppFabric host:
      $SPFarm = Get-SPFarm
      $cacheClusterName = "SPDistributedCacheCluster_" + $SPFarm.Id.ToString()
      $cacheClusterManager = [Microsoft.SharePoint.DistributedCaching.Utilities.SPDistributedCacheClusterInfoManager]::Local
      $cacheClusterInfo = $cacheClusterManager.GetSPDistributedCacheClusterInfo($cacheClusterName)
      $instanceName = "SPDistributedCacheService Name=AppFabricCachingService"
      $serviceInstance = Get-SPServiceInstance | ? {($_.Service.ToString()) -eq $instanceName -and ($_.Server.Name) -eq $env:computername}
      if ([System.String]::IsNullOrEmpty($cacheClusterInfo.CacheHostsInfoCollection))
      {
          Add-SPDistributedCacheServiceInstance
          $cacheClusterInfo.CacheHostsInfoCollection
      }
    2. After this, I executed Use-CacheCluster followed by Get-CacheHost to check service status. Note that I did this while logged into the new AppFabric host machine.  The results were that my two existing hosts were UP while the new one was DOWN.  I then executed the following script to spot check the new host's service instance status:
      $instanceName ="SPDistributedCacheService Name=AppFabricCachingService"
      $serviceInstance = Get-SPServiceInstance | ? {($_.service.tostring()) -eq $instanceName -and ($_.server.name) -eq $env:computername}
      $serviceInstance
      The result was that the new service instance status was Online.
    3. I then decided to check the service instance status of all hosts in the farm using this script:
      Get-SPServiceInstance | ? {($_.service.tostring()) -eq "SPDistributedCacheService Name=AppFabricCachingService"} | select Server, Status
      The results were that all service instances were Online.
    4. I then started the AppFabric service manually, via the Services control panel on the new host.  I noted that the correct service account was configured (spService); and when I clicked Start, the service started up without issue.
    5. Still logged into this machine, I again executed Use-CacheCluster followed by Get-CacheHost to check service status. This time, the results were UNKNOWN for all hosts. I then logged into one of the existing AppFabric hosts and executed Use-CacheCluster followed by Get-CacheHost again. The results were UNKNOWN for the new host and UP for the two existing hosts.
    6. I waited a minute or two to give time for the caching service to warm up and synchronize.  I then repeated steps 4 - 5, but experienced the same outcome.
    7. I repeated steps 4 - 5, but this time from the SharePoint Management Shell, executing Start-CacheHost.  This immediately resulted in the error:
      Start-CacheHost : Cannot start service AppFabricCachingService on computer.
    8. I then checked the cluster configuration by exporting the configuration file using this script:
      Export-CacheClusterConfig C:\CacheClusterConfig.txt
      All three hosts were identified in this file and the configuration information for each seemed to be correct.
    9. Next, I checked the health metrics by executing this script:
      Use-CacheCluster
      Get-CacheClusterHealth
      Oddly, this showed that the two existing hosts were perfectly healthy, but it did not mention anything about the new one I had just added.
    10. I then checked all of the caches using this script:
      Get-Cache
      The results of this were a listing of the names of all of the caches, but this listing only showed the caches for the two existing hosts and nothing for the new one.
    11. Next, I checked the configuration of the new host using this script:
      Get-CacheHostConfig
      The result showed a CacheSize of 1124 MB, which was significantly higher than any of the CacheSizes for the other hosts.
    12. Thinking that this might be the issue, I stopped the cluster, set the new CacheSize and then started the cluster using this script:
      Stop-CacheCluster
      Set-CacheHostConfig -Hostname OCS-VS-BAT13D1 -CachePort 22233 -CacheSize 300
      Start-CacheCluster
    13. I then checked the service status of all hosts, using Get-CacheHost, and this time all instances reported back UP status.
    14. It wasn't clear to me whether it was lowering the CacheSize or stopping and starting the cache cluster that resolved the issue. So, I stopped the cluster, set the CacheSize back to 1124 MB, started the cluster again, and rechecked the service statuses: they were all UP.
    15. Therefore, it was stopping and starting the cache cluster that enabled the new host to fully integrate with the cluster.
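
For reference, the fix from the steps above can be condensed into a short sketch. It uses only the AppFabric cmdlets already shown in this post, plus a filter on the Status property of the objects that Get-CacheHost returns. The filter value 'Up' and the selected property names are my assumptions about that output, so adjust them if your shell reports something different. Bear in mind that stopping the cluster discards whatever is currently held in the cache, so this is best done in a quiet period.

```powershell
# Sketch only: run from the SharePoint 2013 Management Shell on one of the cache hosts.
# Stopping the cluster empties the in-memory cache on every host.
Use-CacheCluster      # connect the session to the farm's cache cluster
Stop-CacheCluster     # stop the caching service on all hosts
Start-CacheCluster    # start all hosts again so the new host joins the cluster

# List any host that is still not UP after the restart.
# Assumption: each returned object exposes HostName and Status, and Status reads 'Up'.
Get-CacheHost | Where-Object { $_.Status -ne 'Up' } | Select-Object HostName, Status
```

If the last command returns nothing, every host reported UP, which matches what I saw in step 13.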
