Showing posts with label Troubleshooting.

Friday, September 12, 2014

SharePoint 2013: Table SPDistributedCacheCalls_Partition9 has XXX bytes that has exceeded the max bytes XXX

Problem

The following event appears in the Application log of a SharePoint 2013 web front-end server:
Log Name:      Application
Source:        Microsoft-SharePoint Products-SharePoint Foundation
Date:          [Date/Time]
Event ID:      8319
Task Category: Usage Infrastructure
Level:         Critical
Keywords:      
User:          [Farm Service Account]
Computer:      [A farm web front end server]
Description:
Table SPDistributedCacheCalls_Partition9 has 461774848 bytes that has exceeded the max bytes 460175067
Event Xml:
...

This event appears in the Application log at roughly five-minute intervals. Checking the ULS logs, you find the following message, also appearing at roughly five-minute intervals:
Table SPDistributedCacheCalls_Partition9 has 461774848 bytes that has exceeded the max bytes 460175067
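To see how far over its limit the table actually is, you can pull the two byte counts out of the message text. The following is a minimal PowerShell sketch, not a Microsoft tool: Get-UsageTableOverage is a hypothetical helper name, and its regular expression assumes the exact wording of the message shown above. For this event, the table is 1,599,781 bytes (roughly 1.5 MB) over its maximum.

```powershell
# Minimal sketch: parse the usage-table size message (Event ID 8319 / ULS)
# and report how far the table is over its limit.
# ASSUMPTION: the message uses the exact wording shown in this post.
function Get-UsageTableOverage {
    param(
        [Parameter(Mandatory = $true)]
        [string]$Message
    )

    $pattern = 'Table (?<Table>\S+) has (?<Bytes>\d+) bytes that has exceeded the max bytes (?<Max>\d+)'
    if (-not ($Message -match $pattern)) {
        return $null   # not the message we are looking for
    }

    $bytes = [long]$Matches['Bytes']
    $max   = [long]$Matches['Max']

    [pscustomobject]@{
        Table        = $Matches['Table']
        Bytes        = $bytes
        MaxBytes     = $max
        OverageBytes = $bytes - $max
        # 1MB is PowerShell's built-in constant for 1,048,576 bytes.
        OverageMB    = [math]::Round(($bytes - $max) / 1MB, 2)
    }
}
```

On a WFE, you could feed it recent events like this: Get-WinEvent -FilterHashtable @{LogName='Application'; Id=8319} -MaxEvents 5 | ForEach-Object { Get-UsageTableOverage -Message $_.Message }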

Solution
  1. Update usage events that you want to monitor
    1. Log in to a farm server using the SharePoint setup user administrator account.
    2. Launch Central Administration.
    3. Go: Monitoring > Reporting > Configure usage and health data collection.
    4. Select the usage events you want to log. Unfortunately, as far as I can find, no Microsoft article clearly describes these events.
    5. Click OK. The changes you make here are not applied immediately, because they are carried out by scheduled jobs.
  2. Update usage event logging retention periods
    1. While still logged into the server, launch a SharePoint Management Shell with elevated privileges.
    2. Run Get-SPUsageDefinition. This lists all of the usage definitions, whether each is being logged, and its retention period.
    3. Run this command to set the retention period for a single usage definition:
      Set-SPUsageDefinition -Identity "[name of definition]" -DaysRetained [number of days]
      If you specify the definition's display name, enclose it in quotes (display names such as Task Use contain spaces); if you use its ID, no quotes are necessary. For example, to change the retention period for the Task Use definition:
      Set-SPUsageDefinition -Identity "Task Use" -DaysRetained [number of days]
    4. Run this command to set the retention period for all usage definitions at once:
      Get-SPUsageDefinition | ForEach-Object {Set-SPUsageDefinition -Identity $_.Name -DaysRetained [number of days]}
      For example, to change them all to 7 days:
      Get-SPUsageDefinition | ForEach-Object {Set-SPUsageDefinition -Identity $_.Name -DaysRetained 7}
  3. Force changes to be implemented promptly
    1. Back in Central Administration, go: Monitoring > Reporting > Configure usage and health data collection.
    2. Scroll down a bit and look for Log Collection Schedule.
    3. Click this link.
    4. Click a job definition listed there, and then click its Run Now button. Repeat this for the second job definition. Even this will not put your changes into effect immediately, but it will force the job to run after the current job completes.
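If you prefer the shell to clicking Run Now, the same two jobs can be queued from an elevated SharePoint Management Shell with Start-SPTimerJob. This is a sketch under one assumption: that the jobs listed under Log Collection Schedule have display names containing "Usage Data Import" and "Usage Data Processing", as they do in a default SharePoint 2013 farm; check them against your own farm first. Select-UsageCollectionJob and Start-UsageCollectionJob are hypothetical helper names.

```powershell
# Sketch: queue the two usage log collection timer jobs, the shell
# equivalent of clicking Run Now on each job definition.
# ASSUMPTION: their display names contain "Usage Data Import" and
# "Usage Data Processing" (the SharePoint 2013 defaults).

function Select-UsageCollectionJob {
    param(
        [Parameter(Mandatory = $true, ValueFromPipeline = $true)]
        $Job
    )
    process {
        if ($Job.DisplayName -like '*Usage Data Import*' -or
            $Job.DisplayName -like '*Usage Data Processing*') {
            $Job
        }
    }
}

function Start-UsageCollectionJob {
    # Needs the SharePoint snap-in (SharePoint Management Shell).
    Get-SPTimerJob |
        Select-UsageCollectionJob |
        ForEach-Object {
            Write-Host "Queuing $($_.DisplayName)"
            Start-SPTimerJob -Identity $_
        }
}
```

Run Start-UsageCollectionJob once. As with Run Now, this only queues the jobs; the changes still take effect when the jobs actually run.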
References
Notes
  • When I experienced this error on my production farm (400 users), configuring the usage definitions as follows was sufficient to resolve it:

SharePoint 2013: Workflow not working after installing August 2014 PU

Problem

After installing the August 2014 public updates on a SharePoint 2013 farm, some users reported that when they clicked the Workflows link for a list or library item, they were taken to the usual "Sorry..." error page rather than the Workflows page. A workflow designer also reported that she could no longer create 2013 workflows. Curiously, 2010 workflows (legacy workflows migrated from 2010) continued to work, and 2010 workflows could still be created in SharePoint Designer 2013.

Solution
  1. Install hotfix KB2880963, following the Microsoft TechNet article Install a software update (SharePoint 2013).
  2. On the application server hosting Workflow Manager 2013, stop and restart these services:
    1. Service Bus Gateway
    2. Service Bus Message Broker
    3. Workflow Manager Backend
  3. Finally, restart IIS on the web front-end servers (for example, by running iisreset).
References
Notes
  • After installing this hotfix on all of the SharePoint 2013 servers, you may find that running the SharePoint Products Configuration Wizard returns an error:
    Error: Some farm products and patches were not detected on this or other servers. If products or patches are missing locally...
    If this occurs, execute Get-SPProduct -Local on each of the machines identified in the error message.  It may take several minutes for this command to finish.
  • In Solution step 2 above, I mention restarting the workflow services. It's not clear to me whether all of them need to be restarted, only some of them, or whether restarting IIS alone would do. Whichever the case may be, this is the combination I used to recover from the error when clicking Workflows for a list item after installing the hotfix. It resolved the problem on both my development and production farms.

Wednesday, July 9, 2014

SharePoint 2013: there was an error during installation

Problem

You are attempting to install the SharePoint Server 2013 prerequisites on a Windows Server 2012 virtual machine running on Hyper-V. The server has Internet access. You run PrerequisiteInstaller as Administrator. Partway through the installation, the prerequisite installer stops and displays its usual error dialog, reporting that there was an error during installation.

Troubleshooting
  1. Action: Reviewed prerequisite installer log file.
    1. Results: saw the following error in this file:
      Error: The tool was unable to install Application Server Role, Web Server (IIS) Role.
  2. Action: Searched for this error text.
    1. Results: references 1 and 2 (below) suggested that, in my environment, the error might mean PrerequisiteInstaller could not reach the Internet to download the prerequisites.
  3. Action: opened browser and connected to known accessible website.
    1. Result: successfully connected to website.
    2. Observation: Internet connectivity established, but noted that IE Enhanced Security Configuration was enabled. Perhaps IE Enhanced Security Configuration played a role.
  4. Action: disabled IE Enhanced Security Configuration, and then re-ran PrerequisiteInstaller.
    1. Result: still failed.
    2. Observation: perhaps Firewall played a role.
  5. Action: Checked Firewall settings for target server and also for production server currently running older instance of SharePoint (2010).
    1. Results: firewall was set to default settings on target server and on server currently running older instance. Recalled that a development instance was successfully installed without changing default firewall settings.
    2. Observation: firewall settings not likely the cause.  Internet connectivity not likely the cause.
  6. Action: searched again for error text.
    1. Results: reference 3 seemed to suggest that the issue may be permissions-related; considered tweaking GPO, but wanted to avoid making such modifications.  References 4, 5 and 6 seemed to indicate that for some reason the PrerequisiteInstaller was not able to install the Application Server and Web Server (IIS) roles on Windows Server 2012 and that therefore these should be installed manually.
  7. Actions: Added Application Server and Web Server roles per reference 6; then re-ran PrerequisiteInstaller.
    1. Results: still failed, but noted that PrerequisiteInstaller seemed to run for longer period before failing.
  8. Actions: searched again for error text.
    1. Results: reference 7 suggested that I might just need to run aspnet_regiis -enable -i for .NET 4.0. I recalled having to do this for previous 2010 installations that had problems during installation, and it had resolved them.
  9. Action: ran aspnet_regiis -enable -i command, at console, as administrator.
    1. Results: the command did not fail, but simply presented help instructions.
  10. Action: searched again for error text.
    1. Results: found reference 8, which showed a copy of a PrerequisiteInstaller log file. That log included the original error message I identified above (see step 1). It also highlighted an error message I had apparently overlooked: Error when enabling ASP.NET v4.0.30319. This reference also linked to another reference, 9, which was touted as the solution.
  11. Action: Performed the steps in reference 9.  These steps added ASP.NET 3.5. 
    1. Results: Installed.
  12. Action: restarted PrerequisiteInstaller.exe.
    1. Results: completed without issue.
Solution
  • Before running PrerequisiteInstaller, install the Application Server and Web Server (IIS) roles manually. When installing the Web Server role, be sure to select the ASP.NET 3.5 feature.
References
  1. SharePoint 2013: Install Prerequisites Offline or Manually on Windows Server 2012 - A Comprehensive Guide
  2. The Products Preparation Tool in SharePoint Server 2013 may not progress past "Configuring Application Server Role, Web Server (IIS) Role
  3. SharePoint 2013 Pre requisites install fail, Error: The tool was unable to install Application Server Role, Web Server (IIS) Role
  4. SharePoint 2013 pre-requisite: Application and Web Server Role configuration error
  5. Installing SharePoint 2013 on Windows Server 2012 R2 *RTM*
  6. Installing SharePoint 2013 on Windows Server 2012 R2 Preview
  7. The tool was unable to install Application Server Role, Web Server (IIS) Role
  8. Trying to install SharePoint 2013 on server 2012 Unable to install the application to web server IIS
  9. IIS 8.0 Using ASP.NET 3.5 and ASP.NET 4.5
  10. SharePoint 2013 SP1 support in Windows Server 2012 R2
  11. Error The tool was unable to install Application Server Role, Web Server IIS Role Last return code 0X41D=1053.

Tuesday, July 8, 2014

SharePoint 2013: Access is denied. (Exception from HRESULT: 0x80070005 (E_ACCESSDENIED))

Problem

You attempt to run a PowerShell script against the SharePoint 2013 farm's My Site web application, and the following error message appears for each website the script iterates through:
Get-SPWeb : Access is denied. (Exception from HRESULT: 0x80070005 (E_ACCESSDENIED))
At line:1 char:79
+ Get-SPWebApplication http:/[YourWebAppURL] | Get-SPSite -Limit All | Ge ...
+                                                                               ~~
    + CategoryInfo          : InvalidData: (Microsoft.Share....SPCmdletGetWeb:SPCmdletGetWeb) [Get
   -SPWeb], UnauthorizedAccessException
    + FullyQualifiedErrorId : Microsoft.SharePoint.PowerShell.SPCmdletGetWeb
You verify that you are running the script in the SharePoint Management Shell as Administrator.  You also verify that the user account you are logged in as is a member of the Farm Administrators group and that it is the Primary site collection administrator for the root site collection of the My Site web application.

Solution
  • Configure a new User Policy for the My Site web application that grants your administrator account Full Control to all the site collections in this web application:
    1. Log in to the farm's Central Administration as a farm administrator.
    2. Navigate to: Application Management > Web Applications > Manage Web Applications.
    3. Select the target web application.  This will enable the options on the Web Applications ribbon.
    4. Click the User Policy button.
    5. Click Add Users.
    6. Click Next.
    7. Enter a username, select Full Control, and then click Finish.  When adding the username, be sure to prefix it with i:0#.w|.  So, for example, DOMAIN\John.Smith would be entered as:
      i:0#.w|DOMAIN\john.smith
    8. Open a fresh SharePoint Management Shell as Administrator and re-run the script.
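The same policy can also be added from the SharePoint Management Shell. This is a sketch under two assumptions: the My Site web application uses Windows claims authentication (hence the i:0#.w| prefix), and ConvertTo-WindowsClaimsLogin and Add-FullControlPolicy are hypothetical helper names. It works through the web application's Policies and PolicyRoles collections, which are what the User Policy page edits.

```powershell
# Sketch: add a Full Control user policy to a web application from PowerShell.
# ASSUMPTIONS: Windows claims authentication (i:0#.w| prefix); helper names
# ConvertTo-WindowsClaimsLogin and Add-FullControlPolicy are hypothetical.

function ConvertTo-WindowsClaimsLogin {
    param([Parameter(Mandatory = $true)][string]$Account)
    # Windows claims identities are DOMAIN\user prefixed with i:0#.w|
    if ($Account.StartsWith('i:0#.w|')) { return $Account }
    return "i:0#.w|$Account"
}

function Add-FullControlPolicy {
    param(
        [Parameter(Mandatory = $true)][string]$WebApplicationUrl,
        [Parameter(Mandatory = $true)][string]$Account
    )
    # Needs the SharePoint snap-in (SharePoint Management Shell, run as Administrator).
    $webApp = Get-SPWebApplication $WebApplicationUrl
    $login  = ConvertTo-WindowsClaimsLogin -Account $Account

    # Policies is the collection shown on the web application's User Policy page.
    $policy = $webApp.Policies.Add($login, $Account)
    $fullControl = $webApp.PolicyRoles.GetSpecialRole(
        [Microsoft.SharePoint.Administration.SPPolicyRoleType]::FullControl)
    $policy.PolicyRoleBindings.Add($fullControl)

    $webApp.Update()
}
```

Example usage: Add-FullControlPolicy -WebApplicationUrl "http://[MySiteURL]" -Account "DOMAIN\john.smith", and then open a fresh shell before re-running your script, as in step 8.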
References
Notes
  • An account added to the SharePoint Farm Administrators group is not automatically granted access to the farm's web applications or to user sites. A SharePoint farm administrator can, however, add their own account to a site collection's Site Collection Administrators group and thereby gain administrative access to that site collection and all of its subsites.
  • Administrators of site collections and sites must be members of the Site Collection Administrators group or of the site Administrators group.  A member of the Site Collection Administrators group has administrative access to all sites and subsites within the site collection.
  • This issue can also affect the results you get when you use PowerShell to harvest My Site metrics. For example, running this command
    Get-SPWebApplication http://[MySiteURL] | Get-SPSite -Limit All | Select URL, @{Name="StorageMB"; Expression={$_.Usage.Storage/1024/1024}}
    will return "0" as the storage value for every My Site on which your administrator account is not configured as a Site Collection Administrator.

Monday, June 30, 2014

SharePoint 2013: Session OfficeSearchHealthSession failed to start with the following error: 0xC0000035

Problem

The following error appears in a SharePoint 2013 farm server's Application log:
Log Name:      Microsoft-Windows-Kernel-EventTracing/Admin
Source:        Microsoft-Windows-Kernel-EventTracing
Date:          [Date/Time]
Event ID:      2
Task Category: Session
Level:         Error
Keywords:      Session
User:          [farm service account]
Computer:      [a farm server]
Description:
Session "OfficeSearchHealthSession" failed to start with the following error: 0xC0000035
Event Xml:
...
Solution
  • Ignore this error.
  • You can reproduce it on demand by running the following PowerShell script on the server:
    Start-Job -ScriptBlock {Restart-Service sptracev4}
    $job = Get-SPTimerJob -Identity "Search Health Monitoring - Trace Events"
    $job.Execute([System.Guid]::Empty)
    
    Refresh the Application log.
References
Notes
  • Thanks to Janis Norvelis for presenting the script.  Janis originally presented this script with respect to seeing this error appearing in SharePoint Server 2010 application logs.
  • I have seen this error appear in the Application log of every farm server hosting SharePoint Server once every 24 hours, with roughly one minute between servers. It occurs on all my farms, both 2010 and 2013. One example:
    • WFE1: 6:12:03 AM
    • APP1:  6:13:02 AM
    • WFE2: 6:14:03 AM

Thursday, June 19, 2014

SharePoint 2013: The option for the SharePoint 2013 Workflow platform is not available

Problem

I administer a SharePoint 2013 farm. A user wanted to create a workflow using the new SharePoint 2013 Workflow platform. When the user connected to a website on the farm using SharePoint Designer 2013 and started creating a workflow for a list, the following message was displayed in the Create dialog:
The option for the SharePoint 2013 Workflow platform is not available because the workflow service is not configured on the server.  Please contact your server administrator.
This message seemed odd, since I knew I had installed Workflow Manager 1.0 on the application server without issue. I began troubleshooting.

Troubleshooting
  1. Check software installation:
    1. Workflow Manager 1.0 installed to application server.
    2. Workflow Manager Client 1.0 installed to all SharePoint servers in farm.
  2. Check workflow services are running.
    1. These should be running on the application server hosting Workflow Manager.
    2. The following services were verified as being started:
      1. Service Bus Gateway
      2. Service Bus Message Broker
      3. Windows Fabric Host Service
      4. Workflow Manager Backend
  3. Check that the Workflow Service Application Proxy is started and connected.
  4. Check that the workflow service was registered for HTTP and a valid URL existed (Get-SPWorkflowServiceApplicationProxy).
  5. Check that the Workflow Management Site is started and correctly serves the service configuration schema (http://AppServer:12291).
  6. Check that the Workflow Management Application Pool was started.
  7. Check that the web front-end servers were rebooted after the Workflow Manager Client installation.
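Step 2 can be scripted on the application server hosting Workflow Manager. A minimal sketch: Get-WorkflowServiceProblem is a hypothetical helper name, and it matches on the four service display names listed in step 2. It accepts any objects with DisplayName and Status properties, such as the output of Get-Service, and prints one line per service that is missing or not running.

```powershell
# Sketch: report which Workflow Manager services are missing or not running.
# Get-WorkflowServiceProblem is a hypothetical helper, not a product cmdlet.
function Get-WorkflowServiceProblem {
    param(
        [Parameter(Mandatory = $true)]
        [object[]]$Service
    )

    # Display names taken from step 2 above.
    $required = @(
        'Service Bus Gateway',
        'Service Bus Message Broker',
        'Windows Fabric Host Service',
        'Workflow Manager Backend'
    )

    foreach ($name in $required) {
        $svc = $Service | Where-Object { $_.DisplayName -eq $name } | Select-Object -First 1
        if (-not $svc) {
            "$name : not installed"
        }
        elseif ("$($svc.Status)" -ne 'Running') {
            "$name : $($svc.Status)"
        }
    }
}
```

On the application server, run Get-WorkflowServiceProblem -Service (Get-Service). No output means all four services are running.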
Solution
  1. Install the Workflow Manager Client on all web front end servers.
  2. Reboot each server after installation.
References

Wednesday, June 11, 2014

SharePoint 2013: UserProfileApplicationNotAvailableException_Logging... ProfilePropertyCache does not have...

Problem

You've opened a SharePoint 2013 Management Shell as the farm setup user administrator account (e.g., spAdmin). You are attempting to add, remove, or get SPProfileLeader data. You can get an instance of the service application proxy just fine, but when you try to run a command against it, the shell displays the following error:
Get-SPProfileLeader : UserProfileApplicationNotAvailableException_Logging ::
UserProfileApplicationProxy.ApplicationProperties ProfilePropertyCache 
does not have 48c36bfc-c2de-4a8a-afd5-04885b11c9bb
At line:1 char:1
+ Get-SPProfileLeader -ProfileServiceApplicationProxy $upaProxy
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidData: (Microsoft.Offic...CmdletGetLeader:
SPCmdletGetLeader) [
   Get-SPProfileLeader], UserProfileApplicationNotAvailableException
    + FullyQualifiedErrorId : 
      Microsoft.Office.Server.UserProfiles.PowerShell.SPCmdletGetLeader
Solution
  1. Log in to a farm server that hosts SharePoint Server, using the farm setup user administrator account.
  2. Launch Central Administration as administrator.
  3. Go: Application Management > Service Applications > Manage service applications.
  4. Select (don't click on) your user profile service application.
  5. On the Service Applications ribbon, click the Permissions button.
  6. Add the farm setup user administrator account.
  7. Enable Full Control for this account.
  8. Click OK.
  9. Close out the shell that produced the error message and open a new one as Administrator.
  10. Re-run your commands.
References
Notes
  • Thanks to Network Steve's post and Bram de Jager's post (referenced above) for providing the clues needed to solve this issue.  Steve suggested running the shell and commands under the farm service account to solve the problem I was experiencing.  This hinted at a permissions issue.  Bram's post was completely unrelated to the issue I was experiencing, but reminded me about how accounts are granted access to the User Profile service application.  Integrating the two discussions produced the necessary solution.
  • Environment: a traditional SharePoint Server 2013 farm hosted on Windows Server 2012 Hyper-V VMs, patched through the April 2014 CU.

Thursday, June 5, 2014

SharePoint 2013: Processing this item failed because of an unknown error when trying to parse its contents

Problem

You perform a crawl, and then, checking the crawl log, you see an overwhelming number of errors like the following:
Processing this item failed because of an unknown error when trying to parse its contents...
and
The content processing pipeline failed to process the item...
Solution
  1. Description: the Search Service account requires specific local user rights assignments to function fully, including:
    1. Adjust memory quotas for a process
    2. Impersonate a client after authentication
    3. Replace a process level token
  2. Short term: add the search service account (and the content access account, if you use a separate one) to the local Administrators group on each farm server hosting SharePoint Server. By default, members of the local Administrators group are granted most user rights, including those listed above.
  3. Long term: ask your network administrators to modify your organization's GPO to grant your farm's Search Service account these rights.
References

SharePoint 2013: You do not have an email address

Problem

You recently deployed the User Profile Service application to your SharePoint 2013 farm. The service runs without issue, and user information appears in users' My Sites. However, when users attempt to configure alerts on lists or document libraries, they see the following error message: "You do not have an email address."

Troubleshooting
  • Check repeatability: exploring the issue further, you find that you can't create alerts either, not even with your administrator account.
  • Determine scope: through inquiries among IT staff, you learn that staff are receiving email notifications from process workflows, and from your users you learn that they receive notifications when you grant them new permissions. Therefore, you can conclude that the cause is not a firewall or email server communication issue.
  • Research indicators: a standard search found a few references involving the error message, all of which indicated that the problem involved User Profile Service application configuration, specifically the mapping of the AD email attribute.
  • Verify conclusion: perform an ad hoc search of user profiles to view a random selection, and check their email addresses. None shows an email address. Check the email property mapping: it is currently set to aCSPolicyName.
Solution
  1. Launch Central Administration.
  2. Go to: Application Management > Service Applications > Manage service applications > User Profile Service > Manage User Properties.
  3. Scroll down to the Contact Information section and then look for Work email.
  4. Hover over the Work email property to expose its drop-down menu, and then select Edit.
  5. Scroll down to the Property Mapping for Synchronization section.
  6. Click the Remove button.
  7. In the Add New Mapping section, select mail from the Attribute drop down.
  8. Click the Add button.  A new entry will appear in the Property Mapping for Synchronization.
  9. Click OK.
  10. Go to: Application Management > Service Applications > Manage service applications > User Profile Service.
  11. In the Synchronization section, click Start Profile Synchronization.
  12. After this completes, wait an additional hour before engaging in any significant testing of alert creation.
References
  1. email not working : The following users do not have e-mail addresses specified
  2. SharePoint 2013 Alert Error: You do not have an email address
  3. Error when you create an alert in Microsoft SharePoint Online in Office 365 for enterprises pre-upgrade: "You do not have an email address"
  4. Configure alert settings for a Web application (SharePoint Server 2010)
  5. Manage user profile synchronization in SharePoint Server 2013
  6. Synchronize user and group profiles in SharePoint Server 2013
  7. Timer job reference (SharePoint 2013)
  8. Troubleshooting Steps for SharePoint Alert Email Does Not Go Out
Notes
  • User email addresses will not be available to the alert creation process immediately after the user profile synchronization completes. This is because further internal processes, which update the internal user profile tables with the new email data, must complete first. These involve various User Profile Service jobs that may take some time to finish. In my own experience, it took upwards of an hour before all users could successfully receive list and library alerts.

Wednesday, June 4, 2014

SQL Server 2012: SQLServer Error: 15404 Could not obtain information about Windows NT group/user

Problem

In SQL Server 2012, you are configuring scheduled backup maintenance plans for your SharePoint 2013 farm databases. When you attempt to execute a maintenance plan, you experience the following error:
[date] [time],,Error,[298] SQLServer Error: 15404 <c/> Could not obtain information about Windows NT group/user [Account you are logged in as]<c/> error code 0x5. [SQLSTATE 42000] (ConnIsLoginSysAdmin)
Solution
  1. Identify the domain account you will use to log in to SQL Server (e.g., spAdmin) when creating the farm database maintenance plans.
  2. Launch Active Directory Users and Computers as administrator.
  3. Navigate to this account in the directory.
  4. Right-click this account and choose Properties.
  5. Select the Security tab.
  6. Click the Add button.
  7. Enter the SQL Server service domain account and click OK.
  8. On the Properties dialog, select this account in the Group or user names list.
  9. Ensure that Read and all of its subpermissions (e.g., Read account restrictions, Read general information) are allowed. Clearing and then re-selecting Read automatically selects all of the Read subpermissions.
  10. Click OK.
References
Notes
  • The error presented above appears in the SQL Server Agent error logs: [ServerName] > SQL Server Agent > Error Logs.

Tuesday, June 3, 2014

SharePoint 2013: Event 8193: A failure was reported when trying to invoke a service application: EndpointFailure

Problem

You recently rebuilt your SharePoint 2013 farm's Managed Metadata service application, and it runs without issue. A day later, you check the Application event log on the SharePoint application server and see the following events:
Log Name:      Application
Source:        Microsoft-SharePoint Products-SharePoint Foundation
Date:          [Date/Time]
Event ID:      8313
Task Category: Topology
Level:         Error
Keywords:      
User:          [Farm Service Account]
Computer:      [Application Server]
Description:
A failure was reported when trying to invoke a service application: EndpointFailure
Process Name: OWSTIMER
Process ID: 1884
AppDomain Name: DefaultDomain
AppDomain ID: 1
Service Application Uri: urn:schemas-microsoft-com:sharepoint:service:26fdc21debf74e8a8fbb25146d98c2c4#authority=urn:uuid:3e1fa89c3f23469f8e4947ba8a99c448&authority=[Application Server]/Topology/topology.svc
Active Endpoints: 1
Failed Endpoints: 1
Affected Endpoint: [Application Server]/26fdc21debf74e8a8fbb25146d98c2c4/SearchService.svc
Event Xml:
...
and
Log Name:      Application
Source:        Microsoft-SharePoint Products-SharePoint Foundation
Date:          [Date/Time]
Event ID:      8313
Task Category: Topology
Level:         Error
Keywords:      
User:          [Farm Service Account]
Computer:      [Application Server]
Description:
A failure was reported when trying to invoke a service application: EndpointFailure
Process Name: w3wp
Process ID: 15280
AppDomain Name: /LM/W3SVC/522633276/ROOT-1-130460993295064685
AppDomain ID: 2
Service Application Uri: urn:schemas-microsoft-com:sharepoint:service:7c3ac0e364d64a9bad706582b3ae004d#authority=urn:uuid:3e1fa89c3f23469f8e4947ba8a99c448&authority=[Application Server]/Topology/topology.svc
Active Endpoints: 1
Failed Endpoints: 1
Affected Endpoint: [Application Server]/7c3ac0e364d64a9bad706582b3ae004d/MetadataWebService.svc
Event Xml:

Solution
  1. Rebuild the Search service application.
References
Notes
  • The reference cited above did not directly apply to our situation but pointed the way to the solution: namely, rebuilding the affected service.

Tuesday, May 20, 2014

SharePoint 2013: FatalError: Object Search Service failed in event OnBackup

Problem

You recently deployed a new SharePoint Server 2013 farm. You attempted to perform a full backup through Central Administration for the first time.  After the backup completes, you review the Backup job status and find a number of errors, all involving backup of the Search service databases. You see this error in the status page:
Object Search Service failed in event OnBackup. For more information, see the spbackup.log or sprestore.log file located in the backup directory. FaultException: Management called failed with System.InvalidOperationException: 'Job failed: Have tried to perform backup/restore operation twice on all in-sync members in cluster SP0e4f5d0f2fce.0, but none succeeded. Last failure message: Microsoft.Ceres.SearchCore.Seeding.SnapshotTransferException: Could not send chunk ms\%default\gen.000000000000007e.state: Localpath: [0-349> to target BackupDirectoryTarget[directory=[PathToBackupFolder]\spbr0000\I.0.0,validateTransfers=False] at Microsoft.Ceres.SearchCore.Seeding.SnapshotSender.SendChunks(ISnapshot snapshot, ISeedSource source, ISeedTarget target, SeedStatus status, Func`1 checkAborted, Int32 targetFragIndex) at Microsoft.Ceres.SearchCore.Seeding.SnapshotSender.FirstPhaseTransfer(ISeedSource source, ISeedTarget target, Action`1 updateProgress, Func`1 shouldAbort) at Microsoft.Ceres.SearchCore.Seeding.BackupWorker.BackupWork.DoFirstPhaseWork()' at at Microsoft.Ceres.SearchCore.IndexController.BackupService.ThrowOnFailure(JobStatus status) at Microsoft.Ceres.SearchCore.IndexController.BackupService.ProgressFirstPhase(String handle) at Microsoft.Ceres.SearchCore.IndexController.IndexControllerManagementAgent.WrapCall[T](Func`2 original)
Opening the backup log file, spbackup.log, you see plenty of messages like these:
...
[Date/Time] Progress: [Search_AdminDB] 90 percent complete.
[Date/Time] Progress: [Search_AdminDB_CrawlStore] 100 percent complete.
[Date/Time] Progress: [Search_AdminDB_LinksStore] 91 percent complete.
[Date/Time] Progress: [Search_AdminDB_AnalyticsReportingStore] 91 percent complete.
[Date/Time] Progress: [Search_AdminDB] 97 percent complete.
[Date/Time] Verbose: [Search_AdminDB_CrawlStore] SQL Server Message: Processed 3 pages for database 'Search_AdminDB_CrawlStore', file 'Search_AdminDB_CrawlStore_log' on file 1.
[Date/Time] Progress: [Search_AdminDB_LinksStore] 95 percent complete.
...
These show that the backup process is communicating with SQL Server without trouble, so the SQL Server interaction is not associated with the failure. Farther down, toward the end of the log, you see errors occurring again:
...
[Date/Time] FatalError: Object Search_AdminDB failed in event OnBackupComplete. For more information, see the spbackup.log or sprestore.log file located in the backup directory.
 Aborted due to error in another component.
[Date/Time] Verbose: Starting object: Search_AdminDB_CrawlStore.
[Date/Time] FatalError: Object Admin (C: on [Search Host Server]) failed in event OnBackupComplete. For more information, see the spbackup.log or sprestore.log file located in the backup directory.
 Aborted due to error in another component.
...
Lastly, looking at the application event log on the Search Service host server, you see the following timer event error:
Log Name:      Application
Source:        Microsoft-SharePoint Products-SharePoint Foundation
Date:          [Date/Time]
Event ID:      6398
Task Category: Timer
Level:         Critical
Keywords:      
User:          [Farm Account]
Computer:      [Search host server]
Description:
The Execute method of job definition Microsoft.SharePoint.Administration.Backup.SPBackupRestoreJobDefinition (ID 26eac21f-d391-4024-a82f-8ee0b5738ba3) threw an exception. More information is included below.
The backup job failed. For more information, see the error log that is located in the backup directory.
Event Xml:
...

Solution
  1. Identify the Search Service Account.
  2. Navigate to the network folder to which you write full farm backups.
  3. Grant the Search Service Account the following permissions to this folder:
    1. Read
    2. Write
References
  1. SharePoint 2013 Farm Backup Error : Object Search Service Application failed in event OnBackup. Could not send chunk to target BackupDirectoryTarget
  2. Plan for administrative and service accounts in SharePoint 2013
  3. Account permissions and security settings in SharePoint 2013
  4. Create and configure a Search service application in SharePoint Server 2013
  5. Configure backup and restore permissions in SharePoint 2013
  6. Back up Search service applications in SharePoint 2013
Notes
  • General: the issue involves service accounts and permissions to the destination backup folder. 
  • Backup log file: this will be found in the same folder containing the farm backup files.  For example, if your backup folder is SharePointBackup, inside this folder you will find a number of subfolders, each corresponding to a previous farm backup, numbered: spbr0000, spbr0001, spbr0002 and so on.  Open the most recent one to view the log of the most recent backup.
  • Search Service Account access to backup: this is a change from 2010.  In 2010, you had to grant access to the SQL Server Service Account.  I was not able to find any discussion of this in any of the Microsoft documentation I would have expected to cover it (see references 2 - 6).  Thanks to Amol Meshe for resolving this one.
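The note about spbrNNNN subfolders can also be scripted. The following is a minimal Python sketch (not a SharePoint tool) that picks the highest-numbered spbrNNNN subfolder under a backup root and lists the .log files in it. The function names are hypothetical, and the sketch assumes only the folder naming described in the note.

```python
import os
import re
from typing import List, Optional

# Backup subfolders are named spbr0000, spbr0001, spbr0002, ... (per the note above).
SPBR_NAME = re.compile(r"^spbr(\d+)$", re.IGNORECASE)


def latest_backup_folder(backup_root: str) -> Optional[str]:
    """Return the path of the highest-numbered spbrNNNN subfolder, or None."""
    best_number = -1
    best_path: Optional[str] = None
    for name in os.listdir(backup_root):
        match = SPBR_NAME.match(name)
        path = os.path.join(backup_root, name)
        if match and os.path.isdir(path):
            number = int(match.group(1))  # compare as integers, not strings
            if number > best_number:
                best_number, best_path = number, path
    return best_path


def backup_logs(folder: str) -> List[str]:
    """List the .log files inside one backup subfolder."""
    return sorted(n for n in os.listdir(folder) if n.lower().endswith(".log"))
```

For example, pass the path of your backup share (the SharePointBackup folder in the note) to latest_backup_folder, then pass the result to backup_logs to see which log file to open.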

SharePoint 2013: Product Configuration Wizard stuck on task 9 of 10

Problem

You have installed SP1 or some other cumulative update on all your farm servers without issue.  You then run the SharePoint Products Configuration Wizard on each machine.  It completes successfully on one or two machines, but on the next machine it nearly completes and then appears to remain stuck on configuration task 9 of 10.  You wait an hour, but it is still stuck on task 9 of 10.  You then check the Upgrade Status page in Central Administration.

Solution
  1. Close the SharePoint Products Configuration Wizard dialog.  Terminate the process if necessary to close it.
  2. Open the Services.msc panel, and then look for the SharePoint Timer Service.
  3. Stop this service.
  4. Open Windows Explorer, and then navigate to the cache folder at: C:\ProgramData\Microsoft\SharePoint\Config.
  5. Look for the most recent cache folder, and then open it.
  6. Take note of how many configuration files are in this folder.  For example, in this folder on one of my WFEs, I see 1736 files, including cache.ini.
  7. Delete all files in this folder EXCEPT cache.ini.
  8. Open the cache.ini file in a text editor, and then randomly modify the number, but keep it at the same number of digits.
  9. Save the cache.ini file.
  10. Start the SharePoint Timer Service.
  11. Now watch the configuration folder.  It will start filling up with new configuration cache files quickly.  When it reaches the number you noted down in step 6, and you no longer see any new files being created, proceed to the next step.
  12. Open a command prompt as Administrator.
  13. Run the following command: Psconfig.exe -cmd upgrade -inplace b2b -wait -force
  14. Check the Upgrade Status.
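Steps 7 through 9 can be sketched in Python as follows. This is a hypothetical helper, not a Microsoft tool. It assumes the SharePoint Timer Service is already stopped (steps 2 and 3), that you pass it the cache folder identified in step 5, and that cache.ini holds a single number. It does not stop or start any service.

```python
import os
import random
from typing import Optional


def new_cache_number(old: str, rng: random.Random) -> str:
    """Pick a different number with the same number of digits as `old`."""
    width = len(old)
    low = 10 ** (width - 1) if width > 1 else 0
    high = 10 ** width - 1
    while True:
        candidate = str(rng.randint(low, high))
        if candidate != old:
            return candidate


def reset_config_cache(folder: str, rng: Optional[random.Random] = None) -> int:
    """Delete every file except cache.ini, then give cache.ini a new number.

    Returns the number of files deleted. Assumes the timer service is stopped.
    """
    rng = rng or random.Random()
    deleted = 0
    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        if name.lower() != "cache.ini" and os.path.isfile(path):
            os.remove(path)
            deleted += 1
    ini_path = os.path.join(folder, "cache.ini")
    with open(ini_path, encoding="utf-8-sig") as handle:  # tolerate a BOM
        old = handle.read().strip()
    with open(ini_path, "w", encoding="ascii") as handle:
        handle.write(new_cache_number(old, rng))
    return deleted


def file_count(folder: str) -> int:
    """Count files in the cache folder, to compare with the total from step 6."""
    return sum(1 for n in os.listdir(folder) if os.path.isfile(os.path.join(folder, n)))
```

After starting the timer service again (step 10), you could poll file_count on the same folder until it reaches the total you noted in step 6 and stops changing.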
References
Notes
  • Upgrade Status page: CA > Upgrade and Migration > Upgrade and Patch Management > Check upgrade status.
  • Cache folder: You may get a Folder not found error when trying to navigate to this folder.  If so, try navigating first to one of the higher-level folders, and then double-click each subfolder in turn.
  • PSConfig path: psconfig.exe is located in C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\15\BIN\.
  • Seems stuck on 10%: I've experienced this several times over the years.  Before concluding that the upgrade process is actually experiencing problems, do a simple check of the Upgrade log.  Look at the most recent time stamp: if it's quite recent, then the upgrade process is likely moving forward successfully.  Check back again a little later: if you see a new time stamp, then the upgrade process is likely still moving forward and is simply not updating the percentage complete in a meaningful way.
  • Actually stuck on 10%: then there is the case where you check back after a while (maybe after a long while) and no new time stamp entry has been added to the Upgrade log, even after an hour or two.  In this case, there might be a problem.  Be sure to review the Upgrade error log to see if there are any unusual errors.  Fortunately, I have experienced this case just once.  It occurred during an upgrade launched by this command: PSConfig.exe -cmd upgrade -inplace b2b -force -cmd applicationcontent -install -cmd installfeatures.  Nothing unusual seemed to occur until it reached Step 5 of 6.  Checking the upgrade log found no issues; an upgrade error log was generated, but the errors logged were the pesky "web application is configured with claims authentication mode however the content database you are trying to..." warnings that I ignore.  I waited an hour and then followed the procedure above, and I was able to complete the upgrade successfully.
  • Force upgrade to end with error: when the upgrade seems stuck, you can force it to end, albeit with an error, by stopping the SharePoint Timer Service and then restarting it.
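The "seems stuck" versus "actually stuck" check comes down to comparing the newest time stamp in the Upgrade log with the current time. A minimal Python sketch follows. The bracketed timestamp format it matches is an assumption, so adjust the STAMP pattern to your log. The one-hour quiet period mirrors the hour of waiting described in these notes.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, Optional

# Assumption: each log entry carries a bracketed US-style timestamp such as
# "[5/16/2014 10:23:45 AM]". Adjust this pattern if your upgrade log differs.
STAMP = re.compile(r"\[(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M)\]")


def latest_stamp(lines: Iterable[str]) -> Optional[datetime]:
    """Return the newest timestamp found in the log lines, or None."""
    latest: Optional[datetime] = None
    for line in lines:
        match = STAMP.search(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%m/%d/%Y %I:%M:%S %p")
            if latest is None or stamp > latest:
                latest = stamp
    return latest


def looks_active(lines: Iterable[str], now: datetime,
                 max_quiet: timedelta = timedelta(hours=1)) -> bool:
    """True if the newest log entry is no older than `max_quiet` before `now`."""
    latest = latest_stamp(lines)
    return latest is not None and now - latest <= max_quiet
```

To use it, open the most recent Upgrade log as a text file and pass the file object to looks_active together with datetime.now(). A False result only tells you to look more closely at the Upgrade error log. It does not prove the upgrade has failed.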

Friday, May 16, 2014

SharePoint 2013: The process was terminated due to an unhandled exception

Problem

You have a SharePoint Server 2013 farm with one application server and two web front end (WFE) servers in a traditional topology.  The farm servers are VMs hosted on Hyper-V.  Farm servers have both private and external NICs configured.  The private network is limited to farm servers.  A HOSTS file is used to resolve farm server names over the private network.  The Distributed Cache service is running on the application server and the two WFEs.  You review the Application event log on one of your WFEs and find the following set of events appearing in the log on an hourly basis:
Log Name:      Application
Source:        Application Error
Date:          [date/Time]
Event ID:      1000
Task Category: (100)
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      [WFE]
Description:
Faulting application name: WerFault.exe, version: 6.2.9200.16659, time stamp: 0x51db3bf4
Faulting module name: wer.dll, version: 6.2.9200.16384, time stamp: 0x501081cc
Exception code: 0xc0000005
Fault offset: 0x0000000000021fe5
Faulting process id: 0x318c
Faulting application start time: 0x01cf62c15e954491
Faulting application path: C:\Windows\system32\WerFault.exe
Faulting module path: C:\Windows\system32\wer.dll
Report Id: 9d3c283d-ceb4-11e3-9405-00155d38891a
Faulting package full name: 
Faulting package-relative application ID: 
Event Xml:
...
and
Log Name:      Application
Source:        Application Error
Date:          [date/time]
Event ID:      1000
Task Category: (100)
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      [WFE]
Description:
Faulting application name: DistributedCacheService.exe, version: 1.0.4632.0, time stamp: 0x4eafeccf
Faulting module name: KERNELBASE.dll, version: 6.2.9200.16815, time stamp: 0x52f2ca60
Exception code: 0xe0434352
Fault offset: 0x00000000000264a8
Faulting process id: 0x3588
Faulting application start time: 0x01cf62c03cd277d1
Faulting application path: C:\Program Files\AppFabric 1.1 for Windows Server\DistributedCacheService.exe
Faulting module path: C:\Windows\system32\KERNELBASE.dll
Report Id: 9c501e53-ceb4-11e3-9405-00155d38891a
Faulting package full name: 
Faulting package-relative application ID: 
Event Xml:
...
and
Log Name:      Application
Source:        .NET Runtime
Date:          [date/time]
Event ID:      1026
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      [WFE]
Description:
Application: DistributedCacheService.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: Microsoft.ApplicationServer.Caching.DataCacheException
Stack:
   at Microsoft.ApplicationServer.Caching.VelocityWindowsService.StartServiceCallback(System.Object)
   at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
   at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
   at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
   at System.Threading.ThreadPoolWorkQueue.Dispatch()

Event Xml:
...
You check the Distributed Cache service on the WFEs through Manage Services on Server in Central Administration and see that the service is running on all servers.  You then open a SharePoint Management Shell as administrator on one of the WFEs and run Use-CacheCluster and Get-CacheHost to check the status of the cache hosts.  The status shows the application server as UNKNOWN, the local server as UP, and the other WFE as UNKNOWN.  You then remote into the application server and repeat the PowerShell commands; this time the status is: application server UP and the two WFEs UNKNOWN.  The second WFE gives a similar result.
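Comparing Get-CacheHost results across servers is easier if you check them mechanically. The following minimal Python sketch parses pasted Get-CacheHost output and reports any host that is not shown as UP. It assumes a table layout of host:port, service name, and then service status in the first three columns. Treat that layout as an assumption and verify it against your own output. The function names are hypothetical.

```python
from typing import Dict, List


def cache_host_status(output: str) -> Dict[str, str]:
    """Map each cache host name to its Service Status column (upper-cased)."""
    status: Dict[str, str] = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        host, _, port = fields[0].rpartition(":")
        if not host or not port.isdigit():
            continue  # header, underline, or any line not starting with host:port
        status[host] = fields[2].upper()
    return status


def hosts_not_up(output: str) -> List[str]:
    """Hosts whose status is anything other than UP (for example UNKNOWN)."""
    return sorted(h for h, s in cache_host_status(output).items() if s != "UP")
```

Running this on the output captured on each server makes the asymmetry described above easy to see: every server reports itself as UP and the others as UNKNOWN.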

Solution
  1. Check the HOSTS file on each of the farm servers and verify that the host names in the file are fully qualified; replace any short host names with fully qualified domain names.
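One quick way to apply this check on every server is to scan each HOSTS file (C:\Windows\System32\drivers\etc\hosts) for names that have no domain suffix. The Python sketch below is a minimal, hypothetical helper. It treats any host name without a dot as not fully qualified, skips comments and localhost, and flags a short alias even when the same line also carries an FQDN. That last choice is a deliberately conservative reading of the step above.

```python
from typing import List, Tuple


def short_host_names(hosts_text: str) -> List[Tuple[str, str]]:
    """Return (address, name) pairs whose host name is not fully qualified."""
    findings: List[Tuple[str, str]] = []
    for raw_line in hosts_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        fields = line.split()
        if len(fields) < 2:
            continue  # blank line, or an address with no names
        address, names = fields[0], fields[1:]
        for name in names:
            if name.lower() == "localhost":
                continue
            if "." not in name.rstrip("."):
                findings.append((address, name))
    return findings
```

An empty result means every entry on that server uses a fully qualified name.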
References
  1. Cache Administration with Windows PowerShell (AppFabric 1.1)
  2. AppFabric 1.1 caching service crashes with System.UriFormatException: Invalid URI: The hostname could not be parsed
  3. AppFabric Event ID 1000 and Event ID 1026 with SharePoint 2013
  4. Raw error paste data on Pastebin by Anonymous
  5. AppFabric Caching and SharePoint: Concepts and Examples (Part 1)
Notes
  • The primary posting that helped solve this was [3]. Hat tip to the experts at Sterling International Consulting Group for identifying this one as it relates to static private networking.

SharePoint 2013: Critical Event 6398: The Execute method of job definition Microsoft.SharePoint.Administration.SPAppInstallationJobDefinition... threw an exception

Problem

You review the Application event log on one of your SharePoint Server 2013 farm web front end (WFE) servers and find the following critical event occurring every five minutes:
Log Name:      Application
Source:        Microsoft-SharePoint Products-SharePoint Foundation
Date:          [Date/Time]
Event ID:      6398
Task Category: Timer
Level:         Critical
Keywords:      
User:          [FarmAccount]
Computer:      [WFE1]
Description:
The Execute method of job definition 
Microsoft.SharePoint.Administration.SPAppInstallationJobDefinition 
(ID 16bfab57-4403-44da-9092-37f0f81fb32e) threw an exception. More 
information is included below.

Access to the path 'C:\ProgramData\Microsoft\SharePoint\AppInstallation'
is denied.
Event Xml:
...
Solution
  1. Open Windows Explorer, and navigate to C:\ProgramData\Microsoft.
  2. Right-click the SharePoint subfolder, and then choose Properties.
  3. Select the Security tab.
  4. Take note of any entries here that display only as SIDs.
  5. Look up each of these SIDs against both Active Directory and the local machine.
  6. Verify that the following two user groups are added and that they have been configured with the following permissions:
    1. WSS_ADMIN_WPG
      1. Full Control
    2. WSS_WPG
      1. Read & Execute
      2. List folder contents
      3. Read
  7. Remove any corrupted accounts or groups that appear here (they display as SIDs only).
  8. Reboot the server.
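You can cross-check steps 4, 6, and 7 from the command line: icacls "C:\ProgramData\Microsoft\SharePoint" prints the folder's ACL as text. The Python sketch below is a hypothetical helper that scans that text for entries shown as raw SIDs (S-1-...) and for the two required groups. It assumes that unresolved accounts appear in the output as SID strings, optionally prefixed with *. It only reports what it finds. It does not check permission levels or change anything.

```python
import re
from typing import List

# Assumption: unresolved accounts appear in the icacls text as raw SIDs,
# sometimes prefixed with "*", for example S-1-5-21-1111-2222-3333-1005.
SID_PATTERN = re.compile(r"(?<![\w-])\*?(S-1-\d+(?:-\d+)+)")
REQUIRED_GROUPS = ("WSS_ADMIN_WPG", "WSS_WPG")


def orphaned_sids(acl_text: str) -> List[str]:
    """Return SIDs that were not resolved to an account name, without duplicates."""
    return list(dict.fromkeys(SID_PATTERN.findall(acl_text)))


def missing_groups(acl_text: str) -> List[str]:
    """Return the required SharePoint groups that do not appear in the ACL text."""
    return [
        group for group in REQUIRED_GROUPS
        if not re.search(r"\b" + group + r"\b", acl_text, re.IGNORECASE)
    ]
```

The word-boundary match keeps WSS_WPG from being counted as present just because WSS_ADMIN_WPG is listed.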
References
Notes
  • I have found that performing a full reinstallation of SharePoint Server 2013 (including configuration) sometimes corrupts the local user groups that SharePoint configures: existing user groups are not properly cleaned up or overwritten during the reinstallation.
  • I have found that this critical event can be temporarily resolved by adding the (AppFabric) Distributed Cache service account to the local Administrators group on each server hosting the Distributed Cache service.  I had to reboot the server after doing this, or the new account privileges did not take effect.
  • On investigation, the SIDs that I found here did not match the current SIDs for the WSS_ADMIN_WPG and WSS_WPG user groups already configured on the WFEs (the SIDs differ from WFE to WFE because these are local user groups).  I think these orphaned SIDs may be earlier SIDs for the same user groups from a previous installation.
  • The farm service account is a member of the WSS_WPG and WSS_ADMIN_WPG local user groups.

Wednesday, May 14, 2014

SharePoint 2013: Unable to change topology when Generation controller is not active

Problem

You are attempting to modify the existing farm search topology by adding a second WFE and adding index and query components to this second WFE.  In your SharePoint Management Shell window, you clone the current enterprise search topology and make the appropriate modifications, but when you then try to activate it, you are presented with the following error:
Exception calling "Activate" with "0" argument(s): "Topology activation failed. Management called failed with System.InvalidOperationException: 'Unable to change topology when Generation controller is not active' at
   at Microsoft.Ceres.SearchCore.IndexController.IndexController.IsEmptyIndex(String indexSystemName)
   at Microsoft.Ceres.SearchCore.IndexController.IndexControllerManagementAgent.WrapCall[T](Func`2 original)"
At line:1 char:1
+ $Clone.Activate()
+ ~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [], MethodInvocationException
    + FullyQualifiedErrorId : SearchTopologyActivationException
You then check the Search Service Topology and note that the index component on the WFE currently hosting the index and query components is in a degraded state.  This is the only index component for the farm.  You then engage in troubleshooting.

Troubleshooting
  1. Action: attempted to reset index.
    • Result: would not complete.
  2. Action: attempted to delete Search Service Application
    • Result: would not complete.
  3. Action: performed the steps in reference [2], then removed and re-installed the Search Service Application.
    • Result: was able to perform a delete of the Search Service Application and then build a new search service application.
  4. Action: viewed search topology on Search Service Administration page.
    • Result: index partition no longer displays in degraded state.
Solution
  1. Close any index reset or Search Service Application deletion dialogs that still show that they are busy.
  2. If you are not there already, remote into the farm server hosting the search service.
  3. Stop the SharePoint Timer Service on this server.
  4. Open up Windows Explorer, and then navigate to C:\ProgramData\Microsoft\SharePoint\Config\.  You'll see one or more folders at this location named by a GUID.  Look for the most recent one. 
    Note: this posting assumes that the SharePoint 2013 binaries were installed to the default location (the C: drive).
  5. Open this folder.  You will see a number of XML files. 
  6. Delete all XML files in this folder only.
  7. Now look for the file cache.ini.  It should be the only one remaining.
  8. Open this file in a text editor.  It will contain a single six-digit number.
  9. Change this number to any other six-digit number.
  10. Save the file.
  11. Restart the SharePoint Timer Service.
  12. Now delete the Search Service Application.
  13. Rebuild the Search Service Application.
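Step 4 asks you to pick the most recent GUID-named folder, and step 6 deletes the XML files in it. A minimal Python sketch of both lookups follows. It is meant as a dry run before you delete anything. It assumes "most recent" means the most recently modified GUID-named folder that contains a cache.ini. The function names are hypothetical.

```python
import os
import re
from typing import List, Optional

GUID_NAME = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", re.IGNORECASE
)


def newest_config_cache_folder(config_root: str) -> Optional[str]:
    """Most recently modified GUID-named subfolder that contains cache.ini."""
    newest_path: Optional[str] = None
    newest_mtime = float("-inf")
    for name in os.listdir(config_root):
        path = os.path.join(config_root, name)
        if not (GUID_NAME.match(name) and os.path.isdir(path)):
            continue
        if not os.path.isfile(os.path.join(path, "cache.ini")):
            continue
        mtime = os.path.getmtime(path)
        if mtime > newest_mtime:
            newest_path, newest_mtime = path, mtime
    return newest_path


def xml_files(folder: str) -> List[str]:
    """XML files that step 6 would delete (cache.ini is never included)."""
    return sorted(name for name in os.listdir(folder) if name.lower().endswith(".xml"))
```

For example, newest_config_cache_folder(r"C:\ProgramData\Microsoft\SharePoint\Config") returns the folder to open in step 5, and xml_files on that result lists what step 6 would remove.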
References
  1. Create new Index Component and add it to a clone topology?
  2. SharePoint 2013 Unable to change topology when Generation controller is not active
  3. An update conflict has occurred, and you must re-try this action. The object SearchServiceApplication Name={FAST SSA} was updated by {account}, in OWSTIMER (5836) process, on machine {server name}.
  4. One or more servers is not responding (SharePoint Foundation 2010)
Notes
  • If your site displays an HTTP 500 Internal Server Error page after you perform these steps (specifically, stopping and starting the SharePoint Timer Service), check IIS to verify that all application pools are started on the WFE you were working on.

Tuesday, May 13, 2014

SharePoint 2013: My Site content is not being crawled

Problem

You have deployed a new Search Service Application and launched a full crawl of all content among all sites, including your farm My Site instance.  On reviewing the crawl log, you see the following warning associated with My Site crawl:
This item and all items under it will not be crawled because the owner has set the NoCrawl flag to prevent it from being searchable.

You're not sure where this setting is.

Solution
  1. In any browser, navigate to the root My Site URL.
  2. From the Settings gear icon (top right corner), click Site Settings.  The Site Settings page is displayed.
  3. In the Search group, look for Search and offline availability.
  4. Click this link.  The Search and Offline Availability page is displayed.
  5. For the Indexing Site Content setting, select Yes.
  6. Click OK.
  7. Start a new crawl.
References
Notes
  • This setting used to be located in the Site Administration group in SharePoint Server 2010.

SharePoint 2013: Built-in accounts are used as application pool or service identities

Problem

You find the following entry in the SharePoint 2013 Central Administration, Review problems and solutions, All Reports listing:

Title: Built-in accounts are used as application pool or service identities.
Severity: 2 - Warning
Category: Configuration
Explanation: Using built-in accounts like Network Service or Local System as application pool or as service identities is not supported in a farm configuration.  The following services are currently running as built-in identities on one or more servers: c2wts(Windows Service)
Remedy: Browse to .../_admin/FarmCredentialManagement.aspx and change the account used for the services listed in the explanation. For more information about this rule, see "http://go.microsoft.com/fwlink/?LinkID=142699".
Failing Servers:
Failing Services: SPTimerService (SPTimerV4)
Rule Settings: View
  
Solution
  1. Determine the SharePoint service account that you want to use as the identity for the Claims to Windows Token Service.
    1. This service account should be a domain account with standard user privileges. 
      In my environments, I generally use the services account (eg, spService) as the identity for C2WTS.
    2. If your environment is locked down, ensure that the network GPO is modified to grant the service account the Impersonate a client after authentication and Log on as a service local user rights assignments.
    3. Using the SharePoint Setup/Administrator account (eg, spadmin), launch Central Administration, and then navigate to Security > Configure service accounts.
    4. From the upper dropdown, select Windows Service - Claims to Windows Token Service.
    5. From the lower dropdown, select the desired service account.
    6. Click OK.
    7. Check the event logs on the server on which the service is started to ensure that this account reassignment did not generate any errors.
References
  1. Claims to Windows Token Service (c2WTS)
  2. c2wts - Windows Service account SharePoint Server 2010
  3. Built-in accounts are used as application pool or service identities (SharePoint Foundation 2010)
  4. How to configure the Claims to Windows Token Service (C2WTS) for SharePoint 2013
  5. Local Policy Settings
  6. SharePoint 2013: Service Account Configurations and Permissions
Notes
  • Remarkably, the SharePoint 2013 Health Analyzer rules reference does not include an entry for this issue, though the 2010 rules reference does.  A link to the appropriate rule reference for this issue is provided in the References section.
  • Reference 4 indicates that the C2WTS service account should have the Act as part of the operating system user right assignment in local policy.  However, in its Local Policy Settings reference, Microsoft states that,
    This user right is extremely powerful; anyone with this right can take complete control of the computer and erase virtually any evidence of their activities.

    Limit the Act as part of the operating system user right to as few accounts as possible—it should not even be assigned to the Administrators group under normal circumstances. When a service requires this user right, configure the service to log on by using the local System account, which has this user right inherently. Do not create a separate account and assign this user right to it.

    This user right is rarely needed by any accounts other than the local System account.
    Thus, it's not clear to me that the C2WTS service account actually requires this user right assignment.  The other two rights assignments, Impersonate a client after authentication and Log on as a service, are common assignments made to service accounts by the SharePoint Products Configuration Wizard, and thus I have no difficulty with these.  More conclusively, I have set a domain account to run C2WTS that does not have the Act as part of the operating system user right assignment, and the service ran just fine without generating any events on the server.

Monday, May 12, 2014

SharePoint 2013: verifying workflow setup returns HTTP 403 Forbidden error

Problem

You have installed and configured Workflow Manager 1.0 on the farm server hosting this application, and installed Workflow Manager clients on the web front end (WFE) servers.  You now attempt to verify correct setup.  You launch a browser and try to connect to the fully qualified domain name of the server followed by :12290/, as indicated.  You then receive an HTTP 403 Forbidden error.
Solution
  1. Launch IE as administrator, and then connect to the URL.
  2. Alternatively, first launch Central Administration, and then connect to the URL in the same or a different tab of that browser instance.
References
Notes
  • For this posting, Workflow Manager 1.0 was installed to a farm application server.
  • The farm is installed to Windows Server 2012 VMs.

SharePoint 2013: 500 Internal Server Error and Security Token Service Failure

Problem

You connect to the primary website for your customers and receive an HTTP 500 Internal Server Error response.
You check farm server logs, and, on one of the web front ends (WFE), you see application errors like the following:
Log Name:      Application
Source:        Microsoft-SharePoint Products-SharePoint Foundation
Date:          [date/time]
Event ID:      8306
Task Category: Claims Authentication
Level:         Error
Keywords:      
User:          [search or other service account]
Computer:      [name of problematic WFE]
Description:
An exception occurred when trying to issue security token: The server was 
unable to process the request due to an internal error.  For more 
information about the error, either turn on IncludeExceptionDetailInFaults 
(either from ServiceBehaviorAttribute or from the servicedebug configuration 
behavior) on the server in order to send the exception information back to 
the client, or turn on tracing as per the Microsoft .NET Framework SDK 
documentation and inspect the server trace logs..
Event Xml:
...
This error may feature various service accounts due to different farm services being affected.  You then connect to Central Administration, review the health report, and see the following rule violation:
The Security Token Service is not available
and the server triggering this rule violation is one of the farm's WFEs.

Solution
  1. Remote into the problematic WFE.
  2. Open IIS Manager, and then navigate to Application Pools.
  3. View the state of the SecurityTokenServiceApplicationPool.
    • If it is started, stop and restart it.
    • If it is stopped, start it.
  4. Open a command prompt as administrator.
  5. Run IISReset /NoForce.
  6. Connect to the primary customer site again.
References
Notes
  • About this posting: this posting consolidates my notes regarding troubleshooting and resolving one cause of an HTTP 500 error.  In this scenario, the HTTP 500 error is associated with an issue involving the Security Token Service.
  • Central Admin may not be affected: If you don't host CA on one of your WFEs, it won't be affected by this issue.  This is the case here, where the farm's Central Administration site is not hosted on one of the WFEs (it employs the traditional rather than streamlined topology).  This scenario helps demonstrate why you may want to choose the traditional topology over a streamlined one: had CA also been hosted on one of the WFEs, it too would have been affected by the SecurityTokenService issue and would not have been available to provide useful troubleshooting information.
  • What to do if it's not the Security Token Service: if you're reading this posting and your HTTP 500 Internal Server Error is not associated with a Security Token Service failure, then glance over this checklist:
    • In Services, verify SharePoint Timer Service (SPTimerV4) is running
    • In IIS, verify website is started
    • In IIS, verify SharePoint Web Services Root application pool is started.
    I have found that if the SharePoint Web Services Root application pool is stopped on one of the WFEs, this brings down the entire website, even though the WFEs are in an NLB configuration.  I don't understand why, and I haven't had the time to explore it further.
  • Some users see HTTP 500 but others can still access the site: farm topologies employing some form of network load balancing (NLB) will present seemingly inconsistent experiences to the administrator trying to troubleshoot.  This is because NLB routes some requests to the failing WFE and other requests to the remaining healthy WFEs.
  • PersistedNavigationTermSetSyncJobDefinition failure: if you are using managed navigation, and the security token service fails on one of the farm WFEs, you may see this error appearing in the server log at fairly regular intervals (eg, every 15 minutes or so):
    Log Name:      Application
    Source:        Microsoft-SharePoint Products-SharePoint Foundation
    Date:          [date/time]
    Event ID:      6398
    Task Category: Timer
    Level:         Critical
    Keywords:      
    User:          [farm service account]
    Computer:      [WFE on which security token service is failing]
    Description:
    The Execute method of job definition Microsoft.SharePoint.Publishing.Internal.
    PersistedNavigationTermSetSyncJobDefinition 
    (ID 5cfde201-6fd8-4ee3-8920-95a7d017a22f) 
    threw an exception. More information is included below.
    
    The server was unable to process the request due to an internal error.  
    For more information about the error, either turn on 
    IncludeExceptionDetailInFaults (either from ServiceBehaviorAttribute or 
    from the servicedebug configuration behavior) on the server in order 
    to send the exception information back to the client, or turn on tracing 
    as per the Microsoft .NET Framework SDK documentation and inspect the 
    server trace logs.
    Event Xml:
    
    and this one as well, though it may occur at varying times:
    Log Name:      Application
    Source:        Microsoft-SharePoint Products-SharePoint Server
    Date:          [date/time]
    Event ID:      8088
    Task Category: Taxonomy
    Level:         Warning
    Keywords:      
    User:          NT AUTHORITY\IUSR
    Computer:      [problematic WFE]
    Description:
    The Managed Metadata Service 'Managed Metadata Service' is inaccessible.
    Event Xml:
    ...
    
  • SharePoint Check: I developed a simple PowerShell script that interrogates farm services, IIS features, and other services and lists the results in the shell.  One thing it returns is the state of the IIS application pools.
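The application pool part of that check can be sketched independently of that script, which is not reproduced here. The hypothetical Python helper below parses the output of %windir%\system32\inetsrv\appcmd.exe list apppool and reports which of the pools discussed in this posting are not started. It assumes appcmd's usual one-line-per-pool format, APPPOOL "name" (MgdVersion:...,state:Started). Verify that format against your own server.

```python
import re
from typing import Dict, List, Sequence

# Assumed line shape from "appcmd list apppool", for example:
# APPPOOL "DefaultAppPool" (MgdVersion:v4.0,MgdMode:Integrated,state:Started)
POOL_LINE = re.compile(r'^APPPOOL "(?P<name>[^"]+)" \((?P<attrs>[^)]*)\)')

# Pools this posting singles out; extend the tuple as needed.
WATCHED_POOLS = ("SecurityTokenServiceApplicationPool", "SharePoint Web Services Root")


def pool_states(appcmd_output: str) -> Dict[str, str]:
    """Map each application pool name to its reported state."""
    states: Dict[str, str] = {}
    for line in appcmd_output.splitlines():
        match = POOL_LINE.match(line.strip())
        if not match:
            continue
        attrs = dict(
            part.split(":", 1) for part in match.group("attrs").split(",") if ":" in part
        )
        states[match.group("name")] = attrs.get("state", "Unknown")
    return states


def pools_not_started(appcmd_output: str,
                      watched: Sequence[str] = WATCHED_POOLS) -> List[str]:
    """Watched pools that are missing from the output or not in the Started state."""
    states = pool_states(appcmd_output)
    return [pool for pool in watched if states.get(pool) != "Started"]
```

A watched pool that does not appear in the output at all is reported too, because a missing pool on a WFE is as suspicious as a stopped one.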