Saturday, January 18, 2014

SharePoint 2010 Health Analyzer: The timer service failed to recycle

Problem

You find the following entry in the SharePoint 2010 Central Administration Review problems and solutions listing:

Title: The timer service failed to recycle.
Severity: 2 - Warning
Category: Performance
Explanation: The last attempt to recycle the timer service failed as have most of the other attempts during the past week. Recycling typically fails because other timer jobs are running when the recycle is scheduled. To view which jobs blocked the recycle view the history for the recycle job and click on the failed status link for more information. The error message for the failed job entry will contain a list of jobs that were still running. The history for the recycle job can be found at: [path to timer job history for the associated timer job]
Remedy: Change the schedule for the timer recycle job so that it does not conflict with other long-running timer jobs. This can be done from the central administration site at [path to timer job history for the associated timer job]. For more information about this rule, see "http://go.microsoft.com/fwlink/?LinkID=142615".
Failing Servers: [server name list]
Failing Services: SPTimerService (SPTimerV4)
Rule Settings: View
 
Troubleshooting
  1. Navigate to the timer job history and see the list of instances that failed. 
  2. Click the Failed status link. The following error message appears: The timer service was not recycled because the following jobs were still running: Microsoft SharePoint Foundation Usage Data Import.
  3. Central Administration (CA) > Monitoring > Timer Job Status, Job Definition: Timer Service Recycle, View: Job Definition
    1. Scheduled to run daily at 6 PM on all farm servers
    2. Duration: 00:10:30.
  4. CA > Monitoring > Timer Job Status, Job Definition: Microsoft SharePoint Foundation Usage Data Import, View: Job Definition
    1. Scheduled to run every 30 minutes on all farm servers
    2. Duration: varies between 6 and 9 hours
    3. Progress has been stuck at 0% for all farm servers for many hours.
    4. Reviewing history: when first started, completed in 2-3 hours.  Now, two weeks later, completes in 6-9 hours.  Steady increases in duration logged over this period.
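The findings above explain why simply rescheduling the recycle job would not help here: a job that starts every 30 minutes and runs for 6 to 9 hours leaves no free slot. The following is a minimal Python sketch of that overlap check, not anything SharePoint itself provides. The function name `blocking_jobs`, the dates, and the start time are hypothetical values chosen for illustration; only the job name, the 6 PM recycle time, and the duration range come from the observations above.

```python
from datetime import datetime, timedelta

def blocking_jobs(recycle_at, running_jobs):
    """Return the names of jobs whose run window covers the recycle time.

    running_jobs is a list of (name, start, duration) tuples.
    """
    return [
        name
        for name, start, duration in running_jobs
        if start <= recycle_at < start + duration
    ]

# Illustrative values only: recycle at 6 PM, one usage import instance
# that started at noon and ran for 6.5 hours (within the observed 6-9 hours).
recycle_at = datetime(2014, 1, 17, 18, 0)
jobs = [
    ("Microsoft SharePoint Foundation Usage Data Import",
     datetime(2014, 1, 17, 12, 0), timedelta(hours=6, minutes=30)),
]

print(blocking_jobs(recycle_at, jobs))
```

With a job duration far longer than its 30-minute schedule interval, any recycle time chosen would fall inside some instance's run window, which is consistent with the recycle failing on most attempts.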
Solution
  1. At the next maintenance window, rebooted the servers (for other maintenance as well). This effectively restarted the SharePoint 2010 Timer service.
  2. By then, durations had reached 11+ hours for Microsoft SharePoint Foundation Usage Data Import jobs on all servers.
  3. Reviewed job history for Microsoft SharePoint Foundation Usage Data Import:
    1. Durations dropped from over 11 hours down to minutes.
      1. Application Server: few minutes
      2. WFE1: several minutes
      3. WFE2: 10+ minutes.
    2. Difference between WFEs is interesting.  Will need to research this.
  4. Manually started Timer Service Recycle job
  5. Checked Timer Service Recycle job history several hours later:
    1. Succeeded for all servers.
  6. Reopened issue, and then clicked Reanalyze Now.
  7. Checked report a few minutes later:
    1. Issue gone.
This is only a temporary solution until I can get the machines patched through the December 2013 CU, which, according to one of the references below, is one solution to this problem.
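Because the restart is only a stopgap, it may be useful to watch the usage import durations for the same steady growth seen earlier. Below is a small Python sketch of such a check, assuming the durations have been copied by hand from the job history into a list of hours. The function names, the sample values, and the 2x threshold are assumptions made for illustration, not SharePoint settings.

```python
def duration_growth(durations_hours):
    """Ratio of the most recent duration to the first recorded one."""
    if len(durations_hours) < 2 or durations_hours[0] <= 0:
        raise ValueError("need at least two samples and a positive baseline")
    return durations_hours[-1] / durations_hours[0]

def needs_attention(durations_hours, max_ratio=2.0):
    """True when the latest run took at least max_ratio times the first run.

    max_ratio is an arbitrary threshold picked for this sketch.
    """
    return duration_growth(durations_hours) >= max_ratio

# Illustrative history in hours, loosely following the growth described
# in the troubleshooting notes (2-3 hours at first, 11+ hours later).
history = [2.5, 4.0, 6.0, 9.0, 11.0]

print(needs_attention(history))
```

A check like this only signals that durations are climbing again; it does not address the underlying cause.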

Notes
  • Server OS: Windows 2008 R2
  • SharePoint Farm patch level: 14.0.6123.5000
  • Verified that KB2775511 is not installed on the server
References

4 comments:

Anonymous said...

We have the exact same problem. Did you install the Dec 2013 CU, and if so, did it remedy the problem? Also, did the CU install without issue?

Al said...

We're in the process of migrating to 2013, so non-catastrophic 2010 issues have low priority. Once the migration's completed, I'll have the time to go back to the 2010 farm (which we'll keep for a while) and explore some of these issues just for my own interest.

Anonymous said...

So how is your progress exploring this issue? Does installing the Dec 2013 CU help?

Al said...

I have performed several test farm deployments on Windows Server 2012 VMs, each time updating the binaries through the December 2013 CU. So far, I have not had the timer recycle issue appear once for any test farm deployment. Additionally, my 2013 dev farm (also updated through December 2013 CU) has been up since February 2014 and has not presented this issue to date. Thus, it would appear that the December 2013 CU resolved the timer recycle issue.