Tuesday, April 26, 2016

SharePoint 2013: SQL Server: Status: Warning: One or more tasks failed

Problem

You have configured regular maintenance plans for your SharePoint 2013 farm's SQL Server backend, including full, differential and transaction, and these plans generate reports on completion. Reviewing the differential backup report, you see the following:
Microsoft(R) Server Maintenance Utility (Unicode) Version 11.0.6020
Report was generated on "[Server name]\[SQL Server named instance]".
Maintenance Plan: User Database Differential backup MP
Duration: 00:00:02
Status: Warning: One or more tasks failed.
Details:
Back Up Database (Differential) ([Server name]\[SQL Server named instance])
Backup Database on Local server connection
Databases that have a compatibility level of 70 (SQL Server version 7.0) will be skipped.
Databases: All user databases
Type: Differential
Append existing
Task start: [beginning time/date stamp].
Task end: [ending time/date stamp].
Failed:(-1073548784) Executing the query "BACKUP DATABASE [a-farm-database..." failed with the following error: "Cannot perform a differential backup for database "[a-farm-database]", because a current database backup does not exist. Perform a full database backup by reissuing BACKUP DATABASE, omitting the WITH DIFFERENTIAL option.
BACKUP DATABASE is terminating abnormally.". Possible failure reasons: Problems with the query, "ResultSet" property not set correctly, parameters not set correctly, or connection not established correctly.
The failure of the differential backups to complete presented a database recovery inconsistency problem.  At present, during any given week, farm databases could only be recovered up to the most recent successful differential and transaction logs that completed prior to the failed backup.  This presented a potential increase in recovery point window.

Troubleshooting

1) Checking the server's application log, you see a bunch of 3041 events - one for each of the farm databases:
Log Name:      Application
Source:        MSSQL$[instance name]
Date:          [time/date stamp]
Event ID:      3041
Task Category: Backup
Level:         Error
Keywords:      Classic
User:          DOMAIN\[sql server service account]
Computer:      [SQL Server host name]
Description:
BACKUP failed to complete the command BACKUP DATABASE [a-farm-database] 
WITH DIFFERENTIAL. Check the backup application log for detailed messages.
Event Xml:
...
2) Checking  SQL Server logs, you see events similar to the server Application Log event noted above.  You also find that backups were completed successfully, just before and immediately after the backups that failed.  For example, the evening differential that was scheduled to begin at 8 AM failed.  However, transaction log backups performed just before the evening differential, at 7 PM, and the first transaction log backup of the following day, at 7 AM, both completed successfully.  Filtering by the event ID, you find that there have been other instances of event 3041 occurring apparently randomly over many months.

3) In discussions with systems administrators, you find that there were no network issues occurring during the time period for the differential backups - and, in any case, any network issues would have affected communications among the farm servers, which would have presented as other error events occurring in server event logs.  No such events were found in server event logs for the period in question.

4) Also in these discussions, systems administrators inform you that Data Protection Manager is scheduled to run production farm server backups at 9 PM, Monday through Saturday.  Correlating 3041 events against a calendar, you verify that 3041 events occurred only on weekdays and Saturday, and that no 3041 events occurred on Sundays. System administrators noted however that DPM backups, though they may be scheduled to run at a certain time, do not always actually run at that time and that they may occur any time in a window around their scheduled time.  This suggested that, though SQL Server database differentials were scheduled to run at 8 PM, and DPM backups were scheduled to run at 9 PM, there could still be some overlapped that might occur randomly on any given day.

5) In discussions with the database administrator, the DBA informs you that he reviewed the MSDB.backupmediafamily table and found entries in this table that:
  a) were identified by GUID and not file path and name and that
  b) correlated with the appearance of 3041 events.
Apparently, DPM backups were sometimes being logged to the production farm SQL Server.  When they did, they would break the path from the differential back to the full backup, and thus the differential would fail.  What was puzzling was the random nature of the DPM entries in the MSDB: some days they were entered and some days there were not.

6) After further discussion with the DBA, it was agreed to move database backup maintenance plan scheduling to two hours after DPM backup scheduling, and then monitor.  Over the next two weeks, no further backup failures occured and the problem was considered resolved.
   

Solution

  • Schedule DPM and SQL Server backups sufficiently apart so that no overlap can occur.

References

  • Farm servers virtualized on HyperV 2012.  Microsoft Data Protection Manager (DPM) runs nightly, backing up all production farm servers.
  • Single SQL Server instance.
  • SQL Server database backup maintenance plans configured.

No comments: