We have a 5-node cluster. All nodes are running fully patched Windows Server 2012 Datacenter (including hotfixes KB2813630 and KB2796995). Storage is EqualLogic running firmware 6.0.2. All nodes have EqualLogic HIT 4.5 installed, and we are using the hardware provider. We have two 3 TB thin-provisioned CSVs set up. One is not in use. The other currently contains the first 14 VMs that have been moved from our existing stand-alone Windows Server 2008 R2 SP1 Hyper-V servers. Only 5 of the 14 VMs are being backed up. Protection was stopped and restarted for the move, and the required consistency check was performed after the move.

The DPM server is a physical server running SCDPM 2012 SP1 RU2. All Hyper-V servers have had their agents updated after RU2. The SCDPM server has a single protection group set up for all Hyper-V servers (the legacy 2008 R2 servers and the 2012 cluster). All backups are succeeding on the legacy servers, which are running the same EqualLogic HIT version and are storing their VMs on the same SAN.

Overnight, some backups fail and others succeed. When I try to fix the failed ones the next day, they sometimes fail again, even if I tell DPM to resume backups on one VM at a time. I can see the hardware snapshots being created on the SAN, and the SAN doesn't report any errors. SCDPM fails and reports the following:
Type: Recovery point
Status: Failed
Description: The VSS application writer or the VSS provider is in a bad state. Either it was already in a bad state or it entered a bad state during the current operation. (ID 30111 Details: VssError:A function call was made when the object was in an incorrect state for that function (0x80042301))
End time: 4/23/2013 3:37:09 PM
Start time: 4/23/2013 3:34:44 PM
Time elapsed: 00:02:25
Data transferred: 0 MB
Cluster node: xxxxx.xxxx.xxx
Recovery Point Type: Express Full
Source details: \Backup Using Child Partition Snapshot\vm1
Protection group: Hyper-V VMs - Daily
It leaves the Microsoft Hyper-V VSS Writer in a failed state with a Timed Out error. All other VSS writers are fine. I am also intermittently seeing the following in the Application log on some nodes, but only when backups fail:
Event: 12363
Source: VSS
An expected hidden volume arrival did not complete because this LUN was not detected.
LUN ID {350f0b61-0244-4708-abab-a413fb710e7b}
Version 0x0000000000000001
Device Type 0x0000000000000000
Device TypeModifier 0x0000000000000000
Command Queueing 0x0000000000000001
Bus Type 0x0000000000000009
Vendor Id EQLOGIC
Product Id 100E-00
Product Revision 6.0
Serial Number 6090A0881074D4686E17059B9F4365CA
Storage Identifiers
Version 16
Identifier Count 2
Identifier 0
CodeSet "VDSStorageIdCodeSetBinary" (1)
Type "VDSStorageIdTypeFCPHName" (3)
Byte Count 16
60 90 A0 88 10 74 D4 68 6E 17 05 9B 9F 43 65 CA `....t.hn....Ce.
Identifier 1
CodeSet "VDSStorageIdCodeSetBinary" (1)
Type "VDSStorageIdTypeVendorSpecific" (0)
Byte Count 16
01 00 00 00 1F BF 0E 6A 00 00 00 3F 00 00 10 54 .......j...?...T
Operation:
Exposing Volumes
Locating shadow-copy LUNs
PostSnapshot Event
Executing Asynchronous Operation
Context:
Execution Context: Provider
Provider Name: Dell EqualLogic VSS HW Provider
Provider Version: 4.5.0
Provider ID: {d4689bdf-7b60-4f6e-9afb-2d13c01b12ea}
Current State: DoSnapshotSet
Event: 8194
Volume Shadow Copy Service error: Unexpected error querying for the IVssWriterCallback interface. hr = 0x80070005, Access is denied. This is often caused by incorrect security settings in either the writer or requestor process.
Operation:
Gathering Writer Data
Context:
Writer Class Id: {e8132975-6f93-4464-a53e-1050253ae220}
Writer Name: System Writer
Writer Instance ID: {d70791b2-f0fe-416e-bbea-e631878ee313}
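
For anyone comparing the two error codes quoted above (0x80042301 from DPM and 0x80070005 from event 8194), here is a minimal Python sketch that splits an HRESULT into its standard severity, facility, and code fields. It is only an illustration and not part of DPM or the EqualLogic tooling: the names `decode_hresult` and `FACILITY_NAMES` are made up for this example, and the facility table lists only the two facilities that appear in this post.

```python
# Hypothetical helper for illustration only: splits a 32-bit HRESULT into
# the fields defined by the standard HRESULT bit layout.

# Only the two facilities seen in this post are listed (assumption: partial table).
FACILITY_NAMES = {
    4: "FACILITY_ITF",    # interface-specific codes, e.g. VSS errors
    7: "FACILITY_WIN32",  # wrapped Win32 error codes
}


def decode_hresult(hr: int) -> dict:
    """Return the severity, facility, and code parts of an HRESULT."""
    hr &= 0xFFFFFFFF  # also accept values logged as negative signed integers
    facility = (hr >> 16) & 0x7FF  # facility is the 11 bits above the code
    return {
        "failure": bool(hr >> 31),  # top bit set means failure
        "facility": facility,
        "facility_name": FACILITY_NAMES.get(facility, "unknown"),
        "code": hr & 0xFFFF,  # low 16 bits
    }


if __name__ == "__main__":
    for value in (0x80042301, 0x80070005):
        fields = decode_hresult(value)
        print(
            f"0x{value:08X}: failure={fields['failure']} "
            f"facility={fields['facility_name']} code=0x{fields['code']:04X}"
        )
```

Decoded this way, 0x80070005 is Win32 error 5 (access denied), which matches the text of event 8194, while 0x80042301 is an interface-specific VSS code (VSS_E_BAD_STATE), which matches the "bad state" wording in the DPM description. So the two codes point at different kinds of failure rather than being the same error reported twice.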