
[SOLVED] DPM 2010 several 5121 events causing cluster storage to crash (storage signature change event 1034) HW VSS

Hi all,

I have a 3-node Hyper-V R2 cluster (Node Majority quorum model) with all Microsoft updates installed. The nodes are:

node1: Windows 2008 R2 Datacenter Core, Service Pack 1, Cluster Version 6.1.7601
node2: Windows 2008 R2 Datacenter Core, Service Pack 1, Cluster Version 6.1.7601
node3: Windows 2008 R2 Datacenter Core, Service Pack 1, Cluster Version 6.1.7601

These 3 nodes are connected to an EqualLogic PS6010XV array (iSCSI SAN) through 10 GbE cards, as shown below:

node1,2,3 |----- 10 Gb Card 1 ----- Dell 8024F Switch 1 -----|--------------|
          |                                                  |   PS6010XV   |
          |----- 10 Gb Card 2 ----- Dell 8024F Switch 2 -----|--------------|

All three nodes have the Dell HitKit installed (the HitKit provides hardware VSS snapshots):

C:\Users\administrator>vssadmin list providers
vssadmin 1.1 - Volume Shadow Copy Service administrative command-line tool
(C) Copyright 2001-2005 Microsoft Corp.

Provider name: 'Microsoft Software Shadow Copy provider 1.0'
   Provider type: System
   Provider Id: {b5946137-7b9f-4925-af80-51abd60b20d5}
   Version: 1.0.0.7

Provider name: 'Dell EqualLogic VSS HW Provider'
   Provider type: Hardware
   Provider Id: {d4689bdf-7b60-4f6e-9afb-2d13c01b12ea}
   Version: 3.4.2.5386
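
To confirm on every node that the hardware provider is registered, the output above can be checked with a small script. This is only a minimal sketch: `parse_providers` and `has_hardware_provider` are hypothetical helper names, the sample text is copied from the output above (with English labels), and fields are read by position rather than by label so that localized output should parse as well. The check assumes the provider type is reported as the literal string "Hardware", as it is here.

```python
import re

# Sample copied from the `vssadmin list providers` output in this post.
SAMPLE = """\
vssadmin 1.1 - Volume Shadow Copy Service administrative command-line tool
(C) Copyright 2001-2005 Microsoft Corp.

Provider name: 'Microsoft Software Shadow Copy provider 1.0'
   Provider type: System
   Provider Id: {b5946137-7b9f-4925-af80-51abd60b20d5}
   Version: 1.0.0.7

Provider name: 'Dell EqualLogic VSS HW Provider'
   Provider type: Hardware
   Provider Id: {d4689bdf-7b60-4f6e-9afb-2d13c01b12ea}
   Version: 3.4.2.5386
"""


def parse_providers(text):
    """Return one dict per provider block found in the vssadmin output.

    A block starts with an unindented line holding the quoted provider
    name; the indented lines after it are read in order as type, id
    and version (labels are ignored, so the locale does not matter).
    """
    blocks = []
    current = None
    for line in text.splitlines():
        quoted = re.search(r"'([^']+)'", line)
        if quoted and not line.startswith(" "):
            current = {"name": quoted.group(1), "values": []}
            blocks.append(current)
        elif current is not None and line.startswith(" ") and ":" in line:
            current["values"].append(line.split(":", 1)[1].strip())

    providers = []
    for block in blocks:
        values = block["values"] + [None, None, None]  # pad missing fields
        providers.append({
            "name": block["name"],
            "type": values[0],
            "id": values[1],
            "version": values[2],
        })
    return providers


def has_hardware_provider(providers):
    """True if at least one provider reports the type 'Hardware'."""
    return any(p["type"] == "Hardware" for p in providers)


if __name__ == "__main__":
    for provider in parse_providers(SAMPLE):
        print(provider["name"], "|", provider["type"], "|", provider["version"])
```

Running the same check against the captured output of each node would show quickly whether one node is missing the provider or runs a different HitKit version.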

Microsoft DPM 2010 3.0.7707.0 (Rollup 1) is installed on a separate physical machine and connects to the 3-node cluster over a different network (the public network).

I've configured DPM to perform host-level backups with the hardware VSS provider, following the article "How to use Hardware VSS Writers for DPM Hyper-V Backup", and set the DPM MaxAllowedParallellBackups parameter to 1. I don't know why DPM doesn't respect this parameter: I always see 3 simultaneous backups, never just one.

I created a protection group for all virtual machines with an Express Full backup every afternoon. Everything ran as expected, except for several 5121 events in the cluster logs. Searching the forums, I found that this event is fairly normal, since I/O is redirected for some time during a hardware VSS snapshot. I can also see the snapshots being created in the EqualLogic console, each with its own owner (node1, node2, node3).

What really bothers me is that the host-level backup works fine for a while (with several 5121 events), but then the CSV volume suddenly crashes with a 1069 event followed by a 1034 event, which puts all 13 virtual machines into a failed state :-(. Below are screenshots of the problem:

[Screenshot: event 1069]

[Screenshot: event 1034]
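
To see how the 5121 events relate to each crash, the event timestamps can be exported from the cluster logs and correlated. The sketch below is only an illustration: `summarize_failures` is a hypothetical helper, the 10-minute window is an arbitrary assumption, and the timestamps in `SAMPLE_EVENTS` are made up for the example, not taken from my logs.

```python
from datetime import datetime, timedelta

# Made-up sample data: (timestamp, event id). Not real log entries.
SAMPLE_EVENTS = [
    (datetime(2011, 1, 1, 14, 0, 5), 5121),
    (datetime(2011, 1, 1, 14, 0, 40), 5121),
    (datetime(2011, 1, 1, 14, 1, 10), 5121),
    (datetime(2011, 1, 1, 14, 3, 0), 1069),
    (datetime(2011, 1, 1, 14, 3, 2), 1034),
]


def summarize_failures(events, window=timedelta(minutes=10)):
    """For every 1069 event, count the 5121 events inside the preceding
    window and note whether a 1034 event follows within the same span."""
    events = sorted(events)
    report = []
    for ts, event_id in events:
        if event_id != 1069:
            continue
        preceding = sum(
            1 for t, e in events if e == 5121 and ts - window <= t <= ts
        )
        followed = any(
            e == 1034 and ts <= t <= ts + window for t, e in events
        )
        report.append({
            "time": ts,
            "preceding_5121": preceding,
            "followed_by_1034": followed,
        })
    return report


if __name__ == "__main__":
    for entry in summarize_failures(SAMPLE_EVENTS):
        print(entry)
```

If crashes only show up after a burst of 5121 events from overlapping snapshots, that would point at the parallel backups rather than at a single snapshot.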

To recover from this problem I took the following steps:

1. I removed the storage from the cluster (Failover Cluster Manager), because there was no way to bring it online; it kept reporting a failure.
2. On each cluster node, I used the iscsicpl command to make the node recognize the volume again.
3. On the cluster node that owned the CSV disk, I brought the disk online.
4. After step 3 the cluster recognized the storage, so I took the disk offline again.
5. I added the 1.5 TB non-empty volume back to the cluster and added it to CSV.
6. I started all virtual machines.

I know this shouldn't happen, but I have no clue where the problem could be. Maybe the cluster nodes need a KB hotfix, maybe a newer version of the EqualLogic HitKit (hardware VSS), or maybe a storage firmware upgrade (the array is relatively new).

Could someone help me solve or pinpoint this problem?

Thank you very much !!!
Marcos Hass Wakamatsu




