Infuriating two-node cluster issue!

Soldato | Joined 7 May 2004 | Posts 5,503 | Location: Naked and afraid
Infuriating because it WAS working and now it has decided it doesn’t want to!

We’ve had a simple Windows 2000 cluster running on two IBM Blades for over a year now. Last Friday it decided it didn’t want to speak to its neighbour over the heartbeat any more.

Event Type: Warning
Event Source: ClusSvc
Event Category: (16)
Event ID: 1123
Date: 22/01/2010
Time: 17:37:12
User: N/A
Computer: #####01
Description:
The node lost communication with cluster node '#####02' on network 'Private'.

Event Type: Warning
Event Source: ClusSvc
Event Category: (16)
Event ID: 1135
Date: 22/01/2010
Time: 17:37:28
User: N/A
Computer: #####01
Description:
Cluster node #####02 was removed from the active cluster membership. The Clustering Service may have been stopped on the node, the node may have failed, or the node may have lost communication with the other active cluster nodes.


Event Type: Error
Event Source: ClusSvc
Event Category: (16)
Event ID: 1108
Date: 22/01/2010
Time: 17:40:39
User: N/A
Computer: #####01
Description:
The join of node #####02 to the cluster timed out and was aborted.
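If you want to line these events up against something else (the switch logs, for example), it can help to pull them out of the pasted Event Viewer text and work out the gaps between them. A rough Python sketch, assuming the events are pasted in exactly the "Field: value" layout above with the description on the line after "Description:"; `parse_events` and `timeline` are just illustrative helper names, not part of any Windows tool:

```python
from __future__ import annotations

from datetime import datetime


def parse_events(text: str) -> list[dict[str, str]]:
    """Split a pasted Event Viewer dump into one dict per event.

    Assumes each event starts with an 'Event Type:' line, every other field is
    'Name: value', and the description sits on the line(s) after 'Description:'.
    """
    events: list[dict[str, str]] = []
    current: dict[str, str] | None = None
    in_description = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Event Type:"):
            # A new event begins; stop appending to the previous description.
            current = {}
            events.append(current)
            in_description = False
        if current is None or not line:
            continue
        if in_description:
            # Everything after 'Description:' belongs to the description text.
            current["Description"] = (current.get("Description", "") + " " + line).strip()
            continue
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            current[key.strip()] = value.strip()
            in_description = key.strip() == "Description"
    return events


def timeline(events: list[dict[str, str]]) -> list[tuple[int, float]]:
    """Return (event ID, seconds since the first event) for each event."""
    start = None
    result = []
    for event in events:
        # Dates in the paste are day/month/year (UK format).
        stamp = datetime.strptime(f"{event['Date']} {event['Time']}", "%d/%m/%Y %H:%M:%S")
        if start is None:
            start = stamp
        result.append((int(event["Event ID"]), (stamp - start).total_seconds()))
    return result
```

On the three events above this gives 1123 at 0 s, 1135 at 16 s and 1108 at 207 s: the node was evicted 16 seconds after the lost communication was logged, and the join attempt timed out just over three minutes after that.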

Nothing was changed!

We tried removing and re-adding the second node, but it failed, saying it couldn’t communicate with the first. We checked comms and everything is OK, with no failures.
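A one-off ping can pass while the heartbeat network still drops out for a few seconds at a time, and in the log above the node was evicted only 16 seconds after communication was lost. One way to catch that kind of intermittent loss is to leave a probe running over the private network for a few hours. A rough Python sketch, assuming Python can be run on both nodes; the helper names and the port you pass in are hypothetical, so pick a port the Cluster service itself doesn't use and that no firewall blocks:

```python
from __future__ import annotations

import socket
import time


def echo_responder(port: int) -> None:
    """Run on the far node: bounce every UDP datagram straight back to its sender."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("", port))
        while True:
            data, addr = sock.recvfrom(64)
            sock.sendto(data, addr)


def probe(host: str, port: int, count: int,
          interval: float = 1.0, timeout: float = 0.5) -> list[bool]:
    """Run on the near node: send numbered datagrams and record which were echoed back."""
    results = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for seq in range(count):
            payload = str(seq).encode()
            sock.sendto(payload, (host, port))
            deadline = time.monotonic() + timeout
            answered = False
            # Skip late replies to earlier probes until ours arrives or we time out.
            while not answered:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                sock.settimeout(remaining)
                try:
                    data, _ = sock.recvfrom(64)
                except (socket.timeout, ConnectionResetError):
                    # On Windows an unreachable responder can show up as a reset.
                    break
                answered = data == payload
            results.append(answered)
            time.sleep(interval)
    return results


def longest_outage(results: list[bool]) -> int:
    """Length of the longest run of consecutive unanswered probes."""
    longest = run = 0
    for ok in results:
        run = 0 if ok else run + 1
        longest = max(longest, run)
    return longest
```

For example, start `echo_responder(port)` on #####02, run `probe(heartbeat_ip, port, count=3600)` on #####01 for an hour, then pass the result to `longest_outage`. With the default one-second interval, a run of ten or more unanswered probes would point at the heartbeat path (NIC, cable or switch port) rather than at the cluster software.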

We changed the IP of the second node’s heartbeat, uninstalled the Cluster service and reinstalled it, which oddly worked.

Then, literally four hours later, the node dropped out again, and the same ‘fix’ didn’t work this time.

Any ideas at all as to what this could be? A failing NIC, perhaps? There are no errors to that effect, and the Cisco switch in the Blade Chassis doesn’t report any comms errors, so it’d be hard to diagnose for sure!
 
Soldato | Joined 17 Oct 2002 | Posts 3,941 | Location: West Midlands
I would assume that, because it's been running for so long, the switch configuration hasn't changed, i.e. speed and duplex settings, spanning tree disabled per port, correct VLANs, no port security set on the ports?

I've had several occasions where servers were configured to use a specific VLAN for server-to-server communication, only for someone to delete said VLAN from the VTP database.
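If it helps to work through that checklist systematically for each heartbeat port, here is a rough Python sketch. It assumes you copy the settings by hand from the switch config and the NIC properties; `PortSettings`, `audit_port` and the example port names are hypothetical, and nothing here talks to the switch:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PortSettings:
    """Hypothetical record of one heartbeat switch port, copied by hand from the switch config."""
    name: str
    speed: str            # e.g. "1000" or "auto"
    duplex: str           # e.g. "full" or "auto"
    spanning_tree: bool   # True if spanning tree is running on this port
    vlan: int
    port_security: bool


def audit_port(port: PortSettings, nic_speed: str, nic_duplex: str,
               expected_vlan: int, vlan_database: set[int]) -> list[str]:
    """Return a list of problems found; an empty list means the port passed every check."""
    problems = []
    # Speed/duplex must match what the NIC is set to on the server side.
    if (port.speed, port.duplex) != (nic_speed, nic_duplex):
        problems.append(
            f"{port.name}: switch is {port.speed}/{port.duplex}, NIC is {nic_speed}/{nic_duplex}"
        )
    if port.spanning_tree:
        problems.append(f"{port.name}: spanning tree is enabled on the port")
    if port.vlan != expected_vlan:
        problems.append(f"{port.name}: in VLAN {port.vlan}, expected {expected_vlan}")
    # The VLAN can be correct on the port but deleted from the VLAN/VTP database.
    if expected_vlan not in vlan_database:
        problems.append(f"VLAN {expected_vlan} is missing from the VLAN database")
    if port.port_security:
        problems.append(f"{port.name}: port security is enabled")
    return problems
```

Run it once per node's heartbeat port. The separate VLAN database check covers the case above, where the port config looks right but the VLAN itself has been removed.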
 