URGENT - RAID Failure

Associate
Joined
18 Nov 2003
Posts
1,311
Location
Newcastle
Hi all,

Came in this morning to find one of the drives in our RAID 5 array throwing errors. We're replacing the drive, but the server is failing to boot and reports a RAID config mismatch. Is it safe for us to create a new config but KEEP all the data on the drives?

Any help is much appreciated.

Cheers
 
Associate
Joined
13 Oct 2009
Posts
238
Location
Cumbria
Are they hot-pluggable SCSI/SAS disks?
I'd probably remove the spare disk and boot the server without it. Then, on a spare machine, I'd initialise the spare disk to wipe it (via the RAID BIOS menu) and put it back in to rebuild the RAID 5 array.
It sounds like the RAID card has found existing array configs on both your faulty RAID 5 array and your spare disk. I think it's good practice to keep your spare disks completely blank.
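If you want to sanity-check whether a spare really is blank before it goes in, here's a rough read-only sketch in Python. It assumes a Linux box that can see the disk as a raw device (the /dev/sdb path is just a placeholder) and only looks for non-zero bytes near the start and end of the disk, where controllers commonly keep their metadata; it doesn't write anything.

#!/usr/bin/env python3
"""Read-only check for leftover data on a spare disk.

Rough sketch: assumes a Linux host and a raw device path such as
/dev/sdb (placeholder). RAID metadata usually lives near the start
or end of the disk, so non-zero bytes in those regions suggest the
spare isn't blank. Needs root to read the raw device; never writes.
"""
import sys

CHUNK = 1024 * 1024  # inspect 1 MiB at each end of the disk


def region_is_blank(f, offset, length):
    """Return True if every byte in the region is zero."""
    f.seek(offset)
    return not any(f.read(length))


def main(device):
    with open(device, "rb") as f:
        f.seek(0, 2)                      # seek to the end to get the disk size
        size = f.tell()
        head_blank = region_is_blank(f, 0, CHUNK)
        tail_blank = region_is_blank(f, max(0, size - CHUNK), CHUNK)
    if head_blank and tail_blank:
        print(f"{device}: first/last 1 MiB are zeroed - looks blank")
    else:
        print(f"{device}: non-zero data found - wipe it before using it as a spare")


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "/dev/sdb")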
 
Associate
OP
Joined
18 Nov 2003
Posts
1,311
Location
Newcastle
Thanks for the suggestions, Toughnoodle. This is a real odd one here.

Basically, we have three drives in a RAID 5 config and one is dying. The dying drive shows a solid light but no audible alarm. If we leave the server to boot normally, it absolutely crawls through Windows startup and then finally blue screens. We know this drive is wrecked and have a spare ready to go in, but we get 'Unresolved configuration mismatch between disks and NVRAM on the adapter' when we try to boot with the new drive installed.

We're just not sure whether creating a new config will erase all the data from the drives. We don't want that, as we know the data on the drives is good. Any ideas?

Controller: LSI 320-1
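As I understand it, the error means the controller's NVRAM copy of the array layout and the metadata it reads back from the disks no longer agree. A toy Python sketch of that idea (definitely not the LSI firmware, just an illustration of the comparison, with made-up config contents):

# Toy model only - not how the LSI firmware actually stores things.
# The controller keeps the array layout in NVRAM; each member disk also
# carries a copy. A replacement disk with stale or missing metadata makes
# the two views disagree, hence the "configuration mismatch" message.

nvram_config = {
    "array": "RAID5",
    "members": ["disk0", "disk1", "disk2"],
}

# What the controller reads back from the disks after the swap (made up):
disk_configs = {
    "disk0": {"array": "RAID5", "members": ["disk0", "disk1", "disk2"]},
    "disk1": {"array": "RAID5", "members": ["disk0", "disk1", "disk2"]},
    "new_disk": None,  # blank replacement, or worse, someone else's old config
}


def check_config(nvram, disks):
    """Compare the NVRAM view against what each present disk claims."""
    for name, cfg in disks.items():
        if cfg != nvram:
            return f"Unresolved configuration mismatch: {name} disagrees with NVRAM"
    return "Configuration consistent"


print(check_config(nvram_config, disk_configs))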
 
Associate
OP
Joined
18 Nov 2003
Posts
1,311
Location
Newcastle
Yes, it's a custom-built server. I've tried ringing LSI, but they aren't open yet (I'm guessing they're in America).

Does anyone know if creating that new configuration will erase the data?
 
Associate
Joined
18 Jan 2004
Posts
1,950
Location
Somewhere
Yes, it's a custom-built server. I've tried ringing LSI, but they aren't open yet (I'm guessing they're in America).

Does anyone know if creating that new configuration will erase the data?


Honestly, don't touch it until you've spoken to LSI tech support; you could end up nuking it. If they're in the States, they're likely to be open in an hour or so.

Let's hope they support you/it...

(This thread is the reason people don't build their own servers...)
 
Soldato
Joined
18 Oct 2002
Posts
4,034
Location
Somewhere on the Rainbow
They're supposed to have tech support in the UK, 01344 413441; is that the number you're using?

If all else fails, according to the manual here you could use the RAID BIOS to configure the new drive as a hot spare, and the card should then rebuild it into the array. Section 3.8 details the procedure. I take it you have backups?
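To illustrate why a rebuild onto a hot spare regenerates the lost data rather than erasing anything: every RAID 5 stripe stores a parity block that is the XOR of its data blocks, so any single missing block can be recomputed from the survivors. A small Python sketch of just that arithmetic, with made-up block contents (not the card's actual on-disk layout):

# Toy demonstration of RAID 5 parity, with made-up 8-byte blocks.
# Not the controller's real format - just the XOR arithmetic that lets
# a rebuild regenerate a replaced disk from the surviving members.

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


# One stripe of a three-disk array: two data blocks plus their parity.
data_a = b"INVOICE1"
data_b = b"PAYROLL2"
parity = xor_blocks([data_a, data_b])   # written when the stripe was created

# The disk holding data_b dies and is swapped for a blank drive; the
# rebuild recomputes its block from the two surviving members.
rebuilt = xor_blocks([data_a, parity])
assert rebuilt == data_b
print("rebuilt block matches the lost one:", rebuilt)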
 
Associate
OP
Joined
18 Nov 2003
Posts
1,311
Location
Newcastle
Well we are up and running again. What a day!!

LSI Support didn't pick up the phone at all this afternoon. So, great support from them I must say, NOT!

Anyway, the way I managed to fix this was to force the damaged drive into failure mode. This caused the two other drives to take over and enabled me to boot into Windows 2003 at full speed. I pulled out the dead disk and replaced it with a new drive, which is currently rebuilding itself within Windows.

I still can't understand, though, why the RAID alarm hadn't gone off to start with, and why the controller lost its config on reboot. Luckily we managed to pull the config back off the damaged array and into the NVRAM. The disk we removed had degraded the server to an almost-standstill.

Time to bring in the SANs and crack open the beers.
 
Soldato
Joined
10 Oct 2005
Posts
8,706
Location
Nottingham
I still can't understand, though, why the RAID alarm hadn't gone off to start with, and why the controller lost its config on reboot. Luckily we managed to pull the config back off the damaged array and into the NVRAM. The disk we removed had degraded the server to an almost-standstill.

Time to bring in the SANs and crack open the beers.

Good to know you've got it fixed.

As to why it didn't alert and degraded things so much... well, the problem is probably related to the disk not completely failing. I've seen it quite a few times on commercial Unix servers where a disk is faulty but hasn't actually failed completely. This floods the system with error messages and (in my cases) SCSI bus reset errors until the disk can be manually forced out of the configuration, at which point the system returns to normal running with degraded disk resiliency.

In my experience disk failures are rarely a clean case of 'it works' -> 'it doesn't work'; there's normally a middle ground that screws things up.
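If you want to catch that sort of half-dead disk before it drags everything down, one cheap option is to poll SMART health yourself rather than relying solely on the controller's alarm. A rough Python sketch, assuming a Linux host with smartmontools installed and disks that smartctl can see directly (the device names are placeholders; behind some RAID cards you'd need extra smartctl device options):

#!/usr/bin/env python3
"""Rough sketch of a periodic SMART health check for spotting half-dead disks.

Assumes a Linux host with smartmontools installed and disks smartctl can
query directly; the device names below are placeholders. Behind some RAID
controllers you would need extra smartctl device options.
"""
import subprocess

DISKS = ["/dev/sda", "/dev/sdb", "/dev/sdc"]  # placeholder device names


def smart_health(device):
    """Run 'smartctl -H' and report whether the drive says it's healthy."""
    result = subprocess.run(
        ["smartctl", "-H", device],
        capture_output=True,
        text=True,
    )
    output = result.stdout + result.stderr
    return ("PASSED" in output or "OK" in output), output


for disk in DISKS:
    healthy, report = smart_health(disk)
    print(f"{disk}: {'healthy' if healthy else 'CHECK THIS DISK'}")
    if not healthy:
        print(report)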
 
Soldato
Joined
14 Dec 2005
Posts
12,488
Location
Bath
Good to know you've got it fixed.

As to why it didn't alert and degraded things so much... well, the problem is probably related to the disk not completely failing. I've seen it quite a few times on commercial Unix servers where a disk is faulty but hasn't actually failed completely. This floods the system with error messages and (in my cases) SCSI bus reset errors until the disk can be manually forced out of the configuration, at which point the system returns to normal running with degraded disk resiliency.

In my experience disk failures are rarely a clean case of 'it works' -> 'it doesn't work'; there's normally a middle ground that screws things up.

I've seen this too.

The disk has problems, but not enough for the controller to mark it as failed and ignore it. Instead it just keeps trying to use the half-dead disk, causing all manner of problems. Get the half-dead drive out, or convince the controller that it's ****ed and shouldn't be bothered with, and hey presto, everything's working fine again (obviously with reduced redundancy until you replace the failed disk).
 