A few tasks are still worth performing before declaring the installation complete.
If you have not already done so, examine the boot messages carefully, particularly these sections: RAID array detection, RAM disk loading, LVM subsystem activation, root file system mounting, module loading, and mounting of the other file systems.
If any unusual error messages or warnings are displayed, investigate them.
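One way to review these sections is to save the kernel messages to a file and filter them for the relevant subsystems. The following is only a sketch: the function name `scan_boot_log` and the keyword list are assumptions made for illustration, and the patterns will probably need adjusting to the messages your kernel actually prints.

```shell
# scan_boot_log FILE
# Print the lines of a saved boot log that mention RAID, LVM, the RAM disk
# or file system mounting, plus any line that looks like an error or warning.
# The keyword list is an illustrative assumption; adjust it as needed.
scan_boot_log() {
    grep -i -E 'md[0-9]*:|raid|lvm|ramdisk|initrd|mount|error|warn|fail' "$1"
}

# Typical use: save the kernel ring buffer, then scan it.
#   dmesg > /tmp/boot.log
#   scan_boot_log /tmp/boot.log
```

A filter like this only narrows down what to read; it does not replace reading the complete log at least once.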
See Appendix E: boot messages for some examples of successful boot messages.
See Appendix C: troubleshooting tips for advice on diagnosing problems.
The Debian "Woody" 3.0 CD-ROM plus the RAID and LVM extension disk is a valid substitute for the rescue floppies. It is also advisable to keep at hand the floppy with the scripts used to install the system, along with the configuration of the disks and logical volumes.
See the section Required Software for pointers to the extension disk.
See Appendix C: troubleshooting tips for recovery instructions in case of a crash.
It is better to discover that this new super-fault-tolerant, unbreakable system cannot boot when a hard disk fails before the system goes into production (and before a real crash, too).
This is also a good time to check the rescue floppies, practice handling RAID disk faults, and write down a disaster recovery procedure.
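As part of the disaster recovery notes, it can help to store a copy of the RAID configuration and the current array status with the rescue media. The sketch below assumes the configuration lives in /etc/raidtab, as elsewhere in this document; the function name `save_layout` and the `RAIDTAB` and `MDSTAT` override variables are hypothetical and exist only to make the example easy to try out.

```shell
# save_layout DEST_DIR
# Copy the RAID configuration and the current array status into DEST_DIR.
# RAIDTAB and MDSTAT are hypothetical override variables; when unset, the
# usual locations /etc/raidtab and /proc/mdstat are read.
save_layout() {
    dest=$1
    mkdir -p "$dest" || return 1
    cat "${RAIDTAB:-/etc/raidtab}" > "$dest/raidtab" || return 1
    cat "${MDSTAT:-/proc/mdstat}" > "$dest/mdstat" || return 1
}

# Typical use, with a mounted floppy as the destination:
#   save_layout /floppy/layout
```

Partition tables and the logical volume layout could be saved alongside these files in the same way.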
![]() | Warning! |
|---|---|
Before going on with the following tests, be aware that they may involve data loss and array reconstruction. If any valuable information has already been put on the system, back it up. Also consider that array reconstruction can be a lengthy process. Always disconnect the system from power while physically working on it! You have been warned! ;-) |
A common problem is that the system is unable to boot from the second hard disk, for reasons ranging from BIOS problems to a misconfigured lilo.
A good way to check that the system will boot with a degraded array is to power down the system, disconnect the first hard disk (both power and data cables), and then power the system up again.
The RAID subsystem should detect a disk "fault", remove the faulty drives from the arrays and then boot.
Moreover, if a hot spare drive has been configured (e.g. /dev/hdc), it should automatically be used to start array reconstruction. In this case, wait until reconstruction is complete before going on with other tests.
Array status can be monitored by looking at the /proc/mdstat file.
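The simplest check is `cat /proc/mdstat`. For a quick yes/no answer, the status field of each array can be tested for a missing member. This is a sketch that assumes the usual layout of /proc/mdstat, where each array shows a field such as `[UU]` and a missing member appears as `_`; the function name `md_degraded` is made up for this example.

```shell
# md_degraded [FILE]
# Succeed (exit status 0) if any array listed in FILE, by default
# /proc/mdstat, has a missing member, shown as "_" in the status field,
# for example [U_] or [_U].
md_degraded() {
    grep -q -E '\[[U_]*_[U_]*\]' "${1:-/proc/mdstat}"
}

# Typical use:
#   md_degraded && echo "at least one array is degraded"
```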
If no hot spare was used, a second useful test is to power down the system again, reconnect the first disk, and then boot. The system should detect that the superblock on the first disk is older than the one on the second disk and keep the first disk from joining the arrays.
To put the first disk back into the arrays, use the raidhotadd command. With two mirrored hard disks, /dev/hda and /dev/hdb, and assuming that the "faulty" drive was /dev/hda, use:

    raidhotadd /dev/md1 /dev/hda1
    raidhotadd /dev/md2 /dev/hda2
Again, the reconstruction process should start.
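Reconstruction can be followed in /proc/mdstat, and a small polling loop saves checking by hand. The sketch below assumes that an ongoing rebuild is reported with the word "recovery" or "resync" in /proc/mdstat; the function names and the default 30 second interval are illustrative choices, not part of the original procedure.

```shell
# md_rebuilding [FILE]
# Succeed while FILE (default /proc/mdstat) reports a recovery or resync.
md_rebuilding() {
    grep -q -E 'recovery|resync' "${1:-/proc/mdstat}"
}

# wait_for_rebuild [FILE] [SECONDS]
# Poll every SECONDS (default 30) until no rebuild is reported any more.
wait_for_rebuild() {
    while md_rebuilding "$1"; do
        sleep "${2:-30}"
    done
}

# Typical use:
#   wait_for_rebuild && echo "reconstruction finished"
```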
If a hot spare disk was used there are two options:
Leave /dev/hdc in the array and use /dev/hda as the hot spare. Update /etc/raidtab to match this situation (swap the roles of /dev/hda and /dev/hdc).
Return to the original situation: mark /dev/hdc as faulty with raidhotgenerateerror, add /dev/hda to the array with raidhotadd, then wait until reconstruction is complete again.
In both cases, reboot to verify that the system works correctly.
![]() | Warning! |
|---|---|
Unless your system has hot-swap hard disks and hot-swap support, do not plug or unplug any hard disk while the system is running (disconnect neither the data cable nor the power cable). Doing so would lock up the system completely. While the first thought might be "What did I install a RAID system for, then?", this behavior is correct, or it may be described as a "feature": a software RAID system is a low-cost system intended to protect against a reasonable amount of damage or misfortune, for example hard disk damage limited to some blocks or tracks. If a hard disk suddenly stops responding to commands (as if it were unplugged), the system will lock up, and a manual shutdown, disconnection of the disk, and restart will be required. In any case, the system will boot up again with the remaining drive(s). Note that this limitation is due not to the use of software RAID but to the hardware architecture. The same problem would show up with a low-cost SCSI RAID card that does not support hot swap, in case of a serious disk problem. |