Surviving a Linux Filesystem Failure
When you use the term filesystem failure, you mean corrupted filesystem
data structures (objects such as inodes, directories, the
superblock, etc.). It can be caused by any one of the following:
* Mistakes by the Linux/UNIX sysadmin
* Buggy device drivers or utilities (especially third-party utilities)
* Power outage (very rare on production systems) due to UPS failure
* Kernel bugs (that is why you don't run the latest kernel on a production
Linux/UNIX system; most of the time you should use a stable kernel release)
A filesystem failure can have the following effects:
- The filesystem refuses to mount
- The entire system hangs
- Even if the mount operation succeeds, users may notice strange behavior once the filesystem is mounted, such as system reboots, gibberish characters in directory listings, etc.
So how do you survive a filesystem failure? Most of the time fsck (a front end to
the ext2/ext3 utility e2fsck) can fix the problem. Simply run e2fsck to
check a Linux ext2/ext3 file system. Assuming /home (the /dev/sda3
partition) is the filesystem used for this demo, first unmount /dev/sda3 and then
type the following command:
# e2fsck -f /dev/sda3
Where,
- -f : Force checking even if the file system seems clean.
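The unmount-then-check step above can be sketched as a small wrapper that refuses to run e2fsck while the device is still mounted. This is only a sketch: is_mounted and check_fs are hypothetical helper names, and the check assumes the device appears in /proc/mounts under the same name you pass in (which may not hold for UUID or device-mapper names).

```shell
#!/bin/sh
# Hypothetical helper: succeed if the device is listed in the mount table.
# The optional second argument (mount table path) exists only so the check
# can be tried against a sample file instead of /proc/mounts.
is_mounted() {
    dev=$1
    table=${2:-/proc/mounts}
    # One mount per line; the first field is the device name.
    awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$table"
}

# Hypothetical wrapper: run a forced check only on an unmounted device.
check_fs() {
    dev=$1
    if is_mounted "$dev"; then
        echo "$dev is mounted; unmount it first (umount $dev)" >&2
        return 1
    fi
    e2fsck -f "$dev"
}
```

With these functions loaded, check_fs /dev/sda3 would run the same command as above, but only after confirming that /dev/sda3 is not mounted.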
Please note that if the superblock is not found, e2fsck will terminate
with a fatal error. However, Linux maintains
multiple redundant copies of the superblock in every file system, so you
can use the -b {alternative-superblock} option to work around this problem.
The location of the backup superblock is dependent on the filesystem's
blocksize:
- For filesystems with 1k blocksizes, a backup superblock can be found at block 8193
- For filesystems with 2k blocksizes, at block 16384
- For 4k blocksizes, at block 32768.
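The block size mapping above can be written as a tiny lookup. This is a sketch only: first_backup_superblock is a hypothetical helper name, and the values assume a filesystem created with the default number of blocks per group, so treat the result as a first guess rather than a guarantee.

```shell
#!/bin/sh
# Hypothetical helper: print the first backup superblock for the three
# block sizes listed above (block size given in bytes).
first_backup_superblock() {
    case "$1" in
        1024) echo 8193 ;;
        2048) echo 16384 ;;
        4096) echo 32768 ;;
        *) echo "unsupported block size: $1" >&2; return 1 ;;
    esac
}
```

For example, first_backup_superblock 4096 prints 32768, which can then be passed to the -b option of e2fsck.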
Tip: you can also try either of the following commands to determine alternative superblock locations (mke2fs -n only prints what it would do and does not create a filesystem):
# mke2fs -n /dev/sda3
OR
# dumpe2fs /dev/sda3|grep -i superblock
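If you only want the block numbers, the dumpe2fs output can be filtered further. The sketch below assumes the lines look like "Backup superblock at 32768, Group descriptors at 32769-32769", which is the usual e2fsprogs wording but may differ between versions; list_backup_superblocks is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read dumpe2fs output on stdin and print one backup
# superblock number per line. The primary superblock line is ignored.
list_backup_superblocks() {
    sed -n 's/^.*Backup superblock at \([0-9][0-9]*\),.*$/\1/p'
}
```

It would be used as: dumpe2fs /dev/sda3 | list_backup_superblocks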
To repair the file system using an alternative superblock, use the command as follows:
# e2fsck -f -b 8193 /dev/sda3
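After the repair it helps to look at the exit status of e2fsck, which is a sum of flag values documented in its man page (1 errors corrected, 2 reboot recommended, 4 errors left uncorrected, 8 operational error, 16 usage error, 32 canceled, 128 shared library error). The sketch below decodes that status into text; describe_e2fsck_status is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: translate an e2fsck exit status into readable text.
# The status is a bitmask, so several messages may be printed.
describe_e2fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    for pair in "1:errors corrected" \
                "2:errors corrected, reboot recommended" \
                "4:errors left uncorrected" \
                "8:operational error" \
                "16:usage or syntax error" \
                "32:canceled by user" \
                "128:shared library error"; do
        bit=${pair%%:*}     # number before the first colon
        text=${pair#*:}     # message after the first colon
        if [ $(( $status & $bit )) -ne 0 ]; then
            echo "$text"
        fi
    done
}
```

A typical use would be to run e2fsck -f -b 8193 /dev/sda3 and then call describe_e2fsck_status $? on the next line. If it reports errors left uncorrected, trying the next backup superblock is a reasonable next step.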
However, it is highly recommended that you make a backup before you run the fsck
command on the system. Use the dd command to create a backup of the partition
you are about to check (provided that you have spare space under /disk2):
# dd if=/dev/sda3 of=/disk2/backup-sda3.img
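A backup is only useful if it matches the source, so the dd step can be followed by a byte-for-byte comparison. This is a sketch under a few assumptions: backup_image is a hypothetical helper name, the source is unmounted so it does not change during the copy, and GNU dd is used (the 4M block size suffix is GNU syntax).

```shell
#!/bin/sh
# Hypothetical helper: copy a device (or file) to an image file and verify
# the copy. Returns non-zero if either the copy or the comparison fails.
backup_image() {
    src=$1
    dst=$2
    # bs=4M only reduces the number of read/write calls; the data is the same.
    dd if="$src" of="$dst" bs=4M 2>/dev/null || return 1
    # cmp -s is silent and exits non-zero if the two differ in any byte.
    cmp -s "$src" "$dst"
}
```

For the /dev/sda3 demo partition this would be called as backup_image /dev/sda3 /disk2/backup-sda3.img, and e2fsck would be run only if it succeeds.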
If you are using Sun Solaris UNIX, see the howto: Restoring a Bad Superblock.
Please note that things get complicated if the hard disk participates in a software RAID array. Take a look at
Software-RAID HOWTO - Error Recovery.
This article/tip is part of the Understanding UNIX/Linux file system series. Continue reading the rest of the series (this is part III):
- Part I - Understanding Linux superblock
- Part II - Understanding Linux superblock
- Part III - An example of Surviving a Linux Filesystem Failure
- Part IV - Understanding filesystem Inodes
- Part V - Understanding filesystem directories
- Part VI - Understanding UNIX/Linux symbolic (soft) and hard links
- Part VII - Why isn't it possible to create hard links across file system boundaries?