Testing your hard drive in Linux

I recently needed to test a hard drive in Linux, and had a hard time finding out how to do it properly. In DOS, you can run a Surface Scan in Scandisk. Linux does not have anything called Surface Scan, however. In Linux, it is called checking for bad blocks.

What is a block? I'm not sure of the formal definition, but it's basically a fixed-size chunk of data on the hard drive. A 40 GB partition, for example, gets divided into a whole bunch of numbered blocks that might be 4096 bytes each. Block 0 is the first 4096 bytes, block 1 is the second 4096 bytes, and so on. An important thing to know is that the "blocks" are a part of the filesystem. A block size is chosen for the filesystem at the time of formatting. The partition itself does not have a block size; the filesystem does.
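The block numbering above is plain integer division. Here is a minimal sketch of it, assuming a 4096-byte block size; the function names are made up for illustration, and nothing in it touches a disk.

```shell
#!/bin/sh
# Illustrative arithmetic only; nothing here reads or writes a disk.
BLOCK_SIZE=4096   # assumed filesystem block size in bytes

# Print the number of the block that contains the given byte offset.
block_of_offset() {
    echo $(( $1 / BLOCK_SIZE ))
}

# Print how many whole blocks fit in a partition of the given size in bytes.
blocks_in_partition() {
    echo $(( $1 / BLOCK_SIZE ))
}

block_of_offset 0        # first byte of the partition: block 0
block_of_offset 4096     # first byte of the second block: block 1
block_of_offset 10000    # a byte inside the third block: block 2
blocks_in_partition $(( 40 * 1024 * 1024 * 1024 ))   # a 40 GiB partition
```

On an existing ext2 or ext3 filesystem, the real block size shows up in the "Block size" line printed by dumpe2fs -h /dev/hda1.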

If part of your hard drive is messed up, the block or blocks that contain the bad part should be marked bad. Basically, this means the block number is added to a list of bad blocks. Then, you give the list to the filesystem on the partition. The filesystem stores it and remembers not to use those bad blocks. If you use e2fsck, the process of giving the list to the filesystem is automated. Since automating that step avoids mistakes, it is the preferable way to go.

There are two general ways to find the bad blocks.

The first way is to just try reading every block. If one of the reads causes the hard drive to throw an error, then the block in question is marked bad. This, however, is not the best way, because sometimes a bad part of the disk doesn't throw an error when it is read.

The second, slower method is to write data to every block on the hard disk and make sure it's the same when it's read back. It is possible to do this without erasing the data in your partition, but that makes the test take longer. This second method, read/write, is what is done in a DOS Surface Scan.
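To make the read/write idea concrete, here is a small sketch that does a write-then-verify on an ordinary scratch file instead of a disk. It is only an illustration under my own assumptions (the verify_block function, the random test pattern, and the temp files are all made up for this example) and is not how badblocks is implemented.

```shell
#!/bin/sh
# Write-then-verify on a scratch FILE, not a device. Illustration only.

# Write one block of random data at the given block number, read it
# back, and compare. Returns 0 if the block reads back unchanged.
# $1 = file, $2 = block number, $3 = block size in bytes
verify_block() {
    file=$1; block=$2; bs=$3
    pattern=$(mktemp) || return 2
    readback=$(mktemp) || return 2
    # Build one block of test data.
    dd if=/dev/urandom of="$pattern" bs="$bs" count=1 2>/dev/null
    # Write it into the chosen block without truncating the file.
    dd if="$pattern" of="$file" bs="$bs" seek="$block" conv=notrunc 2>/dev/null
    # Read the same block back and compare it with what was written.
    dd if="$file" of="$readback" bs="$bs" skip="$block" count=1 2>/dev/null
    if cmp -s "$pattern" "$readback"; then status=0; else status=1; fi
    rm -f "$pattern" "$readback"
    return $status
}

scratch=$(mktemp)
for b in 0 1 2 3; do
    if verify_block "$scratch" "$b" 4096; then
        echo "block $b ok"
    else
        echo "block $b BAD"
    fi
done
rm -f "$scratch"
```

A read-only test would skip the write step and only check whether the read itself fails, which is why it can miss a block that returns wrong data without an error.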

Programs to use

In Linux, there is pretty much only one program that is used to check for bad blocks. It is called, surprisingly enough, "badblocks". You should only use this program directly, though, when you are checking a blank partition or a filesystem that is not ext2 or ext3. When checking an ext2 or ext3 partition, you should use e2fsck, which runs badblocks for you in the background.

Using e2fsck

You should use this when checking an ext2 or ext3 filesystem. Both methods below automatically save the bad blocks they find into the filesystem, so that those parts of the hard drive are no longer used.

Read-only method: e2fsck -c -C 0 /dev/hda1 ---OR--- e2fsck -c -C 0 -y /dev/hda1 (The -y answers yes to all questions, so it is sure to finish by itself.)

Non-destructive read/write method: e2fsck -c -c -C 0 /dev/hda1 ---OR--- e2fsck -c -c -C 0 -y /dev/hda1 (The -y answers yes to all questions, so it is sure to finish by itself.)

Note: The -C switch takes a file descriptor number as its argument. Giving it 0 makes e2fsck draw a progress bar.
Note: Filesystem must NOT be mounted. You therefore have to use a rescue CD if you need to check the root filesystem. I recommend this CD: http://rescuecd.sourceforge.net/
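Since the filesystem must not be mounted, it can help to check that before starting. The following is a rough sketch, assuming a Linux system with /proc/mounts; the is_mounted function is my own helper, not part of e2fsprogs. It compares device names literally, so it can miss a root filesystem that is listed under another name such as /dev/root.

```shell
#!/bin/sh
# Return 0 if the device is listed as a mounted source in the mounts table.
# $1 = device, $2 = mounts table (optional, defaults to /proc/mounts)
is_mounted() {
    dev=$1
    table=${2:-/proc/mounts}
    # The first field of each line in the table is the mounted device.
    awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$table" 2>/dev/null
}

DEVICE=/dev/hda1   # example device used throughout this page
if is_mounted "$DEVICE"; then
    echo "$DEVICE is mounted; unmount it or boot a rescue CD first"
else
    echo "$DEVICE does not appear to be mounted; e2fsck -c can be run on it"
fi
```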

Using badblocks

You should use this when checking a blank partition. You can also use it on a partition with a filesystem that is not ext2 or ext3. There might be an equivalent of e2fsck for your filesystem, though, so you might try that first. When you use badblocks, the bad blocks list for your partition will not be saved in the filesystem automatically. It is possible to save the bad blocks list to a file and then have the filesystem read in that list. The problem is that you must set the block size in badblocks to the block size the filesystem will have, or currently has. Otherwise the block numbers will not correspond to the blocks in that filesystem. I'm not going to describe how to import the block list into the filesystem. You can read the man pages for that information.
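For the blank-partition case, the man pages describe roughly the following sequence: save the list with the -o switch of badblocks, then hand it to mke2fs with -l, using the same block size for both. The sketch below only prints the two commands instead of running them; the function name and the list file path are placeholders of mine, and you should check the man pages for your versions before running anything for real.

```shell
#!/bin/sh
# Dry run: PRINT the commands for saving a bad blocks list and using it
# when creating an ext2 filesystem. Nothing is executed on the device.
# $1 = device, $2 = block size in bytes, $3 = file to hold the list
print_badblocks_steps() {
    dev=$1; bs=$2; list=$3
    # Step 1: scan read-only and write the bad block numbers to a file.
    echo "badblocks -b $bs -s -o $list $dev"
    # Step 2: create the filesystem with the SAME block size and the list.
    echo "mke2fs -b $bs -l $list $dev"
}

print_badblocks_steps /dev/hda1 4096 /tmp/hda1.badblocks
```

The point of passing the block size to both commands is the one made above: the block numbers in the list only mean something if the filesystem uses the same block size.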

Read-only method: badblocks -b 4096 -p 4 -c 32768 -s /dev/hda1
The number after -b is the block size. 4096 means 4096 bytes. You don't need to change this unless you're using the bad blocks list for something.
The number after -p is the number of passes it should run on the partition. The 4 means it will stop testing after it has tested the entire partition 4 times in a row without the bad blocks list changing. So if it finds new bad blocks on the third pass, and none after that, it will have done 7 passes altogether. If you don't want to do multiple passes, you can skip this switch to save time.
The number after -c is the number of blocks it tests at a time. The default is 16 in older versions of badblocks and 64 in newer ones. The -b number * the -c number equals the number of bytes of RAM it will use. You should probably use as much of your available memory as possible to save time. Just make sure you don't use too much. You certainly wouldn't want this data to be swapped. If you run out of physical and swap memory, the program will just crash. The above settings use 128 MB of RAM.
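The -p stopping rule is easy to get wrong in your head, so here is a small model of it. This is my own simplified sketch of the rule as described above, not code from badblocks, and it does not run badblocks; the total_passes name and its arguments are invented for the example.

```shell
#!/bin/sh
# Model of the -p rule: keep scanning until $1 consecutive passes find no
# new bad blocks. At least one pass is always made.
# $1 = value given to -p
# $2... = number of NEW bad blocks found on pass 1, pass 2, and so on
#         (passes beyond the end of the list are assumed to find none)
total_passes() {
    needed=$1; shift
    clean=0; passes=0
    while :; do
        passes=$(( passes + 1 ))
        found=${1:-0}
        if [ $# -gt 0 ]; then shift; fi
        if [ "$found" -gt 0 ]; then
            clean=0                      # list changed, start counting again
        else
            clean=$(( clean + 1 ))
        fi
        if [ "$clean" -ge "$needed" ]; then break; fi
    done
    echo "$passes"
}

total_passes 4 0 0 1   # new bad blocks on the third pass only: prints 7
total_passes 4         # nothing ever found: prints 4
```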

Destructive read/write method: badblocks -b 4096 -p 4 -c 16384 -w -s /dev/hda1
Warning: the -w switch overwrites every block it tests, so everything on the partition is erased.
The number after -b is the block size. 4096 means 4096 bytes. You don't need to change this unless you're using the bad blocks list for something.
The number after -p is the number of passes it should run on the partition. The 4 means it will stop testing after it has tested the entire partition 4 times in a row without the bad blocks list changing. So if it finds new bad blocks on the third pass, and none after that, it will have done 7 passes altogether. If you don't want to do multiple passes, you can skip this switch to save time.
The number after -c is the number of blocks it tests at a time. The default is 16 in older versions of badblocks and 64 in newer ones. The -b number * the -c number * 2 equals the number of bytes of RAM it will use. You should probably use as much of your available memory as possible to save time. Just make sure you don't use too much. You certainly wouldn't want this data to be swapped. If you run out of physical and swap memory, the program will just crash. The above settings use 128 MB of RAM.
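The memory arithmetic for the two badblocks methods can be checked with a one-line calculation. This is just a sketch of the formula given above; the badblocks_ram_mib name is made up, and the multiplier of 3 for the non-destructive mode comes from my reading of the badblocks man page, so treat it as an assumption.

```shell
#!/bin/sh
# Print the approximate buffer memory badblocks needs, in MiB.
# $1 = -b block size in bytes
# $2 = -c number of blocks tested at a time
# $3 = multiplier: 1 for read-only, 2 for destructive read/write (-w)
#      (assumption from the man page: 3 for the non-destructive mode)
badblocks_ram_mib() {
    echo $(( $1 * $2 * $3 / 1048576 ))
}

badblocks_ram_mib 4096 32768 1   # read-only example: prints 128
badblocks_ram_mib 4096 16384 2   # destructive read/write example: prints 128
```

This is why the destructive example uses half the -c value of the read-only one: the doubled buffer lands on the same 128 MB.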

Other things missing from this page

There is a non-destructive read/write mode of badblocks (the -n switch). You should use e2fsck for ext2 and ext3 filesystems, though.
If your hard drive has bad blocks randomly scattered throughout, it is probably shot. If they are localized to a small area, then the drive is more likely still usable.

-Aaron Talbot
talasonic@earthlink.net