How to give your disks and files a health check and repair basic faults

What is this Wiki page about?

This Wiki page describes how to run diagnostics on your hard disks and the data they contain. It concentrates on doing this while the disks are installed in your NAS, but because that limits your options it also mentions the benefits of checking the disks outside the NAS.

Overview of the process

First we check the health of the individual hard disk(s) using smartmontools (Note: Synology firmware now includes SMART support, so you only need smartmontools if you have an older firmware); this checks for bad blocks and other "physical" issues with the disk(s). Then you check your NAS data volumes, e.g. /dev/md2 (= /volume1 in a multibay NAS) or /dev/hda3 (= /volume1 in a single bay NAS), for "logical" errors using e2fsck. The process will not corrupt your data, but if faults are found you could lose data.
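
For orientation, the two checks boil down to commands along the following lines. The device names are examples for a multibay NAS and may need adjusting for your model, as explained in the Background Info section below:

  smartctl -a /dev/sda        # physical check: print SMART health and attributes for disk1
  smartctl -t long /dev/sda   # start the extended (long) self-test on disk1
  e2fsck -v -n -f /dev/md2    # logical check of /volume1, read-only mode (no changes made)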

Limitations

If your disks are installed in your NAS while performing these checks, the repair of "logical" errors using e2fsck is restricted to your data; you won't be able to repair or check the NAS's own Operating System files (or at least the original author of this wiki page doesn't know how to do this; it should be possible on a multi drive NAS, so if you know how, please update this page).

If you have the option to remove the disks from your NAS and connect them to a Linux PC, you will be able to use e2fsck on the NAS's operating system files as well as on your data.

How often should I check my disks and files?

There will be many opinions about this, but the author's opinion is:

The physical check (i.e. SMART or smartmontools) should be run every time a disk is added to the NAS, every time the NAS reports a disk as having crashed, or otherwise every 1000 hours of disk operation (this is not the same as power-on hours; for a SOHO user it is approximately once a year).

The logical check (i.e. e2fsck) should be run after every improper shutdown (power cut), after a disk crash, whenever you notice file errors, or otherwise every 1000 hours of operation (again, not the same as power-on hours; for a SOHO user approximately once a year).

As your disk(s) get older and start to show signs of age (e.g. an increasing number of re-allocated sectors, increased read/write error rates, etc.), you should steadily increase the frequency of the checks and eventually decide when to replace the drive.
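
To judge how a disk is ageing you can print the raw SMART attribute table with smartmontools (or read the same values via the firmware's S.M.A.R.T. page). Attribute names and numbering vary between disk vendors, so treat the commands below as an indicative sketch only:

  smartctl -A /dev/sda          # print the SMART attribute table for disk1
  smartctl -l selftest /dev/sda # show the results of previous self-tests
  # attributes worth watching typically include Reallocated_Sector_Ct,
  # Current_Pending_Sector and Power_On_Hours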

Background Info

Single bay NASs e.g. DS106 DS107 etc

For these models the Linux operating system (used by Synology and virtually all other NAS providers) represents your SATA hard disk as /dev/hda and its three partitions as /dev/hda1, /dev/hda2 and /dev/hda3. As RAID is not possible on these systems, the logical storage of your data matches the physical storage on the disk; hence both physically and logically your data (/volume1) is stored in /dev/hda3. The instructions provided below are written assuming a multibay NAS, where the disk names are different and the physical and logical storage of data can differ (due to the use of RAID). Consequently, when following the instructions/information below you will need to use /dev/hda instead of /dev/sda for smartmontools, and /dev/hda3 instead of /dev/md2 for e2fsck.
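
In other words, on a single bay NAS the example commands used later would look roughly like this (purely illustrative; on some older firmware/smartmontools builds you may also need to add "-d ata" so that smartctl can talk to the disk):

  smartctl -a /dev/hda         # physical check of the single disk
  e2fsck -v -n -f /dev/hda3    # read-only logical check of /volume1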

Multibay NASs

The Linux operating system (used by Synology and virtually all other NAS providers) represents your SATA hard disks as /dev/sda (for disk1), /dev/sdb (for disk2), etc. Hence the physical error checks using SMART or smartmontools are run on the hard disks themselves, e.g. /dev/sda for disk1, as described in the section on smartmontools below.

The physical hard disks in a multibay Synology NAS are all set up with three partitions, i.e. for disk1 (/dev/sda) these are /dev/sda1, /dev/sda2 and /dev/sda3. You could say these partitions are "physical partitions" because they were written to the disks when the disks were formatted by the Synology NAS.

To make the management of data in a potential RAID setup uniform, Synology NASs store data using "logical partitions" (/dev/md0, /dev/md1, /dev/md2, etc.) which map through to the "physical partitions" discussed above (e.g. /dev/sda1) depending on whether you use RAID or not. This has the advantage that your data in /volume1 will always be in the "logical partition" /dev/md2, but it may not be in the "physical partition" /dev/sda3 or /dev/sdb3, as that depends on which physical disks you have and whether they are used in RAID or not.

Hence, due to the use of "logical partitions" (which may or may not be combined as Basic (no RAID), RAID0, RAID1, or RAID5), your data is always stored in the same "logical" way. Again, RAID or no RAID only affects where your data is physically stored, not how it is logically stored.
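
You can see this mapping on your own NAS: /proc/mdstat exists on any Linux system that uses software RAID, and mdadm (if it is present on your firmware) prints more detail. The device names below are only examples:

  cat /proc/mdstat            # lists each /dev/mdX and the /dev/sdXN partitions behind it
  mdadm --detail /dev/md2     # detailed view of the logical partition that holds /volume1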

The instructions/information below assume a multibay NAS, but to help single bay users I occasionally also give their syntax in brackets, e.g. (/dev/hda and /dev/hda3 for single bay NASs).

All systems

Synology NASs use at least three logical disk partitions.

  • Partition 0 (e.g. /dev/md0, or /dev/hda1 for single bay NASs) holds only the NAS's own Operating System files
  • Partition 1 (e.g. /dev/md1 or /dev/hda2) is a temporary memory swap space used only by the NAS's OS
  • Partition 2 (e.g. /dev/md2 or /dev/hda3) is /volume1, i.e. where your data is; if you have more volumes then /dev/md3 is /volume2, etc.

Hence when checking for "logical" errors we use e2fsck on these "logical partitions"; e2fsck is not affected by where the data is physically stored, i.e. whether you use RAID or not.

Linux "mounts" logical partitions to the root directory using a directory name you (or your NAS's OS) specify so that the data in the partition can be accessed. Hence "/dev/md2" (or /dev/hda3 for single bay NASs) containing your data is mounted as "/volume1" by the synology operating system.

The e2fsck command that checks for logical errors can be run either in read-only mode (i.e. check only, do not fix) or in a "make changes" mode (check and fix). e2fsck will let you run the read-only mode on any logical partition (e.g. /dev/md2 or /dev/hda3) even while it is mounted; however, I recommend you only run it on an unmounted partition, otherwise the results may not be accurate. The "make changes" mode of e2fsck can only be used on unmounted partitions.

Consequently, to fix any errors on, say, /dev/md2 (or /dev/hda3), you must first unmount /volume1 and any other mounts to your data below /volume1, e.g. /volume1/opt if it is mounted.
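
As a sketch, and assuming /opt is mounted from /volume1 (your mounts may differ), the unmount-then-check sequence looks like this; child mounts must be released before /volume1 itself:

  umount -f /opt              # first release any mounts that live below /volume1 (example)
  umount -f /volume1          # now /dev/md2 is no longer mounted
  e2fsck -v -n -f /dev/md2    # read-only check; swap -n for -y to actually repair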

Assumptions

  1. It is recommended you have a backup of your data
  2. It is assumed you are prepared for the NAS to be out of action for between approximately 2 hours and a day (depending on the number of disks, disk size, number of errors found, etc.).

The Procedure

Make sure you have read the background info above so you understand the naming convention used below. Single bay NAS users should replace /dev/... as necessary. A consolidated sketch of the full command sequence is given after the list.

  1. Ensure you have a backup of your data
  2. Reboot your NAS
  3. Enable the Command Line Interface (Telnet or SSH)
  4. You now need to run the hard disk SMART tests. If your Synology firmware supports SMART, run the "Extended Test" on each disk; it can be found in "Storage Manager" -> "HDD Management" -> "S.M.A.R.T. Test". If your Synology firmware is old and does not support SMART, you will need to install and use smartmontools as described in A short list of the more useful 900+ ipkg packages, and run a "long self test" on each drive in your NAS.
  5. After performing the SMART tests login to the command line as root
  6. Type "df" to see what is mounted on your NAS.
  7. Before you unmount the desired partition, stop the services running on the NAS, using the following commands: "/usr/syno/etc/rc.d/S25download.sh stop", "/usr/syno/etc/rc.d/S20pgsql.sh stop", "/usr/syno/etc/rc.d/S80Samba.sh stop", "/usr/syno/etc/rc.d/S83nfsd.sh stop". If you don't do this, you may get an error such as "Device or resource busy while trying to open /dev/md2" when you run the volume check.
  8. Now you must unmount anything that is directly or indirectly mounted on the partition you want to check. For example, let's check /volume1. Everyone should have /dev/md2 mounted as /volume1; to unmount it use the command "umount -f /volume1". Some of you may also see that subdirectories of /volume1 are mounted, e.g. you may see /volume1/opt mounted as /opt, in which case you will also need to "umount -f /opt", etc.
  9. First we will do a read-only "logical" check on the /volume1 data (i.e. partition /dev/md2). Enter the command "e2fsck -v -n -f /dev/md2"; this check may take from 10 minutes to 2 hours depending on how much data you have and how many errors are found.
  10. If you have any other partitions, e.g. /volume2, you can check those too by unmounting them and using the command "e2fsck -v -n -f /dev/md3", etc.
  11. For partitions that have errors, re-run the e2fsck command in the "make changes" mode, e.g. use the command "e2fsck -v -f -y /dev/md2".
  12. It would be nice if we could also check the Operating System partition /dev/md0; however, I do not know a way to unmount it and leave the NAS operational. You could use e2fsck in check-only mode, e.g. "e2fsck -v -n -f /dev/md0", with the partition still mounted, but I advise against doing this as it may report errors that don't really exist.
  13. When you have finished repairing any "logical" errors reboot the NAS, e.g. enter the command "reboot".
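
As mentioned above, here is a consolidated sketch of the command sequence for checking /volume1 on a multibay NAS. The service scripts and mount points are the ones used in the steps above and may differ on your firmware version:

  # stop services so the volume can be unmounted
  /usr/syno/etc/rc.d/S25download.sh stop
  /usr/syno/etc/rc.d/S20pgsql.sh stop
  /usr/syno/etc/rc.d/S80Samba.sh stop
  /usr/syno/etc/rc.d/S83nfsd.sh stop

  # unmount any child mounts first, then the volume itself
  umount -f /opt        # only if /volume1/opt is mounted as /opt
  umount -f /volume1

  # read-only check first, then repair only if errors were reported
  e2fsck -v -n -f /dev/md2
  e2fsck -v -f -y /dev/md2

  # bring everything back
  reboot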