[SOLVED] How to recover data from raid 5 with two failed disks - volume crashed after DSM update

Galaxy
I'm New!
Posts: 8
Joined: Thu Nov 23, 2017 10:47 pm

[SOLVED] How to recover data from raid 5 with two failed disks - volume crashed after DSM update

Unread post by Galaxy » Thu Nov 23, 2017 11:04 pm

EDIT: I fixed it and got all my data back! I ran a whole bunch of tests and will explain everything for you. For better understanding, I have added comments to my questions in this post in blue. In the next post I will explain how I recovered all my data from a raid 5 with two failed disks. I hope it helps someone. THE MOST IMPORTANT THING IS THAT YOU KEEP CALM AND DON'T RUSH INTO ANYTHING! THINK ABOUT IT, AND AFTER THAT, THINK ABOUT IT AGAIN! You read that everywhere and it's true, believe me!

Hey guys,

I need your help!

My Setup:
4 disks: Western Digital Red (WD40EFRX) from February 2016, running in a raid 5 (slots 1, 2, 3, 4); slot 5 is empty; DS1515+
Let me explain what happened and what I did, step by step:
  1. On 17.11.2017 (early in the morning) my DS did an automatic update (I had set the box to do automatic updates).
  2. On 18.11.2017 at around 00:30 I wanted to access my data and could not reach the web interface or the mount point on my Windows machine, and I also couldn't ping it. So I looked at the DS and saw that the blue LED was flashing. If I remember correctly, every disk LED was dead. It looked similar to this one here: https://forum.synology.com/enu/viewtopi ... 7&t=128497 - So I did (sadly, I should have waited...) a restart via the power button: hold it for a few seconds until the beep. The DiskStation did a normal shutdown with a flashing blue LED. After that I started it again via the power button.
  3. After the start had completed, the status LED was orange and all disk LEDs were on (no beeping, just the normal beep that says the station has booted). I tried to access the web interface and the mount point, but could not reach my data. So I checked with Synology Assistant and found the unit with the status "Configuration lost" and a DHCP IP address.
  4. I searched the web and it said that it should not be a problem to do a reinstall; my data would not be touched, only the system partitions. So I connected and then did a reinstall (the web installer showed "reinstall", if I remember correctly) with a manually downloaded .pat file (DSM_DS1515+_15217.pat). If the data were going to be wiped, a red warning text should have been shown, which it wasn't. So the reinstall started and failed between 40% and 50% with "failed formatting system partition [35]" (I don't remember the exact wording, but there was a "[35]" in it). After that the NAS offered me an install, not a reinstall.
  5. So I did a manual shutdown and start via the power button again, as explained in step 2. This time it came up with an orange LED and constant beeping (beep, beep, beep...). Ironically, the station now came up with my old configuration (with the manual IP address I had set, for example), but all my installed packages were gone. It showed me that volume1 had crashed. volume1 is now shown with disk3 and disk4, but missing disk1 and disk2, and it shows no bytes. On the disk side every disk is "Normal", but has the error "System partition failed" (in German: "Systempartitionierung fehlgeschlagen"). It looks like this:
    [Screenshots: Storage Manager showing the crashed volume1 and the disk overview with "System partition failed"]
  6. So now I can't do a repair of the raid 5: because of the 2 lost disks, my raid 5 is gone (well, at least I thought so :wink:). I shut it down and went off to do research, which I will explain below. My goal is to get the data back. It doesn't matter if I have to recreate the settings and the volume later - I just want to get my data back, if it's possible...
Because of the automatic monthly SMART tests, which always ran fine (all disks are "Normal"), I don't think my disks failed. I think the update, my manual shutdown (step 2) or the reinstall messed things up, so the volume got lost... Some logs show that a few bad sectors were found: "Bad sector was found on disk[3]", and also "I/O-Error on disk 3 on NAS01". These errors started in the middle of October and there are about 15 of them. But I assume this refers to disk 3 in the NAS, not disk 1 or disk 2, which are the ones that failed? Am I assuming correctly?
Yes, that's correct: "I/O-Error on disk 3 on NAS01" means the disk in the physically third slot. "I/O-Error on disk 5 on NAS01" would be the physically fifth slot, and "I/O-Error on disk 1 on NAS01" would be the physically first slot. The same goes for "disk[3]", "disk[5]", "disk[1]" and so on. Don't mix this up with the "raid slot" mentioned further down, because mdadm counts from 0. So in my example I have 4 disks, which are physically numbered 1, 2, 3, 4, and in mdadm (the tool used to manage software raids in Linux) they are numbered 0, 1, 2, 3 - just keep that in mind! (Btw: NAS01 is just the name of the NAS system - on yours it will most likely be another name, of course.)
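
To make the two numbering schemes easier to keep apart, here is the mapping for my setup as a small cheat sheet (this is just my constellation - with a different number of disks or a changed disk order it will of course look different):

Code: Select all

Physical slot 1  ->  /dev/sda  ->  mdadm raid slot (RaidDevice) 0
Physical slot 2  ->  /dev/sdb  ->  mdadm raid slot (RaidDevice) 1
Physical slot 3  ->  /dev/sdc  ->  mdadm raid slot (RaidDevice) 2
Physical slot 4  ->  /dev/sdd  ->  mdadm raid slot (RaidDevice) 3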

I also checked the output of various files in /etc/space to make sure I did not change the disk order by accident (I panicked a little that day...). I also checked some logs. It seems that on every boot the DiskStation tries to stop /dev/md2 and wants to do a forced assemble, but fails because it can't stop /dev/md2 (why? I don't know...).
As I said, panic is the worst thing... Write down every step you take and make photos and videos, so you can remember things correctly! And for your peace of mind: if you changed the disk order by accident, it's not that big an issue at all. I tried this on a test case as well and could get everything back. mdadm is really robust about such things. Just keep reading. :)

I ran a few mdadm commands to check what was going on:
root@NAS01:/dev# cat /proc/mdstat
This shows the raid information. As you can see, there are three raids: md0, md1 and md2. md0 and md1 are just system partitions on Synology systems (md0 is normally the system partition and md1 the swap partition). The following numbers (md2, md3 and so on) contain the raids you created and therefore your actual data. Note that md0, md1, md2, md3 and so on are just names for the raids. It could be that md0 is your data and md2 your system partition, for example. It's not likely on Synology systems, but you'll never know if you don't check. Again: test as much as you can and gather as much information as possible before you do anything! There are several ways to find out which raid contains your data; I will explain them in the next post.

In the output you can see U and _ characters. A U means this device is still in your raid and up. A _ means this device is missing from your raid (which does not have to mean that your disk is dead!). Above these blocks you can see which devices are in the raid. For example, on md2 you see that sdc3 (/dev/sdc3) and sdd3 (/dev/sdd3) are still present in slots 2 and 3. But slots 0 and 1 are missing; you can tell from the two leading _ and from the fact that the raid starts counting at 0 when it is created. Note that the slot number in the raid doesn't have to match the physical disk order in your device! So slot 0 in the raid doesn't have to be the first (1) physical disk in your device; it could also be the second (2) or the fifth or whatever. Normally Synology devices use physical disk slot 1 for raid slot 0, physical disk slot 2 for raid slot 1 and so on, and also name the devices like that (first disk is /dev/sda, second disk is /dev/sdb etc.), but keep in mind that it isn't always like this!

Another thing you can see is that md0 and md1 have five devices (UU_ _ _) - but I only have 4 disks in my case :?: This is shown because Synology creates the system and swap partitions on all disks, whether they exist or not. It's done like this so that these raids can be expanded onto every new disk you put into your enclosure, to always provide a working DSM regardless of which disk fails. Note also that md2 is raid5 and md1 and md0 are raid1. As I had a raid 5 with 4 disks, I know at this point that md2 is the raid which contains my data and that it is missing two devices (0 and 1) - I assume these are /dev/sda3 and /dev/sdb3, but at this point I still can't be sure! So let's check it out :wink:

Code: Select all

Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md2 : active raid5 sdc3[2] sdd3[3]
      11706589632 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/2] [__UU]

md1 : active raid1 sdc2[0] sdd2[1]
      2097088 blocks [5/2] [UU___]

md0 : active raid1 sda1[0] sdb1[1]
      2490176 blocks [5/2] [UU___]

unused devices: <none>
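
To make the notation a bit easier to read, here is the md2 block from above again, just with comments added by me (same data as above, nothing new):

Code: Select all

md2 : active raid5 sdc3[2] sdd3[3]
      11706589632 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/2] [__UU]

# sdc3[2] sdd3[3] -> only the members in raid slots 2 and 3 are still assembled
# [4/2]           -> the array is configured for 4 devices, but only 2 are active
# [__UU]          -> slots 0 and 1 are missing (_), slots 2 and 3 are up (U)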

root@NAS01:/dev# mdadm --detail /dev/md0
With this command you can get more details about the existing raids. You can see here that, as I mentioned before, this raid indeed contains 5 devices, although I only have 4 disks. The funny thing is that the Synology system raids are always degraded (see State for md0 and md1) if you don't use every physical slot :D

Code: Select all

/dev/md0:
        Version : 0.90
  Creation Time : Sat Nov 18 00:20:00 2017
     Raid Level : raid1
     Array Size : 2490176 (2.37 GiB 2.55 GB)
  Used Dev Size : 2490176 (2.37 GiB 2.55 GB)
   Raid Devices : 5
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Thu Nov 23 18:30:08 2017
          State : clean, degraded
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : 5c6bb801:f95aa80d:3017a5a8:c86610be
         Events : 0.2264

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       2       0        0        2      removed
       3       0        0        3      removed
       4       0        0        4      removed

root@NAS01:/dev# mdadm --detail /dev/md1

Code: Select all

/dev/md1:
        Version : 0.90
  Creation Time : Sat Nov 18 01:11:06 2017
     Raid Level : raid1
     Array Size : 2097088 (2048.28 MiB 2147.42 MB)
  Used Dev Size : 2097088 (2048.28 MiB 2147.42 MB)
   Raid Devices : 5
  Total Devices : 2
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Thu Nov 23 18:27:25 2017
          State : clean, degraded
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : d9a4606d:3e753320:8467a421:84d4f733 (local to host NAS01)
         Events : 0.30

    Number   Major   Minor   RaidDevice State
       0       8       34        0      active sync   /dev/sdc2
       1       8       50        1      active sync   /dev/sdd2
       2       0        0        2      removed
       3       0        0        3      removed
       4       0        0        4      removed

root@NAS01:/dev# mdadm --detail /dev/md2
So let me explain a bit more using the output of md2, since we don't really care about md0 and md1. What you can see is the number of devices the raid normally contains (Raid Devices : 4) and the total devices which are in the raid at the moment (Total Devices : 2). You can compare this with the output of the previous command (cat /proc/mdstat). Another important thing is that Persistence shows: Superblock is persistent. If it doesn't, there are still ways to recover from that problem, so don't lose hope and keep reading. Basically, Linux raid reserves a bit of space (called a superblock) on each component device. This space holds metadata about the RAID device and allows correct assembly of the array. As this is a raid 5 with two missing disks, the state of the raid is in fact FAILED (see State). You can also see when the raid was created, the version of the superblock format, etc. If you need more information, take a look here: https://github.com/tinganho/linux-kerne ... ion/md.txt

Code: Select all

/dev/md2:
        Version : 1.2
  Creation Time : Sat May 21 17:13:58 2016
     Raid Level : raid5
     Array Size : 11706589632 (11164.27 GiB 11987.55 GB)
  Used Dev Size : 3902196544 (3721.42 GiB 3995.85 GB)
   Raid Devices : 4
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Thu Nov 23 18:27:43 2017
          State : clean, FAILED
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : NAS01:2  (local to host NAS01)
           UUID : eadf7c1e:9c3e4ac3:3f35ab65:118a3497
         Events : 156

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       0        0        1      removed
       2       8       35        2      active sync   /dev/sdc3
       3       8       51        3      active sync   /dev/sdd3
 

root@NAS01:/dev# fdisk -l
So now it gets interesting, because here you can see what actually happened to my raid. This command lists all partitions of all disks. Let's take /dev/sdd for example. On /dev/sdd you can see three existing partitions: /dev/sdd1, /dev/sdd2 and /dev/sdd3, with their sizes. Let's be honest, it's really obvious that /dev/sdd3 is part of my raid 5, because of the size (3.6 TB × 4 disks = 14.4 TB; minus one disk for raid 5 leaves 10.8 TB, which is my volume size). But don't identify your raid just because it seems obvious. Compare the output with the output of the commands cat /proc/mdstat and mdadm --detail /dev/mdX. From those commands we see which partitions are in the raids:
  • For md0 it is /dev/sda1 and /dev/sdb1
  • For md1 it is /dev/sdc2 and /dev/sdd2
  • For md2 it is /dev/sdc3 and /dev/sdd3


If we compare this with the output here, we can see that on /dev/sda the partition /dev/sda2 is missing and on /dev/sdb the partition /dev/sdb2 is missing, and therefore both are missing from md1. Another thing you should notice is what I mentioned above: Synology normally works like this: /dev/sdX1 is in md0, /dev/sdX2 is in md1 and /dev/sdX3 is in md2 - or, if you have more raids, /dev/sdX4 would be in md3 and so on. As I said, you can never be sure, but it clearly looks like that, so let's take it as a working assumption and go on. This explains why md1 is missing two devices which should be there: /dev/sda2 and /dev/sdb2 simply don't exist anymore. So the error "system partition failed", which occurred on the reinstall, was at least correct - in fact two parts of the swap raid are gone :?: I still don't know why this happened, but I can only imagine that the CPU has something to do with it... Recently Intel admitted that the Intel C2000 series has failures. While other vendors took back products still under warranty, Synology just extended the support of the affected devices - see the announcement: https://www.synology.com/en-global/comp ... %20Updatet. I had enabled automatic updates on my case and could see in the logs that an update was made the day before my case became inaccessible. Either that update failed and deleted the two partitions, or my reinstall did... I still don't know, but I will contact Synology for a replacement. My trust in this case is gone, as this should not happen, and I also tested my first disk (which is in fact /dev/sda) with an extended S.M.A.R.T. test, which completed with no errors (I also analyzed the raw values).

OK, back to topic. If everything is OK, you should see that the partitions of your crashed (failed) raid are still there. As we presume that everything ending in 3 belongs to /dev/md2, and we find them all in the output (/dev/sda3, /dev/sdb3, /dev/sdc3 and /dev/sdd3), we know that the partitions of the raid are still there - so my reinstall didn't wipe my data partitions. (As I said before, take photos. I panicked and wasn't really sure which errors had been shown, and was afraid the reinstall had wiped everything :D; this step was a really good point to take a deep breath ^^)

Code: Select all

Disk /dev/sdd: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 2F83CF67-C568-4D90-8884-75ACFBA7954D

Device       Start        End    Sectors  Size Type
/dev/sdd1     2048    4982527    4980480  2.4G Linux RAID
/dev/sdd2  4982528    9176831    4194304    2G Linux RAID
/dev/sdd3  9437184 7813832351 7804395168  3.6T Linux RAID


Disk /dev/sda: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 6E557084-F62E-4FB3-AC85-1A88A1D150DE

Device       Start        End    Sectors  Size Type
/dev/sda1     2048    4982527    4980480  2.4G Linux RAID
/dev/sda3  9437184 7813832351 7804395168  3.6T Linux RAID


Disk /dev/sdb: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 2BAD3013-6B28-4528-8EE9-36FF2DEC9B68

Device       Start        End    Sectors  Size Type
/dev/sdb1     2048    4982527    4980480  2.4G Linux RAID
/dev/sdb3  9437184 7813832351 7804395168  3.6T Linux RAID


Disk /dev/sdc: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 3A38788A-4BE1-4CAC-BBE1-A135FF1AFF6C

Device       Start        End    Sectors  Size Type
/dev/sdc1     2048    4982527    4980480  2.4G Linux RAID
/dev/sdc2  4982528    9176831    4194304    2G Linux RAID
/dev/sdc3  9437184 7813832351 7804395168  3.6T Linux RAID


Disk /dev/md0: 2.4 GiB, 2549940224 bytes, 4980352 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/md1: 2 GiB, 2147418112 bytes, 4194176 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/zram0: 2.4 GiB, 2522873856 bytes, 615936 sectors
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/zram1: 2.4 GiB, 2522873856 bytes, 615936 sectors
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/zram2: 2.4 GiB, 2522873856 bytes, 615936 sectors
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/zram3: 2.4 GiB, 2522873856 bytes, 615936 sectors
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/synoboot: 120 MiB, 125829120 bytes, 245760 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xfd900657

Device         Boot Start    End Sectors  Size Id Type
/dev/synoboot1 *       63  32129   32067 15.7M 83 Linux
/dev/synoboot2      32130 224909  192780 94.1M 83 Linux
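
As a small convenience, you can also filter the fdisk output so that only the raid member partitions are shown (assuming grep is available, which it normally is on DSM):

Code: Select all

fdisk -l /dev/sd[abcd] | grep "Linux RAID"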

root@NAS01:/dev# mdadm --examine /dev/sd[abcd]1

Code: Select all

/dev/sda1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 5c6bb801:f95aa80d:3017a5a8:c86610be
  Creation Time : Sat Nov 18 00:20:00 2017
     Raid Level : raid1
  Used Dev Size : 2490176 (2.37 GiB 2.55 GB)
     Array Size : 2490176 (2.37 GiB 2.55 GB)
   Raid Devices : 5
  Total Devices : 2
Preferred Minor : 0

    Update Time : Thu Nov 23 18:29:58 2017
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 3
  Spare Devices : 0
       Checksum : abbbeb63 - correct
         Events : 2258


      Number   Major   Minor   RaidDevice State
this     0       8        1        0      active sync   /dev/sda1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       17        1      active sync   /dev/sdb1
   2     2       0        0        2      faulty removed
   3     3       0        0        3      faulty removed
   4     4       0        0        4      faulty removed


/dev/sdb1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 5c6bb801:f95aa80d:3017a5a8:c86610be
  Creation Time : Sat Nov 18 00:20:00 2017
     Raid Level : raid1
  Used Dev Size : 2490176 (2.37 GiB 2.55 GB)
     Array Size : 2490176 (2.37 GiB 2.55 GB)
   Raid Devices : 5
  Total Devices : 2
Preferred Minor : 0

    Update Time : Thu Nov 23 18:29:58 2017
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 3
  Spare Devices : 0
       Checksum : abbbeb75 - correct
         Events : 2258


      Number   Major   Minor   RaidDevice State
this     1       8       17        1      active sync   /dev/sdb1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       17        1      active sync   /dev/sdb1
   2     2       0        0        2      faulty removed
   3     3       0        0        3      faulty removed
   4     4       0        0        4      faulty removed


/dev/sdc1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : c07f7981:9727785b:3017a5a8:c86610be
  Creation Time : Sat Nov 18 00:28:02 2017
     Raid Level : raid1
  Used Dev Size : 2490176 (2.37 GiB 2.55 GB)
     Array Size : 2490176 (2.37 GiB 2.55 GB)
   Raid Devices : 5
  Total Devices : 2
Preferred Minor : 0

    Update Time : Sat Nov 18 00:43:38 2017
          State : active
 Active Devices : 2
Working Devices : 2
 Failed Devices : 3
  Spare Devices : 0
       Checksum : ad94f06a - correct
         Events : 4936


      Number   Major   Minor   RaidDevice State
this     0       8       33        0      active sync   /dev/sdc1

   0     0       8       33        0      active sync   /dev/sdc1
   1     1       8       49        1      active sync   /dev/sdd1
   2     2       0        0        2      faulty removed
   3     3       0        0        3      faulty removed
   4     4       0        0        4      faulty removed
   

/dev/sdd1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : c07f7981:9727785b:3017a5a8:c86610be
  Creation Time : Sat Nov 18 00:28:02 2017
     Raid Level : raid1
  Used Dev Size : 2490176 (2.37 GiB 2.55 GB)
     Array Size : 2490176 (2.37 GiB 2.55 GB)
   Raid Devices : 5
  Total Devices : 2
Preferred Minor : 0

    Update Time : Sat Nov 18 00:43:40 2017
          State : active
 Active Devices : 2
Working Devices : 2
 Failed Devices : 3
  Spare Devices : 0
       Checksum : ad94f07f - correct
         Events : 4937


      Number   Major   Minor   RaidDevice State
this     1       8       49        1      active sync   /dev/sdd1

   0     0       8       33        0      active sync   /dev/sdc1
   1     1       8       49        1      active sync   /dev/sdd1
   2     2       0        0        2      faulty removed
   3     3       0        0        3      faulty removed
   4     4       0        0        4      faulty removed



root@NAS01:/dev# mdadm --examine /dev/sd[abcd]2

Code: Select all

/dev/sdc2:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : d9a4606d:3e753320:8467a421:84d4f733 (local to host NAS01)
  Creation Time : Sat Nov 18 01:11:06 2017
     Raid Level : raid1
  Used Dev Size : 2097088 (2048.28 MiB 2147.42 MB)
     Array Size : 2097088 (2048.28 MiB 2147.42 MB)
   Raid Devices : 5
  Total Devices : 2
Preferred Minor : 1

    Update Time : Thu Nov 23 18:27:25 2017
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 3
  Spare Devices : 0
       Checksum : 7ec7fead - correct
         Events : 30


      Number   Major   Minor   RaidDevice State
this     0       8       34        0      active sync   /dev/sdc2

   0     0       8       34        0      active sync   /dev/sdc2
   1     1       8       50        1      active sync   /dev/sdd2
   2     2       0        0        2      faulty removed
   3     3       0        0        3      faulty removed
   4     4       0        0        4      faulty removed
   
 
/dev/sdd2:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : d9a4606d:3e753320:8467a421:84d4f733 (local to host NAS01)
  Creation Time : Sat Nov 18 01:11:06 2017
     Raid Level : raid1
  Used Dev Size : 2097088 (2048.28 MiB 2147.42 MB)
     Array Size : 2097088 (2048.28 MiB 2147.42 MB)
   Raid Devices : 5
  Total Devices : 2
Preferred Minor : 1

    Update Time : Thu Nov 23 18:27:25 2017
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 3
  Spare Devices : 0
       Checksum : 7ec7febf - correct
         Events : 30


      Number   Major   Minor   RaidDevice State
this     1       8       50        1      active sync   /dev/sdd2

   0     0       8       34        0      active sync   /dev/sdc2
   1     1       8       50        1      active sync   /dev/sdd2
   2     2       0        0        2      faulty removed
   3     3       0        0        3      faulty removed
   4     4       0        0        4      faulty removed

root@NAS01:/dev# mdadm --examine /dev/sd[abcd]3
OK, just a few words about this command. The output shows information about the partitions /dev/sda3, /dev/sdb3, /dev/sdc3 and /dev/sdd3. The commands before are the equivalent for the partitions ending in 1 and 2. The key information here is the Device UUID and the Events (event count). The UUID (Universally Unique Identifier) is a randomly generated 128-bit number which identifies the device here. The event count basically counts every major event that happens to the whole raid. When you boot up your NAS, mdadm "reassembles" your raid, and during this process the event count of all member disks is incremented (normally it is current events + 1) - more information: https://raid.wiki.kernel.org/index.php/Event. If one disk differs from the others, mdadm will not automatically include that disk in the raid. And this is exactly what is happening here on every boot. As you can see, the event counts of /dev/sda3 and /dev/sdb3 are lower than the others. So mdadm doesn't reassemble them on boot, which leads to the crashed volume. Why does it work like that? Imagine one disk of four had failed and was thrown out of the raid 5. As it is then a raid 5 with one lost disk, you could still access and change the data on the raid, because it is just degraded. If you now changed a lot of data and later tried to assemble the lost disk back in, it would likely lead to data loss, because the data on the thrown-out disk is outdated... This is why in that case you do a rebuild and not an assemble. In my case the event counts are:
  • /dev/sda3 -> 144
  • /dev/sdb3 -> 144
  • /dev/sdc3 -> 156
  • /dev/sdd3 -> 156


Note also that the output shows the raid level of the partition. As we know that md0 and md1 are raid1 and md2 is raid5, this is another indication that /dev/sda3, /dev/sdb3, /dev/sdc3 and /dev/sdd3 are in fact my crashed raid 5.

Code: Select all

/dev/sda3:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : eadf7c1e:9c3e4ac3:3f35ab65:118a3497
           Name : NAS01:2  (local to host NAS01)
  Creation Time : Sat May 21 17:13:58 2016
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 7804393120 (3721.42 GiB 3995.85 GB)
     Array Size : 23413179264 (11164.27 GiB 11987.55 GB)
  Used Dev Size : 7804393088 (3721.42 GiB 3995.85 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 5a742e05:cd4dc858:cc89ff46:afe66cbc

    Update Time : Fri Nov 17 05:48:35 2017
       Checksum : 6caaf75c - correct
         Events : 144

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 0
   Array State : AAAA ('A' == active, '.' == missing)
/dev/sdb3:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : eadf7c1e:9c3e4ac3:3f35ab65:118a3497
           Name : NAS01:2  (local to host NAS01)
  Creation Time : Sat May 21 17:13:58 2016
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 7804393120 (3721.42 GiB 3995.85 GB)
     Array Size : 23413179264 (11164.27 GiB 11987.55 GB)
  Used Dev Size : 7804393088 (3721.42 GiB 3995.85 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 808a7ce1:02b4f1bd:e1e366ff:0a4cd1a8

    Update Time : Fri Nov 17 05:48:35 2017
       Checksum : 52ee333a - correct
         Events : 144

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 1
   Array State : AAAA ('A' == active, '.' == missing)
   
 
/dev/sdc3:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : eadf7c1e:9c3e4ac3:3f35ab65:118a3497
           Name : NAS01:2  (local to host NAS01)
  Creation Time : Sat May 21 17:13:58 2016
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 7804393120 (3721.42 GiB 3995.85 GB)
     Array Size : 23413179264 (11164.27 GiB 11987.55 GB)
  Used Dev Size : 7804393088 (3721.42 GiB 3995.85 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 91949959:f4bafdcb:5b756116:9ae3f29e

    Update Time : Thu Nov 23 18:27:43 2017
       Checksum : e63a082d - correct
         Events : 156

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 2
   Array State : ..AA ('A' == active, '.' == missing)
 

/dev/sdd3:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : eadf7c1e:9c3e4ac3:3f35ab65:118a3497
           Name : NAS01:2  (local to host NAS01)
  Creation Time : Sat May 21 17:13:58 2016
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 7804393120 (3721.42 GiB 3995.85 GB)
     Array Size : 23413179264 (11164.27 GiB 11987.55 GB)
  Used Dev Size : 7804393088 (3721.42 GiB 3995.85 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 1626944f:cc7b07bc:a62141ac:3024989c

    Update Time : Thu Nov 23 18:27:43 2017
       Checksum : 5fc3476d - correct
         Events : 156

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 3
   Array State : ..AA ('A' == active, '.' == missing)

I read about how to get the disks back into the array and thought about running these commands:
mdadm --stop /dev/md2
mdadm --assemble --run /dev/md2 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3 -v

My problem is that I don't know whether the order of the disks matters in the last command, because I don't know if /dev/sda3 or /dev/sdb3 should be in the first or second position. Furthermore, I don't know for certain whether the two missing members really are /dev/sda3 and /dev/sdb3 - I think it can only be these two? I am also scared now that the reinstall process simply wiped my data partitions on disk1 and disk2. Could that be possible? Or is that not possible, because /dev/sda3 and /dev/sdb3 still exist on these disks? So if I ran it with --force, would I kill my last hope, which would be a data rescue company (if I'm right about that)?
Yes, this is really a problem. You should know the correct order! Well, mdadm is very robust, as I said, and when you reassemble the raid it will check which disk was in which logical slot of the raid, no matter where the disk is or was physically (CAUTION! THAT ONLY APPLIES TO AN --ASSEMBLE). But I would not recommend just counting on that robustness and switching off your brain. I tested it twice on another case and it went well, but a bird in the hand is worth two in the bush. So check the correct disk order beforehand - I will explain how in the next post.

More problematic is that the event counts differ:
/dev/sda3 144
/dev/sdb3 144
/dev/sdc3 156
/dev/sdd3 156

I heard that if I run it without --force, it will fail because of a "possibly out-of-date" issue. But I think the event count only differs because of the boots I did (about 4, after step 6 I think). I also read that running this command with --force can be dangerous..
Command with force:
mdadm --assemble --force --run /dev/md2 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3 -v
Yes, this is true: it would fail with a "possibly out of date" error on /dev/sda3 and /dev/sdb3 and would start the raid as crashed with only /dev/sdc3 & /dev/sdd3. It would not hurt, but nothing would be achieved either - this is exactly what happened on every boot. By the way, I assumed correctly: the event count was increased by 2 on /dev/sdc3 and /dev/sdd3 on every boot - this is why the event count differs between my partitions.
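By the way, a quick way to compare the event counts of all members in one go, instead of scrolling through the full --examine output, is to filter it (assuming grep -E is available on your box, which it usually is):

Code: Select all

mdadm --examine /dev/sd[abcd]3 | grep -E "/dev/sd|Events"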

So what is my best option? I don't think my disks failed, because a double failure at exactly the same time is very unlikely. A little apart, yes, but not at the same time on a reboot... But if they do have a failure, I could break even more by messing around... So I'm asking myself whether I should clone one of the failed disks, or every disk...
I will explain in the next post what I did and what you should do.

I appreciate any help.
Thank you in advance!

So there we are, I hope you could follow me; if not, just ask :). I think it is really important that you know what you are doing and don't blindly run commands recommended somewhere on the internet. This is why I write in such detail. So research the web and take the time you need. I know it is hard when there is valuable data on the raid and you don't know whether you'll get it back. But as I said in the beginning: KEEP CALM! Every step you take could be the step that leads to full data loss. If you do nothing, you can't destroy anything. So think and research before running the commands. Make sure you understood what I or others did and are trying to explain before you act. And if you have another case to test with, even better.

Ok so let's move on and let me explain in steps what I did to get all back. :)
Last edited by Galaxy on Thu Dec 28, 2017 5:05 pm, edited 9 times in total.

Galaxy
I'm New!
Posts: 8
Joined: Thu Nov 23, 2017 10:47 pm

Re: [SOLVED] How to Recover from raid 5 with two failed disks - volume crashed after update (raid 5)

Unread post by Galaxy » Wed Dec 27, 2017 5:03 pm

  1. First off: whenever something like this happens, you can never be sure whether it is a hardware defect in the case or the disks, or just a software issue like mine was. So check this first:
    1. First, I took out my disks and put a number on each of them (from 1 up to 4), so I can't mix them up by accident. It's better to do that in advance, when you put your NAS into operation for the first time. So if you haven't done it yet, do it now.
    2. After that I checked every disk for noises. I started the disks one by one in a quickport like this one: http://en.sharkoon.com/category/storage ... tions.aspx, but you could also connect them to your computer. The quickport has the advantage that there are no other noises which could mislead you. Just listen for scratching or something like that - noises you normally wouldn't expect. If you aren't sure, watch videos on YouTube or try other disks for comparison. They all sounded alright to me.

      There are several recommendations on the internet to test the disks with an extended S.M.A.R.T. test. But I would not recommend that when the disk sounds alright. If a disk does in fact have errors and you test every disk, this will probably only cause more errors. Why should you test a disk that may already be defective? If it is, the test will not change anything about that; furthermore, it will increase the probability of causing more errors on the disk. The test on a 4 TB disk, for example, takes around 10 hours, while the commands you run to get logs or information, and even the assemble, only take minutes. So if your disk isn't making any strange noises and you don't have recorded SMART values which show that the disk is nearly dead, don't rush into a test. However, if you do have old SMART values which indicate that a disk will die, your best option is to clone the disk, or, if the data is worth the money, go to a data rescue company and let them clone the disk for you. Then you can try to rescue your data from the cloned disk.
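
      If you just want to look at the SMART values that were already recorded (without starting a new test), a quick read-only check could look like the sketch below - /dev/sdX is a placeholder for the disk you want to look at (e.g. /dev/sda):

      Code: Select all

      smartctl -A /dev/sdX           # print the recorded SMART attributes (including raw values)
      smartctl -l selftest /dev/sdX  # show the log of previously run self-tests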
  2. After that, test whether your case has an issue. Steps for a hardware test are described here: https://forum.synology.com/enu/viewtopi ... 15#p472440 All OK? Then go on; otherwise raise a ticket with Synology support.
  3. Once we have checked that the case and the disks seem to be alright, we can begin to gather some information about what happened. The best thing is to write down all the commands you want to run beforehand (e.g. in a notepad on your computer). Then you can start the case, connect via ssh, paste the commands in one go and stop the case immediately after that. Afterwards you can analyze the output and think about your next steps. You should do it like this because, if there is an issue with the disks after all, it's best not to start them very often or keep them running for a long time.

    At a minimum, you should do the following to get some information about what is going on:
    • Run the following commands via ssh:
      Note that before you can run any command, you need to activate ssh on your case, if you haven't already: https://www.synology.com/en-global/know ... m_terminal. How to log in via ssh and get root permissions is described here: https://www.synology.com/en-global/know ... SSH_Telnet

      These commands are specific to my number of disks and partitions. If you have more disks or partitions, run them for those as well (e.g. for /dev/sde or /dev/sdb4 etc.)

      Code: Select all

      mdadm --examine /dev/sd[abcd]1
      mdadm --examine /dev/sd[abcd]2
      mdadm --examine /dev/sd[abcd]3
      
      smartctl --xall /dev/sda
      smartctl --xall /dev/sdb
      smartctl --xall /dev/sdc
      smartctl --xall /dev/sdd
      
      mdadm --detail /dev/md0
      mdadm --detail /dev/md1
      mdadm --detail /dev/md2
      
      fdisk -l /dev/sd[abcd]
      
      cat /proc/mdstat
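
      If you want to collect all of this output in one go and analyze it later with the case shut down (as described above), you can also redirect everything into a single file - this is just a sketch, and the target path /root/diag.txt is only an example (don't write anything to the crashed volume):

      Code: Select all

      {
          cat /proc/mdstat
          mdadm --detail /dev/md0
          mdadm --detail /dev/md1
          mdadm --detail /dev/md2
          mdadm --examine /dev/sd[abcd]1
          mdadm --examine /dev/sd[abcd]2
          mdadm --examine /dev/sd[abcd]3
          smartctl --xall /dev/sda
          smartctl --xall /dev/sdb
          smartctl --xall /dev/sdc
          smartctl --xall /dev/sdd
          fdisk -l /dev/sd[abcd]
      } > /root/diag.txt 2>&1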
      
      Also go into the folder /etc/space. There should be files named space_history_<date>_<time>.xml, like this one: space_history_20171203_141741.xml. To see the content, just put cat in front of them.
      These files are essential for reconstructing the disk order you had at the beginning. They contain information about how the raids were assembled at the time given in the file name (amongst other things, the serial numbers of the physical disks, the logical slots of the partitions in the raid, and the device names (e.g. /dev/sda3)). Get the content of some older files and a current one. They show how the raid was assembled before and how it is assembled now.

      Code: Select all

      cd /etc/space
      
      cat space_history_20161212_054349.xml
      cat space_history_20170101_103420.xml
      cat space_history_20170616_054728.xml
      
      Example outputs from mine:
      You can see that the raid 5 which contains the data is indeed, as assumed, built out of the partitions /dev/sda3, /dev/sdb3, /dev/sdc3 and /dev/sdd3. Furthermore, you can see the serial of each disk and which device name was given to each partition (remember that /dev/sda doesn't have to be the same physical disk if you change the physical disk order in the case). And of course you can see the logical slots of the devices in the raid. So if we now compare an older output with a more recent one, we can see that the disks with the serials 123C and 123D have the same device names and are in the same slots as before. So these disks are fine. Now we have to check whether this is also true for the two lost disks.

      We can get this information by analyzing the commands we ran above. In the output of smartctl --xall /dev/sdX there is a line "Serial Number:". So for every device name we can get the serial number and compare it with an older xml. For example: take the "Serial Number:" from the command smartctl --xall /dev/sda and check whether this number matches the serial listed for the device name /dev/sda in an older xml file. If it's the same, you know that the disk is correctly named, most likely in the correct physical slot, and, more importantly, which logical slot it was and should be in in the raid config (for /dev/sda it is slot 0 in my case).
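
      Here is a small sketch of how this comparison can be done directly on the command line - the xml file name is just an example from my box, use one of yours:

      Code: Select all

      # the serial number reported by the disk that is currently /dev/sda
      smartctl -i /dev/sda | grep -i "serial"

      # device names, serial numbers and raid slots recorded in an older space_history file
      grep "serial=" /etc/space/space_history_20161212_054349.xml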

      You can most likely be sure that the disk is also in the correct physical slot, because normally the first physical disk is slot 0 in the raid and named /dev/sda, the second is slot 1 and named /dev/sdb and so on (I mentioned that in my previous post). However, if you see, for example, that the "Serial Number:" from the command smartctl --xall /dev/sda shows the serial number of /dev/sdb from an older xml file, and vice versa, then you most likely swapped the disks /dev/sda and /dev/sdb. You can then change the physical disk order so that it matches again (check with the same commands afterwards whether it now fits). Anyway, as I said, mdadm is very robust and should recognize every disk correctly and therefore assemble them in the correct order, because it uses the UUID instead of the device name. So the physical order isn't that critical when you just want to assemble the raid. I have to put a big :!: here, because I would not stake my life on that! I tested it twice on my test case with differently mixed orders and it worked, but I can't promise that it would work with another version of mdadm or DSM. So it's better to restore the correct physical disk order if you mixed it up, or, even better, never mix up the order and number your disks (step one :wink:)

      If everything fits we can go on.

      Sidenote: I changed the original serial numbers to 123A, 123B and so on.
      • Older xml with active raid

        Code: Select all

        <?xml version="1.0" encoding="UTF-8"?>
        <spaces>
            <space path="/dev/md2" reference="/volume1" uuid="eadf7c1e:9c3e4ac3:3f35ab65:118a3497" device_type="2" drive_type="0" container_type="2" limited_raidgroup_num="12" >
                <device>
                    <raid path="/dev/md2" uuid="eadf7c1e:9c3e4ac3:3f35ab65:118a3497" level="raid5" version="1.2">
                        <disks>
                            <disk status="normal" dev_path="/dev/sda3" model="WD40EFRX-68WT0N0        " serial="123A" partition_version="8" partition_start="9437184" partition_size="7804395168" slot="0">
                            </disk>
                            <disk status="normal" dev_path="/dev/sdb3" model="WD40EFRX-68WT0N0        " serial="123B" partition_version="8" partition_start="9437184" partition_size="7804395168" slot="1">
                            </disk>
                            <disk status="normal" dev_path="/dev/sdc3" model="WD40EFRX-68WT0N0        " serial="123C" partition_version="8" partition_start="9437184" partition_size="7804395168" slot="2">
                            </disk>
                            <disk status="normal" dev_path="/dev/sdd3" model="WD40EFRX-68WT0N0        " serial="123D" partition_version="8" partition_start="9437184" partition_size="7804395168" slot="3">
                            </disk>
                        </disks>
                    </raid>
                </device>
                <reference>
                    <volume path="/volume1" dev_path="/dev/md2" uuid="eadf7c1e:9c3e4ac3:3f35ab65:118a3497" type="btrfs">
                    </volume>
                </reference>
            </space>
        </spaces>
        
      • xml shortly after the raid failed (volume crashed)

        Code: Select all

        <?xml version="1.0" encoding="UTF-8"?>
        <spaces>
            <space path="/dev/md2" reference="/volume1" uuid="eadf7c1e:9c3e4ac3:3f35ab65:118a3497" device_type="2" drive_type="0" container_type="2" limited_raidgroup_num="12" >
                <device>
                    <raid path="/dev/md2" uuid="eadf7c1e:9c3e4ac3:3f35ab65:118a3497" level="raid5" version="1.2">
                        <disks>
                            <disk status="normal" dev_path="/dev/sdc3" model="WD40EFRX-68WT0N0        " serial="123C" partition_version="8" partition_start="9437184" partition_size="7804395168" slot="2">
                            </disk>
                            <disk status="normal" dev_path="/dev/sdd3" model="WD40EFRX-68WT0N0        " serial="123D" partition_version="8" partition_start="9437184" partition_size="7804395168" slot="3">
                            </disk>
                        </disks>
                    </raid>
                </device>
                <reference>
                    <volume path="/volume1" dev_path="/dev/md2" uuid="eadf7c1e:9c3e4ac3:3f35ab65:118a3497">
                    </volume>
                </reference>
            </space>
        </spaces>
        
      Don't forget to shut down your case after running the commands. You can analyze the output in peace while the case is shut down. :wink:
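      If you are still logged in via ssh, you can also shut down directly from the command line (alternatively just use the power button or the DSM web interface) - a sketch, assuming the usual Linux commands are present on your DSM version:

      Code: Select all

      poweroff        # or: shutdown -h now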
  4. Now, after you have analyzed what happened based on the output, you can rethink what you want to do next. If you have a case like mine, where two disks of your raid 5 got lost at the same time and their event count doesn't differ too much from the event counts of the disks which are still in the raid, your chances are very good of getting your raid back online just by assembling the disks.

    To assemble the raid you have to follow these steps:
    1. Start the case and log in via ssh.
    2. Run some of the commands you ran before to check whether the disk order still fits.
    3. Stop the raid (run mdadm --stop followed by the name of the raid, in my case /dev/md2):

      Code: Select all

      mdadm --stop /dev/md2
    4. Assemble the raid. Run the command mdadm --assemble --run, followed by the name of the raid (in my case /dev/md2) and every device name (partition) your raid contains - in my case /dev/sda3, /dev/sdb3, /dev/sdc3 and /dev/sdd3. The order doesn't matter when you run the assemble command (I tried this on my test case), but I would recommend running it in the correct order according to the xml files (device name of slot 0 first, then device name of slot 1 and so on). The -v at the end means verbose and will just print more information while the command runs.
      In my case the command is this one:

      Code: Select all

      mdadm --assemble --run /dev/md2 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3 -v
      Because of the different event counts of my disks, the command will fail - well, not fail, but my raid will only start with two disks (/dev/sdc3 & /dev/sdd3). The first two (/dev/sda3 & /dev/sdb3) have an older event count (144) than the two (/dev/sdc3 & /dev/sdd3) which were still in the raid after the crash (156). The good point is that you can now see how mdadm will try to assemble your raid and compare this with the output of your previous commands. For example: in my case, the serial number (123A) of /dev/sda3 in the xml file should match the "Serial Number:" in the output of smartctl --xall /dev/sda, and the slot number (0) in the xml should match the slot number in the assemble output (/dev/sda3 is identified as a member of /dev/md2, slot 0). The same goes for the other disks. If something doesn't fit, mdadm could assemble your raid in the wrong order if you force the assemble - no matter what, don't do that!
      Note that it is possible that the "(possibly out of date)" message isn't shown.

      Code: Select all

      mdadm: looking for devices for /dev/md2
      mdadm: /dev/sda3 is identified as a member of /dev/md2, slot 0.
      mdadm: /dev/sdb3 is identified as a member of /dev/md2, slot 1.
      mdadm: /dev/sdc3 is identified as a member of /dev/md2, slot 2.
      mdadm: /dev/sdd3 is identified as a member of /dev/md2, slot 3.
      mdadm: added /dev/sda3 to /dev/md2 as 0 (possibly out of date)
      mdadm: added /dev/sdb3 to /dev/md2 as 1 (possibly out of date)
      mdadm: added /dev/sdc3 to /dev/md2 as 2
      mdadm: added /dev/sdd3 to /dev/md2 as 3 
      mdadm: /dev/md2 has been started with 2 drives (out of 4).
    5. When you are sure that the command will assemble your raid in the correct order, you can now run it with --force:

      Code: Select all

      mdadm --assemble --force --run /dev/md2 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3 -v
      If everything goes well, it will succeed, assemble your raid and start it (you can see how the event count is forced up):
      Note that the sequence "mdadm: added x to y" in the output of the command can vary, but this doesn't matter.

      Code: Select all

      mdadm: looking for devices for /dev/md2
      mdadm: /dev/sda3 is identified as a member of /dev/md2, slot 0.
      mdadm: /dev/sdb3 is identified as a member of /dev/md2, slot 1.
      mdadm: /dev/sdc3 is identified as a member of /dev/md2, slot 2.
      mdadm: /dev/sdd3 is identified as a member of /dev/md2, slot 3.
      mdadm: forcing event count in /dev/sda3(0) from 144 upto 160
      mdadm: forcing event count in /dev/sdb3(1) from 144 upto 160
      mdadm: added /dev/sdb3 to /dev/md2 as 1
      mdadm: added /dev/sdc3 to /dev/md2 as 2
      mdadm: added /dev/sdd3 to /dev/md2 as 3
      mdadm: added /dev/sda3 to /dev/md2 as 0
      mdadm: /dev/md2 has been started with 4 drives.
    6. Hell yeah, if this goes well you are one - OK, two - steps away from your beloved data. Create a directory (e.g. name it recovery):

      Code: Select all

      mkdir /recovery
    7. And mount your raid (in my case I mounted /dev/md2 to the newly created directory /recovery). I would also recommend mounting it read-only, as I did (-o ro).

      Code: Select all

      mount -o ro /dev/md2 /recovery
    8. If you now take a look into the directory, your data should be there:

      Code: Select all

      ls -la /recovery
  5. Now just take a program like WinSCP, connect to your NAS and copy everything to other storage. After that you can decide what to do next. I would recommend checking all disks with extended SMART tests and reinstalling the NAS - but that's your choice 8)
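
    If you prefer to copy directly from the command line instead, a sketch with rsync could look like this (assuming rsync is available on the DiskStation and you have another ssh-reachable machine as the target - backupuser, backuphost and the path are only placeholders):

    Code: Select all

    rsync -avh --progress /recovery/ backupuser@backuphost:/path/to/rescue/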


    Some more information about my tests:

    Before I did anything, I tested all of this on a test case with other disks, and I could also reproduce the error I had. On my first two disks the swap partition was missing (/dev/md1 -> /dev/sda2 & /dev/sdb2), so I decided to delete them manually on the test case. I installed the test case from scratch with 4 disks and created a raid 5. After the consistency check I shut it down, removed the first disk (/dev/sda) and deleted its partition (/dev/sda2) with GParted. After that I reinserted it and, well, guess what happened: yes, the raid 5 (/dev/md2) was degraded. I then did the same for the second disk. And yes indeed, I got the "Configuration lost" issue in Synology Assistant and couldn't access the NAS. So I tried to reinstall. However, this time the reinstallation ran without any error, but on boot nothing had been reinstalled - it came up with "Configuration lost" again (I tried the reinstall 3 times, with the same result). So I checked disk1 and disk2 in GParted - well, both partitions were missing... So the reinstall simply could not recreate the partitions - the same issue as on my case... The only difference was that my case booted up after the reinstall failure instead of showing "Configuration lost" again. So I pulled disk1 and disk2 out, started the case, and after a few seconds put disk1 and disk2 back in. Now it booted up and, as expected, the raid crashed. For the final test, I installed the case from scratch again and deleted the (swap) partitions on the last two disks (disk3 and disk4) instead. Now the case started normally, but yes, as expected, the raid was crashed... This means that Synology cases always try to start DSM from the first disk, and if there is any issue with it, the case won't boot. It also means that if a system partition on a disk dies (I assume the same will happen if you delete parts of /dev/md0), the partitions of your volumes are also affected... Very strange :?: :shock: The only way I can explain the behaviour to myself is that the disk is thrown out of all raids when something happens to one array (e.g. /dev/md1), because it is assumed that something abnormal happened to the disk.

    I also tested a recreation of the raid after deleting the system partitions from disk3 and disk4. Well, the assemble worked there as well, but I wanted to test the recreation anyway. This link explains how to recreate the raid successfully if an assemble won't work (for example, if the superblock is missing <- what I mentioned in my previous post): https://unix.stackexchange.com/a/146944/208241 But keep in mind that recreating is dangerous. You should know what you are doing! And for the recreation the disk order is significant, especially in the mdadm --create command! The wrong order will cause data loss!
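    Purely for illustration, here is a rough sketch of what such a recreation could look like for my constellation, built from the parameters that mdadm --examine reported (raid5, 4 devices, 64K chunk, left-symmetric, metadata 1.2). This is a last resort: the level, chunk size, layout, metadata version, disk order and also the data offset (2048 sectors in my --examine output) must match the original exactly, ideally you work on cloned disks only, and if you are not 100% sure about any of these values, don't run it at all:

    Code: Select all

    # DANGER: --create rewrites the raid metadata. Wrong order or wrong parameters = data loss!
    mdadm --stop /dev/md2
    mdadm --create /dev/md2 --assume-clean --level=5 --raid-devices=4 \
          --chunk=64 --layout=left-symmetric --metadata=1.2 \
          /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3
    # Newer mdadm versions may choose a different data offset by default; compare the new
    # "mdadm --examine" output with your old values (some versions accept --data-offset).
    # Afterwards mount read-only and verify your data before doing anything else:
    mount -o ro /dev/md2 /recovery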

    Some useful links:

Galaxy
I'm New!
Posts: 8
Joined: Thu Nov 23, 2017 10:47 pm

Re: [SOLVED] How to recover data from raid 5 with two failed disks - volume crashed after DSM update

Unread post by Galaxy » Thu Dec 28, 2017 6:23 pm

So there we are, I hope this will help someone. If you have questions, suggestions or found mistakes, please post them so I can adjust the guide.

Last but not least, it has to be said again: RAID is no backup! So back up more often. I nearly learned that the hard way and would have lost some data valuable to me - not all, but some of the most recent :?
And even better is to make a backup of the backup, and after that a backup of that backup offsite :lol: (fire or water can be dangerous... :twisted:)
Well, seriously: get a better backup concept that runs automatically, if you don't have one already. :wink:

Another thing is to not use raid 5 anymore. I had wanted to change my system for a while, but never did (you know, life gets in the way :D). So I'm glad it crashed now and I was lucky, because now I can finally do it. The main reason for not using raid 5 anymore is the time a rebuild takes. If you have disks from the same vendor, bought at the same time, the probability that more than one disk will fail around the same time is high. In the past that wasn't a big problem, because you ran disks of around 500 GB. But nowadays, with 4, 8 or even 10 TB disks, a rebuild takes days. It is then quite likely that another disk dies during this process, or while you are copying your data from a degraded raid 5, and your raid is gone... So you can imagine what follows: even raid 6 with such large disks, and raids with very many disks, will not be recommended anymore in the future. Well, for a home setup it isn't that big an issue when you have a good backup concept, because normally you don't have that many large disks or the same reliability needs. But in critical business environments it's an issue.

I will build my nas from scratch with raid 6 now and go for an encrypted cloud backup: https://www.synology.com/en-global/dsm/ ... cloud_sync :mrgreen:

stryke
Trainee
Posts: 12
Joined: Mon Jan 24, 2011 9:17 am

Re: [SOLVED] How to recover data from raid 5 with two failed disks - volume crashed after DSM update

Unread post by stryke » Sat Jul 07, 2018 5:01 am

Thank you Galaxy for such detailed sharing.

I have also encountered the problem which I believe is similar to yours, only that I have an ancient DS410 with 4 bays on RAID 5 and no hot spare.

Initially, when I saw that the volume had crashed, I panicked (like you said not to) and switched off the NAS, after which I only got the blinking blue light of death when I wanted to switch it on again.

Found a spare old 250 GB HDD, took out all 4 HDDs, installed the 250 GB drive and managed to get a new system installed to start up the NAS. So the hardware seems OK.

Did a few disk swaps and now finally got the NAS to start up with the original 4 HDDs inside.

Now it looks like this:

[Screenshots: Storage Manager showing the degraded volume and disk status]

I am afraid to click on the REPAIR.

I have zero knowledge on Linux so even though your detailed instructions are step by step I am still afraid to execute anything at all. I don't even know which Mac client I can use to SSH to check the status of the RAID.

Created a support ticket with Synology right after the crash, which was 3 days ago, and set things up for them to remotely access the NAS for troubleshooting, but I have yet to receive any form of assistance from them except instructions on how to enable remote access.

Hopefully they are able to help. This is taking too long and I am very worried.

My life's work is stored inside that NAS.

Now I am undecided whether to turn the NAS off until further advice, or to leave it on for the tech support team to access as and when they are ready.

Went out to spend my last dollars on a WD Red 6 TB drive to back up whatever data I can salvage, but I can't see any data at all as the volume is not available... Deep down in my soul I believe it's there.
Shalom & regards
Stryke

Synology DS410
WD Green 3TB x4 Raid 5

stryke
Trainee
Trainee
Posts: 12
Joined: Mon Jan 24, 2011 9:17 am

Re: [HELP] How to recover data from raid 5 with two failed disks - volume crashed after DSM update

Unread post by stryke » Sat Jul 07, 2018 8:39 am

This is the information I've gathered.

Hopefully someone is able to help advise.

Code: Select all

Personalities : [raid1] [linear] [raid0] [raid10] [raid6] [raid5] [raid4] 
md2 : active raid5 sda3[7] sdc3[5](E) sdb3[6]
      8776635840 blocks super 1.1 level 5, 64k chunk, algorithm 2 [4/3] [UUE_]
      
md1 : active raid1 sdd2[3] sdc2[2] sdb2[1] sda2[0]
      2097088 blocks [4/4] [UUUU]
      
md0 : active raid1 sda1[0] sdc1[2]
      2490176 blocks [4/2] [U_U_]
      
unused devices: <none>

/dev/md0:
        Version : 0.90
  Creation Time : Mon Jan 24 02:26:28 2011
     Raid Level : raid1
     Array Size : 2490176 (2.37 GiB 2.55 GB)
  Used Dev Size : 2490176 (2.37 GiB 2.55 GB)
   Raid Devices : 4
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sat Jul  7 16:21:59 2018
          State : clean, degraded
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : 055469f2:3f40eb73:e8f7cd2c:fc4aa1f2 (local to host DiskStation)
         Events : 0.3803871

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       0        0        1      removed
       2       8       33        2      active sync   /dev/sdc1
       3       0        0        3      removed
DiskStation> mdadm --detail /dev/md1
/dev/md1:
        Version : 0.90
  Creation Time : Sat Jul  7 11:31:41 2018
     Raid Level : raid1
     Array Size : 2097088 (2048.28 MiB 2147.42 MB)
  Used Dev Size : 2097088 (2048.28 MiB 2147.42 MB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Sat Jul  7 11:33:15 2018
          State : active
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

           UUID : 7ed1090e:12e9f252:e8f7cd2c:fc4aa1f2 (local to host DiskStation)
         Events : 0.19

    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       1       8       18        1      active sync   /dev/sdb2
       2       8       34        2      active sync   /dev/sdc2
       3       8       50        3      active sync   /dev/sdd2
DiskStation> mdadm --detail /dev/md2
/dev/md2:
        Version : 1.1
  Creation Time : Mon Jan 24 02:50:24 2011
     Raid Level : raid5
     Array Size : 8776635840 (8370.05 GiB 8987.28 GB)
  Used Dev Size : 2925545280 (2790.02 GiB 2995.76 GB)
   Raid Devices : 4
  Total Devices : 3
    Persistence : Superblock is persistent

    Update Time : Sat Jul  7 11:32:33 2018
          State : clean, degraded
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : DiskStation:2  (local to host DiskStation)
           UUID : 39855676:8fa7f29f:090a5689:1deeb031
         Events : 51790876

    Number   Major   Minor   RaidDevice State
       7       8        3        0      active sync   /dev/sda3
       6       8       19        1      active sync   /dev/sdb3
       5       8       35        2      active sync   /dev/sdc3
       3       0        0        3      removed

fdisk -l
fdisk: device has more than 2^32 sectors, can't use all of them

Disk /dev/sda: 2199.0 GB, 2199023255040 bytes
255 heads, 63 sectors/track, 267349 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks  Id System
/dev/sda1               1      267350  2147483647+ ee EFI GPT
fdisk: device has more than 2^32 sectors, can't use all of them

Disk /dev/sdb: 2199.0 GB, 2199023255040 bytes
255 heads, 63 sectors/track, 267349 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks  Id System
/dev/sdb1               1      267350  2147483647+ ee EFI GPT
fdisk: device has more than 2^32 sectors, can't use all of them

Disk /dev/sdc: 2199.0 GB, 2199023255040 bytes
255 heads, 63 sectors/track, 267349 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks  Id System
/dev/sdc1               1      267350  2147483647+ ee EFI GPT
fdisk: device has more than 2^32 sectors, can't use all of them

Disk /dev/sdd: 2199.0 GB, 2199023255040 bytes
255 heads, 63 sectors/track, 267349 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks  Id System
/dev/sdd1               1      267350  2147483647+ ee EFI GPT
fdisk: device has more than 2^32 sectors, can't use all of them

Disk /dev/sde: 2199.0 GB, 2199023255040 bytes
255 heads, 63 sectors/track, 267349 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks  Id System
/dev/sde1               1      267350  2147483647+ ee EFI GPT

DiskStation> mdadm --examine /dev/sda1
/dev/sda1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 055469f2:3f40eb73:e8f7cd2c:fc4aa1f2 (local to host DiskStation)
  Creation Time : Mon Jan 24 02:26:28 2011
     Raid Level : raid1
  Used Dev Size : 2490176 (2.37 GiB 2.55 GB)
     Array Size : 2490176 (2.37 GiB 2.55 GB)
   Raid Devices : 4
  Total Devices : 2
Preferred Minor : 0

    Update Time : Sat Jul  7 17:18:46 2018
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 2
  Spare Devices : 0
       Checksum : 7c1a2661 - correct
         Events : 3804481


      Number   Major   Minor   RaidDevice State
this     0       8        1        0      active sync   /dev/sda1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       0        0        1      faulty removed
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       0        0        3      faulty removed
DiskStation> mdadm --examine /dev/sda2
/dev/sda2:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 7ed1090e:12e9f252:e8f7cd2c:fc4aa1f2 (local to host DiskStation)
  Creation Time : Sat Jul  7 11:31:41 2018
     Raid Level : raid1
  Used Dev Size : 2097088 (2048.28 MiB 2147.42 MB)
     Array Size : 2097088 (2048.28 MiB 2147.42 MB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 1

    Update Time : Sat Jul  7 11:33:15 2018
          State : active
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
       Checksum : d6c92310 - correct
         Events : 19


      Number   Major   Minor   RaidDevice State
this     0       8        2        0      active sync   /dev/sda2

   0     0       8        2        0      active sync   /dev/sda2
   1     1       8       18        1      active sync   /dev/sdb2
   2     2       8       34        2      active sync   /dev/sdc2
   3     3       8       50        3      active sync   /dev/sdd2
DiskStation> mdadm --examine /dev/sda3
/dev/sda3:
          Magic : a92b4efc
        Version : 1.1
    Feature Map : 0x0
     Array UUID : 39855676:8fa7f29f:090a5689:1deeb031
           Name : DiskStation:2  (local to host DiskStation)
  Creation Time : Mon Jan 24 02:50:24 2011
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 5851090617 (2790.02 GiB 2995.76 GB)
     Array Size : 17553271680 (8370.05 GiB 8987.28 GB)
  Used Dev Size : 5851090560 (2790.02 GiB 2995.76 GB)
    Data Offset : 264 sectors
   Super Offset : 0 sectors
          State : clean
    Device UUID : 05194094:ab219eee:2204c256:aec4d752

    Update Time : Sat Jul  7 16:37:14 2018
       Checksum : a64c1d40 - correct
         Events : 51790880

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 0
   Array State : AAA. ('A' == active, '.' == missing)
DiskStation> mdadm --examine /dev/sda4
mdadm: cannot open /dev/sda4: No such file or directory
DiskStation> mdadm --examine /dev/sdb1
/dev/sdb1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 055469f2:3f40eb73:e8f7cd2c:fc4aa1f2 (local to host DiskStation)
  Creation Time : Mon Jan 24 02:26:28 2011
     Raid Level : raid1
  Used Dev Size : 2490176 (2.37 GiB 2.55 GB)
     Array Size : 2490176 (2.37 GiB 2.55 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Fri Jul  6 19:36:09 2018
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 7c18b526 - correct
         Events : 3796249


      Number   Major   Minor   RaidDevice State
this     1       8       17        1      active sync   /dev/sdb1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       17        1      active sync   /dev/sdb1
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8       49        3      active sync   /dev/sdd1
DiskStation> mdadm --examine /dev/sdb2
/dev/sdb2:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 7ed1090e:12e9f252:e8f7cd2c:fc4aa1f2 (local to host DiskStation)
  Creation Time : Sat Jul  7 11:31:41 2018
     Raid Level : raid1
  Used Dev Size : 2097088 (2048.28 MiB 2147.42 MB)
     Array Size : 2097088 (2048.28 MiB 2147.42 MB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 1

    Update Time : Sat Jul  7 11:33:15 2018
          State : active
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
       Checksum : d6c92322 - correct
         Events : 19


      Number   Major   Minor   RaidDevice State
this     1       8       18        1      active sync   /dev/sdb2

   0     0       8        2        0      active sync   /dev/sda2
   1     1       8       18        1      active sync   /dev/sdb2
   2     2       8       34        2      active sync   /dev/sdc2
   3     3       8       50        3      active sync   /dev/sdd2
DiskStation> mdadm --examine /dev/sdb3
/dev/sdb3:
          Magic : a92b4efc
        Version : 1.1
    Feature Map : 0x0
     Array UUID : 39855676:8fa7f29f:090a5689:1deeb031
           Name : DiskStation:2  (local to host DiskStation)
  Creation Time : Mon Jan 24 02:50:24 2011
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 5851090617 (2790.02 GiB 2995.76 GB)
     Array Size : 17553271680 (8370.05 GiB 8987.28 GB)
  Used Dev Size : 5851090560 (2790.02 GiB 2995.76 GB)
    Data Offset : 264 sectors
   Super Offset : 0 sectors
          State : clean
    Device UUID : 5af2122d:0971db9e:c49d37f1:4bdf23a1

    Update Time : Sat Jul  7 16:37:14 2018
       Checksum : d81e05f9 - correct
         Events : 51790880

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 1
   Array State : AAA. ('A' == active, '.' == missing)
DiskStation> mdadm --examine /dev/sdb4
mdadm: cannot open /dev/sdb4: No such file or directory
DiskStation> mdadm --examine /dev/sdc1
/dev/sdc1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 055469f2:3f40eb73:e8f7cd2c:fc4aa1f2 (local to host DiskStation)
  Creation Time : Mon Jan 24 02:26:28 2011
     Raid Level : raid1
  Used Dev Size : 2490176 (2.37 GiB 2.55 GB)
     Array Size : 2490176 (2.37 GiB 2.55 GB)
   Raid Devices : 4
  Total Devices : 2
Preferred Minor : 0

    Update Time : Sat Jul  7 17:21:02 2018
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 2
  Spare Devices : 0
       Checksum : 7c1a274d - correct
         Events : 3804513


      Number   Major   Minor   RaidDevice State
this     2       8       33        2      active sync   /dev/sdc1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       0        0        1      faulty removed
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       0        0        3      faulty removed
DiskStation> mdadm --examine /dev/sdc2
/dev/sdc2:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 7ed1090e:12e9f252:e8f7cd2c:fc4aa1f2 (local to host DiskStation)
  Creation Time : Sat Jul  7 11:31:41 2018
     Raid Level : raid1
  Used Dev Size : 2097088 (2048.28 MiB 2147.42 MB)
     Array Size : 2097088 (2048.28 MiB 2147.42 MB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 1

    Update Time : Sat Jul  7 11:33:15 2018
          State : active
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
       Checksum : d6c92334 - correct
         Events : 19


      Number   Major   Minor   RaidDevice State
this     2       8       34        2      active sync   /dev/sdc2

   0     0       8        2        0      active sync   /dev/sda2
   1     1       8       18        1      active sync   /dev/sdb2
   2     2       8       34        2      active sync   /dev/sdc2
   3     3       8       50        3      active sync   /dev/sdd2
DiskStation> mdadm --examine /dev/sdc3
/dev/sdc3:
          Magic : a92b4efc
        Version : 1.1
    Feature Map : 0x0
     Array UUID : 39855676:8fa7f29f:090a5689:1deeb031
           Name : DiskStation:2  (local to host DiskStation)
  Creation Time : Mon Jan 24 02:50:24 2011
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 5851090617 (2790.02 GiB 2995.76 GB)
     Array Size : 17553271680 (8370.05 GiB 8987.28 GB)
  Used Dev Size : 5851090560 (2790.02 GiB 2995.76 GB)
    Data Offset : 264 sectors
   Super Offset : 0 sectors
          State : clean
    Device UUID : 0852ed62:7def939e:b029bf0e:84c245ad

    Update Time : Sat Jul  7 16:37:14 2018
       Checksum : 375a4777 - correct
         Events : 51790880

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 2
   Array State : AAA. ('A' == active, '.' == missing)
DiskStation> mdadm --examine /dev/sdc4
mdadm: cannot open /dev/sdc4: No such file or directory
DiskStation> mdadm --examine /dev/sdd1
/dev/sdd1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 055469f2:3f40eb73:e8f7cd2c:fc4aa1f2 (local to host DiskStation)
  Creation Time : Mon Jan 24 02:26:28 2011
     Raid Level : raid1
  Used Dev Size : 2490176 (2.37 GiB 2.55 GB)
     Array Size : 2490176 (2.37 GiB 2.55 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 0

    Update Time : Sat Jul  7 00:47:43 2018
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 7c191b50 - correct
         Events : 3799973


      Number   Major   Minor   RaidDevice State
this     3       8       49        3      active sync   /dev/sdd1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       0        0        1      faulty removed
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8       49        3      active sync   /dev/sdd1
DiskStation> mdadm --examine /dev/sdd2
/dev/sdd2:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 7ed1090e:12e9f252:e8f7cd2c:fc4aa1f2 (local to host DiskStation)
  Creation Time : Sat Jul  7 11:31:41 2018
     Raid Level : raid1
  Used Dev Size : 2097088 (2048.28 MiB 2147.42 MB)
     Array Size : 2097088 (2048.28 MiB 2147.42 MB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 1

    Update Time : Sat Jul  7 11:33:15 2018
          State : active
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
       Checksum : d6c92346 - correct
         Events : 19


      Number   Major   Minor   RaidDevice State
this     3       8       50        3      active sync   /dev/sdd2

   0     0       8        2        0      active sync   /dev/sda2
   1     1       8       18        1      active sync   /dev/sdb2
   2     2       8       34        2      active sync   /dev/sdc2
   3     3       8       50        3      active sync   /dev/sdd2
DiskStation> mdadm --examine /dev/sdd3
/dev/sdd3:
          Magic : a92b4efc
        Version : 1.1
    Feature Map : 0x2
     Array UUID : 39855676:8fa7f29f:090a5689:1deeb031
           Name : DiskStation:2  (local to host DiskStation)
  Creation Time : Mon Jan 24 02:50:24 2011
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 5851090617 (2790.02 GiB 2995.76 GB)
     Array Size : 17553271680 (8370.05 GiB 8987.28 GB)
  Used Dev Size : 5851090560 (2790.02 GiB 2995.76 GB)
    Data Offset : 264 sectors
   Super Offset : 0 sectors
Recovery Offset : 5485397776 sectors
          State : clean
    Device UUID : 57349eae:c6c74175:a5959ff4:ca7d8b72

    Update Time : Fri Jul  6 17:15:54 2018
       Checksum : cbd167b5 - correct
         Events : 51790820

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 3
   Array State : AAAA ('A' == active, '.' == missing)
DiskStation>
Last edited by stryke on Sat Jul 07, 2018 10:39 am, edited 2 times in total.
Shalom & regards
Stryke

Synology DS410
WD Green 3TB x4 Raid 5

Galaxy
I'm New!
I'm New!
Posts: 8
Joined: Thu Nov 23, 2017 10:47 pm

Re: [SOLVED] How to recover data from raid 5 with two failed disks - volume crashed after DSM update

Unread post by Galaxy » Sat Jul 07, 2018 9:37 am

Hi stryke,

To use SSH on a Mac you can just google how to connect; a minimal example follows below. Here is a guide, for example:
http://osxdaily.com/2017/04/28/howto-ssh-client-mac/
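Basically it's just one command in the Terminal app (the IP address here is only an example - use the address of your DiskStation; depending on the DSM version you log in as admin, or as root with the admin password):

Code: Select all

ssh admin@192.168.1.10
# on older DSM versions you log in as root instead:
ssh root@192.168.1.10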

As long as you don't run any command that assembles the raid or anything like that, you will not harm your system. The first commands I posted just print information to the screen; they don't modify any files.

You can run the following commands, which only gather information, and post the output here if you want:

Code: Select all

mdadm --examine /dev/sd[abcd]1
mdadm --examine /dev/sd[abcd]2
mdadm --examine /dev/sd[abcd]3

smartctl --xall /dev/sda
smartctl --xall /dev/sdb
smartctl --xall /dev/sdc
smartctl --xall /dev/sdd

mdadm --detail /dev/md0
mdadm --detail /dev/md1
mdadm --detail /dev/md2

fdisk -l /dev/sd[abcd]

cat /proc/mdstat
And also jump into the folder /etc/space and print one file with a timestamp before the crash and the latest one:
Jump into the folder:

Code: Select all

cd /etc/space
Show the files in the folder:

Code: Select all

ls -la
And then print two files:

Code: Select all

cat space_history_yyyymmdd_012345.xml
cat space_history_yyyymmdd_012345.xml
What I can gather from the pictures and your last post is that the third disk (/dev/sdc3) may be faulty or at least has some error. The other two with "System Partition Failed" look like my issue. But that's only a guess. To say more, I need the output of the commands...

The status (E) behind that disk is Synology specific. An (F) would mean faulty and show that the disk has issues. I don't know exactly what the (E) means, and it's not easy to find information about it on the internet. Anyway, somebody did a rescue from that state here - so there's hope :wink: https://serverfault.com/questions/56816 ... in-e-state

Do you hear any noise from your disks, like scratching? If that's the case, I would turn off the box and only turn it on when you know exactly what you want to do. In my case I wrote down the commands I wanted to run, started the box, ran them and shut it down immediately afterwards. Then I analyzed the output. But I know the struggle between keeping the system online and shutting it down, especially if Synology support needs to be able to connect... If you don't hear any weird noise and want Synology support to be able to connect, I would keep it online.
Last edited by Galaxy on Wed Jul 11, 2018 11:21 pm, edited 1 time in total.

stryke
Trainee
Trainee
Posts: 12
Joined: Mon Jan 24, 2011 9:17 am

Re: [SOLVED] How to recover data from raid 5 with two failed disks - volume crashed after DSM update

Unread post by stryke » Sat Jul 07, 2018 10:44 am

Hi Galaxy,

Thanks for the reply.

I've edited my last post with the information I've gathered following your guide.

I've gathered that md2 is my RAID 5 data partition.

Other than that, I have gathered that somehow my raid disk numbers are strange (7, 5, 6), even though I only have a 4-bay NAS.

Also, I believe that disk 4 is removed (either the NAS is not detecting it, or something else).

Under the md2 info it shows [UUE_] and I do not know what 'E' means. Will look at the link you gave.

Hope you can help shed some light on how I can rectify the issue so that I can retrieve my life's work and back it up to a new disk, before I reformat the whole array and see whether I need to change all the HDDs.

I do not hear any weird scratching noise coming from the disks. If it's any comfort, the SMART status is all normal.

Indeed, the HDD status LEDs on the NAS show 1, 2 and 4 as green, and 3 as amber. So your deduction of disk 3 being at fault could be true. But disk 4 is also missing, so I am really quite confused now.
Shalom & regards
Stryke

Synology DS410
WD Green 3TB x4 Raid 5

stryke
Trainee
Trainee
Posts: 12
Joined: Mon Jan 24, 2011 9:17 am

Re: [SOLVED] How to recover data from raid 5 with two failed disks - volume crashed after DSM update

Unread post by stryke » Sat Jul 07, 2018 10:50 am

Code: Select all

DiskStation> smartctl --xall /dev/sda
smartctl 6.2 (build date Apr 26 2016) [ppc-linux-2.6.32.12] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               WDC
Product:              WD30EZRX-00DC0B0
Revision:             80.0
User Capacity:        3,000,592,982,016 bytes [3.00 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
Lowest aligned LBA:   0
Serial number:        WD-WMC1T0688740
Device type:          disk
Local Time is:        Sat Jul  7 17:49:31 2018 SGT
SMART support is:     Unavailable - device lacks SMART capability.
Read Cache is:        Enabled
Writeback Cache is:   Enabled

=== START OF READ SMART DATA SECTION ===

Error Counter logging not supported


[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
Device does not support Self Test logging
Device does not support Background scan results logging
scsiPrintSasPhy Log Sense Failed [unsupported scsi opcode]

DiskStation> smartctl --xall /dev/sdb
smartctl 6.2 (build date Apr 26 2016) [ppc-linux-2.6.32.12] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               WDC
Product:              WD30EZRX-00D8PB0
Revision:             80.0
User Capacity:        3,000,592,982,016 bytes [3.00 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
Lowest aligned LBA:   0
Rotation Rate:        5400 rpm
Serial number:        WD-WCC4N6FAJ1F4
Device type:          disk
Local Time is:        Sat Jul  7 17:49:54 2018 SGT
SMART support is:     Unavailable - device lacks SMART capability.
Read Cache is:        Enabled
Writeback Cache is:   Enabled

=== START OF READ SMART DATA SECTION ===

Error Counter logging not supported


[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
Device does not support Self Test logging
Device does not support Background scan results logging
scsiPrintSasPhy Log Sense Failed [unsupported scsi opcode]

DiskStation> smartctl --xall /dev/sdc
smartctl 6.2 (build date Apr 26 2016) [ppc-linux-2.6.32.12] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               WDC
Product:              WD30EZRX-19D8PB0
Revision:             81.0
User Capacity:        3,000,592,982,016 bytes [3.00 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
Lowest aligned LBA:   0
Rotation Rate:        5400 rpm
Serial number:        WD-WMC4N0E8NA70
Device type:          disk
Local Time is:        Sat Jul  7 17:50:09 2018 SGT
SMART support is:     Unavailable - device lacks SMART capability.
Read Cache is:        Enabled
Writeback Cache is:   Enabled

=== START OF READ SMART DATA SECTION ===

Error Counter logging not supported


[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
Device does not support Self Test logging
Device does not support Background scan results logging
scsiPrintSasPhy Log Sense Failed [unsupported scsi opcode]

DiskStation> smartctl --xall /dev/sdd
smartctl 6.2 (build date Apr 26 2016) [ppc-linux-2.6.32.12] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               WDC
Product:              WD30EZRX-00DC0B0
Revision:             80.0
User Capacity:        3,000,592,982,016 bytes [3.00 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
Lowest aligned LBA:   0
Serial number:        WD-WMC1T0723184
Device type:          disk
Local Time is:        Sat Jul  7 17:50:34 2018 SGT
SMART support is:     Unavailable - device lacks SMART capability.
Read Cache is:        Enabled
Writeback Cache is:   Enabled

=== START OF READ SMART DATA SECTION ===

Error Counter logging not supported


[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
Device does not support Self Test logging
Device does not support Background scan results logging
scsiPrintSasPhy Log Sense Failed [unsupported scsi opcode]

DiskStation> 
Shalom & regards
Stryke

Synology DS410
WD Green 3TB x4 Raid 5

stryke
Trainee
Trainee
Posts: 12
Joined: Mon Jan 24, 2011 9:17 am

Re: [SOLVED] How to recover data from raid 5 with two failed disks - volume crashed after DSM update

Unread post by stryke » Sat Jul 07, 2018 10:55 am

Code: Select all

DiskStation> cd /etc/space  
DiskStation> ls -la
drwx------    2 root     root          8192 Jul  7 11:31 .
drwxr-xr-x   38 root     root          4096 Jul  7 11:33 ..
-rw-rw-rw-    1 root     root           933 Jun  6  2013 space_history_20130606_233354.xml
-rw-rw-rw-    1 root     root           933 Jul 26  2013 space_history_20130726_191524.xml
-rw-rw-rw-    1 root     root           933 Aug  1  2013 space_history_20130801_012911.xml
-rw-rw-rw-    1 root     root           933 Sep  8  2013 space_history_20130908_042111.xml
-rw-rw-rw-    1 root     root           965 Sep  9  2013 space_history_20130909_025052.xml
-rw-rw-rw-    1 root     root           965 Sep  9  2013 space_history_20130909_030759.xml
-rw-rw-rw-    1 root     root           965 Sep 20  2013 space_history_20130920_165553.xml
-rw-rw-rw-    1 root     root           965 Oct  8  2013 space_history_20131008_005137.xml
-rw-rw-rw-    1 root     root           965 Dec  2  2013 space_history_20131202_161254.xml
-rw-rw-rw-    1 root     root           965 Dec  3  2013 space_history_20131203_031522.xml
-rw-rw-rw-    1 root     root           965 Jan 16  2014 space_history_20140116_164025.xml
-rw-rw-rw-    1 root     root           965 Jan 16  2014 space_history_20140116_182911.xml
-rw-rw-rw-    1 root     root           965 Aug  2  2014 space_history_20140802_230916.xml
-rw-rw-rw-    1 root     root           965 Aug 15  2014 space_history_20140815_171835.xml
-rw-rw-rw-    1 root     root           965 Sep 17  2014 space_history_20140917_181239.xml
-rw-rw-rw-    1 root     root           965 Oct 21  2014 space_history_20141021_143907.xml
-rw-rw-rw-    1 root     root           965 Oct 25  2014 space_history_20141025_045937.xml
-rw-rw-rw-    1 root     root           965 Oct 26  2014 space_history_20141026_022322.xml
-rw-rw-rw-    1 root     root           965 Nov  3  2014 space_history_20141103_012540.xml
-rw-rw-rw-    1 root     root           965 Nov  6  2014 space_history_20141106_201454.xml
-rw-rw-rw-    1 root     root           965 Dec  1  2014 space_history_20141201_162933.xml
-rw-rw-rw-    1 root     root           814 Dec  2  2014 space_history_20141202_182910.xml
-rw-rw-rw-    1 root     root           969 Dec  2  2014 space_history_20141202_183315.xml
-rw-rw-rw-    1 root     root           965 Dec  5  2014 space_history_20141205_030544.xml
-rw-rw-rw-    1 root     root           965 Dec 19  2014 space_history_20141219_030711.xml
-rw-rw-rw-    1 root     root           965 Jan  2  2015 space_history_20150102_030511.xml
-rw-rw-rw-    1 root     root           965 Jan 16  2015 space_history_20150116_030653.xml
-rw-rw-rw-    1 root     root           965 Jan 23  2015 space_history_20150123_030355.xml
-rw-rw-rw-    1 root     root           965 Apr 18  2015 space_history_20150418_143123.xml
-rw-rw-rw-    1 root     root           814 Apr 20  2015 space_history_20150420_164550.xml
-rw-rw-rw-    1 root     root           814 Apr 20  2015 space_history_20150420_165650.xml
-rw-rw-rw-    1 root     root           969 Apr 20  2015 space_history_20150420_170935.xml
-rw-rw-rw-    1 root     root           965 Apr 26  2015 space_history_20150426_111232.xml
-rw-rw-rw-    1 root     root          1216 May 15  2015 space_history_20150515_035238.xml
-rw-rw-rw-    1 root     root          1216 May 16  2015 space_history_20150516_162751.xml
-rw-rw-rw-    1 root     root          1216 May 23  2015 space_history_20150523_193710.xml
-rw-rw-rw-    1 root     root          1216 May 24  2015 space_history_20150524_005642.xml
-rw-rw-rw-    1 root     root          1011 May 25  2015 space_history_20150525_013735.xml
-rw-rw-rw-    1 root     root          1011 May 25  2015 space_history_20150525_234922.xml
-rw-rw-rw-    1 root     root          1220 May 25  2015 space_history_20150525_235224.xml
-rw-rw-rw-    1 root     root          1216 Jun  5  2015 space_history_20150605_030403.xml
-rw-rw-rw-    1 root     root          1216 Jun 12  2015 space_history_20150612_030523.xml
-rw-rw-rw-    1 root     root          1216 Jun 18  2015 space_history_20150618_171551.xml
-rw-rw-rw-    1 root     root          1216 Jun 23  2015 space_history_20150623_004810.xml
-rw-rw-rw-    1 root     root          1216 Jun 23  2015 space_history_20150623_010840.xml
-rw-rw-rw-    1 root     root          1216 Jul  3  2015 space_history_20150703_030727.xml
-rw-rw-rw-    1 root     root          1216 Jul 17  2015 space_history_20150717_033346.xml
-rw-rw-rw-    1 root     root          1216 Jul 31  2015 space_history_20150731_030537.xml
-rw-rw-rw-    1 root     root          1216 Jul 31  2015 space_history_20150731_031011.xml
-rw-rw-rw-    1 root     root          1216 Aug 21  2015 space_history_20150821_030535.xml
-rw-rw-rw-    1 root     root          1216 Sep 11  2015 space_history_20150911_030410.xml
-rw-rw-rw-    1 root     root          1301 Oct 30  2015 space_history_20151030_030700.xml
-rw-rw-rw-    1 root     root          1301 Nov  6  2015 space_history_20151106_214608.xml
-rw-rw-rw-    1 root     root          1301 Nov 20  2015 space_history_20151120_030743.xml
-rw-rw-rw-    1 root     root          1301 Dec 10  2015 space_history_20151210_131054.xml
-rw-rw-rw-    1 root     root          1301 Dec 10  2015 space_history_20151210_134713.xml
-rw-rw-rw-    1 root     root          1301 Dec 29  2015 space_history_20151229_235031.xml
-rw-rw-rw-    1 root     root          1301 Jan 29  2016 space_history_20160129_030832.xml
-rw-rw-rw-    1 root     root          1301 Mar  8  2016 space_history_20160308_000533.xml
-rw-rw-rw-    1 root     root          1301 Mar  8  2016 space_history_20160308_201100.xml
-rw-rw-rw-    1 root     root          1301 Apr  9  2016 space_history_20160409_171958.xml
-rw-rw-rw-    1 root     root          1301 Apr  9  2016 space_history_20160409_220638.xml
-rw-rw-rw-    1 root     root          1301 Apr 16  2016 space_history_20160416_173108.xml
-rw-rw-rw-    1 root     root          1301 Apr 16  2016 space_history_20160416_173649.xml
-rw-rw-rw-    1 root     root          1301 Apr 27  2016 space_history_20160427_211040.xml
-rw-rw-rw-    1 root     root          1301 Apr 29  2016 space_history_20160429_030919.xml
-rw-rw-rw-    1 root     root          1301 May 24  2016 space_history_20160524_025137.xml
-rw-rw-rw-    1 root     root          1301 Jul  2  2016 space_history_20160702_131009.xml
-rw-rw-rw-    1 root     root          1301 Jul 23  2016 space_history_20160723_002402.xml
-rw-rw-rw-    1 root     root          1301 Jul 29  2016 space_history_20160729_031107.xml
-rw-rw-rw-    1 root     root          1301 Aug 14  2016 space_history_20160814_140712.xml
-rw-rw-rw-    1 root     root          1301 Aug 14  2016 space_history_20160814_143949.xml
-rw-rw-rw-    1 root     root          1301 Oct 28  2016 space_history_20161028_084313.xml
-rw-rw-rw-    1 root     root          1301 Nov 18  2016 space_history_20161118_011131.xml
-rw-rw-rw-    1 root     root          1301 Dec  2  2016 space_history_20161202_222456.xml
-rw-rw-rw-    1 root     root          1301 Feb 14  2017 space_history_20170214_160745.xml
-rw-rw-rw-    1 root     root          1301 Feb 14  2017 space_history_20170214_172327.xml
-rw-rw-rw-    1 root     root          1301 Feb 14  2017 space_history_20170214_172619.xml
-rw-rw-rw-    1 root     root          1301 Feb 24  2017 space_history_20170224_201502.xml
-rw-rw-rw-    1 root     root          1301 Mar 20  2017 space_history_20170320_132134.xml
-rw-rw-rw-    1 root     root          1301 May  4  2017 space_history_20170504_135346.xml
-rw-rw-rw-    1 root     root          1301 May 31  2017 space_history_20170531_224836.xml
-rw-rw-rw-    1 root     root          1301 Jun  2  2017 space_history_20170602_173120.xml
-rw-rw-rw-    1 root     root          1301 Jun  2  2017 space_history_20170602_173940.xml
-rw-rw-rw-    1 root     root          1301 Jun  2  2017 space_history_20170602_175139.xml
-rw-rw-rw-    1 root     root          1301 Jul 24  2017 space_history_20170724_124122.xml
-rw-rw-rw-    1 root     root          1301 Jul 24  2017 space_history_20170724_125953.xml
-rw-rw-rw-    1 root     root          1301 Jul 24  2017 space_history_20170724_135720.xml
-rw-rw-rw-    1 root     root          1301 Jul 24  2017 space_history_20170724_140039.xml
-rw-rw-rw-    1 root     root          1301 Oct  7  2017 space_history_20171007_142702.xml
-rw-rw-rw-    1 root     root          1301 Nov 10  2017 space_history_20171110_035720.xml
-rw-rw-rw-    1 root     root          1301 Dec 18  2017 space_history_20171218_175308.xml
-rw-rw-rw-    1 root     root          1301 Mar  8 11:22 space_history_20180308_112156.xml
-rw-rw-rw-    1 root     root          1301 Mar 12 14:01 space_history_20180312_140120.xml
-rw-rw-rw-    1 root     root          1301 Mar 12 23:09 space_history_20180312_230921.xml
-rw-rw-rw-    1 root     root          1301 Jul  2 20:46 space_history_20180702_204642.xml
-rw-rw-rw-    1 root     root          1096 Jul  3 12:31 space_history_20180703_123119.xml
-rw-rw-rw-    1 root     root          1305 Jul  3 12:35 space_history_20180703_123533.xml
-rw-rw-rw-    1 root     root          1096 Jul  6 00:33 space_history_20180706_003314.xml
-rw-rw-rw-    1 root     root          1096 Jul  6 00:54 space_history_20180706_005431.xml
-rw-rw-rw-    1 root     root          1096 Jul  6 01:52 space_history_20180706_015205.xml
-rw-rw-rw-    1 root     root          1096 Jul  6 17:39 space_history_20180706_173956.xml
-rw-rw-rw-    1 root     root           891 Jul  6 19:03 space_history_20180706_190349.xml
-rw-rw-rw-    1 root     root           891 Jul  6 19:09 space_history_20180706_190949.xml
-rw-rw-rw-    1 root     root           891 Jul  6 19:43 space_history_20180706_194339.xml
-rw-rw-rw-    1 root     root          1096 Jul  7 00:25 space_history_20180707_002545.xml
-rw-rw-rw-    1 root     root          1096 Jul  7 00:51 space_history_20180707_005111.xml
-rw-rw-rw-    1 root     root          1096 Jul  7 00:57 space_history_20180707_005702.xml
-rw-rw-rw-    1 root     root             0 Jul  7 01:05 space_history_20180707_010516.xml
-rw-rw-rw-    1 root     root           891 Jul  7 01:13 space_history_20180707_011317.xml
-rw-rw-rw-    1 root     root          1096 Jul  7 11:31 space_history_20180707_113156.xml
DiskStation> cat space_history_20180707_113156.xml
<?xml version="1.0" encoding="UTF-8"?>
<spaces>
	<space path="/dev/md2" reference="/volume1" uuid="39855676:8fa7f29f:090a5689:1deeb031" device_type="2" container_type="2">
		<device>
			<raid path="/dev/md2" uuid="39855676:8fa7f29f:090a5689:1deeb031" level="raid5" version="1.1">
				<disks>
					<disk status="normal" dev_path="/dev/sda3" model="WD30EZRX-00DC0B0        " serial="WD-WMC1T0688740" partition_version="7" partition_start="9437184" partition_size="5851090881" slot="0">
					</disk>
					<disk status="normal" dev_path="/dev/sdb3" model="WD30EZRX-00D8PB0        " serial="WD-WCC4N6FAJ1F4" partition_version="8" partition_start="9437184" partition_size="5851090881" slot="1">
					</disk>
					<disk status="normal" dev_path="/dev/sdc3" model="WD30EZRX-19D8PB0        " serial="WD-WMC4N0E8NA70" partition_version="8" partition_start="9437184" partition_size="5851090881" slot="2">
					</disk>
				</disks>
			</raid>
		</device>
		<reference>
			<volume path="/volume1" dev_path="/dev/md2" uuid="39855676:8fa7f29f:090a5689:1deeb031">
			</volume>
		</reference>
	</space>
</spaces>
DiskStation> cat space_history_20180707_113156.xml
<?xml version="1.0" encoding="UTF-8"?>
<spaces>
	<space path="/dev/md2" reference="/volume1" uuid="39855676:8fa7f29f:090a5689:1deeb031" device_type="2" container_type="2">
		<device>
			<raid path="/dev/md2" uuid="39855676:8fa7f29f:090a5689:1deeb031" level="raid5" version="1.1">
				<disks>
					<disk status="normal" dev_path="/dev/sda3" model="WD30EZRX-00DC0B0        " serial="WD-WMC1T0688740" partition_version="7" partition_start="9437184" partition_size="5851090881" slot="0">
					</disk>
					<disk status="normal" dev_path="/dev/sdb3" model="WD30EZRX-00D8PB0        " serial="WD-WCC4N6FAJ1F4" partition_version="8" partition_start="9437184" partition_size="5851090881" slot="1">
					</disk>
					<disk status="normal" dev_path="/dev/sdc3" model="WD30EZRX-19D8PB0        " serial="WD-WMC4N0E8NA70" partition_version="8" partition_start="9437184" partition_size="5851090881" slot="2">
					</disk>
				</disks>
			</raid>
		</device>
		<reference>
			<volume path="/volume1" dev_path="/dev/md2" uuid="39855676:8fa7f29f:090a5689:1deeb031">
			</volume>
		</reference>
	</space>
</spaces>
DiskStation> 
DiskStation> cat space_history_20180703_123533.xml
<?xml version="1.0" encoding="UTF-8"?>
<spaces>
	<space path="/dev/md2" reference="/volume1" uuid="39855676:8fa7f29f:090a5689:1deeb031" device_type="2" container_type="2">
		<device>
			<raid path="/dev/md2" uuid="39855676:8fa7f29f:090a5689:1deeb031" level="raid5" version="1.1">
				<disks>
					<disk status="normal" dev_path="/dev/sda3" model="WD30EZRX-00DC0B0        " serial="WD-WMC1T0688740" partition_version="7" partition_start="9437184" partition_size="5851090881" slot="0">
					</disk>
					<disk status="normal" dev_path="/dev/sdb3" model="WD30EZRX-00D8PB0        " serial="WD-WCC4N6FAJ1F4" partition_version="8" partition_start="9437184" partition_size="5851090881" slot="1">
					</disk>
					<disk status="normal" dev_path="/dev/sdc3" model="WD30EZRX-19D8PB0        " serial="WD-WMC4N0E8NA70" partition_version="8" partition_start="9437184" partition_size="5851090881" slot="2">
					</disk>
					<disk status="rebuilding" dev_path="/dev/sdd3" model="WD30EZRX-00DC0B0        " serial="WD-WMC1T0723184" partition_version="8" partition_start="9437184" partition_size="5851090881" slot="3">
					</disk>
				</disks>
			</raid>
		</device>
		<reference>
			<volume path="/volume1" dev_path="/dev/md2" uuid="39855676:8fa7f29f:090a5689:1deeb031">
			</volume>
		</reference>
	</space>
</spaces>
DiskStation> 
Shalom & regards
Stryke

Synology DS410
WD Green 3TB x4 Raid 5

Galaxy
I'm New!
I'm New!
Posts: 8
Joined: Thu Nov 23, 2017 10:47 pm

Re: [SOLVED] How to recover data from raid 5 with two failed disks - volume crashed after DSM update

Unread post by Galaxy » Wed Jul 11, 2018 11:21 pm

Sorry for the late answer.

Has Synology contacted you yet?
stryke wrote:
Sat Jul 07, 2018 10:44 am
I've gathered that md2 is my RAID 5 data partition.
-> Correct, md2 is your RAID 5.

stryke wrote:
Sat Jul 07, 2018 10:44 am
I do not hear any weird scratching noise coming from the disks.
-> That's good :)

stryke wrote:
Sat Jul 07, 2018 10:44 am
Other than that, I have gathered that somehow my raid disk numbers are strange (7, 5, 6), even though I only have a 4-bay NAS.
-> Yes, that's weird indeed... What is also strange is that there is a /dev/sde. Did you plug in a fifth drive, or an external one?

stryke wrote:
Sat Jul 07, 2018 10:44 am
SMART status all normal.
-> Are these extended SMART tests or just quick ones (and when did they last run)? Old and quick tests don't give you information you can safely rely on. Side note: depending on your use case, it is better either to run extended SMART tests more often - which does put more load on the disks, but you are a bit "safer" and get warned early when something goes wrong - or to run them less frequently and rely on a good backup strategy, if keeping the system running 24/7 is not a priority. A minimal example of running one follows below.
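If you want to start an extended test yourself, it looks roughly like this (sda as an example; since your smartctl output says the devices "lack SMART capability" over the controller, you may have to add -d sat so that smartctl talks to the SATA disk directly - that's just a guess based on your output):

Code: Select all

smartctl -d sat -t long /dev/sda    # start the extended self-test (runs in the background, takes hours)
smartctl -d sat -a /dev/sda         # check the result after the test has finished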


Ok let's check your raid:
Your /dev/sdd3 is lost in your raid and got out of sync:

Code: Select all

sda3, sdb3, sdc3	Events: 51790880
sdd3			Events: 51790820
The disks differ by 60 events (51790880 - 51790820), so that's not much of a difference and there's a good chance of reassembling the raid successfully. Also, /dev/md1 is still intact - so your third disk can't be completely bad...
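By the way, a quick way to pull just the event counters out of the output is something like this:

Code: Select all

mdadm --examine /dev/sd[abcd]3 | grep -E "/dev/sd|Events"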

I compared the output as well:

Code: Select all

disk 		serial smartctl		serial space files
	  
sda3  		WD-WMC1T0688740		WD-WMC1T0688740
sdb3		WD-WCC4N6FAJ1F4		WD-WCC4N6FAJ1F4
sdc3		WD-WMC4N0E8NA70		WD-WMC4N0E8NA70		<-- disk with the error (E) state
sdd3		WD-WMC1T0723184		WD-WMC1T0723184		<-- your missing one (state rebuilding)
So you haven't messed up the disk order - all of them are in the correct slots. ;-)

But what I noticed in the output of your space_history_20180703_123533.xml file is that your /dev/sdd3 was in the middle of a rebuild... Did you click on repair or something like that and start a rebuild? And could it be that you turned off the box while that rebuild was running? To me it looks like a rebuild was somehow triggered and then got interrupted, which is why this disk (/dev/sdd3) is out of sync (event count) in the next file.
If a rebuild of this disk was started and didn't finish successfully, you can't be sure that the data on this disk is correct. Therefore I would not recommend a reassemble before being sure about that...

Is that file (space_history_20180703_123533.xml) really the latest file from before your raid crashed? I would check some of the other files to see whether the rebuild finished, or whether right after this status there are only files with /dev/sdd3 missing. If there is any file with /dev/sdd3 status "normal" between this one and the first file that shows /dev/sdd3 as missing, I would assume the rebuild finished successfully and would therefore try a reassemble. A quick way to check the files is shown below.
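To check the files quickly you can let grep do the work, roughly like this (the date pattern is only an example - adjust it to the days around the crash):

Code: Select all

cd /etc/space
grep -l sdd3 space_history_201807*.xml    # lists the files that mention /dev/sdd3 at all
grep sdd3 space_history_201807*.xml       # prints the whole disk line (including status="...") with the file name in front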


Your problem also seems to be very similar to these two posts: if the rebuild was interrupted, a reassemble won't work...

Then you could try to recreate the raid as explained in those two posts:

Code: Select all

mdadm --stop /dev/md2
mdadm --verbose --create /dev/md2 --chunk=64 --level=5 --raid-devices=4 /dev/sda3 /dev/sdb3 /dev/sdc3 missing
Note: --chunk=64, --level=5 and --raid-devices=4 can of course be taken from the output of cat /proc/mdstat.

After the process is complete, check the output:

Code: Select all

cat /proc/mdstat
And finally add the missing disk:

Code: Select all

mdadm --manage /dev/md2 --add /dev/sdd3
Note: for this last command I would use a fresh new drive in place of the current drive that contains /dev/sdd3, because this member gets rebuilt from the data on the other disks anyway. With a fresh drive you avoid any errors the older drive might have. This command is basically replacing a drive in your raid. You can find a kind of how-to on replacing a drive here: https://www.howtoforge.com/replacing_ha ... aid1_array
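Once the new disk has been added and the rebuild is running, you can follow its progress like this (these commands only read the status, they don't change anything):

Code: Select all

cat /proc/mdstat                            # shows a progress bar and an estimated finish time during recovery
mdadm --detail /dev/md2 | grep -i rebuild   # shows "Rebuild Status : xx% complete" while the rebuild runs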

The problem with the rebuild is that it takes time and puts load on the disks... So when one of your disks has failed, it's quite likely that another disk fails while the rebuild is in progress, especially if all disks are from the same vendor and manufacturing date.

So I can't tell you for sure whether it would work... Before going on, I highly recommend checking the other files in /etc/space for the status of /dev/sdd3, and if you are not sure, just post them here. A reassemble is the easiest and safest way to get the raid back online, if it's possible!

stryke
Trainee
Trainee
Posts: 12
Joined: Mon Jan 24, 2011 9:17 am

Re: [SOLVED] How to recover data from raid 5 with two failed disks - volume crashed after DSM update

Unread post by stryke » Sat Jul 14, 2018 4:13 am

Hi Galaxy,

Thanks for the detailed reply.

Synology tech support has helped me make the volume visible again, but with read-only access, so that I can back up my data. They said that my disks 2, 3 and 4 have bad sectors, which caused the volume to crash. Looks like I'll have to spend money to replace all the HDDs... :(

I will back up all my important files, reformat the old drives and create a new volume, then run extended SMART tests to see if the bad sectors can be reallocated. If not, I'll just have to replace the HDDs.

/dev/sde could be the eSATA drive (satashare) I connected to the NAS in preparation for backing up the data.
Shalom & regards
Stryke

Synology DS410
WD Green 3TB x4 Raid 5

Galaxy
I'm New!
I'm New!
Posts: 8
Joined: Thu Nov 23, 2017 10:47 pm

Re: [SOLVED] How to recover data from raid 5 with two failed disks - volume crashed after DSM update

Unread post by Galaxy » Sat Jul 14, 2018 10:31 am

That sounds great! :) I am very happy for you that you got your data back! Bad disks are not great news, but that's nothing compared to losing beloved data. ;)

Yes, that would explain the /dev/sde.

I have replaced three of my disks as well, after safely copying my data and running extended SMART tests. The disks were still under warranty, so I could do an RMA with Western Digital. I would check whether you have that option, too.

I am wondering what Synology did. How long did it take? If it didn't take very long, I would assume they did a reassemble. Otherwise a rebuild is more likely.

You could run the command

Code: Select all

history
which shows the last commands that were run, and post the output here. This might help other people with similar issues. :)

stryke
Trainee
Trainee
Posts: 12
Joined: Mon Jan 24, 2011 9:17 am

Re: [SOLVED] How to recover data from raid 5 with two failed disks - volume crashed after DSM update

Unread post by stryke » Sat Jul 14, 2018 12:34 pm

Hi Galaxy,

Indeed, it is great news to be able to access my data again.

They took about 3-4 days to help me recover the data as read-only.

Unfortunately, while I was backing up my data I experienced a power outage, and now my volume has disappeared again.

Just sent another request to tech support for help.

I wish they would share with me which commands they ran remotely to recover the volume.

Another thing is that my HDDs are out of warranty. If I were to buy three more 6 TB drives, that would set me back a few hundred dollars. :(
Shalom & regards
Stryke

Synology DS410
WD Green 3TB x4 Raid 5

stryke
Trainee
Trainee
Posts: 12
Joined: Mon Jan 24, 2011 9:17 am

Re: [SOLVED] How to recover data from raid 5 with two failed disks - volume crashed after DSM update

Unread post by stryke » Sat Jul 14, 2018 12:38 pm

I entered the 'history' command you advised, but this is what I got:

Code: Select all

DiskStation> history
-ash: history: not found
Shalom & regards
Stryke

Synology DS410
WD Green 3TB x4 Raid 5

Post Reply

Return to “System Managment Mods”