You. Are. My. Hero.
My Sad Story.
I received an alert that a drive had failed the SMART test. I was a bit surprised, since I had replaced most of the drives within the last year, but whatever. The next day we had a power outage on one side of the apartment and the NAS was not happy when it came up.
- I have 1 disk group made up of 4 disks in RAID5
- I have 2 volumes on said disk group.
- Volume 1 is the primary with all of the good stuff
- Volume 2 really serves no purpose anymore, but it's there.
- Volume 1 was listed as degraded so I followed the advice in the NAS and let it run through its check.
- The following morning the status of Volume 1 changed from degraded to crashed.
- Volume 2 however was fine.
- I rebooted it again at some point to verify the physical connections, since the disk that had allegedly failed was no longer detected.
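Some context on why "degraded" matters: RAID5 spends one disk's worth of space on parity, so a 4-disk array survives exactly one failure, and while it runs degraded a second disk failure takes the whole disk group down. A minimal Python sketch of that arithmetic, assuming equal-sized members; the function names are illustrative, and the per-member size is simply the md2 capacity from the dmesg excerpt below divided by three.

```python
def raid5_usable_bytes(member_count: int, member_bytes: int) -> int:
    """Usable capacity of RAID5: one member's worth of space goes to parity."""
    if member_count < 3:
        raise ValueError("RAID5 needs at least 3 members")
    return (member_count - 1) * member_bytes


def raid5_state(member_count: int, working: int) -> str:
    """Classify an array by how many of its members are still working."""
    failed = member_count - working
    if failed <= 0:
        return "clean"
    if failed == 1:
        return "degraded"  # all data still readable via parity, but no redundancy left
    return "failed"  # two or more members gone: RAID5 cannot reconstruct the data


# Per-member size of the sdX3 partitions, derived as md2 capacity / 3 (assumption)
MEMBER_BYTES = 2_995_757_318_144
print(raid5_usable_bytes(4, MEMBER_BYTES))  # 8987271954432, md2's capacity in dmesg
print(raid5_state(4, 3))  # degraded
```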
Not that there's ever a good time for something like this, but this was not ideal, as I was moving in 3 days.
I moved in 1 day and got the NAS reconnected by the end of the weekend.
I opened a case with Synology support via the Synology website and hoped for a speedy response. When a reply eventually came, they asked me to email them the special support key and the admin password.
They wanted the key and the admin password. In an Email. In clear text.
smh
I called support to explain the situation and the guy pointed me to the Support Center app on the NAS. I ASSUME that uses HTTPS to submit info but you never know. I put all the details in the support ticket I opened through the support app on the NAS and again waited for a response.
I think responses were taking about 24 hours, which feels like an eternity when you're in "dire straits". Again they asked for the password information in an email, so I asked whether there was a secure web portal where I could update the ticket, etc.
I ultimately called the support line again and asked the same question, and this time the person said yes. HOWEVER, it wasn't a support site in the traditional sense. It was access to one of their NAS units, where I could upload the debug.dat as well as any additional information. The guy said he'd update the ticket with a note stating that the credentials would be in the folder, so that the tech working on my NAS could get the information he needed. Sounded good!
I got in there and added the files to my folder. (I don't recall if I created the folder OR if it already existed.) After adding all the details, it dawned on me that the credentials they gave me were very very very generic and super easy to guess, which made me fear what was going on. I started browsing the folder structure and, sure enough, I could get into every other ticket folder on the NAS. Every. One. If someone had been inclined to peruse the folders, they would have found a lovely package from me containing all the details they would need to get into my NAS. Great.
OK, so it's December 10th and they still haven't been able to start working on the NAS. All this week I've been fielding emails from Synology Support about not providing the key in the email, not providing the admin password in the email, the NAS not being accessible, etc. I had to send screenshots from my phone while at work to prove that it was indeed online and accessible. What nonsense.
Earlier in the week, knowing a little about LVM, I poked around the system to see what the issue was, and that's where I discovered the problem was a corrupt journal transaction. For some crazy reason, although volume 1 wasn't mounted (per df and mount), I couldn't run fsck against it. Then I got an email from Synology support asking about the commands that had been run on the box and why. I told them plainly I was poking around trying to understand the structure and left it at that. (Curious, I thought they couldn't access the NAS?)
Well, it's been way too long without any appreciable action and I'm getting antsy.
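A common cause of "not mounted, but fsck still refuses": fsck.ext4 and mount both want exclusive access to the block device, and the kernel typically refuses it when another device is stacked on top. Going by /etc/fstab further down, a device-mapper target named vol1-origin presumably sits on top of vg1/volume_1. Linux exposes that stacking in sysfs under /sys/class/block/<name>/holders. A minimal sketch that lists the holders; the function name is illustrative and `sysfs_root` is a parameter only so it can be pointed at a test directory.

```python
import os


def block_holders(dev_name: str, sysfs_root: str = "/sys/class/block") -> list[str]:
    """Return the kernel devices stacked on top of dev_name (e.g. 'dm-1').

    A non-empty result means another device (a device-mapper target, an md
    array, ...) holds dev_name, which typically makes fsck warn that the
    device is in use and makes mount fail with "Device or resource busy".
    """
    holders_dir = os.path.join(sysfs_root, dev_name, "holders")
    try:
        return sorted(os.listdir(holders_dir))
    except FileNotFoundError:
        # Unknown device, or no holders directory at all
        return []
```

On the NAS, `block_holders("dm-1")` would be the thing to try, assuming the usual numbering where lvdisplay's "Block device 253:1" (volume_1) is dm-1.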
So this evening I ran
Code:
fsck.ext4 -fvpc /dev/mapper/vol1-origin
Then, by God's good grace, I was finally able to mount it read-only, which I previously couldn't do (the exact error escapes me).
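For reference, per the e2fsck man page: -f forces a check even if the filesystem looks clean, -v is verbose, -p ("preen") automatically fixes whatever can be fixed safely without asking, and -c runs a read-only badblocks scan and adds any bad blocks to the bad-block inode. e2fsck also reports its result as a bit-mask exit status, which is more reliable than skimming the verbose output. A minimal sketch that decodes it; the function names are illustrative, the bit meanings are the documented ones.

```python
# Bit values from the EXIT CODE section of the e2fsck man page.
E2FSCK_EXIT_BITS = {
    1: "filesystem errors corrected",
    2: "filesystem errors corrected, system should be rebooted",
    4: "filesystem errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "canceled by user request",
    128: "shared-library error",
}


def decode_e2fsck_status(status: int) -> list[str]:
    """Turn an e2fsck/fsck.ext4 exit status into its documented meanings."""
    if status == 0:
        return ["no errors"]
    return [meaning for bit, meaning in E2FSCK_EXIT_BITS.items() if status & bit]


def repaired_cleanly(status: int) -> bool:
    """True when nothing beyond "errors corrected" (bits 1 and 2) was reported."""
    return (status & ~0b11) == 0


print(decode_e2fsck_status(1))  # ['filesystem errors corrected']
print(repaired_cleanly(4))  # False
```

On the NAS the status would come from something like `subprocess.run(["fsck.ext4", "-fvpc", "/dev/mapper/vol1-origin"]).returncode`, run with the volume unmounted; a 4 or 8 means e2fsck did not finish the job.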
I googled something fairly obscure like
Code:
"JBD2: journal transaction" on dm-3-8 "is corrupt"
That found this post which allowed me to get all the shares up and running again.
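The `dm-3-8` in that message is the journal's name: the block device (dm-3) followed by the journal inode number (8 is where ext4 normally keeps its internal journal). Going by the dmesg excerpt below, dm-4 mounted fine while dm-3 failed, so dm-3 is presumably the mapping behind /volume1. A minimal sketch for pulling such messages out of a saved dmesg; the regex is written against the exact line in this post and may need loosening for other kernel versions.

```python
import re

# e.g. "JBD2: journal transaction 9300484 on dm-3-8 is corrupt."
JBD2_CORRUPT = re.compile(
    r"JBD2: journal transaction (?P<tid>\d+) on (?P<dev>\S+)-(?P<ino>\d+) is corrupt"
)


def find_corrupt_journals(dmesg_text: str) -> list[tuple[str, int, int]]:
    """Return (block device, journal inode, transaction id) for each match."""
    hits = []
    for m in JBD2_CORRUPT.finditer(dmesg_text):
        hits.append((m.group("dev"), int(m.group("ino")), int(m.group("tid"))))
    return hits
```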
I'm pulling everything down now and will likely redo everything this weekend.
I'm going to add more data to this post, which will hopefully lead more would-be Synology Googlers/Bingers in a similar predicament here.
I've also added loads of interesting reads at the bottom.
Excerpt of dmesg
Code:
[ 0.180731] raid6: sse2x1 113 MB/s
[ 0.201468] raid6: sse2x2 218 MB/s
[ 0.221403] raid6: sse2x4 433 MB/s
[ 0.225005] raid6: using algorithm sse2x4 (433 MB/s)
[ 0.230004] raid6: using ssse3x2 recovery algorithm
[ 5.246323] ahci 0000:00:1f.2: version 3.0
[ 5.246669] ahci 0000:00:1f.2: irq 43 for MSI/MSI-X
[ 5.246796] ahci: SSS flag set, parallel bus scan disabled
[ 5.252430] ahci 0000:00:1f.2: AHCI 0001.0200 32 slots 6 ports 3 Gbps 0x3f impl SATA mode
[ 5.260665] ahci 0000:00:1f.2: flags: 64bit ncq sntf stag pm led clo pio slum part ccc ems sxs
[ 5.269412] ahci 0000:00:1f.2: setting latency timer to 64
[ 5.317588] scsi0 : ahci
[ 5.324795] scsi1 : ahci
[ 5.329774] scsi2 : ahci
[ 5.334874] scsi3 : ahci
[ 5.342832] scsi4 : ahci
[ 5.347462] scsi5 : ahci
[ 5.351217] ata1: SATA max UDMA/133 abar m2048@0xfe87b000 port 0xfe87b100 irq 43
[ 5.358661] ata2: SATA max UDMA/133 abar m2048@0xfe87b000 port 0xfe87b180 irq 43
[ 5.366081] ata3: SATA max UDMA/133 abar m2048@0xfe87b000 port 0xfe87b200 irq 43
[ 5.373502] ata4: SATA max UDMA/133 abar m2048@0xfe87b000 port 0xfe87b280 irq 43
[ 5.380902] ata5: SATA max UDMA/133 abar m2048@0xfe87b000 port 0xfe87b300 irq 43
[ 5.388303] ata6: SATA max UDMA/133 abar m2048@0xfe87b000 port 0xfe87b380 irq 43
[ 5.395702] Delay 10 seconds to wait for disk 1 ready.
[ 15.707036] ata1: SATA link down (SStatus 0 SControl 300)
[ 15.712477] Delay 10 seconds to wait for disk 2 ready.
[ 26.024035] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 26.030832] ata2.00: ATA-9: WDC WD30EFRX-68EUZN0, 80.00A80, max UDMA/133
[ 26.037545] ata2.00: 5860533168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
[ 26.045288] ata2.00: configured for UDMA/133
[ 26.050223] scsi 1:0:0:0: Direct-Access WDC WD30EFRX-68EUZN0 80.0 PQ: 0 ANSI: 5
[ 26.060436] sd 1:0:0:0: [sdb] 5860533168 512-byte logical blocks: (3.00 TB/2.72 TiB)
[ 26.068213] sd 1:0:0:0: [sdb] 4096-byte physical blocks
[ 26.070764] Delay 10 seconds to wait for disk 3 ready.
[ 26.078850] sd 1:0:0:0: [sdb] Write Protect is off
[ 26.083673] sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[ 26.083800] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 26.147914] sdb: sdb1 sdb2 sdb3
[ 26.154214] sd 1:0:0:0: [sdb] Attached SCSI disk
[ 36.377037] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 36.384786] ata3.00: ATA-8: TOSHIBA DT01ACA300, MX6OABB0, max UDMA/133
[ 36.391332] ata3.00: 5860533168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
[ 36.400124] ata3.00: configured for UDMA/133
[ 36.405841] scsi 2:0:0:0: Direct-Access TOSHIBA DT01ACA300 MX6O PQ: 0 ANSI: 5
[ 36.416134] sd 2:0:0:0: [sdc] 5860533168 512-byte logical blocks: (3.00 TB/2.72 TiB)
[ 36.423944] sd 2:0:0:0: [sdc] 4096-byte physical blocks
[ 36.429474] sd 2:0:0:0: [sdc] Write Protect is off
[ 36.434296] sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[ 36.434383] sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 36.492639] sdc: sdc1 sdc2 sdc3
[ 36.498097] sd 2:0:0:0: [sdc] Attached SCSI disk
[ 36.734037] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 36.740920] ata4.00: ATA-8: ST3000DM001-9YN166, CC9D, max UDMA/133
[ 36.747123] ata4.00: 5860533168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
[ 36.765868] ata4.00: configured for UDMA/133
[ 36.770705] scsi 3:0:0:0: Direct-Access ATA ST3000DM001-9YN166 CC9D PQ: 0 ANSI: 5
[ 36.780826] sd 3:0:0:0: [sdd] 5860533168 512-byte logical blocks: (3.00 TB/2.72 TiB)
[ 36.788631] sd 3:0:0:0: [sdd] 4096-byte physical blocks
[ 36.794159] sd 3:0:0:0: [sdd] Write Protect is off
[ 36.798967] sd 3:0:0:0: [sdd] Mode Sense: 00 3a 00 00
[ 36.799067] sd 3:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 36.866594] sdd: sdd1 sdd2 sdd3
[ 36.871856] sd 3:0:0:0: [sdd] Attached SCSI disk
[ 37.097036] ata5: SATA link down (SStatus 0 SControl 300)
[ 37.407028] ata6: SATA link down (SStatus 0 SControl 300)
[ 37.413461] sata_sil24 0000:02:00.0: version 1.1
[ 38.415188] scsi6 : sata_sil24
[ 38.420917] ata7: SATA max UDMA/100 host m128@0xfeafdc00 port 0xfeafe000 irq 17
[ 38.428333] ata7: limiting SATA link speed to 1.5 Gbps
[ 40.453041] ata7: SATA link down (SStatus 0 SControl 10)
[ 40.525638] md: linear personality registered for level -1
[ 40.531177] md: raid0 personality registered for level 0
[ 40.536521] md: raid1 personality registered for level 1
[ 40.541841] md: raid10 personality registered for level 10
[ 40.547468] md: raid6 personality registered for level 6
[ 40.552806] md: raid5 personality registered for level 5
[ 40.558131] md: raid4 personality registered for level 4
[ 40.563878] device-mapper: ioctl: 4.25.0-ioctl (2013-06-26) initialised: dm-devel@redhat.com
[ 41.792950] md: Autodetecting RAID arrays.
[ 41.860661] md: invalid raid superblock magic on sdb3
[ 41.865732] md: sdb3 does not have a valid v0.90 superblock, not importing!
[ 41.931504] md: invalid raid superblock magic on sdc3
[ 41.936579] md: sdc3 does not have a valid v0.90 superblock, not importing!
[ 42.011564] md: invalid raid superblock magic on sdd3
[ 42.016634] md: sdd3 does not have a valid v0.90 superblock, not importing!
[ 42.023613] md: Scanned 9 and added 6 devices.
[ 42.028074] md: autorun ...
[ 42.030882] md: considering sdb1 ...
[ 42.034481] md: adding sdb1 ...
[ 42.037723] md: sdb2 has different UUID to sdb1
[ 42.042279] md: adding sdc1 ...
[ 42.045524] md: sdc2 has different UUID to sdb1
[ 42.050078] md: adding sdd1 ...
[ 42.053323] md: sdd2 has different UUID to sdb1
[ 42.057893] md: created md0
[ 42.060704] md: bind<sdd1>
[ 42.063457] md: bind<sdc1>
[ 42.066206] md: bind<sdb1>
[ 42.068950] md: running: <sdb1><sdc1><sdd1>
[ 42.073725] md/raid1:md0: active with 3 out of 4 mirrors
[ 42.079098] md0: detected capacity change from 0 to 2549940224
[ 42.085242] md: considering sdb2 ...
[ 42.088859] md: adding sdb2 ...
[ 42.092108] md: adding sdc2 ...
[ 42.095375] md: adding sdd2 ...
[ 42.099646] md: created md1
[ 42.102482] md: bind<sdd2>
[ 42.105298] md: bind<sdc2>
[ 42.108058] md: bind<sdb2>
[ 42.110802] md: running: <sdb2><sdc2><sdd2>
[ 42.115658] md/raid1:md1: active with 3 out of 4 mirrors
[ 42.121091] md1: detected capacity change from 0 to 2147418112
[ 42.127262] md: ... autorun DONE.
[ 42.160627] md0: unknown partition table
[ 45.991142] EXT4-fs (md0): barriers disabled
[ 45.999122] EXT4-fs (md0): mounted filesystem with ordered data mode. Opts:
[ 48.452862] EXT4-fs (md0): mounted filesystem with ordered data mode. Opts: barrier=1
[ 50.658451] EXT4-fs (md0): re-mounted. Opts: (null)
[ 56.186358] sd 1:0:0:0: Attached scsi generic sg0 type 0
[ 56.191918] sd 2:0:0:0: Attached scsi generic sg1 type 0
[ 56.197487] sd 3:0:0:0: Attached scsi generic sg2 type 0
[ 56.451316] md1: unknown partition table
[ 56.503910] Adding 2097084k swap on /dev/md1. Priority:-1 extents:1 across:2097084k
[ 57.981887] Btrfs loaded
[ 58.055705] 2015-12-4 4:3:34 UTC
[ 58.058976] synobios: load, major number 201
[ 58.063296] Brand: Synology
[ 58.066154] Model: DS-411+II
[ 62.993923] md: md2 stopped.
[ 63.012258] md: bind<sdc3>
[ 63.015352] md: bind<sdd3>
[ 63.018384] md: bind<sdb3>
[ 63.023658] md/raid:md2: device sdb3 operational as raid disk 1
[ 63.029630] md/raid:md2: device sdd3 operational as raid disk 3
[ 63.035616] md/raid:md2: device sdc3 operational as raid disk 2
[ 63.042488] md/raid:md2: allocated 4346kB
[ 63.046887] md/raid:md2: raid level 5 active with 3 out of 4 devices, algorithm 2
[ 63.054434] RAID conf printout:
[ 63.054447] --- level:5 rd:4 wd:3
[ 63.054456] disk 1, o:1, dev:sdb3
[ 63.054462] disk 2, o:1, dev:sdc3
[ 63.054468] disk 3, o:1, dev:sdd3
[ 63.054560] md2: detected capacity change from 0 to 8987271954432
[ 63.089666] md2: unknown partition table
[ 67.300535] EXT4-fs (dm-4): barriers disabled
[ 67.414398] EXT4-fs (dm-4): mounted filesystem with writeback data mode. Opts: usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv0,synoacl,data=writeback,oldalloc
[ 68.172962] EXT4-fs (dm-3): barriers disabled
[ 68.992074] JBD2: journal transaction 9300484 on dm-3-8 is corrupt.
[ 68.998374] EXT4-fs (dm-3): error loading journal
[ 70.205127] ata2.00: configured for UDMA/133
[ 70.209506] ata2: EH complete
[ 70.215114] ata4.00: configured for UDMA/133
[ 70.219465] ata4: EH complete
[ 70.222530] ata3.00: configured for UDMA/133
[ 70.226879] ata3: EH complete
[ 74.608244] ata2.00: configured for UDMA/133
[ 74.612659] ata2: EH complete
[ 74.864621] ata3.00: configured for UDMA/133
[ 74.868967] ata3: EH complete
[ 75.224228] ata4.00: configured for UDMA/133
[ 75.228584] ata4: EH complete
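Every md array in that excerpt is running one member short: md0 (mounted as the root filesystem) and md1 (swap) are RAID1 mirrors across all four disks, and md2 is the RAID5 data array. ata1 reports "SATA link down", so the failed disk in bay 1 appears not to be detected at all. A minimal sketch that flags degraded arrays from a saved dmesg; the regex covers only the two message forms shown above.

```python
import re

# Matches the RAID1 form ("md/raid1:md0: active with 3 out of 4 mirrors")
# and the RAID4/5/6 form ("md/raid:md2: raid level 5 active with 3 out of 4 devices").
MD_ACTIVE = re.compile(
    r"md/raid\d*:(?P<md>md\d+): .*?active with (?P<working>\d+) out of (?P<total>\d+)"
)


def degraded_arrays(dmesg_text: str) -> dict[str, tuple[int, int]]:
    """Map each md device that started with missing members to (working, total)."""
    degraded = {}
    for m in MD_ACTIVE.finditer(dmesg_text):
        working, total = int(m.group("working")), int(m.group("total"))
        if working < total:
            degraded[m.group("md")] = (working, total)
    return degraded
```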
Code:
> fdisk -l
fdisk: device has more than 2^32 sectors, can't use all of them
Disk /dev/sdb: 2199.0 GB, 2199023255040 bytes
255 heads, 63 sectors/track, 267349 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sdb1 1 267350 2147483647+ ee EFI GPT
fdisk: device has more than 2^32 sectors, can't use all of them
Disk /dev/sdc: 2199.0 GB, 2199023255040 bytes
255 heads, 63 sectors/track, 267349 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sdc1 1 267350 2147483647+ ee EFI GPT
fdisk: device has more than 2^32 sectors, can't use all of them
Disk /dev/sdd: 2199.0 GB, 2199023255040 bytes
255 heads, 63 sectors/track, 267349 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sdd1 1 267350 2147483647+ ee EFI GPT
Disk /dev/synoboot: 128 MB, 128450560 bytes
255 heads, 63 sectors/track, 15 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/synoboot1 * 1 2 16033+ 83 Linux
/dev/synoboot2 3 14 96390 83 Linux
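The "more than 2^32 sectors" warning is expected: these 3 TB disks (5860533168 sectors of 512 bytes in the dmesg output) are partitioned with GPT, and fdisk caps them at 2^32 - 1 sectors, which is exactly the 2199.0 GB it reports. The single "ee EFI GPT" entry is GPT's protective MBR, so this fdisk output says nothing about the real partitions (dmesg lists sdX1 to sdX3). The arithmetic as a small Python check; the names are illustrative.

```python
SECTOR_BYTES = 512
# fdisk on this NAS stops at the largest 32-bit sector count, 2^32 - 1
MBR_MAX_SECTORS = 2**32 - 1


def mbr_limit_bytes(sector_bytes: int = SECTOR_BYTES) -> int:
    """Capacity fdisk reports once a disk exceeds the 32-bit sector limit."""
    return MBR_MAX_SECTORS * sector_bytes


def needs_gpt(disk_sectors: int) -> bool:
    """True if the disk is larger than a 32-bit sector count can describe."""
    return disk_sectors > MBR_MAX_SECTORS


DISK_SECTORS = 5_860_533_168  # every 3 TB disk in the dmesg excerpt
print(mbr_limit_bytes())  # 2199023255040, fdisk's "2199.0 GB"
print(DISK_SECTORS * SECTOR_BYTES)  # 3000592982016, the real disk size
print(needs_gpt(DISK_SECTORS))  # True
```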
Code:
> pvdisplay
--- Physical volume ---
PV Name /dev/md2
VG Name vg1
PV Size 8.17 TB / not usable 2.44 MB
Allocatable yes
PE Size (KByte) 4096
Total PE 2142732
Free PE 438793
Allocated PE 1703939
PV UUID ki4s01-uXuM-IDOl-x2Tb-s9dX-y2pf-iHDE3K
Code:
> vgdisplay
--- Volume group ---
VG Name vg1
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 8
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 3
Open LV 2
Max PV 0
Cur PV 1
Act PV 1
VG Size 8.17 TB
PE Size 4.00 MB
Total PE 2142732
Alloc PE / Size 1703939 / 6.50 TB
Free PE / Size 438793 / 1.67 TB
VG UUID VtDoSS-Rc3x-2q4q-DOig-ellf-p1sS-bkKR61
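The pvdisplay and vgdisplay numbers hang together once extents are multiplied by the 4 MiB PE size, and they only work out if LVM's "TB" and "MB" in this output are binary units (TiB, MiB). A small Python check, with the md2 capacity taken from the dmesg excerpt; the names are illustrative.

```python
MIB = 1024**2
TIB = 1024**4
PE_BYTES = 4 * MIB  # "PE Size 4.00 MB" in vgdisplay


def extents_to_tib(extents: int) -> float:
    return extents * PE_BYTES / TIB


MD2_BYTES = 8_987_271_954_432  # md2 capacity from dmesg
TOTAL_PE, ALLOC_PE, FREE_PE = 2_142_732, 1_703_939, 438_793

# Part of md2 not covered by whole 4 MiB extents (pvdisplay's "not usable 2.44 MB")
not_usable_mib = (MD2_BYTES - TOTAL_PE * PE_BYTES) / MIB

print(round(extents_to_tib(TOTAL_PE), 2))  # 8.17
print(round(extents_to_tib(ALLOC_PE), 2))  # 6.5
print(round(extents_to_tib(FREE_PE), 2))  # 1.67
print(round(not_usable_mib, 2))  # 2.44
```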
Code:
> lvdisplay
--- Logical volume ---
LV Name /dev/vg1/syno_vg_reserved_area
VG Name vg1
LV UUID hoRzk1-zeP0-FMM1-fPvu-gGKe-Bt5W-FQy1rk
LV Write Access read/write
LV Status available
# open 0
LV Size 12.00 MB
Current LE 3
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 384
Block device 253:0
--- Logical volume ---
LV Name /dev/vg1/volume_1
VG Name vg1
LV UUID lWYX0l-lth9-u8y1-M4ce-g5c9-XaUH-h880ic
LV Write Access read/write
LV Status available
# open 1
LV Size 5.50 TB
Current LE 1441792
Segments 2
Allocation inherit
Read ahead sectors auto
- currently set to 4096
Block device 253:1
--- Logical volume ---
LV Name /dev/vg1/volume_2
VG Name vg1
LV UUID kI1EpE-oFiQ-va02-ABJG-f9WA-7U5X-UBPi31
LV Write Access read/write
LV Status available
# open 1
LV Size 1.00 TB
Current LE 262144
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 4096
Block device 253:2
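The three logical volumes account for every allocated extent in vgdisplay (3 + 1441792 + 262144 = 1703939), which suggests the LVM layout itself came through intact and the damage sits inside the ext4 filesystem on volume_1. As a sketch, with the extent counts copied from lvdisplay:

```python
PE_MIB = 4  # extent size from vgdisplay

# "Current LE" per logical volume, from lvdisplay
LV_EXTENTS = {
    "syno_vg_reserved_area": 3,
    "volume_1": 1_441_792,
    "volume_2": 262_144,
}
ALLOCATED_PE = 1_703_939  # "Alloc PE" from vgdisplay


def lv_size_mib(name: str) -> int:
    return LV_EXTENTS[name] * PE_MIB


print(sum(LV_EXTENTS.values()) == ALLOCATED_PE)  # True
print(lv_size_mib("volume_1"))  # 5767168 MiB, i.e. 5.50 TiB
```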
Code:
> lvs
LV VG Attr LSize Origin Snap% Move Log Copy% Convert
syno_vg_reserved_area vg1 -wi-a- 12.00M
volume_1 vg1 -wi-ao 5.50T
volume_2 vg1 -wi-ao 1.00T
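The Attr column is worth reading here: the sixth character "o" means the device is open, and volume_1 shows -wi-ao even though /volume1 is not mounted. That open handle fits the "in use" warning from fsck and the "busy" error from mount further down. A minimal decoder for the six-character attribute string printed by this (older) LVM version, based on the lvs man page; newer LVM releases print more characters, which this sketch ignores.

```python
PERMISSIONS = {"w": "writeable", "r": "read-only"}
ALLOCATION = {"i": "inherited", "c": "contiguous", "l": "cling", "n": "normal", "a": "anywhere"}
STATE = {"a": "active", "s": "suspended", "-": "inactive"}


def decode_lv_attr(attr: str) -> dict:
    """Decode the first six lv_attr characters; unknown codes are passed through."""
    if len(attr) < 6:
        raise ValueError(f"expected at least 6 attribute characters, got {attr!r}")
    return {
        "plain_volume": attr[0] == "-",  # not a mirror, snapshot, origin, ...
        "permissions": PERMISSIONS.get(attr[1], attr[1]),
        "allocation": ALLOCATION.get(attr[2].lower(), attr[2]),  # uppercase = locked
        "fixed_minor": attr[3] == "m",
        "state": STATE.get(attr[4], attr[4]),
        "open": attr[5] == "o",
    }


print(decode_lv_attr("-wi-ao")["open"])  # True
```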
Code:
> cat /etc/fstab
none /proc proc defaults 0 0
/dev/root / ext4 defaults 1 1
/dev/mapper/vol1-origin /volume1 ext4 usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv0,synoacl 0 0
/dev/mapper/vol2-origin /volume2 ext4 usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv0,synoacl 0 0
Code:
> mount
/dev/root on / type ext4 (defaults)
/sys on /sys type sysfs (0)
none on /dev/pts type devpts (gid=4,mode=620)
/tmp on /tmp type tmpfs (0)
/run on /run type tmpfs (rw,nosuid,nodev,relatime,mode=755)
/dev/shm on /dev/shm type tmpfs (rw,nosuid,nodev,relatime)
none on /sys/fs/cgroup type tmpfs (uid=0,gid=0,mode=0755,size=4k)
/dev/bus/usb on /proc/bus/usb type bind (bind)
none on /sys/kernel/debug type debugfs (0)
/dev/mapper/vol2-origin on /volume2 type ext4 (usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv0,synoacl)
securityfs on /sys/kernel/security type securityfs (0)
Code:
> df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/root 2385528 888008 1378736 40% /
/tmp 507440 656 506784 1% /tmp
/run 507440 2804 504636 1% /run
/dev/shm 507440 0 507440 0% /dev/shm
none 4 0 4 0% /sys/fs/cgroup
/dev/bus/usb 503400 4 503396 1% /proc/bus/usb
/dev/mapper/vol2-origin 1056758196 39557092 1017082320 4% /volume2
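Comparing /etc/fstab with the mount (or df) output is the quickest way to spot the missing volume: both volumes are listed in fstab, but only /volume2 is mounted. A minimal sketch that does the comparison on captured text; it assumes the "<device> on <mountpoint> type ..." format printed by mount on this box and only considers fstab entries backed by a /dev device (this mount output does not list /proc at all).

```python
def fstab_mountpoints(fstab_text: str) -> list[str]:
    points = []
    for line in fstab_text.splitlines():
        fields = line.split()
        # Skip blanks, comments and pseudo filesystems; field 2 is the mount point
        if len(fields) >= 2 and fields[0].startswith("/dev/"):
            points.append(fields[1])
    return points


def mounted_points(mount_text: str) -> set[str]:
    points = set()
    for line in mount_text.splitlines():
        fields = line.split()
        # "<device> on <mountpoint> type <fstype> (<options>)"
        if len(fields) >= 3 and fields[1] == "on":
            points.add(fields[2])
    return points


def missing_mounts(fstab_text: str, mount_text: str) -> list[str]:
    """fstab mount points that do not appear in the mount output."""
    mounted = mounted_points(mount_text)
    return [p for p in fstab_mountpoints(fstab_text) if p not in mounted]
```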
Code:
> fsck.ext4 -n /dev/vg1/volume_1
e2fsck 1.42.6 (21-Sep-2012)
Warning! /dev/vg1/volume_1 is in use.
Warning: skipping journal recovery because doing a read-only filesystem check.
1.42.6-4482: is cleanly umounted, 639158/184549376 files, 1075425181/1476395008 blocks
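"Cleanly umounted" here most likely just reflects the clean flag in the superblock: no -f was given, so no full check ran, and with -n e2fsck skips journal recovery (as the warning says), so it never touches the corrupt transaction. The summary line still shows how full the filesystem is. A small parser, assuming the 4 KiB block size implied by 1476395008 blocks on a 5.50 TiB volume; the names are illustrative.

```python
import re

SUMMARY = re.compile(
    r"(?P<files_used>\d+)/(?P<files_total>\d+) files, "
    r"(?P<blocks_used>\d+)/(?P<blocks_total>\d+) blocks"
)


def parse_e2fsck_summary(line: str, block_bytes: int = 4096) -> dict:
    """Extract inode and block usage from an e2fsck summary line."""
    m = SUMMARY.search(line)
    if m is None:
        raise ValueError("no e2fsck summary found")
    used, total = int(m.group("blocks_used")), int(m.group("blocks_total"))
    return {
        "inodes_used": int(m.group("files_used")),
        "inodes_total": int(m.group("files_total")),
        "size_tib": total * block_bytes / 1024**4,
        "used_pct": 100 * used / total,
    }


s = parse_e2fsck_summary(
    "1.42.6-4482: is cleanly umounted, 639158/184549376 files, 1075425181/1476395008 blocks"
)
print(s["size_tib"], round(s["used_pct"], 1))  # 5.5 72.8
```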
Code:
> mount -t ext4 -o ro,noload /dev/vg1/volume_1 /vol1tst/
mount: mounting /dev/vg1/volume_1 on /vol1tst/ failed: Device or resource busy
Code:
> dmesg | tail
[ 74.612659] ata2: EH complete
[ 74.864621] ata3.00: configured for UDMA/133
[ 74.868967] ata3: EH complete
[ 75.224228] ata4.00: configured for UDMA/133
[ 75.228584] ata4: EH complete
[ 75.238476] loop: module loaded
[ 83.593195] findhostd uses obsolete (PF_INET,SOCK_PACKET)
[ 1616.734365] EXT4-fs (dm-3): barriers disabled
[ 1617.366499] JBD2: journal transaction 9300484 on dm-3-8 is corrupt.
[ 1617.372799] EXT4-fs (dm-3): error loading journal
Here are some of the more interesting finds from my search for a way to recover from this nightmare.
http://forum.synology.com/wiki/index.ph ... s_on_Linux
https://serverfault.com/questions/4906/ ... sk-cloning
https://superuser.com/questions/605435/ ... ng-journal
http://forum.synology.com/enu/viewtopic ... 39&t=61747
https://serverfault.com/questions/16702 ... r-bad-bloc
May no one else have to suffer through something like this.