Bug or disk fail?

Post by ginopilotino » Thu Dec 16, 2010 11:11 am

I have a Synology DS211 with two Samsung HD204UI drives in a RAID 1 configuration.

Sometimes the RAID degrades and I have to repair it.

This is what I get in dmesg:
Code:
[87145.150000] ata1: exception Emask 0x10 SAct 0x0 SErr 0x180000 action 0x6 frozen
[87145.160000] ata1: edma_err_cause=00000020 pp_flags=00000000, SError=00180000
[87145.170000] ata1: SError: { 10B8B Dispar }
[87145.170000] ata1: hard resetting link
[87145.690000] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl F300)
[87145.730000] ata1.00: configured for UDMA/133
[87145.730000] ata1: EH complete
[88168.680000] ata1: exception Emask 0x10 SAct 0x0 SErr 0x10002 action 0xe frozen
[88168.690000] ata1: edma_err_cause=00000008 pp_flags=00000000, dev disconnect
[88168.700000] ata1: SError: { RecovComm PHYRdyChg }
[88168.700000] ata1: hard resetting link
[88169.460000] ata1: SATA link down (SStatus 0 SControl F300)
[88171.460000] ata1: hard resetting link
[88171.810000] ata1: SATA link down (SStatus 0 SControl F300)
[88173.810000] ata1: hard resetting link
[88174.160000] ata1: SATA link down (SStatus 0 SControl F300)
[88174.160000] ata1.00: disabled
[88174.160000] ata1: EH complete
[88174.170000] sd 0:0:0:0: rejecting I/O to offline device
[88174.170000] ata1.00: detaching (SCSI 0:0:0:0)
[88174.180000] sd 0:0:0:0: rejecting I/O to offline device
[88174.180000] sd 0:0:0:0: rejecting I/O to offline device
[88174.180000] end_request: I/O error, dev sda, sector 4980608
[88174.180000] md: super_written gets error=-5, uptodate=0
[88174.180000] raid1: Disk failure on sda1, disabling device.
[88174.180000]         Operation continuing on 1 devices
[88174.210000] RAID1 conf printout:
[88174.210000]  --- wd:1 rd:2
[88174.210000]  disk 0, wo:0, o:1, dev:sdb1
[88174.210000]  disk 1, wo:1, o:0, dev:sda1
[88174.220000] RAID1 conf printout:
[88174.220000]  --- wd:1 rd:2
[88174.220000]  disk 0, wo:0, o:1, dev:sdb1
[88174.280000] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[88174.310000] sd 0:0:0:0: [sda] Result: hostbyte=0x04 driverbyte=0x00
[88174.320000] sd 0:0:0:0: [sda] Stopping disk
[88174.380000] sd 0:0:0:0: [sda] START_STOP FAILED
[88174.380000] sd 0:0:0:0: [sda] Result: hostbyte=0x04 driverbyte=0x00
[88174.440000] SynoCheckRdevIsWorking (7636): remove active disk sda3 from md2 raid_disks 2 mddev->degraded 0 mddev->level 1
[88174.450000] raid1: Disk failure on sda3, disabling device.
[88174.450000]         Operation continuing on 1 devices
[88174.460000] syno_hot_remove_disk (7536): cannot remove active disk sda3 from md2 ... rdev->raid_disk 0 pending 0
[88174.510000] RAID1 conf printout:
[88174.510000]  --- wd:1 rd:2
[88174.510000]  disk 0, wo:1, o:0, dev:sda3
[88174.520000]  disk 1, wo:0, o:1, dev:sdb3
[88174.520000] RAID1 conf printout:
[88174.520000]  --- wd:1 rd:2
[88174.530000]  disk 1, wo:0, o:1, dev:sdb3
[88175.480000] SynoCheckRdevIsWorking (7636): remove active disk sda3 from md2 raid_disks 2 mddev->degraded 1 mddev->level 1
[88175.490000] md: unbind<sda3>
[88175.490000] md: export_rdev(sda3)
[88175.530000] SynoCheckRdevIsWorking (7636): remove active disk sda2 from md1 raid_disks 2 mddev->degraded 0 mddev->level 1
[88175.540000] raid1: Disk failure on sda2, disabling device.
[88175.540000]         Operation continuing on 1 devices
[88175.550000] syno_hot_remove_disk (7536): cannot remove active disk sda2 from md1 ... rdev->raid_disk 1 pending 0
[88175.580000] RAID1 conf printout:
[88175.580000]  --- wd:1 rd:2
[88175.580000]  disk 0, wo:0, o:1, dev:sdb2
[88175.580000]  disk 1, wo:1, o:0, dev:sda2
[88175.590000] RAID1 conf printout:
[88175.590000]  --- wd:1 rd:2
[88175.590000]  disk 0, wo:0, o:1, dev:sdb2
[88176.570000] SynoCheckRdevIsWorking (7636): remove active disk sda2 from md1 raid_disks 2 mddev->degraded 1 mddev->level 1
[88176.580000] md: unbind<sda2>
[88176.580000] md: export_rdev(sda2)
[88176.620000] SynoCheckRdevIsWorking (7636): remove active disk sda1 from md0 raid_disks 2 mddev->degraded 1 mddev->level 1
[88176.630000] md: unbind<sda1>
[88176.630000] md: export_rdev(sda1)
[88580.030000] ata1: exception Emask 0x10 SAct 0x0 SErr 0x4010000 action 0xe frozen
[88580.030000] ata1: edma_err_cause=00000010 pp_flags=00000000, dev connect
[88580.040000] ata1: SError: { PHYRdyChg DevExch }
[88580.040000] ata1: hard resetting link
[88580.800000] ata1: SATA link down (SStatus 0 SControl F300)
[88580.800000] ata1: EH complete
[88594.340000] ata1: exception Emask 0x10 SAct 0x0 SErr 0x4010000 action 0xe frozen
[88594.350000] ata1: edma_err_cause=00000010 pp_flags=00000000, dev connect
[88594.360000] ata1: SError: { PHYRdyChg DevExch }
[88594.360000] ata1: hard resetting link
[88595.120000] ata1: SATA link down (SStatus 0 SControl F300)
[88595.120000] ata1: EH complete
[88599.120000] ata1: exception Emask 0x10 SAct 0x0 SErr 0x4010000 action 0xe frozen
[88599.130000] ata1: edma_err_cause=00000010 pp_flags=00000000, dev connect
[88599.140000] ata1: SError: { PHYRdyChg DevExch }
[88599.140000] ata1: limiting SATA link speed to 1.5 Gbps
[88599.150000] ata1: hard resetting link
[88599.900000] ata1: SATA link down (SStatus 0 SControl F300)
[88599.900000] ata1: EH complete
[88600.700000] ata1: exception Emask 0x10 SAct 0x0 SErr 0x4010000 action 0xe frozen
[88600.710000] ata1: edma_err_cause=00000010 pp_flags=00000000, dev connect
[88600.720000] ata1: SError: { PHYRdyChg DevExch }
[88600.720000] ata1: limiting SATA link speed to 1.5 Gbps
[88600.730000] ata1: hard resetting link
[88602.080000] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl F300)
[88602.100000] ata1.00: ATA-8: SAMSUNG HD204UI, 1AQ10001, max UDMA/133
[88602.100000] ata1.00: 3907029168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[88602.130000] ata1.00: configured for UDMA/133
[88602.130000] ata1: EH complete
[88602.130000] scsi 0:0:0:0: Direct-Access     SAMSUNG HD204UI          1AQ1 PQ: 0 ANSI: 5
[88602.150000] sd 0:0:0:0: [sda] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
[88602.160000] sd 0:0:0:0: Attached scsi generic sg0 type 0
[88602.190000] sd 0:0:0:0: [sda] Write Protect is off
[88602.190000] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[88602.190000] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[88602.220000]  sda: sda1 sda2 sda3
[88602.240000] sd 0:0:0:0: [sda] Attached SCSI disk
[88602.670000] fuse init (API version 7.13)
[88603.790000] ata1.00: configured for UDMA/133
[88603.790000] ata1: EH complete
[88603.810000] ata2.00: configured for UDMA/133
[88603.810000] ata2: EH complete
[89790.720000] md: bind<sda1>
[89790.770000] RAID1 conf printout:
[89790.770000]  --- wd:1 rd:2
[89790.770000]  disk 0, wo:0, o:1, dev:sdb1
[89790.780000]  disk 1, wo:1, o:1, dev:sda1
[89790.780000] md: recovery of RAID array md0
[89790.790000] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[89790.790000] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[89790.800000] md: using 128k window, over a total of 2490176 blocks.
[89791.090000] md: bind<sda2>
[89791.180000] RAID1 conf printout:
[89791.180000]  --- wd:1 rd:2
[89791.180000]  disk 0, wo:0, o:1, dev:sdb2
[89791.190000]  disk 1, wo:1, o:1, dev:sda2
[89791.200000] md: delaying recovery of md1 until md0 has finished (they share one or more physical units)
[89794.220000] md: bind<sda3>
[89794.780000] RAID1 conf printout:
[89794.780000]  --- wd:1 rd:2
[89794.780000]  disk 0, wo:1, o:1, dev:sda3
[89794.790000]  disk 1, wo:0, o:1, dev:sdb3
[89794.790000] md: delaying recovery of md2 until md1 has finished (they share one or more physical units)
[89794.800000] md: delaying recovery of md1 until md0 has finished (they share one or more physical units)
[89827.410000] md: md0: recovery done.
[89827.430000] md: recovery of RAID array md1
[89827.430000] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[89827.440000] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[89827.450000] md: using 128k window, over a total of 2097088 blocks.
[89827.460000] md: delaying recovery of md2 until md1 has finished (they share one or more physical units)
[89827.510000] RAID1 conf printout:
[89827.510000]  --- wd:2 rd:2
[89827.510000]  disk 0, wo:0, o:1, dev:sdb1
[89827.520000]  disk 1, wo:0, o:1, dev:sda1
[89849.450000] md: md1: recovery done.
[89849.470000] md: recovery of RAID array md2
[89849.470000] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[89849.480000] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[89849.490000] md: using 128k window, over a total of 1948793280 blocks.
[89849.540000] RAID1 conf printout:
[89849.540000]  --- wd:2 rd:2
[89849.540000]  disk 0, wo:0, o:1, dev:sdb2
[89849.550000]  disk 1, wo:0, o:1, dev:sda2
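
The bracketed names on the SError lines of the log are a decoding of the hex SErr value printed two lines above them. A minimal Python sketch of that decoding follows. The bit-to-name table is written from memory of the Linux libata driver and should be treated as an assumption, and decode_serror is a hypothetical helper name, not part of any tool.

```python
# Assumed mapping of SATA SError register bits to the short names that the
# Linux libata driver prints, listed in ascending bit order.
SERROR_BITS = [
    (0, "RecovData"),
    (1, "RecovComm"),
    (8, "UnrecovData"),
    (9, "Persist"),
    (10, "Proto"),
    (11, "HostInt"),
    (16, "PHYRdyChg"),
    (17, "PHYInt"),
    (18, "CommWake"),
    (19, "10B8B"),
    (20, "Dispar"),
    (21, "BadCRC"),
    (22, "Handshk"),
    (23, "LinkSeq"),
    (24, "TrStaTrns"),
    (25, "UnrecFIS"),
    (26, "DevExch"),
]


def decode_serror(value):
    """Return the names of all SError bits that are set in value."""
    return [name for bit, name in SERROR_BITS if value & (1 << bit)]


if __name__ == "__main__":
    # The three distinct SErr values that appear in the dmesg output above.
    for raw in (0x180000, 0x10002, 0x4010000):
        names = " ".join(decode_serror(raw))
        print(f"SErr 0x{raw:x}: {{ {names} }}")
```

For the three values in the log this gives the same name sets the kernel printed (10B8B Dispar, RecovComm PHYRdyChg, PHYRdyChg DevExch). Flags like these describe the SATA link itself (decode errors, the PHY ready state changing, a device exchange) rather than unreadable sectors, so they are usually read as a hint toward the drive's connection, seating, or power, although on their own they do not prove it.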
ginopilotino
Novice
Posts: 56
Joined: Thu Nov 18, 2010 8:53 pm
Location: Italia

Re: Bug or disk fail?

Post by maxxfi » Thu Dec 16, 2010 12:23 pm

Does the DS211 use SATA cables?
Could they be defective?
DS-411 (DSM 4.2-3202) w/ 2x WD10EFRX + 1x HD154UI
DS-106j (DSM 3.0-1357), PATA-to-SATA adapter, 2.5" HM250HI
maxxfi
Architect
Posts: 4315
Joined: Sun Dec 27, 2009 12:13 pm
Location: Espoo, Finland

Re: Bug or disk fail?

Post by markglh » Thu Dec 16, 2010 1:35 pm

I've had this happen to me too after shutting down and booting back up :(

This is in addition to my other problem:
http://forum.synology.com/enu/viewtopic.php?f=19&t=31364

Related, maybe?
markglh
Beginner
Posts: 21
Joined: Thu Dec 09, 2010 8:41 am

Re: Bug or disk fail?

Post by ginopilotino » Thu Dec 16, 2010 4:57 pm

Maybe, but spin-up time has never failed for me. However, one of my disks reports a throughput performance value below 100.
It's always the same disk that degrades my RAID 1.
I'm testing my HD204UI thoroughly with the Samsung tool.

Re: Bug or disk fail?

Post by ginopilotino » Thu Dec 16, 2010 4:59 pm

maxxfi wrote: Does DS211 use SATA cables?
Can it be they are defective?


There are no SATA cables in the DS211.

Re: Bug or disk fail?

Post by ginopilotino » Thu Dec 16, 2010 5:47 pm

Is my DS211 buggy? Is my Samsung drive buggy? Or do I have to wait for new firmware?

Re: Bug or disk fail?

Post by ginopilotino » Thu Dec 16, 2010 9:13 pm

Could it be due to the CCTL/TLER/ERC issue?
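
For context, ERC (which WD calls TLER and Samsung calls CCTL) limits how long a drive keeps retrying a bad sector before it reports an error to the RAID layer. One way to check it is smartctl's SCT Error Recovery Control log option. The sketch below only builds the command line; scterc_command is a hypothetical helper, and whether smartctl is available on the NAS and whether this drive honours the setting are both assumptions.

```python
def scterc_command(device, read_ds=None, write_ds=None):
    """Build a smartctl argv for SCT Error Recovery Control.

    With no timeouts the command only reads the current setting.
    Timeouts are given in deciseconds, so 70 means 7.0 seconds.
    """
    if read_ds is None and write_ds is None:
        option = "scterc"
    elif read_ds is None or write_ds is None:
        raise ValueError("give both timeouts or neither")
    else:
        option = f"scterc,{read_ds},{write_ds}"
    return ["smartctl", "-l", option, device]


if __name__ == "__main__":
    # /dev/sda is used here only because it is the disk named in the log.
    print(" ".join(scterc_command("/dev/sda")))          # query the setting
    print(" ".join(scterc_command("/dev/sda", 70, 70)))  # set 7 s read/write
```

The returned list can be passed to subprocess.run on a machine where smartctl is installed. Note that the dmesg output earlier in the thread shows link resets and a device disconnect rather than a command timing out, so ERC may not be the relevant factor here.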

Re: Bug or disk fail?

Post by ginopilotino » Fri Dec 17, 2010 7:34 pm

UPDATE

I disabled hard disk hibernation. About 24 hours without problems so far.

Re: Bug or disk fail?

Post by markglh » Fri Dec 17, 2010 8:16 pm

Thanks for the update. If this happens to me again, I'll disable that. Not an ideal solution though, IMHO.

Re: Bug or disk fail?

Post by ginopilotino » Fri Dec 17, 2010 8:27 pm

markglh wrote: thanks for the update. If this happens again to me I'll disable that. Not an ideal solution tho imho.


If it happens again, check what appears in dmesg.
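
When checking dmesg after a degrade, the lines that matter most are the link resets, the link down/up messages, and the md failure and recovery messages. A minimal Python sketch that picks those out of a dmesg dump is below. The function name triage, the event labels, and the patterns are illustrative assumptions matched against the messages quoted earlier in this thread; other kernel versions may word them differently.

```python
import re

# Patterns modelled on the messages in the dmesg output quoted in this thread.
PATTERNS = {
    "link_reset": re.compile(r"ata\d+: hard resetting link"),
    "link_down": re.compile(r"ata\d+: SATA link down"),
    "link_up": re.compile(r"ata\d+: SATA link up"),
    "raid_failure": re.compile(r"raid1: Disk failure on \w+"),
    "recovery_done": re.compile(r"md: md\d+: recovery done"),
}

# A dmesg line looks like "[88168.700000] message text".
LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")

# A few lines copied from the log in the first post.
SAMPLE = """[88168.700000] ata1: hard resetting link
[88169.460000] ata1: SATA link down (SStatus 0 SControl F300)
[88174.180000] raid1: Disk failure on sda1, disabling device.
[88174.180000]         Operation continuing on 1 devices
[89827.410000] md: md0: recovery done."""


def triage(dmesg_text):
    """Return (timestamp, kind, message) for every line matching a pattern."""
    events = []
    for line in dmesg_text.splitlines():
        match = LINE.match(line)
        if not match:
            continue
        timestamp, message = float(match.group(1)), match.group(2)
        for kind, pattern in PATTERNS.items():
            if pattern.search(message):
                events.append((timestamp, kind, message))
                break
    return events


if __name__ == "__main__":
    for timestamp, kind, message in triage(SAMPLE):
        print(f"{timestamp:>14.2f}  {kind:<14} {message}")
```

To use it on real output, save the dmesg text to a file and pass its contents to triage instead of SAMPLE. Seeing link_reset and link_down events immediately before a raid_failure, as in the sample, suggests the array degraded because the disk dropped off the bus rather than because md found bad data.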

Re: Bug or disk fail?

Post by xcopy » Fri Dec 17, 2010 9:34 pm

I'm guessing that the Samsung drives are causing your problems. I'm betting disk FAILURE... :(

I've had 3 of the new F4 2TB drives fail in the last 30 days: 2 of them failed within 1-2 days of first booting and loading 1TB+ of data, and the other lasted 30 days total, from start to ugly end. I was using the drives infrequently as backups in external hard drive docks, and the issues began when they were booted. Windows (7 Pro x64) told me there were errors (which means bad blocks) and would ask me whether I wanted to scan first or continue without scanning. The scan failed every time. I'm sure what you're seeing is the drives coming out of hibernation and reporting bad blocks. The DS may be overriding it, but I'm 99% sure the drives (and maybe the data) are on their way out.

Samsung is trying to keep the news quiet (what a surprise), but search for "samsung f4 firmware" or something like that and you'll find lots of info. It's not happening to everyone right away, but sooner or later the drive will fail when certain conditions are met. I had my problems confirmed by Samsung support (in a poorly worded, broken-English email) telling me to patch the drives and telling me data loss, if any, was my problem. The good news is that IF you can patch the drives now, you shouldn't lose any/much data. I wasn't that lucky, and some files and folders were so corrupted that I couldn't read, access, or delete them. The patch process involves some work I guess, but I would immediately back up everything you can if you haven't already.

The firmware problem means every single Samsung F4 drive is bad. Not some, but all, as in 100% of them are bad.

Samsung released the patch a week ago to "fix" this but they've said they won't have clean drives in the sales channel until February. Until then, everyone is going to be buying defective drives, including me; I have two more drives on the way, but I'll patch them before using them. Regardless of whether you think you do or do not have the problem, you DO have it; it's guaranteed.

Good luck.
xcopy
Experienced
Posts: 121
Joined: Fri Dec 03, 2010 4:34 pm

Re: Bug or disk fail?

Post by ginopilotino » Sat Dec 18, 2010 2:48 pm

The patch addresses the bug in the SMART command. No other problems are known for this drive. Since I have tested it many times on my PC, I'm sure the problem is not the drive itself; it may be the DS or an incompatibility. I re-enabled hibernation yesterday and have had no problems so far.

Re: Bug or disk fail?

Post by ginopilotino » Sun Dec 19, 2010 12:42 pm

I think it was a mechanical problem. When I tightened the screws fully, the hard drive shifted a little, so I think that was the cause. I have now tightened the screws less so that the drive stays straight. No problems in over 2 days.

Re: Bug or disk fail?

Post by xcopy » Wed Dec 22, 2010 11:37 pm

ginopilotino wrote:The patch is related to the bug in the smart command. No other problems are known for this drive. As I tested it a lot of time on my pc, I'm sure the problem is not related to the drive, but may be to the ds or to an incompatibility. I enabled hibernation yesterday but no problem until now.



Sorry, but I completely disagree with that statement. There absolutely is a problem with these drives and it has nothing to do with a DS. There are tons of people that have reported data loss problems, including me and I never ran any of the smart tests. All three drives failed running in USB 2.0 docks.

Go read reviews on any technical site if you doubt there's a problem. Samsung told me the data loss issues I experienced were related to the problem.

I hope the patch works, because I have two new drives in transit as I write this.

Re: Bug or disk fail?

Post by ginopilotino » Thu Dec 23, 2010 3:24 pm

There's no need to run a SMART test. The data loss is related to the SMART identify command. Apply the patch and live happily :)

