ESX 4.1.0 NFS VM Corruption

This room is for the discussion of how the Synology DiskStation can meet the storage needs for Virtual HyperVisors.

Postby tbone22 » Tue Jan 31, 2012 1:00 am

I have an ESX server (version 4.1.0) set up to use NFS on our DS1511+. The setup was pretty easy, and all of the VMs were running fine until we lost a drive [5] the other day. It showed up as defective in the log and as crashed in the Storage Manager, but since we have it configured for Hybrid RAID with 1-disk fault tolerance, everything stayed up, or so we thought. All 4 of the VMs on the ESX server suffered severe IO errors, resulting in massive data corruption. Upon rebooting, the VMs all went into a checkdisk at startup and found all kinds of issues. All but one of the machines wouldn't boot! The one that did boot had so many "bad image" exceptions while reading files that it was deemed worthless and had to be rebuilt.

The reason we purchased the DS1511+ was to ensure our VMs would survive a drive failure. These are mission-critical VMs! Can anyone offer any help on how to prevent this from happening again? I have a new drive and I'm going to install it in the DS1511+ tomorrow. I've completely rebuilt the VMs on the ESX server, and it seems to be running fine now. Will they all crash again if we have another drive failure after the RAID rebuilds?

Please help!

Thanks,

TBone
tbone22
I'm New!
Posts: 4
Joined: Tue Jan 31, 2012 12:40 am

Re: ESX 4.1.0 NFS VM Corruption

Postby hans_lenze » Wed Feb 01, 2012 11:13 pm

There are mission critical VMs! Can anyone offer any help as to how to prevent this from happening again.


Easy. Make backups.
DS411 (3x 2TB RAID5, 1x 128GB SSD), 2x HP Microserver ESXi 5.0
hans_lenze
Beginner
Posts: 29
Joined: Tue Dec 20, 2011 1:04 am

Re: ESX 4.1.0 NFS VM Corruption

Postby tbone22 » Thu Feb 02, 2012 4:03 am

Easy. Make backups.


Wow, excellent idea! Maybe we should just use a typewriter instead, that could work too, right?

Thankfully I have backups for all 4 VMs. This is not the first time this has happened, unfortunately. The whole purpose of purchasing a device like this, for us, was fault tolerance. I don't want the downtime while restoring from backups.

Has anyone else had corruption of their VMs when configured with ESX on NFS?
tbone22
I'm New!
Posts: 4
Joined: Tue Jan 31, 2012 12:40 am

Re: ESX 4.1.0 NFS VM Corruption

Postby gabi_cavaller » Wed Feb 08, 2012 5:38 pm

hans_lenze wrote:
There are mission critical VMs! Can anyone offer any help as to how to prevent this from happening again.


Easy. Make backups.


That's a ridiculous answer.

The whole point of tbone22 having the 1511 is for redundancy/FT and uptime, therefore, irrespective of one hard drive going down, the Synology device should still be running at optimum* performance, without the loss of any data.

I am sure that tbone22 knows that RAID is not a backup!

tbone22 - what are your VMs running and what disks are you using? I wonder if there is an NFS issue on this box that wouldn't show up with iSCSI.

G.
gabi_cavaller
Trainee
Posts: 13
Joined: Wed Feb 08, 2012 5:34 pm

Re: ESX 4.1.0 NFS VM Corruption

Postby hans_lenze » Fri Feb 10, 2012 1:12 am

I don't think it's a ridiculous answer.

The only way to prevent data loss in the event of a volume corruption or IO stream failure is to have a copy of the data in a different location. In other words: a backup.

I'll give a plausible reason for data loss in this case. Synology DSM uses kernel 2.6.32, and SHR uses MDADM for the RAID volumes, which are merged together using LVM; an EXT4 volume sits on top of that. A standard SATA disk available today has anywhere between 16 and 64 MB of on-disk write cache to enhance performance, as well as Native Command Queuing (NCQ). I'll assume on-disk write cache and NCQ are enabled (Synology default). MDADM has a queue as well, which can hold 1024 operations (Synology default for RAID5 sets).
The data for the LVM volume is written to MDADM. With this kernel version, LVM in combination with EXT3 or EXT4 is known to screw up some journalling when a disk drive fails, because of a lack of barrier passthrough.

This has been fixed in kernel 2.6.33.
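
To make the barrier point a bit more concrete, here is a toy Python model. It is only a sketch under assumptions: the names (`flush_order`, `crash_after`, the "journal-data"/"commit" labels) are made up for illustration, and it does not reproduce the actual kernel, LVM or EXT4 code paths. It assumes a write cache that may reorder writes freely unless a barrier separates them, and picks "reverse the group" as one deterministic bad ordering.

```python
BARRIER = "BARRIER"  # hypothetical marker for a write barrier in the queue


def flush_order(queued, honor_barriers):
    """Return one order in which a reordering write cache might persist writes.

    The cache is modeled as free to reorder writes inside a barrier-delimited
    group. Reversing each group is just one deterministic bad-case choice.
    """
    groups, current = [], []
    for item in queued:
        if item == BARRIER:
            if honor_barriers:
                groups.append(current)
                current = []
            # If barriers are not passed through, the marker is simply ignored.
        else:
            current.append(item)
    groups.append(current)

    order = []
    for group in groups:
        order.extend(reversed(group))
    return order


def journal_is_consistent(on_disk):
    """A commit record is only valid if the journal data it covers is on disk."""
    return "commit" not in on_disk or "journal-data" in on_disk


def crash_after(queued, honor_barriers, persisted):
    """Simulate a failure after `persisted` writes have reached the disk."""
    on_disk = set(flush_order(queued, honor_barriers)[:persisted])
    return journal_is_consistent(on_disk)


queued = ["journal-data", BARRIER, "commit"]
for honor in (True, False):
    results = [crash_after(queued, honor, n) for n in range(3)]
    print("barriers honored:" if honor else "barriers dropped:", results)
```

With barriers honored, every crash point leaves the journal consistent. With barriers dropped, there is a crash point where the commit record is on disk but the data it refers to is not, which is the kind of journal damage described above.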

Worst case: MDADM RAID5 write hole
In the worst case, the MDADM RAID set fails when a drive fails, because the write hole screws up the superblock persistency. That takes a significant part of the LVM volume with it, causing the LVM volume to crash and take all your important data with it. Synology support is usually able to restore most of your data, but you will encounter downtime. (You can also restore it yourself if you know how to use the command line.)
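
For anyone who hasn't run into the RAID5 write hole before, here is a small Python sketch of the general idea using plain XOR parity. It assumes a hypothetical 4-disk stripe (three data chunks plus parity) with made-up contents, and it illustrates the data-level write hole only, not the specific MDADM superblock behaviour mentioned above.

```python
from functools import reduce


def xor_blocks(*blocks):
    """XOR equally sized byte blocks together (RAID5-style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a hypothetical 4-disk RAID5: three data chunks plus parity.
d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks(d0, d1, d2)

# Healthy case: a lost chunk is rebuilt from the survivors and the parity.
assert xor_blocks(d0, d2, parity) == d1

# Write hole: d0 is updated on disk, but the matching parity write never lands
# (for example because of a crash or a lost cached write).
d0 = b"ZZZZ"           # new data reached the disk
stale_parity = parity  # parity still describes the old d0

# Now the disk holding d1 fails. Rebuilding d1 uses the stale parity.
rebuilt_d1 = xor_blocks(d0, d2, stale_parity)
print(rebuilt_d1 == d1)  # False: d1 was never written to, yet it comes back wrong
```

The point of the sketch is that the chunk that gets corrupted is not the one that was being written, so the damage can land anywhere in the stripe.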

To eliminate the use of LVM I'd use standard RAID on a Synology device.

In my opinion, getting a Synology DiskStation DS1511+ for redundancy and/or fault tolerance is ridiculous. The only part that's redundant is the RAID set and that redundancy hinges on a software implementation. That being said: contact Synology support. They're committed to the product and can really help you with troubleshooting and maybe a solution. My first step would be to get rid of LVM.
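
As a starting point for troubleshooting, it can help to check whether any of the md arrays is still degraded before trusting the box again. The Python sketch below parses text in the usual /proc/mdstat layout. The function name `degraded_arrays` and the sample text are made up for illustration (the sample is not output from the DS1511+ in this thread), and I'm assuming shell access and a Python interpreter are available, which may not be the case on a stock DSM install.

```python
import re


def degraded_arrays(mdstat_text):
    """Return {array_name: status_string} for arrays with a missing member.

    In /proc/mdstat, a status like [UUUU_] marks each member as up (U)
    or missing (_).
    """
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        status = re.search(r"\[([U_]+)\]", line)
        if current and status and "_" in status.group(1):
            degraded[current] = status.group(1)
    return degraded


# Made-up sample in the usual /proc/mdstat layout.
SAMPLE = """\
Personalities : [raid1] [raid6] [raid5] [raid4]
md2 : active raid5 sda5[0] sdb5[1] sdc5[2] sdd5[3] sde5[4](F)
      7795118592 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/4] [UUUU_]

md0 : active raid1 sda1[0] sdb1[1] sdc1[2] sdd1[3] sde1[4]
      2490176 blocks [5/5] [UUUUU]
"""

print(degraded_arrays(SAMPLE))
```

On a live system the input would come from reading /proc/mdstat instead of the sample string. An empty result only means no member is marked missing; it says nothing about whether the data on the volume is consistent.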
DS411 (3x 2TB RAID5, 1x 128GB SSD), 2x HP Microserver ESXi 5.0
hans_lenze
Beginner
Posts: 29
Joined: Tue Dec 20, 2011 1:04 am

Re: ESX 4.1.0 NFS VM Corruption

Postby tbone22 » Mon Feb 13, 2012 11:48 pm

gabi_cavaller wrote:
hans_lenze wrote:
There are mission critical VMs! Can anyone offer any help as to how to prevent this from happening again.


Easy. Make backups.


That's a ridiculous answer.

The whole point of tbone22 having the 1511 is for redundancy/FT and uptime, therefore, irrespective of one hard drive going down, the Synology device should still be running at optimum* performance, without the loss of any data.

I am sure that tbone22 knows that RAID is not a backup!

tbone22 - what are your VM's running and what disks are you using ? I wonder if there is a NFS issue on this box instead of iSCSI.

G.


The VMs are running Windows Server 2003 (x86) and 2008 R2 (x64). The disks in the device are 5 Seagate ST2000DL003.
tbone22
I'm New!
Posts: 4
Joined: Tue Jan 31, 2012 12:40 am

