Creating a Flash-enabled Fault-tolerant Cluster
Adding the necessary SSDs and RAID controllers, or Fusion-io cards, is only the first step in adding acceleration to a VMware cluster.
You then need to think about how you are going to manage the cluster, and how it will be tolerant of failure without data loss.
You could simply create a datastore from your flash hardware and use Storage vMotion (svMotion) to move your VM onto it. But what happens to your VM when the host fails? Because that datastore isn’t shared, no surviving host can see the VM’s disks, so VMware HA won’t be able to recover it.
- You could use VMware’s vFRC (vSphere Flash Read Cache) as a read cache, but it has to be configured on a per-disk basis, and in any case it only offers read caching, not write caching.
With read caching, the data returned by a read is copied to flash, and the next read of that data is served from flash. Because all writes still go directly to the SAN, the flash never holds the only copy of any data, so there is no need for fault tolerance.
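To make the read-caching behaviour concrete, here is a minimal sketch in Python. It is a toy model under stated assumptions, not vFRC’s or PernixData’s implementation: the SAN is stood in for by a plain dict, and the `ReadCache` class and its methods are hypothetical names. It shows that losing the flash loses nothing, because every write has already reached the SAN.

```python
# Hypothetical toy model of a read cache; not any vendor's implementation.

class ReadCache:
    def __init__(self, san):
        self.san = san        # dict standing in for the shared SAN
        self.flash = {}       # local flash copy of recently read blocks
        self.hits = 0
        self.misses = 0

    def read(self, block):
        if block in self.flash:
            self.hits += 1
            return self.flash[block]
        self.misses += 1
        data = self.san[block]
        self.flash[block] = data   # populate flash for the next read
        return data

    def write(self, block, data):
        self.san[block] = data     # writes go straight to the SAN...
        self.flash[block] = data   # ...and the cached copy is kept in sync

    def lose_flash(self):
        """Simulate the flash device or host failing."""
        self.flash.clear()


san = {0: "a", 1: "b"}
cache = ReadCache(san)
cache.read(0)        # miss: fetched from the SAN and copied to flash
cache.read(0)        # hit: served from flash
cache.write(1, "B")
cache.lose_flash()
print(san)           # {0: 'a', 1: 'B'}
```

Whatever happens to the flash, the SAN already holds every write, which is why read caching alone needs no mirroring.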
However, write caching does need fault tolerance. The write is acknowledged back to the VM as soon as it completes on flash, and the data is only destaged to the SAN afterwards. Without fault tolerance, data could therefore be lost if the flash device or ESXi host failed before the write had been destaged.
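The risk window can be illustrated with another small Python sketch. Again this is a hypothetical toy model (the `WriteBackCache` class, the dict-based SAN and the `host_failure` method are all illustrative assumptions), deliberately written with no mirror, so that a write acknowledged but not yet destaged is lost when the host fails.

```python
# Hypothetical toy model of write caching with NO mirror, to show the
# window between acknowledgement and destage. Not a vendor implementation.

class WriteBackCache:
    def __init__(self, san):
        self.san = san          # dict standing in for the shared SAN
        self.flash = {}         # block -> data on this host's flash
        self.pending = set()    # acknowledged writes not yet destaged

    def write(self, block, data):
        self.flash[block] = data
        self.pending.add(block)
        return "ack"            # the VM treats the write as complete here

    def destage(self):
        """Copy acknowledged writes from flash to the SAN."""
        for block in self.pending:
            self.san[block] = self.flash[block]
        self.pending.clear()

    def host_failure(self):
        """The only copy of any undestaged write disappears with the host."""
        lost = sorted(self.pending)
        self.flash.clear()
        self.pending.clear()
        return lost


san = {}
cache = WriteBackCache(san)
cache.write(0, "x")
cache.destage()             # block 0 is now safe on the SAN
cache.write(1, "y")         # acknowledged, but still only on flash
lost = cache.host_failure()
print(lost, san)            # [1] {0: 'x'}
```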
Happily, the flash virtualisation software we are using mitigates this risk by mirroring each write to a flash device in another ESXi host within the cluster. The mirrored copy is discarded once the destage to SAN has completed.
If the host running a VM should fail, the replica host destages the pending writes to SAN, and HA then restarts the VM as normal.
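The mirroring and recovery steps can be sketched by extending the previous idea with a second host. This is a simplified model under assumptions, not PernixData’s actual code: the `Host` and `MirroredWriteBack` classes, the host names `esx01`/`esx02` and the single-replica layout are all hypothetical. A write is acknowledged only once both flash copies exist, the mirror copy is dropped after destage, and if the primary host fails the replica flushes whatever was still pending.

```python
# Hypothetical model of mirrored write caching; class and host names are
# illustrative only, not PernixData's implementation.

class Host:
    def __init__(self, name):
        self.name = name
        self.flash = {}            # block -> data held on this host's flash


class MirroredWriteBack:
    def __init__(self, san, primary, replica):
        self.san = san             # dict standing in for the shared SAN
        self.primary = primary     # host running the VM
        self.replica = replica     # peer host holding the mirror
        self.pending = set()       # acknowledged writes not yet on the SAN

    def write(self, block, data):
        # Acknowledge only once BOTH flash copies exist.
        self.primary.flash[block] = data
        self.replica.flash[block] = data
        self.pending.add(block)
        return "ack"

    def destage(self):
        for block in self.pending:
            self.san[block] = self.primary.flash[block]
            self.replica.flash.pop(block, None)   # mirror no longer needed
        self.pending.clear()

    def primary_failed(self):
        # The primary's flash is gone; the replica flushes what was pending.
        self.primary.flash.clear()
        for block in self.pending:
            self.san[block] = self.replica.flash.pop(block)
        self.pending.clear()
        # HA can now restart the VM on a surviving host from the SAN copy.


san = {}
esx01, esx02 = Host("esx01"), Host("esx02")
vm = MirroredWriteBack(san, primary=esx01, replica=esx02)
vm.write(0, "x")
vm.destage()          # block 0 on the SAN; mirror copy discarded
vm.write(1, "y")      # acknowledged; held on both hosts' flash
vm.primary_failed()
print(san)            # {0: 'x', 1: 'y'}
```

Compared with the unmirrored case, the same sequence of events now ends with no acknowledged write missing from the SAN.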
If the replica host should fail and there are no longer enough replica hosts to satisfy the write policy, the VM’s policy is changed back to read caching. Once another replica host has been selected, the VM’s policy is automatically changed back to write caching.
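The fallback rule amounts to deriving the policy in force from the number of healthy replica hosts. A minimal Python sketch, assuming hypothetical names (`CachePolicy`, `replicas_needed` and the strings `"write-cache"`/`"read-cache"` are illustrative, not a real product’s configuration keys):

```python
# Hypothetical model of the write-policy fallback; names are illustrative.
from dataclasses import dataclass


@dataclass
class CachePolicy:
    requested: str            # "write-cache" or "read-cache"
    replicas_needed: int = 1  # peer hosts required to protect writes

    def effective(self, replicas_available: int) -> str:
        """Policy actually in force given the healthy replica hosts."""
        if self.requested == "write-cache" and replicas_available < self.replicas_needed:
            return "read-cache"   # writes cannot be protected, so stop caching them
        return self.requested


policy = CachePolicy("write-cache", replicas_needed=1)
# Replica healthy -> replica host fails -> a new replica host is selected.
timeline = [policy.effective(n) for n in (1, 0, 1)]
print(timeline)   # ['write-cache', 'read-cache', 'write-cache']
```

Because the effective policy is computed from the current replica count rather than stored, switching back to write caching happens automatically as soon as a new replica host is available, which mirrors the behaviour described above.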
- When DRS vMotions a VM, the VM simply continues to access the server-side flash on its original host in the short term, while populating the cache on its new host. This means there is no loss of performance following a vMotion event.
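The read path after a vMotion can be modelled as a three-tier lookup. This Python sketch is a hypothetical illustration under assumptions (the `FlashCache` and `MigratedVM` classes and the lookup order are illustrative, not the vendor’s implementation): the new host’s flash is checked first, then the original host’s flash over the network, and only then the SAN, with every read warming the new host’s cache.

```python
# Hypothetical model of reads after a vMotion; not a vendor implementation.

class FlashCache:
    def __init__(self):
        self.blocks = {}        # block -> data cached on this host


class MigratedVM:
    def __init__(self, san, old_cache, new_cache):
        self.san = san          # dict standing in for the shared SAN
        self.old = old_cache    # flash on the host the VM left
        self.new = new_cache    # flash on the host the VM now runs on
        self.sources = []       # where each read was served from

    def read(self, block):
        if block in self.new.blocks:
            src, data = "local flash", self.new.blocks[block]
        elif block in self.old.blocks:
            # Fetched over the network from the original host's flash.
            src, data = "remote flash", self.old.blocks[block]
        else:
            src, data = "SAN", self.san[block]
        self.new.blocks[block] = data   # warm the new host's cache
        self.sources.append(src)
        return data


san = {0: "a", 1: "b"}
old_host = FlashCache()
old_host.blocks[0] = "a"        # block 0 was cached before the vMotion
new_host = FlashCache()
vm = MigratedVM(san, old_host, new_host)
for block in (0, 0, 1):
    vm.read(block)
print(vm.sources)   # ['remote flash', 'local flash', 'SAN']
```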
Hopefully the details above demonstrate how our chosen flash acceleration vendor, PernixData, provides us with enterprise-class features and fault tolerance.
Feel free to contact us to discuss using our SSD-accelerated Cloud platform, or for consultancy on deploying this setup on your own on-premises infrastructure 🙂