Self-Healing Failover Virtual Machines

Overview

To keep your workloads available, Onidel offers Self-Healing Failover Virtual Machines powered by high-availability (HA) clustering. This feature automatically detects host failures and restarts the affected virtual machines (VMs) on healthy nodes, minimising downtime and reducing the need for manual intervention.

Currently, HA is available in Singapore, with expansion planned for Sydney and Amsterdam in upcoming phases.

How it works

Distributed Storage with Triple Replication

Your virtual machine disks are stored on NVMe-backed block storage, built on a Ceph-powered distributed storage system. Every block of data is kept as three copies, each on an independent node.

This ensures:

Fault tolerance: Even if a storage device or node fails, your data remains accessible.
High performance: NVMe technology delivers fast read/write speeds.
Consistency: Automatic synchronisation maintains data integrity across all replicas.
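
As a rough illustration of why three copies on independent nodes tolerate a node failure, the sketch below places each block on three distinct nodes and checks whether it is still readable after some nodes fail. It is not Onidel's or Ceph's implementation: Ceph uses its own CRUSH placement algorithm, and this example swaps in simple rendezvous hashing instead. The names place_replicas and is_readable, the node names, and the "at least one surviving copy" rule are all assumptions made for the example; real availability thresholds depend on how the storage pool is configured.

```python
import hashlib

REPLICAS = 3  # the article states that every block is kept as three copies


def place_replicas(block_id: str, nodes: list[str], replicas: int = REPLICAS) -> list[str]:
    """Pick `replicas` distinct nodes for a block (rendezvous hashing, not CRUSH)."""
    if len(nodes) < replicas:
        raise ValueError("need at least as many nodes as replicas")
    # Score every node for this block and keep the highest-scoring ones.
    ranked = sorted(
        nodes,
        key=lambda node: hashlib.sha256(f"{block_id}:{node}".encode()).hexdigest(),
        reverse=True,
    )
    return ranked[:replicas]


def is_readable(placement: list[str], failed: set[str]) -> bool:
    """Simplified rule: a block is readable while any copy sits on a healthy node."""
    return any(node not in failed for node in placement)


# Hypothetical five-node cluster and block identifier.
cluster = ["node-a", "node-b", "node-c", "node-d", "node-e"]
placement = place_replicas("vm-disk-1/block-42", cluster)
print(placement, is_readable(placement, failed={placement[0]}))
```

Because the three copies always land on different nodes, losing any single node leaves two copies in place.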

Continuous Node Monitoring

The infrastructure continuously monitors the health of all compute nodes in the cluster. If a node becomes unresponsive (due to hardware, network, or power issues), the system immediately detects the failure.
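
One common way to detect an unresponsive node is a heartbeat timeout: each node reports in periodically, and a node that stays silent for too long is flagged. The sketch below shows that idea only. The article does not describe how Onidel's monitoring works, so the HeartbeatMonitor class, the 10-second timeout, and the node names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    timeout_s: float = 10.0  # hypothetical threshold, not a documented value
    last_seen: dict[str, float] = field(default_factory=dict)

    def record(self, node: str, now: float) -> None:
        """Store the time (in seconds) at which a node last reported in."""
        self.last_seen[node] = now

    def unresponsive(self, now: float) -> list[str]:
        """Return nodes whose last heartbeat is older than the timeout."""
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )


monitor = HeartbeatMonitor(timeout_s=10.0)
monitor.record("node-a", now=0.0)
monitor.record("node-b", now=0.0)
monitor.record("node-a", now=8.0)      # node-a keeps reporting, node-b goes silent
print(monitor.unresponsive(now=12.0))  # prints ['node-b']
```

Timestamps are passed in explicitly rather than read from the clock so that the behaviour is easy to follow and to test.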

Automated VM Failover

Once a failure is detected:

A failover event is triggered.
Affected VMs are automatically restarted on a healthy node within the same cluster.
The storage layer ensures that the VM has access to its replicated data with no risk of corruption.
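
The steps above can be sketched as a small planning function: given the VMs on the failed node and the spare capacity of the other nodes, decide where each VM restarts. This is an assumption-laden illustration, not the scheduler Onidel runs. The plan_failover function, the "most free slots first" rule, and the slot counts are invented for the example.

```python
def plan_failover(
    vms_by_node: dict[str, list[str]],
    failed: str,
    free_slots: dict[str, int],
) -> dict[str, str]:
    """Map each VM on the failed node to the healthy node with the most free slots."""
    # Never place a VM back on the node that just failed.
    capacity = {node: slots for node, slots in free_slots.items() if node != failed}
    plan: dict[str, str] = {}
    for vm in vms_by_node.get(failed, []):
        # Ties on free slots are broken by node name to keep the plan deterministic.
        target = max(capacity, key=lambda node: (capacity[node], node), default=None)
        if target is None or capacity[target] <= 0:
            raise RuntimeError(f"no healthy capacity left for {vm}")
        plan[vm] = target
        capacity[target] -= 1
    return plan


# Hypothetical cluster state at the moment node-a fails.
vms = {"node-a": ["vm-1", "vm-2", "vm-3"], "node-b": ["vm-4"]}
free = {"node-a": 0, "node-b": 1, "node-c": 2}
print(plan_failover(vms, failed="node-a", free_slots=free))
# prints {'vm-1': 'node-c', 'vm-2': 'node-c', 'vm-3': 'node-b'}
```

No disk data is copied in this plan. Because the VM disks live on the replicated storage layer rather than on the failed host, a restart only needs compute capacity on another node.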

Self-Healing Recovery

The system is self-healing, meaning no manual action is required from you:

Failed nodes are automatically isolated.
Services are restored as quickly as possible with minimal downtime.
Once the failed node is repaired and re-joins the cluster, it automatically reintegrates into the pool of available resources.
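
The node lifecycle described above (isolate on failure, repair, rejoin the pool) can be modelled as a small state machine. The sketch below is a hypothetical model for illustration. The state names, the event names, and the intermediate "rejoining" state are assumptions rather than documented behaviour.

```python
from enum import Enum


class NodeState(Enum):
    HEALTHY = "healthy"      # in the pool and eligible to run VMs
    ISOLATED = "isolated"    # removed from the pool after a failure
    REJOINING = "rejoining"  # repaired, being checked before re-entering the pool


# Allowed transitions: (current state, event) -> next state.
TRANSITIONS = {
    (NodeState.HEALTHY, "failure_detected"): NodeState.ISOLATED,
    (NodeState.ISOLATED, "repaired"): NodeState.REJOINING,
    (NodeState.REJOINING, "health_checks_passed"): NodeState.HEALTHY,
    (NodeState.REJOINING, "failure_detected"): NodeState.ISOLATED,
}


def next_state(state: NodeState, event: str) -> NodeState:
    """Apply an event to a node, rejecting transitions the model does not allow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not valid in state {state.value}") from None


def schedulable(state: NodeState) -> bool:
    """Only healthy nodes count towards the pool of available resources."""
    return state is NodeState.HEALTHY


state = NodeState.HEALTHY
for event in ("failure_detected", "repaired", "health_checks_passed"):
    state = next_state(state, event)
    print(event, "->", state.value, "| schedulable:", schedulable(state))
```

Keeping an isolated node out of scheduling until it has passed its checks prevents VMs from being restarted onto hardware that may still be faulty.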

Updated on: 20/08/2025
