Bug #5131
Migration failure due to Ceph system datastore being misused as a filesystem datastore
Status: Closed
Start date: 04/25/2017
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: Drivers - Storage
Target version: -
Resolution: worksforme
Pull request:
Affected Versions: OpenNebula 5.2
Description
I have two Ceph datastores: an image datastore (ID 100) and a system datastore (ID 101):
    root@one-ctrl-1:~# onedatastore show 101
    DATASTORE 101 INFORMATION
    ID             : 101
    NAME           : ceph-system
    USER           : oneadmin
    GROUP          : oneadmin
    CLUSTERS       : 0
    TYPE           : SYSTEM
    DS_MAD         : -
    TM_MAD         : ceph
    BASE PATH      : /var/lib/one//datastores/101
    DISK_TYPE      : RBD
    STATE          : READY

    DATASTORE CAPACITY
    TOTAL:         : 56.7T
    FREE:          : 54.1T
    USED:          : 2.6T
    LIMIT:         : -

    PERMISSIONS
    OWNER          : uma
    GROUP          : u--
    OTHER          : ---

    DATASTORE TEMPLATE
    BRIDGE_LIST="iaas-vm-1.u07.univ-nantes.prive iaas-vm-2.u07.univ-nantes.prive iaas-vm-3.u07.univ-nantes.prive iaas-vm-4.u07.univ-nantes.prive iaas-vm-5.u07.univ-nantes.prive"
    CEPH_HOST="172.20.107.54:6789 172.20.106.54:6789 172.20.108.54:6789"
    CEPH_SECRET="6f5cab54-404b-4c63-b883-65ae350be8e7"
    CEPH_USER="opennebula"
    DATASTORE_CAPACITY_CHECK="YES"
    DISK_TYPE="RBD"
    DS_MIGRATE="NO"
    POOL_NAME="opennebula"
    RESTRICTED_DIRS="/"
    SAFE_DIRS="/var/tmp"
    SHARED="YES"
    TM_MAD="ceph"
    TYPE="SYSTEM_DS"
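The attributes that matter here are TYPE="SYSTEM_DS" and TM_MAD="ceph", while BASE PATH still names a plain directory path. As a rough illustration, the Python sketch below parses KEY="VALUE" pairs like those in the DATASTORE TEMPLATE section and checks that the datastore is a Ceph-backed system datastore. The helper names `parse_template` and `is_ceph_system_ds` are hypothetical, this is not OpenNebula's own template parser, and it assumes every value is double-quoted as in the output above.

```python
import shlex

# A subset of the DATASTORE TEMPLATE lines from the `onedatastore show 101` output.
TEMPLATE = '''
CEPH_HOST="172.20.107.54:6789 172.20.106.54:6789 172.20.108.54:6789"
DISK_TYPE="RBD"
POOL_NAME="opennebula"
SAFE_DIRS="/var/tmp"
SHARED="YES"
TM_MAD="ceph"
TYPE="SYSTEM_DS"
'''


def parse_template(text: str) -> dict[str, str]:
    """Parse whitespace-separated KEY="VALUE" pairs into a dict.

    shlex.split() honours the double quotes, so a value containing spaces
    (such as CEPH_HOST) stays in one token of the form KEY=VALUE."""
    attrs: dict[str, str] = {}
    for token in shlex.split(text):
        key, sep, value = token.partition("=")
        if sep:  # skip anything that is not a KEY=VALUE token
            attrs[key] = value
    return attrs


def is_ceph_system_ds(attrs: dict[str, str]) -> bool:
    """True when the template describes a system datastore using the ceph TM driver."""
    return attrs.get("TYPE") == "SYSTEM_DS" and attrs.get("TM_MAD") == "ceph"


if __name__ == "__main__":
    attrs = parse_template(TEMPLATE)
    print(attrs["TM_MAD"], attrs["TYPE"], is_ceph_system_ds(attrs))
```

Even for such a datastore, the VM log below shows the checkpoint being written under the BASE PATH directory on the hypervisor.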
Migrating a VM fails because OpenNebula tries to use the Ceph system datastore as if it were a mounted filesystem datastore:
    Tue Apr 25 15:46:16 2017 [Z0][VM][I]: New LCM state is SAVE_MIGRATE
    Tue Apr 25 15:46:21 2017 [Z0][VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/kvm/save 'one-425' '/var/lib/one//datastores/101/425/checkpoint' 'iaas-vm-4.u07.univ-nantes.prive' 425 iaas-vm-4.u07.univ-nantes.prive
    Tue Apr 25 15:46:21 2017 [Z0][VMM][E]: save: Command "virsh --connect qemu:///system save one-425 /var/lib/one//datastores/101/425/checkpoint" failed: error: Failed to save domain one-425 to /var/lib/one//datastores/101/425/checkpoint
    Tue Apr 25 15:46:21 2017 [Z0][VMM][I]: error: operation failed: domain save job: unexpectedly failed
    Tue Apr 25 15:46:21 2017 [Z0][VMM][E]: Could not save one-425 to /var/lib/one//datastores/101/425/checkpoint
    Tue Apr 25 15:46:21 2017 [Z0][VMM][I]: ExitCode: 1
    Tue Apr 25 15:46:21 2017 [Z0][VMM][I]: Failed to execute virtualization driver operation: save.
    Tue Apr 25 15:46:21 2017 [Z0][VMM][E]: Error saving VM state: Could not save one-425 to /var/lib/one//datastores/101/425/checkpoint
    Tue Apr 25 15:46:21 2017 [Z0][VM][I]: New LCM state is RUNNING
    Tue Apr 25 15:46:21 2017 [Z0][LCM][I]: Fail to save VM state while migrating. Assuming that the VM is still RUNNING (will poll VM).

    Tue Apr 25 16:05:16 2017 [Z0][VM][I]: New LCM state is SAVE_MIGRATE
    Tue Apr 25 16:05:16 2017 [Z0][VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/kvm/save 'one-425' '/var/lib/one//datastores/101/425/checkpoint' 'iaas-vm-4.u07.univ-nantes.prive' 425 iaas-vm-4.u07.univ-nantes.prive
    Tue Apr 25 16:05:16 2017 [Z0][VMM][I]: bash: line 2: cannot create temp file for here-document: No space left on device
    Tue Apr 25 16:05:16 2017 [Z0][VMM][E]: save: Command "virsh --connect qemu:///system save one-425 /var/lib/one//datastores/101/425/checkpoint" failed: error: Failed to save domain one-425 to /var/lib/one//datastores/101/425/checkpoint
    Tue Apr 25 16:05:16 2017 [Z0][VMM][I]: error: operation failed: domain save job: unexpectedly failed
    Tue Apr 25 16:05:16 2017 [Z0][VMM][E]: Could not save one-425 to /var/lib/one//datastores/101/425/checkpoint
    Tue Apr 25 16:05:16 2017 [Z0][VMM][I]: ExitCode: 1
    Tue Apr 25 16:05:16 2017 [Z0][VMM][I]: Failed to execute virtualization driver operation: save.
    Tue Apr 25 16:05:16 2017 [Z0][VMM][E]: Error saving VM state: Could not save one-425 to /var/lib/one//datastores/101/425/checkpoint
    Tue Apr 25 16:05:16 2017 [Z0][VM][I]: New LCM state is RUNNING
    Tue Apr 25 16:05:16 2017 [Z0][LCM][I]: Fail to save VM state while migrating. Assuming that the VM is still RUNNING (will poll VM).
Nothing is mounted on /var/lib/one//datastores/101/, because datastore 101 is a Ceph datastore. On the host, "/var/lib/one//datastores/101/" is just a directory on the local filesystem, which is why the second attempt fails with "No space left on device".
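One way to confirm on the hypervisor which filesystem actually receives the checkpoint is to walk up from the checkpoint path until a mount point is reached. The Python sketch below does that with `os.path.ismount`; `find_mount_point` is a hypothetical helper, not an OpenNebula tool, and `df` on the same path would give similar information.

```python
import os


def find_mount_point(path: str) -> str:
    """Return the mount point of the filesystem that would hold `path`.

    Starts from the nearest existing ancestor of `path` (the checkpoint file
    itself may not exist yet) and walks up until os.path.ismount() is true.
    "/" is always a mount point, so both loops terminate."""
    current = os.path.abspath(path)  # also collapses the "//" seen in OpenNebula paths
    while not os.path.exists(current):
        current = os.path.dirname(current)
    while not os.path.ismount(current):
        current = os.path.dirname(current)
    return current


if __name__ == "__main__":
    # Checkpoint path taken from the log above.
    print(find_mount_point("/var/lib/one//datastores/101/425/checkpoint"))
```

If this prints a local mount point such as "/" or "/var" on the host, the RAM dump is written to local disk, not to the Ceph pool.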
History
#1 Updated by Anton Todorov about 4 years ago
Hi,
IMHO this is expected behavior.
Some Linux distributions that are still supported ship a virtualization stack (libvirt) that cannot read the checkpoint file directly from a block device. The driver therefore writes the RAM dump to the local filesystem and then imports it into a Ceph volume. Later, on resume or when the VM is started on the other host, the checkpoint file is extracted from the Ceph volume and handed to libvirt to start the VM. For this to work, the hypervisor node must have enough free local disk space to hold the dump.
Kind Regards,
Anton Todorov
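Anton's explanation suggests a simple pre-flight check before migrating: make sure the local filesystem under the datastore's BASE PATH can hold the RAM dump. The Python sketch below assumes, as a rough upper bound, that the checkpoint is about as large as the VM's configured memory plus 10% headroom; the helper names and the headroom factor are illustrative assumptions, not OpenNebula settings. The `<BASE PATH>/<VM ID>/checkpoint` pattern is taken from the log in the description.

```python
import shutil
from pathlib import Path


def checkpoint_path(base_path: str, vm_id: int) -> Path:
    """Hypothetical helper: build the checkpoint location seen in the log,
    e.g. /var/lib/one//datastores/101/425/checkpoint (pathlib drops the "//")."""
    return Path(base_path) / str(vm_id) / "checkpoint"


def has_room_for_checkpoint(base_path: str, vm_memory_mib: int,
                            headroom: float = 1.1) -> bool:
    """Return True if the local filesystem containing base_path has at least
    vm_memory_mib * headroom MiB free.

    Assumption: the checkpoint is no larger than the VM's memory; the real
    dump size depends on libvirt/QEMU and may be smaller."""
    # disk_usage() needs an existing path, so probe the nearest existing ancestor.
    probe = Path(base_path)
    while not probe.exists():
        probe = probe.parent
    free_bytes = shutil.disk_usage(probe).free
    needed_bytes = int(vm_memory_mib * 1024 * 1024 * headroom)
    return free_bytes >= needed_bytes


if __name__ == "__main__":
    base = "/var/lib/one//datastores/101"
    print(checkpoint_path(base, 425))
    print(has_room_for_checkpoint(base, vm_memory_mib=4096))
```

For example, on the source host one might run this for a VM with 4 GiB of RAM before starting the migration; a False result points to the same "No space left on device" failure shown in the second log.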
#2 Updated by Javi Fontan almost 4 years ago
- Category set to Drivers - Storage
- Status changed from Pending to Closed
- Resolution set to worksforme
As Anton said, this is expected behavior.