Bug #3741
Snapshot revert stops working after some iterations
Status: | Closed | Start date: | 04/08/2015
---|---|---|---
Priority: | None | Due date: |
Assignee: | - | % Done: | 0%
Category: | Drivers - VM | |
Target version: | Release 4.14 | |
Resolution: | worksforme | Pull request: |
Affected Versions: | OpenNebula 4.10 | |
Description
Hi, I'm running ONE 4.10.2. I have a VM with a qcow2 image that runs a job every 5 minutes. After the job finishes, the VM is reverted to a saved snapshot. For some reason, after a few iterations of reverting the snapshot, I get an error like this:
Wed Apr 8 11:27:24 2015 [Z0][VMM][I]: VM Snapshot successfully created.
Wed Apr 8 11:28:19 2015 [Z0][VMM][I]: VM Snapshot successfully reverted.
Wed Apr 8 11:29:15 2015 [Z0][VMM][I]: VM Snapshot successfully reverted.
Wed Apr 8 11:30:08 2015 [Z0][VMM][I]: VM Snapshot successfully reverted.
Wed Apr 8 11:31:04 2015 [Z0][VMM][I]: VM Snapshot successfully reverted.
Wed Apr 8 11:32:00 2015 [Z0][VMM][I]: VM Snapshot successfully reverted.
Wed Apr 8 11:32:56 2015 [Z0][VMM][I]: VM Snapshot successfully reverted.
Wed Apr 8 11:33:49 2015 [Z0][VMM][I]: VM Snapshot successfully reverted.
Wed Apr 8 11:34:45 2015 [Z0][VMM][I]: VM Snapshot successfully reverted.
Wed Apr 8 11:34:45 2015 [Z0][VMM][I]: VM running but monitor state is POWEROFF
Wed Apr 8 11:34:45 2015 [Z0][DiM][I]: New VM state is POWEROFF
Wed Apr 8 11:35:35 2015 [Z0][VMM][I]: VM found again, state is RUNNING
Wed Apr 8 11:35:35 2015 [Z0][LCM][I]: New VM state is RUNNING
When this happens, the snapshot is lost (because the VM state changed).
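The failure signature in the log above (a successful revert followed right away by a POWEROFF report from the monitor) can be spotted with a small script. This is only a minimal sketch and not part of OpenNebula: the function names and the 5-second window are assumptions made for illustration, and the line format is inferred solely from the excerpt shown here.

```python
import re
from datetime import datetime

# Line layout inferred from the excerpt above, e.g.
# "Wed Apr 8 11:34:45 2015 [Z0][VMM][I]: VM Snapshot successfully reverted."
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} \w{3} +\d+ \d{2}:\d{2}:\d{2} \d{4}) "
    r"\[Z\d+\]\[(?P<module>\w+)\]\[(?P<level>\w)\]: (?P<msg>.*)$"
)

REVERT_MSG = "VM Snapshot successfully reverted."
POWEROFF_MSG = "VM running but monitor state is POWEROFF"


def parse_line(line):
    """Return (timestamp, module, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%a %b %d %H:%M:%S %Y")
    return ts, m.group("module"), m.group("msg")


def find_lost_reverts(lines, window_seconds=5):
    """Timestamps where a POWEROFF report follows a revert within the window.

    window_seconds is a hypothetical threshold, not an OpenNebula setting.
    """
    hits = []
    last_revert = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ts, _module, msg = parsed
        if msg == REVERT_MSG:
            last_revert = ts
        elif msg == POWEROFF_MSG and last_revert is not None:
            if (ts - last_revert).total_seconds() <= window_seconds:
                hits.append(ts)
    return hits
```

Feeding the VM log lines to `find_lost_reverts` returns the timestamps at which a revert was most likely followed by the snapshot list being dropped.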
Regards,
Joaquín
Related issues
History
#1 Updated by Ruben S. Montero over 6 years ago
- Related to Bug #3740: VM snapshots not visible after powercycle (poweroff / resume) added
#2 Updated by Ruben S. Montero over 6 years ago
Hi Joaquin
It seems that the monitor probe sends information while the VM is snapshotting. oned then assumes the VM has been powered off and clears the snapshots in its internal data. When the VM is found again it is moved back to RUNNING, but the snapshots are gone.
This will be solved by preserving snapshots during a powercycle. Until then, try tuning the timing of the action and the monitoring cycle (although playing with timing in a distributed system is not a good idea...).
Alternatively, we can take a look at the monitor probe so that it reports the VM as active while snapshotting.
Thanks
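The sequence described in this comment can be illustrated with a toy model. It is not oned's code: the class `ToyVM`, its method names and the `preserve_snapshots` switch are hypothetical, and they mimic only the behaviour stated above (a POWEROFF report clears the snapshot list, and a later RUNNING report does not bring it back).

```python
class ToyVM:
    """Hypothetical stand-in for the VM bookkeeping described in the comment."""

    def __init__(self, preserve_snapshots=False):
        self.state = "RUNNING"
        self.snapshots = []
        # Assumption: models the proposed fix of keeping snapshots
        # across a powercycle.
        self.preserve_snapshots = preserve_snapshots

    def on_monitor(self, reported_state):
        """Apply one report from the monitor probe."""
        if reported_state == "POWEROFF" and self.state == "RUNNING":
            # The VM is assumed to be powered off.
            self.state = "POWEROFF"
            if not self.preserve_snapshots:
                self.snapshots.clear()
        elif reported_state == "RUNNING" and self.state == "POWEROFF":
            # "VM found again": the state recovers, the snapshots do not.
            self.state = "RUNNING"


vm = ToyVM()
vm.snapshots.append("snap-0")
vm.on_monitor("POWEROFF")  # probe fires while the revert is in progress
vm.on_monitor("RUNNING")   # next monitoring cycle finds the VM again
print(vm.state, vm.snapshots)
```

With `ToyVM(preserve_snapshots=True)` the same two reports leave the snapshot list intact, which is the effect the planned fix aims for.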
#3 Updated by Ruben S. Montero over 6 years ago
- Category set to Drivers - VM
- Status changed from Pending to New
- Target version set to Release 4.14
#4 Updated by Joaquin Rinaudo over 6 years ago
Hi, can the alternative solution of reporting the VM as active while snapshotting be implemented before the 4.14 release?
Unfortunately, tuning the timing of the action isn't possible, since the idea is to revert as soon as the job has finished (and the job duration is different in each run). I can try adding idle waiting times to even out the jobs so that they sync with the monitoring cycle, but it's not a great solution performance-wise.
Thanks
#5 Updated by Ruben S. Montero about 6 years ago
Joaquin Rinaudo wrote:
Hi, can the alternative solution of reporting the VM as active while snapshotting be implemented before the 4.14 release?
Unfortunately, tuning the timing of the action isn't possible, since the idea is to revert as soon as the job has finished (and the job duration is different in each run). I can try adding idle waiting times to even out the jobs so that they sync with the monitoring cycle, but it's not a great solution performance-wise.
Thanks
We need to find out what state the VM is in while snapshotting. Basically, the probe executes a virsh list and a dominfo. You may try to hack the state here:
https://github.com/OpenNebula/one/blob/master/src/vmm_mad/remotes/poll_xen_kvm.rb#L68
and here:
https://github.com/OpenNebula/one/blob/master/src/vmm_mad/remotes/poll_xen_kvm.rb#L271
#6 Updated by Joaquin Rinaudo about 6 years ago
While snapshotting (both taking and reverting a snapshot), the domain state is reported as paused:
virsh --connect qemu:///system --readonly dominfo one-220
Id: 2
Name: one-220
UUID: 87892a7b-f829-bb85-6d99-60b1a2d40547
OS Type: hvm
State: paused
CPU(s): 1
CPU time: 622218.4s
Max memory: 2097152 KiB
Used memory: 2097152 KiB
Persistent: no
Autostart: disable
Managed save: no
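The dominfo output above can be turned into a monitoring decision along these lines. This is a minimal sketch in Python and not the actual probe (which is the Ruby script linked earlier): the function names, the `ACTIVE_STATES` set and the "ACTIVE"/"POWEROFF" labels are assumptions for illustration. The one point taken from this thread is that a paused domain should still count as active.

```python
def parse_dominfo(text):
    """Parse 'Key: value' lines of virsh dominfo output into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


# Assumption: states treated as "the VM is still there". "paused" is
# included because that is what dominfo shows while a snapshot is being
# taken or reverted.
ACTIVE_STATES = {"running", "paused"}


def monitor_state(info):
    """Map the dominfo State field to a hypothetical monitor label."""
    state = info.get("State", "").lower()
    return "ACTIVE" if state in ACTIVE_STATES else "POWEROFF"


sample = """Id: 2
Name: one-220
State: paused
"""
print(monitor_state(parse_dominfo(sample)))
```

Treating paused as active means a monitoring cycle that lands in the middle of a revert no longer looks like a poweroff.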
The commit https://github.com/OpenNebula/one/commit/87cef75a8e42cf4f74a05160255dfb7796690bf7#diff-650e58e49530115caad6187b7283aa25L266 already fixed the issue.
Joaquín
#7 Updated by Ruben S. Montero about 6 years ago
- Status changed from New to Closed
- Resolution set to worksforme
Great! Closing this then...