Bug #4878
Ceph persistent image "loses" changes
Status: Closed
Start date: 10/23/2016
Priority: Normal
Due date:
Assignee: Jaime Melis
% Done: 0%
Category: Drivers - Storage
Target version: Release 5.2.1
Resolution: fixed
Pull request:
Affected Versions: OpenNebula 5.2
Description
A Ceph image loses recent changes if the image has previously been non-persistent.
Steps to reproduce:
1) create a persistent empty datablock (id 42), mount it in a VM and add some files:
mkfs.ext4 /dev/vdb
mkdir /mnt/d1
mount /dev/vdb /mnt/d1/; cd /mnt/d1/; touch test1; touch test2; touch test3
Verify in ceph cluster: rbd -p one --id libvirt du
NAME     PROVISIONED    USED
one-42         1024M  73728k
2) terminate the VM
3) make the image non-persistent & boot a VM with it
Verify in ceph cluster: rbd -p one --id libvirt du
NAME           PROVISIONED    USED
one-42@snap          1024M  73728k   <-- snapshot of original
one-42               1024M       0   <-- original image
one-42-623-1         1024M       0   <-- new VM non-persistent image, child of snapshot
4) so far all is OK with the created files; terminate the VM
Verify in ceph cluster: rbd -p one --id libvirt du
NAME           PROVISIONED    USED
one-42@snap          1024M  73728k   <-- snapshot of original still present
one-42               1024M       0
5) make the image persistent again and boot a VM with it. Remove some files:
rm test2 test3
Verify in ceph cluster: rbd -p one --id libvirt du
NAME           PROVISIONED    USED
one-42@snap          1024M  73728k   <-- old snapshot of original (with 3 files)
one-42               1024M  12288k   <-- running persistent image (now with 1 file)
6) terminate the VM and make the image non-persistent
7) boot a new VM with the image that had 1 file, and list the directory: ls -al /mnt/d1
drwxr-xr-x  3 root root  4096 Oct 23 16:10 .
drwxr-xr-x 22 root root  4096 Oct  5 21:50 ..
drwx------  2 root root 16384 Oct 23 16:08 lost+found
-rw-r--r--  1 root root     0 Oct 23 16:10 test1
-rw-r--r--  1 root root     0 Oct 23 16:10 test2   <-- deleted file in step 5
-rw-r--r--  1 root root     0 Oct 23 16:10 test3   <-- deleted file in step 5
Verify in ceph cluster: rbd -p one --id libvirt du
NAME           PROVISIONED    USED
one-42@snap          1024M  73728k   <-- old snapshot with 3 files
one-42               1024M  12288k   <-- original image with 1 file
one-42-625-1         1024M   4096k   <-- latest image, clone of the snapshot above (with 3 files)
Verify that the new clone is in fact a child of the snapshot:
rbd --id libvirt children one/one-42@snap
one/one-42-625-1
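For context, this parent/child chain matches the standard RBD copy-on-write cloning workflow; roughly the following happens when a non-persistent VM image is created (pool and image names are taken from the example above, and the exact commands the driver runs are an assumption, not a quote of its code):
rbd --id libvirt snap create one/one-42@snap              # snapshot the source image (if not already present)
rbd --id libvirt snap protect one/one-42@snap             # clones require a protected parent snapshot
rbd --id libvirt clone one/one-42@snap one/one-42-625-1   # per-VM copy-on-write child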
So now whenever I boot the image as non-persistent, the old snapshot (with 3 files) is used for cloning. When I boot it as persistent, the original/latest image (with 1 file) is used.
I think the rbd snapshot needs to be removed after we terminate the last running VM that used the non-persistent image.
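For illustration, a manual cleanup along these lines removes the stale snapshot once no VM is using a clone of it any more (a sketch of the suggested fix, not the driver's actual code; names follow the example above):
rbd --id libvirt children one/one-42@snap         # must report no children
rbd --id libvirt snap unprotect one/one-42@snap   # fails while clones still exist
rbd --id libvirt snap rm one/one-42@snap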
Associated revisions
B #4878: Ceph persistent image "loses" changes
B #4878: Ceph persistent image "loses" changes
(cherry picked from commit 5764fe974a19a56c39810a8a4e4c6fca018255d7)
B #4878: Ceph persistent image "loses" changes
History
#1 Updated by Ruben S. Montero over 4 years ago
Are you sure you are using 5.2? The @snap is removed for persistent images, take a look here:
https://github.com/OpenNebula/one/blob/master/src/tm_mad/ceph/ln#L73
Do you have any errors when instantiating the VM after making the image persistent?
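For reference, the kind of check that link points to would look roughly like this (an illustrative sketch only; RBD_SRC and PERSISTENT are placeholder names, not the driver's actual variables):
RBD_SRC="one/one-42"                                                # image being attached
if [ "$PERSISTENT" = "YES" ]; then                                  # image is now persistent
    if rbd --id libvirt info "$RBD_SRC@snap" >/dev/null 2>&1; then  # leftover snapshot exists?
        rbd --id libvirt snap unprotect "$RBD_SRC@snap"
        rbd --id libvirt snap rm "$RBD_SRC@snap"
    fi
fi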
#2 Updated by Taavi K over 4 years ago
Ruben S. Montero wrote:
Are you sure you are using 5.2? The @snap is removed for persistent images, take a look here:
https://github.com/OpenNebula/one/blob/master/src/tm_mad/ceph/ln#L73
Do you have any errors when instantiating the VM after making the image persistent?
Yes, it's 5.2.
Can't find any errors either.
#3 Updated by Taavi K over 4 years ago
Oned.log reports success converting the image to persistent, but the snapshot is still there:
[Z0][ReM][D]: Req:2032 UID:0 ImageInfo invoked , 42
[Z0][ReM][D]: Req:2032 UID:0 ImageInfo result SUCCESS, "<IMAGE><ID>42</ID><U..."
[Z0][ReM][D]: Req:1680 UID:0 ImagePersistent invoked , 42, true
[Z0][ReM][D]: Req:1680 UID:0 ImagePersistent result SUCCESS, 42
[Z0][ReM][D]: Req:5056 UID:0 ImageInfo invoked , 42
[Z0][ReM][D]: Req:5056 UID:0 ImageInfo result SUCCESS, "<IMAGE><ID>42</ID><U..."
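(For reference, the ImagePersistent call above corresponds to something like the following, assuming it was issued through the standard CLI:)
oneimage persistent 42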
Ceph cluster still has a snapshot:
rbd --id libvirt -p one info one-42
rbd image 'one-42':
        size 1024 MB in 256 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.217ed81238e1f29
        format: 2
        features: layering
        flags:
rbd --id libvirt -p one info one-42@snap
rbd image 'one-42':
        size 1024 MB in 256 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.217ed81238e1f29
        format: 2
        features: layering
        flags:
        protected: True
oned -v
Copyright 2002-2016, OpenNebula Project, OpenNebula Systems
OpenNebula 5.2.0 is distributed and licensed for use under the terms of the Apache License, Version 2.0 (http://www.apache.org/licenses/LICENSE-2.0).
Ceph cluster version 10.2.3 and relevant auth permissions:
client.libvirt
        key: *removed*
        caps: [mon] allow r
        caps: [osd] allow class-read object_prefix rbd_children, allow rwx pool=one, allow rwx pool=one_ssd
#4 Updated by Ruben S. Montero over 4 years ago
Note that the snapshot will be removed the next time the image is used, not right after the persistent change.
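If that is the case, a check like this should come back empty after the next VM is instantiated from the now-persistent image (same pool and image names as above):
rbd --id libvirt -p one snap ls one-42   # expected: no 'snap' entry once the image has been reused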
#5 Updated by Taavi K over 4 years ago
This still does not happen after the persistent image is used.
In step 5 of the first post the snapshot should be deleted, but it's still there.
#6 Updated by Ruben S. Montero over 4 years ago
I cannot reproduce this, will try in another Ceph cluster, and update the issue.
#7 Updated by Taavi K over 4 years ago
Seems to be the same issue as in this forum topic:
https://forum.opennebula.org/t/non-persistent-to-persistent-and-back-again-ceph/2199
#8 Updated by Jaime Melis over 4 years ago
- Resolution set to worksforme
I'm sorry, but I have also tried to replicate the issue and it works for me. The code seems alright; as Rubén explained, the snapshot is removed when the VM is instantiated.
Could it perhaps be a problem with an improper umount, without running sync or something like that?
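For what it's worth, a clean detach inside the guest before terminating the VM would look roughly like this:
cd /
sync             # flush dirty pages to the block device
umount /mnt/d1   # unmount before shutting the VM down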
Closing with worksforme.
Please reopen if you have any additional info.
#9 Updated by Jaime Melis over 4 years ago
- Status changed from Pending to Closed
#10 Updated by Jaime Melis over 4 years ago
- Status changed from Closed to Assigned
- Assignee set to Jaime Melis
- Target version set to Release 5.4
Actually reopening... I followed your instructions again and I've seen the bug.
Thanks for the detailed explanation!
#11 Updated by Taavi K over 4 years ago
Sure :)
#12 Updated by Jaime Melis over 4 years ago
- Status changed from Assigned to Closed
- Resolution changed from worksforme to fixed
Fixed
#13 Updated by Tino Vázquez over 4 years ago
- Target version changed from Release 5.4 to Release 5.2.1