Bug #5563
ceph rbd image attached to existing VM via attach_disk is missing additional ceph monitors
| Status: | Pending | Start date: | 11/17/2017 |
|---|---|---|---|
| Priority: | High | Due date: | |
| Assignee: | - | % Done: | 0% |
| Category: | Drivers - VM | | |
| Target version: | - | | |
| Resolution: | | Pull request: | |
| Affected Versions: | OpenNebula 5.4.1 | | |
Description
Hello,
Any additional Ceph RBD image attached to an existing VM via attach disk is missing the additional Ceph monitors. Example:
The 1st disk (defined via the template) has the following XML definition:
<disk type='network' device='disk'>
<driver name='qemu' type='raw' cache='writeback'/>
<auth username='XXX'>
<secret type='ceph' uuid='XXX'/>
</auth>
<source protocol='rbd' name='POOL/IMAGE'>
<host name='XXX.XXX.XXX.XXX' port='6789'/>
<host name='XXX.XXX.XXX.XXX' port='6789'/>
<host name='XXX.XXX.XXX.XXX' port='6789'/>
</source>
Any additional Ceph image has the following XML definition:
<disk type='network' device='disk'>
<driver name='qemu' type='raw'/>
<auth username='XXX'>
<secret type='ceph' uuid='XXX'/>
</auth>
<source protocol='rbd' name='POOL/IMAGE'>
<host name='XXX.XXX.XXX.XXX' port='6789'/>
</source>
Although the corresponding datastore has 3 Ceph monitors defined in OpenNebula, the attach script only uses the first one, so the other 2 monitors are missing from the attached disk.
This is very bad: if the 1st Ceph monitor dies, the attached disk becomes unusable in the VM. With all 3 monitors defined, it could switch to one of the other 2.
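A quick way to see whether a disk definition is affected is to count its `<host>` elements. The following is only a minimal sketch: `count_rbd_hosts` is a hypothetical helper, not part of OpenNebula, and the sample XML is the placeholder definition of the attached disk from this report.

```shell
#!/bin/bash
# Hypothetical helper (not part of OpenNebula): count the <host .../> elements
# in a libvirt disk definition passed as the first argument.
count_rbd_hosts() {
    # grep -o prints every match on its own line; wc -l counts those lines.
    # tr strips the padding some wc implementations add.
    printf '%s\n' "$1" | grep -o "<host " | wc -l | tr -d ' '
}

# Sample input modelled on the attached-disk definition (placeholder values).
ATTACHED_DISK_XML="<source protocol='rbd' name='POOL/IMAGE'>
<host name='XXX.XXX.XXX.XXX' port='6789'/>
</source>"

echo "monitors in attached disk: $(count_rbd_hosts "$ATTACHED_DISK_XML")"
```

On a live hypervisor the XML could instead be taken from `virsh dumpxml` for the domain in question; a count lower than the number of monitors configured in the datastore would indicate the problem described here.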
The code that builds the disk definition, including the Ceph monitors of the datastore the image is attached from, seems to be in
remotes/vmm/kvm/attach_disk
cat <<EOF > $ATTACH_FILE
<disk type='$TYPE_XML' device='$DEVICE'>
<driver name='qemu' type='$DRIVER' $CACHE $DISCARD/>
<source $TYPE_SOURCE='$SOURCE' $SOURCE_ARGS>
$SOURCE_HOST
</source>
$AUTH
<target dev='$TARGET'/>
$READONLY
</disk>
EOF
The variable containing the Ceph monitors is $SOURCE_HOST.
It is defined in remotes/scripts_common.sh:
function get_source_xml
function get_disk_information
Unfortunately, I am not sure where exactly the error is located.
Any help is appreciated. Thanks :-)
Best,
tobi
History
#1 Updated by Tobias Fischer over 3 years ago
Here is how I fixed it:
remotes/scripts_common.sh:
function get_source_xml {
    # Collects one <host/> element per monitor listed in $1
    HOSTS=""
    for host in $1 ; do
        BCK_IFS=$IFS
        IFS=':'
        # Split "name:port" into HOST_PARTS
        unset k HOST_PARTS SOURCE_HOST
        for part in $host ; do
            HOST_PARTS[k++]="$part"
        done
        SOURCE_HOST="$SOURCE_HOST<host name='${HOST_PARTS[0]}'"
        if [ -n "${HOST_PARTS[1]}" ]; then
            SOURCE_HOST="$SOURCE_HOST port='${HOST_PARTS[1]}'"
        fi
        SOURCE_HOST="$SOURCE_HOST/>"
        # Append this monitor's element to the accumulated list
        HOSTS+=$SOURCE_HOST
        IFS=$BCK_IFS
    done
    echo "$HOSTS"
}
So I added these three lines:
HOSTS=""
HOSTS+=$SOURCE_HOST
echo "$HOSTS"
And also, in
function get_disk_information {
SOURCE_HOST=$(get_source_xml "$CEPH_HOST")
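I added quotes around $CEPH_HOST, so the whole monitor list is passed to get_source_xml as a single argument ($1). Without the quotes, the shell splits the list on whitespace before the call and $1 only holds the first monitor.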
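The effect of the quotes can be reproduced outside OpenNebula. The sketch below uses `build_hosts_xml`, a simplified, hypothetical stand-in for the patched get_source_xml, and assumes that CEPH_HOST is a space-separated list of `name[:port]` entries (the monitor names are placeholders):

```shell
#!/bin/bash
# Simplified, hypothetical stand-in for the patched get_source_xml:
# builds one <host/> element per whitespace-separated entry in $1.
build_hosts_xml() {
    local hosts="" host name port
    for host in $1; do              # unquoted on purpose: split $1 on whitespace
        name=${host%%:*}            # text before the first ':'
        port=""
        case $host in
            *:*) port=${host#*:} ;; # text after the first ':', if there is one
        esac
        hosts+="<host name='$name'"
        if [ -n "$port" ]; then
            hosts+=" port='$port'"
        fi
        hosts+="/>"
    done
    echo "$hosts"
}

# Assumed format: space-separated monitors, port optional (placeholder names).
CEPH_HOST="mon1:6789 mon2:6789 mon3"

build_hosts_xml $CEPH_HOST      # unquoted: only "mon1:6789" arrives as $1
build_hosts_xml "$CEPH_HOST"    # quoted: the whole list arrives as $1
```

The first call prints a single `<host>` element, which matches the symptom in this report; the second prints one element per monitor.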