Bug #5353
undeploy fails when using ceph system datastore
Status: | Pending | Start date: | 09/06/2017
---|---|---|---
Priority: | Normal | Due date: |
Assignee: | Ruben S. Montero | % Done: | 0%
Category: | Core & System | |
Target version: | Release 5.6 | |
Resolution: | | Pull request: |
Affected Versions: | OpenNebula 5.4 | |
Description
Hello,
when I use a Ceph System Datastore, undeploying VMs fails with the following error:
Wed Sep 6 12:31:51 2017 [Z0][TM][I]: Command execution fail: /var/lib/one/remotes/tm/ceph/mv node01.example.com:/var/lib/one//datastores/130/23214 opennebula:/var/lib/one//datastores/130/23214 23214 130
Wed Sep 6 12:31:51 2017 [Z0][TM][I]: mv: Moving node01.example.com:/var/lib/one/datastores/130/23214 to opennebula:/var/lib/one/datastores/130/23214
Wed Sep 6 12:31:51 2017 [Z0][TM][E]: mv: Command "set -e -o pipefail
Wed Sep 6 12:31:51 2017 [Z0][TM][I]:
Wed Sep 6 12:31:52 2017 [Z0][TM][I]: tar -C /var/lib/one/datastores/130 --sparse -cf - 23214 | ssh opennebula 'tar -C /var/lib/one/datastores/130 --sparse -xf -'
Wed Sep 6 12:31:52 2017 [Z0][TM][I]: rm -rf /var/lib/one/datastores/130/23214" failed: ssh: Could not resolve hostname opennebula: Name or service not known
Wed Sep 6 12:31:52 2017 [Z0][TM][E]: Error copying disk directory to target host
Wed Sep 6 12:31:52 2017 [Z0][TM][I]: ExitCode: 255
Wed Sep 6 12:31:53 2017 [Z0][TM][E]: Error executing image transfer script: Error copying disk directory to target host
Wed Sep 6 12:31:53 2017 [Z0][VM][I]: New LCM state is EPILOG_UNDEPLOY_FAILURE
Wed Sep 6 12:34:36 2017 [Z0][VM][I]: New LCM state is EPILOG_UNDEPLOY
With an NFS System Datastore, undeployment works as expected.
The question is: why is "opennebula" used for the controller instead of "opennebula.example.com"? Do I have to configure it somewhere?
A temporary fix is to add "opennebula" with its IP to /etc/hosts. But it would be nice to fix this differently, so we don't have to change /etc/hosts on all blades if the controller's IP ever changes :-)
Thanks
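The failure in the log comes down to the node being unable to resolve the short name "opennebula". A minimal sketch of that check, run from a node, could look like the following. The helper name `resolves` and the injectable `lookup` parameter are illustrative assumptions, not part of OpenNebula.

```python
import socket
from typing import Callable


def resolves(name: str, lookup: Callable[..., list] = socket.getaddrinfo) -> bool:
    """Return True if `name` can be resolved to at least one address.

    `lookup` defaults to the system resolver; it is a parameter only so the
    check can be exercised without real DNS (an assumption of this sketch).
    """
    try:
        lookup(name, None)
        return True
    except socket.gaierror:
        # Same class of failure as ssh's "Name or service not known".
        return False


if __name__ == "__main__":
    # Host names taken from the report above.
    for candidate in ("opennebula", "opennebula.example.com"):
        print(candidate, "resolves" if resolves(candidate) else "does NOT resolve")
```

If the short name does not resolve but the FQDN does, the node is in the situation described in this report.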
History
#1 Updated by Ruben S. Montero over 3 years ago
- Category set to Drivers - Storage
- Assignee set to Vlastimil Holer
- Target version set to Release 5.4.3
#2 Updated by Vlastimil Holer over 3 years ago
This is a problem with the core. The frontend hostname is detected with gethostname, which doesn't return the FQDN. It returns the FQDN only if the FQDN itself is set as the hostname.
https://github.com/OpenNebula/one/blob/512da1ee67ee83aef9df736aaa9988349a62d0d2/src/nebula/Nebula.cc#L53
Example 1:
$ hostname
thunder
$ hostname -f
thunder.localdomain
and gethostname returns thunder.
Example 2:
$ hostname
thunder.localdomain
$ hostname -f
thunder.localdomain
and gethostname returns thunder.localdomain.
This will need more sophisticated frontend FQDN detection, preferably also configurable in oned.conf (a frontend can have multiple IPs for public and cluster-private communication; without an override option it could pick the wrong interface, e.g. with some performance penalty).
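To illustrate the kind of detection meant here, below is a rough sketch in Python (the actual core code is C++). It assumes a hypothetical configured override, which is not an existing oned.conf option, and falls back from the plain hostname to a resolver lookup. The function name `detect_frontend_name` and its parameters are made up for this example.

```python
import socket
from typing import Callable, Optional


def detect_frontend_name(
    configured: Optional[str] = None,
    get_hostname: Callable[[], str] = socket.gethostname,
    get_fqdn: Callable[[str], str] = socket.getfqdn,
) -> str:
    """Pick the name that nodes should use to reach the frontend."""
    # 1. An explicit setting always wins (hypothetical override, not an
    #    existing oned.conf option).
    if configured:
        return configured

    short_name = get_hostname()
    # 2. If the hostname already looks fully qualified, use it as-is
    #    (this is Example 2 above).
    if "." in short_name:
        return short_name

    # 3. Otherwise ask the resolver for a canonical name (Example 1 above);
    #    keep the short name if the lookup yields nothing better.
    fqdn = get_fqdn(short_name)
    return fqdn if "." in fqdn else short_name


if __name__ == "__main__":
    print(detect_frontend_name())
```

The override comes first on purpose: on a frontend with several interfaces, only the administrator knows which name the nodes should use.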
#3 Updated by Vlastimil Holer over 3 years ago
Tobias,
Temporary fix is to add "opennebula" with IP to /etc/hosts. But would be nice to fix it differently so we don't have to change /etc/hosts on all blades in case we have to change the IP of controller :-)
A better fix for you, for now, is to make sure the hostname is set to the FQDN before starting OpenNebula (check with just the "hostname" command without parameters; see the examples in my previous comment).
Although this is considered bad practice, it's now often the recommended way:
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/networking_guide/ch-configure_host_names
However, Red Hat recommends that both static and transient names match the fully-qualified domain name (FQDN) used for the machine in DNS, such as host.example.com.
Best regards,
Vlastimil
#4 Updated by Tobias Fischer over 3 years ago
Hello Vlastimil,
thanks for your help - much appreciated!
Best Regards,
Tobias
#5 Updated by Ruben S. Montero over 3 years ago
- Target version changed from Release 5.4.3 to Release 5.6
#6 Updated by Vlastimil Holer over 3 years ago
- Category changed from Drivers - Storage to Core & System
- Assignee changed from Vlastimil Holer to Ruben S. Montero