Feature #2997
Configure PHYDEV for 802.1Q VLAN per host and cluster
Status: Pending
Start date: 06/25/2014
Priority: Normal
Due date:
Assignee: Jaime Melis
% Done: 0%
Category: Packaging
Target version: -
Resolution:
Pull request:
Description
Hi,
With the arrival of "Predictable Network Interface Names" [1] in all new Linux distributions, it is becoming very difficult to use and rename network interfaces with eth* names. Moreover, renaming interfaces to kernel-namespace names like eth*/wlan* is no longer supported [2][3] by udev/systemd-udevd and is impossible in RHEL 7.
This is really blocking us from using RHEL 7, as we need to use eth1 as the 802.1Q physical device. Since PHYDEV can be very specific to the hardware and Linux distribution, it should be possible to declare it inside the host template and possibly inside the cluster template.
Precedence would be: network template < cluster template < host template (a sketch of this resolution order follows the references below).
Kind regards,
Laurent
1. http://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames/
2. https://bugs.freedesktop.org/show_bug.cgi?id=56929#c3
3. Excerpt from udev manpage
NAME What a network interface should be named. Also, as a temporary workaround, this is what a device node should be named; usually the kernel provides the defined node name or creates and removes the node before udev even receives any event. Changing the node name from the kernel's default creates inconsistencies and is not supported. If the kernel and NAME specify different names, an error is logged. udev is only expected to handle device node permissions and to create additional symlinks, not to change kernel-provided device node names. Instead of renaming a device node, SYMLINK should be used. However, symlink names must never conflict with device node names, as that would result in unpredictable behavior.
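For illustration only (no such attribute lookup exists in OpenNebula today, and the variable names below are hypothetical), the requested resolution order could be expressed as:
# Hypothetical precedence resolution for the 802.1Q physical device:
# a host-level PHYDEV overrides a cluster-level one, which overrides
# the value defined in the virtual network template.
host_template    = {}                        # e.g. { 'PHYDEV' => 'enp5s0f1' }
cluster_template = { 'PHYDEV' => 'enp0s3' }  # example values only
vnet_template    = { 'PHYDEV' => 'eth1' }

phydev = host_template['PHYDEV']    ||
         cluster_template['PHYDEV'] ||
         vnet_template['PHYDEV']
puts phydev   # => "enp0s3" here, because the host defines no override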
History
#1 Updated by Ruben S. Montero about 7 years ago
This really conflicts with the current workflow of OpenNebula. Note that the VNET configuration is attached to the VM before it is allocated to a host. That is:
1.- The VNET is bound to a given bridge br0 (or NIC device, for that matter)
2.- A VM is created with a NIC referring to the device defined in the VNET
3.- The VM is allocated to the host
The proposed change would require moving the NIC initialization to a later stage and, in general, would prevent live migration between hosts where NICs have different names.
All distributions can be configured to add a udev rule to set the name of the interfaces so they are consistent across hosts. This is a fully supported mechanism in RedHat 7 as explained here.
This would be the recommended procedure, for the reasons outlined before.
#2 Updated by Laurent Grawet about 7 years ago
Sorry but, apparently, you didn't fully understand the problem.
All distributions can be configured to add a udev rule to set the name of the interfaces
Right, and this is still the case; I used to do it until now. But you can only rename to anything except eth* (or any other kernel-namespace name). Renaming within the kernel namespace is not supported anymore, and don't expect it to work with current and future udev/systemd releases. I managed to do it with Ubuntu 14.04 but not with RHEL 7.
This is what you get at boot:
Jun 25 17:13:29 kvm1 systemd-udevd[807]: starting version 208
Jun 25 17:13:30 kvm1 systemd-udevd: error changing net interface name eth1 to eth4: File exists
Jun 25 17:13:30 kvm1 systemd-udevd: error changing net interface name eth2 to eth1: File exists
Jun 25 17:13:30 kvm1 systemd-udevd: error changing net interface name eth3 to eth5: File exists
Jun 25 17:13:30 kvm1 systemd-udevd: error changing net interface name eth4 to eth2: File exists
Jun 25 17:13:30 kvm1 systemd-udevd: error changing net interface name eth5 to eth6: File exists
Jun 25 17:13:30 kvm1 systemd-udevd: error changing net interface name eth6 to eth3: File exists
This has already been discussed in systemd bug 56929 (posted with other references in my initial report)
https://bugs.freedesktop.org/show_bug.cgi?id=56929#c3
Developer Kay Sievers replied with:
Short version: The rule will not work any more, and cannot be made working with systemd. The rules need to use names now which do not use kernel names like ethX. Biosdevname *should* work fine, even without "BIOS support" in that sense. It should be able to calculate a predictable name based on the physical location of the hardware, at least if PCI/USB hardware is used.
Long version: We do no longer support renaming network interfaces in the kernel namespace. Interface names are required to use custom names that can never clash with the kernel created ones. We do not support swapping names; we cannot win any race against the kernel creating new interfaces at the same time. We do no longer support the creation of udev rules from inside the hotplug path. It was pretty naive to ever try this in the first place, it all is a problem that cannot be solved properly, and which creates many more new problems than it solves. The entire udev-based automatic persistent network names is all just a long history of failures, it pretended to be able to solve something; but it couldn't deliver. We completely stopped pretending that now, and need to move on to something that can work out in a reliable and predictable manner. Predictable network interface names require a tool like biosdevname, or manually configured names, which do not use the kernel names.
This is a fully supported mechanism in RedHat 7 as explained here.
That documentation explains how to disable "consistent network device naming" to use the old eth* names (which is what I did), but not how to rename them. And we end up with interfaces in the wrong order, so...
OK, I understand this is a real problem with the current OpenNebula workflow. Maybe this could be done at the VNM driver/script level? A bit dirty, but...
#3 Updated by Ruben S. Montero about 7 years ago
- Target version deleted (Release 4.8)
#4 Updated by Laurent Grawet about 7 years ago
Some more references about these udev/systemd-udevd changes:
Kay Sievers, Tue, 31 Jan 2012 12:13:41 +0100:
... Pretending we are able to rename netif names in the same namespace the kernel is allocating new names is just plain wrong. There are races you can't control. The entire approach creates far more problems than it solves. We just have to admit it was wrong to do that. Custom/to-rename netif names can just not be ethX. ...
https://lkml.org/lkml/2012/1/31/149
Kay Sievers, 2013-07-27 10:37:09 EDT:
Interface names can no longer be swapped, they need different names not clashing with the kernel namespace. It is not expected to work in Fedora 19 or any newer release. Edit the file to use any name other than eth0/1. Remove the rules and use the new default names, or install biosdevname.
https://bugzilla.redhat.com/show_bug.cgi?id=989099
#5 Updated by Ruben S. Montero about 7 years ago
- Tracker changed from Request to Feature
- Category changed from Drivers - Network to Packaging
- Assignee set to Jaime Melis
- Target version set to Release 4.8
Hi,
I think the idea would be to rename the interfaces to a common name, not necessarily ethX. Let me move this to the 4.8 issues, to take a look at it when we prepare the CentOS 7 packages.
Thanks!
Ruben
Laurent Grawet wrote:
Some more references about these udev/systemd-udevd changes:
[...]
https://lkml.org/lkml/2012/1/31/149
#6 Updated by Laurent Grawet about 7 years ago
Hi,
I totally agree: renaming to anything but a kernel-namespace name is the solution for new cloud deployments. This should be advised in the documentation, and it could also be handled at the packaging level.
I wish I had been warned about this 3 years ago, but back then renaming eth* to eth* was the way to go. Now I need a way to handle both the hosts currently in production and the new ones. I don't want to duplicate all the virtual networks, because users will get lost and it is too painful to manage.
About the current workflow: when you think about it, all a VM instance should know is the bridge name to attach its vNIC to, not the physical interface name.
I was thinking about the remotes/vnm/802.1Q/HostManaged.rb script. The script could check for an 802.1Q PHYDEV variable in the HOST (and possibly CLUSTER) template; a rough sketch of this idea follows this note. But maybe the core could simply pass the right value to the script depending on which HOST/CLUSTER the VM is deployed on.
Thanks,
Laurent
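A rough sketch of that first idea (purely hypothetical; it assumes the pre script could reach oned through the Ruby OCA with valid credentials, which remote VNM scripts normally do not have):
require 'socket'
require 'opennebula'              # OpenNebula Ruby OCA, assumed installed on the node
include OpenNebula

client    = Client.new            # reads ONE_AUTH and the XML-RPC endpoint from the environment
host_pool = HostPool.new(client)
host_pool.info

# Look up the host this script runs on and read a hypothetical PHYDEV
# attribute from its template; nil means "keep the VNET-defined PHYDEV".
host   = host_pool.find { |h| h.name == Socket.gethostname }
phydev = host ? host['TEMPLATE/PHYDEV'] : nil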
#7 Updated by Laurent Grawet about 7 years ago
I've just noticed that VNM driver actions only receive the XML VM template encoded in Base64. So handling it that way implies changes at the core level.
#8 Updated by Ruben S. Montero about 7 years ago
Laurent Grawet wrote:
I've just noticed that VNM driver actions only receive the XML VM template encoded in Base64. So handling it that way implies changes at the core level.
That can be worked out. The template has the NIC, and we can add any attribute from the network through INHERIT_VNET_ATTR; currently we are passing VLAN_TAGGED_ID to the drivers...
About the docs, we'll take care of that as part of this issue. Thanks again for your great feedback!!
#9 Updated by Ruben S. Montero about 7 years ago
I was thinking of a kind of mapping, host-dev... coded in the VNET.
#10 Updated by Laurent Grawet about 7 years ago
Does it mean we would have to edit all VNETs (currently 42) each time we add a new host?
#11 Updated by Laurent Grawet about 7 years ago
And what about passing the HOST template along with the VM template to the VNM drivers?
It wouldn't break the current drivers, and we could add a simple check for the variable's presence to the script.
But I understand the desire to centralize the VNM configuration.
#12 Updated by Ruben S. Montero about 7 years ago
Laurent Grawet wrote:
Does it mean we would have to edit all VNETs (currently 42) each time we add a new host?
True, we cannot go that way
#13 Updated by Ruben S. Montero about 7 years ago
Laurent Grawet wrote:
And what about passing the HOST template along with the VM template to the VNM drivers?
It wouldn't break the current drivers, and we could add a simple check for the variable's presence to the script.
But I understand the desire to centralize the VNM configuration.
It is not the best solution, but we could do a onehost show from the driver for these special configurations.
#14 Updated by Jaime Melis almost 7 years ago
After thinking more about this issue, instead of changing the code we would really like to simply recommend, for CentOS 7, renaming the interfaces manually to something other than ethX.
We've done a quick PoC and if we simply add this rule:
[root@localhost ~]# cat /etc/udev/rules.d/70-persistent-net.rules
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="02:00:ac:10:4d:c9", NAME="net0"
Everything works as expected. Do you have any reason not to like this solution?
#15 Updated by Laurent Grawet almost 7 years ago
Yes, because I now have 43 VNETs configured with PHYDEV=eth1, so it is too late for me. I wish I had been warned about this 3 years ago, but back then renaming eth* to eth* was the way to go. I don't want to duplicate all the virtual networks, because users will get lost and it is too painful to manage. And if you mix the RHEL 6 hosts (using eth1) currently in production with new RHEL 7 hosts (using net0) and then migrate, it will fail. (Live migration from RHEL 6 to RHEL 7 is supported by KVM/Red Hat.)
But you're right, your solution is the best one when starting from scratch.
#16 Updated by Laurent Grawet almost 7 years ago
Hi,
Is there any update on this feature? We are stuck on this one. Another idea would be to read a local config file on the hypervisor.
Thanks
Laurent
#17 Updated by Ruben S. Montero almost 7 years ago
Laurent Grawet wrote:
Hi,
Is there any update on this feature? We are stuck on this one. Another idea would be to read a local config file on the hypervisor.
Thanks
Laurent
Hi Laurent,
Right now, we will recommend the setup outlined in #note-14. If we come up with a solution that is easy to implement without modifying the data workflow in oned, we'll implement it.
You mention a fix at the driver level; this will probably work as a workaround for you. If you have a mapping file on each host like:
ETH0="enp0s5"
Then you could simply source the file and sed deployment.0 with s/eth0/$ETH0/g. You could even put the files (named after the hostname of each host) in the vmm directory and distribute them with onehost sync. The modification goes in the kvm/deploy file, after the cat > $domain....
#18 Updated by Laurent Grawet almost 7 years ago
Actually, I was thinking about patching /var/lib/one/remotes/vnm/802.1Q/HostManaged.rb
Because modifying the deployment file would not solve the NIC hotplug problem, right?
#19 Updated by Ruben S. Montero almost 7 years ago
Laurent Grawet wrote:
Actually, I was thinking about patching /var/lib/one/remotes/vnm/802.1Q/HostManaged.rb
Because modifying the deployment file would not solve the NIC hotplug problem, right?
Yes, right. You can do the same in pre: ARGV[0] includes the template, so just Base64-decode and gsub the string. Then you should be able to encode it again and create an OpenNebulaHM from it.
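A minimal sketch of the approach described above (illustrative only; Laurent's actual patch is in note #23 below, and 'net1' here is just an example device name):
require 'base64'

# ARGV[0] is the Base64-encoded VM template handed to the pre script.
# Decode it, rewrite PHYDEV, re-encode it and pass it on to OpenNebulaHM
# (which the stock pre script already requires via 'HostManaged').
# The non-greedy .*? keeps the match inside a single <PHYDEV> element
# even though the whole template arrives as one long line.
template = Base64.decode64(ARGV[0])
template = template.gsub(%r{<PHYDEV><!\[CDATA\[.*?\]\]></PHYDEV>},
                         '<PHYDEV><![CDATA[net1]]></PHYDEV>')
ARGV[0]  = Base64.encode64(template)

hm = OpenNebulaHM.from_base64(ARGV[0])
exit hm.activate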
#20 Updated by Ruben S. Montero almost 7 years ago
Ruben S. Montero wrote:
Laurent Grawet wrote:
Actually, I was thinking about patching /var/lib/one/remotes/vnm/802.1Q/HostManaged.rb
Because modifying the deployment file would not solve the NIC hotplug problem, right?
Yes, right. You can do the same in pre: ARGV[0] includes the template, so just Base64-decode and gsub the string. Then you should be able to encode it again and create an OpenNebulaHM from it.
And also in attach_nic. I think it may be easier to patch the action scripts rather than the libs. Note that attach_nic does not use HostManaged.rb.
#21 Updated by Laurent Grawet almost 7 years ago
Note that attach_nic does not use HostManaged.rb
So when you hotplug and use attach_nic, the bridge has to exist, otherwise it would fail?
Because all I can find in attach_nic is $BRIDGE and $MAC.
#22 Updated by Ruben S. Montero almost 7 years ago
Laurent Grawet wrote:
Note that attach_nic does not use HostManaged.rb
So when you hotplug and use attach_nic, the bridge has to exist, otherwise it would fail?
Because all I can find in attach_nic is $BRIDGE and $MAC.
Exactly, the bridge is supposed to either already exist or be created by the pre action.
#23 Updated by Laurent Grawet almost 7 years ago
The following workaround works for us:
--- /var/lib/one/remotes/vnm/802.1Q/pre.ori	2014-06-11 16:06:20.000000000 +0200
+++ /var/lib/one/remotes/vnm/802.1Q/pre	2014-09-04 14:59:29.176602599 +0200
@@ -20,6 +20,19 @@
 $: << File.join(File.dirname(__FILE__), "..")
 
 require 'HostManaged'
+require 'yaml'
+require 'socket'
+require 'base64'
+
+config_file = File.expand_path(File.dirname(__FILE__) + '/host.yml')
+if File.file?(config_file)
+    config = YAML.load_file(config_file)
+    hostname = Socket.gethostname
+    if config.has_key?(hostname) && config[hostname].has_key?('phydev')
+        template = Base64.decode64(ARGV[0]).gsub(/\<PHYDEV\>\<\!\[CDATA\[.*\]\]\>\<\/PHYDEV\>/, "<PHYDEV><![CDATA[#{config[hostname]['phydev']}]]></PHYDEV>")
+        ARGV[0] = Base64.encode64(template)
+    end
+end
 
 hm = OpenNebulaHM.from_base64(ARGV[0])
 exit hm.activate
It uses a "host.yml" config file in the same directory as the pre script. All that is needed is the hypervisor hostname and the corresponding PHYDEV:
kvmhost1:
  phydev: 'net1'
#24 Updated by Ruben S. Montero almost 7 years ago
Laurent Grawet wrote:
The following workaround works for us:
[...]
It uses a "host.yaml" config file in the same directory of the pre script. All is needed is the hypervisor hostname and corresponding PHYDEV:
[...]
Thanks for sharing!...
#25 Updated by Laurent Grawet almost 7 years ago
Here is a small update. There is a problem with the regex when there are multiple NICs: since the XML template is transmitted as a single line, the greedy .* matches everything between the first and the last PHYDEV tag. Restricting the matched characters fixes it:
Base64.decode64(ARGV[0]).gsub(/\<PHYDEV\>\<\!\[CDATA\[[\w\.\:\-\_\#]+\]\]\>\<\/PHYDEV\>/, "<PHYDEV><![CDATA[#{config[hostname]['phydev']}]]></PHYDEV>")
#26 Updated by Ruben S. Montero almost 7 years ago
- Target version deleted (Release 4.8)