
Building bridges: “Bridge must not have discovery mode for LACP interface; Interface file: bondeth0;”
Some days start as an exatastic day and end with you tearing your hair out. This post is about such a day. A brand-new, shiny X7 Quarter Rack Exadata was being installed for a customer. The setup was fairly straightforward: quarter rack, full OVM and 2 virtual clusters. Not too fancy. LACP is used for the public network on the fibers, and VLAN tagging is done on the switch.
Given that, my customer created an OEDA configuration using only LACP, without specifying the VLANs. This is allowed in the June 2018 version of OEDA:
Only when you click the “Advanced” button and select the Enable Network VLAN option does the VLAN ID box appear, where you can fill in the VLAN to be used.
So far so good: we had a very default configuration and checkip.sh had run successfully. That means green light, let’s go for it!
The first step of install.sh (verification of the config file) ran perfectly fine, but the second one failed with:
    Bridge must not have discovery mode for LACP interface Interface file bondeth0
Uh oh, nothing to find on My Oracle Support, nor on Google. So that means I had to open an SR.
At this point I decided to try to understand why domu_maker works the way it works. During a fresh install, the bond for the public network and the bridge in dom0 are built. When you check the logs, you see that it fails exactly while building the bridge.
In the logging from OEDA ($OEDA_HOME/log) you see things like this:
    ########## Data Collection ###################
    Step2_Create_Virtual_Machine_180623_152112.out
    ========================================
    2018-06-23 15:24:43,128 [FINE][ OCMDThread][ EsCommonUtils:1257] ESC[40;31m[ERROR ]ESC[0m Bridge must not have discovery mode for LACP interface: Interface; file: bondeth0; /EXAVMIMAGES/conf/final-obfuscatedadm01vm01.obfuscated.com-vm.xml
    2018-06-23 15:24:43,128 [FINE][-1-thread-86][ EsCommonUtils:1257] ESC[40;31m[ERROR ]ESC[0m Bridge must not have discovery mode for LACP interface: Interface; file: bondeth0; /EXAVMIMAGES/conf/final-obfuscatedadm02vm01.obfuscated.com-vm.xml
    2018-06-23 15:24:43,129 [INFO][ OCMDThread][ KommandOutput:193] ESC[40;31m[ERROR ]ESC[0m Bridge must not have discovery mode for LACP interface: Interface; file: bondeth0; /EXAVMIMAGES/conf/final-obfuscatedadm01vm01.obfuscated.com-vm.xml
    2018-06-23 15:24:43,129 [INFO][-1-thread-86][ KommandOutput:193] ESC[40;31m[ERROR ]ESC[0m Bridge must not have discovery mode for LACP interface: Interface; file: bondeth0; /EXAVMIMAGES/conf/final-obfuscatedadm02vm01.obfuscated.com-vm.xml
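With several threads logging the same failure, the actual message is easier to spot once the colour codes and duplicates are gone. Here is a minimal sketch for that. The function name `oeda_errors` is my own invention, and it assumes the colour codes show up as the literal text `ESC[...m`, the way they do in the excerpt; if your file contains real escape bytes, the pattern needs adjusting.

```shell
# Print the unique [ERROR ] messages from one or more OEDA step logs.
# Assumption: colour codes appear as literal "ESC[...m" text, as in the excerpt.
oeda_errors() {
    grep -h 'ERROR' "$@" |
        sed -e 's/ESC\[[0-9;]*m//g' \
            -e 's/^.*\[ERROR *] *//' |
        sort -u
}
```

Pointing it at the step log (for example `oeda_errors "$OEDA_HOME"/log/Step2_Create_Virtual_Machine_*.out`, assuming the file sits there under that name) should leave one line per affected VM configuration file.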
It’s customer data, so please forgive me the obfuscation. But from this output it is not very clear why it fails.
Let me give you the “Official” solution from Oracle first:
“I checked internally and the only workaround is to disable LACP mode on switch to proceed with vm creation.
– Disable LACP on network switch
– Uncheck lacp and recreate the config file.
We can re-enable LACP once the vm creation is successful.”
Well … in this case, I couldn’t do that. The network admins had already created the LACP bonding, and OEDA allowed us to configure it this way. Also, launching 2 change requests would take way too much time and put us too far behind schedule. This means … time to be creative.
The magic logfile, which is cleaned up afterwards, can be found in /var/log/cellos/exadata.img.domu_maker.trc or .log. When exactly it is cleaned up, I don’t know yet. I wanted to capture more information to put in this blog post, but it was already gone. The thing is, it shows you exactly what is going on. That way I discovered that even if you select LACP, the installer still does its verifications the same way it does for non-LACP interfaces: it takes the interface, puts an address on it and pings the default gateway to verify that it can reach it (ICMP! Dear firewall admins, please be kind during installation).
When I discovered that, the solution was very simple but efficient: let’s do it manually. So on both nodes I created my bridge manually. The domu_maker command does 90% of the work for you, so it is really easy:
    [root@obfuscateddom0adm01 linux-x64]# /opt/exadata_ovm/exadata.img.domu_maker add-bonded-bridge-dom0 vmbondeth0 eth3 eth4
    [root@obfuscateddom0adm01 linux-x64]#
You will notice that I specified neither the LACP option nor the VLAN ID. VLANs are handled (in this case) at switch level. Here is the help for this function:
    add-bonded-bridge-dom0 <bridge_name> <slave1> <slave2> [<vlan>] [lacp]
        Add bridge over bonded Ethernet interface with optional vlan id and optional lacp mode in DOM0.
From this I would assume the VLAN argument is optional, but it isn’t. Now you understand why I chose the classic bridge. The next step is to convert it into an LACP bond. In /etc/sysconfig/network-scripts/ifcfg-bondeth0, change BONDING_OPTS to the bonding options you want to have. In my case I ended up with this configuration file:
    #### DO NOT REMOVE THESE LINES ####
    #### %GENERATED BY CELL% ####
    DEVICE=bondeth0
    USERCTL=no
    BOOTPROTO=none
    ONBOOT=yes
    #BONDING_OPTS="mode=active-backup miimon=100 downdelay=2000 updelay=5000 num_grat_arp=100"
    BONDING_OPTS="mode=4 miimon=100 downdelay=200 updelay=200 num_grat_arp=100 lacp_rate=1 xmit_hash_policy=layer3+4"
    BRIDGE=vmbondeth0
    NM_CONTROLLED=no
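Since the same edit has to be made on both nodes, it could be scripted. The following is only a sketch: `set_lacp_opts` is a name I made up, the backup location is whatever you pass in, and the option values are simply the ones from my file, so they have to match what your switch expects. It relies on GNU sed (`-i` and `\n` in the replacement).

```shell
# Switch an ifcfg bond file to LACP, keeping the old BONDING_OPTS line as a comment.
# Usage: set_lacp_opts /etc/sysconfig/network-scripts/ifcfg-bondeth0 /root/ifcfg-bondeth0.orig
set_lacp_opts() {
    cfg=$1 backup=$2
    # Values taken from my configuration; yours must match the switch side.
    new='mode=4 miimon=100 downdelay=200 updelay=200 num_grat_arp=100 lacp_rate=1 xmit_hash_policy=layer3+4'
    cp -p "$cfg" "$backup" || return 1
    # Comment out the active line and add the new one right below it (GNU sed).
    sed -i "s|^BONDING_OPTS=.*|#&\nBONDING_OPTS=\"$new\"|" "$cfg"
}
```

The backup is deliberately written outside the network-scripts directory so that it cannot be mistaken for another interface definition.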
To make this active (the system wasn’t in use yet), I restarted the complete network on the node:
    [root@obfuscateddom0adm01 EXAVMIMAGES]# service network restart
    Shutting down interface vmbondeth0: [ OK ]
    Shutting down interface vmeth0: [ OK ]
    Shutting down interface bondeth0: [ OK ]
    Shutting down interface eth0: [ OK ]
    Shutting down interface ib0: [ OK ]
    Shutting down interface ib1: [ OK ]
    Shutting down loopback interface: [ OK ]
    Bringing up loopback interface: [ OK ]
    Bringing up interface bondeth0: [ OK ]
    Bringing up interface eth0: [ OK ]
    Bringing up interface ib0: RTNETLINK answers: File exists
    Error adding address 192.168.10.1 for ib0.
    [ OK ]
    Bringing up interface ib1: RTNETLINK answers: File exists
    Error adding address 192.168.10.2 for ib1.
    [ OK ]
    Bringing up interface vmbondeth0: [ OK ]
    Bringing up interface vmeth0: Determining if ip address 198.18.5.92 is already in use for device vmeth0...
    [ OK ]
    [root@obfuscateddom0adm01 EXAVMIMAGES]#
And the bridge is also known to the system:
    [root@obfuscateddom0adm01 EXAVMIMAGES]# brctl show
    bridge name     bridge id               STP enabled     interfaces
    vmbondeth0      8000.000af7d5bfe0       no              bondeth0
    vmeth0          8000.0010e0dd9642       no              eth0
    [root@obfuscateddom0adm01 EXAVMIMAGES]#
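brctl only tells you that bondeth0 is attached to the bridge, not which mode the bond is running in. The Linux bonding driver reports that in /proc/net/bonding/<bond>, so a quick check could look like the sketch below. The helper name `bond_is_lacp` is made up, and the optional second argument exists only so the function can be pointed at a saved copy of the status file.

```shell
# Succeed if the bonding status file reports 802.3ad (LACP) mode.
# Usage: bond_is_lacp bondeth0            (reads /proc/net/bonding/bondeth0)
#        bond_is_lacp bondeth0 some_file  (reads a saved copy instead)
bond_is_lacp() {
    status=${2:-/proc/net/bonding/$1}
    grep -q '^Bonding Mode: IEEE 802.3ad' "$status"
}
```

For example: `bond_is_lacp bondeth0 && echo "bondeth0 runs in LACP mode"`.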
Before we try the installer again, it’s best to verify if it all works.
So first put an IP address on the bridge:
    [root@obfuscateddom0adm01 EXAVMIMAGES]# ip addr add 10.123.123.68/24 dev vmbondeth0
    [root@obfuscateddom0adm01 EXAVMIMAGES]# ifup vmbondeth0
    [root@obfuscateddom0adm01 EXAVMIMAGES]# ethtool vmbondeth0
    Settings for vmbondeth0:
            Link detected: yes
    [root@obfuscateddom0adm01 EXAVMIMAGES]#
Also we want the default gateway to be reachable:
    [root@obfuscateddom0adm01 EXAVMIMAGES]# route -n
    Kernel IP routing table
    Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
    0.0.0.0         199.99.99.1     0.0.0.0         UG    0      0        0 vmeth0
    10.123.123.0    0.0.0.0         255.255.255.0   U     0      0        0 vmbondeth0
    192.168.8.0     0.0.0.0         255.255.252.0   U     0      0        0 ib0
    192.168.8.0     0.0.0.0         255.255.252.0   U     0      0        0 ib1
    199.99.99.0     0.0.0.0         255.255.255.0   U     0      0        0 vmeth0
    [root@obfuscateddom0adm01 EXAVMIMAGES]#
It should be reachable through the correct interface, so do the ping test:
    [root@obfuscateddom0adm01 EXAVMIMAGES]# ping 10.123.123.1 -I vmbondeth0 -c3
    PING 10.123.123.1 (10.123.123.1) from 10.123.123.68 vmbondeth0: 56(84) bytes of data.
    64 bytes from 10.123.123.1: icmp_seq=1 ttl=255 time=3.40 ms
    64 bytes from 10.123.123.1: icmp_seq=2 ttl=255 time=0.732 ms
    64 bytes from 10.123.123.1: icmp_seq=3 ttl=255 time=0.824 ms

    --- 10.123.123.1 ping statistics ---
    3 packets transmitted, 3 received, 0% packet loss, time 2003ms
    rtt min/avg/max/mdev = 0.732/1.653/3.405/1.239 ms
    [root@obfuscateddom0adm01 EXAVMIMAGES]#
And that works. So now clean it up, and to be sure restart the network again so that we are in a clean state for the installer:
    [root@obfuscateddom0adm01 EXAVMIMAGES]# ip addr del 10.123.123.68/24 dev vmbondeth0
    [root@obfuscateddom0adm01 EXAVMIMAGES]# route -n
    Kernel IP routing table
    Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
    0.0.0.0         199.99.99.1     0.0.0.0         UG    0      0        0 vmeth0
    192.168.8.0     0.0.0.0         255.255.252.0   U     0      0        0 ib0
    192.168.8.0     0.0.0.0         255.255.252.0   U     0      0        0 ib1
    199.99.99.0     0.0.0.0         255.255.255.0   U     0      0        0 vmeth0
    [root@obfuscateddom0adm01 EXAVMIMAGES]#
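The add, ping and delete steps above have to be repeated on the second node, so they could be wrapped in a small function. This is a sketch only: `check_gateway` and the `RUN` variable are names I made up, it assumes the bridge is already up, and setting `RUN=echo` merely prints the commands instead of running them.

```shell
# Temporarily put an address on a bridge, ping the gateway through it, clean up.
# Usage: check_gateway vmbondeth0 10.123.123.68/24 10.123.123.1
# Set RUN=echo to print the commands instead of executing them.
check_gateway() {
    bridge=$1 cidr=$2 gateway=$3
    ${RUN:-} ip addr add "$cidr" dev "$bridge" || return 1
    ${RUN:-} ping -I "$bridge" -c 3 "$gateway"
    rc=$?
    ${RUN:-} ip addr del "$cidr" dev "$bridge"   # always remove the test address
    return $rc
}
```

The return code is that of the ping, so the test address is removed again even when the gateway does not answer.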
At this point, you would think all would succeed.
Unfortunately it still failed, with the very same error. So this couldn’t be a real error. It must be something else. When you read the domu_maker script, it tells you exactly why:
    elif (match_re "$int_domu" '^bondeth') && [ "$network_discovery__bondeth_mode" == 'lacp' ]; then
        # 26338063 - lacp bridge cannot be in discovery mode
        imageLogger_logMsg $IMLOG_DOMUER_HANDLE ${LINENO} DISPLAY_TO_STDOUT $imageLogger_LOG_ERROR 0 0 "Bridge must not have $VM_BRIDGE_DISCOVERY mode for LACP interface: Interface; file: $int_vlan_domu; $disc_conf"
        return 1
    elif [ "$vlan_domu" != '0' ]; then
        imageLogger_logMsg $IMLOG_DOMUER_HANDLE ${LINENO} DISPLAY_TO_STDOUT $imageLogger_LOG_ERROR 0 0 "Bridge must not have $VM_BRIDGE_DISCOVERY mode for tagged VLAN interface: Interface; file: $int_vlan_domu; $disc_conf"
        return 1
    fi
I don’t like messing around in my installation XMLs, nor in Oracle-provided scripts, but do you see the #26338063? It refers to a non-public bug: BUG 26338063 – DEPLOYING A VM WHERE THE BOND IS LACP FAILS.
When you search for more information on that bug, you find that it should have been fixed in the October 2017 bundle, but apparently it is still there.
The next step is a little nasty: on line 7049 of the domu_maker script I changed
    elif (match_re "$int_domu" '^bondeth') && [ "$network_discovery__bondeth_mode" == 'lacp' ]; then
to
    elif (match_re "$int_domu" '^bondeth') && [ "$network_discovery__bondeth_mode" != 'lacp' ]; then
This avoids the message. I KNOW it is safe (in this case) to continue; the script doesn’t have to do everything for me, so I can skip this check.
When install.sh was retried, it ran without any error. Great success!
But, and this is very important: before running the third step, change domu_maker back to its original state.
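To see why flipping the operator gets past the check, and why it should not stay flipped, here is a small stand-in for the two branches of the script shown earlier. It is only an illustration: `discovery_check` and its arguments are made up, `active-backup` as the value for a non-LACP bond is an assumption, and the real script has more branches in front of these two.

```shell
# Stand-in for the two branches of the domu_maker check (not the real script).
# Args: interface name, bond mode, comparison operator ("=" or "!="), vlan id.
discovery_check() {
    int_domu=$1 mode=$2 op=$3 vlan_domu=${4:-0}
    case $int_domu in
        bondeth*) is_bond=yes ;;
        *)        is_bond=no ;;
    esac
    if [ "$is_bond" = yes ] && [ "$mode" "$op" lacp ]; then
        echo "rejected: LACP branch"
        return 1
    elif [ "$vlan_domu" != 0 ]; then
        echo "rejected: tagged VLAN branch"
        return 1
    fi
    echo "accepted"
}
```

With the original operator, `discovery_check bondeth0 lacp '='` is rejected. With the flipped one, `discovery_check bondeth0 lacp '!='` is accepted, but `discovery_check bondeth0 active-backup '!='` is now rejected instead, so the flipped test only makes sense as a temporary measure for this one LACP deployment.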
    [root@obfuscateddom0adm01 linux-x64]# sed -n 7049p /opt/exadata_ovm/exadata.img.domu_maker
    elif (match_re "$int_domu" '^bondeth') && [ "$network_discovery__bondeth_mode" == 'lacp' ]; then
    [root@obfuscateddom0adm01 linux-x64]#
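If you would rather not hand-edit the script, the change and the rollback could be done with a backup copy and a guard on the line content. Again just a sketch: the function names and the backup path are my own, line 7049 is where the check sat in my version of the script and may well differ in yours, and it needs GNU sed for `-i`.

```shell
# Flip the comparison on one line of a script, but only if that line still
# contains the expected check; the untouched original is saved first.
# Usage: patch_lacp_check /opt/exadata_ovm/exadata.img.domu_maker 7049 /root/domu_maker.orig
patch_lacp_check() {
    script=$1 line=$2 backup=$3
    sed -n "${line}p" "$script" | grep -q "bondeth_mode\" == 'lacp'" || {
        echo "line $line does not contain the expected check; not patching" >&2
        return 1
    }
    cp -p "$script" "$backup" || return 1
    sed -i "${line}s/== 'lacp'/!= 'lacp'/" "$script"
}

# Put the saved original back before the next install step.
# Usage: restore_lacp_check /opt/exadata_ovm/exadata.img.domu_maker /root/domu_maker.orig
restore_lacp_check() {
    cp -p "$2" "$1"
}
```

The guard refuses to touch the file when the line number has shifted, which is the most likely way for an edit like this to go wrong after an image update.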
The rest of the install was just as default and straightforward as originally planned.
I would like to mention 2 more things.
- This is a hack. It is not a clean way of working; the SR is still open and Oracle has been informed about this. But I am still convinced that when something is allowed in OEDA, the install.sh script should be able to handle it without problems.
- Thank you Andy for the heads up and encouragement.
As always, questions or remarks? Find me on Twitter @vanpupi