Exasock Bonding support extensions
The exasock driver has been extended to support bonding, but only in active-backup mode. It is meant to manage only bonds that contain ExaNIC devices, so it will reject attempts to place an existing bonding interface that contains non-ExaNIC devices under its management, or to add a non-ExaNIC device to a bond that it currently manages.
All configuration of bonding interfaces, including standard bonding settings such as MII monitor intervals, ARP monitor intervals and primary slave selection, is done as normal through the /sys interface of the upstream Linux bonding driver.
In summary, the extension wraps itself around the Linux bonding driver and enforces the constraints it requires (ExaNIC-only device membership, and rejection of bond modes other than active-backup). It then exports information about each bonding interface it has wrapped through a per-bond /dev file. Userspace software can read the metadata in this /dev file to determine which link within the bonding interface is the active one.
Known pitfalls
- We strongly recommend that you set an MII or ARP monitoring interval on your bonding interfaces (/sys/class/net/<iface>/bonding/{miimon,arp_interval}) so that the Linux bonding driver's monitor actively polls the bond to detect when the active ExaNIC goes down. There are two reasons: the exanic driver doesn't currently support link down messages, so the Linux bonding driver can miss link down events on the active device; and Linux itself doesn't issue the event messages we would need to track link status faithfully inside the exasock-bonding extension without polling. Using an active monitor (whether MII or ARP) circumvents this issue.
- Exasock-bonding actively monitors the miimon and arp_interval properties of the interfaces under its management, and polls for link status updates at the lower of the two intervals (i.e. MIN(miimon, arp_interval)).
- However, Exasock-bonding doesn't query the drivers of the links in the bond directly; it asks the Linux bonding driver what it believes the current link status to be. So if you don't set an MII/ARP interval and the Linux bonding driver fails to detect a link status change, Exasock-bonding will miss it too.
- We only support active-backup mode. Attempting to write a value other than active-backup into /sys/class/net/<iface>/bonding/mode will result in an error if <iface> is a bonding interface under this extension's management.
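As a minimal sketch of the first recommendation above, the helper below writes an MII monitoring interval to a bond's miimon attribute. The function name set_miimon and the SYSFS_ROOT override are hypothetical and exist only for illustration (the override lets the function be tried against a scratch directory); the 100 ms value in the usage comment is a commonly used example, not a value recommended by this document.

```shell
#!/bin/sh
# SYSFS_ROOT is a hypothetical override for experimenting against a scratch
# directory; on a real system leave it unset so that /sys is used.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# set_miimon IFACE INTERVAL_MS
# Writes INTERVAL_MS to the miimon attribute of the bonding interface IFACE.
# Returns non-zero if the attribute does not exist (IFACE is not a bond).
set_miimon() {
    iface="$1"
    interval_ms="$2"
    attr="$SYSFS_ROOT/class/net/$iface/bonding/miimon"
    if [ ! -e "$attr" ]; then
        echo "no such bonding attribute: $attr" >&2
        return 1
    fi
    echo "$interval_ms" > "$attr"
}

# Example (as root, on an existing bond named mybond):
#   set_miimon mybond 100
```

The same pattern would apply to arp_interval, which additionally needs ARP targets configured in the Linux bonding driver before it has any effect.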
Known limitations:
- Exasock-bonding does not allow recursive bond membership. The extension will reject attempts to make it manage a bonding interface whose member devices are themselves bonding interfaces.
- Libexasock in userspace requires that a bond have at least one member device before an attempt is made to use that bond with libexasock.
- There doesn't need to be an active device in the bond, but there must be at least one member.
- Note: The Linux bonding driver will usurp the MAC address of one of the member devices. We recommend that you not remove that particular device from the bond. In practice, removing it works fine, but the Linux bonding driver currently retains the usurped MAC address even after releasing the device it took the address from, so you will end up with two devices that have the same MAC address.
- Exasock-bonding does not support VLAN pseudo-interfaces as slave devices - all slaves must be direct ExaNIC devices.
Compiling the extensions
Just run make as you usually would for the exasock kernel module.
Your kernel may not support the bonding extension: before Linux v3.19, the kernel didn't publicly export the headers that the extension needs in order to build. You'll see a warning message if this is the case.
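If you want to check the version threshold mentioned above before building, a small shell helper can compare a kernel release string against 3.19. This is only a sketch: the function name kernel_at_least_3_19 is hypothetical, it looks only at the major and minor version numbers, and distribution kernels with backported changes may not match what the upstream version number suggests. The warning printed by the build remains the authoritative signal.

```shell
#!/bin/sh
# kernel_at_least_3_19 RELEASE
# Returns 0 if RELEASE (e.g. "4.15.0-generic") is version 3.19 or later,
# judged only by its major and minor numbers.
kernel_at_least_3_19() {
    release="$1"
    major="${release%%.*}"        # text before the first dot
    rest="${release#*.}"          # text after the first dot
    minor="${rest%%[!0-9]*}"      # leading digits of the remainder
    [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "${minor:-0}" -ge 19 ]; }
}

if kernel_at_least_3_19 "$(uname -r)"; then
    echo "kernel version is 3.19 or later"
else
    echo "kernel version is older than 3.19"
fi
```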
Using the extensions
For a practical example, see the setupbonding shell script in exanic-software (examples/exasock/setupbonding).
All configuration, setup, etc is done using standard Linux utilities (ifconfig, ip, etc). All of the standard Linux bonding driver configuration interfaces work exactly the same way they usually do.
Creating a bonding master
This section is only in here for completeness and to make it clear that all configuration is exactly the same as the standard Linux bonding driver procedure.
Notice
Bonding can be set up using the sysfs interface or iproute2 commands; this guide uses the sysfs interface.
For example, to create a new bonding interface named mybond:
echo "+mybond" >/sys/class/net/bonding_masters
To add two slave devices, enp0s0 and enp0s0d1, to mybond:
echo "+enp0s0" >/sys/class/net/mybond/bonding/slaves
echo "+enp0s0d1" >/sys/class/net/mybond/bonding/slaves
To remove enp0s0d1 from mybond:
echo "-enp0s0d1" >/sys/class/net/mybond/bonding/slaves
All of this works exactly the same as the stock bonding configuration, because Exasock-bonding uses the Linux bonding module.
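Because only active-backup mode is supported, it can help to set the mode when the bond is created. The sketch below combines the steps above with a write to the bond's mode attribute. It assumes the stock bonding driver's usual restriction that the mode can only be changed while the bond is down and has no slaves, which is why the mode is written before any slave is added. The function name create_active_backup_bond and the SYSFS_ROOT override are hypothetical and used only for illustration.

```shell
#!/bin/sh
# SYSFS_ROOT is a hypothetical override for experimenting against a scratch
# directory; on a real system leave it unset so that /sys is used.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# create_active_backup_bond BOND SLAVE...
# Creates BOND, sets it to active-backup mode, then adds each SLAVE.
create_active_backup_bond() {
    bond="$1"
    shift
    net="$SYSFS_ROOT/class/net"

    # Create the bonding master.
    echo "+$bond" > "$net/bonding_masters" || return 1

    # Set the mode first, while the bond still has no slaves.
    echo "active-backup" > "$net/$bond/bonding/mode" || return 1

    # Add the member devices one at a time.
    for slave in "$@"; do
        echo "+$slave" > "$net/$bond/bonding/slaves" || return 1
    done
}

# Example (as root):
#   create_active_backup_bond mybond enp0s0 enp0s0d1
```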
Placing an already existing bonding interface under management:
Before attempting to place a bonding interface under the management of Exasock-bonding, you must first create it with the Linux Bonding driver -- follow the steps above or see the Linux bonding documentation, or go on the web and use whichever tutorial suits your preferences.
From there, you simply tell Exasock-bonding about the bonding interface which you have created and which you wish to place under its management.
To do this, you have to first load the Exasock kernel module:
modprobe exasock
When the Exasock driver has been successfully loaded, it will create a /sys file called /sys/class/net/exabond_masters. The name is meant to be similar to the /sys file created by Linux's bonding module (bonding_masters).
Let's assume you've created a bonding interface called mybond, as in the example above. To tell Exasock-bonding that you want it to manage mybond, execute this command:
echo "+mybond" >/sys/class/net/exabond_masters
When you are ready to have Exasock-bonding stop managing this bonding interface, execute this command:
echo "-mybond" >/sys/class/net/exabond_masters
Notice
You do not need to remove all existing NICs from the bond before you remove it from Exasock-bonding's management. You're free to place bonding interfaces under management and remove them again at any time. Just be sure to close all handles to the /dev/exabond-{iface-name} file first, because Linux won't allow Exasock-bonding to delete the /dev file until all handles are closed.
Notice
You do not need to add member ExaNICs to a bond before placing it under management -- you can place empty bonds under management.
Reading the link status of the bond from userspace:
The bonding extension to the exasock kernel module exports metadata about exasock bonding interfaces to userspace through device files in /dev.
Whenever you place a bonding interface under management, a new /dev node is created with a name of the form /dev/exabond-{BONDING_IFACE_NAME}.
For example, if you created a bond called mybond and then placed it under Exasock-bonding's management as shown above, a new /dev file called /dev/exabond-mybond will be created.
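The relationship between exabond_masters and the /dev node can be sketched as a small helper that places a bond under management and then checks that the expected node exists. The function name manage_bond and the SYSFS_ROOT and DEV_ROOT overrides are hypothetical and used only for illustration. The sketch also assumes the node is present by the time the write returns; if it is created asynchronously on your system, a short retry loop may be needed.

```shell
#!/bin/sh
# SYSFS_ROOT and DEV_ROOT are hypothetical overrides for experimenting against
# scratch directories; on a real system leave them unset.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"
DEV_ROOT="${DEV_ROOT:-/dev}"

# manage_bond BOND
# Places BOND under Exasock-bonding's management and prints the path of the
# device node that should now exist. Returns non-zero if the write fails or
# the node is missing.
manage_bond() {
    bond="$1"
    node="$DEV_ROOT/exabond-$bond"

    echo "+$bond" > "$SYSFS_ROOT/class/net/exabond_masters" || return 1

    if [ ! -e "$node" ]; then
        echo "expected $node to exist after placing $bond under management" >&2
        return 1
    fi
    echo "$node"
}

# Example (as root, after modprobe exasock):
#   manage_bond mybond
```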
To read the metadata in that file, simply mmap() it read-only; the extension will reject attempts to map it as writeable.
- For documentation of the data structure, see src/libs/exasock/kernel/exasock-bonding.h.
- For a convenient library that parses the data structure for you and handles the integrity protocol that guards against partial reads caused by races between the kernel and userspace, see src/libs/exasock/exasock-bonding-priv.h.