Thursday, November 28, 2013

What commands are called during the startup of the neutron-plugin-openvswitch-agent?

I spent a few days last week troubleshooting a networking issue at a client that required me to step through the initialization of the neutron-plugin-openvswitch-agent under Havana. My client is using the Neutron ML2 plugin with the Open vSwitch (commonly referred to as “OVS”) agent configured to use GRE tunneling. The final resolution of the issue was simple: the correct OVS module had not been installed. But since I generated a large amount of notes along the way, I thought I would post them in the hope that they will be beneficial to someone else.

If you haven’t dealt with OpenStack in the past the neutron-plugin-openvswitch-agent relies on several underlying services and utilities: the Open vSwitch kernel module, Open vSwitch management utilities (ovs-vsctl and ovs-ofctl), iptables, and ip. The OpenFlow management API is also used to build and manage the OVS flow tables that OVS leverages to manipulate and direct network traffic.

If you are interested you can view these entries in the /var/log/neutron/openvswitch-agent.log file when the neutron-plugin-openvswitch-agent is configured for debug logging. In some of the steps below I will also walk through the underlying code, as those commands were the most interesting to me at the time.
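
If debug logging is not already enabled, it is typically turned on in the agent's configuration (the exact file depends on your packaging, for example /etc/neutron/neutron.conf or the OVS agent's own ini file):

[DEFAULT]
debug = True
verbose = True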

Before I continue I want to thank Kyle Mestery from Cisco (IRC nick: mestery) who helped point me in the right direction on the more intense code blocks.

The primary references I used to decipher/interpret what is going on were the agent code files themselves, chiefly ovs_neutron_plugin.py, ovs_lib.py, and constants.py. There are also some additional references scattered through the post.

All of the OVS commands are called via neutron-rootwrap or, if configured, just sudo. I also removed all of the single quotes and some of the commas for readability's sake.

sudo /usr/bin/neutron-rootwrap /etc/neutron/rootwrap.conf
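
As a side note, rootwrap only executes commands that match its filter definitions. The OVS-related entries shipped with the packages look roughly like this (paths and exact contents vary by distribution):

[Filters]
ovs-vsctl: CommandFilter, ovs-vsctl, root
ovs-ofctl: CommandFilter, ovs-ofctl, root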

The agent starts out by retrieving the IP information of the OVS integration bridge (br-int) local port. The MAC address returned is reformatted for use as the suffix of the Neutron OVS agent’s ID.

ip -o link show br-int
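
To give a rough idea of what that reformatting looks like, here is a minimal Python sketch (not the verbatim agent code) using an example MAC:

# Minimal sketch: strip the colons from the br-int MAC and append the result
# to an "ovs" prefix to form the agent ID. The MAC below is just an example.
mac = "fa:16:3e:b1:07:8f"
agent_id = "ovs" + mac.replace(":", "")
print(agent_id)   # ovsfa163eb1078f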

If the br-tun patch port exists on the br-int bridge it is deleted.

ovs-vsctl --timeout=2 -- --if-exists del-port br-int patch-tun

Next, any existing entries in the integration bridge flow table are deleted to ensure a clean environment, and the first OpenFlow flow entry is added. The hard_timeout and idle_timeout arguments are set to 0 so that the flow never expires. The priority argument is set to 1 so that this flow takes precedence over any flows with a priority lower than 1. The actions=normal key-value pair indicates that the default L2 processing performed by Open vSwitch will occur.

ovs-ofctl del-flows br-int
ovs-ofctl add-flow br-int hard_timeout=0,idle_timeout=0,priority=1,actions=normal

If tunneling is enabled (as it is in this case) the setup_tunnel_br function is called to configure the tunnel bridge (br-tun). The neutron-plugin-openvswitch-agent requires a specific OVS switch and OpenFlow flow configuration to operate; to achieve a clean slate the existing br-tun switch is destroyed (if it existed) and then recreated.

ovs-vsctl --timeout=2 -- --if-exists del-br br-tun
ovs-vsctl --timeout=2 add-br br-tun

Next the agent will add a new port to the br-int bridge…

ovs-vsctl --timeout=2 add-port br-int patch-tun

…configure the new port as a patch…

ovs-vsctl --timeout=2 set Interface patch-tun type=patch

…and assign the patch port to act as a peer to the patch-int port.

ovs-vsctl --timeout=2 set Interface patch-tun options:peer=patch-int

If ovs-vsctl show is executed now the OVS bridges will look like this. This is the base configuration for a Neutron node that is not running the router agent (L3 agent).

12cb27cf-6188-45c0-9421-b11ef1b865c8
    Bridge br-tun
        Port br-tun
            Interface br-tun
                type: internal
    Bridge br-int
        Port br-int
            Interface br-int
                type: internal
        Port patch-tun
            Interface patch-tun
                type: patch
                options: {peer=patch-int}
    ovs_version: "1.10.2"

Prior to the next few steps the neutron-plugin-openvswitch-agent verifies that the patch port has been successfully created.

ovs-vsctl --timeout=2 get Interface patch-tun ofport

The agent then adds and configures the corresponding patch port on the br-tun bridge and verifies its creation using the same steps.

ovs-vsctl --timeout=2 add-port br-tun patch-int
ovs-vsctl --timeout=2 set Interface patch-int type=patch
ovs-vsctl --timeout=2 set Interface patch-int options:peer=patch-tun
ovs-vsctl --timeout=2 get Interface patch-int ofport

Even though the tunnel bridge was just created the Neutron OVS agent deletes any existing flows for the tunnel bridge. If you are like me you are probably thinking “Why? I just created it.”. By default a single entry is created in the OpenFlow flow table and we need to remove it so as to not conflict with the flows that will be added.

ovs-ofctl del-flows br-tun

This next set of steps uses the output from the previous ovs-vsctl command to fill the in_port argument value. In this case the in_port value is 1, so remember that in_port=1 is really in_port=patch-int on the br-tun bridge.

You will also see that multiple OpenFlow flow tables are used and, in hopes of reducing your (and most definitely my) future confusion, here’s the breakdown of which table is which from the constants.py file. I recommend reading an excellent blog post by Assaf Muller from Red Hat Israel describing what each flow table does.

PATCH_LV_TO_TUN = 1
GRE_TUN_TO_LV = 2
VXLAN_TUN_TO_LV = 3
LEARN_FROM_TUN = 10
UCAST_TO_TUN = 20
FLOOD_TO_TUN = 21

The flow tables are effectively chained together by the flow entries themselves, and a flow’s priority establishes the packet matching order relative to the other flow entries in the same flow table, with 0 being the lowest priority and 65535 the highest. There are also special entries in the flow table used to ensure that any OpenFlow management traffic always takes precedence when the controller is in-band, but those flows aren’t visible using the standard “dump-flows” arguments.
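
If you want to peek at those hidden entries, ovs-appctl can dump every flow the bridge knows about, hidden ones included:

ovs-appctl bridge/dump-flows br-tun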

Now onto the rest of the configuration…

A flow is inserted into the default OpenFlow flow table that uses the resubmit action to ensure that all packets entering the patch-int port are initially sorted to flow table 1 (PATCH_LV_TO_TUN).

ovs-ofctl add-flow br-tun "hard_timeout=0,idle_timeout=0,priority=1,in_port=1,actions=resubmit(,1)"

Another flow is then inserted into the default OpenFlow flow table with the lowest priority (0) and will drop any other packets not sorted by the first flow.

ovs-ofctl add-flow br-tun hard_timeout=0,idle_timeout=0,priority=0,actions=drop

Unicast packets (represented by the value of the dl_dst argument) are forwarded to OpenFlow flow table 20 (referred to as the UCAST_TO_TUN flow table). It has the lowest priority (priority=0) and no packet timeout (hard_timeout=0,idle_timeout=0).

ovs-ofctl add-flow br-tun "hard_timeout=0,idle_timeout=0,priority=0,table=1,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00,actions=resubmit(,20)"

Multicast and broadcast packets (again identified by the dl_dst match, which tests the multicast bit of the destination MAC) are forwarded to OpenFlow flow table 21 (FLOOD_TO_TUN). Like the unicast flow it has the lowest priority (priority=0) and no packet timeout (hard_timeout=0,idle_timeout=0); because the two dl_dst matches are mutually exclusive, priority does not determine which one wins.

ovs-ofctl add-flow br-tun "hard_timeout=0,idle_timeout=0,priority=0,table=1,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00,actions=resubmit(,21)"

Next OpenFlow table 2 (GRE_TUN_TO_LV) and OpenFlow table 3 (VXLAN_TUN_TO_LV) are populated initially with entries that drop all traffic. These two tables are also used to set the local VLAN ID (or Lvid) used internally in the Open vSwitch switch itself based on the tunnel ID.

ovs-ofctl add-flow br-tun hard_timeout=0,idle_timeout=0,priority=0,table=2,actions=drop
ovs-ofctl add-flow br-tun hard_timeout=0,idle_timeout=0,priority=0,table=3,actions=drop

The following flow is more complex than the others; it is used to dynamically learn the MAC addresses traversing the switch dataplane. First the flow is inserted into OpenFlow table 10 (LEARN_FROM_TUN) with no idle or hard timeout for packets matched to the flow itself, with a higher priority of 1, and it outputs to the patch-int port.

The learn argument is used to modify an existing flow table, in this case flow table 20 (UCAST_TO_TUN). The priority is set to 1 to ensure that all of the packets are sorted via this flow first and a timeout is set to ensure that the new MAC address will eventually be removed if not seen again within the timeout period.

The remaining arguments are prefixed by the letters NXM, the abbreviation for ”Nicira Extended Match”. NXM is an Open vSwitch extension written by Nicira that provides a matching facility for network packets.

NXM_OF_VLAN_TCI[0..11] refers to the VLAN ID portion of the 802.1q tag control information (TCI) header, i.e. the internal-to-Open vSwitch VLAN (referred to as the Lvid). The VLAN ID field is 12 bits long, hence the [0..11] range is required to retrieve the entire field, and its value is carried over into the learned flow as a match…

NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[] makes the learned flow match packets whose destination MAC equals the source MAC of the packet being learned from…

load:0->NXM_OF_VLAN_TCI[] clears the internal-to-Open vSwitch-switch VLAN TCI …

load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[] writes the GRE tunnel ID to the tun_id register…

output:NXM_OF_IN_PORT[] sets the egress OVS port of the learned flow to the port the original packet arrived on…

…and finally output: 1 forwards the packet out via the original patch-int port.

ovs-ofctl add-flow br-tun "hard_timeout=0,idle_timeout=0,priority=1,table=10,actions=learn(table=20,priority=1,hard_timeout=300,NXM_OF_VLAN_TCI[0..11],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:0->NXM_OF_VLAN_TCI[],load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],output:NXM_OF_IN_PORT[]), output:1"

Here’s a quick example of what two flows dynamically created by this learn argument look like. In this case the dl_dst value in each flow points to the specific network interface of VMs running on nova-compute nodes.

cookie=0x0, duration=339549.273s, table=20, n_packets=62605, n_bytes=16031792, hard_timeout=300, idle_age=3, hard_age=3, priority=1,vlan_tci=0x0003/0x0fff,dl_dst=fa:16:3e:b1:07:8f actions=load:0->NXM_OF_VLAN_TCI[],load:0x2->NXM_NX_TUN_ID[],output:2
cookie=0x0, duration=97.68s, table=20, n_packets=0, n_bytes=0, hard_timeout=300, idle_age=97, priority=1,vlan_tci=0x0001/0x0fff,dl_dst=fa:16:3e:6c:a6:5a actions=load:0->NXM_OF_VLAN_TCI[],load:0x1->NXM_NX_TUN_ID[],output:3

The next flow is used to capture any remaining packets in flow table 20 (unicast destinations that have not been learned yet) and forward them to flow table 21 (FLOOD_TO_TUN); unknown unicast traffic is handled the same way as multicast/broadcast traffic, which at this point in the startup means it will be dropped.

ovs-ofctl add-flow br-tun "hard_timeout=0,idle_timeout=0,priority=0,table=20,actions=resubmit(,21)"

Finally, drop any of the packets that get to flow table 21 (FLOOD_TO_TUN).

ovs-ofctl add-flow br-tun hard_timeout=0,idle_timeout=0,priority=0,table=21,actions=drop

Next the Neutron OVS agent retrieves the list of existing bridges, filters out the integration and tunnel bridges, and then searches for any remaining bridges to determine whether any are externally linked and should be managed.

ovs-vsctl --timeout=2 list-br

Up to now the integration bridge (br-int) and tunnel bridge (br-tun) have been configured with their default specifications. The GRE tunnel ports have not been created or configured so the agent does that now.

The remote GRE endpoints are retrieved from the topology data maintained by the neutron-server daemon. Notice that the GRE ports below are named using an IP prefixed by ‘gre’; this is one thing that has changed (IMHO for the better) in the latest iteration of the Neutron OVS agent. The IP address in the name is the remote host’s IP; in this case 172.31.254.29 is a remote nova-compute node’s IP address.

For brevity’s sake I’m only including the configuration steps for one remote GRE endpoint; this environment has many, but the configuration is the same for each.

First a port is created on the tunnel bridge with the name gre-172.31.254.29…

ovs-vsctl --timeout=2 -- --may-exist add-port br-tun gre-172.31.254.29

…the port is configured as type gre…

ovs-vsctl --timeout=2 set Interface gre-172.31.254.29 type=gre

…a remote GRE endpoint is added…

ovs-vsctl --timeout=2 set Interface gre-172.31.254.29 options:remote_ip=172.31.254.29

…the local GRE endpoint is added…

ovs-vsctl --timeout=2 set Interface gre-172.31.254.29 options:local_ip=172.31.254.65

…the in_key and out_key tunnel options are set to flow so the tunnel ID will be controlled by OpenFlow flows…

ovs-vsctl --timeout=2 set Interface gre-172.31.254.29 options:in_key=flow
ovs-vsctl --timeout=2 set Interface gre-172.31.254.29 options:out_key=flow

…and then a check is done to determine whether the port was created and configured successfully. In the background a comparison is done between the returned ofport value and -1. As long as the ofport value is greater than -1 the port was created/configured successfully. If -1 is returned the port configuration failed but the agent’s initialization sequence doesn’t stop.

ovs-vsctl --timeout=2 get Interface gre-172.31.254.29 ofport
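
In Python terms the check amounts to something like the following simplified sketch (not the actual agent code; the port name is just the example from this post):

import subprocess

# Ask OVS for the OpenFlow port number of the new tunnel port; -1 means the
# port exists in the database but was never successfully added to the bridge.
out = subprocess.check_output(
    ["ovs-vsctl", "--timeout=2", "get", "Interface", "gre-172.31.254.29", "ofport"])
ofport = int(out.decode().strip())
if ofport == -1:
    print("tunnel port setup failed; the agent logs this and keeps going")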

The integer returned from the last command is also used to populate the in_port argument value in the next OpenFlow configuration command, which directs traffic arriving on the new port to the GRE_TUN_TO_LV flow table. Any network packets that ingress this new GRE endpoint port will be manipulated by the OpenFlow flow entries in the GRE_TUN_TO_LV table created earlier.

ovs-ofctl add-flow br-tun "hard_timeout=0,idle_timeout=0,priority=1,in_port=4,actions=resubmit(,2)"

The next two commands tally the existing, added, and/or removed ports by comparing them against the ports that existed during the last poll of the Neutron OVS agent. Both of these commands are initiated by an if statement found on lines 1081 and 1082 in the ovs_neutron_plugin.py file:

if polling_manager.is_polling_required:
    port_info = self.update_ports(ports)

First, the update_ports function defined in the same file is called via self.update_ports(ports). update_ports in turn calls get_vif_port_set in the ovs_lib.py file, which retrieves the list of existing integration bridge ports and stores the result in a variable called port_names.

ovs-vsctl --timeout=2 list-ports br-int

Second, the list Interface action retrieves a list of the ports with their names and external IDs (the attached MAC address, interface ID, interface status, and VM ID) and assigns the results to a variable.

ovs-vsctl --timeout=2 --format=json -- --columns=name,external_ids list Interface

The two variables are compared; if a difference is found the divergence is calculated as one of two states, added or removed, and the network ports are created or deleted (in this case there was no divergence found so no logs…I’ll try to dig some up if I can).
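
Conceptually the comparison boils down to two set differences; here is a minimal sketch of the idea (the port names are made-up examples, and this is not the actual update_ports code):

# Ports recorded during the previous polling loop versus ports present now.
registered_ports = {"tap1a2b3c4d-5e", "tapdeadbeef-00"}
current_ports = {"tap1a2b3c4d-5e", "tapcafef00d-11"}

added = current_ports - registered_ports    # new ports to wire up and report
removed = registered_ports - current_ports  # vanished ports to clean up
print(added, removed)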

Lastly the firewall needs to be refreshed for the new ports, both IPv4 and IPv6.

iptables-save -c
iptables-restore -c
ip6tables-save -c
ip6tables-restore -c

At this point the underlying structure has been built for the plugin to work and the next steps update the OVS bridges and OpenFlow flow tables to support the existing ports. Once those (if any) ports have been created and configured the neutron-plugin-openvswitch-agent will poll periodically.