Multiple Network Attachments with bridge CNI
Introduction
Over the last few years, the KubeVirt project has greatly improved the configuration of secondary network interfaces. It is now possible to configure networking end to end, from the host to a VM, using only the Kubernetes API and a few Custom Resource Definitions. Deployment has also been simplified by the KubeVirt hyperconverged cluster operator (HCO) and the cluster network addons operator (CNAO), which install the networking components.
The following hierarchy shows which components HCO and CNAO deploy in this blog post:
- kubevirt-hyperconverged-cluster-operator (HCO)
  - cluster-network-addons-operator (CNAO)
    - multus
    - bridge-cni
    - kubemacpool
    - kubernetes-nmstate
  - KubeVirt
Introducing cluster-network-addons-operator
The cluster network addons operator manages the lifecycle (deploy/update/delete) of the Kubernetes network components needed to configure secondary interfaces, manage MAC addresses and define networking on hosts for pods and VMs.
A good thing about having an operator is that everything is done through the API: you don’t have to go over all the nodes to install these components yourself, and updates are handled smoothly.
In this blog post we are going to use the following components, explained in greater detail later on:
- multus: to start a secondary interface on containers in pods
- linux bridge CNI: to use bridge CNI and connect the secondary interfaces from pods to a linux bridge at nodes
- kubemacpool: to manage mac addresses
- kubernetes-nmstate: to configure the linux bridge on the nodes
The list of components we want CNAO to deploy is specified by the NetworkAddonsConfig Custom Resource (CR), and the progress of the installation appears in the CR status field, split per component. To inspect this progress we can query the CR status with the following command:
kubectl get NetworkAddonsConfig cluster -o yaml
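Instead of reading the status by hand, the same CR can be waited on. The following is a minimal sketch, assuming the NetworkAddonsConfig named cluster reports an Available condition once all components are deployed; the function name wait_for_network_addons and the default timeout are choices made for this example, not part of CNAO.

```shell
# Block until the NetworkAddonsConfig "cluster" reports condition Available.
# Usage: wait_for_network_addons [timeout]
# The 5m default timeout is an assumption made for this sketch.
wait_for_network_addons() {
    kubectl wait networkaddonsconfig cluster \
        --for condition=Available \
        --timeout="${1:-5m}"
}
```

Calling wait_for_network_addons 10m returns as soon as the condition is met, or fails when the timeout expires.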
To keep this blog post simple, we are going to use the NetworkAddonsConfig created by HCO, which installs all the network components by default. To illustrate CNAO configuration, though, the following NetworkAddonsConfig CR instructs CNAO to deploy the multus, linuxBridge, nmstate and kubeMacPool components:
apiVersion: networkaddonsoperator.network.kubevirt.io/v1
kind: NetworkAddonsConfig
metadata:
  name: cluster
spec:
  multus: {}
  linuxBridge: {}
  nmstate: {}
  kubeMacPool: {}
  imagePullPolicy: Always
Connecting Pods, VMs and Nodes over a single secondary network with bridge CNI
Although Kubernetes provides a default interface that gives connectivity to pods and VMs, it’s not easy to configure which NIC should be used for specific pods or VMs in a cluster whose nodes have multiple NICs. A typical use case is to isolate control and data traffic on different NICs.
With linux bridge CNI + multus it’s possible to create a secondary NIC in pod containers and attach it to an L2 linux bridge on the nodes. This connects the container to a specific NIC on the node, provided that NIC is a port of the L2 linux bridge.
To ensure that pods using this network are scheduled only on nodes that have the bridge, the k8s.v1.cni.cncf.io/resourceName annotation is added. It goes hand in hand with another component, bridge-marker, which inspects node networking and, when a new bridge appears, advertises it in the node status.
This is an example of the results from bridge-marker on nodes where bridge br0 is already configured:
---
status:
  allocatable:
    bridge.network.kubevirt.io/br0: 1k
  capacity:
    bridge.network.kubevirt.io/br0: 1k
This is an example of a NetworkAttachmentDefinition that exposes the bridge available on the host to users:
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: bridge-network
  annotations:
    k8s.v1.cni.cncf.io/resourceName: bridge.network.kubevirt.io/br0
spec:
  config: >
    {
        "cniVersion": "0.3.1",
        "name": "br0-l2",
        "plugins": [{
            "type": "bridge",
            "bridge": "br0",
            "ipam": {}
        }]
    }
Then adding the bridge secondary network to a pod is a matter of adding the following annotation to it:
annotations:
  k8s.v1.cni.cncf.io/networks: bridge-network
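To show where that annotation lives in a complete object, here is a minimal sketch of a Pod manifest that requests the bridge-network attachment. The function name render_bridge_pod, the default pod name, the container name and the busybox image are all placeholders invented for this example; only the annotation comes from the text above.

```shell
# Print a Pod manifest that requests the bridge-network attachment.
# Usage: render_bridge_pod [pod-name]
# Pod name, container name and image are placeholders for this sketch.
render_bridge_pod() (
    name="${1:-bridge-pod}"
    cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${name}
  annotations:
    k8s.v1.cni.cncf.io/networks: bridge-network
spec:
  containers:
  - name: main
    image: busybox
    command: ["sleep", "3600"]
EOF
)
```

The manifest can then be applied with render_bridge_pod my-pod | kubectl apply -f -.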
Setting up node networking with NodeNetworkConfigurationPolicy (aka nncp)
Changing the networking of Kubernetes cluster nodes can be done manually, by iterating over all the nodes and making changes, or with automation tools like Ansible. However, using just another Kubernetes resource is more convenient. For this purpose the kubernetes-nmstate project was born: a cluster-wide node network administrator based on Kubernetes CRs, built on top of nmstate.
It works as a Kubernetes DaemonSet running pods on all the cluster nodes and reconciling three different CRs:
- NodeNetworkConfigurationPolicy to specify cluster node network desired configuration
- NodeNetworkConfigurationEnactment (nnce) to troubleshoot issues with nncp
- NodeNetworkState (nns) to view the node’s networking configuration
Note
The kubernetes-nmstate project has a distributed architecture to reduce its dependency on kube-apiserver connectivity. This means that every pod configures the networking on the node where it is running without much interaction with kube-apiserver.
If something goes wrong and the pod changing the node network cannot ping the default gateway, cannot resolve the DNS root servers or has lost connectivity to kube-apiserver, it rolls back to the previous configuration to return to a working state. Those errors can be checked by running kubectl get nnce. The command displays potential issues per node and nncp.
The desired state fields follow the nmstate API described in the nmstate documentation.
For more details on kubernetes-nmstate there are guides covering reporting, configuration and troubleshooting. There are also nncp examples.
Demo: mixing it all together, VM to VM communication between nodes
With the following recipe we will end up with a pair of virtual machines on two different nodes, each with one secondary NIC, eth1, on vlan 100. They will be connected to each other through the same bridge on the nodes, which also has the external secondary NIC eth1 attached.
Demo environment setup
We are going to use kubevirtci as an ephemeral Kubernetes cluster provider.
To start it up with two nodes and one secondary NIC, and to install NetworkManager >= 1.22 (needed for kubernetes-nmstate) and dnsmasq, follow these steps:
git clone https://github.com/kubevirt/kubevirtci
cd kubevirtci
# Pin to a version that works with the steps in this blog post, in case
# the k8s-1.19 provider disappears in the future
git reset d5d8e3e376b4c3b45824fbfe320b4c5175b37171 --hard
export KUBEVIRT_PROVIDER=k8s-1.19
export KUBEVIRT_NUM_NODES=2
export KUBEVIRT_NUM_SECONDARY_NICS=1
make cluster-up
export KUBECONFIG=$(./cluster-up/kubeconfig.sh)
Installing components
To install KubeVirt we are going to use the kubevirt-hyperconverged-cluster-operator (HCO). It installs all the components needed for a functional KubeVirt with all its features, including the ones we are going to use: multus, linux-bridge, kubemacpool and kubernetes-nmstate.
curl https://raw.githubusercontent.com/kubevirt/hyperconverged-cluster-operator/master/deploy/deploy.sh | bash
kubectl wait hco -n kubevirt-hyperconverged kubevirt-hyperconverged --for condition=Available --timeout=500s
Now we have a Kubernetes cluster with all the pieces needed to start up a VM attached to a bridge that is connected to a secondary NIC.
Creating the br0 on nodes with a port attached to secondary NIC eth1
The first step is to create an L2 linux bridge on the nodes with one port on the secondary NIC eth1. This bridge will be used later on by the bridge CNI.
cat <<EOF | kubectl apply -f -
apiVersion: nmstate.io/v1alpha1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: br0-eth1
spec:
  desiredState:
    interfaces:
    - name: br0
      description: Linux bridge with eth1 as a port
      type: linux-bridge
      state: up
      bridge:
        options:
          stp:
            enabled: false
        port:
        - name: eth1
EOF
Now we wait for the bridge to be created checking nncp conditions:
kubectl wait nncp br0-eth1 --for condition=Available --timeout 2m
After the nncp becomes available, we can query the nncp resources in the cluster and see it listed with successful status.
kubectl get nncp
NAME       STATUS
br0-eth1   SuccessfullyConfigured
We can inspect the status of applying the policy to each node. For that there is the NodeNetworkConfigurationEnactment CR (nnce):
kubectl get nnce
NAME              STATUS
node01.br0-eth1   SuccessfullyConfigured
node02.br0-eth1   SuccessfullyConfigured
Note
In case of errors, the error reported by nmstate can be retrieved by running kubectl get nnce -o yaml; the status will contain the error.
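On a cluster with many nodes it helps to narrow the list down to the enactments that did not succeed before dumping their YAML. The following is a minimal sketch that filters the two-column NAME/STATUS table printed by kubectl get nnce; the function name failed_enactments is made up for this example, and it assumes the column layout shown in this post.

```shell
# Print the names of enactments whose STATUS is not SuccessfullyConfigured.
# Assumes the NAME/STATUS table layout shown in this post.
failed_enactments() {
    kubectl get nnce --no-headers |
        awk '$2 != "SuccessfullyConfigured" { print $1 }'
}
```

Each name it prints can then be inspected with kubectl get nnce followed by the name and -o yaml.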
We can also inspect the network state on the nodes by retrieving the NodeNetworkState and checking whether the bridge br0 is up using jsonpath:
kubectl get nns node01 -o=jsonpath='{.status.currentState.interfaces[?(@.name=="br0")].state}'
kubectl get nns node02 -o=jsonpath='{.status.currentState.interfaces[?(@.name=="br0")].state}'
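The two commands above differ only in the node name, so they can be folded into a loop. This is a minimal sketch using the same jsonpath expression; the function name bridge_state_on_nodes and the "missing" fallback text are choices made for this example.

```shell
# Print "<node>: <state>" for bridge br0 on every node passed as an argument.
# Prints "missing" when the NodeNetworkState reports no br0 interface.
bridge_state_on_nodes() (
    for node in "$@"; do
        state=$(kubectl get nns "$node" \
            -o=jsonpath='{.status.currentState.interfaces[?(@.name=="br0")].state}')
        echo "${node}: ${state:-missing}"
    done
)
```

For the demo cluster it would be called as bridge_state_on_nodes node01 node02.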
When inspecting the full currentState yaml we get the following interface configuration:
kubectl get nns node01 -o yaml
status:
  currentState:
    interfaces:
    - bridge:
        options:
          group-forward-mask: 0
          mac-ageing-time: 300
          multicast-snooping: true
          stp:
            enabled: false
            forward-delay: 15
            hello-time: 2
            max-age: 20
            priority: 32768
        port:
        - name: eth1
          stp-hairpin-mode: false
          stp-path-cost: 100
          stp-priority: 32
      description: Linux bridge with eth1 as a port
      ipv4:
        dhcp: false
        enabled: false
      ipv6:
        autoconf: false
        dhcp: false
        enabled: false
      mac-address: 52:55:00:D1:56:00
      mtu: 1500
      name: br0
      state: up
      type: linux-bridge
We can also verify that bridge-marker is working by checking the node status:
kubectl get node node01 -o yaml
The following should appear stating that br0 can be consumed on the node:
status:
  allocatable:
    bridge.network.kubevirt.io/br0: 1k
  capacity:
    bridge.network.kubevirt.io/br0: 1k
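Rather than scanning the full node YAML, the single value can be extracted with jsonpath. This is a minimal sketch, assuming kubectl's jsonpath syntax in which dots inside a key are escaped with a backslash; the function name bridge_resource_on_node is made up for this example.

```shell
# Print the allocatable amount of the br0 bridge resource on one node.
# Dots inside the resource name are escaped for kubectl's jsonpath parser.
bridge_resource_on_node() {
    kubectl get node "$1" \
        -o jsonpath='{.status.allocatable.bridge\.network\.kubevirt\.io/br0}'
}
```

An empty result from bridge_resource_on_node node01 would suggest that bridge-marker has not advertised br0 on that node yet.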
At this point we have an L2 linux bridge ready and connected to NIC eth1.
Configure network attachment with a L2 bridge and a vlan
To keep the bridge a pure L2 bridge, we specify no IPAM (IP Address Management), since we are not going to configure any IP address for the bridge. To configure bridge vlan-filtering, we add the vlan we want to use to isolate our VMs:
cat <<EOF | kubectl apply -f -
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: br0-100-l2
  annotations:
    k8s.v1.cni.cncf.io/resourceName: bridge.network.kubevirt.io/br0
spec:
  config: >
    {
        "cniVersion": "0.3.1",
        "name": "br0-100-l2-config",
        "plugins": [
            {
                "type": "bridge",
                "bridge": "br0",
                "vlan": 100,
                "ipam": {}
            },
            {
                "type": "tuning"
            }
        ]
    }
EOF
Start a pair of VMs on different nodes using the multus configuration to connect a secondary interface to br0
Now it’s time to start up the VMs on different nodes so we can check the external connectivity of br0. Each VM will have a secondary NIC eth1 to reach the other VM running on a different node, so the traffic goes over br0 on the nodes.
The following picture illustrates the cluster:
The first step is to install the virtctl command line tool to work with virtual machines:
curl -L -o virtctl https://github.com/kubevirt/kubevirt/releases/download/v0.33.0/virtctl-v0.33.0-linux-amd64
chmod +x virtctl
sudo install virtctl /usr/local/bin
Now let’s create two VirtualMachines, one on each node. Each will have one secondary NIC connected to br0 using the multus configuration for vlan 100. We will also activate kubemacpool to be sure that MAC addresses are unique in the cluster, and install the qemu-guest-agent so that the IP addresses of the secondary NICs are reported in the VMI status and we can inspect them later on.
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Namespace
metadata:
  name: default
  labels:
    mutatevirtualmachines.kubemacpool.io: allocate
---
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
  name: vma
spec:
  running: true
  template:
    spec:
      nodeSelector:
        kubernetes.io/hostname: node01
      domain:
        devices:
          disks:
          - name: containerdisk
            disk:
              bus: virtio
          - name: cloudinitdisk
            disk:
              bus: virtio
          interfaces:
          - name: default
            masquerade: {}
          - name: br0-100
            bridge: {}
        machine:
          type: ""
        resources:
          requests:
            memory: 1024M
      networks:
      - name: default
        pod: {}
      - name: br0-100
        multus:
          networkName: br0-100-l2
      terminationGracePeriodSeconds: 0
      volumes:
      - name: containerdisk
        containerDisk:
          image: kubevirt/fedora-cloud-container-disk-demo
      - name: cloudinitdisk
        cloudInitNoCloud:
          networkData: |
            version: 2
            ethernets:
              eth1:
                addresses: [ 10.200.0.1/24 ]
          userData: |-
            #!/bin/bash
            echo "fedora" |passwd fedora --stdin
            dnf -y install qemu-guest-agent
            sudo systemctl enable qemu-guest-agent
            sudo systemctl start qemu-guest-agent
---
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
  name: vmb
spec:
  running: true
  template:
    spec:
      nodeSelector:
        kubernetes.io/hostname: node02
      domain:
        devices:
          disks:
          - name: containerdisk
            disk:
              bus: virtio
          - name: cloudinitdisk
            disk:
              bus: virtio
          interfaces:
          - name: default
            masquerade: {}
          - name: br0-100
            bridge: {}
        machine:
          type: ""
        resources:
          requests:
            memory: 1024M
      networks:
      - name: default
        pod: {}
      - name: br0-100
        multus:
          networkName: br0-100-l2
      terminationGracePeriodSeconds: 0
      volumes:
      - name: containerdisk
        containerDisk:
          image: kubevirt/fedora-cloud-container-disk-demo
      - name: cloudinitdisk
        cloudInitNoCloud:
          networkData: |
            version: 2
            ethernets:
              eth1:
                addresses: [ 10.200.0.2/24 ]
          userData: |-
            #!/bin/bash
            echo "fedora" |passwd fedora --stdin
            dnf -y install qemu-guest-agent
            sudo systemctl enable qemu-guest-agent
            sudo systemctl start qemu-guest-agent
EOF
Wait for the two VMs to be ready. Eventually you will see something like this:
kubectl get vmi
NAME   AGE    PHASE     IP               NODENAME
vma    2m4s   Running   10.244.196.142   node01
vmb    2m4s   Running   10.244.140.86    node02
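Polling kubectl get vmi by hand can be replaced with a wait. The following is a minimal sketch, assuming the VirtualMachineInstance objects expose a Ready condition; the function name wait_for_vmis, the VMI_TIMEOUT variable and its 5m default are inventions of this example.

```shell
# Block until every VMI passed as an argument reports condition Ready.
# VMI_TIMEOUT is a hypothetical override for this sketch; it defaults to 5m.
wait_for_vmis() {
    kubectl wait vmi "$@" \
        --for condition=Ready \
        --timeout="${VMI_TIMEOUT:-5m}"
}
```

For this demo it would be called as wait_for_vmis vma vmb. Note that Ready does not guarantee the guest agent has already reported the secondary NIC addresses, so those may show up a little later.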
We can check that each VM has one secondary NIC, with the address we configured through cloud-init:
kubectl get vmi -o yaml
## vma
interfaces:
- interfaceName: eth0
  ipAddress: 10.244.196.144
  ipAddresses:
  - 10.244.196.144
  - fd10:244::c48f
  mac: 02:4a:be:00:00:0a
  name: default
- interfaceName: eth1
  ipAddress: 10.200.0.1/24
  ipAddresses:
  - 10.200.0.1/24
  - fe80::4a:beff:fe00:b/64
  mac: 02:4a:be:00:00:0b
  name: br0-100
## vmb
interfaces:
- interfaceName: eth0
  ipAddress: 10.244.140.84
  ipAddresses:
  - 10.244.140.84
  - fd10:244::8c53
  mac: 02:4a:be:00:00:0e
  name: default
- interfaceName: eth1
  ipAddress: 10.200.0.2/24
  ipAddresses:
  - 10.200.0.2/24
  - fe80::4a:beff:fe00:f/64
  mac: 02:4a:be:00:00:0f
  name: br0-100
Let’s finish this section by verifying connectivity between vma and vmb using ping. Open the console of the vma virtual machine and use the ping command with destination IP address 10.200.0.2, which is the address assigned to the secondary interface of vmb:
Note
The user and password for these VMs are both fedora, as configured in the cloud-init userData.
virtctl console vma
ping 10.200.0.2 -c 3
PING 10.200.0.2 (10.200.0.2): 56 data bytes
64 bytes from 10.200.0.2: seq=0 ttl=50 time=357.040 ms
64 bytes from 10.200.0.2: seq=1 ttl=50 time=379.742 ms
64 bytes from 10.200.0.2: seq=2 ttl=50 time=404.066 ms
--- 10.200.0.2 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 357.040/380.282/404.066 ms
Conclusion
In this blog post we used network components from the KubeVirt project to connect two VMs on different nodes through a linux bridge connected to a secondary NIC. This illustrates how VM traffic can be directed to a specific NIC on a node by using a secondary NIC on the VM.