A Practical Guide to Container Networking
Published 11/03/2022
Originally published by Tigera.
Written by Reza Ramezanpour, Tigera.
Containers are an essential part of any Kubernetes cluster. They are the workloads that your business relies on, what your customers engage with, and what shapes your networking infrastructure. Long story short, containers are arguably the soul of any containerized environment.
Kubernetes, one of the most popular open-source container orchestration systems, has a modular architecture. On its own, Kubernetes is a sophisticated orchestrator that helps you deliver highly available, scalable, and automated deployment solutions. But to do so, it relies on a suite of underlying container technologies.
This blog post focuses on containers and container networking. Throughout this post, you will find information on what a container is, how you can create one, what a namespace means, and what the mechanisms are that allow Kubernetes to limit resources for a container.
Containers
A container is an isolated environment used to run an application. By using the Linux kernel's cgroup, namespace, and filesystem features, a container can be given a limited amount of resources and its own filesystem inside an isolated environment.
Note: Some container implementations use other technologies. In this post, I will focus on these three Linux components, since they are the main contributors when running a container inside Kubernetes.
Namespace
In Linux, processes can be isolated using namespaces. A process inside a namespace is unaware of applications that are running in other namespaces or in the host environment.
Let’s open up a Linux terminal and create a namespace.
Note: These examples are executed by a root account inside the terminal. If you see a permission denied error message, try the command with sudo or a root account.
We can list the current network namespaces by issuing the following command.
ip netns list
Next, let’s create a new network namespace.
ip netns add my_ns
Now that we have a network namespace, we can assign processes to run in that isolated environment.
ip netns exec my_ns bash
If you need to verify which namespace you are currently in, use the following command.
ip netns identify
If you would like to take this further, I highly recommend that you open a new terminal and issue a command such as ip link in both windows to get a better understanding of the host and namespace environments. (Note that ip netns exec only switches the network namespace, so a process listing such as ps will look the same in both.)
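For example, the comparison should look roughly like this (the comments describe what to expect, not literal output):
# In the shell running inside the namespace:
ip link   # only the loopback interface is listed
# In a separate host terminal:
ip link   # the host's full interface list appears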
Note: A container's life cycle is tied to the process it is associated with.
Use the exit command to get out of the namespace.
Control groups (cgroups)
Control groups, or cgroups, allow Linux to allocate a limited amount of resources, such as memory, network bandwidth, and CPU time, to groups of processes.
Inside the /sys/fs/ directory (on most Linux distros) there is a folder called cgroup that exposes the available resource controllers and their limits. You can list them by using the following command:
ls /sys/fs/cgroup
Back on our host machine, let's create a cgroup directory for our namespace.
mkdir -p /sys/fs/cgroup/pids/my_ns_cgroup
After creating a cgroup folder, Linux automatically populates the necessary files by following the cgroup-v1 standard. We can examine these files by using the following command.
ls /sys/fs/cgroup/pids/my_ns_cgroup
Let's add a limit to the number of processes that can run in our cgroup.
echo 2 | sudo tee /sys/fs/cgroup/pids/my_ns_cgroup/pids.max
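To see the limit in action, you can attach a process to the cgroup and then try to exceed it. This is a minimal sketch, assuming the cgroup-v1 pids controller set up above:
# Attach the current shell to the cgroup:
echo $$ | sudo tee /sys/fs/cgroup/pids/my_ns_cgroup/cgroup.procs
# With pids.max set to 2, a pipeline that needs two child processes, such as:
ls | wc -l
# should now fail with "fork: Resource temporarily unavailable".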
Filesystem
A filesystem (FS) is the method and data structure that an operating system uses to store and retrieve information. It is impossible to discuss container filesystems without talking about Docker, since Docker introduced the archive-based, layered image architecture that is now embedded in virtually every container runtime environment. This layered architecture is arguably one of the most important ideas in container imaging.
A layered container image is a tar archive with a JSON-based manifest that records changes to the stored files by assigning each change to a named layer. In a running container, these layers are merged, with a new writable layer on top that captures the most recent changes.
This is made possible by the overlay filesystem (OverlayFS) feature of Linux, which provides an efficient way to quickly spawn new containers without wasting storage resources.
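You can inspect this archive format directly. As a quick sketch (the ubuntu.tar filename is just an example, and this assumes the ubuntu:20.04 image is present locally; we pull it in a later step):
docker save ubuntu:20.04 -o ubuntu.tar
tar -tf ubuntu.tar   # lists manifest.json plus the layer archives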
Now that we know the basics, let's explore how Docker uses these technologies to run a container.
First, let's check which storage driver is in use by executing the following command:
docker info | egrep -i storage
overlay2 is an OverlayFS implementation and the default storage driver on most Linux-based distros. OverlayFS is a union filesystem: it merges directories into a single unified view, and these directories are referred to as the upper and lower directories.
It is important to note that lower directories are never modified in any way, and a single overlay mount can stack multiple lower directories beneath one upper directory.
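To see this merging behavior outside of Docker, here is a minimal OverlayFS sketch; all paths under /tmp/ovl are made up for illustration:
mkdir -p /tmp/ovl/{lower,upper,work,merged}
echo "from lower" > /tmp/ovl/lower/file.txt
mount -t overlay overlay -o lowerdir=/tmp/ovl/lower,upperdir=/tmp/ovl/upper,workdir=/tmp/ovl/work /tmp/ovl/merged
# Files written under /tmp/ovl/merged land in the upper directory;
# the lower directory is never touched.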
Execute the following command to pull the base Ubuntu image.
docker pull ubuntu:20.04
After the pull is finished, run the following command to determine the IMAGE ID hash:
docker image ls --no-trunc
Use the IMAGE ID hash from the previous output and run the following command to peek into the content of the manifest file. (The hash shown below is from my machine; yours will differ.)
Note: Adjust the Docker image path if you have changed the default docker installation path.
cat /var/lib/docker/image/overlay2/imagedb/content/sha256/d13c942271d66cb0954c3ba93e143cd253421fe0772b8bed32c4c0077a546d4d
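If the raw JSON is hard to read, you can pretty-print it instead (assuming python3 is available; substitute your own hash):
python3 -m json.tool /var/lib/docker/image/overlay2/imagedb/content/sha256/d13c942271d66cb0954c3ba93e143cd253421fe0772b8bed32c4c0077a546d4d
# The rootfs.diff_ids array lists the digest of each layer in the image.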
Now that we know how to look for the manifests, let's run an Ubuntu container and check where the data layers are stored.
Execute the following command to run an Ubuntu container:
docker run --rm -it ubuntu:20.04 /bin/bash
Now execute this command to see the mounted overlays:
cat /proc/mounts | egrep overlay
You should be able to see the location of lowerdir and upperdir in the output result.
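For reference, the overlay entry should look roughly like the following; the directory hashes are shortened placeholders here and will differ on your machine:
overlay / overlay rw,relatime,lowerdir=/var/lib/docker/overlay2/l/AAAA:/var/lib/docker/overlay2/l/BBBB,upperdir=/var/lib/docker/overlay2/XXXX/diff,workdir=/var/lib/docker/overlay2/XXXX/work 0 0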
If you would like to experiment further, try examining the content of directories returned by the previous command in the host terminal.
Namespace networking
The namespace that we created earlier (my_ns) is isolated, meaning there is only a loopback network interface. Let's verify this.
From the host machine, let’s look at the interfaces in the namespace by executing the following command.
ip netns exec my_ns ip link
You should see an output similar to:
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
Just like a computer, a namespace requires network interfaces to communicate with other resources in a network realm. There are a variety of virtual interfaces available in Linux that offer different functionality depending on your use case. Let's use a veth (virtual Ethernet) pair to add an interface to our namespace.
ip link add ns_side netns my_ns type veth peer name host_side
We can verify our new virtual interface by running the ip link command inside the namespace.
ip netns exec my_ns ip link
You should see an output similar to:
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ns_side@if5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 56:48:4f:6f:4b:00 brd ff:ff:ff:ff:ff:ff link-netnsid 0
Running ip link on the host should show a matching entry for the other end of the pair.
5: host_side@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether e2:8f:30:87:9f:25 brd ff:ff:ff:ff:ff:ff link-netns my_ns
If you look closely at the previous outputs, the ns_side and host_side interfaces both have a DOWN status, which means they are currently disabled.
Issue the following commands to enable the logical interface for both sides.
ip link set host_side up
ip netns exec my_ns ip link set ns_side up
Add an IP address to the host.
ip address add 192.168.254.1/32 dev host_side
Add an IP address to the namespace side.
ip netns exec my_ns ip address add 192.168.254.2/32 dev ns_side
Both the host and the namespace now have an IP address, but they cannot ping each other yet: the /32 prefixes mean neither side has a route to the other's address. We need to add routes in order to establish communication between the host and the namespace.
Use the following command to add a route for the host side.
ip route add 192.168.254.2 dev host_side
Use the following command to add a route for the namespace side.
ip netns exec my_ns ip route add 192.168.254.1 dev ns_side
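With the routes in place, you can verify connectivity in both directions:
ping -c 1 192.168.254.2
ip netns exec my_ns ping -c 1 192.168.254.1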
Note: At this point, both interfaces are up and running. It is possible to extend this example by using NAT to provide internet access inside the namespace. However, that is outside the scope of this post.
CNI
As we previously established, containers are a combination of namespace, cgroup, and filesystem. Kubernetes uses container runtime applications, such as Docker, CRI-O, and containerd, to manage containers in a cluster. A container runtime automates the steps we just performed by hand, creating and maintaining namespaces, cgroups, and filesystems on your behalf.
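As a rough illustration of that automation, a single docker run invocation sets up the namespaces, cgroup limits, and layered filesystem we assembled manually; the limit values below are arbitrary examples:
docker run --rm -it --pids-limit 2 --memory 64m ubuntu:20.04 /bin/bash
# --pids-limit and --memory translate into cgroup settings, while the network
# namespace and overlay filesystem are created automatically.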
Manually creating a container is a fun experience, but it is impractical at the scale of a datacenter. Container networking environments are software-defined networking (SDN) realms, where SDN is used to establish communication with other resources within the datacenter. In a datacenter where a massive number of containers are created and removed every second, SDN provides the efficiency to process packets inside and outside of each container. In the context of Kubernetes, this is usually where the Container Network Interface (CNI) shines.
CNI is a CNCF project that provides a set of specifications and libraries to achieve networking in any environment. Anyone can use these specifications to write a plugin for any project and offer essential or advanced networking features.
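As a sketch of what those specifications look like in practice, here is a minimal network configuration for the reference bridge plugin; the file name, network name, and subnet below are made-up examples:
cat <<'EOF' > /etc/cni/net.d/10-example.conf
{
  "cniVersion": "0.4.0",
  "name": "example-net",
  "type": "bridge",
  "bridge": "cni0",
  "ipam": {
    "type": "host-local",
    "subnet": "10.22.0.0/16"
  }
}
EOF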
Kubernetes complies with the CNI specification to provide network connectivity for clusters. According to the Kubernetes documentation, there are four areas of communication that cluster networking needs to address.
- Communication between containers
- Communication between pods
- Communication from a pod to a service
- Communication from an external resource to a local service
Calico Open Source is a networking and network security solution for containers, virtual machines, and native host-based workloads. It supports multiple data planes, including a pure Linux eBPF data plane, a standard Linux networking data plane, and a Windows HNS data plane. Calico provides a full networking stack, but can also be used in conjunction with cloud provider CNIs to provide network policy enforcement.
Conclusion
In this post, we walked through how to create a network namespace that runs the bash binary, and hooked it up to the host machine's networking to establish communication between the two. Kubernetes provides many more features to orchestrate your containerized environment, which would be daunting to replicate manually, but the point of creating one container by hand is to show how a CNI allows a container to establish connections with other resources in its environment.
Check out this container security guide to learn how to secure Docker, Kubernetes, and all major elements of the modern container stack.