naftalyava.com

Between Containers and Virtual Machines

Kata Containers: When Containers and Virtual Machines Make a Baby

In March 2023, we celebrated the 10-year anniversary of Docker, a technology that revolutionized the way we build and deploy applications. The adoption was explosive, so explosive that it's extremely difficult to find a developer today who isn't using containerization technology in some capacity.

The next evolution of containerization technology, Kata Containers, is still relatively unknown. To fully grasp the significance of its emergence, it’s essential to take a journey back to the roots of virtualization technology.

Xen Hypervisor & Hardware-Assisted Virtualization

In simple terms, virtualization allows a single physical computer to be divided into multiple virtual computers. This magic is achieved through a piece of software known as the hypervisor.

While the concept of virtualization dates back to the late 1960s, significant advancements were made in the mid-2000s. In 2005 and 2006, Intel and AMD introduced hardware-assisted virtualization for the x86 architecture with their VT-x and AMD-V technologies. This enabled virtual machines (VMs) to operate with minimal performance overhead. Around the same time, the Xen hypervisor incorporated support for VT-x and AMD-V, setting the stage for the rapid growth of public cloud platforms like AWS, GCP, Azure, and OCI.

As businesses increasingly adopted cloud services, the need for heightened security became paramount, especially when different clients shared the same physical hardware for their VMs. Hypervisors leverage hardware capabilities to ensure distinct separation between VMs. This means that even if one client’s VM experiences a crash or failure, others remain unaffected.

Going a step further, AMD introduced its Secure Encrypted Virtualization (SEV) technology in 2017, enhancing VM isolation. SEV not only protects a VM's data from other VMs, it also takes a “zero-trust” approach toward the hypervisor itself, ensuring that even the hypervisor cannot access a VM's encrypted memory. This offers an added layer of protection in a shared-resource environment.

The Emergence of Containers

Virtual machines streamlined application deployment by abstracting the underlying hardware. However, they introduced new challenges. For instance, there was still the responsibility of maintaining the operating system (OS) on which the application ran. This involved configuring, updating, and patching security vulnerabilities. Furthermore, installing and configuring all the application’s dependencies remained a tedious task.

Containerization emerged as a solution to these challenges. Containers package the application together with its environment, dependencies, and configurations, ensuring consistency across deployments.

Recognizing the need to simplify OS maintenance further, AWS launched Fargate in 2017. With Fargate, developers can run containers on the cloud without the overhead of OS management. As the popularity of containerized applications surged, orchestrating these containers at scale became a challenge. This was effectively addressed by technologies like Kubernetes, which automate the deployment, scaling, and management of containerized applications.

Building on the Foundations

Having understood the intricate evolution of hardware capabilities over four decades, we appreciate how instrumental these advancements were in enhancing both performance and security for virtual machines. These developments have made it feasible to run multiple virtual machines on a single physical host without significant performance overhead while also maintaining strong isolation between them.

However, when it comes to containers, a different approach is taken. Unlike virtual machines, which rely heavily on these hardware-based virtualization features, containers don’t create whole separate virtualized hardware environments. Instead, they function within the same OS kernel and rely on built-in features of that kernel for their isolation.

At the heart of container isolation is a mechanism called namespaces. Introduced in the Linux kernel, namespaces effectively provide isolated views of system resources to processes. There are several types of namespaces in Linux, each responsible for isolating a particular set of system resources. For example:

  1. PID namespaces isolate process IDs, so a process inside a container sees only its own process tree.
  2. Network namespaces give each container its own network interfaces, routing tables, and ports.
  3. Mount namespaces provide an isolated view of filesystem mount points.

And so on for the user, UTS, cgroup, and IPC namespaces, each of which isolates another class of system resources.
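
As a quick illustration (a sketch assuming a Linux host with the standard util-linux tools installed), you can see namespaces in action directly from a shell:

    # list the namespaces currently in use on the host
    lsns

    # start a shell in new PID and mount namespaces; inside it,
    # ps shows only this shell and ps itself
    sudo unshare --pid --fork --mount-proc bash
    ps aux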

The beauty of namespaces is their ability to provide a lightweight, efficient, and rapid isolation mechanism. This makes it possible for containers to start almost instantaneously and run with minimal overhead, all while operating in isolated environments.

However, it’s essential to understand that while namespaces provide a degree of isolation, they don’t offer the same robust boundary that a virtual machine does with its separate kernel and often hardware-assisted barriers.

Kata Containers is Born

Building on the foundation of virtual machines and containers, Kata Containers emerged as a solution that seamlessly fuses the strengths of both worlds.

Traditional containers, with their reliance on kernel namespaces, bring unparalleled agility and efficiency. They can be spun up in fractions of a second and have minimal overhead, making them perfect for dynamic, scalable environments. Virtual machines, on the other hand, backed by decades of hardware innovation, offer a more robust isolation boundary and stronger security guarantees, especially in multi-tenant environments.

Kata Containers seeks to bridge the gap between these two paradigms. At its core, Kata Containers provides a container runtime that integrates with popular container platforms like Kubernetes. But instead of relying solely on kernel namespaces for isolation, Kata Containers launches each container inside its own lightweight virtual machine.
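
To make that integration concrete, here is a minimal sketch of how a Kubernetes cluster can be told to run a pod with Kata Containers. It assumes the nodes already have Kata installed and that the container runtime (containerd or CRI-O) exposes a handler named kata; the manifest names below are hypothetical.

    # kata-demo.yaml -- apply with: kubectl apply -f kata-demo.yaml
    apiVersion: node.k8s.io/v1
    kind: RuntimeClass
    metadata:
      name: kata               # the name pods will reference
    handler: kata              # must match the handler configured in the container runtime
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: kata-demo
    spec:
      runtimeClassName: kata   # this pod's containers run inside a lightweight VM
      containers:
      - name: demo
        image: nginx

Pods that do not set a runtimeClassName keep using the default runtime, so conventional and Kata-backed containers can coexist on the same cluster.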

The diagram below illustrates the differences between a VM, Kata Containers, and conventional containers.

Kata Containers Architecture

read more

How to Set Up Open vSwitch & QEMU/KVM for Your Virtualization Home Lab

Introduction

Today, we’re addressing a common challenge for home lab users: how to make the most of our server with multiple cores and significant memory. Whether you’re running a Plex server, Kubernetes clusters, or VMs for development, a frequent issue is that these VMs are often hidden behind NAT, limiting their accessibility from other devices on your network.

Tools Overview

To solve this, we will use two open source projects:

  1. QEMU/KVM: This hypervisor and virtual machine monitor lets us create and manage VMs.
  2. Open vSwitch (OVS): This is a software-based switch that enables more complex network configurations in virtualized environments than traditional switches.

Understanding the Problem

The main issue is that VMs operating behind NAT are not directly accessible from other machines on your home network, which can restrict your ability to interact with these VMs from other devices.

Implementing the Solution

Here’s how to configure your VMs to be accessible within your network using Open vSwitch and QEMU/KVM.

Step 1: Setting Up Open vSwitch

  1. Install Open vSwitch:
    sudo apt-get install openvswitch-switch
    
  2. Create a Virtual Switch:
    sudo ovs-vsctl add-br vm_net
    
  3. Verify the Bridge:
    sudo ovs-vsctl show
    

    Ensure your newly created virtual bridge vm_net is listed.

Step 2: Configuring Network Interface

We now link our network interface to the virtual bridge to allow VMs to communicate with the home network.

  1. Add Network Interface to the Bridge:
    sudo ovs-vsctl add-port vm_net eth0
    

    Replace eth0 with the correct identifier for your network interface.

  2. Check Configuration:
    sudo ovs-vsctl show
    

    Make sure the network interface is correctly integrated with the bridge.
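
One caveat worth flagging at this step: once the physical NIC is attached to the OVS bridge, the host's own IP address normally has to move from eth0 to the bridge interface, otherwise the host itself may lose network connectivity. The exact procedure depends on how the host manages its network (Netplan, NetworkManager, etc.); on a host using a plain DHCP client, a rough sketch looks like this:

    sudo ip addr flush dev eth0    # drop the IP from the physical NIC
    sudo dhclient vm_net           # request an address on the OVS bridge instead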

Step 3: Adjusting VM Network Settings

We need to ensure that the VMs utilize the Open vSwitch bridge for network communication.

  1. Update VM Network Config: Adjust your VM’s network configuration (for a libvirt-managed VM, edit it with sudo virsh edit <vm-name>) to connect through the vm_net bridge:
    <interface type="bridge">
       <source bridge="vm_net"/>
       <virtualport type="openvswitch"></virtualport>
       <model type="e1000e"/>
    </interface>
    
  2. Restart the VM (shut it down first if it is already running):
    sudo virsh shutdown <vm-name>
    sudo virsh start <vm-name>
    

Verifying the Setup

After these configurations, your VM should receive an IP address from your home DHCP server. Check the VM’s network details and try to ping the VM from another device on your network to ensure connectivity.
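
For example (with 192.168.1.50 standing in for whatever address your DHCP server actually hands out):

    # inside the VM: confirm an address from the home network was obtained
    ip addr show

    # from another device on the network: confirm the VM is reachable
    ping 192.168.1.50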

Conclusion

By integrating QEMU/KVM with Open vSwitch, you’ve overcome the NAT limitations, making your VMs fully accessible within your network. This configuration not only simplifies network management but also enhances the usability of your home lab.

If you prefer to consume this post as a video, I got you covered:

read more

Exploring eBPF and XDP: An Example

A year ago, I was exploring a few Kubernetes CNI plugins when I stumbled upon the Cilium project. Cilium uses eBPF and XDP for network traffic control, security, and visibility.

eBPF (Extended Berkeley Packet Filter) allows you to attach your code on the fly to almost any function within the Linux kernel. XDP (eXpress Data Path), on the other hand, enables manipulation of network traffic even before it reaches the network stack of the Linux kernel. Essentially, eBPF and XDP let you dynamically add logic to network traffic control while bypassing the kernel network stack, potentially giving you better performance.
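
If you want to poke at this on your own machine before reading on, bpftool (typically shipped in the linux-tools packages) gives a quick view of what the kernel supports and which eBPF programs are already loaded; a small sketch:

    sudo bpftool feature probe kernel | grep -i xdp   # check XDP-related kernel support
    sudo bpftool prog show                            # list currently loaded eBPF programs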

I initially considered utilizing these technologies to accelerate Kubernetes workloads using a DPU, a type of smart NIC. I eventually scrapped that XDP offload idea and went in a different direction, but the technology has remained stuck in my head ever since.

Fast forward to today: I decided to spend a weekend building a functional example that uses most of the basic building blocks of eBPF and XDP.

What does the code do?

In short: an XDP program drops incoming packets from IP addresses that a user-space program marks as blocked at runtime, and reports events back to the user-space program over a ring buffer.

How?

  1. Utilize libbpf to abstract away much of the repetitive eBPF boilerplate code, simplifying the process of writing, loading, and managing the eBPF program.
  2. Establish communication between the user-space code and the eBPF program.
  3. Utilize an eBPF ring buffer for communication, with the XDP program acting as the initiator.
  4. Use an eBPF hash map that allows the user-space code to dynamically define which IPs should be blocked.
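
As a generic aside (not the post's actual code), here is roughly how a compiled XDP object can be attached and inspected from the command line, independent of the libbpf loader; xdp_block.o and the map name blocked_ips are hypothetical placeholders:

    # attach to eth0 (adjust the interface name) in generic (skb) mode so it works with any NIC driver
    sudo ip link set dev eth0 xdpgeneric obj xdp_block.o sec xdp

    # inspect the hash map of blocked IPs
    sudo bpftool map dump name blocked_ips

    # detach when done
    sudo ip link set dev eth0 xdpgeneric off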

Let’s break down the main parts of the eBPF code.

read more

How To: Remote Development on VSCode using SSH

My personal setup at home includes several machines: a Windows 11 machine and a Linux-based home server. While Windows 11 is perfect for web browsing and occasional gaming, the bulk of my time is spent writing and compiling code, and Windows is not the ideal environment for that. This is where the “Remote SSH” plugin for VSCode comes in handy. It allows you to use VSCode running on Windows as if it were running on your Linux machine.

Below are the required configuration steps:

  1. On your Windows machine, generate an SSH key pair. Open PowerShell and run the following command:

    ssh-keygen -b 4096

    By default, this will generate a key pair under C:/Users/<user name>/.ssh/. Copy the public key [the contents of id_rsa.pub].

  2. On your Linux machine, run the following command to create the authorized_keys file:

    vim ~/.ssh/authorized_keys

    Paste the public key from the earlier step.

  3. Install the Remote SSH plugin for VSCode.

    Remote SSH Plugin

  4. To configure the plugin, press Ctrl + Shift + P and type “ssh config”. Open the configuration file and fill it in with the following [adjusted with your own IP address, username, and key path]:
    Host 192.168.1.10
      HostName 192.168.1.10
      User navadiaev
      Port 22
      PreferredAuthentications publickey
      IdentityFile "C:\Users\nafta\.ssh\id_rsa"
    
  5. Press Ctrl + Shift + P again and type “connect to host”. You should be able to select the host you just configured and log in.
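
Before connecting from VSCode, it can be worth confirming from PowerShell that key-based SSH works on its own (the values below match the sample configuration above):

    ssh -i C:\Users\nafta\.ssh\id_rsa navadiaev@192.168.1.10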

Below is a video where I execute the above instructions:

read more

Network Booting Using iPXE With DHCP Proxy

In this short post I am going to explain how to set up an iPXE server with a DHCP proxy, meaning you will not need to configure anything on the existing DHCP server you have on the network. This comes in especially handy when you can’t control or modify the existing DHCP server.

Let’s dive into the setup instructions. In case you want to understand a bit more about how it all works, I will be uploading a YouTube video that explains the configuration provided in this post.

  1. Install dnsmasq:

    sudo apt-get install dnsmasq

  2. Get iPXE from https://ipxe.org/download. You can grab the source code and compile it yourself, or just download the precompiled binaries. You can also download all the files needed from the link at the end of this post. The file you will need from this step is ipxe.efi; it needs to be placed in the root folder of your TFTP server.

  3. Download the Ubuntu 22.04 live ISO from www.ubuntu.com and retrieve the /casper/initrd and /casper/vmlinuz files from the ISO image. Create a folder “casper” at the root of your TFTP server and copy both of these files there.

  4. You should also create a grub/grub.cfg configuration file under your root TFTP folder. This file defines the boot menu you see once your iPXE client boots. Below is an example where we use our iPXE server to boot an Ubuntu ISO image from Ubuntu’s web servers:

    menuentry "Install Ubuntu 22.04 (Pull the iso from web)" {
       set gfxpayload=keep
       linux   /casper/vmlinuz url=https://releases.ubuntu.com/jammy/ubuntu-22.04.1-desktop-amd64.iso only-ubiquity ip=dhcp ---
       initrd  /casper/initrd
    }
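
Once these steps are done, the root of your TFTP server should look roughly like this (using /srv/tftp as a stand-in for wherever your TFTP root actually lives):

    /srv/tftp/
    ├── ipxe.efi
    ├── casper/
    │   ├── initrd
    │   └── vmlinuz
    └── grub/
        └── grub.cfg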
    
read more