XCP-ng – A More Professional Alternative to Proxmox Based on Xen

2024-07-20 XCP-ng, Xen

After Broadcom increased the license fees, virtualization solutions other than VMware ESXi have become very popular. I already talked about alternatives like CBSD, which runs on FreeBSD and uses bhyve for virtual machine (VM) workloads, or Harvester, an HCI solution that takes a different approach to running VMs. Still, Proxmox is usually one of the first names mentioned when it comes to VMware ESXi replacements, and it is really great software based on KVM. However, there are other solutions around, and one outstanding option is definitely XCP-ng. XCP-ng is based on Xen and follows a completely different approach than Proxmox with KVM.

XCP-ng (Xen Cloud Platform – Next Generation) is an open-source virtualization platform that originated from the XenServer hypervisor. It is designed to offer a powerful, robust and cost-effective solution for managing virtualized environments. Based on the popular Xen hypervisor, XCP-ng benefits from Xen’s reputation for performance and scalability, making it suitable for a variety of virtualization tasks. The open-source nature of XCP-ng is a major advantage, providing users with full access to inspect, modify, and contribute to all its components, which in turn supports transparency and encourages community involvement. However, when searching for XCP-ng you will mostly find smaller discussions, high-level summaries and comparisons of XCP-ng with Proxmox. I dived deep into both projects, even developed ProxLB for Proxmox, and still can’t say which one is better in general – it simply depends on your needs. Let’s have a look at the details!

History

XCP-ng is a Linux distribution based on the Xen Project, featuring a pre-configured Xen hypervisor and Xen API (XAPI) ready for use. Launched in 2018, it originated as a fork of Citrix XenServer, which had been open-sourced in 2013 but was later discontinued as a free and open-source offering by Citrix. This led to the project’s revival as XCP-ng. Since January 2020, XCP-ng has been part of the Linux Foundation through the Xen Project. It might also be interesting to learn more about the different virtualization modes in Xen.

XCP-ng vs Proxmox

Let’s have a short look at some key facts of XCP-ng and Proxmox:

XCP-ng

  • Open-Source
  • Based on Xen
  • Tier-1 Hypervisor
  • Microkernel
  • VM
  • 1000+ Nodes per Cluster
  • Management via IPv4 only (changes with v8.3)
  • No built-in web interface (changes with v8.3)
  • Complex network (including VLANs, bonding, LACP, SR-IOV)
  • Local, NFS, iSCSI, HBA, Ceph
  • Backup does not support de-duplication
  • Max. VM disk size of 2 TB (VHD)
  • Live migration of VMs
  • Load balancing (DRS)

Proxmox

  • Open-Source
  • Based on KVM
  • Tier-2 Hypervisor
  • Full-fledged Linux Kernel
  • VM + LXC (Container)
  • ~30 Nodes per Cluster
  • Management via IPv4 + IPv6
  • Built-in web interface
  • Complex network (including VLANs, bonding, LACP, SR-IOV)
  • Local, NFS, iSCSI, Ceph, ZFS, OCFS2
  • Backup supports de-duplication
  • VM disk size only limited by the storage type used
  • Live migration of VMs
  • No built-in load balancing (available via ProxLB)

Both solutions have matured, and it’s harder than ever to decide between them. Each comes with its own pros and cons.

Security

Proxmox and XCP-ng differ significantly in their security models, primarily due to their underlying kernel architectures. Proxmox is built on a regular Linux kernel, a monolithic kernel that integrates all system services and drivers directly into the core operating system. This approach can lead to a larger attack surface since any vulnerability within the kernel can potentially be exploited to gain control over the entire system. However, the extensive and continuous security updates provided by the Linux community help mitigate these risks.

In contrast, XCP-ng is based on the Xen hypervisor, which utilizes a microkernel architecture. Xen is a microkernel with around 200,000 lines of code, compared to the millions of lines in a Linux kernel, making it significantly smaller and more stable.

The microkernel design of Xen inherently reduces the attack surface by isolating essential services and running them in separate, unprivileged domains. This isolation improves fault tolerance and security, because a breach in one service domain is less likely to compromise the entire system. By keeping the hypervisor minimal and delegating most system services to separate drivers or extensions, Xen reduces the risk that a vulnerability in the hypervisor affects the broader system. Xen also incorporates robust security features such as access control and sandboxing, which further isolate guest VMs from one another and help prevent cross-VM attacks. In addition, Xen’s security process is highly mature, featuring a pre-disclosure list, a clear method for reporting security issues, and a dedicated security response team, making it one of the best security models in the industry. This meticulous security framework is far superior to that of Linux, which runs KVM.

Another key security advantage of Xen is its design for inter-VM communication, which is secure by design. In Xen, inter-VM communication is mediated by the hypervisor, which acts as an arbiter between VMs that do not trust each other. This approach, coupled with Xen’s minimal attack surface, provides a secure method for handling requests between VMs. In contrast, KVM’s use of virtio often results in direct memory access (DMA) everywhere. As a result, it may be more susceptible to certain attacks, including those that exploit privilege escalation or memory corruption.

Performance

Xen offers several virtualization modes, each with unique benefits and trade-offs. Full virtualization enables the hypervisor to completely simulate the underlying hardware, allowing unmodified guest operating systems to run as if on a physical machine. This mode leverages hardware-assisted virtualization features from modern CPUs, like Intel VT-x and AMD-V, ensuring broad OS compatibility and simplifying the migration of existing systems. However, full virtualization can introduce performance overhead due to the necessity of hardware emulation and the complexity of the hypervisor.

Paravirtualization, on the other hand, requires the guest operating system to be modified to interact directly with the hypervisor. This direct interaction reduces the overhead associated with full hardware emulation, significantly boosting performance. Paravirtualization is efficient and scalable but requires changes to the guest OS, which limits compatibility to those systems that can be modified, thus excluding many proprietary or closed-source operating systems.

Combining aspects of both full virtualization and paravirtualization, Hardware-Assisted Paravirtualization (PVH) utilizes hardware virtualization extensions along with paravirtualized interfaces. This hybrid approach offers improved performance and simplicity compared to full virtualization while being more accommodating of various guest operating systems than pure paravirtualization. However, PVH may still necessitate some OS modifications and relies on the availability of hardware virtualization extensions.

Xen’s architecture consists of several essential components: the Xen hypervisor, Domain 0 (Dom0), and Domain U (DomU). The Xen hypervisor operates directly on the hardware, managing resources like CPU and memory, and providing a platform for guest operating systems. Domain 0, or Dom0, is a special, privileged virtual machine with direct hardware access that manages other virtual machines. Dom0 usually runs a modified version of Linux. Domain U, or DomU, refers to unprivileged virtual machines that run user workloads and can operate in any of Xen’s supported virtualization modes.
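
To make this a bit more tangible, here is a minimal sketch of how this can be inspected from the Dom0 console of an XCP-ng host; the VM name and UUID are placeholders, and the exact parameter names may differ slightly between XAPI versions:

# List the running domains as seen by the Xen toolstack; Domain-0 is Dom0 itself
xl list

# Look up a guest's UUID and check which virtualization mode (domain type) it uses
xe vm-list name-label="debian12-test" params=uuid
xe vm-param-get uuid=<vm-uuid> param-name=domain-type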

When it comes to KVM, we mostly have the same options. But it wasn’t always that way. Designed to enhance communication between guest and host systems, VirtIO is a standard for network and disk device drivers in virtualized environments. It focuses on high performance and low overhead by offering a consistent interface for virtual devices, making their development simpler and ensuring hypervisor compatibility. KVM has included VirtIO since 2007, which has greatly improved the interaction between guest systems and virtual devices. This improvement, over traditional emulation methods, has helped establish KVM as a reliable and efficient virtualization solution used widely in cloud and enterprise sectors.
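
On Proxmox, using VirtIO is mostly a matter of picking the paravirtualized device models for a VM. A small sketch with the qm CLI follows; the VM ID 100 and the bridge name vmbr0 are assumptions for this example:

# Use the VirtIO SCSI controller for the VM's disks instead of emulated hardware
qm set 100 --scsihw virtio-scsi-pci
# Attach a paravirtualized VirtIO NIC on bridge vmbr0
qm set 100 --net0 virtio,bridge=vmbr0
# Show the resulting VM configuration
qm config 100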

In older articles, you might frequently find claims that Xen is faster, while more recent articles might assert that KVM has the speed advantage. These differing conclusions often come from various factors, including the type of virtualization being compared. Earlier comparisons sometimes pitted fully virtualized environments against paravirtualized ones, which skewed results, particularly before the presence of VirtIO for KVM. VirtIO significantly enhanced KVM’s performance, making direct comparisons more complex and context-dependent.

But what is really the current state in 2024 when comparing XCP-ng vs Proxmox?

To give it a try, I created two test cases with a single local NVMe disk:

  • Running in a VM
  • Running on dedicated hardware

Running in a VM

Virtualizing a hypervisor like Xen/XCP-ng or Proxmox/KVM within a virtual machine can be a practical solution when hardware resources are limited, allowing for more efficient use of existing infrastructure. This setup is ideal for lab environments where performance is not a critical factor, providing a cost-effective way for users to learn and practice virtualization technologies. Additionally, it enables homelab enthusiasts to experiment and test configurations without needing additional physical servers. Furthermore, leveraging affordable VPS offers available online can extend these benefits, enabling users to run virtualized hypervisors without substantial investment in hardware. Keep in mind that performance may decrease drastically in such setups. However, it might still be interesting to compare them.
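
For such a nested setup on an AMD-based Proxmox host, nested virtualization has to be available to the guest. A rough sketch of the usual steps; the VM ID 200 is a placeholder and the paths apply to the AMD KVM module:

# Check whether nested virtualization is enabled for the AMD KVM module (prints 1 or Y)
cat /sys/module/kvm_amd/parameters/nested

# Enable it persistently if needed and reload the module (no VMs may be running)
echo "options kvm_amd nested=1" > /etc/modprobe.d/kvm_amd.conf
modprobe -r kvm_amd && modprobe kvm_amd

# Expose the host CPU flags to the VM that will run the nested hypervisor
qm set 200 --cpu host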

For this test case, I used a dedicated system running the current version of Proxmox with the following specs:

CPU: AMD Ryzen 7 5700U
RAM: 2x 32 GB 3200 MHz DDR4 (Crucial)
Disk: 1x NVMe (Kingston OM8PGP41024Q-A0)
OS: Proxmox 8.2 (Debian 12)

On top of this host, both hypervisors were installed as VMs, but never ran in parallel – the hypervisor under test was always the only VM running during the benchmarks. Inside this virtualized hypervisor, a small Debian 12 (bookworm) VM with 1 vCPU (host CPU type), 2 GB of memory and a 10 GB paravirtualized disk was installed. Inside this VM, the YABS test was performed. For this comparison, only the disk speed and the Geekbench 6 score are relevant.
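
YABS is a simple shell script; the standard invocation below runs the fio disk tests and Geekbench inside the guest (the network portion of the results can be ignored for this comparison):

# Run YABS inside the Debian test VM
curl -sL https://yabs.sh | bash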

This outcome was somewhat unexpected: there is a huge gap in the disk metrics between Proxmox and XCP-ng, while the Geekbench 6 CPU scores are much closer to par. Seeing this, I placed the Debian test VM of each hypervisor on a shared NFS storage connected to the Proxmox node via a 10G link. Here too, the two diverge further as the FIO file size increases. While such a nested setup is not really recommended anyway, it is clear that Xen is not an option for this use case unless it is purely about technical tests where VM performance absolutely doesn’t matter. Xen operates so close to the underlying hardware that it simply cannot perform well when nested inside another hypervisor. It might still be useful for developing and contributing to the project, though. For Proxmox, nesting can still be a legitimate option, and it performs well in this scenario.
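
For reference, the kind of test meant by increasing FIO file sizes can be reproduced with something like the following sketch; the job parameters here are illustrative and not the exact job file used for the numbers above:

# 4k random read/write against a test file, repeated with growing file sizes
for size in 1G 2G 4G; do
  fio --name=randrw-$size --filename=/root/fio.test --size=$size \
      --rw=randrw --bs=4k --ioengine=libaio --iodepth=32 --direct=1 \
      --runtime=60 --time_based --group_reporting
done
rm -f /root/fio.test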

Running on Dedicated Hardware

In the next round of testing, I ran the hypervisors directly on the hardware, which is the standard procedure. Proxmox performed without any issues, but XCP-ng had storage complications. It appears that the AceMagician AM06 device had driver issues with the storage controller, causing a drastic drop in performance. As a result, I switched to another device to continue the tests. I used the following configuration, which was tested in the same way with Proxmox and XCP-ng:

CPU: AMD Ryzen 7 5700U
RAM: 2x 32 GB 3200 MHz DDR4 (Crucial)
Disk: 1x NVMe (Kingston OM8PGP41024Q-A0)
OS: Proxmox 8.2 / XCP-ng 8.2.1

With the new hardware, it got really interesting to validate the performance metrics from inside a VM. The usual Debian 12 VM with 1 vCPU, 2 GB of memory and a 10 GB disk was created. All tests were performed inside this VM, running on Proxmox and on XCP-ng in PVHVM mode.

It initially appears that Proxmox is still twice as fast as XCP-ng, but it’s crucial to note that tapdisk in Xen is single-threaded. Comparing a single VM per hypervisor therefore gives us numbers, but not meaningful ones. Consequently, I set up four VMs and revalidated this on both platforms, which led to a completely different outcome: with four VMs running in parallel, XCP-ng matched Proxmox’s performance. While the disk values differed so much, the CPU values were mostly equal and consistent across all tests.
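
Because tapdisk handles each virtual disk in a single thread, scaling comes from running several guests in parallel. A minimal sketch of driving the same benchmark in all four test VMs at once; the guest IPs are placeholders and the fio parameters mirror the illustrative job above:

# Kick off the same fio job in all four test VMs in parallel, then wait for completion
for vm in 10.10.10.11 10.10.10.12 10.10.10.13 10.10.10.14; do
  ssh root@$vm "fio --name=randrw --filename=/root/fio.test --size=4G \
    --rw=randrw --bs=4k --ioengine=libaio --iodepth=32 --direct=1 \
    --runtime=60 --time_based --group_reporting" &
done
wait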

Performance Outcome

Both hypervisors provide great performance when running a real-life setup with multiple guests, and it’s a no-brainer that nobody would run just a single VM on a whole hypervisor. Therefore, the performance of the two is comparable when it comes to local disk storage.

Things may generally be different with other storage types like iSCSI, NFS, etc. Unfortunately, I only had the chance to test NFS in the virtualized environment; I include those numbers in my overview but would largely disregard them, since we already saw the issues of running Xen as a nested VM. The next step is to gather NFS metrics for a proper comparison here. This will be updated at a later time.

Installation XCP-ng

The installation is pretty straightforward and follows the usual procedure. The ISO image can be downloaded at: https://mirrors.xcp-ng.org/isos/8.2/xcp-ng-8.2.1-20231130.iso. If needed, you can copy the image to a USB drive by running the following command:

dd if=xcp-ng-8.2.1-20231130.iso of=/dev/sdX bs=8M oflag=direct
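
It can also be worth verifying the downloaded ISO against the checksums published alongside it on the mirror (the exact checksum file name may vary); for example:

# Compare this output against the checksum file provided on the mirror
sha256sum xcp-ng-8.2.1-20231130.iso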

The installer will prompt you to select the target disk for the installation. Choose the appropriate disk, keeping in mind that this will erase any existing data on it. Configure the network settings, either manually or via DHCP, to ensure the node can connect to your network. You will also need to set a root password for administrative access.

Proceed with the installation process, which will copy the necessary files to the disk and configure the system. After the installation is complete, the system will prompt you to remove the installation medium and reboot the node. Upon reboot, XCP-ng will load, and you will be able to access it via the console or remotely using tools such as Xen Orchestra or the XCP-ng Center.

However, XCP-ng comes as a bare hypervisor without any management interface. The currently recommended solution is to install Xen Orchestra, which runs directly as a VM on one of the nodes. There are different ways to install it, and you can also build and install it directly from the sources if you like. For a quick test, the easiest way is to navigate to the IP address of the XCP-ng node (e.g. https://10.10.10.1).

Install Orchestra

Currently, Xen Orchestra is the solution for managing your cluster, including all of its nodes. It runs on top of the hypervisor and can be installed either via an easy-to-use installer or manually by building it from source. When using the easy method, the vendor’s appliance image is used, which does not provide all features without a valid license. Therefore, it might be interesting to build it from scratch. There are also some container (Docker/Podman) images around. However, in the near future – with the upcoming version 8.3 of XCP-ng – a small management tool called XO Lite will be shipped out of the box, similar to Proxmox. You can find more information about it in my other blog post.
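
A from-source build roughly follows the steps below. This is only a sketch assuming Node.js and Yarn are already installed – the authoritative list of build dependencies and configuration steps is in the official “from the sources” documentation and in my linked how-to:

# Fetch and build Xen Orchestra from the sources
git clone https://github.com/vatesfr/xen-orchestra
cd xen-orchestra
yarn && yarn build

# Configure and start xo-server from its package directory afterwards
cd packages/xo-server
yarn start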

Orchestra’s interface is packed with features to simplify the management of XCP-ng environments and can simply be accessed via its IP address in a web browser after installation. It includes a dashboard that gives an overview of hosts, VMs, and storage statuses. VM management is streamlined with options to create, clone, start, stop, and migrate VMs, along with advanced features like snapshots and live migrations. Monitoring tools provide comprehensive data on CPU, memory, and network usage. The backup and restore features are extensive, supporting various backup types and scheduling. Storage management supports local storage, NFS, and iSCSI repositories. Access control allows for the creation of user roles and permissions, ensuring secure access. Network management is facilitated with tools for creating and configuring virtual networks. The update mechanism ensures both Xen Orchestra and XCP-ng hosts remain up-to-date. Plugins extend the interface’s functionality with features like automated backups, continuous replication, and alerting systems.

There are multiple ways to install Xen Orchestra on one of your XCP-ng nodes, and it is up to the operator to choose the preferred one. One of the easiest is to navigate to a node’s IP address via HTTPS (e.g. https://10.10.10.1). The “quick-deploy” option lets you install Orchestra directly on this node.

The wizard will ask for some more information, such as the username and password of the underlying host node as well as credentials for the newly created Xen Orchestra VM. An IP address can be defined statically or obtained via DHCP. As mentioned before, another approach is to build from the sources; this is covered in my how-to blog post.
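
As an alternative to the browser-based quick deploy, the same vendor appliance can be rolled out from the host’s console. A sketch of the commonly documented one-liner – verify the URL against the official documentation before piping it into a shell:

# Run from the Dom0 console of an XCP-ng host; deploys the XOA appliance VM
bash -c "$(wget -qO- https://xoa.io/deploy)"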

Good to know

XCP-ng is a bit more nitpicky about hardware than Proxmox. Working with cheap consumer hardware can cause trouble when building test or dev labs. We already saw this during the tests with the AM06 system, where we had to switch to other hardware to gather meaningful metrics. The same applies to peripheral devices like external network dongles. The installer might simply greet you with an error about unsupported hardware.

But even solving this issue does not mean that everything works out of the box. In this blog post I described how to use external network adapters for the management interface by creating udev rules and deleting the already existing PIF entry of the device. These issues also apply to other devices such as controllers. Therefore, it is important to validate compatibility against the Hardware Compatibility List (HCL) before installing or even buying new hardware.
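
The approach described in that post boils down to giving the external NIC a stable name and letting XAPI re-detect it. A condensed sketch – the MAC address, device name and UUIDs are placeholders; follow the linked post for the full procedure:

# Pin the USB NIC to a predictable interface name via udev
cat > /etc/udev/rules.d/70-persistent-net.rules <<'EOF'
SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="aa:bb:cc:dd:ee:ff", NAME="eth1"
EOF

# Drop the stale PIF record and let XAPI rescan the host's physical interfaces
xe pif-list params=uuid,device,MAC
xe pif-forget uuid=<stale-pif-uuid>
xe pif-scan host-uuid=<host-uuid>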

Also good to know: XCP-ng requires an IPv4 management interface. Running an IPv6-only management network is unfortunately not possible.

There are of course some limits; they are high, but should be kept in mind. Each XCP-ng node can have up to 6 TB of memory, 16 physical NICs, 512 logical CPUs, 512 virtual NICs and up to 800 VLANs.

LTS releases of XCP-ng are supported for five years; if you prefer the latest improvements, you can also use the latest standard release. The current 8.2 LTS release is supported until 2025-06-25.

Currently there isn’t any IPv6 support for the management interface. There were plans to integrate IPv6 for the management interface, but this has been postponed to XCP-ng 8.3.

Also pretty interesting: starting with version 8.3, XCP-ng will ship a tiny web interface for administration out of the box. XO Lite will provide basic management of the nodes. Currently, it’s still in beta and can be used with XCP-ng beta/RC versions.

The 2 TB disk limit in XCP-ng might also be annoying for many use cases; it is caused by the SMAPIv1 storage stack when VHD is used. You can still use the RAW format instead, but you’ll be losing all VHD features such as snapshots, delta backups, fast clones and live storage migration. The limitation will probably disappear in the future with SMAPIv3 – but that is still a long way off.
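
The RAW workaround can be done on the command line by creating a non-VHD VDI and attaching it to a VM. A hedged sketch based on the documented SMAPIv1 behaviour – the UUIDs are placeholders, and the snapshot/backup limitations mentioned above apply:

# Create a raw virtual disk larger than 2 TB on an existing SR
xe vdi-create sr-uuid=<sr-uuid> name-label="big-raw-disk" type=user \
  virtual-size=4TiB sm-config:type=raw

# Attach it to a VM as an additional disk and plug it in
xe vbd-create vm-uuid=<vm-uuid> vdi-uuid=<vdi-uuid> device=1 mode=RW type=Disk
xe vbd-plug uuid=<vbd-uuid>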

Using XCP-ng without a license also means running without vendor support. While smaller issues might be easy to fix, bigger ones can become serious. The community is still small, and on the web you rarely find tips for solving real-world problems, but you can still reach out to the community in the forums or via chat on Discord, IRC and Mattermost.

Conclusion

XCP-ng imposes stricter hardware compatibility requirements compared to Proxmox, often resulting in issues with consumer-grade hardware. Peripheral devices, such as external network dongles, may face compatibility challenges, necessitating verification against Xen’s Hardware Compatibility List (HCL). XCP-ng mandates an IPv4 management interface, as IPv6 support is currently unavailable, though planned for version 8.3. The 2 TB disk limit when using the VHD format, due to the SMAPIv1 storage stack, may also be an issue in some cases; it can be worked around, at the cost of losing the VHD format’s features.

The upcoming XCP-ng 8.3 release will introduce XO Lite, a basic web interface which is currently in beta (see also my blog post about XO Lite). Long-Term Support (LTS) releases are supported for five years, with the present 8.2 LTS release receiving updates until June 25, 2025. So if you are evaluating XCP-ng, this might be the perfect time to get ready for the upcoming LTS release.

Also worth considering: the smaller XCP-ng community may complicate troubleshooting due to the absence of vendor support without a license. However, community support is available through forums and various chat platforms. Proxmox, leveraging a larger user base and a regular Linux kernel architecture, offers extensive community resources and frequent security updates, despite a potentially larger attack surface due to its monolithic kernel architecture.

In the end, there is no right or wrong, and XCP-ng is definitely a great option. With the upcoming version, many things that might look annoying today will be handled in a different and better way, which will be very welcome for new users.

My personal benefit of using XCP-ng over Proxmox is a smoother way of upgrading the host nodes – how often has something broken in Proxmox after an upgrade and needed to be fixed? I also love that the number of nodes in a cluster scales much better than in Proxmox with Corosync and pmxcfs, which is also very nitpicky about latencies; with XCP-ng it isn’t a problem to manage nodes that are not nearby. On the other hand, I still use Proxmox for environments with a mixed usage of VMs and containers (CTs), since there isn’t any integrated container support in XCP-ng (sure, that’s a completely different approach). Features we already knew from VMware ESXi, like DRS, can also play an important role in enterprises: in XCP-ng / Xen Orchestra we need a license to use this feature, while for Proxmox there is a free third-party alternative with ProxLB. In the end, it’s up to your use case to choose the suitable solution. If this was interesting to you, you might also want to have a look at Harvester, a more modern, Kubernetes-based alternative to Proxmox.
