bhyve on FreeBSD and VM Live Migration – Quo vadis?
bhyve live migration is a topic I encounter almost daily in my consulting calls. VMware’s licensing turmoil under Broadcom has been a frequent subject, even as we approach the end of 2024. It’s surprising that many customers still feel uncertain about how to navigate this mess. While VMware has been a mainstay in enterprise environments for years, these ongoing issues make customers nervous. And they should be: it’s hard to rely on something when even the licensing situation feels volatile.
Now, as much as I’m a die-hard FreeBSD fan, I have to admit that FreeBSD still falls short when it comes to virtualization – at least from an enterprise perspective. In these environments, it’s not just about running a VM; it’s about having the flexibility and capabilities to manage workloads without interruption. Years ago, open-source solutions like KVM (e.g., Proxmox) and Xen (e.g., XCP-ng) introduced features like live migration, where you can move VMs between hosts with zero downtime. Even more recently, solutions like SUSE Harvester (utilizing KubeVirt for running VMs) have shown that this is now an essential part of any virtualization ecosystem.
Whenever I talk to customers about possible alternatives to VMware, I find myself saying, “There’s nothing that can replace VMware one-to-one!” Unfortunately, that’s the truth. That’s why it’s so important to understand their requirements fully before suggesting any solutions. Nine times out of ten, live migration is an important or even critical requirement. And as soon as we get to that point, it becomes clear that FreeBSD just isn’t a realistic option for most enterprises right now, at least not for virtualization environments with enterprise requirements.
Sure, I’ve been able to migrate a few systems to FreeBSD-based solutions, but these have always been single-node setups. That’s fine for some use cases, but in today’s enterprise world, the idea of telling a customer they need to shut down their VMs just to perform maintenance on the host is absurd. Downtime, even for short periods, is unacceptable. Even if you copy a VM’s disk to another host while it is running, then shut it down and sync only the deltas before bringing it back online, the downtime is still too long by 2024 standards. All established connections would drop. Imagine dropping VPN connections, (huge) file transfers and all other important sessions. That’s simply not a viable solution.
We are finally starting to see progress with bhyve and live migration in the FreeBSD world, and it’s a huge relief. For a long time, I’ve felt that we needed to shift away from the mindset of “it fits for me.” Sure, FreeBSD works great for those of us who are die-hard fans, but the truth is, there are millions of people out there with different requirements, especially in enterprise environments. We can’t just focus on what works for us personally. If we want FreeBSD to grow and become a serious competitor to Linux-based virtualization solutions, we need to think bigger. I’ve already written a blog post about this, sharing my thoughts on how we can make FreeBSD more attractive to new users and meet the demands of modern IT infrastructure.
Luckily, there are some really talented people working to address these gaps. Elena Mihăilescu, Mihai Carabaș, Oleg Ginzburg, and Oleg Minin (the latter two through cbsd’s live migration and ClonOS) are tackling the tough parts of making live migration a reality for FreeBSD. Elena and Mihai presented their work at BSDCan 2019 in Ottawa, and it gives us a clearer picture of how things are evolving. Their paper provides insights into the concepts behind moving a guest from one host to another, covering cold, warm, and live migrations.
Live migration, of course, is the most critical for enterprises. It allows you to move a running VM from one host to another without noticeable downtime. It works by migrating the memory in rounds while the guest is still running. In the final round, the source VM is stopped, any remaining memory is copied over along with the CPU and device state, and the VM is then resumed on the destination host. It’s a delicate process, and memory migration is always the most complex part, requiring multiple rounds to ensure a smooth transition.
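To make the round-based idea more tangible, here is a minimal toy simulation in Python. It is not bhyve code and does not reflect how bhyve tracks dirty pages internally. The function name migrate_precopy, the guest_writes parameter and the threshold are all made up for illustration. It only shows the general pattern: copy everything once, keep re-sending pages the guest has dirtied in the meantime, and stop the guest for a short final round once few enough dirty pages remain.

```python
def migrate_precopy(src, guest_writes, max_rounds=4, threshold=1):
    """Toy model of round-based (pre-copy) memory migration.

    src          -- list of page contents on the source host (mutated by the guest)
    guest_writes -- hypothetical input: guest_writes[i] is a list of (page, value)
                    writes the still-running guest performs during round i
    Returns (dst, log): the destination memory and the pages sent per round.
    """
    dst = [None] * len(src)
    dirty = set(range(len(src)))  # first round: every page still has to be sent
    log = []

    for round_no in range(max_rounds):
        to_send = sorted(dirty)
        dirty = set()
        for page in to_send:
            dst[page] = src[page]  # "send" the page to the destination

        # Simplification: the guest's writes are applied after the copy of this
        # round; every written page becomes dirty and must be sent again.
        writes = guest_writes[round_no] if round_no < len(guest_writes) else []
        for page, value in writes:
            src[page] = value
            dirty.add(page)

        log.append(len(to_send))
        if len(dirty) <= threshold:
            break  # few enough dirty pages left: go to the final round

    # Final round: the guest is stopped, so nothing gets dirtied any more.
    # A real implementation would also transfer CPU and device state here.
    for page in sorted(dirty):
        dst[page] = src[page]
    log.append(len(dirty))
    return dst, log


if __name__ == "__main__":
    memory = list(range(8))
    writes = [[(1, "a"), (2, "b"), (3, "c")], [(2, "d")]]
    dst, log = migrate_precopy(memory, writes)
    print("pages sent per round:", log)
    print("destination matches source:", dst == memory)
```

The time the guest is actually stopped depends only on the size of the last round, which is why a guest that dirties memory faster than it can be sent is the hard case for this approach.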
One of the techniques they use is a page fault-based approach. And here’s how it works: first, the source VM is stopped and its CPU and device state are copied to the destination host. Then, the VM is started on the new host, but instead of copying all the memory at once, the system waits for a page fault to occur. When a page fault happens, that memory page is copied over at that moment. This method helps reduce the downtime and keeps the VM running as smoothly as possible during the migration.
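The page fault-based idea can be sketched in the same toy style. Again, this is only an illustration under simplifying assumptions, not bhyve’s implementation: the class PostCopyGuest and its methods are hypothetical, memory is modelled as a dictionary, and fetching a page over the network is reduced to a dictionary lookup on the source.

```python
class PostCopyGuest:
    """Toy model of a guest already running on the destination host
    while most of its memory still lives on the source host."""

    def __init__(self, source_pages):
        self.source = source_pages  # memory that is still on the source host
        self.local = {}             # pages already present on the destination
        self.faults = 0             # how many pages had to be fetched on demand

    def _ensure_present(self, page):
        # Simulated page fault: the page is not on the destination yet,
        # so it is fetched from the source at the moment of first access.
        if page not in self.local:
            self.faults += 1
            self.local[page] = self.source[page]

    def read(self, page):
        self._ensure_present(page)
        return self.local[page]

    def write(self, page, value):
        # Pages are transferred as a whole, so a write to a missing page
        # faults it in first and then modifies the local copy.
        self._ensure_present(page)
        self.local[page] = value


if __name__ == "__main__":
    guest = PostCopyGuest({0: "kernel", 1: "heap", 2: "stack"})
    guest.read(1)           # first access: page is fetched from the source
    guest.read(1)           # second access: page is already local
    guest.write(2, "new")   # faults the page in, then overwrites it locally
    print("faults:", guest.faults)
    print("pages on destination:", sorted(guest.local))
```

The trade-off compared with copying in rounds is visible even in this sketch: the guest resumes almost immediately, but every first access to a page pays the cost of a transfer, and the guest depends on the source host until all pages have arrived.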
Like with any live migration technology, the memory migration itself is the trickiest part. Doing it efficiently and without interrupting the running system requires careful orchestration, which is why it happens in multiple rounds. But with this ongoing work, we’re starting to see real progress, and it’s a huge step toward making FreeBSD a serious contender in the virtualization space.
Usage
So how can this be used? Live migration is triggered with the newly introduced bhyvectl option --migrate-live, and the workflow is straightforward:
# Run a VM
bhyve <options> vm_src
# Receive VM to be migrated
bhyve <options> -R src_IP,port vm_dst
# Send/Start VM to migrate
bhyvectl --migrate-live=dst_IP,port vm_src
Limitations
As exciting as the new developments in bhyve and live migration are, we have to be realistic about the current limitations. From my own perspective, while we’ve made progress, there are still several challenges that need to be addressed before FreeBSD and bhyve can truly compete in the enterprise virtualization space. The overall progress is not comparable with other solutions, and there has been little public activity around this topic lately.
One of the biggest constraints is guest memory size. Live migration originally only worked for the low memory segment, specifically for VMs with less than 3GB of RAM. While this has improved to ~14GB according to AsiaBSDCon 2023, it is still a major limitation for enterprise workloads.
Another limitation is ongoing memory corruption, which seems to be tied to bhyve’s handling of threads for network and disk operations during live migration. This severely impacts reliability in production environments.
Additionally, live migration was only possible with wired memory. While this ensures consistency, it limits flexibility and scalability in dynamic environments. Progress has been made here as well, again highlighted in the AsiaBSDCon 2023 papers.
I’m optimistic because these advancements are exactly what FreeBSD needs to become more appealing to a broader audience. We’re seeing real, actionable steps toward closing the gap with KVM and VMware.
A great example of how this could unfold is what we’ve seen with Podman running on FreeBSD. The same could happen with bhyve as live migration matures. Projects like ClonOS or BVCP (bhyve-webadmin) already point in the right direction. This broader engagement would be crucial for FreeBSD’s growth.