NFSv3 vs NFSv4 Storage on Proxmox: The Latency Clash That Reveals More Than You Think (2025-07-04)
When it comes to virtualization, many people still think that NFS isn’t suitable for serious workloads in their enterprise environment and that you need to rely on protocols like iSCSI or Fibre Channel to get proper performance. That mindset might have made sense years ago, but times have changed. Today, we have access to incredibly fast networks, and not only in the enterprise but even at home. It’s not uncommon to see 10 Gbit networking in home labs, and enterprises are already moving to 25, 40, 100, or even 400 Gbit infrastructure. So the bottleneck is no longer bandwidth; it’s the protocol overhead and the hardware interaction that really matter.

NFS, despite being around for decades, is often underestimated. Many still think of it as a basic file-sharing protocol, not realizing how far it has come and how capable it is when properly configured and used with the latest versions. Especially in virtualized environments like Proxmox, NFS can be a powerful and flexible storage backend that scales very well if set up correctly. The problem is that people often judge NFS based on outdated assumptions or suboptimal configurations, usually running older versions like NFSv3 without taking advantage of what newer versions offer.

In this post, we’re focusing on the differences between NFSv3 and NFSv4, especially when it comes to latency. While high throughput can be achieved by running multiple VMs in parallel or doing sequential reads, latency is a whole different challenge. And latency is becoming more and more important. Whether you’re running databases, machine learning workloads, or any other latency-sensitive applications, it’s no longer just about how fast you can push and pull data in bulk – it’s about how quickly the system responds.

That’s where NFSv4 really shines. Compared to NFSv3, it brings major improvements not just in functionality but also in how efficiently it handles operations. Features like nconnect, which allows multiple TCP connections per mount, and pNFS (Parallel NFS), which enables direct data access from clients to storage nodes, provide serious performance gains. On top of that, NFSv4 has better locking mechanisms, improved security, a stateful protocol design, and more efficient metadata handling, all of which reduce round-trip times and overall latency. One major issue in such setups often lies in the default behavior: many people simply use NFSv3 and then complain about performance that is indeed often not that good. Therefore, you should take the time to properly configure NFSv4 in your environment.
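A quick way to find out which version a node has actually negotiated for an existing mount is to look at the mount options on the Proxmox host; the mount point below is only an example:

# On the Proxmox node: show NFS mounts together with their negotiated options
nfsstat -m

# Or check a single mount point (path is an example)
mount | grep /mnt/pve

Look for "vers=3" vs "vers=4.2" (and "nconnect=" if configured) in the output.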

Latencies by Storage-Type
Before we start our tests on NFSv3 and NFSv4, we should have a rough idea of what kind of latencies to expect.
Latency overview of different storage types

NFSv4 Advantages
NFSv4 offers major improvements over NFSv3, including:
o nconnect (multiple TCP connections per mount)
o pNFS (Parallel NFS for direct data access)
o Improved locking, security, and stateful protocol
o Efficient metadata handling for lower latency

Proper configuration of NFSv4 is key to achieving low latency and high throughput, especially for databases, ML workloads, or other latency-sensitive applications.
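On Proxmox, the relevant knob is the options line of the NFS storage definition in /etc/pve/storage.cfg. The entry below is a minimal sketch; the storage name, server address, and export path are placeholders, and nconnect requires a reasonably recent kernel:

nfs: nfs-fast
        export /tank/vmdata
        path /mnt/pve/nfs-fast
        server 192.168.1.50
        content images,rootdir
        options vers=4.2,nconnect=4

Pinning vers=4.2 explicitly also makes it obvious in nfsstat -m which version is actually in use.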

Test Setup
VM:
o OS: minimal Debian
o vCPU: 4
o RAM: 4GB
o Disk: VirtIO SCSI Single
o Network: 2x2.5Gbit

Proxmox Node:
o System: Ace Magician AM06 Pro
o OS: Proxmox 8.4.1
o CPU: AMD Ryzen 5 5500U
o Memory: 2x32GB DDR4
o Network: 2x Intel I226-V 2.5Gbit

Storage Server:
o System: GMKtec G9 NAS
o OS: FreeBSD 14.3
o CPU: Intel N150
o Memory: 12GB DDR5
o Disks: 2x WD Black SN7100 NVMe, ZFS Mirror
o Network: 2x Intel I226-V 2.5Gbit
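For completeness, the FreeBSD side of such a setup is fairly small. The snippet below is a minimal sketch rather than the exact configuration used here; the ZFS dataset and the network range are placeholders:

# /etc/rc.conf - enable the NFS server with NFSv4 support
rpcbind_enable="YES"
nfs_server_enable="YES"
nfsv4_server_enable="YES"
mountd_enable="YES"
nfsuserd_enable="YES"

# /etc/exports - export a ZFS dataset and declare the NFSv4 root
/tank/vmdata -maproot=root -network 192.168.1.0/24
V4: /tank -sec=sys

# apply the configuration
service nfsd restart
service mountd reload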

NFSv3 Random Read Test

fio --rw=randread --name=IOPS-read --bs=4k --direct=1 --filename=/dev/sda \
    --numjobs=1 --ioengine=libaio --iodepth=1 --refill_buffers --group_reporting \
    --runtime=60 --time_based

IOPS-read: (groupid=0, jobs=1): read: IOPS=1641, BW=6566KiB/s, lat avg=607.07µs
clat percentiles: 50% = 644µs, 95% = 758µs, 99.99% = 1942µs
Disk utilization: 96.42%

NFSv4 Random Read Test

fio --rw=randread --name=IOPS-read --bs=4k --direct=1 --filename=/dev/sda \
    --numjobs=1 --ioengine=libaio --iodepth=1 --refill_buffers --group_reporting \
    --runtime=60 --time_based

IOPS-read: (groupid=0, jobs=1): read: IOPS=2377, BW=9509KiB/s, lat avg=419.05µs
clat percentiles: 50% = 379µs, 95% = 545µs, 99.99% = 1680µs
Disk utilization: 95.84%
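Both runs above are deliberately pinned to a queue depth of 1 with a single job, because that is what exposes per-request latency. If you instead want to see how the same backend behaves when pushed for throughput, a variant along these lines (parameters are purely illustrative) scales up the block size, queue depth, and job count:

fio --rw=read --name=throughput-read --bs=128k --direct=1 --filename=/dev/sda \
    --numjobs=4 --ioengine=libaio --iodepth=32 --refill_buffers --group_reporting \
    --runtime=60 --time_based

Expect the bandwidth numbers to climb well beyond the figures above while per-request latency rises; that is exactly the trade-off discussed in the introduction.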

Latency Comparison

Metric                  NFSv3               NFSv4.2            Notes
-------------------------------------------------------------------------------------------
IOPS (avg)              1641                2377               ~45% higher for NFSv4.2
Bandwidth (avg)         6566 KiB/s          9509 KiB/s         ~45% higher throughput
Submit Latency (slat)   9.5µs               7.44µs             Lower slat with NFSv4.2
Completion Latency      597.54µs            411.62µs           ~31% lower for NFSv4.2
Total Latency           607.07µs            419.05µs           Overall ~31% faster
Latency Distribution    67% in 750-1000µs   70% under 500µs    Tighter latency with NFSv4.2
Max Completion Latency  4319µs              3057µs             ~30% lower
CPU Usage (sys%)        2.85%               3.37%              Slightly higher for NFSv4.2
Disk Utilization        96.42%              95.84%             Similar, disk-bound

The comparison between NFSv3 and NFSv4.2 clearly shows that NFSv4.2 delivers better performance in terms of both throughput and latency. The IOPS and bandwidth achieved with NFSv4.2 are about 45% higher, which reflects significant improvements likely due to advancements in the protocol such as enhanced caching, more efficient operations, and improved integration with the network stack. Latency also sees a notable reduction, with the average completion latency dropping from around 598 microseconds to about 412 microseconds, resulting in quicker response times for random input/output workloads.
NFSv3 vs NFSv4
Furthermore, the latency distribution under NFSv4.2 is more consistent and generally lower, with most requests completing in under 500 microseconds, whereas NFSv3 has a wider range with many requests falling between 750 and 1000 microseconds. While CPU usage is a bit higher for NFSv4.2, likely due to the additional features and kernel activity, this trade-off is common when achieving better performance and lower latency. Both setups show very high disk utilization, indicating that the tests are pushing the disk subsystem to its limits in either case.

Conclusion
NFSv4.2 clearly outperforms NFSv3, offering ~45% higher IOPS and throughput, with lower and more consistent latency. CPU usage is slightly higher, but the trade-off is worth it. Both setups saturate the disk, so the real improvement comes from protocol efficiency. Switching to NVMe-oF or SPDK can push performance even further, but NFSv4 remains an easy-to-deploy, Proxmox-integrated solution for reliable latency and bandwidth.