Message ID: 20250418142436.6121-1-nirsof@gmail.com (mailing list archive)
State: New
Series: io: Set unix socket buffers on macOS
Hi Nir,

On 18/4/25 16:24, Nir Soffer wrote:
> Testing with qemu-nbd shows that computing a hash of an image via
> qemu-nbd is 5-7 times faster with this change.
>
> Tested with 2 qemu-nbd processes:
>
> $ ./qemu-nbd-after -r -t -e 0 -f raw -k /tmp/after.sock /var/tmp/bench/data-10g.img &
> $ ./qemu-nbd-before -r -t -e 0 -f raw -k /tmp/before.sock /var/tmp/bench/data-10g.img &
>
> With nbdcopy, using 4 NBD connections:
>
> $ hyperfine -w 3 "./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/before.sock' null:" \
>     "./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/after.sock' null:"
> Benchmark 1: ./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/before.sock' null:
>   Time (mean ± σ):      8.670 s ±  0.025 s    [User: 5.670 s, System: 7.113 s]
>   Range (min … max):    8.620 s …  8.703 s    10 runs
>
> Benchmark 2: ./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/after.sock' null:
>   Time (mean ± σ):      1.839 s ±  0.008 s    [User: 4.651 s, System: 1.882 s]
>   Range (min … max):    1.830 s …  1.853 s    10 runs
>
> Summary
>   ./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/after.sock' null: ran
>     4.72 ± 0.02 times faster than ./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/before.sock' null:
>
> With blksum, using one NBD connection:
>
> $ hyperfine -w 3 "blksum 'nbd+unix:///?socket=/tmp/before.sock'" \
>     "blksum 'nbd+unix:///?socket=/tmp/after.sock'"
> Benchmark 1: blksum 'nbd+unix:///?socket=/tmp/before.sock'
>   Time (mean ± σ):     13.606 s ±  0.081 s    [User: 5.799 s, System: 6.231 s]
>   Range (min … max):   13.516 s … 13.785 s    10 runs
>
> Benchmark 2: blksum 'nbd+unix:///?socket=/tmp/after.sock'
>   Time (mean ± σ):      1.946 s ±  0.017 s    [User: 4.541 s, System: 1.481 s]
>   Range (min … max):    1.912 s …  1.979 s    10 runs
>
> Summary
>   blksum 'nbd+unix:///?socket=/tmp/after.sock' ran
>     6.99 ± 0.07 times faster than blksum 'nbd+unix:///?socket=/tmp/before.sock'
>
> This will improve other usage of unix domain sockets on macOS, I tested
> only qemu-nbd.
>
> Signed-off-by: Nir Soffer <nirsof@gmail.com>
> ---
>  io/channel-socket.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/io/channel-socket.c b/io/channel-socket.c
> index 608bcf066e..b858659764 100644
> --- a/io/channel-socket.c
> +++ b/io/channel-socket.c
> @@ -410,6 +410,19 @@ qio_channel_socket_accept(QIOChannelSocket *ioc,
>      }
>  #endif /* WIN32 */
>
> +#if __APPLE__
> +    /* On macOS we need to tune unix domain socket buffer for best performance.
> +     * Apple recommends sizing the receive buffer at 4 times the size of the
> +     * send buffer.
> +     */
> +    if (cioc->localAddr.ss_family == AF_UNIX) {
> +        const int sndbuf_size = 1024 * 1024;

Please add a definition instead of magic value, i.e.:

#define SOCKET_SEND_BUFSIZE (1 * MiB)

BTW in test_io_channel_set_socket_bufs() we use 64 KiB, why 1 MiB?

> +        const int rcvbuf_size = 4 * sndbuf_size;
> +        setsockopt(cioc->fd, SOL_SOCKET, SO_SNDBUF, &sndbuf_size, sizeof(sndbuf_size));
> +        setsockopt(cioc->fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf_size, sizeof(rcvbuf_size));
> +    }
> +#endif /* __APPLE__ */

Thanks,

Phil.
On Fri, Apr 18, 2025 at 05:24:36PM +0300, Nir Soffer wrote:
> Testing with qemu-nbd shows that computing a hash of an image via
> qemu-nbd is 5-7 times faster with this change.
>
> +++ b/io/channel-socket.c
> @@ -410,6 +410,19 @@ qio_channel_socket_accept(QIOChannelSocket *ioc,
>      }
>  #endif /* WIN32 */
>
> +#if __APPLE__
> +    /* On macOS we need to tune unix domain socket buffer for best performance.
> +     * Apple recommends sizing the receive buffer at 4 times the size of the
> +     * send buffer.
> +     */
> +    if (cioc->localAddr.ss_family == AF_UNIX) {
> +        const int sndbuf_size = 1024 * 1024;
> +        const int rcvbuf_size = 4 * sndbuf_size;
> +        setsockopt(cioc->fd, SOL_SOCKET, SO_SNDBUF, &sndbuf_size, sizeof(sndbuf_size));
> +        setsockopt(cioc->fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf_size, sizeof(rcvbuf_size));
> +    }
> +#endif /* __APPLE__ */

Why does this have to be limited? On Linux, 'man 7 unix' documents
that SO_SNDBUF is honored (SO_RCVBUF is silently ignored but accepted
for compatibility). On the other hand, 'man 7 socket' states that it
defaults to the value in /proc/sys/net/core/wmem_default (212992 on my
machine) and cannot exceed the value in /proc/sys/net/core/wmem_max
without CAP_NET_ADMIN privileges (also 212992 on my machine).

Of course, Linux and macOS are different kernels, so your effort to
set it to 1M may actually be working on Apple rather than being
silently cut back to the enforced maximum. And the fact that raising
it at all makes a difference merely says that unlike Linux (where the
default appears to already be as large as possible), Apple is set up
to default to a smaller buffer (more fragmentation requires more
time), and bumping to the larger value improves performance. But can
you use getsockopt() prior to your setsockopt() to see what value
Apple was defaulting to, and then again afterwards to see whether it
actually got as large as you suggested?
> On 18 Apr 2025, at 17:50, Philippe Mathieu-Daudé <philmd@linaro.org> wrote:
>
> Hi Nir,
>
> On 18/4/25 16:24, Nir Soffer wrote:
>> Testing with qemu-nbd shows that computing a hash of an image via
>> qemu-nbd is 5-7 times faster with this change.

[...]

>> +#if __APPLE__
>> +    /* On macOS we need to tune unix domain socket buffer for best performance.
>> +     * Apple recommends sizing the receive buffer at 4 times the size of the
>> +     * send buffer.
>> +     */
>> +    if (cioc->localAddr.ss_family == AF_UNIX) {
>> +        const int sndbuf_size = 1024 * 1024;
>
> Please add a definition instead of magic value, i.e.:
>
> #define SOCKET_SEND_BUFSIZE (1 * MiB)

Using 1 * MiB is nicer. Not sure about the "magic" value; do you mean

    #define SOCKET_SEND_BUFSIZE (1 * MiB)

at the top of the file, or

    const int sndbuf_size = 1 * MiB;

near the definition? If we want it at the top of the file the name may
be confusing since this is used only for macOS and for unix sockets.
We can have:

    #define MACOS_UNIX_SOCKET_SEND_BUFSIZE (1 * MiB)

Or maybe:

    #if __APPLE__
    #define UNIX_SOCKET_SEND_BUFSIZE (1 * MiB)
    #endif

But we use this in one function so I'm not sure it helps. In
vmnet-helper I'm using this in 2 places so it moved to config.h:
https://github.com/nirs/vmnet-helper/blob/main/config.h.in

> BTW in test_io_channel_set_socket_bufs() we use 64 KiB, why 1 MiB?

This test uses a small buffer size so we can see the effect of partial
reads/writes.

I'm trying to improve throughput when reading image data with
qemu-nbd. This will likely also improve qemu-storage-daemon and the
qemu builtin NBD server, but I did not test them.

I did some benchmarks with send buffer sizes from 64k to 2m, showing
that 1m gives the best performance. Running one qemu-nbd process with
each configuration:

% ps ...
18850 ttys013    2:01.78 ./qemu-nbd-64k -r -t -e 0 -f raw -k /tmp/64k.sock /Users/nir/bench/data-10g.img
18871 ttys013    1:53.49 ./qemu-nbd-128k -r -t -e 0 -f raw -k /tmp/128k.sock /Users/nir/bench/data-10g.img
18877 ttys013    1:47.95 ./qemu-nbd-256k -r -t -e 0 -f raw -k /tmp/256k.sock /Users/nir/bench/data-10g.img
18885 ttys013    1:52.06 ./qemu-nbd-512k -r -t -e 0 -f raw -k /tmp/512k.sock /Users/nir/bench/data-10g.img
18894 ttys013    2:02.34 ./qemu-nbd-1m -r -t -e 0 -f raw -k /tmp/1m.sock /Users/nir/bench/data-10g.img
22918 ttys013    0:00.02 ./qemu-nbd-2m -r -t -e 0 -f raw -k /tmp/2m.sock /Users/nir/bench/data-10g.img

% hyperfine -w 3 "./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/64k.sock' null:" \
    "./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/128k.sock' null:" \
    "./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/256k.sock' null:" \
    "./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/512k.sock' null:" \
    "./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/1m.sock' null:" \
    "./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/2m.sock' null:"
Benchmark 1: ./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/64k.sock' null:
  Time (mean ± σ):      2.760 s ±  0.014 s    [User: 4.871 s, System: 2.576 s]
  Range (min … max):    2.736 s …  2.788 s    10 runs

Benchmark 2: ./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/128k.sock' null:
  Time (mean ± σ):      2.284 s ±  0.006 s    [User: 4.774 s, System: 2.044 s]
  Range (min … max):    2.275 s …  2.294 s    10 runs

Benchmark 3: ./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/256k.sock' null:
  Time (mean ± σ):      2.036 s ±  0.010 s    [User: 4.734 s, System: 1.822 s]
  Range (min … max):    2.021 s …  2.052 s    10 runs

Benchmark 4: ./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/512k.sock' null:
  Time (mean ± σ):      1.763 s ±  0.005 s    [User: 4.637 s, System: 1.801 s]
  Range (min … max):    1.755 s …  1.771 s    10 runs

Benchmark 5: ./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/1m.sock' null:
  Time (mean ± σ):      1.653 s ±  0.012 s    [User: 4.568 s, System: 1.818 s]
  Range (min … max):    1.636 s …  1.683 s    10 runs

Benchmark 6: ./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/2m.sock' null:
  Time (mean ± σ):      1.802 s ±  0.052 s    [User: 4.573 s, System: 1.918 s]
  Range (min … max):    1.736 s …  1.896 s    10 runs

Summary
  ./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/1m.sock' null: ran
    1.07 ± 0.01 times faster than ./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/512k.sock' null:
    1.09 ± 0.03 times faster than ./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/2m.sock' null:
    1.23 ± 0.01 times faster than ./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/256k.sock' null:
    1.38 ± 0.01 times faster than ./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/128k.sock' null:
    1.67 ± 0.02 times faster than ./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/64k.sock' null:

I can add a compact table showing the results in a comment, or add the
test output to the commit message for reference.

>> +        const int rcvbuf_size = 4 * sndbuf_size;
>> +        setsockopt(cioc->fd, SOL_SOCKET, SO_SNDBUF, &sndbuf_size, sizeof(sndbuf_size));
>> +        setsockopt(cioc->fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf_size, sizeof(rcvbuf_size));
>> +    }
>> +#endif /* __APPLE__ */
>
> Thanks,
>
> Phil.
This should be changed also on the client side. The libnbd part is here:
https://gitlab.com/nbdkit/libnbd/-/merge_requests/21

We may want to change also the nbd client code used in qemu-img. I can
look at this later.

> On 18 Apr 2025, at 17:24, Nir Soffer <nirsof@gmail.com> wrote:
>
> Testing with qemu-nbd shows that computing a hash of an image via
> qemu-nbd is 5-7 times faster with this change.

[...]

> +#if __APPLE__
> +    /* On macOS we need to tune unix domain socket buffer for best performance.
> +     * Apple recommends sizing the receive buffer at 4 times the size of the
> +     * send buffer.
> +     */
> +    if (cioc->localAddr.ss_family == AF_UNIX) {
> +        const int sndbuf_size = 1024 * 1024;
> +        const int rcvbuf_size = 4 * sndbuf_size;
> +        setsockopt(cioc->fd, SOL_SOCKET, SO_SNDBUF, &sndbuf_size, sizeof(sndbuf_size));
> +        setsockopt(cioc->fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf_size, sizeof(rcvbuf_size));
> +    }
> +#endif /* __APPLE__ */
> +
>      qio_channel_set_feature(QIO_CHANNEL(cioc),
>                              QIO_CHANNEL_FEATURE_READ_MSG_PEEK);
>
> --
> 2.39.5 (Apple Git-154)
> On 18 Apr 2025, at 21:55, Eric Blake <eblake@redhat.com> wrote:
>
> On Fri, Apr 18, 2025 at 05:24:36PM +0300, Nir Soffer wrote:
>> Testing with qemu-nbd shows that computing a hash of an image via
>> qemu-nbd is 5-7 times faster with this change.

[...]

>> +#if __APPLE__
>> +    /* On macOS we need to tune unix domain socket buffer for best performance.
>> +     * Apple recommends sizing the receive buffer at 4 times the size of the
>> +     * send buffer.
>> +     */
>> +    if (cioc->localAddr.ss_family == AF_UNIX) {
>> +        const int sndbuf_size = 1024 * 1024;
>> +        const int rcvbuf_size = 4 * sndbuf_size;
>> +        setsockopt(cioc->fd, SOL_SOCKET, SO_SNDBUF, &sndbuf_size, sizeof(sndbuf_size));
>> +        setsockopt(cioc->fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf_size, sizeof(rcvbuf_size));
>> +    }
>> +#endif /* __APPLE__ */
>
> Why does this have to be limited? On Linux, 'man 7 unix' documents
> that SO_SNDBUF is honored (SO_RCVBUF is silently ignored but accepted
> for compatibility). On the other hand, 'man 7 socket' states that it
> defaults to the value in /proc/sys/net/core/wmem_default (212992 on my
> machine) and cannot exceed the value in /proc/sys/net/core/wmem_max
> without CAP_NET_ADMIN privileges (also 212992 on my machine).
>
> Of course, Linux and macOS are different kernels, so your effort to
> set it to 1M may actually be working on Apple rather than being
> silently cut back to the enforced maximum.

Testing shows that values up to a 2m send buffer and 8m receive buffer
still change the performance, so they are not silently clipped.

> And the fact that raising it at all makes a difference merely says
> that unlike Linux (where the default appears to already be as large
> as possible), Apple is set up to default to a smaller buffer (more
> fragmentation requires more time), and bumping to the larger value
> improves performance. But can you use getsockopt() prior to your
> setsockopt() to see what value Apple was defaulting to, and then
> again afterwards to see whether it actually got as large as you
> suggested?

Sure, tested with:

diff --git a/io/channel-socket.c b/io/channel-socket.c
index b858659764..9600a076be 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -418,8 +418,21 @@ qio_channel_socket_accept(QIOChannelSocket *ioc,
     if (cioc->localAddr.ss_family == AF_UNIX) {
         const int sndbuf_size = 1024 * 1024;
         const int rcvbuf_size = 4 * sndbuf_size;
+        int value;
+        socklen_t value_size = sizeof(value);
+
+        getsockopt(cioc->fd, SOL_SOCKET, SO_SNDBUF, &value, &value_size);
+        fprintf(stderr, "before: send buffer size: %d\n", value);
+        getsockopt(cioc->fd, SOL_SOCKET, SO_RCVBUF, &value, &value_size);
+        fprintf(stderr, "before: recv buffer size: %d\n", value);
+
         setsockopt(cioc->fd, SOL_SOCKET, SO_SNDBUF, &sndbuf_size, sizeof(sndbuf_size));
         setsockopt(cioc->fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf_size, sizeof(rcvbuf_size));
+
+        getsockopt(cioc->fd, SOL_SOCKET, SO_SNDBUF, &value, &value_size);
+        fprintf(stderr, "after: send buffer size: %d\n", value);
+        getsockopt(cioc->fd, SOL_SOCKET, SO_RCVBUF, &value, &value_size);
+        fprintf(stderr, "after: recv buffer size: %d\n", value);
     }
 #endif /* __APPLE__ */

With 1m send buffer:

% ./qemu-nbd -r -t -e 0 -f raw -k /tmp/nbd.sock ~/bench/data-10g.img
before: send buffer size: 8192
before: recv buffer size: 8192
after: send buffer size: 1048576
after: recv buffer size: 4194304

With 2m send buffer:

% ./qemu-nbd -r -t -e 0 -f raw -k /tmp/nbd.sock ~/bench/data-10g.img
before: send buffer size: 8192
before: recv buffer size: 8192
after: send buffer size: 2097152
after: recv buffer size: 8388608
diff --git a/io/channel-socket.c b/io/channel-socket.c
index 608bcf066e..b858659764 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -410,6 +410,19 @@ qio_channel_socket_accept(QIOChannelSocket *ioc,
     }
 #endif /* WIN32 */
 
+#if __APPLE__
+    /* On macOS we need to tune unix domain socket buffer for best performance.
+     * Apple recommends sizing the receive buffer at 4 times the size of the
+     * send buffer.
+     */
+    if (cioc->localAddr.ss_family == AF_UNIX) {
+        const int sndbuf_size = 1024 * 1024;
+        const int rcvbuf_size = 4 * sndbuf_size;
+        setsockopt(cioc->fd, SOL_SOCKET, SO_SNDBUF, &sndbuf_size, sizeof(sndbuf_size));
+        setsockopt(cioc->fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf_size, sizeof(rcvbuf_size));
+    }
+#endif /* __APPLE__ */
+
     qio_channel_set_feature(QIO_CHANNEL(cioc),
                             QIO_CHANNEL_FEATURE_READ_MSG_PEEK);
Testing with qemu-nbd shows that computing a hash of an image via
qemu-nbd is 5-7 times faster with this change.

Tested with 2 qemu-nbd processes:

$ ./qemu-nbd-after -r -t -e 0 -f raw -k /tmp/after.sock /var/tmp/bench/data-10g.img &
$ ./qemu-nbd-before -r -t -e 0 -f raw -k /tmp/before.sock /var/tmp/bench/data-10g.img &

With nbdcopy, using 4 NBD connections:

$ hyperfine -w 3 "./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/before.sock' null:" \
    "./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/after.sock' null:"
Benchmark 1: ./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/before.sock' null:
  Time (mean ± σ):      8.670 s ±  0.025 s    [User: 5.670 s, System: 7.113 s]
  Range (min … max):    8.620 s …  8.703 s    10 runs

Benchmark 2: ./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/after.sock' null:
  Time (mean ± σ):      1.839 s ±  0.008 s    [User: 4.651 s, System: 1.882 s]
  Range (min … max):    1.830 s …  1.853 s    10 runs

Summary
  ./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/after.sock' null: ran
    4.72 ± 0.02 times faster than ./nbdcopy --blkhash 'nbd+unix:///?socket=/tmp/before.sock' null:

With blksum, using one NBD connection:

$ hyperfine -w 3 "blksum 'nbd+unix:///?socket=/tmp/before.sock'" \
    "blksum 'nbd+unix:///?socket=/tmp/after.sock'"
Benchmark 1: blksum 'nbd+unix:///?socket=/tmp/before.sock'
  Time (mean ± σ):     13.606 s ±  0.081 s    [User: 5.799 s, System: 6.231 s]
  Range (min … max):   13.516 s … 13.785 s    10 runs

Benchmark 2: blksum 'nbd+unix:///?socket=/tmp/after.sock'
  Time (mean ± σ):      1.946 s ±  0.017 s    [User: 4.541 s, System: 1.481 s]
  Range (min … max):    1.912 s …  1.979 s    10 runs

Summary
  blksum 'nbd+unix:///?socket=/tmp/after.sock' ran
    6.99 ± 0.07 times faster than blksum 'nbd+unix:///?socket=/tmp/before.sock'

This will improve other usage of unix domain sockets on macOS, I tested
only qemu-nbd.

Signed-off-by: Nir Soffer <nirsof@gmail.com>
---
 io/channel-socket.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)