From patchwork Tue Nov 15 07:09:30 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stefan Roesch X-Patchwork-Id: 13043282 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1E112C43217 for ; Tue, 15 Nov 2022 07:10:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232644AbiKOHJ5 (ORCPT ); Tue, 15 Nov 2022 02:09:57 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37906 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232647AbiKOHJy (ORCPT ); Tue, 15 Nov 2022 02:09:54 -0500 Received: from 66-220-144-178.mail-mxout.facebook.com (66-220-144-178.mail-mxout.facebook.com [66.220.144.178]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7661620354 for ; Mon, 14 Nov 2022 23:09:45 -0800 (PST) Received: by dev0134.prn3.facebook.com (Postfix, from userid 425415) id 2098915F1107; Mon, 14 Nov 2022 23:09:35 -0800 (PST) From: Stefan Roesch To: kernel-team@fb.com Cc: shr@devkernel.io, axboe@kernel.dk, olivier@trillion01.com, netdev@vger.kernel.org, io-uring@vger.kernel.org, kuba@kernel.org, ammarfaizi2@gnuweeb.org Subject: [RFC PATCH v3 1/4] liburing: add api to set napi busy poll settings Date: Mon, 14 Nov 2022 23:09:30 -0800 Message-Id: <20221115070933.1792142-2-shr@devkernel.io> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20221115070933.1792142-1-shr@devkernel.io> References: <20221115070933.1792142-1-shr@devkernel.io> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: io-uring@vger.kernel.org This adds three functions to manage the napi busy poll settings: - io_uring_register_napi_busy_poll_timeout - io_uring_unregister_napi_busy_poll_timeout - io_uring_register_napi_prefer_busy_poll Signed-off-by: Stefan Roesch --- 
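Note on the calling convention: the three wrappers below all pass a NULL argument pointer and carry their setting in the nr_args slot of the register call, and unregistering the timeout is the same opcode with a value of 0. The following is only a minimal sketch of that encoding, not the liburing implementation. The names fake_ring, fake_call, last_call, fake_sys_register and the napi_* helpers are hypothetical stand-ins so that the sketch compiles without liburing; only the two opcode values are taken from this patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Opcode values as added to io_uring.h by this patch. */
enum {
	IORING_REGISTER_NAPI_PREFER_BUSY_POLL = 26,
	IORING_REGISTER_NAPI_BUSY_POLL_TIMEOUT = 27,
};

/* Hypothetical: minimal stand-in for struct io_uring (only ring_fd is used). */
struct fake_ring {
	int ring_fd;
};

/* Hypothetical: remembers the arguments of the most recent register call. */
struct fake_call {
	int fd;
	unsigned int opcode;
	const void *arg;
	unsigned int nr_args;
};

struct fake_call last_call;

/* Hypothetical stub standing in for __sys_io_uring_register(); it only
 * records its arguments instead of entering the kernel. */
static int fake_sys_register(int fd, unsigned int opcode, const void *arg,
			     unsigned int nr_args)
{
	last_call.fd = fd;
	last_call.opcode = opcode;
	last_call.arg = arg;
	last_call.nr_args = nr_args;
	return 0;
}

/* The boolean setting travels in nr_args (false -> 0, true -> 1). */
int napi_prefer_busy_poll(struct fake_ring *ring, bool prefer)
{
	return fake_sys_register(ring->ring_fd,
				 IORING_REGISTER_NAPI_PREFER_BUSY_POLL,
				 NULL, prefer);
}

/* The timeout value travels in nr_args. */
int napi_busy_poll_timeout(struct fake_ring *ring, unsigned int to)
{
	return fake_sys_register(ring->ring_fd,
				 IORING_REGISTER_NAPI_BUSY_POLL_TIMEOUT,
				 NULL, to);
}

/* Unregistering is the timeout opcode with a value of 0. */
int napi_unregister_busy_poll_timeout(struct fake_ring *ring)
{
	return napi_busy_poll_timeout(ring, 0);
}
```

One consequence of this encoding is that a timeout of 0 cannot be distinguished from "unregistered", which is why the unregister wrapper needs no opcode of its own.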
src/include/liburing.h | 6 ++++++ src/include/liburing/io_uring.h | 4 ++++ src/liburing.map | 7 +++++++ src/register.c | 23 +++++++++++++++++++++++ 4 files changed, 40 insertions(+) diff --git a/src/include/liburing.h b/src/include/liburing.h index 12a703f..47bbced 100644 --- a/src/include/liburing.h +++ b/src/include/liburing.h @@ -235,6 +235,12 @@ int io_uring_register_sync_cancel(struct io_uring *ring, int io_uring_register_file_alloc_range(struct io_uring *ring, unsigned off, unsigned len); +int io_uring_register_napi_prefer_busy_poll(struct io_uring *ring, + bool prefer_busy_poll); +int io_uring_register_napi_busy_poll_timeout(struct io_uring *ring, + unsigned int to); +int io_uring_unregister_napi_busy_poll_timeout(struct io_uring *ring); + int io_uring_get_events(struct io_uring *ring); int io_uring_submit_and_get_events(struct io_uring *ring); diff --git a/src/include/liburing/io_uring.h b/src/include/liburing/io_uring.h index a3e0920..2e53f52 100644 --- a/src/include/liburing/io_uring.h +++ b/src/include/liburing/io_uring.h @@ -499,6 +499,10 @@ enum { /* register a range of fixed file slots for automatic slot allocation */ IORING_REGISTER_FILE_ALLOC_RANGE = 25, + /* set/clear busy poll settings */ + IORING_REGISTER_NAPI_PREFER_BUSY_POLL = 26, + IORING_REGISTER_NAPI_BUSY_POLL_TIMEOUT = 27, + /* this goes last */ IORING_REGISTER_LAST }; diff --git a/src/liburing.map b/src/liburing.map index 06c64f8..2e41a40 100644 --- a/src/liburing.map +++ b/src/liburing.map @@ -67,3 +67,10 @@ LIBURING_2.3 { io_uring_get_events; io_uring_submit_and_get_events; } LIBURING_2.2; + +LIBURING_2.4 { + global: + io_uring_register_napi_prefer_busy_poll; + io_uring_register_napi_busy_poll_timeout; + io_uring_unregister_napi_busy_poll_timeout; +} LIBURING_2.3; diff --git a/src/register.c b/src/register.c index e849825..50250b8 100644 --- a/src/register.c +++ b/src/register.c @@ -367,3 +367,26 @@ int io_uring_register_file_alloc_range(struct io_uring *ring, 
IORING_REGISTER_FILE_ALLOC_RANGE, &range, 0); } + +int io_uring_register_napi_prefer_busy_poll(struct io_uring *ring, + bool prefer_busy_poll) +{ + return __sys_io_uring_register(ring->ring_fd, + IORING_REGISTER_NAPI_PREFER_BUSY_POLL, + NULL, prefer_busy_poll); +} + +int io_uring_register_napi_busy_poll_timeout(struct io_uring *ring, + unsigned int to) +{ + return __sys_io_uring_register(ring->ring_fd, + IORING_REGISTER_NAPI_BUSY_POLL_TIMEOUT, + NULL, to); +} + +int io_uring_unregister_napi_busy_poll_timeout(struct io_uring *ring) +{ + return __sys_io_uring_register(ring->ring_fd, + IORING_REGISTER_NAPI_BUSY_POLL_TIMEOUT, + NULL, 0); +} From patchwork Tue Nov 15 07:09:31 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stefan Roesch X-Patchwork-Id: 13043280 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E1B4CC433FE for ; Tue, 15 Nov 2022 07:09:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232600AbiKOHJz (ORCPT ); Tue, 15 Nov 2022 02:09:55 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37870 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232601AbiKOHJw (ORCPT ); Tue, 15 Nov 2022 02:09:52 -0500 Received: from 66-220-144-178.mail-mxout.facebook.com (66-220-144-178.mail-mxout.facebook.com [66.220.144.178]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 522E41FFA2 for ; Mon, 14 Nov 2022 23:09:44 -0800 (PST) Received: by dev0134.prn3.facebook.com (Postfix, from userid 425415) id 24A3D15F1109; Mon, 14 Nov 2022 23:09:35 -0800 (PST) From: Stefan Roesch To: kernel-team@fb.com Cc: shr@devkernel.io, axboe@kernel.dk, olivier@trillion01.com, netdev@vger.kernel.org, io-uring@vger.kernel.org, kuba@kernel.org, 
ammarfaizi2@gnuweeb.org Subject: [RFC PATCH v3 2/4] liburing: add documentation for new napi busy polling Date: Mon, 14 Nov 2022 23:09:31 -0800 Message-Id: <20221115070933.1792142-3-shr@devkernel.io> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20221115070933.1792142-1-shr@devkernel.io> References: <20221115070933.1792142-1-shr@devkernel.io> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: io-uring@vger.kernel.org This adds three man pages for the three new functions: - io_uring_register_napi_busy_poll_timeout - io_uring_register_napi_prefer_busy_poll - io_uring_unregister_napi_busy_poll_timeout Signed-off-by: Stefan Roesch --- ...io_uring_register_napi_busy_poll_timeout.3 | 35 +++++++++++++++++++ man/io_uring_register_napi_prefer_busy_poll.3 | 35 +++++++++++++++++++ ..._uring_unregister_napi_busy_poll_timeout.3 | 26 ++++++++++++++ 3 files changed, 96 insertions(+) create mode 100644 man/io_uring_register_napi_busy_poll_timeout.3 create mode 100644 man/io_uring_register_napi_prefer_busy_poll.3 create mode 100644 man/io_uring_unregister_napi_busy_poll_timeout.3 diff --git a/man/io_uring_register_napi_busy_poll_timeout.3 b/man/io_uring_register_napi_busy_poll_timeout.3 new file mode 100644 index 0000000..3acce60 --- /dev/null +++ b/man/io_uring_register_napi_busy_poll_timeout.3 @@ -0,0 +1,35 @@ +.\" Copyright (C) 2022 Stefan Roesch +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_register_napi_busy_poll_timeout 3 "November 10, 2022" "liburing-2.4" "liburing Manual" +.SH NAME +io_uring_register_napi_busy_poll_timeout \- register NAPI busy poll timeout +.SH SYNOPSIS +.nf +.B #include +.PP +.BI "int io_uring_register_napi_busy_poll_timeout(struct io_uring *" ring "," +.BI " unsigned int " timeout ");" +.PP +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_register_napi_busy_poll_timeout (3) +function registers the NAPI busy poll +.I timeout +for subsequent operations. + +Registering a NAPI busy poll timeout is a requirement to be able to use +NAPI busy polling. 
The other way to enable NAPI busy polling is to set the +proc setting /proc/sys/net/core/busy_poll. + +NAPI busy poll can reduce the network roundtrip time. + + +.SH RETURN VALUE +On success +.BR io_uring_register_napi_busy_poll_timeout (3) +returns 0. On failure it returns +.BR -errno . diff --git a/man/io_uring_register_napi_prefer_busy_poll.3 b/man/io_uring_register_napi_prefer_busy_poll.3 new file mode 100644 index 0000000..713840e --- /dev/null +++ b/man/io_uring_register_napi_prefer_busy_poll.3 @@ -0,0 +1,35 @@ +.\" Copyright (C) 2022 Stefan Roesch +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_register_napi_prefer_busy_poll 3 "November 11, 2022" "liburing-2.4" "liburing Manual" +.SH NAME +io_uring_register_napi_prefer_busy_poll \- register NAPI prefer busy poll setting +.SH SYNOPSIS +.nf +.B #include +.PP +.BI "int io_uring_register_napi_prefer_busy_poll(struct io_uring *" ring "," +.BI " bool " prefer_busy_poll ");" +.PP +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_register_napi_prefer_busy_poll (3) +function registers the NAPI +.I prefer_busy_poll +setting for subsequent operations. + +Registering a NAPI prefer busy poll setting sets the mode when calling the +function napi_busy_loop and corresponds to the SO_PREFER_BUSY_POLL socket +option. + +NAPI prefer busy poll can help in reducing the network roundtrip time. + + +.SH RETURN VALUE +On success +.BR io_uring_register_napi_prefer_busy_poll (3) +returns 0. On failure it returns +.BR -errno . 
diff --git a/man/io_uring_unregister_napi_busy_poll_timeout.3 b/man/io_uring_unregister_napi_busy_poll_timeout.3 new file mode 100644 index 0000000..666e006 --- /dev/null +++ b/man/io_uring_unregister_napi_busy_poll_timeout.3 @@ -0,0 +1,26 @@ +.\" Copyright (C) 2022 Stefan Roesch +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_unregister_napi_busy_poll_timeout 3 "November 10, 2022" "liburing-2.4" "liburing Manual" +.SH NAME +io_uring_unregister_napi_busy_poll_timeout \- unregister NAPI busy poll timeout +.SH SYNOPSIS +.nf +.B #include +.PP +.BI "int io_uring_unregister_napi_busy_poll_timeout(struct io_uring *" ring ");" +.PP +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_unregister_napi_busy_poll_timeout (3) +function unregisters the NAPI busy poll timeout +for subsequent operations. + +.SH RETURN VALUE +On success +.BR io_uring_unregister_napi_busy_poll_timeout (3) +returns 0. On failure it returns +.BR -errno . From patchwork Tue Nov 15 07:09:32 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stefan Roesch X-Patchwork-Id: 13043283 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id DB87CC43219 for ; Tue, 15 Nov 2022 07:10:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232533AbiKOHKB (ORCPT ); Tue, 15 Nov 2022 02:10:01 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37876 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232579AbiKOHJy (ORCPT ); Tue, 15 Nov 2022 02:09:54 -0500 Received: from 66-220-144-178.mail-mxout.facebook.com (66-220-144-178.mail-mxout.facebook.com [66.220.144.178]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7679320355 for ; Mon, 14 Nov 2022 23:09:46 -0800 (PST) Received: by 
dev0134.prn3.facebook.com (Postfix, from userid 425415) id 292B715F110B; Mon, 14 Nov 2022 23:09:35 -0800 (PST) From: Stefan Roesch To: kernel-team@fb.com Cc: shr@devkernel.io, axboe@kernel.dk, olivier@trillion01.com, netdev@vger.kernel.org, io-uring@vger.kernel.org, kuba@kernel.org, ammarfaizi2@gnuweeb.org Subject: [RFC PATCH v3 3/4] liburing: add test programs for napi busy poll Date: Mon, 14 Nov 2022 23:09:32 -0800 Message-Id: <20221115070933.1792142-4-shr@devkernel.io> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20221115070933.1792142-1-shr@devkernel.io> References: <20221115070933.1792142-1-shr@devkernel.io> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: io-uring@vger.kernel.org This adds two test programs to test the napi busy poll functionality: a client program and a server program. To get a napi id, the client and the server program need to be run on different hosts. To test the napi busy poll timeout, the -t option needs to be specified. A reasonable value for the busy poll timeout is 100. The best results are achieved by specifying the busy poll timeout on both the server and the client. 
Signed-off-by: Stefan Roesch --- .gitignore | 2 + examples/Makefile | 2 + examples/napi-busy-poll-client.c | 432 +++++++++++++++++++++++++++++++ examples/napi-busy-poll-server.c | 380 +++++++++++++++++++++++++++ 4 files changed, 816 insertions(+) create mode 100644 examples/napi-busy-poll-client.c create mode 100644 examples/napi-busy-poll-server.c diff --git a/.gitignore b/.gitignore index 6e8a2f7..89b5a41 100644 --- a/.gitignore +++ b/.gitignore @@ -15,6 +15,8 @@ /examples/io_uring-test /examples/io_uring-udp /examples/link-cp +/examples/napi-busy-poll-client +/examples/napi-busy-poll-server /examples/ucontext-cp /examples/poll-bench /examples/send-zerocopy diff --git a/examples/Makefile b/examples/Makefile index e561e05..59f1260 100644 --- a/examples/Makefile +++ b/examples/Makefile @@ -15,6 +15,8 @@ example_srcs := \ io_uring-test.c \ io_uring-udp.c \ link-cp.c \ + napi-busy-poll-client.c \ + napi-busy-poll-server.c \ poll-bench.c \ send-zerocopy.c diff --git a/examples/napi-busy-poll-client.c b/examples/napi-busy-poll-client.c new file mode 100644 index 0000000..38c4798 --- /dev/null +++ b/examples/napi-busy-poll-client.c @@ -0,0 +1,432 @@ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define MAXBUFLEN 100 +#define PORTNOLEN 10 +#define ADDRLEN 80 +#define RINGSIZE 1024 + +#define printable(ch) (isprint((unsigned char)ch) ? 
ch : '#') + +enum { + IOURING_RECV, + IOURING_SEND, + IOURING_RECVMSG, + IOURING_SENDMSG +}; + +struct ctx +{ + struct io_uring ring; + struct sockaddr_in6 saddr; + + int sockfd; + int buffer_len; + int num_pings; + bool napi_check; + + union { + char buffer[MAXBUFLEN]; + struct timespec ts; + }; + + int rtt_index; + double *rtt; +} ctx; + +struct options +{ + int num_pings; + int timeout; + + bool sq_poll; + bool busy_loop; + bool prefer_busy_poll; + + char port[PORTNOLEN]; + char addr[ADDRLEN]; +} options; + +struct option longopts[] = +{ + {"address" , 1, NULL, 'a'}, + {"busy" , 0, NULL, 'b'}, + {"help" , 0, NULL, 'h'}, + {"num_pings", 1, NULL, 'n'}, + {"port" , 1, NULL, 'p'}, + {"prefer" , 0, NULL, 'u'}, + {"sqpoll" , 0, NULL, 's'}, + {"timeout" , 1, NULL, 't'}, + {NULL , 0, NULL, 0 } +}; + +void printUsage(const char *name) +{ + fprintf(stderr, + "Usage: %s [-a|--address ip_address] [-p|--port port-no] [-s|--sqpoll]" + " [-b|--busy] [-n|--num_pings pings] [-t|--timeout busy-poll-timeout] [-u|--prefer] [-h|--help]\n" + "--address\n" + "-a : remote or local ipv6 address\n" + "--busy\n" + "-b : busy poll io_uring instead of blocking.\n" + "--num_pings\n" + "-n : number of pings\n" + "--port\n" + "-p : port\n" + "--sqpoll\n" + "-s : Configure io_uring to use SQPOLL thread\n" + "--timeout\n" + "-t : Configure NAPI busy poll timeout\n" + "--prefer\n" + "-u : prefer NAPI busy poll\n" + "--help\n" + "-h : Display this usage message\n\n", + name); +} + +void printError(const char *msg, int opt) +{ + if (msg && opt) + fprintf(stderr, "%s (-%c)\n", msg, printable(opt)); +} + +void setProcessScheduler(void) +{ + struct sched_param param; + + param.sched_priority = sched_get_priority_max(SCHED_FIFO); + if (sched_setscheduler(0, SCHED_FIFO, &param) < 0) + fprintf(stderr, "sched_setscheduler() failed: (%d) %s\n", + errno, strerror(errno)); +} + +double diffTimespec(const struct timespec *time1, const struct timespec *time0) +{ + return (time1->tv_sec - time0->tv_sec) + + 
(time1->tv_nsec - time0->tv_nsec) / 1000000000.0; +} + +uint64_t encodeUserData(char type, int fd) +{ + return (uint32_t)fd | ((uint64_t)type << 56); +} + +void decodeUserData(uint64_t data, char *type, int *fd) +{ + *type = data >> 56; + *fd = data & 0xffffffffU; +} + +const char *opTypeToStr(char type) +{ + const char *res; + + switch (type) { + case IOURING_RECV: + res = "IOURING_RECV"; + break; + case IOURING_SEND: + res = "IOURING_SEND"; + break; + case IOURING_RECVMSG: + res = "IOURING_RECVMSG"; + break; + case IOURING_SENDMSG: + res = "IOURING_SENDMSG"; + break; + default: + res = "Unknown"; + } + + return res; +} + +void reportNapi(struct ctx *ctx) +{ + unsigned int napi_id = 0; + socklen_t len = sizeof(napi_id); + + getsockopt(ctx->sockfd, SOL_SOCKET, SO_INCOMING_NAPI_ID, &napi_id, &len); + if (napi_id) + printf(" napi id: %d\n", napi_id); + else + printf(" unassigned napi id\n"); + + ctx->napi_check = true; +} + +void sendPing(struct ctx *ctx) +{ + struct io_uring_sqe *sqe = io_uring_get_sqe(&ctx->ring); + + clock_gettime(CLOCK_REALTIME, (struct timespec *)ctx->buffer); + + io_uring_prep_send(sqe, ctx->sockfd, ctx->buffer, sizeof(struct timespec), 0); + sqe->user_data = encodeUserData(IOURING_SEND, ctx->sockfd); +} + +void receivePing(struct ctx *ctx) +{ + struct io_uring_sqe *sqe = io_uring_get_sqe(&ctx->ring); + + io_uring_prep_recv(sqe, ctx->sockfd, ctx->buffer, MAXBUFLEN, 0); + sqe->user_data = encodeUserData(IOURING_RECV, ctx->sockfd); +} + +void recordRTT(struct ctx *ctx) +{ + struct timespec startTs = ctx->ts; + + // Send next ping. + sendPing(ctx); + + // Store round-trip time. + ctx->rtt[ctx->rtt_index] = diffTimespec(&ctx->ts, &startTs); + ctx->rtt_index++; +} + +void printStats(struct ctx *ctx) +{ + double minRTT = DBL_MAX; + double maxRTT = 0.0; + double avgRTT = 0.0; + double stddevRTT = 0.0; + + // Calculate min, max, avg. 
+ for (int i = 0; i < ctx->rtt_index; i++) { + if (ctx->rtt[i] < minRTT) + minRTT = ctx->rtt[i]; + if (ctx->rtt[i] > maxRTT) + maxRTT = ctx->rtt[i]; + + avgRTT += ctx->rtt[i]; + } + avgRTT /= ctx->rtt_index; + + // Calculate stddev. + for (int i = 0; i < ctx->rtt_index; i++) + stddevRTT += fabs(ctx->rtt[i] - avgRTT); + stddevRTT /= ctx->rtt_index; + + fprintf(stdout, " rtt(us) min/avg/max/mdev = %.3f/%.3f/%.3f/%.3f\n", + minRTT * 1000000, avgRTT * 1000000, maxRTT * 1000000, stddevRTT * 1000000); +} + +void completion(struct ctx *ctx, struct io_uring_cqe *cqe) +{ + char type; + int fd; + int res = cqe->res; + + decodeUserData(cqe->user_data, &type, &fd); + if (res < 0) { + fprintf(stderr, "unexpected %s failure: (%d) %s\n", + opTypeToStr(type), -res, strerror(-res)); + abort(); + } + + switch (type) { + case IOURING_SEND: + receivePing(ctx); + break; + case IOURING_RECV: + if (res != sizeof(struct timespec)) { + fprintf(stderr, "unexpected ping reply len: %d\n", res); + abort(); + } + + if (!ctx->napi_check) { + reportNapi(ctx); + sendPing(ctx); + } else { + recordRTT(ctx); + } + + --ctx->num_pings; + break; + + default: + fprintf(stderr, "unexpected %s completion\n", + opTypeToStr(type)); + abort(); + break; + } +} + +int main(int argc, char *argv[]) +{ + struct ctx ctx; + struct options opt; + struct __kernel_timespec *tsPtr; + struct __kernel_timespec ts; + struct io_uring_params params; + int flag; + + memset(&opt, 0, sizeof(struct options)); + + // Process flags. 
+ while ((flag = getopt_long(argc, argv, ":hsbua:n:p:t:", longopts, NULL)) != -1) { + switch (flag) { + case 'a': + strcpy(opt.addr, optarg); + break; + case 'b': + opt.busy_loop = true; + break; + case 'h': + printUsage(argv[0]); + exit(0); + break; + case 'n': + opt.num_pings = atoi(optarg) + 1; + break; + case 'p': + strcpy(opt.port, optarg); + break; + case 's': + opt.sq_poll = true; + break; + case 't': + opt.timeout = atoi(optarg); + break; + case 'u': + opt.prefer_busy_poll = true; + break; + case ':': + printError("Missing argument", optopt); + printUsage(argv[0]); + exit(-1); + break; + case '?': + printError("Unrecognized option", optopt); + printUsage(argv[0]); + exit(-1); + break; + + default: + fprintf(stderr, "Fatal: Unexpected case in CmdLineProcessor switch()\n"); + exit(-1); + break; + } + } + + if (strlen(opt.addr) == 0) { + fprintf(stderr, "address option is mandatory\n"); + printUsage(argv[0]); + exit(1); + } + + ctx.saddr.sin6_port = htons(atoi(opt.port)); + ctx.saddr.sin6_family = AF_INET6; + + if (inet_pton(AF_INET6, opt.addr, &ctx.saddr.sin6_addr) <= 0) { + fprintf(stderr, "inet_pton error for %s\n", opt.addr); + printUsage(argv[0]); + exit(1); + } + + // Connect to server. + fprintf(stdout, "Connecting to %s... (port=%s) to send %d pings\n", opt.addr, opt.port, opt.num_pings - 1); + + if ((ctx.sockfd = socket(AF_INET6, SOCK_DGRAM, 0)) < 0) { + fprintf(stderr, "socket() failed: (%d) %s\n", errno, strerror(errno)); + exit(1); + } + + if (connect(ctx.sockfd, (struct sockaddr *)&ctx.saddr, sizeof(struct sockaddr_in6)) < 0) { + fprintf(stderr, "connect() failed: (%d) %s\n", errno, strerror(errno)); + exit(1); + } + + // Setup ring. 
+ memset(&params, 0, sizeof(params)); + memset(&ts, 0, sizeof(ts)); + + if (opt.sq_poll) { + params.flags = IORING_SETUP_SQPOLL; + params.sq_thread_idle = 50; + } + + if (io_uring_queue_init_params(RINGSIZE, &ctx.ring, &params) < 0) { + fprintf(stderr, "io_uring_queue_init_params() failed: (%d) %s\n", + errno, strerror(errno)); + exit(1); + } + + if (opt.prefer_busy_poll) + io_uring_register_napi_prefer_busy_poll(&ctx.ring, opt.prefer_busy_poll); + + if (opt.timeout) + io_uring_register_napi_busy_poll_timeout(&ctx.ring, opt.timeout); + + if (opt.busy_loop) + tsPtr = &ts; + else + tsPtr = NULL; + + + // Use realtime scheduler. + setProcessScheduler(); + + // Copy payload. + clock_gettime(CLOCK_REALTIME, &ctx.ts); + + // Setup context. + ctx.napi_check = false; + ctx.buffer_len = sizeof(struct timespec); + ctx.num_pings = opt.num_pings; + + ctx.rtt_index = 0; + ctx.rtt = (double *)malloc(sizeof(double) * opt.num_pings); + if (!ctx.rtt) { + fprintf(stderr, "Cannot allocate results array\n"); + exit(1); + } + + // Send initial message to get napi id. + sendPing(&ctx); + + while (ctx.num_pings != 0) { + int res; + unsigned num_completed = 0; + unsigned head; + struct io_uring_cqe *cqe; + + do { + res = io_uring_submit_and_wait_timeout(&ctx.ring, &cqe, 1, tsPtr, NULL); + } + while (res == -ETIME); + + io_uring_for_each_cqe(&ctx.ring, head, cqe) { + ++num_completed; + completion(&ctx, cqe); + } + + if (num_completed) + io_uring_cq_advance(&ctx.ring, num_completed); + } + + printStats(&ctx); + free(ctx.rtt); + io_uring_queue_exit(&ctx.ring); + + // Clean up. 
+ close(ctx.sockfd); + + return 0; +} diff --git a/examples/napi-busy-poll-server.c b/examples/napi-busy-poll-server.c new file mode 100644 index 0000000..11acf44 --- /dev/null +++ b/examples/napi-busy-poll-server.c @@ -0,0 +1,380 @@ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define MAXBUFLEN 100 +#define PORTNOLEN 10 +#define ADDRLEN 80 +#define RINGSIZE 1024 + +#define printable(ch) (isprint((unsigned char)ch) ? ch : '#') + +enum { + IOURING_RECV, + IOURING_SEND, + IOURING_RECVMSG, + IOURING_SENDMSG +}; + +struct ctx +{ + struct io_uring ring; + struct sockaddr_in6 saddr; + struct iovec iov; + struct msghdr msg; + + int sockfd; + int buffer_len; + int num_pings; + bool napi_check; + + union { + char buffer[MAXBUFLEN]; + struct timespec ts; + }; +} ctx; + +struct options +{ + int num_pings; + int timeout; + + bool listen; + bool sq_poll; + bool busy_loop; + bool prefer_busy_poll; + + char port[PORTNOLEN]; + char addr[ADDRLEN]; +} options; + +struct option longopts[] = +{ + {"address" , 1, NULL, 'a'}, + {"busy" , 0, NULL, 'b'}, + {"help" , 0, NULL, 'h'}, + {"listen" , 0, NULL, 'l'}, + {"num_pings", 1, NULL, 'n'}, + {"port" , 1, NULL, 'p'}, + {"prefer" , 0, NULL, 'u'}, + {"sqpoll" , 0, NULL, 's'}, + {"timeout" , 1, NULL, 't'}, + {NULL , 0, NULL, 0 } +}; + +void printUsage(const char *name) +{ + fprintf(stderr, + "Usage: %s [-l|--listen] [-a|--address ip_address] [-p|--port port-no] [-s|--sqpoll]" + " [-b|--busy] [-n|--num_pings pings] [-t|--timeout busy-poll-timeout] [-u|--prefer] [-h|--help]\n" + "--listen\n" + "-l : Server mode\n" + "--address\n" + "-a : remote or local ipv6 address\n" + "--busy\n" + "-b : busy poll io_uring instead of blocking.\n" + "--num_pings\n" + "-n : number of pings\n" + "--port\n" + "-p : port\n" + "--sqpoll\n" + "-s : Configure io_uring to use SQPOLL thread\n" + "--timeout\n" + "-t : Configure NAPI busy poll timeout\n" + "--prefer\n" + "-u 
: prefer NAPI busy poll\n" + "--help\n" + "-h : Display this usage message\n\n", + name); +} + +void printError(const char *msg, int opt) +{ + if (msg && opt) + fprintf(stderr, "%s (-%c)\n", msg, printable(opt)); +} + +void setProcessScheduler() +{ + struct sched_param param; + + param.sched_priority = sched_get_priority_max(SCHED_FIFO); + if (sched_setscheduler(0, SCHED_FIFO, &param) < 0) + fprintf(stderr, "sched_setscheduler() failed: (%d) %s\n", + errno, strerror(errno)); +} + +uint64_t encodeUserData(char type, int fd) +{ + return (uint32_t)fd | ((__u64)type << 56); +} + +void decodeUserData(uint64_t data, char *type, int *fd) +{ + *type = data >> 56; + *fd = data & 0xffffffffU; +} + +const char *opTypeToStr(char type) +{ + const char *res; + + switch (type) { + case IOURING_RECV: + res = "IOURING_RECV"; + break; + case IOURING_SEND: + res = "IOURING_SEND"; + break; + case IOURING_RECVMSG: + res = "IOURING_RECVMSG"; + break; + case IOURING_SENDMSG: + res = "IOURING_SENDMSG"; + break; + default: + res = "Unknown"; + } + + return res; +} + +void reportNapi(struct ctx *ctx) +{ + unsigned int napi_id = 0; + socklen_t len = sizeof(napi_id); + + getsockopt(ctx->sockfd, SOL_SOCKET, SO_INCOMING_NAPI_ID, &napi_id, &len); + if (napi_id) + printf(" napi id: %d\n", napi_id); + else + printf(" unassigned napi id\n"); + + ctx->napi_check = true; +} + +void sendPing(struct ctx *ctx) +{ + + struct io_uring_sqe *sqe = io_uring_get_sqe(&ctx->ring); + + io_uring_prep_sendmsg(sqe, ctx->sockfd, &ctx->msg, 0); + sqe->user_data = encodeUserData(IOURING_SENDMSG, ctx->sockfd); +} + +void receivePing(struct ctx *ctx) +{ + bzero(&ctx->msg, sizeof(struct msghdr)); + ctx->msg.msg_name = &ctx->saddr; + ctx->msg.msg_namelen = sizeof(struct sockaddr_in6); + ctx->iov.iov_base = ctx->buffer; + ctx->iov.iov_len = MAXBUFLEN; + ctx->msg.msg_iov = &ctx->iov; + ctx->msg.msg_iovlen = 1; + + struct io_uring_sqe *sqe = io_uring_get_sqe(&ctx->ring); + io_uring_prep_recvmsg(sqe, ctx->sockfd, &ctx->msg, 0); + 
sqe->user_data = encodeUserData(IOURING_RECVMSG, ctx->sockfd); +} + +void completion(struct ctx *ctx, struct io_uring_cqe *cqe) +{ + char type; + int fd; + int res = cqe->res; + + decodeUserData(cqe->user_data, &type, &fd); + if (res < 0) { + fprintf(stderr, "unexpected %s failure: (%d) %s\n", + opTypeToStr(type), -res, strerror(-res)); + abort(); + } + + switch (type) { + case IOURING_SENDMSG: + receivePing(ctx); + --ctx->num_pings; + break; + case IOURING_RECVMSG: + ctx->iov.iov_len = res; + sendPing(ctx); + if (!ctx->napi_check) + reportNapi(ctx); + break; + default: + fprintf(stderr, "unexpected %s completion\n", + opTypeToStr(type)); + abort(); + break; + } +} + +int main(int argc, char *argv[]) +{ + int flag; + struct ctx ctx; + struct options opt; + struct __kernel_timespec *tsPtr; + struct __kernel_timespec ts; + struct io_uring_params params; + + memset(&opt, 0, sizeof(struct options)); + + // Process flags. + while ((flag = getopt_long(argc, argv, ":lhsbua:n:p:t:", longopts, NULL)) != -1) { + switch (flag) { + case 'a': + strcpy(opt.addr, optarg); + break; + case 'b': + opt.busy_loop = true; + break; + case 'h': + printUsage(argv[0]); + exit(0); + break; + case 'l': + opt.listen = true; + break; + case 'n': + opt.num_pings = atoi(optarg) + 1; + break; + case 'p': + strcpy(opt.port, optarg); + break; + case 's': + opt.sq_poll = true; + break; + case 't': + opt.timeout = atoi(optarg); + break; + case 'u': + opt.prefer_busy_poll = true; + break; + case ':': + printError("Missing argument", optopt); + printUsage(argv[0]); + exit(-1); + break; + case '?': + printError("Unrecognized option", optopt); + printUsage(argv[0]); + exit(-1); + break; + + default: + fprintf(stderr, "Fatal: Unexpected case in CmdLineProcessor switch()\n"); + exit(-1); + break; + } + } + + if (strlen(opt.addr) == 0) { + fprintf(stderr, "address option is mandatory\n"); + printUsage(argv[0]); + exit(1); + } + + ctx.saddr.sin6_port = htons(atoi(opt.port)); + ctx.saddr.sin6_family = 
AF_INET6; + + if (inet_pton(AF_INET6, opt.addr, &ctx.saddr.sin6_addr) <= 0) { + fprintf(stderr, "inet_pton error for %s\n", opt.addr); + printUsage(argv[0]); + exit(1); + } + + // Bind to the listening address. + fprintf(stdout, "Listening %s : %s...\n", opt.addr, opt.port); + + if ((ctx.sockfd = socket(AF_INET6, SOCK_DGRAM, 0)) < 0) { + fprintf(stderr, "socket() failed: (%d) %s\n", errno, strerror(errno)); + exit(1); + } + + if (bind(ctx.sockfd, (struct sockaddr *)&ctx.saddr, sizeof(struct sockaddr_in6)) < 0) { + fprintf(stderr, "bind() failed: (%d) %s\n", errno, strerror(errno)); + exit(1); + } + + // Setup ring. + memset(&params, 0, sizeof(params)); + memset(&ts, 0, sizeof(ts)); + + if (opt.sq_poll) { + params.flags = IORING_SETUP_SQPOLL; + params.sq_thread_idle = 50; + } + + if (io_uring_queue_init_params(RINGSIZE, &ctx.ring, &params) < 0) { + fprintf(stderr, "io_uring_queue_init_params() failed: (%d) %s\n", + errno, strerror(errno)); + exit(1); + } + + if (opt.prefer_busy_poll) + io_uring_register_napi_prefer_busy_poll(&ctx.ring, opt.prefer_busy_poll); + + if (opt.timeout) + io_uring_register_napi_busy_poll_timeout(&ctx.ring, opt.timeout); + + if (opt.busy_loop) + tsPtr = &ts; + else + tsPtr = NULL; + + + // Use realtime scheduler. + setProcessScheduler(); + + // Copy payload. + clock_gettime(CLOCK_REALTIME, &ctx.ts); + + // Setup context. + ctx.napi_check = false; + ctx.buffer_len = sizeof(struct timespec); + ctx.num_pings = opt.num_pings; + + // Receive initial message to get napi id. + receivePing(&ctx); + + while (ctx.num_pings != 0) { + int res; + unsigned int num_completed = 0; + unsigned int head; + struct io_uring_cqe *cqe; + + do { + res = io_uring_submit_and_wait_timeout(&ctx.ring, &cqe, 1, tsPtr, NULL); + } + while (res == -ETIME); + + io_uring_for_each_cqe(&ctx.ring, head, cqe) { + ++num_completed; + completion(&ctx, cqe); + } + + if (num_completed) { + io_uring_cq_advance(&ctx.ring, num_completed); + } + } + + // Clean up. 
+ io_uring_queue_exit(&ctx.ring); + close(ctx.sockfd); + + return 0; +} From patchwork Tue Nov 15 07:09:33 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stefan Roesch X-Patchwork-Id: 13043284 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7B4CEC4332F for ; Tue, 15 Nov 2022 07:10:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230217AbiKOHKS (ORCPT ); Tue, 15 Nov 2022 02:10:18 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37908 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232642AbiKOHKB (ORCPT ); Tue, 15 Nov 2022 02:10:01 -0500 Received: from 66-220-144-178.mail-mxout.facebook.com (66-220-144-178.mail-mxout.facebook.com [66.220.144.178]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 69CD2205CB for ; Mon, 14 Nov 2022 23:09:52 -0800 (PST) Received: by dev0134.prn3.facebook.com (Postfix, from userid 425415) id 2CD6115F110D; Mon, 14 Nov 2022 23:09:35 -0800 (PST) From: Stefan Roesch To: kernel-team@fb.com Cc: shr@devkernel.io, axboe@kernel.dk, olivier@trillion01.com, netdev@vger.kernel.org, io-uring@vger.kernel.org, kuba@kernel.org, ammarfaizi2@gnuweeb.org Subject: [RFC PATCH v3 4/4] liburing: update changelog with new feature Date: Mon, 14 Nov 2022 23:09:33 -0800 Message-Id: <20221115070933.1792142-5-shr@devkernel.io> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20221115070933.1792142-1-shr@devkernel.io> References: <20221115070933.1792142-1-shr@devkernel.io> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: io-uring@vger.kernel.org Add a new entry to the changelog file for the napi busy poll feature. 
Signed-off-by: Stefan Roesch --- CHANGELOG | 3 +++ 1 file changed, 3 insertions(+) diff --git a/CHANGELOG b/CHANGELOG index 09511af..1db0269 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -1,3 +1,6 @@ +liburing-2.4 release +- Support for napi busy polling + liburing-2.3 release - Support non-libc build for aarch64.
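Note on the example programs in patch 3/4: both tag every submission with a 64-bit user_data value that keeps the operation type in the top byte and the socket fd in the low 32 bits, and the completion handler unpacks it again to dispatch. The following is only a minimal sketch of that packing, written so it compiles on its own. The snake_case function names and the OP_* enum are hypothetical, and it uses uint8_t for the type where the examples use char, an assumption made here to avoid sign extension when shifting.

```c
#include <stdint.h>

/* Hypothetical operation tags, in the same order as the IOURING_* enum
 * used by the example programs. */
enum {
	OP_RECV,
	OP_SEND,
	OP_RECVMSG,
	OP_SENDMSG,
};

/* Pack the operation type into bits 56..63 and the fd into bits 0..31. */
uint64_t encode_user_data(uint8_t type, int fd)
{
	return (uint64_t)(uint32_t)fd | ((uint64_t)type << 56);
}

/* Reverse of encode_user_data(): recover the type and the fd. */
void decode_user_data(uint64_t data, uint8_t *type, int *fd)
{
	*type = (uint8_t)(data >> 56);
	*fd = (int)(uint32_t)(data & 0xffffffffU);
}
```

Bits 32..55 stay unused in this layout, so there is room to carry a buffer index or sequence number later without changing the two fields shown here.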