From patchwork Mon Nov 21 19:14:56 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stefan Roesch X-Patchwork-Id: 13051577 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8996DC4167E for ; Mon, 21 Nov 2022 19:15:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231277AbiKUTPk (ORCPT ); Mon, 21 Nov 2022 14:15:40 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45422 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231272AbiKUTPa (ORCPT ); Mon, 21 Nov 2022 14:15:30 -0500 Received: from 66-220-144-178.mail-mxout.facebook.com (66-220-144-178.mail-mxout.facebook.com [66.220.144.178]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0F393654F8 for ; Mon, 21 Nov 2022 11:15:16 -0800 (PST) Received: by dev0134.prn3.facebook.com (Postfix, from userid 425415) id 9009A1B8130E; Mon, 21 Nov 2022 11:15:00 -0800 (PST) From: Stefan Roesch To: kernel-team@fb.com Cc: shr@devkernel.io, axboe@kernel.dk, olivier@trillion01.com, netdev@vger.kernel.org, io-uring@vger.kernel.org, kuba@kernel.org, ammarfaizi2@gnuweeb.org Subject: [PATCH v5 1/4] liburing: add api to set napi busy poll settings Date: Mon, 21 Nov 2022 11:14:56 -0800 Message-Id: <20221121191459.998388-2-shr@devkernel.io> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20221121191459.998388-1-shr@devkernel.io> References: <20221121191459.998388-1-shr@devkernel.io> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: io-uring@vger.kernel.org This adds two functions to manage the napi busy poll settings: - io_uring_register_napi - io_uring_unregister_napi Signed-off-by: Stefan Roesch --- src/include/liburing.h | 3 +++ src/include/liburing/io_uring.h | 12 ++++++++++++ 
src/liburing.map | 6 ++++++ src/register.c | 12 ++++++++++++ 4 files changed, 33 insertions(+) diff --git a/src/include/liburing.h b/src/include/liburing.h index 12a703f..98ffd73 100644 --- a/src/include/liburing.h +++ b/src/include/liburing.h @@ -235,6 +235,9 @@ int io_uring_register_sync_cancel(struct io_uring *ring, int io_uring_register_file_alloc_range(struct io_uring *ring, unsigned off, unsigned len); +int io_uring_register_napi(struct io_uring *ring, struct io_uring_napi *napi); +int io_uring_unregister_napi(struct io_uring *ring, struct io_uring_napi *napi); + int io_uring_get_events(struct io_uring *ring); int io_uring_submit_and_get_events(struct io_uring *ring); diff --git a/src/include/liburing/io_uring.h b/src/include/liburing/io_uring.h index a3e0920..25caee3 100644 --- a/src/include/liburing/io_uring.h +++ b/src/include/liburing/io_uring.h @@ -499,6 +499,10 @@ enum { /* register a range of fixed file slots for automatic slot allocation */ IORING_REGISTER_FILE_ALLOC_RANGE = 25, + /* set/clear busy poll settings */ + IORING_REGISTER_NAPI = 26, + IORING_UNREGISTER_NAPI = 27, + /* this goes last */ IORING_REGISTER_LAST }; @@ -621,6 +625,14 @@ struct io_uring_buf_reg { __u64 resv[3]; }; +/* argument for IORING_(UN)REGISTER_NAPI */ +struct io_uring_napi { + __u32 busy_poll_to; + __u8 prefer_busy_poll; + __u8 pad[3]; + __u64 resv; +}; + /* * io_uring_restriction->opcode values */ diff --git a/src/liburing.map b/src/liburing.map index 06c64f8..74036d3 100644 --- a/src/liburing.map +++ b/src/liburing.map @@ -67,3 +67,9 @@ LIBURING_2.3 { io_uring_get_events; io_uring_submit_and_get_events; } LIBURING_2.2; + +LIBURING_2.4 { + global: + io_uring_register_napi; + io_uring_unregister_napi; +} LIBURING_2.3; diff --git a/src/register.c b/src/register.c index e849825..ff87e73 100644 --- a/src/register.c +++ b/src/register.c @@ -367,3 +367,15 @@ int io_uring_register_file_alloc_range(struct io_uring *ring, IORING_REGISTER_FILE_ALLOC_RANGE, &range, 0); } + +int 
io_uring_register_napi(struct io_uring *ring, struct io_uring_napi *napi) +{ + return __sys_io_uring_register(ring->ring_fd, + IORING_REGISTER_NAPI, napi, 0); +} + +int io_uring_unregister_napi(struct io_uring *ring, struct io_uring_napi *napi) +{ + return __sys_io_uring_register(ring->ring_fd, + IORING_UNREGISTER_NAPI, napi, 0); +} From patchwork Mon Nov 21 19:14:57 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stefan Roesch X-Patchwork-Id: 13051576 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A0A73C43217 for ; Mon, 21 Nov 2022 19:15:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229712AbiKUTPP (ORCPT ); Mon, 21 Nov 2022 14:15:15 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44786 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230517AbiKUTPH (ORCPT ); Mon, 21 Nov 2022 14:15:07 -0500 Received: from 66-220-144-178.mail-mxout.facebook.com (66-220-144-178.mail-mxout.facebook.com [66.220.144.178]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6B85363143 for ; Mon, 21 Nov 2022 11:15:06 -0800 (PST) Received: by dev0134.prn3.facebook.com (Postfix, from userid 425415) id 942E31B81310; Mon, 21 Nov 2022 11:15:00 -0800 (PST) From: Stefan Roesch To: kernel-team@fb.com Cc: shr@devkernel.io, axboe@kernel.dk, olivier@trillion01.com, netdev@vger.kernel.org, io-uring@vger.kernel.org, kuba@kernel.org, ammarfaizi2@gnuweeb.org Subject: [PATCH v5 2/4] liburing: add documentation for new napi busy polling Date: Mon, 21 Nov 2022 11:14:57 -0800 Message-Id: <20221121191459.998388-3-shr@devkernel.io> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20221121191459.998388-1-shr@devkernel.io> References: 
<20221121191459.998388-1-shr@devkernel.io> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: io-uring@vger.kernel.org This adds two man pages for the two new functions: - io_uring_register_napi - io_uring_unregister_napi Signed-off-by: Stefan Roesch --- man/io_uring_register_napi.3 | 40 ++++++++++++++++++++++++++++++++++ man/io_uring_unregister_napi.3 | 27 +++++++++++++++++++++++ 2 files changed, 67 insertions(+) create mode 100644 man/io_uring_register_napi.3 create mode 100644 man/io_uring_unregister_napi.3 diff --git a/man/io_uring_register_napi.3 b/man/io_uring_register_napi.3 new file mode 100644 index 0000000..78eaa71 --- /dev/null +++ b/man/io_uring_register_napi.3 @@ -0,0 +1,40 @@ +.\" Copyright (C) 2022 Stefan Roesch +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_register_napi 3 "November 16, 2022" "liburing-2.4" "liburing Manual" +.SH NAME +io_uring_register_napi \- register NAPI busy poll settings +.SH SYNOPSIS +.nf +.B #include <liburing.h> +.PP +.BI "int io_uring_register_napi(struct io_uring *" ring "," +.BI " struct io_uring_napi *" napi) +.PP +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_register_napi (3) +function registers the NAPI settings for subsequent operations. The NAPI +settings are specified in the structure that is passed in the +.I napi +parameter. The structure consists of the napi timeout +.I busy_poll_to +(napi busy poll timeout in us) and +.I prefer_busy_poll. + +Registering NAPI settings sets the mode when calling the function +napi_busy_loop and corresponds to the SO_PREFER_BUSY_POLL socket +option. + +NAPI busy poll can reduce the network roundtrip time. + + +.SH RETURN VALUE +On success +.BR io_uring_register_napi (3) +returns 0. On failure it returns +.BR -errno . +It also updates the napi structure with the current values.
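For illustration, a minimal sketch of the argument layout described in the man page above. It assumes the field layout proposed in patch 1/4; struct napi_settings_mirror and napi_settings_fill() are hypothetical names used only here. Real code would include <liburing.h> and pass a struct io_uring_napi to io_uring_register_napi(3).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Local mirror of the struct io_uring_napi proposed in patch 1/4.
 * Assumption: this copy exists only so the sketch compiles without
 * liburing; it is not part of the library.
 */
struct napi_settings_mirror {
	uint32_t busy_poll_to;     /* busy poll timeout in microseconds */
	uint8_t  prefer_busy_poll; /* non-zero: prefer busy polling */
	uint8_t  pad[3];           /* padding, left at zero here */
	uint64_t resv;             /* reserved, left at zero here */
};

/* 4 + 1 + 3 + 8 bytes, with no implicit padding. */
static_assert(sizeof(struct napi_settings_mirror) == 16,
	      "unexpected size for the napi settings mirror");

/* Hypothetical helper: zero the structure, then set the two settings. */
static void napi_settings_fill(struct napi_settings_mirror *n,
			       uint32_t timeout_us, int prefer)
{
	memset(n, 0, sizeof(*n));
	n->busy_poll_to = timeout_us;
	n->prefer_busy_poll = prefer ? 1 : 0;
}
```

Zeroing the whole structure before filling it mirrors what the example programs in this series do with memset(); how the kernel treats non-zero pad or resv fields is not stated in this series and should be checked against the kernel side.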
diff --git a/man/io_uring_unregister_napi.3 b/man/io_uring_unregister_napi.3 new file mode 100644 index 0000000..f7087ef --- /dev/null +++ b/man/io_uring_unregister_napi.3 @@ -0,0 +1,27 @@ +.\" Copyright (C) 2022 Stefan Roesch +.\" +.\" SPDX-License-Identifier: LGPL-2.0-or-later +.\" +.TH io_uring_unregister_napi 3 "November 16, 2022" "liburing-2.4" "liburing Manual" +.SH NAME +io_uring_unregister_napi \- unregister NAPI busy poll settings +.SH SYNOPSIS +.nf +.B #include <liburing.h> +.PP +.BI "int io_uring_unregister_napi(struct io_uring *" ring "," +.BI " struct io_uring_napi *" napi) +.PP +.fi +.SH DESCRIPTION +.PP +The +.BR io_uring_unregister_napi (3) +function unregisters the NAPI busy poll settings for subsequent operations. + +.SH RETURN VALUE +On success +.BR io_uring_unregister_napi (3) +returns 0. On failure it returns +.BR -errno . +It also updates the napi structure with the current values. From patchwork Mon Nov 21 19:14:58 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stefan Roesch X-Patchwork-Id: 13051578 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 07975C43217 for ; Mon, 21 Nov 2022 19:15:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229542AbiKUTPk (ORCPT ); Mon, 21 Nov 2022 14:15:40 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44848 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229849AbiKUTPN (ORCPT ); Mon, 21 Nov 2022 14:15:13 -0500 Received: from 66-220-144-178.mail-mxout.facebook.com (66-220-144-178.mail-mxout.facebook.com [66.220.144.178]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A0FB963143 for ; Mon, 21 Nov 2022 11:15:11 -0800 (PST) Received: by dev0134.prn3.facebook.com (Postfix, 
from userid 425415) id 991941B81312; Mon, 21 Nov 2022 11:15:00 -0800 (PST) From: Stefan Roesch To: kernel-team@fb.com Cc: shr@devkernel.io, axboe@kernel.dk, olivier@trillion01.com, netdev@vger.kernel.org, io-uring@vger.kernel.org, kuba@kernel.org, ammarfaizi2@gnuweeb.org Subject: [PATCH v5 3/4] liburing: add example programs for napi busy poll Date: Mon, 21 Nov 2022 11:14:58 -0800 Message-Id: <20221121191459.998388-4-shr@devkernel.io> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20221121191459.998388-1-shr@devkernel.io> References: <20221121191459.998388-1-shr@devkernel.io> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: io-uring@vger.kernel.org This adds two example programs to test the napi busy poll functionality: a client program and a server program. To get a napi id, the client and the server need to run on different hosts. To test the napi busy poll timeout, the -t option needs to be specified. A reasonable value for the busy poll timeout is 100 (the value is passed as busy_poll_to, in microseconds). The best results are achieved by specifying the busy poll timeout on both the server and the client.
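When comparing runs with and without -t, the client summarizes round-trip times as min/avg/max/mdev. A minimal sketch of that summary follows, modeled on printStats() in napi-busy-poll-client.c. struct rtt_stats and rtt_summarize() are hypothetical names, and mdev here is the mean absolute deviation (which is what the example computes), not a true standard deviation.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Hypothetical container for the four values the client prints. */
struct rtt_stats {
	double min;
	double avg;
	double max;
	double mdev; /* mean absolute deviation from avg */
};

/* Summarize n round-trip samples; returns all zeros when n == 0. */
static struct rtt_stats rtt_summarize(const double *rtt, size_t n)
{
	struct rtt_stats s = { DBL_MAX, 0.0, 0.0, 0.0 };
	size_t i;

	assert(rtt != NULL || n == 0);
	if (n == 0) {
		s.min = 0.0;
		return s;
	}

	/* First pass: min, max and the sum for the average. */
	for (i = 0; i < n; i++) {
		if (rtt[i] < s.min)
			s.min = rtt[i];
		if (rtt[i] > s.max)
			s.max = rtt[i];
		s.avg += rtt[i];
	}
	s.avg /= (double)n;

	/* Second pass: average distance of each sample from the mean. */
	for (i = 0; i < n; i++) {
		double d = rtt[i] - s.avg;

		s.mdev += d < 0.0 ? -d : d;
	}
	s.mdev /= (double)n;

	return s;
}
```

The early return for n == 0 is an addition of this sketch; it avoids a division by zero when no samples were recorded.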
Signed-off-by: Stefan Roesch --- .gitignore | 2 + examples/Makefile | 2 + examples/napi-busy-poll-client.c | 442 +++++++++++++++++++++++++++++++ examples/napi-busy-poll-server.c | 386 +++++++++++++++++++++++++++ 4 files changed, 832 insertions(+) create mode 100644 examples/napi-busy-poll-client.c create mode 100644 examples/napi-busy-poll-server.c diff --git a/.gitignore b/.gitignore index 6e8a2f7..89b5a41 100644 --- a/.gitignore +++ b/.gitignore @@ -15,6 +15,8 @@ /examples/io_uring-test /examples/io_uring-udp /examples/link-cp +/examples/napi-busy-poll-client +/examples/napi-busy-poll-server /examples/ucontext-cp /examples/poll-bench /examples/send-zerocopy diff --git a/examples/Makefile b/examples/Makefile index e561e05..59f1260 100644 --- a/examples/Makefile +++ b/examples/Makefile @@ -15,6 +15,8 @@ example_srcs := \ io_uring-test.c \ io_uring-udp.c \ link-cp.c \ + napi-busy-poll-client.c \ + napi-busy-poll-server.c \ poll-bench.c \ send-zerocopy.c diff --git a/examples/napi-busy-poll-client.c b/examples/napi-busy-poll-client.c new file mode 100644 index 0000000..9b2e543 --- /dev/null +++ b/examples/napi-busy-poll-client.c @@ -0,0 +1,442 @@ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define MAXBUFLEN 100 +#define PORTNOLEN 10 +#define ADDRLEN 80 +#define RINGSIZE 1024 + +#define printable(ch) (isprint((unsigned char)ch) ? 
ch : '#') + +enum { + IOURING_RECV, + IOURING_SEND, + IOURING_RECVMSG, + IOURING_SENDMSG +}; + +struct ctx +{ + struct io_uring ring; + struct sockaddr_in6 saddr; + + int sockfd; + int buffer_len; + int num_pings; + bool napi_check; + + union { + char buffer[MAXBUFLEN]; + struct timespec ts; + }; + + int rtt_index; + double *rtt; +} ctx; + +struct options +{ + int num_pings; + int timeout; + + bool sq_poll; + bool busy_loop; + bool prefer_busy_poll; + + char port[PORTNOLEN]; + char addr[ADDRLEN]; +} options; + +struct option longopts[] = +{ + {"address" , 1, NULL, 'a'}, + {"busy" , 0, NULL, 'b'}, + {"help" , 0, NULL, 'h'}, + {"num_pings", 1, NULL, 'n'}, + {"port" , 1, NULL, 'p'}, + {"prefer" , 0, NULL, 'u'}, + {"sqpoll" , 0, NULL, 's'}, + {"timeout" , 1, NULL, 't'}, + {NULL , 0, NULL, 0 } +}; + +void printUsage(const char *name) +{ + fprintf(stderr, + "Usage: %s [-a|--address ip_address] [-p|--port port-no] [-s|--sqpoll]" + " [-b|--busy] [-n|--num_pings pings] [-t|--timeout busy-poll-timeout] [-u|--prefer] [-h|--help]\n" + "--address\n" + "-a : remote or local ipv6 address\n" + "--busy\n" + "-b : busy poll io_uring instead of blocking.\n" + "--num_pings\n" + "-n : number of pings\n" + "--port\n" + "-p : port\n" + "--sqpoll\n" + "-s : Configure io_uring to use SQPOLL thread\n" + "--timeout\n" + "-t : Configure NAPI busy poll timeout\n" + "--prefer\n" + "-u : prefer NAPI busy poll\n" + "--help\n" + "-h : Display this usage message\n\n", + name); +} + +void printError(const char *msg, int opt) +{ + if (msg && opt) + fprintf(stderr, "%s (-%c)\n", msg, printable(opt)); +} + +void setProcessScheduler(void) +{ + struct sched_param param; + + param.sched_priority = sched_get_priority_max(SCHED_FIFO); + if (sched_setscheduler(0, SCHED_FIFO, &param) < 0) + fprintf(stderr, "sched_setscheduler() failed: (%d) %s\n", + errno, strerror(errno)); +} + +double diffTimespec(const struct timespec *time1, const struct timespec *time0) +{ + return (time1->tv_sec - time0->tv_sec) + + 
(time1->tv_nsec - time0->tv_nsec) / 1000000000.0; +} + +uint64_t encodeUserData(char type, int fd) +{ + return (uint32_t)fd | ((uint64_t)type << 56); +} + +void decodeUserData(uint64_t data, char *type, int *fd) +{ + *type = data >> 56; + *fd = data & 0xffffffffU; +} + +const char *opTypeToStr(char type) +{ + const char *res; + + switch (type) { + case IOURING_RECV: + res = "IOURING_RECV"; + break; + case IOURING_SEND: + res = "IOURING_SEND"; + break; + case IOURING_RECVMSG: + res = "IOURING_RECVMSG"; + break; + case IOURING_SENDMSG: + res = "IOURING_SENDMSG"; + break; + default: + res = "Unknown"; + } + + return res; +} + +void reportNapi(struct ctx *ctx) +{ + unsigned int napi_id = 0; + socklen_t len = sizeof(napi_id); + + getsockopt(ctx->sockfd, SOL_SOCKET, SO_INCOMING_NAPI_ID, &napi_id, &len); + if (napi_id) + printf(" napi id: %d\n", napi_id); + else + printf(" unassigned napi id\n"); + + ctx->napi_check = true; +} + +void sendPing(struct ctx *ctx) +{ + struct io_uring_sqe *sqe = io_uring_get_sqe(&ctx->ring); + + clock_gettime(CLOCK_REALTIME, (struct timespec *)ctx->buffer); + + io_uring_prep_send(sqe, ctx->sockfd, ctx->buffer, sizeof(struct timespec), 0); + sqe->user_data = encodeUserData(IOURING_SEND, ctx->sockfd); +} + +void receivePing(struct ctx *ctx) +{ + struct io_uring_sqe *sqe = io_uring_get_sqe(&ctx->ring); + + io_uring_prep_recv(sqe, ctx->sockfd, ctx->buffer, MAXBUFLEN, 0); + sqe->user_data = encodeUserData(IOURING_RECV, ctx->sockfd); +} + +void recordRTT(struct ctx *ctx) +{ + struct timespec startTs = ctx->ts; + + // Send next ping. + sendPing(ctx); + + // Store round-trip time. + ctx->rtt[ctx->rtt_index] = diffTimespec(&ctx->ts, &startTs); + ctx->rtt_index++; +} + +void printStats(struct ctx *ctx) +{ + double minRTT = DBL_MAX; + double maxRTT = 0.0; + double avgRTT = 0.0; + double stddevRTT = 0.0; + + // Calculate min, max, avg. 
+ for (int i = 0; i < ctx->rtt_index; i++) { + if (ctx->rtt[i] < minRTT) + minRTT = ctx->rtt[i]; + if (ctx->rtt[i] > maxRTT) + maxRTT = ctx->rtt[i]; + + avgRTT += ctx->rtt[i]; + } + avgRTT /= ctx->rtt_index; + + // Calculate stddev. + for (int i = 0; i < ctx->rtt_index; i++) + stddevRTT += fabs(ctx->rtt[i] - avgRTT); + stddevRTT /= ctx->rtt_index; + + fprintf(stdout, " rtt(us) min/avg/max/mdev = %.3f/%.3f/%.3f/%.3f\n", + minRTT * 1000000, avgRTT * 1000000, maxRTT * 1000000, stddevRTT * 1000000); +} + +int completion(struct ctx *ctx, struct io_uring_cqe *cqe) +{ + char type; + int fd; + int res = cqe->res; + + decodeUserData(cqe->user_data, &type, &fd); + if (res < 0) { + fprintf(stderr, "unexpected %s failure: (%d) %s\n", + opTypeToStr(type), -res, strerror(-res)); + return -1; + } + + switch (type) { + case IOURING_SEND: + receivePing(ctx); + break; + case IOURING_RECV: + if (res != sizeof(struct timespec)) { + fprintf(stderr, "unexpected ping reply len: %d\n", res); + abort(); + } + + if (!ctx->napi_check) { + reportNapi(ctx); + sendPing(ctx); + } else { + recordRTT(ctx); + } + + --ctx->num_pings; + break; + + default: + fprintf(stderr, "unexpected %s completion\n", + opTypeToStr(type)); + return -1; + break; + } + + return 0; +} + +int main(int argc, char *argv[]) +{ + struct ctx ctx; + struct options opt; + struct __kernel_timespec *tsPtr; + struct __kernel_timespec ts; + struct io_uring_params params; + struct io_uring_napi napi; + int flag; + + memset(&opt, 0, sizeof(struct options)); + + // Process flags. 
+ while ((flag = getopt_long(argc, argv, ":hsbua:n:p:t:", longopts, NULL)) != -1) { + switch (flag) { + case 'a': + strcpy(opt.addr, optarg); + break; + case 'b': + opt.busy_loop = true; + break; + case 'h': + printUsage(argv[0]); + exit(0); + break; + case 'n': + opt.num_pings = atoi(optarg) + 1; + break; + case 'p': + strcpy(opt.port, optarg); + break; + case 's': + opt.sq_poll = true; + break; + case 't': + opt.timeout = atoi(optarg); + break; + case 'u': + opt.prefer_busy_poll = true; + break; + case ':': + printError("Missing argument", optopt); + printUsage(argv[0]); + exit(-1); + break; + case '?': + printError("Unrecognized option", optopt); + printUsage(argv[0]); + exit(-1); + break; + + default: + fprintf(stderr, "Fatal: Unexpected case in CmdLineProcessor switch()\n"); + exit(-1); + break; + } + } + + if (strlen(opt.addr) == 0) { + fprintf(stderr, "address option is mandatory\n"); + printUsage(argv[0]); + exit(1); + } + + ctx.saddr.sin6_port = htons(atoi(opt.port)); + ctx.saddr.sin6_family = AF_INET6; + + if (inet_pton(AF_INET6, opt.addr, &ctx.saddr.sin6_addr) <= 0) { + fprintf(stderr, "inet_pton error for %s\n", opt.addr); + printUsage(argv[0]); + exit(1); + } + + // Connect to server. + fprintf(stdout, "Connecting to %s... (port=%s) to send %d pings\n", opt.addr, opt.port, opt.num_pings - 1); + + if ((ctx.sockfd = socket(AF_INET6, SOCK_DGRAM, 0)) < 0) { + fprintf(stderr, "socket() failed: (%d) %s\n", errno, strerror(errno)); + exit(1); + } + + if (connect(ctx.sockfd, (struct sockaddr *)&ctx.saddr, sizeof(struct sockaddr_in6)) < 0) { + fprintf(stderr, "connect() failed: (%d) %s\n", errno, strerror(errno)); + exit(1); + } + + // Setup ring. 
+ memset(&params, 0, sizeof(params)); + memset(&ts, 0, sizeof(ts)); + memset(&napi, 0, sizeof(napi)); + + if (opt.sq_poll) { + params.flags = IORING_SETUP_SQPOLL; + params.sq_thread_idle = 50; + } + + if (io_uring_queue_init_params(RINGSIZE, &ctx.ring, &params) < 0) { + fprintf(stderr, "io_uring_queue_init_params() failed: (%d) %s\n", + errno, strerror(errno)); + exit(1); + } + + if (opt.timeout || opt.prefer_busy_poll) { + napi.prefer_busy_poll = opt.prefer_busy_poll; + napi.busy_poll_to = opt.timeout; + + io_uring_register_napi(&ctx.ring, &napi); + } + + if (opt.busy_loop) + tsPtr = &ts; + else + tsPtr = NULL; + + // Use realtime scheduler. + setProcessScheduler(); + + // Copy payload. + clock_gettime(CLOCK_REALTIME, &ctx.ts); + + // Setup context. + ctx.napi_check = false; + ctx.buffer_len = sizeof(struct timespec); + ctx.num_pings = opt.num_pings; + + ctx.rtt_index = 0; + ctx.rtt = (double *)malloc(sizeof(double) * opt.num_pings); + if (!ctx.rtt) { + fprintf(stderr, "Cannot allocate results array\n"); + exit(1); + } + + // Send initial message to get napi id. + sendPing(&ctx); + + while (ctx.num_pings != 0) { + int res; + unsigned num_completed = 0; + unsigned head; + struct io_uring_cqe *cqe; + + do { + res = io_uring_submit_and_wait_timeout(&ctx.ring, &cqe, 1, tsPtr, NULL); + } + while (res < 0 && errno == ETIME); + + io_uring_for_each_cqe(&ctx.ring, head, cqe) { + ++num_completed; + if (completion(&ctx, cqe)) + goto out; + } + + if (num_completed) + io_uring_cq_advance(&ctx.ring, num_completed); + } + + printStats(&ctx); + +out: + free(ctx.rtt); + + if (opt.timeout || opt.prefer_busy_poll) + io_uring_unregister_napi(&ctx.ring, &napi); + io_uring_queue_exit(&ctx.ring); + + // Clean up. 
+ close(ctx.sockfd); + + return 0; +} diff --git a/examples/napi-busy-poll-server.c b/examples/napi-busy-poll-server.c new file mode 100644 index 0000000..1336ba8 --- /dev/null +++ b/examples/napi-busy-poll-server.c @@ -0,0 +1,386 @@ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define MAXBUFLEN 100 +#define PORTNOLEN 10 +#define ADDRLEN 80 +#define RINGSIZE 1024 + +#define printable(ch) (isprint((unsigned char)ch) ? ch : '#') + +enum { + IOURING_RECV, + IOURING_SEND, + IOURING_RECVMSG, + IOURING_SENDMSG +}; + +struct ctx +{ + struct io_uring ring; + struct sockaddr_in6 saddr; + struct iovec iov; + struct msghdr msg; + + int sockfd; + int buffer_len; + int num_pings; + bool napi_check; + + union { + char buffer[MAXBUFLEN]; + struct timespec ts; + }; +} ctx; + +struct options +{ + int num_pings; + int timeout; + + bool listen; + bool sq_poll; + bool busy_loop; + bool prefer_busy_poll; + + char port[PORTNOLEN]; + char addr[ADDRLEN]; +} options; + +struct option longopts[] = +{ + {"address" , 1, NULL, 'a'}, + {"busy" , 0, NULL, 'b'}, + {"help" , 0, NULL, 'h'}, + {"listen" , 0, NULL, 'l'}, + {"num_pings", 1, NULL, 'n'}, + {"port" , 1, NULL, 'p'}, + {"prefer" , 0, NULL, 'u'}, + {"sqpoll" , 0, NULL, 's'}, + {"timeout" , 1, NULL, 't'}, + {NULL , 0, NULL, 0 } +}; + +void printUsage(const char *name) +{ + fprintf(stderr, + "Usage: %s [-l|--listen] [-a|--address ip_address] [-p|--port port-no] [-s|--sqpoll]" + " [-b|--busy] [-n|--num_pings pings] [-t|--timeout busy-poll-timeout] [-u|--prefer] [-h|--help]\n" + "--listen\n" + "-l : Server mode\n" + "--address\n" + "-a : remote or local ipv6 address\n" + "--busy\n" + "-b : busy poll io_uring instead of blocking.\n" + "--num_pings\n" + "-n : number of pings\n" + "--port\n" + "-p : port\n" + "--sqpoll\n" + "-s : Configure io_uring to use SQPOLL thread\n" + "--timeout\n" + "-t : Configure NAPI busy poll timeout\n" + 
"--prefer\n" + "-u : prefer NAPI busy poll\n" + "--help\n" + "-h : Display this usage message\n\n", + name); +} + +void printError(const char *msg, int opt) +{ + if (msg && opt) + fprintf(stderr, "%s (-%c)\n", msg, printable(opt)); +} + +void setProcessScheduler(void) +{ + struct sched_param param; + + param.sched_priority = sched_get_priority_max(SCHED_FIFO); + if (sched_setscheduler(0, SCHED_FIFO, &param) < 0) + fprintf(stderr, "sched_setscheduler() failed: (%d) %s\n", + errno, strerror(errno)); +} + +uint64_t encodeUserData(char type, int fd) +{ + return (uint32_t)fd | ((__u64)type << 56); +} + +void decodeUserData(uint64_t data, char *type, int *fd) +{ + *type = data >> 56; + *fd = data & 0xffffffffU; +} + +const char *opTypeToStr(char type) +{ + const char *res; + + switch (type) { + case IOURING_RECV: + res = "IOURING_RECV"; + break; + case IOURING_SEND: + res = "IOURING_SEND"; + break; + case IOURING_RECVMSG: + res = "IOURING_RECVMSG"; + break; + case IOURING_SENDMSG: + res = "IOURING_SENDMSG"; + break; + default: + res = "Unknown"; + } + + return res; +} + +void reportNapi(struct ctx *ctx) +{ + unsigned int napi_id = 0; + socklen_t len = sizeof(napi_id); + + getsockopt(ctx->sockfd, SOL_SOCKET, SO_INCOMING_NAPI_ID, &napi_id, &len); + if (napi_id) + printf(" napi id: %d\n", napi_id); + else + printf(" unassigned napi id\n"); + + ctx->napi_check = true; +} + +void sendPing(struct ctx *ctx) +{ + + struct io_uring_sqe *sqe = io_uring_get_sqe(&ctx->ring); + + io_uring_prep_sendmsg(sqe, ctx->sockfd, &ctx->msg, 0); + sqe->user_data = encodeUserData(IOURING_SENDMSG, ctx->sockfd); +} + +void receivePing(struct ctx *ctx) +{ + bzero(&ctx->msg, sizeof(struct msghdr)); + ctx->msg.msg_name = &ctx->saddr; + ctx->msg.msg_namelen = sizeof(struct sockaddr_in6); + ctx->iov.iov_base = ctx->buffer; + ctx->iov.iov_len = MAXBUFLEN; + ctx->msg.msg_iov = &ctx->iov; + ctx->msg.msg_iovlen = 1; + + struct io_uring_sqe *sqe = io_uring_get_sqe(&ctx->ring); + io_uring_prep_recvmsg(sqe, 
ctx->sockfd, &ctx->msg, 0); + sqe->user_data = encodeUserData(IOURING_RECVMSG, ctx->sockfd); +} + +void completion(struct ctx *ctx, struct io_uring_cqe *cqe) +{ + char type; + int fd; + int res = cqe->res; + + decodeUserData(cqe->user_data, &type, &fd); + if (res < 0) { + fprintf(stderr, "unexpected %s failure: (%d) %s\n", + opTypeToStr(type), -res, strerror(-res)); + abort(); + } + + switch (type) { + case IOURING_SENDMSG: + receivePing(ctx); + --ctx->num_pings; + break; + case IOURING_RECVMSG: + ctx->iov.iov_len = res; + sendPing(ctx); + if (!ctx->napi_check) + reportNapi(ctx); + break; + default: + fprintf(stderr, "unexpected %s completion\n", + opTypeToStr(type)); + abort(); + break; + } +} + +int main(int argc, char *argv[]) +{ + int flag; + struct ctx ctx; + struct options opt; + struct __kernel_timespec *tsPtr; + struct __kernel_timespec ts; + struct io_uring_params params; + struct io_uring_napi napi; + + memset(&opt, 0, sizeof(struct options)); + + // Process flags. + while ((flag = getopt_long(argc, argv, ":lhsbua:n:p:t:", longopts, NULL)) != -1) { + switch (flag) { + case 'a': + strcpy(opt.addr, optarg); + break; + case 'b': + opt.busy_loop = true; + break; + case 'h': + printUsage(argv[0]); + exit(0); + break; + case 'l': + opt.listen = true; + break; + case 'n': + opt.num_pings = atoi(optarg) + 1; + break; + case 'p': + strcpy(opt.port, optarg); + break; + case 's': + opt.sq_poll = true; + break; + case 't': + opt.timeout = atoi(optarg); + break; + case 'u': + opt.prefer_busy_poll = true; + break; + case ':': + printError("Missing argument", optopt); + printUsage(argv[0]); + exit(-1); + break; + case '?': + printError("Unrecognized option", optopt); + printUsage(argv[0]); + exit(-1); + break; + + default: + fprintf(stderr, "Fatal: Unexpected case in CmdLineProcessor switch()\n"); + exit(-1); + break; + } + } + + if (strlen(opt.addr) == 0) { + fprintf(stderr, "address option is mandatory\n"); + printUsage(argv[0]); + exit(1); + } + + ctx.saddr.sin6_port 
= htons(atoi(opt.port)); + ctx.saddr.sin6_family = AF_INET6; + + if (inet_pton(AF_INET6, opt.addr, &ctx.saddr.sin6_addr) <= 0) { + fprintf(stderr, "inet_pton error for %s\n", opt.addr); + printUsage(argv[0]); + exit(1); + } + + // Bind to the listen address. + fprintf(stdout, "Listening %s : %s...\n", opt.addr, opt.port); + + if ((ctx.sockfd = socket(AF_INET6, SOCK_DGRAM, 0)) < 0) { + fprintf(stderr, "socket() failed: (%d) %s\n", errno, strerror(errno)); + exit(1); + } + + if (bind(ctx.sockfd, (struct sockaddr *)&ctx.saddr, sizeof(struct sockaddr_in6)) < 0) { + fprintf(stderr, "bind() failed: (%d) %s\n", errno, strerror(errno)); + exit(1); + } + + // Setup ring. + memset(&params, 0, sizeof(params)); + memset(&ts, 0, sizeof(ts)); + memset(&napi, 0, sizeof(napi)); + + if (opt.sq_poll) { + params.flags = IORING_SETUP_SQPOLL; + params.sq_thread_idle = 50; + } + + if (io_uring_queue_init_params(RINGSIZE, &ctx.ring, &params) < 0) { + fprintf(stderr, "io_uring_queue_init_params() failed: (%d) %s\n", + errno, strerror(errno)); + exit(1); + } + + if (opt.timeout || opt.prefer_busy_poll) { + napi.prefer_busy_poll = opt.prefer_busy_poll; + napi.busy_poll_to = opt.timeout; + + io_uring_register_napi(&ctx.ring, &napi); + } + + if (opt.busy_loop) + tsPtr = &ts; + else + tsPtr = NULL; + + + // Use realtime scheduler. + setProcessScheduler(); + + // Copy payload. + clock_gettime(CLOCK_REALTIME, &ctx.ts); + + // Setup context. + ctx.napi_check = false; + ctx.buffer_len = sizeof(struct timespec); + ctx.num_pings = opt.num_pings; + + // Receive initial message to get napi id. 
+ receivePing(&ctx); + + while (ctx.num_pings != 0) { + int res; + unsigned int num_completed = 0; + unsigned int head; + struct io_uring_cqe *cqe; + + do { + res = io_uring_submit_and_wait_timeout(&ctx.ring, &cqe, 1, tsPtr, NULL); + } + while (res < 0 && errno == ETIME); + + io_uring_for_each_cqe(&ctx.ring, head, cqe) { + ++num_completed; + completion(&ctx, cqe); + } + + if (num_completed) { + io_uring_cq_advance(&ctx.ring, num_completed); + } + } + + // Clean up. + if (opt.timeout || opt.prefer_busy_poll) + io_uring_unregister_napi(&ctx.ring, &napi); + + io_uring_queue_exit(&ctx.ring); + close(ctx.sockfd); + + return 0; +} From patchwork Mon Nov 21 19:14:59 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stefan Roesch X-Patchwork-Id: 13051579 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4B94BC352A1 for ; Mon, 21 Nov 2022 19:15:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229639AbiKUTPm (ORCPT ); Mon, 21 Nov 2022 14:15:42 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45506 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231403AbiKUTPd (ORCPT ); Mon, 21 Nov 2022 14:15:33 -0500 Received: from 66-220-144-178.mail-mxout.facebook.com (66-220-144-178.mail-mxout.facebook.com [66.220.144.178]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EAF1DC67E7 for ; Mon, 21 Nov 2022 11:15:19 -0800 (PST) Received: by dev0134.prn3.facebook.com (Postfix, from userid 425415) id 9D05D1B81314; Mon, 21 Nov 2022 11:15:00 -0800 (PST) From: Stefan Roesch To: kernel-team@fb.com Cc: shr@devkernel.io, axboe@kernel.dk, olivier@trillion01.com, netdev@vger.kernel.org, io-uring@vger.kernel.org, kuba@kernel.org, ammarfaizi2@gnuweeb.org 
Subject: [PATCH v5 4/4] liburing: update changelog with new feature Date: Mon, 21 Nov 2022 11:14:59 -0800 Message-Id: <20221121191459.998388-5-shr@devkernel.io> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20221121191459.998388-1-shr@devkernel.io> References: <20221121191459.998388-1-shr@devkernel.io> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: io-uring@vger.kernel.org Add a new entry to the changelog file for the napi busy poll feature. Signed-off-by: Stefan Roesch --- CHANGELOG | 3 +++ 1 file changed, 3 insertions(+) diff --git a/CHANGELOG b/CHANGELOG index 09511af..1db0269 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -1,3 +1,6 @@ +liburing-2.4 release +- Support for napi busy polling + liburing-2.3 release - Support non-libc build for aarch64.
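For reference, the example programs in patch 3/4 tag each SQE with a 64-bit user_data value that carries the operation type in the top byte and the socket fd in the low 32 bits. Below is a minimal sketch of that packing. It assumes an unsigned type tag (the examples use char), and the enum and function names are illustrative, not part of liburing.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative operation tags, in the same order as the examples' enum. */
enum { OP_RECV, OP_SEND, OP_RECVMSG, OP_SENDMSG };

/* Pack the tag into bits 56..63 and the fd into bits 0..31. */
static uint64_t encode_user_data(uint8_t type, int fd)
{
	assert(fd >= 0); /* file descriptors in use are non-negative */
	return (uint64_t)(uint32_t)fd | ((uint64_t)type << 56);
}

/* Recover the tag and the fd from a completion's user_data. */
static void decode_user_data(uint64_t data, uint8_t *type, int *fd)
{
	*type = (uint8_t)(data >> 56);
	*fd = (int)(uint32_t)(data & 0xffffffffU);
}
```

Using uint8_t for the tag is a choice made in this sketch: on platforms where char is signed, a negative tag would sign-extend when widened to 64 bits before the shift.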